Judge rejects most ChatGPT copyright claims from book authors

From Ars Technica:

A US district judge in California has largely sided with OpenAI, dismissing the majority of claims raised by authors alleging that large language models powering ChatGPT were illegally trained on pirated copies of their books without their permission.

By allegedly repackaging original works as ChatGPT outputs, authors alleged, OpenAI’s most popular chatbot was just a high-tech “grift” that seemingly violated copyright laws, as well as state laws preventing unfair business practices and unjust enrichment.

According to judge Araceli Martínez-Olguín, authors behind three separate lawsuits—including Sarah Silverman, Michael Chabon, and Paul Tremblay—have failed to provide evidence supporting any of their claims except for direct copyright infringement.

OpenAI had argued as much in their promptly filed motion to dismiss these cases last August. At that time, OpenAI said that it expected to beat the direct infringement claim at a “later stage” of the proceedings.

Among copyright claims tossed by Martínez-Olguín were accusations of vicarious copyright infringement. Perhaps most significantly, Martínez-Olguín agreed with OpenAI that the authors’ allegation that “every” ChatGPT output “is an infringing derivative work” is “insufficient” to allege vicarious infringement, which requires evidence that ChatGPT outputs are “substantially similar” or “similar at all” to authors’ books.

“Plaintiffs here have not alleged that the ChatGPT outputs contain direct copies of the copyrighted books,” Martínez-Olguín wrote. “Because they fail to allege direct copying, they must show a substantial similarity between the outputs and the copyrighted materials.”

Authors also failed to convince Martínez-Olguín that OpenAI violated the Digital Millennium Copyright Act (DMCA) by allegedly removing copyright management information (CMI)—such as author names, titles of works, and terms and conditions for use of the work—from training data.

This claim failed because authors cited “no facts” that OpenAI intentionally removed the CMI or built the training process to omit CMI, Martínez-Olguín wrote. Further, the authors cited examples of ChatGPT referencing their names, which would seem to suggest that some CMI remains in the training data.

Arguing that OpenAI caused economic injury by unfairly repurposing authors’ works, even if authors could show evidence of a DMCA violation, authors could only speculate about what injury was caused, the judge said.

. . . .

The only claim under California’s unfair competition law that was allowed to proceed alleged that OpenAI used copyrighted works to train ChatGPT without authors’ permission. Because the state law broadly defines what’s considered “unfair,” Martínez-Olguín said that it’s possible that OpenAI’s use of the training data “may constitute an unfair practice.”

Remaining claims of negligence and unjust enrichment failed, Martínez-Olguín wrote, because authors only alleged intentional acts and did not explain how OpenAI “received and unjustly retained a benefit” from training ChatGPT on their works.

Authors have been ordered to consolidate their complaints and have until March 13 to amend arguments and continue pursuing any of the dismissed claims.

To shore up the tossed copyright claims, authors would likely need to provide examples of ChatGPT outputs that are similar to their works, as well as evidence of OpenAI intentionally removing CMI to “induce, enable, facilitate, or conceal infringement,” Martínez-Olguín wrote.

. . . .

As authors likely prepare to continue fighting OpenAI, the US Copyright Office has been fielding public input before releasing guidance that could one day help rights holders pursue legal claims and may eventually require works to be licensed from copyright owners for use as training materials. Among the thorniest questions is whether AI tools like ChatGPT should be considered authors when spouting outputs included in creative works.

While the Copyright Office prepares to release three reports this year “revealing its position on copyright law in relation to AI,” according to The New York Times, OpenAI recently made it clear that it does not plan to stop referencing copyrighted works in its training data. Last month, OpenAI said it would be “impossible” to train AI models without copyrighted materials, because “copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents.”

According to OpenAI, it doesn’t just need old copyrighted materials; it needs current copyright materials to ensure that chatbot and other AI tools’ outputs “meet the needs of today’s citizens.”

Link to the rest at Ars Technica

1 thought on “Judge rejects most ChatGPT copyright claims from book authors”

  1. Here’s a detailed disection of the odds against the plaintiffs:


    Bottom line:

    “I feel this argument is not about compensation. It is about control and consistency,” she said. “And this is a new model. So, when you’re seeing your name in places that you didn’t know it was going to be, and also, by the way, it might not be exactly what you wanted it to say, or the connection is going to challenge you, but it also might open a whole new audience or experience that you’ve never thought of. So, it’s going to require a lot of patience.

    “They’re thinking of the then and now, and they’re trying to protect the brand and likeness, but they are not thinking of the future economies that they could really be a part of.”

    Put less diplomatically: ego.

Comments are closed.