From Publishers Weekly:
Authors have now joined the growing ranks of concerned creators suing tech developers over their much-hyped generative AI technology. And a pair of copyright class action suits recently filed on behalf of authors is raising broader questions about the most effective way to protect creators and creative industries—including authors and publishers—from the potentially disruptive aspects of AI.
Filed on June 28 and July 7 by the Joseph Saveri Law Firm on behalf of five named plaintiffs (Mona Awad and Paul Tremblay in one case, and Christopher Golden, Richard Kadrey, and comedian Sarah Silverman in the other), the suits claim that Microsoft-backed OpenAI (creators of ChatGPT) and Meta (creators of LLaMA) infringed the authors’ copyrights by using unauthorized copies of their books to train their AI models, including copies allegedly scraped from notorious pirate sites. While the authors’ attorneys did not comment for this story, a spokesperson for the firm suggested to Ars Technica that, if left unchecked, AI models built with “stolen works” could eventually replace the authors they stole from, and framed the litigation as part of “a larger fight for preserving ownership rights for all artists and creators.”
The authors join a spectrum of increasingly concerned creators on whose behalf the Saveri law firm has filed similar copyright-based lawsuits in recent months. In November 2022, the firm filed suit against GitHub on behalf of a group of software developers. And in January, the firm sued three AI image generators on behalf of a group of artists. Those cases are still pending—and, like most copyright cases involving new technology, they have divided copyright experts. Those who lean in favor of the tech side claim that using unlicensed copyrighted works to train AI is fair use. Those on the content creator side argue that questions of ownership and provenance cannot simply be waved away without major, far-reaching implications.
Neither Meta nor OpenAI has yet responded to the author suits. But multiple copyright lawyers told PW on background that the claims likely face an uphill battle in court. Even if the suits get past threshold questions about the alleged copying and how AI training actually works—which is no sure thing—lawyers say there is ample case law to suggest fair use. For example, a recent case against plagiarism detector TurnItIn.com held that student papers could be ingested to create a database used to expose plagiarism. Kelly v. Arriba Soft held that the reproduction and display of photos as thumbnails was fair use. And, in the publishing industry’s own backyard, there’s the landmark Google Books case. One lawyer noted that if Google’s bulk copying and display of tens of millions of books was comfortably found to be fair use, it’s hard to see how using books to train AI would not be, while also cautioning that fair use cases are notoriously fact-dependent and hard to predict.
“I just don’t see how these cases have legs,” one copyright lawyer bluntly told PW. “Look, I get it. Somebody has to make a test case. Otherwise there’s nothing but blogging and opinion pieces and stance-taking by proponents on either side. But I just think there’s too much established case law to support this kind of transformative use as a fair use.”
Cornell Law School professor James Grimmelmann—who has written extensively on the Google case and is now following AI developments closely—is also skeptical that the authors’ infringement cases can succeed, and concurred that AI developers have some “powerful precedents” to rely on. But he is also “a little more sympathetic in principle” to the idea that some AI models may be infringing. “The difference between AI and Google Books is that some AI models could emit infringing works, whereas snippet view in Google Books was designed to prevent output infringement,” he said. “That inflects the fair use analysis, although there are still a lot of factors pointing to transformative use.”
Whether the AI in question was trained using illegal copies from pirate sites could also be a complicating factor, Grimmelmann said. “There’s an orthodox copyright analysis that says if the output is not infringing, a transformative internal process is fair use,” he explained. Nevertheless, some courts will consider the source, he added, noting that the allegedly “unsavory origins” of the copies could factor into a court’s fair use analysis.
Link to the rest at Publishers Weekly