From Ars Technica:
This week, OpenAI finally responded to a pair of nearly identical class-action lawsuits from book authors—including Sarah Silverman, Paul Tremblay, Mona Awad, Chris Golden, and Richard Kadrey—who earlier this summer alleged that ChatGPT was illegally trained on pirated copies of their books.
In OpenAI’s motion to dismiss (filed in both lawsuits), the company asked a US district court in California to toss all but one claim alleging direct copyright infringement, which OpenAI hopes to defeat at “a later stage of the case.”
The authors’ other claims—alleging vicarious copyright infringement, violation of the Digital Millennium Copyright Act (DMCA), unfair competition, negligence, and unjust enrichment—need to be “trimmed” from the lawsuits “so that these cases do not proceed to discovery and beyond with legally infirm theories of liability,” OpenAI argued.
OpenAI claimed that the authors “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
According to OpenAI, even if the authors’ books were a “tiny part” of ChatGPT’s massive data set, “the use of copyrighted materials by innovators in transformative ways does not violate copyright.” Unlike plagiarists who seek to directly profit off distributing copyrighted materials, OpenAI argued that its goal was “to teach its models to derive the rules underlying human language” to do things like help people “save time at work,” “make daily life easier,” or simply entertain themselves by typing prompts into ChatGPT.
The purpose of copyright law, OpenAI argued, is “to promote the Progress of Science and useful Arts” by protecting the way authors express ideas, but “not the underlying idea itself, facts embodied within the author’s articulated message, or other building blocks of creative,” which are arguably the elements of authors’ works that would be useful to ChatGPT’s training model. Citing a notable copyright case involving Google Books, OpenAI reminded the court that “while an author may register a copyright in her book, the ‘statistical information’ pertaining to ‘word frequencies, syntactic patterns, and thematic markers’ in that book are beyond the scope of copyright protection.”
“Under the resulting judicial precedent, it is not an infringement to create ‘wholesale cop[ies] of [a work] as a preliminary step’ to develop a new, non-infringing product, even if the new product competes with the original,” OpenAI wrote.
In particular, OpenAI hopes to convince the court that the authors’ vicarious copyright infringement claim, which alleges that every ChatGPT output represents a derivative work, “regardless of whether there are any similarities between the output and the training works,” is an “erroneous legal conclusion.”
The company’s motion to dismiss cited “a simple response to a question (e.g., ‘Yes’),” or responding with “the name of the President of the United States” or with “a paragraph describing the plot, themes, and significance of Homer’s The Iliad” as examples of why every single ChatGPT output cannot seriously be considered a derivative work under authors’ “legally infirm” theory.
“That is not how copyright law works,” OpenAI argued, while claiming that any ChatGPT outputs that do connect to authors’ works are similar to “book reports or reviews.”
Link to the rest at Ars Technica
As PG has mentioned previously, he believes that training an AI on a relatively small amount of copyrighted material, alongside far larger amounts of material not subject to copyright protection, qualifies as fair use when the purpose is to train the AI and not to make copies of the copyrighted material.
Even absent fair use, PG believes such use does not infringe copyright because the AI is not making copies of copyrighted materials.
PG has mentioned other analogies, but one that popped into his mind on this occasion is an author who reads hundreds of romance novels for the purpose of learning how to write a romance novel and then writes a romance novel using tropes and techniques that many other romance authors have used before.
Precursors of the modern popular love-romance can also be found in the sentimental novel Pamela, or Virtue Rewarded, by Samuel Richardson, published in 1740. Pamela was the first popular novel to be based on a courtship as told from the perspective of the heroine. Unlike many of the novels of the time, Pamela had a happy ending.
. . . .
Women will pick up a romance novel knowing what to expect, and this foreknowledge of the reader is very important. When the hero and heroine meet and fall in love, maybe they don’t know they’re in love but the reader does. Then a conflict will draw them apart, but you know in the end they’ll be back together, and preferably married or planning to be by page 192.
Joan Schulhafer of Pocket Books, 1982
A great many of the most financially successful authors PG knows are romance authors.