From The Verge:
More authors sued OpenAI for copyright infringement, joining other writers in pursuing legal action against generative AI companies for using their books to train AI models.
The Authors Guild and 17 well-known authors like Jonathan Franzen, John Grisham, George R.R. Martin, and Jodi Picoult filed the lawsuit in the Southern District of New York. The plaintiffs hope to get the filing classified as a class action.
According to the complaint, OpenAI “copied plaintiffs’ works wholesale, without permission or consideration” and fed the copyrighted materials into large language models.
“These authors’ livelihoods derive from the works they create. But the Defendant’s LLMs endanger fiction writers’ ability to make a living in that the LLMs allow anyone to generate — automatically and freely (or very cheaply) — text that they would otherwise pay writers to create,” the lawsuit said.
The authors added that OpenAI’s LLMs could result in derivative work “that is based on, mimics, summarizes, or paraphrases” their books, which could harm their market.
OpenAI, the complaint said, could have trained GPT on works in the public domain instead of pulling in copyrighted material without paying a licensing fee.
OpenAI said in a statement to The Verge that the company is optimistic it is “having productive conversations with many creators around the world, including the Authors’ Guild, and have been working cooperatively to understand and discuss their concerns about AI.”
“We’re optimistic we will continue to find mutually beneficial ways to work together to help people utilize new technology in a rich content ecosystem,” the company said.
Link to the rest at The Verge
“These authors’ livelihoods derive from the works they create. But the Defendant’s LLMs endanger fiction writers’ ability to make a living in that the LLMs allow anyone to generate — automatically and freely (or very cheaply) — text that they would otherwise pay writers to create,” the lawsuit said.
How awful. How democratic.
(I wonder what law guarantees them lifetime employment?)
“The authors added that OpenAI’s LLMs could result in derivative work “that is based on, mimics, summarizes, or paraphrases” their books, which could harm their market.”
Could. Could. Could?!
Well, now:
An asteroid could fall on NYC.
The Russians could nuke half the world.
Aliens could show up.
Pinky and the Brain could finally succeed.
Could…
…or not.
Who is feeding these folks this… angst?
They sound scared, weak… clueless. They can’t even cite a single violation? At least the Getty operation has the watermark to link the software to their paywalled database.
https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-lawsuit-getty-images-stable-diffusion
They are trying to wipe out an entire technology by suing the poorest player in the category? Do they even realize that if they could somehow bully OpenAI out of business, there would still be Microsoft, Google, Facebook, and Amazon with their own proprietary LLM tools out there? OpenAI software is at least open to any software developer willing to license their API set. The tech giants’ alternatives aren’t that startup friendly. They aren’t going to be swayed by “could”. And they can afford to countersue.
They are truly channeling Ned Ludd.
(If he ever existed.)
Buggy whip vendors suing Henry Ford for putting them out of business would’ve had a better case.
I find myself wondering if OpenAI was using pirated copies of the books that it downloaded as part of a general scraping of data from the internet. If they had actually paid for access to the titles, they are not doing anything different from what authors are advised to do (“read widely in your chosen genre”), but is there a case if they used pirated books?
That is for lawyers to argue over, but given the way the training databases were assembled, a case seems unlikely: the software developers themselves have no idea what specific data went in or where it came from.
The LLM training databases were assembled by web crawlers (spiders) that looked at anything and everything on the web and processed it into a database of real-world language-use data. The actual text is unlikely to be found in the training database, as it is irrelevant to the process.
If one were inclined to be kind to the squealers, one might suggest they are confusing the training database with the runtime model (which is really just a sophisticated database of relationships) that the neural network software assembled. That model is what users interact with and, in most cases, it searches a third database to produce the output.
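To make that distinction concrete, here is a deliberately tiny Python sketch of the three pieces described above: a crawled corpus used only at training time, the statistical model distilled from it, and a separate index the deployed system searches at query time. Everything in it is made up for illustration; it shows the general shape of the architecture, not OpenAI’s actual pipeline.

from collections import Counter

# "Training database": text a crawler gathered. In a real system this is
# terabytes of scraped web pages; here it is three hard-coded strings.
crawled_pages = [
    "authors write books and readers buy books",
    "language models learn patterns of word use from text",
    "search engines index pages so queries can find them",
]

# "Runtime model": statistics distilled from the corpus. The toy stand-in is
# word-pair frequencies; note that the original pages are not stored in it.
def train(pages):
    pairs = Counter()
    for page in pages:
        words = page.split()
        pairs.update(zip(words, words[1:]))
    return pairs

model = train(crawled_pages)

# "Third database": a separate index the deployed system searches at query
# time (for BingChat or Bard, the live web), independent of the corpus above.
runtime_index = {
    "betamax": "Sony v. Universal (the Betamax case) turned on potential uses.",
    "crawler": "Web crawlers have indexed public pages for roughly 30 years.",
}

def answer(query):
    # Retrieval step against the runtime index, not the training corpus.
    hit = next((text for key, text in runtime_index.items() if key in query.lower()),
               "no match in the runtime index")
    # The model's statistics would then shape the wording of the reply.
    top_pair = " ".join(model.most_common(1)[0][0])
    return f"{hit} (model's most frequent word pair: '{top_pair}')"

print(answer("What was the Betamax case about?"))

The point of the sketch is simply that the text the crawler saw, the model users talk to, and the index the model searches are three different artifacts, which is why “what exactly went in” is so hard to pin down after the fact.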
In this example, an astronomy “AI” visual-recognition system, the first and third databases are both astronomy imagery:
https://www.space.com/astronomy-research-ai-future
In the case of BingChat and Google Bard, the third database is the internet in full, which almost certainly includes “unauthorized editions” that the search engines normally censor out. ChatGPT used to rely on a backup of a search database, circa 2021, but was recently updated to use current data.
Bottom line is that not only can the squealers not prove any direct harm of any kind to *themselves*, they don’t even know if their “precious” was analyzed for the training database or processed by the output model off the internet. Too many steps, and most are blind data processing.
Finally, in the US there is the matter of fair use precedent in the first step and 30 years of legal web crawler precedent in the final step.
Again: a matter of lawyer billable hours to debate whether they even have standing to file a lawsuit, or whether a court can ban software with perfectly legal uses on the *possibility* that somebody, somewhere, *could* someday use it for a questionable purpose (a more speculative version of the complaint in the 1976 BETAMAX case) or to compete with them in the marketplace.
The names fronting this lawsuit from (of course!) the Authors Guild are presumed to be intelligent people, but they seem to have been misinformed and dragged into a foolish endeavor. LLM tech is here to stay, and its deployment isn’t going to be stopped at the source.
What they really should be doing is going after their publishers the way the screenwriters did.
But the AG would never do that, right?
1- LLM is software plumbing.
2- LLM software has many uses other than chatbots or assembling pastiches.
3- For one thing, LLM software can write *software*. Big can of worms there, but it writes better software than 80% of coders out there.
4- And it is already doing things that were only far future dreams in times past.
For example, back in 1945, Vannevar Bush posited a future information management system he called The MEMEX. This inspired software development for three generations, up to and including the web. And this:
https://techcommunity.microsoft.com/t5/microsoft-onedrive-blog/unveiling-the-next-generation-of-onedrive/ba-p/3935612
Pretty much a digital MEMEX in all its glory.
And it’s not just Microsoft. They’re racing ahead because of their privileged access to OpenAI tech plus their own, parallel efforts. Google, as usual, is trying to copy, as are Amazon, Facebook, and any number of European, Canadian, and Aussie outfits. Taken together, in a couple of years the LLM-based software sector will be much bigger than global trade publishing.
No whining is stopping that train.