ChatGPT maker OpenAI faces a lawsuit over how it used people’s data

From The Washington Post:

A California-based law firm is launching a class-action lawsuit against OpenAI, alleging the artificial-intelligence company that created popular chatbot ChatGPT massively violated the copyrights and privacy of countless people when it used data scraped from the internet to train its tech.

The lawsuit seeks to test out a novel legal theory — that OpenAI violated the rights of millions of internet users when it used their social media comments, blog posts, Wikipedia articles and family recipes. Clarkson, the law firm behind the suit, has previously brought large-scale class-action lawsuits on issues ranging from data breaches to false advertising.

The firm wants to represent “real people whose information was stolen and commercially misappropriated to create this very powerful technology,” said Ryan Clarkson, the firm’s managing partner.

. . . .

The lawsuit goes to the heart of a major unresolved question hanging over the surge in “generative” AI tools such as chatbots and image generators. The technology works by ingesting billions of words from the open internet and learning to build inferences between them. After consuming enough data, the resulting “large language models” can predict what to say in response to a prompt, giving them the ability to write poetry, have complex conversations and pass professional exams. But the humans who wrote those billions of words never signed off on having a company such as OpenAI use them for its own profit.

“All of that information is being taken at scale when it was never intended to be utilized by a large language model,” Clarkson said. He said he hopes to get a court to institute some guardrails on how AI algorithms are trained and how people are compensated when their data is used.

. . . .

The legality of using data pulled from the public internet to train tools that could prove highly lucrative to their developers is still unclear. Some AI developers have argued that the use of data from the internet should be considered “fair use,” a concept in copyright law that creates an exception if the material is changed in a “transformative” way.

The question of fair use is “an open issue that we will be seeing play out in the courts in the months and years to come,” said Katherine Gardner, an intellectual-property lawyer at Gunderson Dettmer, a firm that mostly represents tech start-ups. Artists and other creative professionals who can show their copyrighted work was used to train the AI models could have an argument against the companies using it, but it’s less likely that people who simply posted or commented on a website would be able to win damages, she said.

“When you put content on a social media site or any site, you’re generally granting a very broad license to the site to be able to use your content in any way,” Gardner said. “It’s going to be very difficult for the ordinary end user to claim that they are entitled to any sort of payment or compensation for use of their data as part of the training.”

Link to the rest at The Washington Post