Does generative artificial intelligence infringe copyright?

From The Economist:

GENERATIVE ARTIFICIAL INTELLIGENCE (AI) will transform the workplace. The International Monetary Fund reckons that AI tools, which include ones that produce text or images from written prompts, will eventually affect 40% of jobs. Goldman Sachs, a bank, says that the technology could replace 300m jobs worldwide. Sceptics say those estimates are exaggerated. But some industries seem to be feeling the effects already. A paper published in August 2023 on SSRN, a repository for research which has yet to undergo formal peer review, suggests that the income of self-employed “creatives”—writers, illustrators and the like—has fallen since November 2022, when ChatGPT, a popular AI tool, was released.

Over the past year artists, authors and comedians have filed lawsuits against the tech companies behind AI tools, including OpenAI, Microsoft and Anthropic. The cases allege that, by using copyrighted material to train their AI models, tech firms have violated creators’ rights. Do those claims have merit?

AI generators translate written prompts—“draw a New York skyline in the style of Vincent van Gogh”, for example—into machine-readable commands. The models are trained on huge databases of text, images, audio or video. In many cases the tech firms appear to have scraped much of the material from the internet without permission. In 2022 David Holz, the founder of Midjourney, one of the most popular AI image generators, admitted that his tool had hoovered up 100m images without knowing where they came from or seeking permission from their owners.

Generators are supposed to make new output and on that basis AI developers argue that what their tools produce does not infringe copyright. They rely on the “fair-use doctrine”, which allows the use of copyrighted material in certain circumstances. This doctrine normally protects journalists, teachers, researchers and others when they use short excerpts of copyrighted material in their own work, for example in a book review. AI tools are not entitled to that protection, creatives believe, because they are in effect absorbing and rearranging copyrighted work rather than merely excerpting small pieces from it.

Generative AI is so new that there is almost no case law to guide courts. That makes the outcome of these cases hard to guess. Some observers reckon that many of the class-action suits against AI firms will probably fail. Andres Guadamuz, an expert in intellectual-property law at the University of Sussex, reckons that the strength of the fair-use doctrine is likely to trump claimants’ concerns.

One case will be particularly closely watched. On December 27th the New York Times sued Microsoft and OpenAI after negotiations failed. It alleges that the tech companies owe “billions of dollars” for using copyrighted work to train ChatGPT. The newspaper’s lawyers showed multiple examples of ChatGPT producing New York Times journalism word for word. This, they claim, shows that AI tools do not substantially transform the material they are trained on and are therefore not protected by the fair-use doctrine.

On January 8th OpenAI responded, saying that it had done nothing wrong. Generative AI tools are pattern-matching technologies that write responses by predicting the likeliest next word based on what they have been trained on. As in other cases of this kind, OpenAI says that this is covered by fair use. It claims that the New York Times overstates the risk of “regurgitation”, which it blames on a bug that only rarely produces errors. In a filing submitted on February 26th, OpenAI claimed that the New York Times cherry-picked answers from “tens of thousands” of queries it sent to the chatbot. Some of these were “deceptive prompts” that violated its terms of use, it alleged.

Link to the rest at The Economist
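As a minimal sketch of the “predict the likeliest next word” idea described in the article above, consider the toy bigram model below. The tiny corpus and helper names are assumptions for illustration only, and the approach is nothing like the transformer models OpenAI actually runs, but it shows the basic mechanic of continuing text by repeatedly choosing the word most often seen to follow the current one in the training data.

```python
# Toy illustration of next-word prediction (a bigram model), assuming a tiny
# hard-coded corpus. Real generative AI uses vastly larger models and data;
# this is only a sketch of the "predict the likeliest next word" idea.
from collections import Counter, defaultdict

corpus = (
    "the court heard the case and the court ruled that the use was fair "
    "and the case was closed"
).split()

# Count, for each word, which words follow it and how often.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word` in the training text."""
    options = followers.get(word)
    if not options:
        return None
    return options.most_common(1)[0][0]

# Generate a short continuation by repeatedly picking the likeliest next word.
word, output = "the", ["the"]
for _ in range(6):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))  # e.g. "the court heard the court heard the"
```

Even this toy version makes the “regurgitation” dispute concrete: when a particular passage dominates the training data, the likeliest continuation can reproduce stretches of it verbatim.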

PG still thinks the use of materials protected by copyright to train AI systems qualifies as fair use. Those who use an AI system to create written material cannot, to the best of PG’s knowledge, call up an exact copy of a New York Times story.

PG is going to see if he can make his way through the various contentions, but he was immediately reminded of the Google Books case that was ultimately resolved in favor of Google in 2015.

The basis for the Google Books decision was the transformative nature of Google’s use of the books’ content to populate a huge, searchable online database covering millions of titles.

PG suggests that there is a much greater degree of transformation involved in the AI systems’ use of the texts at issue in the New York Times’s lawsuit against Microsoft and OpenAI than there is in Google’s use of the text of 40 million books in 500 languages.

6 thoughts on “Does generative artificial intelligence infringe copyright?”

  1. The problem is not really copyright; rather, it’s about generating more content than you can shake a big stick at and then using it to swamp the internet. Spammers’ heaven.

        • Humans are adaptive.
          They’ll figure out what is worth their attention and/or cash.
          And if not, well, anything that grows skepticism on the web is a good thing.

      • Very smart people assured us that consumers could never survive the tsunami of self-published books. They would be hopelessly lost trying to find a good independent book. Real authors, authentic writers, and serious artists would find their work buried under the dreck.

        Those works may indeed be buried, but consumers are happily chugging along consuming zillions of independent works, and coming back for more. Maybe the real smart folks aren’t that smart.

  2. I would bet on “substitution” carrying the day.
    Nobody (who cares for the NYT) is going to use a chatbot as a substitute for a subscription.
    Or rely on LLM output to substitute for a specific copyrighted book.

    By their logic, CliffsNotes would be illegal.

    https://www.cliffsnotes.com/literature/t/to-kill-a-mockingbird/to-kill-a-mockingbird-at-a-glance

    Collages, too.
    Luddites, the lot of them.

    And yes, the Google books case is a perfect precedent to bring up.

    As before, the plaint reduces to “somebody used my text to make money in an entirely different business.” And they forget that “different business” by itself makes it fair use.
