Court Offers First Glimpse Into Whether AI Machine Learning Is Copyright Infringement Or Fair Use

From Mondaq:

As we previously blogged, multiple generative AI platforms are facing lawsuits alleging that the unauthorized use of copyright-protected material to train artificial intelligence constitutes copyright infringement.  A key defense in those cases is fair use.  Specifically, AI platforms contend that they don’t need a license to use copyright-protected content—whether scraped from the Internet or obtained from a pirate trove of books—for the purpose of developing and improving large language models (LLMs) under the theory that such use is transformative and fair use under the Copyright Act.  Whether fair use prevails in this battle is one of the biggest copyright questions of the day.

While many of the generative AI actions are pending in the U.S. District Court for the Northern District of California, a federal court in Delaware recently had the opportunity to opine on the merits of this important fair use question.  In Thomson Reuters v. Ross Intelligence, 2023 WL 6210901 (D. Del. Sept. 25, 2023), the owner of Westlaw (Thomson Reuters) claims, among other things, that an AI startup (Ross Intelligence) infringed Thomson Reuters’ copyright by using Westlaw’s headnotes to train Ross’s legal AI model.  The parties cross-moved for summary judgment on various grounds, including on Ross’s fair use defense.

Though the decision explores multiple interesting questions of copyright law, including the copyrightability of Westlaw headnotes (maybe) and whether the Copyright Act preempts Thomson Reuters’ claim for tortious interference (yes), its analysis of Ross’s fair use defense is where the court appears to have broken new ground, in particular its assessment of whether Ross’s alleged use of Westlaw’s headnotes (assuming they are protected by copyright) is “transformative.”

The court begins its fair use analysis by discussing two cases from the Ninth Circuit that deal with so-called “intermediate copying.”  In Sega Enterprises v. Accolade, 977 F.2d 1510 (9th Cir. 1992), the court held that it was fair use for a company to copy Sega’s copyright-protected console code for the purpose of learning the software’s functional components and making new games that were compatible with Sega’s console.  Similarly, in Sony Computer Entertainment v. Connectix, 203 F.3d 596 (9th Cir. 2000), the Ninth Circuit held it was fair use for a company to create a copy of Sony’s software in order to create a new gaming platform that was compatible with Sony’s games.  The Thomson Reuters court noted that the Supreme Court “has cited these intermediate copying cases favorably, particularly in the context of ‘adapting the doctrine of fair use in light of rapid technological change.’”  2023 WL 6210901, at *8 (quoting Google v. Oracle, 141 S. Ct. 1183, 1198 (2021)) (cleaned up).

Thomson Reuters attempted to distinguish the intermediate-copying cases by arguing that, unlike the companies in Sega and Sony that merely sought to “study functionality or create compatibility,” Ross sought to train its AI with Westlaw’s “creative decisions” specifically to “replicate them” in the AI’s output.  Ross, on the other hand, contended that “its AI studied the headnotes and opinion quotes only to analyze language patterns, not to replicate Westlaw’s expression,” and thus was lawful “intermediate copying.”  The court held that whether Ross’s use was transformative would turn on the “precise nature of Ross’s actions.”  

Here’s the key text:

It was transformative intermediate copying if Ross’s AI only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes.  But if Thomson Reuters is right that Ross used the untransformed text of headnotes to get its AI to replicate and reproduce the creative drafting done by Westlaw’s attorney-editors, then Ross’s comparisons to cases like Sega and Sony are not apt.

. . . .

To the extent that LLMs are ingesting copyright-protected material solely to understand language patterns and not to replicate their creative expression (which may very well be the case for many LLMs), this opinion suggests that using such material to train AI is transformative.  But if the material is being used to train AI to output the “creative drafting” discerned from the original, then the use is likely not transformative.  Thus, as the Thomson Reuters court observes, the fair use question in these cases may turn on the exact nature of the AI training process.

Link to the rest at Mondaq

PG apologizes if the rest of this post is boring for anyone who isn’t a law geek, but the following may help clarify PG’s interest.

The OP intrigued PG because he got into a bit of trouble a long time ago when he suggested, in an article he wrote for The Journal of the American Bar Association, that West Publishing didn’t have a legitimate copyright to the books it published that consisted of the opinions of a large number of courts across the country.

West was a venerable professional publisher, founded in 1872 to print law books for the use of attorneys and judges.

West evolved to publish the statutes for the United States government and every state.

West also published the court opinions written by judges in the federal court system and all states.

Because the statutes and case opinions are public documents, anyone who desires to publish them is free to do so.

West contended that the improvements it made to the public documents it published were protected by copyright law.

West built up a large business based upon the changes it made to improve the quality of the federal and state court opinions. These included:

  1. West employees proofread the opinion and corrected grammatical errors.
  2. West employees checked all of the statutory and case citations included in the opinion and corrected them to reflect generally used conventions of legal citations. (Judges, like any other human beings, sometimes make mistakes when they write their opinions. The conventions used in creating such citations can make correctly creating the citations to statutes and cases an error-prone activity.)
  3. For example, “Stearns v. Ticketmaster Corp., 655 F.3d 1013 (9th Cir. 2011),” is West’s citation for the court opinion in the case of Stephen Stearns v. Ticketmaster Corp., et al. (“et al.” is an abbreviation of the Latin term “et alia,” which means “and others”). The opinion was published in volume 655 of the Federal Reporter, Third Series (identified by the abbreviation “F.3d”), beginning on page 1013. The citation also shows that the decision was issued by the United States Court of Appeals for the Ninth Circuit (abbreviated as “9th Cir.”) in 2011.
  4. It was and is considered bad form for an attorney to cite a case other than in the form prescribed by “Blue Book Citations” in legal documents submitted to a court. West citations were the basis for Blue Book Citations. As mentioned earlier, most judges were happy to have West correct their citation errors. That service helped a judge avoid snide remarks from other judges in the judicial cafeteria.
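
For readers who think in code, the anatomy of the Stearns citation described above can be sketched as a small parser. This is only an illustration under simplifying assumptions: the `Citation` class, the `parse_citation` function, and the regular expression are hypothetical names and choices made for this example, not anything West or the Blue Book prescribes, and real citations come in many more variants than this one pattern handles.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """The pieces of a reporter citation (hypothetical structure)."""
    case_name: str
    volume: int
    reporter: str
    first_page: int
    court: str
    year: int


# Assumed simplified shape:
#   <case name>, <volume> <reporter> <first page> (<court> <year>)
CITATION_RE = re.compile(
    r"^(?P<case_name>.+?),\s+"
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Za-z0-9. ]+?)\s+"
    r"(?P<first_page>\d+)\s+"
    r"\((?P<court>.+?)\s+(?P<year>\d{4})\)$"
)


def parse_citation(text: str) -> Optional[Citation]:
    """Return the parsed citation, or None if the text does not fit the pattern."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    return Citation(
        case_name=match["case_name"],
        volume=int(match["volume"]),
        reporter=match["reporter"],
        first_page=int(match["first_page"]),
        court=match["court"],
        year=int(match["year"]),
    )


if __name__ == "__main__":
    print(parse_citation("Stearns v. Ticketmaster Corp., 655 F.3d 1013 (9th Cir. 2011)"))
```

The sketch returns `None` instead of raising an error when the text does not fit, which reflects how easily a hand-typed citation can stray from the expected form.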

West also categorized cases according to a West-created “Key Number System.” This is a classification system that organizes cases by topic, allowing legal researchers to quickly find cases related to a particular issue. This system was created in the 19th century, starting with seven categories: persons, property, contracts, torts, crime, remedies, and government.

The Key Number System could be quite helpful before the digitization of cases and statutes.

In 1967, the Ohio State Bar Association entered into a $7,000 agreement with Data Corporation of Beavercreek, Ohio, to create a full-text, interactive research service of the Ohio statutes.

In 1973, Mead Data Central, the successor of Data Corporation, introduced LEXIS, an online computer research service that consisted of the full text of Ohio and New York codes and cases, the U.S. Code, and some federal case law. The LEXIS search engine was clunky by today’s standards, but it allowed attorneys to search the statutes and case opinions much faster and at a more granular level than could be done with West’s printed books.

West and LEXIS (Mead Data Central)

1 thought on “Court Offers First Glimpse Into Whether AI Machine Learning Is Copyright Infringement Or Fair Use”

  1. This is in keeping with what I expected, tbh. Design a generator to replicate and it’s infringement. Build a dictionary for generation from a large corpus and it’s likely not.
