PG thought he had blogged about this case before but couldn’t find any evidence of doing so when he searched TPV.
From Copywrite Lately:
While a flurry of AI copyright lawsuits from prominent authors and artists grabs headlines, another case has quietly taken something more important: a head start.
Even die-hard copyright geeks would be forgiven for overlooking a lawsuit first filed over three years ago by information services company Thomson Reuters against AI start-up Ross Intelligence. That’s because the case involves Westlaw, a legal research tool that’s about as sexy as the underwear section in a 1940s Sears catalog. I say this with peace and love as a longtime Westlaw user, but let’s be honest—headnotes and key numbers are simply no match for the likes of Sarah Silverman and John Grisham.
It’s time to start paying attention though, because a Delaware District Court judge just ordered this low-profile AI case to trial, largely denying the parties’ motions for summary judgment on copyright infringement and fair use (read the opinion here). This means that a jury could weigh in on some of the thorniest copyright questions involving artificial intelligence as early as May 2024.
Thomson Reuters v. Ross Intelligence
The issues at play in Thomson Reuters v. Ross Intelligence largely mirror those I’ve discussed in connection with recent class action copyright lawsuits filed against the creators of Stable Diffusion, ChatGPT and other generative AI tools. In a nutshell, plaintiffs allege that Ross hired a third-party contractor to unlawfully copy Westlaw content—including its proprietary Key Number System and case headnotes—in order to train Ross’s own AI-driven natural language legal search engine.
Unlike the creative works ingested by AI tools in the recent lawsuits filed against OpenAI and Stability AI, the copyrights in Westlaw are more limited. Thomson Reuters doesn’t own any of the underlying judicial opinions that make up its database. It does, however, claim copyright in its Key Number organization system as well as its original case summaries and headnote descriptions. These “editorial enhancements” are drafted by the company’s attorney-editors in what I’d imagine is the most thankless job this side of working for Louis Litt.
But according to Ross, it wasn’t interested in the Westlaw key numbers or headnotes. Instead, the goal of its system was for users to ask questions and for the search engine to spit out quotations directly from judicial opinions, with no commentary necessary. In other words, Ross contends that the output of its tool won’t infringe any original copyrighted material owned by Thomson Reuters, notwithstanding the so-called “intermediate copies” of West’s key numbers and headnotes that may have been made to build the dataset used to train Ross’s AI. These copies, Ross claims, are fair use.
In January, Thomson Reuters moved for summary judgment on its copyright infringement claim, and both sides moved for summary judgment on Ross’s fair use defense.
Judge Stephanos Bibas ultimately declined to determine the scope of protection to be given the Key Number System or to decide whether Westlaw’s headnotes added sufficient non-trivial material to the underlying judicial opinions to meet copyright’s originality threshold. While the court did find that Ross committed an act of “actual copying” by scraping and reproducing headnotes during the AI training process, whether that copying constitutes infringement will depend on whether or not the headnotes are protected expression. That issue will be decided by a jury.
The court likewise ruled that a jury needs to decide whether there are substantial similarities in protectable expression (as opposed to unprotectable material) between Westlaw’s headnotes and summaries and thousands of “bulk memos” created by Ross’s third-party contractor to train Ross’s AI tool.
The court found disputed issues of fact on all four fair use factors, meaning that a jury will be tasked with answering most of the questions underlying this key defense.
The Purpose and Character of the Use
Interestingly, the court’s first factor analysis largely focused, not on the commercial nature of Ross’s competing tool, but on disputes over whether Ross’s copying was transformative—an inquiry that some observers (but, ahem, not this one) thought would take a backseat following the Supreme Court’s recent Warhol decision.
Judge Bibas noted that whether Ross’s so-called “intermediate copying” (copies made during the input stage of the training process) was transformative would depend on the precise nature of Ross’s actions: “It was transformative intermediate copying if Ross’s AI only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes.” If, on the other hand, “Thomson Reuters is right that Ross used the untransformed text of headnotes to get its AI to replicate and reproduce the creative drafting done by Westlaw’s attorney-editors,” then the copying would weigh against a transformative fair use. This raised a material question of fact that a jury needs to decide.
The Nature of the Copyrighted Work
While declining to definitively rule that Westlaw’s headnotes were too unoriginal to satisfy the second fair use factor, the judge certainly signaled that he didn’t think plaintiffs’ contributions were at the “core of intended copyright protection,” and specifically distinguished them from “traditionally protected materials, such as literary works or visual art.”
The Amount and Substantiality of the Copying
Because it was unclear how much of Ross’s copying was of protectable expression, the court found that a jury would need to decide the third fair use factor too. Interestingly, the court also noted that copying could be deemed insubstantial if Ross’s AI actually works in the way the company claimed—i.e., if the tool outputs only the unprotectable judicial opinion, not any original expression. This suggests that the presence or absence of substantial similarity at the output stage may influence the court’s input stage rulings as well.
The Effect of the Use Upon the Market for the Work
Finally, on the fourth fair use factor, the court declined to decide whether Ross’s use of Westlaw’s material had a “meaningful or significant effect” on the value of the original or its potential market. Focusing not merely on economic effects, but “public benefits” of the copying, the court concluded that a jury would be best situated to answer these questions:
Link to the rest at Copywrite Lately
The OP brought to mind a case decided a very long time ago (BI – Before Internet) that caused PG to write an article for a legal publication. PG’s article was titled “Who Owns the Law?”
One problem with BI writings is that PG has not been able to locate an online copy of “Who Owns the Law?”
Basically, the copyright issue he wrote about in the BI era was more than a little similar to the dispute described in the OP.
In PG’s ancient article, he wrote about West Publishing, now owned by the same Thomson Reuters mentioned in the OP.
Way back when, West was a closely-held and secretive company that claimed broad copyright protection for the volume and page numbers universally used by lawyers and judges to identify state and federal court opinions West published in printed form.
Here’s an example of a case citation:
Stearns v. Ticketmaster Corp., 655 F. 3d 1013 (9th Cir. 2011)
West assigned the 655 F. 3d 1013 portion of the citation. Translated, it means the volume (655), the reporter (F. 3d, an abbreviation for Federal Reporter, Third Series), and the page number in volume 655 (1013) where the printed case may be found.
(The Federal Reporter series of books is reserved for decisions from the various United States Courts of Appeals, the second-highest courts in the federal system. “9th Cir.” means the decision was handed down by the Court of Appeals for the 9th Circuit. There are twelve regional circuits that cover the United States. The 9th Circuit is geographically the largest of the circuits by a large margin. It includes the states of California, Arizona, Nevada, Oregon, Washington, Idaho, Montana, Alaska and Hawaii. The 9th Circuit also includes Guam and the Northern Mariana Islands.) (You’ve taken your first steps toward mastering legal research.)
West’s copyright rationale was that the company fixed the sort of typos and citation errors that were embarrassingly common during those times before spell check. West further added page numbers to the thick books containing lots of court opinions that the company printed.
West also added a short summary describing what each court case was about. In addition, West had (and may still have) an enormous outline of the law, which it called the West Key Number System. Its attorneys would go through each case, identify the portions that addressed points of law covered by the Key Number System, and tag them with the matching Key Numbers, linking each case to other court cases on the same points.
From an attorney’s point of view, if you found a case opinion similar to the one you were working on that included a West Key Number citation, you could look up that Key Number and, hopefully, find a number of in-state and federal case opinions addressing the issues you were working on at the moment. In some versions, the West Key Number index would also show you case opinions from other jurisdictions, which might suggest a line of legal argument for the hometown case you were handling.
The Key Number System was rendered obsolete almost immediately when online search systems were introduced that allowed an attorney to perform Boolean searches against all decisions rendered by courts in the jurisdiction. As extensive as West’s Key Number System was, it was a blunt instrument when computerized legal research came on the scene.
Additionally, West printed the cases in thick books with page numbers. Lawyers used the West page numbers in their court papers to point the judge to the particular portion of the case opinion they wanted the judge to examine.
This made it more likely that the judge would tell his judicial clerk or secretary to get a copy of a case or, at least, copies of the pages the attorney wanted the judge to read that were buried in a 50-page appellate case opinion.
Then along came a technology company called Mead Data Central, later renamed Lexis-Nexis after its two services: the Lexis online research system for lawyers and the Nexis repository of news, magazine, academic journal, scientific and other publications, which used the same computer search capabilities as Lexis.
Lexis basically tore apart every West book full of court opinions, state and federal statutes and other similar collections of federal, state and local government publications that lawyers would find helpful.
After removing the materials West had added to the original government documents, Lexis sent the judicial opinions, statutes, etc., offshore, where a zillion less-expensive fingers and thumbs keyboarded them into the Lexis-Nexis computer systems. The computer systems made the electronic copies of the documents searchable.
West sued Mead Data Central, the owner of Lexis, for copyright infringement.
Mead said these were public documents and West couldn’t assert copyright protection for government documents prepared by government employees.
West said that its case citations and page numbers were copyrighted because West had developed a system of organization and included page breaks and page numbers that weren’t in the original court documents. The fact that inserting page numbers involved none of the creative activity that Congress intended to encourage with copyright laws didn’t bother West. It worked hard to do a good job, and Lexis shouldn’t be able to steal West’s hard work.
The hometown trial judge agreed that the data West used (the words included in court opinions and government documents) were in the public domain and unprotected by copyright law. However, the judge bought West’s dubious theory, holding that “the (West) arrangement and pagination of this public material reflects the skill, discretion and effort of the person crafting the arrangement.”
In other words, West didn’t own the words, but by working hard to insert page numbers and to put the cases from Alaska into a different printed book than the cases from California (“by the sweat of West’s brow”), West was entitled to copyright protection for the volume and page numbers in its case publications.
Lawyers used volume and page citations in documents submitted to the court to point the judge to the particular case they wanted considered, out of a library full of books containing thousands of court cases. (Judges don’t respond well to requests from lawyers to “Look it up yourself.”) As a result, West had built an effective monopoly on the citation system judges and lawyers relied on to do their jobs.
Mead appealed, and West, realizing that sooner or later some appellate court would reverse the earlier court decisions, entered into a super-secret settlement agreement that effectively gave Mead the right to use West’s volume and page numbers.
PG worked for Lexis a long time ago but never persuaded corporate counsel to let him see a copy of the West settlement documents. PG quickly realized that one reason for the secrecy was that the settlement provided West and Lexis with a shared monopoly on case citations.