A battle royal is brewing over copyright and AI

From The Economist:

Consider two approaches in the music industry to artificial intelligence (AI). One is that of Giles Martin, son of Sir George Martin, producer of the Beatles. Last year, in order to remix the Fab Four’s 1966 album “Revolver”, he used AI to learn the sound of each band member’s instruments (eg, John Lennon’s guitar) from a mono master tape so that he could separate them and reverse engineer them into stereo. The result is glorious. The other approach is not bad either. It is the response of Nick Cave, a moody Australian singer-songwriter, when reviewing lyrics written in his style by ChatGPT, an AI tool developed by a startup called OpenAI. “This song sucks,” he wrote. “Writing a good song is not mimicry, or replication, or pastiche, it is the opposite. It is an act of self-murder that destroys all one has strived to produce in the past.”

Mr Cave is unlikely to be impressed by the latest version of the algorithm behind ChatGPT, dubbed GPT-4, which OpenAI unveiled on March 14th. Mr Martin may find it useful. Michael Nash, chief digital officer at Universal Music Group, the world’s biggest label, cites their examples as evidence of both excitement and fear about the AI behind content-creating apps like ChatGPT (for text) or Stable Diffusion (for images). It could help the creative process. It could also destroy or usurp it. Yet for recorded music at large, the coming of the bots brings to mind a seismic event in its history: the rapid rise and fall of Napster, a platform for sharing mainly pirated songs at the turn of the millennium. Napster was ultimately brought down by copyright law. For aggressive bot providers accused of riding roughshod over intellectual property (IP), Mr Nash has a simple message that sounds, from a music-industry veteran of the Napster era, like a threat. “Don’t deploy in the market and beg for forgiveness. That’s the Napster approach.”

The main issue here is not AI-made parodies of Mr Cave or faux-Shakespearean sonnets. It is the oceans of copyrighted data the bots have siphoned up while being trained to create humanlike content. That information comes from everywhere: social-media feeds, internet searches, digital libraries, television, radio, banks of statistics and so on. Often, it is alleged, AI models plunder the databases without permission. Those responsible for the source material complain that their work is hoovered up without consent, credit or compensation. In short, some AI platforms may be doing with other media what Napster did with songs—ignoring copyright altogether. The lawsuits have started to fly.

It is a legal minefield with implications that extend beyond the creative industries to any business where machine learning plays a role, such as self-driving cars, medical diagnostics, factory robotics and insurance-risk management. The European Union, true to bureaucratic form, has a directive on copyright that refers to data-mining (written before the recent bot boom). Experts say America lacks case history specific to generative AI. Instead, it has competing theories about whether or not data-mining without licences is permissible under the “fair use” doctrine. Napster also tried to deploy “fair use” as a defence in America—and failed. That is not to say that the outcome will be the same this time.

The main arguments around “fair use” are fascinating. To borrow from a masterclass on the topic by Mark Lemley and Bryan Casey in the Texas Law Review, a journal, use of copyrighted works is considered fair when it serves a valuable social purpose, the source material is transformed from the original and it does not affect the copyright owners’ core market. Critics argue that AIs do not transform but exploit the entirety of the databases they mine. They claim that the firms behind machine learning abuse fair use to “free-ride” on the work of individuals. And they contend that this threatens the livelihoods of the creators, as well as society at large if the AI promotes mass surveillance and the spread of misinformation. The authors weigh these arguments against the fact that the more access to training sets there is, the better AI will be, and that without such access there may be no AI at all. In other words, the industry might die in its infancy. They describe it as one of the most important legal questions of the century: “Will copyright law allow robots to learn?”

An early lawsuit attracting attention is from Getty Images. The photography agency accuses Stability AI, which owns Stable Diffusion, of infringing its copyright on millions of photos from its collection in order to build an image-generating AI model that will compete with Getty. Provided the case is not settled out of court, it could set a precedent on fair use. An even more important verdict could come soon from America’s Supreme Court in a case involving the transformation of copyrighted images of Prince, a pop idol, by the late Andy Warhol, an artist. Daniel Gervais, an IP expert at Vanderbilt Law School in Nashville, believes the justices may provide long-awaited guidance on fair use in general.

Link to the rest at The Economist

As PG has mentioned before, the books, images, etc., are not being used by the creators of Artificial Intelligence engines to create copies of the books, images, etc., but rather for the AI engines to learn about what’s been created before and adapt that information in new and far different ways.

To repeat an earlier comparison, the AI program is doing the same thing an art student does when she/he/they go to an art museum to study the techniques used by other artists.

In writing, no one is upset if a new author carefully studies the style of F. Scott Fitzgerald, Ernest Hemingway, Danielle Steel, James Patterson, Margaret Atwood, Barbara Cartland, John Grisham, Alice Munro and/or Dean Koontz in order to derive information about how to successfully write fiction.

8 thoughts on “A battle royal is brewing over copyright and AI”

  1. I have never figured out how any specific copyright holder can demonstrate that AI looked at his stuff.

  2. I think the key problem here is not the “copying” issue, which a hypothetical quantum-computing-based AI training program that only “touches” and never makes a local copy would not infringe. In short, if we answer this question and rely on that answer as rock-solid, Second-Circuit-style never-to-be-questioned-notwithstanding-facts-or-other-legal-changes precedent,† that quantum-computing-based-system-that-doesn’t-make-a-copy will come into being.

    But everyone wants to answer the “copy” question because compared to what’s really at issue, “is there a literal copy made that infringes?” is the easy, narrow-minded, tunnel-visioned, and ultimately self-defeating question. This is really about creation of derivative works, not about literal copies. And the right to create derivative works is a part of the holder’s § 106 (and, especially for so-called fine art, § 106A) rights. But this is a very, very difficult question not amenable to bright-line rules. It requires judgment. It requires educated judgment; consider, for a moment, whether Procol Harum’s “A Whiter Shade of Pale” is a “derivative work” of Bach’s “Air on the G String” due to the similarity of the opening melodic lines. We can’t even make accurate decisions on whether something is “parody” or “satire” — not even for purely textual works.

    In short, people are asking the question about “copies” because they think they can answer it. But that’s not the question that requires answering.

    † As my snark may indicate, This Is A Problem. It’s how we got Tasini (Supreme Court reversed 2d Circuit), Muchnick (ditto), Kirtsaeng (ditto) — and that’s just copyright. It’s also how we got MOAC (ditto, just this morning) on a technical-but-excruciatingly obvious question of bankruptcy and civil procedure.

  3. Nah.
    A proper headline would be: “another teapot tempest brewing”.

    Not all derivatives are infringement.
    A better test is substitution.

    This is really a tech matter not a business or creativity matter.
    And a relevant precedent is clean room reverse engineering:


    “Clean-room design is useful as a defense against copyright infringement because it relies on independent creation. However, because independent invention is not a defense against patents, clean-room designs typically cannot be used to circumvent patent restrictions.

    “The term implies that the design team works in an environment that is ‘clean’ or demonstrably uncontaminated by any knowledge of the proprietary techniques used by the competitor.

    “Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.”

    Note that clean room output actually substitutes for the original product but is perfectly legal. 40 years established.

    In this case:
    The “AI” teaching app studies the original, abstracts its specification, and feeds *that* to the end-user app. No copy of the original is in the training database. Neither app substitutes for the scanned item, one of a hundred trillion, which incidentally makes its “contribution” minimal and obscured. Fair use all the way. (And what remedy do they expect anyway? A hundred-trillionth of a percent of the revenue?)

    It’s just more posturing and whining because “somebody figured out how to make money”.
    A shakedown attempt.
    And the tech world learned how to deal with those ages ago.

    Again, copyright is the wrong tool.
    (The continual harping on copyright reminds me of Trini Lopez. 😉 )

    A knowledgeable shakedown artist would start by searching for an obscure vintage patent vaguely resembling the data scanning or abstraction process. They’d probably still lose but they’d have a shot at a minor settlement.

    On another blog I saw some images that had been generated by some form of AI. The person who did the generating thought the images were impressive. But I could tell immediately, and to my shock, that the AI had seen/used/been trained on a game called Flash Point: Fire Rescue. The images were totally derivative of the cards used in that game.

    A person copying a picture for learning purposes is supposedly okay. It’s considered part of their learning. But it has also always been true that if someone became expert at imitating styles and sold a picture claiming it was, let’s say, a Cézanne when it wasn’t, that was considered fraud … if detected.

    From a purely lay perspective, AI is flirting with the edge of that problem, if not falling over it outright. If you develop a style and a computer can then copy it, something has been stolen.
