In light of several high-profile lawsuits in recent months, countries’ legislative frameworks are finally beginning to grapple with the challenges thrown up by copyright law and generative artificial intelligence (AI).
In January 2023, Getty Images announced a lawsuit against Stability AI in London’s High Court of Justice, alleging that the Stable Diffusion image generator infringed Getty’s copyrighted photographs and trademarks.
And, in February, the award-winning visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a class action complaint in a US District Court in California against defendants Stability AI, Midjourney and DeviantArt, alleging that their works were used without permission as part of the companies’ AI training set.
Earlier, in November 2022, a group of anonymous programmers filed a class action lawsuit against GitHub, a Microsoft subsidiary, and OpenAI, alleging unauthorised and unlicensed use of the programmers’ software code to develop the defendants’ AI coding tools, Codex and Copilot.
Recognising a need for action, the House Judiciary Committee in the US has held a hearing examining the intersection of generative AI and copyright law. The hearing, which took place on 17 May 2023, followed the Senate hearing on AI oversight the previous day, at which OpenAI CEO Sam Altman testified. What were the five key takeaways from the witness testimony?
1. Copyright’s well-established fair use doctrine arguably provides legal coverage for the training of AI models.
Sy Damle of Latham & Watkins LLP, a former General Counsel of the US Copyright Office, argued that “the use of a copyrighted work to learn unprotectable facts and use those facts to create products that do not themselves infringe copyright is quintessential fair use”, and that the training of AI models generally adheres to this principle.
He spoke against the view that generative AI’s ability to replicate artistic styles undermines any fair use defence, saying, “This concern has nothing to do with copyright, which does not, and has never, granted monopolies over artistic or musical styles.”
2. Implementing a statutory or collective licensing regime would be a project “many orders of magnitude larger than any similar scheme in the history of American law”.
Sy Damle argued that it would be bad policy to introduce statutory or collective licensing under which any use of copyrighted content to train an AI model would automatically trigger a payment obligation. Such a regime would prevent case-by-case evaluation, eliminating the fair use doctrine.
Moreover, he observed that implementing such a regime would be overwhelmingly complex. A statutory licensing scheme would need to cover every publicly accessible work on the Internet – a body of work which likely numbers in the tens of billions. There is also an uncountable number of “orphan works” without identifiable owners, which would lead to massive volumes of unmatched royalties.
3. AI systems could generate outputs that potentially infringe on artists’ copyrights and right of publicity in various ways.
Chris Callison-Burch, Associate Professor of Computer and Information Science at the University of Pennsylvania and Visiting Research Scientist at the Allen Institute for Artificial Intelligence, pointed out that outputs of generative AI can violate copyright laws. For example, via memorisation of datasets, AI systems can output identical copies of copyrighted materials.
However, he observed that Google and other companies are developing strategies to prevent sophisticated prompting by the user that would elicit the underlying training data.
Text-to-image generation systems can also produce images of copyrightable characters that appear in their training data – a problem that may be hard for AI developers to avoid without a registry of copyrighted or trademarked characters.
He suggested that other uses of generative AI may violate “right-of-publicity” rather than copyright law. For example, there is the case of the AI-generated song “Heart on My Sleeve”, designed to sound like the artists Drake and The Weeknd. There is also the issue of “substantial similarity”, where outputs of generative AI systems look very similar to some of their training data.
4. Copyright holders can, under certain circumstances, opt out of having their works used to train AI systems.
Callison-Burch pointed out that industry is designing several technical mechanisms to let copyright holders opt out. The first is an industry-standard protocol that allows websites to specify which parts should be indexed by web crawlers, and which parts should be excluded. The protocol is implemented by placing a file called robots.txt on the website that hosts the copyrighted materials.
Organisations that collect training data, like Common Crawl and LAION, follow this protocol and exclude files that have been listed in robots.txt as “do not crawl”. There are also emerging industry efforts to allow artists and other copyright holders to opt out of future training.
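To make the robots.txt mechanism concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser module. The robots.txt contents, the example.com URLs and the helper name may_fetch are hypothetical, invented for illustration. CCBot is the user-agent name used by Common Crawl's crawler. The sketch shows only how a well-behaved crawler could read such a file, not how any particular organisation implements its data collection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an artist's portfolio site (paths are made up).
# The first group tells Common Crawl's crawler (CCBot) to stay out of
# /portfolio/; the second applies to every other crawler.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /portfolio/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if the rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# CCBot is asked not to crawl the portfolio, but may fetch other pages.
print(may_fetch(parser, "CCBot", "https://example.com/portfolio/painting.png"))
print(may_fetch(parser, "CCBot", "https://example.com/about.html"))

# A crawler with no group of its own falls back to the "*" rules.
print(may_fetch(parser, "SomeOtherBot", "https://example.com/portfolio/painting.png"))
print(may_fetch(parser, "SomeOtherBot", "https://example.com/private/notes.txt"))
```

Compliance is voluntary: the file only expresses the site owner's preference, and it is up to each crawler to check it before fetching a page.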
Link to the rest at Verdict