AI Makers Guaranteeing That Your Generative AI Output Is Safe From Copyright Exposures Might Be A Lot Less Filling Than You Think, Says AI Ethics And AI Law


From Forbes:

Violating someone’s copyrighted content can be quite costly.

Whether you realize it or not, those popular generative AI apps can inadvertently lead you into the costly throes of copyright infringement. Indeed, that's why some who are in the know are hesitant to use generative AI apps at all, especially in generating artwork or images. They anxiously worry that doing so might land them in dire legal and financial jeopardy.

Yes, just to be clear, you the user of the generative AI can be and most likely are the one on the hook for copyright infringement based on generating and then using outputs that violate someone else’s Intellectual Property (IP) rights.

That’s right, you are.

By and large, it is up in the air as to whether you could wiggle out and seek to place the blame entirely on the AI maker instead. The chances are that you will take some or all of the fall, particularly when using generative AI that entails you agreeing to indemnify the AI maker, which, as I've explained, can occur without you realizing that you've done so (via accepting those densely packed and seemingly unreadable online licensing agreements associated with the multitude of generative AI apps).

. . . .

Maybe you don’t have to avoid generative AI and maybe the AI makers might try to come to your aid. Well, I said maybe. Don’t be cashing in your chips too soon.

In today’s column, I’ll be examining how generative AI such as the widely and wildly successful ChatGPT and others such as Bard (Google), Claude (Anthropic), etc. are potentially skating on thin ice when it comes to possibly infringing on copyrighted material. I’ve covered this previously, including matters of plagiarism, copyrights, and overarching Intellectual Property rights issues underlying modern-day generative AI.

Into this mix, I’ll be especially focusing herein on Adobe Firefly due to last week’s fascinating announcement by Adobe regarding how they will seek to protect users when it comes to potential copyright infringement. In short, Adobe indicated that their generative AI tool called Firefly will be backed by Adobe such that they will indemnify or legally and financially presumably cover your back on copyright infringement allegations, under certain circumstances.

The question on your mind might be whether this is going to be as good as it sounds. Turns out that the devil is in the details and you’d be wise to keep your eyes wide open and your Spidey sense on alert.

. . . .

Into all of this comes a plethora of AI Ethics and AI Law considerations.

There are ongoing efforts to imbue Ethical AI principles into the development and fielding of AI apps. A growing contingent of concerned AI ethicists are trying to ensure that efforts to devise and adopt AI take into account a view of doing AI For Good and averting AI For Bad. Likewise, there are proposed new AI laws being bandied around as potential solutions to keep AI endeavors from running amok on human rights and the like. For my ongoing coverage of AI Ethics and AI Law, see the link here.

The development and promulgation of Ethical AI precepts are being pursued to hopefully prevent society from falling into a myriad of AI-induced traps. For my coverage of the UN AI Ethics principles as devised and supported by nearly 200 countries via the efforts of UNESCO, see the link here. In a similar vein, new AI laws are being explored to try and keep AI on an even keel. One of the latest takes consists of a proposed AI Bill of Rights that the U.S. White House recently released to identify human rights in an age of AI, see the link here. It takes a village to keep AI and AI developers on a rightful path and deter the purposeful or accidental underhanded efforts that might undercut society.

Okay, let's now get back to our focus on potential copyright infringement associated with generative AI.

Imagine that out there on the Internet are lots of art pieces depicting cute-looking cartoonish portrayals of frogs. I’m sure there are zillions of such posted art pieces. Assume that the pattern-matching of generative AI is set up to scan one after another. Gradually, the pattern-matching coalesces toward a certain kind of patterned template about how to depict cartoon-style frogs.

If you ask the generative AI to generate or produce artwork of a cartoony frog, you likely will get one that looks decently apt and akin to all those others that already have been posted online here or there. Since the pattern that was devised is really more of a template, you won’t necessarily get a frog depiction that exactly matches any of the ones scanned during the data training. In a sense, you now seemingly have a unique piece of artwork that depicts a cartoonish frog, and you are the only one on planet Earth to have that particular imagery.

Congratulations.

Now, you might be wondering whether someone else using the same generative AI might end up getting the same exact imagery of the cartoon frog if they also perchance ask to see such an image. Overall, the odds are somewhat slim that this would occur. Part of the reason is that the generative AI is customarily making use of probabilities when it generates the end product, such that there will be sometimes subtle and sometimes dramatic differences each time the otherwise same item is requested.

Like a box of chocolates, you never know precisely what you will get.

So far, so good. We have noted that generative AI is data trained toward pattern-matching rather than strictly copying digital materials found online. In addition, whenever a request is made to generate something, the use of probabilities will vary the look of the generated output. All in all, it seems like we are free and clear of any concerns over copyright infringement.
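That probabilistic variation can be illustrated with a toy sampler. To be clear, this is a minimal sketch and not how any particular generative AI app actually works: assume the model has produced raw scores for a handful of candidate outputs, and a softmax with a temperature knob turns those scores into probabilities before one candidate is drawn. Higher temperatures mean more varied results for the same request.

```python
import math
import random

def sample_with_temperature(scores, temperature=1.0, rng=random):
    """Pick one candidate index from raw scores via softmax sampling.

    Higher temperature flattens the distribution (more varied picks);
    lower temperature sharpens it (more repeatable picks).
    """
    # Softmax with temperature, shifted by the max score for numerical stability.
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one index according to those probabilities.
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

# Two users issuing the "same" request can get different picks.
candidates = ["green frog", "teal frog", "lime frog"]
scores = [2.0, 1.5, 0.5]
pick_a = candidates[sample_with_temperature(scores, temperature=1.2)]
pick_b = candidates[sample_with_temperature(scores, temperature=1.2)]
```

The candidate names and scores here are made up for illustration; the point is only that identical requests roll the dice anew each time, which is why two users rarely receive the same exact output.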

Sorry to say, the matter is hardly cut and dried.

You can readily veer into copyright violation territory.

Plus, some exhort that we are perilously heading toward copyright infringement at scale.

This clever catchphrase is used to suggest that with millions upon millions of people using generative AI on a daily basis, the round-and-round rolling of the dice associated with statistics and probabilities is going to catch up with us. The volume of copyright infringement is going to be enormous and beyond anything we've witnessed before the advent of generative AI.

. . . .

What’s The Deal About Copyrights And Generative AI

Here’s a handy-dandy definition of copyright in the U.S.:

  • “Copyright infringement is the unauthorized use of another’s work. This is a legal issue that depends on whether or not the work is protected by copyright in the first place, as well as on specifics like how much is used and the purpose of the use. If one copies too much of a protected work, or copies for an unauthorized purpose, simply acknowledging the original source will not solve the problem. Only by seeking prior permission from the copyright holder does one avoid the risk of an infringement charge” (Duke University School of Law).

Let’s see how that definition pertains to generative AI and the production of generative AI outputs.

I said earlier that the pattern-matching during data training of scanned data from the Internet is ordinarily aiming to contrive a template or an overall pattern. That’s usually the case. Nonetheless, there is also a chance that within the crux of the generative AI, there can be a verbatim digital copy of something that was scanned. This can and does happen.

The rub of course is that if you happen to ask for a digital artwork to be devised by the generative AI, it could be that the item produced is not a templated variation and instead consists of the verbatim copy. You would not particularly have any means of knowing that this has happened. It is unlikely that the generative AI would be devised to alert you to this condition (though, notably, it could be so devised, if the AI makers wished to do so or were compelled to do so).

As an aside, for those of you lawyers examining these issues, that’s a notable point of argumentation.

Should the AI maker be expected and in a sense obligated to alert the user when a verbatim copy is brought forth to the user? Most of the licensing agreements for generative AI try to stipulate that if that does happen, the user is already supposed to be generally aware that it can happen. The counterargument would be that it is not of much use to broadly warn people. The moment that it happens is when the warning really needs to be conveyed. The retort to that contention is that once someone has been overall forewarned, no further warning is needed or required. On and on this goes.

. . . .

There are various limitations associated with how far copyright law extends, consider these crucial points:

  • “Copyright does not protect ideas, only the specific expression of an idea. For example, a court decided that Dan Brown did not infringe the copyright of an earlier book when he wrote The Da Vinci Code because all he borrowed from the earlier work were the basic ideas, not the specifics of plot or dialogue. Since copyright is intended to encourage creative production, using someone else’s ideas to craft a new and original work upholds the purpose of copyright, it does not violate it. Only if one copies another’s expression without permission is copyright potentially infringed” (Duke University School of Law).

I bring up that insight about the range or scope of copyright to mention that this is something of some refuge for the text-output versions of generative AI.

. . . .

There is a somewhat wide array of allowed exceptions for legally copying copyrighted materials, as this description notes:

  • “The Copyright Act’s exceptions and limitations found in sections 107-122 include fair use, the “first sale doctrine,” some reproductions by libraries and archives, certain performances and displays, broadcast programming transmissions by cable and satellite, to name a few. Interested in more information on fair use? Take a look at our Fair Use Index. The complete list of exemptions to copyright protection can be found in Chapter 1 of Title 17 of the United States Code. You can also use works that are in the public domain. Works in the public domain are those that are never protected by copyright (like facts or discoveries) or works whose term of protection has ended either because it expired or the owner did not satisfy a previously required formality. Currently, all pre-1926 U.S. works are in the public domain because copyright protection has expired for those works” (U.S. Copyright Office website).

You might have observed in that description that items considered in the public domain can usually be used without invoking a copyright violation.

Here’s how that comes up regarding generative AI.

The AI makers will at times try to confine their generative AI to being data trained solely on works that are in the public domain and on online stock libraries that they have been licensed to use. All in all, the goal is to only data train on things that carry little or no chance of copyright infringement. If your generative AI-produced item is identical to a public domain item, you presumably are in the clear. If your generative AI-produced item is identical to the online stock library item that was licensed by the AI maker, you are presumably in the clear if you abide by whatever other stipulations the AI maker has imposed about doing such copying (make sure to closely review the licensing agreement of the generative AI app).

You can certainly give a loud cheer that this data-training approach will help matters. It isn't, though, an ironclad guarantee. I'll tell you why.

Recall from the definition of copyright used above that you can be an infringer even if you don't entirely and precisely copy the copyrighted item. The indication was that copying "too much of a protected work" is where you can get into trouble.

Imagine this. You use generative AI that was data-trained on a stock online library as licensed by an AI maker. You generate a cartoonish frog. This was in the stock online library. It is a verbatim copy. The generative AI is showcasing it verbatim to you.

But suppose that someone else has a cartoonish copyrighted frog image and they believe that your frog image infringes. I’m sure you would argue that it is the fault of the stock online library and not yours. You would likely also argue that it is the fault of the AI maker. Sure, you can try that. The odds are that you might still be dragged into the mess.

Let’s make things murkier. The cartoonish copyrighted frog image is slightly different from the one that you produced via the generative AI. The copyright owner asserts it is overly close to their copyrighted froggy image. We are now in the muddied waters of what constitutes copyright infringement. Is it close enough in resemblance or not? Bickering and legal debating would arise. Who is to blame for this alleged almost identical froggy depiction?

Link to the rest at Forbes

PG says the author of the OP brings up some interesting issues, but if PG were advising AI art generation programmers, he would consider suggesting that they create a unique digital hash of each of the images used to prime the AI. He would further advise that a digital hash be created for each AI-generated image the program produces.

If the hashes were identical, the AI would discard the image it had created and produce another. If not, the AI would present the image to the user.

PG is ignorant enough about hashes not to know whether a quite similar image would or would not generate an image hash that is 90% the same as another image hash.

In PG’s self-concocted hashworld, hashes would exist which could be reliably compared to other hashes to determine whether the two images were too similar to avoid copyright infringement or dissimilar enough to avoid eyeball-initiated copyright complaints.
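PG's hashworld is not entirely self-concocted; it resembles what image-processing folks call perceptual hashing. Below is a minimal sketch of one such scheme, an "average hash," assuming images have already been downscaled to small grayscale grids represented as plain Python lists (real systems use dedicated libraries such as pHash or imagehash). Visually similar images yield hashes that differ in few bits, so the discard-and-regenerate loop PG describes would compare a bit-distance threshold rather than demand exact equality.

```python
def average_hash(pixels):
    """Perceptual 'average hash' of a small grayscale image.

    pixels: a 2D list of brightness values (e.g., an 8x8 downscale).
    Each bit records whether a pixel is brighter than the mean, so
    visually similar images yield hashes with few differing bits.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1, h2):
    """Count the differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

def too_similar(h1, h2, threshold=5):
    """Flag a candidate output whose hash is within `threshold` bits of a
    training image's hash, per PG's discard-and-retry idea."""
    return hamming_distance(h1, h2) <= threshold

# A 4x4 toy "image" and a slightly brightened copy of it.
original = [[10, 200, 10, 200],
            [200, 10, 200, 10],
            [10, 200, 10, 200],
            [200, 10, 200, 10]]
tweaked = [[12, 205, 12, 205],
           [205, 12, 205, 12],
           [12, 205, 12, 205],
           [205, 12, 205, 12]]
h_orig = average_hash(original)
h_tweak = average_hash(tweaked)
# The brightened copy hashes identically, so too_similar flags it.
```

In this sketch, a generator would recompute the hash for each candidate output and regenerate whenever `too_similar` fires. Whether a small bit distance maps onto the legal notion of substantial similarity is, of course, exactly the open question PG raises.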

As a hypothetical thought experiment, PG opines that if two sports photographers were standing shoulder to shoulder shooting the finish of an Olympic 100-meter dash, the photographer who pressed the shutter button one-tenth (or one-hundredth) of a second before the second photographer standing next to her/him would have a potential copyright infringement claim against the second photographer, on the theory that the second photograph violated the first photographer's copyright in her/his photograph.

4 thoughts on “AI Makers Guaranteeing That Your Generative AI Output Is Safe From Copyright Exposures Might Be A Lot Less Filling Than You Think, Says AI Ethics And AI Law”

  1. I can’t agree with the sport-photographer hypothetical.

    • Assuming for the moment that P1’s photograph is sufficiently original to be copyrightable — and that’s surprisingly difficult for a still image at a designated critical moment of a sporting event (the finish of a race) — P2 had only an opportunity to copy the event… not P1’s photograph.
       Consider, for a moment, six photographers who take closeup photos — all from different angles — of James Earl Jones as King Lear at Shakespeare in the Park, even all for the same lines in the same scene. The photographers have no opportunity to copy one another. The newspapers and social media sites that later reprint the photos, however, do — and would be the proper defendants in an infringement suit by Pn against anyone who did, in fact, copy the photograph.

    • But it gets better. And more complicated. And more expensive.† Even if P2 did have an opportunity to copy; and even if P1’s photo was sufficiently original to merit copyright protection; and even if P2’s purported “copy” took away all of the commercial value of P1’s photo — there would be a distinct probability of a fair use defense succeeding. Photographs of news events generally have less protection than would, say, the official portrait of P. Rogers Nelson (as the Supreme Court just wrestled with). Combine that with the four factors under § 107 and things Get Interesting. (Further explanations will require a full retainer fee in advance; cash and wire transfers only from this crowd, to my Swiss numbered account, please.)

    • Last for now, but far from least, is the independent-creation defense. This is rarely successful in the real world, but in this context the fact that P1’s photograph has not been made available to the public prior to P2’s creation of her photo is dispositive.

    † Of course it does — there are lawyers involved, almost none of whom are more “creative” than shown in their doodles on legal pads during depositions.

  2. If two sports photographers were standing shoulder to shoulder shooting the finish of an Olympic 100-meter dash, the photographer who pressed the shutter button one-tenth (or one-hundredth) of a second before the second photographer standing next to her/him would have a potential copyright infringement claim against the second photographer because the second photograph violated the copyright of the first photographer in her/his photograph.

    He could make the claim, but I doubt he’d win. A million things would be different between the two images.

  3. Don’t know about the legal aspects, but the OP seems to assume users of the software monetize the output directly without change.
    All the cases I’ve seen of commercial use involve user modification via photoshop or equivalent.
    Which renders the final product effectively the same as free stock images, right?
    How are those treated?
    So why not treat “ai” art the same way?

    For that matter, is the software even capable of producing identical images from the same prompt? So far I’ve seen none. Reproducibility would actually be a feature worth touting.
