From The Illusion of More:
I recently attended a round-table discussion on the subject of artificial intelligence and copyright. The first of several engaging topics I thought warranted a post was the question of “machine learning”
. . . .
When you read a book, even if we might say, by way of analogy, that you are “copying” the content of that book onto your brain, this clearly does not infringe §106(1) of the copyright law proscribing unauthorized copying. Since the author naturally hopes that you will read her book, such a prohibition would be absurd, even if you had an eidetic memory and could, if prompted, recite the entire work verbatim. But if you used that gift to type the entire book from memory and made that document available, you would then violate more than one provision of the copyright law.
So, the question raised in regard to “machine learning” is whether the computer scientist who wishes to feed a corpus of books—say, an anthology of American literature—into an AI should be required to obtain licenses for the works still under copyright. Thus, the first analysis is whether the act of “copying” can be said to occur in this circumstance any more than it would for the human reader who consumes the same body of literature.
It strikes me that if what the AI does in this case is ingest the corpus of books and almost instantly deconstruct those works by synthesizing them through a neural network, then the computer scientist has a pretty solid argument that no copying has taken place. If the machine does not retain intact copies of works—or even large sections of works—with the purpose of making those intact copies available to the human market, then this “machine reading” process is arguably analogous to the human whose reading does not infringe §106(1) of the copyright law.
That said, the intent of the computer scientist may be a significant factor. For instance, if the training of the AI will have a commercial purpose, this may suggest a requirement to license the works under copyright. But intent can be very tricky on the leading edge of science because it is neither realistic, nor even desirable, to insist that every researcher know exactly where his experiments will lead. This would nullify the process of discovery whence many great achievements have been made; hence, discovery is its own justification, and I suspect the tech companies would appeal to this rationale in regard to “machine learning.”
If the computer scientist’s goal is to see whether he can get his AI to “learn” about the American experience through literature, but he does not have a particular product or service in mind at the outset, it seems that copyright owners would be on fairly shaky ground to enjoin his use of the books. As long as nothing that comes out the other end looks like any of the products that went in, it strikes me that this experiment exists beyond the statutory framework of copyright law.
. . . .
On the other hand, the moment Google or Facebook did announce that new product, rightsholders could justifiably complain that a massive, highly profitable corporation has used potentially billions of dollars’ worth of material without paying for any of it. As one scholar at the round-table noted, tech companies may not use raw silicon for free, so why should they get to exploit millions of creative works for free, no matter what they’re turning that data into?
Link to the rest at The Illusion of More