Is “Machine Learning” Copying or Reading?

From The Illusion of More:

I recently attended a round-table discussion on the subject of artificial intelligence and copyright.  The first of several engaging topics I thought warranted a post was the question of “machine learning”

. . . .

When you read a book, even if we might say, by way of analogy, that you are “copying” the content of that book onto your brain, this clearly does not infringe §106(1) of the copyright law proscribing unauthorized copying.  Since the author naturally hopes that you will read her book, such a prohibition would be absurd, even if you had an eidetic memory and could, if prompted, recite the entire work verbatim.  But if you used that gift to type from memory the entire book and made that document available, you would then violate more than one statute under the copyright law.

So, the question raised in regard to “machine learning” is whether the computer scientist who wishes to feed a corpus of books—say the anthology of American literature—into an AI should be required to obtain licenses for the works still under copyright.  Thus, the first analysis is whether the act of “copying” can be said to occur in this circumstance any more than it would be for the human reader who consumes the same body of literature.

It strikes me that if what the AI does in this case is ingest the corpus of books and almost instantly deconstruct those works by synthesizing them through a neural network, then the computer scientist has a pretty solid argument that no copying has taken place.  If the machine does not retain intact copies of works—or even large sections of works—with the purpose of making those intact copies available to the human market, then this “machine reading” process is arguably analogous to the human whose reading does not infringe §106(1) of the copyright law.

That said, the intent of the computer scientist may be a significant factor.  For instance, if the training of the AI will have a commercial purpose, this may suggest a requirement to license the works under copyright.  But intent can be very tricky on the leading edge of science because it is neither realistic, nor even desirable, to insist that every researcher know exactly where his experiments will lead.  This would nullify the process of discovery whence many great achievements have been made; hence, discovery is justification in itself, and I suspect the tech companies would appeal to this rationale in regard to “machine learning.”

If the computer scientist’s goal is to see whether he can get his AI to “learn” about the American experience through literature, but he does not have a particular product or service in mind at the outset, it seems that copyright owners would be on fairly shaky ground to enjoin his use of the books.  As long as nothing that comes out the other end looks like any of the products that went in, it strikes me that this experiment exists beyond the statutory framework of copyright law.

. . . .

On the other hand, the moment Google or Facebook did announce that new product, rightsholders could justifiably complain that a massive, highly-profitable corporation has used potentially billions of dollars worth of material without paying for any of it.  As one scholar at the round-table noted, tech companies may not use raw silicon for free, so why should they get to exploit millions of creative works for free, no matter what they’re turning that data into?

Link to the rest at The Illusion of More

5 thoughts on “Is “Machine Learning” Copying or Reading?”

  1. In all these comparisons of humans to machines we hear people saying the machines don’t do the same thing humans do. Well, how do humans think, reason, decide? What is the specific and basic mechanism, and how does it work? What is thinking?

    We talk about thinking, feeling, consciousness, instinct, emotion. What’s all that? How does it work? Seems we just agree with each other, presuming a common experience, but don’t have a clue what it all means.

    Do cats think? How is the process different than in humans? How about honey bees? Do they think? Is it the same mechanism humans use? If so, what is it? How about ants? The colony? Thinking?

  2. Part of the problem is the common misuse of the terms AI and Artificial Intelligence. Coupled with the public’s misperception of the terms, this results in overestimation of the program’s capability.

    If Google reined in their programmers and told them to call the damned thing by the old term of Expert System, the problem would evaporate.

    The notion that a program ‘learns’ is bogus. No. It separates, collates, and catalogs. It does not conceive anything new. It only finds patterns in the data and displays them. It may find patterns that were undetected before, but so what? That was the intended purpose of the tool. It does not mean that the tool thinks.

    Really, people, these yahoos have watched Colossus: The Forbin Project too many times.

  3. “When you read a book, even if we might say, by way of analogy, that you are “copying” the content of that book onto your brain, …”

    One might indeed say that. Show me the copy.

  4. It is neither.
    The establishment just refuses to accept the verdict on the google lawsuit.

    The “AI” system is just looking for patterns in the datasets. It is neither duplicating the data for distribution nor understanding the meaning of the text. Reading requires comprehension and “AI” comprehends nothing.

    No copying, no reading, no case.

    This is the exact same use case as Google indexing the scans.
    It is fair use, plain and simple.

    But I would love the case to make it to trial (though I doubt it will) so the hype masters would have to admit there is no actual intelligence in their “AI”.

  5. “When you read a book, even if we might say, by way of analogy, that you are “copying” the content of that book onto your brain, …”

    No, you are making an interpretation of what you imagine based on the words you are reading/hearing. Two different people can come away with two different ideas of what the book was saying.

    Which is why AI will have such a hard time understanding humans – we don’t understand ourselves well enough to explain us to the poor AI. (And until AI ‘can’ understand humans, it can’t write a book or a song any better than a million monkeys banging on a million keyboards can – but at least it can do it faster! 😉 )

Comments are closed.