Data, Algorithms & Authorship in the 21st Century

From SSRN (footnotes omitted, a few paragraph breaks added):

“Data is the new gold. It’s the new oil. It’s the new plastics.”
— Mark Cuban, 2017

Over the last decade the music, motion picture, and publishing industries have faced what many have characterized as a crisis. Online piracy and the digital technologies that enable it are said to have destroyed traditional models of content creation and distribution.

The music industry is most often offered as the leading example. In the nearly two decades since the digital file-sharing service Napster burst on the scene, recording company revenues have plunged by approximately 72% in the U.S., or almost 80% adjusted for inflation.

A great deal of that decline in revenue can be traced to the ability to distribute and share content digitally without either legal permission or much chance of consequence.

The story appears to be dire, and yet it is increasingly obvious that the crisis narrative obscures more than it reveals. To be sure, the shift to digital and the related upsurge in online piracy — a phenomenon we refer to here as the “first digital disruption” — dramatically re-organized power within the music industry and transformed the ways in which the industry does business and makes (or does not make) money. But the industry adjusted, and the disruption did not fundamentally change the way music is created.

The first digital disruption mainly undermined a particular set of music industry business models. Most of the impact fell on middlemen (record labels, publishing companies, and retailers) who saw their revenues sink. And even there, the story has been as much about creation as disruption. Record labels, formerly the dominant force in the industry, are much diminished today.

But streaming services, such as Spotify, Apple Music, and Tidal, once tiny, are now important players. Turning the destructive potential of digital distribution on its head, they have utilized the internet to pioneer new and lucrative modes of content dissemination. Indeed, the total revenue of digital distributors now exceeds the total revenues of recording companies.

The U.S. live music industry has also grown substantially, and is expected to continue to grow at about twice the rate of the overall economy. And even as record company revenues have shrunk, the best evidence suggests that more music is being produced than ever before.

On the other side of the market, consumers pay less, and have more access to, that cornucopia of music than ever before.

The next digital disruption is going to reach deeper. It will re-order how creative work is produced, and not simply how it is promoted and sold. It will transform our notions of authorship. It will raise fundamental questions about the nature and value of human creativity. And, perhaps less consequentially for the world at large — but of central importance to lawyers — it may shift how we think about the value and utility of, and even the moral justification for, intellectual property rules.

What is this second digital disruption? We can see its onset in the high-stakes merger between AT&T, which owns digital cable and satellite networks for distributing video programming, and Time Warner, which produces film and television content. The Department of Justice challenged the merger, arguing that it would harm competition in video programming and distribution markets. In its pre-trial brief, Time Warner argued for the merger by noting that, as a stand-alone content producer it faced a competitive disadvantage versus rivals, such as Netflix, Google, and Facebook, that produce content but also own a digital distribution platform. As Time Warner argued:

First, unlike Google and Facebook, Time Warner has no access to meaningful data about its customers and their needs, interests, and preferences. In most cases, Time Warner does not even know its viewers’ names. This data gap impedes its ability to compete with Google, Facebook, and other digital companies in advertising sales, which are critical to Turner [Broadcasting, a Time Warner subsidiary]’s viability, and which allow Turner to keep subscription fees much lower than they otherwise would be. Whereas digital companies have the data and the technology to deliver advertisements that are both specifically addressed (shown) to a particular viewer and tailored to that viewer’s specific needs and interests, Time Warner cannot target its television advertising in those ways, creating an increasing competitive disadvantage for the company. The data gap also gives online video programmers a competitive advantage in the production and aggregation of content based on extensive data about the content preferences of their viewers.

This spring Judge Richard Leon of the United States District Court for the District of Columbia agreed, holding that “traditional programmers and distributors are experiencing increased competition from innovative, over-the-top content services [i.e., companies that provide video programming over the Internet] …. Those web-based companies are harnessing the power of the internet and data to provide lower-cost, better-tailored programming content directly to consumers. The dramatic growth of the leading [Internet video providers] in particular, including Netflix, Hulu, and Amazon Prime, can be traced in part to the value conferred by vertical integration — that is, to having content creation and aggregation as well as content distribution under the same roof.”

Data is at the core of the second digital disruption. In Mark Cuban’s words, data is “the new gold”: the resource that will create, and likely destroy, fortunes in the content business.

The “data gap” Time Warner spoke of is not just a competitive disadvantage for firms that produce many different types of creative content. Access to data about consumer preferences is rapidly becoming a competitive necessity, and the inability to gather such data, on a massive scale, is a fundamental disability.

Increasingly, we will see the rise of firms that own large and even dominant digital distribution platforms but also produce content for those platforms. Indeed, this trend is visible already. Netflix, Amazon, and, not yet but perhaps soon, Spotify, use the data they collect on consumer preferences and usage to make decisions about advertisements. All now use this data to decide how to organize and recommend content to users.

And some use their data to produce content that is more effectively targeted to consumer preferences. It is this last twist — the use of data to shape content creation, which we refer to as “data-driven authorship” — that is ultimately the most interesting feature of this new model.

Link to the rest at SSRN

PG says indie authors are conducting a variation on the concept in the OP with increasingly sophisticated salting of keywords within their promotional materials in order to attract the types of people who will want to purchase their books.

One example is the more frequent use of author or title comparisons in book descriptions, such as, “If you like Penelope Blunderbuss, you’ll love ________”

When Amazon’s algorithms are trying to present books a reader will want to purchase, if that reader has just finished a book by Penelope, the algorithms may bump a book that includes Penelope’s name up near the top of its suggestions for that reader.

This is the great, great, great grand-descendant of Search Engine Optimization, first used by PG about 15 years ago to push his company’s products higher in the Google search results when people searched for those products.

Search algorithms have become enormously more sophisticated during the intervening years, particularly at Amazon, where they know both what you’ve searched for and what you’ve purchased, but the first principle of a successful search engine – show the customer what the customer wants to see – hasn’t changed.

16 thoughts on “Data, Algorithms & Authorship in the 21st Century”

  1. When they truck out Napster as a reason for the decline in music buying I suspect the rest of the blog will be equally crap.

    (What the music cartels don’t want to remember is that closing Napster ‘lowered’ their music sales in stores near campuses – because the kids could no longer try/find new songs/singers they had never heard before.)

    Heck, I hadn’t even heard of Meat Loaf until I saw ‘I will do anything for love’ on VH1. 😉 (Warning 12 minutes long!)

    • Ah, one who missed The Rocky Horror Picture Show, which, along with Susan Sarandon and Tim Curry, had Meat Loaf in an ‘epicurean’ role.

        • You never heard PARADISE BY THE DASHBOARD LIGHT?
          Poor soul.

          BAT OUT OF HELL is one of the great rock albums of any century.

          • Missed it all until VH1. Went looking and ‘anything for love’ was on BOoH II. II? then there’s a I? Bought the 6-7 CDs they had in the store at the time. There are a couple tunes I don’t care for, but most of it is good.


  2. It’s not piracy that is disrupting content creation. Rather it is the fact that content distribution is now so cheap, it might as well be free. It costs Amazon almost nothing to download another ebook to a Kindle.

    All the old ways were based on an assumption that reproduction and distribution are expensive. Publishing– after you strip out authorship, editing, and design– is now so cheap, it’s hard to account for the costs. Content production is still expensive in time and effort, but those are one time costs. One ebook costs the same as 10,000.

    It is easy for authors to produce and distribute content with almost no investment or fees. (Write on a free blog platform, using public WiFi, for example.) The amount of free content on the network is staggering. What are the economics of free? I don’t think anyone knows. Yet.

    • Added value rules.
      And curation is value subtraction.
      People prefer to see for themselves.

      In ebooks: samples!
      In print: Look Inside.

      Tradpub fought both.
      Readers noticed.

      • I don’t agree that curation is value subtraction. Some curation is useless to me– most things published, both traditionally and independently, don’t appeal to me much.

        The books I like are hard to find. Out of a hundred samples, only three or four make the cut. That’s a lot of unpleasantness to wade through.

        But some reviewers and sites consistently recommend content that I do like. That is a form of curation that I value. I probably miss a few things that I might like by using these filters, but that is not the point for me. By using filters, my overall experience is better because I miss more when I spend too much time reading samples.

    • In the case of the music industry, it was not a reduction in cost of distribution that took a bite out of their revenue. The technological change that affected them was the capacity for the customer to make a high quality copy of the media that had been sold to them.

      Up until 1999-2001, the music industry had been able to charge the consumer an “ownership” price while providing the music on media that was very difficult to make an inexpensive copy of and would deteriorate with time and use. With the advent of gigabyte scale hard drives, as well as optical media that was becoming radically less expensive, it became possible to store music on one’s computer.

      Some people ripped their CDs and converted them from wav to mp3 format. Then, they started sharing those files over the internet, which made it easier for individuals that legally owned the music on CD to have an immortal copy of the music without having to go to the tedium of processing the hundreds or thousands of CDs that they owned.

      There was illegal file sharing, but this was not the revenue killer. There had always been copying, which was of inferior quality, but usually good enough for those who were price sensitive enough to engage in it.

      The studios’ revenue hit a cliff, because the resale of the same music to the same customer stopped almost completely. They no longer had wear and tear as a weapon to charge a customer for each play of the music they ostensibly “owned”. They could not fall back on the charge by the listen business model, either, because customers were used to “owning” their music, even when they did not actually have unlimited access to that music for the rest of their lives as they would if they truly owned it.

      The impacts on the movie and book industries have been related, but are different. Books are not much affected by the lifespan or by the sharing of electronic files for free. Books last lifetimes and there are libraries full of books. They are more impacted by the ability of the publishers to control price points during different windows of the book’s lifecycle. Ebooks don’t command the premium of a hardback, but there are customers that have high demand for the book that will want the ebook rather than a physical copy. This makes it difficult to make the less expensive ebook available a year later than the hardback.

      Trad pub could probably improve the situation by continuing their current ebook pricing regime, but reducing the price of the ebook 6 and 12 months after first publication. They do it sometimes as a promotion, but it might make sense to do it more regularly and dump the softcover book altogether.

      Obviously, this only makes sense with the big names. They should probably go with an ebook-softcover release at launch for new authors and midlist. They definitely need to be more thoughtful in a world that contains ebooks, but their problems are only peripherally related to the business model gutting that the music industry suffered due to the move to a digital format.

      • I don’t dispute at all that copying eroded music industry revenue, but when music consumers began copying they became, in effect, free music publishers. Music publishers were no longer able to exert control over their content by controlling its reproduction and distribution.

        Looking at the twenty year horizon through my toy spyglass, I see book publishers succeeding by marshaling the best authors, editing, design, and marketing, but they will have to devise a revenue model that does not depend on controlling reproduction and distribution.

        Tweaking pricing and launch sequences are good short term strategies, but how do they compete against free? Which is what audiences have come to expect. I listen to music all the time, but I pay for it by contributing to public network radio and going to live performances. I haven’t shelled out a nickel for albums, tapes, or CDs for at least a decade. I think I am close to typical. I don’t quite see yet how it will work, I still buy books, but I anticipate something similar for books in the future.

        • “Music publishers were no longer able to exert control over their content by controlling its reproduction and distribution.”

          Or creation. It no longer takes a sound stage and a dozen techs to record a song. The late 90s had better than CD sampling sound cards you could plug into a computer and software to help make it sound like whatever you wanted.

          Rejected/ignored by the music studios? You can now roll your own.

          Rejected/ignored by trad-pub? Lots of indies read here and know that that is no longer a problem.

          Control, or the lack of control is what’s leaving them with just their backlists to play with.

      • MP3 rips might be quick and easy but they were not the first or only way people made copies of commercial music for personal use.
        In fact most serious music lovers going back to the 50’s/60’s and most prevalently in the 70’s would routinely copy their LP’s during the first play and use the taped version afterwards, keeping the LP as an archival or master copy. If the cassette broke, they just ran a new copy. Some bought two tape decks and by the 80’s dual tape cassette decks were common. Even early CDs were “ripped” using the “Analog hole”. Analog to digital interfaces were producing digital music files well before CD-ROMs became a standard PC feature.

        So no, consumer copying wasn’t the big turning point. Nor was it Napster, though it pointed the way.

        The big turning point was the consumers pivoting away from primarily buying albums to primarily buying singles. Instead of buying a $12-15 CD album, consumers were buying two to three $0.99 hit singles.

        That switch by itself gutted their revenues.

    • “It’s not piracy that is disrupting content creation.”

      Nope, not piracy, loss of control is what did them in.

      The age of the computers and the internet is what is killing them.

      A garage band can make their own music and be ‘discovered’ on youtube. They can sell MP3s and CDs, all without a contract or any ‘help’ from a music studio.

      A writer with a computer can write/edit/make a cover/upload a story and bypass trad-pub.

      Yes, there’s a lot of chaff you have to weed through to find the grain, but you had to do that with trad-pub/music studios too.

      And that’s what is driving them crazy, they can no longer dictate what the public can read/hear, they can’t control ‘when’ an indie is going to release their latest bombshell and at what price-point.

      It’s a brave new world and they have no idea how to compete if they can’t control it.


    • What are the economics of free? I don’t think anyone knows. Yet.

      The economic price of a good includes both the money transferred to the seller and the transaction costs. There may be no transfer of money to the seller, but the transaction costs may be substantial. If so, the good is not free.

      If we dive into theory, one might say the rise of the large corporations over the last few hundred years is simply an ongoing reduction in transaction costs.

      If we jump out of the theory and look at the real world, the transaction costs of getting a book from Gutenberg are much higher than getting it from Amazon. The same is true of getting a paper book from the library vs getting it from Amazon.

      The best comparison available is between Amazon “free” eBooks and Amazon paid eBooks. Transaction costs are the same. So, the data is there, but not available to anyone outside of Amazon. I suspect they are having a wonderful time playing with it.

  3. When I think about it, the amount of free reading that I do has increased greatly in the last few years. I read a lot of classical stuff from Project Gutenberg and there are other similar sources. I used to subscribe to a few technical and scholarly journals, but I can get them electronically through my alumni association now. I’ve always been a big library user and borrowing has gotten easier with slick online catalogs and more ebooks available. OverDrive now has its own ebook reading app that a lot of people like. Maybe we are closer to free reading than I thought…

    • OverDrive now has its own ebook reading app that a lot of people like. Maybe we are closer to free reading than I thought…

      Taxpayers would disagree.
