Book publishers sue Audible to stop new speech-to-text feature

PG has posted about this latest dispute between Big Publishing and Amazon before, but thought the OP was a good (though speculative) description of Amazon’s possible legal analysis supporting its offering of this new audiobook feature.

From Ars Technica:

Seven of the nation’s top book publishers sued Amazon subsidiary Audible on Friday, asking federal courts to block the company from releasing a new feature called Audible Captions that’s due out next month. The technology does exactly what it sounds like: display text captions on the screen of your phone or tablet as the corresponding words are read in the audio file.

The publishers argue that this is straight-up copyright infringement. In their view, the law gives them the right to control the distribution of their books in different formats. Audio is a different format from text, they reason, so Audible needs a separate license.

This would be a slam-dunk argument if Audible were generating PDFs of entire books and distributing them to customers alongside the audio files. But what Audible is actually doing is subtly different—in a way that could provide the company with firm legal ground to stand on.

The caption feature “is not and was never intended to be a book,” Audible explained in an online statement following the lawsuit. “Listeners cannot read at their own pace or flip through pages as they could with a print book or eBook.” Instead, the purpose is to allow “listeners to follow along with a few lines of machine-generated text as they listen to the audio performance.”

“We disagree with the claims that this violates any rights and look forward to working with publishers and members of the professional creative community to help them better understand the educational and accessibility benefits of this innovation,” Audible added.

. . . .

[A]n Audible executive explained that the technology was “built on publicly available technology through AWS Transcribe.” That’s Amazon’s cloud-based service for automatic text transcription.

So it seems that the Audible app is generating text captions in realtime as the user plays an audio file. The app sends snippets of audio files to an Amazon server and gets back corresponding sections of text, which it then displays on the screen one word at a time. (It’s possible that AWS Transcribe has an offline mode that allows the transcription to happen on-device, but I haven’t found any documentation about this. I’ve asked Audible about this and will update if they respond.)

Audible is likely doing this because it strengthens the company’s argument that it can do this without a license from publishers.

To see why, it’s helpful to review two of the most important copyright decisions of the modern era. The first was the 1984 decision of Sony v. Universal that declared the VCR legal. Hollywood argued that the “record” button on a VCR was an invitation for customers to infringe their copyrights. But the Supreme Court disagreed, arguing that copyright’s fair use doctrine allowed “time shifting”—recording a show now to play it later.

The courts built on this decision with a 2008 ruling known as Cartoon Network v. Cablevision. In that case, a bunch of media companies sued the cable company Cablevision because it was offering customers a “remote DVR.” Like a conventional DVR (or a VCR before that), Cablevision’s technology allowed customers to record and play back television shows at their convenience. But unlike a conventional DVR, the remote DVR was located in a Cablevision data center, not in the customer’s home.

Television content owners argued that Cablevision was infringing their copyrights by making unauthorized copies of their shows on a massive scale. Cablevision disagreed, arguing that the copies were being made by customers, not by Cablevision. The physical DVR might be owned and maintained by Cablevision, but the customer was deciding which shows to record. And the customer was entitled to do that under the earlier Sony ruling. An appeals court ultimately accepted this argument.

The Cablevision ruling provided a legal foundation for cloud-based “storage locker” services that allowed customers to upload, save, and stream (but not share) their music and video collections.

. . . .

That brings us back to Audible’s new transcription technology. Audible doesn’t have the legal right to sell text versions of audiobooks to customers without publishers’ permission. But we can expect Audible to argue that it does have a right to sell software tools that allow customers to do speech-to-text conversion.

Audible’s case will likely be strengthened by the fact that its app never creates or saves a permanent, full transcript of an audiobook. Instead, the software only displays a few words on the screen at a time.

If Audible is sending audio files to Amazon’s servers for transcription, publishers are likely to argue that this means Amazon, not users, is creating the transcripts. But this seems closely analogous to the Cablevision case: the conversion is being done by Amazon servers but only when explicitly requested by users. And each transcription is sent back only to the user who requested it.

Link to the rest at Ars Technica



8 thoughts on “Book publishers sue Audible to stop new speech-to-text feature”

  1. On a different note… those of us with, say, Welsh names embedded in the audio have to wonder just how mangled they will be.

    I’d rather accurate transcriptions than AI ones…

    • We talked about this in previous threads. In the last thread on this, I linked to the legal filing the publishers made against Audible, and in it Audible said that for now they’re holding off on transcribing audiobooks that have a high number of either foreign words (Llanfairpwllgwyngyll) or “fantastical words” (Daenerys), because the transcription would not be accurate enough. I’m gonna guess that “The Chronicles of Prydain” is not gonna be transcribed any time soon 🙂

    • “In response to a lawsuit filed by seven publishers, in which those publishers asked the court to prevent Amazon’s Audible from launching Audible Captions, Audible has decided to temporarily exclude those publishers’ works from the new feature.”

      Beautiful. It’s like that agency game they played a while back: Amazon does as they ask/demand, and the readers see that those publishers’ works aren’t as full-featured as everyone else’s.

      I’ll go pop some corn because as soon as this thing goes live you know those same publishers will be screaming that Amazon’s being mean to them yet again. 😉

  2. “This would be a slam-dunk argument if Audible were generating PDFs of entire books and distributing them to customers alongside the audio files.” And they aren’t. It’s just like closed captioning, correct?

    I just don’t see the harm. I only see the upside for my readers who want an enhanced experience, use audio to help their reading skills, etc.

  3. I actually agree with the publishers on this one. I’ve always felt that Amazon has no right to facilitate my work being read out loud in their applications. They’re violating my audio rights. If the reader is able to make that happen on his/her own for their personal use, more power to them. But for someone who is selling text copies of my book, they’re violating my rights by providing an audio narrator whom I haven’t approved.

    • In the first place, you’ve got it exactly backwards. This is not text-to-speech, it’s speech-to-text. They already have a licence for the audio rights, or they wouldn’t be releasing an audiobook.

      In the second place, there’s nothing you can do about either text-to-speech or speech-to-text. The former is trivially easy and the function is built into every modern operating system for computers, tablets, or phones. The latter is actually mandated by U.S. laws requiring media to be accessible to those with sensory disabilities.

      The publishers are suing Amazon for complying with federal law. It won’t fly.
