WHEN VOICE ACTOR Heath Miller sits down in his boatshed-turned-home studio in Maine to record a new audiobook narration, he has already read the text through carefully at least once. To deliver his best performance, he takes notes on each character and any hints of how they should sound. Over the past two years, audiobook roles, like narrating the popular fantasy series He Who Fights With Monsters, have become Miller’s main source of work. But in December he briefly turned online detective after he saw a tweet from UK sci-fi author Jon Richter disclosing that his latest audiobook had no need for the kind of artistry Miller offers: It was narrated by a synthetic voice.
Richter’s book listing on Amazon’s Audible credited that voice as “Nicholas Smith” without disclosing that it wasn’t human. To Miller’s surprise, he found that “Smith” voiced a total of around half a dozen titles on the site from multiple publishers, breaching Audible rules that say audiobooks “must be narrated by a human.” Although “Smith” sounded more expressive than a typical synthetic voice, to Miller’s ear it was plainly artificial and offered a worse experience than a human narrator. It made giveaway mistakes, like pronouncing Covid as “kah-viid” when referring to the pandemic.
Miller tracked down “Smith”—the voice matched a sample posted to SoundCloud by Speechki, a San Francisco startup that offers more than 300 synthetic voices for audiobook publishing across 77 dialects and languages. He and other narrators and audio fans who discussed the artificial audiobooks online reported the titles to Audible, which eventually removed them. Although it wasn’t a large number, discovering that synthetic voices were good enough for some publishers to put them to work prompted Miller to wonder about the future of his art and income. “It’s a little terrifying because it’s my livelihood and that of many people I respect,” he says.
Richter says he chose an artificial voice because the concept and its “uncanny valley” sound suited his book, which has a piece of intelligence software as one of its main characters, and that he was unaware of Audible’s policies. “My intention was never to upset or offend anyone,” he says. Speechki says it recommends publishers identify that narrations are synthetic and that it informs them of Audible’s policies. Will Farrell-Green, a senior director at Audible, said in an emailed statement that the company uses automated and manual processes to enforce its rules but that “due to the volume of content on our service, titles that are not compliant do slip through from time to time.” Audible’s “humans only” policy dates back to at least 2014, when synthetic voices were much less convincing, and the company has said the rule helps provide listeners the performances they expect.
Synthetic voices have become less grating in recent years, in part due to artificial intelligence research by companies such as Google and Amazon, which compete to offer virtual assistants and cloud services with smoother artificial tones. Those advances have also been used to make reality-spoofing “deepfakes.” Speechki is one of several startups developing speech synthesis for audiobooks. It analyzes text with in-house software to mark up how to inflect different words, voices it with technology adapted from cloud providers including Amazon, Microsoft, and Google, and employs proof listeners who check for mistakes. Google is testing its own “auto-narration” service that publishers can use to generate English audiobooks for free, using more than 20 different synthetic voices. Audiobooks published through the program include an academic history of theater and a novelist’s exploration of cultural attitudes to sex. Google spokesperson Dan Jackson says its auto-narrated books supplement rather than replace professionally narrated books. “Our goal with auto-narration is to make it possible to create a low-cost audiobook for any ebook title and increase content accessibility for those that are unable to read via ebook,” he says.
Link to the rest at Wired
Here’s a sample of a synthetic voice from Speechki that was embedded in the OP.
Per the Speechki website, its software can produce an audiobook in 15 minutes.