From The New Publishing Standard:
Pretty much since smartphones became mainstream, audio content in the form of podcasts and audiobooks has been gathering momentum as a significant format sector in the global publishing industry.
Even with the à la carte and monthly credit subscription models, audio has taken off big time with consumers, while in the markets where publishers are amenable to unlimited subscription, audiobooks have quickly become a format to rival – and in the case of Sweden even to exceed – the popularity of print.
But the brake on audio – and especially on longform audiobooks – has always been the production costs of studios, sound engineers and narrators that can add thousands of dollars to the cost of a book as a sound product, deterring many publishers and making some titles financially unviable.
Lurking in the background, as the audio industry discovered and embraced digital, was AI – artificial intelligence – with the futuristic promise and premise that one day an entire book could be narrated by a robot and no-one would know any better.
Well, we’re not there yet, but anyone who follows developments in this arena will know quality is accelerating, driven by the proven global demand for digital audio based on text-to-speech (TTS).
As an author I love the idea that one day I might, at the click of a mouse, convert my novels to saleable-quality audiobooks, and as an industry commentator writing TNPS I fantasise about the day I might hit the mouse and have my TNPS posts converted into podcasts.
In the real world it seemed like the latter might happen soonest, as TTS seems to be developing fastest in the non-fiction arena, where delivery relies less on emotion and more on purveying information.
But the reality is that when I try the latest sample AI offerings I hit one major obstacle – TNPS posts are so full of “foreign” names (as in not in the AI English names database) that the text converted to sound is quite unacceptable. Another couple of years and it might be a different story.
But for fiction, where conveying emotion and tone has been the problem, progress has been palpable. This week brought news that one AI-audio operator, UK-based DeepZen, has partnered with US distributor Ingram to offer its AI-audio services to a no doubt cautiously optimistic publishing industry.
Per the DeepZen press release,
The service uses innovative technology that replicates the human voice to create a listening experience that is virtually indistinguishable from the real thing. Developed specifically for audiobooks and long form content, it incorporates artificial intelligence, natural language processing, and next generation algorithms.
DeepZen’s AI voices are licensed from voice actors and narrators, capturing all of the elements of the human voice, such as pacing and intonation, and a wide range of emotions that produce more realistic speech patterns. They are benchmarked against human narration, and are a world away from the robotic, monotone, voice assistants with which we are all familiar.
But that still raises the question: are they enough of a world away to be acceptable to paying consumers?
The 49-second sample DeepZen offers via the press release really isn’t enough to make that call, but check it out here and see – or rather hear – for yourself.
Link to the rest at The New Publishing Standard
Here’s a link to DeepZen where you can hear some AI voices