From Kristine Kathryn Rusch:
I just spent a few hours that were half fun and half a pain in the patootie. As I mentioned in the previous post, I've been working on AI audio, and I wanted to settle on a preliminary service this week.
I figured I’d do a lot of audio versions of the test blog, each from a different site. But the terms of service on some sites scared me off. On others, it was the pricing. Not the introductory pricing, but the pricing that WMG needed.
The Enterprise Tier of many of those services, which is the tier WMG would need, is often eye-crossingly expensive. Many of those tiers include services that we don't need…at least at the moment.
A number of the services sounded great, until I looked at how many hours of audio I would get for the price. A few of the services, in beta, were really expensive. I’d rather pay a voice actor than pay for these services.
So I ended up trying only one service, Murf. It has a good TOS (at the moment, anyway). It gave me ten free completed minutes of audio. I only used 1:17 minutes.
The free service did not let me clone my voice (not that I would have at this juncture), although I could have tried a simulation. Instead, I had the choice of two middle-aged female voices or half a dozen young adult female voices. I could also have chosen at least two middle-aged male voices and a bunch of young adult male voices.
I chose the least objectionable middle-aged female voice, and played.
I had to work with pronunciation on some expected things, like my last name, and some unexpected things, like PayPal. The voice, at a neutral speed, sounded robotic, so I sped her up.
As I noted in the text, I had to change a number of things for clarity. I will have to do some of the audio blogs differently than I do the text blogs, which really isn’t a problem.
All in all, it took me 30 minutes to learn the system and create the 1:17 minutes of audio. I could have done the same on one of my audio programs, using my own voice, in half that time.
But I don’t expect the audio version of the blog to take longer than 30 minutes to set up. Most of that 30 minutes was me learning the program. Not a big deal, actually, and it wasn’t that hard.
I was surprised, actually. I thought it would be more difficult. Instead, I had fun.
. . . .
In my AI audio research, I found a lot of really good programs. Almost all of them wanted me to email them or contact them by phone to do voice cloning, which means that voice cloning is expensive.
At the moment, I’m not into expensive. I’m going to pay a little for some of these services because I want to do the blog and a few other things, but I am not going to pay a lot.
I’m going to wait on voice cloning.
I liked what I saw from Murf.ai, and I had fun playing with their system. It didn’t take long, as I mentioned above, and the sound was good enough. (I didn’t spend extra time tweaking it, since I wasn’t sure if I was going to use the program.)
Link to the rest at Kristine Kathryn Rusch
Kris's experience with AI narration (it's worth reading the entire OP if you're considering AI narration yourself) is similar to PG's. Kris was more systematic in her exploration than PG was, but her conclusions were the same as PG's: professional book narrators (and, to a lesser extent right now, voice actors) have a lot to be worried about with AI.
If you would like to get an audiobook completed quickly, AI is the clear winner. Absent foreign-language passages or very obscure words in the manuscript, commercial-quality AI should produce a perfect first take almost every time. You don't need to pay for a recording engineer or studio rental, either.
If AI works for audiobooks, PG would expect the cost of audiobooks to plunge. Effectively, an audiobook is a bunch of electrons, just like an ebook, and the storage and distribution of electrons over the internet is very inexpensive these days.
Here’s a link to Kris Rusch’s books. If you like the thoughts Kris shares, you can show your appreciation by checking out her books.