The World’s First Genderless AI Voice Is Here.

From Fast Company:

Voice assistants like Apple’s Siri and Amazon’s Alexa are women rather than men. You can change this in the settings, and choose a male speaker, of course, but the fact that the technology industry has chosen a woman to, by default, be our always-on-demand, personal assistant of choice, speaks volumes about our assumptions as a society: Women are expected to carry the psychic burden of schedules, birthdays, and phone numbers; they are the more caregiving sex, they should nurture and serve. Besides, who wants to ask a man for directions? He’ll never pull over at a gas station if he’s lost!

But what many people–myself included–have missed in the gender criticism of personal assistants is that it was even binary to begin with, as so much of the world identifies outside that schema. This oversight is exactly what Q is trying to fix. Q claims to be the world’s first genderless voice for AI systems developed by the creative studio Virtue Nordic and the human rights festival Copenhagen Pride, in conjunction with social scientist Julie Carpenter. The project had no client; it was born from a design exploration inside Virtue Nordic and snowballed from there.

. . . .

. . . .

Now, voice assistants are often gender-specific for a reason. Companies test these computer voices on users and listen to the results of those tests. At Amazon, users preferred Alexa as a woman rather than a man. That relatively small sample set was extrapolated to represent Alexa for everyone. Research has shown, too, that men and women alike report female voices being more “welcoming” and “understanding” than male voices, and it’s easy to understand why these would be qualities any company would want in their always-listening voice assistant. But these companies and researchers only tested male and female voices. And testing a narrow set of options on a limited number of users isn’t the best way to build representational technology.

Link to the rest at Fast Company


48 thoughts on “The World’s First Genderless AI Voice Is Here.”

  1. The very first time that Hertz/Avis/National gave me a car for a business trip with an unturnoffable male voice, I drove right back in and made them change it — couldn’t get rid of it fast enough. It’s astonishing how quickly one hates hearing a man’s voice telling you what to do.

    Female voices (except mine) are higher pitched and penetrate noisy locations like airports better. All of us had mothers who told us to do things and we learned not to hate it.

  2. I wonder if they researched the data from Military flying development. They and civilian aircraft manufacturers all determined that female voices for warning systems worked best.

    Also, Bell systems found that female voices were preferred in automated answering systems.

  3. Totally misguided and backwards.
    Humans are hardwired to respond to human intonations and verbal cues. And certain voices stay with us forever. (Momma!) 😉

    They would be best advised to spend some time in finding out what makes some voices so pleasant and memorable. Like, starting around the 1 Minute mark here:

    Voice control is one of the hallmarks of a great actor.
    Encode that and it’ll improve gadget users’ attention tenfold. 😀

    • I don’t know what this is, but I recognize Timothy Dalton and the actor who played Sterling in “Leverage.” Great voices, both of them.

      The horse was great, too.

      • Timothy Dalton, Mark Sheppard, and Alan Tudyk (as the narrator/fourth-wall-breaking villain).

        Baphomet is Chantelle Barry, who is actually Aussie, not French.

        Other regulars are Brendan Fraser and Matt Bomer doing mostly voice work. Plus April Bowlby and Diane Guerrero doing 64 different characters and voices.

        A regular tour de force of voice acting in live action.
        (And excellent writing and direction.)
        Most notable being Timothy Dalton as a very persuasive mad scientist.

  4. Frankly, jumping to the conclusion of societal assumptions for why we use female voices just seems to be the author looking to create controversy. Female voices tend to be used (even going WAY back) because the female voice was considered more calming or, as in the case of WWII warning systems, the female voice would stand out in a cockpit among the male voices on the radio to avoid confusion.

    Personally, I blame Star Trek: Majel Barrett voicing the ship’s computer created a generation who expected the computer to have a woman’s voice – that’s my non-scientific opinion, and I’m sticking to it 🙂

    • There is a cultural element.
      And in a few languages with grammatical gender, like Spanish and other Romance languages, computer is a feminine term.
      Plus, before computers were machines, they were overwhelmingly and decidedly female. Usually youngish.

      (Let’s ignore HAL 9000 and Adam Selene/Mycroft/Mychelle for now, right?)

    • I don’t “blame” Star Trek, but my first thought was the same as yours regarding expectations. Star Trek taught our subconscious to expect a female voice when talking with a computer.

  5. The voices were chosen because female voices were easier to understand in an environment with lots of other noise. Nobody gave a hoot if someone was upset at being told what to do by a man. They used what worked best.

    Male and female voices have only a narrow frequency overlap. They are different. They had a choice, and chose what worked best. They would have used a frog’s voice if it worked better. But, if they did, some frog would have probably been offended.

    • Don’t know about “narrow frequency overlap”…

      Between male counter-tenors and female tenors (I’m one), that’s almost a two octave gender overlap bracketing middle C, out of a 3-octave general human range (from G 1.5 octaves below middle C to G 1.5 octaves above middle C).

      The distribution is skewed to both ends, of course, but the overlap is substantial.

      • Don’t know about “narrow frequency overlap”…

        The narrow overlap is generally between 165Hz and 180Hz.

        Males are generally between 85Hz and 180Hz.
        Females are generally between 165Hz and 255Hz.

        There are certainly higher and lower frequencies a few standard deviations (SD) each way. That’s where we get the singers. James Earl Jones is at 85Hz, while a mezzo-soprano is over 1,000Hz.

        The voices in the various devices were chosen because the general population finds them easiest to hear and understand in different situations. If we do a distribution of all voices, we can see the range that is easiest for the largest numbers to understand. If we superimpose the distributions of male and females, we find that the chosen range is well within the female range.

        So, I would agree with that author that the choice of voices for these devices “speaks volumes about our assumptions as a society.” It tells us the designers are trying to make their products as easy as possible for the greatest number of people to use. With more advances in software we see the designers offering choice of voice to make the products even easier to use.

        As with any distribution, the SDs contain the exceptions, but the products are rarely designed for the SDs. But with today’s choice of voice, even the folks three SDs out on the grievance index distribution can be satisfied.
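The two comments above measure overlap differently: one in hertz for typical speaking pitch, the other in octaves for singing range. As a rough sketch only, the snippet below takes the speaking-pitch figures quoted in the thread (85–180Hz and 165–255Hz) at face value and expresses the shared band both ways. The function names `overlap` and `octaves` are illustrative, not from any source cited here.

```python
import math

# Typical speaking-pitch ranges in Hz, copied from the figures quoted in the
# comment thread (assumed here, not independently verified).
MALE_RANGE = (85.0, 180.0)
FEMALE_RANGE = (165.0, 255.0)


def overlap(a, b):
    """Return the (low, high) band shared by two ranges, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low < high else None


def octaves(band):
    """Width of a frequency band in octaves (one octave = a doubling in Hz)."""
    return math.log2(band[1] / band[0])


shared = overlap(MALE_RANGE, FEMALE_RANGE)
print("shared band (Hz):", shared)
print("shared width (Hz):", shared[1] - shared[0])
print("shared width (octaves):", round(octaves(shared), 3))
print("male range (octaves):", round(octaves(MALE_RANGE), 3))
print("female range (octaves):", round(octaves(FEMALE_RANGE), 3))
```

On these figures the shared speaking band is 15Hz wide, or roughly an eighth of an octave. The "almost two octave" overlap cited earlier refers to singing ranges, which is a different measurement, so the two claims are not directly comparable.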

  6. Wow. What an accomplishment. So amazing. Now give me the actress who plays Cortana. I vastly prefer her to this.

  7. “…chosen a woman to, by default, be our always-on-demand, personal assistant of choice, speaks volumes about our assumptions as a society: Women are expected to carry the psychic burden of schedules, birthdays, and phone numbers…”

    No. They just sound better. But then I’m a SWM.


  8. I use Word text-to-speech all the time when I am editing. I’ve switched the voice to female because I hear the higher-pitched female voice better and I hear more intonation. Note: Msft could do a lot better; the voice has a lot of clunkers, like pronouncing “read” as “reed” and “red” in the wrong places, but I find it quite useful for spotting typos and other solecisms.

    I don’t especially care for the gender-neutral voice in the OP. Sounds mush-mouthed, not gender-neutral to me. I’ll bet it would not do well in A-B testing. Gender-neutral seems like a good idea to me, but not well-executed here.

    • I also didn’t care for this voice; I couldn’t listen to it for more than a few seconds. I don’t grasp the point of it being gender neutral. Computer voices already can’t pass as humans, so why not just go with a voice that’s not the “uncanny valley” of voices?

      I probably wouldn’t mind an AI that sounds Timothy Dalton-esque, but that’s probably because I don’t hear that accent very often. A & B testing for sure.

        • Only if you want it to be. I see a gender neutral voice as complexity reduction. One less choice to make, one less configuration option to account for. One less item for picky people to whine about.

          • Reminds me of a compromise. One tool that is both a hammer and screwdriver. I’d prefer to choose the best tool for the job at hand. But who knows, maybe the scrammer is good for something.

            • If you find scrammers for sale, please note it here. I’d get a half-dozen (various sizes) for the wife, all of my tools are already scrammers in her eyes. (Or wrammers, or plammers, or…)

                • But it’s not a real scrammer as it doesn’t do screw-driving. I can’t see myself ever using it as a scraper so I’ll stick to my trusty claw-hammer (one of the few dual purpose tools I own that actually works for both tasks).

                • When I’m messing around the house, I like to keep one of these in my back pocket: The one I have also has a bottle opener and screwdriver point. When I was a carpenter, we used claw hammers to pick padlocks, loosen and tighten catheads, even pound nails. I saw an old guy sharpen a circular saw by tapping just right on the teeth with his claw hammer. Never tried it myself. People who insist on the exact tool for the job lack imagination. They waste the whole day walking back and forth to the tool crib.

                • @ Felix

                  I was still working on aircraft when those dang Leatherman multitools came out.

                  Day one they looked like a time saver – one tool for many things.

                  Week two many of us were cursing the crew chiefs that were using them. The cross-point (Phillips) was between a #1 and #2 and if the screw was tight would strip both types of screw. Then there were the rounded-off nuts from some idiot trying to use the needle-nose end on them …

                  Sometimes the right tool for a job is the right tool. 😉

                  As for voices, the female seems to cut through a lot of the other noise/interference for me. (Always got a laugh from hearing the F-4’s ‘canopy’ warning, sounded more like she was saying she needed to find a bathroom …)

            • Sure. It’s a clever response to a political claim that there is something wrong with using female voices in machines. It’s a good idea. There is a demand from people who feel offended, and their money is just as good as anyone’s. Hope they make lots of money.

              • Bloody hell, I hope they lose their shirts. If they make money, it means one of two things:

                (1) there really is a huge unserved market out there that demands ‘ungendered’ voices (a thing, by the way, that does not exist in nature), or

                (2) a handful of screeching extremists will have shamed the general public into going along with their demands, so that nobody but the extremists shall be allowed to do what they actually prefer.

                Take your pick which one is worse.

          • It’s a third option. There are two options if you want the AI to sound human: AI sounds male, AI sounds female. Sounding like a human who is none of the above / all of the above (either of which is rare in real life**) is an extra option.

            It’s one more choice, not less. It’s amusing that the OP claimed that male and female are narrow choices, when either one is literally half the human race. Any time you mean “half of all humans in the world” you’re going broad. The percentage of the world who does not prefer either half of the human race is rather narrow. Outliers, one might say.

            **Turner’s, Klinefelter’s, and intersexed people could potentially be classified as “neutral,” but it doesn’t sound as if the programmers grabbed any of these individuals as a control group. That actually would be an interesting experiment. A variation on the “identical twins raised apart” trick of nature/nurture questions.

    • Have you tried the IVONA TTS voices on FIRE tablets?

      It is also very good. Not as good as a human but nothing is as good as a good voice actor or audiobook reader.

      You get US, UK, and Aussie accents.
      TTS is excellent for proofing.

  9. Except that non-AI voices that tell you what to do are very often male – I immediately think of air traffic controllers.

    • I have heard a few female ATCs. Not many, but there were some. But all the voices on RAPCON were male.

    • Just about the only time I dealt with a female air traffic controller was when I was turning from base to final when landing at Albuquerque. She asked me to confirm that my landing gear were down and locked. I was puzzled, never having been asked that before.

      “Say again, please.”

      “Confirm landing gear down and locked.”

      “Ma’am, my landing gear are down and welded.” (I was flying a motorglider without retractable gear.)

      Only after I landed did I learn that the controllers at Albuquerque had to ask that of all pilots because the airport was dual use, civilian and military, and the military pilots had to be asked about gear being down and locked because it’s especially expensive to land a jet fighter on its belly.

  10. Maybe I missed it, but I listened to half the thing and all I could hear was a woman’s deep voice. Fail.

    Or maybe I should get my hearing checked. Who knows.

  11. Years ago, the Navy studied voices and found that pilots and NFOs attended and responded better to female voices than to male voices. But the OP prefers to be woke.

  12. Hmm. My hearing must be different from almost anyone else’s.

    I’ve listened to this three times (well, two and a half) – all that I hear is an upper-crust, late teens English male. Who needs professional help with his depression, or needs to sign into a drug addiction clinic.

    99.999% of the world is gendered. (Even the vast majority of homosexuals / bisexuals are gendered, they simply have different preferences as to the genders they have romantic relationships with.)

  13. What I would like is for the Big Brother AIs (Alexa, Siri, Cortana, Google Assistant, etc.) to allow people to make, share, and use their own voice pack “skins” for their various assistants. You could make your own, or buy one from your favourite actor. I expect the James Earl Jones one may be a popular option.

    There are various options for the voice to text in my phone, with local accents, male and female, and adjustments to speed and pitch available, but it would be great to be able to easily make them sound the way you like. If you want genderless, use that; want something else, that’s great too.

  14. It’s not quite that easy. The recording sessions for Siri, which is one of the smaller audio interfaces, took 4 hours a day for a month. At least the first round. There probably have been more.

    A couple hundred hours of James Earl Jones won’t come cheap.

    They currently use voice actors because they’re used to the workload. But they’re still not cheap.

    A more likely scenario is that with the next generation of speech synthesis it should be possible to achieve 100% fluidity with semantic encoding and then overlay actor-specific tones and inflections. Figure another five years.
    They’ll still have to pay to record and analyze the actors but it should be complete in a couple of hours instead of a couple hundred.
