Imagine you are in a foreign country where you don’t speak the language and your small child unexpectedly starts to have a febrile seizure. You take them to the hospital, and the doctors use an online translator to let you know that your kid is going to be OK. But “your child is having a seizure” accidentally comes out in your mother tongue as “your child is dead.”
This specific example is a very real possibility, according to a 2014 study published in the British Medical Journal on the limited usefulness of AI-powered machine translation in communication between patients and doctors. (Because it’s a British publication, the phrase actually tested was “your child is fitting.” Sometimes we need American-British translation, too.)
Machine translation (MT) tools like Google Translate can be super handy, and Big Tech often promotes them as accurate and accessible tools that will break down many language barriers in the modern world. But the truth is that things can go awfully wrong. According to experts, misplaced trust in these tools’ abilities is already leading authorities to misuse them in high-stakes situations. A botched translation while ordering a coffee abroad or translating song lyrics can only do so much harm, but think about emergencies involving firefighters, police, border patrol, or immigration. And without proper regulation and clear guidelines, it could get worse.
Building machine translation systems such as Google Translate, Microsoft Translator, and those embedded in platforms like Skype and Twitter is one of the most challenging tasks in data processing. Training a large model can produce as much CO2 as a transatlantic flight. During training, an algorithm or a combination of algorithms is fed a specific dataset of existing translations. The algorithms store words and their relative positions as probabilities that they will occur together, creating a statistical estimate of how similar sentences are likely to be translated. The system therefore doesn’t interpret the meaning, context, and intention of words the way a human translator would. It takes an educated guess, and that guess isn’t necessarily accurate.
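To make the idea of a “statistical estimate” concrete, here is a minimal Python sketch of IBM Model 1, a classic word-alignment model from the statistical era of machine translation. It is a deliberate simplification and not how Google Translate or the other products named above work today: it ignores word positions entirely and learns, via expectation maximization, only how likely each English word is given each foreign word. The three-sentence German-English corpus and the function names `train_ibm_model1` and `best_translation` are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(e, f)] = P(e | f) with IBM Model 1.

    pairs: list of (foreign_sentence, english_sentence) strings.
    Word order is ignored; only co-occurrence within sentence pairs matters.
    """
    corpus = [(f.split(), e.split()) for f, e in pairs]
    english_vocab = {e for _, es in corpus for e in es}

    # Start uniform: every English word is equally likely for every foreign word.
    t = defaultdict(lambda: 1.0 / len(english_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (e, f) alignments
        total = defaultdict(float)  # expected counts of each foreign word f
        # E-step: split each English word's alignment over the foreign words in its sentence.
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: re-estimate P(e | f) from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Return the English word with the highest estimated P(e | f).

    Raises ValueError for a foreign word never seen in training.
    """
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (German, English).
PAIRS = [
    ("das Haus", "the house"),
    ("das Buch", "the book"),
    ("ein Buch", "a book"),
]

t = train_ibm_model1(PAIRS)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(t, word))
# Prints, one per line: das -> the, Haus -> house, Buch -> book, ein -> a
```

Nothing in the resulting table represents meaning. The model only knows which words tend to show up together across sentence pairs, which is why its output is, at best, an educated guess.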
In South Korea, a young man used a Chinese-to-Korean translation app to tell his female co-worker’s Korean husband that they should all hang out together again soon. A mistranslation made him appear to refer to the woman as a nightlife establishment worker, which led to a violent fistfight in which the husband was killed, the Korea Herald reported in May. In Israel, a young construction worker posted a photo of himself leaning on a bulldozer with the Arabic caption “يصبحهم,” or “good morning,” but the social media platform’s AI translation rendered it as “hurt them” in English and “attack them” in Hebrew. This led to the man being arrested and questioned by police, according to the Guardian in October 2017. Something similar happened in Denmark, where police wrongly confronted a Kurdish man on suspicion of financing terrorism because of a mistranslated text message, the Copenhagen Post Online reported in September 2012. In 2017, a police officer in Kansas used Google Translate to ask a Spanish-speaking driver for permission to search his car for drugs. Because the translation was inaccurate, the driver did not fully understand what he had agreed to, and the case was thrown out of court, according to state legal documents.
These examples are no surprise. Translation accuracy can vary widely within a single language, depending on complexity factors such as syntax, sentence length, or technical domain, as well as between languages and language pairs, depending on how well the models have been developed and trained. A 2019 study of hospital discharge instructions found that Google Translate’s translations into Spanish and Chinese are getting better, with overall accuracy between 81 percent and 92 percent. But the study also found that up to 8 percent of the translations had the potential to cause significant harm. A 2021 pragmatic assessment of Google Translate for emergency department instructions found that the overall meaning was retained in 82.5 percent of 400 translations into Spanish, Armenian, Chinese, Tagalog, Korean, and Farsi. But while the Spanish and Tagalog translations were accurate at least 90 percent of the time, the Armenian translations failed to retain the meaning 45 percent of the time. Not all machine translation errors are equally severe, but quality evaluations always find some critical accuracy errors, according to a paper published in June.
The good news is that Big Tech companies are fully aware of this, and their algorithms are constantly improving. Year after year, their BLEU scores, which measure how similar machine-translated text is to a set of high-quality human reference translations, keep getting better. Just recently, Microsoft replaced some of its translation systems with a more efficient class of AI model. Software is also being updated to cover more languages, including those often described as “low-resource languages” because they are less common or harder to work with. That can include everything from most non-European languages, even widely used ones like Chinese, Japanese, and Arabic, to small community languages like Sardinian and Pitkern. For example, Google has been building a practical machine translation system for more than 1,000 languages. Meta has just released its No Language Left Behind project, which aims to provide high-quality translation directly between 200 languages, including Asturian, Luganda, and Urdu, along with data on how much the translations improved overall.
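For readers curious about what a BLEU score actually computes, here is a minimal sentence-level sketch in Python: clipped n-gram precision against one or more reference translations, combined with a brevity penalty that punishes overly short output. It assumes naive whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. Established toolkits such as sacreBLEU compute the score at corpus level with standardized tokenization, so numbers from this sketch will not match published results. The function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing.

    candidate: the machine translation, as a string.
    references: one or more human reference translations, as strings.
    Tokenization is naive whitespace splitting (an assumption of this sketch).
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count at the highest count in any single reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            # Without smoothing, any zero precision makes the whole score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the candidate length.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


ref = "your child is having a seizure but will be fine"
print(sentence_bleu(ref, [ref]))  # → 1.0
print(round(sentence_bleu("your child is having a seizure but will not be fine", [ref]), 2))  # → 0.74
```

In the second example, inserting a single “not” reverses the meaning of the sentence, yet the score stays around 0.74, because BLEU rewards word overlap with the reference rather than correct meaning.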
However, errors with serious consequences, like the one the construction worker experienced, tend to be random, subjective, and different for each platform and each language. So cataloging them is of only limited help in figuring out how to improve MT, says Félix Do Carmo, a senior lecturer at the Centre for Translation Studies at the University of Surrey. What we need to talk about instead, he says, is “How are these tools integrated into society?” Most critically, we have to be realistic about what MT can and cannot do for people right now. This involves understanding the role machine translation can play in everyday life, when and where it can be used, and how it is perceived by the people using it. “We have seen discussions about errors in every generation of machine translation. There is always this expectation that it will get better,” says Do Carmo. “We have to find human-scale solutions for human problems.”
And that means understanding the role human translators still need to play. Even though medications have gotten massively better over the decades, there is still a need for a doctor to prescribe them. Similarly, in many translation use cases, there is no reason to cut out the human mediator entirely, says Sabine Braun, director of the Centre for Translation Studies at the University of Surrey. One way to take advantage of increasingly sophisticated technology while guarding against errors is machine translation followed by post-editing, or MT+PE, in which a human reviews and refines the machine’s output.
Link to the rest at Slate