Microsoft has developed a “universal translator” that not only converts English speech into Chinese in real time, but does so while preserving the speaker’s own voice. Demonstrated in China recently, the technology is based on joint research into Deep Neural Networks by the software giant and the University of Toronto, Microsoft’s chief research officer Rick Rashid writes. The system uses an hour’s worth of prerecorded speech from the speaker to cut together a new, translated mashup.
Microsoft began talking about the universal translator project earlier this year, revealing that the system can handle Spanish and Italian in addition to Mandarin Chinese. Recordings from both the speaker and a native Chinese speaker are required for the English-Chinese conversion: the properties of the English speaker’s voice are mapped onto a few hours’ worth of Chinese speech, which can then be reworked to suit a broad variety of phrases.
The big difference between early presentations of the technology and the October 25 demo in China is the degree of accuracy Microsoft has managed to achieve. Rashid says the word error rate has been cut by around 30 percent, leaving around one word in seven or eight incorrect.
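To put those figures in context, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes the 30 percent figure is a relative reduction in word error rate and that "one word in seven or eight" is the rate after the improvement. The function name is purely illustrative and is not anything Microsoft has published.

```python
def implied_previous_error_rate(current_rate: float, relative_reduction: float) -> float:
    """Back out the error rate that held before a relative reduction was applied.

    Assumes: current_rate = previous_rate * (1 - relative_reduction).
    """
    if not 0 <= relative_reduction < 1:
        raise ValueError("relative_reduction must be in [0, 1)")
    return current_rate / (1 - relative_reduction)


# "One word in 7 or 8" after a cut of around 30 percent (figures from the article).
for words_per_error in (7, 8):
    current = 1 / words_per_error
    previous = implied_previous_error_rate(current, 0.30)
    print(
        f"now 1 in {words_per_error} ({current:.1%}); "
        f"before roughly 1 in {1 / previous:.1f} ({previous:.1%})"
    )
```

Under those assumptions, the earlier systems would have been getting roughly one word in five or six wrong.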
“Of course, there are still likely to be errors in both the English text and the translation into Chinese, and the results can sometimes be humorous,” Rashid warns. “Still, the technology has developed to be quite useful.”
Further refinement is expected as Microsoft feeds more training data into the Deep Neural Networks, a technique loosely patterned after the way the human brain works. Still, there’s no telling when the system might show up in your Windows Phone.