“Turn right at the next intersection”: most computer voices still sound unnatural. The Google subsidiary DeepMind wants to change that with its WaveNet technology. Can you tell the artificial intelligence from an actual human?
Computers that speak are old hat: various platforms and programs have offered artificial speech output for decades. But the problem has always been that computers don’t sound natural. This becomes more and more annoying as voice assistants like Siri and Google Now find their way into the everyday lives of millions of people. So what can be done to make these assistants sound more natural in the future?
Google Subsidiary DeepMind Is Developing New Speech Output
The British company DeepMind, which specializes in artificial intelligence and is now a Google subsidiary, developed WaveNet. Unlike other speech synthesis systems, the program does not chain together snippets of a speaker’s recorded speech into sentences but instead generates the speech directly as a waveform. The model also takes into account data on sentence position, phonemes, syllables, words, and speakers.
WaveNet works with what is known as a convolutional neural network, a machine learning concept inspired by biological processes. This makes it possible for the individual elements of speech to influence one another and for natural-sounding speech to emerge, improving over time.
WaveNet has a better grip on complex phenomena of natural languages, such as so-called assimilation. Assimilation means that language components, such as phonemes, syllables, and clauses, are always influenced by the sounds around them in natural speech: neighboring sounds become more similar to each other. The vowel A, for example, sounds different directly in front of an O than in front of an N. The number of possible combinations is gigantic, so it isn’t easy to take these phenomena into account in artificial speech output.
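To give a rough idea of how a convolutional network can produce a waveform in which each sound depends on what came before, here is a minimal sketch of a single causal, dilated convolution in plain Python. The causal and dilated arrangement follows DeepMind's published description of WaveNet rather than anything stated in this article, and the function name `causal_conv1d`, the weights, and the toy signal are purely illustrative assumptions. A real model stacks many such layers with learned weights.

```python
def causal_conv1d(signal, weights, dilation=1):
    """Toy causal convolution: the output at step t depends only on
    signal[t], signal[t - dilation], signal[t - 2 * dilation], ...
    and never on future samples."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start count as silence (zero)
                total += w * signal[idx]
        out.append(total)
    return out


# Illustrative values only: each output mixes the current and previous sample.
print(causal_conv1d([1, 2, 3, 4], [1, 1]))  # [1.0, 3.0, 5.0, 7.0]

# A larger dilation lets the same two weights reach further back in time.
print(causal_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
```

Because every output value looks only backwards, earlier sounds shape later ones, which is the kind of mutual influence described above.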
Test Subjects Confirm The Naturalness Of The Voice Output
DeepMind has already put WaveNet to the test. Test subjects were asked to compare its output with natural speech and with example sentences from conventional systems and to rate the naturalness on a scale of 1 to 5. The result: WaveNet scored 4.21 on the rating scale, human speech 4.55. “WaveNet reduces the gap between human performance and state-of-the-art by over 50%,” writes DeepMind in the company blog. That makes Google’s method one of the best in the world.
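The article does not give the score of the previous best system, but the two quoted ratings and the "over 50%" claim together imply an upper bound for it. The following sketch works that out; the function names are hypothetical helpers, and the only inputs are the 4.21 and 4.55 ratings quoted above.

```python
def gap_reduction(baseline, new, human):
    """Fraction of the baseline-to-human gap that the new system closes."""
    return (new - baseline) / (human - baseline)


def max_baseline_for_reduction(new, human, reduction):
    """Highest baseline score for which `new` still closes at least
    `reduction` of the gap to `human`.
    Solves (new - b) / (human - b) >= reduction for b."""
    return (new - reduction * human) / (1 - reduction)


# Ratings quoted in the article: WaveNet 4.21, human speech 4.55.
bound = max_baseline_for_reduction(new=4.21, human=4.55, reduction=0.5)
print(round(bound, 2))  # 3.87
```

In other words, if the claim holds, the best conventional system in the test must have been rated below roughly 3.87.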
And What Do You Think Of The New Voice?
WaveNet could give voice assistants a more natural voice. However, DeepMind has not yet revealed when the Google Now assistant will benefit. Before that can happen, smartphone hardware would have to make a leap in performance, and Google’s cloud would also have to adapt to the method’s increased computing effort.
But when it comes to artificial intelligence, no one is likely to overtake Google anytime soon. As recently as March, the news made waves that the Google software AlphaGo had beaten the Go world champion Lee Sedol from South Korea by a considerable margin. This is not just any victory: Go is considered the most challenging game in the world.