Remember the days of picture avatars? Well, now there are vocal avatars. About a year ago, a voice cloning tool called Deep Voice required 30 minutes of audio to clone a human voice. An improvement announced by the Chinese tech giant Baidu shows that the same tool now needs just 3.7 seconds of audio to clone a voice. As impressive as this is, it also raises concern about possible misuse of the technology.
Another well-known artificial intelligence company, backed by powerful investors, claims to be building a whole new generation of speech synthesis technologies. It offers a custom-made artificial voice for a company or a particular application, a digital voice that sounds like the client built from only one minute of audio, and integration of users’ digital voices into a particular application.
As researchers wrote in a Baidu article last year, “Voice cloning is expected to have significant applications in the direction of personalization in human-machine interfaces.” The Baidu system can change one country’s accent to another or a female voice to a male one. Clearly, AI can learn to mimic different styles of speaking, taking personalized text-to-speech to a new level.
You know what they say about technology always evolving and change being constant. No matter how long it takes, the technology to create artificial voices keeps moving forward. AI-generated voices are now more realistic than in their early days, when they sounded ‘robotic.’
It wasn’t always about how fast results could come out but how real they were. As with all artificial intelligence algorithms, the more data voice cloning tools are trained on, the more realistic the results will be. When you listen to different cloning examples (which I did), it’s easier to appreciate how much the technology can do.
Credits - Bernard Marr, Samantha Cole.