Microsoft researchers have created a new artificial intelligence model, Vall-E, capable of reproducing a voice identical to a human. It is noted that Vall-E learns from “discrete codes derived from the standard neural audio codec model”, as well as from recordings of 60 thousand hours of conversations (this is 100 times more than in existing systems) from more than 7 thousand speakers. Most of the dialogues are taken from public LibriVox audiobook sites.
Vall-E is based on the EnCodec technology that Meta announced in October 2022. It analyzes a person’s voice, breaks down information into components and synthesizes variations of its sound in different phrases. Even after listening to only a three-second sample, Vall-E can reproduce the timbre and emotional tone of the speaker.
“The results of the experiment show that Vall-E is significantly superior to the current TTS system [ИИ, воспроизводящий голоса, которых он никогда не слышал] in terms of the naturalness of speech and similarity to the speaker, ”says the researchers in the article.
You can listen to Vall-E voice playback examples on GitHub. Most sound identical to the recordings, even though only short excerpts are used. Several voices sound more robotic and resemble the voices of traditional text-to-sound software.
Microsoft researchers believe Vall-E could be used as a text-to-speech tool, speech editing tool, and audio creation system in the future by pairing it with other generative AIs such as GPT-3.
Course
BUILDING BUSINESS PROCESSES
Learn how to implement business processes in line with new company goals.
REGISTER!
As with all other AI models, there is concern about the misuse of Vall-E – for example, to imitate the voices of public figures, politicians or stars (especially when used in combination with deepfakes). Criminals can also get sensitive data if they trick a person into believing they are talking to family, friends, or officials. Some security systems also use voice recognition. As far as its impact on jobs, Vall-E is likely to be a cheaper alternative for dubbing actors.
But Vall-E researchers say all of these risks can be reduced:
“It is possible to build a model that will determine if the audio has been synthesized by Vall-E.”
Microsoft seems to be resolutely taking up the development of AI technologies and their implementation in their own products. They will try to integrate the GPT language model from OpenAI with Word, Outlook and PowerPoint, and ChatGPT, a chat bot that generates human-like texts and gives detailed answers to questions, will be added to the version of the Bing search engine from March.
Microsoft is also in talks to invest $10 billion in OpenAI, according to media reports. The agreement stipulates that the company will receive 75% of the profits of the AI lab until it recovers its investment. Upon reaching this goal, Microsoft will receive a 49% stake in the startup, other investors will receive another 49%, and the non-profit parent organization OpenAI will receive 2%.
Media: Microsoft is investing $ 10 billion in OpenAI, the developer of the chat bot ChatGPT, which generates frighteningly human texts
Source: techspot