Microsoft AI can now clone voices to sound perfectly ‘human’ in seconds – but it’s too dangerous to release to public


MICROSOFT has developed an artificial intelligence tool that can replicate human speech with uncanny precision.

It is so convincing that the tech giant refuses to share it with the public, citing “potential risks” of misuse.

Microsoft’s research subsidiary has developed an AI text-to-speech generator that can replicate human voices with eerie accuracy (Getty)

The tool, dubbed VALL-E 2, is a text-to-speech generator that can mimic a voice based on just a few seconds of audio.

It relies on a technique called zero-shot learning: the model can reproduce a voice without ever having been given examples of that voice during training.
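To make that concrete, here is a minimal sketch of what a zero-shot voice-cloning interface looks like from the outside. The ZeroShotTTS class and its synthesize method are hypothetical stand-ins invented for illustration – Microsoft has not published an API for VALL-E 2.

    # Hypothetical sketch of a zero-shot voice-cloning interface.
    # ZeroShotTTS and synthesize are invented names for illustration;
    # VALL-E 2 has no public API.
    import numpy as np

    class ZeroShotTTS:
        """Stand-in for a zero-shot text-to-speech model."""

        def synthesize(self, prompt_audio: np.ndarray, text: str) -> np.ndarray:
            # A real system would (1) encode the short prompt into discrete
            # codec tokens that capture the speaker's voice, (2) condition a
            # language model on those tokens plus the text, and (3) decode
            # the predicted tokens back into a waveform.
            raise NotImplementedError

    # Zero-shot: a few seconds of an unseen speaker is the only voice input;
    # the model is never retrained or fine-tuned on that speaker.
    # model = ZeroShotTTS()
    # speech = model.synthesize(prompt_audio, "Hello, this is a cloned voice.")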

The tech giant says VALL-E 2 is the first of its kind to achieve “human parity,” meaning it meets or surpasses benchmarks for human likeness.

It succeeds the original VALL-E system, which was announced in January 2023.

According to developers at Microsoft Research, VALL-E 2 can produce “accurate, natural speech in the exact voice of the original speaker, comparable to human performance.”

It can synthesize complex sentences in addition to short phrases.

To do so, the tool takes advantage of two features called Repetition Aware Sampling and Grouped Code Modeling.

Repetition Aware Sampling addresses the pitfalls of repetitive tokens – the smallest units of data a language model can process, represented here by short stretches of encoded audio.

It prevents recurring sounds or phrases during the decoding process, helping vary the system’s speech and making it sound more natural.
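A rough sketch of how such a safeguard can work, assuming a decoder that picks one token at a time from a probability distribution – the window size, repetition threshold, and nucleus cutoff below are illustrative defaults, not the paper’s exact settings:

    import numpy as np

    def nucleus_sample(probs: np.ndarray, top_p: float, rng) -> int:
        # Sample from the smallest set of tokens whose total mass exceeds top_p.
        order = np.argsort(probs)[::-1]
        cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
        keep = order[:cutoff]
        return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

    def repetition_aware_sample(probs, history, window=10, threshold=0.5,
                                top_p=0.8, rng=None):
        # Draw a token with nucleus sampling; if that token already dominates
        # the recent decoding history, fall back to random sampling from the
        # full distribution to break the loop.
        rng = rng or np.random.default_rng()
        token = nucleus_sample(probs, top_p, rng)
        recent = list(history[-window:])
        if recent and recent.count(token) / len(recent) > threshold:
            token = int(rng.choice(len(probs), p=probs))  # plain random sampling
        return token

The fallback matters because sampling from the full distribution gives low-probability alternatives a chance of being picked, which is what breaks a repetition loop that nucleus sampling alone can get stuck in.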

Grouped Code Modeling limits the number of tokens the model processes at once to generate faster results.
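The grouping idea itself is simple to illustrate. Assuming a flat sequence of codec tokens, packing them into fixed-size groups shrinks the number of autoregressive decoding steps by the group factor (the group size of 2 here is arbitrary, chosen only for the example):

    import numpy as np

    def group_codes(codes: np.ndarray, group_size: int = 2) -> np.ndarray:
        # Pack a 1-D sequence of codec tokens into fixed-size groups so the
        # model predicts one group per step instead of one token per step.
        usable = len(codes) - len(codes) % group_size  # drop any ragged tail
        return codes[:usable].reshape(-1, group_size)  # shape: (steps, group)

    codes = np.arange(10)            # toy stand-in for 10 codec tokens
    groups = group_codes(codes, 2)   # 5 decoding steps instead of 10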

The researchers compared VALL-E 2 against audio samples from LibriSpeech and VCTK, two English-language speech datasets.

They also tested VALL-E 2 on a set of especially challenging sentences introduced by ELLA-V, another zero-shot text-to-speech project, to determine how well it handled more complex tasks.

The system ultimately beat out its competitors “in speech robustness, naturalness, and speaker similarity,” according to a June 17 paper summarizing the results.

The system, called VALL-E 2, will not be released to the public due to “potential risks in the misuse of the model,” including voice spoofing and targeted impersonation (Getty)

Microsoft claims VALL-E 2 will not be released to the public anytime soon, deeming it “purely a research project.”

“Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public,” the company wrote on its website.

“It may carry potential risks in the misuse of the model, such as spoofing voice identification or impersonating a specific speaker.”

The tech behemoth notes that suspected abuse of the tool can be reported using an online portal.

And Microsoft’s concerns are well-founded. Just this year, cybersecurity experts have seen an explosion in the use of AI tools by malicious actors, including tools that replicate speech.

Microsoft has come under fire for its rollout of artificial intelligence tools and its relationship with OpenAI, which has caught the attention of antitrust regulators (Getty)

“Vishing,” a portmanteau of “voice” and “phishing,” is a type of attack where scammers pose as friends, family, or other trusted parties on the phone.

Voice spoofing could even pose a national security risk. In January, a robocall using President Joe Biden’s voice urged Democrats not to vote in the New Hampshire primary.

The man behind the plot was later indicted on charges of voter suppression and impersonation of a candidate.

Microsoft has come under increased scrutiny over its implementation of AI, on both the antitrust and data privacy fronts.

Regulators have voiced concern about the tech giant’s $13 billion partnership with OpenAI and resulting control over the startup.

What are the arguments against AI?

Artificial intelligence is a highly contested issue, and it seems everyone has a stance on it. Here are some common arguments against it:

Loss of jobs – Some industry experts argue that AI will create new niches in the job market, and as some roles are eliminated, others will appear. However, many artists and writers insist the issue is an ethical one, as generative AI tools are being trained on their work and wouldn’t function otherwise.

Ethics – When AI is trained on a dataset, much of the content is taken from the Internet. This is almost always, if not exclusively, done without notifying the people whose work is being taken.

Privacy – Content from personal social media accounts may be fed to language models to train them. Concerns have cropped up as Meta unveils its AI assistants across platforms like Facebook and Instagram. Lawmakers have responded: in 2016, the EU adopted legislation to protect personal data, and similar laws are in the works in the United States.

Misinformation – As AI tools pull information from the Internet, they may take things out of context or suffer hallucinations that produce nonsensical answers. Tools like Copilot on Bing and Google’s generative AI in search are always at risk of getting things wrong. Some critics argue this could have lethal effects – such as an AI dispensing the wrong health advice.

The company has also faced blowback from its users.

Recall, an “AI assistant” that takes screen captures of a device every few seconds, saw its release indefinitely postponed last month.

Microsoft faced a deluge of criticism from consumers, data privacy experts, and regulators like the Information Commissioner’s Office in the UK.

In a statement to The U.S. Sun, a company spokesperson said Recall would shift “from a preview experience broadly available for Copilot+ PCs…to a preview available first in the Windows Insider Program.”

Only after receiving feedback from this community would Recall become “available for all Copilot+ PCs,” the spokesperson said.

The company declined to comment on whether the tool posed a security risk.