by Anna Bulakh – Apr 12, 2022 10:00:00 AM • 8 min

The Rise of Ethical Voice Cloning in the Deepfake Voice Wars


Deepfake voice technology has evolved dramatically over the past decade. These advancements have fueled the technology’s growing popularity in multiple industries, including entertainment, movies, marketing, healthcare, and customer service. With the rise of synthetic media, deepfake voice generators have become increasingly sophisticated and accessible, allowing for highly realistic audio manipulation and voice synthesis.

Such rapid growth and demand are always accompanied by active discussions about the ethical use of new technologies. The notoriety of deepfake videos has not helped the debate. Nevertheless, today, voice cloning software is considered safe and ethical. This article will explain how and why. With increasing concerns about privacy, consent, and authenticity, addressing AI ethics becomes paramount in the development and deployment of such generative AI technologies.

The rise of AI voice cloning technology

AI voice cloning technology got its start not long ago with simple speech synthesizers: programs capable of converting text to human speech. Even today, this is one of the most widespread generative AI technologies. For example, Google Translate can read a text aloud in a foreign language after translating it.

Text-to-speech voice conversion reached its peak in products like Descript's Overdub — ultra-realistic text-to-speech voice cloning widely used in podcasting and radio. Services like Overdub help create pieces of audio content so that producers never have to reach out to voice actors.

After realistic voice generators, AI deepfake voice technology made its way onto the market. Using machine learning and AI algorithms, Respeecher was able to create a unique technology capable of converting one person's voice into the voice of someone else. As the demand for synthetic media grows, innovations like AI voice cloning offer new possibilities and challenges in various industries. We’ve examined in detail how this AI voice cloning technology is changing content production for the better in a series of articles on our blog.

In short, you can convert the voice of any person (gender does not matter) into a target person's voice using an AI voice generator. There is only one requirement: an hour-long, high-quality recording of the target's voice, which allows the AI to build its model correctly. Once the model has been generated, you can convert unlimited speech into your target voice without sacrificing the source voice's intonation, cadence, particular vocal emphasis, etc.
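The recording requirement above can be checked before any model is built. The snippet below is a minimal sketch, not Respeecher's actual tooling: the function names are hypothetical, it only handles WAV files through Python's standard `wave` module, and the 44.1 kHz sample rate is our assumed stand-in for "high-quality", since the article does not define a threshold.

```python
import wave

MIN_SECONDS = 60 * 60      # "hour-long" requirement stated in the article
MIN_SAMPLE_RATE = 44_100   # assumed stand-in for "high-quality"; adjust as needed


def recording_stats(wav_file):
    """Return (duration_in_seconds, sample_rate) for a WAV path or file object."""
    with wave.open(wav_file, "rb") as wav:
        sample_rate = wav.getframerate()
        return wav.getnframes() / sample_rate, sample_rate


def is_suitable_target_recording(wav_file, min_seconds=MIN_SECONDS,
                                 min_sample_rate=MIN_SAMPLE_RATE):
    """Check a recording against the (assumed) length and quality thresholds."""
    duration, sample_rate = recording_stats(wav_file)
    return duration >= min_seconds and sample_rate >= min_sample_rate
```

Calling `is_suitable_target_recording("target_voice.wav")` (a hypothetical file name) would return `False` for a recording shorter than an hour. Sample rate alone says little about noise or room acoustics, so a check like this can only rule out obviously unsuitable material.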

In case the original audio recordings are not of the best quality, especially if they are old, Respeecher built an audio version of the super-resolution algorithm to deliver the highest resolution audio across the board. Want to find out more? Download this whitepaper on audio super-resolution with Respeecher.

Ethical doubts around the voice cloning process

As you can see, there is nothing unethical about voice cloning technology itself. And although it uses the same AI technology as video deepfakes, there are significantly fewer examples of defamatory deepfake voices.

However, it is becoming more common for deepfakes to combine audio and video with the goal of deceiving as many people as possible. Here are the most famous examples.

Voice scammers

Every human being’s voice is unique. This is why some government and financial institutions use voice authentication to access private assets. In everyday life, most people also rely on their natural ability to distinguish the voices of friends and family when they cannot see them.

All this creates ideal circumstances for those with bad intentions to gain access to people's personal information or money.

Lawmakers and law enforcement agencies in many countries are busy establishing proper regulations for producing and using artificially synthesized voices. In the United States, a bill called the Defending Each and Every Person from False Appearances by Keeping Exploitation Subject to Accountability Act (the DEEP FAKES Accountability Act) was introduced in Congress in 2019.

Fake news

In 2020, fake news was estimated to have cost the global economy up to $78 billion. In 2019, cybersecurity company Deeptrace reported that the number of deepfake videos circulating online had reached nearly 15,000, almost double the previous year's count.

Deepfakes are widely used in the political arena — to mislead voters and manipulate facts. All this can create financial risks and damage the very fabric of our society.

Controversial media applications

Aside from cases of malicious intent, some deepfake applications in media don’t quite meet ethical standards.

One such example would be the 2021 Anthony Bourdain deepfake controversy.

A film detailing the life of Anthony Bourdain encountered backlash after the director disclosed that the producers had used deepfake voice technology. Some of Bourdain's quotes were narrated in a cloned voice because the filmmakers had no original recordings of him speaking those lines.

Naturally, this raised concerns in the community. Because the technology makes it possible to alter the historical record, there is a pressing need to ensure that voice cloning is done ethically. In this regard, the AI engineering community is constantly working to improve the recognition of audio and video deepfakes.

Be that as it may, there are many more positive examples of utilizing deepfake voice technology than negative ones. Here are just a few.

Recent examples of ethical AI voice cloning

Here at Respeecher, we take AI ethics very seriously. That's why we are committed to following a strict ethical code for voice cloning.

Here are just a few projects from our portfolio. As you will see, every single one was created in close cooperation with the copyright holders and the families of the deceased, so that any concerns over a project’s use of a voice could be addressed.

We recommend taking a quick look at these stories:

The titles speak for themselves and include resurrection projects and voice cloning for actual living celebrities and movie stars, showcasing the capabilities of ethical synthetic media.

As you can see, there's no inherent evil in a deepfake generator in and of itself. However, there are those who intentionally disregard responsibility or use generative AI with malicious intent.

The future of voice conversion as Respeecher sees it

With developments like the recent Respeecher and Veritone partnership or voice cloning making its way to Hollywood, it's evident that voice cloning is here to stay. As pioneers of the technology, we want to ensure ethical voice cloning applications.

In addition to purely technical measures, which include the development of algorithms for deepfake identification and voice watermarking, we are working to democratize and educate the market.
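To give a feel for what voice watermarking means in practice, here is a toy spread-spectrum sketch. It is an illustration under our own assumptions and not a description of Respeecher's watermarking: the function names, the integer `key`, and the `strength` value are all hypothetical. The idea is to add a very quiet pseudo-random pattern derived from a secret key, then detect it later by correlating the audio with the same pattern.

```python
import random


def _key_sequence(key: int, length: int) -> list[float]:
    """Pseudo-random +1/-1 pattern that only the key holder can regenerate."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed_watermark(samples, key, strength=0.05):
    """Add the keyed pattern to the audio samples at a low amplitude.

    The default strength is an illustrative value, far louder than a
    production system would tolerate.
    """
    pattern = _key_sequence(key, len(samples))
    return [s + strength * p for s, p in zip(samples, pattern)]


def watermark_score(samples, key):
    """Average correlation between the audio and the keyed pattern."""
    pattern = _key_sequence(key, len(samples))
    return sum(s * p for s, p in zip(samples, pattern)) / len(samples)


def has_watermark(samples, key, strength=0.05):
    """Marked audio scores near `strength`; unmarked audio scores near zero."""
    return watermark_score(samples, key) > strength / 2
```

For a sufficiently long clip, `has_watermark(embed_watermark(audio, key=1234), key=1234)` should come out true, while the unmarked clip or a wrong key should not pass. A real watermark also has to survive compression, resampling, and deliberate removal attempts, which this toy version does not attempt.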

Making AI voice cloning technology understandable and accessible to as many businesses and creative projects as possible through our AI voice generator will help protect the community from scammers and unethical use.

Contact us if you're looking for a trustworthy partner for your media, marketing, or healthcare initiative. We are always eager to help.

FAQ

AI voice cloning technology uses generative AI to replicate a person's voice based on an hour-long recording. It captures vocal nuances like intonation, cadence, and emotional tone to generate a realistic synthetic voice. This voice cloning software opens up possibilities for content creation, healthcare, and speech disability solutions.

A deepfake voice generator utilizes AI voice cloning algorithms to create realistic synthetic voices. By analyzing high-quality audio samples, the technology learns the unique patterns of a person’s voice. This enables the deepfake voice generator to produce audio that mimics the original speaker's speech, cadence, and tone.

The main ethical concerns surrounding voice cloning software include deepfake misuse, such as creating misleading or malicious content, identity theft, and unauthorized use of someone's voice. Ethical voice cloning requires permission from the voice owner, ensuring transparency and preventing misuse of generative AI technology for deceptive purposes.

Deepfake detection algorithms help identify manipulated audio and video by analyzing inconsistencies in patterns that human listeners and viewers may miss. As AI-driven voice synthesis and audio super-resolution technology advance, these algorithms can flag deepfake content, helping ensure synthetic media innovations are used responsibly and ethically.

Voice cloning software benefits a variety of industries, including entertainment, marketing, healthcare, and customer service. AI voice cloning technology enhances customer interactions, content production, and voice restoration for laryngectomy patients, providing personalized solutions and creating new opportunities in synthetic media innovations.

Glossary

AI voice cloning technology

A generative AI technology that uses AI-driven voice synthesis to replicate voices. It powers deepfake voice generators, voice cloning software, and synthetic media innovations, with applications in ethical voice cloning, audio super-resolution technology, deepfake detection algorithms, and voice watermarking for security.

Deepfake voice generator

A tool powered by AI voice cloning technology and AI-driven voice synthesis, enabling synthetic media innovations and voice cloning software for creating realistic deepfake voices. It integrates audio super-resolution technology, and requires deepfake detection algorithms and voice watermarking for security.

Voice watermarking

A technique used in AI voice cloning technology to embed hidden markers in synthesized audio, making cloned speech traceable and supporting deepfake detection algorithms for security in synthetic media innovations.

Audio super-resolution

A technology enhancing audio quality in AI voice cloning technology, improving voice cloning software output for clearer, more natural AI-driven voice synthesis in synthetic media innovations.

Ethical voice cloning

The responsible use of AI voice cloning technology and voice cloning software, ensuring consent, transparency, and security in AI-driven voice synthesis and synthetic media innovations.

Anna Bulakh

Head of Ethics and Partnerships

Blending a decade of expertise in international security with a passion for the ethical deployment of AI, I stand at the forefront of shaping how emerging technologies intersect with national resilience and security strategies. As the Head of Ethics and Partnerships at Respeecher, I focus on guiding ethical AI development. My role is centered around promoting the responsible use of AI, especially in synthetic media.
