by Vova Ovsiienko – Mar 24, 2021 10:44:14 AM • 8 min

Voice Cloning and Video Games: How Game Developers Can Create Synthetic Voices


Video game developers create the stories and plots for their virtual characters, but until recently, voicing those characters wasn’t so different from voicing characters in movies. The process involves recording actors and actresses, along with a substantial investment of time and money in production. However, with the advent of AI voice cloning technology, everything is changing.

What are synthetic voices?

A synthetic voice is a human voice that has been produced by a computer, often using generative AI technologies. At its best, the synthesized voice is indistinguishable from the real one: an outside listener cannot tell the actual person's voice from the machine’s.

How does voice cloning work?

The approach best known to a lay audience is text-to-speech (TTS) synthesis. In this process, the computer reads a text aloud, and the resulting speech is recorded.

The most common example is when you ask Google to read the text you type in services like Google Translate.

This type of speech is easily distinguishable from a natural human voice. In a recent post, we examined the technical aspects of that characteristic robotic voice.

Things get more interesting with speech-to-speech (STS) voice conversion. Imagine you need to dub a video game character, but the original voice actor is no longer available.

This is easy to imagine for gaming franchises that have been around for decades. Unfortunately, video game and cartoon characters can easily outlive their human counterparts.

So how do you capture a character's original voice when you don’t have access to the original actor?

Artificial intelligence and machine learning technologies present a solution to these problems. Let's briefly describe the speech-to-speech character voice generation process, as developed by Respeecher:

1. The first requirement is good-quality audio recordings of the voice being cloned. The recordings should be at least one hour in length. Artificial intelligence cannot create a perfect voice model with anything less.
2. We then feed this data into the machine learning algorithms. The system performs complex calculations to form a model of the original voice. Here it is crucial that the original voice recording includes as many different emotions, elocutions, tones, cadences, vocal timbres, etc., as possible. The more emotionally varied the original speech is, the more accurate the voice model will be.
3. Once the model is formed, it becomes possible to generate an unlimited amount of audio content that is indistinguishable from the original speaker. All that remains is for someone to record the speech that is needed.
4. The final stage is transforming that person's recorded voice into the voice of the original actor. This conversion morphs every characteristic of the recorded speech into the original actor’s authentic voice.
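The first requirement above, at least one hour of recordings, is easy to check before any audio is handed over for modeling. Below is a minimal sketch in Python, assuming the recordings are stored as uncompressed PCM WAV files. The function names and the one-hour constant are our own illustration and are not part of Respeecher's tooling.

```python
import wave
from pathlib import Path
from typing import Iterable

# The "at least one hour" guideline from the first step, in seconds.
MIN_TRAINING_SECONDS = 60 * 60


def wav_duration_seconds(path: Path) -> float:
    """Return the playing time of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_seconds(paths: Iterable[Path]) -> float:
    """Add up the playing time of every recording in the data set."""
    return sum(wav_duration_seconds(p) for p in paths)


def has_enough_audio(paths: Iterable[Path],
                     minimum: float = MIN_TRAINING_SECONDS) -> bool:
    """Return True when the recordings together reach the minimum length."""
    return total_duration_seconds(paths) >= minimum
```

For a folder of takes, something like has_enough_audio(Path("takes").glob("*.wav")) would tell you whether the guideline is met. The folder name here is hypothetical. Note that this only measures length. It says nothing about recording quality or emotional range, which the steps above treat as equally important.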

With this type of AI voice technology now available, video game producers are no longer restricted to dubbing actors. They can even generate unique voices that did not exist before.

Here's how AI voice generation changes game production for the better

Some of the most innovative benefits of Respeecher’s technology, which allow game developers to save time and money on dubbing and automated dialogue replacement (ADR), include:

1. Allowing some of the most famous voice actors to participate in the project. It's primarily about the money and time saved on bringing in a celebrity. If you want to incorporate an A-list star in your project, they will probably appreciate that they only need to provide a single, hour-long voice recording. Based on this, Respeecher can generate an unlimited amount of original speech content.
2. Resurrecting voices from the past. Imagine that in your WWII strategy game, the main characters and villains speak in their original voices. You can also easily replace a voice actor who left your project somewhere between the first and second games.
3. Making it easy to solve the problem of dubbing child actors. When children grow up, their voices change. This can become a problem if your project evolves over time with the same child as the hero. Luckily, you can keep the original character's voice without depending on the original actor who dubbed them. Learn more about all the benefits of cloning a child’s voice here.
4. Making adjustments to game content easier. If the writer or director makes edits to a scene, you no longer need to bring the voice actor back to record the changes. Instead, the sound engineer can easily implement any modifications to the voice content.

Respeecher’s technology gives game development audio teams creative and production flexibility by removing their dependence on specific actors, studios, or production teams. It also makes VO actors interchangeable: any member of the audio production team can now voice any character whose voice is in the system.

5. Letting your best available VO actors play even more characters by giving them access to a variety of male or female voices. This is possible because the technology is gender-agnostic, allowing female-to-male voice conversion and vice versa.

6. Receiving custom voice cloning environments with on-demand voices that allow rapid iteration on VO during all production phases, starting from the storyboard. This saves time and money because production cycles require fewer people and resources, and it gives management better planning capabilities and control over the entire process.

While all of these benefits apply to custom generative AI projects, there is something you can use right out of the box.

Synthetic voice marketplace at your service

For several years, Respeecher’s clients have been asking the same question: "How can we access your library of synthesized voices?" While not every game project has to reproduce the voice of a particular actor, gaming studios can benefit significantly from using one or two voice actors and transforming their speech into unique voices for dozens of game characters.

This is now possible thanks to Respeecher's Voice Marketplace, a library of pre-existing target voices produced by combining characteristics from several voice actors.

The library is filled with AI character voices that do not belong to any particular person. This means that you do not have to worry about copyright. Any licensing arrangement is resolved between you and Respeecher. Currently, the marketplace is the most accessible source of synthetic voices in the world and can reduce the cost of sound production for your game tenfold.

In addition to Respeecher’s synthesized voices, we are adding professional licensed voices to the library. Respeecher plans to connect world-famous actors with the most ambitious gaming projects, making cooperation pleasant and mutually beneficial.

In the meantime, you can find more information here. The marketplace is available for free for the first seven days, so you can see for yourself if it's a good fit for your project.

What voice quality and authenticity can you expect with synthesized speech?

Although most of the projects Respeecher is working on are protected under an NDA, you can get a sense of our work’s quality with a couple of examples.

One of our recent projects was recreating the voice of the famous NFL coach Vince Lombardi for a commercial that aired during Super Bowl LV in February 2021. Check out the original commercial.

Even sound engineering professionals find it difficult to distinguish between the legendary coach's original voice and Respeecher's synthesized version.

Projects for the gaming industry are often much easier to produce, because most video game characters are created from scratch. But even in projects like Cyberpunk 2077 or Beyond: Two Souls, where famous actors participate, Respeecher can dramatically reduce costs by accurately cloning a character's speech.

Feel free to drop us a line if you have any questions or are interested in a customized demo.

FAQ

Deepfake technology uses AI and machine learning to create highly realistic, synthetic media, including videos, audio, and images, mimicking real people or voices for various applications like deepfake marketing or film production.

In marketing, deepfake technology applications allow brands to create personalized, engaging content with AI-powered personalized marketing. This includes generating synthetic media for ads or leveraging AI voice generator tools to create custom experiences for consumers.

The use of deepfake technology raises concerns about manipulation, privacy, and consent. AI voice cloning ethics must be considered to avoid misusing generative AI in advertising, ensuring ethical use of deepfake technology while protecting individuals from deceptive content.

Deepfake technology streamlines content creation, reducing the need for costly reshoots or actor re-engagement. AI voice cloning tools and generative AI can generate an unlimited amount of content, saving both time and money in areas like film production and advertising.

Positive examples include the use of deepfake technology in film production, where it can bring back past actors or create characters without the need for new recordings. Additionally, deepfake marketing can help brands deliver more targeted, engaging ads by leveraging AI voice cloning and synthetic media for business.

Glossary

Deepfake Technology

AI-driven tool that creates realistic synthetic media, including videos and voices, used in deepfake marketing, film production, and AI-powered personalized marketing.

Generative AI in Marketing

Uses AI-powered personalized marketing, deepfake marketing, and synthetic media for business to create targeted ads and content, boosting engagement and efficiency.

AI Voice Cloning Ethics

Focuses on the ethical use of deepfake technology, ensuring responsible use of AI voice generator tools in deepfake marketing and AI-powered personalized marketing.

Synthetic Media

AI-generated content, like deepfake technology applications, used in deepfake marketing, AI-powered personalized marketing, and film production, driving AI content creation benefits.

Personalized Advertising with AI

Uses AI-powered personalized marketing, deepfake marketing, and AI voice generator tools to create targeted, ethical ads with synthetic media for business.

Vova Ovsiienko

Business Development Executive

With a rich background in strategic partnerships and technology-driven solutions, Vova handles business development initiatives at Respeecher. His expertise in identifying and cultivating key relationships has been instrumental in expanding Respeecher's global reach in voice AI technology.