Voice Cloning | Blog - Respeecher

How Voice Cloning Makes Dubbing and Localization Easier: The 3 Biggest Benefits for Studios

Written by Alex Serdiuk | May 19, 2021 6:03:00 PM

Is it possible to adapt an actor's authentic dialog from one language to another so that nothing is lost in translation?

Things to consider include idioms, cultural specifics, and tone; the list goes on. What's more, the technical side of the process demands time, patience, and experience to manage properly.

With machine learning and new AI voice cloning and AI dubbing technologies, the $2.5 billion localization and dubbing market is undergoing a major disruption. Today’s article explores the most relevant factors.

Challenges that dubbing studios must overcome

Of the $2.5 billion market mentioned above, 70% ($1.75 billion) comes from dubbing. With the growing popularity of streaming services like Netflix, demand for dubbing is on the rise. It is safe to assume that the market will grow by another 30% over the next few years.

Simply put, dubbing is the process of translating original speech and replacing it with the same lines spoken in another language.

Television shows, movies, and animated films are the most common examples of dubbed content. Every year, hundreds of films are dubbed into dozens of international languages in Hollywood alone.

Dubbing a video with a dozen voice actors and one and a half to two hours of audio material can take months. Here are the main reasons the dubbing process takes so long.

Dubbing and cultural adaptation

When initiating a dub, a studio has to consider the target country's cultural context: references, jokes, names, idiomatic expressions, and so on.

A literal translation of the dialog will not be understandable to the audience and can sometimes even offend viewers. Not all phrases and cultural references are appropriate in every region, so the text of the original dialog has to be edited accordingly.

Synchronization difficulties

Three types of synchronization must be taken into account to allow a dubbed voice to fit the original video.

  • Lip-sync: the voice is synchronized with the mouth articulations of onscreen actors.
  • Kinesic: the voice is synchronized with body movements.
  • Isochrony: the dubbed line matches the duration of the original actor's utterances.

The problem is that the same phrase can take noticeably more (or less) time to pronounce in another language than in the original.

As a result, the audio ceases to correspond with what is happening on screen, and this discrepancy spoils the experience for viewers watching the dubbed film.

Business complications

To launch their content in multiple markets, production companies rely on the services of dubbing studios around the world. The dubbing market is very conservative, and over the past decades the same flaws have surfaced at studio after studio.

  1. There are too few high-quality service providers.
  2. There is a limited pool of voice actors employed by regional dubbing leaders, so the same voices end up in almost every film released in a particular region.
  3. As a result, studios have a near-constant queue of projects, which leads to significant delays in production schedules.


How speech-to-speech voice cloning is disrupting the dubbing industry

If you're not familiar with voice cloning, you can explore its applications in industries like film and TV, game development, and dubbing and localization, as well as learn how Respeecher's technology works.

In short, Respeecher's AI voice generator technology allows you to clone any person's voice so that it sounds like the voice of another person, provided the AI has a sufficiently long audio recording of the target voice.

Even when a client lacks high-resolution source audio, Respeecher can make it work: we have built an audio version of a super-resolution algorithm to deliver high-resolution audio across the board. You can download our audio super-resolution whitepaper to find out more.

In practice, this means that through an AI voice changer your voice can be transformed, for example, into Beyoncé's voice. With this type of technology, your gender doesn't even matter.

The resulting recording will retain all the emotional accents of your original performance and will come out the other end sounding like the famous singer.

This is what Shaun Cashman, Emmy award-winning animation Producer/Director has to say about the technology: