Voice Cloning F.A.Q.

Get answers to all of your questions about voice cloning, synthetic media, deepfakes, AI, text-to-speech (TTS) vs speech-to-speech (STS) voice synthesis, and more.

Respeecher F.A.Q.

A comprehensive list of speech synthesis questions and answers that will tell you everything you want to know

Respeecher utilizes AI technology to enable one person to speak in the voice of another person. In simpler terms, we take recordings of the "target voice" (the voice that is being replicated), train our system, and apply it to a "source speaker" (the actor reading the lines). The result is not robotic speech; it features all the emotions, intonations, and nuances of a real human voice. Essentially, you can speak into a microphone and our technology can make you sound exactly like a young Luke Skywalker.
Respeecher focuses on producing synthetic speech of unmatched quality while preserving all the nuances of human speech. We believe that humans are best at performing, and AI allows them to speak with a completely different voice. In 2019, Respeecher became the only technology to be used in the biggest Hollywood blockbusters, with synthetic speech that is indistinguishable from a real recording. Most synthetic speech technologies on the market are text-to-speech (TTS) systems, which generate speech from text. Respeecher instead conveys all the emotions and inflections of a performer's delivery while changing the voice itself.
Respeecher won its Emmy Award for Interactive Media Documentary for the short film “In Event of Moon Disaster.” Created in collaboration with MIT's Center for Advanced Virtuality in 2019 and directed by Halsey Burgund and Francesca Panetta, the film features a digitally recreated Richard Nixon reading a speech that was prepared in case Neil Armstrong's Apollo 11 moon landing went fatally wrong. For more information, please check out the case study about this project here.
The difference between the two is significant. Text-to-speech has a few important limitations:

  • In most cases, TTS provides unnatural, robotic emotions. AI doesn't know where to take emotions from, so it tries to generate them based on the text alone.
  • Very limited control over emotions. Some TTS systems can make the converted voice sound sad or excited using text annotation, but it is hard to manually encode the intricacies of human acting using these annotations alone.
  • Words only. TTS systems are based on dictionaries, so unknown words and abbreviations pose a significant problem. Natural speech also contains a lot of non-verbal content, which TTS struggles to render.
  • Most TTS systems face challenges with low-resource languages due to higher data requirements.

The Respeecher voice cloning system works solely in the acoustic domain. We convey all the emotions and sounds of the source speaker while converting their timbre and other subtle variations into those of the target speaker.
Only with their permission. Respeecher requires consent from voice owners before we start working on a project. This is a big part of our ethics statement: we're committed to ensuring that our groundbreaking technology is used only for ethical projects and doesn't fall into the wrong hands.

Some ethical questions about synthetic speech are easy, but others are hard. We don't just rely on our gut to tell us what is right. This set of principles guides our decision making:

  • Respeecher does not allow any deceptive uses of our technology.
  • Respeecher does not use voices without permission, and does not impact the privacy of the subject or their ability to make a living. In practice, this means we will never use the voice of a private person or an actor without consent. In a handful of cases we have used the voices of historical figures such as Richard Nixon and Barack Obama without permission, but non-deceptively and for the purpose of showing what the technology can do. While we will listen to requests, we are generally not open to doing new projects of this nature.
  • Respeecher does not provide any public API for creating voices.
  • Respeecher works directly with clients we trust.
  • Respeecher requires written consent from voice owners.
  • Respeecher only approves projects that meet our strict standards.
  • Respeecher is developing watermarking technology that allows us to easily tell Respeecher-generated content from other content, even if it is disguised by being mixed in with other audio.
The voices of well-known deceased people are usually represented by their estates or families, and sometimes by libraries (as with American presidents). Our clients reach out to the people or companies handling the estates and obtain written consent for every project.
There could be. But for any of these cases, we'd need to double-check with lawyers. Permission from the family or relatives is the best way to proceed.
We provide 48 kHz audio. We typically provide raw recordings, and sound engineers on the client side do the post-production and mixing.
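For engineers on the receiving side, a quick sample-rate check before post-production might look like the sketch below. It uses only Python's standard-library `wave` module, which reads uncompressed PCM WAV files. The function names, the `EXPECTED_RATE_HZ` constant, and the in-memory test tone are illustrative assumptions, not part of any Respeecher tooling.

```python
import io
import math
import struct
import wave

# Assumption: the delivery is expected at 48 kHz, as stated in the answer above.
EXPECTED_RATE_HZ = 48_000


def wav_sample_rate(source) -> int:
    """Return the sample rate (Hz) recorded in a PCM WAV file's header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def make_test_tone(rate_hz: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short mono 16-bit 440 Hz tone in memory (a stand-in for a real file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(rate_hz)
        frame_count = int(rate_hz * seconds)
        frames = b"".join(
            struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * i / rate_hz)))
            for i in range(frame_count)
        )
        wav_file.writeframes(frames)
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


def matches_expected_rate(source, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """True if the WAV header reports the expected sample rate."""
    return wav_sample_rate(source) == expected_hz


if __name__ == "__main__":
    print(matches_expected_rate(make_test_tone(48_000)))
    print(matches_expected_rate(make_test_tone(44_100)))
```

In practice, `wav_sample_rate` would be called with the path of a delivered file instead of the generated tone; compressed or non-WAV formats would need a different reader.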
For the regular voice cloning procedure, we start by asking our clients to provide voice recordings of both the "target" and "source" voices. By "target" voice we mean the voice that you want to have cloned. The "source" voice belongs to the voice actor whose speech serves as the base that drives the cloning. After we receive the recordings, they are evaluated by our Delivery team. Our system requires high-quality, isolated voice tracks. Both target and source datasets should cover the emotional range expected in the final conversions. We do not commit to any project without data evaluation, and therefore do not make false promises on delivery.
Yes, by mixing other voices. Also, we have a voice library in our Voice Marketplace.
De-aging is a special case of voice conversion when we treat different ages of the same person as target and source speakers.
In general, the quality of our system is limited by the quality of the training material. Our main services and products are non-real-time (they take some time to render), but we also have a real-time voice conversion prototype in closed beta. Cross-lingual voice conversion also works for certain languages.
Yes, it can. We can apply voice conversion on top of text-to-speech (TTS) and convert the TTS output to multiple voices. Moreover, in this process, the speech tends to become more natural.
Yes, we can provide an API. Please contact us and describe your use case for details.
We have. You can't get access to it via Voice Marketplace. We implement real-time voice cloning models on a project basis. Our real-time system is in beta and in use by corporate clients. Reach out to us with a description of your project and why it requires a real-time system, and the team will get back to you with details on how it works.
We have some control over accents and are developing accent conversion technology. However, depending on the language, the conversions may have a slight US-English accent.
There are many similarities among various approaches, as well as a number of technical nuances, to creating indistinguishable synthetic speech or a synthetic human image. While voice synthesis, image/video synthesis, and deepfakes all involve the use of AI algorithms to generate or modify media, the specific methods and goals of these technologies may differ. However, the process of creating such media is complex and resource-intensive.
From what we’re allowed to disclose at this point, we are responsible for the voice of the young Luke Skywalker in The Book of Boba Fett and The Mandalorian.
We can only confirm that we are credited in episodes three, four, five, and six.

Voice actors are a critical component of a project's success, providing all the performing talent while we simply modify their voices. In the past, some voice actors were afraid of synthetic voice technology, worried that it would take their jobs. However, Respeecher has shown that the best results come from combining the talents of a good voice actor with voice cloning technology.

We work directly with some voice actors, helping them to scale their performances by simplifying the process of creating character voices and allowing them to speak using the voice they had many years ago. Additionally, voice actors who use our Voice Marketplace can access any voice in our library, enabling them to find work using their ability to perform, their voice, and their professionalism. This approach delivers better voiceover performances, distributes the workload more evenly for content creators, and makes work fairer for voice actors.

Here are some of the opportunities that the technology is capable of delivering:

  • Work allocation. Usually, one actor = one character. Voice cloning technology makes it possible to hire actors not only for their voice but also for their other acting qualities.
  • Character voices. Voice actors often have the task of performing unusual voices.
  • Voice rejuvenation. Video game voice acting often calls for scenes, such as flashbacks, that feature a younger version of a character.
  • Multilinguality and accents. Voice cloning can also be used to translate an actor's words into different languages.
  • Monetization. Actors now have the opportunity to monetize their voices without having to work directly in voice acting.

To learn more about the opportunities that voice cloning offers to voice actors, please read this article.

We work in the healthcare industry, providing real-time services that can help people with vocal disabilities communicate better.

Voice cloning is one of those technologies that can drastically change how people with speech and voice disabilities function on a daily basis.

People who suffer from Parkinson’s disease, ALS, multiple sclerosis, vocal fold paralysis, pharyngeal cancer, or other ailments can use Respeecher to replicate their own voices and speak naturally. While we are not yet delivering a final solution for patients, our technology has the potential to be integrated into hardware through licensing to B2B companies, helping people with neuromuscular conditions and people who have had a laryngectomy communicate in their normal voices.

Respeecher takes responsible use of its voice cloning technology seriously and has implemented various measures to prevent potential misuse. These measures include integrating mitigation tools in both business streams and pre-selecting projects based on our ethical policies.

When working on tailored projects, such as cloning someone's voice, we require permission from the voice owner. Additionally, Respeecher's ethical policy prohibits deceptive uses of synthetic speech, and we have pledged never to use the voice of a private person or actor without their consent. In a few instances, however, historical figures' voices have been used to showcase the technology's potential. Furthermore, Respeecher is developing two technical defenses, a synthetic speech detector and audio watermarking, to help prevent misuse of its voice cloning technology.

Find out more on our ethics page.

Our aim is to expand filmmakers' creative horizons and democratize the technology so that even smaller film and TV studios and video game developers can utilize it to stretch their budgets and compete with bigger studios. We want to empower small creators to realize their ideas and creativity without being held back by financial limitations.

We are continuously improving our technology by making it faster, more efficient in terms of data requirements, user-friendly, and less resource-intensive. However, our vision doesn't just stop at the entertainment industry.

The healthcare applications of Respeecher's platform are of paramount importance, and we have also launched several ethics-oriented initiatives to ensure that our technology is used for good and not abused.

We are also expanding our applications for cybersecurity professionals: preventing vishing (voice phishing), improving penetration testing, and enhancing the accuracy of voice-based identification.
