Voice Cloning F.A.Q.

Get answers to all of your questions about voice cloning, synthetic media, deepfakes, AI, text-to-speech (TTS) vs speech-to-speech (STS) voice synthesis, and more.

Respeecher F.A.Q.

A comprehensive list of speech synthesis questions and answers that will tell you everything you want to know

Respeecher utilizes AI technology to enable one person to speak in the voice of another person. In simpler terms, we take recordings of the "target voice" (the voice that is being replicated), train our system, and apply it to a "source speaker" (the actor reading the lines). The result is not robotic speech; it features all the emotions, intonations, and nuances of a real human voice. Essentially, you can speak into a microphone and our technology can make you sound exactly like a young Luke Skywalker.

Respeecher is focused on producing unmatched quality of synthetic speech while preserving all the nuances of human speech. We believe that humans are best at performing, and AI allows them to speak in a completely different voice. In 2019, Respeecher became the only technology to make it into the biggest Hollywood blockbusters with synthetic speech that is indistinguishable from a real recording. Most synthetic speech technologies on the market are TTS, which generate speech from text. Respeecher conveys all the emotions and inflections of a performer but changes the voice itself.

Respeecher won its Emmy Award for Interactive Media Documentary for the short film “In Event of Moon Disaster.” Created in collaboration with MIT's Center for Advanced Virtuality in 2019 and directed by Halsey Burgund and Francesca Panetta, the film features a digitally recreated Richard Nixon reading a speech that was prepared in case Neil Armstrong's Apollo 11 moon landing went fatally wrong. For more information, please check out the case study about this project here.

The difference between the two is significant. Text-to-speech has a few important limitations:

Robotic emotions. In most cases, TTS produces non-natural, robotic emotions. The AI doesn't know where to take emotions from, so it tries to generate them from the text alone.
Very limited control over emotions. Some TTS systems can make the converted voice sound sad or excited using text annotations, but it is hard to manually encode the intricacies of human acting with annotations alone.
Words only. TTS systems are based on dictionaries, so unknown words and abbreviations pose a significant problem. Natural speech also contains a lot of non-verbal content, which TTS struggles to render.
Low-resource languages. Most TTS systems face challenges with low-resource languages due to their higher data requirements.

The Respeecher voice cloning system works solely in the acoustic domain. We convey all the emotions and sounds of the source speaker while converting their timbre and other subtle variations into those of the target speaker.

Only with their permission. Respeecher requires consent from voice owners before we start working on a project. This is a big part of our ethics statement: we're committed to ensuring that our groundbreaking technology is only used for ethical projects and doesn't fall into the wrong hands.

Some ethical questions about synthetic speech are easy, but others are hard. We don't just rely on our gut to tell us what is right. This set of principles guides our decision making:

Respeecher does not allow any deceptive uses of our technology.
Respeecher does not use voices without permission, and does not impact the privacy of the subject or their ability to make a living. In practice, this means we will never use the voice of a private person or an actor without consent. In a handful of cases we have used the voices of historical figures such as Richard Nixon and Barack Obama without permission, but non-deceptively and for the purpose of showing what the technology can do. While we will listen to requests, we are generally not open to new projects of this nature.
Respeecher does not provide any public API for creating voices.
Respeecher works directly with clients we trust.
Respeecher requires written consent from voice owners.
Respeecher only approves projects that meet our strict standards.
Respeecher is developing watermarking technology that allows us to easily tell Respeecher-generated content from other content, even if it is disguised by being mixed in with other audio.

The voices of well-known deceased people are usually represented by their estates or families, and sometimes by libraries (as with American presidents). Our clients reach out to the persons or companies handling the estates and obtain written consent for every project.

There could be. But for any of these cases, we'd need to double-check with lawyers. Permission from the family or relatives is the best way to proceed.

We provide 48 kHz audio. We typically provide raw recordings, and sound engineers on the client side do the post-production and mixing.
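If you want to confirm that a delivered file actually uses the 48 kHz sample rate before post-production, a quick check with Python's standard `wave` module might look like the sketch below. It assumes the deliverables are uncompressed PCM WAV files (the FAQ does not name a container format), and `check_sample_rate` is a hypothetical helper, not part of any Respeecher tool.

```python
import sys
import wave

# The FAQ states deliveries are 48 kHz; the WAV container is an assumption.
EXPECTED_RATE_HZ = 48_000


def check_sample_rate(path: str, expected: int = EXPECTED_RATE_HZ) -> bool:
    """Return True if the WAV file at `path` has the expected sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected


if __name__ == "__main__":
    # Check every file path passed on the command line.
    for p in sys.argv[1:]:
        status = "ok" if check_sample_rate(p) else "unexpected sample rate"
        print(f"{p}: {status}")
```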

In a regular voice cloning project, we start by asking our clients to provide voice recordings of both the "target" and "source" voices. By "target" voice we mean the voice that you want cloned. The "source" voice belongs to the voice actor whose speech drives the cloning. After we receive the recordings, our Delivery team evaluates them.

Our system requires high-quality, isolated voice tracks. Both target and source datasets should cover the emotional range expected in the final conversions.

We do not commit to any project without evaluating the data, and therefore do not make false promises about delivery.

Yes, by mixing other voices. Also, we have a voice library in our Voice Marketplace.

De-aging is a special case of voice conversion in which we treat different ages of the same person as the target and source speakers.

In general, the quality of our system is limited by the quality of the training material.

Our main services and products are non-real-time (they take some time to render), but we also have a real-time voice conversion prototype in closed beta.

Cross-lingual voice conversion also works for certain languages.

Yes, it can. We can apply voice conversion on top of text-to-speech (TTS) and convert the TTS output to multiple voices. Moreover, in this process, the speech tends to become more natural.

Yes, we can provide an API. Please contact us with a description of your use case for details.

We have. You can't get access to it via Voice Marketplace. We implement real-time voice cloning models on a project basis. Our real-time system is in beta and in use by corporate clients. Reach out to us describing your project that requires a real-time system and the team will get back to you regarding details on how it works.

We have some control over accents and are developing accent conversion technology. However, depending on the language, the conversions may have a slight US-English accent.

Approaches to creating indistinguishable synthetic speech or a synthetic human image share many similarities, but they also differ in a number of technical nuances. Voice synthesis, image/video synthesis, and deepfakes all use AI algorithms to generate or modify media, but the specific methods and goals of these technologies may differ. In every case, creating such media is complex and resource-intensive.

From what we’re allowed to disclose at this point, we are in charge of young Luke Skywalker's voice in The Book of Boba Fett and The Mandalorian.

We can only confirm that we are credited in episodes three, four, five, and six.

Voice actors are a critical component of a project's success, providing all the performing talent while we simply modify their voices. In the past, some voice actors were afraid of synthetic voice technology, worried that it would take their jobs. However, Respeecher has shown that the best results come from combining the talents of a good voice actor with voice cloning technology.

We work directly with some voice actors, helping them to scale their performances by simplifying the process of creating character voices and allowing them to speak using the voice they had many years ago. Additionally, voice actors who use our Voice Marketplace can access any voice in our library, enabling them to find work using their ability to perform, their voice, and their professionalism. This approach delivers better voiceover performances, distributes the workload more evenly for content creators, and makes work fairer for voice actors.

Here are some of the opportunities that the technology is capable of delivering:

Work allocation. Usually, one actor voices one character. Voice cloning technology makes it possible to hire actors not only for their voice but also for their other acting qualities.
Character voices. Voice actors often have the task of performing unusual voices.
Voice rejuvenation. It often happens that for the voice acting of video games, scenes like flashbacks with a younger version of a character are needed.
Multilinguality and accents. Voice cloning can also be used to translate an actor's words into different languages.
Monetization. Actors now have the opportunity to monetize their voices without having to work directly in voice acting.

To learn more about the opportunities that voice cloning offers to voice actors, please read this article.

We work in the Healthcare industry, providing real-time services that can help people with vocal disabilities communicate better.

Voice cloning is one of those technologies capable of drastically changing the way people with speech and voice disabilities function on a daily basis.
People who suffer from Parkinson’s disease, ALS, multiple sclerosis, vocal fold paralysis, pharyngeal cancer, or other ailments can use Respeecher to replicate their own voices and speak naturally. While we are not yet delivering a final solution for patients, our technology has the potential to be integrated into hardware through licensing to B2B companies, and to help people with neuromuscular conditions or a laryngectomy communicate in their normal voices.

Respeecher takes responsible use of its voice cloning technology seriously and has implemented various measures to prevent potential misuse. These measures include integrating mitigation tools in both business streams and pre-selecting projects based on our ethical policies.

When working on tailored projects, such as cloning someone's voice, we require permission from the voice owner. Additionally, Respeecher's ethical policy prohibits deceptive uses of synthetic speech, and we have pledged never to use the voice of a private person or actor without their consent. In a few instances, however, historical figures' voices have been used to showcase the technology's potential. Furthermore, Respeecher is developing two technical defenses, a synthetic speech detector and audio watermarking, to help prevent potential misuse of its voice cloning technology.

Find out more on our ethics page.

Our aim is to expand filmmakers' creative horizons and democratize the technology so that even smaller film and TV studios and video game developers can utilize it to stretch their budgets and compete with bigger studios. We want to empower small creators to realize their ideas and creativity without being held back by financial limitations.

We are continuously improving our technology by making it faster, more efficient in terms of data requirements, user-friendly, and less resource-intensive. However, our vision doesn't just stop at the entertainment industry.

The healthcare applications of Respeecher's platform are of paramount importance, and we have also launched several ethics-oriented initiatives to ensure that our technology is used for good and not abused.

We are also expanding our applications for cybersecurity professionals: preventing vishing, improving penetration testing, and enhancing the accuracy of voice-based identification.

A voiceover artist, also known as a voice over actor, performs voice over work by narrating content, often utilizing voice over software or AI voiceover technology.

To become a voiceover actor, start by training in voice acting, build a portfolio of your voice over work, and use platforms like the best voice over websites to scale and monetize your voice.

Yes, several voice over generators offer free services. These AI voiceover tools allow you to create voiceovers without cost, ideal for beginners or low-budget projects.

Respeecher Marketplace offers a free 3-day trial and great package deals for TTS and STS credits.

The best AI voiceover generator offers high-quality, realistic outputs, with features like AI voice cloning and celebrity voice over generator options, enhancing voice over AI experiences.

Respeecher is the best AI voice generator for celebrities. It offers high-quality, realistic voice cloning, including both male and female voice changer options; gives celebrities full control over the way their voice is used; and lets them preserve, scale, and further monetize their voice.

Respeecher, the most popular AI voice generator, offers Hollywood-quality AI voice output. With its groundbreaking voice cloning technology and high ethical standards, it is the best solution for various voice AI applications.

Yes. If the celebrity gives explicit consent for their voice to be used in a celebrity voice generator, you can use voice AI to add celebrity voices to your software or to generate a voice over with a celebrity voice.

If a celebrity has given explicit consent for their voice to be used, you can use voice changer software that includes singing capabilities. Respeecher's software can convert singing voices perfectly.

Keep in mind that generating deepfakes is unethical unless it's done for educational purposes and the AI nature is clearly indicated.

Voice cloning legality depends on jurisdiction and consent. Using AI voices through voice cloning without permission is illegal.
Respeecher's high ethical standards help you stay compliant with the laws and regulations when using voice AI technologies.

Yes, but only with a custom plan.
Imagine you want to change your voice during a live podcast or telephone call, while playing a video game, or even while singing a song on stage. This is what we call real-time voice conversion.

Respeecher is actively developing a real-time voice conversion system. However, at this time, real-time conversion is not available in Voice Marketplace, and we must charge considerably more for it than we do for Voice Marketplace.

If you need real-time voice conversion today, please contact us. Otherwise, stay tuned, and hopefully soon this will be yet another technology that we first develop and then democratize.

Only with a custom plan and with permission from the voice owner.

Yes, but only with a custom plan.

At the moment, converting a person's voice into their own is only possible through custom plans, as it demands more time and resources to train new models compared to standard conversion options. However, as the technology advances, this capability may become available in standard plans.

Yes, you can use your conversions in commercial projects for all plans except for the free trial plan.

With Speech-to-Speech, you can speak any language.

You can speak any language you like, and the system will work. However, since the system is mainly trained on English, you may find that it adds a slight English accent to the conversions.

This accent is more pronounced in some cases than in others, and many customers don't notice it at all. If you find that the accent makes our system unacceptable for your use case, please let us know and also check back later since we are working on eliminating this accent.

Calibrate your voice and ensure you have good recording conditions.

Improving the audio quality of the conversion can be achieved through several methods. Firstly, it's crucial to ensure that the voice calibration process has been completed accurately, as this allows the conversion software to better interpret and modify the person's voice.

Additionally, having good recording conditions is also essential. This means using a high-quality microphone and recording in a quiet environment with minimal background noise.

Finally, it's recommended to speak clearly and consistently throughout the recording to ensure the best possible audio quality for the conversion.

Calibration is the process of collecting information about a person's voice, such as their average pitch.

Calibration allows the system to understand how high or low your voice is so that it knows how much to shift its pitch when converting to different speakers. You don't need to use different calibrations for different recording setups, but you may need to use a different calibration for singing or for a character voice.
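To illustrate what "how much to shift its pitch" means, here is a simplified sketch. It assumes a calibration can be reduced to a single average fundamental frequency (F0) in Hz and expresses the shift between two speakers in semitones using the standard relation 12 · log2(f_target / f_source). This is a textbook pitch-ratio calculation, not Respeecher's conversion model, which the FAQ does not describe; `Calibration` and `pitch_shift_semitones` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Calibration:
    """Hypothetical calibration record: only the speaker's average pitch (F0) in Hz."""
    name: str
    mean_f0_hz: float


def pitch_shift_semitones(source: Calibration, target: Calibration) -> float:
    """Semitones needed to move the source's average pitch onto the target's.

    Positive values mean shifting up, negative values mean shifting down.
    """
    if source.mean_f0_hz <= 0 or target.mean_f0_hz <= 0:
        raise ValueError("average pitch must be positive")
    return 12 * math.log2(target.mean_f0_hz / source.mean_f0_hz)


speaking = Calibration("my speaking voice", 110.0)
character = Calibration("my character voice", 165.0)
target = Calibration("target speaker", 220.0)

print(round(pitch_shift_semitones(speaking, target), 2))   # 12.0 (one octave up)
print(round(pitch_shift_semitones(character, target), 2))  # 4.98
```

The second call shows why a separate calibration can matter: the same target needs a different shift depending on whether you start from your normal speaking pitch or from a higher character voice.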

To see your calibrations or submit a new calibration, click on the Calibrations option in the main navigation menu.

No, minutes included in your subscription do not carry over from month to month. They reset on the day of the month that your subscription originally started.

However, any Credits you have purchased with the Pay as You Go plan will carry over from month to month for a whole year.
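The two rules above can be sketched as simple date arithmetic. This is a hypothetical illustration, not Respeecher's billing code. It assumes that when the subscription's start day does not exist in a given month (for example, the 31st in April), minutes reset on that month's last day, and it reads "a whole year" as 365 days from purchase; the FAQ states neither detail.

```python
import calendar
from datetime import date, timedelta


def next_minutes_reset(subscription_start: date, today: date) -> date:
    """Next date strictly after `today` on which subscription minutes reset.

    Assumption: if the start day does not exist in a month, reset on that
    month's last day instead.
    """
    year, month = today.year, today.month
    while True:
        last_day = calendar.monthrange(year, month)[1]
        candidate = date(year, month, min(subscription_start.day, last_day))
        if candidate > today:
            return candidate
        # This month's reset has already passed; try the next month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def credits_expire_on(purchase_date: date) -> date:
    """Assumption: Pay as You Go credits stay usable for 365 days after purchase."""
    return purchase_date + timedelta(days=365)


print(next_minutes_reset(date(2024, 1, 31), date(2024, 2, 10)))
print(credits_expire_on(date(2024, 3, 1)))
```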

Yes. Choose Credits with the Pay as You Go plan.

With Pay as You Go, you only pay for the conversions you need. No commitment to monthly subscriptions, no hassle, no burning money on unused minutes and characters.

Yes, we do. Learn more about the Respeecher API.

Our API allows you to integrate our text-to-speech (TTS) and speech-to-speech (STS) capabilities into your applications, platforms, or services. For detailed information about our API, including endpoints, parameters, authentication methods, and examples, you can refer to our public web specification.

Look for the Manage Subscription button on the Plans page.

On Respeecher Marketplace, click on your name in the upper right corner. Click on the Plans button in the menu that opens. There, you can either pick a subscription plan or purchase credits with the Pay as You Go plan.

Look for the Manage Subscription button on the Plans page.

On Respeecher Marketplace, click on your name in the upper right corner. Click on the Plans button in the menu that opens. On the screen that opens, you can cancel your subscription.

You will keep your access until the end of your trial or (if your trial has already finished) until the end of the time that you have paid for.

Refunds are at the sole discretion of Respeecher.

As a rule of thumb, we will approve requests for refunds of payments made in the past 30 days if you have not made any conversions since making the payment.

If you have questions about refunds, contact us.
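The rule of thumb above can be expressed as a simple eligibility check. This is a hypothetical sketch: it treats "the past 30 days" as inclusive of day 30, and a `True` result only means a request fits the stated guideline, since refunds remain at Respeecher's sole discretion.

```python
from datetime import date, timedelta

# Assumption: "the past 30 days" includes the 30th day after payment.
REFUND_WINDOW = timedelta(days=30)


def likely_refundable(payment_date: date, today: date, conversions_since_payment: int) -> bool:
    """Check the FAQ's rule of thumb: paid recently and no conversions since paying."""
    within_window = timedelta(0) <= today - payment_date <= REFUND_WINDOW
    return within_window and conversions_since_payment == 0


print(likely_refundable(date(2024, 5, 1), date(2024, 5, 20), 0))
print(likely_refundable(date(2024, 5, 1), date(2024, 5, 20), 3))
```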

Look for the Delete Account button on the Account page.

On Respeecher Marketplace, click on your name in the upper right corner. Click on Account in the menu that opens. On the Account page, you'll find the Delete Account button in the bottom right corner.

Please be aware that the data deletion request may take up to 30 days to be fully processed.

STS stands for speech-to-speech, while TTS stands for text-to-speech.

STS (speech-to-speech) involves converting the audio recording of one person’s speech into the speech of another person. TTS (text-to-speech) converts written text into spoken audio.

Narration styles are available only for a subset of voices and exclusively for English, French, and German input.

If you don’t see Narration styles available for a particular voice, it may be because that voice does not support these advanced styles, or the text you’re trying to convert is not in English, French, or German. We continually work to expand support for these advanced styles across different voices and languages to enhance your experience.

Keep up with a rapidly evolving industry

Get the monthly newsletter keeping thousands of sound professionals in the loop and up-to-date
