Respeecher F.A.Q.

Get answers to all of your questions about voice cloning, synthetic media, deepfakes, AI, text-to-speech (TTS) vs speech-to-speech (STS) voice synthesis, and more.



Voice cloning F.A.Q.

A comprehensive list of speech synthesis questions and answers that will tell you everything you want to know.


How is STS (speech-to-speech) different from TTS (text-to-speech)?

The difference between the two is significant. Text-to-speech has a few important limitations:

  1. In most cases, TTS produces unnatural, robotic-sounding emotion. The model has no source for emotions other than the text, so it tries to generate them from the text alone.

  2. Very limited control over emotions. Some TTS systems can make the synthesized voice sound sad or excited using text annotations, but it is hard to encode the intricacies of human acting manually with annotations alone.

  3. Words only. TTS systems are based on dictionaries, so unknown words and abbreviations pose a significant problem. Natural speech also contains a lot of non-verbal content, which TTS struggles to render.

  4. Most TTS systems face challenges with low-resource languages due to higher data requirements.

The Respeecher voice cloning system works solely in the acoustic domain. We preserve all the emotions and sounds of the source speaker while converting their timbre and other subtle variations into those of the target speaker.


Can you use someone's voice?

Only with their permission. Respeecher requires consent from voice owners before we start working on a project. This is a big part of our ethics statement: we're committed to ensuring that our technology is used only for ethical projects and doesn't fall into the wrong hands.

Some ethical questions about synthetic speech are easy, but others are hard. We don't just rely on our gut to tell us what is right. This set of principles guides our decision making:

  • Respeecher does not allow any deceptive uses of our technology.

  • Respeecher does not use voices without permission, and does not impact the privacy of the subject or their ability to make a living. In practice, this means we will never use the voice of a private person or an actor without consent. In a handful of cases we have used the voices of historical figures such as Richard Nixon and Barack Obama without permission, but non-deceptively and for the purpose of showing what the technology can do. While we will listen to requests, we are generally not open to doing new projects of this nature.

  • Respeecher does not provide any public API for creating voices.

  • Respeecher works directly with clients we trust.

  • Respeecher requires written consent from voice owners.

  • Respeecher only approves projects that meet our strict standards.

  • Respeecher is developing watermarking technology that allows us to easily tell Respeecher-generated content from other content, even if it is disguised by being mixed in with other audio.


How can we get permissions for deceased voices?

The voices of well-known deceased people are usually represented by estates or families, and sometimes by libraries (as with American presidents). Our clients reach out to the people or companies handling the estates and obtain written consent for every project.


Are there voices in the "public domain"?

There could be. But for any of these cases, we'd need to double-check with lawyers. Permission from the family or relatives is the best way to proceed.


What audio quality does your system deliver?

We provide 48 kHz audio. We typically deliver raw recordings, and sound engineers on the client side do the post-production and mixing.
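For sound engineers who want to confirm a file's format before mixing, the check below is a minimal sketch using Python's standard `wave` module. It assumes the file is an uncompressed PCM WAV, which is not stated above; the function names and the constant are illustrative only and are not part of any Respeecher tooling.

```python
import wave

EXPECTED_RATE_HZ = 48_000  # 48 kHz, the delivery rate stated above


def describe_wav(path):
    """Return basic format facts for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "duration_s": frames / rate if rate else 0.0,
        }


def is_48khz(path):
    """True if the file's sample rate matches the expected delivery rate."""
    return describe_wav(path)["sample_rate_hz"] == EXPECTED_RATE_HZ
```

Calling `describe_wav` on a received file (for example a hypothetical `delivery.wav`) reports its sample rate, channel count, bit depth, and duration, so a mismatch is caught before the file enters the session.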


What does the voice cloning process look like?

As part of our standard voice cloning procedure, we start by asking clients to provide recordings of both the "target" and "source" voices.

By "target" voice, we mean the voice that you want cloned. The "source" voice belongs to the voice actor whose speech serves as the base that drives the cloning. After we receive the recordings, our Delivery team evaluates them.

Our system requires high-quality, isolated voice tracks. Both target and source datasets should cover the emotional range expected in the final conversions.

We do not commit to any project without evaluating the data first, so we do not make promises about delivery that we cannot keep.


Can you create a new voice?

Yes, by mixing other voices. We also have a voice library in our Voice Marketplace.


How does de-aging work?

De-aging is a special case of voice conversion in which we treat the same person at different ages as the target and source speakers.


Can you share more demos?

Please check our video demos page.


What are the limitations of your system?

In general, the quality of our system is limited by the quality of the training material.

Our main services and products are non-real-time (conversions take some time to render). We also have a real-time voice conversion prototype in closed beta.

Cross-lingual voice conversion also works for certain languages.


Could your technology enhance TTS?

Yes, it can. We can apply voice conversion on top of text-to-speech (TTS) and convert the TTS output to multiple voices. Moreover, in this process, the speech tends to become more natural.
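The flow described above, where one TTS rendering is converted into several voices, can be sketched as a simple two-stage pipeline. This is an illustration of the idea only, not Respeecher's API: `synthesize`, the converter callables, and `render_in_voices` are all hypothetical stand-ins for whatever TTS engine and voice conversion models are actually used.

```python
from typing import Callable, Dict

Audio = bytes  # placeholder type; a real system would use audio buffers or files


def render_in_voices(
    text: str,
    synthesize: Callable[[str], Audio],
    converters: Dict[str, Callable[[Audio], Audio]],
) -> Dict[str, Audio]:
    """Run TTS once, then apply each voice converter to the same TTS output."""
    tts_audio = synthesize(text)  # stage 1: text -> speech (hypothetical engine)
    # stage 2: speech -> speech, one conversion per target voice
    return {name: convert(tts_audio) for name, convert in converters.items()}
```

Because the TTS step runs only once, adding another target voice in this sketch means adding one more converter rather than another TTS voice.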


Do you offer an API?

Yes, we can provide an API. Please contact us and describe your use case for details.


Do you have a real-time voice cloning system?

Yes, we do, but it is not available through the Voice Marketplace. We implement real-time voice cloning models on a project basis. Our real-time system is in beta and in use by corporate clients. Contact us with a description of your project, and the team will get back to you with details on how it works.


Abigail Savage

Sound Designer and Supervisor, Actress

Respeecher is a remarkable tool for Sound Editors. It delivers very high-fidelity recreations of a target voice, with transparent performance-matching of its source. It blows text-to-speech out of the water! The effect is uncanny and incredibly effective and I can imagine a whole slew of uses going forward. I am very excited to have discovered Respeecher, and it will be my go-to for voice recreation in the future, without question.
