Voice cloning F.A.Q.
A comprehensive list of speech synthesis questions and answers that will tell you everything you want to know
What exactly does Respeecher do?
Respeecher utilizes AI technology to enable one person to speak in the voice of another person. In simpler terms, we take recordings of the "target voice" (the voice that is being replicated), train our system, and apply it to a "source speaker" (the actor reading the lines). The result is not robotic speech; it features all the emotions, intonations, and nuances of a real human voice. Essentially, you can speak into a microphone and our technology can make you sound exactly like a young Luke Skywalker.
How is Respeecher different from other synthetic speech companies in the market?
Respeecher is focused on producing unmatched quality of synthetic speech while preserving all the nuances of human speech. We believe that humans are best for performing, and AI allows them to speak with a completely different voice.
In 2019, Respeecher became the only technology to make it into the biggest Hollywood blockbusters with synthetic speech that is indistinguishable from a real recording. Most synthetic speech technologies on the market are text-to-speech (TTS) systems, which generate vocal output from text. Respeecher instead conveys all the emotions and inflections of a performer while changing the voice itself.
What is the story behind your Emmy award?
Respeecher won its Emmy Award for Interactive Media Documentary for the short film In Event of Moon Disaster. Created in collaboration with MIT's Center for Advanced Virtuality in 2019 and directed by Halsey Burgund and Francesca Panetta, the film features a digitally recreated Richard Nixon reading a speech that was prepared in case Neil Armstrong's Apollo 11 moon landing went fatally wrong. For more information, please check out the case study about this project here.
How is STS (speech-to-speech) different from TTS (text-to-speech)?
The difference between the two is significant. Text-to-speech has a few important limitations:
In most cases, TTS produces unnatural, robotic emotions. The AI doesn't know where to take emotions from, so it tries to generate them based on the text alone.
Very limited control over emotions. Some TTS systems can make the converted voice sound sad or excited using text annotation. But it is hard to manually encode the intricacies of human acting using these annotations alone.
Words only. TTS systems are based on dictionaries, so unknown words and abbreviations pose a significant problem. Natural speech also contains lots of non-verbal content, which TTS struggles to render.
Most TTS systems face challenges with low-resource languages due to higher data requirements.
The Respeecher voice cloning system works solely in the acoustic domain. We convey all the emotions and sounds of the source speaker while converting their timbre and other subtle variations into the target speaker.
Can you use someone's voice?
Only with their permission. Respeecher requires consent from voice owners before we start working on a project. This is a big part of our ethics statement: we're committed to ensuring that our groundbreaking technology is used only for ethical projects and doesn't fall into the wrong hands.
Some ethical questions about synthetic speech are easy, but others are hard. We don't just rely on our gut to tell us what is right. This set of principles guides our decision making:
Respeecher does not allow any deceptive uses of our technology.
Respeecher does not use voices without permission, and does not impact the privacy of the subject or their ability to make a living. In practice, this means we will never use the voice of a private person or an actor without consent. In a handful of cases we have used the voices of historical figures such as Richard Nixon and Barack Obama without permission but non-deceptively and for the purposes of showing what the technology can do. While we will listen to requests, we are generally not open to doing new projects of this nature.
Respeecher does not provide any public API for creating voices.
Respeecher works directly with clients we trust.
Respeecher requires written consent from voice owners.
Respeecher only approves projects that meet our strict standards.
Respeecher is developing watermarking technology that allows us to easily tell Respeecher-generated content from other content, even if it is disguised by being mixed in with other audio.
How can we get permissions for deceased voices?
Usually, the voices of well-known deceased people are represented by their estates or families, and sometimes by libraries (as with American presidents). Our clients reach out to the persons or companies handling the estates and obtain written consent for every project.
Are there voices in the "public domain"?
There could be. But for any of these cases, we'd need to double-check with lawyers. Permission from the family or relatives is the best way to proceed.
What audio quality does your system deliver?
We provide 48 kHz audio. We typically provide raw recordings, and sound engineers on the client side do the post-production and mixing.
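For anyone who wants to confirm the sample rate of a delivered file, the following is a minimal sketch using Python's standard `wave` module. It assumes the delivery is an uncompressed PCM WAV file, which this page does not state; the function names and the demo file are hypothetical and are not part of any Respeecher tooling.

```python
import os
import tempfile
import wave

EXPECTED_RATE_HZ = 48_000  # the 48 kHz figure quoted in the answer above


def sample_rate_hz(path: str) -> int:
    """Return the sample rate, in Hz, stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_expected_rate(path: str, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """Return True if the file's sample rate equals the expected rate."""
    return sample_rate_hz(path) == expected_hz


def write_silent_wav(path: str, rate_hz: int, seconds: float = 0.1) -> None:
    """Write a short mono 16-bit silent WAV file (demonstration helper only)."""
    n_frames = int(rate_hz * seconds)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate_hz)
        wav_file.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    # Hypothetical demo: create a stand-in file and check its sample rate.
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "demo.wav")
        write_silent_wav(demo_path, EXPECTED_RATE_HZ)
        print(sample_rate_hz(demo_path), matches_expected_rate(demo_path))
```

The check only reads the file header, so it says nothing about the audible quality of the recording. Other container formats would need a different library.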
What does the voice cloning process look like?
In terms of the regular voice cloning procedure, we start the process by asking our clients to provide voice recordings of both "target" and "source" voices.
By "target" voice we mean the voice that you want to clone. The "source" voice belongs to the voice actor whose speech serves as a base to drive the cloning. After we receive the recordings, they are evaluated by our Delivery team.
Our system requires high-quality, isolated voice tracks. Both target and source datasets should cover the emotional range expected in the final conversions.
We do not commit to any project without evaluating the data first, and therefore do not make false promises about delivery.
Can you create a new voice?
How does de-aging work?
De-aging is a special case of voice conversion in which we treat different ages of the same person as the target and source speakers.
Can you share more demos?
Please check our video demos page.
What are the limitations of your system?
In general, the quality of our system is limited by the quality of the training material.
Our main services and products are non-real-time (they take some time to render). But we also have a real-time voice conversion prototype in closed beta.
Cross-lingual voice conversion also works for certain languages.
Could your technology enhance TTS?
Yes, it can. We can apply voice conversion on top of text-to-speech (TTS) and convert the TTS output to multiple voices. Moreover, in this process, the speech tends to become more natural.
Do you offer an API?
Do you have a real-time voice cloning system?
Yes, we do, although you can't get access to it via Voice Marketplace. We implement real-time voice cloning models on a project basis. Our real-time system is in beta and in use by corporate clients. Reach out to us with a description of your project that requires a real-time system, and the team will get back to you with details on how it works.
Can you change accents?
We have some control over accents and are developing accent conversion technology. However, depending on the language, the conversions may have a slight US-English accent.
How is voice synthesis similar to image/video synthesis or deepfake?
There are many similarities among the various approaches to creating indistinguishable synthetic speech or a synthetic human image, as well as a number of technical nuances. While voice synthesis, image/video synthesis, and deepfakes all involve the use of AI algorithms to generate or modify media, the specific methods and goals of these technologies may differ. In every case, the process of creating such media is complex and resource-intensive.
Can you specify which Star Wars movies you worked on and what your contribution was in creating AI voices?
From what we’re allowed to disclose at this point, we are in charge of the voice of the young Luke Skywalker in The Book of Boba Fett and The Mandalorian.
Could you provide details about your involvement in the production of Obi-Wan Kenobi?
We can only confirm that we are credited in episodes three, four, five, and six.
How do you cooperate with voice actors?
Voice actors are a critical component of a project's success, providing all the performing talent while we simply modify their voices. In the past, some voice actors were afraid of synthetic voice technology, worried that it would take their jobs. However, Respeecher has shown that the best results come from combining the talents of a good voice actor with voice cloning technology.
We work directly with some voice actors, helping them to scale their performances by simplifying the process of creating character voices and allowing them to speak using the voice they had many years ago. Additionally, voice actors who use our Voice Marketplace can access any voice in our library, enabling them to find work using their ability to perform, their voice, and their professionalism. This approach delivers better voiceover performances, distributes the workload more evenly for content creators, and makes work fairer for voice actors.
Here are some of the opportunities that the technology is capable of delivering:
Work allocation. Usually, one actor = one character. Voice cloning technology makes it possible to hire actors not only for their voice but also for their other acting qualities.
Character voices. Voice actors often have the task of performing unusual voices.
Voice rejuvenation. Voice acting for video games often calls for scenes, such as flashbacks, that feature a younger version of a character.
Multilinguality and accents. Voice cloning can also be used to translate an actor's words into different languages.
Monetization. Actors now have the opportunity to monetize their voices without having to work directly in voice acting.
To learn more about the opportunities that voice cloning offers to voice actors, please read this article.
What are other use cases for synthetic media and voice cloning?
We work in the healthcare industry, providing real-time services that can help people with vocal disabilities communicate better.
How is voice cloning technology involved in healthcare?
Voice cloning is one of those technologies capable of drastically changing the way people with speech and voice disabilities function on a daily basis.
People who suffer from Parkinson’s disease, ALS, multiple sclerosis, vocal fold paralysis, pharyngeal cancer, or other ailments can use Respeecher to replicate their own voices and speak naturally. While we are not yet delivering a final solution for patients, our technology has the potential to be integrated into hardware through licensing for B2B companies, and to help people with neuromuscular problems or a laryngectomy communicate in their normal voices.
How do you ensure that your voice cloning technology is used responsibly and prevent any potential misuse of it?
Respeecher takes responsible use of its voice cloning technology seriously and has implemented various measures to prevent potential misuse. These measures include integrating mitigation tools in both business streams and pre-selecting projects based on our ethical policies.
When working on tailored projects, such as cloning someone's voice, we require permission from the voice owner. Additionally, Respeecher's ethical policy prohibits deceptive uses of synthetic speech, and we have pledged never to use the voice of a private person or actor without their consent. In a few instances, however, historical figures' voices have been used to showcase the technology's potential. Furthermore, Respeecher is developing two technical defenses to help prevent misuse of its voice cloning technology: a synthetic speech detector and audio watermarking.
Find out more on our ethics page.
What other industries or use cases can Respeecher's voice cloning technology potentially expand into in the future?
Our aim is to expand filmmakers' creative horizons and democratize the technology so that even smaller film and TV studios and video game developers can utilize it to stretch their budgets and compete with bigger studios. We want to empower small creators to realize their ideas and creativity without being held back by financial limitations.
We are continuously improving our technology by making it faster, more efficient in terms of data requirements, user-friendly, and less resource-intensive. However, our vision doesn't just stop at the entertainment industry.
The healthcare applications of Respeecher's platform are of paramount importance, and we have also launched several ethics-oriented initiatives to ensure that our technology is used for good and not abused.
We are also working on expanding our applications for cybersecurity professionals: preventing vishing, improving penetration testing, and enhancing the accuracy of voice-based identification.
Sound Designer and Supervisor, Actress
Respeecher is a remarkable tool for Sound Editors. It delivers very high-fidelity recreations of a target voice, with transparent performance-matching of its source. It blows text-to-speech out of the water! The effect is uncanny and incredibly effective and I can imagine a whole slew of uses going forward. I am very excited to have discovered Respeecher, and it will be my go-to for voice recreation in the future, without question.