
What Are Deepfakes: Synthetic Media Explained

Written by Alex Serdiuk | Jun 15, 2021 10:43:48 AM

Deepfakes are one of the most striking phenomena of the last five years in the world of synthetic media. Many people are afraid of this technology, while others have figured out how to put it to productive use. It's time to figure out what deepfakes are exactly, and what makes them so significant in the world of modern media and generative AI technologies.

What is synthetic media and what makes up its market landscape?

Synthetic media is AI-generated or AI-modified content. With traditional media, people relied on broadcasting networks (radio and TV) and social networks to create and distribute their content.

This way of creating and consuming content came with certain restrictions, which were, of course, significantly loosened by the emergence of social networks.

With synthetic media, creators can produce content of a quality that was previously available only to major studios with massive budgets.

AI content is cheaper and easier to scale. However, this democratization of content creation, facilitated by AI voice generators and voice AI technologies, comes with ethical considerations, notably the differentiation of AI-synthesized content from genuine content.

Respeecher, a pioneer in voice cloning technology, uses watermarking that makes it simple to distinguish Respeecher-generated content from other audio, even when it is mixed in with other sounds. As a key player in the voice cloning market, we take ethics very seriously, which is why we follow a strict voice cloning ethics code. Find out more on the Respeecher FAQ page.

While the tech community and policymakers formulate proper regulation, movie studios, video bloggers, and the education sector are reaping the benefits of this technology.

The current landscape for the synthetic media market has been covered in detail by a recent Samsung Next study. Here are the key media sectors disrupted by this emerging technology: 

  1. Speech and voice synthesis
  2. Music and sound synthesis
  3. Image synthesis
  4. Video synthesis
  5. Game content synthesis
  6. Digital avatar synthesis
  7. Mixed reality synthesis
  8. Natural-language generation

Keep in mind that the majority of synthetic media use cases are run on deepfake technology.

Deepfakes in a nutshell 

In short, deepfakes are artificial-intelligence-based image and sound synthesis techniques. They are used to join and overlay existing images, videos, and soundtracks onto original content.

In most cases, deepfakes rely on generative adversarial networks (GANs) to create this type of content. One part of the algorithm, the generator, learns from real media and produces synthetic images that literally "compete" with the other part, the discriminator, which tries to tell the generated copies apart from the originals. Training continues until the discriminator starts confusing the generated copies with the real thing.
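To make that generator-versus-discriminator "competition" concrete, here is a minimal, illustrative GAN training loop written in PyTorch. The layer sizes, learning rates, and the flattened 64×64 frames are simplified assumptions for the sketch, not anyone's production pipeline:

```python
import torch
import torch.nn as nn

latent_dim = 100       # random noise fed to the generator
image_dim = 64 * 64    # flattened grayscale frame, kept simple on purpose

# Generator: turns random noise into a synthetic image.
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)

# Discriminator: scores how likely an image is to be real.
D = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images: torch.Tensor):
    """One round of the two-player game on a batch of real images."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Teach the discriminator to separate real frames from generated ones.
    fakes = G(torch.randn(batch, latent_dim)).detach()
    d_loss = bce(D(real_images), real_labels) + bce(D(fakes), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Teach the generator to fool the discriminator into a "real" verdict.
    g_loss = bce(D(G(torch.randn(batch, latent_dim))), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

In practice, training is stopped when the discriminator's guesses hover close to chance, which is exactly the "confusion" described above.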

Here's how deepfakes work in three key steps (using video production as an example):

  1. It begins with feeding original video or audio of the target subject into a neural network. Autoencoder and GAN algorithms go to work, analyzing the subject's facial expressions and key features.
  2. Combining an autoencoder with a GAN lets the algorithm keep generating fake images until the discriminator can no longer distinguish them from the originals.
  3. Footage of the stunt double is then fed into the network. Having learned the target subject's facial characteristics, the network can generate the deepfake: the target's face is overlaid onto the video of the stunt double (a minimal sketch of this encoder/decoder swap follows the list).
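The classic face-swap recipe behind these three steps is a shared encoder with one decoder per identity: the encoder learns pose and expression, while each decoder learns to render one specific face. Below is a minimal sketch of that idea in PyTorch; the layer sizes, the flattened 64×64 frames, and the names `decoder_target` / `decoder_double` are illustrative assumptions, not a production face-swap system:

```python
import torch
import torch.nn as nn

face_dim, code_dim = 64 * 64, 128   # flattened frames, compact latent code

# One encoder shared by both identities: it learns identity-independent
# features such as pose, expression, and lighting.
encoder = nn.Sequential(nn.Linear(face_dim, 512), nn.ReLU(),
                        nn.Linear(512, code_dim))

# One decoder per identity: each learns to paint its own person's face.
decoder_target = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                               nn.Linear(512, face_dim), nn.Sigmoid())
decoder_double = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                               nn.Linear(512, face_dim), nn.Sigmoid())

mse = nn.MSELoss()
opt = torch.optim.Adam([*encoder.parameters(),
                        *decoder_target.parameters(),
                        *decoder_double.parameters()], lr=1e-4)

def train_step(target_faces: torch.Tensor, double_faces: torch.Tensor):
    # Each decoder learns only to reconstruct its own person's frames.
    loss = (mse(decoder_target(encoder(target_faces)), target_faces) +
            mse(decoder_double(encoder(double_faces)), double_faces))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def swap(double_frame: torch.Tensor) -> torch.Tensor:
    # The trick at inference time: encode the stunt double's frame, but
    # decode it with the *target's* decoder, yielding the target's face
    # in the double's pose and expression.
    with torch.no_grad():
        return decoder_target(encoder(double_frame))
```

Real tools do this with convolutional networks on aligned face crops, often adding a GAN-style discriminator to sharpen the output, but the swap itself is exactly this move: encode with the shared encoder, decode with the other person's decoder.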

Voice cloning relies on different algorithms, but the overall process is practically the same.

Deepfake use cases 

The most common examples of deepfakes are videos in which the authors swap people's faces with those of other actors. You can find many deepfake cosplays of Hollywood actors like Tom Cruise or Arnold Schwarzenegger on the web. Less often, you come across genuinely original projects where the technology is used as an art form in its own right.

One such project is the resurrection of Vince Lombardi for the Super Bowl. Respeecher created Lombardi's speech for this project, and you can hear for yourself how well the final product turned out and how it showcases the potential of AI voice cloning to breathe life into historical figures.

Here's what Abigail Savage, sound designer and actress who starred in Orange Is the New Black, had to say about Respeecher's AI-synthesized voice cloning: