by Orysia Khimiak – Jan 10, 2022 9:26:25 AM • 8 min

IBC2021 Accelerator: Smart Remote Production For Real-Time Animation

The latest issue of IBCDaily features an article about the "Smart Remote Production for Real-Time Animation" Accelerator, in which Respeecher took part.

Together with VRT, Rai - Radiotelevisione Italiana, Yle, RTE, European Broadcasting Union, Digital Domain, RADiCAL, NVIDIA and Epic Games/Unreal Engine, we will compete against other accelerators at this year's IBC event for the "Media Innovation Project of the Year Award".

For this, we showcase and compare two workflows for a low-budget remote pipeline, using off-the-shelf tools and the power of smartphones and generative AI.

Read the full article below.

Accelerating the animation pipeline without increasing the amount of kit required is the principal aim of the Smart Remote Production For Real-Time Animation Accelerator project. It brings together leading broadcasters and vendors to apply the latest developments in markerless motion capture and speech-driven facial animation to drive CG performances in the Unreal real-time render engine.

“We don’t usually work directly with vendors, so this is very interesting to us,” says RAI’s Roberto Iacoviello.

Indeed, Italy’s RAI is just one of a strong broadcast contingent of Accelerator Champions that also includes RTÉ, VRT, YLE, and the EBU, working alongside the Entertainment Technology Center at USC, Digital Domain, and Unreal Engine developer Epic Games.

With participants including RADiCAL and Respeecher, and guidance from IBC Accelerator supporter Nvidia, this Accelerator brings together talent and expertise to create the most effective low-cost pipeline possible for taking material from script to 3D character in a real-time, distributed workflow environment.

Low cost and accessible

One of the key elements to this Accelerator is the phrase ‘low-cost’. As well as testing the feasibility of using vocal performance and body posture to drive 3D avatars from remotely connected locations, its stated aim is to do all this while using minimal equipment.

Even more so, according to one of the project leads, RTÉ’s Ultan Courtney, the wish is to utilize technology that exists less in production facilities and more in the pockets of the people that work there and their audiences.

“As opposed to having studios doing volumetric capture and other high-end techniques, we want to look at a workflow that engages millions of people,” he says.

“Most households have a smartphone, for example, so we’re working on that basis and what is the most accessible, highest tech we can work with to tell stories and communicate.”

This changes the technological emphasis of performance capture markedly. Instead of a high-technology solution in which performers wear marker-based motion capture suits in a precisely delineated and monitored capture volume, the emphasis shifts to AI processing of a standard video signal.

The AI tracks the performance and converts that into an avatar, mapping limb and facial movements onto a virtual character.
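To make the mapping step concrete, here is a minimal sketch of how tracked keypoints could be turned into a value that drives an avatar joint. It is an illustration only, not the method used by any of the tools in this project: the `joint_angle` helper, the keypoint names, and the pixel coordinates are all assumptions made for the example.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must not coincide")
    # Clamp to guard against tiny floating-point overshoots outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Hypothetical 2D keypoints (in pixels) that a pose estimator might
# report for a single video frame.
frame_keypoints = {
    "shoulder": (100.0, 100.0),
    "elbow": (100.0, 200.0),
    "wrist": (200.0, 200.0),
}

# The elbow bend could then be applied to the matching joint of a character rig.
elbow_bend = joint_angle(
    frame_keypoints["shoulder"],
    frame_keypoints["elbow"],
    frame_keypoints["wrist"],
)
print(f"elbow bend: {elbow_bend:.1f} degrees")
```

A production system would work with 3D joint rotations and smooth them over time; the sketch only shows the basic idea of converting tracked points into a pose parameter.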

“This is the difference between now and a few years ago; there is a lot of generative AI technology that can help us at a low budget,” explains Iacoviello. “There are three aspects where AI has made a lot of progress: marker-less capture, emotional response and speech animation.”

Voice cloning

Speech is one of the areas where this Accelerator is using interesting new generative AI tools. Respeecher is software that effectively clones voices using deep learning techniques. It has already been used on The Mandalorian to create the voice of a young Luke Skywalker, and by the NFL to recreate the voice of deceased football coach Vince Lombardi at the 2021 Super Bowl. The process of creating this AI-generated content is fascinating, and the voice cloning FAQ page explains how voice de-aging works, as well as the entire voice cloning process. Additionally, our voice cloning software allows for precise replication of specific voices, offering versatility in voice generation applications.

“In this project the idea is to make one source speaker be able to speak in several different voices for the POC piece,” says company co-founder Alex Serdiuk.

“To date we’ve offered white glove service where we need to be involved, but this uses our new self-serve Voice Marketplace. There are many challenges for us with this as we lose a lot of control over the recording conditions and performance levels of the actors involved, but it is important to be able to democratise these generative AI tools so that they can be used in the sort of environments that ideally would require just an iPhone.”

Of course, one of the challenges any Accelerator team faces is that while the tools may be available to make the POC a success, getting them to work together in a designated workflow is not always easy.

Sometimes they are lucky and the work has already been done. For instance, once motion has been captured using RADiCAL and uploaded to the cloud, RADiCAL provides an animated FBX file which can be used in both Omniverse and Unreal to retarget the animation to CG characters.

As a further illustration, the wrnch CaptureStream iOS-based AI capture software can already export into Nvidia’s Omniverse real-time simulation and collaboration platform via the wrnch AI Pose Estimator.
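The retargeting step mentioned above can be pictured with a small sketch. It shows only the simplest part of the idea, matching captured joints to the bones of a target character by name. The bone names, the rotation values, and the `retarget_frame` helper are hypothetical, and this is not how RADiCAL, Omniverse, or Unreal implement retargeting.

```python
def retarget_frame(source_rotations, bone_map):
    """Copy per-joint rotations onto the target rig's bone names.

    Joints that have no entry in bone_map are skipped rather than guessed.
    """
    return {
        bone_map[joint]: rotation
        for joint, rotation in source_rotations.items()
        if joint in bone_map
    }


# Hypothetical mapping from capture-skeleton joints to a character's bones.
BONE_MAP = {
    "LeftArm": "upperarm_l",
    "LeftForeArm": "lowerarm_l",
    "Head": "head",
}

# Hypothetical Euler rotations (degrees) for one captured frame.
captured_frame = {
    "LeftArm": (0.0, 0.0, 45.0),
    "LeftForeArm": (0.0, 0.0, 90.0),
    "Tail": (0.0, 10.0, 0.0),  # no counterpart on the target character
}

target_frame = retarget_frame(captured_frame, BONE_MAP)
print(target_frame)
```

Real retargeting also has to account for differences in proportions and rest poses between the two skeletons, which is why having an export path that already works saves the team so much effort.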

However, other areas of the workflow still need work. Despite Omniverse’s success at becoming a universal solvent for 3D tools, there are always challenges.

For example, can the team’s Pose Estimation workflow be easily connected to the speech-to-facial animation workflow that the team is working on? And can the resulting characters then be imported from Omniverse and integrated into Unreal environments?

“We are pushing things forward but not everything is designed to be opened up to other uses. There are a few steps forward and back, and that’s where the Accelerator helps; having a target forces us to crash through all these problems,” says Courtney.

Multiple use cases

As the RAI team - which also includes Alberto Ciprian and Davide Zappia - points out, the TV market is not the only one that will be interested in the concept of realistic real-time character animation.

Virtual influencers, for example, become a distinct possibility, while using the technology in high-end pre-production, or even to drive real-time performances in mixed reality LED capture volumes, is fairly compelling. “When you go into the digital world the use cases are limited only by your imagination,” says Zappia.

Paola Sunna, senior project manager at EBU Technology & Innovation, says: “I was really happy to see so many broadcasters involved in this project. What I really hope already is that we keep going on this project even after IBC is over. We are learning so many things.”

For more information on the IBC Accelerator Media Innovation Programme, supported by Nvidia, visit the IBC website.

Smart Remote Production For Real-Time Animation

Champions: RTÉ, EBU, RAI, VRT, YLE, ETC/USC, Digital Domain, Unreal/Epic Games

Participants: Respeecher, RADiCAL

FAQ

Smart Remote Production leverages AI and accessible technology to create real-time animation with minimal equipment. By using smartphones and generative AI, Smart Remote Production enables low-cost workflows for real-time animation in environments like Unreal Engine, powered by technologies like markerless motion capture and speech-driven facial animation.

Markerless motion capture uses AI-powered video processing to track body movements without requiring physical markers. By analyzing standard video signals, it converts real-world performance into real-time animation, enabling 3D avatar creation with minimal equipment.

Speech-to-facial animation enables characters to reflect emotional nuances from speech in real-time, enhancing realistic animation. Using generative AI and AI-powered speech synthesis, this technology improves the synchronization between voice and facial expressions, allowing for more expressive and engaging CG performances in virtual environments.

Glossary

Smart Remote Production

An approach that combines generative AI, markerless motion capture, speech-to-facial animation, and voice cloning to produce real-time animation in Unreal Engine.

Markerless Motion Capture

An AI technique used in Smart Remote Production that captures movement from standard video without physical markers and uses it to drive real-time animation in Unreal Engine.

Generative AI

A technology driving Smart Remote Production and real-time animation, enabling Markerless motion capture, Voice cloning, and Speech-to-facial animation in Unreal Engine.

Speech-to-Facial Animation

A technique in Smart Remote Production that uses Generative AI to synchronize facial movements with speech, enhancing real-time animation in Unreal Engine.

Voice Cloning

A Generative AI technology that replicates human voices, enabling Smart Remote Production and real-time animation with synchronized speech and facial movements in Unreal Engine.

Unreal Engine

A powerful platform for real-time animation, enabling Smart Remote Production with Generative AI, markerless motion capture, and speech-to-facial animation.

Orysia Khimiak

PR and Comms Manager

For the past 9 years, I have been engaged in global PR for early-stage and AI startups, in particular Reface, Allset, and now Respeecher. Clients have been featured in WSJ, Forbes, Mashable, The Verge, TechCrunch, and Financial Times. For over a year, I have been teaching a PR Basics course at Projector. During the war, I became more actively involved as a fixer and worked with the BBC, The Guardian, and The Times.
