New Ethical Dilemma in Voice Synthesis: Vishing and Its Consequences
Feb 21, 2023 8:22:15 AM
The emergence of technology that can generate realistic-sounding voices has created an ethical dilemma: how can we trust voice synthesis technology when criminals can use it to deceive people? In this blog, we'll look at one particular type of fraud enabled by AI voices called vishing, its potential consequences, and how to overcome this latest ethical dilemma.
Voice synthesis introduction
Voice synthesis technology is often described as an advanced form of speech-to-speech (STS) conversion.
This technology clones human speech using recordings from both the target and source voices. Voice cloning operates using advanced AI/ML algorithms to produce a unique, natural voice with the same tonal characteristics as the original speaker.
STS is quite different from a common text-to-speech (TTS) application, which relies on dictionaries and annotations to generate emotion. In contrast, STS utilizes recordings from the source speaker to accurately imitate their voice. Furthermore, TTS systems are poorly suited to low-resource languages because of their higher data requirements, whereas STS systems can produce native-sounding voices despite such limitations.
Respeecher’s voice cloning stands out as one of the most advanced forms of STS. We allow creators to synthesize realistic audio content that captures all nuances of authentic voices while conveying emotion and other subtle variations into the new speaker's timbre.
Voice synthesis is used in various applications, enabling people and businesses to reproduce human voices in different ways:
Assistive technology makes use of voice synthesis to support people with speaking impairments. It lets users personalize their synthetic voice output so that it sounds natural. A person who has lost their voice could get it back without compromising any degree of likeness.
Dubbing is another growing application: voice synthesis can align the dialog with the actor's lip movements. It is used increasingly in the film industry to create dubbed versions of movies and other similar works.
Audiobooks and podcasts also benefit from this technology, as high-quality audio recordings can be generated without requiring a particular speaker's presence during production.
Call centers and customer support are leveraging this technology to impress customers and reduce operational costs by automating customer interactions using generated synthetic voices.
In entertainment, synthesized voices have given way to engaging experiences for video games and YouTube projects.
Synthesized voices are also used for educational purposes, such as cloning the voice of historical figures to create realistic interactive experiences. Check these case studies using the voices of Richard Nixon and Manuel Rivera Morales to get a better idea.
As you can see, positive examples of using this technology abound. But as with any technology, one can find a number of different harmful applications. One of them (and the best known to date) is malicious manipulation in the social and political spheres.
Ethical concerns regarding voice synthesis technology
The recent deepfake of Ukraine's President Zelensky has highlighted synthetic media's power and potential danger in modern society.
The case demonstrated how seemingly realistic footage can be generated using AI to spread false information or manipulate public opinion. This raises a critical ethical question: how should this technology be used responsibly and ethically?
In a world where synthetic media can increasingly replicate real-life events, we must strive to keep conversations honest, open, and transparent. Individuals must be aware when exposed to misinformation; they must be able to differentiate between real and synthetic media to form their own opinion.
We'll discuss some ethical options below. But for now, let's look at vishing, one of the most dangerous forms of synthetic voice fraud.
Vishing with a synthetic voice
Vishing (voice phishing) is a social engineering attack where malicious actors use phone calls to target organizations or individuals for financial gain. It takes advantage of people's trust in familiar companies and brands and other forms of psychological manipulation to steal personal information such as credit card numbers and passwords.
Gender and vocal characteristics are integral to a successful vishing campaign, as voice can create a sense of trust between caller and victim. Studies have shown that people perceive women as more honest and trustworthy, offering an advantage to cyber attackers in this scenario. With synthetic voice technology, attackers can now sound like anyone they choose, enabling them to launch highly effective vishing campaigns that go undetected by victims.
The list of potential negative uses of this technology goes beyond bank fraud. The FBI recently issued a public service announcement warning of deepfakes used by fraudsters to impersonate job applicants during online interviews.
The scam is concerning because the targeted jobs involve access to customer PII, financial data, corporate IT databases, and proprietary information. Because unauthorized access to PII can bring business and legal repercussions, companies must be aware of this issue and take the necessary steps to prevent it.
Vishing is a growing threat, and Richey May (a financial services and IT consultancy) and Respeecher are taking steps to combat the problem.
The two companies have developed scenarios for using synthetic speech for social engineering penetration testing. This includes creating simulations where an engineer sounds like a specific person and attempts to acquire information over a call or video conferencing app. By conducting vishing tests, organizations can identify potential personnel vulnerabilities and address these issues with proper training.
Voice synthesis and code of ethics
Voice cloning technology is growing in popularity due to its ease of production and lack of regulation. Companies must exercise caution when using this technology in their products, as it can have long-lasting implications for individuals and society.
Over the past five years, Respeecher has established itself as the go-to voice cloning provider for Hollywood studios. Famous voice IP owners have chosen to work with us because of our commitment to producing outstanding cloned voices and developing strict ethical standards throughout the industry.
In addition to anti-voice phishing initiatives, Respeecher is working to develop a broader list of principles that set the standard for ethical voice cloning.
To ensure that our technology is not used with malicious intent, Respeecher does not provide any public API for cloning voices. We only work with trusted clients, require written consent from voice owners, and approve projects that meet our standards.
Additionally, Respeecher is developing watermarking technology to identify Respeecher-generated content. We are working with the broader voice engineering, voice acting, and movie studio communities to educate the public, build detection algorithms, and prevent abuse of the technology.
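To make the watermarking idea concrete, here is a deliberately simplified sketch of the general principle: embedding a recoverable signature directly into audio samples. This toy example uses naive least-significant-bit (LSB) embedding on 16-bit PCM values; it is purely illustrative and is not how Respeecher's (or any production) watermarking works, which must be inaudible and robust to compression and re-recording. All function names here are hypothetical.

```python
# Toy illustration of audio watermarking via LSB embedding.
# NOT a production scheme -- real watermarks survive compression,
# resampling, and re-recording; plain LSB embedding does not.

def embed_watermark(samples, bits):
    """Write watermark bits into the least significant bit of
    the first len(bits) PCM samples."""
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the sample's LSB, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked

def extract_watermark(samples, n_bits):
    """Read the LSBs of the first n_bits samples back out."""
    return [s & 1 for s in samples[:n_bits]]

# Embed a short signature into dummy 16-bit PCM samples.
signature = [1, 0, 1, 1, 0, 0, 1, 0]
audio = [1000, -2048, 513, 77, -9, 30000, -1, 4]
marked = embed_watermark(audio, signature)
print(extract_watermark(marked, len(signature)))  # recovers the signature
```

Changing the LSB shifts each sample value by at most 1 out of a 65,536-value range, which is why even this naive scheme is inaudible; the hard part in practice is making the mark survive transformations, which is where detection algorithms like those mentioned above come in.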
If you want to learn more about our ethical standards, check this page. We hope you appreciate the opportunity to use voice cloning technology in a safe and ethical manner.
Head of Ethics and Partnerships
Blending a decade of expertise in international security with a passion for the ethical deployment of AI, I stand at the forefront of shaping how emerging technologies intersect with national resilience and security strategies. As the Head of Ethics and Partnerships at Respeecher, I focus on guiding ethical AI development. My role is centered around promoting the responsible use of AI, especially in synthetic media.