Voice recognition systems have a wide range of applications in our modern world. Many devices employ them, from smartphones to car entertainment systems, as well as common operating systems such as “Windows” and “iOS”. What these systems have in common is the use of voice recognition technologies to perform commands and access stored information.

These methods are still at an early stage of development, and they may therefore be exposed to vulnerabilities that keen hackers could exploit to access sensitive data. The major weakness of voice recognition technologies is the relative ease with which someone can obtain voice information about someone else. For instance, many people nowadays use channels such as YouTube, Snapchat and the like, through which they make their voice a “common good”, so to speak, available to be stolen by anyone with bad intentions. Other authentication methods that rely on biometric information, such as fingerprint recognition technologies, are harder to trick: an attacker needs access to someone’s fingerprints, so physical proximity is required.

So what, specifically, is the state of the art of voice recognition technology? A brief overview of recent developments will help set the scene.

As mentioned before, voice recognition technologies include hardware and software specifically designed to decode the human voice in order to perform certain commands or functions. The most common uses of these methods are the transcription of voice into text, the execution of software applications and the verification of someone’s identity.

A bit of history now. We have to travel back to 1952 to find the first voice recognition device. It was not computerized, but it was able to recognize single digits pronounced by a human voice. Fast forward 40 years and we find the prototype of a modern voice recognition system, Sphinx-II, created by Xuedong Huang, one of the founders of Microsoft’s speech recognition group. Sphinx-II was able to recognize speech in real time and could be deployed in modern software applications.

There are now a variety of fields in which voice recognition technologies have taken center stage and have made it possible for hands-free and/or remote vocal commands to be performed. For instance, when it comes to automotive safety, voice recognition technologies have enabled drivers to make phone calls without having to remove their hands from the steering wheel: this has a huge impact on passengers’ safety. Customer support has recently seen a vast increase in the use of voice recognition systems, especially on telephone lines through which customers are asked to provide details of their enquiry and are then redirected to the suitable department. Amongst other fields in which voice recognition technologies have been successfully deployed, we can list the military, avionics, healthcare and telecoms.

Now for the nasty part: Are There Vulnerabilities In Voice Recognition?

There are so-called voice impersonation attacks, which can be used to hack most devices that deploy voice recognition technologies. In particular, two voice authentication systems are the most exposed to vulnerabilities: “Siri” and “Google Now”.

Voice impersonation attack

So what’s a voice impersonation attack? It basically refers to the malevolent use of voice authentication systems to gain control of a device or piece of software by bypassing all security mechanisms with copies of recorded or synthesized speech commands. In short, it’s the unlawful appropriation of someone else’s voice to access personal data. A group of researchers at the University of Alabama at Birmingham recently found that every device or system that relies on voice recognition technologies is vulnerable to voice impersonation attacks. Exciting news for hackers.

In the end, you only need to possess a sample of a user’s voice to be able to unlock or gain unauthorized access to a device. As simple as that. And since, as we said before, everybody’s voice is now easily available through multiple online channels, it’s crystal clear that users are easy prey to voice impersonation ambushes.
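To see why a recorded sample is enough, consider a minimal sketch of how a naive voice authenticator might work: it compares a feature vector extracted from incoming speech (e.g. averaged spectral features) against an enrolled template, using cosine similarity and a threshold. Every name, number and the threshold here are hypothetical, and real systems are far more sophisticated, but the replay weakness is the same in principle: an exact recording of the victim matches the template almost perfectly.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two feature vectors, 1.0 meaning identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def authenticate(enrolled_template, sample_features, threshold=0.95):
    """Accept the speaker if the sample is similar enough to the enrolled template."""
    return cosine_similarity(enrolled_template, sample_features) >= threshold

# The legitimate user's enrolled voiceprint (made-up numbers).
template = [0.12, 0.80, 0.35, 0.44]

# An attacker replays a recording of the victim: the extracted features
# are (nearly) identical to the template, so the check passes.
replayed = [0.12, 0.80, 0.35, 0.44]
print(authenticate(template, replayed))   # True: the replay is accepted

# A different speaker's features fall well below the threshold.
impostor = [0.90, 0.10, 0.70, 0.05]
print(authenticate(template, impostor))   # False
```

The sketch also shows why liveness detection matters: nothing in a pure similarity check distinguishes a live speaker from a loudspeaker playing a recording.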

And even if a user’s voice is not immediately traceable online, the researchers explained how a sample of it can still be collected by other means: a spam call is one, as is recording someone’s voice with a recording device, or even compromising cloud servers to access stored audio files. There are several ways to acquire a sample of a user’s voice. According to Nitesh Saxena, the leader of the above-mentioned research group, a few minutes of audio are in fact sufficient to clone someone’s voice, a characteristic unique to each and every one of us, and use it maliciously.

Is there a critical issue here? Yes, there is. That a biometric identifier such as someone’s voice is so freely available or easily obtainable makes it almost fun for hackers to take control of one or multiple devices.

“Siri” Security Vulnerabilities

“Siri” is a famous voice recognition application that is installed by default on the iOS operating system. It has been exploited twice: in 2011 by a hacker group based in China, and in 2015 by a group of researchers at the French security agency ANSSI. The researchers found that Siri can be remotely controlled by anyone through radio waves. And even though the hack can only be performed on smartphones with earphones and a microphone plugged in, one cannot deny the consequences such actions may have: a remote-controlled device can be used to make premium-rate phone calls or perform financial operations.

“Google Now” Security Vulnerabilities

Just like “Siri”, “Google Now” – its Android counterpart – shows dangerous vulnerabilities. The application has similar functions: it can be used to get information on different subjects, from current traffic data to sports results, or to perform tasks such as taking pictures or opening files, and so on. In this case it was AVG that performed a proof-of-concept attack on Google Now. To do so, they created a game for Android that issued voice commands which could then be “stolen” and misused by hackers.

What can be done to stop the “voice robbers”?

According to experts, the best defence would now be the creation of voice recognition systems that can resist hackers’ imitation attempts, for example through a stronger electromagnetic sensor. Moreover, “Siri” or “Google Now” users can, amongst other things, customize the words used to launch their software in order to enhance their security.
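The customized-activation-phrase suggestion can be illustrated with a toy sketch: the assistant simply refuses to act on any utterance that does not begin with the user’s chosen phrase, so a generic replayed command like “OK Google…” is ignored. The function name and the example phrase are hypothetical; a real assistant matches the wake phrase acoustically, not on transcribed text.

```python
def should_respond(utterance, wake_phrase):
    """Act only on utterances that start with the user's custom wake phrase."""
    return utterance.lower().strip().startswith(wake_phrase.lower())

# An unguessable, user-chosen phrase instead of the well-known default.
wake = "okay aubergine"

print(should_respond("Okay aubergine, call home", wake))  # True: custom phrase matches
print(should_respond("OK Google, call home", wake))       # False: default phrase is rejected
```

Of course, this only raises the bar: once an attacker learns or records the custom phrase, the protection is gone.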

But these are just suggestions. The truth of the matter is that there is currently no real protection against this kind of threat, which affects almost all systems relying solely on voice recognition technologies. In fact, it appears that authentication based on multimodal biometric systems, that is, a combination of two or more identification methods (fingerprints, iris recognition, etc.), is undoubtedly safer and more secure.
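Why multimodal systems are harder to spoof can be shown with a minimal sketch of the simplest fusion rule: grant access only if every modality independently matches (an “AND” decision). The scores and thresholds below are made up for illustration, and real systems often use weighted score fusion instead, but the point stands: a cloned voice alone no longer opens the door.

```python
def multimodal_authenticate(voice_score, fingerprint_score,
                            voice_threshold=0.9, fingerprint_threshold=0.9):
    """Grant access only if every biometric modality independently matches."""
    return (voice_score >= voice_threshold
            and fingerprint_score >= fingerprint_threshold)

# A cloned voice can produce a near-perfect voice score, but the attacker
# has no fingerprint, so the combined check fails.
print(multimodal_authenticate(voice_score=0.99, fingerprint_score=0.10))  # False

# The legitimate user passes both checks.
print(multimodal_authenticate(voice_score=0.97, fingerprint_score=0.95))  # True
```

The design trade-off is the usual one: the “AND” rule lowers the false-accept rate at the cost of a higher false-reject rate, since a legitimate user must now pass every check.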

Here another question jumps to mind: could all these modern methods that are supposed to guarantee our safety be exploited and deployed to set up a highly controlled Big Brother society?