Input specialists Nuance Communications, the team behind Dragon Naturally Speaking and Dragon Dictation, are looking into the future and seeing the world speaking to their computers. The company recently released a video showcasing a concept called Dragon ID, where people protect and access sensitive data, like their bank accounts, using only the sound of their voice.
But can you trust a voice to protect your hard-earned cash? After all, comedians impersonate celebrities all the time. Couldn’t someone simply pretend to be you?
We spoke with Jason Stirling from Nuance, who explained how everyone has a unique voice, which they refer to as a voiceprint.
“If you look at this from a technical authentication viewpoint, there are five types of biometric authentication. We’re most familiar with fingerprints and iris scans (you see those in James Bond movies), and there is facial recognition, used in airports with passports,” said Stirling. “One of the others is your voice. Everyone’s vocal cord is unique, like a fingerprint, and that means we all have a unique voiceprint.”
But he concedes that it could be difficult to convince the wider community of the power of their voice.
“We’ve been talking with consumers about this, and we’ve found that, when you call it a voice fingerprint, consumers get it. But when you tell them that they have a unique voiceprint, they say, what’s a voiceprint?”
Still, if anyone is in a position to change the way we interact with machines, it is Nuance. The US-based company has been front and centre on phones we have all used over the last decade or so, supplying many of the big electronics companies with the XT9 keyboard and input software for the phones we have owned and fallen in love with.
The influence of Nuance extends beyond phones, too, including the keyboard interfaces on many smart TVs and even the text-to-speech voices you hear in elevators. Recently, Nuance acquired Swype, a continuous input keyboard system for smartphones.
“It’s Nuance’s goal to create simple, safe user interfaces to improve the man-machine interface … using a multi-modal approach, where you can touch, you can speak or you can use gestures,” said Stirling. “We’re not advocating that voice is everything, we advocate that, as a user, you should have the choice to use what is most simple for whatever you are doing.”
Once you have used your voiceprint to authenticate yourself to the system, the challenge then is to create a natural, conversational dialogue between the user and the machine. Stirling says that Nuance is investing heavily in this natural communication, but that it is a difficult task to accomplish.
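The article does not describe how Nuance implements voiceprint matching, but the general idea behind this kind of biometric check can be sketched as comparing a stored "enrollment" feature vector against a fresh utterance. Everything below is illustrative: the toy vectors, the `verify` function and the threshold are all invented, and the acoustic model that would produce real feature vectors is out of scope.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def verify(enrolled_voiceprint, new_utterance, threshold=0.8):
    """Accept the speaker if the new utterance is close enough to the
    voiceprint captured at enrollment. The threshold trades off false
    accepts (impostors let in) against false rejects (users locked out)."""
    return cosine_similarity(enrolled_voiceprint, new_utterance) >= threshold

# Toy vectors standing in for real acoustic features:
enrolled = [0.9, 0.1, 0.4]
same_speaker = [0.85, 0.15, 0.38]   # close to the enrolled voiceprint
impostor = [0.1, 0.9, 0.2]          # far from it

print(verify(enrolled, same_speaker))  # True  -> accepted
print(verify(enrolled, impostor))      # False -> rejected
```

The threshold is the interesting design choice: set it too low and an impersonator might pass, set it too high and the legitimate owner gets locked out on a bad phone line.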
“There’s three pieces. Firstly, there is speech recognition, or capturing what was said. The second part to the problem is around semantic understanding, or natural language processing. So if I’ve captured what was said correctly, I’m also trying to capture the intent or what was meant,” said Stirling. “Let’s say you asked, am I going to be late to my meeting? We then have to check a time reference and we have to know where the meeting is, so we have to hit a diary. And that moves into the third element of the problem … a disambiguation process. If [the system] is unclear about some part of the request, we might come back in a conversational way, the way you or I might speak, and try to clarify what information you were trying to get to.”
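The three stages Stirling describes can be sketched as a toy pipeline, assuming stage one (speech recognition) has already produced a transcript. The intent name, the regular expression and the stand-in diary below are all invented for illustration, not anything from Nuance's actual system.

```python
import re

# Stand-in diary for the "we have to hit a diary" step.
CALENDAR = {"meeting": {"time": "14:00", "place": "Sydney office"}}

def understand(transcript):
    """Stage two: map a transcript to an intent plus any slots we can fill."""
    if re.search(r"late .* meeting", transcript):
        return {"intent": "check_lateness", "event": "meeting"}
    return {"intent": "unknown"}

def respond(parsed):
    """Stage three: answer if we have enough information, otherwise come
    back conversationally with a clarifying question (disambiguation)."""
    if parsed["intent"] == "check_lateness":
        event = CALENDAR.get(parsed["event"])
        if event:
            return f"Your {parsed['event']} is at {event['time']} at the {event['place']}."
        return f"Which {parsed['event']} do you mean?"  # disambiguate
    return "Sorry, could you rephrase that?"

print(respond(understand("am I going to be late to my meeting?")))
```

The point of separating the stages is that each can fail independently: a perfect transcript can still yield the wrong intent, and a correct intent can still lack the diary entry needed to answer, which is exactly where the disambiguation step kicks in.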
But as mobile devices, like smartphones, become more powerful and our connection to the internet becomes faster, more ubiquitous and more reliable, the possibilities are expanding.
“The mobile networks are better, the computing power is better, the microphones [on smartphones] are ten times better than they were two years ago, so we can capture things more accurately … there’s so much more scope for what we can do. It’s really exciting.”