Talk to any of the executives at Nuance, the voice recognition specialists, and it’s not long before they get round to Star Trek. “We haven’t reached full Star Trek-ness yet, but it’s coming,” said Peter Mahoney, senior VP and chief marketing officer, in an interview with The Independent at Nuance’s Burlington, Massachusetts headquarters.
By this he doesn’t mean, sadly, the Universal Translator which listens to any language known to the Enterprise’s crew – except, of course, Klingon – and translates it on the fly into perfect American English with lip-synching to match. He’s talking about a system which will listen to natural, easily spoken language and convert it to reliably accurate printed text or understand it well enough to perform an activity like paying a bill, sending flowers to Mum and so on.
Of course, there are voice-activated phone services in widespread use already, and they offer a level of security that real people don’t. Statistics, Nuance said, show that if you want to fraudulently access someone’s bank account, say, the best way by far is to talk to a human being. Machines are less susceptible.
Even so, the most up-to-date systems mostly still require users to stick to a limited vocabulary. To make things work better, Mahoney explained, you need a system that knows, when you say “Damn, I forgot to send a birthday card to my brother”, that what you really mean is “Please send this message to this person at this address.” The increase in processing power means that Nuance can now build in recognition of dozens of ways of saying the same thing, to increase naturalness.
The other topic that comes up repeatedly with Nuance is Siri, the voice recognition service on Apple’s iPhone 4S which, to be fair, is coming closer to translating natural words into actions than most. Ask it if you need an umbrella today and it understands you want to find out the weather forecast for where you are and answers in colloquial English.
Mahoney said that just as iTunes popularised downloadable music, so Siri is popularising voice interactions. Since Siri was announced, companies have been beating a path to his door saying they want voice control, too.
It’s not that Nuance execs are jealous of Apple’s system. Nuance has supplied technology used in some Apple products. That’s all they’ll say, but it’s assumed that this involvement includes at least parts of Siri.
After all, Nuance is the indisputable leader in this kind of technology. It provides the solution for a lot of companies, from in-car systems to healthcare. And where Nuance doesn’t supply the software, you’ll probably find it’s done by Vlingo – formerly a rival and now in the process of becoming part of Nuance. Nuance is the company that gave us T9, the predictive text input system which saved us all sore thumbs as we texted. It has in its repertoire Swype, a cool text input system where you slide your finger across a touchscreen keyboard to make words. And it knows the value of data. SpinVox was a British company which translated voicemails into text – a boon if you didn’t have time to call the network and listen to every message. When it hit problems, Nuance swooped in and bought it. With the company came the translated messages, adding to Nuance’s database.
Hullomail, a useful smartphone app which sends your voicemails to you as MP3 files so you can save them or delete them as you please, is now a Nuance service user, too. Its Get the Gist system comes from Nuance. This turns the first 10 seconds of each voice message into text so you know if you need to listen or – if it begins “You may have been mis-sold payment protection…” – not.
This is mostly how Nuance works, supplying software and technologies to companies that build them into their products, rather than direct to the public. There are Nuance products like PC and Mac programs and App Store apps – Dragon Dictation is an example – but they’re not the major part of the business, and can give the impression that Nuance is a small player. It’s really not, though the arrival of Dictation built into the Mac’s next software release could dent sales of some of these products.
Of course, there’s a long way to go before Lieutenant Uhura’s out of a job. As anyone who’s demonstrated Siri to friends knows, it’s not always reliable. Too much background noise or a poor data connection can leave the demonstrator red-faced while friends scornfully insist it would have been quicker to type it in.
And talk to a journalist – actually any journalist – and they’ll tell you that all they want is quick, flawless transcription of the interviews they’ve conducted. It doesn’t seem much compared to a Universal Translator but try it, even with the best programs available (made by Nuance, no surprise) and you’ll see we’re some way off.
Mahoney feels that it’s almost here, probably months rather than years. This is good, and means we can finally crack on with those other pressing developments, like warp speed and transporter beams.