Users' impatient gripes fail to acknowledge how far we've come

One of the more embarrassing tasks that regularly faces me at work is listening back to interviews that I've conducted and transcribing them for print.

While my interviewees are invariably fascinating, coherent and precise, I wince at my own stuttered mumblings, e.g. "Well, it's like, er, you know, a bit like what you were – sorry, I was saying a little while back about, um, just that". As native English speakers, we have little trouble parsing that sentence; it's just the sound of a man registering a link in his head. A speech-recognition program, however, would choke on it. It's shot through with so much ambiguity and vagueness that it's barely worth wasting processor power on – but the computer wouldn't know that. It would soldier on and make a valiant stab at guessing what I'd said. But what if it had to translate it into another language? No chance. At least, not yet.

Computer comprehension is being sold to us on a daily basis. That Google TV advert where the mum asks the search engine "What is a rockhopper penguin?" will have prompted innumerable people to ask their own computers what a rockhopper penguin is. (Well, I know I have.) When computers understand what we're saying, we're delighted; when the Word Lens app translated foreign signs using a smartphone camera, our eyes widened in disbelief. When Siri told us a few weak gags, we chuckled obligingly.

When, in 2012, Microsoft Research demonstrated computer translation of spoken English into spoken Chinese, the audience in Tianjin gasped in astonishment. When computers get it wrong, we roll our eyes and dismiss the idea that they could ever properly understand us. But we forget three things: a) that our language patterns can be ridiculously ambiguous; b) that our own comprehension of language is truly amazing; c) that computers are, all things considered, achieving remarkable things. Just before Christmas, Microsoft released a preview of Skype Translator (for Windows 8.1), which allows English and Spanish speakers to have spoken conversations translated on the fly.

While we've long been familiar with speech-to-text, text translation and text-to-speech technologies, Microsoft stresses that this isn't just a daisy chain. It talks instead of "deep neural networks", software loosely modelled on the human brain that can recognise, generate and, most importantly, learn. Some reviewers have been gobsmacked by Skype Translator, with Peter Bright at Ars Technica, a self-acknowledged cynic, describing it as "magical". Others, however, have pooh-poohed it as another feeble step forward. They complain that it doesn't deal well with unusual accents, or slang, or children's voices, or real names, and that if it were used for international diplomacy the world would probably be plunged into conflict in a matter of hours.

But these are impatient gripes that fail to acknowledge how far we've come. The fact that Skype Translator works at all is astounding; users will be perfectly aware of its limitations, and the more these technologies are used, the bigger the corpus of data and the better they're going to get.

My ums, ahs and y'knows – my "disfluencies" – are starting to be spotted as such. That old joke where you put a load of text into Google Translate, turn it into German and then back into English isn't quite as funny any more, because the translations are better. Speech synthesis no longer sounds like a distressed robot. We're pursuing a sci-fi dream here. It will eventually come true – just not as soon as we'd like it to.