How Google Translate works

The web giant's translation service might serve up the odd batch of nonsense, but it's still one of the smartest communication tools of all time, as David Bellos explains

Using software originally developed in the 1980s by researchers at IBM, Google has created an automatic translation tool that is unlike all others. It is not based on the intellectual presuppositions of early machine translation efforts – it isn't an algorithm designed only to extract the meaning of an expression from its syntax and vocabulary.

In fact, at bottom, it doesn't deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.

It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation.

The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.

Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what's been submitted to it.
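The idea of picking "the most probable acceptable version" can be illustrated with a toy sketch. It is not Google's actual system: the tiny corpus, the function names and the simple counting rule below are all assumptions made for illustration. It counts how often each translation has appeared alongside a phrase and returns the most frequent one.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each entry pairs a source phrase with a
# translation found alongside it in some bilingual document.
PAIRED_PHRASES = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
    ("thank you", "merci beaucoup"),
    ("thank you", "merci"),
]


def build_table(pairs):
    """Count how often each translation accompanies each source phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_probable(table, source):
    """Return (translation, relative frequency), or None if never seen."""
    counts = table.get(source)
    if not counts:
        return None
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


table = build_table(PAIRED_PHRASES)
print(most_probable(table, "good morning"))
print(most_probable(table, "never said before"))
```

A phrase that has never been paired with a translation yields nothing at all, which is the sense in which such a method depends on what "has probably been said before".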

Much of the time, it works. It's quite stunning. And it is largely responsible for the new mood of optimism about the prospects for "fully automated high-quality machine translation".

Google Translate could not work without a very large pre-existing corpus of translations. It is built upon the millions of hours of labour of human translators who produced the texts that GT scours.

Google's own promotional video doesn't dwell on this at all. At present GT offers two-way translation between 58 languages, that is, 3,306 separate translation services, more than have ever existed in all human history to date.
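The figure of 3,306 follows from simple counting: each of the 58 languages can be translated into any of the other 57, and each direction counts separately. A short check, with a function name chosen here purely for illustration:

```python
from itertools import permutations


def translation_directions(n_languages):
    """Number of ordered (source, target) pairs among n distinct languages."""
    return n_languages * (n_languages - 1)


# Cross-check the formula by enumerating ordered pairs for a small case.
small_case = len(list(permutations(range(4), 2)))

print(translation_directions(58))
print(small_case == translation_directions(4))
```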

Most of these translation relations – Icelandic to Farsi, Yiddish to Vietnamese, and dozens more – are the newborn offspring of Google Translate: there is no history of translation between them, and therefore no paired texts, on the web or anywhere else. Google's presentation of its service points out that given the huge variations between languages in the amount of material its program can scan to find solutions, translation quality varies according to the language pair involved.

What it does not highlight is that GT is as much the prisoner of global flows in translation as we all are. Its admirably smart probabilistic computational system can only offer 3,306 translation directions by using the same device as has always assisted intercultural communication: pivots, or intermediary languages.

It's not because Google is based in California that English is the main pivot. If you use statistical methods to compute the most likely match between languages that have never been matched directly before, you must use the pivot that can provide matches with both target and source.
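The pivot idea can be sketched as a join on the shared intermediary language. This is a simplified illustration, not a description of how GT is implemented: the sentences, the romanised Farsi strings and the function name are assumptions made up for the example. Two languages with no directly paired texts are matched through the English sentences that both have translated.

```python
# Hypothetical data: English sentences with their known translations.
# The Farsi entries are romanised placeholders, for illustration only.
ENGLISH_TO_ICELANDIC = {
    "good morning": "góðan daginn",
    "thank you": "takk",
    "see you tomorrow": "sjáumst á morgun",
}
ENGLISH_TO_FARSI = {
    "good morning": "sobh bekheir",
    "thank you": "motshakeram",
}


def pair_via_pivot(pivot_to_source, pivot_to_target):
    """Pair source and target sentences that share the same pivot sentence."""
    return {
        pivot_to_source[pivot]: pivot_to_target[pivot]
        for pivot in pivot_to_source
        if pivot in pivot_to_target
    }


icelandic_to_farsi = pair_via_pivot(ENGLISH_TO_ICELANDIC, ENGLISH_TO_FARSI)
print(icelandic_to_farsi)
```

Only sentences translated into both languages produce a match, so the more material the pivot shares with both sides, the more pairs become available.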

A good number of English-language detective novels, for example, have probably been translated into both Icelandic and Farsi. They thus provide ample material for finding matches between sentences in the two foreign languages; whereas Persian classics translated into Icelandic are surely far fewer, even including those works that have themselves made the journey by way of a pivot such as French or German. This means that John Grisham makes a bigger contribution to the quality of GT's Icelandic-Farsi translation device than Rumi or Halldór Laxness ever will. And the real wizardry of Harry Potter may well lie in his hidden power to support translation from Hebrew into Chinese.

GT-generated translations themselves go up on the web and become part of the corpus that GT scans, producing a feedback loop that reinforces the probability that the original GT translation was acceptable. But it also feeds on human translators, since it always asks users to suggest a better translation than the one it provides – a loop pulling in the opposite direction, towards greater refinement.

It's an extraordinarily clever device. I've used it myself to check I had understood a Swedish sentence more or less correctly, for example, and it is used automatically as a webpage translator whenever you use a search engine.

Of course, it may also produce nonsense. However, the kind of nonsense a translation machine produces is usually less dangerous than human-sourced bloopers. You can usually see instantly when GT has failed to get it right, because the output makes no sense, and so you disregard it. (This is why you should never use GT to translate into a language you do not know very well. Use it only to translate into a language in which you are sure you can recognise nonsense.)

Human translators, on the other hand, produce characteristically fluent and meaningful output, and you really can't tell if they are wrong unless you also understand the source – in which case you don't need the translation at all.

If you remain attached to the idea that a language really does consist of words and rules and that meaning has a computable relationship to them (a fantasy that many philosophers still cling to), then GT is not a translation device. It's just a trick performed by an electronic bulldozer allowed to steal other people's work. But if you have a more open mind, GT suggests something else.

Conference interpreters can often guess ahead of what a speaker is saying because speakers at international conferences repeatedly use the same formulaic expressions. Similarly, an experienced translator working in a familiar domain knows without thinking that certain chunks of text have standard translations that he or she can slot in.

Translators don't reinvent hot water every day. They behave more like GT – scanning their own memories in double-quick time for the most probable solution to the issue at hand. GT's basic mode of operation is much more like professional translation than is the slow descent into the "great basement" of pure meaning that early mechanical translation developers imagined.

GT is also a splendidly cheeky response to one of the great myths of modern language studies. It was claimed, and for decades it was barely disputed, that what was so special about a natural language was that its underlying structure allowed an infinite number of different sentences to be generated by a finite set of words and rules.

A few wits pointed out that this was no different from a British motor car plant, capable of producing an infinite number of vehicles each one of which had something different wrong with it – but the objection didn't make much impact outside Oxford.

GT deals with translation on the basis not that every sentence is different, but that anything submitted to it has probably been said before. Whatever a language may be in principle, in practice it is used most commonly to say the same things over and over again. There is a good reason for that. In the great basement that is the foundation of all human activities, including language behaviour, we find not anything as abstract as "pure meaning", but common human needs and desires.

All languages serve those same needs, and serve them equally well. If we do say the same things over and over again, it is because we encounter the same needs, feel the same fears, desires and sensations at every turn. The skills of translators and the basic design of GT are, in their different ways, parallel reflections of our common humanity.

This is an extract from 'Is That a Fish in Your Ear? Translation and the Meaning of Everything' by David Bellos, published by Particular (£20). To order a copy for the special price of £16.50 (free P&P), call Independent Books Direct on 08430 600 030, or visit independentbooksdirect.co.uk
