How Google Translate works

The web giant's translation service might serve up the odd batch of nonsense, but it's still one of the smartest communication tools of all time, as David Bellos explains

Using software originally developed in the 1980s by researchers at IBM, Google has created an automatic translation tool that is unlike all others. It is not based on the intellectual presuppositions of early machine translation efforts – it isn't an algorithm designed only to extract the meaning of an expression from its syntax and vocabulary.

In fact, at bottom, it doesn't deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.

It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation.

The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.

Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what's been submitted to it.
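To make the idea concrete, here is a minimal sketch in Python of choosing the most probable version from previously paired texts. It is an illustration only, not Google's code: the tiny corpus and the names `PAIRED_CORPUS`, `build_table` and `most_probable` are invented for the example, and a real system works on millions of aligned documents and on fragments smaller than whole sentences.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (source, translation) pairs, standing in
# for the millions of paired documents described in the text.
PAIRED_CORPUS = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
]


def build_table(pairs):
    """Count how often each translation has been paired with each source."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_probable(table, source):
    """Return (translation, relative frequency), or None if never seen."""
    counts = table.get(source)
    if not counts:
        return None
    target, hits = counts.most_common(1)[0]
    return target, hits / sum(counts.values())


table = build_table(PAIRED_CORPUS)
best = most_probable(table, "good morning")
print(best)
```

Here "bonjour" wins because it has been paired with "good morning" in two of the three examples. An expression that has never been seen simply returns `None`, which is the sketch's way of showing why the method depends on a large pre-existing body of translations.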

Much of the time, it works. It's quite stunning. And it is largely responsible for the new mood of optimism about the prospects for "fully automated high-quality machine translation".

Google Translate could not work without a very large pre-existing corpus of translations. It is built upon the millions of hours of labour of human translators who produced the texts that GT scours.

Google's own promotional video doesn't dwell on this at all. At present GT offers two-way translation between 58 languages, that is, 3,306 separate translation services, more than have ever existed in all human history to date.
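The figure of 3,306 follows from counting every ordered pair of distinct languages: each of the 58 languages can be translated into each of the other 57. A short Python check, with variable names chosen only for this illustration:

```python
from itertools import permutations

languages = 58  # the number of languages quoted in the text

# Each language can be translated into every other language,
# and each direction counts as a separate service.
directions = languages * (languages - 1)

# Cross-check by enumerating every ordered (source, target) pair.
enumerated = sum(1 for _ in permutations(range(languages), 2))

print(directions, enumerated)
```

Both counts come to 3,306.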

Most of these translation relations – Icelandic to Farsi, Yiddish to Vietnamese, and dozens more – are the newborn offspring of Google Translate: there is no history of translation between them, and therefore no paired texts, on the web or anywhere else. Google's presentation of its service points out that given the huge variations between languages in the amount of material its program can scan to find solutions, translation quality varies according to the language pair involved.

What it does not highlight is that GT is as much the prisoner of global flows in translation as we all are. Its admirably smart probabilistic computational system can only offer 3,306 translation directions by using the same device as has always assisted intercultural communication: pivots, or intermediary languages.

It's not because Google is based in California that English is the main pivot. If you use statistical methods to compute the most likely match between languages that have never been matched directly before, you must use the pivot that can provide matches with both target and source.
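A rough sketch of how a pivot can be used statistically, again in Python and under stated assumptions: the two tables below are invented, the `is:`, `en:` and `fa:` labels are placeholders rather than real Icelandic or Farsi phrases, and summing over every pivot candidate is one textbook way of combining the two steps, not a description of what Google actually runs.

```python
from collections import defaultdict

# Hypothetical probabilities: source phrase -> pivot (English) phrases.
SOURCE_TO_PIVOT = {
    "is:phrase": {"en:the case is closed": 0.7, "en:the box is shut": 0.3},
}

# Hypothetical probabilities: pivot (English) phrase -> target phrases.
PIVOT_TO_TARGET = {
    "en:the case is closed": {"fa:option-A": 0.9, "fa:option-B": 0.1},
    "en:the box is shut": {"fa:option-B": 0.8, "fa:option-C": 0.2},
}


def translate_via_pivot(source, src_to_pivot, pivot_to_tgt):
    """Score each target by summing P(pivot|source) * P(target|pivot)."""
    scores = defaultdict(float)
    for pivot, p_pivot in src_to_pivot.get(source, {}).items():
        for target, p_target in pivot_to_tgt.get(pivot, {}).items():
            scores[target] += p_pivot * p_target
    if not scores:
        return None  # no route from source to target through the pivot
    return max(scores.items(), key=lambda item: item[1])


best = translate_via_pivot("is:phrase", SOURCE_TO_PIVOT, PIVOT_TO_TARGET)
print(best)
```

With these made-up numbers "fa:option-A" comes out on top with a score of about 0.63, even though no source-to-target pair was ever observed directly. The quality of the result depends entirely on how much material links each language to the pivot.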

A good number of English-language detective novels, for example, have probably been translated into both Icelandic and Farsi. They thus provide ample material for finding matches between sentences in the two foreign languages; whereas Persian classics translated into Icelandic are surely far fewer, even including those works that have themselves made the journey by way of a pivot such as French or German. This means that John Grisham makes a bigger contribution to the quality of GT's Icelandic-Farsi translation device than Rumi or Halldór Laxness ever will. And the real wizardry of Harry Potter may well lie in his hidden power to support translation from Hebrew into Chinese.

GT-generated translations themselves go up on the web and become part of the corpus that GT scans, producing a feedback loop that reinforces the probability that the original GT translation was acceptable. But it also feeds on human translators, since it always asks users to suggest a better translation than the one it provides – a loop pulling in the opposite direction, towards greater refinement.

It's an extraordinarily clever device. I've used it myself to check I had understood a Swedish sentence more or less correctly, for example, and it is used automatically as a webpage translator whenever you use a search engine.

Of course, it may also produce nonsense. However, the kind of nonsense a translation machine produces is usually less dangerous than human-sourced bloopers. You can usually see instantly when GT has failed to get it right, because the output makes no sense, and so you disregard it. (This is why you should never use GT to translate into a language you do not know very well. Use it only to translate into a language in which you are sure you can recognise nonsense.)

Human translators, on the other hand, produce characteristically fluent and meaningful output, and you really can't tell if they are wrong unless you also understand the source – in which case you don't need the translation at all.

If you remain attached to the idea that a language really does consist of words and rules and that meaning has a computable relationship to them (a fantasy that many philosophers still cling to), then GT is not a translation device. It's just a trick performed by an electronic bulldozer allowed to steal other people's work. But if you have a more open mind, GT suggests something else.

Conference interpreters can often guess ahead of what a speaker is saying because speakers at international conferences repeatedly use the same formulaic expressions. Similarly, an experienced translator working in a familiar domain knows without thinking that certain chunks of text have standard translations that he or she can slot in.

Translators don't reinvent hot water every day. They behave more like GT – scanning their own memories in double-quick time for the most probable solution to the issue at hand. GT's basic mode of operation is much more like professional translation than is the slow descent into the "great basement" of pure meaning that early mechanical translation developers imagined.

GT is also a splendidly cheeky response to one of the great myths of modern language studies. It was claimed, and for decades it was barely disputed, that what was so special about a natural language was that its underlying structure allowed an infinite number of different sentences to be generated by a finite set of words and rules.

A few wits pointed out that this was no different from a British motor car plant, capable of producing an infinite number of vehicles each one of which had something different wrong with it – but the objection didn't make much impact outside Oxford.

GT deals with translation on the basis not that every sentence is different, but that anything submitted to it has probably been said before. Whatever a language may be in principle, in practice it is used most commonly to say the same things over and over again. There is a good reason for that. In the great basement that is the foundation of all human activities, including language behaviour, we find not anything as abstract as "pure meaning", but common human needs and desires.

All languages serve those same needs, and serve them equally well. If we do say the same things over and over again, it is because we encounter the same needs, feel the same fears, desires and sensations at every turn. The skills of translators and the basic design of GT are, in their different ways, parallel reflections of our common humanity.

This is an extract from 'Is That A Fish In Your Ear: Translation and the Meaning of Everything' by David Bellos published by Particular (£20). To order a copy for the special price of £16.50 (free P&P), call Independent Books Direct on 08430 600 030, or visit independentbooksdirect.co.uk
