Code of conduct: The relentless march of the algorithm

Robin Barton discovers how complicated mathematical formulae are increasingly controlling every area of our lives, from love to life-saving procedures.

Robin Barton
Sunday 15 January 2012 01:00 GMT
(Illustration: Jan Kallwejt)

Last October a tiny tech start-up from Australia called Kaggle raised $11m from a trio of venture capitalists. More interesting than the sum, modest by Silicon Valley standards, is who provided the funding: Index Ventures, a venture-capital firm co-founded by Neil Rimer in London but now one of the hottest in San Francisco; Khosla Ventures, founded by Vinod Khosla of Sun Microsystems; and Max Levchin, one of PayPal's founders. Among Silicon Valley's many potential investors, they are aristocracy. Kaggle swiftly relocated from Melbourne to new offices in San Francisco.

A short distance away in Palo Alto, another tech firm, Palantir Technologies, also enjoyed an autumn funding boost – to the tune of $70m. The company, started in 2004 by another PayPal founder, Peter Thiel, among others, is valued at $2.5bn, according to the blog TechCrunch. What Kaggle and Palantir have in common, aside from flush bank accounts, is their interest in algorithms – the mathematical equations that, when coded into computers, are influencing every area of our lives, from the stock market to the supermarket.

Algorithms – algos to the digerati – translate a process into instructions that a machine can understand, ensuring, for example, that just enough bananas, at the most profitable price, are delivered to your local Sainsbury's on a Monday morning in January. They can be used to predict tomorrow's weather or, as the machine learning team at Bristol University accomplished in December using stats from 50 years of music charts, the popularity of a pop song. British firm Epagogix claims to use algorithms to score Hollywood film plots on their profitability – before they get the green light (or not). But, at a deeper level, as algorithms rapidly reshape business, finance, culture and even war, they raise profound ethical questions about the controls and responsibilities we cede to machines.
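
To see what "translating a process into instructions" means in practice, here is a deliberately simple sketch, with invented numbers, of a banana-ordering rule written out for a machine; no retailer's real system is this crude, but the shape of the thing is the same.

```python
# A toy ordering rule: stock enough bananas to cover the average of recent
# Monday sales, plus a small buffer, minus whatever is already on the shelf.
# All figures are invented for illustration.

def bananas_to_order(recent_monday_sales, stock_on_hand, buffer_fraction=0.10):
    expected_demand = sum(recent_monday_sales) / len(recent_monday_sales)
    target_stock = expected_demand * (1 + buffer_fraction)
    return max(0, round(target_stock - stock_on_hand))

print(bananas_to_order([420, 390, 450, 410], stock_on_hand=120))  # -> 339
```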

Anthony Goldbloom, now 28, worked for the Australian Treasury and the Reserve Bank of Australia before starting Kaggle in 2010. His idea was sublime in its simplicity and timing: to harness the brain power of bright people – mostly data scientists of some description – all over the world to solve specific data-analysis problems posted by companies and organisations as competitions. Some of the competitions hosted by Kaggle over the past year included one to predict whether a used car is a lemon (prize pot of $10,000), another by Nasa and the Royal Astronomical Society to map the universe's dark matter more effectively (prize $3,000), and the Heritage Health Prize ($3m), to predict patients who will be admitted to hospital within the year, using historical claims data. The organisation supplies the data and competitors craft algorithms to delve into it. The results can be surprising. "We get people from fields I've never heard of making breakthroughs," says Goldbloom: Nasa's competition saw glaciologist Martin O'Leary from Cambridge University make the early running, confounding the space scientists. "The combination of people from unusual backgrounds plus the competitive dynamic means you get close to the limit of what's possible," says Goldbloom. "On all 50 competitions we've hosted, we've always outperformed the previous benchmark."

Kaggle sits at the junction of three hugely powerful trends in computing: crowd-sourcing; the availability of vast quantities of good-quality data; and the awareness that gold lies in that there data. Data scientists, wielding algorithms, are the modern-day alchemists who unlock that gold. "Companies such as banks, insurance companies and search engines compete on their ability to build algorithms," explains Goldbloom. By divining who is most likely to default on a loan from an applicant's personal details, banks, for example, can cut their bad loans and thus offer lower interest rates to those they do accept. Google, which has about 90 per cent of the online search market, guards the secrets of its ever-evolving PageRank formula closely.
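
Google's current recipe is a closely guarded secret, but the original PageRank idea has been public since the late 1990s; a minimal power-iteration sketch over a made-up four-page web (the link graph and damping factor below are purely illustrative) looks something like this.

```python
# PageRank by power iteration over a toy web graph.
# links[page] lists the pages that `page` links out to (invented data).
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}           # start from a uniform score
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)             # split a page's score among its links
            for target in outgoing:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

print(pagerank(links))  # pages with more (and better-ranked) incoming links score higher
```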

"Algorithms are a major source of competitive advantage," says Goldbloom. "Tesco's rise in the 1990s was off the back of its partnership with [data-mining company] Dunnhumby and its customer loyalty card, which gave it a much better insight into what its customers were doing. It's a really, really big area."

Palantir Technologies operates on an even more secretive level. A demonstration of the firm's abilities follows a fictitious foreign national as he makes his way across the US: buying a one-way plane ticket, renting an apartment, making bank withdrawals, phoning the Middle East, renting a truck, visiting Disney World. You can see where this is going. The foreign national pops up in a security agency's database when he gets a speeding ticket. From that moment, his every movement can be unravelled. Palantir uses state-of-the-art algorithms to sift vast and varied sets of data for the minute clues that can pinpoint an individual of interest in a population of billions. With clients such as the CIA, FBI, US Defense Department, armed forces, police departments and, increasingly, financial institutions chasing bank fraud, Palantir's turnover in 2011 is thought to be around $250m.

But algorithms are not only the province of security agencies and Nasa. They've been around for a while – they were used in the 1940s to predict traffic flow on Britain's roads – and they're in everything we use, from cars and dishwashers to iTunes and online newspapers.

Netflix, the American online movie provider launching in the UK later this year, is one of the best-known examples, explains data scientist Sam Roberts of XMW Consulting in Cambridge. "Netflix offered a $1m prize to anyone who could improve its movie recommendations," he says. "It provided the data about the movies its customers rent and, with an algorithm, a lot of useful information can be drawn from that data, such as who rents what genre and when. An algorithm is simply a recipe: the data constitutes the ingredients and an algorithm weighs and measures it. The output is a prediction of something of interest."

That prediction can be as benign as the next film you want to watch – or something rather less trivial. "You would be astonished at how accurately an algorithm can make predictions. But an algorithm is only as good as your data. 'Garbage in, garbage out' is the saying."
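
The Netflix prize-winners used far more sophisticated mathematics, but Roberts's recipe metaphor can be made concrete with a toy "people who rented this also rented that" recommender built on invented rental histories.

```python
# A toy co-occurrence recommender on invented rental data.
from collections import Counter
from itertools import combinations

rentals = {
    "alice": {"Alien", "Blade Runner", "Moon"},
    "bob":   {"Alien", "Blade Runner"},
    "carol": {"Moon", "Solaris"},
    "dave":  {"Blade Runner", "Moon", "Solaris"},
}

# Count how often each pair of films is rented by the same person.
pair_counts = Counter()
for films in rentals.values():
    for a, b in combinations(sorted(films), 2):
        pair_counts[(a, b)] += 1

def recommend(film, top_n=3):
    """Films most often rented alongside `film`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == film:
            scores[b] += count
        elif b == film:
            scores[a] += count
    return [title for title, _ in scores.most_common(top_n)]

print(recommend("Blade Runner"))  # e.g. ['Alien', 'Moon', 'Solaris']
```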

The rise of the algorithms embedded in our lives could not have occurred without a surge in something produced by the interconnected, online lives we lead today: good-quality data, and vast server farms full of it. A supermarket chain can access detailed data not only from its millions of loyalty cards but also from every transaction at every till in every branch since the day it decided to start collecting this information. An efficiency gain of less than 1 per cent can save such a chain £10m. The falling cost of data storage means there is no reason not to collect it, and the mushrooming processing power of computers means it can be churned continuously in the hunt for insights. "Data-mining algorithms," says Roberts, "are very good at digging out nuggets of information and ignoring the irrelevant – something at which the human eye is very bad."

We shed data all the time, like dead skin cells. With every click, every Facebook friendship, every shopping trip, every interaction with a government agency, we leave a mote. This, of course, raises ethical conundrums, not least around privacy. "In some ways, privacy is already dead," says Roberts. "There's already so much information about us out there. Data sets are anonymised but data mining can inadvertently reverse that. For example, merging aggregated medical information with something as simple as postcodes from a marketing company." Goldbloom agrees: "Privacy is just not an expectation my generation has. I go to a lot of conferences and at those where the speakers are younger the focus is on the application of big data; those with older speakers tend to focus on privacy."
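
Roberts's warning is easy to demonstrate with entirely invented records: an "anonymised" medical file and an ordinary marketing list, joined on nothing more than a postcode and a year of birth, can put a name back against a diagnosis.

```python
# Invented data showing how joining two "harmless" datasets can re-identify people.

anonymised_medical = [
    {"postcode": "OX1 3PA", "birth_year": 1975, "diagnosis": "diabetes"},
    {"postcode": "CB2 1TN", "birth_year": 1982, "diagnosis": "asthma"},
]

marketing_list = [
    {"name": "J. Smith", "postcode": "OX1 3PA", "birth_year": 1975},
    {"name": "A. Jones", "postcode": "CB2 1TN", "birth_year": 1982},
]

for record in anonymised_medical:
    matches = [person for person in marketing_list
               if person["postcode"] == record["postcode"]
               and person["birth_year"] == record["birth_year"]]
    if len(matches) == 1:  # a unique match undoes the anonymisation
        print(matches[0]["name"], "->", record["diagnosis"])
```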

The ethics of data mining are not limited to privacy. What if an algorithm's research throws up socially uncomfortable truths? How should we control the application of algorithms in the robot armies of the future? Air Force cadets in the US are already warned that few of them will actually fly planes; computer-controlled drones will rule the skies, potentially deciding on targets independently. And in healthcare, if an algorithm can predict with dispassionate precision when and how an individual is likely to need medical care, how might that affect their insurance premiums? Is a computer program better able to calculate kidney transplant survival statistics and decide who receives a donor organ and who doesn't? Can humans programme machines to act ethically when often we fail to do so ourselves?

Since the start of the global financial crisis in 2008, particular attention has been focused on the use of algorithms in the banking world, specifically the relatively new phenomenon of the high-frequency trader and black-box (or algorithmic) trading. Unlike old-fashioned stock-pickers, high-frequency traders use algorithms rather than hunches or legwork to analyse market data and exploit anomalies (in pricing, for example) across thousands of companies instantaneously, buying and selling stocks in fractions of a second. In recent years these black boxes have spread across the world's exchanges – it's thought that more than 50 per cent of trades in US equities are algorithmic – and their silent, incessant industry doesn't stop for breakfast, lunch or tea. John Coates, a former trader and now senior research fellow in Cambridge University's neuroscience department, spent 13 years running trading desks on Wall Street and now studies the physiology of traders, measuring the risks they take against the fluctuations of their nervous systems. In recent years, investment banks have replaced traders with black boxes because, he explains, "Human beings can't compete on speed with the machines. In 2005 I found it increasingly difficult to trade; I couldn't get the scent. The market felt inhuman."

The algorithms that really irritate traders are the "front-runners", using speed to pip the prospective purchaser of a stock at the post and then selling it to them at a slightly higher price. But algorithms are employed for more than just high-frequency trades; each hedge fund and investment bank now has its own mathematical arsenal to analyse data. A source at an Oxford-based hedge fund, which manages $2bn of funds, explains how algorithms work in the stock market: "If you can rationally articulate the reasons for buying or selling something, we try to capture that in the form of a mathematical model or strategy. There is nothing magical. We simply try to see what happens on average."
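
None of the fund's actual models are public, but the pattern the source describes can be sketched with a toy rule and invented prices: articulate a reason for buying (here, "the price has dipped below its recent average"), encode it, then measure what happens on average.

```python
# A toy "model and measure" loop on invented prices: buy after a dip below
# the five-day average, then look at the average next-day return of that rule.
prices = [100, 101, 99, 98, 102, 103, 101, 100, 104, 105, 103, 106]

window = 5
returns_after_signal = []
for i in range(window, len(prices) - 1):
    moving_average = sum(prices[i - window:i]) / window
    if prices[i] < moving_average:                    # the articulated "reason to buy"
        returns_after_signal.append(prices[i + 1] / prices[i] - 1)

if returns_after_signal:
    average = sum(returns_after_signal) / len(returns_after_signal)
    print(f"average next-day return after a dip: {average:.2%}")  # -> 4.00% here
```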

The fact that there's a human programmer behind every algorithm may not alleviate concerns about them, because programmers are fallible too: Credit Suisse was fined $150,000 by the New York Stock Exchange in January 2010 for failing to supervise a malfunctioning algorithm. Four months later, on 6 May 2010, the Dow Jones Industrial Average dropped 900 points in a matter of minutes in what is now known as the Flash Crash. Nobody can agree on exactly what happened, only perhaps that for a few minutes algorithms, locked in loops with each other, wiped about 9 per cent off the value of the US stock market.

The boom in algorithmic trading has not only inspired its first doomsday novel, Robert Harris's The Fear Index – in which Vixal-4, a Swiss hedge fund's super-brainy, fear-predicting algorithm, goes predictably loopy – but has changed the topography of cities and the personnel at trading firms. When profits are made not in milliseconds but in microseconds (a thousandth of a millisecond), the less distance your automated order travels, the more money you make. So trading firms are stripping out space in skyscrapers as close to exchanges and network nodes as possible and installing data centres, filling steel-lined rooms with servers. "It's an arms race between funds," says Coates. Nasdaq claims that an order takes just 98 millionths of a second to bounce to and from its exchange. An 825-mile fibre-optic cable has been laid between Chicago and New York by Spread Networks, promising a 13.33-millisecond round trip solely for trading firms' commands. A similar cable under the Atlantic, to beam orders between London and New York in 60 milliseconds, has been proposed.
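
The Chicago-New York figure is close to the physical floor, as a back-of-envelope calculation shows: light travels through optical fibre at roughly two-thirds of its speed in a vacuum, and 825 miles each way leaves little room for improvement.

```python
# Rough check of the quoted 13.33 ms round trip between Chicago and New York.
route_km = 825 * 1.609                    # 825 miles in kilometres
speed_in_fibre_km_per_s = 2.0e5           # roughly two-thirds of the speed of light
one_way_ms = route_km / speed_in_fibre_km_per_s * 1000
print(f"one way: {one_way_ms:.1f} ms, round trip: {2 * one_way_ms:.1f} ms")
# -> one way: 6.6 ms, round trip: 13.3 ms
```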

As the speed of algorithms changes the nature of the markets (the volume of trades on the New York Stock Exchange swelled by 181 per cent between 2005 and 2009), so the barrow boys barking in the pits have been replaced by physicists, 2,000 of whom now work on Wall Street. Most of the 50 staff at the hedge fund in Oxford have science or maths degrees; half have doctorates. "We care about intelligent people," says a source at the firm. "The fact they're scientists is almost irrelevant. We like clever people, people who are analytical, people who can look at a problem dispassionately. That's all."

At Kaggle, Goldbloom's goal is to see the best data scientists earn as much as hedge-fund managers. If supply and demand is any indication, he may achieve it: a May 2011 report on "big data" by McKinsey highlighted a shortfall in the US alone of 140,000 to 190,000 people with the skills to analyse data. Goldbloom makes another observation: "The things that companies look for in data scientists tend not to be the things that really matter. Companies might look for the right sort of degree, but much more important than mathematical aptitude is common sense. The greatest proportion of our users have backgrounds in statistics and computer science, which is what you would expect. But the most successful members of our community tend to be electrical engineers and physicists. The implication is that these people are more interested in problem-solving than trying out the latest fancy algorithm." His example is Kaggle's used-car competition. It seems that unusually coloured second-hand motors (especially orange) are less likely to be duds than standard-coloured ones (such as silver), perhaps because unusually coloured cars are more likely to be bought by enthusiasts who take good care of them.
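
Surfacing that kind of pattern is less exotic than it sounds. With invented records standing in for the competition data, the dud rate per colour is a few lines of grouping and counting; the real work lies in thinking to ask the question.

```python
# Lemon rate per colour on invented records.
from collections import defaultdict

cars = [
    {"colour": "orange", "lemon": False}, {"colour": "orange", "lemon": False},
    {"colour": "orange", "lemon": True},  {"colour": "silver", "lemon": True},
    {"colour": "silver", "lemon": False}, {"colour": "silver", "lemon": True},
]

counts = defaultdict(lambda: [0, 0])      # colour -> [lemons, total]
for car in cars:
    counts[car["colour"]][0] += car["lemon"]
    counts[car["colour"]][1] += 1

for colour, (lemons, total) in counts.items():
    print(f"{colour}: {lemons}/{total} lemons")
```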

Perhaps the human brain, slow and easily distracted as it is, has a place in an algorithmic world after all. As a neuroscientist, Coates certainly thinks so. "Black-box trading isn't the dominant force it was; like the Terminator movie, humans are fighting back from the brink. When the market's not moving, traders today sit on their hands because they know they'll get picked off by a machine. But when there's volatility humans come into their own. Old traders have adapted. We're better at long-term predictions."

Outside fiction, algorithms can't yet predict the big political calls, such as whether Northern Rock would be rescued. But it's fair to assume that one day they will: already, algorithms scan and interpret tweets in real time for news trends.
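
A crude version of that trend-spotting simply flags words that suddenly appear far more often in the latest batch of tweets than they did before; the tweets, ratio and threshold below are all invented.

```python
# Flag words whose frequency in recent tweets far exceeds their earlier rate.
from collections import Counter

def word_rates(tweets):
    counts = Counter(word for tweet in tweets for word in tweet.lower().split())
    return {word: count / len(tweets) for word, count in counts.items()}

def trending(last_hour, previous_day, ratio=5.0, min_rate=0.5):
    recent, baseline = word_rates(last_hour), word_rates(previous_day)
    return [word for word, rate in recent.items()
            if rate >= min_rate
            and rate > ratio * baseline.get(word, 1.0 / len(previous_day))]

print(trending(
    ["bank rescue announced", "rescue package for the bank", "bank rescue news"],
    ["lovely weather today", "match report", "weather turning cold"] * 10,
))  # -> ['bank', 'rescue']
```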

And the news itself is shaped by decisions made by algorithms, gaming the web for clicks. Not only is it normal for internet users to select feeds for the sort of news they want to receive, but stories themselves are composed to appeal to search-engine algorithms. It's the sort of computerised content aggregation that irks Jaron Lanier, the musician, computer scientist and author of the bestseller You Are Not a Gadget. In the book's preface he writes: "Algorithms will find correlations between my words and [readers'] purchases, their romantic adventures, their debts, and, soon, their genes. Ultimately these words will contribute to the fortunes of those few who have been able to position themselves as lords of the computing clouds."

I catch up with Lanier as he's writing his follow-up book and he makes the valid point that algorithms are not inherently problematic. "Algorithms themselves are a form of creativity. The problem is the illusion that they're free-standing. If you start to think that information isn't just a mask behind which people are hiding, if you forget that, you'll pay a price for that way of thinking. It will cause you to be less creative.

"If you show me an algorithm that dehumanises, impoverishes, manipulates or spies upon people," he continues, "that same core maths can be applied differently. In every case. Take Facebook's new Timeline feature [a diary-style way of displaying personal information]. It's an idea that has been proposed since the 1980s [by Lanier himself]. But there are two problems with it. One, it's owned by Facebook; what happens if Facebook goes bankrupt? Your life disappears – that's weird. And two, it becomes fodder for advertisers to manipulate you. That's creepy. But its underlying algorithms, if packaged in a different way, could be wonderful because they address a human cognitive need."

What concerns Lanier today is "how people will make their way when the machines get better". It's not just traders being made redundant by algorithms. In the next 10 years a swathe of middle-class, white-collar occupations, from legal researchers to, perhaps, journalists, will become obsolete. Lanier gives an example: "We already have pretty good self-driving cars. They're probably already safer than humans. The technology will soon be good enough that if cars drove themselves, far fewer people would be killed. That's a compelling argument for adopting them. But look at the vast number of people making their living behind a wheel – truck drivers, cab drivers – what will they do? This is an ancient problem; even Aristotle talked about it."

At breakneck speed, algorithms have raced ahead of us and led us into some surprising situations. One company's monopoly on online advertising. Traders outgunned by scientists in the stock markets. And Kaggle's latest algorithm-writing competition: to grade student essays by computer. "It sounds like science fiction," says Goldbloom, "and I don't know if it's possible, but we'll know by the end of the competition." Who or what could have seen all that coming?

Jaron Lanier talks at Learning Without Frontiers at Olympia, London W14 (learningwithoutfrontiers.com), from 25 to 26 January, and plays a concert with David Rothenberg at the October Gallery, London WC1 (octobergallery.co.uk), on 28 January

How algorithms play a part in your life

Love

Dating websites use algorithms to match suitors with dates. At eHarmony, algorithms perform more than a billion romantic calculations a day for its 20 million-plus members. Brush up your own profile pic using a 'beautification' algorithm to make your face's proportions more appealing.

Food

When your local supermarket runs out of bread after a freak snowstorm, blame an algorithm. Maths formulae predict demand for particular items at particular times at particular shops, then allocate stock – but one-off events can throw the predictive software.

Shelter

Home automation will be the next tech leap, as algorithms, sensors and the net combine to heat certain rooms of your house before you arrive back from work, order food, save energy and turn lights on and off as you move round.
