The number crunch: Will Big Data transform your life - or make it a misery?


The age of Big Data is upon us. Fuelled by an incendiary mix of overblown claims and dire warnings, the public debate over the handling and exploitation of digital information on an astronomically large scale has been framed in stark terms: on one side are transformative forces that could immeasurably improve the human condition; on the other, powers so subversive and toxic that a catastrophic erosion of fundamental liberties looks inevitable.

The tension between these opposites has marooned the discussion of Big Data. It is stuck somewhere between Bletchley Park – the wartime home of the Government Code and Cypher School, forerunner of the Government Communications Headquarters (GCHQ), where the godfather of the computational universe, Alan Turing, primed today's Big Data explosion during the Second World War – and the satirical tomfoolery of South Park, which recently portrayed the living core of all data as an incarcerated Father Christmas cruelly wired up to a machine by the US's National Security Agency (NSA).

We know from Edward Snowden's widely publicised whistle-blowing revelations that the NSA – in collusion with GCHQ – lifted vast amounts of data from Google and Yahoo under the once top-secret codename Muscular. At the same time, we're told that the potential for beneficial insights mined from anonymous, adequately protected data is enormous.

Big Data helps us find things we "might like" to buy on Amazon, for example, but it has also left us vulnerable to surveillance by state and other agencies. Companies such as Google and Facebook are essentially Big Data businesses, whose staggering profitability stems from the application of data analysis to advertising: these "free" services are paid for by personal data surrendered automatically with every click.

In finance, meanwhile, optimists foresee a theoretical end to all stock-market crashes, thanks to insights derived from huge-scale data-crunching, while others predict an automated, algorithmic road to ruin. Similarly, the cost and efficiency of healthcare provision are set to be radically transformed for the better with access to massive amounts of data – likewise the development of new drugs and treatments. But what about the mining of medical data without patient consent? So the debate goes on.

One aspect of Big Data, however, is beyond question: it is indeed very big, and it's getting bigger by the millisecond. An IBM report in September estimated that 2.5 quintillion bytes of data are created every day (that's 25 followed by 17 zeros, or roughly 10 million laptop hard drives) and that 90 per cent of the world's data has been generated in the past two years: everything from geo-tagged phone texts and tweets to credit-card transactions and uploaded videos. By 2020, it's thought that the number of bytes will be 57 times the number of grains of sand on all the world's beaches.
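The comparisons in that paragraph can be checked with simple arithmetic. The Python sketch below uses only the numbers quoted above; the per-drive capacity is derived from them rather than taken from the IBM report, and the growth calculation assumes, for illustration only, that no older data was deleted during the two years.

```python
from fractions import Fraction

# Figures quoted in the IBM estimate above
BYTES_PER_DAY = 25 * 10**17      # 2.5 quintillion: 25 followed by 17 zeros
LAPTOP_DRIVES = 10_000_000       # "roughly 10 million laptop hard drives"
RECENT_SHARE = Fraction(9, 10)   # "90 per cent ... in the past two years"


def gigabytes_per_drive() -> int:
    """Capacity (decimal GB) each laptop drive would need to hold one day's data."""
    return BYTES_PER_DAY // LAPTOP_DRIVES // 10**9


def implied_growth_over_two_years() -> Fraction:
    """If 90% of today's data is under two years old, and (assumption) nothing
    older was deleted, the stock two years ago was 10% of today's: tenfold growth."""
    return 1 / (1 - RECENT_SHARE)


if __name__ == "__main__":
    print(gigabytes_per_drive(), "GB per drive")  # 250 GB per drive
    growth = implied_growth_over_two_years()
    print(growth, "x in two years, about", round(float(growth) ** 0.5, 2), "x a year")
```

So each "laptop hard drive" in the comparison is a 250GB disk, and the 90 per cent figure implies the world's data grew roughly threefold each year.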

So what's actually going on at the coalface of Big Data, a code-centric world of striping, load-balancing, clustering and massively parallel processing? What do the analysts working with Big Data say it's going to do for us?

"You get a fuller picture of the phenomenon you're interested in, with more dimensions, and that lets you derive greater insights," says Big Data pioneer Doug Cutting, chief architect at enterprise software company Cloudera and founder of the popular open-source Big Data tool Hadoop. Cutting's work on internet search technology for Yahoo during the mid-2000s provided the ideal proving ground for combining vastly increased computing power with huge and diverse datasets. "And from that we've seen a new style of computing emerge."

The revolutionary effects of this new approach cannot be overstated, especially within the scientific community. For Brad Voytek, professor of computational cognitive science and neuroscience at the University of California, San Diego, and "data evangelist" for app-based taxi service Uber, Big Data has had a profound effect on the traditional scientific method. "You can sweep through huge amounts of data and come up with new observations," he says. "That's where the power of Big Data comes in. It's automating the observation process. It's making everything easier but in a way that few people yet understand. It's going to dramatically speed up the scientific process and people have been doing some really cool stuff with it."
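Cutting's Hadoop is built around the MapReduce model, one concrete form of the striping and massively parallel processing mentioned above: data is split across the machines of a cluster, each machine processes its own share (the map step), and the partial results are grouped by key and combined (the reduce step). The sketch below imitates that flow on a single machine with a word count, the customary first example. It is not Hadoop's API, whose native interface is Java; the "nodes" here are plain Python lists, and the round-robin split is a simplified stand-in for how HDFS distributes (and replicates) data blocks.

```python
from collections import defaultdict
from itertools import chain


def stripe(records, n_nodes):
    """Spread records round-robin across n_nodes, as a cluster spreads data blocks."""
    nodes = [[] for _ in range(n_nodes)]
    for i, record in enumerate(records):
        nodes[i % n_nodes].append(record)
    return nodes


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word on this node's lines."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, whichever node produced them."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, n_nodes=3):
    nodes = stripe(lines, n_nodes)
    # On a real cluster each map_phase call would run on a different machine.
    mapped = [map_phase(node_lines) for node_lines in nodes]
    return reduce_phase(shuffle(chain.from_iterable(mapped)))


if __name__ == "__main__":
    docs = ["big data is big", "data about data", "small data"]
    print(word_count(docs))  # {'big': 2, 'data': 4, 'is': 1, 'about': 1, 'small': 1}
```

On a real cluster the costly step is the shuffle, which moves intermediate results across the network; running each map step on the machine that already holds its data is what lets this style of computing scale.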

Michael Schmidt, founder and chief executive of American "machine-learning" start-up Nutonian, established a Big Data landmark when, in partnership with robotics engineer Hod Lipson at Cornell University, New York, he created Eureqa – a piece of software that deduced Newton's Second Law of Motion by analysing data from the chaotic movements of a double pendulum. What took Newton years, the Eureqa algorithm accomplished in a matter of hours. With Nutonian, Schmidt is now opening up that Big Data technology beyond the college lab.
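Eureqa is built on symbolic regression: instead of fitting the coefficients of a formula chosen in advance, it searches over the formulas themselves for the one that best explains the data. The toy Python sketch below shows the idea on hypothetical, noise-added measurements of force, mass and acceleration rather than on double-pendulum data. Eureqa evolves its candidate expressions; here, for simplicity, six hand-picked candidates are just scored by mean squared error.

```python
import random

# Hypothetical measurements: force F (newtons), mass m (kg) and the resulting
# acceleration a (m/s^2), generated from a = F / m with about 2% noise.
rng = random.Random(0)
samples = []
for _ in range(50):
    m = rng.uniform(1.0, 10.0)
    F = rng.uniform(1.0, 100.0)
    a = F / m * rng.uniform(0.98, 1.02)
    samples.append((F, m, a))

# A tiny, hand-picked search space of candidate formulas for a.
CANDIDATES = {
    "F + m": lambda F, m: F + m,
    "F - m": lambda F, m: F - m,
    "F * m": lambda F, m: F * m,
    "F / m": lambda F, m: F / m,
    "m / F": lambda F, m: m / F,
    "F**2 / m": lambda F, m: F**2 / m,
}


def mean_squared_error(formula, data):
    """Average squared gap between a formula's predictions and the measured a."""
    return sum((formula(F, m) - a) ** 2 for F, m, a in data) / len(data)


def best_formula(data):
    """Name of the candidate formula with the lowest error on the data."""
    return min(CANDIDATES, key=lambda name: mean_squared_error(CANDIDATES[name], data))


print(best_formula(samples))  # F / m
```

The winner, a = F / m, is Newton's second law rearranged. A practical tool must also weigh a formula's accuracy against its complexity, since a long enough expression can be made to fit almost any noise.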

"We want to accelerate the process that scientists go through, to help you discover very deep principles from data," he says. "We want to explain how things work." The range of Eureqa's uses couldn't be more striking, from the construction of better warplanes to helping save the lives of infants. Schmidt is currently working with the United States Air Force, analysing the strength of advanced super-alloys used in engine components. "They are really interested in anticipating failures – knowing when things are going to break, explode or stop working. We were able to show them the most important things that go into a failure of a particular engine part, at a finer resolution than ever before."

Eureqa has also been used to help discover the optimal moment to remove breathing tubes from prematurely born babies. "It's really critical when you remove that tube, and allow the child to start breathing on its own," says Schmidt. "Premature babies are hooked up to every monitoring device you can imagine and we were able to take that data and winnow it down to a few of these key metrics that drive the future health of the babies. Which is pretty neat."
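Schmidt does not spell out how the monitoring data was winnowed, so the Python sketch below shows one common, deliberately simple approach, not Nutonian's: score each measurement channel by the strength of its correlation with the outcome and keep the top few. The channel names and data are entirely synthetic; the outcome is built from two of five channels, so a working ranking should recover those two.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def top_metrics(readings, outcome, k):
    """Return the k channels most strongly correlated (either sign) with the outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in readings.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic data: five hypothetical monitor channels, 200 readings each.
rng = random.Random(1)
N = 200
CHANNELS = ["channel_a", "channel_b", "channel_c", "channel_d", "channel_e"]
readings = {name: [rng.gauss(0, 1) for _ in range(N)] for name in CHANNELS}

# The outcome is driven by channel_b and channel_d only, plus noise.
outcome = [2.0 * readings["channel_b"][i] - 1.5 * readings["channel_d"][i] + rng.gauss(0, 0.5)
           for i in range(N)]

print(top_metrics(readings, outcome, k=2))
```

Correlation only catches straight-line relationships between one channel and the outcome; symbolic regression of the Eureqa kind can also pick up combinations and non-linear effects.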

Harnessed to Big Data, this kind of analysis becomes the work of hours and minutes. "Traditionally you could spend years before you could conclude on a result. What's changed is that we have these huge datasets. You can rapidly accelerate the entire discovery process."

While the benefits of this revolutionary increase in analytical speed are clear, Big Data is often inseparable from its source and context, especially in the public realm, where ethical concerns are paramount. Justin Keen, professor of health politics at the University of Leeds, co-authored a June 2013 paper published in Policy & Internet, the journal of the Oxford Internet Institute. In it, he addressed issues of privacy and access in relation to Big Health Data. "The potential for much greater exploitation of data held by government departments in England and all around the world is real," he says. "We just haven't got proper governance arrangements at the moment – we don't know what rules should govern what NHS data gets published, and in what sort of format."

Early in 2013, Health Secretary Jeremy Hunt set the goal of a paperless NHS by April 2018, in line with programmes including a scheme that links patient data across different parts of the NHS. It is hoped that the resulting increase in preventative treatments, coupled with improvements in health management, will save billions and improve the quality of healthcare. The sticking point is patient confidentiality.

"I'm very happy to see that in the past month or two, senior civil servants have actually put the brakes on," says Keen. "Releases of data through and other channels are actually going to be slowed down until we've got these governance arrangements right. But we're not going to get the releases of data that advocates are hoping for as early as they might have hoped for it."

Rather than resisting this slow-down, the Big Data community appears to be echoing Keen's note of caution. "From my perspective as a person who works in data, of course I want as much as I can get, because the more data you've got, the more interesting things you can do with it," says Francine Bennett, chief executive and co-founder of London-based Big Data specialists Mastodon C, which mined available data to co-create the CDEC Open Health Data Platform, a showcase for insights generated by Big Health Data. "However, as a person who's knowledgeable about data – and as a citizen of the UK whose health data is in these systems – I know that it could be enormously damaging to privacy to release things which shouldn't be released. It's hard to put the genie back in the bottle. I'm keen for it to be done in a measured way."

Gil Elbaz, founder and chief executive of open-data platform Factual, began his career as a database engineer in Silicon Valley in the 1990s before co-founding Applied Semantics, acquired by Google in 2003 for $102m. Applied Semantics developed AdSense, the technology that matches online advertisements to the pages being browsed and the person browsing them. "The approach we took to the contextual targeting of ads was all rooted in processing huge amounts of data," says Elbaz, whose Factual company website affirms his core belief in "making data accessible".

"We take data privacy very seriously, and if somebody's data is theirs, they should have the right to keep it private. That being said, there are significant opportunities where data shouldn't be kept fully private, because it's to society's benefit for it to be open," he says, citing David Cameron's October 2013 announcement, at the Open Government Partnership summit, of a public register of business ownership. "Data at Factual is primarily business data," says Elbaz. "These businesses want it to be available."

Even where the privacy question is not an issue, Elbaz is concerned that information can get trapped in hard-to-reach databases. "Too often today data is not accessible. For example, why is it that software can't automatically check – given the age of a patient and any drug – whether a dosage is healthy or lethal? Why can't it be flagged? The reason is that there is no open API (application programming interface: a defined way for one program to request data from another) that has drugs and dosage ranges. It does not exist. Is there a database? Yes. But it'll take a long time even to find the right person to buy that data from. To me, this is insane."
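The check Elbaz describes is simple once the data can be reached. The Python sketch below assumes a hypothetical open API that returns maximum daily doses by age band as JSON; the drug name, field names and numbers are invented placeholders, not medical data, and a real prescribing check would depend on far more than age.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical response from an imagined open dosage API. The drug name,
# field names and numbers are invented placeholders, NOT real medical data.
SAMPLE_RESPONSE = """
{"drug": "example_drug_x",
 "ranges": [{"min_age": 0,  "max_age": 11,  "max_daily_mg": 200},
            {"min_age": 12, "max_age": 120, "max_daily_mg": 1000}]}
"""


@dataclass(frozen=True)
class DoseRange:
    min_age: int         # years, inclusive
    max_age: int         # years, inclusive
    max_daily_mg: float  # highest acceptable daily dose for this age band


def parse_ranges(payload: str) -> List[DoseRange]:
    """Turn the JSON payload into a list of age-banded dose limits."""
    return [DoseRange(**r) for r in json.loads(payload)["ranges"]]


def check_dose(ranges: List[DoseRange], age: int, daily_mg: float) -> str:
    """Return 'ok', 'flag' (dose above the limit) or 'no data' (no matching band)."""
    for r in ranges:
        if r.min_age <= age <= r.max_age:
            return "ok" if daily_mg <= r.max_daily_mg else "flag"
    return "no data"


ranges = parse_ranges(SAMPLE_RESPONSE)
print(check_dose(ranges, age=8, daily_mg=500))   # flag
print(check_dose(ranges, age=30, daily_mg=500))  # ok
```

The logic is a few lines; the hard part, as Elbaz argues, is that the table behind it is not openly available.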

So where's it all leading us? For some, the ultimate goal of Big Data has been defined as a kind of supreme foresight: an ability to predict what people want before they know they want it. Elbaz takes a more functional view. "My holy grail is that if any piece of software needs access to information, it can find that access at a reasonable cost," he says. "To me, it is not crazy rocket science – it's the basic fabric of how a global information system should work."

For Schmidt, the quest for enlightenment has only just begun. "A lot of promises have been made for Big Data in the hope that it has this enormous value, and we're starting to chip away a little at that, but there's still so much to be done."

Doug Cutting, however, has little interest in the notion that Big Data will supply some kind of predictive super-power. "I'm an engineer. I focus down on the plumbing. I think I have a more concrete imagination about what is possible. I don't believe it's possible to have an oracle that can predict what I'll be interested in doing tomorrow. Moreover, I find surprises invigorating; I'd hate to lose spontaneity in the world."

However, he adds, certain kinds of things can be done better. "To me, the holy grail is removing limitations and being able to achieve the interconnectedness that we want; to be able to take advantage of all the data and do all the things we imagine are possible. I don't think we want to get there overnight as a society. We need to embrace these things and understand what we want to happen and what we don't want to happen – build the right societal, legal and business structures. We need to evolve."

Three eye-catching Big Data ventures

1. Open Data Institute

Aim: free data for all

Co-founded by Sir Tim Berners-Lee, the inventor of the World Wide Web, to encourage the exploitation of freely available data – aka "open data" – the not-for-profit Open Data Institute has positioned itself as both a catalyst for data innovation and a global hub for data expertise. Based in Shoreditch, east London, the ODI oversees a network of collaborative international "nodes", including Dubai and Buenos Aires, and has incubated a growing bunch of Big Data start-ups – for example, Mastodon C (see main feature), which identified potential NHS savings of about £200m by crunching data relating to branded and generic drugs; and Placr, which analyses real-time transportation and timetable information to improve daily travel.

2. The Human Brain Project

Aim: to reveal the workings of human consciousness

Flush with €1bn in funding, the Human Brain Project is a 10-year quest to reveal the hidden workings of consciousness. The scale of this task is so immense – the brain has around 100 trillion neural connections – that many still doubt it can be achieved, but Switzerland-based project leader Henry Markram believes his collaborative Big Data approach, using statistical simulations and vast supercomputing power across "swarms" of researchers, might do the trick. One aspect of the plan involves mining a huge amount of available data on mental disorders from public hospitals as well as pharmaceutical company databases; algorithms will then isolate revealing patterns and connections. In a decade's time, the neural picture should be much clearer.

3. IBM's Computational Creativity

Aim: to make computers 'creative'

Following a line of computer evolution that runs from Deep Blue (which beat Garry Kasparov at chess in 1997) through Watson (which beat human opponents on the US quiz show Jeopardy! in 2011), IBM has continued its ingenious manipulation of huge datasets with a system designed to generate creativity. Big-data analytics techniques have been deployed by IBM's Thomas J Watson Research Center to create new food recipes – what you might call technouvelle cuisine – mined from sources including Wikipedia and Fenaroli's Handbook of Flavor Ingredients, then tweaked with an algorithm designed to add creativity to matched ingredients. The results (from Vietnamese apple kebab to Cuban lobster bouillabaisse) have impressed human chefs.
