The number crunch: Will Big Data transform your life - or make it a misery?

 

The age of Big Data is upon us. Fuelled by an incendiary mix of overblown claims and dire warnings, the public debate over the handling and exploitation of digital information on an astronomically large scale has been framed in stark terms: on one side are transformative forces that could immeasurably improve the human condition; on the other, powers so subversive and toxic that a catastrophic erosion of fundamental liberties looks inevitable.

The tension between these opposites has marooned the discussion of Big Data. It is stuck somewhere between Bletchley Park – the former Government Communications Headquarters (GCHQ) location where the godfather of the computational universe, Alan Turing, primed today's Big Data explosion during the Second World War – and the satirical tomfoolery of South Park, which recently portrayed the living core of all data as an incarcerated Father Christmas cruelly wired up to a machine by the US's National Security Agency (NSA).

We know from Edward Snowden's widely publicised whistle-blowing revelations that the NSA – in collusion with GCHQ – lifted vast amounts of data from Google and Yahoo, under the once-top-secret codename, Muscular. At the same time, we're told that the potential for beneficial insights mined from anonymous, adequately protected data is enormous.

Big Data helps us find things we "might like" to buy on Amazon, for example, but it has also left us vulnerable to surveillance by state and other agencies. Companies such as Google and Facebook are essentially Big Data businesses, whose staggering profitability stems from the application of data analysis to advertising: these "free" services are paid for by personal data surrendered automatically with every click.

In finance, meanwhile, optimists foresee a theoretical end to all stock-market crashes, thanks to insights derived from huge-scale data-crunching, while others predict an automated, algorithmic road to ruin. Similarly, the cost and efficiency of healthcare provision are set to be radically transformed for the better with access to massive amounts of data – likewise the development of new drugs and treatments. But what about the mining of medical data without patient consent? So the debate goes on.

One aspect of Big Data, however, is beyond question: it is indeed very big, and it's getting bigger by the millisecond. An IBM report in September estimated that 2.5 quintillion bytes of data are created every day (that's 25 followed by 17 zeros, or roughly 10 million laptop hard drives) and that 90 per cent of the world's data has been generated in the past two years: everything from geo-tagged phone texts and tweets to credit-card transactions and uploaded videos. By 2020, it's thought that the number of bytes will be 57 times greater than all the grains of sand on the world's beaches.
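The arithmetic behind those figures can be checked in a few lines. The sketch below is only a sanity check: the 250 GB capacity per laptop hard drive is an assumption chosen for illustration, not a number taken from the IBM report.

```python
# Sanity check of the "2.5 quintillion bytes a day" figure.
# 2.5 quintillion = 2.5 x 10^18, written here as an exact integer.
BYTES_PER_DAY = 25 * 10**17

# ASSUMPTION: a typical laptop hard drive holds 250 GB (decimal gigabytes).
# This capacity is illustrative and does not come from the article.
ASSUMED_DRIVE_BYTES = 250 * 10**9


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Return how many drives are needed to hold total_bytes, rounded up."""
    return -(-total_bytes // drive_bytes)  # ceiling division on integers


if __name__ == "__main__":
    print(f"{BYTES_PER_DAY:,} bytes per day")
    print(f"{drives_needed(BYTES_PER_DAY, ASSUMED_DRIVE_BYTES):,} drives per day")
```

Under that assumed drive size, the daily total fills 10 million drives, which matches the comparison in the text.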

So what's actually going on at the coalface of Big Data, a code-centric world of striping, load-balancing, clustering and massively parallel processing? What do the analysts working with Big Data say it's going to do for us?

"You get a fuller picture of the phenomenon you're interested in, with more dimensions, and that lets you derive greater insights," says Big Data pioneer Doug Cutting, chief architect at enterprise software company Cloudera and founder of the popular open-source Big Data tool Hadoop. Cutting's work on internet search technology for Yahoo during the mid-2000s provided the ideal proving ground for combining vastly increased computing power with huge and diverse datasets. "And from that we've seen a new style of computing emerge."

The revolutionary effects of this new approach cannot be overstated, especially within the scientific community. For Brad Voytek, professor of computational cognitive science and neuroscience at the University of California San Diego, and "data evangelist" for app-based taxi service Uber, Big Data has had a profound effect on the traditional scientific method. "You can sweep through huge amounts of data and come up with new observations," he says. "That's where the power of Big Data comes in. It's automating the observation process. It's making everything easier but in a way that few people yet understand. It's going to dramatically speed up the scientific process and people have been doing some really cool stuff with it."

Michael Schmidt, founder and chief executive of American "machine-learning" start-up Nutonian, established a Big Data landmark when, in partnership with robotics engineer Hod Lipson at Cornell University, New York, he created Eureqa – a piece of software that deduced Newton's Second Law of Motion by analysing data from the chaotic movements of a double pendulum. What took Newton years, the Eureqa algorithm accomplished in a matter of hours. With Nutonian, Schmidt is now opening up that Big Data technology beyond the college lab.
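The general idea of deducing a law from measurements can be illustrated with a toy example. The sketch below is not Eureqa's algorithm, which the article does not describe. It scores a few hand-picked candidate formulas against synthetic data and keeps the one with the smallest error. The data, the candidate list and all function names are hypothetical.

```python
# Toy "law discovery": score a handful of candidate formulas against data
# and keep the best fit. This is NOT how Eureqa works internally; it only
# illustrates the idea of searching for a formula that explains measurements.
from itertools import product

# HYPOTHETICAL synthetic observations generated from F = m * a.
MASSES = [0.5, 1.0, 2.0, 4.0]
ACCELERATIONS = [0.1, 1.5, 3.0, 9.8]
OBSERVATIONS = [(m, a, m * a) for m, a in product(MASSES, ACCELERATIONS)]

# HYPOTHETICAL candidate laws: a readable name mapped to a function of (m, a).
CANDIDATES = {
    "m + a": lambda m, a: m + a,
    "m * a": lambda m, a: m * a,
    "m / a": lambda m, a: m / a,
    "m * a**2": lambda m, a: m * a**2,
}


def mean_squared_error(formula, observations):
    """Average squared gap between a formula's prediction and the measured force."""
    errors = [(formula(m, a) - force) ** 2 for m, a, force in observations]
    return sum(errors) / len(errors)


def best_formula(candidates, observations):
    """Return the name of the candidate with the lowest mean squared error."""
    return min(
        candidates,
        key=lambda name: mean_squared_error(candidates[name], observations),
    )


if __name__ == "__main__":
    print(best_formula(CANDIDATES, OBSERVATIONS))
```

A real system would have to generate and refine candidate expressions itself, and cope with noisy measurements, rather than choose from a short fixed list.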

"We want to accelerate the process that scientists go through, to help you discover very deep principles from data," he says. "We want to explain how things work." The range of Eureqa's uses couldn't be more striking, from the construction of better warplanes to helping save the lives of infants. Schmidt is currently working with the United States Air Force, analysing the strength of advanced super-alloys used in engine components. "They are really interested in anticipating failures – knowing when things are going to break, explode or stop working. We were able to show them the most important things that go into a failure of a particular engine part, at a finer resolution than ever before."

Eureqa has also been used to help discover the optimal moment to remove breathing tubes from prematurely born babies. "It's really critical when you remove that tube, and allow the child to start breathing on its own," says Schmidt. "Premature babies are hooked up to every monitoring device you can imagine and we were able to take that data and winnow it down to a few of these key metrics that drive the future health of the babies. Which is pretty neat."

Harnessed to Big Data, this kind of analysis becomes the work of hours and minutes. "Traditionally you could spend years before you could conclude on a result. What's changed is that we have these huge datasets. You can rapidly accelerate the entire discovery process."

While the benefits of this revolutionary increase in analytical speed are clear, Big Data is often inseparable from its source and context, especially in the public realm, where ethical concerns are paramount. Justin Keen, professor of health politics at the University of Leeds, co-authored a June 2013 paper published in Policy & Internet, the journal of the Oxford Internet Institute. In it, he addressed issues of privacy and access in relation to Big Health Data. "The potential for much greater exploitation of data held by government departments in England and all around the world is real," he says. "We just haven't got proper governance arrangements at the moment – we don't know what rules should govern what NHS data gets published, and in what sort of format."

Early in 2013, Health Secretary Jeremy Hunt set the goal of a paperless NHS by April 2018, in line with programmes including care.data, which links patient data across different parts of the NHS. It is hoped that the resulting increase in preventative treatments, coupled with improvements in health management, will save billions and improve the quality of healthcare. The sticking point is patient confidentiality.

"I'm very happy to see that in the past month or two, senior civil servants have actually put the brakes on," says Keen. "Releases of data through care.data and other channels are actually going to be slowed down until we've got these governance arrangements right. But we're not going to get the releases of data that advocates are hoping for as early as they might have hoped for it."

Despite this slow-down, the Big Data community appears to be echoing Keen's note of caution. "From my perspective as a person who works in data, of course I want as much as I can get, because the more data you've got, the more interesting things you can do with it," says Francine Bennett, chief executive and co-founder of London-based Big Data specialists, Mastodon C, which mined available data to co-create the CDEC Open Health Data Platform, a showcase for insights generated by Big Health Data. "However, as a person who's knowledgeable about data – and as a citizen of the UK whose health data is in these systems – I know that it could be enormously damaging to privacy to release things which shouldn't be released. It's hard to put the genie back in the bottle. I'm keen for it to be done in a measured way."

Gil Elbaz, founder and chief executive of open-data platform Factual, began his career as a database engineer in Silicon Valley in the 1990s before co-founding Applied Semantics, acquired by Google in 2003 for $102m. Applied Semantics developed AdSense, the technology that matches online advertisements to the pages being browsed and the person browsing them. "The approach we took to the contextual targeting of ads was all rooted in processing huge amounts of data," says Elbaz, whose Factual company website affirms his core belief in "making data accessible".

"We take data privacy very seriously, and if somebody's data is theirs, they should have the right to keep it private. That being said, there are significant opportunities where data shouldn't be kept fully private, because it's to society's benefit for it to be open," he says, citing David Cameron's October 2013 announcement, at the Open Government Partnership summit, of a public register of business ownership. "Data at Factual is primarily business data," says Elbaz. "These businesses want it to be available."

Even where the privacy question is not an issue, Elbaz is concerned that information can get trapped in hard-to-reach databases. "Too often today data is not accessible. For example, why is it that software can't automatically check – given the age of a patient and any drug – whether a dosage is healthy or lethal? Why can't it be flagged? The reason is that there is no open API (Application Programming Interface, or app-creation tool) that has drugs and dosage ranges. It does not exist. Is there a database? Yes. But it'll take a long time even to find the right person to buy that data from. To me, this is insane."
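To make Elbaz's example concrete, here is a minimal sketch of the kind of check he describes, assuming dosage ranges were available to software. Everything in it is hypothetical: the drug name, the age bands and the numbers are placeholders invented for illustration and carry no medical meaning. The in-memory table stands in for the open API that, as Elbaz notes, does not exist.

```python
# Sketch of an automatic dosage check, assuming dosage ranges were openly
# available. All data below is a HYPOTHETICAL placeholder with no medical
# meaning; no real API or real drug information is used.
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseRange:
    min_age: int        # youngest age (years) this range applies to
    max_age: int        # oldest age (years) this range applies to
    max_safe_mg: float  # placeholder upper limit for a single dose


# HYPOTHETICAL lookup table standing in for an open dosage API.
DOSE_TABLE = {
    "exampledrug": [
        DoseRange(min_age=0, max_age=11, max_safe_mg=100.0),
        DoseRange(min_age=12, max_age=120, max_safe_mg=400.0),
    ],
}


def check_dose(drug: str, age: int, dose_mg: float) -> str:
    """Return 'ok', 'flag' (dose above the limit) or 'unknown' (no data)."""
    for dose_range in DOSE_TABLE.get(drug.lower(), []):
        if dose_range.min_age <= age <= dose_range.max_age:
            return "ok" if dose_mg <= dose_range.max_safe_mg else "flag"
    return "unknown"


if __name__ == "__main__":
    print(check_dose("ExampleDrug", 8, 150.0))
    print(check_dose("ExampleDrug", 30, 150.0))
```

The check itself is trivial once the ranges are reachable, which is Elbaz's point: the difficulty lies in getting access to the data, not in the software.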

So where's it all leading us? For some, the ultimate goal of Big Data has been defined as a kind of supreme foresight: an ability to predict what people want before they know they want it. Elbaz takes a more functional view. "My holy grail is that if any piece of software needs access to information, it can find that access at a reasonable cost," he says. "To me, it is not crazy rocket science – it's the basic fabric of how a global information system should work."

For Schmidt, the quest for enlightenment has only just begun. "A lot of promises have been made for Big Data in the hope that it has this enormous value, and we're starting to chip away a little at that, but there's still so much to be done."

Doug Cutting, however, has little interest in the notion that Big Data will supply some kind of predictive super-power. "I'm an engineer. I focus down on the plumbing. I think I have a more concrete imagination about what is possible. I don't believe it's possible to have an oracle that can predict what I'll be interested in doing tomorrow. Moreover, I find surprises invigorating; I'd hate to lose spontaneity in the world."

However, he adds, certain kinds of things can be done better. "To me, the holy grail is removing limitations and being able to achieve the interconnectedness that we want; to be able to take advantage of all the data and do all the things we imagine are possible. I don't think we want to get there overnight as a society. We need to embrace these things and understand what we want to happen and what we don't want to happen – build the right societal, legal and business structures. We need to evolve."

Three eye-catching big data ventures

1. Open Data Institute

Aim: free data for all

Co-founded by Sir Tim Berners-Lee, the inventor of the World Wide Web, to encourage the exploitation of freely available data – aka "open data" – the not-for-profit Open Data Institute has positioned itself as both a catalyst for data innovation and a global hub for data expertise. Based in Shoreditch, east London, the ODI oversees a network of collaborative international "nodes", including Dubai and Buenos Aires, and has incubated a growing bunch of Big Data start-ups – for example, Mastodon C (see main feature), which identified potential NHS savings of about £200m by crunching data relating to branded and generic drugs; and Placr, which analyses real-time transportation and timetable information to improve daily travel. theodi.org

2. The Human Brain Project

Aim: to reveal the workings of human consciousness

Flush with €1bn in funding, the Human Brain Project is a 10-year quest to reveal the hidden workings of consciousness. The scale of this task is so immense – the brain has around 100 trillion neural connections – that many still doubt it can be achieved, but Switzerland-based project leader Henry Markram believes his collaborative Big Data approach, using statistical simulations and vast supercomputing power across "swarms" of researchers, might do the trick. One aspect of the plan involves mining a huge amount of available data on mental disorders from public hospitals as well as pharmaceutical company databases; algorithms will then isolate revealing patterns and connections. In a decade's time, the neural picture should be much clearer. humanbrainproject.eu

3. IBM's Computational Creativity

Aim: to make computers 'creative'

Following a line of computer evolution that runs from Deep Blue (which beat Garry Kasparov at chess in 1997) through Watson (which beat human opponents on the US quiz show Jeopardy! in 2011), IBM has continued its ingenious manipulation of huge datasets with a system designed to generate creativity. Big-data analytics techniques have been deployed by IBM's Thomas J Watson Research Center to create new food recipes – what you might call technouvelle cuisine – mined from sources including Wikipedia and Fenaroli's Handbook of Flavor Ingredients, then tweaked with an algorithm designed to add creativity to matched ingredients. The results (from Vietnamese apple kebab to Cuban lobster bouillabaisse) have impressed human chefs. research.ibm.com
