The end of history

In centuries to come, what will scholars be able to learn of the great events and discoveries of our time? As paper records are replaced by unstored emails and obsolete software, we may be entering a new digital dark age. Charles Arthur reports

Spot the odd one out: the Magna Carta, the Dead Sea Scrolls, and the contents of Downing Street's email inbox today. Of course, it's Downing Street's inbox: in 200 years or so, it will take the most valiant effort to read those emails in whatever form they have been preserved. By contrast, you can be pretty confident that there will still be scholars in AD 2300 who will be interested in, and able to read, the parchment and paper documents that have already survived for centuries.

That sort of thing has got scientists worried. This month, three members of the University of Texas have noted in a letter to the science journal Nature that, compared with the ease of reading the Rosetta Stone (which held the key to deciphering Egyptian hieroglyphs), "it is nothing short of terrifying to contemplate the probable magnitude of lost records as yesterday's electronic storage devices become incompatible with today's software applications."

Anyone who bought a computer a decade - or even a few years - ago will be familiar with this. Many came with floppy drives or even more arcane storage systems (such as the cassette tapes used by the Sinclair home computers of the 1980s), which made perfect sense then but now present a formidable obstacle to retrieval. If your great novel was written on one of the three-inch discs that the popular Amstrad PCW all-in-one word processor used in the late 1980s, you'll have a struggle to get it into a format that anyone can read now.

Indeed, the rise of the digital age has left some subtle possibilities for calamity in its wake. In June 2002, the Ivar Aasen Centre of Language and Culture, a Norwegian literary museum, broadcast a cry for help on the internet after discovering it couldn't get into its computerised catalogue system: the man who administered it, and had the essential password to do so, had died, taking the password with him.

"A lot of our culture is moving into digital form," notes Michael Day, a research officer at UKOLN, a British centre for digital information management expertise based at the University of Bath. "And don't forget scientific data, which has to be available in the future so that we can confirm the experiments that are done today. But it's hard to know if there's long-term persistence of the data that we need. The web is becoming such an interlinked part of our culture that there's no way to record every piece of it."

In 2000, the University of California, Berkeley published a study showing that printed content represents just 0.003 per cent (three parts to every 100,000) of the world's total information - most of the remainder is stored digitally. That's a dramatic change from a standing start in the 1950s with IBM's "punched card" computer storage system. So where is that information? On hard drives that come with one-year warranties and a quoted "mean time before failure" of perhaps a million hours. Computer geeks, however, know that three things in life are certain - death, taxes, and that your hard drive will fail.
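
As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not a manufacturer's method: it assumes a constant failure rate (the simple exponential model) and takes "perhaps a million hours" at face value. The function names are invented for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_in_years(mtbf_hours: float) -> float:
    """Convert a quoted mean-time-before-failure figure from hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def failure_probability(mtbf_hours: float, years: float) -> float:
    """Chance that a single drive fails within `years`.

    Assumption: failures arrive at a constant rate of 1/MTBF per hour,
    so the survival probability is exp(-hours / MTBF).
    """
    hours = years * HOURS_PER_YEAR
    return 1.0 - math.exp(-hours / mtbf_hours)


MTBF_HOURS = 1_000_000  # the article's "perhaps a million hours"

print(f"MTBF in years: {mtbf_in_years(MTBF_HOURS):.1f}")
for years in (1, 5, 30):
    chance = failure_probability(MTBF_HOURS, years) * 100
    print(f"Chance of failure within {years} year(s): {chance:.1f}%")
```

On these assumptions, a million hours is a little over a century, and a single drive has a bit under a one per cent chance of failing in any given year. Real drives wear out with age, so a constant-rate model probably flatters them over long spans, which is rather the geeks' point.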

What options are there, then, to save us from a future that can't read the past? One is Britain's Digital Preservation Coalition, which was launched last year with the backing of 19 organisations including the Public Record Office, the British Library and the University of London.

At the time, they were shaking their heads over what seemed like the classic example of digital work vanishing into the technological darkness: the BBC's Domesday Project, a multimedia extravaganza that was boiled down onto a pair of interactive videodiscs that could be viewed on a BBC Acorn computer. The intention was to celebrate the 900th anniversary of the original Domesday Book. More than a million people contributed, from researchers to schoolchildren. It was meant to give a social snapshot of Britain in the mid-1980s.

Yet, in February last year, it seemed that the original Domesday Book had survived the test of time rather better than its descendant. "The problems of software and hardware have now rendered the system obsolete," said Loyd Grossman, chairman of the DPC. "With few working examples left, the information on this incredible historical object will soon disappear for ever."

Fortunately, since then programmers have dug up the specification of the original discs and created a programme that makes a standard PC behave like a BBC Acorn computer. Suddenly, the discs can be viewed again.

However, warned Paul Wheatley, manager of the group at Leeds University who worked out this programme, "we must invest wisely in developing an infrastructure to preserve our digital records before it is too late." But when is it "too late"? The example he himself has set in fixing the BBC's Domesday Project suggests that time is not necessarily of the essence. And it's not as though the people of past ages ever paid much thought to what they were leaving us.

George Holmes, former Chichele Professor of Medieval History at All Souls College, Oxford, notes that the things that have survived from history aren't necessarily that interesting if you want to know about day-to-day culture (as our descendants presumably will). "It's very accidental what things that record people's personalities survive," he says. "Serious writing only begins around the 12th century with the development of literacy. You find more things from the Renaissance in Italy than England, because the Italians were keen letter-writers."

The documents that do remain, though, are the administrative and financial ones. Walk around any junk shop today, and you'll probably find a framed mortgage from 100 years ago. That is because, says Professor Holmes, "such things have administrative value. In England and Europe, records of the courts and the Treasuries survive pretty fully. But they aren't much good for working out what people were like."

Certainly, the Digital Preservation Coalition isn't looking to save everything; it recognises that it can't. Besides, with the amount of information that we store more than doubling every year, the challenge would only grow greater. The Public Record Office receives only three per cent of the Civil Service's records (and those must be 30 years old before they are submitted). More and more arrive in the form of computer tapes; in time, they will be floppy disks from an age whose computers are dead. How to deal with that? David Ryan, the head of archive services, says the PRO is assembling its own library of emulation programmes, just like the one used to bring the Domesday Project back to life. But, when he is asked the best way to preserve that data for future generations, choosing from the alphabet soup of possible encoding options available today - ASCII, HTML, PDF, XML, Postscript - Ryan simply says: "I don't know."

The enormous popularity of email is, however, sure to have a significant effect on at least one future trade: biography. "It will be very difficult to write biographies in the future in the same way," says Professor Holmes. "People have largely given up writing letters to each other; so much so, in fact, that it's probably going to be a serious problem for future historians." Then again, he adds, go back just two or three centuries and you would still have trouble finding enough letters to get a clear idea about someone. (After all, while Pepys's biography is enormously detailed, it's virtually impossible to write an account of his wife's life. The difference is that, because he wrote diaries and worked in the Civil Service, much of his work had a chance of being preserved.)

And the email messages written by our present and future history-makers to their peers, even the insightful ones, might reside on a machine that is later binned, stolen, wiped clean and sold on, or just destroyed. Or they'll be text messages, whose life is as short as the mobile phone they reside on - perhaps even as short as a few months.

Certainly, the shift to email is what has principally worried the University of Texas team. "Worldwide, the number of corporate mailboxes is projected to grow from 131 million in 2001 to 225 million in 2005; daily email traffic is anticipated to grow from 9.7 billion pieces in 2000 to more than 35 billion in 2005. Little of this staggering volume is translated to hard copy."

Of course, given that roughly half of those emails at present are "spam" - junk mail offers of sex and drugs (or sex drugs) mostly - perhaps they don't need to be.

In fact, archivists of whatever data is going to be preserved have hardened themselves to throwing a fair amount away. "I have no problem with not collecting everything," says Dr Day of UKOLN. "But there is important data that needs to be collected all the same."

Overall, though, he's hopeful that people who value whatever data they have - home movies, photographs, speeches, tapes, and so on - will find a way to transfer it onto a more current medium. Once something is in digital format, at least, you can transfer it onto newer media pretty much forever. The emulation programmes are also becoming easier to write: those for a home computer of the 1980s will fit comfortably into a mobile phone today, for example.

And there are always other hopeful signs that data won't get lost for ever. The Ivar Aasen Centre's cry for help drew 100 offers of expert assistance, but the problem was actually solved within five hours by a Swedish hacker. It turned out that the missing password was the dead man's name - spelt backwards.