Remembrance of data past

As digital records increase, so does the need to keep them readable. Charles Arthur reports on a new drive to keep old information alive

Monday 04 March 2002 01:00 GMT

If you want to amaze a child used to playing on a games console, try telling them that computer games used to come on cassette tapes. In 1982, proud owners of the Sinclair ZX Spectrum (which boasted a stunning 16K – that's kilobytes – of memory in its basic configuration) would connect the audio output of the cassette recorder to the Spectrum's input; the program, recorded as a series of high and low tones, was then translated into data and loaded into memory.
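
Those "high and low tones" were in effect pulses of two different lengths. Here is a minimal Python sketch of the decoding step. It assumes the pulse lengths have already been measured from the audio and expressed in T-states (clock cycles of the Spectrum's 3.5MHz Z80). The timings are the commonly documented ROM values: about 855 T-states per pulse for a 0 bit and 1,710 for a 1 bit, two pulses per bit, most significant bit first. The function names and the midpoint threshold are illustrative assumptions. The pilot tone, sync pulses and checksum byte of a real tape block are ignored.

```python
# Hypothetical decoder for a Spectrum-style tape signal.
# Assumes pilot tone and sync pulses have already been stripped off.
ZERO_PULSE = 855    # pulse length (T-states) for a 0 bit, commonly documented ROM timing
ONE_PULSE = 1710    # pulse length (T-states) for a 1 bit
THRESHOLD = (ZERO_PULSE + ONE_PULSE) // 2  # pulses longer than this count as a 1


def pulses_to_bytes(pulses):
    """Turn a list of pulse lengths into bytes: two pulses per bit, MSB first."""
    if len(pulses) % 2:
        raise ValueError("expected an even number of pulses")
    bits = []
    for i in range(0, len(pulses), 2):
        pair_avg = (pulses[i] + pulses[i + 1]) / 2  # average the pair to tolerate jitter
        bits.append(1 if pair_avg > THRESHOLD else 0)
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):  # ignore any incomplete final byte
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def bytes_to_pulses(data):
    """Inverse: the ideal pulse train for these bytes (useful for testing)."""
    pulses = []
    for byte in data:
        for shift in range(7, -1, -1):
            length = ONE_PULSE if (byte >> shift) & 1 else ZERO_PULSE
            pulses += [length, length]
    return pulses


signal = bytes_to_pulses(b"HI")
noisy = [p + 40 for p in signal]  # playback running slightly slow stretches every pulse
print(pulses_to_bytes(noisy))     # b'HI'
```

Classifying each pulse against a midpoint threshold, rather than demanding exact lengths, is what lets a slightly stretched or wobbly tape still load.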

Doing that now might seem pointless; after all, you can nip out to the shops and buy a new game with far more bang for the buck than anything that ran on a Spectrum. True, you could probably pick up a Spectrum for £5 at a car boot sale, but you'll have to find one first.

What, though, if you were a historian, and the program you were looking for was the one originally used to arm nuclear missiles in the 1960s, fed on punched cards into computers that have long since been consigned to the scrap heap? What if you needed a working version of that computer to test the program? Suddenly, the problem doesn't seem so trivial. After all, it would be a major research coup to discover that there was once a bug in our scheme to eliminate the enemy.

And what about the public and private records being generated today – the letters that a famous author wrote on a PC that has been discontinued, using a program whose developers have long ago gone bust? And what of the e-mails being generated within this government that will one day have to be made public?

Getting a handle on the preservation of this digital data is the purpose of the Digital Preservation Coalition (DPC), which last week announced an action plan "to ensure that the digital information we are producing is not lost to current and future generations".

At the launch of the project, which has the backing of 19 UK organisations, including the Public Record Office (PRO), the Joint Information Systems Committee of the Higher and Further Education Funding Councils (JISC), the British Library and the University of London, a pertinent example was cited: the BBC Domesday Project. Made by the BBC to celebrate the 900th anniversary of the original Domesday Book, this multimedia project eventually produced a pair of interactive video discs. More than a million people contributed in some way, with material coming from schools and researchers.

These were then stored on the discs and could be viewed using a BBC Acorn computer. It was claimed that it would take you more than seven years to look at everything on the discs. However, by the time you had looked at all that content, the computers would long since have become obsolete. And that's pretty much what has happened: "As a multimedia resource and interactive learning tool it was unsurpassed," said Loyd Grossman, chairman of the DPC. "Yet despite those achievements, the problems of hardware and software dependence have now rendered the system obsolete. With few working examples left, the information on this incredible historical object will soon disappear forever."

Lynne Brindley, who chairs the DPC, concurs: "When the average life cycle of a website is six weeks, and the life cycle of new technologies is measured in singleton years, the concept of long-term access to digital content being measured in hundreds of years is, to say the least, challenging."

Among those who feel really challenged is the PRO. There, David Ryan, head of archive services, has the unenviable task of trying to marshal the growing flood of computer-based information that is coming in from all over the civil service.

Items are sent to the PRO when they are at least 30 years old. Most are weeded out along the way as not worth keeping as a historical record of the workings of government, so the PRO receives only 3 per cent of the paperwork generated in any department. Even so, the 2001 intake, reaching back to 1971 and (for more secret documents) even earlier, amounted to a stack of paper filling the equivalent of 1.5 kilometres (0.9 miles) of shelf space. And in a few years, more and more of what arrives will be computer tapes and disks. The question is, how should they be preserved? And what is the best medium and encoding format to keep them available over the long term, perhaps for hundreds of years?

"I don't know," says Ryan bluntly. But it's not said in defeat; instead, he relishes the idea of tackling this problem. "I'm actually fairly optimistic about all this. I think that society is migrating from being paper-based to being computer-based. We're at a crossover period, and so the rate of change in formats and media is because we are in the early age of the computer revolution. Computers will become ubiquitous, and in a few years many of these issues will have been dealt with. Look at cash machines, for example: the cards are all the same size because the system depends on interoperability. Maybe in the future it will be the same with the computing infrastructure."

One problem with storing digital data for the future is finding a standard, open format. "In the 1980s everyone would have said that it was ASCII, which is just plain text," he says. "Now, people are saying it's XML [Extensible Markup Language, a relative of HTML; both descend from SGML]. I would say – perhaps; but the really important thing is that digital data is very different from paper data. The latter has very low entropy. You can store it and it will last literally for centuries if it's on acid-free paper. But with digital data, you have to keep paying attention to how standards are changing." Otherwise, you'll end up with your marvellous BBC Domesday discs – and no way to unlock the content.
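
Part of XML's appeal to archivists is that it is plain text in which every value carries its own label, so a future reader needs no particular program to make sense of it. A minimal sketch using Python's standard library; the element names and the sample record are invented for illustration and do not reflect any actual PRO schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(fields):
    """Wrap a flat record in self-describing XML elements (hypothetical schema)."""
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text):
    """Read the record back: each tag name becomes a key again."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


sample = {"sender": "A. Civil Servant", "date": "2002-03-04", "subject": "Minutes"}
xml_text = record_to_xml(sample)
print(xml_text)
# <record><sender>A. Civil Servant</sender><date>2002-03-04</date><subject>Minutes</subject></record>
```

Even without the code, the printed line is legible to a person, which is the property that plain ASCII offered in the 1980s, now with the structure preserved as well.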

To that end, the PRO is assembling its own computer library of emulation programs. "We recognise that not everyone will have a copy of WordStar for DOS, so we're working on either using an emulator to present that document in the same format as it would have appeared, or to export it to PDF." (PDF, the Portable Document Format, is an openly published specification, even though it was defined by the graphics company Adobe.)
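
For plain text, the export route can be surprisingly crude. WordStar's document mode is commonly described as setting the top bit (bit 7) on some characters, such as the last letter of a word or a "soft" line break, and using control codes such as Ctrl-B to toggle bold. A rough recovery therefore clears that bit and drops the remaining control characters. The sketch below assumes that description of the format. It ignores dot commands and all formatting, the function name is hypothetical, and it is not a description of the PRO's own tools.

```python
def wordstar_to_text(raw: bytes) -> str:
    """Hypothetical migration helper: rough plain-text recovery from a WordStar file."""
    out = []
    for b in raw:
        b &= 0x7F                  # clear the high bit WordStar sets on some characters
        if b == 0x1A:              # Ctrl-Z: CP/M/DOS end-of-file marker
            break
        if b in (0x0A, 0x09) or 0x20 <= b < 0x7F:
            out.append(chr(b))     # keep printable ASCII, line feeds and tabs
        # anything else (carriage returns, Ctrl-B bold toggles, etc.) is dropped
    return "".join(out)


# 0xF2 is 'r' with the high bit set; 0x8D is a soft carriage return; 0x02 toggles bold
sample = b"Dea\xf2 Sir\x8d\n\x02Urgent\x02 reply"
print(repr(wordstar_to_text(sample)))  # 'Dear Sir\nUrgent reply'
```

The trade-off mirrors the choice Ryan describes: a conversion like this keeps the words but loses the look of the page, which is what an emulator preserves.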

He doesn't even put forward an opinion on whether the PRO's future storage will use magnetic or optical media; a tender is about to go out for companies to bid for the contract to store its electronic records. And the PRO will store any computer programs sent to it, although the punch cards that might have controlled our missiles in the Cuban missile crisis are long gone. "Those would have been transferred to magnetic tape," Ryan says.

What he is expecting, though, is that formats will settle down. One can imagine, for example, that historians will be interested in the e-mails sent within the Department for Transport, Local Government and the Regions between 11 September 2001 and mid-February 2002, especially where Jo Moore or Martin Sixsmith are among the senders or recipients.

But they won't have to hunt around at the future equivalents of car boot sales for machines to run them on. "Frankly, I would be depressed if in 200 years people are still having to go through this loop of finding old machines and emulators," says Ryan.

Certainly, when there's enough interest, the programs and data will live on. Using the ZX Spectrum's rubber keys was once memorably described as being "like typing on dead flesh". But for a generation, it was their introduction to computing and even to hacking operating systems. Their enthusiasm for the box means that today there are dozens of Spectrum emulators on the net, many of them free and some written in Java. And you can get a stack of games for free, though ironically Amstrad, which bought the Sinclair name, recently announced that it will sell those games via its latest Em@iler web appliance. Clearly, the best way for a technology to combat obsolescence is the simple one: always remain popular.
