The two scientists stood shoulder to shoulder with President Bill Clinton in the East Room of the White House, the same room where the American explorers Meriwether Lewis and William Clark unfurled their map of the Northwest Territories for Thomas Jefferson.
Like Lewis and Clark 200 years before them, the scientists were explorers of a newly discovered landscape and they, too, had just completed their own draft map of an unknown world.
Francis Collins and Craig Venter had been, in fact, rivals in the race to decode the human genome, but they had been cajoled into making a very public "truce" in a joint transatlantic press conference with Tony Blair in Downing Street. Venter, the leader of the privately run project, and Collins, the government scientist, stood together to explain that humanity was now able to read, for the first time, the genetic map describing the detailed coordinates of our DNA code.
Some have likened the human genome to a genetic blueprint; some describe it as the digital recipe for making a human being; others simply refer to it as the Book of Life. In fact, no metaphor quite fits the human genome, an encoded message composed of some 3 billion "letters" of the four-letter genetic alphabet arranged in a well-defined sequence.
It was determining this precise sequence that took Collins and Venter – and thousands of other scientists – more than 10 years and $3bn to complete. It was a scientific magnum opus, the Apollo programme of biology, comparable to putting a man on the Moon. One British scientist compared the achievement to the invention of the wheel, arguing that the genome was an even greater feat.
Standing in the East Room exactly a decade ago, Bill Clinton said that the completion of the international Human Genome Project meant scientists were now learning the language in which God created life, while Tony Blair in Downing Street said that it represented a 21st-century revolution in medical science whose implications will far surpass the discovery of antibiotics in the 20th century. The human genome was going to change the way medicine is practised. It was going to reveal the hidden hand of cards dealt by our genes so we could cheat our genetic destiny.
Francis Collins, now director of the US National Institutes of Health, made his own predictions for 2010 as he gave his PowerPoint presentation to the world. He predicted gene tests would be available for a dozen medical conditions, that doctors and clinics would practise genetic medicine, that pre-implantation genetic diagnosis of IVF embryos would be widely available, and that genetic discrimination would be banned – at least in the US.
More lab work needed
Behind all the hyperbole, it was clear the human genome was going to tell us things we could only dream about. It had the potential to reveal our genetic predisposition to disease early enough for us to do something about it. It could allow drugs to be designed and selected on the basis of a person's genetic make-up rather than relying simply on a patient's symptoms. Such genome scanning would avoid the side-effects of taking drugs, improve their efficacy, and lead to a bright, new future of "personalised medicine", where treatment is based on an individual's genome rather than a therapy based on a one-size-fits-all approach.
Ten years on from that momentous occasion, the reality for many people has been far less exciting, as Collins accepts. The consequences for clinical medicine have so far been modest, he admits. "It is fair to say that the Human Genome Project has not yet directly affected the healthcare of most individuals," he says. Venter, too, accepts that "there is still some way to go before this capability can have a significant effect on medicine and health".
This is not to say that there have been no advances. Far from it. Scientists in many disciplines have described enormous progress in understanding human disorders as a result of knowing the human genome, which was only fully sequenced in 2003 (the 2000 map was a rough draft). The genetic basis of some of the most common disorders in particular is beginning to be revealed, they argue.
Earlier this month, for instance, scientists announced a breakthrough in understanding the link between autism and changes to the DNA of the genome. Last year, genome scientists announced a genetic link between schizophrenia and bipolar disorder. Almost every week there are new genome studies into common medical problems, from heart disease and stroke, to obesity, diabetes and, of course, cancer, the quintessential genetic disease.
Shortly after the White House announcement, scientists established the Cancer Genome Project to compare the genomes of cancer cells with the genomes of healthy cells from the same individual. They hoped that by decoding the genomes of around 500 patients with the same type of cancer they would be able to find the specific "driver" mutations that cause a healthy cell to divide uncontrollably to form a tumour. The hope is to find out ways of counteracting these mutations and discover what it is that causes the genetic changes in the first place.
Professor Mike Stratton, director of the Wellcome Trust Sanger Institute in Cambridge, where much of the human genome was decoded, is also the joint head of the Cancer Genome Project. He insists the genome has more than lived up to expectations and has "changed the landscape" of medical science. "It has quite simply transformed our perception of cancer," he says.
Scientists have, to date, identified about 400 genes that contribute to cancer. Some of this knowledge has already been translated into practical benefits, the best-known examples being the drug Herceptin, which treats breast cancer patients with defective HER2 genes, and Glivec, the standard treatment for chronic myeloid leukaemia.
By understanding the precise nature of the genetic changes that trigger a cancer, it might be possible to devise a drug that could counteract the effect. Glivec, taken as a pill with few side-effects, is one such drug and has almost doubled five-year survival rates to 90 per cent. Mutations in another cancer gene, called BRAF, are already being targeted with potential drugs and soon, it is hoped, cancer patients will be treated on the basis of the genetic mutations carried by their tumours, rather than therapy based on the cancer's position in the body.
But the rolling-out of such services is painstakingly slow. NHS provision for the testing of known cancer-gene mutations is patchy, although the charity Cancer Research UK has a plan to carry out more systematic tests as part of a pilot scheme scheduled to begin this autumn. It is still the case that the potential "consumers" of genome information are several steps behind the scientific producers of the data. And what data there is.
Faster and faster
From its very outset, the Human Genome Project was going to be a vast exercise in data handling. The basic genome itself, consisting of 3 billion separate items of information, had to be "read" about a dozen times to eliminate inevitable errors in the decoding of the DNA sequence.
In 2000, when the draft genome was published, the public genome databases in the world consisted of about 8 billion "base pairs", the genetic letters that make up a DNA sequence. Ten years later they hold 270 billion base pairs, and this figure is doubling every 18 months. Furthermore, this vast ocean of data does not include the much bigger volume of DNA sequences that have yet to be deposited in public databases.
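The growth figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes smooth exponential growth, which real database deposits will not follow exactly, and the function names are invented for this example. It works out the average doubling time implied by the rise from 8 billion to 270 billion base pairs over 10 years, then projects forward at the quoted 18-month doubling rate.

```python
import math

# Figures quoted in the article
BASE_PAIRS_2000 = 8e9      # public databases when the draft was published
BASE_PAIRS_2010 = 270e9    # public databases ten years later
YEARS_ELAPSED = 10


def implied_doubling_months(start, end, years):
    """Average doubling time in months, assuming smooth exponential growth."""
    doublings = math.log2(end / start)
    return years * 12 / doublings


def projected_base_pairs(current, months_ahead, doubling_months=18):
    """Project the database size forward, assuming a constant doubling time."""
    return current * 2 ** (months_ahead / doubling_months)


average = implied_doubling_months(BASE_PAIRS_2000, BASE_PAIRS_2010, YEARS_ELAPSED)
print(f"Average doubling time, 2000 to 2010: {average:.1f} months")

in_three_years = projected_base_pairs(BASE_PAIRS_2010, months_ahead=36)
print(f"Projected size in three years: {in_three_years:.3g} base pairs")
```

On these assumptions the decade as a whole averaged a doubling roughly every two years, so the quoted 18-month figure suggests growth has been speeding up rather than holding steady.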
The sequencing technique, pioneered by the Cambridge double Nobel laureate Fred Sanger, was initially a labour-intensive affair. But as the years progressed, it became increasingly automated and computerised, dramatically speeding up and lowering the cost of sequencing.
In the past 10 years, the cost of decoding a genome has fallen dramatically, while the speed of decoding has risen faster than anyone expected. It took thousands of scientists 10 years and $3bn to decode the first human genome, at a cost of about $1 for each DNA base pair. That cost has fallen thousands of times over.
"Huge technological changes in the capacity to generate data means a handful of people can now generate a genome in a week for maybe $30,000 to $40,000. But these figures are almost out of date as soon as you write them down," says Professor Mark McCarthy of Oxford University, who is involved in the international consortium to find the genetic basis of diabetes.
The talk now is of breaking the $1,000 genome barrier. One of the immediate benefits is the ease with which it is possible to find a genetic mutation causing a disease. Two decades ago, for instance, it took scientists three years and $50m to find the cystic fibrosis gene. Now, the same task would take a matter of months or weeks for a few thousand dollars.
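The genome cost figures quoted above can be put side by side. This is a rough sketch, not an official costing: it treats a genome as exactly 3 billion base pairs and takes the $3bn, $30,000 to $40,000 and $1,000 figures at face value. The helper names are hypothetical.

```python
# Figures quoted in the article (rounded, and treated here as exact)
GENOME_BASE_PAIRS = 3e9             # "3 billion" letters
FIRST_GENOME_COST = 3e9             # $3bn for the first human genome
COST_2010_RANGE = (30_000, 40_000)  # McCarthy's estimate for one genome
TARGET_COST = 1_000                 # the "$1,000 genome"


def cost_per_base_pair(total_cost, base_pairs=GENOME_BASE_PAIRS):
    """Dollars spent per DNA base pair decoded."""
    return total_cost / base_pairs


def fold_reduction(old_cost, new_cost):
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


print(f"First genome: ${cost_per_base_pair(FIRST_GENOME_COST):.2f} per base pair")

for cost in (*COST_2010_RANGE, TARGET_COST):
    print(
        f"${cost:,} genome: ${cost_per_base_pair(cost):.2e} per base pair, "
        f"{fold_reduction(FIRST_GENOME_COST, cost):,.0f} times cheaper"
    )
```

Taken at face value, the quoted range works out at between 75,000 and 100,000 times cheaper than the first genome, and a $1,000 genome would be 3 million times cheaper.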
Cystic fibrosis is a single-gene disorder and one of about 5,000 single-gene inherited disorders, many of which are very rare. But the genome has also proved invaluable in detecting the genetic defects contributing to more common disorders involving the influence of many different genes. About 40 genes have already been identified that are associated with diabetes, and another 40 have been linked with obesity. Yet, together, these genes account for less than 10 per cent of the genetic variation of these metabolic disorders. Scientists hope to find the remaining 90 per cent, something they call the "missing heritability", by sequencing more and more genomes – another 3,000 in the case of the diabetes genome project.
A plan to decode the entire genomes of 1,000 people from around the world is already under way. One human genome is more than 99 per cent identical to another. But it is the 1 per cent difference that can prove critical in understanding the genetic basis of disease, explaining why one person gets cancer while another does not. This is why scientists want to fully decode the genomes of a representatively diverse group of humanity, from Chinese, Japanese and Indians to Africans, Latin Americans and Europeans.
As a result of rapidly falling costs, the 1,000 Genomes Project now, in fact, intends to decode 2,500 genomes. This project will generate phenomenal amounts of data. During its early stages, the project will be decoding the equivalent of more than two genomes every 24 hours. Over its three-year course, the 1,000 Genomes Project will generate 6 trillion "letters", or base pairs, that comprise the DNA sequence. This is 60 times more DNA sequence data than has been deposited in the entire public DNA databases over the past 25 years.
When fully up and running, the 1,000 Genomes Project will, on its own, generate more DNA sequences in two days than was produced for all of the DNA public databases in the past year. And this is just human genome sequencing. Other scientists are decoding the genomes of a bewildering array of animals, adding to the published genomes of the chimp, rat, mouse, cat, dog ... and duck-billed platypus.
Ever since the Human Genome Project was first mooted at the end of the 1980s, scientists realised that it would be a mega-informatics project, akin to the "big physics" experiments that have enjoyed huge public financing. Genomics researchers have even adopted the same language as their physical science colleagues, hence the phrase "missing heritability", which is taken directly from the astronomer's lexicon describing the "missing mass" of the Universe. Equally, "dark matter" – to the genomics researcher – refers not to the cosmological phenomenon but to the immense proportion of the genome whose function is unknown. The genomic dark matter is the vast proportion of the genome not directly involved in what genes are supposed to do – code for proteins. This form of dark matter, comprising some 90 per cent of the genome, has no apparent function, but few scientists would now describe it under its old nickname of "junk DNA".
Indeed, one of the first great surprises of the genome was realising that much of it doesn't seem to do very much. The classic idea of DNA is that it is composed of genes, each of which is responsible for holding the code of a single protein. Publication of the genome destroyed this simplistic notion, and has even questioned the very definition of a gene.
For a start, it has become apparent there are fewer genes than originally thought. In the 1990s it was assumed that the human genome would have between about 80,000 and 120,000 genes. Once we could see the genome in its entirety it was apparent that humans have a mere 21,500 genes, give or take, a surprisingly low number for such a complex life form.
"When we found that humans only had as many genes as a weed, that was a bit of an insult," says Nick Hastie, a genomics researcher at the Medical Research Council's Human Genetics Unit in Edinburgh. But in fact, he says, it also became clear that the old idea of "one gene, one protein" was wrong. For a start, some genes appear to be involved in coding for multiple proteins by a process known as "alternative splicing".
Another realisation was that many genes, or regions of the genome, do not code for proteins at all, but are responsible for producing short strands of RNA, a close molecular cousin of DNA. These micro-RNAs play a critical, but little-understood, role in controlling the vital activities of a cell. It was a further blow to the idea that knowing the genome would quickly solve all our problems. "We fooled ourselves into thinking the genome was going to be a transparent blueprint, but it's not," says Mel Greaves, a cell biologist at the Institute of Cancer Research.
A further complication of the genome was the realisation that there is another, hidden layer of complexity to inheritance. Inheritance was not just a case of passing on a particular DNA sequence. Other things could happen to the genes to change the way they behave depending on their environmental circumstances. This phenomenon, known as epigenetics, basically meant that new traits could be inherited by some unknown route, other than through mutations to the DNA. It has spawned an entire new discipline aimed at deciphering this mysterious "epigenome".
Knowing your own genome
All these extra complications meant that the simplistic idea of being able to go to a GP's surgery in 2010 to get a predictive prognosis based on a read-out of your genome was, well, simplistic. As Nick Hastie says: "It's going to be less predictive than you think, because it's complicated."
Over the last decade, a number of companies have sprung up offering personal genome-wide scans sold over the counter. These are not full sequences of the entire genome, just partial maps concentrating on well-visited areas of the genome. They are of limited value and the medical advice offered almost invariably comes down to "eat well, exercise and don't smoke" – hardly a medical revolution.
Still, there are a number of areas of the genome that could cause extreme anxiety when the day comes for us all to be able to read our personalised book of life. The ApoE4 gene, for instance, is linked with a significantly increased risk of Alzheimer's disease, a risk that few people would want to know about because, at present, there is nothing you can do about it. (Interestingly, when James Watson, the co-discoverer of the DNA double helix, had his own genome sequenced this was the only gene he wanted to know nothing about.)
Yet there is one prediction that will continue to come true. Decoding genomes will certainly become cheaper, easier and quicker. Each of us will very likely know several versions of our own genome, taken from different parts of our bodies at different stages of our lives. As Craig Venter predicts, we will move beyond the current goal of one genome per person. "This will enable us to select healthy cells for reproduction and tissue transplants, or to better understand ageing and tumour development," he says. "The genome revolution is only just beginning."
Ten years on from the first draft map of the human genome, scientists have even bolder plans for the next 10 years. The first maps to this brave new world are now being drawn.
Human genome project: The milestones
2000: Francis Collins and Craig Venter produce first draft of human genome, composed of three billion "letters" of the four-letter genetic alphabet that represent the Book of Life.
2002: The BRAF gene is found to be mutated in 60 to 70 per cent of malignant melanomas (a lethal form of skin cancer) and 10 to 15 per cent of colorectal cancers. Research is now yielding drugs to block BRAF and thus treat melanoma.
2003: Gold-standard human genome sequence is completed. Scientists start to identify genes linked with common diseases. About 40 genes have been associated with diabetes and 40 with obesity. However, together they account for less than 10 per cent of the variation in incidence of these conditions.
2008: 1,000 Genomes Project launched to decode the genomes of 1,000 people around the world. One human genome is more than 99 per cent identical to another. But it is the one per cent difference that can explain why one person gets cancer while another does not.
2009: Researchers at the Sanger Institute publish the first complete cancer genomes for malignant melanoma and lung cancer. Scientists are working to sequence 25,000 cancer genomes from 50 different kinds of cancer to identify all the driving cancer genes and provide insight into the processes that cause cancer.
2010: About 800 genetic variations linked to common conditions including heart disease, diabetes, Crohn's disease and cancer are identified. More than 2,000 single-gene tests are available for conditions including breast and colon cancer.
2010 onward: Wellcome Trust Sanger Institute will sequence 10,000 genomes in the next three years. Cost has fallen from $3bn to decode the first genome to about $10,000 each, and is likely to reach the "$1,000 genome" in the next few years.