American scientists have finally finished drafting the human genome

A century ago, scientists knew that genes were spread across 23 pairs of chromosomes, but only now do we have a full, gapless draft of the human genome. But what does that mean? Carl Zimmer explains

Wednesday 28 July 2021 14:11 BST
The DNA molecule. It is hoped the new data will help give a deeper understanding of how it influences risks of disease (Getty/iStock)

Two decades after the draft sequence of the human genome was unveiled to great fanfare, a team of 99 scientists has finally deciphered the entire thing. They have filled in vast gaps and corrected a long list of errors in previous versions, giving us a new view of our DNA.

The consortium has posted six papers online in recent weeks in which they describe the full genome. These hard-sought data, now under review by scientific journals, will give scientists a deeper understanding of how DNA influences risks of disease, the scientists say, and how cells keep it in neatly organised chromosomes instead of molecular tangles.

For example, the researchers have uncovered more than 100 new genes that may be functional and have identified millions of genetic variations between people. Some of those differences probably play a role in diseases.

For Nicolas Altemose, a postdoctoral researcher at the University of California, Berkeley, who worked on the team, the view of the complete human genome feels something like the close-up pictures of Pluto from the New Horizons space probe.

He says: “You could see every crater, you could see every colour, from something that we only had the blurriest understanding of before. This has just been an absolute dream come true.”

Experts who were not involved in the project said it will enable scientists to explore the human genome in much greater detail. Large chunks of the genome that had been simply blank are now deciphered so clearly that scientists can start studying them in earnest.

Yukiko Yamashita, a developmental biologist at the Whitehead Institute for Biomedical Research at Massachusetts Institute of Technology, says: “The fruit of this sequencing effort is amazing.”

A century ago, scientists knew that genes were spread across 23 pairs of chromosomes, but these strange, wormlike microscopic structures remained largely a mystery.

By the late 1970s, scientists had gained the ability to pinpoint a few individual human genes and decode their sequences, but their tools were so crude that hunting down a single gene could take up an entire career.

Towards the end of the 20th century, an international network of geneticists decided to try to sequence all the DNA in our chromosomes. The Human Genome Project was an audacious undertaking, given how much there was to sequence. Scientists knew that the twin strands of DNA in our cells contained roughly three billion pairs of letters – a text long enough to fill hundreds of books.

When that team began its work, the best technology the scientists could use sequenced bits of DNA just a few dozen “letters”, or bases, long. Researchers were left to put them together like the pieces of a vast jigsaw puzzle. To assemble the puzzle, they looked for fragments with identical ends, meaning that they came from overlapping portions of the genome. It took years for them to gradually assemble the sequenced fragments into larger swathes.
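The jigsaw approach described above can be illustrated with a toy sketch. It is not the Human Genome Project's actual software: the function names, the example fragments and the `min_len` threshold are all assumptions made for illustration, and real assemblers must also cope with sequencing errors, both DNA strands and billions of reads. The sketch repeatedly finds the two fragments whose ends overlap the most and merges them.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of fragments until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return frags  # nothing left to join
        # Join the two fragments, writing the shared stretch only once.
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)


# Hypothetical fragments; each shares four letters with the next.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

With these three made-up fragments, the sketch joins them into a single sequence. When fragments do not overlap, it returns several separate pieces instead, which is the toy equivalent of a gap in the draft.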

The White House announced in 2000 that scientists had finished the first draft of the human genome and details of the project were published the following year. But long stretches of the genome remained unknown, while scientists struggled to figure out where millions of other bases belonged.

It turned out that the genome was a very hard puzzle to put together from small pieces. Many of our genes exist as multiple copies that are nearly identical to each other. Sometimes the different copies carry out different jobs. Other copies – known as pseudogenes – are disabled by mutations. A short fragment of DNA from one gene might fit just as well into the others.
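A small sketch shows why near-identical copies cause trouble. The sequences here are invented, and `find_placements` is a hypothetical helper rather than part of any real genomics tool. Two copies of a "gene" differ by a single letter; a short fragment taken from one copy fits both, while a longer fragment that happens to span the differing letter fits only one.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every position in `genome` where `read` matches exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Invented example: two gene copies that differ at one base (C versus A).
copy_a = "ACGTTGCAAGGCT"
copy_b = "ACGTTGCAAGGAT"
genome = "TTT" + copy_a + "CCCCC" + copy_b + "GGG"

short_read = "GTTGCA"       # lies entirely within the shared stretch
long_read = "GTTGCAAGGCT"   # reaches the base that distinguishes copy_a

print(len(find_placements(genome, short_read)))  # number of places the short read fits
print(len(find_placements(genome, long_read)))   # number of places the long read fits
```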

And genes only make up a small percentage of the genome. The rest of it can be even more baffling. Much of the genome is made up of virus-like stretches of DNA that exist largely just to make new copies of themselves that get inserted back into the genome.

In the early 2000s, scientists got a little better at putting together the genome puzzle from its tiny pieces. They made more fragments, read them more accurately and developed new computer programs to assemble them into bigger chunks of the genome.

Periodically, researchers would unveil the latest, best draft of the human genome – known as the reference genome. Scientists used the reference genome as a guide for their own sequencing efforts. For example, clinical geneticists would catalogue disease-causing mutations by comparing genes from patients to the reference genome.

The newest reference genome came out in 2013. It was a lot better than the first draft but it was a long way from complete; eight per cent of it was simply blank.

Michael Schatz, a computational biologist at Johns Hopkins University, says: “There’s basically an entire human chromosome that had gone missing.”

In 2019, two scientists – Adam Phillippy, a computational biologist at the US National Human Genome Research Institute, Maryland, and Karen Miga, a geneticist at the University of California, Santa Cruz – founded the Telomere-to-Telomere Consortium to complete the genome.

Phillippy admitted that part of his motivation for such an audacious project was that the gaps annoyed him. He says: “They were just really bugging me. You take a beautiful landscape puzzle, pull out a hundred pieces and look at it – that’s very bothersome to a perfectionist.”

Phillippy and Miga put out a call for scientists to join them to finish the puzzle. They ended up with 99 scientists working directly on sequencing the human genome and dozens more pitching in to make sense of the data. The researchers worked remotely through the pandemic, coordinating their efforts over Slack, a messaging app.

Miga says: “It was a surprisingly nice ant colony.”

The consortium took advantage of new machines that can read stretches of DNA reaching tens of thousands of bases long. The researchers also invented techniques to figure out where particularly mysterious repeating sequences belonged in a genome.

All told, the scientists added or fixed more than 200 million base pairs in the reference genome. They can now say with confidence that the human genome is 3.05 billion base pairs long.

Within those new sequences of DNA, the scientists discovered more than 2,000 new genes. Most appear to be disabled by mutations but 115 of them look as if they can produce proteins – the function of which scientists may need years to figure out. The consortium now estimates that the human genome contains 19,969 protein-coding genes.

With a complete genome finally assembled, the researchers could take a better look at the variation in DNA from one person to the next. They discovered more than two million new spots in the genome where people differ. Using the new genome also helped them to avoid identifying disease-linked mutations where none actually exists.

This article originally appeared in The New York Times.
