Lost in the sea of genes

Large tracts of human DNA appear to serve no purpose, but if so, why have they not disappeared through evolution? And why do puffer fish have exactly the same 'junk'?
The Independent Online
At the Sanger Centre in Cambridge, Greg Elgar keeps puffer fish. Despite being highly poisonous, they're a delicacy in Japanese cuisine; but Dr Elgar wants them not for culinary or aquatic interest. He wants to sequence their genes. While doing this, he has found evidence that may help solve a 20-year-old mystery of the structure of most genes.

Genes contain instructions for stringing amino acids together into proteins. The instructions can be thought of as words made up from a four-letter DNA alphabet - A, T, C and G (for the bases adenine, thymine, cytosine and guanine). String three letters together and they form a word known as a "codon" - an instruction which tells the cell to add a particular amino acid to the growing protein. There are 64 possible three-letter sequences, but between them they specify only 20 amino acids: most amino acids can be spelt by more than one codon, and three codons simply mean "stop". Large stretches of DNA are, apparently, junk - they do not contain sequences that code for protein. If DNA were a book, it would mostly read !@£$$$%^* ()&*&%^((^%£!!!^%$^%&$&
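The codon arithmetic can be checked in a few lines. This is only an illustrative sketch: it enumerates every three-letter word in the A, T, C, G alphabet and sets aside the three "stop" codons of the standard genetic code (TAA, TAG and TGA in DNA spelling, a fact not given in the article). The variable names are invented for the example.

```python
from itertools import product

BASES = "ATCG"

# Every possible three-letter word from a four-letter alphabet: 4 * 4 * 4.
codons = ["".join(letters) for letters in product(BASES, repeat=3)]

# Stop codons of the standard genetic code, written as DNA (assumption:
# the standard code; a few organisms use slight variants).
STOP_CODONS = {"TAA", "TAG", "TGA"}

coding_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))         # 64
print(len(coding_codons))  # 61
```

So 61 codons are shared among just 20 amino acids, which is why several different codons can spell the same one.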

However, small parts do make sense - the genes. Yet even there, in many species a gene resembles a word divided up by meaningless letters, as if someone had taken the word "elephant" and written "elexymtygmaphant". The recognisable parts, such as "ele" and "phant", are known as exons, because in the DNA they "express" the protein - that is, they consist of one or more codons. The apparently meaningless intervening sequences - quite separate from the "junk" - are called "introns".

By the 1970s, scientists knew that the protein products of a bacterial gene correspond exactly to the gene's DNA sequence, just as the word "elephant" when read aloud corresponds to the written word. However, in 1977, molecular biologists investigating viral DNA found the first indications that the absence of introns actually makes bacteria unusual.

We now know that the genes of most organisms have introns - in fact, introns comprise by far the largest part of any mammalian gene. For example, the gene involved in the human genetic disorder Duchenne muscular dystrophy is 2.5 million base pairs long. But only 13,000 of these bases, fragmented into 79 exons, actually code for the protein dystrophin. In other words, only 0.5 per cent of the DMD gene actually makes the gene's product.
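The dystrophin figures are easy to verify. The short sketch below uses only the numbers quoted above (2.5 million base pairs, 13,000 coding bases, 79 exons); the variable names are made up for illustration, and the average exon length is a rough derived figure, not something measured exon by exon.

```python
gene_length_bp = 2_500_000   # total length of the DMD gene, as quoted
coding_bp = 13_000           # bases that actually code for dystrophin
exon_count = 79              # number of exons the coding bases are split into

coding_fraction = coding_bp / gene_length_bp
mean_exon_length = coding_bp / exon_count

print(f"{coding_fraction:.2%}")   # 0.52%
print(round(mean_exon_length))    # 165
```

On these numbers the average exon is only about 165 bases long, adrift in a gene of millions.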

Introns thus seem to make genes very inefficient. In order to produce its protein, the entire gene has first to be transcribed into an RNA molecule, from which the introns are then removed, guided by a sort of punctuation that marks the boundaries between introns and exons. The protein is then assembled from its constituent amino acids. The obvious question, and the mystery, is - why do we have introns?
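The cutting-out step can be pictured with the "elephant" example. The sketch below is a toy, not a model of the cell's machinery: the function `splice` and the exon coordinates are hypothetical, and the positions are simply handed over, whereas a real cell has to recognise the punctuation at each intron boundary for itself.

```python
def splice(transcript, exons):
    """Join the exon segments of a transcript, discarding everything between.

    `exons` is a list of (start, end) positions, end-exclusive, in order.
    """
    return "".join(transcript[start:end] for start, end in exons)


# The interrupted word from the text: "ele" + meaningless letters + "phant".
transcript = "elexymtygmaphant"
exon_positions = [(0, 3), (11, 16)]   # illustrative coordinates only

print(splice(transcript, exon_positions))  # elephant
```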

One possibility is that they are just junk, as the base pairs outside the genes appear to be. (There are suggestions, though, that such "junk" actually serves a useful purpose, such as providing a buffer against environmental damage to DNA. On this view, a mutation that lands in the junk does not affect the genes, so the junk reduces the chance of harmful mutations arising.)

However, Greg Elgar and his team have made a key discovery - the puffer fish has just as many introns as we do. And it's not just the number of introns; their position, in relation to exons, is astonishingly consistent with that in humans. Elgar adds: "We have never found a case of a missing intron in a puffer fish."

From the evolutionary point of view, this is truly surprising. Evolutionary theory predicts that something functionally unimportant would gradually be lost over thousands or millions of years - think of the human appendix.

We would expect exons to be conserved, because of their role in creating proteins. However, since introns have no obvious biological function, and the last common ancestor of the puffer fish and humans lived 450 million years ago, one would think that enough time has passed for the introns to have diverged between species if they served no purpose. After all, we haven't kept our gills.

Instead, the number of introns is consistent across vertebrate species, but their size can vary enormously. As noted above, the human dystrophin gene has 2.5 million base pairs, but in the puffer fish, it has only 30,000; the difference is in the size of the introns. This is not just because we are more complex than puffer fish - frogs have 30 times more DNA than humans. The difference, again, is mainly in intron size.

All of which indicates that introns cannot just be junk. But what could their evolutionary role possibly be? Nineteen years ago, Walter Gilbert, at Harvard University, proposed a theory he called "exon shuffling". According to this, having introns permits genetic variation in evolution, while preserving the biological function of exons. Dr Elgar explains it like this: "Imagine a gene as a string of beads. The exons are the beads, and the introns are the string between them. The introns act as spacing elements between exons. When chromosomes break and rejoin, the DNA sequence can be broken and rejoined anywhere. Having introns increases the chances that a random break occurs in the introns. This means that breaks and rejoins are less likely to disturb the exons. They will alter their order, but not their function."

So exon shuffling allows new genetic combinations to be offered for the scrutiny of natural selection while (usually) leaving exons intact, since on average exons account for only about one in 200 of the points at which a break could occur. The idea is that every so often a new combination may throw up the coding sequence for a viable new protein - rather like a child's flip book, with an elephant on the first split page and a crocodile on the second. Turn the lower half of the first page and "elephant" becomes "eledile", and a new creature emerges. In DNA terms, you get a new protein.
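The flip-book picture translates directly into a toy example. In the sketch below each "gene" is just a list of exons, and a break is assumed to fall in the intron between them, so the exons themselves are never cut. The function `shuffle_exons` and the two word-genes are invented for illustration; real recombination is far messier.

```python
def shuffle_exons(gene_a, gene_b, break_index):
    """Swap the tails of two exon lists at an intron boundary.

    `break_index` counts exons from the left; the break is assumed to
    fall in the intron just before that exon, leaving every exon whole.
    """
    new_a = gene_a[:break_index] + gene_b[break_index:]
    new_b = gene_b[:break_index] + gene_a[break_index:]
    return new_a, new_b


elephant = ["ele", "phant"]
crocodile = ["croco", "dile"]

new_a, new_b = shuffle_exons(elephant, crocodile, 1)
print("".join(new_a))  # eledile
print("".join(new_b))  # crocophant
```

Every exon survives the exchange unchanged; only the combinations are new, and it is those combinations that natural selection gets to test.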

Exon shuffling does have its critics. According to Arlin Stoltzfus, an evolutionary biologist at Dalhousie University, Nova Scotia, "The selective advantage of exon shuffling may be a consequence of having introns, not a reason for their first existing."

Certainly, we don't know how introns originated. One idea is that they are relatively recent, arising after the divergence from bacteria about three billion years ago. The opposing view is that introns are evolutionary relics from an original common ancestor, marking the places where exons were joined to form the first modern-sized proteins. Stoltzfus adds that exon shuffling also fails to explain how introns have increased in number in some genes. "We need better methods for inferring history from present patterns of diversity," he comments.

But in its favour, DNA sequence comparisons between species provide evidence for exon shuffling. They show that some genes have components from different sources. A gene in the fly Drosophila consists of four exons fused into one new functional gene. Other large genes with many components, such as the genes involved in blood coagulation and in the immune response, have particular exons corresponding to specific blocks of protein which share a common ancestor with exons in other genes. Exon shuffling may also explain the variation in intron size. Different species probably have to balance the advantage of evolving new proteins against the disadvantages of carrying surplus genetic baggage. But it seems a small price to pay for the possible benefits.