Biological spot-the-difference

How do scientists compare one species with another? Richard Dawkins explains
Many people are puzzled about what is meant when scientists say that humans and chimpanzees share 98.5 per cent of their DNA. What does this figure actually represent and how has it been calculated?

In the case of the proportion of shared DNA in different species, the estimation method is usually some version of a technique known as "DNA hybridisation". It relies upon the fact that DNA strands in the famous double helix, once separated, tend to join up with complementary strands.

If you mix human DNA with, say, lizard DNA you'll get far less pairing off ("DNA hybridisation") than if you mix human DNA with chimpanzee DNA, which is in turn less than if you mix human with human DNA. Now, all paired-off DNA separates again if you heat it. It's like melting a block of ice. The "melting point" - the temperature to which you have to heat it, in order to break it apart - depends upon how many bonds there are. Human/human hybrid DNA needs a higher temperature than human/chimp hybrid DNA, which in turn needs a higher temperature than human/lizard DNA.

By measuring the "melting point" of hybrid DNA you can estimate the proportion of the genetic code shared between the two hybridised strands. There's difficulty hidden in the details, but this is essentially how it is done. Like any other estimation procedure, it doesn't require you to have total information. We can know approximately what percentage of DNA two species share, without having to know the precise DNA sequence of either of them.
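To make the arithmetic concrete, here is a minimal sketch of how a drop in melting point might be turned into a shared percentage. It is an illustration only, not the procedure any laboratory actually follows. The function name, the temperatures and the conversion factor are all assumptions: the default of one percentage point of mismatch per degree Celsius is a commonly quoted rule of thumb, treated here as an adjustable guess.

```python
def estimated_shared_percent(tm_same_species, tm_hybrid,
                             percent_mismatch_per_degree=1.0):
    """Estimate the shared percentage from two melting points (degrees C).

    tm_same_species: melting point of DNA re-paired with its own species.
    tm_hybrid: melting point of the cross-species hybrid DNA.
    percent_mismatch_per_degree: ASSUMED conversion factor, not a measured value.
    """
    delta_tm = tm_same_species - tm_hybrid
    if delta_tm < 0:
        raise ValueError("hybrid DNA should not melt at a higher temperature")
    # Each degree of lost stability is read as a fixed amount of mismatch.
    return 100.0 - delta_tm * percent_mismatch_per_degree


# Invented temperatures, chosen only to show the calculation.
print(estimated_shared_percent(86.0, 84.5))
```

Notice that nothing in the calculation needs the sequence of either species, which is the point of an estimation procedure.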

When people say "humans and chimpanzees share 98.5 per cent of their DNA" what does this really mean? I'm asking what number is supposed to be 98.5 per cent of what other number? Do we share 98.5 per cent of our genes, 98.5 per cent of our nucleotide bases, or what?

You can see that the larger the unit we consider, the smaller the shared percentage will be. For instance, if we measured the proportion of my chromosomes that are identical with yours, the answer would be zero. If we narrow down to a smaller unit - the gene in the sense of the sequence of DNA responsible for one protein molecule - the answer will be greater than zero but smaller than the approximately 98.5 per cent which you get if you count nucleotides binding together in a DNA hybridisation test.
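The effect of unit size can be seen in a toy calculation. The sketch below uses two invented "genomes", each a list of four six-letter "genes"; the sequences and function names are made up for illustration and stand for no real organism. The same pair of genomes gives a different answer depending on whether we count whole genomes, genes or single nucleotides.

```python
# Toy data, invented for illustration: each "genome" is a list of short "genes".
genome_a = ["ATGGCC", "TTGACA", "GGCTAA", "CCATGA"]
genome_b = ["ATGGCC", "TTGACT", "GGCTAA", "CCGTGA"]


def whole_genome_identical(a, b):
    """Largest unit: the genomes either match completely or they do not."""
    return a == b


def shared_fraction_by_gene(a, b):
    """Middle unit: fraction of genes that are identical letter for letter."""
    same = sum(1 for gene_a, gene_b in zip(a, b) if gene_a == gene_b)
    return same / len(a)


def shared_fraction_by_nucleotide(a, b):
    """Smallest unit: fraction of single bases that match, position by position."""
    bases_a = "".join(a)
    bases_b = "".join(b)
    same = sum(1 for x, y in zip(bases_a, bases_b) if x == y)
    return same / len(bases_a)


print(whole_genome_identical(genome_a, genome_b))
print(shared_fraction_by_gene(genome_a, genome_b))
print(shared_fraction_by_nucleotide(genome_a, genome_b))
```

Two single-letter changes are enough to make the genomes non-identical and to halve the gene-level score, while 22 of the 24 individual letters still match.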

The following parallel is helpful. Suppose we have two versions of a book: an American edition with spellings and phrases suitable for an American readership, and one published for the British market. What percentage of their chapters are identical? Probably zero, for it takes only one discrepancy in the chapter to break the identity. What percentage of their words are identical? The percentage will be well up in the nineties. If you line the two texts up side by side and compare them letter by letter, what percentage of the letters will be identical? Even higher.

But notice that if you do the lining up naively, a single missing letter will cause all subsequent letters (until a mistake in the other direction occurs) to mismatch, because they are a step out. It is clearly unfair to let the estimate of errors be inflated in this way. This is one of the beauties of the DNA hybridisation method. It copes with this problem.

It's as if you estimated the overlap between our two books as follows. Chop thousands of copies of both editions into irregular fragments. Then stir the fragments in a huge cauldron - actually a computer programmed to count the proportion of fragments that stick together because they match. Although it provides only an estimate of genetic relatedness, DNA hybridisation is a nice method of comparing one species with another.
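The "step out" problem is easy to reproduce. The sketch below compares a British and an American spelling of the same invented sentence in two ways: naively, position by position, and with an alignment that tolerates missing letters. The alignment uses `SequenceMatcher` from Python's standard `difflib` module as a stand-in; it is not a model of how DNA strands actually pair, and the sentences and function names are assumptions made up for the example.

```python
from difflib import SequenceMatcher


def naive_match_fraction(a, b):
    """Compare letter by letter at the same position; one missing letter
    throws every later letter a step out."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))


def aligned_match_fraction(a, b):
    """Let matching stretches line up wherever they fall, so a missing
    letter costs one letter rather than the rest of the text."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))


# Invented sentences standing in for the two editions of the book.
british = "the colour of the harbour at night"
american = "the color of the harbor at night"

print(naive_match_fraction(british, american))
print(aligned_match_fraction(british, american))
```

The naive score collapses after the first dropped "u", although the two sentences differ by only two letters. The aligned score stays in the nineties, which is the fairer estimate.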

Richard Dawkins is the Charles Simonyi professor of public understanding of science at the University of Oxford. His latest book is 'Unweaving the Rainbow' (Penguin Books).