Science: A spelling test for genes

New research says that our basic DNA contains far more 'misspellings' than previously thought. This is bad news for drugs companies which want to develop gene-specific treatments. By Charles Arthur
THE EXPECTED finish date for the Human Genome Project, which aims to map the 100,000 or so genes in human DNA, is now only a few years away. But anyone who thinks that 2003 will mean the end of gene mapping should think again. Research published yesterday suggests that getting to the stage where we really know about everybody's genes is actually much further away than was previously thought.

Two papers in the July issue of the journal Nature Genetics suggest that in human populations, there are between 240,000 and a million subtle variations in the coding of genes that are involved in human disease. Previously, that figure was reckoned to be about 200,000 - a big, but still more manageable number.

In medical parlance, those variations are known as "SNPs", for single nucleotide polymorphisms. If you liken the four nucleotide "letters" of DNA to an alphabet, then we are all books containing roughly the same words - but with millions of us having different spellings for what should be the same words. SNPs (often called "snips") are defined as variations which appear in at least one per cent of a given population.
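The one per cent rule can be made concrete with a short sketch. It assumes a simplified setting in which each sampled person contributes one observed letter at a given DNA position; the function names and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the share of samples carrying the second most common letter.

    Returns 0.0 when every sample shows the same letter.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    # most_common(2) is sorted by count, so index 1 is the rarer variant.
    return counts.most_common(2)[1][1] / len(alleles)


def is_snp(alleles, threshold=0.01):
    """Apply the article's definition: the variant appears in at least 1%."""
    return minor_allele_frequency(alleles) >= threshold


# Hypothetical samples at one position in two populations.
common_variant = ["A"] * 97 + ["T"] * 3    # variant in 3% of samples
rare_variant = ["A"] * 999 + ["T"] * 1     # variant in 0.1% of samples

print(is_snp(common_variant))  # True
print(is_snp(rare_variant))    # False
```

On this definition, a variation seen in one person in a thousand is a rare mutation rather than a SNP, however serious its effect.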

In some cases, the misspelling makes no difference to the meaning of the "sentence" in the book (in real life, the protein coded for by the gene). In other cases, it has a huge effect, the equivalent of substituting "now" for "not" in the phrase "We are not ready to surrender".

For example, sickle-cell anaemia is the result of a SNP, in which a thymine ("T") nucleotide replaces an adenine ("A") in a gene that specifies how to make haemoglobin in red blood cells. The effect of that tiny difference is to render those cells prone to painful collapses - though also to confer some useful resistance to malaria.

The broader theories of links between disease and genes suggest that if your "book" contains a particular word spelt in a particular way, your susceptibility to a particular illness will be changed. Three common variations in the "spelling" of a gene called apolipoprotein E are associated with different levels of risk of Alzheimer's disease.

But the new research implies that the number of different spellings is so huge that the task of matching the variations to their effects will make the unravelling of the genome, the biggest project in molecular biology, seem like a warm-up.

The HGP essentially passes over rare SNPs, by producing a sequence that is the "consensus" of each position in the genes - that is, the most commonly found nucleotide. The one for sickle-cell anaemia will probably not be recorded in the main sequence, though it should appear in the parallel Human Genome Diversity Project, which is collecting DNA from at least 100 populations on each continent.
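As a rough illustration of how a "consensus" hides rare variants, the sketch below picks the most commonly found letter at each position across a set of aligned sequences. It is a toy model of the idea as described here, not the project's actual method; the function name and the three short sequences are hypothetical.

```python
from collections import Counter


def consensus(sequences):
    """Return the most common letter at each position of aligned sequences."""
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one column of letters per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


# Hypothetical samples: the third carries a rare variant in the middle.
samples = ["GAG", "GAG", "GTG"]
print(consensus(samples))  # GAG
```

The variant "T" in the third sample leaves no trace in the result, which is why a separate catalogue of variations is needed alongside the main sequence.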

But given that SNPs are thought to have links to disease, pharmaceuticals companies want to be able to identify them in order to make diagnoses. The multi-million dollar question is: how many do they need to look for?

In the journal, two separate groups of researchers identified variations in a total of nearly 200 genes believed to be involved in heart disease, diabetes and schizophrenia. The findings, they said, are an important step in discovering how genes make people susceptible to disease.

"It's a fairly good look at how human genes vary between people. We actually vary quite a bit more than I had expected," says Dr Aravinda Chakravarti of the Case Western Reserve University School of Medicine in Ohio, who led one of the studies looking at 75 different genes known to be involved in high blood pressure in 40 Africans and 34 Americans of European origin.

Chakravarti's team, including a company called Affymetrix which makes a "gene chip" sequencer, extrapolated from their findings to estimate the total number of SNPs in human genes as close to a million. But a second team, led by Dr Eric Lander of the Whitehead Institute at the Massachusetts Institute of Technology, Boston, estimated the figure at 240,000 to 400,000, after looking at the occurrence in four geographically spread populations of SNPs in 106 genes suspected of involvement in coronary artery disease, type II diabetes and schizophrenia.

Most differences in SNPs had nothing to do with ethnic groups, although there was a small number of variations that did seem to be linked with race. "It's a reflection of time rather than any intrinsic biological difference between them," Chakravarti said.

For example, because Native Americans lived separately from other populations for up to 30,000 years, they would have developed different SNPs from other Asian groups.

SNPs arise through mutation in the reproductive cells, either caused by cosmic radiation and other outside events, or by DNA copying errors. To some researchers, they are like a ticking clock that offers clues to the age of a piece of DNA. It is by the analysis of the difference in SNPs between two groups - say, Africans and Caucasians - that one can estimate how long ago the two had a common ancestor. As soon as they diverged, the descendants began to pick up their own SNPs.
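The "ticking clock" reasoning can be sketched with simple arithmetic. If two groups each pick up changes at the same steady rate after they split, the differences between them build up at twice that rate, so the time since the split is roughly the fraction of differing letters divided by twice the rate. The rate and counts below are invented for illustration and are not taken from the research.

```python
def years_since_split(differences, sites_compared, rate_per_site_per_year):
    """Estimate the time since two lineages shared a common ancestor.

    Assumes a constant rate of change and that both lineages accumulate
    changes independently, hence the factor of two.
    """
    if sites_compared <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and rate must be positive")
    fraction_different = differences / sites_compared
    return fraction_different / (2 * rate_per_site_per_year)


# Hypothetical figures: 10 differing letters in 10,000 compared,
# with an assumed rate of one change per 100 million letters per year.
estimate = years_since_split(10, 10_000, 1e-8)
print(round(estimate))  # 50000
```

Real estimates depend heavily on the assumed rate, which is itself uncertain, so figures of this kind are best read as orders of magnitude.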

Why are SNPs important in medicine? Because the sequence of base pairs in the gene determines which amino acid is slotted into place as the protein is assembled. In sickle-cell anaemia, specifying one amino acid rather than another alters the eventual physical shape of the protein. Then, because cell function is so finely tuned, the difference in shape means that the protein's function and effectiveness are altered.
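A small sketch shows how one changed letter can swap an amino acid, or leave it untouched. DNA is read three letters at a time, and each triplet specifies one amino acid. The table below holds only the handful of triplets needed here, and the nine-letter fragment is a short illustrative stretch modelled on the commonly cited haemoglobin change (GAG to GTG, glutamic acid to valine); the function names are hypothetical.

```python
# Partial table: just the triplets used in this example.
CODON_TABLE = {
    "CCT": "Pro",
    "GAG": "Glu",
    "GAA": "Glu",
    "GTG": "Val",
}


def translate(dna):
    """Read DNA three letters at a time and list the amino acids specified."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def substitute(dna, position, letter):
    """Return a copy of dna with one letter replaced (position counts from 0)."""
    return dna[:position] + letter + dna[position + 1:]


normal = "CCTGAGGAG"
sickle = substitute(normal, 4, "T")   # GAG becomes GTG
silent = substitute(normal, 5, "A")   # GAG becomes GAA

print(translate(normal))  # ['Pro', 'Glu', 'Glu']
print(translate(sickle))  # ['Pro', 'Val', 'Glu']
print(translate(silent))  # ['Pro', 'Glu', 'Glu']
```

The second change alters the protein; the third is a "misspelling" that makes no difference, because GAG and GAA specify the same amino acid.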

Alternatively, the difference in the gene's sequence may affect how much of its protein is produced. Nobody yet knows exactly how SNPs affect gene expression.

Chakravarti's team was looking at their connection to disease. They found 874 different SNPs that might cause high blood pressure. More than half caused a change in the protein controlled by the gene, which would indicate a biological effect.

"Half of them probably have some biological function (because) half of all the variation we see changes the protein sequence," Chakravarti said. "Some of it may lead to trivial changes ... Some may lead to fairly drastic levels of function."

Between 35 and 45 per cent of cases of high blood pressure are inherited, and knowing which genes are vulnerable may be useful to doctors and patients. Some genes point to disease more clearly than others: for example, single mutations in a gene known as BRCA1 make a woman clearly more susceptible to breast cancer. But SNPs in the BRCA1 gene may affect the level of susceptibility; again, nobody knows. Other cancers may be caused by mutations in several genes; in the case of high blood pressure, it is clear that many genes are involved.

The problem with the huge number of SNPs is that everybody has different ones. For drug companies that want to set up DNA diagnosis kits, in which our DNA would be analysed to predict future illnesses and treatments, this poses a problem that can only be tackled by co-operation rather than competition.

Thus in April, 10 international drug companies said they would co-operate in finding SNPs and would make the information public so as to avoid battles over who owns such information. They hope to develop "personalized" drugs and treatments based on the information.

They may have a long haul ahead. Dr Leonid Kruglyak of the Fred Hutchinson Cancer Research Center in Seattle, Washington, predicted last month that at least half a million SNPs will be needed as a companion to the Human Genome Sequence before it can be considered a truly useful tool for drugs research and development.

"It implies we're going to have to work that much harder," said Dr Kruglyak. "The technology really needs to be developed much further than it is today."

But he offers some hope when the task is complete: it will "ultimately shed light on what makes each person genetically unique and thus particularly vulnerable to certain diseases, or immune to certain drugs."

But Dr Lander is prepared to be realistic. He says that if you really want a cheap, quick way to find out what is likely to kill you, and when it will happen, look at your parents' ages and causes of death. After all, you're carrying their genes - and, quite likely, exactly the same SNPs.