A former "white hat" hacker hired by banks to test their computer security has been able to discover the names of individuals who volunteered to take part in genome studies on the condition of anonymity.
Nearly 50 people who had agreed to have their genomes sequenced and placed on scientific databases, provided that their names would not be used, were identified by Yaniv Erlich as part of an exercise to test the vulnerability of personal data held in DNA libraries.
The revelation will prove embarrassing for organisations that have promoted the widespread use of genome sequencing in medical research. Last month, the Government announced a plan to sequence the genomes of 100,000 Britons to boost the discovery of new drugs and treatments.
Dr Erlich used computer algorithms to link DNA sequences, particularly of the male Y chromosome, with surnames and other personal data held on genealogy databases as part of a deliberate attempt to test the security of the “anonymised” information held on genome databases.
“This is an important result that points out the potential for breaches of privacy in genomic studies,” said Dr Erlich, a fellow of the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, whose hacking study is published in the journal Science.
“Our aim is to better illuminate the current status of identifiability of genetic data. More knowledge empowers participants to weigh the risk and benefits and make more informed decisions when considering whether to share their own data,” Dr Erlich said.
“We also hope that this study will eventually result in better security algorithms, better policy guidelines, and better legislation to help mitigate some of the risks,” he said.
The number of people having their full genomes sequenced has risen rapidly in recent years as the cost of DNA sequencing has come down. Scientists around the world are collaborating on a number of international projects to sequence thousands of genomes, often with the guarantee of anonymity to the volunteers who take part.
However, using little more than an internet connection and some clever software, Dr Erlich and his colleagues were able to match specific DNA sequences in publicly accessible genome databases with items of personal information from other public sources, which led to the positive identifications.
Civil liberties groups have raised concerns that DNA data gathered for scientific or medical reasons under conditions of confidentiality could be used to identify individuals and even to link people’s names to genetic disorders or medical predispositions hidden within the DNA sequences of their genomes.
“We show that if, for example, your Uncle Dave submitted his DNA to a genetic genealogy database, you could be identified. In fact, even your fourth cousin Patrick, whom you’ve never met, could identify you if his DNA is in the database, as long as he is paternally related to you,” said Melissa Gymrek, a member of Dr Erlich’s team.
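The matching Ms Gymrek describes can be illustrated with a short Python sketch. It is not the team's software, which the article does not detail, and it rests on stated assumptions: a toy genealogy database, a simple count of mismatched Y-chromosome markers, and a mismatch threshold chosen for illustration. The marker names are real Y-chromosome loci, but every repeat count, every surname and the helper names `marker_distance` and `guess_surname` are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (surname, Y-chromosome marker profile).
# The repeat counts and surnames are invented for illustration only.
GENEALOGY_DB = [
    ("Archer", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Archer", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Bell",   {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 12}),
    ("Clarke", {"DYS19": 16, "DYS390": 25, "DYS391": 11, "DYS393": 14}),
]


def marker_distance(a, b):
    """Count the shared markers whose repeat counts differ."""
    shared = a.keys() & b.keys()
    if not shared:
        # Profiles with no markers in common cannot be compared.
        return float("inf")
    return sum(1 for marker in shared if a[marker] != b[marker])


def guess_surname(query, db, max_mismatches=1):
    """Return the most common surname among close matches, or None."""
    hits = Counter(
        surname
        for surname, profile in db
        if marker_distance(query, profile) <= max_mismatches
    )
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# An "anonymous" research sample, reduced to its Y-chromosome markers.
anonymous_sample = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
print(guess_surname(anonymous_sample, GENEALOGY_DB))
```

Because the Y chromosome and the family name usually both pass from father to son, a close match with a distant paternal relative is enough to suggest a surname. A surname alone rarely singles out one person, which is why other published details, such as a volunteer's age, matter in narrowing the field.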
Laura Rodriguez, director of policy at the US National Human Genome Research Institute in Bethesda, Maryland, said she was “surprised but not flabbergasted” by the ability of the researchers to break the anonymity of people who have provided DNA to genome scientists.
“We didn’t realise how easy it was to access this information,” Dr Rodriguez told Science. The ages of the volunteers were immediately removed from the genome databases to lessen the chances of their identities being discovered, she said.
David Page, director of the Whitehead Institute, said that Dr Erlich’s discovery is a timely reminder, in an age of increasing DNA sequencing, of the vulnerability of genome databases to privacy breaches.