Your Data-Driven Family Tree

August 24, 2018

Family tree projects are a staple of elementary school social studies. Students come home to pepper their family members with questions about ancestors and collect snippets of information from old photo albums: a name and a birthdate, perhaps even a birthplace. The final result is a tree with sprawling branches, drawn neatly by hand on a piece of poster board.

The internet has disrupted this process. Now, students can reach much further for information by searching through historical documents online and connecting with family members around the world. Over the last two decades, online ancestry and genealogy companies built to answer the age-old question “who am I and where do I come from?” have grown into a lucrative commercial industry.

Websites like Ancestry and 23andMe have made millions of dollars offering what can’t be gleaned from open-source research. According to their advertisements, in exchange for a fee, people will receive neatly packaged insights into their ethnicity and lineage. Using these services costs more than money, however: customers also hand over their personal DNA.

While many perceive these kits as amusing (and potentially provocative) Christmas presents for family members, their use catapulted into headlines this year with the revelation that genetic genealogy had played an essential role in catching the infamous Golden State Killer. Now, concerns about the privacy and protection of the most intimate personal data, biological information, are entering mainstream consciousness. For Canadians, the discussion became particularly salient just last month, when news broke that the Canada Border Services Agency (CBSA) was using ancestry websites to investigate the identity of migrants.

In an age when data misuse and data leaks have become a regular beat in just about every newsroom, it’s not difficult to find coverage of how personally identifiable information (PII) is collected, used and protected. PII can be used to distinguish or trace a person’s identity, but few data sets are as sensitive or telling as unique biological data. As long as collecting, sharing and identifying DNA remain mainstream practices, improving public literacy about who has access to DNA, and to the information it contains, is imperative.

A Sea of Genealogy Services

If understanding DNA data uses and rights seems difficult, that’s because it is. The confusion is partly due to the sheer number of companies offering a variety of services that require your genetic information. A simple Google search for “online DNA testing” reveals how expansive the commercial industry has become, with an estimated 40 companies to choose from. Leaders Ancestry and 23andMe dominate the space, having tested seven million and three million people, respectively, as of early 2018. The growing number of services reflects booming demand: in 2017, more people bought these tests than in all previous years combined.

For the most part, genealogy services market the chance to explore your ethnicity and even the possibility of discovering a long-lost relative. The popularity of the private companies has also spawned public, open-source genomic databases, which offer spaces for genealogy professionals and hobbyists alike to connect, explore and discover matches across services. These public databases don’t always offer DNA testing themselves, so using them may mean first obtaining a genetic profile from another company’s kit and then uploading it.

Can Law Enforcement Access DNA Data?

Genealogy service users likely don’t hand over their personal information expecting it to help a criminal investigation or to risk incriminating a family member, but the fine print of these services makes it explicit that these uses are possible. The databases held by these private companies are accessible to law enforcement: many companies state that they will disclose information in response to valid law enforcement requests. Ancestry even publishes an annual transparency report on the subject.

However, neither the Golden State Killer case nor the reported CBSA use falls within that category. In the Golden State Killer case, it was a public, volunteer-run database called GEDmatch, not one of the private services, that gave US law enforcement officials the key they needed to solve the decades-old cold case. By uploading a DNA profile from a past crime scene to GEDmatch, investigators were able to find matches with the suspect’s distant relatives, construct a family tree and pursue new leads.

In the CBSA case, the agency submitted the DNA of migrants in detention to private testing services in order to corroborate their claims of nationality. In the instances reported by Vice, consent was obtained for the testing, but agency officials submitted the samples themselves and did not share the results with the individuals.

Neither instance required a court order or violated the sites’ terms of service. GEDmatch’s policy specifically allows law enforcement to upload raw DNA data to “identify a perpetrator of a violent crime against another individual.” Many of the companies allow DNA to be submitted by a third party, as long as that party has the individual’s consent to use and submit it.

While cases like these may not currently affect most consumers, they raise important questions about how unique biological data is managed with regard to law enforcement, the private sector and other unanticipated third parties. Who can consent to the use of DNA data? How accurate is the data? What purposes can the data be used for, and how do we protect the privacy of a person’s most sensitive information?

Consent and DNA Data

The decision to upload DNA to a private or public database is different from the decision to share information on social media sites, in two important ways. First, DNA can’t be changed or deleted. Second, because DNA includes genetic information that connects an individual to their family members, one person’s decision automatically implicates a whole network of other people.

The latter difference makes the concept of consent more complicated. According to Canadian privacy lawyer David Fraser, one person’s decision to upload their DNA data can have a ripple effect that their entire lineage didn’t, and can’t reasonably, consent to. “I can’t think of any other kind of information that has so many different layers and so many different stakeholders to it,” he said. Building public literacy and informing consumers becomes especially difficult when website privacy policies and terms of use can run upward of 15 pages of legal jargon.

The CBSA case raises different legal questions about consent. Calling the CBSA’s approach “extorted consent,” a lawyer interviewed by Vice questioned whether a migrant in detention could freely decide about DNA testing, given the individual’s vulnerable position and how a refusal might be interpreted. These complexities of consent are particularly important to explore, considering the loss of autonomy over DNA data that can come with uploading it online.

Accuracy and Representation

Despite DNA testing’s increasing popularity, academics, journalists and other experts have warned of inaccuracy and misrepresentation for at least a decade. Since each site uses its own methodology and data set, one person submitting to multiple services is likely to see inconsistent results. This is particularly concerning given the likelihood that law enforcement will continue to use these services.

Further, the methodologies and databases behind the company services are often private, and the data on public databases such as GEDmatch are not held to the same evidentiary standards as national DNA databases. Canada’s National DNA Data Bank, for example, is governed by legislation on what DNA can be entered, follows guidelines under the DNA Identification Act, and has an accredited testing laboratory.

At least one study has already identified a high likelihood of false positives in the tests run by these companies. Others have reported on the dangers of coincidental matches and wrongful convictions that can stem from bad DNA tests more generally, highlighting what can go wrong when all types of tests and results are assumed to meet the same standard. Representation also affects accuracy. If most of a data set comes from European or North American populations, the results will be less accurate for people of underrepresented heritage, because there are fewer reference samples to compare them against.

While experts encourage consumers to interpret the tests with the understanding that they are far from definitive, these inaccuracies and gaps in diversity are concerning given that the results might be used to inform law enforcement decisions. Yet it’s not always easy to know how the findings are used, or to what extent. According to a statement from a CBSA spokesperson to the CBC, the agency does not discuss the details of its investigation tactics publicly, “as doing so could render them ineffective.”

Beyond Law Enforcement

Many firms profit from personal data by selling it to third parties, such as pharmaceutical companies, for research purposes. Just this year, GlaxoSmithKline announced a US$300 million investment in 23andMe as part of a drug-discovery partnership. While sites such as Ancestry and 23andMe let members choose whether their data is used for research, many who opt in might not consider the loss of autonomy over their DNA data that comes with committing to participate in the company’s research projects.

In addition to the third-party data uses that consumers have opted in to, consumers should consider the possibility of a data breach, in which DNA data could be obtained by unknown actors with nefarious intentions. This risk is particularly pertinent as reports emerge that it’s possible to re-identify people from their genetic data, even if the data has been stripped of other personal identifiers such as name, email or birthdate.

While Canada’s Genetic Non-Discrimination Act exists to protect individuals’ genetic privacy and to shield them from genetic discrimination, which can affect everything from getting a job to getting insurance, it’s unclear what effect the unregulated public release of DNA data could have on citizens.

What Next?

All of this isn’t to say that DNA testing services shouldn’t exist: improved access to DNA data offers both personal and public benefits.

However, given the range of potential uses and abuses of DNA data, addressing threats to the protection and privacy of this information is pressing. For starters, as Canada launches consultations on a national data strategy, special consideration should be given to protecting genetic data.

The government should also consider the relevance of older laws, such as the Criminal Code’s provisions on the collection and storage of DNA. Just as rapid, on-site DNA analysis technologies and familial DNA searching have raised these types of questions and prompted public conversation, so too should the use of online DNA testing services, including a debate on whether such use is legal or ethical.

Another option to explore might be guidelines restricting the use of the data, as one company has suggested in response to the CBSA’s use. Since sales of direct-to-consumer tests began, regulation has emerged in some areas. In 2013, 23andMe limited the health results its tests report for common genetic diseases after the United States Food and Drug Administration warned that the information could be inaccurate or misinterpreted.

As the new, data-driven means of building one’s family tree, DNA testing services offer a more in-depth look at family history and arguably do a better job than bygone school projects. However, as we have learned with so many new technologies, the benefits of advancement shouldn’t overshadow the potentially irreversible harms.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Nikki Gladstone is a RightsCon Program and Community Manager and a Master of Global Affairs (MGA) graduate from the Munk School of Global Affairs at the University of Toronto, where she focused on the intersection of technology, innovation, and human rights.