It is now public knowledge that, since 2008, a computational method known as reverse genetic engineering has made it possible to identify an individual by analyzing and comparing an enormous number of markers in that person's DNA. High-density SNP genotyping, for example, can reveal whether a person contributed even a trace amount of DNA to a highly complex mixture. Moreover, an individual's DNA can be compared against the DNA of a population to infer characteristics of that population, such as the percentage of intravenous drug users infected with hepatitis. The same comparison can build a profile of the individual that lists weight, diabetic status, and age, all without permission or consent. [1, 2]
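The comparison described in [1] can be made concrete. Homer et al. score each SNP with a distance D_j = |Y_j − Pop_j| − |Y_j − M_j|. Here Y_j is the person's allele frequency, M_j is the frequency measured in the mixture, and Pop_j is the frequency in a reference population. They then test whether the mean of D_j across SNPs is greater than zero. The Python sketch below reimplements that statistic on simulated genotypes only. The function names, the simulation, and every parameter (10,000 SNPs, pools of 20 people, seed 0) are illustrative assumptions, not code or data from the cited study.

```python
import math
import random
from statistics import mean, stdev


def homer_distance(genotype, mixture_freq, reference_freq):
    """Per-SNP distance D_j = |Y_j - Pop_j| - |Y_j - M_j|, following Homer et al. (2008).

    genotype:       the person's allele frequency at each SNP (0.0, 0.5 or 1.0)
    mixture_freq:   allele frequency measured in the pooled sample (M_j)
    reference_freq: allele frequency in a comparable reference sample (Pop_j)
    A positive D_j means the person is closer to the mixture than to the reference.
    """
    return [abs(y - pop) - abs(y - m)
            for y, m, pop in zip(genotype, mixture_freq, reference_freq)]


def membership_t(distances):
    """One-sample t statistic for mean(D) > 0; large values suggest membership."""
    return mean(distances) / (stdev(distances) / math.sqrt(len(distances)))


def simulate_genotype(freqs, rng):
    """Draw two alleles per SNP; return the fraction carrying the counted allele."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def pool(genotypes):
    """Allele frequency of a pooled sample: the per-SNP average over its members."""
    return [sum(column) / len(column) for column in zip(*genotypes)]


# Simulated data only (assumed parameters, not taken from the study).
rng = random.Random(0)
n_snps, pool_size = 10_000, 20
freqs = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
mixture_members = [simulate_genotype(freqs, rng) for _ in range(pool_size)]
reference_sample = [simulate_genotype(freqs, rng) for _ in range(pool_size)]
outsider = simulate_genotype(freqs, rng)

mix, ref = pool(mixture_members), pool(reference_sample)
t_member = membership_t(homer_distance(mixture_members[0], mix, ref))
t_outsider = membership_t(homer_distance(outsider, mix, ref))
print(f"member t = {t_member:.1f}, outsider t = {t_outsider:.1f}")
```

In this toy setting the pooled member's t statistic should land far above zero, while the outsider's stays near zero. That gap is the signal that lets a single genome be resolved within a mixture. Real data add complications the sketch ignores, such as population structure, genotyping error, and correlation between nearby SNPs.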
In a 2013 New York Times article, Gina Kolata reported that computer scientists had developed an algorithm that can identify people in databases supposedly stripped of personal information. [3] Using sophisticated algorithms, the researchers tracked down and identified individuals from only their DNA, their age, and the state where they lived. They completed the task in hours and could also link the identified individuals to their relatives. Today, genetic and genomic test results and clinical information are growing exponentially, and machine learning (ML) and artificial intelligence (AI) are growing more powerful. Together, these trends could cut the time needed for reverse genetic engineering from hours to minutes. Applying generative ML methods to identify individuals from fragments of their DNA will significantly change the data privacy and security landscape.
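The identification Kolata describes can be read as a linkage attack. A record with no name still carries quasi-identifiers, here genetic markers, age, and state, that can be joined against other data sources. The sketch below assumes a hypothetical genealogy table that maps genetic marker profiles to surnames, plus a hypothetical list of public records. Every name, locus label, and value is made up for illustration, and the sketch does not describe the researchers' actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicRecord:
    """Hypothetical entry in a public data source (e.g. a people-search listing)."""
    name: str
    surname: str
    age: int
    state: str


def infer_surname(markers, genealogy_db):
    """Return the surname whose stored marker profile agrees with `markers` at the most loci."""
    def agreement(profile):
        return sum(markers.get(locus) == value for locus, value in profile.items())
    surname, _profile = max(genealogy_db, key=lambda entry: agreement(entry[1]))
    return surname


def reidentify(markers, age, state, genealogy_db, public_records):
    """Linkage attack: markers -> surname, then narrow public records by surname, age and state."""
    surname = infer_surname(markers, genealogy_db)
    return [r for r in public_records
            if r.surname == surname and r.age == age and r.state == state]


# Made-up toy data: marker values at hypothetical loci, each profile linked to a surname.
genealogy_db = [
    ("Rivera", {"locus_a": 13, "locus_b": 24, "locus_c": 11}),
    ("Okafor", {"locus_a": 14, "locus_b": 23, "locus_c": 12}),
]
public_records = [
    PublicRecord("Daniel Rivera", "Rivera", 42, "OH"),
    PublicRecord("Luis Rivera", "Rivera", 42, "TX"),
    PublicRecord("Marco Rivera", "Rivera", 35, "OH"),
    PublicRecord("Chidi Okafor", "Okafor", 42, "OH"),
]

# A "de-identified" research record: markers, age and state only, no name.
sample_markers = {"locus_a": 13, "locus_b": 24, "locus_c": 12}
matches = reidentify(sample_markers, 42, "OH", genealogy_db, public_records)
print([r.name for r in matches])  # ['Daniel Rivera']
```

Each added quasi-identifier shrinks the candidate set. In the toy data the inferred surname alone matches three people, and adding age and state leaves exactly one.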
The exponential growth in genomic testing and results data, coupled with ML and AI, will have profound implications for data privacy and security, necessitating a greater emphasis on electronic consent, patient engagement, and shared decision-making. Applied reverse genetic engineering is advancing drug discovery and personalized medicine. Even so, emerging generative AI tools could exploit genetic databases, which underscores the need for robust consent mechanisms that let patients control how, and for what purpose, their genetic data are used. These technologies will also reshape reverse genetic and genomic engineering itself by accelerating research, automating certain tasks, and enabling the generation of synthetic genetic sequences.
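One way to express "robust consent mechanisms" in code is a deny-by-default consent record. Under this approach, a request to use genetic data is honored only if the patient explicitly granted that purpose and has not revoked consent. The class, field names, and purpose strings below are hypothetical. They illustrate the idea rather than any particular electronic consent standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class GeneticDataConsent:
    """Hypothetical electronic consent record for one patient's genetic data."""
    patient_id: str
    permitted_purposes: set[str] = field(default_factory=set)
    revoked_at: Optional[datetime] = None

    def grant(self, purpose: str) -> None:
        """Record that the patient agreed to one specific use of their data."""
        self.permitted_purposes.add(purpose)

    def revoke(self) -> None:
        """Withdraw consent for all purposes; every later check will fail."""
        self.revoked_at = datetime.now(timezone.utc)

    def allows(self, purpose: str) -> bool:
        """Deny by default: allow only explicitly granted purposes while consent is active."""
        return self.revoked_at is None and purpose in self.permitted_purposes


consent = GeneticDataConsent("patient-001")
consent.grant("clinical_care")
consent.grant("drug_discovery_research")
print(consent.allows("drug_discovery_research"))  # True
print(consent.allows("marketing"))  # False
consent.revoke()
print(consent.allows("clinical_care"))  # False
```

The record checks purpose at the point of use, not only at collection time, so a later revocation takes effect for every downstream request.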
- Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF, Craig DW. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167. doi: 10.1371/journal.pgen.1000167. PMID: 18769715; PMCID: PMC2516199.
- Schadt EE, et al. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet. 2005;37(7):710-717.
- Kolata G. Poking holes in genetic privacy. New York Times. 2013 Jun 16 [cited 2024 Jan 1]. Available from: https://www.nytimes.com/2013/06/18/science/poking-holes-in-the-privacy-of-dna.html