DNA database
A DNA database or DNA databank is a database of DNA profiles which can be used in the analysis of genetic diseases, genetic fingerprinting for criminology, or genetic genealogy. DNA databases may be public or private, the largest ones being national DNA databases.
DNA databases are often employed in forensic investigations. When a match is made from a national DNA database to link a crime scene to a person whose DNA profile is stored on a database, that link is often referred to as a cold hit. A cold hit is of particular value in linking a specific person to a crime scene, but is of less evidential value than a DNA match made without the use of a DNA database.[1] Research shows that DNA databases of criminal offenders reduce crime rates.[2][3]
Types
Forensic
A forensic database is a centralized DNA database that stores DNA profiles of individuals and allows DNA samples collected from a crime scene to be searched and compared against the stored profiles. Its most important function is to produce matches between a suspected individual and biological traces from a crime scene. Such matches provide evidence to support criminal investigations and can also help identify potential suspects. The majority of national DNA databases are used for forensic purposes.[4]
The Interpol DNA database is used in criminal investigations. Interpol maintains an automated DNA database called DNA Gateway, which contains DNA profiles submitted by member countries. These profiles are collected from crime scenes, missing persons and unidentified bodies.[5] The DNA Gateway was established in 2002. By the end of 2013 it held more than 140,000 DNA profiles from 69 member countries. Unlike other DNA databases, DNA Gateway is used only for information sharing and comparison. It does not link a DNA profile to any individual, and it does not record the physical or psychological conditions of individuals.[5]
Genealogical
A national or forensic DNA database is not available for non-police purposes. Because DNA profiles can also be used for genealogical purposes, separate databases are needed to store the results of genealogical DNA tests. GenBank is a public sequence database that stores genome sequences, including sequences submitted by genetic genealogists. GenBank contains a large number of DNA sequences obtained from more than 140,000 registered organizations. It is updated daily to keep the collection of sequence information uniform and comprehensive. The sequences come mainly from individual laboratories or large-scale sequencing projects. The files stored in GenBank are divided into groups such as BCT (bacterial), VRL (viral) and PRI (primate). Users can access GenBank through NCBI's retrieval system. They can then use the BLAST tool to search for a given sequence within GenBank or to find similarities between two sequences.[6]
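BLAST-style similarity search begins by finding short words ("seeds") that a query sequence shares exactly with a database sequence. Those seeds are then extended into longer alignments and scored. The Python sketch below illustrates only the seeding step. The function names and the word length are illustrative choices. This is not BLAST itself, which adds seed extension, scoring schemes and statistical significance estimates.

```python
def kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it occurs."""
    index: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def shared_kmers(query: str, subject: str, k: int = 4) -> list[tuple[int, int, str]]:
    """Return (query_pos, subject_pos, word) for every exact length-k word match.

    Only the seeding idea is shown; no extension or scoring is performed.
    """
    subject_index = kmers(subject, k)
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in subject_index.get(word, []):
            hits.append((i, j, word))
    return hits


if __name__ == "__main__":
    print(shared_kmers("ACGTACGGA", "TTACGGATT", k=5))
    # [(3, 1, 'TACGG'), (4, 2, 'ACGGA')]
```

The two adjacent seeds in the example overlap, and a search tool would merge and extend them into one longer alignment.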
Medical
A medical DNA database is a DNA database of medically relevant genetic variations. It collects individuals' DNA together with information about their medical records and lifestyles. By recording DNA profiles, scientists may discover interactions between genetic factors and the occurrence of certain diseases, such as cardiovascular disease or cancer. These findings may in turn lead to new drugs or more effective treatments. In the United Kingdom, such a database was planned in collaboration with the National Health Service.[7]
National
A national DNA database is a DNA database maintained by a government to store DNA profiles of its population. Each DNA profile is produced using PCR-based short tandem repeat (STR) analysis. National DNA databases are generally used for forensic purposes, including searching and matching the DNA profiles of potential criminal suspects.[8]
In 2009, Interpol reported 54 national police DNA databases worldwide, and 26 more countries were planning to start one.[9] In Europe, Interpol reported 31 national DNA databases, with six more planned.[9] In 2014, the DNA working group of the European Network of Forensic Science Institutes (ENFSI) made 33 recommendations on DNA database management. It also issued guidelines for auditing DNA databases.[10] Some countries, such as Qatar, have adopted privately developed DNA databases.[11]
Typically, only a tiny subset of an individual's genome is analyzed: 13 or 16 regions that vary greatly between individuals.
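Searching a database of STR profiles can be sketched in a few lines of Python. In this minimal sketch, a profile maps each STR locus to an unordered pair of allele repeat counts. A stored profile counts as a match only if it shares at least a minimum number of loci with the evidence and agrees at every shared locus. The locus names are real STR loci. The profile IDs, allele values, the dictionary standing in for a database and the toy threshold of three loci are illustrative assumptions. Operational systems use more loci and must also handle partial or mixed profiles and match statistics.

```python
Genotype = tuple[int, int]      # unordered pair of allele repeat counts
Profile = dict[str, Genotype]   # locus name -> genotype


def normalize(profile: Profile) -> Profile:
    """Sort each allele pair so that (14, 12) and (12, 14) compare equal."""
    return {locus: tuple(sorted(pair)) for locus, pair in profile.items()}


def is_match(evidence: Profile, reference: Profile, min_loci: int) -> bool:
    """True if the profiles share at least min_loci loci and agree at all of them."""
    ev, ref = normalize(evidence), normalize(reference)
    shared = ev.keys() & ref.keys()
    return len(shared) >= min_loci and all(ev[locus] == ref[locus] for locus in shared)


def search(evidence: Profile, database: dict[str, Profile], min_loci: int = 3) -> list[str]:
    """Return the IDs of all stored profiles that match the evidence profile."""
    return [pid for pid, ref in database.items() if is_match(evidence, ref, min_loci)]


if __name__ == "__main__":
    # Toy reference profiles (hypothetical IDs and allele values).
    database = {
        "P-001": {"TH01": (6, 9), "VWA": (16, 17), "FGA": (21, 24), "D8S1179": (13, 13)},
        "P-002": {"TH01": (7, 9), "VWA": (15, 17), "FGA": (22, 24), "D8S1179": (12, 14)},
    }
    # A partial crime-scene profile typed at only three loci.
    evidence = {"TH01": (9, 6), "VWA": (16, 17), "FGA": (21, 24)}
    print(search(evidence, database))  # ['P-001']
```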
United Kingdom
The United Kingdom's first national DNA database, the National DNA Database (NDNAD), was established in April 1995. By 2006 it contained 2.7 million DNA profiles (about 5.2% of the UK population), as well as other information from individuals and crime scenes.[12] In 2020 it held 6.6 million profiles (5.6 million individuals, excluding duplicates).[13][14][15] The information is stored as a digital code based on the nomenclature of each STR.[16] In 1995 the database used 6 STR markers for each profile. This rose to 10 markers in 1999, and to 16 core markers plus a gender identifier in 2014. Since 2014, Scotland has used 21 STR loci, two Y-DNA markers and a gender identifier.[17] In the UK, police have wide-ranging powers to take DNA samples and to retain them if the subject is convicted of a recordable offence.[18][19] Because NDNAD holds so many profiles, "cold hits" can occur during DNA matching. A cold hit is an unexpected match between an individual's DNA profile and an unsolved crime-scene DNA profile. Such a match can introduce a new suspect into an investigation and thereby help solve old cases.[20]
In England and Wales, anyone arrested on suspicion of a recordable offence must submit a DNA sample, and the resulting profile is stored on the DNA database. Those not charged or not found guilty have their DNA data deleted within a specified period of time.[21] In Scotland, the law similarly requires that the DNA profiles of most people who are acquitted be removed from the database.
New Zealand
New Zealand was the second country to set up a DNA database.[22] In 2019, the New Zealand DNA Profile Databank held 40,000 DNA profiles and 200,000 samples.[23][24]
United States
The United States national DNA database is called the Combined DNA Index System (CODIS). It is maintained at three levels (national, state and local), and each level has its own DNA index system. The National DNA Index System (NDIS) allows participating laboratories across the country to exchange and compare DNA profiles. Each State DNA Index System (SDIS) allows laboratories within that state to exchange and compare DNA profiles. The Local DNA Index System (LDIS) allows DNA profiles collected at local sites to be uploaded to SDIS and NDIS.
CODIS software integrates and connects the DNA index systems at all three levels. It is installed at each participating laboratory and connects to other laboratories over a standalone network, the Criminal Justice Information Services Wide Area Network (CJIS WAN).[8][25] To reduce the number of spurious matches at NDIS, the Convicted Offender Index requires all 13 core CODIS STR loci to be present before a profile can be uploaded. Forensic profiles require only 10 of these loci.
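The upload rule above amounts to a simple completeness check. The Python sketch below assumes the original 13 CODIS core STR loci and the two thresholds stated in the text. The function name, the index labels and the threshold table are hypothetical, and later changes to the CODIS core loci are not modelled.

```python
# The original 13 CODIS core STR loci.
CODIS_CORE_13 = frozenset({
    "CSF1PO", "FGA", "TH01", "TPOX", "VWA",
    "D3S1358", "D5S818", "D7S820", "D8S1179",
    "D13S317", "D16S539", "D18S51", "D21S11",
})

# Minimum number of core loci required per index, as described in the text
# (hypothetical labels).
MIN_CORE_LOCI = {"convicted_offender": 13, "forensic": 10}


def eligible_for_ndis(typed_loci: set[str], index: str) -> bool:
    """Check whether a profile typed at typed_loci meets the index's core-locus threshold."""
    core_present = len(CODIS_CORE_13 & typed_loci)  # non-core loci are ignored
    return core_present >= MIN_CORE_LOCI[index]


if __name__ == "__main__":
    partial = set(sorted(CODIS_CORE_13)[:11])  # a profile typed at 11 of the 13 loci
    print(eligible_for_ndis(partial, "forensic"))            # True
    print(eligible_for_ndis(partial, "convicted_offender"))  # False
```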
As of 2011, CODIS held over 9 million records.[26] As of March 2011, 361,176 forensic profiles and 9,404,747 offender profiles had been accumulated,[27] making it the largest DNA database in the world at that time. By the same date, CODIS had produced over 138,700 matches, assisting in more than 133,400 investigations.[28]
Growing public approval of DNA databases has led to the creation and expansion of many state DNA databases. Political measures such as California Proposition 69 (2004), which expanded the scope of the state's DNA database, have been followed by a significant increase in the number of investigations aided. Forty-nine US states, all except Idaho, store DNA profiles of violent offenders, and many also store profiles of suspects.[29] A 2017 study showed that DNA databases in U.S. states "deter crime by profiled offenders, reduce crime rates, and are more cost-effective than traditional law enforcement tools".[3]
CODIS is also used to help find missing persons and identify human remains. It is connected to the National Missing Persons DNA Database. Samples provided by family members are sequenced by the University of North Texas Center for Human Identification (UNTCHI),[30] which also runs the National Missing and Unidentified Persons System. UNTCHI can sequence both nuclear and mitochondrial DNA.[31]
The Department of Defense maintains a DNA database to identify the remains of service members. The Department of Defense Serum Repository maintains more than 50,000,000 records, primarily to assist in the identification of human remains. Submission of DNA samples is mandatory for US servicemen, but the database also includes information on military dependents. The National Defense Authorization Act of 2003 provided a means for federal courts or military judges to order the use of the DNA information collected to be made available for the purpose of investigation or prosecution of a felony, or any sexual offense, for which no other source of DNA information is reasonably available.[32]
Australia
The Australian national DNA database is the National Criminal Investigation DNA Database (NCIDD). By July 2018 it contained more than 837,000 DNA profiles.[33][34] The database originally used nine STR loci and a sex marker for analysis. This was increased to 18 core markers in 2013.[35] NCIDD combines all forensic data, including DNA profiles, advanced biometrics and cold cases.
Canada
The Canadian national DNA database is called the National DNA Data Bank (NDDB) which was established in 1998 but first used in 2000.[36] The legislation that Parliament enacted to govern the use of this technology within the criminal justice system has been found by Canadian courts to be respectful of the constitutional and privacy rights of suspects, and of persons found guilty of designated offences.[37]
On December 11, 1999, the Canadian government agreed upon the DNA Identification Act. The Act allowed a Canadian DNA data bank to be created and amended the Criminal Code. It provides a mechanism for judges to order offenders to provide blood, buccal swab or hair samples for DNA profiling. The legislation came into force on June 29, 2000. Canadian police have been using forensic DNA evidence for over a decade. It has become one of the most powerful tools available to law enforcement agencies for the administration of justice.[38]
NDDB consists of two indexes: the Convicted Offender Index (COI) and the National Crime Scene Index (CSI-nat). There is also a Local Crime Scene Index (CSI-loc), which is maintained by local laboratories rather than by NDDB because local DNA profiles do not meet NDDB collection criteria. The National Crime Scene Index draws on three laboratory systems. These are operated by the Royal Canadian Mounted Police (RCMP), the Laboratoire de sciences judiciaires et de médecine légale (LSJML) and the Centre of Forensic Sciences (CFS).
Dubai
In 2017 Dubai announced an initiative called Dubai 10X, intended to bring "disruptive innovation" to Dubai.[39] One of its projects was a DNA database that would collect the genomes of all 3 million citizens of Dubai over a 10-year period. The database was intended to be used to find genetic causes of diseases and to create personalised medical treatments.[40]
Germany
Germany set up its DNA database at the Federal Criminal Police Office (BKA) in 1998.[41][42][43][44] In late 2010, the database contained DNA profiles of over 700,000 individuals, and in September 2016 it contained 1,162,304 entries.[45] On 23 May 2011, as part of the "Stop the DNA Collection Frenzy!" campaign, various civil rights and data protection organizations handed an open letter[46] to the German Minister of Justice, Sabine Leutheusser-Schnarrenberger. The letter asked her to stop the "preventive expansion of DNA data-collection" and the "preemptive use of mere suspicions and of the state apparatus against individuals". It also asked her to cancel projects for the international exchange of DNA data at the European and transatlantic level.[47]
Israel
The Israeli national DNA database, the Israel Police DNA Index System (IPDIS),[48] was established in 2007 and holds more than 135,000 DNA profiles. These include profiles from suspected and accused persons and from convicted offenders. The Israeli database also includes an “elimination bank” of profiles from laboratory staff and other police personnel who may come into contact with forensic evidence in the course of their work.
To handle high-throughput processing and analysis of DNA samples collected on FTA cards, the Israel Police DNA database uses a semi-automated laboratory information management system (LIMS). The system enables a small team to process a large number of samples in a relatively short time, and it also tracks the samples afterwards.
Kuwait
The Kuwaiti government passed a law in July 2015 requiring all citizens and permanent residents (4.2 million people) to provide DNA samples for a national database.[49] The law was prompted by security concerns after the ISIS suicide bombing of the Imam Sadiq mosque.[50] The government planned to finish collecting DNA by September 2016, a timeline that outside observers considered optimistic.[51] In October 2017 the Kuwaiti Constitutional Court struck down the law as an invasion of personal privacy, and the project was cancelled.[52]
Brazil
In 1998, the Forensic DNA Research Institute of the Federal District Civil Police created DNA databases of sexual assault evidence.[53] In 2012, Brazil approved a national law establishing state and national DNA databases for the DNA typing of individuals convicted of violent crimes.[53] In 2013 a decree of the Presidency of the Republic regulated the 2012 law. After that decree, Brazil began using CODIS, alongside the databases of sexual assault evidence, to solve sexual assault crimes.[53]
France
France set up the DNA database called FNAEG in 1998. By December 2009, there were 1.27 million profiles on FNAEG.[54]
Russia
In Russia, DNA research is being actively carried out to study the genetic diversity of the peoples of Russia. This work is part of a state assignment: to learn to determine a person's probable territory of origin from their DNA, using data on the majority of the country's peoples. On June 16, 2017, the Council of Ministers of the Union State of Belarus and Russia adopted Resolution No. 26. The resolution approved the Union State's scientific and technical program "Development of innovative genogeographic and genomic technologies for identification of personality and individual characteristics of a person based on the study of gene pools of the regions of the Union State" (DNA identification).
The program also plans to include in the genogeographic study, on the basis of existing collections, the peoples of neighboring countries that are the main sources of migration.
Federal Law No. 242-FZ of December 3, 2008, "On state genomic registration in the Russian Federation", provides for voluntary state genomic registration. It is open to citizens of the Russian Federation and to foreign citizens and stateless persons living in or temporarily staying in Russia. Registration requires a written application and a fee. Genomic information obtained through state genomic registration is used, among other purposes, to establish the family relationships of wanted (identified) persons. Records of genomic registration are kept in the Federal Genomic Information Database (FBDGI).
Articles 10 and 11 of Federal Law No. 152-FZ of July 27, 2006, "On Personal Data", govern the processing of special categories of personal data. These categories relate to race, nationality, political views, religious or philosophical beliefs, health status and intimate life. Such data may be processed if this is necessary to implement the Russian Federation's international readmission agreements and is carried out in accordance with Russian citizenship legislation. Biometric personal data are information about a person's physiological and biological characteristics from which their identity can be established. These data may be processed without the data subject's consent in connection with:
- the implementation of international readmission agreements;
- the administration of justice and the execution of judicial acts;
- compulsory state fingerprint registration;
- cases stipulated by Russian legislation on defense, security, anti-terrorism, transport security, anti-corruption, operational investigative activities and public service;
- cases stipulated by Russian criminal-executive legislation, and by Russian legislation on leaving and entering the Russian Federation, on citizenship and on notaries.[55]
Other European countries
Compared with other European countries, the Netherlands is one of the largest collectors of DNA profiles of its citizens. At the time of reporting, the DNA databank at the Netherlands Forensic Institute contained the DNA profiles of over 316,000 Dutch citizens.[56]
Contrary to the situation in most other European countries, the Dutch police have wide-ranging powers to take and retain DNA samples if a subject is convicted of a recordable offence, except when the conviction only involves paying a fine. If a subject refuses, for example because of privacy concerns, the Dutch police will use force.
In Sweden, only the DNA profiles of criminals who have spent more than two years in prison are stored. In Norway and Germany, court orders are required. In Norway these orders are available only for serious offenders. In Germany they are available only for those convicted of certain offences who are likely to reoffend. Austria started a criminal DNA database in 1997,[57] and Italy set one up in 2016.[58][59] Switzerland started a temporary criminal DNA database in 2000 and confirmed it in law in 2005.[60]
In 2005 the incoming Portuguese government proposed introducing a DNA database of the entire population of Portugal.[61] An informed debate followed, which included an opinion from the Portuguese Ethics Council.[62] After that debate, the database that was introduced covered only the criminal population.[63]
Genuity Science (formerly Genomics Medicine Ireland) is an Irish life sciences company founded in 2015 to create a scientific platform for genomic studies and for developing new disease prevention strategies and treatments. The company was founded by a group of life science entrepreneurs, investors and researchers. Its scientific platform is based on work by Amgen’s Icelandic subsidiary, deCODE genetics, which pioneered genomic population health studies.[64] The company is building a genomic database that will include data from about 10 per cent of the Irish population, covering both patients with various diseases and healthy people.[65] The idea of a private company owning public DNA data has raised concerns. An Irish Times editorial stated: "To date, Ireland seems to have adopted an entirely commercial approach to genomic medicine. This approach places at risk the free availability of genomic data for scientific research that could benefit patients."[66] The editorial contrasted this with the UK's approach, the publicly and charitably funded 100,000 Genomes Project carried out by Genomics England.
China
By 2020, Chinese police had collected 80 million DNA profiles.[67][68] There have been concerns that China may be using DNA data not just for crime solving, but for tracking activists, including Uyghurs.[69]
China has begun a $9 billion program of genetic science research, and Fire-Eye has DNA labs in over 20 countries.[70]
India
India announced that it would launch its genomic database by fall 2019.[71] In the first phase of "Genome India", the genomic data of 10,000 Indians will be catalogued. The Department of Biotechnology (DBT) initiated the project. The first private DNA bank in India is in Lucknow,[72] the capital of the Indian state of Uttar Pradesh. Unlike a research center, it allows members of the public to store their DNA for a small fee and four drops of blood.
Corporate
- Ancestry was reported to have collected 14 million DNA samples as of November 2018.[73]
- 23andMe's DNA database contained genetic information on over nine million people worldwide by 2019.[74][75] The company has explored selling "anonymous aggregated genetic data" to other researchers and pharmaceutical companies for research purposes, provided customers give their consent.[76][77][78][79][80] Ahmad Hariri, professor of psychology and neuroscience at Duke University, has used 23andMe in his research since 2009. He has said that the most important aspect of the company's new service is that it makes genetic research accessible and relatively cheap for scientists.[76] A study that identified 15 genome sites linked to depression in 23andMe's database led to a surge in requests to access the repository. 23andMe fielded nearly 20 requests for the depression data in the two weeks after the paper was published.[81]
- MyHeritage said its database had 2.5 million profiles in 2019.[74]
- Family Tree DNA was reported to have about two million people in its database in 2019.[74]
- Fire-Eye
Compression
DNA databases occupy more storage than non-DNA databases because each DNA sequence is so large.[82] DNA databases grow exponentially every year, which poses a major challenge for storing, transferring, retrieving and searching their data. To address these challenges, DNA databases are compressed to save storage space and bandwidth during data transfers, and they are decompressed during search and retrieval. Various compression algorithms are used for this purpose. The efficiency of a compression algorithm depends on how well and how fast it compresses and decompresses. How well it compresses is generally measured by the compression ratio (original size divided by compressed size), and a higher ratio indicates a more efficient algorithm. The speed of compression and decompression is also considered in evaluations.
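DNA uses only four bases, so a simple baseline stores each base in 2 bits instead of the 8 bits of an ASCII character. That alone gives a compression ratio of about 4, before any repeat-aware modelling. The Python sketch below packs four bases per byte and computes the compression ratio as original size divided by compressed size. The function names are illustrative. The sketch assumes the input contains only A, C, G and T. Real sequence data also contains ambiguity codes such as N, which would need separate handling.

```python
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack A/C/G/T into 2 bits per base, 4 bases per byte (raises KeyError on other symbols)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); length is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def compression_ratio(original_size: int, compressed_size: int) -> float:
    return original_size / compressed_size


if __name__ == "__main__":
    seq = "ACGT" * 250  # 1,000 bases, i.e. 1,000 bytes as ASCII text
    packed = pack(seq)
    assert unpack(packed, len(seq)) == seq
    print(len(packed), compression_ratio(len(seq), len(packed)))  # 250 4.0
```

Packing ignores repeats entirely, so it sets a floor that the algorithms listed below aim to beat.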
DNA sequences, made up of the bases A, C, G and T, contain repeats, including palindromic ones. Compressing these sequences involves locating and encoding these repeats and then decoding them during decompression.
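In DNA, a palindromic or inverted repeat is a segment whose reverse complement (the sequence as read on the opposite strand) also occurs elsewhere in the sequence. A compressor can therefore encode the second copy as a reference to the first, plus a reverse-complement flag. The Python sketch below only locates candidate inverted repeats of a fixed length k, using a dictionary of previously seen k-mers. The function names and the fixed-length simplification are assumptions. Practical compressors find longer, variable-length repeats using hashing or suffix structures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, k: int) -> list[tuple[int, int]]:
    """Return (i, j) with i < j where seq[j:j+k] is the reverse complement of seq[i:i+k]."""
    seen: dict[str, list[int]] = {}  # k-mer -> earlier start positions
    pairs = []
    for j in range(len(seq) - k + 1):
        word = seq[j:j + k]
        for i in seen.get(reverse_complement(word), []):
            pairs.append((i, j))
        seen.setdefault(word, []).append(j)
    return pairs


if __name__ == "__main__":
    print(inverted_repeats("AACCGGTT", 3))  # [(2, 3), (1, 4), (0, 5)]
```

"AACCGGTT" is its own reverse complement, so every 3-mer in its first half pairs with one in its second half.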
Some approaches used to encode and decode are:
- Huffman Encoding
- Adaptive Huffman Encoding
- Arithmetic coding
- Context tree weighting (CTW) method
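Huffman coding is the simplest of these encoding approaches to illustrate. It assigns shorter bit strings to more frequent symbols and produces a prefix-free code. The Python sketch below builds a static Huffman code over the bases of a toy sequence using `heapq`. It is a generic textbook construction, not the coder used by any particular DNA compressor.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(text)
    if len(freq) == 1:  # a single distinct symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}); the tie-breaker
    # keeps heapq from ever comparing the dictionaries.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest subtree gets prefix 0
        w2, _, right = heapq.heappop(heap)  # next lightest gets prefix 1
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text: str, code: dict[str, str]) -> str:
    return "".join(code[ch] for ch in text)


if __name__ == "__main__":
    text = "AAAAAAAACCCCGGT"
    code = huffman_code(text)
    print(len(encode(text, code)), 2 * len(text))  # 25 30
```

For this skewed input, the encoded text needs 25 bits, compared with 30 bits at a fixed 2 bits per base. When the four bases occur with similar frequencies, Huffman coding gains little over 2-bit packing. This is one reason DNA-specific compressors model repeats and context rather than single-base frequencies.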
The compression algorithms listed below may use one of the above encoding approaches to compress and decompress DNA databases:
- Compression using Redundancy of DNA sets (COMRAD)[83][84]
- Relative Lempel-Ziv (RLZ)[84]
- GenCompress
- BioCompress
- DNACompress
- CTW+LZ
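Relative Lempel-Ziv (RLZ) compresses a sequence against a reference sequence. It parses the sequence into factors, each copied from some position of the reference. This suits collections of genomes from the same species, which are largely similar. The Python sketch below is a deliberately naive, slow greedy longest-match parser. When a base does not occur in the reference, the sketch emits it as a single literal, a convention assumed here. Practical RLZ implementations locate matches with a suffix array built over the reference.

```python
from __future__ import annotations


def rlz_parse(target: str, reference: str) -> list[tuple[int, int] | str]:
    """Greedily split target into factors copied from reference.

    Each factor is (ref_start, length) for the longest prefix of the remaining
    target found anywhere in the reference, or a single literal base when that
    base does not occur in the reference at all.
    """
    factors: list[tuple[int, int] | str] = []
    i = 0
    while i < len(target):
        best_start, best_len = -1, 0
        for start in range(len(reference)):  # naive scan of every start position
            length = 0
            while (i + length < len(target) and start + length < len(reference)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len == 0:
            factors.append(target[i])  # literal base absent from the reference
            i += 1
        else:
            factors.append((best_start, best_len))
            i += best_len
    return factors


def rlz_decode(factors: list[tuple[int, int] | str], reference: str) -> str:
    """Rebuild the target by copying each factor from the reference."""
    parts = []
    for factor in factors:
        if isinstance(factor, str):
            parts.append(factor)
        else:
            start, length = factor
            parts.append(reference[start:start + length])
    return "".join(parts)


if __name__ == "__main__":
    ref = "ACGTACGTTT"
    factors = rlz_parse("ACGTTTACGA", ref)
    print(factors)                   # [(4, 6), (0, 3), (0, 1)]
    print(rlz_decode(factors, ref))  # ACGTTTACGA
```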
In 2012, a team of scientists from Johns Hopkins University published the first genetic compression algorithm that does not rely on external genetic databases for compression. Their algorithm, HapZipper, was tailored for HapMap data and achieves over 20-fold compression (a 95% reduction in file size). It compresses 2 to 4 times better than leading general-purpose compression utilities and runs much faster.[85]
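The two figures quoted for HapZipper describe the same result. A compression ratio $r$ shrinks a file to $1/r$ of its original size, which is a fractional reduction of $1 - 1/r$:

```latex
\text{size reduction} = 1 - \frac{1}{r}, \qquad
r = 20 \;\Rightarrow\; 1 - \frac{1}{20} = 0.95 = 95\%
```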
Genomic sequence compression algorithms, also known as DNA sequence compressors, exploit characteristic properties of DNA sequences, such as inverted repeats. Among the most successful compressors are XM and GeCo.[86] For eukaryotes, XM achieves a slightly better compression ratio, but for sequences larger than 100 MB its computational requirements are impractical.
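Compressors such as XM and GeCo are statistical. They predict each base from the bases that precede it and feed those predictions to an arithmetic coder. The compressed size is therefore close to the sum of $-\log_2 P(\text{base} \mid \text{context})$ over the sequence. The Python sketch below estimates that cost for a single adaptive order-k context model with add-one smoothing. The function name and the smoothing choice are assumptions. The real tools combine several models rather than relying on one.

```python
import math
from collections import defaultdict


def adaptive_context_bits(seq: str, k: int) -> float:
    """Estimate the bits needed to code seq with an adaptive order-k model.

    Each base is charged -log2 P(base | previous k bases). P comes from the
    counts seen so far in the same context, with add-one smoothing over the
    four bases. An arithmetic coder driven by these probabilities would
    produce close to this many bits.
    """
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - k):i]  # shorter context for the first k bases
        ctx_counts = counts[context]
        total = sum(ctx_counts.values())
        p = (ctx_counts[base] + 1) / (total + 4)
        bits -= math.log2(p)
        ctx_counts[base] += 1  # update the model after coding the base
    return bits


if __name__ == "__main__":
    periodic = "ACGT" * 200
    bits = adaptive_context_bits(periodic, k=2)
    print(f"{bits / len(periodic):.2f} bits per base")  # far below 2
```

On a perfectly periodic input, each context quickly becomes fully predictive. The estimate therefore falls far below the 2 bits per base of plain packing. On real genomic data the gains are much smaller.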
Medicine
Many countries collect blood samples from newborns to screen for diseases, mainly those with a genetic basis. Usually these samples are destroyed soon after testing. In some countries, however, the dried blood (and the DNA it contains) is retained for later testing.
In Denmark, the Danish Newborn Screening Biobank at Statens Serum Institut keeps a blood sample from people born after 1981. Its purpose is to test for phenylketonuria and other diseases.[87] However, it is also used for DNA profiling to identify deceased persons and suspected criminals.[88] Parents can request that their newborn's blood sample be destroyed once the test result is known.
Privacy issues
Critics of DNA databases warn that the various uses of the technology can pose a threat to individual civil liberties.[89][90] Genetic material contains personal information, such as markers for genetic diseases and for physical and behavioral traits. This information could be used for discriminatory profiling, and its collection may constitute an invasion of privacy.[91][92][93] DNA can also be used to establish paternity and whether or not a child is adopted. The privacy and security of DNA databases now attract considerable attention. Some people fear that their personal DNA information could easily be leaked. Others feel that having their DNA profile recorded in a database labels them as a "criminal". Being falsely accused of a crime can leave a person with a "criminal" record for the rest of their life.
UK laws in 2001 and 2003 allowed DNA profiles to be taken immediately after a person was arrested and kept in a database even if the suspect was later acquitted.[94] In response to public unease at these provisions,[94] the UK later passed the Protection of Freedoms Act 2012. The Act required that suspects who were not charged or were found not guilty have their DNA data deleted from the database.[21]
European countries that have established DNA databases use several measures to protect individuals' privacy, in particular criteria for removing DNA profiles from the databases. Most of the 22 European countries analyzed record the DNA profiles of suspects or of people who have committed serious crimes. Some countries, such as Belgium and France, may remove a convicted person's profile after 30–40 years, when it is no longer needed for criminal investigations. Most countries delete a suspect's profile after acquittal. All of these countries have comprehensive legislation intended to avoid most of the privacy issues that may arise from the use of DNA databases.[4] Some advanced forensic techniques have also been introduced, such as genetic genealogy using public genealogy databases and DNA phenotyping. Public discussion around these techniques has been limited, disjointed and unfocused. The techniques raise issues of privacy and consent that may warrant additional legal protections.[95]
Privacy concerns around DNA databases arise not only in the collection and analysis of DNA samples but also in the protection and storage of this sensitive personal information. DNA profiles can be stored indefinitely in a DNA database, which has raised concerns that the samples could be used for new, unspecified purposes.[96] As the number of users with access to DNA databases grows, people worry that their information could be leaked or shared inappropriately. For example, their DNA profile might be shared with law enforcement agencies or other countries without their consent.[97]
The use of DNA databases has expanded into two controversial areas: arrestees and familial searching. An arrestee is a person who has been arrested for a crime but not yet convicted of it. At the time of the cited sources, 21 US states had passed legislation allowing law enforcement to take DNA from arrestees and enter it into the state's CODIS DNA database. The profile is then checked to see whether the person has a criminal record or can be linked to any unsolved crimes. In familial searching, the DNA database is searched for the partial matches that would be expected between close family members. This technique can link crimes to family members of people in the database. It can thereby help identify a suspect when the perpetrator has no DNA sample in the database.[98][99]
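Familial searching looks for stored profiles that share many, but not all, alleles with an evidence profile. Barring mutation, a parent and child share at least one allele at every locus, and full siblings often share alleles. The Python sketch below ranks candidates by a simple count of shared alleles. The profile IDs and allele values are hypothetical. This is a simplified illustration: operational familial searching relies on likelihood ratios (kinship indices) that take allele frequencies into account.

```python
from collections import Counter

Genotype = tuple[int, int]      # pair of allele repeat counts at one locus
Profile = dict[str, Genotype]   # locus name -> genotype


def shared_alleles(a: Genotype, b: Genotype) -> int:
    """Alleles (0, 1 or 2) two genotypes have in common, counting multiplicity."""
    return sum((Counter(a) & Counter(b)).values())


def rank_relatives(evidence: Profile, database: dict[str, Profile]) -> list[tuple[str, int, bool]]:
    """Rank stored profiles by total shared alleles over the loci both were typed at.

    The boolean marks profiles sharing at least one allele at every common locus,
    the pattern expected (barring mutation) between a parent and a child.
    """
    results = []
    for pid, ref in database.items():
        common = evidence.keys() & ref.keys()
        per_locus = [shared_alleles(evidence[locus], ref[locus]) for locus in common]
        parent_child_pattern = bool(per_locus) and all(s >= 1 for s in per_locus)
        results.append((pid, sum(per_locus), parent_child_pattern))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    evidence = {"TH01": (6, 9), "VWA": (16, 17), "FGA": (21, 24)}
    database = {  # hypothetical IDs and alleles
        "P-101": {"TH01": (9, 9), "VWA": (17, 18), "FGA": (20, 21)},
        "P-102": {"TH01": (7, 8), "VWA": (16, 17), "FGA": (22, 23)},
    }
    print(rank_relatives(evidence, database))
    # [('P-101', 3, True), ('P-102', 2, False)]
```

P-102 matches the evidence exactly at one locus but shares nothing elsewhere. This shows why raw allele counts alone are not enough and why real systems weight shared alleles by how common they are.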
Furthermore, DNA databases could fall into the wrong hands due to data breaches or data sharing.
DNA collection and human rights
In a judgement in December 2008, the European Court of Human Rights ruled that two British men should not have had their DNA and fingerprints retained by police saying that retention "could not be regarded as necessary in a democratic society".[100]
The DNA fingerprinting pioneer Professor Sir Alec Jeffreys condemned UK government plans to keep the genetic details of hundreds of thousands of innocent people in England and Wales for up to 12 years. Jeffreys said he was "disappointed" with the proposals, which followed a European court ruling that the existing policy breached people's right to privacy. He said: "It seems to be about as minimal a response to the European court of human rights judgment as one could conceive. There is a presumption not of innocence but of future guilt here … which I find very disturbing indeed".[101]
Effects on crime
A 2021 study found that registration of Danish criminal offenders in a DNA database substantially reduced the probability of re-offending, as well as increased the likelihood that re-offenders were identified if they committed future crimes.[2]
A 2017 study in the American Economic Journal: Applied Economics showed that databases of criminal offenders' DNA profiles in US states "deter crime by profiled offenders, reduce crime rates, and are more cost-effective than traditional law enforcement tools."[3]
Monozygotic twins
Monozygotic twins share around 99.99% of their DNA, while other siblings share around 50%. Some next generation sequencing tools are capable of detecting rare de novo mutations in only one of the twins (detectable in rare single nucleotide polymorphisms).[102] Most DNA testing tools would not detect these rare SNPs in most twins.
Each person's DNA is unique to them, with the slight exception of identical (monozygotic) twins. Such twins start out from the same genetic line of DNA, but very small mutations arise during the twinning event, and these can now be detected. For all intents and purposes, identical twins share more of their DNA than is probably possible for any other two humans. This holds even in comparison with hypothetical clones, who would neither share the same uterus nor experience the same pre-twinning mutations. As of 2014, next-generation sequencing can detect the tiny differences between identical twins. The most common and affordable DNA tests cannot easily differentiate "identical" twins, although doing so has been shown to be possible. Beyond these recently discovered differences from the twinning event, it has been known since 2008 that identical twins each have their own set of copy number variants. These variants are the number of copies each twin carries of certain sections of DNA.[103]
[edit]References
[edit]- ^ Rose & Goos: DNA - A Practical Guide (Carswell Publications, Toronto).
- ^ a b Anker, Anne Sofie Tegner; Doleac, Jennifer L.; Landersø, Rasmus (2021). "The Effects of DNA Databases on the Deterrence and Detection of Offenders". American Economic Journal: Applied Economics. 13 (4): 194–225. doi:10.1257/app.20190207. ISSN 1945-7782. S2CID 239235452.
- ^ a b c Doleac, Jennifer L. (2017-01-01). "The Effects of DNA Databases on Crime". American Economic Journal: Applied Economics. 9 (1): 165–201. CiteSeerX 10.1.1.269.6210. doi:10.1257/app.20150043. ISSN 1945-7782.
- ^ a b Santos, Filipe; Machado, Helena; Silva, Susana (3 December 2013). "Forensic DNA databases in European countries: is size linked to performance?". Life Sciences, Society and Policy. 9 (1): 12. doi:10.1186/2195-7819-9-12. PMC 4513018.
- ^ a b "DNA / Forensics / INTERPOL expertise / Internet / Home - INTERPOL". www.interpol.int.
- ^ Benson, Dennis A.; Cavanaugh, Mark; Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Sayers, Eric W. (27 November 2012). "GenBank". Nucleic Acids Res. 41 (Database issue): D36–42. doi:10.1093/nar/gks1195. PMC 3531190. PMID 23193287 – via nar.oxfordjournals.org.
- ^ Hagmann, M (2000). "UK plans major medical DNA database". Science. 287 (5456): 1184b–1184. doi:10.1126/science.287.5456.1184b. PMID 10712143. S2CID 70954894.
- ^ a b Butler, John M. (27 July 2011). Advanced Topics in Forensic DNA Typing: Methodology. Academic Press. ISBN 978-0-12-387823-6.
- ^ a b "Global DNA Profiling Survey; Results and Analysis" (PDF). Interpol DNA Unit. 2009. p. Appendix 1. Archived from the original (PDF) on 4 March 2016. Retrieved 12 October 2015.
- ^ www.enfsi.eu https://web.archive.org/web/20140911172729/http://www.enfsi.eu/sites/default/files/documents/enfsi_2014_document_on_dna-database_management_0.pdf. Archived from the original (PDF) on September 11, 2014.
- ^ Wickenheiser, R. A. (2022). "Expanding DNA database effectiveness". Forensic Science International. Synergy. 4: 100226. doi:10.1016/j.fsisyn.2022.100226. PMC 8991311. PMID 35402888.
- ^ Linacre, A (2003). "The UK National DNA Database". The Lancet. 361 (9372): 1841–1842. doi:10.1016/s0140-6736(03)13539-8. PMID 12788567. S2CID 31070032.
- ^ National DNA Database Strategy Board Biennial Report 2018–2020 (PDF). UK Home Office; Her Majesty's Stationery Office. September 2020. p. 10. ISBN 978-1-5286-1916-5. Retrieved 6 November 2020.
- ^ "National DNA Database statistics, Q1 2015 to 2016". National DNA Database statistics. UK Government Home Office. Retrieved 11 October 2015.
- ^ Gav Ireland; Simon Lewis; Dan Fookes. "Statistics". npia.police.uk. NPIA. Archived from the original on 2012-06-17. Retrieved 2012-08-04.
- ^ Gill, P. (February 2002). "Role of Short Tandem Repeat DNA in Forensic Casework in the UK—Past, Present, and Future Perspectives" (PDF). BioTechniques. 32 (2): 366–385. doi:10.2144/02322rv01. PMID 11848414.
- ^ Forensic DNA analysis : a primer for courts. London: Royal Society. 2017. ISBN 978-1-78252-301-7. OCLC 1039675621.
- ^ Bowcott, Owen (13 May 2015). "Retention of offenders' DNA profiles not illegal, supreme court rules". The Guardian. Retrieved 11 October 2015.
- ^ "Identification by body samples and impressions—4.4 Section 82: Restrictions on use and destruction of fingerprints and samples". WikiCrimeLine. Archived from the original on 2007-02-23.
- ^ Wallace, Helen (1 July 2006). "The UK National DNA Database". EMBO Reports. 7 (1S): S26 – S30. doi:10.1038/sj.embor.7400727. PMC 1490298. PMID 16819445.
- ^ a b "Protection of Freedoms Act 2012: DNA and fingerprint provisions". Protection of Freedoms Act 2012: how DNA and fingerprint evidence is protected in law. UK Government Home Office. 4 April 2014. Retrieved 11 October 2015.
- ^ Forensic Science
- ^ "About the DNA databank ESR". Institute of Environmental Science and Research, ESR, New Zealand Government. Archived from the original on 2020-09-22. Retrieved 2020-11-07.
- ^ "Reviewing the DNA Database" (PDF). ESR NZ Institute of Environmental Science and Research - Crime Scene Intelligence Newsletter. November 2019. p. 3. Archived from the original (PDF) on 2021-01-31. Retrieved 2020-11-07.
- ^ CODIS Brochure
- ^ "Laboratory Services". FBI.
- ^ "CODIS - NDIS Statistics".
- ^ "CODIS - Investigations Aided". www.fbi.gov. Archived from the original on April 6, 2009.
- ^ "Supreme Court says police can take DNA swabs after arrest". CBS News.
- ^ "Family DNA Collection Protocol" (PDF). Archived from the original (PDF) on 2010-12-16. Retrieved 2018-05-01.
- ^ "Missing Persons Unit". Archived from the original on 2018-05-02. Retrieved 2018-05-01.
- ^ "Archived copy". Archived from the original on 2018-09-04. Retrieved 2018-09-17.
- ^ Australian Criminal Intelligence Commission (18 July 2018). "National Criminal Investigation DNA Database". Archived from the original on 9 September 2018. Retrieved 9 September 2018.
- ^ Mobbs, Jonathan D. (2001). "Crimtrac-technology and detection". 4th National Outlook Symposium on Crime in Australia, New Crimes or New Responses. Canberra.
- ^ Curtis, Caitlin; Hereward, James (August 29, 2017). "From the crime scene to the courtroom: the journey of a DNA sample". The Conversation. Retrieved October 14, 2017.
- ^ Milot, E; Lecomte, MM; Germain, H; Crispino, F (2013). "The National DNA Data Bank of Canada: a Quebecer perspective". Front Genet. 4: 249. doi:10.3389/fgene.2013.00249. PMC 3834530. PMID 24312124.
- ^ "DNA Data Bank". www.publicsafety.gc.ca. Archived from the original on June 25, 2013.
- ^ "National DNA Data Bank". Royal Canadian Mounted Police. 2001-04-22.
- ^ Sutton, Mark (2017-02-14). "HH Sheikh Mohammed launches 10x initiative". ITP.net. Retrieved 2018-03-02.
- ^ Treviño, Julissa (2018-03-20). "Dubai Wants to DNA Test Its Millions of Residents to Prevent Genetic Disease". Smithsonian. Retrieved 2018-03-02.
- ^ "GeneWatch UK - Germany". genewatch.org. Retrieved 29 December 2016.
- ^ "Germany's DNA database". Archived from the original on 29 December 2016. Retrieved 29 December 2016.
- ^ "National DNA Intelligence Databases in Europe – Report on the Current Situation" (PDF). Retrieved 29 December 2016.
- ^ Peerenboom, E. (1 June 1998). "Central criminal DNA database created in Germany". Nature Biotechnology. 16 (6): 510–511. doi:10.1038/nbt0698-510. ISSN 1087-0156. PMID 9624672. S2CID 28662677.
- ^ Käppner, Joachim (8 December 2016). "Justiz: Verräterische Proben" (in German). Süddeutsche Zeitung. Retrieved 29 December 2016.
- ^ "Open Letter: Stop the police's DNA collection frenzy!" (PDF). Retrieved 29 December 2016.
- ^ Schultz, Susanne. ""Stop the DNA Collection Frenzy!": Expansion of Germany's DNA Database". Forensic Genetics Policy Initiative. Retrieved 29 December 2016.
- ^ Zamir, Ashira; Dell’Ariccia-Carmon, Aviva; Zaken, Neomi; Oz, Carla (1 March 2012). "The Israel DNA database—The establishment of a rapid, semi-automated analysis system". Forensic Science International: Genetics. 6 (2): 286–289. doi:10.1016/j.fsigen.2011.06.003. PMID 21727053.
- ^ Visser, Nick (14 July 2015). "Kuwait To Institute Mandatory DNA Testing For All Residents". Huffington Post. Retrieved 10 October 2015.
- ^ "ISIL claims responsibility for Kuwait Shia mosque blast". Al Jazeera. 27 June 2015. Retrieved 10 October 2015.
- ^ Field, Dawn (3 September 2015). "Kuwait's war on ISIS and DNA". Oxford University Press Blog. Retrieved 10 October 2015.
- ^ Coghlan, Andy (2017-10-09). "Kuwait's plans for mandatory DNA database have been cancelled". New Scientist. Retrieved 2018-03-02.
- ^ a b c Ferreira, Samuel T.G.; Paula, Karla A.; Maia, Flávia A.; Svidizinski, Arthur E.; Amaral, Marinã R.; Diniz, Silmara A.; Siqueira, Maria E.; Moraes, Adriana V. (2015). "The use of DNA database of biological evidence from sexual assaults in criminal investigations: A successful experience in Brasília, Brazil". Forensic Science International: Genetics Supplement Series. 5: 595–597. doi:10.1016/j.fsigss.2015.09.235.
- ^ Raoult, Eric (2010-01-12). "Question No: 68468" (in French). 13th legislature. Response 2010-04-06.
- ^ Mirolyubova, Svetlana (2021). "Проблемы применения ДНК-теста в целях воссоединения семьи и репатриации" [Problems of applying DNA testing for family reunification and repatriation] (in Russian). Surgut State University Journal. 2021. 1 (31): 91–100. doi:10.34822/2312-3419-2021-1-91-100.
- ^ Veiligheid, Ministerie van Justitie en (2013-05-14). "Home - Nederlandse DNA-databank". dnadatabank.forensischinstituut.nl (in Dutch). Archived from the original on 2019-09-22. Retrieved 2019-09-22.
- ^ Hindmarsh, Richard; Prainsack, Barbara, eds. (2010-08-12). Genetic Suspects: Global Governance of Forensic DNA Profiling and Databasing. Cambridge University Press. p. 154. ISBN 978-0521519434.
- ^ Negri, Giovanni (2016-03-26). "Italy approves DNA database to fight crime". Il Sole 24 Ore, English edition. Archived from the original on 2017-11-07. Retrieved 2017-05-11.
- ^ "Italy creates national DNA database to enhance anti-terror fight". Jamaica Observer. 2016-03-26. Retrieved 2017-05-11.
- ^ Haas, C.; Voegeli, P.; Hess, M.; Kratzer, A.; Bär, W. (2006-04-01). "A new legal basis and communication platform for the Swiss DNA database". International Congress Series. Progress in Forensic Genetics 11: Proceedings of the 21st International ISFG Congress held in Ponta Delgada, The Azores, Portugal between 13 and 16 September 2005. 1288: 734–736. doi:10.1016/j.ics.2005.11.040.
- ^ "Newropeans Magazine - The European Perspective. Preparing for the world of tomorrow". Archived from the original on 2015-03-18. Retrieved 2016-11-10.
- ^ "CNECV - Conselho Nacional de Ética para as Ciências da Vida". Archived from the original on 2016-11-11. Retrieved 2016-11-10.
- ^ Skinner, David (14 July 2010). "Sociology 52: 13: Machado and Silva: Forensic DNA in Portugal".
- ^ "Genuity Science | Genomic Data Insights to Power Discovery". Genuity Science. Retrieved 27 May 2021.
- ^ "Genomics: Exploring new horizons". The Irish Times. Retrieved 2020-11-07.
- ^ McConnell, David; Hardiman, Orla. "Ireland putting profit before people with genomic medicine strategy". The Irish Times. Retrieved 2020-11-07.
- ^ Wee, Sui-Lee (2020-07-30). "China Is Collecting DNA From Tens of Millions of Men and Boys, Using U.S. Equipment". The New York Times. ISSN 0362-4331. Retrieved 2020-11-07.
- ^ Fan, Wenxin; Khan, Natasha; Lin, Liza (2017-12-27). "China Snares Innocent and Guilty Alike to Build World's Biggest DNA Database". Wall Street Journal. ISSN 0099-9660. Retrieved 2020-11-07.
- ^ Wee, Sui-Lee (21 February 2019). "China Uses DNA to Track Its People, with the Help of American Expertise". The New York Times.
- ^ Warrick, Joby; Brown, Cate. "China's quest for human genetic data spurs fears of a DNA arms race". Washington Post. Retrieved 2023-10-27.
- ^ Rajagopal, Divya. "India to launch its 1st human genome cataloguing project". The Economic Times. Retrieved 2020-11-07.
- ^ "Uttar Pradesh - Google Search". www.google.com. Retrieved 2022-04-06.
- ^ Bursztynsky, Jessica (2019-02-12). "More than 26 million people shared their DNA with ancestry firms, allowing researchers to trace relationships between virtually all Americans: MIT". CNBC. Retrieved 2020-11-07.
- ^ a b c Regalado, Antonio (2019-02-11). "More than 26 million people have taken an at-home ancestry test". MIT Technology Review. Retrieved 2020-11-07.
- ^ "23andMe - Ancestry". 23andme.com. Retrieved 29 December 2016.
- ^ a b Potenza, Alessandra (13 July 2016). "23andMe wants researchers to use its kits, in a bid to expand its collection of genetic data". The Verge. Retrieved 29 December 2016.
- ^ "This Startup Will Sequence Your DNA, So You Can Contribute To Medical Research". Fast Company. 23 December 2016. Retrieved 29 December 2016.
- ^ Seife, Charles. "23andMe Is Terrifying, but Not for the Reasons the FDA Thinks". Scientific American. Retrieved 29 December 2016.
- ^ Zaleski, Andrew (22 June 2016). "This biotech start-up is betting your genes will yield the next wonder drug". CNBC. Retrieved 29 December 2016.
- ^ Regalado, Antonio. "How 23andMe turned your DNA into a $1 billion drug discovery machine". MIT Technology Review. Retrieved 29 December 2016.
- ^ "23andMe reports jump in requests for data in wake of Pfizer depression study". FierceBiotech. fiercebiotech.com. 22 August 2016. Retrieved 29 December 2016.
- ^ Mehta, Ateet; Patel, Bankim; et al. (July–December 2010). "DNA Compression using Hash Based Data Structure" (PDF). International Journal of Information Technology and Knowledge Management. 2 (2): 383–386.
- ^ Biji, C.L.; Madhu, M.K.; Vishnu, V. (May 28, 2015). "Compression of Large genomic datasets using COMRAD on Parallel Computing Platform". Bioinformation. 11 (5): 267–271. doi:10.6026/97320630011267. PMC 4464544. PMID 26124572.
- ^ a b Kuruppu, S.S. (January 2012). Compression of Large DNA Databases (PDF) (PhD). The University of Melbourne.
- ^ Chanda, P.; Elhaik, E.; Bader, J.S. (2012). "HapZipper: sharing HapMap populations just got easier". Nucleic Acids Res. 40 (20): 1–7. doi:10.1093/nar/gks709. PMC 3488212. PMID 22844100.
- ^ Pratas, D.; Pinho, A. J.; Ferreira, P. J. S. G. (2016). Efficient compression of genomic sequences. Data Compression Conference. Snowbird, Utah.
- ^ "Siden er ikke blevet fundet / Page not found". www.ssi.dk.
- ^ "Blodbank som forbryderalbum" (in Danish). 16 September 2007.
- ^ Jeffries, Stuart (27 October 2006). "Suspect nation". The Guardian.
- ^ Lemieux, Scott (March 23, 2012). "Are Police Building a Massive DNA Database?". AlterNet.
- ^ "DNA database 'breach of rights'". BBC News. 4 December 2008.
- ^ Curtis, Caitlin; Hereward, James (May 2, 2018). "DNA facial prediction could make protecting your privacy more difficult". The Conversation. Retrieved May 21, 2018.
- ^ Curtis, Caitlin; Hereward, James (December 4, 2017). "It's time to talk about who can access your digital genomic data". The Conversation. Retrieved May 21, 2018.
- ^ a b Wallace, H. M.; Jackson, A. R.; Gruber, J.; Thibedeau, A. D. (1 September 2014). "Forensic DNA databases–Ethical and legal standards: A global review". Egyptian Journal of Forensic Sciences. 4 (3): 57–63. doi:10.1016/j.ejfs.2014.04.002.
- ^ Curtis, Caitlin; Hereward, James; Mangelsdorf, Marie; Hussey, Karen; Devereux, John (18 December 2018). "Protecting trust in medical genetics in the new era of forensics". Genetics in Medicine. 21 (7): 1483–1485. doi:10.1038/s41436-018-0396-7. PMC 6752261. PMID 30559376.
- ^ Roman-Santos, Candice (2010). "Concerns Associated with Expanding DNA Databases". Hastings Science and Technology Law Journal. 2: 267.
- ^ Youssef, Marten (October 2, 2009). "DNA databank proposal raises privacy concerns". The National.
- ^ "DNA Forensics". dnaforensics.com.
- ^ "Compulsory DNA Collection: A Fourth Amendment Analysis". Congressional Research Service.
- ^ "UK | DNA database 'breach of rights'". BBC News. 2008-12-04. Retrieved 2012-08-04.
- ^ Sturcke, James (2009-05-07). "DNA pioneer condemns plans to retain data on innocent". The Guardian. London. Retrieved 2012-08-04.
- ^ Weber-Lehmann, Jacqueline; Schilling, Elmar; Gradl, Georg; Richter, Daniel C.; Wiehler, Jens; Rolf, Burkhard (2014). "Finding the needle in the haystack: Differentiating "identical" twins in paternity testing and forensics by ultra-deep next generation sequencing". Forensic Science International: Genetics. 9: 42–46. doi:10.1016/j.fsigen.2013.10.015. ISSN 1872-4973. PMID 24528578.
- ^ "Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles". American Journal of Human Genetics. 82 (3): 763–771. March 2008. doi:10.1016/j.ajhg.2007.12.011. Epub 14 February 2008.
DNA database
Definition and Fundamentals
Core Definition and Purpose
A DNA database is a centralized repository of DNA profiles generated from biological samples, such as blood, saliva, or tissue, which are analyzed to produce genetic identifiers suitable for comparison and matching. These profiles typically rely on short tandem repeat (STR) markers, regions of non-coding DNA whose repeat lengths vary among individuals, to support a probabilistic match rather than a full genomic sequence, minimizing privacy risks while retaining high discrimination power. Instead of storing complete genomes, such databases hold abstracted data (allele designations at a defined set of loci), which supports forensic, investigative, or research applications without retaining raw sequences.[2]
The core purpose of DNA databases originated in forensic science: supporting criminal justice by comparing crime scene evidence against profiles from convicted offenders, arrestees, or volunteers, thereby identifying perpetrators, linking serial crimes, or excluding suspects whose profiles do not match. For instance, the U.S. Federal Bureau of Investigation's Combined DNA Index System (CODIS), operational since 1998, indexes over 14 million offender profiles and had generated more than 600,000 investigative hits as of 2023, demonstrating its efficacy in resolving cold cases and volume crimes such as burglaries. Similarly, Interpol's DNA Gateway, launched in 2002, facilitates international exchanges to identify disaster victims or transnational offenders, with over 280,000 profiles contributing to cross-border matches.[17][18]
Beyond law enforcement, DNA databases serve ancillary purposes in human identification, such as tracing missing persons or disaster victims through kinship matching, and in research on population genetics or disease markers, although these uses extend from the foundational investigative role. Legislative frameworks such as the U.S. DNA Identification Act of 1994 limit retention to convicted individuals or qualifying arrestees, balancing utility against overreach, and provide for expungement when an arrest does not lead to conviction, keeping the focus on proven criminality rather than speculative surveillance. Empirical data indicate that larger databases increase hit rates (e.g., a 1% increase in database size correlates with higher solvability), but effectiveness hinges on sample quality and marker standardization, not mere accumulation.[19][20]
DNA Profiling Methods
Short tandem repeat (STR) analysis is the predominant method for generating DNA profiles stored in forensic databases worldwide. It uses polymerase chain reaction (PCR) amplification to detect variation in the number of tandemly repeated short DNA sequences (typically 2–7 base pairs) at targeted loci.[21][22] These non-coding regions are highly polymorphic because repeat copy number differs between individuals, so multi-locus profiles discriminate strongly, with random match probabilities often below 1 in 10^18.[23][22] The process begins with DNA extraction from biological samples such as blood, semen, or epithelial cells; as little as 1 nanogram of DNA can be enough for amplification.[24] Selected STR loci, standardized for interoperability across databases, are then amplified by multiplex PCR using fluorescently labeled primers, and the products are separated and sized by capillary electrophoresis according to their electrophoretic mobility.[21][22]
In the United States, the FBI's Combined DNA Index System (CODIS) requires profiles from 20 core autosomal STR loci for national database submissions, an expansion from the original 13 loci established in 1997 intended to increase discriminatory power and reduce adventitious matches.[25][6] These loci, primarily tetranucleotide repeats, are the original 13 (CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, FGA, TH01, TPOX, and vWA) plus seven added in 2017 (D1S1656, D2S441, D2S1338, D10S1248, D12S391, D19S433, and D22S1045).[26][25]
Before STR typing was adopted in the mid-1990s, restriction fragment length polymorphism (RFLP) analysis dominated. It involved digesting DNA with restriction enzymes, Southern blotting, and hybridization with variable number tandem repeat (VNTR) probes to visualize band patterns on autoradiographs.[27] RFLP required 50–100 nanograms of high-molecular-weight DNA and weeks of processing, making it unsuitable for trace or degraded samples; this prompted the transition to PCR-based STR typing for its sensitivity, speed (results in days), and potential for automation.[27][28]
Supplementary methods include Y-chromosome STR (Y-STR) typing, which analyzes markers on the non-recombining region of the Y chromosome to trace male lineages and link patrilineal relatives, and mitochondrial DNA (mtDNA) sequencing for maternal lineage or for degraded samples lacking usable nuclear DNA.[22] Single nucleotide polymorphism (SNP) typing, which interrogates biallelic single-base variations, is increasingly explored for kinship analysis and low-quality evidence because it is robust to degradation, although it offers lower per-locus discrimination than STRs and is not yet standard for core database indexing.[29][22] Whole-genome sequencing remains experimental for profiling, constrained by cost and data volume, so STR typing persists as the benchmark for database efficiency and legal admissibility.[22]
Technical Challenges in Data Management
Managing large volumes of DNA profiles poses significant storage challenges as national forensic databases expand; for instance, the National DNA Index System (NDIS) component of the U.S. CODIS contained over 24.8 million offender profiles and 1.4 million crime scene profiles as of 2025.[30] This growth, driven by mandatory collection from arrestees and convicted offenders, requires large-scale infrastructure that stores not only the core short tandem repeat (STR) allele calls but also associated metadata, electropherograms, and, increasingly, massively parallel sequencing (MPS) output, which is substantially larger per sample.[31] Insufficient storage capacity can create backlogs in profile entry, delaying investigative matches.[32]
Scalability issues arise from the computational demands of searching vast datasets efficiently, particularly with partial, mixed, or low-template profiles, which increase the risk of adventitious (chance) matches; European guidelines recommend calculating and reporting the expected number of adventitious matches based on database size and profile completeness to limit false leads.[8] CODIS addressed part of this by expanding from 13 to 20 core STR loci, effective 2017, which increased discriminatory power but required software upgrades and re-analysis of legacy profiles, straining resources in underfunded laboratories.[31] International exchanges, such as those under the EU's Prüm framework involving 27 states, complicate scalability further because of varying profile formats and the need for automated, near-real-time hit notifications that do not overwhelm network bandwidth.[8]
Ensuring data accuracy requires rigorous quality controls, because errors from manual allele calling, contamination, or null alleles can propagate into false inclusions or exclusions; automating allele designation and database imports is recommended to minimize human error, along with validating matches against the original raw data.[8] Forensic standards require ISO/IEC 17025 accreditation for contributing laboratories and exclude complex mixtures (e.g., those with more than two contributors) to reduce interpretive ambiguity, yet partial profiles from degraded evidence remain common and demand specialized search algorithms.[31] Elimination databases of laboratory personnel profiles help filter out contamination artifacts, preventing erroneous entries into the main indices.[8]
Interoperability challenges stem from non-standardized loci sets and nomenclature across jurisdictions; while the 12-locus European Standard Set (ESS) supports Prüm comparisons, which tolerate one mismatching locus, discrepancies in additional markers or MPS-derived data hinder seamless integration.[8] Upgrading profiles to newer standards, such as an expanded ESS, involves resource-intensive re-testing and database migrations, with a risk of data loss during transitions.[31]
Technical security measures must counter the risk of breaches in these high-value targets, including encryption of stored profiles, role-based access controls, and regular backups to guard against unauthorized exfiltration or ransomware; compliance with regulations such as the GDPR adds audit logging requirements for familial or investigative genetic genealogy searches.[8] De-identification is difficult because DNA is inherently identifying: relative-inference attacks can succeed even against anonymized aggregates, so robust pseudonymization and query restrictions are necessary.[31]
Historical Development
Origins and Early Adoption (1980s–1990s)
The technique of DNA fingerprinting, foundational to modern DNA databases, was developed by British geneticist Alec Jeffreys at the University of Leicester in September 1984, initially for studying genetic mutations and inheritance patterns using variable number tandem repeats (VNTRs) in minisatellite regions of the human genome.[33] The method produced genetic profiles from small biological samples, such as blood or semen, by analyzing highly variable DNA segments that differ between all individuals except identical twins.[34] Jeffreys' team refined the process into a practical tool by 1985, when it was first used in a UK immigration case.[35]
The first forensic application came in 1986, in the investigation of the Narborough murders in Leicestershire, England: Jeffreys' technique exonerated the initial suspect and, after a systematic screening of local men, identified the rapist and murderer Colin Pitchfork.[34] The case demonstrated DNA profiling's evidentiary power and prompted its adoption by law enforcement; by the late 1980s, UK police forces and the Forensic Science Service (FSS) had integrated it into routine casework, although sample degradation and manual processing initially limited its scalability.[36] Early challenges included high costs and the need for large sample quantities, partially addressed by the advent of polymerase chain reaction (PCR) amplification in 1987, which enabled analysis of trace evidence.[37]
The move from ad hoc profiling to systematic databases began in the early 1990s as DNA evidence increasingly led to convictions (UK FSS DNA matches contributed to over 100 arrests by 1994), driving legislative support for centralized storage.[36] The UK established the world's first national forensic DNA database, the National DNA Database (NDNAD), in April 1995, following the Criminal Justice and Public Order Act 1994; it initially held profiles from 250,000 individuals and crime scenes, and the first database-generated match occurred within four months, linking a crime scene sample to a prior offender.[36] In the United States, state-level databanks emerged by 1989 in Virginia and later California, and the FBI launched a CODIS pilot program in 1990 involving 14 state and local laboratories, initially standardizing profiles on restriction fragment length polymorphism (RFLP) before shifting to short tandem repeats (STRs).[38] The Violent Crime Control and Law Enforcement Act of 1994 authorized federal expansion, reflecting bipartisan recognition of DNA's role in resolving over 1,000 U.S. cases by the mid-1990s, though implementation lagged until software interoperability improved.[39] Early adoption focused on convicted offenders and serious felons, and privacy concerns led to retention policies limited to criminal justice samples.[38]
Expansion in the 2000s
In the United Kingdom, the National DNA Database (NDNAD) grew rapidly through the government-funded DNA Expansion Programme, which ran from April 2000 to March 2005 with over £300 million allocated to sample collection, laboratory capacity, and profile loading.[40][41] The initiative aimed to profile all known active offenders, adding more than 2.25 million subject profiles, reaching the goal of 2.5 million total profiles by 2004, and quadrupling DNA-based crime detections.[40] Legislative changes, including provisions of the Criminal Justice and Police Act 2001 and subsequent expansions, permitted retention of DNA from individuals arrested for recordable offences regardless of conviction, contributing to the database's growth from about 793,000 subject profiles in March 2000 to over 3.4 million by March 2005.[42][43]
In the United States, the federal DNA Analysis Backlog Elimination Act of 2000 marked a pivotal expansion of the FBI's Combined DNA Index System (CODIS), authorizing grants totaling hundreds of millions of dollars to state and local laboratories to process backlogged samples and upload profiles to the National DNA Index System (NDIS).[44][45] State laws broadened collection to include felony arrestees, certain misdemeanants, and sex offenders, driving NDIS offender profiles from roughly 700,000 in 2000 to over 5 million by 2007; forensic profiles exceeded 200,000 by the middle of the decade, generating tens of thousands of investigative leads.[46] This growth reflected coordinated federal-state efforts to standardize 13 core loci for interoperability and to prioritize violent crime samples, though backlogs persisted as submissions surged.[47]
Globally, national databases proliferated during the 2000s, and Interpol launched its DNA Gateway in 2002 to facilitate standardized profile exchanges among member states using common short tandem repeat loci.[48] By 2009, 54 countries maintained operational forensic DNA databases, up from fewer than 20 a decade earlier, including expansions in Australia (the National Criminal Investigation DNA Database, 2001), Canada (the National DNA Data Bank, formalized in 2000), and several European nations aligning with EU Council Framework Decisions on data exchange.[49] These expansions were driven by falling analysis costs, improved automation, and policies emphasizing DNA's evidentiary value in linking serial crimes, though differing retention rules revealed disparities in scope and privacy safeguards across jurisdictions.[50]
Modern Advancements and Integrations (2010s–Present)
In the 2010s, forensic DNA databases expanded their core loci to increase discriminatory power and facilitate international data sharing. The U.S. Federal Bureau of Investigation (FBI) expanded the Combined DNA Index System (CODIS) core short tandem repeat (STR) loci from 13 to 20, a change selected in 2012 and implemented in 2017, allowing more genetic markers to be compared for better profile matching and compatibility with global standards.[51] CODIS hit rates rose from 47% to 58% over the following decade, an increase driven primarily by database growth rather than by more crime scene profiles.[10] Concurrently, next-generation sequencing (NGS) technologies advanced DNA profiling by enabling massively parallel analysis of degraded or trace samples, supporting applications such as mixture deconvolution and single-nucleotide polymorphism (SNP) genotyping for ancestry inference.[52] These methods increased sensitivity, yielding profiles from samples that traditional STR typing could not analyze.[53]
Rapid DNA instruments emerged as a key integration in the mid-2010s, automating STR profiling in under 90 minutes at field sites without laboratory infrastructure.[54] The FBI certified the first devices for uploading to CODIS in 2017, with plans for full investigative use by 2025 to streamline arrestee and crime scene processing.[55] Adoption has accelerated case resolution, as U.S. agencies use portable systems for real-time suspect identification during booking or on patrol.[56]
Globally, DNA databases have grown sharply: the U.S. National DNA Index System (NDIS) exceeded 14 million profiles by 2020, while China reported over 8 million entries, reflecting legislative pushes for broader sample collection from arrestees and convicted offenders.[50] Transnational exchanges through Interpol's DNA Gateway, established in 2002 and expanded in the 2010s, have facilitated cross-border matches among over 100 member states.[57]
Forensic genetic genealogy (FGG) began integrating consumer databases into law enforcement workflows in 2018, using public platforms such as GEDmatch to trace distant relatives through SNP-array data from direct-to-consumer kits.[58] The approach resolved high-profile cold cases such as the Golden State Killer identification by combining autosomal DNA matches with genealogical records, producing leads where traditional STR searches had failed.[59] By 2024, over 300 U.S. investigations had used FGG, prompting policy debates on consent and database opt-in rules amid privacy concerns.[60] These integrations have made databases more effective, though challenges remain in standardizing NGS data uploads to systems such as CODIS and in ensuring chain of custody for rapid field results.[61]
Types of DNA Databases
Forensic and Law Enforcement Databases
Forensic and law enforcement DNA databases maintain repositories of short tandem repeat (STR) profiles (partial genetic markers rather than full genomes) extracted from biological evidence at crime scenes, as well as reference samples from convicted offenders, arrestees, and sometimes victims or witnesses, to enable probabilistic matching for criminal investigations. These systems prioritize investigative utility by comparing unknown crime scene profiles against known references, generating leads that link perpetrators to unsolved cases, including cold cases, and supporting prosecutions through statistically rare profile matches (e.g., random match probabilities often smaller than 1 in 10^18 for 20 or more loci). Unlike consumer or medical databases, access is restricted to authorized law enforcement and forensic personnel under strict protocols to prevent misuse, though expansions to include non-convicted arrestees have prompted debate over how retention policies should be balanced against recidivism risks.[25][2] The United States' Combined DNA Index System (CODIS), developed by the Federal Bureau of Investigation (FBI) under the DNA Identification Act of 1994, exemplifies a tiered national infrastructure in which Local DNA Index Systems (LDIS) feed into State DNA Index Systems (SDIS) and the overarching National DNA Index System (NDIS). Over 190 public laboratories contribute to NDIS, which as of 2025 holds more than 24.8 million offender/arrestee profiles and 1.4 million crime scene profiles and has produced more than 760,000 hits in total, contributing to investigations of serious violent crimes.
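Match statistics like these come from the product rule: per-locus genotype frequencies are multiplied across loci on the assumption that the loci are independent. The Python sketch below is illustrative only. The allele frequencies are made up rather than drawn from any real population database, the function names are hypothetical and do not correspond to CODIS software, and the optional `theta` parameter applies the Balding-Nichols subpopulation correction (the form recommended in the 1996 NRC II report), which reduces to the plain Hardy-Weinberg values p² and 2pq when `theta` is 0.

```python
from math import prod
from typing import Optional, Sequence, Tuple

# Each locus is (p,) for a homozygote or (p, q) for a heterozygote, where p and q
# are population frequencies of the observed alleles (hypothetical values below).
Locus = Tuple[float, ...]


def match_probability(p: float, q: Optional[float] = None, theta: float = 0.0) -> float:
    """Per-locus match probability with a Balding-Nichols theta correction.

    With theta = 0 this is Hardy-Weinberg: p^2 (homozygote) or 2pq (heterozygote).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:  # homozygote
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


def random_match_probability(loci: Sequence[Locus], theta: float = 0.0) -> float:
    """Product rule: multiply per-locus probabilities, assuming independent loci."""
    return prod(match_probability(*alleles, theta=theta) for alleles in loci)


# Hypothetical three-locus profile: heterozygote, homozygote, heterozygote.
profile = [(0.10, 0.20), (0.15,), (0.05, 0.30)]
rmp = random_match_probability(profile)
# For a single-source profile, the likelihood ratio is simply 1 / RMP.
print(f"RMP = {rmp:.2e}; LR = {1 / rmp:.2e}")  # prints RMP = 2.70e-05; LR = 3.70e+04
```

Because every locus contributes a multiplicative factor, extending a profile to 20 loci drives the RMP down by many orders of magnitude even when each individual genotype is fairly common. A positive `theta` deliberately makes each factor larger, producing a more conservative (less extreme) estimate.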
CODIS software, adopted internationally by more than 90 laboratories, employs automated searching algorithms to detect exact matches or partial profiles, with familial searching permitted in select states since the late 2000s to generate investigative leads when direct matches fail; in one case, a relative's profile led to the 2010 arrest of a serial killer.[25][30][62] In the United Kingdom, the National DNA Database (NDNAD), launched on April 10, 1995, as the first national forensic DNA repository, stores subject profiles from over 6 million individuals (predominantly males arrested for qualifying offenses) alongside approximately 600,000 crime scene profiles, representing about 10% of the population when adjusted for replicates. As of September 30, 2025, the database had a 17.1% replication rate, and in 2023/24, crime scene profiles loaded yielded a 64.8% match rate against subjects, contributing to over 820,000 total matches to unsolved crimes since 2001 that supported arrests in priority offenses like burglary, robbery, and sexual assault. NDNAD operations integrate with police national computer systems for real-time uploads, with speculative searches prohibited but retention justified by empirical patterns of offender recidivism, in which profiled individuals commit a disproportionate share of repeat crimes.[63][64][65][66] Empirical analyses demonstrate these databases' causal impact on crime reduction: a study of U.S. state expansions found that a 10% increase in profiled offenders correlates with 0.5-1% drops in violent index crimes (e.g., homicide, rape), driven by deterrence (profiled individuals offend 17-40% less after sampling) and improved clearance, as biological evidence recovery rates exceed 30% in qualifying scenes.
In the UK, NDNAD growth from 1995-2010 averted an estimated 10,000-20,000 burglaries annually via similar mechanisms, with cost-benefit ratios favoring databases over incremental policing (e.g., $1 invested yields $40-100 in avoided crime costs). Limitations include backlog processing delays (U.S. labs faced more than 100,000 unanalyzed samples before the 2010 expansions) and lower efficacy for crimes that rarely leave recoverable touch DNA (e.g., gun violence), though rapid STR kits have boosted scene recovery since 2015.[9][67][68][69]
Genealogical and Consumer Databases
Genealogical and consumer DNA databases consist of genetic profiles collected through direct-to-consumer (DTC) testing kits marketed for ancestry estimation, relative matching, and occasionally health or trait reporting. These databases enable users to identify biological relatives by comparing shared segments of autosomal DNA, typically measured in centimorgans (cM), and to receive probabilistic estimates of ethnic origins based on reference populations. Unlike forensic databases, which are government-operated and restricted to law enforcement, consumer databases are privately held by companies and rely on voluntary customer submissions, with users retaining ownership of their data under service agreements.[70] The largest such database is maintained by AncestryDNA, which reported over 25 million kits sold by 2025, a network large enough to increase the likelihood of distant relative discoveries.[71] 23andMe follows with more than 12 million samples, emphasizing ancestry composition updates and health-related variants alongside genealogy tools.[70] Other providers include MyHeritage, with approximately 9.6 million DNA samples integrated with historical records, and FamilyTreeDNA, which supports Y-DNA and mitochondrial testing for paternal and maternal lineage tracing in addition to autosomal matches.[72] Collectively, these four major platforms exceed 53 million tested kits as of April 2025, reflecting exponential growth since DTC testing was commercialized in the mid-2000s (23andMe was founded in 2006, and AncestryDNA entered the market in 2012).[73] Operationally, matching algorithms detect identical-by-descent (IBD) segments and predict relationship degrees (third cousins, for example, share about 0.78% of their DNA on average) while accounting for recombination rates.
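Relationship prediction from IBD segments can be illustrated with a simplified Python sketch. It assumes segment lengths in cM have already been detected, discards segments below a 7 cM threshold (an example value, used here as an assumption), and compares the total with the expected sharing for full nth cousins, (1/2)^(2n+1) of the genome, which yields the 0.78% figure for third cousins. The genome-length constant and all helper names are hypothetical; real services compare totals against empirical distributions of shared cM rather than a single expected value.

```python
from typing import Iterable

# Assumed total autosomal length in cM counted across both chromosome copies,
# used only to convert an expected shared fraction into cM. Published figures
# vary, so treat this as an illustrative constant.
GENOME_CM = 6800.0


def shared_cm(segments_cm: Iterable[float], min_segment_cm: float = 7.0) -> float:
    """Total shared cM, ignoring segments shorter than the detection threshold."""
    return sum(s for s in segments_cm if s >= min_segment_cm)


def expected_cousin_fraction(n: int) -> float:
    """Expected autosomal fraction shared by full nth cousins: (1/2) ** (2n + 1)."""
    return 0.5 ** (2 * n + 1)


def closest_cousin_degree(total_cm: float, max_degree: int = 6) -> int:
    """Cousin degree whose expected shared cM is nearest to the observed total."""
    return min(
        range(1, max_degree + 1),
        key=lambda n: abs(expected_cousin_fraction(n) * GENOME_CM - total_cm),
    )


# Hypothetical IBD segments (cM) between two testers; 5.2 and 3.1 fall below 7 cM.
segments = [22.5, 15.0, 9.8, 5.2, 3.1]
total = shared_cm(segments)
print(f"{total:.1f} cM shared -> closest to cousin degree {closest_cousin_degree(total)}")
print(f"expected third-cousin sharing: {expected_cousin_fraction(3):.2%}")
```

Nearest-expected-value classification is a deliberate simplification: the amount shared for a given relationship varies widely because of recombination, and reported averages for distant cousins can exceed these expectations, in part because pairs sharing little or nothing above the threshold go undetected.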
Users can build family trees to triangulate matches, resolving ambiguities in paper records, though ethnicity estimates remain approximations reliant on proprietary reference panels that evolve as databases expand. Some platforms supplement genotyping arrays with sequencing data for finer granularity, but accuracy varies by population coverage, with better resolution for European ancestries due to sample biases.[74] Access by law enforcement is limited by policy: AncestryDNA and 23andMe require subpoenas or warrants for data release and do not proactively share with police, citing user privacy.[75] However, users may upload raw data to open platforms like GEDmatch, a free repository exceeding 1 million profiles, where explicit opt-in consent allows forensic searches via investigative genetic genealogy (IGG). This method, popularized by the 2018 Golden State Killer arrest, has identified over 100 suspects and victims by reconstructing pedigrees from third-party relatives' data, demonstrating its effectiveness in cold cases even though viable leads may rest on matches of only 10-20 cM.[76][16] Privacy risks persist, including data breaches (such as 23andMe's 2023 incident exposing 6.9 million users' ancestry data) and familial implications, where one individual's test implicates untested kin without their consent. Critics argue this circumvents probable cause under the Fourth Amendment, though courts have treated voluntary uploads as diminishing privacy expectations, and empirical data show IGG resolves cases with high precision when corroborated by traditional evidence. Companies mitigate concerns through encryption and anonymization for aggregate research, but users must navigate terms allowing de-identified data use for product improvement, underscoring the trade-off between genealogical utility and genetic surveillance potential.[59][77]
Medical and Research Databases
Medical and research DNA databases aggregate genomic sequences, genotypes, and linked phenotypic data from consented participants to enable studies of genetic influences on disease etiology, drug response, and population-level variation. These repositories support genome-wide association studies (GWAS), variant pathogenicity assessment, and pharmacogenomic research by providing large-scale, controlled-access datasets that link DNA profiles with clinical outcomes, environmental exposures, and longitudinal health records. Whereas forensic databases are reserved for law enforcement, access here is granted to approved researchers under ethical oversight, with data de-identified to protect privacy while promoting discoveries in precision medicine.[78] The UK Biobank exemplifies such databases, having sequenced the whole genomes of 490,640 participants who were aged 40-69 when recruited across the United Kingdom between 2006 and 2010. This dataset, released progressively with full sequencing completed by 2025, integrates genetic information with electronic health records, biomarkers, and lifestyle questionnaires from over 500,000 individuals, powering analyses that have identified novel genetic associations with traits like cardiovascular risk and cancer susceptibility. As of 2025, it represents the world's largest whole-genome sequencing resource for population-based research, supporting thousands of studies on causal genetic mechanisms.[79][80] The NIH All of Us Research Program maintains a diverse genomic database aimed at one million U.S. participants, with over 414,000 whole-genome sequences available by February 2025, emphasizing underrepresented racial and ethnic groups to address biases in prior genetic studies. Launched in 2018, it combines DNA data with electronic health records, surveys, and wearable metrics to investigate health disparities and personalized interventions, such as variant-driven predictions for conditions like diabetes and hypertension.
This controlled-access repository has enabled early findings on ancestry-specific variants influencing disease prevalence.[81][82] The Genome Aggregation Database (gnomAD) compiles data from 730,947 exomes and 76,215 whole genomes across diverse cohorts, primarily to calculate population allele frequencies and annotate variant rarity for clinical interpretation. Established by the Broad Institute in 2017 through harmonization of sequencing projects, it helps distinguish benign polymorphisms from pathogenic mutations in conditions such as rare genetic disorders and cancers, with updates incorporating non-European ancestries to refine global reference data.[83][84] The NCBI Database of Genotypes and Phenotypes (dbGaP) serves as a federal archive for study-derived genomic and phenotypic datasets, hosting individual-level data from thousands of association studies since its inception around 2007. It includes raw genotypes, sequence variants, and linked traits from projects like GWAS consortia, accessible via tiered controls (open access for summary statistics and restricted access for sensitive files) to facilitate replication and meta-analyses of genotype-phenotype interactions. By 2025, dbGaP supports research into complex traits by providing standardized formats for data sharing across institutions.[78][85]
Operational Mechanisms
Sample Collection and Processing
DNA samples for databases are primarily collected via non-invasive buccal swabs, which involve rubbing a sterile cotton, foam, or flocked-tipped applicator against the inner cheek to harvest epithelial cells containing genomic DNA.[17] [86] This method is standard for law enforcement reference samples from arrestees, convicts, or volunteers, as it requires minimal training and yields sufficient DNA (typically 0.5–1 microgram) without blood draws.[17] [87] Swabs are air-dried to prevent microbial degradation, labeled with donor identifiers, and packaged in breathable envelopes or tubes for transport to accredited labs.[88] In forensic contexts, crime scene samples may involve blood, semen, or touch DNA from various substrates; the resulting evidentiary profiles are compared against reference profiles typed at the same loci.[89] After collection, processing begins with DNA extraction to isolate nucleic acids from cellular material, using methods like Chelex-100 chelation, silica-based solid-phase binding, or organic phenol-chloroform separation, which remove proteins and PCR inhibitors to varying degrees.[90] Extracted DNA is quantified via spectrophotometry or fluorometry to ensure adequate concentration (e.g., 0.1–1 ng/μL for downstream steps), followed by polymerase chain reaction (PCR) amplification of targeted loci.[91] For databases like the FBI's CODIS, amplification focuses on 20 core short tandem repeat (STR) loci, such as CSF1PO and D3S1358, which provide high discriminatory power due to allele length variations (2–50 repeats).[92] [21] Amplified products undergo capillary electrophoresis for fragment separation by size, with fluorescent detection generating electropherograms that depict peak heights and positions corresponding to alleles.[93] Profiles are then interpreted against quality assurance standards, such as the FBI's Quality Assurance and Proficiency Testing Program, to validate matches or generate searchable entries, with artifacts such as stutter peaks excluded.[19] In genealogical or medical databases, processing may incorporate single nucleotide polymorphisms (SNPs) via microarray or next-generation sequencing for broader ancestry or health insights, but STR typing remains dominant for forensic interoperability.[22] Rapid DNA instruments automate these steps in under 90 minutes for field use, though they require confirmatory lab analysis for database submission.[94]
Matching Algorithms and Analysis
In forensic DNA databases such as the FBI's Combined DNA Index System (CODIS), matching algorithms primarily compare short tandem repeat (STR) profiles from evidentiary samples against stored reference profiles from known offenders or crime scenes.[25] The process begins with generating a DNA profile by amplifying and analyzing alleles at 20 core STR loci, followed by a search that identifies potential hits based on the number of matching alleles, typically requiring at least 15 loci for a full match in the National DNA Index System (NDIS).[95] Partial profiles from degraded or low-quantity samples may yield near matches, prompting manual review by forensic analysts to confirm investigative leads, such as offender hits linking a suspect to a crime or forensic hits connecting multiple scenes.[95] Statistical analysis of matches relies on calculating the random match probability (RMP), which estimates the frequency of the profile in a relevant population using the product rule: genotype frequencies at each locus, derived from allele frequencies (p² for homozygotes and 2pq for heterozygotes under Hardy-Weinberg equilibrium), are multiplied across loci, assuming independence, to give the overall rarity, which for 20-locus profiles is often far smaller than one in a trillion.[96] This approach, validated through population databases like those from the NIST STRBase, accounts for population substructure via theta corrections to avoid overstating uniqueness in non-random mating populations.[97] For single-source profiles, the comparison itself is binary (include or exclude), but the weight of an inclusion is expressed through the RMP rather than an assumption of absolute uniqueness, partly because laboratory error rates, though below 1%, are not zero.[98] Complex mixtures from multiple contributors necessitate probabilistic genotyping software, such as STRmix, TrueAllele, or EuroForMix, which employ likelihood ratio (LR) models incorporating peak heights, stutter artifacts, and dropout probabilities via Markov chain Monte Carlo simulations or Bayesian frameworks.[99] These algorithms deconvolute mixtures by assigning weights to possible genotype combinations, yielding LRs that compare the probability of the evidence under prosecution (e.g., suspect as contributor) versus defense (e.g., unrelated person as contributor) hypotheses, with validation studies showing LRs exceeding 10^10 for major contributors in two-person mixtures.[100] Unlike deterministic (threshold-based) methods, probabilistic approaches model uncertainty explicitly, reducing false exclusions for low-template DNA, though they require validation against casework data to guard against biased or overconfident results.[101] In genealogical databases like GEDmatch or AncestryDNA, matching algorithms detect identity-by-descent (IBD) segments using single nucleotide polymorphism (SNP) arrays, calculating shared centimorgans (cM) by summing matching chromosomal segments above a threshold (e.g., 7 cM) and applying phasing to distinguish maternal and paternal inheritance.[102] These systems employ segment-based detection via algorithms like GERMLINE or refined IBD tools, estimating relationships probabilistically (third cousins, for example, typically share roughly 50-200 cM), but face challenges from variation in recombination rates and from distant matches that are prone to false positives without triangulation.[103] Forensic applications of such consumer data, as in familial searching, integrate these methods with STR-to-SNP imputation, though success rates remain low (e.g., 1-2% for cold cases) due to database coverage biases.[104]
Storage, Compression, and Security Protocols
DNA profiles in forensic databases, such as the U.S. Federal Bureau of Investigation's Combined DNA Index System (CODIS), are stored in a compact digital format consisting of numerical alleles (one or two per locus) at 20 core short tandem repeat (STR) loci, supplemented by non-personal metadata including specimen identifiers, laboratory codes, and analyst initials, but excluding direct identifiers like names or Social Security numbers to limit re-identification risks beyond matching.[25][2] This STR-based representation, rather than raw sequence data, minimizes storage requirements, with each profile occupying approximately 100-200 bytes, enabling efficient management of more than 25 million profiles in the National DNA Index System (NDIS).[4] In contrast, medical and research databases such as UK Biobank store variant data from whole-genome sequencing in formats like compressed Variant Call Format (VCF) files or array-based genetic data structures (aGDS), capturing single nucleotide polymorphisms (SNPs) or full sequences relative to reference genomes to handle petabyte-scale datasets from hundreds of thousands of individuals.[105] Compression techniques are essential for genomic-scale databases because human DNA sequences are highly redundant: reference-based methods encode only variants (e.g., insertions, deletions, SNPs) against a standard reference genome like GRCh38, achieving compression ratios of 300:1 to over 3,000:1 for collections of haploid genomes by exploiting shared subsequences and probabilistic models.[106][107] Algorithms such as those using Burrows-Wheeler transforms, arithmetic coding tailored to the four-letter DNA alphabet (A, C, G, T), or minimizer-based indexing further reduce file sizes (for instance, compressing short-read sequencing data to 0.317 bits per base, or terabytes of raw genomic data to gigabytes) while preserving lossless retrieval for analysis.[108][109] In forensic contexts, where profiles are inherently concise, general-purpose compression like gzip suffices, but emerging whole-genome forensic applications increasingly adopt these genomic compressors to balance query speed and storage costs.[110] Security protocols for DNA databases emphasize layered protections, including FBI-mandated Quality Assurance Standards (QAS) that require biennial external audits of participating laboratories to verify compliance with data integrity, chain-of-custody, and access controls.[111][25] Digital profiles are secured via encryption for data at rest and in transit, firewalls, and role-based access limited to vetted personnel who undergo FBI background checks, with NDIS procedures prohibiting unauthorized searches or sharing.[112][113] Physical samples are maintained in locked, environmentally controlled facilities with restricted entry, while policies enforce de-identification, automatic expungement for ineligible profiles, and sanctions for misuse, though vulnerabilities persist in non-forensic consumer databases lacking equivalent federal oversight.[114][115]
Applications and Societal Impacts
Role in Criminal Justice and Crime Reduction
DNA databases facilitate suspect identification in criminal investigations by comparing DNA profiles from crime scenes to those of known offenders, arrestees, and forensic evidence, thereby generating investigative leads that frequently result in arrests and convictions. In the United States, the National DNA Index System (NDIS), the national level of the FBI's Combined DNA Index System (CODIS), contained over 18.9 million offender profiles, 6 million arrestee profiles, and 1.4 million forensic profiles as of August 2025, and had produced 769,572 hits aiding 747,041 investigations.[4] These matches have proven instrumental in resolving violent crimes, including homicides and sexual assaults, where biological evidence is recoverable. Similarly, the United Kingdom's National DNA Database (NDNAD) yielded 22,371 routine crime scene-to-subject matches in 2022/23, encompassing 476 homicides (including attempts) and 519 rapes, alongside 1,115 crime scene-to-crime scene matches that link serial offenses.[116] Beyond active cases, DNA databases enable the resolution of cold cases by reanalyzing archived evidence against expanded databases, exonerating the innocent through mismatches and identifying perpetrators decades later.
The National Institute of Justice reports that advancements in DNA technology, coupled with database growth, have linked serial crimes and solved previously unsolvable investigations, with CODIS aiding in connecting disparate cases across jurisdictions.[117] In the UK, NDNAD matches have contributed to convictions in historical cases, such as a 1999 rape resolved in 2022 via database linkage.[116] Overall, since its inception, NDNAD has produced nearly 800,000 matches, demonstrating sustained utility in enhancing detection rates for crimes where DNA evidence is present; its match rate for loaded crime scene profiles reached 64% in 2022/23, well above general crime detection rates.[116] Empirical evidence suggests DNA databases contribute to crime reduction through specific deterrence, as profiled offenders face heightened risks of detection and rearrest for future offenses. Studies analyzing database expansions find that adding individuals reduces their likelihood of new convictions by 17% for serious violent crimes and 6% for serious property crimes, with effects persisting due to the permanence of profiles.[118] Larger databases correlate with overall declines in crime rates, particularly for offenses like murder, rape, and assault where biological evidence is routinely collected and analyzed.[9] For instance, U.S. state-level expansions have shown deterrent impacts, lowering recidivism by increasing the perceived probability of punishment.[119] However, while databases are effective for serious and evidence-rich crimes, early assessments found that DNA matches accounted for detection in only about 0.35% of all recorded crimes, indicating limited broad applicability but disproportionate value in high-impact investigations.[43] This targeted efficacy underscores databases' role in prioritizing resource allocation toward solvable cases, though benefits accrue primarily post-offense rather than through universal prevention.
Empirical Evidence of Effectiveness
Empirical studies demonstrate that forensic DNA databases significantly enhance investigative outcomes by generating matches that link crime scene evidence to known offender profiles, thereby aiding in case resolutions. In the United States, the FBI's Combined DNA Index System (CODIS) had produced 761,872 hits as of June 2025, assisting in 739,456 investigations across federal, state, and local levels.[4] These hits include offender-to-crime scene matches that have contributed to solving violent crimes, including homicides and sexual assaults, with cumulative data showing consistent growth in database utility for cold case reviews.[4] In the United Kingdom, the National DNA Database (NDNAD) exhibits high match rates for crime scene profiles, reaching 64% in the 2022/23 fiscal year, indicating robust effectiveness in providing actionable leads for law enforcement.[116] This is consistent with earlier performance, such as a 66% match rate reported for 2019/20, supporting detections in serious offenses despite the database's inclusion of profiles from arrests rather than convictions alone.[120] Systematic reviews confirm that such databases have facilitated resolutions in numerous specific investigations by matching traces from scenes to stored records.[121] Broader econometric analyses link database expansion to tangible crime reductions, particularly in offenses amenable to biological evidence collection. Research exploiting state-level variations in U.S. DNA database laws finds that larger databases lower overall crime rates, with pronounced effects in categories like murder, rape, and assault, where forensic evidence is frequently recoverable.[9][122] A study in Denmark similarly shows that DNA profiling elevates detection probabilities and curtails recidivism among profiled offenders by up to 43% within the subsequent year.[123] Cost-benefit evaluations underscore the efficiency of these systems relative to alternatives.
One analysis estimates that DNA database expansions prevent crimes at a marginal cost orders of magnitude lower than incarceration or increased policing, yielding net societal savings through deterrence and swift resolutions.[124] Forensic leads from databases have also been modeled to generate preventative value in sexual assault cases, with rapid processing averting future offenses and reducing judicial expenditures.[125] However, effectiveness metrics vary by jurisdiction and profile quality, with diminishing marginal returns observed in oversized databases containing low-forensic-value entries.[69]
Contributions to Medicine and Genealogy
DNA databases have advanced medical research by enabling large-scale genomic analyses that identify causal variants for complex diseases. The UK Biobank, encompassing genetic, phenotypic, and health record data from about 500,000 UK adults recruited between 2006 and 2010, had produced over 18,000 peer-reviewed publications by September 2025, yielding insights into genetic risk factors for conditions like cancer, heart disease, and dementia, thereby informing preventive strategies and therapeutic targets.[126][127] Similarly, population-scale databases facilitate genome-wide association studies (GWAS) that differentiate disease subtypes and estimate allele frequencies, enhancing causal inference in multifactorial disorders.[128] In rare disease diagnostics, resources such as the Genome Aggregation Database (gnomAD), aggregating exome and genome sequences from over 800,000 individuals as of its latest releases, have reclassified thousands of variants of uncertain significance (VUS) as benign, aiding diagnoses in more than 200,000 patients by providing context-specific population frequencies absent in smaller cohorts.[129] This has directly supported clinical decisions, such as confirming pathogenic mutations in pediatric-onset conditions where penetrance is high but allele rarity is key.[130] Pharmacogenomics benefits from these databases through variant annotation that predicts drug metabolism and efficacy, reducing adverse reactions; empirical data show pharmacogenomic-guided dosing lowers hospitalization risks by 30-50% in polypharmacy cases and cuts adverse events in treatments like warfarin anticoagulation or chemotherapy.[131][132] Databases like PharmGKB integrate such evidence, correlating genotypes with outcomes across populations to refine prescribing guidelines.[133] Consumer-oriented DNA databases have transformed genealogy by leveraging autosomal DNA matching to infer relatedness via shared segments, typically identifying cousins within 4-6 generations with high confidence based on shared centimorgan totals (e.g., third cousins typically share roughly 50-200 cM). Over 30 million people had submitted samples to major platforms by 2025, generating matches that resolve adoptions, non-paternity events, and unknown kinships; surveys indicate 46% of users encounter unexpected results, yet fewer than 1% report distress, with many achieving family reunions or historical clarifications.[134][135] These databases also aggregate data for admixture analyses, tracing continental ancestry proportions with improving accuracy as sample sizes grow, though estimates remain probabilistic for distant lineages.[136] Genealogical applications extend to constructing extended pedigrees for medical genetics, where DNA-confirmed links enhance risk assessment in hereditary conditions, bridging consumer insights with clinical utility.[137] Overall, such databases democratize access to biological kinship data, fostering empirical refinements in human migration models through crowd-sourced genotyping.[138]
Controversies and Ethical Debates
Privacy Risks and Data Misuse Potential
DNA databases, particularly forensic and national ones, face significant privacy risks from unauthorized access and data breaches, as genetic information is uniquely identifiable and immutable, enabling lifelong tracking or reconstruction of personal traits. In commercial genetic databases like 23andMe, a 2023 breach exposed ancestry data for 6.9 million users, allowing hackers to access family trees and potentially reveal sensitive ethnic or health-related inferences without consent.[139] Forensic databases, while more secure due to government controls, carry inherent vulnerabilities; for instance, the U.S. National Institute of Standards and Technology has highlighted risks of genomic data enabling discrimination, synthetic biology attacks, or identity-based targeting if compromised.[140] Function creep, in which data collected for criminal justice are repurposed for unrelated surveillance or policy enforcement without legislative oversight, exacerbates misuse potential. Early warnings, such as the ACLU's 1999 critique of U.S. expansions from convicted offenders to arrestees, illustrated this drift, which has since included immigration enforcement and predictive policing in some jurisdictions.[141] In Europe, analyses of forensic DNA databases document similar expansions, such as using profiles for non-criminal identifications, raising concerns over mission erosion and inadequate safeguards against repurposing.[142] Such shifts can lead to overreach, as seen in debates over the U.K.'s National DNA Database retaining innocent individuals' samples, a practice that a 2008 European Court of Human Rights ruling found to violate privacy rights, prompting subsequent deletions.[115] Familial searching amplifies privacy erosion, as matches to relatives implicate non-consenting family members, violating genetic privacy principles.
Investigative genetic genealogy, popularized after the 2018 Golden State Killer case, has drawn criticism for indirectly revealing relatives' genetic information, with studies noting heightened risks of subjecting entire lineages to scrutiny or stigma.[143] Peer-reviewed assessments confirm that because DNA is inherited, individual entries compromise family-wide privacy, potentially enabling inferences about health predispositions or ancestry without explicit permission.[144] Misuse extends to discriminatory applications, where biased algorithms or human interpretation in databases could perpetuate racial disparities, as evidenced by higher match rates for certain demographics in U.S. CODIS analyses, compounded by the risk of errors linking innocent people to crimes.[15] While empirical breaches in national forensic systems remain rare compared to commercial ones, the potential for state-level abuse (such as authoritarian governments repurposing data for political profiling) underscores the need for robust, audited protocols, though current frameworks vary widely and often lag technological advances.[145]
Human Rights Implications of Mandatory Collection
Mandatory DNA collection for inclusion in national databases has raised significant concerns regarding the right to privacy, as enshrined in Article 8 of the European Convention on Human Rights, which protects respect for private and family life. In the landmark case of S and Marper v. United Kingdom (2008), the European Court of Human Rights ruled that the United Kingdom's policy of indefinite retention of DNA profiles and cellular samples from individuals arrested but not convicted constituted a disproportionate interference with privacy rights, due to its blanket and indiscriminate nature without adequate safeguards for destruction or review.[146] The Court emphasized that such retention implied a presumption of future criminality, undermining the principle of innocence until proven guilty, and lacked proportionality given the minimal additional investigative value compared to targeted retention policies.[147] Bodily integrity and autonomy are further implicated by the physical intrusion of DNA sampling, even via minimally invasive buccal swabs, which courts in jurisdictions like the United States treat as a search under the Fourth Amendment. While the U.S. Supreme Court in Maryland v. King (2013) upheld routine DNA collection from arrestees charged with serious offenses as a reasonable booking procedure akin to fingerprinting, critics argue it erodes consent-based autonomy by compelling genetic disclosure without individualized suspicion beyond arrest, potentially enabling function creep in which samples are repurposed for non-forensic uses such as ancestry or health inference.[148] Human Rights Watch has contended that expanding mandatory collection to non-criminal populations, such as detained immigrants, violates privacy by treating biometric data as a default state interest without balancing individual rights to control personal genetic information.[149] Equality and non-discrimination rights under Article 14 of the European Convention are threatened by disproportionate impacts on ethnic minorities, who are overrepresented in many forensic DNA databases due to higher arrest and conviction rates for certain offenses. In the U.S., African Americans and Latinos make up a share of database entries disproportionate to their share of the population, amplifying risks of biased policing and familial searches that ensnare relatives without direct involvement, thereby perpetuating cycles of surveillance and stigmatization.[13] A 2005 analysis in the UK revealed Black men were four times more likely than White men to be profiled in the national database, raising fears of de facto racial profiling embedded in mandatory collection regimes that fail to account for systemic arrest disparities.[150] Broader human rights frameworks, including those from the United Nations, highlight risks of stigmatization and erosion of the presumption of innocence, as permanent database inclusion signals ongoing suspicion regardless of acquittal or minor offenses.
Academic analyses warn that universal or near-universal databases could normalize genetic surveillance, violating principles of proportionality and necessity by retaining sensitive data indefinitely without robust deletion mechanisms or oversight, and could lead to misuse in non-criminal contexts such as employment or insurance discrimination if security breaches occur.[151] Despite judicial validation in some contexts, such as U.S. federal expansions under the DNA Fingerprint Act of 2005 allowing collection from arrestees, these implications underscore ongoing debate over whether empirical crime-solving benefits justify encroachments on core liberties, with evidence suggesting limited marginal gains from including non-convicted individuals.[11][7]
Challenges with Familial Searching and Genetic Inference
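Familial searching typically scores candidate profiles with a kinship likelihood ratio (LR): the probability of observing two STR profiles if the people are, for example, parent and child or full siblings, divided by their probability if the people are unrelated. The Python sketch below illustrates that calculation using the standard identity-by-descent (IBD) coefficients. It is a simplified illustration, not the algorithm of any particular agency: the locus names, allele frequencies and profiles are hypothetical, loci are assumed independent and in Hardy-Weinberg equilibrium, and the mutation and population-substructure corrections that real casework would need are ignored.

```python
from math import prod

# Cotterman coefficients (k0, k1, k2): probability that two relatives share
# 0, 1 or 2 alleles identical by descent (IBD) at a single locus.
RELATIONSHIPS = {
    "parent_child": (0.0, 1.0, 0.0),
    "full_sibling": (0.25, 0.5, 0.25),
    "unrelated": (1.0, 0.0, 0.0),
}


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_given_one_ibd(g_a, g_b, freqs):
    """P(g_b | g_a, one allele IBD): one allele of B copies one of A's two
    alleles (each with probability 1/2); B's other allele is drawn at random."""
    c, d = g_b
    total = 0.0
    for shared in g_a:
        if c == d:
            total += 0.5 * (freqs[c] if shared == c else 0.0)
        else:
            total += 0.5 * ((freqs[d] if shared == c else 0.0)
                            + (freqs[c] if shared == d else 0.0))
    return total


def locus_lr(g_a, g_b, freqs, relationship):
    """LR = P(genotypes | relationship) / P(genotypes | unrelated) at one locus."""
    k0, k1, k2 = RELATIONSHIPS[relationship]
    p_b = genotype_prob(g_b, freqs)
    identical = sorted(g_a) == sorted(g_b)
    return (k0
            + k1 * prob_given_one_ibd(g_a, g_b, freqs) / p_b
            + k2 * (1.0 / p_b if identical else 0.0))


def profile_lr(profile_a, profile_b, allele_freqs, relationship):
    """Multiply per-locus LRs, assuming the loci are independent."""
    return prod(locus_lr(profile_a[locus], profile_b[locus],
                         allele_freqs[locus], relationship)
                for locus in profile_a)


def exceeds_threshold(lr, threshold=1e4):
    """Flag a candidate for follow-up. 1e4 is an assumed threshold at the low
    end of the 10^4 to 10^6 range discussed in this section."""
    return lr >= threshold


# Hypothetical partial allele-frequency tables for three hypothetical loci.
ALLELE_FREQS = {
    "LOCUS_A": {"10": 0.05, "11": 0.10, "12": 0.25},
    "LOCUS_B": {"7": 0.04, "8": 0.20, "9": 0.30},
    "LOCUS_C": {"14": 0.05, "15": 0.10},
}
parent = {"LOCUS_A": ("10", "11"), "LOCUS_B": ("7", "8"), "LOCUS_C": ("14", "14")}
child = {"LOCUS_A": ("10", "12"), "LOCUS_B": ("7", "9"), "LOCUS_C": ("14", "15")}

if __name__ == "__main__":
    for rel in ("parent_child", "full_sibling", "unrelated"):
        lr = profile_lr(parent, child, ALLELE_FREQS, rel)
        print(f"{rel}: LR = {lr:.2f}, flagged = {exceeds_threshold(lr)}")
```

For these hypothetical profiles the per-locus parent-child ratios are 5, 6.25 and 10, giving a combined LR of 312.5 (about 48.7 under the full-sibling hypothesis), well short of a 10⁴ threshold. Real searches combine many more loci, and because some unrelated database members also produce ratios above 1 by chance, the pool of adventitious candidates grows with database size.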
Familial searching compares crime-scene profiles against offender databases for partial matches indicative of kinship, thereby identifying potential suspects through relatives who are already profiled. The technique, first systematically implemented in the United Kingdom in 2003 and later adopted by U.S. states such as California starting in 2010, works without a direct match but implicates innocent family members in investigations without their consent, raising significant privacy concerns.[152][153] Critics argue that such indirect surveillance extends state access to genetic data beyond convicted individuals, potentially deterring database participation and eroding public trust in forensic systems.[154]
Accuracy challenges arise from the probabilistic nature of kinship inference: partial matches, typically required to exceed a likelihood-ratio threshold on the order of 10⁴ to 10⁶, can still yield false positives, leading investigators to pursue unrelated or only distantly related individuals. A 2013 study of familial search error rates found that adventitious matches (chance similarities that mimic kinship) occur at rates influenced by database size and population structure, and false-positive investigations have been documented in practice, such as a 2015 California case in which a partial match erroneously directed resources toward non-relatives.[155] Genetic inference adds further complications: ancestry predictions from single nucleotide polymorphisms (SNPs) can be incorporated to refine allele frequency estimates, yet simulations show that false positive rates remain comparable to those of standard methods, particularly when ancestry is misclassified in admixed populations.[156] Overreliance on these inferences risks confirmation bias, where initial partial hits prompt invasive follow-ups without sufficient validation.[157]
Demographic disparities amplify these issues, as DNA databases like CODIS overrepresent racial minorities because of higher arrest and conviction rates. African Americans, about 13% of the U.S. population, account for roughly 40% of profiles, so familial searches disproportionately implicate their communities.[158] Empirical analyses confirm that this skews investigative focus toward minority families, potentially perpetuating cycles of surveillance and reinforcing existing inequities in criminal justice data collection.[155] In genetic genealogy, where commercial databases are queried for broader SNP data, inference accuracy declines further for non-European ancestries because of reference panel biases, heightening misidentification risks for underrepresented groups.[159]
Broader ethical hurdles include the absence of uniform safeguards against data misuse and the tension between investigative utility and civil liberties, with policy reports highlighting the need for judicial oversight and hit-confirmation protocols to mitigate harms.[160] While proponents cite successes such as the 2010 identification in the Grim Sleeper case, opponents emphasize that unconsented familial implications violate principles of autonomy and equality, particularly absent empirical proof that crime reduction outweighs the erosion of privacy.[15] Ongoing debates underscore the causal link between biases in database composition and heightened scrutiny of certain demographics, prompting calls to reevaluate search thresholds so that evidentiary rigor takes priority over exploratory searching.[161]
Legal and Policy Landscapes
Frameworks in Major Jurisdictions
In the United States, the Combined DNA Index System (CODIS) serves as the national forensic DNA database, authorized by the Violent Crime Control and Law Enforcement Act of 1994, which empowered the FBI to establish and maintain indices of DNA profiles from convicted offenders, crime scenes, and unidentified human remains.[11] Subsequent legislation, including the DNA Fingerprint Act of 2005 and the Katie Sepich Enhanced DNA Collection Act of 2010, expanded eligibility to include profiles from arrestees in certain states and from non-violent felons, with states required to submit profiles for federal matching.[6] As of 2018, CODIS contained approximately 13 to 15 million profiles, primarily from criminal justice sources, with access restricted to authorized law enforcement for investigative matching and no familial searching at the federal level.[15]
The United Kingdom operates the National DNA Database (NDNAD), initiated in 1995 under the Police and Criminal Evidence Act and significantly reformed by the Protection of Freedoms Act 2012 following the European Court of Human Rights ruling in S and Marper v. UK (2008), which deemed indefinite retention of innocent individuals' profiles disproportionate.[162] The 2012 Act mandates indefinite retention of profiles from convicted individuals, while limiting retention for non-convicted adults to three years (with a possible extension) and requiring deletion of profiles from arrested children unless they are charged; it applies to England and Wales, with devolved systems in Scotland and Northern Ireland.[163] Oversight includes the NDNAD Strategy Board and Ethics Group, which help ensure compliance with data protection laws.[164]
Canada's National DNA Data Bank, established by the DNA Identification Act of 1998 and operational since June 30, 2000, compiles profiles from biological samples ordered by courts for designated offences under the Criminal Code, such as serious violent or sexual crimes.[165][166] The Act requires the Royal Canadian Mounted Police to maintain two indices, one of convicted offenders and one of crime scenes, for automated searching; retention is indefinite for matches to unsolved crimes but subject to destruction orders for acquittals or stays, and voluntary samples from victims or missing persons form a separate index.[167] Amendments via Bill C-13 in 2003 broadened collection authority, emphasizing linkage to perpetrators rather than broad arrestee inclusion.[168]
In Australia, DNA database frameworks are decentralized across states and territories under forensic procedures legislation, such as New South Wales' Crimes (Forensic Procedures) Act 2000, with federal coordination via Part 1D of the Crimes Act 1914, which regulates the Commonwealth DNA database system for offences under federal jurisdiction.[169] Profiles derive from suspects, offenders, and crime scenes, with retention policies varying by jurisdiction (typically indefinite for serious offenders but limited for minors or non-convicted individuals), and the National Criminal Investigation DNA Database (NCIDD), managed by the Australian Criminal Intelligence Commission, held over 1.8 million profiles as of August 2024 for cross-jurisdictional matching.[170] Interstate data sharing is permitted under strict protocols, excluding speculative familial searches without judicial approval.[171]
Within the European Union, the Prüm Decision (2008/615/JHA) requires member states to establish national DNA databases and enables automated cross-border exchange of profiles for serious crimes, covering 13 to 16 short tandem repeat loci standardized via ENFSI guidelines; by 2018, all EU states had established such databases, though retention rules differ nationally, balancing EU data protection law with investigative needs.[172][57] Mechanisms outside the EU framework, such as Interpol's DNA Gateway, supplement but do not supplant national frameworks.[57]
International Variations and Policy Debates
National DNA databases vary significantly in scale, inclusion criteria, and retention policies across jurisdictions. The United States' Combined DNA Index System (CODIS), managed by the FBI, maintains one of the largest forensic databases in the world, with over 18.6 million offender profiles, 5.9 million arrestee profiles, and 1.4 million forensic profiles as of June 2025.[4] In contrast, China's national database, established in 2005, has expanded rapidly to encompass tens of millions of profiles, driven by policies mandating collection from criminal suspects, administrative detainees, and certain ethnic minorities, though exact current figures remain opaque because of limited official disclosure.[173] The United Kingdom's National DNA Database (NDNAD), operational since 1995, holds approximately 6.7 million subject profiles according to recent estimates, representing about 10% of the population, with profiles from convicted individuals retained indefinitely and those from unconvicted arrestees subject to time-limited retention following European Court of Human Rights (ECtHR) rulings.[145] Many other nations, including several in the European Union, limit inclusion to serious offenses and maintain smaller databases; Germany's database, for instance, focuses on convicted serious offenders, emphasizing proportionality under data protection laws.[174]

| Country/Region | Approximate Size (Recent) | Key Inclusion Criteria | Retention Policy |
|---|---|---|---|
| United States (CODIS/NDIS) | >18.6M offender profiles (June 2025) | Convicted felons nationwide; arrestees in 30+ states | Lifetime for qualifying offenders; indefinite for forensic profiles[4] |
| China | ~68M+ profiles (2022 onward expansion) | Suspects, detainees, voluntary contributors, targeted groups | Indefinite, with broad administrative uses[173] |
| United Kingdom (NDNAD) | ~6.7M subject profiles | Convicted for recordable offenses; limited arrestee profiles | Indefinite for convicted; 3-5 years for unconvicted with renewal option[145] |
| European Union (varies, e.g., Germany) | Smaller, e.g., <1M in many nations | Primarily convicted serious offenders | Proportional to offense severity; expungement possible post-sentence[174] |
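Database size also affects how cold hits and partial matches should be read: each search compares one crime-scene profile with every stored profile, so the expected number of coincidental (adventitious) matches grows roughly in proportion to the number of profiles. The Python sketch below makes that arithmetic explicit under simplifying assumptions: comparisons are independent and every stored profile has the same random match probability p, so the expected count is N·p and the chance of at least one coincidental match is 1 - (1 - p)^N. The database sizes are the approximate figures from the table; the p values are hypothetical placeholders rather than published figures, and real match probabilities depend on the loci typed, population structure and the presence of relatives in the database.

```python
import math


def chance_match_stats(database_size, random_match_prob):
    """Return (expected coincidental matches, P(at least one)) when a single
    profile is searched against `database_size` unrelated stored profiles,
    assuming independent comparisons with the same match probability."""
    expected = database_size * random_match_prob
    # 1 - (1 - p)^N, computed with log1p/expm1 so a tiny p does not round to zero.
    p_at_least_one = -math.expm1(database_size * math.log1p(-random_match_prob))
    return expected, p_at_least_one


# Approximate sizes from the table (the US figure is the offender index only).
DATABASE_SIZES = {
    "United States (offender profiles)": 18_600_000,
    "United Kingdom": 6_700_000,
    "China": 68_000_000,
}

# Hypothetical random match probabilities, chosen only for illustration.
MATCH_PROBS = {
    "full profile (hypothetical p = 1e-18)": 1e-18,
    "partial profile (hypothetical p = 1e-6)": 1e-6,
}

if __name__ == "__main__":
    for db_name, size in DATABASE_SIZES.items():
        for scenario, p in MATCH_PROBS.items():
            expected, p_any = chance_match_stats(size, p)
            print(f"{db_name}, {scenario}: expected {expected:.3g}, P(>=1) = {p_any:.3g}")
```

Under these assumptions a complete profile yields essentially no chance matches even against tens of millions of records, whereas a hypothetical partial profile with p = 10⁻⁶ would be expected to match about 6.7 unrelated people in a database of 6.7 million and about 18.6 in one of 18.6 million, illustrating why partial-profile hits call for confirmation.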
