SNP genotyping
from Wikipedia

SNP genotyping is the measurement of genetic variations of single nucleotide polymorphisms (SNPs) between members of a species. It is a form of genotyping, which is the measurement of more general genetic variation. SNPs are one of the most common types of genetic variation. An SNP is a single base pair mutation at a specific locus, usually consisting of two alleles (where the rare allele frequency is > 1%). SNPs are found to be involved in the etiology of many human diseases and are becoming of particular interest in pharmacogenetics. Because SNPs are conserved during evolution, they have been proposed as markers for use in quantitative trait loci (QTL) analysis and in association studies in place of microsatellites. The use of SNPs is being extended in the HapMap project, which aims to provide the minimal set of SNPs needed to genotype the human genome. SNPs can also provide a genetic fingerprint for use in identity testing.[1] The growing interest in SNPs is reflected in the rapid development of a diverse range of SNP genotyping methods.

Hybridization-based methods

Several applications have been developed that interrogate SNPs by hybridizing complementary DNA probes to the SNP site. The challenge of this approach is reducing cross-hybridization between the allele-specific probes. This challenge is generally overcome by manipulating the hybridization stringency conditions.[1]

Dynamic allele-specific hybridization

Dynamic allele-specific hybridization (DASH) genotyping takes advantage of the differences in DNA melting temperature that result from the instability of mismatched base pairs. The process can be largely automated and rests on a few simple principles.[citation needed]

In the first step, a genomic segment is amplified and attached to a bead through a PCR reaction with a biotinylated primer. In the second step, the amplified product is attached to a streptavidin column and washed with NaOH to remove the unbiotinylated strand. An allele-specific oligonucleotide is then added in the presence of a molecule that fluoresces when bound to double-stranded DNA. The fluorescence intensity is then measured as the temperature is increased until the melting temperature (Tm) can be determined. A mismatch at the SNP site results in a lower-than-expected Tm.[2]

Because DASH genotyping measures a quantifiable change in Tm, it is capable of detecting all types of mutations, not just SNPs. Other benefits of DASH include its ability to work with label-free probes and its simple design and performance conditions.[citation needed]
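
The Tm readout described above can be illustrated with a minimal sketch. It assumes an idealized sigmoid melt curve and takes Tm as the temperature at which fluorescence falls fastest; the function names, the curve shape and the two Tm values (66 °C and 58 °C) are illustrative assumptions, not values from the DASH literature.

```python
import math

def simulate_melt_curve(tm, temps, width=1.0):
    """Idealized melt curve (an assumption): fluorescence falls from 1 to 0 around tm."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def estimate_tm(temps, fluorescence):
    """Return the temperature at which fluorescence drops fastest (peak of -dF/dT)."""
    best_temp, best_slope = None, 0.0
    for i in range(len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_temp = (temps[i] + temps[i + 1]) / 2  # midpoint of the steepest interval
    return best_temp

temps = [40 + 0.5 * i for i in range(81)]  # 40 to 80 degrees C in 0.5 degree steps
matched_tm = estimate_tm(temps, simulate_melt_curve(66.0, temps))     # hypothetical perfect match
mismatched_tm = estimate_tm(temps, simulate_melt_curve(58.0, temps))  # hypothetical single mismatch
print(matched_tm, mismatched_tm)
```

A probe-target duplex whose estimated Tm falls clearly below the matched value would be read as carrying the alternative allele.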

Molecular beacons

SNP detection through molecular beacons makes use of a specifically engineered single-stranded oligonucleotide probe. The oligonucleotide is designed such that there are complementary regions at each end and a probe sequence located in between. This design allows the probe to take on a hairpin, or stem-loop, structure in its natural, isolated state. Attached to one end of the probe is a fluorophore and to the other end a fluorescence quencher. Because of the stem-loop structure of the probe, the fluorophore is close to the quencher, thus preventing the molecule from emitting any fluorescence. The molecule is also engineered such that only the probe sequence is complementary to the genomic DNA that will be used in the assay.[3]

If the probe sequence of the molecular beacon encounters its target genomic DNA during the assay, it will anneal and hybridize. Because of the length of the probe sequence, the hairpin segment of the probe will be denatured in favour of forming a longer, more stable probe-target hybrid. This conformational change permits the fluorophore and quencher to be free of their tight proximity due to the hairpin association, allowing the molecule to fluoresce.

If, on the other hand, the probe sequence encounters a target sequence with as little as one non-complementary nucleotide, the molecular beacon will preferentially stay in its natural hairpin state and no fluorescence will be observed, as the fluorophore remains quenched.

The unique design of these molecular beacons allows for a simple diagnostic assay to identify SNPs at a given location. If one molecular beacon is designed to match a wild-type allele and another to match a mutant allele, the two can be used to identify the genotype of an individual. If only the first probe's fluorophore wavelength is detected during the assay, then the individual is homozygous for the wild type. If only the second probe's wavelength is detected, then the individual is homozygous for the mutant allele. Finally, if both wavelengths are detected, then both molecular beacons must be hybridizing to their complements and thus the individual must contain both alleles and be heterozygous.
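
The two-beacon decision rule above can be written as a short sketch. The function name, the normalized signal scale and the 0.5 detection threshold are hypothetical choices made for illustration; a real assay would calibrate the threshold against controls.

```python
def call_from_beacons(wild_type_signal, mutant_signal, threshold=0.5):
    """Call a genotype from the fluorescence of two allele-specific beacons.

    Signals are assumed to be normalized to the range 0..1; the threshold
    is an illustrative assumption.
    """
    wild_type_detected = wild_type_signal >= threshold
    mutant_detected = mutant_signal >= threshold
    if wild_type_detected and mutant_detected:
        return "heterozygous"
    if wild_type_detected:
        return "homozygous wild type"
    if mutant_detected:
        return "homozygous mutant"
    return "no call"  # neither beacon opened, e.g. failed amplification

print(call_from_beacons(0.9, 0.1))
print(call_from_beacons(0.8, 0.7))
```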

SNP microarrays

In high-density oligonucleotide SNP arrays, hundreds of thousands of probes are arrayed on a small chip, allowing for many SNPs to be interrogated simultaneously.[1] Because SNP alleles only differ in one nucleotide and because it is difficult to achieve optimal hybridization conditions for all probes on the array, the target DNA has the potential to hybridize to mismatched probes. This is addressed somewhat by using several redundant probes to interrogate each SNP. Probes are designed to have the SNP site in several different locations as well as containing mismatches to the SNP allele. By comparing the differential amount of hybridization of the target DNA to each of these redundant probes, it is possible to determine specific homozygous and heterozygous alleles.[1] Although oligonucleotide microarrays have a comparatively lower specificity and sensitivity, the scale of SNPs that can be interrogated is a major benefit. The Affymetrix Human SNP 5.0 GeneChip performs a genome-wide assay that can genotype over 500,000 human SNPs.[4][full citation needed]

Enzyme-based methods

A broad range of enzymes including DNA ligase, DNA polymerase and nucleases have been employed to generate high-fidelity SNP genotyping methods.

Restriction fragment length polymorphism

Restriction fragment length polymorphism (RFLP) is considered to be the simplest and earliest method to detect SNPs. SNP-RFLP makes use of the many different restriction endonucleases and their high affinity for unique and specific restriction sites. By performing a digestion on a genomic sample and determining fragment lengths through a gel assay, it is possible to ascertain whether or not the enzymes cut the expected restriction sites. A failure to cut the genomic sample results in an identifiably larger than expected fragment, implying that there is a mutation at the restriction site which protects it from nuclease activity.

The combined factors of the high complexity of most eukaryotic genomes, the requirement for specific endonucleases, the fact that the exact mutation cannot necessarily be resolved in a single experiment, and the slow nature of gel assays make RFLP a poor choice for high-throughput analysis.
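
The fragment-length logic of SNP-RFLP can be sketched as an in-silico digest. The example uses the EcoRI recognition site GAATTC, which is cut after the first G; the 56 bp sequences and the `digest` helper are made up for illustration and model only one strand.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting at every occurrence of `site`.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)  # remainder after the last cut
    return fragments

# Hypothetical 56 bp amplicons: the variant carries an A>G change inside the site.
reference = "A" * 20 + "GAATTC" + "C" * 30
variant = "A" * 20 + "GAGTTC" + "C" * 30

print(digest(reference))  # two fragments: the site is cut
print(digest(variant))    # one full-length fragment: the site is destroyed
```

On a gel, the uncut variant would appear as a single larger band, which is the signal the method relies on.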

PCR-based methods

Tetra-primer amplification refractory mutation system PCR, or ARMS-PCR, employs two pairs of primers to amplify two alleles in one PCR reaction. The primers are designed such that the two primer pairs overlap at a SNP location but each matches perfectly to only one of the possible SNP alleles. The basis of the invention is that, unexpectedly, oligonucleotides with a mismatched 3'-residue will not function as primers in the PCR under appropriate conditions.[5] As a result, if a given allele is present in the PCR reaction, the primer pair specific to that allele will produce a product, whereas the pair specific to the alternative allele will not. The two primer pairs are also designed such that their PCR products are of significantly different lengths, allowing for easily distinguishable bands by gel electrophoresis or melt temperature analysis.[6][7] In examining the results, if a genomic sample is homozygous, then the PCR products that result will be from the primer that matches the SNP location and the outer opposite-strand primer, as well as from the two outer primers. If the genomic sample is heterozygous, then products will result from the primer of each allele and their respective outer primer counterparts, as well as from the outer primers.
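
The band patterns described above can be tabulated with a small sketch. The allele labels and product sizes (180, 260 and 400 bp) are hypothetical; only the rule that the outer product always appears while each allele-specific product appears only if its allele is present comes from the text.

```python
# Hypothetical product sizes in base pairs for a C/T SNP.
ALLELE_PRODUCT_BP = {"C": 180, "T": 260}
OUTER_PRODUCT_BP = 400

def expected_bands(genotype):
    """Return the sorted band sizes expected on the gel for a genotype such as "CT"."""
    bands = {OUTER_PRODUCT_BP}  # the two outer primers amplify regardless of allele
    for allele in set(genotype):
        bands.add(ALLELE_PRODUCT_BP[allele])
    return sorted(bands)

def genotype_from_bands(observed_bands):
    """Match an observed band pattern to a genotype, or return None if nothing fits."""
    for genotype in ("CC", "CT", "TT"):
        if expected_bands(genotype) == sorted(observed_bands):
            return genotype
    return None

for genotype in ("CC", "CT", "TT"):
    print(genotype, expected_bands(genotype))
```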

An alternative strategy is to run multiple qPCR reactions with different primer sets that target each allele separately. Well-designed primers will amplify their target SNP at a much earlier cycle than the other SNPs. This allows more than two alleles to be distinguished, although an individual qPCR reaction is required for each SNP. To achieve high enough specificity, the primer sequence may require placement of an artificial mismatch near its 3'-end, which is an approach generally known as Taq-MAMA.[8]

Flap endonuclease

Flap endonuclease (FEN) is an endonuclease that catalyzes structure-specific cleavage. This cleavage is highly sensitive to mismatches and can be used to interrogate SNPs with a high degree of specificity.[9]

In the basic Invader assay, a FEN called cleavase is combined with two specific oligonucleotide probes that, together with the target DNA, can form a tripartite structure recognized by cleavase.[9] The first probe, called the Invader oligonucleotide, is complementary to the 3’ end of the target DNA. The last base of the Invader oligonucleotide is a non-matching base that overlaps the SNP nucleotide in the target DNA. The second probe is an allele-specific probe which is complementary to the 5’ end of the target DNA, but also extends past the 3’ side of the SNP nucleotide. The allele-specific probe will contain a base complementary to the SNP nucleotide. If the target DNA contains the desired allele, the Invader and allele-specific probes will bind to the target DNA, forming the tripartite structure. This structure is recognized by cleavase, which will cleave and release the 3’ end of the allele-specific probe. If the SNP nucleotide in the target DNA is not complementary to the allele-specific probe, the correct tripartite structure is not formed and no cleavage occurs. The Invader assay is usually coupled with a fluorescence resonance energy transfer (FRET) system to detect the cleavage event. In this setup, a quencher molecule is attached to the 3’ end and a fluorophore is attached to the 5’ end of the allele-specific probe. If cleavage occurs, the fluorophore will be separated from the quencher molecule, generating a detectable signal.[9]

Only minimal cleavage occurs with mismatched probes, making the Invader assay highly specific. However, in its original format, only one SNP allele could be interrogated per reaction sample and it required a large amount of target DNA to generate a detectable signal in a reasonable time frame.[9] Several developments have extended the original Invader assay. By carrying out secondary FEN cleavage reactions, the Serial Invasive Signal Amplification Reaction (SISAR) allows both SNP alleles to be interrogated in a single reaction. The SISAR Invader assay also requires less target DNA, improving the sensitivity of the original Invader assay.[9] The assay has also been adapted in several ways for use in a high-throughput format. In one platform, the allele-specific probes are anchored to microspheres. When cleavage by FEN generates a detectable fluorescent signal, the signal is measured using flow cytometry. The sensitivity of flow cytometry eliminates the need for PCR amplification of the target DNA.[10][full citation needed] These high-throughput platforms have not progressed beyond the proof-of-principle stage and so far the Invader system has not been used in any large-scale SNP genotyping projects.[9]

Primer extension

Primer extension is a two-step process that first involves the hybridization of a probe to the bases immediately upstream of the SNP nucleotide, followed by a ‘mini-sequencing’ reaction in which DNA polymerase extends the hybridized primer by adding a base that is complementary to the SNP nucleotide. This incorporated base is detected and determines the SNP allele.[11][12] Because primer extension is based on the highly accurate DNA polymerase enzyme, the method is generally very reliable. Primer extension is able to genotype most SNPs under very similar reaction conditions, making it also highly flexible. The primer extension method is used in a number of assay formats. These formats use a wide range of detection techniques that include MALDI-TOF mass spectrometry (see Sequenom) and ELISA-like methods.[1]

Generally, there are two main approaches, which use the incorporation of either fluorescently labeled dideoxynucleotides (ddNTP) or fluorescently labeled deoxynucleotides (dNTP). With ddNTPs, probes hybridize to the target DNA immediately upstream of the SNP nucleotide, and a single ddNTP complementary to the SNP allele is added to the 3’ end of the probe (the missing 3'-hydroxyl in the dideoxynucleotide prevents further nucleotides from being added). Each ddNTP is labeled with a different fluorescent signal, allowing for the detection of all four alleles in the same reaction. With dNTPs, allele-specific probes have 3’ bases which are complementary to each of the SNP alleles being interrogated. If the target DNA contains an allele complementary to the probe's 3’ base, the target DNA will completely hybridize to the probe, allowing DNA polymerase to extend from the 3’ end of the probe. This is detected by the incorporation of the fluorescently labeled dNTPs onto the end of the probe. If the target DNA does not contain an allele complementary to the probe's 3’ base, the target DNA will produce a mismatch at the 3’ end of the probe and DNA polymerase will not be able to extend from the 3' end of the probe. The benefit of the second approach is that several labeled dNTPs may get incorporated into the growing strand, allowing for increased signal. However, DNA polymerase can, in some rare cases, extend from mismatched 3’ probes, giving a false positive result.[1]
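
The ddNTP approach can be sketched in a few lines. The sketch assumes the template is written 5'→3', so that the primer (also written 5'→3') pairs with the reverse complement of its own sequence and is extended by the complement of the template base just 5' of the annealing site. The sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def single_base_extension(template, primer):
    """Return the single ddNTP added to the primer's 3' end, or None.

    Both sequences are written 5'->3'. The primer anneals where its reverse
    complement occurs in the template; the next template base read by the
    polymerase is the one immediately 5' of that site.
    """
    annealing_site = reverse_complement(primer)
    pos = template.find(annealing_site)
    if pos <= 0:  # primer does not anneal, or no template base left to copy
        return None
    return COMPLEMENT[template[pos - 1]]

# Hypothetical locus: 4 bases, the SNP, then the 10-base primer annealing site.
ANNEALING_SITE = "GCGTAAGCTA"
primer = reverse_complement(ANNEALING_SITE)
template_c = "TTGA" + "C" + ANNEALING_SITE
template_t = "TTGA" + "T" + ANNEALING_SITE

print(single_base_extension(template_c, primer))  # complement of C
print(single_base_extension(template_t, primer))  # complement of T
```

In the fluorescent format, each of the four possible ddNTPs would carry its own dye, so the returned base corresponds to the color that is read out.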

A different approach is used by Sequenom's iPLEX SNP genotyping method, which uses a MassARRAY mass spectrometer. Extension probes are designed in such a way that 40 different SNP assays can be amplified and analyzed in a PCR cocktail. The extension reaction uses ddNTPs as above, but the detection of the SNP allele is dependent on the actual mass of the extension product and not on a fluorescent molecule. This method is suited to low to medium-high throughput and is not intended for whole-genome scanning.

The flexibility and specificity of primer extension make it amenable to high-throughput analysis. Primer extension probes can be arrayed on slides, allowing for many SNPs to be genotyped at once. Broadly referred to as arrayed primer extension (APEX), this technology has several benefits over methods based on differential hybridization of probes. APEX methods have greater discriminating power than methods using differential hybridization, as it is often impossible to obtain the optimal hybridization conditions for the thousands of probes on DNA microarrays (usually this is addressed by having highly redundant probes). However, the same density of probes cannot be achieved in APEX methods, which translates into lower output per run.[1]

Illumina Incorporated's Infinium assay is an example of a whole-genome genotyping pipeline that is based on the primer extension method. In the Infinium assay, over 100,000 SNPs can be genotyped. The assay uses hapten-labelled nucleotides in a primer extension reaction. The hapten label is recognized by antibodies, which in turn are coupled to a detectable signal.[13]

APEX-2 is an arrayed primer extension genotyping method which is able to identify hundreds of SNPs or mutations in parallel using efficient homogeneous multiplex PCR (up to 640-plex) and four-color single-base extension on a microarray. The multiplex PCR requires two oligonucleotides per SNP/mutation generating amplicons that contain the tested base pair. The same oligonucleotides are used in the following step as immobilized single-base extension primers on a microarray (Krjutskov et al. 2008).

5’-nuclease

Taq DNA polymerase's 5’-nuclease activity is used in the TaqMan assay for SNP genotyping. The TaqMan assay is performed concurrently with a PCR reaction and the results can be read in real-time as the PCR reaction proceeds.[14] The assay requires forward and reverse PCR primers that will amplify a region that includes the SNP polymorphic site. Allele discrimination is achieved using FRET combined with one or two allele-specific probes that hybridize to the SNP polymorphic site. The probes will have a fluorophore linked to their 5’ end and a quencher molecule linked to their 3’ end. While the probe is intact, the quencher will remain in close proximity to the fluorophore, eliminating the fluorophore's signal. During the PCR amplification step, if the allele-specific probe is perfectly complementary to the SNP allele, it will bind to the target DNA strand and then get degraded by 5’-nuclease activity of the Taq polymerase as it extends the DNA from the PCR primers. The degradation of the probe results in the separation of the fluorophore from the quencher molecule, generating a detectable signal. If the allele-specific probe is not perfectly complementary, it will have lower melting temperature and not bind as efficiently. This prevents the nuclease from acting on the probe.[14]

Since the TaqMan assay is based on PCR, it is relatively simple to implement. The TaqMan assay can be multiplexed by combining the detection of up to seven SNPs in one reaction. However, since each SNP requires a distinct probe, the TaqMan assay is limited by how close together the SNPs can be situated. The scale of the assay can be drastically increased by performing many simultaneous reactions in microtitre plates. Generally, TaqMan is limited to applications that involve interrogating a small number of SNPs, since optimal probes and reaction conditions must be designed for each SNP.[11]

Oligonucleotide Ligation Assay

DNA ligase catalyzes the ligation of the 3' end of a DNA fragment to the 5' end of a directly adjacent DNA fragment. This mechanism can be used to interrogate a SNP by hybridizing two probes directly over the SNP polymorphic site, whereby ligation can occur if the probes are perfectly complementary to the target DNA. In the oligonucleotide ligation assay, two probes are designed: an allele-specific probe, which hybridizes to the target DNA so that its 3' base is situated directly over the SNP nucleotide, and a second probe that hybridizes to the template upstream (downstream in the complementary strand) of the SNP polymorphic site, providing a 5' end for the ligation reaction. If the allele-specific probe matches the target DNA, it will fully hybridize to the target DNA and ligation can occur. Ligation does not generally occur in the presence of a mismatched 3' base. Ligated or unligated products can be detected by gel electrophoresis, MALDI-TOF mass spectrometry or by capillary electrophoresis for large-scale applications.[1] With appropriate sequences and tags on the oligonucleotides, high-throughput sequence data can be generated from the ligated products and genotypes determined.[15][full citation needed] The use of large numbers of sample indexes allows high-throughput sequence data on hundreds of SNPs in thousands of samples to be generated in a small portion of a high-throughput sequencing run. This is known as massive genotyping by sequencing technology (MGST).[citation needed]
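
The ligation rule can be sketched as follows, under the simplifying assumption that ligation succeeds exactly when the 3' base of the allele-specific probe pairs with the SNP nucleotide. The target is written 5'→3' with the common probe's site immediately upstream of the SNP, as in the description above; the sequences and the `ligates` helper are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def ligates(target, allele_probe, common_probe):
    """Return True if the two probes can be joined on this target.

    All sequences are written 5'->3'. The common probe anneals upstream of
    the SNP on the target; the allele-specific probe's 3' base must pair with
    the SNP nucleotide for ligation (a deliberately simplified rule).
    """
    common_site = reverse_complement(common_probe)
    pos = target.find(common_site)
    if pos == -1:
        return False
    snp_index = pos + len(common_site)  # first target base after the common site
    if snp_index >= len(target):
        return False
    return COMPLEMENT[allele_probe[-1]] == target[snp_index]

# Hypothetical locus: common site, SNP, then the rest of the allele probe's site.
UPSTREAM, DOWNSTREAM = "ACGTTGCA", "GGATCCTT"
target_c = UPSTREAM + "C" + DOWNSTREAM
target_t = UPSTREAM + "T" + DOWNSTREAM
common_probe = reverse_complement(UPSTREAM)
probe_for_c = reverse_complement("C" + DOWNSTREAM)  # its 3' base pairs with C

print(ligates(target_c, probe_for_c, common_probe))
print(ligates(target_t, probe_for_c, common_probe))
```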

Other post-amplification methods based on physical properties of DNA

The characteristic DNA properties of melting temperature and single stranded conformation have been used in several applications to distinguish SNP alleles. These methods very often achieve high specificity but require highly optimized conditions to obtain the best possible results.

Single strand conformation polymorphism

Single-stranded DNA (ssDNA) folds into a tertiary structure. The conformation is sequence dependent and most single base pair mutations will alter the shape of the structure. When applied to a gel, the tertiary shape will determine the mobility of the ssDNA, providing a mechanism to differentiate between SNP alleles. This method first involves PCR amplification of the target DNA. The double-stranded PCR products are denatured using heat and formaldehyde to produce ssDNA. The ssDNA is applied to a non-denaturing electrophoresis gel and allowed to fold into a tertiary structure. Differences in DNA sequence will alter the tertiary conformation and be detected as a difference in the ssDNA strand mobility.[16] This method is widely used because it is technically simple, relatively inexpensive and uses commonly available equipment. However, compared to other SNP genotyping methods, the sensitivity of this assay is lower. The ssDNA conformation is highly dependent on temperature, and the ideal temperature is not generally apparent, so the assay is very often carried out at several different temperatures. There is also a restriction on the length of the fragment, because the sensitivity drops when sequences longer than 400 bp are used.[16]

Temperature gradient gel electrophoresis

The temperature gradient gel electrophoresis (TGGE) or temperature gradient capillary electrophoresis (TGCE) method is based on the principle that partially denatured DNA is more restricted and travels slower in a porous material such as a gel. This property allows for the separation of DNA by melting temperature. To adapt these methods for SNP detection, two fragments are used: the target DNA, which contains the SNP polymorphic site being interrogated, and an allele-specific DNA sequence, referred to as the normal DNA fragment. The normal fragment is identical to the target DNA except potentially at the SNP polymorphic site, which is unknown in the target DNA. The fragments are denatured and then reannealed. If the target DNA has the same allele as the normal fragment, homoduplexes will form that will have the same melting temperature. When run on the gel with a temperature gradient, only one band will appear. If the target DNA has a distinct allele, four products will form following the reannealing step: homoduplexes consisting of target DNA, homoduplexes consisting of normal DNA, and two heteroduplexes of each strand of target DNA hybridized with the normal DNA strand. These four products will have distinct melting temperatures and will appear as four bands in the denaturing gel.[1]
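
The count of one band versus four follows from pairing every sense strand with every antisense strand after reannealing. A minimal sketch, in which each strand is represented only by the allele it carries (a simplification; the function name is hypothetical):

```python
from itertools import product

def duplex_species(target_allele, normal_allele):
    """Return the distinct duplexes formed when target and normal fragments reanneal.

    Each duplex is a pair (allele on the sense strand, allele the antisense
    strand was copied from); pairs with two different alleles are heteroduplexes.
    """
    alleles = [target_allele, normal_allele]
    return {(sense, antisense) for sense, antisense in product(alleles, repeat=2)}

same = duplex_species("C", "C")
different = duplex_species("T", "C")
print(len(same), sorted(same))            # a single homoduplex species
print(len(different), sorted(different))  # two homoduplexes and two heteroduplexes
```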

Denaturing high performance liquid chromatography

Denaturing high performance liquid chromatography (DHPLC) uses reversed-phase HPLC to interrogate SNPs. The key to DHPLC is the solid phase, which has differential affinity for single- and double-stranded DNA. In DHPLC, DNA fragments are denatured by heating and then allowed to reanneal. The melting temperature of the reannealed DNA fragments determines the length of time they are retained in the column.[17] Using PCR, two fragments are generated: target DNA containing the SNP polymorphic site and an allele-specific DNA sequence, referred to as the normal DNA fragment. This normal fragment is identical to the target DNA except potentially at the SNP polymorphic site, which is unknown in the target DNA. The fragments are denatured and then allowed to gradually reanneal. The reannealed products are added to the DHPLC column. If the SNP allele in the target DNA matches the normal DNA fragment, only identical homoduplexes will form during the reannealing step. If the target DNA contains a different SNP allele than the normal DNA fragment, heteroduplexes of the target DNA and normal DNA containing a mismatched polymorphic site will form in addition to homoduplexes. The mismatched heteroduplexes will have a different melting temperature than the homoduplexes and will not be retained in the column as long. This generates a chromatogram pattern that is distinct from the pattern that would be generated if the target DNA fragment and normal DNA fragments were identical. The eluted DNA is detected by UV absorption.[17]

DHPLC is easily automated as no labeling or purification of the DNA fragments is needed. The method is also relatively fast and has a high specificity. One major drawback of DHPLC is that the column temperature must be optimized for each target in order to achieve the right degree of denaturation.[1]

High-resolution melting of the entire amplicon

High-resolution melting analysis is the simplest PCR-based method to understand. The same thermodynamic properties that underlie the gel techniques apply here, but are observed in real time: a fluorimeter monitors the post-PCR denaturation of the entire dsDNA amplicon. Primers are designed for the site to be amplified, and a double-strand-specific dye included in the PCR mix binds the PCR product so that, in essence, the entire amplicon becomes a probe. This opens up new possibilities for discovery. The primers are either positioned very close to either side of the SNP in question (small amplicon genotyping)[18] or used to amplify a larger region (100–400 bp in length) for scanning purposes. For simple genotyping of an SNP, it is easier to keep the amplicon small, which minimizes the chance of mistaking one SNP for another. The melting temperature (Tm) of the entire amplicon is determined, and most homozygotes differ sufficiently in Tm (on the better instruments) to be genotyped. Heterozygotes are even easier to differentiate because they generate heteroduplexes (refer to the gel-based explanations), which broaden the melt transition and usually give two discernible peaks. Amplicon melting using a fluorescently labeled primer has been described,[19] but is less practical than using ds-specific dyes because of the cost of the fluorogenic primer.

Scanning of larger amplicons is based on the same principles as outlined above. However, both the melting temperature and the overall shape of the melting curve become informative. Amplicons longer than about 150 bp often show more than two melting peaks, each of which can vary depending on the DNA template composition. Numerous investigators have been able to eliminate the majority of their sequencing through melt-based scanning, allowing accurate locus-based genotyping of large numbers of individuals.[20] Many investigators have found scanning for mutations using high-resolution melting to be a viable and practical way to study entire genes.

Use of DNA mismatch-binding proteins

DNA mismatch-binding proteins can distinguish single nucleotide mismatches and thus facilitate differential analysis of SNPs. For example, MutS protein from Thermus aquaticus binds different single nucleotide mismatches with different affinities and can be used in capillary electrophoresis to differentiate all six sets of mismatches.[21]

SNPlex

SNPlex is a proprietary genotyping platform sold by Applied Biosystems.

Surveyor nuclease assay

Surveyor nuclease is a mismatch endonuclease enzyme that recognizes all base substitutions and small insertions/deletions (indels), and cleaves the 3′ side of mismatched sites in both DNA strands.

Sequencing

Next-generation sequencing technologies such as pyrosequencing sequence fewer than 250 bases per read, which limits their ability to sequence whole genomes. However, their ability to generate results in real time and their potential to be massively scaled up make them a viable option for sequencing small regions to perform SNP genotyping. Compared to other SNP genotyping methods, sequencing is particularly suited to identifying multiple SNPs in a small region, such as the highly polymorphic major histocompatibility complex region of the genome.[1]

from Grokipedia
Single nucleotide polymorphism (SNP) genotyping is the process of determining the specific alleles present at positions in an individual's DNA where single nucleotide polymorphisms (variations involving a single nucleotide base) occur. SNPs represent the most common form of genetic variation among people, with each polymorphism consisting of a difference in a single DNA building block, such as replacing cytosine (C) with thymine (T), and occurring approximately once every 1,000 nucleotides on average. Over 1.2 billion SNPs have been identified across human populations (as of 2025), though only a subset are common (present in at least 1% of individuals), and they account for the majority of sequence differences between any two unrelated genomes. This genotyping technique enables the precise identification of an individual's genotype, whether homozygous or heterozygous, at these sites, serving as a foundational tool in genomics.

The importance of SNP genotyping stems from its role in linking genetic variations to phenotypic traits, disease risks, and therapeutic responses. SNPs act as biological markers to locate genes associated with complex diseases such as heart disease, diabetes, and cancer, facilitating genome-wide association studies (GWAS) that scan millions of SNPs to identify susceptibility loci. In pharmacogenomics, genotyping helps predict individual responses to drugs by revealing variants that influence drug metabolism or efficacy. Additionally, SNPs are used to trace ancestry and to map inheritance patterns within families.

A variety of technologies underpin SNP genotyping, broadly categorized by their principles of allele discrimination and detection. Common methods include PCR-based assays, which use allele-specific fluorescent probes during amplification to quantify signals in real time; microarray platforms, such as Illumina bead arrays, that hybridize target DNA to immobilized probes for high-throughput analysis of up to 2 million SNPs simultaneously; and next-generation sequencing (NGS), which generates comprehensive sequence data to call both known and novel SNPs through alignment to reference genomes. These approaches have evolved for greater accuracy, scalability, and cost-efficiency, with NGS enabling de novo variant discovery while PCR methods offer targeted, rapid genotyping for clinical applications. Advances continue to integrate multi-sample analysis and probabilistic modeling to enhance call confidence, achieving accuracies exceeding 99% in well-designed studies.
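
The idea of probabilistic genotype calling from sequencing reads can be illustrated with a generic binomial model. This is a textbook-style sketch, not the method of any particular variant caller: it assumes independent reads, a single per-read error rate (1% here, chosen arbitrarily) and a flat prior over the three genotypes.

```python
from math import comb

def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Binomial likelihood of the observed read counts under each diploid genotype."""
    total = ref_reads + alt_reads
    # Expected fraction of reads showing the alternate allele for each genotype.
    alt_fraction = {"ref/ref": error_rate, "ref/alt": 0.5, "alt/alt": 1.0 - error_rate}
    return {
        genotype: comb(total, alt_reads) * p ** alt_reads * (1.0 - p) ** ref_reads
        for genotype, p in alt_fraction.items()
    }

def call_from_reads(ref_reads, alt_reads, error_rate=0.01):
    """Return the most likely genotype and its share of the total likelihood."""
    likelihoods = genotype_likelihoods(ref_reads, alt_reads, error_rate)
    best = max(likelihoods, key=likelihoods.get)
    return best, likelihoods[best] / sum(likelihoods.values())

for counts in [(30, 0), (15, 14), (1, 29)]:
    print(counts, call_from_reads(*counts))
```

The second value returned can serve as a rough call confidence; low-coverage sites give values well below 1 and would typically be left uncalled.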

Fundamentals

Single Nucleotide Polymorphisms

Single nucleotide polymorphisms (SNPs) are defined as single base-pair variations in the DNA sequence where one nucleotide is substituted for another, occurring at a frequency of at least 1% in a population, distinguishing them from rare mutations. These variations can involve any of the four nucleotide bases (adenine, thymine, cytosine, or guanine) and are typically biallelic, meaning they result in two possible alleles at a given position. SNPs represent the most common form of genetic variation in the human genome, serving as stable markers due to their low mutation rates compared to other variant types.

SNPs are classified based on their location and functional impact relative to genes. In coding regions, they are categorized as synonymous, which do not alter the amino acid sequence due to the degeneracy of the genetic code, or non-synonymous, which change the amino acid and may affect protein function. Exonic SNPs occur within exons, while intronic SNPs are found in introns; additionally, SNPs can reside in regulatory elements such as promoters, potentially influencing gene expression by altering binding sites. Most SNPs (~90-95%) are located in non-coding regions, where their density is higher than in coding sequences, reflecting less selective pressure on these areas.

In the human genome, approximately 10 million common SNPs (with minor allele frequency >1%) have been identified, though comprehensive catalogs like the 1000 Genomes Project report over 84 million total SNPs across diverse populations. Evolutionarily, SNPs capture historical population bottlenecks, migrations, and admixture events that shape ancestry patterns. For cataloging, the dbSNP database assigns unique reference SNP identifiers (rsIDs), such as rs12345, to track and annotate these variants across studies and genomes.
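
The synonymous versus non-synonymous distinction for coding SNPs can be made concrete with a short sketch built on the standard genetic code. The `classify_coding_snp` helper and its argument layout are hypothetical; stop-codon changes are simply counted as non-synonymous here.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change within a codon (position is 0, 1 or 2)."""
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa

print(classify_coding_snp("GAA", 2, "G"))  # third-position change, same amino acid
print(classify_coding_snp("GAG", 1, "T"))  # second-position change, Glu to Val
```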

Importance of Genotyping

Single nucleotide polymorphisms (SNPs) play a pivotal role in disease association studies by identifying genetic variants linked to complex traits and disorders. For instance, the rs7903146 SNP in the TCF7L2 gene has been strongly associated with an increased risk of type 2 diabetes, with carriers of the risk allele showing up to a 1.4-fold higher susceptibility in large-scale genome-wide association studies (GWAS). Similarly, SNPs in the 8q24 region, such as rs6983267, confer elevated risk for multiple cancers, including prostate and colorectal types; the risk allele is present in approximately 50% of Europeans and contributes to tumor initiation through regulatory effects on nearby oncogenes like MYC. These associations have enabled the development of polygenic risk scores that predict disease susceptibility with improved accuracy, guiding preventive strategies in clinical settings.

In pharmacogenomics, SNP genotyping facilitates personalized medicine by revealing how genetic variants influence drug response and metabolism. A prominent example is the CYP2D6 gene, where poor metabolizer alleles (e.g., *4) impair the conversion of codeine to its active metabolite morphine, resulting in reduced efficacy and up to 50% of patients experiencing inadequate pain relief. Conversely, ultrarapid metabolizers with duplicated functional alleles can produce excessive morphine, leading to toxicity risks such as respiratory depression, as demonstrated in clinical cases where genotyping predicted adverse outcomes. This has prompted regulatory guidelines, including FDA warnings, to avoid codeine in patients with altered CYP2D6 metabolizer status, thereby enhancing treatment safety and efficacy across diverse populations.

SNP genotyping is also essential for tracing ancestry and kinship. Ancestry informative markers (AIMs), such as those identified through analysis of SNP data, allow differentiation of continental origins with over 99% accuracy using as few as 3,000 SNPs, enabling reconstruction of migration history.
In evolutionary studies, dense SNP panels have mapped patterns of human genetic variation, revealing continuous genetic gradients across geographic regions that reflect ancient dispersals from Africa around 60,000 years ago. These applications extend to conservation assessments, where SNP-based measures of genetic variation inform conservation priorities.

Beyond human applications, SNP genotyping drives advancements in agriculture and evolutionary biology. In breeding programs, SNPs enable genomic selection in crops and livestock, accelerating trait improvement for yield and disease resistance by 20-50% compared to traditional methods, with over 50,000 SNPs routinely used in commercial panels. Evolutionarily, SNPs illuminate migration and admixture, for example by tracking Neanderthal introgression in modern humans. Economically, the global SNP genotyping market, fueled by these applications, is estimated at approximately USD 30.01 billion in 2025.

Sample Preparation and Amplification

DNA Isolation

DNA isolation is the foundational step in SNP genotyping workflows, providing high-quality genomic DNA free from contaminants that could interfere with downstream enzymatic reactions such as PCR amplification. This process involves lysing cells to release DNA, followed by purification to remove proteins, lipids, and other impurities, ensuring sufficient yield and integrity for reliable genotyping. Common challenges include achieving adequate DNA quantity from limited samples and maintaining purity to prevent inhibition of amplification enzymes.

Traditional phenol-chloroform extraction remains a widely used method for isolating high-molecular-weight DNA, particularly from blood and tissue samples. In this technique, cells are lysed with detergents and proteinase K, followed by organic extraction using phenol-chloroform-isoamyl alcohol to partition DNA into the aqueous phase, with subsequent ethanol precipitation for recovery. It offers high yields of intact DNA but requires careful handling due to the hazardous nature of the reagents.

Silica-based column purification, exemplified by kits like QIAGEN's DNeasy, provides a safer alternative by binding DNA to a silica membrane under chaotropic salt conditions (e.g., guanidinium salts), allowing wash steps to remove contaminants before elution in low-salt buffer. These kits are favored for their speed and consistency, yielding DNA of sufficient length (>10 kb) for PCR-based assays.

Magnetic bead-based purification has gained prominence for its compatibility with automation in high-throughput genotyping projects. DNA binds to carboxylated magnetic particles in the presence of chaotropic salts, enabling magnetic separation, washing, and elution without centrifugation. Systems like those using AMPure XP beads or similar formulations efficiently recover DNA fragments across a broad size range, minimizing shearing and supporting multiplexed SNP analysis.
DNA isolation methods must be adapted to diverse sample types, including blood, saliva, fresh tissue, and formalin-fixed paraffin-embedded (FFPE) tissues, each presenting unique challenges in yield and purity. Whole blood is the most common source, with DNA recovered from nucleated white blood cells, typically yielding 4-10 μg of DNA from 200 μL of blood using silica or magnetic methods. Saliva provides a non-invasive alternative but often requires stabilization buffers to prevent degradation; yields are typically 10-60 μg per mL, though highly variable, with potential carryover of bacterial DNA. Tissue samples demand mechanical homogenization or enzymatic digestion prior to extraction, while FFPE samples necessitate deparaffinization and cross-link reversal to recover fragmented DNA (typically 200 bp-5 kb), which is suitable for targeted genotyping but comes with lower yields (0.1-5 μg per section).

Quality assessment is critical, with spectrophotometric measurement of the A260/A280 ratio serving as a primary indicator of purity; a value of approximately 1.8 signifies minimal protein contamination, essential for efficient PCR in genotyping. Contaminants such as proteins, salts, or residual phenol can inhibit Taq polymerase, leading to failed amplifications, while humic acids or heme from certain samples may require additional cleanup. Fluorometric quantification (e.g., PicoGreen) complements absorbance readings to accurately determine double-stranded DNA concentration, ensuring at least 10-50 ng/μL for standard genotyping protocols.

For large-scale SNP genotyping studies involving thousands of samples, automated systems like the Thermo Scientific KingFisher enhance efficiency and reproducibility. These instruments use magnetic rod or particle separation technology to process up to 96 samples per run, integrating lysis, binding, and elution in under 30 minutes with minimal hands-on time. Such automation reduces variability and supports integration with downstream PCR workflows, where the isolated DNA serves directly as template material.
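The absorbance-based quality check described above can be sketched in Python. This is a minimal illustration, assuming the common conversion of one A260 unit to about 50 ng/μL of double-stranded DNA at a 1 cm path length. The function name `assess_dna_sample`, the acceptable purity window of 1.7 to 2.0, and the default minimum concentration are assumptions made for this example, not values prescribed by any kit.

```python
def assess_dna_sample(a260, a280, dilution_factor=1.0,
                      min_conc_ng_per_ul=10.0, purity_range=(1.7, 2.0)):
    """Estimate dsDNA concentration and purity from absorbance readings.

    Assumes 1.0 A260 unit ~ 50 ng/uL dsDNA (1 cm path length).
    purity_range and min_conc_ng_per_ul are illustrative thresholds.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    concentration = a260 * 50.0 * dilution_factor  # ng/uL
    ratio = a260 / a280
    low, high = purity_range
    return {
        "concentration_ng_per_ul": concentration,
        "a260_a280": ratio,
        "pure": low <= ratio <= high,
        "sufficient": concentration >= min_conc_ng_per_ul,
    }


# A clean sample and a dilute, protein-contaminated one.
print(assess_dna_sample(a260=0.50, a280=0.27))
print(assess_dna_sample(a260=0.10, a280=0.08))
```

Because absorbance also picks up RNA and free nucleotides, a fluorometric reading remains the better estimate of usable double-stranded DNA.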

PCR Amplification Techniques

Polymerase chain reaction (PCR) is a fundamental technique for amplifying specific DNA regions containing single nucleotide polymorphisms (SNPs) prior to genotyping detection, enabling the enrichment of target sequences from genomic DNA samples. Standard PCR relies on key components including thermostable Taq DNA polymerase, deoxynucleotide triphosphates (dNTPs), and oligonucleotide primers flanking the SNP locus, typically in a reaction buffer optimized for enzymatic activity. The process involves thermal cycling: denaturation at approximately 95°C to separate DNA strands, annealing at 50–60°C for primer binding, and extension at 72°C where Taq polymerase synthesizes new strands, repeated for 30–40 cycles to achieve exponential amplification. These steps, performed in a thermal cycler, generate amplicons of defined length (often 100–500 base pairs) suitable for downstream SNP analysis.

Variants of standard PCR address limitations in specificity and throughput for SNP applications. Hot-start PCR incorporates a modified polymerase that activates only at high temperatures (above 70°C), minimizing non-specific primer annealing and primer-dimer formation during setup, which is particularly beneficial for low-abundance SNP targets in complex genomes. Multiplex PCR extends this by using multiple primer pairs in a single reaction to amplify several SNP loci simultaneously, reducing time and reagent costs; for example, it has been applied to genotype up to 50 SNPs in crop species like grapevines using optimized buffer conditions.

Quantitative evaluation of PCR performance is essential for reliable SNP amplification, with cycle threshold (Ct) values indicating the cycle at which fluorescence (in real-time formats) exceeds a baseline, typically ranging from 15–35 for efficient reactions.
Amplification efficiency (E), calculated from standard curves as $E = 10^{-1/\text{slope}}$, should approach 2.0 (100%) for ideal doubling per cycle, where the slope derives from plotting Ct against log template concentration and percent efficiency is $(E - 1) \times 100$; values below about 1.9 (90%) signal issues like inhibitor presence or suboptimal primers.

SNP-specific challenges in PCR primarily stem from primer design, as mismatches at the SNP site or nearby variants can introduce bias, leading to preferential amplification of one allele over another. To mitigate this, primers are positioned to avoid the SNP locus, with thermodynamic optimization ensuring similar melting temperatures (Tm ~60°C) and minimal secondary structures; tools like FastPCR aid in selecting sequences that enhance specificity without 3'-end mismatches. Post-2000 advances integrated real-time PCR with SNP amplification, allowing in situ detection via fluorescent probes during cycling, which improved sensitivity for low-frequency variants and enabled allelic discrimination in multiplex formats.
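The efficiency calculation from a standard curve can be sketched in Python as follows. This is a minimal illustration under the assumption of a simple least-squares fit of Ct against log10 template amount; the function names `fit_line` and `pcr_efficiency` and the dilution-series values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pcr_efficiency(template_amounts, ct_values):
    """Return (slope, E, percent efficiency) from a dilution series.

    E = 10 ** (-1 / slope), with slope taken from Ct versus
    log10(template amount); percent efficiency is (E - 1) * 100.
    """
    if len(template_amounts) != len(ct_values) or len(ct_values) < 2:
        raise ValueError("need at least two matched standard-curve points")
    log_amounts = [math.log10(a) for a in template_amounts]
    slope, _intercept = fit_line(log_amounts, ct_values)
    efficiency = 10 ** (-1.0 / slope)
    return slope, efficiency, (efficiency - 1.0) * 100.0


# Illustrative 10-fold dilution series (copies per reaction) and Ct values.
amounts = [1e5, 1e4, 1e3, 1e2]
cts = [18.0, 21.6, 25.2, 28.8]  # 3.6 cycles per 10-fold dilution
slope, efficiency, percent = pcr_efficiency(amounts, cts)
print(f"slope={slope:.2f}  E={efficiency:.3f}  efficiency={percent:.1f}%")
```

A slope near -3.32 corresponds to perfect doubling; the steeper slope of -3.6 in this made-up series gives an efficiency just under 90%.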

Hybridization-Based Methods

Allele-Specific Hybridization

Allele-specific hybridization (ASH) exploits the differential stability of DNA duplexes formed by oligonucleotide probes designed to match one allele of a single nucleotide polymorphism (SNP) perfectly while introducing a mismatch with the alternative allele. This mismatch, typically positioned near the center of the probe, imposes a thermodynamic penalty that destabilizes the hybrid, enabling discrimination between alleles through variations in hybridization efficiency or melting temperature. The method's core principle relies on the subtle energetic differences caused by the single-base mismatch, which can be amplified under controlled stringency conditions to achieve reliable SNP detection without enzymatic steps.

A key variant of ASH is dynamic allele-specific hybridization (DASH), which monitors hybridization in real time by incrementally raising the temperature of a reaction mixture containing biotinylated PCR-amplified target DNA immobilized on a solid support and fluorophore-labeled allele-specific probes. In DASH, the melting temperature (Tm) of the perfectly matched probe-target duplex is higher than that of the mismatched one, producing distinct fluorescence decay curves that allow unambiguous genotyping; for instance, homozygous samples show a single sharp transition, while heterozygotes exhibit two. This temperature-controlled approach enhances specificity by exploiting the full dynamic range of duplex stability, making it suitable for both low- and medium-throughput applications.

The specificity of ASH probes is modulated by probe length, generally 15-25 nucleotides to optimize affinity while maintaining mismatch discrimination; GC content, which influences overall duplex Tm and mismatch penalty; and hybridization buffer salt concentrations, where elevated levels (e.g., 0.5-1 M NaCl) stabilize hybrids and necessitate higher temperatures for stringency.
Shorter probes with balanced GC content (40-60%) often yield the best discrimination, as excessive length or high GC content can mask the mismatch effect, while low salt increases stringency but risks reducing overall signal.

Detection in ASH, particularly DASH, frequently incorporates fluorescence resonance energy transfer (FRET) for sensitive, real-time readout, where a donor on the probe transfers energy to an acceptor upon stable hybridization, with signal loss indicating dissociation during melting. Improved configurations, such as iFRET, further refine this by using asymmetric labeling to minimize background and enhance curve resolution for accurate allele calling.

ASH was among the foundational techniques for low-throughput SNP genotyping, with initial implementations using high-affinity probes like peptide nucleic acids (PNAs) to achieve single-base discrimination in dot-blot formats. These early assays laid the groundwork for subsequent hybridization-based platforms, including adaptations for scaled analysis.

Molecular Beacons

Molecular beacons are hairpin-shaped oligonucleotide probes designed for the detection of specific sequences, including single nucleotide polymorphisms (SNPs), through fluorescence-based signaling. Developed in the mid-1990s by Sanjay Tyagi and Fred Russell Kramer, these probes consist of a loop region containing the target-complementary sequence flanked by a stem formed by complementary self-hybridizing arms; one arm end bears a fluorophore, while the other has a quencher molecule that suppresses fluorescence in the closed conformation. Upon binding to a complementary target sequence, the hairpin opens, separating the fluorophore and quencher, which restores fluorescence emission; a mismatch at the SNP site destabilizes the probe-target hybrid due to reduced binding affinity, minimizing signal generation and enabling allele discrimination. This mechanism relies on fluorescence resonance energy transfer (FRET) for quenching in the intact hairpin, with typical quenching efficiencies ranging from 85% to 97%, though variations can occur depending on the fluorophore-quencher pair.

In SNP genotyping, molecular beacons are integrated into real-time polymerase chain reaction (PCR) assays to monitor amplification and allele-specific hybridization in a homogeneous format, allowing simultaneous detection of multiple SNPs through probes with distinct fluorophores. They enable reliable genotyping by distinguishing perfect matches from single-base mismatches during thermal cycling. Key advantages include the elimination of washing steps required in heterogeneous assays, facilitating high-throughput and real-time analysis, as well as enhanced specificity for single-base differences compared to linear probes like those in TaqMan assays. However, limitations arise from incomplete quenching, which can elevate background fluorescence, and potential unintended self-hybridization or stem-loop instability that may affect probe performance under varying ionic conditions.

SNP Microarrays

SNP microarrays are high-throughput platforms that enable the simultaneous genotyping of hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) by hybridizing fragmented genomic DNA to immobilized probes on a solid surface. These arrays facilitate genome-wide association studies and large-scale genetic analyses by providing dense coverage of genetic variation across the human genome. Developed in the early 2000s, SNP microarrays have evolved from targeted panels to comprehensive whole-genome solutions, leveraging advances in probe design and signal detection to achieve high accuracy and reproducibility.

Key commercial platforms include the Affymetrix GeneChip and the Illumina BeadChip systems. The GeneChip employs a ligation-mediated approach, in which genomic DNA is first digested with restriction enzymes to generate fragments, followed by adaptor ligation to enable PCR amplification and labeling of the targets. These labeled fragments then hybridize to an array of oligonucleotide probes, with each SNP represented by multiple perfect match (PM) probes that fully complement one allele and mismatch (MM) probes that differ by a single central base to account for non-specific binding. After hybridization and washing, the array is scanned to measure fluorescence intensities from bound targets.

In contrast, the Illumina BeadChip utilizes an allele-specific extension method integrated into the Infinium assay. Genomic DNA is whole-genome amplified without locus-specific PCR, fragmented enzymatically, and hybridized to beads attached to a substrate, where each bead type carries a locus-specific 50-mer oligonucleotide probe ending one base before the SNP site. During a single-base extension step, DNA polymerase incorporates a fluorescently labeled nucleotide specific to the allele at the SNP position, followed by staining and scanning to capture red (for one allele) and green (for the other) channel intensities.
This process supports multi-sample processing on BeadChips holding up to 12 or 24 samples per array.

Both platforms output raw fluorescence intensities that are processed to generate genotype calls. For Affymetrix arrays, intensities from PM and MM probes are contrasted to compute relative allele signals and assign genotypes (e.g., AA, AB, BB), while Illumina data yield normalized intensity ratios (often denoted as θ or B-allele frequency) from the two color channels to cluster samples into homozygous (AA or BB) or heterozygous (AB) genotypes. Advanced algorithms, such as Birdseed for Affymetrix or GenomeStudio clustering for Illumina, refine these calls by modeling expected intensity distributions.

Modern SNP microarrays interrogate over 1 million SNPs per sample, enabling near-complete coverage of common genetic variants. Early first-generation arrays in the 2000s, such as the Affymetrix 10K Mapping Array (2005) and Illumina Human-1 (100K SNPs), focused on candidate regions but rapidly scaled to whole-genome coverage with the Affymetrix Genome-Wide Human SNP Array 6.0 (906K SNPs, 2007) and Illumina Human1M-Duo (1.2 million SNPs). These advances have supported population-scale genotyping projects, with call rates exceeding 99% and accuracy comparable to direct sequencing for validated SNPs.

Data analysis involves normalization to correct for technical biases like dye effects or probe affinity variations. For Illumina BeadChips, quantile normalization (QN) aligns intensity distributions across samples and loci, often enhanced with thresholding (tQN) to stabilize estimates and improve copy number and allelic ratio accuracy, with reported standard deviation reductions of 15-26%. Affymetrix data typically use robust multi-array average (RMA)-like methods, incorporating background correction, quantile normalization of probes, and summarization via median polish to generate reliable relative allele difference (RAD) scores for genotype calling. These steps ensure robust downstream applications, such as imputation and association testing.
Cost reductions have made SNP microarrays accessible for large cohorts, with prices dropping from thousands of dollars per sample in the early 2000s to under $100 by 2025, aided by automated workflows. This affordability, combined with high throughput (e.g., processing 96-384 samples simultaneously), has democratized genome-wide genotyping for research and clinical use.
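How two-channel intensities become genotype calls can be sketched in Python. The polar transform below (θ as a normalized angle, R as summed intensity) follows a convention commonly described for two-color arrays, but the fixed thresholds are assumptions for illustration only. Production software fits clusters per SNP rather than applying global cutoffs, and the function names here are hypothetical.

```python
import math


def polar_transform(x, y):
    """Convert normalized A-channel (x) and B-channel (y) intensities
    into theta (0 = pure A signal, 1 = pure B signal) and R (total)."""
    r = x + y
    theta = (2.0 / math.pi) * math.atan2(y, x)
    return theta, r


def call_genotype(x, y, min_r=0.2, aa_max=0.2, ab_range=(0.35, 0.65),
                  bb_min=0.8):
    """Assign AA, AB, BB, or NC (no call) from fixed, illustrative cutoffs."""
    theta, r = polar_transform(x, y)
    if r < min_r:
        return "NC"  # total signal too weak to trust
    if theta <= aa_max:
        return "AA"
    if ab_range[0] <= theta <= ab_range[1]:
        return "AB"
    if theta >= bb_min:
        return "BB"
    return "NC"  # falls between clusters


samples = {"s1": (1.00, 0.05), "s2": (0.50, 0.50),
           "s3": (0.04, 0.90), "s4": (0.05, 0.05), "s5": (0.70, 0.30)}
for name, (x, y) in samples.items():
    print(name, call_genotype(x, y))
```

Samples that land between clusters or have weak total signal are left uncalled, which is how call rate and accuracy trade off in practice.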

Enzyme-Based Methods

Restriction Fragment Length Polymorphism

Restriction Fragment Length Polymorphism (RFLP) is an enzyme-based method for SNP genotyping that exploits variations in DNA sequence to produce distinguishable restriction fragments. A single nucleotide polymorphism can create or abolish a recognition site for a specific restriction endonuclease, resulting in different patterns of DNA fragments after enzymatic digestion. This technique relies on the precise cutting of DNA at these sites, where the presence of one allele allows cleavage while the other does not, leading to fragments of varying lengths that can be separated and detected.

The protocol for RFLP-based SNP genotyping typically begins with PCR amplification of the genomic region flanking the SNP to generate sufficient target DNA. The amplified product is then incubated with a restriction enzyme whose recognition sequence is allele-specific, such as HaeIII for SNPs that introduce or remove its GGCC site. Following digestion, the resulting fragments are separated by size using agarose or polyacrylamide gel electrophoresis, allowing visualization of allele-specific band patterns. For detection, fragments are commonly stained with ethidium bromide and imaged under ultraviolet light, though fluorescent labeling can enhance sensitivity in automated systems.

Despite its simplicity and low cost, RFLP has notable limitations for SNP genotyping. It is only feasible for the subset of SNPs that alter restriction sites, excluding the majority of polymorphisms and requiring careful enzyme selection. Additionally, the method is labor-intensive, involving multiple manual steps like digestion and electrophoresis, which limits its throughput compared to modern high-density approaches.

Historically, RFLP emerged as one of the earliest DNA marker techniques, predating PCR and enabling the first genetic linkage maps through polymorphic markers. Seminal work by Botstein et al. in 1980 proposed using RFLPs to construct a genetic linkage map of the human genome, laying the foundation for positional cloning and disease gene identification.
With the advent of PCR in the mid-1980s, PCR-RFLP became a standard for targeted SNP analysis, though it has largely been supplanted by more scalable methods.
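The band patterns expected from a PCR-RFLP assay can be predicted with a short in-silico digest. The Python sketch below assumes HaeIII, which recognizes GGCC and cuts between the second G and the first C, leaving blunt ends. The amplicon sequences are made up for illustration, and the function names `digest` and `expected_bands` are hypothetical.

```python
def digest(amplicon, site="GGCC", cut_offset=2):
    """Return fragment lengths (bp) after cutting at every occurrence
    of `site`, `cut_offset` bases into the recognition sequence."""
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


def expected_bands(allele_amplicons):
    """Union of fragment sizes expected on a gel for a genotype,
    given the amplicon sequence of each allele carried."""
    bands = set()
    for sequence in allele_amplicons:
        bands.update(digest(sequence))
    return sorted(bands)


# Hypothetical 20 bp amplicons: the G allele creates a GGCC site,
# the A allele destroys it.
allele_g = "ATATATGGCCATATATATAT"
allele_a = "ATATATGACCATATATATAT"

print("G/G:", expected_bands([allele_g, allele_g]))
print("A/A:", expected_bands([allele_a, allele_a]))
print("G/A:", expected_bands([allele_g, allele_a]))
```

A heterozygote shows the uncut band together with both cut fragments, which is why incomplete digestion of a homozygous cut sample can be misread as heterozygous.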

Allele-Specific PCR

Allele-specific PCR (AS-PCR) is a genotyping technique that exploits the specificity of PCR primers designed to anneal preferentially to one allele of a single nucleotide polymorphism (SNP), thereby amplifying only the target variant while minimizing non-specific amplification of the other allele. This method relies on placing the 3' terminal base of the primer at the SNP site, so that it matches the target allele but mismatches the other allele, which prevents efficient extension by the DNA polymerase when the primer binds to the non-target allele. To enhance discrimination, an additional destabilizing mismatch is often introduced one or two bases upstream from the 3' end, further reducing primer annealing to the non-target allele under optimized PCR conditions.

Key variants of AS-PCR include the amplification refractory mutation system (ARMS-PCR), originally developed for detecting point mutations, and primer-induced restriction analysis (PIRA-PCR). In ARMS-PCR, allele-specific primers are used in separate reactions or, in the tetra-primer format, co-amplified in a single reaction with inner allele-specific primers and outer common primers to produce distinct amplicon sizes for each allele. PIRA-PCR modifies the approach by introducing a primer mismatch that creates an artificial restriction enzyme recognition site in the amplicon of one allele but not the other, allowing post-amplification differentiation via restriction digestion. Both variants enable reliable SNP genotyping with high specificity for biallelic variants.

Detection of AS-PCR products typically involves gel electrophoresis to visualize the presence or absence of amplicons specific to each allele, providing a straightforward qualitative assessment of genotype. For quantitative analysis, real-time PCR adaptations incorporate fluorescent probes or intercalating dyes to monitor allele-specific amplification in real time, enabling higher throughput and reduced hands-on time without post-PCR processing.
These detection strategies make AS-PCR suitable for both low- and medium-throughput applications.

AS-PCR offers several advantages, including simplicity in setup and execution, low cost due to reliance on standard PCR reagents and equipment, and the potential for multiplexing a limited number of SNPs in a single reaction, such as through tetra-primer ARMS formats. It is particularly effective for targeted genotyping in resource-limited settings and has been widely adopted for validating SNPs in clinical and research contexts. A 2025 comparative study of PCR-based methods for the challenging class IV T-to-A SNP rs9939609 in the FTO gene demonstrated that both ARMS-PCR and PIRA-PCR achieved over 99% accuracy across diverse sample sets, outperforming some alternatives in specificity for difficult variants.
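Reading a tetra-primer ARMS gel amounts to checking which expected band sizes are present. The Python sketch below illustrates that logic under stated assumptions: the band sizes (a 400 bp outer control product and 250 bp and 190 bp allele-specific products), the 10 bp sizing tolerance, and the function name `call_arms_genotype` are all hypothetical values chosen for this example.

```python
def call_arms_genotype(observed_bp, control_bp, allele_bands, tolerance_bp=10):
    """Call a biallelic genotype from observed band sizes.

    observed_bp  -- list of band sizes estimated from the gel
    control_bp   -- expected size of the outer-primer control product
    allele_bands -- dict mapping allele name to its expected product size
    Returns e.g. "T/T", "A/T", "no-call", or "failed".
    """
    def present(expected):
        return any(abs(band - expected) <= tolerance_bp for band in observed_bp)

    if not present(control_bp):
        return "failed"  # no control product: the PCR itself did not work
    detected = sorted(a for a, size in allele_bands.items() if present(size))
    if not detected:
        return "no-call"
    if len(detected) == 1:
        detected = detected * 2  # a single allele band implies a homozygote
    return "/".join(detected)


# Hypothetical assay design for a T/A SNP.
bands = {"T": 250, "A": 190}
print(call_arms_genotype([402, 248], 400, bands))
print(call_arms_genotype([398, 251, 189], 400, bands))
print(call_arms_genotype([250], 400, bands))
```

Requiring the control band before interpreting allele bands guards against calling a failed reaction as a homozygote.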

Flap Endonuclease

The flap endonuclease (FEN) method for SNP genotyping, commercialized as the Invader assay, relies on the structure-specific cleavage activity of FEN enzymes to discriminate single nucleotide polymorphisms (SNPs) in target DNA. The assay involves two overlapping oligonucleotides: an invader probe that hybridizes upstream of the SNP site and an allele-specific probe that hybridizes across the SNP, forming a three-dimensional "flap" structure only when the probe perfectly matches the target allele. The FEN enzyme, such as Cleavase derived from Thermus aquaticus DNA polymerase, recognizes this overlapping flap and cleaves the 5' arm of the allele-specific probe at the junction point, releasing a short oligonucleotide flap. This cleavage is highly specific, occurring with minimal mismatch tolerance due to the enzyme's preference for perfectly matched structures, akin to but distinct from the 5' nuclease activity used in other assays. The released flap then hybridizes to a separate fluorescent reporter probe containing a fluorescence resonance energy transfer (FRET) cassette, triggering a secondary cleavage by FEN that separates a fluorophore from a quencher, generating a detectable fluorescent signal proportional to the allele present. This signal amplification is linear and isothermal, avoiding the exponential amplification of PCR and thus eliminating biases from preferential amplification of alleles. Detection can be performed in real-time or endpoint formats using standard fluorescence readers, with sensitivity down to femtomolar levels of target DNA. The method was pioneered through the discovery of FEN's cleavage properties in the early 1990s and adapted for genotyping by Third Wave Technologies, which commercialized the Invader assay in the late 1990s. 
A key advantage of the Invader assay is its multiplexing capability, allowing genotyping of up to 100 SNPs in a single reaction through pre-amplification of target regions via multiplex PCR followed by parallel invasive cleavages, each with distinct fluorescent labels. This approach supports high-throughput applications, such as genome-wide association studies, with reported accuracy exceeding 99% and low failure rates under optimized conditions. The assay's reliance on enzymatic cleavage rather than hybridization stability alone enhances specificity for SNPs in complex genomic backgrounds.

Primer Extension

Primer extension, also known as minisequencing or single-nucleotide primer extension (SNuPE), is an enzymatic method for SNP genotyping that involves the extension of a primer annealed immediately upstream of the polymorphic site using DNA polymerase and allele-specific dideoxynucleoside triphosphates (ddNTPs). This technique, pioneered in the late 1990s, allows for the incorporation of a single labeled nucleotide complementary to the SNP allele, terminating extension and enabling allele discrimination. Common implementations include SNaPshot, a commercial single-base extension kit, and broader minisequencing protocols adaptable to various formats.

The protocol typically begins with PCR amplification of the target region containing the SNP to generate a template, followed by purification to remove unincorporated primers and dNTPs. An unlabeled primer is then annealed to the single-stranded PCR product immediately adjacent to the SNP site, and a DNA polymerase, such as Thermo Sequenase, extends the primer by adding one fluorescently labeled ddNTP (ddATP, ddCTP, ddGTP, or ddTTP) specific to the allele present, with a different dye color assigned to each base. The reaction is cycled (e.g., 26 cycles at optimized Mg²⁺ concentrations) to enhance yield, and unincorporated ddNTPs are removed via enzymatic digestion or cleanup. Detection is achieved through capillary electrophoresis on instruments like the 3730xl Genetic Analyzer, where extension products are separated by size and dye color, allowing simultaneous analysis of up to 10 SNPs per reaction with software like GeneMapper for genotype calling.

This method offers high accuracy for heterozygous genotype calls, with error rates as low as 0.003, as validated against array-based platforms, due to the direct incorporation of allele-specific signals. It also effectively handles SNPs near insertions/deletions (indels), genotyping up to 87% of such variants successfully, as the upstream primer annealing site can be designed to avoid affected regions.
Automated through commercial kits and instrumentation, primer extension supports high-throughput processing, yielding up to 42,000 genotypes per day. Applications include targeted SNP panels and clinical diagnostics, such as multiplex assays for mutations in genes like PAH in phenylketonuria, achieving 100% detection rates in targeted populations.
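The relationship between primer placement and the ddNTP that gets incorporated can be sketched in Python. This is a minimal illustration assuming the reference sequence is given 5' to 3' on the forward strand with a 0-based SNP index. The function names, the example sequence, and the 5-base primer length are hypothetical (real extension primers are longer).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def design_extension_primer(forward_seq, snp_index, length=20, strand="+"):
    """Return a primer whose 3' end sits immediately next to the SNP.

    strand "+" takes the bases just upstream of the SNP on the forward
    strand; strand "-" takes the reverse complement of the bases just
    downstream, so the primer extends toward the SNP from the other side.
    """
    if strand == "+":
        start = snp_index - length
        if start < 0:
            raise ValueError("not enough upstream sequence")
        return forward_seq[start:snp_index]
    end = snp_index + 1 + length
    if end > len(forward_seq):
        raise ValueError("not enough downstream sequence")
    return reverse_complement(forward_seq[snp_index + 1:end])


def incorporated_ddntps(forward_alleles, strand="+"):
    """ddNTPs expected for a genotype given as forward-strand alleles.

    A forward primer incorporates the forward-strand allele itself;
    a reverse primer incorporates its complement.
    """
    if strand == "+":
        bases = forward_alleles
    else:
        bases = [a.translate(COMPLEMENT) for a in forward_alleles]
    return sorted({f"dd{base}TP" for base in bases})


seq = "GATTACAGGCTTAACGT"  # hypothetical locus, SNP (G/A) at index 8
print(design_extension_primer(seq, 8, length=5, strand="+"))
print(design_extension_primer(seq, 8, length=5, strand="-"))
print(incorporated_ddntps(["G", "A"], strand="+"))
print(incorporated_ddntps(["G", "A"], strand="-"))
```

A heterozygote yields two differently labeled extension products of the same primer, whereas a homozygote yields one.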

5'-Nuclease Assay

The 5'-nuclease assay, also known as the TaqMan assay, is a widely adopted PCR-based method for single nucleotide polymorphism (SNP) genotyping that leverages the intrinsic 5'-3' exonuclease activity of Thermus aquaticus (Taq) DNA polymerase to generate allele-specific fluorescent signals during amplification. Introduced in the early 1990s, this homogeneous, closed-tube technique enables high-throughput genotyping without post-PCR processing, making it suitable for discriminating biallelic SNPs in diverse applications. The assay's reliability stems from its ability to detect subtle differences in probe-target hybridization stability, achieving genotyping accuracy exceeding 99% in validated systems.

Probe design is critical for specificity, featuring two competing oligonucleotide probes, each complementary to one allele of the SNP. Each probe is approximately 13-30 nucleotides long, with a fluorescent reporter dye attached to the 5' end (typically 6-carboxyfluorescein (FAM) for one allele and VIC for the other) and a quencher at the 3' end to suppress fluorescence in the intact state, either a non-fluorescent quencher paired with a minor groove binder (MGB) or 6-carboxytetramethylrhodamine (TAMRA). The polymorphic nucleotide is positioned in the central third of the probe sequence to optimize mismatch discrimination, as a single base mismatch with the non-complementary allele reduces hybridization stability and prevents efficient cleavage. Probes are paired with locus-specific PCR primers, and the overall design targets a probe melting temperature (Tm) of 65-67°C to align with the 60°C annealing temperature commonly used in the assay.

During each PCR cycle, the probes anneal to their target sequences if perfectly matched. As Taq polymerase extends the upstream primer, its 5' nuclease domain hydrolyzes the 5' end of the annealed probe, separating the reporter dye from the quencher and producing a proportional increase in fluorescence that accumulates over cycles.
Mismatched probes fail to anneal stably and are not cleaved, resulting in little or no signal for that allele. This process occurs in real time, but genotyping is typically performed via endpoint analysis after 40-50 cycles.

Genotype determination relies on the ratio of endpoint fluorescence from the two reporter dyes, plotted on a two-dimensional scatter graph where samples cluster into three distinct groups: homozygotes for the VIC-labeled allele (high VIC, low FAM), homozygotes for the FAM-labeled allele (low VIC, high FAM), and heterozygotes (intermediate levels of both). Real-time amplification curves can also aid in clustering by monitoring signal accumulation kinetics, though endpoint reading suffices for most applications. Automated software on instruments like the Applied Biosystems 7500 Fast Real-Time PCR System facilitates cluster calling and quality control.

Commercially, the TaqMan SNP Genotyping Assays developed by Applied Biosystems (now Thermo Fisher Scientific) have long been a cornerstone of the field, with predesigned assays available for over 17 million SNPs, including those from the HapMap and 1000 Genomes projects. These assays support medium- to high-throughput genotyping of hundreds to thousands of samples per SNP, with typical reaction volumes of 5-25 μL and costs enabling analysis of more than 1,000 SNPs in large cohorts. The platform's integration with multi-color detection has expanded its use to duplex or multiplex formats for simultaneous SNP analysis.

Oligonucleotide Ligation Assay

The oligonucleotide ligation assay (OLA) is a genotyping method that exploits the high fidelity of DNA ligase enzymes to discriminate single nucleotide polymorphisms (SNPs) by joining adjacent oligonucleotide probes hybridized to target DNA only when there is a perfect base match at the ligation junction. In this assay, two probes are designed to anneal to the target DNA: an allele-specific probe with its 3' terminus positioned at the polymorphic site and a common locus-specific probe that anneals immediately adjacent to it, with its 5' end at the junction. Thermostable ligases, such as Thermus aquaticus (Taq) DNA ligase, catalyze the formation of a phosphodiester bond between the probes if the allele-specific probe perfectly matches the SNP allele; a single-base mismatch at the 3' end of the allele-specific probe substantially inhibits ligation due to the enzyme's sensitivity to distortions in the DNA helix. This principle was first demonstrated in 1988 and later adapted for SNP genotyping.

Variants of OLA enhance its utility for high-throughput applications, including integration with PCR amplification of the ligated products to generate detectable signals from low-abundance targets. Another variant, ligation-rolling circle amplification (L-RCA), employs circularizable "padlock" probes that, upon allele-specific ligation, form closed circles amenable to isothermal amplification by phi29 DNA polymerase, enabling sensitive detection without initial PCR. These approaches maintain the core ligation specificity while amplifying signals for analysis in diverse sample types, such as genomic DNA directly from blood or tissue.

Detection of ligated products in OLA typically involves labeling strategies followed by separation and readout. Common methods include incorporating fluorescent tags on the probes, amplifying ligated products via PCR with universal primers, and resolving alleles by capillary or gel electrophoresis based on size differences. Alternatively, biotinylated probes can be captured on streptavidin-coated surfaces or microarrays for fluorescence-based detection, allowing multiplexing of dozens of SNPs.
The SNPlex platform exemplifies this by performing multiplex OLA on up to 48 SNPs per sample, using zip-coded probes that ligate allele-specifically, followed by universal PCR and capillary electrophoresis for automated allele calling with >99% accuracy in large-scale studies.

OLA offers advantages in specificity and simplicity for SNP genotyping, as the ligase's mismatch discrimination exceeds that of many hybridization-based methods, reducing false positives in complex samples. It has been particularly valuable in forensic applications, where its tolerance for degraded DNA and its ability to multiplex ancestry-informative SNPs aid human identification and relationship testing from limited or compromised evidence. Unlike primer extension techniques, OLA achieves discrimination through probe joining without enzymatic nucleotide addition.
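The probe layout used in OLA can be sketched in a few lines. The example below assumes probes are written with the same sequence as one strand of the target (so they anneal to its complement), places the allele-specific base at the 3' end of one probe, and takes the common probe from the bases that follow. The function name, the `arm` length, and the toy sequence are hypothetical; real designs also balance melting temperatures and add tags.

```python
def design_ola_probes(target, snp_index, alleles, arm=20):
    """Return ({allele: allele_specific_probe}, common_probe).

    target:    5'->3' sequence of the strand the probes are identical to.
    snp_index: 0-based position of the polymorphic base in `target`.
    alleles:   iterable of allele bases, e.g. "AG".
    arm:       probe length in nucleotides (illustrative assumption).
    """
    if snp_index < arm - 1 or snp_index + arm >= len(target):
        raise ValueError("not enough flanking sequence for the requested arm length")
    upstream = target[snp_index - (arm - 1):snp_index]
    # Each allele-specific probe ends (3' terminus) on its allele base.
    allele_probes = {a: upstream + a for a in alleles}
    # The common probe starts right after the SNP; in the assay it carries
    # the 5' phosphate that the ligase joins to the allele-specific probe.
    common_probe = target[snp_index + 1:snp_index + 1 + arm]
    return allele_probes, common_probe

def would_ligate(allele_probe, sample_allele):
    """Idealized ligase rule: join only if the 3' terminal base matches."""
    return allele_probe[-1] == sample_allele

target = "ACGTACGTAC" + "G" + "TTGCATGCAA"   # toy sequence, SNP at index 10
probes, common = design_ola_probes(target, 10, "AG", arm=5)
```

In this toy case `probes["G"]` ligates to `common` on a G-allele template and `probes["A"]` does not, which mirrors the 3' mismatch discrimination described above.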

Physical Property-Based Methods

Single-Strand Conformation Polymorphism

Single-strand conformation polymorphism (SSCP) is a method for detecting single nucleotide polymorphisms (SNPs) by exploiting differences in the electrophoretic mobility of single-stranded DNA fragments under non-denaturing conditions. The principle relies on the fact that denatured DNA strands fold into unique three-dimensional conformations determined by intrastrand base pairing, and a single base substitution caused by an SNP can alter this folding pattern, leading to distinct migration rates during polyacrylamide gel electrophoresis (PAGE). This technique was first described for identifying DNA polymorphisms in human genes, where sequence variations as small as one nucleotide change the conformational stability enough to produce detectable band shifts.

The standard protocol for SSCP in SNP genotyping begins with genomic DNA isolation, typically from blood samples using proteinase K digestion followed by phenol-chloroform extraction. The target region containing the potential SNP is then amplified via polymerase chain reaction (PCR) to produce fragments ideally 150-250 base pairs in length. After amplification, the double-stranded PCR products are denatured by heating in the presence of formamide to separate the strands, then rapidly cooled so that each strand folds into its own single-stranded conformation instead of reannealing with its complement. The denatured samples are loaded onto a non-denaturing polyacrylamide gel (usually 5-12% acrylamide) and subjected to electrophoresis at low temperature (e.g., 4-20°C) to preserve the secondary structures. Visualization is achieved through silver staining for unlabeled DNA or autoradiography for radioactively labeled samples, revealing mobility shifts indicative of SNPs.

SSCP demonstrates high sensitivity for SNP detection, identifying over 90% of single base substitutions in fragments shorter than 200 bp and approximately 80% in those ranging from 300-400 bp, with optimal performance around 150 base pairs, where conformational changes are most pronounced.
This makes it particularly effective for screening short amplicons in applications like mutation detection in candidate genes, though detection rates can vary with the SNP's position within the fragment and the specific sequence context.

Despite its utility, SSCP has notable limitations. Gel preparation and analysis are manual and labor-intensive, which restricts throughput, and detection is sequence-dependent: certain SNPs may not induce sufficient conformational changes to be detected. Additionally, partial reannealing of complementary strands can reduce sensitivity, necessitating dilute samples and careful optimization of denaturation conditions. The method gained popularity for its simplicity and low cost but has been largely supplanted by higher-throughput technologies because of these constraints.

Improvements to SSCP have focused on automation and enhanced detection, such as fluorescent labeling of PCR primers followed by capillary electrophoresis (CE-SSCP), which allows faster, safer analysis on automated sequencers and improves resolution for multiplexed SNP genotyping. These adaptations maintain the core conformational principle while enabling higher sample processing rates, as demonstrated in panels of functional candidate SNPs.

Denaturing Gradient Methods

Denaturing gradient methods encompass gel-based techniques that exploit the differential melting behaviors of DNA duplexes to detect single nucleotide polymorphisms (SNPs). These methods separate PCR-amplified DNA fragments on polyacrylamide gels subjected to controlled gradients of denaturants or temperature, allowing resolution of sequence variants based on their stability under partially denaturing conditions. Developed in the 1980s, they were particularly valuable for mutation scanning in targeted genomic regions prior to the widespread adoption of high-throughput sequencing technologies. The primary variants are denaturing gradient gel electrophoresis (DGGE) and temperature gradient gel electrophoresis (TGGE). DGGE employs a chemical denaturant gradient, typically composed of 20-80% urea and formamide, which progressively destabilizes DNA double helices as the gel is electrophoresed. In contrast, TGGE uses a temperature gradient, often ranging from 40-70°C, applied across the gel to achieve similar denaturation effects without chemical additives. Both techniques rely on the principle that homo- and heteroduplexes—formed when PCR products containing SNPs reanneal—exhibit distinct melting profiles; a single base mismatch in heteroduplexes lowers the melting temperature, causing the molecule to migrate more slowly or stop earlier in the gradient compared to perfectly matched homoduplexes. A key innovation enhancing sensitivity in these methods is the attachment of a GC-rich clamp (typically 30-50 bp) to one PCR primer, which anchors the duplex at one end and prevents complete strand separation, ensuring partial denaturation reveals sequence-dependent mobility shifts. The protocol begins with PCR amplification of the SNP-containing region (usually 100-500 bp), followed by denaturation and reannealing to promote heteroduplex formation—often by mixing the sample with wild-type DNA or through controlled cooling cycles. 
The products are then loaded onto a vertical gel (6-8% acrylamide) with the established gradient, electrophoresed at constant voltage (e.g., 100-200 V) for several hours, and visualized by ethidium bromide or silver staining, where variant bands appear as distinct patterns offset from the wild-type.

These methods excel in applications such as mutation scanning for unknown SNPs in genes of interest, offering near-100% detection rates for variants within the analyzed fragment when optimized, and requiring minimal equipment beyond standard electrophoresis setups. Heteroduplex formation particularly enhances resolution for heterozygous SNPs, which made these methods suitable for population screening in research settings during the pre-sequencing era. However, their labor-intensive nature, need for gradient optimization per sequence, and low throughput have led to decreased use following the rise of next-generation sequencing, which provides more scalable and precise variant detection.

Denaturing High-Performance Liquid Chromatography

Denaturing high-performance liquid chromatography (DHPLC) is a technique that detects single nucleotide polymorphisms (SNPs) by separating DNA homoduplexes from heteroduplexes formed during partial denaturation of PCR-amplified fragments. It relies on ion-pair reversed-phase high-performance liquid chromatography, where DNA molecules are separated on a nonporous polystyrene-divinylbenzene column under partially denaturing conditions at elevated temperatures, typically 60–70°C. Heteroduplexes, which contain mismatched base pairs due to SNPs, exhibit reduced thermal stability and weaker hydrophobic interactions with the column, causing them to elute earlier than stable homoduplexes. This differential elution is monitored by UV absorbance at 260 nm, producing distinct chromatograms that indicate the presence of variants.

The protocol begins with PCR amplification of the target DNA region using high-fidelity polymerase to generate sufficient product, followed by denaturation at 95°C for 5 minutes and controlled reannealing via slow cooling (e.g., 1°C per minute to ambient temperature over 30–45 minutes) to promote heteroduplex formation in heterozygous samples. The resulting mixture is then injected onto the column, where separation occurs using a linear acetonitrile gradient (e.g., 15–35% in 0.1 M triethylammonium acetate buffer, pH 7.0) at an optimized temperature predicted by software based on fragment size and GC content. Analysis time per sample is approximately 5–10 minutes, with automated injection enabling high-throughput processing. Unlike denaturing gradient gel electrophoresis, DHPLC uses liquid chromatography for non-gel-based separation, allowing faster turnaround without electrophoresis.

DHPLC achieves high resolution for SNP detection, with sensitivity exceeding 95% for base substitutions and small insertions/deletions in amplicons up to 1.5 kb (optimally 100–500 bp), and it can identify minor variants at levels as low as 2–2.5% in pooled samples.
Specificity ranges from 87–100%, depending on fragment design and conditions, making it reliable for scanning large genes or populations. The commercial WAVE system, introduced by Transgenomic, automates this process using a DNASep cartridge and Navigator software for temperature and gradient optimization, supporting unattended analysis of up to 96 samples per run and throughput of approximately 100–200 samples per day.

Compared to traditional gel-based methods like single-strand conformation polymorphism, DHPLC offers advantages including no need for radioactive labeling, quantitative peak area measurements for allele frequency estimation, reduced hands-on time due to automation, and lower cost per sample (approximately $0.50–1.00 versus $2–5 for sequencing confirmation). It has been widely adopted for mutation scanning in disease-associated genes, enabling rapid identification of SNPs in clinical and research settings.

High-Resolution Melting Analysis

High-resolution melting (HRM) analysis is a post-PCR technique used for SNP genotyping that relies on the precise monitoring of DNA duplex dissociation through fluorescence changes as temperature increases. Intercalating dyes, such as EvaGreen or SYBR Green, bind to double-stranded DNA amplicons during PCR amplification, and their fluorescence decreases as the DNA melts into single strands, producing characteristic melting curves that differ based on sequence variations like SNPs. SNPs alter the melting temperature (Tm) of amplicons, with G/C base pairs typically raising Tm by 1-2°C compared to A/T pairs due to stronger hydrogen bonding and stacking interactions, enabling genotype distinction without enzymatic or probe-based steps. This method was first demonstrated for SNP genotyping in a seminal 2004 study, which highlighted its simplicity: it requires only PCR, a DNA dye, and melting instrumentation on real-time PCR systems.

The protocol integrates HRM directly with real-time PCR on instruments equipped for high-resolution melting, such as those with sensitive detectors. Following standard PCR amplification of a target region containing the SNP (typically 80-150 bp in length), a dissociation step ramps the temperature from 60°C to 95°C in 0.1°C increments while continuously recording fluorescence. Data are plotted as fluorescence versus temperature to generate melting curves, which are normalized and analyzed using the negative first derivative (-dF/dT) to identify Tm peaks, allowing visualization of sequence-specific dissociation patterns. Saturating dyes like EvaGreen are preferred over SYBR Green for their higher binding capacity and lack of PCR inhibition, providing sharper resolution for subtle Tm shifts.

Genotype discrimination in HRM exploits the formation of homoduplexes and heteroduplexes during the PCR cooling phase.
Homozygous samples contain only homoduplexes and therefore produce a single melting transition, with the two homozygous genotypes distinguished by their different Tm values, while heterozygous samples form heteroduplexes with mismatched bases that melt at lower temperatures, resulting in a broader or shifted curve. This approach reliably distinguishes common biallelic SNPs, with accuracy exceeding 99% in validated assays for targets like those in the TP53 gene, as shown in comparative studies against sequencing.

Advances in HRM include the concomitant amplification and detection method with HRM (CADMA-HRM), which uses a three-primer system to simultaneously amplify wild-type and mutant alleles, enhancing sensitivity for challenging SNPs with minimal Tm differences, such as T-to-A transversions. Recent modifications to CADMA-HRM have extended its application to indels and low-frequency variants by improving assay robustness on standard qPCR platforms, maintaining the method's cost-effectiveness (under $0.50 per sample) and probe-free nature. These developments position HRM as a scalable tool for high-throughput genotyping in clinical and research settings.

Despite its advantages, HRM has limitations, including a recommended amplicon length under 300 bp to ensure sufficient resolution of Tm differences, as longer fragments dilute SNP-specific signals. Sequence context, such as high GC content or adjacent polymorphisms, can confound interpretation by altering baseline Tm or introducing multiple peaks, necessitating careful primer design and validation against known standards.
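The negative first derivative analysis mentioned above can be sketched as follows. This is a minimal illustration that assumes an already normalized melt curve sampled at 0.1°C steps; `synthetic_curve` is a hypothetical idealized sigmoid standing in for instrument data, and taking Tm as the maximum of -dF/dT is a common simplification, not the algorithm of any particular HRM software.

```python
import math

def negative_derivative(temps, fluor):
    """Central-difference estimate of -dF/dT at each interior temperature."""
    if len(temps) != len(fluor) or len(temps) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    return [
        -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]

def melting_temperature(temps, fluor):
    """Tm taken as the temperature at which -dF/dT is largest."""
    deriv = negative_derivative(temps, fluor)
    peak = max(range(len(deriv)), key=lambda i: deriv[i])
    return temps[peak + 1]  # +1 because deriv starts at the second point

def synthetic_curve(tm, temps, width=0.5):
    """Idealized sigmoid melt curve (assumption, stands in for real data)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

temps = [60.0 + 0.1 * i for i in range(351)]   # 60-95 °C in 0.1 °C steps
tm_low = melting_temperature(temps, synthetic_curve(80.0, temps))
tm_high = melting_temperature(temps, synthetic_curve(81.0, temps))
```

Two homozygous genotypes would appear here as single peaks at different temperatures; a heterozygote would need a mixture model with an additional low-temperature heteroduplex component, which this sketch does not attempt.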

Mismatch-Binding and Nuclease Assays

Mismatch-binding and nuclease assays exploit the structural distortions in DNA heteroduplexes formed by annealing strands with single nucleotide polymorphisms (SNPs), allowing proteins or enzymes to recognize and process these mismatches for detection. These methods are particularly useful for mutation scanning in PCR-amplified regions, where heteroduplexes are generated by denaturation and reannealing of DNA from heterozygous samples. Unlike direct sequencing, they provide rapid, cost-effective screening without requiring allele-specific probes. Mismatch-binding proteins, such as homologs of the bacterial MutS protein, specifically recognize and bind to bulges, bubbles, or loops caused by SNPs in heteroduplex DNA, typically detected through electrophoretic mobility shifts. In the DNA retardation assay, PCR products (200–700 bp) are incubated with 1–3 μg of thermostable MutS from Thermus thermophilus, followed by polyacrylamide gel electrophoresis and staining with SYBR-Gold to visualize the slower-migrating protein-DNA complexes. This approach detects single-base substitutions and small insertions/deletions with high specificity, making it suitable for genotyping clinical samples containing point mutations. Chimeric MutS fusions with reporter enzymes like β-galactosidase or fluorescent proteins such as GFP enable solid-phase immobilization and signal amplification, allowing detection of minute DNA quantities directly from genomic samples without prior PCR. These MutS-based methods offer sensitivity for low-abundance variants but are primarily used for qualitative scanning rather than quantitative allele calling. Nuclease assays employ structure-specific enzymes that cleave DNA at or near mismatch sites in heteroduplexes, producing fragments resolvable by gel electrophoresis. 
The Surveyor nuclease, derived from CEL I (a celery endonuclease), is a single-strand-specific enzyme that cleaves both strands at the 3' side of mismatches, including SNPs and small indels up to 12 nucleotides. The protocol involves PCR amplification of the target region, denaturation at 95°C followed by gradual reannealing to form heteroduplexes, incubation with Surveyor nuclease at 42°C for 20–60 minutes in Mg²⁺-containing buffer, and analysis of cleavage products (typically 100–500 bp) via polyacrylamide gel electrophoresis. This method detects heterozygous variants at frequencies as low as 1 in 32 copies (approximately 3%), enabling high-throughput SNP discovery in large genomic fragments exceeding 2.9 kb. Similarly, T7 endonuclease I, a resolvase from bacteriophage T7, cleaves heteroduplexes at mismatch-induced bubbles or loops, with a protocol mirroring Surveyor's but using incubation at 37°C for 15–30 minutes. Commercialized for applications like CRISPR validation, T7 endonuclease I shows comparable sensitivity (1–10% variant detection) and is preferred for deletions over Surveyor due to higher efficiency on certain substrates, though it may require multiple reactions to minimize false negatives (up to 70% in some SNP contexts). Both nucleases are applied in scanning modes for mutation mapping, not precise genotyping, and are cost-effective alternatives to sequencing for initial variant identification in research and diagnostics.
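Because these nucleases cut at the mismatch, the expected gel pattern for a heterozygous sample follows from simple arithmetic on the amplicon length and the variant position. The sketch below is only that arithmetic; the function names and the `min_fragment` resolution limit are hypothetical and should be replaced by whatever a given gel system can actually resolve.

```python
def expected_fragments(amplicon_len, mismatch_pos):
    """Sizes (bp) of the two cleavage products, largest first.

    mismatch_pos is the distance in bp from one end of the amplicon
    to the mismatch; cleavage is idealized as occurring exactly there.
    """
    if not 0 < mismatch_pos < amplicon_len:
        raise ValueError("mismatch must lie inside the amplicon")
    return sorted((mismatch_pos, amplicon_len - mismatch_pos), reverse=True)

def is_resolvable(amplicon_len, mismatch_pos, min_fragment=50):
    """True if both products exceed an assumed minimum resolvable size."""
    return min(expected_fragments(amplicon_len, mismatch_pos)) >= min_fragment

bands = expected_fragments(500, 180)   # full-length 500 bp band remains as well
```

A variant close to the amplicon end yields one product that is too short to see, which is one reason primers are usually placed so the region of interest sits away from the ends.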

Mass Spectrometry-Based Methods

MALDI-TOF Mass Spectrometry

Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry enables SNP genotyping by ionizing post-PCR or primer extension products and measuring their mass-to-charge ratios to differentiate alleles based on subtle mass differences. For instance, incorporation of an A versus a G in extension products results in a 16 Da mass shift due to their respective nucleotide residue masses of 313 Da and 329 Da, allowing resolution of homozygous and heterozygous genotypes through distinct spectral peaks. This direct mass analysis avoids indirect detection methods, providing unambiguous allele identification even for challenging SNPs.

The protocol begins with PCR amplification of the genomic region containing the SNP. Unincorporated dNTPs remaining from the PCR are then dephosphorylated by incubation with shrimp alkaline phosphatase (SAP) at 37°C for 30-60 minutes, so that they cannot be incorporated in the extension step or add spectral noise; the enzyme is then inactivated by heat. A primer annealing immediately adjacent to the polymorphic site is extended by a single base using a polymerase and a nucleotide termination mix to generate allele-specific products. The extension products are desalted, mixed with a UV-absorbing MALDI matrix such as 3-hydroxypicolinic acid, and spotted onto a target plate. A nitrogen laser desorbs and ionizes the sample, propelling singly charged ions through a field-free drift tube in the TOF analyzer, where flight time increases with the square root of the mass-to-charge ratio, so lighter ions arrive first, yielding spectra for genotype calling.

The Sequenom MassARRAY iPLEX platform exemplifies a key commercial implementation, supporting automated, high-throughput genotyping with multiplexing capacities of up to 40 SNPs per reaction well through optimized primer design that spaces mass signals to prevent overlap. Developed in the early 2000s amid the genomics community's push for scalable genotyping, this method has evolved to handle thousands of samples daily with minimal hands-on time.
MALDI-TOF offers advantages including fluorescence-free detection for unbiased mass readout, robust performance across diverse sample types, and genotyping accuracy greater than 99% with call rates often exceeding 98% in multiplex formats. These attributes stem from the technique's high mass resolution (down to 1-3 Da) and sensitivity to low DNA inputs (1-10 ng). In applications, it has powered large-scale pharmacogenomic research, such as associating CYP450 SNPs with drug efficacy and adverse reactions in cohorts of thousands, since its widespread adoption around 2000. The platform, now under Agena Bioscience, continues to be used for SNP genotyping as of 2025.
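The mass-based allele call can be sketched as a lookup of expected product masses against observed peaks. The example assumes approximate average nucleotide residue masses (the 313 Da and 329 Da values above correspond to A and G; the C and T values are added here), a hypothetical primer mass, and a hypothetical matching tolerance `tol`. Commercial assays use modified terminators with different masses, so this is an illustration of the principle and not a model of any vendor chemistry.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
RESIDUE_MASS = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}

def expected_masses(primer_mass, alleles):
    """Expected mass of the primer extended by one base, per allele."""
    return {a: primer_mass + RESIDUE_MASS[a] for a in alleles}

def call_from_spectrum(peaks, primer_mass, alleles, tol=2.0):
    """Return e.g. 'AA', 'AG', 'GG', or None if no expected peak is found.

    peaks: observed peak masses in Da.
    tol:   matching window in Da (illustrative assumption).
    """
    expected = expected_masses(primer_mass, alleles)
    found = sorted(
        a for a, m in expected.items()
        if any(abs(p - m) <= tol for p in peaks)
    )
    if not found:
        return None
    if len(found) == 1:
        return found[0] * 2          # one peak: homozygous
    return "".join(found)            # two peaks: heterozygous

primer = 6000.0                       # hypothetical extension primer mass, Da
het_call = call_from_spectrum([6313.1, 6329.4], primer, "AG")
hom_call = call_from_spectrum([6313.3], primer, "AG")
```

Multiplexing works on the same idea: primers of different lengths place each SNP's pair of peaks in its own mass window so that the windows never overlap.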

Electrospray Ionization Mass Spectrometry

Electrospray ionization mass spectrometry (ESI-MS) enables SNP genotyping by generating multiply charged ions from DNA samples in solution, allowing the analysis of mass-to-charge ratios (m/z) to identify allelic variations based on subtle mass differences, such as 9 Da for A↔T or 40 Da for G↔C substitutions. This soft ionization technique preserves intact biomolecules and produces a series of adjacent peaks corresponding to different charge states, which facilitates accurate mass determination through deconvolution algorithms that reconstruct the neutral molecular mass. For more complex analyses, tandem MS (MS/MS) is employed: precursor ions are fragmented in a collision cell to yield daughter ions, revealing sequence-specific fragmentation patterns that confirm SNP compositions without enzymatic digestion.

The typical protocol begins with PCR amplification of target regions containing the SNPs, followed by cleanup to remove salts, primers, and unincorporated nucleotides that could cause ion adduction and spectral broadening. Cleanup can be performed offline, for example by enzymatic digestion, or by online chromatographic desalting, after which the purified amplicons are introduced into the ESI source via a capillary needle held at high voltage (typically 2-5 kV), nebulized with a sheath gas, and ionized. The resulting ions enter the mass analyzer, often a time-of-flight (TOF) instrument, where spectra are acquired and deconvoluted to assign allele-specific masses; for MS/MS, selected precursors are isolated and fragmented to generate confirmatory spectra. This process supports direct genotyping of heterozygous and homozygous SNPs from amplicons up to several hundred base pairs.
Key advantages of ESI-MS include its ability to handle longer DNA fragments (>100 bp) compared to other mass spectrometry methods, owing to the multiple charging that reduces m/z values into a detectable range, and its capacity to distinguish sequence isomers through MS/MS fragmentation, which provides structural detail beyond mere mass shifts. Additionally, the technique offers high mass accuracy (often <20 ppm) and sensitivity down to picogram levels of input DNA, enabling robust genotyping without fluorescent labeling or immobilization steps. Commercial platforms such as the Ibis PLEX-ID system, which integrated ESI-TOF MS, were widely adopted for forensic applications, including multiplexed panels of 40 autosomal SNPs for human identification and mixture deconvolution, until its discontinuation by Abbott in 2017. These systems automated PCR setup, desalting, and data analysis, achieving near-100% sensitivity from 100 pg DNA per reaction and supporting SNP/indel detection in degraded samples. Post-2010 developments enhanced automation and throughput for ESI-MS, but following the PLEX-ID discontinuation, the method has seen reduced use in routine SNP genotyping, with applications shifting toward research in proteomics and other biomolecular analyses as of 2025.
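The charge-state deconvolution mentioned for ESI-MS can be shown with two adjacent peaks. The sketch assumes negative-ion mode, in which an ion of neutral mass M carrying z charges has lost z protons, so m/z = (M - z·1.00728)/z; the function names are hypothetical, and real deconvolution software fits the whole charge envelope instead of a single pair of peaks.

```python
PROTON = 1.00728  # mass of a proton in Da

def mz_negative(mass, z):
    """m/z of an ion that has lost z protons (negative-ion mode)."""
    return (mass - z * PROTON) / z

def deconvolute_pair(mz_low, mz_high):
    """Neutral mass from two adjacent charge states.

    mz_low carries charge z + 1 and mz_high carries charge z.
    From mz = M/z - PROTON for both peaks it follows that
    z = (mz_low + PROTON) / (mz_high - mz_low).
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be larger than mz_low")
    z = round((mz_low + PROTON) / (mz_high - mz_low))
    mass = z * (mz_high + PROTON)
    return z, mass

# Synthetic check with an assumed 30 kDa strand observed at charges 21 and 20.
z, mass = deconvolute_pair(mz_negative(30000.0, 21), mz_negative(30000.0, 20))
```

Once the neutral mass is known, a 9 Da or 40 Da difference between samples can be read as an A↔T or G↔C substitution, provided the mass accuracy is good enough at that fragment size.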

Sequencing-Based Methods

Sanger Sequencing

Sanger sequencing, also known as chain-termination sequencing, relies on the incorporation of dideoxynucleotide triphosphates (ddNTPs), which lack a 3'-hydroxyl group and thus terminate DNA polymerase-mediated strand extension upon incorporation. In the automated fluorescent variant, four distinct fluorescent dyes are conjugated to the ddNTPs (one for each base: ddATP, ddTTP, ddGTP, and ddCTP), enabling the simultaneous generation and detection of termination fragments in a single reaction tube. This four-color detection system revolutionized the method by replacing labor-intensive radioactive labeling and gel lane separation with capillary-based fluorescence reading. The protocol for SNP genotyping using Sanger sequencing begins with PCR amplification of the target genomic region containing the SNP of interest, typically using high-fidelity polymerases to generate clean templates of 200-500 base pairs. This is followed by cycle sequencing, where the purified PCR product serves as a template for linear amplification with a sequencing primer, DNA polymerase, normal dNTPs, and fluorescently labeled ddNTPs under thermal cycling conditions to produce a ladder of terminated fragments. The fragments are then separated by size via capillary electrophoresis on automated systems such as Applied Biosystems (ABI) 3730 or 3500 Genetic Analyzers, where a laser excites the dyes, and a detector records emission spectra to generate a chromatogram. SNP calling involves inspecting the resulting electropherogram for peak patterns at the position of interest: a single dominant peak indicates homozygosity, while overlapping double peaks of roughly equal height signify heterozygosity. Manual visual inspection is essential to distinguish true heterozygous signals from artifacts like dye blobs or compression, though software tools such as Sequence Scanner or Chromas can automate base calling with user verification. 
Due to its read lengths of up to 800-1000 bases and error rate below 0.01%, Sanger sequencing remains the gold standard for validating SNPs identified by higher-throughput methods. With a throughput limited to approximately 1-10 SNPs per sample per run, Sanger sequencing is ideal for targeted validation rather than genome-wide analysis. However, its high per-sample cost—around $5-10 per SNP—and labor-intensive workflow have made it unsuitable for large-scale genotyping, leading to its phase-out for routine applications after 2010 in favor of more scalable technologies.
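The peak-pattern rule for SNP calling from an electropherogram can be expressed as a comparison of the two tallest peaks at the position of interest. The sketch below assumes peak heights have already been extracted per base; the function name and the `het_ratio` threshold are hypothetical, and manual inspection remains necessary for the artifacts mentioned above.

```python
def call_sanger_position(peak_heights, het_ratio=0.3):
    """Call a genotype at one trace position.

    peak_heights: dict mapping 'A', 'C', 'G', 'T' to signal height.
    het_ratio:    minimum secondary/primary height ratio to call a
                  heterozygote (illustrative assumption).
    """
    ranked = sorted(peak_heights.items(), key=lambda kv: kv[1], reverse=True)
    (base1, h1), (base2, h2) = ranked[0], ranked[1]
    if h1 <= 0:
        return "NN"                               # no usable signal
    if h2 / h1 >= het_ratio:
        return "".join(sorted((base1, base2)))    # double peak
    return base1 * 2                              # single dominant peak

het = call_sanger_position({"A": 1000, "C": 20, "G": 950, "T": 10})
hom = call_sanger_position({"A": 1200, "C": 15, "G": 30, "T": 5})
```

A stricter threshold reduces false heterozygote calls from background noise but risks missing true heterozygotes with unbalanced peaks, so the value is usually tuned against known samples.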

Next-Generation Sequencing

Next-generation sequencing (NGS) has transformed SNP genotyping by enabling massively parallel sequencing of DNA fragments, allowing the simultaneous interrogation of millions of potential SNP sites across whole genomes or targeted amplicon panels with high sensitivity and specificity. Unlike traditional Sanger sequencing, which processes one DNA fragment at a time, NGS platforms generate billions of short reads in a single run, facilitating de novo SNP discovery and precise genotyping in diverse applications such as population genetics and clinical diagnostics. This approach achieves per-base accuracies exceeding 99.9% when validated against Sanger sequencing, making it a gold standard for variant detection. Key NGS platforms for SNP genotyping include Illumina's sequencing-by-synthesis (SBS) systems, which utilize reversible terminator nucleotides and fluorescent imaging to detect base incorporation during DNA synthesis on a flow cell. Ion Torrent platforms employ semiconductor technology to measure pH changes resulting from proton release during nucleotide incorporation, offering rapid turnaround times for targeted SNP panels. Pacific Biosciences (PacBio) systems use single-molecule real-time (SMRT) sequencing, where a polymerase incorporates fluorescently labeled nucleotides in zero-mode waveguides, producing long reads (up to 20 kb) that improve SNP calling in repetitive or structurally complex genomic regions. These platforms differ in read length, throughput, and error profiles, with Illumina providing the highest accuracy for short-read SNP genotyping, Ion Torrent excelling in speed for smaller panels, and PacBio aiding in resolving ambiguous variants through longer context. The typical NGS workflow for SNP genotyping begins with library preparation, involving genomic DNA fragmentation (e.g., via sonication or enzymatic methods), end repair, A-tailing, and adapter ligation to enable sequencing compatibility. 
Amplification follows, using emulsion PCR on beads for Ion Torrent to clonally expand fragments or bridge amplification on a flow cell surface for Illumina to form clusters. Sequencing then occurs in a massively parallel manner: for Illumina, iterative cycles of nucleotide addition and imaging; for Ion Torrent, sequential flow of unlabeled nucleotides with real-time pH detection; and for PacBio, continuous real-time monitoring of polymerase activity. After sequencing, raw reads undergo quality trimming and demultiplexing before analysis.

SNP calling in NGS data involves aligning short reads to a reference genome using tools like Burrows-Wheeler Aligner (BWA), which efficiently maps reads while tolerating the mismatches indicative of SNPs. Variant calling then employs probabilistic models in software such as the Genome Analysis Toolkit (GATK), which uses statistical models of read depth, mapping quality, and allele balance to distinguish true SNPs from sequencing artifacts. Reliable SNP genotyping typically requires a minimum read depth of 30x at target loci to achieve >99% accuracy, as lower coverage increases false positives and negatives. Hard filtering or variant quality score recalibration in GATK further refines calls by excluding low-confidence variants.

NGS offers significant advantages for SNP genotyping, including the ability to profile millions of SNPs per sample with per-base error rates below 0.1%, enabling comprehensive genome-wide association studies and rare variant detection. By 2025, targeted NGS panels have reduced costs to approximately $0.01 per SNP, driven by high-throughput platforms and declining reagent prices, making NGS more accessible than earlier methods. However, platform-specific biases must be mitigated: GC-content variations can cause uneven coverage, with high-GC regions often underrepresented in Illumina data due to inefficient amplification.
Homopolymer stretches are particularly error-prone in Ion Torrent sequencing, where signal decay leads to insertion/deletion inaccuracies, while PacBio exhibits lower bias in these areas but higher overall raw error rates (~13%). Computational corrections, such as GC-bias normalization of coverage, help address these issues to ensure robust genotyping.
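The idea behind probabilistic genotype calling can be illustrated with a simple binomial model over reference and alternate read counts. This is a deliberately reduced sketch and not GATK's model: it assumes a single per-base `error` rate, ignores base and mapping qualities, and uses the 30x depth guideline from the text as a hard cutoff; all names are hypothetical.

```python
import math

def genotype_log_likelihoods(ref_count, alt_count, error=0.01):
    """Log-likelihood of the read counts under each diploid genotype.

    The probability that a read shows the alternate allele is `error`
    for 0/0, 0.5 for 0/1, and 1 - error for 1/1. The binomial
    coefficient is identical for all genotypes and is omitted.
    """
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        g: alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for g, p in p_alt.items()
    }

def call_snp(ref_count, alt_count, min_depth=30, error=0.01):
    """Most likely genotype, or './.' when depth is below min_depth."""
    if ref_count + alt_count < min_depth:
        return "./."
    ll = genotype_log_likelihoods(ref_count, alt_count, error)
    return max(ll, key=ll.get)

calls = [call_snp(15, 16), call_snp(40, 0), call_snp(0, 35), call_snp(5, 5)]
```

The last example shows why depth matters: five reference and five alternate reads look heterozygous, but with so few reads the evidence is weak, so the sketch declines to call.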

Genotyping-by-Sequencing

Genotyping-by-sequencing (GBS) is a high-throughput method for discovering and genotyping single nucleotide polymorphisms (SNPs) by reducing genome complexity through restriction enzyme digestion, followed by next-generation sequencing (NGS) of the resulting fragments. Developed in 2011, GBS enables cost-effective analysis of large populations without requiring a reference genome, making it particularly suitable for species with high diversity or limited genomic resources. The approach was first described using the restriction enzyme ApeKI, which targets GCWGC sites and produces 5' overhangs compatible with adapter ligation, allowing for the selective amplification and sequencing of low-copy genomic regions.

The protocol for GBS begins with the digestion of genomic DNA (typically 100-200 ng per sample) using ApeKI or similar enzymes, followed by the ligation of a unique barcode adapter to the overhangs and a common sequencing adapter to facilitate multiplexing. The ligated fragments are then pooled, purified to remove small products, and subjected to PCR amplification (typically 18-20 cycles) to generate the sequencing library, with size selection often focusing on 200-500 bp fragments to enrich for informative regions. Libraries are sequenced using platforms like Illumina, yielding short single-end reads (e.g., 64-100 bp) at depths sufficient for SNP detection (around 0.1-1x coverage per locus across the reduced representation). This streamlined process, completable in a few days, supports genotyping of thousands of samples at costs under $50 per sample when scaled.

GBS is widely applied in population genetics studies to assess genetic diversity and population structure, as well as in quantitative trait locus (QTL) mapping for traits like biomass yield in plants and disease resistance in animals, often identifying over 10,000 SNPs without prior sequence knowledge.
For variant calling, reads are demultiplexed by barcodes, quality-filtered, and either assembled de novo into tags for novel SNP discovery or mapped to a reference genome using tools like BWA, with genotypes inferred via likelihood models that accommodate low read depth.

Variants of GBS, such as restriction-site associated DNA sequencing (RAD-seq), developed in 2008, and double-digest RAD-seq (ddRAD-seq), introduced in 2012 with paired enzymes (one of which can be NlaIII) for tunable fragment sizes, offer customization for specific marker densities and multiplexing needs. These methods have enabled de novo SNP discovery in non-model organisms, supporting applications from evolutionary studies to breeding programs.
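Two steps of the GBS workflow, complexity reduction by ApeKI and barcode demultiplexing, lend themselves to a small in silico sketch. The example assumes ApeKI cuts after the first G of its GCWGC site (G^CWGC, W = A or T), scans one strand only, and uses hypothetical barcodes and sequences; production pipelines additionally handle sequencing errors in barcodes, barcodes of mixed length, and the remnant cut site.

```python
import re

# Lookahead so that overlapping GCWGC sites are all reported.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")

def apeki_cut_positions(seq):
    """Indices at which the strand is cut (after the first G of GCWGC)."""
    return [m.start() + 1 for m in APEKI_SITE.finditer(seq)]

def digest(seq):
    """Fragments produced by cutting `seq` at every ApeKI site."""
    bounds = [0] + apeki_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def demultiplex(reads, barcodes):
    """Assign reads to samples by exact barcode prefix and trim the barcode.

    barcodes: dict mapping barcode sequence to sample name (hypothetical).
    Reads with no matching barcode are dropped.
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return by_sample

fragments = digest("AAAGCAGCTTTTGCTGCAAA")
assigned = demultiplex(
    ["ACGTCAGCAAA", "TTAGCTGCTTT", "GGGGCAGC"],
    {"ACGT": "sample1", "TTAG": "sample2"},
)
```

Running `digest` over a whole genome and keeping only fragments in the 200-500 bp range gives a rough estimate of how many loci a given enzyme would sample.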

Emerging Methods

CRISPR-Cas Systems

CRISPR-Cas systems have emerged as powerful tools for SNP genotyping by leveraging the programmable nucleic acid recognition and collateral cleavage activity of Cas enzymes to detect single-base variations with high specificity and sensitivity. Originally developed for genome editing, these systems have evolved for diagnostic applications, enabling isothermal detection without the need for thermal cycling or complex equipment. In SNP genotyping, a guide RNA (gRNA) is designed to hybridize with the target DNA or RNA sequence adjacent to a protospacer adjacent motif (PAM) for Cas12 or a protospacer flanking sequence (PFS) for Cas13; upon perfect match binding, the Cas enzyme activates and indiscriminately cleaves reporter molecules, producing a detectable signal only for the matched allele.

The mechanism relies on the enzyme's collateral activity: Cas13a, for instance, cleaves non-target RNA reporters after binding to target RNA, while Cas12a cleaves single-stranded DNA reporters following target DNA recognition. Specificity for single-base discrimination arises from mismatches in the gRNA-target duplex, particularly in the seed region (positions 1-8 from the PAM) or through engineered synthetic mismatches in the gRNA, which prevent activation for variant alleles; this allows differentiation of SNPs with minimal cross-reactivity.

Key variants include SHERLOCK, which uses Cas13a combined with recombinase polymerase amplification (RPA) for isothermal pre-amplification of target nucleic acids, followed by T7 transcription to generate RNA targets for Cas13a detection. SHERLOCK achieves attomolar sensitivity and has been applied to human genotyping at multiple loci, with accurate discrimination of heterozygous and homozygous variants. Similarly, DETECTR employs Cas12a with RPA for direct DNA detection, enabling rapid genotyping of polymorphisms like those in human papillomavirus strains, where single-base mismatches in the seed region reduce signal by over 90%.
Detection outputs include from cleaved reporters for quantitative readouts in lab settings or colorimetric assays via gold nanoparticle aggregation or lateral flow strips for visual interpretation, often integrated in one-pot reactions completing in under 60 minutes. These methods support point-of-care applications, with lyophilized reagents enabling field-deployable, low-cost (~$0.61 per test) in resource-limited environments. Recent advances from 2023 to 2025 highlight the transition from editing-focused to detection-oriented Cas12 and Cas13 systems, with enhancements in for simultaneous SNP analysis and improved PAM/seed-based strategies for enhanced single-base fidelity, as seen in diagnostics for viral variants and cancer mutations. These developments underscore CRISPR-Cas's role in portable, isothermal SNP genotyping, surpassing traditional PCR-based methods in speed and simplicity while maintaining high accuracy.
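The seed-region rule described above can be illustrated with a short Python sketch. This is a deliberately simplified model, not a validated guide-design tool: it assumes the guide spacer and the target are both written as the protospacer sequence read away from the PAM (so position 1 is the PAM-proximal base), that any seed mismatch abolishes activation, and that at most one PAM-distal mismatch is tolerated. The function names, the threshold, and the example sequences are hypothetical.

```python
SEED_LENGTH = 8  # seed region: positions 1-8 counted from the PAM-proximal end


def mismatch_positions(spacer: str, protospacer: str) -> list:
    """Return 1-based positions (from the PAM-proximal end) where the sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have the same length")
    pairs = zip(spacer.upper(), protospacer.upper())
    return [i + 1 for i, (a, b) in enumerate(pairs) if a != b]


def predicted_activation(spacer: str, protospacer: str, max_distal_mismatches: int = 1) -> bool:
    """Toy rule: any seed mismatch blocks activation; a few distal mismatches are tolerated.

    The tolerance of one distal mismatch is an illustrative assumption.
    """
    mismatches = mismatch_positions(spacer, protospacer)
    if any(pos <= SEED_LENGTH for pos in mismatches):
        return False
    return len(mismatches) <= max_distal_mismatches


# Hypothetical 20-nt protospacers: the SNP sits at position 4, inside the seed.
ref_allele = "ACGTTGCAAGCTTAGGCATC"
alt_allele = "ACGATGCAAGCTTAGGCATC"
distal_variant = "ACGTTGCAAGCTTAGGCGTC"  # differs from ref only at position 18

guide_for_ref = ref_allele  # guide designed against the reference allele

print(predicted_activation(guide_for_ref, ref_allele))      # matched allele
print(predicted_activation(guide_for_ref, alt_allele))      # seed mismatch
print(predicted_activation(guide_for_ref, distal_variant))  # distal mismatch
```

Placing the SNP inside the seed is what makes the guide allele-specific in this model; the distal variant shows why a SNP far from the PAM may need an engineered synthetic mismatch to be discriminated.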

Lateral Flow and Biosensor Assays

Lateral flow assays (LFAs) for SNP genotyping combine amplification techniques, such as PCR, with immunochromatographic detection on strip formats to provide rapid, visual genotype identification. These assays typically employ allele-specific primers labeled with biotin and a hapten (e.g., digoxigenin) to selectively amplify target SNPs during the initial amplification step. The resulting amplicons are then applied to a nitrocellulose-based dipstick, where gold nanoparticles conjugated to anti-hapten antibodies enable colorimetric visualization.

The protocol begins with DNA extraction and PCR amplification using locked nucleic acid (LNA)-modified primers for enhanced specificity, followed by hybridization of the biotin- and hapten-labeled products with immobilized probes on the dipstick. As the sample migrates along the strip, unbound components pass the test line, while specific matches accumulate gold nanoparticle complexes at the test line, forming a visible red band. A control line, coated with streptavidin or anti-biotin antibodies, always develops and validates the assay. This migration and detection step typically requires 15–20 minutes after amplification and yields results that can be read by eye without instrumentation.

In 2025, a PCR-LFD method was established for visual detection of the MC4R g.732 C > G SNP in Hu sheep, associating genotypes (CC, CG, GG) with growth traits and completing the entire process in 1.5 hours. This approach showed 100% concordance with Sanger sequencing for 24 samples, highlighting its reliability for on-site applications.

Biosensor assays extend LFA principles by incorporating electrochemical or optical transduction for quantitative SNP readout, often integrated with nanomaterials such as graphene for improved sensitivity. Optical biosensors exploit graphene's fluorescence-quenching properties: single-stranded DNA adsorbs strongly to the surface, which quenches attached fluorophores, while perfect-match ligation forms double-stranded DNA that desorbs and restores the signal, allowing SNP discrimination. Such graphene-based optical systems detect SNP frequencies as low as 2.6% within 40 minutes.

These lateral flow and biosensor assays offer key advantages for SNP genotyping, including field-deployability due to their portability and lack of need for power or complex readers, low cost (typically under $1 per test), and potential for multiplexing through color-coded nanoparticles or multi-channel signals. They achieve high sensitivity (e.g., 97.96% for parasite-associated SNPs) and specificity (up to 100%), making them suitable for resource-limited settings such as antimalarial resistance monitoring.
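The visual readout logic can be sketched in Python. The sketch assumes one allele-specific reaction and strip per allele (here C and G), each read as a pair of booleans for the test line and the control line; this two-strip layout, the function names, and the example calls are illustrative assumptions rather than the published protocol. A concordance helper shows how strip calls could be compared with a reference method such as Sanger sequencing.

```python
def line_result(test_line: bool, control_line: bool) -> str:
    """Read one strip: the control line must develop for the strip to be valid."""
    if not control_line:
        return "invalid"
    return "positive" if test_line else "negative"


def call_genotype(c_strip: tuple, g_strip: tuple) -> str:
    """Combine the C-allele and G-allele strips into a genotype call.

    Each strip is a (test_line_visible, control_line_visible) pair.
    """
    c = line_result(*c_strip)
    g = line_result(*g_strip)
    if "invalid" in (c, g):
        return "invalid"
    if c == "positive" and g == "positive":
        return "CG"
    if c == "positive":
        return "CC"
    if g == "positive":
        return "GG"
    return "no call"  # both controls developed but neither allele was detected


def concordance(calls: list, reference: list) -> float:
    """Fraction of samples whose strip call equals the reference genotype."""
    if not calls or len(calls) != len(reference):
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(calls, reference)) / len(calls)


# Hypothetical readings for three samples.
strip_calls = [
    call_genotype((True, True), (False, True)),
    call_genotype((True, True), (True, True)),
    call_genotype((False, True), (True, True)),
]
print(strip_calls)
print(concordance(strip_calls, ["CC", "CG", "GG"]))
```

Treating a missing control line as "invalid" rather than as a negative result mirrors the role of the control line described above.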

Proteomic Genotyping

Proteomic genotyping infers genotypes of single nucleotide polymorphisms (SNPs) by analyzing protein variants produced from non-synonymous SNPs, which result in single amino acid polymorphisms (SAPs) within peptides. These SAPs cause measurable mass shifts in peptides detectable by mass spectrometry (MS), allowing indirect determination of DNA sequences without nucleic acid extraction. The approach relies on the translation of DNA into protein according to the genetic code, and focuses on coding regions where nucleotide changes alter amino acid composition.

The protocol begins with protein extraction from biological samples, followed by enzymatic digestion, typically using trypsin, to produce peptides containing potential SAPs. These peptides are then separated and analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS), where high-resolution MS identifies mass-to-charge ratios and fragmentation patterns. Observed spectra are matched against databases of predicted genetically variant peptides (GVPs) derived from known SNP catalogs to infer the corresponding SNP genotypes. Advances in high-resolution mass spectrometry (HRMS) since 2020 have improved resolution for distinguishing subtle mass differences (e.g., 1 Da shifts from substitutions).

In forensic applications, proteomic genotyping enables human identification from challenging samples such as hair shafts, fingermarks, and bones where DNA is degraded or insufficient. For instance, 2025 reviews emphasize its utility in missing persons investigations and disaster victim identification, where proteomic data are linked to DNA databases for probabilistic matching. It provides a complementary bridge between genomic and proteomic profiles, enhancing the intelligence obtained from trace evidence.

Key advantages include the chemical stability of proteins, which persist better than DNA in environmentally degraded samples such as buried bones or heat-exposed remains, allowing genotyping in scenarios where traditional DNA methods fail. Validation studies demonstrate concordance rates exceeding 90% between MS-inferred genotypes and direct DNA-based genotyping for targeted SNPs. However, the method is restricted to non-synonymous coding SNPs that produce detectable SAPs, which excludes synonymous and non-coding variants. It also requires comprehensive GVP databases and faces challenges with low-abundance samples.
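The mass shift that a SAP introduces can be worked through in a few lines of Python. The sketch below uses standard monoisotopic residue masses and computes the neutral mass and m/z of an unmodified peptide; the peptide sequences, the 10 ppm tolerance, and the function names are hypothetical choices for illustration, and real workflows also account for modifications and fragment spectra.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.007276  # mass of the charge carrier


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def mz(sequence: str, charge: int) -> float:
    """Mass-to-charge ratio of the [M + zH]z+ ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def matches(observed_mass: float, sequence: str, tolerance_ppm: float = 10.0) -> bool:
    """True if the observed neutral mass is within the ppm tolerance of the peptide."""
    theoretical = peptide_mass(sequence)
    return abs(observed_mass - theoretical) / theoretical * 1e6 <= tolerance_ppm


# Hypothetical tryptic peptide pair differing by an Asn -> Asp substitution.
reference_peptide = "LSNVEK"
variant_peptide = "LSDVEK"

shift = peptide_mass(variant_peptide) - peptide_mass(reference_peptide)
print(f"mass shift: {shift:.5f} Da")
print(f"reference m/z (2+): {mz(reference_peptide, 2):.4f}")
print(f"variant   m/z (2+): {mz(variant_peptide, 2):.4f}")

observed = peptide_mass(variant_peptide)  # pretend this mass was measured
print(matches(observed, reference_peptide), matches(observed, variant_peptide))
```

An Asn-to-Asp substitution shifts the peptide by slightly less than 1 Da, which is the kind of difference that requires high-resolution instruments to assign confidently. Note that isobaric residues such as Leu and Ile cannot be separated by precursor mass alone.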

Data Analysis and Quality Control

Genotype Calling

Genotype calling in SNP genotyping is the algorithmic assignment of discrete genotypes, such as homozygous reference (AA), heterozygous (AB), or homozygous alternate (BB), from raw signal intensities or read alignments generated by various platforms. The process relies on statistical models to interpret noisy data, distinguishing true variants from artifacts while maximizing call accuracy and completeness. Core principles include unsupervised clustering of multidimensional data points, where intensity signals from allele-specific probes are grouped into the expected genotype clusters, and probabilistic modeling to estimate genotype likelihoods under uncertainty.

Clustering approaches, such as k-means, partition intensity data into two or three clusters for diploid SNPs, using platform-specific priors such as skewed homozygote distributions in Illumina arrays to improve separation. For instance, adaptive k-means variants adjust cluster numbers based on signal quality predictors, improving calls for low-frequency alleles. Bayesian models offer a complementary framework by treating genotype assignment as posterior inference over mixture distributions, such as Gaussian mixtures for intensity data, which incorporate priors on cluster variances and shapes to handle non-normal distributions. A symmetric multinomial logistic regression (SMLR) model, applied to small DNA panels as of 2025, estimates conditional genotype probabilities from allele signals without relying on priors, reducing no-call rates by up to 50% at low input quantities such as 31 pg.

Dedicated software implements these principles across genotyping methods. For microarray data, Illumina's GenomeStudio employs normalized intensity clustering to automate diploid and polyploid calls, generating quality metrics such as Log R ratios. In next-generation sequencing (NGS), the Genome Analysis Toolkit (GATK) HaplotypeCaller uses local de novo assembly and Bayesian genotype likelihoods to call SNPs and indels simultaneously from aligned reads. HISAT-genotype provides a graph-based alignment platform for targeted regions, applying expectation-maximization to infer maximum-likelihood genotypes from haplotype-resolved assemblies.

Common parameters ensure reliability, including minimum call rates exceeding 95% per SNP to filter low-confidence loci and Phred quality scores above 20, corresponding to a 1% base-calling error rate. No-calls arise from ambiguous signals, such as overlapping clusters or low coverage, and are managed with probabilistic thresholds that prioritize accuracy over completeness. For example, some methods declare a no-call when the posterior probability falls below 0.80, modeling raw intensities as t-distributions to quantify uncertainty and flag potential assay failures.

Recent advances incorporate machine learning to refine calls in error-prone datasets. Neural networks such as Cluster Buster (2024) train on raw intensities to recover no-calls with high concordance (>99%), particularly for ancestry-diverse cohorts in disease studies. These approaches reduce manual intervention while adapting to batch effects.
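The posterior-threshold idea can be sketched with a one-dimensional Gaussian mixture in Python. This is a minimal illustration under stated assumptions, not the model of any named tool: the signal is reduced to a B-allele fraction, the three cluster means and standard deviations are fixed hypothetical values rather than fitted ones, and priors are equal. The 0.80 threshold, the 95% call rate, and the Phred conversion follow the figures quoted above.

```python
import math

# Hypothetical cluster parameters (mean, standard deviation) on the B-allele fraction.
CLUSTERS = {"AA": (0.05, 0.05), "AB": (0.50, 0.07), "BB": (0.95, 0.05)}


def b_allele_fraction(a_intensity: float, b_intensity: float) -> float:
    """Reduce two allele intensities to a single value between 0 and 1."""
    return b_intensity / (a_intensity + b_intensity)


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posteriors(theta: float) -> dict:
    """Posterior probability of each genotype under equal priors."""
    weights = {g: normal_pdf(theta, mu, sd) for g, (mu, sd) in CLUSTERS.items()}
    total = sum(weights.values())
    if total == 0.0:  # signal far from every cluster
        return {g: 1.0 / len(CLUSTERS) for g in CLUSTERS}
    return {g: w / total for g, w in weights.items()}


def call_genotype(theta: float, min_posterior: float = 0.80) -> str:
    """Return the most probable genotype, or 'NC' (no-call) below the threshold."""
    post = posteriors(theta)
    best = max(post, key=post.get)
    return best if post[best] >= min_posterior else "NC"


def call_rate(calls: list) -> float:
    """Fraction of samples with a genotype call at this SNP."""
    return sum(c != "NC" for c in calls) / len(calls)


def phred_to_error(q: float) -> float:
    """Error probability for a Phred score, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


calls = [call_genotype(t) for t in (0.04, 0.52, 0.97, 0.24)]
print(calls)
print(call_rate(calls) >= 0.95)  # would this SNP pass a 95% call-rate filter?
print(phred_to_error(20))
```

The last sample lies between the AA and AB clusters, so neither posterior reaches 0.80 and the sample is left uncalled; lowering the threshold would raise the call rate at the cost of accuracy.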

Error Detection and Validation

Errors in SNP genotyping arise primarily from technical limitations in sample preparation and amplification. Allelic dropout, where one allele fails to amplify during PCR, is a common issue in low-quality or degraded DNA samples and leads to false homozygous calls. Preferential amplification occurs when one allele is amplified more efficiently than the other because of sequence variations or primer mismatches, resulting in allelic imbalance. Contamination from exogenous DNA can introduce false alleles, particularly in high-throughput workflows. In modern SNP genotyping methods, such as array-based or next-generation sequencing approaches, overall error rates are typically below 0.1%, reflecting improvements in assay design and quality controls.

Quality metrics are essential for assessing the reliability of SNP genotypes. Concordance rates between duplicate samples, where at least 99% agreement is expected, serve as a direct measure of reproducibility and help identify systematic errors. Deviations from Hardy-Weinberg equilibrium (HWE), tested using chi-square statistics, indicate potential genotyping issues, because SNPs in a randomly mating population should conform to the genotype frequencies expected from the allele frequencies. For instance, a chi-square test compares observed and expected genotype counts, with p-values below 0.001 often flagging problematic loci for further investigation.

Validation strategies ensure the accuracy of SNP calls beyond initial quality checks. Sanger sequencing is routinely used to confirm variants, particularly those with low coverage or ambiguous calls, and achieves near 100% concordance with high-quality next-generation sequencing data. Internal controls, such as known SNPs with established genotypes, are incorporated into assays to monitor performance and detect batch-specific errors. In family-based studies, trios are checked for Mendelian inconsistencies, where offspring genotypes cannot be derived from parental alleles, and inconsistency rates exceeding 1% prompt re-genotyping. In forensic applications, protocols emphasize duplicate testing and error tracking to maintain evidential integrity, building on established frameworks for genetic markers.
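Two of the checks described above, the HWE chi-square test and the Mendelian consistency check in trios, can be sketched in Python using only the standard library. The function names and the example counts are hypothetical; the p-value uses the closed form for a chi-square distribution with one degree of freedom, and genotypes are written as two-letter strings for a biallelic, autosomal SNP.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple:
    """Chi-square statistic and p-value (1 degree of freedom) for an HWE test."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child can inherit one allele from each parent."""
    return any(sorted(a + b) == sorted(child) for a in father for b in mother)


def flag_snp(n_aa: int, n_ab: int, n_bb: int, alpha: float = 0.001) -> bool:
    """Flag a SNP whose HWE p-value falls below the chosen threshold."""
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return p_value < alpha


# Hypothetical genotype counts: one locus in equilibrium, one with no heterozygotes.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
print(flag_snp(50, 0, 50))

# Hypothetical trios: the second child carries a B allele neither parent has.
print(mendelian_consistent("AA", "BB", "AB"))
print(mendelian_consistent("AA", "AA", "AB"))
```

A complete absence of heterozygotes, as in the second locus, is the pattern that allelic dropout tends to produce, which is why HWE deviation is a useful screen for genotyping errors. For small samples or rare alleles, an exact test is generally preferred to the chi-square approximation.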
