Third-generation sequencing

from Wikipedia

Third-generation sequencing (also known as long-read sequencing) is a class of DNA sequencing methods that produce substantially longer reads (ranging from 10 kb to >1 Mb in length)[1] than second generation sequencing, also known as next-generation sequencing.[2] These methods emerged in 2008, characterized by technologies such as nanopore sequencing and single-molecule real-time sequencing, and continue to be developed.[2] The ability to sequence longer reads has critical implications for both genome science and the study of biology in general. In structural variant calling, third generation sequencing has been found to outperform existing methods, even at a low depth of sequencing coverage.[3] However, third generation sequencing data have much higher error rates than previous technologies, which can complicate downstream genome assembly and analysis of the resulting data.[4] These technologies are under active development, and their high error rates are expected to improve.[1]

Current technologies

Sequencing technologies with a different approach than second-generation platforms were first described as "third-generation" in 2008–2009.[5]

There are several companies currently at the heart of third generation sequencing technology development, namely Pacific Biosciences, Oxford Nanopore Technologies, Quantapore (CA-USA), and Stratos Genomics (WA-USA). These companies are taking fundamentally different approaches to sequencing single DNA molecules.

PacBio developed the sequencing platform of single molecule real time sequencing (SMRT), based on the properties of zero-mode waveguides. Signals are in the form of fluorescent light emission from each nucleotide incorporated by a DNA polymerase bound to the bottom of the zeptoliter-scale (zL) well.

Oxford Nanopore's technology involves passing a DNA molecule through a nanoscale pore structure and then measuring changes in the ionic current through the pore; Quantapore has a different, proprietary nanopore approach. Stratos Genomics spaces out the DNA bases with polymeric inserts, "Xpandomers", to circumvent the signal-to-noise challenge of nanopore ssDNA reading.

Also notable is Helicos's single molecule fluorescence approach, but the company entered bankruptcy in 2012.

Advantages

Longer reads

In comparison to the second generation of sequencing technologies, third generation sequencing has the obvious advantage of producing much longer reads. It is expected that these longer read lengths will alleviate numerous computational challenges surrounding genome assembly, transcript reconstruction, and metagenomics among other important areas of modern biology and medicine.[2]

It is well known that eukaryotic genomes, including those of primates and humans, are complex and contain large numbers of long repeated regions. Short reads from second generation sequencing must resort to approximate strategies in order to infer sequences over long ranges for assembly and genetic variant calling. Paired-end reads have been leveraged by second generation sequencing to combat these limitations; however, the exact fragment lengths of paired ends are often unknown and must also be approximated. By making long read lengths possible, third generation sequencing technologies have clear advantages.

Epigenetics

Epigenetic markers are stable and potentially heritable modifications to the DNA molecule that do not alter its sequence. An example is DNA methylation at CpG sites, which has been found to influence gene expression; histone modifications are another example. The current generation of sequencing technologies relies on laboratory techniques such as ChIP-sequencing for the detection of epigenetic markers. These techniques involve tagging the DNA strand, breaking and filtering fragments that contain markers, followed by sequencing. Third generation sequencing may enable direct detection of these markers because they produce signals distinguishable from those of the four unmodified nucleotide bases.[6]

Portability and speed

MinION Portable Gene Sequencer, Oxford Nanopore Technologies

Other important advantages of third generation sequencing technologies include portability and sequencing speed.[7] Since minimal sample preprocessing is required in comparison to second generation sequencing, smaller equipment can be designed. Oxford Nanopore Technologies has commercialized the MinION sequencer: roughly the size of a regular USB flash drive, it can be used readily by connecting to a laptop. In addition, since the sequencing process is not parallelized across regions of the genome, data can be collected and analyzed in real time. These advantages make third generation sequencing well-suited to hospital settings, where quick, on-site data collection and analysis are demanded.

Challenges

Third generation sequencing, as of 2008, faced important challenges, mainly surrounding the accurate identification of nucleotide bases; error rates were still much higher than in second generation sequencing.[4] This is generally due to instability of the molecular machinery involved. For example, in PacBio's single-molecule real-time sequencing technology, the DNA polymerase molecule becomes increasingly damaged as sequencing proceeds.[4] Additionally, since the process happens quickly, the signals given off by individual bases may be blurred by signals from neighbouring bases. This poses a new computational challenge for deciphering the signals and, consequently, inferring the sequence. Methods such as hidden Markov models, for example, have been leveraged for this purpose with some success.[6]

On average, different individuals of the human population share about 99.9% of their genome. In other words, only about one out of every thousand bases differs between any two people. The high error rates involved with third generation sequencing are inevitably problematic for characterizing the individual differences that exist between members of the same species.[citation needed]

Genome assembly

Genome assembly is the reconstruction of whole genome DNA sequences. This is generally done with two fundamentally different approaches.

Reference alignment

When a reference genome is available, as in the case of humans, newly sequenced reads can simply be aligned to the reference genome in order to characterize their properties. Such reference-based assembly is quick and easy but has the disadvantage of "hiding" novel sequences and large copy number variants. In addition, reference genomes do not yet exist for most organisms.

De novo assembly

De novo assembly is the alternative genome assembly approach to reference alignment. It refers to the reconstruction of whole genome sequences entirely from raw sequence reads. This method would be chosen when there is no reference genome, when the species of the given organism is unknown as in metagenomics, or when there exist genetic variants of interest that may not be detected by reference genome alignment.

Given the short reads produced by the current generation of sequencing technologies, de novo assembly is a major computational problem. It is normally approached by an iterative process of finding and connecting sequence reads with sensible overlaps. Various computational and statistical techniques, such as de Bruijn graphs and overlap-layout-consensus graphs, have been leveraged to solve this problem. Nonetheless, due to the highly repetitive nature of eukaryotic genomes, accurate and complete reconstruction of genome sequences in de novo assembly remains challenging. Paired-end reads have been posed as a possible solution, though exact fragment lengths are often unknown and must be approximated.[8]
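
To make the de Bruijn idea concrete, here is a toy sketch (not any production assembler): nodes are (k-1)-mers, edges come from k-mers drawn from the reads, and contigs fall out of walking unambiguous paths. The reads, the choice of k, and the helper names are all invented for illustration.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix node -> suffix node
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig while each node has exactly one outgoing edge;
    repeats create branches or cycles, which is where assembly stalls."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if nxt in seen:  # a cycle signals a collapsed repeat
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

# Toy reads sampled from the sequence ATGGCGTGCA
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # prints ATGGCGTGCA
```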

Hybrid assembly

Long read lengths offered by third generation sequencing may alleviate many of the challenges currently faced by de novo genome assemblies. For example, if an entire repetitive region can be sequenced unambiguously in a single read, no computational inference is required. Computational methods have also been proposed to alleviate the issue of high error rates. For example, one study demonstrated that de novo assembly of a microbial genome using PacBio sequencing alone outperformed that of second generation sequencing.[9]

Third generation sequencing may also be used in conjunction with second generation sequencing. This approach is often referred to as hybrid sequencing. For example, long reads from third generation sequencing may be used to resolve ambiguities that exist in genomes previously assembled using second generation sequencing. Conversely, short second generation reads have been used to correct errors that exist in the long third generation reads. In general, this hybrid approach has been shown to improve de novo genome assemblies significantly.[10]

Epigenetic markers

DNA methylation (DNAm) – the covalent modification of DNA at CpG sites resulting in attached methyl groups – is the best understood component of epigenetic machinery. DNA modifications and the resulting gene expression can vary across cell types and temporal development, vary with genetic ancestry, change in response to environmental stimuli, and are heritable. After the discovery of DNAm, researchers also found its correlation with diseases like cancer and autism.[11] In this disease etiology context, DNAm is an important avenue of further research.

Advantages

The current most common methods for examining methylation state require an assay that fragments DNA before standard second generation sequencing on the Illumina platform. As a result of the short read length, information regarding longer patterns of methylation is lost.[6] Third generation sequencing technologies offer the capability of single-molecule real-time sequencing of longer reads, and detection of DNA modification without the aforementioned assay.[12]

PacBio SMRT technology and Oxford Nanopore can use unaltered DNA to detect methylation.

Oxford Nanopore Technologies' MinION has been used to detect DNAm. As each DNA strand passes through a pore, it produces electrical signals which have been found to be sensitive to epigenetic changes in the nucleotides, and a hidden Markov model (HMM) was used to analyze MinION data to detect 5-methylcytosine (5mC) DNA modification.[6] The model was trained using synthetically methylated E. coli DNA and the resulting signals measured by the nanopore technology. Then the trained model was used to detect 5mC in MinION genomic reads from a human cell line which already had a reference methylome. The classifier has 82% accuracy in randomly sampled singleton sites, which increases to 95% when more stringent thresholds are applied.[6]
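
The decision rule inside such a signal-based caller can be illustrated with a log-likelihood ratio between two Gaussian current models, one for methylated and one for unmethylated cytosine. This is only a minimal sketch of the idea: the current means, standard deviation, and event values below are invented, not parameters of the trained MinION model; only the 2.0 score threshold echoes values reported for tools of this kind.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def llr_methylated(currents, mu_mod, mu_unmod, sigma):
    """Summed per-event log-likelihood ratio: methylated vs unmethylated model."""
    return sum(gaussian_logpdf(x, mu_mod, sigma) - gaussian_logpdf(x, mu_unmod, sigma)
               for x in currents)

# Hypothetical picoampere event means at one CpG site across several reads.
events = [82.1, 81.7, 83.0, 82.4]
score = llr_methylated(events, mu_mod=82.0, mu_unmod=85.0, sigma=1.5)
print("call 5mC" if score > 2.0 else "call unmodified", f"(score={score:.2f})")
```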

Other methods address different types of DNA modifications using the MinION platform. Stoiber et al. examined 4-methylcytosine (4mC) and 6-methyladenine (6mA), along with 5mC, and also created software to directly visualize the raw MinION data in a human-friendly way.[13] Here they found that in E. coli, which has a known methylome, event windows of 5 base pairs can be used to divide and statistically analyze the raw MinION electrical signals. A straightforward Mann-Whitney U test can detect modified portions of the E. coli sequence, as well as further split the modifications into 4mC, 6mA or 5mC regions.[13]
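
A minimal sketch of that windowed test using SciPy is shown below; the 5-bp window mirrors the study's event windows, but the signal arrays, noise model, and significance threshold are fabricated for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def flag_modified_windows(sample_signal, control_signal, win=5, alpha=1e-3):
    """Compare per-position current levels in 5-bp windows between a native
    sample and an unmodified (e.g., PCR-amplified) control."""
    flagged = []
    n = min(len(sample_signal), len(control_signal))
    for start in range(0, n - win + 1, win):
        s = sample_signal[start:start + win].ravel()
        c = control_signal[start:start + win].ravel()
        _, p = mannwhitneyu(s, c, alternative="two-sided")
        if p < alpha:
            flagged.append((start, start + win))
    return flagged

# Fabricated signals: rows = genome positions, columns = reads covering them.
rng = np.random.default_rng(0)
control = rng.normal(90.0, 2.0, size=(50, 20))
sample = control + 0.0
sample[20:25] += 4.0  # a simulated modification-induced current shift
print(flag_modified_windows(sample, control))  # expect [(20, 25)]
```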

It seems likely that in the future, MinION raw data will be used to detect many different epigenetic marks in DNA.

PacBio sequencing has also been used to detect DNA methylation. In this platform, the pulse width – the width of a fluorescent light pulse – corresponds to a specific base. In 2010 it was shown that the interpulse distance differs between control and methylated samples, and that there is a "signature" pulse width for each methylation type.[12] In 2012, the binding sites of DNA methyltransferases were characterized using the PacBio platform.[14] The detection of N6-methylation in C. elegans was shown in 2015.[15] DNA methylation on N6-adenine using the PacBio platform in mouse embryonic stem cells was shown in 2016.[16]

Other forms of DNA modifications – from heavy metals, oxidation, or UV damage – are also possible avenues of research using Oxford Nanopore and PacBio third generation sequencing.

Drawbacks

Processing of the raw data – such as normalization to the median signal – was needed for MinION raw data, reducing the real-time capability of the technology.[13] Consistency of the electrical signals is still an issue, making it difficult to call a nucleotide accurately. MinION has low throughput; since multiple overlapping reads are hard to obtain, this further leads to accuracy problems in downstream DNA modification detection. Both the hidden Markov model and statistical methods used with MinION raw data require repeated observations of DNA modifications for detection, meaning that individual modified nucleotides need to be consistently present in multiple copies of the genome, e.g. in multiple cells or plasmids in the sample.

Coverage requirements for the PacBio platform likewise vary with the type of methylation being sought. As of March 2017, other epigenetic factors like histone modifications have not been discoverable using third-generation technologies. Longer patterns of methylation are often lost because smaller contigs still need to be assembled.

Transcriptomics

Transcriptomics is the study of the transcriptome, usually by characterizing the relative abundances of messenger RNA molecules in the tissue under study. According to the central dogma of molecular biology, genetic information flows from double stranded DNA molecules to single stranded mRNA molecules, which can then be readily translated into functional protein molecules. By studying the transcriptome, one can gain valuable insight into the regulation of gene expression.

While expression levels can be depicted more or less accurately by second generation sequencing (the actual abundances of the population of transcripts can be assumed to be randomly sampled), transcript-level information remains an important challenge.[17] As a consequence, the role of alternative splicing in molecular biology remains largely elusive. Third generation sequencing technologies hold promising prospects for resolving this issue by enabling sequencing of mRNA molecules at their full lengths.

Alternative splicing

Alternative splicing (AS) is the process by which a single gene may give rise to multiple distinct mRNA transcripts and consequently different protein translations.[18] Some evidence suggests that AS is a ubiquitous phenomenon and may play a key role in determining the phenotypes of organisms, especially in complex eukaryotes; all eukaryotes contain genes consisting of introns that may undergo AS. In particular, it has been estimated that AS occurs in 95% of all human multi-exon genes.[19] AS has undeniable potential to influence myriad biological processes. Advancing knowledge in this area has critical implications for the study of biology in general.

Transcript reconstruction

The current generation of sequencing technologies produces only short reads, putting a tremendous limitation on the ability to detect distinct transcripts; short reads must be reverse engineered into the original transcripts that could have given rise to the observed reads.[20] This task is further complicated by highly variable expression levels across transcripts, and consequently variable read coverage across the sequence of a gene.[20] In addition, exons may be shared among individual transcripts, rendering unambiguous inferences essentially impossible.[18] Existing computational methods make inferences based on the accumulation of short reads at various sequence locations, often by making simplifying assumptions.[20] Cufflinks takes a parsimonious approach, seeking to explain all the reads with the fewest possible number of transcripts.[21] StringTie, on the other hand, attempts to estimate transcript abundances while simultaneously assembling the reads.[20] These methods, while reasonable, may not always identify real transcripts.
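
The parsimony principle behind Cufflinks can be caricatured as a smallest-set-cover problem: choose the fewest candidate transcripts whose exon chains explain every observed read. The greedy sketch below uses invented reads and candidates; Cufflinks itself solves a minimum path cover over an overlap graph, not this toy formulation.

```python
def explains(transcript, read):
    """A transcript explains a read if the read's exon chain occurs as a
    contiguous block of the transcript's exon chain."""
    t, r = list(transcript), list(read)
    return any(t[i:i + len(r)] == r for i in range(len(t) - len(r) + 1))

def parsimonious_transcripts(reads, candidates):
    """Greedy set cover: pick the fewest candidates explaining every read."""
    unexplained = set(reads)
    chosen = []
    while unexplained:
        best = max(candidates, key=lambda t: sum(explains(t, r) for r in unexplained))
        covered = {r for r in unexplained if explains(best, r)}
        if not covered:
            break  # leftover reads fit no candidate
        chosen.append(best)
        unexplained -= covered
    return chosen

# Reads and candidate isoforms as chains of exon numbers (all invented).
reads = [(1, 2), (2, 3), (1, 3), (3, 4)]
candidates = [(1, 2, 3), (1, 3, 4), (2, 3, 4)]
print(parsimonious_transcripts(reads, candidates))  # [(1, 2, 3), (1, 3, 4)]
```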

A study published in 2008 surveyed 25 existing transcript reconstruction protocols.[17] Its evidence suggested that existing methods are generally weak in assembling transcripts, though the ability to detect individual exons is relatively intact.[17] According to the estimates, the average sensitivity for detecting exons across the 25 protocols is 80% for Caenorhabditis elegans genes.[17] In comparison, transcript identification sensitivity decreases to 65%. For humans, the study reported an exon detection sensitivity averaging 69%, while transcript detection sensitivity averaged a mere 33%.[17] In other words, for humans, existing methods are able to identify less than half of all existing transcripts.

Third generation sequencing technologies have demonstrated promising prospects in solving the problem of transcript detection as well as mRNA abundance estimation at the level of transcripts. While error rates remain high, third generation sequencing technologies have the capability to produce much longer read lengths.[22] Pacific Biosciences has introduced the Iso-Seq platform, proposing to sequence mRNA molecules at their full lengths.[22] It is anticipated that Oxford Nanopore will put forth similar technologies. The trouble with higher error rates may be alleviated by supplementary high-quality short reads. This approach has been previously tested and reported to reduce the error rate by more than threefold.[23]

Metagenomics

Metagenomics is the analysis of genetic material recovered directly from environmental samples.

Advantages

The main advantage of third-generation sequencing technologies in metagenomics is their speed of sequencing in comparison to second generation techniques. Speed of sequencing is important, for example, in the clinical setting (e.g., pathogen identification), to allow for efficient diagnosis and timely clinical action.

Oxford Nanopore's MinION was used in 2015 for real-time metagenomic detection of pathogens in complex, high-background clinical samples. The first Ebola virus (EBOV) read was sequenced 44 seconds after data acquisition.[24] Mapping of reads to the genome was uniform, with at least one read mapping to >88% of the genome. The relatively long reads allowed a near-complete viral genome to be sequenced to high accuracy (97–99% identity) directly from a primary clinical sample.[24]

A common phylogenetic marker for microbial community diversity studies is the 16S ribosomal RNA gene. Both MinION and PacBio's SMRT platform have been used to sequence this gene.[25][26] In this context the PacBio error rate was comparable to that of shorter reads from 454 and Illumina's MiSeq sequencing platforms.[citation needed]

Drawbacks

MinION's high error rate (~10–40%) prevented identification of antimicrobial resistance markers, for which single-nucleotide resolution is necessary. For the same reason, eukaryotic pathogens were not identified.[24] Ease of carryover contamination when re-using the same flow cell (standard wash protocols don't work) is also a concern. Unique barcodes may allow for more multiplexing. Furthermore, performing accurate species identification for bacteria, fungi and parasites is very difficult, as they share a large portion of their genomes, and some differ by only <5%.

The per-base sequencing cost is still significantly higher than that of MiSeq. However, there is the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection of the Sanger approach;[25] this could greatly help the identification of organisms in metagenomics.

from Grokipedia
Third-generation sequencing (TGS), also known as long-read sequencing, refers to advanced DNA and RNA sequencing technologies that analyze individual nucleic acid molecules in real time without prior PCR amplification, generating reads ranging from thousands to millions of base pairs in length.[1][2] Introduced in the early 2010s, TGS emerged as a response to the limitations of second-generation sequencing (SGS), such as short read lengths (typically 150–300 bp) that hinder accurate assembly of repetitive genomic regions, detection of structural variants, and phasing of haplotypes.[1][3]

The primary platforms driving TGS are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). PacBio employs single-molecule real-time (SMRT) sequencing, in which a DNA polymerase incorporates fluorescently labeled nucleotides in zero-mode waveguides, allowing continuous long-read sequencing with average lengths of 15–20 kb and accuracy up to 99.95% using circular consensus methods.[1][2] In contrast, ONT uses nanopore-based sequencing, passing DNA or RNA through protein nanopores to measure changes in ionic current, enabling ultra-long reads up to 4 Mb, real-time analysis, and direct detection of base modifications like methylation without bisulfite conversion.[1][3] Notable devices include PacBio's Revio system, which achieves up to 360 Gb per day, and ONT's portable MinION (up to 50 Gb) and high-throughput PromethION (up to 13.3 Tb).[4][5]

TGS offers significant advantages over SGS, including improved de novo genome assembly, resolution of complex structural variations (e.g., insertions, deletions, inversions), and comprehensive transcriptomics through full-length isoform identification.[2][3] It also facilitates epigenomic studies by directly sequencing modified bases and supports applications in metagenomics, cancer genomics, and rapid pathogen detection, such as during the Ebola and SARS-CoV-2 outbreaks.[1][2] Despite these benefits, challenges persist, including higher per-base error rates (5–15% before correction) and the need for sophisticated computational tools to handle long reads.[2][3] Ongoing advancements, such as improved base-calling algorithms and hybrid approaches combining TGS with SGS, promise to reduce costs and enhance accuracy; recent 2025 developments, including PacBio's SPRQ-Nx chemistry and ONT's PromethION Plus flow cells, further advance throughput and multiomic capabilities, positioning TGS as a cornerstone of precision medicine and large-scale genomic projects.[3][6][7]

Overview

Definition and key characteristics

Third-generation sequencing (TGS), also referred to as long-read sequencing, encompasses a suite of DNA and RNA sequencing technologies designed to generate extended sequence reads ranging from 10 kilobases (kb) to over 1 megabase (Mb) in length, facilitating the interrogation of complex genomic regions such as repetitive sequences and structural variants that challenge shorter-read methods.[8] These platforms achieve this by directly sequencing native nucleic acid molecules, bypassing the need for fragmentation or amplification, which minimizes biases associated with polymerase chain reaction (PCR) processes.[2] Key characteristics of TGS include single-molecule resolution, where individual DNA or RNA strands are sequenced without clonal amplification, enabling the detection of rare variants and heterogeneity at the molecular level.[9] Real-time data acquisition is another hallmark, allowing immediate base calling during the sequencing process rather than post-sequencing assembly.[1] Additionally, TGS supports the native detection of epigenetic modifications, such as DNA methylation, by analyzing kinetic signatures or ionic current changes inherent to the unmodified molecule, providing insights into gene regulation without separate bisulfite conversion steps.[10]

In comparison to earlier generations, first-generation Sanger sequencing produces short reads of approximately 800–1,000 base pairs (bp) with low throughput, suitable for targeted validation but inefficient for large-scale genomics.[11] Second-generation next-generation sequencing (NGS) methods, such as Illumina platforms, yield high-throughput short reads of 100–300 bp but rely on PCR amplification, introducing biases and complicating the resolution of repetitive or low-complexity regions.[2] TGS shifts the paradigm by prioritizing read length and structural fidelity over per-base accuracy (which has improved to >99% in recent iterations), though it initially traded higher error rates for these benefits.

TGS technologies first emerged in the late 2000s, with Pacific Biosciences launching its single-molecule real-time (SMRT) platform in 2010, and by 2025, ultra-long reads exceeding 4 Mb have been achieved, particularly with nanopore-based systems.[12][13]

Historical development and generational context

The development of DNA sequencing technologies traces back to the first generation, which emerged in the 1970s with Frederick Sanger's chain-termination method introduced in 1977. This technique, relying on dideoxynucleotides and gel electrophoresis, produced reads of less than 1 kb and was pivotal for the Human Genome Project, where it enabled the sequencing of the 3 billion base pair human genome over 13 years from 1990 to 2003.[9]

The second generation of sequencing, launched in the mid-2000s, shifted to high-throughput platforms that amplified and sequenced millions of short DNA fragments in parallel, drastically reducing costs and time. Key early systems included 454 Life Sciences' pyrosequencing instrument in 2005, which generated 400–500 bp reads, and Illumina's Genome Analyzer in 2006, utilizing reversible terminator chemistry for even higher output. These technologies dominated the 2000s and 2010s, facilitating large-scale genomic studies but struggling with repetitive sequences and structural variants due to short read lengths.[9]

Early single-molecule sequencing without amplification began in 2008 with Helicos BioSciences' true single-molecule sequencing platform, but long-read TGS technologies started in 2010 with Pacific Biosciences' launch of its PacBio RS system, introducing single-molecule real-time (SMRT) sequencing for continuous long-read generation. Oxford Nanopore Technologies entered the field in 2014 with the MinION, a USB-powered nanopore device that allowed portable, real-time sequencing of ultra-long strands.[14][15][16]

Subsequent key events in the 2010s and 2020s focused on enhancing TGS accuracy and utility. In 2015, Oxford Nanopore released the MinION for widespread early access, emphasizing its portability for field applications. The 2020s brought major accuracy boosts, including Pacific Biosciences' 2019 introduction of HiFi reads through circular consensus sequencing, yielding >99% accuracy for 15–20 kb reads, and Oxford Nanopore's 2020 rollout of adaptive sampling for real-time targeted enrichment without library modifications.[17][18][19]

TGS represents a generational shift by overcoming second-generation limitations in assembling complex genomes, particularly in repetitive regions and structural variant detection, through reads often exceeding 10 kb. By 2025, TGS adoption has accelerated via hybrid workflows combining short- and long-read data, with the market valued at approximately USD 881 million in 2025 and exhibiting a compound annual growth rate of over 20% through 2032, reflecting its integration into routine genomics.[9][20]

From 2023 to 2025, advancements emphasized precision, such as Oxford Nanopore's R10.4 pores paired with Q20+ chemistry, achieving >99% raw read accuracy for DNA sequencing, further refined by AI-driven error correction in the Dorado basecaller for consistent high-quality outputs. In October 2025, Oxford Nanopore announced the PromethION Plus flow cell, which significantly increases output for large-scale genomic studies.[21][22][23]

Technologies

Pacific Biosciences SMRT sequencing

Pacific Biosciences' Single Molecule, Real-Time (SMRT) sequencing technology enables the direct observation of DNA synthesis by individual DNA polymerase molecules, providing long-read sequencing data with high accuracy through consensus generation.[24] Introduced commercially in 2010 following foundational research published in 2009, SMRT sequencing has evolved to support scalable genomic applications, including the 2022 launch of the Revio system, which facilitates sequencing of human genomes at 30x coverage in approximately 24 hours using a single SMRT Cell.[24][4] Recent 2025 updates include the announcement of SPRQ-Nx chemistry for enhanced throughput on Revio, with beta testing beginning in November 2025.[25]

The core mechanism relies on zero-mode waveguides (ZMWs), nanoscale wells etched into a fused silica substrate that confine excitation light to a volume of about 20 nm in depth, allowing real-time optical detection of nucleotide incorporation without illuminating the entire reaction volume.[24] A highly processive DNA polymerase, such as a modified phi29, incorporates fluorescently labeled nucleotides – each with a distinct fluorophore attached to the terminal phosphate – into a growing DNA strand complementary to the template.[24] As incorporation occurs, the fluorophore is cleaved and diffuses away, producing a characteristic light pulse captured by high-speed cameras; the sequence of pulses corresponds to the DNA template sequence.[24] To enable multiple observations of the same molecule, the template is prepared as a circular SMRTbell structure, where the polymerase repeatedly traverses the insert region, generating subreads that can be aligned for consensus.[24][26]

The workflow begins with sample preparation to create SMRTbell libraries: high-molecular-weight DNA is sheared to desired fragment sizes (typically 10-20 kb for HiFi applications), ends are repaired and A-tailed, and hairpin adapters are ligated to form the double-stranded circular template.[27] Libraries are then bound to polymerase and loaded onto SMRT Cells containing millions of ZMWs (8 million on Sequel IIe, 25 million on Revio), where sequencing occurs in real time.[28][29] Raw data produce continuous long reads (CLR) from single passes or high-fidelity (HiFi) reads via circular consensus sequencing (CCS), where multiple subreads (typically 10-30 passes) are computationally aligned to generate a consensus sequence.[26]

SMRT sequencing specifications vary by instrument and chemistry. On the Sequel IIe system, CLR reads exceed 20 kb, while HiFi reads average 15-20 kb with >99% accuracy (Q30 or better).[30] The Revio system enhances throughput, yielding up to 120 Gb of HiFi data per SMRT Cell in a 24-hour run at 15-20 kb read lengths, supporting two phased 20x human genomes per cell.[28] HiFi accuracy derives from consensus over multiple polymerase passes, approximated by the error rate formula:
$$ \text{error rate} \approx \frac{\text{initial error}}{\sqrt{n_{\text{passes}}}} $$
where $ n_{\text{passes}} $ is the number of passes (typically 10-30); repeated passes reduce the ~13-15% raw error rate to a consensus accuracy above 99.9% at around 10 passes.[26][31]
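
As a back-of-the-envelope illustration of why repeated passes drive down the error rate, the sketch below uses a simple per-base majority-vote model of circular consensus. This is a toy calculation, not PacBio's actual consensus algorithm: it assumes independent passes and ignores indels and correlated errors, so the absolute numbers are illustrative only.

```python
from math import comb

def majority_error(per_pass_error, n_passes):
    """P(a majority of independent passes miscall a base) under a toy
    majority-vote model; real CCS errors include indels, so this is optimistic."""
    k_needed = n_passes // 2 + 1
    return sum(comb(n_passes, k)
               * per_pass_error**k * (1 - per_pass_error)**(n_passes - k)
               for k in range(k_needed, n_passes + 1))

# With a ~14% single-pass error rate, consensus error falls rapidly with passes.
for n in (1, 5, 11, 21):
    print(f"{n:2d} passes -> consensus error ~ {majority_error(0.14, n):.2e}")
```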

Oxford Nanopore sequencing

Oxford Nanopore sequencing, developed by Oxford Nanopore Technologies (ONT), utilizes a protein nanopore embedded in a membrane to detect nucleic acid sequences through changes in ionic current. In this method, DNA or RNA molecules are driven through the nanopore by an applied voltage, while a motor protein, such as a helicase, controls the translocation speed to approximately 450 bases per second, allowing sufficient time for signal detection. As the nucleotide sequence passes through the nanopore – typically variants like R9 or R10 – the partial blockage of the pore by each base or k-mer (short sequence motif) generates distinct disruptions in the ionic current flowing across the membrane. These current variations are recorded as a time-series signal, which is then analyzed to infer the underlying sequence.[32][33][34]

The workflow begins with library preparation, where native barcodes are ligated to DNA or RNA fragments to enable multiplexing of up to 96 samples without amplification, preserving epigenetic modifications. Prepared libraries are loaded into flow cells containing thousands of nanopores, and sequencing occurs on portable devices like the MinION or high-throughput platforms such as the PromethION. During or after sequencing, raw current signals undergo real-time basecalling using AI-driven models like Dorado, which employs neural networks to translate signals into FASTQ-formatted sequences with integrated modification calling. This process supports both single-end and duplex modes, where complementary strands are sequenced for consensus accuracy.[35][36][37]

Key specifications include ultra-long read lengths exceeding 1 megabase, enabling assembly of complex genomes without fragmentation. Raw single-read accuracy typically ranges from 90-95%, but advances in chemistry, such as the Q20+ kits introduced post-2023, achieve over 99% consensus accuracy through duplex basecalling. Throughput varies by device: the MinION, launched in 2014 as a USB-powered, palm-sized sequencer for field use, yields up to 48 gigabases per flow cell, while a single PromethION flow cell delivers up to 290 gigabases in 72 hours. Recent 2024-2025 updates incorporate adaptive sampling, a software feature that enables real-time targeted enrichment by selectively slowing or rejecting off-target reads during sequencing, and the introduction of PromethION Plus Flow Cells in limited release in Q4 2025 for higher output.[38][5][7]

A distinctive feature is the direct detection of base modifications like 5-methylcytosine (5mC) and N6-methyladenine (6mA) by modeling characteristic current signal perturbations, eliminating the need for bisulfite conversion or other harsh treatments that can damage DNA. This native approach integrates modification calling during basecalling, providing simultaneous sequence and epigenome data at single-molecule resolution with accuracies exceeding 90% for 5mC in high-coverage regions.[39][40][41]

Both Pacific Biosciences SMRT sequencing and Oxford Nanopore sequencing enable long-read analysis by processing extended DNA strands through specialized sensor-based mechanisms. SMRT sequencing utilizes fluorescence detection within zero-mode waveguides to observe nucleotide incorporation in real time, while Oxford Nanopore sequencing measures disruptions in ionic current as nucleic acids translocate through protein nanopores.
These methods provide superior resolution of repetitive genomic regions compared to short-read technologies, facilitating the characterization of complex structural variations.[42]
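
To make the k-mer-to-current mapping concrete, the toy simulation below assigns each 4-mer occupying the pore a characteristic mean current and emits a noisy step trace as the strand advances one base at a time. The current table, noise level, and sampling rate are all fabricated; real pore models use empirically measured k-mer current tables, and basecalling is the inverse problem of recovering the sequence from such a trace.

```python
import itertools
import random

random.seed(1)
K = 4
# Fabricated pore model: each k-mer in the pore -> a mean current level (pA).
kmer_model = {"".join(km): 60.0 + 0.15 * i
              for i, km in enumerate(itertools.product("ACGT", repeat=K))}

def simulate_squiggle(seq, noise_sd=0.8, samples_per_step=3):
    """Emit noisy current samples as the strand steps through the pore."""
    trace = []
    for i in range(len(seq) - K + 1):
        level = kmer_model[seq[i:i + K]]
        trace += [random.gauss(level, noise_sd) for _ in range(samples_per_step)]
    return trace

squiggle = simulate_squiggle("ATGGCGTGCATT")
print(len(squiggle), "samples:", [round(x, 1) for x in squiggle[:6]])
```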

Emerging and alternative platforms

While Pacific Biosciences and Oxford Nanopore dominate third-generation sequencing (TGS), several emerging platforms explore alternative single-molecule approaches to address limitations in throughput, accuracy, and cost, with recent 2025 updates including enhancements to established systems like PacBio's SPRQ-Nx chemistry and ONT's PromethION Plus. These technologies often integrate solid-state or hybrid detection methods but remain in developmental stages with limited commercial adoption as of 2025.[2][25][7]

One influential early platform was Helicos BioSciences' HeliScope, launched in 2009 as the first true single-molecule sequencing (tSMS) system. This technology immobilized DNA strands on a flow cell surface and used fluorescently labeled reversible terminators for direct imaging of individual molecules, bypassing amplification biases common in earlier generations. Although Helicos filed for bankruptcy in 2012, its demonstration of unbiased, high-throughput single-molecule readout paved the way for subsequent TGS innovations by proving the feasibility of direct nucleic acid interrogation without cloning.[2][43]

Quantapore represents a solid-state nanopore alternative, leveraging silicon nanopores integrated with CMOS chips for massively parallel detection. The platform threads single DNA molecules through nanopores while using optical readout to monitor ionic current changes and base-specific signals, aiming for long reads and reduced costs through scalable chip fabrication. As of 2025, Quantapore remains focused on research prototypes without widespread commercial availability, emphasizing hybrid integration for proteomics and genomics applications.[44][45]

Stratos Genomics' sequencing by expansion (SBX) technology, acquired by Roche in 2020, introduces a novel amplification-free method using Xpandomers – synthetic polymers that biochemically convert and expand DNA templates by over 50-fold to encode sequence information with high signal-to-noise ratios. These expanded molecules are then sequenced via nanopore sensing of reporter codes, combining short-read accuracy with long-range phasing potential. Unveiled by Roche in February 2025, SBX is in development with demonstrated F1 scores >99.8% for single nucleotide variants (SNVs) and >99.7% for insertions/deletions (InDels) in whole genome samples, along with collaborations for clinical validation. In October 2025, Roche announced major advances, including a Guinness World Record for sequencing a human genome in under 4 hours, highlighting its speed and reliability, though it holds no significant market share yet and prioritizes cost-effective scalability.[46][47][48][49]

Advantages

Extended read lengths and structural resolution

Third-generation sequencing (TGS) technologies, such as those from Pacific Biosciences and Oxford Nanopore, produce reads typically exceeding 10 kilobases (kb) in length, enabling them to span repetitive genomic regions that fragment short-read next-generation sequencing (NGS) assemblies.[50] This capability resolves ambiguities in complex structures like segmental duplications and tandem repeats, which often exceed the 100-300 base pair limits of NGS reads.[51] For instance, the telomere-to-telomere (T2T) assembly of the human CHM13 haplotype in 2022 achieved a fully contiguous 3.055 billion base pair genome by leveraging high-fidelity long reads to bridge previously unresolvable gaps in centromeres and acrocentric regions.[51]

These extended read lengths facilitate superior haplotype phasing. In diverse human populations, TGS has enabled phasing of a high proportion of structural variants across haplotypes, revealing allele-specific configurations in repetitive loci that NGS struggles to disambiguate. Additionally, TGS excels at detecting large insertions and deletions (indels) greater than 50 base pairs, capturing full event spans that short reads misalign or overlook, thus improving resolution of structural variants contributing to phenotypic diversity. In bacterial genomics, TGS routinely assembles over 99% of closed circular chromosomes into a single contig, contrasting with NGS's typical fragmentation into dozens of contigs due to repeats and GC bias.[52]

Recent 2025 analyses of eukaryotic assemblies demonstrate improvements in gap closure rates compared to hybrid NGS approaches, particularly in polyploid crops and model organisms where long reads scaffold heterochromatic regions without additional mate-pair data. This inherent scaffolding power of long reads eliminates the need for paired-end libraries, streamlining workflows for high-resolution structural annotation.[53]

Native detection of molecular modifications

Third-generation sequencing technologies, such as Pacific Biosciences SMRT and Oxford Nanopore, enable the direct detection of epigenetic modifications like 5-methylcytosine (5mC) and N6-methyladenine (6mA) in native DNA molecules, avoiding the need for bisulfite conversion or other chemical treatments that can introduce artifacts and DNA damage.[54] In these platforms, modifications are identified through disruptions in sequencing signals: Oxford Nanopore detects them as shifts in ionic current blockade as the modified base passes through the nanopore, while PacBio SMRT sequencing observes variations in polymerase kinetics, such as altered inter-pulse durations (IPD) and pulse widths during nucleotide incorporation.[53] This native approach preserves the integrity of the DNA and allows simultaneous sequencing and modification profiling at single-molecule resolution.[55] Specialized software tools facilitate the analysis of these signals for accurate modification calling. For Oxford Nanopore data, Nanopolish employs a hidden Markov model (HMM) trained on signal patterns to distinguish modified from unmodified bases, computing a modification probability score as the log-likelihood ratio of the modified versus unmodified models:
$$ \text{Score} = \log \left( \frac{P(\text{data} \mid \text{modified})}{P(\text{data} \mid \text{unmodified})} \right) $$
A positive score above a threshold (typically 2.0) indicates evidence of modification, enabling site-specific calling of 5mC and 6mA.[55] Similarly, PacBio's SMRT-Link software performs base modification detection by modeling kinetic signatures, identifying motifs associated with 4mC, 5mC, and 6mA through IPD ratio analysis and Phred-scaled quality scores.[53] These tools support comprehensive methylome profiling without amplification biases.[56] By 2025, these methods have achieved high accuracy for 6mA detection in prokaryotic genomes, with PacBio SMRT-Link reaching an F1 score of 0.958 (equivalent to >95% balanced accuracy) in wild-type Pseudomonas syringae, demonstrating low false positive rates at the site level.[53] This precision enables single-molecule epigenome mapping, revealing heterogeneous modification patterns across individual DNA strands and facilitating studies of bacterial restriction-modification systems.[53] Such capabilities extend to broader epigenetic applications, including the integration with structural variant detection for comprehensive genome annotation.[54]

Real-time processing and portability

Third-generation sequencing (TGS) platforms, particularly those from Oxford Nanopore Technologies (ONT), enable real-time basecalling, where raw electrical signals from nanopores are converted into nucleotide sequences as the DNA or RNA strand transits the pore, allowing immediate data analysis without post-sequencing delays.[36] This capability supports adaptive sampling, a process in which the sequencer dynamically rejects off-target DNA molecules and directs the process toward regions of interest, such as specific genes, enabling users to stop sequencing upon reaching target coverage and optimize resource use.[57] For instance, basecalling occurs in real time via software like Guppy, which processes signals on-the-fly to inform decisions during the run.[58]

The portability of TGS devices further enhances their utility in field settings, exemplified by ONT's MinION, a compact sequencer weighing approximately 87 grams and powered via USB connection to a laptop or portable computer, facilitating deployment in remote or resource-limited environments.[59] This design supports on-site sequencing without reliance on laboratory infrastructure, making it ideal for rapid response scenarios. Real-world applications include the 2015 Ebola outbreak in West Africa, where MinION devices enabled portable genome sequencing of Ebola virus samples, yielding results in less than 24 hours from sample receipt, with the sequencing itself taking 15–60 minutes.[60] More recently, in 2025, a nanopore-based platform was used for near real-time genotyping of wheat stem rust pathogen lineages in agricultural fields, allowing on-site detection of fungicide sensitivity to inform immediate crop management decisions.[61]

To facilitate real-time workflows, ONT's EPI2ME platform integrates cloud-based or local analysis pipelines that process data during sequencing runs, supporting applications like adaptive enrichment and preliminary variant calling.[62] For short reads, basecalling latency is minimal, often enabling decisions within seconds to minutes, which is critical for time-sensitive tasks such as pathogen identification.[57] Streaming basecalling, enhanced by GPU acceleration, further improves efficiency; for example, neural network-based tools like those in Guppy achieve up to 4-fold speedups on a single GPU compared to CPU-only processing, with scalability to multi-GPU setups yielding even greater performance gains for live runs.[63]
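
Adaptive sampling reduces to a simple decision loop: basecall the first stretch of each molecule, check whether it maps to the target set, and eject it otherwise. The runnable toy below captures only that logic; it uses exact string matching as a stand-in aligner, invents all sequences, and does not touch ONT's actual read-until interface.

```python
REFERENCE_TARGET = "ACGTTGCAACGGTCATCGGA" * 5  # toy region of interest

def maps_to_target(prefix, reference, seed_len=8):
    """Toy 'aligner': any seed-length exact match counts as on-target."""
    return any(prefix[i:i + seed_len] in reference
               for i in range(0, len(prefix) - seed_len + 1, seed_len))

def adaptive_sampling(incoming_reads, reference, prefix_bases=12):
    """Keep on-target molecules, eject the rest to free the pore."""
    kept, ejected = [], []
    for read in incoming_reads:        # reads arriving in real time
        prefix = read[:prefix_bases]   # stand-in for live basecalling
        if maps_to_target(prefix, reference):
            kept.append(read)          # sequence to completion
        else:
            ejected.append(read)       # reversing the voltage frees the pore
    return kept, ejected

reads = ["ACGTTGCAACGGTCATCGGAACGT",   # on-target
         "TTTTTTAAAACCCCGGGGTTTTAA"]   # off-target
kept, ejected = adaptive_sampling(reads, REFERENCE_TARGET)
print(len(kept), "kept;", len(ejected), "ejected")
```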

Challenges

Error rates and correction strategies

Third-generation sequencing (TGS) technologies are characterized by higher raw error rates than short-read methods, primarily due to the challenges of single-molecule detection without amplification. In Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing, continuous long reads (CLR) exhibit error rates of approximately 8-15%, stemming from polymerase kinetics and signal noise during single-pass traversal of the DNA molecule. These errors are predominantly substitutions and insertions/deletions (indels), with random distribution but occasional systematic biases in repetitive regions. In contrast, Oxford Nanopore Technologies (ONT) sequencing shows raw error rates of 5-15%, with systematic homopolymer errors being particularly prominent; these arise from imprecise ionic current measurements in stretches of identical bases, often leading to over- or underestimation of repeat lengths by 5-10%. Post-correction strategies can reduce ONT errors to 1-2% at consensus level, while PacBio's high-fidelity (HiFi) reads, derived from multiple passes, achieve error rates below 0.1%.[18]

Correction strategies for TGS errors leverage both platform-specific consensus mechanisms and external data integration. PacBio's circular consensus sequencing (CCS) generates HiFi reads by repeatedly sequencing the same molecule (typically 10-20 passes), yielding accuracy that follows the binomial model for error reduction:
$$ \text{Consensus accuracy} = 1 - (1 - p)^n $$
where $ p $ represents single-pass accuracy (around 85%) and $ n $ is the coverage depth or number of passes.[31] This approach corrects random errors effectively but requires sufficient subread coverage to resolve systematic issues. Hybrid polishing combines TGS assemblies with high-accuracy short-read next-generation sequencing (NGS) data; for instance, NextPolish aligns Illumina-like short reads to long-read contigs and iteratively corrects single-nucleotide variants (SNVs) and indels, achieving up to 99% polishing accuracy in microbial genomes.[64]

Advancements in artificial intelligence have further refined error correction, especially for ONT. Deep learning basecallers like Bonito, an open-source PyTorch-based model, process raw signal data to predict bases with reduced indel rates, outperforming earlier recurrent neural network (RNN) models by 2-5% in accuracy on R9 pores.[65] By 2025, ONT's Q20+ kits, incorporating R10.4 pore chemistry, have lowered raw error rates to under 1% (Q20+ modal accuracy >99%), while machine learning enhancements on R10 pores enable up to 99.75% single-read accuracy through optimized convolutional architectures.[41] These strategies collectively address TGS error profiles, enabling reliable downstream applications like de novo assembly with minimal residual bias.
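
The hybrid-polishing idea can be reduced to a pileup majority vote: align accurate short reads to a draft long-read contig and replace each draft base with the most common aligned base. The sketch below assumes toy, pre-positioned, substitution-only alignments; real polishers such as NextPolish also model indels and use alignment and base qualities.

```python
from collections import Counter

def polish(draft, placed_reads):
    """Majority-vote polishing of a draft contig.
    placed_reads: (start_position, read_sequence) pairs, assumed gap-free."""
    votes = [Counter() for _ in draft]
    for start, read in placed_reads:
        for offset, base in enumerate(read):
            if 0 <= start + offset < len(draft):
                votes[start + offset][base] += 1
    # Keep the draft base wherever no short read covers a position.
    return "".join(v.most_common(1)[0][0] if v else d
                   for v, d in zip(votes, draft))

draft = "ACGTAXGTAC"                  # 'X' marks a long-read miscall
reads = [(2, "GTAAGT"), (4, "AAGTAC"), (0, "ACGTAA")]
print(polish(draft, reads))           # -> ACGTAAGTAC
```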

Throughput, cost, and scalability issues

Third-generation sequencing (TGS) platforms, such as those from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), offer significant advantages in read length but face notable limitations in throughput compared to second-generation sequencing (NGS) systems like Illumina's NovaSeq X. The PacBio Revio system achieves up to 480 Gb of high-fidelity (HiFi) reads per day, enabling the equivalent of approximately 2,500 human whole genomes annually when accounting for typical coverage requirements.[4] Similarly, a single ONT PromethION flow cell generates up to 200 Gb of output, though full PromethION 24 systems can scale to 2.4–4.8 Tb across 24 flow cells in a 72-hour run.[5] In contrast, NGS platforms like the NovaSeq X Plus deliver up to 16 Tb per dual-flow-cell run, often achieving terabase-scale output per day in high-throughput configurations, making TGS generally 10-100 times lower in daily volume for large-scale projects.[66] This disparity arises from TGS's reliance on single-molecule detection, which prioritizes length over parallelization density, limiting its suitability for population-scale studies without extensive multiplexing.

Cost remains a key barrier for TGS adoption, though prices have declined steadily into 2025. ONT sequencing costs approximately $345 per human genome as of 2025, with PromethION flow cells priced around $900–$1,000 for 100–200 Gb outputs, translating to roughly $4.50–$9 per Gb depending on library efficiency and run conditions.[67] PacBio's Revio offers HiFi genomes at approximately $250–300 each as of late 2025, with SMRT cells yielding up to 480 Gb and costs reduced through new SPRQ chemistry, or approximately $2–$3 per Gb.[6] Initial setup costs are higher for portable systems, such as ONT's MinION device at around $1,000-$5,000, compared to NGS where per-Gb rates can fall below $1 through bulk runs.[68] Despite these reductions – driven by improved chemistries and economies of scale – TGS remains 5-10 times more expensive than NGS for routine high-volume tasks like whole-genome resequencing, though it provides better value for applications requiring long reads.[67]

Scalability challenges in TGS stem from hardware constraints and limited reuse options, though 2025 innovations have mitigated some issues. Flow cell reuse is possible with ONT's washing protocols, which remove prior libraries and refresh nanopores, enabling multiple runs per cell and contributing to cost savings of up to 50% in sequential experiments by avoiding full replacements. However, each wash reduces active pores, capping practical reuse at 3-6 cycles with diminishing yields.[69] PacBio SMRT cells are largely single-use, though the Revio's design supports higher parallelism. Recent multiplexing advances, such as PacBio's barcoding for up to 48 samples per Revio SMRT cell and ONT's adaptive sampling for 96+ barcodes, have boosted per-run capacity, allowing efficient scaling for mid-sized cohorts without proportional cost increases.[70] The 2025 introduction of ONT's PromethION Plus flow cell, in limited release as of November, further enhances scalability by delivering significantly increased outputs optimized for 15–30 kb fragments without washing, reducing the per-30x genome cost below $345 through built-in multiplexing efficiencies.[7] These developments position TGS for hybrid workflows, where it complements NGS for structural insights, but full scalability for terabase projects still requires multi-instrument deployments.
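
The per-genome arithmetic above follows directly from flow-cell price and yield. The small calculator below makes the relationship explicit; the example figures are rough 2025 list values echoed from this section, not vendor quotes, and real totals also fold in library costs, multiplexing, and flow-cell reuse.

```python
def cost_per_gb(flowcell_price, yield_gb):
    """Consumables-only sequencing cost per gigabase."""
    return flowcell_price / yield_gb

def cost_per_genome(flowcell_price, yield_gb, coverage=30, genome_gb=3.1):
    """Cost to reach a target depth on a human-sized genome."""
    return cost_per_gb(flowcell_price, yield_gb) * coverage * genome_gb

# Rough figures from this section (assumptions, not quotes):
price, yield_gb = 900, 200  # one ONT PromethION flow cell
print(f"~${cost_per_gb(price, yield_gb):.2f}/Gb, "
      f"~${cost_per_genome(price, yield_gb):.0f} per 30x human genome")
```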

Data management and computational requirements

Third-generation sequencing (TGS) platforms produce exceptionally large datasets, with a single run on high-throughput devices like the Oxford Nanopore PromethION generating over 100 GB of raw data in POD5 format, necessitating robust storage infrastructure for initial handling. This volume arises from the capture of detailed electrical current signals ("squiggles") during sequencing, which are significantly bulkier than the basecalled outputs; for instance, raw POD5 files from Oxford Nanopore Technologies (ONT) can occupy approximately 10 times the storage space per gigabase sequenced compared to next-generation sequencing (NGS) FASTQ files, due to the uncompressed signal traces essential for downstream analyses.[71] Basecalled FASTQ files require about 0.65 GB per gigabase, while aligned BAM files demand around 1.4 GB per gigabase, further amplifying storage needs for processed data.[72]

Computational demands escalate during alignment and assembly, where tools like minimap2 for mapping long reads to reference genomes – such as the human genome – may require 64 GB or more of RAM for high-coverage datasets with long reads, though standard human genome alignments typically use 30–60 GB. Basecalling, a prerequisite step, relies on neural network-based software such as Guppy or the high-performance Dorado from ONT, which convert raw signals to sequences but still demand substantial GPU resources for real-time or batch processing.[73] Assembly tools like Hifiasm address these challenges by leveraging the accuracy of high-fidelity reads (e.g., from PacBio) to produce haplotype-resolved genomes efficiently, often completing human-sized assemblies in hours on multi-core systems with hundreds of GB of memory. Recent advancements in 2025, including AI-driven optimizations in basecalling and alignment pipelines, have reduced overall compute requirements by up to 40% through techniques like accelerated neural networks and efficient data compression, enabling faster processing on standard hardware.[74]

Cloud-based solutions, such as AWS HealthOmics, further mitigate these burdens by providing scalable storage and analysis, achieving up to 72% cost reductions in TGS workflows through optimized resource allocation and parallel computing.[75] A distinctive computational aspect of TGS involves signal-level analysis for epigenetic modifications, where pipelines like Nanocompore perform comparative modeling of raw current traces to detect RNA modifications (e.g., m6A) without basecalling, employing Gaussian mixture models that require specialized scripting and moderate CPU/GPU resources for statistical inference across large signal datasets.[76]
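
The storage figures quoted above reduce to a simple sizing rule. The sketch below applies the per-gigabase ratios from this paragraph (FASTQ ~0.65 GB/Gb, BAM ~1.4 GB/Gb, raw signal roughly ten times the FASTQ footprint) to a planned run size; the ratios are coarse planning numbers, not guarantees.

```python
# GB of disk per Gb of sequence, per the approximate figures in this section.
RATIOS = {"raw signal (POD5)": 6.5,
          "basecalls (FASTQ)": 0.65,
          "alignments (BAM)": 1.4}

def storage_plan(gigabases):
    """Estimate disk needs for one run when raw signal is retained."""
    plan = {kind: gigabases * ratio for kind, ratio in RATIOS.items()}
    plan["total"] = sum(plan.values())
    return plan

for kind, gb in storage_plan(100).items():  # a ~100 Gb PromethION-scale run
    print(f"{kind:>18}: {gb:7.1f} GB")
```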

Genomic applications

De novo and hybrid genome assembly

Third-generation sequencing (TGS) has transformed de novo genome assembly by enabling the construction of complete genomes from raw reads without a reference, primarily through overlap-layout-consensus (OLC) paradigms that leverage long reads to bridge repetitive regions. In OLC approaches, reads are first aligned based on overlaps, then organized into a layout graph, and finally consensus sequences are generated to correct errors; this method excels with TGS data due to read lengths often exceeding 10 kb, allowing resolution of repeats larger than 10 kb that fragment short-read assemblies. For instance, the Canu assembler, designed for noisy long reads from PacBio platforms, employs adaptive k-mer weighting to separate repeats and has successfully produced near-complete assemblies for microbial and eukaryotic genomes, such as a Drosophila melanogaster assembly with contig N50 exceeding 20 Mb.[77]

Hybrid assembly strategies further enhance contiguity by integrating TGS long reads to scaffold or extend contigs from next-generation sequencing (NGS) short reads, combining the high accuracy of short reads with the structural spanning power of long reads. Tools like MaSuRCA use a "mega-reads" approach, where short reads are paired into longer composites aligned to TGS reads, yielding assemblies with dramatically improved metrics; for example, in the model plant Arabidopsis thaliana, MaSuRCA achieved a contig N50 of up to 9.15 Mb, a substantial improvement over NGS-only assemblies. Such hybrid methods typically boost N50 lengths by 10- to 100-fold across diverse taxa, reducing fragmentation in complex regions like centromeres.[78]

A landmark application occurred in 2022 when the Telomere-to-Telomere (T2T) Consortium utilized ultralong Oxford Nanopore Technologies (ONT) reads for scaffolding alongside high-fidelity PacBio reads to assemble the first complete human genome (T2T-CHM13), resolving all gaps including repetitive heterochromatin and achieving a contig N50 of over 100 Mb. By 2025, automated tools like Verkko have streamlined this process, integrating phased assembly graphs with long-read data to produce telomere-to-telomere human assemblies with minimal manual intervention, as demonstrated in diploid samples with chromosome-level contigs. The contig N50 metric quantifies assembly quality, defined as the smallest contig length such that 50% of the total genome length is contained in contigs of that length or longer; TGS routinely achieves N50 values >10 Mb, compared to ~100 kb for typical NGS de novo assemblies, underscoring the former's superiority in capturing large-scale structure.[51]
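
The N50 definition above is easy to verify by computing it directly; the contig lengths in the sketch are invented.

```python
def n50(contig_lengths):
    """Smallest length L such that contigs of length >= L together contain
    at least half of the total assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length

# Total 270; the two largest contigs (90 + 70 = 160) pass the halfway mark,
# so N50 is 70.
print(n50([90, 70, 50, 30, 20, 10]))
```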

Structural variant detection and epigenetic analysis

Third-generation sequencing (TGS) technologies, such as Pacific Biosciences (PacBio) HiFi and Oxford Nanopore Technologies (ONT), enhance structural variant (SV) detection by producing long reads that span repetitive genomic regions and complex rearrangements, allowing alignment to reference genomes to identify variants such as deletions, insertions, inversions, and translocations.[79] Tools like Sniffles align these reads and detect SVs by analyzing discordant alignments, split reads, and coverage depth, achieving high precision (up to 94%) for deletions and duplications.[80] Split-read alignment, in which a read maps partially to either side of an SV breakpoint, enables precise localization of variant boundaries, particularly for insertions and deletions in long-read data.[81] For large SVs exceeding 1 kb, such as inversions and translocations, TGS offers superior resolution compared to short-read next-generation sequencing (NGS), with tools like Sniffles2 and cuteSV reporting recall rates above 90% in benchmarked datasets.[82] In cancer genomes, studies published in 2025 using TGS identified two to six times more SVs than NGS, uncovering hidden complexity in tumor rearrangements that short reads often miss because of fragmentation in repetitive elements.[79] This improved sensitivity stems from the ability of long reads to bridge SV breakpoints directly, reducing false negatives in challenging regions like segmental duplications.[83]

TGS is particularly effective for detecting repeat expansions, a class of structural variant characterized by tandem repeats of DNA sequences that can expand pathologically in neurodegenerative disorders. Long reads from PacBio and ONT platforms can span expansive repetitive regions, such as the GGGGCC hexanucleotide repeat in the C9orf72 gene associated with amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), or the CAG trinucleotide repeat in the HTT gene in Huntington's disease, which short-read methods often fail to resolve accurately because they cannot traverse large repeat tracts.[84][85] Studies have shown that TGS enables precise quantification of repeat lengths, facilitating diagnosis and research into these disorders by identifying expansions that exceed hundreds or thousands of repeats.[86]

In epigenetic analysis, TGS enables direct detection of DNA modifications without bisulfite conversion, leveraging signal changes in the raw sequencing data to call 5-methylcytosine (5mC) at single-molecule resolution.[87] The f5c tool, an optimized re-implementation of Nanopolish, processes ONT event alignments to quantify 5mC probabilities, facilitating genome-wide methylation profiling with base-level accuracy.[87] TGS also supports phasing of epigenetic modifications across haplotypes by integrating methylation calls with variant phasing, as demonstrated by methods like MethPhaser, which resolve allele-specific methylation patterns in diploid genomes.[88] ONT sequencing has further advanced the mapping of N6-methyladenine (6mA) in bacterial genomes at the single-molecule level, detecting modification motifs through ionic current disruptions without amplification bias.[89] Benchmarks published in 2025 confirm ONT's efficacy for 6mA calling, with tools achieving high concordance across replicates in diverse bacterial strains, aiding the study of restriction-modification systems.[53]
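As a minimal illustration of the repeat-length quantification described above, the following Python sketch counts the longest run of a given repeat unit (for example, CAG in HTT) within a read that spans the locus. It is a naive model assuming an error-free read; production tools work on aligned reads and must tolerate sequencing errors that interrupt the tract.

```python
import re

def longest_repeat_run(read: str, unit: str) -> int:
    """Largest number of consecutive copies of `unit` found in the read."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = max((m.end() - m.start() for m in pattern.finditer(read)), default=0)
    return best // len(unit)

# Hypothetical long read spanning an expanded CAG tract, with flanking sequence.
read = "TTGAGGAC" + "CAG" * 42 + "CCTTAGTA"
print(longest_repeat_run(read, "CAG"))  # prints 42
```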

Advanced applications

Transcriptomic profiling and isoform resolution

Third-generation sequencing technologies enable comprehensive transcriptomic profiling by capturing full-length RNA transcripts, allowing direct observation of alternative splicing and isoform diversity without the fragmentation required by short-read methods. PacBio's Iso-Seq method sequences full-length cDNA molecules, typically from poly-A selected transcripts, including those exceeding 10 kb, providing complete representations of mature mRNAs from the 5' cap to the poly-A tail.[90] Similarly, Oxford Nanopore Technologies (ONT) supports direct RNA sequencing of native strands, preserving modifications and enabling isoform quantification at the single-molecule level.[91] These approaches facilitate the resolution of complex splicing patterns, such as exon skipping, mutually exclusive exons, and alternative polyadenylation sites, which are often ambiguous in short-read data because of the need for computational assembly.[92]

Isoform resolution in third-generation sequencing distinguishes splice variants by sequencing uninterrupted full-length transcripts, eliminating fragmentation-induced biases and enabling precise annotation of novel isoforms. Tools like SQANTI3 perform quality control and curation of long-read transcript models, classifying them against reference annotations into structural categories such as full-length non-chimeric reads and filtering artifacts for accurate isoform discovery.[93] This full-length consensus approach supports the detection of previously unannotated transcripts, revealing hidden transcriptional complexity in genes with high rates of alternative splicing, such as those involved in neuronal development.[94] Recent applications of single-cell Iso-Seq have demonstrated its power in uncovering isoform diversity in human brain tissue, with studies identifying over 30% novel isoforms during cortical neurogenesis through high-fidelity long-read sequencing.[95] Error correction in these datasets is achieved with tools like FLAIR, which clusters noisy reads, corrects splice junctions with reference or short-read support, and generates high-confidence consensus sequences for isoform quantification and alternative splicing analysis.[96] Such advances highlight third-generation sequencing's role in transcriptomic profiling, particularly for discovering cell-type-specific isoforms that short-read methods overlook.[97]
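The junction-level comparison that underlies isoform classification can be illustrated with a short sketch. In the hypothetical Python example below, which is far simpler than tools like SQANTI3, exons are (start, end) coordinate pairs, and an exon-skipping isoform reveals itself as a splice junction joining non-adjacent reference exons; all coordinates are invented for illustration.

```python
def junctions(exons: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """(donor, acceptor) splice junctions between consecutive exons."""
    return {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}

# Hypothetical reference transcript and a full-length read of an isoform
# that skips the third exon.
reference = [(100, 200), (300, 400), (500, 600), (700, 800)]
isoform   = [(100, 200), (300, 400), (700, 800)]

ref_j, iso_j = junctions(reference), junctions(isoform)
print("shared junctions:", sorted(iso_j & ref_j))  # [(200, 300)]
print("novel junctions:", sorted(iso_j - ref_j))   # [(400, 700)] exposes the skip
```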

Metagenomics and pathogen identification

Third-generation sequencing technologies, with their ability to generate long reads spanning repetitive regions and structural variation, have significantly advanced metagenomics by facilitating the assembly of genomes from unculturable microorganisms that dominate microbial ecosystems. Unlike short-read approaches, which often fragment assemblies owing to incomplete coverage of complex genomic elements, long reads enable the reconstruction of near-complete metagenome-assembled genomes (MAGs) from diverse, low-abundance taxa in environmental samples. This capability is particularly valuable for studying microbial communities where cultivation-independent methods are essential, as unculturable species account for over 99% of microbial diversity.[98][99]

A key tool in this domain is metaFlye, an assembler designed specifically for long-read metagenomic data, which uses repeat graphs to handle uneven community composition and intra-species heterogeneity. By resolving strain-level variants, such as single-nucleotide polymorphisms and insertions/deletions within closely related populations, metaFlye improves the recovery of high-quality MAGs, achieving contiguity that surpasses traditional short-read assemblers in simulated and real datasets. This strain resolution is crucial for understanding microbial evolution, functional redundancy, and interactions in complex consortia, such as those in soil or ocean microbiomes.[100][99]

Taxonomic binning further enhances long-read metagenomic analysis by grouping reads or contigs based on compositional features, including k-mer frequencies that capture species-specific signatures over extended sequences. Methods employing coverage and k-mer profiles allow reference-free classification of long reads, outperforming short-read binning in accuracy for low-abundance taxa and reducing chimerism in diverse samples. This approach supports precise community profiling without relying on marker genes, enabling deeper insight into phylogenetic diversity.[101][102]

In pathogen identification, real-time sequencing on Oxford Nanopore Technologies (ONT) platforms enables rapid metagenomic surveillance during outbreaks, allowing causative agents to be identified directly from clinical or environmental samples within hours. The portability of devices like the MinION supports on-site deployment in remote settings, such as field investigations, where immediate data generation informs containment strategies. For viral pathogens, ONT delivers consensus genomes with modal accuracies exceeding 97%, sufficient for epidemiological tracking and variant detection in real-time workflows.[103][104] A notable application occurred in the Canadian High Arctic, where MinION sequencing was performed on-site in 2018 to analyze metagenomes from sea-ice cryoconites, yielding high-quality MAGs from uncultured archaea and bacteria in extreme environments. Such portable TGS efforts have since facilitated the detection of novel antibiotic resistance genes in polar soils, revealing reservoirs of resistance to clinically relevant antibiotics such as beta-lactams and aminoglycosides, with implications for global AMR surveillance amid climate-driven microbial dispersal.[105][106]
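Composition-based binning as described above can be sketched compactly: each read or contig is reduced to a normalized k-mer frequency vector, and sequences with similar vectors become candidates for the same bin. The following Python example is a simplified, reference-free illustration (real binners, such as those cited above, also integrate coverage information); the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_profile(sequence: str, k: int = 4) -> dict[str, float]:
    """Normalized k-mer (tetranucleotide by default) frequency vector."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values()) or 1
    return {"".join(kmer): counts["".join(kmer)] / total
            for kmer in product("ACGT", repeat=k)}

def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two composition profiles; smaller
    distances suggest the sequences may belong to the same bin."""
    return sum((p[key] - q[key]) ** 2 for key in p) ** 0.5

# Toy contigs with visibly different composition.
a = kmer_profile("ATCGATTGCAGCTTAACGGA" * 10)
b = kmer_profile("GGGGCCCCGGGGCCCCGGGG" * 10)
print(round(profile_distance(a, b), 3))
```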

Clinical diagnostics and personalized medicine

Third-generation sequencing (TGS) has significantly advanced clinical diagnostics by enabling precise characterization of repeat expansions, which are challenging for short-read technologies because of their length and repetitive nature. In Huntington's disease, for instance, long-read sequencing accurately sizes CAG repeats in the HTT gene, detecting somatic expansions exceeding 150 repeats that drive selective degeneration of striatal projection neurons.[107] This approach has enabled rapid and comprehensive diagnostic methods for repeat expansion disorders, including amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) involving expansions in the C9orf72 gene, improving diagnostic yield in neurological conditions where traditional methods fall short.[108][109]

In Mendelian disorders, TGS enhances structural variant (SV) detection with high sensitivity and specificity, particularly for complex rearrangements in repetitive genomic regions often missed by short-read sequencing. Long-read methods have identified causal SVs in previously unsolved cases, contributing to diagnostic rates of up to 20–30% in the reanalysis of undiagnosed patients by resolving variants in genes associated with rare genetic diseases. This capability is crucial for conditions involving copy number variations or inversions, where TGS provides complete phasing and breakpoint resolution.[110][111]

For personalized medicine, TGS supports haplotype phasing of pharmacogenes, such as CYP2D6 and HLA loci, enabling diplotyping that predicts individual drug responses more accurately than genotyping alone. Real-time tumor sequencing with platforms like Oxford Nanopore Technologies (ONT) allows intra-operative profiling to guide immunotherapy, identifying neoantigens and tumor-specific alterations for tailored treatments such as checkpoint inhibitors. In diverse populations, TGS resolves up to 40% more nucleotide-resolved deletions than short-read approaches, addressing biases in variant calling and improving equity in genomic diagnostics across ancestries.[112][113][50] Furthermore, the detailed resolution of complex mutations, such as repeat expansions, supports the development of personalized gene therapies by enabling the design of targeted interventions tailored to an individual patient's specific mutations.[114]

Portable TGS devices, such as the ONT MinION, extend these benefits to low-resource settings by enabling on-site prenatal SV screening and rapid pathogen identification in sepsis, producing results from blood cultures in under 2 hours through real-time analysis. This portability facilitates point-of-care testing in remote areas, supporting timely intervention for conditions such as fetal structural anomalies without reliance on centralized laboratories.[32][115]
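The haplotype phasing that underlies pharmacogene diplotyping can be illustrated at its simplest: a long read covering several heterozygous sites is assigned to whichever phased haplotype its alleles match. The Python sketch below is a toy model with invented positions and alleles; real phasing algorithms must additionally handle sequencing errors, allele conflicts, and switch errors.

```python
def assign_haplotype(read_alleles: dict[int, str],
                     hap1: dict[int, str],
                     hap2: dict[int, str]) -> str:
    """Vote each allele observed in a read toward the haplotype it matches."""
    votes = {"hap1": 0, "hap2": 0}
    for pos, base in read_alleles.items():
        if hap1.get(pos) == base:
            votes["hap1"] += 1
        if hap2.get(pos) == base:
            votes["hap2"] += 1
    if votes["hap1"] == votes["hap2"]:
        return "ambiguous"
    return max(votes, key=votes.get)

# Hypothetical phased haplotypes over three heterozygous sites of a gene.
hap1 = {1042: "A", 2311: "G", 5120: "T"}
hap2 = {1042: "C", 2311: "T", 5120: "C"}
read = {1042: "A", 2311: "G"}  # long read covering two of the three sites
print(assign_haplotype(read, hap1, hap2))  # prints "hap1"
```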

References
