Structural variation
from Wikipedia

Genomic structural variation is the variation in structure of an organism's chromosome, such as deletions, duplications, copy-number variants, insertions, inversions and translocations. Originally, a structural variant was defined as affecting a sequence about 1 kb to 3 Mb in length, which is larger than SNPs and smaller than chromosome abnormalities (though the definitions have some overlap).[1] However, the operational range of structural variants has widened to include events > 50 bp.[2] Some structural variants are associated with genetic diseases, but most are not.[3][4] Approximately 13% of the human genome is defined as structurally variant in the normal population, and there are at least 240 genes that exist as homozygous deletion polymorphisms in human populations, suggesting these genes are dispensable in humans.[4] While humans carry a median of 3.6 Mbp in SNPs (compared to a reference genome), a median of 8.9 Mbp is affected by structural variation, which thus causes most genetic differences between humans in terms of raw sequence data.[4]

Microscopic structural variation

Microscopic structural variants are those that can be detected with optical microscopes, such as aneuploidies, marker chromosomes, gross rearrangements and variation in chromosome size.[5][6] Their frequency in the human population is thought to be underestimated because some of them are difficult to identify. These structural abnormalities are estimated to occur in 1 of every 375 live births.[7]

Sub-microscopic structural variation

Sub-microscopic structural variants are much harder to detect owing to their small size. The first study, in 2004, used DNA microarrays and could detect tens of genetic loci that exhibited copy number variation (deletions and duplications) greater than 100 kilobases in the human genome.[8] By 2015, however, whole genome sequencing studies could detect around 5,000 structural variants as small as 100 base pairs, encompassing approximately 20 megabases in each individual genome.[3][4] These structural variants include deletions, tandem duplications, inversions and mobile element insertions. The mutation rate is also much higher than that of microscopic structural variants, estimated by two studies at 16% and 20% respectively, both of which are probably underestimates due to the challenges of accurately detecting structural variants.[3][9] It has also been shown that the generation of spontaneous structural variants significantly increases the likelihood of generating further spontaneous single nucleotide variants or indels within 100 kilobases of the structural variation event.[3]

Copy-number variation

Copy-number variation (CNV) is a large category of structural variation, which includes insertions, deletions and duplications. In recent studies, people who do not have genetic diseases have been tested for copy-number variations using methods developed for quantitative SNP genotyping. Results show that 28% of the suspected regions in these individuals do contain copy-number variations.[10][11] CNVs in the human genome also affect more nucleotides than single-nucleotide polymorphisms (SNPs) do. It is also noteworthy that many CNVs are not in coding regions. Because CNVs are usually caused by unequal recombination, widespread similar sequences such as LINEs and SINEs may provide a common mechanism of CNV creation.[12][13]

Inversion

Several inversions are known to be related to human disease. For instance, a recurrent 400 kb inversion in the factor VIII gene is a common cause of haemophilia A,[14] and smaller inversions affecting iduronate 2-sulphatase (IDS) cause Hunter syndrome.[15] More examples include Angelman syndrome and Sotos syndrome. However, recent research shows that one person can have 56 putative inversions, so non-disease inversions are more common than previously supposed. The same study also indicates that inversion breakpoints are commonly associated with segmental duplications.[16] One 900 kb inversion on chromosome 17 is under positive selection and is predicted to increase in frequency in the European population.[17]

Other structural variants

More complex structural variants can combine several of the above in a single event.[3] The most common type of complex structural variation is the non-tandem duplication, where sequence is duplicated and inserted in inverted or direct orientation into another part of the genome.[3] Other classes of complex structural variant include deletion-inversion-deletions, duplication-inversion-duplications, and tandem duplications with nested deletions.[3] There are also cryptic translocations and segmental uniparental disomy (UPD). Reports of these variations are increasing, but they are more difficult to detect than traditional variations because they are balanced, and array-based or PCR-based methods are not able to locate them.[18]

Structural variation and phenotypes

Some genetic diseases are suspected to be caused by structural variations, but the relation is not very certain. It is not feasible to divide these variants into two classes, "normal" and "disease", because the phenotypic outcome of the same variant can vary. Also, a few of the variants are actually positively selected for (mentioned above). A series of studies have shown that spontaneous (de novo) CNVs disrupt genes approximately four times more frequently in autism than in controls and contribute to approximately 5–10% of cases.[3][19][20][21][22] Inherited variants also contribute to around 5–10% of cases of autism.[3]

Structural variation also plays a role in population genetics. Differences in the frequency of the same variant can be used as a genetic marker to infer relationships between populations in different areas. A complete comparison between human and chimpanzee structural variation also suggested that some variants may have become fixed in one species because of their adaptive function.[23] There are also deletions related to resistance against malaria and AIDS.[24][25] Some highly variable segments are thought to be caused by balancing selection, but there are also studies against this hypothesis.[26]

Database of structural variation

Some genome browsers and bioinformatics databases list structural variations in the human genome, with an emphasis on CNVs, and can show them in the genome browsing page; one example is the UCSC Genome Browser.[27] On the page for viewing a part of the genome, there are "Common Cell CNVs" and "Structural Var" options which can be enabled. NCBI has a dedicated page[28] for structural variation. In that system, both "inner" and "outer" coordinates are shown; neither gives the actual breakpoints, but rather the surmised minimum and maximum range of sequence affected by the structural variation. The types are classified as insertion, loss, gain, inversion, LOH, everted, transchr and UPD.[citation needed]
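
The inner/outer convention can be pictured as two nested intervals that bracket the unknown breakpoints. The following is a minimal Python sketch of that idea only: the class name, field names and example coordinates are hypothetical and are not taken from any database schema, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpreciseRegion:
    """Hypothetical record for a variant whose exact breakpoints are unknown.

    The outer pair bounds the largest plausible affected region; the inner
    pair bounds the smallest. Coordinates are assumed 1-based and inclusive.
    """
    outer_start: int
    inner_start: int
    inner_stop: int
    outer_stop: int

    def __post_init__(self):
        # The inner interval must be nested inside the outer interval.
        if not (self.outer_start <= self.inner_start
                <= self.inner_stop <= self.outer_stop):
            raise ValueError("inner interval must lie within outer interval")

    def min_affected_bp(self) -> int:
        """Smallest span consistent with the evidence (inner coordinates)."""
        return self.inner_stop - self.inner_start + 1

    def max_affected_bp(self) -> int:
        """Largest span consistent with the evidence (outer coordinates)."""
        return self.outer_stop - self.outer_start + 1


# Made-up coordinates, for illustration only.
region = ImpreciseRegion(outer_start=1000, inner_start=1200,
                         inner_stop=1800, outer_stop=2100)
print(region.min_affected_bp(), region.max_affected_bp())
```

The true start breakpoint lies somewhere between the outer and inner start, and likewise for the stop, which is why neither pair should be read as the breakpoints themselves.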

Methods of detection

Signatures and patterns of SVs for deletion (A), novel sequence insertion (B), inversion (C), and tandem duplication (D) in read count (RC), read-pair (RP), split-read (SR), and de novo assembly (AS) methods.[29]

New methods have been developed to analyze human genetic structural variation at high resolution. The methods test the genome either in a specific, targeted way or in a genome-wide manner. For genome-wide tests, array-based comparative genomic hybridization approaches provide the best genome-wide scans for finding new copy number variants.[30] These techniques label DNA fragments from a genome of interest and hybridize them, together with a differently labeled second genome, to arrays spotted with cloned DNA fragments. This reveals copy number differences between the two genomes.[30]
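
Array results of this kind are commonly summarized as the log2 ratio of test signal to reference signal at each probe. The sketch below shows the idealized arithmetic only, assuming a diploid reference and signal strictly proportional to copy number; the function names and the 0.3 calling threshold are illustrative assumptions, not values from the cited study.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) ratio for a probe.

    Assumes signal is proportional to copy number. A complete (zero-copy)
    loss has no finite log ratio, so it is rejected here.
    """
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite ratio")
    return math.log2(test_copies / reference_copies)


def call_copy_state(log2_ratio: float, threshold: float = 0.3) -> str:
    """Label a probe as gain, loss or neutral using a symmetric threshold.

    The threshold is an arbitrary illustrative choice; real analyses tune it
    to the noise level of the platform.
    """
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "neutral"


for copies in (1, 2, 3):
    ratio = expected_log2_ratio(copies)
    print(copies, round(ratio, 3), call_copy_state(ratio))
```

Under these assumptions a single-copy loss gives a ratio of -1 and a single-copy gain about 0.585, so losses stand out more strongly than gains of the same size.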

For targeted genome examinations, the best assays for checking specific areas of the genome are primarily PCR based. The best established of the PCR based methods is real time quantitative polymerase chain reaction (qPCR).[30] A different approach is to specifically check certain areas that surround known segmental duplications since they are usually areas of copy number variation.[30] An SNP genotyping method that offers independent fluorescence intensities for two alleles can be used to target the nucleotides in between two copies of a segmental duplication.[30] From this, an increase in intensity from one of the alleles compared to the other can be observed.

With the development of next-generation sequencing (NGS) technology, four classes of strategies for the detection of structural variants with NGS data have been reported, with each being based on patterns that are diagnostic of different classes of SV.[31][29][32][33]

  • Read-depth or read-count methods assume a random distribution (e.g. Poisson distribution) of reads from short read sequencing. The divergence from this distribution is investigated to discover duplications and deletions. Regions with duplication will show higher read depth while those with deletion will result in lower read depth.
  • Split-read methods enable detection of insertions (including mobile element insertions) and deletions down to single base-pair resolution. The presence of an SV is identified from discontinuous alignment to the reference genome. A gap in the read relative to the reference marks a deletion, and a gap in the reference relative to the read marks an insertion.
  • Read pair methods examine the length and orientation of paired-end reads from short read sequencing data. For example, read pairs further apart than expected indicate a deletion. Translocations, inversions and tandem duplications can likewise be discovered using read-pairs.
  • De novo sequence assembly may be applied with reads that are accurate enough. While, in practice, use of this method is limited by the length of sequence reads, long read based genome assemblies offer structural variation discovery for classes such as insertions that escape detection when using other methods.[34]
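
The read-depth idea in the first bullet can be sketched as a tail test against a Poisson model. This is a simplified illustration, not the method of any particular tool: it assumes each window's read count follows a Poisson distribution with a known expected value, and the function names, the expected count of 100 and the significance cutoff are all assumptions chosen for the example.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def classify_window(count: int, expected: float, alpha: float = 1e-3) -> str:
    """Flag a window whose read count is improbably low or high.

    A significantly low count suggests a deletion and a significantly high
    count suggests a duplication; alpha is an illustrative cutoff.
    """
    lower_tail = poisson_cdf(count, expected)            # P(X <= count)
    upper_tail = 1.0 - poisson_cdf(count - 1, expected)  # P(X >= count)
    if lower_tail < alpha:
        return "deletion-like"
    if upper_tail < alpha:
        return "duplication-like"
    return "neutral"


# Made-up window counts around an expected depth of 100 reads per window.
for observed in (50, 100, 150):
    print(observed, classify_window(observed, expected=100.0))
```

Real read-depth callers also correct for GC content and mappability and merge adjacent windows, none of which is attempted here.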

from Grokipedia
Structural variation (SV), also known as structural variants, refers to genomic alterations involving rearrangements of DNA segments typically 50 base pairs (bp) or larger in size, which can include insertions, deletions, duplications, inversions, translocations, and more complex forms such as copy-number neutral events or catastrophic rearrangements. These variants arise from mechanisms such as replication errors and can disrupt regulatory elements, among other effects. Unlike single-nucleotide variants, SVs affect larger genomic regions and contribute substantially to genomic variation across populations, often accounting for a greater proportion of base pairs under variation despite being less frequent in number.

SVs play a pivotal role in human evolution by driving phenotypic diversity and adaptation, as evidenced by their enrichment in regions associated with traits like height, immune response, and neurological function. In disease contexts, they are implicated in both rare Mendelian disorders, such as Potocki-Lupski syndrome caused by microduplications on chromosome 17, and complex conditions like autism spectrum disorders and cancer, where they can disrupt protein-coding genes or create novel fusion products. For instance, large-scale catalogs like the Genome Aggregation Database's structural variation resource (gnomAD-SV), derived from over 14,000 high-quality genomes, reveal that SVs influence more than 25% of rare protein-truncating events and impact hundreds of genes per individual, underscoring their clinical diagnostic relevance.

Advances in long-read sequencing technologies, such as PacBio and Oxford Nanopore, have revolutionized SV detection by overcoming limitations of short-read methods, enabling more accurate mapping and interpretation of these variants in both research and medical applications. Despite their importance, SVs remain understudied compared to smaller variants due to historical challenges in ascertainment, but ongoing population-scale efforts continue to highlight their contributions to both normal variation and disease.

Overview and Background

Definition and Scope

Structural variation (SV) refers to genomic alterations involving DNA segments typically larger than 1 kilobase (kb) in length, including deletions, insertions, duplications, inversions, and translocations. In contemporary genomics, some definitions extend the threshold to 50 base pairs (bp) or more to accommodate variants resolvable by advanced sequencing methods. These changes disrupt the linear arrangement of chromosomes and can influence gene dosage, expression, and function. SVs encompass both germline variants, which are heritable and present in reproductive cells, and somatic variants, which arise post-zygotically in non-reproductive tissues and may drive diseases such as cancer. They are highly prevalent in human populations; high-resolution whole-genome sequencing analyses indicate that each diploid genome contains approximately 26,000 SVs, which are fewer in number than smaller variants like single nucleotide polymorphisms (SNPs) but affect more base pairs in total. In contrast to point mutations (e.g., SNPs) or small indels affecting one to a few bases, SVs rearrange larger genomic regions, often leading to greater functional consequences, such as altered protein-coding sequences or regulatory elements. Broadly, SVs are categorized as balanced, involving no net gain or loss of genetic material (e.g., inversions), or unbalanced, resulting in copy number changes (e.g., deletions or duplications).
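
The two distinctions drawn above, size threshold and balanced versus unbalanced, can be written down as a small lookup. This Python sketch is illustrative only: the function names are hypothetical, the default threshold follows the 50 bp definition mentioned in the text, and only the variant types that the text explicitly labels are given a dosage class.

```python
# Dosage classes for the examples named in the text; other types are left
# unspecified because they can fall on either side (e.g. translocations).
DOSAGE_CLASS = {
    "inversion": "balanced",
    "deletion": "unbalanced",
    "duplication": "unbalanced",
}


def is_structural_variant(length_bp: int, min_sv_bp: int = 50) -> bool:
    """Return True if an event of this length meets the SV size threshold.

    Pass min_sv_bp=1000 to apply the older 1 kb definition instead.
    """
    if length_bp < 1:
        raise ValueError("length must be at least 1 bp")
    return length_bp >= min_sv_bp


def dosage_class(sv_type: str) -> str:
    """Look up whether a variant type changes copy number."""
    return DOSAGE_CLASS.get(sv_type.lower(), "unspecified")


print(is_structural_variant(300))                  # 50 bp definition
print(is_structural_variant(300, min_sv_bp=1000))  # 1 kb definition
print(dosage_class("inversion"), dosage_class("deletion"))
```

The same 300 bp event counts as an SV under one definition and not the other, which is why reported SV counts depend heavily on the threshold used.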

Historical Development

The study of structural variations (SVs) in the human genome began with early cytogenetic observations in the mid-20th century, following the establishment of the correct human chromosome number of 46 in 1956. Initial discoveries focused on large-scale chromosomal abnormalities associated with diseases, such as the partial deletion of the short arm of chromosome 5 causing cri du chat syndrome, identified by Jérôme Lejeune and colleagues in 1963 through karyotyping of affected individuals. These findings highlighted SVs as key contributors to congenital disorders, building on earlier reports of aneuploidies like trisomy 21 in Down syndrome (1959) and other visible rearrangements detectable at resolutions of several megabases.

Advancements in the 1970s revolutionized SV detection with the development of chromosome banding techniques, which enabled visualization of microscopic rearrangements at resolutions down to about 3-5 Mb. Q-banding, introduced by Torbjörn Caspersson et al. in 1970 using quinacrine mustard staining, produced fluorescent patterns that distinguished individual chromosomes and subtle structural changes, such as deletions and duplications. This was complemented by G-banding in the early 1970s, which used Giemsa staining after trypsin treatment to reveal darker and lighter bands, facilitating the identification of balanced translocations and inversions in clinical samples. By the 1980s, fluorescence in situ hybridization (FISH) emerged as a pivotal tool for sub-microscopic SV detection, with the first fluorescent probes applied around 1980 to localize specific DNA sequences on chromosomes, achieving resolutions of 50-100 kb and enabling precise mapping of disease-associated variants.

The completion of the Human Genome Project in 2003 provided a reference sequence that spurred systematic SV discovery, shifting focus from rare pathogenic variants to common polymorphisms. A landmark 2004 study by Iafrate, Feuk, and colleagues used array comparative genomic hybridization (array-CGH) to detect 255 large-scale copy-number variations (CNVs) across the genome in healthy individuals, demonstrating that SVs constitute a substantial portion of human genetic variation beyond single-nucleotide polymorphisms. This was rapidly expanded by the 1000 Genomes Project, launched in 2008 and reporting initial results in 2010, which cataloged over 1,000 SVs in 1,092 individuals from diverse populations using paired-end sequencing and quantified their prevalence at frequencies up to 20%.

Recent advances through 2025 have integrated long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), to resolve complex and previously undetectable SVs, including those in repetitive regions. For instance, a 2025 study sequenced 1,019 diverse human genomes with ONT, identifying 167,291 SV sites, including 65,075 deletions and 74,125 insertions, using graph-based references, with 98.4% successfully phased and improved sensitivity for rare variants (frequency <1%). These methods have also extended SV recognition to non-human genomes, building on early observations like gene duplications in Drosophila reported in 1936, and accelerating post-2004 through comparative genomics in species such as primates and plants to elucidate evolutionary impacts.

Classification and Types

Microscopic Structural Variations

Microscopic structural variations refer to large-scale alterations in chromosome structure or number that are detectable using light microscopy techniques, typically involving segments greater than 5-10 megabases (Mb) in size. These variations include numerical changes such as aneuploidies and structural changes like large deletions, duplications, and translocations, which can disrupt the normal chromosomal architecture visible during cell division. Such variations are distinguished from smaller sub-microscopic counterparts that require molecular methods for detection.

Prominent examples of microscopic structural variations include trisomy 21, which causes Down syndrome and results from an extra copy of chromosome 21, leading to a karyotype of 47,XX,+21 or 47,XY,+21. Another classic case is the Philadelphia chromosome, a reciprocal translocation t(9;22)(q34;q11.2) observed in chronic myeloid leukemia (CML), first identified in 1960 through cytogenetic analysis of leukemic cells. These examples illustrate how microscopic variations can involve entire chromosomes or substantial segments, often with profound clinical consequences.

Detection of these variations primarily relies on karyotyping, where chromosomes are stained and visualized during cell division. G-banding, using Giemsa dye after trypsin treatment, produces characteristic light and dark bands that allow identification of large-scale changes like aneuploidies and translocations. For more complex rearrangements, spectral karyotyping (SKY) employs fluorescent dyes to label each chromosome pair with a unique spectral signature, enabling precise mapping of derivative chromosomes beyond standard banding resolution. In human populations, visible chromosomal abnormalities occur in approximately 0.5-1% of live births, contributing significantly to congenital anomalies.

The functional impacts of microscopic structural variations often stem from gene dosage imbalances, where extra or missing chromosomal material alters the expression levels of multiple genes, or from loss of heterozygosity in deletions that unmasks recessive mutations. In translocations like the Philadelphia chromosome, novel fusion genes (e.g., BCR-ABL) drive oncogenesis. These disruptions frequently result in severe developmental disorders, intellectual disabilities, and increased risk of malignancies, as seen in trisomy 21 where triplication of genes on chromosome 21 affects neural and cardiac development.

Sub-Microscopic Structural Variations

Sub-microscopic structural variations refer to genomic rearrangements smaller than those detectable by standard karyotyping, typically ranging from 50 base pairs to approximately 5 megabases in size, including small insertions, deletions, inversions, and complex rearrangements that alter DNA sequence structure without visible chromosomal changes under a light microscope. These variations require molecular techniques such as array comparative genomic hybridization or next-generation sequencing for identification, distinguishing them from larger microscopic alterations. In human populations, sub-microscopic structural variations are highly prevalent and contribute substantially more to inter-individual genomic differences than single nucleotide polymorphisms (SNPs), with studies from the 2010s estimating that they affect over 20 megabases of sequence per diploid genome on average, representing a larger fraction of variable bases compared to the roughly 3-4 megabases impacted by SNPs. For instance, population-scale sequencing efforts have revealed thousands of such variants per individual, underscoring their role as a major source of genetic diversity beyond point mutations. These variations are categorized as unbalanced or balanced based on whether they result in net gains or losses of genetic material. Unbalanced sub-microscopic variations, such as small copy-number variations (CNVs), lead to dosage changes that can disrupt gene function, while balanced ones, like cryptic translocations or small inversions, rearrange segments without altering overall copy number but potentially affecting regulatory elements or gene orientation. Additionally, sub-microscopic structural variations often occur in a mosaic fashion, present in only a subset of cells due to post-zygotic events, which complicates detection and can contribute to variable expressivity in genetic traits. 
A representative example is the ~1.5 megabase deletion at chromosome 7q11.23 associated with Williams-Beuren syndrome, an unbalanced sub-microscopic variation that removes multiple genes and exemplifies how such events can underlie genomic disorders. Evolutionarily, sub-microscopic structural variations have played a key role in primate genome divergence; comparative analyses show they account for the majority of base-pair differences between humans and other great apes, driving adaptations through gene dosage alterations and regulatory rewiring.

Specific Structural Variants

Copy-Number Variations

Copy-number variations (CNVs) represent a major category of unbalanced structural variants in the human genome, defined as deletions or duplications of DNA segments that leave those regions with fewer or more than the expected two copies in a diploid genome. These alterations typically span from 1 kilobase (kb) to 5 megabases (Mb), though smaller events down to 50 base pairs (bp) have been identified, distinguishing them from smaller insertions/deletions (indels). Unlike balanced rearrangements such as inversions, CNVs directly alter gene dosage by changing the number of functional copies of genes or regulatory elements within the affected segments.

The formation of CNVs arises primarily through two key mechanisms: non-allelic homologous recombination (NAHR) and non-homologous end joining (NHEJ). NAHR occurs when highly similar low-copy repeats (LCRs), often sharing over 95% sequence identity across 1-5 kb, misalign during meiosis or mitosis, leading to unequal crossing over that produces recurrent deletions or duplications at predictable genomic hotspots. In contrast, NHEJ is an error-prone double-strand break repair pathway that ligates DNA ends with minimal or no homology (typically 0-4 bp microhomology), resulting in non-recurrent CNVs with more variable breakpoints. CNVs are classified as tandem when the duplicated or deleted segments are adjacent, often involving repetitive elements like microsatellites, or dispersed when they involve non-adjacent regions, such as through misalignment of segmental duplications. These mechanisms contribute to both germline and somatic CNVs, with NAHR being more common in genomic disorders due to its reliance on abundant repetitive sequences.

CNVs exhibit substantial prevalence across human populations, with common variants (frequency >1%) shared widely and rare variants (<1% frequency) often unique to individuals or families.
Comprehensive mapping efforts have revealed that CNVs affect 4.8-9.5% of the genome, with an early study identifying over 1,400 CNVs across 270 individuals from diverse ancestries, averaging about 12 exonic CNVs per genome. Common CNVs tend to be benign or adaptive, while rare ones are enriched for disease associations due to their potential for deleterious dosage changes. For instance, the AMY1 gene cluster on chromosome 1p21 shows copy number variation from 2 to 18 copies (average 6-7), correlating with salivary amylase enzyme levels and providing an adaptive advantage in populations with high-starch diets, such as agricultural societies. The functional consequences of CNVs stem largely from altered gene dosage, where reduced (haploinsufficiency) or increased (gain-of-function) expression disrupts cellular processes. This can manifest in developmental disorders, neurological conditions, or evolutionary adaptations. A prominent example is the 22q11.2 deletion, a recurrent ~3 Mb CNV mediated by NAHR between LCRs, which causes 22q11.2 deletion syndrome and confers a more than 20-fold increased risk for schizophrenia through dosage sensitivity of genes like PRODH and DGCR8. Such effects highlight how CNVs contribute to phenotypic diversity and disease susceptibility beyond single-nucleotide variants.

Inversions

Inversions are intra-chromosomal structural variants characterized by the reversal of orientation of a segment of DNA within a chromosome, resulting in no net gain or loss of genetic material. These balanced rearrangements arise from two breaks in the chromosome that allow the intervening segment to flip 180 degrees before rejoining. Inversions are classified into two main types based on their relation to the centromere: paracentric inversions, which occur within a single chromosome arm and exclude the centromere, and pericentric inversions, which span the centromere and involve breaks in both the short (p) and long (q) arms. Paracentric inversions maintain the arm ratio but can lead to acentric or dicentric chromosomes during meiosis if recombination occurs within the inverted segment, while pericentric inversions may alter the arm lengths and potentially change the chromosome's morphology. The primary mechanisms underlying inversion formation involve double-strand breaks (DSBs) in the DNA, followed by erroneous repair and rejoining of the broken ends. DSBs can be induced by various factors, such as ionizing radiation or replication errors, and are repaired through pathways like non-homologous end joining (NHEJ), which ligates ends without requiring homology, or non-allelic homologous recombination (NAHR) between low-copy repeats oriented in opposite directions. In heterozygotes—individuals carrying one normal and one inverted chromosome—pairing during meiosis forms an inversion loop, suppressing recombination within the inverted region to avoid unbalanced gametes with duplications or deletions. This suppression preserves linked alleles but can reduce fertility if crossovers occur, producing inviable gametes. A prominent example in humans is the 8p23.1 inversion polymorphism, a paracentric inversion spanning approximately 4.5 Mb on the short arm of chromosome 8, with frequencies varying by population—around 20–50% in Europeans and up to 70% in some African groups. 
This common variant likely originated from recombination between human endogenous retrovirus (HERV) elements and influences local recombination rates and gene expression. Another significant case involves inversions disrupting the F8 gene on the X chromosome, particularly the intron 22 inversion (Inv22), which accounts for about 45% of severe hemophilia A cases by splitting the gene and preventing functional factor VIII production through homologous recombination between intronic repeats. In evolutionary biology, inversions play a key role in speciation by reducing gene flow between diverging populations, as seen in Drosophila species where fixed inversions on multiple chromosomes predate species splits and maintain co-adapted gene complexes. For instance, in Drosophila persimilis and D. pseudoobscura, ancestral inversions on the X and second chromosomes suppress recombination in hybrids, enhancing reproductive isolation and contributing to postzygotic barriers like hybrid sterility. These inversions capture polymorphisms that promote divergence, thereby facilitating adaptation and lineage separation without altering gene content.
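
The paracentric/pericentric distinction described in this section reduces to asking whether the two breakpoints fall on the same side of the centromere. The sketch below illustrates that check under simplifying assumptions: the centromere is modeled as a single interval, the function name is hypothetical, and the coordinates in the example are made up rather than taken from any real chromosome.

```python
def classify_inversion(break1: int, break2: int,
                       cen_start: int, cen_end: int) -> str:
    """Classify an inversion by its position relative to the centromere.

    Pericentric: the breakpoints flank the centromere (one on each arm).
    Paracentric: both breakpoints lie on the same arm.
    """
    if cen_start > cen_end:
        raise ValueError("centromere interval is reversed")
    start, end = sorted((break1, break2))
    if start < cen_start and end > cen_end:
        return "pericentric"
    if end < cen_start or start > cen_end:
        return "paracentric"
    # A breakpoint inside the centromere interval is outside this simple model.
    raise ValueError("breakpoint falls within the centromere interval")


# Hypothetical centromere spanning 50-55 Mb on an unnamed chromosome.
CEN = (50_000_000, 55_000_000)
print(classify_inversion(10_000_000, 20_000_000, *CEN))
print(classify_inversion(40_000_000, 60_000_000, *CEN))
```

Only the relative positions matter here, so the same check applies whichever arm a paracentric inversion sits on.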

Other Structural Variants

Insertions represent a class of structural variants characterized by the addition of DNA segments into the genome, frequently driven by the activity of mobile genetic elements known as retrotransposons. Alu elements, which are short interspersed nuclear elements (SINEs) specific to primates, constitute the most abundant type, with over 1 million copies occupying approximately 11% of the human genome. These non-autonomous elements rely on the enzymatic machinery of long interspersed nuclear elements (LINEs), particularly LINE-1 (L1), for retrotransposition, and their insertions can alter gene function by disrupting coding sequences or regulatory regions. LINE-1 elements themselves are autonomous retrotransposons comprising about 17% of the genome, with roughly 500,000 copies, many of which are truncated or rearranged but still contribute to ongoing genomic insertions. Mobile element insertions (MEIs), including both Alu and L1, account for a significant portion of structural variation and have been implicated in over 100 cases of human genetic disorders by directly interrupting essential genes. Translocations involve the exchange of genetic material between non-homologous chromosomes, often resulting in balanced rearrangements that do not alter overall DNA dosage but can reposition genes or regulatory elements. In balanced translocations, carriers typically exhibit no phenotypic effects, yet they face elevated risks of producing gametes with unbalanced derivatives, leading to offspring with partial trisomies or monosomies. A well-documented example is the constitutional t(11;22)(q23;q11.2) translocation, the only known recurrent non-Robertsonian translocation in humans, where balanced carriers have a 10-15% risk of conceiving children with Emanuel syndrome due to the supernumerary der(22)t(11;22) chromosome. 
This translocation arises through a specific mechanism involving low-copy repeats and palindromic sequences, highlighting how repetitive DNA can predispose to inter-chromosomal exchanges.

Complex structural variants (cxSVs) encompass multifaceted rearrangements that combine multiple simple variant types, such as inversions, deletions, duplications, or insertions, within a single event, often spanning several breakpoints. These variants frequently originate from repeat-induced mechanisms, including non-allelic homologous recombination (NAHR) in regions of high sequence similarity, leading to clustered alterations that defy simple end-joining models. For instance, inversion-deletion complexes can juxtapose distant genomic elements, creating novel fusion genes or disrupting topologically associated domains. Studies of germline and somatic genomes have revealed that cxSVs are an important but underappreciated component of the variation found in disease cohorts, and that they are easily missed in routine diagnostics because short-read sequencing struggles to resolve them. Repeat-induced rearrangements, particularly in Alu-rich or LINE-flanked regions, amplify genomic instability and are prevalent in both constitutional and acquired contexts.

In somatic contexts, such as cancer, translocations exemplify the pathogenic potential of these variants; the Philadelphia chromosome, resulting from t(9;22)(q34;q11), generates the BCR-ABL1 fusion oncogene that constitutively activates tyrosine kinase signaling in chronic myeloid leukemia. This balanced translocation, occurring somatically in hematopoietic stem cells, drives clonal expansion and is detectable in over 95% of CML cases. Mobile element insertions further illustrate disease causation, as de novo L1 retrotransposition into exon 14 of the F8 gene disrupts protein production and leads to severe hemophilia A in affected individuals.
Alu insertions have similarly been documented in hemophilia cases, often creating premature stop codons or exon deletions within coagulation factor genes. These examples highlight how insertions and translocations, alone or in complex forms, contribute to both inherited and acquired disorders by perturbing gene integrity.

Detection Methods

Microscopic and Cytogenetic Techniques

Microscopic and cytogenetic techniques have long served as foundational methods for detecting large-scale structural variations (SVs) in chromosomes, particularly those visible at the light microscope level. Karyotyping, the process of visualizing and arranging chromosomes from a cell sample, relies on staining to reveal banding patterns that highlight structural abnormalities such as deletions, duplications, inversions, and translocations exceeding several megabases. The most widely used approach is G-banding, which involves treating metaphase chromosomes with trypsin followed by Giemsa staining to produce characteristic light and dark bands along each chromosome. This technique, standardized at the Paris Conference in 1971, enables the identification of aneuploidies, polyploidies, and large SVs with a resolution typically ranging from 5 to 10 Mb. It detects alterations that disrupt chromosome morphology but misses smaller submicroscopic changes.

Advanced cytogenetic methods build on basic karyotyping by incorporating fluorescence in situ hybridization (FISH) variants to enhance specificity for complex rearrangements. Spectral karyotyping (SKY), introduced in 1996, uses a combination of five fluorochromes and spectral imaging to paint each of the 24 human chromosomes in distinct pseudo-colors, facilitating the simultaneous visualization of all chromosomes and the detection of marker chromosomes, translocations, and other interchromosomal exchanges that may be cryptic in standard banding. Complementing SKY, multicolor FISH (M-FISH) employs chromosome-specific probe sets with varying fluorophores to differentiate chromosomes and pinpoint breakpoints in translocations, offering improved resolution for identifying derivative chromosomes in complex karyotypes. These techniques are particularly valuable for resolving ambiguities in G-banded karyotypes, such as in cases of multiple marker chromosomes or hidden translocations.

Comparative genomic hybridization (CGH) represents a pivotal advancement in cytogenetic analysis for SV detection, comparing patient DNA to reference DNA to identify copy-number imbalances without requiring cell culturing. Traditional metaphase CGH, while effective for large aneuploidies and CNVs, has been largely superseded by array-CGH, which uses microarray platforms with densely spaced probes to achieve higher resolution (down to 50-100 kb in some implementations) for detecting segmental aneuploidies and large CNVs across the genome. Developed in 1998, array-CGH hybridizes differentially labeled test and reference DNAs to arrays of bacterial artificial chromosomes or oligonucleotides, with ratio imbalances indicating gains or losses.

These techniques find primary application in prenatal diagnostics, where amniotic fluid or chorionic villi obtained via amniocentesis or chorionic villus sampling are analyzed to screen for fetal chromosomal abnormalities like trisomies or large SVs that could lead to congenital disorders. For instance, G-banding and array-CGH are routinely used to confirm ultrasound-detected anomalies, providing actionable insights for clinical decision-making. Their limitations differ by method: banding-based karyotyping has low resolution for small SVs and relies on dividing cells for metaphase preparations, whereas CGH-based methods cannot detect balanced translocations and may miss low-level mosaicism. Complementary molecular approaches are therefore needed for comprehensive assessment.
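The ratio-based readout of array-CGH described above can be illustrated with a minimal sketch. It assumes a diploid reference and idealized, noise-free signals; the function names and the ±0.3 calling thresholds are hypothetical choices for illustration, not values taken from any particular platform.

```python
import math


def expected_log2_ratio(copies: int, ploidy: int = 2) -> float:
    """Idealized log2(test/reference) ratio for a region present in
    `copies` copies, measured against a reference with `ploidy` copies."""
    return math.log2(copies / ploidy)


def log2_ratio(test_signal: float, ref_signal: float) -> float:
    """Log2 ratio of test to reference fluorescence intensity for one probe."""
    return math.log2(test_signal / ref_signal)


def call_probe(ratio: float, gain: float = 0.3, loss: float = -0.3) -> str:
    """Label a probe as gain, loss, or neutral.
    The thresholds are illustrative assumptions, not platform defaults."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    for copies in (1, 2, 3):
        r = expected_log2_ratio(copies)
        print(copies, round(r, 3), call_probe(r))
```

Under these assumptions a heterozygous deletion (one copy) gives a ratio of -1 and a single-copy gain gives about 0.58. Real data are noisier, which is why calls are usually made on runs of adjacent probes rather than on single probes.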

Molecular and Sequencing-Based Methods

Molecular and sequencing-based methods provide high-resolution detection of sub-microscopic structural variations (SVs), enabling the identification of copy-number variations (CNVs), insertions, deletions, inversions, and translocations at the kilobase scale or smaller. These approaches leverage genomic arrays and next-generation sequencing (NGS) technologies to overcome the limitations of traditional cytogenetic techniques, which are constrained to larger chromosomal abnormalities. Array-based methods, in particular, were pivotal in the early 2000s for genome-wide CNV discovery, while sequencing-based strategies have evolved to capture complex SVs in repetitive and difficult-to-assemble regions.

Array-based methods, such as single nucleotide polymorphism (SNP) arrays and array comparative genomic hybridization (aCGH), detect CNVs by measuring DNA copy number changes across the genome. SNP arrays interrogate known polymorphic sites to infer copy number through signal intensity and allele-specific ratios, achieving resolutions down to approximately 10-50 kb, with specialized designs enabling detection as fine as 1 kb in targeted regions. aCGH, by contrast, hybridizes fluorescently labeled test and reference DNA samples to oligonucleotide or BAC arrays, quantifying copy number imbalances via log-ratio intensities; high-density oligonucleotide aCGH platforms offer resolutions of 1-5 kb, facilitating the identification of submicroscopic CNVs associated with developmental disorders. These methods excel in clinical diagnostics for de novo CNV calling but require computational normalization to account for probe biases and GC content effects.

Short-read sequencing, typically using Illumina platforms with 100-300 bp reads, employs paired-end mapping and split-read alignment to detect SVs by analyzing read-pair orientations, insert sizes, and breakpoint-spanning alignments. Paired-end mapping identifies discordant read pairs (those with unexpected distances, orientations, or mapping positions) to infer deletions, insertions, and inversions larger than the read length, often combined with read-depth signals for CNV confirmation. Split-read alignment detects precise breakpoints from reads that align across SV junctions and are therefore split or soft-clipped, enabling nucleotide-resolution calling of small insertions and deletions. Tools like GATK's SV pipeline integrate these signals with local de novo assembly for robust SV discovery in whole-genome sequencing data, while DELLY combines paired-end, split-read, and mate-pair information to achieve high sensitivity for complex rearrangements, though performance diminishes in low-mappability regions.

Long-read sequencing technologies, including Pacific Biosciences (PacBio) HiFi circular consensus sequencing and Oxford Nanopore Technologies (ONT), span tens to hundreds of kilobases per read, dramatically improving SV detection in repetitive sequences and enabling haplotype phasing. PacBio HiFi reads (15-20 kb average length, >99% accuracy) resolve complex SVs like nested inversions and expansions by direct alignment or graph-based assembly, outperforming short-read methods in recall for variants >50 bp, particularly in centromeric and telomeric regions. ONT provides ultra-long reads (up to megabases) with real-time basecalling, facilitating the traversal of repeats to phase SVs across haplotypes and detect balanced translocations missed by short reads; however, its higher error rates necessitate consensus-based correction. These platforms have revealed thousands of novel SVs in human populations, with studies showing 20-30% more detections than short-read approaches in challenging genomic contexts.
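The paired-end signatures used by short-read callers can be sketched as a simple rule set. This is a simplified illustration, not the algorithm of any specific tool: it assumes a standard forward-reverse library, and the `ReadPair` type, the insert-size parameters, and the three-standard-deviation cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Mapped positions and strands of the two mates (hypothetical model)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: int = 400,
                  sd_insert: int = 50, n_sd: int = 3) -> str:
    """Map one read pair to the SV type its signature suggests."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"  # mates map to different chromosomes
    # Order the mates by coordinate so orientation can be read left to right.
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if lstrand == rstrand:
        return "inversion"  # both mates on the same strand
    if lstrand == "-" and rstrand == "+":
        return "tandem_duplication"  # mates point away from each other
    span = rpos - lpos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"  # mates map farther apart than the library allows
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"  # mates map closer together than expected
    return "concordant"


if __name__ == "__main__":
    print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 6000, "-")))
```

Production callers cluster many such pairs and combine them with split-read and read-depth evidence before reporting a variant; a single discordant pair is weak evidence on its own.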
Emerging methods up to 2025 include optical genome mapping (OGM) via Bionano Genomics, which visualizes long DNA molecules labeled at specific motifs to create genome-wide maps. OGM detects SVs from 500 bp to whole chromosomes with >95% sensitivity, including those in repetitive regions intractable to sequencing, and complements NGS by validating large indels and inversions through molecule-level resolution; it is often integrated in hybrid workflows for clinical diagnostics. CRISPR-based approaches, leveraging Cas9 or dCas9 for targeted enrichment and breakpoint validation, enable precise confirmation of SVs by amplifying junctions for Sanger or NGS readout, enhancing specificity in validation pipelines for complex variants. Additionally, recent machine-learning models, such as SVEA and Primer, have improved SV calling from sequencing data by enhancing prediction accuracy in complex and heterogeneous samples like tumors.

Biological and Clinical Significance

Associations with Phenotypes and Diseases

Structural variations (SVs) play a significant role in human phenotypic diversity and disease susceptibility by altering gene dosage, disrupting regulatory elements, and modifying genome architecture. Pathogenic SVs, such as deletions and duplications, often lead to monogenic disorders by directly impacting coding regions or essential splice sites, while somatic SVs in cancer drive oncogenesis through gene amplifications. SVs also modulate gene expression in ways that track environmental factors such as diet, contributing to adaptive phenotypes. Recent genomic studies highlight the underappreciated burden of de novo SVs in neurodevelopmental disorders and the polygenic contributions of common SVs to multifactorial diseases like type 2 diabetes.

In monogenic diseases, SVs frequently cause loss-of-function mutations in critical genes. For instance, in Duchenne muscular dystrophy (DMD), approximately 65% of cases arise from exonic deletions and 10% from duplications in the DMD gene, leading to absent or truncated dystrophin protein and progressive muscle degeneration. Copy-number variations (CNVs) that disrupt the reading frame result in severe phenotypes, while in-frame variants may cause the milder Becker muscular dystrophy. Similar pathogenic mechanisms occur in other disorders, such as hemophilia A, where intron 22 inversions account for nearly half of severe cases by splitting the F8 gene and preventing proper transcription.

SVs also influence normal phenotypic variation by fine-tuning gene expression levels. A well-studied example is the CNV at the AMY1 locus encoding salivary amylase, where copy number correlates positively with salivary protein levels and starch digestion efficiency. Populations with high-starch diets, such as agricultural societies, exhibit higher average AMY1 copies (6-8 per diploid genome) compared to low-starch hunter-gatherers (4-5 copies), facilitating metabolic adaptation to carbohydrate-rich diets and influencing glycemic responses. This variation demonstrates how SVs contribute to phenotypic diversity without causing overt disease.

In cancer, somatic SVs promote tumorigenesis by amplifying oncogenes or deleting tumor suppressors. Oncogene amplifications, which can arise through intrachromosomal duplications among other mechanisms, occur in up to 15-20% of solid tumors across pan-cancer analyses, driving uncontrolled proliferation via enhanced transcription of growth-related genes. For example, such amplifications are frequent in pancreatic ductal adenocarcinoma, where they correlate with aggressive disease and poor prognosis by altering the amplified gene's proximal regulatory network, including enhancers and binding partners. These structural changes enable rapid oncogene activation, underscoring the role of SVs in cancer evolution.

Recent 2020s studies emphasize de novo SVs in neurodevelopmental disorders like autism spectrum disorder (ASD). Long-read whole-genome sequencing has revealed that de novo SVs, including complex rearrangements and CNVs, account for 5-10% of ASD cases, often disrupting noncoding regulatory elements or merging topologically associating domains to misregulate distant genes. Optical mapping in ASD cohorts identified novel SVs in 20-30% of unresolved exome-negative cases, enriched for brain-expressed genes involved in synaptic function. These findings highlight the substantial, previously underestimated contribution of SVs to ASD etiology beyond single-nucleotide variants.

For common diseases, polygenic SVs add to the genetic architecture alongside single-nucleotide polymorphisms. In type 2 diabetes (T2D), rare and common CNVs collectively influence risk, with large deletions or duplications near insulin-related loci like PPARG or TCF7L2 modulating beta-cell function and insulin sensitivity. Genome-wide CNV association studies in diverse populations have identified CNVs enriched in T2D cases, which may interact with other risk factors to elevate susceptibility. This polygenic SV burden underscores the role of SVs in the multifactorial pathogenesis of T2D.
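The in-frame versus out-of-frame distinction for exonic deletions discussed in this section reduces to a divisibility check: if the total length of the removed exons is a multiple of three, the downstream codons keep their original frame. The sketch below illustrates that arithmetic under simplifying assumptions (whole exons are removed and the flanking exons are spliced together). The exon lengths are toy values invented for the example, not real DMD exon sizes, and the function name is hypothetical.

```python
def deletion_is_in_frame(exon_lengths: dict[int, int],
                         first_exon: int, last_exon: int) -> bool:
    """Return True if deleting exons first_exon..last_exon (inclusive)
    removes a multiple of three nucleotides from the coding sequence."""
    removed = sum(exon_lengths[e] for e in range(first_exon, last_exon + 1))
    return removed % 3 == 0


# Toy exon lengths in nucleotides (hypothetical, for illustration only).
TOY_EXONS = {1: 120, 2: 95, 3: 88, 4: 150, 5: 61}

if __name__ == "__main__":
    for first, last in [(2, 2), (2, 3), (4, 4), (3, 5)]:
        label = "in-frame" if deletion_is_in_frame(TOY_EXONS, first, last) else "out-of-frame"
        print(f"deletion of exons {first}-{last}: {label}")
```

The same check applies to whole-exon duplications. It is only a first-pass heuristic: it ignores effects on splicing, protein domains, and deletions with breakpoints inside exons, so exceptions to the rule occur.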

Role in Evolution and Population Genetics

Structural variations (SVs) contribute substantially to genetic diversity within populations, often exceeding the impact of single-nucleotide variants in terms of affected genomic bases. In long-read sequencing analyses of population cohorts, African ancestry samples exhibit about 23,969 SVs per genome, compared to 19,297 in non-African samples, reflecting higher overall heterozygosity and allelic diversity in African genomes. This disparity underscores SVs as primary drivers of inter-individual differences, with African populations harboring a disproportionate share of novel and heterozygous variants that enhance population-level structural heterogeneity.

SVs also play adaptive roles by facilitating selection and maintaining polymorphism through mechanisms like balancing selection. For instance, the 17q21.31 inversion polymorphism on human chromosome 17, which spans approximately 900 kb and defines the H1 and H2 haplotypes, shows signatures of balancing selection that preserve both orientations at appreciable frequencies across populations. This inversion suppresses recombination within the region, thereby linking multiple loci into co-inherited haplotypes that may confer fitness advantages, such as increased fertility in carriers, contributing to its persistence despite associations with neurological traits.

In the context of speciation, SVs, particularly chromosomal rearrangements like inversions and translocations, promote reproductive isolation by reducing hybrid fertility. Between humans and chimpanzees, at least 26 large-scale chromosomal rearrangements have been identified since their divergence approximately 6-7 million years ago, including pericentric inversions that would disrupt meiotic pairing in hypothetical hybrids. Heterozygosity for these fixed differences leads to meiotic aberrations, such as unbalanced gametes, thereby limiting gene flow and facilitating lineage divergence.

Recent pangenome initiatives have further illuminated the evolutionary significance of SVs by moving beyond a single linear reference genome to capture structural diversity more comprehensively. The Human Pangenome Reference Consortium's 2023 draft, based on 47 phased diploid assemblies from genetically diverse individuals, uncovered novel SVs and complex alleles at loci previously underrepresented, revealing how SVs drive adaptive variation beyond what SNP-focused references detect. In microbial genomes, SVs similarly accelerate adaptation: insertions, deletions, and inversions enable rapid genome restructuring in response to environmental pressures, often outpacing point mutations in generating functional diversity.

Databases and Resources

Major Structural Variation Databases

dbVar, maintained by the National Center for Biotechnology Information (NCBI), serves as a primary public archive for genomic structural variations (SVs) across human and other organisms, cataloging variants larger than 50 base pairs, including insertions, deletions, duplications, inversions, and mobile element insertions. It aggregates data from over 185 studies as of 2025, encompassing both research and clinical submissions, with a focus on human SVs from control, case, and tumor populations. dbVar tracks allele frequencies, population-specific distributions, and study-level metadata, enabling users to query variants by type, size, and genomic location; as of 2025, it includes millions of submitted SVs, facilitating comparative analyses across studies and species.

DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) is a clinician-led resource dedicated to sharing rare, pathogenic SVs and other genomic variants linked to phenotypes in patients with developmental disorders and rare diseases. Launched by the Sanger Institute and international collaborators, it integrates copy-number variants, balanced rearrangements, and sequence variants from over 40,000 consented patient records containing more than 51,000 variants worldwide as of 2025, with controlled access for unpublished data to support clinical interpretation. The database emphasizes genotype-phenotype correlations, allowing users to visualize SV breakpoints, overlapping variants, and associated clinical features through an interactive interface built on Ensembl; version 11.35, released November 6, 2025, integrates gnomAD variant data. It aids diagnostic decision-making by helping to distinguish benign from pathogenic alleles.

The Genome Aggregation Database (gnomAD) provides a large-scale catalog of variants derived from whole-genome and exome sequencing of over 800,000 individuals across diverse populations, prioritizing the documentation of common and benign variation to improve variant interpretation in clinical settings. Its SV dataset, updated in version 4.1 as of 2024 with further browser table releases in April 2025, encompasses approximately 1.2 million SVs from 63,046 samples, including deletions, duplications, insertions, and inversions, with annotations stratified by ancestry and cohort. gnomAD SV annotations emphasize population allele frequencies, which support rare-variant filtering, and integrate quality metrics from short- and long-read technologies, serving as a reference for distinguishing disease-associated SVs from normal variation; the resource excludes severe pediatric disease cohorts to focus on non-pathogenic diversity.

The Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium has produced a comprehensive SV catalog from 2,658 cancer genomes across 38 tumor types, matched to normal tissues, revealing complex rearrangement patterns and kataegis in somatic contexts. This resource, released in 2020 and hosted via the International Cancer Genome Consortium portal, documents over 100,000 driver SVs and recurrent rearrangements with detailed breakpoint resolution; it supports integrative analysis by combining SV calls with transcriptomic and epigenomic profiles for pan-cancer comparisons.

The Earth BioGenome Project (EBP), an international initiative to sequence all eukaryotic species, generates structural variation data through high-quality assemblies for model and non-model organisms, contributing to SV catalogs in biodiversity genomics as of the 2020s. As of 2025, EBP efforts have advanced genome assemblies for various taxa, highlighting evolutionary rearrangements and aiding comparative studies across the tree of life; these datasets are deposited in public repositories like NCBI, emphasizing SVs in ecological and conservation contexts.

Analysis and Annotation Tools

Analysis and annotation tools for structural variations (SVs) encompass a range of software pipelines designed to simulate, detect, annotate, and visualize SVs post-sequencing, enabling researchers to interpret their genomic context and potential impacts. These tools process outputs from detection methods, such as Variant Call Format (VCF) files, to predict functional consequences like gene disruptions or regulatory alterations. Simulation tools generate synthetic datasets to benchmark caller performance, while annotation pipelines integrate genomic annotations to prioritize pathogenic SVs. Visualization software facilitates interactive exploration of SV distributions across genomes.

SVsim is a simulation toolbox that generates synthetic structural variants and corresponding sequencing reads to evaluate SV calling pipelines, supporting deletions, insertions, inversions, and translocations in reference genomes like hg19 or hg38. It automates variant insertion and read simulation, allowing customization of SV density and complexity for benchmarking studies. For instance, SVsim has been used to create ground-truth datasets for assessing short-read callers on simulated human genomes, revealing performance gaps in repetitive regions.

Manta is a widely adopted caller for detecting SVs and indels from short-read paired-end sequencing data, optimized for germline and somatic analysis in small cohorts. It employs a graph-based approach to identify discordant read pairs and split reads, outputting VCF-formatted calls for deletions, duplications, inversions, and translocations with high sensitivity in tumor-normal pairs. Evaluations show Manta achieving precision above 80% for deletions over 50 bp in simulated datasets, though it may underperform in highly repetitive regions compared to long-read methods.
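Because most of these tools exchange SV calls as VCF records, a small parsing sketch may help show what such a record carries. The code below uses only the Python standard library and relies on the `SVTYPE`, `END`, and `SVLEN` INFO keys, which are reserved in the VCF specification. The example lines are invented for illustration, and a real pipeline would normally use a dedicated VCF library rather than hand-rolled parsing.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; valueless flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_sv_record(line: str) -> dict:
    """Extract chromosome, coordinates, type, and length from one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos = cols[0], int(cols[1])
    info = parse_info(cols[7])
    end = int(info.get("END", pos))
    if "SVLEN" in info:
        # SVLEN is negative for deletions; report the absolute size.
        length = abs(int(info["SVLEN"].split(",")[0]))
    else:
        length = end - pos
    return {"chrom": chrom, "pos": pos, "end": end,
            "svtype": info.get("SVTYPE"), "length": length}


# Invented example records (tab-separated VCF data lines).
EXAMPLE_LINES = [
    "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=12500;SVLEN=-2500",
    "chr2\t500\tsv2\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=900;IMPRECISE",
    "chr3\t700\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=730;SVLEN=-30",
]

if __name__ == "__main__":
    records = [parse_sv_record(line) for line in EXAMPLE_LINES]
    # Keep only events at or above the common 50 bp size cutoff for SVs.
    for rec in (r for r in records if r["length"] >= 50):
        print(rec)
```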
Annotation tools like AnnotSV provide comprehensive functional interpretation of SVs by integrating over 40 annotation tracks, including gene overlaps, regulatory elements, and frequency data, to score potential pathogenicity. AnnotSV processes VCF inputs to classify SVs as exonic, intronic, or intergenic, predicting impacts such as frameshifts or enhancer disruptions, and ranks variants by clinical relevance using databases like ClinVar. Performance benchmarks indicate AnnotSV processes 10,000 SVs in under 5 minutes, outperforming older tools in speed and coverage of non-coding effects. Similarly, SVAN annotates SVs from long-read assemblies by assessing overlaps with repeat elements, segmental duplications, and centromeric regions, aiding interpretation in diverse populations. In a study of 1,019 genomes, SVAN highlighted SVs in complex loci, integrating with references for refined calls.

Visualization tools such as the Integrative Genomics Viewer (IGV) enable interactive plotting of SVs alongside aligned reads and annotations in VCF format, supporting zooming into breakpoints for manual validation. IGV's track-based interface displays SV arcs and read pileups, facilitating assessment of supporting evidence like split reads. Circos complements this by generating circular ideograms to depict genome-wide SV patterns, such as intra- and inter-chromosomal events, which is well suited to summarizing large cohorts. Both tools can work from VCF-derived data, allowing layered views of SV density and types across populations.

Recent advancements include SVision-pro, a 2024 neural network framework for visualizing and discovering de novo and somatic SVs from long-read data, which represents alignments as images for instance segmentation of complex events. It achieves over 90% recall for multi-breakpoint SVs in cancer genomes, surpassing traditional callers in repetitive regions. AI-based callers like DeepSV leverage convolutional neural networks to filter and call deletions from short-read alignments, reaching a 95% F1-score on benchmark datasets by modeling read depth and pairing signals. These tools enhance precision in challenging genomic contexts, such as tandem repeats.
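The precision, recall, and F1 figures quoted for SV callers depend on how a call is matched to a truth-set variant. The sketch below shows one possible matching rule, assuming same chromosome, same type, and at least 50% reciprocal overlap. That threshold, the `SV` tuple, and the greedy one-to-one matching are illustrative assumptions, not the procedure of any benchmark cited here.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by the longer of the two intervals."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def benchmark(calls: list, truth: list, min_ro: float = 0.5) -> dict:
    """Greedily match each call to at most one unmatched truth variant."""
    unmatched = list(truth)
    tp = 0
    for call in calls:
        for t in unmatched:
            if call.svtype == t.svtype and reciprocal_overlap(call, t) >= min_ro:
                unmatched.remove(t)  # each truth variant is matched once
                tp += 1
                break
    fp = len(calls) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 6000, "DUP")]
    calls = [SV("chr1", 1100, 2050, "DEL"), SV("chr1", 5000, 5200, "DUP")]
    print(benchmark(calls, truth))
```

Reported metrics can shift noticeably with the overlap threshold and with whether breakpoint distance or genotype is also required to match, which is worth keeping in mind when comparing figures across tools.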
