Gene duplication

from Wikipedia

Gene duplication (or chromosomal duplication or gene amplification) is a mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene. Gene duplications can arise as products of several types of errors in DNA replication and repair machinery as well as through fortuitous capture by selfish genetic elements. Common sources of gene duplications include ectopic recombination, retrotransposition events, aneuploidy, polyploidy, and replication slippage.[1]

Mechanisms of duplication


Ectopic recombination


Duplications arise from an event termed unequal crossing-over that occurs during meiosis between misaligned homologous chromosomes. The chance of it happening is a function of the degree of sharing of repetitive elements between two chromosomes. The products of this recombination are a duplication at the site of the exchange and a reciprocal deletion. Ectopic recombination is typically mediated by sequence similarity at the duplicate breakpoints, which form direct repeats. Repetitive genetic elements such as transposable elements offer one source of repetitive DNA that can facilitate recombination, and they are often found at duplication breakpoints in plants and mammals.[2]

Schematic of a region of a chromosome before and after a duplication event

Replication slippage


Replication slippage is an error in DNA replication that can produce duplications of short genetic sequences. During replication DNA polymerase begins to copy the DNA. At some point during the replication process, the polymerase dissociates from the DNA and replication stalls. When the polymerase reattaches to the DNA strand, it aligns the replicating strand to an incorrect position and incidentally copies the same section more than once. Replication slippage is also often facilitated by repetitive sequences, but requires only a few bases of similarity.[citation needed]

Retrotransposition


Retrotransposons, mainly L1, can occasionally act on cellular mRNA. Transcripts are reverse transcribed to DNA and inserted at a random place in the genome, creating retrogenes. The resulting sequences usually lack introns and often contain poly(A) sequences that are also integrated into the genome. Many retrogenes display changes in gene regulation in comparison to their parental gene sequences, which sometimes results in novel functions. Retrogenes can move between different chromosomes to shape chromosomal evolution.[3]

Aneuploidy


Aneuploidy occurs when nondisjunction at a single chromosome results in an abnormal number of chromosomes. Aneuploidy is often harmful and in mammals regularly leads to spontaneous abortions (miscarriages). Some aneuploid individuals are viable, for example trisomy 21 in humans, which leads to Down syndrome. Aneuploidy often alters gene dosage in ways that are detrimental to the organism; therefore, it is unlikely to spread through populations.

Polyploidy


Polyploidy, or whole genome duplication, is a product of nondisjunction during meiosis which results in additional copies of the entire genome. Polyploidy is common in plants, but it has also occurred in animals, with two rounds of whole genome duplication (2R event) in the vertebrate lineage leading to humans.[4] It has also occurred in the hemiascomycete yeasts ~100 mya.[5][6]

After a whole genome duplication, there is a relatively short period of genome instability, extensive gene loss, elevated levels of nucleotide substitution, and regulatory network rewiring.[7][8] In addition, gene dosage effects play a significant role.[9] Thus, most duplicates are lost within a short period; however, a considerable fraction of duplicates survive.[10] Interestingly, genes involved in regulation are preferentially retained.[11][12] Furthermore, retention of regulatory genes, most notably the Hox genes, has led to adaptive innovation.

Rapid evolution and functional divergence have been observed at the level of the transcription of duplicated genes, usually by point mutations in short transcription factor binding motifs.[13][14] Furthermore, rapid evolution of protein phosphorylation motifs, usually embedded within rapidly evolving intrinsically disordered regions, is another contributing factor for the survival and rapid adaptation/neofunctionalization of duplicate genes.[15] Thus, a link seems to exist between gene regulation (at least at the post-translational level) and genome evolution.[15]

Polyploidy is also a well known source of speciation, as offspring, which have different numbers of chromosomes compared to parent species, are often unable to interbreed with non-polyploid organisms. Whole genome duplications are thought to be less detrimental than aneuploidy as the relative dosage of individual genes should be the same.

As an evolutionary event

Evolutionary fate of duplicate genes

Rate of gene duplication


Comparisons of genomes demonstrate that gene duplications are common in most species investigated. This is indicated by variable copy numbers (copy number variation) in the genome of humans[16][17] or fruit flies.[18] However, it has been difficult to measure the rate at which such duplications occur. Recent studies yielded a first direct estimate of the genome-wide rate of gene duplication in Caenorhabditis elegans, the first multicellular eukaryote for which such an estimate became available. The gene duplication rate in C. elegans is on the order of 10−7 duplications/gene/generation; that is, in a population of 10 million worms, one will have a gene duplication per generation. This rate is two orders of magnitude greater than the spontaneous rate of point mutation per nucleotide site in this species.[19] Older (indirect) studies reported locus-specific duplication rates in bacteria, Drosophila, and humans ranging from 10−3 to 10−7/gene/generation.[20][21][22]
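
To put these rates in perspective, the minimal sketch below multiplies the quoted per-gene, per-generation rate by population size and by an assumed gene count. The population size and the gene count of roughly 20,000 are illustrative assumptions, not values taken from the cited studies.

```python
# Illustrative back-of-the-envelope calculation of expected duplication counts.
# The rate comes from the text; population size and gene count are assumptions.
duplication_rate = 1e-7       # duplications per gene per generation (from the text)
population_size = 10_000_000  # worms in the hypothetical population
genes_per_genome = 20_000     # assumed approximate gene count, for illustration only

# Expected duplications of one particular gene somewhere in the population, per generation
per_gene = duplication_rate * population_size
# Expected duplication events per individual genome, per generation
per_genome = duplication_rate * genes_per_genome

print(f"Expected duplications of a given gene per generation: {per_gene:.1f}")
print(f"Expected duplications per genome per generation: {per_genome:.3f}")
```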

Genome duplication in cancer


Genome duplication does not occur as a single event but as a continuous process during tumor progression, generating cells with different degrees of ploidy. More than 60% of the tumors analyzed showed multiple whole-genome duplication (WGD) events, suggesting an active evolutionary model within the tumor.[23]

Neofunctionalization


Gene duplications are an essential source of genetic novelty that can lead to evolutionary innovation. Duplication creates genetic redundancy, where the second copy of the gene is often free from selective pressure—that is, mutations of it have no deleterious effects on its host organism. If one copy of a gene experiences a mutation that affects its original function, the second copy can serve as a 'spare part' and continue to function correctly. Thus, duplicate genes accumulate mutations faster than a functional single-copy gene, over generations of organisms, and it is possible for one of the two copies to develop a new and different function. Some examples of such neofunctionalization are the apparent mutation of a duplicated digestive gene in a family of ice fish into an antifreeze gene, a duplication leading to a novel snake venom gene,[24] and the synthesis of 1 beta-hydroxytestosterone in pigs.[25]

Gene duplication is believed to play a major role in evolution; this stance has been held by members of the scientific community for over 100 years.[26] Susumu Ohno was one of the most famous developers of this theory in his classic book Evolution by gene duplication (1970).[27] Ohno argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor.[28] Major genome duplication events can be quite common. It is believed that the entire yeast genome underwent duplication about 100 million years ago.[29] Plants are the most prolific genome duplicators. For example, wheat is hexaploid (a kind of polyploid), meaning that it has six copies of its genome.

Subfunctionalization


Another possible fate for duplicate genes is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy. This leads to a neutral "subfunctionalization" (a process of constructive neutral evolution) or DDC (duplication-degeneration-complementation) model,[30][31] in which the functionality of the original gene is distributed among the two copies. Neither gene can be lost, as both now perform important non-redundant functions, but ultimately neither is able to achieve novel functionality.

Subfunctionalization can occur through neutral processes in which mutations accumulate with no detrimental or beneficial effects. However, in some cases subfunctionalization can occur with clear adaptive benefits. If an ancestral gene is pleiotropic and performs two functions, often neither one of these two functions can be changed without affecting the other function. In this way, partitioning the ancestral functions into two separate genes can allow for adaptive specialization of subfunctions, thereby providing an adaptive benefit.[32]

Loss


Often the genomic variation resulting from duplication leads to gene dosage-dependent neurological disorders such as Rett-like syndrome and Pelizaeus–Merzbacher disease.[33] Such detrimental mutations are likely to be lost from the population and will not be preserved or develop novel functions. However, many duplications are, in fact, neither detrimental nor beneficial, and these neutral sequences may be lost or may spread through the population via genetic drift.

Identifying duplications in sequenced genomes


Criteria and single genome scans


The two genes that exist after a gene duplication event are called paralogs and usually code for proteins with a similar function and/or structure. By contrast, orthologous genes are genes present in different species that are each derived from the same ancestral sequence. (See Homology of sequences in genetics.)

It is important (but often difficult) to differentiate between paralogs and orthologs in biological research. Experiments on human gene function can often be carried out on other species if a homolog to a human gene can be found in the genome of that species, but only if the homolog is orthologous. If they are paralogs and resulted from a gene duplication event, their functions are likely to be too different. One or more copies of duplicated genes that constitute a gene family may be affected by insertion of transposable elements, which causes significant sequence variation between them and may ultimately drive divergent evolution. This may also reduce the chance and rate of gene conversion between the homologs of gene duplicates, owing to little or no similarity in their sequences.

Paralogs can be identified in single genomes through a sequence comparison of all annotated gene models to one another. Such a comparison can be performed on translated amino acid sequences (e.g. BLASTp, tBLASTx) to identify ancient duplications or on DNA nucleotide sequences (e.g. BLASTn, megablast) to identify more recent duplications. Most studies to identify gene duplications require reciprocal-best-hits or fuzzy reciprocal-best-hits, where each paralog must be the other's single best match in a sequence comparison.[34]
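
The reciprocal-best-hit criterion described above can be sketched as follows, assuming an all-against-all comparison (e.g., BLASTp tabular output) has already been run. The input format, field names, and toy values are illustrative assumptions.

```python
# Reciprocal-best-hit (RBH) paralog calling from an all-vs-all similarity search.
# Assumes rows of (query_id, subject_id, bit_score) — an illustrative format.
def best_hits(rows):
    """Return the single best non-self hit for each query, by bit score."""
    best = {}
    for query, subject, score in rows:
        if query == subject:
            continue  # ignore self-hits
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best

def reciprocal_best_hits(rows):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best = best_hits(rows)
    pairs = set()
    for query, (subject, _) in best.items():
        if best.get(subject, (None,))[0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs

# Toy example: geneA and geneA2 are each other's best match -> putative paralogs.
hits = [
    ("geneA", "geneA2", 950.0), ("geneA2", "geneA", 940.0),
    ("geneA", "geneB", 120.0), ("geneB", "geneC", 300.0),
]
print(reciprocal_best_hits(hits))  # {('geneA', 'geneA2')}
```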

Most gene duplications exist as low copy repeats (LCRs), rather than highly repetitive sequences such as transposable elements. They are mostly found in pericentromeric, subtelomeric, and interstitial regions of a chromosome. Many LCRs, due to their size (>1 kb), similarity, and orientation, are highly susceptible to duplications and deletions.

Genomic microarrays detect duplications


Technologies such as genomic microarrays, also called array comparative genomic hybridization (array CGH), are used to detect chromosomal abnormalities, such as microduplications, in a high throughput fashion from genomic DNA samples. In particular, DNA microarray technology can simultaneously monitor the expression levels of thousands of genes across many treatments or experimental conditions, greatly facilitating the evolutionary studies of gene regulation after gene duplication or speciation.[35][36]

Next generation sequencing


Gene duplications can also be identified through the use of next-generation sequencing platforms. The simplest means to identify duplications in genomic resequencing data is through the use of paired-end sequencing reads. Tandem duplications are indicated by sequencing read pairs which map in abnormal orientations. Through a combination of increased sequence coverage and abnormal mapping orientation, it is possible to identify duplications in genomic sequencing data.
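
As a rough illustration of the read-pair signal described above, the sketch below classifies read pairs by mapping orientation and insert size. The orientation convention (forward-reverse pairs as "proper", everted reverse-forward pairs as a tandem-duplication signal) and the cutoffs are simplified assumptions; real structural-variant callers handle many more cases.

```python
# Simplified classification of paired-end read signatures (illustrative only).
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom: str
    pos1: int      # leftmost mapped position of read 1
    strand1: str   # "+" or "-"
    pos2: int      # leftmost mapped position of read 2
    strand2: str

def classify(pair: ReadPair, mean_insert: int = 400, tolerance: int = 150) -> str:
    # Order the two reads by position, then inspect strand pattern and spacing.
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    insert = right[0] - left[0]
    if left[1] == "-" and right[1] == "+":
        return "everted pair: candidate tandem duplication"
    if left[1] == "+" and right[1] == "-" and abs(insert - mean_insert) <= tolerance:
        return "proper pair"
    return "other anomaly (possible deletion, inversion, or translocation)"

print(classify(ReadPair("chr2", 10_500, "-", 11_000, "+")))  # duplication signal
print(classify(ReadPair("chr2", 10_500, "+", 10_900, "-")))  # proper pair
```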

Nomenclature

Human karyotype with annotated bands and sub-bands as used for the nomenclature of chromosome abnormalities. It shows dark and white regions as seen on G banding. Each row is vertically aligned at centromere level. It shows 22 homologous autosomal chromosome pairs, both the female (XX) and male (XY) versions of the two sex chromosomes, as well as the mitochondrial genome (at bottom left).

The International System for Human Cytogenomic Nomenclature (ISCN) is an international standard for human chromosome nomenclature, which includes band names, symbols, and abbreviated terms used in the description of human chromosomes and chromosome abnormalities. Abbreviations include dup for duplications of parts of a chromosome.[37] For example, dup(17p12) causes Charcot–Marie–Tooth disease type 1A.[38]

As amplification


Gene duplication does not necessarily constitute a lasting change in a species' genome. In fact, such changes often do not last past the initial host organism. From the perspective of molecular genetics, gene amplification is one of many ways in which a gene can be overexpressed. Genetic amplification can occur artificially, as with the use of the polymerase chain reaction technique to amplify short strands of DNA in vitro using enzymes, or it can occur naturally, as described above. If it is a natural duplication, it can still take place in a somatic cell, rather than a germline cell (which would be necessary for a lasting evolutionary change).

Role in cancer


Duplications of oncogenes are a common cause of many types of cancer. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Recent comprehensive patient-level classification and quantification of driver events in TCGA cohorts revealed that there are on average 12 driver events per tumor, of which 1.5 are amplifications of oncogenes.[39]

Common oncogene amplifications in human cancers

Cancer type | Associated gene amplification | Prevalence of amplification in cancer type (percent)
Breast cancer | MYC | 20%[40]
Breast cancer | ERBB2 (HER2) | 20%[40]
Breast cancer | CCND1 (Cyclin D1) | 15–20%[40]
Breast cancer | FGFR1 | 12%[40]
Breast cancer | FGFR2 | 12%[40]
Cervical cancer | MYC | 25–50%[40]
Cervical cancer | ERBB2 | 20%[40]
Colorectal cancer | HRAS | 30%[40]
Colorectal cancer | KRAS | 20%[40]
Colorectal cancer | MYB | 15–20%[40]
Esophageal cancer | MYC | 40%[40]
Esophageal cancer | CCND1 | 25%[40]
Esophageal cancer | MDM2 | 13%[40]
Gastric cancer | CCNE (Cyclin E) | 15%[40]
Gastric cancer | KRAS | 10%[40]
Gastric cancer | MET | 10%[40]
Glioblastoma | ERBB1 (EGFR) | 33–50%[40]
Glioblastoma | CDK4 | 15%[40]
Head and neck cancer | CCND1 | 50%[40]
Head and neck cancer | ERBB1 | 10%[40]
Head and neck cancer | MYC | 7–10%[40]
Hepatocellular cancer | CCND1 | 13%[40]
Neuroblastoma | MYCN | 20–25%[40]
Ovarian cancer | MYC | 20–30%[40]
Ovarian cancer | ERBB2 | 15–30%[40]
Ovarian cancer | AKT2 | 12%[40]
Sarcoma | MDM2 | 10–30%[40]
Sarcoma | CDK4 | 10%[40]
Small cell lung cancer | MYC | 15–20%[40]


Whole-genome duplications are also frequent in cancers, detected in 30% to 36% of tumors from the most common cancer types.[41][42] Their exact role in carcinogenesis is unclear, but in some cases they cause loss of chromatin segregation, producing chromatin conformation changes that in turn drive oncogenic epigenetic and transcriptional modifications.[43]

from Grokipedia
Gene duplication is a fundamental process in molecular evolution whereby a segment of DNA containing a gene is copied, resulting in two or more identical copies within the genome that can subsequently diverge through mutation. This mechanism provides organisms with additional genetic material, allowing one copy to retain its original function while the duplicate acquires novel roles, thereby contributing to evolutionary innovation and adaptation without disrupting essential processes. Gene duplications occur through several distinct mechanisms, including whole-genome duplication (WGD), where the entire set of chromosomes is replicated, often seen in plants and early vertebrates; tandem duplication, involving adjacent copies formed by unequal crossing over during recombination; segmental duplication, which copies large chromosomal regions; and retroduplication, where reverse-transcribed mRNA creates intronless gene copies via retrotransposons. These processes vary in frequency across species—for instance, WGD events have shaped over 64% of genes in many genomes, while tandem duplications account for about 10% in some lineages—and can lead to polyploidy, which is particularly prevalent in flowering plants. Evolutionarily, gene duplication serves as a primary source of genetic innovation by enabling neofunctionalization, where the duplicate gains a new function, or subfunctionalization, where the original functions are partitioned between copies. This has driven key adaptations, such as enhanced stress resistance in plants and the diversification of gene families, with nearly all genes tracing back to ancient duplications. However, duplications can also contribute to disease when dosage imbalances occur, as in over 80% of human disease-associated genes that have undergone duplication. Overall, the retention and divergence of duplicates, with a typical half-life of around 4 million years, underscore their role in long-term genomic evolution across eukaryotes.

Fundamentals

Definition and Types

Gene duplication is a fundamental evolutionary process in which a segment of DNA containing a functional gene is copied within the genome, resulting in two or more identical or nearly identical copies of the original gene. This duplication creates genetic redundancy, allowing one copy to maintain the original function while the other may accumulate mutations without immediate deleterious effects. The process is widespread across eukaryotes and prokaryotes, contributing to genome expansion and functional innovation, as first systematically explored in Susumu Ohno's seminal work.

Gene duplications are classified into several types based on their genomic scale and mechanism of origin. Tandem duplications occur when copies are generated adjacent to each other on the same chromosome, often through errors in recombination, resulting in gene clusters. Dispersed duplications produce non-adjacent copies scattered across the genome, typically via transposition events like retrotransposition or DNA-mediated movement. Segmental duplications involve larger blocks of DNA, encompassing multiple genes, duplicated within or between chromosomes. Whole-genome duplications (WGD), also known as polyploidy events, replicate the entire genome, leading to multiple copies of all genes simultaneously; these are particularly common in plants but have occurred in vertebrate lineages as well.

At the molecular level, gene duplication immediately introduces redundancy, where the duplicate copies share overlapping functions and are initially under relaxed purifying selection, as mutations in one copy are buffered by the other. This reduces selective pressure on the duplicates, permitting neutral or slightly deleterious changes to accumulate without disrupting essential functions, though most duplicates are eventually lost or pseudogenized. Functional divergence, if it occurs, arises later through processes like neofunctionalization or subfunctionalization, but the initial phase is characterized by preserved sequence similarity and redundancy. A classic example of whole-genome duplication's impact is seen in the Hox gene clusters of vertebrates, where two rounds of WGD in early vertebrates produced four clusters (HoxA–D) from an ancestral single cluster, enabling spatial patterning innovations in body plans such as paired appendages.

Historical Context

The concept of gene duplication emerged in the early 20th century through cytogenetic studies in plants, where polyploidy—whole-genome duplication—was recognized as a common mechanism contributing to speciation and variation. Dutch botanist Hugo de Vries first described polyploid mutants in Oenothera in 1907, and by the 1910s, researchers like Albert F. Blakeslee and Øjvind Winge had identified polyploidy in various angiosperms, attributing it to chromosome doubling that amplified gene copies and facilitated evolutionary novelty. These observations laid foundational evidence for duplication events at the genomic scale, particularly in plants, where polyploidy was estimated to occur in up to 70% of species by mid-century.

In animals, early molecular insights came from Drosophila research in the 1930s. Calvin B. Bridges demonstrated in 1936 that the Bar eye phenotype resulted from a tandem duplication of a chromosomal segment, providing the first direct evidence of segmental gene duplication and its phenotypic effects. This work hinted at duplication as a source of genetic variation, though it was viewed primarily as a cytological anomaly rather than an evolutionary driver. By the 1960s, the discovery of multigene families further illuminated the prevalence of duplications; for instance, ribosomal DNA (rDNA) was identified as a tandemly repeated multigene family in Drosophila by Ritossa and Spiegelman in 1965, revealing hundreds of identical copies essential for ribosome production. Similar findings in other organisms, including the immunoglobulin genes, underscored that duplications generated families of related sequences, challenging the notion of genes as unique loci.

The modern synthesis of gene duplication as a major evolutionary mechanism crystallized in 1970 with Susumu Ohno's seminal book Evolution by Gene Duplication, which argued that duplications provide raw material for innovation by freeing redundant copies from selective constraints, allowing divergence into new functions. This perspective integrated with Motoo Kimura's neutral theory of molecular evolution, proposed in 1968 and expanded in the 1970s, positing that many duplications and subsequent mutations are selectively neutral, fixed by genetic drift rather than adaptive pressure, thus explaining the abundance of pseudogenes and paralogs in genomes. Confirmation accelerated in the 1980s and 1990s with sequencing technologies; for example, sequencing of the human beta-globin cluster in 1980 revealed ancient duplications underlying globin gene evolution, while the 1996 Saccharomyces cerevisiae genome sequence identified widespread paralogs from a whole-genome duplication event approximately 100 million years ago. These molecular data validated Ohno's hypothesis at scale, showing duplications accounted for 15–20% of eukaryotic genes.

Early reception of these ideas was marked by debates over whether duplications primarily drive adaptive innovation or accumulate neutrally. Ohno's adaptive emphasis faced skepticism from neutralists, who argued most fixed duplicates contribute little to fitness and are lost or silenced, as evidenced by high pseudogenization rates in genomes. Proponents of adaptive innovation, however, highlighted cases like the Hox gene clusters, sequenced in the 1990s, where duplications correlated with morphological complexity. This tension persisted into the late 20th century, shaping models that balanced neutral drift with occasional positive selection in duplicate retention.

Mechanisms

Unequal Crossing Over

Unequal crossing over is a key mechanism of gene duplication that occurs during meiosis, when misaligned homologous chromosomes or sister chromatids exchange genetic material unevenly. This misalignment leads to one recombinant chromatid receiving an extra copy of a gene or segment, while the reciprocal product experiences a deletion. The process is homology-dependent, relying on sequence similarity to initiate pairing, but errors in alignment result in non-allelic homologous recombination (NAHR), producing tandem duplications.

At the molecular level, repetitive sequences play a critical role in facilitating misalignment. Low-copy repeats (LCRs), which are paralogous segments greater than 1 kb with over 90% sequence identity, mediate NAHR by promoting ectopic pairing between non-allelic sites. Similarly, Alu elements, abundant short interspersed nuclear elements, can drive unequal exchanges due to their high copy number and sequence homology, often resulting in local duplications or larger copy-number variants. These events typically yield tandem arrays, where duplicated genes are arranged in direct orientation adjacent to the original copy, enhancing the potential for further evolutionary changes. Segmental duplications, involving large (often >10 kb) non-tandem copies of chromosomal regions, can also arise via NAHR between dispersed LCRs, contributing to genomic architecture and disease susceptibility.

The frequency of unequal crossing over is elevated in genomic regions enriched with LCRs or Alu elements, as these repeats increase the likelihood of misalignment during meiosis. Such hotspots are common in gene clusters prone to instability, where even low-level homology (e.g., 25–39 bp identity) can suffice for recombination. In human sperm, for instance, de novo duplications occur at rates around 10−5 per gamete, predominantly through intermolecular exchanges between homologous chromosomes. A prominent example is the duplication within the human alpha-globin gene cluster on chromosome 16, where unequal crossing over between the alpha2 (HBA2) and alpha1 (HBA1) genes generates anti-3.7 kb duplications, resulting in three alpha-globin genes (ααα configuration). This event, driven by Z-box repetitive homology blocks flanking the genes, is reciprocal to common deletions and underscores how such mechanisms contribute to both normal variation and disease predisposition.

Replication-Based Errors

Replication-based errors during DNA replication represent a primary mechanism for generating small-scale gene duplications, particularly those involving short tandem repeats (STRs). In this process, known as replication slippage or slipped-strand mispairing, the DNA polymerase temporarily dissociates from the template strand within repetitive sequences, leading to misalignment upon re-annealing. This slippage can cause the polymerase to skip forward (resulting in deletions) or repeat a segment (producing duplications) of the template, typically affecting sequences under 1 kb in length. Such errors are exacerbated in regions rich in STRs, where the repetitive nature facilitates strand dissociation during the S phase of the cell cycle.

At the molecular level, replication fork stalling plays a central role, often triggered by non-B DNA structures such as hairpins or triplexes formed in repetitive or AT-rich sequences during strand unwinding. The fork stalling and template switching (FoSTeS) model describes how a stalled fork disengages, with the nascent strand invading a secondary template via microhomology (typically 2–15 bp), resuming synthesis and incorporating duplicated material. This mechanism accounts for both simple tandem duplications and more complex rearrangements with junctional microhomologies or insertions. Non-B structures, like stable hairpins in CAG/CTG repeats, impede polymerase progression, increasing the likelihood of template switching and duplication events. Error-prone DNA polymerases with lower fidelity (e.g., inversely correlated with proofreading efficiency) further promote slippage by stabilizing misaligned intermediates during synthesis.

These errors are more frequent for microduplications under 1 kb, occurring at elevated rates in regions of genomic instability, such as fragile sites or late-replicating domains. Replication timing influences susceptibility, with late-replicating regions exhibiting higher rates due to prolonged exposure to endogenous stresses and reduced repair efficiency. Experimental induction of replication stress (e.g., via aphidicolin) generates non-recurrent copy number variants (CNVs), including duplications, at frequencies mimicking spontaneous events, with breakpoints often showing microhomologies consistent with FoSTeS. Small tandem duplications of 15–300 bp are observed in up to 25% of certain alleles, underscoring their prevalence in genomic variation. A representative example is the expansion of CAG trinucleotide repeats in the HTT gene, associated with Huntington's disease. Slippage during replication of these repeats leads to duplication of the triplet units, with hairpin formation on the nascent strand promoting further iterations and expansions beyond 36 repeats, resulting in toxic protein aggregates. This process highlights how replication errors in STRs can drive pathological duplications while contributing to evolutionary variation in repeat copy number.

Transposition Events

Transposition events contribute to gene duplication through retrotransposition, a process in which mature mRNA transcripts are reverse-transcribed into complementary DNA (cDNA) and randomly inserted into new genomic locations, generating retrogene copies of the original gene. This RNA-mediated mechanism differs from direct DNA duplication by relying on an intermediary transcript, often utilizing the enzymatic machinery of endogenous retroelements to facilitate the insertion. At the molecular level, long interspersed nuclear element-1 (LINE-1 or L1) retrotransposons play a central role by providing the reverse transcriptase enzyme, which converts the mRNA into cDNA via a target-primed reverse transcription process. The resulting retrogenes typically lack introns, as the source mRNA is processed and spliced, and they often insert without their original promoters or regulatory elements, leading to poly(A) tails at the 3' end but potential initial transcriptional silence unless new regulatory sequences are acquired nearby. These characteristics distinguish retrogenes from intron-containing duplicates formed by other mechanisms. Retrotransposition is particularly prevalent in mammalian genomes, where LINE-1 activity has driven a significant portion of processed pseudogene formation, accounting for about 70% of non-functional duplicates in humans. In the human genome, estimates indicate approximately 8,000 to 17,000 retrocopies exist, many of which originated from lineage-specific expansions around 40–50 million years ago. This abundance underscores retrotransposition's role in genomic plasticity, though most retrogenes become pseudogenes, with a subset evolving new functions post-fixation. A notable example of retrotransposition's impact on gene family expansion involves the PGAM family, where functional retrocopies like PGAM5 have arisen and acquired new roles in cellular processes.

Chromosomal Alterations

Chromosomal alterations represent a major mechanism for generating gene duplications on a large scale, primarily through aneuploidy and polyploidy, which result in the gain or multiplication of entire chromosomes or genomes, thereby creating multiple copies of numerous genes simultaneously. Aneuploidy involves the abnormal gain or loss of one or more chromosomes, leading to an imbalance in gene dosage where affected cells possess extra or fewer copies of genes on those chromosomes. This process often arises from nondisjunction, the failure of homologous chromosomes or sister chromatids to separate properly during meiosis or mitosis, which disrupts normal chromosome segregation and produces gametes or daughter cells with altered numbers. In contrast, polyploidy entails the duplication of the entire genome, instantly doubling or multiplying gene copies across all chromosomes, and can occur through mechanisms such as hybridization between species (leading to allopolyploidy) or endoreduplication, where cells undergo repeated DNA replication without mitosis or cytokinesis. These alterations extend beyond single-gene events, affecting vast genomic regions and providing raw material for evolutionary innovation.

Aneuploidy is typically transient in most organisms due to its disruptive effects on cellular function, but it can become fixed in certain lineages, contributing to gene copy variation. Polyploidy, however, is far more stable and prevalent, particularly in plants, where it serves as a key driver of speciation and diversification. Recent estimates suggest that polyploidy accompanies approximately 15% of speciation events in angiosperms, though older studies proposed higher figures of 30–80%. In animals, polyploidy and related aneuploid events are rarer owing to challenges in sex determination and development, yet they have played pivotal roles in major evolutionary transitions, such as in vertebrates. For instance, two rounds of whole-genome duplication (2R) occurred in the ancestral vertebrate lineage approximately 500–600 million years ago, followed by a third round (3R) in teleost fish, which expanded gene families essential for complex traits like the nervous and immune systems. These events underscore how chromosomal alterations can facilitate rapid genomic reconfiguration without relying on incremental small-scale duplications.

Evolutionary Implications

Duplication Rates

Gene duplication rates are typically estimated through phylogenetic analyses that reconstruct the divergence times of paralogous gene pairs using molecular clocks calibrated against known evolutionary timelines. These methods account for synonymous substitution rates (Ks) between duplicates to infer when duplications occurred, providing a framework to quantify both ongoing small-scale events and episodic bursts from whole-genome duplications (WGDs). In animals, the average duplication rate is approximately 0.01 events per gene per million years, based on genomic surveys of species such as humans, nematodes, and fruit flies. This rate reflects primarily tandem and segmental duplications, with estimates varying slightly by lineage; for instance, rates in vertebrates range from 0.0005 to 0.004 duplications per gene per million years when focusing on recent events. In the human genome, duplicated genes constitute about 8–20% of the total gene content, underscoring the cumulative impact of these events over evolutionary time.

Plants exhibit generally higher effective duplication rates, often exceeding 0.01 per gene per million years when including polyploidy-driven WGDs, which are far more prevalent in plants than in animals and can double the gene complement instantaneously. For example, many plant lineages show elevated retention of duplicates, with half-lives of 17–25 million years compared to 3–7 million years in animals, due to these polyploid events.

Several factors influence these rates across taxa. Larger genome sizes correlate with higher duplication frequencies, as expanded non-coding regions facilitate segmental duplications and transposon-mediated events. Recombination hotspots, where crossing over is more likely, also elevate local duplication rates by promoting non-allelic homologous recombination. Selection pressures play a key role in modulating net rates by favoring retention of duplicates under dosage constraints or novel functions, while purging redundant copies; purifying selection is stronger in essential genes, leading to faster loss rates. Variation is evident across taxa—for instance, teleost fishes display accelerated duplication dynamics following their ancient WGD event approximately 300–450 million years ago, resulting in higher proportions of paralogs (up to 20–30% in some species) and elevated tandem duplication rates compared to other vertebrates. This burst contributed to the diversification of teleosts, which comprise over half of all vertebrate species.
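
The molecular-clock logic behind Ks-based dating mentioned above can be sketched in a few lines: after duplication, both copies accumulate synonymous substitutions independently, so the age is roughly Ks divided by twice the per-year substitution rate. The rate used here is an order-of-magnitude assumption for illustration, not a value from the cited studies.

```python
# Dating a duplication from synonymous divergence (Ks) under a strict molecular clock.
# T = Ks / (2 * r): both copies diverge from each other at twice the per-lineage rate.
def duplication_age_mya(ks: float, subs_per_site_per_year: float) -> float:
    return ks / (2.0 * subs_per_site_per_year) / 1e6  # convert years to million years

# Assumed example rate (order of magnitude only): 1.5e-9 synonymous subs/site/year.
rate = 1.5e-9
for ks in (0.1, 0.5, 1.0):
    print(f"Ks = {ks:.1f}  ->  ~{duplication_age_mya(ks, rate):.0f} million years ago")
```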

Neofunctionalization

Neofunctionalization refers to the evolutionary process whereby, after gene duplication, one paralog acquires a novel function—such as a new enzymatic activity or a distinct expression pattern—while the other copy preserves the original ancestral role. This divergence enables the innovation of new traits without disrupting established functions, contributing to adaptive evolution across lineages. The concept builds on the initial redundancy created by duplication, which provides a genetic buffer for mutational experimentation.

At the molecular level, neofunctionalization arises from relaxed purifying selection on the duplicate copy, allowing neutral or slightly deleterious mutations to accumulate until beneficial ones confer selective advantages. These adaptive changes often involve alterations in regulatory regions, leading to novel spatiotemporal expression, or structural modifications like domain shuffling that enable new interactions or catalytic properties. For instance, mutations in promoter sequences can shift expression to new tissues, while domain shuffling might repurpose binding sites for different substrates. Such mechanisms have been observed in enzyme families, where duplicated copies develop enhanced specificity or entirely new reactions.

Evidence for neofunctionalization emerges from comparative genomics, revealing paralogous genes with specialized roles that diverged post-duplication. A prominent example is the globin gene family in vertebrates, where ancient duplications led to paralogs like alpha and beta hemoglobins adapting distinct functions in oxygen transport and storage across developmental stages and tissues, such as fetal versus adult forms. Similarly, in insects, the bithorax complex demonstrates neofunctionalization through gene duplicates that acquired unique regulatory roles in body patterning. These cases highlight how paralogs evolve non-overlapping functions, supported by sequence divergence and functional assays.

Theoretical models underpin neofunctionalization, with Susumu Ohno's foundational framework proposing that gene duplication supplies the raw material for evolutionary novelty by freeing one copy from selective constraints. Ohno emphasized that this redundancy fosters innovation, as seen in genome expansions. Quantitative models extend this by estimating the probability of fixation for advantageous mutations in duplicates under positive selection, often approximating 2s (where s is the selection coefficient) compared to neutral drift, which influences the likelihood of permanent divergence. These probabilistic approaches, informed by population genetics, predict higher neofunctionalization rates in large populations with strong selective pressures.
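
The fixation probabilities contrasted above can be made explicit with the standard diffusion approximation from population genetics; the sketch below is a textbook formula applied for illustration, not a model taken from the studies cited here.

```python
# Fixation probability of a new mutation (one copy in a diploid population of size N),
# using the diffusion approximation. For small positive s and large N this approaches
# ~2s; for s = 0 it reduces to the neutral value 1/(2N).
import math

def fixation_probability(N: int, s: float) -> float:
    if abs(s) < 1e-12:
        return 1.0 / (2 * N)          # neutral case: drift alone
    p0 = 1.0 / (2 * N)                # initial frequency of a new mutation
    return (1 - math.exp(-4 * N * s * p0)) / (1 - math.exp(-4 * N * s))

N = 100_000
print(f"neutral: {fixation_probability(N, 0.0):.2e}")   # 1/(2N) = 5e-06
print(f"s=0.01:  {fixation_probability(N, 0.01):.4f}")  # ~2s = 0.02
```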

Subfunctionalization and Dosage Effects

Subfunctionalization occurs when duplicated genes partition the ancestral gene's functions between the copies, thereby reducing redundancy and promoting the retention of both paralogs. This process typically involves complementary degenerative mutations that eliminate subsets of the original regulatory elements or protein domains in each duplicate, leading to a division of labor such as tissue-specific expression or specialized biochemical roles. For instance, one copy may retain expression in certain tissues while the other takes over in different ones, ensuring that the combined functions match the pre-duplication state. This mechanism was formalized in the duplication-degeneration-complementation (DDC) model, which posits that neutral mutations in cis-regulatory sequences, like promoters, can stochastically partition ancestral expression patterns, making both copies essential for viability.

At the molecular level, subfunctionalization often arises through mutations affecting promoters, enhancers, or splicing sites, which alter expression timing, location, or isoform production without creating novel functions. Changes in alternative splicing can further drive this by fixing different splice variants in each paralog, preserving the ancestral repertoire while distributing subroles. In the cytochrome P450 (CYP) gene family, involved in liver detoxification, duplicates have subfunctionalized to specialize in metabolizing distinct substrates, such as one paralog targeting specific xenobiotics while another handles endogenous compounds, enhancing adaptive responses to environmental toxins. This partitioning contrasts with neofunctionalization, where duplicates acquire entirely new functions, but both can contribute to long-term gene retention.

Dosage effects refer to the selective pressures maintaining balanced copy numbers in duplicated genes, particularly those encoding stoichiometric components of protein complexes, where imbalances disrupt macromolecular assembly or cellular homeostasis. Histone genes exemplify this: following duplication, yeast histone paralogs are retained to preserve precise stoichiometry, with strong purifying selection against dosage imbalances via mechanisms like gene conversion to minimize divergence. Such balance is critical because excess or deficient gene products can impair complex formation; for instance, histones overexpressed in yeast trigger genome instability and chromosome segregation errors. In metazoans, dosage imbalances from segmental duplications or aneuploidy often lead to developmental disorders or cancer predisposition, as seen in conditions like Down syndrome, where extra copies of dosage-sensitive genes perturb stoichiometric networks.

Gene Loss and Redundancy

Following gene duplication, one common evolutionary outcome is the loss of one or both copies, often through the accumulation of deleterious mutations that render the gene non-functional, transforming it into a pseudogene. This process typically begins shortly after duplication, as redundant copies experience relaxed purifying selection, allowing slightly deleterious mutations—such as frameshifts, premature stop codons, or promoter disruptions—to accumulate and fix via genetic drift. In many cases, the redundant copy decays neutrally until it is completely silenced or deleted from the genome, contributing to the observation that the vast majority of duplicate genes are lost within a few million years. Estimates suggest that 50–80% of duplicates may be lost or pseudogenized within this timeframe, depending on the lineage and duplication mechanism, as seen in post-whole-genome duplication events where 30–65% of duplicates were eliminated over tens of millions of years.

Redundancy resolution after duplication is heavily influenced by dosage sensitivity, where genes involved in balanced complexes or stoichiometric interactions are less likely to lose a copy due to the disruptive effects of altered dosage. The gene balance hypothesis posits that such dosage-sensitive genes, including many transcription factors and signaling components, experience stronger selection against imbalance, leading to higher retention rates of duplicates compared to dosage-insensitive genes. For instance, essential genes—those whose deletion is lethal—are disproportionately retained as duplicates, as their loss would compromise critical functions without the buffering effect of redundancy. This selective pressure helps maintain genomic stability by preserving copies that mitigate dosage perturbations, while non-essential, dosage-tolerant genes are more prone to rapid elimination.

Evolutionary patterns of gene loss vary with population size and ecological context, with faster pseudogenization observed in smaller populations where genetic drift accelerates the fixation of disabling mutations. In neutral models of decay, the rate of pseudogene formation approximates the genomic deleterious mutation rate (typically 10−5 to 10−6 per site per generation), but in small effective population sizes (e.g., Ne < 10^6), drift dominates, shortening the half-life of duplicates to as little as 1–5 million years on average across eukaryotes. A notable example is the mammalian-specific pseudogenization of olfactory receptor genes, where rapid expansions via duplication were followed by extensive losses—up to 50% pseudogenes in humans—likely due to relaxed selection in species with diminished reliance on olfaction, such as primates. These patterns underscore how gene loss streamlines genomes by removing redundant or non-adaptive sequences, reducing metabolic costs and mutational targets while adapting to niche-specific pressures.

Detection Methods

Computational Identification

Computational identification of gene duplications relies on analyzing single-genome sequence data to detect paralogous genes—copies arising within the same lineage—through in silico algorithms that assess sequence homology, genomic context, and evolutionary relationships. Key criteria include high sequence similarity, typically requiring greater than 30–50% amino acid identity over substantial portions of the protein length (e.g., >70–90% coverage), to infer homology; synteny breaks, where conserved gene order is disrupted, indicating duplication events; and paralog clustering, grouping genes into families based on shared ancestry. Tools like BLAST (Basic Local Alignment Search Tool) are foundational for initial local alignments, scanning genomes for similar sequences with e-value thresholds to filter spurious matches.

Methods for detection encompass whole-genome alignments to pinpoint segmental duplicates, where tools such as MCScanX identify collinear blocks of homologous genes (requiring at least five pairs with minimal gaps) to reveal duplicated segments often spanning tens to hundreds of kilobases. For ancient duplications, phylogenetic tree reconciliation integrates gene trees—built from multiple sequence alignments using models like WAG or HKY—with species trees to infer duplication nodes by detecting inconsistencies like excess terminal branches. These approaches enable timing of events relative to speciation, distinguishing within-species paralogs from inter-species orthologs.

Challenges in these methods include accurately distinguishing paralogs (duplication-derived) from orthologs (speciation-derived), which often requires multi-species comparisons to resolve ambiguous topologies, and handling assembly errors in repetitive regions that can artifactually inflate duplication counts or misalign segments. False positives from fragmented assemblies, particularly in low-coverage genomes, necessitate filtering steps like reciprocal best hits or synteny validation. A prominent example is Ensembl's paralogy predictions, which employ a pipeline inspired by TreeFam: genes are clustered via BLAST-based similarity (e.g., e-value < 1e-5), followed by multiple alignments and phylogenetic tree construction with TreeBeST for reconciliation, identifying duplications across vertebrate genomes with high precision for families like Hox genes.
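
The identity-and-coverage filtering plus family clustering described above can be sketched with a simple union-find over filtered hit pairs. The thresholds and the input tuple format are illustrative assumptions, not those of any particular pipeline.

```python
# Cluster genes into putative paralog families from all-vs-all hits,
# keeping only hits that pass identity and coverage thresholds (illustrative values).
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def cluster_paralogs(hits, min_identity=30.0, min_coverage=0.7):
    """hits: iterable of (gene_a, gene_b, pct_identity, aln_len, query_len)."""
    parent = {}
    for a, b, ident, aln_len, qlen in hits:
        for g in (a, b):
            parent.setdefault(g, g)
        if a == b or ident < min_identity or aln_len / qlen < min_coverage:
            continue  # self-hit or hit failing the similarity/coverage filters
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # merge the two families
    families = {}
    for g in parent:
        families.setdefault(find(parent, g), []).append(g)
    return [sorted(members) for members in families.values() if len(members) > 1]

hits = [
    ("g1", "g2", 85.0, 480, 500),   # passes both filters
    ("g2", "g3", 40.0, 450, 500),   # passes
    ("g1", "g9", 25.0, 480, 500),   # fails the identity filter
]
print(cluster_paralogs(hits))  # [['g1', 'g2', 'g3']]
```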

Array-Based Techniques

Array-based techniques, particularly comparative genomic hybridization (CGH) microarrays, enable the detection of gene duplications by identifying copy number variations (CNVs) across the genome. In array CGH, genomic DNA from a test sample is labeled with one fluorophore (e.g., Cy3), while reference DNA is labeled with another (e.g., Cy5), and both are hybridized to an array of immobilized DNA probes, such as bacterial artificial chromosome (BAC) clones or oligonucleotides. The ratio of fluorescence intensities for each probe reflects relative copy number differences; specifically, the log2-transformed ratio (log2(test/reference)) greater than 0 indicates copy number gains, including duplications, with values around 0.58 corresponding to a single copy gain in diploid genomes. This method was pioneered in the late 1990s to achieve higher resolution than traditional metaphase CGH for analyzing DNA copy number alterations. Resolution has evolved significantly with array designs. Early BAC-based arrays offered megabase (Mb)-scale resolution due to larger probe sizes (100-200 kb), suitable for detecting large segmental duplications but limited for smaller events. Subsequent oligonucleotide and single nucleotide polymorphism (SNP) arrays improved this to kilobase (kb) scale, with probe densities enabling detection of CNVs as small as 1-10 kb, particularly effective for recent duplications not obscured by sequence divergence. These advancements allow array CGH to identify both germline and somatic duplications, though it primarily detects unbalanced changes and may miss low-level mosaicism below 20-30% cellular prevalence. In applications, array CGH has been instrumental in population genetics to map CNV landscapes, revealing widespread gene duplications contributing to human genetic diversity, as seen in studies profiling hundreds of individuals. In disease diagnostics, it aids in identifying pathogenic duplications associated with developmental disorders, congenital anomalies, and cancers, often as a first-line test replacing karyotyping due to its genome-wide coverage. However, a key limitation is its inability to readily distinguish tandem duplications (adjacent copies) from dispersed ones (non-adjacent), as it reports net copy number without structural context, necessitating orthogonal methods like fluorescence in situ hybridization for clarification. A notable example from the 2000s involved array CGH in the Human Genome Project era, where BAC-based platforms identified thousands of segmental duplications and associated CNVs, contributing to assemblies like hg17 and hg18 by highlighting duplication hotspots prone to genomic instability. For instance, high-density aCGH experiments targeted these regions, uncovering over 1,400 copy-number variable regions (CNVRs) in diverse human populations and linking duplications to evolutionary expansions in gene families like those involved in immunity.
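
A small sketch of the log2-ratio logic described above: the threshold corresponds to a single-copy gain in a diploid genome (log2(3/2) ≈ 0.58), and the probe intensities are made-up values for illustration.

```python
# Per-probe log2 ratios from array CGH intensities and a simple gain call.
import math

def log2_ratio(test_intensity: float, reference_intensity: float) -> float:
    return math.log2(test_intensity / reference_intensity)

SINGLE_COPY_GAIN = math.log2(3 / 2)  # ~0.58 for 3 copies vs. 2 in a diploid genome

# Illustrative intensities: (test, reference) per probe.
probes = {"probe_A": (2400.0, 1600.0), "probe_B": (1500.0, 1520.0)}
for name, (test, ref) in probes.items():
    ratio = log2_ratio(test, ref)
    call = "gain (possible duplication)" if ratio >= SINGLE_COPY_GAIN * 0.8 else "no gain"
    print(f"{name}: log2 ratio = {ratio:+.2f} -> {call}")
```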

Sequencing Approaches

Next-generation sequencing (NGS) technologies have revolutionized the detection of gene duplications by enabling high-throughput analysis of copy number variations (CNVs) and structural variants (SVs) at base-pair resolution. Read-depth analysis, a primary method in NGS, quantifies duplication events by measuring the normalized coverage of sequencing reads across genomic regions, where increased read depth indicates copy number gains. Paired-end mapping complements this by identifying SVs, including duplications, through discrepancies in the expected distance or orientation between read pairs, which signal insertions or rearrangements. These approaches build on earlier array-based techniques as precursors for CNV detection but offer superior resolution for mapping duplication breakpoints. Long-read sequencing technologies, such as PacBio's single-molecule real-time (SMRT) sequencing and Oxford Nanopore Technologies (ONT), address limitations of short-read NGS by producing reads spanning tens to hundreds of kilobases, effectively resolving complex gene duplications within repetitive genomic contexts. These methods excel at assembling segmental duplications—low-copy repeats with high sequence identity—by spanning homologous regions that short reads often collapse or misalign. For instance, polyploid phasing algorithms applied to long-read data have enabled the de novo assembly of duplicated loci, distinguishing alleles in heterozygous duplications. In the 2020s, advances in long-read sequencing have significantly improved the resolution of segmental duplications exhibiting greater than 95% sequence identity, with complete telomere-to-telomere assemblies revealing previously hidden duplication structures in the human genome. These improvements stem from enhanced base-calling accuracy and hybrid assembly pipelines integrating short- and long-read data, achieving near-perfect reconstruction of duplicated regions that were intractable in earlier drafts. Integration of sequencing with CRISPR-Cas9 enrichment has further advanced validation, where targeted capture of duplicated loci followed by long-read sequencing confirms structural variants and resolves causal alleles in complex regions. Despite these progresses, challenges persist, particularly with short-read sequencing in repetitive regions, where high sequence similarity leads to mapping ambiguities and false positives in duplication calls. Quantification errors in read-depth analysis are also common due to biases from GC content or mappability, potentially under- or overestimating copy numbers in duplicated segments. Long-read technologies mitigate some issues but face higher per-base error rates, necessitating computational polishing for accurate duplication annotation. Hi-C sequencing provides a complementary 3D contextual view for duplication detection by capturing chromatin interactions, revealing spatial proximity between duplicated loci that indicates functional or evolutionary relationships. Recent pangenome studies from 2023 to 2025 have leveraged these sequencing approaches to uncover hidden duplications across diverse human populations, with graph-based pangenomes identifying novel SVs in non-reference alleles that short-read methods missed. For example, the Human Pangenome Reference Consortium's 2023 assembly highlighted population-specific gene duplications through long-read integration, enhancing our understanding of structural variation diversity.
The 2025 Data Release 2 further expanded the pangenome with additional phased diploid assemblies from diverse ancestries, improving the identification of population-specific gene duplications and structural variants.
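
The read-depth idea above can be sketched as a simple windowed coverage scan; the window size, median normalization, and gain threshold are illustrative assumptions rather than parameters of any particular CNV caller.

```python
# Windowed read-depth scan: windows whose normalized depth approaches ~1.5x the
# genome-wide median (3 copies vs. 2) are flagged as candidate duplications.
from statistics import median

def call_duplications(window_depths, gain_factor=1.4):
    baseline = median(window_depths)  # approximate diploid baseline coverage
    return [
        (i, depth / baseline)
        for i, depth in enumerate(window_depths)
        if depth / baseline >= gain_factor
    ]

# Toy per-window coverage; windows 4-6 show elevated depth.
depths = [30, 31, 29, 32, 46, 48, 45, 30, 29, 31]
for window, fold in call_duplications(depths):
    print(f"window {window}: {fold:.2f}x median depth -> candidate duplication")
```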

Nomenclature and Annotation

Naming Conventions

Gene duplication results in paralogous genes that require standardized nomenclature to facilitate consistent scientific communication and database integration. The Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) establishes these conventions for human genes, ensuring unique symbols that reflect evolutionary relationships without implying unverified functions. For paralogs arising from duplication, HGNC assigns a shared root symbol followed by distinguishing suffixes, typically Arabic numerals (e.g., -1, -2) or letters (e.g., A, B) based on sequence similarity, chromosomal location, or inferred function. Gene families, often expanded by duplications, use prefixes like CYP for the cytochrome P450 superfamily, with suffixes such as CYP2D6 indicating specific members. Pseudogenes, which are non-functional duplicates, receive a "P" suffix, as in CYP2D7P, to denote their inactivated status. These rules prioritize stability, with updates only for newly resolved duplications or to correct ambiguities, overseen by HGNC in collaboration with international experts. Naming principles emphasize brevity and specificity: chromosomal location informs symbols for genes of unknown function (e.g., location-based identifiers), while sequence homology or functional clues guide family assignments. However, challenges arise with ancient duplications, where extensive sequence divergence creates ambiguities in paralog identification and orthology assignment, complicating consistent labeling across species. The HGNC mitigates this through rigorous review, but entrenched provisional names (e.g., FAM for "family with sequence similarity") can persist until better evidence emerges. A prominent example is the HOX gene clusters, products of ancient whole-genome duplications, where paralogs are named by cluster (e.g., HOXA, HOXB) and positional numeral (e.g., HOXA1, HOXB1), reflecting their collinear arrangement and shared homeobox domain. This system highlights duplication events while avoiding functional speculation.

Database Resources

Several key databases serve as essential repositories for gene duplication data, enabling researchers to access annotated genomic regions, evolutionary histories, and comparative analyses across species. These resources integrate high-throughput sequencing data to facilitate the study of duplication events, their ages, and functional implications, while providing tools for visualization and programmatic access.

Ensembl's Compara database offers comprehensive paralog trees derived from gene orthology and paralogy predictions, where paralogues are identified as genes sharing a most recent common ancestor via duplication events. These trees annotate duplication ages through reconciliation with species trees, distinguishing recent from ancient duplications, and include synteny viewers for visualizing conserved genomic blocks affected by duplications. The platform supports API access for querying homology data and has incorporated 2020s sequencing advancements, such as long-read assemblies, in its latest releases, including Ensembl 115 (September 2025) with expanded vertebrate and invertebrate genome coverage.

The UCSC Genome Browser provides dedicated tracks for segmental duplications, displaying putative duplicated regions with color-coded levels of support based on sequence similarity and alignment evidence (data from 2013, last updated 2014 for GRCh38/hg38). This resource aids in identifying low-copy repeats and tandem duplicates within human and other mammalian genomes. While the browser integrates recent assemblies like GRCh38.p14 (2023), the specific segmental duplication track has not been updated; for refined boundaries from newer data, such as the Telomere-to-Telomere (T2T) Consortium's CHM13 assembly (2022), users may employ custom tracks or external resources.

For plant-specific analyses, Phytozome hosts comparative genomics data across hundreds of Archaeplastida species, using tools like InParanoid-DIAMOND to cluster paralogous gene families and detect duplication-driven expansions. It features synteny browsers via JBrowse and BioMart for cross-species queries, with post-2020 updates including over 149 new genomes (up to October 2025, e.g., Nicotiana benthamiana v1.0) and improved homology alignments from long-read sequencing. As of Phytozome v14 (2025), it incorporates pangenome datasets such as BrachyPan (54 Brachypodium distachyon lines) and CowpeaPan (8 Vigna unguiculata genomes) to enhance duplication detection in diverse accessions.

DupMasker is a specialized annotation tool for segmental duplications, particularly in primates, employing a library of consensus duplicon sequences (based on 2008 data) to mask and annotate duplicated regions with metrics like percent divergence and alignment scores. Integrated with RepeatMasker, it outputs GFF-formatted results for downstream analysis and supports modern search engines like RMBlast. For analyses with recent primate assemblies, supplementation with updated repeat libraries is recommended.

OrthoDB complements these by cataloging orthologs and paralogs across eukaryotes and prokaryotes, using hierarchical orthology inference to distinguish duplication-derived paralogs from speciation-derived orthologs. This enables cross-species comparisons of gene family evolution, with tools for phyloprofiling duplication patterns in diverse taxa. The latest version, OrthoDB v12.2 (updated 2024), covers 5,952 eukaryotic species with expanded gene loci coordinates and CDS data.
| Database | Key Features for Gene Duplication | Primary Organisms | Access Methods |
| --- | --- | --- | --- |
| Ensembl Compara | Paralog trees, duplication age annotation, synteny viewers | Vertebrates, invertebrates | Web interface, API, BioMart |
| UCSC Genome Browser | Segmental duplication tracks with similarity levels (2013 data, updated 2014) | Mammals (e.g., human) | Interactive browser, custom tracks |
| Phytozome | Paralogy clustering, synteny via JBrowse, pangenome datasets | Plants (Archaeplastida) | BioMart, genome browsers |
| DupMasker | Duplicon annotation, divergence metrics (2008 library) | Primates | Command-line tool, GFF output |
| OrthoDB | Ortholog-paralog distinction, phyloprofiles (v12.2, 2024) | Eukaryotes, prokaryotes | Web search, downloads |
Recent advances in pangenomics have addressed gaps for non-model organisms by bringing diverse accessions into databases such as Ensembl and Phytozome; integrations available as of 2025 (e.g., Ensembl's ongoing pangenome projects and Phytozome's BrachyPan) improve duplication detection in species that lack a single reference genome.

Pathological and Applied Aspects

Role in Disease Amplification

Gene duplications can contribute to disease pathogenesis through mechanisms that alter gene dosage, particularly in the form of oncogene amplification and copy number variations (CNVs). In cancers, oncogene amplification often arises via unequal crossing over during meiosis or mitosis, leading to increased copy numbers of proto-oncogenes such as MYC on chromosome 8q24. This process results in extrachromosomal DNA elements or intrachromosomal homogeneously staining regions that drive uncontrolled cell proliferation. Similarly, CNVs involving gene duplications are implicated in neurodevelopmental disorders, where dosage imbalances disrupt neural development; for instance, duplications in regions such as 16p11.2 or 22q11.2 are associated with autism spectrum disorder and other neuropsychiatric conditions by affecting synaptic function and neuronal connectivity.

In oncology, amplified gene duplications elevate oncoprotein levels, promoting hallmarks of cancer such as sustained proliferation and evasion of apoptosis. A prominent example is HER2 (ERBB2) amplification on chromosome 17q12, observed in approximately 15-20% of breast cancers, which enhances signaling through the PI3K/AKT and MAPK pathways to accelerate tumor growth. This amplification is therapeutically targeted by trastuzumab, a monoclonal antibody that binds the extracellular domain of HER2, inhibiting dimerization and downstream signaling while recruiting immune effectors for antibody-dependent cellular cytotoxicity. Such targeted therapies have improved survival, with trastuzumab-based regimens reducing recurrence risk by up to 50% in HER2-positive cases.

Beyond cancer, gene duplications underlie several genetic disorders by perturbing protein stoichiometry in cellular processes. Charcot-Marie-Tooth disease type 1A (CMT1A), the most common inherited neuropathy, results from a 1.4 Mb duplication on chromosome 17p12 encompassing the PMP22 gene, leading to 1.5- to 2-fold overexpression of peripheral myelin protein 22. This excess disrupts Schwann cell myelination of peripheral nerves, causing progressive muscle weakness and sensory loss, typically with onset in the first or second decade of life. The duplication accounts for 70-80% of CMT1 cases and arises de novo in about 25% of patients.

From an evolutionary perspective, gene duplications that initially provided adaptive advantages, such as expanded dosage for immune or metabolic functions, can predispose modern humans to disease when dysregulated. Dosage-sensitive genes, which are intolerant of copy number changes, are enriched in genomic regions prone to recurrent duplications, linking ancient duplication events to contemporary disorders such as congenital anomalies and cancers. This evolutionary legacy highlights how duplicated genes, while fostering innovation, create vulnerabilities that are exploited in pathological contexts.

Recent advances in 2024 and 2025 have leveraged liquid biopsies to detect gene amplifications in circulating tumor DNA (ctDNA), enabling non-invasive monitoring of disease progression and therapy response. Studies demonstrate that ultrasensitive next-generation sequencing of ctDNA can identify amplifications such as MYC or HER2 with >95% specificity in advanced cancers, correlating with tumor burden and the emergence of resistance. For example, a 2025 multicenter trial showed that ctDNA-based detection of amplifications in non-small cell lung cancer achieved accuracy comparable to tissue biopsies, facilitating personalized adjustments to targeted therapies.
These insights underscore the role of liquid biopsies in enabling earlier intervention for duplication-driven malignancies.
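The dosage logic behind amplification calls can be illustrated with a minimal read-depth sketch: an amplification is flagged when the log2 ratio of tumor (or ctDNA-derived) coverage to matched normal coverage over a gene exceeds a threshold. The coverage values, gene choices, and threshold below are invented for illustration and do not represent a validated CNV pipeline.

```python
# Illustrative sketch of calling a gene amplification from read-depth ratios,
# the basic dosage logic behind CNV/amplification detection in tumor or ctDNA
# sequencing. Coverage values and the threshold are made-up assumptions.
import math

def log2_ratio(tumor_depth: float, normal_depth: float) -> float:
    """Log2 of tumor/normal mean coverage over a gene region."""
    return math.log2(tumor_depth / normal_depth)

def call_amplification(tumor_depth: float, normal_depth: float,
                       threshold: float = 1.0) -> bool:
    """Flag amplification when coverage is at least 2^threshold-fold elevated
    (threshold = 1.0 corresponds to roughly 4 copies in a diploid genome)."""
    return log2_ratio(tumor_depth, normal_depth) >= threshold

if __name__ == "__main__":
    # Hypothetical mean coverages over ERBB2 (HER2) and PMP22 in matched samples.
    regions = {"ERBB2": (480.0, 60.0), "PMP22": (95.0, 60.0)}
    for gene, (tumor, normal) in regions.items():
        ratio = log2_ratio(tumor, normal)
        status = "amplified" if call_amplification(tumor, normal) else "not amplified"
        print(f"{gene}: log2 ratio = {ratio:.2f} ({status})")
```

In this toy example the roughly 1.5-fold PMP22 dosage typical of a heterozygous duplication falls below the amplification threshold, while the high-level ERBB2 signal is flagged, mirroring the distinction between constitutional duplications and somatic oncogene amplification discussed above.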

Applications in Biotechnology

Gene duplication has been harnessed in biotechnology through synthetic techniques to create diverse gene libraries and to facilitate directed evolution. CRISPR-Cas9 systems enable precise synthetic duplications by inducing targeted double-strand breaks whose repair can copy specific genomic segments for library construction. This approach is particularly useful where multiplexed edits generate libraries of duplicated regulatory elements or coding sequences that can be screened for enhanced function, such as improved enzyme variants.

Directed evolution leverages gene duplicates as scaffolds to accelerate the development of novel proteins with desired properties. By introducing duplicate copies of a gene of interest into a host organism, followed by random mutagenesis and selection, researchers can evolve one copy while preserving the original function, mimicking natural neofunctionalization. A notable example involves β-propeller protein scaffolds built up through multiple gene duplications and domain rearrangements, which provide a stable framework for evolution toward new catalytic activities. This method has been refined to include computational design of the interfaces between duplicated domains, yielding proteins with large gains in binding affinity or specificity.

In metabolic engineering, intentional gene duplications enhance biofuel production by increasing enzymatic flux through key pathways. Duplicating genes involved in ethanol metabolism boosts tolerance and yield under industrial conditions, as seen in strains engineered for second-generation bioethanol from lignocellulosic feedstocks. Similarly, polyploid breeding induces whole-genome duplications to improve crop traits, leading to larger fruits, higher yields, and stress resistance; tetraploid varieties of several crops exhibit enhanced vigor and nutrient content compared with diploids. These polyploids typically arise from colchicine-induced chromosome doubling, which facilitates the fixation of beneficial alleles across multiple copies.

Recent advances from 2023 to 2025 have focused on multiplexed duplication techniques that scale these approaches. Amplification editing, a CRISPR-based tool, enables precise, programmable duplication of endogenous genes at chromosomal scales for higher expression levels in mammalian cells without off-target effects. This has been applied to boost therapeutic protein yields, such as monoclonal antibodies, by duplicating production cassettes in mammalian production cell lines. Such multiplexed duplications also raise concerns, including risks of unintended genomic instability and questions of equitable access, and because the alterations could be heritable if applied to germline cells, they have prompted calls for stringent oversight similar to broader genome-editing guidelines. Validation of these duplications often relies on sequencing to confirm integration fidelity.

A classic biotechnological example is the duplication of the human insulin gene in bacterial expression systems to meet pharmaceutical demand. Recombinant insulin production began with synthetic genes cloned into high-copy plasmids in Escherichia coli, effectively duplicating the insulin sequence across multiple plasmid copies per cell to achieve gram-scale yields; this approach revolutionized diabetes treatment by providing scalable, human-identical insulin. Subsequent optimizations, including codon adaptation and multi-copy integration, further amplified output, with modern systems producing over 10 g/L in fermenters.
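The duplicate-then-diverge strategy described above can be sketched as a toy simulation: the parental copy is left untouched while the duplicated copy is repeatedly mutated and greedily selected toward a hypothetical new-function optimum. The sequences, scoring function, and parameters are all invented for illustration and are not drawn from any specific study.

```python
# Toy simulation of the "duplicate, then evolve one copy" strategy: the
# parental sequence is preserved while the duplicate is mutated and selected
# toward a made-up target. Purely illustrative, not a real protocol.
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def mutate(seq: str, rate: float = 0.05) -> str:
    """Return a copy of seq with each position mutated with probability `rate`."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else aa
                   for aa in seq)

def score(seq: str, target: str) -> int:
    """Number of positions matching the desired 'new function' target."""
    return sum(a == b for a, b in zip(seq, target))

def evolve_duplicate(parent: str, target: str, generations: int = 200,
                     pool_size: int = 50) -> str:
    """Greedy selection: keep the best mutant of the duplicated copy each round."""
    duplicate = parent  # the duplicated copy; `parent` itself is never changed
    for _ in range(generations):
        pool = [mutate(duplicate) for _ in range(pool_size)]
        duplicate = max(pool + [duplicate], key=lambda s: score(s, target))
    return duplicate

if __name__ == "__main__":
    random.seed(0)
    parent = "MKTAYIAKQR"   # hypothetical parental protein (kept unchanged)
    target = "MKSAYLVKQW"   # hypothetical new-function optimum
    evolved = evolve_duplicate(parent, target)
    print("parent :", parent, score(parent, target), "/", len(target))
    print("evolved:", evolved, score(evolved, target), "/", len(target))
```

Because selection acts only on the duplicate, the parental copy retains its original "function" throughout, which is the key advantage the text attributes to using duplicated genes as evolutionary scaffolds.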

References
