Copy number variation
from Wikipedia

This gene duplication has created a copy number variation. The chromosome now has two copies of this section of DNA, rather than one.

Copy number variation (CNV) is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals.[1] Copy number variation is a type of structural variation: specifically, it is a type of duplication or deletion event that affects a considerable number of base pairs.[2] Approximately two-thirds of the entire human genome may be composed of repeats[3] and 4.8–9.5% of the human genome can be classified as copy number variations.[4] In mammals, copy number variations play an important role in generating necessary variation in the population as well as disease phenotype.[1]

Copy number variations can be generally categorized into two main groups: short repeats and long repeats. However, there are no clear boundaries between the two groups and the classification depends on the nature of the loci of interest. Short repeats include mainly dinucleotide repeats (two repeating nucleotides e.g. A-C-A-C-A-C...) and trinucleotide repeats. Long repeats include repeats of entire genes. This classification based on size of the repeat is the most obvious type of classification as size is an important factor in examining the types of mechanisms that most likely gave rise to the repeats,[5] hence the likely effects of these repeats on phenotype.

Types and chromosomal rearrangements


One of the most well known examples of a short copy number variation is the trinucleotide repeat of the CAG base pairs in the huntingtin gene responsible for the neurological disorder Huntington's disease.[6] For this particular case, once the CAG trinucleotide repeats more than 36 times in a trinucleotide repeat expansion, Huntington's disease will likely develop in the individual and it will likely be inherited by his or her offspring.[6] The number of repeats of the CAG trinucleotide is inversely correlated with the age of onset of Huntington's disease.[7] These types of short repeats are often thought to be due to errors in polymerase activity during replication including polymerase slippage, template switching, and fork switching which will be discussed in detail later. The short repeat size of these copy number variations lends itself to errors in the polymerase as these repeated regions are prone to misrecognition by the polymerase and replicated regions may be replicated again, leading to extra copies of the repeat.[8] In addition, if these trinucleotide repeats are in the same reading frame in the coding portion of a gene, it may lead to a long chain of the same amino acid, possibly creating protein aggregates in the cell,[7] and if these short repeats fall into the non-coding portion of the gene, it may affect gene expression and regulation. On the other hand, a variable number of repeats of entire genes is less commonly identified in the genome. 
One example of a whole-gene repeat is the alpha-amylase 1 gene (AMY1), which encodes alpha-amylase and whose copy number varies significantly between populations with different diets.[9] Although the specific mechanism that allows the AMY1 gene to increase or decrease its copy number is still debated, some hypotheses suggest that non-homologous end joining or microhomology-mediated end joining is likely responsible for these whole-gene repeats.[9] Repeats of entire genes have immediate effects on the expression of that gene, and the fact that AMY1 copy number has been related to diet is a remarkable example of recent human evolutionary adaptation.[9] Although these are the general groups into which copy number variations are sorted, the exact number of base pairs a copy number variation affects depends on the specific locus of interest. Using data from all reported copy number variations, the mean size of a copy number variant is around 118 kb, and the median is around 18 kb.[10]
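The huntingtin example above comes down to counting consecutive CAG units and comparing the count with the threshold of 36 quoted in the text. The following is a minimal sketch of that count; the function names and the toy allele sequences are illustrative assumptions, not taken from any clinical tool.

```python
import re

# Threshold quoted in the text: disease is likely above 36 CAG repeats.
EXPANSION_THRESHOLD = 36


def longest_cag_run(sequence: str) -> int:
    """Return the largest number of consecutive CAG units in the sequence."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def is_expanded(sequence: str) -> bool:
    """True when the longest CAG run exceeds the threshold from the text."""
    return longest_cag_run(sequence) > EXPANSION_THRESHOLD


# Toy alleles (hypothetical flanking sequence, not the real huntingtin gene).
normal_allele = "ATG" + "CAG" * 20 + "CCGCCA"
expanded_allele = "ATG" + "CAG" * 42 + "CCGCCA"

print(longest_cag_run(normal_allele), is_expanded(normal_allele))
print(longest_cag_run(expanded_allele), is_expanded(expanded_allele))
```

Real repeat sizing has to cope with interrupted repeats and sequencing errors, which this sketch deliberately ignores.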

In terms of the structural architecture of copy number variations, research has defined hotspot regions in the genome where copy number variations are enriched fourfold.[2] These hotspots were defined as regions containing long repeats that are 90–100% similar, known as segmental duplications, which may be tandem or interspersed; most importantly, these hotspot regions have an increased rate of chromosomal rearrangement.[2] It was thought that these large-scale chromosomal rearrangements give rise to normal variation and genetic diseases, including copy number variations.[1] Moreover, these copy number variation hotspots are consistent across many populations from different continents, implying that the hotspots were either acquired independently by all the populations and passed on through generations, or acquired in early human evolution before the populations split; the latter seems more likely.[1] Lastly, there does not seem to be a spatial bias in where copy number variations are most densely distributed in the genome.[1] Although fluorescent in situ hybridization and microsatellite analysis originally suggested that copy number repeats are localized to highly repetitive regions such as telomeres, centromeres, and heterochromatin,[11] recent genome-wide studies have concluded otherwise.[2] Namely, most chromosomal rearrangement hotspots are found in the subtelomeric and pericentromeric regions, yet there is no considerable increase in copy number variations in those regions.[2] Furthermore, these rearrangement hotspots do not have decreased gene numbers, again implying that there is minimal spatial bias in the genomic location of copy number variations.[2]

Detection and identification


Copy number variation was initially thought to occupy an extremely small and negligible portion of the genome through cytogenetic observations.[12] Copy number variations were generally associated only with small tandem repeats or specific genetic disorders,[13] therefore, copy number variations were initially only examined in terms of specific loci. However, technological developments led to an increasing number of highly accurate ways of identifying and studying copy number variations. Copy number variations were originally studied by cytogenetic techniques, which are techniques that allow one to observe the physical structure of the chromosome.[12] One of these techniques is fluorescent in situ hybridization (FISH) which involves inserting fluorescent probes that require a high degree of complementarity in the genome for binding.[10] Comparative genomic hybridization was also commonly used to detect copy number variations by fluorophore visualization and then comparing the length of the chromosomes.[10]

Recent advances in genomics technologies gave rise to many important methods of extremely high genomic resolution, and as a result an increasing number of copy number variations in the genome have been reported.[10] Initially these advances involved using bacterial artificial chromosome (BAC) arrays with intervals of around 1 megabase throughout the entire genome.[14] BACs can also detect copy number variations in rearrangement hotspots, which allowed the detection of 119 novel copy number variations.[2] High-throughput genomic sequencing has revolutionized the field of human genomics, and in silico studies have been performed to detect copy number variations in the genome.[2] Reference sequences have been compared to other sequences of interest using fosmids, with the fosmid clones strictly controlled to be 40 kb.[15] Sequencing the end reads provides adequate information to align the reference sequence to the sequence of interest, and any misalignments are easily noticeable and are therefore concluded to be copy number variations within that region of the clone.[15] This type of detection technique offers high genomic resolution and a precise location of the repeat in the genome, and it can also detect other types of structural variation such as inversions.[10]

Another way of detecting copy number variation is to use single nucleotide polymorphisms (SNPs).[10] Because human SNP data are abundant, copy number variation detection has shifted towards using these SNPs.[16] Relying on the fact that human recombination is relatively rare and that many recombination events occur in specific regions of the genome known as recombination hotspots, linkage disequilibrium can be used to identify copy number variations.[16] Efforts have been made to associate copy number variations with specific haplotype SNPs by analyzing linkage disequilibrium. Using these associations, one can recognize copy number variations in the genome with SNPs as markers. Next-generation sequencing techniques, including short- and long-read sequencing, are increasingly used and have begun to replace array-based techniques for detecting copy number variations.[17][18]
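The detection methods above are described without the arithmetic behind a copy number call. One common summary, assumed here for illustration rather than taken from the cited studies, compares a sample signal (hybridization intensity or read depth) with a reference signal assumed to carry two copies. The function names and numbers below are hypothetical.

```python
import math


def log2_ratio(sample_signal: float, reference_signal: float) -> float:
    """Log2 of sample over reference signal; 0 means no copy number change."""
    return math.log2(sample_signal / reference_signal)


def estimate_copy_number(sample_signal: float,
                         reference_signal: float,
                         reference_copies: int = 2) -> int:
    """Scale the signal ratio by the assumed reference copy number and round."""
    return round(reference_copies * sample_signal / reference_signal)


# Hypothetical read depths for three genomic windows against a diploid reference.
reference_depth = 100.0
for sample_depth in (50.0, 100.0, 150.0):
    print(sample_depth,
          round(log2_ratio(sample_depth, reference_depth), 3),
          estimate_copy_number(sample_depth, reference_depth))
```

Production callers also normalize for GC content and mappability and segment neighboring windows together; none of that is modeled here.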

Molecular mechanism


There are two main types of molecular mechanism for the formation of copy number variations: homologous based and non-homologous based.[5] Although many suggestions have been put forward, most of these theories are speculations and conjecture. There is no conclusive evidence that correlates a specific copy number variation to a specific mechanism.

Diagrammatic representation of non-allelic homologous recombination. Here, Gene X represents the gene of interest and the black line represents the chromosome. When the two homologous chromosomes are misaligned and recombination occurs, it may result in a duplication of the gene.

One of the best-recognized mechanisms leading to copy number variations, as well as deletions and inversions, is non-allelic homologous recombination.[19] During meiotic recombination, homologous chromosomes pair up and form two-ended double-stranded breaks leading to Holliday junctions. In the aberrant mechanism, however, the chromosomes are misaligned during the formation of Holliday junctions and the crossover lands in non-allelic positions. When the Holliday junction is resolved, the unequal crossing over transfers genetic material between the two homologous chromosomes, and as a result a portion of the DNA is duplicated on one homologue and lost from the other.[19] Since the repeated regions no longer segregate independently, the duplicated region of the chromosome is inherited. Another homologous recombination-based mechanism that can lead to copy number variation is break-induced replication.[20] When a double-stranded break occurs unexpectedly in the genome, the cell activates pathways that mediate the repair of the break.[20] Errors in repairing the break, as in non-allelic homologous recombination, can increase the copy number of a particular region of the genome. During the repair of a double-stranded break, the broken end can invade its homologous chromosome instead of rejoining the original strand.[20] As in non-allelic homologous recombination, an extra copy of a particular region is transferred to another chromosome, leading to a duplication event. Furthermore, cohesin proteins have been found to aid the repair of double-stranded breaks by clamping the two ends in close proximity, which prevents interchromosomal invasion by the ends.[21] If cohesin activity is affected for any reason, such as activation of ribosomal RNA, there may be a local increase in double-stranded break repair errors.[21]
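Unequal crossing over between misaligned repeats can be illustrated with a toy model in which each chromosome is a list of named segments. This is only a sketch under simplifying assumptions: the segment names, the breakpoint indices, and the helper function are hypothetical, and real recombination acts on DNA strands rather than on labeled blocks.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Swap the tails of two homologues at (possibly different) positions."""
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Both homologues carry Gene X flanked by two copies of the same repeat.
homologue_a = ["left", "repeat", "GeneX", "repeat", "right"]
homologue_b = ["left", "repeat", "GeneX", "repeat", "right"]

# Misalignment: the second repeat of A pairs with the first repeat of B,
# so the exchange happens after index 4 on A but after index 2 on B.
duplicated, deleted = unequal_crossover(homologue_a, homologue_b, 4, 2)

print(duplicated)  # carries two copies of GeneX
print(deleted)     # carries no copy of GeneX
```

With aligned breakpoints (the same index on both homologues) the function returns two chromosomes of the original structure, which is the normal outcome of a crossover.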

The other class of mechanisms hypothesized to lead to copy number variations is non-homologous based. To distinguish these from homologous-based mechanisms, one must understand the concept of homology. Homologous pairing of chromosomes involves DNA strands that are highly similar to each other (~97%), and these strands must be longer than a certain length to avoid short but highly similar pairings.[5] Non-homologous pairings, on the other hand, rely on only a few base pairs of similarity between two strands, so genetic material can be exchanged or duplicated during non-homologous repair of double-stranded breaks.[5]

One type of non-homologous based mechanism is non-homologous end joining or microhomology-mediated end joining.[22] These mechanisms are also involved in repairing double-stranded breaks but require no homology or only limited microhomology.[5] When the strands are repaired, small deletions or insertions are often introduced into the repaired strand. It is possible that retrotransposons are inserted into the genome through this repair system.[22] If a retrotransposon is inserted into a non-allelic position on the chromosome, meiotic recombination can bring the insertion onto the same strand as an already existing copy of the same region.

Another mechanism is the breakage-fusion-bridge cycle, which involves sister chromatids that have both lost their telomeric regions due to double-stranded breaks.[23] It is proposed that these sister chromatids fuse together to form one dicentric chromosome, which is then pulled towards two different nuclei.[23] Because pulling the dicentric chromosome apart causes a double-stranded break, the end regions can fuse to other double-stranded breaks and repeat the cycle.[23] The fusion of two sister chromatids can cause an inverted duplication, and when these events are repeated throughout the cycle, the inverted region is repeated, leading to an increase in copy number.[23]

The last mechanism that can lead to copy number variations is polymerase slippage, which is also known as template switching.[24] During normal DNA replication, the polymerase on the lagging strand is required to unclamp and re-clamp the replication region continuously.[24] When small-scale repeats already exist in the DNA sequence, the polymerase can be 'confused' when it re-clamps to continue replication: instead of clamping to the correct base pairs, it may shift a few base pairs and replicate a portion of the repeated region again.[24] Although this has been experimentally observed and is a widely accepted mechanism, the molecular interactions that lead to this error remain unknown. In addition, because this mechanism requires the polymerase to jump along the DNA strand, and it is unlikely that the polymerase can re-clamp at another locus some kilobases away, it is more applicable to short repeats such as dinucleotide or trinucleotide repeats.[25]
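The polymerase slippage described above can be sketched as a copying loop that, once, steps back a few repeat units and copies them again. This is a deliberately simplified illustration: the function name, its parameters, and the dinucleotide tract are assumptions made for the example, and it models only a backward slip that expands the repeat.

```python
def replicate_with_slippage(template_units, slip_at, slip_back):
    """Copy repeat units in order; after copying `slip_at` units the
    polymerase re-clamps `slip_back` units upstream and copies them again."""
    if not 0 <= slip_back <= slip_at:
        raise ValueError("slip_back must be between 0 and slip_at")
    copied = []
    position = 0
    slipped = False
    while position < len(template_units):
        copied.append(template_units[position])
        position += 1
        if not slipped and position == slip_at:
            position -= slip_back  # re-clamp upstream of the correct site
            slipped = True
    return copied


template = ["CA"] * 5  # a (CA)5 dinucleotide tract
daughter = replicate_with_slippage(template, slip_at=3, slip_back=2)
print(len(template), len(daughter))  # the daughter strand gains two units
```

Because the slip distance is counted in repeat units, the sketch also reflects why the mechanism suits short repeats: a slip of a few units stays within the repeat tract.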

Alpha-amylase gene

Timeline of the change in hominin diet throughout late Paleolithic, Mesolithic, and Neolithic periods. As seen, root vegetables rich in starch were consumed around 20,000 years ago when the AMY1 diploid gene number is estimated to have increased.

Amylase is an enzyme in saliva that is responsible for the breakdown of starch into smaller sugars, and one type of amylase is encoded by the alpha-amylase gene (AMY1).[9] The AMY1 locus, as well as the amylase enzyme, is one of the most extensively studied and sequenced regions of the human genome. Its homologs are also found in other primates, so it is likely that the primate AMY1 gene is ancestral to the human AMY1 gene and arose early in primate evolution.[9] AMY1 is one of the best-studied genes with a wide range of copy numbers across human populations.[9] It is also one of the few genes studied so far that show convincing evidence of a correlation between protein function and copy number.[9] Copy number is known to alter the transcription and translation levels of a particular gene; however, research has shown that the relationship between protein levels and copy number is variable.[26] In European Americans, the concentration of salivary amylase was found to be closely correlated with the copy number of the AMY1 gene.[9] As a result, it was hypothesized that the copy number of the AMY1 gene is closely correlated with its protein function, which is to digest starch.[9]

The AMY1 gene copy number has been found to be correlated with different levels of starch in the diets of different populations.[9] Seven populations from different continents were categorized as having high-starch or low-starch diets, and their AMY1 gene copy number was measured using high-resolution FISH and qPCR.[9] The high-starch populations, which consist of the Japanese, Hadza, and European American populations, had a significantly higher (two times higher) average AMY1 copy number than the low-starch populations, which include the Biaka, Mbuti, Datog, and Yakut populations.[9] It was hypothesized that the level of starch, the substrate for AMY1, in one's regular diet can directly affect the copy number of the AMY1 gene.[9] Since the copy number of AMY1 was concluded to be directly correlated with salivary amylase,[9] the more starch is present in a population's daily diet, the more evolutionarily favorable it is to have multiple copies of the AMY1 gene. The AMY1 gene was the first gene to provide strong evidence for evolution on a molecular genetic level.[26] Moreover, using comparative genomic hybridization, copy number variations across the entire genomes of the Japanese population were compared with those of the Yakut population.[9] The copy number variation of the AMY1 gene was significantly different from the copy number variation in other genes or regions of the genome, suggesting that the AMY1 gene was under a strong selective pressure that had little or no influence on the other copy number variations.[9] Finally, the variability in length of 783 microsatellites between the two populations was compared with the copy number variability of the AMY1 gene. The AMY1 gene copy number range was larger than that of over 97% of the microsatellites examined.[9] This implies that natural selection played a considerable role in shaping the average number of AMY1 genes in these two populations.[9] However, as only seven populations were studied, it is important to consider the possibility that factors in their diet or culture other than starch influenced the AMY1 copy number.

Simplified phylogenetic tree of the great ape lineage and the number of diploid AMY1 genes that each species has. AMY1 gene number shown to increase after split with the chimpanzee lineage.

Although it is unclear when the AMY1 gene copy number began to increase, it is known and confirmed that the AMY1 gene existed in early primates. Chimpanzees, the closest evolutionary relatives of humans, were found to have two diploid copies of the AMY1 gene that are identical in length to the human AMY1 gene,[9] which is significantly fewer copies than humans have. On the other hand, bonobos, also close relatives of modern humans, were found to have more than two diploid copies of the AMY1 gene.[9] Nonetheless, when the bonobo AMY1 genes were sequenced and analyzed, their coding sequences were found to be disrupted, which may lead to the production of dysfunctional salivary amylase.[9] It can be inferred from these results that the increase in bonobo AMY1 copy number is likely not correlated with the amount of starch in their diet. It was further hypothesized that the increase in copy number began recently, during early hominin evolution, as none of the great apes had more than two copies of the AMY1 gene that produced functional protein.[9] In addition, it was speculated that the increase in AMY1 copy number began around 20,000 years ago when humans shifted from a hunter-gatherer lifestyle to agricultural societies, which was also when humans relied heavily on root vegetables high in starch.[9] This hypothesis, although logical, lacks experimental evidence because of the difficulty of gathering information on shifts in human diets, especially on starchy root vegetables, which cannot be directly observed or tested. Recent breakthroughs in DNA sequencing have allowed researchers to sequence older DNA, such as that of Neanderthals, to a certain degree of accuracy. Sequencing Neanderthal DNA may provide a time marker for when the AMY1 gene copy number increased and offer insight into human diet and gene evolution.

It is currently unknown which mechanism gave rise to the initial duplication of the amylase gene. One suggestion is that the insertion of retroviral sequences through non-homologous end joining caused the duplication of the AMY1 gene.[27] However, there is currently no evidence to support this theory, and the hypothesis remains conjecture. The recent origin of the multi-copy AMY1 gene implies that, depending on the environment, the AMY1 gene copy number can increase and decrease very rapidly relative to genes that do not interact as directly with the environment.[26] The AMY1 gene is an excellent example of how gene dosage affects the survival of an organism in a given environment. Multiple copies of the AMY1 gene give those who rely more heavily on high-starch diets an evolutionary advantage, so the high gene copy number persists in the population.[26]

Brain cells


Among the neurons in the human brain, somatically derived copy number variations are frequent.[28] The reported proportion of affected neurons varies widely between studies (9 to 100% of brain neurons). Most alterations are between 2 and 10 Mb in size, with deletions far outnumbering amplifications.[28]

Genomic duplication and triplication of the alpha-synuclein gene (SNCA) appear to be a rare cause of Parkinson's disease, although more common than point mutations.[29]

Copy number variants in the RCL1 gene are associated with a range of neuropsychiatric phenotypes in children.[30]

Gene families and natural selection

Possible mechanism of how multiple copies of a gene can lead to a protein family over years with natural selection. Here, Gene X is the gene of interest that is duplicated and Gene X1 and Gene X2 are genes that acquired mutations and became functionally different to Gene X.

Recently, there has been discussion connecting copy number variations to gene families. A gene family is defined as a set of related genes that serve similar functions but have minor temporal or spatial differences and that likely derive from one ancestral gene.[26] The main reason copy number variations are connected to gene families is that the genes in a family may have derived from one ancestral gene that was duplicated into different copies.[26] Mutations accumulate in the genes over time, and with natural selection acting on them, some mutations confer advantages in a given environment, allowing those genes to be inherited, until eventually distinct gene families separate out. An example of a gene family that may have been created through copy number variation is the globin gene family. The globin gene family is an elaborate network of genes consisting of alpha and beta globin genes, including genes that are expressed in embryos, genes that are expressed in adults, and pseudogenes.[31] The genes in the globin family are all well conserved and differ only in a small portion of the gene, indicating that they were derived from a common ancestral gene, perhaps through duplication of the initial globin gene.[31]

Research has shown that copy number variations are significantly more common in genes that encode proteins that directly interact with the environment than proteins that are involved in basic cellular activities.[32] It was suggested that the gene dosage effect accompanying copy number variation may lead to detrimental effects if essential cellular functions are disrupted, therefore proteins involved in cellular pathways are subjected to strong purifying selection.[32] In addition, proteins function together and interact with proteins of other pathways, therefore it is important to view the effects of natural selection on bio-molecular pathways rather than on individual proteins. With that being said, it was found that proteins in the periphery of the pathway are enriched in copy number variations whereas proteins in the center of the pathways are depleted in copy number variations.[33] It was explained that proteins in the periphery of the pathway interact with fewer proteins and so a change in protein dosage affected by a change in copy number may have a smaller effect on the overall outcome of the cellular pathway.[33]

from Grokipedia
Copy number variation (CNV) is a form of structural genetic variation in which the number of copies of one or more sections of the genome differs among individuals, typically involving DNA segments larger than 1,000 base pairs (bp) and including events such as duplications and deletions. CNVs are prevalent across populations, collectively accounting for approximately 12% of an individual's genomic variation when considering both germline and somatic changes, and they contribute significantly to inter-individual differences beyond single nucleotide polymorphisms (SNPs). These variations play a crucial role in evolution, for example through gene dosage effects that influence phenotypic traits and species divergence. In health and disease, CNVs are implicated in a wide array of conditions, including neurodevelopmental disorders such as autism, congenital anomalies, and cancers, where altered gene copy numbers can disrupt normal cellular function or promote oncogenesis. Advances in genomic technologies, such as array comparative genomic hybridization (aCGH) and next-generation sequencing, have facilitated the identification and characterization of CNVs, enhancing our understanding of their functional impacts.

Fundamentals

Definition and Characteristics

Copy number variation (CNV) is a form of structural variation in the genome characterized by deletions, duplications, or insertions of DNA segments, typically ranging from 1 kilobase (kb) to several megabases (Mb) in length, that result in abnormal copy numbers (such as 0, 1, or more than 2 copies) of genes or regulatory elements compared to a reference genome. These variations alter the genomic architecture without necessarily changing the DNA sequence itself, distinguishing them as a key contributor to inter-individual differences. CNVs are widespread across the human genome, collectively spanning approximately 12% of its sequence and affecting hundreds of genes. They can occur as germline variants, present in all cells and heritable across generations, or as somatic variants, arising post-zygotically in specific tissues and thus non-inheritable.

By definition, CNVs represent unbalanced structural changes that lead to a net gain or loss of genetic material, in contrast to balanced rearrangements like inversions or translocations that preserve overall dosage. This imbalance often modifies gene dosage (the relative amount of gene product), potentially disrupting expression levels, regulatory networks, or protein function, with effects ranging from neutral to pathogenic depending on the affected region.

In comparison to other genetic variants, CNVs differ markedly from single nucleotide polymorphisms (SNPs), which involve substitutions at single base positions and primarily affect sequence specificity rather than quantity, and from insertions/deletions (indels), which are smaller events usually under 1 kb that may cause frameshifts but seldom span multiple genes. The larger scale of CNVs enables them to influence broader genomic contexts, such as gene families or enhancers, amplifying their potential for dosage-related impacts. Fundamentally, CNVs enhance genetic diversity by providing a mechanism for rapid evolutionary adaptation and phenotypic variation, as their duplication events can expand gene repertoires without the constraints of point mutations.
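The two numeric conventions in this definition, the 1 kb size boundary between indels and CNVs and the comparison of copy number against a two-copy reference, can be written as a small sketch. The function names and labels are illustrative assumptions; the thresholds are the ones quoted in the text, and published sources draw these boundaries differently.

```python
CNV_MIN_LENGTH_BP = 1_000  # size boundary quoted in the text (1 kb)


def size_class(length_bp: int) -> str:
    """Label a gain or loss of sequence by the size convention in the text."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    return "CNV-scale" if length_bp >= CNV_MIN_LENGTH_BP else "indel-scale"


def copy_number_state(copies: int, reference_copies: int = 2) -> str:
    """Compare an observed copy number with the reference copy number."""
    if copies > reference_copies:
        return "gain"
    if copies < reference_copies:
        return "loss"
    return "neutral"


for length_bp, copies in [(300, 1), (25_000, 0), (2_000_000, 3), (5_000, 2)]:
    print(length_bp, size_class(length_bp), copies, copy_number_state(copies))
```

The default of two reference copies assumes an autosomal region in a diploid genome; sex chromosomes would need a different reference value.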

Historical Discovery

Before the mid-2000s, copy number variations (CNVs) were largely viewed as rare, pathogenic structural alterations in the genome, often associated with congenital syndromes detectable through cytogenetic techniques. For instance, the 22q11.2 deletion linked to DiGeorge syndrome was first identified in the early 1980s using cytogenetic methods, with submicroscopic deletions confirmed via fluorescence in situ hybridization (FISH) in the early 1990s. These early observations framed CNVs primarily as disease-causing anomalies rather than common polymorphisms.

The paradigm shifted dramatically between 2004 and 2006 with the application of array-based comparative genomic hybridization (array CGH) and single nucleotide polymorphism (SNP) arrays, which enabled genome-wide detection of structural variants. Sebat et al. (2004) demonstrated the existence of large-scale copy number polymorphisms (>100 kb) in the human genome, identifying dozens of such variants among normal individuals using representational oligonucleotide microarray analysis (ROMA). Building on this, Redon et al. (2006) cataloged 1,447 CNV regions spanning over 12% of the human genome across diverse populations, establishing CNVs as a major source of genetic variation comparable to SNPs. These studies marked the recognition of CNVs as widespread, often benign polymorphisms rather than solely pathological events.

Following these breakthroughs, CNVs were integrated into large-scale genomic projects starting around 2010, providing deeper insights into their prevalence and impact. The 1000 Genomes Project, launched in 2008 and yielding key results from 2010 onward, systematically characterized CNVs alongside other variants in over 1,000 individuals from multiple ancestries, revealing the prevalence and diversity of CNVs and their contribution to genetic diversity through altered gene dosage and regulatory elements. This effort expanded the known CNV catalog to thousands of loci, highlighting their role in population diversity and facilitating downstream association studies.

By 2024–2025, advancing sequencing technologies and analyses of diverse cohorts have further emphasized CNVs as key drivers of genomic and phenotypic variation. Studies using large cohorts such as SPARK have shown that CNV frequencies vary significantly across genetic ancestry groups, with specific recurrent CNVs exhibiting up to twofold differences that influence traits and disease risk. This recent recognition underscores the need for ancestry-aware CNV mapping to refine the understanding of human genomic diversity.

Types and Mechanisms

Classification of CNVs

Copy number variations (CNVs) are classified based on the nature of the copy number change, which can involve gains, losses, or more intricate combinations. Gains, also known as duplications or amplifications, refer to genomic segments with more than two copies, leading to increased in affected regions. Losses, or deletions, occur when fewer than two copies are present, resulting in reduced or absent function. Complex CNVs encompass multiallelic combinations where multiple copy number states coexist within a single variant, often involving both gains and losses in close proximity or through intricate rearrangements. CNVs are further categorized by size and resolution, reflecting their detectability and biological impact. Typically, CNVs range from 1 kilobase (kb) to several megabases (Mb) in length; variants smaller than 1 kb are generally classified as insertions or deletions rather than CNVs and are often overlooked by standard detection methods. Segmental duplications, a of CNVs, involve larger duplicated blocks (often 10-300 kb) that can be tandem (adjacent repeats) or dispersed (non-adjacent across chromosomes), contributing to genomic instability. Inheritance patterns distinguish CNVs, which are heritable and polymorphic (varying across populations), from de novo CNVs that arise anew in the offspring and are not present in parental genomes. CNVs are transmitted through generations and can be benign polymorphisms, while de novo events often carry higher pathogenic potential, particularly in neurodevelopmental disorders. The genomic context of CNVs provides additional classification based on location and orientation. CNVs can occur in intergenic regions (between genes), intragenic regions (within genes, affecting exons or introns), or regulatory elements (such as promoters or enhancers), influencing variably. 
They may be tandem, where duplicated segments are contiguous, or non-tandem (dispersed or insertional), where copies are separated by other genomic material, with tandem forms more prone to disruptive effects.

Standard nomenclature for CNVs follows guidelines from the Database of Genomic Variants (DGV) and the Human Genome Variation Society (HGVS), using formats such as "chr1:1000000-2000000 dup" to denote a tandem duplication on chromosome 1 from position 1,000,000 to 2,000,000, or "del" for deletions, ensuring precise description of location, type, and size. This system facilitates consistent reporting and integration across databases, aiding the interpretation of CNV diversity.
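The classification rules and the coordinate notation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `CnvCall`, `parse_cnv`, `classify_copy_number`, and `is_cnv_sized` are hypothetical, the pattern covers only the simplified "chr1:1000000-2000000 dup" style quoted in the text (not full HGVS syntax), and treating coordinates as inclusive is an assumption.

```python
import re
from dataclasses import dataclass

# Matches only the simplified style quoted in the text, e.g. "chr1:1000000-2000000 dup".
_CNV_PATTERN = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)-(\d+)\s+(dup|del)$")


@dataclass(frozen=True)
class CnvCall:
    chrom: str
    start: int
    end: int
    kind: str  # "dup" or "del"

    @property
    def length(self) -> int:
        # Assumption of this sketch: both coordinates are inclusive.
        return self.end - self.start + 1


def parse_cnv(text: str) -> CnvCall:
    """Parse a simplified CNV description into its components."""
    match = _CNV_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized CNV description: {text!r}")
    chrom, start, end, kind = match.groups()
    if int(start) > int(end):
        raise ValueError("start must not exceed end")
    return CnvCall(chrom, int(start), int(end), kind)


def classify_copy_number(copies: int, ploidy: int = 2) -> str:
    """Gain if above the expected copy count, loss if below, else neutral."""
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies > ploidy:
        return "gain"
    if copies < ploidy:
        return "loss"
    return "neutral"


def is_cnv_sized(call: CnvCall, min_length: int = 1000) -> bool:
    """Apply the 1 kb lower bound mentioned in the text."""
    return call.length >= min_length
```

For example, `parse_cnv("chr1:1000000-2000000 dup")` yields a call on `chr1` that passes the 1 kb size threshold, while `classify_copy_number(1)` reports a loss relative to the diploid expectation.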

Molecular Mechanisms of Formation

Copy number variations (CNVs) arise primarily through error-prone recombination, repair, and replication processes that generate structural rearrangements in the genome. These mechanisms include recombination errors, double-strand break (DSB) repair pathways, and replication fork instabilities, often mediated by repetitive genomic elements such as low-copy repeats (LCRs) or segmental duplications (SDs). Recurrent CNVs, which occur at specific hotspots, are typically produced by precise misalignment events, while non-recurrent CNVs result from more variable, error-prone processes.

Non-allelic homologous recombination (NAHR) is a key mechanism for recurrent CNVs, occurring when highly similar but non-allelic sequences, such as LCRs, misalign during meiosis or mitosis, leading to deletions or duplications of the intervening genomic segment. This process depends on the length and identity of the homologous LCRs, with longer repeats (typically >1 kb and >95% identity) increasing recombination frequency by promoting strand invasion and crossover resolution. NAHR hotspots are enriched in regions with clustered SDs, which constitute about 5–10% of the genome, and NAHR between them is a primary mechanism underlying many disease-associated recurrent CNVs.

Non-homologous end joining (NHEJ) and microhomology-mediated break-induced replication (MMBIR) contribute to non-recurrent CNVs by repairing DSBs through error-prone pathways that often involve short stretches of microhomology (2–15 bp) at breakpoints. NHEJ directly ligates broken ends with minimal processing, frequently introducing small insertions or deletions, and is prevalent in post-replicative cells where DSBs arise from environmental insults, which can elevate DSB rates by up to 10-fold. MMBIR, an extension of break-induced replication, initiates at stalled forks or DSBs, using microhomology to switch templates and copy distant sequences, resulting in complex rearrangements such as inversions or tandem duplications, observed in up to 50% of non-recurrent CNV junctions.
Replication-based mechanisms, such as fork stalling and template switching (FoSTeS), generate CNVs during S-phase when replication forks encounter obstacles like repetitive sequences or secondary structures, causing temporary stalling and subsequent template switching to nearby homologous regions. This process, akin to MMBIR but focused on active replication, produces non-recurrent CNVs with junction microhomologies and is implicated in about 30% of complex genomic rearrangements in the genome. FoSTeS is particularly active in SD-rich regions, where fork collapse rates can be 5–10 times higher than in unique sequences.

Additional mechanisms include transposon-mediated events, where mobile elements like LINEs or Alu sequences facilitate unequal recombination or excision, contributing to ~10% of CNVs through insertion-induced breaks or NAHR between homologous repeats. Viral integrations can similarly induce CNVs by creating DSBs at integration sites, though this is rarer in germline contexts.

CNV frequency is highest in genomic hotspots defined by SDs and LCRs, which account for over 70% of recurrent events and correlate with elevated rearrangement rates due to their repetitive nature.
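The breakpoint microhomology that characterizes NHEJ, MMBIR, and FoSTeS junctions can be measured directly from sequence. The sketch below is a minimal illustration rather than a published algorithm: the function name `junction_microhomology` and the 0-based, half-open coordinate convention are assumptions. For a deletion, it counts bases at the start of the deleted segment that match the sequence immediately after the deletion, plus bases just before the deletion that match the end of the deleted segment; together these give the length of sequence over which the breakpoint position is ambiguous.

```python
def junction_microhomology(reference: str, del_start: int, del_end: int) -> int:
    """Length of microhomology at a deletion junction.

    The deleted segment is reference[del_start:del_end] (0-based, half-open;
    this coordinate convention is an assumption of the sketch).
    """
    if not 0 <= del_start < del_end <= len(reference):
        raise ValueError("invalid deletion coordinates")
    ref = reference.upper()

    # Rightward: start of the deleted segment vs. sequence after the deletion.
    right = 0
    while (del_start + right < del_end
           and del_end + right < len(ref)
           and ref[del_start + right] == ref[del_end + right]):
        right += 1

    # Leftward: sequence before the deletion vs. end of the deleted segment.
    left = 0
    while (del_start - 1 - left >= 0
           and del_end - 1 - left >= del_start
           and ref[del_start - 1 - left] == ref[del_end - 1 - left]):
        left += 1

    return left + right


def in_reported_microhomology_range(length: int) -> bool:
    """True if the length falls in the 2-15 bp range cited in the text."""
    return 2 <= length <= 15
```

In the toy sequence `GGACTGAATTCTGCC`, deleting positions 3 to 10 removes `CTGAATT`, and the retained `CTG` that follows matches the start of the deleted segment, so the function reports 3 bp of microhomology.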

Detection Methods

Traditional Techniques

Traditional techniques for detecting copy number variations (CNVs) predate next-generation sequencing and primarily relied on cytogenetic and molecular methods to identify large-scale chromosomal imbalances. These approaches, developed in the late 20th century, enabled the visualization and quantification of CNVs at resolutions sufficient for clinical diagnostics but were limited in detecting smaller variants.

Karyotyping, a foundational cytogenetic method, involves culturing cells to arrest them in metaphase, staining chromosomes with Giemsa to produce banding patterns, and microscopically examining them for structural abnormalities such as deletions or duplications. This technique can detect CNVs larger than 5–10 Mb across the entire genome, providing a broad overview of chromosomal architecture in a single-cell analysis. Its strengths include the ability to identify aneuploidies and balanced translocations in individual cells, making it valuable for prenatal and cancer diagnostics since its introduction in the 1950s. However, limitations include low resolution for submicroscopic CNVs, labor-intensive cell culturing (taking several days), and subjectivity in interpretation, restricting its utility to large-scale events.

Fluorescence in situ hybridization (FISH) enhances karyotyping by using fluorescently labeled DNA probes that hybridize to specific chromosomal loci, allowing visualization of targeted regions under a fluorescence microscope without requiring metaphase spreads. It detects CNVs in the 100 kb to 1 Mb range for known loci, offering higher specificity for confirming abnormalities identified by karyotyping. Strengths lie in its applicability to non-dividing cells and rapid turnaround (hours to days), which facilitated its widespread adoption for diagnosing microdeletion syndromes. Limitations include its targeted nature, which requires prior knowledge of suspect regions, and its inability to scan the whole genome, making it unsuitable for discovering novel variants.
Comparative genomic hybridization (CGH), introduced in 1992, compares test DNA from a sample with reference DNA by differentially labeling them with fluorescent dyes and hybridizing both to normal metaphase chromosomes, where ratio imbalances indicate copy number gains or losses. This genome-wide method detects CNVs greater than 5–10 Mb without cell culturing for the test sample, marking a shift toward molecular cytogenetics in the 1990s. Its strengths include comprehensive scanning for imbalances in solid tumors and constitutional disorders, but limitations such as reliance on metaphase chromosome resolution and poor detection of small or low-level mosaic changes constrained its precision.

Array-based CGH (aCGH), an evolution from 1998, replaces metaphase spreads with microarrayed genomic probes (e.g., BAC clones or oligonucleotides), enabling ratio-based detection of CNVs at resolutions of 50–100 kb depending on probe density. This high-throughput platform revolutionized CNV analysis by allowing simultaneous interrogation of thousands of loci, with strengths in unbiased genome-wide coverage and improved sensitivity for submegabase variants compared to traditional CGH. Limitations include potential biases from probe hybridization efficiency and challenges in distinguishing benign polymorphisms from pathogenic CNVs, often requiring validation.

SNP microarrays, adapted for CNV detection in the mid-2000s, utilize genotyping platforms like those from Affymetrix or Illumina to measure signal intensities and B-allele frequencies (BAF) at polymorphic sites, inferring copy number states from deviations in these metrics. With resolutions of 10–40 kb, they excel in integrating CNV calling with SNP genotyping for population studies, as demonstrated in early applications that identified thousands of common variants. Strengths include cost-effectiveness for large cohorts and detection of loss of heterozygosity alongside CNVs, but limitations involve sparse coverage in repetitive regions and reduced sensitivity for rare or small events.
Multiplex ligation-dependent probe amplification (MLPA), introduced around 2002, is a targeted molecular technique that uses multiple probes to detect copy number changes in up to 50 specific genomic loci in a single reaction by comparing probe amplification products via capillary electrophoresis. It achieves resolution at the single-exon to multi-gene level, making it well suited to clinical confirmation of known CNVs in diagnostic panels for genetic disorders, including hereditary cancers. Strengths include high throughput, low cost per locus, and no need for custom probe design when commercial kits are used, enabling rapid analysis (1–2 days). Limitations are its reliance on predefined targets, potential for probe failures in GC-rich regions, and the need for orthogonal validation of novel variants.

Quantitative PCR (qPCR) provides a targeted approach for validating or detecting CNVs at specific loci by amplifying test and reference regions and comparing their threshold cycles via the ΔΔCt method, where the fold change in copy number is calculated as $2^{-\Delta\Delta C_t}$. This real-time PCR technique, refined for CNV analysis, offers high sensitivity for known regions, with resolution down to single-copy changes using standard curves or relative quantification. Its strengths are speed (hours), low cost, and precision for confirmation studies, but it is limited to predefined loci, is subject to biases from primer efficiency, and cannot discover novel CNVs genome-wide.

Overall, these traditional methods drove key CNV discoveries, such as population-wide variant catalogs, but their low resolution for small CNVs (<50 kb) and reliance on known regions highlighted the need for higher-throughput alternatives.
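The ΔΔCt calculation used in qPCR-based CNV validation can be written out as a short worked example. This is a minimal sketch under stated assumptions: the function name is hypothetical, amplification efficiency is assumed to be 100% (a doubling per cycle) for both amplicons, and the calibrator sample is assumed to carry a known copy number (two by default).

```python
def copy_number_from_ddct(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
    calibrator_copies: float = 2.0,
) -> float:
    """Estimate copy number with the delta-delta-Ct method.

    Assumes ~100% PCR efficiency, so each cycle doubles the product.
    """
    # Delta-Ct normalizes the target locus to a reference locus within each sample.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Delta-delta-Ct compares the test sample with the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    fold_change = 2.0 ** (-delta_delta_ct)
    return calibrator_copies * fold_change
```

With these assumptions, a target that crosses threshold one cycle earlier than in the calibrator (ΔΔCt = −1) corresponds to a twofold gain, or four copies, and one cycle later (ΔΔCt = +1) corresponds to a single copy. In practice, measured values are rounded to the nearest integer only after replicate variability has been considered.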

Advanced Sequencing-Based Approaches

Whole-genome sequencing (WGS) has become a cornerstone for high-resolution CNV detection, leveraging read depth analysis to infer copy number states by comparing observed sequencing coverage to expected levels in a diploid genome. The core principle relies on the expected depth ratio in diploid contexts:

$$\text{Expected depth ratio} = \frac{\text{copy number}}{2}$$

Deviations of this ratio from 1 indicate gains or losses, after normalization for sequencing biases. Tools like CNVnator, introduced in 2011, apply mean-shift segmentation to binned read depths, achieving sensitivities of 86–96% for CNVs larger than 1 kb while limiting false positives through noise reduction. Control-FREEC, developed around the same period, complements this by segmenting normalized read-depth profiles and using B-allele frequencies to call both copy number changes and allelic imbalances, performing robustly on tumor-normal pairs with tunable parameters for varying coverage depths. These methods have been benchmarked extensively, showing improved precision for CNVs when integrated with discordant read-pair signals, though they require at least 30× coverage for reliable small-event detection.

Whole-exome sequencing (WES) extends CNV calling to targeted datasets by exploiting off-target reads (those aligning outside exonic capture regions) to achieve sparse genome-wide coverage, enabling detection of intergenic and intronic variants often missed by on-target analysis alone. Advances in 2024, such as the ECOLE deep learning caller, use neural network architectures to denoise read depth signals from WES, outperforming traditional hidden Markov models in sensitivity for somatic and germline events across heterogeneous cohorts.
By 2025, ensemble approaches integrating CNV callers into WES pipelines have boosted diagnostic yields in rare diseases, with retrospective analyses showing additional diagnoses in 1–10% of unsolved cases, raising overall yields from approximately 25% to 30–40% in diverse pediatric populations. For instance, tools like SavvyCNV normalize off-target read depths, facilitating scalable genome-wide profiling without full WGS costs.

Long-read sequencing platforms, including PacBio's highly accurate circular consensus sequencing (HiFi) and Oxford Nanopore Technologies (ONT), have revolutionized CNV detection since 2015 by spanning repetitive and homologous regions that confound short-read assembly, directly resolving complex structural variants like nested duplications or inversions with insertions. Benchmarking in 2024 revealed PacBio HiFi modes yielding higher recall rates (up to 95%) for CNVs in challenging loci compared to ONT or short-read baselines, with error rates below 1% enabling precise breakpoint delineation even in low-complexity DNA. These technologies excel for variants missed by short reads, such as those in centromeric regions or segmental duplications, and have been pivotal in de novo assemblies revealing novel CNVs in clinical settings. Tools like cuteSV and SVIM integrate long-read alignments for phased CNV calling, supporting applications in population-scale studies where short-read methods achieve only 70–80% concordance.

Single-cell and low-pass WGS approaches address heterogeneity in tissues like tumors, using shallow coverage (0.1–1×) to profile CNVs across thousands of cells. The 2025 SCICoNE tool employs a Bayesian MCMC framework to infer copy number profiles and evolutionary event histories from single-cell sequencing data, particularly for somatic CNVs, by modeling branching phylogenies from read depth bins.
A 2024 benchmarking study in Genome Biology evaluated NGS-based somatic CNV callers, demonstrating sensitivities exceeding 90% for events larger than 10 kb at low coverage, with ensemble strategies reducing false discovery rates to under 5% in simulated and real tumor datasets. These methods, often combined with scRNA-seq for multimodal validation, enable high-throughput subclonal reconstruction but demand robust noise models to handle technical variation.

Ensemble methods aggregate outputs from multiple callers to mitigate individual tool biases, with the 2025 EMcnv framework using a model built on heterogeneous graphs to fuse read depth, split-read, and paired-end signals, achieving up to 15% gains in F1-score over single algorithms in diverse WGS cohorts. Addressing population stratification, 2024 studies incorporated ancestry-aware adjustments, such as principal component-based normalization, to correct for ancestry-related differences in non-European genomes, reducing false positives in CNV burden analyses by 20–30%. Despite these innovations, persistent challenges include GC bias correction, often via regression approaches such as LOWESS that equalize coverage across AT-rich and GC-rich regions, and high computational demands, with segmentation algorithms requiring GPU acceleration for terabyte-scale datasets to keep runtimes under 24 hours.
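The read-depth principle and the need for GC bias correction described in this section can be illustrated with a small sketch. It is not the algorithm of CNVnator, Control-FREEC, or any other tool named above: the function names are hypothetical, the GC correction is a simple stratified-median normalization standing in for LOWESS-style regression, and the approach assumes that most bins in each GC stratum are diploid.

```python
from collections import defaultdict
from statistics import median


def gc_stratum(gc_fraction: float, n_strata: int = 10) -> int:
    """Assign a bin to one of n_strata equal-width GC-content strata."""
    return min(int(gc_fraction * n_strata), n_strata - 1)


def gc_normalized_ratios(depths, gc_fractions, n_strata: int = 10):
    """Divide each bin's depth by the median depth of bins with similar GC.

    Assumes most bins in every stratum are diploid, so the stratum median
    approximates the expected two-copy depth at that GC content.
    """
    by_stratum = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        by_stratum[gc_stratum(gc, n_strata)].append(depth)
    baseline = {s: median(values) for s, values in by_stratum.items()}

    ratios = []
    for depth, gc in zip(depths, gc_fractions):
        expected = baseline[gc_stratum(gc, n_strata)]
        ratios.append(depth / expected if expected > 0 else 0.0)
    return ratios


def copy_number_from_ratio(ratio: float, ploidy: int = 2) -> int:
    """Invert ratio = copy number / ploidy and round to an integer state."""
    return int(round(ratio * ploidy))
```

For bins whose depth is 1.5 times the GC-matched baseline, the sketch reports three copies; for bins at half the baseline, one copy. Real callers additionally segment adjacent bins and model noise, which this example omits.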

Evolutionary Roles

Natural Selection and Adaptation

Copy number variations (CNVs) serve as a key evolutionary substrate due to their polymorphic nature and elevated mutation rates, which are estimated to be 100 to 10,000 times higher than those of single nucleotide polymorphisms (SNPs), facilitating rapid generation of genetic variation for selection to act upon. This higher mutability allows CNVs to respond quickly to selective pressures, particularly in regions prone to structural changes. Analysis of the 1000 Genomes Project phase 3 data from 2,504 human genomes revealed that common CNVs are enriched in immune-related genes, such as those encoding immunoglobulin domains, indicating their role in diversifying immune responses across populations.

Positive selection has acted on specific CNVs to enhance adaptation to environmental challenges, including dietary shifts and pathogen exposure. For instance, copy number increases in the AMY1 gene, which encodes salivary amylase for starch digestion, show signatures of positive selection in populations reliant on high-starch diets, such as agricultural societies, where higher copy numbers correlate with improved enzymatic efficiency. Similarly, higher copy numbers of CCL3L1, a gene whose product inhibits HIV entry into cells, have been reported to confer resistance to HIV-1 acquisition, with individuals possessing more than two copies exhibiting up to an 80% reduced risk after controlling for confounders such as age, though this association remains controversial due to conflicting replication studies.

Recent population-level studies highlight ongoing adaptive roles of CNVs, with differences in frequency across ancestries suggesting historical selection. A 2024 analysis in HGG Advances found that deleterious CNVs are less prevalent in non-European ancestry groups than in Europeans in large cohorts, implying that purifying selection may vary by ancestry to maintain fitness in diverse environments.
Analogously, in plants such as apple during domestication, CNVs contribute to adaptation to cultivation pressures, mirroring potential human patterns where ancestry-specific CNV frequencies could underpin local adaptations.

Neutral evolution and balancing selection also maintain CNV diversity, particularly in gene families where multiple alleles persist due to heterozygote advantage. Balancing selection on deletion polymorphisms, for example, preserves ancient variants with exonic impacts identified in genome-wide association studies (GWAS), promoting polymorphism in traits like immune function. Integrated GWAS-CNV analyses further describe selection coefficients for these variants as stronger on average than for SNPs, underscoring CNVs' disproportionate evolutionary influence despite their lower overall frequency.

Impact on Gene Families

Copy number variations (CNVs) play a pivotal role in the expansion of gene families through mechanisms such as duplications, which generate paralogous genes that can evolve new functions. In the human olfactory receptor (OR) gene family, the largest in the mammalian genome, duplications have contributed to extensive copy number diversity, with approximately 50% of identified CNVs spanning multiple OR loci. This expansion facilitates subfunctionalization, where duplicate copies partition ancestral functions, and neofunctionalization, which enables responses to new odorants, thereby enhancing sensory capabilities across populations.

Conversely, CNVs involving deletions lead to gene family contraction, effectively pruning redundant or non-essential copies to streamline genomic architecture. In the major histocompatibility complex (MHC), structural variations including CNVs modulate immune response diversity while mitigating potential autoimmune risks from excessive variation. The copy number of the classical MHC loci (HLA-A, -B, and -C for class I; HLA-DR, -DQ, and -DP for class II) is fixed, but allelic diversity and variation in non-classical genes influence pathogen resistance without compromising overall fitness.

Natural selection acts on CNV-driven copy number differences within gene families, often correlating higher copy numbers with improved fitness in specific environments. For instance, populations under selective pressure exhibit elevated copies of certain paralogs, as seen in adaptive expansions. Recent 2025 analyses demonstrate how CNVs enable rapid toggling between ecological niches by altering gene dosage, a dynamic observed in both animal and plant systems, such as during apple domestication, where CNVs in enzyme families parallel sensory adaptations.

These CNV-induced variations in family size, typically ranging from 1 to 20 copies, exert dosage effects that modulate expression networks, influencing downstream phenotypic traits without disrupting core functions.
In expanded families, increased dosage amplifies signaling pathways, while contractions fine-tune regulatory balance, underscoring CNVs' contribution to evolutionary plasticity across taxa.

Biological and Clinical Implications

Associations with Human Diseases

Copy number variations (CNVs) are strongly implicated in a range of human diseases, particularly neurodevelopmental and psychiatric disorders, where they disrupt gene dosage and contribute to pathogenicity through mechanisms such as haploinsufficiency. Recurrent pathogenic CNVs, such as the 22q11.2 deletion, are associated with DiGeorge syndrome (also known as 22q11.2 deletion syndrome), a condition characterized by congenital heart defects, immune deficiency, and developmental delays, with a prevalence of approximately 1 in 4,000 live births. This deletion typically spans 30–40 genes, leading to dosage imbalances that underlie the syndrome's multisystem effects. Similarly, de novo CNVs (those arising anew in the affected individual) are identified in 5–10% of cases of autism spectrum disorder (ASD), often involving large deletions or duplications that alter neurodevelopmental pathways.

Many pathogenic CNVs exhibit incomplete penetrance, meaning not all carriers develop the associated phenotype, with penetrance estimates for neurodevelopmental CNVs often below 10% for specific outcomes, based on updated analyses of recurrent variants. This variability complicates clinical interpretation and highlights the influence of modifier factors, such as second-hit mutations or environmental interactions. Recent studies have also revealed ancestry-related biases in CNV distribution; for instance, deleterious CNVs appear less prevalent in non-European ancestry groups than in European groups in large cohorts, potentially due to ascertainment biases in population sampling rather than true biological differences. These findings underscore the need for diverse genomic datasets to accurately assess CNV risks across ancestries.

Advancements in 2024–2025 have expanded understanding of CNV roles in adult-onset diseases. In Parkinson's disease, large-scale CNV analysis identified potentially disease-causing variants in 0.9% of patients versus 0.1% of controls, with an odds ratio (OR) of 1.67, particularly involving genes like PRKN and LRRK2.
For psychiatric disorders, CNVs confer substantial risks, with recurrent deletions and duplications linked to conditions including schizophrenia and ASD through shared neurodevelopmental disruptions, as detailed in comprehensive reviews emphasizing their high-impact contributions to multifactorial etiology. Diagnostic approaches have benefited from integrating CNV calling into whole-exome sequencing (WES), boosting yield by approximately 5–10% in cohorts previously undiagnosed by single-nucleotide variant analysis alone and enabling more precise identification of structural contributors.

The primary mechanisms of CNV pathogenicity involve gene dosage alterations: deletions often cause haploinsufficiency, where the single remaining gene copy does not provide enough expression for normal function, while duplications may lead to toxic gain-of-function or imbalance in dosage-sensitive pathways. In schizophrenia, specific recurrent CNVs, such as those at 22q11.2 or 16p11.2, elevate risk with ORs ranging from 10 to 20, reflecting their disruption of critical synaptic and neuronal development genes. These dosage effects provide a conceptual framework for understanding CNV-driven diseases, prioritizing genes intolerant to copy number change in clinical genomics.
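Because the risk estimates in this section are reported as odds ratios, a short worked example may help show how an OR is derived from carrier counts. The sketch below uses the standard 2×2 cross-product with a Woolf (log-scale) confidence interval; the function name is hypothetical, and the counts in the usage note are invented for illustration and are not taken from any study cited here.

```python
import math


def odds_ratio(
    case_carriers: int,
    case_noncarriers: int,
    control_carriers: int,
    control_noncarriers: int,
    z: float = 1.96,
):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Woolf method for an approximate 95% interval (z = 1.96).
    All four cells must be positive for the log-scale interval to be defined.
    """
    cells = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four counts must be positive")
    estimate = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    standard_error = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper
```

With hypothetical counts of 30 carriers among 1,000 cases and 3 carriers among 1,000 controls, `odds_ratio(30, 970, 3, 997)` gives an OR of about 10.3, with a wide interval that reflects how few control carriers there are. This is why rare-CNV associations generally require very large cohorts.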

Somatic CNVs in Brain Development

Somatic copy number variations (CNVs) arise de novo during neurogenesis in the developing brain, primarily through replication errors such as double-strand breaks and non-allelic homologous recombination, leading to mosaic patterns in neural tissues. These post-zygotic events occur in neuronal progenitor cells, resulting in subpopulations of neurons with altered genomic content that persist into adulthood. In healthy human brains, studies using single-cell whole-genome sequencing have revealed that 10–25% of cortical neurons harbor at least one megabase-scale somatic CNV, with recent analyses reporting approximately 20.6% of brain cells affected across various amplification methods. This prevalence is higher in neurons (4–23.1%) than in non-neuronal cells (4.7–8.7%), underscoring the role of proliferative divisions during corticogenesis in generating such mosaicism.

Detecting these low-frequency mosaic CNVs presents significant challenges due to their subclonal nature and the technical limitations of bulk sequencing, which often masks variants present in fewer than 10% of cells. Breakthroughs in single-cell sequencing during the 2010s, including studies by McConnell et al. demonstrating CNVs in 13–41% of frontal cortex neurons and Cai et al. identifying clonal CNVs in pyramidal neurons, first revealed this widespread genomic heterogeneity. More recent advances, such as 2024 analyses integrating multiple amplification protocols (e.g., PicoPLEX and primary template-directed amplification), have improved resolution for megabase-scale events even at low coverage (~0.6×). In 2025, tools like SCICoNE, a Bayesian framework using MCMC for copy number calling from shallow whole-genome sequencing data, enable accurate reconstruction of CNV histories in single cells, outperforming prior methods in handling amplification biases and low-input samples.
Functionally, somatic CNVs contribute to neuronal diversity by altering gene dosage in key neurodevelopmental pathways, fostering variability in synaptic connectivity and excitability among otherwise identical neurons. For instance, megabase-scale duplications or deletions can modify expression of genes involved in processes such as neuronal migration, enhancing circuit-level adaptability without compromising overall function in healthy individuals. In pathological contexts, these variants are linked to neurodevelopmental disorders; somatic deletions, for example, are implicated in up to 29% of focal cortical dysplasia type II cases associated with epilepsy, where they disrupt signaling and promote malformed cortical architecture.

From an evolutionary standpoint, somatic CNVs may enhance plasticity by introducing genomic variability that supports adaptive responses to environmental demands, potentially buffering against vulnerability in neural populations. This mosaicism likely stems from elevated mutation rates during neurogenesis, estimated at approximately 8–9 single-nucleotide variants (SNVs) per cell division, reflecting the trade-off between proliferative demands and genomic fidelity in dividing cells. Recent 2024–2025 single-cell studies, including those mapping recurrent CNV hotspots near segmental duplications, further highlight how these variants drive tissue-specific mosaicism in the brain, with sub-telomeric enrichments suggesting mechanisms that promote functional diversification.
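The difficulty of detecting mosaic CNVs in bulk data, noted above for variants present in fewer than 10% of cells, follows from simple arithmetic on the read-depth ratio. The sketch below works this through under stated assumptions: the function names are hypothetical, the tissue is modeled as a mixture of diploid cells and cells carrying a single alternative copy number, and sequencing noise is ignored.

```python
import math


def expected_bulk_ratio(cell_fraction: float, copy_number: int, ploidy: int = 2) -> float:
    """Expected depth ratio in bulk data for a CNV present in a fraction of cells."""
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must be between 0 and 1")
    # Average copy number across carrier and non-carrier cells, relative to ploidy.
    mean_copies = cell_fraction * copy_number + (1.0 - cell_fraction) * ploidy
    return mean_copies / ploidy


def mosaic_fraction(ratio: float, copy_number: int, ploidy: int = 2) -> float:
    """Invert the mixture model to estimate the fraction of carrier cells."""
    if copy_number == ploidy:
        raise ValueError("copy_number must differ from ploidy")
    return (ratio * ploidy - ploidy) / (copy_number - ploidy)


def log2_ratio(ratio: float) -> float:
    """Depth ratio on the log2 scale commonly used for CNV plots."""
    return math.log2(ratio)
```

A heterozygous deletion carried by 10% of cells yields an expected ratio of 0.95, a log2 shift of only about −0.07, which is easily lost in bulk coverage noise. The same deletion in a single cell gives a ratio of 0.5, which is why single-cell sequencing resolves events that bulk data cannot.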

Case Studies

Alpha-Amylase Gene Family

The alpha-amylase gene family exemplifies copy number variation (CNV) in humans, particularly at the AMY1 locus on chromosome 1p21.1, where tandem duplicates of the salivary amylase gene (AMY1) range from 2 to 17 copies per individual, with an average of about 6 copies. This contrasts with the nearby pancreatic amylase genes (AMY2A and AMY2B), which exhibit less extensive CNV and are primarily expressed in the pancreas rather than the salivary glands. The structural complexity arises from repeated duplication events, creating a variable cluster that influences starch digestion efficiency.

This CNV demonstrates adaptive significance tied to dietary shifts, as higher AMY1 copy numbers correlate with populations consuming starch-rich diets. For instance, agricultural groups such as the Japanese average 6–8 copies, compared to 4–5 in low-starch populations such as the Biaka Pygmies or Yakut. Evidence from Perry et al. (2007) indicates positive selection favoring increased copies in high-starch contexts, enhancing the breakdown of complex carbohydrates into simpler sugars for better energy extraction. East Asian populations, including Japanese, often show elevated averages, reflecting historical reliance on starchy staples.

Functionally, greater AMY1 copy number leads to higher salivary amylase protein levels and activity, improving initial starch digestion in the mouth. Genome-wide association studies (GWAS) further link low AMY1 copy number to increased body mass index (BMI) and obesity risk, with each additional copy reducing the odds by about 1.2-fold, potentially due to altered glycemic responses. Similar associations appear with related metabolic traits, where reduced copy number may elevate postprandial glucose.

Detection of AMY1 CNV typically employs quantitative PCR (qPCR) for high-throughput estimation or array comparative genomic hybridization (aCGH) for structural mapping, though qPCR can underestimate copy number due to primer biases; digital PCR offers higher accuracy.
Population-level variation persists, with even higher averages in some groups, such as Indigenous Peruvians. Recent research ties AMY1 CNV to microbiome adaptation, showing that copy number influences how oral and gut microbial communities respond to starch, potentially modulating biofilm formation and fermentation efficiency in starch-dependent ecosystems.
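Digital PCR, mentioned above as the more accurate option for multi-copy loci such as AMY1, estimates copy number from the fraction of positive partitions rather than from threshold cycles. The sketch below shows the standard Poisson correction under stated assumptions: the function names are hypothetical, target and reference assays are assumed to be run on the same number of partitions, the reference locus is assumed to be present at two copies per diploid genome, and the partition counts in the usage note are invented for illustration.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Mean template copies per partition, corrected with Poisson statistics.

    A partition is negative only if it received zero copies, so
    P(negative) = exp(-lam) and lam = -ln(1 - positive / total).
    """
    if not 0 <= positive < total:
        raise ValueError("positive must be at least 0 and less than total")
    return -math.log(1.0 - positive / total)


def dpcr_copy_number(
    target_positive: int,
    reference_positive: int,
    total_partitions: int,
    reference_copies: float = 2.0,
) -> float:
    """Copy number of the target relative to a reference locus of known copy number."""
    target_lam = copies_per_partition(target_positive, total_partitions)
    reference_lam = copies_per_partition(reference_positive, total_partitions)
    if reference_lam == 0.0:
        raise ValueError("reference assay has no positive partitions")
    return reference_copies * target_lam / reference_lam
```

With 20,000 partitions, 15,537 positive for the target and 7,869 positive for the reference, the estimate is very close to 6 copies. Without the Poisson correction, the raw ratio of positive partitions would suggest about 4 copies, because partitions containing several target molecules are counted only once.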

Other Prominent Examples

Copy number variations (CNVs) in immune-related genes provide a clear example of adaptive dosage effects. The CCL3L1 gene, encoding the chemokine macrophage inflammatory protein-1α (MIP-1α), exhibits variable copy numbers. An initial 2005 study reported an inverse correlation with HIV-1 susceptibility and progression, suggesting that lower copy numbers (below the population median, typically 2–4 copies) increase risk due to reduced production of MIP-1α, which inhibits HIV entry via the CCR5 co-receptor, and that higher copy numbers confer protection, with each additional copy potentially reducing infection risk by 4–10%. However, subsequent research has produced inconsistent results, with many studies failing to replicate the association and questioning the original assay's accuracy, leaving the role of CCL3L1 CNV in HIV debated.

In neurodevelopment, CNVs at the 16p11.2 chromosomal locus represent a prominent pathogenic example with bidirectional effects and variable expressivity. Deletions spanning approximately 600 kb in this region, affecting 25–29 genes including MVP and KCTD13, are strongly associated with autism spectrum disorder (ASD), intellectual disability, and macrocephaly, increasing ASD risk up to ninefold. Duplications of the same interval, in contrast, are linked to schizophrenia, bipolar disorder, and microcephaly, with penetrance varying by genetic background and environmental factors, underscoring how reciprocal CNVs disrupt dosage-sensitive pathways in brain development and connectivity.

CNVs in the olfactory receptor (OR) gene family highlight evolutionary and functional diversity in sensory systems. This largest gene superfamily, comprising over 400 loci, shows extensive structural variation, with roughly 50% of identified CNVs encompassing multiple OR genes and contributing to inter-individual differences in copy number that explain up to 30% of variation in olfactory perception.
Such polymorphisms, often involving duplications or deletions of gene clusters on chromosomes 11 and 1, influence detection thresholds and receptor repertoire, reflecting neutral genomic drift and selection pressures that fine-tune olfaction across populations without overt disease associations.

Recent investigations into CNVs in Parkinson's disease (PD) have identified rare structural variants in the GBA gene, encoding glucocerebrosidase, as modifiers of risk. Loss-of-function variants in GBA represent the strongest common genetic risk factor for PD, affecting lysosomal function and α-synuclein accumulation.

These cases collectively exemplify CNV-driven dosage sensitivity, where altered gene copy numbers modulate protein expression to yield protective adaptations (e.g., in immunity and olfaction), pathological vulnerabilities (e.g., in neurodevelopment and neurodegeneration), or evolutionary flexibility, distinct from broader disease associations or gene family expansions.
