Incomplete lineage sorting
from Wikipedia

Incomplete lineage sorting (ILS)[1][2][3] (also referred to as hemiplasy, deep coalescence, retention of ancestral polymorphism, or trans-species polymorphism) is a phenomenon in evolutionary biology and population genetics that results in discordance between species and gene trees.[4][5] By contrast, complete lineage sorting results in concordant species and gene trees. ILS occurs in the context of a gene in an ancestral species which exists in multiple alleles. If a speciation event occurs in this situation, either complete lineage sorting will occur, and both daughter species will inherit all alleles of the gene in question, or incomplete lineage sorting will occur, when one or both daughter species inherits a subset of alleles present in the parental species. For example, if two alleles of a gene are present and a speciation event occurs, one of the two daughter species might inherit both alleles, but the second daughter species only inherits one of the two alleles. In this case, incomplete lineage sorting has occurred.[6]

Concept

Figure 1. Incomplete lineage sorting: see the text for an explanation.
Figure 2. Apparent incomplete lineage sorting: see the text for an explanation.

The concept of incomplete lineage sorting has important implications for phylogenetic techniques. Incomplete lineage sorting arises when polymorphisms persist across successive speciation events. Suppose two successive speciation events occur, in which an ancestral species gives rise first to species A and then to species B and C. A single gene can have multiple versions (alleles) that cause different characters to appear (polymorphisms). In the example shown in Figure 1, the gene G has two alleles, G0 and G1. The ancestor of A, B and C originally had only one version of gene G, G0. At some point, a mutation occurred and the ancestral population became polymorphic, with some individuals having G0 and others G1. When species A split off, it retained only G1, while the ancestor of B and C remained polymorphic. When B and C diverged, B retained only G1 and C only G0; neither was now polymorphic in G. The tree for gene G shows A and B as sisters, whereas the species tree shows B and C as sisters. If the phylogeny of these species is based on gene G, it will not represent the actual relationships between the species. In other words, the most closely related species will not necessarily inherit the most closely related genes. This is a simplified example of incomplete lineage sorting; in real research the situation is usually more complex, involving more genes and species.[7][8]

However, other mechanisms can lead to the same apparent discordance. For example, alleles can move across species boundaries via hybridization, and DNA can be transferred between species by viruses.[9] This is illustrated in Figure 2. Here the ancestor of A, B and C, and the ancestor of B and C, had only the G0 version of gene G. A mutation occurred at the divergence of B and C, and B acquired a mutated version, G1. Some time later, as the arrow shows, G1 was transferred from B to A by some means (e.g. hybridization or horizontal gene transfer). Studying only the final states of G in the three species makes it appear that A and B are sisters rather than B and C, as in Figure 1, but in Figure 2 this is not caused by incomplete lineage sorting.

Implications


Incomplete lineage sorting has important implications for phylogenetic research. Because of it, a phylogenetic tree built from genetic data may not reflect the actual relationships between species. However, gene flow between lineages by hybridization or horizontal gene transfer may produce the same conflicting phylogenetic tree. Distinguishing these processes can be difficult, but statistical approaches have been, and continue to be, developed to gain greater insight into these evolutionary dynamics.[10] One way to reduce the impact of incomplete lineage sorting is to use multiple genes when building species or population phylogenies. The more genes used, the more reliable the phylogeny becomes.[8]

In diploid organisms


Incomplete lineage sorting commonly happens with sexual reproduction because the species cannot be traced back to a single individual or breeding pair. When populations are large (i.e. thousands of individuals), each gene has some diversity, and the gene tree includes lineages that pre-date the species. The larger the population, the longer these ancestral lineages persist. When large ancestral populations are combined with closely timed speciation events, different pieces of DNA retain conflicting affiliations. This makes it hard to determine a common ancestor or points of branching.[4]

In primate evolution


Chimpanzees and bonobos are more closely related to each other than to any other taxa and are thus sister taxa. Still, for 1.6% of the bonobo genome, sequences are more closely related to their human homologues than to those of chimpanzees, which is probably a result of incomplete lineage sorting.[4] A study of more than 23,000 DNA sequence alignments in the family Hominidae (great apes, including humans) showed that about 23% did not support the known sister relationship of chimpanzees and humans.[9]

In human evolution


In human evolution, incomplete lineage sorting is used to diagram hominin lineages that may have failed to sort out at the same time that speciation occurred in prehistory.[11] Due to the advent of genetic testing and genome sequencing, researchers found that the genetic relationships between hominin lineages might disagree with previous understandings of their relatedness based on physical characteristics.[11] Moreover, divergence of the last common ancestor (LCA) may not necessarily occur at the same time as speciation.[12] Lineage sorting is a method that allows paleoanthropologists to explore the genetic relationships and divergences that may not fit with their previous speciation models based on morphological traits alone.[11]

Incomplete lineage sorting of the human family tree is an area of great interest. There are a number of unknowns when considering both the transition from archaic humans to modern humans and divergence of the other great apes from the hominin lineage.[13]

Ape and hominin / human divergence


Incomplete lineage sorting means that the average divergence time between genes may differ from the divergence time between species. Models suggest that the average divergence time between the genes in the human and chimpanzee genome is older than the split between humans and gorillas. This means that the common ancestor of humans and chimpanzees retained genetic material that was present in the common ancestor of humans, chimpanzees, and gorillas.[12] As a result, the gene tree differs slightly from the species tree.[14] In the species tree relating humans, bonobos, chimpanzees, and gorillas, the separation of bonobos and chimpanzees occurred close in time to the common ancestor of the bonobo-chimpanzee lineage and humans,[12] indicating that humans and chimpanzees shared a common ancestor for several million years after separation from gorillas. This creates the conditions for incomplete lineage sorting. Today researchers rely on DNA fragments to study the evolutionary relationships among humans and their relatives, in the hope that genomes from different types of humans will provide information about speciation and ancestral processes.[15]

In viruses

Figure 3. The pretransmission interval and incomplete lineage sorting in the phylogeny of a human-transmissible virus. The shaded tree represents a transmission chain where each region represents the pathogen population in each of three patients. The width of the shaded regions corresponds to the genetic diversity. In this scenario, A infects B with an imperfect transmission bottleneck, and then B infects C. The genealogy at the bottom is reconstructed from a sample of a single lineage from each patient at three distinct time points. When diversity exists in donor A, a pre-transmission interval will occur at each inferred transmission event (MRCA(A,B) precedes transmission from A to B), and the order of transmission events may become randomized in the virus genealogy. Note that the pre-transmission interval also is a random variable defined by the donor's diversity at time of each transmission. Terminal branch lengths are also elongated due to these processes.

Incomplete lineage sorting is a common feature in viral phylodynamics. The phylogeny represented by the transmission of a disease from one person to the next (the population-level tree) often does not correspond to the tree built from genetic analysis, because of the population bottlenecks inherent in viral transmission. Figure 3 illustrates how this can occur. This is relevant to criminal transmission of HIV: in some criminal cases, a phylogenetic analysis of one or two genes from the strains of the accused and the victim has been used to infer transmission. However, because incomplete lineage sorting is so common, transmission cannot be inferred solely on the basis of such a basic analysis.[16]

In linguistics

Jacques and List (2019)[17] show that the concept of incomplete lineage sorting can be applied to account for non-treelike phenomena in language evolution. Kalyan and François (2019), proponents of the method of historical glottometry, a model challenging the applicability of the tree model in historical linguistics, concur that "Historical Glottometry does not challenge the family tree model once incomplete lineage sorting has been taken into account."[18]

from Grokipedia
Incomplete lineage sorting (ILS) is a fundamental phenomenon in evolutionary biology and population genetics wherein ancestral genetic polymorphisms persist across speciation events without fully resolving into distinct lineages, leading to discordance between gene trees and the species tree. This occurs when allelic lineages from a common ancestor fail to coalesce prior to a speciation event, allowing multiple ancestral variants to segregate randomly into descendant species. As a result, ILS generates phylogenetic incongruences that can mimic other evolutionary processes, such as hybridization, complicating the reconstruction of evolutionary histories.

The primary drivers of ILS are large effective population sizes in ancestral populations, which maintain high levels of genetic diversity, and short temporal intervals between successive speciation events, which provide insufficient time for complete sorting of polymorphisms. Under the multispecies coalescent model, the probability of ILS increases with these factors, potentially affecting a substantial portion of the genome (up to 64% at certain internodes in some phylogenies). For instance, in the human-chimpanzee-gorilla trio, approximately 30% of the genome exhibits ILS, with about 15% of the human genome showing greater similarity to the gorilla lineage than to the chimpanzee.

ILS has profound implications for understanding speciation dynamics, as it preserves signals of ancestral variation that can illuminate effective population sizes, divergence times, and even selective pressures acting on the genome. It is particularly prevalent in rapid radiations, such as those observed in marsupials and salmonids, where it contributes to phenotypic evolution and challenges traditional phylogenetic methods reliant on single-locus data. To address ILS, modern phylogenomic approaches employ coalescent-based models that account for gene tree heterogeneity, enabling more accurate species tree inference despite pervasive discordance.

Fundamentals

Definition and Core Concept

Incomplete lineage sorting (ILS) is a population genetic phenomenon in which ancestral genetic polymorphisms persist across multiple speciation events, resulting in gene trees that do not match the species tree because lineages fail to coalesce completely within the ancestral population. This occurs when the time between successive speciation events is shorter than the time required for ancestral alleles to coalesce, allowing polymorphic variants to be inherited randomly by descendant species in a manner that produces discordant phylogenetic signals.

The core principle of ILS is the uneven sorting of neutral alleles from an ancestral population into descendant lineages, which generates patterns of gene tree discordance observable at the genomic scale. Under this process, shared ancestral polymorphisms can lead to portions of the genome appearing to support alternative evolutionary relationships among species, even in the absence of gene flow or hybridization. This framework is rooted in coalescent theory, which models the probabilistic coalescence of lineages backward in time.

In complete lineage sorting, by contrast, all ancestral lineages coalesce prior to the speciation event, yielding monophyletic gene trees that fully align with the species tree topology. ILS deviates from this when coalescence remains incomplete, introducing stochastic variation in gene tree topologies. A basic illustration is the three-species case with species tree ((A, B), C), where ILS can generate two discordant gene trees, A clustering with C or B clustering with C (the genealogies that underlie the ABBA and BABA site patterns). The two discordant trees are equally probable, and as the internal branch length approaches zero all three topologies approach a probability of one-third. With only three species the concordant topology is never less probable than either alternative; an anomaly zone, in which a discordant topology is the most probable one, requires at least four taxa.

Historical Background

The concept of gene tree discordance, where individual gene phylogenies do not match the species tree, was first systematically explored during the 1990s, with early recognition attributed to studies highlighting how ancestral polymorphisms could persist across diverging lineages. John C. Avise's 1994 work emphasized these discrepancies in the context of molecular markers and evolutionary history, noting their implications for inferring species relationships from genetic data. This laid foundational groundwork by illustrating how stochastic processes in gene evolution could lead to incongruent genealogies, prompting further investigation into mechanisms beyond simple divergence.

The formalization of incomplete lineage sorting (ILS) as a key driver of such discordance accelerated in the 2000s through coalescent-based models, which described mathematically how incomplete sorting of ancestral polymorphisms causes gene trees to vary from the species tree. A seminal contribution came from James H. Degnan and Noah A. Rosenberg in 2006, who identified the "anomaly zone": a region of parameter space where the most likely gene tree topology contradicts the species tree due to ILS, challenging traditional methods of phylogenetic inference. This work, building on coalescent theory, highlighted the need for species-tree-aware inference and influenced subsequent modeling efforts by researchers such as Laura S. Kubatko, who co-authored studies on the inconsistency of concatenated data under the coalescent.

After 2010, ILS became integral to phylogenomics, with large-scale genomic datasets revealing its prevalence in rapid radiations and complex evolutionary histories. Influential advances include Siavash Mirarab's development of coalescent-based tools for handling ILS in species tree estimation, which improved accuracy in multi-locus analyses. In the 2020s, studies such as the 2022 analysis of marsupial evolution showed how extensive ILS during the Cretaceous-Paleogene radiation led to persistent ancestral polymorphisms, reshaping understanding of phenotypic convergence across lineages. Research published in 2025 on early-diverging eudicots further integrated ILS with hybridization, showing how the two processes interact to generate phylogenetic conflicts in angiosperm diversification and underscoring ongoing refinements in modeling ancient genomic exchanges.

Mechanisms

Genetic Processes

Incomplete lineage sorting (ILS) arises primarily from population-level factors that allow ancestral polymorphisms to persist beyond speciation events. A large effective population size (Ne) in the ancestor extends the expected time to coalescence for gene lineages, increasing the likelihood that polymorphisms remain unsorted when descendant lineages diverge. Similarly, short intervals between successive speciation events reduce the opportunity for lineages to coalesce within ancestral branches, thereby raising the probability of ILS. Low migration rates between emerging populations further promote ILS by limiting gene flow that could otherwise homogenize or mask ancestral variation.

Genetic drift plays a central role in ILS by driving the random fixation or loss of ancestral alleles in descendant populations after divergence. In the absence of strong selection, drift stochastically alters allele frequencies, potentially leading to different alleles becoming fixed in sister lineages, which results in gene trees that do not match the species tree. This process underlies the coalescent framework, where the failure of all lineages to coalesce prior to speciation manifests as incomplete sorting.

Recombination influences ILS by differentially affecting linked and unlinked loci across the genome. Loci in low-recombination regions tend to share coalescent histories due to persistent linkage disequilibrium, behaving as single units under sorting. In contrast, recombination in high-recombination regions breaks down linkage disequilibrium, enabling nearby loci to coalesce and sort independently, which can amplify discordance between gene trees and the species tree. As a result, ILS proportions often increase with local recombination rates.

Selection can interact with ILS by altering the persistence of polymorphisms. Positive selection may accelerate fixation of favored alleles, potentially reducing ILS at those loci, while balancing selection maintains multiple alleles at intermediate frequencies over longer periods, thereby prolonging the retention of ancestral variation and increasing ILS. This effect is particularly evident in immune-related genes, where trans-species polymorphisms persist due to selective pressures.

A classic example of polymorphism persistence involves two successive speciation events. Consider an ancestral population harboring three alleles (A, B, C) at a neutral locus. During the first speciation, the population splits into two lineages, one leading to species X and the other to an intermediate ancestor YZ. Allele A fixes in X, while A is lost in YZ by drift due to founder effects, leaving B and C. In the second speciation, YZ splits into Y and Z; B fixes in Y while C fixes in Z via drift. If B and C are each other's closest relatives, the resulting gene tree ((Y: B, Z: C), X: A) matches the species tree (X, (Y, Z)). Alternative distributions, such as B in X and Y and C in Z, would instead yield a discordant topology like ((X: B, Y: B), Z: C), illustrating incomplete sorting.
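The dependence of polymorphism persistence on effective population size can be illustrated with a small Wright-Fisher drift simulation. This is a minimal sketch under stated assumptions: a single neutral biallelic locus, a constant diploid population, no mutation or migration, and hypothetical function names (`polymorphism_persists`, `persistence_probability`) and parameter values chosen only for illustration.

```python
import random

def polymorphism_persists(ne, generations, rng, start_freq=0.5):
    """Return True if both alleles survive `generations` of Wright-Fisher drift."""
    copies = 2 * ne                      # gene copies in a diploid population
    count = round(start_freq * copies)   # copies of the focal allele
    for _ in range(generations):
        if count == 0 or count == copies:
            return False                 # one allele has fixed: sorting is complete
        freq = count / copies
        # Binomial resampling of the next generation, one gene copy at a time.
        count = sum(rng.random() < freq for _ in range(copies))
    return 0 < count < copies

def persistence_probability(ne, generations, replicates=200, seed=42):
    """Fraction of replicate populations still polymorphic after drift."""
    rng = random.Random(seed)
    kept = sum(polymorphism_persists(ne, generations, rng) for _ in range(replicates))
    return kept / replicates

if __name__ == "__main__":
    # Same number of generations between speciation events, two population sizes.
    for ne in (20, 200):
        print(ne, persistence_probability(ne, generations=50))
```

For a fixed interval between speciation events, the larger population should retain the polymorphism in a clearly higher fraction of replicates, which is the condition under which unsorted ancestral alleles can be passed to descendant species.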

Mathematical Foundations

The multispecies coalescent (MSC) model formalizes incomplete lineage sorting (ILS) as a stochastic process extending the Kingman coalescent to multiple species, where lineages coalesce randomly within ancestral populations according to effective population sizes and branch lengths. In this framework, the probability that two lineages coalesce within an ancestral branch of length $t$ (in generations) is $1 - \exp\left(-\frac{t}{2N_e}\right)$, where $N_e$ is the effective population size of the ancestral population. This expression arises from the exponential waiting time for coalescence in a Wright-Fisher model, with time scaled to coalescent units of $2N_e$ generations.

A key phenomenon under the MSC is the anomaly zone, a region of parameter space where the most likely gene tree topology differs from the species tree topology. Degnan and Rosenberg (2006) derived conditions for the anomaly zone based on branch lengths in coalescent units, showing that it occurs when short internal branches allow substantial ILS. Among four-taxon species trees, anomalous gene trees arise only for the asymmetric (caterpillar) topology; there the zone is defined by the inequality under which the probability of the matching gene tree falls below that of a discordant topology, computed from coalescence probabilities across branches.

For a three-taxon tree, the MSC predicts the expected discordance due to ILS explicitly: the probability that a gene tree mismatches the species tree is $\frac{2}{3}\exp(-\tau)$, where $\tau$ is the internode branch length between speciation events in coalescent units. This follows because, conditional on no coalescence in the internode (probability $\exp(-\tau)$), the three topologies are equally likely, so each of the two discordant topologies has probability $\frac{1}{3}\exp(-\tau)$.

Extensions of the MSC incorporate recombination by modeling how genealogies change across genomic regions, often using piecewise constant approximations to population sizes or rates along the genome to adjust for recombination hotspots. These extensions allow hybrid detection by distinguishing ILS from gene flow through site-pattern probabilities under joint coalescent-recombination processes.
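The two closed-form results above, the pairwise coalescence probability and the three-taxon gene tree probabilities, can be evaluated directly. The following is a minimal sketch; the function names are hypothetical, a diploid population is assumed, and the numbers in the usage example are illustrative rather than estimates for any real clade.

```python
import math

def coalescence_probability(t_generations, ne):
    """P(two lineages coalesce within t generations) = 1 - exp(-t / (2 * Ne))."""
    return 1.0 - math.exp(-t_generations / (2.0 * ne))

def to_coalescent_units(t_generations, ne):
    """Convert a branch length in generations to units of 2 * Ne generations."""
    return t_generations / (2.0 * ne)

def three_taxon_gene_tree_probs(tau):
    """Gene tree topology probabilities for species tree ((A,B),C).

    tau is the internal branch length in coalescent units.
    """
    each_discordant = math.exp(-tau) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * each_discordant,  # matches the species tree
        "((A,C),B)": each_discordant,
        "((B,C),A)": each_discordant,
    }

if __name__ == "__main__":
    # Hypothetical values: Ne = 50,000 and an internode of 80,000 generations.
    tau = to_coalescent_units(80_000, 50_000)      # 0.8 coalescent units
    probs = three_taxon_gene_tree_probs(tau)
    discordant = 1.0 - probs["((A,B),C)"]
    print(f"tau = {tau:.2f}, expected discordance = {discordant:.3f}")
```

With these illustrative inputs the internode is 0.8 coalescent units and the predicted discordance is about 0.30. At $\tau = 0$ the three topologies are equally likely, and the concordant topology becomes more probable as $\tau$ grows.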

Detection and Analysis

Methods for Identifying ILS

Site pattern methods provide a foundational approach for detecting incomplete lineage sorting (ILS) by examining allele patterns at polymorphic sites across taxa. The ABBA-BABA test, introduced by Patterson et al., compares the frequencies of two specific site patterns, ABBA and BABA, under a four-taxon configuration (P1, P2, P3, outgroup), where ABBA and BABA correspond to the two resolutions of the quartet that are discordant with the species tree. Under ILS alone, without gene flow, ABBA and BABA patterns are expected to occur in equal proportions due to the random coalescence of lineages. The test statistic is $D = \frac{ABBA - BABA}{ABBA + BABA}$, with values near zero (typically |D| < 0.05 after significance testing) indicating ILS, rather than admixture, as the primary cause of discordance. This method is particularly effective for distinguishing ILS from introgression in population genomic data, though it assumes a suitable outgroup to polarize alleles accurately.

Quartet-based inference extends site pattern analysis by quantifying ILS through concordance factors (CFs), which measure the proportion of gene trees or sites supporting each of the three possible unrooted quartet topologies at internal branches of the species tree. For a given quartet, the primary CF (pCF) corresponds to the topology matching the species tree, while the two secondary CFs (sCF1 and sCF2) reflect discordant topologies arising from ILS. Under the multispecies coalescent (MSC) model, expected CFs can be simulated from branch lengths in coalescent units, where short internal branches predict higher ILS-induced discordance (e.g., pCF ≈ 1/3 as the branch length approaches zero). These factors can be computed across sliding genomic windows to assess heterogeneity in ILS levels, providing a genome-wide summary of coalescent stochasticity. Site-based CFs are robust to gene tree estimation error, but the approach requires dense sampling to achieve high resolution.

Bayesian approaches offer a model-based framework for identifying ILS by integrating full likelihood computations under the MSC, which explicitly accounts for coalescent variation across loci to estimate ILS proportions while distinguishing it from gene flow or other processes. These methods use Markov chain Monte Carlo (MCMC) sampling to infer posterior distributions of tree topologies and branch lengths, incorporating multilocus sequence data to quantify the probability of discordance due to deep coalescence. For instance, the MSC likelihood evaluates the fit of observed gene trees to the species tree, estimating the fraction of loci affected by ILS as a function of effective population sizes and divergence times. Such full probabilistic modeling improves accuracy in complex scenarios by jointly modeling ILS and potential confounders like migration.

Despite their strengths, these methods face limitations in power and applicability, particularly in scenarios with low divergence or confounding factors. Site pattern tests like ABBA-BABA have reduced power to differentiate ILS from introgression when admixture events are ancient or sparse, as both processes can produce similar allele-sharing patterns, so large sample sizes are needed for reliable detection. Quartet CFs computed from gene trees are sensitive to gene tree estimation errors and may underestimate ILS in regions with high recombination or selection, where secondary CFs deviate from MSC expectations. Bayesian MSC methods, while comprehensive, suffer from computational intensity and identifiability issues in low-divergence clades, where short branches lead to overlapping signals from ILS and gene flow, often requiring informative priors or outgroup taxa for convergence. Additionally, all approaches assume neutral evolution and can be biased by positive selection, which alters site patterns independently of ILS.
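The site-pattern and quartet summaries described above reduce to simple ratios of counts. The sketch below computes the D statistic, a set of quartet concordance factors, and the internal branch length implied by a primary concordance factor under the MSC relation pCF = 1 - (2/3)exp(-τ). The function names and the counts in the usage example are hypothetical, and the sketch does not assess significance, which in practice requires a resampling scheme over genomic blocks.

```python
import math

def d_statistic(n_abba, n_baba):
    """Patterson's D from counts of ABBA and BABA site patterns."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (n_abba - n_baba) / total

def concordance_factors(n_primary, n_secondary1, n_secondary2):
    """Proportions of gene trees (or sites) supporting each quartet topology."""
    total = n_primary + n_secondary1 + n_secondary2
    if total == 0:
        raise ValueError("no informative gene trees or sites")
    return (n_primary / total, n_secondary1 / total, n_secondary2 / total)

def branch_length_from_pcf(p_cf):
    """Internal branch length (coalescent units) implied by a primary CF.

    Inverts pCF = 1 - (2/3) * exp(-tau); only defined for 1/3 < pCF < 1.
    """
    if not (1.0 / 3.0 < p_cf < 1.0):
        raise ValueError("pCF must lie strictly between 1/3 and 1")
    return -math.log(1.5 * (1.0 - p_cf))

if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(d_statistic(n_abba=1020, n_baba=980))
    p_cf, s_cf1, s_cf2 = concordance_factors(700, 155, 145)
    print(p_cf, s_cf1, s_cf2, branch_length_from_pcf(p_cf))
```

Roughly equal secondary CFs together with D near zero are the pattern expected under ILS alone; a strong asymmetry between the two discordant classes is what points toward gene flow.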

Computational Tools

One prominent phylogenomic tool for species tree estimation under incomplete lineage sorting (ILS) is ASTRAL, which infers coalescent-based species trees from unrooted gene trees by finding the species tree that shares the maximum number of induced quartets with the input gene trees. Introduced in 2014, ASTRAL takes sets of gene trees as input, typically estimated from multi-locus genomic data, and outputs a species tree that accounts for ILS via quartet frequency scoring, demonstrating statistical consistency and scalability to thousands of loci. An updated version, ASTRAL-III (2018), improves efficiency through polynomial-time algorithms for handling partially resolved gene trees and polytomies, reducing runtime for large datasets of up to 10,000 taxa while maintaining accuracy in ILS scenarios.

Other specialized software includes SVDquartets, which enables rapid quartet-based inference directly from single nucleotide polymorphism (SNP) data under the multispecies coalescent model, bypassing the need for explicit gene tree estimation. Developed in 2014, SVDquartets uses singular value decomposition of site-pattern frequency matrices to score the three possible topologies for each quartet, then assembles the inferred quartets into a species tree with a quartet amalgamation method; it is particularly efficient for high-throughput SNP datasets where ILS is prevalent. PhyloNet provides a comprehensive framework for Bayesian inference of species networks under the multispecies coalescent, accommodating both ILS and hybridization events through methods like MCMC_SEQ, which jointly estimates gene trees and network topologies from sequence alignments.

Computational pipelines often integrate these tools with next-generation sequencing (NGS) workflows for ILS analysis. For instance, BUCKy performs Bayesian concordance analysis by clustering compatible gene trees to reconcile them with a species tree, handling ILS via posterior probabilities of quartet topologies derived from multi-locus data. Similarly, *BEAST implements full Bayesian co-estimation of gene trees and species trees under the multispecies coalescent using MCMC sampling on aligned sequences, allowing ILS to be incorporated into phylogenetic inference alongside divergence time estimation. A more recent advance is TRAILS (2024), a hidden Markov model-based tool for reconstructing ancestry in three-taxon scenarios by leveraging ILS-induced genealogical fragments along genomes, inferring time-resolved demographic parameters such as effective population sizes from phased data. Another recent tool, Phytop (2025), facilitates visualization and recognition of ILS signals by analyzing gene tree topology patterns, aiding in the detection of the extent of ILS among lineages.

Best practices for using these tools emphasize robust handling of missing data, such as through gene tree contraction in ASTRAL-III to mitigate biases from incomplete loci, and bootstrap resampling to assess support for ILS-affected branches in species trees. Validation against coalescent simulations, generated via tools like msABC or SLiM, is recommended to evaluate pipeline performance under varying ILS levels, ensuring that inferences align with expected gene tree discordance patterns like those detectable via ABBA-BABA statistics.
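The quartet-frequency idea behind summary methods can be illustrated for a single set of four taxa. Under the MSC, the most frequent unrooted quartet topology among gene trees is expected to match the species tree, so tallying topologies and taking the majority gives a simple estimate. This is only a sketch of that principle, not ASTRAL's algorithm or interface: the `"AB|CD"` string encoding, the function names, and the example counts are all assumptions made for illustration.

```python
from collections import Counter

def canonical_quartet(quartet):
    """Normalize an unrooted quartet such as 'DC|BA' to the form 'AB|CD'.

    Each side of the '|' holds the two single-letter taxa that are sisters.
    """
    left, right = quartet.split("|")
    sides = sorted("".join(sorted(side)) for side in (left, right))
    return "|".join(sides)

def dominant_quartet(gene_tree_quartets):
    """Return the most frequent quartet topology and its frequency."""
    counts = Counter(canonical_quartet(q) for q in gene_tree_quartets)
    topology, n = counts.most_common(1)[0]
    return topology, n / sum(counts.values())

if __name__ == "__main__":
    # Hypothetical gene tree topologies for taxa A, B, C, D.
    gene_trees = ["AB|CD"] * 5 + ["DC|BA"] * 2 + ["AC|BD"] * 3 + ["AD|BC"] * 2
    print(dominant_quartet(gene_trees))
```

In this example the topology `AB|CD` is supported by 7 of 12 gene trees and would be taken as the species quartet. Real tools apply the same scoring across all quartets of a larger taxon set and search for the tree that agrees with the most of them.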

Biological Implications

Effects on Phylogenetics

Incomplete lineage sorting (ILS) generates substantial gene tree discordance, where the topologies of individual gene trees deviate from the species tree due to the random retention of ancestral polymorphisms across speciation events. In rapid radiations, only approximately 30-60% of loci may produce gene trees that match the species tree topology, resulting in widespread phylogenetic incongruence. This discordance often manifests as conflicting signals in concatenation-based phylogenetic analyses, where loci supporting alternative topologies are combined into a supermatrix, leading to misleading support for incorrect branches and reduced resolution of evolutionary relationships.

A particularly challenging consequence of ILS is the existence of anomaly zones in the parameter space of species trees, where the most probable gene tree topology differs from the species tree. These zones arise when internal branches are short (typically less than approximately 0.156 coalescent units for the four-taxon case), causing the probability of a discordant gene tree to exceed that of the matching topology under the multispecies coalescent model. Without coalescent-aware models, standard phylogenetic methods, such as maximum likelihood on concatenated data, are prone to inferring incorrect species trees in these regions, as the aggregate signal favors anomalous topologies.

ILS also introduces biases in distance-based phylogenetic methods, primarily through biased divergence time estimates. Because gene lineages coalesce in the ancestral population, earlier than the speciation event itself, the sequence divergence between species reflects gene coalescence times rather than the species split. Uncorrected genetic distances therefore tend to overestimate species divergence times, and the excess varies from locus to locus. This effect is exacerbated in datasets with high ILS, where estimated branch lengths fail to reflect the true temporal separation of lineages.

To mitigate these effects, summary coalescent methods such as ASTRAL, which integrate multiple gene trees while accounting for ILS under the multispecies coalescent model, generally outperform supermatrix approaches in datasets dominated by ILS. Simulations demonstrate that these methods recover the correct species tree with higher accuracy, particularly in scenarios with short internal branches and high discordance, as evidenced in mammalian phylogenomic analyses. Concordance factors, which quantify the proportion of gene trees supporting each branch, can aid in assessing the extent of ILS-induced discordance.

Role in Speciation

Incomplete lineage sorting (ILS) plays a pivotal role in speciation by allowing ancestral polymorphisms to persist across species boundaries, thereby maintaining shared genetic variation that can delay the establishment of complete reproductive isolation. When populations diverge rapidly, not all ancestral alleles coalesce before subsequent speciation events, leaving shared genetic variants among descendant lineages. This retention of polymorphisms can hinder the fixation of species-specific alleles necessary for reproductive barriers, potentially prolonging shared ancestry signals even after initial divergence.

ILS is particularly prevalent in allopatric speciation scenarios involving rapid radiations, where short internodes in the species tree limit coalescence time, resulting in high levels of ancestral polymorphism retention. In contrast, sympatric speciation tends to involve ongoing gene flow that can interact with or override ILS effects, as divergent selection acts amid continuous contact. Rapid allopatric events, such as those in adaptive radiations, thus exhibit elevated ILS compared to slower or sympatric divergences, preserving variation that may facilitate adaptation to new environments.

ILS can mimic signals of hybridization by generating genomic patterns of allele sharing that resemble introgression, thereby complicating estimates of divergence times and species boundaries. It can also conserve beneficial ancestral variation, allowing descendant species to draw from a shared pool of adaptive variants without requiring de novo mutations. For instance, hemiplasy, in which an ancestral polymorphism sorts differently across lineages, can produce trait distributions that conflict with the species tree and resemble convergent evolution, even though the underlying change occurred only once on a discordant gene tree.

Quantitatively, ILS accounts for substantial genomic heterogeneity in young species pairs, often contributing to 30-60% of observed discordance, which influences hybrid viability by perpetuating incompatible allelic combinations. In rapidly diverging lineages, such as those in marsupials, ILS affects over 50% of the genome, with reduced levels in regions under selection, highlighting its impact on speciation dynamics.
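The link between short internodes and ILS can be made concrete with a standard result from coalescent theory for three species: if the internal branch lasts T coalescent units (generations divided by 2Ne for a diploid population), the probability that a gene tree disagrees with the species tree is (2/3)·exp(-T). The Python sketch below evaluates that expression. The function name and all numeric inputs are illustrative assumptions chosen for this example, not estimates from the studies discussed here.

```python
import math


def ils_discordance_probability(internode_years, ne, generation_time_years):
    """Probability that a three-taxon gene tree conflicts with the species tree.

    Uses the multispecies coalescent result P = (2/3) * exp(-T), where
    T = generations / (2 * Ne) is the internal branch length in coalescent
    units for a diploid population.
    """
    if ne <= 0 or generation_time_years <= 0:
        raise ValueError("ne and generation_time_years must be positive")
    generations = internode_years / generation_time_years
    t_coalescent = generations / (2.0 * ne)
    return (2.0 / 3.0) * math.exp(-t_coalescent)


# Hypothetical inputs: same ancestral population, two internode lengths.
p_short = ils_discordance_probability(500_000, ne=100_000, generation_time_years=5)
p_long = ils_discordance_probability(5_000_000, ne=100_000, generation_time_years=5)
print(f"short internode: {p_short:.3f}, long internode: {p_long:.4f}")
```

With these assumed inputs the short internode corresponds to T = 0.5 and gives a discordance probability of roughly 0.40, while the tenfold longer internode (T = 5) gives a value below 0.01. Larger Ne or longer generation times reduce T for the same span of years and therefore raise the expected discordance.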

Applications in Evolution

In Primate and Human Evolution

Incomplete lineage sorting (ILS) has profoundly influenced the evolutionary history of primates, particularly in the great apes, where short divergence times and large ancestral effective population sizes (Ne) have led to substantial retention of ancestral polymorphisms. In the trio of human, chimpanzee, and gorilla lineages, approximately 30% of the genome deviates from the expected species tree topology ((human, chimpanzee), gorilla), reflecting high levels of ILS driven by an ancestral Ne of around 50,000 individuals and divergence times of roughly 6-8 million years ago (Mya) for the human-chimpanzee split and the earlier gorilla divergence. This ILS appears in ABBA-BABA test patterns, where gene trees show discordant topologies; notably, the X chromosome exhibits accelerated sorting compared to autosomes due to selective sweeps, reducing ILS to lower levels and highlighting sex-specific evolutionary dynamics in early hominins. Seminal work by Mailund et al. (2012) modeled these processes using isolation-with-migration frameworks on great ape genomes, revealing that accounting for ILS adjusts divergence estimates, often placing the human-chimpanzee split at 5.5-6.3 Mya rather than at the dates assumed without ILS correction.

In human evolution, ILS complicates inferences of admixture with archaic hominins, particularly in the Neanderthal-Denisovan-modern human trio, where ancestral polymorphisms from a large Ne (~10,000-20,000) persist despite divergences around 500,000-800,000 years ago. These patterns can mimic introgression signals, as shared archaic alleles may arise from ILS rather than interbreeding. Recent studies estimate that non-African populations carry ~1.5-2% Neanderthal-derived DNA, with ILS confounding these signals by inflating apparent admixture in low-divergence regions; similar effects apply to the few percent of Denisovan ancestry found in Oceanians, where ILS in the shared ancestral population contributes to overlapping signals. ILS therefore not only obscures admixture quantification but also bears on the timing of population divergence, suggesting a more reticulated evolutionary history.

Beyond great apes, ILS is rampant in other primate radiations, such as New World monkeys (Platyrrhini), where rapid diversification ~25-40 Mya amid large Ne led to genome-wide discordance, with substantial numbers of loci showing alternative topologies in platyrrhine intergeneric relationships. In strepsirrhines, including lemurs, ancient rapid radiations ~60-70 Mya created high levels of ILS, affecting reconstruction of the basal primate phylogeny. These patterns, quantified in comprehensive phylogenomic analyses, highlight ILS as a driver of phylogenetic uncertainty in early primate evolution, with implications for understanding adaptive radiations in diverse habitats.
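The ABBA-BABA test mentioned above rests on a simple count. For three ingroup lineages with species tree ((P1, P2), P3) and an outgroup that defines the ancestral allele A, ILS alone is expected to produce the two discordant site patterns ABBA and BABA in equal numbers, whereas gene flow between P3 and one of the other lineages skews them. The Python sketch below computes Patterson's D = (ABBA - BABA) / (ABBA + BABA) from one aligned sequence per lineage. The function name and the toy sequences are assumptions made for illustration; real analyses work with genome-wide allele frequencies and assess significance, for example with a block jackknife, which this sketch omits.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for four aligned sequences of equal length."""
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    n_abba = n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        site = (a, b, c, o)
        # Keep only unambiguous, biallelic sites.
        if any(base not in "ACGT" for base in site) or len(set(site)) != 2:
            continue
        if a == o and b == c:    # P1 ancestral; P2 and P3 share the derived allele
            n_abba += 1
        elif b == o and a == c:  # P2 ancestral; P1 and P3 share the derived allele
            n_baba += 1
    if n_abba + n_baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (n_abba - n_baba) / (n_abba + n_baba), n_abba, n_baba


# Toy alignment (hypothetical), one sequence per lineage.
d_stat, n_abba, n_baba = patterson_d(
    p1="AAGTCAG",
    p2="AGGTTAA",
    p3="AGATTCG",
    outgroup="AAATCAA",
)
print(f"ABBA={n_abba}, BABA={n_baba}, D={d_stat:.3f}")
```

A D value near zero is consistent with ILS alone, while a significant departure from zero points to an excess of allele sharing that ILS does not predict.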

In Viral and Microbial Evolution

In viral evolution, incomplete lineage sorting (ILS) manifests differently from eukaryotic systems because of high mutation rates, short generation times, and frequent recombination or reassortment, which can produce ILS-like patterns of gene tree discordance. In viruses such as HIV, within-host diversity leads to ILS during transmission events, resulting in phylogenetic trees that do not accurately reflect transmission order because ancestral polymorphisms persist across host boundaries. This is amplified by HIV's error-prone reverse transcriptase, which generates substantial intrahost variation, mimicking incomplete sorting in phylogenetic reconstructions.

Similar patterns occur in segmented RNA viruses like influenza A, where rapid evolution during pandemics causes gene tree discordance across segments, potentially attributable to ILS alongside reassortment. For instance, analyses of gene phylogenies in the 2009 H1N1 pandemic strain reveal incongruent topologies explained in part by incomplete lineage sorting and missing ancestral sequences in sampled data. In influenza surveillance, such discordance complicates tracking of viral lineages, with studies reporting substantial topological mismatches that challenge species tree inference.

In microbial evolution, particularly in bacteria, ILS is invoked less often than in eukaryotes because lineages are generally expected to coalesce rapidly, but it can arise in fragmented populations. In Escherichia coli strains, phylogenetic incongruence among gene trees is often driven by horizontal gene transfer (HGT) rather than ILS, with HGT introducing mosaic genomes that mimic sorting discordance across loci. For example, in E. coli, reticulate evolution via HGT produces multiple topologies consistent with gene exchange, rejecting ILS as the primary cause while highlighting how transfer events obscure vertical inheritance patterns.

Recent studies on SARS-CoV-2 illustrate these dynamics in coronaviruses, where spike gene phylogenies show discordance potentially due to ILS, though recombination predominates. Analyses of sarbecovirus lineages, including SARS-CoV-2 variants, reveal frequent recombination events that generate breakpoint-free genomic regions with conflicting phylogenies, distinguishable from ILS using coalescent models. In research from the 2020s, gene trees from early pandemic samples exhibit topological mismatches, with coalescent-based approaches estimating low ILS contributions compared to recombination in preserving variant diversity.

These processes pose challenges in viral and microbial phylogenetics, because high mutation rates and diverse within-host populations can carry ancestral polymorphisms across transmission events, producing incomplete sorting that is hard to disentangle from recombination or HGT without explicit modeling. Coalescent frameworks help differentiate ILS by simulating the discordance expected under neutral coalescence and comparing it with reticulate signals. The implications extend to vaccine development: ILS-like patterns can maintain genetic diversity in viral surface proteins, complicating vaccine design by allowing escape variants to persist across lineages. A study by Boni et al. (2020) underscores this by quantifying recombination hotspots in SARS-CoV-2 ancestors, emphasizing the need to account for both ILS and recombination to predict evolutionary trajectories.
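The idea of simulating the discordance expected under neutral coalescence can be sketched for the simplest case of three lineages with tree ((B, C), A). In the Python example below, the B and C lineages coalesce within the internal branch with an exponentially distributed waiting time; if they fail to do so, all three lineages enter the root population and any pair is equally likely to coalesce first. The function name, the internode length of one coalescent unit, and the number of loci are assumptions chosen for illustration, and the model ignores recombination, selection, and population structure.

```python
import math
import random
from collections import Counter


def simulate_gene_tree_topologies(internode_coal_units, n_loci, seed=0):
    """Count which pair of lineages coalesces first at each simulated locus.

    The key 'BC' matches the species tree ((B, C), A); 'AB' and 'AC' are the
    two discordant topologies produced by incomplete lineage sorting.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_loci):
        # Two lineages coalesce at rate 1 per coalescent unit.
        if rng.expovariate(1.0) < internode_coal_units:
            counts["BC"] += 1
        else:
            # Deep coalescence: the three pairs are equally likely.
            counts[rng.choice(("BC", "AB", "AC"))] += 1
    return counts


n_loci = 200_000
counts = simulate_gene_tree_topologies(1.0, n_loci, seed=1)
simulated = (counts["AB"] + counts["AC"]) / n_loci
expected = (2.0 / 3.0) * math.exp(-1.0)
print(f"simulated discordance {simulated:.3f}, analytic {expected:.3f}")
print(f"AB: {counts['AB']}, AC: {counts['AC']}")
```

Under this neutral model the two discordant topologies appear at nearly equal frequency. A marked asymmetry between them in real data is therefore one of the cues used to suspect recombination, HGT, or introgression rather than ILS alone.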

In Other Taxa

Incomplete lineage sorting (ILS) has been extensively documented in avian evolution, particularly within the rapid radiation of neoavian birds following the Cretaceous-Paleogene extinction. Genome-wide analyses of modern birds reveal high levels of gene tree discordance attributable to ILS, with estimated probabilities of 30-50% or more along deep branches in the neoavian tree. In passerine birds, a diverse order comprising over half of all bird species, ILS contributes to substantial phylogenetic incongruence, especially in rapid radiations such as those within the tanager family (Thraupidae), where ancestral polymorphisms persist and obscure species relationships. These patterns highlight how short internodes and large ancestral effective population sizes during avian diversification promote widespread ILS, complicating resolution of the avian tree of life.

In plants, ILS plays a prominent role in the phylogenomics of eudicots, the largest clade of flowering plants, where it interacts with other evolutionary processes like hybridization during adaptive bursts. A recent phylogenomic study of early-diverging eudicot lineages demonstrates pervasive ILS and hybridization, leading to gene tree discordance that challenges traditional species tree inferences. High levels of ILS are particularly evident in angiosperm radiations, such as those in the cotton genus (Gossypium), where ancestral polymorphisms combine with polyploid events to shape genomic diversity and adaptive evolution. This interplay underscores ILS as a key mechanism facilitating rapid diversification in plant lineages with complex histories.

Among insects, ILS is well characterized in Drosophila speciation, where large effective population sizes allow ancestral polymorphisms to persist for approximately 1-2 million years, generating extensive mismatches between gene trees and species trees. In the pseudoobscura group, for instance, incomplete sorting of ancestral variation explains much of the observed phylogenetic discordance, influencing patterns of shared variation across species. Furthermore, ILS contributes to the evolution of mimicry complexes in insects such as butterflies, where retained ancestral alleles facilitate the convergence of wing patterns in mimicry rings, promoting adaptive trait sharing without recent gene flow.

Comparatively, the prevalence of ILS varies across taxa with effective population size (Ne) and other demographic factors, with higher rates in groups exhibiting large Ne, such as insects, compared to those experiencing population bottlenecks, like mammals following radiations. In butterflies (Papilionoidea), extensive ILS drives phylogenetic discordance during rapid alpine radiations, reflecting prolonged coalescence times in large ancestral populations. This contrast illustrates how demographic factors modulate ILS, with expansive Ne enhancing its impact in invertebrates relative to bottlenecked vertebrate lineages.

Analogies in Other Fields

In Linguistics

Incomplete lineage sorting (ILS) in linguistics refers to the metaphorical persistence of ancestral linguistic polymorphisms, such as variant forms in phonology, lexicon, or morphology, across the divergence of daughter languages, leading to discordant patterns in family trees that mimic non-tree-like evolution. This analogy draws from biological ILS, where ancestral genetic variants fail to coalesce before speciation, but in languages it arises from sociolinguistic variation in proto-languages that resolves unevenly during cultural transmission. For instance, in Indo-European branches, competing variants like near-synonyms in Germanic (knabō and knappaz for 'boy') or pronouns (izwiz/iwiz for second person plural) can be distributed across descendant languages in ways that suggest irregular inheritance rather than strict bifurcation.

Examples of ILS-like patterns appear in the retention of archaic Latin features across the Romance languages, where variant forms inherited from Latin, such as phonological shifts or lexical doublets, persist unevenly, producing shared traits between non-sister branches like Italian and Romanian despite their divergence. Similarly, rapid radiations in the Austronesian family exhibit discordant lexical and grammatical signals, where ancestral polymorphisms in Proto-Austronesian variants lead to overlapping isoglosses among distant subgroups, complicating tree reconstruction.

Linguists model these processes using coalescent-inspired phylogenetic methods adapted from biology and applied to lexical datasets, in order to distinguish ILS from borrowing (analogous to horizontal transfer). These approaches quantify how ancestral variants "sort" into dialects or languages without invoking contact. However, the analogy remains metaphorical, as languages evolve through cultural and social mechanisms rather than genetic replication, which limits direct application of biological models; recent work, such as 2019 studies on phonetic and lexical variation, has begun quantifying ILS in dialect continua to address these gaps.
