Long branch attraction
from Wikipedia

In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related.[1] LBA arises when the amount of molecular or morphological change accumulated within a lineage is sufficient to cause that lineage to appear similar (thus closely related) to another long-branched lineage, solely because they have both undergone a large amount of change, rather than because they are related by descent. Such bias is more common when the overall divergence of some taxa results in long branches within a phylogeny. Long branches are often attracted to the base of a phylogenetic tree, because the lineage included to represent an outgroup is often also long-branched. The frequency of true LBA is unclear and often debated,[1][2][3] and some authors view it as untestable and therefore irrelevant to empirical phylogenetic inference.[4] Although often viewed as a failing of parsimony-based methodology, LBA could in principle result from a variety of scenarios and be inferred under multiple analytical paradigms.

Causes

LBA was first recognized as problematic when analyzing discrete morphological character sets under parsimony criteria; however, maximum likelihood analyses of DNA or protein sequences are also susceptible. A simple hypothetical example can be found in Felsenstein 1978, where it is demonstrated that for certain unknown "true" trees, some methods can show bias for grouping long branches, ultimately resulting in the inference of a false sister relationship.[5] Often this is because convergent evolution of one or more characters included in the analysis has occurred in multiple taxa. Although they were derived independently, these shared traits can be misinterpreted in the analysis as being shared due to common ancestry.

In phylogenetic and clustering analyses, LBA is a result of the way clustering algorithms work: terminals or taxa with many autapomorphies (character states unique to a single branch) may by chance exhibit the same states as those on another branch (homoplasy). A phylogenetic analysis will group these taxa together as a clade unless other synapomorphies outweigh the homoplastic features to group together true sister taxa.

These problems may be minimized by using methods that correct for multiple substitutions at the same site, by adding taxa related to those with the long branches that add additional true synapomorphies to the data, or by using alternative slower evolving traits (e.g. more conservative gene regions).

Results

The result of LBA in evolutionary analyses is that rapidly evolving lineages may be inferred to be sister taxa, regardless of their true relationships. For example, in DNA sequence-based analyses, the problem arises when sequences from two (or more) lineages evolve rapidly. There are only four possible nucleotides and when DNA substitution rates are high, the probability that two lineages will evolve the same nucleotide at the same site increases. When this happens, a phylogenetic analysis may erroneously interpret this homoplasy as a synapomorphy (i.e., evolving once in the common ancestor of the two lineages).
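The effect of a four-letter alphabet can be made concrete with a small calculation. The sketch below assumes the Jukes-Cantor model (equal rates among all nucleotides) and two lineages of equal length t that start from the same ancestral nucleotide; the function names are illustrative and not taken from any library.

```python
import math

def jc_prob_specific_change(t: float) -> float:
    """Jukes-Cantor probability that a site ends in one particular nucleotide
    different from its starting state after a branch of length t
    (expected substitutions per site)."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)

def prob_convergent_match(t: float) -> float:
    """Probability that two lineages, each of length t from the same ancestral
    state, independently end in the same non-ancestral nucleotide."""
    q = jc_prob_specific_change(t)
    return 3.0 * q * q  # three possible non-ancestral target states

for t in (0.05, 0.5, 2.0):
    print(f"t = {t:4.2f}  P(convergent match) = {prob_convergent_match(t):.4f}")
```

Under these assumptions the probability of a chance match rises with branch length and levels off at 3/16, which is why very long branches can share many states that look like synapomorphies.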

The opposite effect may also be observed, in that if two (or more) branches exhibit particularly slow evolution among a wider, fast evolving group, those branches may be misinterpreted as closely related. As such, "long branch attraction" can in some ways be better expressed as "branch length attraction". However, it is typically long branches that exhibit attraction.

The recognition of long-branch attraction implies that there is some other evidence that suggests that the phylogeny is incorrect. For example, two different sources of data (e.g. molecular and morphological) or even different methods or partition schemes might support different placements for the long-branched groups.[6] Hennig's Auxiliary Principle suggests that synapomorphies should be viewed as de facto evidence of grouping unless there is specific contrary evidence (Hennig, 1966; Schuh and Brower, 2009).

A simple and effective method for determining whether or not long branch attraction is affecting tree topology is the SAW method, named for Siddall and Whiting. If long branch attraction is suspected between a pair of taxa (A and B), simply remove taxon A ("saw" off the branch) and re-run the analysis. Then remove B and replace A, running the analysis again. If either of the taxa appears at a different branch point in the absence of the other, there is evidence of long branch attraction. Since long branches can't possibly attract one another when only one is in the analysis, consistent taxon placement between treatments would indicate long branch attraction is not a problem.[7]
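The comparison step of the SAW procedure can be sketched in code. The example below is only an illustration: trees are represented as sets of unrooted bipartitions, `infer` is a hypothetical callable standing in for whatever tree search is actually used, and `make_toy_infer` is an invented stand-in that mimics an analysis with or without attraction between A and B.

```python
from typing import Callable, FrozenSet, Iterable, Set

Split = FrozenSet[FrozenSet[str]]  # an unrooted bipartition {side1, side2}
Tree = Set[Split]                  # a tree as its set of non-trivial splits

def make_split(side: Iterable[str], taxa: Iterable[str]) -> Split:
    side = frozenset(side)
    return frozenset({side, frozenset(taxa) - side})

def prune(tree: Tree, taxon: str) -> Tree:
    """Remove one taxon from every split; drop splits that become trivial."""
    pruned = set()
    for split in tree:
        sides = [s - {taxon} for s in split]
        if all(len(s) >= 2 for s in sides):
            pruned.add(frozenset(sides))
    return pruned

def saw_test(infer: Callable[[FrozenSet[str]], Tree],
             taxa: FrozenSet[str], a: str, b: str) -> bool:
    """Return True if removing a or b changes the placement of the rest."""
    full = infer(taxa)
    for removed in (a, b):
        reduced = infer(taxa - {removed})
        if reduced != prune(full, removed):
            return True   # placement shifted: evidence of attraction
    return False

def make_toy_infer(attract: bool) -> Callable[[FrozenSet[str]], Tree]:
    """Hypothetical stand-in for a tree search (not a real inference method)."""
    def infer(taxa: FrozenSet[str]) -> Tree:
        if attract and {"A", "B"} <= taxa:
            groups = [{"A", "B"}, {"C", "D"}, {"E", "F"}]
        else:
            groups = [{"A", "C"}, {"B", "D"}, {"E", "F"}]
        return {make_split(g & taxa, taxa) for g in groups
                if len(g & taxa) >= 2 and len(taxa - g) >= 2}
    return infer

TAXA = frozenset("ABCDEF")
print(saw_test(make_toy_infer(True), TAXA, "A", "B"))   # attraction present
print(saw_test(make_toy_infer(False), TAXA, "A", "B"))  # stable placement
```

Comparing the reduced analysis against the full tree with the removed taxon pruned out avoids flagging a difference that is due only to the missing taxon itself.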

Example

An example of long branch attraction. On this "true tree", branches leading to A and C might be expected to have a higher number of character state transformations than the internal branch or branches leading to B and D. 

Assume for simplicity that we are considering a single binary character (it can either be + or –) distributed on the unrooted "true tree" with branch lengths proportional to amount of character state change, shown in the figure. Because the evolutionary distance from B to D is small, we assume that in the vast majority of all cases, B and D will exhibit the same character state. Here, we will assume that they are both + (+ and – are assigned arbitrarily and swapping them is only a matter of definition). If this is the case, there are four remaining possibilities. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be –, in which case only one character is different, and we cannot learn anything, as all trees have the same length. Similarly, A can be – and C can be +. The only remaining possibility is that A and C are both –. In this case, however, we view either A and C, or B and D, as a group with respect to the other (one character state is ancestral, the other is derived, and the ancestral state does not define a group). As a consequence, when we have a "true tree" of this type, the more data we collect (i.e. the more characters we study), the more of them are homoplastic and support the wrong tree.[8] Of course, when dealing with empirical data in phylogenetic studies of actual organisms, we never know the topology of the true tree, and the more parsimonious (AC) or (BD) might well be the correct hypothesis.
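The four cases above can be checked by counting parsimony steps directly. The following sketch applies Fitch's counting rule to a single binary character on each of the three possible unrooted four-taxon trees, with B and D fixed to + as in the text; the function and variable names are illustrative.

```python
from itertools import product

# The three possible unrooted topologies for four taxa, written as two pairs.
TOPOLOGIES = {
    "(AB)(CD)": (("A", "B"), ("C", "D")),
    "(AC)(BD)": (("A", "C"), ("B", "D")),
    "(AD)(BC)": (("A", "D"), ("B", "C")),
}

def fitch_length(states, topology):
    """Minimum number of state changes for one character on a quartet."""
    changes = 0
    pair_sets = []
    for x, y in topology:
        sx, sy = {states[x]}, {states[y]}
        if sx & sy:
            pair_sets.append(sx & sy)
        else:
            pair_sets.append(sx | sy)
            changes += 1          # one change needed within this pair
    if not (pair_sets[0] & pair_sets[1]):
        changes += 1              # one change needed on the internal branch
    return changes

for a, c in product("+-", repeat=2):
    states = {"A": a, "B": "+", "C": c, "D": "+"}
    lengths = {name: fitch_length(states, topo) for name, topo in TOPOLOGIES.items()}
    print(f"A={a} C={c}: {lengths}")
```

Only the case where A and C are both – gives different lengths across the trees, and it favors the (AC)(BD) grouping with one step against two for the alternatives.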

Long branch repulsion

While likelihood-based estimates are relatively more resistant to long branch attraction, they may fail in the opposite way: when two closely related taxa have long branches, they may be incorrectly separated. This is long branch repulsion (LBR).[9]

Avoidance

Non-parsimony methods such as Bayesian inference and maximum likelihood tend to reduce the occurrence of LBA, but do not eliminate it fully.[10] (Of the two, Bayesian inference is more prone to LBA.)[11] Specifically, they still struggle with cases of compositional heterogeneity among taxa and sites, which invalidate the assumptions of basic substitution models. This can be avoided by using a mixture model or PMSF, which take these possibilities into account. Amino-acid recoding and data filtering with compositional tests can also help.[12]

Excluding problematic portions of the data such as fast-evolving sites can help. Exclusion of certain taxa from analysis, either the long-branching ones themselves, or some regular taxa, also occasionally helps, though adding taxa tends to help in more cases. Adding data from taxa related to the long-branching taxon can break up the branch into smaller, more manageable pieces. Many more methods are useful in detecting LBA. Worked examples of detection and avoidance can be found in Bergsten (2005).[13]
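Site exclusion can be sketched as follows. This example uses the number of distinct states in an alignment column as a crude proxy for evolutionary rate, which is an assumption made for brevity; published analyses usually rank sites with model-based rate estimates instead. The function names are illustrative.

```python
def site_variability(alignment):
    """Number of distinct non-gap states in each alignment column."""
    seqs = list(alignment.values())
    return [len({s[i] for s in seqs} - {"-", "?"}) for i in range(len(seqs[0]))]

def drop_most_variable_sites(alignment, fraction):
    """Remove the given fraction of columns, most variable first."""
    scores = site_variability(alignment)
    n_drop = int(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    keep = [i for i in range(len(scores)) if i not in dropped]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

toy = {"t1": "AACGT", "t2": "AACCT", "t3": "AATAT", "t4": "AAGTT"}
print(site_variability(toy))
print(drop_most_variable_sites(toy, 0.4))
```

Re-running the tree search on progressively filtered alignments and watching whether the long-branched taxa move is one way to use such a filter as a diagnostic.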

Evaluation of methods

The resistance of a method to LBA and LBR is empirically tested using challenging real or simulated data. With real data one is not totally sure of the ground truth, but they are guaranteed to be naturalistic. With simulated data one can specify a "true" shape of the tree and a model of evolution (hopefully one that resembles natural evolution). Some real data known to be challenging include:[10]

  • Leebens-Mack et al. 2005 angiosperm data set, where protein and nucleotide data produced different results in their analysis
  • Brinkmann et al. 2005 dataset containing slow-evolving eukaryotes, archaea, and one fast-evolving microsporidian
  • The "nematode" and "platyhelminth" datasets in Lartillot et al. 2007
  • The Brown et al. 2013 dataset, which may or may not recover an "Obazoa".

On the simulation side, a classic program is Seq-gen.[14] The page for Pro-cov lists a number of later variants of Seq-gen to represent different kinds of heterotachy.[15]
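The idea behind such simulations can be shown with a toy example. The sketch below is not Seq-Gen and does not reproduce its interface; it simulates independent sites under the Jukes-Cantor model on a four-taxon tree ((A,B),(C,D)) with long branches leading to A and C, then counts the parsimony-informative site patterns that support each grouping. The branch lengths, seed, and function names are arbitrary choices for illustration.

```python
import math
import random

NUCS = "ACGT"

def evolve(state, t, rng):
    """Evolve one nucleotide along a branch of length t under Jukes-Cantor."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        return rng.choice([n for n in NUCS if n != state])
    return state

def simulate_quartet(n_sites, long_t, short_t, seed=0):
    """Count informative patterns on the true tree ((A,B),(C,D))."""
    rng = random.Random(seed)
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        u = rng.choice(NUCS)          # internal node joining A and B
        v = evolve(u, short_t, rng)   # internal node joining C and D
        a, b = evolve(u, long_t, rng), evolve(u, short_t, rng)
        c, d = evolve(v, long_t, rng), evolve(v, short_t, rng)
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1      # supports the true tree
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1      # groups the two long branches
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return counts

print(simulate_quartet(5000, long_t=1.5, short_t=0.05, seed=1))
print(simulate_quartet(5000, long_t=0.1, short_t=0.1, seed=1))
```

With very unequal branch lengths the patterns uniting A and C dominate even though the data were generated on the (AB)(CD) tree, whereas with equal short branches the true grouping is favored.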

from Grokipedia
Long branch attraction (LBA) is a systematic error in phylogenetic methods, particularly parsimony and certain distance-based approaches, wherein distantly related taxa on long branches (indicative of accelerated evolutionary rates) are erroneously inferred to share recent common ancestry because convergent homoplasies overwhelm true synapomorphies. The artifact makes these methods statistically inconsistent: under specific conditions, such as unequal branch lengths and the limited number of character states typical of molecular sequences, the probability of recovering the incorrect tree increases as more data are added. The phenomenon was demonstrated theoretically by Joseph Felsenstein in 1978 using a four-taxon model; the region of parameter space in which two long peripheral branches separated by a short internal branch cause parsimony to favor a wrong resolution over the true topology is often termed the "Felsenstein zone." Subsequent studies extended the result to compatibility methods, maximum likelihood under simplistic models, and even scenarios with equal evolutionary rates but unbalanced taxon sampling, highlighting LBA's prevalence in molecular phylogenetics, where the four-state nucleotide alphabet makes substitution saturation likely. LBA is exacerbated by sparse taxon sampling, distant outgroups that form long branches, and ancient rapid radiations, as seen in analyses in which fast-evolving lineages such as Nematoda are misplaced. To mitigate LBA, researchers increase taxon sampling density to subdivide long branches, use model-based methods such as Bayesian inference with site-heterogeneous models (e.g., CAT-GTR), remove or recode fast-evolving sites, and apply long-branch extraction techniques that test for artifactual attraction by pruning suspect taxa. These approaches have improved resolution in challenging datasets, including mitochondrial genome datasets, by enhancing topological stability. Despite these advances, LBA remains a critical consideration in phylogenomics, underscoring the need for robust models and comprehensive taxon sampling.

Overview

Definition and Basic Principles

Long branch attraction (LBA) is a systematic error in phylogenetic reconstruction that affects both distance-based and parsimony-based methods, causing unrelated taxa with long branches (reflecting accelerated evolutionary rates) to be artifactually grouped as closely related. This erroneous clustering occurs because these methods underestimate the true evolutionary distances between such taxa or misinterpret shared homoplasies (convergent or parallel changes) as evidence of common ancestry.

To understand LBA, it is essential to grasp some foundational concepts in phylogenetics. Phylogenetic trees are branching diagrams that illustrate hypothesized evolutionary relationships among taxa, with nodes representing ancestral lineages and tips denoting extant or observed taxa. Branch lengths in these trees quantify evolutionary distance, typically measured as the expected number of substitutions per site along a lineage, which accumulates over time due to genetic changes. Substitution models provide a framework for estimating these distances from data; for instance, the Jukes-Cantor model assumes equal probabilities for all substitutions and corrects raw divergence for multiple substitutions at the same site, which are otherwise invisible in observed data.

At its core, LBA emerges from the limitations of parsimony and simple distance methods in handling long branches, where extensive multiple substitutions lead to a loss of phylogenetic signal and an apparent convergence in character states between distantly related lineages. As a result, the methods favor incorrect tree topologies that unite the long-branched taxa. The issue is most acute in the "Felsenstein zone," a region of evolutionary parameter space, characterized by unequal branch lengths and short internal branches, in which these methods are statistically inconsistent and recover the wrong tree with higher probability as more data are added. In certain likelihood-based approaches, the opposite effect, long branch repulsion, may occur under specific modeling conditions, separating long branches more than expected.

Historical Development

The concept of long branch attraction (LBA) was first formally described by Joseph Felsenstein in 1978, in a theoretical analysis demonstrating that parsimony and compatibility methods could be positively misled under certain evolutionary scenarios involving unequal branch lengths. Felsenstein illustrated a four-taxon case in which long branches converged artifactually, highlighting the statistical inconsistency of these methods when evolutionary rates vary significantly across lineages.

During the 1980s and 1990s, the understanding of LBA developed through further theoretical and empirical work in molecular systematics, with recognition of its impact on early sequence-based trees, including the challenges posed by rapidly evolving mitochondrial DNA (mtDNA) sequences, which often produced misleading groupings. A key advance came in 1994, when Mary K. Kuhner and Felsenstein used simulations to compare phylogeny algorithms under equal and unequal evolutionary rates, showing that maximum likelihood methods were more robust to LBA than parsimony because they better account for rate heterogeneity.

In the 2000s, LBA gained broader attention as a pervasive artifact in real datasets, with Johannes Bergsten's comprehensive 2005 review synthesizing detection and avoidance strategies and emphasizing its role in misplacing long-branched taxa relative to outgroups. Post-2010 developments have integrated LBA considerations into phylogenomics, revealing that even advanced methods such as partitioned models and coalescent-based approaches can suffer from LBA biases in large-scale genomic analyses, leading to inconsistencies in species tree estimation when long branches are present, as shown in studies using Bayesian methods.

Mechanisms

Underlying Causes

Long branch attraction (LBA) arises primarily from biological factors that generate uneven evolutionary rates across lineages, resulting in disproportionately long branches on phylogenetic trees. High substitution rates in certain taxa create these extended branches; they are often driven by small effective population sizes, which reduce the efficiency of natural selection and allow more neutral or slightly deleterious mutations to fix. Adaptive radiations, in which lineages diversify rapidly under strong selective pressures, can also accelerate evolutionary change and produce long branches, as seen in some bacterial groups. Additionally, homoplasy (arising from convergent evolution, in which unrelated lineages independently acquire similar traits) and saturation of substitutions (in which multiple hits at the same site obscure true distances) further contribute to misleading similarities between fast-evolving lineages.

Statistically, LBA is exacerbated by inadequacies in phylogenetic models that fail to account for these rate heterogeneities. Uncorrected distance metrics, such as the observed p-distance, underestimate true evolutionary distances between long-branch taxa because of multiple substitutions at the same sites, making the taxa appear closer than they would under model-corrected distances such as those from the Jukes-Cantor formula. Maximum parsimony methods are particularly sensitive to LBA because they preferentially group taxa sharing apparent derived characters that are actually homoplastic, as demonstrated in theoretical four-taxon cases with unequal rates.

Several factors intensify LBA by amplifying branch length disparities. In quartet topologies with highly unequal branch lengths (two long branches separated by a short internal branch), reconstruction methods inconsistently recover the correct topology, a condition known as the Felsenstein zone. Incomplete sampling, by omitting intermediate taxa, artificially elongates branches through extinction or undersampling, thereby heightening the attraction between distant fast-evolving lineages.
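The gap between the observed p-distance and the Jukes-Cantor corrected distance can be illustrated numerically. The sketch below uses the standard Jukes-Cantor relations between the true distance d (expected substitutions per site) and the expected proportion of differing sites p; the function names are illustrative.

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, ignoring non-ACGT positions."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)

def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance at or beyond saturation (0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def expected_p(d: float) -> float:
    """Expected p-distance for a true Jukes-Cantor distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

for d in (0.1, 0.5, 1.0, 2.0):
    p = expected_p(d)
    print(f"true d = {d:3.1f}  expected p = {p:.3f}  JC-corrected = {jc_distance(p):.3f}")
```

The expected p-distance flattens toward 0.75 as d grows, so uncorrected distances compress exactly the long paths that matter most for LBA.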

Mathematical Foundations

The mathematical foundations of long branch attraction (LBA) were established by Joseph Felsenstein in his analysis of phylogenetic reconstruction methods, demonstrating that maximum parsimony can fail to converge to the correct tree as the amount of data increases, a phenomenon known as statistical inconsistency. Felsenstein's core model is a four-taxon unrooted tree (a quartet) with the true topology ((A, B), (C, D)), featuring two long terminal branches to taxa A and C, short terminal branches to B and D, and a short internal branch separating the (A, B) and (C, D) clades. Under unequal evolutionary rates, where the long branches accumulate substitutions at a higher rate than the short ones, parsimony preferentially groups the long-branched taxa A and C together, because homoplastic (convergent) changes on those branches mimic shared derived characters. This setup highlights how branch length asymmetry biases inference toward the incorrect ((A, C), (B, D)) topology.

To formalize the bias under parsimony, consider site pattern probabilities derived from the Jukes-Cantor model, which assumes equal substitution rates among four nucleotides and no site-specific preferences. In this framework, the probability of inferring the incorrect grouping in the quartet (A, B | C, D), where the vertical bar denotes the true split, increases with branch length asymmetry due to the dominance of homoplastic patterns. The full derivation involves computing the expected frequencies of the parsimony-informative site patterns for the quartet. Under the true tree, patterns supporting (A, B | C, D) (e.g., A and B share state 1, C and D share state 2) require a single substitution on the internal branch, with probability roughly proportional to the internal branch length $\epsilon$ (small). In contrast, patterns supporting the wrong tree (e.g., A and C share state 1, B and D share state 2) arise from parallel independent substitutions on the two long branches, each with change probability approximately $\frac{1 - e^{-2\mu t}}{2}$ in a symmetric two-state analog, where $\mu$ is the substitution rate and $t$ the long-branch length, yielding a joint probability on the order of $\left( \frac{1 - e^{-2\mu t}}{2} \right)^2$. When $\left( \frac{1 - e^{-2\mu t}}{2} \right)^2 > \epsilon / 2$, the wrong patterns outnumber the correct ones, and parsimony scores the incorrect tree as shorter; simulations confirm that as the number of sites $n \to \infty$, the probability of selecting the wrong tree tends to 1 if this inequality holds.

The parameter space defining LBA is termed the "Felsenstein zone," where the internal branch length $\epsilon$ is short relative to the long terminal branch length $t$. In this zone, parsimony error rates exceed 50% and can approach 100% with ample data, as homoplasy overwhelms the weak historical signal. This contrasts sharply with trees whose branch lengths are approximately equal, where parsimony-informative sites predominantly support the true split, yielding consistent recovery of the correct topology with probability approaching 1 as $n \to \infty$. Rate heterogeneity across lineages generates the unequal lengths that place trees within this problematic zone.
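The approximate argument above can be checked with an exact calculation for the symmetric two-state case. The sketch below enumerates all states of the two internal nodes and four tips on the true tree ((A,B),(C,D)) and sums the probabilities of the informative patterns. The parameterization by per-branch change probabilities (`p_long`, `q_short`, `q_internal`) and the function names are assumptions made for this illustration rather than notation from the text.

```python
import math
from itertools import product

def change_prob(mu_t: float) -> float:
    """Two-state symmetric model: probability that the end states differ."""
    return (1.0 - math.exp(-2.0 * mu_t)) / 2.0

def branch_prob(parent: int, child: int, change: float) -> float:
    return change if parent != child else 1.0 - change

def pattern_probs(p_long: float, q_short: float, q_internal: float):
    """Exact probabilities of the three informative patterns on ((A,B),(C,D)),
    with long branches to A and C and short branches to B and D."""
    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for u, v, a, b, c, d in product((0, 1), repeat=6):
        pr = (0.5 * branch_prob(u, v, q_internal)
              * branch_prob(u, a, p_long) * branch_prob(u, b, q_short)
              * branch_prob(v, c, p_long) * branch_prob(v, d, q_short))
        if a == b and c == d and a != c:
            probs["AB|CD"] += pr
        elif a == c and b == d and a != b:
            probs["AC|BD"] += pr
        elif a == d and b == c and a != b:
            probs["AD|BC"] += pr
    return probs

print(pattern_probs(0.4, 0.05, 0.05))   # unequal branches
print(pattern_probs(0.1, 0.1, 0.1))     # equal branches
```

With long-branch change probability 0.4 and short-branch probability 0.05, sites uniting A and C are expected more often than sites supporting the true split, so adding sites only strengthens the wrong answer; with all branches at 0.1 the true split dominates.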

Consequences

Observed Effects

One of the primary effects of long branch attraction (LBA) in phylogenetic reconstruction is the artificial clustering of taxa exhibiting long branches, where distantly related lineages with accelerated evolutionary rates are erroneously grouped together because convergent homoplasies are misinterpreted as shared derived characters. This phenomenon particularly affects fast-evolving outgroups, which are often pulled into ingroup clades, thereby distorting the overall tree topology and leading to incorrect inferences of monophyly or sister-group relationships. In clades affected by LBA, the signal from long-branch convergence can overwhelm true phylogenetic signal, resulting in reduced resolution and the formation of polytomies where internal branches collapse due to inconsistent character support across sites.

LBA also has significant inferential consequences for phylogenetic analyses, including biased estimates of divergence times, as incorrect topologies misplace calibration points and alter the relative branch lengths used in molecular clock models. Support values for these erroneous clades are frequently inflated; for instance, methods such as parsimony or maximum likelihood can assign high bootstrap proportions or posterior probabilities to wrong relationships, providing false confidence in the reconstruction as the amount of sequence data increases. Furthermore, in tree-building algorithms employing stepwise addition, such as certain implementations of parsimony or neighbor-joining, LBA errors propagate through the analysis, with initial misgroupings of long branches influencing subsequent placements and compounding inaccuracies in larger datasets.

The broader consequences of LBA extend to the generation of misleading evolutionary hypotheses, where artifactual topologies imply false patterns of ancestry and diversification; for example, it has contributed to erroneous conclusions about relationships within major groups such as mammals, undermining interpretations of adaptive radiations and biogeographic histories. These distortions highlight LBA as a systematic error that not only affects individual tree estimates but also perpetuates inaccuracies in comparative analyses unless explicitly addressed.

Illustrative Examples

One of the earliest and most influential demonstrations of long branch attraction (LBA) comes from Joseph Felsenstein's 1978 study, which examined a four-taxon (quartet) tree with characters evolving under unequal rates across branches. In this setup, each pair of sister taxa contains one slowly evolving and one rapidly evolving lineage, creating long peripheral branches that cause maximum parsimony methods to incorrectly group the long-branch taxa together, inferring the wrong topology with increasing confidence as sequence length increases. In contrast, distance-based methods that properly correct for branch lengths recover the correct topology, highlighting parsimony's inconsistency under rate heterogeneity.

A prominent real-world example of LBA occurred in early molecular phylogenies of eukaryotes based on small-subunit ribosomal RNA (SSU rRNA) genes during the 1990s, where Microsporidia, a group of intracellular parasites, were artifactually placed at the base of the eukaryotic tree. This basal positioning resulted from the accelerated evolutionary rates of microsporidian rRNA sequences, which produced long branches that were attracted toward other fast-evolving basal lineages or the outgroup, misleading parsimony and early maximum likelihood analyses. Subsequent studies using slower-evolving genes and LBA-aware methods repositioned Microsporidia as derived relatives of fungi, confirming the artifactual nature of the initial deep branching.

In arthropod phylogenomics, LBA has notably affected the placement of Strepsiptera, a small order of endoparasitic insects with highly accelerated evolutionary rates. Early 18S rRNA-based analyses in the 1990s and 2000s often artifactually clustered Strepsiptera with Diptera (flies) because of the long branches produced by rate acceleration in both lineages, a grouping termed the "Halteria" hypothesis. More recent phylogenomic studies employing next-generation sequencing data and site-heterogeneous models have mitigated this attraction, robustly supporting Strepsiptera as the sister group to Coleoptera (beetles) within a well-resolved tree.

Long Branch Repulsion

Long branch repulsion (LBR) is a phylogenetic artifact in which maximum likelihood (ML) methods incorrectly infer long-branch taxa as more distantly related than they truly are, often placing them in separate clades despite their actual close relatedness. This bias emerges particularly in the "Farris zone," a parameter space defined by simulations in which external branches are long and the internal branch separating sister taxa is short, leading ML to favor topologies that artificially separate the long branches. Unlike parsimony, which performs well in this scenario by correctly grouping the taxa, ML exhibits this repulsion because of its probabilistic modeling of evolutionary distances.

The mechanism of LBR stems from ML's over-correction for multiple substitutions (multiple hits) along long branches, which underestimates shared evolutionary history and inflates perceived distances between fast-evolving lineages. This effect can be amplified by models incorporating gamma-distributed rate variation across sites, as the correction for heterogeneous rates pushes long branches farther apart in the inferred tree, in contrast to the "pulling" seen in long branch attraction (LBA) under simpler or distance-based methods. Site-heterogeneous models, such as those accounting for compositional heterogeneity, may mitigate LBA but can induce LBR when the model overparameterizes rate shifts, leading to inconsistent topologies in ML analyses. Recent studies as of 2024 have highlighted LBR's persistence in advanced methods, including over-parameterized mixture models.

Although rarer than LBA, LBR has been documented in empirical studies using codon substitution models and site-heterogeneous approaches such as the CAT model, particularly in phylogenies with fast-evolving taxa. For instance, in an analysis of mitochondrial genomes from tinamous and ratites, ML under the GTR + Γ model repelled long-branch taxa (e.g., moas from their true sisters), an artifact attributed to over-correction for rate heterogeneity, which was resolved only by removing fast sites.

Comparisons to Other Phylogenetic Artifacts

Long branch attraction (LBA) differs from compositional bias in that the latter arises from differences in nucleotide or amino acid composition across taxa, such as GC-content disparities that draw AT-rich lineages together irrespective of substitution rates, potentially leading to incorrect groupings without the rate heterogeneity central to LBA. In contrast, LBA specifically involves the erroneous clustering of distantly related, fast-evolving taxa due to shared homoplasies on long branches, as demonstrated in simulations where compositional adjustments alone failed to resolve rate-driven artifacts. For instance, in phylogenomic studies of archaea, compositional heterogeneity mimicked LBA but was mitigated by recoding schemes targeting base frequencies, highlighting the independence of the two effects.

Heterotachy, characterized by temporal shifts in site-specific evolutionary rates, can produce apparent long branches that induce LBA-like artifacts by violating stationary models, though it fundamentally stems from rate variation across evolutionary time rather than consistently high rates along entire branches as in classic LBA. Unlike LBA, which is branch-length specific and exacerbated by methods such as maximum parsimony, heterotachy affects maximum likelihood more uniformly across rate regimes, as shown in simulations where site-rate shifts reduced accuracy without requiring unbalanced trees. In eukaryotic phylogenies, heterotachy has been linked to misleading placements of fast-evolving groups such as microsporidia, where it compounds with LBA but can be distinguished from it through model-based diagnostics.

LBA is rate- and branch-length driven, which sets it apart from saturation. Saturation represents pure signal erosion from multiple substitutions and does not by itself cause grouping errors unless branches are asymmetrically long; it erodes phylogenetic signal but does not attract non-sisters in balanced topologies. Similarly, in distance-based methods, inadequate correction for saturation underestimates pairwise distances between rapidly evolving lineages, fostering attraction akin to LBA but as a methodological limitation rather than a parsimony-specific artifact. These distinctions underscore LBA's reliance on topological imbalance for artifactual grouping.

LBA often interacts with other biases, for example under sparse sampling, where limited operational taxonomic units create or elongate branches, amplifying both LBA and other biases by failing to break up problematic polytomies or introduce intermediate rates. For example, in mitogenomic analyses, undersampled clades led to compounded errors, with incomplete taxa sometimes rescuing accuracy by diluting long-branch signals but more frequently exacerbating attraction when data gaps correlate with high rates. This synergy highlights how LBA can propagate through datasets with poor sampling density, in contrast to long branch repulsion, which disperses rather than clusters long-branched lineages.

Mitigation Strategies

Detection and Avoidance Methods

Detecting long branch attraction (LBA) in phylogenetic analyses often involves stability tests, in which long-branch taxa are repeatedly removed and the analysis is re-run to assess changes in tree topology and nodal support. These tests evaluate whether removing suspected fast-evolving taxa stabilizes the phylogeny; significant shifts in topology indicate potential LBA. Relative rate tests, such as Tajima's test, compare evolutionary rates between lineages using a reference outgroup to identify rate heterogeneity that may contribute to LBA. Tajima's test, implemented in software such as MEGA, rejects the null hypothesis of equal rates if the p-value is below 0.05, signaling accelerated evolution in specific branches. Visualization techniques further aid detection: branch length plots highlight disproportionately long branches, while likelihood mapping assesses phylogenetic signal by plotting quartets in a ternary diagram, where star-like distributions indicate unresolved signal potentially due to LBA. In likelihood mapping, the proportion of resolved quartets versus stars provides a quantitative measure of signal strength.

Avoiding LBA primarily relies on strategic taxon sampling to break up long branches by incorporating intermediate taxa that subdivide rapidly evolving lineages, thereby reducing the distance between fast-evolving groups. This approach has been shown to resolve LBA artifacts in datasets with sparse sampling, such as pseudoscorpion phylogenies, by adding representatives from basal lineages. Model selection also plays a crucial role: maximum likelihood (ML) methods under complex substitution models such as GTR + Γ are preferred over parsimony, as they better account for multiple substitutions and rate heterogeneity, mitigating LBA's pull. The GTR + Γ model incorporates among-site rate variation via a gamma distribution, which helps in datasets with heterogeneous evolutionary rates. In phylogenomics, slow-fast site partitioning separates sites into slow-, medium-, and fast-evolving categories based on substitution rates, allowing models to be applied independently and reducing noise from saturated fast sites that exacerbate LBA. This partitioning, often performed with tools such as PartitionFinder, focuses analyses on slower-evolving partitions for more reliable inference.

Several software tools facilitate LBA detection and avoidance. RAxML employs ML inference with models such as GTR + Γ to produce trees less prone to LBA than those from parsimony-based methods. PhyloBayes supports Bayesian analyses with site-heterogeneous models such as CAT, which model variation across sites to suppress LBA artifacts. More recent approaches, including reference methods that use slowly evolving outgroups or paralogs to calibrate rates and break up branches, enhance stability in large phylogenomic datasets. These tools, often integrated into pipelines with IQ-TREE, enable iterative testing and refinement.
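Tajima's relative rate test, mentioned above, is simple enough to sketch directly. The version below implements the one-degree-of-freedom form of the test for two ingroup sequences and an outgroup, using only the standard library; it is an illustrative sketch rather than a reproduction of any particular software, and the function name and the treatment of gaps are assumptions.

```python
import math

def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Tajima's relative rate test (1 degree of freedom).

    m1 counts sites where only seq1 differs from the other two sequences,
    m2 counts sites where only seq2 differs. Under equal rates m1 and m2
    have the same expectation.
    """
    m1 = m2 = 0
    for x, y, o in zip(seq1, seq2, outgroup):
        if "-" in (x, y, o):
            continue              # skip gapped sites (an assumption)
        if y == o and x != o:
            m1 += 1
        elif x == o and y != o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value

out = "A" * 20
fast = "C" * 10 + "A" * 10      # ten unique changes
slow = "A" * 20                 # no unique changes
print(tajima_relative_rate(fast, slow, out))
```

A small p-value indicates that one of the two ingroup lineages has accumulated significantly more unique changes than the other, flagging it as a candidate long branch.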

Evaluation of Phylogenetic Approaches

Maximum parsimony methods are highly susceptible to long branch attraction, particularly in the Felsenstein zone, where simulations demonstrate error rates exceeding 50% for certain parameter combinations involving unequal branch lengths and limited sequence data. This artifact arises because parsimony favors trees that minimize evolutionary changes without accounting for multiple substitutions, leading to erroneous grouping of rapidly evolving lineages. In contrast, distance-based methods such as neighbor-joining exhibit moderate vulnerability; uncorrected distances often fail in LBA-prone scenarios, but corrections incorporating rate heterogeneity improve accuracy by up to 40% in benchmark simulations by better estimating evolutionary distances.

Maximum likelihood approaches prove more robust against LBA when employing appropriate substitution models, as they explicitly model branch lengths and site-specific rate variation, thereby reducing the artifactual attractions observed in parsimony analyses. For instance, incorporating invariable sites (+I) and gamma-distributed rates (+Γ) substantially mitigates LBA in simulations of heterotachous evolution. Bayesian inference similarly integrates over uncertainty using posterior probabilities, rendering it generally effective, though poor prior specifications can amplify LBA biases under heterogeneous site evolution; site-heterogeneous models such as CAT excel in complex datasets by assigning sites to distinct categories, suppressing artifacts that persist in uniform models.

Empirical evaluations through simulation studies in the 2000s consistently showed maximum likelihood outperforming distance methods such as UPGMA, which assumes a molecular clock and succumbs to LBA when rates vary. More recent phylogenomic reviews from 2015 onward, analyzing thousands of genes, quantify LBA resolution rates improving to over 85% with model-based methods, coalescent approaches such as ASTRAL, and increased taxon sampling, though remnants persist in deep divergences without site-heterogeneous modeling.
