List of phylogenetics software
from Wikipedia

This list of phylogenetics software is a compilation of computational phylogenetics software used to produce phylogenetic trees. Such tools are commonly used in comparative genomics, cladistics, and bioinformatics. Methods for estimating phylogenies include neighbor-joining, maximum parsimony (also simply referred to as parsimony), unweighted pair group method with arithmetic mean (UPGMA), Bayesian phylogenetic inference, maximum likelihood, and distance matrix methods.

List

Name Description Methods Author
ADMIXTOOLS[1] R software package that contains the qpGraph, qpAdm, qpWave, and qpDstat programs Nick Patterson, David Reich
AncesTree[2] An algorithm for clonal tree reconstruction from multi-sample cancer sequencing data. Maximum Likelihood, Integer Linear Programming (ILP) M. El-Kebir, L. Oesper, H. Acheson-Field, B. J. Raphael
AliGROOVE[3] Visualisation of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support Identification of single taxa which show predominantly randomized sequence similarity in comparison with other taxa in a multiple sequence alignment and evaluation of the reliability of node support in a given topology Patrick Kück, Sandra A Meid, Christian Groß, Bernhard Misof, Johann Wolfgang Wägele.
ape[4] R-Project package for analysis of phylogenetics and evolution Provides a large variety of phylogenetics functions Maintainer: Emmanuel Paradis
Armadillo Workflow Platform[5] Workflow platform dedicated to phylogenetic and general bioinformatic analysis Inference of phylogenetic trees using Distance, Maximum Likelihood, Maximum Parsimony, Bayesian methods and related workflows E. Lord, M. Leclercq, A. Boc, A.B. Diallo and V. Makarenkov
BAli-Phy[6] Simultaneous Bayesian inference of alignment and phylogeny Bayesian inference, alignment as well as tree search M.A. Suchard, B. D. Redelings
BATWING[7] Bayesian Analysis of Trees With Internal Node Generation Bayesian inference, demographic history, population splits I. J. Wilson, Weale, D. Balding
BayesPhylogenies[8] Bayesian inference of trees using Markov chain Monte Carlo methods Bayesian inference, multiple models, mixture model (auto-partitioning) M. Pagel, A. Meade
BayesTraits[9] Analyses trait evolution among groups of species for which a phylogeny or sample of phylogenies is available Trait analysis M. Pagel, A. Meade
BEAST[10] Bayesian Evolutionary Analysis Sampling Trees Bayesian inference, relaxed molecular clock, demographic history A. J. Drummond, M. A. Suchard, D Xie & A. Rambaut
BioNumerics Universal platform for the management, storage and analysis of all types of biological data, including tree and network inference of sequence data Neighbor-joining, maximum parsimony, UPGMA, maximum likelihood, distance matrix methods,... Calculation of the reliability of trees/branches using bootstrapping, permutation resampling or error resampling L. Vauterin & P. Vauterin.
Bosque Integrated graphical software to perform phylogenetic analyses, from the importing of sequences to the plotting and graphical editing of trees and alignments Distance and maximum likelihood methods (through PhyML, PHYLIP, Tree-Puzzle) S. Ramirez, E. Rodriguez.
BUCKy[11] Bayesian concordance of gene trees Bayesian concordance using modified greedy consensus of unrooted quartets C. Ané, B. Larget, D.A. Baum, S.D. Smith, A. Rokas and B. Larget, S.K. Kotha, C.N. Dewey
Canopy[12] Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing Maximum Likelihood, Markov Chain Monte Carlo (MCMC) methods Y. Jiang, Y. Qiu, A. J. Minn, and N. R. Zhang
CGRphylo[13] CGR method for accurate classification and tracking of rapidly evolving viruses Chaos Game Representation (CGR) method, based on concepts of statistical physics Amarinder Singh Thind, Somdatta Sinha
CITUP Clonality Inference in Tumors Using Phylogeny Exhaustive search, Quadratic Integer Programming (QIP) S. Malikic, A.W. McPherson, N. Donmez, C.S. Sahinalp
CladoGraph The main goal of CladoGraph is to provide a user-friendly tool for students, teachers, and researchers to explore evolutionary relationships between different species. Trait analysis Pedro Andrade Giroldo
ClustalW Progressive multiple sequence alignment Distance matrix/nearest neighbor Thompson et al.[14]
CoalEvol Simulation of DNA and protein evolution along phylogenetic trees (that can also be simulated with the coalescent) Simulation of multiple sequence alignments of DNA or protein sequences M. Arenas, D. Posada
CodABC Coestimation of substitution, recombination and dN/dS in protein sequences Approximate Bayesian computation M. Arenas, J.S. Lopes, M.A. Beaumont, D. Posada
Dendroscope[15] Tool for visualizing rooted trees and calculating rooted networks Rooted trees, tanglegrams, consensus networks, hybridization networks Daniel Huson et al.
EXACT[16][17] EXACT is based on the perfect phylogeny model, and uses a very fast homotopy algorithm to evaluate the fitness of different trees, and then it brute forces the tree search using GPUs, or multiple CPUs, on the same or on different machines Brute force search and homotopy algorithm Jia B., Ray S., Safavi S., Bento J.
EzEditor[18] EzEditor is a java-based sequence alignment editor for rRNA and protein coding genes. It allows manipulation of both DNA and protein sequence alignments for phylogenetic analysis Multiple sequence alignment and editing Y.-S. Jeon, K. Lee, S.-C. Park, B.-S. Kim, Y.-J. Cho, S.-M. Ha, and J. Chun
fastDNAml Optimized maximum likelihood (nucleotides only) Maximum likelihood G.J. Olsen
FastTree 2[19] Fast phylogenetic inference for alignments with up to hundreds of thousands of sequences Approximate maximum likelihood M.N. Price, P.S. Dehal, A.P. Arkin
fitmodel Fits branch-site codon models without the need of prior knowledge of clades undergoing positive selection Maximum likelihood S. Guindon
Geneious Geneious provides genome and proteome research tools Neighbor-joining, UPGMA, MrBayes plugin, PhyML plugin, RAxML plugin, FastTree plugin, GARLi plugin, PAUP* plugin A. J. Drummond, M. Suchard, V. Lefort et al.
HyPhy Hypothesis testing using phylogenies Maximum likelihood, neighbor-joining, clustering techniques, distance matrices S.L. Kosakovsky Pond, S.D.W. Frost, S.V. Muse
INDELible[20] Simulation of DNA/protein sequence evolution Simulation W. Fletcher, Z. Yang
IQPNNI (No longer maintained; superseded by IQ-TREE)[21] Iterative ML treesearch with stopping rule Maximum likelihood, neighbor-joining L.S. Vinh, A. von Haeseler, B.Q. Minh
IQ-Tree[22][23] An efficient phylogenomic software by maximum likelihood, as successor of IQPNNI and Tree-Puzzle Maximum likelihood, model selection, partitioning scheme finding, AIC, AICc, BIC, ultrafast bootstrapping,[24] branch tests, tree topology tests, likelihood mapping Lam-Tung Nguyen, O. Chernomor, H.A. Schmidt, A. von Haeseler, B.Q. Minh
jModelTest 2 A high-performance computing program to carry out statistical selection of best-fit models of nucleotide substitution Maximum likelihood, AIC, BIC, DT, hLTR, dLTR D. Darriba, GL. Taboada, R. Doallo, D. Posada
JolyTree[25][26] An alignment-free bioinformatics procedure to infer distance-based phylogenetic trees from genome assemblies, specifically designed to quickly infer trees from genomes belonging to the same genus MinHash-based pairwise genome distance, Balanced Minimum Evolution (BME), ratchet-based BME tree search, Rate of Elementary Quartets A. Criscuolo
LisBeth Three-item analysis for phylogenetics and biogeography Three-item analysis J. Ducasse, N. Cao & R. Zaragüeta-Bagils
MEGA Molecular Evolutionary Genetics Analysis Distance, Parsimony and Maximum Composite Likelihood Methods Tamura K, Dudley J, Nei M & Kumar S
MegAlign Pro MegAlign Pro is part of DNASTAR's Lasergene Molecular Biology package. This application performs multiple and pairwise sequence alignments, provides alignment editing, and generates phylogenetic trees. Maximum Likelihood (RAxML) and Neighbor-Joining DNASTAR
Mesquite Mesquite is software for evolutionary biology, designed to help biologists analyze comparative data about organisms. Its emphasis is on phylogenetic analysis, but some of its modules concern comparative analyses or population genetics, while others do non-phylogenetic multivariate analysis. It can also be used to build timetrees incorporating a geological timescale, with some optional modules. Maximum parsimony, distance matrix, maximum likelihood Wayne Maddison and D. R. Maddison
MetaPIGA2 Maximum likelihood phylogeny inference multi-core program for DNA and protein sequences, and morphological data. Analyses can be performed using an extensive and user-friendly graphical interface or by using batch files. It also implements tree visualization tools, ancestral sequences, and automated selection of best substitution model and parameters. Maximum likelihood, stochastic heuristics (genetic algorithm, metapopulation genetic algorithm, simulated annealing, etc.), discrete Gamma rate heterogeneity, ancestral state reconstruction, model testing Michel C. Milinkovitch and Raphaël Helaers
MicrobeTrace MicrobeTrace is a free, browser-based web application. 2D and 3D network visualization tool, Neighbor-joining tree visualization, Gantt charts, bubbles charts, networks visualized on maps, flow diagrams, aggregate tables, epi curves, histograms, alignment viewer, and much more. Ellsworth M. Campbell, Anthony Boyles, Anupama Shankar, Jay Kim, Sergey Knyazev, Roxana Cintron, William M. Switzer[27]
MNHN-Tree-Tools MNHN-Tree-Tools is an open-source phylogenetic inference software working on nucleic and protein sequences. Clustering of DNA or protein sequences and phylogenetic tree inference from a set of sequences. At the core it employs a distance-density based approach. Thomas Haschka, Loïc Ponger, Christophe Escudé and Julien Mozziconacci[28]
Modelgenerator Model selection (protein or nucleotide) Maximum likelihood Thomas Keane
MOLPHY Molecular phylogenetics (protein or nucleotide) Maximum likelihood J. Adachi and M. Hasegawa
MorphoBank Web application to organize trait data (morphological characters) for tree building for use with Maximum Parsimony (via the CIPRES portal), Maximum Likelihood, and Bayesian analysis O'Leary, M. A., and S. Kaufman,[29] also K. Alphonse
MrBayes Posterior probability estimation Bayesian inference J. Huelsenbeck, et al.[30]
Network Free Phylogenetic Network Software Median Joining, Reduced Median, Steiner Network A. Roehl
Nona Phylogenetic inference Maximum parsimony, implied weighting, ratchet P. Goloboff
OrientAGraph Admixture graph reconstruction from allele frequencies f2-statistics or covariance matrix, maximum likelihood network orientation search implemented within TreeMix [31] Erin Molloy, Arun Durvasula, Sriram Sankararaman [32]
PAML[33] Phylogenetic analysis by maximum likelihood Maximum likelihood and Bayesian inference Z. Yang
ParaPhylo[34] Computation of gene and species trees based on event-relations (orthology, paralogy) Cograph-Editing and Triple-Inference Hellmuth
PartitionFinder[35] Combined selection of models of molecular evolution and partitioning schemes for DNA and protein alignments Maximum likelihood, AIC, AICc, BIC R. Lanfear, B Calcott, SYW Ho, S Guindon
PASTIS R package for phylogenetic assembly R, two‐stage Bayesian inference using MrBayes 3.2 Thomas et al. 2013[36]
PAUP* Phylogenetic analysis using parsimony (*and other methods) Maximum parsimony, distance matrix, maximum likelihood D. Swofford
phangorn[37] Phylogenetic analysis in R ML, MP, distance matrix, bootstrap, phylogenetic networks, model selection, SH-test, SOWH-test Maintainer: K. Schliep
Phybase[38] an R package for species tree analysis phylogenetics functions, STAR, NJst, STEAC, maxtree, etc L. Liu & L. Yu
phyclust Phylogenetic Clustering (Phyloclustering) Maximum likelihood of Finite Mixture Models Wei-Chen Chen
PHYLIP PHYLogeny Inference Package Maximum parsimony, distance matrix, maximum likelihood J. Felsenstein
phyloT Generates phylogenetic trees in various formats, based on NCBI taxonomy none I. Letunic
PhyloQuart Quartet implementation (uses sequences or distances) Quartet method V. Berry
PhyloWGS Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors MCMC A. G. Deshwar, S. Vembu, C. K. Yung, G. H. Jang, L. Stein, and Q. Morris
PhyML[39] Fast and accurate estimation of phylogenies using maximum likelihood Maximum likelihood S. Guindon & O. Gascuel
phyx[40] Unix/Linux command line phylogenetic tools Explore, manipulate, analyze, and simulate phylogenetic objects (alignments, trees, and MCMC logs) J.W. Brown, J.F. Walker, and S.A. Smith
POY A phylogenetic analysis program that supports multiple kinds of data and can perform alignment and phylogeny inference. A variety of heuristic algorithms have been developed for this purpose Maximum parsimony, Maximum likelihood, Chromosome rearrangement, discrete characters, continuous characters, Alignment A. Varon, N. Lucaroni, L. Hong, W. Wheeler
ProtASR2[41] Ancestral reconstruction of protein sequences accounting for folding stability Maximum likelihood, substitution models M. Arenas, U. Bastolla
ProtEvol Simulation of protein sequences under structurally constrained substitution models Simulating sequences, substitution models M. Arenas, A. Sanchez-Cobos, U. Bastolla
ProteinEvolver Simulation of protein sequences along phylogenies under empirical and structurally constrained substitution models of protein evolution Simulating sequences forward in time, substitution models M. Arenas, H.G. Dos Santos, D. Posada, U. Bastolla
ProteinEvolverABC[42] Coestimation of recombination and substitution rates in protein sequences Approximate Bayesian computation M. Arenas
ProteinModelerABC[43] Selection among site-dependent structurally constrained substitution models of protein evolution Approximate Bayesian computation D. Ferreiro et al.
ProtTest3 A high-performance computing program for selecting the model of protein evolution that best fits a given set of aligned sequences Maximum likelihood, AIC, BIC, DT D. Darriba, GL. Taboada, R. Doallo, D. Posada
PyCogent Software library for genomic biology Simulating sequences, alignment, controlling third party applications, workflows, querying databases, generating graphics and phylogenetic trees Knight et al.
QuickTree Tree construction optimized for efficiency Neighbor-joining K. Howe, A. Bateman, R. Durbin
RAxML-HPC Randomized Axelerated Maximum Likelihood for High Performance Computing (nucleotides and amino acids) Maximum likelihood, simple Maximum parsimony A. Stamatakis
RAxML-NG[44] Randomized Axelerated Maximum Likelihood for High Performance Computing (nucleotides and amino acids) Next Generation Maximum likelihood, simple Maximum parsimony A. Kozlov, D. Darriba, T. Flouri, B. Morel, A. Stamatakis
RevBayes[45] RevBayes provides an interactive environment for statistical computation in phylogenetics. It is primarily intended for modeling, simulation, and Bayesian inference in evolutionary biology, particularly phylogenetics. However, the environment is quite general and can be useful for many complex modeling tasks. Bayesian inference S. Höhna et al.[46]
SEMPHY Tree reconstruction using the combined strengths of maximum-likelihood (accuracy) and neighbor-joining (speed). SEMPHY has become outdated. The authors now refer users to RAxML, which is superior in accuracy and speed. A hybrid maximum-likelihood – neighbor-joining method M. Ninio, E. Privman, T. Pupko, N. Friedman
SGWE Simulation of genome-wide evolution along phylogenetic trees Simulating genome-wide sequences forward in time Arenas M., Posada D.
SimPlot++[47] Sequence similarity plots (SimPlots[48]), detection of intragenic and intergenic recombination events, bootscan analysis[49] and sequence similarity networks SimPlot using different nucleotide/protein distance models; Phi, χ2 and NSS recombination tests; Sequence similarity network analysis S. Samson, E. Lord, V. Makarenkov
sowhat[50] Hypothesis testing SOWH test Samuel H Church, Joseph F Ryan, and Casey W Dunn
Splatche3[51] Simulation of genetic data under diverse spatially explicit evolutionary scenarios Coalescent, molecular evolution, DNA sequences, SNPs, STRs, RFLPs M. Currat et al.
SplitsTree[52] Tree and network program Computation, visualization and exploration of phylogenetic trees and networks D.H. Huson and D. Bryant
TNT[53] Phylogenetic inference Parsimony, weighting, ratchet, tree drift, tree fusing, sectorial searches P. A. Goloboff, J. S. Farris, and K. C. Nixon
TOPALi Phylogenetic inference Phylogenetic model selection, Bayesian analysis and Maximum Likelihood phylogenetic tree estimation, detection of sites under positive selection, and recombination breakpoint location analysis Iain Milne, Dominik Lindner et al.
TreeGen Tree construction given precomputed distance data Distance matrix ETH Zurich
TreeAlign Efficient hybrid method Distance matrix and approximate parsimony J. Hein
TreeLine Tree construction algorithm within the DECIPHER package for R Maximum likelihood, maximum parsimony, and distance E. Wright
Treefinder[54] Fast ML tree reconstruction, bootstrap analysis, model selection, hypothesis testing, tree calibration, tree manipulation and visualization, computation of sitewise rates, sequence simulation, many models of evolution (DNA, protein, rRNA, mixed protein, user-definable), GUI and scripting language Maximum likelihood, distances, and others Jobb G, von Haeseler A, Strimmer K
TreeMix Admixture graph reconstruction from allele frequencies f2-statistics or covariance matrix, maximum likelihood, heuristic search (building tree via randomized taxon addition and then adding admixture edges) Joseph K. Pickrell and Jonathan K. Pritchard [55]
Tree-Puzzle[56] (No longer maintained; superseded by IQ-TREE)[57] Maximum likelihood and statistical analysis Maximum likelihood H. A. Schmidt, K. Strimmer, M. Vingron, and A. von Haeseler
TREE-QMC Summarizes unrooted gene trees into unrooted species tree Graph-cut-based heuristic for maximum quartet support species tree problem [58][59] Yunheng Han, Erin Molloy [60][61]
T-REX (Webserver)[62][63] Tree inference and visualization, Horizontal gene transfer detection, multiple sequence alignment Distance (neighbor joining), Parsimony and Maximum likelihood (PhyML, RAxML) tree inference, MUSCLE, MAFFT and ClustalW sequence alignments and related applications Boc A, Diallo AB, Makarenkov V
UShER[64] Phylogenetic placement using maximum parsimony for viral genomes Maximum parsimony Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D and Corbett-Detig R
UGENE Fast and free multiplatform tree editor GUI with PHYLIP 3.6 and IQTree algorithms Unipro
VeryFastTree[65] A highly-tuned tool that uses parallelizing and vectorizing strategies to speed inference of phylogenies for huge alignments Approximate maximum likelihood César Piñeiro, José M. Abuín and Juan C. Pichel
Winclada GUI and tree editor (requires Nona) Maximum parsimony, ratchet K. Nixon
Xrate Phylo-grammar engine Rate estimation, branch length estimation, alignment annotation I. Holmes

References

from Grokipedia
Phylogenetics software encompasses a diverse array of computational tools designed to reconstruct and analyze phylogenetic trees, which illustrate the evolutionary relationships among organisms or taxa based on similarities in molecular sequences, such as DNA or proteins, or in morphological traits. These programs infer evolutionary histories by applying mathematical models to genetic or phenotypic data, enabling researchers to explore patterns of descent, divergence, and adaptation across species. Such software typically implements a range of methodologies: distance-matrix approaches that calculate evolutionary distances between sequences, parsimony methods that minimize the number of evolutionary changes, maximum likelihood techniques that optimize probabilistic models of sequence evolution, and Bayesian frameworks that incorporate prior probabilities to estimate posterior distributions of tree topologies. Notable examples include MrBayes for Bayesian analysis and BEAST for time-calibrated phylogenies, each addressing specific aspects of tree searching and hypothesis testing.

Lists of phylogenetics software serve as directories for scientists in bioinformatics and related fields, compiling hundreds of free and commercial packages alongside web-based servers to promote accessibility and reproducibility in research. One of the most extensive such compilations, maintained by evolutionary biologist Joseph Felsenstein, documented 392 phylogeny packages and 54 free web servers as of 2012 and is no longer actively updated. It categorizes entries by analytical method, data type, and platform compatibility, reflecting the field's rapid growth and interdisciplinary applications. These resources underscore the software's role in advancing genomic studies and conservation genetics by providing robust tools for handling large-scale datasets from next-generation sequencing.

Overview

Definition and Scope

Phylogenetics software encompasses computational tools that infer evolutionary relationships among organisms or taxa by reconstructing phylogenetic trees from biological data. These trees model the branching patterns of descent with modification, illustrating how species, genes, or other entities have diverged over time from common ancestors. The software processes diverse data types, including molecular sequences such as DNA, RNA, or proteins, as well as morphological traits like anatomical structures, and estimates relationships through statistical and algorithmic methods.

The scope of phylogenetics software extends to every key stage of analysis: sequence alignment to prepare homologous data, tree construction via approaches like distance matrices or likelihood models, evolutionary model selection to account for substitution rates and patterns, and validation techniques such as bootstrap resampling to assess tree robustness. These tools support interdisciplinary applications in comparative genomics for elucidating gene family evolution, in systematics for taxonomic classification, and in epidemiology for reconstructing the transmission histories of pathogens.

Input data for phylogenetics software typically consist of aligned molecular sequences in standard sequence formats, or of precomputed distance matrices representing pairwise dissimilarities between taxa. Outputs are primarily phylogenetic trees encoded in standardized formats such as Newick, which uses nested parentheses to denote branching structure, with optional branch lengths written after colons, or NEXUS, which supports extended annotations for metadata.

Representative applications include species tree reconstruction to map evolutionary divergence, viral phylogenies to trace outbreak origins and transmission dynamics, and conservation studies that identify evolutionarily distinct lineages for priority protection.
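To make the Newick convention concrete, the following Python sketch parses a simple Newick string into nested nodes and lists its leaf names. It is a minimal illustration, not a full implementation of the format: it assumes unquoted labels with no whitespace or bracketed comments, and the names Node, parse_newick, and leaves are hypothetical, not any library's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None  # branch length to the parent, if given
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Recursive-descent parser for simple (unquoted, comment-free) Newick."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        return text[start:pos]

    def parse_subtree() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_subtree())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(parse_subtree())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        node.name = parse_label()  # leaf name or optional internal label
        if text[pos] == ":":
            pos += 1
            node.length = float(parse_label())
        return node

    root = parse_subtree()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaves(node: Node) -> List[str]:
    """Leaf names in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


if __name__ == "__main__":
    tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
    print(leaves(tree))                          # ['A', 'B', 'C']
    print(tree.children[0].children[1].length)   # 0.2
```

Each pair of parentheses becomes an internal node, and the number after a colon is stored as the length of the branch leading to that node; the root here carries no branch length.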

Key Methodologies

Phylogenetics software employs several core methodologies to infer evolutionary relationships among taxa from molecular or morphological data. Distance-based methods begin by computing pairwise evolutionary distances between sequences, correcting for multiple substitutions at the same site with probabilistic models such as the Jukes-Cantor model. This model assumes equal rates of substitution and estimates the true distance $d$ from the observed proportion of differing sites $p$ via $d = -\frac{3}{4} \ln\left(1 - \frac{4}{3}p\right)$. These distances form a matrix that serves as input for clustering algorithms such as the unweighted pair group method with arithmetic mean (UPGMA), which builds hierarchical clusters under the assumption of a constant molecular clock, or neighbor-joining, which relaxes this assumption by iteratively joining the pair of taxa that minimizes total branch length and produces an unrooted tree efficiently even for large datasets. These approaches are computationally efficient but sensitive to rate variation across lineages, which can lead to long-branch attraction artifacts.

Parsimony-based methods seek the tree requiring the fewest evolutionary changes, or "steps," in character states across branches, embodying Occam's razor by minimizing ad hoc hypotheses of transformation. For a given topology, the minimum number of steps is computed using algorithms such as Fitch's, which propagates sets of possible ancestral states from the leaves to the root, counting a change wherever the sets of two sibling nodes do not overlap, and then backtracks to assign ancestral states. Exact solutions employ branch-and-bound searches, which abandon any partial tree whose parsimony score already equals or exceeds that of the best complete tree found so far, guaranteeing optimality for small to moderate datasets despite exponential worst-case complexity. Heuristic alternatives, such as stepwise addition or tree bisection-reconnection, approximate solutions for larger problems but risk becoming trapped in local optima.
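The counting pass of Fitch's algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the tree is rooted and strictly binary, written as nested tuples of leaf names; characters are unordered with unit cost; and only the bottom-up pass that yields the parsimony score is shown, not the backtracking step that assigns ancestral states. The function names fitch_site and parsimony_score are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass for one site: (state set at node, changes below)."""
    if isinstance(tree, str):
        return {states[tree]}, 0  # a leaf has exactly its observed state
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # Disjoint sets: take the union and count one change below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total Fitch length of the tree, summed over alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}
    t1 = (("A", "B"), ("C", "D"))
    t2 = (("A", "C"), ("B", "D"))
    print(parsimony_score(t1, alignment), parsimony_score(t2, alignment))  # 4 6
```

On this toy alignment the topology grouping A with B and C with D needs fewer steps (4 versus 6), so parsimony prefers it. A branch-and-bound or heuristic search would call a scoring function of this kind for each candidate topology it visits.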
Maximum likelihood methods evaluate tree topologies and branch parameters by maximizing the probability of observing the data under an explicit evolutionary model, typically via the likelihood function $L = \prod_{i=1}^{n} P(x_i \mid T, \theta)$, where $x_i$ is the site pattern at site $i$ of $n$ sites, $T$ the tree (topology with branch lengths), and $\theta$ the model parameters, such as substitution rates. Optimization uses iterative algorithms such as hill-climbing to search tree space while estimating parameters via expectation-maximization or other numerical optimization methods, and competing hypotheses can be compared through likelihood ratio tests. This framework accommodates complex models such as the general time-reversible (GTR) model of nucleotide substitution, though computational demands scale poorly with sequence length and the number of sequences.

Bayesian inference combines prior probabilities on trees and parameters with the likelihood to compute the posterior distribution $P(T, \theta \mid X) \propto P(X \mid T, \theta)\, P(T, \theta)$, where $X$ is the alignment, using Markov chain Monte Carlo (MCMC) sampling to explore high-dimensional parameter spaces and generate credible sets of trees. Chains are run for millions of generations; an initial burn-in is discarded, and convergence is assessed via trace plots or effective sample sizes. This enables quantification of uncertainty and the incorporation of priors such as birth-death models for divergence times. The method excels at handling heterogeneous data partitions but requires careful prior specification and longer run times than point-estimate approaches.

Other methodologies include supertree approaches, which combine overlapping source trees into a comprehensive phylogeny using matrix representation with parsimony (MRP) or quartet-based encoding to resolve conflicts across studies. Simulation-based methods generate synthetic data under hypothesized models to test tree robustness or method performance, often via the parametric bootstrap to evaluate the significance of topological differences. Emerging approaches incorporate machine learning to enhance tree search and model parameter estimation, particularly for large-scale genomic data (as of 2025).
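For the simplest case, two sequences joined by a single branch, both the likelihood and the Bayesian frameworks can be sketched directly in Python. The sketch assumes ungapped DNA and the Jukes-Cantor (JC69) model; the exponential prior with rate 10, the proposal step size, the burn-in fraction, and the chain length are illustrative assumptions, not defaults of any program, and all function names are hypothetical. Under JC69 the maximum likelihood branch length coincides with the closed-form Jukes-Cantor distance, which gives a convenient check on the numerical optimizer.

```python
import math
import random


def jc_distance(p: float) -> float:
    """Closed-form Jukes-Cantor distance; p is the proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_log_likelihood(d: float, n_same: int, n_diff: int) -> float:
    """ln L = sum_i ln P(x_i | d) for two sequences under JC69."""
    decay = math.exp(-4.0 * d / 3.0)
    # Probability of one specific site pattern: 1/4 (stationary base
    # frequency) times the JC69 transition probability over distance d.
    p_same = 0.25 * (0.25 + 0.75 * decay)  # e.g. pattern (A, A)
    p_diff = 0.25 * (0.25 - 0.25 * decay)  # e.g. pattern (A, G)
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(n_same: int, n_diff: int,
                lo: float = 1e-6, hi: float = 5.0, tol: float = 1e-8) -> float:
    """Maximize the log-likelihood over d with golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        e = a + ratio * (b - a)
        if jc_log_likelihood(c, n_same, n_diff) > jc_log_likelihood(e, n_same, n_diff):
            b = e  # maximum lies in [a, e]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0


def metropolis(n_same: int, n_diff: int, steps: int = 20000,
               prior_rate: float = 10.0, step: float = 0.05, seed: int = 1) -> list:
    """Sample P(d | X), proportional to P(X | d) P(d), by random-walk Metropolis."""
    rng = random.Random(seed)

    def log_posterior(d: float) -> float:
        # Exponential prior: ln P(d) = ln(rate) - rate * d; the constant cancels.
        return jc_log_likelihood(d, n_same, n_diff) - prior_rate * d

    d = 0.1
    current = log_posterior(d)
    samples = []
    for _ in range(steps):
        # Reflecting at zero keeps d positive and leaves the proposal symmetric.
        proposal = abs(d + rng.gauss(0.0, step))
        proposed = log_posterior(proposal)
        if proposed >= current or rng.random() < math.exp(proposed - current):
            d, current = proposal, proposed
        samples.append(d)
    return samples[steps // 4:]  # discard the first quarter as burn-in


if __name__ == "__main__":
    seq1 = "ACGTACGTACGTACGTACGT"
    seq2 = "ACGTACGAACGTTCGTACGA"
    n_diff = sum(x != y for x, y in zip(seq1, seq2))
    n_same = len(seq1) - n_diff
    print(jc_distance(n_diff / len(seq1)), ml_distance(n_same, n_diff))
    posterior = metropolis(n_same, n_diff)
    print(sum(posterior) / len(posterior))  # posterior mean of d
```

Real programs evaluate the same kind of likelihood over a whole tree with Felsenstein's pruning algorithm and propose changes to the topology as well as to branch lengths; this sketch fixes the topology and samples a single parameter.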

Historical Development

Early Computational Tools (Pre-1990)

The development of computational tools for phylogenetics in the pre-1990 era marked a pivotal shift from manual cladistic methods to automated inference of evolutionary trees, driven by the advent of affordable microcomputers and early molecular data. These tools, often written in FORTRAN for mainframe and personal computers, implemented foundational algorithms such as distance clustering, parsimony optimization, and early likelihood approaches, primarily for small datasets of DNA sequences or morphological characters. Their simplicity reflected the computational constraints of the time, yet they made phylogenetic analysis broadly accessible to biologists.

A landmark in this period was the release of PHYLIP (Phylogeny Inference Package) in October 1980 by Joseph Felsenstein at the University of Washington. This free, comprehensive package comprised multiple standalone programs for inferring phylogenies from molecular data, supporting distance-based methods, parsimony, and maximum likelihood under simple substitution models. PHYLIP's modular design allowed users to chain programs for sequential analysis, from data input to tree output, and it was quickly adopted for its accessibility and documentation.

Complementing PHYLIP, David L. Swofford introduced PAUP (Phylogenetic Analysis Using Parsimony) in 1981 while at the Illinois Natural History Survey. Initially tailored for parsimony-based reconstruction, PAUP handled both morphological and molecular datasets, using branch-and-bound and heuristic searches to find most parsimonious trees despite the combinatorial explosion in the number of possible topologies. By the mid-1980s it had added distance methods and compatibility with early sequence formats, making it a staple for cladistic studies.

Distance methods, rooted in the 1958 UPGMA algorithm of Sokal and Michener, found early software implementation in packages such as BIOSYS-1, co-developed by Swofford and Richard B. Selander in 1981. This program targeted electrophoretic allozyme data for population genetics and systematics, featuring UPGMA and related clustering methods for building trees from allele frequency matrices. It emphasized phenetic relationships over explicit evolutionary models and, in later iterations, provided bootstrap-like resampling to assess cluster stability.

These pioneering tools shared key limitations imposed by the hardware and programming paradigms of the time: strictly command-line interfaces required scripted execution without graphical interfaces or visual tree rendering; analyses were confined to modest datasets (often fewer than 20 taxa and 1,000 sites) by memory and processing limits; and they employed rudimentary models lacking corrections for rate heterogeneity or multiple substitutions, which could bias tree estimates. Outputs were typically text-based tree descriptions that required manual interpretation. Despite these constraints, early tools like PHYLIP and PAUP profoundly influenced systematics by enabling reproducible computational phylogenies, fueling debates on parsimony versus statistical methods in cladistics, and inspiring subsequent innovations in evolutionary biology. Their distribution fostered global collaboration, with PHYLIP alone registering thousands of users by the late 1980s and helping integrate phylogenetics into broader genetic research.

Modern and Specialized Packages (1990-Present)

The 1990s marked a pivotal shift in phylogenetics software toward likelihood-based inference, driven by the need for more sophisticated evolutionary models amid growing sequence data from molecular biology. PHYLIP, which had included maximum likelihood methods since the early 1980s, was substantially expanded in this era to incorporate a broader range of substitution models and other enhancements, enabling users to estimate phylogenies under more complex scenarios such as rate heterogeneity across sites. Similarly, PAUP* evolved from its parsimony-focused roots to include maximum likelihood capabilities in versions released during the 1990s, supporting nucleotide, amino acid, and codon models for rigorous hypothesis testing. A notable innovation was the introduction of TREE-PUZZLE in 1995, which implemented quartet puzzling, a heuristic approach to maximum likelihood tree reconstruction that approximates a full likelihood search by combining quartet topologies and offered computational efficiency for the larger datasets of the time.

The 2000s ushered in the Bayesian revolution, transforming phylogenetics by integrating probabilistic inference with Markov chain Monte Carlo (MCMC) sampling to quantify uncertainty in tree topologies and parameters. MrBayes, released in 2001, popularized this paradigm by allowing users to sample posterior distributions of phylogenies under various models, including partitioned data for multi-gene analyses, and became a standard for empirical Bayesian phylogenetics due to its accessibility and robust convergence diagnostics. Building on this, BEAST emerged in 2003 as a Bayesian framework that extended MCMC to incorporate relaxed molecular clocks, enabling divergence time estimation from calibrated phylogenies and addressing temporal heterogeneity in evolutionary rates, which is critical for studies in molecular epidemiology and macroevolution. These tools democratized Bayesian methods, shifting the field from point estimates to full posterior exploration, though they initially required substantial computational resources.

Post-2010 developments responded to the phylogenomics era, in which next-generation sequencing generated massive alignments that demanded scalable algorithms for maximum likelihood inference. RAxML, first published in 2004 and continuously updated, optimized rapid bootstrapping and tree searches for large datasets using parallel computing on multicore systems and clusters, achieving up to 100-fold speedups over predecessors for alignments exceeding 10,000 taxa. IQ-TREE, introduced in 2014, advanced this further with integrated model selection via ModelFinder, which employs hill-climbing algorithms to identify optimal substitution models, and UFBoot for ultrafast bootstrap approximations, making it possible to infer accurate phylogenies from phylogenomic data in hours rather than days. More recent advancements include RAxML-NG in 2018, which enhanced scalability for massive datasets through optimized likelihood computations, and IQ-TREE 2 in 2020, which improved efficiency and support for complex models.

Broader trends included the adoption of graphical user interfaces, as seen in MEGA's evolution since the 1990s into an intuitive platform for alignment, model testing, and tree visualization, alongside open-source collaboration on platforms like GitHub for community-driven enhancements. Integration with next-generation sequencing (NGS) pipelines became standard, and development also focused on scalability to thousands of loci, improved uncertainty quantification through ensemble methods, and hybrid approaches that combine likelihood with machine learning for anomaly detection in alignments. These advancements have enabled phylogenetics to handle the data explosion while maintaining statistical rigor.

Categorized Lists

Alignment and Data Preparation Tools

Alignment and data preparation tools are essential in phylogenetics for generating multiple sequence alignments (MSAs). An MSA arranges homologous sequences so that each column holds residues presumed to share a common ancestor, which reveals conserved regions and provides the input for tree building. These tools preprocess raw sequence data, dealing with insertions and deletions (indels) and varying sequence lengths, and write aligned output in standard formats for downstream phylogenetic inference.

Clustal Omega, released in 2011 by Fabian Sievers, Des Higgins and colleagues, uses a progressive alignment strategy optimized for large datasets and can align hundreds of thousands of sequences in hours. It builds guide trees with mBed, which represents each sequence by its distances to a small set of seed sequences and clusters these vectors with k-means, avoiding a full all-against-all distance matrix. The profile-profile alignments along the guide tree are performed with hidden Markov models using HHalign. The tool handles protein, DNA and RNA sequences and offers optional iterative refinement of the guide tree and alignment.

MAFFT, introduced in 2002 by Kazutaka Katoh and colleagues at Kyoto University, uses the fast Fourier transform (FFT) to identify homologous segments rapidly, reducing CPU time compared with earlier methods while maintaining high accuracy. Its faster strategies build a progressive alignment, which iterative variants then improve by repeatedly realigning subgroups of sequences. Accuracy-oriented options such as G-INS-i, which assumes global homology, are recommended for smaller datasets of roughly 200 sequences or fewer. MAFFT accommodates both nucleotide and amino acid sequences and also supports profile alignment, such as adding new sequences to an existing alignment.

MUSCLE (multiple sequence comparison by log-expectation), released in 2004 by Robert C. Edgar, uses a three-stage algorithm that balances speed and precision for protein and DNA alignments. It begins with a draft progressive alignment guided by a tree built from k-mer distances. It then re-estimates distances from the draft alignment to build an improved guide tree and realign. Finally, it applies tree-dependent iterative refinement: the tree is split at an edge, the two resulting profiles are realigned, and the change is kept if the alignment score improves. MUSCLE supports profile alignment and writes several output formats, making it suitable for datasets of moderate size (up to about 10,000 sequences).

T-Coffee, created in 2000 by Cédric Notredame, Des Higgins and Jaap Heringa and now developed at the Centre for Genomic Regulation (CRG) in Barcelona, uses a consistency-based framework. It first builds a library of global and local pairwise alignments and then uses it to guide progressive multiple alignment, favoring residue pairings that are consistent across many pairs of sequences. Template-based modes extend this approach to structural information such as 3D coordinates. The tool supports DNA, RNA, and protein inputs, which makes it versatile for phylogenetic data preparation.

Common features across these tools include indel modeling to preserve evolutionary gaps, support for profile-based alignment to incorporate pre-existing MSAs, and export in phylogenetics-compatible formats such as FASTA and Clustal for maximum likelihood, Bayesian, and other tree-inference pipelines.
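Progressive aligners repeatedly apply dynamic programming to align sequences or profiles. As a minimal sketch of that core step, the following Python function performs Needleman-Wunsch global alignment of two sequences with a linear gap penalty. It is not the algorithm of Clustal Omega, MAFFT, MUSCLE or T-Coffee, which align profiles with substitution matrices and affine gap costs; the match, mismatch and gap scores here are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap (an indel).
    Scoring values are illustrative, not those of any real aligner.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap    # a[i-1] aligned against a gap
            left = score[i][j - 1] + gap  # b[j-1] aligned against a gap
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))  # (1, 'ACGT', 'AC-T')
```

With these scores, aligning ACGT with ACT places a single gap opposite G, which is the kind of indel placement the tools above must decide at every column.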

Distance-Based and Clustering Methods

Distance-based and clustering methods in phylogenetics infer tree topologies from pairwise evolutionary distances derived from aligned sequences, and they are computationally efficient for large datasets compared with character-based approaches. They first compute a distance matrix using models that correct for multiple substitutions at the same site, such as the Poisson correction for amino acid sequences or the Kimura 2-parameter model for nucleotides, so that distances estimate the actual number of changes rather than the observed proportion of differences. Clustering algorithms then join taxa based on these distances. The unweighted pair group method with arithmetic mean (UPGMA) assumes a molecular clock and produces ultrametric trees. Neighbor-joining (NJ) drops the clock assumption and can accommodate rate variation among lineages: at each step it joins the pair of nodes that most reduces the estimated total tree length, and it is guaranteed to recover the correct tree when distances are additive, that is, when every pairwise distance equals the sum of branch lengths on the path between the two taxa. Non-additive distances, arising from rate variation or estimation error, are handled through corrections in more advanced implementations. Bootstrapping, which resamples alignment columns with replacement to generate pseudoreplicate datasets, assesses branch support in these trees without assuming additivity.

PHYLIP (Phylogeny Inference Package), developed by Joe Felsenstein since 1980 and continuously updated, is a foundational free software suite for distance-based phylogenetics, distributed as modular programs for Unix, Windows, and Mac systems. It includes PROTDIST for calculating protein distances under models such as the Poisson correction, which assumes equal substitution rates across sites, and DNADIST for nucleotide distances with options such as the Kimura 2-parameter model, which allows transitions and transversions to occur at different rates. The SEQBOOT program generates bootstrap pseudoreplicate datasets from an input alignment, while NEIGHBOR builds NJ or UPGMA trees from distance matrices; it accepts any distance matrix, additive or not, because NJ simply resolves an initial star tree one join at a time. PHYLIP's modular design, with more than 30 programs, favors flexible batch processing of large, already prepared alignments.

MEGA (Molecular Evolutionary Genetics Analysis), first released in 1993 by Sudhir Kumar, Koichiro Tamura and Masatoshi Nei at Pennsylvania State University and still actively developed, provides an integrated graphical interface for distance-based tree inference from DNA, protein, and gene frequency data. It computes distances under a suite of models, including the Kimura 2-parameter model for nucleotides and the Poisson correction for amino acids, with built-in gamma-distributed rate variation among sites. MEGA supports UPGMA for clock-like data and neighbor-joining and minimum evolution for general cases, with bootstrap support from typically 100–1,000 replicates that can be displayed directly on the tree. Its user-friendly workflow, from distance estimation to tree output, has made it widely adopted in teaching and research, and recent versions such as MEGA12 use multi-core processing to handle datasets of thousands of sequences.

QuickTree, described in 2002 by Kevin Howe, Alex Bateman and Richard Durbin at the Wellcome Trust Sanger Institute, is a command-line tool for rapid NJ tree construction from large protein sequence alignments. It is an optimized implementation of the standard NJ algorithm, reported to run over 100 times faster than other NJ programs on datasets of more than 10,000 taxa. It has no built-in bootstrapping but can be combined with external resampling tools. Its scalability made it practical to build trees for very large protein families, such as the Pfam alignment of about 27,000 HIV GP120 sequences, which makes it well suited to initial exploratory phylogenies.

BIONJ, published by Olivier Gascuel in 1997 (LIRMM, Montpellier), improves the reduction step of NJ. When two nodes are joined, the distances from the new node to the remaining nodes are computed as a weighted average whose weight is chosen under a simple model of the variances and covariances of the estimated distances. This lowers the variance of the reduced matrix and improves topological accuracy, particularly when substitution rates are high or vary among lineages, and in simulations BIONJ outperforms standard NJ. It can be combined with bootstrapping to assess robustness, and it remains available as a standalone program suited to moderately sized datasets in which noisy distance estimates depart from additivity.
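Neighbor-joining, the clustering step used by several of the programs above, can be sketched in Python as follows. At each step it computes Q(i, j) = (n − 2)·d(i, j) − r(i) − r(j), where r(i) is the sum of distances from node i to all other nodes, joins the pair with the smallest Q, and replaces it with a new internal node. This is a plain textbook O(n³) version for illustration, not the code of PHYLIP's NEIGHBOR, MEGA, QuickTree or BIONJ; the dict-of-dicts input format and internal node names such as `node1` are assumptions of the sketch.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor-joining on a symmetric distance matrix.

    names: list of taxon labels; dist: dict of dicts with dist[a][b].
    Returns the unrooted tree as a list of (node, node, branch_length) edges.
    Internal nodes are given hypothetical names node1, node2, ...
    """
    d = {a: dict(dist[a]) for a in names}  # copy so the input is not modified
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r(i): net divergence of node i from all other active nodes
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent u
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        next_id += 1
        u = f"node{next_id}"
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Reduce the matrix: distance from u to every other remaining node
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


if __name__ == "__main__":
    # Additive distances generated by the tree ((A:2, B:3):1, (C:4, D:5))
    D = {
        "A": {"A": 0, "B": 5, "C": 7, "D": 8},
        "B": {"A": 5, "B": 0, "C": 8, "D": 9},
        "C": {"A": 7, "B": 8, "C": 0, "D": 9},
        "D": {"A": 8, "B": 9, "C": 9, "D": 0},
    }
    for edge in neighbor_joining(["A", "B", "C", "D"], D):
        print(edge)
```

Because the example matrix is additive, the sketch recovers the generating tree exactly: A and B share one internal node, C and D another, with branch lengths 2, 3, 4 and 5 and an internal branch of length 1.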

Parsimony-Based Methods

Parsimony-based methods in phylogenetics software infer evolutionary trees by selecting those that require the fewest character state changes, known as steps, to explain the observed data. This approach, rooted in the principle of maximum parsimony, is particularly suited to morphological characters or small molecular datasets, where the goal is to minimize homoplasy without assuming an explicit evolutionary model. Software implementing these methods typically offers exhaustive searches for small datasets, branch-and-bound algorithms that guarantee optimality, and heuristic strategies for larger ones, and it treats multistate characters as either unordered (Fitch parsimony) or ordered (Wagner parsimony). Parsimony analyses are, however, susceptible to long-branch attraction, in which rapidly evolving lineages are artifactually grouped together. Because this bias arises from the parsimony criterion itself, more thorough search heuristics such as sectorial search help find the most parsimonious trees faster but do not remove it.

PAUP* (Phylogenetic Analysis Using Parsimony *and other methods), developed by David L. Swofford since 1981 and still maintained, is a foundational tool for parsimony-based tree inference. It offers exhaustive searches that evaluate every possible tree for small datasets (about a dozen taxa at most, since the number of possible trees grows super-exponentially), branch-and-bound searches that find optimal trees without full enumeration, and heuristic searches such as stepwise addition followed by tree-bisection-reconnection (TBR) branch swapping for larger matrices. PAUP* accommodates both unordered and ordered characters, as well as compatibility-based methods that seek cliques of mutually compatible characters, making it versatile for discrete data analysis.

TNT (Tree analysis using New Technology), released in 2001 by an Argentine team led by Pablo Goloboff and still developed, excels at fast heuristic parsimony searches on large matrices, often handling thousands of taxa efficiently. Its algorithms include implicit enumeration for exact solutions on moderate datasets; tree fusing, which exchanges subtrees between different suboptimal trees to escape local optima; and sectorial searches, which reanalyze subsets (sectors) of the tree to find improvements more quickly. TNT uses step counts as its primary optimality criterion and supports implied weighting to downweight homoplasious characters, improving robustness on complex datasets.

NONA, introduced in 1995 by Pablo Goloboff as a command-line parsimony program and still available, focuses on efficient searches for optimal trees, including the parsimony ratchet, which repeatedly perturbs character weights to sample tree space beyond local optima. It treats multistate characters as non-additive or additive and supports character weighting through implied weights and successive approximations, prioritizing minimal step counts while allowing user-defined weights. NONA's design emphasizes speed for morphological data, and it served as a precursor to more advanced tools such as TNT.
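The step count that parsimony programs minimize can be computed for unordered characters with Fitch's algorithm, which these programs evaluate many times during a search. The following Python sketch is illustrative only, not code from PAUP*, TNT or NONA; representing a tree as nested 2-tuples of taxon names is an assumption of the sketch.

```python
def fitch(tree, states):
    """Fitch (unordered) parsimony for one character on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping each leaf name to its observed state.
    Returns (candidate state set at this node, minimum number of steps below it).
    """
    if not isinstance(tree, tuple):  # a leaf: its observed state, zero steps
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:  # children can agree on a state: no extra change needed
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disagreement costs exactly one step


def tree_length(tree, matrix):
    """Total steps over all characters of a taxon -> character-string matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: chars[k] for taxon, chars in matrix.items()})[1]
        for k in range(n_chars)
    )


if __name__ == "__main__":
    matrix = {"A": "AAG", "B": "AAG", "C": "GTT", "D": "GTA"}
    print(tree_length((("A", "B"), ("C", "D")), matrix))  # 4
    print(tree_length((("A", "C"), ("B", "D")), matrix))  # 6
```

For this toy matrix, the grouping ((A,B),(C,D)) needs 4 steps and ((A,C),(B,D)) needs 6, so a parsimony search would prefer the former. For unordered characters, the Fitch length does not depend on where the tree is rooted, so the rooted representation is only a convenience.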

Maximum Likelihood Methods

Maximum likelihood (ML) methods for phylogenetic inference search for the tree topology, branch lengths, and model parameters that maximize the likelihood of the observed sequence data under a specified evolutionary model, typically one that combines a substitution rate matrix with among-site rate variation (http://www.atgc-montpellier.fr/phyml/paper.php). These approaches provide a statistical framework for tree estimation and are generally more accurate than simpler methods on complex datasets, because they explicitly model evolutionary processes such as unequal transition and transversion rates, for example under the General Time Reversible (GTR) model (https://academic.oup.com/sysbio/article/52/5/696/1641391). Key software in this category relies on efficient heuristic searches to handle large alignments, often combined with gamma-distributed rates across sites plus a proportion of invariant sites, joint branch-length estimation across partitions, and support for heterogeneous data from multiple genes or loci (https://cme.h-its.org/exelixis/resource/download/NewManual.pdf).

PhyML, first released in 2003 and continuously developed by Stéphane Guindon, Olivier Gascuel and colleagues, uses a heuristic hill-climbing search based on nearest-neighbor interchanges (NNI) that adjusts topology and branch lengths simultaneously, enabling rapid ML estimation for nucleotide and amino acid alignments (https://academic.oup.com/sysbio/article/52/5/696/1641391). It supports a range of substitution models including GTR, gamma plus invariant sites for rate variation among sites, and approximate likelihood ratio tests (aLRT) for branch support, making it suitable for datasets of up to thousands of sequences (https://academic.oup.com/sysbio/article/59/3/307/1702850).

RAxML, started in 2004 by Alexandros Stamatakis and still maintained, uses randomized accelerated ML searches with subtree pruning and regrafting (SPR) moves, together with rapid bootstrapping heuristics, to infer trees from large phylogenomic datasets (https://academic.oup.com/bioinformatics/article/22/21/2688/240496). Designed for scalability, it uses Pthreads for shared-memory parallelism and MPI for distributed computing, processing alignments with millions of sites while supporting partitioned models, GTR-based substitution models, and gamma plus invariant-site rate heterogeneity (https://academic.oup.com/bioinformatics/article/30/9/1312/239061).

IQ-TREE, developed since 2014 by Lam-Tung Nguyen, Heiko A. Schmidt, Arndt von Haeseler, Bui Quang Minh and colleagues, combines a stochastic hill-climbing tree search with UFBoot ultrafast bootstrap approximation and integrates ModelFinder for automatic model selection, by default using the Bayesian information criterion (BIC), achieving high accuracy on diverse molecular data (https://academic.oup.com/mbe/article/32/1/268/2925592). It excels at branch-length optimization and supports advanced features such as mixture models for site heterogeneity, gamma plus invariant sites, and partitioned analyses of multi-gene datasets, often outperforming competing programs in speed and likelihood scores on large phylogenomic analyses (https://academic.oup.com/mbe/article/37/7/1911/5810651).

GARLI, released in 2005 by Derrick Zwickl, who developed it during his doctoral research at the University of Texas at Austin, applies a genetic algorithm that optimizes tree topology and model parameters simultaneously under ML, enabling robust searches on nucleotide, codon, and amino acid data (https://www.bio.utexas.edu/faculty/antisense/garli/Garli.html). It handles partitioned models for rate heterogeneity across loci, incorporates gamma plus invariant sites for site-specific variation, and estimates branch lengths efficiently, making it effective for complex datasets that require fine-tuned likelihood maximization (https://repositories.lib.utexas.edu/bitstream/handle/2152/3108/zwickl347.pdf).
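To make the quantity being maximized concrete, here is a minimal Python sketch of Felsenstein's pruning algorithm, which computes the likelihood of an alignment on one fixed tree under the simple Jukes-Cantor (JC69) model. The tree format, nested pairs of (child, branch length), is an assumption of this sketch. Programs such as PhyML, RAxML, IQ-TREE and GARLI use richer models (for example GTR with gamma and invariant sites), numerical rescaling to avoid underflow, and heavily optimized code, and they search over topologies and branch lengths rather than scoring a single tree.

```python
import math

BASES = "ACGT"


def jc_prob(same, t):
    """JC69 probability of ending in the same (or one specific different) base
    after branch length t, measured in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node, site):
    """Felsenstein pruning: L[x] = P(data below node | node has base x).

    node: a leaf name (str) or a tuple of (child, branch_length) pairs.
    site: dict mapping leaf name -> observed base at this alignment column.
    """
    if isinstance(node, str):
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihoods(child, site)
        for x in range(4):
            # Sum over the unknown base y at the child end of the branch
            result[x] *= sum(jc_prob(x == y, t) * child_l[y] for y in range(4))
    return result


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods; JC69 base frequencies are all 1/4."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        root_l = partial_likelihoods(tree, site)
        total += math.log(sum(0.25 * lx for lx in root_l))
    return total


if __name__ == "__main__":
    tree = (((("A", 0.1), ("B", 0.1)), 0.05), ("C", 0.2))
    alignment = {"A": "ACGT", "B": "ACGA", "C": "ACTA"}
    print(log_likelihood(tree, alignment))
```

For a two-taxon tree, the result equals 1/4 times the JC69 probability for the summed branch length t₁ + t₂, which illustrates that under a reversible model the likelihood does not depend on where the root is placed.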

Bayesian Inference Methods

Bayesian inference methods in phylogenetics use Markov chain Monte Carlo (MCMC) sampling to approximate the posterior distribution of phylogenetic trees, combining prior probabilities with the likelihood of the sequence data and quantifying uncertainty through posterior probabilities of clades and parameters. Unlike maximum likelihood approaches, which optimize point estimates, these methods produce distributions that reflect uncertainty under the chosen model and provide credible intervals for branch lengths and divergence times. Key software in this category offers flexible prior specifications, such as a uniform prior on tree topologies, which gives every possible unrooted topology equal probability to represent ignorance about relationships.

MrBayes, released in 2001 by John Huelsenbeck and Fredrik Ronquist, implements Bayesian phylogenetic inference using Metropolis-coupled MCMC, built on the Metropolis-Hastings algorithm, to explore tree space and estimate posterior probabilities. It supports mixed evolutionary models, in which different data partitions (for example, genes or codon positions) evolve under distinct substitution models, improving fit for heterogeneous datasets. Convergence is assessed with diagnostics such as the average standard deviation of split frequencies between independent runs, where values below 0.01 are commonly taken to indicate convergence, together with effective sample size (ESS) calculations that check the number of effectively independent posterior samples; ESS > 200 is a common rule of thumb.

BEAST, introduced in 2002 by Andrew Rambaut, Alexei J. Drummond, and others, extends Bayesian analysis with molecular clocks for estimating divergence times, using relaxed clock models that allow rates to vary across branches while sampling trees and parameters from the posterior. Its StarBEAST extension applies the multispecies coalescent (MSC) model to infer species trees from multiple gene trees, accounting for incomplete lineage sorting by jointly estimating gene trees and species phylogenies. Like other Bayesian tools, BEAST relies on ESS for convergence monitoring: low ESS values signal poor mixing and call for longer runs or multiple chains.

BayesPhylogenies, developed around 2004, specializes in variable-rate models for heterogeneous data, using reversible-jump MCMC to move between models with different numbers of rate categories for traits or substitutions. This allows phylogenetic trees to be inferred while accommodating rate shifts, as in linguistic data, with rate priors often drawn from gamma distributions to reflect uncertainty in evolutionary tempo.

BAli-Phy, released in 2005 by Benjamin D. Redelings and Marc A. Suchard, co-estimates multiple sequence alignments and phylogenetic trees within a Bayesian framework, sampling from their joint posterior so that alignment uncertainty propagates into tree inference. Its MCMC explores alignments, trees, and substitution models simultaneously, improving accuracy for divergent sequences where alignment errors can bias the phylogeny.

Across these tools, marginal likelihood estimation is crucial for model comparison. Stepping-stone sampling estimates the marginal likelihood by bridging the prior and the posterior through a series of intermediate distributions, often yielding more accurate Bayes factors than thermodynamic integration. Prior specifications such as uniform distributions on topologies encode neutrality among trees, while ESS metrics guide users in discarding burn-in so that the retained samples come from the stationary distribution.
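The Metropolis-Hastings idea behind these samplers can be illustrated on a one-parameter problem: sampling the posterior of the JC69 distance between two sequences, given the number of sites and of differing sites. This Python sketch is purely illustrative. The exponential prior, random-walk step size, burn-in length and the crude ESS estimator are assumptions chosen for the example, and real programs use many proposal types over trees and parameters (MrBayes additionally couples several heated chains).

```python
import math
import random


def log_posterior(t, n_sites, n_diff, prior_mean=0.1):
    """Unnormalized log posterior of the JC69 distance t between two sequences.

    Identical sites contribute (1/4 + 3/4 e^{-4t/3}) and differing sites
    (1/4 - 1/4 e^{-4t/3}); constant factors are dropped. The exponential
    prior with mean prior_mean is an illustrative assumption.
    """
    if t <= 0:
        return -math.inf  # zero prior density outside t > 0
    x = -4.0 * t / 3.0
    log_lik = (n_sites - n_diff) * math.log(0.25 + 0.75 * math.exp(x))
    log_lik += n_diff * math.log(-0.25 * math.expm1(x))  # 1/4 - 1/4 e^x, computed stably
    return log_lik - t / prior_mean


def metropolis_hastings(n_sites, n_diff, n_iter=20000, step=0.05, seed=1):
    """Random-walk Metropolis sampler; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    t = 0.1
    current = log_posterior(t, n_sites, n_diff)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = t + rng.gauss(0.0, step)  # symmetric, so no Hastings correction
        new = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, new - current)):
            t, current = proposal, new
            accepted += 1
        samples.append(t)
    return samples, accepted / n_iter


def effective_sample_size(xs):
    """Crude ESS: N / (1 + 2 * sum of autocorrelations), truncating the sum
    at the first non-positive autocorrelation (a common simplification)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    acf_sum = 0.0
    for lag in range(1, n):
        c = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / (n * var)
        if c <= 0:
            break
        acf_sum += c
    return n / (1 + 2 * acf_sum)


if __name__ == "__main__":
    samples, acc = metropolis_hastings(n_sites=500, n_diff=50)
    kept = samples[2000:]  # discard burn-in
    print("posterior mean:", sum(kept) / len(kept))
    print("acceptance rate:", acc, "ESS:", effective_sample_size(kept))
```

With 50 differences in 500 sites, the posterior mean should land near the JC69 maximum likelihood distance of about 0.107, pulled slightly toward zero by the prior. Discarding early samples and checking that the ESS is comfortably above 200 mirrors the convergence checks described above.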

Visualization and Post-Analysis Tools

Visualization and post-analysis tools in phylogenetics let researchers render, explore, and interpret inferred phylogenetic trees, often together with support values such as bootstrap proportions or posterior probabilities from maximum likelihood or Bayesian inference. They support tasks such as tree editing, consensus computation, and comparative analysis, providing insight into evolutionary relationships beyond the initial tree construction. Trees written by inference software, typically in Newick or NEXUS format, serve as inputs to these platforms for producing publication-ready figures and quantitative comparisons.

FigTree, developed by Andrew Rambaut in 2006, is a Java-based graphical viewer for displaying and annotating phylogenetic trees in Newick and NEXUS formats. It offers customizable layouts, including radial and rectangular representations, options to label nodes with support values and branch lengths, and high-quality export to vector formats such as PDF for publication. It is particularly valued for its simplicity and flexibility in preparing trees for scientific communication without requiring programming knowledge.

Dendroscope, first released in 2006, is an interactive platform for visualizing large rooted phylogenetic trees and networks. It handles supertrees, computes consensus trees such as majority-rule summaries of multiple inferred topologies, and draws tanglegrams to compare pairs of trees, such as host and parasite phylogenies. Zooming, panning, and editing features support efficient navigation of datasets with thousands of taxa, making it suitable for exploratory post-analysis.

TreeView, written by Roderic D. M. Page in 1996, is a lightweight viewer for rooted and unrooted phylogenetic trees on Windows and Mac platforms. It displays bootstrap support values and branch lengths on trees imported from formats such as Newick and NEXUS, and it lets users reroot trees and adjust visual parameters for clarity. Though simpler than modern alternatives, its availability on both platforms has made it a longstanding choice for quick inspection of inference output.

Mesquite, started in 2001 by Wayne P. Maddison and David R. Maddison and still developed through the Mesquite Project, is a modular system for comparative phylogenetic analysis and evolutionary data management. It supports ancestral state reconstruction with methods such as squared-change parsimony to infer character evolution on trees, and it can compute consensus trees from sets of trees. Mesquite also draws tanglegrams for comparing topologies and calculates tree-distance metrics, including the Robinson-Foulds distance, which counts the bipartitions present in one tree but not the other. Its plugin-based extensibility allows tailored post-analysis workflows, emphasizing conceptual evolutionary insight over raw computation.
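The Robinson-Foulds distance can be computed by collecting each tree's non-trivial bipartitions (splits) and counting those that appear in only one of the two trees. The following is a minimal Python sketch, not Mesquite's implementation. It assumes both trees share the same taxon set and are given as nested tuples, a stand-in for parsed Newick, and it returns the raw count; some tools instead report a normalized variant.

```python
def leaves(tree):
    """Set of leaf names in a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaves(child) for child in tree))
    return frozenset([tree])


def splits(tree):
    """Non-trivial bipartitions of the (unrooted) tree.

    Each split is stored as the side that does not contain a fixed reference
    taxon, so a bipartition gets the same key however the tree is rooted.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    result = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (single-leaf) splits
            result.add(side)
        return clade

    walk(tree)
    return result


def robinson_foulds(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same taxa")
    return len(splits(t1) ^ splits(t2))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    t3 = (("A", "B"), ("C", ("D", "E")))  # same unrooted tree as t1, rooted differently
    print(robinson_foulds(t1, t2), robinson_foulds(t1, t3))  # 2 0
```

Trees t1 and t2 share the split {A,B,C}|{D,E} but differ in how A, B and C are grouped, giving a distance of 2, while t1 and t3 describe the same unrooted tree and are at distance 0.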
