List of phylogenetics software
From Wikipedia
This list of phylogenetics software is a compilation of computational phylogenetics software used to produce phylogenetic trees. Such tools are commonly used in comparative genomics, cladistics, and bioinformatics. Methods for estimating phylogenies include neighbor-joining, maximum parsimony (also simply referred to as parsimony), unweighted pair group method with arithmetic mean (UPGMA), Bayesian phylogenetic inference, maximum likelihood, and distance matrix methods.
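As a rough illustration of one of the methods named above, UPGMA can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used by any program in this list: the function name `upgma`, the `frozenset`-keyed distance table, and the four-taxon distances are all assumptions made for the example, and the function returns only a topology in Newick-style notation, without branch lengths.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return a Newick-style topology string.

    names: list of distinct taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    # Each active cluster maps its label to the number of taxa it contains.
    sizes = {name: 1 for name in names}
    d = dict(dist)  # working copy, so the caller's table is not modified

    while len(sizes) > 1:
        # Pick the closest pair of active clusters (first pair wins on ties).
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # The distance from the merged cluster to every remaining cluster is
        # the size-weighted arithmetic mean of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb

    return next(iter(sizes)) + ";"


# Hypothetical distances between four taxa, chosen only for illustration.
example_names = ["A", "B", "C", "D"]
example_dist = {
    frozenset("AB"): 2.0,
    frozenset("AC"): 6.0,
    frozenset("BC"): 6.0,
    frozenset("AD"): 10.0,
    frozenset("BD"): 10.0,
    frozenset("CD"): 10.0,
}
print(upgma(example_names, example_dist))  # prints (D,(C,(A,B)));
```

The size-weighted average is what distinguishes UPGMA from single- or complete-linkage clustering. The programs listed below add branch lengths, tie handling, and far more efficient data structures.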
List
| Name | Description | Methods | Author |
|---|---|---|---|
| ADMIXTOOLS[1] | R software package that contains the qpGraph, qpAdm, qpWave, and qpDstat programs | | Nick Patterson, David Reich |
| AncesTree[2] | An algorithm for clonal tree reconstruction from multi-sample cancer sequencing data. | Maximum Likelihood, Integer Linear Programming (ILP) | M. El-Kebir, L. Oesper, H. Acheson-Field, B. J. Raphael |
| AliGROOVE[3] | Visualisation of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support | Identification of single taxa which show predominately randomized sequence similarity in comparison with other taxa in a multiple sequence alignment and evaluation of the reliability of node support in a given topology | Patrick Kück, Sandra A Meid, Christian Groß, Bernhard Misof, Johann Wolfgang Wägele. |
| ape[4] | R-Project package for analysis of phylogenetics and evolution | Provides a large variety of phylogenetics functions | Maintainer: Emmanuel Paradis |
| Armadillo Workflow Platform[5] | Workflow platform dedicated to phylogenetic and general bioinformatic analysis | Inference of phylogenetic trees using Distance, Maximum Likelihood, Maximum Parsimony, Bayesian methods and related workflows | E. Lord, M. Leclercq, A. Boc, A.B. Diallo and V. Makarenkov |
| BAli-Phy[6] | Simultaneous Bayesian inference of alignment and phylogeny | Bayesian inference, alignment as well as tree search | M.A. Suchard, B. D. Redelings |
| BATWING[7] | Bayesian Analysis of Trees With Internal Node Generation | Bayesian inference, demographic history, population splits | I. J. Wilson, M. E. Weale, D. J. Balding |
| BayesPhylogenies[8] | Bayesian inference of trees using Markov chain Monte Carlo methods | Bayesian inference, multiple models, mixture model (auto-partitioning) | M. Pagel, A. Meade |
| BayesTraits[9] | Analyses trait evolution among groups of species for which a phylogeny or sample of phylogenies is available | Trait analysis | M. Pagel, A. Meade |
| BEAST[10] | Bayesian Evolutionary Analysis Sampling Trees | Bayesian inference, relaxed molecular clock, demographic history | A. J. Drummond, M. A. Suchard, D Xie & A. Rambaut |
| BioNumerics | Universal platform for the management, storage and analysis of all types of biological data, including tree and network inference of sequence data | Neighbor-joining, maximum parsimony, UPGMA, maximum likelihood, distance matrix methods,... Calculation of the reliability of trees/branches using bootstrapping, permutation resampling or error resampling | L. Vauterin & P. Vauterin. |
| Bosque | Integrated graphical software to perform phylogenetic analyses, from the importing of sequences to the plotting and graphical edition of trees and alignments | Distance and maximum likelihood methods (through PhyML, PHYLIP, Tree-Puzzle) | S. Ramirez, E. Rodriguez. |
| BUCKy[11] | Bayesian concordance of gene trees | Bayesian concordance using modified greedy consensus of unrooted quartets | C. Ané, B. Larget, D.A. Baum, S.D. Smith, A. Rokas and B. Larget, S.K. Kotha, C.N. Dewey |
| Canopy[12] | Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing | Maximum Likelihood, Markov Chain Monte Carlo (MCMC) methods | Y. Jiang, Y. Qiu, A. J. Minn, and N. R. Zhang |
| CGRphylo[13] | CGR method for accurate classification and tracking of rapidly evolving viruses | Chaos Game Representation (CGR) method, based on concepts of statistical physics | Amarinder Singh Thind, Somdatta Sinha |
| CITUP | Clonality Inference in Tumors Using Phylogeny | Exhaustive search, Quadratic Integer Programming (QIP) | S. Malikic, A.W. McPherson, N. Donmez, C.S. Sahinalp |
| CladoGraph | A user-friendly tool for students, teachers, and researchers to explore evolutionary relationships between different species | Trait analysis | Pedro Andrade Giroldo |
| ClustalW | Progressive multiple sequence alignment | Distance matrix/nearest neighbor | Thompson et al.[14] |
| CoalEvol | Simulation of DNA and protein evolution along phylogenetic trees (that can also be simulated with the coalescent) | Simulation of multiple sequence alignments of DNA or protein sequences | M. Arenas, D. Posada |
| CodABC | Coestimation of substitution, recombination and dN/dS in protein sequences | Approximate Bayesian computation | M. Arenas, J.S. Lopes, M.A. Beaumont, D. Posada |
| Dendroscope[15] | Tool for visualizing rooted trees and calculating rooted networks | Rooted trees, tanglegrams, consensus networks, hybridization networks | Daniel Huson et al. |
| EXACT[16][17] | EXACT is based on the perfect phylogeny model, and uses a very fast homotopy algorithm to evaluate the fitness of different trees, and then it brute forces the tree search using GPUs, or multiple CPUs, on the same or on different machines | Brute force search and homotopy algorithm | Jia B., Ray S., Safavi S., Bento J. |
| EzEditor[18] | EzEditor is a java-based sequence alignment editor for rRNA and protein coding genes. It allows manipulation of both DNA and protein sequence alignments for phylogenetic analysis | Multiple sequence alignment and editing | Y.-S. Jeon, K. Lee, S.-C. Park, B.-S. Kim, Y.-J. Cho, S.-M. Ha, and J. Chun |
| fastDNAml | Optimized maximum likelihood (nucleotides only) | Maximum likelihood | G.J. Olsen |
| FastTree 2[19] | Fast phylogenetic inference for alignments with up to hundreds of thousands of sequences | Approximate maximum likelihood | M.N. Price, P.S. Dehal, A.P. Arkin |
| fitmodel | Fits branch-site codon models without the need of prior knowledge of clades undergoing positive selection | Maximum likelihood | S. Guindon |
| Geneious | Geneious provides genome and proteome research tools | Neighbor-joining, UPGMA, MrBayes plugin, PhyML plugin, RAxML plugin, FastTree plugin, GARLi plugin, PAUP* Plugin | A. J. Drummond, M.Suchard, V.Lefort et al. |
| HyPhy | Hypothesis testing using phylogenies | Maximum likelihood, neighbor-joining, clustering techniques, distance matrices | S.L. Kosakovsky Pond, S.D.W. Frost, S.V. Muse |
| INDELible[20] | Simulation of DNA/protein sequence evolution | Simulation | W. Fletcher, Z. Yang |
| IQPNNI (No longer maintained; superseded by IQ-TREE)[21] | Iterative ML treesearch with stopping rule | Maximum likelihood, neighbor-joining | L.S. Vinh, A. von Haeseler, B.Q. Minh |
| IQ-Tree[22][23] | An efficient phylogenomic software by maximum likelihood, as successor of IQPNNI and Tree-Puzzle | Maximum likelihood, model selection, partitioning scheme finding, AIC, AICc, BIC, ultrafast bootstrapping,[24] branch tests, tree topology tests, likelihood mapping | Lam-Tung Nguyen, O. Chernomor, H.A. Schmidt, A. von Haeseler, B.Q. Minh |
| jModelTest 2 | A high-performance computing program to carry out statistical selection of best-fit models of nucleotide substitution | Maximum likelihood, AIC, BIC, DT, hLTR, dLTR | D. Darriba, GL. Taboada, R. Doallo, D. Posada |
| JolyTree[25][26] | An alignment-free bioinformatics procedure to infer distance-based phylogenetic trees from genome assemblies, specifically designed to quickly infer trees from genomes belonging to the same genus | MinHash-based pairwise genome distance, Balanced Minimum Evolution (BME), ratchet-based BME tree search, Rate of Elementary Quartets | A. Criscuolo |
| LisBeth | Three-item analysis for phylogenetics and biogeography | Three-item analysis | J. Ducasse, N. Cao & R. Zaragüeta-Bagils |
| MEGA | Molecular Evolutionary Genetics Analysis | Distance, Parsimony and Maximum Composite Likelihood Methods | Tamura K, Dudley J, Nei M & Kumar S |
| MegAlign Pro | MegAlign Pro is part of DNASTAR's Lasergene Molecular Biology package. This application performs multiple and pairwise sequence alignments, provides alignment editing, and generates phylogenetic trees. | Maximum Likelihood (RAxML) and Neighbor-Joining | DNASTAR |
| Mesquite | Mesquite is software for evolutionary biology, designed to help biologists analyze comparative data about organisms. Its emphasis is on phylogenetic analysis, but some of its modules concern comparative analyses or population genetics, while others do non-phylogenetic multivariate analysis. It can also be used to build timetrees incorporating a geological timescale, with some optional modules. | Maximum parsimony, distance matrix, maximum likelihood | Wayne Maddison and D. R. Maddison |
| MetaPIGA2 | Maximum likelihood phylogeny inference multi-core program for DNA and protein sequences, and morphological data. Analyses can be performed using an extensive and user-friendly graphical interface or by using batch files. It also implements tree visualization tools, ancestral sequences, and automated selection of best substitution model and parameters. | Maximum likelihood, stochastic heuristics (genetic algorithm, metapopulation genetic algorithm, simulated annealing, etc.), discrete Gamma rate heterogeneity, ancestral state reconstruction, model testing | Michel C. Milinkovitch and Raphaël Helaers |
| MicrobeTrace | MicrobeTrace is a free, browser-based web application. | 2D and 3D network visualization tool, neighbor-joining tree visualization, Gantt charts, bubble charts, networks visualized on maps, flow diagrams, aggregate tables, epi curves, histograms, and alignment viewer | Ellsworth M. Campbell, Anthony Boyles, Anupama Shankar, Jay Kim, Sergey Knyazev, Roxana Cintron, William M. Switzer[27] |
| MNHN-Tree-Tools | MNHN-Tree-Tools is open-source phylogenetic inference software that works on nucleic acid and protein sequences. | Clustering of DNA or protein sequences and phylogenetic tree inference from a set of sequences. At its core it employs a distance-density based approach. | Thomas Haschka, Loïc Ponger, Christophe Escudé and Julien Mozziconacci[28] |
| Modelgenerator | Model selection (protein or nucleotide) | Maximum likelihood | Thomas Keane |
| MOLPHY | Molecular phylogenetics (protein or nucleotide) | Maximum likelihood | J. Adachi and M. Hasegawa |
| MorphoBank | Web application to organize trait data (morphological characters) for tree building | For use with maximum parsimony (via the CIPRES portal), maximum likelihood, and Bayesian analysis | O'Leary, M. A., and S. Kaufman,[29] also K. Alphonse |
| MrBayes | Posterior probability estimation | Bayesian inference | J. Huelsenbeck, et al.[30] |
| Network | Free Phylogenetic Network Software | Median Joining, Reduced Median, Steiner Network | A. Roehl |
| Nona | Phylogenetic inference | Maximum parsimony, implied weighting, ratchet | P. Goloboff |
| OrientAGraph | Admixture graph reconstruction from allele frequencies | f2-statistics or covariance matrix, maximum likelihood network orientation search implemented within TreeMix [31] | Erin Molloy, Arun Durvasula, Sriram Sankararaman [32] |
| PAML[33] | Phylogenetic analysis by maximum likelihood | Maximum likelihood and Bayesian inference | Z. Yang |
| ParaPhylo[34] | Computation of gene and species trees based on event-relations (orthology, paralogy) | Cograph-Editing and Triple-Inference | Hellmuth |
| PartitionFinder[35] | Combined selection of models of molecular evolution and partitioning schemes for DNA and protein alignments | Maximum likelihood, AIC, AICc, BIC | R. Lanfear, B Calcott, SYW Ho, S Guindon |
| PASTIS | R package for phylogenetic assembly | R, two‐stage Bayesian inference using MrBayes 3.2 | Thomas et al. 2013[36] |
| PAUP* | Phylogenetic analysis using parsimony (*and other methods) | Maximum parsimony, distance matrix, maximum likelihood | D. Swofford |
| phangorn[37] | Phylogenetic analysis in R | ML, MP, distance matrix, bootstrap, phylogenetic networks, model selection, SH-test, SOWH-test | Maintainer: K. Schliep |
| Phybase[38] | an R package for species tree analysis | phylogenetics functions, STAR, NJst, STEAC, maxtree, etc | L. Liu & L. Yu |
| phyclust | Phylogenetic Clustering (Phyloclustering) | Maximum likelihood of finite mixture models | Wei-Chen Chen |
| PHYLIP | PHYLogeny Inference Package | Maximum parsimony, distance matrix, maximum likelihood | J. Felsenstein |
| phyloT | Generates phylogenetic trees in various formats, based on NCBI taxonomy | none | I. Letunic |
| PhyloQuart | Quartet implementation (uses sequences or distances) | Quartet method | V. Berry |
| PhyloWGS | Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors | MCMC | A. G. Deshwar, S. Vembu, C. K. Yung, G. H. Jang, L. Stein, and Q. Morris |
| PhyML[39] | Fast and accurate estimation of phylogenies using maximum likelihood | Maximum likelihood | S. Guindon & O. Gascuel |
| phyx[40] | Unix/Linux command line phylogenetic tools | Explore, manipulate, analyze, and simulate phylogenetic objects (alignments, trees, and MCMC logs) | J.W. Brown, J.F. Walker, and S.A. Smith |
| POY | A phylogenetic analysis program that supports multiple kinds of data and can perform alignment and phylogeny inference. A variety of heuristic algorithms have been developed for this purpose | Maximum parsimony, maximum likelihood, chromosome rearrangement, discrete characters, continuous characters, alignment | A. Varon, N. Lucaroni, L. Hong, W. Wheeler |
| ProtASR2[41] | Ancestral reconstruction of protein sequences accounting for folding stability | Maximum likelihood, substitution models | M. Arenas, U. Bastolla |
| ProtEvol | Simulation of protein sequences under structurally constrained substitution models | Simulating sequences, substitution models | M. Arenas, A. Sanchez-Cobos, U. Bastolla |
| ProteinEvolver | Simulation of protein sequences along phylogenies under empirical and structurally constrained substitution models of protein evolution | Simulating sequences forward in time, substitution models | M. Arenas, H.G. Dos Santos, D. Posada, U. Bastolla |
| ProteinEvolverABC[42] | Coestimation of recombination and substitution rates in protein sequences | Approximate Bayesian computation | M. Arenas |
| ProteinModelerABC[43] | Selection among site-dependent structurally constrained substitution models of protein evolution | Approximate Bayesian computation | D. Ferreiro et al |
| ProtTest3 | A high-performance computing program for selecting the model of protein evolution that best fits a given set of aligned sequences | Maximum likelihood, AIC, BIC, DT | D. Darriba, GL. Taboada, R. Doallo, D. Posada |
| PyCogent | Software library for genomic biology | Simulating sequences, alignment, controlling third party applications, workflows, querying databases, generating graphics and phylogenetic trees | Knight et al. |
| QuickTree | Tree construction optimized for efficiency | Neighbor-joining | K. Howe, A. Bateman, R. Durbin |
| RAxML-HPC | Randomized Axelerated Maximum Likelihood for High Performance Computing (nucleotides and amino acids) | Maximum likelihood, simple maximum parsimony | A. Stamatakis |
| RAxML-NG[44] | Randomized Axelerated Maximum Likelihood for High Performance Computing (nucleotides and amino acids) Next Generation | Maximum likelihood, simple maximum parsimony | A. Kozlov, D. Darriba, T. Flouri, B. Morel, A. Stamatakis |
| RevBayes[45] | RevBayes provides an interactive environment for statistical computation in phylogenetics. It is primarily intended for modeling, simulation, and Bayesian inference in evolutionary biology, particularly phylogenetics. However, the environment is quite general and can be useful for many complex modeling tasks. | Bayesian inference | S. Höhna et al.[46] |
| SEMPHY | Tree reconstruction using the combined strengths of maximum-likelihood (accuracy) and neighbor-joining (speed). SEMPHY has become outdated. The authors now refer users to RAxML, which is superior in accuracy and speed. | A hybrid maximum-likelihood – neighbor-joining method | M. Ninio, E. Privman, T. Pupko, N. Friedman |
| SGWE | Simulation of genome-wide evolution along phylogenetic trees | Simulating genome-wide sequences forward in time | M. Arenas, D. Posada |
| SimPlot++[47] | Sequence similarity plots (SimPlots[48]), detection of intragenic and intergenic recombination events, bootscan analysis[49] and sequence similarity networks | SimPlot using different nucleotide/protein distance models; Phi, χ2 and NSS recombination tests; Sequence similarity network analysis | S. Samson, E. Lord, V. Makarenkov |
| sowhat[50] | Hypothesis testing | SOWH test | Samuel H Church, Joseph F Ryan, and Casey W Dunn |
| Splatche3[51] | Simulation of genetic data under diverse spatially explicit evolutionary scenarios | Coalescent, molecular evolution, DNA sequences, SNPs, STRs, RFLPs | M. Currat et al. |
| SplitsTree[52] | Tree and network program | Computation, visualization and exploration of phylogenetic trees and networks | D.H. Huson and D. Bryant |
| TNT[53] | Phylogenetic inference | Parsimony, weighting, ratchet, tree drift, tree fusing, sectorial searches | P. A. Goloboff, J. S. Farris, and K. C. Nixon |
| TOPALi | Phylogenetic inference | Phylogenetic model selection, Bayesian analysis and Maximum Likelihood phylogenetic tree estimation, detection of sites under positive selection, and recombination breakpoint location analysis | Iain Milne, Dominik Lindner et al. |
| TreeGen | Tree construction given precomputed distance data | Distance matrix | ETH Zurich |
| TreeAlign | Efficient hybrid method | Distance matrix and approximate parsimony | J. Hein |
| TreeLine | Tree construction algorithm within the DECIPHER package for R | Maximum likelihood, maximum parsimony, and distance | E. Wright |
| Treefinder[54] | Fast ML tree reconstruction, bootstrap analysis, model selection, hypothesis testing, tree calibration, tree manipulation and visualization, computation of sitewise rates, sequence simulation, many models of evolution (DNA, protein, rRNA, mixed protein, user-definable), GUI and scripting language | Maximum likelihood, distances, and others | Jobb G, von Haeseler A, Strimmer K |
| TreeMix | Admixture graph reconstruction from allele frequencies | f2-statistics or covariance matrix, maximum likelihood, heuristic search (building tree via randomized taxon addition and then adding admixture edges) | Joseph K. Pickrell and Jonathan K. Pritchard [55] |
| Tree-Puzzle[56] (No longer maintained; superseded by IQ-TREE)[57] | Maximum likelihood and statistical analysis | Maximum likelihood | H. A. Schmidt, K. Strimmer, M. Vingron, and A. von Haeseler |
| TREE-QMC | Summarizes unrooted gene trees into unrooted species tree | Graph-cut-based heuristic for maximum quartet support species tree problem [58][59] | Yunheng Han, Erin Molloy [60][61] |
| T-REX (Webserver)[62][63] | Tree inference and visualization, Horizontal gene transfer detection, multiple sequence alignment | Distance (neighbor joining), Parsimony and Maximum likelihood (PhyML, RAxML) tree inference, MUSCLE, MAFFT and ClustalW sequence alignments and related applications | Boc A, Diallo AB, Makarenkov V |
| UShER[64] | Phylogenetic placement using maximum parsimony for viral genomes | Maximum parsimony | Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D and Corbett-Detig R |
| UGENE | Fast and free multiplatform tree editor | GUI with PHYLIP 3.6 and IQTree algorithms | Unipro |
| VeryFastTree[65] | A highly-tuned tool that uses parallelizing and vectorizing strategies to speed inference of phylogenies for huge alignments | Approximate maximum likelihood | César Piñeiro, José M. Abuín and Juan C. Pichel |
| Winclada | GUI and tree editor (requires Nona) | Maximum parsimony, ratchet | K. Nixon |
| Xrate | Phylo-grammar engine | Rate estimation, branch length estimation, alignment annotation | I. Holmes |
References
- ^ Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D (November 2012). "Ancient admixture in human history". Genetics. 192 (3): 1065–93. doi:10.1534/genetics.112.145037. PMC 3522152. PMID 22960212.
- ^ El-Kebir M, Oesper L, Acheson-Field H, Raphael BJ (June 2015). "Reconstruction of clonal trees and tumor composition from multi-sample sequencing data". Bioinformatics. 31 (12): i62-70. doi:10.1093/bioinformatics/btv261. PMC 4542783. PMID 26072510.
- ^ Kück P, Meid SA, Groß C, Wägele JW, Misof B (August 2014). "AliGROOVE--visualization of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support". BMC Bioinformatics. 15 (1): 294. doi:10.1186/1471-2105-15-294. PMC 4167143. PMID 25176556.
- ^ Paradis E, Claude J, Strimmer K (January 2004). "APE: Analyses of Phylogenetics and Evolution in R language". Bioinformatics. 20 (2). Oxford, England: 289–90. doi:10.1093/bioinformatics/btg412. PMID 14734327.
- ^ Lord E, Leclercq M, Boc A, Diallo AB, Makarenkov V (2012). "Armadillo 1.1: an original workflow platform for designing and conducting phylogenetic analysis and simulations". PLOS One. 7 (1) e29903. Bibcode:2012PLoSO...729903L. doi:10.1371/journal.pone.0029903. PMC 3256230. PMID 22253821.
- ^ Suchard MA, Redelings BD (August 2006). "BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny". Bioinformatics. 22 (16): 2047–8. doi:10.1093/bioinformatics/btl175. PMID 16679334.
- ^ Wilson IJ, Weale ME, Balding DJ (June 2003). "Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities". Journal of the Royal Statistical Society, Series A (Statistics in Society). 166 (2): 155–88. doi:10.1111/1467-985X.00264.
- ^ Pagel M, Meade A (2007), BayesPhylogenies 1.0. Software distributed by the authors.
- ^ Pagel M, Meade A (2007). "BayesTraits. Computer program and documentation". pp. 1216–23.
- ^ Drummond A, Suchard MA, Xie D, Rambaut A (2012). "Bayesian phylogenetics with BEAUti and the BEAST 1.7". Molecular Biology and Evolution. 29 (8): 1969–1973. doi:10.1093/molbev/mss075. PMC 3408070. PMID 22367748.
- ^ Larget, Bret R.; Kotha, Satish K.; Dewey, Colin N.; Ané, Cécile (September 2010). "BUCKy: Gene tree/species tree reconciliation with Bayesian concordance analysis". Bioinformatics. 26 (22): 2910–2911. doi:10.1093/bioinformatics/btq539. PMID 20861028.
- ^ Jiang Y, Qiu Y, Minn AJ, Zhang NR (September 2016). "Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing". Proceedings of the National Academy of Sciences of the United States of America. 113 (37): E5528-37. Bibcode:2016PNAS..113E5528J. doi:10.1073/pnas.1522203113. PMC 5027458. PMID 27573852.
- ^ Thind, Amarinder Singh; Sinha, Somdatta (2023). "Using Chaos-Game-Representation for Analysing the SARS-CoV-2 Lineages, Newly Emerging Strains and Recombinants". Current Genomics. 24 (3): 187–195. doi:10.2174/0113892029264990231013112156. PMC 10761335. PMID 38178984. S2CID 264500732.
- ^ Thompson, Julie D.; Gibson, Toby J.; Higgins, Des G. (August 2002). "Multiple sequence alignment using ClustalW and ClustalX". Current Protocols in Bioinformatics. Chapter 2: 2.3.1–2.3.22. doi:10.1002/0471250953.bi0203s00. ISSN 1934-340X. PMID 18792934. S2CID 34156490.
- ^ Huson DH, Scornavacca C (December 2012). "Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks". Systematic Biology. 61 (6): 1061–7. doi:10.1093/sysbio/sys062. PMID 22780991.
- ^ Jia B, Ray S, Safavi S, Bento J (2018). "Efficient Projection onto the Perfect Phylogeny Model". In Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds.). Advances in Neural Information Processing Systems 31 (NeurIPS 2018). pp. 4108–4118.
- ^ Ray S, Jia B, Safavi S, Opijnen T, Isberg R, Rosch J, Bento J. Exact inference under the perfect phylogeny model. arXiv:1908.08623.
- ^ Jeon YS, Lee K, Park SC, Kim BS, Cho YJ, Ha SM, Chun J (February 2014). "EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes". International Journal of Systematic and Evolutionary Microbiology. 64 (Pt 2): 689–91. doi:10.1099/ijs.0.059360-0. PMID 24425826.
- ^ Price MN, Dehal PS, Arkin AP (March 2010). "FastTree 2--approximately maximum-likelihood trees for large alignments". PLOS One. 5 (3) e9490. Bibcode:2010PLoSO...5.9490P. doi:10.1371/journal.pone.0009490. PMC 2835736. PMID 20224823.
- ^ Fletcher, William; Yang, Ziheng (2009-08-01). "INDELible: A Flexible Simulator of Biological Sequence Evolution". Molecular Biology and Evolution. 26 (8): 1879–1888. doi:10.1093/molbev/msp098. ISSN 0737-4038. PMC 2712615. PMID 19423664.
- ^ "IQPNNI - Important Quartet Puzzling and Nearest Neighbor Interchange". Wien, Austria: University of Vienna. 20 August 2010. Retrieved 5 February 2025.
- ^ Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ (January 2015). "IQ-Tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies". Molecular Biology and Evolution. 32 (1): 268–74. doi:10.1093/molbev/msu300. PMC 4271533. PMID 25371430.
- ^ Minh, Bui Quang; Schmidt, Heiko A; Chernomor, Olga; Schrempf, Dominik; Woodhams, Michael D; von Haeseler, Arndt; Lanfear, Robert (February 2020). "IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era". Molecular Biology and Evolution. 37 (5): 1530–1534. doi:10.1093/molbev/msaa015. hdl:1885/212335. ISSN 0737-4038. PMC 7182206. PMID 32011700.
- ^ Minh BQ, Nguyen MA, von Haeseler A (May 2013). "Ultrafast approximation for phylogenetic bootstrap". Molecular Biology and Evolution. 30 (5): 1188–95. doi:10.1093/molbev/mst024. PMC 3670741. PMID 23418397.
- ^ Criscuolo A (June 2019). "A fast alignment-free bioinformatics procedure to infer accurate distance-based phylogenetic trees from genome assemblies". Research Ideas and Outcomes. 5 e36178. doi:10.3897/rio.5.e36178. S2CID 196180156.
- ^ Criscuolo A (November 2020). "On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference". F1000Research. 9: 1309. doi:10.12688/f1000research.26930.1. PMC 7713896. PMID 33335719.
- ^ Campbell, Ellsworth M.; Boyles, Anthony; Shankar, Anupama; Kim, Jay; Knyazev, Sergey; Cintron, Roxana; Switzer, William M. (2021-09-07). "MicrobeTrace: Retooling molecular epidemiology for rapid public health response". PLOS Computational Biology. 17 (9) e1009300. Bibcode:2021PLSCB..17E9300C. doi:10.1371/journal.pcbi.1009300. ISSN 1553-7358. PMC 8491948. PMID 34492010.
- ^ Haschka, Thomas; Ponger, Loic; Escudé, Christophe; Mozziconacci, Julien (2021-06-08). "MNHN-Tree-Tools: a toolbox for tree inference using multi-scale clustering of a set of sequences". Bioinformatics. 37 (21): 3947–3949. doi:10.1093/bioinformatics/btab430. ISSN 1367-4803. PMID 34100911.
- ^ O'Leary, Maureen A.; Kaufman, Seth (October 2011). "MorphoBank: phylophenomics in the "cloud"". Cladistics. 27 (5): 529–537. doi:10.1111/j.1096-0031.2011.00355.x. PMID 34875801. S2CID 76652345.
- ^ Huelsenbeck, J. P.; Ronquist, F. (August 2001). "MRBAYES: Bayesian inference of phylogenetic trees". Bioinformatics. 17 (8): 754–755. doi:10.1093/bioinformatics/17.8.754. ISSN 1367-4803. PMID 11524383.
- ^ Pickrell, Joseph K.; Pritchard, Jonathan K. (15 November 2012). "Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data". PLOS Genetics. 8 (11) e1002967. doi:10.1371/journal.pgen.1002967. PMC 3499260. PMID 23166502.
- ^ Molloy, Erin K; Durvasula, Arun; Sankararaman, Sriram (12 July 2021). "Advancing admixture graph estimation via maximum likelihood network orientation". Bioinformatics. 37 (Supplement_1): i142–i150. doi:10.1093/bioinformatics/btab267. PMC 8336447.
- ^ Yang, Ziheng (May 2007). "PAML 4: Phylogenetic Analysis by Maximum Likelihood". Molecular Biology and Evolution. 24 (8): 1586–1591. doi:10.1093/molbev/msm088. ISSN 0737-4038. PMID 17483113.
- ^ Hellmuth M, Wieseke N, Lechner M, Lenhof HP, Middendorf M, Stadler PF (February 2015). "Phylogenomics with paralogs". Proceedings of the National Academy of Sciences of the United States of America. 112 (7): 2058–63. arXiv:1712.06442. Bibcode:2015PNAS..112.2058H. doi:10.1073/pnas.1412770112. PMC 4343152. PMID 25646426.
- ^ Lanfear, Robert; Frandsen, Paul B; Wright, April M; Senfeld, Tereza; Calcott, Brett (24 December 2016). "PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses". Molecular Biology and Evolution. 34 (3): 772–773. doi:10.1093/molbev/msw260. hdl:2027.42/145562. ISSN 0737-4038. PMID 28013191.
- ^ Thomas, Gavin H.; Hartmann, Klaas; Jetz, Walter; Joy, Jeffrey B.; Mimoto, Aki; Mooers, Arne O. (2013). "PASTIS: an R package to facilitate phylogenetic assembly with soft taxonomic inferences". Methods in Ecology and Evolution. 4 (11): 1011–1017. Bibcode:2013MEcEv...4.1011T. doi:10.1111/2041-210X.12117. ISSN 2041-210X. S2CID 86694418.
- ^ Schliep KP (February 2011). "phangorn: phylogenetic analysis in R". Bioinformatics. 27 (4): 592–3. doi:10.1093/bioinformatics/btq706. PMC 3035803. PMID 21169378.
- ^ Liu L, Yu L (April 2010). "Phybase: an R package for species tree analysis". Bioinformatics. 26 (7): 962–3. doi:10.1093/bioinformatics/btq062. PMID 20156990.
- ^ Guindon, Stéphane; Dufayard, Jean-François; Lefort, Vincent; Anisimova, Maria; Hordijk, Wim; Gascuel, Olivier (2010-03-29). "New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0". Systematic Biology. 59 (3): 307–321. doi:10.1093/sysbio/syq010. hdl:20.500.11850/25281. ISSN 1076-836X. PMID 20525638.
- ^ Brown JW, Walker JF, Smith SA (June 2017). "Phyx: phylogenetic tools for unix". Bioinformatics. 33 (12): 1886–1888. doi:10.1093/bioinformatics/btx063. PMC 5870855. PMID 28174903.
- ^ Arenas, Miguel; Bastolla, Ugo (2020). "ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability". Methods in Ecology and Evolution. 11 (2): 248–257. Bibcode:2020MEcEv..11..248A. doi:10.1111/2041-210X.13341. ISSN 2041-210X.
- ^ Arenas, Miguel (2021-08-27). "ProteinEvolverABC: coestimation of recombination and substitution rates in protein sequences by approximate Bayesian computation". Bioinformatics. 38 (1): 58–64. doi:10.1093/bioinformatics/btab617. ISSN 1367-4803. PMC 8696103. PMID 34450622.
- ^ Ferreiro, David; Branco, Catarina; Arenas, Miguel (2024). "Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation". Bioinformatics. 40 (3) btae096. doi:10.1093/bioinformatics/btae096. ISSN 1367-4811. PMC 10914458. PMID 38374231.
- ^ Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A (May 2019). "RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference". Bioinformatics. 35 (21): 4453–4455. doi:10.1093/bioinformatics/btz305. PMC 6821337. PMID 31070718.
- ^ Höhna, Sebastian; Landis, Michael J.; Heath, Tracy A.; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R.; Huelsenbeck, John P.; Ronquist, Fredrik (July 2016). "RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language". Systematic Biology. 65 (4): 726–736. doi:10.1093/sysbio/syw021. ISSN 1063-5157. PMC 4911942. PMID 27235697.
- ^ "revbayes". GitHub. Retrieved 2025-09-23.
- ^ Samson, Stéphane; Lord, Étienne; Makarenkov, Vladimir (26 May 2022). "SimPlot++: a Python application for representing sequence similarity and detecting recombination". Bioinformatics. 38 (11): 3118–3120. arXiv:2112.09755. doi:10.1093/bioinformatics/btac287. PMID 35451456.
- ^ Lole, Kavita S.; Bollinger, Robert C.; Paranjape, Ramesh S.; Gadkari, Deepak; Kulkarni, Smita S.; Novak, Nicole G.; Ingersoll, Roxann; Sheppard, Haynes W.; Ray, Stuart C. (January 1999). "Full-Length Human Immunodeficiency Virus Type 1 Genomes from Subtype C-Infected Seroconverters in India, with Evidence of Intersubtype Recombination". Journal of Virology. 73 (1): 152–160. doi:10.1128/JVI.73.1.152-160.1999. PMC 103818. PMID 9847317.
- ^ Salminen, Mika O.; Carr, Jean K.; Burke, Donald S.; McCutchan, Francine E. (November 1995). "Identification of Breakpoints in Intergenotypic Recombinants of HIV Type 1 by Bootscanning". AIDS Research and Human Retroviruses. 11 (11): 1423–1425. doi:10.1089/aid.1995.11.1423. PMID 8573403.
- ^ Church SH, Ryan JF, Dunn CW (November 2015). "Automation and Evaluation of the SOWH Test with SOWHAT". Systematic Biology. 64 (6): 1048–58. doi:10.1093/sysbio/syv055. PMC 4604836. PMID 26231182.
- ^ Currat, Mathias; Arenas, Miguel; Quilodràn, Claudio S; Excoffier, Laurent; Ray, Nicolas (2019-05-11). "SPLATCHE3: simulation of serial genetic data under spatially explicit evolutionary scenarios including long-distance dispersal". Bioinformatics. 35 (21): 4480–4483. doi:10.1093/bioinformatics/btz311. ISSN 1367-4803. PMC 6821363. PMID 31077292.
- ^ Huson DH, Bryant D (February 2006). "Application of phylogenetic networks in evolutionary studies". Molecular Biology and Evolution. 23 (2): 254–67. doi:10.1093/molbev/msj030. PMID 16221896.
- ^ Goloboff, Pablo A.; Farris, James S.; Nixon, Kevin C. (29 September 2008). "TNT, a free program for phylogenetic analysis". Cladistics. 24 (5): 774–786. doi:10.1111/j.1096-0031.2008.00217.x. hdl:11336/81790.
- ^ Jobb G, von Haeseler A, Strimmer K (June 2004). "Treefinder: a powerful graphical analysis environment for molecular phylogenetics". BMC Evolutionary Biology. 4: 18. doi:10.1186/1471-2148-4-18. PMC 459214. PMID 15222900. (Retracted, see doi:10.1186/s12862-015-0513-z, PMID 26542699, Retraction Watch)
- ^ Pickrell, Joseph K.; Pritchard, Jonathan K. (15 November 2012). "Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data". PLOS Genetics. 8 (11) e1002967. doi:10.1371/journal.pgen.1002967. PMC 3499260. PMID 23166502.
- ^ Schmidt HA, Strimmer K, Vingron M, von Haeseler A (March 2002). "Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing". Bioinformatics. 18 (3): 502–4. doi:10.1093/bioinformatics/18.3.502. PMID 11934758.
- ^ "iqtree2". GitHub. Retrieved 5 February 2025.
- ^ Snir, Sagi; Rao, Satish (1 January 2012). "Quartet MaxCut: A fast algorithm for amalgamating quartet trees". Molecular Phylogenetics and Evolution. 62 (1): 1–8. Bibcode:2012MolPE..62....1S. doi:10.1016/j.ympev.2011.06.021. PMID 21762785.
- ^ Avni, Eliran; Cohen, Reuven; Snir, Sagi (1 March 2015). "Weighted Quartets Phylogenetics". Systematic Biology. 64 (2): 233–242. doi:10.1093/sysbio/syu087. PMID 25414175.
- ^ Han, Yunheng; Molloy, Erin K (25 February 2025). "Improved robustness to gene tree incompleteness, estimation errors, and systematic homology errors with weighted TREE-QMC". Systematic Biology syaf009. doi:10.1093/sysbio/syaf009. PMID 40000439.
- ^ Han, Yunheng; Molloy, Erin K. (1 July 2023). "Improving quartet graph construction for scalable and accurate species tree estimation from gene trees". Genome Research. 33 (7): 1042–1052. doi:10.1101/gr.277629.122. PMC 10538498. PMID 37197990.
- ^ Makarenkov V (July 2001). "T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks". Bioinformatics. 17 (7): 664–8. doi:10.1093/bioinformatics/17.7.664. PMID 11448889.
- ^ Boc A, Diallo AB, Makarenkov V (July 2012). "T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks". Nucleic Acids Research. 40 (Web Server issue): W573–9. doi:10.1093/nar/gks485. PMC 3394261. PMID 22675075.
- ^ Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D, Corbett-Detig R (June 2021). "Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time Phylogenetics for the SARS-CoV-2 Pandemic". Nature Genetics. 53 (6): 809–816. doi:10.1038/s41588-021-00862-7. PMC 9248294. PMID 33972780.
- ^ Piñeiro, César; Abuín, José M; Pichel, Juan C (2020-11-01). Ponty, Yann (ed.). "Very Fast Tree: speeding up the estimation of phylogenies for large alignments through parallelization and vectorization strategies". Bioinformatics. 36 (17): 4658–4659. doi:10.1093/bioinformatics/btaa582. ISSN 1367-4803. PMID 32573652.
External links
- Complete list of Institut Pasteur phylogeny webservers
- ExPASy List of phylogenetics programs
- A very comprehensive list of phylogenetic tools (reconstruction, visualization, etc.)
- Another list of evolutionary genetics software
- A list of phylogenetic software provided by the Zoological Research Museum A. Koenig
- MicrobeTrace available at https://github.com/CDCgov/MicrobeTrace/wiki
List of phylogenetics software
Overview
Definition and Scope
Phylogenetics software encompasses computational tools that infer evolutionary relationships among organisms or taxa by reconstructing phylogenetic trees from biological data. These trees model the branching patterns of descent with modification, illustrating how species, genes, or other entities have diverged over time from common ancestors. The software processes diverse data types, including molecular sequences such as DNA, RNA, or proteins, as well as morphological traits like anatomical structures, to estimate these relationships through statistical and algorithmic methods.[6][7]

The scope of phylogenetics software extends to key stages of analysis, including sequence alignment to prepare homologous data, tree construction via approaches like distance matrices or likelihood models, evolutionary model selection to account for substitution rates and patterns, and validation techniques such as bootstrap resampling to assess tree robustness. These tools support interdisciplinary applications in comparative genomics for elucidating gene family evolution, systematics for taxonomic classification, and epidemiology for reconstructing transmission histories of pathogens.[8][6]

Input data for phylogenetics software typically includes aligned molecular sequences in formats like FASTA or PHYLIP, or precomputed distance matrices representing pairwise dissimilarities between taxa. Outputs are primarily phylogenetic trees encoded in standardized formats such as Newick, which uses nested parentheses to denote branching structures with branch lengths, or Nexus, which supports extended annotations for metadata.[8][9]

Representative applications of phylogenetics software include species tree reconstruction to map biodiversity and evolutionary divergence, viral phylogenies to trace outbreak origins and transmission dynamics, and conservation biology to identify evolutionarily distinct lineages for priority protection.[8][6]

Key Methodologies
Phylogenetics software employs several core methodologies to infer evolutionary relationships among taxa from molecular or morphological data.

Distance-based methods begin by computing pairwise evolutionary distances between sequences, correcting for multiple substitutions using probabilistic models such as the Jukes-Cantor model, which assumes equal rates of nucleotide substitution and estimates the true distance $d$ from the observed proportion of differences $p$ via $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$.[10] These distances form a matrix that serves as input for clustering algorithms like unweighted pair group method with arithmetic mean (UPGMA), which builds hierarchical trees assuming a constant molecular clock, or neighbor-joining, which relaxes this assumption by iteratively joining pairs of taxa that minimize total branch length, producing an unrooted tree efficient for large datasets.[11] These approaches are computationally efficient but sensitive to rate variation across lineages, potentially leading to long-branch attraction artifacts.[12]

Parsimony-based methods seek the tree requiring the fewest evolutionary changes, or "steps," in character states across branches, embodying Occam's razor by minimizing ad hoc hypotheses of transformation. For a given tree topology, the minimum number of steps is computed using algorithms like Fitch's, which passes sets of possible ancestral states from the leaves toward the root and counts one change each time the state sets of two sister nodes do not overlap.
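The step-counting pass of Fitch's algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular package: the nested-tuple tree encoding, the function names, and the four-taxon data are all assumptions made for the example, and only unordered characters on a rooted binary tree are handled.

```python
def fitch_score(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (hypothetical encoding)
    states -- dict mapping leaf name -> observed character state
    """
    if isinstance(tree, str):
        # Leaf: the state set is just the observed state, no changes yet.
        return {states[tree]}, 0
    left_set, left_steps = fitch_score(tree[0], states)
    right_set, right_steps = fitch_score(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no change is needed at this node.
        return shared, left_steps + right_steps
    # Children cannot agree: keep the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_length(tree, columns):
    """Sum the Fitch step counts over all alignment columns (list of dicts)."""
    return sum(fitch_score(tree, column)[1] for column in columns)


# Hypothetical example: one site scored on two rival four-taxon topologies.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(parsimony_length(tree_ab_cd, [site]))  # 1
print(parsimony_length(tree_ac_bd, [site]))  # 2
```

The topology grouping A with B explains the site with a single change, so it is the more parsimonious of the two for this character. A full analysis would sum such counts over every column and compare many topologies.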
Exact solutions employ branch-and-bound searches, which discard partial trees whose parsimony score already exceeds that of the best complete tree found so far, guaranteeing optimality for small to moderate datasets despite exponential worst-case complexity.[13] Heuristic alternatives, such as stepwise addition or tree bisection-reconnection, approximate solutions for larger problems but risk local optima.[12]

Maximum likelihood methods evaluate tree topologies and branch parameters by maximizing the probability of observing the data under an explicit evolutionary model, typically via the likelihood function $L(T, \theta) = P(D \mid T, \theta)$, where $D$ are the site patterns, $T$ the tree, and $\theta$ model parameters like substitution rates.[14] Optimization involves iterative algorithms such as hill-climbing to search tree space while estimating parameters via expectation-maximization or numerical optimization, allowing model selection through likelihood ratio tests.[15] This framework accommodates complex models like general time-reversible for nucleotides, providing statistical inference via bootstrapping, though computational demands scale poorly with sequence length and taxon number.[12]

Bayesian inference integrates prior probabilities on trees and parameters with the likelihood to compute posterior distributions, $P(T, \theta \mid D) \propto P(D \mid T, \theta)\,P(T, \theta)$, using Markov chain Monte Carlo (MCMC) sampling to explore high-dimensional parameter spaces and generate credible sets of trees.[16] Chains are run for millions of generations, discarding burn-in and assessing convergence via trace plots or effective sample sizes, enabling quantification of uncertainty and incorporation of priors like birth-death models for divergence times.[17] This method excels in handling heterogeneous data partitions but requires careful prior specification and longer run times compared to point-estimate approaches.[12]

Other methodologies include supertree approaches, which combine overlapping source trees into a comprehensive phylogeny using matrix representation with parsimony or quartet-based encoding to resolve conflicts across studies.[18] Simulation-based methods generate synthetic data under hypothesized models to test tree robustness or method performance, often via parametric bootstrapping to evaluate significance of topological differences.[14] Emerging approaches incorporate machine learning and deep learning to enhance tree search and model parameter estimation, particularly for large-scale genomic data (as of 2025).[19]

Historical Development
Early Computational Tools (Pre-1990)
The development of computational tools for phylogenetics in the pre-1990 era marked a pivotal shift from manual cladistic methods to automated inference of evolutionary trees, driven by the advent of affordable microcomputers and early molecular data. These tools, often written in FORTRAN for mainframe and personal computers, focused on implementing foundational algorithms such as distance clustering, parsimony optimization, and initial likelihood approaches, primarily for small datasets of DNA sequences or morphological characters. Their simplicity reflected the computational constraints of the time, yet they democratized phylogenetic analysis among biologists.

A landmark in this period was the release of PHYLIP (Phylogeny Inference Package) in October 1980 by Joseph Felsenstein at the University of Washington.[20] This free, comprehensive package comprised multiple standalone programs for inferring phylogenies from molecular data, supporting distance-based methods (e.g., neighbor-joining precursors), parsimony, and maximum likelihood estimation under simple substitution models. PHYLIP's modular design allowed users to chain programs for sequential analysis, from sequence alignment to tree output in Newick format, and it quickly gained adoption for its accessibility and documentation.

Complementing PHYLIP, David L. Swofford introduced PAUP (Phylogenetic Analysis Using Parsimony) in 1981 while at the Illinois Natural History Survey.[22] Initially tailored for parsimony-based reconstruction, PAUP handled both morphological and molecular datasets, incorporating branch-and-bound and heuristic searches to find most parsimonious trees amid the combinatorial explosion of possible topologies.[23] By version 2.0 in the mid-1980s, it added distance methods and compatibility with early sequence formats, making it a staple for cladistic studies.
Distance methods, rooted in the 1958 UPGMA algorithm by Sokal and Michener, found early software implementation in packages like BIOSYS-1, co-developed by Swofford and Richard B. Selander in 1981. This FORTRAN program targeted electrophoretic allozyme data for population genetics and systematics, featuring UPGMA and related clustering for phylogenetic trees from allele frequency matrices.[24] It emphasized phenetic relationships over explicit evolutionary models, providing bootstrapping-like resampling in later iterations to assess cluster stability.

These pioneering tools shared key limitations inherent to 1980s hardware and programming paradigms: strictly command-line interfaces required scripted execution without user-friendly GUIs or visual tree rendering; analyses were confined to modest datasets (often fewer than 20 taxa and fewer than 1000 sites) due to memory and processing limits; and they employed rudimentary models lacking corrections for rate heterogeneity or multiple substitutions, leading to potential biases in tree accuracy. Outputs were typically text-based tree descriptions, necessitating manual interpretation.

Despite these constraints, early tools like PHYLIP and PAUP profoundly impacted systematics by enabling reproducible computational phylogenies, fueling debates on parsimony versus statistical methods in cladistics, and inspiring subsequent innovations in evolutionary biology.[25] Their open distribution fostered global collaboration, with PHYLIP alone registering thousands of users by the late 1980s and influencing the integration of phylogenetics into broader genetic research.[26]

Modern and Specialized Packages (1990-Present)
The 1990s marked a pivotal shift in phylogenetics software toward likelihood-based inference, driven by the need for more sophisticated evolutionary models amid growing sequence data from molecular biology. PHYLIP, which already included maximum likelihood methods since the early 1980s, underwent significant expansions in this era to incorporate a broader range of substitution models and other enhancements, enabling users to estimate phylogenies under more complex scenarios like heterogeneous rates across sites. Similarly, PAUP* evolved from its parsimony-focused roots to include maximum likelihood capabilities in versions released during the 1990s, supporting nucleotide, amino acid, and codon models for rigorous hypothesis testing. A notable innovation was the introduction of TREE-PUZZLE in 1995, which implemented quartet puzzling, a heuristic approach to maximum likelihood tree reconstruction that approximates full likelihood searches by combining quartet topologies, offering computational efficiency for larger datasets at the time.[27]

The 2000s ushered in the Bayesian revolution, transforming phylogenetics by integrating probabilistic inference with Markov chain Monte Carlo (MCMC) sampling to quantify uncertainty in tree topologies and parameters. MrBayes, released in 2001, popularized this paradigm by allowing users to sample posterior distributions of phylogenies under various models, including partitioned data for multi-gene analyses, and became a standard for empirical Bayesian phylogenetics due to its accessibility and robust convergence diagnostics. Building on this, BEAST emerged in 2003 as a Bayesian framework that extended MCMC to incorporate relaxed molecular clocks, enabling divergence time estimation from calibrated phylogenies and addressing temporal heterogeneity in evolutionary rates, which is critical for studies in molecular epidemiology and macroevolution.
These tools democratized Bayesian methods, shifting the field from point estimates to full posterior explorations, though they initially required substantial computational resources.

Post-2010 developments responded to the phylogenomics era, where next-generation sequencing generated massive alignments, necessitating scalable algorithms for maximum likelihood inference. RAxML, first published in 2004 and continuously updated, optimized rapid bootstrapping and tree searches for large datasets using parallel computing on multicore systems and clusters, achieving up to 100-fold speedups over predecessors for alignments exceeding 10,000 taxa. IQ-TREE, introduced in 2014, advanced this further with integrated model selection via ModelFinder, which employs hill-climbing algorithms to identify optimal substitution models, and UFBoot for ultrafast bootstrap approximations, facilitating accurate phylogenies from phylogenomic data in hours rather than days. More recent advancements include the release of IQ-TREE 2 in 2020, which improved efficiency and support for complex models, and RAxML-NG in 2018, enhancing scalability for massive datasets through optimized likelihood computations.[28][29]

Broader trends included the adoption of graphical user interfaces, as seen in MEGA's evolution since the 1990s to provide intuitive platforms for alignment, model testing, and tree visualization, alongside open-source collaboration on platforms like GitHub for community-driven enhancements. Integration with next-generation sequencing (NGS) pipelines became standard, addressing scalability for thousands of loci, improved uncertainty quantification through ensemble methods, and hybrid approaches combining likelihood with machine learning for anomaly detection in alignments. These advancements have enabled phylogenetics to handle the data explosion while maintaining statistical rigor.

Categorized Lists
Alignment and Data Preparation Tools
Alignment and data preparation tools are essential in phylogenetics for generating multiple sequence alignments (MSAs), which arrange homologous sequences to identify conserved regions and facilitate subsequent tree-building analyses. These tools handle the preprocessing of raw sequence data, addressing challenges such as insertions, deletions (indels), and varying sequence lengths to produce aligned inputs in standard formats like FASTA or Clustal for downstream phylogenetic inference.[30][31][32][33]

Clustal Omega, developed in 2011 by researchers at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), employs a progressive alignment strategy optimized for large datasets, capable of aligning hundreds of thousands of sequences in hours. It constructs guide trees using mBed, which embeds sequences in a vector space so that they can be clustered quickly (for example with k-means) without computing a full pairwise distance matrix, and it aligns profiles with the HHalign profile hidden Markov model engine for enhanced accuracy on protein sequences. The tool supports DNA, protein, and even structural data, with robust handling of indels through its iterative refinement phase.[30]

MAFFT, introduced in 2002 by Kazutaka Katoh and colleagues at Kyoto University, utilizes a fast Fourier transform (FFT) approach to compute alignments efficiently, reducing CPU time compared to earlier methods while maintaining high accuracy. Its core algorithm involves iterative refinement, where initial progressive alignments are improved step by step through local rearrangements; specialized variants like G-INS-i prioritize global homology for small, accurate alignments on datasets up to thousands of sequences. MAFFT accommodates both nucleotide and amino acid sequences, excelling in profile alignments for homologous sequence sets.[31]

MUSCLE, released in 2004 by Robert C. Edgar, implements multiple sequence comparison by log-expectation (MUSCLE), a three-phase algorithm that balances speed and precision for protein and DNA alignments.
The method begins with a progressive alignment using a guide tree built from k-mer distances (UPGMA by default), followed by iterative refinement via dynamic programming to optimize scores, and concludes with tree-dependent refinement for final adjustments. It supports profile alignments and outputs in multiple formats, making it suitable for datasets of moderate size (up to 10,000 sequences).[32][34]

T-Coffee, created in 2000 by Cedric Notredame and colleagues at the Centre for Genomic Regulation (CRG) in Barcelona, adopts a consistency-based framework that builds a library of pairwise alignments to guide progressive multiple alignment construction. This approach integrates local and global alignments, enhancing accuracy by penalizing inconsistencies across sequence pairs, and extends to structural data via templates like 3D coordinates. The tool handles indels effectively through its progressive and library extension methods, supporting DNA, RNA, protein, and hybrid inputs for versatile phylogenetic preparation.[33][35]

Common features across these tools include sophisticated indel modeling to preserve evolutionary gaps, support for profile-based alignments to incorporate pre-existing MSAs, and export options in phylogenetics-compatible formats such as FASTA and Clustal for integration into tree inference pipelines like maximum likelihood or Bayesian methods.[30][31][32][33]

Distance-Based and Clustering Methods
Distance-based and clustering methods in phylogenetics rely on pairwise evolutionary distances derived from aligned sequences to infer tree topologies, offering computational efficiency for large datasets compared to character-based approaches. These methods first compute a distance matrix using models that account for multiple substitutions, such as the Poisson correction for amino acid sequences or the Kimura 2-parameter model for nucleotides, which estimate the actual number of changes rather than observed differences.[36][37] Clustering algorithms then agglomerate taxa based on these distances; unweighted pair group method with arithmetic mean (UPGMA) assumes a molecular clock and produces ultrametric trees, while neighbor-joining (NJ) relaxes this assumption to handle rate heterogeneity, minimizing total branch length for additive distances where path lengths match observed distances.[37] Non-additive distances, arising from rate variation or estimation errors, are addressed through corrections in advanced implementations. Bootstrapping, by resampling alignments to generate pseudoreplicates, assesses branch support in these trees, providing statistical confidence without assuming additivity.

PHYLIP (Phylogeny Inference Package), developed since 1980 and continuously updated, is a foundational free software suite for distance-based phylogenetics, distributed as modular programs for Unix, Windows, and Mac systems.
It includes PROTDIST for calculating protein distances using models like the Poisson correction, which assumes equal substitution rates across sites, and DNADIST for nucleotide distances with options such as the Kimura 2-parameter model, which allows different rates for transitions and transversions.[38] The SEQBOOT program enables bootstrapping by generating multiple datasets from input alignments, while NEIGHBOR implements NJ and UPGMA clustering to build trees from distance matrices, supporting both additive and approximate handling of non-additive distances through star-like decompositions. PHYLIP's design emphasizes flexibility for batch processing large alignments post-preparation, with over 30 programs integrated for comprehensive analysis.[3][39]

MEGA (Molecular Evolutionary Genetics Analysis), first released in 1993 and actively maintained by researchers at Penn State University, provides an integrated graphical interface for distance-based tree inference across DNA, protein, and gene frequency data. It computes distances using a suite of models, including the Kimura 2-parameter for nucleotides and Poisson correction for amino acids, with built-in handling of gamma-distributed rate variation to address non-additive distances.[40] MEGA supports UPGMA for clock-like data and minimum evolution (a criterion closely related to NJ) for general cases, incorporating bootstrapping via 100–1000 replicates to evaluate node reliability, often visualized directly in the software.
Its user-friendly workflow, from distance estimation to tree output, has made it widely adopted for educational and research purposes, with recent versions like MEGA12 optimizing for multi-core processing of datasets up to thousands of sequences.[41][36]

QuickTree, introduced in 1999 by developers at the University of Oxford and the Sanger Institute, is a specialized command-line tool optimized for rapid NJ tree construction from large protein sequence alignments, achieving speeds over 100 times faster than standard NJ for datasets exceeding 10,000 taxa. It employs a four-cluster analysis to approximate the NJ criterion, efficiently handling additive distances while tolerating moderate non-additivity through sequential clustering, without built-in bootstrapping but compatible with external resampling tools. QuickTree's focus on scalability has been pivotal for analyzing massive protein families, such as Pfam alignments with 27,000 HIV GP120 sequences, making it ideal for initial exploratory phylogenies.[42]

BioNJ (BIONJ), published in 1997 by Olivier Gascuel at LIRMM (Montpellier), refines the NJ algorithm by weighting each distance update according to a simple model of the variances and covariances of the distance estimates, improving topological accuracy for distance matrices from DNA or protein sequences, particularly when substitution rates are high or vary among lineages. It outperforms standard NJ in simulations under such conditions and can be combined with bootstrapping for robustness assessment. BioNJ's implementation remains available as a standalone executable, emphasizing precision over speed for moderately sized datasets where rate heterogeneity makes the estimated distances depart from additivity.[43][44]

Parsimony-Based Methods
Parsimony-based methods in phylogenetics software infer evolutionary trees by selecting those that require the minimum number of character state changes, known as steps, to explain the observed data. This approach, rooted in the principle of maximum parsimony, is particularly suited for datasets with morphological characters or small molecular sequences where the goal is to minimize homoplasy without assuming an evolutionary model. Software implementing these methods typically supports exhaustive searches for small datasets, branch-and-bound algorithms to guarantee optimality, and heuristic strategies for larger ones, while handling multistate characters either as unordered (Fitch parsimony) or ordered (Wagner parsimony).[45] However, parsimony analyses can be sensitive to long-branch attraction, where rapidly evolving lineages artifactually group together, though advanced heuristics like sectorial search can mitigate this by exploring tree space more thoroughly.

PAUP* (Phylogenetic Analysis Using Parsimony and Other Methods), developed by David L. Swofford since 1981 and still maintained, is a foundational tool for parsimony-based tree inference. It offers exhaustive searches that evaluate all possible trees for datasets up to about 15 taxa, branch-and-bound methods to find optimal trees without exhaustive enumeration, and heuristic approaches like stepwise addition with tree-bisection-reconnection (TBR) branch swapping for larger matrices.[45] PAUP* accommodates both unordered and ordered characters, as well as compatibility-based methods that seek cliques of compatible characters, making it versatile for discrete data analysis.
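The reason exhaustive search is restricted to small datasets can be made concrete by counting topologies. The number of distinct unrooted, fully resolved trees on n labelled taxa is (2n − 5)!!, because the k-th taxon can be attached to any of the 2k − 5 branches of a tree that already holds k − 1 taxa. The function name below is hypothetical and the sketch is purely illustrative.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully resolved trees on n labelled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # A tree holding k - 1 taxa has 2k - 5 branches, so the k-th taxon
    # can be attached in 2k - 5 different places.
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5
    return count


for n in (5, 10, 15, 20):
    print(n, count_unrooted_trees(n))
```

Ten taxa already give 2,027,025 topologies, and the count grows faster than exponentially from there, which is why branch-and-bound and heuristic searches take over as datasets grow.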
TNT (Tree Analysis Using New Technology), released in 2001 by an Argentine team led by Pablo Goloboff and still under development, excels in fast heuristic parsimony searches optimized for large matrices, often handling thousands of taxa efficiently.[46] Its algorithms include implicit enumeration for exact solutions on moderate datasets and innovative tree-fusing techniques that merge suboptimal trees to escape local optima, combined with sectorial search to reduce long-branch attraction effects. TNT uses step counts as the primary optimality criterion and supports implied weighting schemes to downweight homoplasious characters, enhancing robustness for complex datasets.

NONA, introduced in 1995 by Pablo Goloboff as a command-line parsimony program and still available, focuses on efficient searches for optimal trees using the parsimony ratchet, which perturbs character weights to sample tree space beyond local optima.[47] It handles multistate characters through non-additive or additive coding and integrates implied weights for successive approximations, prioritizing minimal step counts while allowing user-defined weights. NONA's design emphasizes speed for morphological data, often serving as a precursor to more advanced tools like TNT.

Maximum Likelihood Methods
Maximum likelihood (ML) methods for phylogenetic inference seek to find the tree topology, branch lengths, and model parameters that maximize the likelihood of the observed sequence data under a specified evolutionary model, typically incorporating nucleotide substitution rates, site heterogeneity, and among-site rate variation.[http://www.atgc-montpellier.fr/phyml/paper.php] These approaches provide a statistical framework for tree estimation, outperforming simpler methods in accuracy for complex datasets by explicitly modeling evolutionary processes such as transitions and transversions via models like the General Time Reversible (GTR) framework.[https://academic.oup.com/sysbio/article/52/5/696/1641391] Key software in this category emphasizes efficient heuristic searches to handle large alignments, often integrating features like Gamma-distributed site-specific rates combined with invariant sites for better rate heterogeneity modeling, joint branch length estimation across partitions, and support for heterogeneous data from multiple genes or loci.[https://cme.h-its.org/exelixis/resource/download/NewManual.pdf]

PhyML, first released in 2003 and continuously developed by Stéphane Guindon and colleagues, employs a heuristic hill-climbing algorithm based on nearest-neighbor interchanges (NNI) for topology searches, enabling rapid ML estimation on nucleotide and amino acid alignments.[https://academic.oup.com/sysbio/article/52/5/696/1641391] It supports a range of substitution models including GTR and incorporates approximate likelihood ratio tests (aLRT) for branch support assessment, making it suitable for datasets up to thousands of sequences while accommodating Gamma + invariant sites for site-rate variation and optimized branch lengths.[https://academic.oup.com/sysbio/article/59/3/307/1702850]

RAxML, initiated in 2004 by Alexandros Stamatakis and under ongoing maintenance, utilizes randomized accelerated ML searches with subtree-pruning-regrafting (SPR) moves and rapid bootstrapping heuristics to infer trees from large phylogenomic datasets.[https://academic.oup.com/bioinformatics/article/22/21/2688/240496] Designed for scalability, it leverages Pthreads for shared-memory parallelism and MPI for distributed computing, efficiently processing alignments with millions of sites while supporting partitioned models, GTR-based evolution, and site-specific rates via Gamma + invariant sites for heterogeneous data handling.[https://academic.oup.com/bioinformatics/article/30/9/1312/239061]

IQ-TREE, developed since 2014 by Lam-Tung Nguyen, Bui Quang Minh, and team, integrates ModelFinder for automated model selection using the Bayesian information criterion (BIC) and employs stochastic hill-climbing with UFBoot for ultrafast bootstrap approximations, achieving high accuracy on diverse molecular data.[https://academic.oup.com/mbe/article/32/1/268/2925592] It excels in branch length optimization and supports advanced features like mixture models for site heterogeneity, Gamma + invariant sites, and partitioned analyses for multi-gene datasets, often outperforming competitors in speed and likelihood scores on large-scale phylogenomics.[https://academic.oup.com/mbe/article/37/7/1911/5810651]

GARLI, released in 2005 by Derrick Zwickl at the University of Kansas and actively maintained, applies a genetic algorithm for simultaneous optimization of tree topology and parameters under ML, facilitating robust searches on nucleotide, codon, and amino acid data.[https://www.bio.utexas.edu/faculty/antisense/garli/Garli.html] It handles partitioned models for heterogeneous evolutionary rates across loci, incorporates Gamma + invariant sites for site-specific variation, and estimates branch lengths efficiently, making it effective for complex datasets requiring fine-tuned likelihood maximization.[https://repositories.lib.utexas.edu/bitstream/handle/2152/3108/zwickl347.pdf]

Bayesian Inference Methods
Bayesian inference methods in phylogenetics employ Markov chain Monte Carlo (MCMC) sampling to approximate the posterior distribution of phylogenetic trees, combining prior probabilities with the likelihood of the sequence data to quantify uncertainty through posterior probabilities of clades and parameters.[48] Unlike maximum likelihood approaches, which optimize point estimates, these methods produce distributions that reflect uncertainty in the tree and the model parameters and that provide credible intervals for branch lengths and divergence times.[49] Key software in this category focuses on flexible prior specifications, such as uniform priors on tree topologies, which assign equal probability to every possible unrooted topology to represent ignorance about relationships.[48]

MrBayes, released in 2001 and developed by Fredrik Ronquist and colleagues, implements Bayesian phylogenetic inference using MCMC sampling based on the Metropolis-Hastings algorithm to explore tree space and estimate posterior probabilities.[16] It supports mixed evolutionary models, allowing different partitions of the data (e.g., genes or codon positions) to evolve under distinct substitution models, which improves accuracy for heterogeneous datasets.[50] Convergence is assessed with diagnostics such as the average standard deviation of split frequencies between independent runs, where values below 0.01 indicate that the runs have converged on similar tree samples, alongside effective sample size (ESS) calculations that check whether the chain contains enough effectively independent samples from the posterior, typically requiring ESS > 200 for reliable estimates.[51]

BEAST, introduced in 2002 by Andrew Rambaut, Alexei J.
Drummond, and others, extends Bayesian analysis to incorporate molecular clocks for estimating divergence times, using relaxed clock models that allow rate variation across branches while sampling from the posterior distribution of trees and parameters.[52] Its StarBEAST extension applies the multispecies coalescent (MSC) model to infer species trees from multiple gene trees, accounting for incomplete lineage sorting by jointly estimating gene and species phylogenies.[53] Like other Bayesian tools, BEAST relies on ESS for convergence monitoring: low ESS values signal poor chain mixing and call for longer runs or multiple chains.[54]

BayesPhylogenies, developed around 2004 at the University of Reading (associated with collaborators at the University of Manchester), specializes in variable-rate models for heterogeneous evolution, using reversible jump MCMC to sample across models with differing numbers of rate categories for traits or substitutions.[55] This allows inference of phylogenetic trees while accommodating rate shifts, such as in linguistic or cultural evolution, with priors on rates often drawn from gamma distributions to reflect uncertainty in evolutionary tempo.[56]

BAli-Phy, released in 2006 by Benjamin D. Redelings and Marc A. Suchard, co-estimates multiple sequence alignments and phylogenetic trees within a Bayesian framework, sampling from the joint posterior to propagate alignment uncertainty into tree inferences.
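
The ESS diagnostic described above for MrBayes and BEAST can be illustrated with a short Python sketch. It is a simplified estimator, not the one implemented in MrBayes or BEAST: it divides the number of samples by an autocorrelation time obtained by summing sample autocorrelations up to the first non-positive lag. The function names and the AR(1) process used as a stand-in for an MCMC trace are assumptions made for illustration.

```python
import random


def autocorrelation(samples, lag):
    """Sample autocorrelation of a trace at the given lag."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0.0:
        # A constant trace has no defined autocorrelation; treat it as 0 here.
        return 0.0
    cov = sum(
        (samples[i] - mean) * (samples[i + lag] - mean) for i in range(n - lag)
    ) / n
    return cov / var


def effective_sample_size(samples):
    """Simplified ESS: N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is not
    positive, a common heuristic for limiting noise from large lags.
    """
    n = len(samples)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(samples, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_trace(n, phi, seed):
    """Hypothetical stand-in for an MCMC trace: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x = 0.0
    trace = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


if __name__ == "__main__":
    well_mixed = ar1_trace(2000, 0.1, seed=1)   # weakly autocorrelated samples
    sticky = ar1_trace(2000, 0.95, seed=1)      # strongly autocorrelated samples
    print("ESS, well-mixed trace:", round(effective_sample_size(well_mixed)))
    print("ESS, sticky trace:   ", round(effective_sample_size(sticky)))
```

Under these assumptions, the strongly autocorrelated trace yields a much smaller ESS than the weakly autocorrelated one even though both contain the same number of samples, which is the situation the ESS > 200 rule of thumb is meant to flag.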
BAli-Phy employs MCMC to explore indel and substitution models simultaneously, improving accuracy for divergent sequences where alignment errors can bias the inferred phylogeny.[49]

Across these tools, marginal likelihood estimation is central to model comparison. Stepping-stone sampling estimates the marginal likelihood by bridging the prior and posterior through a series of intermediate distributions, and it often yields more accurate Bayes factors than thermodynamic integration.[57] Uniform priors on topologies avoid favoring any tree a priori, while ESS values help users decide how much burn-in to discard and whether sampling has reached stationarity.[48]

Visualization and Post-Analysis Tools
Visualization and post-analysis tools in phylogenetics enable researchers to render, explore, and interpret inferred phylogenetic trees, often together with support values such as bootstrap proportions or posterior probabilities from maximum likelihood or Bayesian inference. These tools handle tasks such as tree editing, consensus computation, and comparative analyses, providing insight into evolutionary relationships beyond the initial tree construction. Outputs from inference software, typically in Newick or Nexus format, serve as inputs to these platforms, which generate publication-ready figures and perform quantitative comparisons.

FigTree, developed by Andrew Rambaut in 2006, is a Java-based graphical viewer for displaying and annotating phylogenetic trees in Newick and Nexus formats.[58] It offers customizable layouts, including radial and rectangular tree representations, options to label nodes with support values and branch lengths, and high-quality export to vector formats such as SVG and PDF for publication.[58] The tool is particularly valued for its simplicity and flexibility in preparing trees for scientific communication without requiring programming knowledge.[59]

Dendroscope, first released in 2006 by researchers at the University of Tübingen, provides an interactive platform for visualizing large rooted phylogenetic trees and networks.[60] It handles supertrees, computes consensus trees such as majority-rule summaries of multiple inferred topologies, and generates tanglegrams for comparing pairs of trees, such as host and parasite phylogenies.[61] The software supports efficient navigation of datasets with thousands of taxa through zooming, panning, and editing features, making it suitable for exploratory post-analysis.[62]

TreeView, authored by Roderic D. M.
Page in 1996, is a lightweight viewer for rooted and unrooted phylogenetic trees that runs on Windows and Mac platforms. It displays bootstrap support values and branch lengths directly on trees imported from formats such as NEXUS or PHYLIP, and lets users reroot trees and adjust visual parameters for clarity.[63] Though simpler than modern alternatives, its cross-platform availability has made it a longstanding choice for quick inspection of inference outputs.

Mesquite, initiated in 2001 by Wayne P. Maddison and David R. Maddison and developed further through the Mesquite Project, is a modular system for comparative phylogenetic analysis and evolutionary data management.[64] It supports ancestral state reconstruction using methods such as squared-change parsimony to infer character evolution on trees, and integrates tools for computing consensus trees from sets of trees.[65] Additionally, Mesquite enables tanglegram visualization for comparing tree topologies and calculates tree comparison metrics, including the Robinson-Foulds distance, which quantifies the difference between two trees by counting the bipartitions present in one tree but not in the other.[64] Its extensibility via plugins allows tailored post-analysis workflows that emphasize conceptual evolutionary insight over raw computation.[65]

References
- https://www.researchgate.net/publication/258368548_Phylip_and_Phylogenetics
