Molecular phylogenetics
from Wikipedia

Molecular phylogenetics (/məˈlɛkjʊlər ˌfaɪloʊdʒəˈnɛtɪks, mɒ-, moʊ-/[1][2]) is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.[3][4][5]

Molecular phylogenetics and molecular evolution are closely linked. Molecular evolution is the process of change (mutation) at the molecular level (genes, proteins, etc.) across the various branches of the tree of life. Molecular phylogenetics infers the evolutionary relationships that arise from molecular evolution and expresses them as a phylogenetic tree.[6]

History

The theoretical frameworks for molecular systematics were laid in the 1960s in the works of Emile Zuckerkandl, Emanuel Margoliash, Linus Pauling, and Walter M. Fitch.[7] Applications of molecular systematics were pioneered by Charles G. Sibley (birds), Herbert C. Dessauer (herpetology), and Morris Goodman (primates), followed by Allan C. Wilson, Robert K. Selander, and John C. Avise (who studied various groups). Work with protein electrophoresis began around 1956. Although the results were not quantitative and did not initially improve on morphological classification, they provided tantalizing hints that long-held notions of the classifications of birds, for example, needed substantial revision. In the period of 1974–1986, DNA–DNA hybridization was the dominant technique used to measure genetic difference.[8]

Theoretical background

Early attempts at molecular systematics were also termed chemotaxonomy and made use of proteins, enzymes, carbohydrates, and other molecules that were separated and characterized using techniques such as chromatography. These have been replaced in recent times largely by DNA sequencing, which produces the exact sequences of nucleotides or bases in either DNA or RNA segments extracted using different techniques. In general, these are considered superior for evolutionary studies, since the actions of evolution are ultimately reflected in the genetic sequences. At present, it is still a long and expensive process to sequence the entire DNA of an organism (its genome). However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs. At any location within such a sequence, the bases found in a given position may vary between organisms. The particular sequence found in a given organism is referred to as its haplotype. In principle, since there are four base types, with 1000 base pairs, we could have 4^1000 distinct haplotypes. However, for organisms within a particular species or in a group of related species, it has been found empirically that only a minority of sites show any variation at all, and most of the variations that are found are correlated, so that the number of distinct haplotypes that are found is relatively small.[9]
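The haplotype bookkeeping described above can be sketched in a few lines of Python. This is an illustrative sketch, not a published tool. `haplotype_summary` is a hypothetical helper that assumes the sequences are already aligned to equal length. It counts the distinct haplotypes and the variable sites in a sample, and it checks the size of the 4^1000 ceiling.

```python
def haplotype_summary(sequences):
    """Return (distinct haplotypes, variable sites) for aligned sequences.

    Hypothetical helper: assumes all sequences are aligned to equal length.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    # Identical sequences count as one haplotype.
    distinct = len(set(sequences))
    # A site is variable if more than one base occurs there in the sample.
    variable = sum(1 for i in range(length) if len({s[i] for s in sequences}) > 1)
    return distinct, variable


sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGAAC"]
print(haplotype_summary(sample))  # (3, 2)

# The theoretical ceiling for a 1000-bp region is 4**1000 haplotypes,
# a number with 603 decimal digits.
print(len(str(4 ** 1000)))  # 603
```

In a real sample the observed count is far below that ceiling, because most sites do not vary.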

In a phylogenetic tree, numerous groupings (clades) exist. A clade may be defined as a group of organisms consisting of a common ancestor and all of its descendants. This figure illustrates how a clade in a phylogenetic tree may be expressed.

In a molecular systematic analysis, the haplotypes are determined for a defined area of genetic material; a substantial sample of individuals of the target species or other taxon is used; however, many current studies are based on single individuals. Haplotypes of individuals of closely related, yet different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a definitely different taxon are determined: these are referred to as an outgroup. The base sequences for the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of locations where they have different bases: this is referred to as the number of substitutions (other kinds of differences between haplotypes can also occur, for example, the insertion of a section of nucleic acid in one haplotype that is not present in another). The difference between organisms is usually re-expressed as a percentage divergence, by dividing the number of substitutions by the number of base pairs analysed: the hope is that this measure will be independent of the location and length of the section of DNA that is sequenced.
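Counting substitutions and converting them to a percentage divergence, as described above, can be sketched in Python as follows. The function names are hypothetical. The choice to skip every column where either sequence has a gap (`-`) is an assumption, and it is only one of several conventions for handling insertions.

```python
def substitution_count(seq_a, seq_b, gap="-"):
    """Return (differing sites, compared sites) for two aligned sequences.

    Columns with a gap in either sequence are skipped (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def percent_divergence(seq_a, seq_b):
    """Substitutions divided by sites compared, expressed as a percentage."""
    differences, compared = substitution_count(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differences / compared


print(percent_divergence("ACGTACGTAC", "ACGTTCGTAA"))  # 20.0
```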

An older and superseded approach was to determine the divergences between the genotypes of individuals by DNA–DNA hybridization. The advantage claimed for using hybridization rather than gene sequencing was that it was based on the entire genotype, rather than on particular sections of DNA. Modern sequence comparison techniques overcome this objection by the use of multiple sequences.

Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram is examined in order to see whether the samples cluster in the way that would be expected from current ideas about the taxonomy of the group. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a clade, which may be visually represented as the figure displayed on the right demonstrates. Statistical techniques such as bootstrapping and jackknifing help in providing reliability estimates for the positions of haplotypes within the evolutionary trees.
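The text leaves the cluster analysis open. One simple choice is UPGMA (average linkage), which the Pevsner-based workflow later in this article lists among the distance-based methods. The Python sketch below clusters a distance matrix that way. The function name and the toy matrix are illustrative, ties are broken by dictionary order, and only the dendrogram topology is returned, without branch lengths.

```python
from itertools import combinations


def upgma(names, matrix):
    """Cluster a symmetric distance matrix with UPGMA (average linkage).

    Returns the dendrogram topology as a Newick string (no branch lengths).
    """
    # cluster id -> (Newick text, number of leaves in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        del dist[pair]
        (text_i, size_i), (text_j, size_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # Size-weighted average of the two old distances to cluster k.
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (f"({text_i},{text_j})", size_i + size_j)
        next_id += 1
    newick, _ = next(iter(clusters.values()))
    return newick + ";"


names = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 10],
          [10, 10, 10, 0]]
print(upgma(names, matrix))  # (D,(C,(A,B)));
```

Because UPGMA merges clusters at half their average distance, it implicitly assumes a uniform molecular clock.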

Techniques and applications

Every living organism contains deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. In general, closely related organisms have a high degree of similarity in the molecular structure of these substances, while the molecules of organisms distantly related often show a pattern of dissimilarity. Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations over time, and assuming a constant rate of mutation, provide a molecular clock for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. With the invention of Sanger sequencing in 1977, it became possible to isolate and identify these molecular structures.[10][11] High-throughput sequencing may also be used to obtain the transcriptome of an organism, allowing inference of phylogenetic relationships using transcriptomic data.
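Under the constant-rate assumption mentioned above, dating a divergence is simple arithmetic. If each lineage accumulates r substitutions per site per year, two sequences that split t years ago differ by about d = 2rt, because both lineages change after the split, so t = d/(2r). The sketch below assumes d has already been corrected for multiple substitutions. The numbers are purely illustrative and are not estimates from any study.

```python
def divergence_time(distance, rate):
    """Years since two lineages split, assuming a strict molecular clock.

    distance: substitutions per site between the two sequences.
    rate: substitutions per site per year along a single lineage.
    Both lineages accumulate changes after the split, hence the factor 2.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


# Illustrative numbers only: 2% divergence at 1e-9 substitutions/site/year.
print(divergence_time(0.02, 1e-9))  # roughly 10 million years
```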

The most common approach is the comparison of homologous sequences for genes using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is in DNA barcoding, wherein the species of an individual organism is identified using small sections of mitochondrial DNA or chloroplast DNA. The techniques that make this possible are also applied in human genetics. Examples include the increasingly popular use of genetic testing to determine a child's paternity and a new branch of criminal forensics focused on evidence known as genetic fingerprinting.
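The text names sequence alignment without specifying an algorithm. As one classic example, the Needleman-Wunsch dynamic-programming method for global pairwise alignment can be sketched as follows. The scoring values (match +1, mismatch -1, linear gap penalty -2) are arbitrary assumptions. Real analyses use substitution matrices and affine gap penalties, and multiple sequence alignment needs more elaborate tools.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks an inserted gap.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 4: six matches and one gap
print(top)
print(bottom)
```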

Molecular phylogenetic analysis

There are several methods available for performing a molecular phylogenetic analysis. One method comes with a comprehensive step-by-step protocol for constructing a phylogenetic tree, published in Nature Protocols.[12] The protocol covers DNA/amino acid contiguous sequence assembly, multiple sequence alignment, model testing (selecting the best-fitting substitution model), and phylogeny reconstruction using maximum likelihood and Bayesian inference.

Another molecular phylogenetic analysis technique, described by Pevsner (2015), is summarized here. A phylogenetic analysis typically consists of five major steps. The first stage comprises sequence acquisition. The second step is a multiple sequence alignment, which is the fundamental basis of constructing a phylogenetic tree. The third stage involves models of DNA and amino acid substitution, of which several exist. Examples include Hamming distance, the Jukes and Cantor one-parameter model, and the Kimura two-parameter model (see Models of DNA evolution). The fourth stage consists of various methods of tree building, including distance-based and character-based methods. The normalized Hamming distance and the Jukes-Cantor correction formulas provide the degree of divergence and the probability that a nucleotide changes to another, respectively. Common tree-building methods include the following:
- the unweighted pair group method using arithmetic mean (UPGMA) and neighbor joining, which are distance-based methods;
- maximum parsimony, which is a character-based method;
- maximum likelihood estimation and Bayesian inference, which are character-based/model-based methods.

UPGMA is a simple method, but it is less accurate than the neighbor-joining approach. The last step is evaluating the trees, an assessment of accuracy that covers consistency, efficiency, and robustness.[13]
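Neighbor joining, one of the distance-based methods listed above, can be sketched in Python as follows. The sketch uses the standard Saitou-Nei Q-criterion. The function name and the string-based Newick output are illustrative choices, and ties are broken by scan order. The five-taxon matrix is a small additive textbook-style example, not real data.

```python
def neighbor_joining(names, matrix):
    """Neighbor joining (Saitou-Nei Q-criterion) on a symmetric distance matrix.

    Returns an unrooted tree as a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in matrix]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Choose the pair minimising Q(i, j) = (n - 2) d_ij - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        len_i = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{len_i:g},{nodes[j]}:{len_j:g})"]
    # Join the last three nodes at a single central node.
    x, y, z = nodes
    len_x = (d[0][1] + d[0][2] - d[1][2]) / 2
    len_y = (d[0][1] + d[1][2] - d[0][2]) / 2
    len_z = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({x}:{len_x:g},{y}:{len_y:g},{z}:{len_z:g});"


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbor_joining(names, matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Unlike UPGMA, neighbor joining does not assume a molecular clock, so it returns an unrooted tree. Rooting it requires an outgroup.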

Five Stages of Molecular Phylogenetic Analysis

MEGA (Molecular Evolutionary Genetics Analysis) is user-friendly analysis software that is free to download and use. It supports both distance-based and character-based tree-building methods and offers options such as heuristic searches and bootstrapping. Bootstrapping is commonly used to measure the robustness of a tree's topology: it reports the percentage of replicates in which each clade is recovered. In general, a value greater than 70% is considered significant. The flow chart displayed on the right visually demonstrates the order of the five stages of Pevsner's molecular phylogenetic analysis technique that have been described.[13]
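The resampling step behind bootstrapping can be sketched in Python. Each pseudoreplicate draws alignment columns with replacement, and support is the percentage of replicates that recover a grouping. As a loudly labeled simplification, the hypothetical `closest_taxon` helper is a toy stand-in for tree building. A real analysis would rebuild a full tree (for example by neighbor joining or maximum likelihood) for every replicate and count clades.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudoreplicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def closest_taxon(alignment, taxon):
    """Toy stand-in for tree building: the taxon with fewest mismatches."""
    target = alignment[taxon]
    others = [name for name in alignment if name != taxon]
    return min(others, key=lambda name: sum(x != y for x, y in zip(target, alignment[name])))


def bootstrap_support(alignment, a, b, replicates=100, seed=0):
    """Percentage of pseudoreplicates in which a and b pair as mutual closest taxa."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        if closest_taxon(rep, a) == b and closest_taxon(rep, b) == a:
            hits += 1
    return 100.0 * hits / replicates


alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGTACGA",
    "C": "TCGAACGTTCGTAGGTACCT",
    "D": "TGGAACCTTCGAAGGTTCCT",
}
print(bootstrap_support(alignment, "A", "B"))
```

A fixed `seed` makes the replicates reproducible. The conventional 70% threshold mentioned above would then be applied to the resulting percentages.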

Limitations

Molecular systematics is an essentially cladistic approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be monophyletic. This is a limitation when attempting to determine the optimal tree(s), which often involves bisecting and reconnecting portions of the phylogenetic tree(s).

The recent discovery of extensive horizontal gene transfer among organisms provides a significant complication to molecular systematics, indicating that different genes within the same organism can have different phylogenies. HGTs can be detected and excluded using a number of phylogenetic methods (see Inferring horizontal gene transfer § Explicit phylogenetic methods).

In addition, molecular phylogenies are sensitive to the assumptions and models that go into making them. Firstly, sequences must be aligned; then, issues such as long-branch attraction, saturation, and taxon sampling problems must be addressed. This means that strikingly different results can be obtained by applying different models to the same dataset.[14][15] The tree-building method also brings with it specific assumptions about tree topology, evolution speeds, and sampling. The simplistic UPGMA assumes a rooted tree and a uniform molecular clock, both of which can be incorrect.[13]

The low resolving power of single genes has been overcome using multigene phylogenies, though the issue of HGT still merits careful algorithmic design.[16]

from Grokipedia
Molecular phylogenetics is the branch of evolutionary biology that employs molecular data, such as DNA, RNA, and protein sequences, to infer the phylogenetic relationships and evolutionary histories among organisms, populations, or genes. By analyzing genetic similarities and differences, it constructs phylogenetic trees (diagrammatic representations of evolutionary divergence), where branches symbolize lineages and nodes indicate speciation or divergence events. This approach has revolutionized systematics by providing quantifiable, heritable markers that overcome limitations of morphological data, such as convergence or homoplasy. The field traces its origins to the mid-20th century, building on earlier immunological and protein-based studies, notably George H. F. Nuttall's 1904 use of serological tests to assess primate relationships. A pivotal advancement came in the 1960s with the work of Émile Zuckerkandl and Linus Pauling, who proposed using amino acid sequences to estimate divergence times and construct molecular clocks, assuming constant evolutionary rates. The advent of DNA sequencing technologies in the late 1970s, pioneered by Frederick Sanger, propelled molecular phylogenetics forward, enabling large-scale sequence comparisons and shifting the discipline from qualitative to statistical inference. By the 1980s, computational tools like neighbor-joining algorithms formalized tree-building processes, marking the transition to modern phylogenetics. Central to molecular phylogenetics are several key methods for tree reconstruction, each addressing the challenges of sequence evolution under models of nucleotide or amino acid substitution. Maximum parsimony seeks the tree requiring the fewest evolutionary changes, a concept formalized by Walter Fitch in 1971. Distance-based methods, such as neighbor-joining introduced by Saitou and Nei in 1987, use pairwise genetic distances derived from models like Jukes-Cantor (1969) to cluster taxa efficiently. 
More sophisticated model-based approaches include maximum likelihood and Bayesian inference. Maximum likelihood, developed by Joseph Felsenstein in 1981, evaluates tree topologies by maximizing the probability of observing the data given an evolutionary model. Bayesian inference, advanced in the late 1990s and early 2000s with software like MrBayes (2001), incorporates prior probabilities for robust uncertainty estimation. These methods often incorporate bootstrap resampling to assess branch support and molecular clock calibrations using fossils to date divergences. Applications of molecular phylogenetics extend across biology, informing fields from conservation to medicine. It has clarified major evolutionary transitions, such as the human-chimpanzee divergence approximately 6–7 million years ago based on genomic analyses. In epidemiology, it traces pathogen origins, like HIV-1's zoonotic jump from chimpanzees circa 1900–1930. Phylogenomics, an extension using genome-wide data, addresses complex issues like incomplete lineage sorting and horizontal gene transfer, enhancing resolution in deep-time phylogenies. Despite challenges like long-branch attraction or rate heterogeneity, ongoing advances in sequencing and computation continue to refine its accuracy and scope.
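The parsimony criterion can be made concrete with Fitch's small-parsimony pass, which counts the minimum number of changes a fixed rooted binary tree needs to explain each alignment column. The Python sketch below represents trees as nested tuples, an illustrative choice. It scores given topologies only: a full maximum-parsimony analysis would also have to search over alternative trees.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for a single site.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: leaf name -> character state at this site.
    Returns (candidate state set for this node, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum of Fitch changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(length))


alignment = {"A": "AAG", "B": "AAG", "C": "GTG", "D": "GTC"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 5
```

Under the parsimony criterion, the first topology is preferred because it needs fewer changes.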

Fundamentals

Definition and Scope

Molecular phylogenetics is the branch of phylogenetics that reconstructs the evolutionary histories of organisms by analyzing sequences of DNA, RNA, or proteins, with a focus on shared derived characters known as synapomorphies at the molecular level. This approach identifies evolutionary relationships through homologous molecular sequences that reflect common ancestry. The resulting phylogenetic trees depict branching patterns and may be rooted (designating an ancestral node) or unrooted (emphasizing relative divergences without a specified root). Pioneering work by Émile Zuckerkandl and Linus Pauling laid the groundwork for using molecular data to document evolutionary history. The scope of molecular phylogenetics extends from the analysis of single genes to comprehensive comparisons of entire genomes, a field termed phylogenomics. It applies across diverse evolutionary scales, from population-level variation to deep-time divergences between ancient lineages. It integrates sequence homology to trace relationships at higher taxonomic ranks, forming a core component of molecular systematics and evolutionary biology. Classical phylogenetics depends on observable phenotypic traits and morphological similarities. Molecular phylogenetics, by contrast, relies on quantifiable rates of molecular change. An example is the neutral theory of molecular evolution introduced by Motoo Kimura, which posits that most molecular changes are due to random genetic drift rather than natural selection. This molecular basis provides unambiguous characters for phylogenetic analysis. Its key benefits include enhanced resolution for distinguishing closely related species and for elucidating ancient divergences where morphological evidence is limited or absent.

Molecular Data Sources

Molecular phylogenetics primarily utilizes sequences from DNA, RNA, and proteins as data sources to infer evolutionary relationships among organisms. DNA sequences, derived from nuclear, mitochondrial, or chloroplast genomes, serve as the most common molecular markers due to their abundance and variability across taxa. RNA sequences, particularly ribosomal RNA (rRNA), provide conserved regions suitable for broad comparative analyses, while protein sequences, translated from coding regions, offer amino acid-level resolution that captures functional constraints on evolution. These data types are selected based on the phylogenetic scale: nuclear DNA for genome-wide patterns, mitochondrial DNA for maternal lineages in animals, and chloroplast DNA for plant-specific histories. Key properties of these data sources influence their suitability for different evolutionary questions. DNA sequences exhibit high variability, particularly in non-coding regions and synonymous sites, making them ideal for resolving recent divergences where substitutions accumulate rapidly; for instance, mitochondrial DNA evolves 5-10 times faster than nuclear DNA, enabling fine-scale population studies. In contrast, protein sequences are more conserved due to selection pressures on amino acid functionality, providing robust signals for deep phylogenies spanning millions of years. RNA molecules, such as rRNA, balance conservation and variability through structured domains, with haplotypes or alleles serving as comparable units to trace lineage-specific inheritance patterns. These properties also inform evolutionary rate inferences, as variable DNA sites help calibrate molecular clocks for divergence timing. Acquisition of these molecular data has evolved from targeted methods to high-throughput approaches. Traditional polymerase chain reaction (PCR) amplification isolates specific loci from extracted genomic material, allowing precise sequencing of genes like rRNA or mitochondrial markers in limited samples. 
Modern next-generation sequencing (NGS) enables genome-scale data generation by parallelizing millions of reads, facilitating phylogenomic analyses with reduced bias and increased resolution for complex datasets. Sequence alignment is essential prior to analysis to identify homologous positions across taxa. Representative examples illustrate the application of these data sources. Cytochrome c, a highly conserved protein, was among the first used in early molecular studies to reconstruct eukaryotic phylogenies based on amino acid differences, revealing branching patterns consistent with classical taxonomy. The 16S rRNA gene, an RNA marker with conserved core regions, revolutionized microbial phylogenetics by enabling the classification of prokaryotes into domains, as demonstrated in foundational analyses that uncovered the Archaea. For species-level identification, the cytochrome c oxidase I (COI) gene from mitochondrial DNA acts as a DNA barcode, offering rapid discrimination due to its moderate mutation rate and universal primers. Several considerations affect the reliability of these data in phylogenetic inference. Homoplasy, the independent convergence or reversal of sequence states, can obscure true relationships, with rates varying by data type—higher in rapidly evolving DNA than in constrained proteins. Insertion-deletion mutations (indels) introduce gaps in alignments, potentially adding informative characters but complicating homology assessment if not properly coded. In RNA data, secondary structures formed by base pairing must be accounted for, as they influence substitution patterns and alignment accuracy in conserved regions like rRNA stems and loops.

Historical Development

Early Foundations (Pre-1980s)

The foundations of molecular phylogenetics emerged in the mid-20th century, driven by advances in biochemistry that allowed comparisons of molecular sequences to infer evolutionary relationships, shifting the focus from morphological traits to quantifiable genetic and protein data. In 1962, Émile Zuckerkandl and Linus Pauling introduced the concept of "molecular disease." They posited that mutations in protein sequences, akin to those causing genetic disorders like sickle cell anemia, could serve as markers of evolutionary change, linking molecular alterations directly to phylogenetic history. This idea laid the groundwork for using proteins as evolutionary documents, emphasizing how sequence variations accumulate over time to reflect divergence among species. Pioneering work by Zuckerkandl and Pauling in 1965 formalized the molecular clock hypothesis. It proposed that mutations in proteins and nucleic acids occur at a relatively constant rate, enabling the estimation of evolutionary timelines through sequence comparisons. They applied this to protein sequences such as hemoglobin to estimate primate phylogenies, suggesting that sequence differences could quantify divergence times and challenge traditional taxonomic hierarchies based on morphology. Their comparisons of primate hemoglobins highlighted closer relatedness among humans, chimpanzees, and gorillas than previously thought from morphology. Early techniques for molecular comparison included protein electrophoresis in the 1950s, which separated proteins based on charge and size to detect variations. Immunological methods, such as microcomplement fixation, measured antigenic distances between proteins from different species. These approaches provided initial quantitative estimates of genetic similarity without full sequencing. By the 1970s, DNA-DNA hybridization emerged as a key method. In this technique, the stability of hybrid DNA duplexes from different organisms indicated sequence similarity, allowing estimation of evolutionary rates for broader taxonomic groups. A landmark event was the development in 1967 by Walter M. Fitch and Emanuel Margoliash of a method using cytochrome c sequences from multiple species to construct phylogenetic trees, minimizing deviations between observed and inferred evolutionary distances. This approach demonstrated the feasibility of objective tree-building from molecular data, revealing patterns like equidistant divergence in vertebrates. That same year, Vincent Sarich and Allan C. Wilson published the first molecular phylogeny of primates using immunological distances from serum albumins. They estimated the human-chimpanzee divergence at about 5 million years ago, far more recent than morphological estimates, upending classical views of hominid evolution. This molecular turn enabled a conceptual shift from qualitative morphological assessments to quantitative evolutionary rates. Relatively constant rates, typically around 0.1% amino acid substitutions per site per million years for many proteins, provided a clock-like metric for timing splits, though calibrations varied by lineage and were informed by fossil evidence. These pre-1980s innovations established molecules as reliable phylogenetic tools, influencing later DNA sequencing efforts by proving the power of sequence data to resolve deep evolutionary questions.

Expansion and Modernization (1980s-Present)

The 1980s marked a pivotal era in molecular phylogenetics, driven by technological breakthroughs that enabled the routine generation of DNA sequence data. The invention of the polymerase chain reaction (PCR) in 1983 by Kary Mullis revolutionized nucleic acid amplification, allowing researchers to produce sufficient quantities of target DNA for sequencing and analysis from minimal starting material. Concurrently, Sanger sequencing, developed in 1977 by Frederick Sanger and colleagues, saw widespread adoption throughout the 1980s as automated versions became commercially available. This facilitated the sequencing of longer DNA fragments and enabled the first systematic comparisons of ribosomal RNA (rRNA) genes across diverse taxa. A landmark achievement came in 1986 when Carl Woese published an rRNA-based phylogenetic tree that underscored the deep evolutionary divergences among prokaryotes, laying the groundwork for recognizing Archaea as a distinct domain. By the late 1980s and into the 1990s, initial whole-genome comparisons emerged, such as those involving small viral and mitochondrial genomes, which provided early insights into genome-wide evolutionary patterns beyond single genes. The 2000s ushered in phylogenomics, characterized by the integration of multi-gene datasets to reconstruct more robust evolutionary histories. This shift was propelled by the completion of the Human Genome Project in 2003, which not only sequenced the entire human genome but also established infrastructure for high-throughput genomic analyses, enabling comparative studies across species. Seminal reviews highlighted how genome-scale data resolved longstanding ambiguities in animal and microbial phylogenies, such as the position of bilaterian lineages, by analyzing hundreds of orthologous genes simultaneously. These multi-locus approaches reduced stochastic errors from single-gene analyses and improved resolution of deep divergences, marking a transition from gene-centric to genome-wide inference. 
From the 2010s onward, next-generation sequencing (NGS) technologies, exemplified by Illumina's platform introduced in 2005 and commercialized in 2006, dramatically increased data throughput, allowing for metagenomic surveys of unculturable microbes and complex communities. This facilitated real-time phylogenetic tracking in viral epidemics; for instance, NGS enabled detailed reconstruction of HIV transmission networks by capturing intra-host diversity and inter-individual spread. Similarly, during the COVID-19 pandemic starting in 2020, NGS-driven phylogenetics allowed global monitoring of SARS-CoV-2 variants, revealing rapid evolutionary dynamics and informing public health responses through continuous genome surveillance. Post-2020, integration of machine learning has optimized tree search algorithms, using neural networks to predict optimal topologies from vast datasets and accelerate inference under complex models. These advancements precipitated paradigm shifts in the field, notably from single-gene phylogenies to genome-wide analyses, which better capture reticulate evolution but introduce challenges like incomplete lineage sorting (ILS). ILS occurs when ancestral polymorphisms persist through speciation events, leading to discordant gene trees; coalescent models, formalized in the multispecies coalescent framework, address this by modeling gene coalescence within species trees, enabling more accurate species-level inferences. This coalescent-based approach has become essential for resolving rapid radiations and hybridization events, transforming molecular phylogenetics into a data-rich, computationally intensive discipline.
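The effect of incomplete lineage sorting can be quantified in the simplest case of three species with species tree ((A,B),C) and one sampled gene copy per species. Let T be the length of the internal branch, measured in coalescent units (generations divided by 2N for a diploid population of size N). The A and B lineages fail to coalesce on that branch with probability e^(-T). In that case each of the three gene-tree topologies is equally likely, so the concordant topology has probability 1 - (2/3)e^(-T). The Python sketch below assumes neutral loci, no gene flow, and a constant population size on the internal branch. The function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(internal_branch):
    """Gene-tree topology probabilities for species tree ((A,B),C).

    internal_branch: length of the branch ancestral to A and B, in coalescent
    units (generations divided by 2N for a diploid population of size N).
    Returns (P[((A,B),C)], P[((A,C),B)], P[((B,C),A)]).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on that branch.
    no_coalescence = math.exp(-internal_branch)
    # If they fail, all three lineages meet in the ancestral population,
    # where each of the three topologies is equally likely.
    discordant = no_coalescence / 3
    return 1 - 2 * discordant, discordant, discordant


for t in (0.1, 0.5, 2.0):
    print(t, three_taxon_gene_tree_probs(t))
```

Short internal branches, typical of rapid radiations, give many discordant gene trees. This is why coalescent-based methods model gene trees within the species tree instead of assuming they match.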

Theoretical Foundations

Principles of Phylogenetic Inference

Molecular phylogenetics reconstructs evolutionary histories by inferring hypotheses of ancestor-descendant relationships among organisms using molecular data, such as DNA sequences. These relationships are typically represented as phylogenetic trees, which are branching diagrams illustrating the divergence of lineages from common ancestors. Cladograms depict these relationships without indicating the amount of evolutionary change along branches, focusing solely on the topology of splits, while phylograms incorporate branch lengths proportional to the extent of change, such as nucleotide substitutions. The primary goal of phylogenetic inference is to identify the tree topology that best explains the observed molecular data. Two common criteria are maximum parsimony, which favors the tree requiring the fewest evolutionary changes, and maximum likelihood, which selects the tree that maximizes the probability of the data under specified evolutionary processes. To establish directionality in these trees, outgroup rooting is employed. A distantly related taxon (the outgroup) is included to identify the root, thereby orienting the tree relative to the ingroup of interest and distinguishing ancestral from derived states. Central to this process are principles distinguishing homology (similarities in molecular characters due to shared ancestry) from homoplasy, which arises from independent evolution via convergence, parallelism, or reversal. In molecular data, character state changes are evaluated to trace evolutionary transformations while minimizing homoplasy. These include transitions (purine-to-purine or pyrimidine-to-pyrimidine substitutions) and transversions (purine-to-pyrimidine or pyrimidine-to-purine changes). Confidence in inferred trees is assessed through bootstrap resampling, a statistical technique that repeatedly samples the data with replacement to generate pseudoreplicates. The proportion of pseudoreplicates supporting a particular branch indicates its robustness. 
Phylogenetic inference operates within a statistical framework that treats tree topologies as testable hypotheses, allowing comparisons of alternative arrangements through metrics like parsimony scores or likelihood values to determine the most supported evolutionary scenario. Monophyly—the condition where a group comprises a common ancestor and all its descendants—is rigorously tested using molecular synapomorphies, which are shared derived character states (e.g., unique sequence motifs) that corroborate the group's unity and distinguish it from outgroups. These analyses presuppose common descent, the shared ancestry of the taxa under study, and gradualism in molecular evolution, where changes accumulate incrementally over time rather than in abrupt shifts, enabling the reliable reconstruction of branching patterns from sequence divergences.
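Once a rooted tree has been inferred, testing whether a group is monophyletic reduces to asking whether its members are exactly the complete set of descendant leaves of some node. The Python sketch below represents the tree as nested tuples, an illustrative choice, and all function names are hypothetical. The example topology is only for demonstration. The sketch does not search for synapomorphies: it checks a group against a tree that is already given.

```python
def leaf_set(tree):
    """All leaf names below a node of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Every clade in the tree: each node's complete set of descendant leaves."""
    found = {leaf_set(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def is_monophyletic(tree, taxa):
    """True if the taxa are exactly one ancestor's full set of descendants."""
    return frozenset(taxa) in clades(tree)


tree = ((("human", "chimp"), "gorilla"), "orangutan")
print(is_monophyletic(tree, {"human", "chimp"}))    # True
print(is_monophyletic(tree, {"human", "gorilla"}))  # False
```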

Evolutionary Substitution Models

Evolutionary substitution models describe the probabilistic processes by which nucleotides or amino acids change along phylogenetic branches. They provide the mathematical foundation for correcting observed differences in sequences to infer evolutionary distances and likelihoods under a given tree topology. These models assume a Markov process in which the substitution rate depends only on the current state, enabling the computation of transition probabilities between character states over evolutionary time. The simplest model is the Jukes-Cantor (JC69) model, which posits equal rates of substitution among all four nucleotides and equal stationary frequencies of 0.25 for each base. Under this one-parameter model, the evolutionary distance $d$ between two sequences is estimated from the proportion $p$ of observed differences as

$$d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right),$$

which corrects for multiple substitutions at the same site. This distance estimates the expected number of substitutions per site, including multiple and back substitutions that the raw proportion $p$ cannot detect. The JC69 model serves as a baseline for more complex scenarios but underperforms when substitution rates vary. The Kimura two-parameter (K80) model extends JC69 by distinguishing between transitions (purine-to-purine or pyrimidine-to-pyrimidine changes) and transversions (purine-to-pyrimidine or vice versa). Transitions occur at a higher rate $\alpha$ and transversions at rate $\beta$. Let $P$ be the proportion of transitional differences and $Q$ the proportion of transversional differences; the evolutionary distance $K$ is then

$$K = -\frac{1}{2} \ln\left( (1 - 2P - Q) \sqrt{1 - 2Q} \right).$$
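The JC69 and K80 distance formulas translate directly into code. The Python sketch below also classifies each difference between two aligned DNA sequences as a transition or a transversion to obtain $P$ and $Q$. The function names are hypothetical. Skipping gaps and ambiguity codes is an assumption, and the example sequences are made up.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def difference_proportions(seq_a, seq_b):
    """Return (P, Q): proportions of transitional and transversional differences.

    Only sites where both sequences carry A, C, G, or T are counted; gaps and
    ambiguity codes are skipped (an assumption of this sketch).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def jc69_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(P, Q):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("K80 distance is undefined for these proportions")
    return -0.5 * math.log(a * math.sqrt(b))


P, Q = difference_proportions("ACGTACGTAC", "GCGTACATAA")
print(P, Q)  # 0.2 0.1
print(jc69_distance(P + Q), k80_distance(P, Q))
```

As a sanity check, when the differences follow the JC69 pattern ($P = p/3$, $Q = 2p/3$), the K80 expression reduces exactly to the JC69 one.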