UniFrac

UniFrac, short for unique fraction metric, is a distance metric used for comparing biological communities. It differs from dissimilarity measures such as Bray-Curtis dissimilarity in that it accounts for the relatedness of community members by incorporating the phylogenetic distances between observed organisms into the computation.

Both weighted (quantitative) and unweighted (qualitative) variants of UniFrac^[1] are widely used in microbial ecology: the former accounts for the abundance of observed organisms, while the latter considers only their presence or absence. The method was devised in 2005 by Catherine Lozupone while she was working with Rob Knight^[2] of the University of Colorado at Boulder.^[3]^[4]

Research methods

The distance is calculated between pairs of samples, where each sample represents an organismal community. All taxa found in one or both samples are placed on a phylogenetic tree. A branch leading to taxa from both samples is marked as "shared", and a branch leading to taxa that appear in only one sample is marked as "unshared". The distance between the two samples is then calculated as:

$$\frac{\text{sum of unshared branch lengths}}{\text{sum of all tree branch lengths}} = \text{fraction of total branch length that is unshared}$$
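The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy tree, its branch lengths, the list-of-branches representation, and the function name `unweighted_unifrac` are all assumptions made for the example.

```python
# Hypothetical toy tree ((A:1,B:2):0.5,(C:1.5,D:1):0.5), stored as a list of
# branches: (branch length, set of tip taxa that descend from that branch).
BRANCHES = [
    (1.0, {"A"}),
    (2.0, {"B"}),
    (0.5, {"A", "B"}),
    (1.5, {"C"}),
    (1.0, {"D"}),
    (0.5, {"C", "D"}),
]


def unweighted_unifrac(branches, sample1, sample2):
    """Return unshared branch length divided by total observed branch length.

    sample1 and sample2 are sets of taxon names (presence/absence only).
    """
    unshared = 0.0
    total = 0.0
    for length, tips in branches:
        in1 = bool(tips & sample1)  # branch leads to a taxon seen in sample 1
        in2 = bool(tips & sample2)  # branch leads to a taxon seen in sample 2
        if not (in1 or in2):
            continue  # branch leads to no observed taxon, so it is not on the pair's tree
        total += length
        if in1 != in2:  # reached from exactly one sample: unshared
            unshared += length
    return unshared / total if total else 0.0


# Branches to A, C and the C/D ancestor are unshared: 3.0 of 5.5, about 0.545.
print(unweighted_unifrac(BRANCHES, {"A", "B"}, {"B", "C"}))
```

In this sketch, two samples with the same taxa give 0, and two samples with no taxa in common give 1, because the toy tree has no root branch shared by everything.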

This definition satisfies the requirements of a distance metric: it is non-negative, zero only when the two communities are identical, symmetric, and conforms to the triangle inequality.

Figure: Three examples of the triangle inequality for triangles with sides of lengths $x$, $y$, $z$. The top example shows a case where $z$ is much less than the sum $x + y$ of the other two sides, and the bottom example shows a case where the side $z$ is only slightly less than $x + y$.

When there are more than two samples, a distance matrix can be built by constructing a tree for each pair of samples and computing its UniFrac distance. Standard multivariate statistical methods such as clustering and principal coordinates analysis can then be applied to the matrix.
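As a rough sketch of the pairwise step, the following Python builds a symmetric distance matrix for three samples. The toy tree, the sample names (`gut`, `soil`, `water`), and the helper names are hypothetical and serve only to illustrate the idea; the clustering or ordination step is left out.

```python
from itertools import combinations

# Hypothetical toy tree ((A:1,B:2):0.5,(C:1.5,D:1):0.5) as
# (branch length, set of descendant tip taxa) pairs.
BRANCHES = [
    (1.0, {"A"}), (2.0, {"B"}), (0.5, {"A", "B"}),
    (1.5, {"C"}), (1.0, {"D"}), (0.5, {"C", "D"}),
]


def unweighted_unifrac(branches, sample1, sample2):
    """Unshared branch length over total branch length for one pair of samples."""
    unshared = total = 0.0
    for length, tips in branches:
        in1, in2 = bool(tips & sample1), bool(tips & sample2)
        if in1 or in2:  # only branches on the tree of this pair's taxa count
            total += length
            if in1 != in2:
                unshared += length
    return unshared / total if total else 0.0


def distance_matrix(branches, samples):
    """Return (sorted sample names, nested dict of pairwise distances)."""
    names = sorted(samples)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = unweighted_unifrac(branches, samples[a], samples[b])
        matrix[a][b] = matrix[b][a] = d  # the distance is symmetric
    return names, matrix


# Hypothetical samples, each given as the set of taxa observed in it.
SAMPLES = {"gut": {"A", "B"}, "soil": {"B", "C"}, "water": {"C", "D"}}
names, matrix = distance_matrix(BRANCHES, SAMPLES)
for a in names:
    print(a, [round(matrix[a][b], 3) for b in names])
```

The resulting matrix is the kind of input that clustering or principal coordinates analysis routines typically expect.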

The statistical significance of the UniFrac distance between two samples can be assessed using Monte Carlo simulations. Randomizing the sample classification of each taxon on the tree (leaving the branch structure unchanged) and recomputing the distance many times yields a distribution of UniFrac values under random assignment. From this distribution, a p-value can be assigned to the actual distance between the samples.
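A minimal sketch of such a permutation test is shown below, assuming a hypothetical toy tree and presence/absence samples. The function names, the default of 999 permutations, the fixed seed, and the `(count + 1) / (n + 1)` p-value convention are choices made for this illustration rather than details taken from the original method.

```python
import random

# Hypothetical toy tree ((A:1,B:2):0.5,(C:1.5,D:1):0.5) as
# (branch length, set of descendant tip taxa) pairs.
BRANCHES = [
    (1.0, {"A"}), (2.0, {"B"}), (0.5, {"A", "B"}),
    (1.5, {"C"}), (1.0, {"D"}), (0.5, {"C", "D"}),
]


def unweighted_unifrac(branches, sample1, sample2):
    """Unshared branch length over total branch length for one pair of samples."""
    unshared = total = 0.0
    for length, tips in branches:
        in1, in2 = bool(tips & sample1), bool(tips & sample2)
        if in1 or in2:
            total += length
            if in1 != in2:
                unshared += length
    return unshared / total if total else 0.0


def permutation_p_value(branches, sample1, sample2, n_permutations=999, seed=0):
    """Shuffle sample labels across the observed taxa; the tree stays unchanged."""
    rng = random.Random(seed)
    taxa = sorted(sample1 | sample2)
    # Each taxon's label records which of the two samples it was observed in.
    labels = [(t in sample1, t in sample2) for t in taxa]
    observed = unweighted_unifrac(branches, sample1, sample2)
    at_least_as_large = 0
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        perm1 = {t for t, (in1, _) in zip(taxa, shuffled) if in1}
        perm2 = {t for t, (_, in2) in zip(taxa, shuffled) if in2}
        # Small tolerance so floating-point ties count as "at least as large".
        if unweighted_unifrac(branches, perm1, perm2) >= observed - 1e-12:
            at_least_as_large += 1
    # Counting the observed labelling itself keeps the p-value above zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


observed, p_value = permutation_p_value(BRANCHES, {"A", "B"}, {"C", "D"})
print(observed, p_value)
```

With only four taxa there are very few distinct labellings, so the p-value here cannot be small; real data sets have far more taxa and therefore a much finer null distribution.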

A weighted version of the UniFrac metric additionally accounts for the relative abundance of each taxon within the communities. It is commonly used in metagenomic studies, where the number of reads can be in the tens of thousands and it is appropriate to 'bin' the reads into operational taxonomic units (OTUs), which are then treated as taxa within the UniFrac framework.
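The text above does not state the weighted formula. The sketch below assumes the commonly cited raw (non-normalized) form, in which each branch length is multiplied by the absolute difference between the fractions of each sample's reads that descend from that branch. The toy tree, the read counts, and the function name are hypothetical.

```python
# Hypothetical toy tree ((A:1,B:2):0.5,(C:1.5,D:1):0.5) as
# (branch length, set of descendant tip taxa) pairs.
BRANCHES = [
    (1.0, {"A"}), (2.0, {"B"}), (0.5, {"A", "B"}),
    (1.5, {"C"}), (1.0, {"D"}), (0.5, {"C", "D"}),
]


def weighted_unifrac(branches, counts1, counts2):
    """Raw weighted UniFrac (assumed form): sum of length * |p1 - p2| over branches.

    counts1 and counts2 map taxon (OTU) names to read counts.
    """
    total1 = sum(counts1.values())
    total2 = sum(counts2.values())
    if total1 == 0 or total2 == 0:
        raise ValueError("each sample needs at least one read")
    distance = 0.0
    for length, tips in branches:
        # Fraction of each sample's reads that descend from this branch.
        p1 = sum(counts1.get(t, 0) for t in tips) / total1
        p2 = sum(counts2.get(t, 0) for t in tips) / total2
        distance += length * abs(p1 - p2)
    return distance


# Same taxa present in both comparisons, but different abundances.
print(weighted_unifrac(BRANCHES, {"A": 30, "B": 10}, {"B": 10, "C": 30}))
print(weighted_unifrac(BRANCHES, {"A": 10, "B": 30}, {"B": 30, "C": 10}))
```

The two comparisons have identical presence/absence patterns, so an unweighted calculation would score them the same; the weighted sketch gives a larger distance when the unshared taxa A and C dominate the reads.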

In 2012, a generalized UniFrac version,^[5] which unifies the weighted and unweighted UniFrac distance in a single framework, was proposed. The authors argued that the weighted and unweighted UniFrac distances place too much emphasis on either abundant lineages or rare lineages, respectively, leading to “loss of power when the important composition change occurs in moderately abundant lineages”. The generalized UniFrac distance aims to address this limitation by down-weighting the emphasis on abundant or rare lineages.
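The text above gives no formula for the generalized distance. The sketch below assumes the form usually attributed to reference [5]: each branch is weighted by its length times `(p1 + p2) ** alpha`, and the weighted average of `|p1 - p2| / (p1 + p2)` is returned. The parameter name `alpha`, its default of 0.5, the toy tree, and the read counts are assumptions for illustration, and the exact definition should be checked against the cited paper.

```python
# Hypothetical toy tree ((A:1,B:2):0.5,(C:1.5,D:1):0.5) as
# (branch length, set of descendant tip taxa) pairs.
BRANCHES = [
    (1.0, {"A"}), (2.0, {"B"}), (0.5, {"A", "B"}),
    (1.5, {"C"}), (1.0, {"D"}), (0.5, {"C", "D"}),
]


def generalized_unifrac(branches, counts1, counts2, alpha=0.5):
    """Assumed generalized UniFrac form with tuning parameter alpha in [0, 1]."""
    total1 = sum(counts1.values())
    total2 = sum(counts2.values())
    if total1 == 0 or total2 == 0:
        raise ValueError("each sample needs at least one read")
    numerator = denominator = 0.0
    for length, tips in branches:
        p1 = sum(counts1.get(t, 0) for t in tips) / total1
        p2 = sum(counts2.get(t, 0) for t in tips) / total2
        combined = p1 + p2
        if combined == 0:
            continue  # no reads from either sample descend from this branch
        weight = length * combined ** alpha
        numerator += weight * abs(p1 - p2) / combined
        denominator += weight
    return numerator / denominator if denominator else 0.0


counts1 = {"A": 30, "B": 10}
counts2 = {"B": 10, "C": 30}
for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(generalized_unifrac(BRANCHES, counts1, counts2, alpha), 4))
```

Under this assumed form, `alpha = 1` reduces to a normalized abundance-weighted distance, and smaller values of `alpha` give relatively more influence to branches that carry few reads.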

References

[1] Lozupone, C. A.; Hamady, M.; Kelley, S. T.; Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73 (5): 1576–1585. doi:10.1128/AEM.01996-06. PMC 1828774. PMID 17220268.
[2] Knight, Rob (2015). Follow Your Gut: The Enormous Impact of Tiny Microbes. Simon & Schuster/TED. p. 89. ISBN 978-1-4767-8475-5.
[3] Lozupone, C.; Knight, R. (2005). "UniFrac: A New Phylogenetic Method for Comparing Microbial Communities". Applied and Environmental Microbiology. 71 (12): 8228–8235. doi:10.1128/AEM.71.12.8228-8235.2005. PMC 1317376. PMID 16332807.
[4] Hamady, M.; Lozupone, C.; Knight, R. (2010). "Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data". The ISME Journal. 4 (1): 17–27. doi:10.1038/ismej.2009.97. PMC 2797552. PMID 19710709.
[5] Chen, J.; Bittinger, K.; Charlson, E. S.; Hoffmann, C.; Lewis, J.; Wu, G. D.; Collman, R. G.; Bushman, F. D.; Li, H. (2012). "Associating microbiome composition with environmental covariates using generalized UniFrac distances". Bioinformatics. 28 (16): 2106–2113. doi:10.1093/bioinformatics/bts342. PMC 3413390. PMID 22711789.

External links

UniFrac online Archived 2014-08-12 at the Wayback Machine
Knight Lab website
Description of UniFrac, with worked examples Archived 2019-10-24 at the Wayback Machine
