Statistical distance
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects. These can be two random variables, two probability distributions, or two samples; the distance can also be between an individual sample point and a population or a wider sample of points.
A distance between populations can be interpreted as a distance between two probability distributions, so such distances are essentially distances between probability measures. Statistical distances between random variables are different: the variables may be statistically dependent,[1] so these distances are not directly related to distances between probability measures. A measure of distance between random variables may instead reflect the extent of dependence between them rather than their individual values.
Many statistical distance measures are not metrics, and some are not symmetric. Some types of distance measures, which generalize squared distance, are referred to as (statistical) divergences.
Terminology
Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be used inconsistently between authors and over time, either loosely or with precise technical meaning. In addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and divergence, as well as others such as contrast function and metric. Terms from information theory include cross entropy, relative entropy, discrimination information, and information gain.
Distances as metrics
Metrics
A metric on a set X is a function (called the distance function or simply distance) d : X × X → R+ (where R+ is the set of non-negative real numbers). For all x, y, z in X, this function is required to satisfy the following conditions:
1. d(x, y) ≥ 0 (non-negativity)
2. d(x, y) = 0 if and only if x = y (identity of indiscernibles; conditions 1 and 2 together give positive definiteness)
3. d(x, y) = d(y, x) (symmetry)
4. d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality)
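The four conditions can be checked by brute force on a small finite set. The following is a minimal sketch rather than a general-purpose tool: the helper name `check_metric_axioms`, the tolerance, and the sample points are all assumptions made for illustration. It compares the absolute difference, which is a metric on the real line, with the squared difference, which fails the triangle inequality.

```python
from itertools import product


def check_metric_axioms(points, d, tol=1e-12):
    """Report which metric axioms d satisfies on the finite set `points`.

    Hypothetical helper for illustration; it only tests the given points,
    so a passing result is evidence, not a proof.
    """
    results = {
        "non_negativity": True,
        "identity": True,
        "symmetry": True,
        "triangle": True,
    }
    for x, y in product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:
            results["non_negativity"] = False
        # d(x, y) should be zero exactly when x and y coincide.
        if (abs(dxy) <= tol) != (x == y):
            results["identity"] = False
        if abs(dxy - d(y, x)) > tol:
            results["symmetry"] = False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            results["triangle"] = False
    return results


points = [0.0, 1.0, 2.0, 3.5]
absolute = check_metric_axioms(points, lambda x, y: abs(x - y))
squared = check_metric_axioms(points, lambda x, y: (x - y) ** 2)

print(absolute)  # every axiom holds on these points
print(squared)   # triangle inequality fails, e.g. 4 > 1 + 1 for 0, 1, 2
```

The squared difference still satisfies non-negativity and identity of indiscernibles, which is why squared distances are the usual model for divergences.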
Generalized metrics
Many statistical distances are not metrics, because they lack one or more properties of proper metrics. For example, pseudometrics violate property (2), identity of indiscernibles; quasimetrics violate property (3), symmetry; and semimetrics violate property (4), the triangle inequality. Statistical distances that satisfy (1) and (2) are referred to as divergences.
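As a concrete illustration of a divergence that is not a metric, the sketch below computes the Kullback–Leibler divergence in both directions for two small discrete distributions. The function name `kl_divergence`, the list-based representation, and the example probabilities are assumptions chosen for the example. The result is non-negative and zero for identical inputs, but it is not symmetric.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists.

    Illustrative helper: returns infinity when p puts mass where q has none.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kl_divergence(p, q)  # roughly 0.51
kl_qp = kl_divergence(q, p)  # roughly 0.37

print(kl_pq, kl_qp)         # the two directions disagree
print(kl_divergence(p, p))  # 0.0 for identical distributions
```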
Statistically close
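The total variation distance of two distributions $X$ and $Y$ over a finite domain $D$ (often referred to as statistical difference[2] or statistical distance[3] in cryptography) is defined as
$$\Delta(X, Y) = \frac{1}{2} \sum_{\alpha \in D} \left| \Pr[X = \alpha] - \Pr[Y = \alpha] \right|.$$
We say that two probability ensembles $\{X_k\}_{k \in \mathbb{N}}$ and $\{Y_k\}_{k \in \mathbb{N}}$ are statistically close if $\Delta(X_k, Y_k)$ is a negligible function in $k$.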
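For a finite domain, the total variation distance is half the sum of the absolute differences between the two probability mass functions. The sketch below assumes distributions are stored as dictionaries mapping outcomes to probabilities; the function name `total_variation` and the coin example are illustrative choices, not taken from the cited sources.

```python
def total_variation(p, q):
    """Total variation distance between two finite distributions.

    p and q are dicts mapping outcome -> probability; outcomes missing
    from one dict are treated as having probability zero.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


fair = {"H": 0.5, "T": 0.5}
biased = {"H": 0.75, "T": 0.25}

print(total_variation(fair, biased))  # 0.25
print(total_variation(fair, fair))    # 0.0
```

Two distributions with disjoint supports are at distance 1, the maximum possible value.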
Examples
Metrics
- Total variation distance (sometimes just called "the" statistical distance)
- Hellinger distance
- Lévy–Prokhorov metric
- Wasserstein metric: also known as the Kantorovich metric, or earth mover's distance
- Mahalanobis distance
- Integral probability metrics generalize several metrics or pseudometrics on distributions
Divergences
- Kullback–Leibler divergence
- Rényi divergence
- Jensen–Shannon divergence
- Ball divergence
- Bhattacharyya distance (despite its name it is not a distance, as it violates the triangle inequality)
- f-divergence: generalizes several distances and divergences
- Discriminability index, specifically the Bayes discriminability index, is a positive-definite symmetric measure of the overlap of two distributions.
Notes
[edit]- ^ Dodge, Y. (2003)—entry for distance
- ^ Goldreich, Oded (2001). Foundations of Cryptography: Basic Tools (1st ed.). Berlin: Cambridge University Press. p. 106. ISBN 0-521-79172-3.
- ^ Reyzin, Leo. (Lecture Notes) Extractors and the Leftover Hash Lemma
References
- Dodge, Y. (2003). Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9.
Statistical distance
Fundamentals
Definition
A statistical distance is a non-negative function $d(P, Q)$ that quantifies the difference between two probability distributions $P$ and $Q$ defined on the same measurable space, satisfying $d(P, P) = 0$ for any distribution $P$, and often $d(P, Q) > 0$ whenever $P \neq Q$.[3] It captures how dissimilar the distributions are in their probabilistic behavior, providing a way to compare random variables or samples drawn from them.[3]
Formally, $P$ and $Q$ are probability measures on a measurable space $(\Omega, \mathcal{F})$, where $\Omega$ is the sample space and $\mathcal{F}$ is a $\sigma$-algebra. For distributions admitting density functions $p$ and $q$ with respect to a dominating measure $\mu$ (such as Lebesgue measure for continuous cases or counting measure for discrete cases), statistical distances are frequently expressed through integrals or sums that aggregate pointwise differences between these densities. In the continuous setting, a common general form involves expressions like $\int \phi(p(x), q(x)) \, d\mu(x)$ for some pointwise comparison function $\phi$, while for discrete distributions over a countable space it takes the form $\sum_{x} \phi(p(x), q(x))$.[3] These formulations establish the basic mathematical structure and, for many distances, do not require the distributions to share the same support.[3]
The concept of statistical distance originated in early 20th-century probability theory, with formal developments in the 1930s in the context of weak convergence of probability measures. Specific distances discussed later are defined within this framework of probability measures on a measurable space. Some statistical distances are metrics, additionally satisfying axioms such as symmetry and the triangle inequality.[3]
Terminology
The term "statistical distance" serves as an umbrella designation for a broad class of quantitative measures of dissimilarity between two probability distributions, encompassing both metric and non-metric forms.[4] It is often used interchangeably with "probability distance" or "distributional distance" in the literature on probabilistic comparisons, though the latter may emphasize applications to random variables or samples drawn from distributions.[5] Specific subclasses, such as f-divergences, Bregman divergences, and integral probability metrics, fall under this umbrella but are distinguished by their construction: f-divergences are generated by convex functions and include measures such as the Kullback–Leibler divergence, while Bregman divergences arise from convex potential functions and are prevalent in optimization. Within these subclasses, f-divergences provide a framework for directed dissimilarities, while integral probability metrics cover kernel-based dissimilarities, among others.
Statistical distances may be symmetric, satisfying $d(P, Q) = d(Q, P)$ for distributions $P$ and $Q$, as in the total variation distance, or asymmetric, where the measure is directed and in general $d(P, Q) \neq d(Q, P)$, as with divergences that quantify information loss in one direction.[5] Asymmetric forms, often termed directed divergences, are crucial in scenarios requiring orientation, such as model approximation.[4] A common notation is $d(P, Q)$ for symmetric distances and $D(P \| Q)$ for asymmetric divergences, with the double vertical bar emphasizing directionality. This convention, popularized in information-theoretic works, helps distinguish metric-like properties from broader dissimilarity assessments.
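To make the idea of a divergence "generated by a convex function" concrete, the sketch below evaluates a discrete f-divergence as the sum of q(x) f(p(x)/q(x)) for two choices of generator. The names `f_divergence`, `kl_generator`, and `tv_generator` are hypothetical, and the code assumes every probability in the second argument is strictly positive. With f(t) = t log t the result is the Kullback–Leibler divergence, which is directed; with f(t) = |t − 1| / 2 it is the total variation distance, which is symmetric.

```python
import math


def f_divergence(p, q, f):
    """Discrete f-divergence: sum over x of q(x) * f(p(x) / q(x)).

    Illustrative sketch; assumes p and q are equal-length lists and
    that every entry of q is strictly positive.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    # f(t) = t log t, with the usual convention f(0) = 0.
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t):
    # f(t) = |t - 1| / 2 recovers the total variation distance.
    return 0.5 * abs(t - 1.0)


p = [0.5, 0.5]
q = [0.75, 0.25]

kl_pq = f_divergence(p, q, kl_generator)
kl_qp = f_divergence(q, p, kl_generator)
tv_pq = f_divergence(p, q, tv_generator)
tv_qp = f_divergence(q, p, tv_generator)

print(kl_pq, kl_qp)  # differ: the KL divergence is directed
print(tv_pq, tv_qp)  # agree up to rounding: total variation is symmetric
```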
The terminology evolved from "metric" in early 20th-century literature, which strictly implied satisfaction of the triangle inequality (e.g., the Hellinger metric in 1909), to the more inclusive "distance" after the 1950s, accommodating non-metric and asymmetric measures influenced by information theory, such as the Kullback–Leibler divergence introduced in 1951. This shift reflected the growing recognition of directed measures in statistical inference and hypothesis testing.
Properties
Metrics
In the context of statistical distances, a distance function $d$ between probability measures qualifies as a metric if it satisfies the standard axioms of a metric space, applied to the space of probability measures. These axioms ensure that $d$ provides a consistent notion of separation between distributions, enabling the application of geometric and topological tools. Specifically, for any probability measures $P$, $Q$, and $R$ on a measurable space, the axioms are:
- Non-negativity: $d(P, Q) \geq 0$, reflecting that distances are never negative.
- Identity of indiscernibles: $d(P, Q) = 0$ if and only if $P = Q$ almost everywhere, ensuring that only identical distributions have zero distance.
- Symmetry: $d(P, Q) = d(Q, P)$, meaning the distance is invariant under reversal of arguments.
- Triangle inequality: $d(P, R) \leq d(P, Q) + d(Q, R)$, which bounds the direct distance by paths through intermediate measures.
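As a worked check of the last axiom, consider the total variation distance between discrete distributions. The notation here is assumed for the example: $P$, $Q$, and $R$ are probability measures on a countable space with mass functions $p$, $q$, and $r$, and $d_{\mathrm{TV}}$ denotes half the sum of absolute differences. The triangle inequality then follows from the ordinary triangle inequality for real numbers, applied at each point and summed:

```latex
\begin{aligned}
d_{\mathrm{TV}}(P, R)
  &= \frac{1}{2} \sum_{x} \bigl| p(x) - r(x) \bigr| \\
  &= \frac{1}{2} \sum_{x} \bigl| \bigl(p(x) - q(x)\bigr) + \bigl(q(x) - r(x)\bigr) \bigr| \\
  &\le \frac{1}{2} \sum_{x} \bigl| p(x) - q(x) \bigr|
     + \frac{1}{2} \sum_{x} \bigl| q(x) - r(x) \bigr| \\
  &= d_{\mathrm{TV}}(P, Q) + d_{\mathrm{TV}}(Q, R).
\end{aligned}
```

Non-negativity and symmetry are immediate from the absolute value, and the sum is zero only when $p(x) = q(x)$ for every $x$, so all four axioms hold in this case.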