Mahalanobis distance
The Mahalanobis distance is a measure of the distance between a point and a probability distribution, introduced by P. C. Mahalanobis in 1936. The mathematical details first appeared that year in the Journal of The Asiatic Society of Bengal. Mahalanobis's definition was prompted by the problem of identifying the similarities of skulls based on measurements (the earliest work related to similarities of skulls dates from 1922, with later work from 1927). R. C. Bose later obtained the sampling distribution of the Mahalanobis distance under the assumption of equal dispersion.
It is a multivariate generalization of the square of the standard score $z = (x - \mu)/\sigma$: how many standard deviations away a point $\vec{x}$ is from the mean of a distribution $Q$. This distance is zero for $\vec{x}$ at the mean of $Q$ and grows as $\vec{x}$ moves away from the mean along each principal component axis. If the data are whitened, so that each of these axes is rescaled to unit variance and the components are uncorrelated, then the Mahalanobis distance corresponds to standard Euclidean distance in the transformed space. The Mahalanobis distance is thus unitless, scale-invariant, and takes into account the correlations of the data set.
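Given a probability distribution $Q$ on $\mathbb{R}^N$, with mean $\vec{\mu} = (\mu_1, \mu_2, \dots, \mu_N)^\mathsf{T}$ and positive semi-definite covariance matrix $\Sigma$ (assumed invertible so that $\Sigma^{-1}$ exists), the Mahalanobis distance of a point $\vec{x} = (x_1, x_2, \dots, x_N)^\mathsf{T}$ from $Q$ is

$$d_M(\vec{x}, Q) = \sqrt{(\vec{x} - \vec{\mu})^\mathsf{T} \Sigma^{-1} (\vec{x} - \vec{\mu})}.$$

Given two points $\vec{x}$ and $\vec{y}$ in $\mathbb{R}^N$, the Mahalanobis distance between them with respect to $Q$ is

$$d_M(\vec{x}, \vec{y}; Q) = \sqrt{(\vec{x} - \vec{y})^\mathsf{T} \Sigma^{-1} (\vec{x} - \vec{y})},$$

which means that $d_M(\vec{x}, Q) = d_M(\vec{x}, \vec{\mu}; Q)$.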
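The definition can be sketched numerically as follows. This is a minimal illustration, not a reference implementation: the function name `mahalanobis`, the example mean and covariance, and the choice to solve a linear system instead of inverting the covariance matrix are all assumptions made here, and the covariance is assumed to be invertible.

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Distance of point x from a distribution with mean mu and covariance cov.

    Hypothetical helper for illustration; cov is assumed to be invertible.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov @ z = diff rather than forming the explicit inverse.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])  # standard deviations 2 and 1, no correlation

# Both points are at Euclidean distance 2 from the mean.
d_first_axis = mahalanobis([2.0, 0.0], mu, cov)   # one standard deviation away
d_second_axis = mahalanobis([0.0, 2.0], mu, cov)  # two standard deviations away
d_at_mean = mahalanobis(mu, mu, cov)              # zero at the mean
```

The two test points are equally far from the mean in the Euclidean sense, but the second lies along the axis with the smaller variance and therefore has the larger Mahalanobis distance. Passing a second point in place of `mu` gives the two-point form of the distance.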
Since $\Sigma$ is positive semi-definite and, for the inverse to exist, invertible, it is positive definite, and so is $\Sigma^{-1}$; thus the square roots are always defined.
We can find useful decompositions of the squared Mahalanobis distance that help to explain some reasons for the outlyingness of multivariate observations and also provide a graphical tool for identifying outliers.
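One such decomposition, offered here as an illustration and not necessarily the one the text has in mind, uses the eigenvalues $\lambda_i$ and orthonormal eigenvectors $\vec{u}_i$ of the covariance matrix: the squared distance splits into one term per principal axis, $d_M^2 = \sum_i \big(\vec{u}_i^\mathsf{T}(\vec{x} - \vec{\mu})\big)^2 / \lambda_i$. The sketch below computes those terms; the function name and the example values are assumptions.

```python
import numpy as np

def squared_distance_by_component(x, mu, cov):
    """Split the squared Mahalanobis distance into one term per principal axis.

    Hypothetical helper for illustration; cov is assumed symmetric and invertible.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are the axes
    scores = eigvecs.T @ diff               # coordinates of diff along each axis
    return scores**2 / eigvals              # contributions; they sum to d^2

mu = np.zeros(2)
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])  # eigenvalues 1 and 3
x = np.array([1.0, -1.0])     # lies along the low-variance axis

contributions = squared_distance_by_component(x, mu, cov)
total = contributions.sum()
direct = (x - mu) @ np.linalg.solve(cov, x - mu)  # quadratic form, for comparison
```

A large contribution from a low-variance axis indicates that the observation is unusual because it breaks the correlation structure, not because any single coordinate is extreme.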
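By the spectral theorem, $\Sigma$ can be decomposed as $\Sigma = S S^\mathsf{T}$ for some real $N \times N$ matrix $S$. One choice for $S$ is the symmetric square root of $\Sigma$, which is the standard deviation matrix. This gives us the equivalent definition

$$d_M(\vec{x}, \vec{y}; Q) = \lVert S^{-1} (\vec{x} - \vec{y}) \rVert,$$

where $\lVert \cdot \rVert$ is the Euclidean norm. That is, the Mahalanobis distance is the Euclidean distance after a whitening transformation.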
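The equivalence with Euclidean distance after whitening can be checked numerically. The sketch below assumes a factor $S$ satisfying $\Sigma = S S^\mathsf{T}$ and compares two common choices, the symmetric square root and the lower-triangular Cholesky factor, against the quadratic-form definition. The example covariance and points are made up for illustration.

```python
import numpy as np

cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])
x = np.array([3.0, 0.0])
y = np.array([0.0, 1.0])
diff = x - y

# Reference value from the quadratic-form definition.
d_direct = np.sqrt(diff @ np.linalg.solve(cov, diff))

# Choice 1: symmetric square root S = U diag(sqrt(lambda)) U^T.
eigvals, eigvecs = np.linalg.eigh(cov)
S_sym = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T
d_sym = np.linalg.norm(np.linalg.solve(S_sym, diff))

# Choice 2: lower-triangular Cholesky factor, cov = L @ L.T.
L = np.linalg.cholesky(cov)
d_chol = np.linalg.norm(np.linalg.solve(L, diff))
```

Both factors whiten the difference vector differently, yet the resulting Euclidean norms agree with each other and with the direct computation. The Cholesky factor is often preferred in practice because triangular systems are cheap to solve, while the symmetric square root keeps the whitened coordinates aligned with the original axes as closely as possible.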
The existence of $S$ is guaranteed by the spectral theorem, but it is not unique. Different choices have different theoretical and practical advantages.
In practice, the distribution $Q$ is usually the sample distribution from a set of IID samples from an underlying unknown distribution, so $\vec{\mu}$ is the sample mean, and $\Sigma$ is the covariance matrix of the samples.
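A minimal sketch of this practical setting follows, assuming synthetic data drawn from a correlated two-dimensional Gaussian. The seed, sample size, true covariance, and helper name are all illustrative assumptions. The mean and covariance are estimated from the samples and then used to score two query points.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

mu_hat = samples.mean(axis=0)            # sample mean
cov_hat = np.cov(samples, rowvar=False)  # sample covariance (rows are observations)

def mahalanobis(x, mu, cov):
    """Hypothetical helper: distance of x from the estimated distribution."""
    diff = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Two points at the same Euclidean distance from the origin.
along = mahalanobis([2.0, 2.0], mu_hat, cov_hat)     # follows the correlation
against = mahalanobis([2.0, -2.0], mu_hat, cov_hat)  # goes against it
```

The point that goes against the positive correlation receives the larger distance, which is the behaviour that makes the measure useful for flagging multivariate outliers. With few samples or many dimensions the sample covariance can be singular or poorly conditioned, in which case the solve step above would fail or become unreliable.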