Robust measures of scale
In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. They are contrasted with conventional or non-robust measures of scale, such as the sample standard deviation, which are greatly influenced by outliers.
The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD). Alternative robust estimators have also been developed, such as those based on pairwise differences and the biweight midvariance.
These robust statistics are particularly used as estimators of a scale parameter, and have the advantages of both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation (it has a breakdown point of 0, as it can be contaminated by a single point), a defect that is not shared by robust statistics.
Note that, in domains such as finance, the assumption of normality may lead to excessive risk exposure, and that further parameterization may be needed to mitigate risks presented by abnormal kurtosis.
Approaches to estimation
Robust measures of scale can be used as estimators of properties of the population, either for parameter estimation or as estimators of their own expected value.
For example, robust estimators of scale are used to estimate the population standard deviation, generally by multiplying by a scale factor that makes them unbiased, consistent estimators; see scale parameter: estimation. For instance, the interquartile range becomes an unbiased, consistent estimator of the population standard deviation, if the data follow a normal distribution, when divided by $2\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 1.349$, where $\operatorname{erf}^{-1}$ is the inverse error function.
In other situations, it makes more sense to think of a robust measure of scale as an estimator of its own expected value, interpreted as an alternative to the population standard deviation as a measure of scale. For example, the median absolute deviation (MAD) of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.
Statistical efficiency
Robust estimators typically have inferior statistical efficiency compared to conventional estimators for data drawn from a distribution without outliers, such as a normal distribution. However, they have superior efficiency for data drawn from a mixture distribution or from a heavy-tailed distribution, for which non-robust measures such as the standard deviation should not be used.
For example, for data drawn from the normal distribution, the median absolute deviation is 37% as efficient as the sample standard deviation, while the Rousseeuw–Croux estimator Qn is 82% as efficient as the sample standard deviation.
Common robust estimators
One of the most common robust measures of scale is the interquartile range (IQR), the difference between the 75th percentile and the 25th percentile of a sample; this is the 25% trimmed range, an example of an L-estimator. Other trimmed ranges, such as the interdecile range (10% trimmed range), can also be used.
For a Gaussian distribution, the IQR is related to the standard deviation $\sigma$ as:[1]
$$\text{IQR} = 2\,\Phi^{-1}(0.75)\,\sigma \approx 1.349\,\sigma,$$
where $\Phi^{-1}$ is the quantile function of the standard normal distribution.
Another commonly used robust measure of scale is the median absolute deviation (MAD), the median of the absolute values of the differences between the data values and the overall median of the data set. For a Gaussian distribution, the MAD is related to $\sigma$ as:[2]
$$\text{MAD} = \Phi^{-1}(3/4)\,\sigma \approx 0.6745\,\sigma,$$
so that $\sigma \approx 1.4826\,\text{MAD}$. For details, see the section on the relation to the standard deviation in the main article on the MAD.
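As a quick numerical check of these relations, the following R sketch (with an arbitrary sample size, seed, and true σ = 2) compares the rescaled IQR and the MAD, as returned by base R, with the sample standard deviation on simulated normal data:

```r
# Numerical check of the Gaussian consistency factors for the IQR and MAD.
set.seed(1)
x <- rnorm(1e5, mean = 0, sd = 2)   # true sigma = 2

IQR(x) / (2 * qnorm(0.75))   # rescaled IQR, approximately 2
mad(x)                       # R's mad() already multiplies by 1.4826; approximately 2
sd(x)                        # classical estimate, approximately 2
```

Note that 2 * qnorm(0.75) ≈ 1.349 is the same constant as above.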
Sn and Qn
Rousseeuw and Croux[3] proposed two alternatives to the median absolute deviation, motivated by two of its weaknesses:
- It is inefficient (37% efficiency) at Gaussian distributions.
- It computes a statistic that is symmetric about a location estimate, and so does not deal with skewness.
They propose two alternative statistics based on pairwise differences: Sn and Qn.
Sn is defined as
$$S_n = 1.1926\,\operatorname{med}_i \left\{ \operatorname{med}_j\,|x_i - x_j| \right\}.$$
Qn is defined as:[4]
$$Q_n = 2.2219\,\left\{ |x_i - x_j| : i < j \right\}_{(k)},$$
where:
- the factor 2.2219 is a consistency constant,
- the set $\{ |x_i - x_j| : i < j \}$ consists of all pairwise absolute differences between the observations $x_i$ and $x_j$, and
- the subscript $(k)$ denotes the $k$-th order statistic of this set, where $k = \binom{h}{2}$ with $h = \lfloor n/2 \rfloor + 1$, roughly the first quartile of the pairwise differences.
These can be computed in O(n log n) time and O(n) space.
Neither of these requires location estimation, as they are based only on differences between values. They are both more efficient than the MAD under a Gaussian distribution: Sn is 58% efficient, while Qn is 82% efficient.
For a sample from a normal distribution, Sn is approximately unbiased for the population standard deviation even down to very modest sample sizes (<1% bias for n = 10).
For a large sample from a normal distribution, 2.22Qn is approximately unbiased for the population standard deviation. For small or moderate samples, the expected value of Qn under a normal distribution depends markedly on the sample size, so finite-sample correction factors (obtained from a table or from simulations) are used to calibrate the scale of Qn.
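For illustration, both estimators are available in R's robustbase package as Sn() and Qn(); a minimal sketch (with an arbitrary sample and contamination scheme) shows that they stay close to σ = 1 while the standard deviation does not:

```r
# Sn and Qn versus the sample standard deviation under gross contamination.
library(robustbase)   # provides Sn() and Qn()
set.seed(1)
x <- rnorm(1000)                    # clean standard normal data
x_out <- c(x, rep(100, 100))        # append roughly 9% gross outliers

c(sd = sd(x), Sn = Sn(x), Qn = Qn(x))               # all close to 1
c(sd = sd(x_out), Sn = Sn(x_out), Qn = Qn(x_out))   # sd explodes; Sn and Qn barely move
```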
The biweight midvariance
Like Sn and Qn, the biweight midvariance is intended to be robust without sacrificing too much efficiency. It is defined as:[5]
$$\hat{\sigma}^2_{\text{bw}} = \frac{n \sum_i (x_i - Q)^2 (1 - u_i^2)^4\, I(|u_i| < 1)}{\left( \sum_i (1 - u_i^2)(1 - 5u_i^2)\, I(|u_i| < 1) \right)^2},$$
where I is the indicator function, Q is the sample median of the $x_i$, and
$$u_i = \frac{x_i - Q}{9\,\text{MAD}}.$$
Its square root is a robust estimator of scale, since data points are downweighted as their distance from the median increases, with points more than 9 MAD units from the median having no influence at all.
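A direct transcription of this definition into R might look as follows; this is a minimal sketch (the helper name biweight_midvariance is ours, and the unscaled MAD is obtained with mad(..., constant = 1)):

```r
# Sketch of the biweight midvariance as defined above; its square root is a
# robust scale estimate.
biweight_midvariance <- function(x, c = 9) {
  Q    <- median(x)
  MAD  <- mad(x, constant = 1)        # unscaled median absolute deviation
  u    <- (x - Q) / (c * MAD)
  keep <- abs(u) < 1                  # points beyond c * MAD from the median get weight zero
  num  <- length(x) * sum((x[keep] - Q)^2 * (1 - u[keep]^2)^4)
  den  <- sum((1 - u[keep]^2) * (1 - 5 * u[keep]^2))^2
  num / den
}

set.seed(1)
x <- c(rnorm(100), 50)                # one gross outlier
sqrt(biweight_midvariance(x))         # close to 1, unlike sd(x)
```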
The biweight's efficiency has been estimated at around 84.7% for sets of 20 samples drawn from synthetically generated distributions with added excess kurtosis ("stretched tails"). For Gaussian distributions, its efficiency has been estimated at 98.2%.[6]
Location-scale depth
Mizera and Müller extended the approach offered by Rousseeuw and Hubert by proposing a robust depth-based estimator for location and scale simultaneously, called location-scale depth.[7] Its definition assigns a depth to each candidate location-scale pair through quantities that depend on a fixed density.
They suggest that the most tractable version of location-scale depth is the one based on Student's t-distribution.
Confidence intervals
A robust confidence interval is a robust modification of confidence intervals, meaning that the non-robust calculations of the confidence interval are modified so that they are not badly affected by outlying or aberrant observations in a data set.
Example
In the process of weighing 1000 objects, under practical conditions, it is easy to believe that the operator might make a mistake in procedure and so report an incorrect mass (thereby making one type of systematic error). Suppose there were 100 objects and the operator weighed them all, one at a time, and repeated the whole process ten times. Then the operator can calculate a sample standard deviation for each object, and look for outliers. Any object with an unusually large standard deviation probably has an outlier in its data. These can be removed by various non-parametric techniques.

If the operator repeated the process only three times, simply taking the median of the three measurements and using σ would give a confidence interval. The 200 extra weighings served only to detect and correct for operator error and did nothing to improve the confidence interval. With more repetitions, one could use a truncated mean, discarding the largest and smallest values and averaging the rest. A bootstrap calculation could be used to determine a confidence interval narrower than that calculated from σ, and so obtain some benefit from a large amount of extra work.
These procedures are robust against procedural errors which are not modeled by the assumption that the balance has a fixed known standard deviation σ. In practical applications where the occasional operator error can occur, or the balance can malfunction, the assumptions behind simple statistical calculations cannot be taken for granted. Before trusting the results of 100 objects weighed just three times each to have confidence intervals calculated from σ, it is necessary to test for and remove a reasonable number of outliers (testing the assumption that the operator is careful and correcting for the fact that he is not perfect), and to test the assumption that the data really have a normal distribution with standard deviation σ.
Computer simulation
The theoretical analysis of such an experiment is complicated, but it is easy to set up a spreadsheet which draws random numbers from a normal distribution with standard deviation σ to simulate the situation; this can be done in Microsoft Excel using =NORMINV(RAND(),0,σ), as discussed in [8], and the same techniques can be used in other spreadsheet programs such as OpenOffice.org Calc and Gnumeric.
After removing obvious outliers, one could subtract the median from the other two values for each object, and examine the distribution of the 200 resulting numbers. It should be normal with mean near zero and standard deviation a little larger than σ. A simple Monte Carlo spreadsheet calculation would reveal typical values for the standard deviation (around 105 to 115% of σ). Or, one could subtract the mean of each triplet from the values, and examine the distribution of 300 values. The mean is identically zero, but the standard deviation should be somewhat smaller (around 75 to 85% of σ).
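The same experiment can also be simulated directly in R rather than a spreadsheet. The sketch below is a minimal illustration under assumed settings (100 objects of true mass 0, three weighings each, σ = 1); rnorm() plays the role of NORMINV(RAND(), 0, σ):

```r
# Simulate 100 objects weighed three times each with errors of sd sigma.
set.seed(42)
n_objects <- 100
sigma <- 1
weights <- matrix(rnorm(n_objects * 3, mean = 0, sd = sigma), ncol = 3)

# Subtract each object's median weighing from its other two values
# (200 numbers) and examine their spread.
resid_median <- as.vector(apply(weights, 1, function(w) sort(w)[c(1, 3)] - median(w)))
sd(resid_median)   # typically around 105% to 115% of sigma

# Alternatively, subtract each triplet's mean (300 numbers); the spread is
# smaller, near sqrt(2/3), i.e. about 82% of sigma.
resid_mean <- as.vector(sweep(weights, 1, rowMeans(weights)))
sd(resid_mean)
```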
References
[edit]- ^ "Interquartile Range". NIST. Retrieved 2022-03-30.
- ^ Pham-Gia, T.; Hung, T. L. (2001-10-01). "The mean and median absolute deviations". Mathematical and Computer Modelling. 34 (7): 921–936. doi:10.1016/S0895-7177(01)00109-1. ISSN 0895-7177.
- ^ Rousseeuw, Peter J.; Croux, Christophe (December 1993), "Alternatives to the Median Absolute Deviation", Journal of the American Statistical Association, 88 (424), American Statistical Association: 1273–1283, doi:10.2307/2291267, JSTOR 2291267
- ^ Croux, Christophe; Rousseeuw, Peter J. (1992). "Time-Efficient Algorithms for Two Highly Robust Estimators of Scale". In Dodge, Yadolah; Whittaker, Joe (eds.). Computational Statistics. Heidelberg: Physica-Verlag HD. pp. 411–428. doi:10.1007/978-3-662-26811-7_58. ISBN 978-3-662-26811-7.
- ^ "Biweight Midvariance". www.itl.nist.gov. Retrieved 2025-05-18.
- ^ Kafadar, Karen (1983). "The Efficiency of the Biweight as a Robust Estimator of Location". Journal of Research of the National Bureau of Standards. 88 (2): 105–116. doi:10.6028/jres.088.006. ISSN 0160-1741. PMC 6768164. PMID 34566098.
- ^ Mizera, I.; Müller, C. H. (2004), "Location-scale depth", Journal of the American Statistical Association, 99 (468): 949–966, doi:10.1198/016214504000001312.
- ^ Wittwer, J.W., "Monte Carlo Simulation in Excel: A Practical Guide", June 1, 2004
Robust measures of scale
Introduction and Background
Definition and Motivation
A measure of scale in statistics quantifies the dispersion or spread of a dataset, providing an indication of how much the data points vary around a central value. Robust measures of scale are specifically designed to be insensitive to outliers and heavy-tailed distributions, ensuring that the estimate of variability remains reliable even when the data contain anomalies or deviate from assumed normality.[2]

The motivation for robust measures of scale arises from the vulnerabilities of classical dispersion metrics, such as the standard deviation, which can be severely distorted by even a small proportion of outliers or contamination in the data. For instance, a single extreme value can inflate the variance dramatically, leading to misleading inferences about data spread in real-world applications where datasets often include measurement errors or unexpected anomalies. Robust alternatives address this by prioritizing stability and maintaining their properties under such deviations, thereby enhancing the reliability of statistical analyses in fields like engineering, finance, and environmental science.[8][2]

The development of robust measures of scale emerged in the 1960s and 1970s as part of the broader field of robust statistics, pioneered by researchers seeking to overcome the limitations of least-squares methods and parametric assumptions in the presence of non-normal errors. John Tukey initiated key ideas in 1960 by demonstrating the advantages of trimmed means and deviations over traditional estimators under slight departures from normality, while Peter Huber advanced M-estimators in 1964. Frank Hampel further formalized the framework in 1968, emphasizing the need for procedures that withstand gross errors commonly found in scientific data.[9][8]

A fundamental property for evaluating the robustness of scale estimators is the breakdown point, which represents the smallest proportion of contaminated observations that can cause the estimator to produce an arbitrarily large or small value. Introduced by Hampel in 1968, this criterion highlights why classical measures like the standard deviation have a breakdown point of zero—they fail completely with even one outlier—whereas robust measures can tolerate up to 50% contamination, making them suitable for practical, impure datasets.[8][2]

Comparison to Classical Measures of Scale
Classical measures of scale, such as the sample variance $s^2$ and its square root, the sample standard deviation $s$, are maximum likelihood estimators assuming normally distributed data. These estimators achieve 100% asymptotic relative efficiency under the normal distribution but possess a breakdown point of 0%, meaning a single outlier can render them arbitrarily large or undefined. The primary sensitivity of these classical measures arises from their reliance on squared deviations, which amplify the impact of extreme values; for instance, replacing one observation with an arbitrarily large value can dominate the entire sum of squares, inflating the estimate without bound. In contrast, robust measures of scale limit the influence of such outliers, maintaining finite values even in the presence of contamination.

Robust measures are particularly preferable in scenarios involving data contamination, modeled by Huber's $\varepsilon$-contamination framework in which the true distribution is a mixture $F = (1 - \varepsilon)G + \varepsilon H$, with $G$ representing the ideal model (e.g., normal) and $H$ an arbitrary contaminating distribution. Under this model, classical estimators like the standard deviation lose consistency for any $\varepsilon > 0$, whereas robust alternatives preserve consistency and bounded influence.

A key trade-off is that robust scale estimators typically exhibit lower asymptotic efficiency under uncontaminated normality—often around 37% to 88% relative to the standard deviation, depending on the method—due to their downweighting of extreme but legitimate observations. However, in contaminated settings with even small $\varepsilon$ (e.g., 5-10%), their efficiency surpasses that of classical measures, providing superior performance in real-world data prone to outliers.

Common Robust Estimators
Median Absolute Deviation (MAD)
The median absolute deviation (MAD) is a robust estimator of scale that measures the typical deviation of observations from the data's central tendency using the L1 norm. It is defined for a univariate sample $x_1, \dots, x_n$ as
$$\text{MAD} = b \cdot \operatorname{med}_i\,\left| x_i - \operatorname{med}_j\,x_j \right|,$$
where the constant $b \approx 1.4826$ ensures consistency with the population standard deviation under the normal distribution, as this value equals $1/\Phi^{-1}(3/4)$ and $\Phi^{-1}(3/4) \approx 0.6745$.[10][11]

To compute the MAD, first determine the sample median $m$, which orders the data and selects the middle value (or the average of the two central values for even $n$). Next, calculate the absolute deviations $d_i = |x_i - m|$ for each $i$. The unscaled MAD is then the median of the $d_i$, and the final value is obtained by multiplying by $b$. This process relies solely on order statistics and avoids squaring, making it less sensitive to extreme values than the sample standard deviation.[10][11]

The MAD exhibits strong robustness properties, including a breakdown point of 50%, the maximum attainable for affine-equivariant scale estimators, which means it remains bounded even if up to half the observations are arbitrarily far from the bulk of the data.[10][11] Under the normal distribution, its asymptotic relative efficiency relative to the sample standard deviation is approximately 37%, reflecting a trade-off between efficiency under ideal conditions and resilience to departures from normality such as outliers or heavy tails.[10][11] It is also location-scale equivariant: if the data are transformed to $a x_i + c$ with $a \neq 0$, then the MAD transforms to $|a|$ times the original.[10]

Key advantages of the MAD include its straightforward computation, which requires only $O(n \log n)$ time due to the sorting needed for the medians, and its suitability for distribution-free inference in non-parametric settings, such as sign tests or Wilcoxon procedures, where its sampling distribution under the null does not depend on the underlying error distribution.[10][11]
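The steps just described can be mirrored directly in R and checked against the built-in mad() (sample and seed arbitrary):

```r
# Step-by-step MAD computation, following the description above.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 3)

m       <- median(x)      # sample median
d       <- abs(x - m)     # absolute deviations
mad_raw <- median(d)      # unscaled MAD
b       <- 1.4826         # consistency constant, approximately 1 / qnorm(0.75)
b * mad_raw               # scaled MAD
mad(x)                    # built-in equivalent (default constant = 1.4826)
```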
Interquartile Range (IQR)
The interquartile range (IQR) is a non-parametric robust measure of scale that quantifies the spread of the middle 50% of a dataset by subtracting the first quartile from the third quartile.[1] It provides a stable estimate of variability that is less sensitive to outliers compared to the full range or standard deviation, as it ignores the lowest 25% and highest 25% of the data.[12] Introduced in the context of exploratory data analysis, the IQR is particularly useful for visualizing data distribution in box plots, where it forms the length of the box to highlight central spread without distortion from extreme values.[13]

The IQR is formally defined as
$$\text{IQR} = Q_3 - Q_1,$$
where $Q_1$ is the 25th percentile (first quartile) and $Q_3$ is the 75th percentile (third quartile) of the ordered sample.[12] Unlike some scale estimators, the IQR requires no scaling factor for direct interpretation as a measure of dispersion, though it can be adjusted under normality assumptions for comparability to the standard deviation.[1]

To compute the IQR, sort the dataset in ascending order to obtain the ordered values $x_{(1)} \le \dots \le x_{(n)}$, where $n$ is the sample size. Under a common convention, the position of $Q_1$ in the ordered sample is $0.25(n+1)$ and that of $Q_3$ is $0.75(n+1)$; if these positions fall between integers, linear interpolation is applied between the adjacent ordered values.[13] This method ensures a consistent estimate even for moderate sample sizes, focusing solely on quartile positions without additional transformations.[14]

The IQR exhibits a breakdown point of 25%, meaning it remains bounded and reliable as long as fewer than 25% of the observations are outliers, since contamination in the outer quartiles does not affect the inner ones until that threshold is exceeded.[14] This property makes it a simple yet effective tool in exploratory data analysis for detecting and understanding data spread amid potential anomalies.[1]

Variants of the IQR address challenges in small samples or further enhance robustness. For small $n$, adjusted computations use alternative quantile definitions, such as those based on inverse cumulative distribution functions or modified interpolation rules, to avoid bias in quartile estimates; for example, nine standard methods are compared, with types 6–8 often preferred for their balance of simplicity and accuracy in finite samples.[13]
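In R, these interpolation rules correspond to the type argument of quantile() and IQR(); the brief sketch below (with an arbitrary small sample) shows how the choice of rule changes the estimate in small samples:

```r
# Quartile interpolation rules affect the IQR in small samples.
x <- c(7, 15, 36, 39, 40, 41)

IQR(x)                                       # default type = 7 (linear interpolation)
IQR(x, type = 6)                             # alternative definitions
IQR(x, type = 8)
quantile(x, probs = c(0.25, 0.75), type = 7) # the quartiles themselves
```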
Sn and Qn Estimators
The Sn and Qn estimators are two prominent robust measures of scale introduced by Peter J. Rousseeuw and Christophe Croux as alternatives to the median absolute deviation (MAD), offering maximal breakdown robustness while improving statistical efficiency under the normal distribution.[3] These estimators are based on order statistics derived from all pairwise absolute differences among the observations, making them location-invariant and particularly effective against outliers. Unlike simpler quartile-based methods such as the interquartile range, Sn and Qn leverage the full structure of pairwise comparisons to achieve a breakdown point of 50%, the theoretical maximum for location-scale equivariant estimators.[3]

The Sn estimator is defined as the scaled nested median of pairwise absolute differences:
$$S_n = 1.1926\,\operatorname{med}_i\,\left\{ \operatorname{med}_j\,|x_i - x_j| \right\},$$
where the outer median is over $i = 1, \dots, n$ and the inner over $j = 1, \dots, n$, and the constant 1.1926 ensures that $S_n$ is a consistent estimator of the scale parameter for the standard normal distribution.[3] This nested structure effectively captures the central tendency of the differences, providing a robust summary of dispersion. For computational efficiency, avoiding the enumeration of all $O(n^2)$ pairs, $S_n$ can be computed with an algorithm that requires sorting the data once and runs in $O(n \log n)$ time.[15] The estimator's influence function is bounded but discontinuous at zero, reflecting its high robustness to gross errors.[3]

The Qn estimator, in contrast, uses a lower-order statistic from the pairwise differences to enhance efficiency:
$$Q_n = 2.2219\,\left\{ |x_i - x_j| : i < j \right\}_{(k)},$$
where $(k)$ denotes the $k$-th order statistic (with $k = \binom{h}{2}$ and $h = \lfloor n/2 \rfloor + 1$), and the constant 2.2219 provides asymptotic consistency under normality.[3] This corresponds approximately to the first quartile position among the pairwise differences, selecting a value that resists contamination from the upper tail. Like Sn, Qn admits an $O(n \log n)$-time algorithm based on sorting, but its structure—focusing on the lower half of ordered differences—makes it simpler and faster in practice, often requiring less memory.[15] Finite-sample bias corrections can be applied to improve unbiasedness for small $n$, though they are near 1 for larger samples.[16]

Both estimators possess a 50% breakdown point, meaning arbitrary contamination of up to $\lfloor n/2 \rfloor$ observations cannot cause $S_n$ or $Q_n$ to diverge to infinity or zero.[3] Under the normal distribution, Sn attains an asymptotic relative efficiency of 58% relative to the sample standard deviation, while Qn reaches 82%, outperforming MAD's 37% efficiency without sacrificing robustness.[3] Qn's influence function is continuous and redescending, contributing to its superior finite-sample performance in contaminated settings. These properties were derived analytically in the original proposal, with empirical validations confirming their behavior even for moderate sample sizes.[3]

Rousseeuw and Croux developed Sn and Qn in 1993, motivated by the need for high-breakdown estimators suitable for extending to multivariate robust covariance estimation, such as in the minimum covariance determinant method.[3] The accompanying 1992 work provided the efficient algorithms essential for practical use, enabling their adoption in statistical software like R's robustbase package.[15] These estimators have since become staples in robust statistics for applications requiring resistance to outliers, such as anomaly detection and regression diagnostics.
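To make the definition concrete, here is a naive O(n²) transcription of Qn in R (illustrative only; the helper name qn_naive is ours, and robustbase::Qn uses the O(n log n) algorithm and additionally applies small-sample corrections, so values for small n can differ slightly):

```r
# Naive Qn straight from the definition: a scaled k-th order statistic of the
# pairwise absolute differences.
qn_naive <- function(x, constant = 2.2219) {
  n <- length(x)
  h <- floor(n / 2) + 1
  k <- choose(h, 2)
  d <- as.vector(dist(x))     # all pairwise |x_i - x_j|, i < j
  constant * sort(d)[k]
}

set.seed(1)
x <- rnorm(101)
qn_naive(x)                    # roughly 1 for standard normal data
# Compare with robustbase::Qn(x).
```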
Advanced Robust Measures
Biweight Midvariance
The biweight midvariance is a tuned robust estimator of scale that employs Tukey's biweight weighting function to downweight the influence of outliers while maintaining high statistical efficiency under normality. Developed by John W. Tukey in 1977 as part of techniques for resistant line fitting, it addresses the limitations of the classical variance by iteratively applying weights that smoothly reduce the contribution of extreme observations.[17] This estimator is particularly valued in applications requiring both robustness and efficiency, such as analyzing residuals in robust regression.[18]

The biweight midvariance is defined using the sample median as the location estimate and the median absolute deviation (MAD) as an initial scale measure. Let $u_i = (x_i - M)/(9\,\text{MAD})$ for each observation $x_i$, where $M$ is the sample median, with the biweight weight $(1 - u_i^2)^2$ applied for $|u_i| < 1$ and 0 otherwise. The estimator is then given by
$$\hat{\sigma}^2_{\text{bw}} = n\,\frac{\sum_{|u_i| < 1} (x_i - M)^2 (1 - u_i^2)^4}{\left( \sum_{|u_i| < 1} (1 - u_i^2)(1 - 5u_i^2) \right)^2},$$
where the sums are over indices with $|u_i| < 1$, and $n$ is the sample size. This formula provides a consistent estimate of the squared scale, incorporating the biweight influence to emphasize central data points.[17]

Computation of the biweight midvariance is iterative and begins with an initial robust scale estimate from the MAD. The median is calculated first, followed by the MAD to define the $u_i$. Weights derived from the $u_i$ are then applied to trim the influence of extremes, with observations more than $9\,\text{MAD}$ from the median receiving zero weight due to the choice of tuning constant 9. Subsequent iterations refine the location and scale until convergence, though a one-step approximation starting from the median and MAD often suffices for practical purposes.[18]

Key properties of the biweight midvariance include a breakdown point of approximately 50%, indicating it can withstand up to 50% contaminated observations before the estimate can be arbitrarily large. It achieves an asymptotic relative efficiency of approximately 85% relative to the sample standard deviation under the normal distribution, balancing robustness with precision in uncontaminated data.[19][18] These attributes make it suitable for estimating the scale of residuals in robust regression models, where outliers from model misspecification are common. In multivariate settings, it relates to projection-based approaches like location-scale depth but remains primarily univariate with fixed tuning.[18]

Location-Scale Depth
Location-scale depth provides a multivariate robust measure that simultaneously assesses the centrality of both location and scale parameters in a data cloud, extending univariate notions to higher dimensions through depth functions such as projection or halfspace depths. In the projection-based approach, the depth for a scale parameter is defined as the infimum over all unit vectors $u$ of a robust univariate scale measure (e.g., the median absolute deviation) applied to the projections $u^\top x_i$ of the data points, capturing the minimum "spread" across directions. Similarly, in the halfspace framework, it involves the minimum robust scale (such as the interquartile range or MAD) computed over all halfspaces containing at least half the data points, ensuring robustness against directional outliers. This combined location-scale perspective, as formalized in the work of Mizera and Müller, treats the pair $(\mu, \sigma)$ as a point in an extended space, with depth quantifying its admissibility relative to the empirical distribution.[20][21]

Computation of location-scale depth typically relies on approximations due to the optimization over infinitely many directions or halfspaces. For projection depth, one evaluates the robust scale on a finite grid of directions (e.g., randomly sampled unit vectors or spherical designs) and takes the minimum, with exact computation feasible in low dimensions but requiring Monte Carlo methods in higher ones; Zuo and Serfling outline properties enabling such approximations while preserving robustness. In the halfspace case, algorithms enumerate supporting halfspaces or use linear programming to identify the minimizing halfspace's scale measure, achieving polynomial time complexity for the Student depth variant, a tractable form of halfspace depth in the location-scale model. These methods scale to moderate dimensions but become computationally intensive in higher ones, a burden often mitigated by subsampling.[21]

Key properties include affine invariance, ensuring the depth remains unchanged under nonsingular linear transformations, which is inherited from the underlying univariate robust scales and depth notions. Breakdown points up to 50% are attainable, meaning the estimator resists contamination by up to half the sample, making it suitable for outlier-heavy data; for instance, the projection-based scale depth achieves this when paired with high-breakdown univariate scales like Qn. Additionally, it facilitates shape analysis in high dimensions by providing contour regions that highlight central variability structures, aiding in anomaly detection and covariance estimation without assuming ellipticity.[21]

The concept builds on general statistical depth functions introduced by Zuo and Serfling, who extended univariate robust measures to multivariate settings via projections, laying the groundwork for scale depths as infima of univariate scales. Mizera and Müller further developed the halfspace-based location-scale depth, integrating likelihood principles for joint estimation. Extensions to functional data have been pursued by applying projection depths to infinite-dimensional spaces, enabling robust analysis of curves while maintaining affine-like invariance under transformations.[21][20]

Estimation and Inference
Approaches to Estimation
Robust measures of scale can be estimated using a variety of computational approaches, each balancing efficiency, robustness, and applicability to different sample sizes and data structures. These methods generally fall into direct, iterative, and resampling-based categories, with choices depending on the specific estimator and desired accuracy. Direct methods are particularly advantageous for their simplicity and speed in large datasets, while iterative and bootstrap techniques offer flexibility for more complex or adaptive estimation.

Direct methods compute scale estimators without iteration, typically leveraging order statistics from sorted data or pairwise absolute differences. The interquartile range (IQR), for instance, is obtained by sorting the sample and subtracting the first quartile from the third, providing a straightforward robust scale measure resistant to up to 25% outliers. Similarly, the Qn estimator, proposed by Rousseeuw and Croux, selects a consistent multiple of the first quartile of all pairwise absolute deviations, achieving a 50% breakdown point through this non-iterative process based on order statistics. The Sn estimator follows a comparable direct approach using medians of pairwise deviations, also attaining maximal breakdown robustness. These methods avoid convergence issues inherent in iterative procedures, making them suitable for initial screening or high-dimensional applications.

Iterative methods, such as those for M-estimators, solve estimating equations to find a scale parameter that minimizes the influence of outliers through a bounded loss function. For a robust scale $\hat{\sigma}$ given a location estimate $\hat{\mu}$, one common formulation seeks $\hat{\sigma}$ satisfying
$$\frac{1}{n} \sum_{i=1}^{n} \rho\!\left( \frac{x_i - \hat{\mu}}{\hat{\sigma}} \right) = \delta,$$
where $\rho$ is a robust loss function (e.g., Huber's) and $\delta = E[\rho(Z)]$ for standardization under the model distribution Z. This is often implemented via iteratively reweighted least squares (IRLS), which alternates between updating weights based on current residuals and solving weighted least squares problems until convergence. IRLS enhances efficiency for M-estimators by reformulating the problem as a sequence of linear regressions, though it requires careful initialization (e.g., with a direct estimator like the IQR) to avoid local minima.[2]

Bootstrap approaches provide a resampling-based alternative, particularly useful for estimating robust scale in small samples or assessing variability without strong parametric assumptions. By repeatedly drawing bootstrap samples from the original data and recomputing the scale estimator on each, one can approximate the sampling distribution of the statistic, yielding bias-corrected estimates or standard errors. For robust scale measures, adapted bootstrap methods, such as those reweighting samples to mimic the estimator's robustness, ensure consistency even with contaminants, as demonstrated in extensions of standard Efron bootstrapping to M-estimators and regression contexts.

Computational considerations are crucial for practical implementation, especially with large datasets. Sorting-based direct methods like the IQR and efficient algorithms for Qn and Sn achieve O(n log n) time complexity and O(n) space, enabling scalability to millions of observations. In contrast, naive pairwise computations for estimators like Qn require O(n²) operations, which becomes prohibitive for n > 10,000, though optimized algorithms mitigate this to linearithmic performance.
Iterative methods like IRLS typically converge in O(n) per iteration but may require 10-50 iterations, while bootstrap variants scale with the number of resamples B (often 1,000-10,000), adding O(B · T) overhead, where T is the base estimator's cost.
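As a concrete illustration of the resampling approach, the sketch below computes a percentile bootstrap standard error and interval for the MAD; the sample, number of resamples, and heavy-tailed data-generating distribution are chosen arbitrarily:

```r
# Percentile bootstrap for a robust scale estimate (here the MAD).
set.seed(1)
x <- rt(200, df = 3)          # heavy-tailed sample
B <- 2000
boot_mad <- replicate(B, mad(sample(x, replace = TRUE)))

sd(boot_mad)                          # bootstrap standard error
quantile(boot_mad, c(0.025, 0.975))   # 95% percentile interval
```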
Confidence Intervals for Scale
Confidence intervals for robust measures of scale quantify the uncertainty in estimates of dispersion, particularly when data may contain outliers or deviate from normality. These intervals can be constructed using asymptotic approximations, resampling techniques like the bootstrap, or exact methods in specific distributional cases. Asymptotic approaches rely on the central limit theorem applied to the estimators, while bootstrap methods are versatile for non-normal data, and exact methods are available for particular scenarios such as the interquartile range under uniform distributions or adaptations of sign tests for scale parameters.

For the median absolute deviation (MAD), asymptotic confidence intervals are derived from its limiting normal distribution. Specifically, $\sqrt{n}\,(\mathrm{MAD}_n - \mathrm{MAD}(F)) \to N(0, V)$ in distribution, where V is the asymptotic variance obtained from the influence function of the MAD. The resulting interval is $\mathrm{MAD}_n \pm z_{1-\alpha/2}\sqrt{\hat{V}/n}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution, and $\hat{V}$ estimates V using the empirical distribution. This approach assumes large sample sizes and smoothness of the underlying density.

Similar asymptotic normality holds for the Qn estimator, a highly robust scale measure based on interpoint distances, once a finite-sample consistency factor (which differs for odd and even n) has been applied. The confidence interval can then be constructed analogously as the point estimate plus or minus a normal quantile times the estimated standard error. These intervals perform well under symmetry but may require adjustments for skewness.[3]

Bootstrap methods provide flexible alternatives, especially for non-normal data where asymptotic assumptions fail. The percentile bootstrap resamples the data with replacement to generate a distribution of robust scale estimates, with the interval formed by the $\alpha/2$ and $1-\alpha/2$ quantiles of the bootstrap replicates. The bias-corrected and accelerated (BCa) bootstrap further adjusts for bias and skewness in the bootstrap distribution, yielding more accurate coverage for estimators like MAD and Qn under contamination or heavy tails. These approaches build on estimation techniques such as non-parametric resampling and are computationally feasible for moderate sample sizes.[22]

Exact methods are limited but applicable in restricted cases. For the interquartile range (IQR) under a uniform distribution on [0,1], the population IQR is 0.5, and the sampling distribution of the sample IQR can be derived from order statistics, enabling exact intervals based on the variance of the difference of the order statistics that index the quartiles.[23] Adaptations of sign tests for scale, such as testing the median of absolute deviations against a hypothesized value, provide distribution-free exact intervals by counting the number of observations exceeding the threshold, analogous to the binomial sign test but scaled for dispersion.[24]

Constructing these intervals faces challenges due to the non-normality of robust scale estimators, which often exhibit heavier tails or asymmetry in contaminated data. This necessitates robust variance estimation, such as using sandwich estimators or bootstrap-based standard errors, to maintain coverage probabilities close to nominal levels. Asymptotic methods may undercover in small samples or skewed distributions, while bootstrap techniques, though effective, are computationally intensive for high-dimensional or large datasets.[25]

Properties and Performance
Statistical Efficiency
Statistical efficiency quantifies the precision of robust scale estimators relative to the classical sample standard deviation, particularly in terms of their asymptotic variances. For M-estimators of scale, the asymptotic relative efficiency (ARE) is the ratio of the asymptotic variance of the sample standard deviation to that of the robust estimator, where the latter is computed from the estimator's $\psi$-function, its derivative, and the underlying density function. This measures how closely the robust estimator approaches the Cramér-Rao lower bound under the assumed model. For instance, the median absolute deviation (MAD), when appropriately scaled for consistency under normality, achieves an ARE of 0.37 relative to the sample standard deviation.

Under the normal distribution, robust scale estimators exhibit varying efficiencies, trading off some precision for robustness. The biweight midvariance attains a high ARE of approximately 95%, making it nearly as efficient as the sample standard deviation while maintaining resistance to outliers. Similarly, the Qn estimator reaches about 82% efficiency, outperforming simpler measures like the interquartile range (IQR), which has an ARE of approximately 37%. The Sn estimator is less efficient at 58%, and the MAD at 37%. These values highlight that while robust estimators sacrifice some efficiency under ideal Gaussian conditions, their performance is competitive for practical applications.

In the presence of contamination or heavy-tailed distributions, robust estimators demonstrate superior performance, with their relative efficiencies often exceeding 1 compared to the sample standard deviation, which breaks down rapidly. For example, the MAD's efficiency can rise above 1 under moderate contamination (e.g., 5-15% outliers) and heavy tails, as its bounded influence function prevents variance inflation from extreme values. The biweight midvariance and Qn maintain efficiencies near or above 90% under such conditions, while the IQR and Sn also improve relative to the classical estimator.[26]

Pitman efficiency, which extends ARE to hypothesis testing contexts by comparing the squared slopes of test statistics, yields similar rankings across distributions. The following table summarizes representative ARE values (approximating Pitman efficiencies) for key robust scale estimators relative to the sample standard deviation under Gaussian, Student's t (df=3, heavy-tailed), and slash (extreme heavy-tailed) distributions:

| Estimator | Gaussian | t (df=3) | Slash |
|---|---|---|---|
| MAD | 0.37 | 0.74 | >1.0 |
| IQR | 0.37 | 0.67 | 0.80 |
| Biweight Midvariance | 0.95 | 0.92 | 0.85 |
| Sn | 0.58 | 0.85 | 0.95 |
| Qn | 0.82 | 0.90 | 0.98 |
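The Gaussian column can be checked approximately by simulation: the sketch below estimates the relative efficiency of the MAD and of Qn as the ratio of the sampling variance of the standard deviation to that of the robust estimator (sample size and number of replications are arbitrary):

```r
# Monte Carlo approximation of Gaussian efficiencies relative to sd().
library(robustbase)
set.seed(1)
n <- 1000
reps <- 2000
ests <- replicate(reps, {
  x <- rnorm(n)
  c(sd = sd(x), mad = mad(x), qn = Qn(x))
})

var(ests["sd", ]) / var(ests["mad", ])   # approximately 0.37
var(ests["sd", ]) / var(ests["qn", ])    # approximately 0.82
```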
Breakdown Point and Robustness Metrics
The breakdown point of a scale estimator quantifies its global robustness by measuring the smallest fraction of observations that must be replaced by arbitrary values (e.g., infinitely large) to cause the estimator to break down, meaning it can take on arbitrarily large values.[27] For the sample standard deviation, this value is 0%, as a single outlier suffices to make it unbounded. In contrast, the Sn and Qn estimators achieve the maximum possible breakdown point of 50% for affine-equivariant scale estimators, meaning they remain bounded even if up to half the data are contaminated.

Another key robustness metric is the influence function, which assesses local robustness by approximating the change in the estimator due to an infinitesimal contamination at a point $x$. It is formally defined as
$$\mathrm{IF}(x; S, F) = \lim_{\varepsilon \downarrow 0} \frac{S\big((1-\varepsilon)F + \varepsilon \Delta_x\big) - S(F)}{\varepsilon},$$
where $S$ is the scale functional, $F$ is the underlying distribution, and $\Delta_x$ is the Dirac point mass at $x$.[6] For robust scale estimators, the influence function is bounded, ensuring that no single observation can disproportionately affect the estimate, unlike the unbounded influence function of the standard deviation.

Additional metrics include the maxbias function, which extends the breakdown point by quantifying the supremum bias under a fixed contamination fraction $\varepsilon$: $B(\varepsilon) = \sup_H \big| S\big((1-\varepsilon)F + \varepsilon H\big) - S(F) \big|$, where the supremum is over all contaminating distributions $H$. This provides a curve describing bias growth with contamination level, aiding in comparing estimators beyond just the breakdown threshold. Qualitative robustness, meanwhile, requires the estimator to be continuous with respect to weak convergence of distributions at the model $F$, ensuring stability under small perturbations.[5]

In practice, achieving a high breakdown point like 50% often involves a trade-off with statistical efficiency under nominal distributions such as the normal; for instance, the Sn estimator, while maximally robust, exhibits lower asymptotic relative efficiency (58% at the normal) compared to the biweight midvariance, which can be tuned for higher efficiency (up to 95%) at the cost of a somewhat lower breakdown point (approximately 29% for that tuning).
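The contrast between a 0% and a 50% breakdown point can be seen in a small numerical experiment; the sketch below replaces a growing number of points in a clean sample with a single gross outlier value (all settings arbitrary) and tracks both estimates:

```r
# Breakdown behaviour: a single outlier ruins sd(), while Qn stays bounded
# and only breaks down (here collapsing towards zero) once about half the
# sample is contaminated.
library(robustbase)
set.seed(1)
x <- rnorm(100)
for (m in c(0, 1, 25, 49, 60)) {   # number of contaminated observations
  y <- x
  if (m > 0) y[1:m] <- 1e6
  cat(sprintf("m = %2d   sd = %12.1f   Qn = %8.2f\n", m, sd(y), Qn(y)))
}
```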
Applications and Examples
Practical Example
Consider a hypothetical contaminated dataset representing measurements from a sensor prone to occasional errors: {7, 7, 8, 9, 9, 9, 66, 99}, where the large values 66 and 99 are outliers that could arise from equipment malfunction.[28] This sample has n = 8 observations, and the goal is to estimate the scale (spread) of the underlying distribution while minimizing the influence of these contaminants.

To compute the median absolute deviation (MAD) as a robust measure of scale, first find the sample median. Sorting the data gives {7, 7, 8, 9, 9, 9, 66, 99}, so the median is the average of the 4th and 5th values: (9 + 9)/2 = 9. Next, calculate the absolute deviations from this median: {|7-9|=2, |7-9|=2, |8-9|=1, |9-9|=0, |9-9|=0, |9-9|=0, |66-9|=57, |99-9|=90}. Sorting these deviations yields {0, 0, 0, 1, 2, 2, 57, 90}, and the median deviation is the average of the 4th and 5th values: (1 + 2)/2 = 1.5. Thus, the MAD is 1.5, providing a scale estimate largely unaffected by the outliers.[28]

In contrast, the classical standard deviation is highly inflated by the outliers. The sample mean is 214/8 = 26.75. The sample variance, computed from the squared deviations about the mean with divisor n − 1, is approximately 1262.5, so the standard deviation is about 35.5—over 20 times larger than the MAD. The interquartile range (IQR), while more robust than the standard deviation, is also compromised here: the first quartile is the median of the lower half {7, 7, 8, 9}, or 7.5, and the third quartile is the median of the upper half {9, 9, 66, 99}, or 37.5, giving IQR = 30. This demonstrates how even moderately robust measures like the IQR can fail when outliers occupy up to 25% of the upper tail.[28]

The MAD value of 1.5 indicates that the true underlying spread of the non-contaminated data is small, consistent with the cluster around 7 to 9, whereas the classical standard deviation of 35.5 misleadingly suggests much greater variability due to the two outliers (25% contamination). For a similar clean dataset without outliers, such as {6, 7, 7, 8, 9, 9, 9, 9}, the MAD is 0.5 and the IQR is 2, confirming the robust estimate's alignment with the genuine scale.[28]

Robust measures like the MAD are particularly useful in fields with outlier-prone data, such as finance (e.g., stock returns affected by market shocks) or sensor networks (e.g., environmental monitoring with faulty readings), where classical measures can lead to erroneous conclusions about volatility or dispersion.
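These hand calculations can be reproduced in R; in the sketch below, mad(..., constant = 1) returns the unscaled MAD used above, and the quartiles are taken as medians of the lower and upper halves to match the convention in the text:

```r
# Reproduce the worked example above.
x <- c(7, 7, 8, 9, 9, 9, 66, 99)

median(x)              # 9
mad(x, constant = 1)   # 1.5 (unscaled MAD)
mean(x)                # 26.75
sd(x)                  # about 35.5

xs <- sort(x)
Q1 <- median(xs[1:4])  # 7.5, median of the lower half
Q3 <- median(xs[5:8])  # 37.5, median of the upper half
Q3 - Q1                # IQR = 30
```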
Computer Simulation Study
To evaluate the finite-sample performance of robust scale estimators under contamination, Rousseeuw and Croux (1993) conducted a Monte Carlo simulation study comparing the median absolute deviation (MAD), the Qn estimator, the Sn estimator, and the classical sample standard deviation (SD).[29] The simulation generated samples of sizes n = 10, 20, 40, and 80 from a standard normal distribution N(0,1), as well as contaminated versions with outlier proportions ε = 0.1 and ε = 0.25, where contaminants were drawn from a normal distribution with inflated variance to simulate heavy-tailed deviations typical in robust testing scenarios.[29] Additional uncontaminated samples were drawn from an exponential distribution to assess behavior under asymmetry. Each configuration was replicated 1000 times to compute empirical efficiencies and biases, with efficiency computed from the variance of the estimators under the normal distribution (relative to SD, set at 100%) and bias as the relative deviation from the true scale under contamination.[29]

Under clean normal data, the Qn estimator achieved the highest Gaussian efficiency of approximately 82%, outperforming Sn at 58% and MAD at 37%, while SD served as the benchmark at 100%.[29] With contamination at ε = 0.1, robust estimators like Qn and Sn exhibited minimal bias (less than 5% relative deviation), whereas SD's bias exceeded 20%, increasing sharply to over 50% at ε = 0.25 as outliers inflated the dispersion.[29] Mean squared error (MSE) trends, derived from variance and squared bias across replications, remained stable for Qn and Sn up to ε ≈ 0.4 (near their 50% breakdown point), while SD's MSE exploded even at low ε due to its sensitivity to outliers.[29] These results, visualized in efficiency plots and bias curves, underscore the superior robustness of Qn for practical applications with potential contamination.[29]

For reproducibility, the simulation can be implemented in R using the robustbase package, which provides functions for MAD and Qn. A basic code snippet for generating contaminated samples and computing estimators over replications is as follows:

```r
library(robustbase)
set.seed(123)
n <- 80           # Sample size
reps <- 1000
epsilon <- 0.25
true_scale <- 1

# Storage for MSE
mad_mse <- qn_mse <- sd_mse <- numeric(reps)

for (i in 1:reps) {
  # Generate clean sample
  clean <- rnorm(n * (1 - epsilon), 0, true_scale)
  # Generate contaminants (e.g., N(0, 10) for inflated scale)
  contam <- rnorm(n * epsilon, 0, 10 * true_scale)
  x <- c(clean, contam)

  # Estimators (scaled to match true_scale = 1 for normal data)
  mad_est <- mad(x)   # already scaled for consistency
  qn_est <- Qn(x)
  sd_est <- sd(x)

  # Squared errors ((est - true)^2 as an MSE proxy combining bias^2 and variance)
  mad_mse[i] <- (mad_est - true_scale)^2
  qn_mse[i] <- (qn_est - true_scale)^2
  sd_mse[i] <- (sd_est - true_scale)^2
}

# Average MSE
mean(mad_mse); mean(qn_mse); mean(sd_mse)
```