Mid-range
from Wikipedia

In statistics, the mid-range or mid-extreme is a measure of central tendency of a sample defined as the arithmetic mean of the maximum and minimum values of the data set:[1]

$M = \frac{\max x + \min x}{2}.$

The mid-range is closely related to the range, a measure of statistical dispersion defined as the difference between maximum and minimum values. The two measures are complementary in the sense that if one knows the mid-range and the range, one can find the sample maximum and minimum values.
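
The complementarity of the two measures can be sketched in a few lines of Python. The function names are illustrative and do not come from any library; the sample values are made up for the example.

```python
def midrange(values):
    """Arithmetic mean of the smallest and largest value."""
    return (min(values) + max(values)) / 2


def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def extremes_from(mid, rng):
    """Recover (minimum, maximum) from the mid-range and the range."""
    return mid - rng / 2, mid + rng / 2


data = [3, 8, 1, 12, 7]
m, r = midrange(data), value_range(data)
print(m, r)                 # 6.5 11
print(extremes_from(m, r))  # (1.0, 12.0)
```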

The mid-range is rarely used in practical statistical analysis, as it lacks efficiency as an estimator for most distributions of interest, because it ignores all intermediate points, and lacks robustness, as outliers change it significantly. Indeed, for many distributions it is one of the least efficient and least robust statistics. However, it finds some use in special cases: it is the maximally efficient estimator for the center of a uniform distribution, trimmed mid-ranges address robustness, and as an L-estimator, it is simple to understand and compute.

Robustness

The midrange is highly sensitive to outliers and ignores all but two data points. It is therefore a very non-robust statistic, having a breakdown point of 0, meaning that a single observation can change it arbitrarily. Further, it is highly influenced by outliers: increasing the sample maximum or decreasing the sample minimum by x changes the mid-range by x/2, while it changes the sample mean, which also has a breakdown point of 0, by only x/n. It is thus of little use in practical statistics, unless outliers are already handled.
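
A small numerical check of the x/2 versus x/n behaviour, as a sketch with made-up data (the sample and the shift x = 100 are assumptions chosen only to keep the arithmetic simple):

```python
from statistics import mean


def midrange(values):
    return (min(values) + max(values)) / 2


sample = [2.0, 4.0, 5.0, 7.0, 9.0, 11.0, 12.0, 14.0, 16.0, 20.0]  # n = 10
x = 100.0  # amount added to the sample maximum

shifted = sorted(sample)
shifted[-1] += x  # move only the largest observation

midrange_change = midrange(shifted) - midrange(sample)
mean_change = mean(shifted) - mean(sample)
print(midrange_change)  # 50.0, i.e. x / 2
print(mean_change)      # 10.0, i.e. x / n
```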

A trimmed midrange is known as a midsummary – the n% trimmed midrange is the average of the n% and (100−n)% percentiles, and is more robust, having a breakdown point of n%. In the middle of these is the midhinge, which is the 25% midsummary. The median can be interpreted as the fully trimmed (50%) mid-range; this accords with the convention that the median of an even number of points is the mean of the two middle points.
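
As an illustration, a midsummary can be sketched with the standard-library `statistics.quantiles` function. Percentile conventions differ between software packages; the "inclusive" interpolation method used here is one assumption among several reasonable ones, and the helper name `midsummary` is hypothetical.

```python
from statistics import quantiles


def midsummary(values, p):
    """Average of the p-th and (100 - p)-th percentiles, for integer 1 <= p <= 50."""
    if not 1 <= p <= 50:
        raise ValueError("p must be between 1 and 50")
    # cuts[k - 1] is the k-th percentile (99 cut points in total)
    cuts = quantiles(values, n=100, method="inclusive")
    return (cuts[p - 1] + cuts[99 - p]) / 2


data = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # one gross outlier
print((min(data) + max(data)) / 2)    # 450.5 (plain mid-range)
print(midsummary(data, 25))           # 5.0 (midhinge)
print(midsummary(data, 50))           # 5.0 (median)
```

The outlier drags the plain mid-range far from the bulk of the data, while the midhinge and the median are unaffected.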

These trimmed midranges are also of interest as descriptive statistics or as L-estimators of central location or skewness: differences of midsummaries, such as midhinge minus the median, give measures of skewness at different points in the tail.[2]

Efficiency

Despite its drawbacks, in some cases it is useful: the midrange is a highly efficient estimator of μ, given a small sample of a sufficiently platykurtic distribution, but it is inefficient for mesokurtic distributions, such as the normal.

For example, for a continuous uniform distribution with unknown maximum and minimum, the mid-range is the uniformly minimum-variance unbiased (UMVU) estimator for the mean. The sample maximum and sample minimum, together with the sample size, are a sufficient statistic for the population maximum and minimum: the distribution of the other sample points, conditional on a given maximum and minimum, is just the uniform distribution between the maximum and minimum, and thus adds no information. See German tank problem for further discussion. Thus the mid-range, which is an unbiased and sufficient estimator of the population mean, is in fact the UMVU estimator: using the sample mean just adds noise based on the uninformative distribution of points within this range.

Conversely, for the normal distribution, the sample mean is the UMVU estimator of the mean. Thus for platykurtic distributions, which can often be thought of as lying between a uniform distribution and a normal distribution, the informativeness of the middle sample points versus the extreme values varies from "equal" for normal to "uninformative" for uniform, and for different distributions, one or the other (or some combination thereof) may be most efficient. A robust analog is the trimean, which averages the midhinge (25% trimmed mid-range) and median.
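
The contrast between the uniform and normal cases can be checked with a small Monte Carlo sketch. The sample size, number of trials, seed, and distribution parameters below are arbitrary assumptions, and the printed variances are simulation estimates rather than exact values.

```python
import random
from statistics import pvariance


def midrange(values):
    return (min(values) + max(values)) / 2


def estimator_variances(draw, n=20, trials=4000, seed=0):
    """Empirical variance of the mid-range and of the mean over repeated samples."""
    rng = random.Random(seed)
    mids, means = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        mids.append(midrange(sample))
        means.append(sum(sample) / n)
    return pvariance(mids), pvariance(means)


uniform = estimator_variances(lambda rng: rng.uniform(-1.0, 1.0))
normal = estimator_variances(lambda rng: rng.gauss(0.0, 1.0))
print("uniform: var(midrange)=%.5f var(mean)=%.5f" % uniform)
print("normal:  var(midrange)=%.5f var(mean)=%.5f" % normal)
```

Under these settings the mid-range should show the smaller variance for the uniform samples and the larger one for the normal samples.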

Small samples

For small sample sizes (n from 4 to 20) drawn from a sufficiently platykurtic distribution (negative excess kurtosis, defined as γ2 = (μ4/(μ2)²) − 3), the mid-range is an efficient estimator of the mean μ. The following table summarizes empirical data comparing three estimators of the mean for distributions of varied kurtosis; the modified mean is the truncated mean, where the maximum and minimum are eliminated.[3][4]

Excess kurtosis (γ2) | Most efficient estimator of μ
−1.2 to −0.8 | Midrange
−0.8 to 2.0 | Mean
2.0 to 6.0 | Modified mean

For n = 1 or 2, the midrange and the mean are equal (and coincide with the median), and are most efficient for all distributions. For n = 3, the modified mean is the median, and instead the mean is the most efficient measure of central tendency for values of γ2 from 2.0 to 6.0 as well as from −0.8 to 2.0.
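
The three estimators compared in the table can be written down directly. This is a minimal sketch with hypothetical function names; the modified mean is implemented as described above, by discarding one minimum and one maximum before averaging.

```python
def midrange(values):
    return (min(values) + max(values)) / 2


def mean(values):
    return sum(values) / len(values)


def modified_mean(values):
    """Truncated mean: drop one minimum and one maximum, then average the rest."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)


sample = [2, 3, 5, 8, 12]
print(midrange(sample))          # 7.0
print(mean(sample))              # 6.0
print(modified_mean(sample))     # (3 + 5 + 8) / 3
print(modified_mean([9, 1, 5]))  # 5.0, the median when n = 3
```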

Sampling properties

For a sample of size n from the standard normal distribution, the mid-range M is unbiased, and has a variance given by:[5]

$\operatorname{var}(M) = \frac{\pi^2}{24 \ln(n)}.$

For a sample of size n from the standard Laplace distribution, the mid-range M is unbiased, and has a variance given by:[6]

$\operatorname{var}(M) = \frac{\pi^2}{12},$

and, in particular, the variance does not decrease to zero as the sample size grows.

For a sample of size n from a zero-centred uniform distribution, the mid-range M is unbiased, and nM has an asymptotic distribution which is a Laplace distribution.[7]

Deviation

While the mean of a set of values minimizes the sum of squares of deviations and the median minimizes the average absolute deviation, the midrange minimizes the maximum deviation (defined as $\max_i |x_i - m|$): it is a solution to a variational problem.
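
The minimax property can be verified by brute force on a small example. The data set and the grid step of 0.5 are assumptions made for illustration; the search simply scans candidate centres and keeps the one with the smallest maximum deviation.

```python
def max_deviation(values, c):
    """Largest absolute deviation of any value from the candidate centre c."""
    return max(abs(x - c) for x in values)


data = [1, 2, 3, 4, 100]
mid = (min(data) + max(data)) / 2  # 50.5

# Scan candidate centres on a grid of step 0.5 covering the data.
candidates = [k / 2 for k in range(2 * min(data), 2 * max(data) + 1)]
best = min(candidates, key=lambda c: max_deviation(data, c))
print(best, max_deviation(data, best))  # 50.5 49.5
```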

from Grokipedia
In statistics, the mid-range, also known as the midrange, is a measure of central tendency defined as the arithmetic mean of the minimum and maximum values in a data set.[1][2] It provides a quick estimate of the central value by averaging the extremes, making it one of the simplest statistical measures to compute.[1] The mid-range is calculated by adding the smallest and largest data points and dividing by 2, expressed as $ \frac{\min(X) + \max(X)}{2} $, where $ X $ represents the data set.[3] This method is particularly straightforward for small or ordered data sets, such as test scores or measurements, and is often used alongside other central tendency measures like the mean and median.[4] However, its sensitivity to outliers, where a single extreme value can shift the result substantially, limits its reliability compared to the median or arithmetic mean and makes it misleading for data sets that contain anomalies.[5] For instance, in a data set of {1, 2, 3, 4, 100}, the mid-range is 50.5, far from the more representative median of 3.[1] Despite these drawbacks, the mid-range remains useful in preliminary data analysis or when computational resources are limited, as it needs only the two extremes, which can be found in a single pass over the data without sorting or summing.[2] In certain contexts, such as survey scales, it may refer to the theoretical midpoint of a response range (e.g., 4 on a 1–7 Likert scale), independent of actual responses, to assess neutrality.[6] Overall, while not as robust as other measures, the mid-range offers a basic tool for summarizing data location, especially in educational or exploratory settings.[7]

Definition and Basics

Definition as a Measure of Central Tendency

The mid-range, also known as the mid-extreme, is a measure of central tendency defined as the arithmetic mean of the minimum and maximum values in a sample dataset, providing a straightforward estimator of the population's central location.[8][9] This approach leverages only the dataset's extremes to approximate the center, making it one of the simplest location statistics alongside the arithmetic mean and median.[10] Originating in descriptive statistics, the mid-range emerged as a quick method to gauge central location by averaging extremes, with early references appearing in 19th-century statistical literature focused on practical data summarization.[11][12] No single inventor is attributed to its formalization, as it evolved naturally from rudimentary averaging techniques in early statistical practice, predating more comprehensive measures like the full arithmetic mean.[11] As a location estimator, the mid-range distinctly emphasizes the dataset's boundaries, rendering it particularly sensitive to extreme values that can skew the estimate away from the true center.[5] This sensitivity highlights its role in descriptive analysis where rapid assessment of spread-influenced centrality is prioritized over robustness.[1]

Relation to Order Statistics and Range

In statistics, order statistics are the sorted values of a random sample $ X_1, X_2, \dots, X_n $ of size $ n $ from a distribution, arranged in non-decreasing order as $ X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)} $, where $ X_{(1)} $ denotes the sample minimum and $ X_{(n)} $ the sample maximum. The sample range $ R $ is the length of the interval spanning these extremes, defined as $ R = X_{(n)} - X_{(1)} $.[13] The mid-range is the midpoint of this interval $ [X_{(1)}, X_{(n)}] $, expressed as $ \frac{X_{(1)} + X_{(n)}}{2} $.[14] This construction underscores the mid-range's reliance exclusively on the two extreme order statistics, effectively ignoring all intermediate sample values in its computation.[14]

Calculation

Formula and Computation

The mid-range, denoted as $ M $, is computed as the average of the sample minimum and maximum values, formally expressed using order statistics as
$$ M = \frac{X_{(1)} + X_{(n)}}{2}, $$
where $ X_{(1)} $ represents the smallest observation in the ordered sample and $ X_{(n)} $ the largest.[15][16] To compute the mid-range, identify the minimum $ X_{(1)} $ and maximum $ X_{(n)} $ in the dataset; then apply the averaging formula directly to these extremes.[16][15] For edge cases, the mid-range is undefined for an empty sample ($ n = 0 $), as no minimum or maximum exists; for a single-value sample ($ n = 1 $), it equals that value, since the minimum and maximum coincide.[16]
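
The computation and its edge cases can be sketched as a single-pass Python function. The function name and the choice to raise `ValueError` for an empty sample are assumptions of this sketch, not a reference to any particular library.

```python
def midrange(values):
    """Return (min + max) / 2; raise ValueError for an empty sample."""
    lo = hi = None
    for x in values:  # single pass, works for any iterable
        if lo is None or x < lo:
            lo = x
        if hi is None or x > hi:
            hi = x
    if lo is None:
        raise ValueError("mid-range is undefined for an empty sample")
    return (lo + hi) / 2


print(midrange([7]))              # 7.0
print(midrange([1, 3, 5, 7, 9]))  # 5.0
```

No sorting is needed, but every value is still examined once in order to find the two extremes.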

Illustrative Examples

To illustrate the computation of the mid-range, consider a simple dataset consisting of the odd numbers from 1 to 9: {1, 3, 5, 7, 9}. The minimum value is 1 and the maximum value is 9, so the mid-range is calculated as (1 + 9)/2 = 5. Another example involves a dataset where an extreme value is present: {1, 2, 3, 4, 100}. Here, the minimum is 1 and the maximum is 100, yielding a mid-range of (1 + 100)/2 = 50.5. For a dataset with an even number of observations, such as {2, 4, 6, 8}, the minimum is 2 and the maximum is 8, resulting in a mid-range of (2 + 8)/2 = 5.

Statistical Properties

Robustness to Outliers

The mid-range, defined as the average of the sample minimum and maximum, demonstrates extreme sensitivity to outliers due to its reliance on only the two extreme order statistics. This lack of robustness is quantified by its breakdown point of 0, indicating that a single contaminated observation can cause the estimator to produce arbitrarily large or small values, completely distorting the location estimate.[17] A key aspect of this sensitivity arises from the direct impact of an extreme value on the mid-range. If a dataset consists of values clustered around a true center $ \mu $, and one outlier deviates from $ \mu $ by a distance $ d $ (becoming the new minimum or maximum), the mid-range shifts by about $ d/2 $, as the estimator averages the unaffected extreme with the outlier. For instance, consider a sample of 10 values all equal to 5 (mid-range = 5); introducing an outlier of 15 changes the mid-range to 10 (average of 5 and 15). Half of the outlier's deviation is thus passed on to the estimate regardless of the sample size, whereas the sample mean shifts by only about $ d/n $, so for $ n > 2 $ the mid-range is affected more strongly than the mean and is unreliable for contaminated data.[17] In contrast, trimmed variants like the midhinge (the average of the 25th and 75th percentiles, equivalent to a 25% trimmed mid-range) improve robustness, achieving a breakdown point of 25%, though they sacrifice some efficiency in clean samples.[17]

Efficiency Across Distributions

The mid-range is an unbiased estimator of the population mean for symmetric distributions, and its performance relative to the sample mean depends strongly on the kurtosis of the underlying distribution. For platykurtic distributions such as the uniform distribution on [a, b], the mid-range is the uniformly minimum variance unbiased (UMVU) estimator of the mean μ = (a + b)/2. The Cramér-Rao bound does not apply in this case, because the support of the uniform distribution depends on the parameters being estimated; instead, the variance of the mid-range decreases at rate 1/n², faster than the 1/n rate of the sample mean, so its efficiency relative to the sample mean exceeds 1 for n > 2 and increases with sample size.[18] In contrast, for mesokurtic distributions like the normal, the sample mean is the efficient estimator, achieving the Cramér-Rao bound. The mid-range converges at the slower rate of O_p(1/√(log n)), compared to the O_p(1/√n) rate of the sample mean, resulting in an asymptotic relative efficiency (ARE) of 0 relative to the sample mean. For leptokurtic distributions, which have heavier tails than the normal, the mid-range performs even more poorly because the extreme order statistics are more variable, and its ARE relative to the sample mean is again 0.[19] The mid-range's suitability is thus highest for symmetric platykurtic cases like the uniform [a, b], where the population mean coincides with the mid-point of the support, allowing the estimator to exploit the bounded extremes. Efficiency is assessed by comparing the variances (or more generally, mean squared errors) of the mid-range and sample mean; when their convergence rates differ, the relative efficiency reflects the ratio of sample sizes required to achieve equivalent precision, highlighting the mid-range's advantages in bounded-support scenarios and disadvantages in unbounded or heavy-tailed ones.[18][19]

Sampling Properties and Variance

The mid-range $ M = \frac{X_{(1)} + X_{(n)}}{2} $, where $ X_{(1)} $ and $ X_{(n)} $ are the sample minimum and maximum order statistics from a sample of size $ n $, is an unbiased estimator of the population mean for symmetric distributions. For distributions with finite support, the sample mid-range also consistently estimates the population mid-range, which for the uniform distribution coincides with the mean.[20] Under the uniform distribution $ U(0,1) $, the exact variance of the mid-range is given by
$$ \text{Var}(M) = \frac{1}{2(n+1)(n+2)}, $$
derived from the known moments of the minimum and maximum order statistics, which follow Beta distributions, and their covariance.[21] This variance decreases rapidly with $ n $, reflecting the concentration of the extremes near 0 and 1. For the normal distribution $ N(\mu, \sigma^2) $, the variance of the mid-range is approximately $ \frac{\pi^2 \sigma^2}{24 \ln n} $ for large $ n $, arising from the asymptotic Gumbel distribution of the normalized extremes, with the min and max being asymptotically independent and symmetric around $ \mu $. The variance therefore shrinks only logarithmically in $ n $, far more slowly than the $ \frac{\sigma^2}{n} $ of the sample mean, a consequence of the unbounded tails. In the Laplace distribution, which has heavier tails than the normal (exponential decay versus Gaussian), the variance of the mid-range is higher than in the normal case for comparable $ \sigma^2 $ and does not decrease to zero as $ n $ grows, as the extremes exhibit greater variability; exact finite-sample expressions are more complex and typically obtained via numerical integration of order statistic moments.[21] The sampling distribution of the mid-range is in general not asymptotically normal, because the central limit theorem does not apply to extreme order statistics: for the uniform distribution the suitably scaled mid-range converges to a Laplace distribution, and for the normal distribution it converges to a logistic distribution, the law of the difference of two independent Gumbel variables. Intervals of the form $ M \pm z_{\alpha/2} \sqrt{\text{Var}(M)} $ are therefore at best rough approximations.
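
For reference, the uniform-case variance can be derived in a few lines from standard order-statistic results for $ U(0,1) $, where $ X_{(1)} \sim \text{Beta}(1, n) $ and $ X_{(n)} \sim \text{Beta}(n, 1) $. The following is a sketch of that derivation rather than a quotation from the cited source.

```latex
\begin{align*}
\text{Var}(X_{(1)}) = \text{Var}(X_{(n)}) &= \frac{n}{(n+1)^2 (n+2)}, \\
\text{Cov}(X_{(1)}, X_{(n)}) &= \frac{1}{(n+1)^2 (n+2)}, \\
\text{Var}(M) &= \tfrac{1}{4}\left[ \text{Var}(X_{(1)}) + \text{Var}(X_{(n)}) + 2\,\text{Cov}(X_{(1)}, X_{(n)}) \right] \\
&= \frac{1}{4} \cdot \frac{2n + 2}{(n+1)^2 (n+2)} = \frac{1}{2(n+1)(n+2)}.
\end{align*}
```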

Performance Characteristics

Behavior in Small Samples

In small samples, the mid-range demonstrates heightened sensitivity to the distributional shape, performing optimally as a central tendency estimator under conditions approximating uniformity. For the uniform distribution, the mid-range is more efficient than the sample mean, particularly for small to moderate sample sizes. The estimator's reliance on just two order statistics—the minimum and maximum—introduces substantial instability in small samples due to the high variability of these extremes, which are determined by only a few observations. This volatility is particularly pronounced as the number of data points is low, amplifying the impact of any single outlier or random fluctuation on the result. For instance, with n=2, the mid-range simplifies to the arithmetic mean of the two values, offering no benefit from interior points since none exist, and its variance matches that of the mean exactly.[22] Monte Carlo simulations with up to 200,000 iterations reveal that in non-uniform small samples, the mid-range exhibits greater uncertainty, with coverage factors increasing markedly (e.g., from 2.41 for uniform at ν=16 to 3.96 for 50% Gaussian mixture), leading to overestimation of the estimator's spread relative to uniform conditions. These empirical findings underscore the mid-range's diminished reliability outside platykurtic settings, where deviations from uniformity inflate the standard deviation of the estimator by factors exceeding 10 in some cases for n up to 20.[22]

Bias and Deviation Metrics

The mid-range estimator exhibits zero bias as an estimate of the population mean for symmetric distributions, such as the uniform and normal distributions, where the expected values of the sample minimum and maximum are equidistant from the mean.[23] In positively skewed distributions, the mid-range displays positive bias, being drawn toward the extreme in the longer right tail, while negatively skewed distributions induce negative bias toward the left tail extreme. A key deviation property of the mid-range is its minimax characteristic: it is the point whose maximum absolute deviation from the sample values is smallest, serving as the center of the smallest interval that encompasses all data points.[24] The mean squared error of the mid-range estimator exceeds that of the sample mean for most distributions, including the normal, owing to the high variability of the sample extremes, which inflates its variance.[23] For instance, with samples of size 100 from a standard normal distribution, the mid-range's variance is approximately 0.0925, compared to 0.01 for the sample mean.[23]
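
The direction of the bias under skewness can be illustrated by simulation. This is a sketch under assumed settings (exponential and normal populations, n = 20, 5000 trials, a fixed seed); the printed values are Monte Carlo estimates and will vary with those choices.

```python
import random


def mean_midrange(draw, n=20, trials=5000, seed=1):
    """Average mid-range over repeated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        total += (min(sample) + max(sample)) / 2
    return total / trials


# Exponential(1): right-skewed, population mean 1.
right_skewed = mean_midrange(lambda rng: rng.expovariate(1.0))
# Normal(0, 1): symmetric, population mean 0.
symmetric = mean_midrange(lambda rng: rng.gauss(0.0, 1.0))

print(right_skewed - 1.0)  # estimated bias, clearly positive
print(symmetric - 0.0)     # estimated bias, close to zero
```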

Comparisons to Other Central Tendency Measures

The mid-range, calculated as the average of the minimum and maximum values in a dataset, uses only two of the n data points in its final formula, although finding them still requires a pass over the whole sample, so its computational cost is comparable to that of the arithmetic mean, which sums all values and divides by n.[25] This reliance on extremes renders the mid-range less stable, as it is highly sensitive to outliers that affect the minimum or maximum, whereas the mean dilutes the effect of a single outlier by a factor of n.[25] For normally distributed data, the mean exhibits superior efficiency, with relative efficiencies showing its variance as approximately 59% of the mid-range's in small samples from standard normal distributions.[21] In contrast to the median, which is the central order statistic and thus depends on the ranked positions of all observations, the mid-range disregards the interior points entirely.[25] While both measures have robust trimmed variants, the mid-range's dependence solely on boundary values makes it less effective against outliers compared to the median, which remains stable in skewed or heavy-tailed distributions like the Cauchy. In Cauchy-distributed samples, the mid-range has infinite variance, while the median has finite variance, highlighting the median's advantage in such settings.[25][21] The mid-range is preferable for rapid assessments in uniform distributions, where it achieves the lowest variance among central tendency measures, outperforming the mean by a factor of about 2.2 in relative efficiency at n = 10, for example.[21] In general inferential contexts, however, the arithmetic mean is favored for symmetric, light-tailed data like the normal, and the median for skewed or outlier-prone scenarios, as the mid-range's asymptotic lack of efficiency limits its broader applicability.[26]
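
For the uniform case, the efficiency of the mid-range relative to the sample mean follows from two exact variances for n draws from U(0, 1): 1/(2(n+1)(n+2)) for the mid-range and 1/(12n) for the mean. The sketch below evaluates their ratio with exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def var_midrange_uniform(n):
    """Exact variance of the mid-range for n draws from U(0, 1)."""
    return Fraction(1, 2 * (n + 1) * (n + 2))


def var_mean_uniform(n):
    """Exact variance of the sample mean for n draws from U(0, 1)."""
    return Fraction(1, 12 * n)


def relative_efficiency(n):
    """Var(mean) / Var(mid-range); values above 1 favour the mid-range."""
    return var_mean_uniform(n) / var_midrange_uniform(n)


for n in (2, 5, 10, 20):
    print(n, float(relative_efficiency(n)))
# 2 1.0
# 5 1.4
# 10 2.2
# 20 3.85
```

The ratio simplifies to (n+1)(n+2)/(6n), which equals 1 at n = 2 and grows roughly linearly in n.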

Applications and Limitations

Uses in Specific Distributions

In uniform distributions, the mid-range serves as the uniformly minimum variance unbiased estimator (UMVUE) of the population mean, given by μ = (a + b)/2, where a and b are the lower and upper bounds of the distribution Uniform(a, b).[27] This property arises because the order statistics X_{(1)} and X_{(n)} (the sample minimum and maximum) form a complete sufficient statistic for the parameters (a, b), and their average is an unbiased function of that statistic, so it achieves the lowest variance among unbiased estimators.[27] In quality control for bounded processes, such as manufacturing tolerances where measurements are assumed to be spread uniformly within specified limits, the mid-range provides a reliable estimate of the central value, aiding in process monitoring and adjustment.[22] The mid-range contributes to descriptive summaries as a derived measure from the five-number summary, which includes the minimum, first quartile (Q1), median, third quartile (Q3), and maximum; specifically, it is computed as the average of the minimum and maximum to offer a simple central tendency indicator. This makes it useful in exploratory data analysis for platykurtic distributions like the uniform, where the data exhibit low kurtosis and bounded support, allowing the mid-range to capture the center effectively without reference to intermediate values. A practical application involves estimating the average length of physical measurements, such as component dimensions in manufacturing, from a sample assumed to follow a uniform distribution; here, the mid-range provides an unbiased and efficient estimate of the true mean length when the process operates within fixed tolerances.[22]

Drawbacks and Alternative Approaches

The mid-range exhibits extreme sensitivity to outliers, as a single extreme value can arbitrarily distort the estimate by affecting the sample minimum or maximum, rendering it unsuitable for datasets with potential errors or contamination. This vulnerability stems from its reliance on only two data points, ignoring the rest of the sample and leading to inefficiency as an estimator for most real-world distributions that are not uniform.[28] Consequently, the mid-range lacks robustness for statistical inference, with an asymptotic breakdown point of 0—the lowest possible value—making it prone to failure in the presence of even minimal contamination.[29] To mitigate these drawbacks, alternatives such as the trimmed mid-range (or midhinge), defined as the average of the first and third quartiles, offer improved robustness by excluding extreme values while maintaining reasonable efficiency for symmetric data.[30] For symmetric distributions without outliers, the arithmetic mean is generally preferred due to its optimal efficiency under normality, whereas the median provides a more robust option for skewed distributions or outlier-prone data.[10] When assessing spread rather than central tendency, the interquartile range serves as a robust alternative to the full range, avoiding the influence of extremes.[31] The mid-range should be avoided in large, outlier-prone datasets or non-platykurtic distributions, where its poor performance is exacerbated, and it has become outdated in modern statistical software that favors robust methods like the median or trimmed estimators.[29]
