Mean

from Wikipedia

A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers.[1] There are several kinds of means (or "measures of central tendency") in mathematics, especially in statistics. Each attempts to summarize or typify a given group of data, illustrating the magnitude and sign of the data set. Which of these measures is most illuminating depends on what is being measured, and on context and purpose.[2]

The arithmetic mean, also known as "arithmetic average", is the sum of the values divided by the number of values. The arithmetic mean of a set of numbers $x_1, x_2, \ldots, x_n$ is typically denoted using an overhead bar, $\bar{x}$.[note 1] If the numbers are from observing a sample of a larger group, the arithmetic mean is termed the sample mean ($\bar{x}$) to distinguish it from the group mean (or expected value) of the underlying distribution, denoted $\mu$ or $\mu_x$.[note 2][3]

Outside probability and statistics, a wide range of other notions of mean are often used in geometry and mathematical analysis; examples are given below.

Types of means


Pythagorean means


In mathematics, the three classical Pythagorean means are the arithmetic mean (AM), the geometric mean (GM), and the harmonic mean (HM). These means were studied with proportions by Pythagoreans and later generations of Greek mathematicians[4] because of their importance in geometry and music.

Arithmetic mean (AM)


The arithmetic mean (or simply mean or average) of a list of numbers is the sum of all of the numbers divided by their count. Similarly, the mean of a sample $x_1, x_2, \ldots, x_n$, usually denoted by $\bar{x}$, is the sum of the sampled values divided by the number of items in the sample:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

For example, the arithmetic mean of the five values 4, 36, 45, 50, 75 is:

$$\frac{4 + 36 + 45 + 50 + 75}{5} = \frac{210}{5} = 42.$$

Geometric mean (GM)


The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product (as is the case with rates of growth) and not their sum (as is the case with the arithmetic mean):[1]

$$\bar{x} = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = (x_1 x_2 \cdots x_n)^{1/n}$$

For example, the geometric mean of the five values 4, 36, 45, 50, 75 is:

$$(4 \times 36 \times 45 \times 50 \times 75)^{1/5} = \sqrt[5]{24\,300\,000} = 30.$$

Harmonic mean (HM)


The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, as in the case of speed (i.e., distance per unit of time):

$$\bar{x} = n \left( \sum_{i=1}^{n} \frac{1}{x_i} \right)^{-1}$$

For example, the harmonic mean of the five values 4, 36, 45, 50, 75 is

$$\frac{5}{\frac{1}{4} + \frac{1}{36} + \frac{1}{45} + \frac{1}{50} + \frac{1}{75}} = \frac{5}{\frac{1}{3}} = 15.$$

If we have five pumps that can empty a tank of a certain size in respectively 4, 36, 45, 50, and 75 minutes, then the harmonic mean of 15 tells us that these five different pumps working together will pump at the same rate as five pumps that can each empty the tank in 15 minutes.

Relationship between AM, GM, and HM

Proof without words of the AM–GM inequality:
PR is the diameter of a circle centered on O; its radius AO is the arithmetic mean of a and b. Triangle PGR is a right triangle from Thales's theorem, enabling use of the geometric mean theorem to show that its altitude GQ is the geometric mean. For any ratio a:b, AO ≥ GQ.

AM, GM, and HM of nonnegative real numbers satisfy these inequalities:[5]

$$\mathrm{AM} \geq \mathrm{GM} \geq \mathrm{HM}$$

Equality holds if and only if all the elements of the given sample are equal.
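
As a concrete check, the following minimal Python sketch (standard library only) computes the three Pythagorean means of the example values 4, 36, 45, 50, 75 used above and confirms the inequality:

```python
import math

values = [4, 36, 45, 50, 75]
n = len(values)

am = sum(values) / n                     # arithmetic mean -> 42.0
gm = math.prod(values) ** (1 / n)        # geometric mean  -> 30.0 (up to float noise)
hm = n / sum(1 / x for x in values)      # harmonic mean   -> 15.0

print(am, gm, hm)
assert hm <= gm <= am                    # AM-GM-HM inequality for positive numbers
```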

Statistical location

Comparison of the arithmetic mean, median, and mode of two skewed (log-normal) distributions
Geometric visualization of the mode, median and mean of an arbitrary probability density function[6]

In descriptive statistics, the mean may be confused with the median, mode or mid-range, as any of these may colloquially be called an "average" (more formally, a measure of central tendency). The mean of a set of observations is the arithmetic average of the values; however, for skewed distributions, the mean is not necessarily the same as the middle value (median), or the most likely value (mode). For example, mean income is typically skewed upwards by a small number of people with very large incomes, so that the majority have an income lower than the mean. By contrast, the median income is the level at which half the population is below and half is above. The mode income is the most likely income and favors the larger number of people with lower incomes. While the median and mode are often more intuitive measures for such skewed data, many skewed distributions are in fact best described by their mean, including the exponential and Poisson distributions.

Mean of a probability distribution


The mean of a probability distribution is the long-run arithmetic average value of a random variable having that distribution. If the random variable is denoted by $X$, then the mean is also known as the expected value of $X$ (denoted $E(X)$). For a discrete probability distribution, the mean is given by $\sum x P(x)$, where the sum is taken over all possible values of the random variable and $P(x)$ is the probability mass function. For a continuous distribution, the mean is $\int_{-\infty}^{\infty} x f(x)\,dx$, where $f(x)$ is the probability density function.[7] In all cases, including those in which the distribution is neither discrete nor continuous, the mean is the Lebesgue integral of the random variable with respect to its probability measure. The mean need not exist or be finite; for some probability distributions the mean is infinite (+∞ or −∞), while for others the mean is undefined.

Generalized means


Power mean


The generalized mean, also known as the power mean or Hölder mean, abstracts several other means. It is defined for positive numbers $x_1, \ldots, x_n$ by[1]

$$\bar{x}(m) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^m \right)^{1/m}$$

This, as a function of the exponent $m$, is well defined for $m \neq 0$, but can be extended continuously to $m \in [-\infty, +\infty]$ by taking limits.[8] By choosing different values for $m$, other well-known means are retrieved.

Name | Exponent | Value
Minimum | $m \to -\infty$ | $\min(x_1, \ldots, x_n)$
Harmonic mean | $m = -1$ | $n \left( \frac{1}{x_1} + \cdots + \frac{1}{x_n} \right)^{-1}$
Geometric mean | $m \to 0$ | $(x_1 x_2 \cdots x_n)^{1/n}$
Arithmetic mean | $m = 1$ | $\frac{x_1 + \cdots + x_n}{n}$
Root mean square | $m = 2$ | $\sqrt{\frac{x_1^2 + \cdots + x_n^2}{n}}$
Cubic mean | $m = 3$ | $\sqrt[3]{\frac{x_1^3 + \cdots + x_n^3}{n}}$
Maximum | $m \to +\infty$ | $\max(x_1, \ldots, x_n)$
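
The rows of this table can be reproduced numerically. The sketch below is a minimal Python implementation of the power mean, treating $m = 0$ as its geometric-mean limit and illustrating the approach to the minimum and maximum for large negative and positive exponents:

```python
import math

def power_mean(xs, m):
    """Power (Hölder) mean with exponent m; m = 0 is the geometric-mean limit."""
    n = len(xs)
    if m == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** m for x in xs) / n) ** (1 / m)

xs = [4, 36, 45, 50, 75]
for m in (-1, 0, 1, 2, 3):
    print(m, power_mean(xs, m))   # harmonic, geometric, arithmetic, RMS, cubic

print(power_mean(xs, 50))    # close to max(xs) = 75
print(power_mean(xs, -50))   # close to min(xs) = 4
```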

Quasi-arithmetic mean


A similar approach to the power mean is the $f$-mean, also known as the quasi-arithmetic mean. For an injective function $f$ on an interval $I$ and real numbers $x_1, \ldots, x_n \in I$, we define their $f$-mean as

$$\bar{x} = f^{-1}\left( \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right)$$

By choosing different functions $f$, other well-known means are retrieved.

Mean | Function $f(x)$[note 3]
Arithmetic mean | $f(x) = x$
Geometric mean | $f(x) = \ln x$[note 4]
Harmonic mean | $f(x) = \frac{1}{x}$
Power mean with exponent $m \neq 0$ | $f(x) = x^m$[note 5]
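
A small Python sketch of the $f$-mean with the function choices from this table; the lambdas are just the textbook choices above, not a library API:

```python
import math

def f_mean(xs, f, f_inv):
    """Quasi-arithmetic mean: apply f, average, then map back with f's inverse."""
    return f_inv(sum(f(x) for x in xs) / len(xs))

xs = [4, 36, 45, 50, 75]
print(f_mean(xs, lambda x: x,     lambda y: y))      # arithmetic mean, 42.0
print(f_mean(xs, math.log,        math.exp))         # geometric mean, ~30.0
print(f_mean(xs, lambda x: 1 / x, lambda y: 1 / y))  # harmonic mean, 15.0
print(f_mean(xs, lambda x: x**2,  math.sqrt))        # power mean with m = 2 (RMS)
```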

Weighted arithmetic mean


The weighted arithmetic mean (or weighted average) is used if one wants to combine average values from different-sized samples of the same population, and is defined by[1]

$$\bar{x} = \frac{\sum_i w_i \bar{x}_i}{\sum_i w_i}$$

where $\bar{x}_i$ and $w_i$ are the mean and size of sample $i$ respectively. In other applications, the weights represent a measure of the reliability of the influence upon the mean by the respective values.

Truncated mean


Sometimes, a set of numbers might contain outliers (i.e., data values which are much lower or much higher than the others). Often, outliers are erroneous data caused by artifacts. In this case, one can use a truncated mean. It involves discarding given parts of the data at the top or the bottom end, typically an equal amount at each end, and then taking the arithmetic mean of the remaining data. The number of values removed is indicated as a percentage of the total number of values.

Interquartile mean


The interquartile mean is a specific example of a truncated mean. It is simply the arithmetic mean after removing the lowest and the highest quarter of values:

$$\bar{x}_{\mathrm{IQM}} = \frac{2}{n} \sum_{i = n/4 + 1}^{3n/4} x_i$$

assuming the values have been ordered, so it is simply a specific example of a weighted mean for a specific set of weights.

Mean of a function


In some circumstances, mathematicians may calculate a mean of an infinite (or even an uncountable) set of values. This can happen when calculating the mean value of a function $f(x)$. Intuitively, a mean of a function can be thought of as calculating the area under a section of a curve, and then dividing by the length of that section. This can be done crudely by counting squares on graph paper, or more precisely by integration. The integration formula is written as:

$$\bar{f} = \frac{1}{b - a} \int_a^b f(x) \, dx$$

In this case, care must be taken to make sure that the integral converges. But the mean may be finite even if the function itself tends to infinity at some points.

Mean of angles and cyclical quantities


Angles, times of day, and other cyclical quantities require modular arithmetic to add and otherwise combine numbers. These quantities can be averaged using the circular mean. In all these situations, it is possible that no mean exists, for example if all points being averaged are equidistant. Consider a color wheel—there is no mean to the set of all colors. Additionally, there may not be a unique mean for a set of values: for example, when averaging points on a clock, the mean of the locations of 11:00 and 13:00 is 12:00, but this location is equivalent to that of 00:00.
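
A minimal Python sketch of the circular mean via the two-argument arctangent, applied to the clock example above; mapping hours to angles at 15° per hour on a 24-hour dial is an illustrative convention, not part of the definition. The resultant length R indicates how well-defined the mean is:

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction (degrees) via the resultant of unit vectors; the
    resultant length R measures how well-defined that mean is."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / len(angles_deg)
    return math.degrees(math.atan2(s, c)) % 360, math.hypot(s, c)

# 11:00 and 13:00 as angles (15 degrees per hour on a 24-hour dial):
mean_angle, r = circular_mean_deg([11 * 15, 13 * 15])
print(mean_angle / 15, r)    # 12.0 hours; R ~ 0.97, a well-defined mean

# Three equally spaced directions: R collapses to ~0, so no meaningful mean exists.
print(circular_mean_deg([0.0, 120.0, 240.0]))
```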

Fréchet mean


The Fréchet mean gives a manner for determining the "center" of a mass distribution on a surface or, more generally, Riemannian manifold. Unlike many other means, the Fréchet mean is defined on a space whose elements cannot necessarily be added together or multiplied by scalars. It is sometimes also known as the Karcher mean (named after Hermann Karcher).

Triangular sets


In geometry, there are thousands of different definitions for the center of a triangle that can all be interpreted as the mean of a triangular set of points in the plane.[9]

Swanson's rule


This is an approximation to the mean for a moderately skewed distribution.[10] It is used in hydrocarbon exploration and is defined as:

$$m_{\mathrm{estimated}} = 0.3 P_{10} + 0.4 P_{50} + 0.3 P_{90}$$

where $P_{10}$, $P_{50}$, and $P_{90}$ are the 10th, 50th and 90th percentiles of the distribution, respectively.

from Grokipedia
In statistics and mathematics, the mean, often specifically the arithmetic mean, is a fundamental measure of central tendency that represents the average value of a dataset, calculated by summing all the numerical values in the set and dividing by the total number of values.[1] This computation provides a single representative value that summarizes the overall level of the data, making it widely used for its simplicity and interpretability in descriptive statistics.[2] The arithmetic mean is particularly effective for symmetric distributions but can be influenced by extreme outliers, potentially skewing the result away from the typical value.[3]

Beyond the arithmetic mean, other types of means address specific data characteristics or applications, such as the geometric mean and harmonic mean. The geometric mean, computed as the nth root of the product of n positive numbers, is appropriate for averaging ratios, growth rates, or multiplicative processes, as it mitigates the impact of very large or small values compared to the arithmetic mean.[4] For instance, it is commonly applied in finance to calculate average returns over time[5] or in biology for modeling population growth.[6] The harmonic mean, defined as n divided by the sum of the reciprocals of the numbers, is ideal for averaging rates or ratios where the denominator varies, such as speeds over equal distances, and it always yields a value less than or equal to the geometric mean, which is itself less than or equal to the arithmetic mean for the same dataset.[7] These relationships, known as the AM-GM-HM inequality, highlight the arithmetic mean's tendency to produce the highest value among them for positive unequal numbers.[8]

The concept of the mean traces its origins to ancient Greek mathematics, particularly the Pythagoreans, who developed the classical arithmetic, geometric, and harmonic means, where it initially described the midpoint between two numbers,[9] and evolved in the 16th century with astronomers like Tycho Brahe applying the arithmetic mean to reduce observational errors by averaging multiple measurements.[10] By the 19th century, Carl Friedrich Gauss advanced its role in statistics through the method of least squares, establishing the mean as the expected value in normal distributions and integral to inferential statistics.[11]

Today, means underpin diverse fields including economics, engineering, and machine learning, where they facilitate data summarization, hypothesis testing, and model evaluation, though selection of the appropriate type depends on the data's scale and distribution properties.[12]

Classical Pythagorean Means

Arithmetic Mean

The arithmetic mean of a finite set of real numbers $x_1, x_2, \dots, x_n$, where $n$ is the number of observations, is defined as their sum divided by the count:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i.$$
This measure provides a basic summary of the central value in the dataset and is applicable to any real numbers, positive or negative.[13] In statistical contexts, a distinction is drawn between the population mean, which applies to the entire dataset of size $N$, and the sample mean, used for a subset of size $n$. The population mean is calculated as
$$\mu = \frac{1}{N} \sum_{i=1}^N x_i,$$
while the sample mean is
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i.$$
These formulas ensure the mean serves as an unbiased estimator when sampling from a larger population.[14]

The arithmetic mean exhibits key mathematical properties, including linearity: for constants $a$ and $b$, the mean of the transformed set $aX + b$ equals $a$ times the original mean plus $b$, or $\mathrm{AM}(aX + b) = a \cdot \mathrm{AM}(X) + b$. This affine property makes it useful for scaling and shifting data. However, the arithmetic mean is sensitive to outliers, as a single extreme value can significantly shift the result away from the typical values in the set.[15][16]

The concept of the arithmetic mean traces back to ancient Greek mathematics.[17] It was later integrated into the method of least squares in the early 19th century by Carl Friedrich Gauss in his 1809 work Theoria motus corporum coelestium in sectionibus conicis solem ambientium, justifying its use for minimizing errors in astronomical observations.[18]

Common applications include everyday calculations such as averaging daily temperatures or test scores. For example, the mean temperature for three days with readings of 20°C, 22°C, and 19°C is $(20 + 22 + 19)/3 \approx 20.33$°C, providing a quick summary of the period's warmth. Similarly, for test scores of 85, 92, and 78, the arithmetic mean is $(85 + 92 + 78)/3 = 85$, representing the group's overall performance.[16][19]

For continuous data, the arithmetic mean generalizes to the average value of a function $f(x)$ over an interval $[a, b]$, given by

$$\frac{1}{b - a} \int_a^b f(x) \, dx.$$
This integral form extends the discrete summation, capturing the mean for continuously varying quantities like velocity over time.[20] For positive numbers, the arithmetic mean can be compared to the other classical means, such as the geometric or harmonic, though it remains the standard for additive averaging.[13]

Geometric Mean

The geometric mean of $ n $ positive real numbers $ x_1, x_2, \dots, x_n $ is defined as the $ n $-th root of their product, given by
$$\text{GM} = \left( \prod_{i=1}^n x_i \right)^{1/n}.$$
[21] For two positive numbers $ x $ and $ y $, this simplifies to $ \sqrt{xy} $.[21] This measure is particularly suitable for aggregating ratios, rates of change, or positive data where multiplicative relationships dominate, such as in growth processes.[21] An equivalent formulation leverages logarithms, expressing the geometric mean as the exponential of the arithmetic mean of the natural logarithms:
$$\text{GM} = \exp\left( \frac{1}{n} \sum_{i=1}^n \ln x_i \right).$$
[22] This property highlights its connection to logarithmic scales and makes it useful for data that span orders of magnitude, as the logarithm transforms multiplicative effects into additive ones.[22]

The geometric mean finds applications in contexts involving compounded growth or proportional changes. In finance, it calculates the average annual return on investments over multiple periods, accounting for compounding effects; for example, successive returns of 10%, 20%, and -5% (corresponding to factors of 1.1, 1.2, and 0.95) yield a geometric mean of approximately 1.078, or an effective 7.8% annual rate.[21] In biology, it is used to summarize bacterial concentrations in environmental samples, such as water quality assessments, where data vary widely and geometric means provide a stable central tendency for log-normally distributed counts.[23] It also applies to modeling bacterial population growth rates in exponential phases, where multiplicative factors describe cell division over time.[23]

A key property of the geometric mean for positive real numbers is that it is always less than or equal to the arithmetic mean, with equality if and only if all the numbers are equal; this is a consequence of the arithmetic mean-geometric mean (AM-GM) inequality.[24] As the multiplicative counterpart in the classical Pythagorean means—alongside the arithmetic and harmonic means—the geometric mean emphasizes balanced proportions in geometric progressions.[24]
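
To make the log-exp identity and the compounding example concrete, here is a minimal Python check (standard library only); the three growth factors are the ones quoted above:

```python
import math

growth_factors = [1.10, 1.20, 0.95]        # +10%, +20%, -5% annual returns
n = len(growth_factors)

gm_direct = math.prod(growth_factors) ** (1 / n)
gm_via_logs = math.exp(sum(math.log(x) for x in growth_factors) / n)

print(gm_direct, gm_via_logs)   # both ~1.078, i.e. ~7.8% effective annual rate
assert math.isclose(gm_direct, gm_via_logs)
```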

Harmonic Mean

The harmonic mean of $n$ positive real numbers $x_1, x_2, \dots, x_n$ is defined as the reciprocal of the arithmetic mean of their reciprocals:
$$\text{HM} = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}}.$$
[5] For two positive numbers $a$ and $b$, this simplifies to
$$\text{HM} = \frac{2ab}{a + b}.$$
[5] A key property of the harmonic mean is that it is always less than or equal to the geometric mean and the arithmetic mean for the same set of positive numbers, with equality holding if and only if all the numbers are equal.[25] This positions it as the smallest among the classical Pythagorean means.[25]

The harmonic mean is particularly appropriate for datasets consisting of rates or ratios, as it gives greater weight to smaller values, providing a balanced measure in such contexts.[26] In applications involving rates, the harmonic mean yields the correct average when the quantities being averaged are inversely proportional to the rates, such as speeds over equal distances or the equivalent resistance of components in parallel.[26] For parallel resistors with resistances $R_1, R_2, \dots, R_n$, the equivalent resistance satisfies $\frac{1}{R_{\text{eq}}} = \sum_{i=1}^n \frac{1}{R_i}$, so $R_{\text{eq}} = \frac{1}{\sum_{i=1}^n \frac{1}{R_i}}$, which is the harmonic mean of the individual resistances divided by $n$. In economics, it is used to average ratios like price-to-earnings multiples across companies, ensuring the result reflects the harmonic relationship in valuation metrics.[27]

A common example illustrates its use for averaging speeds: suppose a vehicle travels one leg of a round trip at 60 km/h and the return leg at 40 km/h, with both legs covering equal distances. The arithmetic mean of 50 km/h overestimates the true average speed, but the harmonic mean gives
$$\text{HM} = \frac{2 \times 60 \times 40}{60 + 40} = 48 \text{ km/h},$$
which matches the total distance divided by total time.[26]
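
A short Python sketch of both applications above; the three resistor values are hypothetical, chosen only to illustrate that the parallel equivalent equals the harmonic mean divided by the number of resistors:

```python
def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

# Equal-distance round trip at 60 km/h out and 40 km/h back:
print(harmonic_mean([60, 40]))             # 48.0 km/h

# Parallel resistors (hypothetical values, in ohms):
resistances = [100, 200, 300]
r_eq = 1 / sum(1 / r for r in resistances)                 # ~54.55 ohms
assert abs(r_eq - harmonic_mean(resistances) / len(resistances)) < 1e-12
print(r_eq)
```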

Relationships and Inequalities

The Pythagorean means—the arithmetic mean (AM), geometric mean (GM), and harmonic mean (HM)—are interconnected through the classical inequality HM ≤ GM ≤ AM, which holds for any finite collection of positive real numbers $x_1, x_2, \dots, x_n$. Equality occurs if and only if $x_1 = x_2 = \dots = x_n$. This relationship highlights how the HM provides a lower bound, the GM an intermediate value, and the AM an upper bound, reflecting different ways of aggregating data while preserving order.[9][28]

Proofs of this inequality often rely on convexity and Jensen's inequality. For the AM-GM portion, consider the convex function $f(x) = -\log x$ on $(0, \infty)$. Jensen's inequality states that $\frac{1}{n} \sum_{i=1}^n f(x_i) \geq f\left( \frac{1}{n} \sum_{i=1}^n x_i \right)$, so $\frac{1}{n} \sum_{i=1}^n (-\log x_i) \geq -\log \left( \frac{1}{n} \sum_{i=1}^n x_i \right)$. Rearranging yields $\log \left( \frac{1}{n} \sum_{i=1}^n x_i \right) \geq \frac{1}{n} \sum_{i=1}^n \log x_i = \log \left( \left( \prod_{i=1}^n x_i \right)^{1/n} \right)$, implying AM ≥ GM, with equality when all $x_i$ are equal. The GM-HM portion follows by applying AM-GM to the reciprocals $1/x_i$, since HM is the reciprocal of the AM of the reciprocals.[29][30]

Geometrically, the means for two positive numbers $a$ and $b$ admit constructions involving basic figures. The GM $\sqrt{ab}$ is the length of the side of a square with area equal to that of a rectangle of sides $a$ and $b$, or the altitude to the hypotenuse in a right triangle whose hypotenuse is divided into segments of lengths $a$ and $b$ by the foot of that altitude. The HM $2ab/(a+b)$ arises in the context of harmonic divisions, such as the intersection points of parallel lines with transversals or in circle inversions preserving angles. The AM $(a+b)/2$ corresponds to the midpoint of a line segment joining $a$ and $b$. These interpretations underscore the means' roles in proportion and similarity.[9]

The inequality has significant applications in optimization, particularly for problems involving constraints on sums or products. For instance, to maximize $\prod x_i$ subject to a fixed sum $\sum x_i = S$ with $x_i > 0$, the AM-GM inequality implies the maximum occurs when all $x_i = S/n$, achieving equality. This technique appears in resource allocation and extremal problems, as elaborated in Hardy, Littlewood, and Pólya's seminal work on inequalities.[31][32]

Extensions of the inequality apply to any number of variables, with proofs generalizing via induction on $n$ (starting from the two-variable case) or directly through Jensen's inequality for the weighted form. Historical roots trace to the Pythagorean school around the 6th century BCE, where the means emerged from studies of musical intervals and geometric proportions—such as hammer weights producing harmonies via HM ratios—though formal definitions were provided later by Archytas (c. 428–347 BCE) and Euclid in the Elements.[33][34][29]

Means in Statistics and Probability

Arithmetic Mean as Central Tendency

In statistics, the arithmetic mean, often denoted as $\bar{x}$, represents the sample mean calculated from a dataset and serves as the primary estimator of the population mean $\mu$.[35] This measure sums all observed values and divides by the sample size $n$, providing a straightforward summary of the data's central location.[36]

A key property of the sample mean is its unbiasedness: the expected value $E(\bar{x}) = \mu$, holding for any distribution with a finite population mean, ensuring that over repeated samples, the average of the sample means equals the true population parameter.[35] Additionally, the variance of the sample mean is given by $\frac{\sigma^2}{n}$, where $\sigma^2$ is the population variance, indicating that larger sample sizes reduce estimation uncertainty.[37] These properties make the arithmetic mean a foundational tool in parametric inference, though its performance assumes certain data characteristics.

Compared to other measures of central tendency, the arithmetic mean is more sensitive to outliers than the median, which resists extreme values by focusing on the middle ordered value, or the mode, which identifies the most frequent value.[38] It performs best with symmetric distributions where data points cluster evenly around the center, but in skewed datasets, it can be pulled toward extremes, misrepresenting the typical value.[39]

The arithmetic mean plays a central role in hypothesis testing, such as t-tests comparing group means, and in constructing confidence intervals, which quantify the precision of $\bar{x}$ as an estimate of $\mu$ using the standard error $\frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation.[40] For example, in population income data, which often exhibits right-skewness due to a few high earners, the mean income tends to exceed the median, potentially overstating the central tendency for most individuals.[41] This skewness highlights a limitation in non-normal distributions, where the mean's sensitivity to tails can lead to biased interpretations of centrality, even though it remains an unbiased estimator in expectation.[39] In such cases, modern robust alternatives, like the median or trimmed means that exclude extreme values, offer more reliable measures for skewed data.[39]
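
The unbiasedness and the $\sigma^2/n$ variance are easy to see in simulation. The sketch below (Python standard library, an assumed normal population with $\mu = 10$ and $\sigma = 2$) draws repeated samples of size 25 and checks that the sample means average to roughly $\mu$ with variance roughly $\sigma^2/25$:

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 10.0, 2.0, 25, 20_000

# One sample mean per repetition:
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]

print(statistics.fmean(means))      # close to mu = 10 (unbiasedness)
print(statistics.variance(means))   # close to sigma**2 / n = 0.16
```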

Expected Value of a Probability Distribution

In probability theory, the expected value of a random variable $X$, denoted $E[X]$, represents the long-run average value of $X$ over many independent repetitions of the experiment, serving as the first moment of the probability distribution.[42] For a discrete random variable taking values $x_i$ with probabilities $p(x_i)$, the expected value is defined as the sum
$$E[X] = \sum_i x_i p(x_i),$$
where the sum is taken over all possible values in the support of the distribution.[42] For a continuous random variable with probability density function $f(x)$, the expected value is the integral
$$E[X] = \int_{-\infty}^{\infty} x f(x) \, dx.$$
[43]
Key properties of the expected value include linearity, which holds regardless of dependence between variables: for constants $a$ and $b$ and random variables $X$ and $Y$,
$$E[aX + bY] = a E[X] + b E[Y].$$
[44] This property facilitates computation for complex expressions by decomposing them into simpler components.[45] The expected value also relates directly to the variance, a measure of dispersion defined as
$$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2,$$
quantifying the average squared deviation from the mean.[46]

Specific distributions yield closed-form expected values that highlight its role as the mean parameter. For a binomial random variable $X \sim \text{Bin}(n, p)$ modeling the number of successes in $n$ independent trials each with success probability $p$, the expected value is $E[X] = np$.[47] For a normal distribution $X \sim N(\mu, \sigma^2)$, the expected value is exactly the location parameter $E[X] = \mu$, centering the symmetric bell-shaped density.[48]

The central limit theorem underscores the expected value's theoretical importance: for independent and identically distributed random variables $X_1, \dots, X_n$ with finite mean $\mu = E[X_i]$ and variance $\sigma^2 > 0$, the standardized sample mean $\sqrt{n}(\bar{X}_n - \mu)$ converges in distribution to $N(0, \sigma^2)$ as $n \to \infty$, so that $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$ is approximately $N(\mu, \sigma^2/n)$ for large $n$ and concentrates around the population expected value $\mu$.[49] This convergence justifies using sample averages to estimate population means in large datasets.

In applications, the expected value informs risk assessment by quantifying average outcomes under uncertainty, such as calculating the anticipated loss or gain in financial portfolios to guide investment decisions.[50] It also supports forecasting by providing the predicted average value in probabilistic models, as in project management where expected completion times account for variable durations across scenarios.[51]

Computing expected values becomes challenging in high-dimensional settings where analytical integration is intractable, prompting the use of Monte Carlo methods, which approximate $E[X]$ by averaging samples drawn from the distribution: $\hat{E}[X] \approx n^{-1} \sum_{i=1}^n X_i$ for large $n$, with error decreasing as $O(1/\sqrt{n})$ independent of dimension.[52] In 2025 AI contexts, these methods are integral to uncertainty quantification in neural networks, such as Monte Carlo dropout for estimating predictive means in high-dimensional inference tasks like image recognition or reinforcement learning.[53]
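
As an illustration of the Monte Carlo approximation, the following Python sketch estimates $E[X]$ for an exponential distribution with rate 1/2 (true mean 2); the distribution is an arbitrary choice, picked because the target value is known exactly:

```python
import random
import statistics

random.seed(1)
for n in (100, 10_000, 1_000_000):
    # Average of n samples; error shrinks roughly like O(1/sqrt(n)).
    estimate = statistics.fmean(random.expovariate(0.5) for _ in range(n))
    print(n, estimate)    # converges toward the true expected value 2.0
```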

Weighted Arithmetic Mean

The weighted arithmetic mean extends the concept of the arithmetic mean by assigning different levels of importance, or weights, to each data point in a dataset. For a finite set of values $x_1, x_2, \dots, x_n$ with corresponding positive weights $w_1, w_2, \dots, w_n > 0$, the weighted arithmetic mean $\bar{x}_w$ is defined as
$$\bar{x}_w = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}.$$
This formulation ensures that the result is a weighted average that accounts for the relative significance of each $x_i$, as established in standard statistical texts on measures of central tendency.[54]

A key property of the weighted arithmetic mean is that it reduces to the ordinary arithmetic mean when all weights are equal, i.e., $w_i = 1$ for all $i$, providing a direct generalization of the unweighted case. Additionally, when the weights are normalized such that $\sum_{i=1}^n w_i = 1$ and each $w_i \geq 0$, the expression simplifies to $\bar{x}_w = \sum_{i=1}^n w_i x_i$, forming a convex combination of the values, which lies within the convex hull of the data points and preserves properties like boundedness between the minimum and maximum values. This convexity ensures the weighted mean is a stable estimator in optimization contexts, as it maintains the affine structure of linear combinations.[55][54]

In applications, the weighted arithmetic mean is widely used in education to compute grade point averages (GPAs), where course grades are weighted by the number of credit hours to reflect their relative academic load. For instance, a student with grades of 3.0 in a 3-credit course and 4.0 in a 4-credit course has a GPA of $(3 \times 3.0 + 4 \times 4.0)/(3 + 4) \approx 3.57$. In survey sampling, particularly stratified random sampling, it estimates population parameters by weighting stratum-specific sample means proportionally to the stratum sizes, improving precision over simple random sampling for heterogeneous populations. An example occurs in finance, where the expected return of an investment portfolio is calculated as the weighted arithmetic mean of individual asset returns, with weights corresponding to the proportion of capital allocated to each asset, such as 60% in stocks yielding 8% and 40% in bonds yielding 4%, resulting in a portfolio return of $0.6 \times 8\% + 0.4 \times 4\% = 6.4\%$.[56]

More recently, the weighted arithmetic mean has found applications in machine learning, particularly in weighted loss functions that adjust training emphasis for imbalanced datasets. In object detection tasks, the focal loss function, introduced in RetinaNet, dynamically weights the cross-entropy loss to down-weight easy examples and focus on hard negatives, achieving substantial improvements in mean average precision (mAP) by up to 5.7 points on the COCO dataset compared to prior methods. This approach highlights the weighted mean's role in enhancing model performance by prioritizing challenging data points during optimization.[57]
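
Both worked examples translate directly into code; the sketch below is a minimal Python weighted mean applied to the GPA and portfolio numbers quoted above:

```python
def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# GPA: grades weighted by credit hours.
print(weighted_mean([3.0, 4.0], [3, 4]))        # 3.571... (25/7)

# Portfolio return: asset returns weighted by capital allocation.
print(weighted_mean([0.08, 0.04], [0.6, 0.4]))  # 0.064, i.e. 6.4%
```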

Generalized Means

Power Mean

The power mean of order $p$, also known as the generalized mean or Hölder mean, is a family of means that generalizes several classical averages for a finite set of positive real numbers $x_1, x_2, \dots, x_n$. For $p \neq 0$, it is defined as
$$M_p(\mathbf{x}) = \left( \frac{1}{n} \sum_{i=1}^n x_i^p \right)^{1/p},$$
where $\mathbf{x} = (x_1, \dots, x_n)$ and $p \in \mathbb{R}$ is the order parameter.[58] For $p = 0$, the power mean is defined as the limit
$$M_0(\mathbf{x}) = \lim_{p \to 0} M_p(\mathbf{x}) = \exp\left( \frac{1}{n} \sum_{i=1}^n \ln x_i \right),$$
which yields the geometric mean. As $p \to \infty$, $M_p(\mathbf{x})$ approaches the maximum value $\max_i x_i$, and as $p \to -\infty$, it approaches the minimum $\min_i x_i$.[58] This family was introduced by Otto Hölder in the context of his inequality, providing a unified framework for aggregating data based on the exponent $p$.[59]

The power mean relates directly to the classical Pythagorean means as special cases: $M_1(\mathbf{x})$ is the arithmetic mean, $M_0(\mathbf{x})$ (in the limit) is the geometric mean, and $M_{-1}(\mathbf{x})$ is the harmonic mean. A key property is its monotonicity: for fixed $\mathbf{x}$ with all $x_i > 0$, $M_p(\mathbf{x})$ is non-decreasing in $p$, meaning $M_q(\mathbf{x}) \leq M_p(\mathbf{x})$ whenever $q \leq p$.[59] This monotonicity underpins the power mean inequality, which generalizes classical inequalities like AM-GM-HM and holds for all real $p \geq q$.

Power means find applications in inequality theory, where the monotonicity property extends classical results to arbitrary orders, facilitating proofs in optimization and analysis.[60] In signal processing, the case $p = 2$ corresponds to the root mean square (RMS),
$$M_2(\mathbf{x}) = \sqrt{\frac{1}{n} \sum_{i=1}^n x_i^2},$$
which measures the effective magnitude of alternating signals, such as voltages in AC circuits, where it equals the DC value producing the same power dissipation.[58] For example, for a sinusoidal voltage $v(t) = V \sin(\omega t)$, the RMS value is $V / \sqrt{2}$, providing a standardized metric for power calculations in electrical engineering.[61]

Quasi-Arithmetic Mean

The quasi-arithmetic mean, also known as the Kolmogorov mean or $f$-mean, generalizes classical means for a finite set of positive real numbers $x_1, x_2, \dots, x_n$ via a continuous and strictly monotonic function $f: (0, \infty) \to \mathbb{R}$. It is defined as
$$M_f(x_1, \dots, x_n) = f^{-1}\left( \frac{1}{n} \sum_{i=1}^n f(x_i) \right),$$
where $f^{-1}$ denotes the inverse function of $f$. This formulation allows the construction of means tailored to specific transformation behaviors by varying $f$. The concept was introduced by Andrey Kolmogorov in 1930, with independent work by Mitio Nagumo in the same year characterizing such means through functional equations.[62]

Particular selections of $f$ recover standard means. When $f(x) = x$, $M_f$ is the arithmetic mean. For $f(x) = \ln x$ (with $x_i > 0$), it yields the geometric mean. Choosing $f(x) = 1/x$ (again for $x_i > 0$) produces the harmonic mean. These examples illustrate how the quasi-arithmetic framework unifies the Pythagorean means through appropriate monotonic transformations.[63]

Quasi-arithmetic means exhibit key properties that make them versatile for averaging. They are internal, meaning $\min\{x_i\} \leq M_f(x_1, \dots, x_n) \leq \max\{x_i\}$, and idempotent, so $M_f(a, \dots, a) = a$ for any $a > 0$. The power mean arises as a special case by selecting $f(x) = x^p$ for $p \neq 0$.[63][64]

In economics, quasi-arithmetic means facilitate custom averaging in utility theory and risk analysis, where the function $f$ can reflect nonlinear preferences or downward risk measures. For instance, they underpin combined indices of utility and weighting functions to evaluate economic aggregates. In data science, modern extensions employ quasi-arithmetic means for non-linear aggregation in tasks like entropy estimation and information geometry, enabling flexible Bregman-based divergences for machine learning models. These applications highlight their role beyond linear statistics, supporting robust handling of skewed or transformed datasets.[65][66][67]

Robust and Truncated Means

Truncated Mean

The truncated mean, also known as the trimmed mean, is a robust estimator of central tendency computed by first sorting a dataset in ascending order and then excluding a predetermined proportion $\alpha$ of the highest and lowest values from each tail before calculating the arithmetic mean of the remaining observations.[68] For instance, a 5% trimmed mean removes the bottom 5% and top 5% of the sorted data, reducing the influence of potential outliers while preserving more information than the median.[69] This approach addresses the sensitivity of the standard arithmetic mean to extreme values, making it particularly useful in datasets prone to contamination or heavy tails.[70]

A key variant is the Winsorized mean, which instead of discarding extreme values, replaces the lowest $\alpha$ proportion with the smallest retained value and the highest $\alpha$ proportion with the largest retained value, then computes the mean of this adjusted dataset.[71] This method retains all observations, avoiding the loss of data points associated with trimming, and is similarly robust to outliers by capping their impact.[72] The interquartile mean represents a specific instance of the 25% trimmed mean, focusing on the central 50% of the data.[68]

Truncated and Winsorized means exhibit lower sensitivity to outliers compared to the arithmetic mean, with breakdown points up to $\alpha$ for trimming, allowing robustness against up to that fraction of corrupted data.[70] However, in small samples, these estimators can introduce bias, as the removal or capping of extremes may systematically shift the estimate away from the true population mean, especially if the trimming proportion is large relative to the sample size.[73] Their variance is generally higher than that of the arithmetic mean under normality but decreases with increasing sample size, approaching the efficiency of the mean asymptotically.

In applications, truncated means are employed in sports scoring, such as Olympic gymnastics judging, where the highest and lowest scores from a panel are discarded to mitigate bias from extreme opinions.[74] They also appear in signal processing for robust estimation, such as in alpha-trimmed mean filters that suppress noise in images by averaging windowed data after trimming extremes, improving edge preservation over simple averaging.[75] For example, consider a class of 20 exam scores ranging from 45 to 98; a 10% trimmed mean excludes the two lowest (45, 52) and two highest (95, 98) scores, then averages the remaining 16 values, providing a fairer assessment less skewed by potential cheating or errors.[76] In the context of big data, efficient selection algorithms enable O(n) computation of trimmed means without full sorting, such as pairwise aggregation methods that facilitate scalable robust estimation in high-volume datasets.
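
A hand-rolled Python sketch of both estimators; the 20 hypothetical exam scores below match the shape of the example above (lowest scores 45 and 52, highest 95 and 98), and the helper names are ours, not a library API. (SciPy's scipy.stats.trim_mean offers a library implementation of the trimmed mean.)

```python
def trimmed_mean(xs, proportion):
    """Arithmetic mean after dropping a proportion of values from EACH tail."""
    xs = sorted(xs)
    k = int(len(xs) * proportion)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def winsorized_mean(xs, proportion):
    """Like trimming, but extreme values are capped at the retained extremes."""
    xs = sorted(xs)
    k = int(len(xs) * proportion)
    capped = [xs[k]] * k + xs[k:len(xs) - k] + [xs[-k - 1]] * k
    return sum(capped) / len(capped)

scores = [45, 52, 70, 72, 74, 75, 76, 77, 78, 79,
          80, 81, 82, 83, 84, 85, 90, 92, 95, 98]   # hypothetical exam scores
print(trimmed_mean(scores, 0.10))      # drops 45, 52, 95, 98
print(winsorized_mean(scores, 0.10))   # caps them at 70 and 92 instead
```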

Interquartile Mean

The interquartile mean is defined as the arithmetic mean of the data values lying between the first quartile (Q1) and the third quartile (Q3) in a sorted dataset. This approach discards the lowest 25% and highest 25% of the observations, focusing exclusively on the central 50% to provide a measure of central tendency.[77][78] The formula for the interquartile mean (IQM) of a sorted dataset $\{x_1 \leq x_2 \leq \dots \leq x_n\}$ is given by
$$\text{IQM} = \frac{\sum_{\{i \mid Q_1 \leq x_i \leq Q_3\}} x_i}{\#\{i \mid Q_1 \leq x_i \leq Q_3\}},$$
where the summation is over the indices $i$ such that $x_i$ falls between Q1 and Q3, and the denominator is the count of such values.[78] This formulation ensures that approximately half the data contributes to the calculation, with the exact number depending on the dataset size and quartile positions.

As a type of truncated mean, the interquartile mean exhibits strong resistance to outliers, remaining stable even if up to 25% of the data are contaminated by extreme values. It is particularly valued in descriptive statistics for summarizing datasets with skewness or anomalies, where the full arithmetic mean might be misleading.[77][68]

In applications, the interquartile mean is employed in robust statistical analysis of environmental data, such as air quality measurements that often include extreme pollution spikes due to unusual events. It helps provide a reliable central estimate without distortion from these outliers. For instance, consider the dataset $\{1, 2, 3, 10, 11, 100\}$; here, Q1 = 2 and Q3 = 11, so the interquartile mean is $(2 + 3 + 10 + 11)/4 = 6.5$.[78] Compared to the midrange, which averages only the minimum and maximum values and is highly sensitive to extremes, the interquartile mean offers greater robustness by incorporating more central data points.[77]
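
The worked dataset above can be checked with a few lines of Python; the n // 4 trimming below is one simple quartile convention among several, so other conventions can give slightly different counts:

```python
def interquartile_mean(xs):
    """Mean of the central half of the sorted data (a 25% trimmed mean)."""
    xs = sorted(xs)
    n = len(xs)
    k = n // 4                    # values dropped from each tail
    kept = xs[k:n - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 10, 11, 100]
print(interquartile_mean(data))   # 6.5, vs. an arithmetic mean of ~21.17
```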

Means for Special Data Types

Mean of Angles and Circular Quantities

When averaging angles or other circular quantities, the standard arithmetic mean can produce misleading results due to the periodic nature of the data, where values wrap around at 360° (or 2π radians). For instance, the angles 1° and 359° have an arithmetic mean of 180°, which incorrectly suggests a direction opposite to the clustered values near 0°; this occurs because the arithmetic mean treats the circle as a linear interval, ignoring its topology.[79] To address this, the circular mean (or mean direction) is used, defined as the angle corresponding to the resultant vector from unit vectors representing each data point. For a set of angles $\theta_i$ (in radians, $i = 1, \dots, n$), the circular mean $\mu$ is given by
$$\mu = \operatorname{atan2}\left( \frac{1}{n} \sum_{i=1}^n \sin \theta_i,\ \frac{1}{n} \sum_{i=1}^n \cos \theta_i \right),$$
where $\operatorname{atan2}(y, x)$ is the two-argument arctangent function that returns the angle in the correct quadrant, ensuring $\mu$ lies in $(-\pi, \pi]$.[79] This mean is defined modulo $2\pi$, reflecting the circular domain, and relies on vector addition in the plane, where each angle is projected onto the unit circle as $(\cos \theta_i, \sin \theta_i)$; the resultant length $R = \sqrt{\left( \frac{1}{n} \sum \cos \theta_i \right)^2 + \left( \frac{1}{n} \sum \sin \theta_i \right)^2}$ quantifies concentration, with $R = 1$ for perfect alignment and $R = 0$ for uniform dispersion.

Applications include analyzing wind directions in meteorology, where circular means aggregate prevailing flows without boundary artifacts; clock times on a 24-hour cycle, such as averaging event occurrences modulo 24 hours; and phase angles in physics, like synchronizing wave phases in signal processing or quantum mechanics.[80][81] For example, the directions 0°, 10°, and 350° (converted to radians) yield a mean cosine of about 0.99 and a mean sine of about 0, so $\mu \approx 0^\circ$, correctly capturing the clustering near north.[79]

A key variant is circular variance, measuring dispersion as $V = 1 - R$, which ranges from 0 (no spread) to 1 (maximum spread) and complements the mean by assessing data concentration without assuming a linear scale. In recent developments, Bayesian approaches for circular data have emerged in geospatial AI, particularly for spatio-temporal interpolation of directional observations like wind patterns, using hierarchical models such as wrapped distributions or von Mises mixtures to incorporate priors and uncertainty in large-scale environmental datasets.[82]

Fréchet Mean

The Fréchet mean, also known as the Karcher mean or Riemannian barycenter in certain contexts, generalizes the concept of a central tendency to arbitrary metric spaces. For a set of points $\{x_1, \dots, x_n\}$ in a metric space $(M, d)$, it is defined as the point $y \in M$ that minimizes the average squared distance to the data points, given by
$$y = \arg\min_{z \in M} \frac{1}{n} \sum_{i=1}^n d(z, x_i)^2.$$
[83] This formulation extends the classical notion of averaging beyond vector spaces, where addition may not be defined, by relying solely on the metric structure.[84]
In Euclidean spaces equipped with the standard $\ell_2$ metric, the Fréchet mean coincides with the arithmetic mean, as the minimizer of the sum of squared distances is the centroid.[84] This reduction highlights its role as a unifying framework, recovering familiar statistics in familiar settings while enabling extensions to non-Euclidean geometries.

On Riemannian manifolds, computing the Fréchet mean typically requires iterative optimization algorithms, such as gradient descent on the manifold or recursive estimation procedures, due to the lack of closed-form solutions.[85] These methods leverage the manifold's tangent spaces and exponential maps to approximate the minimizer, converging under conditions like small data variance or bounded curvature.[86] The Fréchet mean is unique when the objective functional is strictly convex, which holds in convex metric spaces or under sufficient geodesic convexity in the data support; otherwise, multiple local minima may exist in non-convex spaces.[87] For instance, on the unit sphere $S^2$ with the great-circle distance metric, the Fréchet mean of scattered points represents their "average orientation," computed by minimizing the sum of squared geodesic distances, often yielding a point near the geometric center of the spherical convex hull.[88]

Applications of the Fréchet mean span diverse fields, including shape analysis where it averages landmark configurations on deformation spaces, robotics for pose estimation by aggregating orientations or transformations, and machine learning for initializing clustering centroids in non-Euclidean feature spaces.[89][90][86] In the 2020s, it has gained traction in deep learning for aggregating embeddings on Riemannian manifolds, such as hyperbolic spaces for hierarchical data or symmetric positive definite matrices for covariance representations, enabling end-to-end differentiable averaging in neural architectures.[91] The circular mean emerges as a special case when applied to the circle manifold.[92]
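
A sketch of the iterative tangent-space approach described above, specialized to the unit sphere with NumPy; the sample points and fixed iteration count are illustrative, and convergence is only assured for reasonably clustered data:

```python
import numpy as np

def sphere_log(p, q):
    """Log map at p: tangent vector at p pointing toward q, with length
    equal to the great-circle distance."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros(3)
    v = q - cos_t * p              # component of q orthogonal to p
    return theta * v / np.linalg.norm(v)

def sphere_exp(p, v):
    """Exp map at p: follow the geodesic from p in direction v."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    return np.cos(t) * p + np.sin(t) * v / t

def frechet_mean(points, iters=50):
    y = points[0]
    for _ in range(iters):
        # Average the data in the tangent space at y, then step along it.
        grad = np.mean([sphere_log(y, q) for q in points], axis=0)
        y = sphere_exp(y, grad)
    return y

pts = np.array([[1.0, 0.1, 0.0], [1.0, -0.1, 0.1], [1.0, 0.0, -0.1]])
pts = pts / np.linalg.norm(pts, axis=1, keepdims=True)   # project onto S^2
print(frechet_mean(pts))   # a point near the cluster's geodesic center
```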

Means in Geometric Contexts

In geometry, the centroid of a triangle represents the arithmetic mean of its vertices' positions. For a triangle with vertices at position vectors $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, the centroid $\mathbf{G}$ is given by $\mathbf{G} = \frac{\mathbf{A} + \mathbf{B} + \mathbf{C}}{3}$, serving as the mean position that balances the triangle's mass if uniformly distributed. This concept extends to the mean position in triangular configurations, where the centroid minimizes the sum of squared distances to the vertices, analogous to the arithmetic mean in one dimension.

In triangular sets, means can be computed using barycentric coordinates, which express points as weighted averages of the vertices with weights summing to unity. The barycentric mean of points within a triangle weights contributions by their areal coordinates, providing a geometrically intuitive average that preserves affine properties. For instance, the area-weighted mean position in a subdivided triangular mesh averages vertex positions proportional to the areas of adjacent triangles, ensuring balanced representation in irregular triangulations. This approach is foundational in computational geometry for tasks like mesh smoothing.

The geometric mean finds application in triangles through the lengths of sides or altitudes. For altitudes, the geometric mean provides a measure of the triangle's "average height," useful in optimization problems where scaling factors are involved, such as in similar triangles. A key example is the geometric mean theorem in right triangles: the altitude to the hypotenuse is the geometric mean of the two segments it divides the hypotenuse into, and each leg is the geometric mean of the hypotenuse and the projection of that leg on the hypotenuse.

Applications of these geometric means abound in computer graphics and surveying. In graphics, the centroid and barycentric means enable realistic rendering of triangular meshes by interpolating textures or colors at mean positions, as seen in algorithms for Gouraud shading where vertex attributes are averaged. Surveying leverages the centroid as a mean coordinate for establishing control points in triangular networks, minimizing errors in geodetic computations. In geographic information systems (GIS), computational geometry employs area-weighted triangular means to aggregate spatial data over polygonal regions, facilitating accurate interpolation in terrain modeling and urban planning. These uses underscore the role of means in preserving geometric integrity across scales.
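
A minimal NumPy sketch of the centroid and of barycentric averaging of a per-vertex attribute, in the spirit of the shading interpolation described above; the coordinates and attribute values are hypothetical:

```python
import numpy as np

A, B, C = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 3.0])
centroid = (A + B + C) / 3
print(centroid)                                # [1.333..., 1.0]

# Barycentric weights (u, v, w) with u + v + w = 1 locate a point inside the
# triangle; the same weights average any per-vertex attribute.
u, v, w = 0.2, 0.5, 0.3
point = u * A + v * B + w * C                  # [2.0, 0.9]
heights = np.array([10.0, 20.0, 5.0])          # hypothetical vertex attribute
interpolated = u * heights[0] + v * heights[1] + w * heights[2]
print(point, interpolated)                     # weighted mean of the attribute: 13.5
```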

Other Specialized Means

Integral Mean of a Function

The integral mean of a function, also known as the average value of the function over an interval, provides a continuous analog to the discrete arithmetic mean by integrating the function over its domain. For a continuous function $f(x)$ defined on the closed interval $[a, b]$, the integral mean is given by
$$f_{\text{avg}} = \frac{1}{b - a} \int_a^b f(x) \, dx.$$
This formula arises from the concept of averaging the function's values uniformly across the interval, where the integral represents the net area under the curve and division by the interval length normalizes it.[20] The Mean Value Theorem for Integrals guarantees that if $f$ is continuous on $[a, b]$, there exists some $c \in [a, b]$ such that $f(c) = f_{\text{avg}}$, meaning the function attains its average value at least once in the interval. This theorem links the integral mean directly to the function's behavior and is fundamental in proving properties of definite integrals.[20]

Variants of the integral mean incorporate weighting to emphasize certain parts of the domain. For a non-negative weight function $w(x)$ with $\int_a^b w(x) \, dx = 1$ (acting as a density), the weighted integral mean is
$$f_{\text{avg}, w} = \int_a^b f(x) w(x) \, dx.$$
If $w(x)$ is not normalized, one divides by $\int_a^b w(x) \, dx$. Such weighted forms are essential when the uniform measure is inappropriate, allowing for customized averaging based on relevance across the interval.[93]

The integral mean connects to discrete means through Riemann sums, where partitioning the interval $[a, b]$ into $n$ subintervals and summing $f(x_i^*) \Delta x$ (with $\Delta x = (b - a)/n$) approximates the integral; as $n \to \infty$, this sum converges to $(b - a) f_{\text{avg}}$, bridging continuous and discrete averaging.

In applications, the integral mean quantifies average behaviors in continuous systems. In physics, it computes quantities like average velocity over time: for position $s(t)$ with velocity $v(t)$, the average velocity is $\frac{1}{T} \int_0^T v(t) \, dt = \frac{s(T) - s(0)}{T}$, illustrating conservation principles via the Fundamental Theorem of Calculus.[94] In signal processing, the time average of a deterministic signal $s(t)$ over duration $T$ is $\frac{1}{T} \int_0^T s(t) \, dt$, used to extract DC components or steady-state values from waveforms, such as in rectifier circuits where the average voltage of a full-wave rectified sinusoidal input is $\frac{2}{\pi} V_p$.[95]

A representative example is the integral mean of $f(x) = \sin x$ over $[0, \pi]$:
$$\sin_{\text{avg}} = \frac{1}{\pi - 0} \int_0^\pi \sin x \, dx = \frac{1}{\pi} \left[ -\cos x \right]_0^\pi = \frac{2}{\pi} \approx 0.637.$$
This value, greater than zero despite the function's symmetry around $\pi/2$, reflects the positive net area in the interval.[96] For a uniform distribution on $[a, b]$, the integral mean of $f(X)$ coincides with the expected value $E[f(X)] = \frac{1}{b - a} \int_a^b f(x) \, dx$, highlighting its role as a special case in probability where the density is constant.[46] In modern stochastic processes, the integral mean extends to random functions via stochastic integrals, such as the Itô integral $\int_0^T f(t, \omega) \, dW_t$ for a Brownian motion $W_t$, where the mean $E\left[\int_0^T f(t, \omega) \, dW_t\right] = 0$ due to the martingale property, enabling analysis of random fluctuations in finance and physics.[97]
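
The sin example can be verified numerically; the following Python sketch approximates the integral mean with a midpoint Riemann sum, echoing the Riemann-sum connection described above:

```python
import math

def integral_mean(f, a, b, n=100_000):
    """Approximate (1/(b-a)) * integral of f over [a, b] by the midpoint rule."""
    dx = (b - a) / n
    total = sum(f(a + (i + 0.5) * dx) for i in range(n))
    return total * dx / (b - a)

print(integral_mean(math.sin, 0, math.pi))   # ~0.636620
print(2 / math.pi)                           # exact value, 0.636619...
```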

Swanson's Rule

Swanson's rule provides an approximation for the mean of a positively skewed distribution using a weighted combination of its lower, median, and upper fractiles. It is defined by the formula
$$\mu \approx 0.3 P_{10} + 0.4 P_{50} + 0.3 P_{90},$$
where $P_{10}$ is the 10th percentile (conservative estimate), $P_{50}$ is the median, and $P_{90}$ is the 90th percentile (optimistic estimate). This weighting scheme, often called the 30-40-30 rule, emphasizes the median while balancing the tails to better capture the expected value in distributions where the arithmetic mean of all data points may be misleading due to skewness.

The rule originated in 1972 from an internal memorandum by Roy Swanson, a geologist at Exxon, who developed it to estimate the mean size of oil fields from probabilistic assessments of reserves. Swanson aimed to create a practical heuristic for resource evaluation when full distributional data were unavailable, drawing on empirical observations of field size distributions that often follow log-normal patterns. It gained wider adoption in the geosciences community through subsequent publications and has since become a standard tool in risk analysis for uncertain quantities.[98][99]

In applications, Swanson's rule is particularly valuable in petroleum engineering and geostatistics for aggregating probabilistic forecasts, such as expected recoverable oil volumes or basin-wide resource potentials. For instance, in assessing an oil field, if the low-case estimate ($P_{10}$) is 50 million barrels, the median ($P_{50}$) is 100 million barrels, and the high-case ($P_{90}$) is 300 million barrels, the approximated mean is $0.3 \times 50 + 0.4 \times 100 + 0.3 \times 300 = 145$ million barrels, providing a balanced expectation for economic planning. This approach is favored in scenarios involving log-normal or modestly skewed data, where it outperforms the simple arithmetic average by reducing bias from extreme values.[99]

As a simple, ad-hoc method, Swanson's rule offers robustness without requiring computational simulations like Monte Carlo methods, making it accessible for quick assessments in field evaluations or regulatory reporting. Studies have validated its accuracy for log-normal distributions, where it closely reproduces the true mean, though it may underperform for highly skewed or multi-modal cases, prompting alternatives like full probabilistic modeling in modern contexts.[100]
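
Swanson's rule is a one-liner in code; the sketch below reproduces the worked oil-field example (values in millions of barrels):

```python
def swanson_mean(p10, p50, p90):
    """Swanson's 30-40-30 approximation to the mean of a skewed distribution."""
    return 0.3 * p10 + 0.4 * p50 + 0.3 * p90

print(swanson_mean(50, 100, 300))   # 145.0 million barrels
```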
