Hubbry Logo
search
logo

Geometric standard deviation

logo
Community Hub0 Subscribers
Read side by side
from Wikipedia

In probability theory and statistics, the geometric standard deviation (GSD) describes how spread out are a set of numbers whose preferred average is the geometric mean. For such data, it may be preferred to the more usual standard deviation. Note that unlike the usual arithmetic standard deviation, the geometric standard deviation is a multiplicative factor, and thus is dimensionless, rather than having the same dimension as the input values. Thus, the geometric standard deviation may be more appropriately called geometric SD factor.[1][2] When using geometric SD factor in conjunction with geometric mean, it should be described as "the range from (the geometric mean divided by the geometric SD factor) to (the geometric mean multiplied by the geometric SD factor), and one cannot add/subtract "geometric SD factor" to/from geometric mean.[3]

Definition

[edit]

If the geometric mean of a set of numbers is denoted as , then the geometric standard deviation is

Derivation

[edit]

If the geometric mean is

then taking the natural logarithm of both sides results in

The logarithm of a product is a sum of logarithms (assuming is positive for all ), so

It can now be seen that is the arithmetic mean of the set , therefore the arithmetic standard deviation of this same set should be

This simplifies to

Geometric standard score

[edit]

The geometric version of the standard score is

If the geometric mean, standard deviation, and z-score of a datum are known, then the raw score can be reconstructed by

Relationship to log-normal distribution

[edit]

The geometric standard deviation is used as a measure of log-normal dispersion analogously to the geometric mean.[3] As the log-transform of a log-normal distribution results in a normal distribution, we see that the geometric standard deviation is the exponentiated value of the standard deviation of the log-transformed values, i.e. .

As such, the geometric mean and the geometric standard deviation of a sample of data from a log-normally distributed population may be used to find the bounds of confidence intervals analogously to the way the arithmetic mean and standard deviation are used to bound confidence intervals for a normal distribution. See discussion in log-normal distribution for details.

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
The geometric standard deviation (GSD) is a measure of statistical dispersion for positive real numbers, serving as the multiplicative analog to the arithmetic standard deviation when the geometric mean is the appropriate central tendency, especially for lognormally distributed data. It quantifies variability by exponentiating the standard deviation of the natural logarithms of the data, yielding a unitless factor greater than or equal to 1 that describes how much the data spreads multiplicatively around the geometric mean—for instance, a GSD of 1.2 indicates that approximately 68% of the values lie between the geometric mean divided by 1.2 and multiplied by 1.2 in lognormal distributions.[1][2] Introduced by biostatistician T.B.L. Kirkwood in 1979 to address limitations of arithmetic measures for skewed, ratio-based data, the GSD transforms the dataset via logarithms to normalize it before applying standard deviation calculations, then back-transforms the result to preserve the original scale's multiplicative nature.[2] The formal definition for a sample $ x_1, x_2, \dots, x_n > 0 $ involves computing the sample standard deviation $ s $ of $ y_i = \ln(x_i) $, followed by $ \text{GSD} = e^s $, where the natural logarithm ensures the measure is invariant to proportional scaling of the data.[3][1] This approach contrasts with the arithmetic standard deviation, which is additive and suited to normal distributions, as the GSD cannot be added or subtracted but instead multiplies or divides the geometric mean to form confidence-like intervals.[2] Key properties of the GSD include its dimensionless quality and minimum value of 1 (achieved when all data are identical), making it ideal for expressing relative variability in percentages via the geometric coefficient of variation, defined as $ 100(\text{GSD} - 1)% $.[3] In practice, software implementations like SAS, SciPy, and R compute it directly from log-transformed data, often adjusting for degrees of freedom in sample estimates.[3][1] The measure assumes lognormality for optimal interpretability, where it captures about two-thirds of the data within the factor bounds, but it can be applied more broadly to positive skewed datasets with caution.[2][3] Applications of the GSD span fields requiring analysis of multiplicative processes, such as environmental science for pollutant concentrations (e.g., reporting geometric means for radionuclides like 210Pb^{210}\text{Pb} at 0.52 mBq m3^{-3}), aerosol engineering for particle size distributions in pharmaceuticals (where GSD = $ d_{84}/d_{50} $ or similar percentiles define spread), and finance for modeling investment returns or compounded growth rates.[4] In biomedical research, it evaluates assay variability and bioequivalence, such as inter-laboratory differences in drug potency, while in demography, it assesses population growth fluctuations over time.[5][4] These uses highlight its utility in summarizing data where ratios or percentages dominate, ensuring interpretations remain proportional rather than absolute.[3]

Core Concepts

Definition

The geometric standard deviation (GSD) is a measure of dispersion applicable to sets of positive real numbers, especially those exhibiting multiplicative variability or following a log-normal distribution. It quantifies the spread of data on a multiplicative scale by taking the exponential of the standard deviation of the natural logarithms of the data values, thereby transforming the additive spread in the logarithmic domain back to the original scale.[3][2] For a sample of nn positive values x1,x2,,xn>0x_1, x_2, \dots, x_n > 0, the GSD is calculated as
σg=exp(1n1i=1n(lnxiμg)2), \sigma_g = \exp\left( \sqrt{\frac{1}{n-1} \sum_{i=1}^n (\ln x_i - \mu_g)^2} \right),
where μg=1ni=1nlnxi\mu_g = \frac{1}{n} \sum_{i=1}^n \ln x_i is the arithmetic mean of the logarithms (also known as the log-mean).[3] In the population context, for a log-normal random variable with parameters μ\mu (mean of the logarithms) and σ\sigma (standard deviation of the logarithms), the GSD simplifies to σg=exp(σ)\sigma_g = \exp(\sigma).[3] Intuitively, the GSD represents a multiplicative factor indicating how much the data spread around the geometric mean; for instance, roughly 68% of the observations lie within a factor of σg\sigma_g above or below the geometric mean when the logarithms are normally distributed.[2] This contrasts with additive interpretations of spread in other measures. The term "geometric standard deviation" was introduced by T. B. L. Kirkwood in 1979 within the framework of log-normal data analysis.[6]

Properties

The geometric standard deviation (GSD) is scale-invariant, meaning that if all data points are multiplied by a positive constant $ k > 0 $, the GSD remains unchanged, thereby preserving the relative ratios among the data values. This property stems from its definition as the exponential of the standard deviation of the natural logarithms of the data, which converts multiplicative scaling into an additive shift in the log-space without affecting the dispersion measure.[6] The sample GSD is a biased estimator of the population GSD, generally biased downward in finite samples because the underlying sample standard deviation of the log-transformed data underestimates the population value. An unbiased estimator for the variance in the log-space uses the divisor $ n-1 $, but correcting the GSD itself for bias requires an adjustment factor that accounts for the nonlinearity of the exponential function; approximate methods suffice for larger $ n $.[7] Confidence intervals for the GSD can be constructed using Fieller's theorem for parameters involving ratios on the log-scale or nonparametric bootstrap resampling of the log-transformed data. For related parameters like the geometric mean $ \mu_g $, an approximate 95% confidence interval is given by
exp(μg±1.96σn), \exp\left( \mu_g \pm \frac{1.96 \sigma}{\sqrt{n}} \right),
where $ \sigma $ is the standard deviation of the logs and $ n $ is the sample size; similar log-scale transformations apply to derive intervals for the GSD by exponentiating bounds on the log-dispersion.[8][9] On the log-scale, the GSD exhibits symmetry analogous to the arithmetic standard deviation for normal data, as it directly quantifies the spread of the logarithms, rendering it appropriate for positively skewed datasets where logs approximate normality.[9] The GSD has well-defined limits: it equals 1 when all data values are identical, reflecting zero dispersion on the log-scale, and approaches infinity as the variance of the log-transformed data grows without bound.[6]

Mathematical Foundations

Derivation

The geometric standard deviation addresses the limitations of the arithmetic standard deviation when dealing with positive data that exhibit multiplicative variability, such as growth rates or concentrations, where relative changes are more relevant than absolute ones. For such skewed distributions, a logarithmic transformation normalizes the data, converting products into sums and enabling the arithmetic standard deviation to capture relative dispersion effectively on the transformed scale. This approach is particularly justified for data approximately following a log-normal distribution, where the logs are normally distributed, allowing standard statistical tools to measure spread in a way that translates to multiplicative factors on the original scale.[2][3] To derive the formula, begin with a sample of nn positive observations x1,x2,,xn>0x_1, x_2, \dots, x_n > 0. Apply the natural logarithm to each: yi=lnxiy_i = \ln x_i for i=1,,ni = 1, \dots, n. Compute the sample mean of the transformed values: yˉ=1ni=1nyi\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i. The sample standard deviation of the yiy_i is then
sy=1n1i=1n(yiyˉ)2. s_y = \sqrt{ \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 }.
The geometric standard deviation σg\sigma_g is obtained by exponentiating this quantity: σg=esy\sigma_g = e^{s_y}. This step reverses the transformation, yielding a unitless factor that quantifies the typical multiplicative deviation from the geometric mean eyˉe^{\bar{y}}, analogous to how the arithmetic standard deviation measures additive spread.[6] This derivation assumes all data points are strictly positive, as the logarithm is undefined for non-positive values; in cases involving zeros, they are typically excluded from the calculation or handled by adding a small positive constant before transformation to approximate the limit behavior.[2] An alternative derivation views the geometric standard deviation through the population variance of the logs. For a random variable X>0X > 0 where lnX\ln X has variance Var(lnX)=σ2\mathrm{Var}(\ln X) = \sigma^2, the geometric standard deviation satisfies lnσg=Var(lnX)\ln \sigma_g = \sqrt{\mathrm{Var}(\ln X)}, so σg=eVar(lnX)\sigma_g = e^{\sqrt{\mathrm{Var}(\ln X)}}. This formulation connects to the moment-generating function of the log-normal distribution, where the second derivative at zero yields the variance of lnX\ln X, confirming the exponential relationship for multiplicative scale.[3] Under the assumption that the log-transformed data yiy_i follow a normal distribution, the sample variance sy2=1n1i=1n(yiyˉ)2s_y^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 is an unbiased estimator of the population log-variance σ2\sigma^2, as (n1)sy2/σ2(n-1)s_y^2 / \sigma^2 follows a chi-squared distribution with n1n-1 degrees of freedom, ensuring E[sy2]=σ2E[s_y^2] = \sigma^2. Consequently, sys_y provides an unbiased basis for estimating σg=esy\sigma_g = e^{s_y}, though the exponential introduces slight bias in the final estimate.[10]

Relationship to Log-Normal Distribution

The geometric standard deviation (GSD) serves as a key parameter in the log-normal distribution, which models positive random variables subject to multiplicative effects. For a random variable XLN(μ,σ2)X \sim \mathrm{LN}(\mu, \sigma^2), where μ\mu and σ>0\sigma > 0 are the location and shape parameters of the underlying normal distribution of lnX\ln X, the geometric mean is given by exp(μ)\exp(\mu) and the GSD by σg=exp(σ)\sigma_g = \exp(\sigma).[3][11] This parameterization highlights how the GSD quantifies dispersion on the logarithmic scale, making it particularly suitable for data exhibiting multiplicative errors, such as growth processes or financial returns, where variability is proportional rather than additive.[3] The GSD connects directly to the coefficient of variation (CV) of the log-normal distribution, providing a mapping between measures of relative spread. Specifically, the CV is exp(σ2)1\sqrt{\exp(\sigma^2) - 1}, and the GSD relates to the CV by σg=exp(ln(1+CV2))\sigma_g = \exp\left( \sqrt{ \ln (1 + \mathrm{CV}^2 ) } \right).[3][11] This relation underscores the GSD's role in capturing the factor by which observations deviate multiplicatively from the geometric mean; for instance, approximately 68% of the data lie within the interval [μg/σg,μgσg][\mu_g / \sigma_g, \mu_g \sigma_g], analogous to the empirical rule for normal distributions but on a log scale.[3] Higher moments of the log-normal distribution can also be expressed in terms of the parameters of the underlying normal distribution, revealing the influence of σ\sigma on shape characteristics. The skewness is (eσ2+2)eσ21(e^{\sigma^2} + 2) \sqrt{e^{\sigma^2} - 1}, while the excess kurtosis is e4σ2+2e3σ2+3e2σ23e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 3.[11] These expressions show how larger σ\sigma values amplify positive skewness and leptokurtosis, common in log-normal data.[11] Estimation of the GSD from a sample assuming log-normality follows the maximum likelihood approach for the underlying normal parameters. The estimator is σ^g=exp(1ni=1n(lnxiμ^)2)\hat{\sigma}_g = \exp\left( \sqrt{\frac{1}{n} \sum_{i=1}^n (\ln x_i - \hat{\mu})^2} \right), where μ^=1ni=1nlnxi\hat{\mu} = \frac{1}{n} \sum_{i=1}^n \ln x_i is the sample mean of the log-transformed data.[3][11] This method leverages the log-transformation to yield unbiased estimates under the log-normal assumption.[3]

Geometric Standard Score

The geometric standard score, often denoted as the geometric z-score $ z_g $, provides a normalized measure of how far a positive value $ x $ deviates from the geometric mean $ \mu_g $ in terms of the geometric standard deviation $ \sigma_g $. It is defined by the formula
zg=ln(x/μg)lnσg, z_g = \frac{\ln(x / \mu_g)}{\ln \sigma_g},
where the logarithm is typically the natural logarithm, ensuring the score captures multiplicative relationships in the data.[12] This formulation standardizes deviations on a logarithmic scale, making it particularly suitable for datasets where ratios rather than differences are meaningful, such as growth rates or concentrations. The derivation of the geometric standard score stems directly from applying the standard z-score to the logarithms of the data. For a dataset following a log-normal distribution, the logarithms $ \ln x $ are normally distributed with mean $ \ln \mu_g $ and standard deviation $ \sigma_{\ln x} $, yielding the logarithmic z-score $ z = \frac{\ln x - \ln \mu_g}{\sigma_{\ln x}} $. Since the geometric standard deviation is related by $ \sigma_g = \exp(\sigma_{\ln x}) $, it follows that $ \ln \sigma_g = \sigma_{\ln x} $, so $ z_g = z $.[13] This equivalence preserves the probabilistic properties of the normal distribution while adapting to the original multiplicative scale.[14] In interpretation, a geometric standard score of $ z_g = 0 $ indicates that $ x $ equals $ \mu_g $, while values with $ |z_g| < 1 $ lie within one geometric standard deviation multiplicatively—meaning $ x $ is between $ \mu_g / \sigma_g $ and $ \mu_g \cdot \sigma_g $. This makes it valuable for detecting outliers in positively skewed, positive-valued data, as it emphasizes relative rather than absolute deviations, analogous to the arithmetic z-score but for ratio-based scales. For log-normally distributed data, $ z_g $ follows a standard normal distribution, enabling the use of normal quantiles for confidence intervals or thresholds; for instance, approximately 95% of values satisfy $ |z_g| < 1.96 $.[14] Confidence bands can thus be constructed as $ x \in \mu_g \exp(z_g \ln \sigma_g) $, providing multiplicative intervals.[13] As an example, consider $ x = 10 $, $ \mu_g = 5 $, and $ \sigma_g = 1.5 $. Substituting into the formula gives
zg=ln(10/5)ln1.5=ln2ln1.50.6930.4051.71, z_g = \frac{\ln(10 / 5)}{\ln 1.5} = \frac{\ln 2}{\ln 1.5} \approx \frac{0.693}{0.405} \approx 1.71,
indicating that 10 is about 1.71 geometric standard deviations above the geometric mean, or roughly three times larger than $ \mu_g $ adjusted for the spread.

Comparison to Arithmetic Measures

The arithmetic standard deviation (ASD) quantifies the additive spread of data points around the arithmetic mean, measuring absolute deviations in the original scale, whereas the geometric standard deviation (GSD) quantifies the multiplicative spread around the geometric mean, capturing relative or proportional variations, which is particularly suitable for datasets where ratios exceed 1.[2][15] In formula terms, the ASD is given by $ \sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2} $, where $ \bar{x} $ is the arithmetic mean, allowing computation on any real-valued data but sensitive to outliers and zeros; the GSD, defined as $ e^{s} $ with $ s $ as the standard deviation of the logarithms of the data, circumvents issues with zeros or negatives by requiring strictly positive values but interprets dispersion as a unitless factor.[15][2] The GSD is preferable for log-skewed or lognormally distributed data, such as income distributions or biological measurements like cell sizes, where multiplicative effects dominate, while the ASD is more appropriate for symmetric, Gaussian-distributed data exhibiting additive variability.[2][15] The GSD relates to the coefficient of variation (CV, defined as ASD divided by the arithmetic mean) through an approximation where GSD ≈ 1 + CV for small variability, reflecting their shared focus on relative dispersion, though the GSD emphasizes multiplicative scaling. Key limitations of the GSD include its undefined status for non-positive data and less intuitive interpretation for absolute errors compared to the ASD, which remains applicable across broader data types despite potential distortion by skewness.[2][15] For the dataset {1, 2, 4}, the ASD is approximately 1.53 (measuring additive spread around the arithmetic mean of 2.33), while the GSD is exactly 2 (indicating that data points are spread by a multiplicative factor of 2 around the geometric mean of 2).

Applications

In Probability and Statistics

In probability and statistics, the geometric standard deviation (GSD) plays a key role in hypothesis testing for log-normal data, particularly through adaptations of t-tests that account for the multiplicative nature of the distribution. The log-normal t-test compares geometric means between two groups by first log-transforming the data and then applying a standard t-test on the logs, assuming equal GSDs across groups; this assumption is tested using an F-test on the log-transformed variances, where unequal GSDs (indicated by a small P-value) suggest differences in spread as significant as those in location.[16] When GSDs differ, the Welch log-normal t-test is preferred, as it adjusts for unequal variances on the log scale, providing better control of type I error rates and higher power, especially with imbalanced sample sizes or moderate to large GSDs (e.g., GeoSD > 2).[17] These methods outperform direct application of normal t-tests on untransformed data, which can reduce power by up to 50% for GeoSDs around 4 and effect sizes of threefold differences.[17] For constructing confidence intervals in Bayesian frameworks, GSD informs the parameterization of log-normal priors, where the prior on the log-scale standard deviation (σ) corresponds to a GSD of e^σ, enabling credible intervals that bound the geometric mean while respecting the distribution's asymmetry. This approach is particularly useful for inference on ratios or multiplicative effects, as the resulting intervals are asymmetric and naturally constrained to positive values, aligning with the log-normal's properties.[18][19] The GSD exhibits greater robustness to outliers than the arithmetic standard deviation (ASD) in heavy-tailed log-normal distributions, as the log transformation compresses extreme values, reducing their leverage on the measure of spread. For instance, in datasets with GeoSDs of 3 or higher, the GSD remains more stable relative to the ASD because outliers contribute less to the variance on the log scale.[17] In simulation methods, log-normal samples are generated with a specified GSD for Monte Carlo estimation of variances in probabilistic models, by setting the log-scale standard deviation as ln(GSD) in random number generators. This facilitates variance estimation for statistics like portfolio risks or process yields, where repeated sampling (e.g., 10,000 iterations) with fixed GSD approximates the distribution's tail behavior for reliable uncertainty quantification.[20] Such simulations are essential for validating inference procedures under log-normality, ensuring that estimated variances converge to true values even for GeoSDs up to 5.[21] Implementations of GSD are available in statistical software for both computation and simulation. In R, the EnvStats package provides the geoSD() function to calculate the sample GSD as the exponential of the standard deviation of log-transformed data, supporting robust estimation for positive-valued vectors.[22] For simulation, R's rlnorm() generates log-normal samples by specifying sdlog = ln(GSD). In Python, SciPy's stats.gstd() computes the GSD directly, while stats.lognorm.rvs(s=ln(GSD), scale=geometric_mean) produces samples for Monte Carlo applications, integrating seamlessly with NumPy for large-scale variance computations.[1]

In Finance and Other Fields

In finance, the geometric standard deviation (GSD) serves as a measure of volatility for asset returns, particularly when analyzing multiplicative changes in stock prices through daily multipliers (1 + r_i, where r_i is the return).[23] For instance, given daily multipliers of {1.01, 0.99, 1.02}, the GSD approximates 1.015, indicating a 1.5% multiplicative volatility that captures the compounded variability in returns.[24] This approach aligns with the Black-Scholes model, where volatility is parameterized as the standard deviation of log-returns (ln(1 + r_i)), equivalent to the natural logarithm of the GSD for price multipliers, enabling accurate option pricing under log-normal assumptions.[24] In biology and environmental science, GSD quantifies variability in growth rates, such as bacterial replication factors, where cell counts often follow log-normal distributions due to multiplicative processes.[25] For example, fluorescence-based estimates of bacterial densities use geometric means and GSDs to account for the skewed, positive nature of microbial growth data, providing robust summaries within a multiplicative factor of approximately 3 (GSD ≈ 3.06).[25] Similarly, in environmental monitoring, GSD describes the spread of pollutant concentrations, like indoor PM2.5 levels with a geometric mean of 41.1 μg/m³ and GSD of 1.3 in urban settings, or CO at 4.9 ppm with GSD 4.3 in rural areas, highlighting log-normal variability from sources like biomass burning.[26] In engineering reliability analysis, GSD indicates the spread in failure times modeled as log-normal distributions, where the parameter reflects multiplicative uncertainty in component lifespans.[27] For physical systems like electronic devices, the GSD derived from the standard deviation of log-transformed failure times helps assess dispersion in time-to-failure, supporting predictions of reliability under exponential-like degradation but with positive skew.[28] Beyond these domains, GSD applies to particle size distributions in physics, where it parameterizes the width of log-normal aerosol spectra, such as geometric standard deviations around 2.0 for atmospheric particles influencing cloud formation.[29] In economics, it measures inequality in wages or incomes, treating distributions as log-normal to capture relative disparities; for instance, multiplicative models use GSD alongside geometric means to evaluate skewed income data more naturally than arithmetic counterparts.[30] The GSD's advantage over arithmetic standard deviation lies in its inherent handling of percentage or multiplicative changes, preserving scale-invariance for positive, skewed data like returns or concentrations, thus providing a more intuitive measure of relative variability.[31]

References

User Avatar
No comments yet.