Delta method
In statistics, the delta method is a method of deriving the asymptotic distribution of a random variable. It is applicable when the random variable being considered can be defined as a differentiable function of a random variable which is asymptotically Gaussian.
History
The delta method was derived from propagation of error, and the idea behind it was known in the early 20th century.[1] Its statistical application can be traced as far back as 1928 by T. L. Kelley.[2] A formal description of the method was presented by J. L. Doob in 1935.[3] Robert Dorfman also described a version of it in 1938.[4]
Univariate delta method
While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables $X_n$ satisfying

$$\sqrt{n}\,[X_n - \theta] \xrightarrow{D} \mathcal{N}(0, \sigma^2),$$

where $\theta$ and $\sigma^2$ are finite-valued constants and $\xrightarrow{D}$ denotes convergence in distribution, then

$$\sqrt{n}\,[g(X_n) - g(\theta)] \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big)$$

for any function $g$ satisfying the property that its first derivative, evaluated at $\theta$, exists and is non-zero valued.
The intuition behind the delta method is that any such function $g$, over a "small enough" range, can be approximated by a first-order Taylor series (which is essentially a linear function). If the random variable is roughly normal, then a linear transformation of it is also normal. A small range can be achieved when approximating the function around the mean, provided the variance is "small enough". When $g$ is applied to a random variable such as the mean, the delta method tends to work better as the sample size increases, since a larger sample reduces the variance, and so the Taylor approximation is applied over a smaller range of $g$ around the point of interest.
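To make this intuition concrete, here is a minimal simulation sketch (not part of the original article) comparing the empirical variance of $\sqrt{n}\,[\log\bar{X}_n - \log\theta]$ with the delta-method value $\sigma^2[g'(\theta)]^2$; the exponential data, sample size, and seed are arbitrary choices for illustration.

```python
import numpy as np

# Minimal sketch: check the delta-method normal approximation for
# g(X̄_n) = log(X̄_n) by simulation, with hypothetical exponential data.
rng = np.random.default_rng(0)

theta, sigma2, n, reps = 2.0, 4.0, 200, 10_000   # mean, variance, sample size, replications

# Simulate the sampling distribution of sqrt(n) * (log(X̄_n) - log(theta)).
samples = rng.exponential(scale=theta, size=(reps, n))   # Exp with mean 2 has variance 4
xbar = samples.mean(axis=1)
z = np.sqrt(n) * (np.log(xbar) - np.log(theta))

# Delta method: asymptotic variance is sigma^2 * g'(theta)^2 = sigma2 / theta^2.
print("empirical variance :", z.var())
print("delta-method value :", sigma2 / theta**2)
```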
Proof in the univariate case
Demonstration of this result is fairly straightforward under the assumption that $g$ is differentiable in a neighborhood of $\theta$, with $g'$ continuous at $\theta$ and $g'(\theta) \neq 0$. To begin, we use the mean value theorem (i.e., the first-order approximation of a Taylor series using Taylor's theorem):

$$g(X_n) = g(\theta) + g'(\tilde{\theta})(X_n - \theta),$$

where $\tilde{\theta}$ lies between $X_n$ and $\theta$. Note that since $X_n \xrightarrow{P} \theta$ and $\tilde{\theta}$ lies between $X_n$ and $\theta$, it must be that $\tilde{\theta} \xrightarrow{P} \theta$, and since $g'$ is continuous at $\theta$, applying the continuous mapping theorem yields

$$g'(\tilde{\theta}) \xrightarrow{P} g'(\theta),$$

where $\xrightarrow{P}$ denotes convergence in probability.

Rearranging the terms and multiplying by $\sqrt{n}$ gives

$$\sqrt{n}\,[g(X_n) - g(\theta)] = g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta].$$

Since

$$\sqrt{n}\,[X_n - \theta] \xrightarrow{D} \mathcal{N}(0, \sigma^2)$$

by assumption, it follows immediately from an appeal to Slutsky's theorem that

$$\sqrt{n}\,[g(X_n) - g(\theta)] \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big).$$
This concludes the proof.
Proof with an explicit order of approximation
Alternatively, one can add one more step at the end to obtain the order of approximation:

$$\begin{aligned}
\sqrt{n}\,[g(X_n) - g(\theta)] &= g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + \sqrt{n}\,[X_n - \theta]\,\big[g'(\tilde{\theta}) - g'(\theta)\big] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + O_p(1)\,o_p(1) \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + o_p(1).
\end{aligned}$$

This suggests that the error in the approximation converges to 0 in probability.
Multivariate delta method
By definition, a consistent estimator B converges in probability to its true value β, and often a central limit theorem can be applied to obtain asymptotic normality:

$$\sqrt{n}\,(B - \beta) \xrightarrow{D} \mathcal{N}(0, \Sigma),$$

where n is the number of observations and Σ is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a scalar-valued function h of the estimator B. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate h(B) as

$$h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B - \beta),$$

which implies the variance of h(B) is approximately

$$\operatorname{Var}\big(h(B)\big) \approx \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) = \nabla h(\beta)^T \cdot \frac{\Sigma}{n} \cdot \nabla h(\beta).$$

One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first-order approximation.

The delta method therefore implies that

$$\sqrt{n}\,\big(h(B) - h(\beta)\big) \xrightarrow{D} \mathcal{N}\big(0,\; \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)\big),$$

or in univariate terms,

$$\sqrt{n}\,\big(g(X_n) - g(\theta)\big) \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big).$$
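As an illustration of the matrix form above, the following sketch computes the approximate variance of a scalar function h(B) from a hypothetical estimate and covariance matrix; the function h, the numbers, and the finite-difference gradient are illustrative stand-ins rather than anything prescribed by the article.

```python
import numpy as np

# Minimal sketch (assumptions: a fitted 2-parameter estimate `b_hat` with
# estimated covariance matrix `cov_b`; h is the scalar function of interest).
b_hat = np.array([1.5, -0.7])                    # hypothetical estimate B
cov_b = np.array([[0.04, 0.01],                  # hypothetical Cov(B) = Sigma / n
                  [0.01, 0.09]])

def h(b):
    # scalar-valued function of the parameter vector
    return b[0] * np.exp(b[1])

def numerical_gradient(f, x, eps=1e-6):
    # central finite differences as a stand-in for an analytic gradient
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

grad_h = numerical_gradient(h, b_hat)
var_h = grad_h @ cov_b @ grad_h                  # gradient^T · Cov(B) · gradient
print("h(B)       :", h(b_hat))
print("SE of h(B) :", np.sqrt(var_h))
```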
Example: the binomial proportion
Suppose $X_n$ is binomial with parameters $p \in (0, 1]$ and $n$. Since

$$\sqrt{n}\left[\frac{X_n}{n} - p\right] \xrightarrow{D} \mathcal{N}\big(0, p(1-p)\big),$$

we can apply the delta method with $g(\theta) = \log(\theta)$ to see

$$\sqrt{n}\left[\log\!\left(\frac{X_n}{n}\right) - \log(p)\right] \xrightarrow{D} \mathcal{N}\!\left(0, p(1-p)\left[\frac{1}{p}\right]^2\right).$$

Hence, even though for any finite $n$ the variance of $\log\!\left(\frac{X_n}{n}\right)$ does not actually exist (since $X_n$ can be zero), the asymptotic variance of $\log\!\left(\frac{X_n}{n}\right)$ does exist and is equal to

$$\frac{1-p}{np}.$$

Note that since $p > 0$, $\Pr\!\left(\frac{X_n}{n} > 0\right) \to 1$ as $n \to \infty$, so with probability converging to one, $\log\!\left(\frac{X_n}{n}\right)$ is finite for large $n$.

Moreover, if $\hat{p}$ and $\hat{q}$ are estimates of different group rates from independent samples of sizes $n$ and $m$ respectively, then the logarithm of the estimated relative risk $\frac{\hat{p}}{\hat{q}}$ has asymptotic variance equal to

$$\frac{1-p}{pn} + \frac{1-q}{qm}.$$
This is useful to construct a hypothesis test or to make a confidence interval for the relative risk.
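A short sketch of how the relative-risk variance formula above is typically used in practice; the counts and group sizes here are hypothetical, and the 95% interval uses the usual normal quantile 1.96 with plug-in estimates of p and q.

```python
import numpy as np

# Minimal sketch (hypothetical counts): delta-method confidence interval for
# the relative risk p/q using the asymptotic variance of log(p̂/q̂) given above.
x, n = 30, 200      # events / sample size in group 1
y, m = 20, 220      # events / sample size in group 2

p_hat, q_hat = x / n, y / m
log_rr = np.log(p_hat / q_hat)

# Var[log(p̂/q̂)] ≈ (1 - p)/(p n) + (1 - q)/(q m), estimated by plug-in.
se = np.sqrt((1 - p_hat) / (p_hat * n) + (1 - q_hat) / (q_hat * m))

z = 1.96            # ~97.5% standard-normal quantile for a 95% interval
lo, hi = np.exp(log_rr - z * se), np.exp(log_rr + z * se)
print(f"RR = {p_hat / q_hat:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```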
Alternative form
The delta method is often used in a form that is essentially identical to that above, but without the assumption that Xn or B is asymptotically normal. Often the only context is that the variance is "small". The results then just give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are:[5]

$$\operatorname{Var}(h_r) = \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)^{\!2} \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\!\left(\frac{\partial h_r}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j)$$

$$\operatorname{Cov}(h_r, h_s) = \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)\!\left(\frac{\partial h_s}{\partial B_i}\right) \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\!\left(\frac{\partial h_s}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j)$$

where $h_r$ is the $r$th element of $h(B)$ and $B_i$ is the $i$th element of $B$.
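The element-wise formulae above are the quadratic form $\nabla h_r^T \operatorname{Cov}(B)\, \nabla h_r$ written out term by term; the sketch below checks numerically that the two forms agree, using a hypothetical function $h(B) = B_1 B_2$ and an invented covariance matrix.

```python
import numpy as np

# Minimal sketch: the Klein-style double sum equals the matrix expression
# grad^T · Cov(B) · grad for a hypothetical h(B) = B_1 * B_2.
cov = np.array([[0.10, 0.02],
                [0.02, 0.05]])
b = np.array([2.0, 4.0])
grad = np.array([b[1], b[0]])        # partial derivatives of B_1 * B_2

# element-wise sum over variances (i = j) and covariances (i != j)
var_sum = sum(grad[i] * grad[j] * cov[i, j]
              for i in range(2) for j in range(2))
var_matrix = grad @ cov @ grad       # equivalent matrix expression
print(var_sum, var_matrix)           # identical up to floating point
```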
Second-order delta method
When $g'(\theta) = 0$ the delta method cannot be applied. However, if $g''(\theta)$ exists and is not zero, the second-order delta method can be applied. By the Taylor expansion,

$$n\,[g(X_n) - g(\theta)] = n\,[X_n - \theta]^2 \left[\frac{g''(\theta)}{2}\right] + o_p(1),$$

so that the variance of $g(X_n)$ relies on up to the 4th moment of $X_n$.

The second-order delta method is also useful in conducting a more accurate approximation of the distribution of $g(X_n)$ when the sample size is small. For example, when $X_n$ follows the standard normal distribution, $g(X_n)$ can be approximated as the weighted sum of a standard normal and a chi-square with 1 degree of freedom.
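A minimal simulation sketch (illustrative, not from the article): with $g(x) = x^2$, $\theta = 0$, and standard normal data, $g'(\theta) = 0$ while $g''(\theta) = 2$, and the second-order limit above reduces to $n\,\bar{X}_n^2 \xrightarrow{D} \chi^2_1$.

```python
import numpy as np

# Minimal sketch: when g'(θ) = 0, the first-order method degenerates and
# n[g(X̄_n) - g(θ)] has a scaled chi-square limit. Here g(x) = x², θ = 0,
# σ² = 1, so n·X̄_n² should be approximately χ²₁.
rng = np.random.default_rng(2)

n, reps = 400, 20_000
xbar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
stat = n * xbar**2                      # n [g(X̄) - g(θ)] with g(x) = x², θ = 0

# χ²₁ has mean 1 and variance 2; compare against the simulated statistic.
print("mean     :", stat.mean(), " (chi2_1 mean = 1)")
print("variance :", stat.var(),  " (chi2_1 variance = 2)")
```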
Nonparametric delta method
A version of the delta method exists in nonparametric statistics. Let $X_i \sim F$ be independent and identically distributed random variables with a sample of size $n$ and empirical distribution function $\hat{F}_n$, and let $T$ be a functional. If $T$ is Hadamard differentiable with respect to the Chebyshev metric, then

$$\frac{T(\hat{F}_n) - T(F)}{\widehat{\operatorname{se}}} \xrightarrow{D} \mathcal{N}(0, 1),$$

where $\widehat{\operatorname{se}} = \hat{\tau}/\sqrt{n}$ and $\hat{\tau}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{L}(X_i)^2$, with $\hat{L}(x)$ denoting the empirical influence function for $T$. A nonparametric $(1-\alpha)$ pointwise asymptotic confidence interval for $T(F)$ is therefore given by

$$T(\hat{F}_n) \pm z_{\alpha/2}\, \widehat{\operatorname{se}},$$

where $z_{\alpha/2}$ denotes the $1 - \alpha/2$ quantile of the standard normal. See Wasserman (2006) p. 19f. for details and examples.
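As a concrete sketch of the recipe above, assuming the simplest plug-in functional $T(F) = \int x\,dF$ (the mean), whose empirical influence function is $\hat{L}(x) = x - T(\hat{F}_n)$; the gamma-distributed data are hypothetical.

```python
import numpy as np

# Minimal sketch of the nonparametric delta method for the plug-in mean:
# T(F̂_n) = x̄, empirical influence function L̂(x) = x - x̄, ŝe = τ̂ / sqrt(n).
rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1.5, size=300)   # hypothetical sample of size n

t_hat = x.mean()                 # plug-in estimate T(F̂_n)
L_hat = x - t_hat                # empirical influence function at the data points
tau2 = np.mean(L_hat**2)
se = np.sqrt(tau2 / len(x))      # ŝe = τ̂ / sqrt(n)

z = 1.96                         # standard-normal quantile for a 95% interval
print(f"T(F̂) = {t_hat:.3f}, 95% CI ({t_hat - z*se:.3f}, {t_hat + z*se:.3f})")
```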
References
[edit]- ^ Portnoy, Stephen (2013). "Letter to the Editor". The American Statistician. 67 (3): 190. doi:10.1080/00031305.2013.820668. S2CID 219596186.
- ^ Kelley, Truman L. (1928). Crossroads in the Mind of Man: A Study of Differentiable Mental Abilities. pp. 49–50. ISBN 978-1-4338-0048-1.
- ^ Doob, J. L. (1935). "The Limiting Distributions of Certain Statistics". Annals of Mathematical Statistics. 6 (3): 160–169. doi:10.1214/aoms/1177732594. JSTOR 2957546.
- ^ Ver Hoef, J. M. (2012). "Who invented the delta method?". The American Statistician. 66 (2): 124–127. doi:10.1080/00031305.2012.687494. JSTOR 23339471.
- ^ Klein, L. R. (1953). A Textbook of Econometrics. p. 258.
Further reading
- Oehlert, G. W. (1992). "A Note on the Delta Method". The American Statistician. 46 (1): 27–29. doi:10.1080/00031305.1992.10475842. JSTOR 2684406.
- Wolter, Kirk M. (1985). "Taylor Series Methods". Introduction to Variance Estimation. New York: Springer. pp. 221–247. ISBN 0-387-96119-4.
- Wasserman, Larry (2006). All of Nonparametric Statistics. New York: Springer. pp. 19–20. ISBN 0-387-25145-6.
External links
- Asmussen, Søren (2005). "Some Applications of the Delta Method" (PDF). Lecture notes. Aarhus University. Archived from the original (PDF) on May 25, 2015.
- Feiveson, Alan H. "Explanation of the delta method". Stata Corp.
Delta method
Fundamentals
Definition and Intuition
The delta method is a fundamental technique in asymptotic statistics for approximating the distribution of a smooth function applied to an asymptotically normal estimator. It relies on a first-order Taylor expansion to derive the asymptotic variance and normality of $g(\hat{\theta}_n)$, where $\hat{\theta}_n$ is an estimator of a parameter $\theta$ and $g$ is a differentiable function. This approach is particularly valuable when direct computation of the sampling distribution of $g(\hat{\theta}_n)$ is intractable, allowing statisticians to leverage the known asymptotic properties of $\hat{\theta}_n$ itself.[6]

The intuition arises from the linear approximation property of differentiable functions: for large sample sizes $n$, $\hat{\theta}_n$ concentrates around $\theta$, so $g(\hat{\theta}_n) \approx g(\theta) + g'(\theta)(\hat{\theta}_n - \theta)$. The distribution of the transformed estimator then mirrors that of the original, scaled by the derivative $g'(\theta)$, which captures how sensitive the function is to small changes in $\theta$. This is especially useful for nonlinear transformations, such as taking the logarithm of an estimate or forming ratios of parameters, where exact distributions are often unavailable or complex.[6][7]

In standard notation, $\theta$ represents the true parameter, $\hat{\theta}_n$ its consistent estimator from a sample of size $n$, $g$ a smooth function with derivative $g'$, and $\sigma^2$ the asymptotic variance entering the central limit theorem for $\hat{\theta}_n$. The core theorem states that if $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}(0, \sigma^2)$ and $g$ is differentiable at $\theta$ with $g'(\theta) \neq 0$, then under suitable regularity conditions,

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big).$$

This result establishes the asymptotic normality of the transformed estimator, facilitating inference for functions of parameters. The method extends naturally to multivariate settings via the Jacobian matrix, though the univariate case highlights the essential mechanism.[7][8]

Univariate Delta Method
The univariate delta method provides an asymptotic approximation for the distribution of a smooth function of a scalar estimator that converges in distribution to a normal random variable. Specifically, suppose $\hat{\theta}_n$ is an estimator satisfying $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}(0, \sigma^2)$ as $n \to \infty$, where $\theta$ is the true parameter and $\sigma^2 > 0$. For a function $g$ that is continuously differentiable at $\theta$ with $g'(\theta) \neq 0$, the delta method states that $\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big)$.[9][7] This result follows from a first-order Taylor expansion of $g(\hat{\theta}_n)$ around $\theta$, $g(\hat{\theta}_n) \approx g(\theta) + g'(\theta)(\hat{\theta}_n - \theta)$, which, when combined with the asymptotic normality of $\hat{\theta}_n$, propagates the limiting distribution to $g(\hat{\theta}_n)$.[10] The approximation for the variance is then $\operatorname{Var}\big(g(\hat{\theta}_n)\big) \approx [g'(\theta)]^2 \sigma^2 / n$, often used to construct standard errors or confidence intervals for $g(\theta)$.[9]

The method requires that $g$ is differentiable at $\theta$, that $\hat{\theta}_n$ is consistent for $\theta$ (i.e., $\hat{\theta}_n \xrightarrow{P} \theta$), and that $\hat{\theta}_n$ satisfies a central limit theorem for asymptotic normality.[7][10] These conditions ensure the remainder term in the Taylor expansion vanishes in probability, validating the linear approximation asymptotically.[9]

As a numerical illustration, consider estimating the log of a population mean using the sample mean $\bar{X}_n$ from i.i.d. observations with mean $\mu > 0$ and variance $\sigma^2$. Here $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{D} \mathcal{N}(0, \sigma^2)$, and letting $g(\mu) = \log(\mu)$ gives $g'(\mu) = 1/\mu$. The delta method yields $\sqrt{n}\,(\log\bar{X}_n - \log\mu) \xrightarrow{D} \mathcal{N}(0, \sigma^2/\mu^2)$, so the approximate standard error of $\log\bar{X}_n$ is $\sigma/(\mu\sqrt{n})$, or in practice $\hat{\sigma}/(\bar{X}_n\sqrt{n})$ using consistent estimators.[11][10] Plugging the observed $\bar{X}_n$, $\hat{\sigma}$, and $n$ into this expression gives the standard error directly.[10]

Multivariate Delta Method
The multivariate delta method extends the approximation of asymptotic distributions to functions of vector-valued estimators. Consider a $k$-dimensional parameter $\theta$ and a consistent estimator $\hat{\theta}_n$ satisfying $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}_k(0, \Sigma)$, where $\Sigma$ is the asymptotic covariance matrix and $n$ is the sample size.[8] For an $m$-dimensional function $g$ that is continuously differentiable at $\theta$, the theorem states that

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) \xrightarrow{D} \mathcal{N}_m\big(0,\; J_g(\theta)\,\Sigma\,J_g(\theta)^T\big),$$

where $J_g(\theta)$ is the Jacobian matrix of $g$ evaluated at $\theta$.[8][12] This result relies on a first-order Taylor expansion of $g(\hat{\theta}_n)$ around $\theta$, leveraging the asymptotic normality of $\hat{\theta}_n$. The Jacobian matrix consists of the partial derivatives of the components of $g$, with entries $[J_g(\theta)]_{ij} = \partial g_i(\theta)/\partial\theta_j$.[8] It linearizes the transformation induced by $g$, propagating the variability from $\hat{\theta}_n$ to $g(\hat{\theta}_n)$ through matrix multiplication. The resulting asymptotic covariance matrix $J_g(\theta)\,\Sigma\,J_g(\theta)^T$ provides the variance-covariance structure for the transformed estimator.[12]

A practical approximation follows: the variance-covariance matrix of $g(\hat{\theta}_n)$ is

$$\operatorname{Cov}\big(g(\hat{\theta}_n)\big) \approx J_g(\hat{\theta}_n)\,\frac{\hat{\Sigma}}{n}\,J_g(\hat{\theta}_n)^T.$$

This holds provided $g$ is continuously differentiable in a neighborhood of $\theta$, $\hat{\theta}_n$ is consistent and asymptotically normal, and $\theta$ lies in the interior of the domain of $g$.[8]

For illustration, suppose one seeks the asymptotic variance of the sample coefficient of variation $\widehat{CV} = s/\bar{X}$, where $\bar{X}$ and $s$ are the sample mean and standard deviation estimating population parameters $\mu$ and $\sigma$. Here $g(\mu, \sigma) = \sigma/\mu$, and the Jacobian is the row vector $\big(-\sigma/\mu^2,\; 1/\mu\big)$. The asymptotic variance is then

$$\operatorname{Var}\big(\widehat{CV}\big) \approx \begin{pmatrix} -\sigma/\mu^2 & 1/\mu \end{pmatrix} \Sigma \begin{pmatrix} -\sigma/\mu^2 \\ 1/\mu \end{pmatrix},$$

where $\Sigma$ is the asymptotic covariance matrix of $(\bar{X}, s)$.[10] This computation accounts for the correlation between $\bar{X}$ and $s$, yielding a more accurate approximation than ignoring dependencies.[12]
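The coefficient-of-variation illustration can be checked numerically. The sketch below assumes normally distributed data (so that $\bar{X}$ and $s^2$ are asymptotically uncorrelated, with $\operatorname{Var}(\bar{X}) \approx \sigma^2/n$ and $\operatorname{Var}(s^2) \approx 2\sigma^4/n$), which reduces the quadratic form above to the closed form $CV^2(1/2 + CV^2)/n$; the values of $\mu$, $\sigma$, and $n$ are hypothetical.

```python
import numpy as np

# Minimal sketch: delta-method variance of the sample coefficient of
# variation CV = s / x̄ for normal data, checked by simulation. Under
# normality the quadratic form collapses to CV²(1/2 + CV²)/n.
rng = np.random.default_rng(4)
mu, sigma, n, reps = 10.0, 2.0, 500, 20_000

cv = sigma / mu
var_delta = cv**2 * (0.5 + cv**2) / n      # closed-form delta-method variance

data = rng.normal(mu, sigma, size=(reps, n))
cv_hat = data.std(axis=1, ddof=1) / data.mean(axis=1)
print("delta-method variance:", var_delta)
print("simulated variance   :", cv_hat.var())
```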
Mathematical Foundations

Univariate Proof
To prove the univariate delta method, assume that $\hat{\theta}_n$ is a consistent estimator of the parameter $\theta$ satisfying $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}(0, \sigma^2)$ as $n \to \infty$, where $(\hat{\theta}_n)$ is a sequence of random variables, and let $g$ be continuously differentiable at $\theta$ with $g'(\theta) \neq 0$.[13]

Consider the first-order Taylor expansion of $g(\hat{\theta}_n)$ around $\theta$:

$$g(\hat{\theta}_n) = g(\theta) + g'(\theta)(\hat{\theta}_n - \theta) + R_n,$$

where the remainder term satisfies $R_n = o(|\hat{\theta}_n - \theta|)$ because $g$ is continuously differentiable at $\theta$. Multiplying through by $\sqrt{n}$ yields

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) = g'(\theta)\,\sqrt{n}\,(\hat{\theta}_n - \theta) + \sqrt{n}\,R_n.$$

Since $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}(0, \sigma^2)$, it follows that $\hat{\theta}_n \xrightarrow{P} \theta$; here $g'(\theta)$ is a non-random constant. The term $g'(\theta)\,\sqrt{n}\,(\hat{\theta}_n - \theta)$ is thus a constant multiple of an asymptotically normal quantity, so $g'(\theta)\,\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big)$.[13]

For the remainder, it suffices to show $\sqrt{n}\,R_n \xrightarrow{P} 0$. To establish this, assume $g$ is twice differentiable in a neighborhood of $\theta$ with $g''$ satisfying a Lipschitz condition: $|g''(x) - g''(\theta)| \le K|x - \theta|$ for some constant $K$ and all $x$ near $\theta$. The Lagrange form of the Taylor remainder is then $R_n = \tfrac{1}{2} g''(\tilde{\theta})(\hat{\theta}_n - \theta)^2$ for some $\tilde{\theta}$ between $\hat{\theta}_n$ and $\theta$. Under the Lipschitz condition, $|R_n| \le C(\hat{\theta}_n - \theta)^2$ for some constant $C$, so $\sqrt{n}\,R_n = \big[\sqrt{n}(\hat{\theta}_n - \theta)\big](\hat{\theta}_n - \theta)\,O(1) = O_p(1)\,o_p(1) = o_p(1)$.

By Slutsky's theorem, since $g'(\theta)\,\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big)$ and $\sqrt{n}\,R_n \xrightarrow{P} 0$, their sum converges in distribution to $\mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big)$. Thus

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big),$$

or equivalently, $g(\hat{\theta}_n) \overset{a}{\sim} \mathcal{N}\big(g(\theta), \sigma^2 [g'(\theta)]^2/n\big)$, where $\overset{a}{\sim}$ denotes asymptotic equivalence in distribution.[13]

Multivariate Proof
The multivariate delta method extends the univariate case by considering a vector-valued function $g: \mathbb{R}^k \to \mathbb{R}^m$ applied to a $k$-dimensional estimator $\hat{\theta}_n$, where $\hat{\theta}_n$ is asymptotically normal. Under suitable regularity conditions, the asymptotic distribution of $g(\hat{\theta}_n)$ is derived using a first-order Taylor expansion and continuous mapping theorems for random vectors.[14]

Assume $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} \mathcal{N}_k(0, \Sigma)$, where $\theta$ is the true parameter vector in the interior of the parameter space, and $\Sigma$ is a positive definite covariance matrix. The function $g$ must be continuously differentiable (i.e., $C^1$) in a neighborhood of $\theta$, ensuring the Jacobian matrix $J_g(\theta)$, of dimension $m \times k$, exists and is well-defined.[14][6]

The proof begins with the first-order Taylor expansion of $g(\hat{\theta}_n)$ around $\theta$:

$$g(\hat{\theta}_n) = g(\theta) + J_g(\theta)(\hat{\theta}_n - \theta) + R_n,$$

where, componentwise, the linear term can also be written with the gradient evaluated at a point $\bar{\theta}_n$ lying on the line segment between $\hat{\theta}_n$ and $\theta$, and the remainder term $R_n = o(\|\hat{\theta}_n - \theta\|)$ is due to the differentiability of $g$. Since $\hat{\theta}_n \xrightarrow{P} \theta$, it follows that $\bar{\theta}_n \xrightarrow{P} \theta$, and by the continuous mapping theorem, $J_g(\bar{\theta}_n) \xrightarrow{P} J_g(\theta)$.[14][15]

Multiplying through by $\sqrt{n}$, the expansion yields

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) = J_g(\theta)\,\sqrt{n}\,(\hat{\theta}_n - \theta) + \sqrt{n}\,R_n.$$

The remainder term $\sqrt{n}\,R_n \xrightarrow{P} 0$ as $n \to \infty$. Thus,

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) = J_g(\theta)\,\sqrt{n}\,(\hat{\theta}_n - \theta) + o_p(1).$$

By the multivariate Slutsky theorem, since $J_g(\bar{\theta}_n) \xrightarrow{P} J_g(\theta)$ and $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} Z$, their product converges in distribution to $J_g(\theta)Z$, where $Z \sim \mathcal{N}_k(0, \Sigma)$ and hence $J_g(\theta)Z \sim \mathcal{N}_m\big(0, J_g(\theta)\,\Sigma\,J_g(\theta)^T\big)$. The term $\sqrt{n}\,R_n$ vanishes in the limit, so

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) \xrightarrow{D} \mathcal{N}_m\big(0,\; J_g(\theta)\,\Sigma\,J_g(\theta)^T\big).$$

This establishes joint asymptotic normality for the components of $g(\hat{\theta}_n)$, with the asymptotic covariance matrix given by the quadratic form involving the Jacobian and $\Sigma$.[14][6][15] This vector formulation parallels the scalar analog in the univariate delta method but incorporates matrix multiplication to handle the multidimensional transformation.[14]

Applications and Examples
Binomial Proportion Example
A common application of the univariate delta method arises in estimating the variance of the logit transformation of a binomial proportion estimator. Consider a binomial random variable $X \sim \mathrm{Bin}(n, p)$, where the sample proportion is $\hat{p} = X/n$. The variance of $\hat{p}$ is $p(1-p)/n$.[16] To approximate the variance of the log-odds, define the function $g(p) = \log\big(p/(1-p)\big)$, which represents the logit transformation relevant to odds ratios.[16] The first derivative of $g$ is $g'(p) = \frac{1}{p(1-p)}$. Applying the univariate delta method, the approximate variance is

$$\operatorname{Var}\big(\operatorname{logit}(\hat{p})\big) \approx \left[\frac{1}{p(1-p)}\right]^2 \frac{p(1-p)}{n} = \frac{1}{n\,p(1-p)}.$$

This formula provides the asymptotic variance for the estimated log-odds.[16] The approximation is particularly useful as the standard error for the logit transform in logistic regression models, where the intercept corresponds to the log-odds of the baseline probability, and its standard error is $1/\sqrt{n\,\hat{p}(1-\hat{p})}$. For moderate sample sizes and proportions away from 0 and 1, the delta-method variance closely tracks the actual sampling variance of the log-odds, demonstrating the accuracy of the approximation even for moderate sample sizes.[17]
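A simulation sketch of the logit-variance formula above; the values of $n$ and $p$ are hypothetical, and draws where the logit is undefined ($X = 0$ or $X = n$) are discarded.

```python
import numpy as np

# Minimal sketch (hypothetical n and p): variance of the sample log-odds
# logit(p̂) = log(p̂ / (1 - p̂)) via the delta method, Var ≈ 1 / (n p (1 - p)),
# compared with simulation.
rng = np.random.default_rng(5)
n, p, reps = 100, 0.3, 200_000

var_delta = 1.0 / (n * p * (1 - p))

x = rng.binomial(n, p, size=reps)
x = x[(x > 0) & (x < n)]                   # drop draws where the logit is undefined
p_hat = x / n
logit_hat = np.log(p_hat / (1 - p_hat))
print("delta-method variance:", var_delta)
print("simulated variance   :", logit_hat.var())
```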
Other Statistical Applications

The delta method finds wide application in deriving the asymptotic variance of the sample coefficient of variation (CV), a scale-free measure of relative dispersion defined as $\widehat{CV} = s/\bar{X}$, where $\bar{X}$ is the sample mean and $s$ is the sample standard deviation. Under the assumption of asymptotic normality of $\bar{X}$ and $s$, the multivariate delta method applies a first-order Taylor expansion to approximate the variance as $\operatorname{Var}\big(\widehat{CV}\big) \approx \frac{CV^2}{n}\left(\frac{1}{2} + CV^2\right)$ for large $n$ from normally distributed data, enabling confidence intervals for relative variability in fields like biology and engineering.[18] This approach is particularly useful when comparing dispersion across datasets with differing scales, as it leverages the known asymptotic covariance structure between $\bar{X}$ and $s$.[9]

In maximum likelihood estimation (MLE), the delta method provides the asymptotic distribution for nonlinear transformations of MLEs, which are typically asymptotically normal with known variance. For instance, if $\hat{\theta}$ is an MLE with $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{D} \mathcal{N}\big(0, I(\theta)^{-1}\big)$, then for a smooth function $g$, $\sqrt{n}\,\big(g(\hat{\theta}) - g(\theta)\big) \xrightarrow{D} \mathcal{N}\big(0, [g'(\theta)]^2 I(\theta)^{-1}\big)$. A common example is estimating rate parameters via $\exp(\hat{\beta})$ in Poisson or exponential regression models, where the asymptotic variance of the transformed estimator is $[\exp(\beta)]^2 \operatorname{Var}(\hat{\beta})$, facilitating inference on multiplicative effects like incidence rates.[9] This extends to generalized linear models, where it approximates standard errors for exponentiated coefficients without refitting the model.[19]

The delta method also supports hypothesis testing through Wald-type confidence intervals for functions of parameters, notably odds ratios in 2×2 contingency tables. The log odds ratio is a smooth function of the cell proportions, and its variance is approximated as $\nabla h(\pi)^T\,\Sigma\,\nabla h(\pi)$, where $\pi$ are the cell proportions and $\Sigma$ their covariance matrix; exponentiating yields intervals for the odds ratio itself. This is standard in epidemiological studies for assessing associations while accounting for the nonlinearity of the odds scale.[19]

A key limitation arises when the derivative $g'(\theta) = 0$ at the true parameter value, causing the first-order approximation to degenerate to zero variance and fail to capture the true asymptotic behavior, known as the flat spot issue; higher-order expansions are then required for accuracy.[9]

Extensions and Variations
Second-Order Delta Method
The second-order delta method extends the first-order approximation by incorporating the quadratic term in the Taylor expansion of a smooth function $g$ around the true parameter $\theta$, providing a more accurate representation of the estimator for finite samples. Specifically, the expansion is given by

$$g(\hat{\theta}_n) = g(\theta) + g'(\theta)(\hat{\theta}_n - \theta) + \tfrac{1}{2}\,g''(\theta)(\hat{\theta}_n - \theta)^2 + o_p\!\big(n^{-1}\big),$$

where the second-order term captures the leading bias in the approximation.[20] Taking expectations, the bias of $g(\hat{\theta}_n)$ is approximately $\tfrac{1}{2}\,g''(\theta)\operatorname{Var}(\hat{\theta}_n)$, which is of order $1/n$ when $\operatorname{Var}(\hat{\theta}_n) = \sigma^2/n$. This bias arises from the curvature of $g$ and becomes relevant when the first-order approximation is insufficient, such as in scenarios where higher precision is needed beyond $\sqrt{n}$-consistency.[20]

To obtain an asymptotically normal distribution that accounts for this bias, consider the centered estimator:

$$\sqrt{n}\left(g(\hat{\theta}_n) - g(\theta) - \frac{g''(\theta)\,\sigma^2}{2n}\right) \xrightarrow{D} \mathcal{N}\big(0, \sigma^2 [g'(\theta)]^2\big).$$

This result adjusts for the bias term, yielding a centered normal limit with the same leading-order variance as the first-order delta method.[21] The second-order expansion also influences higher moments; for instance, the second derivative contributes to the kurtosis of $g(\hat{\theta}_n)$ through terms involving the fourth moment of $\hat{\theta}_n - \theta$, which can be incorporated for refined variance estimates in Edgeworth expansions. This adjustment is particularly useful when the first-order bias is significant.[21]

The second-order delta method is especially valuable for small sample sizes or when the nonlinearity of $g$ amplifies the bias, such as in ratio estimators or transformations requiring bias correction for reliable inference; it integrates well with Edgeworth expansions to further mitigate skewness and improve coverage accuracy.[21]

Alternative Forms
The delta method can be reformulated for estimating the variance of a ratio estimator, where the function of interest is $R = \hat{\theta}_1/\hat{\theta}_2$, with $\hat{\theta}_1$ and $\hat{\theta}_2$ being asymptotically normal estimators. The Jacobian of $(\theta_1, \theta_2) \mapsto \theta_1/\theta_2$ at the true parameters is $\big(1/\theta_2,\; -\theta_1/\theta_2^2\big)$, leading to the approximate variance shown below.[22]

$$\operatorname{Var}(R) \approx \frac{\operatorname{Var}(\hat{\theta}_1)}{\theta_2^2} + \frac{\theta_1^2\,\operatorname{Var}(\hat{\theta}_2)}{\theta_2^4} - \frac{2\,\theta_1\,\operatorname{Cov}(\hat{\theta}_1, \hat{\theta}_2)}{\theta_2^3}.$$

This form is particularly useful in survey sampling and epidemiology for approximating confidence intervals of proportions or rates without direct simulation.[23]

From the perspective of influence functions, the delta method represents a first-order linearization of estimating equations, where the influence function $L(x; T, F)$ of a functional $T$ captures the effect of infinitesimal contamination at observation $x$ on the estimator. For smooth functionals, the asymptotic variance of $T(\hat{F}_n)$ is given by the variance of the influence function, $\operatorname{Var}\big(T(\hat{F}_n)\big) \approx \frac{1}{n}\operatorname{E}\big[L(X; T, F)^2\big]$, which aligns directly with the delta method's Taylor expansion around the true distribution $F$.[24] This connection is especially valuable in robust statistics and functional data analysis, as it facilitates variance estimation for complex estimators like those in semiparametric models.[25]

In multivariate settings, an alternative formulation ties the delta method to the Hessian matrix for maximum likelihood estimators (MLEs), where the asymptotic covariance of the MLE is the inverse of the observed or expected Hessian, and the delta method then propagates this to functions via the gradient. This approach leverages the information matrix equality, ensuring the delta approximation remains consistent with the standard multivariate delta method for general smooth transformations.[26][27]

Compared to direct simulation methods like bootstrapping, the delta method often outperforms them in computational efficiency and simplicity, particularly for large samples or when model assumptions hold, as it avoids resampling overhead while providing reliable variance estimates under asymptotic normality.[28] For instance, in metric analytics for A/B testing, the delta method requires fewer assumptions and less computation than individual-level simulations, making it preferable for real-time applications.[29]
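A sketch checking the ratio-estimator variance above against Monte Carlo simulation; the bivariate-normal means, covariance, and sample size are hypothetical.

```python
import numpy as np

# Minimal sketch: delta-method variance of a ratio of sample means versus
# Monte Carlo simulation, using hypothetical bivariate-normal data.
rng = np.random.default_rng(1)

mu = np.array([5.0, 2.0])                     # true means (numerator, denominator)
cov = np.array([[1.0, 0.3],                   # per-observation covariance
                [0.3, 0.5]])
n, reps = 500, 20_000

# Delta method: gradient of x/y at (mu_x, mu_y) is (1/mu_y, -mu_x/mu_y^2).
grad = np.array([1 / mu[1], -mu[0] / mu[1] ** 2])
var_delta = grad @ (cov / n) @ grad

# Monte Carlo variance of the ratio of sample means
data = rng.multivariate_normal(mu, cov, size=(reps, n))
means = data.mean(axis=1)                     # shape (reps, 2)
ratios = means[:, 0] / means[:, 1]
print("delta-method variance:", var_delta)
print("simulated variance   :", ratios.var())
```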
Nonparametric Delta Method

The nonparametric delta method extends the classical approach to settings where parametric assumptions on the underlying distribution are relaxed, allowing for the asymptotic analysis of transformations of nonparametric estimators such as kernel density estimates or empirical distribution functions.[30] In these contexts, the method relies on the functional delta method, which interprets statistics as functionals of the distribution and uses Hadamard differentiability to derive limiting distributions.[31]

A primary application involves nonparametric estimators like the kernel density estimator $\hat{f}_n$, where the asymptotic variance of a smooth transformation of $\hat{f}_n$ is obtained by applying the delta method to the known asymptotic variance of $\hat{f}_n$ itself. For instance, under conditions such as stationary mixing data, a smooth density, and appropriate bandwidth selection with the bandwidth shrinking at a suitable rate, the normalized difference between the transformed estimator and its target converges in distribution to a normal random variable with mean zero and variance determined by the functional derivative and kernel properties.[3] This framework handles nonlinear functionals of kernel estimators, including cases with non-differentiable transformations via generalized derivatives like the Dirac delta.[3]

The bootstrap-delta method provides a resampling-based alternative for estimating the derivative empirically when the parameter is unknown in nonparametric settings. By generating bootstrap replicates from the empirical distribution and applying a first-order Taylor expansion around the estimated functional, it approximates the variance of $T(\hat{F}_n)$ and yields standard error estimates equivalent to those from the infinitesimal jackknife, a variant of the delta method.[32] This approach is particularly useful for complex statistics like correlation coefficients, requiring sufficient sample sizes (e.g., n ≥ 14) but benefiting from smoothing to reduce bias.[32]

In functional data analysis, the delta method applies to integrals of density estimates, treating such functionals as paths in infinite-dimensional spaces. For example, the asymptotic distribution of smoothed integral functionals of a kernel density estimator for functional data can be derived using Hadamard differentiability, yielding asymptotic normality under weak convergence in appropriate Banach spaces.[30] This enables inference on quantities like expected values or moments derived from density integrals in high-dimensional or curve data.[33]

Key challenges in the nonparametric delta method include the need for weaker regularity conditions compared to parametric cases, such as Hadamard differentiability rather than Fréchet differentiability, and ensuring uniform convergence over compact sets to handle infinite-dimensional spaces.[31] These requirements often necessitate careful bandwidth tuning and mixing conditions for dependent data, as violations can lead to slower convergence rates or non-normal limits.[3]

Historical Context
Origins and Development
The delta method emerged from early efforts in error propagation and asymptotic approximations in statistics during the late 19th and early 20th centuries. Initial hints appeared in the work of Karl Pearson and his collaborators, who in the 1890s developed formulas for propagating errors under normality assumptions, laying groundwork for approximating variances of functions of random variables. A key early application came in Pearson and Filon's 1898 analysis of the asymptotic variance of sample correlation coefficients, marking one of the first statistical uses of such approximations. Further early development occurred in Spearman and Holzinger's 1924 paper on the variance of non-linear functions in psychological statistics.[34]

Ronald A. Fisher advanced these ideas in the context of statistical inference for transformations, notably in his 1915 proposal of variance-stabilizing transformations and further in his 1922 paper on the mathematical foundations of theoretical statistics, where he established the asymptotic normality of maximum likelihood estimators, essential for later delta method derivations. Fisher's 1925 book, Statistical Methods for Research Workers, applied these concepts to practical inference problems, including transformations of correlation coefficients to approximate normal distributions. These contributions were motivated by the need to approximate distributions of estimators in biological and agricultural experiments, where direct computation was often infeasible.

Formalization accelerated in the 1930s and 1940s through asymptotic expansions. J. L. Doob provided a general probabilistic proof in 1935, extending the method to functions of sample moments. Robert Dorfman independently derived a similar result in 1938 for biometric applications. The first rigorous textbook treatment appeared in Harald Cramér's 1946 Mathematical Methods of Statistics, where the method was stated for functions of central moments with explicit error bounds, solidifying its role in asymptotic theory. Richard von Mises further formalized the asymptotic distribution of differentiable statistical functionals in 1947.[5]

Key Contributors and Milestones
The delta method's formalization and widespread adoption in statistical practice began in the 1930s with contributions from key figures in asymptotic theory. J. L. Doob's 1935 work on the limiting distributions of functions of sample statistics provided an early rigorous foundation for approximating the asymptotic normality of transformed estimators. Jerzy Neyman and E. S. Pearson further advanced its application in the late 1930s for constructing confidence intervals based on functions of asymptotically normal estimators, integrating it into the Neyman-Pearson framework for hypothesis testing and interval estimation. Robert Dorfman is widely credited with coining the term "delta method" in 1938, when he applied the technique to derive approximate confidence intervals for ratios and other nonlinear functions in biometric data analysis. Harald Cramér's influential 1946 monograph Mathematical Methods of Statistics offered a comprehensive mathematical treatment of the method, emphasizing its role in deriving asymptotic variances and distributions for functions of maximum likelihood estimators.

In the post-World War II era, C. R. Rao's 1952 book Advanced Statistical Methods in Biometric Research highlighted the method's utility in asymptotic analysis for biometric applications, including growth curve comparisons and variance stabilization. The 1970s marked significant extensions to robust and semiparametric estimation; Peter J. Bickel and colleagues developed applications to M-estimators, demonstrating the method's robustness to model misspecification through linearization of estimating equations. By the 1990s, computational advancements enabled efficient numerical implementation of the delta method, particularly for Jacobian matrix computations in high-dimensional settings and integration with bootstrap procedures for variance estimation. This evolution shifted the method from purely parametric likelihood contexts to broader semiparametric and nonparametric uses, enhancing its flexibility in modern statistical modeling.

The delta method's enduring impact is evident in its standard incorporation into statistical software; for instance, R's confint function employs it to compute confidence intervals for nonlinear transformations of model parameters, facilitating routine use in applied analyses.