Wald test
In statistics, the Wald test (named after Abraham Wald) assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the precision of the estimate.[1][2] Intuitively, the larger this weighted distance, the less likely it is that the constraint is true. While the finite-sample distributions of Wald tests are generally unknown,[3]: 138 the test statistic has an asymptotic χ2-distribution under the null hypothesis, a fact that can be used to determine statistical significance.[4]
Together with the Lagrange multiplier test and the likelihood-ratio test, the Wald test is one of three classical approaches to hypothesis testing. An advantage of the Wald test over the other two is that it only requires the estimation of the unrestricted model, which lowers the computational burden as compared to the likelihood-ratio test. However, a major disadvantage is that (in finite samples) it is not invariant to changes in the representation of the null hypothesis; in other words, algebraically equivalent expressions of a non-linear parameter restriction can lead to different values of the test statistic.[5][6] That is because the Wald statistic is derived from a Taylor expansion,[7] and different ways of writing equivalent nonlinear expressions lead to nontrivial differences in the corresponding Taylor coefficients.[8] Another aberration, known as the Hauck–Donner effect,[9] can occur in binomial models when the estimated (unconstrained) parameter is close to the boundary of the parameter space (for instance, a fitted probability being extremely close to zero or one), which results in the Wald statistic no longer being monotonically increasing in the distance between the unconstrained and constrained parameter.[10][11]
Mathematical details
Under the Wald test, the estimate θ̂ that was found as the maximizing argument of the unconstrained likelihood function is compared with a hypothesized value θ_0. In particular, the squared difference θ̂ − θ_0 is weighted by the curvature of the log-likelihood function.
Test on a single parameter
If the hypothesis involves only a single parameter restriction, then the Wald statistic takes the following form:

W = (θ̂ − θ_0)² / var(θ̂)

which under the null hypothesis follows an asymptotic χ2-distribution with one degree of freedom. The square root of the single-restriction Wald statistic can be understood as a (pseudo) t-ratio that is, however, not actually t-distributed except for the special case of linear regression with normally distributed errors.[12] In general, it follows an asymptotic z distribution[13]

√W = (θ̂ − θ_0) / se(θ̂)

where se(θ̂) is the standard error (SE) of the maximum likelihood estimate (MLE), the square root of the variance. There are several ways to consistently estimate the variance matrix, which in finite samples leads to alternative estimates of standard errors and associated test statistics and p-values.[3]: 129 The validity of still obtaining an asymptotically normal distribution after plugging the MLE estimate of the variance into the SE relies on Slutsky's theorem.
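As a minimal numerical sketch of this single-restriction statistic, the Python snippet below computes W for a Bernoulli probability under H_0: p = 0.5, using the usual plug-in variance estimate p̂(1 − p̂)/n; the data and hypothesized value are purely illustrative and not taken from the cited sources.

```python
# Minimal sketch: Wald test of H0: p = 0.5 for a Bernoulli probability p.
# The MLE is the sample proportion; its variance is estimated by p_hat*(1 - p_hat)/n.
import numpy as np
from scipy import stats

x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1])  # toy 0/1 data
n = x.size
p_hat = x.mean()                       # unrestricted MLE
p_0 = 0.5                              # hypothesized value
se = np.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the MLE

W = (p_hat - p_0) ** 2 / se ** 2       # Wald statistic, asymptotically chi-squared(1)
p_value = stats.chi2.sf(W, df=1)

print(f"W = {W:.3f}, p-value = {p_value:.3f}")
# sqrt(W) is the familiar z-statistic (p_hat - p_0) / se
```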
Test(s) on multiple parameters
The Wald test can be used to test a single hypothesis on multiple parameters, as well as to test jointly multiple hypotheses on single/multiple parameters. Let θ̂_n be our sample estimator of P parameters (i.e., θ̂_n is a P × 1 vector), which is supposed to follow asymptotically a normal distribution with covariance matrix V, √n (θ̂_n − θ) → N(0, V). The test of Q hypotheses on the P parameters is expressed with a Q × P matrix R:

H_0: Rθ = r
H_1: Rθ ≠ r
The distribution of the test statistic under the null hypothesis is

(Rθ̂_n − r)′ [R (V̂_n/n) R′]⁻¹ (Rθ̂_n − r) ~ χ²_Q,

which in turn implies

n (Rθ̂_n − r)′ [R V̂_n R′]⁻¹ (Rθ̂_n − r) ~ χ²_Q,

where V̂_n is an estimator of the covariance matrix.[14]
Suppose √n (θ̂_n − θ) → N(0, V). Then, by Slutsky's theorem and by the properties of the normal distribution, multiplying by R gives a quantity with distribution

√n (Rθ̂_n − r) = √n R(θ̂_n − θ) → N(0, R V R′).

Recalling that a quadratic form of a normal distribution has a chi-squared distribution,

[√n (Rθ̂_n − r)]′ [R V R′]⁻¹ [√n (Rθ̂_n − r)] → χ²_Q.

Rearranging n finally gives

n (Rθ̂_n − r)′ [R V R′]⁻¹ (Rθ̂_n − r) ~ χ²_Q.

What if the covariance matrix is not known a priori and needs to be estimated from the data? If we have a consistent estimator V̂_n of V, then by Slutsky's theorem applied to the quadratic form above we also have

n (Rθ̂_n − r)′ [R V̂_n R′]⁻¹ (Rθ̂_n − r) ~ χ²_Q.
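The quadratic form above can be evaluated directly once an estimate and a consistent covariance estimate are available. The following Python sketch uses hypothetical values for θ̂_n, V̂_n, and the restriction matrix R; none of these numbers come from the article.

```python
# Sketch of the joint Wald statistic n (R theta_hat - r)' [R V_hat R']^{-1} (R theta_hat - r),
# assuming theta_hat and a consistent estimate V_hat of the asymptotic covariance are given.
import numpy as np
from scipy import stats

n = 200                                    # sample size (illustrative)
theta_hat = np.array([0.8, -0.3, 1.1])     # hypothetical estimates of P = 3 parameters
V_hat = np.array([[0.9, 0.1, 0.0],         # hypothetical estimate of V (so Var(theta_hat) ~ V/n)
                  [0.1, 1.2, 0.2],
                  [0.0, 0.2, 0.7]])

# Q = 2 restrictions: theta_1 = 0 and theta_2 - theta_3 = 0, written as R theta = r
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])
r = np.zeros(2)

diff = R @ theta_hat - r
W = n * diff @ np.linalg.solve(R @ V_hat @ R.T, diff)   # quadratic form
p_value = stats.chi2.sf(W, df=R.shape[0])
print(f"W = {W:.2f}, p-value = {p_value:.4f}")
```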
Nonlinear hypothesis
In the standard form, the Wald test is used to test linear hypotheses that can be represented by a single matrix R. If one wishes to test a non-linear hypothesis of the form

H_0: c(θ) = 0
H_1: c(θ) ≠ 0,

the test statistic becomes

c(θ̂_n)′ [C(θ̂_n) (V̂_n/n) C(θ̂_n)′]⁻¹ c(θ̂_n) ~ χ²_Q,

where C(θ̂_n) is the derivative (Jacobian matrix) of c evaluated at the sample estimator. This result is obtained using the delta method, which uses a first-order approximation of the variance.
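A small Python sketch of this delta-method version, using a hypothetical single restriction c(θ) = θ_1·θ_2 − 1 with an analytic Jacobian; the estimates and covariance matrix are made up for illustration.

```python
# Sketch of a Wald test for a nonlinear restriction c(theta) = 0 via the delta method,
# with a hypothetical restriction c(theta) = theta_1 * theta_2 - 1 (Q = 1).
import numpy as np
from scipy import stats

n = 500
theta_hat = np.array([1.3, 0.9])               # hypothetical unrestricted estimates
V_hat = np.array([[0.5, 0.05],                 # hypothetical asymptotic covariance (Var ~ V/n)
                  [0.05, 0.4]])

def c(theta):                                   # restriction, zero under H0
    return np.array([theta[0] * theta[1] - 1.0])

def c_jac(theta):                               # Jacobian of c, derived analytically
    return np.array([[theta[1], theta[0]]])

cv = c(theta_hat)
J = c_jac(theta_hat)
avar = J @ (V_hat / n) @ J.T                    # delta-method variance of c(theta_hat)
W = cv @ np.linalg.solve(avar, cv)
p_value = stats.chi2.sf(W, df=cv.size)
print(f"W = {float(W):.2f}, p-value = {float(p_value):.4f}")
```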
Non-invariance to re-parameterisations
The fact that one uses an approximation of the variance has the drawback that the Wald statistic is not invariant to a non-linear transformation/reparametrisation of the hypothesis: it can give different answers to the same question, depending on how the question is phrased.[15][5] For example, asking whether R = 1 is the same as asking whether log R = 0; but the Wald statistic for R = 1 is not the same as the Wald statistic for log R = 0 (because there is in general no neat relationship between the standard errors of R and log R, so it needs to be approximated).[16]
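A rough numerical illustration of this non-invariance (with made-up numbers, not taken from the cited papers): testing θ = 1 directly versus testing log θ = 0 with a delta-method standard error gives noticeably different statistics.

```python
# Illustration of non-invariance: testing theta = 1 directly versus testing log(theta) = 0,
# where the standard error of log(theta_hat) is obtained by the delta method.
import numpy as np
from scipy import stats

theta_hat = 1.8          # hypothetical estimate
se_theta = 0.4           # hypothetical standard error of theta_hat

W_direct = ((theta_hat - 1.0) / se_theta) ** 2               # Wald statistic for H0: theta = 1
se_log = se_theta / theta_hat                                 # delta method: d log(t)/dt = 1/t
W_log = (np.log(theta_hat) / se_log) ** 2                     # Wald statistic for H0: log theta = 0

print(f"W (theta = 1):     {W_direct:.3f}, p = {stats.chi2.sf(W_direct, 1):.3f}")
print(f"W (log theta = 0): {W_log:.3f}, p = {stats.chi2.sf(W_log, 1):.3f}")
# The two statistics differ even though the hypotheses are algebraically equivalent.
```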
Alternatives to the Wald test
There exist several alternatives to the Wald test, namely the likelihood-ratio test and the Lagrange multiplier test (also known as the score test). Robert F. Engle showed that these three tests (the Wald test, the likelihood-ratio test and the Lagrange multiplier test) are asymptotically equivalent.[17] Although they are asymptotically equivalent, in finite samples they can disagree enough to lead to different conclusions.
There are several reasons to prefer the likelihood-ratio test or the Lagrange multiplier test to the Wald test:[18][19][20]
- Non-invariance: As argued above, the Wald test is not invariant under reparametrization, while the likelihood-ratio test will give exactly the same answer whether we work with R, log R or any other monotonic transformation of R.[5]
- The other reason is that the Wald test uses two approximations (that we know the standard error or Fisher information, and the maximum likelihood estimate), whereas the likelihood-ratio test depends only on the ratio of the likelihood functions under the null and alternative hypotheses.
- The Wald test requires an estimate using the maximizing argument, corresponding to the "full" model. In some cases, the model is simpler under the null hypothesis, so that one might prefer to use the score test (also called the Lagrange multiplier test), which has the advantage that it can be formulated in situations where the variability of the maximizing element is difficult to estimate, or where computing the maximum likelihood estimate is difficult; e.g., the Cochran–Mantel–Haenszel test is a score test.[21]
References
[edit]- ^ Fahrmeir, Ludwig; Kneib, Thomas; Lang, Stefan; Marx, Brian (2013). Regression : Models, Methods and Applications. Berlin: Springer. p. 663. ISBN 978-3-642-34332-2.
- ^ Ward, Michael D.; Ahlquist, John S. (2018). Maximum Likelihood for Social Science : Strategies for Analysis. Cambridge University Press. p. 36. ISBN 978-1-316-63682-4.
- ^ Martin, Vance; Hurn, Stan; Harris, David (2013). Econometric Modelling with Time Series: Specification, Estimation and Testing. Cambridge University Press. ISBN 978-0-521-13981-6.
- ^ Davidson, Russell; MacKinnon, James G. (1993). "The Method of Maximum Likelihood : Fundamental Concepts and Notation". Estimation and Inference in Econometrics. New York: Oxford University Press. p. 89. ISBN 0-19-506011-3.
- ^ Gregory, Allan W.; Veall, Michael R. (1985). "Formulating Wald Tests of Nonlinear Restrictions". Econometrica. 53 (6): 1465–1468. doi:10.2307/1913221. JSTOR 1913221. Archived from the original on 2018-07-21. Retrieved 2019-09-05.
- ^ Phillips, P. C. B.; Park, Joon Y. (1988). "On the Formulation of Wald Tests of Nonlinear Restrictions" (PDF). Econometrica. 56 (5): 1065–1083. doi:10.2307/1911359. JSTOR 1911359.
- ^ Hayashi, Fumio (2000). Econometrics. Princeton: Princeton University Press. pp. 489–491. ISBN 1-4008-2383-8.
- ^ Lafontaine, Francine; White, Kenneth J. (1986). "Obtaining Any Wald Statistic You Want". Economics Letters. 21 (1): 35–40. doi:10.1016/0165-1765(86)90117-5.
- ^ Hauck, Walter W. Jr.; Donner, Allan (1977). "Wald's Test as Applied to Hypotheses in Logit Analysis". Journal of the American Statistical Association. 72 (360a): 851–853. doi:10.1080/01621459.1977.10479969.
- ^ King, Maxwell L.; Goh, Kim-Leng (2002). "Improvements to the Wald Test". Handbook of Applied Econometrics and Statistical Inference. New York: Marcel Dekker. pp. 251–276. ISBN 0-8247-0652-8.
- ^ Yee, Thomas William (2022). "On the Hauck–Donner Effect in Wald Tests: Detection, Tipping Points, and Parameter Space Characterization". Journal of the American Statistical Association. 117 (540): 1763–1774. arXiv:2001.08431. doi:10.1080/01621459.2021.1886936.
- ^ Cameron, A. Colin; Trivedi, Pravin K. (2005). Microeconometrics : Methods and Applications. New York: Cambridge University Press. p. 137. ISBN 0-521-84805-9.
- ^ Davidson, Russell; MacKinnon, James G. (1993). "The Method of Maximum Likelihood : Fundamental Concepts and Notation". Estimation and Inference in Econometrics. New York: Oxford University Press. p. 89. ISBN 0-19-506011-3.
- ^ Harrell, Frank E. Jr. (2001). "Section 9.3.1". Regression modeling strategies. New York: Springer-Verlag. ISBN 0387952322.
- ^ Fears, Thomas R.; Benichou, Jacques; Gail, Mitchell H. (1996). "A reminder of the fallibility of the Wald statistic". The American Statistician. 50 (3): 226–227. doi:10.1080/00031305.1996.10474384.
- ^ Critchley, Frank; Marriott, Paul; Salmon, Mark (1996). "On the Differential Geometry of the Wald Test with Nonlinear Restrictions". Econometrica. 64 (5): 1213–1222. doi:10.2307/2171963. hdl:1814/524. JSTOR 2171963.
- ^ Engle, Robert F. (1983). "Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics". In Intriligator, M. D.; Griliches, Z. (eds.). Handbook of Econometrics. Vol. II. Elsevier. pp. 796–801. ISBN 978-0-444-86185-6.
- ^ Harrell, Frank E. Jr. (2001). "Section 9.3.3". Regression modeling strategies. New York: Springer-Verlag. ISBN 0387952322.
- ^ Collett, David (1994). Modelling Survival Data in Medical Research. London: Chapman & Hall. ISBN 0412448807.
- ^ Pawitan, Yudi (2001). In All Likelihood. New York: Oxford University Press. ISBN 0198507658.
- ^ Agresti, Alan (2002). Categorical Data Analysis (2nd ed.). Wiley. p. 232. ISBN 0471360937.
Further reading
[edit]- Greene, William H. (2012). Econometric Analysis (Seventh international ed.). Boston: Pearson. pp. 155–161. ISBN 978-0-273-75356-8.
- Kmenta, Jan (1986). Elements of Econometrics (Second ed.). New York: Macmillan. pp. 492–493. ISBN 0-02-365070-2.
- Thomas, R. L. (1993). Introductory Econometrics: Theory and Application (Second ed.). London: Longman. pp. 73–77. ISBN 0-582-07378-2.
Wald test

Overview
Definition and Purpose
The Wald test is a statistical method used to assess whether the estimated parameters of a parametric model differ significantly from specified hypothesized values, typically based on maximum likelihood estimation (MLE). It evaluates the null hypothesis H_0: θ = θ_0, where θ represents the parameter vector and θ_0 is the hypothesized value, by measuring the standardized distance between the MLE θ̂ and θ_0. Under the null hypothesis and suitable conditions, the test statistic follows an asymptotic chi-squared distribution with degrees of freedom equal to the number of restrictions imposed by the null, enabling p-value computation and decision-making for large samples.[3][5]

The primary purpose of the Wald test is to facilitate hypothesis testing in parametric models, such as those in econometrics, biostatistics, and the social sciences, where direct inference on parameter significance is required without refitting the model under restrictions. It is particularly valuable for its computational simplicity, as it relies solely on the unrestricted MLE and its estimated covariance matrix, making it efficient for complex models with many parameters. This approach contrasts with methods that require constrained optimization, offering a practical tool for model diagnostics and inference on subsets of parameters.[5]

Key assumptions underlying the Wald test include model identifiability, ensuring parameters are uniquely estimable, and regularity conditions for the asymptotic normality of the MLE, such as the existence of a positive definite Fisher information matrix and a thrice-differentiable log-likelihood function. These conditions guarantee that the information matrix is invertible and that the MLE converges in probability to the true parameter, with a limiting normal distribution scaled by the inverse information matrix. Violations, such as singularity of the information matrix, can invalidate the test's asymptotic properties.[5]

A representative application is in linear regression, where the Wald test evaluates whether a coefficient equals zero to determine the variable's significance; for instance, in an ordinary least squares model, the t-statistic for an individual coefficient is a special case of the Wald statistic under normality assumptions.[5] The test is named after Abraham Wald, who introduced it in 1943 as a general procedure for testing multiple parameter hypotheses in large fixed samples, building on his foundational work in statistical decision theory during the 1940s.[3]

Historical Development
The Wald test originated from the work of Abraham Wald during World War II, as part of his contributions to decision theory and efficient hypothesis testing in large-sample settings. Wald, a Hungarian-American mathematician and statistician, developed the test amid research on sequential analysis for military applications, including quality control and decision-making under uncertainty. It was formally introduced in his 1943 paper, which addressed testing multiple parameters asymptotically without requiring small-sample exact distributions.[3]

Wald's framework gained further traction through key publications that refined and extended its application. His 1945 paper in the Annals of Mathematical Statistics elaborated on sequential variants, while C. R. Rao's 1948 work in the Proceedings of the Cambridge Philosophical Society provided extensions for multiparameter cases, integrating the Wald statistic with score-based alternatives for broader hypothesis testing. These efforts emphasized the test's efficiency in leveraging maximum likelihood estimates to evaluate parameter constraints.[6][7]

In the 1950s and 1960s, the Wald test became integrated into asymptotic statistics through foundational texts and papers by Harald Cramér, C. R. Rao, and Samuel S. Wilks, who connected it to likelihood ratio principles and large-sample approximations. This period solidified its role in general parametric inference. By the post-1970s era, computational advances elevated its prominence in econometrics, as highlighted in Robert F. Engle's 1984 analysis of Wald, likelihood ratio, and Lagrange multiplier tests, enabling routine use in complex models. The test emerged as a computationally efficient alternative to methods like t-tests, which rely on exact small-sample normality, by avoiding full likelihood evaluations under restrictions.[5]

Notable milestones include its adoption in generalized linear models, as formalized by John Nelder and Robert Wedderburn in 1972, where the Wald test facilitated parameter significance assessment across diverse distributions like the binomial and Poisson. Its enduring relevance extends to machine learning, where it supports confidence intervals for parameters in logistic regression and related algorithms, building on its asymptotic foundations.[8]

Mathematical Foundations
General Setup and Assumptions
The Wald test operates within the framework of parametric statistical inference, where the observed data are assumed to arise from a probability distribution parameterized by a P-dimensional vector θ. The likelihood function is denoted L(θ), typically expressed as the product of individual densities or mass functions under independence, and the maximum likelihood estimator θ̂ is obtained by maximizing L(θ) or, equivalently, the log-likelihood ℓ(θ) = log L(θ).[5]

The null hypothesis for the test is generally stated as H_0: c(θ) = c_0, where c is a Q-dimensional function (with Q ≤ P) that may be linear or nonlinear and c_0 is a specified vector; for the basic setup, this often simplifies to the linear case H_0: θ = θ_0 for some fixed θ_0.[5] This formulation allows testing restrictions on subsets of parameters while permitting others to vary freely, provided the model remains identifiable under H_0.

Key assumptions underpinning the validity of the Wald test include the observations being independent and identically distributed (i.i.d.) or, more generally, satisfying ergodicity conditions to ensure consistent estimation in dependent data settings such as time series.[5] The log-likelihood must be twice continuously differentiable with respect to θ in a neighborhood of the true parameter value θ*, and the Fisher information matrix I(θ) must be positive definite at θ* to guarantee the invertibility required for asymptotic variance estimation.[5] Under these conditions, the maximum likelihood estimator satisfies asymptotic normality, √n (θ̂ − θ*) → N(0, I(θ*)⁻¹), as the sample size n → ∞.[9]

Additional regularity conditions are necessary for the consistency and efficiency of θ̂, including that the true parameter lies in the interior of the parameter space, to avoid boundary issues that could invalidate asymptotic approximations, and that the model parameters are identifiable, meaning distinct values of θ yield distinct distributions for the data.[10][11] The log-likelihood should also satisfy domination conditions, such as the existence of an integrable function bounding the derivatives, to justify interchanges of differentiation and integration in deriving the information matrix equality.[10] These assumptions hold for a wide class of models, including exponential families, but extend generally to any setup supporting maximum likelihood estimation.[9]

Derivation of the Test Statistic
The derivation of the Wald test statistic begins with the asymptotic properties of the maximum likelihood estimator (MLE) under standard regularity conditions for the likelihood function. For a sample of size n from a parametric model with parameter vector θ, the MLE θ̂ satisfies √n (θ̂ − θ) → N(0, I(θ)⁻¹), where I(θ) denotes the Fisher information matrix per observation.[11][3]

Consider testing the null hypothesis H_0: c(θ) = 0, where c is a Q-dimensional (with Q ≤ P) continuously differentiable function. Under H_0 the true parameter satisfies c(θ) = 0. A first-order Taylor expansion yields c(θ̂) ≈ c(θ) + C(θ)(θ̂ − θ), where C(θ) is the Jacobian matrix of c at θ. Thus √n c(θ̂) → N(0, C(θ) I(θ)⁻¹ C(θ)′).[11][12] The Wald test statistic standardizes this quantity to obtain

W = n c(θ̂)′ [C(θ̂) I(θ̂)⁻¹ C(θ̂)′]⁻¹ c(θ̂),

which converges in distribution to χ²_Q under H_0 as n → ∞.[11][12] The hypothesis is rejected at significance level α if W exceeds the (1 − α) quantile of the χ²_Q distribution.[11] For the special case of a linear hypothesis c(θ) = θ − θ_0 for a scalar parameter, the statistic simplifies to W = n (θ̂ − θ_0)² I(θ̂) under H_0.[13][14]

The information matrix is typically estimated using the observed information (the negative Hessian of the log-likelihood at θ̂) or the expected information evaluated at θ̂; both yield asymptotically equivalent results under correct specification.[12] In misspecified models, where the assumed likelihood does not match the true data-generating process, the sandwich estimator Â⁻¹ B̂ Â⁻¹, with Â the negative Hessian and B̂ estimating the variance of the score, provides a robust alternative; substituting it into the Wald statistic ensures consistent inference.[15] The Wald statistic measures the squared standardized distance between the estimated constraint c(θ̂) and its null value 0, scaled by its estimated asymptotic variance; the p-value is computed as the upper-tail probability of the χ²_Q distribution at the observed W, i.e., 1 − F(W), where F is the χ²_Q cumulative distribution function.[11][13]
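As a small illustration of building the statistic from the observed information, the sketch below tests H_0: λ = 1 for an exponential rate parameter, approximating the negative Hessian of the log-likelihood by a finite difference; the data are simulated and the setup is illustrative rather than prescriptive.

```python
# Sketch: forming the Wald statistic from the observed information (negative Hessian
# of the log-likelihood at the MLE) for an exponential rate parameter, H0: lambda = 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.exponential(scale=1 / 1.4, size=80)        # simulated data, true rate 1.4
n, lam_hat, lam0 = x.size, 1 / x.mean(), 1.0       # MLE of the rate is 1 / sample mean

def loglik(lam):                                    # exponential log-likelihood
    return n * np.log(lam) - lam * x.sum()

h = 1e-4                                            # central-difference second derivative
hess = (loglik(lam_hat + h) - 2 * loglik(lam_hat) + loglik(lam_hat - h)) / h ** 2
obs_info = -hess                                    # observed information, here ~ n / lam_hat^2

W = (lam_hat - lam0) ** 2 * obs_info                # Wald statistic, chi-squared(1) under H0
print(f"W = {W:.2f}, p = {stats.chi2.sf(W, 1):.3f}")
```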
Specific Formulations

Test for a Single Parameter
The Wald test for a single parameter addresses the null hypothesis H_0: θ = θ_0 against the alternative H_1: θ ≠ θ_0, where θ is a scalar parameter in a parametric model estimated via maximum likelihood.[13] This formulation specializes the general Wald test to cases where only one parameter is constrained under the null, simplifying the asymptotic distribution to a chi-squared with one degree of freedom. The test statistic is given by

W = (θ̂ − θ_0)² / se(θ̂)²,

where θ̂ is the maximum likelihood estimator (MLE) of θ and the standard error is se(θ̂) = 1/√I_n(θ̂), with I_n(θ̂) denoting the total observed Fisher information evaluated at θ̂.[16] Under H_0 and standard regularity conditions for asymptotic normality of the MLE, θ̂ is approximately normally distributed with mean θ_0 and variance I_n(θ_0)⁻¹, so the standardized pivot (θ̂ − θ_0)/se(θ̂) follows an asymptotic standard normal distribution, implying that W is asymptotically χ² with one degree of freedom.[13] The null is rejected at significance level α if W exceeds the (1 − α) quantile of the chi-squared distribution with one degree of freedom.[17] In practice, the standard error is routinely output by maximum likelihood software alongside the MLE, facilitating straightforward computation of W, or the equivalent z-statistic, without additional estimation under the null.[16]

This test exhibits duality with confidence intervals: the Wald confidence interval for θ is θ̂ ± z_{1−α/2} se(θ̂), where z_{1−α/2} is the (1 − α/2) quantile of the standard normal; thus H_0 is rejected at level α if and only if θ_0 falls outside this interval.[17]

A common application arises in logistic regression, where the model parameterizes the log-odds as a linear function of the predictors; to test whether a specific coefficient equals zero (corresponding to an odds ratio of 1), the Wald statistic uses the MLE of that coefficient and its standard error from the fitted model, yielding a test for no association between the corresponding predictor and the log-odds.[17] For instance, in binary outcome models with a binary predictor, this assesses whether the odds ratio equals unity.[18]

Under local alternatives, where the true parameter satisfies θ = θ_0 + h/√n for some fixed h, the limiting distribution of the pivot is normal with a nonzero mean, and the asymptotic power can be written in terms of the standard normal cumulative distribution function evaluated at shifts of the critical value by this non-centrality.[13] For finite samples, particularly small n, the normal approximation may underperform, and practitioners often approximate the distribution of the pivot using a t-distribution with n − k degrees of freedom (where k is the number of estimated parameters) to improve coverage and test validity, though this remains an ad hoc adjustment without exact guarantees in general MLE settings.[19]
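A brief sketch of the logistic-regression use described above, assuming the Python statsmodels package and simulated data; the per-coefficient z-statistic reported by the fit is the square root of the Wald statistic for that coefficient.

```python
# Sketch: single-coefficient Wald test in a logistic regression (statsmodels assumed).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.3 + 0.8 * x)))          # true log-odds: 0.3 + 0.8 x
y = rng.binomial(1, p)

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)

beta_hat = fit.params[1]                        # MLE of the slope
se = fit.bse[1]                                 # estimated standard error reported by the fit
W = (beta_hat / se) ** 2                        # Wald statistic for H0: beta = 0
print(f"W = {W:.2f}")                           # compare: fit.pvalues[1] is the two-sided p-value
```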
Tests for Multiple Parameters

The Wald test extends naturally to joint hypotheses involving multiple parameters in a vector θ, where the null hypothesis specifies H_0: Rθ = r, with R a Q × P matrix of full rank and r a Q × 1 vector (including subset tests as a special case, e.g., testing that a chosen subvector of θ equals zero).[1][20] Under standard regularity conditions for maximum likelihood estimation, the test statistic is given by

W = (Rθ̂ − r)′ [R I_n(θ̂)⁻¹ R′]⁻¹ (Rθ̂ − r),

where θ̂ is the maximum likelihood estimator and I_n(θ̂) is the total observed Fisher information matrix evaluated at θ̂. Asymptotically under H_0, W follows a chi-squared distribution with Q degrees of freedom.[1][21]

Computation of W requires estimating the asymptotic covariance matrix of θ̂, which is I_n(θ̂)⁻¹, and then forming the relevant submatrix or transformation via R. For hypotheses on a subset of parameters (e.g., θ_2 in a partitioned parameter vector θ = (θ_1, θ_2)), this involves inverting the corresponding block of the variance-covariance matrix; the full matrix accounts for correlations among parameters, ensuring the test adjusts for dependencies in the estimates.[20][1]

In multiple linear regression models, the Wald test for the joint significance of a subset of coefficients is asymptotically equivalent to the F-test for the same hypothesis, particularly when the error variance is estimated; both assess whether the restricted model (imposing Rβ = r) fits significantly worse than the unrestricted model. For instance, testing the overall significance of all slope coefficients yields a Wald statistic that, under normality and large samples, aligns with the standard F-statistic for the regression. The degrees of freedom remain Q for the chi-squared approximation, corresponding to the dimension of the hypothesis; for testing subsets, one can apply the test sequentially to nested or partitioned groups of parameters, adjusting the matrix R accordingly to evaluate hierarchical restrictions.[22][20]
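A short sketch of such a joint test, assuming the statsmodels formula interface and simulated data; it requests both the chi-squared and the F form of the same Wald hypothesis.

```python
# Sketch: joint Wald test that two slope coefficients are both zero in a linear
# regression, in chi-squared and F form (statsmodels formula API assumed).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 1.0 + 0.5 * df["x1"] + rng.normal(size=100)   # x2 has no true effect

res = smf.ols("y ~ x1 + x2", data=df).fit()

chi2_version = res.wald_test("x1 = 0, x2 = 0", use_f=False)  # chi-squared form, Q = 2
f_version = res.wald_test("x1 = 0, x2 = 0", use_f=True)      # equivalent F form
print(chi2_version)
print(f_version)   # here the chi-squared statistic equals Q times the F statistic
```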
Advanced Considerations

Application to Nonlinear Hypotheses
The Wald test extends naturally to nonlinear hypotheses of the form H_0: c(θ) = 0, where c is a Q-dimensional continuously differentiable function and θ is the P-dimensional parameter vector with Q ≤ P.[1][5] Under standard regularity conditions, including the full rank of the Jacobian matrix C(θ) (the matrix of partial derivatives of c), the test statistic is approximated using the delta method as

W = n c(θ̂)′ [C(θ̂) Î(θ̂)⁻¹ C(θ̂)′]⁻¹ c(θ̂),

where θ̂ is the unrestricted maximum likelihood estimator, Î(θ̂) is the estimated information matrix per observation, and n is the sample size; asymptotically, W follows a χ²_Q distribution under the null.[1][5] This form arises from a first-order Taylor expansion of c around θ, linearizing the constraint and leveraging the asymptotic normality of θ̂.[1][5]

Computing the statistic requires evaluating the Jacobian C(θ̂), which can be obtained analytically if c permits or via numerical differentiation otherwise; the choice of evaluation point introduces sensitivity, as small variations in the estimate may affect the matrix's conditioning, particularly when the null holds near the boundary of the parameter space.[5][23] For instance, in nonlinear regression models, one might test H_0: θ_1 − exp(θ_2) = 0 to assess whether a linear parameter equals the exponential of another, yielding the Jacobian [1, −exp(θ_2)] and substituting into the statistic for inference.[1][5]

The test's validity relies on the smoothness of c (ensuring the Taylor approximation holds) and asymptotic arguments, with the chi-squared distribution emerging under local alternatives; for finite samples, where the approximation may falter due to nonlinearity, bootstrap resampling of the score or residuals provides improved inference by empirically estimating the distribution of W.[1][5][24] This linearization-based approach generalizes the multiparameter linear case, in which c(θ) = Rθ − r and the Jacobian is the constant matrix R.[1]

Sensitivity to Reparameterization
The Wald test lacks invariance under reparameterization, meaning that equivalent hypotheses expressed in different parameter forms can yield different test statistics and p-values. Consider a parameter θ transformed to φ = h(θ), where h is a differentiable one-to-one function; the null hypothesis H_0: θ = θ_0 is mathematically equivalent to H_0: φ = h(θ_0), yet the Wald statistic W generally differs between the two formulations due to the test's reliance on the unrestricted maximum likelihood estimate (MLE) θ̂. This non-invariance arises because the test approximates the parameter's distribution locally at θ̂, introducing asymmetry when the transformation is nonlinear.[25][26]

The mathematical basis for this sensitivity lies in the transformation of the Fisher information matrix. Under reparameterization, the information matrix for φ relates to that for θ via I_φ(φ) = J′ I_θ(θ) J, where J = ∂θ/∂φ is the Jacobian of the inverse transformation; the Jacobian evaluated at θ̂ affects the estimated variance used in the Wald statistic. Although the asymptotic distribution remains χ² under regularity conditions, the finite-sample approximation varies with the parameterization, leading to inconsistent inferences across equivalent models. For instance, in a Poisson regression context where the mean λ is the parameter of interest, testing H_0: λ = 1 directly differs from testing H_0: log(λ) = 0 (the log-link parameterization), often producing divergent p-values due to the curvature induced by the exponential transformation.[25][27]

This lack of invariance has practical consequences, potentially resulting in contradictory conclusions from the same data depending on the chosen parameterization, which undermines the test's reliability in curved exponential families or nonlinear models. Empirical studies demonstrate bias in such settings, particularly when the MLE is far from the null value, exacerbating size distortions. To mitigate these issues, practitioners are advised to report results in the original or scientifically meaningful parameterization and consider invariant alternatives like the likelihood ratio test, which remains unaffected by reparameterization. Profile likelihood methods can also provide more robust inference in problematic cases.[25][26][28]

Comparisons and Alternatives
Relation to Likelihood Ratio Test
The likelihood ratio (LR) test is a classical hypothesis testing procedure that compares the goodness-of-fit of two nested models: the full (unrestricted) model and the restricted model under the null hypothesis H_0. The test statistic is given by

LR = 2 [log L(θ̂) − log L(θ̂_0)],

where L(θ̂) is the maximized likelihood under the alternative hypothesis and L(θ̂_0) is the maximized likelihood under H_0, with θ̂ and θ̂_0 denoting the corresponding maximum likelihood estimators (MLEs). Under H_0, for large samples, LR asymptotically follows a chi-squared distribution with degrees of freedom equal to the difference in the number of free parameters between the full and restricted models. Computing the LR statistic requires estimating MLEs under both the full and restricted models.

The Wald test, LR test, and score test (also known as the Lagrange multiplier test) are asymptotically equivalent under the null hypothesis and local alternatives, all converging in distribution to a chi-squared random variable with the appropriate degrees of freedom as the sample size increases. This equivalence arises because each test leverages the quadratic approximation of the log-likelihood function near the MLE, leading to identical asymptotic behavior under standard regularity conditions. However, the tests differ in their construction: the Wald test assesses the distance of the unrestricted MLE from the null value using its estimated covariance matrix, whereas the LR test measures the difference in log-likelihood fits between the unrestricted and restricted models.

Key differences include computational demands and finite-sample performance. The Wald test is often simpler to compute because it relies solely on the unrestricted MLE and its estimated dispersion, avoiding the need to refit the restricted model, which can be advantageous for large datasets or post-hoc analyses. In contrast, the LR test requires maximizing the likelihood under the restriction, making it more computationally intensive but also more stable in finite samples, particularly near the boundary of the parameter space, where the Wald test can exhibit inflated Type I error rates or reduced power. Additionally, the LR test is invariant to reparameterization of the model, yielding the same p-value regardless of how the parameters are transformed, whereas the Wald test's statistic and inference can vary with reparameterization. Empirical studies confirm that the LR test generally has higher power than the Wald test, especially near the null hypothesis, though the Wald test may suffice for very large samples where asymptotic approximations hold well.

For instance, in linear regression under normality assumptions, the LR test for comparing nested models (e.g., testing whether a subset of coefficients is zero) is equivalent to the F-test, which assesses the incremental explained variance. The Wald test, meanwhile, directly tests individual or joint coefficients using t- or F-statistics based on the coefficient estimates and their standard errors. This equivalence highlights the LR test's role in formal model comparison, while the Wald test is more suited to targeted parameter inquiries. Preferences between the two tests depend on context: the Wald test is favored for its computational efficiency in large-scale or exploratory analyses, but the LR test is generally recommended for small to moderate samples, nested model comparisons, or when invariance and robustness near boundaries are critical, as supported by power comparisons showing the LR test's superiority in such scenarios.
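To make the computational contrast concrete, the sketch below (simulated data, statsmodels assumed) computes both statistics for the same slope hypothesis in a logistic regression: the Wald statistic needs only the full fit, while the LR statistic also needs the restricted fit.

```python
# Sketch comparing the Wald and likelihood-ratio statistics for H0: slope = 0
# in a logistic regression (statsmodels assumed, data simulated).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=150)
y = rng.binomial(1, 1 / (1 + np.exp(-0.6 * x)))

X_full = sm.add_constant(x)
full = sm.Logit(y, X_full).fit(disp=0)                    # unrestricted model
null = sm.Logit(y, np.ones((len(x), 1))).fit(disp=0)      # restricted model (intercept only)

W = (full.params[1] / full.bse[1]) ** 2                   # Wald: needs only the full fit
LR = 2 * (full.llf - null.llf)                            # LR: needs both fits
print(f"Wald = {W:.3f}, LR = {LR:.3f}")                   # close but generally not identical
```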
Relation to Score Test

The score test, also known as the Lagrange multiplier test, evaluates the null hypothesis by assessing the gradient of the log-likelihood function, or score, at the maximum likelihood estimate obtained under the restriction imposed by the null. The test statistic is given by

S = U(θ̂_0)′ [n I(θ̂_0)]⁻¹ U(θ̂_0),

where U(θ̂_0) is the score vector evaluated at the restricted estimate θ̂_0, I(θ̂_0) is the Fisher information matrix per observation, and n is the sample size; under the null hypothesis, S follows an asymptotic χ² distribution with degrees of freedom equal to the number of restrictions. Unlike the Wald test, which relies on the unrestricted estimate, the score test requires only the estimation of the restricted model, making it computationally efficient for testing whether constraints hold without needing to optimize the full alternative model.[5][29]

Asymptotically, the score test and the Wald test are equivalent under the null hypothesis, both converging in distribution to the same chi-squared limit, as they also do under local alternatives; this equivalence arises from the quadratic approximation of the log-likelihood in large samples, ensuring that the tests yield the same decisions with probability approaching 1 as the sample size grows. However, they differ in their use of information: the Wald test employs the estimate and variance at the unrestricted maximum likelihood estimator θ̂, while the score test uses them at the restricted estimate θ̂_0, leading to theoretical contrasts in sensitivity to model misspecification, with the score test more vulnerable if the null is poorly specified and the Wald test performing better once the full model is confirmed. In settings with estimating functions or composite likelihoods, the score test's robustness ties to the Godambe information matrix, which generalizes the Fisher information to account for model uncertainty beyond parametric assumptions.[5][12][30]

In generalized linear models (GLMs), the score test is commonly applied to assess overall model fit by testing the null that all slope parameters are zero against the alternative of a full model with predictors; for instance, fitting an intercept-only null model allows computation of the score statistic to evaluate whether the predictors collectively improve fit, often yielding a value that rejects the null if significant predictors exist. This contrasts with the Wald test in GLMs, which might test individual parameters after fitting the full model. Preferences favor the score test for preliminary or diagnostic checks in large models, as it avoids the optimization burden of the unrestricted fit, though combined use with the Wald test enhances reliability in comprehensive analyses.[31][29]
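A compact worked contrast for a Poisson mean, where both statistics have closed forms: the score statistic evaluates the variance at the null value λ_0, while the Wald statistic evaluates it at the MLE. The data here are simulated for illustration.

```python
# Sketch contrasting the score test and the Wald test for a Poisson mean, H0: lambda = lambda_0.
# Score: U(lam0)^2 / I_n(lam0) = n*(mean - lam0)^2 / lam0 (variance evaluated at the null);
# Wald: (mean - lam0)^2 / (mean / n)    = n*(mean - lam0)^2 / mean (variance at the MLE).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.poisson(lam=2.4, size=60)
lam0 = 2.0
n, lam_hat = x.size, x.mean()

score_stat = n * (lam_hat - lam0) ** 2 / lam0
wald_stat = n * (lam_hat - lam0) ** 2 / lam_hat

print(f"score = {score_stat:.3f}, p = {stats.chi2.sf(score_stat, 1):.3f}")
print(f"wald  = {wald_stat:.3f}, p = {stats.chi2.sf(wald_stat, 1):.3f}")
```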
Applications and Implementations

Use in Regression Models
In linear regression models, the Wald statistic for the null hypothesis that a single regression coefficient equals zero is mathematically equivalent to the square of the corresponding t-test statistic, providing a chi-squared distributed test under the null for large samples.[17] For joint tests involving multiple coefficients, the Wald statistic relates directly to the F-statistic through W = qF, where q is the number of restrictions, allowing assessment of overall model significance while maintaining the exact finite-sample distribution properties of the F-test in homoskedastic linear settings.[32] In generalized linear models (GLMs), such as logistic regression, the Wald test evaluates the significance of parameters in the link function, for instance testing whether a coefficient equals zero, indicating no effect of the predictor on the log-odds of the outcome.[33] This application is particularly useful in binary outcome models where traditional t-tests do not apply, as the Wald statistic leverages the asymptotic normality of maximum likelihood estimators to assess deviations from the null, often reported alongside confidence intervals for interpretability.[34]

For nonlinear least squares estimation, the Wald test assesses parameter significance in models with non-linear parameterizations, such as the Michaelis-Menten equation used in enzyme kinetics, where it tests hypotheses on the rate or saturation parameters by comparing estimated values to hypothesized ones scaled by their asymptotic standard errors.[35] Caveats arise due to potential curvature in the parameter space, which can distort confidence regions, but the test remains a standard tool for inference when bootstrap alternatives are computationally intensive.[35]

In time series analysis, particularly ARIMA models, the Wald test examines hypotheses on autoregressive coefficients to assess stationarity, such as testing whether the sum of AR coefficients equals unity, which would indicate a unit root and non-stationarity.[36] This joint restriction test helps determine if differencing is needed, with the statistic providing evidence against stationarity when roots of the autoregressive polynomial lie inside the unit circle, guiding model specification in forecasting applications.[37] A prominent econometric application is in testing the Capital Asset Pricing Model (CAPM), where a Wald test such as the Gibbons-Ross-Shanken statistic jointly evaluates whether the intercepts (alphas) are zero across multiple assets in time-series regressions of excess returns, assessing if the model correctly prices assets without systematic mispricing.[38] Rejection indicates deviations from CAPM predictions, informing asset pricing research and investment strategies.[39]

To address finite-sample issues like heteroskedasticity in regression models, the Wald test incorporates robust standard errors, such as Huber-White estimators, which adjust the variance-covariance matrix to account for non-constant error variances without altering the point estimates.[40] This adjustment ensures valid inference under violations of homoskedasticity assumptions, as the sandwich form of the estimator consistently estimates the true asymptotic variance, making the test reliable in empirical settings with clustered or cross-sectional data.[41]
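A brief sketch of such a robust Wald test, assuming statsmodels; the hypothetical model has an error variance that grows with the regressor, so the conventional and Huber-White (HC1) covariance matrices produce different Wald statistics for the same point estimate.

```python
# Sketch: Wald inference with heteroskedasticity-robust (Huber-White) standard errors.
# The point estimates are unchanged; only the covariance matrix used in W differs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
x = rng.uniform(0, 2, size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5 + x, size=200)     # error variance grows with x
df = pd.DataFrame({"x": x, "y": y})

ols_fit = smf.ols("y ~ x", data=df).fit()                    # conventional covariance
robust_fit = smf.ols("y ~ x", data=df).fit(cov_type="HC1")   # sandwich covariance

print(ols_fit.wald_test("x = 0.5", use_f=False))
print(robust_fit.wald_test("x = 0.5", use_f=False))          # same estimate, different W
```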
Software and Computational Aspects

The Wald test is implemented in various statistical software packages, facilitating its application in regression and generalized linear models. In R, the lm() function for linear models provides coefficient estimates and associated Wald test p-values directly in the summary() output, derived from t-statistics equivalent to Wald tests under normality assumptions. For generalized linear models via glm(), the summary() method similarly reports Wald chi-squared statistics and p-values for individual coefficients, with the variance-covariance matrix accessible through vcov() for custom joint tests using functions like wald.test() from the aod package or waldtest() from lmtest. In Python's statsmodels library, the wald_test() method in regression result objects, such as OLSResults, enables testing of linear hypotheses on coefficients, including constraints specified as matrices or formulas for joint significance. Stata employs the post-estimation test command to perform Wald tests on linear combinations of parameters, while SAS's PROC GENMOD outputs Wald chi-squared statistics and p-values in the parameter estimates table for generalized linear models, with Type 3 analysis options for contrasts.
Computational challenges arise particularly in high-dimensional settings, where inverting the estimated information matrix to obtain the covariance matrix can lead to numerical instability due to ill-conditioning or near-singularity. In such cases, Hessian-based estimates of the observed information matrix (the negative second derivatives of the log-likelihood) may be replaced by more stable approximations, such as the outer product of gradients (sandwich estimator) or regularized inverses to mitigate rank deficiencies. These issues are exacerbated in sparse high-dimensional models, where divide-and-conquer algorithms have been proposed to distribute computations while preserving asymptotic validity of the Wald statistic.
Best practices emphasize reporting robust standard errors, such as heteroskedasticity-consistent (HC) or cluster-robust variants, to account for model misspecification and improve inference reliability, especially in the presence of heteroskedasticity or dependence. For small samples, where the asymptotic chi-squared approximation may inflate Type I errors, simulating critical values from the null distribution or using bootstrap methods is recommended to enhance accuracy. Additionally, Wald confidence intervals can be interpreted alongside Bayesian credible intervals in hybrid analyses, providing frequentist guarantees that align with posterior summaries in large samples.
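One way to follow the small-sample advice above is to simulate the Wald statistic's null distribution directly. The sketch below does this for a Bernoulli proportion with n = 15 under H_0: p = 0.5; the setup is purely illustrative.

```python
# Sketch: simulating the finite-sample null distribution of a single-parameter Wald
# statistic (Bernoulli p, H0: p = 0.5) instead of relying on the chi-squared cutoff.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p0, reps = 15, 0.5, 20_000

def wald_stat(sample, p_null):
    p_hat = sample.mean()
    var = p_hat * (1 - p_hat) / sample.size
    return np.inf if var == 0 else (p_hat - p_null) ** 2 / var

sims = np.array([wald_stat(rng.binomial(1, p0, size=n), p0) for _ in range(reps)])
crit_sim = np.quantile(sims, 0.95)            # simulated 5% critical value
crit_chi2 = stats.chi2.ppf(0.95, df=1)        # asymptotic cutoff (~3.84)
print(f"simulated critical value: {crit_sim:.2f}, chi-squared cutoff: {crit_chi2:.2f}")
```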
Advances since 2020 have extended Wald tests to machine learning contexts, including their use in variable selection for deep neural networks.[42] More recent work, as of 2025, has applied Wald tests in item response theory for power analysis in educational assessments and compared them with machine learning methods in survival analysis.[43][44]