Wilks's lambda distribution
In statistics, Wilks' lambda distribution (named for Samuel S. Wilks) is a probability distribution used in multivariate hypothesis testing, especially with regard to the likelihood-ratio test and multivariate analysis of variance (MANOVA).
Definitions
Wilks' lambda distribution is defined from two independent Wishart distributed variables as the ratio distribution of their determinants,[1]

\Lambda \sim \lambda(p, m, n) \equiv \frac{\det(\mathbf{A})}{\det(\mathbf{A} + \mathbf{B})}

given

\mathbf{A} \sim W_p(\Sigma, m), \qquad \mathbf{B} \sim W_p(\Sigma, n)

independent and with m \ge p,

where p is the number of dimensions. In the context of likelihood-ratio tests m is typically the error degrees of freedom, and n is the hypothesis degrees of freedom, so that n + m is the total degrees of freedom.[1]
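The defining ratio can be checked directly by simulation. Below is a minimal Python sketch (assuming NumPy and SciPy are available; the values p = 3, m = 20, n = 5 are illustrative, not from the text) that draws the two independent Wishart matrices and forms the determinant ratio:

```python
import numpy as np
from scipy.stats import wishart

p, m, n = 3, 20, 5           # dimension, error df, hypothesis df (illustrative)
rng = np.random.default_rng(0)
Sigma = np.eye(p)            # the distribution of the ratio does not depend on Sigma

# Independent Wishart draws: A ~ W_p(Sigma, m), B ~ W_p(Sigma, n)
A = wishart(df=m, scale=Sigma).rvs(size=10_000, random_state=rng)
B = wishart(df=n, scale=Sigma).rvs(size=10_000, random_state=rng)

# Wilks' lambda is the determinant ratio |A| / |A + B|
lam = np.linalg.det(A) / np.linalg.det(A + B)
print(lam.mean(), lam.min(), lam.max())   # all values lie in (0, 1]
```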
Properties
There is a symmetry among the parameters of the Wilks distribution,[1]

\lambda(p, m, n) \sim \lambda(n, m + n - p, p).
Approximations
Computations or tables of the Wilks' distribution for higher dimensions are not readily available and one usually resorts to approximations. One approximation, attributed to M. S. Bartlett and valid for large m,[2] allows Wilks' lambda to be approximated with a chi-squared distribution,

\left(\frac{p - n + 1}{2} - m\right) \log \Lambda(p, m, n) \sim \chi^2_{np}.

Another approximation is attributed to C. R. Rao.[1][3]
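As a quick illustration of how the Bartlett approximation is used in practice, the following Python sketch converts an observed lambda into an approximate p-value (parameter values are illustrative; the parameterization follows the formula above, with m the error and n the hypothesis degrees of freedom):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_pvalue(lam, p, m, n):
    """Approximate upper-tail p-value for an observed Wilks' lambda
    via Bartlett's large-m chi-squared approximation (a sketch)."""
    stat = ((p - n + 1) / 2 - m) * np.log(lam)  # positive, since log(lam) <= 0
    return chi2.sf(stat, df=n * p)

print(bartlett_pvalue(0.4, p=3, m=20, n=5))     # illustrative values
```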
Related distributions
The distribution can be related to a product of independent beta-distributed random variables

\Lambda(p, m, n) \sim \prod_{i=1}^{n} u_i, \qquad u_i \sim B\left(\frac{m + i - p}{2}, \frac{p}{2}\right) \text{ independently.}
As such it can be regarded as a multivariate generalization of the beta distribution.
It follows directly that for a one-dimensional problem, when the Wishart distributions are one-dimensional with p = 1 (i.e., chi-squared-distributed), then the Wilks' distribution equals the beta-distribution with a certain parameter set,

\lambda(1, m, n) \sim B\left(\frac{m}{2}, \frac{n}{2}\right).
From the relations between a beta and an F-distribution, Wilks' lambda can be related to the F-distribution when one of the parameters of the Wilks lambda distribution is either 1 or 2, e.g.,[1]

\frac{1 - \Lambda(p, m, 1)}{\Lambda(p, m, 1)} \sim \frac{p}{m - p + 1} F_{p,\, m - p + 1}

and

\frac{1 - \sqrt{\Lambda(p, m, 2)}}{\sqrt{\Lambda(p, m, 2)}} \sim \frac{p}{m - p + 1} F_{2p,\, 2(m - p + 1)}.
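The product-of-betas representation above lends itself to a direct numerical check. The sketch below (illustrative parameters; independence of the u_i is assumed as stated above) samples the product and compares its Monte Carlo mean with the exact mean implied by E[u_i] = (m + i - p)/(m + i):

```python
import numpy as np
from scipy.stats import beta

p, m, n = 3, 20, 5                       # illustrative parameters
rng = np.random.default_rng(1)

# Lambda(p, m, n) ~ prod_i u_i with u_i ~ B((m + i - p)/2, p/2), independent
u = np.column_stack([
    beta((m + i - p) / 2, p / 2).rvs(size=50_000, random_state=rng)
    for i in range(1, n + 1)
])
lam = u.prod(axis=1)

# By independence, E[Lambda] = prod_i (m + i - p) / (m + i)
exact_mean = np.prod([(m + i - p) / (m + i) for i in range(1, n + 1)])
print(lam.mean(), exact_mean)            # the two should agree closely
```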
References
[edit]- ^ a b c d e f Kanti Mardia, John T. Kent and John Bibby (1979). Multivariate Analysis. Academic Press. ISBN 0-12-471250-9.
- ^ M. S. Bartlett (1954). "A Note on the Multiplying Factors for Various Approximations". J R Stat Soc Ser B. 16 (2): 296–298. JSTOR 2984057.
- ^ C. R. Rao (1951). "An Asymptotic Expansion of the Distribution of Wilks' Criterion". Bulletin de l'Institut International de Statistique. 33: 177–180.
Wilks's lambda distribution
Overview
Definition and interpretation
Wilks's lambda, denoted \Lambda, is defined as the ratio of the determinant of the error sum-of-squares and cross-products matrix \mathbf{E} to the determinant of the total sum-of-squares and cross-products matrix \mathbf{E} + \mathbf{H}, where \mathbf{H} is the hypothesis sum-of-squares and cross-products matrix:

\Lambda = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}.

This statistic arises in the context of multivariate normal distributions with equal covariance matrices across groups and serves as the likelihood ratio criterion for testing hypotheses concerning equality of mean vectors in multivariate analysis of variance (MANOVA).[8] The value of \Lambda measures the proportion of the generalized variance in the dependent variables that remains unexplained by differences between groups. Specifically, \Lambda close to 1 implies that group effects account for little of the total variance, consistent with the null hypothesis of no significant multivariate differences, while \Lambda close to 0 indicates that group differences explain a substantial portion of the variance, rejecting the null in favor of significant effects.[9] Samuel S. Wilks originally introduced the lambda criterion in 1932 within generalizations of the analysis of variance for multivariate observations, with further developments in 1946 specifically for testing equality of covariance matrices under normality, which was later extended to broader applications in MANOVA for mean vector comparisons assuming common covariances.[10][8]

Historical development
Samuel S. Wilks introduced the lambda statistic in 1932 as a likelihood ratio test for multivariate hypotheses under the assumption of multivariate normality, specifically for assessing the equality of group centroids in the analysis of variance framework.[1] This marked a foundational step in extending univariate methods to multiple response variables, addressing the need for testing composite hypotheses involving covariance structures.[11]

In the 1940s and 1950s, extensions to broader MANOVA frameworks were advanced by researchers including D. N. Lawley and Harold Hotelling. Lawley proposed a generalization of Fisher's z-test in 1938, leading to the development of the Hotelling-Lawley trace statistic, which complements Wilks's lambda by focusing on the sum of eigenvalues for hypothesis testing in multivariate settings. Hotelling further refined these ideas in 1951, generalizing Student's t-ratio to multivariate cases and enhancing the trace-based approaches for comparing group means. Earlier, Bartlett (1947) provided a chi-squared approximation for Wilks's lambda. F-approximations for Wilks's lambda were provided by C. R. Rao in 1951.

The 1960s saw significant progress in handling small sample sizes through approximations developed by K. C. S. Pillai and A. G. Constantine. Pillai introduced the trace criterion in 1955, improving power and accessibility for finite samples. Constantine contributed detailed tables and non-central distribution approximations in 1963, facilitating practical computation of critical values and p-values.

Recent advances have focused on exact distributions, particularly for challenging cases. For instance, a 2011 paper by Grilo and Coelho derives exact distributions of Wilks's lambda for independence tests involving two sets of variables with odd numbers of dimensions, enabling precise inference without approximations. These developments build on earlier work for even dimensions, enhancing applicability in high-dimensional data.[12]

The statistic's influence extended to computational statistics in the 1980s onward, with implementations in software like SAS (via PROC GLM for MANOVA tests) and later in R (through the stats package's manova function), standardizing its use in empirical research.

Mathematical formulation
The lambda statistic
Wilks's lambda statistic arises in the context of testing hypotheses about mean vectors in multivariate data drawn from normal distributions. Consider a one-way multivariate analysis of variance setup with g groups, where observations in the i-th group consist of n_i independent p-dimensional random vectors \mathbf{x}_{ij} \sim N_p(\boldsymbol{\mu}_i, \Sigma) for i = 1, \dots, g and j = 1, \dots, n_i, assuming a common covariance matrix \Sigma across groups. The total sample size is N = \sum_{i=1}^{g} n_i. The null hypothesis is H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g, which posits equality of the group mean vectors. Under these assumptions, the likelihood ratio test statistic for H_0 is derived by comparing the maximized likelihood under the null (where means are equal) to that under the alternative (where means may differ). The hypothesis sums-of-squares and cross-products matrix is

\mathbf{H} = \sum_{i=1}^{g} n_i (\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})^\top,

where \bar{\mathbf{x}}_i is the group sample mean and \bar{\mathbf{x}} is the overall sample mean. The error sums-of-squares and cross-products matrix is

\mathbf{E} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\mathbf{x}_{ij} - \bar{\mathbf{x}}_i)(\mathbf{x}_{ij} - \bar{\mathbf{x}}_i)^\top.

Wilks's lambda is then given by

\Lambda = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|},

and the test statistic is often taken as -N \log \Lambda, which under H_0 follows approximately a chi-squared distribution for large samples. This formulation measures the ratio of the generalized variance within groups to the total generalized variance, providing a multivariate generalization of the univariate F-test.[13]

The assumptions underlying this derivation include multivariate normality of the observations and homogeneity of the covariance matrices across groups. The normality ensures the likelihood function is well-defined, while homogeneity (i.e., \Sigma is the same for all groups) allows pooling of the error matrix. This assumption can be validated using Box's M test, a likelihood ratio test for equality of covariance matrices.[13]

The lambda statistic generalizes to arbitrary linear hypotheses in the multivariate linear model \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U}, where \mathbf{Y} is an N \times p response matrix, \mathbf{X} is an N \times q design matrix of full column rank, \mathbf{B} is a q \times p parameter matrix, and the rows of \mathbf{U} are i.i.d. N_p(\mathbf{0}, \Sigma). To test a linear hypothesis of the form \mathbf{L}\mathbf{B}\mathbf{M} = \mathbf{0}, where \mathbf{L} is c \times q of full row rank, \mathbf{M} is p \times u of full column rank, and \mathbf{0} is c \times u, the hypothesis and error matrices are constructed from the least-squares estimates under the full and restricted models, yielding \Lambda in terms of these matrices. This extends the one-way case by incorporating the design structure in \mathbf{X} to specify between- and within-subspace variability.[14]
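To make the construction concrete, here is a short Python sketch (synthetic data; the helper name wilks_lambda is ours, not a library function) computing H, E, and Λ for a one-way layout:

```python
import numpy as np

def wilks_lambda(groups):
    """Wilks' lambda for a one-way MANOVA; `groups` is a list of
    (n_i x p) arrays, one per group (a sketch, not a library API)."""
    X = np.vstack(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    H = np.zeros((p, p))                   # hypothesis (between-group) SSCP
    E = np.zeros((p, p))                   # error (within-group) SSCP
    for g in groups:
        d = (g.mean(axis=0) - grand_mean)[:, None]
        H += len(g) * d @ d.T
        c = g - g.mean(axis=0)
        E += c.T @ c
    return np.linalg.det(E) / np.linalg.det(E + H)

rng = np.random.default_rng(2)
groups = [rng.normal(loc=mu, size=(15, 3)) for mu in (0.0, 0.2, 0.5)]
print(wilks_lambda(groups))               # near 1 under H0, smaller under shifts
```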
Probability density function

The probability density function of Wilks's lambda (Λ) under the null hypothesis in the central case is derived from its representation as the product of p independent beta-distributed random variables, where p is the number of response variables, m is the hypothesis degrees of freedom, and s is the error degrees of freedom with s > p - 1 and m > 0.[15] Specifically, Λ ~ ∏_{j=1}^p U_j, where U_j ~ Beta((s - j + 1)/2, (m - p + j)/2) independently for j = 1, ..., p, and 0 < Λ ≤ 1.[15]

The exact density f(Λ) for the general multivariate case is complex due to the product structure and is expressed in closed form using the Fox H-function or Meijer G-function.[15] Alternatively, the density can be represented as an infinite series expansion using hypergeometric functions of matrix argument, though this form is primarily used for computational purposes.[16] The cumulative distribution function (CDF), F(Λ) = ∫_0^Λ f(u) du, lacks a simple closed form and poses computational challenges, often requiring numerical integration or series approximations for evaluation.[17]

For the special case when p = 1, the distribution reduces to a beta distribution with density f(Λ) = C Λ^{a - 1} (1 - Λ)^{b - 1}, where a = (s + 1)/2, b = m/2, and C = 1 / B(a, b).[15] In the non-central case, the density adjusts to account for non-zero eigenvalues in the non-centrality matrix, typically expressed as a mixture of central densities weighted by the non-central parameters.[16]
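Since the CDF lacks a simple closed form, one pragmatic route is Monte Carlo evaluation through the product-of-betas representation stated above. A minimal Python sketch (the function name and parameter values are illustrative):

```python
import numpy as np
from scipy.stats import beta

def wilks_cdf_mc(x, p, m, s, n_draws=200_000, seed=0):
    """Monte Carlo CDF of Wilks' lambda using the representation above:
    Lambda = prod_j U_j with U_j ~ Beta((s - j + 1)/2, (m - p + j)/2)."""
    rng = np.random.default_rng(seed)
    lam = np.ones(n_draws)
    for j in range(1, p + 1):
        lam *= beta((s - j + 1) / 2, (m - p + j) / 2).rvs(
            size=n_draws, random_state=rng)
    return (lam <= x).mean()

print(wilks_cdf_mc(0.5, p=3, m=5, s=20))   # illustrative parameters
```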
Properties

Moments and characteristic function
The moments of Wilks's lambda distribution provide key insights into its analytical properties, particularly under the null hypothesis in multivariate settings such as MANOVA. The distribution arises from the ratio of determinants of Wishart matrices, and its moments can be derived from a representation as a product of independent beta random variables: \Lambda = \prod_{j=1}^{p} U_j, where U_j \sim B(a_j, b_j) independently with a_j = (s - j + 1)/2 and b_j = (m - p + j)/2, with s denoting the error degrees of freedom, p the dimension, and m the hypothesis degrees of freedom.[16] These moments and the characteristic function pertain to the central (null) distribution.

The exact expectation of \Lambda is given by

E[\Lambda] = \prod_{j=1}^{p} \frac{\Gamma(a_j + 1)\,\Gamma(a_j + b_j)}{\Gamma(a_j)\,\Gamma(a_j + b_j + 1)},

which simplifies using the property \Gamma(x + 1) = x\,\Gamma(x) to

E[\Lambda] = \prod_{j=1}^{p} \frac{a_j}{a_j + b_j} = \prod_{j=1}^{p} \frac{s - j + 1}{s + m - p + 1}.

For large s, every factor approaches 1, reflecting the tendency of \Lambda to approach 1 as sample size increases under the null. The exact expectation of \log \Lambda is

E[\log \Lambda] = \sum_{j=1}^{p} \left[\psi(a_j) - \psi(a_j + b_j)\right],

where \psi is the digamma function, obtained by linearity from the beta representation since E[\log U_j] = \psi(a_j) - \psi(a_j + b_j) with a_j = (s - j + 1)/2 and b_j = (m - p + j)/2. For large s, E[\log \Lambda] \to 0. Higher central moments follow from the general k-th moment

E[\Lambda^k] = \prod_{j=1}^{p} \frac{\Gamma(a_j + k)\,\Gamma(a_j + b_j)}{\Gamma(a_j)\,\Gamma(a_j + b_j + k)},

which involves ratios of gamma functions and facilitates derivations like unbiased estimators in multivariate tests by correcting for bias in likelihood ratio statistics.[18]

The variance of \log \Lambda is

\operatorname{Var}[\log \Lambda] = \sum_{j=1}^{p} \left[\psi_1(a_j) - \psi_1(a_j + b_j)\right],

where \psi_1 is the trigamma function (first-order polygamma). Higher cumulants of \log \Lambda are given by

\kappa_r = \sum_{j=1}^{p} \left[\psi_{r-1}(a_j) - \psi_{r-1}(a_j + b_j)\right]

for r \ge 2, with the first cumulant \kappa_1 = E[\log \Lambda], enabling Edgeworth expansions for finite-sample inference. These moments underpin bias corrections in estimators for non-centrality parameters in tests like MANOVA.[18]

The characteristic function of \log \Lambda under the central (null) case is

E[e^{it \log \Lambda}] = E[\Lambda^{it}] = \prod_{j=1}^{p} \frac{\Gamma(a_j + it)\,\Gamma(a_j + b_j)}{\Gamma(a_j)\,\Gamma(a_j + b_j + it)},

a product of ratios of complex gamma functions that fully characterizes the distribution and supports numerical inversion for densities and tails. For the non-central case (alternative hypothesis with non-centrality matrix \boldsymbol{\Omega}), moments involve confluent hypergeometric functions of matrix argument expressed via series expansions; these are computed using Laplace approximations for accuracy in high dimensions, aiding power calculations and unbiased estimation of \boldsymbol{\Omega}.[4]
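The gamma-function form of the moments is straightforward to evaluate numerically. The sketch below (function names are ours; log-space gamma functions avoid overflow) computes E[Λ], Var[Λ], E[log Λ], and Var[log Λ] under the beta-product representation above:

```python
import numpy as np
from scipy.special import gammaln, psi, polygamma

def log_moment(k, p, m, s):
    """log E[Lambda^k] from the beta representation with
    a_j = (s - j + 1)/2 and b_j = (m - p + j)/2 (a sketch)."""
    j = np.arange(1, p + 1)
    a = (s - j + 1) / 2
    b = (m - p + j) / 2
    return np.sum(gammaln(a + k) - gammaln(a)
                  + gammaln(a + b) - gammaln(a + b + k))

p, m, s = 3, 5, 20                         # illustrative parameters
mean = np.exp(log_moment(1, p, m, s))
var = np.exp(log_moment(2, p, m, s)) - mean**2
j = np.arange(1, p + 1)
a, b = (s - j + 1) / 2, (m - p + j) / 2
e_log = np.sum(psi(a) - psi(a + b))                     # E[log Lambda]
v_log = np.sum(polygamma(1, a) - polygamma(1, a + b))   # Var[log Lambda]
print(mean, var, e_log, v_log)
```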
Asymptotic behavior

Under the null hypothesis, as the error degrees of freedom s tends to infinity with fixed dimension p and hypothesis degrees of freedom m, the statistic -s \log \Lambda converges in distribution to a chi-squared distribution with pm degrees of freedom. To improve the approximation in moderate samples, Bartlett introduced a correction factor, such that -\left[s - \tfrac{1}{2}(p - m + 1)\right] \log \Lambda provides a closer approximation to the \chi^2_{pm} distribution. In high-dimensional regimes where the dimension grows with the sample size such that p/s converges to a constant in (0, 1), the centered and scaled logarithm of Wilks's lambda, specifically (\log \Lambda - \mu_{s,p})/\sigma_{s,p}, converges in distribution to a standard normal random variable, where the centering \mu_{s,p} and scaling \sigma_{s,p} are explicit expressions involving p, m, and s. This asymptotic normality facilitates inference when traditional fixed-dimension approximations fail.

The tail behavior of Wilks's lambda under the null can be analyzed using large deviation principles derived from concentration inequalities for the log-determinant of Wishart matrices, yielding exponential decay rates for probabilities P(\Lambda \le \varepsilon) for small \varepsilon, with the rate function depending on the spectral properties of the underlying covariance. Under a fixed alternative hypothesis with positive effect size, as the sample size tends to infinity, Wilks's lambda converges to 0 in probability, ensuring the consistency of the associated test, as the hypothesis matrix accumulates sufficient non-centrality to dominate the error matrix.

Inference and approximations
Null hypothesis distribution
Under the null hypothesis of no effects in multivariate analysis of variance (MANOVA), Wilks's lambda statistic Λ follows a central distribution that depends on the degrees of freedom parameters: p (number of response variables), m (hypothesis degrees of freedom), and s (error degrees of freedom). The exact cumulative distribution function (CDF) of Λ can be derived from its probability density function, which is expressed as an infinite series involving hypergeometric functions or as a product of independent beta-distributed random variables. Specifically, under the null, Λ is distributed as the product ∏_{j=1}^p U_j, where each U_j follows a beta distribution with shape parameters ((s - j + 1)/2, m/2), allowing the CDF to be computed via convolution or series expansion for numerical evaluation.[4][19]

Recursive algorithms facilitate efficient computation of the exact CDF, particularly for moderate parameter values, by iteratively evaluating the series terms or using continued fraction representations of the characteristic function of -2 log Λ. One such approach, developed by Takane, employs recursion to calculate percentage points and tail probabilities, avoiding direct integration of the complex density. These methods are implemented in statistical software for precise p-value calculation when analytical closed forms are intractable.[15]

Near-exact methods for p-value computation under the null often rely on simulation, where samples of Λ are generated by simulating independent Wishart-distributed matrices for the hypothesis (H ~ W_p(I_p, m)) and error (E ~ W_p(I_p, s)) components, then computing Λ = |E| / |H + E| repeatedly (e.g., 10,000–100,000 iterations) to approximate the empirical CDF and obtain Monte Carlo p-values. To enhance efficiency, gamma approximations to the matrix variates are used, modeling the diagonal elements or eigenvalues of the Wishart matrices as sums of independent gamma random variables, which speeds up simulations while preserving accuracy for the null distribution. These simulation-based approaches achieve near-exact control of Type I error rates, with empirical error below 0.001 for α = 0.05 in most cases.[20][21]

Critical values of Λ for selected parameters at common significance levels (e.g., α = 0.05, 0.01) are tabulated in multivariate statistics references for practical reference, particularly for small to moderate degrees of freedom; values for other combinations are generated via software such as SAS PROC GLM. These tables ensure conservative or exact Type I error control when using the lower-tail rejection region (reject H_0 if Λ < critical value).

The null distribution of Wilks's lambda provides robust Type I error control at the nominal level α, even in designs prone to assumption violations. In particular, for repeated measures MANOVA, the test maintains the specified Type I error rate under sphericity violations, where correlations among repeated measures deviate from equality; this robustness arises because the multivariate formulation does not rely on the compound symmetry assumption required by univariate F-tests, which can inflate Type I errors up to 70% in severe cases. Adjustments like Greenhouse-Geisser or Huynh-Feldt epsilon corrections are unnecessary for Wilks's lambda, making it preferable for such scenarios.[22][23]
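The Wishart-simulation recipe described above is only a few lines of Python; this sketch (illustrative parameters, not a library routine) returns a Monte Carlo p-value for an observed lambda:

```python
import numpy as np
from scipy.stats import wishart

def wilks_mc_pvalue(lam_obs, p, m, s, n_sim=20_000, seed=0):
    """Monte Carlo p-value under H0 by simulating independent
    hypothesis and error Wishart matrices (a sketch)."""
    rng = np.random.default_rng(seed)
    I = np.eye(p)
    H = wishart(df=m, scale=I).rvs(size=n_sim, random_state=rng)
    E = wishart(df=s, scale=I).rvs(size=n_sim, random_state=rng)
    lam_null = np.linalg.det(E) / np.linalg.det(E + H)
    return (lam_null <= lam_obs).mean()  # small Lambda is evidence against H0

print(wilks_mc_pvalue(0.4, p=3, m=5, s=20))   # illustrative values
```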
Approximate distributions and critical values

For large sample sizes, the null distribution of Wilks's lambda (Λ) can be approximated using a chi-squared distribution, where -2 \ln \Lambda \approx \chi^2_{mp}, with m denoting the degrees of freedom for the hypothesis and p the number of response variables.[15] This approximation performs well when the sample size is much larger than the product mp, providing a simple asymptotic test for the multivariate null hypothesis.[24]

Bartlett's modification improves the chi-squared approximation for finite samples by applying a correction factor ρ to adjust for bias, yielding -2 \rho \ln \Lambda \approx \chi^2_{mp}, where ρ = 1 - \frac{(m+1)(mp+1) - 2p}{6(N-1)(m+1)} and N is the total sample size.[25] This correction, originally derived in the context of multivariate tests, reduces the type I error rate in moderate samples and is particularly useful when exact distributions are intractable.[26]

Rao's F-approximation offers an alternative for smaller samples or when the chi-squared test lacks power, transforming Λ into an F statistic: F = \frac{s - p + 1}{mp} (\Lambda^{-1/m} - 1) \approx F_{mp,\, s - p + 1}, where s = N - g and g is the number of groups.[27] This method, developed for biometric applications, better approximates the distribution when m is small relative to s, enabling the use of standard F tables for hypothesis testing at common significance levels such as α = 0.05 or 0.01.[26]

In cases where Wilks's lambda performs poorly, such as high-dimensional data (p approaching N) or unbalanced designs, alternative statistics like Pillai's trace (trace of the hypothesis sum of squares projected onto the error space) or the Hotelling-Lawley trace (trace of E^{-1} H) are preferred, as they exhibit greater robustness to violations of normality or sphericity.[6] These criteria converge asymptotically to the same distribution as Λ but provide more stable p-values in finite samples.[2]

Critical values for these approximations are typically obtained from F distribution tables or computed directly in statistical software; for instance, R's summary.manova function applies Rao's F-approximation to Wilks's lambda and reports p-values at α = 0.05 and 0.01, while SAS PROC GLM uses similar transformations for multivariate tests.[28] Such implementations facilitate practical inference without requiring manual table lookups.
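The F transformation above translates directly into code. A minimal sketch, implementing the formula exactly as stated in this section (names and parameter values are illustrative):

```python
from scipy.stats import f

def rao_f_pvalue(lam, p, m, s):
    """F approximation as stated above:
    F = ((s - p + 1) / (m p)) * (Lambda^{-1/m} - 1) ~ F(mp, s - p + 1)."""
    F = (s - p + 1) / (m * p) * (lam ** (-1.0 / m) - 1.0)
    return f.sf(F, m * p, s - p + 1)

print(rao_f_pvalue(0.4, p=3, m=2, s=20))   # illustrative values
```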
For non-standard scenarios, including non-normal data or complex covariance structures, simulation-based methods generate empirical critical values by resampling under the null hypothesis, often yielding more accurate rejection regions than parametric approximations.[29] Bootstrap procedures, either parametric (resampling from fitted multivariate normals) or nonparametric (permuting residuals), further refine these by estimating percentile-based critical values from B replicated samples, enhancing reliability in unbalanced or heteroscedastic designs.[29]
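As one concrete realization of the resampling idea, the following Python sketch uses a label-permutation null, a nonparametric choice we are substituting for the residual-permutation scheme mentioned above (data and helper names are illustrative):

```python
import numpy as np

def wilks_lambda(X, labels):
    """Wilks' lambda for observations X (N x p) grouped by labels."""
    grand = X.mean(axis=0)
    p = X.shape[1]
    H = np.zeros((p, p)); E = np.zeros((p, p))
    for lab in np.unique(labels):
        g = X[labels == lab]
        d = (g.mean(axis=0) - grand)[:, None]
        H += len(g) * d @ d.T
        c = g - g.mean(axis=0)
        E += c.T @ c
    return np.linalg.det(E) / np.linalg.det(E + H)

def permutation_pvalue(X, labels, n_perm=2_000, seed=0):
    """Empirical p-value: permute group labels to generate the null."""
    rng = np.random.default_rng(seed)
    lam_obs = wilks_lambda(X, labels)
    hits = sum(wilks_lambda(X, rng.permutation(labels)) <= lam_obs
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)   # small Lambda => evidence against H0

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(mu, 1.0, size=(15, 3)) for mu in (0.0, 0.3, 0.6)])
labels = np.repeat([0, 1, 2], 15)
print(permutation_pvalue(X, labels))
```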
