Projection matrix
In statistics, the projection matrix $(\mathbf{P})$,[1] sometimes also called the influence matrix[2] or hat matrix $(\mathbf{H})$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value.[3][4] The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.
Definition
If the vector of response values is denoted by $\mathbf{y}$ and the vector of fitted values by $\hat{\mathbf{y}}$,
$$\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}.$$
As $\hat{\mathbf{y}}$ is usually pronounced "y-hat", the projection matrix is also named the hat matrix as it "puts a hat on $\mathbf{y}$".
Application for residuals
The formula for the vector of residuals $\mathbf{r}$ can also be expressed compactly using the projection matrix:
$$\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{P}\mathbf{y} = (\mathbf{I} - \mathbf{P})\mathbf{y},$$
where $\mathbf{I}$ is the identity matrix. The matrix $\mathbf{M} \equiv \mathbf{I} - \mathbf{P}$ is sometimes referred to as the residual maker matrix or the annihilator matrix.
The covariance matrix of the residuals $\mathbf{r}$, by error propagation, equals
- $\mathbf{\Sigma}_\mathbf{r} = (\mathbf{I} - \mathbf{P})^\mathsf{T}\,\mathbf{\Sigma}\,(\mathbf{I} - \mathbf{P})$,
where $\mathbf{\Sigma}$ is the covariance matrix of the error vector (and by extension, the response vector as well). For the case of linear models with independent and identically distributed errors in which $\mathbf{\Sigma} = \sigma^{2}\mathbf{I}$, this reduces to:[3]
- $\mathbf{\Sigma}_\mathbf{r} = (\mathbf{I} - \mathbf{P})\sigma^{2}$.
Intuition
Let $\mathbf{A}$ be a matrix whose columns span a subspace, and let $\mathbf{b}$ be a vector to be projected onto that column space. From the figure, it is clear that the closest point from the vector $\mathbf{b}$ onto the column space of $\mathbf{A}$ is $\mathbf{A}\mathbf{x}$, and is one where we can draw a line orthogonal to the column space of $\mathbf{A}$. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so
- $\mathbf{A}^\mathsf{T}(\mathbf{b} - \mathbf{A}\mathbf{x}) = \mathbf{0}$.
From there, one rearranges, so
- $\mathbf{A}^\mathsf{T}\mathbf{b} = \mathbf{A}^\mathsf{T}\mathbf{A}\mathbf{x} \quad\Rightarrow\quad \mathbf{x} = \left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}\mathbf{b}$.
Therefore, since $\mathbf{A}\mathbf{x}$ is on the column space of $\mathbf{A}$, the projection matrix, which maps $\mathbf{b}$ onto $\mathbf{A}\mathbf{x}$, is $\mathbf{P} = \mathbf{A}\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}$.
Linear model
Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $\mathbf{X}$ is a matrix of explanatory variables (the design matrix), $\boldsymbol{\beta}$ is a vector of unknown parameters to be estimated, and $\boldsymbol{\varepsilon}$ is the error vector.
Many types of models and techniques are subject to this formulation. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.
Ordinary least squares
When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are
$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{y},$$
so the fitted values are
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}.$$
Therefore, the projection matrix (and hat matrix) is given by
$$\mathbf{P} \equiv \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}.$$
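As a quick check of the formulas above, the following NumPy sketch builds the hat matrix for a small simulated design and verifies that $\mathbf{P}\mathbf{y}$ reproduces the fitted values $\mathbf{X}\hat{\boldsymbol{\beta}}$; the data, seed, and dimensions are arbitrary choices for illustration, not part of the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with intercept
y = rng.normal(size=n)

# Hat matrix P = X (X^T X)^{-1} X^T (explicit form, fine for a small example)
P = X @ np.linalg.solve(X.T @ X, X.T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimate
print(np.allclose(P @ y, X @ beta_hat))       # True: P maps y to the fitted values
```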
Weighted and generalized least squares
The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is $\mathbf{\Sigma}$. Then since
- $\hat{\boldsymbol{\beta}}_{\text{GLS}} = \left(\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{y},$
the hat matrix is thus
$$\mathbf{H} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1},$$
and again it may be seen that $\mathbf{H}^{2} = \mathbf{H}$, though now it is no longer symmetric.
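A short NumPy sketch, with an arbitrary positive definite $\mathbf{\Sigma}$ invented for illustration, makes the last point concrete: the generalized hat matrix stays idempotent but loses symmetry once the errors are correlated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=n)])

A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)        # an arbitrary positive definite error covariance
Sigma_inv = np.linalg.inv(Sigma)

# H = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}
H = X @ np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv)

print(np.allclose(H @ H, H))           # True: still idempotent
print(np.allclose(H, H.T))             # False in general: no longer symmetric
```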
Properties
The projection matrix has a number of useful algebraic properties.[5][6] In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix $\mathbf{X}$.[4] (Note that $\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}$ is the pseudoinverse of $\mathbf{X}$.) Some facts of the projection matrix in this setting are summarized as follows:[4]
- $\mathbf{u} = (\mathbf{I} - \mathbf{P})\mathbf{y}$, and $\mathbf{u} = \mathbf{y} - \mathbf{P}\mathbf{y} \perp \mathbf{X}$.
- $\mathbf{P}$ is symmetric, and so is $\mathbf{M} \equiv \mathbf{I} - \mathbf{P}$.
- $\mathbf{P}$ is idempotent: $\mathbf{P}^{2} = \mathbf{P}$, and so is $\mathbf{M}$.
- If $\mathbf{X}$ is an n × r matrix with $\operatorname{rank}(\mathbf{X}) = r$, then $\operatorname{tr}(\mathbf{P}) = r$.
- The eigenvalues of $\mathbf{P}$ consist of r ones and n − r zeros, while the eigenvalues of $\mathbf{M}$ consist of n − r ones and r zeros.[7]
- $\mathbf{X}$ is invariant under $\mathbf{P}$: $\mathbf{P}\mathbf{X} = \mathbf{X}$, hence $(\mathbf{I} - \mathbf{P})\mathbf{X} = \mathbf{0}$.
- $\mathbf{P}$ is unique for a given subspace: the orthogonal projection onto a column space does not depend on the particular matrix chosen to span it.
The projection matrix corresponding to a linear model is symmetric and idempotent, that is, $\mathbf{P}^{2} = \mathbf{P}$. However, this is not always the case; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.
For linear models, the trace of the projection matrix is equal to the rank of $\mathbf{X}$, which is the number of independent parameters of the linear model.[8] For other models such as LOESS that are still linear in the observations $\mathbf{y}$, the projection matrix can be used to define the effective degrees of freedom of the model.
Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression.
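To illustrate how these diagnostics come out of the hat matrix, here is a hedged NumPy sketch on made-up data: one deliberately extreme predictor value produces a large leverage, and the standard Cook's distance formula quantifies its influence. The data, seed, and printed comparisons are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
x = rng.normal(size=n)
x[0] = 8.0                                   # one unusual predictor value -> high leverage
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(P)                               # leverages
e = y - P @ y                                # residuals
s2 = e @ e / (n - p)                         # residual variance estimate

# Cook's distance: D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2
cooks_d = e**2 / (p * s2) * h / (1.0 - h)**2
print(h[0], h[1:].mean())                    # leverage of the unusual point vs. the typical level
print(cooks_d[0], np.median(cooks_d))        # its Cook's distance vs. a typical one
```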
Blockwise formula
Suppose the design matrix $\mathbf{X}$ can be decomposed by columns as $\mathbf{X} = [\mathbf{A}, \mathbf{B}]$. Define the hat or projection operator as $\mathbf{P}\{\mathbf{X}\} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}$. Similarly, define the residual operator as $\mathbf{M}\{\mathbf{X}\} = \mathbf{I} - \mathbf{P}\{\mathbf{X}\}$. Then the projection matrix can be decomposed as follows:[9]
$$\mathbf{P}\{\mathbf{X}\} = \mathbf{P}\{\mathbf{A}\} + \mathbf{P}\{\mathbf{M}\{\mathbf{A}\}\mathbf{B}\},$$
where, e.g., $\mathbf{P}\{\mathbf{A}\} = \mathbf{A}\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}$ and $\mathbf{M}\{\mathbf{A}\} = \mathbf{I} - \mathbf{P}\{\mathbf{A}\}$. There are a number of applications of such a decomposition. In the classical application $\mathbf{A}$ is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the fixed effects model, where $\mathbf{A}$ is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of $\mathbf{X}$ without explicitly forming the matrix $\mathbf{X}$, which might be too large to fit into computer memory.
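The following NumPy sketch illustrates the classical application with an all-ones first block: projecting out the intercept amounts to centering, and the blockwise sum reproduces the full hat matrix. The matrices and dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
A = np.ones((n, 1))                          # block A: the intercept column
B = rng.normal(size=(n, 2))                  # block B: remaining regressors
X = np.hstack([A, B])

P_A = A @ A.T / n                            # P{A}: averaging operator
M_A = np.eye(n) - P_A                        # M{A}: centering matrix
MB = M_A @ B                                 # B with its column means removed

P_block = P_A + MB @ np.linalg.solve(MB.T @ MB, MB.T)   # P{A} + P{M{A}B}
P_full = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(P_block, P_full))          # True
```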
History
The hat matrix was introduced by John Wilder Tukey in 1972. An article by Hoaglin, D.C. and Welsch, R.E. (1978) gives the properties of the matrix and also many examples of its application.
See also
References
- [1] Basilevsky, Alexander (2005). Applied Matrix Algebra in the Statistical Sciences. Dover. pp. 160–176. ISBN 0-486-44538-0.
- [2] "Data Assimilation: Observation influence diagnostic of a data assimilation system" (PDF). Archived from the original (PDF) on 2014-09-03.
- [3] Hoaglin, David C.; Welsch, Roy E. (February 1978). "The Hat Matrix in Regression and ANOVA" (PDF). The American Statistician. 32 (1): 17–22. doi:10.2307/2683469. hdl:1721.1/1920. JSTOR 2683469.
- [4] Freedman, David A. (2009). Statistical Models: Theory and Practice. Cambridge University Press.
- [5] Gans, P. (1992). Data Fitting in the Chemical Sciences. Wiley. ISBN 0-471-93412-7.
- [6] Draper, N. R.; Smith, H. (1998). Applied Regression Analysis. Wiley. ISBN 0-471-17082-8.
- [7] Amemiya, Takeshi (1985). Advanced Econometrics. Cambridge: Harvard University Press. pp. 460–461. ISBN 0-674-00560-0.
- [8] "Proof that trace of 'hat' matrix in linear regression is rank of X". Stack Exchange. April 13, 2017.
- [9] Rao, C. Radhakrishna; Toutenburg, Helge; Shalabh; Heumann, Christian (2008). Linear Models and Generalizations (3rd ed.). Berlin: Springer. p. 323. ISBN 978-3-540-74226-5.
Projection matrix
Fundamentals
Definition
In linear algebra, a projection matrix $\mathbf{P}$ is defined as a square matrix that satisfies the idempotence condition $\mathbf{P}^{2} = \mathbf{P}$. This property characterizes projections onto a subspace, where applying the operator twice yields the same result as applying it once.[4] For orthogonal projections, which are the most common in applications involving Euclidean spaces, the matrix is additionally symmetric, satisfying $\mathbf{P}^\mathsf{T} = \mathbf{P}$. This symmetry ensures that the projection is perpendicular to the complementary subspace. Such matrices project vectors from $\mathbb{R}^{n}$ onto a lower-dimensional subspace while preserving angles and lengths within that subspace.[5][4] In statistical linear models, the projection matrix takes the specific form of the hat matrix $\mathbf{H} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}$, where $\mathbf{X}$ is the design matrix with full column rank. This matrix projects the observed response vector $\mathbf{y}$ onto the column space of $\mathbf{X}$, producing the vector of fitted values $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$. The hat matrix inherits the idempotence and symmetry properties, making it an orthogonal projection operator in this context.[6][7] The diagonal elements $h_{ii}$ of the hat matrix, known as leverages, quantify the influence of the $i$-th response value $y_i$ on the corresponding fitted value $\hat{y}_i$. These leverages satisfy $0 \le h_{ii} \le 1$ for each $i$, with their sum equal to the number of parameters in the model. This formulation assumes familiarity with basic concepts of vectors, matrices, and linear subspaces.[6]
Geometric Interpretation
In linear algebra, a projection matrix $\mathbf{P}$ provides an orthogonal mapping of a vector $\mathbf{v}$ in a vector space onto a subspace spanned by the columns of a matrix $\mathbf{A}$, such that the projected vector $\mathbf{P}\mathbf{v}$ lies within the subspace and the residual vector $\mathbf{v} - \mathbf{P}\mathbf{v}$ is perpendicular to every vector in that subspace.[8] This orthogonality ensures that the projection minimizes the Euclidean distance between $\mathbf{v}$ and the subspace, representing the "closest point" approximation in the geometric sense.[8] Consider an example in $\mathbb{R}^{2}$, where the subspace is the x-axis, spanned by the standard basis vector $\mathbf{e}_1 = (1, 0)^\mathsf{T}$. The corresponding orthogonal projection matrix is $\mathbf{P} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which maps any vector $(x, y)^\mathsf{T}$ to $(x, 0)^\mathsf{T}$, effectively dropping the y-component while preserving the x-coordinate.[9] Geometrically, this can be visualized as a vector being "dropped" perpendicularly from its tip to the x-axis line, forming a right angle between the residual and the projected point on the axis; such diagrams often illustrate the decomposition with the residual orthogonal to the subspace.[8] This focus on orthogonality distinguishes orthogonal projections from oblique projections, where the residuals are not perpendicular to the subspace, leading to a slanted "drop" rather than a right-angled one; however, the orthogonal case maintains the property that applying $\mathbf{P}$ repeatedly to a vector in the subspace yields the same result, keeping it fixed within the subspace.[10] In higher dimensions, such as projecting onto a plane in $\mathbb{R}^{3}$, the visualization extends to the residual being perpendicular to the entire plane, akin to a shadow cast by perpendicular light rays onto a flat surface.[8]
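A small NumPy sketch of the higher-dimensional picture, using an arbitrary plane in $\mathbb{R}^{3}$ chosen only for illustration: the projected vector lies in the plane, the residual is perpendicular to it, and points already in the plane are left fixed.

```python
import numpy as np

# Orthogonal projection onto the plane in R^3 spanned by the columns of A
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
P = A @ np.linalg.solve(A.T @ A, A.T)

v = np.array([2.0, -1.0, 5.0])
proj = P @ v                             # the "shadow" of v on the plane
resid = v - proj

print(np.allclose(A.T @ resid, 0.0))     # residual is perpendicular to the plane
print(np.allclose(P @ proj, proj))       # vectors already in the plane stay fixed
```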
Properties
Algebraic Properties
A projection matrix $\mathbf{P}$ in linear algebra is characterized by its idempotence, meaning $\mathbf{P}^{2} = \mathbf{P}$. This property implies that $\mathbf{P}$ acts as a projection onto an invariant subspace, where applying $\mathbf{P}$ multiple times yields the same result as applying it once, leaving vectors in the range of $\mathbf{P}$ unchanged while mapping others to that subspace. To see this, consider the spectral decomposition of $\mathbf{P}$, where its eigenvalues are either 0 or 1; thus, $\mathbf{P}^{2}$ has the same eigenvalues, consistent with $\mathbf{P}^{2} = \mathbf{P}$.[11] For orthogonal projections, $\mathbf{P}$ is also symmetric, satisfying $\mathbf{P}^\mathsf{T} = \mathbf{P}$. This symmetry ensures that the projection is self-adjoint, preserving inner products in the sense that $\langle \mathbf{P}\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{P}\mathbf{y} \rangle$ for all vectors $\mathbf{x}, \mathbf{y}$. Consequently, $\mathbf{P}$ coincides with its own Moore–Penrose pseudoinverse in the context of orthogonal projections onto the column space, as $\mathbf{P}^{+} = \mathbf{P}$.[11][4] The rank of $\mathbf{P}$ equals the dimension of its range, which is the subspace onto which it projects, and this rank is also equal to the trace of $\mathbf{P}$. Since the nonzero eigenvalues of $\mathbf{P}$ are all 1 and their count matches the rank, the trace, as the sum of eigenvalues, directly gives $\operatorname{rank}(\mathbf{P}) = \operatorname{tr}(\mathbf{P})$.[11] Regarding null spaces, the kernel of $\mathbf{P}$ is precisely the range of $\mathbf{I} - \mathbf{P}$, while the kernel of $\mathbf{I} - \mathbf{P}$ is the range of $\mathbf{P}$. This decomposition highlights how $\mathbf{P}$ and $\mathbf{I} - \mathbf{P}$ partition the space into the projected subspace and its orthogonal complement for orthogonal projections.[11] A standard formula for the orthogonal projection matrix onto the column space of a full column rank matrix $\mathbf{A}$ (with rank equal to its number of columns) is $\mathbf{P} = \mathbf{A}\mathbf{A}^{+}$, where $\mathbf{A}^{+}$ is the Moore–Penrose pseudoinverse of $\mathbf{A}$. For full column rank, $\mathbf{A}^{+} = \left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}$, so $\mathbf{P} = \mathbf{A}\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}$. To derive this, note that $\mathbf{P}$ must satisfy $\mathbf{P}^{2} = \mathbf{P}$ and project onto $\operatorname{col}(\mathbf{A})$. Substituting yields $\mathbf{P}^{2} = \mathbf{A}\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}\mathbf{A}\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T} = \mathbf{A}\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T} = \mathbf{P}$, confirming idempotence, and the columns of $\mathbf{P}$ lie in the range of $\mathbf{A}$ since $\mathbf{P} = \mathbf{A}\left[\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}\right]$. Symmetry follows from the transpose: $\mathbf{P}^\mathsf{T} = \mathbf{A}\left[\left(\mathbf{A}^\mathsf{T}\mathbf{A}\right)^{-1}\right]^\mathsf{T}\mathbf{A}^\mathsf{T} = \mathbf{P}$. This form generalizes via the pseudoinverse for rank-deficient cases, where $\mathbf{P} = \mathbf{A}\mathbf{A}^{+}$ remains the orthogonal projection onto $\operatorname{col}(\mathbf{A})$.[11][12]
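The properties listed above are easy to confirm numerically; the following sketch uses a random full-column-rank matrix (arbitrary seed and size) and NumPy's pseudoinverse to check the trace-rank identity, the self-pseudoinverse property, the kernel relation, and the $\mathbf{P} = \mathbf{A}\mathbf{A}^{+}$ form.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(7, 3))                  # full column rank with probability 1
P = A @ np.linalg.solve(A.T @ A, A.T)
M = np.eye(7) - P

print(np.isclose(np.trace(P), np.linalg.matrix_rank(P)))  # trace equals rank (here 3)
print(np.allclose(np.linalg.pinv(P), P))                  # P is its own pseudoinverse
print(np.allclose(P @ M, 0.0))                            # range of I - P lies in the kernel of P
print(np.allclose(A @ np.linalg.pinv(A), P))              # P = A A^+
```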
Trace and Rank Interpretations
In the context of linear regression, the trace of a projection matrix $\mathbf{H}$ equals its rank, which corresponds to the number of columns $p$ in the design matrix $\mathbf{X}$, or equivalently, the number of parameters in the model, assuming $\mathbf{X}$ has full column rank. This equality holds because $\mathbf{H}$ is idempotent and symmetric, projecting onto the column space of $\mathbf{X}$. As established in the algebraic properties of projection matrices, $\operatorname{tr}(\mathbf{H}) = \operatorname{rank}(\mathbf{H})$.[13] The trace of $\mathbf{H}$ provides a measure of model complexity in regression analysis, representing the degrees of freedom associated with the fitted values. Specifically, $\operatorname{tr}(\mathbf{H}) = p$, where $p$ is the number of parameters, while the trace of the residual projection matrix $\mathbf{I} - \mathbf{H}$ equals $n - p$, indicating the degrees of freedom for the residuals, with $n$ denoting the number of observations. This interpretation links the projection's dimensionality directly to statistical inference, such as in estimating variance or testing hypotheses.[14][15] The diagonal elements $h_{ii}$ of $\mathbf{H}$, known as leverage values, quantify the influence of each observation on the fitted values; their sum equals $p$, yielding an average leverage of $p/n$. This sum underscores the overall "pull" of the model on the data, with high-leverage points potentially affecting fit stability.[16] For instance, in simple linear regression with an intercept and slope, the projection matrix has trace 2, reflecting the two parameters and the rank of the two-column design matrix.[17]
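The simple-regression example in the last sentence can be checked directly; this sketch (arbitrary data and seed) verifies that the hat matrix has trace 2, the residual projector has trace $n - 2$, and the average leverage is $2/n$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])              # intercept + slope: p = 2
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.isclose(np.trace(H), 2.0))               # tr(H) = p
print(np.isclose(np.trace(np.eye(n) - H), n - 2)) # residual degrees of freedom
print(np.isclose(np.diag(H).mean(), 2.0 / n))     # average leverage p / n
```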
Applications in Statistics
Ordinary Least Squares
In the ordinary least squares (OLS) framework, the linear regression model is expressed in matrix form as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{y}$ is an $n \times 1$ vector of observations, $\mathbf{X}$ is an $n \times p$ design matrix with full column rank, $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown parameters, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of errors.[18] The OLS estimator minimizes the residual sum of squares and is given by $\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}$.[18] This estimator is unbiased and has minimum variance among linear unbiased estimators under the model's assumptions.[19] The fitted values are $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$, which can be written compactly as $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$, where $\mathbf{H} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}$ is the projection matrix, also known as the hat matrix.[18] Geometrically, $\mathbf{H}$ orthogonally projects the response vector $\mathbf{y}$ onto the column space of $\mathbf{X}$, ensuring that $\hat{\mathbf{y}}$ lies in this subspace and minimizes the Euclidean distance to $\mathbf{y}$.[18] The residuals are defined as $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y}$, where $\mathbf{I}$ is the identity matrix.[19] These residuals are orthogonal to the columns of $\mathbf{X}$, satisfying $\mathbf{X}^\mathsf{T}\mathbf{e} = \mathbf{0}$, which implies that the unexplained variation is perpendicular to the fitted hyperplane spanned by the predictors.[18] This orthogonality property decomposes $\mathbf{y}$ into fitted and residual components with no correlation between them.[19] The projection matrix in OLS arises under the assumptions of a linear model, errors with zero conditional mean $\operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \mathbf{0}$, homoscedasticity $\operatorname{Var}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \sigma^{2}\mathbf{I}$, and uncorrelated errors.[18] These conditions ensure that $\mathbf{H}$ represents an orthogonal projection onto $\operatorname{col}(\mathbf{X})$, with $\mathbf{H}$ being symmetric and idempotent.[19] For a univariate example with an intercept, consider the simple linear model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $i = 1, \dots, n$, where the design matrix $\mathbf{X}$ has a first column of ones and a second column containing the $x_i$. The projection matrix then projects $\mathbf{y}$ onto the span of these two columns, yielding fitted values that represent the best linear approximation in the least squares sense; for instance, with centered predictors, the slope estimate aligns with this projection.[18]
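A brief NumPy sketch of the orthogonality statements, on simulated data chosen only for illustration: the residual vector is numerically orthogonal to every column of the design matrix and hence to the fitted values.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 25
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 + 1.5 * x + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
e = y - y_hat

print(np.allclose(X.T @ e, 0.0))    # residuals orthogonal to every column of X
print(np.allclose(y_hat @ e, 0.0))  # hence also orthogonal to the fitted values
```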
Generalized Least Squares
In the context of linear regression models where the errors exhibit heteroscedasticity or correlation, the generalized least squares (GLS) method employs a projection matrix adapted to the error covariance structure. Consider the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\operatorname{E}[\boldsymbol{\varepsilon}] = \mathbf{0}$ and $\operatorname{Var}[\boldsymbol{\varepsilon}] = \mathbf{\Sigma}$, with $\mathbf{\Sigma}$ a positive definite matrix. The GLS estimator of the parameter vector is given by
$$\hat{\boldsymbol{\beta}}_{\text{GLS}} = \left(\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{y}.$$
The fitted values are then $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$, where the projection matrix $\mathbf{H} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{\Sigma}^{-1}$ projects the response vector onto the column space of $\mathbf{X}$ with respect to the inner product induced by $\mathbf{\Sigma}^{-1}$.[20] This formulation generalizes the ordinary least squares case, which arises as a special instance when $\mathbf{\Sigma} = \sigma^{2}\mathbf{I}$.[20] A key special case of GLS is weighted least squares (WLS), which applies when the errors are uncorrelated but have unequal variances, so $\mathbf{\Sigma}$ is diagonal with entries $\sigma_i^{2}$. In this scenario, the weights are $w_i = 1/\sigma_i^{2}$, and the projection matrix becomes $\mathbf{H} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{W}$, where $\mathbf{W} = \operatorname{diag}(w_1, \dots, w_n)$ is the diagonal weight matrix. This weighting ensures that observations with smaller error variances contribute more to the estimation, yielding more efficient parameter estimates under the specified error structure.[20] The projection matrix in GLS retains the idempotence property, satisfying $\mathbf{H}^{2} = \mathbf{H}$, which confirms its role as a projection operator. However, unlike the orthogonal projection in ordinary least squares, $\mathbf{H}$ is generally not symmetric ($\mathbf{H}^\mathsf{T} \neq \mathbf{H}$) unless $\mathbf{\Sigma}$ is a scalar multiple of the identity matrix, reflecting the oblique nature of the projection in the presence of correlated or heteroscedastic errors.[20] For the residuals $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{y}$, the covariance matrix is $(\mathbf{I} - \mathbf{H})\mathbf{\Sigma}(\mathbf{I} - \mathbf{H})^\mathsf{T}$, demonstrating how the projection accounts for the underlying error structure to produce unbiased residuals with adjusted variance.[20]
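As a concrete WLS illustration (with invented unequal variances and an arbitrary seed), the sketch below forms the weighted hat matrix, confirms it is idempotent but not symmetric, and computes the residual covariance $(\mathbf{I}-\mathbf{H})\mathbf{\Sigma}(\mathbf{I}-\mathbf{H})^\mathsf{T}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

sigma2 = rng.uniform(0.5, 3.0, size=n)      # unequal error variances
W = np.diag(1.0 / sigma2)                   # weights w_i = 1 / sigma_i^2
Sigma = np.diag(sigma2)
I = np.eye(n)

# WLS hat matrix H = X (X^T W X)^{-1} X^T W
H = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)

print(np.allclose(H @ H, H))                # idempotent
print(np.allclose(H, H.T))                  # generally False: oblique in the Euclidean metric
resid_cov = (I - H) @ Sigma @ (I - H).T     # covariance of the residuals
print(np.allclose(resid_cov, resid_cov.T))  # a valid (symmetric) covariance matrix
```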
Advanced Formulations
Oblique Projections
Oblique projections extend the concept of projections beyond the orthogonal case by allowing the direction of projection to be non-perpendicular to the target subspace. A matrix $\mathbf{P}$ represents an oblique projection if it is idempotent, satisfying $\mathbf{P}^{2} = \mathbf{P}$, but not symmetric, so $\mathbf{P}^\mathsf{T} \neq \mathbf{P}$; consequently, the projected vector $\mathbf{P}\mathbf{v}$ and the residual vector $\mathbf{v} - \mathbf{P}\mathbf{v}$ are not orthogonal under the standard inner product. This distinguishes oblique projections from orthogonal ones, where symmetry holds and orthogonality is preserved.[21] The formula for the oblique projection onto the column space of a matrix $\mathbf{A}$, parallel to the orthogonal complement of the column space of a second matrix $\mathbf{B}$, is given by
$$\mathbf{P} = \mathbf{A}\left(\mathbf{B}^\mathsf{T}\mathbf{A}\right)^{-1}\mathbf{B}^\mathsf{T},$$
assuming the inner matrix $\mathbf{B}^\mathsf{T}\mathbf{A}$ is invertible and the factors have full rank to ensure well-definedness. This expression captures projections in spaces equipped with a non-standard metric, where the "parallel" direction aligns with the geometry defined by $\mathbf{B}$.[22] Representative examples of oblique projections appear in coordinate transformations and non-Euclidean metrics, such as affine mappings or engineering applications requiring angled views. For instance, in a setting analogous to multiview projections, a matrix acting on a 4D homogeneous coordinate system implements an oblique projection onto a plane along directions specified by two angles, preserving certain lengths while shearing others; such a matrix satisfies $\mathbf{P}^{2} = \mathbf{P}$ but lacks symmetry unless the shear terms vanish.[10] Oblique projections arise naturally when working in weighted spaces, such as those with a non-identity covariance structure $\mathbf{\Sigma}$, where the projection becomes oblique in the Euclidean metric but orthogonal under the weighted inner product $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{y}$. In this context, setting $\mathbf{B} = \mathbf{\Sigma}^{-1}\mathbf{A}$ in the formula yields the projection $\mathbf{A}\left(\mathbf{A}^\mathsf{T}\mathbf{\Sigma}^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^\mathsf{T}\mathbf{\Sigma}^{-1}$ used in such settings.[22]
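The two-matrix form above is easy to exercise numerically; in the sketch below (random $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{\Sigma}$ chosen purely for illustration) the oblique projector is idempotent, fixes $\operatorname{col}(\mathbf{A})$, and, for $\mathbf{B} = \mathbf{\Sigma}^{-1}\mathbf{A}$, is self-adjoint under the $\mathbf{\Sigma}^{-1}$ inner product.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 6, 2
A = rng.normal(size=(n, k))                 # target subspace: col(A)
B = rng.normal(size=(n, k))                 # defines the projection direction

# Oblique projector P = A (B^T A)^{-1} B^T
P = A @ np.linalg.solve(B.T @ A, B.T)
print(np.allclose(P @ P, P))                # idempotent
print(np.allclose(P @ A, A))                # fixes the target subspace
print(np.allclose(P, P.T))                  # generally False: oblique

# Setting B = Sigma^{-1} A gives the weighted (GLS-style) projection
C = rng.normal(size=(n, n))
Sigma = C @ C.T + n * np.eye(n)
B_w = np.linalg.solve(Sigma, A)
P_w = A @ np.linalg.solve(B_w.T @ A, B_w.T)
S_inv = np.linalg.inv(Sigma)
print(np.allclose(S_inv @ P_w, P_w.T @ S_inv))  # self-adjoint w.r.t. the Sigma^{-1} inner product
```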
Blockwise Formulas
Blockwise formulas for projection matrices arise in the context of partitioned design matrices, particularly when the column space is decomposed into nested or augmented subspaces. Consider a design matrix partitioned as $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2]$, where $\mathbf{X}_1$ and $\mathbf{X}_2$ are matrices whose columns span the initial subspace and its augmentation, respectively. The orthogonal projection matrix onto $\operatorname{col}(\mathbf{X})$, denoted $\mathbf{P}$, can be expressed in terms of the projection onto $\operatorname{col}(\mathbf{X}_1)$ and the contribution from the part of $\mathbf{X}_2$ orthogonal to $\mathbf{X}_1$:
$$\mathbf{P} = \mathbf{P}_1 + \mathbf{M}_1\mathbf{X}_2\left(\mathbf{X}_2^\mathsf{T}\mathbf{M}_1\mathbf{X}_2\right)^{-1}\mathbf{X}_2^\mathsf{T}\mathbf{M}_1, \qquad \mathbf{P}_1 = \mathbf{X}_1\left(\mathbf{X}_1^\mathsf{T}\mathbf{X}_1\right)^{-1}\mathbf{X}_1^\mathsf{T}, \quad \mathbf{M}_1 = \mathbf{I} - \mathbf{P}_1,$$
assuming $\mathbf{X}_2^\mathsf{T}\mathbf{M}_1\mathbf{X}_2$ is invertible, which holds if the columns of $\mathbf{X}_2$ span a subspace with full rank relative to the orthogonal complement of $\operatorname{col}(\mathbf{X}_1)$.[23] This decomposition highlights the incremental nature of the projection, where the first term projects onto the initial subspace, and the second term adds the projection onto the component of $\mathbf{X}_2$ orthogonal to $\mathbf{X}_1$. The second term, $\mathbf{M}_1\mathbf{X}_2\left(\mathbf{X}_2^\mathsf{T}\mathbf{M}_1\mathbf{X}_2\right)^{-1}\mathbf{X}_2^\mathsf{T}\mathbf{M}_1$, represents the incremental projection contributed by the added variables in $\mathbf{X}_2$. This structure is particularly useful in sequential model building, as it allows updating the projection matrix without recomputing the full inverse of $\mathbf{X}^\mathsf{T}\mathbf{X}$. In the context of the hat matrix in regression, this incremental term captures how additional predictors modify the fitted values and associated diagnostics. For example, in linear regression, adding a single covariate corresponding to a column vector $\mathbf{z}$ (so $\mathbf{X}_2 = \mathbf{z}$) updates the leverages, which are the diagonal elements of the hat matrix. The new leverage for observation $i$ becomes $h_{ii} + \dfrac{\left[(\mathbf{M}_1\mathbf{z})_i\right]^{2}}{\mathbf{z}^\mathsf{T}\mathbf{M}_1\mathbf{z}}$, reflecting the increased influence of that observation if it has a large component in the orthogonalized direction. This update formula facilitates efficient computation in stepwise regression procedures.[23] Regarding ranks, the decomposition preserves rank additivity in the projected subspaces: $\operatorname{rank}(\mathbf{P}) = \operatorname{rank}(\mathbf{P}_1) + \operatorname{rank}\!\left(\mathbf{M}_1\mathbf{X}_2\left(\mathbf{X}_2^\mathsf{T}\mathbf{M}_1\mathbf{X}_2\right)^{-1}\mathbf{X}_2^\mathsf{T}\mathbf{M}_1\right)$, where the second rank term measures the dimension added by the orthogonal component of $\mathbf{X}_2$. This follows from the direct sum decomposition of $\operatorname{col}(\mathbf{X})$, ensuring the total dimension is the sum of the dimensions of the nested and complementary subspaces.[23] The idempotence of $\mathbf{P}$ is preserved through this block structure, consistent with general algebraic properties of projections.
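The leverage-update formula can be verified directly; the sketch below (arbitrary simulated design) adds one covariate to an existing model and checks the incremental update against the leverages of the full design matrix.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 15
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # existing predictors
z = rng.normal(size=n)                                    # new covariate to add

P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M1 = np.eye(n) - P1
Mz = M1 @ z                                               # part of z orthogonal to col(X1)

h_new = np.diag(P1) + Mz**2 / (z @ Mz)                    # incremental leverage update

X = np.column_stack([X1, z])                              # full design matrix
H = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(h_new, np.diag(H)))                     # True
```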
Computation and Extensions
Numerical Aspects
Computing the projection matrix directly via the normal equations, $\mathbf{P} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}$, is numerically unstable, particularly when the design matrix $\mathbf{X}$ exhibits multicollinearity or near-collinearity, as the condition number of $\mathbf{X}^\mathsf{T}\mathbf{X}$ is the square of that of $\mathbf{X}$, amplifying rounding errors. Instead, a stable approach involves the QR decomposition $\mathbf{X} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ has orthonormal columns, yielding $\mathbf{P} = \mathbf{Q}\mathbf{Q}^\mathsf{T}$; this avoids explicit inversion and preserves numerical accuracy even for ill-conditioned $\mathbf{X}$. High multicollinearity, indicated by a large condition number $\kappa(\mathbf{X})$, inflates the variance of regression coefficients and can lead to extreme leverage values on the diagonal of $\mathbf{P}$, where leverages measure an observation's potential influence and may exceed typical cutoffs (e.g., $2p/n$ for $p$ predictors and $n$ observations).[24] For rank-deficient cases where $\operatorname{rank}(\mathbf{X}) < p$, the singular value decomposition (SVD) $\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^\mathsf{T}$ provides a robust alternative, with the projection onto the column space given by $\mathbf{P} = \mathbf{U}_r\mathbf{U}_r^\mathsf{T}$, where $\mathbf{U}_r$ comprises the first $r$ left singular vectors corresponding to nonzero singular values; this handles numerical rank deficiency effectively. Orthogonal projections are typically computed via QR factorization algorithms, such as Householder reflections, which introduce zeros column by column through orthogonal transformations and ensure backward stability at a cost of roughly $2mn^{2} - \tfrac{2}{3}n^{3}$ flops for an $m \times n$ matrix, or the modified Gram–Schmidt process, which orthogonalizes columns sequentially while mitigating the loss-of-orthogonality issues present in the classical version.[25] In practice, blockwise updates from symbolic decompositions can enhance efficiency for large-scale computations, though stability remains paramount. Implementations in statistical software often compute projections implicitly for efficiency and stability; for instance, the R function lm() employs QR decomposition internally to solve least squares problems, with leverages accessible via hatvalues() on the fitted model object. Similarly, Python's scipy.linalg.lstsq relies on LAPACK least-squares drivers based on QR or SVD to handle projections in linear models.
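A minimal sketch of the QR route, assuming SciPy is available (the data are simulated and the dimensions arbitrary): the leverages are read off as the squared row norms of the thin-QR factor $\mathbf{Q}$, without ever forming the $n \times n$ hat matrix.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(10)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Thin QR: X = Q R with Q (n x p) having orthonormal columns, so P = Q Q^T
Q, R = qr(X, mode="economic")
leverages = np.sum(Q**2, axis=1)        # diag(Q Q^T) without forming the n x n matrix

# Agrees with the normal-equations form on this well-conditioned example
P = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(leverages, np.diag(P)))
print(np.isclose(leverages.sum(), p))   # trace equals the number of parameters
```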
