Projection matrix
from Wikipedia

In statistics, the projection matrix $(\mathbf{P})$,[1] sometimes also called the influence matrix[2] or hat matrix $(\mathbf{H})$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value.[3][4] The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

Definition


If the vector of response values is denoted by $\mathbf{y}$ and the vector of fitted values by $\hat{\mathbf{y}}$,
$$\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}.$$

As $\hat{\mathbf{y}}$ is usually pronounced "y-hat", the projection matrix is also named hat matrix as it "puts a hat on $\mathbf{y}$".

Application for residuals


The formula for the vector of residuals $\mathbf{r}$ can also be expressed compactly using the projection matrix:
$$\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{P}\mathbf{y} = (\mathbf{I} - \mathbf{P})\mathbf{y},$$
where $\mathbf{I}$ is the identity matrix. The matrix $\mathbf{M} := \mathbf{I} - \mathbf{P}$ is sometimes referred to as the residual maker matrix or the annihilator matrix.

The covariance matrix of the residuals $\mathbf{r}$, by error propagation, equals
$$\Sigma_{\mathbf{r}} = (\mathbf{I} - \mathbf{P})\,\Sigma\,(\mathbf{I} - \mathbf{P})^{\mathsf{T}},$$
where $\Sigma$ is the covariance matrix of the error vector (and by extension, the response vector as well). For the case of linear models with independent and identically distributed errors in which $\Sigma = \sigma^2 \mathbf{I}$, this reduces to:[3]
$$\Sigma_{\mathbf{r}} = (\mathbf{I} - \mathbf{P})\,\sigma^2.$$
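The residual maker and the reduced covariance formula can be checked numerically. The following is a minimal NumPy sketch; the small design matrix and the error variance are illustrative assumptions, not part of the original text.

```python
# Build the residual maker M = I - P for a small design matrix X and check
# that the residual covariance reduces to sigma^2 * (I - P) for i.i.d. errors.
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
P = X @ np.linalg.inv(X.T @ X) @ X.T                     # hat matrix
M = np.eye(n) - P                                        # residual maker / annihilator

y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
residuals = M @ y                                        # r = (I - P) y

sigma2 = 1.0
cov_r = M @ (sigma2 * np.eye(n)) @ M.T                   # (I - P) Sigma (I - P)^T
assert np.allclose(cov_r, sigma2 * M)                    # reduces to sigma^2 (I - P)
assert np.allclose(X.T @ residuals, 0)                   # residuals orthogonal to columns of X
```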

Intuition

Figure: a matrix $A$ has its column space depicted as the green line; the projection of some vector $\mathbf{b}$ onto the column space of $A$ is the vector $A\hat{\mathbf{x}}$.

From the figure, it is clear that the closest point from the vector $\mathbf{b}$ onto the column space of $A$ is $A\hat{\mathbf{x}}$, and is one where we can draw a line orthogonal to the column space of $A$. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so
$$A^{\mathsf{T}}(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}.$$

From there, one rearranges, so
$$A^{\mathsf{T}}\mathbf{b} = A^{\mathsf{T}}A\hat{\mathbf{x}} \quad\Rightarrow\quad \hat{\mathbf{x}} = (A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}\mathbf{b}.$$

Therefore, since $A\hat{\mathbf{x}}$ is on the column space of $A$, the projection matrix, which maps $\mathbf{b}$ onto $A\hat{\mathbf{x}}$, is $\mathbf{P} = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$.
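A small numeric illustration of this construction, assuming NumPy and an arbitrary full-column-rank matrix $A$:

```python
# Project a vector b onto the column space of A via P = A (A^T A)^{-1} A^T
# and check that the residual b - Pb is orthogonal to every column of A.
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])            # full column rank; col(A) is a plane in R^3
b = np.array([6.0, 0.0, 0.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T  # projection onto col(A)
b_hat = P @ b                         # closest point to b in col(A)

assert np.allclose(A.T @ (b - b_hat), 0)   # residual lies in the nullspace of A^T
assert np.allclose(P @ b_hat, b_hat)       # projecting again changes nothing
```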

Linear model


Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $X$ is a matrix of explanatory variables (the design matrix), $\boldsymbol{\beta}$ is a vector of unknown parameters to be estimated, and $\boldsymbol{\varepsilon}$ is the error vector.

Many types of models and techniques are subject to this formulation. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.

Ordinary least squares


When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are
$$\hat{\boldsymbol{\beta}} = (X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}\mathbf{y},$$
so the fitted values are
$$\hat{\mathbf{y}} = X\hat{\boldsymbol{\beta}} = X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}\mathbf{y}.$$
Therefore, the projection matrix (and hat matrix) is given by
$$\mathbf{P} = X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}.$$
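As a hedged illustration (NumPy, with an arbitrary simulated design matrix), the hat matrix built from this formula reproduces the fitted values obtained from a direct least-squares solve:

```python
# Form the OLS hat matrix P = X (X^T X)^{-1} X^T and confirm that P y
# reproduces the fitted values from a least-squares solve.
import numpy as np

rng = np.random.default_rng(1)
n = 8
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])              # design matrix with intercept
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # (X^T X)^{-1} X^T y
P = X @ np.linalg.inv(X.T @ X) @ X.T              # hat matrix

assert np.allclose(P @ y, X @ beta_hat)           # "puts the hat on y"
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)          # agrees with np.linalg.lstsq
```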

Weighted and generalized least squares


The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is $\Sigma$. Then since
$$\hat{\boldsymbol{\beta}}_{\text{GLS}} = (X^{\mathsf{T}}\Sigma^{-1}X)^{-1}X^{\mathsf{T}}\Sigma^{-1}\mathbf{y},$$
the hat matrix is thus
$$\mathbf{H} = X(X^{\mathsf{T}}\Sigma^{-1}X)^{-1}X^{\mathsf{T}}\Sigma^{-1},$$
and again it may be seen that $\mathbf{H}^2 = \mathbf{H}$, though now it is no longer symmetric.
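A short NumPy sketch, using an arbitrary positive-definite $\Sigma$ as a stand-in for a real error covariance, confirms that this hat matrix is idempotent but not symmetric:

```python
# With correlated errors (covariance Sigma), the hat matrix
# H = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} is still idempotent
# but in general no longer symmetric.
import numpy as np

rng = np.random.default_rng(2)
n = 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])

A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)                   # arbitrary positive-definite covariance
Sigma_inv = np.linalg.inv(Sigma)

H = X @ np.linalg.inv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv

assert np.allclose(H @ H, H)                      # idempotent: H^2 = H
assert not np.allclose(H, H.T)                    # generally not symmetric
```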

Properties


The projection matrix has a number of useful algebraic properties.[5][6] In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix $X$.[4] (Note that $(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}$ is the pseudoinverse of $X$.) Some facts of the projection matrix in this setting are summarized as follows:[4]

  • $\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}$ and $\mathbf{u} = \mathbf{y} - \mathbf{P}\mathbf{y} = (\mathbf{I} - \mathbf{P})\mathbf{y}$
  • $\mathbf{P}$ is symmetric, and so is $\mathbf{M} := \mathbf{I} - \mathbf{P}$.
  • $\mathbf{P}$ is idempotent: $\mathbf{P}^2 = \mathbf{P}$, and so is $\mathbf{M}$.
  • If $X$ is an n × r matrix with $\operatorname{rank}(X) = r$, then $\operatorname{tr}(\mathbf{P}) = r$.
  • The eigenvalues of $\mathbf{P}$ consist of r ones and n − r zeros, while the eigenvalues of $\mathbf{M}$ consist of n − r ones and r zeros.[7]
  • $X$ is invariant under $\mathbf{P}$: $\mathbf{P}X = X$, hence $(\mathbf{I} - \mathbf{P})X = \mathbf{0}$.
  • $\mathbf{P}$ is unique for certain subspaces.

The projection matrix corresponding to a linear model is symmetric and idempotent, that is, $\mathbf{P}^2 = \mathbf{P}$. However, this is not always the case; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.

For linear models, the trace of the projection matrix is equal to the rank of $X$, which is the number of independent parameters of the linear model.[8] For other models such as LOESS that are still linear in the observations $\mathbf{y}$, the projection matrix can be used to define the effective degrees of freedom of the model.
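These trace, rank, and eigenvalue facts are easy to verify numerically; the sketch below assumes NumPy and a random full-column-rank design matrix:

```python
# The trace of the hat matrix equals the rank of X, i.e. the number of
# independent parameters, and its eigenvalues are r ones and n - r zeros.
import numpy as np

rng = np.random.default_rng(3)
n, r = 10, 3
X = rng.normal(size=(n, r))                        # full column rank with probability 1
P = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.trace(P))                                 # ~3.0
print(np.linalg.matrix_rank(X))                    # 3
eigvals = np.linalg.eigvalsh(P)                    # P is symmetric
print(np.round(eigvals, 6))                        # three 1s and seven 0s
```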

Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression.

Blockwise formula


Suppose the design matrix $X$ can be decomposed by columns as $X = [A \ B]$. Define the hat or projection operator as $P\{X\} = X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}$. Similarly, define the residual operator as $M\{X\} = I - P\{X\}$. Then the projection matrix can be decomposed as follows:[9]
$$P\{X\} = P\{A\} + P\{M\{A\}B\},$$
where, e.g., $P\{A\} = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ and $M\{A\} = I - P\{A\}$. There are a number of applications of such a decomposition. In the classical application $A$ is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the fixed effects model, where $A$ is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of $X$ without explicitly forming the matrix $X$, which might be too large to fit into computer memory.
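The decomposition can be verified directly; the following NumPy sketch (with a hypothetical `proj` helper and an intercept column as the classical choice of $A$) checks the identity on simulated data:

```python
# Verify the blockwise identity P{X} = P{A} + P{M{A} B} for X = [A  B].
import numpy as np

def proj(Z):
    """Orthogonal projection onto the column space of Z (full column rank assumed)."""
    return Z @ np.linalg.inv(Z.T @ Z) @ Z.T

rng = np.random.default_rng(4)
n = 9
A = np.ones((n, 1))                     # classical case: a column of all ones (intercept)
B = rng.normal(size=(n, 2))
X = np.hstack([A, B])

M_A = np.eye(n) - proj(A)               # residual maker for A
P_blockwise = proj(A) + proj(M_A @ B)   # P{A} + P{M{A} B}

assert np.allclose(proj(X), P_blockwise)
```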

History


The hat matrix was introduced by John Wilder Tukey in 1972. An article by Hoaglin, D.C. and Welsch, R.E. (1978) gives the properties of the matrix and also many examples of its application.

from Grokipedia
In linear algebra, a projection matrix is a square matrix $P$ that represents a linear transformation projecting vectors from a vector space onto a subspace, satisfying the idempotence property $P^2 = P$. Such matrices arise in the context of decomposing a vector space $V$ as a direct sum $V = X \oplus Y$, where $P$ maps any vector $v = x + y$ (with $x \in X$ and $y \in Y$) to its component $x \in X$, ensuring the image of $P$ is $X$ and the kernel is $Y$. For orthogonal projections, which are the most common type, $P$ is symmetric and projects onto a subspace $W$ (e.g., the column space of a matrix $A$) such that the error vector is perpendicular to $W$; the explicit formula is $P = A(A^T A)^{-1}A^T$, assuming $A$ has full column rank and its columns form a basis for $W$. Projection matrices are fundamental in applications like regression, where they provide the closest approximation of a vector $b$ in the subspace spanned by $A$'s columns, minimizing the distance $\|b - Pb\|$. They also appear in coordinate projections and in numerical methods for solving overdetermined systems. Key properties include the fact that $I - P$ is also a projection (onto the orthogonal complement in the orthogonal case) and invariance under certain linear operators if the subspaces are invariant.

Fundamentals

Definition

In linear algebra, a projection matrix $P$ is defined as a square matrix that satisfies the idempotence condition $P^2 = P$. This property characterizes projections onto a subspace, where applying the operator twice yields the same result as applying it once. For orthogonal projections, which are the most common in applications involving Euclidean spaces, the matrix is additionally symmetric, satisfying $P^T = P$. This symmetry ensures that the projection is perpendicular to the complementary subspace. Such matrices project vectors from $\mathbb{R}^n$ onto a lower-dimensional subspace while preserving angles and lengths within that subspace. In statistical linear models, the projection matrix takes the specific form of the hat matrix $H = X(X^T X)^{-1}X^T$, where $X$ is the $n \times p$ design matrix with full column rank. This matrix projects the observed response vector $y$ onto the column space of $X$, producing the vector of fitted values $\hat{y} = Hy$. The hat matrix inherits the idempotence and symmetry properties, making it an orthogonal projection operator in this context. The diagonal elements $h_{ii}$ of the hat matrix, known as leverages, quantify the influence of the $i$-th response value $y_i$ on the corresponding fitted value $\hat{y}_i$. These leverages satisfy $0 \leq h_{ii} \leq 1$ for each $i$, and their sum equals the number of parameters $p$ in the model. This formulation assumes familiarity with basic concepts of vectors, matrices, and linear subspaces.
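A minimal NumPy sketch of the leverage properties stated above, using a simulated design matrix as a stand-in:

```python
# The diagonal of the hat matrix gives the leverages h_ii, each in [0, 1],
# and they sum to the number of parameters p.
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

leverages = np.diag(H)
assert np.all(leverages >= 0) and np.all(leverages <= 1 + 1e-12)
assert np.isclose(leverages.sum(), p)              # trace(H) = p
print(leverages, leverages.mean())                 # average leverage is p/n
```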

Geometric Interpretation

In linear algebra, a projection matrix $P$ provides an orthogonal mapping of a vector $\mathbf{v}$ in a Euclidean space onto a subspace spanned by the columns of a matrix $A$, such that the projected vector $P\mathbf{v}$ lies within the subspace and the residual vector $\mathbf{v} - P\mathbf{v}$ is orthogonal to every vector in that subspace. This orthogonality ensures that the projection minimizes the distance between $\mathbf{v}$ and the subspace, representing the "closest point" approximation in the geometric sense.

Consider an example in $\mathbb{R}^2$, where the subspace is the x-axis, spanned by the standard basis vector $\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. The corresponding orthogonal projection matrix is $P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which maps any vector $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$ to $P\mathbf{v} = \begin{pmatrix} x \\ 0 \end{pmatrix}$, effectively dropping the y-component while preserving the x-coordinate. Geometrically, this can be visualized as the vector $\mathbf{v}$ being "dropped" perpendicularly from its tip to the x-axis, forming a right angle between the residual $\begin{pmatrix} 0 \\ y \end{pmatrix}$ and the projected point on the axis; such diagrams often illustrate the decomposition $\mathbf{v} = P\mathbf{v} + (\mathbf{v} - P\mathbf{v})$ with the residual orthogonal to the subspace.

This focus on orthogonality distinguishes orthogonal projections from oblique projections, where the residuals are not perpendicular to the subspace, leading to a slanted "drop" rather than a right-angled one; in both cases, however, applying $P$ repeatedly yields the same result, and vectors already in the subspace are kept fixed. In higher dimensions, such as projecting onto a plane in $\mathbb{R}^3$, the visualization extends to the residual being perpendicular to the entire plane, akin to a shadow cast by perpendicular light rays onto a flat surface.

Properties

Algebraic Properties

A projection matrix $P$ in linear algebra is characterized by its idempotence, meaning $P^2 = P$. This property implies that $P$ acts as a projection onto a subspace, where applying $P$ multiple times yields the same result as applying it once, leaving vectors in the range of $P$ unchanged while mapping others to that subspace. Equivalently, the eigenvalues of $P$ are either 0 or 1, so $P^2$ has the same spectral decomposition as $P$, consistent with $P^2 = P$. For orthogonal projections, $P$ is also symmetric, satisfying $P^T = P$. This symmetry ensures that the projection is self-adjoint, preserving inner products in the sense that $\langle Pv, w \rangle = \langle v, Pw \rangle$ for all vectors $v, w$. Consequently, $P$ coincides with its own Moore-Penrose pseudoinverse in the context of orthogonal projections onto the column space, as $P^+ = P$.

The rank of $P$ equals the dimension of its range, which is the subspace onto which it projects, and this rank is also equal to the trace of $P$. Since the nonzero eigenvalues of $P$ are all 1 and their count matches the rank, the trace, as the sum of the eigenvalues, directly gives $\operatorname{trace}(P) = \operatorname{rank}(P)$. Regarding null spaces, the kernel of $I - P$ is precisely the range of $P$, while the kernel of $P$ is the range of $I - P$. This decomposition highlights how $P$ and $I - P$ partition the space into the projected subspace and, for orthogonal projections, its orthogonal complement.

A standard formula for the orthogonal projection matrix onto the column space of a full column rank matrix $A \in \mathbb{R}^{m \times n}$ (with $m \geq n$) is $P = AA^+$, where $A^+$ is the Moore-Penrose pseudoinverse of $A$. For full column rank, $A^+ = (A^T A)^{-1}A^T$, so $P = A(A^T A)^{-1}A^T$. To derive this, note that $P$ must satisfy $P^2 = P$ and project onto $\operatorname{range}(A)$. Substituting yields $P^2 = A(A^T A)^{-1}A^T A(A^T A)^{-1}A^T = A(A^T A)^{-1}A^T = P$, confirming idempotence, and the columns of $A$ lie in the range of $P$ since $PA = A$. Symmetry follows from the transpose: $P^T = A(A^T A)^{-1}A^T = P$. This form generalizes via the pseudoinverse to rank-deficient cases, where $AA^+$ remains the orthogonal projection onto $\operatorname{range}(A)$.
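The pseudoinverse formulation $P = AA^+$ can be illustrated with NumPy's `np.linalg.pinv`, including a rank-deficient case where $(A^TA)^{-1}$ does not exist; the matrices below are arbitrary examples:

```python
# P = A A^+ is the orthogonal projection onto range(A), including when A
# is rank deficient and (A^T A) is not invertible.
import numpy as np

rng = np.random.default_rng(6)
A_full = rng.normal(size=(6, 3))                       # full column rank
A_defic = np.hstack([A_full, A_full[:, :1]])           # repeated column -> rank deficient

for A in (A_full, A_defic):
    P = A @ np.linalg.pinv(A)
    assert np.allclose(P @ P, P)                       # idempotent
    assert np.allclose(P, P.T)                         # symmetric (orthogonal projection)
    assert np.allclose(P @ A, A)                       # columns of A are fixed by P
    print(np.isclose(np.trace(P), np.linalg.matrix_rank(A)))   # trace equals rank
```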

Trace and Rank Interpretations

In the context of linear regression, the trace of a projection matrix $P$ equals its rank, which corresponds to the number of columns in the design matrix $X$, or equivalently, the number of parameters in the model, assuming $X$ has full column rank. This equality holds because $P$ is idempotent and symmetric, projecting onto the column space of $X$. As established in the algebraic properties of projection matrices, $\operatorname{trace}(P) = \operatorname{rank}(P)$. The trace of $P$ provides a measure of model complexity, representing the degrees of freedom associated with the fitted values. Specifically, $\operatorname{trace}(P) = p$, where $p$ is the number of parameters, while the trace of the residual projection matrix $I - P$ equals $n - p$, indicating the degrees of freedom for the residuals, with $n$ denoting the number of observations. This interpretation links the projection's dimensionality directly to inference, such as estimating variance or testing hypotheses. The diagonal elements of $P$, known as leverage values, quantify the influence of each observation on the fitted values; their sum equals $\operatorname{trace}(P) = p$, yielding an average leverage of $p/n$. This sum underscores the overall "pull" of the model on the data, with high-leverage points potentially affecting fit stability. For instance, in simple linear regression with an intercept and slope, the projection matrix has trace 2, reflecting the two parameters and the rank of the two-column design matrix.

Applications in Statistics

Ordinary Least Squares

In the ordinary least squares (OLS) framework, the linear regression model is expressed in matrix form as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\mathbf{y}$ is an $n \times 1$ vector of observations, $\mathbf{X}$ is an $n \times p$ design matrix with full column rank, $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown parameters, and $\boldsymbol{\epsilon}$ is an $n \times 1$ vector of errors. The OLS estimator minimizes the residual sum of squares $\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}\|^2$ and is given by $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$. This estimator is unbiased and has minimum variance among linear unbiased estimators under the model's assumptions. The fitted values are $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$, which can be written compactly as $\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}$, where $\mathbf{P} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top$ is the projection matrix, also known as the hat matrix. Geometrically, $\mathbf{P}$ orthogonally projects the response vector $\mathbf{y}$ onto the column space of $\mathbf{X}$, ensuring that $\hat{\mathbf{y}}$ lies in this subspace and minimizes the distance to $\mathbf{y}$. The residuals are defined as $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{P})\mathbf{y}$, where $\mathbf{I}$ is the $n \times n$ identity matrix. These residuals are orthogonal to the columns of $\mathbf{X}$, satisfying $\mathbf{X}^\top \mathbf{e} = \mathbf{0}$, which implies that the unexplained variation is perpendicular to the fitted subspace spanned by the predictors. This property decomposes $\mathbf{y}$ into fitted and residual components that are orthogonal to each other. The projection matrix $\mathbf{P}$ in OLS arises under the assumptions of a linear model, errors with zero conditional mean $E[\boldsymbol{\epsilon} \mid \mathbf{X}] = \mathbf{0}$, homoscedasticity $\text{Var}(\boldsymbol{\epsilon} \mid \mathbf{X}) = \sigma^2 \mathbf{I}$, and uncorrelated errors. These conditions ensure that $\mathbf{P}$ represents an orthogonal projection onto $\text{Col}(\mathbf{X})$, with $\mathbf{P}$ being symmetric and idempotent. For a univariate example with an intercept, consider the simple linear model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ for $i = 1, \dots, n$, where the design matrix $\mathbf{X}$ has a first column of ones and a second column $\mathbf{x} = (x_1, \dots, x_n)^\top$. The projection matrix $\mathbf{P}$ then projects $\mathbf{y}$ onto the span of $\{\mathbf{1}, \mathbf{x}\}$, yielding fitted values that represent the best linear approximation in the least-squares sense; for instance, with centered predictors, the estimate $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ aligns with this projection.
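The agreement between the projection $\mathbf{P}\mathbf{y}$ and the textbook slope/intercept formulas can be checked on simulated data; a minimal NumPy sketch with arbitrary simulated values:

```python
# In simple linear regression with an intercept, the fitted values P y agree
# with the textbook slope/intercept formulas.
import numpy as np

rng = np.random.default_rng(7)
n = 20
x = rng.normal(size=n)
y = 1.5 - 0.7 * x + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x])
P = X @ np.linalg.inv(X.T @ X) @ X.T

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

assert np.allclose(P @ y, beta0 + beta1 * x)     # projection onto span{1, x}
```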

Generalized Least Squares

In the context of linear models where the errors exhibit heteroscedasticity or correlation, the generalized least squares (GLS) method employs a projection matrix adapted to the error structure. Consider the model $y = X\beta + \epsilon$, where $\mathbb{E}(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \Sigma$, with $\Sigma$ a positive definite matrix. The GLS estimator of the parameter vector $\beta$ is given by $\hat{\beta} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$. The fitted values are then $\hat{y} = Hy$, where the projection matrix $H = X(X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1}$ projects the response vector onto the column space of $X$ with respect to the inner product induced by $\Sigma^{-1}$. This formulation generalizes the ordinary least squares case, which arises as a special instance when $\Sigma = \sigma^2 I$. A key special case of GLS is weighted least squares (WLS), which applies when the errors are uncorrelated but have unequal variances, so $\Sigma$ is diagonal with entries $\sigma_i^2$. In this scenario, the weights are $w_i = 1/\sigma_i^2$, and the projection matrix becomes $H = X(X^T W X)^{-1} X^T W$, where $W = \Sigma^{-1}$ is the diagonal weight matrix. This weighting ensures that observations with smaller error variances contribute more to the estimation, yielding more efficient parameter estimates under the specified error structure. The projection matrix $H$ in GLS retains the idempotence property, satisfying $H^2 = H$, which confirms its role as a projection operator. However, unlike the orthogonal projection in ordinary least squares, $H$ is generally not symmetric ($H^T \neq H$) unless $\Sigma$ is a scalar multiple of the identity matrix, reflecting the oblique nature of the projection in the presence of correlated or heteroscedastic errors. For the residuals $e = y - \hat{y} = (I - H)y$, the covariance matrix is $\mathrm{Var}(e) = (I - H)\Sigma(I - H)^T$, demonstrating how the projection accounts for the underlying error structure to produce unbiased residuals with adjusted variance.
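A short NumPy sketch of the WLS special case, using arbitrary simulated variances $\sigma_i^2$, illustrates the idempotence of $H$ and the residual covariance formula:

```python
# Weighted least squares as a special case of GLS with diagonal Sigma;
# the residual covariance is (I - H) Sigma (I - H)^T.
import numpy as np

rng = np.random.default_rng(8)
n = 7
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma2 = rng.uniform(0.5, 2.0, size=n)            # unequal error variances
Sigma = np.diag(sigma2)
W = np.diag(1.0 / sigma2)                         # weights w_i = 1 / sigma_i^2

H = X @ np.linalg.inv(X.T @ W @ X) @ X.T @ W      # WLS projection matrix
assert np.allclose(H @ H, H)                      # idempotent, though not symmetric in general

I = np.eye(n)
cov_resid = (I - H) @ Sigma @ (I - H).T           # Var(e) under the assumed Sigma
print(np.round(np.diag(cov_resid), 3))
```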

Advanced Formulations

Oblique Projections

Oblique projections extend the concept of projections beyond the orthogonal case by allowing the direction of projection to be non-perpendicular to the target subspace. A matrix $P$ represents an oblique projection if it is idempotent, satisfying $P^2 = P$, but not symmetric, so $P^T \neq P$; consequently, the projected vector $Px$ and the residual vector $(I - P)x$ are not orthogonal under the standard inner product. This distinguishes oblique projections from orthogonal ones, where symmetry holds and orthogonality is preserved. The formula for an oblique projection onto the column space of a matrix $A$, with the complementary direction determined by a weighting matrix $B$, is given by $P = A(A^T B^{-1} A)^{-1} A^T B^{-1}$, assuming $B$ is positive definite and invertible, and the inner matrices have full rank to ensure well-definedness. This expression captures projections in spaces equipped with a non-standard metric induced by $B^{-1}$, where the "parallel" direction aligns with the geometry defined by $B$. Representative examples of oblique projections appear in coordinate transformations and non-Euclidean metrics, such as affine mappings or engineering applications requiring angled views. For instance, in a setting analogous to multiview projections in graphics, the matrix
$$P = \begin{bmatrix} 1 & 0 & -\cot\theta & 0 \\ 0 & 1 & -\cot\phi & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
maps a point in homogeneous coordinates $(x, y, z, 1)^T$ to $(x - z\cot\theta,\; y - z\cot\phi,\; 0,\; 1)^T$, projecting it onto the $z = 0$ plane along a slanted direction set by the angles $\theta$ and $\phi$; it is idempotent but not symmetric.
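A minimal NumPy sketch, with arbitrary angle values, verifies that this matrix is idempotent but not symmetric and shows its action on a point in homogeneous coordinates:

```python
# The oblique projection matrix above maps (x, y, z, 1) to
# (x - z*cot(theta), y - z*cot(phi), 0, 1): projection onto the z = 0 plane
# along a slanted direction. It is idempotent but not symmetric.
import numpy as np

theta, phi = np.pi / 4, np.pi / 3
P = np.array([
    [1, 0, -1 / np.tan(theta), 0],
    [0, 1, -1 / np.tan(phi),   0],
    [0, 0, 0,                  0],
    [0, 0, 0,                  1],
])

assert np.allclose(P @ P, P)           # idempotent: P^2 = P
assert not np.allclose(P, P.T)         # oblique: not symmetric

v = np.array([2.0, 3.0, 5.0, 1.0])     # a point in homogeneous coordinates
print(P @ v)                           # z-component dropped along the oblique direction
```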