Generalized least squares

current hub

Write something...

Be the first to start a discussion here.

Recent from talks

Be the first to start a discussion here.

Recent from talks

Be the first to start a discussion here.

About hubStatsRules

See all

Wikipedia

Grokipedia

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

It requires knowledge of the covariance matrix for the residuals. If this is unknown, estimating the covariance matrix gives the method of feasible generalized least squares (FGLS). However, FGLS provides fewer guarantees of improvement.

In standard linear regression models, one observes data $\{y_{i},x_{ij}\}_{i=1,\dots ,n,j=2,\dots ,k}$ on n statistical units with k − 1 predictor values and one response value each.

The response values are placed in a vector, $\mathbf {y} \equiv {\begin{pmatrix}y_{1}\\\vdots \\y_{n}\end{pmatrix}},$ and the predictor values are placed in the design matrix, $\mathbf {X} \equiv {\begin{pmatrix}1&x_{12}&x_{13}&\cdots &x_{1k}\\1&x_{22}&x_{23}&\cdots &x_{2k}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&x_{n2}&x_{n3}&\cdots &x_{nk}\end{pmatrix}},$ where each row is a vector of the $k$ predictor variables (including a constant) for the $i$ th data point.

The model assumes that the conditional mean of $\mathbf {y}$ given $\mathbf {X}$ to be a linear function of $\mathbf {X}$ and that the conditional variance of the error term given $\mathbf {X}$ is a known non-singular covariance matrix, $\mathbf {\Omega }$ . That is, $\mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }},\quad \operatorname {E} [{\boldsymbol {\varepsilon }}\mid \mathbf {X} ]=0,\quad \operatorname {Cov} [{\boldsymbol {\varepsilon }}\mid \mathbf {X} ]={\boldsymbol {\Omega }},$ where ${\boldsymbol {\beta }}\in \mathbb {R} ^{k}$ is a vector of unknown constants, called "regression coefficients", which are estimated from the data.

If $\mathbf {b}$ is a candidate estimate for ${\boldsymbol {\beta }}$ , then the residual vector for $\mathbf {b}$ is $\mathbf {y} -\mathbf {X} \mathbf {b}$ . The generalized least squares method estimates ${\boldsymbol {\beta }}$ by minimizing the squared Mahalanobis length of this residual vector: ${\begin{aligned}{\hat {\boldsymbol {\beta }}}&={\underset {\mathbf {b} }{\operatorname {argmin} }}\,(\mathbf {y} -\mathbf {X} \mathbf {b} )^{\mathrm {T} }\mathbf {\Omega } ^{-1}(\mathbf {y} -\mathbf {X} \mathbf {b} )\\&={\underset {\mathbf {b} }{\operatorname {argmin} }}\,\mathbf {y} ^{\mathrm {T} }\,\mathbf {\Omega } ^{-1}\mathbf {y} +(\mathbf {X} \mathbf {b} )^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {X} \mathbf {b} -\mathbf {y} ^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {X} \mathbf {b} -(\mathbf {X} \mathbf {b} )^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {y} \,,\end{aligned}}$ which is equivalent to ${\hat {\boldsymbol {\beta }}}={\underset {\mathbf {b} }{\operatorname {argmin} }}\,\mathbf {y} ^{\mathrm {T} }\,\mathbf {\Omega } ^{-1}\mathbf {y} +\mathbf {b} ^{\mathrm {T} }\mathbf {X} ^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {X} \mathbf {b} -2\mathbf {b} ^{\mathrm {T} }\mathbf {X} ^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {y} ,$ which is a quadratic programming problem. The stationary point of the objective function occurs when $2\mathbf {X} ^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {X} {\mathbf {b} }-2\mathbf {X} ^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {y} =0,$ so the estimator is ${\hat {\boldsymbol {\beta }}}=\left(\mathbf {X} ^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathrm {T} }\mathbf {\Omega } ^{-1}\mathbf {y} .$ The quantity $\mathbf {\Omega } ^{-1}$ is known as the precision matrix (or dispersion matrix), a generalization of the diagonal weight matrix.

The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with $\operatorname {E} [{\hat {\boldsymbol {\beta }}}\mid \mathbf {X} ]={\boldsymbol {\beta }},\quad {\text{and}}\quad \operatorname {Cov} [{\hat {\boldsymbol {\beta }}}\mid \mathbf {X} ]=(\mathbf {X} ^{\mathrm {T} }{\boldsymbol {\Omega }}^{-1}\mathbf {X} )^{-1}.$ GLS is equivalent to applying ordinary least squares (OLS) to a linearly transformed version of the data. This can be seen by factoring $\mathbf {\Omega } =\mathbf {C} \mathbf {C} ^{\mathrm {T} }$ using a method such as Cholesky decomposition. Left-multiplying both sides of $\mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }}$ by $\mathbf {C} ^{-1}$ yields an equivalent linear model: $\mathbf {y} ^{*}=\mathbf {X} ^{*}{\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }}^{*},\quad {\text{where}}\quad \mathbf {y} ^{*}=\mathbf {C} ^{-1}\mathbf {y} ,\quad \mathbf {X} ^{*}=\mathbf {C} ^{-1}\mathbf {X} ,\quad {\boldsymbol {\varepsilon }}^{*}=\mathbf {C} ^{-1}{\boldsymbol {\varepsilon }}.$ In this model, $\operatorname {Var} [{\boldsymbol {\varepsilon }}^{*}\mid \mathbf {X} ]=\mathbf {C} ^{-1}\mathbf {\Omega } \left(\mathbf {C} ^{-1}\right)^{\mathrm {T} }=\mathbf {I}$ , where $\mathbf {I}$ is the identity matrix. Then, ${\boldsymbol {\beta }}$ can be efficiently estimated by applying OLS to the transformed data, which requires minimizing the objective, $\left(\mathbf {y} ^{*}-\mathbf {X} ^{*}{\boldsymbol {\beta }}\right)^{\mathrm {T} }(\mathbf {y} ^{*}-\mathbf {X} ^{*}{\boldsymbol {\beta }})=(\mathbf {y} -\mathbf {X} \mathbf {b} )^{\mathrm {T} }\,\mathbf {\Omega } ^{-1}(\mathbf {y} -\mathbf {X} \mathbf {b} ).$ This transformation effectively standardizes the scale of and de-correlates the errors. When OLS is used on data with homoscedastic errors, the Gauss–Markov theorem applies, so the GLS estimate is the best linear unbiased estimator for ${\boldsymbol {\beta }}$ .

A special case of GLS, called weighted least squares (WLS), occurs when all the off-diagonal entries of Ω are 0. This situation arises when the variances of the observed values are unequal or when heteroscedasticity is present, but no correlations exist among the observed variances. The weight for unit i is proportional to the reciprocal of the variance of the response for unit i.

See all

Hub AI

Generalized least squares AI simulator

(@Generalized least squares_simulator)

Wikipedia

Grokipedia

Hub AI

Generalized least squares

In standard linear regression models, one observes data $\{y_{i},x_{ij}\}_{i=1,\dots ,n,j=2,\dots ,k}$ on n statistical units with k − 1 predictor values and one response value each.

See all

Knowledge Base

Talk Channels

Special Pages

Generalized least squares

Generalized least squares

Recent from talks

Recent from talks

Knowledge base stats:

Talk channels stats:

Members stats:

Generalized least squares

Hub AI

Generalized least squares

History

Generalized least squares

Generalized least squares

Recent from talks

Recent from talks

Knowledge base stats:

Talk channels stats:

Members stats:

Generalized least squares

Hub AI

Generalized least squares