Ridge regression

Ridge regression (also known as Tikhonov regularization, named for Andrey Tikhonov) is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering. It is a method of regularization of ill-posed problems. It is particularly useful to mitigate the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters. In general, the method provides improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias (see bias–variance tradeoff).

The theory was first introduced by Hoerl and Kennard in 1970 in their Technometrics papers "Ridge Regression: Biased Estimation for Nonorthogonal Problems" and "Ridge Regression: Applications to Nonorthogonal Problems".

Ridge regression was developed as a possible solution to the imprecision of least squares estimators when linear regression models have some multicollinear (highly correlated) independent variables. The resulting ridge regression estimator (RR) often yields more precise estimates of the regression parameters, since its variance and mean squared error can be smaller than those of the least squares estimators previously derived.

In the ordinary least squares solution of

$\hat{\beta} = (X^\mathsf{T} X)^{-1} X^\mathsf{T} y,$

the problem of a near-singular moment matrix $X^\mathsf{T} X$ is alleviated by adding positive elements to its diagonal, thereby decreasing its condition number. Compared to the ordinary least squares estimator, the simple ridge estimator adds a constant to the diagonal of the matrix being inverted:

$\hat{\beta}_R = (X^\mathsf{T} X + \lambda I)^{-1} X^\mathsf{T} y,$

where $y$ is the regressand, $X$ is the design matrix, $I$ is the identity matrix, and the ridge parameter $\lambda \geq 0$ serves as the constant shifting the diagonals of the moment matrix. It can be shown that this estimator is the solution to the least squares problem subject to the constraint $\beta^\mathsf{T} \beta = c$, which can be expressed as a Lagrangian minimization:

$\hat{\beta}_R = \arg\min_{\beta} \; (y - X\beta)^\mathsf{T} (y - X\beta) + \lambda \, (\beta^\mathsf{T} \beta - c),$

which shows that $\lambda$ is nothing but the Lagrange multiplier of the constraint. In fact, there is a one-to-one relationship between $\lambda$ and $c$, and since, in practice, we do not know $c$, we define $\lambda$ heuristically or find it via additional data-fitting strategies; see Determination of the Tikhonov factor.

Note that, when $\lambda = 0$, in which case the constraint is non-binding, the ridge estimator reduces to ordinary least squares. A more general approach to Tikhonov regularization is discussed below.
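
As a concrete illustration of the formulas above, the following Python sketch (not part of the original text; the toy data, variable names, and the choice $\lambda = 1$ are illustrative assumptions) computes the closed-form ridge estimator $(X^\mathsf{T} X + \lambda I)^{-1} X^\mathsf{T} y$ on a nearly collinear design and checks that $\lambda = 0$ recovers the ordinary least squares solution.

```python
import numpy as np

# Illustrative sketch of the closed-form ridge estimator
# beta_R = (X^T X + lambda * I)^{-1} X^T y  (toy data, assumed lambda).
rng = np.random.default_rng(0)

n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)          # nearly collinear regressor
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=n)

def ridge_estimator(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for beta."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols   = ridge_estimator(X, y, 0.0)      # lambda = 0: ordinary least squares
beta_ridge = ridge_estimator(X, y, 1.0)      # lambda > 0: shrunken, stabler estimate

print("OLS coefficients:  ", beta_ols)       # often large, offsetting values
print("Ridge coefficients:", beta_ridge)     # pulled toward the true (1, 1)
```

With near-collinear columns the unregularized coefficients can be large and of opposite sign while still fitting the data; the ridge penalty trades a small bias for a substantial reduction in variance.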

Tikhonov regularization was invented independently in many different contexts. It became widely known through its application to integral equations in the works of Andrey Tikhonov and David L. Phillips. Some authors use the term Tikhonov–Phillips regularization. The finite-dimensional case was expounded by Arthur E. Hoerl, who took a statistical approach, and by Manus Foster, who interpreted this method as a Wiener–Kolmogorov (Kriging) filter. Following Hoerl, it is known in the statistical literature as ridge regression, named after ridge analysis ("ridge" refers to the path from the constrained maximum).

Suppose that for a known real matrix $A$ and vector $\mathbf{b}$, we wish to find a vector $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{b}$, where $\mathbf{x}$ and $\mathbf{b}$ may be of different sizes and $A$ may be non-square.
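
To make this general setting concrete, the sketch below (an illustrative assumption, not from the original text) applies the same regularized inverse as the ridge estimator above, $(A^\mathsf{T} A + \lambda I)^{-1} A^\mathsf{T} \mathbf{b}$, to a non-square, badly conditioned system, with $X$ replaced by $A$ and $y$ by $\mathbf{b}$.

```python
import numpy as np

# Illustrative sketch: Tikhonov-regularized solution of an ill-conditioned,
# non-square system A x = b, using x = (A^T A + lam * I)^{-1} A^T b.
rng = np.random.default_rng(1)

A = rng.normal(size=(50, 3))                    # 50 equations, 3 unknowns (non-square)
A[:, 2] = A[:, 1] + 1e-8 * rng.normal(size=50)  # two nearly identical columns
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=50)

def tikhonov_solve(A, b, lam):
    """Regularized solution for a given Tikhonov factor lam."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # unregularized least squares
x_reg   = tikhonov_solve(A, b, 1e-2)             # damped, well-conditioned solve

print("cond(A^T A):", np.linalg.cond(A.T @ A))   # huge for near-collinear columns
print("least squares:", x_lstsq)
print("Tikhonov     :", x_reg)
```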
