Leverage (statistics)


In statistics, and in particular in regression analysis, leverage is a measure of how far the independent variable values of an observation are from those of the other observations. High-leverage points, if any, are outliers with respect to the independent variables. That is, high-leverage points have few or no neighboring points in $\mathbb {R} ^{p}$ space, where ${p}$ is the number of independent variables in a regression model. This makes the fitted model likely to pass close to a high-leverage observation. Hence high-leverage points have the potential to cause large changes in the parameter estimates when they are deleted, i.e., to be influential points. Although an influential point will typically have high leverage, a high-leverage point is not necessarily an influential point. The leverage of each observation is typically defined as the corresponding diagonal element of the hat matrix.
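The distinction between high leverage and influence can be illustrated with a small sketch. The data below are made up for illustration, and `fit_line` is a hypothetical helper that fits a straight line by ordinary least squares. The same far-away point (at $x=20$) is placed once on the trend of the other points and once off it; only in the second case does deleting it change the slope much.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data: four clustered x values plus one far away at x = 20.
xs = [1.0, 2.0, 3.0, 4.0, 20.0]
noise = [0.1, -0.1, -0.1, 0.1]

# Case A: the far point lies on the trend y = 1 + 2x of the cluster.
ys_on = [1 + 2 * x + e for x, e in zip(xs[:4], noise)] + [41.0]
# Case B: the far point lies well off that trend.
ys_off = ys_on[:4] + [20.0]

slope_without = fit_line(xs[:4], ys_on[:4])[1]  # far point deleted
slope_on = fit_line(xs, ys_on)[1]               # high leverage, not influential
slope_off = fit_line(xs, ys_off)[1]             # high leverage and influential

print(slope_without, slope_on, slope_off)
```

In both cases the far point has the same leverage, because leverage depends only on the $x$ values; what differs is whether its $y$ value agrees with the rest of the data.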

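Consider the linear regression model ${y}_{i}={\boldsymbol {x}}_{i}^{\top }{\boldsymbol {\beta }}+{\varepsilon }_{i}$ , $i=1,\,2,\ldots ,\,n$ . That is, ${\boldsymbol {y}}=\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }}$ , where $\mathbf {X}$ is the $n\times p$ design matrix whose rows correspond to the observations and whose columns correspond to the independent or explanatory variables. The leverage score for the ${i}^{th}$ observation ${\boldsymbol {x}}_{i}$ is given as:

$$h_{ii}=\left[\mathbf {H} \right]_{ii}={\boldsymbol {x}}_{i}^{\top }\left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}{\boldsymbol {x}}_{i},$$

the ${i}^{th}$ diagonal element of the hat matrix $\mathbf {H} =\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }$ .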
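As a minimal sketch, the leverage scores $h_{ii}={\boldsymbol {x}}_{i}^{\top }(\mathbf {X} ^{\top }\mathbf {X} )^{-1}{\boldsymbol {x}}_{i}$ can be computed with NumPy as below. The function name `leverages` and the example data are assumptions made for illustration, and $\mathbf {X}$ is assumed to have full column rank so that $\mathbf {X} ^{\top }\mathbf {X}$ is invertible.

```python
import numpy as np

def leverages(X):
    """Diagonal of H = X (X^T X)^{-1} X^T for a full-column-rank X."""
    X = np.asarray(X, dtype=float)
    # Solve (X^T X) B = X^T rather than forming an explicit inverse.
    B = np.linalg.solve(X.T @ X, X.T)  # shape (p, n)
    # h_ii = sum_j X[i, j] * B[j, i], i.e. the diagonal of X @ B.
    return np.einsum("ij,ji->i", X, B)

# Illustrative design matrix: an intercept column plus one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])

h = leverages(X)
print(h)  # the last observation (x = 20) has by far the largest leverage
```

Only the diagonal is computed here; forming the full $n\times n$ hat matrix is unnecessary when just the leverages are needed.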

Thus the ${i}^{th}$ leverage score can be viewed as the 'weighted' distance from ${\boldsymbol {x}}_{i}$ to the mean of the ${\boldsymbol {x}}_{i}$ 's (see its relation with Mahalanobis distance). It can also be interpreted as the degree to which the ${i}^{th}$ measured (dependent) value (i.e., $y_{i}$ ) influences the ${i}^{th}$ fitted (predicted) value (i.e., ${\widehat {y\,}}_{i}$ ): mathematically,

$$h_{ii}={\frac {\partial {\widehat {y\,}}_{i}}{\partial y_{i}}}.$$

Hence, the leverage score is also known as the observation self-sensitivity or self-influence. Using the fact that ${\boldsymbol {\widehat {y}}}={\mathbf {H} }{\boldsymbol {y}}$ (i.e., the prediction ${\boldsymbol {\widehat {y}}}$ is the orthogonal projection of ${\boldsymbol {y}}$ onto the range space of $\mathbf {X}$ ) in the above expression, we get $h_{ii}=\left[\mathbf {H} \right]_{ii}$ . Note that this leverage depends on the values of the explanatory variables $(\mathbf {X} )$ of all observations but not on any of the values of the dependent variable $(y_{i})$ .
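The self-sensitivity interpretation can be checked numerically. The sketch below uses made-up data and a hypothetical helper `fitted_values`: it shifts one response $y_{i}$ by a small amount, refits by least squares, and compares the resulting change in ${\widehat {y\,}}_{i}$ per unit shift with $h_{ii}$ taken from the hat matrix.

```python
import numpy as np

def fitted_values(X, y):
    """Least-squares fitted values X @ beta_hat."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta_hat

# Illustrative data: intercept plus one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([3.1, 4.9, 6.9, 9.1, 20.0])

# Hat matrix; it is built from X alone and never touches y.
H = X @ np.linalg.solve(X.T @ X, X.T)

i, delta = 4, 0.5
y_shift = y.copy()
y_shift[i] += delta  # perturb only the i-th response

# Change in the i-th fitted value per unit change in y_i.
sensitivity = (fitted_values(X, y_shift)[i] - fitted_values(X, y)[i]) / delta
print(sensitivity, H[i, i])
```

Because the fitted values are linear in ${\boldsymbol {y}}$, the finite difference agrees with $h_{ii}$ up to rounding for any size of shift.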

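The leverages sum to the number of parameters, $\sum _{i=1}^{n}h_{ii}=\operatorname {Tr} (\mathbf {H} )=p$ , where $\operatorname {Tr}$ is the trace operator.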
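For reference, a short derivation of two standard facts about the leverages is sketched here, assuming $\mathbf {X}$ has full column rank $p$. The sum follows from the cyclic property of the trace:

$$\sum _{i=1}^{n}h_{ii}=\operatorname {Tr} (\mathbf {H} )=\operatorname {Tr} \left(\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\right)=\operatorname {Tr} \left(\mathbf {X} ^{\top }\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\right)=\operatorname {Tr} (\mathbf {I} _{p})=p.$$

The bounds follow from $\mathbf {H}$ being symmetric and idempotent ( $\mathbf {H} ^{2}=\mathbf {H}$ ):

$$h_{ii}=\left[\mathbf {H} ^{2}\right]_{ii}=h_{ii}^{2}+\sum _{j\neq i}h_{ij}^{2}\geq h_{ii}^{2}\quad \Longrightarrow \quad 0\leq h_{ii}\leq 1.$$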

Large leverage ${h_{ii}}$ corresponds to an ${{\boldsymbol {x}}_{i}}$ that is extreme. A common rule is to flag any ${{\boldsymbol {x}}_{i}}$ whose leverage value ${h}_{ii}$ exceeds twice the mean leverage ${\bar {h}}={\dfrac {1}{n}}\sum _{i=1}^{n}h_{ii}={\dfrac {p}{n}}$ (the leverages sum to $p$ ). That is, if $h_{ii}>2{\dfrac {p}{n}}$ , ${{\boldsymbol {x}}_{i}}$ is considered an outlier with respect to the explanatory variables. Some statisticians prefer the threshold of $3p/{n}$ instead of $2p/{n}$ .
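The rule of thumb can be sketched as follows. The function name `flag_high_leverage` and the data are assumptions made for illustration; here $p$ is taken to be the number of columns of the design matrix, including the intercept column.

```python
import numpy as np

def flag_high_leverage(X, multiplier=2.0):
    """Return (leverages, cutoff, indices with h_ii > multiplier * p / n)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape  # p counts every column of X, intercept included
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    cutoff = multiplier * p / n
    return h, cutoff, np.flatnonzero(h > cutoff)

# Illustrative data: intercept plus one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])

h, cutoff2, flagged2 = flag_high_leverage(X, multiplier=2.0)
_, cutoff3, flagged3 = flag_high_leverage(X, multiplier=3.0)
print(cutoff2, flagged2)  # cutoff 2p/n
print(cutoff3, flagged3)  # cutoff 3p/n
```

With only five observations, $3p/n=1.2$ exceeds the maximum possible leverage of 1, so the stricter cutoff cannot flag anything; these thresholds are mainly useful when $n$ is large relative to $p$.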

Leverage is closely related to the Mahalanobis distance. Specifically, for some $n\times p$ matrix $\mathbf {X}$ , the squared Mahalanobis distance of ${{\boldsymbol {x}}_{i}}$ (where ${\boldsymbol {x}}_{i}^{\top }$ is the ${i}^{th}$ row of $\mathbf {X}$ ) from the mean vector ${\widehat {\boldsymbol {\mu }}}={\dfrac {1}{n}}\sum _{i=1}^{n}{\boldsymbol {x}}_{i}$ of length $p$ is $D^{2}({\boldsymbol {x}}_{i})=({\boldsymbol {x}}_{i}-{\widehat {\boldsymbol {\mu }}})^{\top }\mathbf {S} ^{-1}({\boldsymbol {x}}_{i}-{\widehat {\boldsymbol {\mu }}})$ , where $\mathbf {S} ={\dfrac {1}{n-1}}\sum _{i=1}^{n}({\boldsymbol {x}}_{i}-{\widehat {\boldsymbol {\mu }}})({\boldsymbol {x}}_{i}-{\widehat {\boldsymbol {\mu }}})^{\top }$ is the estimated covariance matrix of the ${{\boldsymbol {x}}_{i}}$ 's. This is related to the leverage $h_{ii}$ of the hat matrix of $\mathbf {X}$ after appending a column vector of 1's to it. The relationship between the two is:

$$D^{2}({\boldsymbol {x}}_{i})=(n-1)\left(h_{ii}-{\frac {1}{n}}\right).$$
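The identity $D^{2}({\boldsymbol {x}}_{i})=(n-1)\left(h_{ii}-{\tfrac {1}{n}}\right)$ can be checked numerically. The sketch below uses randomly generated data (an assumption for illustration only), the sample mean, and the sample covariance with divisor $n-1$, and computes the leverages from $\mathbf {X}$ with a column of ones appended.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = rng.normal(size=(n, p))  # synthetic explanatory variables, no intercept column

# Squared Mahalanobis distance of each row from the sample mean.
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)  # sample covariance, divisor n - 1
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)

# Leverages of the design matrix with an intercept column appended.
X1 = np.column_stack([np.ones(n), X])
h = np.einsum("ij,ji->i", X1, np.linalg.solve(X1.T @ X1, X1.T))

# Largest absolute discrepancy between the two sides of the identity.
max_gap = np.max(np.abs(d2 - (n - 1) * (h - 1.0 / n)))
print(max_gap)
```

The two sides agree up to floating-point rounding, which also shows that the identity relies on the covariance estimate using the divisor $n-1$.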