Clustered standard errors
Clustered standard errors (or Liang–Zeger standard errors)[1] are measurements that estimate the standard error of a regression parameter in settings where observations may be subdivided into smaller-sized groups ("clusters") and where the sampling and/or treatment assignment is correlated within each group.[2][3] Clustered standard errors are widely used in a variety of applied econometric settings, including difference-in-differences[4] or experiments.[5]
Analogous to how Huber-White standard errors are consistent in the presence of heteroscedasticity and Newey–West standard errors are consistent in the presence of accurately-modeled autocorrelation, clustered standard errors are consistent in the presence of cluster-based sampling or treatment assignment. Clustered standard errors are often justified by possible correlation in modeling residuals within each cluster; while recent work suggests that this is not the precise justification behind clustering,[6] it may be pedagogically useful.
Intuitive motivation
Clustered standard errors are often useful when treatment is assigned at the level of a cluster instead of at the individual level. For example, suppose that an educational researcher wants to discover whether a new teaching technique improves student test scores. She therefore assigns teachers in "treated" classrooms to try this new technique, while leaving "control" classrooms unaffected. When analyzing her results, she may want to keep the data at the student level (for example, to control for student-level observable characteristics). However, when estimating the standard error or confidence interval of her statistical model, she realizes that classical or even heteroscedasticity-robust standard errors are inappropriate because student test scores within each class are not independently distributed. Instead, students in classes with better teachers have especially high test scores (regardless of whether they receive the experimental treatment) while students in classes with worse teachers have especially low test scores. The researcher can cluster her standard errors at the level of a classroom to account for this aspect of her experiment.[7]
While this example is very specific, similar issues arise in a wide variety of settings. For example, in many panel data settings (such as difference-in-differences) clustering often offers a simple and effective way to account for non-independence between periods within each unit (sometimes referred to as "autocorrelation in residuals").[4] Another common and logically distinct justification for clustering arises when a full population cannot be randomly sampled, and so instead clusters are sampled and then units are randomized within cluster. In this case, clustered standard errors account for the uncertainty driven by the fact that the researcher does not observe large parts of the population of interest.[8]
Mathematical motivation
A useful mathematical illustration comes from the case of one-way clustering in an ordinary least squares (OLS) model. Consider a simple model with $N$ observations that are subdivided into $C$ clusters. Let $Y$ be an $N \times 1$ vector of outcomes, $X$ an $N \times K$ matrix of covariates, $\beta$ a $K \times 1$ vector of unknown parameters, and $e$ an $N \times 1$ vector of unexplained residuals:

$$Y = X\beta + e$$
As is standard with OLS models, we minimize the sum of squared residuals to get an estimate $\hat{\beta}$:

$$\hat{\beta} = \arg\min_{\beta}\,(Y - X\beta)'(Y - X\beta) = (X'X)^{-1}X'Y$$
From there, we can derive the classic "sandwich" estimator:

$$V(\hat{\beta}) = V\big((X'X)^{-1}X'e\big) = (X'X)^{-1}X'\,E[ee']\,X(X'X)^{-1}$$
Denoting $\Omega \equiv E[ee']$ yields a potentially more familiar form:

$$V(\hat{\beta}) = (X'X)^{-1}X'\Omega X(X'X)^{-1}$$
While one can develop a plug-in estimator by defining $\hat{e} \equiv Y - X\hat{\beta}$ and letting $\hat{\Omega} \equiv \hat{e}\hat{e}'$, this completely flexible estimator will not converge to $V(\hat{\beta})$ as $N \to \infty$. Given the assumptions that a practitioner deems reasonable, different types of standard errors solve this problem in different ways. For example, classic homoskedastic standard errors assume that $\Omega$ is diagonal with identical elements $\sigma^2$, which simplifies the expression for $V(\hat{\beta})$ to $\sigma^2(X'X)^{-1}$. Huber-White standard errors assume that $\Omega$ is diagonal but that the diagonal values vary, while other types of standard errors (e.g. Newey–West, Moulton SEs, Conley spatial SEs) make other restrictions on the form of this matrix to reduce the number of parameters that the practitioner needs to estimate.
Clustered standard errors assume that $\Omega$ is block-diagonal according to the clusters in the sample, with unrestricted values in each block but zeros elsewhere. In this case, one can define $X_c$ and $e_c$ as the within-block analogues of $X$ and $e$ and derive the following mathematical fact:

$$V(\hat{\beta}) = (X'X)^{-1}\left(\sum_{c} X_c'\,\Omega_c\,X_c\right)(X'X)^{-1}$$

where $\Omega_c \equiv E[e_c e_c']$ is the covariance block for cluster $c$.
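The contrast between these covariance structures can be made concrete for a toy sample. The sketch below (in Python with NumPy, with purely illustrative values for the variances and the within-cluster correlation) constructs the homoskedastic, heteroskedastic, and block-diagonal clustered forms of $\Omega$ for six observations in three clusters:

```python
import numpy as np

# Toy setting: N = 6 observations in C = 3 clusters of size 2.
cluster_ids = np.array([0, 0, 1, 1, 2, 2])
sigma2 = 1.5                                       # common error variance
var_i = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])   # observation-specific variances
rho = 0.8                                          # within-cluster correlation

N = len(cluster_ids)

# Homoskedastic: Omega = sigma^2 * I (diagonal with identical elements).
omega_homo = sigma2 * np.eye(N)

# Heteroskedastic (Huber-White): diagonal, but the diagonal values vary.
omega_hetero = np.diag(var_i)

# Clustered: block-diagonal, unrestricted within blocks, zeros elsewhere.
same_cluster = cluster_ids[:, None] == cluster_ids[None, :]
omega_cluster = np.where(same_cluster, rho, 0.0)
np.fill_diagonal(omega_cluster, 1.0)

# Off-block entries are exactly zero; within-block entries are not.
assert omega_cluster[0, 2] == 0.0 and omega_cluster[0, 1] == rho
```

Each matrix is a special case of the general $\Omega$; the clustered form is the least restrictive of the three, which is why it requires more clusters (more "independent" blocks) for the plug-in estimator to behave well.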
By constructing plug-in matrices $\hat{\Omega}_c \equiv \hat{e}_c\hat{e}_c'$, one can form an estimator for $V(\hat{\beta})$ that is consistent as the number of clusters $C$ becomes large. While no specific number of clusters is statistically proven to be sufficient, practitioners often cite a number in the range of 30–50 and are comfortable using clustered standard errors when the number of clusters exceeds that threshold.
Alternatively, finite-sample corrections are typically applied to reduce the downward bias due to finite $C$.[9] Practitioners often use the following bias-corrected estimator:

$$V_{\text{cluster}}(\hat{\beta}) = \frac{C}{C-1}\,\frac{N-1}{N-K}\,(X'X)^{-1}\left(\sum_{c} X_c'\,\hat{e}_c\hat{e}_c'\,X_c\right)(X'X)^{-1}$$
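The plug-in estimator with the finite-sample factor $\frac{C}{C-1}\frac{N-1}{N-K}$ can be computed directly. The following is a sketch in Python with NumPy on a small synthetic dataset (the data-generating process is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: C = 20 clusters of 5 observations each.
C, n_c = 20, 5
N = C * n_c
cluster = np.repeat(np.arange(C), n_c)
x = rng.normal(size=N)
# Errors share a cluster-level shock, so Omega is block-diagonal.
e = rng.normal(size=C)[cluster] + rng.normal(size=N)
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(N), x])      # N x K design matrix
K = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y              # OLS estimate
resid = y - X @ beta_hat

# Meat: sum over clusters of (X_c' e_c)(X_c' e_c)'.
meat = np.zeros((K, K))
for c in range(C):
    idx = cluster == c
    s_c = X[idx].T @ resid[idx]           # K-vector X_c' e_c
    meat += np.outer(s_c, s_c)

# Bias-corrected sandwich: (C/(C-1)) ((N-1)/(N-K)) bread * meat * bread.
correction = (C / (C - 1)) * ((N - 1) / (N - K))
V_cluster = correction * XtX_inv @ meat @ XtX_inv
se_cluster = np.sqrt(np.diag(V_cluster))
```

Here `se_cluster[1]` is the clustered standard error on the slope; dropping the `correction` factor recovers the uncorrected plug-in estimator.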
However, more recent practice has shifted towards analogues of the heteroscedasticity-robust HC2 and HC3 estimators.[9] Often called the CR2 and CR3 estimators, these are unbiased under certain assumptions. They have also been shown, especially when combined with degrees-of-freedom corrections for use in building confidence intervals, to produce better coverage rates when the number of clusters is not large.[10][11]
Further reading
[edit]- Alberto Abadie, Susan Athey, Guido W Imbens, and Jeffrey M Wooldridge. 2022. "When Should You Adjust Standard Errors for Clustering?" Quarterly Journal of Economics.
References
[edit]- ^ Liang, Kung-Yee; Zeger, Scott L. (1986-04-01). "Longitudinal data analysis using generalized linear models". Biometrika. 73 (1): 13–22. doi:10.1093/biomet/73.1.13. ISSN 0006-3444.
- ^ Cameron, A. Colin; Miller, Douglas L. (2015-03-31). "A Practitioner's Guide to Cluster-Robust Inference". Journal of Human Resources. 50 (2): 317–372. CiteSeerX 10.1.1.703.724. doi:10.3368/jhr.50.2.317. ISSN 0022-166X. S2CID 1296789.
- ^ "ARE 212". Fiona Burlig. Retrieved 2020-07-05.
- ^ a b Bertrand, Marianne; Duflo, Esther; Mullainathan, Sendhil (2004-02-01). "How Much Should We Trust Differences-In-Differences Estimates?". The Quarterly Journal of Economics. 119 (1): 249–275. doi:10.1162/003355304772839588. hdl:1721.1/63690. ISSN 0033-5533. S2CID 470667.
- ^ Yixin Tang (2019-09-11). "Analyzing Switchback Experiments by Cluster Robust Standard Error to prevent false positive results". DoorDash Engineering Blog. Retrieved 2020-07-05.
- ^ Abadie, Alberto; Athey, Susan; Imbens, Guido; Wooldridge, Jeffrey (2017-10-24). "When Should You Adjust Standard Errors for Clustering?". arXiv:1710.02926 [math.ST].
- ^ "CLUSTERED STANDARD ERRORS". Economic Theory Blog. 2016. Archived from the original on 2016-11-06. Retrieved 28 September 2021.
- ^ "When should you cluster standard errors? New wisdom from the econometrics oracle". blogs.worldbank.org. Retrieved 2020-07-05.
- ^ a b "A Practitioner's Guide to Cluster-Robust Inference" (PDF). UC Davis - Economics. Retrieved 2024-07-04.
- ^ Bell, Robert M; McCaffrey, Daniel F (December 2002). "Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples" (PDF). Survey Methodology. 28 (2): 169–181.
- ^ Imbens, Guido W; Kolesár, Michal (October 2012). "Robust Standard Errors in Small Samples: Some Practical Advice". NBER Working Paper No. w18478.
Clustered standard errors
Clustered standard errors are implemented in most statistical packages, for example via Stata's vce(cluster clustvar) option or R's sandwich package.[2]
Background on Regression Standard Errors
Ordinary Least Squares Standard Errors
In ordinary least squares (OLS) regression, standard errors quantify the precision of the coefficient estimates by measuring the variability around the point estimates of the regression parameters.[4] These estimates arise from minimizing the sum of squared residuals in the linear model $y = X\beta + \varepsilon$, where $y$ is the response vector, $X$ is the design matrix, $\beta$ is the parameter vector, and $\varepsilon$ is the error term.[5] The classical linear regression model underpinning OLS relies on key assumptions about the error term: homoskedasticity, meaning the errors have constant variance across all levels of the independent variables, and independence, implying no correlation between errors for different observations. These assumptions ensure that the OLS estimator is unbiased and has minimum variance among linear unbiased estimators, as established by the Gauss-Markov theorem.[6] Under these assumptions, the variance-covariance matrix of the OLS estimator is given by

$$\operatorname{Var}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1},$$

where $\hat{\sigma}^2$ denotes the error variance, estimated from the residuals as $\hat{\sigma}^2 = \hat{\varepsilon}'\hat{\varepsilon}/(n - k)$, with $n$ observations and $k$ parameters.[4] The individual standard errors are then obtained as the square roots of the diagonal elements of this matrix, providing the basis for inference such as t-tests on the coefficients.[5] This framework traces its origins to the Gauss-Markov theorem, first articulated by Carl Friedrich Gauss in the 1820s, which demonstrated the optimality of least squares under the specified assumptions, later formalized in modern econometrics.[7]

Violations of Independence in Data
In ordinary least squares (OLS) regression, the standard errors assume that observations are independent and identically distributed, meaning the error terms for different observations are uncorrelated. However, real-world data often violate this independence assumption due to inherent dependencies among observations.[1] Common forms of dependence include spatial correlation, where observations in geographic proximity exhibit correlated errors, such as property values in neighboring areas influenced by shared local factors like zoning policies. Temporal correlation arises in time series data, where past errors predict future ones, as seen in economic indicators like GDP growth that carry over shocks from previous periods. Group-level correlation occurs when units within the same cluster, such as employees' wages within a firm or students in a classroom, share unobserved shocks, like industry-wide policy changes affecting firm-level decisions.[8][9][10]

Ignoring these dependencies leads to severe inferential problems. Standard errors are typically underestimated, resulting in inflated t-statistics and a higher likelihood of falsely rejecting null hypotheses, which can produce spurious evidence of statistical significance. For instance, in simulations with clustered data, failure to account for within-group correlation can increase Type I error rates to nearly 50%, compared to the nominal 5%.[1][11]

Data structures particularly prone to such violations include panel data, which track the same units over time and often feature both temporal and group dependencies; clustered sampling designs, where sampling occurs in groups like households or firms, inducing intra-cluster correlation; and experimental setups with grouped treatments, such as randomized trials at the school level where student outcomes correlate within schools.
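The understatement of naive standard errors under within-cluster correlation can be illustrated with a small simulation, sketched here in Python with NumPy (the data-generating process, with a large shared cluster shock and treatment assigned at the cluster level, is an assumption chosen to make the effect pronounced):

```python
import numpy as np

rng = np.random.default_rng(42)

# 30 clusters of 10 observations; treatment assigned at the cluster level.
C, n_c = 30, 10
N = C * n_c
cluster = np.repeat(np.arange(C), n_c)
treat = (np.arange(C) % 2)[cluster].astype(float)   # half the clusters treated

# Errors = large shared cluster shock + small idiosyncratic noise.
e = 2.0 * rng.normal(size=C)[cluster] + 0.5 * rng.normal(size=N)
y = 0.0 * treat + e                                  # true treatment effect is zero

X = np.column_stack([np.ones(N), treat])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Naive (homoskedastic) variance: sigma^2 (X'X)^{-1}.
sigma2 = resid @ resid / (N - 2)
se_naive = np.sqrt(np.diag(sigma2 * XtX_inv))

# Cluster-robust sandwich variance.
meat = np.zeros((2, 2))
for c in range(C):
    idx = cluster == c
    s = X[idx].T @ resid[idx]
    meat += np.outer(s, s)
V_cl = XtX_inv @ meat @ XtX_inv
se_cluster = np.sqrt(np.diag(V_cl))

# With a strong cluster shock and cluster-level treatment, se_cluster[1]
# is typically several times larger than se_naive[1].
```

Comparing `se_naive[1]` and `se_cluster[1]` shows how much precision the naive formula overstates in this design; t-statistics built on the naive standard error would reject the (true) null far more often than the nominal rate.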
Empirical evidence from labor economics underscores these issues: in analyses of firm-clustered wage data, unadjusted OLS standard errors can be biased downward by factors of 3 to 10, as demonstrated in studies using matched employer-employee datasets, leading to overstated precision in estimates of labor market effects.[1][12][13]

Motivation for Clustering
Intuitive Rationale
In regression analysis, observations are often not fully independent due to grouping structures in the data, such as shared environmental or social factors that influence outcomes similarly within groups. An intuitive analogy for this is comparing independent observations to individual apples picked randomly from various trees, where each apple's quality varies independently; in contrast, clustered observations resemble apples harvested from the same orchard, where a weather shock like a late frost affects all apples in that group similarly, leading to correlated qualities across them but independence between orchards. This grouping violates the assumption of independent errors in standard regression models, as unmeasured factors (e.g., regional climate) create within-group correlations.[14] Real-world data frequently exhibits such clustering; for instance, in educational surveys, student test scores within the same classroom may correlate due to shared unobservables like teacher quality or classroom resources, while scores across different classrooms are more independent. Similarly, household survey responses on income or health can cluster by family unit, where siblings' outcomes are influenced by common factors such as parental education or neighborhood effects. Ignoring these dependencies treats all observations as equally informative, which overstates the precision of estimates.[1] Clustered standard errors address this by adjusting for within-cluster similarities, effectively treating each cluster as a single unit for variance estimation rather than each observation independently; this widens the standard errors to reflect the true uncertainty, preventing overconfidence in statistical significance and reducing the risk of false positives in hypothesis tests. 
In the student test score example, suppose a regression estimates the effect of study hours on exam performance across 50 students in 5 classrooms; without clustering, the standard error on the coefficient might be unrealistically narrow, yielding a tight 95% confidence interval and a misleadingly low p-value, suggesting strong evidence of an effect. With clustering by classroom, the standard error inflates, producing a wider interval and a higher p-value, better capturing the reduced information from correlated errors within groups.[15][1]

Statistical Foundations
Clustered standard errors arise within the framework of robust inference for regression models where observations are not independent, particularly when data exhibit dependence within predefined groups or clusters. The sandwich estimator, originally developed for heteroskedasticity-consistent covariance matrices, forms the core of this approach by providing a general structure for estimating the variance-covariance matrix of parameter estimates that remains consistent under violations of standard assumptions.[16] Clustering matters statistically because unobserved correlations within clusters lead to an understatement of the true variability in parameter estimates if conventional standard errors are used; this intra-cluster dependence effectively reduces the number of independent observations, increasing the effective variance and necessitating adjustments to the covariance matrix. The cluster-robust version of the sandwich estimator imposes a block-diagonal structure on the middle component of the covariance matrix, assuming independence across clusters while allowing arbitrary dependence within them, which captures this group-level correlation without requiring a fully specified dependence model.[17][1] These corrected standard errors are essential for valid hypothesis testing, as they ensure that t-tests, F-tests, and confidence intervals maintain their nominal coverage probabilities under group dependence, preventing inflated Type I error rates that would otherwise occur with naive standard errors. Extensions of the Central Limit Theorem for clustered data underpin this validity, justifying asymptotic normality of the estimators when the number of clusters grows large, even with fixed or growing cluster sizes. This theoretical foundation, formalized in early work on cluster-robust inference, supports the widespread use of these methods in empirical research.[17][1]

Mathematical Formulation
Model Assumptions and Clustering
The standard linear regression model for clustered data is given by $y = X\beta + \varepsilon$, where $y$ is an $n \times 1$ vector of outcomes, $X$ is an $n \times k$ matrix of regressors, $\beta$ is a $k \times 1$ vector of parameters, and $\varepsilon$ is an $n \times 1$ vector of errors, with the assumption that $E[\varepsilon \mid X] = 0$.[1] Unlike the classical model, the variance-covariance matrix $\Omega = E[\varepsilon\varepsilon' \mid X]$ is permitted to exhibit a block-diagonal structure corresponding to clusters, allowing for arbitrary correlations within clusters while assuming independence across them. This setup relaxes the classical assumption of full independence across all observations, replacing it with the condition that errors are independent between clusters but may be arbitrarily correlated within each cluster.[1] Specifically, for observations $i$ and $j$ in different clusters, $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$, whereas within-cluster covariances can take any form, including heteroskedasticity or serial correlation. Clusters are defined as groups indexed by $g = 1$ to $G$, each containing $n_g$ observations, where the total sample size is $n = \sum_{g=1}^{G} n_g$, and clusters often represent natural groupings such as firms, schools, or geographic units.[1] The ordinary least squares (OLS) estimator $\hat{\beta} = (X'X)^{-1}X'y$ remains unbiased and consistent under the maintained exogeneity assumption $E[\varepsilon \mid X] = 0$, but its efficiency is generally lower due to within-cluster dependence, necessitating adjusted standard errors for valid inference. Without such adjustments, conventional standard errors underestimate the true variability, leading to overly narrow confidence intervals and inflated test statistics.[1]

Variance-Covariance Matrix Derivation
The cluster-robust variance-covariance matrix for the ordinary least squares (OLS) estimator in a linear regression model accounts for potential correlation of errors within predefined clusters while assuming independence across clusters. This estimator takes the form of a sandwich covariance matrix,

$$\hat{V}_{\text{CR}}(\hat{\beta}) = (X'X)^{-1}\,\hat{B}\,(X'X)^{-1},$$

where $X$ is the design matrix and the "meat" is constructed as

$$\hat{B} = \sum_{g=1}^{G} X_g'\,\hat{\varepsilon}_g\hat{\varepsilon}_g'\,X_g.$$

Here, the data are partitioned into $G$ clusters, with $X_g$ and $\hat{\varepsilon}_g$ denoting the submatrices of regressors and OLS residuals corresponding to cluster $g$, respectively. This form extends the heteroskedasticity-consistent estimator of White (1980) to allow for intra-cluster dependence, as originally proposed by Liang and Zeger (1986) for generalized estimating equations and adapted to OLS in econometric applications.

To derive this estimator, begin with the asymptotic variance of $\hat{\beta}$. Under standard OLS assumptions except for error independence (specifically, $E[\varepsilon \mid X] = 0$ and $\operatorname{Var}(\varepsilon \mid X) = \Omega$, where $\Omega$ is block-diagonal with cluster-specific blocks $\Omega_g$), the variance is

$$\operatorname{Var}(\hat{\beta}) = (X'X)^{-1}\left(\sum_{g=1}^{G} X_g'\,\Omega_g\,X_g\right)(X'X)^{-1}.$$

The influence of each cluster on the estimator aggregates the observation-level contributions: observation $i$ in cluster $g$ contributes $x_i\varepsilon_i$, so the cluster-level score is $s_g = X_g'\varepsilon_g$, and the middle matrix captures the variance of these cluster scores, assuming independence across $g$. The cluster-robust estimator replaces the expectations with sample analogs using OLS residuals $\hat{\varepsilon}_g$, yielding the plug-in meat $\hat{B}$ above, which is consistent under the stated assumptions.[1]

For finite samples, particularly when the number of clusters $G$ is small, the conventional cluster-robust estimator can understate the variance due to downward bias, leading to overstated precision. Adjustments mitigate this; for example, a simple degrees-of-freedom correction multiplies the estimator by $\frac{G}{G-1}\frac{n-1}{n-k}$, and more sophisticated versions like the CR2 estimator of Bell and McCaffrey (2002) scale the residuals upward using $(I - H_{gg})^{-1/2}$, where $H_{gg} = X_g(X'X)^{-1}X_g'$ is the within-cluster projection matrix.
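The CR2 adjustment of Bell and McCaffrey (2002), which rescales each cluster's residuals by $(I - H_{gg})^{-1/2}$, can be sketched as follows in Python with NumPy (the inverse symmetric square root is computed by eigendecomposition, and the toy data are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Small sample: G = 8 clusters of 4 observations, where small-G bias matters.
G, n_g = 8, 4
N = G * n_g
cluster = np.repeat(np.arange(G), n_g)
x = rng.normal(size=N)
y = 1.0 + 0.5 * x + rng.normal(size=G)[cluster] + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

def inv_sqrt_sym(M):
    """Inverse symmetric square root via eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

meat = np.zeros((2, 2))
for g in range(G):
    idx = cluster == g
    X_g, e_g = X[idx], resid[idx]
    H_gg = X_g @ XtX_inv @ X_g.T              # within-cluster projection block
    A_g = inv_sqrt_sym(np.eye(n_g) - H_gg)    # CR2 residual adjustment
    s_g = X_g.T @ (A_g @ e_g)                 # adjusted cluster score
    meat += np.outer(s_g, s_g)

V_cr2 = XtX_inv @ meat @ XtX_inv
se_cr2 = np.sqrt(np.diag(V_cr2))
```

Because the eigenvalues of $I - H_{gg}$ lie below one, the adjustment inflates the residuals, counteracting the mechanical shrinkage of OLS residuals that biases the uncorrected estimator downward.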
Alternatively, $G - 1$ degrees of freedom can be used for t-tests in place of $n - k$ when $G$ is small. Asymptotically, the cluster-robust estimator is consistent if the number of clusters $G \to \infty$, regardless of whether individual cluster sizes grow, provided the usual regularity conditions hold (e.g., bounded moments and no dominant clusters). Consistency also obtains under fixed $G$ with $n_g \to \infty$ for each $g$, as the intra-cluster dependence is consistently estimated within blocks, though the $G \to \infty$ case is more standard in econometric applications with many small clusters. These properties ensure valid inference even when errors are arbitrarily correlated within clusters.[1]

Estimation and Implementation
Defining Clusters
In clustered standard errors, clusters are defined based on the underlying source of correlation among observations, typically arising from shared unobservables such as geographic proximity, temporal dependencies, or organizational structures that induce within-group error correlations.[18][1] For instance, observations may be clustered by geography (e.g., states or villages) when policy interventions affect entire regions similarly, by time periods (e.g., years) when macroeconomic shocks influence multiple units contemporaneously, or by organizations (e.g., firms or households) when internal factors create persistent dependencies.[18][19] Common cluster levels vary by field and data structure. In corporate finance, firm-level clustering is standard to account for time-series correlations within firms due to unobserved firm-specific effects.[19] In policy analysis, state-level clustering often captures geographic spillovers or common regulatory environments.[1] Multi-level clustering is appropriate for hierarchical data, such as students nested within classrooms and classrooms within schools, where errors correlate at multiple tiers due to shared instructional or institutional factors.[1] Practitioners should follow rules of thumb to ensure reliable inference. A minimum of 50 clusters is generally recommended to approximate asymptotic properties and avoid severe downward bias in standard errors, though fewer may suffice in balanced designs with low intra-cluster correlation.[1] Over-clustering, such as defining clusters at highly aggregated levels with too few effective groups (e.g., fewer than 20), can lead to conservative estimates and reduced power; instead, cluster at the level where correlations are strongest without unnecessarily coarsening the data.[19][1] In software implementations, cluster variables are specified to compute the variance-covariance matrix accordingly. 
In Stata, the cluster() option in regress or xtreg allows one-way clustering by a variable (e.g., regress y x, cluster(firm_id)), while multi-way options like cgmreg handle multiple dimensions.[1] In R, the sandwich package's vcovCL() function enables clustered variance estimation, as in vcovCL(lm(y ~ x), cluster = ~firm_id).[1] The choice of cluster definition directly influences the estimated variance matrix, underscoring the need for theoretically motivated selection.[18]
Computational Algorithms
The computation of clustered standard errors typically follows the sandwich estimator framework, adapted for clustering. First, an ordinary least squares (OLS) regression is fitted to the data to obtain the parameter estimates $\hat{\beta}$ and residuals $\hat{\varepsilon}_i$ for each observation $i$. Second, cluster-specific contributions to the variance are calculated by computing the score vector $s_g = X_g'\hat{\varepsilon}_g$ for each cluster $g$ and forming its outer product $s_g s_g'$. Third, these outer products are summed across all clusters to form the "meat" matrix $\hat{B} = \sum_{g=1}^{G} s_g s_g'$, which is then sandwiched between two "bread" matrices $(X'X)^{-1}$ to yield the robust variance-covariance matrix $(X'X)^{-1}\hat{B}(X'X)^{-1}$. Finite-sample bias corrections, such as multiplying the meat by $\frac{G}{G-1}$, are often applied to improve small-cluster performance.[1]

Pseudocode for implementing the one-way clustered sandwich estimator in a matrix-oriented language like R might proceed as follows:

# Assume data: y (n x 1), X (n x k, including intercept), cluster_ids (n x 1 vector of cluster labels)
# Step 1: Fit OLS
beta_hat = solve(t(X) %*% X) %*% t(X) %*% y   # or use lm()
epsilon_hat = y - X %*% beta_hat

# Step 2: Compute cluster-specific sums of outer products
unique_clusters = unique(cluster_ids)
G = length(unique_clusters)
k = ncol(X)
meat = matrix(0, k, k)
for (g in unique_clusters) {
  idx_g = which(cluster_ids == g)
  X_g = X[idx_g, , drop = FALSE]              # keep matrix shape for 1-row clusters
  eps_g = epsilon_hat[idx_g]
  score_g = colSums(X_g * eps_g)              # k-vector cluster score X_g' e_g
  meat = meat + tcrossprod(score_g)           # outer product s_g s_g'
}
meat = meat * (G / (G - 1))                   # simple small-G bias correction

# Step 3: Sandwich
bread = solve(t(X) %*% X)
vcov_cl = bread %*% meat %*% bread
se_cl = sqrt(diag(vcov_cl))
For large datasets, efficient implementations avoid materializing any $n \times n$ matrix: they group observations by cluster (e.g., with by or tapply in R) and aggregate the outer products in a single pass, reducing time complexity from the $O(n^2)$ cost of a naive full-covariance approach to near $O(nk^2)$, where $n$ is the number of observations and $k$ the number of parameters. Packages like R's sandwich leverage these techniques, supporting parallelization for bootstrap procedures.[21]
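The single-pass aggregation idea can be illustrated as follows (a sketch in Python with NumPy rather than R; the vectorized version uses np.add.at to accumulate per-cluster scores without an explicit loop over clusters, and the two approaches are checked against each other):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: n = 1000 observations, k = 3 regressors, 50 clusters.
n, k, G = 1000, 3, 50
cluster = rng.integers(0, G, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
resid = y - X @ (XtX_inv @ X.T @ y)

# Loop version: one boolean scan per cluster.
meat_loop = np.zeros((k, k))
for g in range(G):
    s_g = X[cluster == g].T @ resid[cluster == g]
    meat_loop += np.outer(s_g, s_g)

# Single-pass version: accumulate all cluster scores at once, then
# sum their outer products; no per-cluster scans over the data.
scores = np.zeros((G, k))
np.add.at(scores, cluster, X * resid[:, None])   # scores[g] += x_i * e_i
meat_fast = scores.T @ scores

assert np.allclose(meat_loop, meat_fast)
```

The single-pass version touches each observation once regardless of the number of clusters, which is the property that makes cluster-robust variance estimation cheap even on large panels.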
