Instrumental variables estimation
from Wikipedia

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment.[1] Intuitively, IVs are used when an explanatory (also known as independent or predictor) variable of interest is correlated with the error term (endogenous), in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable (is correlated with the endogenous variable) but has no independent effect on the dependent variable and is not correlated with the error term, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.

Instrumental variable methods allow for consistent estimation when the explanatory variables (covariates) are correlated with the error terms in a regression model. Such correlation may occur when:

  1. changes in the dependent variable change the value of at least one of the covariates ("reverse" causation),
  2. there are omitted variables that affect both the dependent and explanatory variables, or
  3. the covariates are subject to measurement error.

Explanatory variables that suffer from one or more of these issues in the context of a regression are sometimes referred to as endogenous. In this situation, ordinary least squares produces biased and inconsistent estimates.[2] However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation but is correlated with the endogenous explanatory variables, conditionally on the value of other covariates.

In linear models, there are two main requirements for using IVs:

  • The instrument must be correlated with the endogenous explanatory variables, conditionally on the other covariates. If this correlation is strong, then the instrument is said to have a strong first stage. A weak correlation may provide misleading inferences about parameter estimates and cause the second-stage standard errors to be larger than those from ordinary least squares.[3][4]
  • The instrument cannot be correlated with the error term in the explanatory equation, conditionally on the other covariates. In other words, the instrument cannot suffer from the same problem as the original predicting variable. If this condition is met, then the instrument is said to satisfy the exclusion restriction.

Example

Informally, in attempting to estimate the causal effect of some variable X ("covariate" or "explanatory variable") on another Y ("dependent variable"), an instrument is a third variable Z which affects Y only through its effect on X.

For example, suppose a researcher wishes to estimate the causal effect of smoking (X) on general health (Y).[5] Correlation between smoking and health does not imply that smoking causes poor health because other variables, such as depression, may affect both health and smoking, or because health may affect smoking. It is not possible to conduct controlled experiments on smoking status in the general population. The researcher may attempt to estimate the causal effect of smoking on health from observational data by using the tax rate for tobacco products (Z) as an instrument for smoking. The tax rate for tobacco products is a reasonable choice for an instrument because the researcher assumes that it can only be correlated with health through its effect on smoking. If the researcher then finds tobacco taxes and state of health to be correlated, this may be viewed as evidence that smoking causes changes in health.

History

The first use of an instrumental variable occurred in a 1928 book by Philip G. Wright, best known for his excellent description of the production, transport and sale of vegetable and animal oils in the early 1900s in the United States.[6][7] In 1945, Olav Reiersøl applied the same approach in the context of errors-in-variables models in his dissertation, giving the method its name.[8]

Wright attempted to determine the supply and demand for butter using panel data on prices and quantities sold in the United States. The idea was that a regression analysis could produce a demand or supply curve because they are formed by the path between prices and quantities demanded or supplied. The problem was that the observational data did not form a demand or supply curve as such, but rather a cloud of point observations that took different shapes under varying market conditions. It seemed that making deductions from the data remained elusive.

The problem was that price affected both supply and demand so that a function describing only one of the two could not be constructed directly from the observational data. Wright correctly concluded that he needed a variable that correlated with either demand or supply but not both – that is, an instrumental variable.

After much deliberation, Wright decided to use regional rainfall as his instrumental variable: he concluded that rainfall affected grass production and hence milk production and ultimately butter supply, but not butter demand. In this way he was able to construct a regression equation identified by the instrumental variable of rainfall.[9]

Formal definitions of instrumental variables, using counterfactuals and graphical criteria, were given by Judea Pearl in 2000.[10] Angrist and Krueger (2001) present a survey of the history and uses of instrumental variable techniques.[11] Notions of causality in econometrics, and their relationship with instrumental variables and other methods, are discussed by Heckman (2008).[12]

Theory

While the ideas behind IV extend to a broad class of models, a very common context for IV is in linear regression. Traditionally,[13] an instrumental variable is defined as a variable Z that is correlated with the independent variable X and uncorrelated with the "error term" U in the linear equation

$y = X\beta + U$

$y$ is a vector. $X$ is a matrix, usually with a column of ones and perhaps with additional columns for other covariates. Consider how an instrument allows $\beta$ to be recovered. Recall that OLS solves for $\widehat{\beta}$ such that $X^\mathsf{T}\widehat{\varepsilon} = 0$ (when we minimize the sum of squared errors, $\min_\beta (y - X\beta)^\mathsf{T}(y - X\beta)$, the first-order condition is exactly $X^\mathsf{T}(y - X\widehat{\beta}) = X^\mathsf{T}\widehat{\varepsilon} = 0$). If the true model is believed to have $\operatorname{cov}(X, U) \neq 0$ due to any of the reasons listed above—for example, if there is an omitted variable which affects both X and y separately—then this OLS procedure will not yield the causal impact of X on y. OLS will simply pick the parameter that makes the resulting errors appear uncorrelated with X.

Consider for simplicity the single-variable case. Suppose we are considering a regression with one variable and a constant (perhaps no other covariates are necessary, or perhaps we have partialed out any other relevant covariates):

$y = \alpha + \beta x + u$

In this case, the coefficient on the regressor of interest is given by $\widehat{\beta} = \operatorname{cov}(x, y)/\operatorname{var}(x)$. Substituting for $y$ gives

$\widehat{\beta} = \frac{\operatorname{cov}(x,\, \alpha + \beta x + u)}{\operatorname{var}(x)} = \beta + \frac{\operatorname{cov}(x, u)}{\operatorname{var}(x)},$

where $\beta$ is what the estimated coefficient would be if $\operatorname{cov}(x, u) = 0$. In this case, it can be shown that $\widehat{\beta}$ is an unbiased estimator of $\beta$. If $\operatorname{cov}(x, u) \neq 0$ in the underlying model that we believe, then OLS gives an inconsistent estimate which does not reflect the underlying causal effect of interest. IV helps to fix this problem by identifying the parameters not based on whether $x$ is uncorrelated with $u$, but based on whether another variable $z$ is uncorrelated with $u$. If theory suggests that $z$ is related to $x$ (the first stage) but uncorrelated with $u$ (the exclusion restriction), then IV may identify the causal parameter of interest where OLS fails. Because there are multiple specific ways of using and deriving IV estimators even in just the linear case (IV, 2SLS, GMM), we save further discussion for the Estimation section below.
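
To make the contrast concrete, here is a minimal simulation sketch, assuming Python with NumPy; the data-generating process, coefficient values, and variable names are illustrative assumptions rather than anything specified above. It compares the OLS slope cov(x, y)/var(x) with the IV slope cov(z, y)/cov(z, x) when x is endogenous.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Hypothetical data-generating process: x is endogenous because the unobserved
    # confounder c enters both x and y; z shifts x but has no direct effect on y.
    beta = 2.0
    z = rng.normal(size=n)                 # instrument
    c = rng.normal(size=n)                 # unobserved confounder
    x = 0.5 * z + c + rng.normal(size=n)   # endogenous regressor
    y = beta * x + c + rng.normal(size=n)  # outcome

    # OLS slope cov(x, y)/var(x): biased because cov(x, c) != 0
    beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

    # IV slope cov(z, y)/cov(z, x): consistent if z is relevant and excluded
    beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

    print(f"true beta = {beta}, OLS = {beta_ols:.3f}, IV = {beta_iv:.3f}")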

Graphical definition

IV techniques have been developed among a much broader class of non-linear models. General definitions of instrumental variables, using counterfactual and graphical formalism, were given by Pearl (2000; p. 248).[10] The graphical definition requires that Z satisfy the following conditions:

$(Z \perp\!\!\!\perp Y)_{G_{\overline{X}}} \qquad \text{and} \qquad (Z \not\perp\!\!\!\perp X)_{G}$

where $\perp\!\!\!\perp$ stands for d-separation and $G_{\overline{X}}$ stands for the graph in which all arrows entering X are cut off.

The counterfactual definition requires that Z satisfies

$Z \perp\!\!\!\perp Y_x \qquad \text{and} \qquad Z \not\perp\!\!\!\perp X$

where $Y_x$ stands for the value that Y would attain had X been x and $\perp\!\!\!\perp$ stands for independence.

If there are additional covariates W then the above definitions are modified so that Z qualifies as an instrument if the given criteria hold conditional on W.

The essence of Pearl's definition is:

  1. The equations of interest are "structural", not "regression".
  2. The error term U stands for all exogenous factors that affect Y when X is held constant.
  3. The instrument Z should be independent of U.
  4. The instrument Z should not affect Y when X is held constant (exclusion restriction).
  5. The instrument Z should not be independent of X.

These conditions do not rely on specific functional form of the equations and are applicable therefore to nonlinear equations, where U can be non-additive (see Non-parametric analysis). They are also applicable to a system of multiple equations, in which X (and other factors) affect Y through several intermediate variables. An instrumental variable need not be a cause of X; a proxy of such cause may also be used, if it satisfies conditions 1–5.[10] The exclusion restriction (condition 4) is redundant; it follows from conditions 2 and 3.

Selecting suitable instruments

Since U is unobserved, the requirement that Z be independent of U cannot be inferred from data and must instead be determined from the model structure, i.e., the data-generating process. Causal graphs are a representation of this structure, and the graphical definition given above can be used to quickly determine whether a variable Z qualifies as an instrumental variable given a set of covariates W. To see how, consider the following example.

Suppose that we wish to estimate the effect of a university tutoring program on grade point average (GPA). The relationship between attending the tutoring program and GPA may be confounded by a number of factors. Students who attend the tutoring program may care more about their grades or may be struggling with their work. This confounding is depicted in the Figures 1–3 on the right through the bidirected arc between Tutoring Program and GPA. If students are assigned to dormitories at random, the proximity of the student's dorm to the tutoring program is a natural candidate for being an instrumental variable.

However, what if the tutoring program is located in the college library? In that case, Proximity may also cause students to spend more time at the library, which in turn improves their GPA (see Figure 1). Using the causal graph depicted in the Figure 2, we see that Proximity does not qualify as an instrumental variable because it is connected to GPA through the path Proximity → Library Hours → GPA in $G_{\overline{X}}$. However, if we control for Library Hours by adding it as a covariate then Proximity becomes an instrumental variable, since Proximity is separated from GPA given Library Hours in $G_{\overline{X}}$.

Now, suppose that we notice that a student's "natural ability" affects his or her number of hours in the library as well as his or her GPA, as in Figure 3. Using the causal graph, we see that Library Hours is a collider and conditioning on it opens the path Proximity → Library Hours ← Natural Ability → GPA. As a result, Proximity cannot be used as an instrumental variable.

Finally, suppose that Library Hours does not actually affect GPA because students who do not study in the library simply study elsewhere, as in Figure 4. In this case, controlling for Library Hours still opens a spurious path from Proximity to GPA. However, if we do not control for Library Hours and remove it as a covariate then Proximity can again be used as an instrumental variable.

Estimation

We now revisit and expand upon the mechanics of IV in greater detail. Suppose the data are generated by a process of the form

$y_i = X_i \beta + e_i$

where

  • i indexes observations,
  • $y_i$ is the i-th value of the dependent variable,
  • $X_i$ is a vector of the i-th values of the independent variable(s) and a constant,
  • $e_i$ is the i-th value of an unobserved error term representing all causes of $y_i$ other than $X_i$, and
  • $\beta$ is an unobserved parameter vector.

The parameter vector $\beta$ is the causal effect on $y$ of a one unit change in each element of $X$, holding all other causes of $y$ constant. The econometric goal is to estimate $\beta$. For simplicity's sake assume the draws of e are uncorrelated and that they are drawn from distributions with the same variance (that is, that the errors are serially uncorrelated and homoskedastic).

Suppose also that a regression model of nominally the same form is proposed. Given a random sample of T observations from this process, the ordinary least squares estimator is

$\widehat{\beta}_{\mathrm{OLS}} = (X^\mathsf{T} X)^{-1} X^\mathsf{T} y = (X^\mathsf{T} X)^{-1} X^\mathsf{T} (X\beta + e) = \beta + (X^\mathsf{T} X)^{-1} X^\mathsf{T} e$

where y and e denote column vectors of length T and X denotes the matrix of regressors with T rows. This equation is similar to the equation involving $\operatorname{cov}(x, y)/\operatorname{var}(x)$ in the single-variable case above (this is the matrix version of that equation). When X and e are uncorrelated, under certain regularity conditions the second term has an expected value conditional on X of zero and converges to zero in the limit, so the estimator is unbiased and consistent. When X and the other unmeasured, causal variables collapsed into the e term are correlated, however, the OLS estimator is generally biased and inconsistent for β. In this case, it is valid to use the estimates to predict values of y given values of X, but the estimate does not recover the causal effect of X on y.

To recover the underlying parameter $\beta$, we introduce a set of variables Z that is highly correlated with each endogenous component of X but (in our underlying model) is not correlated with e. For simplicity, one might consider X to be a T × 2 matrix composed of a column of constants and one endogenous variable, and Z to be a T × 2 matrix consisting of a column of constants and one instrumental variable. However, this technique generalizes to X being a matrix of a constant and, say, 5 endogenous variables, with Z being a matrix composed of a constant and 5 instruments. In the discussion that follows, we will assume that X is a T × K matrix and leave this value K unspecified. An estimator in which X and Z are both T × K matrices is referred to as just-identified.

Suppose that the relationship between each endogenous component $x_i$ and the instruments is given by

$x_i = Z_i \gamma + v_i$

The most common IV specification uses the following estimator:

$\widehat{\beta}_{\mathrm{IV}} = (Z^\mathsf{T} X)^{-1} Z^\mathsf{T} y$

This specification approaches the true parameter as the sample gets large, so long as $\tfrac{1}{T} Z^\mathsf{T} e \to 0$ in the true model:

$\widehat{\beta}_{\mathrm{IV}} = (Z^\mathsf{T} X)^{-1} Z^\mathsf{T} y = (Z^\mathsf{T} X)^{-1} Z^\mathsf{T} (X\beta + e) = \beta + (Z^\mathsf{T} X)^{-1} Z^\mathsf{T} e \to \beta$

As long as $\operatorname{cov}(Z, e) = 0$ in the underlying process which generates the data, the appropriate use of the IV estimator will identify this parameter. This works because IV solves for the unique parameter that satisfies $Z^\mathsf{T}(y - X\widehat{\beta}) = 0$, and therefore hones in on the true underlying parameter as the sample size grows.
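
The matrix formula above can be illustrated with a small sketch (Python with NumPy assumed; the just-identified design, one constant plus one endogenous regressor instrumented by one excluded instrument, is a hypothetical example):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50_000

    # Hypothetical just-identified design: X holds a constant and one endogenous
    # column; Z holds a constant and one excluded instrument.
    u = rng.normal(size=n)                                  # structural error
    z1 = rng.normal(size=n)                                 # excluded instrument
    x1 = 1.0 + 0.8 * z1 + 0.6 * u + rng.normal(size=n)      # endogenous regressor
    X = np.column_stack([np.ones(n), x1])
    Z = np.column_stack([np.ones(n), z1])
    beta_true = np.array([0.5, 2.0])
    y = X @ beta_true + u

    # beta_IV = (Z'X)^{-1} Z'y, consistent because (1/T) Z'e -> 0
    beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

    # OLS, biased here because cov(x1, u) != 0
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

    print("IV: ", np.round(beta_iv, 3))
    print("OLS:", np.round(beta_ols, 3))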

Now an extension: suppose that there are more instruments than there are covariates in the equation of interest, so that Z is a T × M matrix with M > K. This is often called the over-identified case. In this case, the generalized method of moments (GMM) can be used. The GMM IV estimator is

$\widehat{\beta}_{\mathrm{GMM}} = (X^\mathsf{T} P_Z X)^{-1} X^\mathsf{T} P_Z y$

where $P_Z$ refers to the projection matrix $P_Z = Z (Z^\mathsf{T} Z)^{-1} Z^\mathsf{T}$.

This expression collapses to the first when the number of instruments is equal to the number of covariates in the equation of interest. The over-identified IV is therefore a generalization of the just-identified IV.

Proof that βGMM collapses to βIV in the just-identified case

Developing the $\widehat{\beta}_{\mathrm{GMM}}$ expression:

$\widehat{\beta}_{\mathrm{GMM}} = \left(X^\mathsf{T} Z (Z^\mathsf{T} Z)^{-1} Z^\mathsf{T} X\right)^{-1} X^\mathsf{T} Z (Z^\mathsf{T} Z)^{-1} Z^\mathsf{T} y$

In the just-identified case, we have as many instruments as covariates, so that the dimension of X is the same as that of Z. Hence, $X^\mathsf{T} Z$, $Z^\mathsf{T} Z$ and $Z^\mathsf{T} X$ are all square matrices of the same dimension. We can expand the inverse, using the fact that, for any invertible n-by-n matrices A and B, (AB)−1 = B−1A−1 (see Invertible matrix#Properties):

$\widehat{\beta}_{\mathrm{GMM}} = (Z^\mathsf{T} X)^{-1} (Z^\mathsf{T} Z) (X^\mathsf{T} Z)^{-1} X^\mathsf{T} Z (Z^\mathsf{T} Z)^{-1} Z^\mathsf{T} y = (Z^\mathsf{T} X)^{-1} Z^\mathsf{T} y = \widehat{\beta}_{\mathrm{IV}}$

Reference: see Davidson and MacKinnon (1993)[14]: 218
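
The algebra can also be checked numerically. The sketch below (Python with NumPy assumed, simulated data hypothetical) verifies that the projection-based formula reproduces $(Z^\mathsf{T} X)^{-1} Z^\mathsf{T} y$ when Z and X have the same number of columns:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 1_000, 2

    # Hypothetical just-identified data: X and Z are both n x k matrices.
    Z = np.column_stack([np.ones(n), rng.normal(size=n)])
    X = Z @ rng.normal(size=(k, k)) + rng.normal(size=(n, k))
    y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)                 # projection onto col(Z)
    beta_gmm = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)
    beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

    print(np.allclose(beta_gmm, beta_iv))                   # True: the two agree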

There is an equivalent under-identified estimator for the case where M < K. Since the parameters are the solutions to a set of linear equations, an under-identified model using the set of equations $Z^\mathsf{T}(y - X\widehat{\beta}) = 0$ does not have a unique solution.

Interpretation as two-stage least squares

One computational method which can be used to calculate IV estimates is two-stage least squares (2SLS or TSLS). In the first stage, each explanatory variable that is an endogenous covariate in the equation of interest is regressed on all of the exogenous variables in the model, including both exogenous covariates in the equation of interest and the excluded instruments. The predicted values from these regressions are obtained:

Stage 1: Regress each column of X on Z ($X = Z\delta + \text{errors}$):

$\widehat{\delta} = (Z^\mathsf{T} Z)^{-1} Z^\mathsf{T} X$

and save the predicted values:

$\widehat{X} = Z\widehat{\delta} = Z (Z^\mathsf{T} Z)^{-1} Z^\mathsf{T} X = P_Z X$

In the second stage, the regression of interest is estimated as usual, except that in this stage each endogenous covariate is replaced with the predicted values from the first stage:

Stage 2: Regress Y on the predicted values from the first stage:

$y = \widehat{X}\beta + \text{noise}$

which gives

$\widehat{\beta}_{\mathrm{2SLS}} = \left(\widehat{X}^\mathsf{T} \widehat{X}\right)^{-1} \widehat{X}^\mathsf{T} y$

This method is only valid in linear models. For categorical endogenous covariates, one might be tempted to use a different first stage than ordinary least squares, such as a probit model for the first stage followed by OLS for the second. This is commonly known in the econometric literature as the forbidden regression,[15] because second-stage IV parameter estimates are consistent only in special cases.[16]

Proof: computation of the 2SLS estimator

The usual OLS estimator is $\left(\widehat{X}^\mathsf{T} \widehat{X}\right)^{-1} \widehat{X}^\mathsf{T} y$. Replacing $\widehat{X} = P_Z X$ and noting that $P_Z$ is a symmetric and idempotent matrix, so that $P_Z^\mathsf{T} P_Z = P_Z P_Z = P_Z$:

$\widehat{\beta}_{\mathrm{2SLS}} = \left(X^\mathsf{T} P_Z^\mathsf{T} P_Z X\right)^{-1} X^\mathsf{T} P_Z^\mathsf{T} y = \left(X^\mathsf{T} P_Z X\right)^{-1} X^\mathsf{T} P_Z y$

The resulting estimator of $\beta$ is numerically identical to the expression displayed above. A small correction must be made to the sum-of-squared residuals in the second-stage fitted model in order that the covariance matrix of $\widehat{\beta}$ is calculated correctly.
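
Both computations can be sketched directly (Python with NumPy assumed; the simulated data are hypothetical): running OLS on the first-stage fitted values reproduces the closed-form coefficients, while the residual variance for the covariance matrix is computed from the original X, as the correction above requires.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 20_000

    # Hypothetical over-identified model: one endogenous regressor, two instruments.
    u = rng.normal(size=n)
    Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    x1 = Z @ np.array([0.2, 1.0, -1.0]) + 0.7 * u + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])
    y = X @ np.array([1.0, 2.0]) + u

    # Stage 1: fitted values X_hat = P_Z X; Stage 2: OLS of y on X_hat.
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

    # Closed form (X' P_Z X)^{-1} X' P_Z y reproduces the same coefficients.
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    beta_closed = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)

    # For the covariance matrix, residuals must use the original X, not X_hat.
    resid = y - X @ beta_2sls
    sigma2 = resid @ resid / (n - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ P_Z @ X)

    print(np.allclose(beta_2sls, beta_closed))
    print("std. errors:", np.sqrt(np.diag(cov_beta)))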

Non-parametric analysis

When the form of the structural equations is unknown, an instrumental variable Z can still be defined through the equations:

$x = g(z, u), \qquad y = f(x, u)$

where f and g are two arbitrary functions and z is independent of u. Unlike linear models, however, measurements of z, x and y do not allow for the identification of the average causal effect of x on y, denoted ACE:

$\mathrm{ACE} = \Pr(y \mid \mathrm{do}(x)) = \operatorname{E}_u[f(x, u)]$

Balke and Pearl [1997] derived tight bounds on ACE and showed that these can provide valuable information on the sign and size of ACE.[17]

In linear analysis, there is no test to falsify the assumption that z is instrumental relative to the pair (x, y). This is not the case when x is discrete. Pearl (2000) has shown that, for all f and g, the following constraint, called the "Instrumental Inequality", must hold whenever z satisfies the two equations above:[10]

$\max_x \sum_y \left[\max_z \Pr(y, x \mid z)\right] \leq 1$

Interpretation under treatment effect heterogeneity

The exposition above assumes that the causal effect of interest does not vary across observations, that is, that $\beta$ is a constant. Generally, different subjects will respond in different ways to changes in the "treatment" x. When this possibility is recognized, the average effect in the population of a change in x on y may differ from the effect in a given subpopulation. For example, the average effect of a job training program may substantially differ across the group of people who actually receive the training and the group which chooses not to receive training. For these reasons, IV methods invoke implicit assumptions on behavioral response, or more generally assumptions over the correlation between the response to treatment and propensity to receive treatment.[18]

The standard IV estimator can recover local average treatment effects (LATE) rather than average treatment effects (ATE).[1] Imbens and Angrist (1994) demonstrate that the linear IV estimate can be interpreted under weak conditions as a weighted average of local average treatment effects, where the weights depend on the elasticity of the endogenous regressor to changes in the instrumental variables. Roughly, that means that the effect of a variable is only revealed for the subpopulations affected by the observed changes in the instruments, and that subpopulations which respond most to changes in the instruments will have the largest effects on the magnitude of the IV estimate.

For example, if a researcher uses presence of a land-grant college as an instrument for college education in an earnings regression, she identifies the effect of college on earnings in the subpopulation which would obtain a college degree if a college is present but which would not obtain a degree if a college is not present. This empirical approach does not, without further assumptions, tell the researcher anything about the effect of college among people who would either always or never get a college degree regardless of whether a local college exists.

Weak instruments problem

As Bound, Jaeger, and Baker (1995) note, a problem is caused by the selection of "weak" instruments, instruments that are poor predictors of the endogenous question predictor in the first-stage equation.[19] In this case, the prediction of the question predictor by the instrument will be poor and the predicted values will have very little variation. Consequently, they are unlikely to have much success in predicting the ultimate outcome when they are used to replace the question predictor in the second-stage equation.

In the context of the smoking and health example discussed above, tobacco taxes are weak instruments for smoking if smoking status is largely unresponsive to changes in taxes. If higher taxes do not induce people to quit smoking (or not start smoking), then variation in tax rates tells us nothing about the effect of smoking on health. If taxes affect health through channels other than through their effect on smoking, then the instruments are invalid and the instrumental variables approach may yield misleading results. For example, places and times with relatively health-conscious populations may both implement high tobacco taxes and exhibit better health even holding smoking rates constant, so we would observe a correlation between health and tobacco taxes even if it were the case that smoking has no effect on health. In this case, we would be mistaken to infer a causal effect of smoking on health from the observed correlation between tobacco taxes and health.

Testing for weak instruments

The strength of the instruments can be directly assessed because both the endogenous covariates and the instruments are observable.[20] A common rule of thumb for models with one endogenous regressor is: the F-statistic against the null that the excluded instruments are irrelevant in the first-stage regression should be larger than 10.
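
The first-stage F-statistic behind this rule of thumb compares the first-stage regression with and without the excluded instruments. A minimal sketch, assuming Python with NumPy and a hypothetical single endogenous regressor:

    import numpy as np

    def first_stage_F(x, W, Z_excl):
        """F-statistic for the null that the excluded instruments are jointly
        irrelevant in the first stage: compare RSS from regressing the endogenous
        variable x on [W, Z_excl] versus on the included exogenous W alone."""
        full = np.column_stack([W, Z_excl])
        def rss(A):
            coef, *_ = np.linalg.lstsq(A, x, rcond=None)
            e = x - A @ coef
            return e @ e
        rss_r, rss_u = rss(W), rss(full)
        q = Z_excl.shape[1]                    # number of excluded instruments
        df = len(x) - full.shape[1]
        return ((rss_r - rss_u) / q) / (rss_u / df)

    # Hypothetical example: a constant as the only included exogenous covariate.
    rng = np.random.default_rng(4)
    n = 5_000
    W = np.ones((n, 1))
    Z_excl = rng.normal(size=(n, 2))
    x = 0.3 * Z_excl[:, 0] - 0.2 * Z_excl[:, 1] + rng.normal(size=n)

    print(f"first-stage F = {first_stage_F(x, W, Z_excl):.1f}")   # compare with 10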

Statistical inference and hypothesis testing

When the covariates are exogenous, the small-sample properties of the OLS estimator can be derived in a straightforward manner by calculating moments of the estimator conditional on X. When some of the covariates are endogenous so that instrumental variables estimation is implemented, simple expressions for the moments of the estimator cannot be so obtained. Generally, instrumental variables estimators only have desirable asymptotic, not finite sample, properties, and inference is based on asymptotic approximations to the sampling distribution of the estimator. Even when the instruments are uncorrelated with the error in the equation of interest and when the instruments are not weak, the finite sample properties of the instrumental variables estimator may be poor. For example, exactly identified models produce finite sample estimators with no moments, so the estimator can be said to be neither biased nor unbiased, the nominal size of test statistics may be substantially distorted, and the estimates may commonly be far away from the true value of the parameter.[21]

Testing the exclusion restriction

The assumption that the instruments are not correlated with the error term in the equation of interest is not testable in exactly identified models. If the model is overidentified, there is information available which may be used to test this assumption. The most common test of these overidentifying restrictions, called the Sargan–Hansen test, is based on the observation that the residuals should be uncorrelated with the set of exogenous variables if the instruments are truly exogenous.[22] The Sargan–Hansen test statistic can be calculated as $TR^2$ (the number of observations multiplied by the coefficient of determination) from the OLS regression of the residuals onto the set of exogenous variables. This statistic will be asymptotically chi-squared with m − k degrees of freedom under the null that the error term is uncorrelated with the instruments.
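
A sketch of this statistic as described (Python with NumPy and SciPy assumed; the simulated over-identified model is hypothetical): compute the 2SLS residuals, regress them on all exogenous variables, and compare the number of observations times R² with a chi-squared distribution having m − k degrees of freedom.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n = 10_000

    # Hypothetical over-identified model: 1 endogenous regressor, 2 excluded instruments.
    u = rng.normal(size=n)
    Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    x1 = Z @ np.array([0.1, 1.0, 0.8]) + 0.5 * u + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])
    y = X @ np.array([1.0, 2.0]) + u

    # 2SLS residuals
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    beta = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)
    resid = y - X @ beta

    # Regress the residuals on all exogenous variables and form n * R^2.
    coef, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    fitted = Z @ coef
    r2 = fitted.var() / resid.var()
    sargan = n * r2
    dof = Z.shape[1] - X.shape[1]               # m - k over-identifying restrictions
    print(f"Sargan statistic = {sargan:.2f}, p-value = {stats.chi2.sf(sargan, dof):.3f}")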

from Grokipedia
Instrumental variables (IV) estimation is a statistical method in econometrics and statistics used to identify and estimate causal effects when an explanatory variable is endogenous—meaning it is correlated with the unobserved error term in a regression model—such as due to omitted variables, measurement error, or reverse causality. The approach relies on an instrumental variable (or instrument), a variable that is correlated with the endogenous explanatory variable (relevance condition) but uncorrelated with the error term (exogeneity or exclusion restriction), enabling consistent estimation of the causal parameter without bias from endogeneity. The primary purpose of IV estimation is to mimic the conditions of a randomized experiment in observational data by leveraging the instrument to isolate exogenous variation in the treatment or explanatory variable, thereby addressing threats to causal inference that plague ordinary least squares (OLS) regression. Common applications include estimating returns to schooling using quarter of birth as an instrument for schooling (due to compulsory schooling laws), or the impact of interventions like lotteries or natural experiments where direct randomization is infeasible.

For identification, IV requires two core assumptions: the instrument must influence the endogenous variable (first-stage relevance, often tested via an F-statistic greater than 10 to avoid weak instrument bias) and must affect the outcome only through the endogenous variable (exclusion restriction, which is non-testable and relies on theoretical justification). Additional assumptions, such as monotonicity (no "defiers" who respond oppositely to the instrument), ensure the estimator recovers the local average treatment effect (LATE) for "compliers"—those whose treatment status changes with the instrument—rather than the average treatment effect for the entire population.

In practice, IV estimation is implemented through methods like the simple Wald estimator for binary treatments and instruments, given by the ratio of the reduced-form effect of the instrument on the outcome to its first-stage effect on the treatment, $\hat{\beta}_{IV} = \operatorname{Cov}(Y, Z) / \operatorname{Cov}(D, Z)$, or more generally via two-stage least squares (2SLS), where the endogenous variable is first regressed on the instrument(s) to obtain predicted values, which are then used in the second-stage regression on the outcome. While 2SLS is efficient under homoskedasticity and provides standard errors that account for the two-stage procedure, challenges include weak instruments (which bias estimates toward OLS and inflate variance), overidentification (when multiple instruments are available, tested via Sargan or Hansen J-statistics), and the need for robust inference in heteroskedastic data. IV methods have become foundational in empirical economics, policy evaluation, and beyond, as detailed in influential texts like Mostly Harmless Econometrics by Angrist and Pischke.
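
For the binary-instrument, binary-treatment case, the Wald ratio above can be computed directly. A minimal sketch, assuming Python with NumPy and a hypothetical data-generating process with a constant treatment effect:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 50_000

    # Hypothetical binary instrument Z (e.g. a randomized offer) and binary
    # treatment D, confounded by an unobserved variable c; true effect is 1.5.
    c = rng.normal(size=n)
    Z = rng.integers(0, 2, size=n)
    D = (0.8 * Z + 0.5 * c + rng.normal(size=n) > 0.5).astype(float)
    Y = 1.5 * D + c + rng.normal(size=n)

    # Wald / IV estimator: Cov(Y, Z) / Cov(D, Z), identical to the ratio of the
    # reduced-form and first-stage differences in means for a binary instrument.
    beta_wald = np.cov(Y, Z)[0, 1] / np.cov(D, Z)[0, 1]
    reduced_form = Y[Z == 1].mean() - Y[Z == 0].mean()
    first_stage = D[Z == 1].mean() - D[Z == 0].mean()

    print(beta_wald, reduced_form / first_stage)   # the two expressions coincide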

Motivation and Examples

Endogeneity in Regression Models

In regression models, endogeneity occurs when one or more explanatory variables are correlated with the disturbance term, violating the assumption of strict exogeneity that is necessary for ordinary least squares (OLS) to produce unbiased and consistent estimates. This correlation implies that the explanatory variables are not independent of the unobservable factors captured by the error term, leading to systematic errors in parameter estimation. The main sources of endogeneity in OLS regression include omitted variables, measurement error in the explanatory variables, and simultaneous causation. Omitted variable bias arises when a relevant variable that affects both the dependent variable and the included explanatory variables is excluded from the model, causing the error term to absorb its influence and correlate with the regressors. Measurement error in regressors, particularly classical errors where the observed variable equals the true value plus an uncorrelated error, attenuates coefficients but can induce endogeneity if the error is nonclassical or correlated with the true variable. Simultaneous causation, common in economic systems, occurs when the dependent variable influences the explanatory variable in the same period, as in supply-demand models, creating mutual dependence that correlates both with the error term.

Consider the linear structural equation $y = X\beta + \epsilon$, where $y$ is the dependent variable, $X$ includes the explanatory variables, $\beta$ are the parameters of interest, and $\epsilon$ is the error term. Under the exogeneity assumption, $\operatorname{Cov}(X, \epsilon) = 0$, which ensures that OLS consistently estimates $\beta$ by projecting $y$ onto $X$. In the presence of endogeneity, however, $\operatorname{Cov}(X, \epsilon) \neq 0$, so the OLS estimator is inconsistent, as it attributes part of the error's variation to the explanatory variables. The consequences of endogeneity are evident in the asymptotic behavior of the estimator. In the simple univariate case with $y = \beta x + \epsilon$ and $\operatorname{E}(\epsilon \mid x) \neq 0$, the probability limit is given by

$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\operatorname{Cov}(x, \epsilon)}{\operatorname{Var}(x)},$

where the second term represents the bias that does not vanish as the sample size increases, potentially overstating or understating the true effect depending on the sign of the covariance. This inconsistency undermines causal inference, as the estimated coefficients reflect mere correlation rather than the isolated impact of $x$ on $y$. Instrumental variables methods can mitigate this issue by leveraging exogenous variation in instruments correlated with $X$ but not with $\epsilon$, enabling consistent estimation without relying on the violated exogeneity assumption.
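
For completeness, a short worked derivation of this probability limit, under the simplifying assumption (not stated above) that $x$ and $\epsilon$ have mean zero so that no intercept is needed:

$\hat{\beta}_{OLS} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{\sum_i x_i(\beta x_i + \epsilon_i)}{\sum_i x_i^2} = \beta + \frac{\tfrac{1}{n}\sum_i x_i \epsilon_i}{\tfrac{1}{n}\sum_i x_i^2} \xrightarrow{\,p\,} \beta + \frac{\operatorname{Cov}(x, \epsilon)}{\operatorname{Var}(x)}$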

Illustrative Applications

One prominent application of instrumental variables (IV) estimation addresses the endogeneity in estimating returns to education due to ability bias, where unobserved individual ability correlates with both education levels and wages, biasing ordinary least squares (OLS) estimates upward. In a seminal study, Angrist and Krueger (1991) used quarter of birth as an instrument for years of schooling in wage regressions, exploiting compulsory schooling laws that create exogenous variation in schooling based on birth timing—children born in the first quarter of the year are older when starting school and thus more likely to complete additional years compared to those born later. This instrument is relevant because it predicts educational attainment, and it satisfies the exclusion restriction by affecting wages only through schooling, as birth quarter is unrelated to innate ability or other wage determinants. Their IV estimates suggested a 7-10% return to an additional year of schooling, lower than the OLS estimate of around 12%, correcting for the ability bias.

Another classic example involves using geographic proximity to colleges as an instrument for education in estimating wage returns. Card (1995) analyzed data from the National Longitudinal Survey of Young Men cohort, using distance to the nearest college as an instrument for years of schooling. Proximity to a college reduces the costs of higher education and increases the likelihood of attendance, providing exogenous variation, while it is assumed to affect wages primarily through education rather than directly through local labor market conditions. The IV estimates yielded returns to schooling of 9-13%, higher than the OLS estimate of about 7%, mitigating downward biases in OLS from factors such as measurement error in ability or omitted variables affecting both education and earnings.

To illustrate endogeneity and IV correction conceptually, consider a simple simulated data generating process where the true causal effect of an endogenous regressor $x$ on outcome $y$ is $\beta = 0.5$, but $x$ correlates with the error term due to omitted variables. Specifically, generate data with $y = 0.5x + u$, where $x = \pi z + v$, $z \sim N(2, 1)$ is the instrument, and the errors $(u, v)$ are jointly normal with correlation 0.8 and unit variance, using a large sample of 10,000 observations. OLS estimation yields a biased coefficient of approximately 0.902, overestimating the true effect because the endogeneity inflates the covariance between $x$ and the error. In contrast, IV using $z$ produces an estimate of about 0.510, closely recovering the true $\beta$, as the instrument provides exogenous variation uncorrelated with $u$ but predictive of $x$. This simulation highlights how IV isolates the causal channel, reducing bias without requiring direct measurement of confounders.

In policy evaluation, IV methods have been applied to assess causal effects in randomized experiments with non-compliance, such as the impact of class size on student test scores. The Student-Teacher Achievement Ratio (STAR) experiment randomly assigned over 11,600 students to small (13-17 pupils) or regular (22-25 pupils) classes from 1985-1989, but some students switched classes, leading to endogeneity in observed class size. Krueger (1999) used initial random assignment to small classes as an instrument for actual class size attended, which is relevant since assignment predicts enrollment and exogenous under the randomization, satisfying exclusion by affecting scores only through class size. The IV estimates indicated that reducing class size by 10 students increased test scores by about 0.2 standard deviations, particularly benefiting disadvantaged students, providing evidence for policy interventions like smaller classes in early grades.
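
A sketch reproducing the data-generating process described above (Python with NumPy assumed). The first-stage coefficient $\pi$ is not given in the text and is set to 1.0 here as an assumption, and the exact OLS and IV values depend on the random draw, so the figures of roughly 0.9 and 0.5 should be read as typical rather than exact:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 10_000

    # DGP from the text: y = 0.5*x + u, x = pi*z + v, z ~ N(2, 1), with (u, v)
    # jointly normal, unit variances and correlation 0.8. pi = 1.0 is assumed.
    pi, beta = 1.0, 0.5
    uv = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
    u, v = uv[:, 0], uv[:, 1]
    z = rng.normal(2.0, 1.0, size=n)
    x = pi * z + v
    y = beta * x + u

    beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # biased upward (about 0.9)
    beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]    # approximately recovers 0.5
    print(f"OLS = {beta_ols:.3f}, IV = {beta_iv:.3f}")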

Historical Development

Early Formulations

The origins of instrumental variables estimation trace back to early 20th-century developments in statistics and related fields, where precursors to modern methods emerged. Sewall Wright, a geneticist, introduced path analysis in the 1920s as a technique to decompose correlations into direct and indirect causal effects within structural equation models, laying foundational groundwork for handling interdependent relationships in data. This method, detailed in his seminal 1921 paper, emphasized the use of diagrammatic representations to trace causal paths, which anticipated later econometric tools for addressing confounding in simultaneous systems. Philip G. Wright, an economist and son of Sewall Wright, extended these ideas to economic applications in 1928, proposing an early form of instrumental variables as a solution to identification challenges in models affected by simultaneity. In his analysis of tariffs on animal and vegetable oils, Wright suggested using external variables—such as lagged prices or exogenous factors—to isolate causal effects, effectively treating them as instruments to mitigate endogeneity from correlated errors. This grouping approach served as a precursor to instrumental variables estimation, demonstrating its utility in empirical work by averaging estimates across multiple instruments to improve reliability in the presence of measurement errors and omitted variables.

In the 1940s, econometricians began formalizing these concepts amid growing recognition of simultaneity biases in economic models. Trygve Haavelmo's probability approach revolutionized the field by framing econometric inference within a probabilistic framework, explicitly highlighting how simultaneous equations lead to biased ordinary least squares estimates due to correlated disturbances. His work underscored the need for identification strategies to distinguish structural relations from reduced-form correlations, setting the stage for methods to resolve these issues. Complementing this, Tjalling C. Koopmans and William C. Hood advanced identification theory in simultaneous systems during the mid-1940s Cowles Commission efforts, emphasizing conditions under which exogenous variables could uniquely determine model parameters. Their contributions clarified the role of restrictions—like exclusion conditions—in enabling consistent estimation, bridging early statistical insights to rigorous econometric practice.

Key Advancements in Econometrics

In the 1950s, the Cowles Commission played a pivotal role in formalizing instrumental variables within the framework of simultaneous equations models, building on earlier inspirations from statistical identification problems. A foundational contribution came from T.W. Anderson and H. Rubin, who in 1949 developed methods for estimating parameters of a single equation in a complete system of linear stochastic relations, emphasizing conditions for identification using instrumental variables to address simultaneity bias. This work established the rank and order conditions for identification, which remain central to IV theory. Complementing this, R.L. Basmann in 1957 proposed a generalized classical method of linear estimation for structural equations, introducing the limited information maximum likelihood (LIML) estimator as an alternative to full-system approaches, which proved computationally efficient for overidentified models. Henri Theil's 1953 development of k-class estimators marked a significant precursor to two-stage least squares (2SLS), offering a flexible family of estimators that interpolate between ordinary least squares and indirect least squares by adjusting for endogeneity via instrumental variables. These estimators, detailed in Theil's mimeographed memoranda and later elaborated in his writings, provided a unified framework for handling incomplete observations and simultaneity in multiple regression contexts.

The 1960s saw further advancements integrating Bayesian perspectives and efficient estimation techniques. Arnold Zellner introduced Bayesian approaches to instrumental variables, particularly in analyzing regression models with unobservable variables, as explored in his 1970 work that laid groundwork for posterior inference in IV settings during the decade's Bayesian surge. Concurrently, Dale W. Jorgenson contributed early ideas toward the generalized method of moments (GMM) through efficient instrumental variables estimation in simultaneous equations, notably in his 1971 collaboration with J.M. Brundy on constructing optimal instruments without initial reduced-form estimation. By the 1970s, Arthur S. Goldberger's writings solidified IV applications in linear models, with his 1972 paper on the estimation of regressions containing unobservables highlighting IV's role in handling measurement error and endogeneity. Goldberger's contributions, including extensions to full information estimators, influenced pedagogical texts and practical implementations, emphasizing the method's robustness in econometric modeling.

Core Theory and Assumptions

Identification Conditions

Instrumental variables estimation addresses endogeneity in the linear model $y = X\beta + u$, where $y$ is an $n \times 1$ vector of outcomes, $X$ is an $n \times K$ matrix of endogenous regressors, $\beta$ is a $K \times 1$ vector of parameters, and $u$ is an $n \times 1$ error term with $E(X'u) \neq 0$. An instrument matrix $Z$ ($n \times L$) is introduced such that the first-stage relation is $X = Z\Pi + V$, where $\Pi$ is an $L \times K$ matrix of coefficients, $V$ is an $n \times K$ error matrix, and the exogeneity condition holds: $E(Z'u) = 0$. The rank of $Z$ is assumed to be $L$, with $L \geq K$ allowing for potential overidentification.

Identification requires two key conditions: the order condition and the rank condition. The order condition states that the model is identified if the number of instruments $L$ is at least as large as the number of endogenous regressors $K$ (i.e., $L \geq K$). This ensures there are sufficient independent sources of exogenous variation to solve for the $K$ parameters in $\beta$. When $L = K$, the model is just-identified, yielding a unique solution analogous to solving a square linear system; when $L > K$, it is over-identified, providing additional instruments that allow testing of overidentifying restrictions but requiring all instruments to satisfy exogeneity. The intuition for solvability under $L \geq K$ is that the instruments must span the space of the endogenous regressors in the projection onto the exogenous variation, preventing underidentification of the structural parameters.

The rank condition complements the order condition by requiring that the matrix $E(Z'X)$ (or equivalently, $\Pi$) has full column rank $K$, meaning the instruments are relevant and provide linearly independent variation in the endogenous regressors. This ensures that the covariance between $Z$ and $X$ is of full rank, so the first-stage projection isolates exogenous components without collapse to zero or singularity. Without full rank, even if $L \geq K$, identification fails as the instruments do not sufficiently predict $X$. Under these conditions, $\beta$ is identified via the population moment condition

$E[Z'(y - X\beta)] = 0,$

which, since $E[Z'u] = 0$, equates the projected moments of the outcome equation, $E[Z'y]$, to $E[Z'X]\beta$ and thereby pins down the structural parameters, enabling consistent estimation when the order and rank conditions hold.
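
In the just-identified case ($L = K$) the moment condition can be inverted directly, which makes the identification argument concrete (a standard derivation, valid under the rank condition that $E[Z'X]$ is invertible):

$E[Z'(y - X\beta)] = 0 \;\Longrightarrow\; E[Z'y] = E[Z'X]\beta \;\Longrightarrow\; \beta = \left(E[Z'X]\right)^{-1} E[Z'y]$

The sample analogue of this expression is the familiar just-identified estimator $\hat{\beta}_{IV} = (Z'X)^{-1}Z'y$.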

Exclusion and Relevance Restrictions

The exclusion restriction and the relevance condition constitute the two core assumptions underlying the validity of instrumental variables (IV) estimation. The relevance condition requires that the instrument $Z$ is sufficiently correlated with the endogenous explanatory variable $X$, ensuring that $Z$ provides meaningful variation in $X$ for identification purposes. In the case of a single endogenous regressor, this is expressed as $\operatorname{Corr}(Z, X) \neq 0$; more generally, for multiple instruments and regressors, the matrix $E[Z'X]$ must have full column rank. Violation of relevance leads to weak instruments, where the IV estimator exhibits substantial finite-sample bias and poor inference properties, even in large samples. Instruments are considered strong if the first-stage F-statistic exceeds 10, a rule of thumb indicating adequate explanatory power of $Z$ for $X$.

The exclusion restriction mandates that the instrument $Z$ influences the outcome $Y$ solely through its effect on the endogenous variable $X$, with no direct pathway from $Z$ to $Y$. Mathematically, this implies that the partial derivative of $Y$ with respect to $Z$, holding $X$ fixed, is zero: $\partial Y / \partial Z = 0$. Equivalently, in the structural equation for $Y$, the coefficient on $Z$ is zero, as $Z$ is excluded from this equation after accounting for its role via $X$. Breaches of the exclusion restriction introduce direct confounding, rendering the IV estimator inconsistent by failing to isolate the causal channel through $X$. Both restrictions must hold jointly to ensure the consistency of the IV estimator, as relevance alone cannot compensate for exclusion violations, and vice versa; their absence results in biased estimates that mimic ordinary least squares inconsistencies. In settings with binary treatment, the exclusion restriction is often supplemented by a monotonicity assumption, which posits that the instrument does not reverse the treatment assignment for any subgroup (i.e., no "defiers" exist), thereby supporting interpretation of the IV estimand without altering the core exclusion requirement.

Graphical and Conceptual Frameworks

Directed Acyclic Graphs for IV

Directed acyclic graphs (DAGs) provide a visual framework for representing causal assumptions in instrumental variables (IV) estimation, where nodes represent variables and directed arrows denote causal influences. These graphs are acyclic, meaning no cycles exist among the arrows, ensuring a clear temporal or causal ordering. Backdoor paths in DAGs illustrate confounding, defined as non-directed paths from the treatment variable to the outcome that pass through common causes, potentially biasing causal estimates if unblocked. In IV applications, a DAG typically depicts the instrument $Z$ causally influencing the endogenous treatment $X$, which in turn affects the outcome $Y$, forming the path $Z \to X \to Y$. Crucially, no direct arrow connects $Z$ to $Y$ (enforcing the exclusion restriction), and $Z$ shares no common unobserved causes with $Y$ or $X$ beyond this path (ensuring independence from unobservables). This configuration blocks backdoor paths from $X$ to $Y$—such as those through unobserved confounders—by leveraging $Z$'s exogeneity, allowing identification of the causal effect of $X$ on $Y$.

An illustrative example involves estimating the causal effect of education ($X$) on wages ($Y$), confounded by unobserved ability ($U$). The DAG includes arrows $U \to X$ and $U \to Y$ for confounding, $X \to Y$ for the treatment effect, and quarter of birth ($Z$) as the instrument with $Z \to X$ (due to compulsory schooling laws tying school entry to birth quarter), but no arrows from $Z$ to $Y$ or $Z$ to $U$. This structure isolates the effect of education by exploiting variation in $Z$ that influences schooling without directly impacting wages or ability. The IV path in a DAG parallels the front-door criterion, where causation is identified via an intermediate variable free of direct confounding, but differs in that $Z$ serves as an external entry point rather than an observed mediator. D-separation, a graphical criterion, verifies identification by confirming that conditioning on appropriate variables (here, the instrument's role) closes all backdoor paths while leaving the causal path open, thus enabling unbiased estimation.

Criteria for Instrument Selection

Selecting valid instruments in instrumental variables (IV) estimation requires satisfying two primary conditions: relevance, where the instrument correlates strongly with the endogenous explanatory variable, and the exclusion restriction, where the instrument affects the outcome only through the endogenous variable. These criteria ensure the instrument provides exogenous variation for causal identification without introducing bias. Economic theory often guides initial selection by identifying variables that plausibly influence the endogenous regressor, such as policy changes or natural experiments that shift behavior without directly impacting outcomes.

Relevance is assessed through both theoretical justification and empirical pre-tests. Theoretically, instruments should stem from mechanisms that credibly affect the endogenous variable, like exogenous shocks in supply chains influencing firm behavior. Empirically, the first-stage regression tests this via t-statistics on the instrument's coefficient or, preferably, the F-statistic on excluded instruments, with a common rule of thumb requiring an F-statistic greater than 10 to avoid weak instrument bias. Weak identification occurs when the first-stage correlation is low, leading to finite-sample biases that inflate standard errors and distort inference, as demonstrated in simulations where low first-stage correlations produced IV estimates deviating substantially from true effects.

Validating the exclusion restriction relies heavily on theoretical reasoning, as it cannot be directly tested from data alone. Instruments should represent exogenous shocks uncorrelated with unobservables affecting the outcome, such as randomized lotteries assigned to treatments, ensuring no direct pathway to the dependent variable. Researchers must rule out direct effects through theoretical arguments, for instance, confirming that a geographic variation influences the outcome only via the endogenous regressor and not through local economic spillovers. Directed acyclic graphs can aid this by visualizing potential pathways, highlighting instruments that block backdoor paths but preserve the desired link.

A key trade-off in instrument selection involves the number of instruments: more instruments enhance efficiency by exploiting additional variation, but they increase the risk of including invalid ones, amplifying bias if exclusion fails for even a single instrument. A common guideline for overidentified models is to use one more instrument than endogenous regressors (L = K + 1), allowing an overidentification test while minimizing proliferation risks. Common pitfalls include irrelevant instruments, which weaken identification and mimic OLS biases, and invalid ones correlated with errors, violating exogeneity. For example, using random lottery numbers as an instrument for income effects would fail relevance if they do not correlate with earnings-determining choices, rendering the approach ineffective despite randomness. Such errors underscore the need for rigorous pre-selection scrutiny to balance theoretical plausibility with empirical strength.

Estimation Procedures

Two-Stage Least Squares

Two-stage least squares (2SLS) is a widely used estimator for instrumental variables (IV) models in settings where endogenous regressors require correction for correlation with the error term. It operates by projecting the endogenous variables onto the space spanned by the instruments in a preliminary step, thereby isolating the exogenous variation needed for consistent estimation. The method was independently developed by Theil in 1953 and Basmann in 1957 as a practical approach to estimating parameters in systems of simultaneous equations. Under the standard IV assumptions of relevance and exogeneity of the instruments, 2SLS delivers consistent estimates of the structural parameters.

The procedure consists of two distinct stages. In the first stage, each endogenous regressor in $X$ (an $n \times k$ matrix) is regressed on the matrix of instruments $Z$ (an $n \times m$ matrix, where $m \geq k$) using ordinary least squares (OLS), yielding the fitted values $\hat{X} = Z(Z'Z)^{-1}Z'X$. This step purges the endogenous components from $X$, producing an instrumented version $\hat{X}$ that is uncorrelated with the structural error term. In the second stage, the outcome variable $y$ (an $n \times 1$ vector) is regressed on $\hat{X}$ via OLS, resulting in the 2SLS estimator $\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y$.

An equivalent closed-form expression for the 2SLS estimator avoids explicit computation of $\hat{X}$ and is given by $\hat{\beta}_{2SLS} = (X'P_Z X)^{-1} X'P_Z y$, where $P_Z = Z(Z'Z)^{-1}Z'$ is the projection matrix onto the column space of $Z$. This formulation ensures numerical stability and is the basis for implementation in statistical software. Directly substituting the generated regressors $\hat{X}$ into the second-stage OLS can introduce an errors-in-variables bias in the estimated standard errors; thus, modern implementations compute the closed form to obtain correct inference. In the just-identified case where the number of instruments equals the number of endogenous regressors ($m = k$), the 2SLS estimator coincides exactly with the simple IV estimator and is uniquely determined without projection. More generally, 2SLS is consistent for the true parameters as the sample size grows, provided the instruments satisfy the relevance condition (nonzero correlation with the endogenous regressors) and the exclusion restriction (uncorrelated with the error term). The estimator is asymptotically normal, enabling standard hypothesis testing and confidence intervals under homoskedasticity, though robust variants address heteroskedasticity.
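
In applied work 2SLS is rarely coded by hand. The sketch below assumes the Python package linearmodels and its IV2SLS class; the call signature shown and the simulated column names are assumptions to be checked against the package documentation rather than anything specified above.

    import numpy as np
    import pandas as pd
    from linearmodels.iv import IV2SLS   # assumed available: pip install linearmodels

    rng = np.random.default_rng(8)
    n = 5_000

    # Hypothetical data: outcome y, one endogenous regressor x, two excluded
    # instruments z1 and z2, and a constant as the included exogenous regressor.
    u = rng.normal(size=n)
    df = pd.DataFrame({"z1": rng.normal(size=n), "z2": rng.normal(size=n)})
    df["x"] = 0.8 * df["z1"] - 0.5 * df["z2"] + 0.6 * u + rng.normal(size=n)
    df["y"] = 1.0 + 2.0 * df["x"] + u
    df["const"] = 1.0

    # IV2SLS(dependent, exog, endog, instruments): exog holds included controls,
    # endog the instrumented regressor(s), instruments the excluded instruments.
    res = IV2SLS(df["y"], df[["const"]], df[["x"]], df[["z1", "z2"]]).fit(cov_type="robust")
    print(res.params)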

Generalized Method of Moments

The generalized method of moments (GMM) serves as a broad framework for instrumental variables (IV) estimation, encompassing and generalizing approaches like two-stage least squares (2SLS) by exploiting moment conditions derived from the orthogonality of instruments to the error term. In the linear IV model $y = X\beta + u$, where $X$ includes endogenous regressors and the instruments $Z$ satisfy $E[Z^\mathsf{T} u] = 0$, the population moment conditions are $E[Z^\mathsf{T}(y - X\beta)] = 0$. The GMM estimator targets these by minimizing a quadratic form of the sample moments:

$\hat{\beta}_{\text{GMM}} = \arg\min_{\beta} \left( \frac{1}{n} Z^\mathsf{T} (y - X\beta) \right)^{\mathsf{T}} W \left( \frac{1}{n} Z^\mathsf{T} (y - X\beta) \right),$

where $n$ is the sample size and $W$ is a positive definite weighting matrix. This setup allows for flexible estimation when the number of instruments exceeds the number of endogenous regressors, enabling efficiency improvements over simpler methods. The efficiency of the GMM estimator depends critically on the choice of $W$; the optimal weighting matrix is the inverse of the asymptotic covariance matrix of the sample moments, $W = S^{-1}$, where $S = \operatorname{AsyVar}\!\left(\sqrt{n} \cdot \tfrac{1}{n} Z^\mathsf{T} u\right)$.
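
A sketch of a two-step feasible version of this estimator (Python with NumPy assumed; the simulated data are hypothetical): the first step uses $W = (Z'Z)^{-1}$, which reproduces 2SLS, and its residuals are then used to estimate $S$ for the second, efficiently weighted step.

    import numpy as np

    def gmm_iv(y, X, Z):
        """Two-step GMM for linear IV with a heteroskedasticity-robust weight matrix."""
        # Step 1: W = (Z'Z)^{-1}, equivalent to the 2SLS estimator.
        W = np.linalg.inv(Z.T @ Z)
        A = X.T @ Z @ W
        beta1 = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

        # Step 2: estimate S = (1/n) sum_i u_i^2 z_i z_i' and re-weight with W = S^{-1}.
        u = y - X @ beta1
        Zu = Z * u[:, None]
        S = Zu.T @ Zu / len(y)
        A = X.T @ Z @ np.linalg.inv(S)
        return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

    # Hypothetical over-identified example with a heteroskedastic error.
    rng = np.random.default_rng(9)
    n = 20_000
    scale = 1.0 + 0.5 * np.abs(rng.normal(size=n))
    u = rng.normal(size=n) * scale
    Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    x1 = Z @ np.array([0.1, 1.0, -0.7]) + 0.5 * u + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])
    y = X @ np.array([1.0, 2.0]) + u

    print(np.round(gmm_iv(y, X, Z), 3))   # close to the true values (1.0, 2.0)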