Endogeneity (econometrics)

In econometrics, endogeneity broadly refers to situations in which an explanatory variable is correlated with the error term.^[1]

In the simplest terms, endogeneity means that a factor used to explain an outcome is itself influenced by that outcome. For example, education can affect income, but income can also affect how much education someone obtains. When this happens, an analysis may misestimate cause and effect: because the presumed cause is also shaped by the outcome, the results are unreliable.

The concept originates from simultaneous equations models, in which one distinguishes variables whose values are determined within the economic model (endogenous) from those that are predetermined (exogenous).^[a]^[2]

Ignoring simultaneity in estimation leads to biased and inconsistent estimators, as it violates the exogeneity condition of the Gauss–Markov theorem. This issue is often overlooked in non-experimental research, which limits the validity of causal inference and the ability to draw reliable policy recommendations.^[3]

Common solutions to address endogeneity include the use of instrumental variable techniques, which provide consistent estimators by introducing variables that are correlated with the endogenous explanatory variable but uncorrelated with the error term.

Besides simultaneity, correlation between explanatory variables and the error term can arise when an unobserved or omitted variable is confounding both independent and dependent variables, or when independent variables are measured with error.^[4]

Exogeneity versus endogeneity

In a stochastic model, several notions of exogeneity can be defined: the usual exogeneity, sequential exogeneity, and strong (or strict) exogeneity. Exogeneity is always stated relative to a parameter: a variable, or set of variables, is said to be exogenous for a parameter $\alpha$. Even if a variable is exogenous for parameter $\alpha$, it might be endogenous for parameter $\beta$.

When the explanatory variables are not stochastic, they are strongly exogenous for all the parameters.

If the independent variable is correlated with the error term in a regression model, then the ordinary least squares (OLS) estimate of the regression coefficient is biased; however, if the correlation is not contemporaneous, the coefficient estimate may still be consistent. There are many methods of correcting the bias, including instrumental variable regression and Heckman selection correction.
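To make the instrumental variable idea concrete, the following is a minimal simulation sketch rather than a definitive implementation. It assumes a single endogenous regressor `x`, an unobserved confounder `z`, and one instrument `w` that shifts `x` but is independent of the error term. The function name, coefficient values, and data-generating process are all illustrative assumptions, not taken from the cited sources. The IV estimate is computed as the ratio of sample covariances $\operatorname{Cov}(w,y)/\operatorname{Cov}(w,x)$, which is the just-identified case of two-stage least squares.

```python
import numpy as np


def simulate_iv(n=100_000, alpha=1.0, beta=2.0, gamma=3.0, seed=0):
    """Compare OLS and a simple IV estimate of beta on simulated data.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # unobserved confounder (ends up in the error term)
    w = rng.normal(size=n)  # instrument: affects x, independent of z and u
    x = w + 0.5 * z + rng.normal(size=n)  # endogenous regressor
    y = alpha + beta * x + gamma * z + rng.normal(size=n)

    # OLS slope of y on x (z omitted): Cov(x, y) / Var(x)
    ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # IV slope using w as the instrument: Cov(w, y) / Cov(w, x)
    iv = np.cov(w, y)[0, 1] / np.cov(w, x)[0, 1]
    return float(ols), float(iv)


if __name__ == "__main__":
    ols, iv = simulate_iv()
    print(f"OLS slope: {ols:.3f}  IV slope: {iv:.3f}  (true beta = 2.0)")
```

Under these assumed parameters the OLS slope converges to about 2.67 rather than 2, while the IV slope stays close to the true value. The IV estimate is noisier than OLS, which is the usual price of consistency.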

Static models

The following are some common sources of endogeneity.

Omitted variable

In this case, the endogeneity comes from an uncontrolled confounding variable, a variable that is correlated with both the independent variable in the model and with the error term. (Equivalently, the omitted variable affects the independent variable and separately affects the dependent variable.)

Assume that the "true" model to be estimated is

y_{i}=\alpha +\beta x_{i}+\gamma z_{i}+u_{i}

but $z_{i}$ is omitted from the regression model (perhaps because there is no way to measure it directly). Then the model that is actually estimated is

y_{i}=\alpha +\beta x_{i}+\varepsilon _{i}

where $\varepsilon _{i}=\gamma z_{i}+u_{i}$ (thus, the $z_{i}$ term has been absorbed into the error term).

If the correlation of $x$ and $z$ is not 0 and $z$ separately affects $y$ (meaning $\gamma \neq 0$), then $x$ is correlated with the error term $\varepsilon$.

Here, $x$ is not exogenous for $\alpha$ and $\beta$, since, given $x$, the distribution of $y$ depends not only on $\alpha$ and $\beta$, but also on $z$ and $\gamma$.
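The omitted-variable case can be illustrated with a small simulation, offered as a sketch under stated assumptions. The data are drawn from the "true" model with hypothetical values $\alpha=1$, $\beta=2$, $\gamma=3$, and $x_{i}=\rho z_{i}+e_{i}$ with $\rho=0.5$; the function names are illustrative. In this setting the slope from the regression that omits $z$ converges to $\beta +\gamma \operatorname{Cov}(x,z)/\operatorname{Var}(x)$, the standard omitted-variable bias formula.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def simulate_omitted_variable(n=100_000, alpha=1.0, beta=2.0, gamma=3.0,
                              rho=0.5, seed=0):
    """Return (slope omitting z, slope controlling for z) on simulated data."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)            # confounder
    x = rho * z + rng.normal(size=n)  # x is correlated with z
    u = rng.normal(size=n)
    y = alpha + beta * x + gamma * z + u

    short = ols_slope(x, y)  # z absorbed into the error term

    # Regression that includes z: columns are intercept, x, z
    X = np.column_stack([np.ones(n), x, z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return short, float(coef[1])


if __name__ == "__main__":
    short, long_ = simulate_omitted_variable()
    print(f"omitting z: {short:.3f}  including z: {long_:.3f}  (true beta = 2.0)")
```

With these assumed values, $\operatorname{Cov}(x,z)/\operatorname{Var}(x)=0.5/1.25=0.4$, so the slope that omits $z$ settles near $2+3\times 0.4=3.2$, whereas the regression that includes $z$ recovers a slope near 2.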

Measurement error

Suppose that a perfect measure of an independent variable is impossible. That is, instead of observing $x_{i}^{*}$, what is actually observed is $x_{i}=x_{i}^{*}+\nu _{i}$, where $\nu _{i}$ is the measurement error or "noise". In this case, a model given by

y_{i}=\alpha +\beta x_{i}^{*}+\varepsilon _{i}

can be written in terms of observables and error terms as

{\begin{aligned}y_{i}&=\alpha +\beta (x_{i}-\nu _{i})+\varepsilon _{i}\\[3pt]y_{i}&=\alpha +\beta x_{i}+(\varepsilon _{i}-\beta \nu _{i})\\[3pt]y_{i}&=\alpha +\beta x_{i}+u_{i}\quad ({\text{where }}u_{i}=\varepsilon _{i}-\beta \nu _{i})\end{aligned}}

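Since both $x_{i}$ and $u_{i}$ depend on $\nu _{i}$, they are correlated, so the OLS estimate of $\beta$ is biased. Under the classical assumption that $\nu _{i}$ is uncorrelated with $x_{i}^{*}$ and $\varepsilon _{i}$, the estimate is biased toward zero (attenuation bias), not in a fixed downward direction.

Measurement error in the dependent variable, $y_{i}$, does not cause endogeneity, though it does increase the variance of the error term.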
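Both statements about measurement error can be checked with a simulation sketch. It assumes classical measurement error (noise independent of the true regressor and of $\varepsilon _{i}$), unit variances, and hypothetical values $\alpha=1$, $\beta=2$; the function names are illustrative. Under these assumptions the slope on the noisy regressor converges to $\beta \sigma _{x^{*}}^{2}/(\sigma _{x^{*}}^{2}+\sigma _{\nu }^{2})$.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def simulate_measurement_error(n=100_000, alpha=1.0, beta=2.0,
                               noise_sd=1.0, seed=0):
    """OLS slopes with no noise, a noisy regressor, and a noisy outcome."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=n)  # true (unobserved) regressor
    eps = rng.normal(size=n)
    y = alpha + beta * x_star + eps

    x_obs = x_star + rng.normal(scale=noise_sd, size=n)  # error in x
    y_obs = y + rng.normal(scale=noise_sd, size=n)       # error in y

    return {
        "true_regressor": ols_slope(x_star, y),
        "noisy_regressor": ols_slope(x_obs, y),   # attenuated toward zero
        "noisy_outcome": ols_slope(x_star, y_obs),  # still centered on beta
    }


if __name__ == "__main__":
    for name, slope in simulate_measurement_error().items():
        print(f"{name}: {slope:.3f}")
```

With equal variances for the true regressor and the noise, the attenuation factor is 1/2, so the slope on the noisy regressor settles near 1 instead of 2. Noise in the outcome leaves the slope near 2 and only makes it less precise.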

Simultaneity

Suppose that two variables are codetermined, with each affecting the other according to the following "structural" equations:

y_{i}=\beta _{1}x_{i}+\gamma _{1}z_{i}+u_{i}

z_{i}=\beta _{2}x_{i}+\gamma _{2}y_{i}+v_{i}

Estimating either equation by itself results in endogeneity. In the case of the first structural equation, $\operatorname {E} (z_{i}u_{i})\neq 0$. Solving for $z_{i}$ while assuming that $1-\gamma _{1}\gamma _{2}\neq 0$ results in

z_{i}={\frac {\beta _{2}+\gamma _{2}\beta _{1}}{1-\gamma _{1}\gamma _{2}}}x_{i}+{\frac {1}{1-\gamma _{1}\gamma _{2}}}v_{i}+{\frac {\gamma _{2}}{1-\gamma _{1}\gamma _{2}}}u_{i}

Assuming that $x_{i}$ and $v_{i}$ are uncorrelated with $u_{i}$,

\operatorname {E} (z_{i}u_{i})={\frac {\gamma _{2}}{1-\gamma _{1}\gamma _{2}}}\operatorname {E} (u_{i}u_{i})\neq 0

Therefore, attempts at estimating either structural equation will be hampered by endogeneity.
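The simultaneity result can also be illustrated numerically. The sketch below assumes hypothetical coefficients $\beta _{1}=\beta _{2}=1$ and $\gamma _{1}=\gamma _{2}=0.5$, with $x_{i}$, $u_{i}$, and $v_{i}$ drawn as independent standard normal variables. It generates $z_{i}$ from the reduced form, generates $y_{i}$ from the first structural equation, and then estimates that equation by OLS. The function name and return keys are illustrative.

```python
import numpy as np


def simulate_simultaneity(n=100_000, beta1=1.0, gamma1=0.5,
                          beta2=1.0, gamma2=0.5, seed=0):
    """Simulate the two structural equations and run OLS on the first one."""
    d = 1.0 - gamma1 * gamma2
    if d == 0:
        raise ValueError("1 - gamma1 * gamma2 must be non-zero")

    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    v = rng.normal(size=n)

    # Reduced form for z, then y from the first structural equation
    z = ((beta2 + gamma2 * beta1) * x + v + gamma2 * u) / d
    y = beta1 * x + gamma1 * z + u

    # OLS of y on (x, z); the structural equations have no intercept
    X = np.column_stack([x, z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {
        "mean_zu": float(np.mean(z * u)),  # sample analogue of E(z u)
        "beta1_hat": float(coef[0]),
        "gamma1_hat": float(coef[1]),
    }


if __name__ == "__main__":
    for name, value in simulate_simultaneity().items():
        print(f"{name}: {value:.3f}")
```

For these assumed values, $\operatorname {E} (z_{i}u_{i})=\gamma _{2}/(1-\gamma _{1}\gamma _{2})=0.5/0.75\approx 0.67$, and the OLS estimate of $\gamma _{1}$ converges to about 0.8 rather than the true 0.5, so the inconsistency does not vanish as the sample grows.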

Dynamic models

The endogeneity problem is particularly relevant in the context of time series analysis of causal processes. It is common for some factors within a causal system to be dependent for their value in period t on the values of other factors in the causal system in period t − 1. Suppose that the level of pest infestation is independent of all other factors within a given period, but is influenced by the level of rainfall and fertilizer in the preceding period. In this instance it would be correct to say that infestation is exogenous within the period, but endogenous over time.

Let the model be y = f(x, z) + u. If the variable x is sequentially exogenous for parameter $\alpha$, and y does not cause x in the Granger sense, then the variable x is strongly (strictly) exogenous for the parameter $\alpha$.

Simultaneity

Generally speaking, simultaneity occurs in the dynamic model just like in the example of static simultaneity above.

Footnotes

^ For example, in a simple supply and demand model, when predicting the equilibrium quantity demanded, the price is endogenous because producers adjust their prices in response to demand, and consumers adjust their demand in response to price. In this case, the price variable exhibits total endogeneity once the demand and supply curves are specified. By contrast, a change in consumer tastes or preferences represents an exogenous shift in the demand curve.

References

^ Wooldridge, Jeffrey M. (2009). Introductory Econometrics: A Modern Approach (4th ed.). Australia: South-Western. p. 88. ISBN 978-0-324-66054-8.
^ Kmenta, Jan (1986). Elements of Econometrics (2nd ed.). New York: MacMillan. pp. 652–653. ISBN 0-02-365070-2.
^ Antonakis, John; Bendahan, Samuel; Jacquart, Philippe; Lalive, Rafael (December 2010). "On making causal claims: A review and recommendations" (PDF). The Leadership Quarterly. 21 (6): 1086–1120. doi:10.1016/j.leaqua.2010.10.010. ISSN 1048-9843.
^ Johnston, John (1972). Econometric Methods (2nd ed.). New York: McGraw-Hill. pp. 267–291. ISBN 0-07-032679-7.
