Parameter identification problem
from Wikipedia

In economics and econometrics, the parameter identification problem arises when the value of one or more parameters in an economic model cannot be determined from observable variables. It is closely related to non-identifiability in statistics and econometrics, which occurs when a statistical model has more than one set of parameters that generate the same distribution of observations, meaning that multiple parameterizations are observationally equivalent.

For example, this problem can occur in the estimation of multiple-equation econometric models where the equations have variables in common.

In simultaneous equations models

Standard example, with two equations

Consider a linear model for the supply and demand of some specific good. The quantity demanded varies negatively with the price: a higher price decreases the quantity demanded. The quantity supplied varies directly with the price: a higher price increases the quantity supplied.

Assume that, say for several years, we have data on both the price and the traded quantity of this good. Unfortunately this is not enough to identify the two equations (demand and supply) using regression analysis on observations of Q and P: one cannot estimate a downward slope and an upward slope with one linear regression line involving only two variables. Additional variables can make it possible to identify the individual relations.

Figure: Supply and demand

In the graph shown here, the supply curve (red line, upward sloping) shows the quantity supplied depending positively on the price, while the demand curve (black lines, downward sloping) shows quantity depending negatively on the price and also on some additional variable Z, which affects the location of the demand curve in quantity-price space. This Z might be consumers' income, with a rise in income shifting the demand curve outwards. This is symbolically indicated with the values 1, 2 and 3 for Z.

With the quantities supplied and demanded being equal, the observations on quantity and price are the three white points in the graph: they reveal the supply curve. Hence the effect of Z on demand makes it possible to identify the (positive) slope of the supply equation. The (negative) slope parameter of the demand equation cannot be identified in this case. In other words, the parameters of an equation can be identified if it is known that some variable does not enter into the equation, while it does enter the other equation.
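This identification-through-shifting logic can be checked in a small simulation. The sketch below uses hypothetical parameter values (not from the article): only demand is shifted by Z, so the equilibrium points trace out the supply curve, and a single regression of Q on P recovers the supply slope but not the demand slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical structural parameters (illustrative values only)
a_D, b_D, d = 10.0, -1.0, 1.0   # demand: Q = a_D + b_D*P + d*Z + u_d
a_S, b_S    = 0.0, 1.0          # supply: Q = a_S + b_S*P + u_s  (Z excluded)

Z   = rng.uniform(0, 5, n)      # demand shifter (e.g. consumers' income)
u_d = rng.normal(0, 0.05, n)    # small structural disturbances
u_s = rng.normal(0, 0.05, n)

# Market equilibrium: solve the two equations for P, then Q
P = (a_D - a_S + d * Z + u_d - u_s) / (b_S - b_D)
Q = a_S + b_S * P + u_s

# A single regression of Q on P recovers the SUPPLY slope, because Z
# moves the demand curve along the fixed supply curve.
slope = np.cov(Q, P)[0, 1] / np.var(P)
print(round(slope, 2))   # close to 1.0 (= b_S); b_D is not recoverable
```

The demand slope b_D never appears in the fitted line: without a supply-side shifter, no regression on these data can reveal it.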

A situation in which both the supply and the demand equation are identified arises if there is not only a variable Z entering the demand equation but not the supply equation, but also a variable X entering the supply equation but not the demand equation:

supply:   Q = aS + bS P + c X
demand:   Q = aD + bD P + d Z

with positive bS and negative bD. Here both equations are identified if c and d are nonzero.

Note that this is the structural form of the model, showing the relations between Q and P. The reduced form, by contrast, is always identified and can be estimated directly from the data.
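A simulation sketch of this two-shifter model (hypothetical parameter values, plain NumPy) shows how the estimable reduced form recovers both structural slopes by indirect least squares, i.e., ratios of reduced-form coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical structural parameters (illustrative values only)
bD, d = -1.5, 1.0    # demand: Q = aD + bD*P + d*Z + u_d   (X excluded)
bS, c =  0.8, 1.0    # supply: Q = aS + bS*P + c*X + u_s   (Z excluded)
aD, aS = 10.0, 0.0

Z, X = rng.normal(0, 1, n), rng.normal(0, 1, n)
u_d, u_s = rng.normal(0, 0.3, n), rng.normal(0, 0.3, n)

# Equilibrium values of the endogenous variables
P = (aD - aS + d * Z - c * X + u_d - u_s) / (bS - bD)
Q = aS + bS * P + c * X + u_s

# Reduced form: regress P and Q on the exogenous variables only
W = np.column_stack([np.ones(n), Z, X])
pi_P = np.linalg.lstsq(W, P, rcond=None)[0]
pi_Q = np.linalg.lstsq(W, Q, rcond=None)[0]

# Indirect least squares: ratios of reduced-form coefficients
bS_hat = pi_Q[1] / pi_P[1]   # Z (excluded from supply) identifies bS
bD_hat = pi_Q[2] / pi_P[2]   # X (excluded from demand) identifies bD
print(round(bS_hat, 1), round(bD_hat, 1))   # approx. 0.8, -1.5
```

Each slope is recovered from the shifter excluded from its own equation, exactly the exclusion logic described above.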

Fisher points out that this problem is fundamental to the model, and not a matter of statistical estimation:

It is important to note that the problem is not one of the appropriateness of a particular estimation technique. In the situation described [without the Z variable], there clearly exists no way using any technique whatsoever in which the true demand (or supply) curve can be estimated. Nor, indeed, is the problem here one of statistical inference—of separating out the effects of random disturbance. There is no disturbance in this model [...] It is the logic of the supply-demand equilibrium itself which leads to the difficulty. (Fisher 1966, p. 5)

More equations

More generally, consider a linear system of M equations, with M > 1.

An equation cannot be identified from the data if fewer than M − 1 variables are excluded from that equation. This is a particular form of the order condition for identification. (The general form of the order condition deals also with restrictions other than exclusions.) The order condition is necessary but not sufficient for identification.

The rank condition is a necessary and sufficient condition for identification. In the case of only exclusion restrictions, it must "be possible to form at least one nonvanishing determinant of order M − 1 from the columns of A corresponding to the variables excluded a priori from that equation" (Fisher 1966, p. 40), where A is the matrix of coefficients of the equations. This is the matrix-algebra generalization of the requirement, noted in the two-equation example above, that the excluded variable must enter the other equation.

from Grokipedia
In statistics, econometrics, and related fields, the parameter identification problem (also known as the identification problem) arises when one or more parameters in a structural model cannot be uniquely determined from the observable data or the model's reduced-form implications. This issue is fundamental to econometric modeling, as non-identifiable parameters prevent reliable estimation of causal effects and policy responses, such as distinguishing supply from demand shifts in market equilibrium analysis. The problem is particularly prominent in simultaneous equations models, where correlations among endogenous variables complicate parameter recovery. Identification is assessed through conditions like the order and rank criteria, with strategies including exclusion restrictions and instrumental variables to ensure parameters can be consistently estimated. Beyond classical econometrics, identification challenges appear in other fields as well, where failure to identify parameters can lead to biased inferences. Recent advancements, such as partial identification methods, allow bounding parameters when point identification is not possible, enhancing robustness in empirical work.

Fundamentals

Definition

The parameter identification problem in econometrics and statistics refers to the challenge of uniquely recovering the structural parameters of a model from the observable data distributions or reduced-form estimates derived from those data. In parametric models, identification ensures that the underlying parameters, which capture the causal mechanisms or behavioral relationships, can be distinguished from the data without ambiguity. This is crucial because a non-identified parameter admits multiple values consistent with the same observed data, rendering estimation unreliable.

Formally, a parameter vector $\theta$ in a statistical model is identifiable if, for any two distinct values $\theta_1$ and $\theta_2$ in the parameter space, the implied probability distributions over the observables differ, i.e., $P_{\theta_1} \neq P_{\theta_2}$. Equivalently, $\theta$ is identifiable if no other parameter value is observationally equivalent, meaning that if the likelihood or data-generating process satisfies $f(y \mid \theta_1) = f(y \mid \theta_2)$ for all possible observations $y$, then $\theta_1 = \theta_2$. This concept extends to local identification, where the parameter is uniquely determined within a neighborhood of the true value, often verified through the rank of the information matrix or of the Jacobian of the mapping from parameters to observables.

Structural parameters represent the primitive elements of the economic theory, such as coefficients in behavioral equations that describe causal relationships between variables, whereas reduced-form parameters summarize the observable correlations or joint distributions without specifying the underlying structure. Identification bridges these by requiring that the mapping from structural to reduced-form parameters is injective, allowing recovery of the former from the latter. A primary application arises in simultaneous equations models, where interdependence among variables complicates this recovery.

The origins of the parameter identification problem trace back to the late 1940s, particularly through the foundational work of the Cowles Commission for Research in Economics, which formalized the issue in the context of linear simultaneous equation systems to address limitations in early econometric estimation methods. Key contributions, such as those by Tjalling C. Koopmans and colleagues, emphasized the need for theoretical restrictions to achieve identification, building on prior statistical ideas from Trygve Haavelmo and others.
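As a minimal sketch of the formal definition, consider a hypothetical Gaussian model (not from the text) in which observations are drawn from Normal(a + b, 1). Only the sum a + b enters the distribution, so the pair (a, b) is not identifiable: distinct parameter values are observationally equivalent, which the code verifies by comparing log-likelihoods.

```python
import numpy as np

# Toy non-identified model: y ~ Normal(a + b, 1). Only a + b affects the
# distribution, so theta = (a, b) cannot be pinned down by any amount of data.
rng = np.random.default_rng(0)
y = rng.normal(5.0, 1.0, size=100)   # data generated with a + b = 5

def log_likelihood(y, a, b):
    mu = a + b
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

# Two distinct parameter values with the same sum ...
ll1 = log_likelihood(y, a=2.0, b=3.0)
ll2 = log_likelihood(y, a=4.0, b=1.0)
# ... give identical likelihoods: f(y | theta1) == f(y | theta2), theta1 != theta2
print(np.isclose(ll1, ll2))   # True
```

Reparameterizing in terms of the identified quantity mu = a + b restores identifiability, mirroring the structural-to-reduced-form distinction in the text.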

Importance in Modeling

Non-identification in parametric models occurs when multiple distinct parameter sets generate the same distribution of observables, resulting in non-unique estimates and preventing consistent recovery of the true structural parameters. This leads to unreliable estimation, as standard methods like maximum likelihood fail to converge to a single point without additional restrictions, undermining the reliability of model outputs. In econometric contexts, such failures complicate the interpretation of results, as the model cannot distinguish between observationally equivalent specifications, often requiring sensitivity analyses to bound the possible values.

Identification plays a crucial role in structural models by enabling the separation of causal relationships from mere correlations, a foundational challenge in econometrics, where endogenous variables confound direct associations. Without proper identification, models conflate spurious correlations with true causal effects, as seen in simultaneous equations systems where endogeneity arises from mutual dependencies; identification strategies, such as instrumental variables, impose restrictions to isolate exogenous variation and recover policy-invariant causal parameters. This distinction is essential for advancing beyond descriptive modeling to explanatory frameworks that inform theoretical understanding.

In policy analysis, unidentified parameters render models unsuitable for simulating counterfactual scenarios or evaluating interventions, as the lack of unique estimates precludes reliable predictions of policy impacts under new conditions. For instance, structural models with identified parameters allow extrapolation to settings not yet experienced, such as proposed reforms, by preserving behavioral invariances across environments; non-identification, by contrast, amplifies uncertainty in welfare evaluations and forecast accuracy, limiting the practical value of econometric tools in policy design. Seminal work emphasizes that robust identification ensures parameters remain stable despite behavioral responses, facilitating credible policy recommendations.

Identification issues intersect with broader statistical problems like multicollinearity and omitted variables, where high correlations among regressors or unmodeled factors mimic non-identification by inflating variance and obscuring parameter uniqueness in simultaneous equations. In such cases, near-collinearity in the data exacerbates identification failures, leading to ill-conditioned estimation matrices and imprecise inferences, though these problems can be mitigated through reparameterization or exclusion restrictions without altering the core identifiability framework.
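The variance-inflation point can be illustrated with a small generic simulation (hypothetical data, not from the text): near-collinear regressors make the X'X matrix ill-conditioned, so individual coefficients, while technically identified, are estimated very imprecisely.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

# x2 is almost an exact copy of x1: near-collinearity mimics non-identification
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.01, n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n)

X = np.column_stack([x1, x2])
XtX = X.T @ X

# The condition number of X'X explodes as the regressors approach collinearity
print(f"condition number: {np.linalg.cond(XtX):.0f}")

# OLS coefficient variances scale with the diagonal of (X'X)^{-1}:
# large entries mean the individual coefficients are nearly indistinguishable,
# even though their sum is precisely estimated.
var_scale = np.diag(np.linalg.inv(XtX))
print(var_scale)
```

As with the reparameterization remedy mentioned above, regressing on x1 + x2 instead of the two separate columns removes the near-singularity.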

Simultaneous Equations Models

Model Structure

In simultaneous equations models, the structural form captures the theoretical relationships among variables, expressed in matrix notation as $Y = \Gamma Y + B X + C Z + \varepsilon$, where $Y$ is the $T \times G$ matrix of endogenous variables (with $T$ observations and $G$ variables determined within the system), $X$ is the $T \times K_x$ matrix of strictly exogenous variables, $Z$ is the $T \times K_z$ matrix of predetermined variables (such as lagged endogenous variables), $\Gamma$ is the $G \times G$ coefficient matrix on the endogenous variables (typically with zeros on the diagonal, each equation's own coefficient being normalized to 1), $B$ and $C$ are the $G \times K_x$ and $G \times K_z$ coefficient matrices on the exogenous and predetermined variables, respectively, and $\varepsilon$ is the $T \times G$ matrix of structural error terms. This form rearranges to $(I - \Gamma) Y = B X + C Z + \varepsilon$, highlighting the interdependent nature of the endogenous variables.

The structural form (SF) differs from the reduced form (RF), which expresses the endogenous variables solely as functions of the predetermined variables by solving the SF algebraically, yielding $Y = \Pi X + \Theta Z + \upsilon$, where $\Pi = (I - \Gamma)^{-1} B$, $\Theta = (I - \Gamma)^{-1} C$, and $\upsilon = (I - \Gamma)^{-1} \varepsilon$ (assuming $I - \Gamma$ is invertible). The RF parameters represent the total (direct and indirect) effects of changes in the predetermined variables on the endogenous ones, and the RF is directly estimable by ordinary least squares (OLS) under standard conditions, whereas the SF parameters reflect the underlying causal mechanisms specified by economic theory. Identification in these models involves recovering the SF parameters from the observable RF.

Endogeneity arises in simultaneous equations models due to the mutual dependence among endogenous variables, such as feedback loops where one endogenous variable influences another contemporaneously, causing each to be correlated with the structural errors in the system's equations. This correlation violates the exogeneity assumption required for consistent OLS estimation of individual structural equations, as the covariance between regressors and errors prevents unbiased recovery of coefficients directly from sample covariances. Key assumptions underlying these models include the strict exogeneity of $X$ (i.e., $E(\varepsilon \mid X) = 0$), the predetermination of $Z$ (meaning $E(\varepsilon_t \mid Z_s) = 0$ for $s \leq t$), no perfect multicollinearity among the regressors in each equation, and a zero mean for the errors with finite variance (often allowing contemporaneous correlation across equations but no serial correlation within them). These ensure the RF is well-defined and estimable, providing a foundation for addressing identification issues in the SF.
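The structural-to-reduced-form mapping can be sketched numerically. The 2-by-2 system below uses hypothetical coefficients (and omits the predetermined block $Z$ for brevity), computing $\Pi = (I - \Gamma)^{-1} B$ in per-observation form:

```python
import numpy as np

# Hypothetical 2-equation system in the text's notation y = Gamma*y + B*x + eps,
# with G = 2 endogenous and K_x = 2 exogenous variables (illustrative values).
Gamma = np.array([[0.0, 0.5],
                  [0.3, 0.0]])   # zero diagonal: own coefficients normalized to 1
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])       # each exogenous variable enters one equation

# Reduced form: y = Pi*x + v with Pi = (I - Gamma)^{-1} B
Pi = np.linalg.inv(np.eye(2) - Gamma) @ B

# Pi gives the TOTAL effect of x on y: the direct effect plus the feedback
# through the other endogenous variable, so Pi[0, 0] exceeds B[0, 0] = 1.
print(np.round(Pi, 3))
```

The feedback amplification visible in Pi is exactly why reduced-form coefficients mix the structural effects, and why recovering Gamma and B from Pi is the identification problem.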

Identification Challenges

In simultaneous equations models (SEMs), the parameter identification problem manifests as an inherent ambiguity in recovering unique structural parameters from observed data, primarily because the reduced-form representation aggregates information in a way that obscures the underlying causal relationships. This challenge arises from the interdependence among endogenous variables, where the structural equations do not directly correspond to the population moments estimated from the data, leading to potential non-uniqueness in parameter estimates. Seminal work by the Cowles Commission highlighted that without sufficient restrictions, the mapping from structural forms to reduced forms is not invertible, complicating inference in economic modeling. A key obstacle is functional form multiplicity, where multiple structural forms can generate identical reduced forms, often due to overparameterization or factors that allow different configurations to produce the same observable outcomes. For instance, in linear SEMs, bilinear restrictions introduced by covariance structures between errors can result in non-unique solutions, as no general necessary and sufficient conditions exist to rule out such multiplicities without additional assumptions. This issue is exacerbated in models with limited information, where the lack of prior constraints on the coefficient matrices prevents distinguishing between competing structural interpretations. The role of excluded variables further contributes to underidentification, as omitting relevant exogenous variables from the model reduces the informational content available for isolating structural effects, thereby failing to provide enough independent instruments for estimation. In SEMs, if the number of excluded exogenous variables is insufficient relative to the number of included endogenous regressors, the order condition for identification is violated, rendering parameters non-recoverable from the data.

This omission often stems from incomplete model specification based on economic theory, leading to a loss of degrees of freedom in the identification process. Correlation between endogenous variables and error terms poses another critical challenge, as it violates the exogeneity assumption required for consistent estimation methods like ordinary least squares (OLS), necessitating rigorous checks to avoid biased inferences. In SEMs, this endogeneity arises structurally from simultaneity: disturbances influence multiple equations, correlating regressors with errors and preventing unique parameter recovery without further restrictions. Such correlations underscore the need for predetermined variables or exclusion restrictions to break the feedback loops. Non-identification in SEMs can be categorized into exact and partial types: exact non-identification occurs when no parameters are recoverable due to complete observational equivalence across structures, while partial non-identification allows some parameters to be point-identified but leaves others in bounded sets without unique values. In classical linear SEMs, exact cases often result from global underidentification where the rank conditions fail entirely, whereas partial scenarios emerge under weaker restrictions, enabling set-valued inferences but complicating point estimation. This distinction is crucial for applied econometrics, as partial identification permits bounded uncertainty analysis when full identification is unattainable.

Illustrative Examples

Two-Equation Case

The two-equation case provides a foundational illustration of the parameter identification problem within simultaneous equations models, particularly through the canonical supply-and-demand system for a market good. In equilibrium, the endogenous quantity $Q$ equates quantity demanded and supplied, while the endogenous price $P$ clears the market. The demand function is specified as

$$Q = \alpha_0 + \alpha_1 P + \gamma Y + u_d,$$

where $Y$ represents an exogenous shifter such as consumer income, $\alpha_1 < 0$ captures the negative price responsiveness, and $u_d$ is the random disturbance with $E(u_d) = 0$ and uncorrelated with the exogenous variables. The supply function is

$$Q = \beta_0 + \beta_1 P + \delta W + u_s,$$

where $W$ is an exogenous shifter such as input costs or weather, $\beta_1 > 0$ reflects positive price responsiveness, and $u_s$ is the supply disturbance satisfying analogous assumptions. This setup embodies the core challenge: the endogeneity of $P$ in both equations induces correlation between $P$ and the disturbances, preventing direct ordinary least squares estimation of the structural parameters.

The reduced form of the model, obtained by solving the equilibrium conditions for the endogenous variables, expresses $P$ and $Q$ linearly in terms of the exogenous variables $Y$ and $W$:

$$P = \pi_{P0} + \pi_{PY} Y + \pi_{PW} W + v_P,$$
$$Q = \pi_{Q0} + \pi_{QY} Y + \pi_{QW} W + v_Q,$$

where the composite errors $v_P$ and $v_Q$ are combinations of $u_d$ and $u_s$, and the reduced-form coefficients are

$$\pi_{PY} = \frac{-\gamma}{\alpha_1 - \beta_1}, \quad \pi_{PW} = \frac{\delta}{\alpha_1 - \beta_1}, \quad \pi_{QY} = \frac{-\beta_1 \gamma}{\alpha_1 - \beta_1}, \quad \pi_{QW} = \frac{\alpha_1 \delta}{\alpha_1 - \beta_1},$$

assuming $\alpha_1 \neq \beta_1$ for model stability.

These reduced-form parameters are consistently estimable by ordinary least squares because the regressors $Y$ and $W$ are exogenous and uncorrelated with $v_P$ and $v_Q$. Identification hinges on whether the structural parameters can be uniquely recovered from these estimable reduced-form coefficients.

Underidentification manifests prominently when both structural equations include the same endogenous variables without any excluded exogenous variables, as in the baseline case omitting $Y$ and $W$:

$$Q = \alpha_1 P + u_d, \qquad Q = \beta_1 P + u_s.$$

Here, the only variation in $P$ and $Q$ stems from the disturbances, and the reduced form collapses to the joint distribution of $Q$ and $P$ driven by $u_d$ and $u_s$. The structural slopes $\alpha_1$ and $\beta_1$ cannot be uniquely identified because infinitely many parameter combinations, along with corresponding error-variance adjustments, can replicate the observed covariances and variances. This arises from the homogeneity of the system: linear combinations of the equations yield observationally equivalent structures. For instance, Koopmans illustrated this using scatter plots where multiple demand and supply slope pairs fit the same data points equally well, rendering the true curves indistinguishable.

A numerical illustration underscores this non-uniqueness. Suppose the data yield the hypothetical covariance matrix

$$\begin{pmatrix} \operatorname{Var}(Q) & \operatorname{Cov}(Q, P) \\ \operatorname{Cov}(Q, P) & \operatorname{Var}(P) \end{pmatrix} = \begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix},$$

assuming zero means and no exogenous shifters.

One compatible structural solution is $\alpha_1 = -2$, $\beta_1 = 0$, with $\operatorname{Var}(u_d) = 3$, $\operatorname{Var}(u_s) = 1$, and $\operatorname{Cov}(u_d, u_s) = 0$, as the implied reduced-form moments match the observed matrix via the formulas

$$\operatorname{Var}(P) = \frac{\operatorname{Var}(u_d) + \operatorname{Var}(u_s)}{(\alpha_1 - \beta_1)^2}, \qquad \operatorname{Cov}(Q, P) = \alpha_1 \operatorname{Var}(P) - \frac{\operatorname{Var}(u_d)}{\alpha_1 - \beta_1},$$
$$\operatorname{Var}(Q) = \alpha_1^2 \operatorname{Var}(P) + \operatorname{Var}(u_d) - \frac{2 \alpha_1 \operatorname{Var}(u_d)}{\alpha_1 - \beta_1}.$$

Yet an alternative solution, $\alpha_1 = -1$, $\beta_1 = 1$, with $\operatorname{Var}(u_d) = 1$ and $\operatorname{Var}(u_s) = 3$, reproduces exactly the same covariance matrix, demonstrating non-uniqueness and the failure to pin down either the demand slope $\alpha_1$ or the supply slope $\beta_1$. Such multiplicity prevents reliable inference on key elasticities without further restrictions.

Overidentification occurs when exclusion restrictions provide more instruments than needed for an equation, enabling both estimation and hypothesis testing. Consider augmenting the original model with advertising expenditure $A$, an exogenous variable included only in demand:

$$Q = \alpha_0 + \alpha_1 P + \gamma Y + \eta A + u_d,$$
$$Q = \beta_0 + \beta_1 P + \delta W + u_s.$$

The expanded reduced form now incorporates $A$:

$$P = \pi_{P0} + \pi_{PY} Y + \pi_{PA} A + \pi_{PW} W + v_P,$$
$$Q = \pi_{Q0} + \pi_{QY} Y + \pi_{QA} A + \pi_{QW} W + v_Q,$$

with additional coefficients $\pi_{PA} = -\eta / (\alpha_1 - \beta_1)$ and $\pi_{QA} = -\beta_1 \eta / (\alpha_1 - \beta_1)$, among others.

The supply equation is now overidentified: $Y$ and $A$ serve as two excluded exogenous variables (valid instruments for $P$), exceeding the single endogenous regressor in supply. This surplus allows the overidentifying restrictions to be tested, for example by verifying whether $\pi_{QY} / \pi_{PY} = \pi_{QA} / \pi_{PA}$, a cross-equation implication of the model structure, using statistics such as the Hansen J-test on instrumental-variables estimates. Overidentification thus facilitates model scrutiny and more efficient estimation via methods such as two-stage least squares, provided the extra instruments are valid and relevant.
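The observational-equivalence claim in the numerical illustration can be checked directly. The sketch below codes the moment formulas for the no-shifter model and confirms that both structural solutions imply the same covariance matrix of (Q, P):

```python
import numpy as np

def implied_moments(a1, b1, var_ud, var_us):
    """Second moments of (Q, P) implied by the structural model
    Q = a1*P + u_d (demand), Q = b1*P + u_s (supply), Cov(u_d, u_s) = 0."""
    det = a1 - b1
    var_P = (var_ud + var_us) / det**2
    cov_QP = a1 * var_P - var_ud / det
    var_Q = a1**2 * var_P + var_ud - 2 * a1 * var_ud / det
    return np.array([[var_Q, cov_QP], [cov_QP, var_P]])

# The two structural solutions from the text ...
m1 = implied_moments(a1=-2.0, b1=0.0, var_ud=3.0, var_us=1.0)
m2 = implied_moments(a1=-1.0, b1=1.0, var_ud=1.0, var_us=3.0)
# ... both reproduce the same observable covariance matrix
target = np.array([[1.0, -0.5], [-0.5, 1.0]])
print(np.allclose(m1, target) and np.allclose(m2, target))   # True
```

Because the map from structural parameters to observable moments is many-to-one here, no estimator applied to these moments can recover the slopes, which is the identification failure rather than a statistical one.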

Multi-Equation Systems

In multi-equation systems, the parameter identification problem involves determining the structural parameters across multiple interdependent equations, typically more than two, where endogenous variables influence each other contemporaneously. The general form of a linear $k$-equation simultaneous system, with $m$ endogenous variables ($m \geq k$) and $l$ exogenous variables, is expressed in matrix notation as

$$\mathbf{Y} \Gamma + \mathbf{X} \mathbf{B} = \mathbf{U},$$

where $\mathbf{Y}$ is a $T \times m$ matrix of observations on the endogenous variables, $\mathbf{X}$ is a $T \times l$ matrix of observations on the exogenous variables, $\Gamma$ is an $m \times m$ structural coefficient matrix (with diagonal elements normalized to 1 for behavioral equations), $\mathbf{B}$ is an $l \times m$ matrix of exogenous coefficients, and $\mathbf{U}$ is a $T \times m$ matrix of error terms with zero means and contemporaneous covariance matrix $\Sigma$. Cross-equation restrictions, such as equality constraints on coefficients shared across equations or zero restrictions on certain elements of $\Gamma$ and $\mathbf{B}$, are commonly imposed to reduce the parameter space and facilitate identification in these systems. As the number of equations $k$ increases, identification challenges intensify because the number of parameters in $\Gamma$ and $\mathbf{B}$ can grow with the square of $m$ in fully simultaneous setups, while the reduced-form parameters (from the transformation $\Pi = -\mathbf{B} \Gamma^{-1}$, assuming $\Gamma$ is invertible) provide only $ml + m(m+1)/2$ moments for their recovery. This often results in a higher probability of underidentification, particularly when the number of excluded exogenous variables (potential instruments) falls short relative to the total number of endogenous regressors included across equations, limiting the ability to distinguish structural relations from reduced-form correlations.

A representative example is a three-equation macroeconomic system extending the IS-LM framework to incorporate an explicit investment equation, capturing interactions among output ($Y$), interest rates ($r$), and investment ($I$). The structural system might include: (1) an investment equation $I_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 r_t + \alpha_3 Z_t + \epsilon_{1t}$, where $Z_t$ is an excluded exogenous variable like business confidence; (2) a goods-market (IS) equation $Y_t = \beta_0 + \beta_1 (Y_t - T_t) + \beta_2 I_t + \beta_3 G_t + \epsilon_{2t}$, representing aggregate demand; and (3) a money-market (LM) equation $m_t = \gamma_0 + \gamma_1 Y_t - \gamma_2 r_t + \epsilon_{3t}$, where $m_t$ is real money balances. In a recursive structure, if the investment equation excludes $r_t$ and depends only on lagged $Y$, it can be identified first using $Z_t$ as an instrument, simplifying subsequent identification of the IS and LM equations; a non-recursive structure with bidirectional feedbacks (e.g., $Y$ affecting $I$ and $r$ simultaneously), however, demands stricter exclusion restrictions to avoid underidentification in at least one equation. Identification by subsystem arises in block-triangular forms of the $\Gamma$ matrix, where the system decomposes into sequentially independent blocks, allowing recursive and partial identification of earlier blocks without restrictions on the full system. For instance, in a block-triangular three-equation model, the first block (e.g., investment) may be just-identified using its own exclusions, serving as predetermined input for the second block (IS), while the third (LM) leverages instruments from prior blocks plus additional exogenous variables. This approach mitigates underidentification in larger systems by exploiting the triangular ordering, though it assumes no feedback within blocks and requires verifying the rank of the relevant submatrices for global consistency.

Identification Conditions

Order Condition

The order condition serves as a necessary criterion for local identification of the parameters in a specific equation within a system of simultaneous linear equations, providing a straightforward counting rule to assess whether sufficient restrictions are imposed to potentially recover the structural coefficients from the reduced-form parameters. In a system with $G$ equations, $m$ endogenous variables, and $K$ exogenous variables, consider the $j$-th structural equation, which includes $m_j$ endogenous variables (including the dependent variable) and $M_j$ of the $K$ exogenous variables. The order condition requires that the number of excluded exogenous variables, $K - M_j$, be at least as large as the number of right-hand-side endogenous variables, $m_j - 1$:

$$K - M_j \geq m_j - 1.$$

This condition ensures that there are enough excluded exogenous variables to provide independent instruments for tracing out the effects of the included endogenous regressors through the reduced form. The intuition behind this condition derives from the requirement that the structural parameters must be uniquely recoverable via linear combinations of the reduced-form coefficients. In the reduced form, each structural equation projects onto all exogenous variables, yielding a matrix of coefficients $\Pi$ whose rows correspond to the endogenous variables. For the $j$-th equation, the structural coefficients $\gamma_j$ (associated with the included endogenous variables) and $\beta_j$ (for the included exogenous variables) satisfy a relation of the form $\Pi_j^* \gamma_j = \pi_j^*$, where $\Pi_j^*$ involves submatrices from the excluded exogenous variables. To solve uniquely for $\gamma_j$, the dimension of $\Pi_j^*$ (the number of excluded exogenous variables) must be at least the dimension of $\gamma_j$ (the number of right-hand-side endogenous variables), preventing underdetermination in the system.

This counting rule originates from the linear algebra necessary for the existence of a solution in the identification mapping from reduced to structural form. To illustrate, consider a classic two-equation supply-and-demand model where quantity $Q$ and price $P$ are endogenous, $Y$ is an exogenous demand shifter, and a supply shifter like weather $W$ is excluded from demand. The demand equation $Q = \alpha_0 + \alpha_1 P + \alpha_2 Y + u_d$ includes one right-hand-side endogenous variable ($P$) and one exogenous variable ($Y$), so $m_j - 1 = 1$ and $M_j = 1$. With $K = 2$ total exogenous variables ($Y, W$), the number of excluded exogenous variables is $K - M_j = 1$, satisfying the condition $1 \geq 1$ and allowing identification if $W$ influences supply. Conversely, without exclusions (both equations depending only on $P$), the number of excluded exogenous variables is $0 < 1$, violating the condition and rendering the equation underidentified, as the reduced form cannot distinguish demand from supply shifts. In multi-equation systems, the condition applies equation-by-equation, failing similarly in underidentified cases such as a demand equation omitting supply-specific exogenous variables. Despite its utility as a quick feasibility check, the order condition is merely necessary and not sufficient for identification; it may hold even when the rank condition fails due to linear dependencies among the excluded variables' reduced-form coefficients. Moreover, satisfaction with strict inequality ($K - M_j > m_j - 1$) indicates overidentification, where the available instruments exceed the minimum needed, enabling tests of model validity but still requiring the rank condition for actual identification.
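The counting rule lends itself to a small helper function; this is an illustrative sketch (not a standard library routine) classifying an equation by the order condition:

```python
def order_condition(K, M_j, m_j):
    """Order condition for equation j in a system with K exogenous variables:
    the K - M_j excluded exogenous variables must number at least m_j - 1,
    the count of right-hand-side endogenous regressors."""
    excluded = K - M_j
    needed = m_j - 1
    if excluded < needed:
        return "underidentified"
    return "just-identified" if excluded == needed else "overidentified"

# Demand equation from the example above: K = 2 (Y, W), demand includes Y and P
print(order_condition(K=2, M_j=1, m_j=2))   # just-identified
# No exclusions: both equations depend only on P
print(order_condition(K=0, M_j=0, m_j=2))   # underidentified
# An extra excluded shifter (e.g. advertising A) overidentifies the equation
print(order_condition(K=3, M_j=1, m_j=2))   # overidentified
```

Passing this check is only the first step: the rank condition must still be verified on the actual coefficient values.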

Rank Condition

The rank condition provides a sufficient criterion for the local identification of the parameters in an individual equation of a simultaneous equations model. Specifically, for the jj-th equation, which includes mjm_j endogenous variables (including the dependent variable) and ljl_j exogenous variables, the condition requires that the rank of the submatrix comprising the coefficients on the excluded exogenous variables from the other equations in the system equals mj1m_j - 1. This submatrix is formed from the full system's coefficient matrices on the endogenous and exogenous variables, excluding the row corresponding to the jj-th equation. In matrix notation, consider the structural form of the system as YB+XC=U\mathbf{Y} \mathbf{B} + \mathbf{X} \mathbf{C} = \mathbf{U}, where Y\mathbf{Y} is the matrix of endogenous variables, X\mathbf{X} the matrix of exogenous variables, B\mathbf{B} the for endogenous variables (with diagonal normalization often assumed for the dependent variables), and C\mathbf{C} for exogenous variables. For identification of the jj-th , the rank condition is rank(Cjexcl)=mj1\operatorname{rank}(\mathbf{C}_{-j}^{\mathrm{excl}}) = m_j - 1, where Cjexcl\mathbf{C}_{-j}^{\mathrm{excl}} is the submatrix of Cj\mathbf{C}_{-j} consisting of the columns corresponding to the excluded exogenous variables. This ensures that the structural parameters can be uniquely recovered from the reduced-form parameters. The intuition behind the rank condition lies in the need for the excluded exogenous variables to introduce linearly independent sources of variation across the equations. These exclusions must affect the other endogenous variables in ways that are not collinear with the included regressors, thereby allowing the structural relationships to be isolated from the observed data generated by the . Without this full rank, the equation's parameters would remain entangled with linear combinations of the system's other equations, preventing unique . 
The rank condition complements the order condition, which serves as a necessary but not sufficient prerequisite by requiring at least m_j − 1 excluded exogenous variables. Satisfaction of the order condition implies that the rank could potentially reach m_j − 1, but the rank condition verifies that the coefficients on these exclusions actually achieve the required rank. Thus, while the order condition is simpler to check via exclusion counts, the rank condition provides the deeper verification essential for identification.

To illustrate, consider a standard supply and demand model where quantity Q and price P are endogenous, demand is Q = αP + βY + u_d (with income Y exogenous), and supply is Q = γP + δW + u_s (with wage W exogenous). For the demand equation, the excluded exogenous variable W appears only in supply, so the submatrix of coefficients formed by deleting the demand row has rank 1, matching m_d − 1 = 2 − 1, and the condition is satisfied. If instead no exclusion exists (e.g., both equations include both Y and W), the submatrix rank drops to 0 and identification fails. This verification confirms that the exclusion enables tracing out the demand curve via supply shifts.
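The rank-condition check in the two-equation example above can be sketched numerically. The parameter values below are hypothetical (any nonzero β and δ behave the same way); the helper function simply computes the rank of the relevant submatrix of the exogenous coefficient matrix C:

```python
import numpy as np

# Structural system with endogenous [Q, P] and exogenous [Y, W]:
#   demand: Q - alpha*P - beta*Y = u_d
#   supply: Q - gamma*P - delta*W = u_s
# (illustrative parameter values; any nonzero beta, delta work)
beta, delta = 0.5, 0.3

# Coefficient matrix on the exogenous variables, one row per equation,
# columns ordered [Y, W]; zeros encode the exclusion restrictions.
C = np.array([[-beta, 0.0],      # demand excludes W
              [0.0, -delta]])    # supply excludes Y

def rank_condition_holds(C, eq, excluded_cols, m_j):
    """Rank of the submatrix of C obtained by deleting row `eq` and
    keeping only the columns of the excluded exogenous variables."""
    sub = np.delete(C, eq, axis=0)[:, excluded_cols]
    return np.linalg.matrix_rank(sub) == m_j - 1

# Demand equation: m_d = 2 endogenous (Q, P); W (column 1) is excluded.
print(rank_condition_holds(C, eq=0, excluded_cols=[1], m_j=2))  # True

# If delta = 0 (W drops out of supply as well), the submatrix is all
# zeros and the rank condition fails even though the exclusion count
# (order condition) is unchanged.
C_fail = np.array([[-beta, 0.0],
                   [0.0, 0.0]])
print(rank_condition_holds(C_fail, eq=0, excluded_cols=[1], m_j=2))  # False
```

The second case illustrates the gap between the two conditions: W is still formally excluded from demand, but its zero coefficient in supply means it generates no identifying variation.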

Strategies for Achieving Identification

Exclusion Restrictions

Exclusion restrictions constitute a primary method for achieving identification in simultaneous equations models. They specify that certain exogenous variables are omitted from particular structural equations, thereby limiting their direct influence to specific parts of the system. These restrictions are imposed on the coefficient matrix of the exogenous variables, setting particular elements to zero on the basis of theoretical priors. By excluding an exogenous variable from one equation while including it in others, the restriction generates excluded instruments that are correlated with the endogenous variables but not directly with the error term in the equation of interest. This increases the count of available instruments relative to the number of included endogenous and exogenous variables, helping to satisfy the order condition for identification, which requires at least as many excluded exogenous variables as included endogenous regressors.

The theoretical foundation for exclusion restrictions derives from economic theory, which identifies variables that plausibly affect only certain behavioral relations; for instance, weather conditions like rainfall may influence a supply equation through production costs but are excluded from the demand equation, as they do not directly affect consumer preferences. Such exclusions ensure the structural equations are distinguishable, preventing underidentification arising from linear dependencies among them.

While exclusion restrictions strengthen identification by leveraging domain-specific knowledge to impose credible zero coefficients, they carry the risk of model misspecification if the theoretical exclusions prove invalid, potentially leading to biased estimates, since these restrictions cannot be tested without additional assumptions. In practice, invalid exclusions can undermine the rank condition, rendering parameters non-unique even when the order condition holds.
A representative application appears in supply-demand models, where consumer expenditure is often included in the demand equation to capture shifts in preferences but excluded from the supply equation, as it does not directly affect producers' costs or output decisions; this exclusion makes the expenditure variable a valid instrument for identifying the supply relation, since it shifts demand and thereby traces out the supply curve.

Instrumental Variables

The instrumental variables (IV) method addresses endogeneity in parameter estimation by introducing auxiliary variables, known as instruments Z, that satisfy two key conditions: relevance, meaning Z is correlated with the endogenous regressor X, and exogeneity, meaning Z is uncorrelated with the model's error term. These instruments provide exogenous variation with which to identify causal parameters in models where direct regression is invalid because the regressors are correlated with the disturbances.

A widely used implementation of the IV approach is two-stage least squares (2SLS), which proceeds in two steps: in the first stage, the endogenous variables are regressed on the instruments and any included exogenous variables to obtain fitted values; in the second stage, these fitted values replace the endogenous variables in the original structural equation, which is then estimated by ordinary least squares. This procedure, originally developed in the context of simultaneous equations, yields consistent estimates under the IV assumptions. In relation to identification, valid instruments, often derived from excluded exogenous variables, ensure the consistency of IV estimates by isolating the exogenous components of the endogenous regressors. When the model is overidentified (more instruments than endogenous regressors), tests such as the Sargan statistic or Hansen's J-test can assess instrument exogeneity through the overidentifying restrictions.

The basic IV estimator for a structural equation Y = Xβ + u with instruments Z takes the form β̂_IV = (Z′X)^{-1} Z′Y, provided the matrix Z′X has full rank and E[Zu] = 0; under these conditions, the estimator is consistent as the sample size grows, although it is generally biased in finite samples. For overidentified systems, the generalized method of moments (GMM) extends IV estimation by minimizing a quadratic form in the sample moment conditions, weighting the instruments optimally to achieve efficiency under heteroskedasticity or autocorrelation.
This GMM framework encompasses 2SLS as a special case: when the errors are homoskedastic, the optimal weighting matrix reduces to (Z′Z)^{-1}, and the GMM estimator coincides with 2SLS.
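The just-identified IV estimator β̂_IV = (Z′X)^{-1} Z′Y can be illustrated on simulated data. The data-generating process below is hypothetical: X is endogenous because it shares an error component with Y, so OLS is inconsistent while the IV estimate converges to the true coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical DGP: X is endogenous because it loads on the error e.
beta = 2.0
Z = rng.normal(size=n)                    # instrument: relevant and exogenous
e = rng.normal(size=n)                    # structural error
X = 0.7 * Z + 0.5 * e + rng.normal(size=n)  # correlated with both Z and e
y = beta * X + e

# OLS slope: inconsistent here, since Cov(X, e) > 0 biases it upward.
beta_ols = (X @ y) / (X @ X)

# IV estimator in the just-identified scalar case:
# beta_IV = (Z'X)^{-1} Z'y.
beta_iv = (Z @ y) / (Z @ X)

print(beta_ols)  # noticeably above the true value 2.0
print(beta_iv)   # close to the true value 2.0
```

The same two lines generalize to the matrix case by replacing the scalar divisions with `np.linalg.solve(Z.T @ X, Z.T @ y)`; a first-stage regression of X on Z followed by OLS on the fitted values (2SLS) gives an identical answer in this just-identified setting.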
