Quasi-maximum likelihood estimate
from Wikipedia

In statistics, a quasi-maximum likelihood estimate (QMLE), also known as a pseudo-likelihood estimate or a composite likelihood estimate, is an estimate of a parameter θ in a statistical model that is formed by maximizing a function related to the logarithm of the likelihood function, while allowing, in discussing consistency and the (asymptotic) variance-covariance matrix, that some parts of the distribution may be mis-specified.[1][2] In contrast, the maximum likelihood estimate maximizes the actual log-likelihood function for the data and model. The function that is maximized to form a QMLE is often a simplified form of the actual log-likelihood function. A common way to form such a simplified function is to use the log-likelihood function of a misspecified model that treats certain data values as being independent, even when in actuality they may not be. This removes any parameters from the model that are used to characterize these dependencies. Doing this only makes sense if the dependency structure is a nuisance parameter with respect to the goals of the analysis.[3] As long as the quasi-likelihood function that is maximized is not oversimplified, the QMLE (or composite likelihood estimate) is consistent and asymptotically normal. It is less efficient than the maximum likelihood estimate, but may be only slightly less efficient if the quasi-likelihood is constructed so as to minimize the loss of information relative to the actual likelihood.[4] Standard approaches to statistical inference that are used with maximum likelihood estimates, such as the formation of confidence intervals and statistics for model comparison,[5] can be generalized to the quasi-maximum likelihood setting.
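As a concrete illustration of this independence trick, the sketch below (a hypothetical setup, assuming NumPy is available; the AR(1) model and parameter values are illustrative) estimates the mean of serially correlated Gaussian data by maximizing a working log-likelihood that deliberately treats the observations as independent, so the dependence parameter never enters the working model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) Gaussian series with mean mu_true: the observations
# are dependent, but the working likelihood below ignores that dependence.
mu_true, phi, n = 2.0, 0.6, 20000
eps = rng.normal(0.0, 1.0, n)
y = np.empty(n)
y[0] = mu_true + eps[0]
for t in range(1, n):
    y[t] = mu_true + phi * (y[t - 1] - mu_true) + eps[t]

def independence_loglik(mu, sigma, y):
    """Gaussian log-likelihood that (wrongly) treats observations as i.i.d."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * (y - mu) ** 2 / sigma**2)

# For this working likelihood the maximizer in mu is the sample mean.
mu_qmle = y.mean()
```

Because the working likelihood is Gaussian, its maximizer in the mean is simply the sample average; the point is that ignoring the AR(1) dependence still yields a consistent estimate of the mean, at some cost in efficiency.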

from Grokipedia
In statistics, the quasi-maximum likelihood estimate (QMLE), also referred to as a pseudo-maximum likelihood estimate, is a method that maximizes a log-likelihood function constructed from an assumed probability density, even when this density does not match the true underlying distribution of the data. This approach yields consistent parameter estimates for the value that minimizes the Kullback-Leibler divergence between the true and assumed distributions, making it robust to model misspecification under suitable conditions. QMLE is particularly valuable in scenarios where the full likelihood is intractable or unknown, allowing researchers to leverage familiar maximum likelihood machinery while retaining reliable inference.

The concept of QMLE emerged in the early 1980s within econometrics, building on foundational work examining the behavior of maximum likelihood under misspecification. Halbert White's 1982 paper formalized the properties of such estimators, demonstrating their consistency and deriving their asymptotic distribution in a general framework. Shortly thereafter, Christian Gourieroux, Alain Monfort, and Alain Trognon extended this theory in 1984, introducing "pseudo maximum likelihood methods" and classifying families of densities (such as certain exponential families) that guarantee strong consistency and asymptotic normality for estimators of interest parameters like means and variances. These developments addressed limitations of traditional maximum likelihood estimation, which assumes correct model specification, and provided tools for robust estimation in nonlinear and dynamic models.

Key properties of QMLE include weak or strong consistency for the pseudo-true parameter, provided the parameter space is compact and a uniform law of large numbers holds for the quasi-log-likelihood.
Asymptotically, the estimator is normally distributed, but unlike standard maximum likelihood, where the information matrix equality simplifies the variance, the QMLE variance takes a "sandwich" form: A^{-1} B A^{-1}, where A is the negative expected Hessian of the quasi-log-likelihood and B is its score covariance, reflecting heteroskedasticity or misspecification. This robust covariance enables valid hypothesis testing and confidence intervals without assuming correct specification. QMLE has broad applications in econometrics, time series analysis, and generalized linear models, such as estimating autoregressive conditional heteroskedasticity (ARCH) models or systems where the true distribution is complex. For instance, ordinary least squares can be viewed as a QMLE under Gaussian assumptions for the errors, remaining consistent even with non-normal errors. Its flexibility has made it a cornerstone of robust inference in empirical research, influencing subsequent robust estimation methods.

Background Concepts

Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a fundamental statistical method for estimating the parameters of a probabilistic model from observed data. Given a sample of independent and identically distributed (i.i.d.) observations y = (y_1, \dots, y_n) drawn from a density f(y_i \mid \theta), where \theta is the unknown parameter vector, the likelihood function is defined as L(\theta; y) = \prod_{i=1}^n f(y_i \mid \theta). The MLE, denoted \hat{\theta}, is the value of \theta that maximizes this likelihood, assuming the model is correctly specified. To facilitate computation, the maximization is typically performed on the log-likelihood: \hat{\theta} = \arg\max_\theta \log L(\theta; y) = \arg\max_\theta \sum_{i=1}^n \log f(y_i \mid \theta).

The first derivative of the log-likelihood with respect to \theta, known as the score function s(\theta) = \frac{\partial}{\partial \theta} \log L(\theta; y), plays a central role; at the maximum, s(\hat{\theta}) = 0. Under standard regularity conditions, such as differentiability of the log-likelihood and the existence of finite moments, the expected value of the score is zero: E[s(\theta)] = 0, ensuring that the true parameter satisfies the first-order condition in expectation. The method was developed by Ronald A. Fisher in the early 20th century, with its formal introduction in his 1922 paper, where he presented MLE as an optimal procedure under correct model specification, offering desirable properties like efficiency relative to other estimators.

A key result in this framework is the information matrix equality, which states that the variance of the score function equals the negative of the expected second derivative of the log-likelihood: \operatorname{Var}(s(\theta)) = -E\left[ \frac{\partial^2}{\partial \theta^2} \log L(\theta; y) \right] = I(\theta), where I(\theta) is the Fisher information matrix measuring the amount of information the sample carries about \theta. This equality underpins the asymptotic efficiency of the MLE. Under the same regularity conditions, the MLE exhibits asymptotic normality, converging in distribution to a normal random variable centered at the true parameter with variance given by the inverse Fisher information matrix.
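These definitions can be checked numerically. A minimal sketch (assuming NumPy and SciPy are available; the exponential model and parameter values are illustrative) maximizes the log-likelihood directly and compares the result to the closed-form root of the score equation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
lam_true = 2.0
# Exponential model with density f(y | lam) = lam * exp(-lam * y).
y = rng.exponential(1.0 / lam_true, 50000)

def negloglik(lam):
    # Negative log-likelihood: -(n log lam - lam * sum(y)).
    return -(len(y) * np.log(lam) - lam * y.sum())

# Numerical MLE via one-dimensional bounded minimization.
res = minimize_scalar(negloglik, bounds=(1e-6, 50.0), method="bounded")
lam_hat = res.x

# The score equation n / lam - sum(y) = 0 has the closed-form root 1 / mean(y).
lam_closed = 1.0 / y.mean()
```

The numerical maximizer agrees with the analytic solution of s(\hat{\theta}) = 0, and both approach the true rate parameter as the sample grows.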

Likelihood Misspecification

Likelihood misspecification arises when the parametric family of probability distributions specified for the data does not encompass the true data-generating process, such that the assumed density f(y \mid \theta) deviates from the true density g(y). In this scenario, the maximum likelihood estimator fails to converge to the true parameter values, resulting in inconsistency. The consequences of such misspecification are profound, including biased estimates that do not recover the underlying true values, invalid standard errors that undermine hypothesis testing and confidence intervals, and the violation of the information matrix equality, where the expected outer product of the score differs from the negative expected Hessian. A prevalent form of misspecification in mixed-effects models involves assuming normally distributed residuals when the true error distribution exhibits heavier tails, such as the Student-t distribution; this can severely distort inferences about random effects and variance components. Despite these issues, misspecification does not necessarily destroy all structural properties of the likelihood; in particular, the score function may retain a zero expected value and finite variance at a pseudo-true parameter value, laying the groundwork for quasi-maximum likelihood approaches that exploit these moments for robust estimation.
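The failure of the information matrix equality under a misspecified likelihood can be seen in a small simulation (an illustrative setup, assuming NumPy is available): a Gaussian model fitted to Student-t data leaves the mean estimate well-behaved, but the score variance and negative expected Hessian for the variance parameter no longer agree.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
# True data: Student-t with 10 degrees of freedom (heavier tails than normal);
# assumed model: Gaussian N(mu, v).
y = rng.standard_t(10, n)

mu_hat = y.mean()  # Gaussian QMLE of the mean
v_hat = y.var()    # Gaussian QMLE of the variance

# Per-observation score of the Gaussian log density with respect to v,
# evaluated at the fitted parameters: -1/(2v) + (y - mu)^2 / (2 v^2).
score_v = -1.0 / (2 * v_hat) + (y - mu_hat) ** 2 / (2 * v_hat**2)

B = score_v.var()         # outer-product (variance) side of the equality
A = 1.0 / (2 * v_hat**2)  # minus the expected Hessian for the v-component

# Under a correctly specified Gaussian model B equals A; with t(10) data
# the excess kurtosis pushes the ratio B / A to about 1.5.
```

The mean component stays near zero (the pseudo-true mean coincides with the true mean), while B/A visibly departs from 1, exactly the breakdown the sandwich variance is designed to absorb.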

Formal Definition

Quasi-Likelihood Function

The quasi-likelihood function in the context of quasi-maximum likelihood estimation is the log-likelihood constructed from an assumed parametric probability density f(y \mid \theta), which may not match the true data-generating distribution g(y \mid \theta_0). It is defined as Q(\theta; y) = \frac{1}{n} \sum_{i=1}^n \log f(y_i \mid \theta), where the average is over n independent observations. This function is maximized to obtain parameter estimates that best approximate the data under the assumed model, even under misspecification. The resulting estimator converges to the pseudo-true parameter \theta^* that minimizes the expected Kullback-Leibler divergence \mathbb{E}_g [\log g(y \mid \theta_0) - \log f(y \mid \theta)]. A key property is that the expected score under the true distribution vanishes at \theta^*: \mathbb{E}_g \left[ \frac{\partial}{\partial \theta} \log f(y \mid \theta) \right]_{\theta = \theta^*} = 0, provided suitable regularity conditions hold, such as compactness of the parameter space and differentiability of the log density. This ensures consistency of the estimator without requiring correct specification of f. The score contribution of observation i is s_i(\theta) = \frac{\partial}{\partial \theta} \log f(y_i \mid \theta); for specific assumed densities (e.g., a Gaussian working density in regression) it simplifies to familiar residual-based forms, but the framework applies broadly to nonlinear models. In contrast to full maximum likelihood, which requires the assumed f to be correctly specified for optimal efficiency, this approach yields consistent estimates as long as the pseudo-true parameter is well-defined, making it robust to distributional misspecification while retaining the computational advantages of likelihood optimization.
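To make the pseudo-true parameter concrete, suppose the data are Gamma-distributed but an exponential density is assumed (an illustrative pairing, assuming NumPy is available). Minimizing the KL divergence over the exponential family gives \lambda^* = 1 / E[y], and the QMLE has the matching closed form 1 / \bar{y}:

```python
import numpy as np

rng = np.random.default_rng(3)
# True data-generating process: Gamma(shape=2, scale=1), so E[y] = 2.
y = rng.gamma(2.0, 1.0, 100000)

# Assumed (misspecified) model: Exponential(lam) with log density
# log lam - lam * y.  Maximizing Q(lam) = mean(log lam - lam * y_i)
# gives the closed-form QMLE lam_hat = 1 / mean(y).
lam_hat = 1.0 / y.mean()

# The pseudo-true value minimizing KL(true || assumed) is
# lam* = 1 / E[y] = 0.5, so lam_hat should be close to 0.5.
# The average score 1/lam - y vanishes at the QMLE by construction,
# mirroring the population condition E_g[d log f / d lam] = 0 at lam*.
avg_score = (1.0 / lam_hat - y).mean()
```

Even though no exponential distribution can reproduce a Gamma(2, 1) sample, the QMLE still converges to the well-defined KL minimizer rather than wandering.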

Quasi-Maximum Likelihood

The quasi-maximum likelihood estimator (QMLE), denoted \hat{\theta}_{\mathrm{QML}}, is defined as \hat{\theta}_{\mathrm{QML}} = \arg\max_{\theta} Q(\theta; y), where Q(\theta; y) is the quasi-log-likelihood function based on an assumed but potentially misspecified density f(y_i \mid \theta). This estimator, introduced by White (1982) in the context of misspecified parametric models, seeks the parameter values that best approximate the data under the chosen density, even when the true data-generating process differs from the assumed one. To obtain the QMLE, the optimization problem is typically solved by setting the score equations to zero: \sum_{i=1}^n s_i(\theta) = 0, where s_i(\theta) = \frac{\partial}{\partial \theta} \log f(y_i \mid \theta) represents the individual score contributions. When a closed-form solution is unavailable, which is common for nonlinear models, numerical methods such as the Newton-Raphson algorithm are applied iteratively to converge to the maximum, relying on approximations of the Hessian for updates. These procedures ensure computational feasibility while maintaining the estimator's properties under misspecification. For inference, standard errors are computed using the sandwich variance estimator, which accounts for potential misspecification in the assumed density: \hat{V}(\hat{\theta}_{\mathrm{QML}}) = \hat{A}^{-1} \hat{B} \hat{A}^{-1}, where \hat{A} = -\frac{1}{n} \sum_{i=1}^n \frac{\partial^2 \log f(y_i \mid \hat{\theta}_{\mathrm{QML}})}{\partial \theta \, \partial \theta^T} is the average negative Hessian (approximating the expected information matrix), and \hat{B} = \frac{1}{n} \sum_{i=1}^n s_i(\hat{\theta}_{\mathrm{QML}}) s_i(\hat{\theta}_{\mathrm{QML}})^T is the score outer-product matrix.
This robust form, derived from the asymptotic structure under misspecification, provides consistent estimates of the estimator's variability without assuming the model is correctly specified. In practice, QMLE is frequently implemented in statistical software such as R and Stata, often assuming a Gaussian quasi-likelihood for its analytical simplicity and ease of optimization in models like autoregressive conditional heteroskedasticity or nonlinear regressions. For instance, R's sandwich package computes the corresponding robust covariance matrices, while in Stata the vce(robust) option facilitates quasi-maximum likelihood fitting.
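A hand-rolled version of this sandwich estimator for the leading special case, OLS viewed as a Gaussian QMLE with heteroskedastic errors, might look as follows (an illustrative simulation assuming NumPy, not tied to any particular package):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 0.5])
# Error variance grows steeply with x, so the homoskedastic Gaussian
# working likelihood is misspecified.
e = rng.normal(0.0, 1.0, n) * x**2
y = X @ beta_true + e

# OLS coefficients = Gaussian QMLE of the conditional mean.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Sandwich pieces: A_hat = average negative Hessian, B_hat = average outer
# product of per-observation scores x_i * resid_i (the 1/sigma^2 factor in
# the Gaussian score cancels in the sandwich, so it is dropped).
A_hat = X.T @ X / n
Xu = X * resid[:, None]
B_hat = Xu.T @ Xu / n
A_inv = np.linalg.inv(A_hat)
V_sandwich = A_inv @ B_hat @ A_inv / n
se_robust = np.sqrt(np.diag(V_sandwich))

# Classical standard errors (valid only under the information matrix
# equality) for comparison.
s2 = resid @ resid / (n - 2)
se_classical = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
```

With this design the robust slope standard error is noticeably larger than the classical one, illustrating how the classical formula understates uncertainty when the variance model is wrong.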

Asymptotic Properties

Consistency

The consistency of the quasi-maximum likelihood estimator (QMLE), denoted \hat{\theta}_{QML}, refers to its convergence in probability to a pseudo-true parameter \theta_0 as the sample size n \to \infty, even when the assumed density is misspecified relative to the true data-generating process. Under the general framework of maximum likelihood under misspecification, \theta_0 is defined as the value that minimizes the Kullback-Leibler (KL) divergence between the true density g(y|x) and the assumed quasi-density f(y|x; \theta), ensuring the estimator targets the best approximation to the true model in an information-theoretic sense. A key theorem establishes that if the conditional mean is correctly specified, i.e., E[y|x] = \mu(x; \theta) holds at the true parameter, then \hat{\theta}_{QML} \to_p \theta_0 as n \to \infty, where \theta_0 coincides with the true parameter value that achieves this mean specification. This result holds because the KL minimizer \theta_0 aligns with the true parameter when the mean function is well-specified, regardless of misspecification in higher-order moments such as the conditional variance.

The proof sketch relies on the uniform law of large numbers (ULLN) applied to the average quasi-log-likelihood (1/n) \sum_{i=1}^n \log f(y_i | x_i; \theta), which converges to its expectation E_g[\log f(y|x; \theta)], where the expectation is taken under the true density g. This expected quasi-log-likelihood is maximized uniquely at \theta_0, and under standard regularity conditions the maximizer of the sample average, \hat{\theta}_{QML}, thus converges in probability to \theta_0. Conditions for this ULLN include i.i.d. observations (or suitable mixing conditions for dependent data), bounded moments to ensure integrability of the log-quasi-likelihood, and identification, meaning \theta_0 is the unique minimizer of the KL divergence within the parameter space.
This consistency property is particularly robust in contexts like generalized linear models (GLMs), where the QMLE remains consistent for the parameters governing the conditional mean even if the variance structure is misspecified, provided the link function correctly captures the mean relationship. For instance, assuming a Gaussian likelihood in a heteroskedastic regression setting yields consistent estimates of the mean parameters despite variance misspecification.
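This robustness is easy to demonstrate for a Poisson quasi-likelihood with a correctly specified log link but overdispersed (negative binomial) counts. The following sketch uses illustrative parameter choices and assumes NumPy and SciPy are available:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 40_000
x = rng.uniform(-1.0, 1.0, n)
b0, b1 = 0.5, 1.0
mu = np.exp(b0 + b1 * x)  # correctly specified conditional mean (log link)

# Overdispersed counts: negative binomial with mean mu and variance
# mu + mu^2 / r, so the Poisson variance assumption is wrong.
r = 2.0
y = rng.negative_binomial(r, r / (r + mu))

def neg_quasi_loglik(beta):
    # Average Poisson quasi-log-likelihood (constant log y! term dropped).
    eta = beta[0] + beta[1] * x
    return -np.mean(y * eta - np.exp(eta))

def neg_score(beta):
    # Gradient: minus the average per-observation score contributions.
    eta = beta[0] + beta[1] * x
    u = y - np.exp(eta)
    return -np.array([u.mean(), (x * u).mean()])

res = minimize(neg_quasi_loglik, np.zeros(2), jac=neg_score, method="BFGS")
beta_hat = res.x  # consistent for (b0, b1) despite the wrong variance model
```

The fitted coefficients recover the mean parameters even though no Poisson model fits these counts; only the standard errors need the sandwich correction.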

Asymptotic Normality

Under misspecification, the quasi-maximum likelihood estimator (QMLE) \hat{\theta}_n converges at rate \sqrt{n}, with \sqrt{n} (\hat{\theta}_n - \theta_0) \to_d N(0, A^{-1} B A^{-1}), where A is the negative expected Hessian of the quasi-log-likelihood at \theta_0 and B is the covariance of the score, recovering the sandwich form used for robust standard errors.