Relative likelihood
from Wikipedia

In statistics, when selecting a statistical model for given data, the relative likelihood compares the relative plausibilities of different candidate models or of different values of a parameter of a single model.

Relative likelihood of parameter values

Assume that we are given some data x for which we have a statistical model with parameter θ. Suppose that the maximum likelihood estimate for θ is $\hat{\theta}$. Relative plausibilities of other θ values may be found by comparing the likelihoods of those other values with the likelihood of $\hat{\theta}$. The relative likelihood of θ is defined to be[1][2][3][4][5]

$$R(\theta) = \frac{L(\theta \mid x)}{L(\hat{\theta} \mid x)},$$

where $L(\theta \mid x)$ denotes the likelihood function. Thus, the relative likelihood is the likelihood ratio with fixed denominator $L(\hat{\theta} \mid x)$.

The function $\theta \mapsto R(\theta)$ is the relative likelihood function.

Likelihood region

A likelihood region is the set of all values of θ whose relative likelihood is greater than or equal to a given threshold. In terms of percentages, a p% likelihood region for θ is defined to be[1][3][6]

$$\left\{ \theta : R(\theta) \ge \frac{p}{100} \right\}.$$
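As a purely illustrative sketch (not taken from the cited sources), a likelihood region can be computed on a grid directly from this definition. The example below assumes a small hypothetical sample from an exponential model with rate parameter λ and Python with NumPy available.

import numpy as np

# Hypothetical observations from an exponential model with unknown rate lam
x = np.array([0.8, 1.3, 0.4, 2.1, 0.9])
n, s = len(x), x.sum()

def log_lik(lam):
    # Exponential log-likelihood: n*log(lam) - lam*sum(x)
    return n * np.log(lam) - lam * s

lam_hat = n / s                                      # maximum likelihood estimate
grid = np.linspace(0.01, 5.0, 2000)
rel_lik = np.exp(log_lik(grid) - log_lik(lam_hat))   # R(lam), between 0 and 1

p = 14.65                                            # cutoff discussed below
region = grid[rel_lik >= p / 100]                    # the p% likelihood region
print(f"{p}% likelihood region: approx. [{region.min():.2f}, {region.max():.2f}]")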

If θ is a single real parameter, a p% likelihood region will usually comprise an interval of real values. If the region does comprise an interval, then it is called a likelihood interval.[1][3][7]

Likelihood intervals, and more generally likelihood regions, are used for interval estimation within likelihood-based statistics ("likelihoodist" statistics): They are similar to confidence intervals in frequentist statistics and credible intervals in Bayesian statistics. Likelihood intervals are interpreted directly in terms of relative likelihood, not in terms of coverage probability (frequentism) or posterior probability (Bayesianism).

Given a model, likelihood intervals can be compared to confidence intervals. If θ is a single real parameter, then under certain conditions a 14.65% likelihood interval (about 1:7 likelihood) for θ will be the same as a 95% confidence interval (19/20 coverage probability).[1][6] In a slightly different formulation suited to the use of log-likelihoods (see Wilks' theorem), the test statistic is twice the difference in log-likelihoods, and its probability distribution is approximately a chi-squared distribution with degrees of freedom (df) equal to the difference in df between the two models; hence, for a df difference of 1, the $e^{-2}$ likelihood interval is the same as the 0.954 confidence interval.[6][7]
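These correspondences are easy to verify numerically; the following sketch assumes SciPy is available and simply evaluates the chi-squared quantile and tail probability quoted above.

import numpy as np
from scipy.stats import chi2

# Relative-likelihood cutoff matching a 95% confidence level for one parameter:
# R >= exp(-chi2_{0.95}(1) / 2)
print(np.exp(-chi2.ppf(0.95, df=1) / 2))   # ~0.1465, i.e. a 14.65% likelihood interval

# Conversely, the e^{-2} likelihood interval has -2*log(e^{-2}) = 4, so its
# approximate coverage is P(chi2_1 <= 4).
print(chi2.cdf(4, df=1))                   # ~0.954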

Relative likelihood of models

The definition of relative likelihood can be generalized to compare different statistical models. This generalization is based on AIC (Akaike information criterion), or sometimes AICc (Akaike Information Criterion with correction).

Suppose that for some given data we have two statistical models, M1 and M2. Also suppose that AIC(M1) ≤ AIC(M2). Then the relative likelihood of M2 with respect to M1 is defined as[8]

$$\exp\!\left( \frac{\operatorname{AIC}(M_1) - \operatorname{AIC}(M_2)}{2} \right).$$

To see that this is a generalization of the earlier definition, suppose that we have some model M with a (possibly multivariate) parameter θ. Then for any θ, set M2 = M(θ), and also set M1 = M($\hat{\theta}$). The general definition now gives the same result as the earlier definition.
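As a small illustration (with made-up AIC values, not from the source), the model-level relative likelihood can be computed directly from a table of AIC values:

import math

aic = {"M1": 100.0, "M2": 102.4, "M3": 110.0}        # hypothetical AIC values
aic_min = min(aic.values())

relative_likelihood = {m: math.exp((aic_min - a) / 2) for m, a in aic.items()}
for model, r in relative_likelihood.items():
    print(model, round(r, 3))
# M1 -> 1.0 (lowest AIC), M2 -> exp(-1.2) ~ 0.301, M3 -> exp(-5) ~ 0.007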

from Grokipedia
In statistics, relative likelihood refers to the ratio of the likelihood of a specific parameter value to the maximum possible likelihood within a given model, serving as a measure of evidential support or plausibility for that value relative to the best-supported alternative. This concept, formalized through the likelihood function $L(\theta \mid x) = f(x \mid \theta)$, where $f$ is the probability density or mass function and $\theta$ is the parameter, enables direct comparisons without requiring prior probabilities or frequentist error rates. The relative likelihood $R(\theta) = L(\theta \mid x) / L(\hat{\theta} \mid x)$, with $0 \leq R(\theta) \leq 1$, quantifies how much less plausible a parameter value is compared to the maximum likelihood estimate $\hat{\theta}$.

Developed as part of the likelihood paradigm, relative likelihood builds on R. A. Fisher's introduction of the likelihood function in the 1920s and was advanced by A. W. F. Edwards in his 1972 monograph Likelihood, which argued for its use as the foundation of inductive inference over probability-based approaches. Later proponents, including Richard Royall in Statistical Evidence: A Likelihood Paradigm (1997), emphasized its role in quantifying statistical evidence via the Law of Likelihood, which states that data support one hypothesis over another to the extent of their likelihood ratio.

Key applications include model selection, where relative likelihoods assess competing models' fit; parameter inference, via likelihood intervals (e.g., sets where $R(\theta) \geq 1/8$ or $0.15$ for approximate 95% confidence regions); and hypothesis testing, avoiding p-values by focusing on direct evidential comparisons. This approach is particularly valuable in fields like ecology, physics, and machine learning for robust uncertainty quantification without Bayesian priors.

Fundamentals

Likelihood Function

The likelihood function, denoted $L(\theta \mid x)$, represents the joint probability density function (or probability mass function in discrete cases) of the observed data $x$, expressed as a function of the unknown parameter $\theta$. Unlike a probability distribution over $\theta$, it is not normalized such that its integral (or sum) over $\theta$ equals 1; instead, it measures the plausibility of different $\theta$ values given the fixed data $x$. This distinction emphasizes that the likelihood treats the data as given and varies the parameters, reversing the roles in the conditional probability $f(x \mid \theta)$.

For a sample of $n$ independent and identically distributed observations $x = (x_1, \dots, x_n)$, the likelihood takes the product form $L(\theta \mid x) = \prod_{i=1}^n f(x_i \mid \theta)$, where $f(\cdot \mid \theta)$ is the probability density or mass function of each observation under parameter $\theta$. This formulation arises directly from the joint distribution under independence, facilitating computation and maximization.

A key property of the likelihood function is its invariance under reparameterization: if $\phi = g(\theta)$ for a one-to-one transformation $g$, then the likelihood in terms of $\phi$ is $L(\phi \mid x) = L(\theta(\phi) \mid x)$, with no Jacobian adjustment (unlike a density over $\theta$), so the relative ordering of parameter values is preserved under the transformation. Because absolute likelihood values depend on arbitrary scaling and the data's support, inference typically relies on relative likelihoods rather than absolute ones, comparing $L(\theta \mid x)$ across $\theta$ to assess plausibility. The function plays a central role in maximum likelihood estimation (MLE), where the estimator $\hat{\theta}$ maximizes $L(\theta \mid x)$ (or equivalently its logarithm, for convenience), providing a method to select the most data-compatible parameter.

The concept was introduced by Ronald A. Fisher in his 1922 paper "On the Mathematical Foundations of Theoretical Statistics," where he developed it as a foundational tool for inference, shifting focus from inverse probabilities to data-driven parameter assessment.

A simple example is the binomial likelihood for modeling $k$ successes in $n$ independent trials, such as coin flips, with success probability $p$: $L(p \mid k) = \binom{n}{k} p^k (1-p)^{n-k}$. Here, the likelihood peaks at $p = k/n$, illustrating how it quantifies support for different $p$ values based on the observed proportion.
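The binomial example can be reproduced numerically; the sketch below uses hypothetical counts, assumes SciPy is available, and simply confirms that the likelihood is maximized at $p = k/n$.

import numpy as np
from scipy.stats import binom

n, k = 20, 7                                  # hypothetical trials and successes
p_grid = np.linspace(0.001, 0.999, 999)
lik = binom.pmf(k, n, p_grid)                 # L(p | k) = C(n,k) p^k (1-p)^(n-k)

p_hat = p_grid[np.argmax(lik)]
print(p_hat, k / n)                           # both approximately 0.35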

Relative Likelihood Definition

In statistics, the relative likelihood provides a normalized measure of the plausibility of a parameter value given observed data, by comparing it to the most plausible value within the model. Formally, for a parameter $\theta$ and data $x$, the relative likelihood is defined as $R(\theta \mid x) = L(\theta \mid x) / L(\hat{\theta} \mid x)$, where $L(\theta \mid x)$ is the likelihood function and $\hat{\theta}$ is the maximum likelihood estimate (MLE) that maximizes $L$ over the parameter space. This ratio satisfies $0 \leq R(\theta \mid x) \leq 1$, with $R(\hat{\theta} \mid x) = 1$, emphasizing the relative support for $\theta$ without dependence on absolute likelihood scales.

The log-relative likelihood, $l(\theta \mid x) = \log R(\theta \mid x) = \log L(\theta \mid x) - \log L(\hat{\theta} \mid x)$, is often preferred for numerical stability and analysis, as it transforms the multiplicative scale to an additive one. This form facilitates approximations, such as the second-order Taylor expansion around $\hat{\theta}$, which yields a quadratic form resembling a normal approximation for large samples. Computationally, it simplifies evaluations in optimization and inference procedures.

Relative likelihood values near 1 indicate high plausibility for the parameter value; for instance, the region where $R(\theta \mid x) \geq 0.15$ approximates a 95% support interval, corresponding asymptotically to a confidence interval spanning roughly $\pm 1.96$ standard errors around the MLE for scalar parameters. Under standard regularity conditions, the statistic $-2 \log R(\theta \mid x)$ follows an approximate $\chi^2$ distribution with degrees of freedom equal to the dimension of $\theta$ when the true parameter value is $\theta$, supporting likelihood-based tests and intervals.

Unlike a general likelihood ratio, which compares the likelihoods of two distinct hypotheses, $L(\theta_1 \mid x) / L(\theta_2 \mid x)$, the relative likelihood is inherently tied to the model's maximum, offering a unified scale for assessing evidential support within a single framework. This distinction underscores its role in profiling parameter plausibility rather than direct hypothesis contrast.
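To make the definition concrete, the following sketch (hypothetical binomial data, assuming SciPy) evaluates $R(p \mid k)$ on a grid and reads off the likelihood interval implied by the $R \geq 0.15$ cutoff.

import numpy as np
from scipy.stats import binom

n, k = 20, 7                                     # hypothetical data
p_grid = np.linspace(0.001, 0.999, 999)

log_rel = binom.logpmf(k, n, p_grid) - binom.logpmf(k, n, k / n)   # log R(p | k)
rel = np.exp(log_rel)                            # R(p | k), equal to 1 at p = k/n

interval = p_grid[rel >= 0.15]                   # approximate 95% likelihood interval
print(interval.min(), interval.max())            # roughly 0.17 to 0.57 for these data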

Parameter Values

Relative Likelihood for Parameters

In likelihood-based inference for parameters within a single model, the relative likelihood $R(\theta \mid x) = L(\theta \mid x) / L(\hat{\theta} \mid x)$ serves as a direct measure of the evidential support for a specific value $\theta$ relative to the maximum likelihood estimate (MLE) $\hat{\theta}$, where $R(\hat{\theta} \mid x) = 1$ by definition. This quantifies the degree to which the observed data $x$ are less plausible under $\theta$ than under $\hat{\theta}$, providing a scale-invariant assessment of parameter plausibility that avoids assumptions of normality or other asymptotic approximations. Unlike standard errors, which depend on the curvature of the log-likelihood at the MLE, relative likelihood offers a global view of the likelihood surface, enabling visualization of parameter plausibility through plots of $R(\theta \mid x)$ against $\theta$. This approach aligns with the likelihood principle, concentrating all inferential information in the likelihood function itself.

For multiparameter models, where interest lies in a subset of parameters $\theta_j$ (parameters of interest) amid nuisance parameters $\nu$, the profile relative likelihood addresses the nuisance-parameter issue by maximizing the likelihood over $\nu$ for fixed $\theta_j$. Formally, $R(\theta_j \mid x) = \left[ \sup_{\nu} L(\theta_j, \nu \mid x) \right] / L(\hat{\theta}, \hat{\nu} \mid x)$, which concentrates the likelihood to focus inference on $\theta_j$ while marginalizing the impact of $\nu$. This profiling technique preserves the shape of the likelihood for $\theta_j$ and is essential in practical applications, such as generalized linear models, where nuisance parameters like dispersion must be accounted for without distorting the assessment of key effects. Adjustments to the profile likelihood, such as the Cox-Reid correction, further refine it by subtracting half the log-determinant of the observed information matrix for $\nu$, improving accuracy in small samples.

Likelihood intervals based on relative likelihood provide an alternative to confidence intervals by delineating the range of $\theta$ values deemed sufficiently plausible. The set $\{\theta : R(\theta \mid x) \geq 0.15\}$ (equivalently, where the log-likelihood drops by at most 1.92 units from its maximum) roughly corresponds to a 95% likelihood interval; via $-2 \log R$, this threshold asymptotically aligns with the 95% quantile of a $\chi^2_1$ distribution under Wilks' theorem, though it differs from Wald intervals by not relying on local curvature or normality. These intervals are typically more conservative and data-dependent than frequentist intervals, emphasizing evidential support over long-run coverage properties, and they perform well even in non-normal settings.

Evaluating relative likelihood often involves numerical computation, such as gridding over plausible $\theta$ values or using optimization algorithms like Newton-Raphson to locate the MLE and profile maxima. Software implementations in R or Python facilitate this via built-in maximizers, but challenges arise with multimodal likelihood surfaces, where local optima may mislead inference and require global optimization techniques or multiple starting points to ensure reliable profiling. The expectation-maximization (EM) algorithm proves useful for models with latent variables, iteratively handling incomplete data to converge on a maximum of the likelihood.

A concrete example occurs with $n$ independent observations from a $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known, yielding sample mean $\bar{x}$. Here, the MLE is $\hat{\mu} = \bar{x}$, and the relative likelihood simplifies to $R(\mu \mid x) = \exp\left[ -\frac{n (\mu - \bar{x})^2}{2 \sigma^2} \right]$, demonstrating a symmetric decline from 1 at $\mu = \bar{x}$ (parabolic on the log scale), with the rate of drop-off governed by the sample size $n$ and the precision $1/\sigma^2$. This form underscores the quadratic nature of the log-likelihood near the MLE, making it straightforward to compute intervals such as $\bar{x} \pm \sqrt{2 \sigma^2 / n}$ (the $e^{-1}$ likelihood interval).
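A quick numerical check of this closed form (hypothetical simulated data, assuming NumPy and SciPy) is sketched below: it compares the direct likelihood-ratio computation with the formula and prints the $e^{-1}$ likelihood interval $\bar{x} \pm \sqrt{2\sigma^2/n}$.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = 2.0                                    # known standard deviation
x = rng.normal(loc=5.0, scale=sigma, size=30)  # hypothetical sample
n, xbar = len(x), x.mean()

def log_lik(mu):
    # Normal log-likelihood with known sigma, summed over the sample
    return norm.logpdf(x, loc=mu, scale=sigma).sum()

mu_grid = np.linspace(xbar - 2, xbar + 2, 201)
rel_direct = np.exp([log_lik(m) - log_lik(xbar) for m in mu_grid])
rel_closed = np.exp(-n * (mu_grid - xbar) ** 2 / (2 * sigma ** 2))

print(np.allclose(rel_direct, rel_closed))     # True: the two forms agree
half_width = np.sqrt(2 * sigma ** 2 / n)       # e^{-1} likelihood interval half-width
print(xbar - half_width, xbar + half_width)

The agreement confirms that, for this model, the relative likelihood is fully determined by the sample mean, the sample size, and the known variance.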