Scoring algorithm
from Wikipedia

Scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used in statistics to solve maximum likelihood equations numerically, named after Ronald Fisher.

Sketch of derivation


Let $Y_1, \ldots, Y_n$ be random variables, independent and identically distributed with twice differentiable p.d.f. $f(y; \theta)$, and we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function, $V(\theta)$, about $\theta_0$:

$V(\theta) \approx V(\theta_0) - J(\theta_0)(\theta - \theta_0),$

where

$J(\theta_0) = -\sum_{i=1}^n \left. \nabla\nabla^{\top} \log f(Y_i; \theta) \right|_{\theta = \theta_0}$

is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$ and rearranging gives us:

$\theta^* \approx \theta_0 + J^{-1}(\theta_0)\, V(\theta_0).$

We therefore use the algorithm

$\theta_{m+1} = \theta_m + J^{-1}(\theta_m)\, V(\theta_m),$

and under certain regularity conditions, it can be shown that $\theta_m \to \theta^*$.
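A minimal Python sketch of this iteration, assuming an i.i.d. Poisson(λ) sample chosen purely for illustration (it is not part of the original text): for that model the score is $V(\lambda) = \sum_i x_i/\lambda - n$ and the observed information is $J(\lambda) = \sum_i x_i/\lambda^2$.

```python
import numpy as np

# Illustrative Poisson(lam) example (an assumption, not from the article).
# Log-likelihood up to a constant: l(lam) = sum(x) * log(lam) - n * lam
# Score:                V(lam) = sum(x) / lam - n
# Observed information: J(lam) = sum(x) / lam**2

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=200)
n, s = x.size, x.sum()

def score(lam):
    return s / lam - n

def observed_info(lam):
    return s / lam**2

lam = 1.0  # arbitrary starting point theta_0
for _ in range(25):
    step = score(lam) / observed_info(lam)  # J^{-1}(theta_m) V(theta_m), scalar case
    lam += step
    if abs(step) < 1e-10:
        break

print(lam, x.mean())  # the iteration converges to the closed-form MLE, the sample mean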

Fisher scoring


In practice, $J(\theta)$ is usually replaced by $I(\theta) = \mathrm{E}[J(\theta)]$, the Fisher information, thus giving us the Fisher Scoring Algorithm:

$\theta_{m+1} = \theta_m + I^{-1}(\theta_m)\, V(\theta_m).$

Under some regularity conditions, if $\theta_0$ is a consistent estimator, then $\theta_1$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum-likelihood estimate.[2]
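For the same assumed Poisson example used in the sketch above, the expected information is $I(\lambda) = \mathrm{E}[J(\lambda)] = n/\lambda$, so only the denominator of the step changes. A minimal, self-contained sketch follows; for this particular model $I^{-1}(\lambda)V(\lambda) = \bar{x} - \lambda$, so the very first step already lands on the MLE $\bar{x}$.

```python
import numpy as np

# Fisher scoring for the illustrative Poisson(lam) example: the expected
# information I(lam) = n / lam replaces the observed J(lam) = sum(x) / lam**2.
rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=200)
n, s = x.size, x.sum()

lam = 1.0  # starting point theta_0
for _ in range(50):
    step = (s / lam - n) / (n / lam)  # I^{-1}(lam) * V(lam) = xbar - lam
    lam += step
    if abs(step) < 1e-12:
        break

print(lam, x.mean())  # for this model the first step already equals the MLE
```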

from Grokipedia
A scoring algorithm, also known as Fisher's scoring method, is an iterative numerical optimization technique in statistics designed to solve the score equations for obtaining maximum likelihood estimates (MLEs) of model parameters. It functions as a specialized variant of the Newton-Raphson method, where the update step replaces the observed information matrix (the negative Hessian of the log-likelihood) with its expected value, known as the Fisher information matrix, which simplifies computations and enhances numerical stability. Developed by the statistician Ronald A. Fisher in the early twentieth century, the method emerged as part of his foundational work on maximum likelihood estimation, with initial presentations of the numerical procedure appearing around 1912 and further refinements detailed in publications through 1922. Fisher's contributions integrated the scoring approach into broader advancements in likelihood theory, emphasizing its role in handling complex probabilistic models where direct analytical solutions are infeasible. The algorithm's key advantages include faster convergence in many scenarios compared to the full Newton-Raphson method, due to the positive definiteness of the expected information matrix, which avoids issues with non-positive definite observed Hessians, and its equivalence to iteratively reweighted least squares (IRLS) for generalized linear models (GLMs) with canonical links. It finds widespread application in fitting GLMs, such as logistic and Poisson regression, as well as in more advanced frameworks such as mixed-effects models, where reliable MLE computation is essential for inference and prediction.
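The IRLS equivalence can be made concrete with a minimal sketch, assuming a logistic regression model with simulated data (the data-generating choices, dimensions, and variable names below are illustrative assumptions): with the canonical logit link, the score is $X^{\top}(y - \mu)$ and the expected information is $X^{\top} W X$ with $W = \operatorname{diag}(\mu_i(1 - \mu_i))$, so each Fisher scoring step amounts to a weighted least-squares solve.

```python
import numpy as np

# Illustrative sketch (assumed setup): Fisher scoring / IRLS for logistic regression.
# With the canonical logit link, score V(beta) = X.T @ (y - mu) and the expected
# information equals X.T @ W @ X with W = diag(mu * (1 - mu)).

rng = np.random.default_rng(1)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-0.5, 1.0, 2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

beta = np.zeros(p)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
    W = mu * (1.0 - mu)                          # IRLS weights (diagonal of W)
    score = X.T @ (y - mu)                       # V(beta)
    fisher_info = X.T @ (W[:, None] * X)         # I(beta) = X^T W X
    step = np.linalg.solve(fisher_info, score)   # I^{-1} V, a weighted LS solve
    beta += step
    if np.max(np.abs(step)) < 1e-8:
        break

print(beta)  # close to beta_true for a sample of this size
```

For the canonical link the observed and expected information coincide, which is why this Fisher scoring loop is identical to Newton-Raphson for logistic regression.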

Background

Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a fundamental method in statistical inference for estimating the parameters of a probabilistic model by selecting the values that make the observed data most probable under that model. Given a sample of independent observations $\mathbf{x} = (x_1, \dots, x_n)$ from a distribution parameterized by $\theta$, MLE seeks to maximize the likelihood function $L(\theta \mid \mathbf{x}) = \prod_{i=1}^n f(x_i \mid \theta)$, where $f(\cdot \mid \theta)$ is the probability density or mass function. Equivalently, maximization of the log-likelihood $l(\theta \mid \mathbf{x}) = \log L(\theta \mid \mathbf{x}) = \sum_{i=1}^n \log f(x_i \mid \theta)$ is often performed, as it transforms the product into a sum, simplifying optimization while preserving the location of maxima. The likelihood principle underlies MLE, asserting that all evidential content in the data regarding the unknown parameters $\theta$ is encapsulated within the likelihood function, thereby directing parametric inference toward comparisons of relative plausibility across parameter values rather than absolute probabilities. This approach emphasizes the data's role in updating beliefs about $\theta$ solely through how well different parameter configurations explain the observations, forming a cornerstone of frequentist parametric statistics. MLE was developed by Ronald A. Fisher in the early 1920s, with its formal introduction in his seminal 1922 paper "On the Mathematical Foundations of Theoretical Statistics," where he established it as a general method for parameter estimation in parametric models. A classic example arises in estimating the mean $\mu$ and variance $\sigma^2$ of a normal distribution based on an i.i.d. sample $x_1, \dots, x_n$. The log-likelihood function is

$l(\mu, \sigma^2 \mid \mathbf{x}) = -\frac{n}{2} \log (2\pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2.$

Maximizing $l(\mu, \sigma^2 \mid \mathbf{x})$ produces the estimators $\hat{\mu} = \bar{x} = n^{-1} \sum_{i=1}^n x_i$ and $\hat{\sigma}^2 = n^{-1} \sum_{i=1}^n (x_i - \bar{x})^2$, which coincide with the familiar sample mean and (biased) sample variance.
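A minimal numerical check of this example, assuming a simulated normal sample and a generic optimizer (scipy.optimize.minimize) rather than any method specific to this article: maximizing the log-likelihood numerically should recover the closed-form estimators.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative check of the normal-distribution example: numerically maximizing
# l(mu, sigma^2 | x) should recover the sample mean and the biased sample variance.

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.5, size=1000)
n = x.size

def neg_log_lik(params):
    mu, log_sigma2 = params            # optimize log(sigma^2) to keep sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, x.mean())                          # agree up to optimizer tolerance
print(sigma2_hat, ((x - x.mean()) ** 2).mean())  # biased MLE variance, not the n-1 version
```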

Score Function and Fisher Information

In statistics, the score function, often denoted as $V(\theta)$ or $\dot{l}(\theta)$, is defined as the gradient of the log-likelihood function $l(\theta)$ with respect to the parameter vector $\theta$, given by $V(\theta) = \frac{\partial l(\theta)}{\partial \theta}$. This vector measures the sensitivity of the log-likelihood to changes in $\theta$. Under standard regularity conditions, the expected value of the score function is zero when evaluated at the true parameter value, i.e., $E[V(\theta)] = 0$, which implies that the maximum likelihood estimator (MLE) occurs where the score equals zero. The observed information matrix, denoted $J(\theta)$, is the negative of the Hessian of the log-likelihood, expressed as $J(\theta) = -\frac{\partial^2 l(\theta)}{\partial \theta \partial \theta^\top}$. This matrix captures the local curvature of the log-likelihood surface at a specific $\theta$, providing a data-dependent measure of precision for estimates. The Fisher information matrix $I(\theta)$ is the expected value of the observed information matrix, defined as $I(\theta) = E[J(\theta)] = -E\left[ \frac{\partial^2 l(\theta)}{\partial \theta \partial \theta^\top} \right]$. It quantifies the amount of information that the observed data carry about the unknown $\theta$, serving as a fundamental measure of information about the parameter in parametric models. Additionally, the variance of the score function equals the Fisher information matrix, $\operatorname{Var}(V(\theta)) = I(\theta)$, highlighting its role in bounding the precision of estimators.
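These identities can be illustrated by a minimal simulation, assuming an i.i.d. Poisson(λ) model (an illustrative choice) for which $V(\lambda) = \sum_i x_i/\lambda - n$ and $I(\lambda) = n/\lambda$: across repeated samples, the score at the true parameter has mean approximately zero and variance approximately equal to the Fisher information.

```python
import numpy as np

# Illustrative simulation (assumed Poisson model): at the true parameter, the score
# has mean zero and variance equal to the Fisher information I(lam) = n / lam.

rng = np.random.default_rng(7)
lam, n, reps = 3.0, 50, 20000

samples = rng.poisson(lam=lam, size=(reps, n))
scores = samples.sum(axis=1) / lam - n   # V(lam) for each simulated data set

print(scores.mean())           # approximately 0  (E[V(lam)] = 0)
print(scores.var(), n / lam)   # approximately equal (Var V = I(lam) = n / lam)
```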

Derivation of the Scoring Algorithm

Taylor Expansion Approach

The Taylor expansion approach provides a foundational derivation for iteratively solving the score equation $V(\theta) = 0$, where $V(\theta)$ denotes the score function, defined as the gradient of the log-likelihood with respect to the parameter $\theta$. The observed information matrix $J(\theta)$, which is the negative Hessian of the log-likelihood, serves as the negative of the Jacobian (derivative) of the score function. To approximate the root, consider a first-order expansion of $V(\theta)$ around an initial estimate $\theta_0$:

$V(\theta) \approx V(\theta_0) + \left. \frac{\partial V(\theta)}{\partial \theta} \right|_{\theta_0} (\theta - \theta_0) = V(\theta_0) - J(\theta_0) (\theta - \theta_0).$
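Completing this step as in the derivation sketch earlier in the article: setting $V(\theta) = 0$ at the stationary point and rearranging gives

$\theta \approx \theta_0 + J(\theta_0)^{-1} V(\theta_0),$

which is iterated as $\theta_{m+1} = \theta_m + J(\theta_m)^{-1} V(\theta_m)$; replacing the observed information $J(\theta_m)$ with its expectation, the Fisher information $I(\theta_m)$, yields the Fisher scoring iteration $\theta_{m+1} = \theta_m + I(\theta_m)^{-1} V(\theta_m)$.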