Akaike information criterion

The Akaike information criterion (AIC) is an estimator of prediction error and thereby relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.

AIC is founded on information theory. When a statistical model is used to represent the process that generated the data, the representation will almost never be exact; so some information will be lost by using the model to represent the process. AIC estimates the relative amount of information lost by a given model: the less information a model loses, the higher the quality of that model.

In estimating the amount of information lost by a model, AIC deals with the trade-off between the goodness of fit of the model and the simplicity of the model. In other words, AIC deals with both the risk of overfitting and the risk of underfitting.

The Akaike information criterion is named after the Japanese statistician Hirotugu Akaike, who formulated it. It now forms the basis of a paradigm for the foundations of statistics and is also widely used for statistical inference.

Suppose that we have a statistical model of some data. Let $k$ be the number of estimated parameters in the model. Let ${\hat {L}}$ be the maximized value of the likelihood function for the model. Then the AIC value of the model is

$$\mathrm{AIC} = 2k - 2\ln({\hat {L}})$$

Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value. Thus, AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting, which is desired because increasing the number of parameters in the model almost always improves the goodness of the fit.
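The selection rule can be illustrated with a minimal sketch, assuming the standard definition AIC = 2k − 2 ln(L̂) and i.i.d. normally distributed errors whose variance is estimated by maximum likelihood. The data values, the helper names (`gaussian_max_log_likelihood`, `fit_constant`, `fit_line`), and the choice of two candidate models are all hypothetical and serve only as an illustration.

```python
import math


def gaussian_max_log_likelihood(residuals):
    """Maximized log-likelihood for i.i.d. normal errors.

    The error variance is set to its maximum-likelihood estimate, RSS / n.
    """
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(k, log_likelihood):
    """AIC = 2k - 2 ln(L-hat), with k the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood


def fit_constant(ys):
    """Residuals of a constant-mean model."""
    mean = sum(ys) / len(ys)
    return [y - mean for y in ys]


def fit_line(xs, ys):
    """Residuals of an ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data lying close to y = 2x.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8]

# k counts every estimated parameter, including the error variance:
# constant model -> mean + variance (k = 2); line -> intercept + slope + variance (k = 3).
aic_constant = aic(2, gaussian_max_log_likelihood(fit_constant(ys)))
aic_line = aic(3, gaussian_max_log_likelihood(fit_line(xs, ys)))

candidates = {"constant": aic_constant, "line": aic_line}
best_model = min(candidates, key=candidates.get)  # minimum AIC is preferred
print(candidates, best_model)
```

Here the straight line pays a larger penalty (one extra parameter) but fits so much better that its AIC is still the smaller of the two. Only the difference between the AIC values is meaningful; the absolute numbers are not.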

Suppose that the data is generated by some unknown process f. We consider two candidate models to represent f: g₁ and g₂. If we knew f, then we could find the information lost from using g₁ to represent f by calculating the Kullback–Leibler divergence, D_KL(f ‖ g₁); similarly, the information lost from using g₂ to represent f could be found by calculating D_KL(f ‖ g₂). We would then, generally, choose the candidate model that minimized the information loss.
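As a sketch of this comparison in the idealized case where f is known, the following computes the Kullback–Leibler divergence for discrete distributions. The three probability vectors are made-up illustrations, and the function assumes that each candidate assigns positive probability wherever f does.

```python
import math


def kl_divergence(f, g):
    """D_KL(f || g) for discrete distributions given as equal-length lists.

    Terms with f-probability zero contribute nothing; g must be positive
    wherever f is positive.
    """
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0)


f = [0.5, 0.3, 0.2]   # hypothetical "true" process
g1 = [0.4, 0.4, 0.2]  # first candidate model
g2 = [0.8, 0.1, 0.1]  # second candidate model

loss_g1 = kl_divergence(f, g1)
loss_g2 = kl_divergence(f, g2)

# Choose the candidate that loses less information about f.
preferred = "g1" if loss_g1 < loss_g2 else "g2"
print(loss_g1, loss_g2, preferred)
```

In practice f is unknown, so these divergences cannot be computed directly; the example only shows what quantity the comparison is about.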

We cannot choose with certainty, because we do not know f. Akaike (1974) showed, however, that we can estimate, via AIC, how much more (or less) information is lost by g₁ than by g₂. The estimate, though, is only valid asymptotically; if the number of data points is small, then some correction is often necessary (see AICc, below).
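For orientation, one commonly cited form of the small-sample correction is shown here. It is derived under the assumption of a univariate linear model with normally distributed residuals, so it should be read as an illustration and not as a general-purpose formula. The symbol $n$ (the sample size) is introduced for this sketch and does not appear in the passage above.

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2k^{2} + 2k}{n - k - 1}$$

The extra term is positive whenever $n > k + 1$ and shrinks toward zero as $n$ grows with $k$ fixed, so AICc approaches AIC for large samples.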
