Deviance information criterion
The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. DIC is an asymptotic approximation as the sample size becomes large, like AIC. It is only valid when the posterior distribution is approximately multivariate normal.
Definition
Define the deviance as $D(\theta) = -2\log\bigl(p(y \mid \theta)\bigr) + C$, where $y$ are the data, $\theta$ are the unknown parameters of the model and $p(y \mid \theta)$ is the likelihood function. $C$ is a constant that cancels out in all calculations that compare different models, and which therefore does not need to be known.
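To make the definition concrete, the following sketch (the counts and fitted means are made up for illustration) computes the deviance of a Poisson model, taking $C$ to be twice the saturated log-likelihood so that $D$ coincides with the classical GLM deviance:

```python
import math

y = [3, 0, 2, 5, 1]             # hypothetical observed counts
mu = [2.5, 0.8, 2.0, 4.0, 1.5]  # hypothetical fitted means from some model

def pois_loglik(y, mu):
    # Poisson log-likelihood: sum of y_i log(mu_i) - mu_i - log(y_i!)
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

# Saturated model sets mu_i = y_i; the y log y term vanishes at y = 0.
loglik_sat = sum((yi * math.log(yi) if yi > 0 else 0.0) - yi - math.lgamma(yi + 1)
                 for yi in y)

# Deviance with C = 2 * loglik_sat, i.e. D = -2 log(L_model / L_saturated).
deviance = -2.0 * (pois_loglik(y, mu) - loglik_sat)

# The textbook Poisson-deviance formula gives the same number.
formula = 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                    for yi, mi in zip(y, mu))
```

Since the saturated model maximizes the likelihood, the deviance is nonnegative, and the choice of $C$ drops out of any comparison between models fitted to the same data.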
There are two calculations in common usage for the effective number of parameters of the model. The first, as described in Spiegelhalter et al. (2002, p. 587), is $p_D = \overline{D} - D(\bar\theta)$, where $\overline{D}$ is the posterior mean of the deviance and $\bar\theta$ is the expectation of $\theta$. The second, as described in Gelman et al. (2004, p. 182), is $p_V = \tfrac{1}{2}\widehat{\operatorname{var}}\bigl(D(\theta)\bigr)$. The larger the effective number of parameters, the easier it is for the model to fit the data, and so the deviance needs to be penalized.
The deviance information criterion is calculated as

$$\mathrm{DIC} = p_D + \overline{D},$$

or equivalently as

$$\mathrm{DIC} = D(\bar\theta) + 2 p_D.$$
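The equivalence of the two forms is immediate from the definition $p_D = \overline{D} - D(\bar\theta)$:

```latex
\mathrm{DIC} = p_D + \overline{D}
             = p_D + \bigl( D(\bar\theta) + p_D \bigr)
             = D(\bar\theta) + 2\,p_D .
```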
From this latter form, the connection with AIC is more evident.
Motivation
The idea is that models with smaller DIC should be preferred to models with larger DIC. Models are penalized both by the value of $\overline{D}$, which favors a good fit, and (as with AIC) by the effective number of parameters $p_D$. Since $\overline{D}$ will decrease as the number of parameters in a model increases, the $p_D$ term compensates for this effect by favoring models with a smaller number of parameters.
An advantage of DIC over other criteria in the case of Bayesian model selection is that the DIC is easily calculated from the samples generated by a Markov chain Monte Carlo simulation. AIC requires calculating the likelihood at its maximum over $\theta$, which is not readily available from the MCMC simulation. But to calculate DIC, simply compute $\overline{D}$ as the average of $D(\theta)$ over the samples of $\theta$, and $D(\bar\theta)$ as the value of $D$ evaluated at the average of the samples of $\theta$. Then the DIC follows directly from these approximations. Claeskens and Hjort (2008, Ch. 3.5) show that the DIC is large-sample equivalent to the natural model-robust version of the AIC.
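A minimal sketch of this recipe (the model, data, and function names are illustrative, and exact conjugate posterior draws stand in for MCMC output):

```python
import numpy as np

def dic_from_samples(log_lik, theta_samples):
    """Estimate DIC from posterior draws.

    log_lik(theta) returns log p(y | theta); theta_samples has one draw per row.
    """
    deviance = np.array([-2.0 * log_lik(t) for t in theta_samples])
    d_bar = deviance.mean()                                 # average of D over draws
    d_at_mean = -2.0 * log_lik(theta_samples.mean(axis=0))  # D at the draw average
    p_d = d_bar - d_at_mean                                 # effective number of parameters
    return d_bar + p_d, p_d                                 # DIC = D_bar + p_D

# Toy model: y_i ~ Normal(mu, 1) with a flat prior on mu.
rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)

def log_lik(theta):
    return -0.5 * np.sum((y - theta[0]) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

# Stand-in for MCMC output: exact posterior draws, mu | y ~ Normal(mean(y), 1/n).
draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=(4000, 1))

dic, p_d = dic_from_samples(log_lik, draws)  # p_d comes out close to 1
```

Because the model has a single well-identified parameter and a flat prior, the estimated effective number of parameters is close to the nominal count of one.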
Assumptions
In the derivation of DIC, it is assumed that the specified parametric family of probability distributions that generate future observations encompasses the true model. This assumption does not always hold, and it is desirable to consider model assessment procedures in that scenario.
Also, the observed data are used both to construct the posterior distribution and to evaluate the estimated models. Therefore, DIC tends to select over-fitted models.
Extensions
A resolution to the issues above was suggested by Ando (2007), with the proposal of the Bayesian predictive information criterion (BPIC). Ando (2010, Ch. 8) provided a discussion of various Bayesian model selection criteria. To avoid the over-fitting problems of DIC, Ando (2011) developed Bayesian model selection criteria from a predictive viewpoint. The criterion is calculated as

$$\mathit{IC} = \overline{D} + 2 p_D = -2\,\mathrm{E}^{\theta}\bigl[\log p(y \mid \theta)\bigr] + 2 p_D.$$
The first term is a measure of how well the model fits the data, while the second term is a penalty on the model complexity. Note that the p in this expression is the predictive distribution rather than the likelihood above.
Other applications
DIC was used in multiple S-Plus (and subsequently R) libraries for fitting likelihood-based models in the 1990s (predating the Bayesian methods, to the extent the two overlap); it was usually presented as a generalization of AIC. The DIC was defined by Hastie and Tibshirani (1990, p. 160, eqn. 6.32)[1] for the weighted smoothers used in Generalized Additive Models, and the requisite deviance and effective degrees-of-freedom calculations were incorporated into the GAM library (Hastie, 1991).[2]
The aic method in S-Plus and R is credited (initially) to Pinheiro and Bates, developed in conjunction with the nlme software,[3] and subsequently backported to other libraries (some use plain AIC; others require DIC-style approximations).
In the context of local likelihood, a deviance information criterion is defined by Loader (1999, p. 69, defn. 4.4),[4] with a derivation based on jackknifed leave-one-out cross-validation and effective degrees-of-freedom calculations expressed explicitly in terms of likelihood derivatives. This leads to some small technical differences compared to the Hastie and Tibshirani approach.
Irizarry (2001) [5] also has an extensive development of information criteria for local likelihood. Unlike the global techniques in the above sources, the criteria developed by Irizarry are applied when estimating at a single point in the predictor space, and so are applicable to locally adaptive smoothers considered by other authors.
References
[edit]- Ando, Tomohiro (2007). "Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models". Biometrika. 94 (2): 443–458. doi:10.1093/biomet/asm017.
- Ando, T. (2010). Bayesian Model Selection and Statistical Modeling, CRC Press. Chapter 7.
- Ando, Tomohiro (2011). "Predictive Bayesian Model Selection". American Journal of Mathematical and Management Sciences. 31 (1–2): 13–38. doi:10.1080/01966324.2011.10737798. S2CID 123680697.
- Claeskens, G, and Hjort, N.L. (2008). Model Selection and Model Averaging, Cambridge. Section 3.5.
- Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Rubin, Donald B. (2004). Bayesian Data Analysis: Second Edition. Texts in Statistical Science. CRC Press. ISBN 978-1-58488-388-3. LCCN 2003051474. MR 2027492.
- van der Linde, A. (2005). "DIC in variable selection", Statistica Neerlandica, 59: 45-56. doi:10.1111/j.1467-9574.2005.00278.x
- Spiegelhalter, David J.; Best, Nicola G.; Carlin, Bradley P.; van der Linde, Angelika (2002). "Bayesian measures of model complexity and fit (with discussion)". Journal of the Royal Statistical Society, Series B. 64 (4): 583–639. doi:10.1111/1467-9868.00353. JSTOR 3088806. MR 1979380.
- Spiegelhalter, David J.; Best, Nicola G.; Carlin, Bradley P.; van der Linde, Angelika (2014). "The deviance information criterion: 12 years on (with discussion)". Journal of the Royal Statistical Society, Series B. 76 (3): 485–493. doi:10.1111/rssb.12062. S2CID 119742633.
- ^ Trevor Hastie; Robert Tibshirani (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability. Chapman & Hall. ISBN 0-412-34390-8. LCCN 91109621. MR 1082147. OCLC 41137485. OL 4147339W. Zbl 0747.62061. Wikidata Q132471455.
- ^ Trevor Hastie (1991). Generalized Additive Models. pp. 249–307. doi:10.1201/9780203738535-7. ISBN 978-0-534-16765-3. Wikidata Q134623036.
- ^ José C. Pinheiro; Douglas M. Bates (2000). Mixed-Effects Models in S and S-PLUS. Statistics and Computing. Springer New York. doi:10.1007/B98882. ISBN 978-0-387-98957-0. LCCN 99053566. OCLC 42690165. Zbl 0953.62065. Wikidata Q58040120.
- ^ Catherine Loader (1999). Local Regression and Likelihood. Statistics and Computing. Springer Nature. doi:10.1007/B98858. ISBN 978-0-387-98775-0. LCCN 99014732. MR 1704236. OL 14851039W. Zbl 0929.62046. Wikidata Q59410587.
- ^ Rafael Irizarry (2001). "Information and Posterior Probability Criteria for Model Selection in Local Likelihood Estimation". Journal of the American Statistical Association. 96 (453): 303–315. doi:10.1198/016214501750332875. ISSN 0162-1459. JSTOR 2670368. Zbl 1015.62016. Wikidata Q135676816.
External links
- McElreath, Richard (January 29, 2015). "Statistical Rethinking Lecture 8 (on DIC and other information criteria)". Archived from the original on 2021-12-21 – via YouTube.
Overview
Definition
The Deviance Information Criterion (DIC) is a Bayesian model assessment tool designed for hierarchical modeling, which balances goodness of fit against model complexity to facilitate model comparison and selection.[2] Introduced in 2002 by Spiegelhalter et al. as a Bayesian counterpart to the Akaike Information Criterion (AIC), DIC extends frequentist principles to Bayesian frameworks, particularly those involving complex, high-dimensional parameter spaces.[2] At its core, DIC is formulated as

$$\mathrm{DIC} = \overline{D} + p_D,$$

where $\overline{D}$ is the posterior mean deviance, $D(\bar\theta)$ is the deviance evaluated at the posterior mean of the parameters $\theta$, denoted $\bar\theta$, and $p_D = \overline{D} - D(\bar\theta)$ denotes the effective number of parameters that quantifies model complexity.[2] The deviance function itself is defined as

$$D(\theta) = -2\log p(y \mid \theta),$$

measuring the discrepancy between the observed data $y$ and the model predictions under the likelihood $p(y \mid \theta)$.[2] This setup assumes a well-specified likelihood model, allowing DIC to penalize overfitting while rewarding adequate fit to the data.[2] In practice, DIC is estimated using posterior samples from Markov chain Monte Carlo (MCMC) methods, making it computationally accessible for intricate Bayesian models where analytical solutions are infeasible.[2] Lower DIC values indicate superior models, providing a practical metric for decision-making in fields such as epidemiology and ecology.[2]
Historical Development
The Deviance Information Criterion (DIC) was introduced in 2002 by David J. Spiegelhalter, Nicola G. Best, Bradley P. Carlin, and Angelika van der Linde as a Bayesian tool for assessing model fit and complexity in hierarchical models.[2] Their seminal paper, "Bayesian measures of model complexity and fit," published in the Journal of the Royal Statistical Society: Series B (Statistical Methodology), proposed DIC as a method to balance predictive accuracy against model parsimony, leveraging posterior distributions obtained from Markov chain Monte Carlo (MCMC) simulations.[2] This work addressed the need for practical model comparison techniques in Bayesian analysis, where traditional frequentist criteria like the Akaike Information Criterion were less directly applicable.[1]

The emergence of DIC coincided with the expansion of Bayesian computation in the late 1990s and early 2000s, driven by advances in MCMC methods that enabled inference in complex, high-dimensional models previously intractable with analytical approaches. Key developments, such as the Gibbs sampler popularized by Gelfand and Smith in 1990, had sparked a revolution in Bayesian statistics by the mid-1990s, making software implementations like BUGS feasible for routine use. Against this backdrop, DIC filled a critical gap by providing an in-sample estimate of out-of-sample predictive performance, adaptable to the posterior summaries routinely generated by MCMC.[2]

DIC saw rapid adoption following its proposal, with implementation in the WinBUGS software (version 1.4) released in 2003, which facilitated its use in applied Bayesian modeling across fields like epidemiology and ecology.
By integrating DIC computation directly into MCMC monitoring, WinBUGS streamlined model selection workflows, contributing to its widespread uptake in statistical practice.[3] From 2005 to 2010, critiques of DIC began to surface, particularly concerning the reliability of its effective number of parameters in hierarchical and singular models, prompting refinements such as adjusted estimation procedures and alternative criteria like the Widely Applicable Information Criterion (WAIC).[1] These discussions, including formal rebuttals in statistical journals, refined DIC's theoretical foundations and application guidelines without supplanting its core utility.[1] By the end of the decade, DIC had become a standard Bayesian model assessment tool, cited in thousands of studies despite ongoing debates.[1]

Theoretical Background
Motivation
In Bayesian statistical modeling, traditional frequentist criteria such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) often fail to adequately penalize model complexity because they rely on point estimates of parameters rather than full posterior distributions, which are central to Bayesian inference. This limitation becomes particularly acute in hierarchical and nonlinear models, where the effective number of parameters is ambiguous and traditional measures underestimate uncertainty. The deviance information criterion (DIC) was developed to address this gap by providing a Bayesian analog that incorporates the entire posterior, enabling fair comparisons across models with diverse structures.

DIC estimates out-of-sample predictive accuracy by balancing in-sample fit, assessed through the posterior deviance (a measure of how well the model explains observed data, averaged over the posterior distribution), with a penalty for model complexity that reflects the variability in posterior estimates. This approach mitigates overfitting in complex Bayesian models, such as those involving random effects or latent variables, where simpler plug-in estimates might overlook the spread of plausible parameter values. By design, DIC approximates expected predictive loss, offering a practical tool for model selection in scenarios where direct cross-validation is computationally infeasible.

A key advantage of DIC arises in Markov chain Monte Carlo (MCMC)-based inference, where it naturally integrates parameter uncertainty without requiring additional approximations or algebraic derivations beyond the MCMC samples themselves. Unlike frequentist plug-in methods that treat parameters as fixed, DIC leverages the posterior to quantify both fit and complexity, making it especially suitable for modern Bayesian applications in fields like epidemiology and ecology.
This incorporation of uncertainty enhances the reliability of model comparisons in high-dimensional settings.

Relationship to Deviance
The deviance information criterion (DIC) originates from the concept of deviance, a fundamental measure in generalized linear models (GLMs) that quantifies the discrepancy between observed data and model predictions. In classical statistics, deviance generalizes the chi-squared goodness-of-fit statistic to a broader class of models, defined as $D = -2\log\bigl(p(y \mid \theta)/p_s(y)\bigr)$, where $p(y \mid \theta)$ is the model's likelihood and $p_s(y)$ is the saturated likelihood that perfectly fits the data. This formulation, equivalent to $-2\log p(y \mid \theta)$ up to a constant, serves as a measure of relative fit, with lower values indicating better agreement between predictions and observations.

In the Bayesian framework, deviance is adapted to incorporate posterior uncertainty over parameters $\theta$. The posterior expected deviance, $\overline{D} = \mathrm{E}_{\theta \mid y}\bigl[D(\theta)\bigr]$, averages the deviance over the posterior distribution $p(\theta \mid y)$, providing a measure of predictive accuracy that accounts for parameter variability. However, for practical point summaries in model comparison, the deviance evaluated at the posterior mean, $D(\bar\theta)$ with $\bar\theta = \mathrm{E}[\theta \mid y]$, is often preferred, as it offers a focused estimate of fit akin to classical plug-in approaches while remaining computationally feasible via Markov chain Monte Carlo (MCMC) samples. This distinction highlights the Bayesian extension: $\overline{D}$ captures expected loss under posterior averaging, whereas $D(\bar\theta)$ emphasizes a central tendency for interpretability.

The integration of these deviance components enables DIC to approximate the expected predictive deviance for new data, facilitating model selection in complex hierarchical settings. By combining $\overline{D}$ with a penalty term derived from the difference $p_D = \overline{D} - D(\bar\theta)$, DIC balances goodness of fit against model complexity, extending the deviance's role from GLMs to Bayesian inference, where parameter dimensionality is ambiguous. This approximation aligns DIC with information-theoretic criteria, promoting parsimonious models without exhaustive cross-validation.

Formulation
Core Formula
The deviance information criterion (DIC) is defined as the sum of the posterior mean deviance and the effective number of parameters, providing a balance between model fit and complexity in Bayesian model selection.[2] The core formula is given by

$$\mathrm{DIC} = \overline{D} + p_D,$$

where $\overline{D} = \mathrm{E}_{\theta \mid y}\bigl[D(\theta)\bigr]$ represents the posterior mean deviance, capturing the expected deviance over the posterior distribution of the parameters $\theta$ given the data $y$, and $p_D$ denotes the effective number of parameters, which adjusts for model complexity.[2] The deviance $D(\theta) = -2\log p(y \mid \theta)$ itself serves as a measure of discrepancy between the observed data and the model's predicted distribution under parameters $\theta$.[2] This formulation interprets $\overline{D}$ as a term that penalizes inadequate fit, with smaller values indicating better average predictive performance across the posterior, while $p_D$ promotes parsimony by increasing with greater effective model complexity.[2] In practice, for computational ease when using Markov chain Monte Carlo (MCMC) methods to sample from the posterior, DIC is equivalently expressed as

$$\mathrm{DIC} = D(\bar\theta) + 2 p_D,$$

where $\bar\theta$ is the posterior mean of $\theta$, allowing direct evaluation of the deviance at the mean alongside the complexity penalty.[2] This equivalence arises because $\overline{D} = D(\bar\theta) + p_D$, making the two forms identical rather than approximate, though the point-estimate version facilitates simpler implementation from MCMC outputs.[2]

Effective Number of Parameters
The effective number of parameters, denoted $p_D$, is a central component of the deviance information criterion (DIC) that quantifies model complexity in Bayesian hierarchical models by accounting for posterior uncertainty in the deviance. It is formally defined as the difference between the posterior expected deviance $\overline{D}$ and the deviance evaluated at the posterior mean $\bar\theta$ of the parameters $\theta$, yielding the formula

$$p_D = \overline{D} - D(\bar\theta).$$

This measure captures the extent to which posterior variability in the parameters inflates the expected deviance relative to the point estimate at the posterior mean. An equivalent approximation for $p_D$ arises under assumptions of approximate posterior normality and quadratic deviance, where it equals half the posterior variance of the deviance:

$$p_V = \tfrac{1}{2}\operatorname{var}_{\theta \mid y}\bigl(D(\theta)\bigr).$$

Here, the deviance reflects the model's fit, and the variance term adjusts for the uncertainty introduced by the posterior distribution over $\theta$, effectively penalizing models with high variability in their predictive performance. This approximation is particularly useful in practice for estimating $p_D$ from MCMC samples when direct computation of $D(\bar\theta)$ is challenging.

In interpretation, $p_D$ represents the number of "free" or identifiable parameters in the model, adjusted for dependencies induced by hierarchical structure and prior information, rather than simply the nominal dimension of $\theta$. In models with strong regularization through informative priors, $p_D$ typically falls below the nominal parameter count, reflecting reduced effective dimensionality due to shrinkage and borrowing of strength across parameters. Conversely, with vague priors and well-identified parameters, $p_D$ approximates the nominal count; however, it can exceed it in cases of highly non-normal posteriors, indicating additional complexity from multimodality or asymmetry. This adjustment makes $p_D$ a dynamic measure of estimation difficulty, distinct from frequentist degrees of freedom.
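A small numerical sketch of both estimators (the conjugate toy model and all names are illustrative, with exact posterior draws standing in for MCMC output): for a normal-mean model with known unit variance, a vague prior gives $p_D$ near the nominal count of one, while an informative prior shrinks $p_D$ below it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
y = rng.normal(0.0, 1.0, size=n)  # one mean parameter, known unit variance

def log_lik(mu):
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * n * np.log(2 * np.pi)

def p_d_and_p_v(mu_draws):
    dev = np.array([-2.0 * log_lik(m) for m in mu_draws])
    p_d = dev.mean() - (-2.0 * log_lik(mu_draws.mean()))  # D_bar - D(theta_bar)
    p_v = 0.5 * dev.var()                                 # half the deviance variance
    return p_d, p_v

def posterior_draws(prior_var, size=20000):
    # Conjugate posterior for mu under a Normal(0, prior_var) prior.
    post_var = 1.0 / (n + 1.0 / prior_var)
    return rng.normal(post_var * n * y.mean(), np.sqrt(post_var), size=size)

pd_vague, pv_vague = p_d_and_p_v(posterior_draws(prior_var=1e6))
pd_strong, pv_strong = p_d_and_p_v(posterior_draws(prior_var=1.0 / n))
# pd_vague is close to 1; the informative prior shrinks pd_strong toward 0.5.
```

The shrinkage is exactly what the interpretation above predicts: the informative prior halves the posterior variance of the mean, and $p_D$ halves with it.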
For illustration, in a simple Bayesian linear regression model with vague priors, $p_D$ closely matches the nominal number of regression coefficients plus the intercept, as posterior uncertainty aligns with independent parameter estimation. In contrast, overparameterized hierarchical models, such as random-effects ANOVA, yield $p_D$ less than the total number of variance components, often computed as the sum of intraclass correlation coefficients across groups, thereby penalizing excess complexity while rewarding parsimonious effective structure.

Properties and Assumptions
Key Assumptions
The Deviance Information Criterion (DIC) relies on several key statistical and modeling assumptions to ensure its reliable application and interpretation in Bayesian model selection.

A fundamental requirement is the differentiability of the log-likelihood function with respect to the model parameters. This smoothness condition enables the use of Taylor series expansions around the posterior mean or mode, which underpin the approximation of the effective number of parameters, $p_D$, as the difference between the expected deviance and the deviance at the posterior mean. Without sufficient differentiability (typically at least twice for basic approximations, but higher orders for more precise asymptotic justifications), these expansions may fail, leading to inaccurate estimates of model complexity.[4][5]

Another critical assumption is that the posterior distribution is dominated by a single mode, approximating a unimodal or nearly normal shape. DIC's formulation, particularly the computation of $p_D$, assumes the posterior concentrates around one primary mode, allowing for valid normal approximations via methods like Laplace's approximation. In cases of multi-modal posteriors, such as those arising from complex hierarchical models or conflicting data, DIC may inadequately capture model complexity by averaging over modes, potentially underestimating $p_D$ and favoring overly simple models. This unimodality is often implicitly assumed in the heuristic derivations linking DIC to predictive performance.[4]

Finally, DIC assumes proper prior specifications that do not unduly dominate the posterior distribution. The criterion is designed to reflect data-driven model fit and complexity, so priors should be weakly informative or asymptotically negligible, ensuring that the posterior is primarily shaped by the likelihood.
If priors are too influential, such as strongly informative ones, they can distort $p_D$ by reducing the effective number of parameters below the true dimensionality, as the prior effectively constrains the parameter space. This assumption aligns with DIC's goal of generalizing frequentist criteria like AIC in a Bayesian context, where prior effects are minimized for large samples.[4][5]

Theoretical Properties
Under regularity conditions, such as compactness of the parameter space, strong mixing of the data, and asymptotic normality of the posterior distribution, the deviance information criterion (DIC) provides an asymptotically unbiased estimator of the expected Kullback-Leibler divergence between the data-generating process and the model's plug-in predictive distribution. However, unlike consistent criteria such as the Bayesian information criterion (BIC), DIC is not asymptotically consistent for model selection and does not select the true model with probability approaching 1 as the sample size $n \to \infty$. This behavior aligns more closely with that of the Akaike information criterion (AIC) but incorporates Bayesian priors to account for model uncertainty.[5][1]

DIC achieves a bias-variance tradeoff by approximating the expected log pointwise predictive density (lppd), which measures out-of-sample predictive performance. The criterion balances model fit, captured by the posterior mean of the deviance, against complexity, penalized by the effective number of parameters $p_D$ that reflects variance due to parameter uncertainty. This approximation arises from using posterior expectations to estimate predictive accuracy, thereby avoiding overfitting while incorporating the full posterior distribution rather than point estimates.[4]

DIC demonstrates invariance to reparameterization under certain conditions, such as when employing posterior medians or intrinsic estimators in specific models like binomial or Poisson distributions. However, sensitivity to parameterization has been widely noted in the literature following its introduction: changes in parameter scaling can lead to substantial variations in $p_D$ and thus in DIC values, potentially affecting model rankings.[4]

Comparisons and Limitations
Comparison to Other Criteria
The Deviance Information Criterion (DIC) serves as a Bayesian analog to the Akaike Information Criterion (AIC), both aiming to balance model fit and complexity, but differing fundamentally in their estimation approaches.[1] Whereas AIC relies on maximum likelihood estimates (MLE) with a fixed penalty of twice the number of parameters ($2k$), DIC incorporates posterior uncertainty through the effective number of parameters $p_D$, making it particularly suitable for Bayesian analyses using Markov chain Monte Carlo (MCMC) methods. This posterior averaging in DIC often leads to similar model rankings as AIC in large samples, but it can perform better in small samples, where informative priors regularize estimates.[1]

In contrast to the Bayesian Information Criterion (BIC), which imposes a stronger penalty of $k \log n$ (growing with sample size $n$) to favor parsimony under frequentist assumptions, DIC's penalty remains on the order of $2 p_D$, akin to AIC, and thus tends to select more complex models.[1] This philosophical difference reflects BIC's asymptotic consistency in identifying the true model as $n \to \infty$, while DIC prioritizes predictive accuracy across the posterior in finite samples.
Empirical simulations in statistical catch-at-age models (Wilberg and Bence, 2008) illustrate DIC's effectiveness in selecting appropriate hierarchical Bayesian models for fisheries assessment, supporting its use for complex structured data over criteria with fixed penalties like BIC.[6]

The Widely Applicable Information Criterion (WAIC), developed by Watanabe around 2010 and elaborated in subsequent works, is a more recent Bayesian alternative to DIC that targets an unbiased estimate of the expected log pointwise predictive density (ELPD).[7] Unlike DIC, which can suffer from in-sample bias due to its reliance on the posterior mean of the deviance, WAIC penalizes fit with the pointwise posterior variance of the log-likelihood, better approximating out-of-sample performance; it is closely related to leave-one-out cross-validation, which can itself be approximated from posterior samples via Pareto-smoothed importance sampling (PSIS). This makes WAIC computationally similar to DIC, as both require only posterior samples from MCMC, but more robust in singular or hierarchical models, as validated in comparative studies showing WAIC's closer alignment with cross-validation metrics.

| Criterion | Core formula | Penalty term | Computation method | Key advantage over DIC |
|---|---|---|---|---|
| DIC | $\mathrm{DIC} = \overline{D} + p_D$, where $\overline{D}$ is the posterior mean deviance and $p_D = \overline{D} - D(\bar\theta)$ | $p_D$: effective parameters based on posterior variability | MCMC posterior samples; evaluates deviance at the posterior mean | N/A |
| WAIC | $\mathrm{WAIC} = -2(\mathrm{lppd} - p_{\mathrm{WAIC}})$, where $p_{\mathrm{WAIC}} = \sum_i \operatorname{var}_{\theta \mid y}\bigl(\log p(y_i \mid \theta)\bigr)$ | $p_{\mathrm{WAIC}}$: pointwise posterior variance | MCMC samples; pointwise log-likelihood evaluations | Avoids in-sample bias; better for out-of-sample prediction in complex models |
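As an illustration of the computational similarity (a toy normal-mean model; the names and setup are invented for this sketch, and exact posterior draws stand in for MCMC output), both criteria come out of the same matrix of pointwise log-likelihood evaluations:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
y = rng.normal(1.0, 1.0, size=n)  # data from a unit-variance normal

# Stand-in for MCMC output: exact flat-prior posterior draws of the mean.
S = 8000
mu = rng.normal(y.mean(), 1 / np.sqrt(n), size=S)

# Pointwise log-likelihood matrix: S draws x n observations.
loglik = -0.5 * (y[None, :] - mu[:, None]) ** 2 - 0.5 * np.log(2 * np.pi)

# DIC from row sums of the matrix (total deviance per draw).
dev = -2.0 * loglik.sum(axis=1)
d_hat = -2.0 * np.sum(-0.5 * (y - mu.mean()) ** 2 - 0.5 * np.log(2 * np.pi))
p_d = dev.mean() - d_hat
dic = d_hat + 2.0 * p_d

# WAIC from column-wise summaries of the same matrix.
lppd = np.sum(np.log(np.exp(loglik).mean(axis=0)))  # log pointwise predictive density
p_waic = np.sum(loglik.var(axis=0))                 # pointwise variance penalty
waic = -2.0 * (lppd - p_waic)
# For this well-behaved one-parameter model, p_d and p_waic are both near 1.
```

In regular, unimodal models like this one the two criteria nearly coincide; the differences the table describes emerge in singular or strongly hierarchical settings.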
