Deviance information criterion
from Wikipedia

The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. Like AIC, DIC is an asymptotic approximation that holds as the sample size becomes large; it is valid only when the posterior distribution is approximately multivariate normal.

Definition


Define the deviance as $D(\theta) = -2\log(p(y \mid \theta)) + C$, where $y$ are the data, $\theta$ are the unknown parameters of the model and $p(y \mid \theta)$ is the likelihood function. $C$ is a constant that cancels out in all calculations that compare different models, and which therefore does not need to be known.

There are two calculations in common usage for the effective number of parameters of the model. The first, as described in Spiegelhalter et al. (2002, p. 587), is $p_D = \bar{D} - D(\bar{\theta})$, where $\bar{D} = \mathrm{E}_{\theta \mid y}[D(\theta)]$ is the posterior mean of the deviance and $\bar{\theta}$ is the expectation of $\theta$. The second, as described in Gelman et al. (2004, p. 182), is $p_V = \tfrac{1}{2}\widehat{\operatorname{var}}(D(\theta))$, half the posterior variance of the deviance. The larger the effective number of parameters is, the easier it is for the model to fit the data, and so the deviance needs to be penalized.

The deviance information criterion is calculated as

$$\mathrm{DIC} = p_D + \bar{D},$$

or equivalently as

$$\mathrm{DIC} = D(\bar{\theta}) + 2 p_D.$$

From this latter form, the connection with AIC is more evident.
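The quantities just defined can be estimated directly from posterior draws. Below is a minimal sketch (not from the article): hypothetical normal data with known unit variance, whose conjugate posterior for the mean stands in for MCMC output.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: 50 draws from a Normal(2, 1) with known sd = 1.
y = rng.normal(loc=2.0, scale=1.0, size=50)

# Stand-in for MCMC output: under a flat prior, mu | y ~ Normal(ybar, 1/n).
theta_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=4000)

def deviance(theta):
    """D(theta) = -2 log p(y | theta), keeping all constants."""
    return -2.0 * stats.norm.logpdf(y, loc=theta, scale=1.0).sum()

D_bar = np.mean([deviance(t) for t in theta_draws])  # posterior mean deviance
D_hat = deviance(theta_draws.mean())                 # deviance at posterior mean
p_D = D_bar - D_hat                                  # effective number of parameters

DIC = p_D + D_bar  # equivalently D_hat + 2 * p_D
```

With a single free parameter, $p_D$ should come out close to 1, and the two algebraic forms of DIC agree exactly.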

Motivation


The idea is that models with smaller DIC should be preferred to models with larger DIC. Models are penalized both by the value of $\bar{D}$, which favors a good fit, and (similar to AIC) by the effective number of parameters $p_D$. Since $\bar{D}$ will decrease as the number of parameters in a model increases, the $p_D$ term compensates for this effect by favoring models with a smaller number of parameters.

An advantage of DIC over other criteria in the case of Bayesian model selection is that the DIC is easily calculated from the samples generated by a Markov chain Monte Carlo simulation. AIC requires calculating the likelihood at its maximum over $\theta$, which is not readily available from the MCMC simulation. But to calculate DIC, simply compute $\bar{D}$ as the average of $D(\theta)$ over the samples of $\theta$, and $D(\bar{\theta})$ as the value of $D$ evaluated at the average of the samples of $\theta$. Then the DIC follows directly from these approximations. Claeskens and Hjort (2008, Ch. 3.5) show that the DIC is large-sample equivalent to the natural model-robust version of the AIC.

Assumptions


In the derivation of DIC, it is assumed that the specified parametric family of probability distributions that generate future observations encompasses the true model. This assumption does not always hold, and it is desirable to consider model assessment procedures in that scenario.

Also, the observed data are used both to construct the posterior distribution and to evaluate the estimated models. Therefore, DIC tends to select over-fitted models.

Extensions


A resolution to the issues above was suggested by Ando (2007), with the proposal of the Bayesian predictive information criterion (BPIC). Ando (2010, Ch. 8) provided a discussion of various Bayesian model selection criteria. To avoid the over-fitting problems of DIC, Ando (2011) developed Bayesian model selection criteria from a predictive viewpoint. The criterion is calculated as

$$\mathit{IC} = \bar{D} + 2 p_D = -2\,\mathbb{E}_{\theta \mid y}[\log p(y \mid \theta)] + 2 p_D.$$

The first term is a measure of how well the model fits the data, while the second term is a penalty on the model complexity. Note that the $p$ in this expression is the predictive distribution rather than the likelihood above.
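Numerically, BPIC simply doubles the complexity penalty relative to the form $\mathrm{DIC} = \bar{D} + p_D$. A trivial sketch with hypothetical values for $\bar{D}$ and $p_D$:

```python
# Hypothetical summaries from an MCMC fit:
D_bar, p_D = 230.4, 3.2

DIC = D_bar + p_D       # 233.6
BPIC = D_bar + 2 * p_D  # 236.8: the stronger penalty guards against over-fitting
```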

Other applications


DIC was used in multiple S-Plus (and subsequently R) libraries for fitting likelihood-based models in the 1990s, predating the Bayesian methods to the extent the two overlap, and was usually presented as a generalization of AIC. The DIC was defined by Hastie and Tibshirani (1990, p. 160, eqn 6.32)[1] for the weighted smoothers used in Generalized Additive Models, and the requisite deviance and effective degrees-of-freedom calculations were incorporated into the GAM library (Hastie, 1991).[2]

The aic method in S-Plus and R is credited (initially) to Pinheiro and Bates, developed in conjunction with the nlme software,[3] and was subsequently backported to other libraries (some compute plain AIC; others require DIC-style approximations).

In the context of local likelihood, a deviance information criterion is defined by Loader (1999, p. 69, def. 4.4),[4] with a derivation based on jackknifed leave-one-out cross-validation and with effective degrees-of-freedom calculations expressed explicitly in terms of likelihood derivatives. This leads to some small technical differences compared to the Hastie and Tibshirani approach.

Irizarry (2001) [5] also has an extensive development of information criteria for local likelihood. Unlike the global techniques in the above sources, the criteria developed by Irizarry are applied when estimating at a single point in the predictor space, and so are applicable to locally adaptive smoothers considered by other authors.

from Grokipedia
The Deviance Information Criterion (DIC) is a Bayesian statistical measure for evaluating and comparing the relative fit of competing models, particularly in hierarchical or complex settings, by balancing goodness-of-fit against model complexity. It serves as an analogue to frequentist criteria like the Akaike information criterion (AIC), but is tailored for use with posterior distributions obtained by Markov chain Monte Carlo (MCMC) methods, providing an estimate of out-of-sample predictive accuracy. Introduced in 2002 by David Spiegelhalter, Nicky Best, Bradley Carlin, and Angelika van der Linde, DIC addresses the challenge of model selection in Bayesian frameworks where the effective number of parameters is not straightforward to determine due to dependencies in hierarchical structures.

The criterion is formally defined using the deviance function $D(\theta) = -2 \log p(y \mid \theta)$, where $y$ are the observed data and $\theta$ are model parameters. The posterior mean deviance $\bar{D} = \mathbb{E}_{\theta \mid y}[D(\theta)]$ quantifies fit, while the effective number of parameters $p_D = \bar{D} - D(\bar{\theta})$ (with $\bar{\theta}$ the posterior mean of $\theta$) penalizes complexity; thus $\mathrm{DIC} = \bar{D} + p_D = 2\bar{D} - D(\bar{\theta})$. Lower DIC values indicate better models, and the criterion can be readily computed from MCMC samples without requiring closed-form expressions.

DIC has been widely adopted in applied fields such as ecology and epidemiology for tasks like comparing generalized linear mixed models or latent variable models, offering advantages in handling prior information and uncertainty propagation. However, it is sensitive to parameterization choices and may underestimate complexity in certain scenarios, such as when priors conflict with the data or in non-nested model comparisons; extensions like the conditional DIC have been proposed to mitigate these issues in specific contexts, such as hierarchical or latent-variable structures.
A 2014 review by the original authors reaffirmed its utility while noting ongoing debates about its theoretical foundations compared to alternatives like the Widely Applicable Information Criterion (WAIC) or cross-validation methods.

Overview

Definition

The Deviance Information Criterion (DIC) is a Bayesian model assessment tool designed for hierarchical modeling, which balances goodness-of-fit against model complexity to facilitate model comparison and selection. Introduced in 2002 by Spiegelhalter et al. as a Bayesian counterpart to the Akaike information criterion (AIC), DIC extends frequentist principles to Bayesian frameworks, particularly those involving complex, high-dimensional parameter spaces. At its core, DIC is formulated as

$$\text{DIC} = \bar{D} + p_D,$$

where $\bar{D} = \mathbb{E}_{\theta \mid y}[D(\theta)]$ is the posterior mean deviance, $D(\bar{\theta})$ is the deviance evaluated at the posterior mean of the parameters $\theta$, denoted $\bar{\theta}$, and $p_D = \bar{D} - D(\bar{\theta})$ denotes the effective number of parameters, which quantifies model complexity. The deviance function $D(\theta)$ itself is defined as

$$D(\theta) = -2 \log f(y \mid \theta),$$

measuring the discrepancy between the observed data $y$ and the model predictions under the likelihood $f(y \mid \theta)$. This setup assumes a well-specified likelihood model, allowing DIC to penalize complexity while rewarding adequate fit to the data.

In practice, DIC is estimated using posterior samples from Markov chain Monte Carlo (MCMC) methods, making it computationally accessible for intricate Bayesian models where analytical solutions are infeasible. Lower DIC values indicate superior models, providing a practical metric for decision-making in many applied fields.

Historical Development

The Deviance Information Criterion (DIC) was introduced in 2002 by David J. Spiegelhalter, Nicola G. Best, Bradley P. Carlin, and Angelika van der Linde as a Bayesian tool for assessing model fit and complexity in hierarchical models. Their seminal paper, "Bayesian measures of model complexity and fit," published in the Journal of the Royal Statistical Society: Series B (Statistical Methodology), proposed DIC as a method to balance predictive accuracy against model parsimony, leveraging posterior distributions obtained from Markov chain Monte Carlo (MCMC) simulations. This work addressed the need for practical model comparison techniques in Bayesian analysis, where traditional frequentist criteria like the Akaike information criterion (AIC) were less directly applicable.

The emergence of DIC coincided with the expansion of Bayesian computation in the late 1990s and early 2000s, driven by advances in MCMC methods that enabled inference in complex, high-dimensional models previously intractable with analytical approaches. Key developments, such as the Gibbs sampler popularized by Gelfand and Smith in 1990, had sparked a revolution in Bayesian statistics by the mid-1990s, making software implementations like BUGS feasible for routine use. Against this backdrop, DIC filled a critical gap by providing an in-sample estimate of out-of-sample predictive performance, adaptable to the posterior summaries routinely generated by MCMC.

DIC saw rapid adoption following its proposal, with implementation in the WinBUGS software (version 1.4) released in 2003, which facilitated its use in applied Bayesian modeling across many fields. By integrating DIC computation directly into MCMC monitoring, WinBUGS streamlined model selection workflows, contributing to its widespread uptake in statistical practice.
From 2005 to 2010, critiques of DIC began to surface, particularly concerning the reliability of its effective number of parameters in hierarchical and singular models, prompting refinements such as adjusted estimation procedures and alternative criteria like the Widely Applicable Information Criterion (WAIC). These discussions, including formal rebuttals in statistical journals, refined DIC's theoretical foundations and application guidelines without supplanting its core utility. By the end of the decade, DIC had become a standard Bayesian model assessment tool, cited in thousands of studies despite ongoing debates.

Theoretical Background

Motivation

In Bayesian statistical modeling, traditional frequentist criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) often fail to adequately penalize model complexity because they rely on point estimates of parameters rather than the full posterior distributions that are central to Bayesian inference. This limitation becomes particularly acute in hierarchical and nonlinear models, where the effective number of parameters is ambiguous and traditional measures underestimate uncertainty. The deviance information criterion (DIC) was developed to address this gap by providing a Bayesian analog that incorporates the entire posterior, enabling fair comparisons across models with diverse structures.

DIC estimates out-of-sample predictive accuracy by balancing in-sample fit, assessed through the posterior mean deviance (a measure of how well the model explains the observed data, averaged over the posterior distribution), with a penalty for model complexity that reflects the variability of the posterior estimates. This approach mitigates overfitting in complex Bayesian models, such as those involving random effects or latent variables, where simple plug-in estimates might overlook the spread of plausible parameter values. By design, DIC approximates expected predictive loss, offering a practical tool for model selection in scenarios where direct cross-validation is computationally infeasible.

A key advantage of DIC arises in Markov chain Monte Carlo (MCMC)-based inference, where it naturally integrates parameter uncertainty without requiring additional approximations or algebraic derivations beyond the MCMC samples themselves. Unlike frequentist plug-in methods that treat parameters as fixed, DIC leverages the posterior to quantify both fit and complexity, making it especially suitable for modern Bayesian applications. This incorporation of uncertainty enhances the reliability of model comparisons in high-dimensional settings.

Relationship to Deviance

The deviance information criterion (DIC) originates from the concept of deviance, a fundamental measure in generalized linear models (GLMs) that quantifies the discrepancy between observed data and model predictions. In classical statistics, deviance generalizes the chi-squared goodness-of-fit statistic to a broader class of models, defined as

$$D(\theta) = -2 \log \left( \frac{p(y \mid \theta)}{p(y \mid y)} \right),$$

where $p(y \mid \theta)$ is the model's likelihood and $p(y \mid y)$ is the saturated likelihood that perfectly fits the data. This formulation, equivalent to $-2 \log p(y \mid \theta)$ up to a constant, serves as a measure of relative fit, with lower values indicating better agreement between predictions and observations.

In the Bayesian framework, deviance is adapted to incorporate posterior uncertainty over the parameters $\theta$. The posterior expected deviance, $\bar{D} = \mathbb{E}_{\theta \mid y}[D(\theta)]$, averages the deviance over the posterior distribution $p(\theta \mid y)$, providing a measure of predictive accuracy that accounts for parameter variability. For practical point summaries in model comparison, the deviance evaluated at the posterior mean, $D(\bar{\theta})$ with $\bar{\theta} = \mathbb{E}_{\theta \mid y}[\theta]$, offers a focused estimate of fit akin to classical plug-in approaches, while remaining computationally feasible via Markov chain Monte Carlo (MCMC) samples. This distinction highlights the Bayesian extension: $\bar{D}$ captures fit under posterior averaging, whereas $D(\bar{\theta})$ emphasizes a point estimate for interpretability. The integration of these deviance components enables DIC to approximate the expected predictive deviance for new data, facilitating model selection in complex hierarchical settings.

By combining $\bar{D}$ with a penalty term derived from the difference $p_D = \bar{D} - D(\bar{\theta})$, DIC balances goodness-of-fit against model complexity, extending the deviance's role from GLMs to hierarchical Bayesian models where parameter dimensionality is ambiguous. This approximation aligns DIC with information-theoretic criteria, promoting parsimonious models without exhaustive cross-validation.

Formulation

Core Formula

The deviance information criterion (DIC) is defined as the sum of the posterior mean deviance and the effective number of parameters, providing a balance between model fit and complexity in Bayesian model selection. The core formula is

$$\text{DIC} = \bar{D} + p_D,$$

where $\bar{D} = -2\,\mathbb{E}_{\theta \mid y}[\log f(y \mid \theta)]$ is the posterior mean deviance, capturing the expected deviance over the posterior distribution of the parameters $\theta$ given the data $y$, and $p_D$ denotes the effective number of parameters, which adjusts for model complexity. The deviance $D(\theta) = -2 \log f(y \mid \theta)$ itself serves as a measure of discrepancy between the observed data and the model's predicted distribution under parameters $\theta$. In this formulation, $\bar{D}$ penalizes inadequate fit, with smaller values indicating better average predictive performance across the posterior, while $p_D$ promotes parsimony by increasing with greater effective model complexity.

For computational ease when using Markov chain Monte Carlo (MCMC) methods to sample from the posterior, DIC is equivalently expressed as $D(\bar{\theta}) + 2 p_D$, where $\bar{\theta}$ is the posterior mean of $\theta$, allowing direct evaluation of the deviance at the mean alongside the complexity penalty. This equivalence arises because $p_D = \bar{D} - D(\bar{\theta})$, making the two forms identical rather than approximate, though the point-estimate version is often simpler to compute from MCMC output.

Effective Number of Parameters

The effective number of parameters, denoted $p_D$, is a central component of the deviance information criterion (DIC) that quantifies model complexity in Bayesian hierarchical models by accounting for posterior uncertainty in the deviance. It is formally defined as the difference between the posterior expected deviance $\bar{D} = \mathbb{E}_{\theta \mid y}[D(\theta)]$ and the deviance evaluated at the posterior mean of the parameters, $D(\bar{\theta})$:

$$p_D = \bar{D} - D(\bar{\theta}).$$

This measure captures the extent to which posterior variability in the parameters inflates the expected deviance relative to the point estimate at the posterior mean. An equivalent approximation for $p_D$ arises under assumptions of approximate posterior normality and quadratic deviance, where it equals half the posterior variance of the deviance:

$$p_D \approx \frac{1}{2} \operatorname{Var}_{\theta \mid y}[D(\theta)].$$

Here the deviance $D(\theta) = -2 \log f(y \mid \theta)$ reflects the model's fit, and the variance term adjusts for the uncertainty introduced by the posterior distribution over $\theta$, effectively penalizing models with high variability in their predictive performance. This form is particularly useful for estimating $p_D$ from MCMC samples when direct computation of $\bar{D}$ is challenging.

In interpretation, $p_D$ represents the number of "free" or effectively identifiable parameters in the model, adjusted for dependencies induced by hierarchical structure and prior information, rather than simply the nominal dimension of $\theta$. In models with strong regularization through informative priors, $p_D$ typically falls below the nominal count, reflecting reduced effective dimensionality due to shrinkage and borrowing of strength across parameters. Conversely, with vague priors and well-identified parameters, $p_D$ approximates the nominal count; it can even exceed it when the posterior is highly non-normal, indicating additional complexity. This adjustment makes $p_D$ a dynamic measure of fitting difficulty, distinct from frequentist degrees of freedom.

For illustration, in a simple regression model with vague priors, $p_D$ closely matches the nominal number of regression coefficients plus the intercept, as posterior uncertainty aligns with independent parameter estimation. In contrast, hierarchical models with shrinkage, such as random-effects ANOVA, yield a $p_D$ smaller than the total number of random-effect parameters, thereby penalizing excess complexity while rewarding a parsimonious effective structure.
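The two estimators of $p_D$ can be compared on the same draws. Below is a sketch (not from the sources) using a hypothetical Poisson–Gamma conjugate model, where the exact posterior stands in for MCMC output; with one free parameter, both estimates should land near 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical counts and a Gamma(1, 1) prior on the Poisson rate.
y = rng.poisson(lam=4.0, size=40)
a, b = 1.0, 1.0
posterior = stats.gamma(a + y.sum(), scale=1.0 / (b + len(y)))  # conjugate update
lam_draws = posterior.rvs(size=8000, random_state=rng)

def deviance(lam):
    return -2.0 * stats.poisson.logpmf(y, lam).sum()

D = np.array([deviance(lam) for lam in lam_draws])

p_D = D.mean() - deviance(lam_draws.mean())  # Spiegelhalter et al. definition
p_V = 0.5 * D.var()                          # half-variance approximation
```

For this near-normal posterior the two values are close; they can diverge when the posterior is skewed or multimodal.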

Properties and Assumptions

Key Assumptions

The Deviance Information Criterion (DIC) relies on several key statistical and modeling assumptions for reliable application and interpretation in Bayesian model selection. A fundamental requirement is the differentiability of the log-likelihood function with respect to the model parameters. This smoothness condition enables the use of Taylor expansions around the posterior mean or mode, which underpin the approximation of the effective number of parameters, $p_D$, as the difference between the expected deviance and the deviance at the posterior mean. Without sufficient differentiability, typically at least twice for basic approximations and higher orders for more precise asymptotic justifications, these expansions may fail, leading to inaccurate estimates of model complexity.

Another critical assumption is that the posterior distribution is dominated by a single mode, approximating a unimodal or nearly normal shape. DIC's formulation, particularly the computation of $p_D$, assumes that the posterior concentrates around one primary mode, allowing valid normal approximations. In cases of multi-modal posteriors, such as those arising from complex hierarchical models or conflicting priors, DIC may inadequately capture model complexity by averaging over modes, potentially underestimating $p_D$ and favoring overly simple models. This condition is often implicitly assumed in the heuristic derivations linking DIC to predictive performance.

Finally, DIC assumes prior specifications that do not unduly dominate the posterior distribution. The criterion is designed to reflect data-driven model fit and complexity, so priors should be weakly informative or asymptotically negligible, ensuring that the posterior is primarily shaped by the likelihood. If priors are too influential, they can distort $p_D$ by reducing the effective number of parameters below the true dimensionality, as the prior effectively constrains the parameter space. This assumption aligns with DIC's goal of generalizing frequentist criteria like AIC in a Bayesian setting, where prior effects wash out for large samples.

Theoretical Properties

Under regularity conditions, such as compactness of the parameter space, strong mixing of the data, and asymptotic normality of the posterior distribution, the deviance information criterion (DIC) provides an asymptotically unbiased estimator of the expected Kullback-Leibler divergence between the data-generating process and the model's plug-in predictive distribution. However, unlike consistency-oriented criteria such as the Bayesian information criterion (BIC), DIC is not asymptotically consistent for model selection: it does not select the true model with probability approaching 1 as the sample size $n \to \infty$. This behavior aligns more closely with that of the Akaike information criterion (AIC), but DIC incorporates Bayesian priors to account for parameter uncertainty.

DIC achieves a bias-variance trade-off by approximating the expected log pointwise predictive density (lppd), which measures out-of-sample predictive accuracy. The criterion balances model fit, captured by the posterior mean of the deviance, against complexity, penalized by the effective number of parameters $p_D$ that reflects variance due to parameter uncertainty. This construction uses posterior expectations to estimate predictive accuracy, thereby tempering overfitting while incorporating the full posterior distribution rather than point estimates.

DIC is invariant to reparameterization only under certain conditions, such as when employing posterior medians or intrinsic estimators in specific models like binomial or Poisson distributions. Sensitivity to parameterization has been widely noted in the literature following its introduction: changes in parameterization can lead to substantial variations in $p_D$ and thus in DIC values, potentially affecting model rankings.

Comparisons and Limitations

Comparison to Other Criteria

The Deviance Information Criterion (DIC) serves as a Bayesian analog to the Akaike information criterion (AIC): both aim to balance model fit and complexity, but they differ fundamentally in their estimation approaches. Whereas AIC relies on maximum likelihood estimates (MLE) with a fixed penalty of twice the number of parameters ($2k$), DIC incorporates posterior uncertainty through the effective number of parameters $p_D$, making it particularly suitable for Bayesian analyses using Markov chain Monte Carlo (MCMC) methods. This posterior averaging often leads DIC to model rankings similar to AIC in large samples, but it can perform better in small samples, where informative priors regularize the estimates.

In contrast to the Bayesian information criterion (BIC), which imposes a stronger penalty of $k \log n$ (growing with the sample size $n$) to favor parsimony, DIC's penalty $p_D$ remains of order $O(1)$, akin to AIC, and thus tends to select more complex models. This philosophical difference reflects BIC's asymptotic consistency in identifying the true model as $n \to \infty$, while DIC prioritizes predictive accuracy across the posterior in finite samples. Empirical simulations in statistical catch-at-age models (Wilberg and Bence, 2008) illustrate DIC's effectiveness in selecting appropriate hierarchical Bayesian models for fisheries assessment, supporting its use on complex structured data over criteria with fixed penalties like BIC.

The Widely Applicable Information Criterion (WAIC), developed by Sumio Watanabe around 2010 and elaborated in subsequent works, is a more recent Bayesian alternative to DIC that estimates the expected log pointwise predictive density (ELPD). Unlike DIC, which can suffer from in-sample bias due to its reliance on the posterior mean of the deviance, WAIC penalizes fit using the pointwise posterior variances of the log predictive density, better approximating out-of-sample performance; the related Pareto-smoothed importance sampling approach (PSIS-LOO) goes further and approximates leave-one-out cross-validation directly. This makes WAIC computationally similar to DIC, as both require posterior samples from MCMC, but more robust in singular or hierarchical models, as validated in comparative studies showing WAIC's closer alignment with cross-validation metrics.
| Criterion | Core formula | Penalty term | Computation method | Key advantage over DIC |
|---|---|---|---|---|
| DIC | $\text{DIC} = \bar{D} + p_D$, where $\bar{D} = -2\,\mathbb{E}[\log p(y \mid \theta)]$ is the posterior mean deviance and $p_D = \bar{D} - D(\bar{\theta})$ | $p_D$: effective number of parameters | MCMC posterior samples; evaluates the deviance at the posterior mean $\bar{\theta}$ | N/A |
| WAIC | $\text{WAIC} = -2\,(\text{lppd} - p_{\text{WAIC}})$, where $\text{lppd} = \sum_i \log \int p(y_i \mid \theta)\, p(\theta \mid y)\, d\theta$ | $p_{\text{WAIC}} = \sum_i \operatorname{Var}_{\text{post}}(\log p(y_i \mid \theta))$: pointwise posterior variance | MCMC samples; pointwise log-likelihood evaluations | Avoids in-sample bias; better for out-of-sample prediction in complex models |
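Both criteria can be computed from a single matrix of pointwise log-likelihoods. The sketch below (not from the sources) uses a hypothetical normal-mean model whose conjugate posterior stands in for MCMC draws:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical data and posterior draws for the mean (known sd = 1).
y = rng.normal(1.0, 1.0, size=30)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=4000)

# S x n matrix of log p(y_i | theta_s).
log_lik = stats.norm.logpdf(y[None, :], loc=mu_draws[:, None], scale=1.0)

# DIC from whole-data deviances.
D_bar = -2.0 * log_lik.sum(axis=1).mean()
D_hat = -2.0 * stats.norm.logpdf(y, loc=mu_draws.mean(), scale=1.0).sum()
DIC = 2.0 * D_bar - D_hat

# WAIC from pointwise quantities.
lppd = np.log(np.exp(log_lik).mean(axis=0)).sum()  # log pointwise pred. density
p_waic = log_lik.var(axis=0).sum()                 # pointwise-variance penalty
WAIC = -2.0 * (lppd - p_waic)
```

For this well-behaved one-parameter model the two criteria nearly coincide; they diverge in hierarchical or singular settings.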

Criticisms and Limitations

One significant criticism of the deviance information criterion (DIC) is its lack of invariance to reparameterization. Changing the variables used to parameterize the model can substantially alter the effective number of parameters $p_D$, leading to inconsistent model rankings that depend on arbitrary choices in model formulation rather than intrinsic model quality. In finite samples, DIC also tends to overestimate predictive accuracy because it evaluates fit using in-sample data, introducing an optimistic bias that penalizes overly complex models insufficiently. This issue is exacerbated in singular models, such as mixture models or overparameterized hierarchical models, where the parameters may not be identifiable, resulting in unstable or even negative values of $p_D$ that undermine the criterion's reliability. Furthermore, DIC performs poorly when priors are vague or when the posterior distribution is multi-modal, as its reliance on the posterior mean as a point summary fails to capture the full posterior spread or the multiple modes, distorting both fit and complexity assessments compared to more robust alternatives like WAIC or LOO-CV.

Applications

Bayesian Model Selection

The Deviance Information Criterion (DIC) serves as a key tool for Bayesian model selection, enabling the comparison of competing hierarchical models fitted via Markov chain Monte Carlo (MCMC) simulation. In practice, the workflow begins with specifying and sampling from the posterior for each candidate model using MCMC algorithms, such as those implemented in software like WinBUGS or Stan; this yields samples from which the expected deviance and effective number of parameters are estimated to compute the DIC. The model with the lowest DIC is preferred, as it optimally trades off predictive accuracy against complexity in a Bayesian framework.

A representative application arises in disease mapping for count data, such as disease incidence rates; here, an intercept-only model assuming constant rates across units can be contrasted with a covariate-inclusive version incorporating demographic factors or environmental exposures, with MCMC sampling facilitating the DIC calculation needed to identify the more parsimonious yet adequate specification. In ecology, DIC has been applied to model selection in analyses of species distributions and related ecological processes. In genetics, it supports Bayesian multiple quantitative trait loci (QTL) mapping by balancing model fit and complexity. In epidemiological disease mapping, DIC has proven effective for selecting spatial models; for instance, Knorr-Held and Best (2007) used DIC in joint disease mapping of multiple cancers, where it favored specifications incorporating shared spatial hierarchies to better account for geographic clustering of risks.

Differences in DIC values (ΔDIC) guide interpretation, with a ΔDIC exceeding 10 conventionally read as strong evidence for the lower-DIC model over its competitor, akin to evidence thresholds used with other information criteria. For robust selection, DIC results are commonly supplemented by posterior predictive checks, which generate replicated datasets from the fitted model to evaluate discrepancies between observed and simulated data, ensuring the chosen model captures substantive patterns beyond mere numerical fit.
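The ΔDIC convention is easy to mechanize. A small helper, with hypothetical model names and DIC values:

```python
def delta_dic(dic_values):
    """Report each model's DIC difference from the best (lowest-DIC) model.

    `dic_values` maps a model name to the DIC computed from its MCMC fit.
    """
    best = min(dic_values.values())
    return {name: dic - best for name, dic in dic_values.items()}

# Hypothetical DIC values for two disease-mapping specifications:
report = delta_dic({"intercept_only": 412.7, "with_covariates": 398.2})
# report["with_covariates"] == 0.0; the ~14.5-point gap would conventionally
# be read as strong support for the covariate model.
```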

Use in Generalized Linear Models

The deviance information criterion (DIC) is particularly well-suited to Bayesian generalized linear models (GLMs), where it facilitates the evaluation of distributional assumptions and link functions by balancing model fit against complexity. For instance, in binary outcome models, DIC can compare the logistic link, which assumes a logistic distribution for the latent errors, against the probit link, which assumes a normal distribution; the criterion favors the link that better captures the data's structure without excessive parameterization, especially when response probabilities are not extreme. Similarly, for count data in GLMs, DIC penalizes Poisson models that ignore overdispersion by assigning them higher values than negative binomial alternatives, which incorporate a dispersion parameter to account for excess variance relative to the mean.

A representative application arises in Bayesian logistic regression for analyzing medical outcomes, such as infant health indicators across multiple studies. Here, DIC helps select between fixed-effects and random-effects structures by quantifying how well the model predicts the observed binary responses while penalizing overly complex random intercepts or slopes that introduce unnecessary latent variability; in hierarchical models pooling data from diverse clinical settings, a random-effects specification with multiple membership often yields a lower DIC than simpler fixed-effects alternatives, improving generalizability across trials.

Compared to frequentist deviance tests in GLMs, which rely on asymptotic approximations and fixed point estimates, DIC offers advantages in handling latent variables, such as random effects in mixed GLMs, by integrating over the full posterior distribution, thus providing a more robust measure of predictive accuracy under uncertainty. For GLMs with missing data, Bayesian extensions treat the missing values as latent quantities within the model, avoiding the need for explicit imputation and enabling direct comparison of incomplete-data models, whereas frequentist approaches often require listwise deletion or multiple imputation, which can bias deviance estimates. In practice, modern R packages such as brms fit Bayesian GLMs via Stan, and information criteria can be computed from the resulting MCMC samples, with WAIC commonly recommended over DIC for stability in complex hierarchical setups.
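The overdispersion comparison can be illustrated end to end. The sketch below is a deliberate simplification (not from the sources): it uses a flat-prior grid approximation to the posterior in place of MCMC, and a negative binomial model with the dispersion held fixed, so only the mean is estimated in each model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Overdispersed counts: negative binomial with mean 3 and variance 7.5.
y = rng.negative_binomial(n=2, p=0.4, size=60)

def dic_from_grid(grid, log_lik_fn):
    """DIC via a flat-prior grid approximation to the posterior (MCMC stand-in)."""
    log_post = np.array([log_lik_fn(g) for g in grid])
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    D = -2.0 * log_post                  # deviance at each grid point
    D_bar = (w * D).sum()                # posterior mean deviance
    theta_bar = (w * grid).sum()         # posterior mean of the parameter
    p_D = D_bar - (-2.0 * log_lik_fn(theta_bar))
    return D_bar + p_D

mu_grid = np.linspace(0.5, 10.0, 400)

# Poisson model: forced to have variance equal to the mean.
dic_pois = dic_from_grid(mu_grid, lambda m: stats.poisson.logpmf(y, m).sum())

# Negative binomial model with fixed dispersion r, parameterized by its mean m.
r = 2.0
dic_nb = dic_from_grid(
    mu_grid, lambda m: stats.nbinom.logpmf(y, r, r / (r + m)).sum())
# The negative binomial fit should earn the lower (better) DIC here.
```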

Extensions

Variants of DIC

Several variants of the deviance information criterion (DIC) have been proposed to address limitations of the standard formulation, particularly in hierarchical or latent variable models where nuisance parameters can bias the complexity estimate. These modifications aim to improve the criterion's performance by better accounting for model structure and parameter uncertainty.

The conditional DIC, introduced for hierarchical models, splits the parameters into fixed effects $\theta_f$ and random effects $\theta_r$. It computes the deviance using the posterior mean of the fixed effects conditioned on the random effects, with a penalty term focused on the fixed-effects complexity:

$$\text{DIC}_{\text{cond}} = D(\bar{\theta}_f \mid \bar{\theta}_r) + 2 p_{D,f},$$

where $D(\bar{\theta}_f \mid \bar{\theta}_r)$ is the deviance evaluated at the conditional posterior mean and $p_{D,f}$ is the effective number of fixed-effect parameters. This approach handles hierarchical models more gracefully by avoiding overpenalization from random-effects variability, though it can still favor overfitted models in some cases.

The marginal DIC addresses issues with nuisance parameters by integrating them out of the likelihood, focusing on a subset of parameters of interest such as the fixed effects. This integration reduces sensitivity to the specification of random or latent components, making it particularly useful in high-dimensional settings where the standard DIC may underestimate complexity. For example, in overdispersed count data models, the marginal DIC provides more stable comparisons by marginalizing over intermediate levels of the hierarchy.

The Widely Applicable Information Criterion (WAIC), introduced by Sumio Watanabe in 2010 and popularized in applied Bayesian workflows by Vehtari, Gelman, and Gabry, serves as an alternative to DIC that estimates the expected log pointwise predictive density (elppd) and thereby avoids DIC's sensitivity to model parameterization. Unlike DIC, which relies on the posterior mean of the deviance, WAIC computes the log pointwise predictive density (lppd) and subtracts an estimate of the effective number of parameters, $p_{\text{WAIC}}$, derived from the pointwise posterior variance of the log predictive densities. The criterion is given by

$$\text{WAIC} = -2\,(\text{lppd} - p_{\text{WAIC}}),$$

where lower values indicate better predictive performance; this approach extends reliably to singular models where DIC may fail.

Leave-one-out cross-validation (LOO-CV) provides an estimate of out-of-sample predictive accuracy by iteratively fitting the model on all but one data point and evaluating on the held-out point, offering a direct approximation to the elppd without relying on asymptotic assumptions like those underlying DIC. Although exact LOO-CV is computationally intensive for large datasets, the Pareto-smoothed importance sampling variant (PSIS-LOO) efficiently approximates it by reusing the full set of posterior samples and smoothing the importance weights, making it practical for complex Bayesian models. PSIS-LOO is particularly useful for detecting model misspecification through diagnostics such as the Pareto tail index, outperforming DIC in weakly identified models. The development of WAIC and PSIS-LOO in the 2010s was influenced by the limitations of DIC, positioning them as more robust tools within modern Bayesian software ecosystems such as Stan and PyMC, where they facilitate reliable model evaluation and comparison.
