Bayes factor
Mathematical Foundations
Definition
The Bayes factor is a statistical measure used in Bayesian inference to quantify the relative evidence provided by observed data for one model over another competing model. It was introduced by Harold Jeffreys as a tool for objective hypothesis testing within a Bayesian framework.[2] Mathematically, the Bayes factor in favor of model $ M_1 $ over model $ M_0 $ given data $ D $, denoted $ BF_{10} $, is defined as the ratio of the marginal likelihoods under each model:
$$ BF_{10} = \frac{p(D \mid M_1)}{p(D \mid M_0)} $$
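The marginal likelihood $ p(D \mid M_i) $ for a model $ M_i $ with parameters $ \theta_i $ is obtained by integrating the likelihood over the prior distribution of the parameters:
$$ p(D \mid M_i) = \int p(D \mid \theta_i, M_i) \, p(\theta_i \mid M_i) \, d\theta_i $$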
This integration averages the model's predictive performance across all plausible parameter values weighted by the prior, providing a summary of the model's overall fit to the data independent of specific parameter estimates.[2]
A common notation convention is $ BF_{01} = 1 / BF_{10} $, which reverses the comparison to favor $ M_0 $ (often the null model) over $ M_1 $. The Bayes factor plays a central role in Bayesian model comparison by directly comparing the predictive adequacy of competing models based on the observed data, facilitating decisions about model selection without relying on point estimates or frequentist criteria.[2]
Relationship to Bayes' Theorem
The Bayes factor emerges directly from Bayes' theorem as a key component in updating the probabilities of competing models based on observed data. Bayes' theorem states that the posterior probability of a model $ M_i $ given data $ D $ is $ p(M_i \mid D) = \frac{p(D \mid M_i) \, p(M_i)}{p(D)} $, where $ p(D \mid M_i) $ is the marginal likelihood under the model and $ p(M_i) $ is the prior probability. For two models $ M_1 $ and $ M_0 $, the ratio of posterior model probabilities, known as the posterior odds, is therefore
$$ \frac{p(M_1 \mid D)}{p(M_0 \mid D)} = \frac{p(M_1)}{p(M_0)} \times \frac{p(D \mid M_1)}{p(D \mid M_0)} $$
with the second factor on the right-hand side defining the Bayes factor $ BF_{10} $.[2] This formulation demonstrates that the Bayes factor serves as a multiplier that adjusts the prior odds to yield the posterior odds, encapsulating how the data shifts belief between models.[2]
By isolating the ratio of marginal likelihoods, $ p(D \mid M_1) / p(D \mid M_0) $, the Bayes factor measures the relative support for each model provided by the data, disentangling this evidential contribution from prior beliefs about the models' plausibility.[2] This separation allows the Bayes factor to function as a summary of the data's evidential value within the Bayesian updating process, applicable across diverse modeling contexts, although its value still depends on the parameter priors specified within each model.[2]
The derivation of the Bayes factor highlights a fundamental distinction in handling point-null hypotheses versus composite models. Under a point-null hypothesis, such as $ H_0: \theta = \theta_0 $, the marginal likelihood $ p(D \mid M_0) $ simplifies to the likelihood evaluated directly at the fixed parameter value, $ p(D \mid \theta_0) $, as there is no parameter uncertainty to integrate over.[2] In contrast, for a composite model $ M_1 $ with parameters varying over a continuous space, $ p(D \mid M_1) $ requires integrating the likelihood over a prior distribution on the parameters to average out uncertainty, as previously outlined in the definition of marginal likelihood.[2] This difference affects the computational form of the Bayes factor but preserves its role in the posterior odds equation.[2]
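The contrast between a point null and a composite alternative can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the binomial model, the data (7 successes in 20 trials), the uniform prior, and all function names are hypothetical choices, and the integral is approximated with a plain midpoint rule.

```python
import math

def binomial_likelihood(theta, k, n):
    """p(D | theta) for k successes in n Bernoulli trials."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)

def marginal_point_null(k, n, theta0):
    """Point null: no integration, just the likelihood at theta0."""
    return binomial_likelihood(theta0, k, n)

def marginal_composite(k, n, prior_pdf, grid_size=10_000):
    """Composite model: midpoint-rule integral of likelihood x prior on (0, 1)."""
    width = 1.0 / grid_size
    total = 0.0
    for i in range(grid_size):
        theta = (i + 0.5) * width
        total += binomial_likelihood(theta, k, n) * prior_pdf(theta)
    return total * width

def uniform_prior(theta):
    """Assumed prior under M1: uniform density on (0, 1)."""
    return 1.0

k, n = 7, 20  # hypothetical data
m0 = marginal_point_null(k, n, theta0=0.5)
m1 = marginal_composite(k, n, uniform_prior)
bf_10 = m1 / m0
print(f"p(D|M0) = {m0:.5f}, p(D|M1) = {m1:.5f}, BF10 = {bf_10:.3f}")
```

Under a uniform prior the binomial marginal likelihood equals $ 1/(n+1) $ for any $ k $, which gives a convenient check on the numerical integral.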
Harold Jeffreys pioneered the application of the Bayes factor within this framework of Bayes' theorem in his 1939 monograph Theory of Probability (first edition).[3][2]
Interpretation
Evidence Scales
The interpretation of the Bayes factor (BF) relies on standardized scales that categorize its magnitude into qualitative levels of evidence for one model (say, the alternative $ M_1 $) over another (say, the null $ M_0 $). These scales provide a heuristic framework for assessing evidential strength, though they are not universally fixed.[4] A seminal scale was proposed by Harold Jeffreys, which divides BF values into grades based on orders of magnitude, emphasizing decisive evidence for large values. Jeffreys' scale, as commonly referenced, is as follows:

| $ BF_{10} $ | Evidence against $ M_0 $ |
|---|---|
| > 100 | Decisive |
| 30–100 | Very strong |
| 10–30 | Strong |
| 3–10 | Substantial |
| 1–3 | Barely worth mentioning |
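
A widely used alternative scale, due to Kass and Raftery, also reports its thresholds on the $ 2 \ln(BF) $ scale:

| $ BF_{10} $ | $ 2 \ln(BF_{10}) $ | Evidence against $ M_0 $ |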
|---|---|---|
| > 150 | > 10 | Very strong |
| 20–150 | 6–10 | Strong |
| 3–20 | 2–6 | Positive |
| 1–3 | 0–2 | Barely worth mentioning |
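
The two tables can be encoded as simple lookup rules. The sketch below is a hypothetical helper, not part of any library; the thresholds are copied from the tables above. Assigning a value that falls exactly on a boundary to the weaker category, and handling values below 1 by applying the scale to the reciprocal, are assumptions made here for illustration.

```python
import math

# (upper bound, label) pairs copied from the two tables above.
JEFFREYS_SCALE = [
    (3, "Barely worth mentioning"),
    (10, "Substantial"),
    (30, "Strong"),
    (100, "Very strong"),
    (math.inf, "Decisive"),
]
# Three-column table (the one that also lists 2 ln(BF)).
SECOND_SCALE = [
    (3, "Barely worth mentioning"),
    (20, "Positive"),
    (150, "Strong"),
    (math.inf, "Very strong"),
]

def evidence_label(bf_10, scale):
    """Return (favored_model, label) for a Bayes factor BF10 > 0."""
    if bf_10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    # Values below 1 favor M0; apply the scale to the reciprocal, BF01.
    favored, strength = ("M1", bf_10) if bf_10 >= 1 else ("M0", 1.0 / bf_10)
    for upper, label in scale:
        if strength <= upper:  # boundary values go to the weaker category
            return favored, label

print(evidence_label(50, JEFFREYS_SCALE))
print(evidence_label(50, SECOND_SCALE))
```

The same Bayes factor can land in differently named categories on the two scales, which is one reason the labels are best read as heuristics.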
Posterior Odds Connection
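The Bayes factor connects directly to posterior odds through Bayes' theorem applied to model comparison. Specifically, the posterior odds in favor of model $ M_1 $ over model $ M_0 $ given data $ D $ are obtained by multiplying the prior odds by the Bayes factor:
$$ \frac{p(M_1 \mid D)}{p(M_0 \mid D)} = BF_{10} \times \frac{p(M_1)}{p(M_0)} $$
where $ BF_{10} $ is the Bayes factor comparing $ M_1 $ to $ M_0 $. This relationship highlights the Bayes factor's role as the multiplicative update factor representing the evidence contributed solely by the data, independent of prior beliefs about the models.
Expanding this to posterior probabilities, let $ \pi_1 = p(M_1) $ and $ \pi_0 = p(M_0) = 1 - \pi_1 $; then
$$ p(M_1 \mid D) = \frac{BF_{10} \, \pi_1}{BF_{10} \, \pi_1 + \pi_0} $$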
In model selection, the Bayes factor carries everything the data contribute to the comparison between the two models: because it does not depend on the prior model probabilities, the posterior odds under any choice of prior odds follow from a single multiplication, with no need to recompute the marginal likelihoods. This makes it valuable for reporting evidence, since it isolates the data's influence from prior beliefs about which model is true, although it remains sensitive to the parameter priors within each model.
A common default assumption in Bayes factor applications is equal prior probabilities ($ p(M_0) = p(M_1) = 1/2 $), which simplifies the posterior odds to equal the Bayes factor itself and the posterior probability to $ p(M_1 \mid D) = BF_{10} / (1 + BF_{10}) $. This assumption rests on the premise that the models are a priori equally plausible, often justified in exploratory analyses or when domain knowledge lacks strong preferences, though it can be sensitive to model complexity if not carefully considered.
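The update from prior to posterior model probability is a one-line calculation: convert the prior probability to odds, multiply by the Bayes factor, and convert back. The function name and the example value $ BF_{10} = 10 $ below are illustrative assumptions.

```python
def posterior_prob_m1(bf_10, prior_m1=0.5):
    """Posterior probability of M1 from BF10 and the prior probability of M1."""
    if not 0.0 < prior_m1 < 1.0:
        raise ValueError("prior_m1 must lie strictly between 0 and 1.")
    prior_odds = prior_m1 / (1.0 - prior_m1)
    posterior_odds = bf_10 * prior_odds  # posterior odds = BF10 x prior odds
    return posterior_odds / (1.0 + posterior_odds)

# The same Bayes factor under progressively more sceptical priors on M1.
for prior in (0.5, 0.1, 0.01):
    print(prior, round(posterior_prob_m1(10.0, prior), 4))
```

With equal prior probabilities the result reduces to $ BF_{10} / (1 + BF_{10}) $. A Bayes factor of 10 yields a posterior probability of about 0.91 in that case, but only about 0.09 when the prior probability of $ M_1 $ is 0.01.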
In contrast to non-Bayesian approaches, where likelihood ratios compare the models at their best-fitting (maximum likelihood) parameter values, the Bayes factor employs marginal likelihoods that integrate over parameter priors, providing a fuller evidential measure that accounts for parameter uncertainty within each model.
Computation Methods
Exact Calculation
Exact calculation of the Bayes factor is possible in cases where the marginal likelihoods under each model can be derived analytically or evaluated via direct numerical methods, particularly for models with low-dimensional parameter spaces or conjugate prior distributions. These approaches avoid the need for simulation-based approximations and provide precise values, though they are limited to relatively simple model structures.
In models employing conjugate priors, such as a normal likelihood with known variance and a normal prior on the mean, or a binomial likelihood with a beta prior, the marginal likelihoods admit closed-form expressions, enabling straightforward computation of the Bayes factor. For instance, consider testing a point null hypothesis $ H_0: \mu = \mu_0 $ against an alternative $ H_1: \mu \sim \mathcal{N}(\mu_0, \sigma_0^2) $ for data $ x_1, \dots, x_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2) $ with known $ \sigma^2 $. Writing $ \bar{x} $ for the sample mean and $ z = \sqrt{n} \, (\bar{x} - \mu_0) / \sigma $, the Bayes factor $ BF_{01} $ favoring the null is given by
$$ BF_{01} = \sqrt{1 + \frac{n \sigma_0^2}{\sigma^2}} \, \exp\left( -\frac{z^2}{2} \cdot \frac{n \sigma_0^2}{\sigma^2 + n \sigma_0^2} \right) $$
This formula arises from the ratio of the normal marginal likelihood under the null to the integrated likelihood under the alternative prior.[6] Similarly, for a binomial model testing $ H_0: p = p_0 $ against $ H_1: p \sim \text{Beta}(\alpha, \beta) $, the marginal likelihood under $ H_1 $ is the beta-binomial probability mass function, $ p(k | n, \alpha, \beta) = \binom{n}{k} \frac{ B(\alpha + k, \beta + n - k) }{ B(\alpha, \beta) } $, where $ k $ is the number of successes and $ B $ is the beta function, yielding an exact Bayes factor as the ratio to the null binomial probability.[6]
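Both conjugate cases can be sketched directly in Python. The function names and example numbers are hypothetical. The normal-mean closed form used here is obtained from the sampling distribution of the sample mean, $ \bar{x} \sim \mathcal{N}(\mu_0, \sigma^2/n) $ under $ H_0 $ and $ \bar{x} \sim \mathcal{N}(\mu_0, \sigma^2/n + \sigma_0^2) $ under $ H_1 $, and the code cross-checks it against the direct ratio of those two densities.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bf01_normal_known_variance(xbar, n, mu0, sigma2, sigma0_2):
    """Closed-form BF01 for H0: mu = mu0 vs H1: mu ~ N(mu0, sigma0_2)."""
    z2 = n * (xbar - mu0) ** 2 / sigma2
    shrink = n * sigma0_2 / (sigma2 + n * sigma0_2)
    return math.sqrt(1.0 + n * sigma0_2 / sigma2) * math.exp(-0.5 * z2 * shrink)

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bf01_binomial_beta(k, n, p0, alpha, beta):
    """BF01 for H0: p = p0 vs H1: p ~ Beta(alpha, beta); binomial coefficients cancel."""
    log_null = k * math.log(p0) + (n - k) * math.log(1.0 - p0)
    log_alt = log_beta(alpha + k, beta + n - k) - log_beta(alpha, beta)
    return math.exp(log_null - log_alt)

# Hypothetical numbers: the closed form should match the ratio of densities of xbar.
xbar, n, mu0, sigma2, sigma0_2 = 0.3, 25, 0.0, 1.0, 1.0
closed_form = bf01_normal_known_variance(xbar, n, mu0, sigma2, sigma0_2)
direct = normal_pdf(xbar, mu0, sigma2 / n) / normal_pdf(xbar, mu0, sigma2 / n + sigma0_2)
print(closed_form, direct)
print(bf01_binomial_beta(k=8, n=10, p0=0.5, alpha=1.0, beta=1.0))
```

Working on the log scale with `math.lgamma` avoids overflow in the beta functions when the counts are large.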
When analytical solutions are unavailable but the parameter dimensionality remains low (e.g., one or two parameters), numerical integration techniques such as Gaussian quadrature can evaluate the required integrals for the marginal likelihoods with high precision. These methods discretize the integral over the parameter space using carefully chosen nodes and weights to approximate the exact value, making them suitable for exact computation in feasible cases.[6] For slightly more complex low-dimensional settings, Laplace approximations provide near-exact results by expanding the integrand around its mode, though they rely on asymptotic assumptions for accuracy.[6]
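As a rough sketch of one-dimensional numerical integration, the example below evaluates a binomial marginal likelihood under a non-conjugate prior. It uses composite Simpson's rule rather than Gaussian quadrature so that it needs only the standard library. The triangular prior, the data, and the function names are assumptions chosen for illustration.

```python
import math

def simpson(f, a, b, intervals=1000):
    """Composite Simpson's rule on [a, b]; `intervals` must be even."""
    if intervals % 2:
        raise ValueError("intervals must be even.")
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

k, n = 8, 10  # hypothetical data: 8 successes in 10 trials

def likelihood(theta):
    """Binomial likelihood of the data at success probability theta."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)

def triangular_prior(theta):
    """Assumed non-conjugate prior on (0, 1), peaked at 0.5."""
    return 4.0 * theta if theta < 0.5 else 4.0 * (1.0 - theta)

m0 = likelihood(0.5)
m1_triangular = simpson(lambda t: likelihood(t) * triangular_prior(t), 0.0, 1.0)
bf_10 = m1_triangular / m0
print(f"p(D|M1) = {m1_triangular:.5f}, BF10 = {bf_10:.3f}")
```

With an even number of intervals the kink of the prior at 0.5 falls on a panel boundary, so the rule is not applied across the non-smooth point.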
For nested models, where the null model is a special case of the alternative (e.g., imposing a point restriction $ \theta = \theta_0 $), the Savage-Dickey density ratio offers an exact computational shortcut under specific prior conditions. The Bayes factor $ BF_{01} $ is then
$$ BF_{01} = \frac{p(\theta = \theta_0 \mid D, M_1)}{p(\theta = \theta_0 \mid M_1)} $$
provided the prior distribution for the nuisance parameters under $ M_1 $ matches that under $ M_0 $ when $ \theta \to \theta_0 $, and the posterior and prior densities are continuous at $ \theta_0 $. This ratio equals the ratio of marginal likelihoods without requiring full integration over the alternative model.[7]
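The Savage-Dickey ratio, the posterior density at $ \theta_0 $ divided by the prior density at $ \theta_0 $, both under $ M_1 $, can be checked in a case with no nuisance parameters: a binomial likelihood with a Beta prior, where the posterior is again a Beta distribution. The sketch below compares it with the ratio of marginal likelihoods; the function names and data are hypothetical.

```python
import math

def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    return math.exp((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x) - log_beta_fn(a, b))

def bf01_savage_dickey(k, n, theta0, alpha, beta):
    """Posterior density at theta0 over prior density at theta0, both under M1."""
    posterior = beta_pdf(theta0, alpha + k, beta + n - k)  # conjugate update
    prior = beta_pdf(theta0, alpha, beta)
    return posterior / prior

def bf01_marginal_ratio(k, n, theta0, alpha, beta):
    """Direct ratio p(D | M0) / p(D | M1) from the marginal likelihoods."""
    null = math.comb(n, k) * theta0**k * (1.0 - theta0) ** (n - k)
    alt = math.comb(n, k) * math.exp(log_beta_fn(alpha + k, beta + n - k) - log_beta_fn(alpha, beta))
    return null / alt

print(bf01_savage_dickey(8, 10, 0.5, 2.0, 3.0))
print(bf01_marginal_ratio(8, 10, 0.5, 2.0, 3.0))
```

The two functions agree up to floating-point error, which is the content of the identity in this simple setting.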
Software tools facilitate these methods for standard models. The R package BayesFactor combines analytical results with numerical integration and, where needed, Monte Carlo sampling with an adjustable number of iterations, to compute Bayes factors for basic designs, including one-sample t-tests (inference on a normal mean with unknown variance) and linear models.[8]
Approximations and Algorithms
Computing Bayes factors exactly becomes infeasible for complex, high-dimensional models where the marginal likelihood integral cannot be evaluated analytically. Monte Carlo methods provide scalable approximations by estimating the marginal likelihood through simulation. Importance sampling draws samples from a proposal distribution and reweights them to estimate the evidence; for instance, schemes tailored to mixture models use maximum likelihood estimates or Rao-Blackwellized dual sampling to mitigate bias from posterior mode exploration issues, enabling reliable Bayes factor computation in such settings.[9] The harmonic mean estimator, derived from posterior samples, inverts the identity $ \frac{1}{p(D)} = E_{\theta \mid D}\left[ \frac{1}{p(D \mid \theta)} \right] \approx \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p(D \mid \theta^{(i)})} $, where $ \theta^{(i)} $ are MCMC draws from the posterior, offering a simple yet variance-prone approach to marginal likelihoods for Bayes factors.[10]
Markov chain Monte Carlo (MCMC) techniques extend these approximations for more robust estimation in nested or non-nested models. Bridge sampling leverages samples from prior and posterior distributions to estimate the normalizing constant ratio via $ \frac{Z_1}{Z_2} = \frac{E_{p_2}[q_1(\theta) \, h(\theta)]}{E_{p_1}[q_2(\theta) \, h(\theta)]} $, where $ q_j $ are the unnormalized densities with normalizing constants $ Z_j $, $ p_j = q_j / Z_j $, and $ h $ bridges the distributions, yielding accurate Bayes factors with reduced variance compared to importance sampling alone.[11] Thermodynamic integration approximates the marginal likelihood by integrating the expected log-likelihood along a power posterior path $ p_t(\theta \mid D) \propto p(D \mid \theta)^t \, p(\theta) $, $ t \in [0, 1] $, often implemented with MCMC at discrete $ t $ levels; this method excels for comparing phylogenetic or cognitive models, providing stable Bayes factors even in high dimensions. Recent enhancements, such as differential evolution MCMC for thermodynamic integration, further improve efficiency by requiring fewer samples per path rung, achieving convergence 5-8 times faster than standard implementations.[12]
Nested sampling is another class of algorithms for approximating marginal likelihoods, particularly effective in high-dimensional spaces. It transforms the evidence integral into a one-dimensional integral over prior mass, using sequential sampling to estimate it efficiently without tuning, as implemented in tools like MultiNest or diffusive nested sampling. This method is popular in fields like cosmology and provides reliable Bayes factors for complex models.[13]
Information criteria offer asymptotic approximations to Bayes factors without simulation. The Bayesian information criterion (BIC) estimates $ -2 \ln p(D \mid M) \approx \mathrm{BIC} = -2 \hat{\ell} + k \ln n $, where $ \hat{\ell} $ is the maximized log-likelihood, $ k $ the number of parameters, and $ n $ the sample size, deriving from Laplace's method under a unit information prior; thus, $ BF_{10} \approx \exp\left( (\mathrm{BIC}_0 - \mathrm{BIC}_1) / 2 \right) $.[14] This approximation holds asymptotically for large $ n $ and fixed $ k $, assuming regularity conditions like identifiability and correct model specification, but falters in small samples or high dimensions where the unit information prior mismatches the true scenario, potentially biasing model selection.[6]
For large datasets, variational inference and integrated nested Laplace approximations (INLA) enable faster marginal likelihood estimates. Variational methods optimize a lower bound on the evidence, $ \ln p(D) \geq E_{q}[\ln p(D, \theta)] - E_{q}[\ln q(\theta)] $, approximating the posterior with a tractable $ q(\theta) $ to derive Bayes factors in factor analysis or mixture settings, though they may underestimate evidence due to the bound's conservatism.[15] INLA targets latent Gaussian models, combining Laplace approximations for conditional modes with numerical integration for hyperparameters to compute marginal posteriors and likelihoods efficiently; it supports Bayes factor estimation via model averaging in spatial or time-series contexts, scaling to thousands of observations without MCMC.[16]
Post-2020 advances refine these for broader applicability. Path sampling, an extension of thermodynamic integration, estimates evidence ratios by simulating paths between models, improving accuracy in non-nested comparisons for hydrological or evolutionary models.[17] Generalized harmonic mean estimators, such as the learnt variant, employ machine learning to optimize the importance proposal from posterior samples, reducing variance by orders of magnitude and enabling scalable Bayes factors in high-dimensional problems, outperforming traditional methods in speed and precision for cosmological and statistical applications.[18]
Examples and Applications
Basic Coin Flip Example
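Consider a simple hypothesis testing scenario involving coin flips to illustrate the Bayes factor. Suppose we observe data $ D $ consisting of 8 heads in 10 independent flips. We compare two models: $ M_0 $, the null hypothesis that the coin is fair with fixed bias $ \theta = 0.5 $; and $ M_1 $, the alternative hypothesis that the coin is biased with $ \theta $ following a uniform prior distribution Beta(1,1) on [0,1]. The marginal likelihood under $ M_0 $ is the binomial probability of the data given $ \theta = 0.5 $:
$$ p(D \mid M_0) = \binom{10}{8} (0.5)^{8} (0.5)^{2} = \frac{45}{1024} \approx 0.0439 $$
Under $ M_1 $, the marginal likelihood integrates the binomial likelihood over the prior:
$$ p(D \mid M_1) = \int_0^1 \binom{10}{8} \theta^{8} (1 - \theta)^{2} \, d\theta = \binom{10}{8} B(9, 3) = \frac{1}{11} \approx 0.0909 $$
where $ B $ is the beta function. The Bayes factor in favor of $ M_1 $ over $ M_0 $ is then
$$ BF_{10} = \frac{p(D \mid M_1)}{p(D \mid M_0)} = \frac{1/11}{45/1024} = \frac{1024}{495} \approx 2.07 $$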
This calculation shows how the Bayes factor quantifies the relative evidential support for the biased coin model.
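The arithmetic for 8 heads in 10 flips can be reproduced exactly with rational numbers. This is a small sketch using only the Python standard library; the helper `beta_fn_int` is a hypothetical name and relies on the identity $ B(a, b) = (a-1)! \, (b-1)! / (a+b-1)! $ for positive integers.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn_int(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

heads, flips = 8, 10
m0 = comb(flips, heads) * Fraction(1, 2) ** flips                   # fair coin
m1 = comb(flips, heads) * beta_fn_int(heads + 1, flips - heads + 1)  # uniform prior
bf_10 = m1 / m0
print(f"{m0} {m1} {bf_10} {float(bf_10):.3f}")  # prints 45/1024 1/11 1024/495 2.069
```

Keeping the quantities as fractions makes it easy to see that the uniform-prior marginal likelihood is $ 1/(n+1) $, here $ 1/11 $.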
According to Jeffreys' scale for interpreting Bayes factors, a value of $ BF_{10} $ between 1 and 3 provides "barely worth mentioning" support for the alternative hypothesis, in this case evidence for a biased coin.
To visualize the models, consider a plot of the prior and posterior distributions under $ M_1 $ alongside the point mass at $ \theta = 0.5 $ under $ M_0 $. The prior under $ M_1 $ is flat (uniform on [0,1]). The posterior under $ M_1 $ is Beta(9,3), which peaks at $ \theta = 0.8 $ and shifts mass toward higher values of $ \theta $ after observing 8 heads. The point mass under $ M_0 $ remains fixed at 0.5, contrasting the single value allowed by the fair-coin model with the spread of values allowed under the alternative. The relative heights of the predictive distributions at the observed data further illustrate why $ M_1 $ receives more support here.
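Because the coin-flip Bayes factor is known exactly, it also serves as a test bed for the approximation methods discussed under Approximations and Algorithms. The sketch below tries two of them under stated assumptions: a plain Monte Carlo estimate that averages the likelihood over draws from the uniform prior (importance sampling with the prior as proposal), and the BIC approximation with $ n = 10 $ flips, zero free parameters under the fair-coin model, and one under the biased-coin model. The seed and number of draws are arbitrary choices.

```python
import math
import random

heads, flips = 8, 10

def likelihood(theta):
    """Binomial likelihood of 8 heads in 10 flips at bias theta."""
    return math.comb(flips, heads) * theta**heads * (1.0 - theta) ** (flips - heads)

m0 = likelihood(0.5)
exact_bf_10 = (1.0 / (flips + 1)) / m0  # uniform prior gives p(D | M1) = 1 / (n + 1)

# 1) Monte Carlo: average the likelihood over draws from the uniform prior.
rng = random.Random(0)
draws = 200_000
m1_mc = sum(likelihood(rng.random()) for _ in range(draws)) / draws
mc_bf_10 = m1_mc / m0

# 2) BIC approximation: BF10 ~ exp((BIC_0 - BIC_1) / 2).
def bic(max_log_lik, n_params, n_obs):
    """Bayesian information criterion from a maximized log-likelihood."""
    return -2.0 * max_log_lik + n_params * math.log(n_obs)

theta_hat = heads / flips  # maximum likelihood estimate under the biased-coin model
bic_0 = bic(math.log(m0), 0, flips)
bic_1 = bic(math.log(likelihood(theta_hat)), 1, flips)
bic_bf_10 = math.exp((bic_0 - bic_1) / 2.0)

print(f"exact {exact_bf_10:.3f}, Monte Carlo {mc_bf_10:.3f}, BIC {bic_bf_10:.3f}")
```

Both approximations land close to the exact value in this one-parameter problem. The agreement of the BIC value should not be over-read, since the approximation is justified asymptotically and 10 flips is a small sample.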