Bayes factor

from Wikipedia

The Bayes factor is a ratio of the evidence (marginal likelihoods) of two competing statistical models, and is used to quantify the support for one model over the other.[1] The models in question can have a common set of parameters, such as a null hypothesis and an alternative, but this is not necessary; for instance, it could also be a non-linear model compared to its linear approximation. The Bayes factor can be thought of as a Bayesian analog to the likelihood-ratio test, although it uses the integrated (i.e., marginal) likelihood rather than the maximized likelihood. As such, both quantities only coincide under simple hypotheses (e.g., two specific parameter values).[2] Also, in contrast with null hypothesis significance testing, Bayes factors support evaluation of evidence in favor of a null hypothesis, rather than only allowing the null to be rejected or not rejected.[3]

Although conceptually simple, the computation of the Bayes factor can be challenging depending on the complexity of the model and the hypotheses.[4] Since closed-form expressions of the marginal likelihood are generally not available, numerical approximations based on MCMC samples have been suggested.[5] For certain special cases, simplified algebraic expressions can be derived; for instance, the Savage–Dickey density ratio in the case of a precise (equality constrained) hypothesis against an unrestricted alternative.[6][7] Another approximation, derived by applying Laplace's approximation to the integrated likelihoods, is known as the Bayesian information criterion (BIC);[8] in large data sets the Bayes factor will approach the BIC as the influence of the priors wanes. In small data sets, priors generally matter and must not be improper since the Bayes factor will be undefined if either of the two integrals in its ratio is not finite.
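As a rough illustration of the large-sample link to BIC mentioned above, a Bayes factor comparing model 1 to model 2 can be approximated as exp((BIC2 − BIC1)/2). The sketch below (Python; the function names and numbers are illustrative assumptions, not taken from the text) shows the arithmetic.

```python
import math

def bic(max_log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k*ln(n) - 2*ln(L_max)."""
    return n_params * math.log(n_obs) - 2.0 * max_log_lik

def approx_bayes_factor_12(bic_1: float, bic_2: float) -> float:
    """Large-sample approximation: BF_12 ≈ exp((BIC_2 - BIC_1) / 2)."""
    return math.exp((bic_2 - bic_1) / 2.0)

# Hypothetical fitted models: model 1 with no free parameters, model 2 with one
bic_1 = bic(max_log_lik=-5.12, n_params=0, n_obs=200)
bic_2 = bic(max_log_lik=-2.87, n_params=1, n_obs=200)
print(approx_bayes_factor_12(bic_1, bic_2))  # > 1 favours model 1, < 1 favours model 2
```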

Definition

The Bayes factor is the ratio of two marginal likelihoods; that is, the likelihoods of two statistical models integrated over the prior probabilities of their parameters.[9]

The posterior probability of a model M given data D is given by Bayes' theorem:

\Pr(M \mid D) = \frac{\Pr(D \mid M)\,\Pr(M)}{\Pr(D)}.

The key data-dependent term Pr(D | M) represents the probability that some data are produced under the assumption of the model M; evaluating it correctly is the key to Bayesian model comparison.

Given a model selection problem in which one wishes to choose between two models on the basis of observed data D, the plausibility of the two different models M1 and M2, parametrised by model parameter vectors θ1 and θ2, is assessed by the Bayes factor K given by

K = \frac{\Pr(D \mid M_1)}{\Pr(D \mid M_2)} = \frac{\int \Pr(\theta_1 \mid M_1)\,\Pr(D \mid \theta_1, M_1)\,d\theta_1}{\int \Pr(\theta_2 \mid M_2)\,\Pr(D \mid \theta_2, M_2)\,d\theta_2}.

When the two models have equal prior probability, so that Pr(M1) = Pr(M2), the Bayes factor is equal to the ratio of the posterior probabilities of M1 and M2. If instead of the Bayes factor integral, the likelihood corresponding to the maximum likelihood estimate of the parameter for each statistical model is used, then the test becomes a classical likelihood-ratio test. Unlike a likelihood-ratio test, this Bayesian model comparison does not depend on any single set of parameters, as it integrates over all parameters in each model (with respect to the respective priors). An advantage of the use of Bayes factors is that it automatically, and quite naturally, includes a penalty for including too much model structure.[10] It thus guards against overfitting. For models where an explicit version of the likelihood is not available or too costly to evaluate numerically, approximate Bayesian computation can be used for model selection in a Bayesian framework,[11] with the caveat that approximate-Bayesian estimates of Bayes factors are often biased.[12]
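To make the contrast between the integrated and the maximized likelihood concrete, the following sketch (Python, with a toy coin-flip comparison and illustrative names that are not part of the article) estimates each model's marginal likelihood by averaging the likelihood over draws from its prior, and compares the result with the plug-in maximum-likelihood value that a likelihood-ratio test would use.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
successes, trials = 60, 100  # toy data

# Model 1: success probability fixed at 1/2 (no free parameters)
marginal_m1 = binom.pmf(successes, trials, 0.5)

# Model 2: success probability q with a uniform prior on [0, 1];
# Monte Carlo estimate of the marginal likelihood E_prior[ p(D | q) ]
q_draws = rng.uniform(0.0, 1.0, size=100_000)
marginal_m2 = binom.pmf(successes, trials, q_draws).mean()

# Plug-in maximized likelihood for model 2 (what a likelihood-ratio test uses)
max_lik_m2 = binom.pmf(successes, trials, successes / trials)

# Integrating over the prior penalizes model 2's extra flexibility,
# so K is larger (more favourable to model 1) than the plug-in ratio.
print("Bayes factor K   =", marginal_m1 / marginal_m2)
print("Likelihood ratio =", marginal_m1 / max_lik_m2)
```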

Other approaches are:

Interpretation

A value of K > 1 means that M1 is more strongly supported by the data under consideration than M2. Note that classical hypothesis testing gives one hypothesis (or model) preferred status (the 'null hypothesis'), and only considers evidence against it. The fact that a Bayes factor can produce evidence for and not just against a null hypothesis is one of the key advantages of this analysis method.[13]

Harold Jeffreys gave a scale (Jeffreys' scale) for interpretation of K:[14]

K | dHart | bits | Strength of evidence
< 10^0 | < 0 | < 0 | Negative (supports M2)
10^0 to 10^(1/2) | 0 to 5 | 0 to 1.6 | Barely worth mentioning
10^(1/2) to 10^1 | 5 to 10 | 1.6 to 3.3 | Substantial
10^1 to 10^(3/2) | 10 to 15 | 3.3 to 5.0 | Strong
10^(3/2) to 10^2 | 15 to 20 | 5.0 to 6.6 | Very strong
> 10^2 | > 20 | > 6.6 | Decisive

The second column gives the corresponding weights of evidence in decihartleys (also known as decibans); bits are added in the third column for clarity. The table continues in the other direction, so that, for example, K < 10^(−2) is decisive evidence for M2.
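The conversions behind this table are mechanical (dHart = 10·log10 K, bits = log2 K), so the scale is easy to encode; the helper below (Python, written here for illustration and not part of the article) labels a Bayes factor K on Jeffreys' scale and reports the corresponding weights of evidence.

```python
import math

def jeffreys_label(k: float) -> str:
    """Classify a Bayes factor K (support for M1 over M2) on Jeffreys' scale."""
    if k < 1:
        return "Negative (supports M2)"
    log10_k = math.log10(k)
    if log10_k <= 0.5:
        return "Barely worth mentioning"
    if log10_k <= 1.0:
        return "Substantial"
    if log10_k <= 1.5:
        return "Strong"
    if log10_k <= 2.0:
        return "Very strong"
    return "Decisive"

def weights_of_evidence(k: float) -> tuple[float, float]:
    """Return (decihartleys, bits) for a Bayes factor K."""
    return 10.0 * math.log10(k), math.log2(k)

print(jeffreys_label(1.2), weights_of_evidence(1.2))  # 'Barely worth mentioning', ≈ (0.79 dHart, 0.26 bits)
```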

An alternative table, widely cited, is provided by Kass and Raftery (1995):[10]

log10 K | K | Strength of evidence
0 to 1/2 | 1 to 3.2 | Not worth more than a bare mention
1/2 to 1 | 3.2 to 10 | Substantial
1 to 2 | 10 to 100 | Strong
> 2 | > 100 | Decisive

According to I. J. Good, the just-noticeable difference of humans in their everyday life, when it comes to a change in their degree of belief in a hypothesis, is a factor of about 1.3, or 1 deciban, or 1/3 of a bit, or a shift from 1:1 to 5:4 in the odds ratio.[15]

Example

Suppose we have a random variable that produces either a success or a failure. We want to compare a model M1 where the probability of success is q = 1/2, and another model M2 where q is unknown and we take a prior distribution for q that is uniform on [0,1]. We take a sample of 200, and find 115 successes and 85 failures. The likelihood can be calculated according to the binomial distribution:

\Pr(115~\text{successes},\ 85~\text{failures} \mid q) = \binom{200}{115} q^{115} (1-q)^{85}.

Thus we have for M1

\Pr(X = 115 \mid M_1) = \binom{200}{115} \left(\tfrac{1}{2}\right)^{200} \approx 0.005956,

whereas for M2 we have

\Pr(X = 115 \mid M_2) = \int_0^1 \binom{200}{115} q^{115} (1-q)^{85} \, dq = \frac{1}{201} \approx 0.004975.

The ratio is then 1.2, which is "barely worth mentioning" even if it points very slightly towards M1.

A frequentist hypothesis test of M1 (here considered as a null hypothesis) would have produced a very different result. Such a test says that M1 should be rejected at the 5% significance level, since the probability of getting 115 or more successes from a sample of 200 if q = 1/2 is 0.02, and the two-tailed probability of getting a result as extreme as or more extreme than 115 is 0.04. Note that 115 is more than two standard deviations away from 100. Thus, whereas a frequentist hypothesis test would yield significant results at the 5% significance level, the Bayes factor hardly considers this to be an extreme result. Note, however, that a non-uniform prior (for example, one that reflects the expectation that the numbers of successes and failures are of the same order of magnitude) could result in a Bayes factor that is more in agreement with the frequentist hypothesis test.

A classical likelihood-ratio test would have found the maximum likelihood estimate for q, namely q̂ = 115/200 = 0.575, whence

\Pr(X = 115 \mid \hat{q} = 0.575) = \binom{200}{115} (0.575)^{115} (0.425)^{85} \approx 0.0570

(rather than averaging over all possible q). That gives a likelihood ratio of 0.1 and points towards M2.

M2 is a more complex model than M1 because it has a free parameter which allows it to model the data more closely. The ability of Bayes factors to take this into account is a reason why Bayesian inference has been put forward as a theoretical justification for and generalisation of Occam's razor, reducing Type I errors.[16]

On the other hand, the modern method of relative likelihood takes into account the number of free parameters in the models, unlike the classical likelihood ratio. The relative likelihood method could be applied as follows. Model M1 has 0 parameters, and so its Akaike information criterion (AIC) value is 2·0 − 2·ln(0.005956) ≈ 10.25. Model M2 has 1 parameter, and so its AIC value is 2·1 − 2·ln(0.0570) ≈ 7.73. Hence M1 is about exp((7.73 − 10.25)/2) ≈ 0.28 times as probable as M2 to minimize the information loss. Thus M2 is slightly preferred, but M1 cannot be excluded.
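The figures in this example can be reproduced directly. The sketch below (Python with SciPy; variable names are chosen here for illustration) recomputes the Bayes factor, the frequentist tail probability, the maximized likelihood ratio, and the AIC comparison.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import comb, betaln

successes, trials = 115, 200

# Marginal likelihood under M1 (q fixed at 1/2)
p_m1 = binom.pmf(successes, trials, 0.5)

# Marginal likelihood under M2 (uniform prior on q): C(200,115) * B(116, 86) = 1/201
p_m2 = comb(trials, successes) * np.exp(betaln(successes + 1, trials - successes + 1))

print(f"Bayes factor K        ≈ {p_m1 / p_m2:.2f}")   # ≈ 1.2

# One-tailed frequentist probability of 115 or more successes under q = 1/2
print(f"P(X >= 115 | q = 1/2) ≈ {binom.sf(successes - 1, trials, 0.5):.3f}")  # ≈ 0.02 (two-tailed ≈ 0.04)

# Classical likelihood ratio against the maximized likelihood under M2
p_mle = binom.pmf(successes, trials, successes / trials)
print(f"Likelihood ratio      ≈ {p_m1 / p_mle:.2f}")  # ≈ 0.10

# Relative likelihood via AIC = 2k - 2 ln(L)
aic_m1 = 2 * 0 - 2 * np.log(p_m1)
aic_m2 = 2 * 1 - 2 * np.log(p_mle)
print(f"exp((AIC2 - AIC1)/2)  ≈ {np.exp((aic_m2 - aic_m1) / 2):.2f}")  # ≈ 0.28, so M2 is slightly preferred
```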

See also

Statistical ratios

from Grokipedia
The Bayes factor is a key quantity in Bayesian statistics that compares the relative support for two competing hypotheses or models given observed data, defined as the ratio of the marginal likelihood of the data under one model to that under the other. It serves as an updating factor on prior odds: a Bayes factor B10 = 6, for instance, indicates that the data are six times more likely under the alternative hypothesis H1 than under the null hypothesis H0. Originating in the work of Harold Jeffreys in 1935, the concept built on earlier contributions from Dorothy Wrinch and J.B.S. Haldane in the 1920s and 1930s, with Jeffreys formalizing it as a tool for scientific inference in his influential book Theory of Probability. The term "Bayes factor" itself was coined later by Robert E. Kass and Adrian E. Raftery in their 1995 paper, which popularized its use for model selection and hypothesis testing. Unlike frequentist p-values, which only assess evidence against a null hypothesis, the Bayes factor provides a symmetric measure that can quantify evidence in favor of either hypothesis, distinguishing between absence of evidence and evidence of absence. It is particularly advantageous for comparing non-nested models and is robust to optional stopping in data collection, making it suitable for sequential experimental designs. Jeffreys proposed a heuristic scale to interpret Bayes factor magnitudes as strength of evidence: values between 1 and 3 indicate "anecdotal" support for H1, 3 to 10 offer "moderate" evidence, 10 to 30 provide "strong" evidence, 30 to 100 yield "very strong" evidence, and greater than 100 represent "extreme" evidence, with reciprocals applying for support of H0. This scale, while subjective, has been widely adopted and refined across many fields for model comparison tasks. Bayes factors often require numerical approximations due to the intractability of marginal likelihoods in complex models, but they remain central to Bayesian hypothesis testing and evidence accumulation.

Mathematical Foundations

Definition

The Bayes factor is a statistical measure used in Bayesian inference to quantify the relative evidence provided by observed data for one model over another competing model. It was introduced by Harold Jeffreys as a tool for objective hypothesis testing within a Bayesian framework. Mathematically, the Bayes factor in favor of model M1 over model M2 given data D, denoted BF10, is defined as the ratio of the marginal likelihoods under each model:

BF_{10} = \frac{p(D \mid M_1)}{p(D \mid M_2)}

The marginal likelihood p(D | M) for a model M with parameters θ is obtained by integrating the likelihood over the prior distribution of the parameters:

p(D \mid M) = \int p(D \mid \theta, M) \, p(\theta \mid M) \, d\theta

This integration averages the model's predictive performance across all plausible parameter values weighted by the prior, providing a summary of the model's overall fit to the data, independent of specific parameter estimates. A common notation convention is BF01 = 1/BF10, which reverses the comparison to favor M0 (often the null model) over M1. The Bayes factor plays a central role in Bayesian model comparison by directly comparing the predictive adequacy of competing models based on the observed data, facilitating decisions about model selection without relying on point estimates or frequentist criteria.
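As a concrete illustration of the defining integral, the sketch below (Python with SciPy; the Gaussian toy model, prior choices, and names are illustrative assumptions rather than part of the text) computes each model's marginal likelihood by numerical quadrature over its prior and forms BF10.

```python
import numpy as np
from scipy import integrate, stats

data = np.array([0.8, 1.3, -0.2, 1.9, 0.6])  # toy observations with known noise sd = 1

def likelihood(mu: float) -> float:
    """p(D | mu): independent normal observations with unit variance."""
    return float(np.prod(stats.norm.pdf(data, loc=mu, scale=1.0)))

# Model M0: mu fixed at 0, so the marginal likelihood is just the likelihood at mu = 0
p_d_m0 = likelihood(0.0)

# Model M1: mu unknown with prior mu ~ N(0, 2^2); integrate likelihood * prior over mu
prior = lambda mu: stats.norm.pdf(mu, loc=0.0, scale=2.0)
p_d_m1, _ = integrate.quad(lambda mu: likelihood(mu) * prior(mu), -10.0, 10.0)

print(f"BF_10 = {p_d_m1 / p_d_m0:.2f}")  # > 1 favours the free-mean model, < 1 favours mu = 0
```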

Relationship to Bayes' Theorem

The Bayes factor emerges directly from Bayes' theorem as a key component in updating the probabilities of competing models based on observed data. Bayes' theorem states that the posterior probability of a model Mi given data D is P(Mi | D) ∝ P(D | Mi)·P(Mi), where P(D | Mi) is the marginal likelihood under the model and P(Mi) is the prior probability. For two models M1 and M2, the ratio of posterior model probabilities, known as the posterior odds, is therefore

\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(M_1)}{P(M_2)} \times \frac{P(D \mid M_1)}{P(D \mid M_2)},

with the second factor on the right-hand side defining the Bayes factor BF12. This formulation demonstrates that the Bayes factor serves as a multiplier that adjusts the prior odds to yield the posterior odds, encapsulating how the data shift belief between models. By isolating P(D | M1)/P(D | M2), the Bayes factor measures the relative support for each model provided solely by the data, disentangling this evidential contribution from subjective prior beliefs about the models' plausibility. This separation allows the Bayes factor to function as an objective summary of the data's evidential value within the Bayesian updating process, applicable across diverse modeling contexts. The derivation of the Bayes factor highlights a fundamental distinction in handling point-null hypotheses versus composite models. Under a point-null hypothesis, such as M0: θ = θ0, the marginal likelihood P(D | M0) simplifies to the likelihood evaluated directly at the fixed value, as there is no parameter uncertainty to integrate over. In contrast, for a composite model M1 with parameters varying over a continuous space, P(D | M1) requires integrating the likelihood over a prior distribution on the parameters to average out uncertainty, as previously outlined in the definition of the marginal likelihood. This difference affects the computational form of the Bayes factor but preserves its role in the posterior odds equation. Harold Jeffreys pioneered the application of the Bayes factor within this framework of Bayesian inference in his 1939 monograph Theory of Probability (first edition).

Interpretation

Evidence Scales

The interpretation of the Bayes factor (BF) relies on standardized scales that categorize its magnitude into qualitative levels of evidence for one model (say, the alternative M1) over another (say, the null M0). These scales provide a framework for assessing evidential strength, though they are not universally fixed. A seminal scale was proposed by Harold Jeffreys, which divides BF values into grades based on orders of magnitude, emphasizing decisive evidence for large values. Jeffreys' scale, as commonly referenced, is as follows:
BF10 | Evidence against M0
> 100 | Decisive
30–100 | Very strong
10–30 | Strong
3–10 | Substantial
1–3 | Barely worth mentioning
This classification interprets BF10 > 1 as favoring M1 over M0, with the strength increasing as the value grows; reciprocally, BF10 < 1 (or BF01 > 1) supports M0. Kass and Raftery later modified this scale to align more closely with logarithmic transformations of the Bayes factor, adjusting thresholds for practicality in empirical applications and incorporating a deviance-like measure (2 ln(BF)). Their revised scale is:
BF10 | 2 ln(BF10) | Evidence against M0
> 150 | > 10 | Very strong
20–150 | 6–10 | Strong
3–20 | 2–6 | Positive
1–3 | 0–2 | Barely worth mentioning
This adjustment extends the "very strong" category to higher values while broadening the "positive" range, facilitating interpretation in applied contexts. Both scales maintain the directional guideline that BF10 > 1 indicates data more compatible with M1 than M0, with the ratio quantifying the relative evidential support. Despite their utility, these thresholds are inherently arbitrary, serving as rough guides rather than strict cutoffs, and have varied across implementations (e.g., some adaptations use "extreme" instead of "decisive" for BF > 100). Moreover, Bayes factors are sensitive to model specification, as changes in how competing models are parameterized or nested can substantially alter the marginal likelihoods and thus the BF value, underscoring the need for careful model formulation.
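Because both scales are simple threshold rules, they can be encoded directly; the helper below (Python, illustrative only) returns the Kass and Raftery category together with the 2 ln(BF) value it is based on.

```python
import math

def kass_raftery_label(bf_10: float) -> tuple[float, str]:
    """Return (2*ln(BF10), evidence category against M0) following Kass and Raftery (1995)."""
    two_ln_bf = 2.0 * math.log(bf_10)
    if bf_10 > 150:
        label = "Very strong"
    elif bf_10 > 20:
        label = "Strong"
    elif bf_10 > 3:
        label = "Positive"
    elif bf_10 > 1:
        label = "Barely worth mentioning"
    else:
        label = "Supports M0 (interpret 1/BF10 instead)"
    return two_ln_bf, label

print(kass_raftery_label(25.0))  # (≈ 6.4, 'Strong')
```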

Posterior Odds Connection

The Bayes factor connects directly to posterior odds through Bayes' theorem applied to model comparison. Specifically, the posterior odds in favor of model M1 over model M2 given data D are obtained by multiplying the prior odds by the Bayes factor:

\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = BF_{10} \times \frac{P(M_1)}{P(M_2)},

where BF10 is the Bayes factor comparing M1 to M2. This relationship highlights the Bayes factor's role as the multiplicative update factor representing the evidence contributed solely by the data, independent of prior beliefs. Expanding this to posterior probabilities, let π1 = P(M1) and π2 = P(M2) = 1 − π1; then

P(M_1 \mid D) = \frac{BF_{10} \, \pi_1}{BF_{10} \, \pi_1 + \pi_2}.

In Bayesian model comparison, when prior probabilities are fixed, the Bayes factor serves as a summary of the evidential content of the data, allowing direct quantification of how the observed data shift belief between competing models without needing to recompute full posteriors for each prior adjustment. This makes it particularly valuable for objective comparisons, as it isolates the data's influence while priors handle subjective elements. A common default assumption in Bayes factor applications is equal prior probabilities (π1 = π2 = 0.5), which simplifies the posterior odds to equal the Bayes factor itself and the posterior probability to

P(M_1 \mid D) = \frac{BF_{10}}{1 + BF_{10}}.

This assumption rests on the premise that the models are a priori equally plausible, often justified in exploratory analyses or when domain knowledge lacks strong preferences, though it can be sensitive to model complexity if not carefully considered. In contrast to non-Bayesian approaches, where likelihood ratios compare point estimates of the parameters under each model, the Bayes factor employs marginal likelihoods that integrate over parameter priors, providing a fuller evidential measure that accounts for model complexity.
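The prior-to-posterior update described here is a one-line computation; this small Python sketch (with illustrative names) converts a Bayes factor and a prior probability for M1 into posterior odds and a posterior model probability.

```python
def posterior_from_bf(bf_10: float, prior_m1: float = 0.5) -> tuple[float, float]:
    """Return (posterior odds M1:M2, posterior probability of M1) given BF10 and P(M1)."""
    prior_m2 = 1.0 - prior_m1
    posterior_odds = bf_10 * (prior_m1 / prior_m2)                       # prior odds times the Bayes factor
    posterior_prob_m1 = bf_10 * prior_m1 / (bf_10 * prior_m1 + prior_m2)
    return posterior_odds, posterior_prob_m1

print(posterior_from_bf(6.0))                 # equal priors: odds 6.0, P(M1 | D) ≈ 0.857
print(posterior_from_bf(6.0, prior_m1=0.2))   # sceptical prior: odds 1.5, P(M1 | D) = 0.6
```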

Computation Methods

Exact Calculation

Exact calculation of the Bayes factor is possible in cases where the marginal likelihoods under each model can be derived analytically or evaluated via direct numerical methods, particularly for models with low-dimensional parameter spaces or conjugate distributions. These approaches avoid the need for simulation-based approximations and provide precise values, though they are limited to relatively simple model structures. In models employing conjugate priors, such as the normal distribution with known variance or the binomial model with a beta prior, the marginal likelihoods admit closed-form expressions, enabling straightforward computation of the Bayes factor. For instance, consider testing a point null hypothesis H0: μ = μ0 against an alternative H1: μ ~ N(μ0, σ0²) for data x1, …, xn drawn i.i.d. from N(μ, σ²) with known σ². The Bayes factor BF01 favoring the null is given by

BF_{01} = \sqrt{\frac{\sigma_0^2 + \sigma^2/n}{\sigma^2/n}} \, \exp\!\left( -\frac{n (\bar{x} - \mu_0)^2 \, \sigma_0^2}{2 \sigma^2 \left( \sigma_0^2 + \sigma^2/n \right)} \right).
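The closed form above is straightforward to implement and to check against brute-force integration; the sketch below (Python with SciPy; the simulated data and names are illustrative assumptions) does both for the same normal point-null setup.

```python
import numpy as np
from scipy import integrate, stats

def bf01_normal_point_null(x: np.ndarray, mu0: float, sigma: float, sigma0: float) -> float:
    """Closed-form BF01 for H0: mu = mu0 versus H1: mu ~ N(mu0, sigma0^2), with known sigma."""
    n, xbar = len(x), float(np.mean(x))
    s2n = sigma**2 / n
    return np.sqrt((sigma0**2 + s2n) / s2n) * np.exp(
        -n * (xbar - mu0) ** 2 * sigma0**2 / (2.0 * sigma**2 * (sigma0**2 + s2n))
    )

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=30)  # toy data with known sigma = 1
mu0, sigma, sigma0 = 0.0, 1.0, 1.0

# Numerical check: marginal densities of the sufficient statistic x-bar under H0 and H1
xbar, se = float(np.mean(x)), sigma / np.sqrt(len(x))
p_h0 = stats.norm.pdf(xbar, loc=mu0, scale=se)
p_h1, _ = integrate.quad(
    lambda mu: stats.norm.pdf(xbar, loc=mu, scale=se) * stats.norm.pdf(mu, loc=mu0, scale=sigma0),
    -10.0, 10.0,
)

print(bf01_normal_point_null(x, mu0, sigma, sigma0))  # closed form
print(p_h0 / p_h1)                                    # numerical integration should agree
```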