Beta distribution

from Grokipedia
The Beta distribution is a continuous probability distribution defined on the interval (0, 1) and parameterized by two positive shape parameters, commonly denoted $\alpha > 0$ and $\beta > 0$, which control its shape and concentration around the mean. It serves as a flexible model for bounded random variables, such as proportions, probabilities, or fractions, and is particularly useful due to its conjugate prior properties in Bayesian inference for binomial likelihoods. The distribution arises naturally from the normalization of the product of two power functions and is closely related to the Beta function, $B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$, which appears in its normalizing constant.

The probability density function of the Beta distribution is given by
$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 < x < 1.$$
Its mean is $\mu = \frac{\alpha}{\alpha+\beta}$ and its variance is $\sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$, with the mode at $\frac{\alpha-1}{\alpha+\beta-2}$ for $\alpha > 1$, $\beta > 1$. These moments highlight the distribution's ability to produce U-shaped, J-shaped, unimodal, or uniform densities depending on the parameter values: for example, when $\alpha = \beta = 1$ it reduces to the uniform distribution on (0, 1); equal $\alpha = \beta > 1$ yields symmetry around 0.5; and unequal values skew it toward 0 or 1. The Beta distribution also generalizes to forms with location and scale parameters, shifting and stretching the support to any finite interval (a, b).

In statistical applications, the Beta distribution is widely employed as a prior for the success probability in binomial or Bernoulli models, enabling closed-form posterior updates in Bayesian analysis. It models task durations in project management via the PERT method, where parameters are estimated from optimistic, most likely, and pessimistic times to capture uncertainty. Additionally, it appears in reliability engineering for proportions of defective items, in resource assessment for probabilistic evaluations within bounded intervals, and as a component in deriving other distributions such as the F and t via transformations. Its computational tractability in software like R and its role in Dirichlet-multinomial models further underscore its importance in modern data analysis.
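The moment formulas quoted above can be sanity-checked with a quick Monte Carlo experiment; the sketch below uses only the Python standard library (`random.betavariate` is the stdlib Beta sampler; the helper names are ours):

```python
import random

def beta_mean(a, b):
    """Mean of Beta(a, b): a / (a + b)."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of Beta(a, b): ab / ((a + b)^2 (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

random.seed(0)
a, b = 2.0, 5.0
samples = [random.betavariate(a, b) for _ in range(200_000)]
mc_mean = sum(samples) / len(samples)
mc_var = sum((x - mc_mean) ** 2 for x in samples) / len(samples)

# The empirical moments agree with the closed forms to Monte Carlo error.
print(abs(mc_mean - beta_mean(a, b)) < 0.01)
print(abs(mc_var - beta_variance(a, b)) < 0.001)
```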

Definitions

Probability density function

The Beta distribution is a continuous probability distribution defined on the interval [0, 1] with two positive shape parameters $\alpha > 0$ and $\beta > 0$. The probability density function of a Beta-distributed random variable $X \sim \text{Beta}(\alpha, \beta)$ is
$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1,$$
with $f(x; \alpha, \beta) = 0$ otherwise. Here, $B(\alpha, \beta)$ denotes the Beta function, which acts as the normalizing constant ensuring that the density integrates to 1 over [0, 1]. The Beta function is defined by the integral $B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$ and admits an alternative expression in terms of the Gamma function:
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$
At the boundaries, the density exhibits singular behavior: when $\alpha < 1$ it approaches infinity as $x \to 0^+$, and when $\beta < 1$ it approaches infinity as $x \to 1^-$. In the special case $\alpha = \beta = 1$, the density simplifies to the uniform distribution on [0, 1], with $f(x; 1, 1) = 1$ for $0 \leq x \leq 1$.
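As a concrete illustration, the density and its Gamma-function normalizing constant can be evaluated directly; a minimal Python sketch using `math.lgamma` for numerical stability (the function name `beta_pdf` is ours):

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x.

    B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), so
    log f(x) = (a - 1) log x + (b - 1) log(1 - x) - log B(a, b),
    computed in log space to avoid overflow for large parameters.
    """
    if not 0.0 < x < 1.0:
        return 0.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)

# Beta(1, 1) is the uniform distribution: the density is 1 everywhere on (0, 1).
print(beta_pdf(0.3, 1.0, 1.0))  # → 1.0
```

For a quick second check, Beta(2, 2) has $B(2, 2) = 1/6$, so $f(0.5) = 0.25 \cdot 6 = 1.5$.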

Cumulative distribution function

The cumulative distribution function (CDF) of the Beta distribution with shape parameters $\alpha > 0$ and $\beta > 0$ is given by
$$F(x; \alpha, \beta) = \int_0^x f(t; \alpha, \beta)\,dt = I_x(\alpha, \beta),$$
for $0 \leq x \leq 1$, where $f(t; \alpha, \beta)$ is the probability density function and $I_x(\alpha, \beta)$ denotes the regularized incomplete beta function. The regularized incomplete beta function is defined as
$$I_x(\alpha, \beta) = \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)},$$
with the incomplete beta function $B(x; \alpha, \beta) = \int_0^x t^{\alpha-1}(1-t)^{\beta-1}\,dt$ and the (complete) beta function $B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$. This representation follows directly from the fundamental theorem of calculus, as the PDF is the derivative of the CDF. The CDF satisfies $F(0; \alpha, \beta) = 0$ and $F(1; \alpha, \beta) = 1$, and is strictly increasing on $[0, 1]$ because the PDF is positive and continuous on $(0, 1)$ for $\alpha > 0$ and $\beta > 0$.

No closed-form expression exists in terms of elementary functions, so numerical evaluation is required. The incomplete beta function can be computed using series expansions, such as the hypergeometric representation
$$B_x(\alpha, \beta) = \frac{x^\alpha}{\alpha}\,{}_2F_1(\alpha, 1-\beta; \alpha+1; x),$$
where ${}_2F_1$ is the Gauss hypergeometric function, or via continued fractions for efficient convergence in certain regions. Specifically, the continued fraction form for the regularized version converges rapidly when $x < (\alpha+1)/(\alpha+\beta+2)$, expressed as
$$I_x(\alpha, \beta) = \frac{x^\alpha (1-x)^\beta}{\alpha B(\alpha, \beta)}\, \cfrac{1}{1 + \cfrac{d_1}{1 + \cfrac{d_2}{1 + \ddots}}},$$
with partial numerators $d_{2m} = \frac{m(\beta-m)x}{(\alpha+2m-1)(\alpha+2m)}$ and $d_{2m+1} = -\frac{(\alpha+m)(\alpha+\beta+m)x}{(\alpha+2m)(\alpha+2m+1)}$. These methods ensure accurate computation for statistical applications involving the Beta distribution.
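The continued fraction above is straightforward to implement with the modified Lentz method; a sketch under the stated coefficients (function names are ours, and the symmetry $I_x(\alpha, \beta) = 1 - I_{1-x}(\beta, \alpha)$ is used outside the fast-convergence region):

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction 1/(1 + d1/(1 + d2/(1 + ...))) via modified Lentz."""
    tiny = 1e-30
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap          # 1 + d_1, with d_1 = -(a + b) x / (a + 1)
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even term: d_{2m} = m (b - m) x / ((a + 2m - 1)(a + 2m))
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # odd term: d_{2m+1} = -(a + m)(a + b + m) x / ((a + 2m)(a + 2m + 1))
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # front factor x^a (1 - x)^b / B(a, b), computed in log space
    log_front = (a * math.log(x) + b * math.log(1.0 - x)
                 - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # I_x(a,b) = 1 - I_{1-x}(b,a)
```

For Beta(2, 2) the CDF has the elementary form $3x^2 - 2x^3$, which gives a direct check: `reg_inc_beta(0.25, 2, 2)` should be close to 0.15625.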

Parameterizations

The Beta distribution is most commonly parameterized using two shape parameters, denoted $\alpha > 0$ and $\beta > 0$, which control the shape of the distribution on the support interval $[0, 1]$. In this standard form, the probability density function is given by
$$f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1,$$
where $B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the beta function. These parameters allow the distribution to model a wide range of shapes, from uniform ($\alpha = \beta = 1$) to highly skewed forms, and make it the conjugate prior for binomial likelihoods in Bayesian inference.

An alternative parameterization expresses the shape parameters in terms of the mean $\mu \in (0, 1)$ and variance $\sigma^2 < \mu(1-\mu)$. Here, $\alpha = \mu\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right)$ and $\beta = (1-\mu)\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right)$. This form is particularly useful when specifying the distribution based on moments, such as in project management or simulation studies where mean and variability are elicited directly.

Closely related is the precision parameterization, which uses the mean $\mu \in (0, 1)$ and a precision parameter $\phi > 0$ (representing the total "sample size" or concentration). In this setup, $\alpha = \mu\phi$ and $\beta = (1-\mu)\phi$, yielding variance $\sigma^2 = \frac{\mu(1-\mu)}{\phi+1}$. Higher $\phi$ values concentrate the distribution around $\mu$, making it suitable for modeling proportions with varying reliability, as in group-based trajectory models or beta regression.

The four-parameter generalization extends the standard Beta to an arbitrary interval $[a, b]$ with $a < b$, incorporating location $a$ and scale $b - a$ alongside shape parameters $\alpha > 0$ and $\beta > 0$. The probability density function becomes
$$f(x; \alpha, \beta, a, b) = \frac{(x-a)^{\alpha-1}(b-x)^{\beta-1}}{B(\alpha, \beta)\,(b-a)^{\alpha+\beta-1}}, \quad a < x < b.$$
This form, often called the four-parameter Beta (or, in project evaluation and review technique applications, the PERT distribution), models bounded variables like task durations by shifting and scaling the standard Beta. In PERT, $\alpha$ and $\beta$ are sometimes further reparameterized using the mode or mean to align with optimistic, most likely, and pessimistic estimates.

A niche parameterization arises in the context of order statistics from uniform samples. The $k$-th order statistic in a sample of size $n$ from a Uniform$[0, 1]$ distribution follows a Beta$(k, n-k+1)$ distribution, providing a direct link to sample quantiles. This form highlights the Beta's role in non-parametric statistics and spacing distributions, where the parameters reflect sample size and rank.
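The conversions between these parameterizations are one-liners; a sketch (function names are ours) mapping (mean, variance) and (mean, precision) specifications back to the shape parameters:

```python
def shape_from_mean_var(mu, var):
    """(mean, variance) -> (alpha, beta); requires 0 < var < mu * (1 - mu)."""
    if not (0.0 < mu < 1.0 and 0.0 < var < mu * (1.0 - mu)):
        raise ValueError("need 0 < mu < 1 and 0 < var < mu * (1 - mu)")
    nu = mu * (1.0 - mu) / var - 1.0   # common factor mu(1 - mu)/sigma^2 - 1
    return mu * nu, (1.0 - mu) * nu

def shape_from_mean_precision(mu, phi):
    """(mean, precision) -> (alpha, beta) = (mu * phi, (1 - mu) * phi)."""
    return mu * phi, (1.0 - mu) * phi

# Round trip: Beta(2, 5) has mean 2/7 and variance 10/392.
alpha, beta = shape_from_mean_var(2 / 7, 10 / 392)
print(round(alpha, 6), round(beta, 6))  # → 2.0 5.0
```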

Properties

Mode

The mode of the Beta distribution, which represents the value of $x$ in $[0, 1]$ that maximizes the probability density function and serves as a measure of central tendency, is given by
$$m = \frac{\alpha-1}{\alpha+\beta-2}$$
when $\alpha > 1$ and $\beta > 1$. This formula arises from finding the critical point of the density by taking the derivative and setting it to zero. Specifically, the log-density is $\log f(x) = C + (\alpha-1)\log x + (\beta-1)\log(1-x)$, where $C$ is a constant; differentiating yields
$$\frac{d}{dx}\log f(x) = \frac{\alpha-1}{x} - \frac{\beta-1}{1-x} = 0,$$
which simplifies to the mode expression above. When $\alpha \leq 1$ and $\beta > 1$, the mode is at the boundary $m = 0$, as the density is non-increasing over $[0, 1]$. Conversely, if $\beta \leq 1$ and $\alpha > 1$, the mode is at $m = 1$. The distribution is thus unimodal when at least one of $\alpha$ or $\beta$ is at least 1: in the interior if both exceed 1, or at the corresponding boundary otherwise. It is bimodal, with modes at both boundaries, when both parameters are less than 1. In the special case $\alpha = \beta = 1$, the distribution is uniform on $[0, 1]$ and every point is equally likely, so no unique mode exists. The location of the mode shifts toward 1 as $\alpha$ increases relative to $\beta$, reflecting greater concentration of probability mass near the upper endpoint of the support.
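The case analysis above translates directly into code; a sketch (the `beta_mode` helper is our name):

```python
def beta_mode(a, b):
    """Mode(s) of Beta(a, b), following the case analysis in the text.

    Returns a float for a unique mode, the pair (0.0, 1.0) in the
    bimodal case, and None for the uniform case (no unique mode).
    """
    if a > 1 and b > 1:
        return (a - 1) / (a + b - 2)   # interior critical point
    if a == 1 and b == 1:
        return None                    # uniform: every point equally likely
    if a < 1 and b < 1:
        return (0.0, 1.0)              # density unbounded at both endpoints
    if a <= 1 and b >= 1:
        return 0.0                     # density non-increasing on [0, 1]
    return 1.0                         # density non-decreasing on [0, 1]

print(beta_mode(2, 5))      # → 0.2
print(beta_mode(0.5, 0.5))  # → (0.0, 1.0)
```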

Median

The median of a Beta distribution with shape parameters $\alpha > 0$ and $\beta > 0$, denoted $m(\alpha, \beta)$, is defined as the value satisfying $F(m(\alpha, \beta); \alpha, \beta) = \frac{1}{2}$, where $F(x; \alpha, \beta)$ is the cumulative distribution function given by the regularized incomplete beta function $I_x(\alpha, \beta)$. No closed-form expression for the median exists in general, except in special cases such as when $\alpha = 1$ or $\beta = 1$, or when the distribution is symmetric; it is typically computed numerically by inverting the CDF. For the symmetric case $\alpha = \beta$, the median is exactly $m(\alpha, \alpha) = \frac{1}{2}$, coinciding with both the mean and the mode. In skewed Beta distributions ($\alpha \neq \beta$), the median provides a robust measure of central tendency, being less sensitive to the asymmetry than the mean, which is pulled toward the longer tail. This follows the mode-median-mean inequality: for right-skewed distributions ($\alpha < \beta$), mode $\leq$ median $\leq$ mean, and the reverse for left-skewed cases.

A useful closed-form approximation for the median, valid for $\alpha, \beta > 1$, is
$$m(\alpha, \beta) \approx \frac{\alpha - \frac{1}{3}}{\alpha + \beta - \frac{2}{3}}.$$
This approximation arises as a refinement of the mode formula and exhibits relative errors below 4% for $\alpha, \beta \geq 1$, improving to under 1% for $\alpha, \beta \geq 2$, with errors decreasing as the parameters increase. For large $\alpha + \beta$ it converges closely to the true median and outperforms simpler alternatives like the mean. Numerical computation of the median often employs root-finding algorithms such as bisection or Newton-Raphson iteration on the equation $I_x(\alpha, \beta) - \frac{1}{2} = 0$, leveraging efficient implementations of the incomplete beta function. These methods are reliable in practice, especially in statistical software where the inverse CDF is a standard routine.
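The bisection approach and the closed-form approximation can be compared directly. The self-contained sketch below approximates the CDF with Simpson's rule rather than a library incomplete-beta routine; all function names are ours, and the numerical CDF is only intended for $\alpha, \beta \geq 1$:

```python
import math

def _beta_cdf(x, a, b, n=2000):
    """Numerical CDF of Beta(a, b) by composite Simpson's rule (a, b >= 1)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    h = x / n
    s = pdf(0.0) + pdf(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return s * h / 3.0

def beta_median(a, b, tol=1e-10):
    """Median by bisection on F(m) = 1/2."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta_median_approx(a, b):
    """Closed-form approximation (a - 1/3) / (a + b - 2/3), for a, b > 1."""
    return (a - 1.0 / 3.0) / (a + b - 2.0 / 3.0)

# Symmetric case: both methods give 1/2; skewed case: they agree closely.
print(abs(beta_median(3, 3) - 0.5) < 1e-6)
print(abs(beta_median(2, 5) - beta_median_approx(2, 5)) < 0.005)
```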

Mean

The expected value, or mean, of a Beta-distributed random variable $X \sim \text{Beta}(\alpha, \beta)$ with shape parameters $\alpha > 0$ and $\beta > 0$ is given by
$$\mu = \mathbb{E}[X] = \frac{\alpha}{\alpha+\beta}.$$
This formula arises from the first moment of the distribution. Specifically, $\mathbb{E}[X] = \int_0^1 x f(x; \alpha, \beta)\,dx$, where $f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ is the probability density function and $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the beta function. The integral evaluates to
$$\mathbb{E}[X] = \frac{B(\alpha+1, \beta)}{B(\alpha, \beta)} = \frac{\Gamma(\alpha+1)\Gamma(\beta)/\Gamma(\alpha+\beta+1)}{\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)}.$$
Applying the Gamma function recurrence $\Gamma(z+1) = z\Gamma(z)$ twice yields $\frac{\Gamma(\alpha+1)}{\Gamma(\alpha)} = \alpha$ and $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+1)} = \frac{1}{\alpha+\beta}$, simplifying the expression to $\frac{\alpha}{\alpha+\beta}$.

The mean $\mu$ is interpreted as the long-run average proportion or success probability in processes modeled by the Beta distribution, and it always falls in the interval (0, 1) for finite positive parameters. In Bayesian analysis, the Beta distribution acts as a conjugate prior for the binomial likelihood: the prior mean $\frac{\alpha}{\alpha+\beta}$ updates to the posterior mean $\frac{\alpha+s}{\alpha+\beta+n}$ after observing $s$ successes in $n$ trials, providing a weighted balance between prior belief and data. When the parameters are known, this mean is the exact central tendency; in Bayesian estimation, the posterior mean is the Bayes estimator under squared error loss.
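The conjugate-update rule for the posterior mean is easy to demonstrate; a sketch (names and the specific numbers are ours) with a Beta(2, 2) prior and 7 successes in 10 trials:

```python
def posterior_params(alpha, beta, successes, trials):
    """Beta-binomial conjugacy: prior Beta(alpha, beta) plus s successes
    in n trials gives posterior Beta(alpha + s, beta + n - s)."""
    return alpha + successes, beta + trials - successes

def beta_mean(a, b):
    """Mean of Beta(a, b): a / (a + b)."""
    return a / (a + b)

a0, b0 = 2, 2                      # prior: mean 0.5, pseudo-count of 4
a1, b1 = posterior_params(a0, b0, successes=7, trials=10)
print(a1, b1)                      # → 9 5
print(beta_mean(a1, b1))           # posterior mean 9/14, between prior 0.5 and MLE 0.7
```

The posterior mean $(\alpha + s)/(\alpha + \beta + n)$ is a weighted average of the prior mean and the sample proportion, with weights $\alpha + \beta$ and $n$.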

Variance

The variance of a Beta-distributed random variable $X \sim \operatorname{Beta}(\alpha, \beta)$ with shape parameters $\alpha > 0$ and $\beta > 0$ is
$$\operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
This formula quantifies the dispersion around the mean $\mu = \alpha/(\alpha+\beta)$, and it can equivalently be expressed as $\operatorname{Var}(X) = \mu(1-\mu)/(\alpha+\beta+1)$. To derive it, first compute the second raw moment using the relation to the Beta function: $E[X^2] = B(\alpha+2, \beta)/B(\alpha, \beta) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$, where $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$. Then apply the variance definition $\operatorname{Var}(X) = E[X^2] - (E[X])^2$, substituting $E[X] = \alpha/(\alpha+\beta)$, to yield the formula above. For $\alpha = \beta = 1$, corresponding to the uniform distribution on $[0, 1]$, the variance is $1/12$; it can exceed this for smaller parameters, approaching its upper bound $\mu(1-\mu)$ as $\alpha + \beta \to 0$ with the mean held fixed. For a fixed mean $\mu$, the variance decreases as $\alpha + \beta$ increases, reflecting greater concentration around $\mu$.

Skewness

The skewness of the Beta distribution, denoted $\gamma_1$, quantifies the asymmetry of its probability density function around the mean. It is defined as the third standardized central moment:
$$\gamma_1 = \frac{\mu_3}{\sigma^3},$$
where $\mu_3 = \mathbb{E}[(X-\mu)^3]$ is the third central moment, $\mu = \mathbb{E}[X] = \frac{\alpha}{\alpha+\beta}$ is the mean, and $\sigma^2 = \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ is the variance. To derive $\gamma_1$, first compute the raw moments of the Beta distribution, given by
$$\mathbb{E}[X^k] = \frac{B(\alpha+k, \beta)}{B(\alpha, \beta)} = \prod_{i=0}^{k-1} \frac{\alpha+i}{\alpha+\beta+i}$$
for positive integer $k$, where $B$ is the beta function. The third raw moment is thus
$$\mathbb{E}[X^3] = \frac{\alpha(\alpha+1)(\alpha+2)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)},$$
and the second raw moment is $\mathbb{E}[X^2] = \mu^2 + \sigma^2$. Substituting into the expansion $\mu_3 = \mathbb{E}[X^3] - 3\mu\,\mathbb{E}[X^2] + 2\mu^3$ yields
$$\mu_3 = \frac{2\alpha\beta(\beta-\alpha)}{(\alpha+\beta)^3(\alpha+\beta+1)(\alpha+\beta+2)}.$$
Dividing by $\sigma^3$ then gives the skewness:
$$\gamma_1 = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.$$
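As a quick numerical check of the sign convention, the closed-form skewness can be evaluated directly (a sketch; `beta_skewness` is our name):

```python
import math

def beta_skewness(a, b):
    """gamma_1 = 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b))."""
    return (2.0 * (b - a) * math.sqrt(a + b + 1.0)
            / ((a + b + 2.0) * math.sqrt(a * b)))

print(beta_skewness(2.0, 2.0))      # → 0.0 (symmetric case)
print(beta_skewness(2.0, 5.0) > 0)  # → True: alpha < beta concentrates mass near 0,
                                    #   leaving a long right tail (positive skew)
```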