Continuity correction

from Wikipedia

In mathematics, a continuity correction is an adjustment made when a discrete object is approximated using a continuous object.

Examples

Binomial

If a random variable X has a binomial distribution with parameters n and p, i.e., X is distributed as the number of "successes" in n independent Bernoulli trials with probability p of success on each trial, then

P(X ≤ x) = Σ_{i=0}^{x} C(n, i) p^i (1 − p)^{n−i}

for any x ∈ {0, 1, 2, ..., n}. If np and np(1 − p) are large (sometimes taken as both ≥ 5), then the probability above is fairly well approximated by

P(Y ≤ x + 1/2),

where Y is a normally distributed random variable with the same expected value and the same variance as X, i.e., E(Y) = np and var(Y) = np(1 − p). This addition of 1/2 to x is a continuity correction.
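As a sketch of this correction in code, the following example (standard-library Python only; the parameters n = 100, p = 0.5, x = 45 are illustrative choices, not taken from the text) compares the exact binomial CDF with the normal approximation before and after adding 1/2:

```python
import math

def binom_cdf(x, n, p):
    # Exact binomial CDF: sum the point masses P(X = i) for i = 0..x.
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def norm_cdf(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, x = 100, 0.5, 45               # illustrative parameters
mu = n * p                           # E(Y) = np
sigma = math.sqrt(n * p * (1 - p))   # sd(Y) = sqrt(np(1 - p))

exact = binom_cdf(x, n, p)
uncorrected = norm_cdf((x - mu) / sigma)      # P(Y <= x), no correction
corrected = norm_cdf((x + 0.5 - mu) / sigma)  # P(Y <= x + 1/2)
```

With these values the corrected figure lands noticeably closer to the exact CDF than the uncorrected one.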

Poisson

A continuity correction can also be applied when other discrete distributions supported on the integers are approximated by the normal distribution. For example, if X has a Poisson distribution with expected value λ, then the variance of X is also λ, and

P(X ≤ x) ≈ P(Y ≤ x + 1/2)

if Y is normally distributed with expectation and variance both λ.
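The same shift works for the Poisson case; this sketch (standard-library Python, with the illustrative choice λ = 10 and P(X ≤ 8)) compares the exact Poisson CDF with the corrected and uncorrected normal approximations:

```python
import math

def poisson_cdf(x, lam):
    # Exact Poisson CDF: sum of e^{-lam} * lam^k / k! for k = 0..x.
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lam, x = 10.0, 8                 # illustrative: lambda = 10, P(X <= 8)
sigma = math.sqrt(lam)           # the Poisson variance equals its mean

exact = poisson_cdf(x, lam)
uncorrected = norm_cdf((x - lam) / sigma)
corrected = norm_cdf((x + 0.5 - lam) / sigma)
```

Here, too, the corrected value is the better approximation of the exact probability.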

Applications

Before statistical software capable of evaluating probability distribution functions accurately became widely available, continuity corrections played an important role in the practical application of statistical tests whose test statistic has a discrete distribution; they were especially important for manual calculations. A particular example is the binomial test, involving the binomial distribution, as in checking whether a coin is fair. Where extreme accuracy is not necessary, computer calculations for some ranges of parameters may still rely on continuity corrections to improve accuracy while retaining simplicity.

from Grokipedia
In statistics, a continuity correction (also known as the continuity correction factor) is an adjustment applied when approximating a discrete distribution, such as the binomial or Poisson, with a continuous one, typically the normal distribution, by adding or subtracting 0.5 from the boundary values to account for the discreteness of the original variable. This technique enhances the accuracy of the approximation, particularly for probabilities involving inequalities or exact values, since the continuous distribution spans all real numbers while the discrete one is limited to integers.

The primary purpose of the continuity correction is to bridge the gap between the "step-like" nature of discrete distributions and the smooth curve of the normal distribution, reducing error in calculated probabilities under the Central Limit Theorem. For instance, to approximate P(X ≤ k) for a binomial random variable X, the correction adjusts it to P(Y < k + 0.5), where Y follows the approximating normal distribution with mean μ = np and variance σ² = np(1 − p); similarly, P(X ≥ k) becomes P(Y > k − 0.5). This adjustment is recommended when the sample size n is sufficiently large, specifically when np ≥ 5 and n(1 − p) ≥ 5, ensuring the normal approximation is valid. Without the correction, approximations can overestimate or underestimate probabilities, especially near the tails or for small n.

Continuity corrections have broad applications in statistics, including hypothesis testing and confidence intervals. In the context of the binomial distribution, they are routinely used for large n to compute cumulative probabilities efficiently without exact methods. A related form, known as Yates' continuity correction, applies specifically to the chi-squared test of independence in 2×2 contingency tables by subtracting 0.5 from the absolute differences between expected and observed frequencies, improving accuracy when the degrees of freedom equal 1 and sample sizes are small. Proposed by Frank Yates in 1934 as an ad-hoc improvement, this variant addresses inflation in test statistics due to discreteness.
Overall, while modern computational tools often allow exact calculations, continuity corrections remain valuable for hand computations and theoretical analyses.
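The Yates adjustment described above can be sketched directly. In this example (standard-library Python; the 2×2 table of counts is hypothetical, chosen only for illustration), the correction subtracts 0.5 from each |O − E| before squaring:

```python
def chi2_2x2(table, yates=False):
    # Pearson chi-squared statistic for a 2x2 table of observed counts,
    # optionally with Yates' continuity correction (|O - E| reduced by 0.5).
    (a, b), (c, d) = table
    n = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row[i] * col[j] / n
            diff = abs(obs - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # Yates' correction
            stat += diff**2 / expected
    return stat

table = [[12, 8], [5, 15]]         # hypothetical observed counts
plain = chi2_2x2(table)            # uncorrected Pearson statistic
corrected = chi2_2x2(table, yates=True)
```

The corrected statistic is never larger than the uncorrected one, which is how the correction counters the discreteness-driven inflation mentioned above.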

Fundamentals

Definition and Purpose

Continuity correction is a statistical adjustment applied when approximating a discrete probability distribution, which assigns probabilities to specific points (e.g., P(X = k) for integer values of k), with a continuous distribution that uses probability densities over intervals. This discreteness causes inherent errors in approximations, such as using the normal distribution to model binomial outcomes, because the continuous density spreads probability continuously while the discrete distribution concentrates it at points. Without correction, the approximation tends to underestimate probabilities near boundaries or in tails.

The primary purpose of continuity correction is to refine these approximations by bridging the gap between discrete and continuous models, typically by expanding or shifting the integration limits by 0.5 units to account for the "width" of each discrete point as if it were a bar of unit width. This adjustment is particularly useful for normal approximations to discrete distributions like the binomial or Poisson, enhancing accuracy in probability calculations, hypothesis tests, and confidence intervals for moderate sample sizes where exact computation is feasible but cumbersome. In practice, for cumulative probabilities like P(X ≤ k), one computes P(Y ≤ k + 0.5) under the continuous distribution, and similarly for other inequalities.

A general formula for approximating a point probability is P(X = k) ≈ ∫_{k−0.5}^{k+0.5} f(x) dx, where f(x) is the probability density function of the continuous approximating distribution (e.g., normal). This represents the area under the continuous curve over the interval centered at k with width 1, mimicking the discrete mass at that point.
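The point-probability formula above can be checked numerically. This sketch (standard-library Python; the parameters n = 20, p = 0.4, k = 8 are illustrative) compares the binomial mass at k with the normal density integrated over [k − 0.5, k + 0.5], evaluated as a difference of CDFs:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 20, 0.4, 8                 # illustrative parameters
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Exact point mass P(X = k).
pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)

# Integral of the approximating normal density over [k - 0.5, k + 0.5],
# computed as a difference of normal CDF values.
approx = norm_cdf((k + 0.5 - mu) / sigma) - norm_cdf((k - 0.5 - mu) / sigma)
```

The two numbers agree to about three decimal places, illustrating how the unit-width interval around k stands in for the discrete mass.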
The technique traces its origins to early 19th-century efforts to improve normal approximations to the binomial, with systematic development in the 20th century, including Feller's influential analysis of its mathematical justification in 1957 and Yates' 1934 proposal for related corrections in chi-square tests. By reducing systematic bias, continuity correction improves estimates in central and tail regions; for instance, in binomial settings with n = 20 and p = 0.4, the uncorrected normal approximation for P(X ≤ 7) yields 0.3240 versus the exact value of 0.4159, while the corrected version yields 0.4097, much closer to the true value. This makes it valuable for practical applications where sample sizes are not extremely large, though its benefits diminish as n increases and the central limit theorem provides better inherent accuracy.
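As a check on the n = 20, p = 0.4 example, the three quantities involved can be computed in a few lines of standard-library Python:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, x = 20, 0.4, 7
mu = n * p                           # 8.0
sigma = math.sqrt(n * p * (1 - p))   # sqrt(4.8)

# Exact P(X <= 7) from the binomial point masses.
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

uncorrected = norm_cdf((x - mu) / sigma)      # no correction
corrected = norm_cdf((x + 0.5 - mu) / sigma)  # with the +0.5 correction
```

The corrected approximation is substantially closer to the exact cumulative probability than the uncorrected one.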

Mathematical Basis

The continuity correction originates from the conceptual modeling of a discrete random variable's probability mass at each point k as being uniformly distributed over the interval [k − 0.5, k + 0.5] of width 1, centered at the integer k. This uniform distribution assumption treats the discrete steps as continuous intervals, justifying a shift of ±0.5 in the boundaries to align the discrete cumulative distribution function (CDF) with its continuous approximation. Under this framework, the probability mass P(X = k) is equated to the area under a uniform density of height 1 over that interval, which integrates to 1, providing a bridge between discrete and continuous representations.

For a general discrete integer-valued X, the probability P(a ≤ X ≤ b), where a and b are integers, is approximated by P(a − 0.5 < Y ≤ b + 0.5), with Y following the continuous approximating distribution (typically normal for large samples). This adjustment accounts for the half-unit margin on either side of the discrete points, effectively smoothing the step function of the discrete CDF to better match the continuous CDF.

A proof sketch for the point-mass approximation illustrates this: for a degenerate discrete distribution with P(X = k) = 1, the equivalent continuous model spreads this mass uniformly over [k − 0.5, k + 0.5] with density f(y) = 1 for y ∈ [k − 0.5, k + 0.5] and 0 elsewhere, yielding ∫_{k−0.5}^{k+0.5} f(y) dy = 1, which exactly matches the discrete probability. For non-degenerate cases, the density of the approximating continuous distribution integrated over each such interval, ∫_{k−0.5}^{k+0.5} f_Y(y) dy, approximates P(X = k), with the ±0.5 shift ensuring the intervals capture the full mass without overlap or gap under the uniformity assumption.
The Euler-Maclaurin formula provides a rigorous basis for error analysis in this approximation, expressing the difference between a sum (discrete probabilities) and an integral (continuous) as a series involving Bernoulli numbers and derivatives of the density. Without correction, the normal approximation to the binomial CDF incurs an error of order O(1/√n), and the ±0.5 shift removes the leading term of this error.
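The O(1/√n) rate can be observed empirically. This sketch (standard-library Python; p = 0.5 and the sample sizes 10, 40, 160 are illustrative choices) measures the worst-case CDF error with and without the correction as n grows:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_error(n, p, shift):
    # Largest |exact binomial CDF - normal approximation| over all integer
    # cutoffs k; shift = 0.0 gives the raw approximation, 0.5 the corrected one.
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    worst, cdf = 0.0, 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)  # exact P(X <= k)
        approx = norm_cdf((k + shift - mu) / sigma)
        worst = max(worst, abs(cdf - approx))
    return worst

sizes = (10, 40, 160)
raw = {n: max_cdf_error(n, 0.5, 0.0) for n in sizes}     # uncorrected
fixed = {n: max_cdf_error(n, 0.5, 0.5) for n in sizes}   # corrected
```

Quadrupling n roughly halves the uncorrected error, consistent with the 1/√n rate, while the corrected error is far smaller at every n.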