Degenerate distribution
from Wikipedia
Degenerate univariate
Cumulative distribution function: plot of the degenerate distribution CDF for a = 0 (the horizontal axis is x).
Parameters: a ∈ (−∞, ∞) (location)
Support: {a}
PMF: 1 for x = a; 0 elsewhere
CDF: 0 for x < a; 1 for x ≥ a
Mean: a
Median: a
Mode: a
Variance: 0
Skewness: undefined
Excess kurtosis: undefined
Entropy: 0
MGF: e^{at}
CF: e^{ita}
PGF: z^a (for integer a)

In probability theory, a degenerate distribution on a measure space (E, 𝒜, μ) is a probability distribution whose support is a null set with respect to μ. For instance, in the n-dimensional space ℝⁿ endowed with the Lebesgue measure, any distribution concentrated on a d-dimensional subspace with d < n is a degenerate distribution on ℝⁿ.[1] This is essentially the same notion as a singular probability measure, but the term degenerate is typically used when the distribution arises as a limit of (non-degenerate) distributions.

When the support of a degenerate distribution consists of a single point a, this distribution is a Dirac measure at a: it is the distribution of a deterministic random variable equal to a with probability 1. This is a special case of a discrete distribution; its probability mass function equals 1 at a and 0 everywhere else.

In the case of a real-valued random variable, the cumulative distribution function of the degenerate distribution localized at a is F(x) = 0 for x < a and F(x) = 1 for x ≥ a. Such degenerate distributions often arise as limits of continuous distributions whose variance goes to 0.

Constant random variable

A constant random variable is a discrete random variable that takes a constant value, regardless of any event that occurs. This is technically different from an almost surely constant random variable, which may take other values, but only on events with probability zero: let X: Ω → ℝ be a real-valued random variable defined on a probability space (Ω, ℙ). Then X is an almost surely constant random variable if there exists a ∈ ℝ such that ℙ(X = a) = 1, and it is furthermore a constant random variable if X(ω) = a for every ω ∈ Ω. A constant random variable is almost surely constant, but the converse is not true, since if X is almost surely constant there may still exist γ ∈ Ω such that X(γ) ≠ a.

For practical purposes, the distinction between X being constant or almost surely constant is unimportant, since these two situations correspond to the same degenerate distribution: the Dirac measure.

Higher dimensions

Degeneracy of a multivariate distribution in n random variables arises when the support lies in a space of dimension less than n.[1] This occurs when at least one of the variables is a deterministic function of the others. For example, in the 2-variable case suppose that Y = aX + b for scalar random variables X and Y and scalar constants a ≠ 0 and b; here knowing the value of one of X or Y gives exact knowledge of the value of the other. All the possible points (x, y) fall on the one-dimensional line y = ax + b.[citation needed]

In general, when one or more of n random variables are exactly linearly determined by the others, the covariance matrix (if it exists) has rank less than n[1][verification needed] and determinant 0, so it is positive semi-definite but not positive definite, and the joint probability distribution is degenerate.[citation needed]
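As a quick illustration of this rank deficiency, the following sketch (an illustrative example not taken from the article; it assumes NumPy and the hypothetical relation Y = 2X + 1) estimates the covariance matrix of (X, Y) from simulated data and shows that it is singular:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)        # scalar random variable X
y = 2.0 * x + 1.0                   # Y is exactly linearly determined by X

cov = np.cov(np.vstack([x, y]))     # 2x2 sample covariance matrix
print(cov)
print("determinant:", np.linalg.det(cov))               # ~0 up to rounding
print("rank:", np.linalg.matrix_rank(cov, tol=1e-8))    # 1 < 2
```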

Degeneracy can also occur even with non-zero covariance. For example, when scalar X is symmetrically distributed about 0 and Y is exactly given by Y = X2, all possible points (x, y) fall on the parabola y = x2, which is a one-dimensional subset of the two-dimensional space.[citation needed]
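The zero-covariance case can also be checked numerically. This minimal sketch (illustrative, assuming NumPy and a standard normal X, which is symmetric about 0) shows that the sample covariance of X and Y = X² is approximately zero even though Y is a deterministic function of X:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)   # symmetric about 0
y = x ** 2                       # all points (x, y) lie on the parabola y = x^2

print("sample Cov(X, Y):", np.cov(x, y)[0, 1])   # close to 0
print("sample corr(X, Y):", np.corrcoef(x, y)[0, 1])  # also close to 0
```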

References

from Grokipedia
In probability theory, a degenerate distribution is a probability distribution concentrated entirely on a single point, where a random variable takes a fixed value c with probability 1. This corresponds to the Dirac delta measure centered at c. This makes it a deterministic case with no randomness, serving as a trivial or baseline example in statistical modeling. For a univariate real-valued random variable X, the probability mass function assigns P(X = c) = 1 and P(X = x) = 0 for all x \neq c, while the cumulative distribution function is a step function jumping from 0 to 1 at c. The mean (expected value) is exactly c, and the variance is 0, reflecting the absence of variability. In the multivariate setting, a random vector X = (X_1, \dots, X_k) with k > 1 has a degenerate distribution if there exists a non-zero vector a such that a^T X equals a constant with probability 1, implying the support lies on a lower-dimensional affine subspace.

Degenerate distributions often arise as limiting cases of non-degenerate distributions in convergence theorems, such as weak convergence where a sequence of random variables converges in probability to a constant. They play a key role in foundational results like the law of large numbers, where the sample mean converges to its expectation, yielding a degenerate limit at that value. Although of limited practical interest on their own, they provide essential theoretical insights into probability measures and distribution theory.

Fundamentals

Definition

A degenerate distribution is a probability distribution that assigns probability 1 to a single point, known as the degenerate point, in the sample space and probability 0 to all other outcomes. This makes it the distribution of a constant random variable, where the outcome is deterministic with no randomness involved. In measure-theoretic probability, a degenerate distribution is formally defined as the Dirac measure \delta_x at a point x, which places all mass at x. The support of the distribution is the singleton set \{x\}, such that for a random variable X following this distribution, P(X = x) = 1. In contrast to non-degenerate distributions, which exhibit variability across multiple outcomes, a degenerate distribution has zero variance and lacks any spread, effectively collapsing the probability mass to a single value. This property distinguishes it as a boundary case in probability theory, often arising in limiting scenarios.

Probability Measures

The Dirac measure, denoted \delta_x, is a fundamental probability measure associated with the degenerate distribution concentrated at a point x in a measurable space (S, \mathcal{S}). It is defined such that for any measurable set A \in \mathcal{S}, \delta_x(A) = 1 if x \in A and \delta_x(A) = 0 otherwise. This construction ensures that \delta_x assigns the entire probability mass of 1 to the singleton \{x\}, making it a valid probability measure since \delta_x(S) = 1. In the context of probability theory, the Dirac measure represents the distribution of a deterministic random variable that takes the value x with probability 1. For a discrete random variable X following a degenerate distribution at a point a, the probability mass function (PMF) is given by p_X(k) = 1 if k = a and p_X(k) = 0 otherwise, for all k in the support. This PMF fully captures the measure-theoretic structure, where the probability is entirely concentrated at the single point a, aligning with the Dirac measure \delta_a. In the continuous setting, a degenerate distribution does not admit a true probability density function with respect to Lebesgue measure, as the support is a single point of measure zero. However, the Dirac delta function \delta(x - a) serves as a generalized density, satisfying the normalization condition \int_{-\infty}^{\infty} \delta(x - a) \, dx = 1. This generalized function acts as the continuous analog of the Dirac measure, enabling the representation of expectations through integration. Consequently, for any measurable function f, the expectation with respect to the degenerate distribution is E[f(X)] = f(a), reflecting the concentration of the probability mass at a. This follows directly from the sifting property of the Dirac measure or delta function, where \int f(y) \, \delta_x(dy) = f(x).
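A degenerate distribution is simple enough to code directly. The sketch below (an illustrative helper class, not a standard library API) implements the Dirac measure at a point a together with its PMF and the expectation rule E[f(X)] = f(a):

```python
from typing import Callable, Set


class Degenerate:
    """Degenerate (Dirac) distribution concentrated at a single point a."""

    def __init__(self, a: float):
        self.a = a

    def measure(self, A: Set[float]) -> int:
        # Dirac measure: delta_a(A) = 1 if a is in A, else 0
        return 1 if self.a in A else 0

    def pmf(self, x: float) -> int:
        # All probability mass sits at a
        return 1 if x == self.a else 0

    def expectation(self, f: Callable[[float], float]) -> float:
        # Sifting property: E[f(X)] = f(a)
        return f(self.a)


d = Degenerate(3.0)
print(d.measure({1.0, 3.0}))            # 1
print(d.pmf(3.0), d.pmf(2.0))           # 1 0
print(d.expectation(lambda x: x ** 2))  # 9.0
```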

Univariate Case

Cumulative Distribution Function

The cumulative distribution function (CDF) of a univariate degenerate random variable X that takes the value a with probability 1 is given by F_X(x) = \begin{cases} 0 & \text{if } x < a, \\ 1 & \text{if } x \geq a. \end{cases} This form reflects the concentration of all probability mass at the single point a. The CDF F_X(x) is non-decreasing and right-continuous, as required for any valid CDF, with a single jump discontinuity of height 1 at x = a. The left-hand limit at a is 0, and the right-hand limit is 1, while \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1. This step-function behavior arises because the distribution assigns no probability to any interval not containing a, and full probability to those that do. Graphically, the CDF appears as a horizontal line at height 0 for all x < a, followed by a vertical jump to height 1 at x = a, and then remains constant at 1 for x > a. This representation underscores the deterministic nature of the degenerate distribution. The function is equivalently expressed using the indicator function as F_X(x) = I_{\{x \geq a\}}, where I denotes the indicator that equals 1 if the condition holds and 0 otherwise; it is also known as a shifted Heaviside step function \theta(x - a).
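As noted above, the CDF is just a shifted Heaviside step. A minimal NumPy sketch (illustrative, with the hypothetical choice a = 0) evaluates it on a small grid:

```python
import numpy as np

a = 0.0
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# F_X(x) = I{x >= a}; np.heaviside(t, 1) returns 1 at t = 0,
# matching the right-continuous jump at x = a.
cdf = np.heaviside(x - a, 1.0)
print(cdf)   # [0. 0. 1. 1. 1.]
```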

Moments and Characteristics

The mean of a univariate degenerate random variable X concentrated at a point a \in \mathbb{R} is E[X] = a, as the distribution assigns probability 1 to the value a. The variance follows directly as \operatorname{Var}(X) = E[(X - a)^2] = 0, reflecting the complete lack of dispersion in the distribution. Higher-order raw moments are given by E[X^k] = a^k for any positive integer k, since X = a with probability 1. The central moments \mu_k = E[(X - a)^k] are zero for all k \geq 2, while the first central moment is zero by definition; this underscores the deterministic nature of the distribution, where no variability affects moment calculations beyond the mean. The characteristic function of X is \phi(t) = E[e^{itX}] = e^{ita} for t \in \mathbb{R}, which is the Fourier transform of the Dirac delta measure at a. All measures of location, including the median, mode, and quantiles, coincide at a, as the cumulative distribution function jumps from 0 to 1 exactly at this point. The Shannon entropy is H(X) = -\sum p(x) \log p(x) = 0, since the distribution is fully concentrated on a single outcome with probability 1, indicating zero uncertainty.
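These moment formulas can be sanity-checked numerically. The short sketch below (illustrative, assuming NumPy and the hypothetical point a = 3) treats the distribution as a sample that is constant at a:

```python
import numpy as np

a = 3.0
x = np.full(10_000, a)            # every draw equals a with probability 1

print("mean:", x.mean())                                  # a
print("variance:", x.var())                               # 0.0
print("third raw moment:", np.mean(x ** 3))               # a**3
print("second central moment:", np.mean((x - a) ** 2))    # 0.0

t = 1.7
print("empirical CF:", np.mean(np.exp(1j * t * x)),
      "vs e^{ita}:", np.exp(1j * t * a))                  # identical
```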

Multivariate Case

Geometric Interpretation

In the multivariate setting, a degenerate distribution places its entire probability measure on a lower-dimensional affine subspace of the n-dimensional Euclidean space \mathbb{R}^n, where the dimension k of this subspace satisfies k < n. This concentration arises from linear dependencies among the random variables, restricting the possible realizations to a proper subset that does not span the full space. Geometrically, the support forms a flat structure such as a point, line, plane, or higher-dimensional hyperplane embedded within \mathbb{R}^n, with the distribution behaving as a non-degenerate probability measure only along this subspace. To illustrate in two dimensions, consider a degenerate distribution with k = 0, where the support is a single point, assigning probability 1 to a fixed location like (c, c); for k = 1, the support reduces to a line, such as all points satisfying x_1 + x_2 = a for some constant a, forming a one-dimensional manifold; in contrast, a non-degenerate case with k = 2 would have support filling the entire plane. These examples highlight how degeneracy collapses the geometric extent, preventing the distribution from having positive density across the full ambient space. This structure extends the univariate degenerate distribution, which concentrates on a single point as a 0-dimensional case in \mathbb{R}^1. From a measure-theoretic perspective, a degenerate multivariate distribution is singular with respect to the Lebesgue measure \lambda_n on \mathbb{R}^n whenever k < n, as the support has \lambda_n-measure zero and the distribution assigns probability only to sets intersecting this lower-dimensional subspace. The degree of this degeneracy is captured by the codimension n - k, which quantifies the "deficiency" in dimensionality relative to the full space, influencing properties like the impossibility of defining a density function over \mathbb{R}^n.
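The affine dimension of the support can be recovered from samples. This sketch (illustrative, assuming NumPy and the hypothetical line x_1 + x_2 = 1 in \mathbb{R}^2) estimates k as the rank of the centered data matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
a = 1.0
x1 = rng.normal(size=5_000)
x2 = a - x1                        # support is the line x1 + x2 = a in R^2

data = np.column_stack([x1, x2])
centered = data - data.mean(axis=0)

# Rank of the centered samples equals the dimension k of the affine support.
k = np.linalg.matrix_rank(centered, tol=1e-8)
print("ambient dimension n =", data.shape[1], ", support dimension k =", k)  # n=2, k=1
```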

Covariance Matrix Properties

In the multivariate case, the covariance matrix \Sigma of a degenerate distribution supported on an r-dimensional affine subspace of \mathbb{R}^n (with r < n) is singular, meaning its determinant is zero, and its rank is exactly r, reflecting the lower-dimensional nature of the support. This rank deficiency arises because the random vector X lies in a proper subspace, preventing the distribution from having full support in \mathbb{R}^n. As a symmetric positive semi-definite matrix, \Sigma satisfies x^T \Sigma x \geq 0 for all x \in \mathbb{R}^n, but it is not positive definite due to the existence of non-trivial vectors in its kernel. The eigenvalues of \Sigma consist of exactly n - r zeros and r positive values, with the non-zero eigenvalues determining the spread along the support directions, as per the spectral theorem applied to symmetric matrices. This structure underscores the semi-definiteness: the zero eigenvalues correspond to directions orthogonal to the support, along which there is no variance. A degenerate random vector X \in \mathbb{R}^n can be represented as X = \mu + A Y, where Y \in \mathbb{R}^r follows a non-degenerate distribution (e.g., multivariate normal with positive definite covariance), \mu \in \mathbb{R}^n is the location vector, and A is an n \times r matrix of full column rank r. The covariance matrix then takes the form \operatorname{Var}(X) = A \operatorname{Var}(Y) A^T, which inherits the rank r from A and \operatorname{Var}(Y), ensuring the singularity of \Sigma. This parametrization highlights how the degeneracy propagates through linear mappings from a lower-dimensional space. Such rank deficiency implies linear dependence among the components of X, affecting covariances and precluding mutual independence unless the dependence is trivial. For instance, if the second component satisfies X_2 = c X_1 + d for constants c, d, then \operatorname{Cov}(X_1, X_2) = c \operatorname{Var}(X_1), illustrating how the off-diagonal entries capture the deterministic relationship.
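The representation X = \mu + A Y makes the eigenvalue structure easy to verify. The sketch below (illustrative, assuming NumPy, n = 3, r = 2, and arbitrary hypothetical choices of \mu, A, and \operatorname{Var}(Y)) constructs \Sigma = A \operatorname{Var}(Y) A^T and inspects its rank, eigenvalues, and determinant:

```python
import numpy as np

n, r = 3, 2
mu = np.array([1.0, -2.0, 0.5])        # location vector (unused by covariance)
A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])             # n x r, full column rank r
var_Y = np.array([[2.0, 0.3],
                  [0.3, 1.0]])         # positive definite covariance of Y

sigma = A @ var_Y @ A.T                # covariance of X = mu + A Y

eigvals = np.linalg.eigvalsh(sigma)    # symmetric PSD: real eigenvalues
print("eigenvalues:", eigvals)                      # exactly n - r of them ~ 0
print("rank:", np.linalg.matrix_rank(sigma))        # r = 2
print("determinant:", np.linalg.det(sigma))         # ~ 0
```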

Applications and Examples

Linear Transformations

Linear transformations of random variables can lead to degenerate distributions when the transformation effectively eliminates variability. A simple case occurs with a constant transformation, where Y = a for some fixed constant a; here, Y follows a degenerate distribution concentrated at a, regardless of the distribution of any underlying random variables. This reflects the zero-variance property inherent to degenerate distributions. More generally, consider a linear transformation Y = c X + d, where X is a random variable and c, d are constants. If c = 0, then Y = d with probability 1, yielding a degenerate distribution at d. Conversely, if X itself is degenerate at some value m, then Y is degenerate at c m + d, preserving the point-mass nature through the transformation. An illustrative example arises in linear regression models: when the residual term \varepsilon is identically zero, the model achieves a perfect fit, with all residuals degenerate at zero; in this scenario, the observed response values exactly equal the predicted values, resulting in no variability in the errors. In the multivariate setting, degeneracy manifests when applying an affine transformation Z = A X + b, where X is a random vector in \mathbb{R}^n, A is an m \times n matrix with rank r < m, and b is a constant vector. The resulting distribution of Z is degenerate, supported solely on an affine subspace of dimension at most r.
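The perfect-fit regression case can be demonstrated directly. This sketch (illustrative, assuming NumPy and hypothetical exactly-linear data y = 2x + 1) fits a least-squares line and shows the residuals are degenerate at zero:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0                  # exactly linear: the error term is identically zero

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

print("max |residual|:", np.max(np.abs(residuals)))   # ~0 (floating-point noise only)
print("residual variance:", residuals.var())          # ~0: residuals degenerate at 0
```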

Limit Distributions

A sequence of probability distributions F_n converges in distribution to a degenerate distribution \delta_a if, for every continuity point x of the limiting CDF, F_n(x) \to 0 for x < a and F_n(x) \to 1 for x > a. This form of weak convergence captures the concentration of probability mass at the point a, where the limiting random variable equals a with probability 1. A prominent example occurs in the normal distribution family: as the variance parameter \sigma^2 \to 0 in the N(\mu, \sigma^2) distribution, the probability mass concentrates entirely at \mu, yielding the degenerate distribution \delta_\mu. This limiting behavior illustrates how non-degenerate distributions with shrinking spread approach degeneracy. In statistical estimation, a sequence of estimators \hat{\theta}_n is consistent for the true parameter \theta if \hat{\theta}_n converges in probability to \theta, implying that the limiting distribution of \hat{\theta}_n is degenerate at \theta. This convergence ensures that, for large sample sizes, the estimator's variability diminishes, placing all probabilistic weight on the true value. The law of large numbers provides another key instance: for independent and identically distributed random variables X_1, X_2, \dots with finite mean \mu, the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i converges almost surely (and thus in probability and in distribution) to \mu, resulting in a degenerate limiting distribution \delta_\mu. Under finite variance conditions, the weak law of large numbers establishes this via Chebyshev's inequality, highlighting the asymptotic certainty of the average.
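The shrinking-variance normal example is easy to check numerically. This sketch (illustrative, assuming SciPy and the hypothetical values \mu = 2 and a few decreasing \sigma) evaluates the N(\mu, \sigma^2) CDF near \mu and shows it approaching the degenerate step:

```python
import numpy as np
from scipy.stats import norm

mu = 2.0
xs = np.array([1.9, 1.99, 2.0, 2.01, 2.1])

for sigma in [1.0, 0.1, 0.01, 0.001]:
    cdf = norm.cdf(xs, loc=mu, scale=sigma)
    print(f"sigma={sigma:>6}: {np.round(cdf, 4)}")

# As sigma -> 0 the CDF tends to 0 for x < mu and 1 for x > mu. At x = mu itself
# the value stays at 0.5, which is why convergence in distribution is only
# required at continuity points of the limiting CDF.
```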
