Expected value
In probability theory, the expected value (also known as the mathematical expectation, expectation, or simply the mean) of a random variable is a measure of central tendency that represents the long-run average value of the variable over infinitely many independent repetitions of the associated experiment. For a discrete random variable $X$ taking values $x_i$ with probabilities $p_i$, the expected value is calculated as $E[X] = \sum_i x_i p_i$; for a continuous random variable with probability density function $f(x)$, it is $E[X] = \int_{-\infty}^{\infty} x f(x) \, dx$. This concept quantifies the "average" outcome weighted by the likelihood of each possibility, distinguishing it from the most probable value, and serves as the cornerstone for understanding distributions in statistics.

The concept of expected value originated in the 17th century from analyses of games of chance, with Christiaan Huygens introducing it in 1657 in his treatise De Ratiociniis in Ludo Aleae to compute fair divisions in interrupted games; it was later formalized by Abraham de Moivre in 1718 and advanced by Pierre-Simon Laplace in 1814.

Key properties of expected value underpin its utility across disciplines, with linearity of expectation being particularly notable: for any random variables $R_1$ and $R_2$ and constants $a_1, a_2$, $E[a_1 R_1 + a_2 R_2] = a_1 E[R_1] + a_2 E[R_2]$, holding even without independence between the variables. This property enables efficient computations in complex scenarios, such as using indicator random variables, where $E[I_A] = \Pr[A]$ for an event $A$.

In statistics, expected value defines the population mean $\mu$, guiding hypothesis testing and confidence intervals; in economics and finance, it informs decision-making by calculating weighted averages of potential profits and costs, as in decision analyses for investments where outcomes are probabilistic. For instance, in evaluating an oil-drilling prospect, expected value aggregates probabilities of dry holes (70%) versus successful yields (30%) to determine long-term viability, often yielding positive returns like $425,000 on average despite variability.

Beyond core applications, expected value extends to decision theory and optimization, where it guides the maximization of expected utility under uncertainty, as in expected utility theory for rational choice. It also appears in the analysis of randomized algorithms, such as the coupon collector problem, where the expected number of trials to gather $n$ types is $n H_n$ (with $H_n$ the $n$-th harmonic number), approximately $n \ln n + \gamma n$ for large $n$, illustrating its role in average-case analysis. Overall, expected value remains indispensable for modeling uncertainty, from insurance pricing to expected-value computations in neural networks, always emphasizing the balance between probability and payoff.
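
The coupon collector figure quoted above is easy to check empirically. The following is a minimal Python sketch (simulation parameters chosen arbitrarily, not from the text) comparing the sample mean of the collection time with $n H_n$:

```python
import random

def draws_to_collect(n: int) -> int:
    """Draw uniformly from n coupon types until every type has appeared."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, trials = 50, 2000
empirical = sum(draws_to_collect(n) for _ in range(trials)) / trials
harmonic = sum(1.0 / k for k in range(1, n + 1))  # H_n
print(f"empirical mean: {empirical:.1f}, n*H_n: {n * harmonic:.1f}")
```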

History and Etymology

Historical Development

The concept of expected value emerged in the mid-17th century amid efforts to resolve disputes in games of chance, particularly through the correspondence between Blaise Pascal and Pierre de Fermat in 1654. Prompted by the Chevalier de Méré, they addressed the "problem of points," which involved fairly dividing stakes in an interrupted game of chance, such as dice or cards, based on the probabilities of completing the game. Their exchange, preserved in letters, laid foundational principles for calculating fair shares proportional to winning chances, marking the inception of systematic probability reasoning applied to expectations in games.

Building on this, Christiaan Huygens formalized the idea in his 1657 treatise De Ratiociniis in Ludo Aleae, the first published work on probability theory. Huygens introduced mathematical expectation as the value a player could reasonably anticipate from a game, using it to analyze fair divisions and advantages in various chance scenarios, such as lotteries and dice rolls. His propositions equated expectation to the weighted average of possible outcomes, providing a practical tool for gamblers and establishing expectation as a core probabilistic concept.

Jacob Bernoulli advanced the notion significantly in his posthumously published 1713 work Ars Conjectandi, extending expectations beyond simple games to broader combinatorial outcomes and moral certainty. Bernoulli demonstrated how repeated trials converge to the expected value, introducing the law of large numbers as a theorem justifying the reliability of expectations in empirical settings. His analysis connected expectations to binomial expansions, influencing applications in annuities and demographics.

Abraham de Moivre further refined these ideas in his 1718 book The Doctrine of Chances, where he developed approximations linking expectations to the binomial distribution for large numbers of trials. De Moivre's methods allowed estimation of expected outcomes in complex scenarios, bridging combinatorial probability with continuous approximations and enhancing the precision of expectation calculations in insurance and gaming.

The modern rigorous framework for expected value was established by Andrey Kolmogorov in his 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung, which axiomatized probability theory using measure theory. Kolmogorov integrated expectation as the Lebesgue integral of a random variable over the probability space, unifying discrete and continuous cases within a general abstract setting and enabling its application across mathematics and sciences.

Etymology

The term "expectation" in originated in the , deriving from the Latin expectatio, which was introduced in Frans van Schooten's 1657 Latin translation of ' treatise De ratiociniis in ludo aleae. This work, based on Huygens' unpublished Dutch manuscript Van Rekeningh in Spelen van Gluck (1656), addressed problems in games of chance, where the concept denoted the anticipated monetary gain a player could reasonably foresee from fair play. The Latin root exspectatio, from the verb exspectare meaning "to look out for" or "to await," aligned with the context of awaiting outcomes, emphasizing a balanced anticipation rather than mere hope. In French, the parallel term espérance mathématique ("mathematical hope" or "mathematical expectation") first appeared in a letter by Gabriel Cramer dated May 21, 1728, marking its initial documented use with the modern probabilistic meaning. This phrasing influenced subsequent works, including Pierre-Simon Laplace's adoption of espérance in Théorie analytique des probabilités (1812), where it signified the weighted average outcome. Meanwhile, in German mathematical literature, Erwartungswert ("expected value") emerged as an equivalent, with roots traceable to early 18th-century translations; for instance, Jakob Bernoulli employed related Latin expressions like valor expectationis (value of expectation) in (1713) to describe anticipated gains, and occasionally mediocris to denote the mean or average value in probabilistic calculations. The English adoption evolved further in the 19th century, with coining "mathematical expectation" in An Essay on Probabilities (1838) to formalize the numerical aspect of the concept. By the , "expected value" supplanted "expectation" in many English texts to underscore its role as a precise , avoiding connotations of subjective ; this shift is evident in works like Arne Fisher's The Mathematical Theory of Probabilities (1915), which used the term to highlight the of a random variable's distribution.

Notations and Terminology

Standard Notations

The standard notation for the expected value of a random variable $X$ is $E[X]$, where $E$ stands for expectation. Alternative notations include $\mathcal{E}(X)$ or $\mathbb{E}[X]$, the latter using blackboard bold to distinguish it in printed texts. The integral form $\int x \, dF(x)$ represents the expected value in terms of the cumulative distribution function $F$. For conditional expectation, the notation $E[X \mid Y]$ is commonly used, indicating the expected value of $X$ given the random variable $Y$. In statistics, the expected value of a random variable is frequently denoted by $\mu$, representing the population mean. For multiple random variables, the joint expectation may be written as $E[X, Y]$, denoting the expectation of their product $XY$.

Variance serves as a fundamental measure of the dispersion or spread of a random variable's values around its expected value, quantifying the average squared deviation from the mean. Formally, for a random variable $X$, the variance is defined as $\operatorname{Var}(X) = E[(X - E[X])^2]$, which captures the second central moment of the distribution. This concept highlights how the expected value acts as the reference point from which variability is assessed, with higher variance indicating greater unpredictability in outcomes relative to the mean.

Covariance extends this idea to pairs of random variables, measuring the joint variability between them by assessing how deviations from their respective expected values tend to align. It is defined as $\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$ for random variables $X$ and $Y$, where positive values suggest that above-average occurrences in one variable correspond with above-average values in the other, indicating positive association. Conceptually, covariance links the expected values of $X$ and $Y$ to their shared fluctuations, providing insight into dependence without assuming linearity.

The moment generating function (MGF) of a random variable $X$, denoted $M_X(t) = E[e^{tX}]$, encapsulates all moments of the distribution, with the expected value $E[X]$ corresponding to the first moment obtained by differentiating the MGF and evaluating at $t = 0$. This relation underscores expected value as the foundational moment from which higher-order moments like variance derive. In essence, the MGF provides a generating tool where expected value emerges as the primary derivative, facilitating analysis of distributional properties.

In statistics, the sample mean represents an empirical average computed from observed data, serving as an estimator of the theoretical expected value, which is the population parameter defined probabilistically. While the sample mean varies with each realization of the data, the expected value remains fixed as the long-run average under repeated sampling. This distinction emphasizes that expected value is an intrinsic property of the random variable's distribution, whereas the sample mean approximates it through finite observations.

The law of large numbers conceptually ties these ideas together by stating that, under suitable conditions, the sample mean converges to the expected value as the number of independent observations increases, justifying the use of empirical averages to infer theoretical expectations. This convergence, often in probability or almost surely, illustrates how repeated sampling diminishes the influence of variability around the expected value. Thus, it bridges the gap between the abstract expected value and practical estimation.
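
The convergence of the sample mean to the expected value can be observed directly. A minimal Python sketch, assuming a fair six-sided die so that $E[X] = 3.5$ and $\operatorname{Var}(X) = 35/12 \approx 2.92$:

```python
import random

random.seed(0)
expected = 3.5  # E[X] for a fair six-sided die; Var(X) = 35/12

for n in (10, 1_000, 100_000):
    sample = [random.randint(1, 6) for _ in range(n)]
    mean = sum(sample) / n
    # Variance estimated as the empirical average of (X - mean)^2
    var = sum((x - mean) ** 2 for x in sample) / n
    print(f"n={n:>7}: sample mean={mean:.3f} (E[X]={expected}), sample var={var:.3f}")
```

As $n$ grows, both the sample mean and the sample variance settle near their theoretical values, as the law of large numbers predicts.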

Core Definitions

Finite Discrete Random Variables

A finite discrete random variable $X$ takes on a finite number of distinct values $x_1, x_2, \dots, x_n$ in the real numbers, each occurring with probability $P(X = x_i) = p_i > 0$, where the probability mass function satisfies $\sum_{i=1}^n p_i = 1$. The expected value $E[X]$, also known as the mean or first moment, is defined as the sum $E[X] = \sum_{i=1}^n x_i p_i$. This formulation arises in the axiomatic foundations of probability, where the expectation captures the center of mass of the distribution, with each point weighted by its probability.

The expected value serves as a weighted average of the possible outcomes, with the probabilities $p_i$ acting as weights that reflect their relative likelihoods; if all $p_i = 1/n$, it reduces to the arithmetic mean of the $x_i$. This interpretation aligns with the law of large numbers, indicating that the sample average from many independent repetitions of the experiment converges to $E[X]$.

For a fair six-sided die, where $X$ denotes the number shown and each outcome from 1 to 6 has probability $1/6$, the expected value is $E[X] = \sum_{k=1}^6 k \cdot \frac{1}{6} = \frac{21}{6} = 3.5$. This result implies that, over many rolls, the average outcome approaches 3.5, even though no single roll yields this value.

Consider a biased coin flip where $X$ is the payoff: +5 for heads (with $P(\text{heads}) = 0.6$) and -5 for tails (with $P(\text{tails}) = 0.4$). The expected value is $E[X] = 0.6 \cdot 5 + 0.4 \cdot (-5) = 3 - 2 = 1$. In repeated plays, the average payoff would thus approach +1 per flip. This extends to binary outcomes such as a bet with win probability $p$, payoff $w$ on win, and payoff $l$ on loss, where $E[X] = p \cdot w + (1-p) \cdot l$, a special case of the general discrete sum with two terms.
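
Both examples reduce to a single weighted sum. A minimal sketch (the helper function is hypothetical, not a standard library API):

```python
def expected_value(outcomes, probs):
    """E[X] = sum_i x_i * p_i for a finite discrete random variable."""
    assert abs(sum(probs) - 1.0) < 1e-12, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(outcomes, probs))

# Fair six-sided die: E[X] = 3.5
print(expected_value([1, 2, 3, 4, 5, 6], [1/6] * 6))
# Biased coin payoff: +5 with p = 0.6, -5 with p = 0.4, so E[X] = 1
print(expected_value([5, -5], [0.6, 0.4]))
```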

Countable Discrete Random Variables

For a countable discrete random variable $X$ taking values in a countable set $\{x_i : i \in \mathbb{Z}\}$, the expected value is defined as $E[X] = \sum_{i=-\infty}^{\infty} x_i P(X = x_i)$, provided the series converges absolutely, meaning $\sum_{i=-\infty}^{\infty} |x_i| P(X = x_i) < \infty$. This absolute convergence ensures the sum is well-defined regardless of the enumeration of the support, distinguishing it from the finite case, where simple summation always applies without convergence concerns. The expectation exists and is finite if and only if $\sum |x_i| P(X = x_i) < \infty$, which is equivalent to finiteness of both the positive part $\sum_{x_i > 0} x_i P(X = x_i)$ and the negative part $\sum_{x_i < 0} |x_i| P(X = x_i)$.

A classic example is the geometric distribution, modeling the number of failures before the first success in independent Bernoulli trials with success probability $p \in (0,1]$, where $P(X = k) = p (1-p)^k$ for $k = 0, 1, 2, \dots$. Here, $E[X] = \frac{1-p}{p}$, and the series converges due to the exponential decay of probabilities. Another is the Poisson distribution with parameter $\lambda > 0$, where $P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}$ for $k = 0, 1, 2, \dots$, yielding $E[X] = \lambda$, with convergence assured by the factorial growth in the denominator.

The expectation may fail to exist for distributions with heavy tails, where probabilities decay too slowly, causing the series $\sum |x_i| P(X = x_i)$ to diverge. For instance, consider $P(X = n) = \frac{1}{n(n+1)}$ for $n = 1, 2, \dots$, which satisfies normalization but leads to $E[X] = \sum_{n=1}^{\infty} \frac{n}{n(n+1)} = \sum_{n=1}^{\infty} \frac{1}{n+1} = \infty$, so the expectation is undefined as a finite quantity (one writes $E[X] = +\infty$; see Infinite Expected Values below).
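
Partial sums make the contrast between convergent and divergent expectation series concrete. A sketch with arbitrary truncation points, using the geometric (failures-before-success) example with $p = 0.3$ and the heavy-tailed PMF above:

```python
# Partial sums of sum_k k * P(X = k) for two countable distributions.
p = 0.3  # geometric success probability (failures-before-success convention)

def geometric_partial(n):
    return sum(k * p * (1 - p) ** k for k in range(n + 1))

def heavy_tail_partial(n):
    # P(X = k) = 1 / (k (k + 1)): a normalized PMF whose mean diverges
    return sum(k * (1.0 / (k * (k + 1))) for k in range(1, n + 1))

for n in (10, 100, 10_000):
    print(f"n={n:>6}: geometric={geometric_partial(n):.4f} "
          f"(limit {(1 - p) / p:.4f}), heavy tail={heavy_tail_partial(n):.2f}")
```

The geometric partial sums stabilize near $(1-p)/p \approx 2.33$, while the heavy-tailed partial sums grow without bound, roughly like $\ln n$.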

Continuous Random Variables

For a continuous random variable $X$ with probability density function $f(x)$, the expected value $E[X]$ is defined as the Lebesgue integral $E[X] = \int_{-\infty}^{\infty} x f(x) \, dx$, provided the integral exists. This requires that $f(x) \geq 0$ for all $x$, $\int_{-\infty}^{\infty} f(x) \, dx = 1$, and absolute convergence of the integral, i.e., $\int_{-\infty}^{\infty} |x| f(x) \, dx < \infty$. Without absolute convergence, the expected value is undefined, even if the Cauchy principal value exists.

An equivalent expression for $E[X]$ can be obtained using the cumulative distribution function $F(x) = \int_{-\infty}^{x} f(t) \, dt$, which facilitates computation in cases where differentiating the CDF to obtain $f(x)$ is cumbersome: $E[X] = \int_{0}^{\infty} [1 - F(x)] \, dx - \int_{-\infty}^{0} F(x) \, dx$. This tail formula decomposes the expectation into contributions from the positive and negative parts of $X$, with each integral representing the expected contribution from the respective tail of the distribution.

A classic example is the uniform distribution on the interval $[a, b]$, where $a < b$ and the density is $f(x) = \frac{1}{b-a}$ for $x \in [a, b]$ and 0 otherwise. The expected value is $E[X] = \int_{a}^{b} x \cdot \frac{1}{b-a} \, dx = \frac{a + b}{2}$, the midpoint of the interval, reflecting the symmetry of the distribution.

For the exponential distribution with rate parameter $\lambda > 0$, the density is $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ and 0 otherwise. The expected value is $E[X] = \int_{0}^{\infty} x \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda}$, which corresponds to the mean waiting time in a Poisson process with rate $\lambda$. Using the tail formula, since $F(x) = 1 - e^{-\lambda x}$ for $x \geq 0$, the expectation simplifies to $\int_{0}^{\infty} e^{-\lambda x} \, dx = \frac{1}{\lambda}$, confirming the result without direct integration against the density.
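
Both the density-integral and tail-formula computations for the exponential case can be verified numerically. A sketch using scipy's quadrature routine, with an arbitrary rate $\lambda = 2$ so that $E[X] = 0.5$:

```python
import math
from scipy.integrate import quad

lam = 2.0  # rate parameter; E[X] should be 1/lam = 0.5

# Density form: E[X] = integral of x * f(x) over [0, inf)
density_form, _ = quad(lambda x: x * lam * math.exp(-lam * x), 0, math.inf)

# Tail formula: E[X] = integral of 1 - F(x) = integral of exp(-lam * x)
tail_form, _ = quad(lambda x: math.exp(-lam * x), 0, math.inf)

print(density_form, tail_form)  # both approximately 0.5
```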

Advanced Definitions

General Real-Valued Random Variables

In measure-theoretic probability, the expected value of a real-valued random variable $X: \Omega \to \mathbb{R}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ is given by the Lebesgue integral $E[X] = \int_{\Omega} X(\omega) \, dP(\omega)$, provided this integral exists. This definition is equivalent to the integral with respect to the cumulative distribution function $F_X$ of $X$, $E[X] = \int_{-\infty}^{\infty} x \, dF_X(x)$, where the integral is understood in the Riemann–Stieltjes sense.

The expected value $E[X]$ is said to exist (and be finite) if and only if $E[|X|] < \infty$, where $E[|X|] = \int_{\Omega} |X(\omega)| \, dP(\omega)$. In cases where exactly one of $E[X^+]$ and $E[X^-]$ is infinite, $E[X]$ may be defined as $+\infty$ or $-\infty$ accordingly, but the absolute expectation is infinite.

This measure-theoretic formulation unifies the cases of discrete and continuous random variables: for discrete $X$ taking values in a countable set, the expectation reduces to an integral with respect to the counting measure on that set, recovering the summation form; for continuous $X$, it corresponds to integration with respect to Lebesgue measure weighted by the probability density function (when it exists).

As an illustration, consider a general Bernoulli random variable $X$ on $(\Omega, \mathcal{F}, P)$ such that $X(\omega) = 1$ if $\omega \in A \in \mathcal{F}$ with $P(A) = p \in [0,1]$ and $X(\omega) = 0$ otherwise. Then $E[X] = \int_{\Omega} X(\omega) \, dP(\omega) = 1 \cdot P(A) + 0 \cdot P(A^c) = p$, and $E[|X|] = p < \infty$.

Infinite Expected Values

In probability theory, the expected value $E[X]$ of a real-valued random variable $X$ is defined as $E[X^+] - E[X^-]$, where $X^+ = \max(X, 0)$ and $X^- = -\min(X, 0)$ are the positive and negative parts, respectively. If $E[X^+] = +\infty$ and $E[X^-] < \infty$, then $E[X] = +\infty$; similarly, $E[X] = -\infty$ if $E[X^-] = +\infty$ and $E[X^+] < \infty$. The expectation is undefined if both $E[X^+] = +\infty$ and $E[X^-] = +\infty$.

A classic illustration of an infinite expected value is the St. Petersburg paradox, first posed by Nicolaus Bernoulli in a 1713 letter and later analyzed by Daniel Bernoulli in 1738. In this game, a fair coin is flipped until the first heads appears on the $k$-th trial, yielding a payoff of $2^k$ units; the expected value is $\sum_{k=1}^\infty 2^k \cdot (1/2)^k = \sum_{k=1}^\infty 1 = +\infty$. Despite this infinite expectation, rational agents typically value the game at only a finite amount, often due to considerations of utility or risk aversion rather than the raw expectation.

Examples of distributions with infinite or undefined expectations include the Cauchy distribution and certain Pareto distributions. For the standard Cauchy distribution with probability density function $f(x) = \frac{1}{\pi(1 + x^2)}$ for $x \in \mathbb{R}$, the expectation is undefined because both $\int_{-\infty}^0 |x| f(x) \, dx = +\infty$ and $\int_0^\infty x f(x) \, dx = +\infty$. Similarly, for a Pareto distribution with shape parameter $\alpha \leq 1$ and minimum value $x_m > 0$, the density is $f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}}$ for $x \geq x_m$, and the expectation $E[X] = +\infty$ since the integral $\int_{x_m}^\infty x f(x) \, dx$ diverges.

Such infinite expectations have significant implications, particularly in limit theorems and applications. For instance, the strong law of large numbers fails to converge to a finite limit when the expectation is infinite; for nonnegative random variables with $E[X] = +\infty$, the sample average $\bar{X}_n$ satisfies $\bar{X}_n \to +\infty$ almost surely as $n \to \infty$. In finance and insurance, distributions with infinite means, such as heavy-tailed Pareto models for losses or returns, challenge traditional risk measures like value-at-risk, as extreme events dominate and standard averaging breaks down, necessitating alternative approaches like tail dependence or infinite-mean estimators.
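
The St. Petersburg game can be simulated to watch sample means fail to settle. A minimal sketch (trial counts arbitrary):

```python
import random

def st_petersburg_payoff() -> int:
    """Flip a fair coin until heads; payoff is 2^k if heads first appears on flip k."""
    k = 1
    while random.random() < 0.5:  # tails: keep flipping
        k += 1
    return 2 ** k

random.seed(1)
for n in (10**3, 10**4, 10**5, 10**6):
    mean = sum(st_petersburg_payoff() for _ in range(n)) / n
    # The sample mean keeps drifting upward (roughly like log2 of n) instead of settling
    print(f"n={n:>8}: sample mean = {mean:.1f}")
```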

Properties

Basic Properties

The expected value, often denoted $E[X]$ for a random variable $X$, possesses several fundamental algebraic properties that underpin its utility in probability theory. These properties hold under minimal assumptions, such as the finiteness of the expected value, and apply to both discrete and continuous random variables. They are derived directly from the definitions of expected value as a sum or integral, leveraging the linearity of summation and integration.

One of the most essential properties is linearity, which states that for any constants $a$ and $b$ and random variables $X$ and $Y$ (which may be dependent or independent), $E[aX + bY] = a E[X] + b E[Y]$. This holds regardless of the joint distribution of $X$ and $Y$, making it particularly powerful for computations involving sums of random variables. The proof follows from the definition: in the discrete case, $E[aX + bY] = \sum_{i,j} (a x_i + b y_j) P(X = x_i, Y = y_j) = a \sum_{i,j} x_i P(X = x_i, Y = y_j) + b \sum_{i,j} y_j P(X = x_i, Y = y_j) = a E[X] + b E[Y]$, using the linearity of sums; a similar argument applies to integrals in the continuous case.

Another basic property is monotonicity: if $X \leq Y$ almost surely (i.e., with probability 1), and both expected values are finite, then $E[X] \leq E[Y]$. This follows by applying linearity to $E[Y - X] = E[Y] - E[X]$ and noting that $Y - X \geq 0$ almost surely, which implies $E[Y - X] \geq 0$ (see non-negativity below). For a proof sketch in the discrete case, the sum $\sum_{i,j} (y_j - x_i) P(X = x_i, Y = y_j) \geq 0$ since each term is non-negative; integration yields the continuous analog.

Non-negativity asserts that if $X \geq 0$ almost surely, then $E[X] \geq 0$ (assuming finiteness). The proof is immediate from the definition, as the sum or integral of non-negative terms weighted by probabilities (which are non-negative) cannot be negative. This property extends naturally to the expected value of a constant: for any constant $c$, $E[c] = c$, since the random variable is constant with probability 1 and the sum or integral simplifies directly to $c$.

A useful consequence arises with indicator random variables. For an event $A$, the indicator $1_A$ (which equals 1 if $A$ occurs and 0 otherwise) has $E[1_A] = P(A)$, directly from the definition, since $E[1_A] = 1 \cdot P(A) + 0 \cdot (1 - P(A)) = P(A)$. This connection highlights how expected value generalizes probability measures.
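
Linearity with indicators often gives exact answers despite dependence. A standard illustration (not discussed above, added for concreteness) is the expected number of fixed points of a uniform random permutation, which linearity shows is exactly 1 for every $n$:

```python
import random

def fixed_points(n: int) -> int:
    """Count positions i with perm(i) = i in a uniform random permutation."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for i, v in enumerate(perm) if i == v)

# By linearity, E[sum of indicators 1{perm(i)=i}] = sum of P(perm(i)=i) = n * (1/n) = 1,
# even though the indicators are dependent.
n, trials = 100, 50_000
empirical = sum(fixed_points(n) for _ in range(trials)) / trials
print(f"empirical mean fixed points: {empirical:.3f} (theory: 1.0)")
```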

Inequalities

Markov's inequality is a fundamental result in probability theory that bounds the tail probability of a non-negative random variable using its expected value. For a non-negative random variable $X$ with finite expectation and any $a > 0$, $P(X \geq a) \leq \frac{E[X]}{a}$. This inequality holds under the assumption that $E[X] < \infty$, and it applies to both discrete and continuous random variables. The proof relies on the integral representation of the expectation for non-negative $X$: $E[X] = \int_0^\infty P(X \geq t) \, dt$. Since $P(X \geq t) \geq P(X \geq a)$ for $t \leq a$, we have $E[X] \geq \int_0^a P(X \geq t) \, dt \geq a \cdot P(X \geq a)$, leading directly to the bound. For non-negative integer-valued $X$, a similar argument uses the identity $E[X] = \sum_{k=1}^\infty P(X \geq k)$, since the first $a$ terms each dominate $P(X \geq a)$. Equality holds if $P(X = 0) + P(X = a) = 1$.

Chebyshev's inequality extends Markov's result to bound deviations from the mean using the variance. For a random variable $X$ with finite mean $\mu = E[X]$ and variance $\sigma^2 = \operatorname{Var}(X) < \infty$, and for any $k > 0$, $P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}$. This assumes the existence of the second moment $E[X^2] < \infty$. The inequality provides a distribution-free upper bound on the probability of large deviations. The proof follows by applying Markov's inequality to the non-negative random variable $Y = (X - \mu)^2$: $P(|X - \mu| \geq k \sigma) = P(Y \geq k^2 \sigma^2) \leq E[Y] / (k^2 \sigma^2) = \sigma^2 / (k^2 \sigma^2) = 1/k^2$. The bound is sharp: it is attained by the three-point distribution placing mass $1/(2k^2)$ at each of $\mu \pm k\sigma$ and the remaining mass at $\mu$.

Jensen's inequality relates the expected value of a function to the function of the expected value for convex functions. If $\phi$ is a convex function and $X$ is a random variable with finite expectation, then $\phi(E[X]) \leq E[\phi(X)]$, provided $E[|\phi(X)|] < \infty$. For concave $\phi$, the inequality reverses. This holds for real-valued random variables where the relevant moments exist. The proof uses the supporting-line characterization of convexity: there exist constants $\alpha, \beta$ such that $\phi(x) \geq \alpha + \beta x$ for all $x$, with equality at $x = E[X]$; taking expectations gives $E[\phi(X)] \geq \alpha + \beta E[X] = \phi(E[X])$. For twice-differentiable $\phi$, non-negativity of $\phi''$ implies convexity. Equality holds if $\phi$ is affine on the support of $X$ or if $X$ is constant almost surely.

Hölder's inequality generalizes the Cauchy–Schwarz inequality to bound the expectation of products using conjugate exponents. For random variables $X$ and $Y$ with finite moments $E[|X|^p] < \infty$ and $E[|Y|^q] < \infty$, where $p > 1$ and $q = p/(p-1)$ (so $1/p + 1/q = 1$), $|E[XY]| \leq \left( E[|X|^p] \right)^{1/p} \left( E[|Y|^q] \right)^{1/q}$. This assumes the $p$-th and $q$-th moments exist and are finite. The case $p = q = 2$ recovers Cauchy–Schwarz. The proof employs Young's inequality, $|xy| \leq |x|^p/p + |y|^q/q$: applying it to the normalized variables $X / (E[|X|^p])^{1/p}$ and $Y / (E[|Y|^q])^{1/q}$ and taking expectations yields the bound. Equality holds when $|X|^p$ and $|Y|^q$ are proportional almost surely.
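
A Monte Carlo sanity check of the Markov and Chebyshev bounds for a standard exponential variable ($E[X] = 1$, $\operatorname{Var}(X) = 1$); the empirical tail probabilities stay below the bounds, which are loose here:

```python
import random

random.seed(2)
xs = [random.expovariate(1.0) for _ in range(200_000)]  # E[X] = 1, Var(X) = 1
mean = sum(xs) / len(xs)

a = 3.0  # Markov: P(X >= a) <= E[X] / a
markov_lhs = sum(x >= a for x in xs) / len(xs)
print(f"P(X >= {a}) = {markov_lhs:.4f} <= E[X]/a = {mean / a:.4f}")

k = 2.0  # Chebyshev with sigma = 1: P(|X - mu| >= k) <= 1/k^2
cheb_lhs = sum(abs(x - mean) >= k for x in xs) / len(xs)
print(f"P(|X - mu| >= {k}) = {cheb_lhs:.4f} <= 1/k^2 = {1 / k**2:.4f}")
```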

Convergence and Limits

In probability theory, the expected value of a sequence of random variables does not necessarily converge to the expected value of the limit under mere pointwise or probabilistic convergence, necessitating specific conditions to interchange limits and expectations. These conditions arise from measure-theoretic foundations and ensure the preservation of integrability and the validity of limit operations on expectations.

The monotone convergence theorem provides one such condition for non-negative sequences. Specifically, if $(X_n)_{n=1}^\infty$ is a sequence of non-negative random variables such that $X_n \uparrow X$ (i.e., $0 \leq X_1(\omega) \leq X_2(\omega) \leq \cdots \to X(\omega)$ for almost every $\omega$), then $\mathbb{E}[X_n] \uparrow \mathbb{E}[X]$. This theorem guarantees that the expectations increase monotonically to the expectation of the limit, allowing the interchange of limit and expectation under monotonicity.

A more general result is the dominated convergence theorem, which relaxes the monotonicity requirement at the cost of an integrability bound. If $X_n \to X$ almost surely, and there exists a random variable $Y$ with $\mathbb{E}[|Y|] < \infty$ such that $|X_n| \leq Y$ for all $n$, then $\mathbb{E}[X_n] \to \mathbb{E}[X]$ and $\mathbb{E}[|X_n - X|] \to 0$. In probabilistic terms, the almost sure convergence can be weakened to convergence in probability under the same domination condition. This theorem is pivotal for establishing convergence of expectations in settings where sequences are bounded by an integrable dominator, such as in stochastic processes or limit theorems.

Even without domination or monotonicity, uniform integrability offers a sufficient condition for interchanging limits and expectations. A sequence $(X_n)$ is uniformly integrable if $\lim_{c \to \infty} \sup_n \mathbb{E}[|X_n| \mathbf{1}_{|X_n| \geq c}] = 0$. If $X_n \to X$ almost surely, $\mathbb{E}[|X_n|] < \infty$ for all $n$, and $(X_n)$ is uniformly integrable, then $\mathbb{E}[|X|] < \infty$ and $\mathbb{E}[X_n] \to \mathbb{E}[X]$. Uniform integrability controls the contribution of large tails uniformly across the sequence, ensuring $L^1$ convergence and thus the desired limit for expectations; under almost sure convergence it is equivalent to $\mathbb{E}[|X_n - X|] \to 0$.

Fatou's lemma provides an inequality rather than an equality, serving as a foundational tool for proving the above theorems. For a sequence of non-negative random variables $X_n \geq 0$, it states that $\mathbb{E}[\liminf_{n \to \infty} X_n] \leq \liminf_{n \to \infty} \mathbb{E}[X_n]$. This lower semicontinuity of the expectation functional holds without additional assumptions beyond non-negativity, bounding the expectation of the limit inferior by the limit inferior of the expectations.

Convergence in probability alone does not suffice to preserve expectations, as illustrated by counterexamples where the mass of the distribution "escapes" to infinity. Consider a uniform random variable $U$ on $[0,1]$, and define $X_n = n$ if $U \leq 1/n$ and $X_n = 0$ otherwise. Then $X_n \to 0$ in probability, since $\mathbb{P}(|X_n| > \epsilon) = 1/n \to 0$ for any $\epsilon > 0$, but $\mathbb{E}[X_n] = n \cdot (1/n) = 1 \not\to 0$. This "spiking" or "moving bump" phenomenon highlights the need for tail control, as the rare but large values prevent expectation convergence despite probabilistic convergence to zero.
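
The moving-bump counterexample can be reproduced numerically. A minimal sketch:

```python
import random

random.seed(3)
trials = 100_000
for n in (10, 100, 10_000):
    # X_n = n if U <= 1/n else 0, with U uniform on [0, 1]
    samples = [n if random.random() <= 1 / n else 0 for _ in range(trials)]
    prob_nonzero = sum(s > 0 for s in samples) / trials
    mean = sum(samples) / trials
    # P(X_n > 0) = 1/n -> 0, yet E[X_n] stays pinned at 1
    print(f"n={n:>6}: P(X_n > 0) ~ {prob_nonzero:.4f}, E[X_n] ~ {mean:.3f}")
```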

Expected Values of Distributions

Discrete Distributions

The expected value of a discrete random variable $X$ with probability mass function $p(x)$ is given by $E[X] = \sum_x x \, p(x)$, where the sum is over the support of $X$. For the Bernoulli distribution, $X$ takes values 0 or 1 with success probability $p$, so the PMF is $p(0) = 1 - p$ and $p(1) = p$. The expected value is $E[X] = 0 \cdot (1 - p) + 1 \cdot p = p$.

The binomial distribution models the number of successes in $n$ independent Bernoulli trials, each with success probability $p$. The PMF is $p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$ for $x = 0, 1, \dots, n$. The expected value follows from the linearity of expectation applied to the sum of $n$ indicator variables, yielding $E[X] = np$.

The negative binomial distribution counts the number of failures before the $r$-th success in independent Bernoulli trials with success probability $p$. The PMF is $p(x) = \binom{x + r - 1}{x} p^r (1 - p)^x$ for $x = 0, 1, 2, \dots$. The expected value is $E[X] = r(1 - p)/p$, derived by viewing $X$ as the sum of $r$ independent geometric random variables, each counting failures before a success.

The Poisson distribution with parameter $\lambda > 0$ models the number of events in a fixed interval, with PMF $p(k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$. The expected value is $E[X] = \sum_{k=0}^\infty k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda$, recognizing the sum as $e^\lambda$.

The geometric distribution, in the convention of trials until the first success, has PMF $p(x) = (1 - p)^{x-1} p$ for $x = 1, 2, 3, \dots$, where $p$ is the success probability. The expected value is $E[X] = \sum_{x=1}^\infty x (1 - p)^{x-1} p = \frac{1}{p}$, obtained by differentiating the geometric series $\sum_{x=0}^\infty q^x = 1/(1 - q)$ at $q = 1 - p$. These results are collected in the table below; a numerical check follows the table.
Distribution         Parameters               Expected value E[X]
Bernoulli            p ∈ (0,1)                p
Binomial             n ∈ ℕ, p ∈ (0,1)         np
Negative binomial    r ∈ ℕ, p ∈ (0,1)         r(1−p)/p
Poisson              λ > 0                    λ
Geometric            p ∈ (0,1)                1/p
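
The table's formulas can be checked against a library's conventions. A sketch assuming scipy, whose nbinom counts failures before the $r$-th success and whose geom counts trials until the first success, matching the conventions used here (parameter values arbitrary):

```python
from scipy import stats

p, n, r, lam = 0.3, 10, 4, 2.5

checks = [
    ("Bernoulli",         stats.bernoulli(p).mean(), p),
    ("Binomial",          stats.binom(n, p).mean(),  n * p),
    ("Negative binomial", stats.nbinom(r, p).mean(), r * (1 - p) / p),
    ("Poisson",           stats.poisson(lam).mean(), lam),
    ("Geometric",         stats.geom(p).mean(),      1 / p),
]
for name, lib_mean, formula in checks:
    # Library mean and closed-form expected value should agree
    print(f"{name:<18} library={lib_mean:.4f}  formula={formula:.4f}")
```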

Continuous Distributions

For continuous random variables, the expected value is defined as the integral of the product of the variable and its probability density function (pdf) over the entire real line, provided the integral converges absolutely: $E[X] = \int_{-\infty}^{\infty} x f(x) \, dx$, where $f(x)$ is the pdf. This contrasts with discrete cases by replacing summation with integration, providing the long-run average value under the distribution.

Common continuous distributions have closed-form expected values derived through direct integration. For the uniform distribution on $[a, b]$ with pdf $f(x) = \frac{1}{b-a}$ for $a \leq x \leq b$ (and 0 otherwise), the expected value is obtained by $E[X] = \int_a^b x \cdot \frac{1}{b-a} \, dx = \frac{a + b}{2}$. For the exponential distribution with rate parameter $\lambda > 0$ and pdf $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ (and 0 otherwise), integration by parts yields $E[X] = \int_0^{\infty} x \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda}$. The normal distribution $N(\mu, \sigma^2)$, with pdf $f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$, has expected value $E[X] = \mu$: substituting $u = x - \mu$ splits the integral into an odd integrand that vanishes by symmetry plus $\mu$ times the total probability.
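
A Monte Carlo check of these three means, with arbitrary parameter choices:

```python
import random

random.seed(4)
N = 500_000
a, b, lam, mu, sigma = 2.0, 10.0, 0.5, -1.0, 3.0

uniform_mean = sum(random.uniform(a, b) for _ in range(N)) / N   # (a+b)/2 = 6
expo_mean    = sum(random.expovariate(lam) for _ in range(N)) / N  # 1/lam = 2
normal_mean  = sum(random.gauss(mu, sigma) for _ in range(N)) / N  # mu = -1

print(f"uniform:     {uniform_mean:.3f} (theory {(a + b) / 2})")
print(f"exponential: {expo_mean:.3f} (theory {1 / lam})")
print(f"normal:      {normal_mean:.3f} (theory {mu})")
```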