Joint probability distribution
from Wikipedia

Many sample observations (black) are shown from a joint probability distribution. The marginal densities are shown as well (in blue and in red).

Given random variables $X, Y, \ldots$, that are defined on the same[1] probability space, the multivariate or joint probability distribution for $X, Y, \ldots$ is a probability distribution that gives the probability that each of $X, Y, \ldots$ falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables.

The joint probability distribution can be expressed in terms of a joint cumulative distribution function and either in terms of a joint probability density function (in the case of continuous variables) or joint probability mass function (in the case of discrete variables). These in turn can be used to find two other types of distributions: the marginal distribution giving the probabilities for any one of the variables with no reference to any specific ranges of values for the other variables, and the conditional probability distribution giving the probabilities for any subset of the variables conditional on particular values of the remaining variables.

Examples

Draws from an urn

Each of two urns contains twice as many red balls as blue balls, and no others, and one ball is randomly selected from each urn, with the two draws independent of each other. Let $A$ and $B$ be discrete random variables associated with the outcomes of the draw from the first urn and second urn respectively. The probability of drawing a red ball from either of the urns is 2/3, and the probability of drawing a blue ball is 1/3. The joint probability distribution is presented in the following table:

          A = Red             A = Blue            P(B)
B = Red   (2/3)(2/3) = 4/9    (1/3)(2/3) = 2/9    4/9 + 2/9 = 2/3
B = Blue  (2/3)(1/3) = 2/9    (1/3)(1/3) = 1/9    2/9 + 1/9 = 1/3
P(A)      4/9 + 2/9 = 2/3     2/9 + 1/9 = 1/3

Each of the four inner cells shows the probability of a particular combination of results from the two draws; these probabilities are the joint distribution. In any one cell the probability of a particular combination occurring is (since the draws are independent) the product of the probability of the specified result for A and the probability of the specified result for B. The probabilities in these four cells sum to 1, as with all probability distributions.

Moreover, the final row and the final column give the marginal probability distribution for $A$ and the marginal probability distribution for $B$ respectively. For example, for $A$ the first of these cells gives the sum of the probabilities for $A$ being red, regardless of which possibility for $B$ in the column above the cell occurs, as 2/3. Thus the marginal probability distribution for $A$ gives $A$'s probabilities unconditional on $B$, in a margin of the table.
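For readers who want to reproduce the table, here is a minimal Python sketch of the same computation (the probabilities are those of the example; the use of Python and the `Fraction` type is an illustrative choice, not part of the original example):

```python
from fractions import Fraction

# Marginal probabilities for a single urn draw (2/3 red, 1/3 blue).
p_A = {"Red": Fraction(2, 3), "Blue": Fraction(1, 3)}
p_B = {"Red": Fraction(2, 3), "Blue": Fraction(1, 3)}

# Joint distribution: because the draws are independent,
# P(A=a, B=b) = P(A=a) * P(B=b).
joint = {(a, b): p_A[a] * p_B[b] for a in p_A for b in p_B}
print(joint)  # {('Red', 'Red'): 4/9, ('Red', 'Blue'): 2/9, ...}

# The four joint probabilities sum to 1, as for any probability distribution.
assert sum(joint.values()) == 1

# Marginals recovered by summing rows/columns of the joint table.
marg_A = {a: sum(joint[(a, b)] for b in p_B) for a in p_A}
marg_B = {b: sum(joint[(a, b)] for a in p_A) for b in p_B}
print(marg_A, marg_B)  # both {'Red': 2/3, 'Blue': 1/3}
```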

Coin flips

Consider the flip of two fair coins; let $A$ and $B$ be discrete random variables associated with the outcomes of the first and second coin flips respectively. Each coin flip is a Bernoulli trial and has a Bernoulli distribution. If a coin displays "heads" then the associated random variable takes the value 1, and it takes the value 0 otherwise. The probability of each of these outcomes is 1/2, so the marginal (unconditional) density functions are

$$P(A) = 1/2 \quad \text{for } A \in \{0, 1\}; \qquad P(B) = 1/2 \quad \text{for } B \in \{0, 1\}.$$

The joint probability mass function of $A$ and $B$ defines probabilities for each pair of outcomes. All possible outcomes are $(A=0, B=0),\ (A=0, B=1),\ (A=1, B=0),\ (A=1, B=1)$. Since each outcome is equally likely, the joint probability mass function becomes

$$P(A, B) = 1/4 \quad \text{for } A, B \in \{0, 1\}.$$

Since the coin flips are independent, the joint probability mass function is the product of the marginals:

$$P(A, B) = P(A) P(B) \quad \text{for } A, B \in \{0, 1\}.$$

Rolling a die

Consider the roll of a fair die and let $A = 1$ if the number is even (i.e., 2, 4, or 6) and $A = 0$ otherwise. Furthermore, let $B = 1$ if the number is prime (i.e., 2, 3, or 5) and $B = 0$ otherwise.

Roll  1  2  3  4  5  6
A     0  1  0  1  0  1
B     0  1  1  0  1  0

Then, the joint distribution of $A$ and $B$, expressed as a probability mass function, is

$$\mathrm{P}(A=0, B=0) = P\{1\} = \tfrac{1}{6}, \qquad \mathrm{P}(A=1, B=1) = P\{2\} = \tfrac{1}{6},$$
$$\mathrm{P}(A=0, B=1) = P\{3, 5\} = \tfrac{2}{6}, \qquad \mathrm{P}(A=1, B=0) = P\{4, 6\} = \tfrac{2}{6}.$$

These probabilities necessarily sum to 1, since the probability of some combination of $A$ and $B$ occurring is 1.
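The joint probabilities above can be reproduced by enumerating the six equally likely faces; the following minimal Python sketch uses the indicator definitions from the example:

```python
from fractions import Fraction
from collections import defaultdict

faces = [1, 2, 3, 4, 5, 6]
joint = defaultdict(Fraction)  # joint pmf of (A, B)

for face in faces:
    A = 1 if face % 2 == 0 else 0        # even indicator
    B = 1 if face in (2, 3, 5) else 0    # prime indicator
    joint[(A, B)] += Fraction(1, 6)

print(dict(joint))
# {(0, 0): 1/6, (1, 1): 1/6, (0, 1): 1/3, (1, 0): 1/3}

# The joint pmf sums to 1 but does not factor into its marginals,
# e.g. P(A=1, B=1) = 1/6 while P(A=1) * P(B=1) = 1/2 * 1/2 = 1/4,
# so A and B are dependent random variables.
assert sum(joint.values()) == 1
```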

Marginal probability distribution

If more than one random variable is defined in a random experiment, it is important to distinguish between the joint probability distribution of X and Y and the probability distribution of each variable individually. The individual probability distribution of a random variable is referred to as its marginal probability distribution. In general, the marginal probability distribution of X can be determined from the joint probability distribution of X and other random variables.

If the joint probability density function of random variables $X$ and $Y$ is $f_{X,Y}(x,y)$, the marginal probability density functions of $X$ and $Y$, which define the marginal distributions, are given by:

$$f_X(x) = \int f_{X,Y}(x,y)\,dy, \qquad f_Y(y) = \int f_{X,Y}(x,y)\,dx,$$

where the first integral is over all points in the range of (X,Y) for which X=x and the second integral is over all points in the range of (X,Y) for which Y=y.[2]
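As a concrete illustration of this marginalization, the integrals can be carried out symbolically. The sketch below uses SymPy with an assumed example density $f_{X,Y}(x,y) = x + y$ on the unit square, a standard textbook choice that is not drawn from this article:

```python
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)

# Assumed joint density on the unit square [0,1] x [0,1]: f(x, y) = x + y.
f_xy = x + y

# Check that it integrates to 1 over its support.
print(sp.integrate(f_xy, (x, 0, 1), (y, 0, 1)))  # 1

# Marginal densities: integrate the other variable out over its range.
f_x = sp.integrate(f_xy, (y, 0, 1))   # f_X(x) = x + 1/2 on [0, 1]
f_y = sp.integrate(f_xy, (x, 0, 1))   # f_Y(y) = y + 1/2 on [0, 1]
print(f_x, f_y)
```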

Joint cumulative distribution function

For a pair of random variables $X, Y$, the joint cumulative distribution function (CDF) is given by[3]: 89 

$$F_{X,Y}(x,y) = \operatorname{P}(X \leq x, Y \leq y)$$    (Eq.1)

where the right-hand side represents the probability that the random variable $X$ takes on a value less than or equal to $x$ and that $Y$ takes on a value less than or equal to $y$.

For $N$ random variables $X_1, \ldots, X_N$, the joint CDF is given by

$$F_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \operatorname{P}(X_1 \leq x_1, \ldots, X_N \leq x_N)$$    (Eq.2)

Interpreting the $N$ random variables as a random vector $\mathbf{X} = (X_1, \ldots, X_N)^T$ yields a shorter notation:

$$F_{\mathbf{X}}(\mathbf{x}) = \operatorname{P}(X_1 \leq x_1, \ldots, X_N \leq x_N)$$

Joint density function or mass function

Discrete case

The joint probability mass function of two discrete random variables $X, Y$ is:

$$p_{X,Y}(x,y) = \mathrm{P}(X = x \ \text{and} \ Y = y)$$    (Eq.3)

or written in terms of conditional distributions

$$p_{X,Y}(x,y) = \mathrm{P}(Y = y \mid X = x) \cdot \mathrm{P}(X = x) = \mathrm{P}(X = x \mid Y = y) \cdot \mathrm{P}(Y = y),$$

where $\mathrm{P}(Y = y \mid X = x)$ is the probability of $Y = y$ given that $X = x$.

The generalization of the preceding two-variable case is the joint probability distribution of $n$ discrete random variables $X_1, X_2, \ldots, X_n$ which is:

$$p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \mathrm{P}(X_1 = x_1 \ \text{and} \ \ldots \ \text{and} \ X_n = x_n)$$    (Eq.4)

or equivalently

$$p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \mathrm{P}(X_1 = x_1) \cdot \mathrm{P}(X_2 = x_2 \mid X_1 = x_1) \cdot \mathrm{P}(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdots \mathrm{P}(X_n = x_n \mid X_1 = x_1, \ldots, X_{n-1} = x_{n-1}).$$

This identity is known as the chain rule of probability.
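A quick numerical sanity check of the chain rule on a toy joint probability mass function (the three-variable table below is an arbitrary illustrative example, not taken from this article):

```python
from itertools import product

# Arbitrary joint pmf over three binary variables (values chosen to sum to 1).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.20,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def prob(fixed):
    """P(the variables in `fixed` take the given values), by summing the joint pmf."""
    return sum(p for xs, p in joint.items()
               if all(xs[i] == v for i, v in fixed.items()))

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2).
# The check is an algebraic identity; it holds for any joint pmf with
# nonzero conditioning probabilities.
for x1, x2, x3 in product([0, 1], repeat=3):
    p1   = prob({0: x1})              # P(X1 = x1)
    p12  = prob({0: x1, 1: x2})       # P(X1 = x1, X2 = x2)
    p123 = joint[(x1, x2, x3)]        # P(X1 = x1, X2 = x2, X3 = x3)
    chain = p1 * (p12 / p1) * (p123 / p12)
    assert abs(chain - p123) < 1e-12
print("chain rule verified")
```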

Since these are probabilities, in the two-variable case

$$\sum_i \sum_j \mathrm{P}(X = x_i \ \text{and} \ Y = y_j) = 1,$$

which generalizes for $n$ discrete random variables $X_1, X_2, \ldots, X_n$ to

$$\sum_{i_1} \sum_{i_2} \cdots \sum_{i_n} \mathrm{P}(X_1 = x_{1 i_1}, X_2 = x_{2 i_2}, \ldots, X_n = x_{n i_n}) = 1.$$

Continuous case

The joint probability density function $f_{X,Y}(x,y)$ for two continuous random variables is defined as the derivative of the joint cumulative distribution function (see Eq.1):

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x \, \partial y}$$    (Eq.5)

This is equal to:

$$f_{X,Y}(x,y) = f_{Y \mid X}(y \mid x) f_X(x) = f_{X \mid Y}(x \mid y) f_Y(y)$$

where $f_{Y \mid X}(y \mid x)$ and $f_{X \mid Y}(x \mid y)$ are the conditional distributions of $Y$ given $X = x$ and of $X$ given $Y = y$ respectively, and $f_X(x)$ and $f_Y(y)$ are the marginal distributions for $X$ and $Y$ respectively.

The definition extends naturally to more than two random variables:

$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \frac{\partial^n F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\partial x_1 \cdots \partial x_n}$$    (Eq.6)

Again, since these are probability distributions, one has respectively

$$\int_x \int_y f_{X,Y}(x,y)\,dy\,dx = 1, \qquad \int_{x_1} \cdots \int_{x_n} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\,dx_n \cdots dx_1 = 1.$$
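Eq.5 can be illustrated symbolically; the sketch below assumes the joint CDF $F(x,y) = xy$ of two independent Uniform(0,1) variables on the unit square:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)

# Assumed joint CDF of two independent Uniform(0,1) variables, valid on [0,1]^2.
F = x * y

# Eq.5: the joint density is the mixed second partial derivative of the CDF.
f = sp.diff(F, x, y)
print(f)  # 1, i.e. constant density over the unit square

# Normalization: the density integrates to 1 over the support.
print(sp.integrate(f, (x, 0, 1), (y, 0, 1)))  # 1
```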

Mixed case

The "mixed joint density" may be defined where one or more random variables are continuous and the other random variables are discrete. With one variable of each type One example of a situation in which one may wish to find the cumulative distribution of one random variable which is continuous and another random variable which is discrete arises when one wishes to use a logistic regression in predicting the probability of a binary outcome Y conditional on the value of a continuously distributed outcome . One must use the "mixed" joint density when finding the cumulative distribution of this binary outcome because the input variables were initially defined in such a way that one could not collectively assign it either a probability density function or a probability mass function. Formally, is the probability density function of with respect to the product measure on the respective supports of and . Either of these two decompositions can then be used to recover the joint cumulative distribution function: The definition generalizes to a mixture of arbitrary numbers of discrete and continuous random variables.

Additional properties

Joint distribution for independent variables

In general two random variables $X$ and $Y$ are independent if and only if the joint cumulative distribution function satisfies

$$F_{X,Y}(x,y) = F_X(x) F_Y(y) \quad \text{for all } x, y.$$

Two discrete random variables $X$ and $Y$ are independent if and only if the joint probability mass function satisfies

$$P(X = x \ \text{and} \ Y = y) = P(X = x) P(Y = y)$$

for all $x$ and $y$.

As the number of independent random events grows, the related joint probability value decreases rapidly to zero, according to a negative exponential law.

Similarly, two absolutely continuous random variables are independent if and only if

$$f_{X,Y}(x,y) = f_X(x) f_Y(y)$$

for all $x$ and $y$. This means that acquiring any information about the value of one or more of the random variables leads to a conditional distribution of any other variable that is identical to its unconditional (marginal) distribution; thus no variable provides any information about any other variable.

Joint distribution for conditionally dependent variables

If a subset $A$ of the variables $X_1, \ldots, X_n$ is conditionally dependent given another subset $B$ of these variables, then the probability mass function of the joint distribution is $\mathrm{P}(X_1, \ldots, X_n)$, which is equal to $\mathrm{P}(B) \cdot \mathrm{P}(A \mid B)$. Therefore, it can be efficiently represented by the lower-dimensional probability distributions $\mathrm{P}(B)$ and $\mathrm{P}(A \mid B)$. Such conditional independence relations can be represented with a Bayesian network or copula functions.

Covariance

When two or more random variables are defined on a probability space, it is useful to describe how they vary together; that is, it is useful to measure the relationship between the variables. A common measure of the relationship between two random variables is the covariance. Covariance is a measure of the linear relationship between the random variables. If the relationship between the random variables is nonlinear, the covariance might not be sensitive to the relationship, meaning it does not capture any dependence between the variables beyond its linear component.

The covariance between the random variables $X$ and $Y$ is[4]

$$\operatorname{cov}(X,Y) = \sigma_{XY} = \mathrm{E}[(X - \mu_X)(Y - \mu_Y)] = \mathrm{E}[XY] - \mu_X \mu_Y.$$

Correlation

There is another measure of the relationship between two random variables that is often easier to interpret than the covariance.

The correlation just scales the covariance by the product of the standard deviation of each variable. Consequently, the correlation is a dimensionless quantity that can be used to compare the linear relationships between pairs of variables in different units. If the points in the joint probability distribution of X and Y that receive positive probability tend to fall along a line of positive (or negative) slope, ρXY is near +1 (or −1). If ρXY equals +1 or −1, it can be shown that the points in the joint probability distribution that receive positive probability fall exactly along a straight line. Two random variables with nonzero correlation are said to be correlated. Similar to covariance, the correlation is a measure of the linear relationship between random variables.

The correlation coefficient between the random variables $X$ and $Y$ is

$$\rho_{XY} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{E}[XY] - \mu_X \mu_Y}{\sigma_X \sigma_Y}.$$
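Both the covariance and the correlation coefficient can be computed directly from a joint probability mass function. The sketch below reuses the die example from earlier in the article ($A$ = even indicator, $B$ = prime indicator); the negative values reflect that even numbers and primes tend not to coincide on a die:

```python
from fractions import Fraction
from math import sqrt

# Joint pmf of (A, B) from the die example: A = even indicator, B = prime indicator.
joint = {(0, 0): Fraction(1, 6), (1, 1): Fraction(1, 6),
         (0, 1): Fraction(1, 3), (1, 0): Fraction(1, 3)}

E_A  = sum(a * p for (a, b), p in joint.items())
E_B  = sum(b * p for (a, b), p in joint.items())
E_AB = sum(a * b * p for (a, b), p in joint.items())

cov = E_AB - E_A * E_B                                            # covariance
var_A = sum((a - E_A) ** 2 * p for (a, b), p in joint.items())
var_B = sum((b - E_B) ** 2 * p for (a, b), p in joint.items())
rho = cov / sqrt(var_A * var_B)                                   # correlation

print(cov, rho)  # -1/12 and about -0.333, a mild negative linear relationship
```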

from Grokipedia
A joint probability distribution is a probability distribution that describes the simultaneous values of two or more random variables, specifying the probabilities for all possible combinations of their outcomes. For discrete random variables $X$ and $Y$, it is defined by the joint probability mass function $p_{X,Y}(x,y) = P(X = x, Y = y)$, where $p_{X,Y}(x,y) \geq 0$ for all $x, y$ in the support and $\sum_x \sum_y p_{X,Y}(x,y) = 1$. For continuous random variables, it is given by the joint probability density function $f_{X,Y}(x,y)$, where $f_{X,Y}(x,y) \geq 0$ and $\iint f_{X,Y}(x,y)\,dx\,dy = 1$ over the entire plane, with the probability over a region $A$ being $\iint_A f_{X,Y}(x,y)\,dx\,dy$. From the joint distribution, marginal distributions can be obtained by summing or integrating out the other variables; for discrete cases, the marginal PMF of $X$ is $p_X(x) = \sum_y p_{X,Y}(x,y)$, and similarly for continuous cases, $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$. Conditional distributions, which describe the probability of one variable given the value of another, are derived as $p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}$ for discrete variables (with $p_X(x) > 0$) and analogously for continuous variables via $f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$. Two random variables are independent if their joint distribution factors into the product of their marginal distributions, i.e., $p_{X,Y}(x,y) = p_X(x) p_Y(y)$ for discrete or $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ for continuous cases, implying that knowledge of one variable provides no information about the other. Joint distributions form the foundation for multivariate statistical analysis, enabling the study of dependencies, correlations, and expectations in systems with multiple interacting random variables.

Introduction

Definition

A joint probability distribution is the probability distribution of two or more random variables defined on the same probability space, which specifies the probabilities associated with all possible combinations or tuples of outcomes from those variables. This generalizes the univariate case by capturing the simultaneous behavior of multiple variables, including any dependencies between them. The concept presupposes familiarity with basic elements of probability theory, such as random variables—functions from a sample space to the real numbers—and the underlying probability space consisting of the sample space, event algebra, and probability measure. Formally, within Kolmogorov's axiomatic framework, for discrete random variables XX and YY, the joint distribution is given by the probability mass function P(X=x,Y=y)P(X = x, Y = y) for all possible values xx and yy in their respective supports, satisfying the axioms of non-negativity, normalization to 1, and additivity over disjoint events. For continuous random variables, the joint distribution is described by a probability density function fX,Y(x,y)f_{X,Y}(x,y) such that the probability of (X,Y)(X, Y) falling in a region AA is the double integral AfX,Y(x,y)dxdy\iint_A f_{X,Y}(x,y) \, dx \, dy, again adhering to the Kolmogorov axioms extended to product measures on R2\mathbb{R}^2. This foundational idea was rigorously formalized in the early 20th century by Andrey Kolmogorov through his axiomatic treatment of probability theory, which provided a measure-theoretic basis for handling multiple dimensions via product spaces. Marginal distributions arise from the joint by summing (discrete) or integrating (continuous) out the other variables, yielding the univariate distribution of a single variable.

Bivariate and Multivariate Cases

In the bivariate case, the joint probability distribution concerns two random variables, typically denoted as XX and YY. For discrete random variables, this distribution is described by the joint probability mass function pX,Y(x,y)=P(X=x,Y=y)p_{X,Y}(x,y) = P(X = x, Y = y), where the probabilities are non-negative and sum to 1 over all possible pairs (x,y)(x, y). For continuous random variables, the joint probability density function fX,Y(x,y)f_{X,Y}(x,y) is employed, satisfying fX,Y(x,y)dxdy=1\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx \, dy = 1, with probabilities computed via double integrals over regions in the plane. This formulation captures the simultaneous probabilistic behavior of XX and YY, enabling analysis of dependencies between them. The multivariate case extends the bivariate framework to n>2n > 2 random variables X1,,XnX_1, \dots, X_n. The joint probability mass function for discrete variables is p(x1,,xn)=P(X1=x1,,Xn=xn)p(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n), while for continuous variables, it is the joint probability density function f(x1,,xn)f(x_1, \dots, x_n) with the normalization f(x1,,xn)dx1dxn=1\int \cdots \int f(x_1, \dots, x_n) \, dx_1 \cdots dx_n = 1. Vector notation is common, representing the variables as a boldface X=(X1,,Xn)\mathbf{X} = (X_1, \dots, X_n)^\top, which emphasizes the multidimensional nature of the distribution. Bivariate joint distributions benefit from straightforward visualization tools, such as scatter plots to depict sample realizations or contour plots to illustrate density levels in the two-dimensional plane, facilitating intuitive understanding of relationships like correlation. In contrast, multivariate distributions introduce greater complexity due to higher dimensionality, where direct visualization becomes impractical beyond three dimensions; instead, projections onto lower-dimensional subspaces or advanced techniques like parallel coordinates are often necessary to explore the structure. The joint distribution in both cases fully characterizes the underlying probability space, specifying all possible outcomes and their probabilities for the collection of variables. A notable special case arises when the variables are independent, in which the joint distribution factors into the product of the individual marginal distributions.

Examples

Discrete Uniform Distributions

A discrete uniform joint probability distribution occurs when all possible outcomes in a finite sample space are equally likely, assigning the same probability to each joint event. This setup is common in basic probability models involving independent or symmetrically structured discrete random variables, such as coin flips or die rolls, allowing straightforward computation of joint probabilities as the reciprocal of the number of outcomes. Consider two fair coin flips, where the random variables $X$ and $Y$ represent the outcomes of the first and second flips, respectively, taking values heads (H) or tails (T). The sample space consists of four equally likely outcomes: (H,H), (H,T), (T,H), (T,T), each with joint probability mass function value $p_{X,Y}(x,y) = \frac{1}{4}$. This uniform distribution can be tabulated as follows:

X \ Y   H     T
H       1/4   1/4
T       1/4   1/4

The marginal distribution for a single flip is binomial with parameters $n = 1$ and $p = 1/2$. For two fair six-sided dice, let $X$ and $Y$ denote the outcomes of the first and second die, each ranging from 1 to 6. The sample space has 36 equally likely ordered pairs, so the joint PMF is $p_{X,Y}(x,y) = \frac{1}{36}$ for each $x, y \in \{1,2,3,4,5,6\}$. This uniform structure simplifies calculations for events like matching numbers or sums exceeding a threshold. These examples demonstrate uniform discrete joint distributions, where all outcomes are equiprobable, building intuition for more complex joint probability structures.

Continuous Uniform Distributions

In the continuous case, a joint uniform distribution arises when two random variables $X$ and $Y$ are uniformly distributed over a rectangular region in the plane, such as $[0, a] \times [0, b]$. The joint probability density function (PDF) is given by $f_{X,Y}(x,y) = \frac{1}{ab}$ for $0 \leq x \leq a$, $0 \leq y \leq b$, and zero elsewhere, ensuring the total probability integrates to 1 over the region. This setup models scenarios where outcomes are equally likely across a bounded area, analogous to the discrete uniform but using densities instead of masses. A common example involves two independent continuous uniform random variables on $[0,1]$, resulting in a joint uniform distribution over the unit square $[0,1] \times [0,1]$ with PDF $f_{X,Y}(x,y) = 1$ for $0 \leq x, y \leq 1$. This independence implies the joint PDF factors into the product of marginal uniforms, facilitating computations like expected values or probabilities within subregions. Such distributions are foundational for simulating random points in geometric probability problems, like Buffon's needle. For basic illustrations contrasting with non-uniform cases, consider waiting times modeled by independent exponentials, which yield a joint density decreasing away from the origin; however, the uniform case simplifies analysis by assuming constant density across the support. Visualizations of these joint uniforms often employ contour plots, showing level curves of constant density (flat within the rectangle), or 3D surface plots depicting a flat "roof" over the support region to emphasize uniformity.
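For the unit-square case, probabilities of subregions reduce to areas. The sketch below compares the exact value with a Monte Carlo estimate; the chosen region and sample size are arbitrary illustrative choices:

```python
import random

random.seed(0)

# X, Y independent Uniform(0,1): the joint density is 1 on the unit square,
# so P((X, Y) in A) is simply the area of A.  Take A = {x + y < 1/2}.
exact = 0.5 * 0.5 * 0.5          # area of the triangle x + y < 1/2, i.e. 1/8

n = 100_000
hits = sum(1 for _ in range(n)
           if random.random() + random.random() < 0.5)
print(exact, hits / n)           # both approximately 0.125
```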

Marginal Distributions

Computation from Joint

To compute the marginal distribution of one random variable from the joint distribution of multiple random variables, the joint probability mass function (PMF) or probability density function (PDF) serves as the starting point. For discrete random variables XX and YY with joint PMF pX,Y(x,y)p_{X,Y}(x,y), the marginal PMF of XX is obtained by summing the joint probabilities over all possible values of YY in its support: pX(x)=ypX,Y(x,y),p_X(x) = \sum_{y} p_{X,Y}(x,y), where the summation is taken over the support of YY. Similarly, the marginal PMF of YY is pY(y)=xpX,Y(x,y)p_Y(y) = \sum_{x} p_{X,Y}(x,y). This process "marginalizes out" the other variable by aggregating probabilities across its outcomes. For continuous random variables XX and YY with joint PDF fX,Y(x,y)f_{X,Y}(x,y), the marginal PDF of XX is found by integrating the joint PDF over all values of YY, typically over the real line or a bounded region depending on the support: fX(x)=fX,Y(x,y)dy.f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy. The marginal PDF of YY follows analogously as fY(y)=fX,Y(x,y)dxf_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx. Consider an example with two independent fair coin flips, where X=1X = 1 if the first coin lands heads and 00 otherwise, and Y=1Y = 1 if the second coin lands heads and 00 otherwise. The joint PMF is pX,Y(0,0)=pX,Y(0,1)=pX,Y(1,0)=pX,Y(1,1)=14p_{X,Y}(0,0) = p_{X,Y}(0,1) = p_{X,Y}(1,0) = p_{X,Y}(1,1) = \frac{1}{4}. The marginal PMF of XX is then pX(0)=pX,Y(0,0)+pX,Y(0,1)=14+14=12p_X(0) = p_{X,Y}(0,0) + p_{X,Y}(0,1) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2} and pX(1)=12p_X(1) = \frac{1}{2}, which is the Bernoulli distribution with parameter p=12p = \frac{1}{2}. To obtain the distribution of the total number of heads Z=X+YZ = X + Y, sum the joint probabilities over pairs (x,y)(x,y) such that x+y=zx + y = z: P(Z=0)=pX,Y(0,0)=14P(Z=0) = p_{X,Y}(0,0) = \frac{1}{4}, P(Z=1)=pX,Y(0,1)+pX,Y(1,0)=12P(Z=1) = p_{X,Y}(0,1) + p_{X,Y}(1,0) = \frac{1}{2}, and P(Z=2)=pX,Y(1,1)=14P(Z=2) = p_{X,Y}(1,1) = \frac{1}{4}, yielding the binomial distribution with parameters n=2n=2 and p=12p=\frac{1}{2}. While the joint distribution uniquely determines the marginal distributions, the converse does not hold: multiple joint distributions can produce the same marginals, as the marginals discard information about the dependence structure between the variables.

Interpretation and Uses

The marginal distribution of a random variable XX obtained from a joint distribution with another variable YY represents the probability distribution of XX in isolation, effectively marginalizing over or ignoring the possible values of YY. This interpretation allows analysts to focus on the standalone behavior of XX, treating it as if the joint context did not exist. In data analysis, marginal distributions serve to summarize the individual characteristics and patterns of single variables within multivariate datasets, facilitating easier visualization and preliminary insights into variable-specific trends. They are particularly essential for hypothesis testing focused on individual variables, such as assessing whether the distribution of XX aligns with a theoretical model or differs across groups, without needing to model inter-variable relationships. Marginal distributions retain the core properties of probability distributions, ensuring that their total probability sums to 1 in the discrete case or integrates to 1 in the continuous case. They also enable direct computation of key summary statistics, including the expected value E[X]E[X], given by E[X]=xxP(X=x)E[X] = \sum_x x \, P(X = x) for discrete XX. A fundamental limitation of marginal distributions is their inability to describe joint events or dependencies between variables; for example, covariance as a dependence measure requires information from the full joint distribution and cannot be derived solely from marginals.

Joint Cumulative Distribution Function

Definition and Properties

The joint cumulative distribution function (JCDF) of two random variables XX and YY, denoted FX,Y(x,y)F_{X,Y}(x,y), is defined as the probability that XX is less than or equal to xx and YY is less than or equal to yy, that is, FX,Y(x,y)=P(Xx,Yy)F_{X,Y}(x,y) = P(X \leq x, Y \leq y) for all real numbers xx and yy. This definition generalizes to the multivariate case for nn random variables X1,,XnX_1, \dots, X_n, where the JCDF is F(x1,,xn)=P(X1x1,,Xnxn)F(x_1, \dots, x_n) = P(X_1 \leq x_1, \dots, X_n \leq x_n) for all real numbers x1,,xnx_1, \dots, x_n. The JCDF exhibits several fundamental mathematical properties that ensure it corresponds to a valid probability distribution. It is non-decreasing in each argument, meaning that if any argument increases while others are fixed, the function value does not decrease. Additionally, it is right-continuous in each argument, so that limh0F(,xi+h,)=F(,xi,)\lim_{h \downarrow 0} F(\dots, x_i + h, \dots) = F(\dots, x_i, \dots) for each ii. The limits satisfy F(,,,y,)=0F(\dots, -\infty, \dots, y, \dots) = 0 for any fixed values of the other arguments, limx,yF(x,y)=0\lim_{x \to -\infty, y \to -\infty} F(x,y) = 0 (or the multivariate analog), and limx,yF(x,y)=1\lim_{x \to \infty, y \to \infty} F(x,y) = 1 (extending to all arguments approaching \infty). A key inequality property is that for all x1x2x_1 \leq x_2, y1y2y_1 \leq y_2, F(x2,y2)F(x2,y1)F(x1,y2)+F(x1,y1)0,F(x_2,y_2) - F(x_2,y_1) - F(x_1,y_2) + F(x_1,y_1) \geq 0, which generalizes to higher dimensions and guarantees non-negative probabilities for rectangular (or hyper-rectangular) regions in the support. The JCDF is also related to the joint survival function, defined as the probability that all variables exceed their respective thresholds, which can be derived from the JCDF and marginals using inclusion-exclusion principles. Importantly, the JCDF completely and uniquely determines the joint probability distribution of the random variables, as any two distributions with the same JCDF must coincide. Marginal cumulative distribution functions can be obtained from the JCDF by taking appropriate limits.
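The rectangle inequality can be checked numerically for any valid joint CDF; the sketch below uses the joint CDF of two independent standard normal variables, $F(x,y) = \Phi(x)\Phi(y)$, as an assumed example:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF

def F(x, y):
    # Joint CDF of two independent standard normal variables (assumed example).
    return phi(x) * phi(y)

# Probability of the rectangle (x1, x2] x (y1, y2] via the differencing formula.
x1, x2, y1, y2 = -1.0, 1.0, 0.0, 2.0
rect = F(x2, y2) - F(x2, y1) - F(x1, y2) + F(x1, y1)
print(rect)      # about 0.326, positive as the rectangle inequality guarantees
assert rect >= 0
```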

Relation to Individual CDFs

The marginal cumulative distribution function (CDF) of a single random variable can be derived from the joint CDF of multiple random variables by allowing the unspecified variables to extend to their full range. In the bivariate case, for jointly distributed random variables XX and YY, the marginal CDF of XX is obtained as FX(x)=limyFX,Y(x,y),F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y), where FX,Y(x,y)=P(Xx,Yy)F_{X,Y}(x, y) = P(X \leq x, Y \leq y). Similarly, the marginal CDF of YY is F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x, y).[](https://www.probabilitycourse.com/chapter5/5_2_2_joint_cdf.php) This derivation follows from the definition of the joint CDF, as the probability P(Xx)P(X \leq x) equals P(Xx,Y)P(X \leq x, Y \leq \infty), effectively integrating out the influence of YY. In the multivariate setting, the process generalizes to obtain the marginal CDF for any subset of variables by letting the remaining variables approach infinity. For random variables X1,,XnX_1, \dots, X_n, the marginal CDF of XiX_i is FXi(xi)=FX1,,Xn(,,,xi,,,),F_{X_i}(x_i) = F_{X_1, \dots, X_n}(\infty, \dots, \infty, x_i, \infty, \dots, \infty), with infinities placed in all positions except the ii-th. This marginalization preserves the distributional information for the subset while discarding details about the other variables. For illustration, consider XX and YY jointly uniformly distributed on the unit square [0,1]×[0,1][0,1] \times [0,1], which implies independence in this case. The joint CDF is FX,Y(x,y)=xyF_{X,Y}(x,y) = xy for 0x,y10 \leq x,y \leq 1, and thus the marginal CDF of XX simplifies to FX(x)=xF_X(x) = x for x[0,1]x \in [0,1], confirming that XX is marginally uniform on [0,1][0,1]. The joint CDF fully encapsulates the marginal CDFs of all individual variables, providing complete information about their univariate behaviors, but it additionally encodes the dependence structure among them, which cannot be recovered from the marginals alone. This relational property underscores the joint CDF's role as a comprehensive descriptor of multivariate distributions.

Joint Probability Functions

Probability Mass Function

The joint probability mass function (PMF) of two discrete random variables XX and YY is defined as pX,Y(x,y)=P(X=x,Y=y)p_{X,Y}(x,y) = P(X = x, Y = y) for all xx in the support of XX and yy in the support of YY, where the function assigns non-negative probabilities to each possible pair (x,y)(x, y) and satisfies xypX,Y(x,y)=1\sum_x \sum_y p_{X,Y}(x,y) = 1. This definition extends to multivariate cases with nn discrete random variables X1,,XnX_1, \dots, X_n, where the joint PMF pX1,,Xn(x1,,xn)=P(X1=x1,,Xn=xn)p_{X_1,\dots,X_n}(x_1,\dots,x_n) = P(X_1 = x_1, \dots, X_n = x_n) is non-negative and sums to 1 over all points in the joint support. For bivariate distributions, the joint PMF is commonly represented as a two-dimensional table, with rows corresponding to values of one variable and columns to the other, where each entry contains the probability pX,Y(x,y)p_{X,Y}(x,y). In the multivariate setting, it takes the form of a multi-dimensional array, analogous to a tensor, indexing probabilities across all combinations of the variables' values. The support of a joint PMF is countable, reflecting the discrete nature of the random variables involved, and each probability satisfies 0pX,Y(x,y)10 \leq p_{X,Y}(x,y) \leq 1. A key property is its role in computing expectations: for any function g(X,Y)g(X,Y) of the variables, the expected value is given by E[g(X,Y)]=xypX,Y(x,y)g(x,y)E[g(X,Y)] = \sum_x \sum_y p_{X,Y}(x,y) \, g(x,y), provided the sum exists. This orthogonality-like summation underpins derivations in probability theory, such as moments and generating functions. The joint PMF relates to the joint cumulative distribution function (CDF) FX,Y(x,y)=P(Xx,Yy)F_{X,Y}(x,y) = P(X \leq x, Y \leq y) through differencing: for integer-valued discrete variables, pX,Y(x,y)=FX,Y(x,y)FX,Y(x1,y)FX,Y(x,y1)+FX,Y(x1,y1)p_{X,Y}(x,y) = F_{X,Y}(x,y) - F_{X,Y}(x-1,y) - F_{X,Y}(x,y-1) + F_{X,Y}(x-1,y-1), where the limits at minus infinity are zero. This second-order finite difference recovers the point probabilities from the cumulative form, mirroring the univariate case but extended to two dimensions.

Probability Density Function

For continuous random variables XX and YY, the joint probability density function (PDF), denoted fX,Y(x,y)f_{X,Y}(x,y), is a non-negative function that describes the probability distribution of the pair (X,Y)(X, Y) with respect to the Lebesgue measure on R2\mathbb{R}^2. The probability that (X,Y)(X, Y) falls within a rectangular region [a,b]×[c,d][a, b] \times [c, d] is given by P(aXb,cYd)=abcdfX,Y(x,y)dydx,P(a \leq X \leq b, c \leq Y \leq d) = \int_a^b \int_c^d f_{X,Y}(x,y) \, dy \, dx, and the total integral over the entire plane must equal 1: fX,Y(x,y)dydx=1\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy \, dx = 1. The joint PDF satisfies fX,Y(x,y)0f_{X,Y}(x,y) \geq 0 for all x,yRx, y \in \mathbb{R}, ensuring that probabilities are non-negative. However, not all continuous joint distributions admit a PDF; singular distributions, such as those concentrated on a lower-dimensional subspace like a line in R2\mathbb{R}^2 (e.g., a degenerate bivariate normal with singular covariance matrix), do not have a density with respect to the full Lebesgue measure on R2\mathbb{R}^2. This concept extends to nn continuous random variables X1,,XnX_1, \dots, X_n, where the joint PDF fX1,,Xn(x1,,xn)f_{X_1,\dots,X_n}(x_1,\dots,x_n) is non-negative and integrates to 1 over Rn\mathbb{R}^n: RnfX1,,Xn(x1,,xn)dx1dxn=1,\int_{\mathbb{R}^n} f_{X_1,\dots,X_n}(x_1,\dots,x_n) \, dx_1 \cdots dx_n = 1, with probabilities for events defined analogously via multiple integrals. The joint PDF relates to the joint cumulative distribution function (CDF) FX,Y(x,y)=P(Xx,Yy)F_{X,Y}(x,y) = P(X \leq x, Y \leq y) through mixed partial differentiation, where the PDF exists: fX,Y(x,y)=2xyFX,Y(x,y)f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x,y). This contrasts with the discrete case, where the joint probability mass function uses summation instead of integration to compute probabilities.

Mixed Joint Distributions

A mixed joint distribution describes the joint behavior of random variables where at least one has a discrete support and another has a continuous support, combining elements of both probability mass functions and probability density functions. In such distributions, the probability assignment involves summing over the discrete values and integrating over the continuous range, rather than relying solely on one or the other. This structure is particularly useful when modeling phenomena where outcomes fall into categories (discrete) but associated measurements vary continuously. For instance, consider a scenario where XX is a discrete random variable taking values 0 or 1 with equal probability P(X=0)=P(X=1)=12P(X=0) = P(X=1) = \frac{1}{2}, and YY given XX follows a uniform distribution: YX=0Uniform[0,1]Y \mid X=0 \sim \text{Uniform}[0,1] and YX=1Uniform[0,2]Y \mid X=1 \sim \text{Uniform}[0,2]. The joint distribution is then given by fX,Y(x,y)=P(X=x)fYX(yx),f_{X,Y}(x,y) = P(X=x) f_{Y \mid X}(y \mid x), where fYX(y0)=1f_{Y \mid X}(y \mid 0) = 1 for 0y10 \leq y \leq 1 and 0 otherwise, and fYX(y1)=12f_{Y \mid X}(y \mid 1) = \frac{1}{2} for 0y20 \leq y \leq 2 and 0 otherwise. To compute a joint probability like P(0Y1,X=0)P(0 \leq Y \leq 1, X=0), one evaluates P(X=0)01fYX=0(y)dy=121=12P(X=0) \int_0^1 f_{Y \mid X=0}(y) \, dy = \frac{1}{2} \cdot 1 = \frac{1}{2}. Similarly, marginal probabilities for the continuous variable require integration weighted by the discrete masses. Properties of mixed joint distributions include the absence of a unified probability mass function (PMF) or probability density function (PDF) across all variables; instead, the distribution is characterized through conditional densities or generalized functions that account for the hybrid nature of the support. This often involves measure-theoretic concepts like disintegrations, where the joint measure decomposes into a discrete part and a family of conditional measures on the continuous subspace, ensuring the total probability integrates to 1. Such distributions lack a standard joint cumulative distribution function in the purely continuous sense but can be handled via the law of total probability adapted to mixed types. Applications of mixed joint distributions are prevalent in fields requiring hybrid modeling, such as survival analysis, where continuous covariates (e.g., biomarker levels) are jointly modeled with discrete event indicators (e.g., survival status). In this context, joint models link longitudinal continuous processes to time-to-event data using shared random effects, enabling predictions of survival probabilities conditional on observed trajectories. Similarly, in point processes, discrete event counts occur within continuous time or space, as seen in Poisson point processes where the intensity function governs the rate of discrete occurrences over a continuum, facilitating analysis of phenomena like earthquake occurrences or neural spikes.
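The two-part example above can be checked both exactly and by simulation; the following sketch uses the mixture weights and uniform supports given in the example:

```python
import random

random.seed(1)

# X is 0 or 1 with probability 1/2 each;
# Y | X=0 ~ Uniform[0,1] and Y | X=1 ~ Uniform[0,2].
def sample():
    x = random.randint(0, 1)
    y = random.uniform(0, 1) if x == 0 else random.uniform(0, 2)
    return x, y

# Exact value: P(0 <= Y <= 1, X = 0) = P(X=0) * integral_0^1 1 dy = 1/2.
exact = 0.5 * 1.0

n = 100_000
count = sum(1 for _ in range(n)
            if (xy := sample())[0] == 0 and 0 <= xy[1] <= 1)
print(exact, count / n)   # the simulated frequency approaches 1/2
```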

Independence and Dependence

Statistical Independence

In probability theory, two random variables XX and YY are statistically independent if their joint probability mass function satisfies P(X=x,Y=y)=P(X=x)P(Y=y)P(X = x, Y = y) = P(X = x) P(Y = y) for all xx and yy in their respective supports. For continuous random variables, independence holds if the joint probability density function factors as fX,Y(x,y)=fX(x)fY(y)f_{X,Y}(x,y) = f_X(x) f_Y(y) for all xx and yy. This factorization criterion serves as a direct test for independence: the joint distribution separates into the product of the marginal distributions if and only if the variables are independent. A key property of independent random variables is that their joint cumulative distribution function (CDF) is the product of the marginal CDFs: FX,Y(x,y)=FX(x)FY(y)F_{X,Y}(x,y) = F_X(x) F_Y(y) for all xx and yy. This holds for both discrete and continuous cases, as well as mixed distributions. Additionally, the marginal distributions remain unchanged under independence, meaning the distribution of XX (or YY) is identical regardless of the value of the other variable, reflecting the absence of influence between them. For a collection of random variables X1,,XnX_1, \dots, X_n, mutual independence requires that the joint probability mass or density function factors completely into the product of all individual marginals: pX1,,Xn(x1,,xn)=i=1npXi(xi)p_{X_1,\dots,X_n}(x_1,\dots,x_n) = \prod_{i=1}^n p_{X_i}(x_i) (or analogously for densities). This full factorization implies pairwise independence for every pair, though the converse does not hold in general. Under mutual independence, the joint CDF similarly factors as FX1,,Xn(x1,,xn)=i=1nFXi(xi)F_{X_1,\dots,X_n}(x_1,\dots,x_n) = \prod_{i=1}^n F_{X_i}(x_i). Independence also implies that the covariance between any pair of variables is zero, though zero covariance does not guarantee independence.

Conditional Dependence

In probability theory, conditional dependence refers to a relationship between two or more random variables where their joint behavior remains linked even after accounting for the information provided by a conditioning variable. Specifically, for random variables XX and YY given Z=zZ = z, conditional dependence holds if the conditional joint distribution satisfies fX,YZ(x,yz)fXZ(xz)fYZ(yz)f_{X,Y|Z}(x,y|z) \neq f_{X|Z}(x|z) \cdot f_{Y|Z}(y|z), meaning the probability of observing specific values of XX and YY together, given ZZ, cannot be expressed as the product of their separate conditional probabilities. This contrasts with unconditional dependence, where the joint distribution fX,Y(x,y)fX(x)fY(y)f_{X,Y}(x,y) \neq f_X(x) \cdot f_Y(y) without any conditioning, but partial conditioning on ZZ can either preserve, induce, or remove such linkages depending on the underlying structure of the variables. A classic example arises in Markov chains, where the sequence of states exhibits conditional dependence on the immediate past but independence from more distant history when conditioned appropriately. In a first-order Markov chain, the future state Xt+1X_{t+1} is conditionally independent of the entire past {X0,,Xt1}\{X_0, \dots, X_{t-1}\} given the current state XtX_t, implying that without this conditioning, Xt+1X_{t+1} and the distant past are dependent through the chain of influences; however, the conditioning on XtX_t breaks this long-range dependence, simplifying predictions to rely only on the present. This property, known as the Markov property, highlights how conditional dependence structures sequential processes by localizing dependencies. Conversely, conditioning can induce dependence between variables that are unconditionally independent. Consider two independent causes XX and YY (e.g., separate coin flips) both influencing a common effect ZZ (e.g., the total number of heads); unconditionally, XX and YY are independent, but conditioning on a specific value of ZZ, such as exactly one head, creates conditional dependence because the outcomes of XX and YY must now "explain" the observed ZZ jointly, altering their probabilistic relationship. This phenomenon, sometimes called collider bias, illustrates how partial conditioning can open new paths of dependence in causal structures. In graphical models like Bayesian networks, the presence or absence of conditional dependence plays a key role in structuring joint distributions. When variables exhibit conditional independence given their direct influences (parents), the joint probability factors into a product of simpler conditional distributions, P(X1,,Xn)=i=1nP(XiParents(Xi))P(X_1, \dots, X_n) = \prod_{i=1}^n P(X_i | \text{Parents}(X_i)), which reduces the complexity of representing high-dimensional joints from exponential to linear in the number of variables; however, where conditional dependence persists, additional parameters are needed to capture the non-factorable interactions. This factorization leverages conditional independence to enable efficient inference, but conditional dependence requires explicit modeling to avoid oversimplification.
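The collider effect described above can be made concrete with two independent fair coin flips $X$ and $Y$ and their sum $Z = X + Y$; conditioning on $Z = 1$ induces dependence between the flips. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Two independent fair coin flips (1 = heads).
joint = {(x, y): Fraction(1, 4) for x, y in product([0, 1], repeat=2)}

# Unconditionally, X and Y are independent: P(X=1, Y=1) = P(X=1) * P(Y=1).
assert joint[(1, 1)] == Fraction(1, 2) * Fraction(1, 2)

# Condition on the collider Z = X + Y = 1 (exactly one head).
given = {xy: p for xy, p in joint.items() if sum(xy) == 1}
norm = sum(given.values())
cond = {xy: p / norm for xy, p in given.items()}

# Given Z = 1, knowing X determines Y: P(X=1, Y=1 | Z=1) = 0, while
# P(X=1 | Z=1) * P(Y=1 | Z=1) = 1/2 * 1/2 = 1/4, so X and Y are
# conditionally dependent despite being marginally independent.
p_x1 = sum(p for (x, y), p in cond.items() if x == 1)
p_y1 = sum(p for (x, y), p in cond.items() if y == 1)
print(cond.get((1, 1), 0), p_x1 * p_y1)   # 0 versus 1/4
```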

Dependence Measures

Covariance

In the context of joint probability distributions, the covariance between two random variables XX and YY is defined as Cov(X,Y)=E[(XE[X])(YE[Y])],\operatorname{Cov}(X, Y) = \mathbb{E}\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right], which measures the expected value of the product of their deviations from respective means. This can equivalently be expressed as Cov(X,Y)=E[XY]E[X]E[Y].\operatorname{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]. For discrete random variables with joint probability mass function p(x,y)p(x, y), the covariance is computed as Cov(X,Y)=xy(xμX)(yμY)p(x,y),\operatorname{Cov}(X, Y) = \sum_{x} \sum_{y} (x - \mu_X)(y - \mu_Y) p(x, y), where μX=E[X]\mu_X = \mathbb{E}[X] and μY=E[Y]\mu_Y = \mathbb{E}[Y]. For continuous random variables with joint probability density function f(x,y)f(x, y), it is given by the double integral Cov(X,Y)=(xμX)(yμY)f(x,y)dxdy.\operatorname{Cov}(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f(x, y) \, dx \, dy. Covariance possesses several key properties. It is symmetric, satisfying Cov(X,Y)=Cov(Y,X)\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X). It is also bilinear, meaning that for constants a,b,c,da, b, c, d and random variables X,YX, Y, Cov(aX+b,cY+d)=acCov(X,Y)\operatorname{Cov}(aX + b, cY + d) = ac \operatorname{Cov}(X, Y). If XX and YY are independent (assuming finite variances), then Cov(X,Y)=0\operatorname{Cov}(X, Y) = 0, though the converse does not hold in general. Additionally, the units of covariance are the product of the units of XX and YY. The sign of the covariance provides insight into the linear dependence: a positive value indicates that XX and YY tend to deviate from their means in the same direction (both above or both below), while a negative value indicates deviations in opposite directions. A zero covariance implies no linear association, but does not rule out other forms of dependence.

Correlation Coefficient

The Pearson correlation coefficient, denoted ρX,Y\rho_{X,Y}, quantifies the strength and direction of the linear relationship between two random variables XX and YY in a joint probability distribution. It is defined as the covariance between XX and YY divided by the product of their standard deviations: ρX,Y=\Cov(X,Y)σXσY,\rho_{X,Y} = \frac{\Cov(X,Y)}{\sigma_X \sigma_Y}, where \Cov(X,Y)\Cov(X,Y) is the covariance and σX\sigma_X, σY\sigma_Y are the standard deviations. This standardization renders the coefficient unitless and comparable across different scales of measurement. The concept was introduced by Karl Pearson in his 1895 work on skew variation. The value of ρX,Y\rho_{X,Y} ranges from -1 to 1. A value of 1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 signifies that XX and YY are uncorrelated, meaning there is no linear association between them—though uncorrelated variables are not necessarily independent in general joint distributions. Key properties of the Pearson correlation coefficient include its invariance under linear transformations of the variables, such as affine shifts (adding constants) or scalings (multiplying by positive constants), which preserve the coefficient's value. In the specific case of a bivariate normal distribution, a correlation coefficient of zero is equivalent to statistical independence between XX and YY. Despite its utility, the Pearson correlation coefficient has limitations as a measure of dependence in joint distributions. It exclusively captures linear associations and remains insensitive to nonlinear relationships, potentially underestimating or missing strong dependencies that do not follow a straight-line pattern.

Common Named Distributions

Bivariate Normal Distribution

The bivariate normal distribution, also known as the Gaussian bivariate distribution, is a fundamental continuous joint probability distribution for two random variables $X$ and $Y$, characterized by its bell-shaped density and elliptical symmetry, which captures linear dependence between the variables. It is parameterized by the means $\mu_X$ and $\mu_Y$, the standard deviations $\sigma_X > 0$ and $\sigma_Y > 0$, and the correlation coefficient $\rho \in (-1, 1)$, where $\rho$ quantifies the strength and direction of the linear relationship between $X$ and $Y$. These parameters define the location, scale, and dependence structure, making the bivariate normal a cornerstone for modeling jointly normal phenomena in fields like finance, engineering, and biostatistics. The probability density function (PDF) of the bivariate normal distribution is given by
$$f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} \right] \right\}.$$
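As an illustration, the density can be evaluated and samples drawn to check the correlation parameter; the parameter values below are arbitrary, and NumPy is assumed to be available:

```python
import numpy as np

mu_x, mu_y = 0.0, 1.0
sigma_x, sigma_y, rho = 1.0, 2.0, 0.6   # arbitrary illustrative parameters

def bivariate_normal_pdf(x, y):
    """Density of the bivariate normal with the parameters above."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx**2 + zy**2 - 2 * rho * zx * zy) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2))

print(bivariate_normal_pdf(0.0, 1.0))   # density at the mean point

# Draw samples and confirm the empirical correlation is close to rho.
rng = np.random.default_rng(0)
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
samples = rng.multivariate_normal([mu_x, mu_y], cov, size=100_000)
print(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])  # approximately 0.6
```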