Marginal distribution
from Wikipedia

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables. This contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables.

Marginal variables are those variables in the subset of variables being retained. These concepts are "marginal" because they can be found by summing values in a table along rows or columns, and writing the sum in the margins of the table.[1] The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing (that is, focusing on the sums in the margin) over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.

The context here is that the theoretical studies being undertaken, or the data analysis being done, involves a wider set of random variables but that attention is being limited to a reduced number of those variables. In many applications, an analysis may start with a given collection of random variables, then first extend the set by defining new ones (such as the sum of the original random variables) and finally reduce the number by placing interest in the marginal distribution of a subset (such as the sum). Several different analyses may be done, each treating a different subset of variables as the marginal distribution.

Definition


Marginal probability mass function


Given a known joint distribution of two discrete random variables, say, X and Y, the marginal distribution of either variable – X for example – is the probability distribution of X when the values of Y are not taken into consideration. This can be calculated by summing the joint probability distribution over all values of Y. Naturally, the converse is also true: the marginal distribution can be obtained for Y by summing over the separate values of X.

p_X(x_i) = \sum_j p(x_i, y_j), and p_Y(y_j) = \sum_i p(x_i, y_j),

where p(x_i, y_j) denotes the joint probability mass function.
Y \ X        x1      x2      x3      x4      p_Y(y) ↓
y1           4/32    2/32    1/32    1/32    8/32
y2           3/32    6/32    3/32    3/32    15/32
y3           9/32    0       0       0       9/32
p_X(x) →     16/32   8/32    4/32    4/32    32/32
Joint and marginal distributions of a pair of discrete random variables, X and Y, dependent, thus having nonzero mutual information I(X; Y). The values of the joint distribution are in the 3×4 rectangle; the values of the marginal distributions are along the right and bottom margins.
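As an illustration (not part of the original article), the marginals in the table above can be recovered numerically by summing the joint probabilities along each axis of an array. The following Python sketch assumes NumPy is available and stores the 3×4 table with rows indexed by y and columns by x.

```python
import numpy as np

# Joint PMF p(x, y) from the table above: rows are y1..y3, columns are x1..x4.
joint = np.array([
    [4, 2, 1, 1],
    [3, 6, 3, 3],
    [9, 0, 0, 0],
]) / 32

p_X = joint.sum(axis=0)  # sum over y -> marginal of X: 16/32, 8/32, 4/32, 4/32
p_Y = joint.sum(axis=1)  # sum over x -> marginal of Y: 8/32, 15/32, 9/32

print(p_X)           # [0.5     0.25    0.125   0.125  ]
print(p_Y)           # [0.25    0.46875 0.28125]
print(joint.sum())   # 1.0
```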

A marginal probability can always be written as an expected value:

p_X(x) = \int_y p_{X \mid Y}(x \mid y) \, p_Y(y) \, dy = \operatorname{E}_Y\left[ p_{X \mid Y}(x \mid y) \right].

Intuitively, the marginal probability of X is computed by examining the conditional probability of X given a particular value of Y, and then averaging this conditional probability over the distribution of all values of Y.

This follows from the definition of expected value (after applying the law of the unconscious statistician):

\operatorname{E}_Y\left[ p_{X \mid Y}(x \mid y) \right] = \int_y p_{X \mid Y}(x \mid y) \, p_Y(y) \, dy.

Therefore, marginalization provides the rule for the transformation of the probability distribution of a random variable Y and another random variable X = g(Y):

p_X(x) = \int_y p_{X \mid Y}(x \mid y) \, p_Y(y) \, dy = \int_y \delta\big(x - g(y)\big) \, p_Y(y) \, dy.

Marginal probability density function


Given two continuous random variables X and Y whose joint distribution is known, the marginal probability density function can be obtained by integrating the joint probability density, f, over Y, and vice versa. That is,

f_X(x) = \int_c^d f(x, y) \, dy, and f_Y(y) = \int_a^b f(x, y) \, dx,

where x ∈ [a, b] and y ∈ [c, d].
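As a minimal numerical sketch (an assumed example, not from the article), the integration above can be carried out with SciPy's quad routine for a hypothetical joint density f(x, y) = x + y on the unit square, whose marginal is f_X(x) = x + 1/2.

```python
from scipy.integrate import quad

# Hypothetical joint density (assumed for illustration): f(x, y) = x + y on [0, 1] x [0, 1].
def f_joint(x, y):
    return x + y

def f_X(x):
    # Marginal density of X: integrate the joint density over the support of Y.
    value, _ = quad(lambda y: f_joint(x, y), 0.0, 1.0)
    return value

print(f_X(0.3))                 # ~0.8, matching the closed form x + 1/2
total, _ = quad(f_X, 0.0, 1.0)
print(total)                    # ~1.0, so the marginal is a valid density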

Marginal cumulative distribution function


Finding the marginal cumulative distribution function from the joint cumulative distribution function is easy. Recall that:

  • For discrete random variables, F(x, y) = P(X ≤ x, Y ≤ y) = \sum_{x' \le x} \sum_{y' \le y} p(x', y').
  • For continuous random variables, F(x, y) = \int_a^x \int_c^y f(x', y') \, dy' \, dx'.

If X and Y jointly take values on [a, b] × [c, d], then

F_X(x) = F(x, d) and F_Y(y) = F(b, y).

If d is ∞, then this becomes a limit F_X(x) = \lim_{y \to \infty} F(x, y). Likewise for F_Y(y).
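A small sketch of this relationship, under the assumption (not from the article) that X and Y are independent Uniform(0, 1) variables: the joint CDF on [0, 1] × [0, 1] is F(x, y) = xy, so the marginal CDF of X is F(x, d) with d = 1.

```python
# Assumed example: X, Y independent Uniform(0, 1), joint CDF F(x, y) = x * y on [0, 1] x [0, 1].
def F_joint(x, y):
    return max(0.0, min(x, 1.0)) * max(0.0, min(y, 1.0))

def F_X(x):
    # Marginal CDF of X: evaluate the joint CDF at the top of Y's range (d = 1).
    return F_joint(x, 1.0)

print(F_X(0.4))  # 0.4, the Uniform(0, 1) CDF evaluated at 0.4
```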

Marginal distribution vs. conditional distribution


Definition


The marginal probability is the probability of a single event occurring, independent of other events. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. This means that the calculation for one variable is dependent on another variable.[2]

The conditional distribution of a variable given another variable is the joint distribution of both variables divided by the marginal distribution of the other variable.[3] That is,

  • For discrete random variables, p_{Y \mid X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}.
  • For continuous random variables, f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.

Example


Suppose there is data from a classroom of 200 students on the amount of time studied (X) and the percentage of correct answers (Y).[4] Assuming that X and Y are discrete random variables, the joint distribution of X and Y can be described by listing all the possible values of p(xi, yj), as shown in the table below.

                    Time studied in minutes (X)
% correct (Y)       x1 (0-20)   x2 (21-40)   x3 (41-60)   x4 (>60)    p_Y(y) ↓
y1 (0-20)           2/200       0            0            8/200       10/200
y2 (21-40)          10/200      2/200        8/200        0           20/200
y3 (41-59)          2/200       4/200        32/200       32/200      70/200
y4 (60-79)          0           20/200       30/200       10/200      60/200
y5 (80-100)         0           4/200        16/200       20/200      40/200
p_X(x) →            14/200      30/200       86/200       70/200      1
Two-way table of the relationship, in a classroom of 200 students, between the amount of time studied and the percentage of correct answers

The marginal distribution can be used to determine how many students scored 20 or below: p_Y(y_1) = p_{X,Y}(x_1, y_1) + p_{X,Y}(x_2, y_1) + p_{X,Y}(x_3, y_1) + p_{X,Y}(x_4, y_1) = \frac{2}{200} + 0 + 0 + \frac{8}{200} = \frac{10}{200}, meaning 10 students or 5%.

The conditional distribution can be used to determine the probability that a student who studied 60 minutes or more obtains a score of 20 or below: p_{Y \mid X}(y_1 \mid x_4) = \frac{P(X = x_4, Y = y_1)}{P(X = x_4)} = \frac{8/200}{70/200} = \frac{8}{70} \approx 0.11, meaning there is about an 11% probability of scoring 20 or below after having studied for at least 60 minutes.
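The two calculations above can be checked programmatically; the sketch below (not part of the original article) encodes the study-time table as a NumPy array and recomputes the marginal p_Y(y_1) and the conditional P(Y = y_1 | X = x_4).

```python
import numpy as np

# Joint PMF from the study-time table: rows are y1..y5, columns are x1..x4 (counts out of 200).
joint = np.array([
    [ 2,  0,  0,  8],
    [10,  2,  8,  0],
    [ 2,  4, 32, 32],
    [ 0, 20, 30, 10],
    [ 0,  4, 16, 20],
]) / 200

p_Y = joint.sum(axis=1)       # marginal of Y (score band)
p_X = joint.sum(axis=0)       # marginal of X (time studied)

print(p_Y[0])                 # 0.05 -> 10 of 200 students scored 20 or below
print(joint[0, 3] / p_X[3])   # P(Y = y1 | X = x4) = (8/200) / (70/200) ~ 0.114
```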

Real-world example


Suppose that the probability that a pedestrian will be hit by a car, while crossing the road at a pedestrian crossing, without paying attention to the traffic light, is to be computed. Let H be a discrete random variable taking one value from {Hit, Not Hit}. Let L (for traffic light) be a discrete random variable taking one value from {Red, Yellow, Green}.

Realistically, H will be dependent on L. That is, P(H = Hit) will take different values depending on whether L is red, yellow or green (and likewise for P(H = Not Hit)). A person is, for example, far more likely to be hit by a car when trying to cross while the lights for perpendicular traffic are green than if they are red. In other words, for any given possible pair of values for H and L, one must consider the joint probability distribution of H and L to find the probability of that pair of events occurring together if the pedestrian ignores the state of the light.

However, in trying to calculate the marginal probability P(H = Hit), what is being sought is the probability that H = Hit in the situation in which the particular value of L is unknown and in which the pedestrian ignores the state of the light. In general, a pedestrian can be hit if the lights are red OR if the lights are yellow OR if the lights are green. So, the answer for the marginal probability can be found by summing P(H | L) for all possible values of L, with each value of L weighted by its probability of occurring.

Here is a table showing the conditional probabilities of being hit, depending on the state of the lights. (Note that the columns in this table must add up to 1 because the probability of being hit or not hit is 1 regardless of the state of the light.)

Conditional distribution P(H | L):

H \ L        Red     Yellow   Green
Not Hit      0.99    0.9      0.2
Hit          0.01    0.1      0.8

To find the joint probability distribution, more data is required. For example, suppose P(L = red) = 0.2, P(L = yellow) = 0.1, and P(L = green) = 0.7. Multiplying each column in the conditional distribution by the probability of that column occurring results in the joint probability distribution of H and L, given in the central 2×3 block of entries. (Note that the cells in this 2×3 block add up to 1).

Joint distribution P(H, L):

H \ L        Red     Yellow   Green    Marginal probability P(H)
Not Hit      0.198   0.09     0.14     0.428
Hit          0.002   0.01     0.56     0.572
Total        0.2     0.1      0.7      1

The marginal probability P(H = Hit) is the sum 0.572 along the H = Hit row of this joint distribution table, as this is the probability of being hit when the lights are red OR yellow OR green. Similarly, the marginal probability P(H = Not Hit) is the sum 0.428 along the H = Not Hit row.
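The same computation can be written as a short sketch (not from the original article): multiplying each conditional column by the corresponding probability of the light state gives the joint table, and summing across light states recovers the marginal of H.

```python
import numpy as np

# Conditional P(H | L): columns are L = Red, Yellow, Green; rows are Not Hit, Hit.
p_H_given_L = np.array([
    [0.99, 0.9, 0.2],
    [0.01, 0.1, 0.8],
])
p_L = np.array([0.2, 0.1, 0.7])   # marginal distribution of the light state

joint = p_H_given_L * p_L         # joint P(H, L): each column scaled by P(L)
p_H = joint.sum(axis=1)           # marginalize out L

print(joint)                      # [[0.198 0.09  0.14 ], [0.002 0.01  0.56 ]]
print(p_H)                        # [0.428 0.572] -> P(H = Hit) = 0.572
```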

Multivariate distributions

Many samples from a bivariate normal distribution. The marginal distributions are shown in red and blue. The marginal distribution of X is also approximated by creating a histogram of the X coordinates without consideration of the Y coordinates.

For multivariate distributions, formulae similar to those above apply with the symbols X and/or Y being interpreted as vectors. In particular, each summation or integration would be over all variables except those contained in X.[5]

That means, if X1, X2, …, Xn are discrete random variables, then the marginal probability mass function is

p_{X_i}(x_i) = \sum_{x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n} p(x_1, \ldots, x_n);

if X1, X2, …, Xn are continuous random variables, then the marginal probability density function is

f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n) \, dx_1 \cdots dx_{i-1} \, dx_{i+1} \cdots dx_n.
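For a concrete picture of marginalizing in more than two dimensions, the sketch below (an assumed example, not from the article) stores a three-variable joint PMF as a 3-D array and sums over the axes of the variables being discarded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed joint PMF over three discrete variables X1, X2, X3 with 2, 3 and 4 levels.
joint = rng.random((2, 3, 4))
joint /= joint.sum()              # normalize so the entries sum to 1

p_X1 = joint.sum(axis=(1, 2))     # marginal of X1: sum out X2 and X3
p_X1_X2 = joint.sum(axis=2)       # joint marginal of (X1, X2): sum out X3 only

print(p_X1.sum(), p_X1_X2.sum())  # both 1.0 up to floating-point error
```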

from Grokipedia
In probability theory and statistics, a marginal distribution is the probability distribution of a single random variable or a subset of random variables derived from a joint probability distribution by summing or integrating out the probabilities of the other variables. This process effectively ignores the dependencies on the excluded variables, providing the unconditional probability distribution for the variable(s) of interest. For discrete random variables, the marginal probability mass function is obtained by summing the joint probabilities over all possible values of the other variables, as in f_X(x) = \sum_y f_{X,Y}(x, y). In the continuous case, the marginal probability density function results from integrating the joint density, such as f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy.

Marginal distributions are fundamental in multivariate analysis because they allow researchers to focus on individual variables without considering interactions, which is essential for tasks like hypothesis testing, simulation, and model simplification. For instance, in a bivariate joint distribution table, the marginal probabilities appear in the row and column totals, representing the distributions of each variable alone. They differ from conditional distributions, which account for the value of another variable, and from the full joint distribution, which captures all interdependencies. Understanding marginals is crucial in Bayesian inference, where summing out latent variables yields predictive distributions, and in machine learning, where hidden states in probabilistic models are marginalized over. The concept extends to higher dimensions, where marginalizing over multiple variables produces the distribution for any subset, facilitating the analysis of complex datasets. Properties of marginal distributions include preserving moments such as the means and variances of the retained variables and enabling the calculation of expectations via iterated integrals. In practice, computing marginals analytically is straightforward for simple cases but often requires numerical methods or approximations for high-dimensional or non-standard distributions.

Definition

General concept

In probability theory and statistics, a marginal distribution is the probability distribution of one or more random variables from a larger set, derived from their joint distribution by summing or integrating out the probabilities associated with the remaining variables. This process, known as marginalization, effectively isolates the distribution of the variables of interest while preserving the total probability mass or density. The resulting marginal distribution captures the behavior of the selected variables without regard to the specific values of the others, making it a fundamental tool for simplifying multivariate analyses.

The term "marginal distribution" originates from the practice of recording totals in the margins of joint probability tables, a convention that emerged in early 20th-century statistics with the development of contingency table analysis around 1900. Intuitively, obtaining a marginal distribution is akin to collapsing a multi-dimensional table into a lower-dimensional one by summing the entries along rows or columns, thereby focusing on the totals for the variables of interest. In general, for two random variables X and Y with joint distribution P(X, Y), the marginal distribution of X, denoted P(X), is obtained through the marginalization operation over Y. This framework extends naturally to subsets of any collection of random variables, providing a way to extract univariate or lower-dimensional distributions from more complex joint structures.

Discrete case

In the discrete case, the marginal distribution of a random variable X from a joint distribution of discrete random variables X and Y is defined by its probability mass function (PMF), given by

p_X(x) = \sum_y p_{X,Y}(x, y),

where the sum is over all possible values of Y, and p_{X,Y}(x, y) is the joint PMF. This formula extracts the distribution of X by aggregating the joint probabilities across the support of Y. Similarly, the marginal PMF for Y is p_Y(y) = \sum_x p_{X,Y}(x, y). The marginal cumulative distribution function (CDF) for the discrete variable X is then obtained by summing the marginal PMF up to x:

F_X(x) = \sum_{k \le x} p_X(k),

where the sum is over all discrete points k in the support of X that are less than or equal to x. This CDF fully characterizes the marginal distribution, reflecting the countable nature of the outcomes.

To compute the marginal PMF in practice, one constructs a joint PMF table representing p_{X,Y}(x, y) for the finite or countable supports of X and Y, then sums the entries along the rows (for p_X(x)) or columns (for p_Y(y)). For instance, consider a simple bivariate discrete distribution where X takes values {1, 2} and Y takes values {a, b}, with the following joint PMF table:
x \ y              y = a    y = b    Marginal p_X(x)
x = 1              0.2      0.3      0.5
x = 2              0.1      0.4      0.5
Marginal p_Y(y)    0.3      0.7      1.0
Here, the marginal for X = 1 is 0.2 + 0.3 = 0.5, and similarly for the other values, ensuring the marginals sum to 1 due to the total probability theorem. This tabular approach facilitates verification that \sum_x p_X(x) = 1 and \sum_y p_Y(y) = 1, a property inherent to the additivity of discrete probabilities. In statistical applications, these marginal distributions often appear in the margins of contingency tables, which tabulate observed frequencies analogous to joint PMFs in probability models, allowing inference on individual variables while ignoring associations. This discrete framework contrasts with the continuous case, where integration replaces summation to obtain marginals.

Continuous case

In the continuous case, the marginal distribution of a random variable is derived from the joint probability density function (PDF) of two or more continuous random variables. For jointly continuous random variables X and Y with joint PDF f_{X,Y}(x, y), the marginal PDF of X, denoted f_X(x), is obtained by integrating the joint PDF over all possible values of Y:

f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy.

This integral represents the total probability density associated with each value of x, marginalizing out the dependence on y. The limits of integration must align with the support of the joint distribution to ensure the result is well defined; if the joint PDF has a restricted support region (e.g., y only defined for 0 ≤ y ≤ 1 given x), the integral bounds are adjusted accordingly, such as \int_0^1 f_{X,Y}(x, y) \, dy, rather than extending to infinity. Improper integrals arise naturally when the support is unbounded, but the joint PDF's normalization guarantees that f_X(x) integrates to 1 over its domain.

The marginal cumulative distribution function (CDF) of X, F_X(x) = P(X ≤ x), follows from the marginal PDF as

F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt.

This can also be expressed directly via the joint PDF by iterated integration, F_X(x) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{X,Y}(t, y) \, dy \, dt, though the marginal PDF route is typically more straightforward for computation. Unlike the discrete case, which relies on summation over probability mass functions, the continuous marginal distribution involves probability densities and requires integration, often necessitating techniques such as a change of variables (e.g., Jacobian transformations) when the joint PDF is expressed in non-Cartesian coordinates or over complex support regions.
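To illustrate the role of the support in setting the integration limits, a symbolic sketch (an assumed example, not from the article) uses SymPy with the joint density f(x, y) = 8xy on the triangle 0 < y < x < 1, where the inner integral runs only from 0 to x.

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)

# Assumed joint density on the triangle 0 < y < x < 1.
f_joint = 8 * x * y

# Marginal of X: integrate over the support of Y given x, i.e. y in (0, x).
f_X = sp.integrate(f_joint, (y, 0, x))
print(f_X)                               # 4*x**3

# The marginal integrates to 1 over the support of X, confirming it is a valid density.
print(sp.integrate(f_X, (x, 0, 1)))      # 1
```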

Relations to Other Distributions

With joint distributions

Marginal distributions are obtained from joint distributions through the process of marginalization, which involves summing the probability mass function (PMF) over the possible values of the other variables in the discrete case, or integrating the probability density function (PDF) over the range of the other variables in the continuous case. For two discrete random variables X and Y, the marginal PMF of X is given by

P_X(x) = \sum_y P_{X,Y}(x, y),

where the sum is taken over all possible values of Y. Similarly, for continuous variables, the marginal PDF of X is f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy. This marginalization acts as a projection from the multidimensional joint space onto the lower-dimensional space of the variable of interest, effectively discarding information about dependencies between variables. The joint distribution cannot be uniquely recovered from the marginal distributions alone, as multiple joint distributions can yield the same marginals; reconstruction requires supplementary information, such as conditional distributions. In multivariate joint distributions involving more than two variables, the operation of marginalization exhibits associativity: the marginal distribution for a subset of variables remains the same irrespective of the sequence in which the remaining variables are marginalized out, due to the associative nature of summation and integration.

A key implication arises under independence: if random variables X and Y are independent, their joint distribution factors into the product of their marginal distributions, P_{X,Y}(x, y) = P_X(x) P_Y(y), ensuring that the marginal of X extracted from the joint coincides precisely with its standalone marginal, with no influence from Y. This factorization highlights how independence eliminates dependencies, simplifying the relationship between joint and marginal forms. The connection between joint, marginal, and conditional distributions is bridged by the formula expressing the joint PMF in terms of the marginal and conditional: P_{X,Y}(x, y) = P_X(x) P_{Y|X}(y | x), where P_{Y|X}(y | x) is the conditional PMF of Y given X = x. An analogous relation holds for PDFs: f_{X,Y}(x, y) = f_X(x) f_{Y|X}(y | x). This decomposition underscores the complementary role of marginal and conditional components in fully specifying the joint distribution.
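The non-uniqueness point is easy to see numerically; in the sketch below (not from the original article), an independent joint and a strongly dependent joint over two binary variables produce exactly the same row and column marginals.

```python
import numpy as np

# Two different joint PMFs over binary X and Y with identical marginals (0.5, 0.5).
joint_independent = np.array([[0.25, 0.25],
                              [0.25, 0.25]])   # X and Y independent
joint_dependent = np.array([[0.40, 0.10],
                            [0.10, 0.40]])     # X and Y dependent

for joint in (joint_independent, joint_dependent):
    print(joint.sum(axis=1), joint.sum(axis=0))  # both print [0.5 0.5] [0.5 0.5]
```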

With conditional distributions

In probability theory, the conditional distribution describes the probability distribution of one random variable given the value of another. For discrete random variables X and Y, the conditional probability mass function is defined as

P_{X \mid Y}(x \mid y) = \frac{P_{X,Y}(x, y)}{P_Y(y)},

provided that P_Y(y) > 0, where P_{X,Y}(x, y) is the joint probability mass function and P_Y(y) is the marginal probability mass function of Y. For continuous random variables, the conditional probability density function is similarly given by

f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},

assuming f_Y(y) > 0, with f_{X,Y}(x, y) denoting the joint density and f_Y(y) the marginal density of Y. Both marginal and conditional distributions are derived from the joint distribution of the variables.

A key distinction between marginal and conditional distributions lies in how they handle the conditioning variable. The marginal distribution of X ignores Y entirely by averaging over all possible values of Y through summation or integration of the joint distribution, providing an unconditional view of X's behavior. In contrast, the conditional distribution fixes Y to a specific value y, restricting the analysis to the subset of outcomes where Y = y and revealing how X behaves under that condition. This difference underscores their complementary roles in probabilistic modeling: marginals capture the overall, unconditional characteristics of a variable, while conditionals account for dependencies and provide context-specific insights. In probabilistic inference, marginal distributions are used to assess the general properties of a random variable without additional constraints, such as computing expected values or variances in isolation. Conditional distributions, however, play a central role in scenario-based reasoning, enabling updates to beliefs based on observed evidence; for instance, they form the basis of Bayes' theorem, where the posterior distribution is a conditional distribution proportional to the likelihood times the prior. A fundamental property linking the two is that the marginal distribution of X can be obtained by averaging the conditional distribution over the marginal of Y, as per the law of total probability: P_X(x) = \sum_y P_{X \mid Y}(x \mid y) P_Y(y) in the discrete case, or the integral analog f_X(x) = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y) f_Y(y) \, dy in the continuous case, demonstrating that marginalizing over the conditional recovers the unconditional marginal.

Examples and Applications

Bivariate example

A classic example of a bivariate discrete distribution arises when rolling two six-sided dice. Let X denote the outcome of the first die and Y the outcome of the second die, each taking values in {1, 2, 3, 4, 5, 6}. Since the dice are independent and fair, the joint probability mass function (PMF) is uniform:

p_{X,Y}(i, j) = P(X = i, Y = j) = \frac{1}{36} \quad \text{for } i, j = 1, 2, \dots, 6.

This PMF can be visualized in a 6×6 table, where each cell (i, j) contains the probability \frac{1}{36}, the row sums (marginals for X) are each \frac{6}{36} = \frac{1}{6}, and the column sums (marginals for Y) are similarly \frac{1}{6}. To compute the marginal PMF of X, sum the joint probabilities over all possible values of Y:

p_X(i) = \sum_{j=1}^{6} p_{X,Y}(i, j) = \sum_{j=1}^{6} \frac{1}{36} = \frac{6}{36} = \frac{1}{6} \quad \text{for } i = 1, 2, \dots, 6.

By symmetry, the marginal PMF of Y is identical: p_Y(j) = \frac{1}{6} for j = 1, 2, \dots, 6. This follows the general formula for the marginal PMF in the discrete case. The resulting marginal distributions are uniform over {1, 2, 3, 4, 5, 6}, which matches the known distribution of a single fair die and verifies the computation, since the dice are independent.

Real-world application

In demographic analysis, a practical application of marginal distributions arises in processing U.S. Census Bureau data on household income and age. The American Community Survey (ACS) publishes joint distributions in tables such as B19037, which cross-tabulates age groups of householders (e.g., under 25, 25-44, 45-64, and 65 and over) with binned income categories (e.g., less than $10,000 to $200,000 or more), providing counts or percentages for each combination based on survey responses. This binned joint distribution reflects real-world data from millions of households, capturing variations driven by a range of demographic and economic factors.

To obtain the marginal income distribution, analysts sum the joint table entries across all age groups for each income bin, yielding the overall proportion or count of households in each income category irrespective of age. For instance, this computation, aggregated from the 2022 ACS, reveals that approximately 5.5% of households earned less than $10,000, while 11.5% earned $200,000 or more, derived directly from the totals of the table. These marginal distributions simplify complex analyses by focusing on aggregate income patterns, enabling policymakers to calculate key metrics like the national median household income ($74,755 as of the 2022 ACS 1-year estimates) without conditioning on age, which supports broad economic indicators and inequality assessments. In policy contexts, such marginals inform decisions on taxation, social welfare programs, and poverty thresholds; for example, the Census Bureau uses them to track income inequality via measures such as the Gini index, guiding federal budget allocations for social programs. This aggregation highlights overall economic health, as seen in reports showing stagnant median incomes for younger age groups influencing youth-targeted initiatives.

Real-world joint data often faces incompleteness due to survey nonresponse, privacy protections, and aggregation for disclosure avoidance, particularly since the 2020 census adopted disclosure-avoidance techniques that introduce noise to prevent re-identification, potentially distorting fine-grained distributions. To address this, analysts employ approximations such as histogram-based marginals from binned tables or imputation methods to reconstruct reliable aggregates from noisy or partial joints, ensuring usability for policy while maintaining confidentiality.

Multivariate Extensions

Definition in higher dimensions

In multivariate settings, the concept of marginal distribution generalizes naturally to higher dimensions, where a joint distribution involves three or more random variables and one seeks the distribution of a subset of them. Consider a random vector \mathbf{X} = (X_1, \dots, X_k) forming a subvector of a larger collection of n > k random variables with joint probability mass function (PMF) p(\mathbf{x}, \mathbf{y}) in the discrete case, or joint probability density function (PDF) f(\mathbf{x}, \mathbf{y}) in the continuous case, where \mathbf{y} denotes the complementary vector of the remaining n - k variables. The marginal distribution of \mathbf{X} is derived by eliminating the dependence on \mathbf{y} through summation over all possible values of \mathbf{y} in the discrete case, or integration over the support of \mathbf{y} in the continuous case. For the discrete case, the marginal PMF is given by

p_{\mathbf{X}}(\mathbf{x}) = \sum_{\mathbf{y}} p(\mathbf{x}, \mathbf{y}),

where the sum is taken over all possible outcomes of \mathbf{y}. In the continuous case, the marginal PDF is

f_{\mathbf{X}}(\mathbf{x}) = \int \cdots \int f(\mathbf{x}, \mathbf{y}) \, d\mathbf{y},

with the integration extending over the appropriate domain for \mathbf{y}. This process, known as marginalization, can be applied iteratively if the subset \mathbf{X} involves non-consecutive variables, building on bivariate marginalization as a foundational step.

Notation for marginal distributions in higher dimensions often employs subscripts to indicate the specific subset of variables. For instance, if the full joint distribution is over random variables X_1, \dots, X_n, the marginal distribution over the first m variables (m < n) is denoted p_{X_{1:m}}(x_{1:m}) or f_{X_{1:m}}(x_{1:m}), obtained by summing or integrating out X_{m+1}, \dots, X_n. This subscript convention facilitates precise reference to arbitrary subsets, such as \mathbf{X}_S where S ⊆ {1, \dots, n}, and underscores the reduction in dimensionality from the full joint to the desired marginal.

Properties and computations

In the multivariate setting, the marginal distribution of a subset of random variables from a joint distribution over \mathbf{X} = (X_1, \dots, X_n) is obtained by integrating the joint probability density function (pdf) over the complementary variables or, for discrete cases, by summing the joint probability mass function (pmf). For a continuous pdf f(\mathbf{x}), the marginal pdf for a subvector \mathbf{X}_S corresponding to S ⊂ {1, \dots, n} is given by

f_{\mathbf{X}_S}(\mathbf{x}_S) = \int_{\mathbb{R}^{n - |S|}} f(\mathbf{x}) \, d\mathbf{x}_{S^c},

where S^c denotes the complement of S. Similarly, for a discrete pmf p(\mathbf{x}), the marginal pmf is

p_{\mathbf{X}_S}(\mathbf{x}_S) = \sum_{\mathbf{x}_{S^c}} p(\mathbf{x}).

These operations ensure that the resulting marginal is a valid probability distribution, integrating (or summing) to 1 over its support. A key property is that the joint distribution uniquely determines all possible marginal distributions for subsets of any size, but the marginals do not uniquely determine the joint; multiple joints can share identical marginals, reflecting the loss of dependence information upon marginalization.

Marginal distributions inherit certain structural properties from the joint when the latter belongs to a family closed under marginalization. For instance, in the multivariate normal distribution \mathbf{X} \sim \mathcal{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}), the marginal distribution of any subvector \mathbf{X}_S is also multivariate normal, specifically \mathbf{X}_S \sim \mathcal{N}_{|S|}(\boldsymbol{\mu}_S, \boldsymbol{\Sigma}_{S,S}), where \boldsymbol{\mu}_S and \boldsymbol{\Sigma}_{S,S} are the mean subvector and principal covariance submatrix corresponding to S. This closure property facilitates analytical computations without approximation for Gaussian joints. The expectation and covariance of the marginal follow directly: \mathbb{E}[\mathbf{X}_S] = \boldsymbol{\mu}_S and \mathrm{Cov}(\mathbf{X}_S) = \boldsymbol{\Sigma}_{S,S}.

Computations of marginals are straightforward in low dimensions or when closed-form expressions exist, as in the multivariate normal case, where the marginal pdf is explicitly

f_{\mathbf{X}_S}(\mathbf{x}_S) = (2\pi)^{-|S|/2} |\boldsymbol{\Sigma}_{S,S}|^{-1/2} \exp\left( -\frac{1}{2} (\mathbf{x}_S - \boldsymbol{\mu}_S)^\top \boldsymbol{\Sigma}_{S,S}^{-1} (\mathbf{x}_S - \boldsymbol{\mu}_S) \right).

For discrete multivariate distributions, such as the multinomial, marginals are obtained via summation and often retain the same family form, like binomial marginals from a multinomial distribution. In higher dimensions, or for distributions without closed-form marginals (e.g., certain copula-based models), numerical methods are required, including Monte Carlo integration (sampling from the joint and averaging over the unwanted dimensions) or quadrature rules for deterministic approximation. These techniques scale poorly with dimension due to the curse of dimensionality, emphasizing the importance of exploiting joint structure when available.
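The Gaussian closure property can be checked by simulation; the sketch below (an assumed example, not from the article) samples a trivariate normal with SciPy and compares the empirical mean and covariance of a subvector against the corresponding sub-mean and principal submatrix.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

# Sample the joint and look at the subvector (X1, X3), i.e. S = {1, 3}.
samples = multivariate_normal(mean=mu, cov=Sigma).rvs(size=200_000, random_state=1)
idx = [0, 2]

print(samples[:, idx].mean(axis=0))   # ~[1.0, 0.5]                 = mu_S
print(np.cov(samples[:, idx].T))      # ~[[2.0, 0.3], [0.3, 1.5]]   = Sigma_{S,S}
```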
