Conditional independence
from Wikipedia

In probability theory, conditional independence describes situations in which an observation is irrelevant or redundant when evaluating the certainty of a hypothesis. Conditional independence is usually formulated in terms of conditional probability, as a special case where the probability of the hypothesis given the uninformative observation is equal to the probability without. If $A$ is the hypothesis, and $B$ and $C$ are observations, conditional independence can be stated as an equality:

$P(A \mid B, C) = P(A \mid C)$

where $P(A \mid B, C)$ is the probability of $A$ given both $B$ and $C$. Since the probability of $A$ given $C$ is the same as the probability of $A$ given both $B$ and $C$, this equality expresses that $B$ contributes nothing to the certainty of $A$. In this case, $A$ and $B$ are said to be conditionally independent given $C$, written symbolically as $(A \perp\!\!\!\perp B \mid C)$.
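As a concrete check of this defining equality, the following minimal sketch builds a hypothetical joint table over binary variables $A$, $B$, $C$ that is constructed so that $A$ and $B$ are conditionally independent given $C$, and compares $P(A \mid B, C)$ with $P(A \mid C)$; all of the numbers are illustrative assumptions, not taken from the article.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint distribution over binary (A, B, C), built so that
# A and B are conditionally independent given C:
# P(a, b, c) = P(c) * P(a | c) * P(b | c)
p_c = {0: F(2, 5), 1: F(3, 5)}
p_a1_given_c = {0: F(1, 5), 1: F(7, 10)}   # P(A = 1 | C = c)
p_b1_given_c = {0: F(1, 2), 1: F(1, 10)}   # P(B = 1 | C = c)

def bern(p1, x):                 # P(X = x) for a binary variable with P(X = 1) = p1
    return p1 if x == 1 else 1 - p1

joint = {(a, b, c): p_c[c] * bern(p_a1_given_c[c], a) * bern(p_b1_given_c[c], b)
         for a, b, c in product((0, 1), repeat=3)}

def prob(pred):                  # total probability of the outcomes satisfying pred
    return sum(p for outcome, p in joint.items() if pred(*outcome))

# P(A = 1 | B = 1, C = 1) versus P(A = 1 | C = 1): the observation B is redundant.
lhs = prob(lambda a, b, c: a == 1 and b == 1 and c == 1) / prob(lambda a, b, c: b == 1 and c == 1)
rhs = prob(lambda a, b, c: a == 1 and c == 1) / prob(lambda a, b, c: c == 1)
print(lhs, rhs)                  # both 7/10
```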

The concept of conditional independence is essential to graph-based theories of statistical inference, as it establishes a mathematical relation between a collection of conditional statements and a graphoid.

Conditional independence of events

Let $A$, $B$, and $C$ be events. $A$ and $B$ are said to be conditionally independent given $C$ if and only if $P(C) > 0$ and $P(A \mid B, C) = P(A \mid C)$. This property is symmetric (more on this below) and often written as $(A \perp\!\!\!\perp B \mid C)$, which should be read as $((A \perp\!\!\!\perp B) \mid C)$.

Equivalently, conditional independence may be stated as $P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$, where $P(A, B \mid C)$ is the joint probability of $A$ and $B$ given $C$. This alternate formulation states that $A$ and $B$ are independent events, given $C$.

It demonstrates that $(A \perp\!\!\!\perp B \mid C)$ is equivalent to $(B \perp\!\!\!\perp A \mid C)$.

Proof of the equivalent definition
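The collapsed proof is not preserved in this copy; one standard derivation, assuming $P(C) > 0$ and $P(B \cap C) > 0$, runs as follows:

$P(A, B \mid C) = P(A \mid C)\,P(B \mid C) \iff \frac{P(A \cap B \cap C)}{P(C)} = \frac{P(A \cap C)}{P(C)} \cdot \frac{P(B \cap C)}{P(C)} \iff \frac{P(A \cap B \cap C)}{P(B \cap C)} = \frac{P(A \cap C)}{P(C)} \iff P(A \mid B, C) = P(A \mid C),$

where the middle step divides both sides by $P(B \cap C)/P(C) = P(B \mid C)$, and the outer expressions are exactly the two definitions being compared.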

Examples

Coloured boxes

Each cell represents a possible outcome. The events $R$, $B$ and $Y$ are represented by the areas shaded red, blue and yellow respectively. The overlap between the events $R$ and $B$ is shaded purple.

These are two examples illustrating conditional independence.

The probabilities of these events are shaded areas with respect to the total area. In both examples $R$ and $B$ are conditionally independent given $Y$ because:

$\Pr(R \cap B \mid Y) = \Pr(R \mid Y)\,\Pr(B \mid Y)$ [1]

but not conditionally independent given $\overline{Y}$ (not $Y$) because:

$\Pr(R \cap B \mid \overline{Y}) \neq \Pr(R \mid \overline{Y})\,\Pr(B \mid \overline{Y})$

Proximity and delays

Let A and B be the events that person A and person B, respectively, will be home in time for dinner, where both people are randomly sampled from the entire world. Events A and B can be assumed to be independent, i.e. knowledge that A is late has minimal to no effect on the probability that B will be late. However, if a third event is introduced, namely that person A and person B live in the same neighborhood, the two events are no longer considered conditionally independent: traffic conditions and weather-related events that might delay person A might delay person B as well. Given the third event and knowledge that person A was late, the probability that person B will be late does meaningfully change.[2]

Dice rolling

Conditional independence depends on the nature of the third event. If you roll two dice, one may assume that the two dice behave independently of each other. Looking at the result of one die will not tell you about the result of the second die. (That is, the two dice are independent.) If, however, the first die's result is a 3, and someone tells you about a third event, that the sum of the two results is even, then this extra unit of information restricts the options for the second result to an odd number. In other words, two events can be independent, but not conditionally independent.[2]
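A brute-force enumeration of the 36 outcomes makes this concrete. The sketch below (the specific event choices, such as the second die showing a 5, are illustrative assumptions) confirms that the two individual results are independent but become dependent once the parity of the sum is given.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a predicate on an outcome) under the uniform measure."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def pr_given(event, cond):
    """Conditional probability P(event | cond)."""
    return pr(lambda o: event(o) and cond(o)) / pr(cond)

A = lambda o: o[0] == 3              # first die shows 3
B = lambda o: o[1] == 5              # second die shows 5 (any fixed value works)
C = lambda o: (o[0] + o[1]) % 2 == 0 # the sum is even

# Unconditionally independent:
print(pr(lambda o: A(o) and B(o)) == pr(A) * pr(B))          # True: 1/36 == 1/6 * 1/6
# But not conditionally independent given the parity of the sum:
print(pr_given(lambda o: A(o) and B(o), C) ==
      pr_given(A, C) * pr_given(B, C))                        # False: 1/18 != 1/36
```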

Height and vocabulary

Height and vocabulary are dependent since very small people tend to be children, known for their more basic vocabularies. But knowing that two people are 19 years old (i.e., conditional on age), there is no reason to think that one person's vocabulary is larger if we are told that they are taller.

Conditional independence of random variables

Two discrete random variables $X$ and $Y$ are conditionally independent given a third discrete random variable $Z$ if and only if they are independent in their conditional probability distribution given $Z$. That is, $X$ and $Y$ are conditionally independent given $Z$ if and only if, given any value of $Z$, the probability distribution of $X$ is the same for all values of $Y$ and the probability distribution of $Y$ is the same for all values of $X$. Formally:

$(X \perp\!\!\!\perp Y) \mid Z \quad \iff \quad F_{X,Y \mid Z=z}(x, y) = F_{X \mid Z=z}(x) \cdot F_{Y \mid Z=z}(y) \quad \text{for all } x, y \text{ and } z,$

where $F_{X,Y \mid Z=z}(x, y)$ is the conditional cumulative distribution function of $X$ and $Y$ given $Z = z$.

Two events $R$ and $B$ are conditionally independent given a σ-algebra $\Sigma$ if

$\Pr(R \cap B \mid \Sigma) = \Pr(R \mid \Sigma)\,\Pr(B \mid \Sigma) \quad \text{a.s.}$

where $\Pr(A \mid \Sigma)$ denotes the conditional expectation of the indicator function of the event $A$, $\chi_A$, given the sigma algebra $\Sigma$. That is,

$\Pr(A \mid \Sigma) := \operatorname{E}[\chi_A \mid \Sigma].$

Two random variables $X$ and $Y$ are conditionally independent given a σ-algebra $\Sigma$ if the above equation holds for all $R$ in $\sigma(X)$ and $B$ in $\sigma(Y)$.

Two random variables $X$ and $Y$ are conditionally independent given a random variable $W$ if they are independent given $\sigma(W)$: the σ-algebra generated by $W$. This is commonly written:

$X \perp\!\!\!\perp Y \mid W$

or

$X \perp Y \mid W$

This is read "$X$ is independent of $Y$, given $W$"; the conditioning applies to the whole statement: "($X$ is independent of $Y$) given $W$".

This notation extends $X \perp\!\!\!\perp Y$ for "$X$ is independent of $Y$."

If $W$ assumes a countable set of values, this is equivalent to the conditional independence of $X$ and $Y$ for the events of the form $[W = w]$. Conditional independence of more than two events, or of more than two random variables, is defined analogously.

The following two examples show that $X \perp\!\!\!\perp Y$ neither implies nor is implied by $X \perp\!\!\!\perp Y \mid W$.

First, suppose $W$ is 0 with probability 0.5 and 1 otherwise. When $W = 0$, take $X$ and $Y$ to be independent, each having the value 0 with probability 0.99 and the value 1 otherwise. When $W = 1$, $X$ and $Y$ are again independent, but this time they take the value 1 with probability 0.99. Then $X \perp\!\!\!\perp Y \mid W$ holds. But $X$ and $Y$ are dependent, because $\Pr(X = 0) < \Pr(X = 0 \mid Y = 0)$. This is because $\Pr(X = 0) = 0.5$, but if $Y = 0$ then it's very likely that $W = 0$ and thus that $X = 0$ as well, so $\Pr(X = 0 \mid Y = 0) > 0.5$.
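This first example is small enough to check exactly by enumerating the eight joint outcomes; the sketch below reproduces the numbers quoted above (0.5 versus a value well above 0.5).

```python
from fractions import Fraction as F

# Exact joint distribution for the first example: P(w, x, y) = P(w) P(x | w) P(y | w),
# with X and Y independent given W but strongly coupled to W.
p_w = {0: F(1, 2), 1: F(1, 2)}
p0_given_w = {0: F(99, 100), 1: F(1, 100)}   # P(X = 0 | W = w) = P(Y = 0 | W = w)

joint = {(w, x, y): p_w[w]
         * (p0_given_w[w] if x == 0 else 1 - p0_given_w[w])
         * (p0_given_w[w] if y == 0 else 1 - p0_given_w[w])
         for w in (0, 1) for x in (0, 1) for y in (0, 1)}

p_x0 = sum(p for (w, x, y), p in joint.items() if x == 0)
p_x0_y0 = (sum(p for (w, x, y), p in joint.items() if x == 0 and y == 0)
           / sum(p for (w, x, y), p in joint.items() if y == 0))

print(p_x0)      # 1/2
print(p_x0_y0)   # 4901/5000 = 0.9802 > 1/2, so X and Y are (unconditionally) dependent
```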

For the second example, suppose $X \perp\!\!\!\perp Y$, each taking the values 0 and 1 with probability 0.5. Let $W$ be the product $X \cdot Y$. Then when $W = 0$, $\Pr(X = 0) = 2/3$, but $\Pr(X = 0 \mid Y = 0) = 1/2$, so $X \perp\!\!\!\perp Y \mid W$ is false. This is also an example of Explaining Away. See Kevin Murphy's tutorial [3] where $X$ and $Y$ take the values "brainy" and "sporty".
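The second example can also be enumerated exactly; the sketch below confirms the 2/3-versus-1/2 gap that makes $X \perp\!\!\!\perp Y \mid W$ fail.

```python
from fractions import Fraction as F
from itertools import product

# Second example: X and Y are independent fair bits, W = X * Y ("explaining away").
joint = {(x, y, x * y): F(1, 4) for x, y in product((0, 1), repeat=2)}

def pr(pred):
    return sum(p for outcome, p in joint.items() if pred(*outcome))

# Conditioned on W = 0, X and Y become dependent:
p_x0_given_w0 = pr(lambda x, y, w: x == 0 and w == 0) / pr(lambda x, y, w: w == 0)
p_x0_given_y0_w0 = (pr(lambda x, y, w: x == 0 and y == 0 and w == 0)
                    / pr(lambda x, y, w: y == 0 and w == 0))

print(p_x0_given_w0)      # 2/3
print(p_x0_given_y0_w0)   # 1/2, so X ⊥ Y | W fails even though X ⊥ Y holds
```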

Conditional independence of random vectors

Two random vectors $\mathbf{X} = (X_1, \ldots, X_l)^{\mathrm{T}}$ and $\mathbf{Y} = (Y_1, \ldots, Y_m)^{\mathrm{T}}$ are conditionally independent given a third random vector $\mathbf{Z} = (Z_1, \ldots, Z_n)^{\mathrm{T}}$ if and only if they are independent in their conditional cumulative distribution given $\mathbf{Z}$. Formally:

$(\mathbf{X} \perp\!\!\!\perp \mathbf{Y}) \mid \mathbf{Z} \quad \iff \quad F_{\mathbf{X},\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}, \mathbf{y}) = F_{\mathbf{X} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}) \cdot F_{\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{y}) \quad \text{for all } \mathbf{x}, \mathbf{y}, \mathbf{z},$

where $\mathbf{x} = (x_1, \ldots, x_l)$, $\mathbf{y} = (y_1, \ldots, y_m)$ and $\mathbf{z} = (z_1, \ldots, z_n)$, and the conditional cumulative distributions are defined by $F_{\mathbf{X},\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}, \mathbf{y}) = \Pr(\mathbf{X} \le \mathbf{x}, \mathbf{Y} \le \mathbf{y} \mid \mathbf{Z} = \mathbf{z})$, and analogously for the marginal conditional distributions.

Uses in Bayesian inference

Let p be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses n voters randomly from the population. For i = 1, ..., n, let Xi = 1 or 0 according to whether or not the ith chosen voter will vote "yes".

In a frequentist approach to statistical inference one would not attribute any probability distribution to p (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that X1, ..., Xn are independent random variables.

By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to p regardless of the non-existence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that p is in any interval to which a probability is assigned. In that model, the random variables X1, ..., Xn are not independent, but they are conditionally independent given the value of p. In particular, if a large number of the Xs are observed to be equal to 1, that would imply a high conditional probability, given that observation, that p is near 1, and thus a high conditional probability, given that observation, that the next X to be observed will be equal to 1.
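The marginal dependence and conditional independence described here can be illustrated with a deliberately simplified model; the two-point prior on p below is an assumption chosen only to keep the arithmetic exact, not part of the original example.

```python
from fractions import Fraction as F

# Toy Bayesian model: a discrete prior over the "yes" proportion p,
# with voters X1, X2 conditionally independent Bernoulli(p) given p.
prior = {F(1, 4): F(1, 2), F(3, 4): F(1, 2)}   # hypothetical two-point prior on p

# Marginal probability that any single voter says "yes": E[p] = 1/2
p_x1 = sum(w * p for p, w in prior.items())

# Joint P(X1 = 1, X2 = 1) = E[p^2], so P(X2 = 1 | X1 = 1) = E[p^2] / E[p]
p_x1_x2 = sum(w * p * p for p, w in prior.items())
print(p_x1)               # 1/2
print(p_x1_x2 / p_x1)     # 5/8 > 1/2: X1 and X2 are marginally dependent ...
# ... yet, given p, P(X2 = 1 | X1 = 1, p) = p = P(X2 = 1 | p) by construction.
```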

Rules of conditional independence

A set of rules governing statements of conditional independence has been derived from the basic definition.[4][5]

These rules were termed "Graphoid Axioms" by Pearl and Paz,[6] because they hold in graphs, where $X \perp\!\!\!\perp A \mid B$ is interpreted to mean: "All paths from X to A are intercepted by the set B".[7]
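None of the rules below require more than the basic definition, so they can be spot-checked mechanically on small discrete distributions. The helper below is a hypothetical utility written for illustration (not part of any standard library); it tests a single statement $X \perp\!\!\!\perp Y \mid Z$ on a joint probability table, and applying it to the premises and conclusion of any of the following rules, for a particular distribution, gives a quick sanity check.

```python
from itertools import product
from fractions import Fraction as F

def is_ci(joint, xi, yi, zi):
    """Check whether coordinates xi and yi of a discrete joint distribution are
    conditionally independent given the coordinates listed in zi.

    joint maps outcome tuples to probabilities. The test uses the identity
    P(x, y, z) * P(z) == P(x, z) * P(y, z) for all values, which avoids division.
    """
    def marg(idx):
        out = {}
        for o, p in joint.items():
            key = tuple(o[i] for i in idx)
            out[key] = out.get(key, 0) + p
        return out

    xs, ys, zs = [xi], [yi], list(zi)
    p_xyz = marg(xs + ys + zs)
    p_xz, p_yz, p_z = marg(xs + zs), marg(ys + zs), marg(zs)
    return all(p_xyz.get((x, y) + z, 0) * p_z.get(z, 0)
               == p_xz.get((x,) + z, 0) * p_yz.get((y,) + z, 0)
               for (x,) in marg(xs) for (y,) in marg(ys) for z in p_z)

# Example: W = X * Y with independent fair bits X, Y (coordinates 0, 1, 2 = X, Y, W).
joint = {(x, y, x * y): F(1, 4) for x, y in product((0, 1), repeat=2)}
print(is_ci(joint, 0, 1, []))    # True:  X ⊥ Y
print(is_ci(joint, 0, 1, [2]))   # False: X ⊥ Y | W fails
```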

Symmetry

$X \perp\!\!\!\perp Y \mid Z \quad \Rightarrow \quad Y \perp\!\!\!\perp X \mid Z$

Proof:

From the definition of conditional independence, $X \perp\!\!\!\perp Y \mid Z$ asserts that $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$ for all $x$, $y$ and $z$. Since this factorization is symmetric in $x$ and $y$, the same equality asserts $Y \perp\!\!\!\perp X \mid Z$.

Decomposition

$X \perp\!\!\!\perp (A, B) \mid Z \quad \Rightarrow \quad X \perp\!\!\!\perp A \mid Z \quad \text{and} \quad X \perp\!\!\!\perp B \mid Z$

Proof: From the definition of conditional independence, we seek to show that $p(x, a \mid z) = p(x \mid z)\,p(a \mid z)$. The left side of this equality is

$p(x, a \mid z) = \sum_b p(x, a, b \mid z) = \sum_b p(x \mid z)\,p(a, b \mid z),$

where the right side is the summation, over the values $b$ taken by $B$, of the conditional probability of $(X, A, B)$ on $Z$, rewritten using the assumed factorization. Further decomposing,

$\sum_b p(x \mid z)\,p(a, b \mid z) = p(x \mid z) \sum_b p(a, b \mid z) = p(x \mid z)\,p(a \mid z),$

which is the desired result; the statement $X \perp\!\!\!\perp B \mid Z$ follows by summing over $a$ instead. Special cases of this property include the two one-sided conclusions taken separately; each can also be obtained by defining an 'extraction' function that maps the pair $(A, B)$ to one of its components and repeating the argument.

Weak union

$X \perp\!\!\!\perp (A, B) \mid Z \quad \Rightarrow \quad X \perp\!\!\!\perp A \mid (Z, B) \quad \text{and} \quad X \perp\!\!\!\perp B \mid (Z, A)$

Proof:

Given $X \perp\!\!\!\perp (A, B) \mid Z$, we aim to show $\Pr(X \mid A, B, Z) = \Pr(X \mid B, Z)$. We begin with the left side of the equation:

$\Pr(X \mid A, B, Z) = \frac{\Pr(X, A, B \mid Z)}{\Pr(A, B \mid Z)} = \frac{\Pr(X \mid Z)\,\Pr(A, B \mid Z)}{\Pr(A, B \mid Z)} = \Pr(X \mid Z).$

From the given condition, together with the decomposition property, $\Pr(X \mid B, Z) = \Pr(X \mid Z)$. Thus $\Pr(X \mid A, B, Z) = \Pr(X \mid B, Z)$, so we have shown that $X \perp\!\!\!\perp A \mid (Z, B)$; the second statement follows by exchanging the roles of $A$ and $B$.

Special Cases:

Some textbooks present the property as

  • $X \perp\!\!\!\perp (A, B) \;\Rightarrow\; X \perp\!\!\!\perp A \mid B$ [8].
  • $X \perp\!\!\!\perp (A, B) \;\Rightarrow\; X \perp\!\!\!\perp B \mid A$.

Both versions can be shown to follow from the weak union property given initially via the same method as in the decomposition section above.

Contraction

$X \perp\!\!\!\perp A \mid Z \quad \text{and} \quad X \perp\!\!\!\perp B \mid (A, Z) \quad \Rightarrow \quad X \perp\!\!\!\perp (A, B) \mid Z$

Proof

This property can be proved by noticing $\Pr(X \mid A, B, Z) = \Pr(X \mid A, Z) = \Pr(X \mid Z)$, each equality of which is asserted by $X \perp\!\!\!\perp B \mid (A, Z)$ and $X \perp\!\!\!\perp A \mid Z$, respectively.

Intersection

For strictly positive probability distributions,[5] the following also holds:

$X \perp\!\!\!\perp Y \mid (Z, W) \quad \text{and} \quad X \perp\!\!\!\perp W \mid (Z, Y) \quad \Rightarrow \quad X \perp\!\!\!\perp (W, Y) \mid Z$

Proof

By assumption:

$\Pr(X \mid Z, W, Y) = \Pr(X \mid Z, W) \quad \text{and} \quad \Pr(X \mid Z, W, Y) = \Pr(X \mid Z, Y), \quad \text{therefore} \quad \Pr(X \mid Z, W) = \Pr(X \mid Z, Y).$

Using this equality, together with the Law of total probability applied to $\Pr(X \mid Z)$:

$\Pr(X \mid Z) = \sum_{w} \Pr(X \mid Z, W = w)\,\Pr(W = w \mid Z) = \sum_{w} \Pr(X \mid Z, Y)\,\Pr(W = w \mid Z) = \Pr(X \mid Z, Y)\sum_{w}\Pr(W = w \mid Z) = \Pr(X \mid Z, Y).$

Since $\Pr(X \mid Z, W, Y) = \Pr(X \mid Z, Y)$ and $\Pr(X \mid Z, Y) = \Pr(X \mid Z)$, it follows that $\Pr(X \mid Z, W, Y) = \Pr(X \mid Z)$, which is equivalent to $X \perp\!\!\!\perp (W, Y) \mid Z$.

Technical note: since these implications hold for any probability space, they will still hold if one considers a sub-universe by conditioning everything on another variable, say K. For example, $X \perp\!\!\!\perp Y$ would also mean that $X \perp\!\!\!\perp Y \mid K$.

References

from Grokipedia
Conditional independence is a fundamental concept in probability theory that extends the idea of statistical independence to scenarios where additional information is available, stating that two random variables (or events) $X$ and $Y$ are conditionally independent given a third variable (or event) $Z$ if $P(X \mid Y, Z) = P(X \mid Z)$, or equivalently, if the joint distribution factors as $P(X, Y \mid Z) = P(X \mid Z) \cdot P(Y \mid Z)$. This property can hold even when $X$ and $Y$ are unconditionally dependent, as conditioning on $Z$ can "explain away" or block the dependence pathway between them, a phenomenon central to causal inference and Bayesian reasoning. For instance, in graphical models like Bayesian networks, conditional independence is encoded via the absence of direct edges between nodes, enabling efficient computation of complex joint distributions by decomposing them into local conditional probabilities. Key properties include symmetry ($X \perp Y \mid Z$ implies $Y \perp X \mid Z$) and the semi-graphoid axioms, which govern how conditional independences compose and decompose in probabilistic models, forming the basis for d-separation criteria in directed acyclic graphs. These axioms ensure that conditional independence satisfies symmetry, decomposition, weak union, and contraction, providing a rigorous framework for verifying independences without full distributional knowledge. Applications span statistics, machine learning, and artificial intelligence; for example, it underpins naive Bayes classifiers by assuming feature independence given the class label, and it facilitates inference in hidden Markov models where observations are conditionally independent given latent states. In causal discovery, conditional independence tests help identify graph structures from data, as formalized in algorithms like PC (Peter-Clark), distinguishing correlation from causation. Overall, conditional independence simplifies high-dimensional probabilistic modeling, making intractable problems tractable by exploiting modular structure.

Conditional Independence for Events

Definition and Basic Properties

In probability theory, the foundational concepts of conditional independence build upon the Kolmogorov axioms, which establish probability as a non-negative measure on a sigma-algebra of events that sums to 1 over the entire sample space. Conditional probability is defined as the probability of an event $A$ given that event $B$ has occurred, expressed as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ for $P(B) > 0$, providing the prerequisite framework for analyzing dependencies under partial information. Two events $A$ and $B$ in a probability space are said to be conditionally independent given a third event $C$ with $P(C) > 0$ if $P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C)$. This definition captures the idea that, upon observing $C$, the occurrence of $A$ provides no additional information about the likelihood of $B$, or vice versa. The conditional probability measure inherits key properties from the unconditional case: it is non-negative, so $0 \leq P(\cdot \mid C) \leq 1$, and it normalizes to 1 over any partition of the sample space into events whose union is the entire space, ensuring $\sum_i P(A_i \mid C) = 1$ when $\bigcup_i A_i = \Omega$ and the $A_i$ are disjoint. This structure extends the notion of unconditional independence, which arises as a special case when $C$ is the full sample space $\Omega$ (where $P(\Omega) = 1$), reducing the condition to $P(A \cap B) = P(A)\,P(B)$. The formalization of conditional probability and independence originated in early 20th-century developments in measure-theoretic probability, with Andrey Kolmogorov providing a rigorous axiomatic foundation in his 1933 work Foundations of the Theory of Probability.

Equivalent Formulations

Conditional independence of events $A$ and $B$ given event $C$ (with $P(C) > 0$) is primarily defined as $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$. This is equivalent to the joint probability formulation $P(A \cap B \cap C) = \frac{P(A \cap C)\,P(B \cap C)}{P(C)}$. To derive this equivalence, start with the definition of conditional probability: $P(A \cap B \mid C) = \frac{P(A \cap B \cap C)}{P(C)}$, $P(A \mid C) = \frac{P(A \cap C)}{P(C)}$, and $P(B \mid C) = \frac{P(B \cap C)}{P(C)}$. Substituting the latter two into the independence condition yields $\frac{P(A \cap B \cap C)}{P(C)} = \frac{P(A \cap C)}{P(C)} \cdot \frac{P(B \cap C)}{P(C)}$. Multiplying both sides by $P(C)$ gives $P(A \cap B \cap C) = \frac{P(A \cap C)\,P(B \cap C)}{P(C)}$, confirming the equivalence. The reverse direction follows by rearranging: assuming the joint form, divide both sides by $P(C)$ to recover the conditional product form. This derivation relies on the basic properties of conditional probability and assumes $P(C) > 0$ to avoid division by zero. Another equivalent formulation is $P(A \mid B \cap C) = P(A \mid C)$. To prove this from the primary definition, apply the definition of conditional probability: $P(A \mid B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)}$. Now substitute $P(A \cap B \cap C) = P(A \cap C)\,P(B \cap C) / P(C)$ from the joint equivalence, yielding $\frac{P(A \cap C)\,P(B \cap C) / P(C)}{P(B \cap C)} = \frac{P(A \cap C)}{P(C)} = P(A \mid C)$. By symmetry, $P(B \mid A \cap C) = P(B \mid C)$ also holds. These equivalences extend to the factorization of the joint distribution over events, where the joint measure factors conditionally on $C$. Bayes' theorem is not directly required here but supports the manipulations via the chain rule for probabilities. Edge cases arise when $P(C) = 0$, rendering conditional probabilities undefined under the standard Kolmogorov axioms, as division by $P(C)$ is impossible. In such scenarios, conditional independence is vacuously true or not considered, depending on the measure-theoretic extension, but the formulations above do not apply. Degenerate events, like $C$ being the empty set or the full sample space, similarly lead to undefined or trivial conditions, where conditional independence reduces to unconditional forms if applicable.

Illustrative Examples

One illustrative example of conditional independence for events involves drawing balls from two differently composed boxes. Suppose there are two boxes: the red box contains two red balls, and the blue box contains one red ball and one blue ball. A box is selected at random with equal probability, a ball is drawn from it (and noted for color), returned, and then a ball is drawn from the other box. Let R_1 be the event that the first ball drawn is red, R_2 the event that the second ball is red, and B the event that the first box selected was the red box. Unconditionally, R_1 and R_2 are dependent because the box compositions affect both draws indirectly through the selection process. However, given B, R_1 and R_2 are conditionally independent, as the second draw is from the fixed remaining box, unaffected by the outcome of the first draw. To verify, compute P(R_1 | B) = 1 (since the red box has only red balls), P(R_2 | B) = 1/2 (second draw from the blue box), and P(R_1, R_2 | B) = 1 \times 1/2 = 1/2, so P(R_1, R_2 | B) = P(R_1 | B) P(R_2 | B). Similarly, P(R_1 | B^c) = 1/2, P(R_2 | B^c) = 1, and P(R_1, R_2 | B^c) = 1/2 \times 1 = 1/2, confirming the equality also holds given B^c.

A classic example from dice rolling demonstrates conditional dependence contrasting with unconditional independence, and illustrates the boundary of conditional independence. Consider two fair six-sided dice rolled independently. Let D_1 be the event that the first die shows an even number, and D_2 the event that the second die shows an even number. Unconditionally, D_1 and D_2 are independent since P(D_1) = P(D_2) = 1/2 and P(D_1 \cap D_2) = 1/4 = P(D_1) P(D_2). Now condition on the sum S being even. Then P(D_1 | S even) = 1/2 (by symmetry) and P(D_2 | S even) = 1/2, but P(D_1 \cap D_2 | S even) = 9/18 = 1/2 (of the 18 outcomes with an even sum, 9 are even-even such as (2,2), (2,4), ..., (6,6) and 9 are odd-odd such as (1,1), (1,3), ..., (5,5)), so P(D_1 \cap D_2 | S even) ≠ P(D_1 | S even) P(D_2 | S even), showing dependence given the parity of the sum. For an instance of conditional independence one could instead condition on the parity of a single die; the core illustration here, however, is how conditioning on the parity of the sum induces dependence between events that were independent, in contrast to the defining equality, which would hold in the absence of such a shared constraint.

In everyday scenarios, conditional independence appears in the relationship between a child's height and vocabulary size given their age. Height and vocabulary size are dependent unconditionally, as both tend to increase with age: taller children often have larger vocabularies due to developmental progression. However, given the child's age, height and vocabulary size become conditionally independent, as age accounts for the common developmental factor, making additional information about one irrelevant to predicting the other. For instance, among 5-year-olds, height variations do not predict vocabulary differences beyond what age already explains. To illustrate formally, suppose H is the event of above-average height, V above-average vocabulary, and A a specific age group. Then P(H | A) and P(V | A) are fixed by age norms, and P(H \cap V | A) = P(H | A) P(V | A) if there is no direct link beyond age, as verified in developmental studies where the correlation drops to zero when stratifying by age.

Another intuitive example involves bus arrival delays on the same route. Let D_1 be the event of delay for bus 1 and D_2 for bus 2, which are dependent unconditionally due to shared traffic conditions causing both to be late together.
However, given the traffic condition T (e.g., heavy congestion), D_1 and D_2 are conditionally independent, as each bus's delay then depends only on its own factors like driver behavior or stops, independent of the other given T. Computationally, P(D_1 | T) is the probability of delay under the known traffic condition (say 0.8 for heavy congestion), P(D_2 | T) = 0.8 similarly, and P(D_1 \cap D_2 | T) = 0.8 \times 0.8 = 0.64 if independent given T, whereas unconditionally P(D_1 \cap D_2) > P(D_1) P(D_2) due to the positive association induced by T. This reflects a common causal fork, where traffic is the common cause.
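The two-box example at the start of this section is small enough to verify by exact enumeration; the following sketch reproduces the probabilities quoted above (the box labels and helper functions are illustrative).

```python
from fractions import Fraction as F

# Enumerate the two-box example exactly: a box is picked at random, a ball is drawn
# from it, returned, and a second ball is drawn from the other box.
boxes = {"red": ["r", "r"], "blue": ["r", "b"]}

outcomes = {}   # (first_box, first_ball, second_ball) -> probability
for first in boxes:
    other = "blue" if first == "red" else "red"
    for ball1 in boxes[first]:
        for ball2 in boxes[other]:
            key = (first, ball1, ball2)
            outcomes[key] = outcomes.get(key, 0) + F(1, 2) * F(1, len(boxes[first])) * F(1, len(boxes[other]))

def pr(pred, given=lambda *_: True):
    num = sum(p for o, p in outcomes.items() if pred(*o) and given(*o))
    den = sum(p for o, p in outcomes.items() if given(*o))
    return num / den

R1 = lambda first, b1, b2: b1 == "r"   # first ball is red
R2 = lambda first, b1, b2: b2 == "r"   # second ball is red
B  = lambda first, b1, b2: first == "red"

# Unconditionally dependent:
print(pr(lambda *o: R1(*o) and R2(*o)), pr(R1) * pr(R2))                             # 1/2 vs 9/16
# Conditionally independent given B:
print(pr(lambda *o: R1(*o) and R2(*o), given=B), pr(R1, given=B) * pr(R2, given=B))  # 1/2 vs 1/2
```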

Conditional Independence for Random Variables

Formal Definition

In measure-theoretic probability, random variables $X$, $Y$, and $Z$ are defined on a probability space $(\Omega, \mathcal{F}, P)$. The random variables $X$ and $Y$ are said to be conditionally independent given $Z$ if the $\sigma$-algebras they generate, $\sigma(X)$ and $\sigma(Y)$, are conditionally independent given $\sigma(Z)$. Specifically, two sub-$\sigma$-algebras $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ are conditionally independent given a third sub-$\sigma$-algebra $\mathcal{D} \subseteq \mathcal{F}$ if, for every $G \in \mathcal{G}$ and $H \in \mathcal{H}$, $P(G \cap H \mid \mathcal{D}) = P(G \mid \mathcal{D}) \, P(H \mid \mathcal{D})$ almost surely. An equivalent measure-theoretic formulation for the conditional independence of $X$ and $Y$ given $Z$ is that, for all measurable sets $A$ and $B$ in the respective Borel $\sigma$-algebras, $P(X \in A, Y \in B \mid Z) = P(X \in A \mid Z) \, P(Y \in B \mid Z)$ almost surely with respect to $P$. This holds under the assumption of a complete probability space, where null sets are included in $\mathcal{F}$, ensuring the conditional probabilities are well-defined. This definition for random variables generalizes the notion of conditional independence for events, as it reduces to the event case when $X = \mathbf{1}_E$ and $Y = \mathbf{1}_F$ for events $E, F \in \mathcal{F}$, where $\sigma(X) = \{\emptyset, E, E^c, \Omega\}$ and similarly for $\sigma(Y)$. The formal definition applies uniformly to both discrete and continuous random variables, though verification in continuous settings typically relies on the existence of regular conditional distributions.

Key Properties and Verification

Conditional independence possesses several key properties that facilitate its use in probabilistic modeling. One fundamental property is preservation under additional conditioning by independent variables: if $X \perp Y \mid Z$ and $W \perp (X, Y, Z)$, then $X \perp Y \mid (Z, W)$. Another important property for multiple variables is the decomposition (or mixing) property: if $X \perp (Y, W) \mid Z$, then $X \perp Y \mid Z$ and $X \perp W \mid Z$. These properties ensure that conditional independence structures remain stable when expanding the conditioning set with irrelevant information or decomposing joint independences. Verification of conditional independence $X \perp Y \mid Z$ can be performed theoretically through equivalence to zero conditional mutual information, defined as $I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z) = 0$, where $H$ denotes conditional entropy; this holds if and only if the variables are conditionally independent. Empirically, log-likelihood ratio tests, such as the deviance $G^2 = 2 \sum o_{ijk} \log(o_{ijk}/e_{ijk})$ in multi-way contingency tables (where $o_{ijk}$ and $e_{ijk}$ are observed and expected frequencies under conditional independence), provide a means to assess the hypothesis, with an asymptotic chi-squared distribution under the null. For computational verification, discrete cases often rely on contingency tables, where conditional independence is tested by stratifying over the conditioner $Z$ and applying chi-squared or log-likelihood ratio statistics to each slice, aggregating for overall assessment. In continuous settings, copula-based methods transform marginals to uniform via empirical copulas and test for independence in the partial copula, while kernel methods embed variables into reproducing kernel Hilbert spaces and use Hilbert-Schmidt independence criteria conditioned on $Z$ to detect dependence. Unlike unconditional independence, conditional independence does not imply marginal independence, and marginalizing over the conditioner $Z$ can induce dependence between $X$ and $Y$ even if they are conditionally independent given $Z$. This distinction underscores the role of the conditioning set in revealing or masking dependencies.
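As a computational illustration of the conditional-mutual-information criterion, the sketch below evaluates $I(X; Y \mid Z)$ for two small hand-built distributions, one constructed to satisfy $X \perp Y \mid Z$ and one (the product construction $Z = XY$) that violates it; the particular numbers are assumptions chosen for illustration.

```python
import math
from itertools import product

def cond_mutual_info(joint):
    """Conditional mutual information I(X; Y | Z) for a discrete joint distribution
    given as a dict mapping (x, y, z) to probability. It is zero exactly when
    X and Y are conditionally independent given Z."""
    def marg(keep):
        out = {}
        for o, p in joint.items():
            k = tuple(o[i] for i in keep)
            out[k] = out.get(k, 0.0) + p
        return out

    p_xz, p_yz, p_z = marg([0, 2]), marg([1, 2]), marg([2])
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            total += p * math.log2(p * p_z[(z,)] / (p_xz[(x, z)] * p_yz[(y, z)]))
    return total

# Hypothetical distribution with X ⊥ Y | Z: P(x, y, z) = P(z) P(x|z) P(y|z)
ci = {(x, y, z): 0.5 * (0.9 if x == z else 0.1) * (0.8 if y == z else 0.2)
      for x, y, z in product((0, 1), repeat=3)}
# The "explaining away" distribution Z = X * Y, which is not CI given Z.
not_ci = {(x, y, x * y): 0.25 for x, y in product((0, 1), repeat=2)}

print(round(cond_mutual_info(ci), 12))     # ~0.0 (up to floating-point rounding)
print(round(cond_mutual_info(not_ci), 6))  # > 0, about 0.19
```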

Common Examples

One common example of conditional independence for random variables arises in the context of Bernoulli trials with an unknown success parameter. Consider Bernoulli random variables $X_1, X_2, \dots, X_n$, each with success probability $\theta$, where $\theta$ itself is a random variable (e.g., following a Beta prior in Bayesian settings). Marginally, the $X_i$ are dependent due to their shared dependence on $\theta$, as observing one $X_i$ updates beliefs about $\theta$ and thus affects the others. However, conditionally on $\theta$, the $X_i$ are independent, since the joint conditional probability mass function factors as

$p(X_1, \dots, X_n \mid \theta) = \prod_{i=1}^n p(X_i \mid \theta) = \prod_{i=1}^n \theta^{X_i} (1 - \theta)^{1 - X_i},$

demonstrating that $X_1, \dots, X_n$ are conditionally independent given $\theta$. This structure is fundamental in Bayesian statistics for modeling sequences of binary outcomes, such as coin flips or diagnostic tests, where the parameter captures shared uncertainty.

Another illustrative case involves jointly normal random variables. For a multivariate Gaussian vector $\mathbf{X} = (X_1, \dots, X_p)^T \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$, conditional independence between components $X_i$ and $X_j$ (for $i \neq j$) given the remaining variables $\mathbf{X}_{-ij}$ holds if and only if the $(i,j)$-entry of the precision matrix $\Theta = \Sigma^{-1}$ is zero. This equivalence stems from the conditional distribution of $X_i \mid \mathbf{X}_{-i}$, whose mean and variance are determined by the $i$-th row and column of $\Theta$; a zero $\Theta_{ij}$ implies no direct conditional dependence on $X_j$. For a simple bivariate example with a third conditioning variable, suppose $\mathbf{X} = (X, Y, Z)^T$ with covariance matrix

$\Sigma = \begin{pmatrix} 1 & 0.25 & 0.5 \\ 0.25 & 1 & 0.5 \\ 0.5 & 0.5 & 1 \end{pmatrix}.$

Here the partial covariance of $X$ and $Y$ given $Z$ is $\operatorname{Cov}(X, Y) - \operatorname{Cov}(X, Z)\operatorname{Cov}(Z, Y)/\operatorname{Var}(Z) = 0.25 - 0.5 \cdot 0.5 = 0$, so the corresponding entry of the precision matrix vanishes and $X$ and $Y$ are conditionally independent given $Z$ despite their marginal correlation of 0.25.
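The zero-precision-entry criterion for the covariance matrix above can be confirmed numerically; the following sketch inverts $\Sigma$ and also computes the partial covariance directly.

```python
import numpy as np

# Check the Gaussian example numerically: for a multivariate normal, X and Y are
# conditionally independent given Z exactly when the corresponding off-diagonal
# entry of the precision matrix (inverse covariance) is zero.
Sigma = np.array([[1.0, 0.25, 0.5],
                  [0.25, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])

Theta = np.linalg.inv(Sigma)
print(np.round(Theta, 6))          # the (0, 1) entry is 0 up to floating point: X ⊥ Y | Z

# Equivalent check via the partial covariance of (X, Y) given Z:
partial_cov = Sigma[0, 1] - Sigma[0, 2] * Sigma[2, 1] / Sigma[2, 2]
print(partial_cov)                 # 0.0, confirming conditional independence
```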