Conditional dependence
In probability theory, conditional dependence is a relationship between two or more events that are dependent when a third event occurs.[1] For example, if $A$ and $B$ are two events that individually increase the probability of a third event $C$, and do not directly affect each other, then initially (when it has not been observed whether or not the event $C$ occurs)[2][3] $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$ ($A$ and $B$ are independent).
But suppose that now $C$ is observed to occur. If event $B$ occurs, then the probability of occurrence of the event $A$ will decrease, because its positive relation to $C$ is less necessary as an explanation for the occurrence of $C$ (similarly, event $A$ occurring will decrease the probability of occurrence of $B$). Hence, the two events $A$ and $B$ are now conditionally negatively dependent on each other, because the probability of occurrence of each is negatively dependent on whether the other occurs. We have[4]

$$P(A \mid C \cap B) < P(A \mid C).$$
Conditional dependence of $A$ and $B$ given $C$ is the logical negation of conditional independence $(A \perp\!\!\!\perp B \mid C)$.[5] In conditional independence, two events (which may be dependent or not) become independent given the occurrence of a third event.[6]
Example
In essence, probability is influenced by a person's information about the possible occurrence of an event. For example, let the event $A$ be 'I have a new phone', event $B$ be 'I have a new watch', and event $C$ be 'I am happy', and suppose that having either a new phone or a new watch increases the probability of my being happy. Let us assume that the event $C$ has occurred, meaning 'I am happy'. Now if another person sees my new watch, he or she will reason that my likelihood of being happy was increased by my new watch, so there is less need to attribute my happiness to a new phone.
To make the example more numerically specific, suppose that there are four possible states $\Omega = \{ s_1, s_2, s_3, s_4 \}$, given in the middle four columns of the following table, in which the occurrence of event $A$ is signified by a $1$ in row $A$ and its non-occurrence is signified by a $0$, and likewise for $B$ and $C$. That is, $A = \{ s_2, s_4 \}$, $B = \{ s_3, s_4 \}$, and $C = \{ s_2, s_3, s_4 \}$. The probability of $s_i$ is $1/4$ for every $i$.
| Event | $s_1$ | $s_2$ | $s_3$ | $s_4$ | Probability of event |
|---|---|---|---|---|---|
| $A$ | 0 | 1 | 0 | 1 | $1/2$ |
| $B$ | 0 | 0 | 1 | 1 | $1/2$ |
| $C$ | 0 | 1 | 1 | 1 | $3/4$ |
and so
| Event | $s_1$ | $s_2$ | $s_3$ | $s_4$ | Probability of event |
|---|---|---|---|---|---|
| $A \cap B$ | 0 | 0 | 0 | 1 | $1/4$ |
| $A \cap C$ | 0 | 1 | 0 | 1 | $1/2$ |
| $B \cap C$ | 0 | 0 | 1 | 1 | $1/2$ |
| $A \cap B \cap C$ | 0 | 0 | 0 | 1 | $1/4$ |
In this example, $C$ occurs if and only if at least one of $A$, $B$ occurs. Unconditionally (that is, without reference to $C$), $A$ and $B$ are independent of each other, because $P(A)$ (the sum of the probabilities associated with a $1$ in row $A$) is $1/2$, while $P(B) = 1/2$ and $P(A \cap B) = 1/4 = P(A)\,P(B)$. But conditional on $C$ having occurred (the last three columns in the table), we have $P(A \mid C) = P(A \cap C)/P(C) = 2/3$, while $P(A \mid C \cap B) = P(A \cap B \cap C)/P(B \cap C) = 1/2$. Since in the presence of $C$ the probability of $A$ is affected by the presence or absence of $B$, $A$ and $B$ are mutually dependent conditional on $C$.
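The arithmetic above can be checked mechanically. The following Python sketch (a minimal illustration written for this article, not part of the original text) enumerates the four equally likely states and recomputes the unconditional and conditional probabilities:

```python
from fractions import Fraction

# The four equally likely states s1..s4; each set lists the states in which the event occurs.
states = ["s1", "s2", "s3", "s4"]
A = {"s2", "s4"}          # 'I have a new phone'
B = {"s3", "s4"}          # 'I have a new watch'
C = {"s2", "s3", "s4"}    # 'I am happy' (occurs iff A or B occurs)

def prob(event):
    """P(event) under the uniform distribution on the four states."""
    return Fraction(len(event), len(states))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

print(prob(A), prob(B), prob(A & B))            # 1/2 1/2 1/4 -> A and B are independent
print(cond(A, C), cond(A, B & C))               # 2/3 1/2     -> B "explains away" A given C
print(cond(A & B, C), cond(A, C) * cond(B, C))  # 1/3 vs 4/9  -> A, B dependent given C
```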
See also
- Conditional independence – Probability theory concept
- de Finetti's theorem – Conditional independence of exchangeable observations
- Conditional expectation – Expected value of a random variable given that certain conditions are known to occur
References
- ^ Husmeier, Dirk. "Introduction to Learning Bayesian Networks from Data". In Husmeier, Dirk; Dybowski, Richard; Roberts, Stephen (eds.). Probabilistic Modeling in Bioinformatics and Medical Informatics. Advanced Information and Knowledge Processing. Springer-Verlag. pp. 17–57. doi:10.1007/1-84628-119-9_2. ISBN 1852337788.
- ^ Dawid, A. P. "Conditional Independence in Statistical Theory". Archived 2013-12-27 at the Wayback Machine.
- ^ Probabilistic independence on Britannica: "Probability → Applications of conditional probability → independence (equation 7)".
- ^ Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig, 2011 "Unit 3: Explaining Away"[permanent dead link]
- ^ Bouckaert, Remco R. (1994). "11. Conditional dependence in probabilistic networks". In Cheeseman, P.; Oldford, R. W. (eds.). Selecting Models from Data, Artificial Intelligence and Statistics IV. Lecture Notes in Statistics. Vol. 89. Springer-Verlag. pp. 101–111, especially 104. ISBN 978-0-387-94281-0.
- ^ Dawid, A. P. "Conditional Independence in Statistical Theory". Archived 2013-12-27 at the Wayback Machine.
Core Concepts
Definition
Conditional dependence refers to a relationship between random variables or events where the probability distribution of one is influenced by the other, even after incorporating information from a conditioning variable or set.[5] Intuitively, it arises when knowing the outcome of one variable alters the expected behavior of another, despite accounting for the conditioning factor, reflecting a residual association not explained by the conditioner alone.[6] Formally, two random variables $X$ and $Y$ are conditionally dependent given a third variable $Z$ (with $P(Z = z) > 0$) if there exist values $x, y, z$ in their supports such that $P(X = x, Y = y \mid Z = z) \neq P(X = x \mid Z = z)\,P(Y = y \mid Z = z)$.[5] This inequality indicates that the joint conditional distribution does not factorize into the product of the marginal conditionals, signifying dependence.[7] Unlike unconditional (marginal) dependence, which assesses association without conditioning, conditional dependence can emerge or disappear based on the conditioner; notably, $X$ and $Y$ may be unconditionally independent yet conditionally dependent given $Z$, as in collider bias, where $Z$ is a common effect of $X$ and $Y$, inducing spurious association upon conditioning.[8] Conversely, unconditional dependence may vanish under certain conditioning, highlighting the context-specific nature of probabilistic relationships.[5] The concept was first formalized within modern probability theory in the early 20th century, building on Andrei Kolmogorov's axiomatic foundations established in 1933, which provided the rigorous framework for conditional probabilities underlying dependence relations.

Relation to Unconditional Dependence
Unconditional dependence between two random variables $X$ and $Y$ occurs when their joint probability distribution does not factorize into the product of their marginal distributions, that is, when $P(X, Y) \neq P(X)\,P(Y)$. This contrasts with conditional dependence, which, as defined earlier, evaluates the joint distribution relative to a conditioning variable $Z$. In essence, unconditional dependence captures marginal associations without additional context, while conditional dependence reveals how these associations may alter given knowledge of $Z$. Conditioning on $Z$ can induce conditional independence from unconditional dependence, particularly in scenarios involving a common cause. For instance, if $Z$ directly influences both $X$ and $Y$ (as in a directed acyclic graph with arrows $X \leftarrow Z \rightarrow Y$), then $X$ and $Y$ exhibit unconditional dependence due to their shared origin, but become conditionally independent given $Z$, as the influence of the common cause is accounted for.[5] This structure, known as a common cause or fork, illustrates how conditioning removes spurious associations propagated through $Z$.[9] Conversely, conditioning can induce conditional dependence where unconditional independence previously held, a phenomenon exemplified by the V-structure in directed acyclic graphs. In a V-structure, arrows converge on $Z$ from both $X$ and $Y$ (i.e., $X \rightarrow Z \leftarrow Y$), rendering $X$ and $Y$ unconditionally independent since they lack a direct path of influence.[5] However, conditioning on $Z$, the common effect, creates a dependence between $X$ and $Y$, as observing $Z$ provides evidence that links the two causes through the collider at $Z$.[9] This is the basis for "explaining away", where evidence for one cause (say, $X$) reduces the likelihood of the alternative cause ($Y$) given the observed effect $Z$, thereby inducing negative conditional dependence between the causes. Overall, conditioning on $Z$ can thus create new dependencies, remove existing ones, or even invert the direction of association between $X$ and $Y$, fundamentally altering the dependence structure depending on the underlying causal relationships.[5] These dynamics underscore the importance of graphical models like directed acyclic graphs in visualizing how marginal and conditional dependencies interact.[9]
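The V-structure behavior can be seen in a short simulation. The Python sketch below (an illustrative Monte Carlo experiment, with the Bernoulli setup and variable names chosen here rather than taken from the text) samples two independent causes and a common effect, then compares marginal and conditional associations:

```python
import random

random.seed(0)
n = 100_000

# Independent causes X and Y; common effect Z = 1 if either cause is present (a collider).
samples = []
for _ in range(n):
    x = random.random() < 0.5
    y = random.random() < 0.5
    z = x or y
    samples.append((x, y, z))

def p(pred, rows):
    """Empirical probability of the predicate over the given rows."""
    rows = list(rows)
    return sum(pred(r) for r in rows) / len(rows)

# Marginally, X and Y are (approximately) independent: P(X and Y) matches P(X) P(Y).
print(p(lambda r: r[0] and r[1], samples),
      p(lambda r: r[0], samples) * p(lambda r: r[1], samples))

# Conditioning on the collider Z = 1 induces negative dependence ("explaining away"):
given_z = [r for r in samples if r[2]]
print(p(lambda r: r[0], given_z))                       # P(X=1 | Z=1) is about 2/3
print(p(lambda r: r[0], [r for r in given_z if r[1]]))  # P(X=1 | Z=1, Y=1) is about 1/2
```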
Formal Framework

Probabilistic Formulation
In probability theory, conditional dependence between two events $A$ and $B$ given a third event $C$ with $P(C) > 0$ is defined as the failure of the equality $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$, where the conditional probability is given by $P(A \mid C) = P(A \cap C)/P(C)$.[10] This inequality indicates that the occurrence of $A$ affects the probability of $B$ (or vice versa) even after accounting for $C$.[11]

For random variables, consider random variables $X$, $Y$, and $Z$ defined on a common probability space. The joint conditional probability mass or density function encapsulates the probabilistic structure. Specifically, the joint conditional distribution satisfies $P(X, Y \mid Z) = P(X, Y, Z)/P(Z)$, derived from the chain rule for conditional probabilities: starting from the joint distribution $P(X, Y, Z)$, dividing by $P(Z)$ yields the conditional form, assuming $P(Z) > 0$.[12] Conditional dependence holds when this does not factorize as $P(X \mid Z)\,P(Y \mid Z)$, i.e., when $P(X, Y \mid Z) \neq P(X \mid Z)\,P(Y \mid Z)$. Unconditional dependence arises as the special case where $Z$ is a constant event with probability 1.[10]

In the discrete case, for random variables taking values in countable sets, the conditional joint probability mass function is $p_{X,Y \mid Z}(x, y \mid z) = p_{X,Y,Z}(x, y, z)/p_Z(z)$ for $p_Z(z) > 0$, and the marginal conditionals are $p_{X \mid Z}(x \mid z) = p_{X,Z}(x, z)/p_Z(z)$ and similarly for $Y$. Dependence occurs if $p_{X,Y \mid Z}(x, y \mid z) \neq p_{X \mid Z}(x \mid z)\,p_{Y \mid Z}(y \mid z)$ for some $(x, y, z)$ with $p_Z(z) > 0$.[13] For continuous random variables with joint density $f_{X,Y,Z}(x, y, z)$, the conditional joint density is $f_{X,Y \mid Z}(x, y \mid z) = f_{X,Y,Z}(x, y, z)/f_Z(z)$ for $f_Z(z) > 0$, with marginal conditionals $f_{X \mid Z}(x \mid z) = f_{X,Z}(x, z)/f_Z(z)$ and analogously for $Y$. Conditional dependence is present when $f_{X,Y \mid Z}(x, y \mid z) \neq f_{X \mid Z}(x \mid z)\,f_{Y \mid Z}(y \mid z)$ for some $(x, y, z)$ with $f_Z(z) > 0$.[12]

From an axiomatic perspective in measure-theoretic probability, conditional dependence is framed using sigma-algebras. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\sigma(X)$, $\sigma(Y)$, $\sigma(Z)$ be the sigma-algebras generated by the measurable functions $X$, $Y$, $Z$, respectively. The random variables $X$ and $Y$ are conditionally dependent given $Z$ if $\sigma(X)$ and $\sigma(Y)$ are not conditionally independent given $\sigma(Z)$, meaning there exist events $A \in \sigma(X)$, $B \in \sigma(Y)$ such that $P(A \cap B \mid \sigma(Z)) \neq P(A \mid \sigma(Z))\,P(B \mid \sigma(Z))$ on a set of positive probability, where conditional probability given a sigma-algebra is defined via the Radon-Nikodym derivative of the restricted measures.[14] Equivalently, conditional dependence holds when, for some bounded measurable functions $g$ on the range of $X$ and $h$ on the range of $Y$, the equality $E[g(X)\,h(Y) \mid Z] = E[g(X) \mid Z]\,E[h(Y) \mid Z]$ fails to hold almost surely. This setup ensures the formulation aligns with Kolmogorov's axioms extended to conditional expectations.[14]
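For the discrete formulation, the factorization test can be written directly from the definitions. The sketch below (a minimal Python implementation assuming the joint pmf is supplied as a dictionary; the function and helper names are illustrative) checks whether $p_{X,Y \mid Z}(x, y \mid z) = p_{X \mid Z}(x \mid z)\,p_{Y \mid Z}(y \mid z)$ holds everywhere:

```python
from itertools import product
from math import isclose

def conditionally_dependent(joint):
    """joint: dict mapping (x, y, z) -> probability (missing keys mean 0).
    Returns True if X and Y are conditionally dependent given Z, i.e. the
    conditional pmf p(x, y | z) fails to factorize as p(x | z) p(y | z)."""
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    zs = {z for _, _, z in joint}
    p = lambda x, y, z: joint.get((x, y, z), 0.0)
    p_z = {z: sum(p(x, y, z) for x, y in product(xs, ys)) for z in zs}
    p_xz = {(x, z): sum(p(x, y, z) for y in ys) for x, z in product(xs, zs)}
    p_yz = {(y, z): sum(p(x, y, z) for x in xs) for y, z in product(ys, zs)}
    for x, y, z in product(xs, ys, zs):
        if p_z[z] == 0:
            continue
        lhs = p(x, y, z) / p_z[z]                                # p(x, y | z)
        rhs = (p_xz[(x, z)] / p_z[z]) * (p_yz[(y, z)] / p_z[z])  # p(x | z) p(y | z)
        if not isclose(lhs, rhs, abs_tol=1e-12):
            return True
    return False

# XOR example: X, Y fair independent bits, Z = X xor Y; X and Y are dependent given Z.
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditionally_dependent(joint))  # True
```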
Measure of Conditional Dependence

One prominent measure of conditional dependence is the conditional mutual information, denoted $I(X; Y \mid Z)$, which quantifies the amount of information shared between random variables $X$ and $Y$ after conditioning on $Z$.[15] Defined in terms of entropies as $I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$, where $H(X \mid Z)$ is the conditional entropy of $X$ given $Z$, measuring the remaining uncertainty in $X$ after observing $Z$, and similarly for the other terms, this metric captures the expected reduction in uncertainty about one variable from knowing the other, conditional on $Z$.[15] It equals zero if and only if $X$ and $Y$ are conditionally independent given $Z$, providing a symmetric, non-negative measure applicable to both discrete and continuous variables without assuming linearity.[15]

For jointly Gaussian random variables, partial correlation offers a computationally efficient alternative, measuring the correlation between $X$ and $Y$ after removing the linear effects of $Z$. The partial correlation coefficient is given by
$$\rho_{XY \cdot Z} = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{YZ}}{\sqrt{(1 - \rho_{XZ}^2)(1 - \rho_{YZ}^2)}},$$
where $\rho_{XY}$, $\rho_{XZ}$, and $\rho_{YZ}$ are the pairwise Pearson correlation coefficients.[16] Under Gaussian assumptions, $\rho_{XY \cdot Z} = 0$ if and only if $X$ and $Y$ are conditionally independent given $Z$, enabling straightforward hypothesis tests for dependence via its standardized distribution.[16]

For non-linear dependencies, rank-based measures such as conditional Kendall's tau and conditional Spearman's rho extend unconditional rank correlations to the conditional setting. Conditional Kendall's tau assesses the concordance probability between $X$ and $Y$ given $Z$, providing a robust, distribution-free measure of monotonic dependence that ranges from $-1$ to $1$.[17] Similarly, conditional Spearman's rho evaluates the correlation of ranks after conditioning, suitable for detecting non-linear associations in non-Gaussian data.[18] Kernel-based approaches, like the conditional Hilbert-Schmidt Independence Criterion (HSIC), embed variables into reproducing kernel Hilbert spaces to detect arbitrary dependence forms, with the criterion equaling zero under conditional independence and otherwise positive, scaled by kernel choices.[2]

These measures have specific limitations tied to their assumptions and practicality. Partial correlation assumes linearity and Gaussianity, potentially underestimating non-linear dependencies, while requiring inversion of covariance matrices that scales cubically with the dimension of $Z$.[16] Conditional mutual information, though versatile, demands entropy estimation, which is computationally intensive in high dimensions and sensitive to sample size in continuous cases.[15] Rank-based metrics like conditional Kendall's tau and Spearman's rho are robust to outliers but may lack power against weak or non-monotonic relations, and kernel methods such as conditional HSIC suffer from the curse of dimensionality due to kernel matrix computations, often requiring careful hyperparameter tuning.[17][18][2]
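As a concrete sketch of two of these measures (written for this article, with hypothetical helper names, and restricted to a small discrete joint pmf and known pairwise correlations), conditional mutual information and the partial correlation coefficient can be computed as follows:

```python
from math import log2, sqrt

def conditional_mutual_information(joint):
    """I(X; Y | Z) in bits for a discrete joint pmf given as {(x, y, z): p}.
    Uses I(X; Y | Z) = sum_{x,y,z} p(x,y,z) log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]."""
    def marg(idx):
        out = {}
        for key, p in joint.items():
            k = tuple(key[i] for i in idx)
            out[k] = out.get(k, 0.0) + p
        return out
    p_z, p_xz, p_yz = marg((2,)), marg((0, 2)), marg((1, 2))
    cmi = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            cmi += p * log2(p_z[(z,)] * p / (p_xz[(x, z)] * p_yz[(y, z)]))
    return cmi

def partial_correlation(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y given Z from pairwise Pearson correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# XOR collider: X, Y independent fair bits, Z = X xor Y.
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_mutual_information(joint))   # 1.0 bit: strong conditional dependence
print(partial_correlation(0.5, 0.3, 0.4))      # about 0.435 for these example correlations
```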
Properties and Theorems

Basic Properties
Conditional dependence exhibits symmetry: if random variables $X$ and $Y$ are conditionally dependent given $Z$, then $Y$ and $X$ are also conditionally dependent given $Z$. This property arises directly from the definitional equivalence $P(X, Y \mid Z) \neq P(X \mid Z)\,P(Y \mid Z)$ if and only if $P(Y, X \mid Z) \neq P(Y \mid Z)\,P(X \mid Z)$.[11] Measures of conditional dependence, such as conditional mutual information $I(X; Y \mid Z)$, possess non-negativity, satisfying $I(X; Y \mid Z) \geq 0$, with equality holding if and only if $X$ and $Y$ are conditionally independent given $Z$. This non-negativity stems from the interpretation of conditional mutual information as a Kullback-Leibler divergence, which is inherently non-negative. Additionally, conditional mutual information is symmetric, as $I(X; Y \mid Z) = I(Y; X \mid Z)$.[19] Conditional dependence lacks transitivity with respect to unconditional dependence: the presence of dependence between $X$ and $Y$ given $Z$ does not imply dependence between $X$ and $Y$ unconditionally. A sketch of a counterexample involves scenarios where $X$ and $Y$ are marginally independent but become dependent upon conditioning on $Z$, such as when $Z$ acts as a common effect (collider) of $X$ and $Y$.[20] Conditional dependence integrates with marginal distributions through the chain rule of probability, which expresses the joint distribution as a product of conditional probabilities, such as $P(X, Y, Z) = P(X)\,P(Y \mid X)\,P(Z \mid X, Y)$. In this factorization, conditional dependence between $Z$ and $Y$ given $X$ manifests in the term $P(Z \mid X, Y)$ deviating from $P(Z \mid X)$, thereby aggregating local dependencies into the overall joint structure while preserving the marginals.[21]
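The chain-rule factorization mentioned above can be verified numerically. The short Python sketch below (illustrative only, reusing the XOR joint from the earlier examples) reconstructs $P(x, y, z)$ from $P(x)\,P(y \mid x)\,P(z \mid x, y)$; here the dependence of $Z$ on $Y$ given $X$ shows up in $P(z \mid x, y)$ differing from $P(z \mid x)$:

```python
from math import isclose

# XOR joint: X, Y independent fair bits, Z = X xor Y (missing keys have probability 0).
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
p = lambda x, y, z: joint.get((x, y, z), 0.0)

def p_x(x):
    return sum(p(x, y, z) for y in (0, 1) for z in (0, 1))

def p_y_given_x(y, x):
    return sum(p(x, y, z) for z in (0, 1)) / p_x(x)

def p_z_given_xy(z, x, y):
    return p(x, y, z) / sum(p(x, y, zz) for zz in (0, 1))

# Chain rule: P(x, y, z) = P(x) P(y | x) P(z | x, y) reproduces the joint exactly.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            rebuilt = p_x(x) * p_y_given_x(y, x) * p_z_given_xy(z, x, y)
            assert isclose(rebuilt, p(x, y, z))
print("chain-rule factorization verified")
```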
Key Theorems

The Hammersley-Clifford theorem establishes a foundational link between conditional independence structures in Markov random fields and the factorization of their joint distributions. Specifically, for a finite undirected graph $G = (V, E)$ and random variables $X = (X_v)_{v \in V}$ with strictly positive joint probability distribution that satisfies the local Markov property with respect to $G$ (meaning that each $X_v$ is conditionally independent of all non-neighboring variables given $X_{N(v)}$, where $N(v)$ is the set of neighbors of $v$), the distribution admits a factorization over the maximal cliques $\mathcal{C}$ of $G$:
$$P(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C),$$
where $Z$ is the normalizing constant and each $\psi_C$ is a non-negative potential function defined on the variables in clique $C$. This implies that the conditional dependence relations encoded by the graph's separation properties are fully captured by interactions within cliques, enabling the representation of complex dependence structures through local potentials in graphical models. A high-level proof outline proceeds by constructing the potentials iteratively from the conditional distributions implied by the Markov property, ensuring the product reproduces the joint via telescoping factorization and normalization, assuming positivity to avoid zero probabilities that could violate the Markov assumptions.[22]

The decomposition property governs how conditional independence over composite sets implies independence over subsets, with direct implications for conditional dependence as its contrapositive. For conditional independence, if $X \perp\!\!\!\perp (Y, W) \mid Z$, then $X \perp\!\!\!\perp Y \mid Z$ and $X \perp\!\!\!\perp W \mid Z$. Equivalently, for conditional dependence (the negation), if $X \not\perp\!\!\!\perp Y \mid Z$ or $X \not\perp\!\!\!\perp W \mid Z$ (i.e., $X$ depends on at least one of $Y$ or $W$ given $Z$), then $X \not\perp\!\!\!\perp (Y, W) \mid Z$. This property, part of the semi-graphoid axioms, ensures that dependence on a component given $Z$ cannot occur without dependence on the composite $(Y, W)$ given $Z$. A proof sketch for the independence direction uses marginalization: integrate the joint conditional density $f_{X, Y, W \mid Z}(x, y, w \mid z) = f_{X \mid Z}(x \mid z)\,f_{Y, W \mid Z}(y, w \mid z)$ over $w$ to obtain $f_{X, Y \mid Z}(x, y \mid z) = f_{X \mid Z}(x \mid z)\,f_{Y \mid Z}(y \mid z)$, and similarly for the other subset; the dependence contrapositive follows immediately.[23]

The intersection property further characterizes compositions of conditional independences, again with nuanced implications for dependence. For conditional independence under strictly positive distributions, if $X \perp\!\!\!\perp Y \mid (Z, W)$ and $X \perp\!\!\!\perp W \mid (Z, Y)$, then $X \perp\!\!\!\perp (Y, W) \mid Z$. This axiom completes the graphoid properties, allowing inference of broader independences from restricted ones, but it fails without positivity; for example, in distributions with zero probabilities the property may not hold, leading to spurious conditional dependences where none are implied by the graph structure. For conditional dependence, the contrapositive is: if $X \not\perp\!\!\!\perp (Y, W) \mid Z$, then either $X \not\perp\!\!\!\perp Y \mid (Z, W)$ or $X \not\perp\!\!\!\perp W \mid (Z, Y)$, though failure cases arise in non-positive measures where joint dependence does not propagate to both components, complicating graphical representations. A high-level proof sketch relies on the definitions: from $X \perp\!\!\!\perp Y \mid (Z, W)$, $f(x \mid y, w, z) = f(x \mid w, z)$; substituting the second independence, $f(x \mid y, w, z) = f(x \mid y, z)$, yields $f(x \mid w, z) = f(x \mid y, z)$ for all $y, w$, which under positivity must equal $f(x \mid z)$, with positivity also ensuring all conditionals are well-defined via Bayes' rule without division by zero. Information-theoretic variants use mutual information inequalities, where $I(X; Y \mid Z, W) = 0$ and $I(X; W \mid Z, Y) = 0$ imply $I(X; Y, W \mid Z) = 0$ by chain rule additivity under positivity.[23]
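To make the clique factorization concrete, the following Python sketch (a toy construction, with binary variables and potential values chosen arbitrarily for illustration) builds a joint distribution over a three-node chain with edges X-Y and Y-Z from clique potentials, then checks that X and Z are not conditionally dependent given the separator Y:

```python
from itertools import product
from math import isclose

# Chain-structured Markov random field on binary variables: edges X-Y and Y-Z.
# Strictly positive clique potentials (values are arbitrary for illustration).
psi_xy = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi_yz = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

# Hammersley-Clifford-style factorization: P(x, y, z) = psi_xy(x, y) * psi_yz(y, z) / Z_norm.
unnorm = {(x, y, z): psi_xy[(x, y)] * psi_yz[(y, z)]
          for x, y, z in product((0, 1), repeat=3)}
z_norm = sum(unnorm.values())
joint = {k: v / z_norm for k, v in unnorm.items()}

def p_y(y):
    return sum(p for (x, yy, z), p in joint.items() if yy == y)

def p_xz_given_y(x, z, y):
    return joint[(x, y, z)] / p_y(y)

def p_x_given_y(x, y):
    return sum(joint[(x, y, z)] for z in (0, 1)) / p_y(y)

def p_z_given_y(z, y):
    return sum(joint[(x, y, z)] for x in (0, 1)) / p_y(y)

# Y separates X from Z in the graph, so X and Z are NOT conditionally dependent given Y.
for x, y, z in product((0, 1), repeat=3):
    assert isclose(p_xz_given_y(x, z, y), p_x_given_y(x, y) * p_z_given_y(z, y))
print("X and Z are conditionally independent given Y")
```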
Examples and Illustrations

Elementary Example
Consider two fair coins flipped independently, resulting in random variables $X \in \{0, 1\}$ and $Y \in \{0, 1\}$, where $1$ denotes heads and $0$ denotes tails, each with $P(X = 1) = P(Y = 1) = 1/2$. Define $Z = X \oplus Y$ (the XOR operation), so $Z = 0$ if the outcomes match (both heads or both tails) and $Z = 1$ if they differ. This setup simulates a scenario where $Z$ acts as a signal of outcome consistency, analogous to a "fair" (matching, $Z = 0$) or "biased" (mismatching, $Z = 1$) indication. Marginally, $X$ and $Y$ are independent, as their joint distribution factors: $P(X = x, Y = y) = P(X = x)\,P(Y = y)$, with each of the four outcomes having probability 0.25. Consequently, $P(X = x \mid Y = y) = P(X = x)$ for all $x, y$. Also, $P(Z = 0) = P(Z = 1) = 0.5$. However, conditioning on $Z$ induces dependence between $X$ and $Y$. Given $Z = 0$, the conditional joint probabilities are $P(X = 0, Y = 0 \mid Z = 0) = 0.5$, $P(X = 1, Y = 1 \mid Z = 0) = 0.5$, and $P(X = 0, Y = 1 \mid Z = 0) = P(X = 1, Y = 0 \mid Z = 0) = 0$. The marginals are $P(X = 0 \mid Z = 0) = P(X = 1 \mid Z = 0) = 0.5$ and similarly for $Y$. Thus, $P(X = 0, Y = 0 \mid Z = 0) = 0.5 \neq 0.25 = P(X = 0 \mid Z = 0)\,P(Y = 0 \mid Z = 0)$, demonstrating conditional dependence. A similar inequality holds for $Z = 1$. The full joint probability distribution over $(X, Y, Z)$ is given in the following table:

| X | Y | Z | P(X,Y,Z) |
|---|---|---|---|
| 0 | 0 | 0 | 0.25 |
| 0 | 1 | 1 | 0.25 |
| 1 | 0 | 1 | 0.25 |
| 1 | 1 | 0 | 0.25 |
Conditioning on $Z = 0$ (the matching outcomes) yields the conditional distribution below, in which the mismatching cells have conditional probability 0:

| $P(X, Y \mid Z = 0)$ | Y=0 | Y=1 |
|---|---|---|
| X=0 | 0.5 | 0 |
| X=1 | 0 | 0.5 |
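The table entries can be reproduced directly from the joint distribution above. A brief Python check (illustrative only) conditions the joint on $Z = 0$ and on $Z = 1$:

```python
# Joint distribution of (X, Y, Z) with Z = X xor Y; absent entries have probability 0.
joint = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}

for z0 in (0, 1):
    p_z = sum(p for (x, y, z), p in joint.items() if z == z0)
    cond = {(x, y): p / p_z for (x, y, z), p in joint.items() if z == z0}
    p_x0 = sum(p for (x, y), p in cond.items() if x == 0)
    p_y0 = sum(p for (x, y), p in cond.items() if y == 0)
    print(f"Z={z0}: P(X,Y|Z)={cond}, "
          f"P(X=0|Z)={p_x0}, P(Y=0|Z)={p_y0}, product={p_x0 * p_y0}")
# For Z=0 the table above is recovered: P(X=0,Y=0|Z=0) = 0.5 != 0.25 = P(X=0|Z=0) P(Y=0|Z=0).
```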
