Chain rule (probability)
from Wikipedia

In probability theory, the chain rule[1] (also called the general product rule[2][3]) describes how to calculate the probability of the intersection of (not necessarily independent) events, or, equivalently, the joint distribution of random variables, using conditional probabilities. The rule thus allows one to express a joint probability in terms of conditional probabilities only.[4] It is notably used in the context of discrete stochastic processes and in applications, e.g. the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.

Chain rule for events

Two events

For two events $A$ and $B$, the chain rule states that

$$P(A \cap B) = P(B \mid A) \, P(A),$$

where $P(B \mid A)$ denotes the conditional probability of $B$ given $A$.

Example

Jar A has 1 black ball and 2 white balls, and another jar B has 1 black ball and 3 white balls. Suppose we pick a jar at random and then select a ball from that jar. Let event $A$ be choosing the first jar, i.e. $P(A) = P(\overline{A}) = 1/2$, where $\overline{A}$ is the complementary event of $A$. Let event $B$ be the chance we choose a white ball. The chance of choosing a white ball, given that we have chosen the first jar, is $P(B \mid A) = 2/3$. The intersection $A \cap B$ then describes choosing the first jar and a white ball from it. The probability can be calculated by the chain rule as follows:

$$P(A \cap B) = P(B \mid A) \, P(A) = \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}.$$
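The same calculation can be checked in code. The following is a minimal sketch (not part of the original article) that hard-codes the jar contents from the example above and evaluates the chain-rule product with exact rational arithmetic:

```python
from fractions import Fraction

# Jar contents from the example: 1 black + 2 white in A, 1 black + 3 white in B.
jars = {"A": ["black", "white", "white"],
        "B": ["black", "white", "white", "white"]}

p_jar = Fraction(1, 2)  # each jar is chosen with probability 1/2
# P(B | A): fraction of white balls in the first jar = 2/3
p_white_given_A = Fraction(jars["A"].count("white"), len(jars["A"]))

# Chain rule: P(A ∩ B) = P(B | A) * P(A)
print(p_white_given_A * p_jar)  # 1/3
```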

Finitely many events

For events $A_1, \ldots, A_n$ whose intersection does not have probability zero, the chain rule states

$$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \, P(A_1 \cap \ldots \cap A_{n-1}) = \ldots = \prod_{k=1}^{n} P(A_k \mid A_1 \cap \ldots \cap A_{k-1}),$$

where for $k = 1$ the (empty) conditioning is dropped, so that the first factor is simply $P(A_1)$.

Example 1

For $n = 4$, i.e. four events, the chain rule reads

$$P(A_1 \cap A_2 \cap A_3 \cap A_4) = P(A_4 \mid A_3 \cap A_2 \cap A_1) \, P(A_3 \mid A_2 \cap A_1) \, P(A_2 \mid A_1) \, P(A_1).$$

Example 2

We randomly draw 4 cards (one at a time) without replacement from a deck of 52 cards. What is the probability that we have picked 4 aces?

First, we set $A_n := \{\text{draw an ace in the } n\text{th try}\}$. Obviously, we get the following probabilities:

$$P(A_1) = \frac{4}{52}, \qquad P(A_2 \mid A_1) = \frac{3}{51}, \qquad P(A_3 \mid A_1 \cap A_2) = \frac{2}{50}, \qquad P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{49}.$$

Applying the chain rule,

$$P(A_1 \cap A_2 \cap A_3 \cap A_4) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \cdot \frac{1}{49} = \frac{1}{270725}.$$
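The same product can be evaluated exactly in code; the sketch below (an illustration, not part of the original text) mirrors the chain-rule factors above, where after each ace drawn one fewer ace and one fewer card remain:

```python
from fractions import Fraction
from math import prod

# Chain-rule factors: 4/52, 3/51, 2/50, 1/49
factors = [Fraction(4 - k, 52 - k) for k in range(4)]
print(prod(factors))  # 1/270725
```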

Statement of the theorem and proof

Let $(\Omega, \mathcal{A}, P)$ be a probability space. Recall that the conditional probability of an event $A$ given an event $B$ with $P(B) > 0$ is defined as

$$P(A \mid B) := \frac{P(A \cap B)}{P(B)}.$$

Then we have the following theorem.

Chain rule. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $A_1, \ldots, A_n \in \mathcal{A}$ with $P(A_1 \cap \ldots \cap A_{n-1}) > 0$. Then

$$P(A_1 \cap \ldots \cap A_n) = P(A_1) \, P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cap \ldots \cap A_{n-1}).$$

Proof

The formula follows immediately by recursion:

$$P(A_1 \cap \ldots \cap A_n) = P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \, P(A_1 \cap \ldots \cap A_{n-1}) = \ldots = P(A_1) \prod_{k=2}^{n} P(A_k \mid A_1 \cap \ldots \cap A_{k-1}),$$

where we used the definition of the conditional probability in the first step.

Chain rule for discrete random variables

Two random variables

For two discrete random variables $X, Y$, we use the events $A := \{X = x\}$ and $B := \{Y = y\}$ in the definition above, and find the joint distribution as

$$P(X = x, Y = y) = P(X = x \mid Y = y) \, P(Y = y),$$

or

$$P(X = x, Y = y) = P(Y = y \mid X = x) \, P(X = x),$$

where $P(X = x)$ is the probability distribution of $X$ and $P(X = x \mid Y = y)$ the conditional probability distribution of $X$ given $Y$.
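To make the factorization concrete, here is a minimal sketch with made-up numbers (the variables and values are hypothetical, chosen only for illustration) that builds a joint distribution from a marginal and a conditional table, then recovers the other marginal by summing:

```python
# Hypothetical distributions: Y is a fair coin flip, X depends on Y.
p_Y = {"heads": 0.5, "tails": 0.5}
p_X_given_Y = {
    "heads": {"red": 0.7, "blue": 0.3},
    "tails": {"red": 0.2, "blue": 0.8},
}

# Chain rule: P(X = x, Y = y) = P(X = x | Y = y) * P(Y = y)
joint = {(x, y): p_X_given_Y[y][x] * p_Y[y]
         for y in p_Y for x in p_X_given_Y[y]}

# Marginalizing the joint over y recovers P(X = x).
p_X = {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p

print(joint)
print(p_X)  # {'red': 0.45, 'blue': 0.55}
```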

Finitely many random variables

Let $X_1, \ldots, X_n$ be random variables and $x_1, \ldots, x_n \in \mathbb{R}$. By the definition of the conditional probability,

$$P(X_n = x_n, \ldots, X_1 = x_1) = P(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots, X_1 = x_1) \, P(X_{n-1} = x_{n-1}, \ldots, X_1 = x_1),$$

and using the chain rule, where we set $A_k := \{X_k = x_k\}$, we can find the joint distribution as

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{k=1}^{n} P(X_k = x_k \mid X_{k-1} = x_{k-1}, \ldots, X_1 = x_1).$$

Example

For $n = 3$, i.e. considering three random variables, the chain rule reads

$$P(X_3 = x_3, X_2 = x_2, X_1 = x_1) = P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1) \, P(X_2 = x_2 \mid X_1 = x_1) \, P(X_1 = x_1).$$
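This identity can be verified numerically: starting from an arbitrary joint table over three binary variables, compute each conditional from the joint and confirm that the chain-rule product reproduces every entry. A minimal sketch with a made-up joint distribution:

```python
import itertools
import random

random.seed(0)

# A made-up joint distribution over three binary variables, normalized to sum to 1.
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

def marginal(fixed):
    """Sum the joint over all outcomes whose leading values match `fixed`."""
    return sum(p for o, p in joint.items() if o[:len(fixed)] == fixed)

for (x1, x2, x3), p in joint.items():
    # Chain rule: P(x1,x2,x3) = P(x1) * P(x2|x1) * P(x3|x1,x2)
    chain = (marginal((x1,))
             * marginal((x1, x2)) / marginal((x1,))
             * joint[(x1, x2, x3)] / marginal((x1, x2)))
    assert abs(chain - p) < 1e-12

print("chain-rule factorization matches the joint on all 8 outcomes")
```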

from Grokipedia
In probability theory, the chain rule, also known as the general product rule, expresses the joint probability of multiple events as a product of a sequence of conditional probabilities, allowing the decomposition of complex joint distributions into more manageable components. For a finite sequence of events $A_1, A_2, \dots, A_n$ in a probability space, the rule states that the probability of their intersection is

$$P\left(\bigcap_{i=1}^n A_i\right) = P(A_1) \prod_{i=2}^n P\left(A_i \,\middle|\, \bigcap_{j=1}^{i-1} A_j\right),$$

where each term conditions on the occurrence of all preceding events. This formulation follows directly from the definition of conditional probability and can be proven by induction on the number of events.

The chain rule extends seamlessly to random variables, providing a foundational tool for representing joint probability distributions in terms of conditional distributions. For discrete random variables $X_1, X_2, \dots, X_n$, the joint probability mass function is

$$P(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^n P(X_i = x_i \mid X_1 = x_1, \dots, X_{i-1} = x_{i-1}),$$

with the convention that the conditioning is empty for the first term. This factorization is particularly useful in computational contexts, such as deriving full joint distributions from partial conditional knowledge. The chain rule underpins key developments in statistical modeling and machine learning, including Bayesian networks and algorithms that rely on probabilistic graphical models to handle dependencies among variables efficiently. By enabling the breakdown of high-dimensional joint probabilities, it facilitates practical computations in areas like predictive modeling, where direct evaluation of joint distributions would otherwise be intractable.
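As an illustration of this factorization in code, the sketch below (a hypothetical example, not from the original text) assembles a joint probability from a list of conditional-probability functions, one per variable, each receiving the values of all preceding variables:

```python
from typing import Callable, Sequence

# Each factor is a function p_i(x_i, previous_values) returning
# P(X_i = x_i | X_1 = x_1, ..., X_{i-1} = x_{i-1}).
Factor = Callable[[int, Sequence[int]], float]

def chain_rule_joint(values: Sequence[int], factors: Sequence[Factor]) -> float:
    """Multiply the chain-rule factors: P(x_1) * P(x_2 | x_1) * ..."""
    p = 1.0
    for i, (x, factor) in enumerate(zip(values, factors)):
        p *= factor(x, values[:i])
    return p

# Example: three binary variables where each value copies the previous one 80% of the time.
def first(x, prev):      # marginal P(X_1 = x)
    return 0.5

def copy_prev(x, prev):  # P(X_i = x | history) depends only on the last value
    return 0.8 if x == prev[-1] else 0.2

print(chain_rule_joint([1, 1, 1], [first, copy_prev, copy_prev]))  # 0.5*0.8*0.8 = 0.32
```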

Chain rule for events

Two events

The chain rule for two events $A$ and $B$ in a probability space provides a fundamental way to express their joint probability:

$$P(A \cap B) = P(A) \, P(B \mid A) = P(B) \, P(A \mid B),$$

assuming $P(A) > 0$ and $P(B) > 0$. This equality follows directly from the definition of conditional probability, $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, by rearranging terms to isolate the joint probability. The modern axiomatic definition of conditional probability, underpinning this rule, was formalized by Andrey Kolmogorov in his foundational work on probability theory.

This formulation breaks down the probability of both events occurring simultaneously into the product of the marginal probability of one event (the unconditional probability) and the conditional probability of the second event given the first. It enables computation of joint probabilities by leveraging known marginals and conditionals, which is particularly useful when events are dependent and direct assessment of the joint probability is challenging. The symmetry in the expression highlights that the rule can condition on either event, offering flexibility in application. Concepts akin to conditional probability, including early forms of this rule, emerged in the seventeenth century through the correspondence between Blaise Pascal and Pierre de Fermat on the "problem of points" in gambling scenarios. Kolmogorov's 1933 axiomatization provided the rigorous mathematical foundation that solidified the chain rule within measure-theoretic probability. A simple example illustrates the rule: consider drawing two cards sequentially without replacement from a standard 52-card deck. The probability of drawing an ace first and then a king is

$$P(\text{ace first}) \cdot P(\text{king second} \mid \text{ace first}) = \frac{4}{52} \cdot \frac{4}{51} = \frac{4}{663}.$$

This factors the joint event into the initial marginal probability and the updated conditional probability after the first draw.
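For intuition, this card-drawing probability can also be approximated by simulation. The following is a quick sketch (not part of the original text) that draws two cards without replacement many times and counts the ace-then-king outcomes:

```python
import random

random.seed(1)

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(r, s) for r in ranks for s in "SHDC"]

trials = 100_000
hits = 0
for _ in range(trials):
    first, second = random.sample(deck, 2)  # ordered draw without replacement
    if first[0] == "A" and second[0] == "K":
        hits += 1

print(hits / trials)  # should be close to 4/663 ≈ 0.006033
```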

Finitely many events

The chain rule for finitely many events extends the two-event case by iteratively applying conditional probabilities to compute the joint probability of the intersection of $n$ events $A_1, A_2, \dots, A_n$. This allows the joint probability $P(A_1 \cap A_2 \cap \dots \cap A_n)$ to be expressed as a product of a marginal probability and successive conditional probabilities, where each conditioning set accumulates the previous events. The general formula is:

$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \, P(A_2 \mid A_1) \, P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \dots \cap A_{n-1}),$$

assuming all relevant probabilities are well-defined (e.g., conditioning events have positive probability). This arises from the recursive application of the definition of conditional probability, providing a multiplicative structure for intersections without relying on more complex methods like inclusion-exclusion, which applies to unions.

A key advantage of this formulation is its flexibility in the ordering of the events. By selecting an order that aligns with known conditional dependencies or independences, computations can be simplified; for instance, if later events are conditionally independent of some earlier ones given others, the corresponding conditional probabilities reduce to marginals or simpler forms. This ordering strategy is particularly useful in structured probabilistic models where dependencies are sparse.

Consider the probability of rain on three consecutive days, $R_1, R_2, R_3$, where daily rain events are independent with $P(R_i) = 0.3$ for each $i$. The joint probability is $P(R_1 \cap R_2 \cap R_3) = P(R_1) \, P(R_2 \mid R_1) \, P(R_3 \mid R_1 \cap R_2)$. Due to independence, $P(R_2 \mid R_1) = P(R_2) = 0.3$ and $P(R_3 \mid R_1 \cap R_2) = P(R_3) = 0.3$, yielding $0.3 \times 0.3 \times 0.3 = 0.027$. This illustrates how the chain rule leverages independence to reduce conditionals to marginals, easing calculation for sequences.

Proof

The chain rule for events, also known as the general product rule, states that in a probability space $(\Omega, \mathcal{F}, P)$, for any finite collection of events $A_1, A_2, \dots, A_n \in \mathcal{F}$ with $P\left( \bigcap_{j=1}^{i} A_j \right) > 0$ for all $i = 1, \dots, n$,

$$P\left( \bigcap_{i=1}^n A_i \right) = \prod_{i=1}^n P\left( A_i \,\middle|\, \bigcap_{j=1}^{i-1} A_j \right),$$

with the convention that the empty intersection for $i = 1$ makes the first factor the unconditional probability $P(A_1)$. The identity follows by induction on $n$: the case $n = 1$ is trivial, and for the inductive step the definition of conditional probability gives

$$P\left( \bigcap_{i=1}^{n} A_i \right) = P\left( A_n \,\middle|\, \bigcap_{i=1}^{n-1} A_i \right) P\left( \bigcap_{i=1}^{n-1} A_i \right),$$

after which applying the induction hypothesis to $P\left( \bigcap_{i=1}^{n-1} A_i \right)$ yields the full product.