Chain rule (probability)
In probability theory, the chain rule[1] (also called the general product rule[2][3]) describes how to calculate the probability of the intersection of events, which need not be independent, or, respectively, the joint distribution of random variables, using conditional probabilities. This rule allows one to express a joint probability in terms of only conditional probabilities.[4] The rule is notably used in the context of discrete stochastic processes and in applications such as the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.
Chain rule for events
Two events
For two events $A$ and $B$, the chain rule states that
- $\mathbb P(A \cap B) = \mathbb P(B \mid A)\,\mathbb P(A)$,
where $\mathbb P(B \mid A)$ denotes the conditional probability of $B$ given $A$.
Example
Urn A contains 1 black ball and 2 white balls, and urn B contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then select a ball from that urn. Let event $A$ be choosing the first urn, i.e. $\mathbb P(A) = \mathbb P(\overline A) = 1/2$, where $\overline A$ is the complementary event of $A$. Let event $B$ be drawing a white ball. The probability of drawing a white ball, given that we have chosen the first urn, is $\mathbb P(B \mid A) = 2/3$. The intersection $A \cap B$ then describes choosing the first urn and a white ball from it. The probability can be calculated by the chain rule as follows:
- $\mathbb P(A \cap B) = \mathbb P(B \mid A)\,\mathbb P(A) = \tfrac{2}{3} \cdot \tfrac{1}{2} = \tfrac{1}{3}.$
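As a quick sanity check (not part of the original article), the following Python sketch estimates $\mathbb P(A \cap B)$ by simulation; the urn labels and sample size are illustrative choices:

```python
import random

# Urn contents: urn A has 1 black and 2 white balls; urn B has 1 black and 3 white.
URNS = {"A": ["black", "white", "white"],
        "B": ["black", "white", "white", "white"]}

def trial():
    """Pick an urn uniformly at random, then draw one ball from it."""
    urn = random.choice(["A", "B"])
    ball = random.choice(URNS[urn])
    return urn, ball

n = 100_000
hits = sum(1 for _ in range(n) if trial() == ("A", "white"))
print(hits / n)  # should be close to P(B|A) * P(A) = 2/3 * 1/2 = 1/3
```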
Finitely many events
For events $A_1, \ldots, A_n$ whose intersection does not have probability zero, the chain rule states
- $\mathbb P(A_1 \cap \cdots \cap A_n) = \prod_{k=1}^{n} \mathbb P(A_k \mid A_1 \cap \cdots \cap A_{k-1}),$
where the factor for $k = 1$ is understood as $\mathbb P(A_1)$.
Example 1
For $n = 4$, i.e. four events, the chain rule reads
- $\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\,\mathbb P(A_3 \mid A_2 \cap A_1)\,\mathbb P(A_2 \mid A_1)\,\mathbb P(A_1).$
Example 2
We randomly draw 4 cards, one at a time and without replacement, from a deck of 52 cards. What is the probability that we have picked 4 aces?
First, we set $A_k := \{\text{an ace is drawn in the } k\text{th draw}\}$. Obviously, we get the following probabilities:
- $\mathbb P(A_1) = \tfrac{4}{52}, \quad \mathbb P(A_2 \mid A_1) = \tfrac{3}{51}, \quad \mathbb P(A_3 \mid A_1 \cap A_2) = \tfrac{2}{50}, \quad \mathbb P(A_4 \mid A_1 \cap A_2 \cap A_3) = \tfrac{1}{49}.$
Applying the chain rule,
- $\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \tfrac{4}{52} \cdot \tfrac{3}{51} \cdot \tfrac{2}{50} \cdot \tfrac{1}{49} = \tfrac{24}{6\,497\,400} = \tfrac{1}{270\,725}.$
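The same telescoping product is easy to evaluate exactly in code; a minimal Python sketch (illustrative, using the standard-library Fraction type):

```python
from fractions import Fraction

# Chain rule: multiply the factors P(A_k | A_1 ∩ ... ∩ A_{k-1}) for k = 1..4.
# Before each draw (0-indexed below), 4 - k aces remain among 52 - k cards.
p = Fraction(1)
for k in range(4):
    p *= Fraction(4 - k, 52 - k)

print(p)         # 1/270725
print(float(p))  # ≈ 3.69e-06
```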
Statement of the theorem and proof
Let $(\Omega, \mathcal A, \mathbb P)$ be a probability space. Recall that the conditional probability of an $A \in \mathcal A$ given $B \in \mathcal A$ is defined as
- $\mathbb P(A \mid B) := \begin{cases} \dfrac{\mathbb P(A \cap B)}{\mathbb P(B)}, & \mathbb P(B) > 0, \\ 0, & \mathbb P(B) = 0. \end{cases}$
Then we have the following theorem.
Chain rule: Let $(\Omega, \mathcal A, \mathbb P)$ be a probability space. Let $A_1, \ldots, A_n \in \mathcal A$. Then
- $\mathbb P(A_1 \cap \cdots \cap A_n) = \prod_{k=1}^{n} \mathbb P(A_k \mid A_1 \cap \cdots \cap A_{k-1}).$
The formula follows immediately by recursion:
- $\mathbb P(A_1 \cap \cdots \cap A_n) = \mathbb P(A_n \mid A_1 \cap \cdots \cap A_{n-1})\,\mathbb P(A_1 \cap \cdots \cap A_{n-1})$
- $= \mathbb P(A_n \mid A_1 \cap \cdots \cap A_{n-1})\,\mathbb P(A_{n-1} \mid A_1 \cap \cdots \cap A_{n-2})\,\mathbb P(A_1 \cap \cdots \cap A_{n-2})$
- $= \cdots = \prod_{k=1}^{n} \mathbb P(A_k \mid A_1 \cap \cdots \cap A_{k-1}),$
where we used the definition of the conditional probability in the first step.
Chain rule for discrete random variables
Two random variables
For two discrete random variables $X, Y$, we use the events $A := \{X = x\}$ and $B := \{Y = y\}$ in the definition above, and find the joint distribution as
- $\mathbb P(X = x, Y = y) = \mathbb P(X = x \mid Y = y)\,\mathbb P(Y = y),$
or
- $\mathbb P_{(X,Y)}(x, y) = \mathbb P_{X \mid Y}(x \mid y)\,\mathbb P_Y(y),$
where $\mathbb P_X(x) := \mathbb P(X = x)$ is the probability distribution of $X$ and $\mathbb P_{X \mid Y}(x \mid y)$ the conditional probability distribution of $X$ given $Y$.
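A minimal sketch, assuming made-up numbers for the marginal and conditional pmfs, of how the factorization $\mathbb P_{(X,Y)}(x, y) = \mathbb P_{X \mid Y}(x \mid y)\,\mathbb P_Y(y)$ assembles a joint pmf:

```python
# Hypothetical pmfs for two binary random variables X and Y.
p_Y = {0: 0.4, 1: 0.6}                       # marginal P(Y = y)
p_X_given_Y = {0: {0: 0.9, 1: 0.1},          # P(X = x | Y = 0)
               1: {0: 0.3, 1: 0.7}}          # P(X = x | Y = 1)

# Chain rule: P(X = x, Y = y) = P(X = x | Y = y) * P(Y = y)
joint = {(x, y): p_X_given_Y[y][x] * p_Y[y]
         for y in p_Y for x in (0, 1)}

print(joint)                # the full joint pmf
print(sum(joint.values()))  # sums to 1.0
```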
Finitely many random variables
Let $X_1, \ldots, X_n$ be random variables and $x_1, \ldots, x_n \in \mathbb R$. By the definition of the conditional probability,
- $\mathbb P(X_n = x_n, \ldots, X_1 = x_1) = \mathbb P(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots, X_1 = x_1)\,\mathbb P(X_{n-1} = x_{n-1}, \ldots, X_1 = x_1),$
and using the chain rule, where we set $A_k := \{X_k = x_k\}$, we can find the joint distribution as
- $\mathbb P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{k=1}^{n} \mathbb P(X_k = x_k \mid X_{k-1} = x_{k-1}, \ldots, X_1 = x_1).$
Example
For $n = 3$, i.e. considering three random variables, the chain rule reads
- $\mathbb P_{(X_1, X_2, X_3)}(x_1, x_2, x_3) = \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\,\mathbb P(X_2 = x_2 \mid X_1 = x_1)\,\mathbb P(X_1 = x_1).$
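The three-factor version can be spelled out the same way; in the sketch below all pmfs are hypothetical placeholders, chosen only so that each conditional sums to 1:

```python
# Hypothetical pmfs for three binary random variables X1, X2, X3.
p1 = {0: 0.5, 1: 0.5}                              # P(X1 = x1)
p2_given_1 = {0: {0: 0.8, 1: 0.2},                 # P(X2 = x2 | X1 = 0)
              1: {0: 0.4, 1: 0.6}}                 # P(X2 = x2 | X1 = 1)
p3_given_12 = {(x1, x2): {0: 0.5, 1: 0.5}          # P(X3 = x3 | X1, X2);
               for x1 in (0, 1) for x2 in (0, 1)}  # uniform for simplicity

def joint(x1, x2, x3):
    """Chain rule: P(x1, x2, x3) = P(x3 | x2, x1) * P(x2 | x1) * P(x1)."""
    return p3_given_12[(x1, x2)][x3] * p2_given_1[x1][x2] * p1[x1]

total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 0, 1), total)  # one joint probability; total should be 1.0
```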
Bibliography
- Schilling, René L. (2021). Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum (1st ed.). Technische Universität Dresden, Germany. ISBN 979-8-5991-0488-9.
- Feller, William (1968). An Introduction to Probability Theory and Its Applications, vol. I (3rd ed.). New York / London / Sydney: Wiley. ISBN 978-0-471-25708-0.
- Russell, Stuart J.; Norvig, Peter (2003). Artificial Intelligence: A Modern Approach (2nd ed.). Upper Saddle River, New Jersey: Prentice Hall. p. 496. ISBN 0-13-790395-2.
References
- ^ Schilling, René L. (2021). Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum. Technische Universität Dresden, Germany. p. 136ff. ISBN 979-8-5991-0488-9.
- ^ Schum, David A. (1994). The Evidential Foundations of Probabilistic Reasoning. Northwestern University Press. p. 49. ISBN 978-0-8101-1821-8.
- ^ Klugh, Henry E. (2013). Statistics: The Essentials for Research (3rd ed.). Psychology Press. p. 149. ISBN 978-1-134-92862-0.
- ^ Virtue, Pat. "10-606: Mathematical Foundations for Machine Learning" (PDF).