Recent from talks
Nothing was collected or created yet.
Conditional probability
View on Wikipedia| Part of a series on statistics |
| Probability theory |
|---|
In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) is already known to have occurred.[1] This particular method relies on event A occurring with some sort of relationship with another event B. In this situation, the event A can be analyzed by a conditional probability with respect to B. If the event of interest is A and the event B is known or assumed to have occurred, "the conditional probability of A given B", or "the probability of A under the condition B", is usually written as P(A|B)[2] or occasionally PB(A). This can also be understood as the fraction of probability B that intersects with A, or the ratio of the probabilities of both events happening to the "given" one happening (how many times A occurs rather than not assuming B has occurred):
- .[3]
For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person is sick, then they are much more likely to be coughing. For example, the conditional probability that someone sick is coughing might be 75%, in which case we would have that P(Cough) = 5% and P(Cough|Sick) = 75 %. Although there is a relationship between A and B in this example, such a relationship or dependence between A and B is not necessary, nor do they have to occur simultaneously.
P(A|B) may or may not be equal to P(A), i.e., the unconditional probability or absolute probability of A. If P(A|B) = P(A), then events A and B are said to be independent: in such a case, knowledge about either event does not alter the likelihood of each other. P(A|B) (the conditional probability of A given B) typically differs from P(B|A). For example, if a person has dengue fever, the person might have a 90% chance of being tested as positive for the disease. In this case, what is being measured is that if event B (having dengue) has occurred, the probability of A (tested as positive) given that B occurred is 90%, simply writing P(A|B) = 90%. Alternatively, if a person is tested as positive for dengue fever, they may have only a 15% chance of actually having this rare disease due to high false positive rates. In this case, the probability of the event B (having dengue) given that the event A (testing positive) has occurred is 15% or P(B|A) = 15%. It should be apparent now that falsely equating the two probabilities can lead to various errors of reasoning, which is commonly seen through base rate fallacies.
While conditional probabilities can provide extremely useful information, limited information is often supplied or at hand. Therefore, it can be useful to reverse or convert a conditional probability using Bayes' theorem: .[4] Another option is to display conditional probabilities in a conditional probability table to illuminate the relationship between events.
Definition
[edit]


Conditioning on an event
[edit]Kolmogorov definition
[edit]Given two events A and B from the sigma-field of a probability space, with the unconditional probability of B being greater than zero (i.e., P(B) > 0), the conditional probability of A given B () is the probability of A occurring if B has or is assumed to have happened.[5] A is assumed to be the set of all possible outcomes of an experiment or random trial that has a restricted or reduced sample space. The conditional probability can be found by the quotient of the probability of the joint intersection of events A and B, that is, , the probability at which A and B occur together, and the probability of B:[2][6][7]
For a sample space consisting of equal likelihood outcomes, the probability of the event A is understood as the fraction of the number of outcomes in A to the number of all outcomes in the sample space. Then, this equation is understood as the fraction of the set to the set B. Note that the above equation is a definition, not just a theoretical result. We denote the quantity as and call it the "conditional probability of A given B."
As an axiom of probability
[edit]Some authors, such as de Finetti, prefer to introduce conditional probability as an axiom of probability:
This equation for a conditional probability, although mathematically equivalent, may be intuitively easier to understand. It can be interpreted as "the probability of B occurring multiplied by the probability of A occurring, provided that B has occurred, is equal to the probability of the A and B occurrences together, although not necessarily occurring at the same time". Additionally, this may be preferred philosophically; under major probability interpretations, such as the subjective theory, conditional probability is considered a primitive entity. Moreover, this "multiplication rule" can be practically useful in computing the probability of and introduces a symmetry with the summation axiom for Poincaré Formula:
- Thus the equations can be combined to find a new representation of the :
As the probability of a conditional event
[edit]Conditional probability can be defined as the probability of a conditional event . The Goodman–Nguyen–Van Fraassen conditional event can be defined as:
- where and represent states or elements of A or B. [8]
It can be shown that
which meets the Kolmogorov definition of conditional probability.[9]
Conditioning on an event of probability zero
[edit]If , then according to the definition, is undefined.
The case of greatest interest is that of a random variable Y, conditioned on a continuous random variable X resulting in a particular outcome x. The event has probability zero and, as such, cannot be conditioned on.
Instead of conditioning on X being exactly x, we could condition on it being closer than distance away from x. The event will generally have nonzero probability and hence, can be conditioned on. We can then take the limit
| 1 |
For example, if two continuous random variables X and Y have a joint density , then by L'Hôpital's rule and Leibniz integral rule, upon differentiation with respect to :
The resulting limit is the conditional probability distribution of Y given X and exists when the denominator, the probability density , is strictly positive.
It is tempting to define the undefined probability using limit (1), but this cannot be done in a consistent manner. In particular, it is possible to find random variables X and W and values x, w such that the events and are identical but the resulting limits are not:
The Borel–Kolmogorov paradox demonstrates this with a geometrical argument.
Conditioning on a discrete random variable
[edit]Let X be a discrete random variable and its possible outcomes denoted V. For example, if X represents the value of a rolled dice then V is the set . Let us assume for the sake of presentation that X is a discrete random variable, so that each value in V has a nonzero probability.
For a value x in V and an event A, the conditional probability is given by . Writing
for short, we see that it is a function of two variables, x and A.
For a fixed A, we can form the random variable . It represents an outcome of whenever a value x of X is observed.
The conditional probability of A given X can thus be treated as a random variable Y with outcomes in the interval . From the law of total probability, its expected value is equal to the unconditional probability of A.
Partial conditional probability
[edit]The partial conditional probability is about the probability of event given that each of the condition events has occurred to a degree (degree of belief, degree of experience) that might be different from 100%. Frequentistically, partial conditional probability makes sense, if the conditions are tested in experiment repetitions of appropriate length .[10] Such -bounded partial conditional probability can be defined as the conditionally expected average occurrence of event in testbeds of length that adhere to all of the probability specifications , i.e.:
Based on that, partial conditional probability can be defined as
where [10]
Jeffrey conditionalization[11][12] is a special case of partial conditional probability, in which the condition events must form a partition:
Example
[edit]Suppose that somebody secretly rolls two fair six-sided dice, and we wish to compute the probability that the face-up value of the first one is 2, given the information that their sum is no greater than 5.
Probability that D1 = 2
Table 1 shows the sample space of 36 combinations of rolled values of the two dice, each of which occurs with probability 1/36, with the numbers displayed in the red and dark gray cells being D1 + D2.
D1 = 2 in exactly 6 of the 36 outcomes; thus P(D1 = 2) = 6⁄36 = 1⁄6:
Table 1 + D2 1 2 3 4 5 6 D1 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12
Probability that D1 + D2 ≤ 5
Table 2 shows that D1 + D2 ≤ 5 for exactly 10 of the 36 outcomes, thus P(D1 + D2 ≤ 5) = 10⁄36:
Table 2 + D2 1 2 3 4 5 6 D1 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12
Probability that D1 = 2 given that D1 + D2 ≤ 5
Table 3 shows that for 3 of these 10 outcomes, D1 = 2.
Thus, the conditional probability P(D1 = 2 | D1+D2 ≤ 5) = 3⁄10 = 0.3:
Table 3 + D2 1 2 3 4 5 6 D1 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12
Here, in the earlier notation for the definition of conditional probability, the conditioning event B is that D1 + D2 ≤ 5, and the event A is D1 = 2. We have as seen in the table.
Use in inference
[edit]In statistical inference, the conditional probability is an update of the probability of an event based on new information.[13] The new information can be incorporated as follows:[1]
- Let A, the event of interest, be in the sample space, say (X,P).
- The occurrence of the event A knowing that event B has or will have occurred, means the occurrence of A as it is restricted to B, i.e. .
- Without the knowledge of the occurrence of B, the information about the occurrence of A would simply be P(A)
- The probability of A knowing that event B has or will have occurred, will be the probability of relative to P(B), the probability that B has occurred.
- This results in whenever P(B) > 0 and 0 otherwise.
This approach results in a probability measure that is consistent with the original probability measure and satisfies all the Kolmogorov axioms. This conditional probability measure also could have resulted by assuming that the relative magnitude of the probability of A with respect to X will be preserved with respect to B (cf. a Formal Derivation below).
The wording "evidence" or "information" is generally used in the Bayesian interpretation of probability. The conditioning event is interpreted as evidence for the conditioned event. That is, P(A) is the probability of A before accounting for evidence E, and P(A|E) is the probability of A after having accounted for evidence E or after having updated P(A). This is consistent with the frequentist interpretation, which is the first definition given above.
Example
[edit]When Morse code is transmitted, there is a certain probability that the "dot" or "dash" that was received is erroneous. This is often taken as interference in the transmission of a message. Therefore, it is important to consider when sending a "dot", for example, the probability that a "dot" was received. This is represented by: In Morse code, the ratio of dots to dashes is 3:4 at the point of sending, so the probabilities of a "dot" and "dash" are . If it is assumed that the probability that a dot is transmitted as a dash is 1/10, and that the probability that a dash is transmitted as a dot is likewise 1/10, then Bayes's rule can be used to calculate .
Now, can be calculated:
Statistical independence
[edit]Events A and B are defined to be statistically independent if the probability of the intersection of A and B is equal to the product of the probabilities of A and B:
If P(B) is not zero, then this is equivalent to the statement that
Similarly, if P(A) is not zero, then
is also equivalent. Although the derived forms may seem more intuitive, they are not the preferred definition as the conditional probabilities may be undefined, and the preferred definition is symmetrical in A and B. Independence does not refer to a disjoint event.[15]
It should also be noted that given the independent event pair [A,B] and an event C, the pair is defined to be conditionally independent if[16]
This theorem is useful in applications where multiple independent events are being observed.
Independent events vs. mutually exclusive events
The concepts of mutually independent events and mutually exclusive events are separate and distinct. The following table contrasts results for the two cases (provided that the probability of the conditioning event is not zero).
| If statistically independent | If mutually exclusive | |
|---|---|---|
| 0 | ||
| 0 | ||
| 0 |
In fact, mutually exclusive events cannot be statistically independent (unless both of them are impossible), since knowing that one occurs gives information about the other (in particular, that the latter will certainly not occur).
Common fallacies
[edit]- These fallacies should not be confused with Robert K. Shope's 1978 "conditional fallacy", which deals with counterfactual examples that beg the question.
Assuming conditional probability is of similar size to its inverse
[edit]
In general, it cannot be assumed that P(A|B) ≈ P(B|A). This can be an insidious error, even for those who are highly conversant with statistics.[17] The relationship between P(A|B) and P(B|A) is given by Bayes' theorem:
That is, P(A|B) ≈ P(B|A) only if P(B)/P(A) ≈ 1, or equivalently, P(A) ≈ P(B).
Assuming marginal and conditional probabilities are of similar size
[edit]In general, it cannot be assumed that P(A) ≈ P(A|B). These probabilities are linked through the law of total probability:
where the events form a countable partition of .
This fallacy may arise through selection bias.[18] For example, in the context of a medical claim, let SC be the event that a sequela (chronic disease) S occurs as a consequence of circumstance (acute condition) C. Let H be the event that an individual seeks medical help. Suppose that in most cases, C does not cause S (so that P(SC) is low). Suppose also that medical attention is only sought if S has occurred due to C. From experience of patients, a doctor may therefore erroneously conclude that P(SC) is high. The actual probability observed by the doctor is P(SC|H).
Over- or under-weighting priors
[edit]Not taking prior probability into account partially or completely is called base rate neglect. The reverse, insufficient adjustment from the prior probability is conservatism.
Formal derivation
[edit]Formally, P(A | B) is defined as the probability of A according to a new probability function on the sample space, such that outcomes not in B have probability 0 and that it is consistent with all original probability measures.[19][20]
Let Ω be a discrete sample space with elementary events {ω}, and let P be the probability measure with respect to the σ-algebra of Ω. Suppose we are told that the event B ⊆ Ω has occurred. A new probability distribution (denoted by the conditional notation) is to be assigned on {ω} to reflect this. All events that are not in B will have null probability in the new distribution. For events in B, two conditions must be met: the probability of B is one and the relative magnitudes of the probabilities must be preserved. The former is required by the axioms of probability, and the latter stems from the fact that the new probability measure has to be the analog of P in which the probability of B is one—and every event that is not in B, therefore, has a null probability. Hence, for some scale factor α, the new distribution must satisfy:
Substituting 1 and 2 into 3 to select α:
So the new probability distribution is
Now for a general event A,
See also
[edit]- Bayes' theorem
- Bayesian epistemology
- Borel–Kolmogorov paradox
- Chain rule (probability)
- Class membership probabilities
- Conditional independence
- Conditional probability distribution
- Conditioning (probability)
- Disintegration theorem
- Joint probability distribution
- Monty Hall problem
- Pairwise independent distribution
- Posterior probability
- Postselection
- Regular conditional probability
References
[edit]- ^ a b Gut, Allan (2013). Probability: A Graduate Course (Second ed.). New York, NY: Springer. ISBN 978-1-4614-4707-8.
- ^ a b "Conditional Probability". www.mathsisfun.com. Retrieved 2020-09-11.
- ^ Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). "A Modern Introduction to Probability and Statistics". Springer Texts in Statistics: 26. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1. ISSN 1431-875X.
- ^ Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). "A Modern Introduction to Probability and Statistics". Springer Texts in Statistics: 25–40. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1. ISSN 1431-875X.
- ^ Reichl, Linda Elizabeth (2016). "2.3 Probability". A Modern Course in Statistical Physics (4th revised and updated ed.). WILEY-VCH. ISBN 978-3-527-69049-7.
- ^ Kolmogorov, Andrey (1956), Foundations of the Theory of Probability, Chelsea
- ^ "Conditional Probability". www.stat.yale.edu. Retrieved 2020-09-11.
- ^ Flaminio, Tommaso; Godo, Lluis; Hosni, Hykel (2020-09-01). "Boolean algebras of conditionals, probability and logic". Artificial Intelligence. 286 103347. arXiv:2006.04673. doi:10.1016/j.artint.2020.103347. ISSN 0004-3702. S2CID 214584872.
- ^ Van Fraassen, Bas C. (1976), Harper, William L.; Hooker, Clifford Alan (eds.), "Probabilities of Conditionals", Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science: Volume I Foundations and Philosophy of Epistemic Applications of Probability Theory, The University of Western Ontario Series in Philosophy of Science, Dordrecht: Springer Netherlands, pp. 261–308, doi:10.1007/978-94-010-1853-1_10, ISBN 978-94-010-1853-1, retrieved 2021-12-04
- ^ a b c Draheim, Dirk (2017). "Generalized Jeffrey Conditionalization (A Frequentist Semantics of Partial Conditionalization)". Springer. Retrieved December 19, 2017.
- ^ Jeffrey, Richard C. (1983), The Logic of Decision (2nd ed.), University of Chicago Press, ISBN 9780226395821
- ^ "Bayesian Epistemology". Stanford Encyclopedia of Philosophy. 2017. Retrieved December 29, 2017.
- ^ Casella, George; Berger, Roger L. (2002). Statistical Inference. Duxbury Press. ISBN 0-534-24312-6.
- ^ "Conditional Probability and Independence" (PDF). Retrieved 2021-12-22.
- ^ Tijms, Henk (2012). Understanding Probability (3rd ed.). Cambridge: Cambridge University Press. doi:10.1017/cbo9781139206990. ISBN 978-1-107-65856-1.
- ^ Pfeiffer, Paul E. (1978). Conditional Independence in Applied Probability. Boston, MA: Birkhäuser Boston. ISBN 978-1-4612-6335-7. OCLC 858880328.
- ^ Paulos, J. A. (1988). Innumeracy: Mathematical Illiteracy and its Consequences. Hill and Wang. p. 63 et seq. ISBN 0-8090-7447-8.
- ^ Bruss, F. Thomas (2007). "Der Wyatt-Earp-Effekt oder die betörende Macht kleiner Wahrscheinlichkeiten". Spektrum der Wissenschaft (in German). 2: 110–113.
- ^ George Casella and Roger L. Berger (1990), Statistical Inference, Duxbury Press, ISBN 0-534-11958-1 (p. 18 et seq.)
- ^ Grinstead and Snell's Introduction to Probability, p. 134
External links
[edit]Conditional probability
View on GrokipediaFoundations
Definition
Conditional probability is a fundamental measure in probability theory that quantifies the likelihood of an event occurring given that another event has already occurred. In the frequentist interpretation, it represents the limiting relative frequency with which event occurs among the occurrences of event , as the number of trials approaches infinity.[8] This intuitive notion aligns with empirical observations, where the conditional probability is the proportion of times happens in the subsequence of trials where is realized.[8] Formally, in the axiomatic framework established by Andrey Kolmogorov, the conditional probability of event given event (with ) is defined as where is the probability of the intersection of and .[9] This definition extends the basic axioms of probability—non-negativity, normalization, and countable additivity—by introducing a normalized ratio that preserves probabilistic structure while conditioning on the restricting event .[9] As a core primitive concept, it underpins derivations of more advanced theorems and enables the modeling of dependencies in random phenomena.[10] Unlike joint probability , which measures the simultaneous occurrence of both events without restriction, conditional probability adjusts for the information provided by , often yielding a different value that reflects updated likelihoods.[9] This distinction is essential for distinguishing unconditional joint events from scenarios constrained by prior outcomes.[9]Notation
The standard notation for the conditional probability of an event given an event is , where the vertical bar signifies "given" or "conditioned on" .[11] This convention interprets as the probability measure restricted to the occurrence of , normalized appropriately.[12] For conditioning on multiple events, the notation extends to , indicating the probability of given the joint occurrence of and .[13] In multivariate settings, the vertical bar clearly delineates the conditioning set, with commas separating the conditioned events to prevent ambiguity in grouping.[1] Alternative notations appear in some probability literature, such as to emphasize the conditional probability measure induced by .[14] Another variant, , has been used in some texts to denote the conditional probability, though it is less common today.[1]Conditioning Types
On Events
In the axiomatic framework established by Andrey Kolmogorov in 1933, conditional probability is defined within the context of a probability space consisting of a sample space , an event algebra (specifically, a -algebra of measurable subsets of ), and a probability measure satisfying the standard axioms of non-negativity, normalization, and countable additivity. For events with , the conditional probability is given by which quantifies the probability of given that has occurred, building directly on the measure-theoretic structure of events. This definition implies an axiomatic treatment of conditional probability itself: for a fixed conditioning event with , the map for forms a new probability measure on , inheriting the Kolmogorov axioms. Specifically, for all (non-negativity), (normalization), and for a countable collection of pairwise disjoint events , (countable additivity). This perspective treats conditioning on as restricting the probability space to the subspace , renormalizing probabilities accordingly while preserving the algebraic structure of events. Bruno de Finetti offered a foundational reinterpretation in his subjective theory of probability, emphasizing operational and coherence-based axioms over measure theory. He regarded not as a derived quotient but as the direct probability ascribed to the conditional event "A given B," interpreted as the belief in A occurring under the explicit condition that B has occurred, with the joint relation emerging as a consequence of coherence to avoid Dutch book arguments. This approach prioritizes conditional probabilities as primitives, suitable for expressing degrees of belief in event-based scenarios without assuming a full unconditional measure. Alfréd Rényi proposed a new axiomatic foundation in 1955, taking conditional probabilities as primitives in conditional probability spaces, which allows for systems with unbounded measures where not all events in the algebra have assigned (normalized) unconditional probabilities. In Rényi's system, a conditional probability function is a primitive that assigns values to pairs of events (with ), satisfying axioms of non-negativity, normalization , and additivity for compatible conditionals, without requiring a complete unconditional probability measure on . This enables axiomatic treatment in situations of partial knowledge about the event space.[15]On Random Variables
In probability theory, the conditional probability associated with discrete random variables and is defined pointwise for values and in their respective supports, where , as This expression yields the conditional probability mass function (pmf) of given , which fully characterizes the updated distribution of after observing the specific value of .[16] The interpretation of this conditional pmf is that it represents the probabilities of the possible outcomes of , revised based on the information provided by the realization ; for instance, if and model the outcomes of successive coin flips, conditioning on adjusts the likelihoods for to reflect the observed flip. This framework extends the basic event-based conditioning—where events are indicator functions of subsets—by allowing to take multiple values, thus enabling a distribution over finer-grained conditional scenarios rather than binary or coarse event partitions.[16] For continuous random variables, the analogous concept shifts to probability densities, assuming the joint distribution has a density function with respect to Lebesgue measure. The conditional probability density function (pdf) of given , where the marginal density , is given by This conditional pdf describes the updated density of upon observing , with probabilities for intervals computed via integration over the conditional density.[16] Unlike conditioning on events, which restricts to probabilities over fixed subsets and often relies on indicator random variables, conditioning on continuous random variables leverages the full density structure to model dependencies across a continuum of outcomes, providing a more precise tool for analyzing joint behaviors in stochastic processes.[16]On Zero-Probability Events
The standard definition of conditional probability, , is undefined when . This limitation poses a significant challenge in continuous probability spaces, where events like a continuous random variable attaining a precise value have measure zero, despite the intuitive need to condition on such events for modeling purposes.[17] To address this, conditional probabilities are often resolved through the use of conditional densities in jointly continuous settings. The conditional density is defined for values where the marginal density , effectively extending the conditioning concept to points of positive density even though . Heuristically, the Dirac delta function can represent these point conditions, allowing formal expressions like the joint density incorporating to model conditioning on exact values in continuous distributions. A foundational rigorous resolution stems from Joseph L. Doob's martingale-based approach in 1953, where conditional expectations are defined as -projections onto sub--algebras, enabling the construction of conditional distributions via the Doob-Dynkin lemma for measurable functions. This framework underpins regular conditional distributions, which are Markov kernels satisfying almost surely for -measurable sets , with the property that for relevant events.[17] Such distributions exist uniquely (up to almost sure equivalence) in standard Borel probability spaces, including Polish spaces, ensuring well-defined conditioning even on null sets.[17] In applications to continuous models, regular conditional distributions facilitate conditioning on exact values; for jointly normal random variables and , the distribution of given is normal with mean and variance , providing a concrete realization despite .[18]Illustrations
Basic Examples
A classic example of conditional probability arises when rolling two fair six-sided dice. Let B be the event that the sum of the numbers shown is 7, and let A be the event that at least one die shows a 1. The conditional probability P(A | B) is the probability that at least one die is 1 given that the sum is 7.[19] The possible outcomes for sum 7 are the equally likely pairs: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1), giving six outcomes in total. Among these, the outcomes with at least one 1 are (1,6) and (6,1). Thus, there are 2 favorable outcomes out of 6 possible, so Another introductory example involves drawing a single card from a standard 52-card deck. Let C be the event of drawing a face card (jack, queen, or king; there are 12 such cards), and let D be the event of drawing an ace (there are 4 aces). The conditional probability P(D | C) is the probability of drawing an ace given that a face card was drawn. Since aces are not face cards, the events D and C are mutually exclusive, so there are 0 aces among the 12 face cards. Thus, This demonstrates that conditional probabilities can be zero when the conditioned event precludes the target event.[20] The Monty Hall problem offers a well-known illustration of conditional probability in a decision-making context. A contestant selects one of three doors, one hiding a car (prize) and the other two hiding goats. The host, aware of the contents, opens a different door revealing a goat. The contestant may then stick with their original choice or switch to the remaining unopened door. The probability of winning the car by switching is 2/3.[21] Initially, the probability that the car is behind the chosen door is 1/3, and the probability it is behind one of the other two doors is 2/3. By revealing a goat behind one unchosen door, the host transfers the entire 2/3 probability to the remaining unopened door, making switching advantageous. Tree diagrams provide a visual method to distinguish joint probabilities from conditional ones by representing sequential events and their probabilities as branches. For the two-dice sum example above, a tree diagram begins with the 6 possible outcomes for the first die (each with probability 1/6), branching to the second die's outcomes (each 1/6), yielding 36 joint outcomes. Conditioning on sum 7 restricts the relevant paths to the 6 pairs that sum to 7, each now with equal conditional probability 1/6, allowing computation of further conditional events like at least one 1 (2 paths out of 6). This branching highlights how the full joint space narrows under conditioning.[22]Inference Applications
In statistical inference, conditional probability is fundamental to hypothesis testing via the likelihood function, which quantifies the probability of observing the data given a specific hypothesis, denoted as .[23] This measure evaluates how compatible the data is with the hypothesis, allowing researchers to compare the relative support for alternative explanations without assigning probabilities to the hypotheses themselves.[23] For example, in assessing whether a coin is fair, the likelihood compares the probability of observed toss outcomes under the null hypothesis of equal probabilities versus alternatives like a biased coin.[23] A prominent application arises in medical diagnostics, where conditional probabilities distinguish test characteristics from diagnostic inferences. The probability , known as sensitivity, represents the likelihood of a positive result given the disease is present and is a fixed property of the test.[24] In contrast, , the positive predictive value, is the probability of actual disease given a positive result, which depends on disease prevalence and test specificity.[24] For a rare disease with 0.1% prevalence, 99% sensitivity, and 99% specificity, a positive test yields only about 9% probability of disease, as false positives dominate due to low prevalence, underscoring how conditional probabilities inform reliable inference beyond basic test performance.[25] In epidemiology, conditional probabilities are essential for modeling infectious disease dynamics and predicting spread. The basic reproduction number , defined as the expected number of secondary cases generated by one infected individual in a fully susceptible population, relies on conditional transmission probabilities, such as the probability of infection given effective contact. When , this leads to exponential growth in case numbers through successive chains of transmission. For instance, a conditional case fatality rate of 15% given infection informs overall mortality risks, amplified by the epidemic's exponential expansion.[26] In contrast, economic forecasting for events like financial crises often employs marginal probabilities to estimate the overall likelihood of the event occurring, without conditioning on intermediate transmission-like steps. Models may predict, for example, a 15% chance of a full crisis based on aggregate indicators such as credit growth, representing the integrated probability that the event happens in its entirety, with the remaining probability indicating no crisis or only partial effects. This highlights a key distinction: chained conditional probabilities drive the compounding dynamics in epidemiological models, whereas marginal probabilities provide a holistic assessment in economic predictions.[27] Conditional probability also facilitates updating beliefs through sequential conditioning, where each new piece of evidence refines prior assessments by incorporating additional data. This process treats the posterior distribution from one stage as the prior for the next, enabling efficient evidence accumulation without recomputing full likelihoods from scratch.[28] In applications like analyzing large datasets from psychological experiments, such as reaction times in decision-making tasks, sequential updates partition data into batches for real-time inference, separating effects like speed and caution while maintaining conceptual coherence.[28] In frequentist inference, conditional probability underpins procedures by computing probabilities conditional on fixed parameter values, with the observed data serving as the basis for estimating unknowns and controlling error rates.[29] This conditioning treats parameters as known under the hypothesis, generating p-values and confidence intervals that reflect long-run frequencies, such as the probability of data as extreme as observed under the null.[29] Thus, inference conditions on the data to quantify uncertainty while adhering to the paradigm's emphasis on repeatable sampling properties.[29]Connections
Independence
In probability theory, two events and in a probability space are defined to be statistically independent if the conditional probability of given equals the unconditional probability of , that is, , provided .[30] This condition holds symmetrically for . Equivalently, independence is characterized by the joint probability satisfying .[31] This equivalence follows directly from the definition of conditional probability, , which implies the product form when the conditional equals the marginal.[30] For random variables, independence extends the event-based definition: two discrete random variables and are independent if the conditional probability mass function satisfies for all and such that .[32] This ensures that the distribution of remains unchanged regardless of the observed value of . The definition generalizes to continuous random variables via probability density functions, where the conditional density for in the support of .[33] When considering multiple events or random variables, a distinction arises between pairwise independence and mutual independence. Pairwise independence requires that every pair satisfies the independence condition individually, such as for all .[34] Mutual independence, however, demands that the independence holds for every finite subset, including the full collection; for three events , , and , this includes pairwise conditions plus .[34] Mutual independence implies pairwise independence, but the converse does not hold, as pairwise conditions alone may fail to capture higher-order dependencies.[35] The same distinctions apply to collections of random variables.[35] A key implication of independence is the simplification of joint distributions: for mutually independent random variables , the joint probability mass or density function factors as the product of the marginals, (or for continuous cases).[36] This factorization greatly reduces computational complexity in modeling joint behaviors, as expectations, variances, and other moments can often be computed separately and combined without cross-terms.[37] For pairwise independent variables, the joint does not necessarily factor fully, limiting such simplifications to pairs.[36]Bayes' Theorem
Bayes' theorem is a cornerstone of conditional probability, enabling the inversion of conditional probabilities to compute the probability of one event given another by relating it to the reverse conditional and marginal probabilities. This theorem facilitates updating beliefs or probabilities based on new evidence, making it essential in fields requiring inference under uncertainty.[38] The theorem is stated aswhere the denominator is the marginal probability of , often computed via the law of total probability as over a partition of mutually exclusive and exhaustive events .[38] Named after the English mathematician Thomas Bayes, the theorem appeared in his posthumously published essay "An Essay Towards Solving a Problem in the Doctrine of Chances" in 1763.[39] French mathematician Pierre-Simon Laplace independently rediscovered and formalized it in a more general version in his 1812 work Théorie Analytique des Probabilités, expanding its applicability to continuous cases and statistical inference.[38] In Bayesian statistics, Bayes' theorem underpins the updating process, where represents the prior probability of the hypothesis before observing evidence , is the likelihood of the evidence given the hypothesis, and is the posterior probability reflecting the updated belief after incorporating the evidence.[38] This framework allows for systematic incorporation of prior knowledge with observed data to refine probabilistic assessments.[38] For continuous random variables, the theorem adapts to probability density functions, expressed proportionally as
where is the prior density of the parameter , the likelihood density of the data given , and the posterior density; the normalizing constant is the marginal density .[40] This form is fundamental to Bayesian inference with continuous distributions.[40]