Geometric distribution

*Probability mass function*

*Cumulative distribution function*

| Geometric | Number of trials $X$ | Number of failures $Y = X - 1$ |
|---|---|---|
| Parameters | $0 < p \leq 1$ success probability (real) | $0 < p \leq 1$ success probability (real) |
| Support | $k$ trials where $k \in \{1, 2, 3, \dots\}$ | $k$ failures where $k \in \{0, 1, 2, \dots\}$ |
| PMF | $(1-p)^{k-1} p$ | $(1-p)^{k} p$ |
| CDF | $1-(1-p)^{\lfloor x \rfloor}$ for $x \geq 1$, $0$ for $x < 1$ | $1-(1-p)^{\lfloor x \rfloor + 1}$ for $x \geq 0$, $0$ for $x < 0$ |
| Mean | $\dfrac{1}{p}$ | $\dfrac{1-p}{p}$ |
| Median | $\left\lceil \dfrac{-1}{\log_2(1-p)} \right\rceil$ (not unique if $-1/\log_2(1-p)$ is an integer) | $\left\lceil \dfrac{-1}{\log_2(1-p)} \right\rceil - 1$ (not unique if $-1/\log_2(1-p)$ is an integer) |
| Mode | $1$ | $0$ |
| Variance | $\dfrac{1-p}{p^2}$ | $\dfrac{1-p}{p^2}$ |
| Skewness | $\dfrac{2-p}{\sqrt{1-p}}$ | $\dfrac{2-p}{\sqrt{1-p}}$ |
| Excess kurtosis | $6 + \dfrac{p^2}{1-p}$ | $6 + \dfrac{p^2}{1-p}$ |
| Entropy | $\dfrac{-(1-p)\log(1-p) - p\log p}{p}$ | $\dfrac{-(1-p)\log(1-p) - p\log p}{p}$ |
| MGF | $\dfrac{pe^{t}}{1-(1-p)e^{t}}$ for $t < -\ln(1-p)$ | $\dfrac{p}{1-(1-p)e^{t}}$ for $t < -\ln(1-p)$ |
| CF | $\dfrac{pe^{it}}{1-(1-p)e^{it}}$ | $\dfrac{p}{1-(1-p)e^{it}}$ |
| PGF | $\dfrac{pz}{1-(1-p)z}$ | $\dfrac{p}{1-(1-p)z}$ |
| Fisher information | $\dfrac{1}{p^{2}(1-p)}$ | $\dfrac{1}{p^{2}(1-p)}$ |
In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:
- The probability distribution of the number $X$ of Bernoulli trials needed to get one success, supported on $\{1, 2, 3, \ldots\}$;
- The probability distribution of the number $Y = X - 1$ of failures before the first success, supported on $\{0, 1, 2, \ldots\}$.

These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former one (distribution of $X$); however, to avoid ambiguity, it is considered wise to indicate which is intended, by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$. If the probability of success on each trial is $p$, then the probability that the $k$-th trial is the first success is

$$\Pr(X = k) = (1-p)^{k-1} p$$

for $k = 1, 2, 3, 4, \ldots$

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:

$$\Pr(Y = k) = \Pr(X = k + 1) = (1-p)^{k} p$$

for $k = 0, 1, 2, 3, \ldots$

The geometric distribution gets its name because its probabilities follow a geometric sequence. It is sometimes called the Furry distribution after Wendell H. Furry.[1]: 210
Definition
The geometric distribution is the discrete probability distribution that describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. Its probability mass function depends on its parameterization and support. When supported on $\{1, 2, 3, \ldots\}$, the probability mass function is
$$\Pr(X = k) = (1-p)^{k-1} p,$$
where $k = 1, 2, 3, \ldots$ is the number of trials and $p$ is the probability of success in each trial.[2]: 260–261

The support may also be $\{0, 1, 2, \ldots\}$, defining $Y = X - 1$. This alters the probability mass function into
$$\Pr(Y = k) = (1-p)^{k} p,$$
where $k = 0, 1, 2, \ldots$ is the number of failures before the first success.[3]: 66
An alternative parameterization of the distribution gives the probability mass function
$$\Pr(Y = k) = \frac{\mu^{k}}{(1+\mu)^{k+1}},$$
where $\mu = \frac{1-p}{p}$ is the mean and $p = \frac{1}{1+\mu}$.[1]: 208–209
An example of a geometric distribution arises from rolling a six-sided die until a "1" appears. Each roll is independent with a $\tfrac{1}{6}$ chance of success. The number of rolls needed follows a geometric distribution with $p = \tfrac{1}{6}$.
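For illustration, the die-rolling example can be simulated directly; the following short Python sketch (assuming NumPy is available; the seed and sample size are arbitrary) compares empirical frequencies with the probability mass function $(1-p)^{k-1}p$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 1 / 6  # probability of rolling a "1" on a fair six-sided die

def rolls_until_first_one(rng, p):
    """Count trials up to and including the first success (support {1, 2, 3, ...})."""
    k = 1
    while rng.random() >= p:  # each comparison is one Bernoulli trial
        k += 1
    return k

samples = np.array([rolls_until_first_one(rng, p) for _ in range(100_000)])

# Compare empirical frequencies with the PMF  Pr(X = k) = (1 - p)**(k - 1) * p
for k in range(1, 6):
    print(f"k={k}: empirical {np.mean(samples == k):.4f}, exact {(1 - p) ** (k - 1) * p:.4f}")
```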
Properties
Memorylessness
The geometric distribution is the only memoryless discrete probability distribution.[4] It is the discrete version of the same property found in the exponential distribution.[1]: 228 The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success.
Because there are two definitions of the geometric distribution, there are also two definitions of memorylessness for discrete random variables.[5] Expressed in terms of conditional probability, the two definitions are
$$\Pr(X > m + n \mid X > n) = \Pr(X > m)$$
and
$$\Pr(Y > m + n \mid Y \geq n) = \Pr(Y > m),$$
where $m$ and $n$ are natural numbers, $X$ is a geometrically distributed random variable defined over $\{1, 2, 3, \ldots\}$, and $Y$ is a geometrically distributed random variable defined over $\{0, 1, 2, \ldots\}$. Note that these definitions are not equivalent for discrete random variables; $Y$ does not satisfy the first equation and $X$ does not satisfy the second.
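As a quick numerical check of the first identity (a sketch assuming SciPy, whose `geom` distribution uses the support $\{1, 2, 3, \ldots\}$; the values of $p$, $m$, and $n$ are arbitrary):

```python
from scipy.stats import geom

p, m, n = 0.3, 4, 2

lhs = geom.sf(m + n, p) / geom.sf(n, p)  # Pr(X > m + n | X > n)
rhs = geom.sf(m, p)                      # Pr(X > m)
print(lhs, rhs)                          # both equal (1 - p)**m = 0.2401
```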
Moments and cumulants
The expected value and variance of a geometrically distributed random variable $X$ defined over $\{1, 2, 3, \ldots\}$ are[2]: 261
$$\operatorname{E}[X] = \frac{1}{p}, \qquad \operatorname{Var}[X] = \frac{1-p}{p^{2}}.$$
With a geometrically distributed random variable $Y$ defined over $\{0, 1, 2, \ldots\}$, the expected value changes into
$$\operatorname{E}[Y] = \frac{1-p}{p},$$
while the variance stays the same.[6]: 114–115

For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is $\frac{1}{1/6} = 6$ and the average number of failures is $\frac{1 - 1/6}{1/6} = 5$.
The moment generating function of the geometric distribution when defined over $\{1, 2, 3, \ldots\}$ and $\{0, 1, 2, \ldots\}$ respectively is[7][6]: 114
$$M_X(t) = \frac{p e^{t}}{1-(1-p)e^{t}}, \qquad M_Y(t) = \frac{p}{1-(1-p)e^{t}}, \qquad t < -\ln(1-p).$$
The moments for the number of failures before the first success are given by
$$\operatorname{E}[Y^{n}] = \sum_{k=0}^{\infty} k^{n}(1-p)^{k} p = p\operatorname{Li}_{-n}(1-p) \quad (n \neq 0),$$
where $\operatorname{Li}_{-n}(\cdot)$ is the polylogarithm function.[8]
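The polylogarithm identity can be checked numerically; a minimal sketch, assuming the mpmath library (any special-function library would serve), verifies the second raw moment against the known mean and variance:

```python
from mpmath import polylog

p = 0.3
q = 1 - p

second_moment = p * polylog(-2, q)   # E[Y^2] = p * Li_{-2}(1 - p)
check = q / p**2 + (q / p)**2        # Var[Y] + E[Y]^2
print(second_moment, check)          # both ≈ 13.222
```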
The cumulant generating function of the geometric distribution defined over $\{0, 1, 2, \ldots\}$ is[1]: 216
$$K(t) = \ln p - \ln\bigl(1 - (1-p)e^{t}\bigr).$$
The cumulants $\kappa_{r}$ satisfy the recursion
$$\kappa_{r+1} = q\,\frac{d\kappa_{r}}{dq},$$
where $q = 1 - p$, when defined over $\{0, 1, 2, \ldots\}$.[1]: 216
Proof of expected value
Consider the expected value $\operatorname{E}[X]$ of $X$ as above, i.e. the average number of trials until a success. The first trial either succeeds with probability $p$, or fails with probability $1-p$. If it fails, the remaining mean number of trials until a success is identical to the original mean; this follows from the fact that all trials are independent.

From this we get the formula:
$$\operatorname{E}[X] = p \cdot 1 + (1-p)\bigl(1 + \operatorname{E}[X]\bigr),$$
which, when solved for $\operatorname{E}[X]$, gives:
$$\operatorname{E}[X] = \frac{1}{p}.$$

The expected number of failures $Y$ can be found from the linearity of expectation, $\operatorname{E}[Y] = \operatorname{E}[X - 1] = \operatorname{E}[X] - 1 = \frac{1}{p} - 1 = \frac{1-p}{p}$. It can also be shown in the following way:
$$\operatorname{E}[Y] = \sum_{k=0}^{\infty} k\,(1-p)^{k} p = p(1-p)\sum_{k=0}^{\infty} k\,(1-p)^{k-1} = p(1-p)\left[-\frac{d}{dp}\left(\sum_{k=0}^{\infty}(1-p)^{k}\right)\right] = p(1-p)\left[-\frac{d}{dp}\left(\frac{1}{p}\right)\right] = p(1-p)\,\frac{1}{p^{2}} = \frac{1-p}{p}.$$
The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.
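The closed forms obtained above can be confirmed by summing the series numerically; a short sketch (assuming NumPy; the truncation point is arbitrary but ample):

```python
import numpy as np

p = 0.3
k = np.arange(1, 400)                        # truncated support; remaining terms are negligible
pmf = (1 - p) ** (k - 1) * p                 # Pr(X = k) for the trials parameterization

print(pmf.sum())                             # ≈ 1, the PMF is properly normalized
print((k * pmf).sum(), 1 / p)                # E[X] from the series vs the closed form 1/p
print(((k - 1) * pmf).sum(), (1 - p) / p)    # E[Y] = E[X] - 1
```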
Summary statistics
The mean of the geometric distribution is its expected value which is, as previously discussed in § Moments and cumulants, $\frac{1}{p}$ or $\frac{1-p}{p}$ when defined over $\{1, 2, 3, \ldots\}$ or $\{0, 1, 2, \ldots\}$ respectively.
The median of the geometric distribution is $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil$ when defined over $\{1, 2, 3, \ldots\}$[9] and $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1$ when defined over $\{0, 1, 2, \ldots\}$.[3]: 69
The mode of the geometric distribution is the first value in the support set. This is 1 when defined over $\{1, 2, 3, \ldots\}$ and 0 when defined over $\{0, 1, 2, \ldots\}$.[3]: 69

The skewness of the geometric distribution is $\frac{2-p}{\sqrt{1-p}}$.[6]: 115
The kurtosis of the geometric distribution is $9 + \frac{p^{2}}{1-p}$.[6]: 115 The excess kurtosis of a distribution is the difference between its kurtosis and the kurtosis of a normal distribution, $3$.[10]: 217 Therefore, the excess kurtosis of the geometric distribution is $6 + \frac{p^{2}}{1-p}$. Since $\frac{p^{2}}{1-p} \geq 0$, the excess kurtosis is always positive so the distribution is leptokurtic.[3]: 69 In other words, the tail of a geometric distribution decays more slowly than a Gaussian.[10]: 217
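These summary statistics can be cross-checked against a library implementation; a sketch using SciPy's `geom` distribution (which uses the trials parameterization on $\{1, 2, 3, \ldots\}$ and reports excess kurtosis):

```python
from scipy.stats import geom

p = 0.4
mean, var, skew, excess_kurt = geom.stats(p, moments='mvsk')

print(mean, 1 / p)                          # 2.5
print(var, (1 - p) / p**2)                  # 3.75
print(skew, (2 - p) / (1 - p) ** 0.5)       # ≈ 2.0656
print(excess_kurt, 6 + p**2 / (1 - p))      # ≈ 6.2667
```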
Entropy and Fisher's Information
Entropy (Geometric Distribution, Failures Before Success)
Entropy is a measure of uncertainty in a probability distribution. For the geometric distribution that models the number of failures before the first success, the probability mass function is:
$$\Pr(Y = k) = (1-p)^{k} p, \qquad k = 0, 1, 2, \ldots$$

The entropy $H(Y)$ for this distribution is defined as:
$$H(Y) = -\sum_{k=0}^{\infty} \Pr(Y = k)\log \Pr(Y = k) = \frac{-(1-p)\log(1-p) - p\log p}{p}.$$

The entropy increases as the probability $p$ decreases, reflecting greater uncertainty as success becomes rarer.
Fisher's Information (Geometric Distribution, Failures Before Success)
Fisher information measures the amount of information that an observable random variable $Y$ carries about an unknown parameter $p$. For the geometric distribution (failures before the first success), the Fisher information with respect to $p$ is given by:
$$I(p) = \frac{1}{p^{2}(1-p)}.$$
Proof:
- The likelihood function for a geometric random variable $Y$ is:
$$L(p; k) = (1-p)^{k} p.$$
- The log-likelihood function is:
$$\ell(p; k) = \ln L(p; k) = k\ln(1-p) + \ln p.$$
- The score function (first derivative of the log-likelihood w.r.t. $p$) is:
$$\frac{\partial \ell}{\partial p} = \frac{1}{p} - \frac{k}{1-p}.$$
- The second derivative of the log-likelihood function is:
$$\frac{\partial^{2} \ell}{\partial p^{2}} = -\frac{1}{p^{2}} - \frac{k}{(1-p)^{2}}.$$
- Fisher information is calculated as the negative expected value of the second derivative:
$$I(p) = -\operatorname{E}\!\left[\frac{\partial^{2} \ell}{\partial p^{2}}\right] = \frac{1}{p^{2}} + \frac{\operatorname{E}[Y]}{(1-p)^{2}} = \frac{1}{p^{2}} + \frac{(1-p)/p}{(1-p)^{2}} = \frac{1}{p^{2}} + \frac{1}{p(1-p)} = \frac{1}{p^{2}(1-p)}.$$
Fisher information increases as $p$ decreases, indicating that rarer successes provide more information about the parameter $p$.
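The closed form $I(p) = 1/\bigl(p^{2}(1-p)\bigr)$ can also be checked as the variance of the score on simulated data; a minimal sketch assuming NumPy (seed, sample size, and $p$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.3
y = rng.geometric(p, size=200_000) - 1   # NumPy's geometric counts trials; subtract 1 for failures

score = 1 / p - y / (1 - p)              # d/dp of log[(1 - p)**y * p]
print(score.var())                       # ≈ 15.87
print(1 / (p**2 * (1 - p)))              # exact Fisher information: 15.873...
```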
Entropy (Geometric Distribution, Trials Until Success)
For the geometric distribution modeling the number of trials until the first success, the probability mass function is:
$$\Pr(X = k) = (1-p)^{k-1} p, \qquad k = 1, 2, 3, \ldots$$

The entropy for this distribution is the same as that of the version modeling the number of failures before the first success, since the shift $X = Y + 1$ does not change the entropy:
$$H(X) = \frac{-(1-p)\log(1-p) - p\log p}{p}.$$
Fisher's Information (Geometric Distribution, Trials Until Success)
Fisher information for the geometric distribution modeling the number of trials until the first success is given by:
$$I(p) = \frac{1}{p^{2}(1-p)}.$$
Proof:
- The likelihood function for a geometric random variable $X$ is:
$$L(p; k) = (1-p)^{k-1} p.$$
- The log-likelihood function is:
$$\ell(p; k) = (k-1)\ln(1-p) + \ln p.$$
- The score function (first derivative of the log-likelihood w.r.t. $p$) is:
$$\frac{\partial \ell}{\partial p} = \frac{1}{p} - \frac{k-1}{1-p}.$$
- The second derivative of the log-likelihood function is:
$$\frac{\partial^{2} \ell}{\partial p^{2}} = -\frac{1}{p^{2}} - \frac{k-1}{(1-p)^{2}}.$$
- Fisher information is calculated as the negative expected value of the second derivative:
$$I(p) = -\operatorname{E}\!\left[\frac{\partial^{2} \ell}{\partial p^{2}}\right] = \frac{1}{p^{2}} + \frac{\operatorname{E}[X]-1}{(1-p)^{2}} = \frac{1}{p^{2}} + \frac{(1-p)/p}{(1-p)^{2}} = \frac{1}{p^{2}(1-p)}.$$
General properties
- The probability generating functions of geometric random variables $X$ and $Y$ defined over $\{1, 2, 3, \ldots\}$ and $\{0, 1, 2, \ldots\}$ are, respectively,[6]: 114–115
$$G_X(s) = \frac{sp}{1 - s(1-p)}, \qquad G_Y(s) = \frac{p}{1 - s(1-p)}, \qquad |s| < (1-p)^{-1}.$$
- The characteristic function is equal to $\varphi(t) = G(e^{it})$, so the geometric distribution's characteristic function, when defined over $\{1, 2, 3, \ldots\}$ and $\{0, 1, 2, \ldots\}$ respectively, is[11]: 1630
$$\varphi_X(t) = \frac{p e^{it}}{1 - (1-p)e^{it}}, \qquad \varphi_Y(t) = \frac{p}{1 - (1-p)e^{it}}.$$
- The entropy of a geometric distribution with parameter $p$ is[12]
$$H(p) = \frac{-p\log_2 p - (1-p)\log_2(1-p)}{p}$$
(bits; a numerical check is sketched after this list).
- Given a mean, the geometric distribution is the maximum entropy probability distribution of all discrete probability distributions. The corresponding continuous distribution is the exponential distribution.[13]
- The geometric distribution defined on $\{0, 1, 2, \ldots\}$ is infinitely divisible, that is, for any positive integer $n$, there exist $n$ independent identically distributed random variables whose sum is also geometrically distributed. This is because the negative binomial distribution can be derived from a Poisson-stopped sum of logarithmic random variables.[11]: 606–607
- The decimal digits of the geometrically distributed random variable $Y$ are a sequence of independent (and not identically distributed) random variables.[citation needed] For example, the hundreds digit $D$ has this probability distribution:
$$\Pr(D = d) = \frac{q^{100d}\,(1 - q^{100})}{1 - q^{1000}}, \qquad d = 0, 1, \ldots, 9,$$
where $q = 1 - p$, and similarly for the other digits, and, more generally, similarly for numeral systems with other bases than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.
- Golomb coding is the optimal prefix code[clarification needed] for the geometric discrete distribution.[12]
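The closed-form entropy above can be checked against a direct evaluation of $-\sum_k \Pr(Y=k)\log_2 \Pr(Y=k)$; a short sketch assuming NumPy (the truncation length is arbitrary but sufficient):

```python
import numpy as np

p = 0.25
q = 1 - p
k = np.arange(0, 1000)                    # truncated support; remaining mass is negligible
pmf = q**k * p

direct = -(pmf * np.log2(pmf)).sum()      # -sum P log2 P over the support
closed_form = (-p * np.log2(p) - q * np.log2(q)) / p
print(direct, closed_form)                # both ≈ 3.245 bits
```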
Related distributions
- The sum of $r$ independent geometric random variables with parameter $p$ is a negative binomial random variable with parameters $r$ and $p$ (a small simulation appears after this list).[14] The geometric distribution is a special case of the negative binomial distribution, with $r = 1$.
- The geometric distribution is a special case of discrete compound Poisson distribution.[11]: 606
- The minimum of $n$ geometric random variables with parameters $p_1, \ldots, p_n$ is also geometrically distributed, with parameter $1 - \prod_{i=1}^{n}(1 - p_i)$.[15]
- Suppose $0 < r < 1$, and for $k = 1, 2, 3, \ldots$ the random variable $X_k$ has a Poisson distribution with expected value $r^{k}/k$. Then $\sum_{k=1}^{\infty} k\,X_k$ has a geometric distribution taking values in $\{0, 1, 2, \ldots\}$, with expected value $r/(1-r)$.[citation needed]
- The exponential distribution is the continuous analogue of the geometric distribution. Applying the floor function to the exponential distribution with parameter $\lambda$ creates a geometric distribution with parameter $p = 1 - e^{-\lambda}$ defined over $\{0, 1, 2, \ldots\}$.[3]: 74 This can be used to generate geometrically distributed random numbers as detailed in § Random variate generation.
- If $p = 1/n$ and $X$ is geometrically distributed with parameter $p$, then the distribution of $X/n$ approaches an exponential distribution with expected value 1 as $n \to \infty$, since
$$\Pr(X/n > a) = \Pr(X > na) = (1-p)^{\lfloor na \rfloor} = \left(1 - \frac{1}{n}\right)^{\lfloor na \rfloor} \to e^{-a} \quad \text{as } n \to \infty.$$
More generally, if $p = \lambda/n$, where $\lambda$ is a parameter, then as $n \to \infty$ the distribution of $X/n$ approaches an exponential distribution with rate $\lambda$:
$$\Pr(X/n > a) = \left(1 - \frac{\lambda}{n}\right)^{\lfloor na \rfloor} \to e^{-\lambda a},$$
therefore the distribution function of $X/n$ converges to $1 - e^{-\lambda a}$, which is that of an exponential random variable.[citation needed]
- The index of dispersion of the geometric distribution is $\frac{1}{p}$ and its coefficient of variation is $\frac{1}{\sqrt{1-p}}$. The distribution is overdispersed.[1]: 216
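As referenced above, the sum-of-geometrics property can be illustrated by simulation; a sketch assuming NumPy and SciPy (whose `nbinom` counts failures before the $r$-th success; seed and sizes are arbitrary):

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(2)
p, r = 0.4, 3

# Sum of r independent "failures before first success" geometric variables.
failures = (rng.geometric(p, size=(100_000, r)) - 1).sum(axis=1)

# Compare with the negative binomial PMF for the number of failures before the r-th success.
for k in range(5):
    print(k, np.mean(failures == k), nbinom.pmf(k, r, p))
```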
Statistical inference
The true parameter $p$ of an unknown geometric distribution can be inferred through estimators and conjugate distributions.
Method of moments
Provided they exist, the first $r$ moments of a probability distribution can be estimated from a sample $x_1, \ldots, x_n$ using the formula
$$m_i = \frac{1}{n}\sum_{k=1}^{n} x_k^{\,i},$$
where $m_i$ is the $i$th sample moment and $1 \leq i \leq r$.[16]: 349–350 Estimating $\operatorname{E}[X]$ with $m_1$ gives the sample mean, denoted $\bar{x}$. Substituting this estimate in the formula for the expected value of a geometric distribution and solving for $p$ gives the estimators $\hat{p} = \frac{1}{\bar{x}}$ and $\hat{p} = \frac{1}{\bar{x}+1}$ when supported on $\{1, 2, 3, \ldots\}$ and $\{0, 1, 2, \ldots\}$ respectively. These estimators are biased since $\operatorname{E}\!\left[\frac{1}{\bar{x}}\right] > \frac{1}{\operatorname{E}[\bar{x}]} = p$ as a result of Jensen's inequality.[17]: 53–54
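The method-of-moments estimator amounts to a one-line computation on a sample; a minimal sketch assuming NumPy (true parameter, seed, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = 0.2
x = rng.geometric(p_true, size=5_000)        # trials-until-success samples on {1, 2, 3, ...}

p_hat_trials = 1 / x.mean()                  # estimator for support {1, 2, 3, ...}
p_hat_failures = 1 / ((x - 1).mean() + 1)    # same data viewed as failures on {0, 1, 2, ...}
print(p_hat_trials, p_hat_failures)          # both ≈ 0.2
```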
Maximum likelihood estimation
The maximum likelihood estimator of $p$ is the value that maximizes the likelihood function given a sample.[16]: 308 By finding the zero of the derivative of the log-likelihood function when the distribution is defined over $\{0, 1, 2, \ldots\}$, the maximum likelihood estimator can be found to be $\hat{p} = \frac{1}{\bar{x}+1}$, where $\bar{x}$ is the sample mean.[18] If the domain is $\{1, 2, 3, \ldots\}$, then the estimator shifts to $\hat{p} = \frac{1}{\bar{x}}$. As previously discussed in § Method of moments, these estimators are biased.
Regardless of the domain, the bias is equal to
$$b \equiv \operatorname{E}\bigl[\hat{p}_{\mathrm{MLE}}\bigr] - p = \frac{p(1-p)}{n},$$
which yields the bias-corrected maximum likelihood estimator,[citation needed]
$$\hat{p}^{\,*}_{\mathrm{MLE}} = \hat{p}_{\mathrm{MLE}} - \hat{b} = \hat{p}_{\mathrm{MLE}}\left(1 - \frac{1 - \hat{p}_{\mathrm{MLE}}}{n}\right).$$
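Both the plain and the bias-corrected estimates are straightforward to compute; a short sketch assuming NumPy (the seed, sample size, and true parameter are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
p_true, n = 0.3, 50
y = rng.geometric(p_true, size=n) - 1          # failures before the first success

p_mle = 1 / (1 + y.mean())                     # maximum likelihood estimate on {0, 1, 2, ...}
p_corrected = p_mle - p_mle * (1 - p_mle) / n  # subtract the estimated first-order bias
print(p_mle, p_corrected)
```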
Bayesian inference
In Bayesian inference, the parameter $p$ is a random variable from a prior distribution with a posterior distribution calculated using Bayes' theorem after observing samples.[17]: 167 If a beta distribution is chosen as the prior distribution, then the posterior will also be a beta distribution and it is called the conjugate distribution. In particular, if a $\mathrm{Beta}(\alpha, \beta)$ prior is selected, then the posterior, after observing samples $k_1, \ldots, k_n \in \{1, 2, 3, \ldots\}$, is[19]
$$p \sim \mathrm{Beta}\!\left(\alpha + n,\ \beta + \sum_{i=1}^{n}(k_i - 1)\right).$$
Alternatively, if the samples are in $\{0, 1, 2, \ldots\}$, the posterior distribution is[20]
$$p \sim \mathrm{Beta}\!\left(\alpha + n,\ \beta + \sum_{i=1}^{n} k_i\right).$$
Since the expected value of a $\mathrm{Beta}(\alpha, \beta)$ distribution is $\frac{\alpha}{\alpha+\beta}$,[11]: 145 as $\alpha$ and $\beta$ approach zero, the posterior mean approaches its maximum likelihood estimate.
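The conjugate update is a closed-form bookkeeping step; a minimal sketch assuming NumPy and SciPy, with an (arbitrary) uniform $\mathrm{Beta}(1, 1)$ prior on samples from the trials parameterization:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(5)
p_true = 0.25
k = rng.geometric(p_true, size=40)       # samples on {1, 2, 3, ...}

alpha0, beta0 = 1.0, 1.0                 # Beta(1, 1) prior
alpha_post = alpha0 + k.size
beta_post = beta0 + (k - 1).sum()        # total failures across all samples

posterior = beta(alpha_post, beta_post)
print(posterior.mean())                  # posterior-mean point estimate of p
print(posterior.ppf([0.025, 0.975]))     # 95% equal-tailed credible interval
```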
Random variate generation
The geometric distribution can be generated experimentally from i.i.d. standard uniform random variables by finding the first such random variable to be less than or equal to $p$. However, the number of random variables needed is also geometrically distributed and the algorithm slows as $p$ decreases.[21]: 498
Random generation can be done in constant time by truncating exponential random numbers. An exponential random variable $E$ with rate $\lambda = -\ln(1-p)$ can become geometrically distributed with parameter $p$ through $X = \lceil E \rceil$. In turn, $E$ can be generated from a standard uniform random variable $U$, altering the formula into $X = \left\lceil \frac{\ln U}{\ln(1-p)} \right\rceil$.[21]: 499–500 [22]
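This inverse-transform recipe is easy to vectorize; a sketch assuming NumPy (seed, sample size, and $p$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.05

u = rng.random(100_000)
x = np.ceil(np.log(u) / np.log1p(-p)).astype(int)  # trials until first success, support {1, 2, ...}

print(x.mean(), 1 / p)              # sample mean vs expected value 20
print(x.var(), (1 - p) / p**2)      # sample variance vs (1 - p)/p**2 = 380
```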
Applications
The geometric distribution is used in many disciplines. In queueing theory, the M/M/1 queue has a steady state following a geometric distribution.[23] In stochastic processes, the Yule–Furry process is geometrically distributed.[24] The distribution also arises when modeling the lifetime of a device in discrete contexts.[25] It has also been used to fit data including modeling patients spreading COVID-19.[26]
References
[edit]- ^ a b c d e f Johnson, Norman L.; Kemp, Adrienne W.; Kotz, Samuel (2005-08-19). Univariate Discrete Distributions. Wiley Series in Probability and Statistics (1 ed.). Wiley. doi:10.1002/0471715816. ISBN 978-0-471-27246-5.
- ^ a b Nagel, Werner; Steyer, Rolf (2017-04-04). Probability and Conditional Expectation: Fundamentals for the Empirical Sciences. Wiley Series in Probability and Statistics (1st ed.). Wiley. doi:10.1002/9781119243496. ISBN 978-1-119-24352-6.
- ^ a b c d e Chattamvelli, Rajan; Shanmugam, Ramalingam (2020). Discrete Distributions in Engineering and the Applied Sciences. Synthesis Lectures on Mathematics & Statistics. Cham: Springer International Publishing. doi:10.1007/978-3-031-02425-2. ISBN 978-3-031-01297-6.
- ^ Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). A Modern Introduction to Probability and Statistics. Springer Texts in Statistics. London: Springer London. p. 50. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
- ^ Weisstein, Eric W. "Memoryless". mathworld.wolfram.com. Retrieved 2024-07-25.
- ^ a b c d e Forbes, Catherine; Evans, Merran; Hastings, Nicholas; Peacock, Brian (2010-11-29). Statistical Distributions (1st ed.). Wiley. doi:10.1002/9780470627242. ISBN 978-0-470-39063-4.
- ^ Bertsekas, Dimitri P.; Tsitsiklis, John N. (2008). Introduction to Probability. Optimization and Computation Series (2nd ed.). Belmont: Athena Scientific. p. 235. ISBN 978-1-886529-23-6.
- ^ Weisstein, Eric W. "Geometric Distribution". MathWorld. Retrieved 2024-07-13.
- ^ Aggarwal, Charu C. (2024). Probability and Statistics for Machine Learning: A Textbook. Cham: Springer Nature Switzerland. p. 138. doi:10.1007/978-3-031-53282-5. ISBN 978-3-031-53281-8.
- ^ a b Chan, Stanley (2021). Introduction to Probability for Data Science (1st ed.). Michigan Publishing. ISBN 978-1-60785-747-1.
- ^ a b c d Lovric, Miodrag, ed. (2011). International Encyclopedia of Statistical Science (1st ed.). Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-04898-2. ISBN 978-3-642-04897-5.
- ^ a b Gallager, R.; van Voorhis, D. (March 1975). "Optimal source codes for geometrically distributed integer alphabets (Corresp.)". IEEE Transactions on Information Theory. 21 (2): 228–230. doi:10.1109/TIT.1975.1055357. ISSN 0018-9448.
- ^ Lisman, J. H. C.; Zuylen, M. C. A. van (March 1972). "Note on the generation of most probable frequency distributions". Statistica Neerlandica. 26 (1): 19–23. doi:10.1111/j.1467-9574.1972.tb00152.x. ISSN 0039-0402.
- ^ Pitman, Jim (1993). Probability. New York, NY: Springer New York. p. 372. doi:10.1007/978-1-4612-4374-8. ISBN 978-0-387-94594-1.
- ^ Ciardo, Gianfranco; Leemis, Lawrence M.; Nicol, David (1 June 1995). "On the minimum of independent geometrically distributed random variables". Statistics & Probability Letters. 23 (4): 313–326. doi:10.1016/0167-7152(94)00130-Z. hdl:2060/19940028569. S2CID 1505801.
- ^ a b Evans, Michael; Rosenthal, Jeffrey (2023). Probability and Statistics: The Science of Uncertainty (2nd ed.). Macmillan Learning. ISBN 978-1429224628.
- ^ a b Held, Leonhard; Sabanés Bové, Daniel (2020). Likelihood and Bayesian Inference: With Applications in Biology and Medicine. Statistics for Biology and Health. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-662-60792-3. ISBN 978-3-662-60791-6.
- ^ Siegrist, Kyle (2020-05-05). "7.3: Maximum Likelihood". Statistics LibreTexts. Retrieved 2024-06-20.
- ^ Fink, Daniel. "A Compendium of Conjugate Priors". CiteSeerX 10.1.1.157.5540.
- ^ "3. Conjugate families of distributions" (PDF). Archived (PDF) from the original on 2010-04-08.
- ^ a b Devroye, Luc (1986). Non-Uniform Random Variate Generation. New York, NY: Springer New York. doi:10.1007/978-1-4613-8643-8. ISBN 978-1-4613-8645-2.
- ^ Knuth, Donald Ervin (1997). The Art of Computer Programming. Vol. 2 (3rd ed.). Reading, Mass: Addison-Wesley. p. 136. ISBN 978-0-201-89683-1.
- ^ Daskin, Mark S. (2021). Bite-Sized Operations Management. Synthesis Lectures on Operations Research and Applications. Cham: Springer International Publishing. p. 127. doi:10.1007/978-3-031-02493-1. ISBN 978-3-031-01365-2.
- ^ Madhira, Sivaprasad; Deshmukh, Shailaja (2023). Introduction to Stochastic Processes Using R. Singapore: Springer Nature Singapore. p. 449. doi:10.1007/978-981-99-5601-2. ISBN 978-981-99-5600-5.
- ^ Gupta, Rakesh; Gupta, Shubham; Ali, Irfan (2023), Garg, Harish (ed.), "Some Discrete Parametric Markov–Chain System Models to Analyze Reliability", Advances in Reliability, Failure and Risk Analysis, Singapore: Springer Nature Singapore, pp. 305–306, doi:10.1007/978-981-19-9909-3_14, ISBN 978-981-19-9908-6, retrieved 2024-07-13
- ^ Polymenis, Athanase (2021-10-01). "An application of the geometric distribution for assessing the risk of infection with SARS-CoV-2 by location". Asian Journal of Medical Sciences. 12 (10): 8–11. doi:10.3126/ajms.v12i10.38783. ISSN 2091-0576.