Compound probability distribution
In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture.
The compound distribution ("unconditional distribution") is the result of marginalizing (integrating) over the latent random variable(s) representing the parameter(s) of the parametrized distribution ("conditional distribution").
Definition
A compound probability distribution is the probability distribution that results from assuming that a random variable $X$ is distributed according to some parametrized distribution $F$ with an unknown parameter $\theta$ that is again distributed according to some other distribution $G$. The resulting distribution $H$ is said to be the distribution that results from compounding $F$ with $G$. The parameter's distribution $G$ is also called the mixing distribution or latent distribution. Technically, the unconditional distribution $H$ results from marginalizing over $G$, i.e., from integrating out the unknown parameter(s) $\theta$. Its probability density function is given by:
$$p_H(x) = \int p_F(x \mid \theta)\, p_G(\theta)\, d\theta.$$
The same formula applies analogously if some or all of the variables are vectors.
From the above formula, one can see that a compound distribution essentially is a special case of a marginal distribution: the joint distribution of $x$ and $\theta$ is given by $p(x, \theta) = p_F(x \mid \theta)\, p_G(\theta)$, and the compound results as its marginal distribution: $p_H(x) = \int p(x, \theta)\, d\theta$. If the domain of $\theta$ is discrete, then the distribution is again a special case of a mixture distribution.
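The marginal formula above can be evaluated numerically when no closed form is available. The following is a minimal sketch (not from the article; the shape and rate values a and b for the mixing gamma are illustrative) that integrates a normal conditional density over a gamma-distributed precision on a grid and compares the result with the known Student-t compound distribution.

```python
# Minimal sketch: numerically marginalize a N(x | 0, 1/tau) conditional over a
# Gamma(a, rate=b) mixing density for the precision tau, and compare with the
# known Student-t compound result (illustrative parameter values).
import numpy as np
from scipy import stats

a, b = 3.0, 2.0                            # assumed shape/rate of the mixing Gamma
tau = np.linspace(1e-6, 30.0, 20000)       # uniform integration grid for tau
dtau = tau[1] - tau[0]
g = stats.gamma.pdf(tau, a, scale=1.0 / b)  # mixing density p_G(tau)

for x in (0.0, 1.0, 2.5):
    f_cond = stats.norm.pdf(x, loc=0.0, scale=1.0 / np.sqrt(tau))  # p_F(x | tau)
    marginal = np.sum(f_cond * g) * dtau                            # integrate out tau
    exact = stats.t.pdf(x, df=2 * a, loc=0.0, scale=np.sqrt(b / a))
    print(f"x={x}: numeric {marginal:.5f}  vs  Student-t {exact:.5f}")
```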
Properties
General
The compound distribution $H$ will depend on the specific expression of each distribution, as well as which parameter of $F$ is distributed according to the distribution $G$, and the parameters of $H$ will include any parameters of $F$ and $G$ that are not marginalized, or integrated, out. The support of $H$ is the same as that of $F$, and if the latter is a two-parameter distribution parameterized with the mean and variance, some general properties exist.
Mean and variance
The compound distribution's first two moments are given by the law of total expectation and the law of total variance:
$$\operatorname{E}_H[X] = \operatorname{E}_G\!\big[\operatorname{E}_F[X \mid \theta]\big],$$
$$\operatorname{Var}_H(X) = \operatorname{E}_G\!\big[\operatorname{Var}_F(X \mid \theta)\big] + \operatorname{Var}_G\!\big(\operatorname{E}_F[X \mid \theta]\big).$$
If the mean of $F$ is distributed as $G$, which in turn has mean $\mu$ and variance $\sigma^2$, the expressions above imply $\operatorname{E}_H[X] = \mu$ and $\operatorname{Var}_H(X) = \sigma^2 + \tau^2$, where $\tau^2$ is the variance of $F$.
Proof
Let $F$ and $G$ be probability distributions parameterized with mean and variance as
$$X \mid \theta \sim F(\theta, \tau^2), \qquad \theta \sim G(\mu, \sigma^2).$$
Then, denoting the probability density functions as $f$ and $g$ respectively, and $h$ being the probability density of $H$, we have
$$h(x) = \int f(x \mid \theta)\, g(\theta)\, d\theta$$
and
$$\operatorname{E}_H[X] = \int x\, h(x)\, dx = \int x \int f(x \mid \theta)\, g(\theta)\, d\theta\, dx = \int \left( \int x\, f(x \mid \theta)\, dx \right) g(\theta)\, d\theta.$$
From the parameterization, $\int x\, f(x \mid \theta)\, dx = \theta$ and $\int \theta\, g(\theta)\, d\theta = \mu$, and therefore the mean of the compound distribution is $\operatorname{E}_H[X] = \mu$, as per the expression for its first moment above.
The variance of $H$ is given by $\operatorname{E}_H[X^2] - \mu^2$, and
$$\operatorname{E}_H[X^2] = \int x^2 h(x)\, dx = \int \left( \int x^2 f(x \mid \theta)\, dx \right) g(\theta)\, d\theta = \int (\tau^2 + \theta^2)\, g(\theta)\, d\theta = \tau^2 + \sigma^2 + \mu^2,$$
given the fact that $\int x^2 f(x \mid \theta)\, dx = \tau^2 + \theta^2$ and $\int \theta^2 g(\theta)\, d\theta = \sigma^2 + \mu^2$. Finally we get $\operatorname{Var}_H(X) = \operatorname{E}_H[X^2] - \mu^2 = \tau^2 + \sigma^2$.
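The result can be checked empirically. The following is a minimal Monte Carlo sketch (the values of μ, σ and τ are illustrative, not from the article) that compounds a normal conditional distribution with a normally distributed mean and compares the sample moments with the formulas above.

```python
# Minimal sketch: Monte Carlo check of E_H[X] = mu and Var_H(X) = sigma^2 + tau^2
# when the mean of F is itself distributed as G (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, tau = 1.0, 2.0, 0.5                    # mean/sd of G, and sd of F
theta = rng.normal(mu, sigma, size=1_000_000)     # theta ~ G
x = rng.normal(theta, tau)                        # x | theta ~ F
print(x.mean(), mu)                               # close to mu
print(x.var(), sigma**2 + tau**2)                 # close to sigma^2 + tau^2
```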
Applications
[edit]Testing
Distributions of common test statistics result as compound distributions under their null hypothesis, for example in Student's t-test (where the test statistic results as the ratio of a normal random variable and the square root of an independent chi-squared random variable divided by its degrees of freedom), or in the F-test (where the test statistic is the ratio of two independent chi-squared random variables, each divided by its degrees of freedom).
Overdispersion modeling
Compound distributions are useful for modeling outcomes exhibiting overdispersion, i.e., a greater amount of variability than would be expected under a certain model. For example, count data are commonly modeled using the Poisson distribution, whose variance is equal to its mean. The distribution may be generalized by allowing for variability in its rate parameter, implemented via a gamma distribution, which results in a marginal negative binomial distribution. This distribution is similar in its shape to the Poisson distribution, but it allows for larger variances. Similarly, a binomial distribution may be generalized to allow for additional variability by compounding it with a beta distribution for its success probability parameter, which results in a beta-binomial distribution.
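A minimal simulation sketch of the gamma-mixed Poisson just described (the shape and scale values are assumed for illustration): the sampled counts are overdispersed, and their distribution matches the corresponding negative binomial.

```python
# Minimal sketch: compound Poisson(lambda) with lambda ~ Gamma(r, scale=s),
# compared with the negative binomial with n = r and p = 1 / (1 + s).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
r, s = 3.0, 2.0                                   # assumed gamma shape and scale
lam = rng.gamma(shape=r, scale=s, size=500_000)   # latent rates
counts = rng.poisson(lam)                         # counts | rate ~ Poisson

print("mean", counts.mean(), "variance", counts.var())   # variance > mean
k = np.arange(5)
empirical = np.bincount(counts, minlength=5)[:5] / counts.size
print(np.round(empirical, 4))
print(np.round(stats.nbinom.pmf(k, n=r, p=1.0 / (1.0 + s)), 4))
```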
Bayesian inference
Besides ubiquitous marginal distributions that may be seen as special cases of compound distributions, in Bayesian inference, compound distributions arise when, in the notation above, F represents the distribution of future observations and G is the posterior distribution of the parameters of F, given the information in a set of observed data. This gives a posterior predictive distribution. Correspondingly, for the prior predictive distribution, F is the distribution of a new data point while G is the prior distribution of the parameters.
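As an illustration of a posterior predictive distribution obtained by compounding, the sketch below uses a conjugate gamma prior for a Poisson rate; the prior hyperparameters and the observed counts are made up for the example. Compounding the Poisson likelihood with the gamma posterior yields a negative binomial posterior predictive distribution.

```python
# Minimal sketch with made-up data: Gamma(a, rate=b) prior on a Poisson rate,
# conjugate posterior update, and the resulting negative binomial posterior
# predictive for a single new count.
import numpy as np
from scipy import stats

a, b = 2.0, 1.0                                  # assumed prior shape and rate
data = np.array([3, 5, 4, 6, 2])                 # hypothetical observed counts
a_post, b_post = a + data.sum(), b + data.size   # conjugate Gamma posterior

k = np.arange(10)
predictive = stats.nbinom.pmf(k, n=a_post, p=b_post / (b_post + 1.0))
print(np.round(predictive, 4))                   # posterior predictive PMF
```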
Convolution
Convolution of probability distributions (to derive the probability distribution of sums of random variables) may also be seen as a special case of compounding; here the sum's distribution essentially results from considering one summand as a random location parameter for the other summand.[1]
Computation
Compound distributions derived from exponential family distributions often have a closed form. If analytical integration is not possible, numerical methods may be necessary.
Compound distributions may relatively easily be investigated using Monte Carlo methods, i.e., by generating random samples. It is often easy to generate random numbers from the distributions $G$ as well as $F$ and then utilize these to perform collapsed Gibbs sampling to generate samples from $H$.
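A generic two-stage sampler along these lines might look as follows; the helper name sample_compound and the example choice of $G$ and $F$ are illustrative, not a fixed API.

```python
# Minimal generic sketch: two-stage sampling from a compound distribution
# (draw theta ~ G, then x | theta ~ F). Names and distributions are illustrative.
import numpy as np

rng = np.random.default_rng(2)

def sample_compound(sample_theta, sample_x_given_theta, size):
    """Draw theta ~ G, then x | theta ~ F, elementwise."""
    theta = sample_theta(size)
    return sample_x_given_theta(theta)

# Example: exponential rate mixed over a gamma distribution (Lomax marginal).
x = sample_compound(
    lambda n: rng.gamma(shape=2.0, scale=1.0, size=n),   # theta ~ Gamma(2, 1)
    lambda theta: rng.exponential(1.0 / theta),           # x | theta ~ Exp(rate=theta)
    size=200_000,
)
# For these parameters the marginal is Lomax with survival (1 + x)^(-2),
# so P(X > 3) = 1/16 = 0.0625.
print("P(X > 3) ~", (x > 3).mean())
```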
A compound distribution may usually also be approximated to a sufficient degree by a mixture distribution using a finite number of mixture components, which makes it possible to derive an approximate density, distribution function, etc.[1]
Parameter estimation (maximum-likelihood or maximum-a-posteriori estimation) within a compound distribution model may sometimes be simplified by utilizing the EM-algorithm.[2]
Examples
- Gaussian scale mixtures:[3][4]
- Compounding a normal distribution with variance distributed according to an inverse gamma distribution (or equivalently, with precision distributed as a gamma distribution) yields a non-standardized Student's t-distribution.[5] This distribution has the same symmetrical shape as a normal distribution with the same central point, but has greater variance and heavy tails.
- Compounding a Gaussian (or normal) distribution with variance distributed according to an exponential distribution (or with standard deviation according to a Rayleigh distribution) yields a Laplace distribution. More generally, compounding a Gaussian (or normal) distribution with variance distributed according to a gamma distribution yields a variance-gamma distribution.
- Compounding a Gaussian distribution with variance distributed according to an exponential distribution whose rate parameter is itself distributed according to a gamma distribution yields a Normal-exponential-gamma distribution. (This involves two compounding stages. The variance itself then follows a Lomax distribution; see below.)
- Compounding a Gaussian distribution with standard deviation distributed according to a (standard) inverse uniform distribution yields a Slash distribution.
- Compounding a Gaussian (normal) distribution with a Kolmogorov distribution yields a logistic distribution.[6][3]
- other Gaussian mixtures:
- Compounding a Gaussian distribution with mean distributed according to another Gaussian distribution yields (again) a Gaussian distribution.
- Compounding a Gaussian distribution with mean distributed according to a shifted exponential distribution yields an exponentially modified Gaussian distribution.
- Compounding a Bernoulli distribution with probability of success $p$ distributed according to a distribution $P$ that has a defined expected value yields a Bernoulli distribution with success probability $\operatorname{E}[p]$. An interesting consequence is that the dispersion of $P$ does not influence the dispersion of the resulting compound distribution.
- Compounding a binomial distribution with probability of success distributed according to a beta distribution yields a beta-binomial distribution (see the sketch after this list). It possesses three parameters: a parameter $n$ (number of samples) from the binomial distribution and shape parameters $\alpha$ and $\beta$ from the beta distribution.[7][8]
- Compounding a multinomial distribution with probability vector distributed according to a Dirichlet distribution yields a Dirichlet-multinomial distribution.
- Compounding a Poisson distribution with rate parameter distributed according to a gamma distribution yields a negative binomial distribution.[9][10]
- Compounding a Poisson distribution with rate parameter distributed according to an exponential distribution yields a geometric distribution.
- Compounding an exponential distribution with its rate parameter distributed according to a gamma distribution yields a Lomax distribution.[11]
- Compounding a gamma distribution with inverse scale parameter distributed according to another gamma distribution yields a three-parameter beta prime distribution.[12]
- Compounding a half-normal distribution with its scale parameter distributed according to a Rayleigh distribution yields an exponential distribution. This follows immediately from the Laplace distribution resulting as a normal scale mixture; see above. The roles of conditional and mixing distributions may also be exchanged here; consequently, compounding a Rayleigh distribution with its scale parameter distributed according to a half-normal distribution also yields an exponential distribution.
- A Gamma(k = 2, θ)-distributed random variable whose scale parameter θ is itself uniformly distributed marginally yields an exponential distribution.
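As a spot check of one entry above (the beta-binomial case), the following sketch compares a simulated beta-mixed binomial with scipy.stats.betabinom (available in recent SciPy versions); the parameter values are illustrative.

```python
# Minimal sketch: compound Binomial(n, p) with p ~ Beta(alpha, beta), compared
# with the beta-binomial PMF (illustrative parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, alpha, beta = 10, 2.0, 5.0
p = rng.beta(alpha, beta, size=500_000)     # latent success probabilities
k = rng.binomial(n, p)                      # counts | p ~ Binomial(n, p)

ks = np.arange(n + 1)
empirical = np.bincount(k, minlength=n + 1) / k.size
print(np.round(empirical, 4))
print(np.round(stats.betabinom.pmf(ks, n, alpha, beta), 4))
```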
Similar terms
The notion of "compound distribution" as used e.g. in the definition of a Compound Poisson distribution or Compound Poisson process is different from the definition found in this article. The meaning in this article corresponds to what is used in e.g. Bayesian hierarchical modeling.
The special case for compound probability distributions where the parametrized distribution is the Poisson distribution is also called mixed Poisson distribution.
References
1. Röver, C.; Friede, T. (2017). "Discrete approximation of a mixture distribution via restricted divergence". Journal of Computational and Graphical Statistics. 26 (1): 217–222. arXiv:1602.04060. doi:10.1080/10618600.2016.1276840.
2. Gelman, A.; Carlin, J. B.; Stern, H.; Rubin, D. B. (1997). "9.5 Finding marginal posterior modes using EM and related algorithms". Bayesian Data Analysis (1st ed.). Boca Raton: Chapman & Hall / CRC. p. 276.
3. Lee, S.X.; McLachlan, G.J. (2019). "Scale Mixture Distribution". Wiley StatsRef: Statistics Reference Online. pp. 1–16. doi:10.1002/9781118445112.stat08201. ISBN 978-1-118-44511-2.
4. Gneiting, T. (1997). "Normal scale mixtures and dual probability densities". Journal of Statistical Computation and Simulation. 59 (4): 375–384. doi:10.1080/00949659708811867.
5. Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). Introduction to the theory of statistics (3rd ed.). New York: McGraw-Hill.
6. Andrews, D.F.; Mallows, C.L. (1974). "Scale mixtures of normal distributions". Journal of the Royal Statistical Society, Series B. 36 (1): 99–102. doi:10.1111/j.2517-6161.1974.tb00989.x.
7. Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005). "6.2.2". Univariate discrete distributions (3rd ed.). New York: Wiley. p. 253.
8. Gelman, A.; Carlin, J. B.; Stern, H.; Dunson, D. B.; Vehtari, A.; Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton: Chapman & Hall / CRC. Bibcode:2014bda..book.....G.
9. Lawless, J.F. (1987). "Negative binomial and mixed Poisson regression". The Canadian Journal of Statistics. 15 (3): 209–225. doi:10.2307/3314912. JSTOR 3314912.
10. Teich, M. C.; Diament, P. (1989). "Multiply stochastic representations for K distributions and their Poisson transforms". Journal of the Optical Society of America A. 6 (1): 80–91. Bibcode:1989JOSAA...6...80T. CiteSeerX 10.1.1.64.596. doi:10.1364/JOSAA.6.000080.
11. Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). "20 Pareto distributions". Continuous univariate distributions. Vol. 1 (2nd ed.). New York: Wiley. p. 573.
12. Dubey, S. D. (1970). "Compound gamma, beta and F distributions". Metrika. 16: 27–31. doi:10.1007/BF02613934.
Further reading
- Lindsay, B. G. (1995), Mixture models: theory, geometry and applications, NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5, Hayward, CA, USA: Institute of Mathematical Statistics, pp. i–163, ISBN 978-0-940600-32-4, JSTOR 4153184
- Seidel, W. (2010), "Mixture models", in Lovric, M. (ed.), International Encyclopedia of Statistical Science, Heidelberg: Springer, pp. 827–829, doi:10.1007/978-3-642-04898-2_368, ISBN 978-3-642-04898-2
- Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974), "III.4.3 Contagious distributions and truncated distributions", Introduction to the theory of statistics (3rd ed.), New York: McGraw-Hill, ISBN 978-0-07-042864-5
- Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005), "8 Mixture distributions", Univariate discrete distributions, New York: Wiley, ISBN 978-0-471-27246-5
Compound probability distribution
Fundamentals
Definition
A compound probability distribution arises as the distribution of a random sum $S = \sum_{i=1}^{N} X_i$, where $N$ is a nonnegative integer-valued discrete random variable representing the random number of terms, and the $X_i$ (for $i \geq 1$) are independent and identically distributed random variables that are independent of $N$. The distribution of $N$ is termed the primary or counting distribution, while the common distribution of each $X_i$ is the secondary or component distribution.[7] This structure models scenarios where the number of summands is stochastic, such as aggregate claims in insurance or total progeny in branching processes.[2] Under the assumptions that $N$ takes values in $\{0, 1, 2, \ldots\}$ and the $X_i$ are i.i.d. and independent of $N$, the probability laws of $S$ can be expressed using convolutions.[1] For the discrete case, where the $X_i$ take discrete values, the probability mass function is
$$P(S = s) = \sum_{n=0}^{\infty} P(N = n)\, P(X_1 + \cdots + X_n = s),$$
with the convention that the sum is 0 when $n = 0$.[2] In the continuous case, if the $X_i$ admit a probability density function $f_X$, then the density function of $S$ is
$$f_S(s) = \sum_{n=0}^{\infty} P(N = n)\, f_X^{*n}(s),$$
where $f_X^{*n}$ denotes the $n$-fold convolution density of $f_X$ (and $f_X^{*0}$ is a Dirac delta at 0). Prominent examples of compound distributions include the compound Poisson (where $N$ is Poisson-distributed), compound binomial, and compound geometric distributions, each inheriting properties from their primary and secondary components.[8]
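A minimal sketch (assumed rate and severity values) that compares a simulated compound Poisson random sum with the convolution formula above for a discrete severity distribution:

```python
# Minimal sketch: compound Poisson random sum with a discrete severity PMF,
# simulated PMF vs. the convolution formula P(S=s) = sum_n P(N=n) P(X1+...+Xn=s).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam = 2.0
sev = np.array([0.0, 0.5, 0.3, 0.2])        # severity PMF on {0, 1, 2, 3}

# Simulation: draw N, then sum N i.i.d. severities.
N = rng.poisson(lam, size=100_000)
draws = [rng.choice(4, size=n, p=sev).sum() for n in N]
empirical = np.bincount(draws, minlength=10)[:10] / len(draws)

# Convolution formula, truncating the sum over n and the support of S.
pmf = np.zeros(60)
conv = np.zeros(60)
conv[0] = 1.0                                # 0-fold convolution: point mass at 0
for n in range(40):
    pmf += stats.poisson.pmf(n, lam) * conv
    conv = np.convolve(conv, sev)[:60]       # next convolution power, truncated
print(np.round(empirical, 4))
print(np.round(pmf[:10], 4))
```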
Historical Context
The concept of compound probability distributions emerged in the early 20th century within actuarial science, building on Siméon Denis Poisson's foundational work on the Poisson distribution from 1837. The compound Poisson process, a key early example, was introduced by Filip Lundberg in his 1903 doctoral thesis, where he modeled insurance claims as a Poisson process for the number of events combined with random claim sizes to assess ruin probabilities.[9] This approach marked the initial formalization of compounding a counting process with independent severity distributions, laying groundwork for applications in risk modeling.[10] A significant milestone occurred in 1923 when Felix Eggenberger and George Pólya derived the negative binomial distribution as a Poisson distribution with a gamma-distributed rate parameter (a mixture distribution), interpreting it as a contagion model in urn schemes.[11] This mixture representation highlighted compounding's utility for overdispersion beyond simple Poisson assumptions. In 1930, Harald Cramér advanced collective risk theory in his treatise "On the Mathematical Theory of Risk," systematizing Lundberg's ideas by analyzing the compound Poisson process for aggregate claims in insurance, including approximations for ruin probabilities.[12] During the 1940s, William Feller generalized compound distributions within renewal theory, exploring their role in recurrent events and branching processes using generating functions, as detailed in his 1943 work on the Pascal distribution as a compound form.[13] Concurrently, Paul Lévy's contributions in the 1920s and 1930s integrated compounding into broader stochastic processes through his development of infinitely divisible distributions, which encompass compound Poisson and lead to stable distributions as limits of normalized sums. These efforts solidified compound distributions as essential tools in probability theory by mid-century.[14]
Mathematical Properties
General Characteristics
Compound probability distributions, also known as random sum distributions, arise as the distribution of $S = \sum_{i=1}^{N} X_i$, where $N$ is a non-negative integer-valued random variable independent of the i.i.d. sequence $X_1, X_2, \ldots$ with common distribution identical to that of a generic summand $X$. These distributions exhibit several structural properties that distinguish them from simple sums or mixtures. Notably, they inherit certain stability features from their components, such as belonging to the same parametric family under specific compounding operations; for instance, a compound Poisson distribution with logarithmic series-distributed jumps yields the negative binomial distribution, preserving closure within the family of discrete distributions used in over-dispersed count modeling.[15] A key characteristic is that compound distributions can be infinitely divisible if both the counting distribution of $N$ and the summand distribution of $X$ are infinitely divisible. This property allows the distribution to be expressed as the limit of convolutions of simpler distributions, facilitating approximations in large-scale systems like risk aggregation. For example, compound Poisson distributions, where $N$ follows a Poisson law, are infinitely divisible for any proper jump-size distribution, enabling their use in Lévy process constructions.[16] The probability generating function (PGF) of $S$ is given by the composition $G_S(z) = G_N\big(G_X(z)\big)$, where $G_N$ and $G_X$ are the PGFs of $N$ and $X$, respectively; similarly, the moment generating function (MGF), when it exists, satisfies $M_S(t) = G_N\big(M_X(t)\big)$ under suitable conditions on the supports. This compositional structure underscores the recursive nature of the distribution, as $S$ represents a compound sum implying iterated convolutions: the density or mass function involves convolving the distribution of $X$ a random number of times, weighted by the probabilities of $N$.[2] In terms of shape and asymptotic behavior, compound distributions are often unimodal, inheriting this trait from the component distributions if they are unimodal, but they typically display heavier tails compared to either the counting or summand distributions alone. This tail heaviness arises from the variability in $N$, which amplifies extreme events; for instance, in compound heavy-tailed models, the tail probability of $S$ decays more slowly than that of $X$, dominated by scenarios with large $N$ or large individual $X_i$, as analyzed in collective risk theory.[17]
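The PGF composition $G_S(z) = G_N(G_X(z))$ can be checked numerically; the sketch below uses a compound Poisson sum with geometric jump sizes and assumed parameter values.

```python
# Minimal sketch: check G_S(z) = G_N(G_X(z)) by Monte Carlo for a compound
# Poisson sum with geometric jump sizes (illustrative parameters).
import numpy as np

rng = np.random.default_rng(5)
lam, p = 1.5, 0.4
N = rng.poisson(lam, size=100_000)
S = np.array([rng.geometric(p, size=n).sum() for n in N])

for z in (0.3, 0.6, 0.9):
    G_X = p * z / (1.0 - (1.0 - p) * z)     # PGF of a geometric on {1, 2, ...}
    G_S = np.exp(lam * (G_X - 1.0))         # Poisson PGF evaluated at G_X(z)
    print(z, (z ** S).mean(), G_S)          # empirical E[z^S] vs. composition
```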
Moments and Higher Moments
The moments of a compound probability distribution, defined as $S = \sum_{i=1}^{N} X_i$ where the $X_i$ are independent and identically distributed random variables independent of the nonnegative integer-valued random variable $N$, and assuming all relevant moments are finite, can be expressed in terms of the moments of $N$ and $X$. The mean is given by
$$\operatorname{E}[S] = \operatorname{E}[N]\, \operatorname{E}[X],$$
a result known as Wald's identity that holds under the independence condition and finite first moments.[18] The variance follows from the law of total variance:
$$\operatorname{Var}(S) = \operatorname{E}[N]\, \operatorname{Var}(X) + \operatorname{Var}(N)\, \operatorname{E}[X]^2.$$
Higher moments are conveniently handled through cumulants, which add under independent summation and facilitate recursive computation for compound structures. The cumulant generating function of $S$ is $K_S(t) = K_N\big(K_X(t)\big)$, where $K_N$ and $K_X$ are the cumulant generating functions of $N$ and $X$, respectively; this composition yields a recursive formula for the cumulants of order $n$, for example
$$\kappa_1(S) = \kappa_1(N)\, \kappa_1(X), \qquad \kappa_2(S) = \kappa_1(N)\, \kappa_2(X) + \kappa_2(N)\, \kappa_1(X)^2,$$
with further terms involving higher cumulants of $N$ and lower cumulants of $X$, derived via Faà di Bruno's formula applied to the composition.[19] Note that the first two cumulants recover the mean and variance: $\kappa_1(S) = \operatorname{E}[S]$ and $\kappa_2(S) = \operatorname{Var}(S)$. In the special case of a compound Poisson distribution, where $N$ follows a Poisson distribution with rate parameter $\lambda$, the variance simplifies to $\operatorname{Var}(S) = \lambda\, \operatorname{E}[X^2]$, since $\operatorname{E}[N] = \operatorname{Var}(N) = \lambda$.[20]
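A minimal Monte Carlo sketch (assumed parameters) checking Wald's identity and the total-variance formula for a compound Poisson sum of gamma-distributed summands:

```python
# Minimal sketch: E[S] = E[N]E[X] and Var(S) = E[N]Var(X) + Var(N)E[X]^2 for a
# compound Poisson sum of gamma summands (illustrative parameters).
import numpy as np

rng = np.random.default_rng(6)
lam = 4.0                       # N ~ Poisson(lam)
shape, scale = 2.0, 1.5         # X_i ~ Gamma(shape, scale)

N = rng.poisson(lam, size=100_000)
S = np.array([rng.gamma(shape, scale, size=n).sum() for n in N])

EX, VX = shape * scale, shape * scale**2
EN, VN = lam, lam
print(S.mean(), EN * EX)                 # Wald's identity
print(S.var(), EN * VX + VN * EX**2)     # law of total variance
print(lam * (VX + EX**2))                # compound Poisson: lam * E[X^2]
```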
Derivations and Proofs
The expected value of a compound random variable $S = \sum_{i=1}^{N} X_i$, where $N$ is a non-negative integer-valued random variable independent of the i.i.d. sequence $X_1, X_2, \ldots$ with common distribution having finite mean $\operatorname{E}[X]$ and $\operatorname{E}[N] < \infty$, is derived using the law of iterated expectations.[21] Conditioning on $N = n$, the conditional expectation is $\operatorname{E}[S \mid N = n] = n\, \operatorname{E}[X]$ for $n \geq 1$, and $\operatorname{E}[S \mid N = 0] = 0$. Thus, $\operatorname{E}[S] = \operatorname{E}\big[\operatorname{E}[S \mid N]\big] = \operatorname{E}[N]\, \operatorname{E}[X]$.[21] This holds under the assumption that $\operatorname{E}[N] < \infty$ and $\operatorname{E}[|X|] < \infty$ to ensure the expectations exist. The variance of $S$ follows from the law of total variance, assuming $\operatorname{E}[N^2] < \infty$ and $\operatorname{E}[X^2] < \infty$.[21] The conditional variance is $\operatorname{Var}(S \mid N = n) = n\, \operatorname{Var}(X)$ for $n \geq 1$, and $\operatorname{Var}(S \mid N = 0) = 0$. The conditional expectation is $\operatorname{E}[S \mid N] = N\, \operatorname{E}[X]$. Therefore,
$$\operatorname{Var}(S) = \operatorname{E}\big[\operatorname{Var}(S \mid N)\big] + \operatorname{Var}\big(\operatorname{E}[S \mid N]\big) = \operatorname{E}[N]\, \operatorname{Var}(X) + \operatorname{Var}(N)\, \operatorname{E}[X]^2.$$
This decomposition requires the second moments to be finite for convergence.[21] The probability generating function (PGF) of $S$, defined as $G_S(z) = \operatorname{E}[z^S]$ for $|z| \leq 1$, is obtained by conditioning on $N$. The conditional PGF is $\operatorname{E}[z^S \mid N = n] = G_X(z)^n$, where $G_X$ is the PGF of $X$. Thus,
$$G_S(z) = \sum_{n=0}^{\infty} P(N = n)\, G_X(z)^n = G_N\big(G_X(z)\big),$$
provided the infinite sum converges, which holds if $G_X(z)$ is defined for $|z| \leq 1$ and $N$ has a proper distribution.[2] The case $n = 0$ contributes $P(N = 0)$ to the sum, corresponding to $S = 0$. For convergence of the series, the support of $X$ must ensure $G_X$ is analytic in the unit disk or the moments are finite as needed.[2] A compound Poisson distribution, where $P(N = n) = e^{-\lambda}\lambda^n / n!$ for $n \geq 0$ and the $X_i$ are i.i.d. with characteristic function $\varphi_X(t)$, has characteristic function $\varphi_S(t) = \exp\!\big(\lambda\,(\varphi_X(t) - 1)\big)$.[23] To show infinite divisibility, note that for any positive integer $n$, $\varphi_S(t) = \big[\exp\!\big(\tfrac{\lambda}{n}(\varphi_X(t) - 1)\big)\big]^n$, where each factor is the characteristic function of a compound Poisson with rate $\lambda/n$ and the same jump distribution, confirming it can be expressed as the $n$-fold convolution of identical distributions.[23] This property assumes the jump distribution is arbitrary but proper, with the Poisson ensuring non-negative integer counts. Edge cases include $\lambda = 0$, reducing to a degenerate distribution at 0, which is trivially infinitely divisible.
Applications
Statistical Modeling
Compound probability distributions are particularly valuable in statistical modeling for addressing overdispersion in count data, where the variance exceeds the mean, a common issue in fields like ecology and epidemiology. For instance, the negative binomial distribution, a classic compound Poisson-gamma distribution, effectively models such overdispersion by incorporating a gamma-distributed mixing parameter that accounts for extra variability beyond the Poisson assumption. In ecological studies, this approach has been applied to species abundance data, where environmental heterogeneity leads to clustered counts that violate Poisson equidispersion, allowing for more accurate inference on population dynamics.[24][25] Hypothesis testing in compound distribution models often involves score tests to compare compound variants against simpler baselines, such as distinguishing a compound Poisson from a standard Poisson in generalized linear regression frameworks. These tests leverage the score statistic under the null hypothesis of no overdispersion, requiring only estimation from the simpler model, which enhances computational efficiency while detecting extra variation due to unobserved factors. Seminal work by Cameron and Trivedi demonstrated the robustness of such tests in overdispersed Poisson regressions, showing they maintain appropriate size and power even with moderate sample sizes.[26][27] Parameter estimation for compound distributions typically employs the method of moments (MoM) or maximum likelihood estimation (MLE), balancing simplicity and efficiency. In MoM, sample moments are equated to theoretical moments (often the mean and variance) to solve for parameters like the mixing distribution's shape, providing closed-form solutions for distributions like the negative binomial. MLE, in contrast, maximizes the log-likelihood function, yielding asymptotically efficient estimators but requiring numerical optimization; for the negative binomial dispersion parameter, it has been shown to be unique and consistent under standard conditions. These methods outperform direct Poisson fitting by incorporating the compound structure, though MoM is preferred for initial estimates due to its robustness to outliers.[28][29] The primary advantage of compound distributions in statistical modeling lies in their ability to capture unobserved heterogeneity, such as varying exposure rates or individual differences not explicitly measured, leading to more realistic variance structures than standard distributions. This is evident in epidemiological applications, where negative binomial models have been used to analyze disease clustering in outbreak data, accounting for superspreading events that cause overdispersion in case counts in outbreaks such as SARS-CoV-2 transmission. Moment-based estimation also connects these models to the higher-order properties discussed above. Overall, they improve model fit and predictive accuracy in heterogeneous datasets, reducing bias in regression coefficients.[30][31]
Bayesian Analysis
Compound probability distributions play a central role in Bayesian inference by representing mixtures where a parameter of one distribution is itself random, drawn from a prior distribution. This setup naturally arises in hierarchical modeling, allowing for the incorporation of uncertainty at multiple levels. A classic example is the compound distribution formed by a Poisson likelihood $x \mid \lambda \sim \operatorname{Poisson}(\lambda)$ with $\lambda$ following a Gamma prior $\lambda \sim \operatorname{Gamma}(\alpha, \beta)$ (shape $\alpha$, rate $\beta$), which results in a marginal distribution for $x$ that is negative binomial.[32] This mixture structure facilitates closed-form posterior updates due to conjugacy, where the posterior for $\lambda$ remains Gamma-distributed: $\lambda \mid x_1, \ldots, x_n \sim \operatorname{Gamma}\!\big(\alpha + \sum_{i=1}^{n} x_i,\; \beta + n\big)$.[33] In hierarchical Bayesian models, compound distributions are particularly useful for modeling random effects, such as varying intercepts or slopes in regression settings where group-specific parameters are drawn from a higher-level distribution. For instance, in Bayesian regression, random effects can be represented as a compound Poisson process with Gamma-mixed rates to account for overdispersion in count data across clusters.[34] The Gamma distribution serves as a conjugate prior for the Poisson likelihood in these setups, ensuring tractable posterior inference when the rate parameter is uncertain.[35] This conjugacy simplifies the integration over hyperparameters, enabling efficient computation of marginal posteriors for model parameters. Inference for compound parameters often relies on Markov chain Monte Carlo methods like Gibbs sampling, which iteratively samples from conditional posteriors in the hierarchical structure, or variational inference techniques that approximate the joint posterior with a factorized distribution to scale to high-dimensional settings.[36] Gibbs sampling proves effective for compound models by augmenting latent variables, such as the mixing rates, to sample from the full conditional distributions.[37] Variational methods, in turn, optimize a lower bound on the evidence to infer approximate posteriors, particularly beneficial for large datasets involving compound hierarchies.[38] The advantages of compound distributions in Bayesian analysis stem from their ability to naturally model uncertainty in rate parameters, providing flexible priors that capture heterogeneity without assuming fixed values. This is especially valuable in fields like pharmacokinetics, where rate parameters for drug absorption or elimination vary across individuals due to physiological differences, allowing hierarchical compounds to propagate uncertainty through the posterior predictive distribution.[39] Such modeling enhances predictive accuracy by integrating prior knowledge with observed data, yielding robust estimates of parameter variability.[40]
Signal Processing and Convolution
In signal processing, compound Poisson processes serve as foundational models for phenomena involving random impulses, such as shot noise in communication systems. Shot noise arises from the discrete nature of charge carriers, modeled as a compound Poisson process where the signal is $s(t) = \sum_{i=1}^{N(t)} X_i$, with $N(t)$ representing the Poisson-distributed number of events up to time $t$ and the $X_i$ the independent jump sizes or impulse responses. This framework captures the stochastic superposition of pulses in optical and electronic communications, where the Poisson arrival of photons or electrons leads to fluctuations that degrade signal quality. Early formulations trace back to analyses of vacuum tube noise, extended to compound forms for non-exponential decays in modern applications like fiber-optic channels.[41][42] The probability density function (PDF) of a compound distribution admits a convolution-based interpretation, reflecting the summation of random variables. Specifically, the PDF of the total $S = \sum_{i=1}^{N} X_i$ is given by $f_S(s) = \sum_{n=0}^{\infty} P(N = n)\, f_X^{*n}(s)$, where $f_X^{*n}$ denotes the $n$-fold convolution of the secondary distribution $f_X$, and the $n = 0$ term contributes a Dirac delta at zero. This structure arises naturally in signal processing for the response to clustered impulses, with iterative convolutions weighted by the counting distribution enabling efficient computation via transforms like Fourier or Laplace for filtering noisy aggregates. Such representations underpin deconvolution techniques to recover underlying signals from observed compound noise.[43] Applications extend to queueing theory, where compound distributions model workload accumulation in M/G/1 queues, with waiting times following a compound geometric form due to the geometric number of preceding service times under Poisson arrivals. In this setting, the steady-state waiting time distribution approximates the convolution of service times weighted by the queue length probabilities, aiding performance analysis for systems like data networks. Similarly, in risk theory, aggregate claims are modeled as compound Poisson sums, $S = \sum_{i=1}^{N} X_i$, where $N$ counts claim occurrences and the $X_i$ are the individual severities, informing ruin probabilities and reserve calculations in insurance portfolios. These models highlight the role of convolutions in predicting overflow or excess in dynamic systems.[44][45] In filtering contexts, compound distributions accommodate non-Gaussian noise in extended Kalman filters, particularly for jump processes like compound Poisson disturbances in state estimation. Modified progressive extended Kalman filters handle compound measurement noises by approximating higher moments, improving robustness in tracking systems with impulsive outliers, such as radar or sensor networks under sporadic interference. This adaptation preserves the recursive structure of the standard Kalman update while accounting for the heavy-tailed nature of compound sums.[46][47] Compound distributions relate to broader Lévy processes, where subordinated or stable variants model anomalous diffusion beyond normal Brownian motion. Stable compound processes, as limits of normalized sums with heavy-tailed jumps, generate Lévy flights exhibiting super-diffusion, with characteristic exponents less than 2 leading to non-local spread in physical systems like turbulent flows or biological transport. These connections enable compound models to approximate infinite-activity Lévy paths for simulating irregular propagation in signal environments.[48][49]
Computational Approaches
Closed-Form Solutions
Closed-form solutions for compound probability distributions exist only in specific cases where the counting distribution of $N$ and the severity distribution of $X$ permit explicit expressions for the probability mass or density functions of the compound sum $S = \sum_{i=1}^{N} X_i$. One prominent example is the compound Poisson-exponential distribution: if $N \sim \operatorname{Poisson}(\lambda)$ and $X_i \sim \operatorname{Exponential}(\theta)$ independently, then conditional on $N = n \geq 1$ the sum follows a $\operatorname{Gamma}(n, \theta)$ (Erlang) distribution, so $S$ is a Poisson-weighted mixture of gamma densities with an atom of probability $e^{-\lambda}$ at zero, and its continuous part admits a closed form involving a modified Bessel function.[50] Another example is the negative binomial distribution, which arises as a compound Poisson distribution with logarithmic series severity distribution.[4] For broader classes of discrete compound distributions, particularly in actuarial contexts, the Panjer recursion provides an efficient analytical method to compute the probability mass function (PMF) recursively without full convolution. For distributions where the counting PMF satisfies $p_n = \left(a + \tfrac{b}{n}\right) p_{n-1}$ for $n \geq 1$ (with constants $a$ and $b$), the compound PMF obeys
$$g(s) = \frac{1}{1 - a f(0)} \sum_{j=1}^{s} \left(a + \frac{b\, j}{s}\right) f(j)\, g(s - j), \qquad s \geq 1,$$
with $g(0) = G_N\big(f(0)\big)$, where $f$ is the severity PMF. This recursion applies to compound Poisson, binomial, and negative binomial cases and enables exact PMF evaluation for integer-valued severities in insurance aggregate loss models.[51] Transform methods offer another avenue for deriving closed forms by inverting generating functions. The probability generating function (PGF) of $S$ is $G_S(z) = G_N\big(G_X(z)\big)$, and inversion via
$$P(S = k) = \frac{1}{k!} \left. \frac{d^k G_S(z)}{dz^k} \right|_{z = 0}$$
yields exact probabilities when the PGF admits a simple series expansion, as in polynomial or rational forms. Similarly, for continuous severities, the characteristic function $\varphi_S(t) = G_N\big(\varphi_X(t)\big)$ can be inverted using the Fourier transform:
$$f_S(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i t s}\, \varphi_S(t)\, dt,$$
providing the density when the integral evaluates to a recognizable form, such as in stable or infinitely divisible cases.[52] These closed-form approaches are limited to scenarios where both $N$ and $X$ possess tractable generating functions, typically when they belong to exponential families that ensure closure under compounding or permit explicit inversions. For instance, exponential family members like Poisson, binomial, gamma, or exponential distributions often yield compound forms with recognizable expressions, but arbitrary combinations generally do not. When exact closed forms are unavailable, approximations such as translated gamma or normal fits are employed, with error bounds quantified via total variation distance or stop-loss differences; for compound Poisson approximations, the total variation error of the compound distribution is bounded by a moment-dependent constant times the approximation error in the severity distribution. Such bounds ensure controlled accuracy in tail probabilities for risk assessment.[2][53]
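A minimal implementation sketch of the recursion for the compound Poisson case (a = 0, b = λ), with an illustrative integer-valued severity PMF; the function name is ad hoc, not from a library.

```python
# Minimal sketch of the Panjer recursion for a compound Poisson sum with an
# integer-valued severity PMF (parameters are illustrative).
import numpy as np

def panjer_poisson(lam, sev_pmf, s_max):
    """Compound Poisson PMF on 0..s_max via Panjer's recursion.

    sev_pmf[j] = P(X = j) for j = 0..len(sev_pmf)-1; for Poisson(lam) the
    (a, b, 0)-class constants are a = 0 and b = lam.
    """
    a, b = 0.0, lam
    f0 = sev_pmf[0]
    g = np.zeros(s_max + 1)
    g[0] = np.exp(lam * (f0 - 1.0))                 # G_N evaluated at f(0)
    for s in range(1, s_max + 1):
        j = np.arange(1, min(s, len(sev_pmf) - 1) + 1)
        g[s] = np.sum((a + b * j / s) * sev_pmf[j] * g[s - j]) / (1.0 - a * f0)
    return g

sev = np.array([0.0, 0.5, 0.3, 0.2])                # severity PMF on {0, 1, 2, 3}
g = panjer_poisson(lam=2.0, sev_pmf=sev, s_max=20)
print(np.round(g[:8], 4), g.sum())                  # PMF values; total near 1
```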
Simulation Techniques
Monte Carlo simulation provides a fundamental approach to generating samples from a compound probability distribution $S = \sum_{i=1}^{N} X_i$, where $N$ follows a counting distribution (such as Poisson) and the $X_i$ are independent and identically distributed severity random variables independent of $N$.[54] To implement this, one first draws a realization $n$ from the distribution of $N$, then generates $n$ independent samples from the severity distribution and computes their sum; this process is repeated for the desired number of simulations.[54] This method is particularly effective for univariate and bivariate compound variables, enabling estimation of quantities like tail probabilities and means through empirical averages of the simulated sums.[54] For scenarios involving rare events, such as extreme tails of the compound distribution, importance sampling enhances efficiency by altering the sampling distribution to increase the likelihood of observing those events, followed by reweighting using likelihood ratios $w(x) = f(x)/g(x)$, where $f$ is the target density and $g$ is the biasing density.[55] In the context of compound distributions like those modeling insurance ruin probabilities, the biasing can involve tilting the severity or frequency parameters (e.g., via a Lundberg exponent) to shift the drift, yielding unbiased estimators with reduced variance compared to standard Monte Carlo.[55] Additional algorithms facilitate simulation when direct methods are inefficient. Acceptance-rejection sampling can be applied to generate samples from the compound density by proposing from an envelope distribution that bounds the target compound PDF, accepting proposals with probability proportional to the ratio of the densities; this is useful for compound forms without closed-form inversion.[56] For computing the probability mass or density function of discrete or discretized compound distributions, fast Fourier transform (FFT)-based convolution offers a rapid alternative to direct summation, with complexity $O(M \log M)$, where $M$ is the grid size, by transforming the severity PMF, applying the frequency distribution's generating function pointwise in the transform domain, and inverting.[57] This approach discretizes continuous severities if needed, applies zero-padding for accuracy, and includes continuity corrections, outperforming recursive methods like Panjer's for fine grids while maintaining high precision.[57] Software implementations streamline these techniques. In R, the actuar package provides the rcompound function to simulate from general compound models by specifying frequency and severity generators, such as rcompound(n, model.freq = rpois(lambda=1.5), model.sev = rgamma(shape=3, rate=2)) for a compound Poisson-gamma distribution; a specialized rcomppois function handles Poisson frequencies directly.[58] In Python, simulations rely on scipy.stats for component distributions, e.g., drawing from scipy.stats.poisson for $N$ and scipy.stats.expon for the severities, then summing as in the Monte Carlo procedure; custom functions can wrap these for compound generation.[59]
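A minimal sketch of the FFT-based computation described above, for a compound Poisson sum with an integer-valued severity PMF (the grid size and parameters are assumed); for these inputs it should agree with a Panjer-type recursion.

```python
# Minimal sketch: compound Poisson PMF by FFT, i.e. applying the Poisson PGF to
# the discrete Fourier transform of the zero-padded severity PMF and inverting.
import numpy as np

lam = 2.0
sev = np.array([0.0, 0.5, 0.3, 0.2])      # severity PMF on {0, 1, 2, 3}
M = 64                                     # assumed grid size (with zero-padding)

f_hat = np.fft.fft(sev, n=M)               # transform of the padded severity PMF
g_hat = np.exp(lam * (f_hat - 1.0))        # Poisson PGF applied pointwise
g = np.fft.ifft(g_hat).real                # compound PMF on 0..M-1
print(np.round(g[:8], 4), g.sum())
```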
Validation of simulations typically involves comparing empirical moments from the samples, such as the sample mean and variance of $S$, to their theoretical counterparts $\operatorname{E}[N]\operatorname{E}[X]$ and $\operatorname{E}[N]\operatorname{Var}(X) + \operatorname{Var}(N)\operatorname{E}[X]^2$, ensuring convergence as the number of simulations increases.[54]
