Cross-entropy method
from Wikipedia

The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective.

The method approximates the optimal importance sampling estimator by repeating two phases:[1]

  1. Draw a sample from a probability distribution.
  2. Minimize the cross-entropy between this distribution and a target distribution to produce a better sample in the next iteration.

Reuven Rubinstein developed the method in the context of rare-event simulation, where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the traveling salesman, quadratic assignment, DNA sequence alignment, max-cut and buffer allocation problems.

Estimation via importance sampling

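Consider the general problem of estimating the quantity

$$\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, d\mathbf{x},$$

where $H$ is some performance function and $f(\mathbf{x}; \mathbf{u})$ is a member of some parametric family of distributions. Using importance sampling this quantity can be estimated as

$$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} H(\mathbf{X}_i)\, \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)},$$

where $\mathbf{X}_1, \dots, \mathbf{X}_N$ is a random sample from $g$. For positive $H$, the theoretically optimal importance sampling density (PDF) is given by

$$g^*(\mathbf{x}) = \frac{H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})}{\ell}.$$

This, however, depends on the unknown $\ell$. The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the Kullback–Leibler sense) to the optimal PDF $g^*$.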
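
The importance sampling estimator can be illustrated with a minimal Python sketch. The setup is an assumption chosen for illustration and does not come from the article: the nominal density is a standard Gaussian, the performance function is the indicator of the tail event X ≥ γ, and the proposal g is a unit-variance Gaussian centred at γ. The function name `importance_sampling_tail` is hypothetical.

```python
import math
import random

def importance_sampling_tail(gamma, n, seed=0):
    """Estimate l = P(X >= gamma) for X ~ N(0, 1) by sampling from g = N(gamma, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(gamma, 1.0)  # draw X_i from the proposal g
        if x >= gamma:             # H(x) = 1 on the event, 0 otherwise
            # Likelihood ratio f(x; u) / g(x) for two unit-variance Gaussians.
            total += math.exp(-gamma * x + 0.5 * gamma * gamma)
    return total / n

estimate = importance_sampling_tail(gamma=4.0, n=100_000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form Gaussian tail probability
print(estimate, exact)
```

Here the proposal was picked by hand; the CE method replaces that manual choice with an adaptive search over a parametric family.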

Generic CE algorithm

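  1. Choose initial parameter vector $\mathbf{v}^{(0)}$; set t = 1.
  2. Generate a random sample $\mathbf{X}_1, \dots, \mathbf{X}_N$ from $f(\cdot; \mathbf{v}^{(t-1)})$.
  3. Solve for $\mathbf{v}^{(t)}$, where $\mathbf{v}^{(t)} = \operatorname{argmax}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^{N} H(\mathbf{X}_i)\, \frac{f(\mathbf{X}_i; \mathbf{u})}{f(\mathbf{X}_i; \mathbf{v}^{(t-1)})}\, \log f(\mathbf{X}_i; \mathbf{v})$.
  4. If convergence is reached then stop; otherwise, increase t by 1 and reiterate from step 2.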
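
The maximization in step 3 can be motivated by a short derivation. The notation is chosen for this sketch: $g^*$ is the optimal importance sampling density, proportional to $H(\mathbf{x}) f(\mathbf{x}; \mathbf{u})$ with normalizing constant $\ell > 0$, and $f(\cdot; \mathbf{v})$ is a member of the parametric family. The first term of the divergence does not depend on $\mathbf{v}$, and the remaining integral is estimated by importance sampling under $f(\cdot; \mathbf{v}^{(t-1)})$.

```latex
\begin{align*}
D_{\mathrm{KL}}\bigl(g^* \,\|\, f(\cdot;\mathbf{v})\bigr)
  &= \int g^*(\mathbf{x}) \log g^*(\mathbf{x})\, d\mathbf{x}
   - \int g^*(\mathbf{x}) \log f(\mathbf{x};\mathbf{v})\, d\mathbf{x}, \\
\operatorname*{argmin}_{\mathbf{v}} D_{\mathrm{KL}}\bigl(g^* \,\|\, f(\cdot;\mathbf{v})\bigr)
  &= \operatorname*{argmax}_{\mathbf{v}} \int H(\mathbf{x})\, f(\mathbf{x};\mathbf{u})
     \log f(\mathbf{x};\mathbf{v})\, d\mathbf{x} \\
  &\approx \operatorname*{argmax}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^{N} H(\mathbf{X}_i)\,
     \frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})}\,
     \log f(\mathbf{X}_i;\mathbf{v}),
     \qquad \mathbf{X}_i \sim f(\cdot;\mathbf{v}^{(t-1)}).
\end{align*}
```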

In several cases, the solution to step 3 can be found analytically. Situations in which this occurs are

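  • When $f$ belongs to the natural exponential family
  • When $f$ is discrete with finite support
  • When $H(\mathbf{X}) = \mathrm{I}_{\{\mathbf{x} \in A\}}$ and $f(\mathbf{X}_i; \mathbf{u}) = f(\mathbf{X}_i; \mathbf{v}^{(t-1)})$, then $\mathbf{v}^{(t)}$ corresponds to the maximum likelihood estimator based on those $\mathbf{X}_k \in A$.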
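
As a hedged illustration of an analytic step 3, the sketch below applies the generic algorithm to an assumed toy problem that is not in the article: estimating P(X ≥ γ) for an exponential random variable with mean u, using the exponential family with mean v. For this family the weighted log-likelihood is maximized in closed form by the weighted sample mean. The function name `ce_exponential_tail` and all parameter values are hypothetical.

```python
import math
import random

def ce_exponential_tail(u=1.0, gamma=5.0, n=20_000, iters=5, seed=1):
    """CE sketch for l = P(X >= gamma), X ~ Exp(mean u), over the family Exp(mean v)."""
    rng = random.Random(seed)
    v = u  # step 1: start from the nominal parameter
    for _ in range(iters):
        num = den = 0.0
        for _ in range(n):
            x = rng.expovariate(1.0 / v)  # step 2: sample from f(.; v_prev)
            if x >= gamma:                # H(x) is the indicator of the event
                # Likelihood ratio f(x; u) / f(x; v_prev) for exponential densities.
                w = (v / u) * math.exp(-x / u + x / v)
                num += w * x
                den += w
        if den > 0.0:
            v = num / den  # step 3: closed-form argmax (weighted sample mean)
    # Final importance sampling estimate under the fitted parameter.
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0 / v)
        if x >= gamma:
            total += (v / u) * math.exp(-x / u + x / v)
    return v, total / n

v_hat, l_hat = ce_exponential_tail()
print(v_hat, l_hat, math.exp(-5.0))  # exact value of l is exp(-gamma / u)
```

In the first pass the weights equal 1, so the update reduces to the plain mean of the samples that hit the event, which matches the maximum likelihood case in the last bullet. The level γ here is only moderately rare so that the first pass already sees some hits; for much rarer events, practical CE schemes raise the level gradually.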

Continuous optimization—example

The same CE algorithm can be used for optimization, rather than estimation. Suppose the problem is to maximize some function $S$, for example, $S(x) = e^{-(x-2)^2} + 0.8\, e^{-(x+2)^2}$. To apply CE, one considers first the associated stochastic problem of estimating $\mathbb{P}_{\boldsymbol{\theta}}(S(X) \geq \gamma)$ for a given level $\gamma$, and parametric family $\{f(\cdot; \boldsymbol{\theta})\}$, for example the 1-dimensional Gaussian distribution, parameterized by its mean $\mu_t$ and variance $\sigma_t^2$ (so $\boldsymbol{\theta} = (\mu, \sigma^2)$ here). Hence, for a given $\gamma$, the goal is to find $\boldsymbol{\theta}$ so that $D_{\mathrm{KL}}(\mathrm{I}_{\{S(x) \geq \gamma\}} \,\|\, f_{\boldsymbol{\theta}})$ is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that parameters that minimize the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance corresponding to the elite samples, which are those samples that have objective function value $\geq \gamma$. The worst of the elite samples is then used as the level parameter for the next iteration. This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm.

Pseudocode

// Initialize parameters
μ := −6
σ2 := 100
t := 0
maxits := 100
N := 100
Ne := 10
// While maxits not exceeded and not converged
// (ε is a small user-chosen tolerance on the variance; its value is not specified here)
while t < maxits and σ2 > ε do
    // Obtain N samples from current sampling distribution
    X := SampleGaussian(μ, σ2, N)
    // Evaluate objective function at sampled points
    S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2)
    // Sort X by objective function values in descending order
    X := sort(X, S)
    // Update parameters of sampling distribution via elite samples                  
    μ := mean(X(1:Ne))
    σ2 := var(X(1:Ne))
    t := t + 1
// Return mean of final sampling distribution as solution
return μ
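
A runnable Python rendering of the pseudocode is sketched below. Several details are assumptions not fixed by the pseudocode: the tolerance `eps = 1e-8`, the use of the population variance for the elite samples (matching the maximum likelihood update), and the fixed random seed. The function names `objective` and `ce_maximize` are hypothetical.

```python
import math
import random
import statistics

def objective(x):
    """S(x) = exp(-(x - 2)^2) + 0.8 exp(-(x + 2)^2), as in the pseudocode."""
    return math.exp(-(x - 2) ** 2) + 0.8 * math.exp(-(x + 2) ** 2)

def ce_maximize(mu=-6.0, sigma2=100.0, n=100, n_elite=10,
                max_iters=100, eps=1e-8, seed=0):
    """Gaussian CE search; eps is an assumed tolerance on the variance."""
    rng = random.Random(seed)
    t = 0
    while t < max_iters and sigma2 > eps:
        # random.gauss expects a standard deviation, so take the square root.
        xs = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
        # Sort samples by objective value, best first.
        xs.sort(key=objective, reverse=True)
        elite = xs[:n_elite]
        # Refit the sampling distribution to the elite samples.
        mu = statistics.fmean(elite)
        sigma2 = statistics.pvariance(elite, mu)
        t += 1
    return mu, t

best_mu, n_iters = ce_maximize()
print(best_mu, n_iters)
```

The objective has its global maximum near x = 2 and a lower local maximum near x = −2. Because the search is randomized, a given run can settle at either peak, although the higher one is favoured once the sampling distribution covers both.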

Journal papers

  • De Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. Annals of Operations Research, 134 (1), 19–67.[1]
  • Rubinstein, R.Y. (1997). Optimization of Computer Simulation Models with Rare Events. European Journal of Operational Research, 99, 89–112.

from Grokipedia
The cross-entropy (CE) method is a versatile Monte Carlo technique for solving complex optimization problems and estimating rare-event probabilities. It operates by iteratively refining a parametric distribution to concentrate sampling effort on high-performing or rare outcomes, minimizing the Kullback-Leibler divergence between the current distribution and an empirical distribution derived from elite samples. Developed by Reuven Y. Rubinstein in the late 1990s, the method originated as an adaptive technique for rare-event estimation in discrete-event simulations, such as reliability analysis in networks, where standard methods are inefficient due to the low probability of target events. It was subsequently extended to optimization tasks by reformulating them as rare-event problems, enabling efficient solutions to both combinatorial (e.g., traveling salesman problem, knapsack) and continuous (e.g., multi-extremal functions) challenges that are often NP-hard.

At its core, the CE algorithm begins with an initial reference distribution, generates a set of random samples, evaluates their performance using an objective function, selects the "elite" subset (typically the top ρ fraction, where 0.01 ≤ ρ ≤ 0.1), and updates the distribution parameters to maximize the likelihood of generating similar elite samples in the next iteration, effectively minimizing the Kullback-Leibler divergence to the elite distribution. This process repeats until convergence, often requiring smoothing (e.g., via convex combinations of old and new parameters) to avoid premature trapping in local optima, and it can be adapted to noisy environments. For rare-event simulation, the method shifts the sampling distribution toward the rare region, dramatically reducing the number of simulations needed, for instance achieving up to 2000-fold efficiency gains in probability estimates.
Key applications span engineering, operations research, and machine learning, including network reliability assessment, portfolio optimization, and training neural networks via policy search in reinforcement learning. The method's adaptability to various parametric families (e.g., Bernoulli for binary problems, Gaussian for continuous) and its theoretical grounding in information theory have made it a foundational tool, with extensions like the minimum cross-entropy variant for multi-objective scenarios and integrations with other heuristics such as genetic algorithms. Despite its strengths in scalability and ease of implementation, performance can depend on parameter tuning, such as elite fraction selection and stopping criteria, to balance exploration and exploitation.
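
The elite-selection loop with a Bernoulli family and smoothed updates can be sketched on a toy combinatorial problem. The problem is an assumption chosen for illustration: recovering a hidden binary vector by maximizing the number of matching positions. The function name `ce_binary_match`, the elite fraction ρ = 0.1, and the smoothing weight α = 0.7 are hypothetical choices, not values prescribed by the text.

```python
import random

def ce_binary_match(target, n=200, rho=0.1, alpha=0.7, iters=50, seed=0):
    """CE sketch: recover a hidden 0/1 vector by maximizing the match count."""
    rng = random.Random(seed)
    d = len(target)
    p = [0.5] * d  # Bernoulli success probability for each position
    n_elite = max(1, int(rho * n))

    def score(x):
        # Objective: number of positions agreeing with the hidden vector.
        return sum(a == b for a, b in zip(x, target))

    for _ in range(iters):
        samples = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(n)]
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]  # top rho fraction of the samples
        for j in range(d):
            freq = sum(x[j] for x in elite) / n_elite  # maximum likelihood update
            p[j] = alpha * freq + (1 - alpha) * p[j]   # convex-combination smoothing
    return [1 if pj > 0.5 else 0 for pj in p]

hidden = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
recovered = ce_binary_match(hidden)
print(recovered)
```

Smoothing keeps every probability strictly between 0 and 1, so no position is frozen after a single unlucky batch of elite samples.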

Fundamentals

Importance Sampling Basics

Rare event simulation refers to the task of estimating small probabilities, such as $P(A)$ where $A$ denotes a rare event with probability $p \ll 1$, using Monte Carlo methods. In direct or crude Monte Carlo estimation, one generates $N$ independent samples under the reference probability measure $P$ and approximates $p$ by the empirical proportion of samples falling into $A$, yielding an estimator with variance $p(1-p)/N \approx p/N$. This results in a high relative error of approximately $1/\sqrt{Np}$
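
The relative-error formula can be turned into a small sample-size calculation. This is a sketch using only the variance expression above; the helper names `crude_mc_relative_error` and `samples_needed` and the example values are hypothetical.

```python
import math

def crude_mc_relative_error(p, n):
    """Relative error sqrt(p (1 - p) / n) / p of the crude Monte Carlo estimator."""
    return math.sqrt(p * (1.0 - p) / n) / p

def samples_needed(p, rel_err):
    """Smallest n with relative error <= rel_err, from n >= (1 - p) / (p * rel_err^2)."""
    return math.ceil((1.0 - p) / (p * rel_err ** 2))

# Example: an event with probability 1e-6 and a 10% relative error target.
n_req = samples_needed(p=1e-6, rel_err=0.1)
print(n_req, crude_mc_relative_error(1e-6, n_req))
```

For this example the required sample size is on the order of 10^8, which is why crude Monte Carlo becomes impractical as the target probability shrinks.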