Simulated annealing
View on WikipediaThis article needs additional citations for verification. (December 2009) |


Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. For large numbers of local optima, SA can find the global optimum.[1] It is often used when the search space is discrete (for example the traveling salesman problem, the boolean satisfiability problem, protein structure prediction, and job-shop scheduling). For problems where a fixed amount of computing resource is available, finding an approximate global optimum may be more relevant than attempting to find a precise local optimum. In such cases, SA may be preferable to exact algorithms such as gradient descent or branch and bound. The problems solved by SA are currently formulated by an objective function of many variables, subject to several mathematical constraints. In practice, a constraint violation can be penalized as part of the objective function.
Similar techniques have been independently introduced on several occasions, including Pincus (1970),[2] Khachaturyan et al. (1979,[3] 1981[4]), Kirkpatrick, Gelatt and Vecchi (1983), and Cerny (1985).[5] In 1983, this approach was used by Kirkpatrick, Gelatt Jr., and Vecchi[6] for a solution of the traveling salesman problem. They also proposed its current name, simulated annealing.[7]
The name of the algorithm comes from annealing in metallurgy, a technique involving heating and controlled cooling of a material to alter its physical properties. This notion of slow cooling implemented in the simulated annealing algorithm is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space is explored. Accepting worse solutions allows for a more extensive search for the global optimal solution. Simulated annealing algorithms work by progressively decreasing the temperature from an initial positive value to zero. At each time step, the algorithm randomly selects a solution close to the current one, measures its quality, and moves to it according to the temperature-dependent probabilities of selecting better or worse solutions.
The simulation can be performed either by a solution of kinetic equations for probability density functions,[8][9] or by using a stochastic sampling method.[6][10] The method is an adaptation of the Metropolis–Hastings algorithm, a Monte Carlo method to generate sample states of a thermodynamic system, published by N. Metropolis et al. in 1953.[11]
Overview
[edit]
The state s of some physical systems, and the function E(s) to be minimized, is analogous to the internal energy of the system in that state. The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.
The basic iteration
[edit]At each step, the simulated annealing heuristic considers some neighboring state s* of the current state s, and probabilistically decides between moving the system to state s* or staying in state s. These probabilities ultimately lead the system to move to states of lower energy. Typically, this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.
The neighbors of a state
[edit]Optimization of a solution involves evaluating the neighbor states, which are new states produced through conservatively altering the current state. For example, in the traveling salesman problem, each state is typically defined as a permutation of the cities to be visited, and the neighbors of any state are the set of permutations produced by swapping any two of these cities. The well-defined way in which the states are altered to produce neighboring states is called a move, and different moves give different sets of neighboring states. These moves usually result in minimal alterations of the current state, in an attempt to progressively improve the solution through iteratively improving its parts (such as the city connections in the traveling salesman problem).
Simple heuristics like hill climbing, which move by finding better neighbor after better neighbor and stop when they have reached a solution which has no neighbors that are better solutions, is not guaranteed to lead to any of the existing better solutions – their outcome may easily be just a local optimum, while the actual best solution would be a global optimum that could be different. Metaheuristics use the neighbors of a solution as a way to explore the solution space, and although they prefer better neighbors, they probabilistically also accept worse neighbors to avoid getting stuck in local optima; they can find the global optimum if given enough time.
Acceptance probabilities
[edit]The probability of making the transition from the current state to a candidate new state is specified by an acceptance probability function , that depends on the energies and of the two states, and on a global time-varying parameter called the temperature. States with a smaller energy are better than those with a greater energy. The probability function must be positive even when is greater than . This feature prevents the method from becoming stuck at a local minimum that is worse than the global one.
When tends to zero, the probability must tend to zero if and to a positive value otherwise. For sufficiently small values of , the system will then increasingly favor moves that go downhill (i.e., to lower energy values), and avoid those that go uphill. With the procedure reduces to the greedy algorithm, which makes only the downhill transitions.
In the original description of simulated annealing, the probability was equal to 1 when —i.e., the procedure always moved downhill when it found a way to do so, irrespective of the temperature. Many descriptions and implementations of simulated annealing still take this condition as part of the method's definition. However, this condition is not essential for the method to work.
The function is usually chosen so that the probability of accepting a move decreases when the difference increases—that is, small uphill moves are more likely than large ones. However, this requirement is not strictly necessary, provided that the above requirements are met.
Given these properties, the temperature plays a crucial role in controlling the evolution of the state of the system with regard to its sensitivity to the variations of system energies. To be precise, for a large , the evolution of is sensitive to coarser energy variations, while it is sensitive to finer energy variations when is small.
The annealing schedule
[edit]The name and inspiration of the algorithm demand controlled temperature variation. This necessitates a gradual reduction of the temperature as the simulation proceeds. The algorithm starts initially with set to a high value, and then it is decreased at each step following some annealing schedule—which may be specified by the user but must end with towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower, and finally move downhill according to the steepest descent heuristic.
For any given finite problem, the probability that the simulated annealing algorithm terminates with a global optimal solution approaches 1 as the annealing schedule is extended.[12] This theoretical result, however, is not particularly helpful, since the time required to ensure a significant probability of success will usually exceed the time required for a complete search of the solution space.[13]
Pseudocode
[edit]The following pseudocode presents the simulated annealing heuristic as described above. It starts from a state s0 and continues until a maximum of kmax steps have been taken. In the process, the call neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random(0, 1) should pick and return a value in the range [0, 1], uniformly at random. The annealing schedule is defined by the call temperature(r), which should yield the temperature to use, given the fraction r of the time budget that has been expended so far.
- Let s = s0
- For k = 0 through kmax (exclusive):
- T ← temperature( 1 - (k+1)/kmax )
- Pick a random neighbour, snew ← neighbour(s)
- If P(E(s), E(snew), T) ≥ random(0, 1):
- s ← snew
- Output: the final state s
Parameter selection
[edit]In order to apply the simulated annealing method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function E(), the candidate generator procedure neighbour(), the acceptance probability function P(), and the annealing schedule temperature() including the initial temperature init_temp. These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices of these parameters that will be good for all problems, and there is no general way to find the best choices for a given problem. The following sections give some general guidelines.
Sufficiently near neighbour
[edit]Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states, and whose edges are the candidate moves. An essential requirement for the neighbour() function is that it must provide a sufficiently short path on this graph from the initial state to any state which may be the global optimum – the diameter of the search graph must be small. In the traveling salesman example above, for instance, the search space for n = 20 cities has n! = 2,432,902,008,176,640,000 (2.4 quintillion) states; yet the number of neighbors of each vertex is edges (coming from ), and the diameter of the graph is .
Transition probabilities
[edit]To investigate the behavior of simulated annealing on a particular problem, it can be useful to consider the transition probabilities that result from the various design choices made in the implementation of the algorithm. For each edge of the search graph, the transition probability is defined as the probability that the simulated annealing algorithm will move to state when its current state is . This probability depends on the current temperature as specified by temperature(), on the order in which the candidate moves are generated by the neighbour() function, and on the acceptance probability function P(). (Note that the transition probability is not simply , because the candidates are tested serially.)
Acceptance probabilities
[edit]The specification of neighbour(), P(), and temperature() is partially redundant. In practice, it's common to use the same acceptance function P() for many problems and adjust the other two functions according to the specific problem.
In the formulation of the method by Kirkpatrick et al., the acceptance probability function was defined as 1 if , and otherwise. This formula was superficially justified by analogy with the transitions of a physical system; it corresponds to the Metropolis–Hastings algorithm, in the case where T=1 and the proposal distribution of Metropolis–Hastings is symmetric. However, this acceptance probability is often used for simulated annealing even when the neighbour() function, which is analogous to the proposal distribution in Metropolis–Hastings, is not symmetric, or not probabilistic at all. As a result, the transition probabilities of the simulated annealing algorithm do not correspond to the transitions of the analogous physical system, and the long-term distribution of states at a constant temperature need not bear any resemblance to the thermodynamic equilibrium distribution over states of that physical system, at any temperature. Nevertheless, most descriptions of simulated annealing assume the original acceptance function, which is probably hard-coded in many implementations of SA.
In 1990, Moscato and Fontanari,[14] and independently Dueck and Scheuer,[15] proposed that a deterministic update (i.e. one that is not based on the probabilistic acceptance rule) could speed-up the optimization process without impacting on the final quality. Moscato and Fontanari conclude from observing the analogous of the "specific heat" curve of the "threshold updating" annealing originating from their study that "the stochasticity of the Metropolis updating in the simulated annealing algorithm does not play a major role in the search of near-optimal minima". Instead, they proposed that "the smoothening of the cost function landscape at high temperature and the gradual definition of the minima during the cooling process are the fundamental ingredients for the success of simulated annealing." The method subsequently popularized under the denomination of "threshold accepting" due to Dueck and Scheuer's denomination. In 2001, Franz, Hoffmann and Salamon showed that the deterministic update strategy is indeed the optimal one within the large class of algorithms that simulate a random walk on the cost/energy landscape.[16]
Efficient candidate generation
[edit]When choosing the candidate generator neighbour(), one must consider that after a few iterations of the simulated annealing algorithm, the current state is expected to have much lower energy than a random state. Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state is likely to be similar to that of the current state. This heuristic (which is the main principle of the Metropolis–Hastings algorithm) tends to exclude very good candidate moves as well as very bad ones; however, the former are usually much less common than the latter, so the heuristic is generally quite effective.
In the traveling salesman problem above, for example, swapping two consecutive cities in a low-energy tour is expected to have a modest effect on its energy (length); whereas swapping two arbitrary cities is far more likely to increase its length than to decrease it. Thus, the consecutive-swap neighbor generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with swaps, instead of ).
A more precise statement of the heuristic is that one should try the first candidate states for which is large. For the "standard" acceptance function above, it means that is on the order of or less. Thus, in the traveling salesman example above, one could use a neighbour() function that swaps two random cities, where the probability of choosing a city-pair vanishes as their distance increases beyond .
Barrier avoidance
[edit]When choosing the candidate generator neighbour() one must also try to reduce the number of "deep" local minima—states (or sets of connected states) that have much lower energy than all its neighboring states. Such "closed catchment basins" of the energy function may trap the simulated annealing algorithm with high probability (roughly proportional to the number of states in the basin) and for a very long time (roughly exponential on the energy difference between the surrounding states and the bottom of the basin).
As a rule, it is impossible to design a candidate generator that will satisfy this goal and also prioritize candidates with similar energy. On the other hand, one can often vastly improve the efficiency of simulated annealing by relatively simple changes to the generator. In the traveling salesman problem, for instance, it is not hard to exhibit two tours , , with nearly equal lengths, such that (1) is optimal, (2) every sequence of city-pair swaps that converts to goes through tours that are much longer than both, and (3) can be transformed into by flipping (reversing the order of) a set of consecutive cities. In this example, and lie in different "deep basins" if the generator performs only random pair-swaps; but they will be in the same basin if the generator performs random segment-flips.
Cooling schedule
[edit]The physical analogy that is used to justify simulated annealing assumes that the cooling rate is low enough for the probability distribution of the current state to be near thermodynamic equilibrium at all times. Unfortunately, the relaxation time—the time one must wait for the equilibrium to be restored after a change in temperature—strongly depends on the "topography" of the energy function and on the current temperature. In the simulated annealing algorithm, the relaxation time also depends on the candidate generator, in a very complicated way. Note that all these parameters are usually provided as black box functions to the simulated annealing algorithm. Therefore, the ideal cooling rate cannot be determined beforehand and should be empirically adjusted for each problem. Adaptive simulated annealing algorithms address this problem by connecting the cooling schedule to the search progress. Other adaptive approaches such as Thermodynamic Simulated Annealing,[17] automatically adjusts the temperature at each step based on the energy difference between the two states, according to the laws of thermodynamics.
Restarts
[edit]Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This process is called restarting of simulated annealing. To do this we set and to and and perhaps restart the annealing schedule. The decision to restart could be based on several criteria. Notable among these include restarting based on a fixed number of steps, based on whether the current energy is too high compared to the best energy obtained so far, restarting randomly, etc.
Related methods
[edit]- Interacting Metropolis–Hasting algorithms (a.k.a. sequential Monte Carlo[18]) combines simulated annealing moves with an acceptance-rejection of the best-fitted individuals equipped with an interacting recycling mechanism.
- Quantum annealing uses "quantum fluctuations" instead of thermal fluctuations to get through high but thin barriers in the target function.
- Stochastic tunneling attempts to overcome the increasing difficulty simulated annealing runs have in escaping from local minima as the temperature decreases, by 'tunneling' through barriers.
- Tabu search normally moves to neighbouring states of lower energy, but will take uphill moves when it finds itself stuck in a local minimum; and avoids cycles by keeping a "taboo list" of solutions already seen.
- Dual-phase evolution is a family of algorithms and processes (to which simulated annealing belongs) that mediate between local and global search by exploiting phase changes in the search space.
- Reactive search optimization focuses on combining machine learning with optimization, by adding an internal feedback loop to self-tune the free parameters of an algorithm to the characteristics of the problem, of the instance, and of the local situation around the current solution.
- Genetic algorithms maintain a pool of solutions rather than just one. New candidate solutions are generated not only by "mutation" (as in SA), but also by "recombination" of two solutions from the pool. Probabilistic criteria, similar to those used in SA, are used to select the candidates for mutation or combination, and for discarding excess solutions from the pool.
- Memetic algorithms search for solutions by employing a set of agents that both cooperate and compete in the process; sometimes the agents' strategies involve simulated annealing procedures for obtaining high-quality solutions before recombining them.[19] Annealing has also been suggested as a mechanism for increasing the diversity of the search.[20]
- Graduated optimization digressively "smooths" the target function while optimizing.
- Ant colony optimization (ACO) uses many ants (or agents) to traverse the solution space and find locally productive areas.
- The cross-entropy method (CE) generates candidate solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization, so as to generate better samples in the next iteration.
- Harmony search mimics musicians in improvisation where each musician plays a note to find the best harmony together.
- Stochastic optimization is an umbrella set of methods that includes simulated annealing and numerous other approaches.
- Particle swarm optimization is an algorithm modeled on swarm intelligence that finds a solution to an optimization problem in a search space, or models and predicts social behavior in the presence of objectives.
- The runner-root algorithm (RRA) is a meta-heuristic optimization algorithm for solving unimodal and multimodal problems inspired by the runners and roots of plants in nature.
- Intelligent water drops algorithm (IWD) which mimics the behavior of natural water drops to solve optimization problems
- Parallel tempering is a simulation of model copies at different temperatures (or Hamiltonians) to overcome the potential barriers.
- Multi-objective simulated annealing algorithms have been used in multi-objective optimization.[21]
See also
[edit]- Adaptive simulated annealing
- Automatic label placement
- Combinatorial optimization
- Dual-phase evolution
- Graph cuts in computer vision
- Intelligent water drops algorithm
- Markov chain
- Molecular dynamics
- Multidisciplinary optimization
- Particle swarm optimization
- Place and route
- Quantum annealing
- Traveling salesman problem
References
[edit]- ^ "What is Simulated Annealing?". www.cs.cmu.edu. Retrieved 2023-05-13.
- ^ Pincus, Martin (Nov–Dec 1970). "A Monte-Carlo Method for the Approximate Solution of Certain Types of Constrained Optimization Problems". Journal of the Operations Research Society of America. 18 (6): 967–1235. doi:10.1287/opre.18.6.1225.
- ^ Khachaturyan, A.: Semenovskaya, S.: Vainshtein B., Armen (1979). "Statistical-Thermodynamic Approach to Determination of Structure Amplitude Phases". Soviet Physics Crystallography. 24 (5): 519–524.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Khachaturyan, A.; Semenovskaya, S.; Vainshtein, B. (1981). "The Thermodynamic Approach to the Structure Analysis of Crystals". Acta Crystallographica. A37 (5): 742–754. Bibcode:1981AcCrA..37..742K. doi:10.1107/S0567739481001630.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Laarhoven, P. J. M. van (Peter J. M.) (1987). Simulated annealing: theory and applications. Aarts, E. H. L. (Emile H. L.). Dordrecht: D. Reidel. ISBN 90-277-2513-6. OCLC 15548651.
- ^ a b Kirkpatrick, S.; Gelatt Jr, C. D.; Vecchi, M. P. (1983). "Optimization by Simulated Annealing". Science. 220 (4598): 671–680. Bibcode:1983Sci...220..671K. CiteSeerX 10.1.1.123.7607. doi:10.1126/science.220.4598.671. JSTOR 1690046. PMID 17813860. S2CID 205939.
- ^ Kirkpatrick, S. (1984). "Optimization by simulated annealing: Quantitative studies." Journal of Statistical Physics, 34(5-6), 975-986.
- ^ Khachaturyan, A.; Semenovskaya, S.; Vainshtein, B. (1979). "Statistical-Thermodynamic Approach to Determination of Structure Amplitude Phases". Sov.Phys. Crystallography. 24 (5): 519–524.
- ^ Khachaturyan, A.; Semenovskaya, S.; Vainshtein, B. (1981). "The Thermodynamic Approach to the Structure Analysis of Crystals". Acta Crystallographica. 37 (A37): 742–754. Bibcode:1981AcCrA..37..742K. doi:10.1107/S0567739481001630.
- ^ Černý, V. (1985). "Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm". Journal of Optimization Theory and Applications. 45: 41–51. doi:10.1007/BF00940812. S2CID 122729427.
- ^ Metropolis, Nicholas; Rosenbluth, Arianna W.; Rosenbluth, Marshall N.; Teller, Augusta H.; Teller, Edward (1953). "Equation of State Calculations by Fast Computing Machines". The Journal of Chemical Physics. 21 (6): 1087. Bibcode:1953JChPh..21.1087M. doi:10.1063/1.1699114. OSTI 4390578. S2CID 1046577.
- ^ Granville, V.; Krivanek, M.; Rasson, J.-P. (1994). "Simulated annealing: A proof of convergence". IEEE Transactions on Pattern Analysis and Machine Intelligence. 16 (6): 652–656. Bibcode:1994ITPAM..16..652G. doi:10.1109/34.295910.
- ^ Nolte, Andreas; Schrader, Rainer (1997), "A Note on the Finite Time Behaviour of Simulated Annealing", Operations Research Proceedings 1996, vol. 1996, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 175–180, doi:10.1007/978-3-642-60744-8_32, ISBN 978-3-540-62630-5, retrieved 2023-02-06
{{citation}}: CS1 maint: work parameter with ISBN (link) - ^ Moscato, P.; Fontanari, J.F. (1990), "Stochastic versus deterministic update in simulated annealing", Physics Letters A, 146 (4): 204–208, Bibcode:1990PhLA..146..204M, doi:10.1016/0375-9601(90)90166-L
- ^ Dueck, G.; Scheuer, T. (1990), "Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealing", Journal of Computational Physics, 90 (1): 161–175, Bibcode:1990JCoPh..90..161D, doi:10.1016/0021-9991(90)90201-B, ISSN 0021-9991
- ^ Franz, A.; Hoffmann, K.H.; Salamon, P (2001), "Best optimal strategy for finding ground states", Physical Review Letters, 86 (3): 5219–5222, doi:10.1103/PhysRevLett.86.5219, PMID 11384462
- ^ De Vicente, Juan; Lanchares, Juan; Hermida, Román (2003). "Placement by thermodynamic simulated annealing". Physics Letters A. 317 (5–6): 415–423. Bibcode:2003PhLA..317..415D. doi:10.1016/j.physleta.2003.08.070.
- ^ Del Moral, Pierre; Doucet, Arnaud; Jasra, Ajay (2006). "Sequential Monte Carlo samplers". Journal of the Royal Statistical Society, Series B. 68 (3): 411–436. arXiv:cond-mat/0212648. doi:10.1111/j.1467-9868.2006.00553.x. S2CID 12074789.
- ^ Moscato, Pablo (June 1993). "An introduction to population approaches for optimization and hierarchical objective functions: A discussion on the role of tabu search". Annals of Operations Research. 41 (2): 85–121. doi:10.1007/BF02022564. S2CID 35382644.
- ^ Moscato, P. (1989). "On Evolution, Search, Optimization, Genetic Algorithms and Martial Arts: Towards Memetic Algorithms". Caltech Concurrent Computation Program (report 826).
- ^ Deb, Bandyopadhyay (June 2008). "A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA". IEEE Transactions on Evolutionary Computation. 12 (3): 269–283. Bibcode:2008ITEC...12..269B. doi:10.1109/TEVC.2007.900837. S2CID 12107321.
Further reading
[edit]- A. Das and B. K. Chakrabarti (Eds.), Quantum Annealing and Related Optimization Methods, Lecture Note in Physics, Vol. 679, Springer, Heidelberg (2005)
- Weinberger, E. (1990). "Correlated and uncorrelated fitness landscapes and how to tell the difference". Biological Cybernetics. 63 (5): 325–336. doi:10.1007/BF00202749. S2CID 851736.
- Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007). "Section 10.12. Simulated Annealing Methods". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8. Archived from the original on 2011-08-11. Retrieved 2011-08-13.
- Strobl, M.A.R.; Barker, D. (2016). "On simulated annealing phase transitions in phylogeny reconstruction". Molecular Phylogenetics and Evolution. 101: 46–55. Bibcode:2016MolPE.101...46S. doi:10.1016/j.ympev.2016.05.001. PMC 4912009. PMID 27150349.
- V.Vassilev, A.Prahova: "The Use of Simulated Annealing in the Control of Flexible Manufacturing Systems", International Journal Information Theories & Applications, VOLUME 6/1999
- D. Thiel, "Simulated Annealing: From Statistical Thermodynamics to Combinatory Problems Solving", Encyclopedia of Life Support Systems UNESCO – EOLSS, Chapter Systems Science and Cybernetics – Vol. III [1]
External links
[edit]- Simulated Annealing A JavaScript app that allows you to experiment with simulated annealing. Source code included.
- "General Simulated Annealing Algorithm" Archived 2008-09-23 at the Wayback Machine An open-source MATLAB program for general simulated annealing exercises.
- Self-Guided Lesson on Simulated Annealing A Wikiversity project.
- Google in superposition of using, not using quantum computer Ars Technica discusses the possibility that the D-Wave computer being used by Google may, in fact, be an efficient simulated annealing co-processor.
- [2] A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA.
Simulated annealing
View on GrokipediaIntroduction
Overview
Simulated annealing is a probabilistic metaheuristic algorithm designed for global optimization problems in vast, complex search spaces, where traditional local search methods often get trapped in suboptimal solutions.[2] It approximates the global minimum of an objective function by mimicking the physical annealing process in metallurgy, where controlled cooling allows a material to reach a low-energy state.[2] The general workflow starts with an initial random state in the solution space. Iteratively, the algorithm generates a neighboring state through a small random perturbation of the current state. Better neighbors (those with lower energy, or improved objective value) are always accepted, while worse ones are accepted probabilistically, with the acceptance likelihood controlled by a decreasing "temperature" parameter that simulates gradual cooling. This mechanism introduces controlled randomness to explore broadly at high temperatures, escaping local optima, and narrows focus at low temperatures to refine toward convergence on a high-quality global solution.[2] A classic example is the traveling salesman problem (TSP), which seeks the shortest possible route visiting each of a set of cities exactly once and returning to the origin. Here, a state is represented as a permutation of the cities outlining the tour sequence, and the energy function measures the total tour length based on inter-city distances. Simulated annealing effectively navigates the enormous space of possible tours to yield near-optimal paths, even for instances with hundreds of cities.[2]Physical Inspiration
In the physical process of annealing in metallurgy, a solid material such as a metal is heated to a high temperature, typically above its recrystallization point, which increases atomic mobility and allows atoms to move freely from their positions in the crystal lattice. This elevated temperature provides sufficient thermal energy for the system to overcome energy barriers, enabling the exploration of a wide range of atomic configurations and the reduction of defects like dislocations, vacancies, and grain boundaries that were introduced during prior processing, such as cold working.[7] As the material is then slowly cooled under controlled conditions—often in a furnace at rates of 20–25 K/h—the atoms gradually settle into more stable positions, forming a highly ordered crystal structure with minimized internal stresses and defects. This slow cooling is crucial because it prevents the system from becoming trapped in metastable, higher-energy states; instead, it promotes thermodynamic equilibrium, leading to a low-energy configuration that represents the global minimum in the material's free energy landscape. The process thus transforms the material into a softer, more ductile state suitable for further fabrication.[7] This metallurgical phenomenon inspires the simulated annealing algorithm by providing an analogy for navigating complex optimization problems. In the computational domain, the physical "temperature" is mapped to a control parameter T that governs the randomness of transitions between solution states, allowing the algorithm to explore diverse regions of the search space at high T, much like atomic diffusion at elevated temperatures. The "energy" E(s) of a state s corresponds to the value of the objective function to be minimized, while states themselves represent candidate solutions or configurations in the problem's state space. During cooling, decreasing T reduces the acceptance of suboptimal moves, guiding the system toward a global optimum analogous to the defect-free crystal structure.[2] A key physical principle underlying this analogy is the Boltzmann distribution from statistical mechanics, which describes the probability of the system occupying a particular state with energy E at temperature T in thermal equilibrium:History
Origins in Physics
The foundations of simulated annealing trace back to the 1953 work by Nicholas Metropolis and colleagues, who developed a Monte Carlo method for simulating the behavior of physical systems in equilibrium.[8] This approach, known as the Metropolis algorithm, generates configurations of a system by proposing random changes and accepting or rejecting them based on an acceptance probability that ensures the sampled states follow the desired distribution.[8] The method was applied to compute properties like equations of state for interacting molecules, demonstrating its utility in handling complex, high-dimensional configuration spaces through stochastic sampling.[8] Central to this technique is the principle from statistical mechanics that, at thermal equilibrium, the probability of a system occupying a state with energy is proportional to , where is the Boltzmann constant and is the temperature.[8] This Boltzmann distribution governs the likelihood of different configurations, allowing simulations to model how systems explore energy landscapes and settle into low-energy states as temperature decreases.[8] By enforcing this distribution via the acceptance criterion—accepting moves that lower energy with probability 1 and those that increase it with probability —the algorithm mimics the natural thermalization process in physical systems.[8] In the 1970s, extensions of these Monte Carlo methods were applied to simulate frustrated magnetic systems, such as the Ising model with random interactions and early spin glass models. Researchers like Kurt Binder used these simulations to investigate the Ising model on lattices where nearest-neighbor couplings were randomly ferromagnetic or antiferromagnetic, revealing complex phase behaviors and the presence of multiple metastable states separated by high energy barriers. For spin glasses, introduced by Edwards and Anderson in 1975, Monte Carlo studies highlighted how random disorder leads to rugged energy landscapes, where cooling the system gradually helps overcome barriers to access lower-energy configurations and approximate ground states.[9] This physics-based approach revealed that controlled cooling in Monte Carlo simulations could effectively minimize energy in disordered systems, paving the way for its adaptation as a general optimization strategy by recognizing the analogy between thermal equilibrium sampling and searching for function minima.[9]Computational Development
The transition of annealing concepts from physical simulations to computational optimization began in earnest in 1983, when Scott Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi at IBM introduced simulated annealing as a probabilistic method for solving combinatorial optimization problems. In their seminal work, they applied the technique to very-large-scale integration (VLSI) circuit design—specifically, for placement and routing of components to minimize wire length—and the traveling salesman problem (TSP), demonstrating its ability to escape local minima by mimicking the cooling process in metallurgy. This paper coined the term "simulated annealing" and established the core framework, including the Metropolis acceptance criterion adapted for discrete state spaces.[2] Independently, V. Černý developed a thermodynamically inspired algorithm around the same time, published in 1985, which applied a similar Monte Carlo simulation to the TSP and quadratic assignment problems, emphasizing efficient cooling schedules for global optimization. These early computational adaptations marked a shift from purely physical modeling to practical algorithmic tools, with initial implementations focusing on NP-hard problems in computer-aided design. Throughout the 1980s and 1990s, simulated annealing gained traction in operations research, VLSI design, and early artificial intelligence, integrating with heuristic methods for problems like graph partitioning and scheduling. Key contributions included theoretical analyses of convergence and practical extensions for parallel computing. A foundational text, Simulated Annealing and Boltzmann Machines by Emile Aarts and Jan Korst (1989), formalized the stochastic approach, bridging combinatorial optimization and neural computing while providing guidelines for parameter tuning. By the mid-1990s, surveys had reviewed numerous applications in operations research, underscoring its robustness across domains.[10] Post-2000 developments emphasized adaptive mechanisms to enhance efficiency, such as the Adaptive Simulated Annealing (ASA) algorithm introduced in 2000, which dynamically adjusts temperature and neighborhood sizes based on problem dimensionality for faster convergence in continuous and discrete spaces.[11] Software integration accelerated adoption, with libraries like SciPy incorporating simulated annealing variants in the 2010s, enabling accessible implementations for scientific computing and machine learning tasks.[12] In the 2020s, research has increasingly explored hybrid quantum-classical annealing, combining classical heuristics with quantum annealers like D-Wave systems to tackle large-scale optimization, as demonstrated in applications to scheduling and portfolio management.[13]Core Algorithm
State Space and Energy Function
In simulated annealing, the optimization problem is formulated over a state space $ S $, which encompasses all feasible configurations or solutions to the problem at hand. This space can be discrete, such as the set of all permutations of cities in the traveling salesman problem (TSP), where each state represents a possible tour order, or continuous, as in parameter tuning for neural networks where states are vectors of real-valued weights. Alternatively, for the 0-1 knapsack problem, $ S $ consists of binary strings of length $ n $, each indicating whether an item is included in the knapsack or not, subject to capacity constraints. The structure of $ S $ depends on the problem's nature, often forming a combinatorial space with exponentially many states that renders exhaustive search impractical. The energy function $ E: S \to \mathbb{R} $ assigns a scalar value to each state $ s \in S $, quantifying the "cost" or undesirability of that configuration, with the objective being to minimize $ E(s) $ to reach the global optimum. In the TSP example, $ E(s) $ is defined as the total distance of the tour corresponding to permutation $ s $. For the knapsack, $ E(s) $ typically measures the negative total value of selected items if the weight constraint is satisfied, or a high penalty otherwise to enforce feasibility. Desirable properties of $ E $ include additivity, where the energy decomposes into sums over independent components (e.g., pairwise interactions in graph partitioning), or modularity, allowing evaluation based on modular substructures, which facilitates efficient computation in large spaces. These properties, inspired by physical systems, enable the analogy to thermodynamic energy minimization. The initial state $ s_0 \in S $ is selected to start the annealing process, often randomly to ensure broad exploration or via a heuristic method for quicker convergence toward promising regions. In applications like VLSI circuit placement, random initialization avoids bias toward suboptimal layouts, while heuristics such as greedy algorithms provide a strong starting point in problems like TSP. This choice influences the trajectory but is designed to be robust under the annealing dynamics. The search landscape refers to the structure induced by $ E $ over $ S $, visualized as a multidimensional surface with peaks and valleys corresponding to high- and low-energy states, respectively. Rugged terrains feature numerous local minima—states where small changes increase energy—trapping greedy algorithms, alongside a global optimum representing the lowest energy configuration. Simulated annealing navigates these landscapes by probabilistically escaping local minima, mimicking thermal fluctuations in physical annealing to probabilistically reach the global minimum despite barriers. This capability is particularly valuable in NP-hard problems with deceptive landscapes, such as combinatorial optimization.Neighbor Generation
In simulated annealing, the neighborhood structure for a current state consists of the set of all states reachable through small, local perturbations that maintain the problem's constraints while introducing minimal changes to explore nearby solutions efficiently.[2] This structure ensures that generated candidates remain in the feasible state space, facilitating gradual navigation toward lower-energy configurations without requiring exhaustive search.[14] Candidate states, or neighbors , are typically generated by selecting one element uniformly at random from , which promotes unbiased exploration of the local landscape at each iteration.[2] This uniform selection mechanism, rooted in the Metropolis algorithm, allows the process to mimic thermal fluctuations in physical annealing by probabilistically sampling adjacent states.[14] Specific generation methods vary by problem domain to balance computational efficiency and solution quality. For the traveling salesman problem (TSP), a common approach involves swapping the positions of two cities in the tour sequence, creating a new permutation that alters the path length modestly.[14] In binary optimization problems, such as the knapsack or satisfiability problems, neighbors are produced by flipping a single bit in the binary representation of the state, which corresponds to toggling one decision variable.[15] The transition probability from the current state to a generated neighbor is often set to a uniform value of when , ensuring equal likelihood for all local moves.[2] However, biased probabilities can be employed to favor certain directions, such as those leading to promising regions, thereby improving convergence speed in large-scale applications without violating the algorithm's foundational principles.[16] To guarantee thorough exploration of the state space, the neighborhood structure must induce an ergodic Markov chain, meaning that from any state, it is possible to reach any other state through a sequence of allowed transitions, preventing the algorithm from becoming trapped in disconnected components.[17] This connectivity requirement is essential for the theoretical convergence properties of simulated annealing, as demonstrated in analyses of nonstationary Markov processes underlying the method.[17]Acceptance Mechanism
In simulated annealing, the acceptance mechanism decides whether to transition from the current state $ s $ to a proposed neighbor state $ s' $ based on their respective energy values $ E(s) $ and $ E(s') $. The energy difference is defined as $ \Delta E = E(s') - E(s) $.[2] If $ \Delta E \leq 0 $, indicating an improvement or equal energy, the new state is accepted with probability 1. For $ \Delta E > 0 $, an uphill move, acceptance occurs probabilistically with probability $ p = e^{-\Delta E / T} $, where $ T $ is the current temperature parameter. This can be compactly expressed as the acceptance probability:Temperature Schedule
The initial temperature $ T_0 $ in simulated annealing is typically set sufficiently high to ensure that a large proportion of proposed moves are accepted, often aiming for an initial acceptance rate of around 60-80% for uphill moves based on the average energy change $ \Delta E $.[18] This allows broad exploration of the state space early in the process, mimicking the high-temperature phase of physical annealing where the system can easily escape local minima. The most common cooling rule is the geometric schedule, where the temperature at iteration $ k+1 $ is updated as $ T_{k+1} = \alpha T_k $, with $ 0 < \alpha < 1 $ (typically $ \alpha $ between 0.8 and 0.99 to balance speed and thoroughness).[2] This results in an exponential decay expressed as $ T(t) = T_0 \cdot \alpha^t $, where $ t $ denotes the iteration or time step, enabling gradual reduction in randomness over time. Alternative schedules include linear cooling, $ T(t) = T_0 - \beta t $ for some positive $ \beta $, which decreases temperature at a constant rate but may converge faster in practice for certain problems, and adaptive methods that adjust $ \alpha $ dynamically based on recent acceptance rates to maintain equilibrium.[19][20] The annealing process terminates using criteria such as the temperature dropping below a minimum threshold $ T_{\min} $ (often near zero or a small positive value) or after a fixed number of iterations.[1] Theoretically, for asymptotic convergence to the global optimum in probability, the cooling schedule must decrease slowly enough, such as the logarithmic form $ T(t) \sim \frac{c}{\log(t+1)} $ where $ c $ exceeds the maximum depth of non-global local minima, ensuring the Markov chain has sufficient time to explore optimal states.[21]Implementation
Pseudocode
The basic simulated annealing algorithm can be expressed in pseudocode as a straightforward iterative process that requires the user to specify the energy evaluation function , the neighbor generation mechanism , and key parameters such as the initial temperature , the minimum temperature , and the cooling rate (where )[2]. This template assumes a minimization problem and focuses on the core loop without advanced features like parallelization or adaptive adjustments.// Initialize the current state and its energy
s ← s_0 // Initial state (random or heuristic)
E ← E(s) // Compute initial energy using the provided [energy](/page/Energy) function
T ← T_0 // Set initial temperature
// Set best solution tracking (optional, for recording global minimum)
s_best ← s
E_best ← E
// Main annealing loop: continue until temperature is sufficiently low
while T > T_min do
// Generate a candidate neighbor state
s_new ← N(s) // Sample a neighbor from the neighborhood of current state using N(s)
// Evaluate the energy of the new state
E_new ← E(s_new)
// Compute the energy difference
ΔE ← E_new - E
// Acceptance decision using the Metropolis criterion
if ΔE < 0 or random() < exp(-ΔE / T) then // random() generates uniform [0,1)
s ← s_new // Accept the new state
E ← E_new // Update current energy
if E < E_best then // Update best if improved
s_best ← s
E_best ← E
end if
// Cool the temperature according to the schedule (geometric cooling here)
T ← α * T // Reduce temperature multiplicatively
end while
// Return the best state found
return s_best
This pseudocode represents the fundamental serial implementation of simulated annealing, where each iteration explores a single neighbor and updates sequentially, as originally conceptualized in the seminal work on the method[2]. The acceptance mechanism briefly references the standard probability for uphill moves, ensuring probabilistic escape from local minima at higher temperatures.