Parallel tempering
Parallel tempering, in physics and statistics, is a computer simulation method typically used to find the lowest energy state of a system of many interacting particles. It addresses the problem that the stable state at high temperatures may differ from the one at low temperatures, while simulations at low temperatures may become "stuck" in a metastable state. It does this by exploiting the fact that the high-temperature simulation may visit states typical of both the stable and metastable low-temperature states.
More specifically, parallel tempering (also known as replica exchange MCMC sampling), is a simulation method aimed at improving the dynamic properties of Monte Carlo method simulations of physical systems, and of Markov chain Monte Carlo (MCMC) sampling methods more generally. The replica exchange method was originally devised by Robert Swendsen and J. S. Wang,[1] then extended by Charles J. Geyer,[2] and later developed further by Giorgio Parisi,[3] Koji Hukushima and Koji Nemoto,[4] and others.[5][6] Y. Sugita and Y. Okamoto also formulated a molecular dynamics version of parallel tempering; this is usually known as replica-exchange molecular dynamics or REMD.[7]
Essentially, one runs N copies of the system, randomly initialized, at different temperatures. Then, based on the Metropolis criterion, one exchanges configurations at different temperatures. The idea of this method is to make configurations at high temperatures available to the simulations at low temperatures and vice versa. This results in a very robust ensemble which is able to sample both low and high energy configurations. In this way, thermodynamic properties such as the specific heat, which is in general not well computed in the canonical ensemble, can be computed with great precision.
Background
Typically a Monte Carlo simulation using a Metropolis–Hastings update consists of a single stochastic process that evaluates the energy of the system and accepts/rejects updates based on the temperature T. At high temperatures, updates that change the energy of the system are comparatively more probable. When the system is highly correlated, most updates are rejected, and the simulation is said to suffer from critical slowing down.
If we were to run two simulations at temperatures separated by ΔT, we would find that if ΔT is small enough, then the energy histograms obtained by collecting the energies over N Monte Carlo steps form two distributions that somewhat overlap. The overlap can be defined by the area of the histograms that falls over the same interval of energy values, normalized by the total number of samples. For ΔT = 0 the overlap should approach 1.
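As a rough numerical illustration of this overlap measure, the sketch below estimates it as the shared probability mass of two binned energy histograms. The function name `histogram_overlap` and the Gaussian toy energies are illustrative choices, not taken from the sources above:

```python
import numpy as np

def histogram_overlap(energies1, energies2, bins=50):
    """Fraction of probability mass shared by two energy histograms."""
    lo = min(energies1.min(), energies2.min())
    hi = max(energies1.max(), energies2.max())
    h1, _ = np.histogram(energies1, bins=bins, range=(lo, hi))
    h2, _ = np.histogram(energies2, bins=bins, range=(lo, hi))
    p1 = h1 / h1.sum()
    p2 = h2 / h2.sum()
    # 1.0 for identical distributions, 0.0 for disjoint ones
    return np.minimum(p1, p2).sum()

# Toy check: Gaussian energy distributions at two nearby "temperatures"
rng = np.random.default_rng(0)
e_cold = rng.normal(10.0, 1.0, 10000)
e_hot = rng.normal(11.0, 1.2, 10000)
overlap = histogram_overlap(e_cold, e_hot)  # substantial for small ΔT
```

Shrinking the gap between the two means plays the role of shrinking ΔT: the overlap climbs toward 1 as the distributions coincide.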
Another way to interpret this overlap is to say that system configurations sampled at temperature T1 are likely to appear during a simulation at T2. Because the Markov chain should have no memory of its past, we can create a new update for the system composed of the two systems at T1 and T2. At a given Monte Carlo step we can update the global system by swapping the configuration of the two systems, or alternatively trading the two temperatures. The update is accepted according to the Metropolis–Hastings criterion with probability

p = min(1, exp[(β1 − β2)(E1 − E2)]),

where β = 1/(kBT) is the inverse temperature and E1, E2 are the current energies of the two systems; otherwise the update is rejected. Detailed balance is satisfied by ensuring that the reverse update is equally likely, all else being equal. This can be ensured by appropriately choosing regular Monte Carlo updates or parallel tempering updates with probabilities that are independent of the configurations of the two systems and of the Monte Carlo step.[8]
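The acceptance rule can be written as a one-line function; this is a minimal sketch, with the function name and example numbers chosen for illustration:

```python
import math

def swap_probability(beta1, beta2, E1, E2):
    """Metropolis acceptance probability for exchanging configurations
    between systems at inverse temperatures beta1, beta2 with energies E1, E2."""
    return min(1.0, math.exp((beta1 - beta2) * (E1 - E2)))

# A cold replica (beta1 = 1.0) stuck at high energy swaps freely with a
# hot replica (beta2 = 0.5) holding a lower-energy configuration:
p_good = swap_probability(1.0, 0.5, E1=5.0, E2=3.0)  # exponent (0.5)(2.0) > 0, so p = 1
# The reverse trade, pushing the high energy down the ladder, is rare but possible:
p_rare = swap_probability(1.0, 0.5, E1=3.0, E2=5.0)  # exp(-1)
```

Accepting the unfavorable direction with exactly the Boltzmann-weighted probability is what keeps the joint ensemble unbiased.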
This update can be generalized to more than two systems.
By a careful choice of temperatures and number of systems one can achieve an improvement in the mixing properties of a set of Monte Carlo simulations that exceeds the extra computational cost of running parallel simulations.
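With N systems, one common schedule alternates swap proposals between disjoint "even" and "odd" adjacent pairs, so every attempt in a half-sweep is independent and can run in parallel. A minimal sketch (the function name is illustrative):

```python
def swap_pairs(n_replicas, sweep):
    """Disjoint adjacent pairs for one half-sweep of an even-odd schedule."""
    start = sweep % 2  # even sweeps pair (0,1),(2,3),...; odd sweeps pair (1,2),(3,4),...
    return [(i, i + 1) for i in range(start, n_replicas - 1, 2)]

pairs_even = swap_pairs(6, sweep=0)  # [(0, 1), (2, 3), (4, 5)]
pairs_odd = swap_pairs(6, sweep=1)   # [(1, 2), (3, 4)]
```

Because the two half-sweeps together touch every adjacent pair, configurations can still diffuse across the whole temperature range.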
A further consideration: increasing the number of temperatures can be detrimental, since the 'lateral' movement of a given system across temperatures behaves as a diffusion process. The setup matters: there must be practical histogram overlap between neighboring temperatures to achieve a reasonable probability of lateral moves.
The parallel tempering method can be used as a form of "super" simulated annealing that does not need to be restarted, since a system at high temperature can feed new local optimizers to a system at low temperature, allowing tunneling between metastable states and improving convergence to a global optimum.
References
- ^ Swendsen, R. H.; Wang, J. S. (1986). "Replica Monte Carlo Simulation of Spin-Glasses". Physical Review Letters. 57: 2607–2609.
- ^ Geyer, C. J. (1991). In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. American Statistical Association, New York. p. 156.
- ^ Marinari, E; Parisi, G (1992-07-15). "Simulated Tempering: A New Monte Carlo Scheme". Europhysics Letters. 19 (6): 451–458. arXiv:hep-lat/9205018. Bibcode:1992EL.....19..451M. doi:10.1209/0295-5075/19/6/002. ISSN 0295-5075. S2CID 250781561.
- ^ Hukushima, Koji & Nemoto, Koji (1996). "Exchange Monte Carlo method and application to spin glass simulations". J. Phys. Soc. Jpn. 65 (6): 1604–1608. arXiv:cond-mat/9512035. Bibcode:1996JPSJ...65.1604H. doi:10.1143/JPSJ.65.1604. S2CID 15032087.
- ^ Marco Falcioni & Michael W. Deem (1999). "A Biased Monte Carlo Scheme for Zeolite Structure Solution". J. Chem. Phys. 110 (3): 1754. arXiv:cond-mat/9809085. Bibcode:1999JChPh.110.1754F. doi:10.1063/1.477812. S2CID 13963102.
- ^ Earl, David J.; Deem, Michael W. (2005). "Parallel tempering: Theory, applications, and new perspectives". Phys. Chem. Chem. Phys. 7: 3910.
- ^ Y. Sugita & Y. Okamoto (1999). "Replica-exchange molecular dynamics method for protein folding". Chemical Physics Letters. 314 (1–2): 141–151. Bibcode:1999CPL...314..141S. doi:10.1016/S0009-2614(99)01123-9.
- ^ Radford M. Neal (1996). "Sampling from multimodal distributions using tempered transitions". Statistics and Computing. 6 (4): 353–366. doi:10.1007/BF00143556. S2CID 11106113.
Fundamentals
Definition and Overview
Parallel tempering, also known as replica exchange Monte Carlo, is a Monte Carlo sampling technique that simulates multiple non-interacting replicas of a system at distinct temperatures to draw samples from a target probability distribution.[5] This method is particularly suited for target distributions that are multimodal or feature rugged energy landscapes, where conventional Markov chain Monte Carlo (MCMC) approaches often struggle due to poor mixing and entrapment in local modes.[5] The primary objective of parallel tempering is to enhance exploration of the state space by permitting swaps of configurations between replicas, which allows low-temperature replicas, focused on precise sampling near the target distribution, to benefit from the broader searches conducted by high-temperature replicas, thereby avoiding prolonged trapping in suboptimal regions.[5] At its core, the algorithm evolves N replicas independently at temperatures T_1 < T_2 < ... < T_N, with T_1 typically set to the temperature of interest for the target distribution.[5] At regular intervals, attempts are made to exchange configurations between neighboring replicas along the temperature ladder, promoting the diffusion of diverse states and improving overall ergodicity without altering the underlying single-replica dynamics.[5] This setup is often illustrated conceptually as a "replica ladder," in which cold replicas at the base sample refined, low-energy configurations pertinent to the target, while hot replicas at the top perform extensive, diffusive explorations of the configuration space.[5] Exchanges between adjacent rungs enable key low-energy states discovered at higher temperatures to percolate downward, facilitating more thorough and unbiased sampling across the entire landscape.[5]

Historical Development
Parallel tempering, originally introduced as the replica Monte Carlo method, was developed by Robert H. Swendsen and Jian-Sheng Wang in 1986 to simulate spin-glass systems with quenched random interactions, addressing ergodicity issues by employing multiple replicas at different temperatures and partial exchanges of configurations to reduce long correlation times.[6] In 1991, Charles J. Geyer formalized the approach for general Markov chain Monte Carlo (MCMC) applications, shifting to complete exchanges of configurations between replicas at distinct temperatures, which enhanced its versatility beyond physics-specific simulations.[7] Theoretical justification for tempering techniques was further advanced in 1992 by Giorgio Parisi through the proposal of simulated tempering, a related method that informed the conceptual foundations of replica exchange by allowing temperature updates in a single system to navigate rough energy landscapes.[8] Key refinements occurred in 1996, when Koji Hukushima and Kazuyuki Nemoto introduced the exchange Monte Carlo method, improving efficiency in spin-glass simulations and establishing parallel tempering as a robust tool for complex systems.[9] A significant extension came in 1999 with Yuji Sugita and Yuko Okamoto's adaptation of the algorithm to molecular dynamics, termed replica-exchange molecular dynamics (REMD), which facilitated protein folding simulations by enabling efficient barrier crossing in biomolecular energy landscapes.[10] Initially focused on statistical physics, parallel tempering evolved into broader statistical applications by the early 2000s, with implementations in chemistry, biology, and materials science for enhanced sampling in high-dimensional spaces.[1] Post-2010, its adoption grew in bioinformatics for tasks like protein structure prediction and systems biology model reduction, as well as in optimization problems across engineering and machine learning, due to its parallelizability and improved convergence properties.[11]

Theoretical Foundations
Limitations of Standard MCMC Methods
Standard Markov chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings algorithm, exhibit slow mixing times when sampling from high-dimensional or multimodal target distributions, resulting in inefficient exploration of the state space and prolonged autocorrelation between successive samples.[12] This critical slowing down becomes particularly pronounced near phase transitions or in distributions with separated modes, where local proposal mechanisms fail to facilitate transitions between regions of low probability density, leading to poor coverage of rare events. In such scenarios, the chains often become trapped in local energy minima, with high autocorrelation times that scale unfavorably, limiting the effective independent samples obtained per unit of computation. In physical models like the Ising model, standard local-update MCMC algorithms suffer from critical slowing down, where the integrated autocorrelation time scales as τ ∝ ξ^z with correlation length ξ and dynamic exponent z near criticality, leading to computational costs that grow as L^(d+z) for linear system size L and dimension d.[13] For disordered systems such as spin glasses, these issues are exacerbated by exponentially high energy barriers separating numerous local minima, causing the system to remain trapped for extended periods and resulting in exponential scaling of relaxation times with system size.[14] Similarly, in protein folding simulations, conventional Monte Carlo or molecular dynamics methods at low temperatures get stuck in local minimum-energy conformations due to rugged energy landscapes with substantial barriers, preventing efficient sampling of the global minimum and rare folding events.[15] Quantitatively, the mixing time in these barrier-dominated regimes follows an Arrhenius-like scaling, τ ∝ exp(ΔE/k_B T), where ΔE is the height of the energy barrier and T is the temperature; this renders low-temperature sampling, crucial for ground-state properties, computationally prohibitive as barriers become insurmountable relative to thermal energy.[16] These inefficiencies in standard MCMC underscore the need for enhanced sampling strategies, such as tempering approaches, to improve ergodicity in complex distributions.[14]

Principles of Tempering and Replica Exchange
Tempering in the context of parallel tempering involves simulating multiple non-interacting replicas of the system at different temperatures, where the inverse temperature β_i = 1/(k_B T_i) scales the Boltzmann factor in the canonical ensemble probability distribution p_i(x) ∝ exp(−β_i E(x)). At higher temperatures (lower β), the energy landscape is effectively flattened, reducing the depth of local minima and enabling replicas to more readily escape energy traps that might confine standard Markov chain Monte Carlo (MCMC) sampling at low temperatures. Conversely, low-temperature (high-β) replicas focus on refining sampling in low-energy regions, providing detailed exploration of the target distribution while benefiting from the broader exploration of hotter replicas through subsequent exchanges. The replica exchange principle extends this tempering by periodically attempting swaps of configurations between replicas at adjacent temperatures T_i and T_{i+1}, where the temperatures often form a geometric ladder with constant ratio T_{i+1}/T_i to maintain consistent statistical overlap between neighboring distributions.[17] The swap acceptance probability is p = min(1, exp[(β_i − β_{i+1})(E_i − E_{i+1})]), ensuring that exchanges occur with a likelihood that respects the underlying thermodynamics without biasing the ensemble. This mechanism allows configurations from high-temperature replicas to propagate to lower temperatures, facilitating escape from metastable states and enhancing overall sampling efficiency across the temperature range.[17] Theoretically, parallel tempering preserves detailed balance in the extended ensemble of all replicas, as the swap moves satisfy the Metropolis criterion and the individual replica evolutions maintain equilibrium within their respective canonical ensembles.[17] Ergodicity is improved through diffusion of states across the temperature ladder via successful swaps, enabling the cold replicas to access a more representative portion of the global state space that might otherwise be isolated due to high energy barriers.
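The flattening effect can be seen numerically: raising a density to the power β < 1 multiplies its effective energy −log π(x) by β, shrinking the barrier between modes. The toy double-well density below is an illustrative choice, not drawn from the cited sources:

```python
import numpy as np

def energy(x):
    # Effective energy -log pi(x) for an equal mixture of unit-variance
    # Gaussians centered at x = -2 and x = +2
    return -np.log(0.5 * np.exp(-(x + 2)**2 / 2) + 0.5 * np.exp(-(x - 2)**2 / 2))

# Barrier height between the mode at x = 2 and the saddle at x = 0 (beta = 1)
barrier = energy(0.0) - energy(2.0)

# Tempering scales the energy by beta, so the Metropolis factor for a
# barrier-crossing move grows from exp(-barrier) to exp(-beta * barrier):
crossing = {beta: np.exp(-beta * barrier) for beta in (1.0, 0.5, 0.2)}
```

The hotter the replica (smaller β), the closer the crossing factor gets to 1, which is exactly why the hot end of the ladder mixes freely between modes.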
For effective exchanges, the energy histograms of adjacent replicas must overlap sufficiently; optimal temperature spacing is achieved when the spacing and the difference in average energies satisfy (β_i − β_{i+1})(⟨E⟩_{i+1} − ⟨E⟩_i) = O(1), ensuring swap acceptance rates around 20-30% for balanced mixing.[18]

Algorithm Description
Replica Ladder Setup
In parallel tempering, the replica ladder setup begins with the initialization of multiple independent Markov chains, each representing a replica of the system. Typically, N replicas are started from random configurations drawn from a broad prior distribution or from pre-equilibrated states to ensure diverse initial sampling across the state space. Each replica is then assigned to a distinct temperature T_i from a predefined ladder, allowing the ensemble to explore the target distribution at varying levels of thermal energy.[19] This setup draws briefly from tempering principles, where higher temperatures facilitate broader exploration while lower ones refine sampling near the ground state. The temperature ladder is designed to promote efficient information exchange between replicas by ensuring sufficient overlap in their probability distributions. A common approach uses a geometric progression, where temperatures are set as T_i = T_min * γ^{i-1} for i = 1 to N, with γ > 1 chosen to achieve an acceptance rate of around 20-30% for subsequent swaps between adjacent replicas. The minimal temperature T_min is usually the target simulation temperature, while the maximal T_max is selected to enable rapid decorrelation at high energies. The number of replicas N scales with system complexity; for biomolecular systems like proteins, N typically ranges from 10 to 100, balancing computational cost with sampling efficiency—for instance, 24 replicas suffice for small peptides, while larger proteins may require 48 or more.[20] During execution, the replicas run in parallel, each evolving independently through local Markov chain Monte Carlo (MCMC) updates, such as Metropolis-Hastings steps, over a fixed number of iterations between potential exchanges. This independent dynamics allows each replica to sample its assigned temperature-specific distribution without interference, leveraging parallel computing resources to advance all chains simultaneously. 
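A geometric ladder of the form T_i = T_min · γ^(i−1) can be generated directly once the endpoints are fixed; this sketch (the function name is illustrative) solves for γ from T_min, T_max, and N:

```python
import numpy as np

def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures T_i = t_min * gamma**i with the common ratio gamma
    chosen so the ladder spans [t_min, t_max] with n_replicas rungs."""
    gamma = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return t_min * gamma ** np.arange(n_replicas)

ladder = geometric_ladder(1.0, 10.0, 8)
# Adjacent ratios T_{i+1}/T_i are constant, so each neighboring pair has
# comparable distribution overlap (and hence comparable swap acceptance).
ratios = ladder[1:] / ladder[:-1]
```

In practice γ (equivalently N for fixed endpoints) is then tuned until adjacent-pair acceptance lands near the 20-30% target mentioned above.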
Each replica maintains a state comprising a configuration x_i and its associated energy E(x_i), targeting the Boltzmann-like distribution π_i(x) ∝ exp(−E(x)/T_i). This representation ensures that the ensemble collectively approximates the canonical distribution at T_min while benefiting from enhanced mixing across the ladder.

Configuration Exchange Mechanism
In parallel tempering, the configuration exchange mechanism involves periodic attempts to swap the current states (configurations) between adjacent replicas in the temperature ladder. These swap proposals are typically made after a fixed number of local Monte Carlo or molecular dynamics steps (often on the order of 100 to 1000 steps per replica) to allow sufficient intra-replica equilibration before attempting exchanges. A pair of adjacent replicas i and i+1, with inverse temperatures β_i > β_{i+1}, is selected, and an exchange of their configurations is proposed.[21] The proposed swap is accepted according to the Metropolis criterion, with acceptance probability

p = min(1, exp[(β_i − β_{i+1})(E(x_i) − E(x_{i+1}))]),

where E(x) denotes the energy of configuration x.[21] This ensures that the joint distribution over the extended ensemble of replicas remains invariant. The acceptance probability derives from the requirement of detailed balance in the extended Markov chain. Consider the joint probability before the swap, proportional to exp(−β_i E(x_i) − β_{i+1} E(x_{i+1})). After swapping, with x_{i+1} at inverse temperature β_i and x_i at β_{i+1}, the joint becomes proportional to exp(−β_i E(x_{i+1}) − β_{i+1} E(x_i)). The ratio of these probabilities is exp[(β_i − β_{i+1})(E(x_i) − E(x_{i+1}))], and the Metropolis rule accepts with the minimum of 1 and this ratio to satisfy detailed balance: the forward and reverse transition probabilities balance such that π(X)P(X → X′) = π(X′)P(X′ → X), where X denotes the full set of replica configurations.[21] To further enhance mixing and reduce correlations between swap attempts, an even-odd strategy is often employed, alternating between proposing swaps on even-indexed pairs (e.g., 0-1, 2-3, ...) and odd-indexed pairs (e.g., 1-2, 3-4, ...). This deterministic alternation allows parallel execution of non-overlapping swaps within each cycle, improving scalability and round-trip efficiency across the temperature ladder without introducing additional bias.[3]

Equilibrium and Convergence Properties
Parallel tempering operates within an extended ensemble comprising N replicas, each evolving according to a tempered distribution π_i(x) ∝ π(x)^{β_i} for inverse temperatures β_1 > β_2 > ... > β_N, where π_1 = π is the target distribution at the lowest temperature T_1. The equilibrium distribution of the joint Markov chain is the product measure ∏_i π_i(x_i), as the local updates within each replica satisfy detailed balance for their respective marginals, and the replica exchange moves preserve this product form by satisfying detailed balance between swapped states.[1][22] Consequently, the marginal distribution for the replica at the lowest temperature exactly matches the target distribution π, enabling unbiased sampling from the desired equilibrium once stationarity is reached.[22] The swap mechanism induces a random walk for each configuration across the temperature ladder, where successful exchanges between adjacent replicas occur with probability min(1, exp[(β_i − β_{i+1})(E_i − E_{i+1})]), promoting diffusion from high to low temperatures and vice versa. This process ensures ergodicity provided the base MCMC kernels are irreducible on their state spaces and the temperature ladder connects all replicas, allowing the joint chain to explore the full product space.[1] In ideal cases with constant acceptance rates for adjacent swaps, the round-trip time for a configuration to diffuse from the hottest to the coldest replica and back scales as O(N²), reflecting the quadratic diffusion time of a one-dimensional random walk across N sites.[1] Optimal performance targets acceptance rates of 20-40% for adjacent swaps, as derived from maximizing the mean squared displacement in temperature space, which balances exploration and detailed balance.[1] Convergence diagnostics in parallel tempering focus on monitoring replica exchange rates to verify sufficient mixing across the ladder, with rates below 20% indicating poor overlap and potential bottlenecks.
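Per-pair acceptance-rate monitoring is straightforward to bolt onto any implementation; the sketch below uses illustrative counter names and simulated swap outcomes with known per-pair probabilities in place of a real run:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 4
attempts = np.zeros(n_pairs, dtype=int)
accepts = np.zeros(n_pairs, dtype=int)

def record_swap(pair, accepted):
    """Tally the outcome of one swap attempt for the given adjacent pair."""
    attempts[pair] += 1
    if accepted:
        accepts[pair] += 1

# Stand-in for a real simulation: each pair accepts with a known probability
true_p = [0.5, 0.3, 0.15, 0.4]
for _ in range(2000):
    for pair in range(n_pairs):
        record_swap(pair, rng.random() < true_p[pair])

rates = accepts / attempts
# Pairs below the ~20% threshold signal a ladder bottleneck: the two
# temperatures are too far apart and need a tighter spacing there.
bottlenecks = np.where(rates < 0.2)[0]
```

In a real run the same counters would be updated at each exchange attempt, and a flagged pair would prompt inserting an intermediate temperature or re-tuning the ladder.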
Energy histograms from adjacent replicas should exhibit substantial overlap (e.g., covering a common range of energies) to facilitate efficient swaps; insufficient overlap signals the need for temperature adjustments to achieve canonical scaling and rapid mixing.[1] Under conditions of geometric ergodicity for the base chains and adequate temperature spacing, the overall chain mixes rapidly, with spectral gap lower bounds ensuring polynomial-time convergence for multimodal targets like Gaussian mixtures or Ising models.[23]

Implementations and Practical Considerations
Numerical Implementation and Pseudocode
Parallel tempering implementations leverage parallel computing frameworks to manage multiple replicas efficiently, allowing simultaneous execution of local MCMC updates across processors. The Message Passing Interface (MPI) is a widely adopted standard for distributed environments, where each replica operates independently using kernels like Metropolis-Hastings for configuration updates at its assigned temperature. Local updates preserve detailed balance within each tempered distribution, while inter-replica swaps ensure global ergodicity. In practice, the algorithm proceeds in sweeps: each replica undergoes a fixed number of local MCMC steps before attempting swaps between adjacent temperatures. Energy evaluations, central to acceptance decisions, are computed directly for low-dimensional problems but may require approximations like surrogate models in high dimensions to reduce computational cost. Precomputing lookup tables or using vectorized operations in languages like Python or MATLAB can further optimize performance for repeated evaluations. The following pseudocode outlines a standard parallel tempering procedure, adapted for general use (here doubling as an optimizer that tracks the best configuration found) with N replicas at temperatures T_1 < T_2 < ... < T_N (where β_i = 1/T_i, so β_1 > β_2 > ... > β_N), targeting a distribution proportional to exp(-E(x)):

Initialize global variables: R (list of replicas), T (list of temperatures),
    B (best configuration, initialized randomly), E (energy of B)
for each replica r_i:
    Initialize configuration x_i ~ prior distribution
    Compute E_i = E(x_i)
end for
for sweep s = 1 to S (maximum sweeps):
    for each replica r_i:
        Perform Q local MCMC updates on x_i (e.g., Metropolis-Hastings steps at temperature T_i)
        Compute updated E_i = E(x_i)
        if E_i == 0:                 // zero-energy solution found (optimization use)
            return x_i
        else if E_i < E:
            B = x_i; E = E_i
        end if
    end for
    for each pair of adjacent replicas (i, i+1):
        Δβ = β_i - β_{i+1}           // positive, since T_i < T_{i+1} implies β_i > β_{i+1}
        ΔE = E_i - E_{i+1}
        P = min(1, exp(Δβ * ΔE))
        if Rand() < P:               // Rand() uniform in [0,1]
            Swap configurations x_i and x_{i+1} (and their energies E_i, E_{i+1})
        end if
    end for
end for
return B, E
As a concrete example, the following Python program applies the same scheme to sampling from a one-dimensional bimodal Gaussian mixture; samples recorded from the coldest replica target the mixture itself:

import numpy as np

def log_target(x):
    # Log-density of an equal mixture of unit-variance Gaussians at x = -2 and x = +2
    return np.log(0.5 * np.exp(-(x + 2)**2 / 2) + 0.5 * np.exp(-(x - 2)**2 / 2))

N_replicas = 10
n_sweeps = 1000
n_local_steps = 100
proposal_std = 0.5

# Temperatures T_i from 1 (target) to 10 (hottest); beta_i = 1/T_i
T = np.linspace(1, 10, N_replicas)
beta = 1 / T

# Initialize replicas x_i ~ uniform[-5, 5]
x = np.random.uniform(-5, 5, N_replicas)
logp = np.array([log_target(xi) for xi in x])  # log pi(x_i)

samples = []  # stored samples from the coldest (beta = 1) chain
for sweep in range(n_sweeps):
    # Local Metropolis updates for each replica at its own temperature
    for i in range(N_replicas):
        for _ in range(n_local_steps):
            x_prop = x[i] + np.random.normal(0, proposal_std)
            logp_prop = log_target(x_prop)
            if np.log(np.random.rand()) < beta[i] * (logp_prop - logp[i]):
                x[i] = x_prop
                logp[i] = logp_prop
    # Attempt swaps between adjacent replicas; with E = -log pi, the
    # acceptance ratio exp((beta_i - beta_{i+1}) (E_i - E_{i+1})) becomes
    # exp(-(beta_i - beta_{i+1}) (log pi_i - log pi_{i+1}))
    for i in range(N_replicas - 1):
        delta_beta = beta[i] - beta[i + 1]
        delta_logp = logp[i] - logp[i + 1]
        if np.random.rand() < min(1.0, np.exp(-delta_beta * delta_logp)):
            x[i], x[i + 1] = x[i + 1], x[i]
            logp[i], logp[i + 1] = logp[i + 1], logp[i]
    samples.append(x[0])  # record the coldest chain each sweep