Metropolis-adjusted Langevin algorithm
In computational statistics, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a probability distribution for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a random walk that has the target probability distribution as an invariant measure:
- new states are proposed using (overdamped) Langevin dynamics, which use evaluations of the gradient of the target probability density function;
- these proposals are accepted or rejected using the Metropolis–Hastings algorithm, which uses evaluations of the target probability density (but not its gradient).
Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by Julian Besag in 1994[1] (although the "Smart Monte Carlo" method was already introduced in 1978[2]), and its properties were examined in detail by Gareth Roberts together with Richard Tweedie[3] and Jeff Rosenthal.[4] Many variations and refinements have been introduced since then, e.g. the manifold variant of Girolami and Calderhead (2011).[5] The method is equivalent to using the Hamiltonian Monte Carlo (hybrid Monte Carlo) algorithm with only a single discrete time step.[5]
Further details
Let π denote a probability density function on ℝᵈ, one from which it is desired to draw an ensemble of independent and identically distributed samples. We consider the overdamped Langevin Itô diffusion

dX_t = ∇ log π(X_t) dt + √2 dB_t
driven by the time derivative of a standard Brownian motion B_t. (Note that another commonly used normalization for this diffusion is

dX_t = ½ ∇ log π(X_t) dt + dB_t,

which generates the same dynamics.) In the limit as t → ∞, the probability distribution ρ(t) of X_t approaches a stationary distribution, which is also invariant under the diffusion, and which we denote ρ∞. It turns out that, in fact, ρ∞ = π.
Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the Euler–Maruyama method with a fixed time step τ > 0. We set X₀ := x₀ and then recursively define an approximation X_k to the true solution X_{kτ} by

X_{k+1} := X_k + τ ∇ log π(X_k) + √(2τ) ξ_k,

where each ξ_k is an independent draw from a multivariate normal distribution on ℝᵈ with mean 0 and covariance matrix equal to the identity matrix. Note that X_{k+1} is normally distributed with mean X_k + τ ∇ log π(X_k) and covariance equal to 2τ times the identity matrix.
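This Euler–Maruyama update can be sketched in a few lines of code. The helper below is a minimal illustration (the names `ula_step` and `grad_log_pi` are ours, and the example assumes a standard Gaussian target, for which ∇ log π(x) = −x):

```python
import numpy as np

def ula_step(x, grad_log_pi, tau, rng):
    """One Euler-Maruyama step of the overdamped Langevin diffusion:
    X_{k+1} = X_k + tau * grad(log pi)(X_k) + sqrt(2 tau) * xi_k."""
    xi = rng.standard_normal(x.shape)  # xi_k ~ N(0, I_d)
    return x + tau * grad_log_pi(x) + np.sqrt(2 * tau) * xi

# Example: a standard Gaussian target, so grad log pi(x) = -x.
rng = np.random.default_rng(0)
x = np.full(2, 5.0)                    # start far from the mode
for _ in range(5000):
    x = ula_step(x, lambda y: -y, tau=0.1, rng=rng)
```

Iterating the step drives the chain from its distant starting point into the high-probability region of the target.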
In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates X_k according to the update rule above, MALA incorporates an additional step. We consider the above update rule as defining a proposal x̃_{k+1} for a new state,

x̃_{k+1} := X_k + τ ∇ log π(X_k) + √(2τ) ξ_k.
This proposal is accepted or rejected according to the Metropolis–Hastings algorithm: set

α := min{1, (π(x̃_{k+1}) q(X_k | x̃_{k+1})) / (π(X_k) q(x̃_{k+1} | X_k))},

where

q(x′ | x) ∝ exp(−(1/(4τ)) ‖x′ − x − τ ∇ log π(x)‖²)

is the transition probability density from x to x′ (note that, in general, q(x′ | x) ≠ q(x | x′)). Let U_k be drawn from the continuous uniform distribution on the interval [0, 1]. If U_k ≤ α, then the proposal is accepted, and we set X_{k+1} := x̃_{k+1}; otherwise, the proposal is rejected, and we set X_{k+1} := X_k.
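Putting the proposal and the accept/reject test together, a single MALA transition can be sketched as follows. This is a minimal illustration, not a reference implementation; `log_pi` and `grad_log_pi` are assumed to evaluate log π and ∇ log π (possibly unnormalized, since constants cancel in the ratio):

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, tau, rng):
    """One MALA transition: Langevin proposal plus Metropolis-Hastings test."""
    def log_q(xp, xc):
        # log q(xp | xc) up to an additive constant (it cancels in the ratio)
        diff = xp - xc - tau * grad_log_pi(xc)
        return -np.dot(diff, diff) / (4.0 * tau)

    xi = rng.standard_normal(x.shape)
    prop = x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi
    log_alpha = log_pi(prop) + log_q(x, prop) - log_pi(x) - log_q(prop, x)
    if np.log(rng.uniform()) < log_alpha:  # accept w.p. min(1, exp(log_alpha))
        return prop
    return x

# Example: sample a 1-d standard Gaussian (log pi(x) = -x^2/2 up to a constant).
rng = np.random.default_rng(1)
x = np.array([3.0])
samples = []
for _ in range(20000):
    x = mala_step(x, lambda y: -0.5 * np.dot(y, y), lambda y: -y,
                  tau=0.5, rng=rng)
    samples.append(x[0])
```

Working in log-space (comparing log U against log α) avoids overflow when the density ratio is large.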
The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution π. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher π probability, which are then more likely to be accepted. On the other hand, when π is strongly anisotropic (i.e. it varies much more quickly in some directions than others), it is necessary to take 0 < τ ≪ 1 in order to properly capture the Langevin dynamics; the use of a positive-definite preconditioning matrix A ∈ ℝᵈˣᵈ can help to alleviate this problem, by generating proposals according to

x̃_{k+1} := X_k + τ A ∇ log π(X_k) + √(2τ) A^{1/2} ξ_k,

so that x̃_{k+1} has mean X_k + τ A ∇ log π(X_k) and covariance 2τA.
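A preconditioned proposal of this form can be sketched as follows (a minimal helper of our own; `A` must be symmetric positive definite, and its Cholesky factor supplies noise with covariance 2τA):

```python
import numpy as np

def preconditioned_proposal(x, grad_log_pi, tau, A, rng):
    """Propose x + tau*A*grad(log pi)(x) + sqrt(2*tau)*A^{1/2}*xi, so the
    proposal has mean x + tau*A*grad and covariance 2*tau*A."""
    L = np.linalg.cholesky(A)                 # A = L @ L.T
    xi = rng.standard_normal(x.shape)
    return x + tau * (A @ grad_log_pi(x)) + np.sqrt(2.0 * tau) * (L @ xi)

# Sanity check: with a zero gradient, proposals are centred at x with
# covariance 2*tau*A.
rng = np.random.default_rng(2)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
tau = 0.1
draws = np.array([preconditioned_proposal(np.zeros(2), lambda y: np.zeros(2),
                                          tau, A, rng)
                  for _ in range(20000)])
```

Multiplying standard Gaussian noise by the Cholesky factor L is the usual way to obtain noise with covariance A (and hence 2τA after the √(2τ) scaling).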
For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be 0.574; if it is discovered to be substantially different in practice, τ should be modified accordingly.[4]
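Such tuning is often automated. The helper below is a crude stochastic-approximation heuristic of our own (not from the cited references): it nudges τ up after acceptances and down after rejections, so that the long-run acceptance rate drifts toward the 0.574 guideline.

```python
import numpy as np

def adapt_step_size(tau, accepted, target=0.574, eta=0.05):
    """Multiplicative update: grow tau when accepting more often than the
    target rate, shrink it when accepting less often."""
    return tau * np.exp(eta * ((1.0 if accepted else 0.0) - target))
```

Acceptances multiply τ by exp(η(1 − 0.574)) and rejections by exp(−η·0.574), so τ equilibrates where the empirical acceptance rate matches the target; the learning rate η would typically be decayed over time so the chain becomes asymptotically non-adaptive.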
References
- ^ J. Besag (1994). "Comments on "Representations of knowledge in complex systems" by U. Grenander and MI Miller". Journal of the Royal Statistical Society, Series B. 56: 591–592.
- ^ Rossky, P.J.; Doll, J.D.; Friedman, H.L. (1978). "Brownian Dynamics as smart Monte Carlo simulation". Journal of Chemical Physics. 69 (10): 4628. Bibcode:1978JChPh..69.4628R. doi:10.1063/1.436415.
- ^ G. O. Roberts and R. L. Tweedie (1996). "Exponential convergence of Langevin distributions and their discrete approximations". Bernoulli. 2 (4): 341–363. doi:10.2307/3318418. JSTOR 3318418.
- ^ a b G. O. Roberts and J. S. Rosenthal (1998). "Optimal scaling of discrete approximations to Langevin diffusions". Journal of the Royal Statistical Society, Series B. 60 (1): 255–268. doi:10.1111/1467-9868.00123. S2CID 5831882.
- ^ a b M. Girolami and B. Calderhead (2011). "Riemann manifold Langevin and Hamiltonian Monte Carlo methods". Journal of the Royal Statistical Society, Series B. 73 (2): 123–214. CiteSeerX 10.1.1.190.580. doi:10.1111/j.1467-9868.2010.00765.x.
Metropolis-adjusted Langevin algorithm
Introduction
Overview
The Metropolis-adjusted Langevin algorithm (MALA) is a Markov chain Monte Carlo (MCMC) method that operates within the Metropolis–Hastings framework, utilizing proposals derived from the Euler–Maruyama discretization of the Langevin diffusion to sample from a target distribution π.[4] This integration of Langevin dynamics provides a structured proposal mechanism informed by the target's gradient, while the Metropolis–Hastings adjustment ensures the resulting Markov chain has π as its invariant distribution.[5] MALA plays a central role in statistical computing by generating samples from intricate distributions, which is essential for approximating integrals or expectations under π, particularly in high-dimensional Bayesian posterior estimation where direct sampling is infeasible. Its design makes it suitable for applications requiring robust exploration of multimodal or non-Gaussian densities. The algorithm was first introduced by Julian Besag in 1994, in a discussion contribution on representations of knowledge in complex systems, with initial motivations tied to spatial statistics problems.[6] A primary strength of MALA lies in its use of gradient information from the log-density of π, which allows for more directed and efficient traversal of the parameter space compared to the isotropic random walk proposals of standard Metropolis–Hastings algorithms.

Motivation
Sampling from complex target distributions π(x), particularly those that are multimodal or high-dimensional, poses significant challenges in fields such as Bayesian statistics and statistical physics, where direct sampling methods like rejection sampling become inefficient or intractable due to the curse of dimensionality and the presence of multiple modes separated by low-density regions. These difficulties often lead to poor exploration of the state space, with chains getting trapped in local modes, resulting in biased estimates and slow convergence.[7] The basic random walk Metropolis algorithm, a foundational Markov chain Monte Carlo (MCMC) method, exacerbates these issues by proposing symmetric, directionless steps from a Gaussian distribution centered at the current state, leading to slow mixing, especially in high dimensions.[8] Optimal scaling analysis reveals that its mixing time scales as O(d), where d is the dimension, with an optimal acceptance rate of approximately 0.234, highlighting the inefficiency of undirected exploration in complex landscapes.[8] The Metropolis-adjusted Langevin algorithm (MALA) addresses these limitations by building on the Metropolis–Hastings framework and incorporating the score function ∇ log π(x) to generate proposals that align with the gradient of the log-density, effectively guiding the chain toward regions of higher probability and reducing unnecessary random wandering.[4] This informed proposal mechanism enables more efficient traversal of multimodal structures, improving overall sampling performance. Empirical studies confirm MALA's advantages, showing substantially reduced integrated autocorrelation times relative to random walk Metropolis in moderate dimensions (e.g., 10–100), where the algorithm's optimal scaling yields an acceptance rate of about 0.574 and a mixing time of O(d^{1/3}).[9]

Background
Langevin dynamics
The overdamped Langevin dynamics provide a continuous-time stochastic process that serves as the foundation for proposal mechanisms in algorithms like the Metropolis-adjusted Langevin algorithm (MALA). This process is modeled by the stochastic differential equation (SDE)

dX_t = ∇ log π(X_t) dt + √2 dW_t,

where X_t ∈ ℝᵈ is the state at time t, π is the target probability density function, ∇ log π denotes the score function (gradient of the log-density), and W_t is a standard d-dimensional Brownian motion.[1] The drift term ∇ log π(X_t) acts as a deterministic force that pushes the process toward regions of higher density under π, effectively guiding the trajectory uphill in the log-probability landscape to concentrate sampling in probable areas. Meanwhile, the diffusion term √2 dW_t introduces isotropic random perturbations, enabling exploration of the state space and preventing the process from getting trapped in local modes. This balance between directed movement and stochastic noise makes the dynamics suitable for approximating the target distribution over long times. The scaling factor of √2 in the diffusion ensures the equilibrium variance aligns with the temperature of the target (assuming unit temperature), a convention rooted in statistical mechanics analogies.[1] Under mild regularity conditions, such as π being twice continuously differentiable and π(x) > 0 for all x, the overdamped Langevin process is ergodic and admits π as its unique stationary (invariant) distribution. This invariance arises from the reversibility of the dynamics with respect to π, verifiable through the Fokker–Planck equation associated with the SDE, which has π as its steady-state solution. Additional growth conditions on ∇ log π ensure non-explosion of the process and global existence of solutions.[1] To generate discrete-time samples approximating draws from π, the continuous dynamics are discretized using the Euler–Maruyama scheme with time step τ > 0, yielding an update

X_{k+1} = X_k + τ ∇ log π(X_k) + √(2τ) ξ_k, where ξ_k ∼ 𝒩(0, I_d).
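The invariance claim can be checked numerically with a small ensemble simulation. The sketch below assumes a standard Gaussian target, so ∇ log π(x) = −x and the stationary variance should settle near 1 (up to O(τ) discretization bias):

```python
import numpy as np

# Run many independent Euler-Maruyama chains and watch the ensemble
# distribution settle near the target.
rng = np.random.default_rng(3)
tau = 0.01
x = 3.0 + rng.standard_normal(10000)   # ensemble started far from the mode
for _ in range(2000):
    x = x + tau * (-x) + np.sqrt(2 * tau) * rng.standard_normal(x.size)
# After many steps the empirical mean is near 0 and the variance near 1.
```

Starting the whole ensemble away from the mode makes the convergence toward the stationary distribution visible: the ensemble mean decays toward 0 while the spread relaxes toward the target's.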
This approximation introduces bias for finite τ, motivating the use of Metropolis–Hastings corrections to ensure exact invariance of π in the discrete chain.[1]

Metropolis–Hastings algorithm
The Metropolis–Hastings (MH) algorithm is a Markov chain Monte Carlo (MCMC) method that generates a sequence of random samples from a target probability distribution π, even when direct sampling is infeasible. Introduced as a generalization of the earlier Metropolis method, it allows for flexible, potentially asymmetric proposal distributions while ensuring the generated Markov chain has π as its invariant distribution. In the MH framework, starting from a current state x, a candidate state x′ is proposed according to a conditional distribution q(x′ | x), which may depend on x and can be asymmetric. The proposal is then accepted with probability

α(x, x′) = min{1, (π(x′) q(x | x′)) / (π(x) q(x′ | x))},

and the chain moves to x′ if accepted; otherwise, it remains at x. This acceptance rule ensures the Markov chain is reversible, meaning the transition kernel p satisfies the detailed balance condition π(x) p(x, x′) = π(x′) p(x′, x) for all states x and x′, which in turn guarantees that π is invariant under the chain's transitions. The purpose of this construction is to produce a Markov chain that converges to the target distribution π, allowing samples to approximate expectations under π regardless of the proposal's asymmetry, as long as q is properly defined and irreducible transitions are possible. A key advantage is its independence from the normalization of the proposal and target densities: the algorithm relies only on the ratios π(x′)/π(x) and q(x | x′)/q(x′ | x), so unnormalized forms of π suffice, making it applicable to complex distributions like posterior densities in Bayesian inference. In the context of the Metropolis-adjusted Langevin algorithm, the MH framework provides the acceptance mechanism to correct proposals derived from Langevin dynamics, ensuring detailed balance for the resulting chain.

Algorithm description
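Before specializing to Langevin proposals, the generic MH acceptance rule can be sketched in a few lines (a minimal illustration of our own; `pi` and `q` may be unnormalized, since only their ratios enter, as the example demonstrates by rescaling the target):

```python
import math

def mh_accept_prob(x, x_new, pi, q):
    """Metropolis-Hastings acceptance probability for a possibly
    asymmetric proposal density q(to, frm) = q(to | frm)."""
    return min(1.0, (pi(x_new) * q(x, x_new)) / (pi(x) * q(x_new, x)))

pi = lambda x: math.exp(-0.5 * x * x)                 # unnormalized N(0, 1)
q = lambda to, frm: math.exp(-(to - 0.9 * frm) ** 2)  # asymmetric proposal
a1 = mh_accept_prob(0.5, 1.2, pi, q)
a2 = mh_accept_prob(0.5, 1.2, lambda x: 7.0 * pi(x), q)  # rescaled target
# a1 and a2 agree: the normalization of pi cancels in the ratio.
```

In practice one works with log-densities and compares log U against the log-ratio, but the plain-density form above matches the formula as usually stated.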
Proposal mechanism
The proposal mechanism in the Metropolis-adjusted Langevin algorithm (MALA) generates candidate states through a discrete-time approximation of the continuous-time Langevin diffusion process, which serves as the underlying inspiration for informed sampling steps.[1] Specifically, starting from the current state x, the proposed state x̃ is obtained via the Euler–Maruyama discretization scheme, yielding

x̃ = x + τ ∇ log π(x) + √(2τ) ξ,

where ξ is a standard Gaussian random vector in d dimensions, π denotes the target density, and τ > 0 is the step size parameter.[1] This construction results in a Gaussian proposal distribution q(x̃ | x) = 𝒩(x̃; x + τ ∇ log π(x), 2τ I_d), which incorporates gradient information from the target log-density to bias proposals toward regions of higher probability density, thereby improving exploration efficiency compared to symmetric random-walk proposals.[1] The mean of the proposal shifts according to the local gradient direction, while the covariance 2τ I_d introduces isotropic noise scaled by the step size, mimicking the diffusive behavior of the continuous process.[1] The proposal is inherently asymmetric, as the reverse proposal density q(x | x̃) centers at x̃ + τ ∇ log π(x̃) with the same covariance, depending on the gradient evaluated at the proposed state rather than the current one; this asymmetry necessitates a corrective acceptance probability in the Metropolis–Hastings framework to maintain detailed balance with respect to π.[1] The step size τ plays a crucial role in balancing approximation quality and sampling efficiency: small values of τ reduce discretization bias by closely tracking the continuous Langevin dynamics but lead to proposals with low variance and thus slower mixing, whereas larger τ increases variance for broader exploration at the cost of greater bias from the Euler approximation.[1]

Acceptance step
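The asymmetry that the acceptance step must correct for is easy to exhibit numerically. The sketch below (our own helper names, assuming a standard Gaussian target so ∇ log π(x) = −x) evaluates the log proposal density in both directions and shows they differ:

```python
import numpy as np

def log_q(x_to, x_from, grad_log_pi, tau):
    """Log of the Langevin proposal density q(x_to | x_from), up to a
    constant: N(x_to; x_from + tau * grad_log_pi(x_from), 2 * tau * I)."""
    diff = x_to - x_from - tau * grad_log_pi(x_from)
    return -np.dot(diff, diff) / (4.0 * tau)

grad = lambda y: -y                    # standard Gaussian target
tau = 0.5
x = np.array([2.0])
x_new = np.array([1.2])
forward = log_q(x_new, x, grad, tau)   # q(x_new | x)
reverse = log_q(x, x_new, grad, tau)   # q(x | x_new)
# forward != reverse: the proposal is asymmetric, so the Metropolis-Hastings
# ratio must include q evaluated in both directions.
```

The forward move starts near the mode's direction of pull, while the reverse move would have to fight the drift, so the two densities differ substantially.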
In the Metropolis-adjusted Langevin algorithm (MALA), the acceptance step follows the standard Metropolis–Hastings (MH) framework to ensure that the Markov chain has the target distribution π as its invariant measure. Given a current state x and a proposed state x̃ generated from the Langevin-based proposal distribution q(x̃ | x), the proposal is accepted with probability

α(x, x̃) = min{1, (π(x̃) q(x | x̃)) / (π(x) q(x̃ | x))}.

If accepted, the chain moves to x̃; otherwise, it remains at x. This acceptance ratio enforces detailed balance, preserving π exactly despite the proposal's approximation to the underlying Langevin diffusion.[4] Substituting the Gaussian form of the proposal distribution specific to MALA yields a more explicit expression for the acceptance probability. Assuming the discretization of the Langevin stochastic differential equation with step size τ and proposal variance 2τ (to match the diffusion coefficient), the proposal densities are multivariate normal: q(x̃ | x) = 𝒩(x̃; x + τ ∇ log π(x), 2τ I_d) and q(x | x̃) = 𝒩(x; x̃ + τ ∇ log π(x̃), 2τ I_d). The resulting acceptance probability simplifies to

α(x, x̃) = min{1, exp( log π(x̃) − log π(x) + (‖x̃ − x − τ ∇ log π(x)‖² − ‖x − x̃ − τ ∇ log π(x̃)‖²) / (4τ) )},

where the exponent arises from the ratio of the Gaussian densities combined with π(x̃)/π(x). This form highlights the correction terms that account for the asymmetry in the proposals due to the gradient information.[4][10] The MH adjustment in this step is crucial for correcting the discretization error inherent in the Euler–Maruyama approximation of the continuous Langevin dynamics, ensuring that the discrete chain remains invariant with respect to π regardless of the step size τ. Without this acceptance mechanism, the unadjusted Langevin algorithm would introduce bias and fail to converge to the exact target distribution. Computationally, evaluating α requires assessing log π and ∇ log π at both x and x̃, roughly doubling the per-iteration cost compared to simpler MCMC methods like random walk Metropolis.[4]

Theoretical properties
Optimal scaling
The optimal scaling of the Metropolis-adjusted Langevin algorithm (MALA) addresses the choice of the step size parameter τ to maximize sampling efficiency in high-dimensional settings. In the asymptotic regime as the dimension d → ∞, Roberts and Rosenthal (1998) established that the optimal τ scales as c d^{−1/3}, where c is a target-dependent constant that tunes the proposal variance to achieve an asymptotic acceptance rate of approximately 0.574. This scaling ensures that the discrete-time proposals approximate the underlying continuous Langevin diffusion while maintaining detailed balance through the Metropolis adjustment.[11] The efficiency of MALA under optimal scaling is quantified by the speed measure, which minimizes the asymptotic variance of estimators derived from the Markov chain. This involves balancing the trade-off between the average acceptance probability (which decreases with larger τ) and the proposal step size (which increases the displacement per accepted move). By maximizing this speed, the algorithm achieves a mixing time of order O(d^{1/3}), reflecting improved performance over random-walk Metropolis methods, whose mixing time is of order O(d). In the high-dimensional limit, the optimally scaled MALA chain converges weakly to a Langevin diffusion process with a tuned diffusion constant determined by c and the acceptance rate. This diffusion limit provides a continuous-time approximation of the discrete chain's behavior, facilitating theoretical analysis of its ergodicity and mixing properties. In practical implementations, the theoretical acceptance rate of 0.574 serves as a guideline, with empirical tuning often targeting rates between 50% and 60% to balance efficiency across finite dimensions and non-i.i.d. targets.[12]

Convergence analysis
The Metropolis-adjusted Langevin algorithm (MALA) generates a Markov chain that is ergodic under standard conditions for Metropolis–Hastings algorithms, ensuring convergence in total variation distance to the target distribution π. Specifically, the chain is Harris recurrent and positive Harris, hence ergodic, provided the proposal distribution is absolutely continuous with respect to Lebesgue measure (due to the Gaussian noise in the Langevin proposal) and the acceptance probability is positive with probability one, which guarantees irreducibility and aperiodicity. Geometric ergodicity of MALA, which implies exponential convergence rates in appropriate norms, holds under tail conditions on π that ensure the drift term dominates appropriately. For target densities with tails between exponential and Gaussian, roughly π(x) ∝ exp(−|x|^β) with β ∈ (1, 2), MALA is V-uniformly geometrically ergodic for a suitable Lyapunov function V, provided the discretization step size τ is sufficiently small.[1] In contrast, for strongly log-concave π (where the negative Hessian of log π is positive definite with condition number κ), MALA exhibits geometric ergodicity with mixing time polynomial in the dimension d and in κ from a warm start, enabling efficient sampling via exponential decay of the total variation distance to stationarity.[13] However, for lighter-tailed distributions with super-Gaussian tails (β > 2), and for Gaussian-type tails (β = 2) at some step sizes, geometric ergodicity fails, and convergence occurs at subgeometric rates, such as polynomial rates in some cases, depending on the step size and tail behavior.[1] Under geometric ergodicity, MALA satisfies a central limit theorem for ergodic averages, establishing asymptotic normality of the Monte Carlo estimators f̄_N = (1/N) ∑_{k=1}^{N} f(X_k) for bounded functions f. Specifically,

√N (f̄_N − π(f)) → 𝒩(0, σ_f²) in distribution as N → ∞,

where the asymptotic variance σ_f² = Var_π(f) + 2 ∑_{k≥1} Cov_π(f(X₀), f(X_k)) is finite and depends on the integrated autocorrelation time of the chain. This result holds uniformly over starting points under the V-uniform ergodicity conditions, providing error bounds of order O(N^{−1/2}) for posterior means or expectations in Bayesian settings.[1] The proposals in MALA arise from the Euler–Maruyama discretization of the overdamped Langevin diffusion dX_t = ∇ log π(X_t) dt + √2 dW_t, which converges weakly to the continuous-time process as the step size τ → 0. While the unadjusted discretization introduces bias for finite τ, the Metropolis–Hastings correction ensures the invariant distribution remains exactly π regardless of τ, preserving convergence guarantees without requiring infinitesimal steps.[1]

Implementation
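In implementations, the asymptotic variance σ_f² appearing in the central limit theorem is commonly estimated from chain output by the batch-means method. The sketch below is a minimal version of our own (assuming a scalar sequence `samples`; the i.i.d. check at the end uses the fact that σ_f² = 1 for independent standard normal draws):

```python
import numpy as np

def batch_means_variance(samples, n_batches=100):
    """Batch-means estimate of the CLT asymptotic variance: split the chain
    into consecutive batches and scale the variance of the batch means by
    the batch length."""
    samples = np.asarray(samples, dtype=float)
    n = (len(samples) // n_batches) * n_batches
    batch_len = n // n_batches
    batch_means = samples[:n].reshape(n_batches, batch_len).mean(axis=1)
    return batch_len * batch_means.var(ddof=1)

# For i.i.d. N(0, 1) draws the asymptotic variance is exactly 1.
rng = np.random.default_rng(4)
est = batch_means_variance(rng.standard_normal(20000))
```

Dividing the estimate by N and taking a square root gives a Monte Carlo standard error for the ergodic average that accounts for autocorrelation, provided the batches are long relative to the chain's correlation length.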
Pseudocode
The Metropolis-adjusted Langevin algorithm (MALA) generates a Markov chain targeting a distribution π by iteratively proposing states via an Euler–Maruyama discretization of the underlying Langevin diffusion and correcting for discretization bias using a Metropolis–Hastings acceptance step. The process begins by initializing the current state x₀ from π or an appropriate prior, then proceeds for N iterations: at each step, compute the proposal x̃ using the current state and the gradient of the log-target density, evaluate the acceptance probability α, and move to x̃ with probability α or retain the current state otherwise, appending each state to the chain.[14]

The following pseudocode implements vanilla MALA in d dimensions, assuming access to the unnormalized log-density ℓ = log π and its gradient ∇ℓ, with step size τ > 0 and Gaussian noise drawn from 𝒩(0, I_d). For numerical stability, evaluations of π are performed in log-space, and the acceptance ratio is computed using the explicit quadratic form derived from the asymmetric proposal densities to avoid direct density evaluations where possible.[14][15]

Algorithm MALA(ℓ = log π, ∇ℓ = ∇ log π, τ > 0, N ∈ ℕ, x₀ ∈ ℝᵈ):
    x ← x₀
    chain ← [x]
    for k = 1 to N do:
        μ ← x + τ ⋅ ∇ℓ(x)                        // Proposal mean (drift-adjusted)
        x̃ ← μ + √(2τ) ⋅ Z, where Z ∼ 𝒩(0, I_d)   // Proposal via Euler step
        ν ← x̃ + τ ⋅ ∇ℓ(x̃)                        // Reverse proposal mean
        δ₁ ← ‖x̃ − μ‖²                            // ‖x̃ − x − τ ∇ℓ(x)‖²
        δ₂ ← ‖x − ν‖²                            // ‖x − x̃ − τ ∇ℓ(x̃)‖²
        log α ← ℓ(x̃) − ℓ(x) + (δ₁ − δ₂) / (4τ)   // Log acceptance ratio
        α ← min{1, exp(log α)}
        draw U ∼ Uniform(0, 1)
        if U < α then:
            x ← x̃
        chain.append(x)
    return chain
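This procedure translates almost directly into NumPy. The sketch below is a minimal, unoptimized version (the 2-d standard Gaussian target in the example is an assumption for demonstration purposes):

```python
import numpy as np

def mala(log_pi, grad_log_pi, tau, n_steps, x0, rng):
    """Vanilla MALA chain targeting pi, given log pi and its gradient
    (both may be unnormalized)."""
    x = np.asarray(x0, dtype=float)
    chain = [x]
    for _ in range(n_steps):
        mu = x + tau * grad_log_pi(x)                        # proposal mean
        prop = mu + np.sqrt(2 * tau) * rng.standard_normal(x.shape)
        nu = prop + tau * grad_log_pi(prop)                  # reverse mean
        d1 = np.sum((prop - mu) ** 2)                        # forward residual
        d2 = np.sum((x - nu) ** 2)                           # reverse residual
        log_alpha = log_pi(prop) - log_pi(x) + (d1 - d2) / (4 * tau)
        if np.log(rng.uniform()) < log_alpha:                # MH accept test
            x = prop
        chain.append(x)
    return np.array(chain)

# Example: a 2-d standard Gaussian target.
chain = mala(log_pi=lambda y: -0.5 * np.dot(y, y),
             grad_log_pi=lambda y: -y,
             tau=0.4, n_steps=20000, x0=np.zeros(2),
             rng=np.random.default_rng(5))
```

After discarding a burn-in, the per-coordinate means of the chain should be near 0 and the variances near 1, matching the target.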
