Particle filter
from Wikipedia

Particle filters, also known as sequential Monte Carlo methods, are a set of Monte Carlo algorithms used to find approximate solutions for filtering problems for nonlinear state-space systems, such as signal processing and Bayesian statistical inference.[1] The filtering problem consists of estimating the internal states in dynamical systems when partial observations are made and random perturbations are present in the sensors as well as in the dynamical system. The objective is to compute the posterior distributions of the states of a Markov process, given the noisy and partial observations. The term "particle filters" was first coined in 1996 by Pierre Del Moral about mean-field interacting particle methods used in fluid mechanics since the beginning of the 1960s.[2] The term "Sequential Monte Carlo" was coined by Jun S. Liu and Rong Chen in 1998.[3]

Particle filtering uses a set of particles (also called samples) to represent the posterior distribution of a stochastic process given the noisy and/or partial observations. The state-space model can be nonlinear and the initial state and noise distributions can take any form required. Particle filter techniques provide a well-established methodology[2][4][5] for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to very high-dimensional systems.

Particle filters update their prediction in an approximate (statistical) manner. The samples from the distribution are represented by a set of particles; each particle has a likelihood weight assigned to it that represents the probability of that particle being sampled from the probability density function. Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms. However, it can be mitigated by including a resampling step before the weights become too uneven. Several adaptive resampling criteria can be used, including the variance of the weights and the relative entropy with respect to the uniform distribution.[6] In the resampling step, the particles with negligible weights are replaced by new particles in the proximity of the particles with higher weights.
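The two resampling criteria mentioned above can be illustrated with a minimal sketch. The function names are hypothetical, and the formulas used (the inverse sum of squared normalized weights, and the Kullback-Leibler divergence from the uniform distribution) are common choices assumed here rather than taken from the cited article.

```python
import math

def effective_sample_size(weights):
    """1 / sum(w_i^2) for normalized weights; ranges from 1 (collapse) to N (even)."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)

def relative_entropy_to_uniform(weights):
    """sum_i w_i * log(N * w_i): zero for even weights, log(N) at full collapse."""
    total = sum(weights)
    n = len(weights)
    normalized = [w / total for w in weights]
    return sum(w * math.log(n * w) for w in normalized if w > 0.0)

even = [0.25, 0.25, 0.25, 0.25]      # illustrative weights
skewed = [0.97, 0.01, 0.01, 0.01]    # illustrative weights near collapse

print(effective_sample_size(even), relative_entropy_to_uniform(even))
print(effective_sample_size(skewed), relative_entropy_to_uniform(skewed))
```

A filter would typically trigger resampling when the first quantity falls below, or the second rises above, a user-chosen threshold.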

From the statistical and probabilistic point of view, particle filters may be interpreted as mean-field particle interpretations of Feynman-Kac probability measures.[7][8][9][10][11] These particle integration techniques were developed in molecular chemistry and computational physics by Theodore E. Harris and Herman Kahn in 1951, Marshall N. Rosenbluth and Arianna W. Rosenbluth in 1955,[12] and more recently by Jack H. Hetherington in 1984.[13] In computational physics, these Feynman-Kac type path particle integration methods are also used in Quantum Monte Carlo, and more specifically Diffusion Monte Carlo methods.[14][15][16] Feynman-Kac interacting particle methods are also strongly related to mutation-selection genetic algorithms currently used in evolutionary computation to solve complex optimization problems.

The particle filter methodology is used to solve Hidden Markov Model (HMM) and nonlinear filtering problems. With the notable exception of linear-Gaussian signal-observation models (Kalman filter) or wider classes of models (Benes filter[17]), Mireille Chaleyat-Maurel and Dominique Michel proved in 1984 that the sequence of posterior distributions of the random states of a signal, given the observations (a.k.a. optimal filter), has no finite recursion.[18] Various other numerical methods based on fixed grid approximations, Markov Chain Monte Carlo techniques, conventional linearization, extended Kalman filters, or determining the best linear system (in the expected cost-error sense) are unable to cope with large-scale systems, unstable processes, or insufficiently smooth nonlinearities.

Particle filters and Feynman-Kac particle methodologies find application in signal and image processing, Bayesian inference, machine learning, risk analysis and rare event sampling, engineering and robotics, artificial intelligence, bioinformatics,[19] phylogenetics, computational science, economics and mathematical finance, molecular chemistry, computational physics, pharmacokinetics, quantitative risk and insurance[20][21] and other fields.

History

Heuristic-like algorithms

From a statistical and probabilistic viewpoint, particle filters belong to the class of branching/genetic type algorithms, and mean-field type interacting particle methodologies. The interpretation of these particle methods depends on the scientific discipline. In Evolutionary Computing, mean-field genetic type particle methodologies are often used as heuristic and natural search algorithms (a.k.a. Metaheuristic). In computational physics and molecular chemistry, they are used to solve Feynman-Kac path integration problems or to compute Boltzmann-Gibbs measures, top eigenvalues, and ground states of Schrödinger operators. In Biology and Genetics, they represent the evolution of a population of individuals or genes in some environment.

The origins of mean-field type evolutionary computational techniques can be traced back to 1950 and 1954 with Alan Turing's work on genetic type mutation-selection learning machines[22] and the articles by Nils Aall Barricelli at the Institute for Advanced Study in Princeton, New Jersey.[23][24] The first trace of particle filters in statistical methodology dates back to the mid-1950s; the 'Poor Man's Monte Carlo',[25] that was proposed by John Hammersley et al., in 1954, contained hints of the genetic type particle filtering methods used today. In 1963, Nils Aall Barricelli simulated a genetic type algorithm to mimic the ability of individuals to play a simple game.[26] In evolutionary computing literature, genetic-type mutation-selection algorithms became popular through the seminal work of John Holland in the early 1970s, particularly his book[27] published in 1975.

In Biology and Genetics, the Australian geneticist Alex Fraser also published in 1957 a series of papers on the genetic type simulation of artificial selection of organisms.[28] The computer simulation of the evolution by biologists became more common in the early 1960s, and the methods were described in books by Fraser and Burnell (1970)[29] and Crosby (1973).[30] Fraser's simulations included all of the essential elements of modern mutation-selection genetic particle algorithms.

From the mathematical viewpoint, the conditional distribution of the random states of a signal given some partial and noisy observations is described by a Feynman-Kac probability on the random trajectories of the signal weighted by a sequence of likelihood potential functions.[7][8] Quantum Monte Carlo, and more specifically Diffusion Monte Carlo methods can also be interpreted as a mean-field genetic type particle approximation of Feynman-Kac path integrals.[7][8][9][13][14][31][32] The origins of Quantum Monte Carlo methods are often attributed to Enrico Fermi and Robert Richtmyer who developed in 1948 a mean-field particle interpretation of neutron chain reactions,[33] but the first heuristic-like and genetic type particle algorithm (a.k.a. Resampled or Reconfiguration Monte Carlo methods) for estimating ground state energies of quantum systems (in reduced matrix models) is due to Jack H. Hetherington in 1984.[13] One can also quote the earlier seminal works of Theodore E. Harris and Herman Kahn in particle physics, published in 1951, using mean-field but heuristic-like genetic methods for estimating particle transmission energies.[34] In molecular chemistry, the use of genetic heuristic-like particle methodologies (a.k.a. pruning and enrichment strategies) can be traced back to 1955 with the seminal work of Marshall N. Rosenbluth and Arianna W. Rosenbluth.[12]

The use of genetic particle algorithms in advanced signal processing and Bayesian inference is more recent. In January 1993, Genshiro Kitagawa developed a "Monte Carlo filter";[35] a slightly modified version of this article appeared in 1996.[36] In April 1993, Neil J. Gordon et al. published in their seminal work[37] an application of a genetic type algorithm in Bayesian statistical inference. The authors named their algorithm 'the bootstrap filter' and demonstrated that, compared to other filtering methods, it does not require any assumption about the state space or the noise of the system. Independent work on particle filters by Pierre Del Moral[2] and by Himilcon Carvalho, Pierre Del Moral, André Monin, and Gérard Salut[38] was published in the mid-1990s. Particle filters were also developed in signal processing as early as 1989-1992 by P. Del Moral, J.C. Noyer, G. Rigal, and G. Salut at the LAAS-CNRS in a series of restricted and classified research reports with STCAN (Service Technique des Constructions et Armes Navales), the IT company DIGILOG, and the LAAS-CNRS (the Laboratory for Analysis and Architecture of Systems) on RADAR/SONAR and GPS signal processing problems.[39][40][41][42][43][44]

Mathematical foundations

From 1950 to 1996, all the publications on particle filters, and genetic algorithms, including the pruning and resample Monte Carlo methods introduced in computational physics and molecular chemistry, present natural and heuristic-like algorithms applied to different situations without a single proof of their consistency, nor a discussion on the bias of the estimates and genealogical and ancestral tree-based algorithms.

The mathematical foundations and the first rigorous analysis of these particle algorithms are due to Pierre Del Moral[2][4] in 1996. The article[2] also contains proof of the unbiased properties of a particle approximation of likelihood functions and unnormalized conditional probability measures. The unbiased particle estimator of the likelihood functions presented in this article is used today in Bayesian statistical inference.

Dan Crisan, Jessica Gaines, and Terry Lyons,[45][46][47] as well as Pierre Del Moral and Terry Lyons,[48] created branching-type particle techniques with various population sizes around the end of the 1990s. P. Del Moral, A. Guionnet, and L. Miclo[8][49][50] made more advances in this subject in 2000. Pierre Del Moral and Alice Guionnet[51] proved the first central limit theorems in 1999, and Pierre Del Moral and Laurent Miclo[8] proved further ones in 2000. The first uniform convergence results concerning the time parameter for particle filters were developed at the end of the 1990s by Pierre Del Moral and Alice Guionnet.[49][50] The first rigorous analysis of genealogical tree-based particle filter smoothers is due to P. Del Moral and L. Miclo in 2001.[52]

The theory on Feynman-Kac particle methodologies and related particle filter algorithms was developed in 2000 and 2004 in the books.[8][5] These abstract probabilistic models encapsulate genetic type algorithms, particle, and bootstrap filters, interacting Kalman filters (a.k.a. Rao–Blackwellized particle filter[53]), importance sampling and resampling style particle filter techniques, including genealogical tree-based and particle backward methodologies for solving filtering and smoothing problems. Other classes of particle filtering methodologies include genealogical tree-based models,[10][5][54] backward Markov particle models,[10][55] adaptive mean-field particle models,[6] island-type particle models,[56][57] particle Markov chain Monte Carlo methodologies,[58][59] Sequential Monte Carlo samplers [60][61][62] and Sequential Monte Carlo Approximate Bayesian Computation methods[63] and Sequential Monte Carlo ABC based Bayesian Bootstrap.[64]

The filtering problem

Objective

A particle filter's goal is to estimate the posterior density of state variables given observation variables. The particle filter is intended for use with a hidden Markov Model, in which the system includes both hidden and observable variables. The observable variables (observation process) are linked to the hidden variables (state-process) via a known functional form. Similarly, the probabilistic description of the dynamical system defining the evolution of the state variables is known.

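A generic particle filter estimates the posterior distribution of the hidden states using the observation measurement process. With respect to a state-space such as the one below:

$$\begin{array}{cccccccccc} X_0 & \to & X_1 & \to & X_2 & \to & X_3 & \to & \cdots & \text{signal} \\ \downarrow & & \downarrow & & \downarrow & & \downarrow & & \cdots & \\ Y_0 & & Y_1 & & Y_2 & & Y_3 & & \cdots & \text{observation} \end{array}$$

the filtering problem is to estimate sequentially the values of the hidden states $X_k$, given the values of the observation process $Y_0, \cdots, Y_k$, at any time step k.

All Bayesian estimates of $X_k$ follow from the posterior density $p(x_k|y_0, y_1, \cdots, y_k)$. The particle filter methodology provides an approximation of these conditional probabilities using the empirical measure associated with a genetic type particle algorithm. In contrast, the Markov Chain Monte Carlo or importance sampling approach would model the full posterior $p(x_0, x_1, \cdots, x_k|y_0, y_1, \cdots, y_k)$.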

The Signal-Observation model

Particle methods often assume $X_k$ and the observations $Y_k$ can be modeled in this form:

  • $X_0, X_1, \cdots$ is a Markov process on $\mathbb{R}^{d_x}$ (for some $d_x \geq 1$) that evolves according to the transition probability density $p(x_k|x_{k-1})$. This model is also often written in a synthetic way as
$$X_k \mid X_{k-1} = x_{k-1} \sim p(x_k|x_{k-1})$$
with an initial probability density $p(x_0)$.
  • The observations $Y_0, Y_1, \cdots$ take values in some state space on $\mathbb{R}^{d_y}$ (for some $d_y \geq 1$) and are conditionally independent provided that $X_0, X_1, \cdots$ are known. In other words, each $Y_k$ only depends on $X_k$. In addition, we assume the conditional distributions of $Y_k$ given $X_k = x_k$ are absolutely continuous, and in a synthetic way we have
$$Y_k \mid X_k = x_k \sim p(y_k|x_k)$$

An example of a system with these properties is:

$$X_k = g(X_{k-1}) + W_{k-1}$$
$$Y_k = h(X_k) + V_k$$

where both $W_k$ and $V_k$ are mutually independent sequences with known probability density functions and g and h are known functions. These two equations can be viewed as state space equations and look similar to the state space equations for the Kalman filter. If the functions g and h in the above example are linear, and if both $W_k$ and $V_k$ are Gaussian, the Kalman filter finds the exact Bayesian filtering distribution. If not, Kalman filter-based methods are a first-order approximation (EKF) or a second-order approximation (UKF in general, but if the probability distribution is Gaussian a third-order approximation is possible).
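As an illustration of such an additive-noise system (the next state is g applied to the previous state plus noise, and each observation is h applied to the current state plus noise), the following sketch simulates one trajectory. The particular functions g and h, the Gaussian noise, and all parameter values are assumptions chosen for the example; they do not come from the article.

```python
import random

def simulate(g, h, n_steps, x0, sigma_w, sigma_v, rng):
    """Simulate states x_k = g(x_{k-1}) + w and observations y_k = h(x_k) + v."""
    states, observations = [], []
    x = x0
    for k in range(n_steps):
        if k > 0:
            x = g(x) + rng.gauss(0.0, sigma_w)   # signal transition
        states.append(x)
        observations.append(h(x) + rng.gauss(0.0, sigma_v))  # noisy sensor
    return states, observations

def g(x):
    # Hypothetical nonlinear dynamics, for illustration only.
    return 0.5 * x + 25.0 * x / (1.0 + x * x)

def h(x):
    # Hypothetical nonlinear sensor, for illustration only.
    return x * x / 20.0

rng = random.Random(0)
states, observations = simulate(g, h, 50, 0.1, 1.0, 1.0, rng)
print(len(states), len(observations))
```

Because h is not injective here, the observations alone cannot determine the sign of the state, which is the kind of situation where a linearized Kalman-type filter tends to struggle.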

The assumption that the initial distribution and the transitions of the Markov chain are absolutely continuous with respect to the Lebesgue measure can be relaxed. To design a particle filter we simply need to assume that we can sample the transitions $X_{k-1} \to X_k$ of the Markov chain $X_k$ and compute the likelihood function $x_k \mapsto p(y_k|x_k)$ (see for instance the genetic selection mutation description of the particle filter given below). The absolute continuity assumption on the Markov transitions of $X_k$ is only used to derive in an informal (and rather abusive) way different formulae between posterior distributions using Bayes' rule for conditional densities.

Approximate Bayesian computation models

In certain problems, the conditional distribution of observations, given the random states of the signal, may fail to have a density; the latter may be impossible or too complex to compute.[19] In this situation, an additional level of approximation is necessary. One strategy is to replace the signal $X_k$ by the Markov chain $\mathcal{X}_k = (X_k, Y_k)$ and to introduce a virtual observation of the form

$$\mathcal{Y}_k = Y_k + \epsilon \mathcal{V}_k \quad \text{for some parameter } \epsilon \in [0,1]$$

for some sequence of independent random variables $\mathcal{V}_k$ with known probability density functions. The central idea is to observe that

$$\text{Law}\left(X_k \mid \mathcal{Y}_0 = y_0, \cdots, \mathcal{Y}_k = y_k\right) \approx_{\epsilon \downarrow 0} \text{Law}\left(X_k \mid Y_0 = y_0, \cdots, Y_k = y_k\right)$$

The particle filter associated with the Markov process $\mathcal{X}_k = (X_k, Y_k)$ given the partial observations $\mathcal{Y}_0 = y_0, \cdots, \mathcal{Y}_k = y_k$ is defined in terms of particles evolving in $\mathbb{R}^{d_x + d_y}$ with a likelihood function given with some obvious abusive notation by $p(\mathcal{Y}_k|\mathcal{X}_k)$. These probabilistic techniques are closely related to Approximate Bayesian Computation (ABC). In the context of particle filters, these ABC particle filtering techniques were introduced in 1998 by P. Del Moral, J. Jacod and P. Protter.[65] They were further developed by P. Del Moral, A. Doucet and A. Jasra.[66][67]

The nonlinear filtering equation

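Bayes' rule for conditional probability gives:

$$p(x_0, \cdots, x_k|y_0, \cdots, y_k) = \frac{p(y_0, \cdots, y_k|x_0, \cdots, x_k)\, p(x_0, \cdots, x_k)}{p(y_0, \cdots, y_k)}$$

where

$$p(y_0, \cdots, y_k) = \int p(y_0, \cdots, y_k|x_0, \cdots, x_k)\, p(x_0, \cdots, x_k)\, dx_0 \cdots dx_k$$
$$p(y_0, \cdots, y_k|x_0, \cdots, x_k) = \prod_{l=0}^{k} p(y_l|x_l)$$
$$p(x_0, \cdots, x_k) = p(x_0) \prod_{l=1}^{k} p(x_l|x_{l-1})$$

Particle filters are also an approximation, but with enough particles they can be much more accurate.[2][4][5][49][50] The nonlinear filtering equation is given by the recursion

$$p(x_k|y_0, \cdots, y_{k-1}) \xrightarrow{\text{updating}} p(x_k|y_0, \cdots, y_k) = \frac{p(y_k|x_k)\, p(x_k|y_0, \cdots, y_{k-1})}{\int p(y_k|x'_k)\, p(x'_k|y_0, \cdots, y_{k-1})\, dx'_k}$$
$$\xrightarrow{\text{prediction}} p(x_{k+1}|y_0, \cdots, y_k) = \int p(x_{k+1}|x_k)\, p(x_k|y_0, \cdots, y_k)\, dx_k \qquad \text{(Eq. 1)}$$

with the convention $p(x_0|y_0, \cdots, y_{k-1}) = p(x_0)$ for k = 0. The nonlinear filtering problem consists in computing these conditional distributions sequentially.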
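When the state space is finite, the updating and prediction steps of this recursion reduce to elementwise products and matrix-vector sums, so the optimal filter can be computed exactly. The sketch below is a toy illustration under that assumption; the two-state transition matrix, the likelihood table, and the observation sequence are invented for the example.

```python
def update(predicted, likelihood):
    """Updating step: multiply the predictor by p(y_k | x_k) and renormalize."""
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def predict(filtered, transition):
    """Prediction step: sum over x_k of p(x_{k+1} | x_k) * p(x_k | y_0..y_k)."""
    n = len(filtered)
    return [sum(filtered[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state model: transition[i][j] = p(x_{k+1} = j | x_k = i).
transition = [[0.9, 0.1], [0.2, 0.8]]
# likelihoods[y][i] = p(y | x = i) for two possible observation symbols.
likelihoods = {"a": [0.7, 0.1], "b": [0.3, 0.9]}

predictor = [0.5, 0.5]  # initial distribution p(x_0), used as the k = 0 predictor
for y in ["a", "a", "b"]:
    filtered = update(predictor, likelihoods[y])
    predictor = predict(filtered, transition)
print(filtered, predictor)
```

Particle filters replace these exact sums by averages over random samples, which is what makes the same recursion usable when the state space is continuous.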

Feynman-Kac formulation

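We fix a time horizon n and a sequence of observations $Y_0 = y_0, \cdots, Y_n = y_n$, and for each k = 0, ..., n we set:

$$G_k(x_k) = p(y_k|x_k).$$

In this notation, for any bounded function F on the set of trajectories of $X_k$ from the origin k = 0 up to time k = n, we have the Feynman-Kac formula

$$\int F(x_0, \cdots, x_n)\, p(x_0, \cdots, x_n|y_0, \cdots, y_n)\, dx_0 \cdots dx_n = \frac{E\left(F(X_0, \cdots, X_n) \prod_{k=0}^{n} G_k(X_k)\right)}{E\left(\prod_{k=0}^{n} G_k(X_k)\right)}$$

Feynman-Kac path integration models arise in a variety of scientific disciplines, including computational physics, biology, information theory and computer science.[8][10][5] Their interpretations depend on the application domain. For instance, if we choose the indicator function $G_n(x_n) = 1_A(x_n)$ of some subset A of the state space, they represent the conditional distribution of a Markov chain given it stays in a given tube; that is, we have:

$$E\left(F(X_0, \cdots, X_n) \mid X_0 \in A, \cdots, X_n \in A\right) = \frac{E\left(F(X_0, \cdots, X_n) \prod_{k=0}^{n} 1_A(X_k)\right)}{E\left(\prod_{k=0}^{n} 1_A(X_k)\right)}$$

and

$$P\left(X_0 \in A, \cdots, X_n \in A\right) = E\left(\prod_{k=0}^{n} 1_A(X_k)\right)$$

as soon as the normalizing constant is strictly positive.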

Particle filters

A Genetic type particle algorithm

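Initially, such an algorithm starts with N independent random variables $\left(\xi_0^i\right)_{1 \leq i \leq N}$ with common probability density $p(x_0)$. The genetic algorithm selection-mutation transitions[2][4]

$$\xi_k := \left(\xi_k^i\right)_{1 \leq i \leq N} \xrightarrow{\text{selection}} \hat\xi_k := \left(\hat\xi_k^i\right)_{1 \leq i \leq N} \xrightarrow{\text{mutation}} \xi_{k+1} := \left(\xi_{k+1}^i\right)_{1 \leq i \leq N}$$

mimic/approximate the updating-prediction transitions of the optimal filter evolution (Eq. 1):

  • During the selection-updating transition we sample N (conditionally) independent random variables $\hat\xi_k := \left(\hat\xi_k^i\right)_{1 \leq i \leq N}$ with common (conditional) distribution

$$\sum_{i=1}^{N} \frac{p(y_k|\xi_k^i)}{\sum_{j=1}^{N} p(y_k|\xi_k^j)}\, \delta_{\xi_k^i}(dx_k)$$

where $\delta_a$ stands for the Dirac measure at a given state a.

  • During the mutation-prediction transition, from each selected particle $\hat\xi_k^i$ we sample independently a transition

$$\hat\xi_k^i \longrightarrow \xi_{k+1}^i \sim p(x_{k+1}|\hat\xi_k^i), \qquad i = 1, \cdots, N.$$

In the above displayed formulae $p(y_k|\xi_k^i)$ stands for the likelihood function $x_k \mapsto p(y_k|x_k)$ evaluated at $x_k = \xi_k^i$, and $p(x_{k+1}|\hat\xi_k^i)$ stands for the conditional density $p(x_{k+1}|x_k)$ evaluated at $x_k = \hat\xi_k^i$.

At each time k, we have the particle approximations

$$\hat p(dx_k|y_0, \cdots, y_k) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\hat\xi_k^i}(dx_k) \approx_{N \uparrow \infty} p(dx_k|y_0, \cdots, y_k) \approx_{N \uparrow \infty} \sum_{i=1}^{N} \frac{p(y_k|\xi_k^i)}{\sum_{j=1}^{N} p(y_k|\xi_k^j)}\, \delta_{\xi_k^i}(dx_k)$$

and

$$\hat p(dx_k|y_0, \cdots, y_{k-1}) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_k^i}(dx_k) \approx_{N \uparrow \infty} p(dx_k|y_0, \cdots, y_{k-1})$$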

In the genetic algorithms and evolutionary computing community, the mutation-selection Markov chain described above is often called the genetic algorithm with proportional selection. Several branching variants, including ones with random population sizes, have also been proposed in the articles.[5][45][48]
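A minimal sketch of one selection-mutation transition with proportional selection is given below. It assumes a scalar state, a Gaussian likelihood, and a Gaussian random-walk transition; these modeling choices and all function names are illustrative assumptions, not part of the article.

```python
import math
import random

def selection_mutation_step(particles, y, likelihood, sample_transition, rng):
    """Selection (updating) followed by mutation (prediction) for one time step."""
    weights = [likelihood(y, x) for x in particles]
    # Selection: draw N particles with probability proportional to p(y_k | particle).
    selected = rng.choices(particles, weights=weights, k=len(particles))
    # Mutation: each selected particle evolves independently with the signal transition.
    mutated = [sample_transition(x, rng) for x in selected]
    return selected, mutated

def gaussian_likelihood(y, x, sigma_v=1.0):
    # Unnormalized Gaussian likelihood; the constant cancels in the selection step.
    return math.exp(-0.5 * ((y - x) / sigma_v) ** 2)

def random_walk(x, rng, sigma_w=0.5):
    return x + rng.gauss(0.0, sigma_w)

rng = random.Random(42)
particles = [rng.gauss(0.0, 2.0) for _ in range(2000)]   # samples from p(x_0)
selected, particles = selection_mutation_step(
    particles, 1.0, gaussian_likelihood, random_walk, rng)
filter_mean = sum(selected) / len(selected)
print(filter_mean)
```

For this toy linear-Gaussian setup the exact posterior mean after observing y = 1 is 0.8, so the average of the selected particles should land close to that value for large N.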

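Particle methods, like all sampling-based approaches (e.g., Markov Chain Monte Carlo), generate a set of samples that approximate the filtering density

$$p(x_k|y_0, \cdots, y_k).$$

For example, we may have N samples from the approximate posterior distribution of $X_k$, where the samples are labeled with superscripts as:

$$\hat\xi_k^1, \cdots, \hat\xi_k^N.$$

Then, expectations with respect to the filtering distribution are approximated by

$$\int f(x_k)\, p(x_k|y_0, \cdots, y_k)\, dx_k \approx_{N \uparrow \infty} \frac{1}{N} \sum_{i=1}^{N} f\left(\hat\xi_k^i\right) = \int f(x_k)\, \hat p(dx_k|y_0, \cdots, y_k) \qquad \text{(Eq. 2)}$$

with

$$\hat p(dx_k|y_0, \cdots, y_k) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\hat\xi_k^i}(dx_k)$$

where $\delta_a$ stands for the Dirac measure at a given state a. The function f, in the usual way for Monte Carlo, can give all the moments etc. of the distribution up to some approximation error. When the approximation equation (Eq. 2) is satisfied for any bounded function f we write

$$p(dx_k|y_0, \cdots, y_k) := p(x_k|y_0, \cdots, y_k)\, dx_k \approx_{N \uparrow \infty} \hat p(dx_k|y_0, \cdots, y_k)$$

Particle filters can be interpreted as a genetic type particle algorithm evolving with mutation and selection transitions. We can keep track of the ancestral lines

$$\left(\hat\xi_{0,k}^i, \hat\xi_{1,k}^i, \cdots, \hat\xi_{k-1,k}^i, \hat\xi_{k,k}^i\right)$$

of the particles $i = 1, \cdots, N$. The random states $\hat\xi_{l,k}^i$, with the lower indices l=0,...,k, stand for the ancestor of the individual $\hat\xi_{k,k}^i = \hat\xi_k^i$ at level l=0,...,k. In this situation, we have the approximation formula

$$\int F(x_0, \cdots, x_k)\, p(x_0, \cdots, x_k|y_0, \cdots, y_k)\, dx_0 \cdots dx_k \approx_{N \uparrow \infty} \frac{1}{N} \sum_{i=1}^{N} F\left(\hat\xi_{0,k}^i, \hat\xi_{1,k}^i, \cdots, \hat\xi_{k,k}^i\right) \qquad \text{(Eq. 3)}$$

with the empirical measure

$$\hat p(d(x_0, \cdots, x_k)|y_0, \cdots, y_k) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\left(\hat\xi_{0,k}^i, \hat\xi_{1,k}^i, \cdots, \hat\xi_{k,k}^i\right)}(d(x_0, \cdots, x_k))$$

Here F stands for any bounded function on the path space of the signal. In a more synthetic form (Eq. 3) is equivalent to

$$p(d(x_0, \cdots, x_k)|y_0, \cdots, y_k) := p(x_0, \cdots, x_k|y_0, \cdots, y_k)\, dx_0 \cdots dx_k \approx_{N \uparrow \infty} \hat p(d(x_0, \cdots, x_k)|y_0, \cdots, y_k)$$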

Particle filters can be interpreted in many different ways. From the probabilistic point of view they coincide with a mean-field particle interpretation of the nonlinear filtering equation. The updating-prediction transitions of the optimal filter evolution can also be interpreted as the classical genetic type selection-mutation transitions of individuals. The sequential importance resampling technique provides another interpretation of the filtering transitions coupling importance sampling with the bootstrap resampling step. Last, but not least, particle filters can be seen as an acceptance-rejection methodology equipped with a recycling mechanism.[10][5]

The general probabilistic principle

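The nonlinear filtering evolution can be interpreted as a dynamical system in the set of probability measures of the form $\eta_{n+1} = \Phi_{n+1}(\eta_n)$ where $\Phi_{n+1}$ stands for some mapping from the set of probability distributions into itself. For instance, the evolution of the one-step optimal predictor $\eta_n(dx_n) = p(x_n|y_0, \cdots, y_{n-1})\, dx_n$

satisfies a nonlinear evolution starting with the probability distribution $\eta_0(dx_0) = p(x_0)\, dx_0$. One of the simplest ways to approximate these probability measures is to start with N independent random variables $\left(\xi_0^i\right)_{1 \leq i \leq N}$ with common probability distribution $\eta_0(dx_0) = p(x_0)\, dx_0$. Suppose we have defined a sequence of N random variables $\left(\xi_n^i\right)_{1 \leq i \leq N}$ such that

$$\frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_n^i}(dx_n) \approx_{N \uparrow \infty} \eta_n(dx_n)$$

At the next step we sample N (conditionally) independent random variables $\xi_{n+1} := \left(\xi_{n+1}^i\right)_{1 \leq i \leq N}$ with common law

$$\Phi_{n+1}\left(\frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_n^i}\right) \approx_{N \uparrow \infty} \Phi_{n+1}(\eta_n) = \eta_{n+1}.$$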

A particle interpretation of the filtering equation

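We illustrate this mean-field particle principle in the context of the evolution of the one step optimal predictors

$$p(x_k|y_0, \cdots, y_{k-1})\, dx_k \to p(x_{k+1}|y_0, \cdots, y_k) = \int p(x_{k+1}|x'_k)\, \frac{p(y_k|x'_k)\, p(x'_k|y_0, \cdots, y_{k-1})\, dx'_k}{\int p(y_k|x''_k)\, p(x''_k|y_0, \cdots, y_{k-1})\, dx''_k} \qquad \text{(Eq. 4)}$$

For k = 0 we use the convention $p(x_0|y_0, \cdots, y_{-1}) := p(x_0)$.

By the law of large numbers, we have

$$\hat p(dx_0) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_0^i}(dx_0) \approx_{N \uparrow \infty} p(x_0)\, dx_0$$

in the sense that

$$\int f(x_0)\, \hat p(dx_0) = \frac{1}{N} \sum_{i=1}^{N} f(\xi_0^i) \approx_{N \uparrow \infty} \int f(x_0)\, p(x_0)\, dx_0$$

for any bounded function f. We further assume that we have constructed a sequence of particles $\left(\xi_k^i\right)_{1 \leq i \leq N}$ at some rank k such that

$$\hat p(dx_k|y_0, \cdots, y_{k-1}) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_k^i}(dx_k) \approx_{N \uparrow \infty} p(x_k|y_0, \cdots, y_{k-1})\, dx_k$$

in the sense that for any bounded function f we have

$$\int f(x_k)\, \hat p(dx_k|y_0, \cdots, y_{k-1}) = \frac{1}{N} \sum_{i=1}^{N} f(\xi_k^i) \approx_{N \uparrow \infty} \int f(x_k)\, p(x_k|y_0, \cdots, y_{k-1})\, dx_k$$

In this situation, replacing $p(x_k|y_0, \cdots, y_{k-1})\, dx_k$ by the empirical measure $\hat p(dx_k|y_0, \cdots, y_{k-1})$ in the evolution equation of the one-step optimal filter stated in (Eq. 4) we find that

$$p(x_{k+1}|y_0, \cdots, y_k) \approx_{N \uparrow \infty} \int p(x_{k+1}|x'_k)\, \frac{p(y_k|x'_k)\, \hat p(dx'_k|y_0, \cdots, y_{k-1})}{\int p(y_k|x''_k)\, \hat p(dx''_k|y_0, \cdots, y_{k-1})}$$

Notice that the right hand side in the above formula is a weighted probability mixture

$$\int p(x_{k+1}|x'_k)\, \frac{p(y_k|x'_k)\, \hat p(dx'_k|y_0, \cdots, y_{k-1})}{\int p(y_k|x''_k)\, \hat p(dx''_k|y_0, \cdots, y_{k-1})} = \sum_{i=1}^{N} \frac{p(y_k|\xi_k^i)}{\sum_{j=1}^{N} p(y_k|\xi_k^j)}\, p(x_{k+1}|\xi_k^i) =: \hat q(x_{k+1}|y_0, \cdots, y_k)$$

where $p(y_k|\xi_k^i)$ stands for the density $p(y_k|x_k)$ evaluated at $x_k = \xi_k^i$, and $p(x_{k+1}|\xi_k^i)$ stands for the density $p(x_{k+1}|x_k)$ evaluated at $x_k = \xi_k^i$ for $i = 1, \cdots, N$.

Then, we sample N independent random variables $\left(\xi_{k+1}^i\right)_{1 \leq i \leq N}$ with common probability density $\hat q(x_{k+1}|y_0, \cdots, y_k)$ so that

$$\hat p(dx_{k+1}|y_0, \cdots, y_k) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_{k+1}^i}(dx_{k+1}) \approx_{N \uparrow \infty} \hat q(x_{k+1}|y_0, \cdots, y_k)\, dx_{k+1} \approx_{N \uparrow \infty} p(x_{k+1}|y_0, \cdots, y_k)\, dx_{k+1}$$

Iterating this procedure, we design a Markov chain such that

$$\hat p(dx_k|y_0, \cdots, y_{k-1}) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_k^i}(dx_k) \approx_{N \uparrow \infty} p(dx_k|y_0, \cdots, y_{k-1}) := p(x_k|y_0, \cdots, y_{k-1})\, dx_k$$

Notice that the optimal filter is approximated at each time step k using the Bayes' formulae

$$p(dx_k|y_0, \cdots, y_k) \approx_{N \uparrow \infty} \frac{p(y_k|x_k)\, \hat p(dx_k|y_0, \cdots, y_{k-1})}{\int p(y_k|x'_k)\, \hat p(dx'_k|y_0, \cdots, y_{k-1})} = \sum_{i=1}^{N} \frac{p(y_k|\xi_k^i)}{\sum_{j=1}^{N} p(y_k|\xi_k^j)}\, \delta_{\xi_k^i}(dx_k)$$

The terminology "mean-field approximation" comes from the fact that we replace at each time step the probability measure $p(dx_k|y_0, \cdots, y_{k-1})$ by the empirical approximation $\hat p(dx_k|y_0, \cdots, y_{k-1})$. The mean-field particle approximation of the filtering problem is far from being unique. Several strategies are developed in the books.[10][5]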

Some convergence results

The analysis of the convergence of particle filters was started in 1996[2][4] and continued in 2000 in the book[8] and in a series of articles.[48][49][50][51][52][68][69] More recent developments can be found in the books.[10][5] When the filtering equation is stable (in the sense that it corrects any erroneous initial condition), the bias and the variance of the particle estimates

are controlled by the non asymptotic uniform estimates

for any function f bounded by 1, and for some finite constants In addition, for any :

for some finite constants related to the asymptotic bias and variance of the particle estimate, and some finite constant c. The same results are satisfied if we replace the one step optimal predictor by the optimal filter approximation.

Genealogical trees and Unbiasedness properties

Genealogical tree based particle smoothing

Tracing back in time the ancestral lines

of the individuals and at every time step k, we also have the particle approximations

These empirical approximations are equivalent to the particle integral approximations

for any bounded function F on the random trajectories of the signal. As shown in[54] the evolution of the genealogical tree coincides with a mean-field particle interpretation of the evolution equations associated with the posterior densities of the signal trajectories. For more details on these path space models, we refer to the books.[10][5]

Unbiased particle estimates of likelihood functions

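We use the product formula

$$p(y_0, \cdots, y_n) = \prod_{k=0}^{n} p(y_k|y_0, \cdots, y_{k-1})$$

with

$$p(y_k|y_0, \cdots, y_{k-1}) = \int p(y_k|x_k)\, p(x_k|y_0, \cdots, y_{k-1})\, dx_k$$

and the conventions $p(y_0|y_0, \cdots, y_{-1}) = p(y_0)$ and $p(x_0|y_0, \cdots, y_{-1}) = p(x_0)$ for k = 0. Replacing $p(x_k|y_0, \cdots, y_{k-1})\, dx_k$ by the empirical approximation

$$\hat p(dx_k|y_0, \cdots, y_{k-1}) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_k^i}(dx_k) \approx_{N \uparrow \infty} p(dx_k|y_0, \cdots, y_{k-1})$$

in the above displayed formula, we design the following unbiased particle approximation of the likelihood function

$$p(y_0, \cdots, y_n) \approx_{N \uparrow \infty} \hat p(y_0, \cdots, y_n) = \prod_{k=0}^{n} \hat p(y_k|y_0, \cdots, y_{k-1})$$

with

$$\hat p(y_k|y_0, \cdots, y_{k-1}) = \int p(y_k|x_k)\, \hat p(dx_k|y_0, \cdots, y_{k-1}) = \frac{1}{N} \sum_{i=1}^{N} p(y_k|\xi_k^i)$$

where $p(y_k|\xi_k^i)$ stands for the density $p(y_k|x_k)$ evaluated at $x_k = \xi_k^i$. The design of this particle estimate and its unbiasedness property were proved in 1996 in the article.[2] Refined variance estimates can be found in[5] and.[10]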
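The likelihood estimate described above is the product, over time steps, of the average likelihood of the predicted particles. The sketch below accumulates it in log form inside a selection-mutation loop. The scalar Gaussian model and all function names are assumptions made for the example; note also that the unbiasedness property applies to the likelihood estimate itself, not to its logarithm.

```python
import math
import random

def particle_log_likelihood(observations, init_sampler, sample_transition,
                            likelihood, n_particles, rng):
    """Sum over k of log( (1/N) * sum_i p(y_k | particle_i) )."""
    particles = [init_sampler(rng) for _ in range(n_particles)]
    log_estimate = 0.0
    for y in observations:
        weights = [likelihood(y, x) for x in particles]
        log_estimate += math.log(sum(weights) / n_particles)
        # Selection then mutation, to obtain the next predicted particles.
        selected = rng.choices(particles, weights=weights, k=n_particles)
        particles = [sample_transition(x, rng) for x in selected]
    return log_estimate

def gaussian_density(y, x):
    # Normalized density of y given x with unit observation variance (assumed model).
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)

def init_sampler(rng):
    return rng.gauss(0.0, 2.0)          # assumed prior: N(0, 4)

def random_walk(x, rng):
    return x + rng.gauss(0.0, 0.5)      # assumed transition

rng = random.Random(3)
log_lik = particle_log_likelihood([1.0], init_sampler, random_walk,
                                  gaussian_density, 5000, rng)
exact = math.log(math.exp(-0.1) / math.sqrt(10.0 * math.pi))  # log N(1; 0, 5)
print(log_lik, exact)
```

With a single observation the exact marginal likelihood is available in closed form for this toy model, which gives a simple sanity check on the estimator.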

Backward particle smoothers

Using Bayes' rule, we have the formula

Notice that

This implies that

Replacing the one-step optimal predictors by the particle empirical measures

we find that

We conclude that

with the backward particle approximation

The probability measure

is the probability of the random paths of a Markov chain running backward in time from time k=n to time k=0, and evolving at each time step k in the state space associated with the population of particles

  • Initially (at time k=n) the chain chooses randomly a state with the distribution
  • From time k to the time (k-1), the chain starting at some state for some at time k moves at time (k-1) to a random state chosen with the discrete weighted probability

In the above displayed formula, stands for the conditional distribution evaluated at . In the same vein, and stand for the conditional densities and evaluated at and These models allows to reduce integration with respect to the densities in terms of matrix operations with respect to the Markov transitions of the chain described above.[55] For instance, for any function we have the particle estimates

where

This also shows that if

then

Particle smoothing can also be achieved in a single online pass through a fixed-lag approximation[70].

Some convergence results

We shall assume that the filtering equation is stable, in the sense that it corrects any erroneous initial condition.

In this situation, the particle approximations of the likelihood functions are unbiased and the relative variance is controlled by

for some finite constant c. In addition, for any :

for some finite constants related to the asymptotic bias and variance of the particle estimate, and for some finite constant c.

The bias and the variance of the particle estimates based on the ancestral lines of the genealogical trees

are controlled by the non asymptotic uniform estimates

for any function F bounded by 1, and for some finite constants In addition, for any :

for some finite constants related to the asymptotic bias and variance of the particle estimate, and for some finite constant c. The same type of bias and variance estimates hold for the backward particle smoothers. For additive functionals of the form

with

with functions bounded by 1, we have

and

for some finite constants More refined estimates including exponentially small probability of errors are developed in.[10]

Sequential Importance Resampling (SIR)

Monte Carlo filter and bootstrap filter

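Sequential importance resampling (SIR), Monte Carlo filtering (Kitagawa 1993[35]), the bootstrap filtering algorithm (Gordon et al. 1993[37]) and single distribution resampling (Bejuri W.M.Y.B et al. 2017[71]) are also commonly applied filtering algorithms, which approximate the filtering probability density $p(x_k|y_0, \cdots, y_k)$ by a weighted set of N samples

$$\left\{\left(w_k^{(i)}, x_k^{(i)}\right) : i \in \{1, \cdots, N\}\right\}.$$

The importance weights $w_k^{(i)}$ are approximations to the relative posterior probabilities (or densities) of the samples such that

$$\sum_{i=1}^{N} w_k^{(i)} = 1.$$

Sequential importance sampling (SIS) is a sequential (i.e., recursive) version of importance sampling. As in importance sampling, the expectation of a function f can be approximated as a weighted average

$$\int f(x_k)\, p(x_k|y_0, \cdots, y_k)\, dx_k \approx \sum_{i=1}^{N} w_k^{(i)} f\left(x_k^{(i)}\right).$$

For a finite set of samples, the algorithm performance is dependent on the choice of the proposal distribution

$$\pi(x_k|x_{0:k-1}, y_{0:k}).$$

The "optimal" proposal distribution is given as the target distribution

$$\pi(x_k|x_{0:k-1}, y_{0:k}) = p(x_k|x_{k-1}, y_k) = \frac{p(y_k|x_k)}{\int p(y_k|x_k)\, p(x_k|x_{k-1})\, dx_k}\, p(x_k|x_{k-1}).$$

This particular choice of proposal transition was proposed by P. Del Moral in 1996 and 1998.[4] When it is difficult to sample transitions according to the distribution $p(x_k|x_{k-1}, y_k)$, one natural strategy is to use the following particle approximation

$$\frac{p(y_k|x_k)}{\int p(y_k|x_k)\, p(x_k|x_{k-1})\, dx_k}\, p(x_k|x_{k-1})\, dx_k \simeq_{N \uparrow \infty} \frac{p(y_k|x_k)}{\int p(y_k|x_k)\, \hat p(dx_k|x_{k-1})}\, \hat p(dx_k|x_{k-1}) = \sum_{i=1}^{N} \frac{p(y_k|X_k^i(x_{k-1}))}{\sum_{j=1}^{N} p(y_k|X_k^j(x_{k-1}))}\, \delta_{X_k^i(x_{k-1})}(dx_k)$$

with the empirical approximation

$$\hat p(dx_k|x_{k-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_k^i(x_{k-1})}(dx_k)$$

associated with N (or any other large number of samples) independent random samples $X_k^i(x_{k-1})$, $i = 1, \cdots, N$, with the conditional distribution of the random state $X_k$ given $X_{k-1} = x_{k-1}$. The consistency of the resulting particle filter of this approximation and other extensions are developed in.[4] In the above display $\delta_a$ stands for the Dirac measure at a given state a.

However, the transition prior probability distribution is often used as importance function, since it is easier to draw particles (or samples) and perform subsequent importance weight calculations:

$$\pi(x_k|x_{0:k-1}, y_{0:k}) = p(x_k|x_{k-1}).$$

Sequential Importance Resampling (SIR) filters with transition prior probability distribution as importance function are commonly known as bootstrap filter and condensation algorithm.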

Resampling is used to avoid the problem of the degeneracy of the algorithm, that is, to avoid the situation in which all but one of the importance weights are close to zero. The performance of the algorithm can also be affected by the choice of resampling method. The stratified sampling proposed by Kitagawa (1993[35]) is optimal in terms of variance.
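As a rough illustration of a stratified resampling scheme, the sketch below splits the unit interval into N equal strata, draws one uniform variable in each, and maps it through the cumulative weights. This is one common formulation; whether it matches the cited proposal in every detail is an assumption, and the function name is hypothetical.

```python
import random
from itertools import accumulate

def stratified_resample(weights, rng):
    """Return N ancestor indices, using one uniform draw per stratum [i/N, (i+1)/N)."""
    n = len(weights)
    total = sum(weights)
    cumulative = [c / total for c in accumulate(weights)]
    indices, j = [], 0
    for i in range(n):
        u = (i + rng.random()) / n
        # Advance to the first index whose cumulative weight exceeds u.
        while j < n - 1 and cumulative[j] <= u:
            j += 1
        indices.append(j)
    return indices

rng = random.Random(0)
print(stratified_resample([0.25, 0.25, 0.25, 0.25], rng))
print(stratified_resample([0.1, 0.6, 0.2, 0.1], rng))
```

Compared with drawing N independent multinomial indices, the number of copies of each particle under this scheme can deviate much less from N times its weight, which is why it tends to add less noise.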

A single step of sequential importance resampling is as follows:

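1) For $i = 1, \cdots, N$ draw samples from the proposal distribution
$$x_k^{(i)} \sim \pi(x_k|x_{0:k-1}^{(i)}, y_{0:k})$$
2) For $i = 1, \cdots, N$ update the importance weights up to a normalizing constant:
$$\hat w_k^{(i)} = w_{k-1}^{(i)}\, \frac{p(y_k|x_k^{(i)})\, p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{0:k-1}^{(i)}, y_{0:k})}.$$
Note that when we use the transition prior probability distribution as the importance function,
$$\pi(x_k^{(i)}|x_{0:k-1}^{(i)}, y_{0:k}) = p(x_k^{(i)}|x_{k-1}^{(i)}),$$
this simplifies to the following:
$$\hat w_k^{(i)} = w_{k-1}^{(i)}\, p(y_k|x_k^{(i)}),$$
3) For $i = 1, \cdots, N$ compute the normalized importance weights:
$$w_k^{(i)} = \frac{\hat w_k^{(i)}}{\sum_{j=1}^{N} \hat w_k^{(j)}}$$
4) Compute an estimate of the effective number of particles as
$$\hat N_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N} \left(w_k^{(i)}\right)^2}$$
This criterion reflects the variance of the weights. Other criteria can be found in the article,[6] including their rigorous analysis and central limit theorems.
5) If the effective number of particles is less than a given threshold $\hat N_{\mathrm{eff}} < N_{thr}$, then perform resampling:
a) Draw N particles from the current particle set with probabilities proportional to their weights. Replace the current particle set with this new one.
b) For $i = 1, \cdots, N$ set $w_k^{(i)} = 1/N.$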
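The five steps above can be sketched as follows for the special case where the transition prior is used as the proposal, so that the weight update is the previous weight times the likelihood. The scalar random-walk model, the Gaussian likelihood, the threshold of half the particle count, and the function names are all assumptions made for illustration.

```python
import math
import random

def sir_step(particles, weights, y, sample_transition, likelihood, rng,
             threshold_ratio=0.5):
    """One SIR step with the transition prior as proposal (bootstrap-style filter)."""
    n = len(particles)
    # 1) Draw each new particle from the transition prior.
    particles = [sample_transition(x, rng) for x in particles]
    # 2) Weight update: previous weight times p(y_k | x_k).
    unnormalized = [w * likelihood(y, x) for w, x in zip(weights, particles)]
    # 3) Normalize the weights.
    total = sum(unnormalized)
    weights = [w / total for w in unnormalized]
    # 4) Effective number of particles.
    n_eff = 1.0 / sum(w * w for w in weights)
    # 5) Resample and reset the weights if n_eff is below the threshold.
    if n_eff < threshold_ratio * n:
        particles = rng.choices(particles, weights=weights, k=n)
        weights = [1.0 / n] * n
    return particles, weights, n_eff

def random_walk(x, rng):
    return x + rng.gauss(0.0, 1.0)

def gaussian_likelihood(y, x):
    return math.exp(-0.5 * (y - x) ** 2)

rng = random.Random(7)
particles = [rng.gauss(0.0, 1.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for y in [0.3, 0.9, 1.4, 2.2]:
    particles, weights, n_eff = sir_step(
        particles, weights, y, random_walk, gaussian_likelihood, rng)
estimate = sum(w * x for w, x in zip(weights, particles))
print(estimate, n_eff)
```

The returned n_eff is the value computed before any resampling, so it can be logged to see how often the threshold is crossed. The multinomial draw in step 5 is the simplest choice; other resampling schemes can be substituted without changing the rest of the step.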

The term "Sampling Importance Resampling" is also sometimes used when referring to SIR filters, but the term Importance Resampling is more accurate because the word "resampling" implies that the initial sampling has already been done.[72]

Sequential importance sampling (SIS)

Sequential importance sampling (SIS) is the same as the SIR algorithm but without the resampling stage. This version often exhibits particle weight collapse, where all the probability gets concentrated on one or two particles, and the rest of the particle weights correspond to very small probability. The introduction of resampling alleviates this problem.

"Direct version" algorithm

The "direct version" algorithm [citation needed] is rather simple (compared to other particle filtering algorithms) and it uses composition and rejection. To generate a single sample x at k from $p_{x_k|y_{1:k}}(x|y_{1:k})$:

1) Set n = 0 (This will count the number of particles generated so far)
2) Uniformly choose an index i from the range $\{1, \cdots, N\}$
3) Generate a test $\hat x$ from the distribution $p(x_k|x_{k-1})$ with $x_{k-1} = x_{k-1|k-1}^{(i)}$
4) Generate the probability of $\hat y$ using $\hat x$ from $p(y|\hat x)$ with $y = y_k$, where $y_k$ is the measured value
5) Generate another uniform u from $[0, m_k]$ where $m_k = \sup_{x_k} p(y_k|x_k)$
6) Compare u and $p(\hat y)$
6a) If u is larger then repeat from step 2
6b) If u is smaller then save $\hat x$ as $x_{k|k}^{(p)}$ and increment n
7) If n == N then quit

The goal is to generate P "particles" at k using only the particles from $k-1$. This requires that a Markov equation can be written (and computed) to generate a $x_k$ based only upon $x_{k-1}$. This algorithm uses the composition of the P particles from $k-1$ to generate a particle at k and repeats (steps 2–6) until P particles are generated at k.

This can be more easily visualized if x is viewed as a two-dimensional array. One dimension is k and the other dimension is the particle number. For example, $x(k,i)$ would be the ith particle at $k$ and can also be written $x_k^{(i)}$ (as done above in the algorithm). Step 3 generates a potential $x_k$ based on a randomly chosen particle ($x_{k-1}^{(i)}$) at time $k-1$ and rejects or accepts it in step 6. In other words, the $x_k$ values are generated using the previously generated $x_{k-1}$.
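A minimal sketch of this composition-and-rejection scheme is shown below. It assumes a scalar state, a Gaussian random-walk transition, and an unnormalized Gaussian likelihood whose supremum over the state is 1; these choices, the parameter name likelihood_bound, and the function names are illustrative assumptions.

```python
import math
import random

def direct_version_step(prev_particles, y, sample_transition, likelihood,
                        likelihood_bound, rng):
    """Generate as many new particles as there are old ones, by rejection."""
    n_target = len(prev_particles)
    new_particles = []
    while len(new_particles) < n_target:
        parent = rng.choice(prev_particles)           # uniformly chosen index
        candidate = sample_transition(parent, rng)    # test value from the transition
        p_y = likelihood(y, candidate)                # likelihood of the measurement
        u = rng.uniform(0.0, likelihood_bound)        # uniform on [0, bound]
        if u <= p_y:                                  # accept, otherwise retry
            new_particles.append(candidate)
    return new_particles

def random_walk(x, rng):
    return x + rng.gauss(0.0, 0.5)

def gaussian_likelihood(y, x):
    # Unnormalized, so its supremum over x equals 1.0.
    return math.exp(-0.5 * (y - x) ** 2)

rng = random.Random(11)
prev_particles = [rng.gauss(0.0, 2.0) for _ in range(1000)]
new_particles = direct_version_step(
    prev_particles, 1.0, random_walk, gaussian_likelihood, 1.0, rng)
posterior_mean = sum(new_particles) / len(new_particles)
print(posterior_mean)
```

Every accepted particle carries equal weight, at the cost of a random and possibly large number of rejected candidates when the likelihood is sharply peaked relative to the predicted particles.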

Applications

Particle filters and Feynman-Kac particle methodologies find application in several contexts as an effective means of tackling noisy observations or strong nonlinearities.

from Grokipedia
A particle filter, also known as a sequential Monte Carlo (SMC) method, is a Bayesian estimation technique that approximates the posterior probability density function of the hidden state of a dynamic system using a set of weighted random samples, or "particles". It is particularly suited to nonlinear and non-Gaussian models, where traditional methods such as the Kalman filter fail. Introduced in seminal work by Gordon, Salmond, and Smith in 1993, the bootstrap filter, a foundational particle filter algorithm, employs importance sampling and resampling to recursively update particle weights based on new observations, enabling robust state estimation in complex scenarios. The core principle involves propagating particles through the system's state transition model, adjusting their weights via the likelihood of observations, and resampling to mitigate degeneracy, thereby providing a Monte Carlo approximation to the intractable integrals in Bayesian filtering.

Particle filters have evolved significantly since their inception, with variants such as the auxiliary particle filter and the unscented particle filter addressing challenges such as particle impoverishment and computational efficiency in high-dimensional spaces. They handle multimodal posteriors and model uncertainties well, and their estimates are asymptotically consistent as the number of particles grows. Key applications span diverse fields, including simultaneous localization and mapping (SLAM) in robotics, target tracking, volatility estimation in finance, the geosciences, and process fault detection. Despite their flexibility, particle filters can suffer from the curse of dimensionality, prompting ongoing research into hybrid approaches that combine them with Gaussian approximations or other techniques for enhanced scalability.

History

Early heuristic approaches

Early heuristic approaches to particle filtering emerged from practical needs in physics and engineering, where probabilistic simulations were used to approximate complex systems without rigorous theoretical backing. In the 1950s and 1960s, Monte Carlo methods were developed to model neutron transport and other nuclear processes, simulating thousands of individual particle paths to estimate aggregate behaviors in reactors and weapons design. These techniques relied on random sampling to propagate particle trajectories heuristically, providing approximate solutions to high-dimensional integration problems in radiation shielding and fission chain reactions. By the 1970s, similar particle-based approximations extended to engineering simulations, such as reliability analysis of complex systems, where particle ensembles approximated failure probabilities under uncertainty.

In parallel, the 1970s saw the introduction of genetic algorithms as another class of optimizers that influenced early filtering ideas. John Holland's foundational work framed these algorithms as evolutionary processes, using mutation, selection, and crossover to evolve populations of candidate solutions toward optimal states. Applied to estimation problems, genetic algorithms treated state variables as "genes" in a population, iteratively refining approximations to hidden system parameters through survival-of-the-fittest mechanics, as demonstrated in early applications to dynamic systems. These methods provided ad hoc ways to handle nonlinear problems without assuming Gaussianity, foreshadowing population-based filtering strategies.

A pivotal bridge to more structured approaches came in 1993 with the bootstrap filter proposed by Gordon, Salmond, and Smith, which adapted Monte Carlo sampling for sequential state estimation in nonlinear, non-Gaussian settings. This algorithm represented the posterior density via a set of weighted particles propagated through prediction and update steps, offering a practical tool for tracking targets from noisy signals. Building on this, Isard and Blake's 1996 Condensation algorithm applied particle propagation specifically to visual tracking, using conditional density propagation to handle occlusions and clutter in video sequences. These developments were informal precursors that paved the way for rigorous probabilistic formulations in subsequent decades.

Mathematical and probabilistic foundations

The mathematical and probabilistic foundations of particle filters trace back to seminal developments in stochastic processes and measure theory, which provide the rigorous framework for approximating complex filtering distributions through interacting particle systems. A key cornerstone is the Feynman-Kac formula, introduced by Mark Kac in 1947, which establishes a probabilistic representation for solutions to certain parabolic partial differential equations via expectations over random paths. This formula laid the groundwork for path integral methods in physics and probability, later extended to sequential contexts. In 1996, Pierre Del Moral applied Feynman-Kac formulae to nonlinear filtering problems, demonstrating how branching and interacting particle systems could numerically approximate these solutions in state estimation tasks.

Further theoretical support emerged from the study of interacting particle systems, pioneered by Henry P. McKean in 1966, who analyzed a class of Markov processes linked to nonlinear parabolic equations and established their mean-field limits. McKean's work developed the concept of propagation of chaos, wherein the empirical distribution of a large number of weakly interacting particles converges to a deterministic limit governed by a nonlinear evolution equation, providing a limiting regime essential for justifying particle approximations in high-dimensional spaces. These mean-field limits ensure that particle systems behave predictably as the number of particles increases, bridging microscopic interactions to macroscopic probabilistic descriptions. Pivotal contributions included K. R. Parthasarathy's 1967 work on probability measures on metric spaces, which formalized the measure-theoretic setting relevant to distributions evolving under branching and selection, influencing later particle models for measure approximations.

Building on this, in the 1990s Pierre Del Moral and Alice Guionnet developed genetic-type algorithms within interacting particle frameworks, proving stability and convergence properties for these systems in filtering applications through large deviation principles. Their analyses highlighted how selection and mutation mechanisms in particle systems mimic evolutionary processes to maintain diversity and accuracy in approximations. In 1997, Dan Crisan's work on measure-valued processes connected nonlinear filtering equations to superprocesses and demonstrated the convergence of interacting particle systems to these measures, solidifying the probabilistic backbone for practical implementations. This development, emerging from earlier heuristic approaches in the 1970s and 1980s, provided central limit theorems and consistency results that validated particle approximations of filtering posteriors.

The Filtering Problem

Bayesian estimation objectives

The primary objective of Bayesian estimation in filtering problems is to compute the posterior distribution $p(x_t \mid y_{1:t})$, which represents the conditional probability density of the hidden state $x_t$ at time $t$ given the sequence of observations $y_{1:t}$ up to that time. This distribution encapsulates all available information about the state, enabling the derivation of optimal point estimates such as the posterior mean or mode. In the context of sequential data processing, this posterior serves as the foundation for characterizing uncertainty in dynamic systems.

Bayesian filtering distinguishes between three main estimation tasks based on the availability and use of observations. Filtering refers to the estimation of the current state via $p(x_t \mid y_{1:t})$, updated recursively as new data arrive. Smoothing, in contrast, is an offline process that refines estimates of past states $x_k$ (for $k < T$) using all observations up to a final time $T$, yielding $p(x_k \mid y_{1:T})$ with reduced uncertainty due to future data. Prediction extends the framework to forecast future states, computing distributions such as $p(x_{t+n} \mid y_{1:t})$ for $n > 0$ by propagating the current posterior forward.

The recursive structure of Bayesian filtering relies on Bayes' theorem to update the posterior sequentially. Specifically,

$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1},$$

where the integral performs prediction by marginalizing over the previous state, and the likelihood $p(y_t \mid x_t)$ incorporates the new observation. This formulation alternates between a prediction step, which propagates uncertainty through the system dynamics, and an update step, which corrects the estimate based on the measurement.

In nonlinear and non-Gaussian settings, the multidimensional integrals in this recursion generally lack closed-form solutions, so exact computation is intractable. The intractability arises because the state transition density $p(x_t \mid x_{t-1})$ and the observation density $p(y_t \mid x_t)$ do not preserve simple forms such as Gaussianity under integration, which motivates the development of approximate inference techniques.

State-space signal models

In state-space signal models, the underlying system is typically formulated as a hidden Markov model (HMM), where the state evolves according to a Markov process and observations are conditionally independent given the current state. The state transition is described by the equation $x_t = f(x_{t-1}, w_t)$, where $x_t$ is the hidden state at time $t$, $f(\cdot)$ is a possibly nonlinear transition function, and $w_t$ is process noise, often assumed to be independent and identically distributed (i.i.d.) with a known distribution such as a Gaussian. Similarly, the observation model is given by $y_t = g(x_t, v_t)$, where $y_t$ is the observed signal, $g(\cdot)$ is a possibly nonlinear measurement function, and $v_t$ is observation noise, also typically i.i.d., independent of the process noise, and often Gaussian. These models capture the dynamic evolution of hidden states and their partial observability through noisy measurements, forming the foundation for inference in sequential data processing.

A special case arises when both $f(\cdot)$ and $g(\cdot)$ are linear and the noises $w_t$ and $v_t$ are Gaussian, leading to a linear Gaussian state-space model. In this scenario, the optimal filtering solution can be computed exactly using the Kalman filter, which recursively updates the state estimate and its covariance based on predictions and measurement corrections. For general nonlinear or non-Gaussian cases, however, no closed-form solution exists, necessitating approximate methods to handle the intractable integrals involved in state estimation.

The model is initialized with a prior distribution $p(x_0)$ over the initial state, which encodes available prior knowledge about the system's starting condition, such as a Gaussian centered at an initial guess. The joint density of the state trajectory and observations up to time $t$ is then

$$p(x_{0:t}, y_{1:t}) = p(x_0) \prod_{k=1}^t p(x_k \mid x_{k-1}) \, p(y_k \mid x_k),$$

reflecting the Markov and conditional independence assumptions. These formulations provide the probabilistic structure that supports Bayesian objectives by defining the likelihood and prior dynamics for posterior inference.

Practical examples illustrate the versatility of these models. In target tracking, a constant velocity model assumes the state $x_t = [\text{position}_t, \text{velocity}_t]^\top$ evolves as $x_t = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} x_{t-1} + w_t$, with linear observations of position, capturing smooth motion under Gaussian process noise. In financial time series analysis, stochastic volatility models treat log-volatility as a hidden state following an autoregressive process, such as $h_t = \mu + \phi (h_{t-1} - \mu) + \eta_t$ with Gaussian $\eta_t$, and observations as returns $y_t = \exp(h_t / 2) \, \epsilon_t$ where $\epsilon_t \sim \mathcal{N}(0,1)$, enabling the capture of time-varying risk without assuming constant variance.
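
To make the stochastic volatility example concrete, the sketch below simulates hidden log-volatilities and observed returns from the two equations given above. It is a sketch under assumptions not stated in the text: the initial state is drawn from the stationary distribution of the autoregressive process (which requires |phi| < 1), and the function name and parameter names are illustrative.

```python
import math
import random


def simulate_stochastic_volatility(n_steps, mu, phi, sigma_eta, rng):
    """Simulate h_t = mu + phi (h_{t-1} - mu) + eta_t and y_t = exp(h_t / 2) eps_t.

    eta_t ~ N(0, sigma_eta^2) and eps_t ~ N(0, 1).
    Returns (log_vols, returns), two lists of length n_steps.
    """
    if not abs(phi) < 1.0:
        raise ValueError("phi must satisfy |phi| < 1 for a stationary start")
    # Assumption: h_0 is drawn from the stationary distribution of the AR(1) process.
    h = rng.gauss(mu, sigma_eta / math.sqrt(1.0 - phi * phi))
    log_vols, returns = [], []
    for _ in range(n_steps):
        h = mu + phi * (h - mu) + rng.gauss(0.0, sigma_eta)  # state transition
        y = math.exp(h / 2.0) * rng.gauss(0.0, 1.0)          # noisy observation
        log_vols.append(h)
        returns.append(y)
    return log_vols, returns


if __name__ == "__main__":
    hs, ys = simulate_stochastic_volatility(5, mu=-1.0, phi=0.95,
                                            sigma_eta=0.2, rng=random.Random(0))
    for h, y in zip(hs, ys):
        print(f"log-volatility {h: .3f}  return {y: .3f}")
```

Only the returns would be available to a filter; the log-volatilities play the role of the hidden state $x_t$. The observation equation is nonlinear in the state, which is why this model is a common test case for particle filters.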

Nonlinear filtering equations

In the context of state-space models, nonlinear filtering addresses the problem of estimating the posterior distribution of a hidden state sequence given a sequence of observations, relying on recursive Bayesian updates. The exact Bayesian filtering recursions consist of a prediction step followed by an update step. In the prediction step, the prior distribution of the state at time $t$ given observations up to time $t-1$ is obtained by marginalizing over the previous state:

$$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1},$$

which follows from the Chapman-Kolmogorov equation for the evolution of marginal distributions in Markov processes. The update step then incorporates the new observation $y_t$ via Bayes' theorem to yield the posterior:

$$p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t) \, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})},$$

where the normalizing constant, the marginal likelihood of the new observation, is

$$p(y_t \mid y_{1:t-1}) = \int p(y_t \mid x_t) \, p(x_t \mid y_{1:t-1}) \, dx_t.$$

These recursions provide the optimal solution for nonlinear, non-Gaussian filtering problems under the Bayesian framework but are generally intractable to compute analytically, as the required integrals lack closed-form expressions except in special cases such as linear Gaussian models. The intractability intensifies with the curse of dimensionality: in high-dimensional state spaces, the volume of the integration domain grows exponentially, rendering exact evaluation computationally prohibitive and necessitating approximate numerical methods such as particle filters.
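
For a one-dimensional state, the prediction and update recursions can be evaluated directly on a grid, which makes the structure of the two steps explicit. The sketch below replaces the integrals with Riemann sums on a uniform grid; this point-mass approximation is a swapped-in illustration, not a particle filter, and the function name `grid_filter_step` is hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def grid_filter_step(grid, prior, y, transition_pdf, likelihood_pdf):
    """One prediction/update cycle on a uniform grid.

    prior[j] approximates p(x_{t-1} = grid[j] | y_{1:t-1}).
    Returns (posterior, evidence); evidence approximates p(y_t | y_{1:t-1}).
    """
    dx = grid[1] - grid[0]
    # Prediction: Chapman-Kolmogorov integral as a Riemann sum.
    predicted = [
        dx * sum(transition_pdf(x, x_prev) * p for x_prev, p in zip(grid, prior))
        for x in grid
    ]
    # Update: multiply by the likelihood, then normalize by the evidence.
    unnormalized = [likelihood_pdf(y, x) * p for x, p in zip(grid, predicted)]
    evidence = dx * sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return posterior, evidence


if __name__ == "__main__":
    grid = [-10.0 + 0.1 * j for j in range(201)]
    prior = [gaussian_pdf(x, 0.0, 1.0) for x in grid]
    posterior, evidence = grid_filter_step(
        grid, prior, 1.0,
        transition_pdf=lambda x, x_prev: gaussian_pdf(x, x_prev, 1.0),
        likelihood_pdf=lambda y, x: gaussian_pdf(y, x, 1.0),
    )
    dx = grid[1] - grid[0]
    mean = dx * sum(x * p for x, p in zip(grid, posterior))
    print("posterior mean:", mean, "evidence:", evidence)
```

In this linear Gaussian test case the Kalman filter gives a posterior mean of 2/3 for the observation y = 1, so the grid result can be checked against it. The cost of the prediction step is quadratic in the number of grid points, and the number of grid points grows exponentially with the state dimension, which illustrates why grid methods do not scale and sampling-based approximations are used instead.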

Feynman-Kac probabilistic formulations

The Feynman-Kac formula establishes a connection between solutions of parabolic partial differential equations (PDEs) and expectations involving stochastic processes. For a diffusion process $X = (X_t)_{0 \leq t \leq T}$ starting from $X_0 = x$ and governed by an Itô stochastic differential equation $dX_t = b(t, X_t) \, dt + \sigma(t, X_t) \, dW_t$, where $W$ is a Brownian motion, the formula states that the expectation

$$\mathbb{E} \left[ \phi(X_T) \exp\left( -\int_0^T V(t, X_t) \, dt \right) \right]$$

equals $u(0, x)$, the value at initial time $t = 0$ and position $x$ of the solution to the PDE

$$\frac{\partial u}{\partial t}(t, x) + \mathcal{L} u(t, x) - V(t, x) u(t, x) = 0, \quad u(T, x) = \phi(x),$$

with $\mathcal{L}$ the infinitesimal generator of the diffusion. In discrete time the exponential weight becomes a product of potentials $\prod_t g(t, X_t)$, with $V(t, x) = -\log g(t, x)$.

In nonlinear filtering problems, the Feynman-Kac framework recasts the posterior distribution of the hidden state given observations as a normalized expectation under a change of measure. The unnormalized posterior measure at time $t$ is represented as

$$\eta_t(\phi) = \mathbb{E} \left[ \phi(X_t) \prod_{s=1}^t g_s(X_s) \right]$$

for test functions $\phi$, where the expectation is taken over the transition dynamics and the potentials $g_s(x) = p(y_s \mid x)$ are the observation likelihoods, with $\{y_s\}$ the observed sequence. The filtering posterior is recovered by normalization, $\eta_t(\phi) / \eta_t(1)$. The semigroup structure of these measures allows the filtering recursion to be viewed as the evolution of expectations under the Feynman-Kac flow, providing a measure-theoretic foundation for sequential approximations.

Branching particle representations offer a genealogical interpretation of these expectations, modeling the paths as particle lineages in a branching process. In this setup, each particle simulates a potential state trajectory, with branching events (births and deaths) governed by the potentials $g_t(x) = p(y_t \mid x)$: particles "reproduce" at rates proportional to $g_t$, which amplifies likely paths, and are pruned otherwise, so that the empirical distribution of the surviving particles approximates the target posterior. This mechanism naturally handles the multiplicative structure of the expectations in the Feynman-Kac formula, yielding unbiased approximations of the unnormalized measures in high-dimensional or nonlinear settings. Pierre Del Moral's 2004 monograph serves as the foundational reference for integrating Feynman-Kac formulae with interacting and genealogical particle systems, particularly in filtering applications.

Core Principles of Particle Filters

Monte Carlo simulation basics

Monte Carlo methods provide a computational framework for approximating expectations and integrals involving complex probability distributions by leveraging random sampling. These techniques rely on the law of large numbers, which ensures that the sample average of a function evaluated at independent draws from a target distribution $p(x)$ converges to the true expectation $\mathbb{E}_{p}[f(X)] = \int f(x) p(x) \, dx$ as the number of samples $N$ increases. This empirical approximation is particularly valuable in high-dimensional or intractable settings where analytical solutions are unavailable.

A fundamental representation in Monte Carlo simulation is the empirical probability measure $\hat{\Pi}_N = \frac{1}{N} \sum_{i=1}^N \delta_{X^i}$, where $\{X^i\}_{i=1}^N$ are independent and identically distributed (i.i.d.) samples from the target distribution $\Pi$ with density $p(x)$, and $\delta_x$ denotes the Dirac delta measure at $x$. In the general weighted case, this extends to $\hat{\Pi}_N = \sum_{i=1}^N w_i \delta_{X^i}$, where the weights $w_i$ sum to 1, giving the estimator $\int f(x) \, \Pi(dx) \approx \sum_{i=1}^N w_i f(X^i)$. As $N \to \infty$, the empirical measure converges weakly to the target distribution $\Pi$ with probability 1, enabling reliable approximations for large sample sizes.

When direct sampling from the target distribution $p(x)$ is difficult, importance sampling addresses this by drawing samples $X^i$ from a proposal distribution $q(x)$ that is easier to sample from, and reweighting each sample by the unnormalized importance weight $w_i = p(X^i) / q(X^i)$. The weighted empirical measure then approximates expectations under $p$, yielding $\mathbb{E}_{p}[f(X)] \approx \sum_{i=1}^N \tilde{w}_i f(X^i)$, where $\tilde{w}_i = w_i / \sum_{j=1}^N w_j$ are the normalized weights. This method, originally developed in the context of neutron transport simulations, reduces the computational burden by concentrating samples in the regions that matter most for the target.

To assess the efficiency of such approximations, particularly in importance sampling, the effective sample size $N_{\text{eff}}$ quantifies the loss of efficiency due to non-uniform weights, defined as $N_{\text{eff}} = 1 / \sum_{i=1}^N \tilde{w}_i^2$. This metric ranges from 1 (complete degeneracy, where one weight dominates) to $N$ (uniform weights, equivalent to direct sampling) and indicates approximately how many i.i.d. samples from $p$ would yield the same variance as the weighted set. Variance reduction techniques, including a careful choice of $q(x)$ to minimize weight variability, aim to maximize $N_{\text{eff}}$ and thus improve precision.

The statistical error in Monte Carlo estimators follows from the central limit theorem, which establishes that the normalized error $\sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N f(X^i) - \mathbb{E}_{p}[f(X)] \right)$ converges in distribution to a zero-mean Gaussian with variance $\operatorname{Var}_p[f(X)]$, provided this variance is finite, so the error decays at rate $O(N^{-1/2})$.
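
Importance sampling and the effective sample size can be illustrated with a small numerical experiment. The sketch below is an illustration under assumptions chosen for convenience: the target is a standard normal, the proposal is a zero-mean normal with standard deviation 2, and the function name `importance_estimate` is hypothetical.

```python
import math
import random


def normal_pdf(x, std):
    """Density of a zero-mean normal distribution with the given std at x."""
    return math.exp(-0.5 * (x / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(f, n_samples, rng, proposal_std=2.0):
    """Self-normalized importance sampling estimate of E_p[f(X)], p = N(0, 1).

    Samples are drawn from the assumed proposal q = N(0, proposal_std^2).
    Returns (estimate, effective_sample_size).
    """
    samples = [rng.gauss(0.0, proposal_std) for _ in range(n_samples)]
    # Unnormalized importance weights w_i = p(X^i) / q(X^i).
    raw = [normal_pdf(x, 1.0) / normal_pdf(x, proposal_std) for x in samples]
    total = sum(raw)
    weights = [w / total for w in raw]  # normalized weights, summing to 1
    estimate = sum(w * f(x) for w, x in zip(weights, samples))
    ess = 1.0 / sum(w * w for w in weights)
    return estimate, ess


if __name__ == "__main__":
    estimate, ess = importance_estimate(lambda x: x * x, 20000, random.Random(0))
    print("estimate of E[X^2] under N(0, 1):", estimate)
    print("effective sample size:", ess)
```

The true value of $\mathbb{E}_p[X^2]$ is 1 here, so the estimate can be checked directly. Because the proposal is wider than the target, the weights are bounded and the effective sample size stays at a substantial fraction of $N$; a proposal narrower than the target would give heavy-tailed weights and a much smaller effective sample size.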