Data assimilation

Data assimilation refers to a large group of methods that update information from numerical computer models with information from observations. Data assimilation is used to update model states, model trajectories over time, model parameters, and combinations thereof. Two features distinguish data assimilation from other estimation methods: the computer model is a dynamical model, i.e. it describes how model variables change over time, and the approach has a firm mathematical foundation in Bayesian inference. As such, it generalizes inverse methods and has close connections with machine learning.

Data assimilation initially developed in the field of numerical weather prediction. Numerical weather prediction models are equations describing the evolution of the atmosphere, typically coded into a computer program. When these models are used for forecasting, the model output quickly deviates from the real atmosphere. Hence, we use observations of the atmosphere to keep the model on track. Data assimilation provides a very large number of practical ways to bring these observations into the models.

Simply inserting point-wise measurements into the numerical models did not provide a satisfactory solution. Real-world measurements contain errors due both to the quality of the instrument and to how accurately the position of the measurement is known. These errors can cause instabilities in the models that eliminate any level of skill in a forecast. Thus, more sophisticated methods were needed in order to initialize a model using all available data while maintaining stability in the numerical model. Such data typically includes the measurements as well as a previous forecast valid at the same time the measurements are made. If applied iteratively, this process begins to accumulate information from past observations into all subsequent forecasts.

Because data assimilation developed out of the field of numerical weather prediction, it initially gained popularity amongst the geosciences. In fact, one of the most cited publications in all of the geosciences is an application of data assimilation to reconstruct the observed history of the atmosphere.

Classically, data assimilation has been applied to chaotic dynamical systems that are too difficult to predict using simple extrapolation methods. The cause of this difficulty is that small changes in initial conditions can lead to large changes in prediction accuracy. This is sometimes known as the butterfly effect: the sensitive dependence on initial conditions, in which a small change in one state of a deterministic nonlinear system can result in large differences in a later state.
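To make this sensitivity concrete, the sketch below integrates two copies of the Lorenz-63 system, a classic chaotic toy model chosen here for illustration (it is not discussed in the text above), using its standard parameters (sigma = 10, rho = 28, beta = 8/3). The two runs start from initial conditions that differ by 1e-8 in one variable, and the function records how far apart they are after each step. The fourth-order Runge-Kutta integrator, the step size, and the run length are assumptions of this sketch, not part of any particular assimilation system.

```python
import math


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz-63 system at `state` = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step of size dt."""
    def shifted(base, slope, scale):
        # base + scale * slope, component by component
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(x0, perturbation=1e-8, dt=0.01, n_steps=3000):
    """Distance between a reference run and a slightly perturbed run, per step."""
    a = tuple(x0)
    b = (x0[0] + perturbation, x0[1], x0[2])
    history = []
    for _ in range(n_steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        history.append(math.dist(a, b))
    return history
```

Calling `separation_history((1.0, 1.0, 1.0))` and inspecting the values at increasing step counts shows the initially negligible difference growing until the two runs are as far apart as two unrelated states on the attractor. This is why a forecast must be repeatedly corrected with observations rather than simply extrapolated.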

At any update time, data assimilation usually takes a forecast (also known as the first guess, or background information) and applies a correction to the forecast based on a set of observed data and estimated errors that are present in both the observations and the forecast itself. The difference between the forecast and the observations at that time is called the departure or the innovation (as it provides new information to the data assimilation process). A weighting factor is applied to the innovation to determine how much of a correction should be made to the forecast based on the new information from the observations. The best estimate of the state of the system, obtained by adding the weighting factor times the innovation to the forecast, is called the analysis. In one dimension, computing the analysis could be as simple as forming a weighted average of a forecasted and an observed value. In multiple dimensions, the problem becomes more difficult. Much of the work in data assimilation is focused on adequately estimating the appropriate weighting factor based on intricate knowledge of the errors in the system.
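A minimal one-dimensional sketch of this update follows, assuming the forecast and observation errors are unbiased and uncorrelated with known variances. Under those assumptions, one common choice of weighting factor is the minimum-variance weight var_b / (var_b + var_o), which makes the analysis a weighted average of the forecast and the observation. The function and parameter names are illustrative only.

```python
def scalar_analysis(x_b, var_b, y, var_o):
    """Return (analysis, analysis error variance) for a single scalar observation.

    x_b:   forecast (background) value, with error variance var_b
    y:     observed value, with error variance var_o
    Assumes unbiased, mutually uncorrelated forecast and observation errors.
    """
    innovation = y - x_b                 # departure: observation minus forecast
    weight = var_b / (var_b + var_o)     # weighting factor, between 0 and 1
    x_a = x_b + weight * innovation      # analysis = forecast + weight * innovation
    var_a = (1.0 - weight) * var_b       # analysis error variance after the update
    return x_a, var_a
```

For example, `scalar_analysis(10.0, 4.0, 12.0, 1.0)` returns approximately `(11.6, 0.8)`: the observation is trusted four times as much as the forecast, so the analysis moves 80% of the way toward it, and the analysis variance is smaller than either input variance.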

The measurements are usually made of a real-world system, rather than of the model's incomplete representation of that system, so a special function called the observation operator (usually denoted h() for a nonlinear operator or H for its linearization) is needed to map the modeled variable to a form that can be directly compared with the observation.
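As a hypothetical illustration, suppose the model state holds the two wind components (u, v) but an instrument reports only wind speed. The observation operator h() is then the nonlinear map from (u, v) to speed, and its linearization H at a given state is the row of partial derivatives of h(). This example and its names are assumptions for illustration, not taken from any specific system.

```python
import math


def h_wind_speed(state):
    """Nonlinear observation operator h(): model wind (u, v) -> observed speed."""
    u, v = state
    return math.hypot(u, v)


def H_wind_speed(state):
    """Linearization H of h() at `state`: the Jacobian row (dh/du, dh/dv)."""
    u, v = state
    speed = math.hypot(u, v)
    if speed == 0.0:
        raise ValueError("h() is not differentiable at zero wind")
    return (u / speed, v / speed)


# The innovation compares like with like: observed speed minus h(forecast).
forecast_state = (3.0, 4.0)      # hypothetical forecast wind components
observed_speed = 5.5             # hypothetical measured wind speed
innovation = observed_speed - h_wind_speed(forecast_state)
```

The linearized operator H is what linear update formulas use to project a state correction into observation space; it is only accurate for corrections small enough that h() is nearly linear around the forecast.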

One common mathematical and philosophical perspective is to view data assimilation as a Bayesian estimation problem. From this perspective, the analysis step is an application of Bayes' theorem and the overall assimilation procedure is an example of recursive Bayesian estimation. However, the probabilistic analysis is usually simplified to a computationally feasible form. In the general case, the probability distribution could be advanced in time exactly by the Fokker–Planck equation, but this is not feasible for high-dimensional systems, so various approximations operating on simplified representations of the probability distributions are used instead. Often the probability distributions are assumed to be Gaussian so that they can be represented by their mean and covariance, which gives rise to the Kalman filter.
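Under the Gaussian assumption, the recursive procedure reduces to two alternating steps on a mean and a covariance. The sketch below assumes a linear model matrix M with additive model-error covariance Q, a linear observation operator H, and observation-error covariance R; these names are chosen here for illustration. The analysis step is Bayes' theorem for a Gaussian prior and a Gaussian likelihood, and in several dimensions the weighting factor becomes the Kalman gain matrix K. The constant-velocity model and the observation values in the usage lines are hypothetical.

```python
import numpy as np


def kf_forecast(x_a, P_a, M, Q):
    """Advance the mean and covariance with linear model M and model-error covariance Q."""
    x_f = M @ x_a
    P_f = M @ P_a @ M.T + Q
    return x_f, P_f


def kf_analysis(x_f, P_f, y, H, R):
    """Gaussian Bayes update: prior N(x_f, P_f), likelihood of y given x is N(H x, R)."""
    innovation = y - H @ x_f
    S = H @ P_f @ H.T + R                   # innovation covariance
    # Kalman gain K = P_f H^T S^-1, computed with a linear solve instead of an
    # explicit inverse (valid because P_f and S are symmetric).
    K = np.linalg.solve(S, H @ P_f).T
    x_a = x_f + K @ innovation              # posterior mean
    P_a = (np.eye(len(x_f)) - K @ H) @ P_f  # posterior covariance
    return x_a, P_a


# Usage: a hypothetical constant-velocity model (position, velocity) in which
# only the position is observed.
M = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
x, P = np.zeros(2), 10.0 * np.eye(2)        # vague initial estimate
for y_obs in [1.1, 1.9, 3.2, 3.9]:          # hypothetical observations
    x, P = kf_forecast(x, P, M, Q)
    x, P = kf_analysis(x, P, np.array([y_obs]), H, R)
```

Although only the position is observed, the off-diagonal covariance terms let each analysis also correct the velocity, and each cycle carries information from earlier observations forward into the next forecast.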
