Stationary process
from Wikipedia

In mathematics and statistics, a stationary process (also called a strict/strictly stationary process or strong/strongly stationary process) is a stochastic process whose statistical properties, such as mean and variance, do not change over time. More formally, the joint probability distribution of the process remains the same when shifted in time. This implies that the process is statistically consistent across different time periods. Because many statistical procedures in time series analysis assume stationarity, non-stationary data are frequently transformed to achieve stationarity before analysis.

A common cause of non-stationarity is a trend in the mean, which can be due to either a unit root or a deterministic trend. In the case of a unit root, stochastic shocks have permanent effects, and the process is not mean-reverting. With a deterministic trend, the process is called trend-stationary, and shocks have only transitory effects, with the variable tending towards a deterministically evolving mean. A trend-stationary process is not strictly stationary but can be made stationary by removing the trend. Similarly, processes with unit roots can be made stationary through differencing.

Another type of non-stationary process, distinct from those with trends, is a cyclostationary process, which exhibits cyclical variations over time.

Strict stationarity, as defined above, can be too restrictive for many applications. Therefore, other forms of stationarity, such as wide-sense stationarity or N-th-order stationarity, are often used. The definitions for different kinds of stationarity are not consistent among different authors (see Other terminology).

Strict-sense stationarity


Definition


Formally, let $\{X_t\}$ be a stochastic process and let $F_X(x_{t_1+\tau}, \ldots, x_{t_n+\tau})$ represent the cumulative distribution function of the unconditional (i.e., with no reference to any particular starting value) joint distribution of $\{X_t\}$ at times $t_1+\tau, \ldots, t_n+\tau$. Then, $\{X_t\}$ is said to be strictly stationary, strongly stationary or strict-sense stationary if[1]: p. 155

$$F_X(x_{t_1+\tau}, \ldots, x_{t_n+\tau}) = F_X(x_{t_1}, \ldots, x_{t_n}) \quad \text{for all } \tau, t_1, \ldots, t_n \text{ and for all } n. \qquad \text{(Eq.1)}$$

Since $\tau$ does not affect $F_X(\cdot)$, $F_X$ is not a function of time.

Examples

Figure: two simulated time series, one stationary and one non-stationary. The augmented Dickey–Fuller (ADF) test statistic is reported for each process; non-stationarity cannot be rejected for the second process at a 5% significance level.

White noise is the simplest example of a stationary process.

An example of a discrete-time stationary process where the sample space is also discrete (so that the random variable may take one of N possible values) is a Bernoulli scheme. Other examples of a discrete-time stationary process with continuous sample space include some autoregressive and moving average processes which are both subsets of the autoregressive moving average model. Models with a non-trivial autoregressive component may be either stationary or non-stationary, depending on the parameter values, and important non-stationary special cases are where unit roots exist in the model.
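The figure comparison described above can be reproduced numerically. The following is a minimal sketch, assuming NumPy and statsmodels are available, that simulates a stationary AR(1) series and a random walk (a unit-root process) and applies the augmented Dickey–Fuller test to each; the coefficient 0.5 and the random seed are arbitrary illustrative choices.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
n = 500
eps = rng.standard_normal(n)

# Stationary AR(1): x_t = 0.5 * x_{t-1} + eps_t  (|phi| < 1)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

# Random walk: x_t = x_{t-1} + eps_t  (unit root, non-stationary)
walk = np.cumsum(eps)

for name, series in [("AR(1)", ar1), ("random walk", walk)]:
    stat, pvalue = adfuller(series)[:2]
    print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")

# Typically the AR(1) p-value is far below 0.05 (unit root rejected), while the
# random walk p-value is large, so non-stationarity cannot be rejected for it.
```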

Example 1


Let $Y$ be any scalar random variable, and define a time-series $\{X_t\}$ by

$$X_t = Y \quad \text{for all } t.$$

Then $\{X_t\}$ is a stationary time series, for which realisations consist of a series of constant values, with a different constant value for each realisation. A law of large numbers does not apply in this case, as the limiting value of an average from a single realisation takes the random value determined by $Y$, rather than taking the expected value of $Y$.

The time average of $X_t$ does not converge to $\operatorname{E}[Y]$ since the process is not ergodic.
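As an illustration only (assuming NumPy), the following sketch shows why no law of large numbers applies here: every realisation of this process is constant, so the time average of a single realisation equals that realisation's draw of $Y$ rather than $\operatorname{E}[Y]$.

```python
import numpy as np

rng = np.random.default_rng(1)
num_realisations, length = 5, 1000
for _ in range(num_realisations):
    y = rng.standard_normal()      # one draw of the scalar random variable Y
    x = np.full(length, y)         # the realisation X_t = Y is constant in t
    print(f"Y = {y:+.3f}, time average of the realisation = {x.mean():+.3f}")

# Each time average simply reproduces that realisation's Y; only averaging across
# many independent realisations (the ensemble) would estimate E[Y] = 0.
```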

Example 2


As a further example of a stationary process for which any single realisation has an apparently noise-free structure, let $Y$ have a uniform distribution on $[0, 2\pi]$ and define the time series $\{X_t\}$ by

$$X_t = \cos(t + Y) \quad \text{for } t \in \mathbb{R}.$$

Then $\{X_t\}$ is strictly stationary since $(t + Y)$ modulo $2\pi$ follows the same uniform distribution as $Y$ for any $t$.

Example 3


Keep in mind that a weakly white noise is not necessarily strictly stationary. Let $\omega$ be a random variable uniformly distributed in the interval $(0, 2\pi)$ and define the time series $\{z_t\}$ by

$$z_t = \cos(t\omega) \quad \text{for } t = 1, 2, \ldots$$

Then

$$\operatorname{E}[z_t] = \frac{1}{2\pi} \int_0^{2\pi} \cos(t\omega) \, d\omega = 0,$$
$$\operatorname{Var}(z_t) = \frac{1}{2\pi} \int_0^{2\pi} \cos^2(t\omega) \, d\omega = \tfrac{1}{2},$$
$$\operatorname{Cov}(z_t, z_j) = \frac{1}{2\pi} \int_0^{2\pi} \cos(t\omega)\cos(j\omega) \, d\omega = 0 \quad \text{for all } t \neq j.$$

So $\{z_t\}$ is a white noise in the weak sense (the mean and cross-covariances are zero, and the variances are all the same); however, it is not strictly stationary.

Nth-order stationarity


In Eq.1, the distribution of $n$ samples of the stochastic process must be equal to the distribution of the samples shifted in time for all $n$. N-th-order stationarity is a weaker form of stationarity where this is only requested for all $n$ up to a certain order $N$. A random process $\{X_t\}$ is said to be N-th-order stationary if:[1]: p. 152

$$F_X(x_{t_1+\tau}, \ldots, x_{t_n+\tau}) = F_X(x_{t_1}, \ldots, x_{t_n}) \quad \text{for all } \tau, t_1, \ldots, t_n \text{ and for all } n \in \{1, \ldots, N\}. \qquad \text{(Eq.2)}$$

Weak or wide-sense stationarity


Definition


A weaker form of stationarity commonly employed in signal processing is known as weak-sense stationarity, wide-sense stationarity (WSS), or covariance stationarity. WSS random processes only require that 1st moment (i.e. the mean) and autocovariance do not vary with respect to time and that the 2nd moment is finite for all times. Any strictly stationary process which has a finite mean and covariance is also WSS.[2]: p. 299 

So, a continuous time random process $\{X_t\}$ which is WSS has the following restrictions on its mean function $m_X(t) \triangleq \operatorname{E}[X_t]$ and autocovariance function $K_{XX}(t_1, t_2) \triangleq \operatorname{E}[(X_{t_1} - m_X(t_1))(X_{t_2} - m_X(t_2))]$:

$$m_X(t) = m_X(t + \tau) \quad \text{for all } \tau,$$
$$K_{XX}(t_1, t_2) = K_{XX}(t_1 - t_2, 0) \quad \text{for all } t_1, t_2,$$
$$\operatorname{E}[|X_t|^2] < \infty \quad \text{for all } t. \qquad \text{(Eq.3)}$$

The first property implies that the mean function $m_X(t)$ must be constant. The second property implies that the autocovariance function depends only on the difference between $t_1$ and $t_2$ and only needs to be indexed by one variable rather than two variables.[1]: p. 159  Thus, instead of writing,

$$K_{XX}(t_1 - t_2, 0),$$

the notation is often abbreviated by the substitution $\tau = t_1 - t_2$:

$$K_{XX}(\tau) \triangleq K_{XX}(t_1 - t_2, 0).$$

This also implies that the autocorrelation depends only on $\tau = t_1 - t_2$, that is

$$R_{XX}(t_1, t_2) = R_{XX}(t_1 - t_2, 0) \triangleq R_{XX}(\tau).$$

The third property says that the second moments must be finite for any time $t$.

Motivation


The main advantage of wide-sense stationarity is that it places the time-series in the context of Hilbert spaces. Let H be the Hilbert space generated by $\{x(t)\}$ (that is, the closure of the set of all linear combinations of these random variables in the Hilbert space of all square-integrable random variables on the given probability space). By the positive definiteness of the autocovariance function, it follows from Bochner's theorem that there exists a positive measure $\mu$ on the real line such that H is isomorphic to the Hilbert subspace of $L^2(\mu)$ generated by $\{e^{-2\pi i \xi \cdot t}\}$. This then gives the following Fourier-type decomposition for a continuous time stationary stochastic process: there exists a stochastic process $\omega_\xi$ with orthogonal increments such that, for all $t$,

$$x_t = \int e^{-2\pi i \lambda \cdot t} \, d\omega_\lambda,$$

where the integral on the right-hand side is interpreted in a suitable (Riemann) sense. The same result holds for a discrete-time stationary process, with the spectral measure now defined on the unit circle.

When processing WSS random signals with linear, time-invariant (LTI) filters, it is helpful to think of the correlation function as a linear operator. Since it is a circulant operator (depends only on the difference between the two arguments), its eigenfunctions are the Fourier complex exponentials. Additionally, since the eigenfunctions of LTI operators are also complex exponentials, LTI processing of WSS random signals is highly tractable—all computations can be performed in the frequency domain. Thus, the WSS assumption is widely employed in signal processing algorithms.
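A hedged numerical sketch of this frequency-domain view, assuming NumPy and SciPy are available: for a WSS input passed through an LTI filter, the output power spectral density equals $|H(f)|^2$ times the input power spectral density. The FIR coefficients and segment length below are arbitrary illustrative choices, not anything prescribed by the text above.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)         # approximately WSS white noise input
b = np.array([0.25, 0.5, 0.25])          # a simple FIR (LTI) filter
y = signal.lfilter(b, [1.0], x)          # time-domain filtering of the input

f, pxx = signal.welch(x, nperseg=1024)   # estimated input PSD
_, pyy = signal.welch(y, nperseg=1024)   # estimated output PSD
_, h = signal.freqz(b, worN=f, fs=1.0)   # filter frequency response H(f)

# Away from the filter's near-nulls, the output/input PSD ratio tracks |H(f)|^2.
mask = np.abs(h) ** 2 > 0.05
print(np.max(np.abs(pyy[mask] / pxx[mask] - np.abs(h[mask]) ** 2)))  # small value
```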

Definition for complex stochastic process


In the case where $\{X_t\}$ is a complex stochastic process, the autocovariance function is defined as $K_{XX}(t_1, t_2) = \operatorname{E}[(X_{t_1} - m_X(t_1))\overline{(X_{t_2} - m_X(t_2))}]$ and, in addition to the requirements in Eq.3, it is required that the pseudo-autocovariance function $J_{XX}(t_1, t_2) = \operatorname{E}[(X_{t_1} - m_X(t_1))(X_{t_2} - m_X(t_2))]$ depends only on the time lag. In formulas, $\{X_t\}$ is WSS, if

$$m_X(t) = m_X(t + \tau) \quad \text{for all } \tau,$$
$$K_{XX}(t_1, t_2) = K_{XX}(t_1 - t_2, 0) \quad \text{for all } t_1, t_2,$$
$$J_{XX}(t_1, t_2) = J_{XX}(t_1 - t_2, 0) \quad \text{for all } t_1, t_2,$$
$$\operatorname{E}[|X_t|^2] < \infty \quad \text{for all } t.$$

Joint stationarity


The concept of stationarity may be extended to two stochastic processes.

Joint strict-sense stationarity


Two stochastic processes $\{X_t\}$ and $\{Y_t\}$ are called jointly strict-sense stationary if their joint cumulative distribution $F_{XY}$ remains unchanged under time shifts, i.e. if

$$F_{XY}(x_{t_1}, \ldots, x_{t_m}, y_{t'_1}, \ldots, y_{t'_n}) = F_{XY}(x_{t_1+\tau}, \ldots, x_{t_m+\tau}, y_{t'_1+\tau}, \ldots, y_{t'_n+\tau}) \quad \text{for all } \tau, t_1, \ldots, t_m, t'_1, \ldots, t'_n.$$

Joint (M + N)th-order stationarity


Two random processes $\{X_t\}$ and $\{Y_t\}$ are said to be jointly (M + N)-th-order stationary if:[1]: p. 159

$$F_{XY}(x_{t_1}, \ldots, x_{t_m}, y_{t'_1}, \ldots, y_{t'_n}) = F_{XY}(x_{t_1+\tau}, \ldots, x_{t_m+\tau}, y_{t'_1+\tau}, \ldots, y_{t'_n+\tau}) \quad \text{for all } \tau \text{ and for all } m \in \{1, \ldots, M\},\ n \in \{1, \ldots, N\}.$$

Joint weak or wide-sense stationarity


Two stochastic processes $\{X_t\}$ and $\{Y_t\}$ are called jointly wide-sense stationary if they are both wide-sense stationary and their cross-covariance function $K_{XY}(t_1, t_2) = \operatorname{E}[(X_{t_1} - m_X(t_1))(Y_{t_2} - m_Y(t_2))]$ depends only on the time difference $\tau = t_1 - t_2$. This may be summarized as follows:

$$m_X(t) = m_X(t + \tau) \quad \text{for all } \tau,$$
$$m_Y(t) = m_Y(t + \tau) \quad \text{for all } \tau,$$
$$K_{XX}(t_1, t_2) = K_{XX}(t_1 - t_2, 0) \quad \text{for all } t_1, t_2,$$
$$K_{YY}(t_1, t_2) = K_{YY}(t_1 - t_2, 0) \quad \text{for all } t_1, t_2,$$
$$K_{XY}(t_1, t_2) = K_{XY}(t_1 - t_2, 0) \quad \text{for all } t_1, t_2.$$

Relation between types of stationarity

  • If a stochastic process is N-th-order stationary, then it is also M-th-order stationary for all $M \leq N$.
  • If a stochastic process is second-order stationary ($N = 2$) and has finite second moments, then it is also wide-sense stationary.[1]: p. 159
  • If a stochastic process is wide-sense stationary, it is not necessarily second-order stationary.[1]: p. 159
  • If a stochastic process is strict-sense stationary and has finite second moments, it is wide-sense stationary.[2]: p. 299
  • If two stochastic processes are jointly (M + N)-th-order stationary, this does not guarantee that the individual processes are M-th- respectively N-th-order stationary.[1]: p. 159

Other terminology


The terminology used for types of stationarity other than strict stationarity can be rather mixed. Some examples follow.

  • Priestley uses stationary up to order m if conditions similar to those given here for wide sense stationarity apply relating to moments up to order m.[3][4] Thus wide sense stationarity would be equivalent to "stationary to order 2", which is different from the definition of second-order stationarity given here.
  • Honarkhah and Caers also use the assumption of stationarity in the context of multiple-point geostatistics, where higher n-point statistics are assumed to be stationary in the spatial domain.[5]

Techniques to stationarize a non-stationary process


In time series analysis and stochastic processes, stationarizing a time series is a crucial preprocessing step aimed at transforming a non-stationary process into a stationary one. Several techniques exist for achieving this, depending on the type and order of non-stationarity present. For first-order non-stationarity, where the mean of the process varies over time, differencing is a common and effective method: it transforms the series by subtracting each value from its predecessor, thus stabilizing the mean. For non-stationarities up to the second order, time-frequency analysis (e.g., the wavelet transform, Wigner distribution function, or short-time Fourier transform) can be employed to isolate and suppress time-localized, non-stationary spectral components. Additionally, surrogate data methods can be used to construct strictly stationary versions of the original time series. One way of identifying non-stationary time series is the ACF (autocorrelation function) plot: sometimes, patterns will be more visible in the ACF plot than in the original time series, although this is not always the case.[6]
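As a rough illustration of the ACF-based check mentioned above (assuming NumPy and statsmodels are available), the sample autocorrelations of a non-stationary random walk decay very slowly, while those of a stationary white-noise series drop to near zero immediately:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(8)
noise = rng.standard_normal(2000)     # stationary white noise
walk = np.cumsum(noise)               # non-stationary random walk

print(acf(noise, nlags=10).round(2))  # near zero beyond lag 0
print(acf(walk, nlags=10).round(2))   # stays close to 1 over many lags
```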

The choice of method for time series stationarization depends on the nature of the non-stationarity and the goals of the analysis, especially when building models that require strict stationarity assumptions, such as ARMA or spectral-based techniques. More details on some time series stationarization methods are presented below.

Stationarization by means of differencing


One way to make some time series first-order stationary is to compute the differences between consecutive observations. This is known as differencing. Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and so eliminating trends. This can also remove seasonality, if differences are taken appropriately (e.g. differencing observations 1 year apart to remove a yearly trend). Transformations such as logarithms can help to stabilize the variance of a time series.
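The following minimal sketch (assuming NumPy; the simulated monthly series and its parameters are purely illustrative) applies the transformations described above: a log transform to stabilise the variance, a first difference to remove the trend, and a lag-12 difference to remove yearly seasonality.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(240)  # e.g. 20 years of monthly observations
series = np.exp(0.01 * t                              # exponential trend
                + 0.2 * np.sin(2 * np.pi * t / 12)    # yearly seasonality
                + 0.05 * rng.standard_normal(t.size)) # noise

logged = np.log(series)                # variance-stabilising transformation
diff1 = np.diff(logged)                # first difference: removes the trend
seasonal = logged[12:] - logged[:-12]  # lag-12 difference: removes the yearly pattern
both = np.diff(seasonal)               # trend and seasonality both removed

print(diff1.mean(), seasonal.mean(), both.mean())
# The differenced series fluctuate around constant levels instead of trending.
```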

Stationarization by means of the surrogate method


The surrogate method for stationarization[7] works by generating a new time series that preserves certain statistical properties of the original series while removing its non-stationary components.[8][9][10] A common approach is to apply the Fourier transform to the original time series to obtain its magnitude and phase spectra. The magnitude spectrum, which determines the power distribution across frequencies, is retained to preserve the global autocorrelation structure. The phase spectrum, which encodes the temporal alignment of frequency components and is often responsible for time-dependent dynamics in the time series (like non-stationarities), is then randomized, typically by replacing it with a set of random phases drawn uniformly from $[0, 2\pi)$ while enforcing conjugate symmetry to ensure a real-valued inverse. Applying the inverse Fourier transform to the modified spectra yields a strictly stationary surrogate time series:[11] one with the same power spectrum as the original but lacking the temporal structures that caused non-stationarity. This technique is often used in hypothesis tests for probing the stationarity property.[12][13][14][15]
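A minimal sketch of this phase-randomisation procedure, assuming NumPy; using the real-valued FFT pair rfft/irfft on a real series enforces the conjugate symmetry mentioned above automatically. The random-walk input is only an illustrative example of a non-stationary series.

```python
import numpy as np

def phase_randomized_surrogate(x, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(x)                      # magnitude and phase of the original
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.shape)
    phases[0] = 0.0                                # keep the mean (DC) component real
    if x.size % 2 == 0:
        phases[-1] = 0.0                           # keep the Nyquist component real
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=x.size)

rng = np.random.default_rng(4)
original = np.cumsum(rng.standard_normal(1024))    # a non-stationary random walk
surrogate = phase_randomized_surrogate(original, rng)
# The surrogate keeps the original's power spectrum (hence its autocorrelation) but
# its randomised phases destroy the trending, time-localised structure.
```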

from Grokipedia
In probability theory and statistics, a stationary process, also known as a strictly stationary process, is a stochastic process whose finite-dimensional distributions remain invariant under shifts in time.[1] This means that for any integer $k \geq 1$ and any $n \geq 0$, the joint distribution of $(X_n, X_{n+1}, \dots, X_{n+k-1})$ is identical to that of $(X_0, X_1, \dots, X_{k-1})$.[2] Strict stationarity captures the full probabilistic structure of the process, ensuring that its statistical properties, including all moments and dependencies, do not evolve over time.[3]

A related but weaker concept is weak stationarity (or wide-sense stationarity), which applies primarily to second-order processes and requires only that the mean is constant across time and that the autocovariance function depends solely on the time lag between observations, rather than their absolute positions.[4] Specifically, for a weakly stationary process $\{X_t\}$, $\mathbb{E}[X_t] = \mu$ for all $t$, and $\operatorname{Cov}(X_t, X_{t+\tau}) = \gamma(\tau)$ for all $t$ and lag $\tau$, where $\gamma(\tau)$ is finite and symmetric.[5] Weak stationarity is less restrictive than strict stationarity, as the latter implies the former only if second moments exist, but it suffices for many practical analyses involving linear models and spectral properties.[5] Processes that are Gaussian are strictly stationary if and only if they are weakly stationary, due to the complete characterization of Gaussian distributions by their mean and covariance.[6]

Stationary processes form the cornerstone of time series analysis, enabling the estimation of consistent sample statistics such as means, variances, and correlations, which would otherwise be unreliable in non-stationary data.[7] They simplify modeling and forecasting by assuming time-invariance, making techniques like autoregressive moving average (ARMA) models applicable, and are essential for ergodic theorems that equate time averages to ensemble averages in large samples.[8] Beyond statistics, stationary processes underpin applications in signal processing, econometrics, and physics, where assumptions of stationarity facilitate spectral analysis and prediction of random phenomena like noise in communications or fluctuations in financial markets.[9]

Fundamental Concepts

Overview and Importance

A stationary process is a stochastic process in which the joint probability distribution of any collection of its random variables remains invariant under time shifts, meaning its statistical properties, such as mean and variance, do not change over time. This implies mean reversion in time series, where a mean-reverting series has constant statistics like mean and standard deviation over time.[10][11] This time-invariance distinguishes stationary processes from non-stationary ones, where properties evolve, and encompasses two primary forms: strict-sense stationarity, requiring full distributional invariance, and wide-sense stationarity, focusing on constant mean and autocovariance.[12][3] The concept is fundamental in time series analysis, where it simplifies modeling, forecasting, and statistical inference by enabling the assumption of consistent probabilistic behavior across time periods.[13] Without stationarity, standard techniques like regression can yield misleading results, as the process cannot be reliably treated as a sequence of independent draws from a fixed distribution.[8] This invariance facilitates the application of tools like autoregressive models and spectral analysis, which rely on stable temporal structures to extract meaningful patterns.[14] Historically, stationary processes emerged in physics around 1900 through studies of Brownian motion, where Albert Einstein modeled particle displacements as processes with stationary increments to link microscopic fluctuations to observable diffusion.[15] The concept transitioned to econometrics in the 1920s, as researchers like G. Udny Yule and Eugen Slutsky demonstrated how non-stationary time series could produce illusory cycles and correlations, underscoring the need for stationarity in economic modeling.[16][17] Stationary processes play a critical role in diverse fields, including signal processing for filtering noise in stationary signals, finance for pricing assets under stable volatility assumptions, and climate modeling to distinguish genuine trends from random variations.[18] In these domains, non-stationarity often leads to spurious correlations, such as apparent relationships between unrelated trending series, which can invalidate predictions unless addressed.[16] For instance, in climate science, transforming non-stationary temperature data to stationarity enables reliable stochastic simulations of variability.[19]

Historical Development

The concept of stationarity originated in 19th-century physics, particularly in the kinetic theory of gases, where James Clerk Maxwell assumed steady-state distributions for molecular velocities to describe equilibrium conditions in gaseous systems. In his 1860 work, Maxwell derived the distribution of velocities under the assumption that the system reaches a stable, time-invariant state after collisions, laying early groundwork for notions of unchanging statistical properties over time. The formal introduction of stationarity in stochastic processes occurred in the 1930s through the contributions of Soviet mathematicians. Alexander Khinchin established the correlation theory for stationary stochastic processes in 1934, linking stationarity to ergodicity by showing that time averages converge to ensemble averages under certain correlation decay conditions. Andrey Kolmogorov further advanced this in 1941 with his foundational work on interpolation and extrapolation of stationary random sequences, providing rigorous probabilistic frameworks for prediction in such processes.[20] In the realm of time series analysis, stationarity gained practical traction in the late 1920s and early 1930s. George Udny Yule's 1927 paper introduced autoregressive models for investigating periodicities in disturbed series, implicitly relying on stationary assumptions to model sunspot data as stable linear dependencies on past values. Gilbert Walker extended this in 1931 by developing models for periodicity in interrelated series, incorporating moving average components that assumed underlying stationarity for forecasting weather and economic patterns.[21][22] Post-World War II advancements emphasized computational aspects through spectral analysis. John Tukey, collaborating with Ralph Blackman in 1958, promoted wide-sense stationarity in power spectrum estimation, enabling practical applications in communications engineering by focusing on time-invariant means and covariances for efficient signal processing. This shift facilitated broader adoption in fields requiring tractable analysis of noisy data. In econometrics, the transition from strict to wide-sense stationarity became prominent in the mid-20th century, allowing flexible modeling of economic time series without full distributional invariance.[23][24] In the 1970s, the concept extended into non-linear dynamics and chaos theory, where stationary invariant measures describe long-term behavior on strange attractors despite sensitive dependence on initial conditions. Seminal works, such as the 1971 paper by David Ruelle and Floris Takens, integrated stationarity into chaotic systems to analyze stable probability distributions amid apparent randomness.[25]

Strict-Sense Stationarity

Definition

A stochastic process $\{X_t\}_{t \in T}$, where $T$ is the index set (typically the integers or real numbers), is defined as strictly stationary if its finite-dimensional distributions are invariant under time shifts. Specifically, for any integer $k \geq 1$, any $t_1, \dots, t_k \in T$, and any shift $h \in T$ such that $t_i + h \in T$ for all $i$, the joint distribution of $(X_{t_1}, X_{t_2}, \dots, X_{t_k})$ is the same as that of $(X_{t_1 + h}, X_{t_2 + h}, \dots, X_{t_k + h})$.[26] This definition captures the full probabilistic structure of the process and does not require the existence of moments. Strict stationarity implies wide-sense stationarity if the first and second moments exist and are finite.[26]

Properties and Examples

Strict-sense stationary processes possess several key properties arising from the time-invariance of their finite-dimensional distributions. All moments, including the mean, variance, and higher-order joint moments, are invariant under time shifts, meaning they depend only on the relative time differences rather than absolute times.[26] Similarly, cumulants, which are derived from the moments via the moment-generating function, exhibit the same invariance, providing a complete characterization of the process's statistical structure independent of time origin.[27] This invariance extends to the marginal distributions, which remain constant across all time points, ensuring that the univariate distribution of $X_t$ is identical for every $t$.[28] Under additional conditions, such as mixing (where dependence between distant observations diminishes), strict-sense stationary processes can be ergodic, allowing time averages from a single realization to converge to ensemble averages, such as the mean.[26] Furthermore, such processes are preserved under time-invariant transformations, like applying a fixed function or linear filter to the observations, as long as the operation does not introduce time dependence; this links to higher-order stationarity, where moment preservation holds for all orders.[26]

Illustrative examples highlight these properties. An independent and identically distributed (i.i.d.) sequence, such as white noise where each $X_t$ is drawn from the same distribution independently, is strictly stationary because any shift preserves the joint distributions exactly.[26] A constant process, defined by $X_t = c$ for all $t$ and some fixed $c$, is trivially stationary, as all joint distributions are degenerate and unchanged by shifts.[28] Circularly symmetric processes provide another example, particularly in periodic or angular settings. For instance, a random phase signal $X(t) = A \cos(\omega t + \Theta)$, where $A$ is constant and $\Theta$ is uniformly distributed on $[0, 2\pi)$, is strictly stationary due to the uniform phase ensuring rotational invariance equivalent to time shifts.[26]

Wide-Sense Stationarity

Definition

A stochastic process $\{X_t\}_{t \in T}$, where $T$ is the index set (typically the real numbers or integers), is defined as wide-sense stationary if it satisfies two key conditions on its first- and second-order moments. First, the expected value $E[X_t] = \mu$ must be constant for all $t \in T$, independent of time. Second, the covariance between $X_t$ and $X_{t+\tau}$ must depend solely on the time lag $\tau$, expressed as $\operatorname{Cov}(X_t, X_{t+\tau}) = \gamma(\tau)$ for all $t, \tau \in T$, where $\gamma(\cdot)$ is the autocovariance function. This definition presupposes that the second moments are finite, i.e., $E[X_t^2] < \infty$ for all $t$, ensuring the covariance is well-defined.[26] The associated autocorrelation function is then given by $\rho(\tau) = \gamma(\tau) / \gamma(0)$, which is normalized such that $\rho(0) = 1$ and remains invariant with respect to time shifts.[26] For complex-valued processes, the definition extends by requiring $E[X_t] = \mu$ (constant) and $E[X_t \overline{X_{t+\tau}}] = \gamma(\tau)$, where $\overline{\,\cdot\,}$ denotes the complex conjugate, to account for non-real cases while preserving the lag dependence.[29] This formulation is weaker than strict-sense stationarity, as it focuses solely on these moment conditions rather than full distributional invariance.

Motivation and Applications

Wide-sense stationarity is motivated by its relative ease of verification and computation compared to strict-sense stationarity, as it requires only the invariance of the mean and autocorrelation function rather than all finite-dimensional distributions.[26] This makes it particularly suitable for practical analyses where full distributional properties are difficult or unnecessary to establish, while still capturing essential second-order statistics.[26] Furthermore, wide-sense stationarity suffices for many theoretical and applied contexts involving linear systems and spectral analysis, where higher-order moments beyond the second are not required, enabling efficient characterization via power spectral density.[26] In time series forecasting, wide-sense stationarity underpins autoregressive moving average (ARMA) models, which decompose stationary processes into autoregressive and moving average components for parameter estimation and prediction.[30] The Box-Jenkins methodology, developed in the 1970s, relies on this framework to identify, estimate, and validate ARMA models after transforming data to achieve stationarity through differencing or other means.[30] In signal processing, wide-sense stationarity facilitates the design of linear time-invariant filters for stationary noise, as the output of such a system remains wide-sense stationary when the input is, allowing straightforward computation of output autocorrelation via convolution with the system's impulse response.[31] For econometrics, it plays a key role in cointegration testing, where non-stationary integrated series are examined for linear combinations that yield wide-sense stationary residuals, indicating long-run equilibrium relationships.[32] An additional advantage of wide-sense stationarity lies in asymptotic theory, where central limit theorems hold for partial sums of stationary linear processes under mixingale-type conditions, ensuring normal approximations for large samples without requiring ergodicity or stricter stationarity.[33] A representative example is the first-order moving average process defined as $X_t = \varepsilon_t + \theta \varepsilon_{t-1}$, where $\{\varepsilon_t\}$ is white noise with variance $\sigma^2$; this process exhibits constant variance $(1 + \theta^2)\sigma^2$ and covariance that depends only on the lag (nonzero only at lag 1, equal to $\theta\sigma^2$), confirming its wide-sense stationarity for any finite $\theta$.[34]
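The MA(1) moments quoted above are easy to check by simulation. A small sketch, assuming NumPy, with the illustrative choice $\theta = 0.6$ and unit-variance white noise: the sample variance should be close to $1 + \theta^2$, the lag-1 autocovariance close to $\theta$, and higher-lag autocovariances close to zero.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n = 0.6, 1_000_000
eps = rng.standard_normal(n + 1)
x = eps[1:] + theta * eps[:-1]       # X_t = eps_t + theta * eps_{t-1}

xc = x - x.mean()
var = x.var()                        # expected: 1 + theta**2 = 1.36
cov1 = np.mean(xc[1:] * xc[:-1])     # expected: theta = 0.6
cov2 = np.mean(xc[2:] * xc[:-2])     # expected: ~0 beyond lag 1
print(round(var, 3), round(cov1, 3), round(cov2, 3))
```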

Higher-Order Stationarity

N-th Order Stationarity

A stochastic process $\{X_t\}$ is said to be $n$-th order stationary if, for every integer $k \leq n$, the joint distribution of any $k$ observations is invariant under time shifts. Specifically, for any times $t_1, \dots, t_k$ and any shift $\tau$, the joint cumulative distribution function satisfies

$$F_{X_{t_1 + \tau}, \dots, X_{t_k + \tau}}(x_1, \dots, x_k) = F_{X_{t_1}, \dots, X_{t_k}}(x_1, \dots, x_k),$$

meaning these distributions depend only on the time differences $t_i - t_j$ rather than absolute time.[35] This condition ensures that the statistical behavior up to order $n$ remains consistent across the process.[26] This notion of $n$-th order stationarity provides a framework for analyzing processes where full strict stationarity—requiring invariance of all finite-dimensional distributions—may be overly restrictive, yet the distributions up to a specific order $n$ are sufficient for modeling or inference. For instance, in signal processing or time series analysis, second-order properties often suffice for linear predictions, allowing focus on $n$-th order without assuming higher-order invariance.[36] An illustrative example involves processes with stable low-order joint distributions but time-varying higher ones, such as a stochastic process with time-invariant marginal distributions (first-order stationary) but evolving bivariate joint distributions (not second-order stationary), for example, where the dependence structure, like correlation, changes over time while marginals remain fixed. Similar dynamics can occur for higher orders, where low-order joints are stationary, but higher-order ones exhibit trends in dependence, enabling targeted analysis based on the stable orders.[28] As $n \to \infty$, the process is strictly stationary, as all finite-dimensional distributions are invariant under shifts.[35] This limiting case underscores how cumulative distributional invariance recovers the full time-invariance of strict stationarity.

Relation to Strict and Wide-Sense

A strictly stationary process has shift-invariant finite-dimensional distributions, which implies $n$-th order stationarity for every finite $n$.[26] Conversely, if the process is $n$-th order stationary for every $n$, then it is strictly stationary. Second-order stationarity requires that the joint distribution of any two observations is shift-invariant, which implies wide-sense stationarity (constant mean and lag-dependent autocovariance) provided the first and second moments exist. However, wide-sense stationarity does not imply second-order stationarity in general, as it only constrains the moments, not the full distribution; the two coincide for Gaussian processes, where the mean and covariance fully characterize the distribution.[36][28] Higher-order stationarity for $n > 2$ extends second-order by requiring invariance of joint distributions up to order $n$, imposing conditions on higher dependencies beyond those captured by wide-sense. For Gaussian processes, second-order stationarity equates to strict stationarity and thus to higher-order stationarity of all orders, since the distributions are fully characterized by the first- and second-order properties.[26][28] This equivalence does not extend to joint stationarity across multiple processes unless additional cross-covariance conditions are met.

Joint Stationarity

Strict-Sense Joint Stationarity

Strict-sense joint stationarity extends the notion of strict-sense stationarity from a single stochastic process to a collection of two or more processes, ensuring that their combined statistical behavior remains unchanged under time shifts.[37] For two stochastic processes $\{X_t\}$ and $\{Y_t\}$, they are jointly strictly stationary if all finite-dimensional joint distributions are invariant to time translation. Specifically, for any integers $n$ and $m$, any times $t_1, \dots, t_n$ and $s_1, \dots, s_m$, and any shift $\tau$, the joint probability density function satisfies

$$p_{X(t_1), \dots, X(t_n), Y(s_1), \dots, Y(s_m)}(x_1, \dots, x_n, y_1, \dots, y_m) = p_{X(t_1 + \tau), \dots, X(t_n + \tau), Y(s_1 + \tau), \dots, Y(s_m + \tau)}(x_1, \dots, x_n, y_1, \dots, y_m).$$

This condition implies that the joint cumulative distribution function $F$ of any finite collection of observations from both processes also satisfies $F_{(X_{t_1 + h}, Y_{s_1 + h}, \dots)} = F_{(X_{t_1}, Y_{s_1}, \dots)}$ for all shifts $h$. Equivalently, the vector-valued process $(X_t, Y_t)$ is strictly stationary as a single multivariate process.[37] This framework is crucial for modeling and analyzing cross-dependencies in multivariate time series, where interactions between processes, such as synchronization in coupled oscillators, must preserve distributional invariance over time to enable reliable inference on joint dynamics.[38] A representative example is bivariate white noise, where pairs $(X_t, Y_t)$ are independent and identically distributed according to a fixed joint distribution, such as a bivariate Gaussian with zero mean and constant cross-covariance matrix $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ for some $\rho \in (-1, 1)$; this setup ensures all joint finite-dimensional distributions are shift-invariant, capturing potential linear or nonlinear cross-dependencies while maintaining strict stationarity.[39]

Wide-Sense Joint Stationarity

Joint wide-sense stationarity extends the concept of wide-sense stationarity to multiple stochastic processes, focusing on the second-order joint statistics. Consider two stochastic processes $\{X_t\}$ and $\{Y_t\}$. These processes are jointly wide-sense stationary if each is individually wide-sense stationary—meaning their means are constant ($\mathbb{E}[X_t] = \mu_X$ and $\mathbb{E}[Y_t] = \mu_Y$ for all $t$) and their autocovariances depend only on the time lag ($\gamma_X(\tau) = \mathrm{Cov}(X_t, X_{t+\tau})$ and $\gamma_Y(\tau) = \mathrm{Cov}(Y_t, Y_{t+\tau})$)—and additionally, their cross-covariance function depends solely on the lag $\tau$:

$$\gamma_{XY}(\tau) = \mathrm{Cov}(X_t, Y_{t+\tau}) = \mathbb{E}[(X_t - \mu_X)(Y_{t+\tau} - \mu_Y)]$$

for all $t$ and $\tau$.[40] The cross-correlation function between the processes is then defined as

$$\rho_{XY}(\tau) = \frac{\gamma_{XY}(\tau)}{\sqrt{\gamma_X(0) \gamma_Y(0)}},$$

which normalizes the cross-covariance by the standard deviations and provides a measure of linear dependence that is invariant to time shifts. This setup ensures that the second-order joint structure of the processes remains unchanged under time translations.[35] An important application arises in econometrics, where cointegrated vector autoregression (VAR) processes model multiple time series that, while individually non-stationary, exhibit joint wide-sense stationarity in their error terms or linear combinations, capturing long-run equilibrium relationships such as those between economic indicators like GDP and interest rates.[41] This condition is weaker than joint strict-sense stationarity, as it requires only second-order moment invariance rather than time-invariance of all joint probability distributions.[40]

Comparisons Among Stationarity Types

Equivalence Conditions

A strictly stationary process implies wide-sense stationarity whenever the second moments are finite, as the constant mean and lag-dependent autocovariance follow directly from the time-invariance of the finite-dimensional distributions under this condition.[42] For Gaussian processes, wide-sense stationarity is equivalent to strict-sense stationarity, since the finite-dimensional distributions are fully determined by the mean and covariance functions, which are time-invariant in the wide-sense case. Higher-order stationarity for all finite orders $n$, meaning the joint moments of any $n$ random variables are invariant under time shifts, implies strict-sense stationarity provided the moments determine the underlying finite-dimensional distributions, as in cases satisfying Carleman's moment condition. In the joint stationarity context for multiple processes, joint strict-sense stationarity implies joint wide-sense stationarity when second moments are finite, but the reverse implication fails in general, with counterexamples arising from non-Gaussian coupled processes where cross-covariances are stationary yet higher-order joint distributions vary with time shifts. Within ergodic theory, stationary processes that are also ergodic exhibit asymptotic equivalence between time averages of functions of the process and ensemble averages, as established by Birkhoff's ergodic theorem, enabling consistent estimation of statistical properties from single realizations.[43] In multivariate settings, these equivalence conditions extend naturally to vector-valued processes, where joint properties govern the overall stationarity.[44]

Implications for Stochastic Processes

Stationarity plays a foundational role in the analysis and modeling of stochastic processes by enabling key theoretical results and practical methodologies. For stationary ergodic processes, the Birkhoff ergodic theorem guarantees that time averages converge almost surely to ensemble averages, allowing inferences about long-term behavior from finite observations. This equivalence underpins much of statistical inference in time series, where ergodicity ensures that sample statistics reliably estimate population parameters.[43] In wide-sense stationary processes, stationarity facilitates spectral decomposition through the Wiener-Khinchin theorem, which establishes that the power spectral density exists as the Fourier transform of the autocorrelation function, providing a frequency-domain representation essential for filtering and prediction tasks. Without stationarity, such decompositions fail, complicating the identification of underlying dynamics. For Gaussian processes, this equivalence between strict-sense and wide-sense stationarity further simplifies analysis, as second-order statistics fully characterize the process.[45] Non-stationarity introduces significant risks in modeling, such as spurious regressions, where independent non-stationary series exhibit misleading correlations due to shared trends, as demonstrated in econometric simulations.[46] Stationarity assumptions are thus critical in algorithms like the Kalman filter, which relies on constant statistical properties for deriving optimal recursive estimators in linear dynamic systems.[47] Extensions of stationarity include almost-periodic processes, which arise as limits of stationary processes under weak convergence and possess covariance functions that are almost periodic, enabling analysis of quasi-periodic phenomena in physics and engineering. However, challenges persist in applications like communications, where cyclostationary processes—non-stationary with periodic statistical variations—offer superior modeling for signals with inherent cycles, such as modulated carriers, outperforming stationary approximations.[48][49]

Techniques for Achieving Stationarity

Differencing Methods

Differencing is a fundamental technique in time series analysis used to transform non-stationary processes into stationary ones by eliminating trends and stabilizing the mean. First-order differencing involves computing the differences between consecutive observations, defined as $ \Delta X_t = X_t - X_{t-1} $, which effectively removes linear trends and stabilizes variance in integrated processes of order one.[50] This method is particularly effective for processes exhibiting a constant drift, as the resulting differenced series often approximates white noise with constant mean and variance.[51] For processes with higher-degree polynomial trends of degree $ k $, $ k $-th order differencing is applied iteratively. Second-order differencing, for instance, is given by $ \Delta^2 X_t = \Delta X_t - \Delta X_{t-1} = X_t - 2X_{t-1} + X_{t-2} $, which removes quadratic trends but is rarely needed beyond the second order due to data loss and potential introduction of unnecessary complexity.[50] Higher-order differencing assumes the original series follows a polynomial trend of the specified degree and is chosen based on the minimal order that achieves stationarity, often assessed via autocorrelation function decay.[51] In the context of autoregressive integrated moving average (ARIMA) models, differencing plays a central role in handling integrated processes denoted as I(d), where d represents the order of differencing required to achieve stationarity. The ARIMA(p, d, q) framework applies d-th order differencing to the original series before fitting an ARMA(p, q) model to the stationary residuals, enabling forecasting for non-stationary data like those with unit roots.[52] This approach, pioneered in the Box-Jenkins methodology, ensures the differenced series meets the stationarity assumptions necessary for parameter estimation and model identification.[52] A classic example is the random walk process, defined as $ X_t = X_{t-1} + \varepsilon_t $, where $ \varepsilon_t $ is white noise; this is non-stationary due to its unit root and accumulating variance. Applying first-order differencing yields $ \Delta X_t = \varepsilon_t $, transforming it into stationary white noise with zero mean and constant variance, facilitating straightforward modeling and prediction.[50] Despite its utility, differencing has limitations, including the risk of over-differencing, which occurs when more differences are applied than necessary, introducing an artificial moving average (MA) structure and negative lag-1 autocorrelations near -0.5.[53] Over-differencing can inflate variance and distort model forecasts, so it is preceded by unit root tests such as the Dickey-Fuller test to confirm the presence of a unit root and determine the appropriate differencing order.[54] The Augmented Dickey-Fuller test extends this by accounting for higher-order autoregressive terms, providing a robust statistical basis for deciding on d before differencing; it tests the null hypothesis of a unit root (non-stationarity), rejecting it if the p-value is less than 0.05, which indicates stationarity and mean reversion in the series.[54][10][11] Another complementary test is the Hurst exponent, which measures the long-term memory of the time series; a value H < 0.5 indicates mean-reverting (anti-persistent) behavior, supporting stationarity.[10][11]
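A short illustration (assuming NumPy) of the over-differencing symptom noted above: differencing a series that is already stationary white noise introduces an artificial MA(1) structure whose lag-1 autocorrelation is close to -0.5.

```python
import numpy as np

rng = np.random.default_rng(6)
white = rng.standard_normal(100_000)  # already stationary; d = 0 would suffice
over = np.diff(white)                 # one difference too many

lag1 = np.corrcoef(over[1:], over[:-1])[0, 1]
print(round(lag1, 3))                 # close to -0.5, the classic over-differencing sign
```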

Surrogate Data Approaches

Surrogate data approaches generate artificial time series that mimic the statistical properties of the original data under a specific null hypothesis, typically that of a stationary linear process, to test for deviations such as non-stationarity or nonlinearity. These methods preserve key features like the power spectrum or amplitude distribution while introducing randomness to create realizations consistent with the null hypothesis of stationarity and linearity, allowing researchers to assess whether observed behaviors arise from deterministic structures or stochastic variability.[55] One foundational technique is phase randomization surrogates, which generate data by applying the Fourier transform to the original series to obtain the amplitude spectrum, randomizing the phases uniformly between 0 and 2π, and then performing the inverse Fourier transform. This process yields surrogate series that retain the original power spectral density—ensuring the same frequency content and autocorrelation structure—while being consistent with a stationary Gaussian process under the null hypothesis of linearity.[55] To address limitations in preserving the empirical distribution alongside the spectrum, the amplitude-adjusted Fourier transform (AAFT) method refines this approach through an iterative procedure: first, rank-order the original data to match a surrogate's amplitude distribution via shuffling, then apply phase randomization in the Fourier domain, and iteratively adjust amplitudes to converge on both the power spectrum and the original marginal distribution. This results in surrogate series consistent with a stationary linear process that match the distributional properties of the original, making it suitable for non-Gaussian data under the null hypothesis.[55] In hypothesis testing, these surrogates evaluate the null hypothesis of stationarity and linearity by computing discriminating statistics—such as correlation dimension or predictability measures—on the original series and comparing them to distributions from multiple surrogates; significant deviations reject the null in favor of non-stationary or nonlinear alternatives. For instance, when applied to chaotic time series like the Lorenz attractor, phase randomization surrogates can reveal hidden periodicities by preserving spectral power while randomizing phases, allowing detection if the original exhibits stronger periodicity than expected under the stationary linear null.[55] Compared to differencing methods for removing linear trends, surrogate approaches like AAFT test for stationarity and linearity by generating data under a stationary linear null that preserves properties such as the spectrum and marginal distribution, enabling detection of non-stationarity or nonlinearity without assuming Gaussianity or preprocessing the original data. This facilitates analysis of complex systems to determine if they conform to stationary assumptions.[55]
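The following is a hedged, single-pass sketch of the AAFT idea described above (assuming NumPy; the helper names and the sine-plus-noise test series are illustrative, and published AAFT variants iterate the spectrum/amplitude adjustment rather than stopping after one pass): Gaussianise the data by rank, phase-randomise, then map back to the original amplitude distribution.

```python
import numpy as np

def _phase_randomize(x, rng):
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.shape)
    phases[0] = 0.0                     # keep the DC component real
    if x.size % 2 == 0:
        phases[-1] = 0.0                # keep the Nyquist component real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

def aaft_surrogate(x, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    ranks = np.argsort(np.argsort(x))                      # rank of each original value
    gauss = np.sort(rng.standard_normal(x.size))[ranks]    # Gaussian copy, same ordering
    randomized = _phase_randomize(gauss, rng)              # keep spectrum, scramble phases
    return np.sort(x)[np.argsort(np.argsort(randomized))]  # restore original amplitudes

rng = np.random.default_rng(7)
data = np.sin(np.linspace(0.0, 30.0, 1000)) + 0.3 * rng.standard_normal(1000)
surrogate = aaft_surrogate(data, rng)
print(np.allclose(np.sort(surrogate), np.sort(data)))  # True: identical marginal values
```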
