Autoregressive model
In statistics, econometrics, and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it can be used to describe certain time-varying processes in nature, economics, behavior, etc. The autoregressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term (an imperfectly predictable term); thus the model is in the form of a stochastic difference equation (or recurrence relation) which should not be confused with a differential equation. Together with the moving-average (MA) model, it is a special case and key component of the more general autoregressive–moving-average (ARMA) and autoregressive integrated moving average (ARIMA) models of time series, which have a more complicated stochastic structure; it is also a special case of the vector autoregressive model (VAR), which consists of a system of more than one interlocking stochastic difference equation in more than one evolving random variable. Another important extension is the time-varying autoregressive (TVAR) model, where the autoregressive coefficients are allowed to change over time to model evolving or non-stationary processes. TVAR models are widely applied in cases where the underlying dynamics of the system are not constant, such as in sensor time series modelling[1][2], finance[3], climate science[4], economics[5], signal processing and telecommunications[6][7], radar systems[8], and biological signals[9].
Unlike the moving-average (MA) model, the autoregressive model is not always stationary; non-stationarity can arise either due to the presence of a unit root or due to time-varying model parameters, as in time-varying autoregressive (TVAR) models.
Large language models are called autoregressive, but they are not classical autoregressive models in this sense because they are not linear.
Definition
The notation AR(p) indicates an autoregressive model of order p. The AR(p) model is defined as

\[ X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t , \]

where $\varphi_1, \ldots, \varphi_p$ are the parameters of the model, and $\varepsilon_t$ is white noise.[10][11] This can be equivalently written using the backshift operator B as

\[ X_t = \sum_{i=1}^{p} \varphi_i B^i X_t + \varepsilon_t , \]

so that, moving the summation term to the left side and using polynomial notation, we have

\[ \phi(B) X_t = \varepsilon_t . \]
An autoregressive model can thus be viewed as the output of an all-pole infinite impulse response filter whose input is white noise.
Some parameter constraints are necessary for the model to remain weak-sense stationary. For example, processes in the AR(1) model with $|\varphi_1| \geq 1$ are not stationary. More generally, for an AR(p) model to be weak-sense stationary, the roots of the polynomial $\Phi(z) := 1 - \sum_{i=1}^{p} \varphi_i z^i$ must lie outside the unit circle, i.e., each (complex) root $z_i$ must satisfy $|z_i| > 1$ (see pages 89, 92 in [12]).
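This condition can be checked numerically by computing the roots of the characteristic polynomial; a minimal sketch using NumPy (the helper name `is_stationary` is illustrative, not a library function):

```python
import numpy as np

def is_stationary(phi):
    """Check weak-sense stationarity of an AR(p) model with coefficients
    phi = [phi_1, ..., phi_p] by testing whether all roots of
    Phi(z) = 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    # np.roots expects coefficients ordered from highest degree to lowest:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = np.r_[-np.array(phi)[::-1], 1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.5]))        # AR(1) with |phi| < 1: stationary
print(is_stationary([1.2]))        # AR(1) with |phi| > 1: explosive
print(is_stationary([0.6, 0.3]))   # AR(2) inside the stationarity region
```

For the AR(1) case this reduces to the familiar condition $|\varphi_1| < 1$, since the single root is $1/\varphi_1$.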
Intertemporal effect of shocks
In an AR process, a one-time shock affects values of the evolving variable infinitely far into the future. For example, consider the AR(1) model $X_t = \varphi_1 X_{t-1} + \varepsilon_t$. A non-zero value for $\varepsilon_t$ at, say, time t=1 affects $X_1$ by the amount $\varepsilon_1$. Then by the AR equation for $X_2$ in terms of $X_1$, this affects $X_2$ by the amount $\varphi_1 \varepsilon_1$. Then by the AR equation for $X_3$ in terms of $X_2$, this affects $X_3$ by the amount $\varphi_1^2 \varepsilon_1$. Continuing this process shows that the effect of $\varepsilon_1$ never ends, although if the process is stationary then the effect diminishes toward zero in the limit.
Because each shock affects X values infinitely far into the future from when it occurs, any given value $X_t$ is affected by shocks occurring infinitely far into the past. This can also be seen by rewriting the autoregression

\[ \phi(B) X_t = \varepsilon_t \]

(where the constant term has been suppressed by assuming that the variable has been measured as deviations from its mean) as

\[ X_t = \frac{1}{\phi(B)} \varepsilon_t . \]

When the polynomial division on the right side is carried out, the polynomial in the backshift operator applied to $\varepsilon_t$ has infinite order; that is, an infinite number of lagged values of $\varepsilon_t$ appear on the right side of the equation.
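For the AR(1) case, this infinite-order dependence reduces to a geometric decay of each shock, which can be verified directly; a small sketch with an assumed $\varphi_1 = 0.9$:

```python
# Propagate a single unit shock eps_1 = 1 through the AR(1) recursion
# X_t = phi * X_{t-1} + eps_t and compare with the closed-form effect
# phi^(t-1) of that shock on X_t.
phi = 0.9
x = 0.0
effects = []
for t in range(1, 11):
    eps = 1.0 if t == 1 else 0.0   # one-time shock at t = 1
    x = phi * x + eps
    effects.append(x)

closed_form = [phi ** (t - 1) for t in range(1, 11)]
print(all(abs(a - b) < 1e-12 for a, b in zip(effects, closed_form)))  # True
```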
Characteristic polynomial
The autocorrelation function of an AR(p) process can be expressed as[citation needed]

\[ \rho(\tau) = \sum_{k=1}^{p} a_k y_k^{-|\tau|} , \]

where $y_k$ are the roots of the polynomial

\[ \phi(B) = 1 - \sum_{k=1}^{p} \varphi_k B^k , \]

where B is the backshift operator, where $\phi(\cdot)$ is the function defining the autoregression, and where $\varphi_k$ are the coefficients in the autoregression. The formula is valid only if all the roots have multiplicity 1.[citation needed]
The autocorrelation function of an AR(p) process is a sum of decaying exponentials.
- Each real root contributes a component to the autocorrelation function that decays exponentially.
- Similarly, each pair of complex conjugate roots contributes an exponentially damped oscillation.
Graphs of AR(p) processes
The simplest AR process is AR(0), which has no dependence between the terms. Only the error/innovation/noise term contributes to the output of the process, so a realization of an AR(0) process is simply white noise.
For an AR(1) process with a positive $\varphi$, only the previous term in the process and the noise term contribute to the output. If $\varphi$ is close to 0, then the process still looks like white noise, but as $\varphi$ approaches 1, the output gets a larger contribution from the previous term relative to the noise. This results in a "smoothing" or integration of the output, similar to a low-pass filter.

For an AR(2) process, the previous two terms and the noise term contribute to the output. If both $\varphi_1$ and $\varphi_2$ are positive, the output will resemble a low-pass filter, with the high-frequency part of the noise decreased. If $\varphi_1$ is positive while $\varphi_2$ is negative, then the process favors changes in sign between terms of the process: the output oscillates. This can be linked to edge detection or detection of change in direction.
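These qualitative behaviors can be reproduced with a short simulation; a sketch (the helpers `simulate_ar` and `acf1` are illustrative, not library functions):

```python
import numpy as np

def simulate_ar(phi, n, sigma=1.0, seed=0):
    """Simulate n observations of a zero-mean AR(p) process with
    coefficients phi = [phi_1, ..., phi_p] and noise std sigma."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    x = np.zeros(n + p)                   # zero pre-sample values
    eps = rng.normal(0.0, sigma, n + p)
    for t in range(p, n + p):
        x[t] = sum(phi[i] * x[t - 1 - i] for i in range(p)) + eps[t]
    return x[p:]

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

white  = simulate_ar([],          500)   # AR(0): pure white noise
smooth = simulate_ar([0.9],       500)   # AR(1): low-pass "smoothing"
osc    = simulate_ar([0.5, -0.8], 500)   # AR(2): oscillatory behavior

# Near 0 for white noise, strongly positive for the smoothed AR(1) series
print(acf1(white), acf1(smooth))
```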
Example: An AR(1) process
An AR(1) process is given by

\[ X_t = c + \varphi X_{t-1} + \varepsilon_t , \]

where $\varepsilon_t$ is a white noise process with zero mean and constant variance $\sigma_\varepsilon^2$. (Note: the subscript on $\varphi_1$ has been dropped.) The process is weak-sense stationary if $|\varphi| < 1$, since it is then obtained as the output of a stable filter whose input is white noise. (If $\varphi = 1$ then the variance of $X_t$ depends on the time lag t, so that the variance of the series diverges to infinity as t goes to infinity, and the process is therefore not weak-sense stationary.) Assuming $|\varphi| < 1$, the mean $\operatorname{E}(X_t)$ is identical for all values of t by the definition of weak-sense stationarity. If the mean is denoted by $\mu$, it follows from

\[ \operatorname{E}(X_t) = \operatorname{E}(c) + \varphi \operatorname{E}(X_{t-1}) + \operatorname{E}(\varepsilon_t) \]

that

\[ \mu = c + \varphi \mu + 0 , \]

and hence

\[ \mu = \frac{c}{1-\varphi} . \]
The variance is

\[ \operatorname{var}(X_t) = \frac{\sigma_\varepsilon^2}{1-\varphi^2} , \]

where $\sigma_\varepsilon$ is the standard deviation of $\varepsilon_t$. This can be shown by noting that

\[ \operatorname{var}(X_t) = \varphi^2 \operatorname{var}(X_{t-1}) + \sigma_\varepsilon^2 , \]

and then by noticing that the quantity above is a stable fixed point of this relation.
The autocovariance is given by

\[ B_n = \operatorname{E}(X_{t+n} X_t) - \mu^2 = \frac{\sigma_\varepsilon^2}{1-\varphi^2} \, \varphi^{|n|} . \]

It can be seen that the autocovariance function decays with a decay time (also called time constant) of $\tau = -1/\ln(\varphi)$.[13]
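The closed-form autocovariance and its decay time can be checked numerically; a small sketch with an assumed $\varphi = 0.8$:

```python
import math

phi, sigma = 0.8, 1.0
var = sigma**2 / (1 - phi**2)   # B_0, the variance of the process
tau = -1.0 / math.log(phi)      # decay time, from phi^|n| = e^{-|n|/tau}

# B_n = var * phi^|n| should equal var * exp(-|n|/tau) at every lag n
for n in range(6):
    assert abs(var * phi**n - var * math.exp(-n / tau)) < 1e-12

print(round(var, 4), round(tau, 4))
```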
The spectral density function is the Fourier transform of the autocovariance function. In discrete terms this will be the discrete-time Fourier transform:

\[ \Phi(\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} B_n e^{-i\omega n} = \frac{1}{\sqrt{2\pi}} \left( \frac{\sigma_\varepsilon^2}{1 + \varphi^2 - 2\varphi \cos \omega} \right) . \]

This expression is periodic due to the discrete nature of the $X_j$, which is manifested as the cosine term in the denominator. If we assume that the sampling time ($\Delta t = 1$) is much smaller than the decay time ($\tau$), then we can use a continuum approximation to $B_n$:

\[ B(t) \approx \frac{\sigma_\varepsilon^2}{1-\varphi^2} \, e^{-|t|/\tau} , \]

which yields a Lorentzian profile for the spectral density:

\[ \Phi(\omega) = \frac{1}{\sqrt{2\pi}} \, \frac{\sigma_\varepsilon^2}{1-\varphi^2} \, \frac{\gamma}{\pi (\gamma^2 + \omega^2)} , \]

where $\gamma = 1/\tau$ is the angular frequency associated with the decay time $\tau$.
An alternative expression for $X_t$ can be derived by first substituting $c + \varphi X_{t-2} + \varepsilon_{t-1}$ for $X_{t-1}$ in the defining equation. Continuing this process N times yields

\[ X_t = c \sum_{k=0}^{N-1} \varphi^k + \varphi^N X_{t-N} + \sum_{k=0}^{N-1} \varphi^k \varepsilon_{t-k} . \]

For N approaching infinity, $\varphi^N$ will approach zero and

\[ X_t = \frac{c}{1-\varphi} + \sum_{k=0}^{\infty} \varphi^k \varepsilon_{t-k} . \]

It is seen that $X_t$ is white noise convolved with the kernel $\varphi^k$ plus the constant mean. If the white noise $\varepsilon_t$ is a Gaussian process then $X_t$ is also a Gaussian process. In other cases, the central limit theorem indicates that $X_t$ will be approximately normally distributed when $\varphi$ is close to one.

For $c = \varepsilon_t = 0$, the process reduces to a geometric progression (exponential growth or decay). In this case, the solution can be found analytically: $X_t = a \varphi^t$, where $a$ is an unknown constant determined by the initial condition.
Explicit mean/difference form of AR(1) process
The AR(1) model is the discrete-time analogue of the continuous Ornstein–Uhlenbeck process. It is therefore sometimes useful to understand the properties of the AR(1) model cast in an equivalent form. In this form, the AR(1) model, with process parameter $\theta := \varphi$, is given by

- $X_t = X_{t-1} + (1-\theta)(\mu - X_{t-1}) + \varepsilon_t$, where $|\theta| < 1$, $\mu := c/(1-\varphi)$ is the model mean, and $\varepsilon_t$ is a white-noise process with zero mean and constant variance $\sigma_\varepsilon^2$.

By rewriting this as $X_t = \theta X_{t-1} + (1-\theta)\mu + \varepsilon_t$ and then deriving (by induction) $X_{t+n} = \theta^n X_t + (1-\theta^n)\mu + \sum_{i=0}^{n-1} \theta^i \varepsilon_{t+n-i}$, one can show that

- $\operatorname{E}(X_{t+n} \mid X_t) = \mu + \theta^n (X_t - \mu)$ and
- $\operatorname{Var}(X_{t+n} \mid X_t) = \sigma_\varepsilon^2 \, \dfrac{1-\theta^{2n}}{1-\theta^2}$.
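The induction identity above can be verified numerically for a fixed noise sequence; a small sketch with assumed (made-up) values of $\theta$, $\mu$, $X_0$, and the shocks:

```python
# Verify, for the mean-difference form X_t = theta*X_{t-1} + (1-theta)*mu + eps_t,
# the induction identity
#   X_n = theta^n X_0 + (1 - theta^n) mu + sum_{i=0}^{n-1} theta^i eps_{n-i}.
theta, mu = 0.7, 5.0
x0 = 2.0
eps = [0.3, -0.1, 0.4, 0.0, -0.2]   # arbitrary fixed noise sequence eps_1..eps_5

# Direct iteration of the recursion
x = x0
for e in eps:
    x = theta * x + (1 - theta) * mu + e

# Closed form from the induction identity (eps[0] is eps_1)
n = len(eps)
closed = theta**n * x0 + (1 - theta**n) * mu + sum(
    theta**i * eps[n - 1 - i] for i in range(n)
)
print(abs(x - closed) < 1e-12)  # True
```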
Choosing the maximum lag
The partial autocorrelation of an AR(p) process equals zero at lags larger than p, so the appropriate maximum lag p is the one after which the partial autocorrelations are all zero.
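This cutoff behavior can be checked by running the Durbin–Levinson recursion on the theoretical autocorrelations of a known AR(2) process; a sketch (the helper `pacf_from_acf` is illustrative, not a library function):

```python
import numpy as np

def pacf_from_acf(rho, nlags):
    """Partial autocorrelations at lags 1..nlags from an autocorrelation
    sequence rho[0..nlags], via the Durbin-Levinson recursion."""
    pacf = [rho[1]]
    phi = np.array([rho[1]])
    for k in range(2, nlags + 1):
        num = rho[k] - np.dot(phi, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi, rho[1:k])
        a = num / den                          # pacf at lag k
        phi = np.r_[phi - a * phi[::-1], a]    # updated AR(k) coefficients
        pacf.append(a)
    return pacf

# Theoretical ACF of an AR(2) with phi1 = 0.5, phi2 = 0.3
phi1, phi2 = 0.5, 0.3
rho = [1.0, phi1 / (1 - phi2)]
for k in range(2, 6):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

pc = pacf_from_acf(rho, 5)
print(pc)  # pc[1] equals phi2; lags 3..5 are (numerically) zero
```

The partial autocorrelation at lag 2 recovers $\varphi_2$, and all partial autocorrelations beyond lag 2 vanish, correctly identifying p = 2.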
Calculation of the AR parameters
There are many ways to estimate the coefficients, such as the ordinary least squares procedure or the method of moments (via the Yule–Walker equations).
The AR(p) model is given by the equation

\[ X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t . \]

It is based on parameters $\varphi_i$ where i = 1, ..., p. There is a direct correspondence between these parameters and the covariance function of the process, and this correspondence can be inverted to determine the parameters from the autocorrelation function (which is itself obtained from the covariances). This is done using the Yule–Walker equations.
Yule–Walker equations
The Yule–Walker equations, named for Udny Yule and Gilbert Walker,[14][15] are the following set of equations:[16]

\[ \gamma_m = \sum_{k=1}^{p} \varphi_k \gamma_{m-k} + \sigma_\varepsilon^2 \delta_{m,0} , \]

where m = 0, …, p, yielding p + 1 equations. Here $\gamma_m$ is the autocovariance function of $X_t$, $\sigma_\varepsilon$ is the standard deviation of the input noise process, and $\delta_{m,0}$ is the Kronecker delta function.

Because the last part of an individual equation is non-zero only if m = 0, the set of equations can be solved by representing the equations for m > 0 in matrix form, thus getting the equation

\[
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p \end{bmatrix}
=
\begin{bmatrix}
\gamma_0 & \gamma_{-1} & \cdots & \gamma_{-(p-1)} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{-(p-2)} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0
\end{bmatrix}
\begin{bmatrix} \varphi_1 \\ \varphi_2 \\ \vdots \\ \varphi_p \end{bmatrix} ,
\]

which can be solved for all $\{\varphi_k; k = 1, \ldots, p\}$. The remaining equation for m = 0 is

\[ \gamma_0 = \sum_{k=1}^{p} \varphi_k \gamma_{-k} + \sigma_\varepsilon^2 , \]

which, once $\{\varphi_k; k = 1, \ldots, p\}$ are known, can be solved for $\sigma_\varepsilon^2$.

An alternative formulation is in terms of the autocorrelation function. The AR parameters are determined by the first p+1 elements $\rho(\tau)$, $\tau = 0, 1, \ldots, p$, of the autocorrelation function. The full autocorrelation function can then be derived by recursively calculating[17]

\[ \rho(\tau) = \sum_{k=1}^{p} \varphi_k \rho(\tau - k) . \]
Examples for some low-order AR(p) processes
- p=1
  - $\gamma_1 = \varphi_1 \gamma_0$
  - Hence $\rho_1 = \gamma_1 / \gamma_0 = \varphi_1$
- p=2
  - The Yule–Walker equations for an AR(2) process are
    \[ \gamma_1 = \varphi_1 \gamma_0 + \varphi_2 \gamma_{-1} , \]
    \[ \gamma_2 = \varphi_1 \gamma_1 + \varphi_2 \gamma_0 . \]
  - Remember that $\gamma_{-k} = \gamma_k$.
  - Using the first equation yields $\rho_1 = \gamma_1 / \gamma_0 = \dfrac{\varphi_1}{1 - \varphi_2}$
  - Using the recursion formula yields $\rho_2 = \gamma_2 / \gamma_0 = \dfrac{\varphi_1^2 - \varphi_2^2 + \varphi_2}{1 - \varphi_2}$
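The p = 2 relations can also be inverted numerically: given the first autocorrelations of a known AR(2), solving the 2×2 Yule–Walker system recovers the coefficients. A minimal sketch:

```python
import numpy as np

# True AR(2) coefficients and their theoretical autocorrelations
phi1, phi2 = 0.5, 0.3
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2

# Yule-Walker system in autocorrelation form:
#   [rho1]   [1     rho1] [phi1]
#   [rho2] = [rho1  1   ] [phi2]
R = np.array([[1.0, rho1], [rho1, 1.0]])
r = np.array([rho1, rho2])
phi_hat = np.linalg.solve(R, r)
print(phi_hat)   # recovers [0.5, 0.3]
```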
Estimation of AR parameters
The above equations (the Yule–Walker equations) provide several routes to estimating the parameters of an AR(p) model, by replacing the theoretical covariances with estimated values.[18] Some of these variants can be described as follows:
- Estimation of autocovariances or autocorrelations. Here each of these terms is estimated separately, using conventional estimates. There are different ways of doing this and the choice between these affects the properties of the estimation scheme. For example, negative estimates of the variance can be produced by some choices.
- Formulation as a least squares regression problem in which an ordinary least squares prediction problem is constructed, basing prediction of values of Xt on the p previous values of the same series. This can be thought of as a forward-prediction scheme. The normal equations for this problem can be seen to correspond to an approximation of the matrix form of the Yule–Walker equations in which each appearance of an autocovariance of the same lag is replaced by a slightly different estimate.
- Formulation as an extended form of ordinary least squares prediction problem. Here two sets of prediction equations are combined into a single estimation scheme and a single set of normal equations. One set is the set of forward-prediction equations and the other is a corresponding set of backward prediction equations, relating to the backward representation of the AR model:
  \[ X_t = \sum_{i=1}^{p} \varphi_i X_{t+i} + \varepsilon_t^{*} . \]
- Here predicted values of $X_t$ would be based on the p future values of the same series.[clarification needed] This way of estimating the AR parameters is due to John Parker Burg,[19] and is called the Burg method:[20] Burg and later authors called these particular estimates "maximum entropy estimates",[21] but the reasoning behind this applies to the use of any set of estimated AR parameters. Compared to the estimation scheme using only the forward prediction equations, different estimates of the autocovariances are produced, and the estimates have different stability properties. Burg estimates are particularly associated with maximum entropy spectral estimation.[22]
Other possible approaches to estimation include maximum likelihood estimation. Two distinct variants of maximum likelihood are available: in one (broadly equivalent to the forward prediction least squares scheme) the likelihood function considered is that corresponding to the conditional distribution of later values in the series given the initial p values in the series; in the second, the likelihood function considered is that corresponding to the unconditional joint distribution of all the values in the observed series. Substantial differences in the results of these approaches can occur if the observed series is short, or if the process is close to non-stationarity.
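As an illustration of the forward-prediction least-squares route, one can regress $X_t$ on its p lagged values over a simulated series and check that the true coefficients are approximately recovered; a sketch (coefficients, sample size, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a long AR(2) series with known coefficients
phi_true = np.array([0.5, 0.3])
n = 20000
x = np.zeros(n)
eps = rng.normal(0.0, 1.0, n)
for t in range(2, n):
    x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + eps[t]

# Forward-prediction least squares: regress x_t on (x_{t-1}, x_{t-2})
X = np.column_stack([x[1:-1], x[:-2]])
y = x[2:]
phi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(phi_hat)   # close to [0.5, 0.3]
```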
Spectrum
The power spectral density (PSD) of an AR(p) process with noise variance $\operatorname{Var}(Z_t) = \sigma_Z^2$ is[17]

\[ S(f) = \frac{\sigma_Z^2}{\left| 1 - \sum_{k=1}^{p} \varphi_k e^{-i 2 \pi f k} \right|^2} . \]
AR(0)
For white noise (AR(0)),

\[ S(f) = \sigma_Z^2 . \]
AR(1)
For AR(1),

\[ S(f) = \frac{\sigma_Z^2}{1 + \varphi_1^2 - 2 \varphi_1 \cos(2 \pi f)} . \]

- If $\varphi_1 > 0$, there is a single spectral peak at $f = 0$, often referred to as red noise. As $\varphi_1$ becomes nearer 1, there is stronger power at low frequencies, i.e. larger time lags. This is then a low-pass filter; when applied to full-spectrum light, everything except the red light will be filtered.
- If $\varphi_1 < 0$, there is a minimum at $f = 0$, often referred to as blue noise. This similarly acts as a high-pass filter; everything except blue light will be filtered.
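Both cases can be checked by evaluating the AR(1) PSD at the frequency extremes; a small sketch (the helper `ar1_psd` is illustrative):

```python
import math

def ar1_psd(f, phi, sigma2=1.0):
    """PSD of an AR(1) process at frequency f in cycles per sample."""
    return sigma2 / (1 + phi**2 - 2 * phi * math.cos(2 * math.pi * f))

# Red noise (phi > 0): power concentrated at low frequencies
print(ar1_psd(0.0, 0.9) > ar1_psd(0.5, 0.9))   # True
# Blue noise (phi < 0): power concentrated at high frequencies
print(ar1_psd(0.0, -0.9) < ar1_psd(0.5, -0.9)) # True
```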
AR(2)
The behavior of an AR(2) process is determined entirely by the roots of its characteristic equation, which is expressed in terms of the lag operator as

\[ 1 - \varphi_1 B - \varphi_2 B^2 = 0 , \]

or equivalently by the poles of its transfer function, which is defined in the Z domain by

\[ H(z) = \left( 1 - \varphi_1 z^{-1} - \varphi_2 z^{-2} \right)^{-1} . \]

It follows that the poles are values of z satisfying

\[ 1 - \varphi_1 z^{-1} - \varphi_2 z^{-2} = 0 , \]

which yields

\[ z_1, z_2 = \frac{1}{2} \left( \varphi_1 \pm \sqrt{\varphi_1^2 + 4 \varphi_2} \right) . \]

$z_1$ and $z_2$ are the reciprocals of the characteristic roots, as well as the eigenvalues of the temporal update matrix

\[ \begin{bmatrix} \varphi_1 & \varphi_2 \\ 1 & 0 \end{bmatrix} . \]
AR(2) processes can be split into three groups depending on the characteristics of their roots/poles:

- When $\varphi_1^2 + 4 \varphi_2 < 0$, the process has a pair of complex-conjugate poles, creating a mid-frequency peak at

\[ f^{*} = \frac{1}{2\pi} \cos^{-1} \left( \frac{\varphi_1 (\varphi_2 - 1)}{4 \varphi_2} \right) , \]

with bandwidth about the peak inversely proportional to the moduli of the poles:

\[ |z_1| = |z_2| = \sqrt{-\varphi_2} . \]

The terms involving square roots are all real in the case of complex poles, since complex poles exist only when $\varphi_2 < 0$.

Otherwise the process has real roots, and:

- When $\varphi_1 > 0$ it acts as a low-pass filter on the white noise with a spectral peak at $f = 0$.
- When $\varphi_1 < 0$ it acts as a high-pass filter on the white noise with a spectral peak at $f = 1/2$.
The process is non-stationary when the poles are on or outside the unit circle, or equivalently when the characteristic roots are on or inside the unit circle. The process is stable when the poles are strictly within the unit circle (roots strictly outside the unit circle), or equivalently when the coefficients are in the triangle $-1 \le \varphi_2 \le 1 - |\varphi_1|$.
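The three regimes and the stability check can be sketched directly from the pole formula (helper names are illustrative):

```python
import cmath

def ar2_poles(phi1, phi2):
    """Poles z1, z2 = (phi1 ± sqrt(phi1^2 + 4*phi2)) / 2 of an AR(2)."""
    d = cmath.sqrt(phi1**2 + 4 * phi2)
    return (phi1 + d) / 2, (phi1 - d) / 2

def classify(phi1, phi2):
    """Return (stable, has_complex_pole_pair) for an AR(2) process."""
    z1, z2 = ar2_poles(phi1, phi2)
    stable = abs(z1) < 1 and abs(z2) < 1
    complex_pair = phi1**2 + 4 * phi2 < 0
    return stable, complex_pair

print(classify(1.0, -0.5))   # stable, complex poles (oscillatory)
print(classify(0.5, 0.3))    # stable, real poles
print(classify(0.5, 0.6))    # unstable (outside the triangle)
```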
The full PSD function can be expressed in real form as

\[ S(f) = \frac{\sigma_Z^2}{1 + \varphi_1^2 + \varphi_2^2 - 2 \varphi_1 (1 - \varphi_2) \cos(2 \pi f) - 2 \varphi_2 \cos(4 \pi f)} . \]
Implementations in statistics packages
- R – the stats package includes the ar function;[23] the astsa package includes the sarima function to fit various models including AR.[24]
- MATLAB – the Econometrics Toolbox[25] and System Identification Toolbox[26] include AR models.[27]
- MATLAB and Octave – the TSA toolbox contains several estimation functions for uni-variate, multivariate, and adaptive AR models.[28]
- PyMC3 – the Bayesian statistics and probabilistic programming framework supports AR models with p lags.
- bayesloop – supports parameter inference and model selection for the AR-1 process with time-varying parameters.[29]
- Python – the statsmodels package includes an AR model (statsmodels.tsa.ar_model.AutoReg).[30]
Impulse response
The impulse response of a system is the change in an evolving variable in response to a change in the value of a shock term k periods earlier, as a function of k. Since the AR model is a special case of the vector autoregressive model, the computation of the impulse response described for vector autoregressions applies here.
n-step-ahead forecasting
[edit]Once the parameters of the autoregression
have been estimated, the autoregression can be used to forecast an arbitrary number of periods into the future. First use t to refer to the first period for which data is not yet available; substitute the known preceding values Xt-i for i=1, ..., p into the autoregressive equation while setting the error term equal to zero (because we forecast Xt to equal its expected value, and the expected value of the unobserved error term is zero). The output of the autoregressive equation is the forecast for the first unobserved period. Next, use t to refer to the next period for which data is not yet available; again the autoregressive equation is used to make the forecast, with one difference: the value of X one period prior to the one now being forecast is not known, so its expected value—the predicted value arising from the previous forecasting step—is used instead. Then for future periods the same procedure is used, each time using one more forecast value on the right side of the predictive equation until, after p predictions, all p right-side values are predicted values from preceding steps.
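The recursive substitution of forecasts for unobserved lags can be sketched as follows (the coefficients and history values are made-up for illustration):

```python
# n-step-ahead forecasts for an estimated AR(2): roll the recursion
# forward, substituting earlier forecasts for unobserved lagged values.
c = 0.2
phi = [0.5, 0.3]        # assumed estimated coefficients (phi_1, phi_2)
history = [1.1, 0.7]    # last p observed values, most recent last

def forecast(c, phi, history, steps):
    vals = list(history)
    out = []
    for _ in range(steps):
        # phi_1 multiplies the most recent value, phi_2 the one before it
        nxt = c + sum(p * v for p, v in zip(phi, reversed(vals)))
        out.append(nxt)
        vals = vals[1:] + [nxt]   # slide the window of p lagged values
    return out

f = forecast(c, phi, history, 3)
print(f)
```

After p = 2 steps, every lagged value on the right side is itself a forecast, as described above.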
There are four sources of uncertainty regarding predictions obtained in this manner: (1) uncertainty as to whether the autoregressive model is the correct model; (2) uncertainty about the accuracy of the forecasted values that are used as lagged values in the right side of the autoregressive equation; (3) uncertainty about the true values of the autoregressive coefficients; and (4) uncertainty about the value of the error term for the period being predicted. Each of the last three can be quantified and combined to give a confidence interval for the n-step-ahead predictions; the confidence interval will become wider as n increases because of the use of an increasing number of estimated values for the right-side variables.
Notes
- ^ Souza, Douglas Baptista de; Leao, Bruno Paes (26 October 2023). "Data Augmentation of Sensor Time Series using Time-varying Autoregressive Processes". Annual Conference of the PHM Society. 15 (1). doi:10.36001/phmconf.2023.v15i1.3565.
- ^ Souza, Douglas Baptista de; Leao, Bruno Paes (5 November 2024). "Data Augmentation of Multivariate Sensor Time Series using Autoregressive Models and Application to Failure Prognostics". Annual Conference of the PHM Society. 16 (1). arXiv:2410.16419. doi:10.36001/phmconf.2024.v16i1.4145.
- ^ Jia, Zhixuan; Li, Wang; Jiang, Yunlong; Liu, Xingshen (9 July 2025). "The Use of Minimization Solvers for Optimizing Time-Varying Autoregressive Models and Their Applications in Finance". Mathematics. 13 (14): 2230. doi:10.3390/math13142230.
- ^ Diodato, Nazzareno; Di Salvo, Cristina; Bellocchi, Gianni (18 March 2025). "Climate driven generative time-varying model for improved decadal storm power predictions in the Mediterranean". Communications Earth & Environment. 6 (1): 212. Bibcode:2025ComEE...6..212D. doi:10.1038/s43247-025-02196-2.
- ^ Inayati, Syarifah; Iriawan, Nur (31 December 2024). "Time-Varying Autoregressive Models for Economic Forecasting". Matematika: 131–142. doi:10.11113/matematika.v40.n3.1654.
- ^ Baptista de Souza, Douglas; Kuhn, Eduardo Vinicius; Seara, Rui (January 2019). "A Time-Varying Autoregressive Model for Characterizing Nonstationary Processes". IEEE Signal Processing Letters. 26 (1): 134–138. Bibcode:2019ISPL...26..134B. doi:10.1109/LSP.2018.2880086.
- ^ Wang, Shihan; Chen, Tao; Wang, Hongjian (17 March 2023). "IDBD-Based Beamforming Algorithm for Improving the Performance of Phased Array Radar in Nonstationary Environments". Sensors. 23 (6): 3211. Bibcode:2023Senso..23.3211W. doi:10.3390/s23063211. PMC 10052024. PMID 36991922.
- ^ Abramovich, Yuri I.; Spencer, Nicholas K.; Turley, Michael D. E. (April 2007). "Time-Varying Autoregressive (TVAR) Models for Multiple Radar Observations". IEEE Transactions on Signal Processing. 55 (4): 1298–1311. Bibcode:2007ITSP...55.1298A. doi:10.1109/TSP.2006.888064.
- ^ Gutierrez, D.; Salazar-Varas, R. (August 2011). "EEG signal classification using time-varying autoregressive models and common spatial patterns". 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. pp. 6585–6588. doi:10.1109/IEMBS.2011.6091624. ISBN 978-1-4577-1589-1. PMID 22255848.
- ^ Box, George E. P. (1994). Time series analysis : forecasting and control. Gwilym M. Jenkins, Gregory C. Reinsel (3rd ed.). Englewood Cliffs, N.J.: Prentice Hall. p. 54. ISBN 0-13-060774-6. OCLC 28888762.
- ^ Shumway, Robert H. (2000). Time series analysis and its applications. David S. Stoffer. New York: Springer. pp. 90–91. ISBN 0-387-98950-1. OCLC 42392178.
- ^ Shumway, Robert H.; Stoffer, David (2010). Time series analysis and its applications : with R examples (3rd ed.). Springer. ISBN 978-1441978646.
- ^ Lai, Dihui; Lu, Bingfeng; "Understanding Autoregressive Model for Time Series as a Deterministic Dynamic System" Archived 2023-03-24 at the Wayback Machine, in Predictive Analytics and Futurism, number 15, June 2017, pages 7–9.
- ^ Yule, G. Udny (1927) "On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer's Sunspot Numbers" Archived 2011-05-14 at the Wayback Machine, Philosophical Transactions of the Royal Society of London, Ser. A, Vol. 226, 267–298.
- ^ Walker, Gilbert (1931) "On Periodicity in Series of Related Terms" Archived 2011-06-07 at the Wayback Machine, Proceedings of the Royal Society of London, Ser. A, Vol. 131, 518–532.
- ^ Theodoridis, Sergios (2015-04-10). "Chapter 1. Probability and Stochastic Processes". Machine Learning: A Bayesian and Optimization Perspective. Academic Press, 2015. pp. 9–51. ISBN 978-0-12-801522-3.
- ^ Von Storch, Hans; Zwiers, Francis W. (2001). Statistical analysis in climate research. Cambridge University Press. doi:10.1017/CBO9780511612336. ISBN 0-521-01230-9.[page needed]
- ^ Eshel, Gidon. "The Yule Walker Equations for the AR Coefficients" (PDF). stat.wharton.upenn.edu. Archived (PDF) from the original on 2018-07-13. Retrieved 2019-01-27.
- ^ Burg, John Parker (1968); "A new analysis technique for time series data", in Modern Spectrum Analysis (Edited by D. G. Childers), NATO Advanced Study Institute of Signal Processing with emphasis on Underwater Acoustics. IEEE Press, New York.
- ^ Brockwell, Peter J.; Dahlhaus, Rainer; Trindade, A. Alexandre (2005). "Modified Burg Algorithms for Multivariate Subset Autoregression" (PDF). Statistica Sinica. 15: 197–213. Archived from the original (PDF) on 2012-10-21.
- ^ Burg, John Parker (1967) "Maximum Entropy Spectral Analysis", Proceedings of the 37th Meeting of the Society of Exploration Geophysicists, Oklahoma City, Oklahoma.
- ^ Bos, Robert; De Waele, Stijn; Broersen, Piet M. T. (2002). "Autoregressive spectral estimation by application of the Burg algorithm to irregularly sampled data". IEEE Transactions on Instrumentation and Measurement. 51 (6): 1289. Bibcode:2002ITIM...51.1289B. doi:10.1109/TIM.2002.808031. Archived from the original on 2023-04-16. Retrieved 2019-12-11.
- ^ "Fit Autoregressive Models to Time Series" Archived 2016-01-28 at the Wayback Machine (in R)
- ^ Stoffer, David; Poison, Nicky (2023-01-09). "astsa: Applied Statistical Time Series Analysis". Retrieved 2023-08-20.
- ^ "Econometrics Toolbox". www.mathworks.com. Archived from the original on 2023-04-16. Retrieved 2022-02-16.
- ^ "System Identification Toolbox". www.mathworks.com. Archived from the original on 2022-02-16. Retrieved 2022-02-16.
- ^ "Autoregressive Model - MATLAB & Simulink". www.mathworks.com. Archived from the original on 2022-02-16. Retrieved 2022-02-16.
- ^ "The Time Series Analysis (TSA) toolbox for Octave and MATLAB". pub.ist.ac.at. Archived from the original on 2012-05-11. Retrieved 2012-04-03.
- ^ "christophmark/bayesloop". December 7, 2021. Archived from the original on September 28, 2020. Retrieved September 4, 2018 – via GitHub.
- ^ "statsmodels.tsa.ar_model.AutoReg — statsmodels 0.12.2 documentation". www.statsmodels.org. Archived from the original on 2021-02-28. Retrieved 2021-04-29.
References
- Mills, Terence C. (1990). Time Series Techniques for Economists. Cambridge University Press. ISBN 9780521343398.
- Percival, Donald B.; Walden, Andrew T. (1993). Spectral Analysis for Physical Applications. Cambridge University Press. Bibcode:1993sapa.book.....P.
- Pandit, Sudhakar M.; Wu, Shien-Ming (1983). Time Series and System Analysis with Applications. John Wiley & Sons.
External links
- AutoRegression Analysis (AR) by Paul Bourke
- Econometrics lecture (topic: Autoregressive models) on YouTube by Mark Thoma
Autoregressive model
View on Grokipediawhere is the value at time t, is a constant, are the model parameters (autoregressive coefficients), and is white noise error with mean zero and constant variance.[1] For the simplest case, an AR(1) model, this reduces to , assuming stationarity when .[1] The concept originated in the early 20th century, with George Udny Yule introducing the first AR(2) model in 1927 to investigate periodicities in sunspot data, addressing limitations of purely deterministic cycle models by incorporating random disturbances.[2] This work was extended by Gilbert Thomas Walker in 1931, who generalized the approach to higher-order autoregressions and derived methods for parameter estimation, laying the groundwork for the Yule-Walker equations that solve for the coefficients using autocorrelations.[3] Autoregressive models are central to time series analysis, particularly in econometrics and finance, where they forecast variables like stock prices or economic indicators by capturing serial correlation; for instance, AR models have been applied to predict Google stock returns based on lagged values.[4][1] In signal processing, they model stationary processes for tasks like speech analysis or noise reduction, assuming the data-generating mechanism follows a linear recursive structure.[5] In contemporary machine learning, autoregressive principles underpin generative models for sequences, such as those in natural language processing (e.g., predicting the next word conditioned on prior context) and computer vision (e.g., generating images pixel by pixel), enabling scalable density estimation through the chain rule of probability.[6] These extensions, often implemented with neural networks like recurrent or transformer architectures, have revolutionized applications in large language models while inheriting the core idea of sequential dependency modeling. 
However, according to Yann LeCun, autoregressive large language models lack true understanding, planning, and reasoning capabilities due to limitations in sample efficiency, world modeling, and their reliance on predicting discrete tokens rather than continuous representations.[7]
Fundamentals
Definition
An autoregressive (AR) model is a stochastic process in which each observation is expressed as a linear combination of previous observations of the same process plus a random error term.[8] These models are fundamental in time series analysis for capturing temporal dependencies, where the value at time relies on prior values rather than assuming observations are independent.[9] In contrast to models treating data points as unrelated, AR models leverage the inherent autocorrelation in sequential data, such as economic indicators or natural phenomena, to represent persistence or momentum.[10] The general form of an AR model of order , denoted AR(), is given by where is a constant, are the model parameters (autoregressive coefficients), and is white noise—a sequence of independent and identically distributed random variables with mean zero and constant variance.[8] The order indicates the number of lagged terms included, allowing the model to account for dependencies extending back periods.[11] AR models differ from moving average (MA) models, which express the current value as a linear combination of past forecast errors rather than past values of the series itself.[12] The term "autoregressive" derives from the idea of performing a regression of the variable against its own lagged values, emphasizing self-dependence within the time series.[2] This framework assumes stationarity for reliable inference, though extensions like ARIMA incorporate differencing for non-stationary data.[13]Historical Development
The origins of autoregressive models trace back to the work of British statistician George Udny Yule, who in 1927 introduced autoregressive schemes to analyze periodicities in disturbed time series, particularly applying them to Wolfer's sunspot numbers to model cycles in astronomical data.[2] Yule's approach represented a departure from traditional periodogram methods, emphasizing stochastic processes where current values depend on past observations to capture quasi-periodic behaviors in time series.[2] In 1931, Gilbert Thomas Walker extended Yule's framework by generalizing it to higher-order autoregressive models, allowing for more flexible representations of complex dependencies in related time series.[3] In the 1930s and 1940s, Herman Wold further advanced the theory by developing the Wold decomposition, showing that stationary processes can be represented as infinite-order AR or MA models, paving the way for ARMA frameworks. A major milestone came in 1970 with George E. P. Box and Gwilym M. Jenkins, who incorporated autoregressive models into the ARMA framework in their seminal book, providing a systematic methodology for identification, estimation, and forecasting that popularized AR models across forecasting applications in statistics and beyond. Since the 1980s, autoregressive models have seen modern extensions in signal processing for spectral estimation and analysis of stationary signals, as well as in machine learning through autoregressive neural networks that leverage past outputs for sequence generation tasks.Model Formulation
General AR(p) Equation
The autoregressive model of order , commonly denoted as AR(), specifies that the value of a time series at time , , depends linearly on its previous values plus a constant term and a stochastic error. The general form of the model is given by where are the autoregressive parameters, is a constant representing the deterministic component (often related to the mean of the process), and is a white noise error term. This equation can be rearranged as . The error term is assumed to have mean zero, constant variance , and to be uncorrelated across time, i.e., , for , and . For statistical inference, such as maximum likelihood estimation, the errors are often further assumed to be independent and identically distributed as Gaussian, .[14][15] When , the model is homogeneous, implying a zero-mean process, which is suitable for centered data. In the inhomogeneous case with , the constant accounts for a non-zero mean, and under weak stationarity, the unconditional mean of the process is . The model assumes weak stationarity, meaning the mean, variance, and autocovariances are time-invariant, which requires the roots of the characteristic polynomial to lie outside the unit circle (as detailed in the stationarity conditions section). For inference involving normality assumptions, Gaussian errors facilitate exact likelihood computations.[16][17] A compact notation for the AR() model employs the backshift operator , defined such that and for . The autoregressive polynomial is , leading to the operator form . This notation simplifies manipulations, such as differencing or combining with moving average components in broader ARMA models.[14][18] For stationary AR() processes, the model admits an infinite moving average (MA()) representation, expressing as an infinite linear combination of current and past errors plus the mean: , where the coefficients are determined by the autoregressive parameters and satisfy with to ensure absolute summability. 
The MA($\infty$) representation underscores the process's dependence on the entire error history, providing a foundation for forecasting and spectral analysis.[19][20]
Stationarity Conditions
In time series analysis, weak stationarity, also known as covariance stationarity, requires that a process has a constant mean, constant variance, and autocovariances that depend solely on the time lag rather than the specific time points.[15] For an autoregressive process of order $p$, denoted AR($p$), this property ensures that the statistical characteristics remain invariant over time, facilitating reliable modeling and forecasting.[15]

The necessary and sufficient condition for an AR($p$) process to be weakly stationary is that all roots $z_i$ of the characteristic equation $1 - \varphi_1 z - \varphi_2 z^2 - \cdots - \varphi_p z^p = 0$ lie outside the unit circle in the complex plane, meaning their moduli satisfy $|z_i| > 1$.[17] This condition guarantees the existence of a stationary solution.[21] For the simple AR(1) process $X_t = c + \varphi X_{t-1} + \varepsilon_t$, stationarity holds if and only if $|\varphi| < 1$.[15]

If the stationarity condition is violated, such as when one or more roots have modulus $|z_i| \leq 1$, the AR process becomes non-stationary, exhibiting behaviors like unit root processes (e.g., random walks with time-dependent variance) or explosive dynamics where variance grows without bound.[17] In the case of a unit root ($|z_i| = 1$), as in an AR(1) with $\varphi = 1$, the process integrates to form a non-stationary series with persistent shocks.[21]

To address non-stationarity in AR processes, differencing transforms the series into a stationary one by applying the operator $\nabla X_t = (1 - B) X_t = X_t - X_{t-1}$, which removes trends or unit roots; higher-order differencing $(1 - B)^d$ may be needed for processes integrated of order $d$.[22] This approach underpins ARIMA models, where the differenced series follows a stationary ARMA process.[22]
Properties and Analysis
Characteristic Polynomial
The characteristic polynomial of an autoregressive model of order $p$, denoted $\varphi(z)$, arises from the AR operator $\varphi(B) = 1 - \varphi_1 B - \cdots - \varphi_p B^p$, where $B$ is the backshift operator such that $B X_t = X_{t-1}$.[23] This polynomial encapsulates the linear dependence structure of the process defined by $\varphi(B) X_t = \varepsilon_t$, with $\varepsilon_t$ as white noise.[17]

To derive the characteristic polynomial, consider the AR equation in operator form: $\varphi(B) X_t = \varepsilon_t$. Substituting the lag operator with a complex variable $z$ yields the polynomial $\varphi(z) = 1 - \varphi_1 z - \cdots - \varphi_p z^p$, which can be viewed through the lens of the z-transform of the process.[24] The z-transform approach transforms the difference equation into an algebraic one, where $\varphi(z)$ serves as the denominator in the transfer function $H(z) = 1/\varphi(z)$, enabling the representation of the AR process as an infinite moving average via partial fraction expansion over the roots.[24] Alternatively, generating functions can be used to express the moments of the process, with the characteristic polynomial emerging from the denominator of the generating function for the autocovariances.[24]

The roots of $\varphi(z) = 0$ provide key insights into the dynamics of the AR process. If the roots are complex conjugates, they introduce oscillatory components in the time series behavior, with the argument of the roots determining the frequency of oscillation.[24] The modulus of the roots governs persistence: roots with smaller modulus (closer to but outside the unit circle) imply slower decay of shocks and longer-lasting effects, while larger moduli (farther from the unit circle) indicate faster decay.[24] For stationarity, all roots must lie outside the unit circle in the complex plane, a condition that ensures the infinite MA representation converges.[25] Pure AR models are always invertible.
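The root-based criterion just described is easy to check numerically. The following sketch (NumPy only, with illustrative coefficient values) locates the roots of $\varphi(z)$ and tests whether they all lie outside the unit circle:

```python
import numpy as np

def is_stationary(phi):
    """Check weak stationarity of an AR(p) model by locating the roots of
    phi(z) = 1 - phi_1 z - ... - phi_p z^p; all must lie outside the unit circle."""
    coeffs = np.r_[1.0, -np.asarray(phi)]               # ascending powers of z
    roots = np.polynomial.polynomial.polyroots(coeffs)  # roots of phi(z) = 0
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.5]))        # AR(1) with |phi| < 1: True
print(is_stationary([1.0]))        # unit root (random walk): False
print(is_stationary([0.5, -0.3]))  # stationary AR(2) with complex roots: True
print(is_stationary([0.6, 0.5]))   # phi1 + phi2 > 1 violates the condition: False
```

The same root moduli and arguments can then be inspected directly to judge persistence and oscillation frequency, as discussed above.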
Stationary AR models can be expressed as a convergent infinite moving average (MA($\infty$)) representation without additional constraints beyond the root locations.[17] Graphically, the roots of $\varphi(z)$ are plotted in the complex plane, where the unit circle serves as a boundary: roots on or inside the circle indicate non-stationarity, while roots strictly outside it confirm stationarity; oscillatory patterns are reflected in the imaginary parts of the roots and persistence in their radial distance from the circle.[25]
Intertemporal Effects of Shocks
In an autoregressive (AR) model, a shock is conceptualized as a one-time innovation $\varepsilon_t$ to the error term, representing an unanticipated disturbance at time $t$. This shock influences the future values of the process $X_{t+j}$ for $j \geq 0$ through the model's recursive structure. The marginal effect of such a shock is given by $\partial X_{t+j} / \partial \varepsilon_t = \psi_j$, where $\psi_j$ denotes the $j$-th dynamic multiplier, obtained by recursively applying the AR coefficients (for an AR($p$) model, $\psi_j = \sum_{i=1}^{\min(j,p)} \varphi_i \psi_{j-i}$ for $j \geq 1$, with $\psi_0 = 1$).[26]

The persistence of these intertemporal effects depends on the stationarity of the AR process. In a stationary AR model, where all roots of the characteristic polynomial lie outside the unit circle, the effects of a shock decay geometrically over time, ensuring that the influence diminishes as $j$ increases (e.g., in an AR(1) process with coefficient $\varphi$, the effect on $X_{t+j}$ is $\varphi^j$). Conversely, in non-stationary cases, such as when a unit root is present ($\varphi = 1$ in AR(1)), the effects accumulate rather than decay, leading to permanent shifts in the level of the series.[26]

A key aspect of shock propagation is the variance decomposition, which quantifies how past shocks contribute to the current unconditional variance of the process. For a stationary AR model, the variance is $\operatorname{Var}(X_t) = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2$, where each term represents the contribution from a shock $j$ periods in the past; this infinite sum converges due to geometric decay. In the AR(1) case, it simplifies to $\sigma^2 / (1 - \varphi^2)$, illustrating how earlier shocks have exponentially smaller contributions relative to recent ones.[26]

In econometric applications, particularly in macroeconomics, these shocks are often interpreted as exogenous events such as policy changes, supply disruptions, or demand fluctuations that propagate through economic variables modeled via AR processes.
For instance, an unanticipated monetary policy tightening can be viewed as a negative shock whose intertemporal effects trace the subsequent adjustments in output or inflation, with persistence reflecting the economy's inertial response to such interventions.[27]
Impulse Response Function
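The dynamic multipliers $\psi_j$ introduced above follow a short recursion that is straightforward to compute. A minimal sketch (NumPy only, with illustrative coefficients):

```python
import numpy as np

def impulse_response(phi, horizon):
    """Dynamic multipliers / IRF of an AR(p): psi_0 = 1 and
    psi_j = sum_{i=1}^{min(j, p)} phi_i * psi_{j-i} for j >= 1."""
    p = len(phi)
    psi = np.zeros(horizon + 1)
    psi[0] = 1.0
    for j in range(1, horizon + 1):
        psi[j] = sum(phi[i] * psi[j - 1 - i] for i in range(min(j, p)))
    return psi

print(impulse_response([0.5], 4))        # AR(1): psi_j = 0.5**j, geometric decay
print(impulse_response([1.0, -0.5], 5))  # AR(2) with complex roots: damped oscillation
```

For the AR(1) case the output matches the closed form $\psi_j = \varphi^j$; for the AR(2) case the sequence changes sign, illustrating oscillatory propagation of a shock.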
In autoregressive (AR) models, the impulse response function (IRF) quantifies the dynamic impact of a unit shock to the innovation term $\varepsilon_t$ on the future values of the process $X_{t+j}$. It is formally defined as the sequence of coefficients $\psi_j = \partial X_{t+j} / \partial \varepsilon_t$ for $j \geq 0$, with the initial condition $\psi_0 = 1$ reflecting the contemporaneous effect of the shock. These IRF coefficients arise from the moving average representation of the stationary AR process, $X_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$, and satisfy a linear recurrence relation derived from the AR structure. For an AR($p$) model, they are computed recursively as $\psi_j = \sum_{i=1}^{p} \varphi_i \psi_{j-i}$ for $j \geq 1$, with $\psi_j = 0$ for $j < 0$. This recursion allows efficient numerical calculation of the IRF sequence, starting from the known AR parameters $\varphi_1, \dots, \varphi_p$.

For the simple AR(1) model $X_t = \varphi X_{t-1} + \varepsilon_t$, the IRF has the closed-form expression $\psi_j = \varphi^j$ for $j \geq 0$. Under the stationarity condition $|\varphi| < 1$, this exhibits geometric decay, with the shock's influence diminishing exponentially over time. In practice, IRFs for AR($p$) models with $p \geq 2$ are visualized through plots tracing $\psi_j$ against $j$, revealing patterns such as monotonic decay, overshooting (where the response temporarily exceeds the long-run effect), or oscillatory behavior influenced by complex roots in the model's characteristic polynomial. For instance, roots near the unit circle can prolong the shock's persistence, while purely real roots yield smoother responses.

To account for estimation uncertainty, confidence bands are constructed around estimated IRFs using methods like asymptotic normality, which relies on the variance-covariance matrix of the AR parameters, or bootstrapping, which resamples residuals to simulate the sampling distribution of the responses. These bands widen with the forecast horizon and are essential for statistical inference on shock persistence.[28]
Specific Examples
AR(1) Process
The AR(1) process is the first-order autoregressive model, capturing dependence of the current observation on only the immediate past value. It is expressed as $X_t = c + \varphi X_{t-1} + \varepsilon_t$, where $c$ denotes a constant term, $\varphi$ is the autoregressive coefficient satisfying $|\varphi| < 1$ for stationarity, and $\varepsilon_t$ is white noise with zero mean and finite variance $\sigma^2$.[29] Under the stationarity condition $|\varphi| < 1$, the unconditional mean of the process is $\mu = c / (1 - \varphi)$.[29] The unconditional variance is $\operatorname{Var}(X_t) = \sigma^2 / (1 - \varphi^2)$.[29][30]

The autocorrelation function of the stationary AR(1) process exhibits exponential decay, given by $\rho_k = \varphi^k$ for lag $k \geq 0$.[30] This geometric decline reflects the diminishing influence of past shocks over time, with the rate determined by $|\varphi|$.[29] An equivalent representation centers the process around its mean, yielding the mean-deviation form $X_t - \mu = \varphi (X_{t-1} - \mu) + \varepsilon_t$. This formulation highlights the mean-reverting dynamics when $|\varphi| < 1$, as deviations from $\mu$ are scaled by $\varphi$ before adding new noise.[30]

Simulations of AR(1) sample paths reveal behavioral contrasts across values of $\varphi$. For small positive $\varphi$, paths show rapid mean reversion, with quick damping of shocks and low persistence.[29] For $\varphi$ close to 1, paths display high persistence, wandering slowly before reverting, mimicking long-memory patterns.[29] Negative $\varphi$ produces oscillatory paths alternating around the mean.[30] In the unit root case where $\varphi = 1$, the AR(1) process simplifies to a random walk, $X_t = X_{t-1} + \varepsilon_t$ (taking $c = 0$), which lacks stationarity as its variance grows indefinitely with time.[29]
AR(2) Process
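The second-order autocorrelation recursion $\rho_k = \varphi_1 \rho_{k-1} + \varphi_2 \rho_{k-2}$ (derived below) can be evaluated directly. A minimal sketch with illustrative parameters, assuming NumPy only:

```python
import numpy as np

def ar2_acf(phi1, phi2, nlags):
    """Theoretical ACF of a stationary AR(2): rho_1 = phi1/(1 - phi2),
    rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} for k >= 2."""
    # stationarity triangle: phi1 + phi2 < 1, phi2 - phi1 < 1, |phi2| < 1
    assert phi1 + phi2 < 1 and phi2 - phi1 < 1 and abs(phi2) < 1
    rho = np.zeros(nlags + 1)
    rho[0] = 1.0
    rho[1] = phi1 / (1 - phi2)
    for k in range(2, nlags + 1):
        rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
    return rho

# Complex characteristic roots (phi1**2 + 4*phi2 < 0): damped, sign-changing ACF
print(ar2_acf(1.0, -0.5, 8))
```

With these parameters the ACF changes sign as it decays, the damped sine wave pattern characteristic of complex roots.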
The AR(2) process extends the autoregressive framework to second-order dependence, defined by the equation $X_t = c + \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \varepsilon_t$, where $c$ is a constant, $\varphi_1$ and $\varphi_2$ are the autoregressive parameters, and $\varepsilon_t$ is white noise with mean zero and finite variance $\sigma^2$.[31] This formulation allows the current value to depend linearly on the two preceding observations, capturing more complex temporal dynamics than the first-order case.[32]

Stationarity of the AR(2) process requires that the roots of the characteristic equation $1 - \varphi_1 z - \varphi_2 z^2 = 0$ lie outside the unit circle.[32] Equivalently, this condition holds if the parameters satisfy $\varphi_1 + \varphi_2 < 1$, $\varphi_2 - \varphi_1 < 1$, and $|\varphi_2| < 1$, defining a triangular region in the parameter space.[33] Under these constraints, the process has a time-invariant mean and finite variance.[31]

The autocorrelation function (ACF) of a stationary AR(2) process decays gradually to zero, following the recursive relation $\rho_k = \varphi_1 \rho_{k-1} + \varphi_2 \rho_{k-2}$ for $k \geq 2$, with initial values $\rho_0 = 1$ and $\rho_1 = \varphi_1 / (1 - \varphi_2)$.[34] If the characteristic roots are complex conjugates, which occurs when the discriminant $\varphi_1^2 + 4\varphi_2 < 0$, the ACF exhibits damped sine wave oscillations, reflecting pseudo-periodic behavior.[35] In contrast, real roots produce a monotonic exponential decay in the ACF.[36]

The partial autocorrelation function (PACF) for an AR(2) process truncates after lag 2, with $\phi_{kk} = 0$ for all $k > 2$, providing a diagnostic signature for model identification.[31] This sharp cutoff distinguishes AR(2) from higher-order processes, where the PACF would decay more slowly.[34]

The distinction between real and complex characteristic roots fundamentally shapes the process's dynamics: real roots yield smooth, non-oscillatory persistence, while complex roots introduce cyclic patterns with a pseudo-period determined by the argument of the roots.[34] Simulated AR(2) series with complex roots, such as those satisfying the stationarity triangle with negative $\varphi_2$, demonstrate this through visibly damped oscillatory trajectories, highlighting behaviors like stochastic cycles absent in lower-order models.[35]
Estimation Methods
Choosing the Lag Order
Selecting the appropriate lag order $p$ in an autoregressive AR($p$) model is crucial for balancing model fit and parsimony, as an overly low $p$ may underfit the data by omitting relevant dynamics, while a high $p$ risks capturing noise rather than true structure.[37] Methods for lag selection generally involve graphical tools, statistical criteria, testing procedures, and validation techniques, each providing complementary insights into the underlying serial dependence.[1]

One foundational approach relies on the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series. The ACF measures the linear correlation between observations at different lags, often decaying gradually for AR processes, while the PACF isolates the direct correlation at lag $k$ after removing effects of earlier lags. For an AR($p$) model, the theoretical PACF cuts off to zero after lag $p$, with significant sample PACF values (typically exceeding bounds of $\pm 1.96/\sqrt{T}$, where $T$ is the sample size) at lags 1 through $p$ and insignificance beyond. Practitioners plot the PACF and identify the lag where spikes become negligible, suggesting that order as a candidate $p$.[1]

Information criteria offer a quantitative means to penalize model complexity while rewarding goodness of fit, commonly applied after fitting candidate AR models via least squares. The Akaike Information Criterion (AIC) is defined as $\mathrm{AIC} = -2 \ln \hat{L} + 2(p+1)$, where $\hat{L}$ is the maximized likelihood and $p + 1$ accounts for the intercept and autoregressive coefficients; the order minimizing AIC is selected. Similarly, the Bayesian Information Criterion (BIC), which imposes a stronger penalty on complexity, is $\mathrm{BIC} = -2 \ln \hat{L} + (p+1) \ln T$, with $T$ the sample size, favoring more parsimonious models and often yielding a lower selected order than AIC, especially in finite samples.
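A hedged sketch of criterion-based selection, using conditional least squares on a simulated AR(2) (NumPy only; under the Gaussian conditional likelihood, $T \ln \hat{\sigma}^2 + (p+1)\ln T$ equals the BIC up to a constant shared by all candidate orders fitted on a common sample):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
for t in range(2, n):  # true model: AR(2) with phi = (0.75, -0.5)
    x[t] = 0.75 * x[t - 1] - 0.5 * x[t - 2] + rng.normal()

pmax = 6
T = n - pmax            # common effective sample so criteria are comparable
y = x[pmax:]

def bic_ar(p):
    """Conditional least-squares AR(p) fit; returns BIC up to a shared constant."""
    X = np.column_stack([np.ones(T)] + [x[pmax - i:n - i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    return T * np.log(sigma2) + (p + 1) * np.log(T)

bics = {p: bic_ar(p) for p in range(1, pmax + 1)}
best = min(bics, key=bics.get)
print(best)  # expected to recover the true order, 2
```

Swapping the penalty $(p+1)\ln T$ for $2(p+1)$ gives the corresponding AIC comparison.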
Both criteria are computed sequentially for increasing $p$ until a minimum is reached, though BIC's consistency property makes it preferable when the true order is of interest.[37]

Hypothesis testing provides a formal sequential framework for lag inclusion, starting from a baseline model and adding lags until evidence of significance wanes. Sequential t-tests assess the individual significance of the highest lag coefficient $\varphi_p$ in an AR($p$) model against zero, using standard errors from OLS estimation; if it is insignificant (e.g., at the 5% level), reduce $p$ by one and retest. Alternatively, F-tests evaluate the joint significance of all additional lags when moving from a lower-order to a higher-order AR model, equivalent to testing zero restrictions on the extra coefficients; rejection supports retaining the higher order. These "testing up" or "testing down" procedures guard against arbitrary choices but require assumptions like serially uncorrelated errors.[37][38]

Cross-validation evaluates candidate orders by their out-of-sample predictive performance, partitioning the time series into training and holdout sets while preserving temporal order to avoid lookahead bias. For AR models, one computes the mean absolute prediction error (MAPE) or root mean squared error (RMSE) for forecasts on the holdout using models fitted to training data; the $p$ minimizing this error is chosen.
K-fold variants (e.g., 10-fold) are valid when residuals are uncorrelated, outperforming in-sample metrics, but fail with serial correlation in underfit models; residual diagnostics like the Ljung-Box test are therefore essential post-selection.[39]

A key concern with high lag orders is overfitting, where the model captures idiosyncratic noise in the sample, leading to inflated in-sample fit but poor generalization and forecast inaccuracy; information criteria and testing mitigate this by penalizing excess parameters, as higher $p$ increases variance without proportional bias reduction.[40] As a practical starting point, especially for annual economic data with moderate sample sizes, one may initially consider lag orders up to 8-10 before applying formal selection, ensuring computational feasibility and alignment with typical business cycle lengths.[41]
Yule-Walker Equations
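Yule-Walker estimation (the system is derived below) can be sketched in a few lines of NumPy: build sample autocovariances, solve the Toeplitz system $\hat\Gamma_p \hat{\boldsymbol\varphi} = \hat{\boldsymbol\gamma}_p$, and recover $\hat\sigma^2$. Illustrative, with simulated AR(2) data:

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates: solve Gamma_p * phi = gamma_p built from sample
    autocovariances, then sigma2 = gamma_0 - phi' gamma_p."""
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([xc[k:] @ xc[:n - k] / n for k in range(p + 1)])
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(Gamma, gamma[1:])
    sigma2 = gamma[0] - phi @ gamma[1:]
    return phi, sigma2

rng = np.random.default_rng(3)
n = 50_000
x = np.zeros(n)
for t in range(2, n):  # true model: AR(2) with phi = (0.5, -0.3), sigma2 = 1
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

phi_hat, s2_hat = yule_walker(x, 2)
print(phi_hat, s2_hat)  # close to [0.5, -0.3] and 1.0
```

Production code would typically use a Levinson-Durbin solver for the Toeplitz system, but the direct solve above keeps the moment equations visible.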
The Yule-Walker equations offer a moment-based approach to estimate the coefficients of a stationary autoregressive process of order $p$, denoted AR($p$), by relating the model's parameters to its autocovariance function. Introduced in the context of analyzing periodicities in time series, these equations stem from the foundational work on autoregressive representations.[2][3]

Consider the AR($p$) model $X_t = \varphi_1 X_{t-1} + \cdots + \varphi_p X_{t-p} + \varepsilon_t$, where $\varepsilon_t$ is white noise with mean zero and variance $\sigma^2$. To derive the equations, multiply both sides by $X_{t-k}$ for $k \geq 1$ and take expectations, yielding $\gamma_k = \sum_{i=1}^{p} \varphi_i \gamma_{k-i}$, where $\gamma_k$ is the autocovariance function, which satisfies $\gamma_{-k} = \gamma_k$ under stationarity. For $k = 0$, the equation becomes $\gamma_0 = \sum_{i=1}^{p} \varphi_i \gamma_i + \sigma^2$. These relations form a system of linear equations that links the AR coefficients directly to the autocovariances.

In matrix notation, the system for $k = 1, \dots, p$ is expressed as $\Gamma_p \boldsymbol{\varphi} = \boldsymbol{\gamma}_p$, where $\boldsymbol{\varphi} = (\varphi_1, \dots, \varphi_p)'$, $\boldsymbol{\gamma}_p = (\gamma_1, \dots, \gamma_p)'$, and $\Gamma_p$ is the symmetric Toeplitz matrix with $(i,j)$ entry $\gamma_{|i-j|}$. The positive definiteness of $\Gamma_p$ under the stationarity condition ensures a unique solution $\boldsymbol{\varphi} = \Gamma_p^{-1} \boldsymbol{\gamma}_p$.

For estimation from a sample $X_1, \dots, X_n$, replace the population autocovariances with sample estimates $\hat{\gamma}_k = \frac{1}{n} \sum_{t=k+1}^{n} (X_t - \bar{X})(X_{t-k} - \bar{X})$, where $\bar{X}$ is the sample mean. Substituting into the matrix form gives the Yule-Walker estimator $\hat{\boldsymbol{\varphi}} = \hat{\Gamma}_p^{-1} \hat{\boldsymbol{\gamma}}_p$, from which the noise variance is estimated as $\hat{\sigma}^2 = \hat{\gamma}_0 - \hat{\boldsymbol{\varphi}}' \hat{\boldsymbol{\gamma}}_p$. This method assumes the lag order $p$ is known.

Under the stationarity assumption, the Yule-Walker estimator is consistent, with $\hat{\boldsymbol{\varphi}} \to \boldsymbol{\varphi}$ in probability as $n \to \infty$, and asymptotically normal, satisfying $\sqrt{n}(\hat{\boldsymbol{\varphi}} - \boldsymbol{\varphi}) \to N(0, \sigma^2 \Gamma_p^{-1})$. However, the estimator is biased in finite samples, particularly for small $n$, due to the nonlinearity in the sample autocovariances and the inversion of the estimated Toeplitz matrix.
Maximum Likelihood Estimation
Maximum likelihood estimation (MLE) seeks to estimate the parameters of an autoregressive (AR) model by maximizing the likelihood of observing the given time series data under the model assumptions. For AR models, this approach typically assumes that the innovations $\varepsilon_t$ are independent and identically distributed as Gaussian white noise with mean zero and variance $\sigma^2$. The parameters to estimate include the autoregressive coefficients $\varphi_1, \dots, \varphi_p$ and the innovation variance $\sigma^2$.

Under the Gaussian assumption, the conditional likelihood function for an AR($p$) process observed as $X_1, \dots, X_T$ is $L = \prod_{t=p+1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(X_t - c - \sum_{i=1}^{p} \varphi_i X_{t-i})^2}{2\sigma^2} \right)$. This expression conditions on the initial $p$ observations and treats the process as a sequence of conditional normals starting from $t = p+1$. The conditional MLE, obtained by maximizing this likelihood (or equivalently, its logarithm), conditions on the first $p$ observations as fixed values, ignoring their contribution to the joint density. This conditional approach is computationally straightforward and equivalent to ordinary least squares regression of $X_t$ on the lagged values $X_{t-1}, \dots, X_{t-p}$ for $t = p+1, \dots, T$, yielding estimates $\hat{\varphi}_1, \dots, \hat{\varphi}_p$ and $\hat{\sigma}^2$. The conditional MLE has a closed-form solution via the normal equations for any $p$.

In contrast, the unconditional MLE incorporates the full joint likelihood by accounting for the initial conditions through the stationary distribution of the process or via prediction errors (innovations). For a stationary Gaussian AR($p$), the observations follow a multivariate normal distribution with mean zero and covariance matrix determined by the parameters, leading to an exact likelihood that includes the density of the first $p$ values. This can be computed efficiently using the prediction error decomposition, where the log-likelihood is expressed as a sum of one-step-ahead forecast errors and their conditional variances, often implemented via the Kalman filter for higher-order models.
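The equivalence between the conditional MLE and least squares can be verified numerically. The sketch below (NumPy only, simulated AR(1) with illustrative parameters) fits by OLS and confirms that the conditional Gaussian log-likelihood is maximized there:

```python
import numpy as np

def cond_loglik(x, c, phi, sigma2):
    """Conditional Gaussian log-likelihood of an AR(p), treating the first p
    observations as fixed."""
    p = len(phi)
    resid = x[p:] - c - sum(phi[i] * x[p - 1 - i:len(x) - 1 - i] for i in range(p))
    T = len(resid)
    return -0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2

rng = np.random.default_rng(7)
n = 5000
x = np.zeros(n)
for t in range(1, n):  # true model: c = 1.0, phi = 0.6, sigma2 = 1
    x[t] = 1.0 + 0.6 * x[t - 1] + rng.normal()

# Conditional MLE of an AR(1) = OLS of x_t on (1, x_{t-1})
X = np.column_stack([np.ones(n - 1), x[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
s2_hat = np.mean((x[1:] - X @ np.array([c_hat, phi_hat])) ** 2)

ll_at_ols = cond_loglik(x, c_hat, [phi_hat], s2_hat)
ll_perturbed = cond_loglik(x, c_hat, [phi_hat + 0.05], s2_hat)
print(ll_at_ols > ll_perturbed)  # True: OLS maximizes the conditional likelihood
```

Because the conditional log-likelihood depends on the coefficients only through the residual sum of squares, minimizing that sum (OLS) and maximizing the likelihood coincide; any perturbation of the coefficients lowers the likelihood.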
The unconditional MLE is asymptotically more efficient than the conditional version, especially for short samples where initial observations matter, and typically requires numerical optimization methods such as Newton-Raphson iterations, which update estimates using the score vector and observed information matrix derived from the log-likelihood. Compared to the Yule-Walker method, MLE offers greater statistical efficiency when the Gaussian assumption holds, as it fully utilizes the distributional information rather than relying solely on sample autocorrelations. The asymptotic covariance matrix of the MLE can be estimated from the inverse of the observed Hessian of the log-likelihood (negative second derivatives), providing standard errors for inference. Hypothesis tests, such as Wald tests for individual coefficients or likelihood ratio tests for comparing models of different orders, rely on these asymptotic normality properties under standard regularity conditions.
Spectral Characteristics
Power Spectral Density
The power spectral density (PSD) of a stationary autoregressive (AR) process provides a frequency-domain representation of its second-order properties, quantifying how the variance is distributed across different frequencies. For an AR($p$) process defined by $X_t = \sum_{k=1}^{p} \varphi_k X_{t-k} + \varepsilon_t$, where $\varepsilon_t$ is white noise with variance $\sigma^2$ and the characteristic polynomial $\varphi(z) = 1 - \sum_{k=1}^{p} \varphi_k z^k$ has roots outside the unit circle ensuring stationarity, the PSD is given by $S(\omega) = \frac{\sigma^2}{2\pi} \left| 1 - \sum_{k=1}^{p} \varphi_k e^{-ik\omega} \right|^{-2}$ for frequencies $\omega \in [-\pi, \pi]$. This formula arises from the infinite moving average (MA) representation of the AR process, where the transfer function in the frequency domain inverts the AR polynomial.

The PSD relates directly to the autocovariance function $\gamma_k$ of the process via the inverse Fourier transform: $\gamma_k = \int_{-\pi}^{\pi} S(\omega) e^{i\omega k} \, d\omega$, with $\gamma_0 = \int_{-\pi}^{\pi} S(\omega) \, d\omega$ representing the total variance. Conversely, the PSD is the Fourier transform of the autocovariance sequence, bridging time-domain dependence to cyclic components in the frequency domain.

In interpretation, the PSD highlights dominant periodicities in the process: peaks at specific $\omega$ indicate frequencies contributing most to the variance, such as cycles near zero frequency for processes with strong short-term dependence. For persistent AR models (e.g., roots of $\varphi(z)$ near the unit circle), the PSD concentrates power at low frequencies, reflecting long-memory-like behavior in the time domain.

The sample periodogram, an estimator of the PSD formed from the discrete Fourier transform of the observed series, is asymptotically unbiased for $S(\omega)$ at fixed $\omega$ away from $0$ and $\pm\pi$, although it is not consistent without smoothing across neighboring frequencies. This property underpins nonparametric spectral estimation, though AR models offer parametric alternatives for smoother density approximations.

The PSD of an AR process can also be viewed through its causal MA($\infty$) representation $X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$ with transfer function $\psi(z) = 1/\varphi(z)$, yielding $S(\omega) = \frac{\sigma^2}{2\pi} |\psi(e^{-i\omega})|^2$. This "whitening" perspective underscores how inverse AR filtering removes serial correlation, flattening the spectrum toward that of white noise.
Low-Order AR Spectra
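The AR spectral density is straightforward to evaluate numerically. The sketch below (NumPy only, illustrative parameters) computes an AR(1) spectrum on a frequency grid and checks that integrating over $[-\pi, \pi]$ recovers the variance $\sigma^2/(1-\varphi^2)$:

```python
import numpy as np

def ar_psd(phi, sigma2, omegas):
    """AR(p) power spectral density:
    S(omega) = sigma2 / (2*pi * |1 - sum_k phi_k exp(-i*k*omega)|**2)."""
    phi = np.asarray(phi)
    k = np.arange(1, len(phi) + 1)
    transfer = 1.0 - np.exp(-1j * np.outer(omegas, k)) @ phi
    return sigma2 / (2 * np.pi * np.abs(transfer) ** 2)

omegas = np.linspace(-np.pi, np.pi, 4001)
S = ar_psd([0.9], 1.0, omegas)   # persistent AR(1): power concentrates near omega = 0

var0 = np.sum(S) * (omegas[1] - omegas[0])  # Riemann-sum approximation of gamma_0
print(var0)  # close to 1/(1 - 0.81), about 5.26
```

The peak at $\omega = 0$ reflects the low-frequency dominance of a persistent AR(1); a negative coefficient would instead place the peak at $\omega = \pm\pi$.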
The AR(0) model, equivalent to white noise, exhibits a flat power spectral density across all frequencies, given by $S(\omega) = \frac{\sigma^2}{2\pi}$ for $\omega \in [-\pi, \pi]$, where $\sigma^2$ is the variance of the innovation process. This uniform spectrum reflects the absence of temporal dependence, with equal power distributed at every frequency, characteristic of uncorrelated noise.[42]

For the AR(1) model $X_t = \varphi X_{t-1} + \varepsilon_t$ with $|\varphi| < 1$, the power spectral density derived from the general form is $S(\omega) = \frac{\sigma^2}{2\pi} \frac{1}{1 - 2\varphi \cos\omega + \varphi^2}$. When $\varphi$ is small (near 0), the spectrum is relatively flat, spreading power evenly similar to white noise but with slight modulation. As $\varphi$ approaches 1, power concentrates sharply at low frequencies ($\omega \approx 0$), indicating strong persistence and low-frequency dominance in the process.[42]

The AR(2) model $X_t = \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \varepsilon_t$, stationary for roots of $1 - \varphi_1 z - \varphi_2 z^2$ outside the unit circle, has power spectral density $S(\omega) = \frac{\sigma^2}{2\pi} \left| 1 - \varphi_1 e^{-i\omega} - \varphi_2 e^{-2i\omega} \right|^{-2}$. Complex conjugate roots produce spectral peaks at frequencies corresponding to the argument of the roots, reflecting oscillatory behavior with a dominant cycle length. For real roots, the spectrum may show broader concentration without distinct peaks.[42]

Frequency-domain plots of these low-order AR spectra illustrate parameter effects: AR(1) spectra transition from near-flat (low $\varphi$) to sharply peaked at zero frequency (high $\varphi$); AR(2) plots reveal single or bimodal peaks for varying $(\varphi_1, \varphi_2)$, such as concentration at an intermediate frequency when the characteristic roots are complex. These visualizations highlight how AR parameters shape power distribution, aiding model diagnostics.[42] In model identification, AR spectra typically feature sharp peaks from AR poles, contrasting with smoother MA spectra that exhibit dips from zeros, facilitating distinction between AR and MA processes via observed frequency patterns.
Forecasting and Applications
n-Step-Ahead Predictions
In autoregressive (AR) models, the one-step-ahead forecast for the next observation is obtained by substituting the known past values into the model equation, yielding $\hat{X}_{T+1|T} = c + \sum_{i=1}^{p} \varphi_i X_{T+1-i}$, where $c$ is the constant term and $\varphi_1, \dots, \varphi_p$ are the AR coefficients.[43] This point forecast represents the conditional expectation of $X_{T+1}$ given the observed data up to time $T$.[44]

For multi-step-ahead forecasts ($h > 1$), the recursive method is employed, where $\hat{X}_{T+h|T} = c + \sum_{i=1}^{p} \varphi_i \hat{X}_{T+h-i|T}$ (with $\hat{X}_{T+j|T} = X_{T+j}$ for $j \leq 0$), iteratively using previously computed forecasts in place of unavailable future observations.[43] This approach accumulates uncertainty as the forecast horizon increases, with forecast errors typically growing due to the propagation of prior prediction errors.[44] In the special case of an AR(1) model, a closed-form expression simplifies the computation: $\hat{X}_{T+h|T} = \mu + \varphi^h (X_T - \mu)$, where $\mu = c/(1-\varphi)$ is the process mean.[43]

Prediction intervals account for this uncertainty by incorporating the forecast variance. A $(1-\alpha)$ interval is given by $\hat{X}_{T+h|T} \pm z_{1-\alpha/2}\, \hat{\sigma}_h$, where $z_{1-\alpha/2}$ is the critical value from the standard normal distribution and $\hat{\sigma}_h^2 = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2$ is the estimated forecast variance, with $\sigma^2$ the innovation variance and $\psi_j$ the coefficients from the infinite moving average representation of the AR process.[44] For stationary AR processes (where all roots of the characteristic polynomial lie outside the unit circle), forecasts converge to the unconditional mean $\mu$ as $h \to \infty$, reflecting the mean-reverting behavior of the series.[43]
Practical Implementations
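The recursive point-forecast scheme and the AR(1) closed form described above can be cross-checked in a few lines (NumPy only, illustrative values):

```python
import numpy as np

def forecast_ar(x, c, phi, h):
    """Recursive h-step-ahead point forecasts for an AR(p): later steps
    substitute earlier forecasts for unobserved future values."""
    p = len(phi)
    hist = list(x[-p:])            # most recent p observations
    preds = []
    for _ in range(h):
        nxt = c + sum(phi[i] * hist[-1 - i] for i in range(p))
        preds.append(nxt)
        hist.append(nxt)
    return np.array(preds)

# AR(1) closed form: x_hat_{T+h} = mu + phi**h * (x_T - mu), mu = c/(1 - phi)
c, phi, xT = 1.0, 0.5, 4.0
mu = c / (1 - phi)                 # = 2.0
preds = forecast_ar(np.array([xT]), c, [phi], 3)
closed = mu + phi ** np.arange(1, 4) * (xT - mu)
print(preds, closed)  # both [3.0, 2.5, 2.25]: forecasts revert toward mu = 2
```

The shrinking steps toward $\mu$ illustrate the mean reversion of stationary forecasts as the horizon grows.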
Autoregressive (AR) models are commonly implemented in statistical software for time series analysis, enabling practitioners to estimate parameters, generate forecasts, and visualize model dynamics efficiently. In the R programming language, the ar() function from the base stats package provides tools for fitting AR models of specified order using either the Yule-Walker equations or maximum likelihood estimation (MLE). For forecasting applications, the forecast package extends this functionality by integrating AR models with prediction intervals and automated order selection via functions like auto.arima(), which can fit pure AR processes as a special case. These implementations are widely used in econometric and financial time series workflows due to R's robust ecosystem for statistical computing.
Python offers accessible AR model fitting through the statsmodels library, where the AutoReg class in statsmodels.tsa.ar_model handles estimation for AR(p) processes, supporting both conditional least squares and MLE approaches. Data preparation, such as handling time-indexed series and differencing for stationarity, is typically done using pandas, which provides DataFrame methods like asfreq() and interpolate() for aligning and filling timestamps. This combination makes Python suitable for integrating AR models into machine learning pipelines, such as those in scikit-learn extensions or custom neural network hybrids.
MATLAB's Econometrics Toolbox includes the estimate method with the ar model object for fitting univariate AR models, estimating coefficients via least squares or MLE. For more general linear systems, the armax function in the System Identification Toolbox allows specification of AR components within ARMA or ARMAX frameworks, facilitating transfer function analysis and simulation. These tools are particularly valued in engineering and signal processing contexts for their built-in support for multivariate extensions and graphical diagnostics.
In Julia, AR models can be fitted using packages like TimeModels.jl, which supports ARIMA models (with differencing order 0 and moving average order 0 for pure AR), or by constructing lagged regressors and using the StatsModels.jl package for ordinary least squares estimation. Julia's just-in-time compilation enables high-performance computations, making it ideal for large-scale time series simulations.
Practical examples illustrate these implementations. For AR(1) estimation in Python, the following code snippet fits a model to a simulated series:
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg
# Simulated AR(1) data: y_t = 0.5 * y_{t-1} + epsilon
np.random.seed(42)
n = 100
y = np.zeros(n)
y[0] = np.random.normal()
for t in range(1, n):
    y[t] = 0.5 * y[t-1] + np.random.normal()
data = pd.Series(y)
# Fit AR(1)
model = AutoReg(data, lags=1)
results = model.fit()
print(results.summary()) # Displays coefficients, e.g., phi_1 ≈ 0.5
In R, impulse responses can be obtained with the irf() function from the vars package for a fitted AR(1):
library(vars)
# Simulated AR(1) data as above (adapt to R: set.seed(42); y <- arima.sim(n=100, list(ar=0.5)))
data <- ts(y)
# Fit AR(1) using VAR for compatibility with irf
var_fit <- VAR(data, p = 1)
# IRF (response to unit shock)
irf_obj <- irf(var_fit, impulse = "y", response = "y", n.ahead = 10)
plot(irf_obj) # Plots decaying response: phi^t for t=1 to 10
Before fitting, stationarity is commonly checked with unit root tests such as the augmented Dickey-Fuller test: in R, adf.test() in the tseries package; in Python, adfuller() from statsmodels.tsa.stattools. For handling missing data, common approaches include linear interpolation or forward/backward filling in pandas (data.interpolate(method='linear')) or R's na.approx() from the zoo package, followed by model refitting to avoid bias in coefficient estimates. These steps ensure reliable AR fitting, with diagnostics like residual autocorrelation checks (e.g., Ljung-Box test) verifying model adequacy post-estimation.