Autoregressive moving-average model
In the statistical analysis of time series, an autoregressive–moving-average (ARMA) model is used to represent a (weakly) stationary stochastic process by combining two components: autoregression (AR) and moving average (MA). These models are widely used for analyzing the structure of a series and for forecasting future values.
The AR component specifies that the current value of the series depends linearly on its own past values (lags), while the MA component specifies that the current value depends on a linear combination of past error terms. An ARMA model is typically denoted as ARMA(p, q), where p is the order of the autoregressive part and q is the order of the moving-average part.
The general ARMA model was described in the 1951 thesis of Peter Whittle, Hypothesis testing in time series analysis, and it was popularized in the 1970 book by George E. P. Box and Gwilym Jenkins.
ARMA models can be estimated by using the Box–Jenkins method.
Mathematical formulation
Autoregressive model
The notation AR(p) refers to the autoregressive model of order p. The AR(p) model is written as

  X_t = Σ_{i=1}^{p} φ_i X_{t−i} + ε_t

where φ_1, …, φ_p are parameters and the random variable ε_t is white noise, usually a sequence of independent and identically distributed (i.i.d.) normal random variables.[1][2]
In order for the model to remain stationary, the roots of its characteristic polynomial must lie outside the unit circle. For example, processes in the AR(1) model with |φ_1| ≥ 1 are not stationary because the root of 1 − φ_1 B = 0 lies on or within the unit circle.[3]
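The root condition can be checked numerically. The following Python sketch (illustrative; `ar2_is_stationary` is a hypothetical helper, not a library function) locates the roots of the AR(2) characteristic polynomial 1 − φ_1 z − φ_2 z² with the quadratic formula and tests whether both lie strictly outside the unit circle:

```python
import cmath

def ar2_is_stationary(phi1, phi2):
    """Check stationarity of an AR(2) model by locating the roots of its
    characteristic polynomial 1 - phi1*z - phi2*z**2.  The process is
    stationary when both roots lie strictly outside the unit circle."""
    if phi2 == 0:
        # Degenerate AR(1) case: the single root of 1 - phi1*z is 1/phi1.
        return abs(phi1) < 1
    # Solve -phi2*z**2 - phi1*z + 1 = 0 with the quadratic formula.
    a, b, c = -phi2, -phi1, 1.0
    disc = cmath.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    return all(abs(z) > 1 for z in roots)

print(ar2_is_stationary(0.5, 0.3))   # True: both roots outside the unit circle
print(ar2_is_stationary(1.2, 0.0))   # False: AR(1) with |phi_1| > 1
```

For higher orders the same idea applies with a general polynomial root finder in place of the quadratic formula.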
The augmented Dickey–Fuller test can assess the stability of an intrinsic mode function and trend components. For stationary time series, ARMA models can be used, while for non-stationary series, long short-term memory (LSTM) models can be used to derive abstract features. The final value is obtained by reconstructing the predicted outcomes of each time series.[citation needed]
Moving average model
The notation MA(q) refers to the moving average model of order q:

  X_t = μ + ε_t + Σ_{i=1}^{q} θ_i ε_{t−i}

where θ_1, …, θ_q are the parameters of the model, μ is the expectation of X_t (often assumed to equal 0), and ε_t, ε_{t−1}, …, ε_{t−q} are i.i.d. white noise error terms that are commonly normal random variables.[4]
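Because an MA(q) process depends on only the last q shocks, its theoretical autocorrelations can be computed directly from the coefficients and cut off exactly after lag q. A minimal Python sketch (the helper `ma_acf` is hypothetical, using the standard autocovariance formula γ_k = σ² Σ_i θ_i θ_{i+k} with θ_0 = 1):

```python
def ma_acf(thetas, max_lag=None):
    """Theoretical autocorrelations of an MA(q) process
    X_t = mu + eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}.
    Uses the convention theta_0 = 1; the ACF cuts off after lag q."""
    psi = [1.0] + list(thetas)          # theta_0 = 1
    q = len(thetas)
    max_lag = q if max_lag is None else max_lag
    gamma0 = sum(p * p for p in psi)    # variance divided by sigma^2
    acf = []
    for k in range(1, max_lag + 1):
        if k <= q:
            gk = sum(psi[i] * psi[i + k] for i in range(len(psi) - k))
        else:
            gk = 0.0                    # finite memory: zero beyond lag q
        acf.append(gk / gamma0)
    return acf

print(ma_acf([0.5], max_lag=3))  # [0.4, 0.0, 0.0]: theta/(1+theta^2) at lag 1
```

The lag-1 value θ/(1 + θ²) and the abrupt cutoff are the signature used later when choosing q from a sample ACF.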
ARMA model
The notation ARMA(p, q) refers to the model with p autoregressive terms and q moving-average terms. This model contains the AR(p) and MA(q) models:[5]

  X_t = ε_t + Σ_{i=1}^{p} φ_i X_{t−i} + Σ_{i=1}^{q} θ_i ε_{t−i}
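The defining equation translates directly into a simulation recursion. The following Python sketch (illustrative, not from any library; `simulate_arma11` is a hypothetical name) generates a zero-mean ARMA(1,1) path by applying the equation one step at a time:

```python
import random

def simulate_arma11(phi, theta, n, sigma=1.0, seed=0):
    """Simulate a zero-mean ARMA(1,1) process
    X_t = phi*X_{t-1} + eps_t + theta*eps_{t-1}
    directly from its defining recursion."""
    rng = random.Random(seed)
    x_prev, eps_prev = 0.0, 0.0
    out = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)          # white noise shock
        x = phi * x_prev + eps + theta * eps_prev
        out.append(x)
        x_prev, eps_prev = x, eps
    return out

xs = simulate_arma11(0.7, 0.3, 5000)
print(len(xs))  # 5000
```

Starting the recursion from zero introduces a short transient; in practice a burn-in prefix is usually discarded before analyzing the simulated series.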
In terms of lag operator
In some texts, the models are specified using the lag operator L. In these terms, the AR(p) model is given by

  ε_t = (1 − Σ_{i=1}^{p} φ_i L^i) X_t = φ(L) X_t

where φ(L) represents the polynomial

  φ(L) = 1 − Σ_{i=1}^{p} φ_i L^i.

The MA(q) model is given by

  X_t = (1 + Σ_{i=1}^{q} θ_i L^i) ε_t = θ(L) ε_t

where θ(L) represents the polynomial

  θ(L) = 1 + Σ_{i=1}^{q} θ_i L^i.

Finally, the combined ARMA(p, q) model is given by

  (1 − Σ_{i=1}^{p} φ_i L^i) X_t = (1 + Σ_{i=1}^{q} θ_i L^i) ε_t

or more concisely,

  φ(L) X_t = θ(L) ε_t

or

  X_t = (θ(L)/φ(L)) ε_t.

This is the form used in Box, Jenkins & Reinsel.[6]
Moreover, starting the summations from i = 0 and setting φ_0 = −1 and θ_0 = 1 gives an even more elegant formulation:

  −Σ_{i=0}^{p} φ_i L^i X_t = Σ_{i=0}^{q} θ_i L^i ε_t.
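Expanding the ratio θ(L)/φ(L) as a power series yields the ψ-weights of the MA(∞) representation X_t = Σ_j ψ_j ε_{t−j} of a stationary ARMA process. A small Python sketch of that expansion (the helper `psi_weights` is illustrative, using the standard recursion ψ_j = θ_j + Σ_i φ_i ψ_{j−i}):

```python
def psi_weights(phi, theta, n_terms=10):
    """Expand theta(L)/phi(L) into the psi-weights of the MA(infinity)
    representation X_t = sum_j psi_j * eps_{t-j}.  phi and theta are the
    coefficient lists [phi_1..phi_p] and [theta_1..theta_q]; psi_0 = 1.
    Recursion: psi_j = theta_j + sum_{i=1}^{min(j,p)} phi_i * psi_{j-i}."""
    psi = [1.0]
    for j in range(1, n_terms):
        val = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            val += phi[i - 1] * psi[j - i]
        psi.append(val)
    return psi

# ARMA(1,1): psi_j = (phi + theta) * phi**(j-1) for j >= 1
print(psi_weights([0.5], [0.4], 4))  # psi_0..psi_3: [1.0, 0.9, 0.45, 0.225]
```

The geometric decay of the weights for a stationary AR part is what guarantees the infinite sum converges.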
Spectrum
The spectral density of an ARMA process is

  S(f) = (σ²/2π) · |θ(e^{−if})|² / |φ(e^{−if})|²

where σ² is the variance of the white noise, θ is the characteristic polynomial of the moving-average part of the ARMA model, and φ is the characteristic polynomial of the autoregressive part of the ARMA model.[7][8]
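The spectral density formula can be evaluated numerically by substituting z = e^{−if} into the two polynomials. A Python sketch (the function name `arma_spectral_density` is hypothetical; it assumes the conventions φ(z) = 1 − Σ φ_k z^k and θ(z) = 1 + Σ θ_k z^k used above):

```python
import cmath
import math

def arma_spectral_density(phi, theta, sigma2, f):
    """Spectral density of an ARMA process at angular frequency f
    (in radians): S(f) = sigma2/(2*pi) * |theta(e^{-if})|^2 / |phi(e^{-if})|^2,
    with phi(z) = 1 - sum phi_k z^k and theta(z) = 1 + sum theta_k z^k."""
    z = cmath.exp(-1j * f)
    phi_val = 1 - sum(p * z ** (k + 1) for k, p in enumerate(phi))
    theta_val = 1 + sum(t * z ** (k + 1) for k, t in enumerate(theta))
    return sigma2 / (2 * math.pi) * abs(theta_val) ** 2 / abs(phi_val) ** 2

# AR(1) with phi = 0.9: power concentrates at low frequencies,
# so the density at f = 0 exceeds the density at f = pi.
print(arma_spectral_density([0.9], [], 1.0, 0.0) >
      arma_spectral_density([0.9], [], 1.0, math.pi))  # True
```

For white noise (p = q = 0) the same function returns the constant σ²/2π at every frequency, the flat spectrum.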
Fitting models
[edit]Choosing p and q
An appropriate value of p in the ARMA(p, q) model can be found by plotting the partial autocorrelation functions. Similarly, q can be estimated by using the autocorrelation functions. Both p and q can be determined simultaneously using extended autocorrelation functions (EACF).[9] Further information can be gleaned by considering the same functions for the residuals of a model fitted with an initial selection of p and q.
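Both diagnostic functions can be computed from the data alone. The sketch below (illustrative helpers, not library code) computes sample autocorrelations and then partial autocorrelations via the Durbin–Levinson recursion; a PACF that cuts off after lag p suggests an AR(p) model, mirroring the ACF cutoff used to pick q:

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]

def pacf_from_acf(r):
    """Partial autocorrelations via the Durbin-Levinson recursion,
    given autocorrelations r = [r_1, r_2, ...]."""
    pacf, phi = [], []
    for k in range(1, len(r) + 1):
        if k == 1:
            a = r[0]
            phi = [a]
        else:
            num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(phi[j] * r[j] for j in range(k - 1))
            a = num / den
            phi = [phi[j] - a * phi[k - 2 - j] for j in range(k - 1)] + [a]
        pacf.append(a)
    return pacf

# For an AR(1)-shaped ACF (r_k = 0.5**k), the PACF cuts off after lag 1.
print(pacf_from_acf([0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]
```

In practice the sample PACF is compared against approximate confidence bands (roughly ±2/√n) to judge where it becomes insignificant.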
Brockwell & Davis recommend using the Akaike information criterion (AIC) for finding p and q.[10] Another option is the Bayesian information criterion (BIC).
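Both criteria trade goodness of fit against model size: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where L is the maximized likelihood, k the number of estimated parameters, and n the sample size; the candidate (p, q) with the smallest value is chosen. A minimal sketch with hypothetical log-likelihood numbers (the two fits and their values are invented purely for illustration):

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik, smaller is better."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*log(n) - 2*loglik; the penalty
    grows with the sample size, favoring more parsimonious models."""
    return k * math.log(n) - 2 * loglik

# Hypothetical fits on n = 200 observations: an ARMA(1,1) with 3
# parameters versus an ARMA(2,2) with 5 parameters and a slightly
# better likelihood.  Both criteria prefer the smaller model here.
print(aic(-250.0, 3), aic(-248.5, 5))
print(bic(-250.0, 3, 200), bic(-248.5, 5, 200))
```

BIC's ln n penalty makes it harsher on extra parameters than AIC whenever n > e² ≈ 7.4, which is why it tends to select smaller models.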
Estimating coefficients
After choosing p and q, ARMA models can be fitted by least squares regression to find the values of the parameters which minimize the error term. It is good practice to find the smallest values of p and q which provide an acceptable fit to the data. For a pure AR model, the Yule–Walker equations may be used to provide a fit.
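For a pure AR model, the Yule–Walker equations relate the coefficients linearly to the autocorrelations, so low orders can be solved in closed form. A Python sketch for AR(2) (the helper `yule_walker_ar2` is illustrative; it solves the 2×2 system r_1 = φ_1 + φ_2 r_1, r_2 = φ_1 r_1 + φ_2):

```python
def yule_walker_ar2(r1, r2):
    """Solve the Yule-Walker equations for an AR(2) model given the
    autocorrelations r1, r2:
        r1 = phi1 + phi2*r1
        r2 = phi1*r1 + phi2
    (a 2x2 linear system with a closed-form solution)."""
    denom = 1 - r1 * r1
    phi1 = r1 * (1 - r2) / denom
    phi2 = (r2 - r1 * r1) / denom
    return phi1, phi2

# Autocorrelations of an AR(1) with phi = 0.6 (r_k = 0.6**k) should
# recover phi1 close to 0.6 and phi2 close to 0.
print(yule_walker_ar2(0.6, 0.36))
```

For general order p the same system becomes a p×p Toeplitz system, usually solved with the Durbin–Levinson recursion.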
ARMA models are used primarily for forecasting (prediction), not for inferring causation as in other areas of econometrics and regression methods such as OLS and 2SLS.
Software implementations
- In R, the standard package stats has the function arima, documented in ARIMA Modelling of Time Series. Package astsa has an improved script called sarima for fitting ARMA models (seasonal and nonseasonal) and sarima.sim to simulate data from these models. Extension packages contain related and extended functionality: package tseries includes the function arma(), documented in "Fit ARMA Models to Time Series"; package fracdiff contains fracdiff() for fractionally integrated ARMA processes; and package forecast includes auto.arima for selecting a parsimonious set of p, q. The CRAN task view on Time Series contains links to most of these.
- Mathematica has a complete library of time series functions including ARMA.[11]
- MATLAB includes functions such as arma, ar and arx to estimate autoregressive, exogenous autoregressive and ARMAX models. See System Identification Toolbox and Econometrics Toolbox for details.
- Julia has community-driven packages that implement fitting with an ARMA model, such as arma.jl.
- Python has the statsmodels package, which includes many models and functions for time series analysis, including ARMA. Formerly part of the scikit-learn library, it is now stand-alone and integrates well with Pandas.
- PyFlux has a Python-based implementation of ARIMAX models, including Bayesian ARIMAX models.
- IMSL Numerical Libraries are libraries of numerical analysis functionality including ARMA and ARIMA procedures implemented in standard programming languages like C, Java, C# .NET, and Fortran.
- gretl can estimate ARMA models.
- GNU Octave extra package octave-forge supports AR models.
- Stata includes the function arima for ARMA and ARIMA models.
- SuanShu is a Java library of numerical methods that implements univariate/multivariate ARMA, ARIMA, ARMAX, etc. models, documented in "SuanShu, a Java numerical and statistical library".
- SAS has an econometric package, ETS, that estimates ARIMA models.
History and interpretations
The general ARMA model was described in the 1951 thesis of Peter Whittle, who used mathematical analysis (Laurent series and Fourier analysis) and statistical inference.[12][13] ARMA models were popularized by a 1970 book by George E. P. Box and Jenkins, who expounded an iterative (Box–Jenkins) method for choosing and estimating them. This method was useful for low-order polynomials (of degree three or less).[14]
ARMA is essentially an infinite impulse response filter applied to white noise, with some additional interpretation placed on it.
In digital signal processing, ARMA is represented as a digital filter with white noise at the input and the ARMA process at the output.
Applications
ARMA is appropriate when a system is a function of a series of unobserved shocks (the MA or moving average part) as well as its own behavior. For example, stock prices may be shocked by fundamental information as well as exhibiting technical trending and mean-reversion effects due to market participants.[citation needed]
Generalizations
There are various generalizations of ARMA. Nonlinear AR (NAR), nonlinear MA (NMA) and nonlinear ARMA (NARMA) model nonlinear dependence on past values and error terms. Vector AR (VAR) and vector ARMA (VARMA) model multivariate time series. Autoregressive integrated moving average (ARIMA) models non-stationary time series (that is, series whose mean changes over time). Autoregressive conditional heteroskedasticity (ARCH) models time series where the variance changes. Seasonal ARIMA (SARIMA or periodic ARMA) models periodic variation. Autoregressive fractionally integrated moving average (ARFIMA, or fractional ARIMA, FARIMA) models time series that exhibit long memory. Multiscale AR (MAR) is indexed by the nodes of a tree instead of integers.
Autoregressive–moving-average model with exogenous inputs (ARMAX)
The notation ARMAX(p, q, b) refers to a model with p autoregressive terms, q moving-average terms and b exogenous input terms. The last term is a linear combination of the last b terms of a known and external time series d_t. It is given by:

  X_t = ε_t + Σ_{i=1}^{p} φ_i X_{t−i} + Σ_{i=1}^{q} θ_i ε_{t−i} + Σ_{i=1}^{b} η_i d_{t−i}

where η_1, …, η_b are the parameters of the exogenous input d_t.
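The ARMAX recursion extends the plain ARMA one with the exogenous term. The following Python sketch (an illustrative helper, `simulate_armax`, with made-up example data; not library code) generates a path given a known exogenous series d:

```python
import random

def simulate_armax(phi, theta, eta, d, sigma=1.0, seed=0):
    """Simulate an ARMAX(p, q, b) process
    X_t = eps_t + sum_i phi_i X_{t-i} + sum_i theta_i eps_{t-i}
                + sum_i eta_i d_{t-i}
    given a known exogenous series d (hypothetical example data)."""
    rng = random.Random(seed)
    p, q, b = len(phi), len(theta), len(eta)
    x, eps = [], []
    for t in range(len(d)):
        e = rng.gauss(0.0, sigma)
        val = e
        # Each sum only reaches back as far as the simulated history allows.
        val += sum(phi[i] * x[t - 1 - i] for i in range(min(p, t)))
        val += sum(theta[i] * eps[t - 1 - i] for i in range(min(q, t)))
        val += sum(eta[i] * d[t - 1 - i] for i in range(min(b, t)))
        x.append(val)
        eps.append(e)
    return x

# ARMAX(1, 1, 1) driven by a constant exogenous input of 1.0.
xs = simulate_armax([0.5], [0.2], [1.0], [1.0] * 100)
print(len(xs))  # 100
```

With a constant exogenous input the process settles around the level η_1/(1 − φ_1), which is one way to sanity-check such a simulation.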
Some nonlinear variants of models with exogenous variables have been defined: see for example Nonlinear autoregressive exogenous model.
Statistical packages implement the ARMAX model through the use of "exogenous" (that is, independent) variables. Care must be taken when interpreting the output of those packages, because the estimated parameters usually (for example, in R[15] and gretl) refer to the regression:

  X_t − m_t = ε_t + Σ_{i=1}^{p} φ_i (X_{t−i} − m_{t−i}) + Σ_{i=1}^{q} θ_i ε_{t−i}

where m_t incorporates all exogenous (or independent) variables:

  m_t = c + Σ_{i=0}^{b} η_i d_{t−i}.
See also
- Autoregressive integrated moving average (ARIMA)
- Exponential smoothing
- Linear predictive coding
- Predictive analytics
- Infinite impulse response
- Finite impulse response
References
- ^ Box, George E. P. (1994). Time series analysis : forecasting and control. Gwilym M. Jenkins, Gregory C. Reinsel (3rd ed.). Englewood Cliffs, N.J.: Prentice Hall. p. 54. ISBN 0-13-060774-6. OCLC 28888762.
- ^ Shumway, Robert H. (2000). Time series analysis and its applications. David S. Stoffer. New York: Springer. pp. 90–91. ISBN 0-387-98950-1. OCLC 42392178.
- ^ Box, George E. P.; Jenkins, Gwilym M.; Reinsel, Gregory C. (1994). Time series analysis : forecasting and control (3rd ed.). Englewood Cliffs, N.J.: Prentice Hall. pp. 54–55. ISBN 0-13-060774-6. OCLC 28888762.
- ^ Box, George E. P.; Jenkins, Gwilym M.; Reinsel, Gregory C.; Ljung, Greta M. (2016). Time series analysis : forecasting and control (5th ed.). Hoboken, New Jersey: John Wiley & Sons, Incorporated. p. 53. ISBN 978-1-118-67492-5. OCLC 908107438.
- ^ Shumway, Robert H. (2000). Time series analysis and its applications. David S. Stoffer. New York: Springer. p. 98. ISBN 0-387-98950-1. OCLC 42392178.
- ^ Box, George; Jenkins, Gwilym M.; Reinsel, Gregory C. (1994). Time Series Analysis: Forecasting and Control (Third ed.). Prentice-Hall. ISBN 0130607746.
- ^ Rosenblatt, Murray (2000). Gaussian and non-Gaussian linear time series and random fields. New York: Springer. p. 10. ISBN 0-387-98917-X. OCLC 42061096.
- ^ Wei, William W. S. (1990). Time series analysis : univariate and multivariate methods. Redwood City, Calif.: Addison-Wesley Pub. pp. 242–243. ISBN 0-201-15911-2. OCLC 18166355.
- ^ Missouri State University. "Model Specification, Time Series Analysis" (PDF).
- ^ Brockwell, P. J.; Davis, R. A. (2009). Time Series: Theory and Methods (2nd ed.). New York: Springer. p. 273. ISBN 9781441903198.
- ^ Time series features in Mathematica Archived November 24, 2011, at the Wayback Machine
- ^ Hannan, Edward James (1970). Multiple time series. Wiley series in probability and mathematical statistics. New York: John Wiley and Sons.
- ^ Whittle, P. (1951). Hypothesis Testing in Time Series Analysis. Almquist and Wicksell.
Whittle, P. (1963). Prediction and Regulation. English Universities Press. ISBN 0-8166-1147-5.
Republished as: Whittle, P. (1983). Prediction and Regulation by Linear Least-Square Methods. University of Minnesota Press. ISBN 0-8166-1148-3.
- ^ Hannan & Deistler (1988, p. 227): Hannan, E. J.; Deistler, Manfred (1988). Statistical theory of linear systems. Wiley series in probability and mathematical statistics. New York: John Wiley and Sons.
- ^ ARIMA Modelling of Time Series, R documentation
Further reading
- Mills, Terence C. (1990). Time Series Techniques for Economists. Cambridge University Press. ISBN 0521343399.
- Percival, Donald B.; Walden, Andrew T. (1993). Spectral Analysis for Physical Applications. Cambridge University Press. ISBN 052135532X.
- Francq, C.; Zakoïan, J.-M. (2005), "Recent results for linear time series models with non independent innovations", in Duchesne, P.; Remillard, B. (eds.), Statistical Modeling and Analysis for Complex Data Problems, Springer, pp. 241–265, CiteSeerX 10.1.1.721.1754.
- Shumway, R.H. and Stoffer, D.S. (2017). Time Series Analysis and Its Applications with R Examples. Springer. DOI: 10.1007/978-3-319-52452-8
Autoregressive moving-average model
View on GrokipediaModel Components
Autoregressive Process
The autoregressive (AR) process of order , denoted AR(), is a fundamental model in time series analysis that describes how the current value of a series depends linearly on its own previous values, plus a random shock. This structure captures the endogenous dependence within the series, making it suitable for modeling phenomena with inertia or momentum, such as economic indicators or financial returns.[7][2] The mathematical formulation of an AR() process is given by where represents a constant term (often related to the mean of the process), the coefficients (for ) quantify the influence of each lagged value, and is a white noise error term with zero mean, constant variance , and no serial correlation (i.e., for ). The parameters must satisfy specific conditions to ensure the process behaves in a stable manner over time.[7][8][9] For the AR() process to be stationary—meaning its statistical properties like mean and variance remain constant over time—the roots of the characteristic polynomial must lie outside the unit circle in the complex plane, i.e., all roots satisfy . This condition ensures that the effects of past shocks do not accumulate indefinitely, preventing explosive behavior or non-constant variance. If any root has modulus less than or equal to 1, the process becomes non-stationary, often exhibiting trends or unit root behavior.[10][11][12] A simple yet illustrative example is the AR(1) process, , where stationarity requires . Here, directly measures the degree of persistence: if is close to 1 (e.g., 0.9), shocks to the series decay slowly, leading to prolonged deviations from the mean and high autocorrelation; conversely, if is near 0, the series behaves more like white noise with minimal memory. 
This persistence is crucial for understanding phenomena like business cycles, where –0.95 is common in empirical macroeconomic data.[8][13][10] Under the stationarity condition, a causal AR() process admits an infinite moving average (MA) representation, expressing as an infinite linear combination of current and past white noise terms. This Wold decomposition is derived by inverting the AR operator: starting from the AR equation, recursive substitution yields , where is the process mean, and the coefficients are determined by the expansion of (with the lag operator), satisfying for convergence. For the AR(1) case, the derivation is straightforward: , iterating backward gives , with and , illustrating how past shocks propagate with geometrically decaying weights. This representation underscores the AR process's equivalence to an infinite-order MA under stationarity, facilitating forecasting and spectral analysis.[9][12][13]Moving Average Process
The moving average process of order , denoted MA(), models a time series where each observation is a constant plus the current white noise error term and a finite linear combination of the previous error terms. This structure captures how past forecast errors influence current values, reflecting temporary shocks with limited persistence. The mathematical formulation is where is the process mean (often zero for centered series), the (for ) are fixed moving average coefficients that weight the impact of past errors, and is white noise—a sequence of i.i.d. random variables with and , typically assumed Gaussian for exact inference. The parameters and are estimated from data, and the model assumes no further dependence beyond the specified lags.[14] By construction, the MA() process is always (weakly) stationary, as its mean is constant and autocovariances depend only on the lag, owing to the finite summation of stationary white noise components. However, invertibility—a condition ensuring the process can be expressed as an infinite autoregressive form for practical forecasting—requires that all roots of the MA polynomial lie outside the unit circle () in the complex plane. Non-invertible models, while mathematically valid, complicate estimation and interpretation, so invertible parameterizations are preferred.[15] A basic illustration is the MA(1) model, , which models dependence limited to the immediate past error. This form is particularly effective for short-term dynamics, such as fleeting market shocks in economic data, where only the most recent error meaningfully affects the current observation. When , the model can approximate over-differencing effects, where excessive differencing of a stationary series induces artificial negative lag-1 autocorrelation that the MA term corrects.[16] The autocorrelation function (ACF) of an MA() process is derived from its autocovariance structure and truncates exactly after lag , a hallmark for model identification. 
The lag- autocovariance is for (with and for ), while and for . The autocorrelations follow as , yielding nonzero values only up to lag , which reflects the process's finite memory. This cutoff pattern contrasts with processes having longer-range dependence. The MA process complements autoregressive models by focusing on error-driven correlations rather than value persistence.ARMA Formulation
General ARMA Equation
The autoregressive moving-average (ARMA) model of order (p, q), denoted ARMA(p, q), provides a unified framework for modeling stationary time series by integrating autoregressive and moving average components, allowing representation of processes with both lagged dependencies in observations and errors.[17] This approach builds briefly on the pure autoregressive and moving average processes as foundational building blocks. The general equation for an ARMA(p, q) process with mean is where is the time series value at time , is a sequence of white noise errors with mean zero and constant variance , the coefficients are the autoregressive parameters, and are the moving average parameters.[17][18] For centered processes where , the equation simplifies to Here, represents the degree of the autoregressive component, indicating the number of prior observations influencing the current value, while represents the degree of the moving average component, indicating the number of prior errors affecting the current value.[17][18] Validity of the model requires stationarity of the AR part, ensured by roots of the associated characteristic polynomial lying outside the unit circle, and invertibility of the MA part, similarly requiring roots outside the unit circle to allow expression as an infinite AR process.[17][18] A compact notation using the backshift (lag) operator , defined such that , expresses the model as , where and , though full analysis of this form follows separately.[17][18] Special cases of the ARMA(p, q) model include the pure autoregressive AR(p) when , reducing to (assuming ), which emphasizes dependence solely on past values.[17][18] The pure moving average MA(q) arises when , given by , focusing on finite dependence on past shocks.[17][18] A representative example is the ARMA(1,1) model, , which models series exhibiting persistence from the prior observation tempered by a single lagged error, commonly applied to capture exponential decay in 
autocorrelations.[17][18]Lag Operator Representation
The lag operator, often denoted and also known as the backshift operator, provides a compact notation for expressing time shifts in time series models, defined such that . Higher powers extend this action linearly, with for any positive integer . This operator facilitates the representation of linear combinations through polynomials, such as the autoregressive polynomial and the moving average polynomial , where the coefficients and characterize the dependencies in the series.[19] In this framework, the general autoregressive moving average (ARMA) model of orders and is succinctly written as , where is white noise with mean zero and variance . This operator form highlights the multiplicative structure of the polynomials, allowing for elegant manipulations in theoretical derivations. For a stationary ARMA process—requiring the roots of to lie outside the unit circle—the model admits an infinite moving average (MA) representation: , where the weights are generated by the power series expansion of , with . An invertible ARMA process similarly yields an infinite autoregressive (AR) form.[20][19] The lag operator notation offers significant advantages in time series analysis, including streamlined simulation of processes by recursively applying the operator to generate future values, efficient forecasting through recursive computation of expectations (e.g., for the MA() representation), and derivation of key moments such as the autocovariance function via polynomial expansions. It also simplifies proofs of properties like ergodicity under stationarity assumptions.[19][21] For illustration, consider a first-order autoregressive AR(1) model, , which in lag notation becomes with for stationarity. Similarly, a first-order moving average MA(1) model, , is expressed as , with invertibility requiring . These forms reveal the AR(1) as an MA() process, .[19]Model Properties
Stationarity and Invertibility
In time series analysis, strict stationarity refers to a stochastic process where the joint distribution of any collection of observations is invariant to time shifts, implying constant mean, variance, and autocovariances that depend solely on the lag between observations.[22] For ARMA models, the focus is typically on weak (or second-order) stationarity, which requires a time-invariant mean and autocovariance function, ensuring the process has finite second moments and is suitable for modeling with constant parameters. For an ARMA(p, q) process defined by the equation , where and are the autoregressive (AR) and moving average (MA) polynomials, respectively, stationarity holds if all roots of the characteristic equation lie strictly outside the unit circle in the complex plane (i.e., have absolute values greater than 1).[22] These roots determine the behavior of the process: when they are outside the unit circle, the AR component exhibits mean reversion, as past shocks decay over time, leading to a stable process with bounded variance. Conversely, if any root lies inside the unit circle (absolute value less than 1), the process becomes explosive, with variance growing without bound and forecasts diverging rapidly. Roots on the unit circle (absolute value equal to 1), such as a unit root in an AR(1) model where , result in non-stationarity, manifesting as persistent trends or random walk-like behavior without mean reversion.[22] Invertibility, a complementary property, ensures the MA component can be expressed as an infinite AR process, facilitating practical forecasting and parameter estimation. It requires all roots of to lie outside the unit circle, mirroring the stationarity condition but applied to the MA polynomial. 
This condition guarantees that current observations depend on past errors in a decaying manner, avoiding non-unique representations of the process.[22] Non-stationarity in ARMA models, particularly due to unit roots, violates the constant mean and variance assumptions, leading to unreliable parameter estimates, spurious regressions, and forecasts that fail to capture long-term dynamics. In such cases, transformations like differencing are necessary, extending the model to ARIMA frameworks to induce stationarity. Testing for stationarity conceptually involves examining the roots of the AR polynomial or applying unit root tests, which assess the null hypothesis of a unit root against the alternative of stationarity, often through augmented regressions to account for serial correlation.Autocorrelation Structure
The autocorrelation function (ACF) and partial autocorrelation function (PACF) characterize the serial correlation structure of stationary ARMA processes, providing key patterns for model identification.[8][23] For a stationary ARMA(p, q) process, the ACF measures the correlation between observations separated by lag k, while the PACF isolates the correlation at lag k after adjusting for intermediate lags.[8] In a pure autoregressive AR(p) process, the ACF decays exponentially or in a damped sinusoidal manner as the lag increases, reflecting persistent dependence on past values.[23] The autocorrelations satisfy the Yule-Walker equations: for k > 0, , where is the autocorrelation at lag k and are the AR coefficients.[23] In contrast, the PACF for AR(p) truncates to zero after lag p, showing no significant partial correlation beyond the order.[8] For a pure moving average MA(q) process, the ACF truncates abruptly to zero after lag q, as correlations depend only on the finite shock history.[23] The PACF, however, decays gradually without truncation, similar to the ACF of an AR process.[8] In a mixed ARMA(p, q) process, the ACF and PACF exhibit hybrid behaviors: the ACF typically shows a non-zero pattern up to lag q followed by exponential decay influenced by the AR component, while the PACF decays without clear truncation.[23] These mixed patterns distinguish ARMA models from pure AR or MA, aiding in order selection during identification.[8] For example, in an AR(1) model with , the ACF plot displays , a smooth exponential decay from lag 1 onward, while the PACF spikes at lag 1 () and drops to zero thereafter.[23] In an MA(1) model with , the ACF has at lag 1 and zero beyond, whereas the PACF decays exponentially.[8] For an ARMA(1,1) model , the ACF features a distinct followed by geometric decay for k ≥ 2, and the PACF decays gradually, blending the truncation and persistence of its components.[24]Spectral Density
The power spectral density (PSD) of a weakly stationary time series process is defined as the Fourier transform of its autocovariance function, providing a frequency-domain representation of the process's variance distribution across frequencies. For an ARMA(p, q) process defined by , where is white noise with variance , , and , the PSD is given by This formula reveals how the AR and MA components shape the frequency content: the denominator amplifies or attenuates frequencies based on AR roots, while the numerator does so for MA roots. Peaks in highlight dominant frequencies corresponding to cyclical patterns in the time series, whereas a constant PSD (flat spectrum) characterizes white noise, indicating no preferred frequencies. For an AR(1) process with , the PSD simplifies to , exhibiting a peak at that underscores low-frequency persistence.[25] In contrast, for an MA(1) process with , emphasizes low frequencies, with higher power near compared to higher frequencies. The periodogram serves as a conceptual nonparametric estimator of the PSD, computed as the squared modulus of the discrete Fourier transform of the data, offering an initial view of underlying frequency structure before parametric modeling.Identification and Estimation
Order Selection for p and q
Order selection for the autoregressive (AR) order and moving average (MA) order in an ARMA() model is a critical preliminary step that precedes parameter estimation, aiming to identify the minimal model structure that adequately captures the time series dynamics while avoiding unnecessary complexity.[26] The Box-Jenkins methodology provides a foundational graphical approach for tentative identification, relying on the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) of the stationary time series.[26] For an AR() process, the sample ACF typically exhibits a gradual tail-off after lag , while the PACF shows a sharp cut-off beyond lag ; conversely, for an MA() process, the sample ACF cuts off after lag , and the PACF tails off.[26] These patterns, which align with theoretical ACF and PACF behaviors for pure AR and MA processes, guide initial choices for and in mixed ARMA models, though they may be less distinct in combined cases.[26] To refine these tentative orders more objectively, information criteria balance model fit against parsimony by penalizing excessive parameters. The Akaike Information Criterion (AIC) is defined as , where is the maximized likelihood and is the number of parameters (including for the constant variance); lower AIC values favor models that minimize expected prediction error. 
The Bayesian Information Criterion (BIC), given by with sample size , imposes a stronger penalty on complexity, promoting consistent selection of the true order as grows large and thus reducing overfitting risk compared to AIC.[27] Both criteria are computed over a grid of candidate values, typically up to small integers like 5, and the model with the minimum value is selected.[27] Alternative approaches include cross-validation, which evaluates candidate models by partitioning the data into training and validation sets to assess out-of-sample forecast accuracy, helping to guard against overfitting in finite samples.[28] Following order selection, the Ljung-Box test on residuals assesses whether remaining autocorrelations are insignificant, confirming the chosen orders yield white noise residuals; the test statistic follows a distribution under the null of no serial correlation up to lag , with non-rejection supporting the model.[29] Challenges in order selection arise from the risk of overfitting, where high or capture noise rather than signal, inflating variance in forecasts; information criteria mitigate this, but AIC may still select overly complex models in small samples.[27] Non-stationarity poses another issue, as ARMA assumes weak stationarity; if present, differencing must precede selection to achieve stationarity, or the process may be misspecified as ARIMA.[26] For illustration, consider a quarterly economic series where the sample ACF decays exponentially without cut-off and the PACF cuts off after lag 2, suggesting an AR(2) model with , .[26]Parameter Estimation Techniques
The primary approach to estimating the parameters of an ARMA(p, q) model—namely, the autoregressive coefficients , the moving average coefficients , and the innovation variance —is maximum likelihood estimation (MLE), assuming Gaussian innovations and that the model orders p and q have been predetermined. Under this assumption, the parameters maximize the log-likelihood function, which for a sample of n observations is given by where are the model residuals, with denoting the mean (often set to zero for centered data). Computing the exact likelihood requires evaluating the joint density of the observations, typically via the innovations algorithm or state-space representation, as the residuals depend on unobserved past values.[6] Due to the nonlinear nature of the ARMA likelihood, direct maximization is challenging and often relies on iterative numerical optimization techniques such as Newton-Raphson or BFGS. Initial parameter values are crucial to avoid local maxima or non-convergence; a common strategy is to use conditional least squares (CLS) estimates, which minimize the sum of squared residuals conditioned on the first max(p, q) observations treated as fixed, providing a computationally simpler approximation to the MLE.[6] Alternatively, Yule-Walker equations—originally for pure AR models but extended via sample autocovariances—can generate starting values for the AR coefficients, with MA coefficients then refined iteratively.[30] Non-convergence issues arise from ill-conditioned likelihood surfaces, particularly for higher-order models or near unit roots, and can be mitigated by constraints ensuring stationarity and invertibility during optimization. 
Under standard regularity conditions, including stationarity, invertibility, and Gaussian innovations, the MLE exhibits desirable asymptotic properties: it is consistent, asymptotically normal with variance given by the inverse Fisher information matrix, and efficient in the Cramér-Rao sense.[31] Specifically, as the sample size n approaches infinity, √n(θ̂_n − θ₀) converges in distribution to a multivariate normal with mean zero and covariance given by the inverse Fisher information matrix.[31] These properties hold even for non-Gaussian innovations under quasi-maximum likelihood, where the asymptotic covariance takes the sandwich (Godambe) form, though efficiency may degrade without correct distributional assumptions.[32] In comparison, the method of moments (MoM) provides an alternative by equating sample autocovariances to their theoretical counterparts derived from the ARMA autocovariance function, solving the resulting nonlinear system for the parameters.[30] While computationally straightforward for low orders and robust to non-Gaussianity, MoM estimators are generally less efficient than MLE, with larger asymptotic variances, especially for MA components where higher moments may be required.[6] For instance, in pure AR(p) cases, MoM reduces to Yule-Walker and achieves the same efficiency as MLE under Gaussianity, but for general ARMA(p, q), MLE is preferred for its superior small-sample performance and asymptotic optimality.[30]
Model Diagnostics and Validation
After estimating the parameters of an ARMA model, typically via maximum likelihood estimation, it is essential to perform diagnostic checks to verify that the model adequately captures the underlying time series structure and that the residuals behave as expected under the model's assumptions. These diagnostics help detect potential misspecifications, such as unmodeled dependencies or inadequate order selection, ensuring the model's reliability for inference and forecasting. A primary diagnostic tool is residual analysis, where the residuals—defined as the differences between observed and fitted values—are examined for properties of white noise. Standardized residuals, obtained by dividing raw residuals by their estimated standard errors, should exhibit no patterns, constant variance, and approximate normality. Visual inspections such as time series plots and quantile-quantile (Q-Q) plots are used to assess these characteristics, the Q-Q plot comparing the empirical distribution of the residuals against a standard normal to detect departures from normality. If residuals display heteroscedasticity or heavy tails in Q-Q plots, it may indicate model inadequacy, prompting revisions such as higher-order terms or transformations. To formally test for serial correlation in residuals, portmanteau tests aggregate autocorrelations across multiple lags.
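As a plot-free sketch of residual standardization and a normality check, the following hypothetical Python snippet computes the correlation between sorted standardized residuals and theoretical normal quantiles (the quantity behind a Q-Q plot); the simulated residuals stand in for a fitted model's residuals:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)

# Stand-in for a fitted model's residuals (e.g., fitted_model.resid)
resid = rng.standard_normal(200)

# Standardize to zero mean and unit variance
z = (resid - resid.mean()) / resid.std(ddof=1)

# Q-Q check without plotting: correlate sorted residuals with
# theoretical standard-normal quantiles; values near 1 suggest normality
n = len(z)
probs = (np.arange(1, n + 1) - 0.5) / n               # plotting positions
theo = np.array([NormalDist().inv_cdf(p) for p in probs])
qq_corr = np.corrcoef(np.sort(z), theo)[0, 1]
print(round(qq_corr, 3))  # close to 1 for approximately Gaussian residuals
```

A markedly lower correlation, or visible curvature in the corresponding Q-Q plot, would point to heavy tails or skewness in the residual distribution.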
The Ljung-Box test is widely applied, with statistic Q = n(n + 2) Σ_{k=1}^{h} ρ̂_k² / (n − k), where n is the sample size, h is the number of lags considered (often set to 10–20 for adequacy), and ρ̂_k are the sample autocorrelations of the residuals; under the null hypothesis of no serial correlation, Q asymptotically follows a χ² distribution with h − p − q degrees of freedom for an ARMA(p, q) model.[33] A non-significant Q (e.g., p-value > 0.05) supports the absence of residual autocorrelation, confirming the model has captured the serial dependence.[33] Model validation extends to out-of-sample forecasting, where the ARMA model is used to predict held-out data to evaluate predictive performance beyond the fitting period. Common metrics include the mean squared error (MSE), which penalizes larger errors quadratically as MSE = (1/h) Σ_{t=1}^{h} (y_t − ŷ_t)², and the mean absolute error (MAE), MAE = (1/h) Σ_{t=1}^{h} |y_t − ŷ_t|, where h is the forecast horizon and y_t, ŷ_t are the actual and predicted values; lower values indicate better accuracy, with MSE sensitive to outliers and MAE providing a more robust measure on the scale of the data. These metrics help assess whether the model generalizes well, as in-sample fit can overestimate performance. Overfitting, where the model fits noise rather than the signal, is detected through model comparison techniques such as the likelihood ratio test (LRT) for nested ARMA models.
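The Ljung-Box statistic can be computed directly; this minimal NumPy sketch uses simulated white noise as a stand-in for model residuals and compares Q to the fixed χ² 0.95 quantile with 10 degrees of freedom (18.307; subtract p + q degrees of freedom when testing ARMA residuals):

```python
import numpy as np

rng = np.random.default_rng(7)

# White-noise stand-in for a fitted model's residuals
resid = rng.standard_normal(500)
n, h = len(resid), 10

# Sample autocorrelations of the residuals at lags 1..h
rc = resid - resid.mean()
denom = rc @ rc
rho = np.array([rc[: n - k] @ rc[k:] / denom for k in range(1, h + 1)])

# Ljung-Box statistic: Q = n(n+2) * sum_k rho_k^2 / (n - k)
Q = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, h + 1)))

# Compare Q to the chi-squared critical value (18.307 at the 5% level, 10 df);
# Q below the critical value means no evidence of residual autocorrelation
print(round(Q, 2))
```

In practice the same test is available directly, e.g. via `statsmodels.stats.diagnostic.acorr_ljungbox`, which also returns p-values.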
The LRT statistic, Λ = −2(ℓ_r − ℓ_f), where ℓ_f and ℓ_r are the maximized log-likelihoods of the fuller and restricted models, follows a χ² distribution with degrees of freedom equal to the difference in parameter counts under the null that the additional parameters are unnecessary; non-rejection suggests the simpler model suffices, mitigating overfitting, while rejection favors the fuller specification.[34] This test is particularly useful when comparing a restricted ARMA(p, q) against a fuller ARMA(p′, q′) with p′ ≥ p and q′ ≥ q.[34] A common issue in ARMA diagnostics is residual autocorrelation, which signals model misspecification such as incorrect orders or unaccounted seasonality; for instance, significant Ljung-Box p-values or patterned residual plots indicate that the ARMA structure has not fully removed dependencies, necessitating model refinement. In such cases, revisiting identification steps or considering extensions like ARIMA may resolve the problem.
Computational Implementation
Software Packages
Several software packages and libraries implement the autoregressive moving-average (ARMA) model, enabling users to fit, estimate, and forecast time series data across various programming languages and environments. These tools typically support core functionalities such as parameter estimation via maximum likelihood, order specification for AR(p) and MA(q) components, and diagnostic checks for model adequacy, often extending to related models like ARIMA.[35] In the R programming language, the arima() function from the base stats package provides comprehensive support for fitting ARMA and ARIMA models. It allows users to specify the orders p, d, and q, and employs methods like conditional sum-of-squares or maximum likelihood estimation for parameter fitting, with built-in options for forecasting and residual analysis.[36]
Python's statsmodels library offers the ARIMA class within the tsa.arima.model module, which facilitates ARMA model estimation and forecasting via a state-space representation evaluated with the Kalman filter. This implementation supports integration with pandas for data handling and includes tools for AIC/BIC-based order selection and Ljung-Box tests for diagnostics.[37]
MATLAB's Econometrics Toolbox includes the arima class, which models ARMA processes with customizable p and q orders, supporting estimation via exact or approximate maximum likelihood and providing options for seasonal components and covariates. It integrates diagnostics like autocorrelation function plots and normality tests for residuals.[38]
Julia's StateSpaceModels.jl package provides ARMA modeling capabilities using state-space representations, supporting specification of AR(p) and MA(q) terms with efficient estimation via Kalman filtering and simulation methods. This library leverages Julia's performance for large datasets.[39]
For high-performance computing needs, the arima crate in Rust provides ARMA and ARIMA model coefficient estimation and simulation, though it may require additional setup for full workflows compared to higher-level languages.[40]
Practical Coding Examples
Implementing an ARMA model in practice involves simulating data, fitting the model, and visualizing diagnostics to ensure adequacy. These examples use R and Python, focusing on ARMA(1,1) for simplicity, as it illustrates the core autoregressive and moving average components. The R example demonstrates simulation and fitting using the base stats package, while the Python example employs statsmodels to load real data and produce forecasts.
In R, begin by setting a seed for reproducibility and simulating an ARMA(1,1) process with parameters φ=0.5 and θ=0.3. Use the arima.sim function to generate 100 observations.
set.seed(123)  # for reproducibility
n <- 100
phi <- 0.5
theta <- 0.3
arma11_data <- arima.sim(model = list(ar = phi, ma = theta), n = n)
Fit the model with arima(), specifying the order as order = c(1, 0, 1) to indicate no differencing for a pure ARMA. Use method = "ML" for maximum likelihood estimation, which can avoid initial-value warnings.
arma_model <- arima(arma11_data, order = c(1, 0, 1), method = "ML")
summary(arma_model)
Visualize the sample ACF and PACF with acf() and pacf(), which help identify the ARMA orders: for an ARMA(1,1), both tail off gradually rather than cutting off sharply. Then, extract residuals and plot their ACF to check for white noise.
par(mfrow = c(2, 2))
acf(arma11_data, main = "ACF of Simulated Data")
pacf(arma11_data, main = "PACF of Simulated Data")
residuals <- residuals(arma_model)
acf(residuals, main = "ACF of Residuals")
tsdiag(arma_model)  # standardized residuals, residual ACF, and Ljung-Box p-values
In Python, use the statsmodels library to fit an ARMA model to the sunspots dataset, a classic time series often used for ARMA modeling. First, load the data, which is treated as stationary after appropriate checks.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')
# Load sunspots data (built-in)
sunspots_data = sm.datasets.sunspots.load_pandas().data
ts = sunspots_data['SUNACTIVITY']
ts.index = pd.date_range(start='1700', periods=len(ts), freq='Y')
# Plot ACF and PACF for order identification
fig, ax = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(ts, ax=ax[0], lags=20)
plot_pacf(ts, ax=ax[1], lags=20)
plt.show()
Fit the model using the ARIMA class with order=(1, 0, 1). For forecasting, generate 12 steps ahead with 95% confidence intervals.
model = ARIMA(ts, order=(1, 0, 1))
fitted_model = model.fit()
print(fitted_model.summary())
# Forecast
forecast = fitted_model.get_forecast(steps=12)
forecast_mean = forecast.predicted_mean
conf_int = forecast.conf_int()
# Plot forecast with confidence intervals
plt.figure(figsize=(10, 6))
plt.plot(ts.index, ts, label='Observed')
plt.plot(forecast_mean.index, forecast_mean, label='Forecast', color='red')
plt.fill_between(conf_int.index, conf_int.iloc[:, 0], conf_int.iloc[:, 1], color='pink', alpha=0.3)
plt.legend()
plt.show()
# Residual diagnostics
residuals = fitted_model.resid
fig, ax = plt.subplots(figsize=(10, 3))
plot_acf(residuals, ax=ax, lags=20)
plt.show()
If the optimizer fails to converge, raise the iteration limit (e.g., maxiter=100 in statsmodels) or switch to conditional sum-of-squares estimation (method='css' in Python, method = "CSS" in R's arima()) to improve stability.
Best practices emphasize setting random seeds (e.g., np.random.seed(123) in Python or set.seed(123) in R) for reproducible simulations and leveraging vectorized operations, such as NumPy arrays in Python or base R vectors, to handle large datasets efficiently without loops. These ensure scalable, verifiable implementations.
