Autoregressive integrated moving average
from Wikipedia

In time series analysis used in statistics and econometrics, autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA) models are generalizations of the autoregressive moving average (ARMA) model to non-stationary series and periodic variation, respectively. All these models are fitted to time series data in order to better understand the series and predict future values. The purpose of these generalizations is to fit the data as well as possible. Specifically, ARMA assumes that the series is stationary, that is, its expected value is constant in time. If instead the series has a trend (but a constant variance/autocovariance), the trend is removed by "differencing",[1] leaving a stationary series. This operation generalizes ARMA and corresponds to the "integrated" part of ARIMA. Analogously, periodic variation is removed by "seasonal differencing".[2]

Components

As in ARMA, the "autoregressive" (AR) part of ARIMA indicates that the evolving variable of interest is regressed on its prior values. The "moving average" (MA) part indicates that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past.[3] The "integrated" (I) part indicates that the data values have been replaced with the difference between each value and the previous value.

According to Wold's decomposition theorem[4][5][6] the ARMA model is sufficient to describe a regular (a.k.a. purely nondeterministic[6]) wide-sense stationary time series. This motivates making such a non-stationary time series stationary, e.g., by using differencing, before using ARMA.[7]

If the time series contains a predictable sub-process (a.k.a. pure sine or complex-valued exponential process[5]), the predictable component is treated as a non-zero-mean but periodic (i.e., seasonal) component in the ARIMA framework, so that it is eliminated by seasonal differencing.

Mathematical formulation

Non-seasonal ARIMA models are usually denoted ARIMA(p, d, q) where parameters p, d, q are non-negative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p, d, q)(P, D, Q)m, where the uppercase P, D, Q are the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model and m is the number of periods in each season.[8][2] When two of the parameters are 0, the model may be referred to based on the non-zero parameter, dropping "AR", "I" or "MA" from the acronym. For example, ARIMA(1, 0, 0) is AR(1), ARIMA(0, 1, 0) is I(1), and ARIMA(0, 0, 1) is MA(1).

Given time series data $X_t$ where $t$ is an integer index and the $X_t$ are real numbers, an ARMA(p', q) model is given by

$X_t - \alpha_1 X_{t-1} - \cdots - \alpha_{p'} X_{t-p'} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},$

or equivalently by

$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t,$

where $L$ is the lag operator, the $\alpha_i$ are the parameters of the autoregressive part of the model, the $\theta_i$ are the parameters of the moving average part and the $\varepsilon_t$ are error terms. The error terms $\varepsilon_t$ are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.

If the polynomial $\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right)$ has a unit root (a factor $(1 - L)$) of multiplicity $d$, then it can be rewritten as:

$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right)(1 - L)^d.$

An ARIMA(p, d, q) process expresses this polynomial factorisation property with $p = p' - d$, and is given by:

$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t,$

and so is a special case of an ARMA(p+d, q) process having an autoregressive polynomial with $d$ unit roots. (This is why no process that is accurately described by an ARIMA model with d > 0 is wide-sense stationary.)

The above can be generalized as follows:

$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t.$

This defines an ARIMA(p, d, q) process with drift $\delta / \left(1 - \sum_i \varphi_i\right)$.

Other special forms

The explicit identification of the factorization of the autoregression polynomial into factors as above can be extended to other cases, firstly to apply to the moving average polynomial and secondly to include other special factors. For example, having a factor $(1 - L^s)$ in a model is one way of including a non-stationary seasonality of period $s$ into the model; this factor has the effect of re-expressing the data as changes from $s$ periods ago. Another example is the factor $(1 + L)$, which includes a (non-stationary) seasonality of period 2.[clarification needed] The effect of the first type of factor is to allow each season's value to drift separately over time, whereas with the second type values for adjacent seasons move together.[clarification needed]

Identification and specification of appropriate factors in an ARIMA model can be an important step in modeling as it can allow a reduction in the overall number of parameters to be estimated while allowing the imposition on the model of types of behavior that logic and experience suggest should be there.

Differencing

The properties of a stationary time series do not change over time. Specifically, for a wide-sense stationary time series, the mean and the variance/autocovariance are constant over time. Differencing in statistics is a transformation applied to a non-stationary time series in order to make it trend stationary (i.e., stationary in the mean sense), by removing or subtracting the trend or non-constant mean. However, it does not affect the non-stationarity of the variance or autocovariance. Likewise, seasonal differencing or deseasonalization is applied to a time series to remove the seasonal component.

From the perspective of signal processing, especially the Fourier spectral analysis theory, the trend is a low-frequency part in the spectrum of a series, while the season is a periodic-frequency part. Therefore, differencing is a high-pass (that is, low-stop) filter and the seasonal-differencing is a comb filter to suppress respectively the low-frequency trend and the periodic-frequency season in the spectrum domain (rather than directly in the time domain).[7]

To difference the data, we compute the difference between consecutive observations. Mathematically, this is shown as

$y_t' = y_t - y_{t-1}.$

It may be necessary to difference the data a second time to obtain a stationary time series, which is referred to as second-order differencing:

$y_t'' = y_t' - y_{t-1}' = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}.$

Seasonal differencing involves computing the difference between an observation and the corresponding observation in the previous season, e.g. the same month in the previous year. This is shown as:

$y_t' = y_t - y_{t-m}, \quad \text{where } m \text{ is the number of periods per season}.$

The differenced data are then used for the estimation of an ARMA model.
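The differencing operations above are straightforward to carry out with standard tools. The following minimal Python sketch (my own illustration, not from the article; the synthetic series and the seasonal period of 12 are assumptions) shows first, second-order, and seasonal differencing with pandas:

```python
# Sketch: first, second-order, and seasonal differencing of a toy monthly series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(120)
# synthetic series with a linear trend and a yearly (m = 12) seasonal pattern
y = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(size=120))

d1 = y.diff()         # first difference:   y'_t  = y_t - y_{t-1}
d2 = y.diff().diff()  # second difference:  y''_t = y'_t - y'_{t-1}
ds = y.diff(12)       # seasonal difference with m = 12: y_t - y_{t-12}

print(d1.dropna().head())
print(ds.dropna().head())
```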

Examples

Some well-known special cases arise naturally or are mathematically equivalent to other popular forecasting models. For example:

  • ARIMA(0, 0, 0) models white noise.
  • An ARIMA(0, 1, 0) model is a random walk.
  • An ARIMA(0, 1, 2) model is a Damped Holt's model.
  • An ARIMA(0, 1, 1) model without constant is a basic exponential smoothing model.[9]
  • An ARIMA(0, 2, 2) model is given by $X_t = 2X_{t-1} - X_{t-2} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$ — which is equivalent to Holt's linear method with additive errors, or double exponential smoothing.[9]

Choosing the order

The order p and q can be determined using the sample autocorrelation function (ACF), partial autocorrelation function (PACF), and/or extended autocorrelation function (EACF) method.[10]

Other alternative methods include AIC, BIC, etc.[10] To determine the order of a non-seasonal ARIMA model, a useful criterion is the Akaike information criterion (AIC). It is written as

$\text{AIC} = -2\log(L) + 2(p + q + k + 1),$

where $L$ is the likelihood of the data, $p$ is the order of the autoregressive part and $q$ is the order of the moving average part. The $k$ represents the intercept of the ARIMA model. For AIC, if $k = 1$ then there is an intercept in the ARIMA model ($c \neq 0$) and if $k = 0$ then there is no intercept in the ARIMA model ($c = 0$).

The corrected AIC for ARIMA models can be written as

$\text{AICc} = \text{AIC} + \frac{2(p + q + k + 1)(p + q + k + 2)}{T - p - q - k - 2},$

where $T$ denotes the number of observations. The Bayesian information criterion (BIC) can be written as

$\text{BIC} = \text{AIC} + \bigl(\log(T) - 2\bigr)(p + q + k + 1).$

The objective is to minimize the AIC, AICc or BIC values for a good model. The lower the value of one of these criteria across the range of models being investigated, the better the model suits the data. The AIC and the BIC serve different purposes: the AIC seeks the model that best approximates the true data-generating process, while the BIC seeks the model most likely to be the true one among the candidates. The BIC approach is sometimes criticized on the grounds that no model perfectly fits real-life complex data; however, it remains a useful selection criterion because it penalizes additional parameters more heavily than the AIC does.

AICc can only be used to compare ARIMA models with the same orders of differencing. For ARIMAs with different orders of differencing, RMSE can be used for model comparison.
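Because the criteria above are only comparable across models with the same differencing order, a common workflow is to fix d and scan candidate (p, q) pairs. The sketch below (a hedged illustration with a toy series and an arbitrary search range, not a prescribed procedure) does this with statsmodels:

```python
# Sketch: compare candidate (p, q) orders for a fixed d = 1 by AIC and BIC.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
y = pd.Series(np.cumsum(rng.normal(size=300)))   # toy I(1) series

results = []
for p in range(3):
    for q in range(3):
        try:
            res = ARIMA(y, order=(p, 1, q)).fit()
            results.append({"p": p, "q": q, "aic": res.aic, "bic": res.bic})
        except Exception:          # some orders may fail to converge
            continue

table = pd.DataFrame(results).sort_values("aic")
print(table.head())   # the lowest AIC/BIC suggests the preferred order
```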

Forecasts using ARIMA models

The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:

$Y_t = (1 - L)^d X_t,$

while the second is wide-sense stationary:

$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t.$

Now forecasts can be made for the process $Y_t$, using a generalization of the method of autoregressive forecasting.

Forecast intervals

The forecast intervals (confidence intervals for forecasts) for ARIMA models are based on assumptions that the residuals are uncorrelated and normally distributed. If either of these assumptions does not hold, then the forecast intervals may be incorrect. For this reason, researchers plot the ACF and histogram of the residuals to check the assumptions before producing forecast intervals.

The 95% forecast interval is $\hat{y}_{T+h \mid T} \pm 1.96\sqrt{v_{T+h \mid T}}$, where $v_{T+h \mid T}$ is the variance of $y_{T+h} \mid y_1, \dots, y_T$.

For $h = 1$, $v_{T+1 \mid T} = \hat{\sigma}^2$ for all ARIMA models regardless of parameters and orders.

For ARIMA(0, 0, q), $y_t = e_t + \sum_{i=1}^{q} \theta_i e_{t-i}$ and $v_{T+h \mid T} = \hat{\sigma}^2 \left(1 + \sum_{i=1}^{h-1} \theta_i^2\right)$ for $h = 2, 3, \dots$

[citation needed]

In general, forecast intervals from ARIMA models will increase as the forecast horizon increases.
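In practice, point forecasts and forecast intervals are usually obtained directly from a fitted model. A minimal sketch with statsmodels follows (my own illustration on a toy series; the order and horizon are assumptions), showing that the 95% intervals widen with the horizon:

```python
# Sketch: forecasts and 95% forecast intervals from a fitted ARIMA(1,1,1).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
y = pd.Series(np.cumsum(rng.normal(size=200)))    # toy non-stationary series

res = ARIMA(y, order=(1, 1, 1)).fit()
fc = res.get_forecast(steps=12)

print(fc.predicted_mean)          # point forecasts
print(fc.conf_int(alpha=0.05))    # 95% intervals, widening with the horizon
print(res.resid.autocorr(lag=1))  # quick informal check of residual autocorrelation
```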

Variations and extensions

A number of variations on the ARIMA model are commonly employed. If multiple time series are used then the $X_t$ can be thought of as vectors and a VARIMA model may be appropriate. Sometimes a seasonal effect is suspected in the model; in that case, it is generally considered better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model.[11] If the time series is suspected to exhibit long-range dependence, then the d parameter may be allowed to have non-integer values in an autoregressive fractionally integrated moving average model, which is also called a fractional ARIMA (FARIMA or ARFIMA) model.

Software implementations

Various packages that apply methodology like Box–Jenkins parameter optimization are available to find the right parameters for the ARIMA model.

  • EViews: has extensive ARIMA and SARIMA capabilities.
  • Julia: contains an ARIMA implementation in the TimeModels package[12]
  • Mathematica: includes ARIMAProcess function.
  • MATLAB: the Econometrics Toolbox includes ARIMA models and regression with ARIMA errors
  • NCSS: includes several procedures for ARIMA fitting and forecasting.[13][14][15]
  • Python: the "statsmodels" package includes models for time series analysis – univariate time series analysis: AR, ARIMA – vector autoregressive models, VAR and structural VAR – descriptive statistics and process models for time series analysis.
  • R: the standard R stats package includes an arima function, which is documented in "ARIMA Modelling of Time Series". Besides the ARIMA(p, d, q) part, the function also includes seasonal factors, an intercept term, and exogenous variables (xreg, called "external regressors"). The package astsa has scripts such as sarima to estimate seasonal or nonseasonal models and sarima.sim to simulate from these models. The CRAN task view on Time Series is the reference with many more links. The "forecast" package in R can automatically select an ARIMA model for a given time series with the auto.arima() function (which can sometimes give questionable results) and can also simulate seasonal and non-seasonal ARIMA models with its simulate.Arima() function.[16]
  • Ruby: the "statsample-timeseries" gem is used for time series analysis, including ARIMA models and Kalman Filtering.
  • JavaScript: the "arima" package includes models for time series analysis and forecasting (ARIMA, SARIMA, SARIMAX, AutoARIMA)
  • C: the "ctsa" package includes ARIMA, SARIMA, SARIMAX, AutoARIMA and multiple methods for time series analysis.
  • SAFE TOOLBOXES: includes ARIMA modelling and regression with ARIMA errors.
  • SAS: includes extensive ARIMA processing in its Econometric and Time Series Analysis system: SAS/ETS.
  • IBM SPSS: includes ARIMA modeling in the Professional and Premium editions of its Statistics package as well as its Modeler package. The default Expert Modeler feature evaluates a range of seasonal and non-seasonal autoregressive (p), integrated (d), and moving average (q) settings and seven exponential smoothing models. The Expert Modeler can also transform the target time-series data into its square root or natural log. The user also has the option to restrict the Expert Modeler to ARIMA models, or to manually enter ARIMA nonseasonal and seasonal p, d, and q settings without Expert Modeler. Automatic outlier detection is available for seven types of outliers, and the detected outliers will be accommodated in the time-series model if this feature is selected.
  • SAP: the APO-FCS package[17] in SAP ERP from SAP allows creation and fitting of ARIMA models using the Box–Jenkins methodology.
  • SQL Server Analysis Services: from Microsoft includes ARIMA as a Data Mining algorithm.
  • Stata includes ARIMA modelling (using its arima command) as of Stata 9.
  • StatSim: includes ARIMA models in the Forecast web app.
  • Teradata Vantage has the ARIMA function as part of its machine learning engine.
  • TOL (Time Oriented Language) is designed to model ARIMA models (including SARIMA, ARIMAX and DSARIMAX variants).
  • Scala: spark-timeseries library contains ARIMA implementation for Scala, Java and Python. Implementation is designed to run on Apache Spark.
  • PostgreSQL/MadLib: Time Series Analysis/ARIMA.
  • X-12-ARIMA: from the US Bureau of the Census

from Grokipedia
The Autoregressive Integrated Moving Average (ARIMA) model is a classic statistical technique for understanding and predicting data, particularly non-stationary series that display trends or other patterns requiring transformation to stationarity. It integrates three core components—autoregression (AR), integration (I), and moving average (MA)—to capture linear dependencies, differenced dynamics, short-term patterns, and noise in data over time, enabling forecasts based on historical patterns. The model is denoted as ARIMA(p, d, q), e.g., ARIMA(1,1,1), where p represents the number of lagged observations included in the autoregressive term, d indicates the number of differencing steps applied to remove trends and achieve stationarity, and q specifies the order of the moving average term, which models the influence of past forecast errors. Developed and popularized by the statisticians George E. P. Box and Gwilym M. Jenkins, ARIMA emerged as a cornerstone of time series analysis through their seminal 1970 work, which emphasized iterative model identification, estimation, and diagnostic checking for practical forecasting applications. ARIMA models assume the underlying time series can be represented as a linear function of its own past values, differenced values, and past errors, making them suitable for univariate forecasting in fields such as economics and finance (e.g., stock or commodity prices). The autoregressive part (AR(p)) posits that the current value depends linearly on previous values plus a noise term, expressed as $y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t$, where $\phi_i$ are parameters and $\epsilon_t$ is white noise. The moving average component (MA(q)) incorporates lagged forecast errors to account for short-term dynamics, given by $y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$. Differencing (the "I" in ARIMA) transforms non-stationary data by subtracting consecutive observations $d$ times, often once for series with constant trends, to stabilize the mean and variance before fitting an ARMA model to the resulting stationary series. While effective for short-term predictions, ARIMA's performance relies on proper parameter selection via methods like autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, and it may require extensions like seasonal ARIMA (SARIMA) for periodic data.

Model Fundamentals

Core Components

The Autoregressive Integrated Moving Average (ARIMA) model derives its name from the combination of three core statistical components: autoregressive (AR), integrated (I), and moving average (MA), forming a framework designed for univariate time series forecasting. This name encapsulates a methodology that extends traditional ARMA models to handle trends and non-stationarity inherent in many real-world datasets, such as economic indicators or inventory levels. The autoregressive (AR) component captures the linear dependency of the current value in a time series on its own lagged values, essentially modeling how past observations influence the present one within a single series. For instance, in an AR process, the value at time t is expressed as a weighted sum of previous values plus random noise, allowing the model to account for autocorrelation or persistence in the data. The integrated (I) component addresses non-stationarity by applying successive differencing operations to the original time series, transforming it into a stationary series where statistical properties like mean and variance remain constant over time. This differencing removes trends or drifts, making the data suitable for subsequent AR and MA modeling, with the order of integration indicating the number of differencing steps required. The moving average (MA) component models the impact of past forecast errors or shocks on the current observation, representing the series as a linear combination of current and previous error terms from the model's predictions. It helps capture short-term fluctuations and smooth out irregularities that the AR component might not fully explain. Collectively, ARIMA integrates these elements to provide a flexible approach for modeling non-stationary time series: the I part achieves stationarity through differencing, after which the AR and MA components jointly describe the underlying dynamics of the transformed series for accurate forecasting. This combination enables ARIMA to handle a wide range of temporal patterns without assuming external regressors.

Historical Background

The foundations of autoregressive integrated moving average (ARIMA) models trace back to early 20th-century developments in time series analysis. Autoregressive (AR) models were first introduced by George Udny Yule in 1927, who applied them to analyze periodicities in sunspot data, demonstrating how current values could depend linearly on past values to capture apparent cycles in seemingly random series. This work laid the groundwork for modeling serial correlation. In 1931, Gilbert Thomas Walker extended Yule's ideas by exploring correlations across related time series, further refining the theoretical framework for AR processes and introducing methods to estimate parameters via what became known as the Yule-Walker equations. Moving average (MA) models emerged shortly thereafter, with Herman Wold's 1938 study of stationary time series providing a rigorous decomposition theorem that represented any such process as an infinite MA of uncorrelated innovations. Wold's contribution established the MA representation as a fundamental building block for capturing the effects of past shocks on current observations. The AR and MA components were combined into autoregressive moving average (ARMA) models, first described by Peter Whittle in 1951. The formalization of ARIMA occurred in 1970 with George E. P. Box and Gwilym M. Jenkins' seminal book, Time Series Analysis: Forecasting and Control, which integrated AR, MA, and differencing (the "I" in ARIMA) into a unified methodology for identifying, estimating, and forecasting with such models. This Box-Jenkins approach revolutionized practical time series analysis by emphasizing iterative model building and diagnostic checks. Following its publication, the Box-Jenkins methodology gained widespread adoption in the 1970s and 1980s, as empirical studies demonstrated ARIMA's superior forecasting accuracy over large-scale econometric models for short-term univariate predictions. By the 1980s, ARIMA had become a standard tool in econometric and statistical software, such as early versions of the Time Series Processor (TSP) and the SAS/ETS procedures introduced around the same period, enabling broader application in economics, finance, and other fields.

Mathematical Foundations

General Formulation

The autoregressive integrated moving average (ARIMA) model of order (p, d, q) provides a framework for modeling time series data that exhibit non-stationarity, which is addressed through differencing. The general formulation is given by the equation $\phi(B) (1 - B)^d y_t = \theta(B) \epsilon_t$, where $B$ denotes the backshift operator such that $B y_t = y_{t-1}$, $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ is the autoregressive operator polynomial of degree $p$, $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$ is the moving average operator polynomial of degree $q$, and $\{\epsilon_t\}$ is a white noise process with mean zero and constant variance $\sigma^2$. This equation originates from the foundational work on time series modeling, where the integration component $(1 - B)^d$ transforms a non-stationary series into a stationary one suitable for ARMA analysis. In this notation, the parameter $p$ specifies the number of lagged observations included in the autoregressive component, $d$ indicates the degree of non-seasonal differencing required to achieve stationarity, and $q$ represents the number of lagged forecast errors incorporated in the moving average component. The model assumes linearity in the relationships, that the $d$-th differenced series $\nabla^d y_t = (1 - B)^d y_t$ is stationary (with constant mean and an autocovariance structure depending only on lag), and that the moving average polynomial $\theta(B)$ is invertible, meaning its roots lie outside the unit circle in the complex plane, to ensure the model can be expressed in an infinite AR form. Additionally, the autoregressive polynomial $\phi(B)$ must have roots outside the unit circle to guarantee stationarity of the differenced process. The formulation extends the autoregressive moving average (ARMA) model by incorporating the integration step: an ARMA(p, q) model $\phi(B) z_t = \theta(B) \epsilon_t$ is applied to the differenced series $z_t = (1 - B)^d y_t$, yielding the full ARIMA structure for handling trends and non-stationarity without separate deterministic components. For series with periodic patterns, a seasonal ARIMA (SARIMA) variant is denoted as ARIMA(p, d, q)(P, D, Q)_s, where the lowercase parameters apply to the non-seasonal components, the uppercase (P, D, Q) to the seasonal autoregressive, differencing, and moving average orders, and $s$ is the length of the seasonal cycle (e.g., 12 for monthly data).
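To make the operator notation concrete, the short Python check below (my own illustration, not from the source) verifies numerically that applying $(1 - B)^2$ is the same as differencing twice, and equals the binomial expansion $y_t - 2y_{t-1} + y_{t-2}$:

```python
# Sketch: (1 - B)^d y_t is d-fold differencing; check the d = 2 expansion.
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=10).cumsum()

d2_by_diff = np.diff(y, n=2)                     # (1 - B)^2 y_t via repeated differencing
d2_by_expansion = y[2:] - 2 * y[1:-1] + y[:-2]   # binomial expansion of the operator

print(np.allclose(d2_by_diff, d2_by_expansion))  # True
```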

Stationarity and Integration

In time series analysis, weak stationarity refers to a stochastic process where the mean, variance, and autocovariance are constant over time, meaning $E(y_t) = \mu$ for all $t$, $\text{Var}(y_t) = \sigma^2$ for all $t$, and $\text{Cov}(y_t, y_{t+k}) = \gamma_k$ independent of $t$ for any lag $k$. This condition ensures that the statistical properties of the series do not change systematically, allowing for reliable modeling of dependencies. To assess stationarity, statistical tests such as the Augmented Dickey-Fuller (ADF) test are commonly employed. The ADF test extends the original Dickey-Fuller test by including lagged difference terms to account for higher-order autoregressive processes, testing the null hypothesis of a unit root (non-stationarity) against the alternative of stationarity. The test statistic is compared to critical values; if it is more negative than the critical value (e.g., at 5% significance), the null is rejected, indicating the series is stationary. Conversely, failure to reject suggests non-stationarity, often requiring transformation. The integrated component in ARIMA models addresses non-stationarity by applying differencing to make the series stationary. A series is integrated of order $d$, denoted $I(d)$, if it becomes stationary after $d$ differences; the parameter $d$ specifies the number of differencing operations needed. First-order differencing, the most common case, is mathematically represented as $\Delta y_t = y_t - y_{t-1}$, which removes a linear trend by subtracting consecutive observations. Higher-order differencing applies this recursively, such as second-order $\Delta^2 y_t = \Delta(\Delta y_t) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$, until stationarity is achieved. Over-differencing, where $d$ exceeds the true order, introduces unnecessary noise and a non-invertible moving average component of order 1, leading to inefficient parameter estimates and inflated forecast variances. Under-differencing, applying insufficient differences, leaves residual non-stationarity, resulting in invalid inferences, spurious autocorrelations, and biased forecasts that fail to capture the true dynamics. Proper selection of $d$ is thus essential for model validity and predictive accuracy.
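A hedged sketch of how the ADF test can guide the choice of d follows; it uses statsmodels' adfuller on a toy random walk, and the stopping rule is an illustrative heuristic rather than a prescribed procedure:

```python
# Sketch: difference a series until the ADF test rejects the unit-root null.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(size=500))        # random walk: non-stationary, I(1)

def n_diffs_needed(x, alpha=0.05, max_d=2):
    """Rough heuristic: count differences until ADF rejects non-stationarity."""
    d = 0
    while d <= max_d:
        stat, pvalue = adfuller(x)[:2]
        if pvalue < alpha:                 # unit-root null rejected -> stationary
            return d
        x = np.diff(x)
        d += 1
    return d

print(n_diffs_needed(y))                   # typically 1 for a random walk
```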

Model Building Process

Order Selection

Order selection in ARIMA models involves determining the appropriate values for the autoregressive order $p$, the degree of differencing $d$, and the moving average order $q$, guided by the Box-Jenkins methodology. This iterative process begins with model identification, where the analyst examines the series to assess stationarity and tentative orders, followed by parameter estimation and diagnostic checking to refine the choice. The methodology emphasizes empirical examination of the data's structure through graphical tools and statistical tests, ensuring the selected model captures the underlying patterns without unnecessary complexity. To select the integration order $d$, unit root tests play a crucial role in determining the number of differences needed to achieve stationarity. The augmented Dickey-Fuller (ADF) test, for instance, evaluates the null hypothesis of a unit root against the alternative of stationarity by estimating an auxiliary regression and testing the significance of the coefficient on the lagged dependent variable. If the test rejects the null, no differencing ($d = 0$) is required; otherwise, differencing is applied until stationarity is confirmed, typically with $d = 1$ or $d = 2$ sufficing for most series. Once stationarity is established, the autoregressive order $p$ and moving average order $q$ are identified using autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the differenced series. For a pure AR($p$) process, the ACF decays gradually or shows a sinusoidal pattern, while the PACF cuts off abruptly after lag $p$, indicating significant partial correlations up to $p$ and insignificance thereafter. Conversely, for a pure MA($q$) process, the PACF decays gradually, but the ACF cuts off after lag $q$. Mixed ARMA models exhibit more complex behavior, such as decaying ACF and PACF beyond the respective orders, requiring tentative fitting and comparison. To objectively compare candidate models suggested by ACF and PACF, information criteria such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are employed. The AIC is calculated as $\text{AIC} = -2 \ln L + 2k$, where $L$ is the maximized likelihood of the model and $k$ is the number of estimated parameters, balancing goodness-of-fit with model complexity by penalizing excessive parameters. The BIC, given by $\text{BIC} = -2 \ln L + k \ln n$ with $n$ as the sample size, imposes a stronger penalty on complexity, favoring more parsimonious models especially in larger samples. Lower values indicate better models, with BIC often preferred for its consistency in selecting the true order asymptotically. Throughout order selection, the principle of parsimony is essential to mitigate overfitting risks, where overly complex models capture noise rather than signal, leading to poor out-of-sample forecasts. The Box-Jenkins approach advocates selecting the simplest model that adequately fits the data, as redundancy in AR and MA terms can cause parameter cancellation and instability; for example, higher-order terms may mimic lower-order ones, inflating variance without improving predictions. Overfitting is particularly evident when ACF/PACF suggest high $p$ or $q$, but criteria like BIC select lower orders to ensure generalizability.
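The identification step can be sketched as follows (an illustrative example on simulated data, assuming matplotlib and statsmodels are available): a PACF that cuts off after lag 2 on the differenced series would point towards p = 2, while a gradually decaying ACF is consistent with an AR structure.

```python
# Sketch: ACF/PACF of a differenced series to suggest tentative p and q.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(5)
# simulate an AR(2) on the differenced scale, then integrate once
z = np.zeros(400)
for t in range(2, 400):
    z[t] = 0.6 * z[t - 1] - 0.3 * z[t - 2] + rng.normal()
y = np.cumsum(z)

dz = np.diff(y)                       # back to the (stationary) differenced series
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(dz, lags=24, ax=axes[0])     # gradual decay expected for an AR process
plot_pacf(dz, lags=24, ax=axes[1])    # sharp cut-off after lag 2 suggests p = 2
plt.tight_layout()
plt.show()
```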

Differencing Techniques

Differencing is a fundamental transformation in ARIMA modeling used to convert a non-stationary time series into a stationary one by eliminating trends and stabilizing the mean. The order of differencing, denoted by $d$ in the ARIMA(p, d, q) framework, indicates the number of times the series must be differenced to achieve approximate stationarity. First-order differencing, the most common approach, computes the differences between consecutive observations and is defined as $\Delta y_t = y_t - y_{t-1}$. This method effectively removes linear trends but may not suffice for series with higher-degree polynomial trends or seasonal patterns. Higher-order differencing extends this process iteratively to address more complex non-stationarities, such as quadratic trends. For second-order differencing, the transformation is applied to the first differences: $\Delta^2 y_t = \Delta (y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$, which simplifies to $y_t - 2y_{t-1} + y_{t-2}$. Subsequent orders follow similarly, with the $k$-th order difference expressed using binomial coefficients: $\Delta^k y_t = \sum_{i=0}^{k} (-1)^i \binom{k}{i} y_{t-i}$. While effective for trends, excessive differencing can amplify high-frequency noise and inflate variance, complicating model interpretability by shifting focus from original levels to changes in changes. Practitioners are advised to use the minimal order necessary to avoid over-differencing. Seasonal differencing targets periodic fluctuations by subtracting observations from the same season in prior periods, defined as $\Delta_s y_t = y_t - y_{t-s}$ where $s$ is the seasonal period (e.g., 12 for monthly data). This can be combined with non-seasonal differencing, as in the SARIMA extension, to handle both trend and seasonality simultaneously. For instance, in the classic monthly international airline passenger data (1949–1960), first-order non-seasonal differencing removes the upward linear trend, resulting in a series with stable mean but persistent annual cycles; applying additional seasonal differencing with $s = 12$ yields a nearly stationary series with constant variance and no evident patterns. Such transformations enhance forecast reliability but require careful assessment to preserve the underlying dynamics. The order of differencing is typically determined through iterative visual inspection of time series plots. Starting with the original series, one differences and replots repeatedly, stopping when the resulting series exhibits no visible trend, stable variance, and random fluctuations around a constant mean—hallmarks of stationarity. This graphical approach, advocated in the foundational Box-Jenkins methodology, allows practitioners to gauge the minimal $d$ empirically, often supplemented by brief checks using stationarity tests for confirmation. In the airline passenger example, iterative plotting reveals that one non-seasonal and one seasonal difference suffice, balancing transformation efficacy with minimal distortion to variance and interpretive clarity. Over-differencing risks introducing unnecessary variability, which can degrade forecast accuracy and obscure economic or practical insights from the model.
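The airline-passenger-style transformation described above can be sketched as a small helper (a hedged illustration; `air` is assumed to be a monthly pandas Series loaded elsewhere, and the log transform is the usual variance-stabilizing step for that dataset):

```python
# Sketch: log transform, one non-seasonal and one seasonal difference (s = 12).
import numpy as np
import pandas as pd

def detrend_deseasonalize(air: pd.Series, s: int = 12) -> pd.Series:
    log_air = np.log(air)    # stabilise the growing seasonal amplitude
    d1 = log_air.diff()      # remove the trend
    d1_ds = d1.diff(s)       # remove the seasonal pattern of period s
    return d1_ds.dropna()    # approximately stationary series

# usage (assuming `air` has been loaded elsewhere):
# stationary = detrend_deseasonalize(air)
# stationary.plot()
```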

Parameter Estimation

Estimation Methods

Parameter estimation in ARIMA models, after differencing to achieve stationarity, primarily involves fitting the underlying ARMA(p, q) parameters using maximum likelihood estimation (MLE) under the assumption of Gaussian-distributed errors. This approach maximizes the likelihood of the observed data given the model parameters, providing asymptotically efficient estimates. The log-likelihood is typically formulated using the innovations algorithm or a state-space (Kalman filter) representation to account for the dependence structure. Two common variants of likelihood-based estimation are conditional least squares and exact maximum likelihood. Conditional least squares approximates the likelihood by conditioning on initial values for the process, often setting pre-sample errors to zero, which simplifies computation but introduces some bias, particularly for short series or high-order components. In contrast, exact maximum likelihood incorporates the full likelihood by properly accounting for initial conditions, yielding more accurate estimates at the cost of greater computational intensity; it is preferred for precise inference. For moving average components, which depend on past errors, initial values are crucial and are often obtained via back-forecasting (backcasting). This technique involves reversing the series in time and fitting an ARMA model to generate estimates of pre-sample residuals, ensuring the likelihood computation starts from plausible values. Parameter optimization proceeds iteratively using numerical methods such as the Newton-Raphson algorithm, which updates estimates based on the gradient and Hessian of the log-likelihood until convergence criteria are met, typically defined by small changes in parameter values or likelihood (e.g., less than 0.1% relative change). When errors deviate from Gaussianity, such as in the presence of heavy tails or skewness, standard MLE may still be applied as quasi-maximum likelihood estimation, which remains consistent under mild misspecification but loses efficiency; alternatively, full MLE can incorporate specified non-Gaussian distributions like Student's t, though this increases computational complexity.
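In most libraries these estimation details are handled internally; the minimal sketch below (my own illustration on a toy series) simply fits an ARIMA(1,1,1) by maximum likelihood with statsmodels and inspects the quantities the text refers to (parameter estimates, maximized log-likelihood, and standard errors derived from the numerically evaluated curvature):

```python
# Sketch: likelihood-based ARIMA estimation and inspection of the fit.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(6)
y = pd.Series(np.cumsum(0.3 + rng.normal(size=250)))   # drifting toy series

res = ARIMA(y, order=(1, 1, 1)).fit()
print(res.params)      # estimated AR, MA and innovation-variance parameters
print(res.llf)         # maximised log-likelihood (used by AIC/BIC)
print(res.summary())   # coefficient table with standard errors
```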

Model Diagnostics

Model diagnostics for autoregressive integrated moving average (ARIMA) models focus on validating the fitted model by scrutinizing the residuals to confirm that the model's assumptions are met and that no systematic patterns remain unexplained. These checks are essential after parameter estimation to ensure the model adequately captures the dynamics without residual structure that could bias forecasts. Residual analysis forms the cornerstone of ARIMA diagnostics, aiming to verify that the residuals—defined as the differences between observed and fitted values—exhibit properties of white noise, including zero mean, constant variance, and lack of serial correlation. Visual inspection begins with plotting the residuals over time to detect any trends, heteroscedasticity, or outliers, followed by the autocorrelation function (ACF) plot of the residuals. An adequate model produces an ACF plot where all autocorrelations beyond lag zero fall within the confidence bands (typically ±1.96/√n), indicating no significant linear dependencies remain in the data. To quantitatively assess the absence of autocorrelation in residuals, the Ljung-Box test is widely applied. This test evaluates the joint hypothesis that the first $h$ autocorrelations of the residuals are zero. The test statistic is calculated as $Q = n(n + 2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n - k}$, where $n$ is the effective sample size after differencing, $h$ is the number of lags tested (often chosen as 10 or 20), and $\hat{\rho}_k$ denotes the sample autocorrelation at lag $k$. For an ARIMA(p, d, q) model, $Q$ asymptotically follows a chi-squared distribution with $h - p - q$ degrees of freedom under the null hypothesis of white noise residuals. A p-value greater than a chosen significance level (e.g., 0.05) fails to reject the null, supporting model adequacy; conversely, low p-values signal remaining autocorrelation, prompting model refinement. Normality of residuals is another key assumption for valid inference and forecasting in ARIMA models, particularly when using maximum likelihood estimation. The Jarque-Bera test provides a formal evaluation by testing whether the sample skewness and kurtosis match those of a normal distribution (zero skewness and kurtosis of 3). The test statistic is $JB = \frac{n}{6} \left( S^2 + \frac{(K - 3)^2}{4} \right)$, where $n$ is the sample size, $S$ is the sample skewness, and $K$ is the sample kurtosis of the residuals. Under normality, $JB$ follows a chi-squared distribution with 2 degrees of freedom. A non-significant p-value (e.g., > 0.05) indicates that the residuals are consistent with normality; significant results may suggest the need for transformations or alternative models, though mild deviations are often tolerable for large samples. Overfitting in ARIMA models, where excessive parameters capture noise rather than signal, is detected through out-of-sample validation. This involves partitioning the data into training and validation sets, fitting the model on the training set, and assessing predictive accuracy on the unseen validation set using metrics such as mean absolute error or root mean squared error. If out-of-sample errors substantially exceed in-sample errors, or if simpler models perform comparably on validation data, overfitting is evident, and model orders should be reduced. In practice, diagnostic outputs from fitted ARIMA models integrate these checks for comprehensive interpretation. For instance, consider an ARIMA model fitted to monthly sales data: the Ljung-Box Q-statistic for 12 lags might yield Q = 14.3 with a p-value of 0.29, confirming no significant autocorrelation; the Jarque-Bera statistic could be JB = 1.8 with p-value 0.41, supporting normality; and an ACF plot of residuals showing all bars within confidence limits would visually affirm white noise properties. Such results collectively validate the model for reliable forecasting, whereas deviations (e.g., a significant Q statistic) would indicate inadequate specification.
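These two residual checks are available off the shelf; the hedged sketch below applies statsmodels' acorr_ljungbox and scipy's jarque_bera to the residuals of a toy fit (the data and orders are illustrative assumptions):

```python
# Sketch: Ljung-Box and Jarque-Bera diagnostics on ARIMA residuals.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
y = pd.Series(np.cumsum(rng.normal(size=300)))
res = ARIMA(y, order=(1, 1, 1)).fit()

resid = res.resid.iloc[1:]             # drop the first residual affected by differencing
lb = acorr_ljungbox(resid, lags=[10])  # Q statistic and p-value (DataFrame in recent versions)
jb_stat, jb_pvalue = stats.jarque_bera(resid)

print(lb)                              # p-value > 0.05: no evidence of residual autocorrelation
print(jb_stat, jb_pvalue)              # p-value > 0.05: consistent with normal residuals
```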

Forecasting Applications

Forecast Computation

Forecasting with an ARIMA(p, d, q) model involves generating point estimates for future values of the series $y_t$. The procedure begins by applying the differencing operator $d$ times to achieve stationarity, resulting in a differenced series $z_t = \nabla^d y_t$, which follows an ARMA(p, q) process. Forecasts for $z_t$ are then computed, and the original series forecasts are obtained by reversing the differencing through cumulative summation. This approach ensures that the forecasts account for the integration order in the model. For the one-step-ahead forecast of the differenced series, denoted $\hat{z}_{t+1|t}$, the ARMA(p, q) model provides: $\hat{z}_{t+1|t} = \sum_{i=1}^{p} \phi_i z_{t+1-i} + \sum_{j=1}^{q} \theta_j e_{t+1-j}$, where $\phi_i$ are the autoregressive parameters, $\theta_j$ are the moving average parameters, $z_{t+1-i}$ are observed past values of the differenced series, and $e_{t+1-j}$ are past forecast errors (residuals) from the fitted model. If $d = 1$, the one-step-ahead forecast for the original series is $\hat{y}_{t+1|t} = y_t + \hat{z}_{t+1|t}$. This formula originates from the conditional expectation in the ARMA representation, as detailed in the foundational work on ARIMA models. Multi-step-ahead forecasts are generated recursively by applying the ARMA model iteratively, substituting previous forecasts for unavailable future observations. For the $h$-step-ahead forecast $\hat{z}_{t+h|t}$, the autoregressive terms use a combination of observed and previously forecasted differenced values, while the moving average terms incorporate past errors; however, the MA contribution vanishes for horizons $h > q$, simplifying to a pure AR recursion beyond that point. For an integrated model with $d = 1$, the $h$-step-ahead forecast is $\hat{y}_{t+h|t} = y_t + \sum_{j=1}^{h} \hat{z}_{t+j|t}$, cumulatively summing the differenced forecasts to reverse the integration. This recursive method ensures consistency with the model's structure. The variance of forecast errors generally increases with the forecast horizon, reflecting accumulating uncertainty from model predictions and unobserved shocks. For ARIMA models, the one-step-ahead variance is approximately the residual variance $\sigma^2$, but for longer horizons it grows due to the propagation of errors through the autoregressive structure and the integration, often roughly linearly for $d = 1$ models. A practical example is forecasting quarterly gross domestic product (GDP) using an ARIMA(1,1,1) model, commonly applied to economic series exhibiting trends and mild autocorrelation. Fitting the model to historical quarterly GDP data, the forecasts incorporate recent differenced values and residuals, with multi-step forecasts showing the influence of the MA term diminishing over time, integrated back to yield absolute GDP levels. Such applications demonstrate ARIMA's utility in macroeconomic forecasting.
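The recursion can be written out by hand for an ARIMA(1,1,1); the sketch below is my own illustration with assumed parameter values (not estimates), showing the one-step formula, the decay of the MA term beyond lag q, and the cumulative summation that reverses the differencing:

```python
# Sketch: hand-rolled recursive h-step forecast for an ARIMA(1,1,1).
import numpy as np

rng = np.random.default_rng(8)
y = np.cumsum(rng.normal(size=200))       # observed series, treated as I(1)
z = np.diff(y)                            # differenced series z_t

phi, theta = 0.5, 0.3                     # assumed ARMA(1,1) parameters for z_t
# in-sample one-step errors e_t = z_t - (phi * z_{t-1} + theta * e_{t-1})
e = np.zeros_like(z)
for t in range(1, len(z)):
    e[t] = z[t] - (phi * z[t - 1] + theta * e[t - 1])

h = 6
z_hat = np.zeros(h)
z_hat[0] = phi * z[-1] + theta * e[-1]    # one step ahead uses the last error
for j in range(1, h):
    z_hat[j] = phi * z_hat[j - 1]         # MA term vanishes for horizons > q

y_hat = y[-1] + np.cumsum(z_hat)          # undo the differencing
print(y_hat)
```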

Interval Estimation

Interval estimation in ARIMA models involves constructing prediction intervals around point forecasts to quantify the uncertainty associated with future predictions. These intervals provide a range within which the actual future values are expected to lie with a specified probability, typically assuming normality of the forecast errors. The width of these intervals increases with the forecast horizon, reflecting growing uncertainty over time. The forecast standard error for the $h$-step-ahead prediction is derived from the variance of the forecast error, given by $\sigma_h^2 = \sigma^2 \left(1 + \sum_{i=1}^{h-1} \psi_i^2 \right)$, where $\sigma^2$ is the innovation variance and $\psi_i$ are the coefficients of the infinite moving average (MA(∞)) representation of the ARIMA model. The $\psi_i$ weights capture the cumulative effect of past shocks on future values and are obtained by inverting the ARIMA model into its MA(∞) form. This formula accounts for the accumulation of uncertainty from unobserved future errors. Approximate prediction intervals are commonly constructed assuming the forecast errors follow a normal distribution. For a (1 − α) prediction interval, the interval is $\hat{y}_{t+h|t} \pm z_{\alpha/2} \sigma_h$, where $\hat{y}_{t+h|t}$ is the point forecast and $z_{\alpha/2}$ is the upper α/2 quantile of the standard normal distribution. This approach is straightforward and widely used, particularly for large samples or longer horizons where the central limit theorem justifies normality. For short forecast horizons, exact prediction intervals can be computed by directly deriving the distribution of the forecast error, often using the model's state-space form or recursive methods, which avoid the normality assumption. However, for longer horizons, these exact methods become computationally intensive, and the normal approximation is preferred for practicality. Fan charts offer a graphical representation of these probabilistic forecasts, displaying a series of nested intervals that fan out over time to illustrate increasing uncertainty. Each shaded region corresponds to a cumulative probability band, such as 10% up to 90%, providing a comprehensive view of the forecast distribution rather than just point estimates and bounds. This visualization is particularly effective for communicating forecast uncertainty in applications like economic forecasting. As an example, consider quarterly stock prices with an ARIMA(1,1,0) model fitted to historical data. The 95% prediction intervals for the next four quarters widen over time, highlighting how uncertainty accumulates and informs decision-making in financial applications.
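The psi-weight formula can be evaluated directly. The sketch below (an illustration with assumed ARMA(1,1) parameters and innovation standard deviation, not a fitted model) obtains the MA(∞) weights with statsmodels' arma2ma and builds the growing interval half-widths:

```python
# Sketch: psi-weights and 95% interval half-widths from the formula above.
import numpy as np
from scipy.stats import norm
from statsmodels.tsa.arima_process import arma2ma

phi, theta, sigma = 0.6, 0.2, 1.0
# polynomials include the zero-lag coefficient: (1 - phi*B) and (1 + theta*B)
psi = arma2ma(np.r_[1, -phi], np.r_[1, theta], lags=10)   # psi_0 = 1, psi_1, ...

sigma_h = sigma * np.sqrt(np.cumsum(psi**2))   # sqrt(sum_{i=0}^{h-1} psi_i^2), h = 1..10
z = norm.ppf(0.975)
half_width = z * sigma_h                       # 95% half-widths, growing with the horizon

print(np.round(half_width, 3))
```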

Extensions and Implementations

Advanced Variations

The basic ARIMA model assumes stationarity after integer differencing and captures linear dependencies through autoregressive and moving average components, but it often fails to adequately model real-world series exhibiting seasonality, external influences, long-range dependencies, or abrupt changes. Advanced variations extend the framework to address these limitations, enhancing flexibility and accuracy for complex data patterns. These extensions maintain the core structure while incorporating additional parameters or mechanisms, as originally outlined in foundational methodologies. Seasonal ARIMA (SARIMA) models address the limitation of basic ARIMA in handling periodic patterns, such as monthly or quarterly cycles in economic or environmental data. The SARIMA(p, d, q)(P, D, Q)_s formulation combines non-seasonal components with seasonal autoregressive (P), differencing (D), and moving average (Q) terms, where $s$ denotes the seasonal period (e.g., $s = 12$ for monthly data). The general model is expressed as $\phi_p(B) \Phi_P(B^s) (1 - B)^d (1 - B^s)^D y_t = \theta_q(B) \Theta_Q(B^s) \epsilon_t$. Here, $\phi_p(B)$ and $\theta_q(B)$ are the non-seasonal polynomials, $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are the seasonal counterparts evaluated at lag $s$, and $\epsilon_t$ is white noise. This structure allows SARIMA to capture both short-term and seasonal autocorrelations, improving fits for data like airline passenger counts or retail sales. SARIMA was developed as part of the Box-Jenkins methodology to model multiplicative seasonal effects without assuming independence across cycles. ARIMAX extends ARIMA by incorporating exogenous variables, addressing cases where the series is influenced by external factors such as weather, policy changes, or marketing efforts, which basic ARIMA ignores. The model integrates a transfer function to link the input series $x_t$ to the output $y_t$, typically formulated as a filtered exogenous component plus an ARIMA noise term: $y_t = \nu(B) x_t + \frac{\theta_q(B)}{\phi_p(B)} a_t$, where $\nu(B)$ is the transfer function (often $\nu(B) = \frac{\omega(B)}{\delta(B)}$ for a numerator polynomial $\omega$ and denominator polynomial $\delta$ of orders $r$ and $s$), and $a_t$ is the residual. This allows quantification of causal impacts, such as how temperature affects electricity demand. The approach builds on the transfer function-noise models introduced in the Box-Jenkins framework for dynamic regression. Fractionally integrated ARIMA (ARFIMA) tackles the limitation of integer-order differencing in basic ARIMA, which cannot model long-memory processes where shocks persist indefinitely with hyperbolic rather than exponential decay. In ARFIMA(p, d, q), the integration parameter $d$ is fractional (typically $0 < d < 0.5$ for stationarity with long memory), generalizing the differencing operator to $(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-1)^k B^k$. The model becomes $\phi_p(B) (1 - B)^d y_t = \theta_q(B) \epsilon_t$, enabling capture of persistent autocorrelations in series like river flows or financial volatility. ARFIMA was introduced to formalize long-memory dynamics, distinguishing them from short-memory ARIMA processes. Intervention analysis extends ARIMA to detect and model structural breaks or external shocks, such as policy implementations or natural disasters, which basic ARIMA treats as unmodeled noise. It augments the model with an intervention term: $y_t = \frac{\omega(B)}{\delta(B)} \xi_t + N_t$, where $\xi_t$ is a deterministic input (e.g., a step function $I_t$ for permanent shifts or a pulse for temporary ones), $\omega(B)/\delta(B)$ is the transfer function capturing the dynamic response, and $N_t$ is the ARIMA noise process. This method estimates the magnitude and duration of impacts, as applied to economic interventions like tax changes. The technique was developed to assess point or gradual disruptions in time series. These variations collectively overcome key shortcomings of standard ARIMA, such as the inability to handle seasonality (via SARIMA), external drivers (ARIMAX), slow decay in correlations (ARFIMA), and sudden changes (intervention analysis), thereby broadening applicability across many applied fields.
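A seasonal fit of the kind described above can be sketched with statsmodels' SARIMAX class; the series, the (1,1,1)(1,1,1,12) order, and the monthly index are illustrative assumptions rather than recommendations:

```python
# Sketch: fitting a SARIMA(1,1,1)(1,1,1,12) model to a toy monthly series.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(9)
idx = pd.date_range("2000-01", periods=144, freq="MS")
seasonal = 10 * np.sin(2 * np.pi * np.arange(144) / 12)
y = pd.Series(np.cumsum(rng.normal(size=144)) + seasonal + 0.3 * np.arange(144), index=idx)

model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)
print(res.summary().tables[1])               # non-seasonal and seasonal coefficient estimates
print(res.get_forecast(12).predicted_mean)   # one-year-ahead forecasts
```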

Software Tools

Several software tools and libraries facilitate the implementation, estimation, and forecasting of ARIMA models across various programming languages, each offering distinct features for time series analysis. In R, the base stats package provides the arima() function for fitting ARIMA models to univariate time series data, supporting maximum likelihood estimation and handling differenced models with missing values. This function allows specification of the ARIMA order (p, d, q) and includes options for seasonal components via SARIMA extensions. For forecasting, the forecast() function from the companion forecast package wraps around arima() to generate predictions and confidence intervals, enabling drift terms not available in the base implementation. Python's statsmodels library implements ARIMA through the statsmodels.tsa.arima.model.ARIMA class, which serves as the primary interface for univariate models, including support for exogenous regressors and seasonal components. It performs estimation using conditional or exact maximum likelihood and provides methods for forecasting, simulation, and residual analysis. For automated order selection, the pmdarima library's auto_arima() function identifies optimal (p, d, q) parameters by minimizing information criteria such as AIC or BIC, returning a fitted model compatible with statsmodels. MATLAB offers the arima object in the Econometrics Toolbox for creating and estimating ARIMA(p, D, q) models, where users specify orders and parameters like non-zero means or seasonal lags. The estimate method fits the model to data using maximum likelihood, while forecast computes multi-step predictions with optional simulation for uncertainty quantification. Additional functions like infer for residuals and simulate for generating response paths support comprehensive model diagnostics and validation. In Julia, the StateSpaceModels.jl package supports ARIMA modeling within a state-space framework, allowing estimation of ARIMA(p, d, q) models via functions like fit_arima for parameter optimization and forecasting. It leverages Julia's performance for efficient handling of large datasets and includes tools for diagnostics and residual analysis, though it requires familiarity with state-space representations. Comparisons across these tools highlight differences in ease of use, performance, and visualization. MATLAB excels in ease of use for beginners due to its integrated environment and intuitive syntax for ARIMA specification, making it ideal for quick prototyping without extensive coding. R and Python offer comparable accessibility, with R's arima() providing straightforward integration with base functions and Python's statsmodels benefiting from extensive community support, though Python's auto_arima simplifies order selection more than R's manual approaches. Julia's StateSpaceModels.jl is less beginner-friendly due to its more abstract state-space interface but offers superior performance for large-scale computations thanks to Julia's just-in-time compilation and parallelization capabilities. For visualization and diagnostics, R stands out with seamless integration with ggplot2 for plotting residuals, ACF/PACF, and forecast intervals, while Python relies on matplotlib or seaborn for similar tasks, often requiring additional setup; MATLAB provides built-in plotting functions, and Julia uses Plots.jl effectively but with a steeper learning curve.
