Hubbry Logo
search
logo

Linear prediction

logo
Community Hub0 Subscribers
Read side by side
from Wikipedia

Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples.

In digital signal processing, linear prediction is often called linear predictive coding (LPC) and can thus be viewed as a subset of filter theory. In system analysis, a subfield of mathematics, linear prediction can be viewed as a part of mathematical modelling or optimization.

The prediction model

[edit]

The most common representation is

where is the predicted signal value, the previous observed values, with , and the predictor coefficients. The error generated by this estimate is

where is the true signal value.

These equations are valid for all types of (one-dimensional) linear prediction. The differences are found in the way the predictor coefficients are chosen.

For multi-dimensional signals the error metric is often defined as

where is a suitable chosen vector norm. Predictions such as are routinely used within Kalman filters and smoothers to estimate current and past signal values, respectively, from noisy measurements.[1]

Estimating the parameters

[edit]

The most common choice in optimization of parameters is the root mean square criterion which is also called the autocorrelation criterion. In this method we minimize the expected value of the squared error , which yields the equation

for 1 ≤ jp, where R is the autocorrelation of signal xn, defined as

,

and E is the expected value. In the multi-dimensional case this corresponds to minimizing the L2 norm.

The above equations are called the normal equations or Yule-Walker equations. In matrix form the equations can be equivalently written as

where the autocorrelation matrix is a symmetric, Toeplitz matrix with elements , the vector is the autocorrelation vector , and , the parameter vector.

Another, more general, approach is to minimize the sum of squares of the errors defined in the form

where the optimisation problem searching over all must now be constrained with .

On the other hand, if the mean square prediction error is constrained to be unity and the prediction error equation is included on top of the normal equations, the augmented set of equations is obtained as

where the index ranges from 0 to , and is a matrix.

Specification of the parameters of the linear predictor is a wide topic and a large number of other approaches have been proposed. In fact, the autocorrelation method is the most common[2] and it is used, for example, for speech coding in the GSM standard.

Solution of the matrix equation is computationally a relatively expensive process. The Gaussian elimination for matrix inversion is probably the oldest solution but this approach does not efficiently use the symmetry of . A faster algorithm is the Levinson recursion proposed by Norman Levinson in 1947, which recursively calculates the solution.[citation needed] In particular, the autocorrelation equations above may be more efficiently solved by the Durbin algorithm.[3]

In 1986, Philippe Delsarte and Y.V. Genin proposed an improvement to this algorithm called the split Levinson recursion, which requires about half the number of multiplications and divisions.[4] It uses a special symmetrical property of parameter vectors on subsequent recursion levels. That is, calculations for the optimal predictor containing terms make use of similar calculations for the optimal predictor containing terms.

Another way of identifying model parameters is to iteratively calculate state estimates using Kalman filters and obtaining maximum likelihood estimates within expectation–maximization algorithms.

For equally-spaced values, a polynomial interpolation is a linear combination of the known values. If the discrete time signal is estimated to obey a polynomial of degree then the predictor coefficients are given by the corresponding row of the triangle of binomial transform coefficients. This estimate might be suitable for a slowly varying signal with low noise. The predictions for the first few values of are

See also

[edit]

References

[edit]

Further reading

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
Linear prediction is a signal processing technique that estimates the current value of a discrete-time signal as a linear combination of its previous samples, minimizing the prediction error through least-squares optimization.[1] This approach models signals, particularly stationary random processes, using an autoregressive framework where the predicted sample x^n=k=1pakxnk\hat{x}_n = \sum_{k=1}^p a_k x_{n-k}, with aka_k as predictor coefficients and pp as the model order, yielding a residual error en=xnx^ne_n = x_n - \hat{x}_n.[2] Originating from early 20th-century work on time series analysis, such as Yule's 1927 application to sunspot data and advancements by Wiener and Kolmogorov in the 1940s for extrapolation of stationary processes, linear prediction has become foundational in fields like speech analysis and geophysical signal processing.[2] In speech processing, linear prediction underpins linear predictive coding (LPC), which represents the vocal tract as an all-pole filter driven by a periodic impulse train for voiced sounds or noise for unvoiced ones, enabling efficient estimation of formants, pitch, and spectral envelopes.[3] The core principle involves solving the Yule-Walker equations derived from the autocorrelation of the signal to obtain the coefficients, often via efficient algorithms like Levinson-Durbin recursion, which achieves computational complexity of O(p2)O(p^2).[1] Two primary estimation methods dominate: the autocorrelation method, which assumes signal stationarity and applies windowing (e.g., Hamming) over the entire frame to ensure filter stability, and the covariance method, which handles non-stationarity by minimizing error over finite intervals but risks instability without additional constraints.[3] These techniques facilitate applications such as speech compression in codecs like CELP, synthesis for vocoders, speaker identification, and even extensions like perceptual linear prediction for improved psychoacoustic modeling.[3][4] Beyond speech, linear prediction extends to diverse domains including electroencephalogram (EEG) analysis for brain signal modeling, seismic data processing in geophysics, and image compression through autoregressive techniques.[2] Its efficacy stems from the parsimonious all-pole model, which captures smooth spectral envelopes with fewer parameters than direct Fourier methods, though it assumes linearity and may require hybrid pole-zero models for complex signals.[3] Influential developments, such as Itakura and Saito's maximum likelihood formulation in 1968, have solidified its role in modern digital signal processing tools and standards.[2]

Fundamentals

Definition and Basic Concept

Linear prediction is a fundamental technique in signal processing for estimating the value of a discrete-time signal at a given time instant based on a linear combination of its previous values. Formally, the predicted sample x^(n)\hat{x}(n) is approximated as
x^(n)=k=1pakx(nk), \hat{x}(n) = \sum_{k=1}^{p} a_k x(n-k),
where pp is the prediction order, aka_k are the coefficients determined to minimize the prediction error, and x(nk)x(n-k) denote the pp past samples. The resulting prediction error is defined as e(n)=x(n)x^(n)e(n) = x(n) - \hat{x}(n), representing the difference between the actual and estimated values. This approach models the signal as an output of a linear system driven by past observations, enabling efficient representation and analysis of correlated data.[5] The method operates under key assumptions that ensure its applicability and optimality. Primarily, it assumes the underlying signal process is stationary, meaning its statistical properties, such as mean and autocorrelation, remain constant over time, which allows the use of time-invariant coefficients. Additionally, the predictor is strictly linear in the coefficients aka_k, facilitating straightforward optimization without introducing nonlinear complexities. These assumptions underpin the technique's effectiveness in scenarios where signals exhibit temporal dependencies, such as in speech or geophysical data.[5] To minimize the mean squared prediction error E[e2(n)]E[e^2(n)], the coefficients aka_k are selected according to the orthogonality principle, which posits that the optimal error e(n)e(n) must be uncorrelated with each of the past samples used in the prediction. Mathematically, this requires E[e(n)x(nk)]=0E[e(n) x(n-k)] = 0 for k=1,2,,pk = 1, 2, \dots, p, ensuring the error contains no component predictable from the observed data window. This principle derives from the projection theorem in Hilbert space and guarantees the least-squares solution for linear estimators.[5] A illustrative example is one-step-ahead prediction (p=1p=1) applied to a white noise signal, where consecutive samples are uncorrelated with zero mean and unit variance. Here, the optimal coefficient a1=0a_1 = 0, yielding x^(n)=0\hat{x}(n) = 0 and e(n)=x(n)e(n) = x(n), so the prediction error variance equals the signal variance σx2=1\sigma_x^2 = 1. Consequently, the prediction gain, defined as 10log10(σx2/σe2)10 \log_{10} (\sigma_x^2 / \sigma_e^2), is zero decibels, demonstrating that linear prediction offers no improvement for inherently unpredictable, uncorrelated processes.[5]

Historical Development

The roots of linear prediction trace back to 19th-century advancements in time series analysis, where statisticians like Adolphe Quetelet, Francis Galton, and Karl Pearson developed foundational methods for modeling sequential data and regression, laying the groundwork for predictive techniques in stochastic processes.[6] A pivotal early contribution came in 1927 when George Udny Yule applied autoregressive models to analyze and predict sunspot data, introducing a method to investigate periodicities in disturbed series that marked the first explicit use of linear prediction in time series forecasting. This work demonstrated how past values could linearly predict future ones, influencing subsequent statistical modeling of temporal dependencies. In 1938, Herman Wold advanced the theoretical underpinnings in his doctoral thesis, decomposing stationary stochastic processes into deterministic and purely stochastic components, which provided a rigorous link between time series decomposition and optimal linear prediction theory. Wold's framework emphasized the role of innovation processes in prediction, establishing key principles for representing stochastic phenomena as sums of predictable and unpredictable parts. Post-World War II, linear prediction gained prominence in speech signal processing, with significant developments at Bell Laboratories in the 1960s and 1970s. Bishnu S. Atal and colleagues explored linear predictive coding (LPC) for speech analysis and synthesis, building on earlier statistical approaches to model vocal tract resonances efficiently.[7] Independently, Fumitada Itakura at NTT Laboratories in 1967 introduced maximum likelihood methods for LPC parameter estimation, adapting prediction techniques to spectral envelope representation in audio signals. John Burg's 1967 presentation on maximum entropy spectral analysis further propelled the field by proposing an efficient method for estimating prediction error and coefficients, which avoided assumptions about unobserved autocorrelation lags and improved resolution in short data records, influencing algorithms in digital signal processing throughout the 1970s.[8] A key milestone was John D. Markel's 1971 paper on LPC applications, which detailed formant trajectory estimation and synthesis techniques, enabling practical speech processing systems. By the 1980s, linear prediction was integrated into telecommunications standards, notably the GSM full-rate speech codec adopted in 1990, which employed regular pulse excitation with long-term prediction (RPE-LTP) based on LPC principles to achieve efficient low-bitrate voice compression for mobile networks. This adoption marked linear prediction's transition from theoretical tool to widespread engineering application in digital communications.

Mathematical Formulation

Forward Linear Prediction Model

The forward linear prediction model estimates the current value of a discrete-time signal x(n)x(n) using a linear combination of its pp previous samples, enabling causal prediction of future samples from past observations. This approach assumes the signal follows an autoregressive process where each sample is predictable from earlier ones, minimizing the mean-squared prediction error through optimal coefficient selection. The model is foundational in signal processing for applications requiring efficient representation of correlated data, such as speech and time series analysis.[9] The predicted value x^(n)\hat{x}(n) for a pp-th order forward predictor is expressed as
x^(n)=k=1pap(k)x(nk), \hat{x}(n) = -\sum_{k=1}^{p} a_p(k) x(n-k),
where ap(k)a_p(k) denote the prediction coefficients that weight the past samples x(nk)x(n-k). These coefficients are chosen to minimize the expected value of the squared error, effectively capturing the signal's short-term correlations. The negative sign convention aligns with the autoregressive filter formulation, where the predictor acts as an all-pole filter.[9] The forward prediction error ep(n)e_p(n) is the difference between the actual sample and its prediction:
ep(n)=x(n)x^(n)=x(n)+k=1pap(k)x(nk). e_p(n) = x(n) - \hat{x}(n) = x(n) + \sum_{k=1}^{p} a_p(k) x(n-k).
This error signal, often termed the residual or innovation, quantifies the unpredictable component of the signal and serves as input to synthesis filters in predictive coding schemes. For stationary signals, the error is uncorrelated with past samples when coefficients are optimally estimated, ensuring the predictor extracts all linear dependencies.[9] Reflection coefficients, denoted kik_i, arise in the recursive computation of predictor coefficients and represent the partial correlation between the current sample and the farthest past sample, conditioned on intervening samples. In the forward model, they facilitate stability by requiring ki<1|k_i| < 1 for all i=1,,pi = 1, \dots, p, which guarantees that the associated all-pole filter has poles inside the unit circle, preventing instability in implementation. These coefficients are particularly useful in adaptive scenarios, as they allow incremental updates without recomputing the entire set of ap(k)a_p(k).[9] The lattice structure provides a modular block diagram realization of the forward predictor, consisting of a tapped delay line with multipliers by the reflection coefficients and adders for error computation. It recursively builds higher-order predictors from lower-order ones through stages where each level updates the forward error using the previous backward error, enabling efficient order recursion and numerical stability. This structure is depicted as a series of delay elements z1z^{-1}, reflection coefficient multipliers kmk_m, and summers, forming an IIR filter equivalent to the direct-form predictor.[9] For illustration, consider a second-order (p=2p=2) forward predictor applied to a unit-amplitude sinusoidal signal x(n)=cos(ω0n)x(n) = \cos(\omega_0 n), where ω0\omega_0 is the normalized frequency. The optimal coefficients are a2(1)=2cos(ω0)a_2(1) = -2 \cos(\omega_0) and a2(2)=1a_2(2) = 1, derived from the signal's autoregressive representation. Substituting these yields x^(n)=2cos(ω0)x(n1)x(n2)\hat{x}(n) = 2 \cos(\omega_0) x(n-1) - x(n-2), resulting in zero prediction error e2(n)=0e_2(n) = 0 for the noiseless sinusoid, as the model perfectly reconstructs the periodic waveform. This demonstrates how increasing the predictor order reduces the error to negligible levels for signals with few dominant frequencies, highlighting the model's efficacy in capturing sinusoidal correlations.[9]

Backward Linear Prediction Model

The backward linear prediction model provides an anti-causal approach to estimating past samples of a signal from subsequent observations within a finite window, contrasting with the causal nature of forward prediction. For a signal x(n)x(n) and a prediction order mm, the predicted value of the past sample is given by
x^(nm)=k=1mam(k)x(nm+k), \hat{x}(n-m) = -\sum_{k=1}^{m} a_m(k) x(n-m+k),
where am(k)a_m(k) are the prediction coefficients optimized to minimize the prediction error over the window.[9] This formulation arises in the context of autoregressive modeling for stationary processes, where the backward predictor leverages future samples to reconstruct earlier ones, facilitating computations in recursive algorithms.[9] The backward prediction error, denoted e~m(n)\tilde{e}_m(n), quantifies the discrepancy between the actual and predicted past sample:
e~m(n)=x(nm)+k=1mam(k)x(nm+k). \tilde{e}_m(n) = x(n-m) + \sum_{k=1}^{m} a_m(k) x(n-m+k).
This error is minimized in a least-squares sense, yielding normal equations analogous to those in forward prediction for stationary signals.[9] For such processes, the backward and forward models are related by time-reversibility of the autocorrelation, with backward coefficients being the reverse of the forward ones: the coefficient weighting the sample closest to the predicted one in backward corresponds to the farthest in forward, i.e., amb(k)=amf(m+1k)a_m^b(k) = a_m^f(m + 1 - k).[9] In order-recursive algorithms like the Levinson-Durbin recursion, the backward prediction model plays a crucial role by enabling efficient step-up procedures from order m1m-1 to mm. Specifically, the backward errors from lower-order predictors are used alongside forward errors to compute the next reflection coefficient, allowing the full set of coefficients to be updated with O(m)O(m) operations per step while exploiting the Toeplitz structure of the autocorrelation matrix.[9] A key stability condition for the resulting inverse filter (the autoregressive model) requires the magnitude of each reflection coefficient to be less than 1, ensuring all poles lie inside the unit circle and preventing instability in the prediction process.[9] Violation of this condition, such as km1|k_m| \geq 1, can lead to poles outside the unit circle, rendering the filter non-causal or divergent in implementation.[9]

Parameter Estimation

Yule-Walker Equations

The Yule-Walker equations provide a set of linear normal equations for estimating the coefficients of a linear predictor by minimizing the mean squared prediction error under the orthogonality principle. For a forward linear prediction error $ e_p(n) = x(n) - \sum_{j=1}^p a_p(j) x(n-j) $, the principle requires that the error be orthogonal to the data used for prediction, yielding $ E[e_p(n) x(n-k)] = 0 $ for $ k = 1, 2, \dots, p $. Substituting the error expression gives the system:
r(k)j=1pap(j)r(kj)=0,k=1,2,,p, r(k) - \sum_{j=1}^p a_p(j) r(k-j) = 0, \quad k = 1, 2, \dots, p,
where $ r(k) = E[x(n) x(n-k)] $ is the autocorrelation function at lag $ k $.[10][11] In matrix form, these equations are expressed as $ \mathbf{R} \mathbf{a}_p = \mathbf{r}_p $, where $ \mathbf{a}p = [a_p(1), a_p(2), \dots, a_p(p)]^T $ is the coefficient vector, $ \mathbf{r}p = [r(1), r(2), \dots, r(p)]^T $, and $ \mathbf{R} $ is the $ p \times p $ symmetric Toeplitz autocorrelation matrix with elements $ R{i,j} = r(|i-j|) $. The Toeplitz structure arises from the stationarity assumption and facilitates specialized solution methods. The prediction error variance is then $ \sigma_p^2 = r(0) - \sum{j=1}^p a_p(j) r(j) $.[10][11] For low prediction orders, explicit solutions highlight the method's simplicity. In first-order prediction ($ p=1 $), the equation simplifies to $ r(1) - a_1(1) r(0) = 0 $, so $ a_1(1) = r(1)/r(0) .Forsecondorder(. For second-order ( p=2 $), the system is:

$$ \begin{bmatrix} r(0) & r(1) \ r(1) & r(0) \end{bmatrix} \begin{bmatrix} a_2(1) \ a_2(2) \end{bmatrix}

\begin{bmatrix} r(1) \ r(2) \end{bmatrix}, $$ yielding $ a_2(1) = \frac{r(1)(r(0) - r(2))}{r(0)^2 - r(1)^2} $ and $ a_2(2) = \frac{r(1)^2 - r(0) r(2)}{r(0)^2 - r(1)^2} $ after normalization by $ r(0) $. These ratios directly relate coefficients to autocorrelations, assuming $ r(0) > 0 $.[10] The derivation assumes the signal is wide-sense stationary, ensuring the autocorrelation $ r(k) $ exists and depends only on the lag $ k $. For non-stationary signals, the equations can be applied approximately by windowing the data to create stationary frames, estimating $ r(k) $ within each window via time-averaged correlations. Direct solution of the matrix equation requires $ O(p^3) $ operations via inversion or factorization, which motivates efficient recursive algorithms like the Levinson-Durbin recursion for higher orders.[12][11]

Levinson-Durbin Recursion

The Levinson-Durbin recursion, also known as the Levinson algorithm, provides an efficient method for solving the Yule-Walker equations to obtain linear prediction coefficients from autocorrelation estimates, building on the work of Levinson for general prediction problems and Durbin for autoregressive time series estimation.[13][14] It recursively computes the coefficients for increasing predictor orders up to pp, leveraging the Toeplitz structure of the autocorrelation matrix to avoid direct matrix inversion.[5] The algorithm begins with initialization for orders 0 and 1. For order 0, the prediction error variance is E0=r(0)E_0 = r(0), where r(k)r(k) denotes the autocorrelation at lag kk. For order 1, the coefficient is a1(1)=r(1)/r(0)a_1(1) = r(1)/r(0), and the error variance is E1=r(0)(1a1(1)2)E_1 = r(0) (1 - a_1(1)^2). For higher orders m=2m = 2 to pp, the recursion proceeds as follows: first, compute the reflection coefficient
km=r(m)j=1m1am1(j)r(mj)Em1, k_m = \frac{r(m) - \sum_{j=1}^{m-1} a_{m-1}(j) r(m-j)}{E_{m-1}},
where the sum is over previous coefficients. Then, update the coefficients for j=1j = 1 to m1m-1:
am(j)=am1(j)kmam1(mj), a_m(j) = a_{m-1}(j) - k_m a_{m-1}(m-j),
set am(m)=kma_m(m) = k_m, and update the error variance:
Em=Em1(1km2). E_m = E_{m-1} (1 - k_m^2).
This process yields the pp-th order coefficients ap(j)a_p(j) upon completion.[5][15] The recursion offers several advantages, including quadratic computational complexity of O(p2)O(p^2) operations, which is significantly more efficient than the O(p3)O(p^3) required for general matrix solving methods. Additionally, the condition km<1|k_m| < 1 for all mm serves as a stability check, ensuring the resulting all-pole filter is minimum-phase and the autocorrelation matrix is positive definite. The reflection coefficients kmk_m also directly correspond to the parameters of an equivalent lattice filter structure, facilitating implementations in adaptive filtering and speech processing.[5][14] The following pseudocode outlines the procedure, assuming input autocorrelations r(0)r(0) to r(p)r(p) and output coefficients ap(1)a_p(1) to ap(p)a_p(p):
Input: r(0), r(1), ..., r(p)
Output: a(1), a(2), ..., a(p); E (final error variance)

E = r(0)
if p >= 1:
    a[1] = r(1) / E
    E = E * (1 - a[1]^2)
for m = 2 to p:
    sum = 0
    for j = 1 to m-1:
        sum += a[j] * r[m - j]
    k = (r[m] - sum) / E
    for j = 1 to m-1:
        a_new[j] = a[j] - k * a[m - j]
    a[m] = k
    for j = 1 to m-1:
        a[j] = a_new[j]
    E = E * (1 - k^2)

return a(1) to a(p), E
(Note: In practice, arrays store coefficients up to the current order, with symmetric updates for efficiency.)[5][16] As an illustrative example, consider computing 3rd-order coefficients for an autoregressive process with autocorrelations r(0)=1r(0) = 1, r(1)=0.5r(1) = 0.5, r(2)=0.2r(2) = 0.2, r(3)=0.08r(3) = 0.08.
  • Order 1: a1(1)=0.5a_1(1) = 0.5, E1=1×(10.25)=0.75E_1 = 1 \times (1 - 0.25) = 0.75.
  • Order 2: k2=0.20.5×0.50.75=0.20.250.75=0.0667k_2 = \frac{0.2 - 0.5 \times 0.5}{0.75} = \frac{0.2 - 0.25}{0.75} = -0.0667, then a2(1)=0.5(0.0667)×0.50.5333a_2(1) = 0.5 - (-0.0667) \times 0.5 \approx 0.5333, a2(2)=0.0667a_2(2) = -0.0667, E2=0.75×(10.0044)0.7467E_2 = 0.75 \times (1 - 0.0044) \approx 0.7467.
  • Order 3: k3=0.08(0.5333×0.2+(0.0667)×0.5)0.74670.080.07330.74670.0089k_3 = \frac{0.08 - (0.5333 \times 0.2 + (-0.0667) \times 0.5)}{0.7467} \approx \frac{0.08 - 0.0733}{0.7467} \approx 0.0089, then a3(1)=0.53330.0089×(0.0667)0.5340a_3(1) = 0.5333 - 0.0089 \times (-0.0667) \approx 0.5340, a3(2)=0.06670.0089×0.53330.0715a_3(2) = -0.0667 - 0.0089 \times 0.5333 \approx -0.0715, a3(3)=0.0089a_3(3) = 0.0089, E30.7467×(10.00008)0.7466E_3 \approx 0.7467 \times (1 - 0.00008) \approx 0.7466.
This step-wise evolution demonstrates how lower-order solutions inform higher ones, converging to the final coefficients [0.5340,0.0715,0.0089][0.5340, -0.0715, 0.0089].[15]

Applications

Speech Coding and Analysis

In linear predictive coding (LPC), speech is modeled as the output of an all-pole vocal tract filter excited by a quasi-periodic impulse train for voiced segments or random noise for unvoiced segments. This approach captures the spectral envelope of the speech signal efficiently, enabling compression and analysis. The core LPC equation is
s(n)=k=1paks(nk)+Gu(n), s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n),
where s(n)s(n) represents the speech sample at time nn, aka_k are the linear predictor coefficients, pp is the model order, GG is the gain factor scaling the excitation, and u(n)u(n) is the input excitation signal.[17] Parameter extraction in LPC involves framing the speech signal, applying pre-emphasis, and estimating the predictor coefficients via methods like the autocorrelation or covariance approach. Typically, predictor orders of 10 to 12 are used for sampling rates between 8 and 16 kHz to adequately model the vocal tract resonances up to the Nyquist frequency. A Hamming window is commonly applied to each frame to minimize spectral leakage and ensure stationarity assumptions hold during coefficient computation.[18][19] LPC finds extensive use in formant estimation and pitch detection within speech analysis. Formants, which correspond to vocal tract resonances, are derived from the roots of the predictor polynomial A(z)=1k=1pakzkA(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}, where the angles of the complex roots near the unit circle yield the formant frequencies. For pitch detection, inverse filtering with the LPC model removes the spectral envelope, flattening the residual signal and exposing periodicity from which the pitch period can be estimated via autocorrelation or zero-crossing analysis.[20][21] LPC forms the foundation for several speech coding standards, including the Code-Excited Linear Prediction (CELP) algorithm in Federal Standard FS1016, which operates at 4.8 kbps by quantizing LPC parameters alongside stochastic excitation codebooks, and the GSM Full Rate codec, which employs Regular Pulse Excitation with Long-Term Prediction (RPE-LTP) based on LPC at 13 kbps. These standards achieve effective compression rates in the 4-8 kbps range for variants like enhanced CELP, balancing quality and bandwidth for telephony. Quantization of LPC parameters is evaluated using the spectral distortion (SD) measure, defined as the log spectral distance between original and quantized envelopes; an average SD of approximately 1 dB is widely regarded as achieving perceptually transparent quality without excessive outliers above 4 dB.[22][23][24]

Time Series Forecasting

Linear prediction plays a central role in time series forecasting through its equivalence to autoregressive (AR) models, where the prediction coefficients directly correspond to the AR parameters. In an AR(p) model, the value at time $ t $ is expressed as $ x_t = \sum_{k=1}^p a_k x_{t-k} + e_t $, with $ e_t $ as white noise, mirroring the forward linear prediction framework. The one-step-ahead forecast is thus $ \hat{x}(t+1) = \sum_{k=1}^p a_k x_{t+1-k} $, and for h-steps ahead, it extends recursively as $ \hat{x}(n+h) = \sum_{k=1}^{p} a_k \hat{x}(n+h-k) $, relying on previously computed forecasts for lags beyond observed data.[25] Model order selection is crucial for effective AR forecasting, balancing goodness-of-fit with model parsimony to avoid overfitting. Common criteria include the Akaike Information Criterion (AIC), defined as $ \text{AIC} = -2 \log L + 2p $, where $ L $ is the likelihood and $ p $ is the number of parameters, and the Bayesian Information Criterion (BIC), $ \text{BIC} = -2 \log L + p \log n $ with sample size $ n $, both minimized to choose $ p $. Additionally, the partial autocorrelation function (PACF) aids identification: for an AR(p) process, the PACF cuts off to zero after lag p, providing a graphical tool to estimate the order by inspecting significant spikes within the 95% confidence bounds of approximately $ \pm 2 / \sqrt{n} $.[26][27] In multi-step prediction, forecast uncertainty grows due to error propagation, with the h-step-ahead forecast error variance given by $ \sigma^2 \sum_{j=0}^{h-1} \phi_j^2 $, where $ \phi_j $ (often denoted $ \psi_j $) are the impulse response coefficients from the AR model's infinite moving average representation, starting with $ \phi_0 = 1 $. This variance increases with h, reflecting accumulating uncertainty from iterated predictions, and prediction intervals widen accordingly using the normal approximation $ \hat{x}(n+h) \pm z_{\alpha/2} \sqrt{\sigma_h^2} $. Parameters are typically estimated via methods like Yule-Walker equations, as detailed elsewhere.[28] AR models via linear prediction have been applied to forecast economic indicators and stock prices. However, AR models assume stationarity, requiring constant mean and variance, which many real series violate through trends or seasonality; non-stationary cases often necessitate differencing to achieve stationarity, leading to ARIMA extensions that integrate differencing with AR components for improved long-term forecasts.[29][30]

Extensions

Multi-Channel Linear Prediction

Multi-channel linear prediction extends the scalar linear prediction framework to vector-valued signals, where the observation at each time step is a d-dimensional vector x(n)Rd\mathbf{x}(n) \in \mathbb{R}^d, such as signals captured by multiple sensors or variables in a multivariate time series. The predicted vector is given by
x^(n)=k=1pAkx(nk), \hat{\mathbf{x}}(n) = \sum_{k=1}^{p} \mathbf{A}_k \mathbf{x}(n-k),
where pp is the prediction order and each Ak\mathbf{A}_k is a d×dd \times d coefficient matrix capturing linear dependencies across channels and lags.[31] This model assumes stationarity and aims to minimize the prediction error covariance, generalizing the univariate case to account for inter-channel correlations.[32] Parameter estimation in multi-channel linear prediction relies on a generalization of the Yule-Walker equations, which relate the coefficient matrices to the autocorrelation matrices of the process. For each lag k=1,,pk = 1, \dots, p, the equations take the form
R(0)A(k)+j=1k1A(j)R(kj)=R(k), \mathbf{R}(0) \mathbf{A}(k) + \sum_{j=1}^{k-1} \mathbf{A}(j) \mathbf{R}(k-j) = -\mathbf{R}(k),
where R(k)=E[x(n)xT(nk)]\mathbf{R}(k) = \mathbb{E}[\mathbf{x}(n) \mathbf{x}^T(n-k)] is the d×dd \times d autocorrelation matrix at lag kk, and R(0)\mathbf{R}(0) is the covariance matrix. Solving this system of matrix equations yields the Ak\mathbf{A}_k, often via least-squares methods or recursive algorithms adapted from the scalar Levinson-Durbin recursion, though the multivariate version requires matrix inversions at each step.[33] In applications, multi-channel linear prediction is widely used in beamforming for sensor arrays, where the coefficient matrices model spatial correlations to enhance signals from specific directions while suppressing interference. For instance, in acoustic sensor networks, it enables distributed beamforming by predicting and subtracting reverberant or noisy components across microphones, improving signal-to-noise ratios in reverberant environments.[34] In econometrics, it underpins multivariate autoregressive (MAR) models, which forecast multiple interrelated time series, such as economic indicators, by capturing cross-variable dynamics essential for policy analysis and risk assessment.[31] The increased dimensionality introduces significant computational challenges, as estimating the coefficients involves solving linear systems of size up to pd×pdp d \times p d, leading to a complexity of O(p3d3)O(p^3 d^3) for direct methods like Cholesky decomposition of the full autocorrelation matrix.[33] To mitigate this, techniques such as canonical correlation analysis (CCA) are employed for dimensionality reduction, projecting the high-dimensional channels onto a lower-dimensional subspace that preserves predictive correlations while reducing the effective dd.[35] A representative example is two-channel audio prediction for noise cancellation, where d=2d=2 microphones capture a speech signal corrupted by correlated noise. The model estimates cross-channel coefficients Ak\mathbf{A}_k that predict the noise in one channel from the other, enabling subtraction to recover clean audio.[36]

Relation to Autoregressive Models

Linear prediction is fundamentally equivalent to autoregressive (AR) modeling, where the prediction coefficients aka_k directly correspond to the AR parameters in the model x(n)=k=1pakx(nk)+e(n)x(n) = -\sum_{k=1}^{p} a_k x(n-k) + e(n), with e(n)e(n) representing white noise innovation. This equivalence arises because the linear predictor minimizes the mean squared prediction error, which aligns with the AR process definition that expresses the current observation as a linear combination of past values plus uncorrelated noise. The power spectral density (PSD) of such an AR process is then given by P(ω)=σe2A(ejω)2P(\omega) = \frac{\sigma_e^2}{|A(e^{j\omega})|^2}, where A(z)=1+k=1pakzkA(z) = 1 + \sum_{k=1}^{p} a_k z^{-k} is the AR polynomial and σe2\sigma_e^2 is the variance of the innovation e(n)e(n). A key interpretation of this equivalence lies in the maximum entropy principle, where the linear prediction spectrum represents the maximum entropy (or flattest) spectrum consistent with the known autocorrelation function up to lag pp.[37] This approach, originally developed by Burg, ensures that the extrapolated autocorrelations beyond lag pp are chosen to maximize the spectral entropy subject to the constraints from observed data, yielding an all-pole model identical to the AR spectrum.[37] Consequently, linear prediction provides an unbiased high-resolution spectral estimate under these constraints, particularly advantageous for short data records where nonparametric methods fail. In practice, AR parameters can be fitted directly to the data using variants of Durbin's algorithm, which performs least-squares estimation without first computing explicit autocorrelations, thus avoiding potential biases from sample autocorrelation estimates. This direct method iteratively refines the coefficients by minimizing the sum of squared prediction errors over the observed time series, offering computational efficiency and numerical stability for moderate-order models. Linear prediction, as an AR method, connects to other time series models such as moving average (MA) and autoregressive moving average (ARMA) processes; for instance, a finite-order AR approximation via linear prediction can model short-memory MA processes by using sufficiently high orders to capture the infinite AR representation of an MA. In ARMA contexts, linear prediction coefficients approximate the AR component, providing a pole-only model that serves as a starting point for full ARMA estimation. Under Gaussian assumptions for the innovations, linear prediction estimators for AR parameters exhibit asymptotic efficiency, achieving the Cramér-Rao lower bound as the sample size increases, with the least-squares estimates being asymptotically equivalent to the maximum likelihood estimator. This property ensures consistent and normally distributed parameter estimates, with variance determined by the inverse of the information matrix derived from the process autocovariances.

References

User Avatar
No comments yet.