Linear prediction
Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples.
In digital signal processing, linear prediction is often called linear predictive coding (LPC) and can thus be viewed as a subset of filter theory. In system analysis, a subfield of mathematics, linear prediction can be viewed as a part of mathematical modelling or optimization.
The prediction model
The most common representation is

$$\hat{x}(n) = \sum_{i=1}^{p} a_i x(n-i),$$

where $\hat{x}(n)$ is the predicted signal value, $x(n-i)$ the previous observed values, with $1 \le i \le p$, and $a_i$ the predictor coefficients. The error generated by this estimate is

$$e(n) = x(n) - \hat{x}(n),$$

where $x(n)$ is the true signal value.
These equations are valid for all types of (one-dimensional) linear prediction. The differences are found in the way the predictor coefficients are chosen.
For multi-dimensional signals the error metric is often defined as

$$e(n) = \|x(n) - \hat{x}(n)\|,$$

where $\|\cdot\|$ is a suitably chosen vector norm. Predictions such as $\hat{x}(n)$ are routinely used within Kalman filters and smoothers to estimate current and past signal values, respectively, from noisy measurements.[1]
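The one-step prediction and its error can be sketched in a few lines of Python; the signal and coefficient values below are illustrative assumptions, not taken from the text:

```python
# One-step linear prediction x_hat(n) = sum_i a_i * x(n - i)
# and its error e(n) = x(n) - x_hat(n).

def predict(x, a, n):
    """Predict x[n] from the p previous samples; a[i] weights x[n - 1 - i]."""
    p = len(a)
    return sum(a[i] * x[n - 1 - i] for i in range(p))

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy signal (a linear ramp)
a = [2.0, -1.0]                  # illustrative order-2 predictor coefficients

x_hat = predict(x, a, 4)         # 2*x[3] - 1*x[2] = 2*4 - 3 = 5
e = x[4] - x_hat                 # prediction error; zero for this ramp
```

The coefficients (2, -1) happen to predict any linear ramp exactly, so the error vanishes here; for a general signal the coefficients must be chosen to minimize the error, as described in the next section.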
Estimating the parameters
The most common choice in optimizing the parameters $a_i$ is the root mean square criterion, which is also called the autocorrelation criterion. In this method we minimize the expected value of the squared error $E[e^2(n)]$, which yields the equation

$$\sum_{i=1}^{p} a_i R(j-i) = R(j),$$

for 1 ≤ j ≤ p, where $R$ is the autocorrelation of the signal $x(n)$, defined as

$$R(i) = E\{x(n)\,x(n-i)\},$$

and $E$ is the expected value. In the multi-dimensional case this corresponds to minimizing the $L_2$ norm.
The above equations are called the normal equations or Yule-Walker equations. In matrix form the equations can be equivalently written as

$$\mathbf{R}\mathbf{a} = \mathbf{r},$$

where the autocorrelation matrix $\mathbf{R}$ is a symmetric $p \times p$ Toeplitz matrix with elements $r_{ij} = R(i-j)$, the vector $\mathbf{r}$ is the autocorrelation vector $\mathbf{r} = [R(1), R(2), \dots, R(p)]^T$, and $\mathbf{a} = [a_1, a_2, \dots, a_p]^T$ is the parameter vector.
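As a small numeric sketch (using NumPy, with made-up autocorrelation values that are assumptions, not from the text), the normal equations $\mathbf{Ra} = \mathbf{r}$ can be formed and solved directly:

```python
import numpy as np

# Illustrative autocorrelation values R(0)..R(3); assumptions for the sketch.
R = [4.0, 2.0, 1.0, 0.5]
p = 3

# Symmetric Toeplitz autocorrelation matrix with elements R(|i - j|)
R_mat = np.array([[R[abs(i - j)] for j in range(p)] for i in range(p)])
r_vec = np.array(R[1:p + 1])

# Predictor coefficients a_1..a_p from the normal equations R a = r
a = np.linalg.solve(R_mat, r_vec)

# Sanity check: the solution satisfies the normal equations
assert np.allclose(R_mat @ a, r_vec)
```

A general-purpose solver like this costs $O(p^3)$; the Levinson recursion discussed below exploits the Toeplitz structure to do better.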
Another, more general, approach is to minimize the sum of squares of the errors defined in the form

$$e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i x(n-i) = -\sum_{i=0}^{p} a_i x(n-i),$$

where the optimisation problem searching over all $a_i$ must now be constrained with $a_0 = -1$.
On the other hand, if the mean square prediction error is constrained to be unity and the prediction error equation is included on top of the normal equations, the augmented set of equations is obtained as

$$\sum_{i=0}^{p} a_i R(j-i) = -\delta_{j,0},$$

where the index $j$ ranges from 0 to $p$, and $\mathbf{R}$ is a $(p+1) \times (p+1)$ matrix.
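A quick numeric check of the augmented system (NumPy, with made-up autocorrelation values that are assumptions): solving the normal equations, setting $a_0 = -1$, and multiplying by the $(p+1)\times(p+1)$ autocorrelation matrix leaves zeros everywhere except the $j=0$ entry, which equals minus the prediction error variance; constraining that error to unity rescales it to $-1$.

```python
import numpy as np

# Made-up autocorrelation values R(0)..R(3) (assumptions, not from the text)
R = [4.0, 2.0, 1.0, 0.5]
p = 3

R_mat = np.array([[R[abs(i - j)] for j in range(p)] for i in range(p)])
a = np.linalg.solve(R_mat, np.array(R[1:]))   # normal-equation solution

# Augment with a_0 = -1 and form the (p+1) x (p+1) autocorrelation matrix
a_aug = np.concatenate(([-1.0], a))
R_aug = np.array([[R[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])

lhs = R_aug @ a_aug
E_p = R[0] - a @ np.array(R[1:])   # minimum mean squared prediction error

# All entries vanish except j = 0, which equals -E_p
assert np.allclose(lhs, [-E_p, 0.0, 0.0, 0.0])
```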
Specification of the parameters of the linear predictor is a wide topic and a large number of other approaches have been proposed. In fact, the autocorrelation method is the most common[2] and it is used, for example, for speech coding in the GSM standard.
Solution of the matrix equation $\mathbf{Ra} = \mathbf{r}$ is computationally a relatively expensive process. Gaussian elimination for matrix inversion is probably the oldest solution, but this approach does not efficiently use the symmetry of $\mathbf{R}$. A faster algorithm is the Levinson recursion proposed by Norman Levinson in 1947, which recursively calculates the solution. In particular, the autocorrelation equations above may be more efficiently solved by the Durbin algorithm.[3]
In 1986, Philippe Delsarte and Y.V. Genin proposed an improvement to this algorithm called the split Levinson recursion, which requires about half the number of multiplications and divisions.[4] It uses a special symmetrical property of parameter vectors on subsequent recursion levels. That is, calculations for the optimal predictor containing $p$ terms make use of similar calculations for the optimal predictor containing $p-1$ terms.
Another way of identifying model parameters is to iteratively calculate state estimates using Kalman filters and obtaining maximum likelihood estimates within expectation–maximization algorithms.
For equally spaced values, a polynomial interpolation is a linear combination of the known values. If the discrete-time signal is estimated to obey a polynomial of degree $p-1$, then the predictor coefficients $a_i$ are given by the corresponding row of the triangle of binomial transform coefficients. This estimate might be suitable for a slowly varying signal with low noise. The predictions for the first few values of $p$ are

$$\begin{aligned}
p=1:\quad & \hat{x}(n) = 1\,x(n-1)\\
p=2:\quad & \hat{x}(n) = 2\,x(n-1) - 1\,x(n-2)\\
p=3:\quad & \hat{x}(n) = 3\,x(n-1) - 3\,x(n-2) + 1\,x(n-3)\\
p=4:\quad & \hat{x}(n) = 4\,x(n-1) - 6\,x(n-2) + 4\,x(n-3) - 1\,x(n-4)
\end{aligned}$$
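This pattern is easy to verify in a short Python sketch: the $p=3$ predictor with binomial-transform coefficients (3, -3, 1) reproduces any polynomial of degree at most 2 exactly (the quadratic test sequence below is an illustrative choice):

```python
# Order-3 polynomial predictor: x_hat(n) = 3x(n-1) - 3x(n-2) + x(n-3).
# It is exact for any signal that follows a polynomial of degree <= 2,
# because the third finite difference of such a signal is zero.

def poly_predict(x, n):
    return 3 * x[n - 1] - 3 * x[n - 2] + x[n - 3]

# Quadratic test sequence x(n) = n^2 + n + 1 (illustrative choice)
x = [n * n + n + 1 for n in range(10)]

errors = [x[n] - poly_predict(x, n) for n in range(3, 10)]
# every prediction error is zero for this quadratic signal
```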
References
[edit]- ^ "Kalman Filter - an overview | ScienceDirect Topics". www.sciencedirect.com. Retrieved 2022-06-24.
- ^ "Linear Prediction - an overview | ScienceDirect Topics". www.sciencedirect.com. Retrieved 2022-06-24.
- ^ Ramirez, M. A. (2008). "A Levinson Algorithm Based on an Isometric Transformation of Durbin's" (PDF). IEEE Signal Processing Letters. 15: 99–102. Bibcode:2008ISPL...15...99R. doi:10.1109/LSP.2007.910319. S2CID 18906207.
- ^ Delsarte, P.; Genin, Y. V. (1986). "The split Levinson algorithm". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-34 (3): 470–478.
Further reading
- Hayes, M. H. (1996). Statistical Digital Signal Processing and Modeling. New York: J. Wiley & Sons. ISBN 978-0471594314.
- Levinson, N. (1947). "The Wiener RMS (root mean square) error criterion in filter design and prediction". Journal of Mathematics and Physics. 25 (4): 261–278. doi:10.1002/sapm1946251261.
- Makhoul, J. (1975). "Linear prediction: A tutorial review". Proceedings of the IEEE. 63 (5): 561–580. doi:10.1109/PROC.1975.9792.
- Yule, G. U. (1927). "On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer's Sunspot Numbers". Phil. Trans. Roy. Soc. A. 226 (636–646): 267–298. Bibcode:1927RSPTA.226..267Y. doi:10.1098/rsta.1927.0007. JSTOR 91170.
Fundamentals
Definition and Basic Concept
Linear prediction is a fundamental technique in signal processing for estimating the value of a discrete-time signal at a given time instant based on a linear combination of its previous values. Formally, the predicted sample $\hat{x}(n)$ is approximated as

$$\hat{x}(n) = \sum_{i=1}^{p} a_i x(n-i).$$

Historical Development
The roots of linear prediction trace back to 19th-century advancements in time series analysis, where statisticians like Adolphe Quetelet, Francis Galton, and Karl Pearson developed foundational methods for modeling sequential data and regression, laying the groundwork for predictive techniques in stochastic processes.[6] A pivotal early contribution came in 1927 when George Udny Yule applied autoregressive models to analyze and predict sunspot data, introducing a method to investigate periodicities in disturbed series that marked the first explicit use of linear prediction in time series forecasting. This work demonstrated how past values could linearly predict future ones, influencing subsequent statistical modeling of temporal dependencies. In 1938, Herman Wold advanced the theoretical underpinnings in his doctoral thesis, decomposing stationary stochastic processes into deterministic and purely stochastic components, which provided a rigorous link between time series decomposition and optimal linear prediction theory. Wold's framework emphasized the role of innovation processes in prediction, establishing key principles for representing stochastic phenomena as sums of predictable and unpredictable parts. Post-World War II, linear prediction gained prominence in speech signal processing, with significant developments at Bell Laboratories in the 1960s and 1970s. Bishnu S. Atal and colleagues explored linear predictive coding (LPC) for speech analysis and synthesis, building on earlier statistical approaches to model vocal tract resonances efficiently.[7] Independently, Fumitada Itakura at NTT Laboratories in 1967 introduced maximum likelihood methods for LPC parameter estimation, adapting prediction techniques to spectral envelope representation in audio signals. 
John Burg's 1967 presentation on maximum entropy spectral analysis further propelled the field by proposing an efficient method for estimating prediction error and coefficients, which avoided assumptions about unobserved autocorrelation lags and improved resolution in short data records, influencing algorithms in digital signal processing throughout the 1970s.[8] A key milestone was John D. Markel's 1971 paper on LPC applications, which detailed formant trajectory estimation and synthesis techniques, enabling practical speech processing systems. By the 1980s, linear prediction was integrated into telecommunications standards, notably the GSM full-rate speech codec adopted in 1990, which employed regular pulse excitation with long-term prediction (RPE-LTP) based on LPC principles to achieve efficient low-bitrate voice compression for mobile networks. This adoption marked linear prediction's transition from theoretical tool to widespread engineering application in digital communications.

Mathematical Formulation
Forward Linear Prediction Model
The forward linear prediction model estimates the current value of a discrete-time signal using a linear combination of its previous samples, enabling causal prediction of future samples from past observations. This approach assumes the signal follows an autoregressive process where each sample is predictable from earlier ones, minimizing the mean-squared prediction error through optimal coefficient selection. The model is foundational in signal processing for applications requiring efficient representation of correlated data, such as speech and time series analysis.[9] The predicted value for a $p$-th order forward predictor is expressed as

$$\hat{x}(n) = \sum_{i=1}^{p} a_p(i)\, x(n-i).$$

Backward Linear Prediction Model
The backward linear prediction model provides an anti-causal approach to estimating past samples of a signal from subsequent observations within a finite window, contrasting with the causal nature of forward prediction. For a signal $x(n)$ and a prediction order $p$, the predicted value of the past sample $x(n-p)$ is given by

$$\hat{x}(n-p) = \sum_{j=1}^{p} b_p(j)\, x(n-p+j).$$

Parameter Estimation
Yule-Walker Equations
The Yule-Walker equations provide a set of linear normal equations for estimating the coefficients of a linear predictor by minimizing the mean squared prediction error under the orthogonality principle. For a forward linear prediction error $e_p(n) = x(n) - \sum_{j=1}^p a_p(j) x(n-j)$, the principle requires that the error be orthogonal to the data used for prediction, yielding $E[e_p(n) x(n-k)] = 0$ for $k = 1, 2, \dots, p$. Substituting the error expression gives the system

$$\sum_{j=1}^{p} a_p(j)\, r(k-j) = r(k), \qquad k = 1, 2, \dots, p.$$

For $p = 2$ this reads

$$\begin{bmatrix} r(0) & r(1) \\ r(1) & r(0) \end{bmatrix} \begin{bmatrix} a_2(1) \\ a_2(2) \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \end{bmatrix},$$

yielding $a_2(1) = \frac{r(1)(r(0) - r(2))}{r(0)^2 - r(1)^2}$ and $a_2(2) = \frac{r(0) r(2) - r(1)^2}{r(0)^2 - r(1)^2}$. These ratios directly relate the coefficients to the autocorrelations, assuming $r(0) > 0$.[10] The derivation assumes the signal is wide-sense stationary, ensuring the autocorrelation $r(k)$ exists and depends only on the lag $k$. For non-stationary signals, the equations can be applied approximately by windowing the data to create stationary frames, estimating $r(k)$ within each window via time-averaged correlations. Direct solution of the matrix equation requires $O(p^3)$ operations via inversion or factorization, which motivates efficient recursive algorithms like the Levinson-Durbin recursion for higher orders.[12][11]

Levinson-Durbin Recursion
The Levinson-Durbin recursion, also known as the Levinson algorithm, provides an efficient method for solving the Yule-Walker equations to obtain linear prediction coefficients from autocorrelation estimates, building on the work of Levinson for general prediction problems and Durbin for autoregressive time series estimation.[13][14] It recursively computes the coefficients for increasing predictor orders up to $p$, leveraging the Toeplitz structure of the autocorrelation matrix to avoid direct matrix inversion.[5]

The algorithm begins with initialization for orders 0 and 1. For order 0, the prediction error variance is $E_0 = r(0)$, where $r(k)$ denotes the autocorrelation at lag $k$. For order 1, the coefficient is $a_1(1) = r(1)/r(0)$, and the error variance is $E_1 = E_0 (1 - a_1(1)^2)$. For higher orders $m = 2$ to $p$, the recursion proceeds as follows: first, compute the reflection coefficient

$$k_m = \frac{r(m) - \sum_{j=1}^{m-1} a_{m-1}(j)\, r(m-j)}{E_{m-1}},$$

then update the coefficients as $a_m(j) = a_{m-1}(j) - k_m\, a_{m-1}(m-j)$ for $j = 1, \dots, m-1$, set $a_m(m) = k_m$, and update the error variance as $E_m = E_{m-1}(1 - k_m^2)$. In pseudocode:

Input: r(0), r(1), ..., r(p)
Output: a(1), a(2), ..., a(p); E (final error variance)

E = r(0)
if p >= 1:
    a[1] = r(1) / E
    E = E * (1 - a[1]^2)
for m = 2 to p:
    sum = 0
    for j = 1 to m-1:
        sum += a[j] * r[m - j]
    k = (r[m] - sum) / E
    for j = 1 to m-1:
        a_new[j] = a[j] - k * a[m - j]
    a[m] = k
    for j = 1 to m-1:
        a[j] = a_new[j]
    E = E * (1 - k^2)
return a(1) to a(p), E
(Note: In practice, arrays store coefficients up to the current order, with symmetric updates for efficiency.)[5][16]
As an illustrative example, consider computing the 3rd-order coefficients from autocorrelation lags $r(0), r(1), r(2), r(3)$ of an autoregressive process. The recursion unfolds order by order:

- Order 1: $a_1(1) = r(1)/r(0)$, $E_1 = r(0)\,(1 - a_1(1)^2)$.
- Order 2: $k_2 = \frac{r(2) - a_1(1)\, r(1)}{E_1}$, then $a_2(1) = a_1(1) - k_2\, a_1(1)$, $a_2(2) = k_2$, $E_2 = E_1 (1 - k_2^2)$.
- Order 3: $k_3 = \frac{r(3) - a_2(1)\, r(2) - a_2(2)\, r(1)}{E_2}$, then $a_3(1) = a_2(1) - k_3\, a_2(2)$, $a_3(2) = a_2(2) - k_3\, a_2(1)$, $a_3(3) = k_3$, $E_3 = E_2 (1 - k_3^2)$.
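The recursion can be implemented compactly and cross-checked against a direct solve of the Yule-Walker system; the autocorrelation values below are illustrative assumptions, not values from the text:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations for predictor coefficients a[1..p]
    via the Levinson-Durbin recursion; returns (a, E) where E is the
    final prediction error variance."""
    a = np.zeros(p + 1)        # a[0] unused; a[1..m] holds current order
    E = r[0]
    for m in range(1, p + 1):
        # reflection coefficient k_m = (r(m) - sum_j a(j) r(m-j)) / E_{m-1}
        k = (r[m] - np.dot(a[1:m], r[m - 1:0:-1])) / E
        a_new = a.copy()
        a_new[1:m] = a[1:m] - k * a[m - 1:0:-1]   # a_m(j) = a(j) - k a(m-j)
        a_new[m] = k
        a = a_new
        E *= 1 - k * k                             # E_m = E_{m-1}(1 - k^2)
    return a[1:], E

# Illustrative autocorrelation lags (assumptions)
r = np.array([4.0, 2.0, 1.0, 0.5])
a, E = levinson_durbin(r, 3)

# Cross-check against a direct solve of the Toeplitz normal equations
R_mat = np.array([[r[abs(i - j)] for j in range(3)] for i in range(3)])
assert np.allclose(a, np.linalg.solve(R_mat, r[1:]))
```

The recursion touches each autocorrelation lag only through inner products of length at most $p$, giving the $O(p^2)$ cost that makes it preferable to $O(p^3)$ direct solvers for high predictor orders.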