Recursive least squares filter

from Wikipedia

Recursive least squares (RLS) is an adaptive filter algorithm that recursively finds the coefficients that minimize a weighted linear least squares cost function relating to the input signals. This approach is in contrast to other algorithms such as the least mean squares (LMS) that aim to reduce the mean square error. In the derivation of the RLS, the input signals are considered deterministic, while for the LMS and similar algorithms they are considered stochastic. Compared to most of its competitors, the RLS exhibits extremely fast convergence. However, this benefit comes at the cost of high computational complexity.

Motivation


RLS was discovered by Gauss but lay unused or ignored until 1950 when Plackett rediscovered the original work of Gauss from 1821. In general, the RLS can be used to solve any problem that can be solved by adaptive filters. For example, suppose that a signal d(n) is transmitted over an echoey, noisy channel that causes it to be received as

x(n) = \sum_{k=0}^{q} b_n(k)\, d(n-k) + v(n)

where v(n) represents additive noise. The intent of the RLS filter is to recover the desired signal d(n) by use of a (p+1)-tap FIR filter, \mathbf{w}. The estimate of the recovered desired signal is

\hat{d}(n) = \sum_{k=0}^{p} w(k)\, x(n-k) = \mathbf{w}^T \mathbf{x}_n

where \mathbf{x}_n = [x(n), x(n-1), \ldots, x(n-p)]^T is the column vector containing the p+1 most recent samples of x(n).

The goal is to estimate the parameters of the filter \mathbf{w}, and at each time n we refer to the current estimate as \mathbf{w}_n and the adapted least-squares estimate by \mathbf{w}_{n+1}. \mathbf{w}_n is also a column vector, and the transpose, \mathbf{w}_n^T, is a row vector. The matrix product \mathbf{w}_n^T \mathbf{x}_n (which is the dot product of \mathbf{w}_n and \mathbf{x}_n) is \hat{d}(n), a scalar. The estimate is "good" if d(n) - \hat{d}(n) is small in magnitude in some least-squares sense.

As time evolves, it is desired to avoid completely redoing the least squares algorithm to find the new estimate for \mathbf{w}_{n+1} in terms of \mathbf{w}_n.

The benefit of the RLS algorithm is that there is no need to invert the correlation matrix directly at each step, thereby saving computational cost. Another advantage is that it provides intuition behind such results as the Kalman filter.
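As a concrete illustration of the FIR estimate \hat{d}(n) = \mathbf{w}^T \mathbf{x}_n, the following minimal NumPy sketch (all signal values and weights are illustrative, not from the text) forms the regressor from the p + 1 most recent samples and takes the dot product:

```python
import numpy as np

p = 3                                            # filter order (p + 1 taps)
w = np.array([0.5, -0.2, 0.1, 0.05])             # illustrative tap weights
x = np.array([1.0, 0.3, -0.4, 0.2, 0.9, -0.1])   # received samples x(0..5)

n = 5                                            # current time index
x_n = x[n - p : n + 1][::-1]                     # [x(n), x(n-1), ..., x(n-p)]
d_hat = w @ x_n                                  # scalar estimate of d(n)
```

The reversal makes the newest sample line up with w(0), matching the convention \hat{d}(n) = \sum_k w(k) x(n-k).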

Discussion


The idea behind RLS filters is to minimize a cost function C(\mathbf{w}_n) by appropriately selecting the filter coefficients \mathbf{w}_n, updating the filter as new data arrives. The error signal e(n) and desired signal d(n) are defined in the negative feedback diagram below:

The error implicitly depends on the filter coefficients through the estimate \hat{d}(n):

e(n) = d(n) - \hat{d}(n)

The weighted least squares error function C—the cost function we desire to minimize—being a function of e(n) is therefore also dependent on the filter coefficients:

C(\mathbf{w}_n) = \sum_{i=0}^{n} \lambda^{n-i} e^2(i)

where 0 < \lambda \le 1 is the "forgetting factor" which gives exponentially less weight to older error samples.

The cost function is minimized by taking the partial derivatives for all entries k of the coefficient vector \mathbf{w}_n and setting the results to zero:

\frac{\partial C(\mathbf{w}_n)}{\partial w_n(k)} = -\sum_{i=0}^{n} 2\lambda^{n-i} e(i)\, x(i-k) = 0, \qquad k = 0, 1, \ldots, p

Next, replace e(i) with the definition of the error signal:

\sum_{i=0}^{n} \lambda^{n-i} \left[ d(i) - \sum_{\ell=0}^{p} w_n(\ell)\, x(i-\ell) \right] x(i-k) = 0, \qquad k = 0, 1, \ldots, p

Rearranging the equation yields

\sum_{\ell=0}^{p} w_n(\ell) \left[ \sum_{i=0}^{n} \lambda^{n-i} x(i-\ell)\, x(i-k) \right] = \sum_{i=0}^{n} \lambda^{n-i} d(i)\, x(i-k), \qquad k = 0, 1, \ldots, p

This form can be expressed in terms of matrices

\mathbf{R}_x(n)\, \mathbf{w}_n = \mathbf{r}_{dx}(n)

where \mathbf{R}_x(n) is the weighted sample covariance matrix for x(n), and \mathbf{r}_{dx}(n) is the equivalent estimate for the cross-covariance between d(n) and x(n). Based on this expression we find the coefficients which minimize the cost function as

\mathbf{w}_n = \mathbf{R}_x^{-1}(n)\, \mathbf{r}_{dx}(n)

This is the main result of the discussion.
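The closed-form solution can be checked numerically by building the weighted sample covariance matrix and cross-covariance vector directly and solving the normal equations. The sketch below uses synthetic, noise-free data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p, N, lam = 2, 200, 0.99                 # filter order, samples, forgetting factor

w_true = np.array([0.7, -0.3, 0.2])      # unknown system to identify (illustrative)
x = rng.standard_normal(N)

# Regressor rows x(i) = [x(i), x(i-1), ..., x(i-p)], zero-padded before time 0
X = np.array([[x[i - k] if i - k >= 0 else 0.0 for k in range(p + 1)]
              for i in range(N)])
d = X @ w_true                           # noise-free desired signal

# Weighted sample covariance R_x(n) and cross-covariance r_dx(n) at n = N - 1
weights = lam ** (N - 1 - np.arange(N))
R = (X * weights[:, None]).T @ X
r_dx = X.T @ (weights * d)

w_n = np.linalg.solve(R, r_dx)           # minimizer of the weighted cost
```

With noise-free data the recovered coefficients coincide with the generating system, confirming that the normal equations are the right stationarity condition.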

Choosing λ


The smaller \lambda is, the smaller the contribution of previous samples to the covariance matrix. This makes the filter more sensitive to recent samples, which means more fluctuations in the filter coefficients. The case \lambda = 1 is referred to as the growing window RLS algorithm. In practice, \lambda is usually chosen between 0.98 and 1.[1] By using type-II maximum likelihood estimation the optimal \lambda can be estimated from a set of data.[2]

Recursive algorithm


The discussion resulted in a single equation to determine a coefficient vector which minimizes the cost function. In this section we want to derive a recursive solution of the form

\mathbf{w}_n = \mathbf{w}_{n-1} + \Delta\mathbf{w}_{n-1}

where \Delta\mathbf{w}_{n-1} is a correction factor at time n-1. We start the derivation of the recursive algorithm by expressing the cross covariance \mathbf{r}_{dx}(n) in terms of \mathbf{r}_{dx}(n-1):

\mathbf{r}_{dx}(n) = \sum_{i=0}^{n} \lambda^{n-i} d(i)\, \mathbf{x}(i) = \lambda\, \mathbf{r}_{dx}(n-1) + d(n)\, \mathbf{x}(n)

where \mathbf{x}(i) is the (p+1)-dimensional data vector

\mathbf{x}(i) = [x(i), x(i-1), \ldots, x(i-p)]^T

Similarly we express \mathbf{R}_x(n) in terms of \mathbf{R}_x(n-1) by

\mathbf{R}_x(n) = \lambda\, \mathbf{R}_x(n-1) + \mathbf{x}(n)\, \mathbf{x}^T(n)

In order to generate the coefficient vector we are interested in the inverse of the deterministic auto-covariance matrix. For that task the Woodbury matrix identity comes in handy. With

A = \lambda \mathbf{R}_x(n-1) is (p+1)-by-(p+1)
U = \mathbf{x}(n) is (p+1)-by-1 (column vector)
V = \mathbf{x}^T(n) is 1-by-(p+1) (row vector)
C = I_1 is the 1-by-1 identity matrix

the Woodbury matrix identity follows

\mathbf{R}_x^{-1}(n) = \lambda^{-1}\mathbf{R}_x^{-1}(n-1) - \lambda^{-1}\mathbf{R}_x^{-1}(n-1)\,\mathbf{x}(n) \left\{ 1 + \mathbf{x}^T(n)\,\lambda^{-1}\mathbf{R}_x^{-1}(n-1)\,\mathbf{x}(n) \right\}^{-1} \mathbf{x}^T(n)\,\lambda^{-1}\mathbf{R}_x^{-1}(n-1)

To come in line with the standard literature, we define

\mathbf{P}(n) = \mathbf{R}_x^{-1}(n) = \lambda^{-1}\mathbf{P}(n-1) - \mathbf{g}(n)\,\mathbf{x}^T(n)\,\lambda^{-1}\mathbf{P}(n-1)

where the gain vector \mathbf{g}(n) is

\mathbf{g}(n) = \lambda^{-1}\mathbf{P}(n-1)\,\mathbf{x}(n) \left\{ 1 + \mathbf{x}^T(n)\,\lambda^{-1}\mathbf{P}(n-1)\,\mathbf{x}(n) \right\}^{-1}

Before we move on, it is necessary to bring \mathbf{g}(n) into another form:

\mathbf{g}(n) \left\{ 1 + \mathbf{x}^T(n)\,\lambda^{-1}\mathbf{P}(n-1)\,\mathbf{x}(n) \right\} = \lambda^{-1}\mathbf{P}(n-1)\,\mathbf{x}(n)

Subtracting the second term on the left side yields

\mathbf{g}(n) = \lambda^{-1}\mathbf{P}(n-1)\,\mathbf{x}(n) - \mathbf{g}(n)\,\mathbf{x}^T(n)\,\lambda^{-1}\mathbf{P}(n-1)\,\mathbf{x}(n) = \left[ \lambda^{-1}\mathbf{P}(n-1) - \mathbf{g}(n)\,\mathbf{x}^T(n)\,\lambda^{-1}\mathbf{P}(n-1) \right] \mathbf{x}(n)

With the recursive definition of \mathbf{P}(n) the desired form follows

\mathbf{g}(n) = \mathbf{P}(n)\,\mathbf{x}(n)

Now we are ready to complete the recursion. As discussed,

\mathbf{w}_n = \mathbf{P}(n)\,\mathbf{r}_{dx}(n) = \lambda\,\mathbf{P}(n)\,\mathbf{r}_{dx}(n-1) + d(n)\,\mathbf{P}(n)\,\mathbf{x}(n)

The second step follows from the recursive definition of \mathbf{r}_{dx}(n). Next we incorporate the recursive definition of \mathbf{P}(n) together with the alternate form of \mathbf{g}(n) and get

\mathbf{w}_n = \lambda \left[ \lambda^{-1}\mathbf{P}(n-1) - \mathbf{g}(n)\,\mathbf{x}^T(n)\,\lambda^{-1}\mathbf{P}(n-1) \right] \mathbf{r}_{dx}(n-1) + d(n)\,\mathbf{g}(n) = \mathbf{w}_{n-1} - \mathbf{g}(n)\,\mathbf{x}^T(n)\,\mathbf{w}_{n-1} + d(n)\,\mathbf{g}(n)

With \mathbf{w}_{n-1} = \mathbf{P}(n-1)\,\mathbf{r}_{dx}(n-1) we arrive at the update equation

\mathbf{w}_n = \mathbf{w}_{n-1} + \mathbf{g}(n) \left[ d(n) - \mathbf{x}^T(n)\,\mathbf{w}_{n-1} \right] = \mathbf{w}_{n-1} + \mathbf{g}(n)\,\alpha(n)

where \alpha(n) = d(n) - \mathbf{x}^T(n)\,\mathbf{w}_{n-1} is the a priori error. Compare this with the a posteriori error, the error calculated after the filter is updated:

e(n) = d(n) - \mathbf{x}^T(n)\,\mathbf{w}_n

That means we found the correction factor

\Delta\mathbf{w}_{n-1} = \mathbf{g}(n)\,\alpha(n)

This intuitively satisfying result indicates that the correction factor is directly proportional to both the error and the gain vector, which controls how much sensitivity is desired, through the weighting factor, \lambda.

RLS algorithm summary


The RLS algorithm for a p-th order RLS filter can be summarized as

Parameters: p = filter order
            \lambda = forgetting factor
            \delta = value to initialize \mathbf{P}(0)
Initialization: \mathbf{w}(0) = 0,
                x(k) = 0 for k = -p, \ldots, -1,
                \mathbf{P}(0) = \delta^{-1} I,
                where I is the identity matrix of rank p + 1
Computation: For n = 1, 2, \ldots

  \mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-p)]^T
  \alpha(n) = d(n) - \mathbf{x}^T(n)\,\mathbf{w}(n-1)
  \mathbf{g}(n) = \mathbf{P}(n-1)\,\mathbf{x}(n) \left\{ \lambda + \mathbf{x}^T(n)\,\mathbf{P}(n-1)\,\mathbf{x}(n) \right\}^{-1}
  \mathbf{P}(n) = \lambda^{-1}\mathbf{P}(n-1) - \mathbf{g}(n)\,\mathbf{x}^T(n)\,\lambda^{-1}\mathbf{P}(n-1)
  \mathbf{w}(n) = \mathbf{w}(n-1) + \alpha(n)\,\mathbf{g}(n).

The recursion for \mathbf{P} follows an algebraic Riccati equation and thus draws parallels to the Kalman filter.[3]
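The summary above translates almost line for line into code. This is a minimal NumPy sketch on synthetic, noise-free data (all names and values illustrative), ending with the recursive estimate of the coefficient vector:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N, lam, delta = 3, 300, 0.99, 1e-2

w_true = np.array([0.5, -0.4, 0.3, -0.2])   # illustrative system to identify
x = rng.standard_normal(N)

def taps(i):
    """Regressor x(i) = [x(i), x(i-1), ..., x(i-p)]^T with zero prepadding."""
    return np.array([x[i - k] if i - k >= 0 else 0.0 for k in range(p + 1)])

d = np.array([taps(i) @ w_true for i in range(N)])

# Initialization
w = np.zeros(p + 1)
P = np.eye(p + 1) / delta

# Computation
for n in range(N):
    xn = taps(n)
    alpha = d[n] - xn @ w                   # a priori error
    g = P @ xn / (lam + xn @ P @ xn)        # gain vector
    P = (P - np.outer(g, xn) @ P) / lam     # Riccati-type update of P
    w = w + alpha * g                       # coefficient update
```

After a few hundred samples of clean data the recursive estimate settles onto the generating coefficients, matching the batch least-squares solution it reproduces.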

Lattice recursive least squares filter (LRLS)


The lattice recursive least squares adaptive filter is related to the standard RLS except that it requires fewer arithmetic operations (order N).[4] It offers additional advantages over conventional LMS algorithms such as faster convergence rates, modular structure, and insensitivity to variations in the eigenvalue spread of the input correlation matrix. The LRLS algorithm described is based on a posteriori errors and includes the normalized form. The derivation is similar to the standard RLS algorithm and is based on the definition of d(k). In the forward prediction case, we have d(k) = x(k), with the input signal x(k-1) as the most up-to-date sample. The backward prediction case is d(k) = x(k-i-1), where i is the index of the sample in the past we want to predict, and the input signal x(k) is the most recent sample.[5]

Parameter summary

\kappa_f(k,i) is the forward reflection coefficient
\kappa_b(k,i) is the backward reflection coefficient
\varepsilon_f(k,i) represents the instantaneous a posteriori forward prediction error
\varepsilon_b(k,i) represents the instantaneous a posteriori backward prediction error
\xi^d_{b_{\min}}(k,i) is the minimum least-squares backward prediction error
\xi^d_{f_{\min}}(k,i) is the minimum least-squares forward prediction error
\gamma(k,i) is a conversion factor between a priori and a posteriori errors
v_i(k) are the feedforward multiplier coefficients.
\varepsilon is a small positive constant that can be 0.01

LRLS algorithm summary


The algorithm for a LRLS filter can be summarized as

Initialization:
For
  (if for )
 
 
 
End
Computation:
For
 
 
 
 
 For
 
 
 
 
 
 
 
 
 Feedforward filtering
 
 
 
 End
End

Normalized lattice recursive least squares filter (NLRLS)


The normalized form of the LRLS has fewer recursions and variables. It can be calculated by applying a normalization to the internal variables of the algorithm which will keep their magnitude bounded by one. This is generally not used in real-time applications because of the number of division and square-root operations which comes with a high computational load.

NLRLS algorithm summary


The algorithm for a NLRLS filter can be summarized as

Initialization:
For
  (if for )
 
 
End
 
Computation:
For
  (Input signal energy)
  (Reference signal energy)
 
 
 For
 
 
 
 Feedforward filter
 
 
 End
End

from Grokipedia
The recursive least squares (RLS) filter is an adaptive filtering algorithm used in digital signal processing to recursively estimate the coefficients of a linear filter that minimize a cost function based on incoming data. It operates by updating the filter weights at each time step to track changes in the input signal statistics, employing a forgetting factor \lambda (typically between 0.95 and 0.995) to exponentially weight recent observations more heavily than older ones, thereby enabling adaptation to time-varying environments. In its standard formulation, the RLS algorithm solves the problem of minimizing the cumulative squared error E(n) = \sum_{i=1}^n \lambda^{n-i} |d(i) - \mathbf{w}^T(n) \mathbf{u}(i)|^2, where d(i) is the desired signal, \mathbf{u}(i) is the input vector, and \mathbf{w}(n) is the weight vector at time n. The recursive updates leverage the matrix inversion lemma (also known as the Sherman-Morrison-Woodbury formula) to efficiently compute the inverse of the input correlation matrix \mathbf{P}(n) = [\sum_{i=1}^n \lambda^{n-i} \mathbf{u}(i) \mathbf{u}^T(i)]^{-1}, avoiding full matrix recomputation at each step. This results in a gain vector \mathbf{k}(n) that adjusts the weights as \mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n) e(n), where e(n) is the prediction error, with a computational complexity of O(M^2) per update for an M-tap filter. RLS filters exhibit significantly faster convergence than gradient-based methods like the least mean squares (LMS) algorithm, often achieving near-optimal performance in approximately 2M iterations, and they demonstrate zero steady-state misadjustment in stationary conditions. Their equivalence to a Kalman filter under specific state-space models further underscores their optimality in linear estimation tasks. Common applications include adaptive equalization in communications, echo cancellation, noise suppression, and system identification, where rapid adaptation to non-stationary signals is essential.

Introduction and Background

Overview and Motivation

The recursive least squares (RLS) filter is an adaptive filter algorithm that recursively updates the coefficients of a linear filter to minimize a cost function, enabling real-time parameter estimation in dynamic systems. This approach processes incoming data sequentially, making it particularly effective for applications where system parameters evolve over time, such as in communications and control systems. Developed primarily in the 1970s and 1980s, RLS emerged as an enhancement to batch least squares methods, which require recomputing solutions from all historical data and are impractical for online applications. Key contributions came from researchers like Lennart Ljung and Torsten Söderström, who formalized recursive identification techniques in system modeling, providing a theoretical foundation for adaptive estimation. Early parallels to RLS concepts appeared in the 1960s through Kalman filtering for state estimation, but the distinct formulation of RLS for adaptive filtering gained prominence around 1980 with efficient implementations for signal processing tasks. The primary motivation for RLS lies in its ability to address the rigidity of static least squares estimation by incorporating a forgetting factor, typically denoted as λ (where 0 < λ ≤ 1), which exponentially discounts older data to emphasize recent observations. This mechanism allows the filter to track time-varying conditions, such as fluctuating noise or channel distortions, without accumulating outdated information that could degrade performance. For instance, in acoustic echo cancellation, RLS adapts to changing room impulse responses and signal paths during a conversation, ensuring effective suppression of echoes in real-time telephony systems. Compared to simpler methods like the least mean squares (LMS) algorithm, RLS achieves faster convergence, though at higher computational cost, making it ideal for scenarios demanding precise tracking.

Mathematical Prerequisites

The recursive least squares (RLS) filter relies on foundational concepts from linear algebra, particularly in the realm of signal estimation, where signals are represented as vectors in finite-dimensional vector spaces. In this context, an inner product defines the geometry of the space, enabling the measurement of similarity between signal vectors through the dot product, which for complex-valued signals generalizes to the Hermitian inner product \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^H \mathbf{v}. Matrix inverses play a crucial role in solving systems of linear equations that arise in estimation problems, assuming the matrix is invertible, which in practice requires it to be positive definite or full rank for signal data matrices. Projection operators are essential for finding the best linear approximation of a desired signal onto the subspace spanned by input regressors, minimizing the estimation error in a mean-square sense. Least squares estimation addresses the problem of finding the optimal weights \mathbf{w} that minimize the squared error between observed data and model predictions. Consider a batch of N observations where the desired signal \mathbf{y} (an N \times 1 vector) is modeled as \mathbf{y} = \mathbf{X} \mathbf{w} + \mathbf{v}, with \mathbf{X} an N \times M data matrix of input regressors, \mathbf{w} an M \times 1 weight vector, and \mathbf{v} noise. The cost function to minimize is J = \|\mathbf{X} \mathbf{w} - \mathbf{y}\|^2 = (\mathbf{X} \mathbf{w} - \mathbf{y})^T (\mathbf{X} \mathbf{w} - \mathbf{y}). To derive the solution, take the derivative with respect to \mathbf{w} and set it to zero: \frac{\partial J}{\partial \mathbf{w}} = 2 \mathbf{X}^T \mathbf{X} \mathbf{w} - 2 \mathbf{X}^T \mathbf{y} = 0, yielding the normal equations \mathbf{X}^T \mathbf{X} \mathbf{w} = \mathbf{X}^T \mathbf{y}.
Assuming \mathbf{X}^T \mathbf{X} is invertible, the batch least squares solution is \mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}, which projects \mathbf{y} onto the column space of \mathbf{X}. This solution is unbiased and has minimum variance under Gaussian noise assumptions. An extension to weighted least squares incorporates a positive definite diagonal weighting matrix \mathbf{W} to emphasize certain data points, such as more recent observations in sequential processing. The weighted cost function becomes J = (\mathbf{X} \mathbf{w} - \mathbf{y})^T \mathbf{W} (\mathbf{X} \mathbf{w} - \mathbf{y}), and the solution follows similarly from the normal equations \mathbf{X}^T \mathbf{W} \mathbf{X} \mathbf{w} = \mathbf{X}^T \mathbf{W} \mathbf{y}, giving \mathbf{w} = (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W} \mathbf{y}, provided the weighted Gram matrix is invertible. This formulation allows for unequal treatment of errors, improving robustness to outliers or varying data reliability. In the stochastic setting relevant to RLS, signals are modeled as realizations of random processes, often assumed stationary for theoretical analysis, meaning statistical properties like mean and autocorrelation do not change over time. The autocorrelation matrix \mathbf{R} = E[\mathbf{x} \mathbf{x}^T] captures second-order statistics of the input vector \mathbf{x}, while the cross-correlation \mathbf{p} = E[\mathbf{x} d] relates the input to the desired signal d, both expectations taken over the joint distribution. The Wiener solution, which minimizes the mean-square error E[|d - \mathbf{w}^T \mathbf{x}|^2], solves the Wiener-Hopf equations \mathbf{R} \mathbf{w}_{\text{opt}} = \mathbf{p}, yielding \mathbf{w}_{\text{opt}} = \mathbf{R}^{-1} \mathbf{p} under the assumption that \mathbf{R} is positive definite.
For time-varying systems where stationarity does not hold, such as in tracking changing signal environments, exponentially weighted data provides a mechanism to discount older observations. This involves assigning weights that decay geometrically with time lag, effectively prioritizing recent data to adapt to non-stationarities while maintaining computational tractability in estimation.

Core RLS Formulation

Weighted Least Squares Problem

The weighted least squares problem underlying the recursive least squares (RLS) filter seeks to determine the optimal coefficient vector \mathbf{w}(n) that minimizes a time-weighted cost function based on accumulated errors up to time n. Specifically, the cost function is defined as

J(n) = \sum_{i=1}^n \lambda^{n-i} |d(i) - \mathbf{w}^T(n) \mathbf{x}(i)|^2,

where d(i) denotes the desired signal at time i, \mathbf{x}(i) is the input data vector, \mathbf{w}(n) is the filter coefficient vector of length M at time n, and \lambda (with 0 < \lambda \leq 1) is the forgetting factor that applies exponentially decaying weights to past observations. This weighting scheme emphasizes recent data over older samples, enabling the filter to adapt to time-varying environments; when \lambda = 1, the formulation reduces to the standard unweighted least squares problem, treating all observations equally. The batch solution for \mathbf{w}(n) at each time step involves solving the normal equations:

\mathbf{w}(n) = \left[ \sum_{k=1}^n \lambda^{n-k} \mathbf{x}(k) \mathbf{x}^T(k) \right]^{-1} \left[ \sum_{k=1}^n \lambda^{n-k} \mathbf{x}(k) d(k) \right] = \mathbf{P}(n) \mathbf{b}(n),

where \mathbf{P}(n) is the inverse of the weighted correlation matrix and \mathbf{b}(n) is the weighted cross-correlation vector between the input and desired signal. This direct approach requires inverting an M \times M matrix at every iteration, incurring a computational complexity of O(M^3) operations per update, which becomes prohibitive for high-order filters or real-time applications and thus motivates the development of recursive formulations.
In the RLS context, two key error signals are defined to quantify prediction accuracy: the a priori error e(n) = d(n) - \mathbf{w}^T(n-1) \mathbf{x}(n), which uses the previous coefficient vector, and the a posteriori error \hat{e}(n) = d(n) - \mathbf{w}^T(n) \mathbf{x}(n), which employs the updated coefficients.

Recursive Derivation

The recursive least squares (RLS) filter derives its efficient update mechanism from the batch weighted least squares solution by exploiting the structure of sequential data arrivals and applying the matrix inversion lemma, also known as the Sherman-Morrison formula, to avoid full matrix recomputation at each step. Consider the weighted least squares problem at time n, where the cost function is J(n) = \sum_{i=1}^n \lambda^{n-i} |d(i) - \mathbf{x}^T(i) \mathbf{w}(n)|^2, with forgetting factor \lambda \in (0,1] to emphasize recent data. The minimizer is \mathbf{w}(n) = \mathbf{R}^{-1}(n) \mathbf{b}(n), where \mathbf{R}(n) = \sum_{i=1}^n \lambda^{n-i} \mathbf{x}(i) \mathbf{x}^T(i) is the weighted autocorrelation matrix and \mathbf{b}(n) = \sum_{i=1}^n \lambda^{n-i} d(i) \mathbf{x}(i) is the cross-correlation vector. To derive the recursive form, first update the inverse correlation matrix \mathbf{P}(n) = \mathbf{R}^{-1}(n). Note that \mathbf{R}(n) = \lambda \mathbf{R}(n-1) + \mathbf{x}(n) \mathbf{x}^T(n), so \mathbf{P}(n) = (\lambda \mathbf{R}(n-1) + \mathbf{x}(n) \mathbf{x}^T(n))^{-1}. Applying the Sherman-Morrison formula, which states that (\mathbf{A} + \mathbf{u} \mathbf{v}^T)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1} \mathbf{u} \mathbf{v}^T \mathbf{A}^{-1}}{1 + \mathbf{v}^T \mathbf{A}^{-1} \mathbf{u}} for invertible \mathbf{A} and vectors \mathbf{u}, \mathbf{v}, with \mathbf{A} = \lambda \mathbf{R}(n-1), \mathbf{u} = \mathbf{x}(n), and \mathbf{v} = \mathbf{x}(n), yields:

\mathbf{P}(n) = \lambda^{-1} \left[ \mathbf{P}(n-1) - \frac{\mathbf{P}(n-1) \mathbf{x}(n) \mathbf{x}^T(n) \mathbf{P}(n-1)}{\lambda + \mathbf{x}^T(n) \mathbf{P}(n-1) \mathbf{x}(n)} \right].
This rank-one update computes \mathbf{P}(n) in O(M^2) operations, where M is the filter order, compared to O(M^3) for direct inversion of the batch \mathbf{R}(n). Next, derive the recursive updates for \mathbf{b}(n) and \mathbf{w}(n). The cross-correlation vector updates as \mathbf{b}(n) = \lambda \mathbf{b}(n-1) + d(n) \mathbf{x}(n), a simple O(M) vector operation. The weight vector then becomes \mathbf{w}(n) = \mathbf{P}(n) \mathbf{b}(n). Substituting the update for \mathbf{b}(n) and using the expression for \mathbf{P}(n), this simplifies to \mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{K}(n) [d(n) - \mathbf{x}^T(n) \mathbf{w}(n-1)], where the gain vector is \mathbf{K}(n) = \lambda^{-1} \mathbf{P}(n-1) \mathbf{x}(n) / (1 + \lambda^{-1} \mathbf{x}^T(n) \mathbf{P}(n-1) \mathbf{x}(n)). This form resembles the Kalman filter update, with \mathbf{K}(n) weighting the prediction error e(n) = d(n) - \mathbf{x}^T(n) \mathbf{w}(n-1). To verify equivalence, prove by induction that the recursive \mathbf{w}(n) minimizes J(n). For the base case n=1, \mathbf{w}(1) = \mathbf{x}(1) d(1) / \mathbf{x}^T(1) \mathbf{x}(1) clearly minimizes the scalar cost. Assume the claim holds for n-1: \mathbf{w}(n-1) = \arg\min J(n-1). For n, the minimizer satisfies the normal equations \mathbf{R}(n) \mathbf{w}(n) = \mathbf{b}(n). Substituting the recursive forms shows \mathbf{R}(n) [\mathbf{w}(n-1) + \mathbf{K}(n) e(n)] = \lambda \mathbf{R}(n-1) \mathbf{w}(n-1) + \mathbf{x}(n) \mathbf{x}^T(n) \mathbf{w}(n-1) + \mathbf{x}(n) e(n) = \lambda \mathbf{b}(n-1) + d(n) \mathbf{x}(n) = \mathbf{b}(n), confirming the normal equations hold and thus \mathbf{w}(n) minimizes J(n).
By induction, the recursion is equivalent to the batch solution. For numerical stability in high dimensions, the Sherman-Morrison update can be generalized using the Woodbury matrix identity for low-rank adjustments, though implementation details are deferred to algorithmic variants. Overall, the recursive structure achieves O(M^2) complexity per iteration, enabling real-time adaptation in applications like signal processing, far superior to the O(M^3 N) cost of batch recomputation over N samples.
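The Sherman-Morrison step at the heart of this derivation can be verified numerically. The sketch below (illustrative random data) compares the rank-one update of \mathbf{P}(n) against a direct inverse of the updated correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
M, lam = 4, 0.98

A = rng.standard_normal((M, M))
R_prev = A @ A.T + M * np.eye(M)        # a positive definite "previous" R(n-1)
x = rng.standard_normal(M)              # new regressor x(n)

P_prev = np.linalg.inv(R_prev)
R_new = lam * R_prev + np.outer(x, x)   # rank-one update of the correlation matrix

# Sherman-Morrison form of P(n) = R^{-1}(n), using only P(n-1) and x(n)
Px = P_prev @ x
P_new = (P_prev - np.outer(Px, Px) / (lam + x @ Px)) / lam
```

Both routes give the same inverse, which is exactly why the recursion can skip the O(M^3) re-inversion.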

Standard RLS Algorithm

Update Equations

The standard recursive least squares (RLS) algorithm updates the filter coefficients and associated matrices at each time step n using a sequence of four key equations, derived from the matrix inversion lemma to avoid direct matrix inversion and enable efficient recursion. First, the gain vector \mathbf{K}(n) is computed as

\mathbf{K}(n) = \frac{\mathbf{P}(n-1) \mathbf{x}(n)}{\lambda + \mathbf{x}^T(n) \mathbf{P}(n-1) \mathbf{x}(n)},

where \mathbf{P}(n-1) is the inverse covariance matrix from the previous step, \mathbf{x}(n) is the input vector, and \lambda (typically 0 < \lambda \leq 1) is the forgetting factor. Next, the a priori error \varepsilon(n), which represents the prediction error before the update, is calculated as

\varepsilon(n) = d(n) - \mathbf{w}^T(n-1) \mathbf{x}(n),

with d(n) denoting the desired signal and \mathbf{w}(n-1) the prior coefficient vector. The coefficient vector is then updated via

\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{K}(n) \varepsilon(n),

incorporating the gain-weighted error to adjust the filter taps. Finally, the inverse covariance matrix is updated using a rank-1 modification:

\mathbf{P}(n) = \frac{[\mathbf{I} - \mathbf{K}(n) \mathbf{x}^T(n)]\, \mathbf{P}(n-1)}{\lambda},

where \mathbf{I} is the identity matrix; this form leverages the Sherman-Morrison formula for computational efficiency. To reduce computational complexity, particularly for the denominator in the gain vector, a scalar intermediate quantity \alpha(n) = \mathbf{x}^T(n) \mathbf{P}(n-1) \mathbf{x}(n) is often introduced, yielding \mathbf{K}(n) = \mathbf{P}(n-1) \mathbf{x}(n) / (\lambda + \alpha(n)); the subsequent covariance update then proceeds via the rank-1 form above.
These updates form the core iteration of the RLS algorithm, typically implemented in a loop from n=1n = 1 to NN, where NN is the number of data samples; a pseudocode outline is as follows:

For n = 1 to N:
    α(n) = x^T(n) P(n-1) x(n)
    K(n) = P(n-1) x(n) / (λ + α(n))
    ε(n) = d(n) - w^T(n-1) x(n)
    w(n) = w(n-1) + K(n) ε(n)
    P(n) = [I - K(n) x^T(n)] P(n-1) / λ

This sequence ensures the filter coefficients minimize the weighted least squares cost at each step. An important property arising from these updates is the relation between the a priori error and the a posteriori error \hat{\varepsilon}(n), given by \hat{\varepsilon}(n) = \lambda\, \varepsilon(n) / (\lambda + \alpha(n)), which quantifies the residual error after adaptation and aids in stability analysis.
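The relation \hat{\varepsilon}(n) = \lambda\, \varepsilon(n) / (\lambda + \alpha(n)) between the two errors is easy to confirm for a single update step. In this sketch (illustrative random values) one RLS update is performed and both errors are computed:

```python
import numpy as np

rng = np.random.default_rng(3)
M, lam = 3, 0.98

P_prev = np.eye(M)                      # illustrative prior inverse covariance
w_prev = rng.standard_normal(M)         # prior coefficients
x = rng.standard_normal(M)              # current regressor
d = rng.standard_normal()               # current desired sample

alpha = x @ P_prev @ x
K = P_prev @ x / (lam + alpha)
e_prior = d - w_prev @ x                # a priori error
w = w_prev + K * e_prior                # coefficient update
e_post = d - w @ x                      # a posteriori error
```

Since e_post = e_prior (1 - x^T K) and x^T K = α/(λ + α), the ratio of the two errors is λ/(λ + α), i.e. the update always shrinks the error.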

Initialization and Parameter Selection

The initialization of the recursive least squares (RLS) filter involves setting the initial coefficient vector and covariance matrix to establish starting conditions that balance prior knowledge with adaptation speed. Typically, the initial coefficient vector \mathbf{w}(0) is set to the zero vector if no prior information about the filter coefficients is available, allowing the algorithm to learn from incoming data without bias from assumptions. Alternatively, small random values may be used when slight perturbations are desired to avoid potential stagnation in flat error surfaces. This choice promotes unbiased convergence in stationary environments. The initial covariance matrix \mathbf{P}(0) is commonly initialized as \mathbf{P}(0) = \delta^{-1} \mathbf{I}, where \delta > 0 is a small scalar (e.g., 0.01 to 1) and \mathbf{I} is the identity matrix of appropriate dimension. This formulation introduces high initial uncertainty, enabling fast adaptation during startup by yielding large initial gains in the update equations; smaller \delta amplifies the inverse scaling, increasing the emphasis on early data for rapid convergence. The value of \delta is often tuned relative to the input signal variance to ensure numerical stability and appropriate gain scaling. The forgetting factor \lambda \in (0, 1] is a critical parameter that weights recent observations more heavily in the exponentially weighted cost function, with typical values ranging from 0.95 to 0.995 for moderate forgetting in many applications. Values closer to 1 (e.g., 0.99) promote slow adaptation and greater stability by retaining more historical data, suitable for nearly stationary processes, while smaller \lambda (e.g., 0.95) enables faster tracking of time-varying parameters at the cost of increased sensitivity to noise and potential instability.
Selection criteria for \lambda often consider the eigenvalue spread of the input correlation matrix \mathbf{R}, where larger spreads necessitate smaller \lambda to mitigate ill-conditioning, or the desired effective memory length, approximated as 1/(1 - \lambda), which quantifies the number of past samples influencing the estimate (e.g., 20 to 200 samples for \lambda = 0.95 to 0.995). Additionally, \lambda is chosen to balance responsiveness against noise sensitivity, with application-specific tuning via simulation or analysis of parameter variation rates. The filter order M, which determines the dimension of the coefficient vector, is selected to achieve an application-specific trade-off between bias and variance in the model. Common approaches include information-theoretic criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), which penalize higher orders to avoid overfitting while minimizing prediction error, or cross-validation techniques that evaluate performance on held-out data subsets. For instance, in autoregressive modeling with RLS estimation, order selection via AIC balances model complexity against residual fit, often yielding M values from 2 to 10 depending on signal correlation structure.
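The 1/(1 - λ) memory rule of thumb can be tabulated directly (a trivial sketch):

```python
# Effective memory length 1/(1 - λ): the approximate number of past samples
# that still carry significant weight in the exponentially weighted cost.
for lam in (0.95, 0.99, 0.995):
    mem = 1.0 / (1.0 - lam)
    print(f"lambda = {lam}: ~{mem:.0f} samples")
```

This is why λ = 0.95 behaves like a sliding window of roughly 20 samples, while λ = 0.995 averages over roughly 200.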

Variant Algorithms

Lattice Recursive Least Squares (LRLS)

The lattice recursive least squares (LRLS) algorithm provides a modular, order-recursive form of the RLS filter, structured as a lattice network that processes signals stage by stage without maintaining the full inverse correlation matrix. This design leverages reflection coefficients \kappa_m(n) computed from forward and backward prediction errors, enabling efficient adaptation to increasing filter orders while preserving the optimality of the least squares solution. Introduced in seminal work on linear prediction and estimation, the LRLS facilitates implementations in adaptive signal processing by decoupling computations across stages, where each stage corresponds to a prediction order m at time n. Central to the LRLS are the forward prediction errors f_m(n), which represent the error in predicting the current input sample from m past samples, and the backward prediction errors b_m(n), which represent the error in predicting the sample m steps in the past from the m most recent samples. These are complemented by partial correlation (PARCOR) coefficients \kappa_m(n), defined as

\kappa_m(n) = -\frac{\mathbb{E}[f_{m-1}(n)\, b_{m-1}(n-1)]}{\mathbb{E}[b_{m-1}^2(n-1)]},

which quantify the correlation between the forward and delayed backward errors at the previous stage and serve as the reflection coefficients for order updates. The core updates in LRLS proceed in two phases: time recursion, in which the correlation estimates are updated with the forgetting factor \lambda and the new data, followed by order recursion using \kappa_m(n). The order-updated prediction errors satisfy

f_m(n) = f_{m-1}(n) + \kappa_m(n)\, b_{m-1}(n-1),
b_m(n) = b_{m-1}(n-1) + \kappa_m(n)\, f_{m-1}(n),

where these relations propagate errors stage-wise, ensuring the filter coefficients align with the exponentially weighted least squares criterion. To interface with transversal filter forms, the lattice coefficients are converted to equivalent transversal coefficients via a lower triangular conversion matrix derived from the Levinson-Durbin recursion, allowing seamless integration with standard RLS outputs.
Parameter | Description | Recursive relation
\kappa_m(n) | Reflection (PARCOR) coefficient at stage m, time n | \kappa_m(n) = -\mathbb{E}[f_{m-1}(n)\, b_{m-1}(n-1)] / \mathbb{E}[b_{m-1}^2(n-1)]
f_m(n) | Forward prediction error at stage m, time n | f_m(n) = f_{m-1}(n) + \kappa_m(n)\, b_{m-1}(n-1)
b_m(n) | Backward prediction error at stage m, time n | b_m(n) = b_{m-1}(n-1) + \kappa_m(n)\, f_{m-1}(n)
The LRLS exhibits enhanced numerical stability relative to direct-form RLS implementations, as its orthogonal stage-wise processing mitigates error accumulation in matrix inversions. Additionally, the modular architecture simplifies order selection by allowing independent evaluation and addition of stages, while demonstrating reduced sensitivity to round-off errors in finite-precision arithmetic due to localized computations.
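The conversion between reflection and transversal predictor coefficients mentioned above follows the classical Levinson-Durbin recursion. A minimal sketch (illustrative autocorrelation sequence; this is the classical batch recursion, not the full LRLS time recursion):

```python
import numpy as np

def levinson_durbin(r, order):
    """From autocorrelation lags r[0..order], compute the prediction-error
    filter a (with a[0] = 1), the reflection (PARCOR) coefficients, and the
    final prediction error power."""
    a = np.array([1.0])
    E = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = sum(a[j] * r[m - j] for j in range(m))
        k = -acc / E                    # m-th reflection coefficient
        ks.append(k)
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]             # order update of the predictor
        E *= 1.0 - k * k                # error power decreases each stage
    return a, np.array(ks), E

# Autocorrelation of an AR(1)-like process with rho = 0.5: r[k] = 0.5**k
a, ks, E = levinson_durbin(np.array([1.0, 0.5, 0.25]), 2)
```

For this sequence only the first stage carries information (the second reflection coefficient is zero), which is exactly the modular, stage-by-stage behavior the lattice structure exploits.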

Normalized Lattice Recursive Least Squares (NLRLS)

The Normalized Lattice Recursive Least Squares (NLRLS) algorithm introduces normalization into the lattice structure to enhance numerical stability and robustness against variations in input signal amplitudes. Unlike the standard lattice RLS, which can suffer from drift in finite-precision implementations due to fluctuating input energies, the NLRLS normalizes internal signals to maintain bounded magnitudes, typically close to unity. This normalization is particularly beneficial for adaptive filtering applications where input signals exhibit time-varying power levels, ensuring consistent performance without excessive sensitivity to scaling. Central to the NLRLS are the energy estimates \Delta_m(n), which approximate the expected squared magnitude of the forward prediction errors:

\Delta_m(n) = \lambda \Delta_m(n-1) + |f_m(n)|^2,

where \lambda is the forgetting factor (0 < \lambda \leq 1), and f_m(n) is the m-th stage forward prediction error at time n. These estimates serve as scaling factors for normalization. The normalized forward error is defined as \gamma_m(n) = f_m(n) / \sqrt{\Delta_m(n)}, and a similar normalization applies to the backward prediction error b_m(n), yielding a normalized backward error \beta_m(n) = b_m(n) / \sqrt{\Delta_m(n)}. This approach ensures that the normalized errors have approximately unit energy, mitigating amplification or attenuation effects from input variations. The reflection coefficients in NLRLS are updated using the normalized quantities to preserve orthogonality and stability. Specifically, the m-th stage reflection coefficient evolves as

\kappa_m(n) = \frac{\lambda \kappa_m(n-1) + f_{m-1}(n)\, b_{m-1}^*(n-1)}{\Delta_{m-1}(n)},

where * denotes the complex conjugate, and the update incorporates the previous reflection coefficient weighted by \lambda.

The normalized recursions then propagate the errors across stages while maintaining the unit energy constraint: for the forward path,

\gamma_m(n) = \frac{\gamma_{m-1}(n) - \kappa_m(n)\, \beta_{m-1}(n-1)}{\sqrt{1 - |\kappa_m(n)|^2}},