Biconjugate gradient method
In mathematics, more specifically in numerical linear algebra, the biconjugate gradient method is an algorithm to solve systems of linear equations

$ A x = b. $

Unlike the conjugate gradient method, this algorithm does not require the matrix $ A $ to be self-adjoint, but instead it requires multiplications by the conjugate transpose $ A^* $.
The Algorithm
- Choose initial guess $ x_0 $, two other vectors $ x_0^* $ and $ b^* $, and a preconditioner $ M $
- $ r_0 \leftarrow b - A x_0 $
- $ r_0^* \leftarrow b^* - x_0^* A $
- $ p_0 \leftarrow M^{-1} r_0 $
- $ p_0^* \leftarrow r_0^* M^{-1} $
- for $ k = 0, 1, \dots $ do
  - $ \alpha_k \leftarrow \dfrac{r_k^* M^{-1} r_k}{p_k^* A p_k} $
  - $ x_{k+1} \leftarrow x_k + \alpha_k p_k $
  - $ x_{k+1}^* \leftarrow x_k^* + \overline{\alpha_k} p_k^* $
  - $ r_{k+1} \leftarrow r_k - \alpha_k A p_k $
  - $ r_{k+1}^* \leftarrow r_k^* - \overline{\alpha_k} p_k^* A $
  - $ \beta_k \leftarrow \dfrac{r_{k+1}^* M^{-1} r_{k+1}}{r_k^* M^{-1} r_k} $
  - $ p_{k+1} \leftarrow M^{-1} r_{k+1} + \beta_k p_k $
  - $ p_{k+1}^* \leftarrow r_{k+1}^* M^{-1} + \overline{\beta_k} p_k^* $

In the above formulation, the computed $ r_k $ and $ r_k^* $ satisfy

$ r_k = b - A x_k, \qquad r_k^* = b^* - x_k^* A, $

and thus are the respective residuals corresponding to $ x_k $ and $ x_k^* $, as approximate solutions to the systems

$ A x = b, \qquad x^* A = b^*; $

$ x^* = \bar{x}^{\mathsf T} $ is the adjoint, and $ \overline{\alpha_k} $ is the complex conjugate.
Unpreconditioned version of the algorithm
[edit]- Choose initial guess ,
- for do
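As a concrete illustration of these recurrences, here is a minimal pure-Python sketch of unpreconditioned BiCG; the matrix and right-hand side are illustrative choices, not from the text, and the shadow residual is taken as $\bar{r}_0 = r_0$:

```python
# Pure-Python sketch of unpreconditioned BiCG on a small illustrative system.

def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bicg(A, b, x0, tol=1e-12, max_iter=100):
    """Unpreconditioned BiCG with the shadow residual chosen as r_bar_0 = r_0."""
    At = transpose(A)
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    r_bar = list(r)                      # shadow residual for the A^T system
    p, p_bar = list(r), list(r_bar)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        Atp_bar = matvec(At, p_bar)
        alpha = dot(r_bar, r) / dot(p_bar, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * v for ri, v in zip(r, Ap)]
        r_bar_new = [ri - alpha * v for ri, v in zip(r_bar, Atp_bar)]
        if dot(r_new, r_new) ** 0.5 < tol:
            break
        beta = dot(r_bar_new, r_new) / dot(r_bar, r)
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]
        p_bar = [ri + beta * pi for ri, pi in zip(r_bar_new, p_bar)]
        r, r_bar = r_new, r_bar_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]      # non-symmetric: A[1][2] != A[2][1]
b = [1.0, 2.0, 3.0]
x = bicg(A, b, [0.0, 0.0, 0.0])
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
```

Note the two matrix-vector products per iteration, one with $A$ and one with $A^{\mathsf T}$, which is the characteristic cost of BiCG.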
Discussion
The biconjugate gradient method is numerically unstable[citation needed] (compare to the biconjugate gradient stabilized method), but very important from a theoretical point of view. Define the iteration steps by

$ x_k := x_j + P_k A^{-1} \left( b - A x_j \right), \qquad x_k^* := x_j^* + \left( b^* - x_j^* A \right) P_k A^{-1}, $

where $ j < k $, using the related projection

$ P_k := \mathbf{u}_k \left( \mathbf{v}_k^* A \mathbf{u}_k \right)^{-1} \mathbf{v}_k^* A, $

with

$ \mathbf{u}_k = \left[ u_0, u_1, \dots, u_{k-1} \right], \qquad \mathbf{v}_k = \left[ v_0, v_1, \dots, v_{k-1} \right]. $

These related projections may be iterated themselves as

$ P_{k+1} = P_k + \left( 1 - P_k \right) u_k \otimes \frac{v_k^* A \left( 1 - P_k \right)}{v_k^* A \left( 1 - P_k \right) u_k}. $

A relation to Quasi-Newton methods is given by $ P_k = A_k^{-1} A $ and $ x_{k+1} = x_k - A_{k+1}^{-1} \left( A x_k - b \right) $, where

$ A_{k+1}^{-1} = A_k^{-1} + \left( 1 - A_k^{-1} A \right) u_k \otimes \frac{v_k^* \left( 1 - A A_k^{-1} \right)}{v_k^* A \left( 1 - A_k^{-1} A \right) u_k}. $

The new directions

$ p_k = \left( 1 - P_k \right) u_k, \qquad p_k^* = v_k^* A \left( 1 - P_k \right) A^{-1} $

are then orthogonal to the residuals:

$ v_i^* r_k = p_i^* r_k = 0, \qquad r_k^* u_j = r_k^* p_j = 0, $

which themselves satisfy

$ r_k = A \left( 1 - P_k \right) A^{-1} r_j, \qquad r_k^* = r_j^* \left( 1 - P_k \right), $

where $ i, j < k $.

The biconjugate gradient method now makes a special choice and uses the setting

$ u_k = M^{-1} r_k, \qquad v_k^* = r_k^* M^{-1}. $

With this particular choice, explicit evaluations of $ P_k $ and $ A^{-1} $ are avoided, and the algorithm takes the form stated above.
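The defining property of $P_k$, idempotence, can be checked numerically in the rank-one case $P_1 = u \left( v^* A u \right)^{-1} v^* A$. The following sketch uses an illustrative matrix and real vectors (so $v^*$ reduces to $v^{\mathsf T}$) chosen so that $v^{\mathsf T} A u \neq 0$:

```python
# Numerical check that P = u (v^T A u)^{-1} v^T A is a projection (P^2 = P)
# and fixes u. A, u, v below are illustrative choices with v^T A u != 0.

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]
u = [1.0, 0.0, 0.0]
v = [0.0, 1.0, 0.0]
n = len(A)

vA = [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]  # row vector v^T A
s = sum(vA[j] * u[j] for j in range(n))                         # scalar v^T A u
P = [[u[i] * vA[j] / s for j in range(n)] for i in range(n)]    # outer product / s

# P^2 via plain matrix multiplication, and the image of u under P
P2 = [[sum(P[i][k] * P[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
Pu = [sum(P[i][j] * u[j] for j in range(n)) for i in range(n)]

err_idempotent = max(abs(P2[i][j] - P[i][j]) for i in range(n) for j in range(n))
err_fixes_u = max(abs(Pu[i] - u[i]) for i in range(n))
```

Idempotence ($P^2 = P$) together with $P u = u$ is exactly what makes $P_k$ an (oblique) projection onto the span of the $u$ vectors.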
Properties
- If $ A = A^* $ is self-adjoint and the shadow data are chosen as the adjoints of the primal data, $ x_0^* = \bar{x}_0^{\mathsf T} $ and $ b^* = \bar{b}^{\mathsf T} $, then $ r_k^* = \bar{r}_k^{\mathsf T} $, $ p_k^* = \bar{p}_k^{\mathsf T} $, and the conjugate gradient method produces the same sequence at half the computational cost.
- The sequences produced by the algorithm are biorthogonal, i.e., $ p_i^* A p_j = r_i^* M^{-1} r_j = 0 $ for $ i \neq j $.
- If $ P_{j'} $ is a polynomial with $ \deg\left( P_{j'} \right) + j < k $, then $ r_k^* P_{j'}\left( M^{-1} A \right) u_j = 0 $. The algorithm thus produces projections onto the Krylov subspace.
- If $ P_{i'} $ is a polynomial with $ i + \deg\left( P_{i'} \right) < k $, then $ v_i^* P_{i'}\left( A M^{-1} \right) r_k = 0 $.
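The biorthogonality property can be observed numerically. The sketch below runs the unpreconditioned recurrences ($M = I$, so the property reads $\bar{r}_i^{\mathsf T} r_j = 0$ for $i \neq j$) on an illustrative system and checks all cross inner products of the first three residual pairs:

```python
# Numerical check of biorthogonality r_bar_i^T r_j = 0 (i != j) for
# unpreconditioned BiCG (M = I). The system below is an illustrative example.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]
b = [1.0, 2.0, 3.0]
At = transpose(A)

r = list(b)                 # r_0 = b - A*0 = b
r_bar = list(r)             # shadow residual, r_bar_0 = r_0
p, p_bar = list(r), list(r_bar)
R, R_bar = [list(r)], [list(r_bar)]

for _ in range(2):          # collect r_1, r_2 and their shadows
    Ap, Atp_bar = matvec(A, p), matvec(At, p_bar)
    alpha = dot(r_bar, r) / dot(p_bar, Ap)
    r_new = [ri - alpha * v for ri, v in zip(r, Ap)]
    r_bar_new = [ri - alpha * v for ri, v in zip(r_bar, Atp_bar)]
    beta = dot(r_bar_new, r_new) / dot(r_bar, r)
    p = [ri + beta * pi for ri, pi in zip(r_new, p)]
    p_bar = [ri + beta * pi for ri, pi in zip(r_bar_new, p_bar)]
    r, r_bar = r_new, r_bar_new
    R.append(list(r))
    R_bar.append(list(r_bar))

# cross inner products between shadow and primal residuals of different indices
cross = max(abs(dot(R_bar[i], R[j])) for i in range(3) for j in range(3) if i != j)
```

In exact arithmetic every cross product is exactly zero; in floating point it is on the order of machine precision.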
See also
- Biconjugate gradient stabilized method (BiCG-Stab)
- Conjugate gradient method (CG)
- Conjugate gradient squared method (CGS)
References
- Fletcher, R. (1976). "Conjugate gradient methods for indefinite systems". In Watson, G. Alistair (ed.). Numerical Analysis: Proceedings of the Dundee Conference on Numerical Analysis. Lecture Notes in Mathematics. Vol. 506. Springer. pp. 73–89. doi:10.1007/BFb0080116. ISBN 978-3-540-07610-0.
- Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007). "Section 2.7.6". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8.
Introduction
Overview
The biconjugate gradient method (BiCG) is an iterative Krylov subspace algorithm designed for solving large-scale, sparse systems of non-symmetric linear equations of the form $ Ax = b $, where the coefficient matrix $ A $ need not be Hermitian.[1] Introduced as a generalization of the conjugate gradient (CG) method, BiCG extends CG's efficient framework, originally effective only for symmetric positive definite systems, to handle non-symmetric cases prevalent in applications like fluid dynamics and electromagnetics.[2] By operating within expanding Krylov subspaces, it approximates the solution iteratively without computing a full matrix factorization, making it particularly valuable for problems where direct solvers are computationally prohibitive.[1]

At its core, BiCG employs a biorthogonalization process that generates two coupled sequences of residual vectors: one for the original system and a shadow sequence for the transpose $ A^T $.[1] This dual mechanism keeps the two residual sequences biorthogonal to each other across iterations, enabling short three-term recurrences to update the solution and maintain computational efficiency.[2] Unlike methods relying on longer recurrences, such as GMRES, BiCG leverages matrix-vector products with both $ A $ and $ A^T $ to construct conjugate directions that minimize the residual in a Petrov-Galerkin sense.[1]

The primary motivation for BiCG stems from the need to overcome CG's restriction to Hermitian matrices, providing a robust tool for ill-conditioned, non-symmetric problems while preserving much of CG's elegance and speed.[2] Key advantages include its modest storage requirements, typically just a few vectors per iteration, and suitability for parallel computing environments, rendering it more memory-efficient than direct methods or storage-intensive Krylov alternatives.[1] These features have established BiCG as a foundational method in numerical linear algebra, often serving as the basis for stabilized variants in practical implementations.[1]

Historical development
The biconjugate gradient (BiCG) method was introduced in 1976 by Roger Fletcher as an extension of the conjugate gradient method to address indefinite systems, including nonsymmetric linear systems arising in various applications.[2] This development aimed to generalize the efficient Krylov subspace iteration of the conjugate gradient method, originally limited to symmetric positive-definite matrices, to broader, nonsymmetric contexts.[1] Fletcher's foundational work appeared in the paper "Conjugate gradient methods for indefinite systems", published in Numerical Analysis (Lecture Notes in Mathematics, vol. 506, pp. 73–89) by Springer.[2] The method's core idea involved generating two coupled sequences of residuals, one from the original system and one from its transpose, to maintain orthogonality properties analogous to those in the conjugate gradient algorithm.[2]

The BiCG method traces its conceptual roots to earlier iterative techniques for eigenvalue problems, including the Lanczos algorithm, developed by Cornelius Lanczos in 1950 for tridiagonalizing symmetric matrices, and the Arnoldi process, introduced by W. E. Arnoldi in 1951 as a generalization to non-symmetric matrices. Early adoption highlighted numerical instabilities in BiCG due to potential breakdowns in the orthogonalization process, which spurred refinements in the 1990s, notably the BiCGSTAB variant proposed by Henk A. van der Vorst in 1992 for improved stability and convergence in nonsymmetric systems.[4]

Mathematical background
Non-symmetric linear systems
The biconjugate gradient method is formulated to solve non-symmetric linear systems of equations expressed as

$ Ax = b, $

where the coefficient matrix $ A $ is square but not required to be symmetric or Hermitian.

Krylov subspace methods
Krylov subspace methods form the theoretical foundation for iterative solvers like the biconjugate gradient (BiCG) method, particularly for large-scale non-symmetric linear systems. These methods generate successive approximations to the solution by projecting the problem onto a sequence of low-dimensional subspaces that capture the essential behavior of the matrix-vector interactions. The core idea is to build the solution incrementally within a Krylov subspace, which is defined for a matrix $ A $ and an initial residual vector $ r_0 $ as the span of the vectors obtained by successive applications of $ A $:

$ \mathcal{K}_k(A, r_0) = \operatorname{span}\{ r_0, A r_0, A^2 r_0, \dots, A^{k-1} r_0 \}. $

Algorithm
Unpreconditioned BiCG
The biconjugate gradient (BiCG) method is an iterative Krylov subspace algorithm designed to solve large, sparse, non-symmetric linear systems of the form $ Ax = b $, where $ A $ is an $ n \times n $ matrix, $ x $ and $ b $ are vectors in $ \mathbb{R}^n $, and no preconditioning is applied.[2] Developed as an extension of the conjugate gradient method to indefinite and non-symmetric systems, it generates two sequences of residuals and search directions that satisfy bi-orthogonality conditions with respect to the Krylov subspaces generated by $ A $ and $ A^T $.[2]

The algorithm begins with initialization. An initial approximation $ x_0 $ is chosen, often $ x_0 = 0 $. The initial residual is computed as $ r_0 = b - A x_0 $. An arbitrary shadow residual $ \bar{r}_0 $ is selected, typically $ \bar{r}_0 = r_0 $ for simplicity, ensuring $ \bar{r}_0^T r_0 \neq 0 $. The initial search directions are set to $ p_0 = r_0 $ and $ \bar{p}_0 = \bar{r}_0 $.[2]

The iteration proceeds for $ k = 0, 1, 2, \dots $ until a stopping criterion is met. First, the step length $ \alpha_k $ is calculated; the iterates, residuals, and search directions are then updated as in the following pseudocode.

Input: A, b, x_0, tol, max_iter
Output: x (approximate solution)
r_0 ← b - A x_0
Choose \bar{r}_0 (e.g., \bar{r}_0 ← r_0)
p_0 ← r_0
\bar{p}_0 ← \bar{r}_0
k ← 0
while ||r_k|| > tol ||b|| and k < max_iter do
    α_k ← (\bar{r}_k^T r_k) / (\bar{p}_k^T A p_k)
    x_{k+1} ← x_k + α_k p_k
    r_{k+1} ← r_k - α_k A p_k
    \bar{r}_{k+1} ← \bar{r}_k - α_k A^T \bar{p}_k
    β_k ← (\bar{r}_{k+1}^T r_{k+1}) / (\bar{r}_k^T r_k)
    p_{k+1} ← r_{k+1} + β_k p_k
    \bar{p}_{k+1} ← \bar{r}_{k+1} + β_k \bar{p}_k
    k ← k + 1
end while
return x_k
Convergence is typically monitored using the relative residual norm $ \|r_k\| / \|b\| $, compared against a user-specified tolerance tol (e.g., $ 10^{-6} $), or by reaching a maximum number of iterations max_iter to prevent infinite loops. The method requires two matrix-vector products per iteration, one with $ A $ and one with $ A^T $, and is exact in at most $ n $ steps for an $ n \times n $ matrix in exact arithmetic.[2]
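The finite-termination claim rests on the Krylov subspace reaching full dimension. As a small pure-Python check (with an illustrative matrix and starting residual), the basis $\{r_0, A r_0, A^2 r_0\}$ can be built explicitly and shown to span $\mathbb{R}^3$ via a nonzero determinant, so the exact solution is reachable in at most $n = 3$ steps:

```python
# Build the Krylov basis {r0, A r0, A^2 r0} and verify it spans R^3
# (nonzero determinant). Matrix and starting residual are illustrative.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]
r0 = [1.0, 2.0, 3.0]

basis = [r0]
for _ in range(2):
    basis.append(matvec(A, basis[-1]))   # r0, A r0, A^2 r0

# determinant of the 3x3 matrix whose columns are the Krylov vectors
c = [[basis[j][i] for j in range(3)] for i in range(3)]
det = (c[0][0] * (c[1][1] * c[2][2] - c[1][2] * c[2][1])
       - c[0][1] * (c[1][0] * c[2][2] - c[1][2] * c[2][0])
       + c[0][2] * (c[1][0] * c[2][1] - c[1][1] * c[2][0]))
```

A zero determinant would instead mean the Krylov subspace stagnates at a lower dimension, in which case the iteration terminates early with the exact solution of the projected problem.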
Preconditioned BiCG
Preconditioning enhances the biconjugate gradient (BiCG) method by incorporating a matrix $ M $ that approximates $ A $, transforming the original system into a form with improved spectral properties, such as a reduced condition number, to accelerate convergence for ill-conditioned nonsymmetric systems.[1] The preconditioner is applied through solves of the form $ M z = r $ (and $ M^T \tilde{z} = \tilde{r} $ for the shadow residual sequence) at each iteration, which clusters eigenvalues near unity and mitigates the slow convergence of the unpreconditioned method.[8] This approach is particularly effective for sparse matrices arising in engineering applications, where direct factorization is prohibitive due to computational cost.[1]

Two primary preconditioning strategies are employed: left preconditioning, which modifies the system to $ M^{-1} A x = M^{-1} b $ and applies the preconditioner to residuals and search directions, and right preconditioning, which solves $ A M^{-1} y = b $ with $ x = M^{-1} y $, preserving certain residual norms but altering the search space.[8] In left-preconditioned BiCG, the residual update becomes $ r_i = r_{i-1} - \alpha_i A p_i $ and the shadow residual update $ \tilde{r}_i = \tilde{r}_{i-1} - \alpha_i A^T \tilde{p}_i $, ensuring biorthogonality in the preconditioned inner products; the step size is then computed as $ \alpha_i = \rho_{i-1} / (\tilde{p}_i^T A p_i) $ with $ \rho_{i-1} = \tilde{r}_{i-1}^T M^{-1} r_{i-1} $, and direction vectors are updated accordingly to maintain short recurrences.[8] Right preconditioning similarly adjusts the matrix-vector products to use $ A M^{-1} $, but requires recovering the final iterate via $ x = M^{-1} y $; both variants demand approximately two preconditioner solves per iteration, doubling the cost compared to unpreconditioned BiCG but often reducing total iterations substantially.[1]

Common preconditioners for BiCG include incomplete LU (ILU) factorization, which approximates $ A \approx L U $ with sparsity constraints to yield triangular factors for efficient forward and back substitution solves, and diagonal scaling, where $ M^{-1} $ is the inverse of the diagonal of $ A $ for simple normalization of rows and columns.[8] Selection of $ M $ depends on the matrix structure: ILU suits banded or sparse nonsymmetric matrices by controlling fill-in levels (e.g., ILU(0) drops all nonzeros outside the original sparsity pattern), while diagonal preconditioners are computationally cheap but less effective for highly indefinite systems; more advanced choices like SSOR may be used when symmetry is partial.[1] The choice balances setup cost (e.g., factorization time) against per-iteration solve efficiency, with ILU often preferred for its robustness in practice.[8]

The pseudocode for left-preconditioned BiCG is as follows:[8]

Compute r(0) = b - A x(0) for some initial guess x(0)
Choose \tilde{r}(0) (e.g., \tilde{r}(0) = r(0))
for i = 1, 2, ...
    Solve M z(i-1) = r(i-1)
    Solve M^T \tilde{z}(i-1) = \tilde{r}(i-1)
    ρ(i-1) = z(i-1)^T \tilde{r}(i-1)
    if ρ(i-1) = 0, method fails
    if i = 1
        p(i) = z(i-1)
        \tilde{p}(i) = \tilde{z}(i-1)
    else
        β(i-1) = ρ(i-1) / ρ(i-2)
        p(i) = z(i-1) + β(i-1) p(i-1)
        \tilde{p}(i) = \tilde{z}(i-1) + β(i-1) \tilde{p}(i-1)
    end if
    q(i) = A p(i)
    \tilde{q}(i) = A^T \tilde{p}(i)
    α(i) = ρ(i-1) / (\tilde{p}(i)^T q(i))
    x(i) = x(i-1) + α(i) p(i)
    r(i) = r(i-1) - α(i) q(i)
    \tilde{r}(i) = \tilde{r}(i-1) - α(i) \tilde{q}(i)
    check convergence; continue if necessary
end for
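Following this pseudocode, a minimal pure-Python sketch of left-preconditioned BiCG with a Jacobi (diagonal) preconditioner, for which $M z = r$ and $M^T \tilde{z} = \tilde{r}$ reduce to elementwise division; the example matrix and right-hand side are illustrative:

```python
# Left-preconditioned BiCG per the pseudocode above, with a Jacobi
# (diagonal) preconditioner M = diag(A). Example system is illustrative.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def precond_bicg(A, b, x0, M_diag, tol=1e-12, max_iter=100):
    """Preconditioned BiCG; M is diagonal, so both preconditioner
    solves are elementwise divisions (M^T = M)."""
    At = transpose(A)
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    r_t = list(r)                               # shadow residual
    rho_prev, p, p_t = None, None, None
    for i in range(1, max_iter + 1):
        z = [ri / d for ri, d in zip(r, M_diag)]       # solve M z = r
        z_t = [ri / d for ri, d in zip(r_t, M_diag)]   # solve M^T z~ = r~
        rho = dot(z, r_t)
        if rho == 0.0:
            raise ZeroDivisionError("BiCG breakdown: rho = 0")
        if i == 1:
            p, p_t = list(z), list(z_t)
        else:
            beta = rho / rho_prev
            p = [zi + beta * pi for zi, pi in zip(z, p)]
            p_t = [zi + beta * pi for zi, pi in zip(z_t, p_t)]
        q = matvec(A, p)
        q_t = matvec(At, p_t)
        alpha = rho / dot(p_t, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        r_t = [ri - alpha * qi for ri, qi in zip(r_t, q_t)]
        rho_prev = rho
        if dot(r, r) ** 0.5 < tol * dot(b, b) ** 0.5:  # relative residual test
            break
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]
b = [1.0, 2.0, 3.0]
M_diag = [A[i][i] for i in range(3)]   # Jacobi preconditioner M = diag(A)
x = precond_bicg(A, b, [0.0, 0.0, 0.0], M_diag)
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
```

Swapping in a stronger preconditioner (e.g., ILU) only changes the two solve lines; the rest of the recurrence is untouched, which is what makes the pseudocode's structure convenient in practice.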