Hubbry Logo
search
logo

QR decomposition

logo
Community Hub0 Subscribers
Read side by side
from Wikipedia

In linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthonormal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares (LLS) problem and is the basis for a particular eigenvalue algorithm, the QR algorithm.

Cases and definitions

[edit]

Square matrix

[edit]

Any real square matrix A may be decomposed as

where Q is an orthogonal matrix (its columns are orthogonal unit vectors meaning ) and R is an upper triangular matrix (also called right triangular matrix). If A is invertible, then the factorization is unique if we require the diagonal elements of R to be positive.

If instead A is a complex square matrix, then there is a decomposition A = QR where Q is a unitary matrix (so the conjugate transpose ).

If A has n linearly independent columns, then the first n columns of Q form an orthonormal basis for the column space of A. More generally, the first k columns of Q form an orthonormal basis for the span of the first k columns of A for any 1 ≤ kn.[1] The fact that any column k of A only depends on the first k columns of Q corresponds to the triangular form of R.[1]

Rectangular matrix

[edit]

More generally, we can factor a complex m×n matrix A, with mn, as the product of an m×m unitary matrix Q and an m×n upper triangular matrix R. As the bottom (mn) rows of an m×n upper triangular matrix consist entirely of zeroes, it is often useful to partition R, or both R and Q:

where R1 is an n×n upper triangular matrix, 0 is an (mnn zero matrix, Q1 is m×n, Q2 is m×(mn), and Q1 and Q2 both have orthogonal columns.

Golub & Van Loan (1996, §5.2) call Q1R1 the thin QR factorization of A; Trefethen and Bau call this the reduced QR factorization.[1] If A is of full rank n and we require that the diagonal elements of R1 are positive then R1 and Q1 are unique, but in general Q2 is not. R1 is then equal to the upper triangular factor of the Cholesky decomposition of A* A (= ATA if A is real).

QL, RQ and LQ decompositions

[edit]

Analogously, we can define QL, RQ, and LQ decompositions, with L being a lower triangular matrix.

Computing the QR decomposition

[edit]

There are several methods for actually computing the QR decomposition, such as the Gram–Schmidt process, Householder transformations, or Givens rotations. Each has a number of advantages and disadvantages.

Using the Gram–Schmidt process

[edit]

Consider the Gram–Schmidt process applied to the columns of the full column rank matrix , with inner product (or for the complex case).

Define the projection:

then:

We can now express the s over our newly computed orthonormal basis:

where . This can be written in matrix form:

where:

and

Example

[edit]

Consider the decomposition of

Recall that an orthonormal matrix has the property .

Then, we can calculate by means of Gram–Schmidt as follows:

Thus, we have

Relation to RQ decomposition

[edit]

The RQ decomposition transforms a matrix A into the product of an upper triangular matrix R (also known as right-triangular) and an orthogonal matrix Q. The only difference from QR decomposition is the order of these matrices.

QR decomposition is Gram–Schmidt orthogonalization of columns of A, started from the first column.

RQ decomposition is Gram–Schmidt orthogonalization of rows of A, started from the last row.

Advantages and disadvantages

[edit]

The Gram-Schmidt process is inherently numerically unstable. While the application of the projections has an appealing geometric analogy to orthogonalization, the orthogonalization itself is prone to numerical error. A significant advantage is the ease of implementation.

Using Householder reflections

[edit]
Householder reflection for QR-decomposition: The goal is to find a linear transformation that changes the vector into a vector of the same length which is collinear to . We could use an orthogonal projection (Gram-Schmidt) but this will be numerically unstable if the vectors and are close to orthogonal. Instead, the Householder reflection reflects through the dotted line (chosen to bisect the angle between and ). The maximum angle with this transform is 45 degrees.

A Householder reflection (or Householder transformation) is a transformation that takes a vector and reflects it about some plane or hyperplane. We can use this operation to calculate the QR factorization of an m-by-n matrix with mn.

Q can be used to reflect a vector in such a way that all coordinates but one disappear.

Let be an arbitrary real m-dimensional column vector of such that for a scalar α. If the algorithm is implemented using floating-point arithmetic, then α should get the opposite sign as the k-th coordinate of , where is to be the pivot coordinate after which all entries are 0 in matrix A's final upper triangular form, to avoid loss of significance. In the complex case, set[2]

and substitute transposition by conjugate transposition in the construction of Q below.

Then, where is the vector [1 0 ⋯ 0]T, || · || is the Euclidean norm and is an m×m identity matrix, set

Or, if is complex

is an m-by-m Householder matrix, which is both symmetric and orthogonal (Hermitian and unitary in the complex case), and

This can be used to gradually transform an m-by-n matrix A to upper triangular form. First, we multiply A with the Householder matrix Q1 we obtain when we choose the first matrix column for x. This results in a matrix Q1A with zeros in the left column (except for the first row).

This can be repeated for A′ (obtained from Q1A by deleting the first row and first column), resulting in a Householder matrix Q2. Note that Q2 is smaller than Q1. Since we want it really to operate on Q1A instead of A′ we need to expand it to the upper left, filling in a 1, or in general:

After iterations of this process, ,

is an upper triangular matrix. So, with

is a QR decomposition of .

This method has greater numerical stability than the Gram–Schmidt method above.

In numerical tests the computed factors and satisfy at machine precision. Also, orthogonality is preserved: . However, the accuracy of and decrease with condition number:

For a well-conditioned example (, ):

In an ill-conditioned test (, ): [3]

The following table gives the number of operations in the k-th step of the QR-decomposition by the Householder transformation, assuming a square matrix with size n.

Operation Number of operations in the k-th step
Multiplications
Additions
Division
Square root

Summing these numbers over the n − 1 steps (for a square matrix of size n), the complexity of the algorithm (in terms of floating point multiplications) is given by

Example

[edit]

Let us calculate the decomposition of

First, we need to find a reflection that transforms the first column of matrix A, vector , into .

Now,

and

Here,

and

Therefore

and , and then

Now observe:

so we already have almost a triangular matrix. We only need to zero the (3, 2) entry.

Take the (1, 1) minor, and then apply the process again to

By the same method as above, we obtain the matrix of the Householder transformation

after performing a direct sum with 1 to make sure the next step in the process works properly.

Now, we find

Or, to four decimal digits,

The matrix Q is orthogonal and R is upper triangular, so A = QR is the required QR decomposition.

Advantages and disadvantages

[edit]

The use of Householder transformations is inherently the most simple of the numerically stable QR decomposition algorithms due to the use of reflections as the mechanism for producing zeroes in the R matrix. However, the Householder reflection algorithm is bandwidth heavy and difficult to parallelize, as every reflection that produces a new zero element changes the entirety of both Q and R matrices.

Parallel implementation of Householder QR

[edit]

The Householder QR method can be implemented in parallel with algorithms such as the TSQR algorithm (which stands for Tall Skinny QR). This algorithm can be applied in the case when the matrix A has m >> n.[4] This algorithm uses a binary reduction tree to compute local householder QR decomposition at each node in the forward pass, and re-constitute the Q matrix in the backward pass. The binary tree structure aims at decreasing the amount of communication between processor to increase performance.

Using Givens rotations

[edit]

QR decompositions can also be computed with a series of Givens rotations. Each rotation zeroes an element in the subdiagonal of the matrix, forming the R matrix. The concatenation of all the Givens rotations forms the orthogonal Q matrix.

In practice, Givens rotations are not actually performed by building a whole matrix and doing a matrix multiplication. A Givens rotation procedure is used instead which does the equivalent of the sparse Givens matrix multiplication, without the extra work of handling the sparse elements. The Givens rotation procedure is useful in situations where only relatively few off-diagonal elements need to be zeroed, and is more easily parallelized than Householder transformations.

Example

[edit]

Let us calculate the decomposition of

First, we need to form a rotation matrix that will zero the lowermost left element, . We form this matrix using the Givens rotation method, and call the matrix . We will first rotate the vector , to point along the X axis. This vector has an angle . We create the orthogonal Givens rotation matrix, :

And the result of now has a zero in the element.

We can similarly form Givens matrices and , which will zero the sub-diagonal elements and , forming a triangular matrix . The orthogonal matrix is formed from the product of all the Givens matrices . Thus, we have , and the QR decomposition is .

Advantages and disadvantages

[edit]

The QR decomposition via Givens rotations is the most involved to implement, as the ordering of the rows required to fully exploit the algorithm is not trivial to determine. However, it has a significant advantage in that each new zero element affects only the row with the element to be zeroed (i) and a row above (j). This makes the Givens rotation algorithm more bandwidth efficient and parallelizable than the Householder reflection technique.

Connection to a determinant or a product of eigenvalues

[edit]

We can use QR decomposition to find the determinant of a square matrix. Suppose a matrix is decomposed as . Then we have

can be chosen such that . Thus,

where the are the entries on the diagonal of . Furthermore, because the determinant equals the product of the eigenvalues, we have

where the are eigenvalues of .

We can extend the above properties to a non-square complex matrix by introducing the definition of QR decomposition for non-square complex matrices and replacing eigenvalues with singular values.

Start with a QR decomposition for a non-square matrix A:

where denotes the zero matrix and is a unitary matrix.

From the properties of the singular value decomposition (SVD) and the determinant of a matrix, we have

where the are the singular values of .

Note that the singular values of and are identical, although their complex eigenvalues may be different. However, if A is square, then

It follows that the QR decomposition can be used to efficiently calculate the product of the eigenvalues or singular values of a matrix.

Column pivoting

[edit]

Pivoted QR differs from ordinary Gram-Schmidt in that it takes the largest remaining column at the beginning of each new step—column pivoting—[5] and thus introduces a permutation matrix P:

Column pivoting is useful when A is (nearly) rank deficient, or is suspected of being so. It can also improve numerical accuracy. P is usually chosen so that the diagonal elements of R are non-increasing: . This can be used to find the (numerical) rank of A at lower computational cost than a singular value decomposition, forming the basis of so-called rank-revealing QR algorithms.

Using for solution to linear inverse problems

[edit]

Compared to the direct matrix inverse, inverse solutions using QR decomposition are more numerically stable as evidenced by their reduced condition numbers.[6]

To solve the underdetermined () linear problem where the matrix has dimensions and rank , first find the QR factorization of the transpose of : , where Q is an orthogonal matrix (i.e. ), and R has a special form: . Here is a square right triangular matrix, and the zero matrix has dimension . After some algebra, it can be shown that a solution to the inverse problem can be expressed as: where one may either find by Gaussian elimination or compute directly by forward substitution. The latter technique enjoys greater numerical accuracy and lower computations.

To find a solution to the overdetermined () problem which minimizes the norm , first find the QR factorization of : . The solution can then be expressed as , where is an matrix containing the first columns of the full orthonormal basis and where is as before. Equivalent to the underdetermined case, back substitution can be used to quickly and accurately find this without explicitly inverting . ( and are often provided by numerical libraries as an "economic" QR decomposition.)

Generalizations

[edit]

Iwasawa decomposition generalizes QR decomposition to semi-simple Lie groups.

See also

[edit]

References

[edit]

Further reading

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
QR decomposition, also known as QR factorization, is a fundamental technique in linear algebra that decomposes a real or complex matrix $ A $ of size $ m \times n $ (with $ m \geq n $) into the product $ A = QR $, where $ Q $ is an $ m \times n $ matrix with orthonormal columns (i.e., $ Q^H Q = I_n $, with $ ^H $ denoting the conjugate transpose) and $ R $ is an $ n \times n $ upper triangular matrix.[1] This factorization preserves the column space of $ A $ in $ Q $ and captures the triangular structure in $ R $, making it particularly useful for numerical stability in computations.[2] The method was first systematically developed by Gene H. Golub in 1965 as a stable numerical tool for solving linear least squares problems, leveraging orthogonal transformations to avoid error amplification common in direct methods.[3] QR decomposition can be computed via the classical Gram-Schmidt process, which orthogonalizes the columns of $ A $ sequentially, or more stably using Householder reflections or Givens rotations, which introduce zeros below the diagonal of $ R $ through a series of unitary transformations.[2] For full rank matrices, $ Q $ can be extended to an $ m \times m $ orthogonal/unitary matrix, and variants like thin QR or rank-revealing QR handle rectangular or rank-deficient cases by incorporating column pivoting to reveal numerical rank.[4] Key applications of QR decomposition include solving overdetermined linear systems $ Ax = b $ by transforming them into triangular systems via $ Rx = Q^H b $, which is backward stable and avoids ill-conditioning issues in Gaussian elimination.[5] It also forms the basis for the QR algorithm in eigenvalue computation, where iterative applications yield the Schur decomposition for finding eigenvalues and eigenvectors of general matrices.[6] Additional uses span least squares optimization in data fitting, signal processing for MIMO systems, and preconditioning in iterative solvers for large-scale problems, highlighting its role in modern numerical linear algebra libraries like LAPACK.[7]

Definitions and Formulations

Square Matrices

For an n×nn \times n invertible matrix AA, the QR decomposition provides an orthogonal-triangular factorization A=QRA = QR, where QQ is an n×nn \times n orthogonal matrix satisfying QTQ=InQ^T Q = I_n and RR is an n×nn \times n upper triangular matrix with positive diagonal entries. The columns of QQ form an orthonormal basis for Rn\mathbb{R}^n, spanning the entire space and preserving lengths and angles under the transformation defined by QQ. This factorization leverages the properties of orthogonal matrices to simplify various matrix operations while maintaining numerical stability. The uniqueness of the QR decomposition for square invertible matrices follows directly from the conditions imposed: if A=Q1R1=Q2R2A = Q_1 R_1 = Q_2 R_2 with both R1R_1 and R2R_2 upper triangular and having positive diagonals, then Q1=Q2Q_1 = Q_2 and R1=R2R_1 = R_2. This uniqueness arises because the orthogonal factor QQ is determined by successively orthogonalizing the columns of AA without sign ambiguities on the diagonal of RR, ensuring a canonical form. Geometrically, the QR decomposition interprets AA as a composition of an orthogonal transformation followed by a triangular one: QQ rotates or reflects the standard orthonormal basis of Rn\mathbb{R}^n to align with an orthonormal basis for the column space of AA, while RR captures the relative scalings along these directions and the shearing effects between them. This view highlights how the decomposition separates the isometric (length-preserving) component from the volumetric changes encoded in RR. The QR decomposition for square matrices, based on the Gram-Schmidt orthogonalization process, was developed as a numerically stable tool in mid-20th-century numerical linear algebra, with Gene H. Golub advancing its applications in 1965.[3]

Rectangular Matrices

For a full rank m×nm \times n matrix AA with mnm \geq n, the QR decomposition takes the form A=QRA = QR, where QQ is an m×mm \times m orthogonal matrix and RR is an m×nm \times n upper triangular matrix whose first nn rows are nonzero and the remaining mnm - n rows are zero.[8] This full factorization extends the square case by embedding the decomposition into a larger orthogonal factor, preserving the property that QTQ=ImQ^T Q = I_m.[9] In practice, the economy or reduced QR decomposition is often preferred for efficiency, expressed as A=Q^R^A = \hat{Q} \hat{R}, where Q^\hat{Q} is an m×nm \times n matrix with orthonormal columns (satisfying Q^TQ^=In\hat{Q}^T \hat{Q} = I_n) and R^\hat{R} is an n×nn \times n upper triangular matrix.[8] This thin form avoids the unnecessary (mn)×(mn)(m - n) \times (m - n) block of identity in the full QQ, reducing storage and computation costs when mnm \gg n.[9] If AA has full column rank, R^\hat{R} is invertible, enabling direct computation of least squares solutions via back-substitution.[8] For the case m<nm < n (a fat matrix), the full QR decomposition is A=QRA = QR with QQ an m×mm \times m orthogonal matrix and RR an m×nm \times n upper triangular matrix.[8] Here, no reduced form is typically emphasized, as the orthogonal factor is already square and minimal in rows.[9] In contrast, for tall matrices (m>nm > n), the thin QR is the standard variant, focusing on the column space without excess dimensions.[8] The columns of QQ (or Q^\hat{Q} in the reduced form) form an orthonormal basis for the column space of AA, capturing its range exactly.[9] The upper triangular structure of RR (or R^\hat{R}) facilitates forward or backward substitution in applications like solving overdetermined systems, as the triangular form allows efficient triangular solves.[8] For the thin QR, the equation A=QRA = QR holds with QTQ=InQ^T Q = I_n and RR upper triangular, ensuring uniqueness up to signs in the diagonal of RR for full rank AA.[9] The QL, RQ, and LQ decompositions form a family of orthogonal-triangular factorizations analogous to the standard QR decomposition, differing primarily in the placement of the orthogonal and triangular components. These variants arise naturally in numerical linear algebra when alternative triangular structures facilitate specific algorithmic needs, such as processing matrices from the bottom-up or row-wise.[10] In the QL decomposition, an m×nm \times n matrix AA is factored as A=QLA = QL, where QQ is an m×mm \times m orthogonal matrix and LL is an m×nm \times n lower trapezoidal matrix. For square matrices, the diagonal elements of LL are conventionally chosen to be positive. This form is computed by applying orthogonal transformations in reverse order compared to QR, effectively orthogonalizing columns from the bottom. QL proves useful in eigenvalue computations, particularly in the QL step of algorithms for symmetric tridiagonal matrices, where it aids in implicit shifting for improved convergence.[11][12] The RQ decomposition factors A=RQA = RQ, with RR an m×nm \times n upper trapezoidal matrix and QQ an n×nn \times n orthogonal matrix. It corresponds to the transpose of the QR decomposition of ATA^T, thereby relating to row echelon forms when rows are prioritized. RQ is advantageous for row-wise processing in certain parallel or structured matrix algorithms, where upper triangular structure aligns with row operations.[13] The LQ decomposition expresses A=LQA = LQ, where LL is an m×nm \times n lower trapezoidal matrix and QQ is an n×nn \times n orthogonal matrix. As the dual of QR, it is obtained via the QR factorization of ATA^T followed by transposition. LQ is particularly suited to underdetermined systems, enabling the computation of minimum-norm solutions by providing an orthonormal basis for the row space.[14] For a square matrix ARn×nA \in \mathbb{R}^{n \times n}, the QL form is A=QLA = QL with LL lower triangular and positive diagonal entries, mirroring the QR convention but inverted in triangular orientation. The following table summarizes the factor positions across these decompositions:
DecompositionFormOrthogonal FactorTriangular FactorPrimary Utility
QRA=QRA = QRLeft-multipliedUpper, rightColumn space basis
QLA=QLA = QLLeft-multipliedLower, rightBottom-up eigenvalue steps
RQA=RQA = RQRight-multipliedUpper, leftRow-wise or echelon processing
LQA=LQA = LQRight-multipliedLower, leftMinimum-norm solutions

Algorithms for Computation

Gram-Schmidt Orthogonalization

The classical Gram-Schmidt orthogonalization process provides a direct method to compute the QR decomposition of a matrix ARm×nA \in \mathbb{R}^{m \times n} with full column rank by iteratively constructing an orthonormal basis for the column space of AA. The algorithm begins by setting the columns of AA as a1,,an\mathbf{a}_1, \dots, \mathbf{a}_n, and proceeds to generate the columns of QQ as q1,,qn\mathbf{q}_1, \dots, \mathbf{q}_n through successive projections and normalizations. Specifically, initialize q1=a1/a12\mathbf{q}_1 = \mathbf{a}_1 / \|\mathbf{a}_1\|_2, where 2\|\cdot\|_2 denotes the Euclidean norm. For each subsequent column k=2,,nk = 2, \dots, n, compute the projection coefficients rjk=qjTakr_{jk} = \mathbf{q}_j^T \mathbf{a}_k for j=1,,k1j = 1, \dots, k-1, subtract the sum of these projections from ak\mathbf{a}_k to obtain uk=akj=1k1rjkqj\mathbf{u}_k = \mathbf{a}_k - \sum_{j=1}^{k-1} r_{jk} \mathbf{q}_j, and normalize qk=uk/uk2\mathbf{q}_k = \mathbf{u}_k / \|\mathbf{u}_k\|_2 with the diagonal entry rkk=uk2r_{kk} = \|\mathbf{u}_k\|_2. The upper triangular matrix RR is then formed with off-diagonal entries rjkr_{jk} for j<kj < k and zeros below the diagonal.[15] This process ensures that QQ has orthonormal columns and A=QRA = QR, as each qk\mathbf{q}_k is orthogonal to the previous basis vectors by construction. To illustrate, consider the 2×22 \times 2 matrix A=(3112)A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}. First, a1=(31)\mathbf{a}_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}, so a12=10\|\mathbf{a}_1\|_2 = \sqrt{10} and q1=(3/101/10)\mathbf{q}_1 = \begin{pmatrix} 3/\sqrt{10} \\ 1/\sqrt{10} \end{pmatrix}, with r11=10r_{11} = \sqrt{10}. For a2=(12)\mathbf{a}_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, the projection coefficient r12=q1Ta2=5/10r_{12} = \mathbf{q}_1^T \mathbf{a}_2 = 5/\sqrt{10}. Then, u2=a2r12q1=(0.51.5)\mathbf{u}_2 = \mathbf{a}_2 - r_{12} \mathbf{q}_1 = \begin{pmatrix} -0.5 \\ 1.5 \end{pmatrix}, u22=5/2\|\mathbf{u}_2\|_2 = \sqrt{5/2}, and q2=(0.5/5/21.5/5/2)=(1/103/10)\mathbf{q}_2 = \begin{pmatrix} -0.5 / \sqrt{5/2} \\ 1.5 / \sqrt{5/2} \end{pmatrix} = \begin{pmatrix} -1/\sqrt{10} \\ 3/\sqrt{10} \end{pmatrix}, with r22=5/2r_{22} = \sqrt{5/2}. Thus, Q=(3/101/101/103/10)Q = \begin{pmatrix} 3/\sqrt{10} & -1/\sqrt{10} \\ 1/\sqrt{10} & 3/\sqrt{10} \end{pmatrix} and R=(105/1005/2)R = \begin{pmatrix} \sqrt{10} & 5/\sqrt{10} \\ 0 & \sqrt{5/2} \end{pmatrix}. The classical Gram-Schmidt algorithm is simple and intuitive, as it explicitly builds an orthonormal basis from the original columns through projections, making it easy to implement and understand conceptually. It directly produces the QQ factor with orthonormal columns, which is advantageous for applications requiring an explicit orthogonal basis. However, the algorithm suffers from numerical instability in finite-precision arithmetic, primarily due to the quadratic growth of rounding errors in the subtraction steps of the projections, leading to a loss of orthogonality in the computed QQ. This instability arises because errors in earlier qj\mathbf{q}_j propagate and amplify in subsequent orthogonalizations, potentially making the columns of QQ only approximately orthogonal for ill-conditioned matrices. A variant known as modified Gram-Schmidt addresses some instability by altering the order of projections and normalizations, and applying this process to the rows of AA (or equivalently, to ATA^T) yields an RQ decomposition where A=RQA = RQ with RR lower triangular. For improved stability without such issues, methods like Householder reflections are preferred in practice.

Householder Reflections

The Householder method for QR decomposition relies on a sequence of orthogonal Householder reflections to triangularize the matrix. For an m×nm \times n matrix AA with mnm \geq n, the algorithm applies nn Householder matrices HkH_k (for k=1,,nk = 1, \dots, n) from the left such that HnHn1H1A=RH_n H_{n-1} \cdots H_1 A = R, where RR is upper triangular and each HkH_k introduces zeros below the diagonal in the kk-th column of the current matrix. Each HkH_k acts on rows kk to mm and is defined to reflect the subvector x=A(k:m,k)x = A(k:m, k) onto a multiple of the first standard basis vector e1e_1, thereby zeroing out the subdiagonal entries in column kk. The product Q=H1H2HnQ = H_1 H_2 \cdots H_n is orthogonal, yielding the decomposition A=QRA = Q R. This approach was introduced by Alston S. Householder in his 1958 paper on unitary triangularization.[16] The core of each reflection is the Householder transformation, an orthogonal matrix of the form H=I2uuTuTuH = I - \frac{2 u u^T}{u^T u}, where uu is chosen to map the target subvector appropriately. Specifically, for the subvector xRmk+1x \in \mathbb{R}^{m-k+1}, compute σ=x2\sigma = \|x\|_2; set β=sign(x1)σ\beta = -\operatorname{sign}(x_1) \sigma to avoid cancellation; then u=xβe1u = x - \beta e_1. The transformation Hk=I2uuTuTuH_k = I - \frac{2 u u^T}{u^T u} satisfies Hkx=βe1H_k x = \beta e_1, ensuring the desired zeros while preserving the Euclidean norm. In practice, the reflector is applied to the trailing submatrix without forming HkH_k explicitly, using O(mn2)O(m n^2) floating-point operations overall for mnm \geq n, which reduces to O(n3)O(n^3) for square matrices. This efficiency stems from the rank-one update structure, allowing vectorized implementations in libraries like LAPACK. To illustrate, consider the 3×23 \times 2 matrix
A=(123456). A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}.
For the first column, x=(135)x = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}, x2=35\|x\|_2 = \sqrt{35}, β=35\beta = -\sqrt{35} (since sign(1)>0\operatorname{sign}(1) > 0), and u=(1+3535)u = \begin{pmatrix} 1 + \sqrt{35} \\ 3 \\ 5 \end{pmatrix}. The Householder matrix H1=I2uuTuTuH_1 = I - \frac{2 u u^T}{u^T u} is applied to AA, zeroing entries below the (1,1) pivot and updating the second column to (R1200)\begin{pmatrix} R_{12} \\ 0 \\ 0 \end{pmatrix}, where R11=35R_{11} = -\sqrt{35}. For the second column of the updated matrix, take the subvector from row 2: y=(y2y3)y = \begin{pmatrix} y_2 \\ y_3 \end{pmatrix} (computed from the update), compute its Householder vector v=yγe1v = y - \gamma e_1 with γ=sign(y2)y2\gamma = -\operatorname{sign}(y_2) \|y\|_2, and apply H2H_2 (a 3×33 \times 3 matrix with identity in the top-left) to zero below the (2,2) pivot, yielding the full upper triangular RR. The resulting QQ can be formed explicitly if needed, though in applications it is often used implicitly via the stored reflectors. This example demonstrates how each step progressively triangularizes one column at a time. The method is backward stable, meaning the computed R^\hat{R} satisfies Q^R^=A+ΔA\hat{Q} \hat{R} = A + \Delta A with ΔAϵO(n)A\|\Delta A\| \leq \epsilon O(n) \|A\| in the 2-norm under standard floating-point arithmetic, where ϵ\epsilon is machine epsilon; this stability arises because each Householder reflection is orthogonal and thus preserves norms exactly in exact arithmetic, with rounding errors bounded componentwise. It also inherently maintains orthogonality in QQ, avoiding the loss of orthogonality that can plague projection-based methods like classical Gram-Schmidt. These properties make it the preferred choice for dense matrices in numerical software.[17] However, storing the Householder vectors requires O(n2)O(n^2) space for an n×nn \times n matrix (each of the nn vectors has length decreasing from nn to 1), which is comparable to storing QQ explicitly but more compact. The sequential algorithm can be less cache-friendly due to the irregular access patterns in applying reflectors to trailing submatrices, leading to potential performance bottlenecks on modern processors. To address these, blocked variants group multiple reflectors into a single block transformation, enabling level-3 BLAS operations for better data locality and parallelism on multicore systems; the WY representation, where a product of kk Householder matrices is expressed as Qk=I+WYTQ_k = I + W Y^T with WW and YY block matrices, facilitates this by converting sequences of rank-one updates into matrix multiplications. For distributed-memory settings, particularly tall-skinny matrices (mnm \gg n), the TSQR algorithm applies Householder QR in a tree reduction fashion across processors, reducing communication from O(logp)O(\log p) to O(1)O(1) words per processor for pp processors while maintaining stability, and allows reconstruction of the implicit Householder vectors post-factorization. These extensions enhance scalability without sacrificing the core method's numerical reliability.[18][19]

Givens Rotations

Givens rotations provide an alternative approach to computing the QR decomposition by successively applying a series of plane rotations to the original matrix AA, transforming it into upper triangular form RR, such that QTA=RQ^T A = R where QQ is the product of the inverse rotations. This method, introduced by Wallace Givens, relies on unitary transformations that zero out one subdiagonal entry at a time, typically proceeding column by column from left to right and row by row from bottom to top within each column.[20] A Givens rotation matrix Gi,j(θ)G_{i,j}(\theta) is an orthogonal matrix that differs from the identity matrix only in the ii-th and jj-th rows and columns, forming a 2×2 rotation block:
Gi,j(θ)=(Ii1csIji1scImj), G_{i,j}(\theta) = \begin{pmatrix} I_{i-1} & & & \\ & c & & s & \\ & & I_{j-i-1} & & \\ & -s & & c & \\ & & & I_{m-j} & \end{pmatrix},
where c=cosθc = \cos \theta and s=sinθs = \sin \theta. To zero the entry ak,la_{k,l} (with k>lk > l) in column ll using a left multiplication by Gl,k(θ)G_{l,k}(\theta), the parameters are chosen as r=al,l2+ak,l2r = \sqrt{a_{l,l}^2 + a_{k,l}^2}, c=al,l/rc = a_{l,l}/r, and s=ak,l/rs = a_{k,l}/r (with appropriate sign conventions to avoid numerical cancellation). Each such rotation affects only the two specified rows, updating the relevant entries in the current matrix.[21] The full algorithm applies approximately n(n1)/2n(n-1)/2 such rotations for an n×nn \times n matrix, accumulating the transformations to form QQ. For implementation, the rotations are often stored explicitly or as parameters to reconstruct QQ later, and the process yields RR in place of AA. Unlike bulk transformations, this incremental zeroing allows targeted control over sparsity patterns.[21] Consider a 3×3 tridiagonal matrix
A=(120345067). A = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 5 \\ 0 & 6 & 7 \end{pmatrix}.
First, apply a Givens rotation in planes 1 and 2 to zero the (2,1) entry: r=12+32=10r = \sqrt{1^2 + 3^2} = \sqrt{10}, c=1/10c = 1/\sqrt{10}, s=3/10s = 3/\sqrt{10}. The updated matrix becomes
(10710531020105102067). \begin{pmatrix} \sqrt{10} & \frac{7\sqrt{10}}{5} & \frac{3\sqrt{10}}{2} \\ 0 & -\frac{\sqrt{10}}{5} & \frac{\sqrt{10}}{2} \\ 0 & 6 & 7 \end{pmatrix}.
Next, apply a rotation in planes 2 and 3 to zero the new (3,2) entry, using the current values in column 2 below the diagonal. This process yields an upper triangular RR. For banded matrices like this, adjacent-plane rotations (i.e., ij=1|i-j|=1) maintain the bandwidth throughout.[21] This method excels for banded or sparse matrices, as rotations between adjacent indices introduce minimal fill-in, preserving structure and reducing storage and computation compared to dense methods. Additionally, independent rotations (e.g., in non-overlapping row pairs) enable straightforward parallelization across processors or threads.[22] However, for fully dense matrices, the need for O(n2)O(n^2) rotations—each applied in O(n)O(n) time—results in an overall O(n3)O(n^3) complexity with a larger constant factor than Householder reflections, making it less efficient for such cases.[21]

Numerical Properties and Stability

Relation to Determinants and Eigenvalues

For a square matrix AA with QR decomposition A=QRA = QR, where QQ is an orthogonal matrix and RR is upper triangular, the determinant satisfies det(A)=det(Q)det(R)\det(A) = \det(Q) \det(R). Since QQ is orthogonal, det(Q)=±1\det(Q) = \pm 1, and det(R)\det(R) is the product of its diagonal entries, so det(A)=±i=1nrii\det(A) = \pm \prod_{i=1}^n r_{ii}. This relation highlights how the QR factorization encodes the signed product of the diagonal elements of RR as the determinant of AA. Furthermore, the absolute value of the determinant is preserved under orthogonal transformations, yielding det(A)=i=1nrii|\det(A)| = \prod_{i=1}^n |r_{ii}|. This equation links the volume scaling factor of AA to the magnitudes of the RR diagonals, as QQ preserves volumes while RR provides the triangular scaling. The eigenvalues of AA connect to the QR decomposition through the determinant: the product of the eigenvalues of AA equals det(A)\det(A), so up to the sign from det(Q)\det(Q), it matches i=1nrii\prod_{i=1}^n r_{ii}, and in absolute terms, i=1nλi=i=1nrii\prod_{i=1}^n |\lambda_i| = \prod_{i=1}^n |r_{ii}|. This property provides theoretical insight into eigenvalue magnitudes via the QR factors. The QR iteration, originating in the work of John G. F. Francis in 1961–1962, leverages repeated QR decompositions to converge toward a triangular form where the diagonal entries approximate the eigenvalues of the original matrix. In this process, the product of the evolving RR diagonals remains tied to the eigenvalue product (up to sign), facilitating the algorithm's relation to the Schur decomposition.

Column Pivoting for Stability

Column pivoting enhances the numerical stability of the QR decomposition by strategically reordering the columns of the matrix before applying orthogonal transformations. In the column-pivoted QR decomposition, the factorization takes the form $ A \Pi = QR $, where $ A $ is an $ m \times n $ matrix with $ m \geq n $, $ \Pi $ is an $ n \times n $ permutation matrix, $ Q $ is an $ m \times n $ matrix with orthonormal columns, and $ R $ is an $ n \times n $ upper triangular matrix. At each step of the algorithm, the permutation selects the column in the current trailing submatrix with the largest Euclidean norm as the pivot column, which mitigates the propagation of rounding errors and prevents premature loss of orthogonality in $ Q $. This approach was first proposed by Businger and Golub in 1965 as a means to improve the reliability of least squares computations on ill-conditioned systems.[23] The pivoting strategy is seamlessly integrated into standard QR algorithms, such as those based on Householder reflections or Givens rotations. Before applying the orthogonal transformation to zero out the subdiagonal elements in the pivot column, the columns of the submatrix are permuted to place the entry with the maximum norm in the pivot position. This process repeats for each column, ensuring that the diagonal elements of $ R $ are as large as possible relative to off-diagonal elements in the submatrices. Analysis shows that column pivoting bounds the growth of the elements in $ R $, providing a computed factorization that is backward stable with respect to small perturbations in $ A $, and it effectively controls the condition number of the leading principal submatrices of $ R $.[24] In the rank-revealing QR (RRQR) variant, stronger guarantees are achieved by selecting pivots that not only maximize norms but also minimize the ratio of singular values in the trailing submatrix, bounding the condition number of the $ k \times k $ leading subfactor $ R_{11} $ by $ 1 + f(m,n,k) (\sigma_k / \sigma_{k+1}) $, where $ f $ is a moderate function and $ \sigma $ are singular values. A primary benefit of column pivoting is its capacity to reveal the numerical rank of $ A $, which is crucial for handling rank-deficient matrices. After factorization, the numerical rank $ r $ is estimated by inspecting the diagonal elements of $ R $: specifically, $ r $ is the largest integer such that $ |r_{ii}| \geq \tau |A|_F $ for $ i = 1, \dots, r $, where $ \tau $ is a user-defined tolerance often set to a multiple of machine epsilon times the Frobenius norm $ |A|F $. Small $ |r{ii}| $ for $ i > r $ indicate linear dependence among the columns, allowing reliable detection of deficiency without computing the full singular value decomposition. This feature is particularly valuable in rank-deficient least squares problems, where unpivoted QR may fail to isolate the effective rank due to error accumulation. The column-pivoted QR decomposition provides revelation of the matrix structure for low-rank approximation. The form is $ A = Q R P^T $, where $ P = \Pi^T $ is the permutation matrix, $ Q $ has orthonormal columns spanning an approximation of the column space, and $ R $ is upper triangular. For numerical rank $ r $, the leading $ r \times r $ submatrix of $ R $ is well-conditioned, and the trailing part of $ R $ (with small diagonals) captures dependencies corresponding to the null space dimension. The first $ r $ columns of $ Q $ span the approximate column space, while an orthonormal basis for the null space can be obtained by solving the underdetermined system involving the trailing part of $ R $ and the corresponding permuted standard basis vectors. This enables precise rank determination and stable computations for underdetermined systems. While permutations introduce sign changes in the determinant (an even number of transpositions preserves the sign, odd reverses it), this effect is accounted for in applications requiring determinant evaluation. The decomposition is essential for robust numerical algorithms, as it ensures bounded error amplification in ill-conditioned or rank-deficient scenarios.

Applications

Solving Linear Systems

QR decomposition provides an effective method for solving linear systems $ Ax = b $, where $ A $ is an $ m \times n $ matrix. For square invertible matrices ($ m = n $), the factorization $ A = QR $ with $ Q $ orthogonal and $ R $ upper triangular transforms the system into $ Rx = Q^T b $. The vector $ Q^T b $ is computed using the orthogonal property of $ Q $ by applying the sequence of Householder reflections, and $ x $ is then obtained by back-substitution on the triangular system $ Rx = c $, where $ c = Q^T b $. This approach requires $ O(n^2) $ operations for the solve phase after computing the QR factorization.[25] In practice, explicit storage of $ Q $ is avoided to reduce memory usage; instead, the Householder vectors from the QR computation are stored, allowing $ Q^T b $ to be evaluated in $ O(mn) $ operations by applying a sequence of reflections. The solution can thus be expressed as $ x = R^{-1} (Q^T b) $, with back-substitution on $ R $ costing $ O(n^2) $ flops. This storage-efficient form is particularly useful in implementations like LAPACK. For overdetermined systems ($ m > n $), assuming $ A $ has full column rank, the thin QR decomposition $ A = QR $ (with $ Q $ $ m \times n $ having orthonormal columns and $ R $ $ n \times n $ upper triangular) yields the solution minimizing $ |Ax - b|_2 $ as $ x = R^{-1} (Q^T b) $. Here, $ Q^T b $ projects $ b $ onto the column space of $ A $, followed by back-substitution, with the same computational costs as the square case for the solve.[26] The numerical stability of this method stems from the backward stability of QR factorizations computed via Householder or Givens transformations, ensuring the computed $ \hat{x} $ satisfies $ (A + \Delta A) \hat{x} = b + \Delta b $ with $ |\Delta A|_2 / |A|_2 = O(\epsilon) $ and $ |\Delta b|_2 / |b|_2 = O(\epsilon) $, where $ \epsilon $ is machine epsilon. The forward error $ |x - \hat{x}|_2 / |x|_2 $ is then bounded by approximately $ \kappa_2(A) $ times the backward error, where $ \kappa_2(A) = |A|_2 |A^{-1}|_2 $ is the 2-norm condition number; thus, well-conditioned systems yield accurate solutions. Column pivoting in QR can further enhance stability for ill-conditioned $ A $.[27]

Least Squares Problems

The linear least squares problem seeks to find the vector $ x \in \mathbb{R}^n $ that minimizes the Euclidean norm of the residual $ |Ax - b|_2 $, where $ A $ is an $ m \times n $ matrix with $ m > n $ and full column rank, and $ b \in \mathbb{R}^m $. Given the thin QR decomposition $ A = QR $, where $ Q $ is an $ m \times n $ matrix with orthonormal columns and $ R $ is an $ n \times n $ upper triangular matrix, the least squares problem reduces to minimizing $ |Rx - Q^T b|_2 $. Since the columns of $ Q $ are orthonormal, this minimization is equivalent to the original problem, as $ |Ax - b|_2 = |Q(Rx - Q^T b) + (I - QQ^T)b|_2 = \sqrt{ |Rx - Q^T b|_2^2 + |(I - QQ^T)b|_2^2 } $, where the terms are orthogonal and the second term is the fixed component orthogonal to the column space of $ A $. The solution $ x $ is obtained by solving the upper triangular system $ Rx = Q^T b $ via back substitution, and the minimum residual norm is $ \sqrt{ |b|_2^2 - |Q^T b|_2^2 } $. This QR-based approach is numerically more stable than forming and solving the normal equations $ A^T A x = A^T b $, because the condition number of $ A^T A $ is $ \kappa(A)^2 $, which squares the sensitivity to perturbations in $ A $, whereas the QR method's backward stability is governed primarily by $ \kappa(A) $. In the rank-deficient case, where $ A $ does not have full column rank, a column-pivoted QR decomposition $ A \Pi = Q R $ (with permutation matrix $ \Pi $) reveals the numerical rank $ r $ by identifying the number of significantly large diagonal entries in $ R $ before a sharp drop-off. The minimum-norm least squares solution is then computed by solving the reduced upper triangular system $ R_{11} x_{\mathcal{B}} = Q^T b $ for the basic variables corresponding to the first $ r $ pivoted columns, setting the nonbasic variables to zero, where $ R_{11} $ is the leading $ r \times r $ submatrix of $ R $.[28] For illustration, consider the overdetermined system $ A = \begin{pmatrix} 1 & 0 \ 0 & 1 \ 1 & 1 \end{pmatrix} $, $ b = \begin{pmatrix} 1 \ 1 \ 3 \end{pmatrix} $. The thin QR decomposition is $ Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \ 0 & \frac{2}{\sqrt{6}} \ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{pmatrix} $, $ R = \begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \ 0 & \sqrt{\frac{3}{2}} \end{pmatrix} $ (up to signs), but solving $ Rx = Q^T b $ yields $ x = \begin{pmatrix} 4/3 \ 4/3 \end{pmatrix} $. The residual is $ Ax - b = \begin{pmatrix} 1/3 \ 1/3 \ -1/3 \end{pmatrix} $, with norm $ 1/\sqrt{3} \approx 0.577 $.

Other Uses in Numerical Linear Algebra

The QR algorithm serves as a cornerstone for computing the eigenvalues and Schur decomposition of nonsymmetric matrices in numerical linear algebra. Introduced by Francis, the method begins with a matrix A0=AA_0 = A and iteratively computes the QR factorization Ak=QkRkA_k = Q_k R_k, followed by the update Ak+1=RkQkA_{k+1} = R_k Q_k, preserving the eigenvalues of AA. Without modifications, the basic iteration reveals the eigenvalues on the diagonal as it converges slowly to upper triangular form; however, incorporating shifts—such as Wilkinson shifts—accelerates convergence to the real Schur form, where the matrix is quasi-triangular with 1×1 or 2×2 blocks on the diagonal corresponding to real or complex conjugate eigenvalue pairs. This shifted variant, also due to Francis, ensures quadratic convergence near the eigenvalues and is the basis for implementations in libraries like LAPACK.[29] To enhance computational efficiency before applying QR iterations, matrices are typically reduced to upper Hessenberg form, where all entries below the first subdiagonal are zero, using a sequence of Householder reflections. This reduction, which requires approximately 103n3\frac{10}{3}n^3 floating-point operations for an n×nn \times n matrix, preserves eigenvalues and minimizes fill-in during subsequent iterations, as each QR step on a Hessenberg matrix costs only O(n2)O(n^2) operations. The process applies n2n-2 Householder transformations to eliminate subdiagonal elements column by column, accumulating the reflectors into an orthogonal matrix QHQ_H such that A=QHHQHTA = Q_H H Q_H^T, where HH is upper Hessenberg; for symmetric matrices, this yields tridiagonal form. This preprocessing step is integral to the practical QR algorithm and was refined in early implementations to ensure numerical stability.[29][30] In the computation of the singular value decomposition (SVD), QR techniques play a key role through the Golub-Kahan bidiagonalization process, which preconditions the matrix for efficient iterative refinement. The algorithm applies alternating Householder reflections from the left and right to reduce a general m×nm \times n matrix AA to upper bidiagonal form B=PTAQB = P^T A Q, where PP and QQ are orthogonal, using O(mn2)O(mn^2) operations; this step leverages QR factorization principles to introduce structure while preserving singular values. Subsequent QR-like iterations, often with shifts, are then applied to the bidiagonal matrix to deflate and compute the singular values on the diagonal, converging rapidly due to the reduced bandwidth. This approach, foundational to SVD algorithms in numerical software, enables accurate computation even for ill-conditioned matrices by isolating small singular values early. QR decompositions are also updated efficiently for dynamic matrices in applications requiring incremental modifications, such as adaptive simulations or online learning. For rank-one updates, where a column or row is added or modified, stable algorithms reorthogonalize the Q factor using Gram-Schmidt processes with selective reorthogonalization to maintain orthogonality without full recomputation, achieving O(n2)O(n^2) cost per update. Downdating—removing a column or row—is handled similarly by solving for the inverse transformation, ensuring backward stability in finite precision arithmetic. These methods, exemplified in the Daniel-Gragg-Kaufman-Stewart framework, are crucial for maintaining QR factors in evolving least-squares problems without excessive overhead.

Generalizations and Extensions

Complex and Sparse Matrices

QR decomposition extends naturally to matrices with complex entries, where the factorization $ A = QR $ has $ Q $ as a unitary matrix satisfying $ Q^H Q = I $ (with $ ^H $ denoting the Hermitian transpose) and $ R $ upper triangular.[31] The standard algorithms, such as those based on Householder reflections or Givens rotations, adapt to the complex case by replacing transposes with Hermitian transposes in the relevant computations, ensuring the unitarity property while maintaining numerical stability similar to the real case.[32] For example, in the Householder approach, the reflection vector is chosen to zero out subdiagonal elements, with the reflector defined using complex conjugation to preserve the unitary structure.[31] For sparse matrices, QR decomposition must address the challenge of preserving the nonzero pattern to avoid excessive computational cost and storage overhead from fill-in during factorization. Algorithms apply Householder reflections or Givens rotations selectively—only to nonzeros that need elimination—rather than to the entire matrix, thereby minimizing disruptions to sparsity. Multifrontal methods assemble the factorization by processing frontal matrices corresponding to elimination trees, enabling efficient exploitation of sparsity through block operations on dense submatrices while controlling fill-in via ordering techniques.[33] Supernodal methods further optimize this by grouping columns with identical nonzero patterns into supernodes, reducing overhead in sparse BLAS operations. Software implementations support these extensions effectively. The LAPACK routine ZGEQRF computes the QR factorization for dense complex matrices, storing the essential information in a compact form for subsequent reconstruction of $ Q $.[32] For sparse cases, SuiteSparseQR provides a multifrontal, multithreaded implementation that uses Householder reflections within frontal matrices to achieve high performance while revealing rank through column pivoting.[33] A key challenge in sparse QR is fill-in minimization, where additional nonzeros arise during elimination; this is mitigated by preordering rows and columns using heuristics like minimum degree or minimum local fill to predict and reduce the growth in the sparsity pattern of $ R $.[34] Such orderings ensure that the factorization remains efficient for large-scale problems, though the choice depends on the matrix structure and desired trade-offs between fill-in and numerical stability.[34]

Block and Parallel Variants

To enhance performance on modern computer architectures with deep memory hierarchies, the classical Householder QR factorization has been extended to a blocked variant that partitions the matrix into submatrices, allowing the application of multiple Householder transformations as a single block reflector via BLAS level-3 operations like matrix-matrix multiplication (GEMM). This blocking strategy reduces memory accesses by promoting data reuse in cache, achieving up to several times higher flop rates compared to unblocked level-2 BLAS implementations. The LAPACK library implements this in the generic routine xGEQRF (e.g., DGEQRF for double precision), which computes the QR factorization while storing the Householder vectors and scalars compactly for subsequent explicit formation of Q if needed.[35][36] For large-scale matrices where the number of rows greatly exceeds the number of columns (m ≫ n), the tall-skinny QR (TSQR) algorithm provides a communication-optimal parallel variant that minimizes data movement in distributed environments. TSQR recursively applies QR factorizations to row-partitioned blocks of the matrix, forming local R factors that are then reduced via a binary tree structure to yield the global R, with the overall Q reconstructed from the tree of transformations. This approach reduces communication volume from O(m n^2 / √P) to O(n^2 √P) words for P processors, making it suitable for processors with limited bandwidth, and maintains the same numerical stability as classical Householder QR. The algorithm was introduced by Demmel et al. in their work on communication-avoiding linear algebra.[37] In distributed-memory systems using MPI, the ScaLAPACK library extends blocked QR to parallel settings through the routine PDGEQRF, which employs a block-cyclic distribution of the matrix across a processor grid to balance load and minimize communication. The algorithm proceeds by performing local QR panel factorizations on process columns, broadcasting Householder information, and updating the trailing submatrix via parallel GEMM and reductions, scaling efficiently up to thousands of processors for matrices up to petabyte scale. This implementation builds on the LAPACK blocked approach but incorporates collective operations like broadcasts and all-reduces to handle data distribution, as detailed in the foundational ScaLAPACK design by Choi, Dongarra, and colleagues.[38] For GPU acceleration, NVIDIA's cuSOLVER library provides high-performance implementations of blocked Householder QR via the cusolverDngeqrf routine, which mirrors LAPACK's interface and leverages CUDA kernels for panel factorization and trailing matrix updates, achieving high performance on NVIDIA GPUs for dense matrices up to thousands by thousands.[39] Additionally, cuBLAS supports batched QR through cublasgeqrfBatched for simultaneously factorizing multiple small or rectangular matrices, enabling efficient parallel processing of independent problems common in machine learning and simulations, with significant speedups, often 10x or more, over CPU baselines depending on matrix size and GPU model.[40][41] These GPU variants preserve the numerical properties of the blocked algorithm while exploiting massive thread parallelism for the inner loops. As of 2025, recent versions such as cuSOLVER 13.0 support advanced GPU architectures like Blackwell.[39]

References

User Avatar
No comments yet.