LAPACK
| LAPACK (Netlib reference implementation) | |
|---|---|
| Initial release | 1992 |
| Stable release | |
| Repository | |
| Written in | Fortran 90 |
| Type | Software library |
| License | BSD-new |
| Website | netlib |
LAPACK ("Linear Algebra Package") is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition.[1] LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008).[2] The routines handle both real and complex matrices in both single and double precision. LAPACK relies on an underlying BLAS implementation to provide efficient and portable computational building blocks for its routines.[1]: "The BLAS as the Key to Portability"
LAPACK was designed as the successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK. LINPACK, written in the 1970s and 1980s, was designed to run on the then-modern vector computers with shared memory. LAPACK, in contrast, was designed to effectively exploit the caches on modern cache-based architectures and the instruction-level parallelism of modern superscalar processors,[1]: "Factors that Affect Performance" and thus can run orders of magnitude faster than LINPACK on such machines, given a well-tuned BLAS implementation.[1]: "The BLAS as the Key to Portability" LAPACK has also been extended to run on distributed memory systems in later packages such as ScaLAPACK and PLAPACK.[3]
Netlib LAPACK is licensed under a three-clause BSD style license, a permissive free software license with few restrictions.[4]
Naming scheme
Subroutines in LAPACK have a naming convention which makes the identifiers very compact. This was necessary as the first Fortran standards only supported identifiers up to six characters long, so the names had to be shortened to fit into this limit.[1]: "Naming Scheme"
A LAPACK subroutine name is in the form pmmaaa, where:
- p is a one-letter code denoting the type of numerical constants used: S and D stand for real floating-point arithmetic in single and double precision respectively, while C and Z stand for complex arithmetic in single and double precision respectively. The newer LAPACK95 uses generic subroutines to overcome the need to specify the data type explicitly.
- mm is a two-letter code denoting the kind of matrix expected by the algorithm. The codes for the different kinds of matrices are reported below; the actual data are stored in a different format depending on the specific kind; e.g., when the code DI is given, the subroutine expects a vector of length n containing the elements on the diagonal, while when the code GE is given, the subroutine expects an n×n array containing the entries of the matrix.
- aaa is a one- to three-letter code describing the actual algorithm implemented in the subroutine; e.g., SV denotes a subroutine to solve a linear system, while R denotes a rank-1 update.
For example, the subroutine to solve a linear system with a general (non-structured) matrix using real double-precision arithmetic is called DGESV.[1]: "Linear Equations"
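Decoded, DGESV is D (double-precision real) + GE (general matrix) + SV (solve a linear system). The following minimal C program is a hedged illustration of such a call through LAPACKE, the standardized C interface discussed below; the build line is only one common possibility and depends on the installation.

```c
/* Minimal sketch: solve A x = b with DGESV via LAPACKE.
 * Build (one possibility): cc dgesv_demo.c -llapacke -llapack -lblas */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    /* 3x3 general matrix in row-major order. */
    double a[3 * 3] = { 2.0, 1.0, 1.0,
                        1.0, 3.0, 2.0,
                        1.0, 0.0, 0.0 };
    double b[3] = { 4.0, 5.0, 6.0 };   /* right-hand side */
    lapack_int ipiv[3];                /* pivot indices from the LU step */

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1,
                                    a, 3, ipiv, b, 1);
    if (info != 0) {                   /* info > 0: U is exactly singular */
        fprintf(stderr, "DGESV failed: info = %d\n", (int)info);
        return 1;
    }
    printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);
    return 0;
}
```

On exit the solution overwrites b and the array a holds the L and U factors, mirroring LAPACK's preference for in-place results.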
Use with other programming languages and libraries
Many programming environments today support the use of libraries with C binding (LAPACKE, a standardised C interface,[5] has been part of LAPACK since version 3.4.0[6]), allowing LAPACK routines to be used directly so long as a few restrictions are observed. Additionally, many other software libraries and tools for scientific and numerical computing are built on top of LAPACK, such as R,[7] MATLAB,[8] and SciPy.[9]
Several alternative language bindings are also available.
Implementations
As with BLAS, LAPACK is sometimes forked or rewritten to provide better performance on specific systems. Some of the implementations are:
- Accelerate
- Apple's framework for macOS and iOS, which includes tuned versions of BLAS and LAPACK.[10][11]
- Netlib LAPACK
- The official LAPACK.
- Netlib ScaLAPACK
- Scalable (distributed-memory) LAPACK, built on top of PBLAS.
- Intel MKL
- Intel's Math Kernel Library, which includes optimized LAPACK routines for its x86 CPUs.
- OpenBLAS
- Open-source reimplementation of BLAS and LAPACK.
- Gonum LAPACK
- A partial native Go implementation.
Since LAPACK typically calls underlying BLAS routines to perform the bulk of its computations, simply linking to a better-tuned BLAS implementation can be enough to significantly improve performance. As a result, LAPACK is not reimplemented as often as BLAS is.
Similar projects
These projects provide similar functionality to LAPACK, but with a main interface differing from that of LAPACK:
- Libflame
- A dense linear algebra library. Has a LAPACK-compatible wrapper. Can be used with any BLAS, although BLIS is the preferred implementation.[12]
- Eigen
- A header-only library for linear algebra. Has a BLAS and a partial LAPACK implementation for compatibility.
- MAGMA
- The Matrix Algebra on GPU and Multicore Architectures (MAGMA) project develops a dense linear algebra library similar to LAPACK but for heterogeneous and hybrid architectures, including multicore systems accelerated with GPGPUs.
- PLASMA
- The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) project is a modern replacement for LAPACK on multi-core architectures. PLASMA is a software framework for the development of asynchronous operations, featuring out-of-order scheduling with a runtime scheduler called QUARK that may be used for any code that expresses its dependencies as a directed acyclic graph.[13]
See also
- List of numerical libraries
- Math Kernel Library (MKL)
- NAG Numerical Library
- SLATEC, a FORTRAN 77 library of mathematical and statistical routines
- QUADPACK, a FORTRAN 77 library for numerical integration
References
1. Anderson, E.; Bai, Z.; Bischof, C.; Blackford, S.; Demmel, J.; Dongarra, J.; Du Croz, J.; Greenbaum, A.; Hammarling, S.; McKenney, A.; Sorensen, D. (1999). LAPACK Users' Guide (Third ed.). Philadelphia, PA: Society for Industrial and Applied Mathematics. ISBN 0-89871-447-8. Retrieved 28 May 2022.
2. "LAPACK 3.2 Release Notes". 16 November 2008.
3. "PLAPACK: Parallel Linear Algebra Package". www.cs.utexas.edu. University of Texas at Austin. 12 June 2007. Retrieved 20 April 2017.
4. "LICENSE.txt". Netlib. Retrieved 28 May 2022.
5. "The LAPACKE C Interface to LAPACK". LAPACK — Linear Algebra PACKage. Retrieved 22 September 2024.
6. "LAPACK 3.4.0". LAPACK — Linear Algebra PACKage. Retrieved 22 September 2024.
7. "R: LAPACK Library". stat.ethz.ch. Retrieved 19 March 2022.
8. "LAPACK in MATLAB". Mathworks Help Center. Retrieved 28 May 2022.
9. "Low-level LAPACK functions". SciPy v1.8.1 Manual. Retrieved 28 May 2022.
10. "Guides and Sample Code". developer.apple.com. Retrieved 7 July 2017.
11. "Guides and Sample Code". developer.apple.com. Retrieved 7 July 2017.
12. "amd/libflame: High-performance object-based library for DLA computations". GitHub. AMD. 25 August 2020.
13. "ICL". icl.eecs.utk.edu. Retrieved 7 July 2017.
History
Origins and Development
LAPACK emerged in the late 1980s to early 1990s as a successor to the LINPACK library, released in 1979 and optimized for solving systems of linear equations on early vector machines, and EISPACK, developed between 1972 and 1978 to provide comprehensive eigenvalue and eigenvector routines. These predecessors, while foundational for numerical linear algebra, exhibited limitations in efficiency due to their reliance on unblocked algorithms, which led to poor memory access patterns and underutilization of hierarchical memory systems on advancing high-performance computers, including vector processors like the Cray-1 and nascent parallel architectures.[3][4]

The project officially commenced in 1990 as a collaborative effort involving the University of Tennessee at Knoxville, Argonne National Laboratory, and the Numerical Algorithms Group (NAG) in the United Kingdom, building on preliminary prototype discussions during workshops in 1989. This partnership was supported by funding from the National Science Foundation and aimed to unify and modernize the capabilities of its predecessors for contemporary computational environments.[1][4]

Central to LAPACK's design motivations was the shift toward block-based algorithms, which partition matrices into subblocks to minimize data transfers between fast cache and slower main memory, thereby enhancing computational efficiency. The library prioritized reusing verified code from LINPACK and EISPACK where possible to maintain reliability, while emphasizing portability across diverse hardware architectures through a modular structure that interfaces with the Basic Linear Algebra Subprograms (BLAS).[3][4]

Among the early challenges was the fundamental transition from the unblocked algorithms of the prior libraries to blocked variants, enabling the effective exploitation of Level 3 BLAS routines for matrix-matrix operations, which deliver superior performance on shared-memory multiprocessors by maximizing arithmetic intensity and reducing memory bandwidth demands. This redesign required extensive restructuring to balance numerical stability, algorithmic flexibility, and hardware adaptation without compromising the robustness of the original implementations.[3][4]

Key Contributors and Milestones
The development of LAPACK was led by a core team of architects, including Jack Dongarra from the University of Tennessee and Argonne National Laboratory, James Demmel from the University of California, Berkeley, Iain Duff from the Science and Technology Facilities Council (formerly associated with NAG Ltd.), and Sven Hammarling from NAG Ltd.[5][6] Additional early contributors included Victor Eijkhout from the University of Tennessee and Linda Kaufman, who provided key input on algorithms and implementation details.[7][8]

LAPACK's creation involved a collaborative structure supported in part by funding from the U.S. Department of Energy (DOE) and the National Science Foundation (NSF), fostering contributions from academic, government, and industry partners.[1] By 2000, the project had engaged over 50 contributors worldwide, with ongoing development sustained through working groups and community feedback mechanisms.[5][9]

Key milestones include the first public release of version 1.0 on February 29, 1992, which established LAPACK as a portable library for high-performance computing.[10] In the late 1990s, enhancements such as improved condition estimation routines were integrated, drawing from LAPACK Working Notes on robust incremental methods to enhance numerical stability.[11] The 2000s saw a focus on compliance with IEEE 754 floating-point standards, particularly in version 3.0 (released June 30, 1999, with updates through May 31, 2000), enabling better handling of NaN and infinity values across diverse hardware.[12][10] During the 2010s, additions for multicore processor support were prioritized, notably in version 3.3.0 (November 14, 2010), which optimized routines for parallel execution on shared-memory systems.[10]

The University of Tennessee served as the primary host for LAPACK's development, coordinating releases and maintenance through its Innovative Computing Laboratory.[5] NAG Ltd. played a crucial role in testing, validation, and contributing specialized routines, ensuring reliability across implementations.[1]

Release History
The Linear Algebra Package (LAPACK) has undergone periodic updates since its initial public release, with major versions typically issued every 2 to 5 years to incorporate algorithm refinements, enhance numerical stability, adapt to evolving hardware architectures, and address community-submitted issues through the Netlib repository.[10] These releases maintain backward compatibility where possible while introducing new drivers, deprecating obsolete or unsafe routines, and improving testing frameworks. Distribution has been hosted on Netlib since 1992, with a GitHub mirror established in 2010 to facilitate version control and collaborative development.[1][13]

The inaugural version, LAPACK 1.0, was released on February 29, 1992, providing foundational routines for linear algebra operations such as solving systems of equations and computing eigenvalues, building directly on BLAS for efficient implementation.[10] Subsequent minor revisions, including 1.0a (June 30, 1992), 1.0b (October 31, 1992), and 1.1 (March 31, 1993), focused on bug fixes and minor enhancements to core functionality. Version 2.0 followed on September 30, 1994, expanding the library with additional solvers and improved workspace management.[10]

LAPACK 3.0 marked a significant milestone, released on June 30, 1999, with updates in October 1999 and May 2000 introducing enhanced iterative refinement techniques and better support for generalized eigenvalue problems.[10] Version 3.1.0, issued November 12, 2006 (followed by 3.1.1 on February 26, 2007), improved handling of deprecated routines through thread-safe modifications, including removal of SAVE statements and partial column norm updates for QR factorizations.[14] The 3.2 series (November 18, 2008, with revisions to 3.2.2 in June 2010) transitioned core routines to Fortran 90 for better modularity.[14]

Further advancements in version 3.3.0 (November 14, 2010; 3.3.1 April 18, 2011) included a comprehensive Fortran 90 rewrite, new C interfaces for broader accessibility, and adaptations for multicore processors via Level-3 BLAS optimizations and thread-safe utilities like xLAMCH.[14] LAPACK 3.4.0 (November 11, 2011; up to 3.4.2 September 25, 2012) refined these for shared-memory parallelism. Version 3.5.0, released November 19, 2013, added new drivers for symmetric eigenproblems using rook pivoting (xSYSV_ROOK) and improved complete orthogonal decompositions for tall matrices.[14]

The 3.6.0 release (November 15, 2015; 3.6.1 June 18, 2016) and 3.7.0 (December 24, 2016; 3.7.1 June 25, 2017) emphasized numerical accuracy in iterative solvers and bug resolutions from community feedback.[10] LAPACK 3.8.0 (November 17, 2017) prepared the ground for subsequent deprecations. In 3.9.0 (November 21, 2019; 3.9.1 April 1, 2021), unsafe routines were deprecated to promote more robust alternatives, alongside additions like QR-preconditioned SVD (xGESVDQ) and Householder reconstruction updates.[15][14] More recent versions include 3.10.0 (June 28, 2021; 3.10.1 April 12, 2022), which integrated faster least-squares algorithms, and 3.11.0 (November 11, 2022), featuring Level-3 BLAS solvers for triangular systems and enhancements to QZ algorithms and Givens rotations.
LAPACK 3.12.0, released November 24, 2023, introduced enhanced condition number estimators, Dynamic Mode Decomposition (xGEDMD), and truncated QR with column pivoting for large-scale data applications.[16] The latest, 3.12.1 (January 8, 2025), delivers minor compatibility fixes, errata corrections, and bolstered testing suites without architectural overhauls.[10]

Overview
Purpose and Design Principles
LAPACK is a standard software library designed for high-performance numerical linear algebra computations, primarily targeting dense and banded matrices. It provides efficient solutions to common problems such as linear equation factorizations (e.g., LU, Cholesky) and matrix decompositions (e.g., QR, SVD, eigenvalue problems), optimized for modern computer architectures including vector processors and shared-memory parallel systems. By reorganizing classical algorithms from predecessors like LINPACK and EISPACK, LAPACK achieves greater efficiency through reduced data movement and improved cache utilization, making it suitable for scientific and engineering applications requiring robust matrix operations in real and complex arithmetic.[1][17]

The design principles of LAPACK emphasize portability, performance, and reliability. Written in Fortran 77 (with later versions incorporating Fortran 90 features), it ensures transportability across diverse Fortran compilers and hardware platforms by avoiding machine-specific code and relying on standardized interfaces. A key principle is its dependence on the Basic Linear Algebra Subprograms (BLAS) for low-level operations, particularly Level 3 BLAS for block matrix computations, which allows vendors to provide optimized implementations tailored to specific hardware, thereby maximizing performance without altering the LAPACK source code. Backward compatibility with earlier libraries is maintained to facilitate migration, while numerical stability is prioritized through the use of proven algorithms and built-in error handling mechanisms, such as condition number estimators to assess solution reliability. LAPACK assumes adherence to the IEEE 754 floating-point standard for consistent behavior across systems.[1][17]

In terms of scope, LAPACK focuses on serial and shared-memory parallel computations, excluding distributed-memory environments (addressed separately by ScaLAPACK), and it does not cover general sparse matrices. Performance goals center on block-based algorithms that partition matrices into blocks to minimize memory accesses and enable efficient BLAS calls, targeting optimal asymptotic complexity for solving linear systems where feasible. Tunable workspace parameters allow users to balance computation speed and memory usage, ensuring adaptability to varying hardware constraints.[1][17]
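To make the block-based idea concrete, the following is a hedged C sketch (not LAPACK's actual DPOTRF source) of a blocked Cholesky factorization $A = LL^T$: a small LAPACK call factors each diagonal block, while Level 3 BLAS (dtrsm, dsyrk) performs the panel solve and the trailing update where most of the arithmetic occurs. The function name and block size are illustrative, and CBLAS/LAPACKE headers are assumed to be available.

```c
/* A sketch (not LAPACK's actual source) of a blocked Cholesky
 * factorization A = L * L^T, showing how LAPACK-style routines
 * delegate the bulk of the work to Level 3 BLAS. */
#include <stddef.h>
#include <cblas.h>
#include <lapacke.h>

/* Factor the lower triangle of the n x n row-major matrix a in place.
 * Returns 0 on success, or a positive index if a diagonal block is not
 * positive definite (mirroring LAPACK's INFO convention). */
static int blocked_cholesky(double *a, int n, int nb) {
    for (int k = 0; k < n; k += nb) {
        int kb = (n - k < nb) ? (n - k) : nb;

        /* 1. Factor the kb x kb diagonal block (small LAPACK call). */
        int info = LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'L', kb,
                                  &a[(size_t)k * n + k], n);
        if (info != 0) return info + k;

        int m = n - k - kb;   /* rows below the diagonal block */
        if (m > 0) {
            /* 2. Panel solve: L21 := A21 * L11^{-T} (Level 3 BLAS). */
            cblas_dtrsm(CblasRowMajor, CblasRight, CblasLower,
                        CblasTrans, CblasNonUnit, m, kb,
                        1.0, &a[(size_t)k * n + k], n,
                        &a[(size_t)(k + kb) * n + k], n);
            /* 3. Trailing update: A22 := A22 - L21 * L21^T
             *    (dsyrk, where most of the flops are spent). */
            cblas_dsyrk(CblasRowMajor, CblasLower, CblasNoTrans,
                        m, kb, -1.0,
                        &a[(size_t)(k + kb) * n + k], n,
                        1.0, &a[(size_t)(k + kb) * n + (k + kb)], n);
        }
    }
    return 0;
}
```

With a tuned BLAS, the dsyrk update dominates the runtime, which is why simply linking an optimized BLAS can speed up LAPACK substantially.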
Relation to BLAS and Predecessors

LAPACK emerged as a successor to two foundational libraries: LINPACK, which focused on solving general systems of linear equations and was developed during the era of Level 1 and early Level 2 BLAS, emphasizing column-oriented operations for dense matrices; and EISPACK, a standalone collection of eigenvalue routines that operated independently without BLAS integration, often resulting in less efficient implementations on emerging high-performance architectures.[18][19] By integrating and modernizing the algorithms from both predecessors, LAPACK created a cohesive package that addressed their fragmentation (LINPACK handled linear systems but not eigenvalues comprehensively, while EISPACK lacked broader linear algebra support) while leveraging Level 3 BLAS for block-based operations to enhance performance on vector and parallel processors.[18][20]

Central to LAPACK's design is its deep integration with the Basic Linear Algebra Subprograms (BLAS): high-level LAPACK routines invoke BLAS subprograms, such as DGEMM for general matrix multiplication, to perform the core computational kernels.[21] This separation allows LAPACK to focus on algorithmic logic while delegating low-level operations to BLAS, which can be replaced with machine-optimized implementations without modifying the LAPACK source code, thereby promoting portability across diverse hardware platforms.[21][19]

This evolution from LINPACK and EISPACK yielded significant benefits, including improved computational efficiency through blocking techniques that exploit Level 3 BLAS for matrix-matrix operations, achieving significant performance improvements on vector machines compared to the approaches of its predecessors.[20][18] LAPACK also provides a unified interface for tackling mixed linear algebra problems, such as combining equation solving with eigenvalue computations, although Release 3.0 did not include all facilities from LINPACK and EISPACK, focusing instead on a streamlined set of routines for modern efficiency.[18] Despite these advances, LAPACK inherits limitations from its BLAS foundation, remaining primarily Fortran-centric in its implementation and assuming column-major storage order, which aligns with Fortran conventions but may require adaptations for other languages or row-major formats.[21][19]

Naming Conventions
Routine Naming Scheme
The LAPACK routine naming scheme employs a compact six-character convention to encode the data type, matrix characteristics, and computational task, facilitating quick identification of subroutine functionality within the constraints of Fortran 77 naming limits.[22] Each routine name follows the pattern XYYZZZ, where the first character (X) specifies the precision and data type, the next two characters (YY) denote the matrix type, and the following three characters (ZZZ) indicate the specific operation or task.[22] This structure was designed to promote portability and consistency across implementations.[22]

The first character distinguishes between real and complex numbers as well as single and double precision: 'S' for single-precision real, 'D' for double-precision real, 'C' for single-precision complex, and 'Z' for double-precision complex (also known as complex*16).[22] The matrix type (YY) uses standardized codes such as 'GE' for general dense matrices, 'SY' or 'HE' for symmetric or Hermitian matrices (real or complex), 'TR' for triangular matrices, 'OR' or 'UN' for orthogonal or unitary matrices from QR factorizations, and 'PO' for positive definite matrices.[22] The task portion (ZZZ) describes the computation, for example 'TRF' for LU or Cholesky factorization, 'SV' for solving linear systems, 'EV' for eigenvalue problems, 'VD' for singular value decomposition, or 'BRD' for bidiagonal reduction.[23]

Representative examples illustrate the scheme: DGETRF performs LU factorization on a double-precision general real matrix (D for double, GE for general, TRF for triangular factorization); ZHEEV computes eigenvalues and eigenvectors for a double-precision complex Hermitian matrix (Z for double complex, HE for Hermitian, EV for eigenproblem).[22] Routine names are written in uppercase for Fortran compatibility, and auxiliary or unblocked variants may append a digit like '2' (e.g., SGETF2 as the unblocked version of SGETRF) or follow BLAS naming patterns in some cases.[22] Driver routines, which orchestrate complete solutions (e.g., xGESV for solving general linear systems), often lack additional suffixes, while computational routines for specific steps (e.g., xGETRF for factorization) form the core building blocks, as illustrated below.[22]

This naming convention was established with the initial release of LAPACK version 1.0 in 1992 and has remained largely consistent, with minor extensions for new routines in subsequent versions, such as support for generalized eigenvalue problems in version 3.0 (1999).[22][24]
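Reflecting the driver/computational split, the following hedged C sketch (assuming the LAPACKE interface) factors a matrix once with DGETRF and reuses the factors with DGETRS for two right-hand sides:

```c
/* Sketch: DGETRF (D + GE + TRF) computes PA = LU once; DGETRS (TRS)
 * then reuses the factors for several right-hand sides. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double a[2 * 2] = { 4.0, 3.0,
                        6.0, 3.0 };
    double b[2 * 2] = { 10.0, 1.0,    /* two right-hand sides, */
                        12.0, 3.0 };  /* stored as columns     */
    lapack_int ipiv[2];

    /* Factor PA = LU once (computational routine). */
    lapack_int info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, 2, 2, a, 2, ipiv);
    if (info != 0) return 1;

    /* Solve for both right-hand sides using the stored factors. */
    info = LAPACKE_dgetrs(LAPACK_ROW_MAJOR, 'N', 2, 2, a, 2, ipiv, b, 2);
    if (info != 0) return 1;

    printf("x1 = (%g, %g), x2 = (%g, %g)\n", b[0], b[2], b[1], b[3]);
    return 0;
}
```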
Precision and Storage Conventions

LAPACK supports four precision variants for its routines, distinguished by a prefix in the routine name: single-precision real (S), double-precision real (D), single-precision complex (C), and double-precision complex (Z).[22] These variants share identical algorithmic logic, with the primary difference being the scaling of the underlying arithmetic operations to match the precision level; for instance, double-precision routines are automatically generated from single-precision templates using tools like Toolpack/1 to ensure consistency.[25] This templating approach allows users to select the appropriate precision based on the required accuracy and computational resources, while maintaining portability across systems that support standard floating-point representations.[25]

Matrices in LAPACK are stored in column-major order, aligning with Fortran's default convention and facilitating efficient access patterns for block-based algorithms.[26] For general dense matrices, storage uses full two-dimensional arrays, where leading dimensions may exceed the matrix size to accommodate subarrays or extra space per Fortran 77 rules.[26] Specialized formats optimize storage for structured matrices: banded storage for matrices with limited bandwidth; triangular storage for upper or lower triangular matrices, holding only the relevant portion; and, for symmetric or Hermitian matrices, either full storage of the entire array or packed storage that compacts the redundant elements into a one-dimensional array.[26] Workspace requirements vary by routine and are specified via parameters like LWORK, which indicates the size of the work array; users must allocate sufficient space to avoid errors, with routines often providing mechanisms to query the optimal size.[27]

LAPACK's numerical computations assume adherence to the IEEE 754 floating-point standard, particularly for handling infinities, NaNs, and roundoff errors, as configured by default in auxiliary routines like ILAENV.[28] Error handling is managed through the INFO output parameter in most routines: INFO = 0 indicates successful completion, negative values signal invalid input arguments (with the absolute value specifying the offending parameter), and positive values denote computational issues such as singularity or non-convergence.[29] For reliability assessment, many routines compute and return estimates like reciprocal condition numbers (RCOND) or residuals to quantify solution accuracy, enabling users to evaluate error propagation relative to machine epsilon.[29]

To enhance efficiency, LAPACK prioritizes in-place operations wherever feasible, allowing input matrices to be overwritten with results and thereby minimizing memory allocations and copies.[30] A key optimization for workspace management is the query mode, invoked by setting LWORK = -1 (or the equivalent for other work parameters), which runs the routine without performing the full computation and returns the optimal workspace size in the first element of the work array, guiding users toward memory-efficient calls, as sketched below.[27]
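The LWORK = -1 query is easiest to see in code. The hedged C sketch below uses LAPACKE's middle-level _work interface, which exposes the work array directly; the matrix sizes are illustrative.

```c
/* Sketch of the LWORK = -1 workspace query for DGEQRF. */
#include <stdlib.h>
#include <lapacke.h>

int main(void) {
    lapack_int m = 100, n = 50, lda = m;
    double *a   = calloc((size_t)lda * n, sizeof *a);
    double *tau = calloc((size_t)n, sizeof *tau);
    double query;

    /* With lwork = -1 no factorization is performed: the optimal
     * workspace size is returned in the first work element. */
    LAPACKE_dgeqrf_work(LAPACK_COL_MAJOR, m, n, a, lda, tau, &query, -1);

    lapack_int lwork = (lapack_int)query;
    double *work = malloc((size_t)lwork * sizeof *work);

    /* Real call with the optimally sized workspace. */
    lapack_int info = LAPACKE_dgeqrf_work(LAPACK_COL_MAJOR, m, n,
                                          a, lda, tau, work, lwork);
    free(work);
    free(tau);
    free(a);
    return info != 0;
}
```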
Core Functionality
Linear Equation Systems
LAPACK provides routines for solving systems of linear equations of the form $Ax = b$, where $A$ is a dense square matrix of order $n$ and $x$ and $b$ are vectors of length $n$; the system may have multiple right-hand sides stored as the columns of a matrix $B$. These routines support general matrices, symmetric indefinite matrices, symmetric positive definite matrices, and triangular matrices, enabling efficient solutions through matrix factorizations that can be reused for multiple right-hand sides. The library emphasizes numerical stability and efficiency on high-performance architectures by relying on block-based algorithms that call Level 3 BLAS operations.[31]

For general matrices, the simple driver routine DGESV computes the solution by first performing LU factorization with partial pivoting using DGETRF, which decomposes the permuted matrix as $PA = LU$, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $U$ is upper triangular, followed by forward and back substitution using DGETRS to solve $Ly = Pb$ and $Ux = y$. The expert driver DGESVX extends this by including optional row and column scaling (equilibration) to handle ill-conditioned systems, iterative refinement to improve accuracy, and estimation of condition numbers and error bounds. These steps ensure backward stability, with the residual norm estimable via routines like DLACON for monitoring solution quality. Error handling is managed through an INFO flag returned by the drivers: INFO = 0 indicates success, INFO > 0 signals a singular or nearly singular pivot (indicating rank deficiency), and INFO < 0 denotes an illegal argument in the calling sequence.[32]

Symmetric indefinite systems are addressed by drivers such as DSYSV, which uses the Bunch-Kaufman factorization via DSYTRF to decompose $A$ as $LDL^T$, with $D$ block diagonal containing 1×1 or 2×2 blocks and $L$ a product of permutation and unit lower (or upper) triangular factors, preserving symmetry while maintaining stability through symmetric pivoting, followed by DSYTRS for the solve phase. For symmetric positive definite matrices, DPOSV employs Cholesky factorization with DPOTRF, yielding $A = U^T U$ (or $A = LL^T$), where $U$ (or $L$) is upper (lower) triangular, and then DPOTRS for efficient triangular solves, offering reduced computational cost compared to general methods due to the positive definiteness assumption. Both drivers support multiple right-hand sides and reuse of the factorization. The expert variants DSYSVX and DPOSVX incorporate equilibration, iterative refinement, and reciprocal condition number estimates for enhanced reliability on ill-conditioned problems.[31]

Triangular systems $Tx = b$, arising after factorization, are solved directly using routines like DTRSV, which performs forward or back substitution on an upper or lower triangular matrix $T$, with options for unit or non-unit diagonals and transposition; this routine is invoked internally by the drivers but can be called independently when the triangular factor is available. Overall, these routines prioritize direct methods for dense matrices, with iterative refinement steps, using techniques like those in DGECON for norm estimation, to achieve high relative accuracy, typically on the order of machine epsilon times the condition number.[33]
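The INFO convention can be seen in this hedged C sketch of the Cholesky driver DPOSV (assuming LAPACKE): info = 0 means success, info > 0 reports the order of the leading minor that is not positive definite, and info < 0 flags the offending argument.

```c
/* Sketch: DPOSV solves A x = b for symmetric positive definite A. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double a[2 * 2] = { 4.0, 2.0,
                        2.0, 3.0 };   /* symmetric positive definite */
    double b[2] = { 6.0, 5.0 };

    lapack_int info = LAPACKE_dposv(LAPACK_ROW_MAJOR, 'L', 2, 1,
                                    a, 2, b, 1);
    if (info > 0)
        fprintf(stderr, "leading minor %d not positive definite\n",
                (int)info);
    else if (info == 0)
        printf("x = (%g, %g)\n", b[0], b[1]);   /* (1, 1) here */
    return (int)info;
}
```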
Eigenvalue and Singular Value Decomposition

LAPACK provides a comprehensive set of routines for solving eigenvalue problems, encompassing both standard and generalized forms for various matrix types, including symmetric, Hermitian, and general real or complex matrices. The standard eigenvalue problem seeks scalars $\lambda$ and nonzero vectors $x$ satisfying $Ax = \lambda x$, while the generalized problem extends this to $Ax = \lambda Bx$, where $A$ and $B$ are square matrices. For symmetric or Hermitian matrices, eigenvalues are real, enabling specialized algorithms that exploit structure for efficiency and accuracy.[34][35]

For symmetric and Hermitian eigenvalue problems, LAPACK drivers such as DSYEVD and DSYEVR reduce the matrix to tridiagonal form using orthogonal transformations (e.g., via DSYTRD or CHETRD), followed by computation of eigenvalues and optionally eigenvectors using the divide-and-conquer algorithm in DSTEDC or the relatively robust representation (RRR) method in DSTEMR/DSTEGR, which ensures high relative accuracy even for clustered eigenvalues. These routines output eigenvalues in ascending order in an output array and eigenvectors either overwriting the original matrix or in a separate array. For generalized symmetric-definite problems ($Ax = \lambda Bx$ with $B$ positive definite), drivers like DSYGVD or DSYGVR perform a Cholesky factorization of $B$ to reduce the problem to standard form, followed by similar tridiagonal eigensolving steps.[34][35][36]

General nonsymmetric eigenvalue problems are addressed through routines like DGEEV, which computes eigenvalues and right eigenvectors by reducing $A$ to upper Hessenberg form (via DGEHRD) and applying the multishift QR iteration (HQR algorithm) to obtain the Schur form, with eigenvalues appearing on the diagonal of a quasi-triangular matrix. DGEEV also balances the matrix via scaling to improve stability and accuracy before reduction. For problems requiring an ordered Schur decomposition, DGEES computes the Schur form with optional sorting based on user-defined criteria. In the generalized nonsymmetric case ($Ax = \lambda Bx$), DGGEV and DGGES employ the QZ algorithm to compute a generalized Schur decomposition ($A = QSZ^T$, $B = QTZ^T$), with balancing via scaling of $A$ and $B$ to mitigate ill-conditioning; outputs include eigenvalue pairs $(\alpha_j, \beta_j)$ with $\lambda_j = \alpha_j/\beta_j$, right and left eigenvectors, and deflating subspaces. Expert routines, such as those with workspace query options (e.g., the LWORK and IWORK parameters), allow fine control over memory allocation. Condition numbers for eigenvalues can be estimated using matrix norms computed with DLANGE.[34][37]

Singular value decomposition (SVD) in LAPACK computes the factorization $A = U \Sigma V^T$ (or $A = U \Sigma V^H$ in the complex case) for an $m \times n$ general real or complex matrix $A$, where $U$ and $V$ are orthogonal or unitary and $\Sigma$ is diagonal with nonnegative singular values in descending order. The drivers DGESVD and the faster divide-and-conquer variant DGESDD first bidiagonalize $A$ using Householder transformations (via DGEBRD), then apply QR iteration (DBDSQR) or divide-and-conquer (DBDSDC) to the bidiagonal form for singular values and, optionally, vectors. These support the full SVD (all columns of $U$ and $V$) or partial (thin or truncated) forms for economy. Balancing via scaling precedes bidiagonalization to enhance numerical stability, particularly for ill-conditioned matrices. Outputs include the singular values in a vector, left singular vectors in $U$, and right singular vectors in $V^T$, with expert control over workspace via query parameters; singular vector condition numbers can be assessed using DLANGE on relevant submatrices.
For generalized SVD of matrix pairs $(A, B)$, routines like DGGSVD compute decompositions involving shared orthogonal factors, with error bounds on the order of machine epsilon divided by the reciprocal condition number.[38][39][40] The key drivers are summarized below, followed by a short usage sketch.

| Category | Key Driver Routines | Primary Algorithm |
|---|---|---|
| Symmetric/Hermitian Eigenvalues | DSYEVD, DSYEVR | Tridiagonal reduction + divide-and-conquer or RRR |
| Generalized Symmetric-Definite Eigenvalues | DSYGVD, DSYGVR | Cholesky reduction + standard symmetric solver |
| General Nonsymmetric Eigenvalues | DGEEV, DGEES | Hessenberg reduction + multishift QR (HQR) |
| Generalized Nonsymmetric Eigenvalues | DGGEV, DGGES | QZ algorithm on generalized Schur form |
| SVD | DGESVD, DGESDD | Bidiagonal reduction + QR or divide-and-conquer |
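As a usage sketch for this family, the hedged C example below calls DSYEV, the classic simple driver for the symmetric eigenproblem, which shares the calling pattern of the drivers in the table (LAPACKE assumed):

```c
/* Sketch: eigenvalues and eigenvectors of a symmetric matrix. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double a[2 * 2] = { 2.0, 1.0,
                        1.0, 2.0 };
    double w[2];   /* eigenvalues, returned in ascending order */

    /* jobz = 'V': eigenvectors overwrite a; uplo = 'U': upper triangle. */
    lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', 2,
                                    a, 2, w);
    if (info != 0) return 1;
    printf("eigenvalues: %g, %g\n", w[0], w[1]);   /* 1 and 3 */
    return 0;
}
```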
Least Squares and Orthogonalization
LAPACK provides routines for solving linear least squares problems of the form $\min_x \|b - Ax\|_2$, where $A$ is an $m \times n$ real matrix, $b$ is a vector of length $m$, and $x$ is a vector of length $n$. These problems arise in overdetermined systems ($m > n$), where a least squares approximation is sought, and underdetermined systems ($m < n$), where a minimum-norm solution is typically desired among those minimizing the residual. The package favors orthogonal factorization methods over forming the normal equations $A^T A x = A^T b$, as the latter squares the condition number and can magnify conditioning errors.[41]

The primary driver routine for the general linear least squares problem is DGELS, which handles both overdetermined and underdetermined cases using QR or LQ factorization. For $m \ge n$, DGELS computes the thin QR factorization via the computational routine DGEQRF, applies $Q^T$ to $b$ using DORMQR to obtain $Q^T b$, and solves the upper triangular system $Rx = Q^T b$ (or a portion thereof) with DTRTRS. For $m < n$, it employs the LQ factorization and solves analogously. This approach ensures numerical stability by preserving the Euclidean norm through the orthogonal factor $Q$. Multiple right-hand sides are supported by passing an $m \times k$ matrix $B$, with the solutions overwriting $B$.[42]

For generalized least squares problems with linear equality constraints, such as $\min_x \|c - Ax\|_2$ subject to $Bx = d$, where $A$ is $m \times n$ and $B$ is $p \times n$ with $p \le n$, the routine DGGLSE is used. It solves this via the generalized RQ factorization of the pair $(B, A)$, computed through auxiliary routines like DGERQF and DORGRQ, followed by triangular solves. This factorization yields $B = RQ$ and $A = ZTQ$, where $R$ and $T$ are triangular or trapezoidal and $Q$ and $Z$ are orthogonal, enabling the constrained minimization.[43]

Orthogonalization in LAPACK centers on factorizations like QR, which decompose $A$ as $A = QR$ with an orthogonal matrix $Q$ and an upper triangular (or trapezoidal) matrix $R$. The core routine DGEQRF computes the QR factorization for general rectangular matrices, producing both thin ($Q$ of size $m \times n$) and full ($Q$ of size $m \times m$) variants implicitly through a compact representation: $Q$ is stored as a product of elementary Householder reflectors, each of the form $H = I - \tau v v^T$, where $v$ is a Householder vector with $v_1 = 1$ and $\tau$ a scalar, applied sequentially to triangularize $A$. To explicitly form $Q$ or apply it to vectors, routines like DORGQR and DORMQR are invoked. For generalized problems, DGGES computes the QZ (generalized Schur) decomposition of a pair $(A, B)$, yielding $A = QSZ^T$ and $B = QTZ^T$, where $Q$ and $Z$ are orthogonal and $S$ and $T$ are (quasi-)triangular; this supports orthogonal transformations in generalized eigenvalue contexts but is distinct from standard least squares.[44][45]

Rank-revealing capabilities are provided by the pivoted QR routine DGEQP3, which incorporates column pivoting to select numerically strong columns first, producing a factorization $AP = QR$ that approximates a rank-revealing decomposition. Pivoting helps detect the numerical rank by monitoring the diagonal elements of $R$ against a user-specified tolerance, typically a small multiple of the machine precision $\epsilon$. Iterative refinement of residuals can be applied post-solution using auxiliary routines like DRSCL for scaling and accuracy checks. Complete orthogonal factorization for rank-deficient cases extends this via routines such as DGELSY, which combines pivoted QR with an RZ factorization (via DTZRZF) to yield $AP = QTZ$, where $Q$ and $Z$ are orthogonal and $T$ is trapezoidal with its leading block upper triangular; this enables basic solutions and pseudoinverses for underdetermined or rank-deficient systems. These routines find applications in linear regression, where DGELS fits models by minimizing squared errors (see the sketch below), and in computing Moore-Penrose pseudoinverses via the orthogonal factors for ill-posed problems. Rank detection via DGEQP3 tolerances aids in identifying effective dimensions in data analysis.
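As a small worked example, the hedged C sketch below uses DGELS to fit a line $y = c_0 + c_1 t$ to four points; LAPACKE is assumed, and on exit the leading $n$ entries of b hold the coefficients.

```c
/* Sketch: least squares line fit with the QR-based driver DGELS. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    /* 4x2 design matrix [1, t] for t = 0, 1, 2, 3 (row-major). */
    double a[4 * 2] = { 1.0, 0.0,
                        1.0, 1.0,
                        1.0, 2.0,
                        1.0, 3.0 };
    double b[4] = { 1.0, 3.0, 5.0, 7.0 };   /* exactly 1 + 2t */

    lapack_int info = LAPACKE_dgels(LAPACK_ROW_MAJOR, 'N', 4, 2, 1,
                                    a, 2, b, 1);
    if (info != 0) return 1;
    printf("c0 = %g, c1 = %g\n", b[0], b[1]);   /* 1 and 2 */
    return 0;
}
```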
While singular value decomposition offers an alternative for least squares via the pseudoinverse, the QR-based methods in this section provide faster, more stable solutions for dense problems without requiring full spectral information.[46][41]

Implementations and Interfaces
Reference Implementation
The reference implementation of LAPACK is a portable Fortran library providing the standard, unoptimized codebase for numerical linear algebra routines. Primarily written in Fortran 90 with some Fortran 77 compatibility elements, it encompasses approximately 4000 subroutines covering linear systems, eigenvalue problems, singular value decompositions, and related computations. The source code has been freely distributed via Netlib since the initial LAPACK version 1.0 release on February 29, 1992, and the official development repository on GitHub under the Reference-LAPACK organization was established around 2010 to facilitate version control and collaboration.[1][13][10]

Building the reference implementation involves a Makefile-driven process that configures compilation options via a customizable make.inc file and requires linking to an external BLAS library for low-level operations, as LAPACK delegates basic linear algebra tasks to BLAS. The included testing suite, known as xLAPACK, provides extensive validation by generating test matrices and verifying results against expected outputs, covering more than 90% of the code paths to detect numerical accuracy issues and bugs. This ensures reliability across installations, with users encouraged to run the full test suite post-compilation.[13][47][48]
Maintenance of the reference implementation is community-driven, involving contributors from academia and industry who submit bug reports and proposed fixes through GitHub issues and pull requests. Updates are released periodically, with patch versions addressing specific errata. The codebase emphasizes portability, compiling on diverse platforms including Linux, Windows, and macOS without any hardware-specific optimizations or assembly code, thereby relying on the chosen BLAS implementation for platform-tuned performance.[13][10][1]
Optimized and Vendor-Specific Versions
Optimized and vendor-specific versions of LAPACK leverage hardware-specific features, such as vector extensions and multi-core architectures, to deliver substantial performance gains over the reference implementation while preserving interface compatibility. These implementations typically incorporate hand-tuned assembly code for the underlying BLAS operations, enabling efficient use of modern CPU capabilities.

Intel's oneAPI Math Kernel Library (oneMKL) offers a highly optimized LAPACK suite tailored for Intel x86 processors, including extensive threading via OpenMP for shared-memory parallelism and support for SIMD instructions like AVX-512 to accelerate vector and matrix operations. As of November 2025, oneMKL has integrated bug fixes and updates from Netlib LAPACK 3.12.1.[49] The library also facilitates GPU offloading for select routines through integration with heterogeneous computing frameworks like MAGMA, allowing hybrid CPU-GPU execution for large-scale problems.[50]

AMD's Optimizing CPU Libraries (AOCL) provide AOCL-LAPACK, a performant implementation aligned with Netlib LAPACK 3.12.0 and tuned for Zen-based processors such as the EPYC and Ryzen series, with specific enhancements in routines like LU factorization (e.g., DGETRF) and singular value decomposition (DGESDD).[51] It supports multi-threading and x86 instruction set architecture (ISA) extensions for improved scalability across core counts.[52] Arm Performance Libraries deliver a LAPACK implementation optimized for Arm-based systems, incorporating block algorithms that rely on efficient BLAS calls and OpenMP threading for multi-processor environments, with support for vector extensions like Neon and the Scalable Vector Extension (SVE).[53]

Among open-source alternatives, OpenBLAS stands out as a multithreaded library that auto-detects and tunes for various CPU architectures, implementing LAPACK routines alongside BLAS with assembly-optimized kernels for SIMD acceleration (e.g., AVX and ARMv8).[54] Its predecessor, GotoBLAS2, emphasized cache-blocking strategies and architecture-specific assembly to boost efficiency in linear algebra operations.[55] These versions serve as drop-in replacements for the reference LAPACK, requiring no changes to calling code due to adherence to the standard interfaces in Fortran and C.[56][51]

Performance benchmarks on modern hardware demonstrate speedups of 2-10x over the reference implementation for key routines, attributed to optimizations like AVX-512 vectorization and multi-threading, though exact gains vary by problem size and architecture.[57] For instance, AOCL-LAPACK post-optimization achieves up to 110% improvement in GFLOPS for banded LU factorization compared to prior versions, often outperforming OpenBLAS on small matrices.[58]

Language Bindings and Wrappers
LAPACKE, introduced with LAPACK version 3.3.0 in November 2010, serves as the official C interface to LAPACK routines, enabling seamless integration into C and C++ programs by wrapping Fortran calls with C-compatible types such as lapack_int for integers and dedicated complex types like lapack_complex_double.[59] This two-level design includes a high-level interface that automatically manages workspace memory allocation and deallocation using functions like LAPACKE_malloc and LAPACKE_free, while the middle-level interface requires users to provide pre-allocated arrays for greater control.[59] Both levels support column-major and row-major matrix layouts, with optional NaN checking in the high-level variant to detect invalid inputs, and error codes are returned directly for propagation in calling code.[59]
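The two levels are easiest to compare side by side. This hedged C sketch (values illustrative) computes the same eigenvalues twice: the high-level call manages workspace internally, while the middle-level _work variant lets the caller query and own the work array.

```c
/* Sketch: high-level vs middle-level LAPACKE interfaces for DSYEV. */
#include <stdlib.h>
#include <lapacke.h>

int main(void) {
    double a1[2 * 2] = { 2.0, 1.0, 1.0, 2.0 }, w1[2];
    double a2[2 * 2] = { 2.0, 1.0, 1.0, 2.0 }, w2[2];

    /* High level: workspace allocated and freed internally. */
    LAPACKE_dsyev(LAPACK_COL_MAJOR, 'N', 'U', 2, a1, 2, w1);

    /* Middle level: query the optimal lwork, then supply the array. */
    double query;
    LAPACKE_dsyev_work(LAPACK_COL_MAJOR, 'N', 'U', 2, a2, 2, w2,
                       &query, -1);
    lapack_int lwork = (lapack_int)query;
    double *work = malloc((size_t)lwork * sizeof *work);
    LAPACKE_dsyev_work(LAPACK_COL_MAJOR, 'N', 'U', 2, a2, 2, w2,
                       work, lwork);
    free(work);
    return 0;
}
```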
LAPACK95, released in November 2000, provides an official Fortran 95 interface that enhances usability through object-oriented features, including generic interfaces and modules for accessing LAPACK drivers, computational routines, and auxiliary functions.[60] It employs Fortran 95's module system to encapsulate routines, allowing for type-bound procedures and operator overloading to simplify calls, such as treating matrices as objects in eigenvalue or linear system solvers.[61] Error handling follows LAPACK conventions but integrates with Fortran 95's optional arguments for flexible input validation and output.[61]
Third-party bindings extend LAPACK accessibility to higher-level languages, often incorporating automatic memory management and language-native error mechanisms. In Python, the SciPy library's scipy.linalg module offers high-level wrappers around LAPACK routines, leveraging NumPy arrays for seamless type conversions—including support for complex dtypes like complex128—and in-place operations to optimize memory usage via flags like overwrite_a.[62] Errors from LAPACK are propagated as Python exceptions, such as LinAlgError for singular matrices or numerical issues.[62] Julia's standard LinearAlgebra module directly invokes LAPACK functions for core operations like LU factorization (dgetrf!) and SVD (dgesvd!), with automatic handling of array ownership and complex types through Julia's type system, raising domain errors for invalid computations.[63]
In R, the base package integrates LAPACK calls for linear algebra tasks, such as eigendecompositions and least-squares solutions, using R's internal libRlapack or external implementations, with memory managed via R's vector allocations and errors signaled through warnings or stops.[64] For Java, third-party wrappers like netlib-java provide JNI-based access to LAPACK and BLAS, supporting automatic library loading and fallback to pure-JVM implementations, while the older JAMA package, a basic pure-Java linear algebra library without direct LAPACK ties, has been largely deprecated in favor of such bindings.[65] These bindings typically adapt LAPACK's routine naming scheme, prefixing with language-specific conventions like scipy.linalg.lapack.dgesv in Python.
These language bindings have seen widespread adoption in scientific computing ecosystems, facilitating LAPACK's use in diverse applications from data analysis to simulations; for instance, SciPy routinely employs LAPACK via optimized backends like OpenBLAS to deliver high-performance linear algebra on NumPy arrays.[66]