LAPACK
from Wikipedia
LAPACK (Netlib reference implementation)
Initial release: 1992
Stable release: 3.12.1 / 8 January 2025
Written in: Fortran 90
Type: Software library
License: BSD-new
Website: netlib.org/lapack/

LAPACK ("Linear Algebra Package") is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition.[1] LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008).[2] The routines handle both real and complex matrices in both single and double precision. LAPACK relies on an underlying BLAS implementation to provide efficient and portable computational building blocks for its routines.[1]: "The BLAS as the Key to Portability"

LAPACK was designed as the successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK. LINPACK, written in the 1970s and 1980s, was designed to run on the then-modern vector computers with shared memory. LAPACK, in contrast, was designed to effectively exploit the caches on modern cache-based architectures and the instruction-level parallelism of modern superscalar processors,[1]: "Factors that Affect Performance" and thus can run orders of magnitude faster than LINPACK on such machines, given a well-tuned BLAS implementation.[1]: "The BLAS as the Key to Portability" LAPACK has also been extended to run on distributed memory systems in later packages such as ScaLAPACK and PLAPACK.[3]

Netlib LAPACK is licensed under a three-clause BSD style license, a permissive free software license with few restrictions.[4]

Naming scheme

Subroutines in LAPACK have a naming convention which makes the identifiers very compact. This was necessary as the first Fortran standards only supported identifiers up to six characters long, so the names had to be shortened to fit into this limit.[1]: "Naming Scheme"

A LAPACK subroutine name is in the form pmmaaa, where:

  • p is a one-letter code denoting the type of numerical constants used. S, D stand for real floating-point arithmetic respectively in single and double precision, while C and Z stand for complex arithmetic with respectively single and double precision. The newer version, LAPACK95, uses generic subroutines in order to overcome the need to explicitly specify the data type.
  • mm is a two-letter code denoting the kind of matrix expected by the algorithm. The codes for the different kind of matrices are reported below; the actual data are stored in a different format depending on the specific kind; e.g., when the code DI is given, the subroutine expects a vector of length n containing the elements on the diagonal, while when the code GE is given, the subroutine expects an n×n array containing the entries of the matrix.
  • aaa is a one- to three-letter code describing the actual algorithm implemented in the subroutine, e.g. SV denotes a subroutine to solve a linear system, while R denotes a rank-1 update.

For example, the subroutine to solve a linear system with a general (non-structured) matrix using real double-precision arithmetic is called DGESV.[1]: "Linear Equations"
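As an illustration, a call to DGESV through the LAPACKE C interface might look like the following minimal sketch; the 3×3 system and its values are arbitrary, and the program assumes LAPACKE and a BLAS/LAPACK library are available to link against.

```c
/* Minimal sketch: solving a 3x3 system with DGESV via the LAPACKE C
 * interface (D = double-precision real, GE = general matrix, SV = solve).
 * Example data is arbitrary; link against LAPACKE and a BLAS/LAPACK. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double a[3 * 3] = { 2.0, 1.0, 1.0,      /* row-major 3x3 matrix A */
                        1.0, 3.0, 2.0,
                        1.0, 0.0, 0.0 };
    double b[3]     = { 4.0, 5.0, 6.0 };    /* right-hand side, overwritten with x */
    lapack_int ipiv[3];                     /* pivot indices from the LU factorization */

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1, a, 3, ipiv, b, 1);
    if (info != 0) {                        /* info > 0: singular factor; info < 0: bad argument */
        fprintf(stderr, "DGESV failed: info = %d\n", (int)info);
        return 1;
    }
    printf("x = %f %f %f\n", b[0], b[1], b[2]);
    return 0;
}
```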

Matrix types in the LAPACK naming scheme
Name Description
BD bidiagonal matrix
DI diagonal matrix
GB general band matrix
GE general matrix (i.e., unsymmetric, in some cases rectangular)
GG general matrices, generalized problem (i.e., a pair of general matrices)
GT general tridiagonal matrix
HB (complex) Hermitian band matrix
HE (complex) Hermitian matrix
HG upper Hessenberg matrix, generalized problem (i.e., a Hessenberg and a triangular matrix)
HP (complex) Hermitian matrix, packed storage
HS upper Hessenberg matrix
OP (real) orthogonal matrix, packed storage
OR (real) orthogonal matrix
PB symmetric or Hermitian positive definite band matrix
PO symmetric or Hermitian positive definite matrix
PP symmetric or Hermitian positive definite matrix, packed storage
PT symmetric or Hermitian positive definite tridiagonal matrix
SB (real) symmetric band matrix
SP symmetric matrix, packed storage
ST (real) symmetric tridiagonal matrix
SY symmetric matrix
TB triangular band matrix
TG triangular matrices, generalized problem (i.e., a pair of triangular matrices)
TP triangular matrix, packed storage
TR triangular matrix (or in some cases quasi-triangular)
TZ trapezoidal matrix
UN (complex) unitary matrix
UP (complex) unitary matrix, packed storage

Use with other programming languages and libraries

Many programming environments today support the use of libraries with C binding (LAPACKE, a standardised C interface,[5] has been part of LAPACK since version 3.4.0[6]), allowing LAPACK routines to be used directly so long as a few restrictions are observed. Additionally, many other software libraries and tools for scientific and numerical computing are built on top of LAPACK, such as R,[7] MATLAB,[8] and SciPy.[9]

Several alternative language bindings for other programming languages are also available.

Implementations

As with BLAS, LAPACK is sometimes forked or rewritten to provide better performance on specific systems. Some of the implementations are:

Accelerate
Apple's framework for macOS and iOS, which includes tuned versions of BLAS and LAPACK.[10][11]
Netlib LAPACK
The official LAPACK.
Netlib ScaLAPACK
Scalable LAPACK for distributed-memory systems, built on top of PBLAS.
Intel MKL
Intel's Math Kernel Library, with LAPACK routines optimized for its x86 CPUs.
OpenBLAS
Open-source reimplementation of BLAS and LAPACK.
Gonum LAPACK
A partial native Go implementation.

Since LAPACK typically calls underlying BLAS routines to perform the bulk of its computations, simply linking to a better-tuned BLAS implementation can be enough to significantly improve performance. As a result, LAPACK is not reimplemented as often as BLAS is.

Similar projects

These projects provide a similar functionality to LAPACK, but with a main interface differing from that of LAPACK:

Libflame
A dense linear algebra library. Has a LAPACK-compatible wrapper. Can be used with any BLAS, although BLIS is the preferred implementation.[12]
Eigen
A header library for linear algebra. Has a BLAS and a partial LAPACK implementation for compatibility.
MAGMA
The Matrix Algebra on GPU and Multicore Architectures (MAGMA) project develops a dense linear algebra library similar to LAPACK, but for heterogeneous and hybrid architectures, including multicore systems accelerated with GPGPUs.
PLASMA
The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) project is a modern replacement for LAPACK on multi-core architectures. PLASMA is a software framework for the development of asynchronous operations and features out-of-order scheduling with a runtime scheduler called QUARK, which may be used for any code that expresses its dependencies as a directed acyclic graph.[13]

from Grokipedia
LAPACK (Linear Algebra PACKage) is a standard open-source software library written in Fortran 90 for performing numerical linear algebra computations, offering routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, singular value decompositions, and various matrix factorizations such as LU, Cholesky, QR, and Schur. Developed to improve upon the earlier LINPACK and EISPACK libraries by leveraging block-based algorithms for enhanced performance on modern vector and parallel processors, LAPACK was first released on February 29, 1992, and has since become a foundational tool in scientific computing, with its latest version, 3.12.1, issued on January 8, 2025. The library supports both real and complex matrices in single and double precision, handles dense and banded matrices (but not sparse ones), and relies on the BLAS (Basic Linear Algebra Subprograms) interface—particularly Level 3 BLAS—for optimized matrix-matrix operations that achieve high computational efficiency. Jointly maintained by researchers from the University of Tennessee, University of California, Berkeley, University of Colorado Denver, and NAG Ltd., with sponsorship from organizations including the National Science Foundation (NSF), Department of Energy (DOE), MathWorks, and Intel, LAPACK is distributed under a permissive modified BSD license, enabling widespread adoption in applications ranging from physics simulations to machine learning. Its design emphasizes reliability, portability across computing architectures, and ease of use through standardized argument lists and comprehensive documentation, as detailed in the official LAPACK Users' Guide.

History

Origins and Development

LAPACK emerged in the late 1980s to early 1990s as a successor to the LINPACK library, released in 1979 and optimized for solving systems of linear equations on early vector machines, and EISPACK, developed between 1972 and 1978 to provide comprehensive eigenvalue and eigenvector routines. These predecessors, while foundational for numerical linear algebra, exhibited limitations in efficiency due to their reliance on unblocked algorithms that led to poor memory access patterns and underutilization of hierarchical memory systems on advancing high-performance computers, including vector processors like the Cray-1 and nascent parallel architectures. The project officially commenced in 1990 as a collaborative effort involving the University of Tennessee at Knoxville, Argonne National Laboratory, and the Numerical Algorithms Group (NAG) in the United Kingdom, building on preliminary prototype discussions during workshops in 1989. This partnership was supported by funding from the National Science Foundation and aimed to unify and modernize the capabilities of its predecessors for contemporary computational environments. Central to LAPACK's design motivations was the shift toward block-based algorithms, which partition matrices into subblocks to minimize data transfers between fast cache and slower main memory, thereby enhancing computational efficiency. The library prioritized reusing verified code from LINPACK and EISPACK where possible to maintain reliability, while emphasizing portability across diverse hardware architectures through a modular structure that interfaces with the Basic Linear Algebra Subprograms (BLAS). Among the early challenges was the fundamental transition from the unblocked algorithms of prior libraries to blocked variants, enabling the effective exploitation of Level 3 BLAS routines for matrix-matrix operations that deliver superior performance on shared-memory multiprocessors by maximizing arithmetic intensity and reducing memory-bandwidth demands. This redesign required extensive restructuring to balance algorithmic flexibility and hardware adaptation without compromising the robustness of the original implementations.

Key Contributors and Milestones

The development of LAPACK was led by a core team of architects, including Jack Dongarra from the University of Tennessee and Argonne National Laboratory, James Demmel from the University of California, Berkeley, Iain Duff (formerly associated with NAG Ltd.), and Sven Hammarling from NAG Ltd. Additional early contributors included Victor Eijkhout from the University of Tennessee and Linda Kaufman, who provided key input on algorithms and implementation details. LAPACK's creation involved a collaborative structure supported in part by funding from the U.S. Department of Energy (DOE) and the National Science Foundation (NSF), fostering contributions from academic, government, and industry partners. By 2000, the project had engaged over 50 contributors worldwide, with ongoing development sustained through working groups and community feedback mechanisms. Key milestones include the first public release of version 1.0 on February 29, 1992, which established LAPACK as a portable library for dense numerical linear algebra. In the late 1990s, enhancements such as improved condition estimation routines were integrated, drawing from LAPACK Working Notes on robust incremental condition estimation. The 2000s saw a focus on compliance with floating-point standards, particularly in version 3.0 (released June 30, 1999, with updates through May 31, 2000), enabling better handling of NaN and infinity values across diverse hardware. During the 2010s, additions for multicore support were prioritized, notably in version 3.3.0 (November 14, 2010), which optimized routines for parallel execution on shared-memory systems. The University of Tennessee served as the primary host for LAPACK's development, coordinating releases and maintenance through its Innovative Computing Laboratory. NAG Ltd. played a crucial role in testing, validation, and contributing specialized routines, ensuring reliability across implementations.

Release History

The Linear Algebra Package (LAPACK) has undergone periodic updates since its initial public release, with major versions typically issued every 2 to 5 years to incorporate algorithm refinements, adapt to evolving hardware architectures, and address community-submitted issues through the Netlib repository. These releases maintain backward compatibility where possible while introducing new drivers, deprecating obsolete or unsafe routines, and improving testing frameworks. Distribution has been hosted on Netlib since 1992, with a GitHub mirror established in 2010 to facilitate collaborative development.

The inaugural version, LAPACK 1.0, was released on February 29, 1992, providing foundational routines for linear algebra operations such as solving systems of equations and computing eigenvalues, building directly on BLAS for efficient implementation. Subsequent minor revisions, including 1.0a (June 30, 1992), 1.0b (October 31, 1992), and 1.1 (March 31, 1993), focused on bug fixes and minor enhancements to core functionality. LAPACK 2.0 followed on September 30, 1994, expanding the library with additional solvers and improved workspace management. LAPACK 3.0 marked a significant milestone, released on June 30, 1999, with updates in October 1999 and May 2000 introducing enhanced iterative refinement techniques and better support for generalized eigenvalue problems.

Version 3.1.0, issued November 12, 2006 (followed by 3.1.1 on February 26, 2007), improved handling of deprecated routines through thread-safe modifications, including removal of SAVE statements and partial column norm updates for QR factorizations. The 3.2 series (November 18, 2008, with revisions to 3.2.2 in June 2010) transitioned core routines to Fortran 90 for better modularity. Further advancements in version 3.3.0 (November 14, 2010; 3.3.1 April 18, 2011) included a comprehensive Fortran 90 rewrite, new C interfaces for broader accessibility, and adaptations for multicore processors via Level-3 BLAS optimizations and thread-safe utilities like xLAMCH. LAPACK 3.4.0 (November 11, 2011; up to 3.4.2 September 25, 2012) refined these for shared-memory parallelism.

Version 3.5.0, released November 19, 2013, added new drivers for symmetric indefinite systems using rook pivoting (xSYSV_ROOK) and improved complete orthogonal decompositions for tall matrices. The 3.6.0 release (November 15, 2015; 3.6.1 June 18, 2016) and 3.7.0 (December 24, 2016; 3.7.1 June 25, 2017) emphasized numerical accuracy in iterative solvers and bug resolutions from community feedback. LAPACK 3.8.0 (November 17, 2017) prepared the ground for subsequent deprecations. In 3.9.0 (November 21, 2019; 3.9.1 April 1, 2021), unsafe routines were deprecated to promote more robust alternatives, alongside additions like the QR-preconditioned SVD (xGESVDQ) and Householder reconstruction updates. More recent versions include 3.10.0 (June 28, 2021; 3.10.1 April 12, 2022), which integrated faster least-squares algorithms, and 3.11.0 (November 11, 2022), featuring Level-3 BLAS solvers for triangular systems and enhancements to QZ algorithms and Givens rotations. LAPACK 3.12.0, released November 24, 2023, introduced dynamic mode decomposition routines (xGEDMD) and truncated QR with column pivoting for large-scale data applications. The latest, 3.12.1 (January 8, 2025), delivers minor compatibility fixes, errata corrections, and bolstered testing suites without architectural overhauls.

Overview

Purpose and Design Principles

LAPACK is a standard software library designed for high-performance numerical linear algebra computations, primarily targeting dense and banded matrices. It provides efficient solutions to common problems such as linear equation factorizations (e.g., LU, Cholesky) and matrix decompositions (e.g., QR, SVD, eigenvalue problems), optimized for modern computer architectures including vector processors and shared-memory parallel systems. By reorganizing classical algorithms from predecessors like LINPACK and EISPACK, LAPACK achieves greater efficiency through reduced data movement and improved cache utilization, making it suitable for scientific and engineering applications requiring robust matrix operations in real and complex arithmetic. The design principles of LAPACK emphasize portability, performance, and reliability. Written in Fortran 77 (with later versions incorporating Fortran 90 features), it ensures transportability across diverse Fortran compilers and hardware platforms by avoiding machine-specific code and relying on standardized interfaces. A key principle is its dependence on the Basic Linear Algebra Subprograms (BLAS) for low-level operations, particularly Level 3 BLAS for block matrix computations, which allows vendors to provide optimized implementations tailored to specific hardware, thereby maximizing performance without altering the LAPACK source code. Backward compatibility with earlier libraries is maintained to facilitate migration, while numerical stability is prioritized through the use of proven algorithms and built-in error handling mechanisms, such as condition number estimators to assess solution reliability. LAPACK assumes adherence to the IEEE 754 floating-point standard for consistent behavior across systems. In terms of scope, LAPACK focuses on serial and shared-memory parallel computations, excluding distributed-memory environments (addressed separately by ScaLAPACK), and it does not cover general sparse matrices. Performance goals center on block-based algorithms that partition matrices into blocks to minimize memory accesses and enable efficient BLAS calls, targeting the optimal asymptotic complexity of O(n^3) operations for solving n × n linear systems where feasible. Tunable workspace parameters allow users to balance computation speed and memory usage, ensuring adaptability to varying hardware constraints.

Relation to BLAS and Predecessors

LAPACK emerged as a successor to two foundational libraries: LINPACK, which focused on solving general systems of linear equations and was developed during the era of Level 1 and early Level 2 BLAS, emphasizing column-oriented operations for dense matrices; and EISPACK, a standalone collection of eigenvalue routines that operated independently without BLAS integration, often resulting in less efficient implementations on emerging high-performance architectures. By integrating and modernizing the algorithms from both predecessors, LAPACK created a cohesive package that addressed their fragmentation (LINPACK handled linear systems but not eigenvalues comprehensively, while EISPACK lacked broader linear algebra support) while leveraging Level 3 BLAS for block-based operations to enhance performance on vector and parallel processors. Central to LAPACK's design is its deep integration with the Basic Linear Algebra Subprograms (BLAS), where high-level LAPACK routines invoke BLAS subprograms, such as DGEMM for general matrix-matrix multiplication, to perform the core computational kernels. This separation allows LAPACK to focus on algorithmic logic while delegating low-level operations to BLAS, which can be replaced with machine-optimized implementations without modifying the LAPACK source code, thereby promoting portability across diverse hardware platforms. This evolution from LINPACK and EISPACK yielded significant benefits, including improved computational efficiency through blocking techniques that exploit Level 3 BLAS for matrix-matrix operations, achieving significant performance improvements on vector machines compared to the unblocked approaches of its predecessors. LAPACK also provides a unified interface for tackling mixed linear algebra problems, such as combining linear system solution with eigenvalue computations, although Release 3.0 did not include all facilities from LINPACK and EISPACK, focusing instead on a streamlined set of routines for modern efficiency. Despite these advances, LAPACK inherits limitations from its BLAS foundation, remaining primarily Fortran-centric in its implementation and assuming column-major storage order, which aligns with Fortran conventions but may require adaptations for other languages or row-major formats.
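To make the delegation concrete, the following C sketch performs the kind of trailing-submatrix update that a blocked factorization hands off to Level 3 BLAS, here expressed through the CBLAS binding of DGEMM; the dimensions and data are placeholders rather than an excerpt of any actual LAPACK routine.

```c
/* Sketch of the LAPACK/BLAS division of labor: the trailing-submatrix
 * update inside a blocked factorization is a single Level 3 BLAS call,
 * C := C - A*B, expressed here through the CBLAS interface to DGEMM.
 * The 4x4 matrix, 2-column panel, and 2x4 row block are placeholder data. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    double a[4 * 2] = { 1, 2, 3, 4,  5, 6, 7, 8 };      /* 4x2 panel, column-major */
    double b[2 * 4] = { 1, 0,  0, 1,  1, 1,  0, 0 };    /* 2x4 block, column-major */
    double c[4 * 4] = { 0 };                            /* 4x4 trailing submatrix */

    /* alpha = -1, beta = 1 accumulates the rank-2 update C := C - A*B. */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                4, 4, 2, -1.0, a, 4, b, 2, 1.0, c, 4);

    printf("c[0][0] after update: %f\n", c[0]);
    return 0;
}
```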

Naming Conventions

Routine Naming Scheme

The LAPACK routine naming scheme employs a compact six-character convention to encode the data type, matrix characteristics, and computational task, facilitating quick identification of subroutine functionality within the constraints of Fortran 77 naming limits. Each routine name follows the pattern XYYZZZ, where the first character (X) specifies the precision and data type, the next two characters (YY) denote the matrix type, and the following three characters (ZZZ) indicate the specific operation or task. This structure was designed to promote portability and consistency across implementations. The first character distinguishes between real and complex numbers as well as single and double precision: 'S' for single-precision real, 'D' for double-precision real, 'C' for single-precision complex, and 'Z' for double-precision complex (also known as complex*16). The matrix type (YY) uses standardized codes such as 'GE' for general dense matrices, 'SY' or 'HE' for symmetric or Hermitian matrices (real or complex), 'TR' for triangular matrices, 'OR' or 'UN' for orthogonal or unitary matrices from QR factorizations, and 'PO' for positive definite matrices. The task portion (ZZZ) describes the computation, for example, 'TRF' for LU or Cholesky factorization, 'SV' for solving linear systems, 'EV' for eigenvalue problems, 'SVD' for singular value decomposition, or 'BRD' for bidiagonal reduction. Representative examples illustrate the scheme: DGETRF performs LU factorization on a double-precision general real matrix (D for double, GE for general, TRF for triangular factorization); ZHEEV computes eigenvalues and, optionally, eigenvectors of a double-precision complex Hermitian matrix (Z for double complex, HE for Hermitian, EV for eigenproblem). Routine names are written in uppercase for compatibility, and auxiliary or unblocked variants may append a digit like '2' (e.g., SGETF2 as the unblocked version of SGETRF) or follow BLAS naming patterns in some cases. Driver routines, which orchestrate complete solutions (e.g., xGESV for solving general linear systems), often lack additional suffixes, while computational routines for specific steps (e.g., xGETRF for LU factorization) form the core building blocks. This naming convention was established with the initial release of LAPACK version 1.0 in 1992 and has remained largely consistent, with minor extensions for new routines in subsequent versions, such as support for generalized eigenvalue problems in version 3.0 (1999).

Precision and Storage Conventions

LAPACK supports four precision variants for its routines, distinguished by a prefix in the routine name: single precision real (S), double precision real (D), single precision complex (C), and double precision complex (Z). These variants share identical algorithmic logic, with the primary difference being the scaling of the underlying arithmetic operations to match the precision level; for instance, double precision routines are automatically generated from single precision templates using tools like Toolpack/1 to ensure consistency. This templating approach allows users to select the appropriate precision based on the required accuracy and computational resources, while maintaining portability across systems that support standard floating-point representations. Matrices in LAPACK are stored in column-major order, aligning with Fortran's default convention and facilitating efficient access patterns for block-based algorithms. For general dense matrices, storage uses full two-dimensional arrays, where leading dimensions may exceed the matrix size to accommodate subarrays or extra space per Fortran 77 rules. Specialized formats optimize storage for structured matrices: banded storage for matrices with limited bandwidth to reduce memory for sparse-like problems; triangular storage for upper or lower triangular matrices, holding only the relevant portion; and for symmetric or Hermitian matrices, either full storage of the entire matrix or packed storage that compacts the redundant elements into a one-dimensional array. Workspace requirements vary by routine and are specified via parameters like LWORK, which indicates the size of the work array; users must allocate sufficient space to avoid errors, with routines often providing mechanisms to query the optimal size. LAPACK's numerical computations assume adherence to the IEEE 754 floating-point standard, particularly for handling infinities, NaNs, and roundoff errors, as configured by default in auxiliary routines like ILAENV. Error handling is managed through the INFO output parameter in most routines: INFO = 0 indicates successful completion, negative values signal invalid input arguments (with the absolute value identifying the offending parameter), and positive values denote computational issues such as singularity or non-convergence. For reliability assessment, many routines compute and return estimates like reciprocal condition numbers (RCOND) or residuals to quantify solution accuracy, enabling users to evaluate error propagation relative to machine precision. To enhance efficiency, LAPACK prioritizes in-place operations wherever feasible, allowing input matrices to be overwritten with results and thereby minimizing memory allocations and copies. A key optimization for workspace management is the query mode, invoked by setting LWORK = -1 (or equivalent for other work parameters), which runs the routine without performing the full computation to return the optimal workspace size in the first element of the work array, guiding users toward memory-efficient calls.
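The workspace query can be sketched in C as follows, using LAPACKE's middle-level *_work interface so that WORK and LWORK are passed through to the Fortran routine unchanged; the symmetric matrix here is placeholder data, and the call sequence is an illustration of the LWORK = -1 convention rather than the only way to size the workspace.

```c
/* Sketch of the LWORK = -1 workspace query, through LAPACKE's
 * middle-level *_work interface (which forwards WORK/LWORK to the
 * underlying Fortran routine). Matrix contents are placeholder values. */
#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

int main(void) {
    lapack_int n = 4, lda = 4, info;
    double a[16] = { 4, 1, 0, 0,
                     1, 3, 1, 0,
                     0, 1, 2, 1,
                     0, 0, 1, 1 };          /* symmetric matrix, column-major */
    double w[4];                            /* eigenvalues, ascending */
    double query;

    /* Step 1: LWORK = -1 asks DSYEV for its optimal workspace size,
       returned in the first element of WORK; no computation is done. */
    info = LAPACKE_dsyev_work(LAPACK_COL_MAJOR, 'V', 'U', n, a, lda, w, &query, -1);
    if (info != 0) return 1;

    /* Step 2: allocate that much workspace and run the real computation. */
    lapack_int lwork = (lapack_int)query;
    double *work = malloc((size_t)lwork * sizeof *work);
    info = LAPACKE_dsyev_work(LAPACK_COL_MAJOR, 'V', 'U', n, a, lda, w, work, lwork);
    free(work);

    if (info == 0)
        printf("smallest eigenvalue: %f\n", w[0]);
    return (int)info;
}
```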

Core Functionality

Linear Equation Systems

LAPACK provides routines for solving systems of linear equations of the form Ax = b, where A is a dense matrix of order n, x and b are vectors of length n, and the system may have multiple right-hand sides stored as columns of a matrix B. These routines support general matrices, symmetric indefinite matrices, symmetric positive definite matrices, and triangular matrices, enabling efficient solutions through matrix factorizations that can be reused for multiple right-hand sides. The library emphasizes numerical stability and efficiency on high-performance architectures by relying on block-based algorithms that call Level 3 BLAS operations. For general matrices, the simple driver routine DGESV computes the solution by first performing LU factorization with partial pivoting using DGETRF, which decomposes the permuted matrix as PA = LU, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular, followed by forward and back substitution using DGETRS to solve LY = PB and then UX = Y. The expert driver DGESVX extends this by including optional row and column scaling for equilibration to handle ill-conditioned systems, iterative refinement to improve accuracy, and estimation of condition numbers and error bounds. These steps ensure backward stability, with the residual norm ||b - Ax|| estimable via routines like DLACON for monitoring solution quality. Error handling is managed through an INFO flag returned by the drivers: INFO = 0 indicates success, INFO > 0 signals a singular or nearly singular pivot (indicating rank deficiency), and INFO < 0 denotes an illegal argument in the calling sequence. Symmetric indefinite systems are addressed by drivers such as DSYSV, which uses the Bunch-Kaufman factorization via DSYTRF to decompose A into a block diagonal form with 1×1 or 2×2 diagonal blocks and a block bidiagonal multiplier matrix, preserving symmetry while allowing stable computations without pivoting that could destroy the structure, followed by DSYTRS for the solve phase. For symmetric positive definite matrices, DPOSV employs Cholesky factorization with DPOTRF, yielding A = U^T U (or L L^T) where U (or L) is upper (lower) triangular, and then DPOTRS for efficient triangular solves, offering reduced computational cost compared to general methods due to the positive definiteness assumption. Both drivers support multiple right-hand sides and reuse of the factorization. The expert variants DSYSVX and DPOSVX incorporate equilibration, iterative refinement, and reciprocal condition number estimates for enhanced reliability on ill-conditioned problems. Triangular systems Tx = b, arising after factorization, are solved directly using routines such as DTRTRS (or the underlying BLAS routine DTRSV for a single right-hand side), which perform forward or back substitution on upper or lower triangular matrices T, with options for unit or non-unit diagonals and transposition; these solves are invoked internally by the drivers but can be called independently for efficiency when the triangular factor is available. Overall, these routines prioritize direct methods for dense matrices, with iterative refinement steps using techniques like those in DGECON for norm estimation to achieve high relative accuracy, typically on the order of machine epsilon times the condition number.
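As a concrete illustration of the driver-level workflow, the following C sketch solves a small symmetric positive definite system with DPOSV via LAPACKE; the matrix and right-hand side are arbitrary example values.

```c
/* Sketch: solving an SPD system with DPOSV (Cholesky factorization via
 * DPOTRF, then triangular solves via DPOTRS). Data is illustrative only. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    /* Symmetric positive definite 3x3 matrix, column-major;
       only the upper triangle ('U') is referenced. */
    double a[9] = { 4.0, 1.0, 1.0,
                    1.0, 3.0, 0.0,
                    1.0, 0.0, 2.0 };
    double b[3] = { 6.0, 4.0, 3.0 };   /* overwritten with the solution x */

    lapack_int info = LAPACKE_dposv(LAPACK_COL_MAJOR, 'U', 3, 1, a, 3, b, 3);
    if (info > 0)
        fprintf(stderr, "leading minor %d is not positive definite\n", (int)info);
    else
        printf("x = %f %f %f\n", b[0], b[1], b[2]);
    return (int)info;
}
```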

Eigenvalue and Singular Value Decomposition

LAPACK provides a comprehensive set of routines for solving eigenvalue problems, encompassing both standard and generalized forms for various matrix types, including symmetric, Hermitian, and general real or complex matrices. The standard eigenvalue problem seeks scalars λ and nonzero vectors x satisfying Ax = λx, while the generalized problem extends this to Ax = λBx, where A and B are square matrices. For symmetric or Hermitian matrices, eigenvalues are real, enabling specialized algorithms that exploit structure for efficiency and accuracy. For symmetric and Hermitian eigenvalue problems, LAPACK drivers such as DSYEVD and DSYEVR reduce the matrix to tridiagonal form using orthogonal transformations (e.g., via DSYTRD or CHETRD), followed by computation of eigenvalues and optionally eigenvectors using the divide-and-conquer algorithm in DSTEDC or the relatively robust representation (RRR) method in DSTEMR/DSTEGR, which ensures high relative accuracy even for clustered eigenvalues. These routines output eigenvalues in ascending order in a one-dimensional array and eigenvectors either overwriting the original matrix or in a separate array. For generalized symmetric-definite problems (Az = λBz with B positive definite), drivers like DSYGVD or DSYGVR perform Cholesky factorization on B to reduce to a standard problem, followed by similar tridiagonal eigensolving steps. General nonsymmetric eigenvalue problems are addressed through routines like DGEEV, which computes eigenvalues and right eigenvectors by reducing to upper Hessenberg form (via DGEHRD) and applying the multishift QR iteration (HQR algorithm) to obtain the Schur form, where eigenvalues appear on the diagonal of a quasi-triangular matrix. DGEEV also balances the matrix via scaling to improve stability and accuracy before reduction. For problems requiring an ordered Schur decomposition, DGEES computes the Schur form with optional sorting of eigenvalues based on user-defined criteria. In the generalized nonsymmetric case (Ax = λBx), DGGEV and DGGES employ the QZ algorithm on a generalized Schur decomposition (A = QSZ^H, B = QTZ^H), with balancing via scaling of A and B to mitigate ill-conditioning; outputs include eigenvalue pairs (α, β) where λ = α/β, right and left eigenvectors, and deflating subspaces. Expert routines, such as those with workspace query options (e.g., IWORK and LWORK parameters), allow fine control over memory allocation. Condition numbers for eigenvalues can be estimated using DLANGE to compute matrix norms. Singular value decomposition (SVD) in LAPACK computes the factorization A = UΣV^H for an m × n general real or complex matrix A, where U and V are unitary and Σ is diagonal with nonnegative singular values σ_1 ≥ ⋯ ≥ σ_min(m,n) ≥ 0 in descending order. Drivers DGESVD and the faster divide-and-conquer variant DGESDD first bidiagonalize A using Householder transformations (via DGEBRD), then apply QR iteration (DBDSQR) or divide-and-conquer (DBDSDC) to the bidiagonal form for singular values and optionally vectors. These support full SVD (all columns of U and V) or partial (thin or truncated forms for economy). Balancing via scaling precedes bidiagonalization to enhance numerical stability, particularly for ill-conditioned matrices.
Outputs include the singular values in a vector, left singular vectors in U, and right singular vectors in V, with expert control over workspace via query parameters; singular vector condition numbers can be assessed using DLANGE on relevant submatrices. For the generalized SVD of matrix pairs (A, B), routines like DGGSVD3 compute decompositions involving shared orthogonal factors, with error bounds on the order of machine epsilon divided by the reciprocal condition number.
Category, key driver routines, and primary algorithm:
Symmetric/Hermitian eigenvalues: DSYEVD, DSYEVR (tridiagonal reduction + divide-and-conquer or RRR)
Generalized symmetric-definite eigenvalues: DSYGVD, DSYGVR (Cholesky reduction + standard symmetric solver)
General nonsymmetric eigenvalues: DGEEV, DGEES (Hessenberg reduction + multishift QR)
Generalized nonsymmetric eigenvalues: DGGEV, DGGES (QZ algorithm on generalized Schur form)
SVD: DGESVD, DGESDD (bidiagonal reduction + QR or divide-and-conquer)
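A minimal sketch of calling one of these drivers, DGESVD, through the LAPACKE C interface is shown below; the 4×3 matrix is arbitrary example data, and 'S' requests the thin factors.

```c
/* Sketch: thin SVD of a 4x3 matrix with DGESVD via LAPACKE; the data
 * and dimensions are arbitrary. 'S' requests the reduced (thin) factors. */
#include <stdio.h>
#include <lapacke.h>

#define M 4
#define N 3

int main(void) {
    double a[M * N] = { 1, 2, 3,
                        4, 5, 6,
                        7, 8, 9,
                        10, 11, 12 };       /* row-major M x N matrix */
    double s[N];                            /* singular values, descending */
    double u[M * N];                        /* thin U: M x min(M,N) */
    double vt[N * N];                       /* V^T: min(M,N) x N */
    double superb[N - 1];                   /* workspace for unconverged superdiagonal info */

    lapack_int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'S', 'S', M, N,
                                     a, N, s, u, N, vt, N, superb);
    if (info != 0) return 1;
    printf("sigma = %f %f %f\n", s[0], s[1], s[2]);
    return 0;
}
```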

Least Squares and Orthogonalization

LAPACK provides routines for solving linear least-squares problems of the form min_x ||Ax - b||_2, where A is an m × n real matrix, x is n × 1, and b is m × 1. These problems arise in overdetermined systems (m > n), where a least-squares approximation is sought, and underdetermined systems (m < n), where a minimum-norm solution is typically desired among those minimizing the residual. The package favors orthogonal methods over forming the normal equations A^T A x = A^T b, as the latter can magnify conditioning errors due to squaring the condition number. The primary driver routine for the general problem is DGELS, which handles both overdetermined and underdetermined cases using QR or LQ factorization. For m ≥ n, DGELS computes the thin QR factorization A = QR via the computational routine DGEQRF, applies Q^T to b using DORMQR to obtain Q^T b, and solves the upper triangular system Rx = Q^T b (or a portion thereof) with DTRTRS. For m < n, it employs the LQ factorization A = LQ and solves analogously. This approach ensures numerical stability by preserving the Euclidean norm through the orthogonal factor Q. Multiple right-hand sides are supported by passing an m × nrhs matrix B, with solutions overwriting B. For problems with linear equality constraints, such as min_x ||Ax - c||_2 subject to Bx = d, where A is m × n and B is p × n with p ≤ n ≤ m + p, the routine DGGLSE is used. It solves this via the generalized RQ factorization of (A, B), computed through auxiliary routines like DGERQF and DORGRQ, followed by triangular solves. This factorization yields A = RQ and B = ZT, where R and T are trapezoidal, enabling the constrained minimization. Orthogonalization in LAPACK centers on factorizations like QR, which decompose A into an orthogonal matrix Q and an upper triangular (or trapezoidal) matrix R. The core routine DGEQRF computes the QR factorization for general rectangular matrices, producing both thin (m × min(m,n)) and full (m × m) variants implicitly through a compact representation. Q is stored as a product of elementary Householder reflectors, each defined as H = I - βvv^T, where v is a Householder vector with v_1 = 1 and β = 2/(v^T v), applied sequentially to triangularize A. To explicitly form Q or apply it to vectors, routines like DORGQR and DORMQR are invoked. For generalized problems, DGGES computes the QZ (generalized Schur) decomposition of a pair (A, B), yielding A = QSZ^T and B = QTZ^T, where Q and Z are orthogonal and S, T are triangular; this supports orthogonal transformations in generalized eigenvalue contexts but is distinct from standard orthogonalization. Rank-revealing capabilities are provided by the pivoted QR routine DGEQP3, which incorporates column pivoting to select numerically strong columns first, approximating a rank-revealing decomposition AP = QR. Pivoting helps detect the numerical rank by monitoring diagonal elements of R against a user-specified tolerance, typically max(m,n) × ε × ||A||_∞, where ε is machine precision. Iterative refinement for residuals can be applied post-solution using auxiliary routines like DRSCL to scale and check accuracy.
Complete orthogonal factorization for rank-deficient cases extends this via routines such as DGELSY, which combines pivoted QR with an RZ factorization (via DTZRZF) to yield AP = UTV^T, where U and V are orthogonal and T is trapezoidal with the leading r × r block upper triangular; this enables basic solutions and pseudoinverses for underdetermined rank-deficient systems. These routines find applications in regression and data fitting, where DGELS fits models by minimizing squared errors, and in computing Moore-Penrose pseudoinverses via the orthogonal factors for ill-posed problems. Rank detection via DGEQP3 tolerances aids in identifying effective dimensions in the data. While the SVD offers an alternative for rank-deficient problems via the pseudoinverse, the QR-based methods in this section provide faster, more stable solutions for dense problems without full spectral information.
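For instance, an overdetermined fitting problem can be handed to DGELS through LAPACKE as in the following sketch; the straight-line model and the observation values are illustrative only.

```c
/* Sketch: overdetermined least squares min ||Ax - b|| with DGELS, which
 * uses a QR factorization of A. Example data is arbitrary. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    /* 4x2 design matrix (row-major): fit y = x0 + x1*t at t = 0, 1, 2, 3. */
    double a[4 * 2] = { 1, 0,
                        1, 1,
                        1, 2,
                        1, 3 };
    double b[4] = { 1.0, 2.1, 2.9, 4.2 };   /* observations; the first 2 entries
                                               return the fitted coefficients */

    lapack_int info = LAPACKE_dgels(LAPACK_ROW_MAJOR, 'N', 4, 2, 1,
                                    a, 2, b, 1);
    if (info != 0) return 1;
    printf("intercept = %f, slope = %f\n", b[0], b[1]);
    return 0;
}
```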

Implementations and Interfaces

Reference Implementation

The reference implementation of LAPACK is a portable library providing the standard, unoptimized codebase for its routines. Primarily written in Fortran 90 with some Fortran 77 compatibility elements, it encompasses approximately 4000 subroutines covering linear systems, eigenvalue problems, decompositions, and related computations. The source code has been freely distributed via Netlib since the initial LAPACK version 1.0 release on February 29, 1992, and the official development repository on GitHub under the Reference-LAPACK organization was established around 2010 to facilitate issue tracking and collaboration. Building the reference implementation involves a Makefile-driven process that configures compilation options via a customizable make.inc file and requires linking to an external BLAS library for low-level operations, as LAPACK delegates basic linear algebra tasks to BLAS. The included testing suite provides extensive validation by generating test matrices and verifying results against expected outputs, covering more than 90% of the code paths to detect numerical accuracy issues and bugs. This ensures reliability across installations, with users encouraged to run the full test suite post-compilation. Maintenance of the library is community-driven, involving contributors from academia and industry who submit bug reports and proposed fixes through GitHub issues and pull requests. Updates are released periodically, with patch versions addressing specific errata. The codebase emphasizes portability, compiling on diverse platforms including Linux, Windows, and macOS without any hardware-specific optimizations or assembly code, thereby relying on the chosen BLAS implementation for platform-tuned performance.

Optimized and Vendor-Specific Versions

Optimized and vendor-specific versions of LAPACK leverage hardware-specific features, such as vector extensions and multi-core architectures, to deliver substantial performance gains over the reference implementation while preserving interface compatibility. These implementations typically incorporate hand-tuned assembly code for underlying BLAS operations, enabling efficient use of modern CPU capabilities. Intel's oneAPI Math Kernel Library (oneMKL) offers a highly optimized LAPACK suite tailored for x86 processors, including extensive threading via OpenMP for shared-memory parallelism and support for SIMD instructions like AVX-512 to accelerate vector and matrix operations. As of November 2025, oneMKL has integrated bug fixes and updates from Netlib LAPACK 3.12.1. The library also facilitates GPU offloading for select routines through integration with heterogeneous computing frameworks like SYCL, allowing hybrid CPU-GPU execution for large-scale problems. AMD's Optimizing CPU Libraries (AOCL) provide AOCL-LAPACK, a performant implementation aligned with Netlib LAPACK 3.12.0 and tuned for Zen-based processors such as the EPYC and Ryzen series, with specific enhancements in routines like LU factorization (e.g., DGETRF) and singular value decomposition (DGESDD). It supports multi-threading and x86 instruction set architecture (ISA) extensions for improved scalability across core counts. Arm Performance Libraries deliver a LAPACK implementation optimized for Arm-based systems, incorporating block algorithms that rely on efficient BLAS calls and threading for multi-processor environments, with support for vector extensions like NEON and the Scalable Vector Extension (SVE). Among open-source alternatives, OpenBLAS stands out as a multithreaded library that auto-detects and tunes for various CPU architectures, implementing LAPACK routines alongside BLAS with assembly-optimized kernels for SIMD acceleration (e.g., AVX and ARMv8). Its predecessor, GotoBLAS2, emphasized cache blocking strategies and architecture-specific assembly to boost efficiency in linear algebra operations. These versions serve as drop-in replacements for the reference LAPACK, requiring no changes to calling code due to adherence to standard interfaces in Fortran and C. Performance benchmarks on modern hardware demonstrate speedups of 2-10x over the reference implementation for key routines, attributed to optimizations like AVX-512 vectorization and multi-threading, though exact gains vary by problem size and architecture. For instance, AOCL-LAPACK post-optimization achieves up to 110% improvement in GFLOPS for banded LU factorization compared to prior versions, often outperforming OpenBLAS on small matrices.

Language Bindings and Wrappers

LAPACKE, introduced with LAPACK version 3.3.0 in November 2010, serves as the official C interface to LAPACK routines, enabling seamless integration into C and C++ programs by wrapping calls with C-compatible types such as lapack_int for integers and dedicated complex types like lapack_complex_double. This two-level design includes a high-level interface that automatically manages workspace memory allocation and deallocation using functions like LAPACKE_malloc and LAPACKE_free, while the middle-level interface requires users to provide pre-allocated arrays for greater control. Both levels support column-major and row-major matrix layouts, with optional NaN checking in the high-level variant to detect invalid inputs, and error codes are returned directly for propagation in calling code. LAPACK95, released in November 2000, provides an official Fortran 95 interface that enhances usability through generic interfaces and modules for accessing LAPACK drivers, computational routines, and auxiliary functions. It employs Fortran 95's module system to encapsulate routines, using generic procedures and optional arguments to simplify calls, such as treating matrices as whole array objects in eigenvalue or linear-system solvers. Error handling follows LAPACK conventions but integrates with Fortran 95's optional arguments for flexible input validation and output. Third-party bindings extend LAPACK accessibility to higher-level languages, often incorporating automatic memory management and language-native error mechanisms. In Python, the SciPy library's scipy.linalg module offers high-level wrappers around LAPACK routines, leveraging NumPy arrays for seamless type conversions (including support for complex dtypes like complex128) and in-place operations to optimize memory usage via flags like overwrite_a. Errors from LAPACK are propagated as Python exceptions, such as LinAlgError for singular matrices or numerical issues. Julia's standard LinearAlgebra module directly invokes LAPACK functions for core operations like LU factorization (dgetrf!) and SVD (dgesvd!), with automatic handling of array ownership and complex types through Julia's foreign-function interface, raising domain errors for invalid computations. In R, the base package integrates LAPACK calls for linear algebra tasks, such as eigendecompositions and least-squares solutions, using R's internal libRlapack or external implementations, with memory managed via R's vector allocations and errors signaled through warnings or stops. For Java, third-party wrappers like netlib-java provide JNI-based access to LAPACK and BLAS, supporting automatic library loading and fallback to pure-JVM implementations, while older pure-Java linear algebra libraries without direct LAPACK ties have been largely deprecated in favor of such bindings. These bindings typically adapt LAPACK's routine naming scheme, prefixing with language-specific conventions like scipy.linalg.lapack.dgesv in Python. They have seen widespread adoption in scientific computing ecosystems, facilitating LAPACK's use in diverse applications from machine learning to physics simulations; for instance, NumPy routinely employs LAPACK via optimized backends such as OpenBLAS to deliver high-performance linear algebra on arrays.

Parallel and Distributed Extensions

LAPACK, primarily designed for shared-memory systems, has been extended through several projects to support parallel and distributed-memory environments, enabling efficient computation on high-performance computing (HPC) clusters. These extensions leverage message-passing interfaces like MPI to distribute data and computations across multiple nodes, addressing the limitations of single-node processing for large-scale linear algebra problems. The most prominent extension is ScaLAPACK (Scalable Linear Algebra PACKage), developed by the same team behind LAPACK and released starting in 1995. ScaLAPACK provides a subset of LAPACK routines redesigned for distributed-memory MIMD parallel computers, using block-partitioned algorithms and a two-dimensional block-cyclic data decomposition to minimize communication overhead. It builds on the Basic Linear Algebra Communication Subprograms (BLACS) for interprocessor communication and the Parallel BLAS (PBLAS) for distributed operations, with support for MPI or PVM. Key routines include drivers like pdgesv for solving distributed linear systems via LU factorization and pdgemm for distributed matrix-matrix multiplication, which rely on collective operations to synchronize data across processors. Funded by NSF and DOE grants, ScaLAPACK has been pivotal in scaling dense linear algebra to HPC systems. Another significant extension is Elemental, a modern C++ library for distributed-memory dense and sparse-direct linear algebra. Elemental implements LAPACK-like interfaces over MPI, supporting arbitrary-precision datatypes such as double and quad precision, and includes routines for factorizations such as QR as well as eigenvalue problems. It can interface with ScaLAPACK for internodal computations when available, emphasizing portability across heterogeneous distributed systems. Elemental focuses on simplicity and performance for optimization and related tasks in large-scale settings. PLAPACK (Parallel Linear Algebra PACKage), developed in the 1990s at the University of Texas at Austin, offers higher-level abstractions for parallel dense linear algebra on distributed-memory machines. It extends ScaLAPACK's approach by providing an object-based interface in C, allowing users to describe algorithms in terms of matrix objects and operations rather than low-level data distributions. PLAPACK supports routines for LU factorization, eigenvalue solvers, and related operations, with performance comparable to ScaLAPACK while simplifying parallel programming. Its last major release was in 2007, and it includes tools for visualization and tuning on MPI-based clusters. Runtime systems like PaRSEC (Parallel Runtime Scheduling and Execution Controller) further enhance parallelism in LAPACK extensions by enabling task-based scheduling for distributed linear algebra. Developed at the University of Tennessee, PaRSEC manages micro-tasks on heterogeneous architectures using MPI, supporting domain-specific libraries such as DPLASMA for dense linear algebra kernels. It dynamically schedules tasks to optimize data locality and load balance, facilitating scalable implementations of algorithms like Cholesky factorization. These extensions are widely used in HPC environments for applications requiring massive parallelism, such as large-scale simulations in physics and engineering, where they scale to thousands of cores and handle matrices up to petascale sizes. For instance, ScaLAPACK-based solvers achieve efficient weak scaling on systems with several thousand processors when matrix dimensions are sufficiently large (e.g., order 5000–10,000).
Collective operations in routines like pdgemm ensure synchronization, making them suitable for distributed clusters but requiring careful data partitioning to avoid bottlenecks.

Modern Successors and Alternatives

One prominent modern successor to LAPACK is PLASMA (Parallel Linear Algebra Software for Multicore Architectures), developed starting in 2008 by the Innovative Computing Laboratory at the University of Tennessee, Knoxville (UTK). PLASMA employs tile-based algorithms that enable dynamic task scheduling on multicore processors and GPUs, facilitating asynchronous execution and improved data reuse in cache hierarchies. For instance, its zgesv routine solves complex linear systems via LU factorization with partial pivoting, achieving scalability on shared-memory systems through task parallelism. Similarly, MAGMA (Matrix Algebra on GPU and Multicore Architectures), initiated in 2009 by the same UTK group in collaboration with UC Berkeley and other partners, extends LAPACK-like functionality to hybrid CPU-GPU environments. MAGMA leverages low-level libraries such as cuBLAS for GPU acceleration while using task-based parallelism for hybrid execution, as seen in its magma_dgesv routine for dense linear solvers. This design supports asynchronous task graphs, allowing overlapping of computation and communication to enhance performance on accelerator-rich nodes. SLATE (Software for Linear Algebra Targeting Exascale), developed starting in 2018 by the LAPACK development team at UTK, UC Berkeley, and partner institutions, serves as a modern distributed-memory successor to ScaLAPACK. SLATE provides scalable implementations of LAPACK and ScaLAPACK routines for CPU and GPU clusters, using task-based scheduling and supporting heterogeneous architectures for exascale computing. It includes functionality for linear solvers, eigenvalue problems, and factorizations, integrated into DOE's Exascale Computing Project. Other notable alternatives include BLIS (BLAS-like Library Instantiation Software), introduced in 2013 by the FLAME project at The University of Texas at Austin, which provides a framework for portable, high-performance implementations of BLAS and LAPACK operations through customizable block-linear algebra kernels. BLIS emphasizes ease of tuning for specific hardware, enabling developers to optimize micro-kernels without altering high-level interfaces. Complementing this, Intel's libXSMM library focuses on just-in-time generated micro-kernels for small dense and sparse matrix operations, particularly benefiting workloads on x86 architectures by minimizing overhead in batched computations. High-level frameworks like CuPy and PyTorch offer GPU-native linear algebra via their respective linalg modules, integrating solvers for systems of equations, eigenvalues, and decompositions. These libraries prioritize seamless heterogeneous execution on GPUs through backends like cuSOLVER, abstracting low-level details for application developers while maintaining compatibility with dense linear algebra tasks. Compared to traditional LAPACK, these successors provide advantages in asynchronous execution, native heterogeneous support for GPUs and manycores, and enhanced scalability toward exascale systems, as evidenced by their integration into U.S. Department of Energy initiatives like the Exascale Computing Project. Many retain LAPACK-compatible interfaces but shift to directed acyclic graph (DAG)-based task models for better resource utilization on modern hardware.
