
Computational complexity of matrix multiplication


from Wikipedia

Unsolved problem in computer science
What is the fastest algorithm for matrix multiplication?

In theoretical computer science, the computational complexity of matrix multiplication dictates how quickly the operation of matrix multiplication can be performed. Matrix multiplication algorithms are a central subroutine in theoretical and numerical algorithms for numerical linear algebra and optimization, so finding the fastest algorithm for matrix multiplication is of major practical relevance.

Directly applying the mathematical definition of matrix multiplication gives an algorithm that requires n^3 field operations to multiply two n × n matrices over that field (Θ(n^3) in big O notation). Surprisingly, algorithms exist that provide better running times than this straightforward "schoolbook algorithm". The first to be discovered was Strassen's algorithm, devised by Volker Strassen in 1969 and often referred to as "fast matrix multiplication".[1] The optimal number of field operations needed to multiply two square n × n matrices up to constant factors is still unknown. This is a major open question in theoretical computer science.

As of January 2024, the best bound on the asymptotic complexity of a matrix multiplication algorithm is O(n^2.371339).[2] However, this and similar improvements to Strassen are not used in practice, because they are galactic algorithms: the constant coefficient hidden by the big O notation is so large that they are only worthwhile for matrices that are too large to handle on present-day computers.[3][4]

Simple algorithms


If A, B are two n × n matrices over a field, then their product AB is also an n × n matrix over that field, defined entrywise as

    (AB)_{ij} = Σ_{k=1}^{n} A_{ik} B_{kj},    for i, j = 1, …, n.

Schoolbook algorithm


The simplest approach to computing the product of two n × n matrices A and B is to compute the arithmetic expressions coming from the definition of matrix multiplication. In pseudocode:

input A and B, both n by n matrices
initialize C to be an n by n matrix of all zeros
for i from 1 to n:
    for j from 1 to n:
        for k from 1 to n:
            C[i][j] = C[i][j] + A[i][k]*B[k][j]
output C (as A*B)
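A direct, runnable translation of this pseudocode into Python (plain lists of lists, no external libraries) might look like:

```python
def matmul(A, B):
    # Schoolbook product of two n x n matrices given as lists of lists.
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Example: matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) returns [[19, 22], [43, 50]]
```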

This algorithm requires n^3 multiplications and n^3 − n^2 additions of scalars for computing the product of two square n×n matrices. Its computational complexity is therefore Θ(n^3), in a model of computation where field operations (addition and multiplication) take constant time (in practice, this is the case for floating point numbers, but not necessarily for integers).

Strassen's algorithm


Strassen's algorithm improves on naive matrix multiplication through a divide-and-conquer approach. The key observation is that multiplying two 2 × 2 matrices can be done with only seven multiplications, instead of the usual eight (at the expense of 11 additional addition and subtraction operations). This means that, treating the input n×n matrices as block 2 × 2 matrices, the task of multiplying two n×n matrices can be reduced to seven subproblems of multiplying two n/2×n/2 matrices. Applying this recursively gives an algorithm needing O(n^{log_2 7}) ≈ O(n^2.8074) field operations.

Unlike algorithms with faster asymptotic complexity, Strassen's algorithm is used in practice. The numerical stability is reduced compared to the naive algorithm,[5] but it is faster in cases where n > 100 or so[6] and appears in several libraries, such as BLAS.[7] Fast matrix multiplication algorithms cannot achieve component-wise stability, but some can be shown to exhibit norm-wise stability.[8] It is very useful for large matrices over exact domains such as finite fields, where numerical stability is not an issue.
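As a concrete illustration of the divide-and-conquer scheme, here is a compact NumPy sketch of Strassen's algorithm for matrices whose size is a power of two. The `cutoff` parameter, below which the routine falls back to the ordinary product, is our own illustrative choice (real implementations tune it per machine):

```python
import numpy as np

def strassen(A, B, cutoff=64):
    # Multiply square matrices A and B (size a power of two) with Strassen's method.
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # fall back to the ordinary product for small blocks
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven recursive products of Strassen's identity
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Reassemble the four blocks of the result
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```

With a small cutoff the recursion is exercised fully; with a realistic cutoff the recursion only pays off for large n, matching the practical behavior described above.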

Matrix multiplication exponent

[Figure: improvement of estimates of the exponent ω over time for the computational complexity of matrix multiplication, with a closeup of 1990–2024.]
Timeline of matrix multiplication exponent
Year Bound on ω Authors
1969 2.8074 Strassen[1]
1978 2.796 Pan[9]
1979 2.780 Bini, Capovani, Romani[10]
1981 2.522 Schönhage[11]
1981 2.517 Romani[12]
1981 2.496 Coppersmith, Winograd[13]
1986 2.479 Strassen[14]
1990 2.3755 Coppersmith, Winograd[15]
2010 2.3737 Stothers[16]
2012 2.3729 Williams[17][18]
2014 2.3728639 Le Gall[19]
2020 2.3728596 Alman, Williams[20][21]
2022 2.371866 Duan, Wu, Zhou[22]
2024 2.371552 Williams, Xu, Xu, and Zhou[23]
2024 2.371339 Alman, Duan, Williams, Xu, Xu, and Zhou[2]

The matrix multiplication exponent, usually denoted ω, is the smallest real number for which any two n × n matrices over a field can be multiplied together using O(n^{ω + ε}) field operations for every ε > 0. This notation is commonly used in algorithms research, so that algorithms using matrix multiplication as a subroutine have bounds on running time that can update as bounds on ω improve.

Using a naive lower bound and schoolbook matrix multiplication for the upper bound, one can straightforwardly conclude that 2 ≤ ω ≤ 3. Whether ω = 2 is a major open question in theoretical computer science, and there is a line of research developing matrix multiplication algorithms to get improved bounds on ω.

All recent algorithms in this line of research use the laser method, a generalization of the Coppersmith–Winograd algorithm, which was given by Don Coppersmith and Shmuel Winograd in 1990 and was the best matrix multiplication algorithm until 2010.[24] The conceptual idea of these algorithms is similar to Strassen's algorithm: a method is devised for multiplying two k × k matrices with fewer than k^3 multiplications, and this technique is applied recursively. The laser method has limitations to its power: Ambainis, Filmus and Le Gall proved that it cannot be used to show that ω < 2.3725 by analyzing higher and higher tensor powers of a certain identity of Coppersmith and Winograd, nor that ω < 2.3078 for a wide class of variants of this approach.[25] In 2022, Duan, Wu and Zhou devised a variant breaking the first of the two barriers with ω < 2.37188.[22] They did so by identifying a source of potential optimization in the laser method, termed combination loss, for which they compensate using an asymmetric version of the hashing method in the Coppersmith–Winograd algorithm.

Nonetheless, the above are classical examples of galactic algorithms. In contrast, Strassen's algorithm of 1969 and Pan's algorithm of 1978, whose exponents are slightly above and below 2.80 respectively, have constant coefficients that make them feasible.[26]

Group theory reformulation of matrix multiplication algorithms


Henry Cohn, Robert Kleinberg, Balázs Szegedy and Chris Umans put methods such as the Strassen and Coppersmith–Winograd algorithms in an entirely different group-theoretic context, by utilising triples of subsets of finite groups which satisfy a disjointness property called the triple product property (TPP). They also give conjectures that, if true, would imply that there are matrix multiplication algorithms with essentially quadratic complexity. This implies that the optimal exponent of matrix multiplication is 2, which most researchers believe is indeed the case.[4] One such conjecture is that families of wreath products of Abelian groups with symmetric groups realise families of subset triples with a simultaneous version of the TPP.[27][28] Several of their conjectures have since been disproven by Blasiak, Cohn, Church, Grochow, Naslund, Sawin, and Umans using the Slice Rank method.[29] Further, Alon, Shpilka and Chris Umans have recently shown that some of these conjectures implying fast matrix multiplication are incompatible with another plausible conjecture, the sunflower conjecture,[30] which in turn is related to the cap set problem.[29]

Lower bounds for ω


There is a trivial lower bound of ω ≥ 2. Since any algorithm for multiplying two n × n matrices has to process all 2n^2 entries, there is a trivial asymptotic lower bound of Ω(n^2) operations for any matrix multiplication algorithm. Thus ω ≥ 2. It is unknown whether ω > 2. The best known lower bound for matrix-multiplication complexity is Ω(n^2 log(n)), for bounded coefficient arithmetic circuits over the real or complex numbers, and is due to Ran Raz.[31]

It is known that, under the model of computation typically studied, there is no matrix multiplication algorithm that uses precisely O(n^ω) operations; there must be an additional factor of n^{o(1)}.[13]

Rectangular matrix multiplication


Similar techniques also apply to rectangular matrix multiplication. The central object of study is ω(k), which is the smallest exponent such that one can multiply a matrix of size n × ⌈n^k⌉ with a matrix of size ⌈n^k⌉ × n with O(n^{ω(k) + ε}) arithmetic operations for every ε > 0. A result in algebraic complexity states that multiplying matrices of size n × ⌈n^k⌉ and ⌈n^k⌉ × n requires the same number of arithmetic operations as multiplying matrices of size ⌈n^k⌉ × n and n × n and of size n × n and n × ⌈n^k⌉, so this encompasses the complexity of rectangular matrix multiplication.[32] This generalizes the square matrix multiplication exponent, since ω(1) = ω.

Since the output of the matrix multiplication problem is of size n^2, we have ω(k) ≥ 2 for all values of k. If one can prove for some values of k between 0 and 1 that ω(k) ≤ 2, then such a result shows that ω(k) = 2 for those k. The largest k such that ω(k) = 2 is known as the dual matrix multiplication exponent, usually denoted α. α is referred to as the "dual" because showing that α = 1 is equivalent to showing that ω = 2. Like the matrix multiplication exponent, the dual matrix multiplication exponent sometimes appears in the complexity of algorithms in numerical linear algebra and optimization.[33]

The first bound on α is by Coppersmith in 1982, who showed that α > 0.172.[34] The current best peer-reviewed bound on α is α ≥ 0.321334, given by Williams, Xu, Xu, and Zhou.[23]

Related problems

Problems that have the same asymptotic complexity as matrix multiplication include determinant, matrix inversion, and Gaussian elimination (see next section). Problems with complexity that is expressible in terms of ω include the characteristic polynomial, eigenvalues (but not eigenvectors), Hermite normal form, and Smith normal form.[citation needed]

Matrix inversion, determinant and Gaussian elimination


In his 1969 paper, where he proved the O(n^{log_2 7}) complexity bound for matrix multiplication, Strassen also proved that matrix inversion, determinant and Gaussian elimination have, up to a multiplicative constant, the same computational complexity as matrix multiplication. The proof does not make any assumptions on the matrix multiplication algorithm that is used, except that its complexity is O(n^ω) for some ω ≥ 2.

The starting point of Strassen's proof is using block matrix multiplication. Specifically, a matrix of even dimension 2n×2n may be partitioned in four n×n blocks

    M = [ A  B ]
        [ C  D ]

Under this form, its inverse is

    M^{-1} = [ A^{-1} + A^{-1}B S^{-1} C A^{-1}    -A^{-1}B S^{-1} ]      where S = D - C A^{-1} B,
             [ -S^{-1} C A^{-1}                     S^{-1}         ]

provided that A and its Schur complement S are invertible.

Thus, the inverse of a 2n×2n matrix may be computed with two inversions, six multiplications and four additions or additive inverses of n×n matrices. It follows that, denoting respectively by I(n), M(n) and A(n) = n^2 the number of operations needed for inverting, multiplying and adding n×n matrices, one has

    I(2n) ≤ 2 I(n) + 6 M(n) + 4 A(n).

If n is a power of two, one may apply this formula recursively. If moreover M(n) ≤ c n^ω for some constants c and ω ≥ 2, the multiplication cost shrinks geometrically down the recursion and is dominated by the top level, so one gets eventually

    I(n) ≤ d n^ω

for some constant d.

For matrices whose dimension is not a power of two, the same complexity is reached by increasing the dimension of the matrix to a power of two, by padding the matrix with rows and columns whose entries are 1 on the diagonal and 0 elsewhere.

This proves the asserted complexity for matrices such that all submatrices that have to be inverted are indeed invertible. This complexity is thus proved for almost all matrices, as a matrix with randomly chosen entries is invertible with probability one.
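Strassen's inversion scheme can be sketched as a short NumPy routine (the recursion structure and variable names are our own; it assumes the size is a power of two and that the leading blocks and Schur complements encountered are invertible, which holds with probability one for random matrices):

```python
import numpy as np

def block_inverse(M):
    # Invert M recursively via the 2x2 block partition and the Schur complement.
    n = M.shape[0]
    if n == 1:
        return np.array([[1.0 / M[0, 0]]])
    h = n // 2  # assumes n is a power of two
    A, B = M[:h, :h], M[:h, h:]
    C, D = M[h:, :h], M[h:, h:]
    Ai = block_inverse(A)            # first of the two recursive inversions
    AiB = Ai @ B                     # multiplication 1
    S = D - C @ AiB                  # Schur complement of A (multiplication 2)
    Si = block_inverse(S)            # second recursive inversion
    CAi = C @ Ai                     # multiplication 3
    SiCAi = Si @ CAi                 # multiplication 4
    return np.block([[Ai + AiB @ SiCAi, -(AiB @ Si)],   # multiplications 5 and 6
                     [-SiCAi, Si]])
```

Note the operation count matches the text: two recursive inversions, six block multiplications, and four block additions or negations per level.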

The same argument applies to LU decomposition, as, if the matrix A is invertible, the equality

    [ A  B ]   [ I        0 ] [ A   B            ]
    [ C  D ] = [ CA^{-1}  I ] [ 0   D - CA^{-1}B ]

defines a block LU decomposition that may be applied recursively to A and to the Schur complement D - CA^{-1}B for getting eventually a true LU decomposition of the original matrix.

The argument applies also to the determinant, since it results from the block LU decomposition that

    det(M) = det(A) · det(D - CA^{-1}B)

for the block matrix M partitioned as above.
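A quick numerical sanity check of this determinant identity, using NumPy on a random 6 × 6 matrix (the example is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A, B = M[:3, :3], M[:3, 3:]
C, D = M[3:, :3], M[3:, 3:]
S = D - C @ np.linalg.inv(A) @ B   # Schur complement of A
# det(M) = det(A) * det(S), per the block LU factorization
assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(S))
```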

Minimizing number of multiplications


Related to the problem of minimizing the number of arithmetic operations is minimizing the number of multiplications, which is typically a more costly operation than addition. An O(n^{ω + ε}) algorithm for matrix multiplication necessarily uses only O(n^{ω + ε}) multiplication operations, but these algorithms are impractical. Improving on the naive 64 multiplications of schoolbook multiplication, 4×4 matrices over Z/2Z can be multiplied with 47 multiplications,[35] and 3×3 matrix multiplication over a commutative ring can be done in 21 multiplications[36][37] (23 if non-commutative[38]). A lower bound on the number of multiplications needed is 2mn + 2n − m − 2 (for multiplying n×m matrices with m×n matrices using the substitution method, with m ≥ n ≥ 3), which means the n = 3 case requires at least 19 multiplications and n = 4 at least 34.[39] For n = 2, seven multiplications are optimal, and 15 additions are minimal for any seven-multiplication scheme (compared with only four additions when eight multiplications are used).[40][41]


from Grokipedia
The computational complexity of matrix multiplication is a central problem in algebraic complexity theory, concerned with determining the minimal number of arithmetic operations (additions and multiplications) required to compute the product of two matrices over a field, such as the real or complex numbers.[1] For two n × n matrices, the standard row-column algorithm performs Θ(n³) scalar multiplications and additions, but since 1969, a series of algorithms has progressively reduced the asymptotic exponent ω, where the complexity is O(n^ω), with the current best upper bound being ω < 2.371339 as of 2024. This exponent measures the intrinsic difficulty of the problem and has profound implications for algorithms in linear algebra, machine learning, scientific computing, and cryptography, as matrix multiplication is a bottleneck in operations like solving linear systems and neural network training.[2] The quest for faster matrix multiplication began with Volker Strassen's 1969 discovery of a divide-and-conquer algorithm that multiplies two 2 × 2 matrices using only 7 scalar multiplications instead of 8, yielding an overall complexity of O(n^{log_2 7}) ≈ O(n^{2.807}) for n × n matrices by recursive application. This breakthrough shattered the perceived optimality of the cubic-time method and inspired subsequent improvements, including Victor Pan's 1978 algorithm achieving O(n^{2.795}), Dario Bini et al.'s 1979 approximate approach at O(n^{2.78}), and Arnold Schönhage's 1981 method reaching O(n^{2.548}). By 1990, Don Coppersmith and Shmuel Winograd introduced the "laser method" for tensor decomposition, pushing ω < 2.376, a technique that has underpinned most advances since. Further refinements followed, such as the 2012 work by Virginia Vassilevska Williams yielding ω < 2.3729, and the 2022 refinement by Ran Duan, Hongxun Wu, and Yu Zhou to ω < 2.371866 via asymmetric hashing. 
Recent progress has accelerated, driven by sophisticated algebraic geometry and combinatorial techniques. In 2023, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou improved the bound to ω ≤ 2.371552 using a new laser variant that analyzes higher-order tensor powers more efficiently.[2] This was quickly surpassed in 2024 by Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou, who achieved ω < 2.371339 by exploiting asymmetries in rectangular matrix multiplication and eliminating hidden inefficiencies in prior decompositions.[3] These algorithms, while theoretically groundbreaking, remain impractical for moderate n due to high constant factors and intricate implementations; in practice, optimized O(n^3) variants like those in BLAS libraries dominate. Despite these upper bounds approaching 2, the lower bound on ω remains stubbornly at 2, established trivially since matrix multiplication requires at least Ω(n^2) operations to read and write the output, with no non-trivial improvements proven in over 50 years.[4] Conjectures suggest ω = 2, implying a quadratic-time algorithm exists, but barriers like the lack of progress on the 3 × 3 case (where the minimal number of multiplications is unknown, bounded between 19 and 23) persist. Related problems, such as rectangular matrix multiplication (for m × n and n × p matrices) and its connections to the all-pairs shortest paths problem, further illuminate the challenges, with exponents like α (for n × n^α cases) also seeing recent advances to α ≥ 0.321334.[2] The field remains active, with machine learning-aided discoveries and geometric methods promising further insights into this enduring open question.[5]

Basic Algorithms

Standard Matrix Multiplication

The standard matrix multiplication algorithm, commonly known as the schoolbook method, computes the product C = AB of two n × n matrices A and B by determining each entry c_{ij} of the resulting matrix C as the dot product of the i-th row of A and the j-th column of B, given by
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.
This approach directly implements the definition of matrix multiplication, where each of the n^2 entries in C requires n pairwise scalar multiplications followed by n − 1 additions to sum the products.[6] The time complexity of this algorithm is O(n^3) arithmetic operations, consisting of exactly n^3 scalar multiplications and n^3 − n^2 scalar additions, assuming constant-time operations on the matrix elements. This cubic dependence arises because the three nested loops (over rows of A, columns of B, and the summation index) each iterate n times. In terms of space complexity, the algorithm requires O(n^2) space to store the input matrices A and B as well as the output matrix C, with only O(1) additional space for temporary variables during computation.[6] To illustrate, consider multiplying two 2 × 2 matrices:
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.
The entry c_{11} is computed as a_{11} b_{11} + a_{12} b_{21}, requiring 2 multiplications and 1 addition. Similarly, c_{12} = a_{11} b_{12} + a_{12} b_{22}, c_{21} = a_{21} b_{11} + a_{22} b_{21}, and c_{22} = a_{21} b_{12} + a_{22} b_{22}, for a total of 8 multiplications and 4 additions, consistent with the general formula for n = 2. Pseudocode for the general case is as follows:
for i from 1 to n:
    for j from 1 to n:
        c_ij = 0
        for k from 1 to n:
            c_ij = c_ij + a_ik * b_kj
This method forms the foundational approach taught in introductory linear algebra courses and traces its origins to the mid-19th century, when Arthur Cayley formalized matrix multiplication in 1855–1858 as a means to represent the composition of linear transformations.[7][8]
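The exact operation counts stated above (n^3 multiplications and n^3 − n^2 additions) can be verified by instrumenting the triple loop; the counting helper below is a sketch of ours:

```python
def schoolbook_counts(n):
    # Count scalar multiplications and additions in the schoolbook method,
    # accumulating each entry as a running sum (n - 1 additions per entry).
    mults = adds = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                mults += 1
                if k > 0:
                    adds += 1
    return mults, adds

# For any n this returns exactly (n**3, n**3 - n**2).
```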

Strassen's Algorithm

Strassen's algorithm, developed by Volker Strassen in 1969, marked the first sub-cubic time algorithm for matrix multiplication, reducing the computational complexity from the standard O(n^3) to O(n^{log_2 7}) ≈ O(n^2.807) scalar multiplications for n × n matrices over a ring.[9] This breakthrough demonstrated that Gaussian elimination, which relies on O(n^3) operations, is not optimal for solving systems of linear equations, as faster matrix multiplication directly accelerates such problems.[9] The algorithm uses a divide-and-conquer paradigm, recursively partitioning each n × n matrix A and B (with n a power of 2) into four (n/2) × (n/2) blocks: A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} and similarly for B. Rather than performing the eight block multiplications required by the conventional approach, it computes only seven intermediate products M_1 through M_7, using additional block additions and subtractions to reconstruct the blocks of the result C = AB.[9] For the base case of 2 × 2 matrices with entries a_{ij} and b_{ij}, the products are:
\begin{align*} M_1 &= (a_{11} + a_{22})(b_{11} + b_{22}), \\ M_2 &= (a_{21} + a_{22}) b_{11}, \\ M_3 &= a_{11} (b_{12} - b_{22}), \\ M_4 &= a_{22} (b_{21} - b_{11}), \\ M_5 &= (a_{11} + a_{12}) b_{22}, \\ M_6 &= (a_{21} - a_{11}) (b_{11} + b_{12}), \\ M_7 &= (a_{12} - a_{22}) (b_{21} + b_{22}). \end{align*}
The output blocks are then formed as c_{11} = M_1 + M_4 - M_5 + M_7, c_{12} = M_3 + M_5, c_{21} = M_2 + M_4, and c_{22} = M_1 - M_2 + M_3 + M_6.[9] This scheme requires 18 additions/subtractions per recursion level, compared to 8 for the standard method. The time complexity follows the recurrence T(n) = 7 T(n/2) + Θ(n^2), where the Θ(n^2) term accounts for the additions at each level; solving this via the master theorem yields T(n) = Θ(n^{log_2 7}).[9] In practice, the recursive overhead and higher constant factors from the additions make the algorithm slower than the standard O(n^3) method for small n, with benefits emerging only for matrices larger than a few hundred in dimension, such as n ≳ 500 in optimized implementations.[10] Strassen-like methods extend this idea to larger block sizes, such as 3 × 3 partitions, which compute the product using 23 multiplications instead of the naive 27, potentially reducing constants for certain practical scenarios despite not improving the asymptotic exponent.[11]
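The seven products and the four reconstruction formulas can be checked directly; the following small Python function (our own illustration) multiplies two 2 × 2 matrices with exactly seven scalar multiplications:

```python
def strassen_2x2(a, b):
    # a, b: 2x2 matrices as nested lists; returns a*b using 7 multiplications.
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

# Example: strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) returns [[19, 22], [43, 50]]
```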

The Matrix Multiplication Exponent

Definition and Historical Development

The matrix multiplication exponent, denoted ω, is defined as the infimum of the real numbers α such that the product of two n × n matrices over an arbitrary field (such as the real or complex numbers) can be computed using O(n^α) arithmetic operations in the limit as n → ∞.[12] This measure captures the asymptotic arithmetic complexity of the problem, focusing on the exponent rather than leading constants or lower-order terms. Trivially, ω ≥ 2, since the output is an n × n matrix with n^2 entries, each of which requires at least a constant number of operations in the worst case, with a more rigorous argument coming from the rank of the output matrix.[13] The classical O(n^3) algorithm for matrix multiplication, based on the straightforward summation over indices, immediately establishes the upper bound ω ≤ 3.[13] The historical development of bounds on ω began in earnest with Volker Strassen's seminal 1969 paper, which introduced the first sub-cubic algorithm by showing that two 2 × 2 matrices could be multiplied using only 7 scalar multiplications instead of 8, leading to the recursive upper bound ω ≤ log_2 7 ≈ 2.807.[14] This breakthrough shifted the focus to algebraic techniques, viewing matrix multiplication as the evaluation of a trilinear form or, equivalently, the decomposition of the matrix multiplication tensor ⟨n, n, n⟩.[12] In the 1970s, researchers like Winograd, Pan, and Schönhage built on this by seeking low-rank tensor decompositions for larger block sizes, gradually improving the upper bounds through manual and semi-automated searches for efficient bilinear algorithms.[13] These algebraic methods established the tensor rank as a key tool for bounding ω, since the minimal number of multiplications needed relates directly to the rank of the tensor, and asymptotic extensions allow recursive constructions for arbitrary n.[12] A major advance came in 1986 with the introduction of the "laser method" by Volker Strassen, which systematically analyzes powers of structured tensors to eliminate redundant computations. Coppersmith and Winograd applied and refined this method in their 1987 work, yielding ω ≤ 2.376. Their follow-up 1990 work refined this using arithmetic progressions to construct improved rectangular matrix multiplications, implying the tighter square bound ω ≤ 2.3755.[15] Subsequent improvements were sporadic until the 2010s, when refined laser techniques and automated searches produced incremental gains, such as ω ≤ 2.3737 by Stothers in 2010 and ω ≤ 2.3728639 by Le Gall in 2014.[13][16] Further refinements include ω < 2.37286 by Alman and Williams in 2020.[17] The 2023 work by Williams, Xu, Xu, and Zhou achieved ω ≤ 2.371552.[2] The most recent milestone is the 2024 paper by Alman, Duan, Williams, Yinzhan Xu, Zixuan Xu, and Zhou, which exploits greater asymmetry in tensor analyses to achieve ω < 2.371339.[3]
Year Authors Upper bound on ω
1969 Strassen 2.807
1981 Schönhage 2.522
1987 Coppersmith & Winograd 2.376
1990 Coppersmith & Winograd 2.3755
2010 Stothers 2.3737
2014 Le Gall 2.3728639
2020 Alman & Williams 2.37286
2023 Williams et al. 2.371552
2024 Alman et al. 2.371339
As of 2025, no major breakthroughs have surpassed the 2024 bound, and ω is widely conjectured to equal 2 (matching the lower bound), though this remains unproven.[3] The pursuit of tighter bounds on ω has profound implications beyond direct matrix operations, as faster multiplication accelerates algorithms in graph theory (e.g., all-pairs shortest paths in O(n^ω) time), linear programming, and machine learning tasks like transformer models that rely on repeated matrix products.[13]

Upper Bounds and Algorithmic Improvements

Following Strassen's breakthrough, subsequent improvements to the upper bound on the matrix multiplication exponent ω have relied on a combination of algebraic, combinatorial, and geometric techniques to construct faster algorithms. Algebraic approaches, such as tensor decomposition, seek low-rank representations of the matrix multiplication tensor to enable recursive algorithms with reduced complexity. Combinatorial methods, exemplified by the laser method, systematically search for structured decompositions that minimize the number of multiplications needed. Geometric perspectives, including analyses of tensor powers and asymptotic slice ranks, provide tools to bound the efficiency of these decompositions by considering symmetries and overlaps in high-dimensional spaces.[16][18] In the 1980s, variants of the Coppersmith-Winograd algorithm advanced the state-of-the-art using algebraic constructions based on arithmetic progressions and tensor powers, achieving ω < 2.376. This marked a significant refinement over Strassen's ω < 2.807, by exploiting higher-degree extensions of the bilinear map for matrix multiplication to reduce the exponent through recursive application. Building on this foundation, Virginia Vassilevska Williams introduced an automated combinatorial framework in 2012 that enumerates potential tensor decompositions, yielding ω < 2.3729.[19] Further progress came from refinements to the laser method, originally proposed by Strassen for detecting low-rank subtensors. In 2020, Josh Alman and Virginia Vassilevska Williams refined this method by introducing asymmetric hashing and improved marginal distribution analyses to mitigate "combination loss"—the inefficiency arising from unintended overlaps in tensor slices—applied to powers of the Coppersmith-Winograd tensor, achieving ω < 2.37286. 
The laser method operates by identifying witness tensors that certify low-rank decompositions; it labels overlapping blocks as "garbage" for elimination while preserving valuable computations, with refinements enhancing the recovery of signal from noise in large tensor powers. In 2022, the AlphaTensor project by Fawzi et al. used reinforcement learning to discover novel small-scale tensor decompositions for matrix multiplication, improving algorithms for fixed small dimensions but not directly advancing the asymptotic exponent ω.[17][5] Recent advancements have focused on eliminating hidden inefficiencies in the laser method's hashing and combination steps. In 2023, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou developed a variant incorporating alpha-to-omega connections, improving asymmetry in tensor constructions to achieve ω ≤ 2.371552.[2] In 2024, Alman, Duan, Williams, Yinzhan Xu, Zixuan Xu, and Zhou further refined the approach with increased asymmetry in hashing, yielding the current best upper bound of ω < 2.371339 through more precise control over decomposition ranks in asymmetric settings.[3] These 2024 breakthroughs eliminated subtle losses in prior methods by optimizing witness tensor selections and reducing overhead in the recursive constructions. Despite these theoretical gains, practical implementations of these algorithms suffer from large hidden constants and intricate recursion depths, preventing real-world speedups over classical methods like Strassen's for matrices up to thousands in dimension. For instance, the constant factors in laser-based algorithms grow exponentially with recursion levels, making them slower than O(n^3) implementations on modern hardware for typical sizes encountered in machine learning or scientific computing.[20] Open challenges persist in closing the gap to ω = 2, which would imply essentially quadratic-time matrix multiplication (near-linear in the input size) and profound implications for algebraic complexity classes.
While current methods suggest potential achievability over finite fields—where modular arithmetic might enable tighter decompositions—no such bound has been realized, and bridging the theoretical-practical divide remains a key barrier.[20]

Theoretical Frameworks

Algebraic Reformulations

Matrix multiplication can be reformulated algebraically as a trilinear form on three matrices $A, B, C \in \mathbb{C}^{n \times n}$, defined by $\langle A, B, C \rangle = \operatorname{trace}(ABC^T) = \sum_{i,j,k} A_{ij} B_{jk} C_{ik}$, which equals the Frobenius inner product $\langle AB, C^T \rangle_F$ of the product matrix $AB$ and $C^T$.[21] This perspective views the operation as a trilinear map from $\mathbb{C}^{n^2} \otimes \mathbb{C}^{n^2} \otimes \mathbb{C}^{n^2}$ to $\mathbb{C}$, encapsulated by the matrix multiplication tensor $M_{\langle n \rangle} \in \mathbb{C}^{n^2} \otimes \mathbb{C}^{n^2} \otimes \mathbb{C}^{n^2}$, whose structure encodes the bilinear pairings between the input matrices that produce the output.[21] This representation shifts the focus from direct algorithmic steps to the intrinsic algebraic complexity of decomposing the tensor.

The computational complexity is intimately tied to the tensor rank of $M_{\langle n \rangle}$, defined as the smallest integer $r$ such that the tensor decomposes as a sum of $r$ rank-1 tensors, each of the form $u \otimes v \otimes w$ for vectors $u, v, w \in \mathbb{C}^{n^2}$.[22] A decomposition of rank $r$ corresponds to a bilinear algorithm computing the product with $r$ scalar multiplications, establishing an upper bound on the arithmetic complexity.[23] The matrix multiplication exponent $\omega$ is then given by $\omega = \inf \{ \tau \mid \operatorname{rank}(M_{\langle n \rangle}) = O(n^\tau) \}$, where the infimum is over real $\tau \geq 2$, linking the asymptotic growth of the tensor rank directly to the exponent.[24] Over algebraically closed fields such as $\mathbb{C}$, this rank equals the minimal number of bilinear products in any bilinear algorithm for the map.[22] Distinguishing between exact rank and border rank is crucial for upper bounds: the border rank of a tensor is the smallest $r$ such that it is a limit of tensors of rank $r$, and it can be strictly smaller than the rank.[21] Approximate decompositions achieving the border rank yield valid exact algorithms in the limit, at the cost of lower-order terms, making border rank the more flexible measure for asymptotic analysis.[24] For instance, Strassen's algorithm achieves an exact rank-7 decomposition of $M_{\langle 2 \rangle}$, expressing the four entries of the $2 \times 2$ product via seven bilinear combinations of the inputs, thereby reducing the number of multiplications from eight to seven.[25] Because $2 \times 2$ multiplication embeds into larger instances by block partitioning, this decomposition generalizes recursively to arbitrary $n$.[21]

A group-theoretic reformulation leverages the representation theory of the general linear group $GL(n, \mathbb{C})$, under which $M_{\langle n \rangle}$ transforms as a tensor product of the standard representation and its duals.[26] Algorithms correspond to decompositions into rank-1 factors invariant under a subgroup $\Gamma \leq GL(n)^6$ acting diagonally on inputs and outputs, with the discrete symmetry group $\Gamma \cong S_3 \rtimes \mathbb{Z}_2^3$ for the $n \times n$ case facilitating equivariant analyses.[21] Such decompositions exploit the irreducible representations of $GL(n)$, labeled by Young diagrams, to bound the multiplicity of components in the tensor's expansion, thereby constraining the possible ranks.[26] These algebraic frameworks also admit tools from algebraic geometry, such as secant varieties $\sigma_r(V)$, the closures of unions of spans of $r$ points on the Segre variety $V$ of rank-1 tensors, used to study the dimension of $\sigma_r$ and whether $M_{\langle n \rangle}$ belongs to it, providing both upper and structural bounds on tensor ranks.[21] For example, representation-theoretic decompositions of ideals in coordinate rings yield lower bounds on ranks by identifying vanishing multiplicities of invariants.[26] This geometric lens has been instrumental in refining estimates for $\omega$, bridging pure algebra with computational bounds.[27]
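The rank-7 decomposition of $M_{\langle 2 \rangle}$ can be made concrete. The following minimal sketch (in Python, using NumPy only for array handling) computes a $2 \times 2$ product with Strassen's seven bilinear products; each $m_i$ is one rank-1 term of the decomposition.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications,
    realizing Strassen's rank-7 decomposition of the tensor M<2>."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)   # each m_i is one bilinear product
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # The four output entries are linear combinations of m1..m7.
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])
```

Applying this base case recursively to the blocks of a larger matrix yields the familiar $O(n^{\log_2 7})$ algorithm.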

Lower bounds

The matrix multiplication exponent $\omega$ satisfies $\omega \geq 2$, since the output matrix has $n^2$ entries, each of which must be produced by at least one arithmetic operation, so any algorithm requires $\Omega(n^2)$ operations in total.[28] In the unrestricted model of algebraic circuits, the strongest known lower bounds on the tensor rank (the number of scalar multiplications) remain quadratic with leading coefficient 3, specifically at least $3n^2 - 2\sqrt{2}\,n^{3/2} - 3n$ for multiplying $n \times n$ matrices over algebraically closed fields of characteristic zero.[29]

In restricted models, lower bounds exceeding the trivial $\Omega(n^2)$ have been established. For arithmetic circuits with bounded coefficients (constants of absolute value at most a fixed integer) over the reals or complex numbers, Ran Raz proved that computing the $n \times n$ matrix product requires $\Omega(n^2 \log n)$ operations, using a geometric variant of matrix rigidity and properties of bilinear circuits. Over finite fields, Amir Shpilka proved lower bounds of $\Omega(n^2 \log n / \log \log n)$ on the number of product gates in bilinear circuits in some cases, and Raz and Shpilka extended this to superlinear lower bounds on the number of edges in constant-depth circuits with arbitrary gates.[30][31] These results establish super-quadratic costs in depth-bounded or gate-restricted settings but do not extend to general circuits.

Communication complexity techniques provide additional insights in distributed and parallel models. For distributed-memory execution of the classical $\Theta(n^3)$ algorithm, the number of words transferred between processors is at least $\Omega(n^3 / \sqrt{M})$, where $M$ is the local memory per processor, a bound derived via the Loomis-Whitney inequality applied to the computation graph.[32][33] Such bounds constrain the practical performance of implementations, but they concern data movement rather than arithmetic and have no direct bearing on the value of $\omega$.

Geometric complexity theory (GCT) offers a representation-theoretic framework for proving lower bounds on $\omega$, comparing invariants of the matrix multiplication tensor to those of related polynomials such as the determinant. Introduced by Ketan Mulmuley and Milind Sohoni, GCT embeds algebraic complexity questions into orbit closures under group actions, enabling proofs via occurrence obstructions and symmetry. Landsberg, Michałek, and others applied such techniques to establish explicit border rank lower bounds for the matrix multiplication tensor, such as $\underline{R}(M_{\langle m \rangle}) \geq \frac{3}{2} m^2 - 2$ for $m \times m$ multiplication, exceeding the trivial bound $m^2$ by a constant factor using plethysm coefficients and Young diagram multiplicities.[34][35] However, these yield only mildly super-quadratic tensor ranks and do not produce an asymptotic bound $\omega > 2 + \epsilon$ for any $\epsilon > 0$. Over finite fields, the exponent generally behaves as in characteristic zero for large fields, though field-dependent phenomena arise in small characteristic; no algebraic algorithm over any field is known to achieve $\omega = 2$.[36]

Proving $\omega > 2$ in the general algebraic model remains elusive due to fundamental barriers linking it to broader complexity separations. Super-quadratic lower bounds for general algebraic circuits would represent a major breakthrough related to separating the classes VP (families of polynomials computable by polynomial-size arithmetic circuits) from VNP (its nondeterministic analogue in Valiant's theory, the algebraic counterpart of P versus NP); GCT targets such separations via representation theory but faces challenges in computing invariants for high-degree symmetries and in handling approximation by border rank.[34] As of 2025, no super-quadratic lower bound is known for unrestricted algebraic circuits computing matrix multiplication, with ongoing efforts in GCT and border rank yielding only incremental improvements over the quadratic threshold.[37]

Variants and extensions

Rectangular matrix multiplication

Rectangular matrix multiplication involves computing the product of an $m \times p$ matrix and a $p \times n$ matrix, resulting in an $m \times n$ output matrix. The standard schoolbook algorithm performs this operation with $O(mnp)$ arithmetic operations, summing over the inner dimension $p$ for each of the $mn$ output entries. Subcubic algorithms extend techniques from square matrix multiplication, such as Strassen's method, to rectangular cases where the dimensions differ significantly. These methods are commonly parameterized by exponents $\omega(a, b, c)$, defined so that the product of an $n^a \times n^b$ matrix and an $n^b \times n^c$ matrix can be computed in $O(n^{\omega(a,b,c) + \epsilon})$ field operations for every $\epsilon > 0$. When $a = b = c = 1$, this reduces to the square matrix multiplication exponent $\omega$,[38] and improvements in rectangular bounds can imply better upper bounds for $\omega$ through blocking strategies that divide square matrices into rectangular blocks.[39]

A seminal result is due to Coppersmith, who developed an algorithm for multiplying $n \times n^\beta$ and $n^\beta \times n$ matrices in $O(n^2 (\log n)^2)$ scalar operations for $\beta \approx 0.172$. This gives a running time of $O(n^{2 + o(1)})$, improving on the schoolbook $O(n^{2 + \beta})$ and marking the first such near-quadratic rectangular algorithm. Such algorithms find applications in data analysis, where rectangular matrices are common: in least squares problems, for instance, forming the normal equations involves multiplying a tall, thin design matrix by its transpose, a highly unbalanced case with $m \gg n$ that benefits from the reduced complexity.

Recent advancements in the 2020s leverage refined versions of the laser method to obtain faster algorithms for highly unbalanced rectangular multiplication, such as when $m \ll n$. For example, a 2024 refinement introduces greater asymmetry into the tensor constructions, yielding improved exponents for rectangular shapes and running times asymptotically better than prior methods when one dimension is much smaller than the others.[3]
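The $O(mnp)$ schoolbook baseline that the algorithms above improve on can be sketched directly; this illustrative Python version (plain lists, no libraries) performs exactly $m \cdot n \cdot p$ scalar multiplications.

```python
def matmul_rect(A, B):
    """Schoolbook product of an m x p matrix A and a p x n matrix B,
    represented as lists of rows; uses m*n*p scalar multiplications."""
    m, p = len(A), len(B)
    n = len(B[0])
    assert all(len(row) == p for row in A), "inner dimensions must match"
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for k in range(p):        # i,k,j loop order reuses A[i][k]
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]
    return C
```

For a tall, thin design matrix ($m \gg n$), forming the normal equations amounts to calling this with $A^T$ and $A$, the unbalanced shape the rectangular algorithms above target.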

Fast matrix multiplication in other contexts

Fast matrix multiplication algorithms, including Strassen's method and subsequent improvements in the Coppersmith-Winograd line, apply directly over finite fields, achieving the same asymptotic exponent $\omega \leq 2.372$ as over the real or complex numbers, since they rely on tensor decompositions and algebraic identities valid over arbitrary fields.[1] In fields supporting fast Fourier transforms (FFT), such as those containing suitable primitive roots of unity, structured matrix multiplications can leverage FFT-based convolution, but the general dense case retains the algebraic exponent and does not reach $\omega = 2$. For general finite fields without such structure, the complexity remains governed by the algebraic upper bounds, though numerical stability concerns are absent. Specific optimizations for small finite fields, like GF(2), exploit bit-level operations where addition is XOR and multiplication is AND; hierarchical Strassen-like algorithms for dense matrices over GF(2) achieve practical speedups by minimizing word operations, with implementations running in $O(n^\omega)$ time adapted to bit-packed representations.[40] For integer matrices, exact computation requires handling large intermediate values to avoid overflow, often addressed by performing the multiplication modulo several primes and reconstructing the result with the Chinese remainder theorem, which preserves the $O(n^\omega)$ operation count while bounding the bit lengths of intermediate values.[41] Strassen's algorithm and its variants adapt to exact integer arithmetic by applying the recursion in modular rings, enabling faster computation than naive methods for large matrices, though constant factors and modular overhead must be managed in practice. These approaches are particularly useful in cryptographic applications and exact linear algebra, where floating-point approximation is unsuitable.
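The Chinese-remainder approach for exact integer products can be sketched as follows; this Python illustration (the particular prime list is an arbitrary choice, valid whenever the primes' product exceeds twice the largest output magnitude) computes the product modulo each prime and reconstructs the signed result.

```python
from math import prod

def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1), x = r2 (mod m2) into x mod m1*m2."""
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return r1 + m1 * t, m1 * m2

def matmul_mod(A, B, p):
    """Schoolbook product of A and B with all arithmetic reduced mod p."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]

def crt_matmul(A, B, primes=(10007, 10009, 10037)):
    """Exact integer product reconstructed from modular products.
    Requires prod(primes) > 2 * max|entry of A*B|; the prime list is
    an illustrative assumption, not a fixed recommendation."""
    M = prod(primes)
    mods = [matmul_mod(A, B, p) for p in primes]
    n, m = len(A), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            r, mod = mods[0][i][j], primes[0]
            for p, Cp in zip(primes[1:], mods[1:]):
                r, mod = crt_pair(r, mod, Cp[i][j], p)
            C[i][j] = r if r <= M // 2 else r - M   # recover signed value
    return C
```

In a real implementation each modular product would itself use a fast $O(n^\omega)$ routine over the prime field, which is where the asymptotic saving enters.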
In the Boolean semiring, where addition is logical OR and multiplication is AND, matrix multiplication computes graph reachability (and, iterated, the transitive closure of a directed graph) and is closely related to the all-pairs shortest paths (APSP) problem in the (min, +) semiring via combinatorial reductions. Boolean matrix multiplication embeds into integer arithmetic, so fast algebraic algorithms apply and its exponent matches $\omega \leq 2.372$; purely combinatorial methods, by contrast, are only known to save polylogarithmic factors over cubic time. Improvements in the algebraic exponent therefore directly benefit applications such as graph connectivity and pattern matching. In parallel and distributed models, such as the PRAM, fast matrix multiplication achieves parallel time $O(\log n)$ using $O(n^\omega / \log n)$ processors via recursive decomposition and Brent's scheduling principle, for total work $O(n^\omega)$ matching the sequential bound. In distributed-memory settings, communication-optimal algorithms such as 2.5D matrix multiplication minimize the data moved per processor; communication lower bounds derived from geometric (Loomis-Whitney) inequalities show that such algorithms are asymptotically optimal for the classical cubic algorithm, though these bounds constrain data movement rather than the arithmetic exponent itself.[42] Quantum algorithms for matrix multiplication offer no known asymptotic speedup over the classical $O(n^\omega)$ for computing the full dense product over fields, and simply writing the $n^2$ output entries already requires $\Theta(n^2)$ time.
However, Grover's search provides quadratic speedups for unstructured subproblems like product verification or certain sparse cases, potentially accelerating hybrid routines, though dense output computation remains bottlenecked by classical limits.[43] In 2025, advancements have focused on practical integer matrix multiplication, including enhancements to the Ozaki scheme that enable high-accuracy computation using low-precision hardware units, achieving up to 1.5× speedups over prior methods in optimized libraries supporting integer BLAS-like operations. These improvements emphasize reduced rounding error and better exploitation of modern architectures for exact arithmetic in scientific computing.[44]
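The link between Boolean matrix multiplication and transitive closure can be made concrete. This minimal Python sketch packs each row of a Boolean matrix into the bits of an integer (bit $j$ of row $i$ is entry $(i, j)$), multiplies in the OR/AND semiring, and obtains reachability by repeated squaring of the adjacency matrix with self-loops added.

```python
def bool_matmul(A, B):
    """Boolean product of bit-packed matrices: entry (i, j) of the
    result is OR over k of A[i][k] AND B[k][j]."""
    C = [0] * len(A)
    for i, row in enumerate(A):
        acc, k = 0, 0
        while row:
            if row & 1:
                acc |= B[k]        # OR in all of row k of B at once
            row >>= 1
            k += 1
        C[i] = acc
    return C

def transitive_closure(A, n):
    """Reachability via repeated Boolean squaring of (A OR I);
    O(log n) squarings suffice since paths have length < n."""
    R = [A[i] | (1 << i) for i in range(n)]
    for _ in range(max(1, n.bit_length())):
        R = bool_matmul(R, R)
    return R
```

Replacing `bool_matmul` by a fast algebraic routine over the integers is exactly the embedding described above.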

Connections to linear algebra operations

The computational complexity of matrix multiplication directly influences several core linear algebra operations, as these tasks can be reformulated or accelerated using fast matrix multiplication, achieving $O(n^\omega)$ arithmetic operations, where $\omega \leq 2.371339$ is the best-known upper bound as of 2024. This equivalence arises because operations like inversion, factorization, and determinant computation admit recursive block decompositions whose updates are matrix products, so faster multiplication routines can be substituted without increasing the overall exponent.

Matrix inversion is computationally equivalent to matrix multiplication in algebraic complexity models, enabling inversion of an $n \times n$ nonsingular matrix in $O(n^\omega)$ operations. A foundational algorithm by Bunch and Hopcroft employs recursive triangular factorization, in which block updates via fast matrix multiplication compute the Schur complements and yield both the LU decomposition and the inverse. An alternative approach uses the Cayley-Hamilton theorem, which states that a matrix satisfies its own characteristic equation, so the inverse can be expressed as a linear combination of the powers $A^0, \dots, A^{n-1}$; computing the required quantities again reduces to matrix multiplications. The Bunch-Hopcroft method exemplifies these connections: its block-wise elimination relies on Schur complements, $S = A_{22} - A_{21} A_{11}^{-1} A_{12}$, which require inverting smaller blocks and performing multiplications whose cost scales with $\omega$.

Gaussian elimination, the standard method for solving linear systems and computing factorizations, traditionally requires $O(n^3)$ operations due to successive row reductions. Fast variants accelerate this by partitioning the matrix into blocks and using matrix multiplication to update the trailing submatrices, in particular the Schur complement at each elimination step, reducing the complexity to $O(n^\omega)$. This block-recursive structure mirrors that of fast multiplication itself, so improvements in $\omega$ propagate directly to elimination without additional overhead.

Determinant computation, essential for assessing invertibility and solving systems, also achieves $O(n^\omega)$ complexity through methods that reduce the problem to matrix products, such as interpolation of the characteristic polynomial, with the exponent determined by the multiplication bottleneck.[45] In practice the determinant is most simply obtained as a byproduct of a fast LU factorization, as the product of the pivots.

These reductions have significant implications for solving linear systems $Ax = b$: computing an inverse or LU factorization in $O(n^\omega)$ followed by forward and back substitution in $O(n^2)$ yields an overall cost of $O(n^\omega)$, surpassing the classical $O(n^3)$ for large $n$. Such efficiencies underpin numerical libraries and scientific computing applications that rely on dense linear algebra. The foundational links between matrix multiplication and these operations were established in the 1970s, building on Strassen's breakthrough, with pivotal contributions from researchers such as Shmuel Winograd, who analyzed the algebraic equivalences in computational models over arbitrary fields.
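The Schur-complement recursion behind these reductions can be sketched directly. The following Python/NumPy illustration inverts a matrix by recursive $2 \times 2$ block elimination; it is a sketch only, assuming every leading block encountered is itself invertible (no pivoting), and any fast $O(n^\omega)$ routine could replace the `@` products.

```python
import numpy as np

def block_inverse(A):
    """Invert a nonsingular matrix by recursive block elimination
    using the Schur complement S = A22 - A21 A11^{-1} A12."""
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0 / A[0, 0]]])
    k = n // 2
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    inv11 = block_inverse(A11)
    S = A22 - A21 @ inv11 @ A12          # Schur complement
    invS = block_inverse(S)
    B12 = -inv11 @ A12 @ invS
    B21 = -invS @ A21 @ inv11
    B11 = inv11 - inv11 @ A12 @ B21      # block inverse formula
    return np.block([[B11, B12], [B21, invS]])
```

Since each level performs a constant number of half-size inversions and multiplications, the recurrence is dominated by the multiplication cost, giving $O(n^\omega)$ overall.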

Arithmetic circuit complexity

Arithmetic circuit complexity addresses the minimal number of multiplication operations required to compute the product of two $n \times n$ matrices over a field, treating additions as cost-free. An arithmetic circuit is modeled as a directed acyclic graph whose nodes are input variables (the entries of the input matrices $A$ and $B$), constants from the field, or gates performing addition or multiplication; in this setting the relevant size measure is the number of multiplication gates. The outputs correspond to the entries of the product matrix $C = AB$, and the goal is to minimize this count while correctly computing the underlying bilinear map.[46]

The problem of finding the circuit with the fewest multiplications has been studied extensively for small $n$, with asymptotic implications for larger sizes. For $2 \times 2$ matrices, Volker Strassen showed in 1969 that 7 multiplications suffice, and it was later proved that 7 are also necessary, making this the foundational result for subsequent developments. Applying such a base case recursively, in the spirit of the Karatsuba algorithm for integer multiplication, reduces the total multiplication count for $n \times n$ matrices from $n^3$ to $O(n^{\log_2 7})$ by decomposing the computation into fewer subproblems. For general square multiplication, the exact minimal number remains unknown beyond small $n$.[14]

Lower bounds on the number of multiplications provide fundamental limits. A general lower bound of $\Omega(n^2)$ follows from the fact that the output matrix has $n^2$ independent entries. For $n = 3$, Laderman demonstrated in 1976 an upper bound of 23 multiplications, while the best known lower bound is 19, established by Markus Bläser in 2003, highlighting the gap even in this small case.[47][48] Whether the multiplication count can approach the quadratic lower bound asymptotically is precisely the question of whether $\omega = 2$: the exponent $\omega$ is the infimum over real $c$ such that the minimal circuit size satisfies $s(n) = O(n^c)$, so improvements in $s(n)$ for small fixed $n$ can lower $\omega$ through recursive application, though the exact effect depends on the recursion structure and fan-out.

In practice, such circuits find application in signal processing for multiplication by constant matrices, where the fixed structure allows precomputation of minimal-multiplication formulas; for instance, approximations of the discrete cosine transform (DCT) used in image compression can be realized with far fewer multiplications than the naive count by exploiting circuit optimizations tailored to the transform matrix. Despite progress on small cases, the exact minimal number of multiplications is known only for $n \leq 2$, and ongoing efforts focus on tensor decomposition techniques to close the gaps between upper and lower bounds.
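The effect of a small base case on the asymptotic multiplication count can be checked with a short recurrence. This Python sketch counts multiplication gates in the recursive Strassen scheme, $M(1) = 1$ and $M(n) = 7\,M(n/2)$ for $n$ a power of two, against the schoolbook $n^3$.

```python
def strassen_mults(n):
    """Multiplication gates used by recursive Strassen on an n x n
    instance, n a power of two: M(1) = 1, M(n) = 7 * M(n / 2)."""
    return 1 if n == 1 else 7 * strassen_mults(n // 2)

def schoolbook_mults(n):
    """Multiplications in the definition-based algorithm."""
    return n ** 3

# For n = 2^k the counts are 7^k versus 8^k, i.e. n^{log2 7} vs n^3.
for n in (2, 4, 64, 1024):
    print(n, strassen_mults(n), schoolbook_mults(n))
```

A base case with fewer multiplications for some fixed size would lower the exponent in exactly the same way, which is why the open $3 \times 3$ question above matters.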

References
