STRASSEN’S MATRIX MULTIPLICATION

Excerpt from von zur Gathen & Gerhard (2003)

JOACHIM VON ZUR GATHEN AND JÜRGEN GERHARD

28th January 2004

Let R be a ring and A, B ∈ R^{n×n} two square matrices. The classical algorithm for computing the product matrix AB by computing n² row-by-column products uses n³ multiplications and n³ − n² additions in R. After having seen in Chapter 8 of von zur Gathen & Gerhard (2003) that we can multiply polynomials faster than by the obvious method, a natural question is whether something similar applies to matrix multiplication.

A few years after Karatsuba’s polynomial and integer multiplication algorithm, Strassen (1969) found such an algorithm. The input matrices are divided into four n/2 × n/2 blocks, and the computation of AB is reduced to seven multiplications and 18 additions of n/2 × n/2 matrices, in comparison to eight multiplications and four additions for the classical algorithm. As in the polynomial case, the multiplications are handled recursively, and the saving of one (costly) multiplication at the expense of 14 (cheap) additions leads to an asymptotically smaller running time of O(n^{log₂ 7}) operations in R, where the exponent is log₂ 7 = 2.807354922…

We present a slightly different version, also using seven multiplications of n/2 × n/2 matrices, but only 15 additions.

ALGORITHM 1. Matrix multiplication.
Input: A, B ∈ R^{n×n}, where R is a ring and n = 2^k for some k ∈ N.
Output: The product matrix AB ∈ R^{n×n}.

1. If n = 1, then let A = (a), B = (b) for some a, b ∈ R, and return (ab).

2. Write

       A = ( A₁₁  A₁₂ )      B = ( B₁₁  B₁₂ )
           ( A₂₁  A₂₂ ),         ( B₂₁  B₂₂ ),

   with all Aᵢⱼ, Bᵢⱼ ∈ R^{(n/2)×(n/2)}.

3. S₁ ← A₂₁ + A₂₂,   T₁ ← B₁₂ − B₁₁,
   S₂ ← S₁ − A₁₁,    T₂ ← B₂₂ − T₁,
   S₃ ← A₁₁ − A₂₁,   T₃ ← B₂₂ − B₁₂,
   S₄ ← A₁₂ − S₂,    T₄ ← T₂ − B₂₁.

4. Call the algorithm recursively to compute

   P₁ = A₁₁B₁₁,   P₅ = S₁T₁,
   P₂ = A₁₂B₂₁,   P₆ = S₂T₂,
   P₃ = S₄B₂₂,    P₇ = S₃T₃,
   P₄ = A₂₂T₄.

5. U₁ ← P₁ + P₂,   U₅ ← U₄ + P₃,
   U₂ ← P₁ + P₆,   U₆ ← U₃ − P₄,
   U₃ ← U₂ + P₇,   U₇ ← U₃ + P₅,
   U₄ ← U₂ + P₅.

6. Return

       ( U₁  U₅ )
       ( U₆  U₇ ).
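The following is a direct transcription of Algorithm 1 into Python, as an illustration only; the names strassen, _add, _sub and _split and the list-of-rows matrix representation are our own choices, not part of the original text. The entries may come from any ring whose elements support +, - and *.

    # Sketch of Algorithm 1 for n x n matrices, n a power of two, given
    # as lists of rows over any ring with +, - and *.

    def _add(A, B):
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

    def _sub(A, B):
        return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

    def _split(M):
        # Cut M into the four (n/2) x (n/2) blocks M11, M12, M21, M22.
        h = len(M) // 2
        return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
                [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

    def strassen(A, B):
        n = len(A)
        if n == 1:                                 # step 1: base case
            return [[A[0][0] * B[0][0]]]
        A11, A12, A21, A22 = _split(A)             # step 2
        B11, B12, B21, B22 = _split(B)
        S1 = _add(A21, A22); T1 = _sub(B12, B11)   # step 3: 8 additions
        S2 = _sub(S1, A11);  T2 = _sub(B22, T1)
        S3 = _sub(A11, A21); T3 = _sub(B22, B12)
        S4 = _sub(A12, S2);  T4 = _sub(T2, B21)
        P1 = strassen(A11, B11)                    # step 4: 7 recursive
        P2 = strassen(A12, B21)                    # products
        P3 = strassen(S4, B22)
        P4 = strassen(A22, T4)
        P5 = strassen(S1, T1)
        P6 = strassen(S2, T2)
        P7 = strassen(S3, T3)
        U1 = _add(P1, P2); U2 = _add(P1, P6)       # step 5: 7 additions
        U3 = _add(U2, P7); U4 = _add(U2, P5)
        U5 = _add(U4, P3); U6 = _sub(U3, P4)
        U7 = _add(U3, P5)
        return ([r1 + r5 for r1, r5 in zip(U1, U5)] +   # step 6:
                [r6 + r7 for r6, r7 in zip(U6, U7)])    # (U1 U5; U6 U7)

For example, strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]) returns [[19, 22], [43, 50]], in agreement with the classical product; on 2 × 2 inputs the computation performs exactly the seven multiplications and 15 additions promised above.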


THEOREM 2. Algorithm 1 correctly computes the product matrix and uses at most 6n^{log₂ 7} additions and multiplications in R. For arbitrary n ∈ N, an n × n matrix product can be computed with 42n^{log₂ 7} ∈ O(n^{log₂ 7}) ring operations.

PROOF. The correctness is left as an exercise. For n = 2^k ∈ N, let T(n) denote the number of arithmetic operations in R that the algorithm performs on inputs of size n × n. Then T(1) = 1 and T(2^k) = 15 · 2^{2k−2} + 7T(2^{k−1}) for k ≥ 1, since one call performs 15 additions of 2^{k−1} × 2^{k−1} matrices plus seven recursive calls. Induction on k yields T(2^k) = 6 · 7^k − 5 · 4^k ≤ 6 · 7^k = 6(2^k)^{log₂ 7}, which is the first claim. For arbitrary n, we pad the matrices with zeroes, thereby at most doubling the dimension, and 6(2n)^{log₂ 7} = 42n^{log₂ 7}; a concrete sketch of this padding follows below. □

Strassen’s (1969) discovery was the starting signal for the development of fast algorithms. Although subquadratic integer multiplication algorithms had been around for a while, it was the surprise of realizing that the “obvious” cubic algorithm for matrix multiplication could be improved that kicked this development into high gear and inspired, within the following five years, many new ideas for almost all the fast algorithms discussed in Part II of von zur Gathen & Gerhard (2003).

On a more technical level, Strassen’s result spawned three lines of research:

◦ faster matrix multiplication,
◦ other problems from linear algebra,
◦ bilinear complexity.

For a field F, a number ω ∈ ℝ is a feasible matrix multiplication exponent if two n × n matrices over F can be multiplied with O(n^ω) operations in F. The classical algorithm shows that ω = 3 is feasible, and Strassen’s that ω = log₂ 7 is. The matrix multiplication exponent µ (for F) is the infimum of all feasible exponents. Thus

    2 ≤ µ ≤ ω

for all feasible ω’s. This µ is the same for all fields of a fixed characteristic, and all feasible exponents discovered so far work for all fields.

The fascinating history of the smallest known exponents is in the Notes; the current world record is ω < 2.376. It seems natural to conjecture that µ = 2; there is currently no method in sight that might prove or disprove this.
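The padding step in the proof of Theorem 2 is easy to make concrete. The sketch below assumes the strassen function given earlier; the wrapper name strassen_any and the explicit zero argument are our own illustrative choices, since a general ring need not represent its zero element by the Python integer 0.

    # Multiply arbitrary n x n matrices by embedding them into m x m
    # ones, where m is the smallest power of two with m >= n. Then
    # m < 2n, which is where the constant 42 = 6 * 7 in Theorem 2
    # comes from.

    def strassen_any(A, B, zero=0):
        n = len(A)
        m = 1
        while m < n:
            m *= 2
        def pad(M):
            return ([row + [zero] * (m - n) for row in M] +
                    [[zero] * m for _ in range(m - n)])
        C = strassen(pad(A), pad(B))
        return [row[:n] for row in C[:n]]  # discard the padded border

A real implementation would pad more frugally, for instance by peeling off an odd row and column at each recursion level, but the crude embedding is all the asymptotic claim needs.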


How practical are these algorithms? Bailey et al. (1990) deplore “an unfortunate myth [. . . ] regarding the crossover point for Strassen’s algorithm”, and show that for the Sun-4 “Strassen is faster for matrices as small as 16 × 16. For Cray systems the crossover point is roughly 128”. They conclude that “it appears that Strassen’s algorithm can indeed be used to accelerate practical-sized linear algebra calculations.” Besides a Cray library implementation (SGEMMS) of fast matrix multiplication, there is also one, the ESSL library, for IBM 3090 machines. Higham (1990) reports on a set of FORTRAN 77 routines (level 3 BLAS) using “Strassen’s method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100”. In all these experiments, the coefficients are floating point numbers of a fixed precision. (A sketch of such a crossover rule follows the Notes below.)

A further avenue to explore with Strassen’s algorithm is that its recursive partition employs data access patterns essentially different from those of classical multiplication. This may make it attractive for machines with a hierarchical memory structure and for large matrices stored in secondary memory, possibly reducing data transfer (paging) time.

Further computational problems in linear algebra include matrix inversion, computing the determinant, the characteristic polynomial, or the LR-decomposition of a matrix, and, for F = ℂ, the QR-decomposition and unitary transformation to upper Hessenberg form. It turns out that all these problems have the same asymptotic complexity as matrix multiplication (up to constant factors), so that a fast algorithm for one of them immediately gives fast algorithms for all of them.

The exponent η for solving systems of linear equations satisfies η ≤ ω for all feasible ω. It is not known whether η = µ.

The most fundamental consequence of Strassen’s breakthrough was the development of bilinear complexity theory, a deep and rich area that is concerned with good and optimal algorithms for functions that depend linearly on each of two sets of variables, just as the entries of the product of two matrices (or polynomials) do. Bürgisser, Clausen & Shokrollahi (1997) give a detailed account of the achievements of this theory, which is part of algebraic complexity theory.

Notes. Algorithm 1 is due to Winograd (1971), and the current world record ω < 2.376 is from Coppersmith & Winograd (1990). The entries of the following table indicate the approximate date of discovery of new feasible matrix multiplication exponents; publication often came years later.

    Strassen                  1968    2.808
    Pan                       1978    2.781
    Bini et al.               1979    2.780
    Schönhage                 1979    2.548
    Pan                       1979    2.522
    Coppersmith & Winograd    1980    2.498
    Strassen                  1986    2.479
    Coppersmith & Winograd    1986    2.376

The details of these algorithms are beyond the scope of this text. The most comprehensive treatment is in Bürgisser et al. (1997); we also refer to the books by Pan (1984) and de Groote (1987), and the survey articles of Strassen (1984, 1990) and von zur Gathen (1988), for details and references.
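As promised above, here is a sketch of such a crossover rule: recurse with the seven-product scheme only while the blocks are large, and switch to the classical cubic method below a threshold. It reuses the helpers _add, _sub and _split from the sketch of Algorithm 1; the value CROSSOVER = 128 merely echoes the Cray figure quoted earlier and is not a recommendation, since the right threshold is machine-dependent and has to be measured.

    CROSSOVER = 128  # illustrative only; measure on the target machine

    def classical(A, B):
        # Plain cubic row-by-column multiplication.
        n = len(A)
        return [[sum(A[i][k] * B[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    def hybrid(A, B):
        # One Winograd step per level while the blocks are large; the
        # bulk of the work then happens in the classical kernel.
        if len(A) <= CROSSOVER:
            return classical(A, B)
        A11, A12, A21, A22 = _split(A)
        B11, B12, B21, B22 = _split(B)
        S1 = _add(A21, A22); T1 = _sub(B12, B11)
        S2 = _sub(S1, A11);  T2 = _sub(B22, T1)
        S3 = _sub(A11, A21); T3 = _sub(B22, B12)
        S4 = _sub(A12, S2);  T4 = _sub(T2, B21)
        P1, P2 = hybrid(A11, B11), hybrid(A12, B21)
        P3, P4 = hybrid(S4, B22), hybrid(A22, T4)
        P5, P6 = hybrid(S1, T1), hybrid(S2, T2)
        P7 = hybrid(S3, T3)
        U1 = _add(P1, P2); U2 = _add(P1, P6)
        U3 = _add(U2, P7); U4 = _add(U2, P5)
        U5 = _add(U4, P3); U6 = _sub(U3, P4)
        U7 = _add(U3, P5)
        return ([r1 + r5 for r1, r5 in zip(U1, U5)] +
                [r6 + r7 for r6, r7 in zip(U6, U7)])

For a 1024 × 1024 product this uses only three levels of the fast scheme; everything of dimension 128 and below runs through the classical kernel.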


References

DAVID H. BAILEY, KING LEE & HORST D. SIMON (1990). Using Strassen’s Algorithm to Accelerate the Solution of Linear Systems. The Journal of Supercomputing 4(4), 357–371.

P. BÜRGISSER, M. CLAUSEN & M. A. SHOKROLLAHI (1997). Algebraic Complexity Theory. Number 315 in Grundlehren der mathematischen Wissenschaften. Springer-Verlag.

DON COPPERSMITH & SHMUEL WINOGRAD (1990). Matrix Multiplication via Arithmetic Progressions. Journal of Symbolic Computation 9, 251–280.

JOACHIM VON ZUR GATHEN (1988). Algebraic complexity theory. Annual Review of Computer Science 3, 317–347.

JOACHIM VON ZUR GATHEN & JÜRGEN GERHARD (2003). Modern Computer Algebra. Cambridge University Press, Cambridge, UK, 2nd edition. ISBN 0-521-82646-2. URL http://www-math.upb.de/~aggathen/mca/. First edition 1999.

H. F. DE GROOTE (1987). Lectures on the Complexity of Bilinear Problems. Number 245 in Lecture Notes in Computer Science. Springer-Verlag.

NICHOLAS J. HIGHAM (1990). Exploiting Fast Matrix Multiplication Within the Level 3 BLAS. ACM Transactions on Mathematical Software 16(4), 352–368.

V. YA. PAN (1984). How to multiply matrices faster. Number 179 in Lecture Notes in Computer Science. Springer-Verlag, New York.

VOLKER STRASSEN (1969). Gaussian Elimination is not Optimal. Numerische Mathematik 13, 354–356.

VOLKER STRASSEN (1984). Algebraische Berechnungskomplexität. In Perspectives in Mathematics, Anniversary of Oberwolfach 1984, 509–550. Birkhäuser Verlag, Basel.

VOLKER STRASSEN (1990). Algebraic Complexity Theory. In Handbook of Theoretical Computer Science, J. VAN LEEUWEN, editor, volume A, 633–672. Elsevier Science Publishers B.V., Amsterdam, and The MIT Press, Cambridge MA.

S. WINOGRAD (1971). On Multiplication of 2 × 2 Matrices. Linear Algebra and its Applications 4, 381–388.
