22.02.2014 Views

Discrete Mathematics University of Kentucky CS 275 Spring ... - MGNet


• In step 4, we have to take care with $S_1$. (a) If k is odd, then copy the first column of $A_{21}$ into $W_{mk}$. (b) Complete $S_1$.

• In step 10, we have to take care with $S_4$. (a) If k is odd, then pretend the first column of $A_{21}$ = 0 in $W_{mk}$. (b) Complete $S_4$.

• In step 11, we have to take care with $M_6$. (a) If m is odd, then save the first row of $M_5$. (b) Calculate most of $M_6$. (c) Complete $M_6$ using (a) based on whether or not m is odd.

• In step 21, we have to take care with $M_3$. (a) Calculate $M_3$ using an index shift.

This all sounds very complicated. However, the GEMMW code, which is readily available on the Web, is effectively implemented in 27 calls to subroutines that do the matrix operations, and it actually implements

$C = \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta C$,

where op(X) is either X, X transpose, X conjugate, or X conjugate transpose.
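
To make the operation concrete, here is a minimal reference sketch in plain Python of a scalar multiple of op(A)op(B) added to a scalar multiple of C. It uses the ordinary triple loop, not the Strassen-Winograd scheme, so it only illustrates what is computed, not how GEMMW computes it. The function names and the mode letters 'N', 'T', 'C', 'H' are hypothetical choices for this example and are not taken from the GEMMW interface.

```python
def op(X, mode):
    """Return op(X) for a matrix stored as a list of rows.

    mode is a hypothetical flag for this sketch:
    'N' = X, 'T' = transpose, 'C' = conjugate, 'H' = conjugate transpose.
    """
    if mode in ("C", "H"):
        # Conjugate every entry (real entries become complex with zero imaginary part).
        X = [[complex(x).conjugate() for x in row] for row in X]
    if mode in ("T", "H"):
        # Swap rows and columns.
        X = [list(col) for col in zip(*X)]
    return X


def gemm_reference(alpha, A, B, beta, C, op_a="N", op_b="N"):
    """Return alpha * op(A) * op(B) + beta * C as a new list of rows."""
    A1, B1 = op(A, op_a), op(B, op_b)
    m, k, n = len(A1), len(B1), len(B1[0])
    return [
        [alpha * sum(A1[i][p] * B1[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[1, 1], [1, 1]]
    print(gemm_reference(2, A, B, 3, C))
    print(gemm_reference(1, A, B, 0, C, op_a="T"))
```

A reference routine like this is mainly useful for checking the output of a faster implementation on small matrices.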


What is the total cost?<br />

• There are 7 submatrix-submatrix multiplies and 15 submatrix-submatrix adds or subtracts. So the cost is $f(n) = 7f(n/2) + 15n^2/4$ when m = k = n. The $n^2$ term is dominated by the recursion, so this is actually an $O(n^{\log_2 7})$ algorithm, where $\log_2 7 \approx 2.807$.

• The work area $W_{mk}$ needs $\lceil ((m+1)\max(k,n) + m + 4)/4 \rceil$ space.

• The work area $W_{kn}$ needs $\lceil ((k+1)n + n + 4)/4 \rceil$ space.

• If C overlaps A or B in memory, an additional mn space is needed to save C before calculating $\beta C$ when $\beta \neq 0$.

• The maximum amount of extra memory is bounded by $(m\max(k,n) + kn)/3 + (m + \max(k,n) + k + 3n)/2 + 32 + mn$. Hence, the overall extra storage is $cN^2/3$, where $c \in \{2, 5\}$.

• Typical memory usage when m = k = n = N is

o $\beta \neq 0$ or A or B overlap with C: $1.67N^2$.

o $\beta = 0$ and A and B do not overlap with C: $0.67N^2$.
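
The cost recurrence and the memory bound above can be checked numerically with a short Python sketch. Two things here are assumptions that do not come from the notes: the base case f(1) = 1 (a single scalar multiply) and the flag `save_c`, which simply switches the extra mn term on or off. With that base case the recurrence has the closed form $6n^{\log_2 7} - 5n^2$ for n a power of two, which the code can be compared against.

```python
import math


def cost(n):
    """Evaluate f(n) = 7 f(n/2) + 15 n^2 / 4 for n a power of two.

    Assumption (not from the notes): f(1) = 1, one scalar multiply.
    """
    if n == 1:
        return 1
    # For even n, 15 * n * n is divisible by 4, so integer division is exact.
    return 7 * cost(n // 2) + 15 * n * n // 4


def extra_memory_bound(m, k, n, save_c):
    """Upper bound on extra storage (in matrix elements) from the formula above.

    save_c is a hypothetical flag: True when the additional m*n copy of C is needed.
    """
    bound = (m * max(k, n) + k * n) / 3 + (m + max(k, n) + k + 3 * n) / 2 + 32
    return bound + (m * n if save_c else 0)


if __name__ == "__main__":
    for n in (64, 1024):
        # The ratio to n^(log2 7) should level off at a constant as n grows.
        print(n, cost(n), cost(n) / n ** math.log2(7))
    N = 1000
    print(extra_memory_bound(N, N, N, False) / N**2)  # close to 2/3
    print(extra_memory_bound(N, N, N, True) / N**2)   # close to 5/3
```

For large N the lower-order terms in the bound become negligible, which is where the 0.67 and 1.67 multiples of $N^2$ come from.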
