03.02.2015 Views

Dense Matrix Algorithms -- Chapter 8 Introduction



Matrix-Matrix Multiplications C = AB

• Assume the best serial algorithm is O(n^3)

• This is not strictly true, however

– Strassen's algorithm uses fewer operations, about O(n^2.81), but not substantially fewer

– There are, however, other asymptotically faster algorithms

• Three parallel algorithms are discussed:

• A simple block algorithm

– Communication contention and lots of memory -- parallel runtime Ω(n)

• Cannon's block algorithm -- reduces the memory requirement

– Allows computation/communication overlap

» Unfortunately, this changes the parallel runtime only a little

• The DNS algorithm (Dekel, Nassimi, Sahni algorithm)

– Partitions intermediate data so that the parallel runtime is reduced to Θ(log n) -- an upper bound lower than the lower bound for the above two algorithms
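As a baseline for the O(n^3) assumption above, the classic triple-loop serial algorithm can be sketched as follows. This is a minimal illustration in Python. The function name `matmul_serial` and the multiplication counter are hypothetical additions for this sketch, not part of the slides.

```python
def matmul_serial(A, B):
    """Multiply two n x n matrices (lists of lists) with the classic triple loop.

    Returns the product C and the number of scalar multiplications performed,
    which is exactly n**3 for this algorithm.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0  # illustrative counter, assumed here only to show the n^3 growth
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults
```

Doubling n multiplies the count by eight, which is the cost that the parallel algorithms listed above try to spread across processors.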

5/6/2003 densematrix 15

The Simple Algorithm

• Assume matrices A and B of size n×n

• Assume p processors in a grid of size √p×√p

• Assume the matrices are distributed by blocks of size n/√p×n/√p on each processor for both A and B

– Algorithm:

• Perform an all-to-all broadcast of the blocks of A within each row of processors

– For row i, this ensures that every block of the i-th block row of A is on every processor in the i-th row of the grid

• Perform an all-to-all broadcast of the blocks of B within each column of processors

– For column j, this ensures that every block of the j-th block column of B is on every processor in the j-th column of the grid

• On each processor, multiply the block row of A by the block column of B -- this computes the appropriate block of C
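The three steps above can be simulated sequentially to check the data movement. The sketch below is not a real parallel implementation: the processor grid is modelled with dictionaries keyed by (i, j), and the two all-to-all broadcasts are modelled as plain copies. All function names (`split_blocks`, `block_multiply_add`, `simple_parallel_matmul`) are hypothetical, and the sketch assumes p is a perfect square and that √p divides n.

```python
import math

def split_blocks(M, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    b = len(M) // q
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]

def block_multiply_add(C, X, Y):
    """Accumulate C += X * Y for square blocks stored as lists of lists."""
    b = len(X)
    for r in range(b):
        for c in range(b):
            for k in range(b):
                C[r][c] += X[r][k] * Y[k][c]

def simple_parallel_matmul(A, B, p):
    """Simulate the simple block algorithm on a sqrt(p) x sqrt(p) grid."""
    n, q = len(A), math.isqrt(p)
    assert q * q == p and n % q == 0, "assumes square p and sqrt(p) dividing n"
    b = n // q
    Ab, Bb = split_blocks(A, q), split_blocks(B, q)

    # Step 1: all-to-all broadcast within each processor row.
    # Afterwards processor (i, j) holds the whole i-th block row of A.
    a_row = {(i, j): [Ab[i][k] for k in range(q)]
             for i in range(q) for j in range(q)}

    # Step 2: all-to-all broadcast within each processor column.
    # Afterwards processor (i, j) holds the whole j-th block column of B.
    b_col = {(i, j): [Bb[k][j] for k in range(q)]
             for i in range(q) for j in range(q)}

    # Step 3: each processor multiplies its block row by its block column.
    C = [[0] * n for _ in range(n)]
    for i in range(q):
        for j in range(q):
            Cij = [[0] * b for _ in range(b)]
            for k in range(q):
                block_multiply_add(Cij, a_row[(i, j)][k], b_col[(i, j)][k])
            for r in range(b):  # copy the local block of C into the result
                C[i * b + r][j * b:(j + 1) * b] = Cij[r]
    return C
```

After the two broadcasts, each simulated processor stores √p blocks of A and √p blocks of B, that is 2n²/√p elements instead of the 2n²/p it started with. This is the memory overhead that the overview attributes to the simple algorithm.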
