Dense Matrix Algorithms -- Chapter 8 Introduction
Matrix-Matrix Multiplication: C = AB
• Assume the best serial algorithm takes O(n³) operations
• This is not strictly true, however
– Strassen's algorithm needs asymptotically fewer operations, O(n^2.81), but the saving is modest
– Other, asymptotically even faster algorithms also exist
• Three parallel algorithms are discussed:
• A simple block algorithm
– Suffers from communication contention and needs a lot of memory; its parallel runtime is Ω(n), since it can use at most n² processors
• Cannon's block algorithm -- reduces the memory requirement
– Allows computation/communication overlap
» Unfortunately, the parallel runtime changes only slightly
• The DNS (Dekel, Nassimi, Sahni) algorithm
– Partitions the intermediate data (using up to n³ processors) so that the parallel runtime drops to Θ(log n), an upper bound below the Ω(n) lower bound of the two algorithms above
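For reference, here is a minimal Python sketch of the standard serial algorithm the slide takes as its baseline. The function name `matmul_serial` is illustrative, not from the slides; it counts multiply-add operations to make the n³ cost visible and does not implement Strassen's method.

```python
def matmul_serial(A, B):
    """Naive serial C = AB for n x n matrices given as lists of lists.

    Returns (C, ops), where ops counts the multiply-add operations
    performed: exactly n**3 of them.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            s = 0
            for k in range(n):
                s += A[i][k] * B[k][j]  # one multiply-add
                ops += 1
            C[i][j] = s
    return C, ops


C, ops = matmul_serial([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, ops)  # [[19, 22], [43, 50]] 8
```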
5/6/2003 densematrix 15
The Simple Algorithm
• Assume matrices A and B of size n×n
• Assume p processors arranged in a √p×√p grid
• Assume both A and B are distributed in blocks of size (n/√p)×(n/√p), one block of each matrix per processor
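A minimal Python sketch of this data distribution, under the additional assumption that √p divides n. The helper `split_into_blocks` is hypothetical; `blocks[i][j]` is the block that would reside on processor (i, j) of the grid.

```python
import math


def split_into_blocks(M, p):
    """Split the n x n matrix M into a sqrt(p) x sqrt(p) grid of blocks.

    Block (i, j) has size (n/sqrt(p)) x (n/sqrt(p)) and would be stored on
    processor (i, j). Assumes p is a perfect square and sqrt(p) divides n.
    """
    n = len(M)
    q = math.isqrt(p)  # the processor grid is q x q
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square and sqrt(p) must divide n")
    b = n // q  # block side length n / sqrt(p)
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)]
            for i in range(q)]


M = [[4 * r + c for c in range(4)] for r in range(4)]  # 4 x 4 example matrix
blocks = split_into_blocks(M, 4)                        # 2 x 2 grid of 2 x 2 blocks
print(blocks[0][1])  # [[2, 3], [6, 7]]
```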
– Algorithm:
• Perform an all-to-all broadcast of the blocks of A within each processor row
– For row i, this ensures that every block of the i-th block row of A is on every processor in the i-th row of the grid
• Perform an all-to-all broadcast of the blocks of B within each processor column
– For column j, this ensures that every block of the j-th block column of B is on every processor in the j-th column of the grid
• On each processor (i, j), multiply the gathered block row of A by the gathered block column of B, i.e. C(i,j) = Σ_k A(i,k) B(k,j); this computes the corresponding block of C
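The three steps of the simple algorithm can be simulated serially in Python to check that they produce C = AB. This is a sketch under stated assumptions (p a perfect square, √p divides n, matrices as lists of lists); the names `simple_parallel_matmul`, `to_block_grid` and `block_mul_add` are hypothetical, and the broadcasts are modeled by copying block references rather than by real message passing such as MPI.

```python
import math


def to_block_grid(M, q):
    """Return a q x q grid of square blocks of the n x n matrix M (q divides n)."""
    b = len(M) // q
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)]
            for i in range(q)]


def block_mul_add(C, X, Y):
    """Accumulate C += X * Y for b x b blocks stored as lists of lists."""
    b = len(X)
    for r in range(b):
        for c in range(b):
            C[r][c] += sum(X[r][k] * Y[k][c] for k in range(b))


def simple_parallel_matmul(A, B, p):
    """Serially simulate the simple block algorithm on a sqrt(p) x sqrt(p) grid."""
    n = len(A)
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square and sqrt(p) must divide n")
    b = n // q
    A_blk = to_block_grid(A, q)  # A_blk[i][j] initially lives on processor (i, j)
    B_blk = to_block_grid(B, q)

    # Step 1: all-to-all broadcast within each processor row.
    # Afterwards processor (i, j) holds all of block row i of A.
    row_of_A = {(i, j): [A_blk[i][k] for k in range(q)]
                for i in range(q) for j in range(q)}

    # Step 2: all-to-all broadcast within each processor column.
    # Afterwards processor (i, j) holds all of block column j of B.
    col_of_B = {(i, j): [B_blk[k][j] for k in range(q)]
                for i in range(q) for j in range(q)}

    # Step 3: purely local work, C(i,j) = sum over k of A(i,k) * B(k,j).
    C = [[0] * n for _ in range(n)]
    for i in range(q):
        for j in range(q):
            C_ij = [[0] * b for _ in range(b)]
            for A_ik, B_kj in zip(row_of_A[(i, j)], col_of_B[(i, j)]):
                block_mul_add(C_ij, A_ik, B_kj)
            for r in range(b):  # copy block (i, j) into the global result
                C[i * b + r][j * b:(j + 1) * b] = C_ij[r]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(simple_parallel_matmul(A, B, 4))  # [[19, 22], [43, 50]]
```

After the two broadcasts every processor holds √p blocks of A and √p blocks of B, each with n²/p elements, i.e. 2n²/√p elements per processor and Θ(n²√p) in total. This is the memory cost that Cannon's algorithm is designed to reduce.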