Dense Matrix Algorithms -- Chapter 8 Introduction
The DNS Algorithm
• Partitions intermediate data as well as the input data
• Results in a parallel time of $\Theta(\log n)$ using $\Omega(n^3/\log n)$ processors
• The essence of the algorithm:<br />
• Use a processor, labeled $P_{ijk}$, to compute each scalar product $A_{ik}B_{kj}$, for a total of $n^3$ processors, then add the products to obtain every $C_{ij} = \sum_{k} A_{ik}B_{kj}$ simultaneously in $\log n$ steps
• The parallel algorithm:<br />
• Arrange the processors in $n$ planes of $n \times n$ processors each
– Arrange the matrices A and B so that the elements $A_{ik}$ and $B_{kj}$ reside on processor $(i, j)$ of the $k$-th plane, i.e., on $P_{ijk}$
– Each processor multiplies its two elements
– Then $n^2$ sum-reductions are performed simultaneously, say along dimension $k$ down to the bottom plane of processors, leaving the product C on the bottom plane
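The computation phase above can be checked with a small sequential simulation. This is a minimal sketch under stated assumptions: the function name `dns_multiply` and the modeling of processor $P_{ijk}$ as the dict key `(i, j, k)` are hypothetical illustrations, not code from the slides, and the loops run serially rather than in parallel. It forms all $n^3$ products in one step, then performs the $n^2$ reductions along $k$ as a binary tree, counting one parallel addition round per tree level.

```python
def dns_multiply(A, B):
    """Sequentially simulate the DNS computation phase on an n x n x n grid.

    The virtual processor P_ijk is modeled as the dict key (i, j, k); this is a
    hypothetical illustration, not a parallel implementation.
    Returns (C, reduction_rounds).
    """
    n = len(A)
    # Step 1: P_ijk already holds A[i][k] and B[k][j]; all n^3 products
    # are formed in one parallel step.
    val = {(i, j, k): A[i][k] * B[k][j]
           for i in range(n) for j in range(n) for k in range(n)}

    # Step 2: n^2 independent sum-reductions along k as a binary tree.
    # In each round, plane k adds in the partial sum held by plane k + stride.
    rounds = 0
    stride = 1
    while stride < n:
        for i in range(n):
            for j in range(n):
                for k in range(0, n, 2 * stride):
                    if k + stride < n:
                        val[(i, j, k)] += val[(i, j, k + stride)]
        stride *= 2
        rounds += 1  # one parallel addition step for all (i, j) at once

    # The bottom plane (k = 0) now holds C.
    C = [[val[(i, j, 0)] for j in range(n)] for i in range(n)]
    return C, rounds


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(dns_multiply(A, B))  # ([[19, 22], [43, 50]], 1)
```

The returned round count is $\lceil \log_2 n \rceil$ (for example, 3 rounds when $n = 5$), which is where the $\log n$ term in the parallel time comes from.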
5/6/2003 densematrix 21<br />
Communication for the DNS Algorithm
• Initially, A and B are assumed to be distributed element by element on the
bottom plane of processors (k = 0): $A_{ij}$ and $B_{ij}$ reside on $P_{ij0}$
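• For A, move column k from the bottom plane to column k of plane k: $A_{ik}$ travels from $P_{ik0}$ to $P_{ikk}$ (the diagonal column, where j = k)
– Then, in every plane at once, broadcast each of those elements along its row, so that every $P_{ijk}$ receives $A_{ik}$
• For B, move row k from the bottom plane to row k of plane k: $B_{kj}$ travels from $P_{kj0}$ to $P_{kjk}$ (the diagonal row, where i = k)
– Then, in every plane at once, broadcast each of those elements along its column, so that every $P_{ijk}$ receives $B_{kj}$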
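The data movement described above can be traced with a small model. This is a hypothetical sketch, not code from the slides: `dns_distribute` is an illustrative name, processor $P_{ijk}$ is modeled as the dict key `(i, j, k)`, and each step is represented by the state it produces rather than by individual messages or their timing.

```python
def dns_distribute(A, B):
    """Model the DNS data movement on an n x n x n grid of virtual processors.

    Keys (i, j, k) stand for processor P_ijk. Each step is modeled by the state
    it produces, not by individual messages (a hypothetical sketch).
    Returns (a_reg, b_reg): the A and B elements held by every P_ijk.
    """
    n = len(A)
    # Initial state: A[i][j] and B[i][j] reside on P_ij0 (bottom plane, k = 0).
    a_bottom = {(i, j, 0): A[i][j] for i in range(n) for j in range(n)}
    b_bottom = {(i, j, 0): B[i][j] for i in range(n) for j in range(n)}

    # Move column k of A from P_ik0 to P_ikk (diagonal column of plane k).
    a_diag = {(i, k, k): a_bottom[(i, k, 0)] for i in range(n) for k in range(n)}
    # Move row k of B from P_kj0 to P_kjk (diagonal row of plane k).
    b_diag = {(k, j, k): b_bottom[(k, j, 0)] for k in range(n) for j in range(n)}

    # Broadcast in every plane at once: P_ikk sends A[i][k] along row i,
    # and P_kjk sends B[k][j] along column j.
    a_reg = {(i, j, k): a_diag[(i, k, k)]
             for i in range(n) for j in range(n) for k in range(n)}
    b_reg = {(i, j, k): b_diag[(k, j, k)]
             for i in range(n) for j in range(n) for k in range(n)}
    return a_reg, b_reg


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
a_reg, b_reg = dns_distribute(A, B)
print(a_reg[(1, 0, 1)], b_reg[(1, 0, 1)])  # 4 7
```

After these two steps, each $P_{ijk}$ holds exactly the pair $A_{ik}$, $B_{kj}$ that the computation phase needs, so multiplying the pairs and summing over $k$ reproduces C.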