03.02.2015 Views

Dense Matrix Algorithms -- Chapter 8 Introduction



The DNS Algorithm

• Partitions the intermediate data as well as the input data
• Achieves a parallel time of Θ(log n) using Ω(n^3 / log n) processors

• The essence of the algorithm:
  • Use one processor, labeled P_ijk, for each scalar product A_ik B_kj, for a total of n^3 products, then add the products over k to obtain every C_ij simultaneously in log n steps

• The parallel algorithm:
  • Consider the processors arranged in n planes of n×n processors each
    – Arrange the matrices A and B so that elements A_ik and B_kj reside on processor (i, j) of the k-th plane
    – Each processor multiplies its two elements
    – Then n^2 simultaneous sum-reductions are performed, say down dimension k to the bottom plane of processors, to produce the product C on the bottom plane
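The multiply-then-reduce structure above can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the function name `dns_multiply` and the list-of-lists layout are hypothetical, the n^3 processors are modeled as entries of a 3-D list, and the reduction is written as a pairwise tree so that the number of rounds matches the log n steps quoted on the slide.

```python
def dns_multiply(A, B):
    """Simulate the DNS compute phase for n x n matrices (lists of lists).

    Returns the product C and the number of reduction rounds used.
    Hypothetical helper for illustration; not taken from the slides.
    """
    n = len(A)

    # One virtual processor P[i][j][k] per scalar product A[i][k] * B[k][j].
    P = [[[A[i][k] * B[k][j] for k in range(n)]
          for j in range(n)]
         for i in range(n)]

    # Pairwise tree reduction along k. In each round, plane k adds in the
    # value held by plane k + stride, so ceil(log2 n) rounds leave the
    # full sum on the bottom plane (k = 0).
    rounds = 0
    stride = 1
    while stride < n:
        for i in range(n):
            for j in range(n):
                for k in range(0, n - stride, 2 * stride):
                    P[i][j][k] += P[i][j][k + stride]
        stride *= 2
        rounds += 1

    # The product C is read off the bottom plane.
    C = [[P[i][j][0] for j in range(n)] for i in range(n)]
    return C, rounds
```

On a real machine the n^2 reductions and the additions within each round would run concurrently; here they are plain loops, so only the round count reflects the parallel time.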

5/6/2003 densematrix 21

Communication For The DNS Algorithm

• Initially A and B are assumed to be distributed element by element on the bottom plane of processors (k = 0)

• For A, move column k to plane k, keeping it in the same column position (the "diagonal" column k of that plane)
  – Then broadcast those elements simultaneously along the rows of each plane, so that every processor in row i of plane k holds A_ik

• For B, move row k to plane k, keeping it in the same row position (the "diagonal" row k of that plane)
  – Then broadcast those elements simultaneously along the columns of each plane, so that every processor in column j of plane k holds B_kj
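The two communication steps can be checked with a small simulation. This is a sketch under assumptions: the name `dns_distribute` is hypothetical, the arrays `a[i][j][k]` and `b[i][j][k]` stand for the local A and B slots of processor P_ijk, and moves and broadcasts are modeled as plain assignments rather than messages.

```python
def dns_distribute(A, B):
    """Simulate the DNS data movement for n x n matrices (lists of lists).

    Returns (a, b), where a[i][j][k] and b[i][j][k] are the values held by
    virtual processor P_ijk afterwards. Hypothetical helper for illustration.
    """
    n = len(A)
    a = [[[None] * n for _ in range(n)] for _ in range(n)]
    b = [[[None] * n for _ in range(n)] for _ in range(n)]

    # Initial layout: element by element on the bottom plane (k = 0).
    for i in range(n):
        for j in range(n):
            a[i][j][0] = A[i][j]
            b[i][j][0] = B[i][j]

    # Step 1: lift column k of A and row k of B from plane 0 to plane k,
    # keeping the same (i, j) position. Column 0 and row 0 stay in place.
    for k in range(1, n):
        for i in range(n):
            a[i][k][k], a[i][k][0] = a[i][k][0], None
        for j in range(n):
            b[k][j][k], b[k][j][0] = b[k][j][0], None

    # Step 2: within each plane k, broadcast A along the rows (over j)
    # from column k, and B along the columns (over i) from row k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j][k] = a[i][k][k]
                b[i][j][k] = b[k][j][k]

    return a, b
```

After both steps every P_ijk holds the pair (A_ik, B_kj), which is exactly the operand placement the multiplication step requires.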

