Dense Matrix Algorithms -- Chapter 8 Introduction

Cannon's Algorithm
• Recall the algorithm:
• Initially
  – Align A by shifting the blocks in each row circularly left by a distance equal to the row number
  – Align B by shifting the blocks in each column circularly up by a distance equal to the column number
• Repeat for $\sqrt{p}$ steps:
  – Multiply the block of A by the block of B held on each processor, adding the result to the local block of C
  – Shift the blocks of A circularly one position left within their rows
  – Shift the blocks of B circularly one position up within their columns
5/6/2003 densematrix 19

Performance Analysis Of Cannon's Algorithm
• Initial alignment
  – Shifts at most $\sqrt{p} - 1$ positions
  – Time is at most $2(t_s + t_w n^2/p)$
• $\sqrt{p}$ steps:
  – Total shifting time: $2(t_s + t_w n^2/p)\sqrt{p}$
  – Total computation time: $(n/\sqrt{p})^3 \sqrt{p} = n^3/p$
• Total parallel runtime: $n^3/p + 2 t_s \sqrt{p} + 2 t_w n^2/\sqrt{p}$
• Cf. the simple algorithm: $n^3/p + 2 t_s \log\sqrt{p} + 2 t_w n^2/\sqrt{p}$
• The cost-optimality condition and the isoefficiency function are the same for Cannon's algorithm as for the simple algorithm
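The alignment and shift steps of Cannon's algorithm can be sketched as a sequential simulation. This is only an illustration under stated assumptions, not a parallel implementation: the $\sqrt{p} \times \sqrt{p}$ processor grid is modeled as nested Python lists, q stands for $\sqrt{p}$, q is assumed to divide n, and the helper names (matmul, split_blocks, cannon, cannon_runtime) are hypothetical. The last function evaluates the total parallel runtime expression from the analysis.

```python
import math


def matmul(X, Y):
    """Reference product of two square matrices stored as lists of lists."""
    m = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]


def split_blocks(M, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    b = len(M) // q
    return [[[row[J * b:(J + 1) * b] for row in M[I * b:(I + 1) * b]]
             for J in range(q)] for I in range(q)]


def cannon(A, B, q):
    """Simulate Cannon's algorithm on a q x q grid (p = q*q processors)."""
    n = len(A)
    assert n % q == 0, "this sketch assumes q divides n"
    b = n // q
    a, bb = split_blocks(A, q), split_blocks(B, q)
    # Initial alignment: row i of A moves left by i, column j of B moves up by j.
    a = [[a[i][(j + i) % q] for j in range(q)] for i in range(q)]
    bb = [[bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    # One zero-initialized block of C per simulated processor.
    c = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):  # sqrt(p) multiply-and-shift steps
        for i in range(q):
            for j in range(q):
                local = matmul(a[i][j], bb[i][j])
                for r in range(b):
                    for s in range(b):
                        c[i][j][r][s] += local[r][s]
        # Circular shifts: A one position left, B one position up.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        bb = [[bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Reassemble the distributed blocks of C into one n x n matrix.
    return [[c[i // b][j // b][i % b][j % b] for j in range(n)]
            for i in range(n)]


def cannon_runtime(n, p, t_s, t_w):
    """Modeled runtime: n^3/p + 2 t_s sqrt(p) + 2 t_w n^2 / sqrt(p)."""
    q = math.sqrt(p)
    return n ** 3 / p + 2 * t_s * q + 2 * t_w * n ** 2 / q


A = [[i * 4 + j + 1 for j in range(4)] for i in range(4)]
B = [[(i + 2 * j) % 5 for j in range(4)] for i in range(4)]
print(cannon(A, B, 2) == matmul(A, B))
```

After the alignment, the processor at grid position (i, j) holds blocks of A and B whose inner indices match, and each pair of shifts advances that shared index by one, so $\sqrt{p}$ steps cover every term of the block product.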
The DNS Algorithm
• Partitions intermediate data as well as the input data
• Results in a parallel time of $\Theta(\log n)$ using $\Omega(n^3/\log n)$ processors
• The essence of the algorithm:
  – Use a processor, labeled $P_{ijk}$, to perform each scalar product $A_{ik} B_{kj}$, for a total of $n^3$ products, then add the results over k to obtain every $C_{ij}$ simultaneously in $\log n$ steps
• The parallel algorithm:
  – Consider the processors arranged in n planes of $n \times n$ processors
  – Arrange the matrices A and B so that the elements $A_{ik}$ and $B_{kj}$ are on processor (i, j) in the k-th plane
  – Each processor multiplies its two elements
  – Then $n^2$ simultaneous sum-reductions are performed along dimension k, down to the bottom plane of processors, to produce the product C on the bottom plane

Communication For The DNS Algorithm
• Initially A and B are assumed to be distributed element by element on the bottom plane of processors (k = 0)
• For A, move its k-th column to the diagonal column (column k) of the k-th plane
  – Then broadcast those elements simultaneously along the rows of each plane
• For B, move its k-th row to the diagonal row (row k) of the k-th plane
  – Then broadcast those elements simultaneously along the columns of each plane
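The data movement and reduction of the DNS algorithm can be sketched as a sequential simulation. This is a minimal illustration under assumptions, not a parallel program: the $n^3$ processors are modeled as a nested list indexed [k][i][j], the move-and-broadcast phase is collapsed into direct indexing, and the function name dns_multiply is hypothetical. The sum-reduction along k is done pairwise so that the number of reduction steps can be counted.

```python
def dns_multiply(A, B):
    """Simulate the DNS algorithm with n planes of n x n virtual processors.

    Returns the product C (the contents of plane 0 after the reduction)
    and the number of pairwise reduction steps that were needed.
    """
    n = len(A)
    # After the communication phase, processor (i, j) in plane k holds
    # A[i][k] (column k of A broadcast along rows) and
    # B[k][j] (row k of B broadcast along columns).
    a = [[[A[i][k] for j in range(n)] for i in range(n)] for k in range(n)]
    b = [[[B[k][j] for j in range(n)] for i in range(n)] for k in range(n)]
    # Every processor performs exactly one scalar multiplication.
    prod = [[[a[k][i][j] * b[k][i][j] for j in range(n)]
             for i in range(n)] for k in range(n)]
    # n^2 simultaneous sum-reductions along k, done as a binary tree:
    # in each step, plane k absorbs plane k + stride.
    stride, steps = 1, 0
    while stride < n:
        for k in range(0, n, 2 * stride):
            if k + stride < n:
                for i in range(n):
                    for j in range(n):
                        prod[k][i][j] += prod[k + stride][i][j]
        stride *= 2
        steps += 1
    return prod[0], steps


A = [[i + j for j in range(4)] for i in range(4)]
B = [[i * j + 1 for j in range(4)] for i in range(4)]
C, steps = dns_multiply(A, B)
print(C, steps)
```

For n = 4 the reduction loop runs with strides 1 and 2, which matches the $\log n$ reduction steps stated above.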