12.07.2015 Views

Pipelined Computations

Pipelined Computations

Pipelined Computations

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 5slides5-1<strong>Pipelined</strong> <strong>Computations</strong>Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


<strong>Pipelined</strong> <strong>Computations</strong>slides5-2Problem divided into a series of tasks that have to be completedone after the other (the basis of sequential programming).Each task executed by a separate process or processor.P 0 P 1 P 2 P 3 P 4 P 5Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Exampleslides5-3Add all the elements of array a to an accumulating sum:for (i = 0; i < n; i++)sum = sum + a[i];The loop could be “unfolded” to yieldsum = sum + a[0];sum = sum + a[1];sum = sum + a[2];sum = sum + a[3];sum = sum + a[4];...Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-4Pipeline for an unfolded loopa[0] a[1] a[2] a[3] a[4]sums inas outs inas outs inas outs inas outs inas outSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Another Exampleslides5-5Frequency filter - Objective to remove specific frequencies (f 0 , f 1 , f 2 ,f 3 , etc.) from a digitized signal, f(t). Signal enters pipeline from left:Signal withoutSignal withoutfrequency f frequency f 1Signal 3Signal withoutwithoutfrequency f 0 frequency f 2f(t)f 1f in f out f in f outf 0 f 4f 2f in f outf 3f in f outf inf outFilteredsignalSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Where pipelining can be used to good effectslides5-6Assuming problem can be divided into a series of sequential tasks,pipelined approach can provide increased execution speed underthe following three types of computations:1. If more than one instance of the complete problem is to beexecuted2. If a series of data items must be processed, each requiringmultiple operations3. If information to start the next process can be passed forwardbefore the process has completed all its internal operationsSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


“Type 1” Pipeline Space-Time Diagramslides5-7p - 1mP 5P 4P 3P 2P 1P 0Instance1Instance Instance Instance Instance Instance1 2 3 4 5Instance Instance Instance Instance Instance Instance1 2 3 4 5 6Instance Instance Instance Instance Instance Instance Instance1 2 3 4 5 6 7Instance Instance Instance Instance Instance Instance Instance1 2 3 4 5 6 7Instance Instance Instance Instance Instance Instance Instance1 2 3 4 5 6 7Instance Instance Instance Instance Instance Instance2 3 4 5 6 7TimeExecution time = m + p - 1 cycles for a p-stage pipeline and m instances.Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-8Alternative space-time diagramInstance 0Instance 1Instance 2Instance 3Instance 4P 0 P 1 P 2 P 3 P 4 P 5P 0 P 1 P 2 P 3 P 4 P 5P 0 P 1 P 2 P 3 P 4 P 5P 0 P 1 P 2 P 3 P 4 P 5P 0 P 1 P 2 P 3 P 4 P 5TimeSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


“Type 2” Pipeline Space-Time DiagramP 7d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8 d 9Input sequenced 9 d 8 d 7 d 6 d 5 d 4 d 3 d 2 d 1 d 0 P 0 P 1 P 2 P 3 P 4 P 5 P 6 P 8 P 9P 6d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8 d 9(a) Pipeline structurep - 1nP 9d 0 d 1 d 2 d 3 d 4 d 5 d 6P 8d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7P 7d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8 d 9d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8 d 9d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8 d 9P 5P 4P 3P 2P 1d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8 d 9d 0 d 1 d 2 d 3 d 4 d 5 d 6 d 7 d 8 d 9slides5-9P 0Time(b) Timing diagramSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


“Type 3” Pipeline Space-Time Diagramslides5-10Informationtransfersufficient tostart nextprocessP 0P 1P 2P 3P 4P 5Information passedto next stageP 2P 3P 4P 0P 1P 5Time(a) Processes with the sameexecution timeTime(b) Processes not with thesame execution timePipeline processing where information passes to next stage beforeSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-11If the number of stages is larger than the number of processors inany pipeline, a group of stages can be assigned to each processor:Processor 0 Processor 1Processor 2P 0 P 1 P 2 P 3 P 4 P 5 P 6 P 7 P 8 P 9 P 10 P 11Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-12Computing Platform for <strong>Pipelined</strong> ApplicationsMultiprocessor system with a line configuration.HostcomputerMultiprocessorProcessorsStrictly speaking pipeline may not be the best structure for acluster - however a cluster with switched direct connections,as most have, can support simultaneous message passing.Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-13Example <strong>Pipelined</strong> Solutions(Examples of each type of computation)Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-14Pipeline Program ExamplesAdding Numbers2345Σ i1Σ i1Σ i1Σ i1Σ1P 0 P 1 P 2 P 3 P 4iType 1 pipeline computationSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-15Basic code for process P i :recv(&accumulation, P i-1 );accumulation = accumulation + number;send(&accumulation, P i+1 );except for the first process, P 0 , which issend(&number, P 1 );and the last process, P n−1 , which isrecv(&number, P n-2 );accumulation = accumulation + number;Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-16SPMD programif (process > 0) {recv(&accumulation, P i-1 );accumulation = accumulation + number;}if (process < n-1) send(&accumulation, P i+1 );The final result is in the last process.Instead of addition, other arithmetic operations could be done.Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-17<strong>Pipelined</strong> addition numbers with a masterprocess and ring configurationMaster processSlavesd n-1 … d 2 d 1 d 0P 0 P 1 P 2P n-1SumSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Sorting NumbersA parallel version of insertion sort.slides5-181numbers4, 3, 1, 2, 5P 0 P 1 P 2 P 3 P 42345Time(cycles)674, 3, 1, 24, 3, 155214, 3 5 24 5 3 2 15 4 3 25 4 312 189105 4 3 2 15 4 3 2 15 4 3 2 1Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-19Pipeline for sorting using insertion sortSeries of numbersx n−1 … x 1 x 0P 0SmallernumbersP 1 P 2Comparex maxLargest numberNext largestnumberType 2 pipeline computationSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-20The basic algorithm for process P i isrecv(&number, P i-1 );if (number > x) {send(&x, P i+1 );x = number;} else send(&number, i+1 P );With n numbers, how many the ith process is to accept is known; itis given by n − i.How many to pass onward is also known; it is given by n − i − 1since one of the numbers received is not passed onward.Hence, a simple loop could be used.Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-21Insertion sort with results returned to the masterprocess using a bidirectional line configurationMaster processd n-1 … d 2 d 1 d 0SortedsequenceP 0 P 1 P 2P n-1Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-22Insertion sort with results returnedP 4P 3Sorting phase2n - 1Returning sorted numbersnShown for n = 5P 2P 1P 0TimeSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Prime Number Generationslides5-23Sieve of EratosthenesSeries of all integers is generated from 2. First number, 2, is primeand kept. All multiples of this number are deleted as they cannot beprime. Process repeated with each remaining number. Thealgorithm removes nonprimes, leaving only primes.Not multiples of1st prime numberSeries of numbersx n−1 … 5 4 3 2P 0 P 1 P 2Compare 1st prime 2nd prime 3rd primemultiples number number numberType 2 pipeline computationSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


The code for a process, P i , could be based uponslides5-24recv(&x, P i-1 );/* repeat following for each number */recv(&number, P i-1 );if ((number % x) != 0) send(&number, P i+1 );Each process will not receive the same amount of numbers and theamount is not known beforehand. Use a “terminator” message,which is sent at the end of the sequence:recv(&x, P i-1 );for (i = 0; i < n; i++) {recv(&number, P i-1 );if (number == terminator) break;if (number % x) != 0) send(&number, i+1 P );}Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-25Solving a System of Linear EquationsUpper-triangular forma n-1,0 x 0 + a n-1,1 x 1 + a n-1,2 x 2 … + a n-1,n-1 x n-1 = b n-1..a 2,0 x 0 + a 2,1 x 1 + a 2,2 x 2 = b 2a 1,0 x 0 + a 1,1 x 1 = b 1a 0,0 x 0 = b 0where a’s and b’s are constants and x’s are unknowns to be found.Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Back Substitutionslides5-26First, the unknown x 0 is found from the last equation; i.e.,bx 00 = ---------a 0,0Value obtained for x 0 substituted into next equation to obtain x 1 ; i.e.,bx 1 – a 1,0 x 01 = -----------------------------a 1,1Values obtained for x 1 and x 0 substituted into next equation toobtain x 2 :bx 2 – a 2,0 x 0 – a 2,1 x 12 = ----------------------------------------------------a 2,2and so on until all the unknowns are found.Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-27Pipeline SolutionFirst pipeline stage computes x 0 and passes x 0 onto the secondstage, which computes x 1 from x 0 and passes both x 0 and x 1 ontothe next stage, which computes x 2 from x 0 and x 1 , and so on.P 0 P 1 P 2 P 3xx x 0 x 0 00 xx 1 xCompute x 10 Compute x 1 1 Compute x 2 x Compute x 2 3x 2x 3Type 3 pipeline computationSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-28The ith process (0 < i < n) receives the values x 0 , x 1 , x 2 , …, x i-1 andcomputes x i from the equation:i – 1b i– ∑ a i,jx jj 0x =i= --------------------------------------a i,iSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


slides5-29Sequential CodeGiven the constants a i,j and b k stored in arrays a[][] and b[],respectively, and the values for unknowns to be stored in an array,x[], the sequential code could bex[0] = b[0]/a[0][0]; /* computed separately */for (i = 1; i < n; i++) {/*for remaining unknowns*/sum = 0;for (j = 0; j < i; j++sum = sum + a[i][j]*x[j];x[i] = (b[i] - sum)/a[i][i];}Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Parallel Codeslides5-30Pseudocode of process P i (1 < i < n) of could befor (j = 0; j < i; j++) {recv(&x[j], P i-1 );send(&x[j], P i+1 );}sum = 0;for (j = 0; j < i; j++)sum = sum + a[i][j]*x[j];x[i] = (b[i] - sum)/a[i][i];send(&x[i], P i+1 );Now we have additional computations to do after receiving andresending values.Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.


Pipeline processing using back substitutionslides5-31P 5ProcessesP 4P 3P 2Final computed valueP 1P 0First value passed onwardTimeSlides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!