Is Parallel Programming Hard, And, If So, What Can You Do About It?

CHAPTER 2. HARDWARE AND ITS HABITS

2.3.4 Existing Parallel Software

Although multicore CPUs seem to have taken the computing industry by surprise, the fact remains that shared-memory parallel computer systems have been commercially available for more than a quarter century. This is more than enough time for significant parallel software to make its appearance, and it indeed has. Parallel operating systems are quite commonplace, as are parallel threading libraries, parallel relational database management systems, and parallel numerical software. Using existing parallel software can go a long way towards solving any parallel-software crisis we might encounter.

Perhaps the most common example is the parallel relational database management system. It is not unusual for single-threaded programs, often written in high-level scripting languages, to access a central relational database concurrently. In the resulting highly parallel system, only the database need actually deal directly with parallelism. A very nice trick when it works!

2.4 Software Design Implications

The values of the ratios in Table 2.1 are critically important, as they limit the efficiency of a given parallel application. To see this, suppose that the parallel application uses CAS operations to communicate among threads. These CAS operations will typically involve a cache miss, assuming that the threads are communicating primarily with each other rather than with themselves. Suppose further that the unit of work corresponding to each CAS communication operation takes 300 ns, which is sufficient time to compute several floating-point transcendental functions. Then about half of the execution time will be consumed by the CAS communication operations! This in turn means that a two-CPU system running such a parallel program would run no faster than a sequential implementation running on a single CPU.

The situation is even worse in the distributed-system case, where the latency of a single communications operation might take as long as thousands or even millions of floating-point operations. This illustrates how important it is for communications operations to be extremely infrequent and to enable very large quantities of processing.

Quick Quiz 2.7: Given that distributed-systems communication is so horribly expensive, why does anyone bother with them?

The lesson should be quite clear: parallel algorithms must be explicitly designed to run nearly independent threads. The less frequently the threads communicate, whether by atomic operations, locks, or explicit messages, the better the application's performance and scalability will be. In short, achieving excellent parallel performance and scalability means striving for embarrassingly parallel algorithms and implementations, whether by careful choice of data structures and algorithms, use of existing parallel applications and environments, or transforming the problem into one for which an embarrassingly parallel solution exists.

Chapter 5 will discuss design disciplines that promote performance and scalability.
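The single-threaded-clients-plus-parallel-database pattern of Section 2.3.4 can be sketched in a few lines of C. The sketch below is illustrative only: it assumes a PostgreSQL server and its libpq client library, and the database name and table ("inventory", "parts") are hypothetical; none of these appear in the text above. The point is that the client contains no threading or locking whatsoever, so any parallelism lives entirely in the database server, which services many such clients concurrently.

    /* Illustrative single-threaded database client: all parallelism
     * lives in the server, which handles many such clients at once.
     * Assumes PostgreSQL's libpq; build with: cc client.c -lpq */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int main(void)
    {
            /* "inventory" is a hypothetical database name. */
            PGconn *conn = PQconnectdb("dbname=inventory");

            if (PQstatus(conn) != CONNECTION_OK) {
                    fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
                    PQfinish(conn);
                    return EXIT_FAILURE;
            }

            /* The client issues ordinary sequential SQL; the server is
             * free to run it in parallel with other clients' queries. */
            PGresult *res = PQexec(conn, "SELECT count(*) FROM parts");

            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                    printf("parts in stock: %s\n", PQgetvalue(res, 0, 0));

            PQclear(res);
            PQfinish(conn);
            return EXIT_SUCCESS;
    }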
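The arithmetic behind the two-CPU claim in Section 2.4 can be made explicit. The "about half" statement implicitly assumes that each CAS communication operation costs roughly as much as the 300 ns unit of useful work; taking that assumption at face value, with work W = 300 ns and communication cost C = 300 ns per unit of work:

    % Fraction of time spent communicating, and the resulting speedup,
    % assuming W = 300 ns of work and C = 300 ns of CAS communication.
    \begin{align*}
      \text{communication fraction} &= \frac{C}{W + C} = \frac{300}{300 + 300} = 0.5, \\
      \text{speedup on two CPUs}    &= 2 \cdot \frac{W}{W + C} = 2 \cdot \frac{300}{600} = 1.0 .
    \end{align*}

In other words, the two CPUs together complete useful work no faster than a single CPU running a communication-free sequential version, exactly as stated above.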
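As a concrete illustration of the "communicate rarely" lesson (a sketch constructed for this discussion, not taken from the text), the program below sums an array two ways: the slow path performs an atomic add to a shared total for every element, forcing a cache-line exchange per element, while the fast path keeps a thread-local subtotal and communicates exactly once per thread.

    /* Illustrative only: per-element atomic communication versus
     * per-thread batching.  Build with: cc -O2 -pthread sum.c */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N        (1 << 20)

    static long data[N];
    static long slow_total;  /* updated once per element (frequent communication) */
    static long fast_total;  /* updated once per thread  (rare communication) */
    static pthread_mutex_t fast_lock = PTHREAD_MUTEX_INITIALIZER;

    struct range { long lo, hi; };

    /* Slow: every element triggers an atomic read-modify-write on a shared
     * cache line, so the threads spend most of their time communicating. */
    static void *sum_slow(void *arg)
    {
            struct range *r = arg;

            for (long i = r->lo; i < r->hi; i++)
                    __atomic_add_fetch(&slow_total, data[i], __ATOMIC_RELAXED);
            return NULL;
    }

    /* Fast: each thread works on private data and communicates exactly
     * once, making the algorithm nearly embarrassingly parallel. */
    static void *sum_fast(void *arg)
    {
            struct range *r = arg;
            long local = 0;

            for (long i = r->lo; i < r->hi; i++)
                    local += data[i];
            pthread_mutex_lock(&fast_lock);
            fast_total += local;
            pthread_mutex_unlock(&fast_lock);
            return NULL;
    }

    static void run(void *(*fn)(void *))
    {
            pthread_t tid[NTHREADS];
            struct range r[NTHREADS];

            for (int t = 0; t < NTHREADS; t++) {
                    r[t].lo = (long)t * N / NTHREADS;
                    r[t].hi = (long)(t + 1) * N / NTHREADS;
                    pthread_create(&tid[t], NULL, fn, &r[t]);
            }
            for (int t = 0; t < NTHREADS; t++)
                    pthread_join(tid[t], NULL);
    }

    int main(void)
    {
            for (long i = 0; i < N; i++)
                    data[i] = 1;
            run(sum_slow);
            run(sum_fast);
            printf("slow_total = %ld, fast_total = %ld\n", slow_total, fast_total);
            return 0;
    }

Both paths compute the same sum, but on typical hardware the batched version scales with the number of threads, while the per-element version is dominated by the cache-miss costs of the sort shown in Table 2.1 and may well run slower than a single thread.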
