Is Parallel Programming Hard, And, If So, What Can You Do About It?

CHAPTER 5. PARTITIONING AND SYNCHRONIZATION DESIGN

In the real world, these criteria often conflict to a greater or lesser degree, requiring that the designer carefully balance the resulting tradeoffs. As such, these criteria may be thought of as the "forces" acting on the design, with particularly good tradeoffs between these forces being called "design patterns" [Ale79, GHJV95].

The design criteria for attaining the three parallel-programming goals are speedup, contention, overhead, read-to-write ratio, and complexity:

Speedup: As noted in Section 1.2, increased performance is the major reason to go to all of the time and trouble required to parallelize a program. Speedup is defined to be the ratio of the time required to run a sequential version of the program to the time required to run the parallel version.

Contention: If more CPUs are applied to a parallel program than can be kept busy by that program, the excess CPUs are prevented from doing useful work by contention. This may be lock contention, memory contention, or a host of other performance killers.

Work-to-Synchronization Ratio: A uniprocessor, single-threaded, non-preemptible, and non-interruptible (either by masking interrupts or by being oblivious to them) version of a given parallel program would not need any synchronization primitives. Therefore, any time consumed by these primitives (including communication cache misses as well as message latency, locking primitives, atomic instructions, and memory barriers) is overhead that does not contribute directly to the useful work that the program is intended to accomplish. Note that the important measure is the relationship between the synchronization overhead and the overhead of the code in the critical section, with larger critical sections able to tolerate greater synchronization overhead. The work-to-synchronization ratio is related to the notion of synchronization efficiency. (A code sketch below makes this ratio concrete.)

Read-to-Write Ratio: A data structure that is rarely updated may often be replicated rather than partitioned, and furthermore may be protected with asymmetric synchronization primitives that reduce readers' synchronization overhead at the expense of that of writers, thereby reducing overall synchronization overhead. (See the second sketch below.) Corresponding optimizations are possible for frequently updated data structures, as discussed in Chapter 4.

Complexity: A parallel program is more complex than an equivalent sequential program because the parallel program has a much larger state space than does the sequential program, although these larger state spaces can in some cases be easily understood given sufficient regularity and structure. A parallel programmer must consider synchronization primitives, messaging, locking design, critical-section identification, and deadlock in the context of this larger state space.

This greater complexity often translates to higher development and maintenance costs. Therefore, budgetary constraints can limit the number and types of modifications made to an existing program, since a given degree of speedup is worth only so much time and trouble. Furthermore, there may be potential sequential optimizations that are cheaper and more effective than parallelization. As noted in Section 1.2.1, parallelization is but one performance optimization of many, and is furthermore an optimization that applies most readily to CPU-based bottlenecks.

These criteria will act together to enforce a maximum speedup. The first three criteria are deeply interrelated, so the remainder of this section analyzes these interrelationships. (A real-world parallel system will be subject to many additional design criteria, such as data-structure layout, memory size, memory-hierarchy latencies, and bandwidth limitations.)
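To make the work-to-synchronization ratio concrete, here is a minimal pthreads sketch (an illustration added here, not code from the book; the counter and function names are hypothetical). A critical section consisting of a single increment is dominated by locking overhead, whereas batching amortizes that same overhead over more useful work:

#include <pthread.h>

/* Hypothetical illustration: a shared counter whose critical section
 * is a single increment.  The useful work (one add) is far cheaper
 * than the lock/unlock pair and its cache-miss traffic, so the
 * work-to-synchronization ratio is poor. */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long counter;

void count_one(void)
{
	pthread_mutex_lock(&count_lock);
	counter++;			/* tiny critical section */
	pthread_mutex_unlock(&count_lock);
}

/* Batching amortizes the same synchronization overhead over more
 * useful work, improving the ratio at the cost of coarser updates. */
void count_batch(unsigned long n)
{
	pthread_mutex_lock(&count_lock);
	counter += n;			/* same overhead, n units of work */
	pthread_mutex_unlock(&count_lock);
}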
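And a minimal sketch of exploiting a high read-to-write ratio with an asymmetric primitive, assuming POSIX reader-writer locks (again an added illustration; cfg_read() and cfg_write() are hypothetical names). Readers exclude only writers, not one another, so a read-mostly workload pays less synchronization overhead than it would under an exclusive lock, while writers pay correspondingly more:

#include <pthread.h>

/* Hypothetical illustration: a rarely-updated configuration value
 * guarded by a POSIX reader-writer lock. */
static pthread_rwlock_t cfg_lock = PTHREAD_RWLOCK_INITIALIZER;
static int cfg_value;

int cfg_read(void)
{
	int v;

	pthread_rwlock_rdlock(&cfg_lock);  /* shared: readers run concurrently */
	v = cfg_value;
	pthread_rwlock_unlock(&cfg_lock);
	return v;
}

void cfg_write(int v)
{
	pthread_rwlock_wrlock(&cfg_lock);  /* exclusive: waits out all readers */
	cfg_value = v;
	pthread_rwlock_unlock(&cfg_lock);
}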
Note that these criteria may also appear as part of the requirements specification. For example, speedup may act as a desideratum ("the faster, the better") or as an absolute requirement of the workload, or "context" ("the system must support at least 1,000,000 web hits per second").

An understanding of the relationships between these design criteria can be very helpful when identifying appropriate design tradeoffs for a parallel program.

1. The less time a program spends in critical sections, the greater the potential speedup.

2. The fraction of time that the program spends in a given exclusive critical section must be much less than the reciprocal of the number of CPUs for the actual speedup to approach the number of CPUs. For example, a program running on 10 CPUs must spend much less than one tenth of its time in the most-restrictive exclusive critical section if it is to scale well.
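To see why the second relationship holds, consider an Amdahl-style back-of-the-envelope bound (an added worked example consistent with the text above, not a derivation taken from the book). Let f be the fraction of execution time spent in a single exclusive critical section and N the number of CPUs; at most one CPU makes progress during that fraction, so

    S(N) \le \frac{1}{f + (1 - f)/N} \le \frac{1}{f}

For N = 10 and f = 0.1, the bound gives S \le 1/(0.1 + 0.9/10) \approx 5.3: spending a full tenth of the time in the critical section already caps a 10-CPU program at roughly half its ideal speedup, which is why f must be much less than 1/N for the speedup to approach N.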
