Is Parallel Programming Hard, And, If So, What Can You Do About It?
[Figure 1.5: Categories of Tasks Required of Parallel Programmers — Work Partitioning, Parallel Access Control, Resource Partitioning and Replication, and Interacting With Hardware, placed against the Performance, Productivity, and Generality axes]

it is to parallelize your program, the more attractive parallelization becomes as an optimization. Parallelization has a reputation of being quite difficult, which leads to the question "exactly what makes parallel programming so difficult?"

1.4 What Makes Parallel Programming Hard?

It is important to note that the difficulty of parallel programming is as much a human-factors issue as it is a set of technical properties of the parallel programming problem. This is the case because we need human beings to be able to tell parallel systems what to do, and this two-way communication between human and computer is as much a function of the human as it is of the computer. Therefore, appeals to abstractions or to mathematical analyses will necessarily be of severely limited utility.

In the Industrial Revolution, the interface between human and machine was evaluated by human-factor studies, then called time-and-motion studies. Although there have been a few human-factor studies examining parallel programming [ENS05, ES05, HCS+05, SS94], these studies have been extremely narrowly focused, and hence unable to demonstrate any general results. Furthermore, given that the normal range of programmer productivity spans more than an order of magnitude, it is unrealistic to expect an affordable study to be capable of detecting (say) a 10% difference in productivity.
Although the multiple-order-of-magnitude differences that such studies can reliably detect are extremely valuable, the most impressive improvements tend to be based on a long series of 10% improvements. We must therefore take a different approach.

One such approach is to carefully consider what tasks parallel programmers must undertake that are not required of sequential programmers. We can then evaluate how well a given programming language or environment assists the developer with these tasks. These tasks fall into the four categories shown in Figure 1.5, each of which is covered in the following sections.

1.4.1 Work Partitioning

Work partitioning is absolutely required for parallel execution: if there is but one "glob" of work, then it can be executed by at most one CPU at a time, which is by definition sequential execution. However, partitioning the code requires great care. For example, uneven partitioning can result in sequential execution once the small partitions have completed [Amd67]. In less extreme cases, load balancing can be used to fully utilize available hardware, thus attaining more-optimal performance.

In addition, partitioning of work can complicate handling of global errors and events: a parallel program may need to carry out non-trivial synchronization in order to safely process such global events.

Each partition requires some sort of communication: after all, if a given thread did not communicate at all, it would have no effect and would thus not need to be executed. However, because communication incurs overhead, careless partitioning choices can result in severe performance degradation.

Furthermore, the number of concurrent threads must often be controlled, as each such thread occupies common resources, for example, space in CPU caches. If too many threads are permitted to execute concurrently, the CPU caches will overflow, resulting in high cache miss rates, which in turn degrades performance.
On the other hand, large numbers of threads are often required to overlap computation and I/O.

Quick Quiz 1.12: What besides CPU cache capacity might require limiting the number of concurrent threads?

Finally, permitting threads to execute concurrently greatly increases the program's state space, which can make the program difficult to understand, degrading productivity. All else being equal, smaller state spaces having more regular structure are more easily understood, but this is a human-factors statement as opposed to a technical or mathematical statement. Good parallel designs might have extremely large state spaces, but nevertheless be easy to understand due to their regular structure, while poor designs can be impenetrable despite having a small state space.
