10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?




Chapter 2 Hardware and its Habits

Most people have an intuitive understanding that passing messages between systems is considerably more expensive than performing simple calculations within the confines of a single system. However, it is not always so clear that communicating among threads within the confines of a single shared-memory system can also be quite expensive. This chapter therefore looks at the cost of synchronization and communication within a shared-memory system. This chapter merely scratches the surface of shared-memory parallel hardware design; readers desiring more detail would do well to start with a recent edition of Hennessy and Patterson’s classic text [HP95].

Quick Quiz 2.1: Why should parallel programmers bother learning low-level properties of the hardware? Wouldn’t it be easier, better, and more general to remain at a higher level of abstraction?

2.1 Overview

Careless reading of computer-system specification sheets might lead one to believe that CPU performance is a footrace on a clear track, as illustrated in Figure 2.1, where the race always goes to the swiftest.

Although there are a few CPU-bound benchmarks that approach the ideal shown in Figure 2.1, the typical program more closely resembles an obstacle course than a race track. This is because the internal architecture of CPUs has changed dramatically over the past few decades, courtesy of Moore’s Law.
These changes are described in the following sections.

Figure 2.1: CPU Performance at its Best

2.1.1 Pipelined CPUs

In the early 1980s, the typical microprocessor fetched an instruction, decoded it, and executed it, typically taking at least three clock cycles to complete one instruction before proceeding to the next. In contrast, the CPU of the late 1990s and early 2000s executes many instructions simultaneously, using a deep “pipeline” to control the flow of instructions internally to the CPU. This difference is illustrated by Figure 2.2.

Achieving full performance with a CPU having a long pipeline requires highly predictable control flow through the program. Suitable control flow can be provided by a program that executes primarily in tight loops, for example, programs doing arithmetic on large matrices or vectors. The CPU can then correctly predict that the branch at the end of the loop will be taken in almost all cases. In such programs, the pipeline can be kept full and the CPU can execute at full speed.
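
As a rough illustration of the kind of tight loop described above, the following sketch performs arithmetic on two vectors. The function name vec_axpy() and its parameters are hypothetical, chosen only for this example; they do not come from the original text. The point is that the loop's only conditional branch is the loop-closing test, which goes the same way on every iteration except the last, so a branch predictor should have an easy time with it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example, not from the original text: compute
 * y[i] += a * x[i] for each of the n elements.
 *
 * The loop body contains no data-dependent branches.  The only
 * conditional branch is the loop-closing test, whose outcome changes
 * only once, when the loop finally exits.
 */
void vec_axpy(double a, const double *x, double *y, size_t n)
{
	assert(x != NULL && y != NULL);

	for (size_t i = 0; i < n; i++)
		y[i] += a * x[i];
}
```

For a vector of a million elements, the loop-closing branch changes direction only once per call, which is the sort of highly predictable control flow that lets a deep pipeline stay full. How close a real CPU gets to full speed on such a loop also depends on factors outside this sketch, such as how the compiler arranges the loop.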
