10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


ignore the rest of this chapter.

Of course, if you are implementing synchronization primitives, you don’t have this luxury. The following discussion of memory ordering and memory barriers is for you.

12.2.3 Variables Can Have More Than One Value

It is natural to think of a variable as taking on a well-defined sequence of values in a well-defined, global order. Unfortunately, it is time to say “goodbye” to this sort of comforting fiction.

To see this, consider the program fragment shown in Figure 12.4. This code fragment is executed in parallel by several CPUs. Line 1 sets a shared variable to the current CPU’s ID, line 2 initializes several variables from a gettb() function that delivers the value of a fine-grained hardware “timebase” counter that is synchronized among all CPUs (not available from all CPU architectures, unfortunately!), and the loop from lines 3-8 records the length of time that the variable retains the value that this CPU assigned to it. Of course, one of the CPUs will “win”, and would thus never exit the loop if not for the check on lines 6-7.

Quick Quiz 12.3: What assumption is the code fragment in Figure 12.4 making that might not be valid on real hardware?

  1 state.variable = mycpu;
  2 lasttb = oldtb = firsttb = gettb();
  3 while (state.variable == mycpu) {
  4   lasttb = oldtb;
  5   oldtb = gettb();
  6   if (lasttb - firsttb > 1000)
  7     break;
  8 }

Figure 12.4: Software Logic Analyzer

Upon exit from the loop, firsttb will hold a timestamp taken shortly after the assignment and lasttb will hold a timestamp taken before the last sampling of the shared variable that still retained the assigned value, or a value equal to firsttb if the shared variable had changed before entry into the loop. This allows us to plot each CPU’s view of the value of state.variable over a 532-nanosecond time period, as shown in Figure 12.5. This data was collected on a 1.5GHz POWER5 system with 8 cores, each containing a pair of hardware threads.
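For readers who want to experiment, the fragment in Figure 12.4 can be approximated in portable user-space C. What follows is only a sketch under several assumptions that are not in the original: POSIX threads stand in for CPUs (and are not pinned to any particular CPU), clock_gettime(CLOCK_MONOTONIC) stands in for the hardware timebase behind gettb(), and the names shared_variable, go, struct observation, run_analyzer(), NR_OBSERVERS, and TIMEOUT_NS are all hypothetical. The relaxed C11 atomics only keep the compiler from caching the shared variable in a register; they add no memory barriers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NR_OBSERVERS 4          /* hypothetical: four observing threads */
#define TIMEOUT_NS   1000000ull /* hypothetical: the "winner" gives up after 1 ms */

static atomic_int shared_variable; /* plays the role of state.variable */
static atomic_int go;              /* start flag, standing in for the controlling CPU */

struct observation {
    int id;           /* value this thread stores (1..NR_OBSERVERS) */
    uint64_t firsttb; /* timestamp taken just after the store */
    uint64_t lasttb;  /* timestamp before the last sample that still matched */
};

/* Stand-in for gettb(): nanoseconds from the POSIX monotonic clock. */
static uint64_t gettb(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Same logic as Figure 12.4, with relaxed atomic accesses to the variable. */
static void *observer(void *arg)
{
    struct observation *obs = arg;
    uint64_t lasttb, oldtb, firsttb;

    while (!atomic_load_explicit(&go, memory_order_acquire))
        continue; /* spin until the controlling thread starts the test */

    atomic_store_explicit(&shared_variable, obs->id, memory_order_relaxed);
    lasttb = oldtb = firsttb = gettb();
    while (atomic_load_explicit(&shared_variable,
                                memory_order_relaxed) == obs->id) {
        lasttb = oldtb;
        oldtb = gettb();
        if (lasttb - firsttb > TIMEOUT_NS)
            break;
    }
    obs->firsttb = firsttb;
    obs->lasttb = lasttb;
    return NULL;
}

/* Runs one experiment; returns 0 on success, -1 if a thread was not created. */
int run_analyzer(struct observation obs[NR_OBSERVERS])
{
    pthread_t tid[NR_OBSERVERS];
    int i, created = 0;

    assert(obs != NULL);
    atomic_store(&go, 0);
    for (i = 0; i < NR_OBSERVERS; i++) {
        obs[i].id = i + 1;
        if (pthread_create(&tid[i], NULL, observer, &obs[i]) != 0)
            break;
        created++;
    }
    atomic_store_explicit(&go, 1, memory_order_release); /* start the test */
    for (i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return created == NR_OBSERVERS ? 0 : -1;
}
```

After run_analyzer() returns, lasttb - firsttb for each entry approximates how long that thread continued to see its own value. An operating-system clock is usually far coarser and more expensive to read than a 5.32ns timebase counter, so this sketch should not be expected to resolve the intermediate cache states visible in the original measurements.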

CPUs 1, 2, 3, and 4 recorded the values, while CPU 0 controlled the test. The timebase counter period was about 5.32ns, sufficiently fine-grained to allow observations of intermediate cache states.

[Figure: one horizontal bar per CPU (CPU 1 through CPU 4), with the time axis marked at 100ns, 200ns, 300ns, 400ns, and 500ns]

Figure 12.5: A Variable With Multiple Simultaneous Values

Each horizontal bar represents the observations of a given CPU over time, with the black regions to the left indicating the time before the corresponding CPU’s first measurement. During the first 5ns, only CPU 3 has an opinion about the value of the variable. During the next 10ns, CPUs 2 and 3 disagree on the value of the variable, but thereafter agree that the value is “2”, which is in fact the final agreed-upon value. However, CPU 1 believes that the value is “1” for almost 300ns, and CPU 4 believes that the value is “4” for almost 500ns.

Quick Quiz 12.4: How could CPUs possibly have different views of the value of a single variable at the same time?

Quick Quiz 12.5: Why do CPUs 2 and 3 come to agreement so quickly, when it takes so long for CPUs 1 and 4 to come to the party?

We have entered a regime where we must bid a fond farewell to comfortable intuitions about values of variables and the passage of time. This is the regime where memory barriers are needed.

12.2.4 What Can You Trust?

You most definitely cannot trust your intuition.

What can you trust?

It turns out that there are a few reasonably simple rules that allow you to make good use of memory barriers.
This section derives those rules, for those who wish to get to the bottom of the memory-barrier story, at least from the viewpoint of portable code. If you just want to be told what the rules are rather than suffering through the actual derivation, please feel free to skip to Section 12.2.6.

The exact semantics of memory barriers vary wildly from one CPU to another, so portable code must rely only on the least-common-denominator semantics of memory barriers.

Fortunately, all CPUs impose the following rules:

1. All accesses by a given CPU will appear to that CPU to have occurred in program order.
