10.07.2015

Is Parallel Programming Hard, And, If So, What Can You Do About It?


CHAPTER 12. ADVANCED SYNCHRONIZATION

5. If one CPU does a load from A ordered before a store to B, and if a second CPU does a store to B ordered before a store to A, and if the first CPU’s load from A gives the value stored by the second CPU, then the first CPU’s store to B must happen after the second CPU’s store to B, hence the value stored by the first CPU persists. 6

6 Or, for the more competitively oriented, the first CPU’s store to B “wins”.

So what exactly @@@

12.2.7 Abstract Memory Access Model

Consider the abstract model of the system shown in Figure 12.6.

[Figure 12.6: Abstract Memory Access Model (CPU 1, Memory, CPU 2, Device)]

Each CPU executes a program that generates memory access operations. In the abstract CPU, memory operation ordering is very relaxed, and a CPU may actually perform the memory operations in any order it likes, provided program causality appears to be maintained. Similarly, the compiler may also arrange the instructions it emits in any order it likes, provided it doesn’t affect the apparent operation of the program.

So in the above diagram, the effects of the memory operations performed by a CPU are perceived by the rest of the system as the operations cross the interface between the CPU and the rest of the system (the dotted lines).

For example, consider the following sequence of events given the initial values {A=1,B=2}:

    CPU 1       CPU 2
    A = 3;      x = A;
    B = 4;      y = B;

The set of accesses as seen by the memory system in the middle can be arranged in 24 different combinations, with loads denoted by “ld” and stores denoted by “st”:

    st A=3, st B=4, x=ld A→3, y=ld B→4
    st A=3, st B=4, y=ld B→4, x=ld A→3
    st A=3, x=ld A→3, st B=4, y=ld B→4
    st A=3, x=ld A→3, y=ld B→2, st B=4
    st A=3, y=ld B→2, st B=4, x=ld A→3
    st A=3, y=ld B→2, x=ld A→3, st B=4
    st B=4, st A=3, x=ld A→3, y=ld B→4
    st B=4, ...
    ...

and can thus result in four different combinations of values:

    x == 1, y == 2
    x == 1, y == 4
    x == 3, y == 2
    x == 3, y == 4

Furthermore, the stores committed by a CPU to the memory system may not be perceived by the loads made by another CPU in the same order as the stores were committed.

As a further example, consider this sequence of events given the initial values {A=1,B=2,C=3,P=&A,Q=&C}:

    CPU 1       CPU 2
    B = 4;      Q = P;
    P = &B;     D = *Q;

There is an obvious data dependency here, as the value loaded into D depends on the address retrieved from P by CPU 2. At the end of the sequence, any of the following results are possible:

    (Q == &A) and (D == 1)
    (Q == &B) and (D == 2)
    (Q == &B) and (D == 4)

Note that CPU 2 will never try to load C into D, because the CPU will load P into Q before issuing the load of *Q.

12.2.8 Device Operations

Some devices present their control interfaces as collections of memory locations, but the order in which the control registers are accessed is very important. For instance, imagine an Ethernet card with a set of internal registers that are accessed through an address port register (A) and a data port register (D). To read internal register 5, the following code might
