10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


118 CHAPTER 12. ADVANCED SYNCHRONIZATION

ical code results in a significant number of cache misses. To limit the resulting performance degradation, CPUs have been designed to execute other instructions and memory references while waiting for a cache miss to fetch data from memory. This clearly causes instructions and memory references to execute out of order, which could cause serious confusion, as illustrated in Figure 12.2. Compilers and synchronization primitives (such as locking and RCU) are responsible for maintaining the illusion of ordering through use of “memory barriers” (for example, smp_mb() in the Linux kernel). These memory barriers can be explicit instructions, as they are on ARM, POWER, Itanium, and Alpha, or they can be implied by other instructions, as they are on x86.

[Figure 12.2: CPUs Can Do Things Out of Order]

Since the standard synchronization primitives preserve the illusion of ordering, your path of least resistance is to stop reading this section and simply use these primitives.

However, if you need to implement the synchronization primitives themselves, or if you are simply interested in understanding how memory ordering and memory barriers work, read on!

The next sections present counter-intuitive scenarios that you might encounter when using explicit memory barriers.

     1 thread0(void)
     2 {
     3   A = 1;
     4   smp_wmb();
     5   B = 1;
     6 }
     7
     8 thread1(void)
     9 {
    10   while (B != 1)
    11     continue;
    12   barrier();
    13   C = 1;
    14 }
    15
    16 thread2(void)
    17 {
    18   while (C != 1)
    19     continue;
    20   smp_mb();
    21   assert(A != 0);
    22 }

Figure 12.3: Parallel Hardware is Non-Causal

12.2.2 If B Follows A, and C Follows B, Why Doesn’t C Follow A?

Memory ordering and memory barriers can be extremely counter-intuitive.
For example, consider the functions shown in Figure 12.3 executing in parallel where variables A, B, and C are initially zero.

Intuitively, thread0() assigns to B after it assigns to A, thread1() waits until thread0() has assigned to B before assigning to C, and thread2() waits until thread1() has assigned to C before referencing A. Therefore, again intuitively, the assertion on line 21 cannot possibly fire.

This line of reasoning, intuitively obvious though it may be, is completely and utterly incorrect. Please note that this is not a theoretical assertion: actually running this code on real-world weakly-ordered hardware (a 1.5GHz 16-CPU POWER 5 system) resulted in the assertion firing 16 times out of 10 million runs. Clearly, anyone who produces code with explicit memory barriers should do some extreme testing – although a proof of correctness might be helpful, the strongly counter-intuitive nature of the behavior of memory barriers should in turn strongly limit one’s trust in such proofs. The requirement for extreme testing should not be taken lightly, given that a number of dirty hardware-dependent tricks were used to greatly increase the probability of failure in this run.

Quick Quiz 12.1: How on earth could the assertion on line 21 of the code in Figure 12.3 on page 118 possibly fail???

Quick Quiz 12.2: Great... So how do I fix it?

So what should you do? Your best strategy, if possible, is to use existing primitives that incorporate any needed memory barriers, so that you can simply
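
As one illustration of leaning on existing primitives rather than hand-placed barriers, the following is a minimal user-space sketch of the three threads from Figure 12.3, assuming C11 atomics and POSIX threads instead of the Linux-kernel smp_wmb(), barrier(), and smp_mb() calls. It is not the code used for the POWER 5 measurement above, and the names observed_a and count_causality_violations() are hypothetical helpers introduced only for this example. Each store that publishes data is a release store and each polling load is an acquire load; under the C11 memory model such release/acquire pairs chain transitively, so thread2() should always observe A == 1.

```c
#include <assert.h>    /* for callers that want to assert on the result */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared variables from Figure 12.3, reset to zero before each trial. */
static atomic_int A, B, C;
/* Hypothetical: value of A seen by thread2, read only after joining it. */
static int observed_a;

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    /* Release store: the write to A is ordered before the write to B. */
    atomic_store_explicit(&B, 1, memory_order_release);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    /* Acquire load pairs with the release store to B in thread0. */
    while (atomic_load_explicit(&B, memory_order_acquire) != 1)
        continue;
    /* Release store stands in for barrier() followed by a plain store. */
    atomic_store_explicit(&C, 1, memory_order_release);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    /* Acquire load pairs with the release store to C in thread1. */
    while (atomic_load_explicit(&C, memory_order_acquire) != 1)
        continue;
    observed_a = atomic_load_explicit(&A, memory_order_relaxed);
    return NULL;
}

/*
 * Hypothetical helper: run the three threads `trials` times and return
 * how often thread2 saw A == 0, or -1 if a thread could not be created.
 */
int count_causality_violations(int trials)
{
    void *(*fn[3])(void *) = { thread0, thread1, thread2 };
    int violations = 0;

    for (int i = 0; i < trials; i++) {
        pthread_t t[3];
        int started = 0;

        atomic_store(&A, 0);
        atomic_store(&B, 0);
        atomic_store(&C, 0);
        observed_a = -1;

        /* Start in order 0, 1, 2 so that any started thread can finish. */
        while (started < 3 &&
               pthread_create(&t[started], NULL, fn[started], NULL) == 0)
            started++;
        for (int j = 0; j < started; j++)
            pthread_join(t[j], NULL);
        if (started < 3)
            return -1;
        if (observed_a == 0)
            violations++;
    }
    return violations;
}
```

A caller might invoke count_causality_violations(1000) from main() and check that the result is zero. The same caution about testing applies here: a clean run on strongly ordered hardware such as x86 says little about weakly ordered systems, so the guarantee rests on the C11 ordering rules rather than on the test outcome.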
