10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?



122 CHAPTER 12. ADVANCED SYNCHRONIZATION

... guarantee that at least one of the loads saw the value stored by the corresponding store (or some later value for that same variable).

Stores "Pass in the Night". In the following example, after both CPUs have finished executing their code sequences, it is quite tempting to conclude that the result {A==1,B==2} cannot happen.

    CPU 1        CPU 2
    A=1;         B=2;
    smp_mb();    smp_mb();
    B=1;         A=2;

Unfortunately, such a conclusion does not necessarily hold on all 20th-century systems. Suppose that the cache line containing A is initially owned by CPU 2, and that containing B is initially owned by CPU 1. Then, in systems that have invalidation queues and store buffers, it is possible for the first assignments to "pass in the night", so that the second assignments actually happen first. This strange (but quite common) effect is explained in Appendix C.

This same effect can happen in any memory-barrier pairing where each CPU's memory barrier is preceded by a store, including the "ears to mouths" pairing.

However, 21st-century hardware does accommodate these ordering intuitions, and does permit this combination to be used safely.

12.2.4.6 Pair-Wise Memory Barriers: Non-Portable Combinations

In the following pairings from Table 12.1, the memory barriers have no effect that portable code can safely depend on.

Ears to Ears. Since loads do not change the state of memory (ignoring MMIO registers for the moment), it is not possible for one of the loads to see the results of the other load.

Mouth to Mouth, Ear to Ear. One of the variables is only loaded from, and the other is only stored to. Because (once again, ignoring MMIO registers) it is not possible for one load to see the results of the other, it is not possible to detect the conditional ordering provided by the memory barrier. (Yes, it is possible to determine which store happened last, but this does not depend on the memory barrier.)

Only One Store. Because there is only one store, only one of the variables permits one CPU to see the results of the other CPU's access. Therefore, there is no way to detect the conditional ordering provided by the memory barriers. (Yes, it is possible to determine whether or not the load saw the result of the corresponding store, but this does not depend on the memory barrier.)

12.2.4.7 Semantics Sufficient to Implement Locking

Suppose we have an exclusive lock (spinlock_t in the Linux kernel, pthread_mutex_t in pthreads code) that guards a number of variables (in other words, these variables are not accessed except from the lock's critical sections). The following properties must then hold true:

1. A given CPU or thread must see all of its own loads and stores as if they had occurred in program order.

2. The lock acquisitions and releases must appear to have executed in a single global order.²

3. Suppose a given variable has not yet been stored to in a critical section that is currently executing. Then any load from that variable performed in that critical section must see the last store to that variable from the last previous critical section that stored to it.

The difference between the last two properties is a bit subtle: the second requires that the lock acquisitions and releases occur in a well-defined order, while the third requires that the critical sections not "bleed out" far enough to cause difficulties for other critical sections.

Why are these properties necessary?

Suppose the first property did not hold. Then the assertion in the following code might well fail!

    a = 1;
    b = 1 + a;
    assert(b == 2);

Quick Quiz 12.7: How could the assertion b==2 on page 122 possibly fail?

Suppose that the second property did not hold. Then the following code might leak memory!

    spin_lock(&mylock);
    if (p == NULL)
        p = kmalloc(sizeof(*p), GFP_KERNEL);
    spin_unlock(&mylock);

² Of course, this order might be different from one run to the next. On any given run, however, all CPUs and threads must have a consistent view of the order of critical sections for a given exclusive lock.
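
The memory-leak scenario for the second property can be tried out in user space. The following is only a minimal sketch, under the assumption that pthread_mutex_t stands in for spinlock_t and malloc() for kmalloc(). The names lazy_alloc(), count_allocations(), alloc_count, and MAX_THREADS are hypothetical and were introduced here purely for illustration. Several threads run the same critical section, and the sketch counts how many of them actually allocate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64	/* hypothetical limit chosen for this sketch */

static pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;
static int *p;		/* guarded by mylock */
static int alloc_count;	/* allocations performed, guarded by mylock */

/* Thread body: the critical section from the text, with malloc()
 * assumed as a user-space stand-in for kmalloc(). */
static void *lazy_alloc(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mylock);
	if (p == NULL) {
		p = malloc(sizeof(*p));
		if (p != NULL)
			alloc_count++;
	}
	pthread_mutex_unlock(&mylock);
	return NULL;
}

/* Hypothetical helper: run nthreads copies of the critical section and
 * return the number of allocations, or -1 if a thread could not be
 * created. */
int count_allocations(int nthreads)
{
	pthread_t tid[MAX_THREADS];
	int i, created = 0, result;

	assert(nthreads >= 1 && nthreads <= MAX_THREADS);

	/* Reset shared state so the helper can be called repeatedly. */
	pthread_mutex_lock(&mylock);
	free(p);
	p = NULL;
	alloc_count = 0;
	pthread_mutex_unlock(&mylock);

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, lazy_alloc, NULL) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_lock(&mylock);
	result = alloc_count;
	pthread_mutex_unlock(&mylock);
	return created == nthreads ? result : -1;
}
```

Because the critical sections execute in a single global order, only the first thread to acquire the lock sees p == NULL, so a call such as count_allocations(8) should return 1. Were two critical sections able to overlap, both could see NULL, both could allocate, and one of the allocations would be leaked. With gcc or clang, build with the -pthread flag.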
