10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


128 CHAPTER 12. ADVANCED SYNCHRONIZATION

Whilst this may seem like a failure of coherency or causality maintenance, it isn't, and this behaviour can be observed on certain real CPUs (such as the DEC Alpha).

To deal with this, a data dependency barrier must be inserted between the address load and the data load (again with initial values of {A=1, B=2, C=3, P=&A, Q=&C}):

    CPU 1                   CPU 2
    B = 4;
    <write barrier>
    P = &B;
                            Q = P;
                            <data dependency barrier>
                            D = *Q;

This enforces the occurrence of one of the two implications, and prevents the third possibility from arising.

Note that this extremely counterintuitive situation arises most easily on machines with split caches, so that, for example, one cache bank processes even-numbered cache lines and the other bank processes odd-numbered cache lines. The pointer P might be stored in an odd-numbered cache line, and the variable B might be stored in an even-numbered cache line. Then, if the even-numbered bank of the reading CPU's cache is extremely busy while the odd-numbered bank is idle, one can see the new value of the pointer P (which is &B), but the old value of the variable B (which is 2).

Another example of where data dependency barriers might be required is where a number is read from memory and then used to calculate the index for an array access with initial values {M[0]=1, M[1]=2, M[3]=3, P=0, Q=3}:

    CPU 1                   CPU 2
    M[1] = 4;
    <write barrier>
    P = 1;
                            Q = P;
                            <data dependency barrier>
                            D = M[Q];

The data dependency barrier is very important to the Linux kernel's RCU system; for example, see rcu_dereference() in include/linux/rcupdate.h. This permits the current target of an RCU'd pointer to be replaced with a new modified target, without the replacement target appearing to be incompletely initialised.

See also the subsection on @@@ "Cache Coherency" for a more thorough example.

12.2.10.5 Control Dependencies

A control dependency requires a full read memory barrier, not simply a data dependency barrier, to make it work correctly.
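As a rough user-space illustration of this requirement, the sketch below uses C11 atomics rather than the kernel's own barrier primitives, which is an assumption made purely for the sake of a self-contained example. The names writer_publish() and reader_fetch() are hypothetical, and atomic_thread_fence(memory_order_acquire) stands in for the read barrier: it orders the load of the flag p before the later read through q, which a bare control dependency on p would not guarantee.

```c
#include <assert.h>
#include <stdatomic.h>

static int a = 1;          /* default target, never modified here */
static int b = 2;          /* payload the writer updates before publishing */
static atomic_int p;       /* flag, zero-initialised: nonzero means "b is ready" */

/* Hypothetical writer: update b, then set the flag with release
 * ordering, the C11 analogue of a write barrier before the flag store.
 * Intended to be called at most once while readers may be running. */
void writer_publish(int value)
{
    b = value;
    atomic_store_explicit(&p, 1, memory_order_release);
}

/* Hypothetical reader: the choice of q depends on p only through a
 * branch (a control dependency), so an explicit fence is placed
 * between the flag load and the dereference. */
int reader_fetch(void)
{
    int *q = &a;

    if (atomic_load_explicit(&p, memory_order_relaxed))
        q = &b;

    /* Stand-in for the read barrier: the read of *q below cannot be
     * satisfied ahead of the load of p above. */
    atomic_thread_fence(memory_order_acquire);

    return *q;
}
```

Called from a single thread, reader_fetch() returns 1 before any publication and returns 4 after writer_publish(4). The fence matters when the two functions run on different CPUs: a reader that sees the flag set is then also guaranteed to see the writer's update to b.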
Consider the following bit of code:

    q = &a;
    if (p)
        q = &b;
    <data dependency barrier>
    x = *q;

This will not have the desired effect because there is no actual data dependency, but rather a control dependency that the CPU may short-circuit by attempting to predict the outcome in advance. In such a case what's actually required is:

    q = &a;
    if (p)
        q = &b;
    <read barrier>
    x = *q;

12.2.10.6 SMP Barrier Pairing

When dealing with CPU-CPU interactions, certain types of memory barrier should always be paired. A lack of appropriate pairing is almost certainly an error.

A write barrier should always be paired with a data dependency barrier or read barrier, though a general barrier would also be viable. Similarly, a read barrier or a data dependency barrier should always be paired with at least a write barrier, though, again, a general barrier is viable:

    CPU 1                   CPU 2
    a = 1;
    <write barrier>
    b = 2;                  x = b;
                            <read barrier>
                            y = a;

Or:
