10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Is Parallel Programming Hard, And, If So, What Can You Do About It?

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

C.7. MEMORY-BARRIER INSTRUCTIONS FOR SPECIFIC CPUS 177Figure C.12: Half Memory Barrier1 r1 = x;2 if (r1 == 0)3 nop();4 y = 1;5 r2 = z;6 ISB();7 r3 = z;In this example, load-store control dependency orderingcauses the load from x on line 1 to be orderedbefore the store to y on line 4. However, ARM doesnot respect load-load control dependencies, so thatthe load on line 1 might well happen after the loadon line 5. On the other hand, the combination of theconditional branch on line 2 and the ISB instructionon line 6 ensures that the load on line 7 happensafter the load on line 1.C.7.4 IA64IA64 offers a weak consistency model, so that in absenceof explicit memory-barrier instructions, IA64is within its rights to arbitrarily reorder memory references[Int02b]. IA64 has a memory-fence instructionnamed mf, but also has “half-memory fence”modifiers to loads, stores, and to some of its atomicinstructions [Int02a]. The acq modifier preventssubsequent memory-reference instructions from beingreordered before the acq, but permits priormemory-reference instructions to be reordered afterthe acq, as fancifully illustrated by Figure C.12.Similarly, the rel modifier prevents prior memoryreferenceinstructions from being reordered after therel, but allows subsequent memory-reference instructionsto be reordered before the rel.These half-memory fences are useful for criticalsections,sinceitissafetopushoperationsintoacriticalsection, but can be fatal to allow them to bleedout. However, as one of the only CPUs with thisproperty, IA64 defines Linux’s semantics of memoryorderingassociatedwithlockacquisitionandrelease.TheIA64mfinstructionisusedforthesmp rmb(),smp mb(), and smp wmb() primitives in the Linuxkernel. Oh, and despite rumors to the contrary,the “mf” mneumonic really does stand for “memoryfence”.Finally, IA64 offers a global total order for “release”operations, including the “mf” instruction.This provides the notion of transitivity, where if agiven code fragment sees a given access as havinghappened, any later code fragment will also see thatearlier access as having happened. Assuming, thatis, that all the code fragments involved correctly usememory barriers.C.7.5 PA-RISCAlthough the PA-RISC architecture permits full reorderingof loads and stores, actual CPUs run fullyordered [Kan96]. This means that the Linux kernel’smemory-ordering primitives generate no code,however, theydousethegccmemoryattributetodisablecompiler optimizations that would reorder codeacross the memory barrier.C.7.6 POWER / Power PCThe POWER and Power PC R○ CPU families have awide variety of memory-barrier instructions [IBM94,LSH02]:1. sync causes all preceding operations to appearto have completed before any subsequent operationsare started. This instruction is thereforequite expensive.2. lwsync (light-weight sync) orders loads with respectto subsequent loads and stores, and alsoorders stores. However, it does not order storeswith respect to subsequent loads. Interestinglyenough, the lwsync instruction enforces thesame ordering as does zSeries, and coincidentally,SPARC TSO.3. eieio (enforce in-order execution of I/O, incase you were wondering) causes all precedingcacheable stores to appear to have completedbefore all subsequent stores. However, stores tocacheable memory are ordered separately fromstores to non-cacheable memory, which meansthateieio will not force an MMIO store to precedea spinlock release.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!