10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


12.2 Memory Barriers

Ordering with Multiple CPUs on One Lock: Suppose, instead of the two different locks as shown in Table 12.2, both CPUs acquire the same lock, as shown in Table 12.4?

CPU 1        CPU 2
A = a;       E = e;
LOCK M;      LOCK M;
B = b;       F = f;
C = c;       G = g;
UNLOCK M;    UNLOCK M;
D = d;       H = h;

Table 12.4: Ordering With Multiple CPUs on One Lock

In this case, either CPU 1 acquires M before CPU 2 does, or vice versa. In the first case, the assignments to A, B, and C must precede those to F, G, and H. On the other hand, if CPU 2 acquires the lock first, then the assignments to E, F, and G must precede those to B, C, and D.

12.2.13 The Effects of the CPU Cache

The perceived ordering of memory operations is affected by the caches that lie between the CPUs and memory, as well as by the cache coherence protocol that maintains memory consistency and ordering. From a software viewpoint, these caches are for all intents and purposes part of memory. Memory barriers can be thought of as acting on the vertical dotted line in Figure 12.17, ensuring that the CPU present its values to memory in the proper order, as well as ensuring that it see changes made by other CPUs in the proper order.

Although the caches can “hide” a given CPU’s memory accesses from the rest of the system, the cache-coherence protocol ensures that all other CPUs see any effects of these hidden accesses, migrating and invalidating cachelines as required. Furthermore, the CPU core may execute instructions in any order, restricted only by the requirement that program causality and memory ordering appear to be maintained. Some of these instructions may generate memory accesses that must be queued in the CPU’s memory access queue, but execution may nonetheless continue until the CPU either fills up its internal resources or until it must wait for some queued memory access to complete.

12.2.13.1 Cache Coherency

Although cache-coherence protocols guarantee that a given CPU sees its own accesses in order, and that all CPUs agree on the order of modifications to a single variable contained within a single cache line, there is no guarantee that modifications to different variables will be seen in the same order by all CPUs. Although some computer systems do make some such guarantees, portable software cannot rely on them.

To see why reordering can occur, consider the two-CPU system shown in Figure 12.18, in which each CPU has a split cache. This system has the following properties:

1. An odd-numbered cache line may be in cache A, cache C, in memory, or some combination of the above.
2. An even-numbered cache line may be in cache B, cache D, in memory, or some combination of the above.
3. While the CPU core is interrogating one of its caches,[9] its other cache is not necessarily quiescent. This other cache may instead be responding to an invalidation request, writing back a dirty cache line, processing elements in the CPU’s memory-access queue, and so on.
4. Each cache has queues of operations that need to be applied to that cache in order to maintain the required coherence and ordering properties.
5. These queues are not necessarily flushed by loads from or stores to cache lines affected by entries in those queues.

In short, if cache A is busy, but cache B is idle, then CPU 1’s stores to odd-numbered cache lines may be delayed compared to CPU 2’s stores to even-numbered cache lines. In not-so-extreme cases, CPU 2 may see CPU 1’s operations out of order.

Much more detail on memory ordering in hardware and software may be found in Appendix C.

12.2.14 Where Are Memory Barriers Needed?

Memory barriers are only required where there’s a possibility of interaction between two CPUs or between a CPU and a device. If it can be guaranteed that there won’t be any such interaction in any

[9] But note that in “superscalar” systems, the CPU might well be accessing both halves of its cache at once, and might in fact be performing multiple concurrent accesses to each of the halves.
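
As a rough illustration of the kind of CPU-to-CPU interaction that calls for ordering, the following sketch passes a value from one thread to another through a flag. It is not taken from the text. It swaps in C11 atomics (release and acquire ordering) and POSIX threads rather than any particular kernel's barrier primitives, and the names `payload`, `ready`, `writer`, `reader`, and `run_message_passing` are hypothetical. The two variables stand in for stores to different cache lines, which other CPUs might otherwise observe in either order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names, chosen only for this illustration. */
static int payload;        /* ordinary data, published through the flag */
static atomic_int ready;   /* 0 = not yet published, 1 = published */

/* Writer: store the data first, then set the flag with release ordering,
 * so the data store is visible to any thread that acquires the flag. */
static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the flag is set, loading it with acquire ordering,
 * then read the data. The acquire load pairs with the release store. */
static void *reader(void *arg)
{
    int *seen = arg;

    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;   /* busy-wait until the writer publishes */
    *seen = payload;
    return NULL;
}

/* Run one writer/reader exchange and return the value the reader saw. */
int run_message_passing(void)
{
    pthread_t w, r;
    int seen = -1;
    int rc;

    /* Reset shared state before any thread is started. */
    payload = 0;
    atomic_store(&ready, 0);

    rc = pthread_create(&r, NULL, reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;   /* silence unused warning when assertions are disabled */

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Calling `run_message_passing()` from a program built with thread support (for example `-pthread` with GCC or Clang) should return 42. If both orderings were weakened to `memory_order_relaxed`, the language would no longer guarantee that result, and the plain accesses to `payload` would form a data race.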
