10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


308 APPENDIX F. ANSWERS TO QUICK QUIZZES

Answer:
The code assumes that as soon as a given CPU stops seeing its own value, it will immediately see the final agreed-upon value. On real hardware, some of the CPUs might well see several intermediate results before converging on the final value.

Quick Quiz 12.4:
How could CPUs possibly have different views of the value of a single variable at the same time?

Answer:
Many CPUs have write buffers that record the values of recent writes, which are applied once the corresponding cache line makes its way to the CPU. Therefore, it is quite possible for each CPU to see a different value for a given variable at a single point in time, and for main memory to hold yet another value. One of the reasons that memory barriers were invented was to allow software to deal gracefully with situations like this one.

Quick Quiz 12.5:
Why do CPUs 2 and 3 come to agreement so quickly, when it takes so long for CPUs 1 and 4 to come to the party?

Answer:
CPUs 2 and 3 are a pair of hardware threads on the same core, sharing the same cache hierarchy, and therefore have very low communications latencies. This is a NUMA, or, more accurately, a NUCA effect.

This leads to the question of why CPUs 2 and 3 ever disagree at all. One possible reason is that they each might have a small amount of private cache in addition to a larger shared cache. Another possible reason is instruction reordering, given the short 10-nanosecond duration of the disagreement and the total lack of memory barriers in the code fragment.

Quick Quiz 12.6:
But if the memory barriers do not unconditionally force ordering, how the heck can a device driver reliably execute sequences of loads and stores to MMIO registers???

Answer:
MMIO registers are special cases because they appear in uncached regions of physical memory. Memory barriers do unconditionally force ordering of loads and stores to uncached memory.
See Section @@@ for more information on memory barriers and MMIO regions.

Quick Quiz 12.7:
How could the assertion b==2 on page 122 possibly fail?

Answer:
If the CPU is not required to see all of its loads and stores in order, then the b=1+a might well see an old version of the variable "a".

This is why it is so very important that each CPU or thread see all of its own loads and stores in program order.

Quick Quiz 12.8:
How could the code on page 122 possibly leak memory?

Answer:
Only the first execution of the critical section should see p==NULL. However, if there is no global ordering of critical sections for mylock, then how can you say that a particular one was first? If several different executions of that critical section thought that they were first, they would all see p==NULL, and they would all allocate memory. All but one of those allocations would be leaked.

This is why it is so very important that all the critical sections for a given exclusive lock appear to execute in some well-defined order.

Quick Quiz 12.9:
How could the code on page 122 possibly count backwards?

Answer:
Suppose that the counter started out with the value zero, and that three executions of the critical section had therefore brought its value to three. If the fourth execution of the critical section is not constrained to see the most recent store to this variable, it might well see the original value of zero, and therefore set the counter to one, which would be going backwards.

This is why it is so very important that loads from a given variable in a given critical section see the last store from the last prior critical section to store to that variable.
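The code on page 122 is not reproduced in this excerpt, so the following is only a hypothetical sketch of the kind of critical section that Quick Quizzes 12.8 and 12.9 appear to describe. The names p and mylock come from the answers above; everything else (allocations, counter, high_water, went_backwards, worker(), run_workers()) is an assumption made for illustration, and POSIX threads are assumed as the locking primitive. The sketch relies on exactly the two guarantees the answers call out: critical sections for a given lock execute in some well-defined order, and each one sees the last store made by the prior one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64

pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;
int *p;               /* lazily allocated; protected by mylock */
int allocations;      /* number of malloc() calls; protected by mylock */
long counter;         /* protected by mylock */
long high_water;      /* largest counter value stored so far; protected by mylock */
int went_backwards;   /* set if a critical section ever sees a stale counter */

/* One execution of the hypothetical critical section. */
void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&mylock);

    /* Quick Quiz 12.8: only the first critical section should see NULL. */
    if (p == NULL) {
        p = malloc(sizeof(*p));
        allocations++;
    }

    /* Quick Quiz 12.9: this load should see the previous section's store. */
    long seen = counter;
    if (seen < high_water)
        went_backwards = 1;
    counter = seen + 1;
    high_water = counter;

    pthread_mutex_unlock(&mylock);
    return NULL;
}

/* Run nthreads executions of the critical section; return the malloc() count. */
int run_workers(int nthreads)
{
    pthread_t tid[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return allocations;   /* safe to read: all workers have been joined */
}
```

Called once in a fresh process, run_workers(8) should return 1, leave counter at 8, and leave went_backwards at 0, because a POSIX mutex provides both guarantees. If the lock provided neither, several executions could see p==NULL and leak all but one allocation, and a later execution could load a stale counter value.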
