10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Is Parallel Programming Hard, And, If So, What Can You Do About It?

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

312 APPENDIX F. ANSWERS TO QUICK QUIZZESQuick Quiz C.4:How does the hardware handle the delayed transitionsdescribed above?Answer:Usually by adding additional states, though theseadditional states need not be actually stored withthe cache line, due to the fact that only a few linesat a time will be transitioning. The need to delaytransitions is but one issue that results in real-worldcache coherence protocols being much more complexthan the over-simplified MESI protocol describedin this appendix. Hennessy and Patterson’s classicintroduction to computer architecture [HP95] coversmany of these issues.Quick Quiz C.5:<strong>What</strong> sequence of operations would put the CPUs’caches all back into the “invalid” state?Answer:There is no such sequence, at least in absenceof special “flush my cache” instructions in theCPU’s instruction set. Most CPUs do have suchinstructions.Quick Quiz C.6:In step 1 above, why does CPU 0 need to issue a“read invalidate” rather than a simple “invalidate”?Quick Quiz C.8:Say what??? Why do we need a memory barrierhere, given that the CPU cannot possibly executethe assert() until after the while loop completes???Answer:CPUs are free to speculatively execute, which canhave the effect of executing the assertion before thewhile loop completes.Quick Quiz C.9:<strong>Do</strong>es the guarantee that each CPU sees its ownmemory accesses in order also guarantee that eachuser-level thread will see its own memory accessesin order? Why or why not?Answer:No. Consider the case where a thread migrates fromone CPU to another, and where the destinationCPU perceives the source CPU’s recent memoryoperations out of order. To preserve user-modesanity, kernel hackers must use memory barriersin the context-switch path. However, the lockingalready required to safely do a context switchshould automatically provide the memory barriersneeded to cause the user-level task to see its ownaccesses in order. That said, if you are designing asuper-optimized scheduler, either in the kernel or atuser level, please keep this scenario in mind!Answer:Because the cache line in question contains morethan just the variable a.Quick Quiz C.7:In step 1 of the first scenario in Section C.4.3, whyis an “invalidate” sent instead of a ”read invalidate”message? <strong>Do</strong>esn’t CPU 0 need the values of theother variables that share this cache line with “a”?Answer:CPU 0 already has the values of these variables,given that it has a read-only copy of the cache linecontaining “a”. Therefore, all CPU 0 need do isto cause the other CPUs to discard their copies ofthis cache line. An “invalidate” message thereforesuffices.Quick Quiz C.10:Could this code be fixed by inserting a memorybarrier between CPU 1’s “while” and assignment to“c”? Why or why not?Answer:No. Such a memory barrier would only forceordering local to CPU 1. <strong>It</strong> would have no effecton the relative ordering of CPU 0’s and CPU 1’saccesses, so the assertion could still fail. However,all mainstream computer systems provide onemechanism or another to provide “transitivity”,which provides intuitive causal ordering: if B sawthe effects of A’s accesses, and C saw the effects ofB’s accesses, then C must also see the effects of A’saccesses. In short, hardware designers have takenat least a little pity on software developers.Quick Quiz C.11:

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!