10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?



F.7. CHAPTER 8: DEFERRED PROCESSING

for a safe way to use non-atomic operations in rcu_read_lock() and rcu_read_unlock().

Quick Quiz 8.45:
Come off it! We can see the atomic_read() primitive in rcu_read_lock()!!! So why are you trying to pretend that rcu_read_lock() contains no atomic operations???

Answer:
The atomic_read() primitive does not actually execute atomic machine instructions, but rather does a normal load from an atomic_t. Its sole purpose is to keep the compiler’s type-checking happy. If the Linux kernel ran on 8-bit CPUs, it would also need to prevent “store tearing”, which could happen due to the need to store a 16-bit pointer with two eight-bit accesses on some 8-bit systems. But thankfully, it seems that no one runs Linux on 8-bit systems.

Quick Quiz 8.46:
Great, if we have N threads, we can have 2N ten-millisecond waits (one set per flip_counter_and_wait() invocation), and even that assumes that we wait only once for each thread. Don’t we need the grace period to complete much more quickly?

Answer:
Keep in mind that we only wait for a given thread if that thread is still in a pre-existing RCU read-side critical section, and that waiting for one hold-out thread gives all the other threads a chance to complete any pre-existing RCU read-side critical sections that they might still be executing. So the only way that we would wait for 2N intervals would be if the last thread still remained in a pre-existing RCU read-side critical section despite all the waiting for all the prior threads.
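The argument so far can be checked with a small toy model. This is only a sketch under stated assumptions, not the book's implementation: the function name count_wait_intervals() and the remaining[] array are hypothetical, each array entry is the number of wait intervals that a thread will still spend in its pre-existing read-side critical section, and a single pass over the threads stands in for one flip_counter_and_wait() invocation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not the book's code): remaining[i] is how many
 * more wait intervals thread i needs to leave its pre-existing RCU
 * read-side critical section.  Returns the number of intervals that an
 * updater scanning the threads in order actually waits.
 */
int count_wait_intervals(const int *remaining, size_t nthreads)
{
	int elapsed = 0;	/* intervals waited so far, shared by all threads */
	size_t i;

	assert(remaining != NULL || nthreads == 0);

	for (i = 0; i < nthreads; i++) {
		/* Wait only while thread i is still a hold-out. */
		while (remaining[i] > elapsed)
			elapsed++;	/* one wait; every thread makes progress */
	}
	return elapsed;
}
```

In this model the number of waits equals the longest remaining critical section rather than growing with the sum over all threads, because time spent waiting for one hold-out also counts toward every thread scanned later.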
In short, this implementation will not wait unnecessarily.

However, if you are stress-testing code that uses RCU, you might want to comment out the poll() statement in order to better catch bugs that incorrectly retain a reference to an RCU-protected data element outside of an RCU read-side critical section.

Quick Quiz 8.47:
All of these toy RCU implementations have either atomic operations in rcu_read_lock() and rcu_read_unlock(), or synchronize_rcu() overhead that increases linearly with the number of threads. Under what circumstances could an RCU implementation enjoy light-weight implementations for all three of these primitives, all having deterministic (O(1)) overheads and latencies?

Answer:
Special-purpose uniprocessor implementations of RCU can attain this ideal [McK09a].

Quick Quiz 8.48:
If any even value is sufficient to tell synchronize_rcu() to ignore a given task, why doesn’t line 10 of Figure 8.40 simply assign zero to rcu_reader_gp?

Answer:
Assigning zero (or any other even-numbered constant) would in fact work, but assigning the value of rcu_gp_ctr can provide a valuable debugging aid, as it gives the developer an idea of when the corresponding thread last exited an RCU read-side critical section.

Quick Quiz 8.49:
Why are the memory barriers on lines 17 and 29 of Figure 8.40 needed? Aren’t the memory barriers inherent in the locking primitives on lines 18 and 28 sufficient?

Answer:
These memory barriers are required because the locking primitives are only guaranteed to confine the critical section. The locking primitives are under absolutely no obligation to keep other code from bleeding into the critical section. The pair of memory barriers is therefore required to prevent this sort of code motion, whether performed by the compiler or by the CPU.

Quick Quiz 8.50:
Couldn’t the update-side optimization described in Section 8.3.4.6 be applied to the implementation shown in Figure 8.40?

Answer:
Indeed it could, with a few modifications.
This work is left as an exercise for the reader.

Quick Quiz 8.51:
