10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


8.3. READ-COPY UPDATE (RCU)

     1 static void rcu_read_lock(void)
     2 {
     3 }
     4
     5 static void rcu_read_unlock(void)
     6 {
     7 }
     8
     9 static void rcu_quiescent_state(void)
    10 {
    11   smp_mb();
    12   __get_thread_var(rcu_reader_qs_gp) =
    13     ACCESS_ONCE(rcu_gp_ctr) + 1;
    14   smp_mb();
    15 }
    16
    17 static void rcu_thread_offline(void)
    18 {
    19   smp_mb();
    20   __get_thread_var(rcu_reader_qs_gp) =
    21     ACCESS_ONCE(rcu_gp_ctr);
    22   smp_mb();
    23 }
    24
    25 static void rcu_thread_online(void)
    26 {
    27   rcu_quiescent_state();
    28 }

Figure 8.44: Quiescent-State-Based RCU Read Side

In lines 1-7 of the figure, the rcu_read_lock() and rcu_read_unlock() primitives do nothing, and can in fact be expected to be inlined and optimized away, as they are in server builds of the Linux kernel. This is because quiescent-state-based RCU implementations approximate the extents of RCU read-side critical sections using the aforementioned quiescent states, which are marked by calls to rcu_quiescent_state(), shown on lines 9-15 of the figure.

Threads entering extended quiescent states (for example, when blocking) may instead use the rcu_thread_offline() and rcu_thread_online() APIs to mark the beginning and the end, respectively, of such an extended quiescent state. As such, rcu_thread_online() is analogous to rcu_read_lock() and rcu_thread_offline() is analogous to rcu_read_unlock(). These two functions are shown on lines 17-28 of the figure. In either case, it is illegal for a quiescent state to appear within an RCU read-side critical section.

In rcu_quiescent_state(), line 11 executes a memory barrier to prevent any code prior to the quiescent state from being reordered into the quiescent state.
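The read-side rules described so far can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the book's rcu_qs.c: C11 atomics stand in for smp_mb() and ACCESS_ONCE(), one global variable stands in for the per-thread rcu_reader_qs_gp, and shared_value and reader_burst() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed stand-ins: the global grace-period counter, and ONE thread's
 * rcu_reader_qs_gp (the book keeps one such variable per thread). */
static atomic_long rcu_gp_ctr;
static atomic_long rcu_reader_qs_gp;

/* Hypothetical RCU-protected datum read by the sketch. */
static atomic_int shared_value = 42;

/* Read-side markers are empty, so a compiler can inline them away. */
static inline void rcu_read_lock(void) { }
static inline void rcu_read_unlock(void) { }

/* Announce a quiescent state: publish an odd snapshot (counter + 1). */
static void rcu_quiescent_state(void)
{
    atomic_thread_fence(memory_order_seq_cst);   /* models smp_mb() */
    atomic_store_explicit(&rcu_reader_qs_gp,
        atomic_load_explicit(&rcu_gp_ctr, memory_order_relaxed) + 1,
        memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Enter an extended quiescent state: publish the even counter value. */
static void rcu_thread_offline(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&rcu_reader_qs_gp,
        atomic_load_explicit(&rcu_gp_ctr, memory_order_relaxed),
        memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Leave the extended quiescent state. */
static void rcu_thread_online(void)
{
    rcu_quiescent_state();
}

/* Hypothetical reader: a burst of n lookups, then back offline. */
static long reader_burst(int n)
{
    long sum = 0;

    rcu_thread_online();
    assert(atomic_load(&rcu_reader_qs_gp) & 1);  /* online: odd snapshot */
    for (int i = 0; i < n; i++) {
        rcu_read_lock();
        sum += atomic_load_explicit(&shared_value, memory_order_relaxed);
        rcu_read_unlock();
        rcu_quiescent_state();  /* only legal outside the critical section */
    }
    rcu_thread_offline();       /* e.g. before blocking */
    return sum;
}
```

In this sketch the quiescent state is announced after rcu_read_unlock(), never between the lock and unlock markers, which is the placement rule the text requires.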
Lines 12-13 of Figure 8.44 pick up a copy of the global rcu_gp_ctr, using ACCESS_ONCE() to ensure that the compiler does not employ any optimizations that would result in rcu_gp_ctr being fetched more than once, then add one to the value fetched and store the result into the per-thread rcu_reader_qs_gp variable, so that any concurrent instance of synchronize_rcu() will see an odd-numbered value, thus becoming aware that a new RCU read-side critical section has started. Instances of synchronize_rcu() that are waiting on older RCU read-side critical sections will thus know to ignore this new one. Finally, line 14 executes a memory barrier.

     1 void synchronize_rcu(void)
     2 {
     3   int t;
     4
     5   smp_mb();
     6   spin_lock(&rcu_gp_lock);
     7   rcu_gp_ctr += 2;
     8   smp_mb();
     9   for_each_thread(t) {
    10     while (rcu_gp_ongoing(t) &&
    11            ((per_thread(rcu_reader_qs_gp, t) -
    12              rcu_gp_ctr) < 0)) {
    13       poll(NULL, 0, 10);
    14     }
    15   }
    16   spin_unlock(&rcu_gp_lock);
    17   smp_mb();
    18 }

Figure 8.45: RCU Update Side Using Quiescent States

Quick Quiz 8.55: Doesn't the additional memory barrier shown on line 14 of Figure 8.44 greatly increase the overhead of rcu_quiescent_state()?

Some applications might use RCU only occasionally, but use it very heavily when they do use it. Such applications might choose to use rcu_thread_online() when starting to use RCU and rcu_thread_offline() when no longer using RCU. The time between a call to rcu_thread_offline() and a subsequent call to rcu_thread_online() is an extended quiescent state, so that RCU will not expect explicit quiescent states to be registered during this time.

The rcu_thread_offline() function simply sets the per-thread rcu_reader_qs_gp variable to the current value of rcu_gp_ctr, which has an even-numbered value.
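The odd/even encoding and the wait test on lines 10-12 of Figure 8.45 can be worked through with plain integers. The sketch below rests on one assumption that this section does not show: that rcu_gp_ongoing() treats a thread as online exactly when its rcu_reader_qs_gp snapshot is odd. Here rcu_gp_ongoing() takes the snapshot value directly instead of a thread index, and must_wait_for() and wait_test_examples() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: an odd snapshot means the thread is online; an even
 * snapshot means it is in an extended quiescent state. */
static bool rcu_gp_ongoing(long qs_gp)
{
    return (qs_gp & 1) != 0;
}

/* The wait test from synchronize_rcu(): gp_ctr is the global counter
 * AFTER the updater has added 2 to it. */
static bool must_wait_for(long qs_gp, long gp_ctr)
{
    return rcu_gp_ongoing(qs_gp) && (qs_gp - gp_ctr) < 0;
}

/* Worked example for a grace period that starts with rcu_gp_ctr == 0. */
static void wait_test_examples(void)
{
    long gp_ctr = 0 + 2;                /* updater: rcu_gp_ctr += 2 */

    /* Last quiescent state predates the grace period: 0 + 1 = 1. */
    assert(must_wait_for(1, gp_ctr));
    /* Quiescent state seen after the increment: 2 + 1 = 3. */
    assert(!must_wait_for(3, gp_ctr));
    /* Thread went offline before (0) or after (2) the increment. */
    assert(!must_wait_for(0, gp_ctr));
    assert(!must_wait_for(2, gp_ctr));
}
```

The subtraction followed by a sign test, in place of a direct comparison, is the usual idiom for comparing counters that may eventually wrap. That motivation is an inference, and this section does not state it.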
Because the value stored by rcu_thread_offline() is even, any concurrent instances of synchronize_rcu() will know to ignore this thread.

Quick Quiz 8.56: Why are the two memory barriers on lines 19 and 22 of Figure 8.44 needed?

The rcu_thread_online() function simply invokes rcu_quiescent_state(), thus marking the end of the extended quiescent state.

Figure 8.45 (rcu_qs.c) shows the implementation of synchronize_rcu(), which is quite similar to that of the preceding sections.

This implementation has blazingly fast read-side primitives, with an rcu_read_lock()-rcu_read_unlock() round trip incurring an overhead of roughly 50 picoseconds. The synchronize_rcu()
