10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?



    DEFINE_SPINLOCK(rcu_gp_lock);
    atomic_t rcu_refcnt[2];
    atomic_t rcu_idx;
    DEFINE_PER_THREAD(int, rcu_nesting);
    DEFINE_PER_THREAD(int, rcu_read_idx);

Figure 8.30: RCU Global Reference-Count Pair Data

quite heavyweight, with read-side overhead ranging from about 100 nanoseconds on a single Power5 CPU up to almost 40 microseconds on a 64-CPU system. This means that the RCU read-side critical sections have to be extremely long in order to get any real read-side parallelism. On the other hand, in the absence of readers, grace periods elapse in about 40 nanoseconds, many orders of magnitude faster than production-quality implementations in the Linux kernel.

Quick Quiz 8.40: How can the grace period possibly elapse in 40 nanoseconds when synchronize_rcu() contains a 10-millisecond delay?

Second, if there are many concurrent rcu_read_lock() and rcu_read_unlock() operations, there will be extreme memory contention on rcu_refcnt, resulting in expensive cache misses. Both of these first two shortcomings largely defeat a major purpose of RCU, namely to provide low-overhead read-side synchronization primitives.

Finally, a large number of RCU readers with long read-side critical sections could prevent synchronize_rcu() from ever completing, as the global counter might never reach zero. This could result in starvation of RCU updates, which is of course unacceptable in production settings.

Quick Quiz 8.41: Why not simply make rcu_read_lock() wait when a concurrent synchronize_rcu() has been waiting too long in the RCU implementation in Figure 8.29?
Wouldn’t that prevent synchronize_rcu() from starving?

Therefore, it is still hard to imagine this implementation being useful in a production setting, though it has a bit more potential than the lock-based mechanism, for example, as an RCU implementation suitable for a high-stress debugging environment. The next section describes a variation on the reference-counting scheme that is more favorable to writers.

8.3.4.4 Starvation-Free Counter-Based RCU

Figure 8.31 (rcu_rcgp.h) shows the read-side primitives of an RCU implementation that uses a pair of reference counters (rcu_refcnt[]), along with a global index that selects one counter out of the pair (rcu_idx), a per-thread nesting counter rcu_nesting, a per-thread snapshot of the global index (rcu_read_idx), and a global lock (rcu_gp_lock), which are themselves shown in Figure 8.30.

     1 static void rcu_read_lock(void)
     2 {
     3   int i;
     4   int n;
     5
     6   n = __get_thread_var(rcu_nesting);
     7   if (n == 0) {
     8     i = atomic_read(&rcu_idx);
     9     __get_thread_var(rcu_read_idx) = i;
    10     atomic_inc(&rcu_refcnt[i]);
    11   }
    12   __get_thread_var(rcu_nesting) = n + 1;
    13   smp_mb();
    14 }
    15
    16 static void rcu_read_unlock(void)
    17 {
    18   int i;
    19   int n;
    20
    21   smp_mb();
    22   n = __get_thread_var(rcu_nesting);
    23   if (n == 1) {
    24     i = __get_thread_var(rcu_read_idx);
    25     atomic_dec(&rcu_refcnt[i]);
    26   }
    27   __get_thread_var(rcu_nesting) = n - 1;
    28 }

Figure 8.31: RCU Read-Side Using Global Reference-Count Pair

The rcu_read_lock() primitive atomically increments the member of the rcu_refcnt[] pair indexed by rcu_idx, and keeps a snapshot of this index in the per-thread variable rcu_read_idx. The rcu_read_unlock() primitive then atomically decrements whichever counter of the pair that the corresponding rcu_read_lock() incremented. However, because only one value of rcu_idx is remembered per thread, additional measures must be taken to permit nesting.
These additional measures use the per-thread rcu_nesting variable to track nesting.

To make all this work, line 6 of rcu_read_lock() in Figure 8.31 picks up the current thread’s instance of rcu_nesting, and if line 7 finds that this is the outermost rcu_read_lock(), then lines 8-10 pick up the current value of rcu_idx, save it in this thread’s instance of rcu_read_idx, and atomically increment the selected element of rcu_refcnt. Regardless of the value of rcu_nesting, line 12 increments it. Line 13 executes a memory barrier to ensure that the RCU read-side critical section does not bleed out before the rcu_read_lock() code.

Similarly, the rcu_read_unlock() function executes a memory barrier at line 21 to ensure that the
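
The nesting behavior described for Figure 8.31 can be tried out in userspace. What follows is only a minimal sketch under stated assumptions: C11 <stdatomic.h> operations and _Thread_local variables stand in for atomic_t, __get_thread_var(), and smp_mb(), and nested_reader_demo() is a hypothetical helper that is not part of rcu_rcgp.h. It is meant to illustrate why the per-thread snapshot of rcu_idx matters when the index changes while a reader is running, not to reproduce the book's implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed userspace stand-ins for the data shown in Figure 8.30. */
static atomic_int rcu_refcnt[2];        /* pair of global reference counters */
static atomic_int rcu_idx;              /* selects the current counter */
static _Thread_local int rcu_nesting;   /* per-thread nesting depth */
static _Thread_local int rcu_read_idx;  /* per-thread snapshot of rcu_idx */

static void rcu_read_lock(void)
{
    int n = rcu_nesting;

    if (n == 0) {                       /* outermost rcu_read_lock() only */
        int i = atomic_load(&rcu_idx);

        rcu_read_idx = i;               /* remember which counter we used */
        atomic_fetch_add(&rcu_refcnt[i], 1);
    }
    rcu_nesting = n + 1;
    atomic_thread_fence(memory_order_seq_cst);  /* stands in for smp_mb() */
}

static void rcu_read_unlock(void)
{
    int n;

    atomic_thread_fence(memory_order_seq_cst);  /* stands in for smp_mb() */
    n = rcu_nesting;
    if (n == 1)                         /* outermost rcu_read_unlock() only */
        atomic_fetch_sub(&rcu_refcnt[rcu_read_idx], 1);
    rcu_nesting = n - 1;
}

/*
 * Hypothetical single-threaded walk-through: the index is changed by hand
 * between the outer and the nested rcu_read_lock() to mimic an updater.
 * Returns the sum of both counters, which should be zero once the
 * outermost rcu_read_unlock() has run.
 */
int nested_reader_demo(void)
{
    atomic_store(&rcu_idx, 0);
    rcu_read_lock();                    /* outermost: increments rcu_refcnt[0] */
    atomic_store(&rcu_idx, 1);          /* pretend an updater flipped the index */
    rcu_read_lock();                    /* nested: only rcu_nesting changes */
    assert(atomic_load(&rcu_refcnt[0]) == 1);
    assert(atomic_load(&rcu_refcnt[1]) == 0);
    rcu_read_unlock();                  /* nested exit: counters untouched */
    assert(atomic_load(&rcu_refcnt[0]) == 1);
    rcu_read_unlock();                  /* outermost exit: decrements rcu_refcnt[0] */
    return atomic_load(&rcu_refcnt[0]) + atomic_load(&rcu_refcnt[1]);
}
```

Calling nested_reader_demo() from a test program should return 0: the decrement lands on the counter recorded in rcu_read_idx rather than on whichever counter rcu_idx selects at unlock time, so neither counter is left unbalanced.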
