10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


8.3. READ-COPY UPDATE (RCU)

8.3.3.4 So, What is RCU Really?

At its core, RCU is nothing more nor less than an API that supports publication and subscription for insertions, waiting for all RCU readers to complete, and maintenance of multiple versions. That said, it is possible to build higher-level constructs on top of RCU, including the reader-writer-locking, reference-counting, and existence-guarantee constructs listed in the companion article. Furthermore, I have no doubt that the Linux community will continue to find interesting new uses for RCU, just as they do for any of a number of synchronization primitives throughout the kernel.

Of course, a more-complete view of RCU would also include all of the things you can do with these APIs.

However, for many people, a complete view of RCU must include sample RCU implementations. The next section therefore presents a series of “toy” RCU implementations of increasing complexity and capability.

8.3.4 “Toy” RCU Implementations

The toy RCU implementations in this section are designed not for high performance, practicality, or any kind of production use, but rather for clarity. Nevertheless, you will need a thorough understanding of Chapters 1, 2, 3, 5, and 8 for even these toy RCU implementations to be easily understandable.

This section provides a series of RCU implementations in order of increasing sophistication, from the viewpoint of solving the existence-guarantee problem. Section 8.3.4.1 presents a rudimentary RCU implementation based on simple locking, while Sections 8.3.4.3 through 8.3.4.9 present a series of simple RCU implementations based on locking, reference counters, and free-running counters. Finally, Section 8.3.4.10 provides a summary and a list of desirable RCU properties.

8.3.4.1 Lock-Based RCU

Perhaps the simplest RCU implementation leverages locking, as shown in Figure 8.27 (rcu_lock.h and rcu_lock.c).
In this implementation, rcu_read_lock() acquires a global spinlock, rcu_read_unlock() releases it, and synchronize_rcu() acquires it then immediately releases it.

Because synchronize_rcu() does not return until it has acquired (and released) the lock, it cannot return until all prior RCU read-side critical sections have completed, thus faithfully implementing RCU semantics. Of course, only one RCU reader may be in its read-side critical section at a time, which almost entirely defeats the purpose of RCU. In addition, the lock operations in rcu_read_lock() and rcu_read_unlock() are extremely heavyweight, with read-side overhead ranging from about 100 nanoseconds on a single Power5 CPU up to more than 17 microseconds on a 64-CPU system. Worse yet, these same lock operations permit rcu_read_lock() to participate in deadlock cycles. Furthermore, in the absence of recursive locks, RCU read-side critical sections cannot be nested, and, finally, although concurrent RCU updates could in principle be satisfied by a common grace period, this implementation serializes grace periods, preventing grace-period sharing.

Quick Quiz 8.34: Why wouldn’t any deadlock in the RCU implementation in Figure 8.27 also be a deadlock in any other RCU implementation?

Quick Quiz 8.35: Why not simply use reader-writer locks in the RCU implementation in Figure 8.27 in order to allow RCU readers to proceed in parallel?

It is hard to imagine this implementation being useful in a production setting, though it does have the virtue of being implementable in almost any user-level application. Furthermore, similar implementations having one lock per CPU or using reader-writer locks have been used in production in the 2.4 Linux kernel.

A modified version of this one-lock-per-CPU approach, but instead using one lock per thread, is described in the next section.

8.3.4.2 Per-Thread Lock-Based RCU

Figure 8.28 (rcu_lock_percpu.h and rcu_lock_percpu.c) shows an implementation based on one lock per thread.
The rcu_read_lock() and rcu_read_unlock() functions acquire and release, respectively, the current thread’s lock.

    static void rcu_read_lock(void)
    {
        spin_lock(&rcu_gp_lock);
    }

    static void rcu_read_unlock(void)
    {
        spin_unlock(&rcu_gp_lock);
    }

    void synchronize_rcu(void)
    {
        spin_lock(&rcu_gp_lock);
        spin_unlock(&rcu_gp_lock);
    }

Figure 8.27: Lock-Based RCU Implementation
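
Figure 8.27 shows only the three RCU primitives, without the lock's definition or any code that uses them. The following is a minimal user-space sketch of how a reader and an updater might use the global-lock scheme of Section 8.3.4.1. It is not the book's rcu_lock.h or rcu_lock.c. It assumes a POSIX mutex in place of the spinlock and C11 atomics for publishing the pointer, and the names struct config, cur_config, read_value(), update_value(), and rcu_lock_demo() are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumption: a POSIX mutex stands in for the spinlock of Figure 8.27. */
static pthread_mutex_t rcu_gp_lock = PTHREAD_MUTEX_INITIALIZER;

static void rcu_read_lock(void)
{
    pthread_mutex_lock(&rcu_gp_lock);
}

static void rcu_read_unlock(void)
{
    pthread_mutex_unlock(&rcu_gp_lock);
}

static void synchronize_rcu(void)
{
    /* The lock cannot be acquired until the current reader, if any,
     * has left its read-side critical section. */
    pthread_mutex_lock(&rcu_gp_lock);
    pthread_mutex_unlock(&rcu_gp_lock);
}

/* Hypothetical RCU-protected data, not taken from the original text. */
struct config {
    int value;
};

/* Zero-initialized, so NULL until the first update publishes a version. */
static _Atomic(struct config *) cur_config;

/* Reader: returns the current value, or -1 if nothing is published yet. */
int read_value(void)
{
    struct config *p;
    int v;

    rcu_read_lock();
    p = atomic_load_explicit(&cur_config, memory_order_acquire);
    v = p ? p->value : -1;
    rcu_read_unlock();
    return v;
}

/* Updater: publish a new version, wait for readers, then free the old one.
 * Returns 0 on success, -1 if allocation fails. */
int update_value(int new_value)
{
    struct config *new_cfg = malloc(sizeof(*new_cfg));
    struct config *old_cfg;

    if (new_cfg == NULL)
        return -1;
    new_cfg->value = new_value;

    /* Publish the fully initialized version and fetch the one it replaces. */
    old_cfg = atomic_exchange_explicit(&cur_config, new_cfg,
                                       memory_order_acq_rel);

    synchronize_rcu();  /* after this, no reader can still be using old_cfg */
    free(old_cfg);      /* free(NULL) is a no-op on the first update */
    return 0;
}

/* Single-threaded smoke test; expects to run before any other update. */
void rcu_lock_demo(void)
{
    int rc;

    assert(read_value() == -1);
    rc = update_value(42);
    assert(rc == 0);
    assert(read_value() == 42);
}
```

Calling rcu_lock_demo() from a main() function exercises the sequence once. As the text notes, this scheme admits only one reader at a time, and a reader that calls update_value() or rcu_read_lock() from inside its own read-side critical section would attempt to re-acquire a lock it already holds, so the sketch is useful for understanding the semantics rather than for production use.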
