10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


CHAPTER 8. DEFERRED PROCESSING

period”. The asynchronous update-side primitive, call_rcu(), invokes a specified function with a specified argument after a subsequent grace period. For example, call_rcu(p,f); will result in the “RCU callback” f(p) being invoked after a subsequent grace period. There are situations, such as when unloading a Linux-kernel module that uses call_rcu(), when it is necessary to wait for all outstanding RCU callbacks to complete [McK07e]. The rcu_barrier() primitive does this job. Note that the more recent hierarchical RCU [McK08a] implementation described in Sections D.2 and D.3 also adheres to “RCU Classic” semantics.

Finally, RCU may be used to provide type-safe memory [GC96], as described in Section 8.3.2.6. In the context of RCU, type-safe memory guarantees that a given data element will not change type during any RCU read-side critical section that accesses it. To make use of RCU-based type-safe memory, pass SLAB_DESTROY_BY_RCU to kmem_cache_create(). It is important to note that SLAB_DESTROY_BY_RCU will in no way prevent kmem_cache_alloc() from immediately reallocating memory that was just now freed via kmem_cache_free()! In fact, the SLAB_DESTROY_BY_RCU-protected data structure just returned by rcu_dereference() might be freed and reallocated an arbitrarily large number of times, even when under the protection of rcu_read_lock(). Instead, SLAB_DESTROY_BY_RCU operates by preventing kmem_cache_free() from returning a completely freed-up slab of data structures to the system until after an RCU grace period elapses.
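The reallocation hazard described above can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: it is single-threaded, it uses no kernel APIs, and every name in it (toy_obj, toy_alloc(), toy_free(), toy_still_valid(), toy_demo()) is hypothetical. It also assumes one common coping strategy that this section does not spell out, namely that a reader re-checks an object's identity after looking it up, because only the type of the memory is guaranteed to stay the same.

```c
#include <assert.h>
#include <stddef.h>

/* Toy, single-threaded model of a type-safe object pool.
 * All names are hypothetical; none of them are kernel APIs. */
struct toy_obj {
    int in_use;  /* nonzero while the slot is allocated */
    int key;     /* identity of the slot's current occupant */
};

#define TOY_POOL_SIZE 4
static struct toy_obj toy_pool[TOY_POOL_SIZE];

/* Allocate a slot for the given key. Slots are only ever reused as
 * struct toy_obj, so the type of the memory never changes. */
struct toy_obj *toy_alloc(int key)
{
    for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
        if (!toy_pool[i].in_use) {
            toy_pool[i].in_use = 1;
            toy_pool[i].key = key;
            return &toy_pool[i];
        }
    }
    return NULL;  /* pool exhausted */
}

/* Free a slot; it may be handed out again immediately. */
void toy_free(struct toy_obj *p)
{
    p->in_use = 0;
}

/* Reader-side check: is this still the object that was looked up? */
int toy_still_valid(const struct toy_obj *p, int expected_key)
{
    return p->in_use && p->key == expected_key;
}

/* Returns 1 if the reader detects that its pointer went stale. */
int toy_demo(void)
{
    struct toy_obj *p = toy_alloc(42);   /* reader obtains a pointer */
    assert(p != NULL && toy_still_valid(p, 42));

    toy_free(p);                         /* updater frees the object */
    struct toy_obj *q = toy_alloc(7);    /* slot is reused at once */
    assert(q == p);                      /* same memory, same type */

    int detected = !toy_still_valid(p, 42);  /* identity has changed */
    toy_free(q);                         /* leave the pool empty again */
    return detected;
}
```

In this model the stale pointer p is still safe to dereference as a struct toy_obj after the slot is reused, but it no longer names the element with key 42, which is why the identity check is needed.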
In short, although the data element might be freed and reallocated arbitrarily often, at least its type will remain the same.

Quick Quiz 8.23: How do you prevent a huge number of RCU read-side critical sections from indefinitely blocking a synchronize_rcu() invocation?

Quick Quiz 8.24: The synchronize_rcu() API waits for all pre-existing interrupt handlers to complete, right?

In the “RCU BH” column, rcu_read_lock_bh() and rcu_read_unlock_bh() delimit RCU read-side critical sections, and call_rcu_bh() invokes the specified function and argument after a subsequent grace period. Note that RCU BH does not have a synchronous synchronize_rcu_bh() interface, though one could easily be added if required.

Quick Quiz 8.25: What happens if you mix and match? For example, suppose you use rcu_read_lock() and rcu_read_unlock() to delimit RCU read-side critical sections, but then use call_rcu_bh() to post an RCU callback?

Quick Quiz 8.26: Hardware interrupt handlers can be thought of as being under the protection of an implicit rcu_read_lock_bh(), right?

In the “RCU Sched” column, anything that disables preemption acts as an RCU read-side critical section, and synchronize_sched() waits for the corresponding RCU grace period. This RCU API family was added in the 2.6.12 kernel, which split the old synchronize_kernel() API into the current synchronize_rcu() (for RCU Classic) and synchronize_sched() (for RCU Sched). Note that RCU Sched did not originally have an asynchronous call_rcu_sched() interface, but one was added in 2.6.26.
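The asynchronous primitives mentioned so far (call_rcu(), call_rcu_bh(), and call_rcu_sched()) share one shape: a function and an argument are recorded now and invoked only after a subsequent grace period. The following user-space toy is a sketch of that shape, not of any kernel implementation. It is single-threaded, it models the end of a grace period as an explicit function call, and all of its names (toy_call_rcu(), toy_grace_period_end(), toy_reclaim(), toy_callback_demo()) are hypothetical. The argument order follows the call_rcu(p,f) form used in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded model of deferred callbacks.
 * All names are hypothetical; none of them are kernel APIs. */
typedef void (*toy_cb_fn)(void *arg);

struct toy_cb {
    toy_cb_fn func;
    void *arg;
};

#define TOY_MAX_CBS 8
static struct toy_cb toy_pending[TOY_MAX_CBS];
static size_t toy_npending;

/* Record func(arg) for later invocation.
 * Returns 0 on success, -1 if the queue is full. */
int toy_call_rcu(void *arg, toy_cb_fn func)
{
    assert(func != NULL);
    if (toy_npending == TOY_MAX_CBS)
        return -1;
    toy_pending[toy_npending].func = func;
    toy_pending[toy_npending].arg = arg;
    toy_npending++;
    return 0;
}

/* Model the end of a grace period: invoke every callback queued so far.
 * Returns the number of callbacks invoked. */
size_t toy_grace_period_end(void)
{
    size_t n = toy_npending;

    for (size_t i = 0; i < n; i++)
        toy_pending[i].func(toy_pending[i].arg);
    toy_npending = 0;
    return n;
}

/* Example callback: mark the object as reclaimed. */
void toy_reclaim(void *arg)
{
    *(int *)arg = 1;
}

/* Returns 1 if the callback ran only after the modeled grace period. */
int toy_callback_demo(void)
{
    int reclaimed = 0;

    if (toy_call_rcu(&reclaimed, toy_reclaim) != 0)
        return 0;
    if (reclaimed != 0)               /* must not have run yet */
        return 0;
    if (toy_grace_period_end() != 1)  /* exactly one callback was queued */
        return 0;
    return reclaimed;                 /* set by toy_reclaim() */
}
```

In the same spirit, waiting until the pending queue has been emptied is roughly the job that the text assigns to rcu_barrier(), although this toy does not attempt to model that primitive.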
In accordance with the quasi-minimalist philosophy of the Linux community, APIs are added on an as-needed basis.

Quick Quiz 8.27: What happens if you mix and match RCU Classic and RCU Sched?

Quick Quiz 8.28: In general, you cannot rely on synchronize_sched() to wait for all pre-existing interrupt handlers, right?

The “Realtime RCU” column has the same API as does RCU Classic, the only difference being that RCU read-side critical sections may be preempted and may block while acquiring spinlocks. The design of Realtime RCU is described elsewhere [McK07a].

Quick Quiz 8.29: Why do both SRCU and QRCU lack asynchronous call_srcu() or call_qrcu() interfaces?

The “SRCU” column in Table 8.5 displays a specialized RCU API that permits general sleeping in RCU read-side critical sections (see Appendix D.1 for more details). Of course, use of synchronize_srcu() in an SRCU read-side critical section can result in self-deadlock, so should be avoided. SRCU differs from earlier RCU implementations in that the caller allocates an srcu_struct for each distinct SRCU usage. This approach prevents SRCU read-side critical sections from blocking unrelated synchronize_srcu() invocations. In addition, in this variant of RCU, srcu_read_lock() returns a value that must be passed into the corresponding srcu_read_unlock().

The “QRCU” column presents an RCU implementation with the same API structure as SRCU, but optimized for extremely low-latency grace periods in absence of readers, as described elsewhere [McK07f]. As with SRCU, use of synchronize_qrcu() in a QRCU read-side critical section can result in self-deadlock, so should be avoided. Although QRCU has not yet been accepted into the Linux kernel, it is worth mentioning given that it is the only kernel-level RCU implementation that can boast deep sub-
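The two SRCU conventions described above, a caller-allocated structure per usage and a read-lock return value that must be handed back to the unlock, can be sketched with a user-space toy. This is a single-threaded model under loud assumptions, not the kernel's SRCU: the names (toy_srcu, toy_srcu_read_lock(), toy_srcu_read_unlock(), toy_srcu_flip(), toy_srcu_drained(), toy_srcu_demo()) are hypothetical, and the grace period is reduced to "flip an index, then check that the old counter has drained".

```c
#include <assert.h>

/* Toy single-threaded model of per-domain reader counting.
 * All names are hypothetical; none of them are kernel APIs. */
struct toy_srcu {
    int active[2];  /* readers currently counted under each index */
    int current;    /* index that new readers will use */
};

void toy_srcu_init(struct toy_srcu *sp)
{
    sp->active[0] = 0;
    sp->active[1] = 0;
    sp->current = 0;
}

/* Enter a read-side critical section. The return value records which
 * counter was incremented, so the matching unlock can decrement it. */
int toy_srcu_read_lock(struct toy_srcu *sp)
{
    int idx = sp->current;

    sp->active[idx]++;
    return idx;
}

/* Leave a read-side critical section using the index from read_lock. */
void toy_srcu_read_unlock(struct toy_srcu *sp, int idx)
{
    assert(sp->active[idx] > 0);
    sp->active[idx]--;
}

/* Updater side: send new readers to the other counter.
 * Returns the old index, whose readers must finish first. */
int toy_srcu_flip(struct toy_srcu *sp)
{
    int old = sp->current;

    sp->current = !old;
    return old;
}

/* Have all readers that started before the flip finished? */
int toy_srcu_drained(const struct toy_srcu *sp, int old)
{
    return sp->active[old] == 0;
}

/* Returns 1 if a reader in domain a delays only domain a's updater. */
int toy_srcu_demo(void)
{
    struct toy_srcu a, b;

    toy_srcu_init(&a);
    toy_srcu_init(&b);

    int idx = toy_srcu_read_lock(&a);   /* reader enters domain a */
    int old_a = toy_srcu_flip(&a);
    int old_b = toy_srcu_flip(&b);

    int b_ready = toy_srcu_drained(&b, old_b);     /* b has no readers */
    int a_blocked = !toy_srcu_drained(&a, old_a);  /* a must wait */

    toy_srcu_read_unlock(&a, idx);      /* reader leaves domain a */
    int a_ready = toy_srcu_drained(&a, old_a);

    return b_ready && a_blocked && a_ready;
}
```

Because the index can flip while a reader is inside its critical section, the reader cannot rediscover which counter it incremented; carrying the returned value to the unlock is what keeps the counts balanced in this model.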
