10.07.2015

Is Parallel Programming Hard, And, If So, What Can You Do About It?

8.3. READ-COPY UPDATE (RCU)

microsecond grace-period latencies.

Quick Quiz 8.30: Under what conditions can synchronize_srcu() be safely used within an SRCU read-side critical section?

The Linux kernel currently has a surprising number of RCU APIs and implementations. There is some hope of reducing this number, as evidenced by the fact that a given build of the Linux kernel currently has at most three implementations behind four APIs (given that RCU Classic and Realtime RCU share the same API). However, careful inspection and analysis will be required, just as would be required to eliminate one of the many locking APIs.

The various RCU APIs are distinguished by the forward-progress guarantees that their RCU read-side critical sections must provide, and also by their scope, as follows:

1. RCU BH: read-side critical sections must guarantee forward progress against everything except NMI and IRQ handlers; softirq handlers are not among the exceptions. RCU BH is global in scope.
2. RCU Sched: read-side critical sections must guarantee forward progress against everything except NMI, IRQ, and softirq handlers. RCU Sched is global in scope.
3. RCU (both classic and real-time): read-side critical sections must guarantee forward progress against everything except NMI handlers, IRQ handlers, softirq handlers, and (in the real-time case) higher-priority real-time tasks. RCU is global in scope.
4. SRCU and QRCU: read-side critical sections need not guarantee forward progress unless some other task is waiting for the corresponding grace period to complete, in which case these read-side critical sections should complete in no more than a few seconds (and preferably much more quickly).¹ The scope of SRCU and QRCU is defined by the use of the corresponding srcu_struct or qrcu_struct, respectively.

In other words, SRCU and QRCU compensate for their extremely weak forward-progress guarantees by permitting the developer to restrict their scope.

¹ Thanks to James Bottomley for urging me to this formulation, as opposed to simply saying that there are no forward-progress guarantees.

8.3.3.2 RCU has Publish-Subscribe and Version-Maintenance APIs

Fortunately, the RCU publish-subscribe and version-maintenance primitives shown in the following table apply to all of the variants of RCU discussed above. This commonality can in some cases allow more code to be shared, which certainly reduces the API proliferation that would otherwise occur. The original purpose of the RCU publish-subscribe APIs was to bury memory barriers within these APIs, so that Linux kernel programmers could use RCU without needing to become experts in the memory-ordering models of each of the 20+ CPU families that Linux supports [Spr01].

The first pair of categories operates on Linux struct list_head lists, which are circular, doubly linked lists. The list_for_each_entry_rcu() primitive traverses an RCU-protected list in a type-safe manner, while also enforcing memory ordering for situations where a new list element is inserted into the list concurrently with traversal. On non-Alpha platforms, this primitive incurs little or no performance penalty compared to list_for_each_entry(). The list_add_rcu(), list_add_tail_rcu(), and list_replace_rcu() primitives are analogous to their non-RCU counterparts, but incur the overhead of an additional memory barrier on weakly ordered machines. The list_del_rcu() primitive is also analogous to its non-RCU counterpart, but, oddly enough, is very slightly faster because it poisons only the prev pointer rather than both the prev and next pointers, as list_del() must do.
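The following is a minimal userspace sketch, not the kernel's code, of how list_add_rcu(), list_add_tail_rcu(), and list_del_rcu() can hide their memory ordering inside the API. It makes several assumptions:

- It uses C11 `<stdatomic.h>`.
- It models the publish step as a release store and the reader's subscribe step as an acquire load. The acquire load is stronger than the dependency ordering the kernel relies on.
- Every name in it (sketch_list_head, SKETCH_POISON_PREV, sketch_list_add_rcu(), and so on) is hypothetical.
- Updaters are assumed to be serialized by an external lock, which is omitted.
- Grace periods and reclamation are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head.
 * Readers follow only ->next, so only ->next is an atomic pointer. */
struct sketch_list_head {
	_Atomic(struct sketch_list_head *) next;
	struct sketch_list_head *prev;
};

/* Hypothetical poison value for ->prev (the kernel uses LIST_POISON2). */
#define SKETCH_POISON_PREV ((struct sketch_list_head *)(uintptr_t)0x200)

/* An empty circular list points back at its own header. */
static void sketch_init_list_head(struct sketch_list_head *head)
{
	atomic_store_explicit(&head->next, head, memory_order_relaxed);
	head->prev = head;
}

/* Link entry between two adjacent nodes. The entry is fully initialized
 * first; the release store that makes it reachable is the "publish" step,
 * standing in for the barrier hidden inside list_add_rcu(). */
static void sketch_list_add_between(struct sketch_list_head *entry,
				    struct sketch_list_head *prev,
				    struct sketch_list_head *next)
{
	atomic_store_explicit(&entry->next, next, memory_order_relaxed);
	entry->prev = prev;
	atomic_store_explicit(&prev->next, entry, memory_order_release);
	next->prev = entry;
}

/* Insert at the head of the list, like list_add_rcu(). */
static void sketch_list_add_rcu(struct sketch_list_head *entry,
				struct sketch_list_head *head)
{
	sketch_list_add_between(entry, head,
		atomic_load_explicit(&head->next, memory_order_relaxed));
}

/* Insert at the tail of the list, like list_add_tail_rcu(). */
static void sketch_list_add_tail_rcu(struct sketch_list_head *entry,
				     struct sketch_list_head *head)
{
	sketch_list_add_between(entry, head->prev, head);
}

/* Unlink entry, like list_del_rcu(): ->next is left intact and only
 * ->prev is poisoned (why ->next is spared is the topic of Quick
 * Quiz 8.31). Freeing entry must wait for a grace period, which this
 * sketch does not model. */
static void sketch_list_del_rcu(struct sketch_list_head *entry)
{
	struct sketch_list_head *next =
		atomic_load_explicit(&entry->next, memory_order_relaxed);
	struct sketch_list_head *prev = entry->prev;

	assert(prev != SKETCH_POISON_PREV); /* catches a double deletion */
	next->prev = prev;
	atomic_store_explicit(&prev->next, next, memory_order_relaxed);
	entry->prev = SKETCH_POISON_PREV;
}

/* Reader side: each acquire load is the "subscribe" step, a conservative
 * stand-in for the ordering list_for_each_entry_rcu() provides. */
static size_t sketch_count_rcu(struct sketch_list_head *head)
{
	size_t n = 0;

	for (struct sketch_list_head *p =
		     atomic_load_explicit(&head->next, memory_order_acquire);
	     p != head;
	     p = atomic_load_explicit(&p->next, memory_order_acquire))
		n++;
	return n;
}
```

For instance, start with an empty list, add a at the head, then b at the tail, then c at the head. The resulting order is c, a, b. After sketch_list_del_rcu(&a), the list holds two elements, a.next still points at b, and a.prev holds SKETCH_POISON_PREV.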
Finally, the list_splice_init_rcu() primitive is similar to its non-RCU counterpart, but incurs a full grace-period latency. The purpose of this grace period is to allow RCU readers to finish their traversal of the source list before it is completely disconnected from the list header; failure to do this could prevent such readers from ever terminating their traversal.

Quick Quiz 8.31: Why doesn't list_del_rcu() poison both the next and prev pointers?

The second pair of categories operates on Linux's struct hlist_head, which is a linear linked list. One advantage of struct hlist_head over struct list_head is that the former requires only a single-pointer list header, which can save significant memory in large hash tables. The struct hlist_head primitives in the table relate to their non-RCU counterparts in much the same way as do the struct list_head primitives.

The final pair of categories operates directly on pointers, and is useful for creating RCU-protected