10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


D.2. HIERARCHICAL RCU OVERVIEW

read-side critical sections have finished, were designed with only a few tens of CPUs in mind. Their scalability is limited by a global lock that must be acquired by each CPU at least once during each grace period. Although Classic RCU actually scales to a couple of hundred CPUs, and can be tweaked to scale to roughly a thousand CPUs (but at the expense of extending grace periods), emerging multicore systems will require it to scale better.

In addition, Classic RCU has a sub-optimal dynticks interface, with the result that Classic RCU will wake up every CPU at least once per grace period. To see the problem with this, consider a 16-CPU system that is sufficiently lightly loaded that it is keeping only four CPUs busy. In a perfect world, the remaining twelve CPUs could be put into deep sleep mode in order to conserve energy. Unfortunately, if the four busy CPUs are frequently performing RCU updates, those twelve idle CPUs will be awakened frequently, wasting significant energy. Thus, any major change to Classic RCU should also leave sleeping CPUs lie.

Both the classic and the hierarchical implementations have Classic RCU semantics and identical APIs. However, the old implementation will be called “classic RCU” and the new implementation will be called “hierarchical RCU”.

@@@ roadmap @@@

D.2.1 Review of RCU Fundamentals

In its most basic form, RCU is a way of waiting for things to finish. Of course, there are a great many other ways of waiting for things to finish, including reference counts, reader-writer locks, events, and so on. The great advantage of RCU is that it can wait for each of (say) 20,000 different things without having to explicitly track each and every one of them, and without having to worry about the performance degradation, scalability limitations, complex deadlock scenarios, and memory-leak hazards that are inherent in schemes using explicit tracking.

In RCU’s case, the things waited on are called “RCU read-side critical sections”.
An RCU read-side critical section starts with an rcu_read_lock() primitive, and ends with a corresponding rcu_read_unlock() primitive. RCU read-side critical sections can be nested, and may contain pretty much any code, as long as that code does not explicitly block or sleep (although a special form of RCU called SRCU, described in Section D.1, does permit general sleeping in SRCU read-side critical sections). If you abide by these conventions, you can use RCU to wait for any desired piece of code to complete.

RCU accomplishes this feat by indirectly determining when these other things have finished, as has been described elsewhere [MS98] for classic RCU and in Section D.4 for preemptable RCU.

In particular, as shown in Figure 8.11, RCU is a way of waiting for pre-existing RCU read-side critical sections to completely finish, also including the memory operations executed by those critical sections.

However, note that RCU read-side critical sections that begin after the beginning of a given grace period can and will extend beyond the end of that grace period.

The following section gives a very high-level view of how the Classic RCU implementation operates.

D.2.2 Brief Overview of Classic RCU Implementation

The key concept behind the Classic RCU implementation is that Classic RCU read-side critical sections are confined to kernel code and are not permitted to block. This means that any time a given CPU is seen either blocking, in the idle loop, or exiting the kernel, we know that all RCU read-side critical sections that were previously running on that CPU must have completed. Such states are called “quiescent states”, and after each CPU has passed through at least one quiescent state, the RCU grace period ends.

[Figure D.11: Flat Classic RCU State. Labels: struct rcu_ctrlblk; rcp->cpumask, protected by rcp->lock; CPU 0; Record Quiescent State.]

Classic RCU’s most important data structure is the rcu_ctrlblk structure, whose ->cpumask field contains one bit per CPU, as shown in Figure D.11.
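
The flat bookkeeping in Figure D.11 can be illustrated with a small user-space sketch. This is not the kernel's code: the `sketch_` names, the toy spinlock standing in for rcp->lock, and the 16-CPU limit are all assumptions made for illustration. It assumes that a set bit in the mask means the corresponding CPU still owes a quiescent state for the current grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 16 /* hypothetical CPU count, chosen for illustration */
_Static_assert(SKETCH_NR_CPUS < 32, "mask must fit in a uint32_t");

/* Hypothetical stand-in for struct rcu_ctrlblk; only a lock and a mask are modeled. */
struct sketch_ctrlblk {
    atomic_flag lock;  /* plays the role of rcp->lock (a toy spinlock) */
    uint32_t cpumask;  /* one bit per CPU; set = quiescent state still needed */
};

#define SKETCH_CTRLBLK_INIT { ATOMIC_FLAG_INIT, 0 }

static void sketch_lock(struct sketch_ctrlblk *rcp)
{
    while (atomic_flag_test_and_set_explicit(&rcp->lock, memory_order_acquire))
        ; /* spin until the single global lock is free */
}

static void sketch_unlock(struct sketch_ctrlblk *rcp)
{
    atomic_flag_clear_explicit(&rcp->lock, memory_order_release);
}

/* Start a grace period: set the bit of every CPU. */
static void sketch_start_gp(struct sketch_ctrlblk *rcp)
{
    sketch_lock(rcp);
    rcp->cpumask = (UINT32_C(1) << SKETCH_NR_CPUS) - 1;
    sketch_unlock(rcp);
}

/*
 * Record a quiescent state for one CPU by clearing its bit.
 * Returns true if this cleared the last set bit, which in this
 * sketch marks the end of the grace period.
 */
static bool sketch_record_qs(struct sketch_ctrlblk *rcp, int cpu)
{
    bool gp_done;

    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    sketch_lock(rcp);
    rcp->cpumask &= ~(UINT32_C(1) << cpu);
    gp_done = (rcp->cpumask == 0);
    sketch_unlock(rcp);
    return gp_done;
}
```

In this sketch every CPU takes the same lock to clear its bit, which mirrors the global-lock bottleneck described above: the cost of one grace period grows with the number of CPUs contending for that lock.
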
Each CPU’s bit is set to one at the beginning of each grace period, and each CPU must clear its bit after it passes through a quiescent state. Because multiple CPUs might want to
