dynticks-idle mode throughout would need to be scanned. In some cases, for example when a dynticks-idle CPU is handling an interrupt during a scan, subsequent scans are required. However, each such scan is performed separately, so scheduling latency is degraded by the overhead of only one such scan.

If this scan proves problematic, one straightforward solution would be to do the scan incrementally. This would increase code complexity slightly and would also increase the time required to end a grace period, but would nonetheless be a likely solution.

2. The rcu_node hierarchy is created at compile time, and is therefore sized for the worst-case NR_CPUS number of CPUs. However, even for 4,096 CPUs, the rcu_node hierarchy consumes only 65 cache lines on a 64-bit machine (and just you try accommodating 4,096 CPUs on a 32-bit machine!). Of course, a kernel built with NR_CPUS=4096 running on a 16-CPU machine would use a two-level tree when a single-node tree would work just fine. Although this configuration would incur added locking overhead, this does not affect hot-path read-side code, so should not be a problem in practice. (The arithmetic behind the 65-cache-line figure is sketched just after this list.)

3. This patch does increase kernel text and data somewhat: the old Classic RCU implementation consumes 1,757 bytes of kernel text and 456 bytes of kernel data for a total of 2,213 bytes, while the new hierarchical RCU implementation consumes 4,006 bytes of kernel text and 624 bytes of kernel data for a total of 4,630 bytes on a NR_CPUS=4 system. This is a non-problem even for most embedded systems, which often come with hundreds of megabytes of main memory. However, if this is a problem for tiny embedded systems, it may be necessary to provide both "scale up" and "scale down" implementations of RCU.
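The 65-cache-line figure can be reproduced with a little arithmetic: assuming a fanout of 64 children per rcu_node on a 64-bit build (the fanout value is inferred from the figure above rather than stated in this excerpt), 4,096 CPUs require 64 leaf nodes plus one root node, and each cache-aligned rcu_node occupies its own cache line. The following user-space sketch of that calculation is purely illustrative; rcu_num_nodes() is a hypothetical helper, not a kernel function.

#include <stdio.h>

#define RCU_FANOUT 64	/* assumed children per rcu_node on 64-bit builds */

/*
 * Number of rcu_node structures needed to cover nr_cpus CPUs:
 * one leaf per RCU_FANOUT CPUs, one interior node per RCU_FANOUT
 * leaves, and so on up to a single root.
 */
static int rcu_num_nodes(int nr_cpus)
{
	int nodes = 0;
	int width = nr_cpus;

	do {
		width = (width + RCU_FANOUT - 1) / RCU_FANOUT;	/* nodes at this level */
		nodes += width;
	} while (width > 1);
	return nodes;
}

int main(void)
{
	/* NR_CPUS=4096: 64 leaves + 1 root = 65 nodes, hence 65 cache lines. */
	printf("NR_CPUS=4096 -> %d rcu_node structures\n", rcu_num_nodes(4096));
	/* NR_CPUS=16: a single-node tree would suffice. */
	printf("NR_CPUS=16   -> %d rcu_node structures\n", rcu_num_nodes(16));
	return 0;
}

Note that these counts follow the compile-time NR_CPUS: a kernel built with NR_CPUS=4096 carries the two-level, 65-node tree even when it boots on a 16-CPU machine, which is exactly the situation described in item 2 above.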
This hierarchical RCU implementation should nevertheless be a vast improvement over Classic RCU for machines with hundreds of CPUs. After all, Classic RCU was designed for systems with only 16-32 CPUs.

At some point, it may be necessary to also apply hierarchy to the preemptable RCU implementation. This will be challenging due to the modular arithmetic used on the per-CPU counter pairs, but should be doable.

D.3 Hierarchical RCU Code Walkthrough

This section walks through selected sections of the Linux-kernel hierarchical RCU code. As such, this section is intended for hard-core hackers who wish to understand hierarchical RCU at a very low level, and such hackers should first read Section D.2. Hard-core masochists might also be interested in reading this section. Of course, really hard-core masochists will read this section before reading Section D.2.

Section D.3.1 describes data structures and kernel parameters, Section D.3.2 covers external function interfaces, Section D.3.3 presents the initialization process, Section D.3.4 explains the CPU-hotplug interface, Section D.3.5 covers miscellaneous utility functions, Section D.3.6 describes the mechanics of grace-period detection, Section D.3.7 presents the dynticks-idle interface, Section D.3.8 covers the functions that handle holdout CPUs (including offline and dynticks-idle CPUs), and Section D.3.9 presents functions that report on stalled CPUs, namely those spinning in kernel mode for many seconds. Finally, Section D.3.10 reports on possible design flaws and fixes.

D.3.1 Data Structures and Kernel Parameters

A full understanding of the Hierarchical RCU data structures is critically important to understanding the algorithms. To this end, Section D.3.1.1 describes the data structures used to track each CPU's dyntick-idle state, Section D.3.1.2 describes the fields in the per-node data structure making up the rcu_node hierarchy, Section D.3.1.3 describes the per-CPU rcu_data structure, Section D.3.1.4 describes the fields in the global rcu_state structure, and Section D.3.1.5 describes the kernel parameters that control Hierarchical RCU's operation.

Figure D.17 on Page 193 and Figure D.26 on Page 211 can be very helpful in keeping one's place through the following detailed data-structure descriptions.

D.3.1.1 Tracking Dyntick State

The per-CPU rcu_dynticks structure tracks dynticks state using the following fields:

dynticks_nesting: This int counts the number of reasons that the corresponding CPU should be monitored for RCU read-side critical sections. If the CPU is in dynticks-idle mode, then this counts the irq nesting level; otherwise, it is one greater than the irq nesting level.
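For orientation, here is a minimal sketch of the per-CPU rcu_dynticks structure along the lines described above. The field list and comments follow this section's description plus the Tree RCU code of that era; treat it as an illustrative approximation rather than the verbatim kernel declaration.

/*
 * Per-CPU dynticks state (illustrative sketch, not the exact
 * kernel definition).
 */
struct rcu_dynticks {
	int dynticks_nesting;	/* Number of reasons to watch this CPU for
				 * RCU read-side critical sections: the irq
				 * nesting level in dynticks-idle mode, one
				 * greater than that otherwise. */
	int dynticks;		/* Incremented on each transition; even while
				 * in dynticks-idle mode, odd otherwise. */
	int dynticks_nmi;	/* Like dynticks, but tracking NMI handlers. */
};

Roughly speaking, the grace-period machinery can snapshot these counters and recheck them later: an even snapshot or a changed value indicates that the CPU passed through dynticks-idle mode, and hence through a quiescent state, without having to be interrupted.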
