10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


D.2.7.10 Detect a Too-Long Grace Period

When the CONFIG_RCU_CPU_STALL_DETECTOR kernel parameter is specified, the record_gp_stall_check_time() function records the time and also a timestamp set three seconds into the future. If the current grace period still has not ended by that time, the check_cpu_stall() function will check for the culprit, invoking print_cpu_stall() if the current CPU is the holdout, or print_other_cpu_stall() if it is some other CPU. A two-jiffies offset helps ensure that CPUs report on themselves when possible, taking advantage of the fact that a CPU can normally do a better job of tracing its own stack than it can tracing some other CPU’s stack.

D.2.8 Testing

RCU is fundamental synchronization code, so any failure of RCU results in random, difficult-to-debug memory corruption. It is therefore extremely important that RCU be highly reliable. Some of this reliability stems from careful design, but at the end of the day we must also rely on heavy stress testing, otherwise known as torture.

Fortunately, although there has been some debate as to exactly what populations are covered by the provisions of the Geneva Convention, it is still the case that it does not apply to software. Therefore, it is still legal to torture your software. In fact, it is strongly encouraged, because if you don’t torture your software, it will end up torturing you by crashing at the most inconvenient times imaginable. Therefore, we torture RCU quite vigorously using the rcutorture module.

However, it is not sufficient to torture the common-case uses of RCU. It is also necessary to torture it in unusual situations, for example, when concurrently onlining and offlining CPUs and when CPUs are concurrently entering and exiting dynticks idle mode. I use a script @@@ move to CodeSamples, ref @@@ and use the test_no_idle_hz module parameter to rcutorture to stress-test dynticks idle mode.
Just to be fully paranoid, I sometimes run a kernbench workload in parallel as well. Ten hours of this sort of torture on a 128-way machine seems sufficient to shake out most bugs.

Even this is not the complete story. As Alexey Dobriyan and Nick Piggin demonstrated in early 2008, it is also necessary to torture RCU with all relevant combinations of kernel parameters. The relevant kernel parameters may be identified using yet another script @@@ move to CodeSamples, ref @@@:

1. CONFIG_CLASSIC_RCU: Classic RCU.
2. CONFIG_PREEMPT_RCU: Preemptable (real-time) RCU.
3. CONFIG_TREE_RCU: Classic RCU for huge SMP systems.
4. CONFIG_RCU_FANOUT: Number of children for each rcu_node.
5. CONFIG_RCU_FANOUT_EXACT: Balance the rcu_node tree.
6. CONFIG_HOTPLUG_CPU: Allow CPUs to be offlined and onlined.
7. CONFIG_NO_HZ: Enable dyntick-idle mode.
8. CONFIG_SMP: Enable multi-CPU operation.
9. CONFIG_RCU_CPU_STALL_DETECTOR: Enable RCU to detect when CPUs go on extended quiescent-state vacations.
10. CONFIG_RCU_TRACE: Generate RCU trace files in debugfs.

We ignore the CONFIG_DEBUG_LOCK_ALLOC configuration variable under the perhaps-naive assumption that hierarchical RCU could not have broken lockdep. There are still 10 configuration variables, which would result in 1,024 combinations if they were independent boolean variables. Fortunately, the first three are mutually exclusive, which reduces the number of combinations down to 384, but CONFIG_RCU_FANOUT can take on values from 2 to 64, increasing the number of combinations to 12,096. This is an infeasible number of combinations.

One key observation is that only CONFIG_NO_HZ and CONFIG_PREEMPT can be expected to have changed behavior if either CONFIG_CLASSIC_RCU or CONFIG_PREEMPT_RCU are in effect, as only these portions of the two pre-existing RCU implementations were changed during this effort.
This cuts out almost two thirds of the possible combinations.

Furthermore, not all of the possible values of CONFIG_RCU_FANOUT produce significantly different results; in fact, only a few cases really need to be tested separately:

1. Single-node “tree”.
2. Two-level balanced tree.
3. Three-level balanced tree.
4. Autobalanced tree, where CONFIG_RCU_FANOUT specifies an unbalanced tree, but such that it is auto-balanced in absence of CONFIG_RCU_FANOUT_EXACT.
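
The combination counts quoted earlier in this section (1,024, then 384, then 12,096) can be checked by brute-force enumeration. The following is only a rough sketch in Python, not one of the scripts mentioned in the text. The function names are hypothetical, and it assumes, as the text's arithmetic does, that every CONFIG_RCU_FANOUT value counts as a distinct configuration regardless of which RCU implementation is selected.

```python
from itertools import product

# The three mutually exclusive RCU implementations (items 1-3 in the list).
RCU_FLAVORS = ["CONFIG_CLASSIC_RCU", "CONFIG_PREEMPT_RCU", "CONFIG_TREE_RCU"]

# The six genuinely boolean options (items 5-10 in the list).
BOOLEAN_OPTIONS = [
    "CONFIG_RCU_FANOUT_EXACT",
    "CONFIG_HOTPLUG_CPU",
    "CONFIG_NO_HZ",
    "CONFIG_SMP",
    "CONFIG_RCU_CPU_STALL_DETECTOR",
    "CONFIG_RCU_TRACE",
]

# CONFIG_RCU_FANOUT ranges over 2..64 inclusive (63 values).
FANOUT_VALUES = range(2, 65)

ON_OFF = [False, True]


def count_naive():
    """All ten variables treated as independent booleans."""
    return sum(1 for _ in product(ON_OFF, repeat=10))


def count_exclusive_flavors():
    """Exactly one RCU flavor; the other seven variables stay boolean
    (CONFIG_RCU_FANOUT is still counted as on/off at this step)."""
    return sum(1 for _ in product(RCU_FLAVORS, *([ON_OFF] * 7)))


def count_with_fanout():
    """Exactly one RCU flavor, a fanout in 2..64, and six booleans."""
    booleans = [ON_OFF] * len(BOOLEAN_OPTIONS)
    return sum(1 for _ in product(RCU_FLAVORS, FANOUT_VALUES, *booleans))


if __name__ == "__main__":
    print(count_naive())              # 1024
    print(count_exclusive_flavors())  # 384
    print(count_with_fanout())        # 12096
```

Enumerating rather than multiplying makes it easy to add a filter later, for example to keep only the handful of CONFIG_RCU_FANOUT cases listed above.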
