10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


D.2. HIERARCHICAL RCU OVERVIEW

Figure D.12: Hierarchical RCU State (a struct rcu_state containing struct rcu_node structures, with CPU 0 through CPU 5 below them)

accessing each of the lower rcu_node structures will access the upper rcu_node, namely, the last of each pair of CPUs to record a quiescent state for the corresponding grace period.

This results in a significant reduction in lock contention: instead of six CPUs contending for a single lock each grace period, we have only three for the upper rcu_node’s lock (a reduction of 50%) and only two for each of the lower rcu_nodes’ locks (a reduction of 67%).

Figure D.13: Mapping rcu_node Hierarchy Into Array (struct rcu_state array elements covering CPUs 0:7, 0:3, 4:7, 0:1, 2:3, 4:5, and 6:7)

The tree of rcu_node structures is embedded into a linear array in the rcu_state structure, with the root of the tree in element zero, as shown in Figure D.13 for an eight-CPU system with a three-level hierarchy. Each arrow links a given rcu_node structure to its parent, representing the rcu_node’s ->parent field. Each rcu_node indicates the range of CPUs covered, so that the root node covers all of the CPUs, each node in the second level covers half of the CPUs, and each node in the leaf level covers a pair of CPUs. This array is allocated statically at compile time based on the value of NR_CPUS.

Figure D.14: Hierarchical RCU Grace Period

The sequence of diagrams in Figure D.14 shows how grace periods are detected. In the first figure, no CPU has yet passed through a quiescent state, as indicated by the red rectangles. Suppose that all six CPUs simultaneously try to tell RCU that they have passed through a quiescent state.
Only one of each pair will be able to acquire the lock on the corresponding lower rcu_node, and so the second figure shows the result if the lucky CPUs are numbers 0, 3, and 5, as indicated by the green rectangles. Once these lucky CPUs have finished, then the other CPUs will acquire the lock, as shown in the third figure. Each of these CPUs will see that they are the last in their group, and therefore all three will attempt to move to the upper rcu_node. Only one at a time can acquire the upper rcu_node structure’s lock, and the fourth, fifth, and sixth figures show the sequence of states assuming that CPU 1, CPU 2, and CPU 4 acquire the lock in that order. The sixth and final figure in the group shows that all CPUs have passed through a quiescent state, so that the grace period has ended.

In the above sequence, there were never more than three CPUs contending for any one lock, in happy contrast to Classic RCU, where all six CPUs might contend. However, even more dramatic reductions in lock contention are possible with larger numbers of CPUs. Consider a hierarchy of rcu_node structures, with 64 lower structures and 64*64=4,096 CPUs, as shown in Figure D.15.

Here each of the lower rcu_node structures’ locks are acquired by 64 CPUs, a 64-times reduction from the 4,096 CPUs that would acquire Classic RCU’s single global lock. Similarly, during a given grace period, only one CPU from each of the lower rcu_node
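
The array embedding of Figure D.13 and the upward propagation of quiescent states can be illustrated with a small single-threaded sketch. It is not the kernel's implementation: the toy_ names, the qsmask and lock_acquisitions fields, the fixed fanout of two, and the use of a counter in place of a real lock are all assumptions made for illustration, and NR_CPUS is simply set to 8 to match the figure.

```c
#include <assert.h>

#define NR_CPUS    8   /* illustrative: the eight-CPU system of Figure D.13 */
#define NUM_NODES  7   /* 1 root + 2 second-level + 4 leaf nodes */
#define FIRST_LEAF 3   /* array index of the first leaf node */

/* Hypothetical stand-in for struct rcu_node. */
struct toy_rcu_node {
	int lo, hi;              /* range of CPUs covered (lo:hi) */
	int parent;              /* array index of parent, -1 for the root */
	unsigned int qsmask;     /* one bit per child still owing a quiescent state */
	int lock_acquisitions;   /* counts how often this node's "lock" is taken */
};

/* Hypothetical stand-in for struct rcu_state. */
struct toy_rcu_state {
	struct toy_rcu_node node[NUM_NODES];  /* tree in an array, root at [0] */
	int gp_completed;                     /* set once the root's mask empties */
};

/* Lay the binary tree out in the array: children of node p are 2p+1, 2p+2. */
static void toy_init(struct toy_rcu_state *rsp)
{
	for (int i = 0; i < NUM_NODES; i++) {
		struct toy_rcu_node *np = &rsp->node[i];

		if (i == 0) {
			np->parent = -1;
			np->lo = 0;
			np->hi = NR_CPUS - 1;
		} else {
			/* Parents have smaller indexes, so they are already set up. */
			struct toy_rcu_node *pp = &rsp->node[(i - 1) / 2];
			int mid = (pp->lo + pp->hi) / 2;

			np->parent = (i - 1) / 2;
			np->lo = (i % 2) ? pp->lo : mid + 1;  /* odd index: left child */
			np->hi = (i % 2) ? mid : pp->hi;
		}
		np->qsmask = 0x3;            /* both children (or CPUs) pending */
		np->lock_acquisitions = 0;
	}
	rsp->gp_completed = 0;
}

/* Record a quiescent state for cpu, moving up only when last in the group. */
static void toy_report_qs(struct toy_rcu_state *rsp, int cpu)
{
	int i;
	unsigned int bit;

	assert(cpu >= 0 && cpu < NR_CPUS);
	i = FIRST_LEAF + cpu / 2;        /* leaf covering this pair of CPUs */
	bit = 1u << (cpu % 2);

	for (;;) {
		struct toy_rcu_node *np = &rsp->node[i];

		np->lock_acquisitions++;     /* stands in for acquiring np's lock */
		if (!(np->qsmask & bit))
			return;                  /* already reported: nothing to do */
		np->qsmask &= ~bit;
		if (np->qsmask != 0)
			return;                  /* someone else in the group is pending */
		if (np->parent < 0) {
			rsp->gp_completed = 1;   /* root is clear: grace period has ended */
			return;
		}
		bit = 1u << ((i - 1) % 2);   /* this node's bit in its parent's mask */
		i = np->parent;
	}
}
```

Calling toy_init() and then toy_report_qs() once for each of the eight CPUs leaves every node with a lock_acquisitions count of two, whereas a single global lock would have been acquired eight times. Real code would of course need actual locking and per-grace-period reinitialization of the masks, both of which this sketch omits.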
