
Is Parallel Programming Hard, And, If So, What Can You Do About It?



D.4. PREEMPTABLE RCU

unlock() were reordered by the compiler?

Quick Quiz D.60: What problems could arise if the lines containing ACCESS_ONCE() in rcu_read_unlock() were reordered by the CPU?

Quick Quiz D.61: What problems could arise in rcu_read_unlock() if irqs were not disabled?

Memory-Barrier Considerations

Note that these two primitives contain no memory barriers, so there is nothing to stop the CPU from executing the critical section before executing the rcu_read_lock() or after executing the rcu_read_unlock(). The purpose of the rcu_try_flip_waitmb_state is to account for this possible reordering, but only at the beginning or end of a grace period. To see why this approach is helpful, consider Figure D.75, which shows the wastefulness of the conventional approach of placing a memory barrier at the beginning and end of each RCU read-side critical section [MSMB06].

[Figure D.75: Preemptable RCU with Read-Side Memory Barriers. Timelines for CPU 0 through CPU 3 across a grace period, with a memory barrier (MB) at both ends of every read-side critical section.]

The "MB"s represent memory barriers, and only the emboldened barriers are needed, namely the first and last on a given CPU for each grace period. This preemptible RCU implementation therefore associates the memory barriers with the grace period, as shown in Figure D.76.

[Figure D.76: Preemptable RCU with Grace-Period Memory Barriers. Timelines for CPU 0 through CPU 3 across a grace period, with memory barriers (MB) only at the grace-period boundaries.]

Given that the Linux kernel can execute literally millions of RCU read-side critical sections per grace period, this latter approach can result in substantial read-side savings, because it amortizes the cost of the memory barrier over all the read-side critical sections in a grace period.

D.4.3 Validation of Preemptible RCU

D.4.3.1 Testing

The preemptible RCU algorithm was tested with a two-stage grace period on weakly ordered POWER4 and POWER5 CPUs using rcutorture running for more than 24 hours on each machine, with 15M and 20M grace periods, respectively, and with no errors. Of course, this in no way proves that this algorithm is correct. At most, it shows either that these two machines were extremely lucky or that any bugs remaining in preemptible RCU have an extremely low probability of occurring. We therefore required additional assurance that this algorithm works, or, alternatively, identification of remaining bugs.

This task requires a conceptual approach, which is taken in the next section.

D.4.3.2 Conceptual Validation

Because neither rcu_read_lock() nor rcu_read_unlock() contains memory barriers, the RCU read-
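
The amortization argument from the Memory-Barrier Considerations discussion can be made concrete with a little arithmetic. The following is only a rough sketch under a simplified model: the function names are hypothetical and appear nowhere in the Linux kernel, and the model assumes exactly one memory barrier per CPU at each end of a grace period, which glosses over the details of the real state machine.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are not kernel APIs.
 *
 * Conventional scheme (as in Figure D.75): every RCU read-side critical
 * section executes one memory barrier on entry and one on exit.
 */
uint64_t barriers_per_section_scheme(uint64_t ncpus, uint64_t sections_per_cpu)
{
	return 2 * ncpus * sections_per_cpu;
}

/*
 * Grace-period scheme (as in Figure D.76), under the simplifying
 * assumption that each CPU executes one memory barrier at the beginning
 * and one at the end of the grace period, no matter how many read-side
 * critical sections it ran in between.
 */
uint64_t barriers_per_grace_period_scheme(uint64_t ncpus)
{
	return 2 * ncpus;
}
```

Under these assumptions, four CPUs that each run one million read-side critical sections in a grace period would execute 8,000,000 barriers in the conventional scheme, but only 8 when the barriers are associated with the grace period.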
