10.07.2015

Is Parallel Programming Hard, And, If So, What Can You Do About It?


APPENDIX D. READ-COPY UPDATE IMPLEMENTATIONS

clear their bits concurrently, which would corrupt the ->cpumask field, a ->lock spinlock is used to protect ->cpumask, preventing any such corruption. Unfortunately, this spinlock can also suffer extreme contention if there are more than a few hundred CPUs, which might soon become quite common if multicore trends continue. Worse yet, the fact that all CPUs must clear their own bit means that CPUs are not permitted to sleep through a grace period, which limits Linux's ability to conserve power.

The next section lays out what we need from a new non-real-time RCU implementation.

D.2.3 RCU Desiderata

The list of real-time RCU desiderata [MS05] is a very good start:

1. Deferred destruction, so that an RCU grace period cannot end until all pre-existing RCU read-side critical sections have completed.
2. Reliable, so that RCU supports 24x7 operation for years at a time.
3. Callable from irq handlers.
4. Contained memory footprint, so that mechanisms exist to expedite grace periods if there are too many callbacks. (This is weakened from the LCA2005 list.)
5. Independent of memory blocks, so that RCU can work with any conceivable memory allocator.
6. Synchronization-free read side, so that only normal non-atomic instructions operating on CPU- or task-local memory are permitted. (This is strengthened from the LCA2005 list.)
7. Unconditional read-to-write upgrade, which is used in several places in the Linux kernel where the update-side lock is acquired within the RCU read-side critical section.
8. Compatible API.
9. Because this is not to be a real-time RCU, the requirement for preemptable RCU read-side critical sections can be dropped. However, we need to add the following new requirements to account for changes over the past few years.
10. Scalability with extremely low internal-to-RCU lock contention. RCU must support at least 1,024 CPUs gracefully, and preferably at least 4,096.
11. Energy conservation: RCU must be able to avoid awakening low-power-state dynticks-idle CPUs, but still determine when the current grace period ends. This has been implemented in real-time RCU, but needs serious simplification.
12. RCU read-side critical sections must be permitted in NMI handlers as well as irq handlers. Note that preemptable RCU was able to avoid this requirement due to a separately implemented synchronize_sched().
13. RCU must operate gracefully in the face of repeated CPU-hotplug operations. This simply carries forward a requirement met by both classic and real-time RCU.
14. It must be possible to wait for all previously registered RCU callbacks to complete, though this is already provided in the form of rcu_barrier().
15. Detecting CPUs that are failing to respond is desirable, to assist diagnosis both of RCU and of various infinite-loop bugs and hardware failures that can prevent RCU grace periods from ending.
16. Extreme expediting of RCU grace periods is desirable, so that an RCU grace period can be forced to complete within a few hundred microseconds of the last relevant RCU read-side critical section completing. However, such an operation would be expected to incur severe CPU overhead, and would be primarily useful when carrying out a long sequence of operations that each needed to wait for an RCU grace period.

The most pressing of the new requirements is the first one, scalability. The next section therefore describes how to make order-of-magnitude reductions in contention on RCU's internal locks.

D.2.4 Towards a More Scalable RCU Implementation

One effective way to reduce lock contention is to create a hierarchy, as shown in Figure D.12. Here, each of the four rcu_node structures has its own lock, so that only CPUs 0 and 1 will acquire the lower left rcu_node's lock, only CPUs 2 and 3 will acquire the lower middle rcu_node's lock, and only CPUs 4 and 5 will acquire the lower right rcu_node's lock. During any given grace period, only one of the CPUs
