10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?



8.3. READ-COPY UPDATE (RCU)

[Figure 8.15: Performance Advantage of RCU Over Reader-Writer Locking. Plot of overhead (nanoseconds, log scale) against number of CPUs (0 to 16) for rwlock and rcu.]

[Figure 8.16: Performance Advantage of Preemptable RCU Over Reader-Writer Locking. Plot of overhead (nanoseconds, log scale) against number of CPUs (0 to 16) for rwlock and rcu.]

some cases, it is possible to mechanically substitute RCU API members for the corresponding reader-writer lock API members. But first, why bother?

Advantages of RCU include performance, deadlock immunity, and realtime latency. There are, of course, limitations to RCU, including the fact that readers and updaters run concurrently, that low-priority RCU readers can block high-priority threads waiting for a grace period to elapse, and that grace-period latencies can extend for many milliseconds. These advantages and limitations are discussed in the following sections.

Performance. The read-side performance advantages of RCU over reader-writer locking are shown in Figure 8.15.

Quick Quiz 8.12: WTF??? How the heck do you expect me to believe that RCU has a 100-femtosecond overhead when the clock period at 3GHz is more than 300 picoseconds?

Note that reader-writer locking is orders of magnitude slower than RCU on a single CPU, and is almost two additional orders of magnitude slower on 16 CPUs. In contrast, RCU scales quite well. In both cases, the error bars span a single standard deviation in either direction.

A more moderate view may be obtained from a CONFIG_PREEMPT kernel, though RCU still beats reader-writer locking by between one and three orders of magnitude, as shown in Figure 8.16. Note the high variability of reader-writer locking at larger numbers of CPUs. The error bars span a single standard deviation in either direction.

Of course, the low performance of reader-writer locking in Figure 8.16 is exaggerated by the unrealistic zero-length critical sections.
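To see why zero-length critical sections exaggerate the gap, it may help to work through a small arithmetic model. The sketch below is written in Python purely for illustration. It assumes a fixed per-acquisition cost for each primitive and adds the same critical-section time to both. The function names and the numbers (1000 ns for rwlock, 1 ns for RCU) are hypothetical and are not measurements taken from the figures. Treating the primitive overhead as constant is itself a simplification, since real reader-writer lock overhead varies with contention.

```python
def total_overhead_ns(primitive_ns, critical_section_ns):
    """Cost of the read-side primitives plus the work inside the critical section."""
    return primitive_ns + critical_section_ns


def rwlock_to_rcu_ratio(rwlock_ns, rcu_ns, critical_section_ns):
    """How many times slower the rwlock reader is than the RCU reader."""
    return (total_overhead_ns(rwlock_ns, critical_section_ns)
            / total_overhead_ns(rcu_ns, critical_section_ns))


# Hypothetical primitive overheads, chosen only to illustrate the trend.
RWLOCK_NS = 1000.0
RCU_NS = 1.0

# As the critical section grows, it dominates both totals and the ratio shrinks.
for cs_ns in (0.0, 100.0, 1000.0, 10000.0):
    ratio = rwlock_to_rcu_ratio(RWLOCK_NS, RCU_NS, cs_ns)
    print(f"critical section {cs_ns:>8.0f} ns -> rwlock/RCU ratio {ratio:8.2f}")
```

Under these assumed numbers the ratio is largest when the critical section is empty and approaches 1 as the critical section comes to dominate the total, which is the sense in which a zero-length critical section shows the comparison at its most extreme.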
The performance advantages of RCU become less significant as the overhead of the critical section increases, as shown in Figure 8.17 for a 16-CPU system, in which the y-axis represents the sum of the overhead of the read-side primitives and that of the critical section.

Quick Quiz 8.13: Why do both the variability and overhead of rwlock decrease as the critical-section overhead increases?

However, this observation must be tempered by the fact that a number of system calls (and thus any RCU read-side critical sections that they contain) can complete within a few microseconds.

In addition, as is discussed in the next section, RCU read-side primitives are almost entirely deadlock-immune.

Deadlock Immunity. Although RCU offers significant performance advantages for read-mostly workloads, one of the primary reasons for creating RCU in the first place was in fact its immunity to read-side deadlocks. This immunity stems from the fact that RCU read-side primitives do not block, spin, or even do backwards branches, so that their execution time is deterministic. It is therefore im-
