Is Parallel Programming Hard, And, If So, What Can You Do About It?


APPENDIX D. READ-COPY UPDATE IMPLEMENTATIONS

  1 void srcu_read_unlock(struct srcu_struct *sp, int idx)
  2 {
  3   preempt_disable();
  4   srcu_barrier();
  5   per_cpu_ptr(sp->per_cpu_ref,
  6               smp_processor_id())->c[idx]--;
  7   preempt_enable();
  8 }

Figure D.9: SRCU Read-Side Release

D.1.3.4 Update-Side Implementation

The key point behind SRCU is that synchronize_sched() blocks until all currently-executing preempt-disabled regions of code complete. The synchronize_srcu() primitive makes heavy use of this effect, as can be seen in Figure D.10.

Line 5 takes a snapshot of the grace-period counter. Line 6 acquires the mutex, and lines 7-10 check to see whether at least two grace periods have elapsed since the snapshot, and, if so, release the lock and return. In this case, someone else has done our work for us. Otherwise, line 11 guarantees that any other CPU that sees the incremented value of the grace-period counter in srcu_read_lock() also sees any changes made by this CPU prior to entering synchronize_srcu(). This guarantee is required to make sure that any SRCU read-side critical sections not blocking the next grace period have seen any prior changes.

Line 12 fetches the bottom bit of the grace-period counter for later use as an index into the per-CPU counter arrays, and then line 13 increments the grace-period counter. Line 14 then waits for any currently-executing srcu_read_lock() to complete, so that by the time that we reach line 15, all extant instances of srcu_read_lock() will be using the updated value from sp->completed. Therefore, the counters sampled by srcu_readers_active_idx() on line 15 are guaranteed to be monotonically decreasing, so that once their sum reaches zero, it is guaranteed to stay there.

However, there are no memory barriers in the srcu_read_unlock() primitive, so the CPU is within its rights to reorder the counter decrement up into the SRCU critical section, so that references to an SRCU-protected data structure could in effect “bleed out” of the SRCU critical section.
This reordering scenario is addressed by the synchronize_sched() on line 17, which blocks until all other CPUs executing in preempt_disable() code sequences (such as that in srcu_read_unlock()) complete these sequences. Because completion of a given preempt_disable() code sequence is observed from the CPU executing that sequence, completion of the sequence implies completion of any prior SRCU read-side critical section. Any required memory barriers are supplied by the code making the observation.

At this point, it is therefore safe to release the mutex as shown on line 18 and return to the caller, who can now be assured that all SRCU read-side critical sections sharing the same struct srcu_struct will observe any update made prior to the call to synchronize_srcu().

  1 void synchronize_srcu(struct srcu_struct *sp)
  2 {
  3   int idx;
  4
  5   idx = sp->completed;
  6   mutex_lock(&sp->mutex);
  7   if ((sp->completed - idx) >= 2) {
  8     mutex_unlock(&sp->mutex);
  9     return;
 10   }
 11   synchronize_sched();
 12   idx = sp->completed & 0x1;
 13   sp->completed++;
 14   synchronize_sched();
 15   while (srcu_readers_active_idx(sp, idx))
 16     schedule_timeout_interruptible(1);
 17   synchronize_sched();
 18   mutex_unlock(&sp->mutex);
 19 }

Figure D.10: SRCU Update-Side Implementation

Quick Quiz D.3: Why is it OK to assume that updates separated by synchronize_sched() will be performed in order?

Quick Quiz D.4: Why must line 17 in synchronize_srcu() (Figure D.10) precede the release of the mutex on line 18? What would have to change to permit these two lines to be interchanged? Would such a change be worthwhile? Why or why not?

D.1.4 SRCU Summary

SRCU provides an RCU-like set of primitives that permit general sleeping in the SRCU read-side critical sections. However, it is important to note that SRCU has been used only in prototype code, though it has passed the RCU torture test.
It will be very interesting to see what use, if any, SRCU sees in the future.

D.2 Hierarchical RCU Overview

Although Classic RCU’s read-side primitives enjoy excellent performance and scalability, the update-side primitives, which determine when pre-existing read-side critical sections have finished, were designed with only a few tens of CPUs in mind. Their scalability is limited by a global lock that must be acquired by each CPU at least once during each grace period. Although Classic RCU actually scales to a couple of hundred CPUs, and can be tweaked to scale to roughly a thousand CPUs (but at the expense of extending grace periods), emerging multicore systems will require it to scale better.

In addition, Classic RCU has a sub-optimal dynticks interface, with the result that Classic RCU will wake up every CPU at least once per grace period. To see the problem with this, consider a 16-CPU system that is sufficiently lightly loaded that it is keeping only four CPUs busy. In a perfect world, the remaining twelve CPUs could be put into deep sleep mode in order to conserve energy. Unfortunately, if the four busy CPUs are frequently performing RCU updates, those twelve idle CPUs will be awakened frequently, wasting significant energy. Thus, any major change to Classic RCU should also leave sleeping CPUs lie.

Both the classic and the hierarchical implementations have Classic RCU semantics and identical APIs; however, the old implementation will be called “classic RCU” and the new implementation will be called “hierarchical RCU”.

D.2.1 Review of RCU Fundamentals

In its most basic form, RCU is a way of waiting for things to finish. Of course, there are a great many other ways of waiting for things to finish, including reference counts, reader-writer locks, events, and so on. The great advantage of RCU is that it can wait for each of (say) 20,000 different things without having to explicitly track each and every one of them, and without having to worry about the performance degradation, scalability limitations, complex deadlock scenarios, and memory-leak hazards that are inherent in schemes using explicit tracking.

In RCU’s case, the things waited on are called “RCU read-side critical sections”.
An RCU read-side critical section starts with an rcu_read_lock() primitive, and ends with a corresponding rcu_read_unlock() primitive. RCU read-side critical sections can be nested, and may contain pretty much any code, as long as that code does not explicitly block or sleep (although a special form of RCU called SRCU, described in Section D.1, does permit general sleeping in SRCU read-side critical sections). If you abide by these conventions, you can use RCU to wait for any desired piece of code to complete.

RCU accomplishes this feat by indirectly determining when these other things have finished, as has been described elsewhere [MS98] for classic RCU and in Section D.4 for preemptable RCU.

In particular, as shown in Figure 8.11, RCU is a way of waiting for pre-existing RCU read-side critical sections to completely finish, including the memory operations executed by those critical sections.

However, note that RCU read-side critical sections that begin after the beginning of a given grace period can and will extend beyond the end of that grace period.

The following section gives a very high-level view of how the Classic RCU implementation operates.

D.2.2 Brief Overview of Classic RCU Implementation

The key concept behind the Classic RCU implementation is that Classic RCU read-side critical sections are confined to kernel code and are not permitted to block. This means that any time a given CPU is seen either blocking, in the idle loop, or exiting the kernel, we know that all RCU read-side critical sections that were previously running on that CPU must have completed. Such states are called “quiescent states”, and after each CPU has passed through at least one quiescent state, the RCU grace period ends.

[Figure D.11: Flat Classic RCU State. Each CPU (CPU 0 is shown) records its quiescent state in rcp->cpumask within struct rcu_ctrlblk, which is protected by rcp->lock.]

Classic RCU’s most important data structure is the rcu_ctrlblk structure, whose ->cpumask field contains one bit per CPU, as shown in Figure D.11.
Each CPU’s bit is set to one at the beginning of each grace period, and each CPU must clear its bit after it passes through a quiescent state. Because multiple CPUs might want to