10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Chapter 9

Applying RCU

This chapter shows how to apply RCU to some examples discussed earlier in this book. In some cases, RCU provides simpler code, in other cases better performance and scalability, and in still other cases, both.

9.1 RCU and Per-Thread-Variable-Based Statistical Counters

Section 4.2.4 described an implementation of statistical counters that provided excellent performance, roughly that of simple increment (as in the C ++ operator), and linear scalability, but only for incrementing via inc_count(). Unfortunately, threads needing to read out the value via read_count() were required to acquire a global lock, and thus incurred high overhead and suffered poor scalability. The code for the lock-based implementation is shown in Figure 4.8 on Page 33.

Quick Quiz 9.1: Why on earth did we need that global lock in the first place???

9.1.1 Design

The hope is to use RCU rather than final_mutex to protect the thread traversal in read_count() in order to obtain excellent performance and scalability from read_count(), rather than just from inc_count(). However, we do not want to give up any accuracy in the computed sum. In particular, when a given thread exits, we absolutely cannot lose the exiting thread’s count, nor can we double-count it. Such an error could result in inaccuracies equal to the full precision of the result; in other words, such an error would make the result completely useless. And in fact, one of the purposes of final_mutex is to ensure that threads do not come and go in the middle of read_count() execution.

Quick Quiz 9.2: Just what is the accuracy of read_count(), anyway?

Therefore, if we are to dispense with final_mutex, we will need to come up with some other method for ensuring consistency. One approach is to place the total count for all previously exited threads and the array of pointers to the per-thread counters into a single structure.
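As a rough illustration of the single-structure idea, the following C sketch bundles the exited-thread total and the per-thread counter pointers into one struct countarray. It then shows one way a thread exit might be handled without losing or double-counting: build a fresh structure with the exiting thread's count folded into ->total, leaving the old structure untouched. The names ->total and counterp[] follow the text. NR_THREADS, countarray_sum(), and countarray_retire() are hypothetical names introduced only for this sketch, and both publishing the new structure and eventually freeing the old one are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_THREADS 4	/* assumed small fixed limit, for this sketch only */

/* One snapshot: treated as read-only once readers can see it. */
struct countarray {
	unsigned long total;			/* counts of exited threads */
	unsigned long *counterp[NR_THREADS];	/* live counters, or NULL */
};

/* Sum a single snapshot: exited total plus every live counter. */
static unsigned long countarray_sum(const struct countarray *cap)
{
	unsigned long sum = cap->total;

	for (int t = 0; t < NR_THREADS; t++)
		if (cap->counterp[t] != NULL)
			sum += *cap->counterp[t];
	return sum;
}

/*
 * Hypothetical exit path: return a NEW snapshot in which thread idx's
 * final count has moved from its slot into ->total.  The old snapshot
 * is not modified, so a reader still using it sees consistent data.
 * Returns NULL if allocation fails; the caller owns the result.
 */
static struct countarray *countarray_retire(const struct countarray *old,
					    int idx)
{
	struct countarray *newcap;

	assert(idx >= 0 && idx < NR_THREADS && old->counterp[idx] != NULL);
	newcap = malloc(sizeof(*newcap));
	if (newcap == NULL)
		return NULL;
	*newcap = *old;				/* copy, then adjust the copy */
	newcap->total += *old->counterp[idx];	/* keep the exiting count */
	newcap->counterp[idx] = NULL;		/* and count it only once */
	return newcap;
}
```

Once the exiting thread has stopped incrementing, summing the old structure and summing the new one give the same result, so a reader gets a correct total from whichever structure it happened to pick up.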
Such a structure, once made available to read_count(), is held constant, ensuring that read_count() sees consistent data.

9.1.2 Implementation

Lines 1-4 of Figure 9.1 show the countarray structure, which contains a ->total field for the count from previously exited threads, and a counterp[] array of pointers to the per-thread counter for each currently running thread. This structure allows a given execution of read_count() to see a total that is consistent with the indicated set of running threads.

Lines 6-8 contain the definition of the per-thread counter variable, the global pointer countarrayp referencing the current countarray structure, and the final_mutex spinlock.

Lines 10-13 show inc_count(), which is unchanged from Figure 4.8.

Lines 15-29 show read_count(), which has changed significantly. Lines 21 and 27 substitute rcu_read_lock() and rcu_read_unlock() for acquisition and release of final_mutex. Line 22 uses rcu_dereference() to snapshot the current countarray structure into local variable cap. Proper use of RCU will guarantee that this countarray structure will remain with us through at least the end of the current RCU read-side critical section at line 27. Line 23 initializes sum to cap->total, which is the sum of the counts of threads that have previously exited. Lines 24-26 add up the per-thread counters corresponding to currently running threads, and, finally, line 28 returns
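Figure 9.1 is not reproduced in this excerpt, so the following is only a sketch of the read path described above, not the book's listing. It assumes C11 atomics in place of a real RCU implementation. The functions stub_rcu_read_lock() and stub_rcu_read_unlock() are empty placeholders, an acquire load stands in for rcu_dereference(), and NR_THREADS and publish_countarray() are hypothetical names. Because this sketch never frees a countarray structure, the empty stubs are harmless here. Real code would need genuine RCU primitives before freeing anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_THREADS 4	/* assumed small fixed limit, for this sketch only */

struct countarray {
	unsigned long total;				/* exited threads */
	_Atomic unsigned long *counterp[NR_THREADS];	/* live, or NULL */
};

static _Thread_local _Atomic unsigned long counter;	/* per-thread count */
static _Atomic(struct countarray *) countarrayp;	/* current snapshot */

/* Placeholders (assumptions) for rcu_read_lock()/rcu_read_unlock().
 * They do nothing, which is tolerable only because nothing is freed. */
static void stub_rcu_read_lock(void) { }
static void stub_rcu_read_unlock(void) { }

/* Update side: only the owning thread writes its counter, so a relaxed
 * load/store pair is enough and no atomic read-modify-write is needed. */
static void inc_count(void)
{
	unsigned long c = atomic_load_explicit(&counter, memory_order_relaxed);

	atomic_store_explicit(&counter, c + 1, memory_order_relaxed);
}

/* Read side: no lock, just one pointer snapshot and a traversal. */
static unsigned long read_count(void)
{
	struct countarray *cap;
	unsigned long sum;

	stub_rcu_read_lock();
	/* Stand-in for rcu_dereference(): fetch the pointer exactly once. */
	cap = atomic_load_explicit(&countarrayp, memory_order_acquire);
	sum = cap->total;			/* counts of exited threads */
	for (int t = 0; t < NR_THREADS; t++)	/* plus the running threads */
		if (cap->counterp[t] != NULL)
			sum += atomic_load_explicit(cap->counterp[t],
						    memory_order_relaxed);
	stub_rcu_read_unlock();
	return sum;
}

/* Hypothetical helper: make a fully initialized structure visible. */
static void publish_countarray(struct countarray *cap)
{
	assert(cap != NULL);
	atomic_store_explicit(&countarrayp, cap, memory_order_release);
}
```

A caller would fill in a countarray structure, pass it to publish_countarray() before the first read_count(), and from then on treat that structure as read-only. The reader loads the pointer once and uses only that local copy, so its total and its set of counter pointers always come from the same structure.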
