10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?



F.4. CHAPTER 4: COUNTING

Your user application hanging!

Quick Quiz 4.44:
What if you want an exact limit counter to be exact only for its lower limit?

Answer:
One simple solution is to overstate the upper limit by the desired amount. The limiting case of such overstatement results in the upper limit being set to the largest value that the counter is capable of representing.

Quick Quiz 4.45:
What else had you better have done when using a biased counter?

Answer:
You had better have set the upper limit to be large enough to accommodate the bias, the expected maximum number of accesses, and enough "slop" to allow the counter to work efficiently even when the number of accesses is at its maximum.

Quick Quiz 4.46:
This is ridiculous! We are read-acquiring a reader-writer lock to update the counter? What are you playing at???

Answer:
Strange, perhaps, but true! Almost enough to make you think that the name "reader-writer lock" was poorly chosen, isn't it?

Quick Quiz 4.47:
What other issues would need to be accounted for in a real system?

Answer:
A huge number!!!

Here are a few to start with:

1. There could be any number of devices, so that the global variables are inappropriate, as are the lack of arguments to functions like do_io().

2. Polling loops can be problematic in real systems. In many cases, it is far better to have the last completing I/O wake up the device-removal thread.

3. The I/O might fail, and so do_io() will likely need a return value.

4. If the device fails, the last I/O might never complete. In such cases, there might need to be some sort of timeout to allow error recovery.

5. Both add_count() and sub_count() can fail, but their return values are not checked.

6. Reader-writer locks do not scale well. One way of avoiding the high read-acquisition costs of reader-writer locks is presented in Chapter 8.

Quick Quiz 4.48:
On the count_stat.c row of Table 4.1, we see that the read side scales linearly with the number of threads. How is that possible given that the more threads there are, the more per-thread counters must be summed up?

Answer:
The read-side code must scan the entire fixed-size array, regardless of the number of threads, so there is no difference in performance. In contrast, in the last two algorithms, readers must do more work when there are more threads. In addition, the last two algorithms interpose an additional level of indirection because they map from integer thread ID to the corresponding __thread variable.

Quick Quiz 4.49:
Even on the last row of Table 4.1, the read-side performance of these statistical counter implementations is pretty horrible. So why bother with them?

Answer:
"Use the right tool for the job."

As can be seen from Figure 4.3, single-variable atomic increment need not apply for any job involving heavy use of parallel updates. In contrast, the algorithms shown in Table 4.1 do an excellent job of handling update-heavy situations. Of course, if you have a read-mostly situation, you should use something else, for example, a single atomically incremented variable that can be read out using a single load.

Quick Quiz 4.50:
Given the performance data shown in Table 4.2,
