10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?




zero that the counter can operate efficiently. When someone wants to remove the device, this bias is subtracted from the counter value. Counting the last few accesses will be quite inefficient, but the important point is that the many prior accesses will have been counted at full speed.

Quick Quiz 4.45: What else had you better have done when using a biased counter?

Although a biased counter can be quite useful, it is only a partial solution to the removable I/O device access-count problem called out on page 29. When attempting to remove a device, we must not only know the precise number of current I/O accesses, we also need to prevent any future accesses from starting. One way to accomplish this is to read-acquire a reader-writer lock when updating the counter, and to write-acquire that same reader-writer lock when checking the counter. Code for doing I/O might be as follows:

     1 read_lock(&mylock);
     2 if (removing) {
     3   read_unlock(&mylock);
     4   cancel_io();
     5 } else {
     6   add_count(1);
     7   read_unlock(&mylock);
     8   do_io();
     9   sub_count(1);
    10 }

Line 1 read-acquires the lock, and either line 3 or 7 releases it. Line 2 checks to see if the device is being removed, and, if so, line 3 releases the lock and line 4 cancels the I/O, or takes whatever action is appropriate given that the device is to be removed. Otherwise, line 6 increments the access count, line 7 releases the lock, line 8 performs the I/O, and line 9 decrements the access count.

Quick Quiz 4.46: This is ridiculous! We are read-acquiring a reader-writer lock to update the counter? What are you playing at???

The code to remove the device might be as follows:

    1 write_lock(&mylock);
    2 removing = 1;
    3 sub_count(mybias);
    4 write_unlock(&mylock);
    5 while (read_count() != 0) {
    6   poll(NULL, 0, 1);
    7 }
    8 remove_device();

Line 1 write-acquires the lock and line 4 releases it. Line 2 notes that the device is being removed, line 3 subtracts the bias, and the loop spanning lines 5-7 waits for any I/O operations to complete.
Finally, line 8 does any additional processing needed to prepare for device removal.

Quick Quiz 4.47: What other issues would need to be accounted for in a real system?

4.6 Parallel Counting Discussion

This chapter has presented the reliability, performance, and scalability problems with traditional counting primitives. The C-language ++ operator is not guaranteed to function reliably in multithreaded code, and atomic operations to a single variable neither perform nor scale well. This chapter has also presented a number of counting algorithms that perform and scale extremely well in certain special cases.

Table 4.1 shows the performance of the three parallel statistical counting algorithms. All three algorithms provide perfect linear scalability for updates. The per-thread-variable implementation is significantly faster on updates than the array-based implementation, but is slower at reads, and suffers severe lock contention when there are many parallel readers. This contention can be addressed using techniques introduced in Chapter 8, as shown on the last row of Table 4.1.

Quick Quiz 4.48: On the count_stat.c row of Table 4.1, we see that the update side scales linearly with the number of threads. How is that possible given that the more threads there are, the more per-thread counters must be summed up?

Quick Quiz 4.49: Even on the last row of Table 4.1, the read-side performance of these statistical counter implementations is pretty horrible. So why bother with them?

Table 4.2 shows the performance of the parallel limit-counting algorithms.
Exact enforcement of the limits incurs a substantial performance penalty, although on the Power 5 system this penalty can be reduced by substituting read-side signals for update-side atomic operations. All of these implementations suffer from read-side lock contention in the face of concurrent readers.

Quick Quiz 4.50: Given the performance data shown in Table 4.2, we should always prefer update-side signals over read-side atomic operations, right?

Quick Quiz 4.51: Can advanced techniques be applied to address the lock contention for readers seen in Table 4.2?

The fact that these algorithms only work well in their respective special cases might be considered a major problem with parallel programming in gen-
