Is Parallel Programming Hard, And, If So, What Can You Do About It?

CHAPTER 4. COUNTING

reordering the fastpath body to follow line 13, which permits any subsequent signal handlers to undertake theft. Line 14 again disables compiler reordering, and then line 15 checks to see if the signal handler deferred the theft state change to READY. If so, line 16 executes a memory barrier to ensure that any CPU that sees line 17 setting the state to READY also sees the effects of line 9. If the fastpath addition at line 9 was executed, then line 20 returns success. Otherwise, we fall through to the slowpath starting at line 21. The structure of the slowpath is similar to those of earlier examples, so its analysis is left as an exercise for the reader. Similarly, the structure of sub_count() on lines 38-71 is the same as that of add_count(), so the analysis of sub_count() is also left as an exercise for the reader, as is the analysis of read_count() in Figure 4.23.

```c
unsigned long read_count(void)
{
    int t;
    unsigned long sum;

    spin_lock(&gblcnt_mutex);
    sum = globalcount;
    for_each_thread(t)
        if (counterp[t] != NULL)
            sum += *counterp[t];
    spin_unlock(&gblcnt_mutex);
    return sum;
}
```
Figure 4.23: Signal-Theft Limit Counter Read Function

Lines 1-12 of Figure 4.24 show count_init(), which sets up flush_local_count_sig() as the signal handler for SIGUSR1, enabling the pthread_kill() calls in flush_local_count() to invoke flush_local_count_sig(). The code for thread registration and unregistration is similar to that of earlier examples, so its analysis is left as an exercise for the reader.

4.4.5 Signal-Theft Limit Counter Discussion

The signal-theft implementation runs more than twice as fast as the atomic implementation on my Intel Core Duo laptop.
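The count_init() paragraph above relies on ordinary POSIX signal plumbing: sigaction() installs a process-wide SIGUSR1 handler, and pthread_kill() aims the signal at one specific thread, so the handler runs in that thread's context. The following is a minimal sketch of only that request/acknowledge round trip, not the book's code. It assumes C11 atomics and POSIX threads, uses a single worker thread in place of the per-thread counters, performs no actual counter theft, and all of its names (THEFT_IDLE, THEFT_REQ, THEFT_ACK, flush_sig_sketch(), install_handler(), worker(), request_theft_demo()) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handshake states for a single worker thread. */
enum { THEFT_IDLE, THEFT_REQ, THEFT_ACK };

/* Assumes atomic_int is lock-free, which C11 lets a signal handler access. */
static atomic_int theft = THEFT_IDLE;
static atomic_int stop_worker = 0;

/* Runs in the context of whichever thread pthread_kill() targeted. */
static void flush_sig_sketch(int signo)
{
    (void)signo;
    if (atomic_load(&theft) == THEFT_REQ)
        atomic_store(&theft, THEFT_ACK);
}

/* Same shape as count_init(): install the SIGUSR1 handler. */
static void install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = flush_sig_sketch;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0) {
        perror("sigaction");
        exit(EXIT_FAILURE);
    }
}

/* Stand-in for a thread busy running counter fastpaths. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker))
        sched_yield();
    return NULL;
}

/* Returns 1 if the worker's signal handler acknowledged the request. */
int request_theft_demo(void)
{
    pthread_t tid;
    int acked;

    atomic_store(&theft, THEFT_IDLE);
    atomic_store(&stop_worker, 0);
    install_handler();
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;

    atomic_store(&theft, THEFT_REQ);         /* publish the request first */
    if (pthread_kill(tid, SIGUSR1) == 0) {   /* then signal the target thread */
        while (atomic_load(&theft) != THEFT_ACK)
            sched_yield();                   /* poll for the handler's ACK */
    }
    acked = atomic_load(&theft) == THEFT_ACK;

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return acked;
}
```

Calling request_theft_demo() from a thread other than the worker should return 1 once the worker's handler has run. The shared state uses lock-free C11 atomics because these, along with volatile sig_atomic_t, are among the few objects a signal handler may portably touch. The sender simply polls for the acknowledgement; a real limit counter would also have to handle many threads and a handler that interrupts the fastpath mid-update.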
Is the signal-theft implementation always preferable? It would be vastly preferable on Pentium-4 systems, given their slow atomic instructions, but the old 80386-based Sequent Symmetry systems would do much better with the shorter path length of the atomic implementation. If ultimate performance is of the essence, you will need to measure both implementations on the system that your application is to be deployed on. This is but one reason why high-quality APIs are so important: they permit implementations to be changed as required by ever-changing hardware performance characteristics.

```c
 1 void count_init(void)
 2 {
 3     struct sigaction sa;
 4
 5     sa.sa_handler = flush_local_count_sig;
 6     sigemptyset(&sa.sa_mask);
 7     sa.sa_flags = 0;
 8     if (sigaction(SIGUSR1, &sa, NULL) != 0) {
 9         perror("sigaction");
10         exit(-1);
11     }
12 }
13
14 void count_register_thread(void)
15 {
16     int idx = smp_thread_id();
17
18     spin_lock(&gblcnt_mutex);
19     counterp[idx] = &counter;
20     countermaxp[idx] = &countermax;
21     theftp[idx] = &theft;
22     spin_unlock(&gblcnt_mutex);
23 }
24
25 void count_unregister_thread(int nthreadsexpected)
26 {
27     int idx = smp_thread_id();
28
29     spin_lock(&gblcnt_mutex);
30     globalize_count();
31     counterp[idx] = NULL;
32     countermaxp[idx] = NULL;
33     theftp[idx] = NULL;
34     spin_unlock(&gblcnt_mutex);
35 }
```
Figure 4.24: Signal-Theft Limit Counter Initialization Functions

Quick Quiz 4.44: What if you want an exact limit counter to be exact only for its lower limit?

4.5 Applying Specialized Parallel Counters

Although the exact limit counter implementations in Section 4.4 can be very useful, they are not much help if the counter's value remains near zero at all times, as it might when counting the number of outstanding accesses to an I/O device. The high overhead of such near-zero counting is especially painful given that we normally don't care how many references there are.
As noted in the removable I/O device access-count problem on page 29, the number of accesses is irrelevant except in those rare cases when someone is actually trying to remove the device. One simple solution to this problem is to add a large “bias” (for example, one billion) to the counter in order to ensure that the value is far enough from
