10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?




CHAPTER 3. TOOLS OF THE TRADE

 1 pthread_rwlock_t rwl = PTHREAD_RWLOCK_INITIALIZER;
 2 int holdtime = 0;
 3 int thinktime = 0;
 4 long long *readcounts;
 5 int nreadersrunning = 0;
 6
 7 #define GOFLAG_INIT 0
 8 #define GOFLAG_RUN 1
 9 #define GOFLAG_STOP 2
10 char goflag = GOFLAG_INIT;
11
12 void *reader(void *arg)
13 {
14   int i;
15   long long loopcnt = 0;
16   long me = (long)arg;
17
18   __sync_fetch_and_add(&nreadersrunning, 1);
19   while (ACCESS_ONCE(goflag) == GOFLAG_INIT) {
20     continue;
21   }
22   while (ACCESS_ONCE(goflag) == GOFLAG_RUN) {
23     if (pthread_rwlock_rdlock(&rwl) != 0) {
24       perror("pthread_rwlock_rdlock");
25       exit(-1);
26     }
27     for (i = 1; i < holdtime; i++) {
28       barrier();
29     }
30     if (pthread_rwlock_unlock(&rwl) != 0) {
31       perror("pthread_rwlock_unlock");
32       exit(-1);
33     }
34     for (i = 1; i < thinktime; i++) {
35       barrier();
36     }
37     loopcnt++;
38   }
39   readcounts[me] = loopcnt;
40   return NULL;
41 }

Figure 3.9: Measuring Reader-Writer Lock Scalability

[Figure 3.10: Reader-Writer Lock Scalability. Plot of Critical Section Performance (y-axis) against Number of CPUs (Threads) (x-axis, 0 to 140), with an "ideal" trace and traces labeled 1K, 10K, 100K, 1M, 10M, and 100M.]

the compiler to fetch goflag on each pass through the loop; the compiler would otherwise be within its rights to assume that the value of goflag would never change.

The loop spanning lines 22-38 carries out the performance test. Lines 23-26 acquire the lock, lines 27-29 hold the lock for the specified duration (and the barrier() directive prevents the compiler from optimizing the loop out of existence), lines 30-33 release the lock, and lines 34-36 wait for the specified duration before re-acquiring the lock. Line 37 counts this lock acquisition.

Line 39 moves the lock-acquisition count to this thread’s element of the readcounts[] array, and line 40 returns, terminating this thread.

Figure 3.10 shows the results of running this test on a 64-core Power-5 system with two hardware threads per core for a total of 128 software-visible CPUs.
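The listing in Figure 3.9 relies on ACCESS_ONCE() and barrier() without showing how they are defined. The following is a minimal sketch of one common way to write them with GCC or Clang extensions; the macro bodies are an assumption, not taken from this excerpt, and the helper functions spin(), poll_goflag(), and start_run() are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Assumed definitions (GCC/Clang extensions); the excerpt uses both
 * macros but does not show them. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
#define barrier() __asm__ __volatile__("" : : : "memory")

#define GOFLAG_INIT 0
#define GOFLAG_RUN  1

static char goflag = GOFLAG_INIT;

/* Hypothetical helper: execute n loop passes. barrier() emits no
 * instructions, but the compiler may not delete it, so the
 * otherwise-empty loop cannot be optimized away. */
static long spin(long n)
{
	long i;
	long passes = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		barrier();
		passes++;
	}
	return passes;
}

/* Hypothetical helper: poll goflag at most max_polls times and return
 * how many polls saw GOFLAG_INIT. The volatile cast inside
 * ACCESS_ONCE() forces a fresh load of goflag on every pass. */
static long poll_goflag(long max_polls)
{
	long polls = 0;

	while (polls < max_polls && ACCESS_ONCE(goflag) == GOFLAG_INIT)
		polls++;
	return polls;
}

/* Hypothetical helper: what a test driver might do to start the run. */
static void start_run(void)
{
	ACCESS_ONCE(goflag) = GOFLAG_RUN;
}
```

The bounded poll count is there only so the sketch can be exercised from a single thread; the reader() function in Figure 3.9 spins without a bound because another thread is expected to change goflag.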
The thinktime parameter was zero for all these tests, and the holdtime parameter was set to values ranging from one thousand (“1K” on the graph) to 100 million (“100M” on the graph). The actual value plotted is:

$$\frac{L_N}{N L_1} \tag{3.1}$$

where $N$ is the number of threads, $L_N$ is the number of lock acquisitions by $N$ threads, and $L_1$ is the number of lock acquisitions by a single thread. Given ideal hardware and software scalability, this value
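As a rough illustration of Equation 3.1, the sketch below sums the per-thread entries of a readcounts[] array to obtain $L_N$ and divides by $N L_1$. The function name normalized_throughput() and its parameters are hypothetical, and the sketch assumes that $L_1$ was measured in a separate single-threaded run of the same duration.

```c
#include <assert.h>

/* Hypothetical helper computing Equation 3.1: L_N / (N * L_1).
 * readcounts[] holds one lock-acquisition count per thread, and
 * l1 is the count from a separate single-threaded run (assumed). */
static double normalized_throughput(int nthreads,
				    const long long *readcounts,
				    long long l1)
{
	long long ln = 0;	/* L_N: total acquisitions by all threads */
	int i;

	assert(nthreads > 0 && l1 > 0);
	for (i = 0; i < nthreads; i++)
		ln += readcounts[i];
	return (double)ln / ((double)nthreads * (double)l1);
}
```

For example, if four threads each complete as many acquisitions as the single-threaded run did, the function returns 1.0; if two threads each complete only half as many, it returns 0.5.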
