10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


CHAPTER 4. COUNTING

4.2.3 Eventually Consistent Implementation

One way to retain update-side scalability while greatly improving read-side performance is to weaken consistency requirements. The counting algorithm in the previous section is guaranteed to return a value between the value that an ideal counter would have taken on near the beginning of read_count()'s execution and that near the end of read_count()'s execution. Eventual consistency [Vog09] provides a weaker guarantee: in the absence of calls to inc_count(), calls to read_count() will eventually return the correct answer.

We exploit eventual consistency by maintaining a global counter. However, updaters only manipulate their per-thread counters. A separate thread is provided to transfer counts from the per-thread counters to the global counter. Readers simply access the value of the global counter. If updaters are active, the value used by the readers will be out of date; however, once updates cease, the global counter will eventually converge on the true value, hence this approach qualifies as eventually consistent.

The implementation is shown in Figure 4.7 (count_stat_eventual.c). Lines 1-2 show the per-thread variable and the global variable that track the counter's value, and line 3 shows stopflag, which is used to coordinate termination (for the case where we want to terminate the program with an accurate counter value). The inc_count() function shown on lines 5-8 is identical to its counterpart in Figure 4.5. The read_count() function shown on lines 10-13 simply returns the value of the global_count variable.

However, the count_init() function on lines 34-42 creates the eventual() thread shown on lines 15-32, which cycles through all the threads, using the atomic_xchg() function to remove the count from each thread's local counter and adding the sum to the global_count variable. The eventual() thread waits an arbitrarily chosen one millisecond between passes.
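The exchange-and-fold pass described above can be sketched in portable C11. This is a minimal single-threaded illustration, not the book's code: it assumes `<stdatomic.h>` in place of the `atomic_t` primitives, and `NR_THREADS`, the `counter[]` array, the `self` parameter, and `drain_once()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_THREADS 4   /* hypothetical fixed thread count for this sketch */

/* Stand-ins for the per-thread counters and the global counter. */
static atomic_long counter[NR_THREADS];
static atomic_long global_count;

/* Updater side: touch only the caller's own slot. */
static void inc_count(int self)
{
    assert(self >= 0 && self < NR_THREADS);
    atomic_fetch_add(&counter[self], 1);
}

/*
 * One aggregation pass: atomically take each slot's value, leaving zero
 * behind, then fold the total into the global counter. Because the take
 * is a single atomic exchange, an increment either lands before it (and
 * is moved now) or after it (and is moved by a later pass).
 */
static long drain_once(void)
{
    long sum = 0;

    for (int t = 0; t < NR_THREADS; t++)
        sum += atomic_exchange(&counter[t], 0);
    atomic_fetch_add(&global_count, sum);
    return sum;
}

/* Reader side: a single load of the global counter. */
static long read_count(void)
{
    return atomic_load(&global_count);
}
```

Calling inc_count() three times leaves read_count() at 0 until drain_once() runs, after which it reads 3; a second drain_once() then finds nothing to move. That lag between update and visibility is the stale window that eventual consistency permits.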
The count_cleanup() function on lines 44-50 coordinates termination.

This approach gives extremely fast counter readout while still supporting linear counter-update performance. However, this excellent read-side performance and update-side scalability come at the cost of high update-side overhead, due to both the atomic operations and the array indexing hidden in the __get_thread_var() primitive, which can be quite expensive on some CPUs with deep pipelines.

Quick Quiz 4.16: Why does inc_count() in Figure 4.7 need to use atomic instructions?

     1 DEFINE_PER_THREAD(atomic_t, counter);
     2 atomic_t global_count;
     3 int stopflag;
     4
     5 void inc_count(void)
     6 {
     7   atomic_inc(&__get_thread_var(counter));
     8 }
     9
    10 unsigned long read_count(void)
    11 {
    12   return atomic_read(&global_count);
    13 }
    14
    15 void *eventual(void *arg)
    16 {
    17   int t;
    18   int sum;
    19
    20   while (stopflag < 3) {
    21     sum = 0;
    22     for_each_thread(t)
    23       sum += atomic_xchg(&per_thread(counter, t), 0);
    24     atomic_add(sum, &global_count);
    25     poll(NULL, 0, 1);
    26     if (stopflag) {
    27       smp_mb();
    28       stopflag++;
    29     }
    30   }
    31   return NULL;
    32 }
    33
    34 void count_init(void)
    35 {
    36   thread_id_t tid;
    37
    38   if (pthread_create(&tid, NULL, eventual, NULL) != 0) {
    39     perror("count_init:pthread_create");
    40     exit(-1);
    41   }
    42 }
    43
    44 void count_cleanup(void)
    45 {
    46   stopflag = 1;
    47   while (stopflag < 3)
    48     poll(NULL, 0, 1);
    49   smp_mb();
    50 }

Figure 4.7: Array-Based Per-Thread Eventually Consistent Counters
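The termination handshake is easier to follow in a version that can be compiled outside the book's support headers. The sketch below is an adaptation under stated assumptions, not the author's code: it substitutes C11 `<stdatomic.h>` operations (sequentially consistent by default, so the explicit smp_mb() calls are dropped) for the `atomic_t` API, makes `stopflag` atomic, indexes counters by an explicit thread number, and joins the aggregator thread. `MAX_THREADS`, `struct updater_arg`, `updater()`, and `run_demo()` are hypothetical names added for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 8                     /* hypothetical bound for this sketch */

static atomic_long counter[MAX_THREADS];  /* per-thread slots */
static atomic_long global_count;
static atomic_int stopflag;               /* 0: running, 1: stop requested, 3: done */
static pthread_t eventual_tid;

static void inc_count(int self) { atomic_fetch_add(&counter[self], 1); }
static long read_count(void) { return atomic_load(&global_count); }

/* Aggregator: drain every slot, fold into the global counter, sleep 1 ms. */
static void *eventual(void *arg)
{
    (void)arg;
    while (atomic_load(&stopflag) < 3) {
        long sum = 0;

        for (int t = 0; t < MAX_THREADS; t++)
            sum += atomic_exchange(&counter[t], 0);
        atomic_fetch_add(&global_count, sum);
        poll(NULL, 0, 1);
        /* Once a stop is requested, each pass advances the flag by one, */
        /* so a complete pass always starts after the request was seen.  */
        if (atomic_load(&stopflag))
            atomic_fetch_add(&stopflag, 1);
    }
    return NULL;
}

static void count_init(void)
{
    if (pthread_create(&eventual_tid, NULL, eventual, NULL) != 0) {
        perror("count_init:pthread_create");
        exit(EXIT_FAILURE);
    }
}

/* Request a stop, wait for the final pass to finish, then reap the thread. */
static void count_cleanup(void)
{
    atomic_store(&stopflag, 1);
    while (atomic_load(&stopflag) < 3)
        poll(NULL, 0, 1);
    pthread_join(eventual_tid, NULL);
}

struct updater_arg { int self; long nincs; };

static void *updater(void *p)
{
    struct updater_arg *a = p;

    for (long i = 0; i < a->nincs; i++)
        inc_count(a->self);
    return NULL;
}

/* Run nthreads updaters to completion, shut down, and return the final count. */
/* Intended to be called once per process: the globals are not reset.          */
static long run_demo(int nthreads, long nincs)
{
    pthread_t tids[MAX_THREADS];
    struct updater_arg args[MAX_THREADS];

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    count_init();
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct updater_arg){ .self = i, .nincs = nincs };
        if (pthread_create(&tids[i], NULL, updater, &args[i]) != 0) {
            perror("run_demo:pthread_create");
            exit(EXIT_FAILURE);
        }
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    count_cleanup();
    return read_count();
}
```

Because every updater is joined before count_cleanup() sets the flag, and the aggregator completes at least one full pass after observing it, run_demo(4, 100000) should return 400000 when built with pthread support (for example `cc -pthread`). Counting the flag up to 3 rather than stopping at 2 appears to be what guarantees that final pass: the request may arrive in the middle of a pass, which therefore cannot be trusted to have seen every slot's last increment.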
