10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Is Parallel Programming Hard, And, If So, What Can You Do About It?

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

5.2. DESIGN CRITERIA 53left-hand spinlock, line 45 left-enqueues the elementonto the left-hand queue, and finally line 46 releasesthe lock. The pdeq_enqueue_r() implementation(shown on lines 49-56) is quite similar.1 struct list_head *pdeq_dequeue_l(struct pdeq *d)2 {3 struct list_head *e;4 int i;56 spin_lock(&d->llock);7 e = deq_dequeue_l(&d->ldeq);8 if (e == NULL) {9 spin_lock(&d->rlock);10 e = deq_dequeue_l(&d->rdeq);11 list_splice_init(&d->rdeq.chain, &d->ldeq.chain);12 spin_unlock(&d->rlock);13 }14 spin_unlock(&d->llock);15 return e;16 }1718 struct list_head *pdeq_dequeue_r(struct pdeq *d)19 {20 struct list_head *e;21 int i;2223 spin_lock(&d->rlock);24 e = deq_dequeue_r(&d->rdeq);25 if (e == NULL) {26 spin_unlock(&d->rlock);27 spin_lock(&d->llock);28 spin_lock(&d->rlock);29 e = deq_dequeue_r(&d->rdeq);30 if (e == NULL) {31 e = deq_dequeue_r(&d->ldeq);32 list_splice_init(&d->ldeq.chain, &d->rdeq.chain);33 }34 spin_unlock(&d->llock);35 }36 spin_unlock(&d->rlock);37 return e;38 }3940 void pdeq_enqueue_l(struct list_head *e, struct pdeq *d)41 {42 int i;4344 spin_lock(&d->llock);45 deq_enqueue_l(e, &d->ldeq);46 spin_unlock(&d->llock);47 }4849 void pdeq_enqueue_r(struct list_head *e, struct pdeq *d)50 {51 int i;5253 spin_lock(&d->rlock);54 deq_enqueue_r(e, &d->rdeq);55 spin_unlock(&d->rlock);56 }Figure 5.11: Compound <strong>Parallel</strong> <strong>Do</strong>uble-EndedQueue Implementation5.1.2.5 <strong>Do</strong>uble-Ended Queue DiscussionThe compound implementation is somewhat morecomplex than the hashed variant presented in Section5.1.2.3,butisstillreasonablysimple. Ofcourse,a more intelligent rebalancing scheme could be arbitrarilycomplex, but the simple scheme shown herewill work well in a number of reasonable situations.The key point is that there can be significant overheadenqueueing to or dequeueing from a sharedqueue. <strong>It</strong> is therefore critically important that theoverhead of the typical unit of work be quite largecompared to the enqueue/dequeue overhead.5.1.3 Partitioning Example DiscussionThe optimal solution to the dining philosophersproblem given in the answer to the Quick Quiz inSection 5.1.1 is an excellent example of “horizontalparallelism”or“dataparallelism”. Thesynchronizationoverhead in this case is nearly (or even exactly)zero. In contrast, the double-ended queue implementationsare examples of “vertical parallelism” or“pipelining”, given that data moves from one threadto another. The tighter coordination required forpipelining in turn requires larger units of work toobtain a given level of efficiency.Quick Quiz 5.9: The tandem double-endedqueue runs about twice as fast as the hashed doubleendedqueue, even when I increase the size of thehash table to an insanely large number. Why isthat?Quick Quiz 5.10: <strong>Is</strong> there a significantly betterway of handling concurrency for double-endedqueues?These two examples show just how powerful partitioningcan be in devising parallel algorithms. However,these example beg for more and better designcriteria for parallel programs, a topic taken up in thenext section.5.2 Design CriteriaSection 1.2 called out the three parallelprogramminggoals of performance, productivity,and generality. However, more detailed design criteriaare required to actually produce a real-worlddesign, a task taken up in this section. This being

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!