10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?



4.4. EXACT LIMIT COUNTERS

addition of line 7, which is now required to split out counter and countermax from counterandmax.

The code for flush_local_count(), which moves all threads’ local counter state to the global counter, is shown on lines 14-32. Line 22 checks to see if the value of globalreserve permits any per-thread counts, and, if not, line 23 returns. Otherwise, line 24 initializes local variable zero to a combined zeroed counter and countermax. The loop spanning lines 25-31 sequences through each thread. Line 26 checks to see if the current thread has counter state, and, if so, lines 27-30 move that state to the global counters. Line 27 atomically fetches the current thread’s state while replacing it with zero. Line 28 splits this state into its counter (in local variable c) and countermax (in local variable cm) components. Line 29 adds this thread’s counter to globalcount, while line 30 subtracts this thread’s countermax from globalreserve.

Quick Quiz 4.34: What stops a thread from simply refilling its counterandmax variable immediately after flush_local_count() on line 14 of Figure 4.18 empties it?

Quick Quiz 4.35: What prevents concurrent execution of the fastpath of either atomic_add() or atomic_sub() from interfering with the counterandmax variable while flush_local_count() is accessing it on line 27 of Figure 4.18?

Lines 34-54 show the code for balance_count(), which refills the calling thread’s local counterandmax variable.
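As a rough illustration of the flush_local_count() walkthrough above, the following is a minimal sketch rather than the code of Figure 4.18. It assumes C11 `<stdatomic.h>` in place of the atomic_t primitives, an unsigned packed word with the counter in the upper half and countermax in the lower half, and a hypothetical fixed thread limit `NR_THREADS`. The helper names `merge_counterandmax()` and `split_counterandmax()`, the `slots` array, and the reading of "globalreserve permits no per-thread counts" as `globalreserve == 0` are likewise assumptions made for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_THREADS 4                         /* hypothetical thread limit */
#define CM_BITS (sizeof(unsigned int) * 4)   /* half of the word for each field */
#define MAX_COUNTERMAX ((1u << CM_BITS) - 1)

static atomic_uint slots[NR_THREADS];        /* stand-in per-thread counterandmax storage */
static atomic_uint *counterp[NR_THREADS];    /* NULL when a thread has no counter state */
static unsigned long globalcount;            /* caller must hold the global lock */
static unsigned long globalreserve;          /* caller must hold the global lock */

/* Pack counter (upper half) and countermax (lower half) into one word. */
static unsigned int merge_counterandmax(unsigned int c, unsigned int cm)
{
    assert(c <= MAX_COUNTERMAX && cm <= MAX_COUNTERMAX);
    return (c << CM_BITS) | cm;
}

/* Unpack a combined word into its counter and countermax components. */
static void split_counterandmax(unsigned int cami,
                                unsigned int *c, unsigned int *cm)
{
    *c = (cami >> CM_BITS) & MAX_COUNTERMAX;
    *cm = cami & MAX_COUNTERMAX;
}

/* Move every thread's local state to the global counters. */
static void flush_local_count(void)
{
    unsigned int zero, old, c, cm;

    if (globalreserve == 0)                  /* assumed meaning of "no per-thread counts" */
        return;
    zero = merge_counterandmax(0, 0);
    for (int t = 0; t < NR_THREADS; t++) {
        if (counterp[t] == NULL)
            continue;                        /* this thread has no counter state */
        /* Fetch the thread's state and replace it with zero in one atomic step. */
        old = atomic_exchange(counterp[t], zero);
        split_counterandmax(old, &c, &cm);
        globalcount += c;                    /* thread's count joins the global count */
        globalreserve -= cm;                 /* thread's reservation is released */
    }
}
```

The single atomic exchange is what lets the flush take a consistent snapshot of both fields at once. The updates to the two global variables are plain additions and subtractions, so the sketch relies on the caller holding whatever lock guards them.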
The balance_count() function is quite similar to those of the preceding algorithms, with changes required to handle the merged counterandmax variable. Detailed analysis of the code is left as an exercise for the reader, as it is with the count_register_thread() function starting on line 56 and the count_unregister_thread() function starting on line 65.

Quick Quiz 4.36: Given that the atomic_set() primitive does a simple store to the specified atomic_t, how can line 53 of balance_count() in Figure 4.18 work correctly in the face of concurrent flush_local_count() updates to this variable?

4.4.2 Atomic Limit Counter Discussion

This is the first implementation that actually allows the counter to be run all the way to either of its limits, but it does so at the expense of adding atomic operations to the fastpaths, which slow down the fastpaths significantly. Although some workloads might tolerate this slowdown, it is worthwhile looking for algorithms with better read-side performance. One such algorithm uses a signal handler to steal counts from other threads. Because signal handlers run in the context of the signaled thread, atomic operations are not necessary, as shown in the next section.

[Figure 4.19: Signal-Theft State Machine. States: IDLE, REQ, ACK, READY. Transition labels: need flush, no count, counting, !counting, done counting, flushed.]

4.4.3 Signal-Theft Limit Counter Design

Figure 4.19 shows the state diagram. The state machine starts out in the IDLE state, and when add_count() or sub_count() finds that the combination of the local thread’s count and the global count cannot accommodate the request, the corresponding slowpath sets each thread’s theft state to REQ (unless that thread has no count, in which case it transitions directly to READY). Only the slowpath, which holds the gblcnt_mutex lock, is permitted to transition from the IDLE state, as indicated by the green color. The slowpath then sends a signal to each thread, and the corresponding signal handler checks the corresponding thread’s theft and counting variables.
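The slowpath's request step, together with the signal handler's decision spelled out in the text that follows, can be sketched as two pure transition functions. This is only an illustration of the state machine, not the book's implementation. The enum constants and the function names `slowpath_request()` and `handler_step()` are hypothetical, and signal delivery and locking are left out entirely.

```c
#include <assert.h>

/* The four states of the signal-theft state machine. */
enum theft_state { THEFT_IDLE, THEFT_REQ, THEFT_ACK, THEFT_READY };

/*
 * Slowpath step (the caller is assumed to hold the global lock).
 * Only the slowpath may leave IDLE: a thread holding some count is
 * asked to give it up (REQ), while a thread with no count goes
 * directly to READY.
 */
static enum theft_state slowpath_request(enum theft_state theft, int has_count)
{
    assert(theft == THEFT_IDLE);
    return has_count ? THEFT_REQ : THEFT_READY;
}

/*
 * Signal-handler step, run in the context of the signaled thread.
 * The handler may act only on REQ. If the thread's fastpath is in
 * progress (counting is nonzero), it acknowledges with ACK;
 * otherwise the thread's count is ready to be taken (READY).
 */
static enum theft_state handler_step(enum theft_state theft, int counting)
{
    if (theft != THEFT_REQ)
        return theft;                 /* not permitted to change the state */
    return counting ? THEFT_ACK : THEFT_READY;
}
```

Writing the transitions as functions from old state to new state keeps each actor's permissions visible in one place: the slowpath owns the exits from IDLE, and the handler owns the exits from REQ.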
If the theft state is not REQ, then the signal handler is not permitted to change the state, and therefore simply returns. Otherwise, if the counting variable is set, indicating that the current thread’s fastpath is in progress, the signal handler sets the theft state to ACK, otherwise to READY.

If the theft state is ACK, only the fastpath is permitted to change the theft state, as indicated by the blue color. When the fastpath completes, it
