Is Parallel Programming Hard, And, If So, What Can You Do About It?

5.3 SYNCHRONIZATION GRANULARITY

[Figure 5.20: Data Locking and Skew]

5.3.5 Locking Granularity and Performance

This section looks at locking granularity and performance from a mathematical synchronization-efficiency viewpoint. Readers who are uninspired by mathematics might choose to skip this section.

The approach is to use a crude queueing model for the efficiency of a synchronization mechanism that operates on a single shared global variable, based on an M/M/1 queue. M/M/1 queueing models are based on an exponentially distributed "inter-arrival rate" λ and an exponentially distributed "service rate" µ. The inter-arrival rate λ can be thought of as the average number of synchronization operations per second that the system would process if the synchronization were free; in other words, λ is an inverse measure of the overhead of each non-synchronization unit of work. For example, if each unit of work was a transaction, and if each transaction took one millisecond to process (not counting synchronization overhead), then λ would be 1,000 transactions per second.

The service rate µ is defined similarly, but as the average number of synchronization operations per second that the system would process if the overhead of each transaction was zero, ignoring the fact that CPUs must wait on each other to complete their increment operations; in other words, µ can be roughly thought of as the synchronization overhead in the absence of contention. For example, some recent computer systems are able to do an atomic increment every 25 nanoseconds or so if all CPUs are doing atomic increments in a tight loop.[7] The value of µ is therefore about 40,000,000 atomic increments per second.

Of course, the value of λ increases with increasing numbers of CPUs, as each CPU is capable of processing transactions independently (again, ignoring synchronization):

    λ = n λ0                                  (5.1)

where n is the number of CPUs and λ0 is the transaction-processing capability of a single CPU. Note that the expected time for a single CPU to execute a single transaction is 1/λ0.

Because the CPUs have to "wait in line" behind each other to get their chance to increment the single shared variable, we can use the M/M/1 queueing-model expression for the expected total waiting time:

    T = 1 / (µ − λ)                           (5.2)

Substituting the above value of λ:

    T = 1 / (µ − n λ0)                        (5.3)

Now, the efficiency is just the ratio of the time required to process a transaction in the absence of synchronization to the time required including synchronization:

    e = (1/λ0) / (T + 1/λ0)                   (5.4)

Substituting the above value for T and simplifying:

    e = (µ/λ0 − n) / (µ/λ0 − (n − 1))         (5.5)

But the value of µ/λ0 is just the ratio of the time required to process the transaction (absent synchronization overhead) to that of the synchronization overhead itself (absent contention). If we call this ratio f, we have:

    e = (f − n) / (f − (n − 1))               (5.6)

Figure 5.21 plots the synchronization efficiency e as a function of the number of CPUs/threads n for a few values of the overhead ratio f. For example, again using the 25-nanosecond atomic increment, the f = 10 line corresponds to each CPU attempting an atomic increment every 250 nanoseconds.

[7] Of course, if there are 8 CPUs, each CPU must wait 175 nanoseconds for each of the other CPUs to do its increment before consuming an additional 25 nanoseconds doing its own increment.
