10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?

46 CHAPTER 4. COUNTING

                                             Reads
Algorithm          Section  Updates   1 Core    64 Cores
count_stat.c       4.2.2    40.4 ns   220 ns    220 ns
count_end.c        4.2.4    6.7 ns    521 ns    205,000 ns
count_end_rcu.c    9.1      6.7 ns    481 ns    3,700 ns

Table 4.1: Statistical Counter Performance on Power 5

                                                        Reads
Algorithm           Section  Exact?  Updates   1 Core    64 Cores
count_lim.c         4.9      N       9.7 ns    517 ns    202,000 ns
count_lim_app.c     4.3.4    N       6.6 ns    520 ns    205,000 ns
count_lim_atomic.c  4.4.1    Y       56.1 ns   606 ns    166,000 ns
count_lim_sig.c     4.4.4    Y       17.5 ns   520 ns    205,000 ns

Table 4.2: Limit Counter Performance on Power 5

eral. After all, the C-language ++ operator works just fine in single-threaded code, and not just for special cases, but in general, right?

This line of reasoning does contain a grain of truth, but is in essence misguided. The problem is not parallelism as such, but rather scalability. To understand this, first consider the C-language ++ operator. The fact is that it does not work in general, only for a restricted range of numbers: those that fit in the fixed-width type of its operand. If you need to deal with 1,000-digit decimal numbers, the C-language ++ operator will not work for you.

Quick Quiz 4.52: The ++ operator works just fine for 1,000-digit numbers!!! Haven't you heard of operator overloading???

This problem is not specific to arithmetic. Suppose you need to store and query data. Should you use an ASCII file, XML, a relational database, a linked list, a dense array, a B-tree, a radix tree, or any of the plethora of other data structures and environments that permit data to be stored and queried? It depends on what you need to do, how fast you need it done, and how large your data set is.

Similarly, if you need to count, your solution will depend on how large the numbers you need to work with are, how many CPUs need to be manipulating a given number concurrently, how the number is to be used, and what level of performance and scalability you will need.

Nor is this problem specific to software.
The design for a bridge meant to allow people to walk across a small brook might be as simple as a plank thrown across the brook. But this solution of using a plank does not scale. You would probably not use a plank to span the kilometers-wide mouth of the Columbia River, nor would such a design be advisable for bridges carrying concrete trucks. In short, just as bridge design must change with increasing span and load, so must software design change as the number of CPUs increases.

The examples in this chapter have shown that an important tool permitting large numbers of CPUs to be brought to bear is partitioning, whether full partitioning, as in the statistical counters discussed in Section 4.2, or partial partitioning, as in the limit counters discussed in Sections 4.3 and 4.4. Partitioning will be considered in far greater depth in the next chapter.

Quick Quiz 4.53: But if we are going to have to partition everything, why bother with shared-memory multithreading? Why not just partition the problem completely and run as multiple processes, each in its own address space?
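To make the fully partitioned approach of the statistical counters concrete, here is a minimal C sketch under stated assumptions. It is loosely in the spirit of Section 4.2 but is not the book's count_stat.c. Everything named here is hypothetical: inc_count(), read_count(), run_demo(), MAX_THREADS, and the assumed 64-byte cache-line size. Each thread owns one counter slot and is that slot's only writer, so an update needs only a relaxed load and store rather than an atomic read-modify-write. C11 atomics are used solely so that concurrent readers do not race with the owner. A read sums every slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64  /* hypothetical upper bound on counting threads */
#define CACHE_LINE  64  /* assumed cache-line size in bytes */

/* One slot per thread, aligned so each slot occupies its own cache line. */
struct counter_slot {
	_Alignas(CACHE_LINE) _Atomic unsigned long count;
};

static struct counter_slot counters[MAX_THREADS];

/* Update: only thread tid writes slot tid, so no shared line is written. */
static void inc_count(int tid)
{
	unsigned long v = atomic_load_explicit(&counters[tid].count,
					       memory_order_relaxed);

	atomic_store_explicit(&counters[tid].count, v + 1,
			      memory_order_relaxed);
}

/* Read: sum all slots; approximate if updaters are still running. */
static unsigned long read_count(int nthreads)
{
	unsigned long sum = 0;

	for (int t = 0; t < nthreads; t++)
		sum += atomic_load_explicit(&counters[t].count,
					    memory_order_relaxed);
	return sum;
}

struct worker_arg {
	int tid;
	unsigned long n;
};

/* Each worker increments only its own slot n times. */
static void *worker(void *p)
{
	struct worker_arg *a = p;

	for (unsigned long i = 0; i < a->n; i++)
		inc_count(a->tid);
	return NULL;
}

/* Hypothetical demo: reset, run nthreads counters, join, then read. */
unsigned long run_demo(int nthreads, unsigned long per_thread)
{
	pthread_t th[MAX_THREADS];
	struct worker_arg args[MAX_THREADS];
	int rc;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	for (int t = 0; t < nthreads; t++)
		atomic_store_explicit(&counters[t].count, 0,
				      memory_order_relaxed);
	for (int t = 0; t < nthreads; t++) {
		args[t].tid = t;
		args[t].n = per_thread;
		rc = pthread_create(&th[t], NULL, worker, &args[t]);
		assert(rc == 0);
	}
	for (int t = 0; t < nthreads; t++) {
		rc = pthread_join(th[t], NULL);
		assert(rc == 0);
	}
	(void)rc;
	/* pthread_join() orders every worker's stores before this read. */
	return read_count(nthreads);
}
```

For example, run_demo(4, 100000) starts four threads that each count to 100,000 and then returns their combined total. The design choice is the usual partitioning trade-off. Updates never write a shared cache line, which lets them scale. In exchange, read_count() must visit every slot, and while threads are still counting it returns only an approximate value.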
