10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?

APPENDIX F. ANSWERS TO QUICK QUIZZES

provide multiple counters is left as an exercise to the reader.

Quick Quiz 4.16:
Why does inc_count() in Figure 4.7 need to use atomic instructions?

Answer:
If non-atomic instructions were used, counts could be lost.

Quick Quiz 4.17:
Won't the single global thread in the function eventual() of Figure 4.7 be just as severe a bottleneck as a global lock would be?

Answer:
In this case, no. What will happen instead is that the estimate of the counter value returned by read_count() will become more inaccurate.

Quick Quiz 4.18:
Won't the estimate returned by read_count() in Figure 4.7 become increasingly inaccurate as the number of threads rises?

Answer:
Yes. If this proves problematic, one fix is to provide multiple eventual() threads, each covering its own subset of the other threads. In even more extreme cases, a tree-like hierarchy of eventual() threads might be required.

Quick Quiz 4.19:
Why do we need an explicit array to find the other threads' counters? Why doesn't gcc provide a per_thread() interface, similar to the Linux kernel's per_cpu() primitive, to allow threads to more easily access each others' per-thread variables?

Answer:
Why indeed?

To be fair, gcc faces some challenges that the Linux kernel gets to ignore. When a user-level thread exits, its per-thread variables all disappear, which complicates the problem of per-thread-variable access, particularly before the advent of user-level RCU.
In contrast, in the Linux kernel, when a CPU goes offline, that CPU's per-CPU variables remain mapped and accessible.

Similarly, when a new user-level thread is created, its per-thread variables suddenly come into existence. In contrast, in the Linux kernel, all per-CPU variables are mapped and initialized at boot time, regardless of whether the corresponding CPU exists yet, or indeed, whether the corresponding CPU will ever exist.

A key limitation that the Linux kernel imposes is a compile-time maximum limit on the number of CPUs, namely, CONFIG_NR_CPUS. In contrast, in user space, there is no hard-coded upper limit on the number of threads.

Of course, both environments must deal with dynamically loaded code (dynamic libraries in user space, kernel modules in the Linux kernel), which increases the complexity of per-thread variables in both environments.

These complications make it significantly harder for user-space environments to provide access to other threads' per-thread variables. Nevertheless, such access is highly useful, and it is hoped that it will someday appear.

Quick Quiz 4.20:
Why on earth do we need something as heavyweight as a lock guarding the summation in the function read_count() in Figure 4.8?

Answer:
Remember, when a thread exits, its per-thread variables disappear. Therefore, if we attempt to access a given thread's per-thread variables after that thread exits, we will get a segmentation fault. The lock coordinates summation and thread exit, preventing this scenario.

Of course, we could instead read-acquire a reader-writer lock, but Chapter 8 will introduce even lighter-weight mechanisms for implementing the required coordination.

Quick Quiz 4.21:
Why on earth do we need to acquire the lock in count_register_thread() in Figure 4.8???
It is a single properly aligned machine-word store to a location that no other thread is modifying, so it should be atomic anyway, right?

Answer:
This lock could in fact be omitted, but better safe than sorry, especially given that this function is executed only at thread startup, and is therefore
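
The coordination discussed in Quick Quizzes 4.20 and 4.21 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code of Figure 4.8, which is not reproduced here. MAX_THREADS, the idx argument, the name count_unregister_thread(), and the worker() function are hypothetical, and the sketch assumes gcc or clang for the __thread storage class and the __atomic builtins.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 16  /* hypothetical fixed limit, for this sketch only */

static __thread unsigned long counter;        /* per-thread counter */
static unsigned long *counterp[MAX_THREADS];  /* explicit array of pointers */
static unsigned long finalcount;              /* counts left by exited threads */
static pthread_mutex_t final_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Fast path: only the owning thread ever writes its counter. */
void inc_count(void)
{
	__atomic_store_n(&counter, counter + 1, __ATOMIC_RELAXED);
}

/* Publish this thread's counter. The lock is taken "better safe than
 * sorry", which is cheap because this runs once per thread. */
void count_register_thread(int idx)
{
	assert(idx >= 0 && idx < MAX_THREADS);
	pthread_mutex_lock(&final_mutex);
	counterp[idx] = &counter;
	pthread_mutex_unlock(&final_mutex);
}

/* Fold this thread's count into finalcount and withdraw the pointer
 * before the per-thread variable disappears at thread exit. */
void count_unregister_thread(int idx)
{
	assert(idx >= 0 && idx < MAX_THREADS);
	pthread_mutex_lock(&final_mutex);
	finalcount += counter;
	counterp[idx] = NULL;
	pthread_mutex_unlock(&final_mutex);
}

/* Sum all counters. Holding the lock keeps any registered thread from
 * completing its unregistration (and thus exiting) mid-summation. */
unsigned long read_count(void)
{
	unsigned long sum;

	pthread_mutex_lock(&final_mutex);
	sum = finalcount;
	for (int t = 0; t < MAX_THREADS; t++)
		if (counterp[t] != NULL)
			sum += __atomic_load_n(counterp[t], __ATOMIC_RELAXED);
	pthread_mutex_unlock(&final_mutex);
	return sum;
}

/* Hypothetical worker: register, count 1000 events, unregister. */
void *worker(void *arg)
{
	int idx = (int)(intptr_t)arg;

	count_register_thread(idx);
	for (int i = 0; i < 1000; i++)
		inc_count();
	count_unregister_thread(idx);
	return NULL;
}
```

In this sketch a thread must call count_unregister_thread() before it exits. Because both that function and read_count() hold final_mutex, read_count() never dereferences a pointer to a per-thread variable that has already disappeared.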