10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


type. In other words, these lockless algorithms can tolerate a given data element being freed and reallocated as the same type of structure while they are referencing it, but must prohibit a change in type. This guarantee, called “type-safe memory” in academic literature [GC96], is weaker than the existence guarantees in the previous section, and is therefore quite a bit harder to work with. Type-safe memory algorithms in the Linux kernel make use of slab caches, specially marking these caches with SLAB_DESTROY_BY_RCU so that RCU is used when returning a freed-up slab to system memory. This use of RCU guarantees that any in-use element of such a slab will remain in that slab, thus retaining its type, for the duration of any pre-existing RCU read-side critical sections.

Quick Quiz 8.20: But what if there is an arbitrarily long series of RCU read-side critical sections in multiple threads, so that at any point in time there is at least one thread in the system executing in an RCU read-side critical section? Wouldn’t that prevent any data from a SLAB_DESTROY_BY_RCU slab ever being returned to the system, possibly resulting in OOM events?

These algorithms typically use a validation step that checks to make sure that the newly referenced data structure really is the one that was requested [LS86, Section 2.5]. These validation checks require that portions of the data structure remain untouched by the free-reallocate process. Such validation checks are usually very hard to get right, and can hide subtle and difficult bugs.

Therefore, although type-safety-based lockless algorithms can be extremely helpful in a very few difficult situations, you should instead use existence guarantees where possible. Simpler is, after all, almost always better!

8.3.2.7 RCU is a Way of Waiting for Things to Finish

As noted in Section 8.3.1, an important component of RCU is a way of waiting for RCU readers to finish. One of RCU’s great strengths is that it allows you to wait for each of thousands of different things to finish without having to explicitly track each and every one of them, and without having to worry about the performance degradation, scalability limitations, complex deadlock scenarios, and memory-leak hazards that are inherent in schemes that use explicit tracking.

In this section, we will show how synchronize_sched()’s read-side counterparts (which include anything that disables preemption, along with hardware operations and primitives that disable irq) permit you to implement interactions with non-maskable interrupt (NMI) handlers that would be quite difficult if using locking. This approach has been called “Pure RCU” [McK04], and it is used in a number of places in the Linux kernel.

The basic form of such “Pure RCU” designs is as follows:

1. Make a change, for example, to the way that the OS reacts to an NMI.

2. Wait for all pre-existing read-side critical sections to completely finish (for example, by using the synchronize_sched() primitive). The key observation here is that subsequent RCU read-side critical sections are guaranteed to see whatever change was made.

3. Clean up, for example, return status indicating that the change was successfully made.

The remainder of this section presents example code adapted from the Linux kernel. In this example, the nmi_stop() function uses synchronize_sched() to ensure that all in-flight NMI notifications have completed before freeing the associated resources. A simplified version of this code is shown in Figure 8.25.

     1 struct profile_buffer {
     2   long size;
     3   atomic_t entry[0];
     4 };
     5 static struct profile_buffer *buf = NULL;
     6
     7 void nmi_profile(unsigned long pcvalue)
     8 {
     9   struct profile_buffer *p = rcu_dereference(buf);
    10
    11   if (p == NULL)
    12     return;
    13   if (pcvalue >= p->size)
    14     return;
    15   atomic_inc(&p->entry[pcvalue]);
    16 }
    17
    18 void nmi_stop(void)
    19 {
    20   struct profile_buffer *p = buf;
    21
    22   if (p == NULL)
    23     return;
    24   rcu_assign_pointer(buf, NULL);
    25   synchronize_sched();
    26   kfree(p);
    27 }

Figure 8.25: Using RCU to Wait for NMIs to Finish

Lines 1-4 define a profile_buffer structure, containing a size and an indefinite array of entries. Line 5 defines a pointer to a profile buffer, which is presumably initialized elsewhere to point to a dynamically allocated region of memory.
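
Figure 8.25 does not show how the buffer behind buf comes to exist. The following is a minimal userspace sketch of one way a structure with a size field and a trailing indefinite array could be allocated and then updated with a bounds check like the one in nmi_profile(). It is an illustration only, not the kernel's code: it assumes C11 atomics in place of the kernel's atomic_t, uses malloc() in place of a kernel allocator, and the helper names profile_buffer_alloc() and profile_buffer_hit() are hypothetical. It deliberately leaves out the RCU publication and synchronize_sched() steps, which have no direct userspace equivalent in the standard library.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the structure on lines 1-4 of Figure 8.25. */
struct profile_buffer {
	long size;            /* number of elements in entry[] */
	atomic_long entry[];  /* C99 flexible array member */
};

/*
 * Hypothetical helper: allocate one region holding the header plus
 * nentries counters, all initialized to zero.
 * Returns NULL on a non-positive count or allocation failure.
 */
static struct profile_buffer *profile_buffer_alloc(long nentries)
{
	struct profile_buffer *p;

	if (nentries <= 0)
		return NULL;
	p = malloc(sizeof(*p) + (size_t)nentries * sizeof(p->entry[0]));
	if (p == NULL)
		return NULL;
	p->size = nentries;
	for (long i = 0; i < nentries; i++)
		atomic_init(&p->entry[i], 0);
	return p;
}

/*
 * Hypothetical helper mirroring the NULL check, bounds check, and
 * atomic increment of nmi_profile().
 * Returns 1 if a counter was incremented, 0 otherwise.
 */
static int profile_buffer_hit(struct profile_buffer *p, unsigned long pcvalue)
{
	if (p == NULL || pcvalue >= (unsigned long)p->size)
		return 0;
	atomic_fetch_add(&p->entry[pcvalue], 1);
	return 1;
}
```

A caller would obtain a buffer with profile_buffer_alloc(), record samples with profile_buffer_hit(), and release the buffer with free(). In the kernel setting of Figure 8.25, the release must additionally wait for in-flight NMI handlers, which is exactly the job of synchronize_sched() in nmi_stop().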
