10.07.2015 Views

Is Parallel Programming Hard, And, If So, What Can You Do About It?


8.3. READ-COPY UPDATE (RCU)

pens before the initialization of p's fields, then concurrent readers could see the uninitialized values. Memory barriers are required to keep things ordered, but memory barriers are notoriously difficult to use. We therefore encapsulate them into a primitive rcu_assign_pointer() that has publication semantics. The last four lines would then be as follows:

    p->a = 1;
    p->b = 2;
    p->c = 3;
    rcu_assign_pointer(gp, p);

The rcu_assign_pointer() would publish the new structure, forcing both the compiler and the CPU to execute the assignment to gp after the assignments to the fields referenced by p.

However, it is not sufficient to only enforce ordering at the updater, as the reader must enforce proper ordering as well. Consider for example the following code fragment:

    p = gp;
    if (p != NULL) {
        do_something_with(p->a, p->b, p->c);
    }

Although this code fragment might well seem immune to misordering, unfortunately, the DEC Alpha CPU [McK05a, McK05b] and value-speculation compiler optimizations can, believe it or not, cause the values of p->a, p->b, and p->c to be fetched before the value of p.
This is perhaps easiest to see in the case of value-speculation compiler optimizations, where the compiler guesses the value of p, fetches p->a, p->b, and p->c, then fetches the actual value of p in order to check whether its guess was correct. This sort of optimization is quite aggressive, perhaps insanely so, but does actually occur in the context of profile-driven optimization.

Clearly, we need to prevent this sort of skullduggery on the part of both the compiler and the CPU. The rcu_dereference() primitive uses whatever memory-barrier instructions and compiler directives are required for this purpose:

    rcu_read_lock();
    p = rcu_dereference(gp);
    if (p != NULL) {
        do_something_with(p->a, p->b, p->c);
    }
    rcu_read_unlock();

The rcu_dereference() primitive can thus be thought of as subscribing to a given value of the specified pointer, guaranteeing that subsequent dereference operations will see any initialization that occurred before the corresponding publish (rcu_assign_pointer()) operation. The rcu_read_lock() and rcu_read_unlock() calls are absolutely required: they define the extent of the RCU read-side critical section. Their purpose is explained in Section 8.3.1.2; however, they never spin or block, nor do they prevent the list_add_rcu() from executing concurrently. In fact, in non-CONFIG_PREEMPT kernels, they generate absolutely no code.

[Figure 8.6: Linux Circular Linked List (nodes A, B, C with next and prev pointers)]

[Figure 8.7: Linux Linked List Abbreviated (nodes A, B, C)]

    struct foo {
        struct list_head list;  /* embedded list node, so &p->list is a struct list_head * */
        int a;
        int b;
        int c;
    };
    LIST_HEAD(head);

    /* . . . */

    p = kmalloc(sizeof(*p), GFP_KERNEL);
    p->a = 1;
    p->b = 2;
    p->c = 3;
    list_add_rcu(&p->list, &head);

Figure 8.8: RCU Data Structure Publication

Although rcu_assign_pointer() and rcu_dereference() can in theory be used to construct any conceivable RCU-protected data structure, in practice it is often better to use higher-level constructs.
Therefore, the rcu_assign_pointer() and rcu_dereference() primitives have been embedded in special RCU variants of Linux’s list-manipulation API. Linux has two variants of doubly linked list, the circular struct list_head and the linear struct hlist_head/struct hlist_node pair. The former is laid out as shown in Figure 8.6, where the green boxes represent the list header and the blue boxes represent the elements in the list. This notation is cumbersome, and will therefore be abbreviated as shown in Figure 8.7.
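
The publish and subscribe pairing described in this section can be approximated outside the kernel. The following is a minimal user-space sketch, not the kernel's implementation: it assumes C11 atomics, using a release store as a stand-in for rcu_assign_pointer() and an acquire load as a (stronger than strictly necessary) stand-in for rcu_dereference(). The function names publish_foo() and read_foo_sum() are hypothetical, and the sketch deliberately has no counterpart to rcu_read_lock() and rcu_read_unlock(), so it never frees a published structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct foo {
    int a;
    int b;
    int c;
};

/* Shared pointer; static storage, so it starts out NULL (nothing published). */
static _Atomic(struct foo *) gp;

/*
 * Updater: initialize every field, then publish the pointer.
 * The release store keeps the field stores from being reordered
 * after the store to gp (the role rcu_assign_pointer() plays in the text).
 * Hypothetical helper; any previously published structure is leaked
 * because this sketch has no safe way to know when readers are done with it.
 */
bool publish_foo(int a, int b, int c)
{
    struct foo *p = malloc(sizeof(*p));

    if (p == NULL)
        return false;
    p->a = a;
    p->b = b;
    p->c = c;
    atomic_store_explicit(&gp, p, memory_order_release);
    return true;
}

/*
 * Reader: load the pointer exactly once, then dereference that copy.
 * The acquire load guarantees the reader sees the initialization that
 * preceded the matching release store (the role rcu_dereference() plays).
 * Returns false if nothing has been published yet.
 */
bool read_foo_sum(int *sum)
{
    struct foo *p;

    assert(sum != NULL);
    p = atomic_load_explicit(&gp, memory_order_acquire);
    if (p == NULL)
        return false;
    *sum = p->a + p->b + p->c;
    return true;
}
```

A caller would invoke publish_foo(1, 2, 3) from the updating thread and read_foo_sum(&sum) from any number of reader threads. In kernel code the same pattern would use rcu_assign_pointer() and rcu_dereference() directly, with the read wrapped in rcu_read_lock() and rcu_read_unlock().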
