The C11 and C++11 Concurrency Model

Multi-copy atomicity

Some memory models, like x86 and SC, order all of the writes in the system by, for example, maintaining a shared memory as part of the state of the model. Other memory models are weaker and allow different threads to see writes across the system in different orders. Consider the following example, called IRIW+addrs for independent reads of independent writes with address dependencies:

    int x = 0; int y = 0;

    Thread 1    Thread 2    Thread 3              Thread 4
    x = 1;      y = 1;      r1 = y;               r3 = x;
                            r2 = *(&x+r1-r1);     r4 = *(&y+r3-r3);

In this test, there are two writing threads and two reading threads, and the question is whether the two reading threads can see the writes in different orders: can the values of r1, r2, r3 and r4 end up being 1/0/1/0? On the Power and ARM architectures, this outcome would be allowed by the thread-local speculation mechanism if the dependencies were removed. With the dependencies in place, an unoptimised machine-instruction version of the test probes whether the Power and ARM storage subsystem requires writes across the system to be observed in an order that is consistent across all threads, a property called multi-copy atomicity [3]. Neither Power nor ARM is multi-copy atomic, so the outcome 1/0/1/0 is allowed.

Cumulativity

Power and ARM provide additional guarantees that constrain the absence of multi-copy atomicity when dependencies are chained together across multiple threads. Consider the following example, called ISA2 [71], where the dependency to the read of x is extended by inserting a write and read to a new location, z, before the read of x. Note that there is still a chain of dependencies from the read of y to the read of x:

    int x = 0; int y = 0;

    Thread 1    Thread 2          Thread 3
    x = 1;      r1 = y;           r2 = z;
    lwsync;     z = 1+r1-r1;      r3 = *(&x+r2-r2);
    y = 1;

Here there are loads of y, z and x, into r1, r2 and r3 respectively. The outcome 1/1/0 is not visible on Power and ARM because the ordering guaranteed by the lwsync is extended through the following dependency chain.

Without the lwsync or the dependencies the relaxed behaviour 1/1/0 would be allowed: the writes on Thread 1 could be committed or propagated out of order, the instructions of Thread 2 or 3 could be committed out of order, or the writes of Threads 1 and 2 could be propagated out of order to Thread 3. With the addition of the barriers and dependencies, it is clear that Threads 1 and 2 must commit in order, and that Thread 1 must propagate its writes in order.

It is not yet obvious that the writes of Thread 1 must be propagated to Thread 3 before the write of Thread 2, however. A new guarantee called B-cumulativity provides ordering through executions with chained dependencies following an lwsync. In this example, B-cumulativity ensures that the store of x is propagated to Thread 3 before the store of z.

The example above shows that B-cumulativity extends ordering to the right of an lwsync. A-cumulativity extends ordering to the left. Consider the following example, called WRC+lwsync+addr for write-to-read causality [37] with an lwsync and an address dependency. Thread 1 writes x, and Thread 2 writes y:

    int x = 0; int y = 0;

    Thread 1    Thread 2    Thread 3
    x = 1;      r1 = x;     r2 = y;
                lwsync;     r3 = *(&x + r2 - r2);
                y = 1;

The lwsync and the address dependency prevent thread-local speculation from occurring on Threads 2 and 3, but there is so far nothing to force the write on Thread 1 to propagate to Thread 3 before the write of Thread 2, allowing the outcome 1/1/0 for the reads of x on Thread 2, y on Thread 3 and x on Thread 3. We define the group A writes to be those that have propagated to the thread of an lwsync at the point that it is executed. The write of x is in the group A of the lwsync in the execution of the program above. A-cumulativity requires group A writes to propagate to all threads before writes that follow the barrier in program order, guaranteeing that the outcome 1/1/0 is forbidden on Power and ARM architectures; it provides ordering to dependency chains to the left of an lwsync.

Note that in either case of cumulativity, if the dependencies were replaced by lwsyncs or syncs, then the ordering would still be guaranteed.

Load-linked store-conditional

The Power and ARM architectures provide load-linked (LL) and store-conditional (SC) instructions that allow the programmer to load from a location, and then store only if no other thread accessed the location in the interval between the two. These instructions allow the programmer to establish consensus as the global lock did in x86. The load-linked instruction is a load from memory that works in conjunction with a program-order-later store-conditional. The store-conditional has two possible outcomes: it can store to memory, or it may fail if the coherence commitment order is sufficiently unconstrained, allowing future steps of the abstract machine to place writes before it. On success, load-linked and store-conditional instructions atomically read and then write a location, and can be used to implement language features like compare-and-swap.
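As a language-level illustration of the last point, the following is a minimal sketch of a compare-and-swap retry loop written with C++11 atomics. It is not taken from the text: the helper name fetch_max and the chosen memory orders are assumptions made for this example. On Power and ARM, compilers typically implement compare_exchange_weak with a load-linked/store-conditional pair, and the weak form is permitted to fail spuriously, which mirrors the possibility that the store-conditional fails. This is why the operation is wrapped in a loop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the text): atomically raise `target` to at
// least `candidate`, returning the value observed before any update.
int fetch_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail either because another thread changed
    // `target`, or spuriously (for example when an underlying
    // store-conditional fails). On failure it writes the current value of
    // `target` into `observed`, so the loop condition is re-evaluated
    // against fresh data before retrying.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
        // Nothing to do here: `observed` has already been refreshed.
    }
    return observed;
}

// Single-threaded usage check for the sketch above.
void fetch_max_example() {
    std::atomic<int> high_water{3};
    int previous = fetch_max(high_water, 7);
    assert(previous == 3);
    assert(high_water.load() == 7);
}
```

The loop exits either when the exchange succeeds or when the observed value is already at least as large as the candidate, in which case no write is needed. Where a spurious failure would be awkward to handle, compare_exchange_strong can be used instead, at the cost of an internal retry loop on LL/SC architectures.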
Mark John Batty