Using Kilim's isolation types for multicore efficiency
UNIVERSITY OF CAMBRIDGE<br />
Using Kilim’s isolation types for multicore efficiency<br />
Alan Mycroft (∗)<br />
Computer Laboratory, University of Cambridge<br />
http://www.cl.cam.ac.uk/users/am/<br />
5 October 2011<br />
(∗) I gratefully acknowledge the contributions of Sriram Srinivasan [Kilim,<br />
ECOOP’2008] and Robert Ennals and Richard Sharp [PacLang, ESOP’2004].<br />
Using Kilim’s isolation types for multicore efficiency, FoVeOOS’2011
AM’s abstract<br />
We identify a ‘memory isolation’ property which enables multi-core<br />
programs to avoid slowdown due to cache contention.<br />
We give a tutorial on existing work on Kilim and its isolation-type<br />
system, building bridges with both substructural types and memory<br />
isolation.<br />
The big picture<br />
• Traditional object-oriented mode of thinking:<br />
– everything is an object<br />
– any object can reference (or even alias) any other, limited<br />
only by type<br />
Spaghetti data!<br />
• Modern hardware isn’t like this:<br />
– memory isn’t uniform (or sequentially consistent)<br />
– multi-core parallelism<br />
• Spaghetti is bad for software engineering too<br />
• Programming languages can help [PacLang, Singularity, Kilim]<br />
What’s Kilim, and what’s new?<br />
• Kilim is a message-passing (actor) framework for Java with<br />
ultra-lightweight threads and zero-copy message passing<br />
[ECOOP’2008].<br />
• A type system constrains pointer aliasing (using Java<br />
annotations) to make message passing (pass-by-value) and<br />
pass-by-reference equivalent.<br />
• What’s new? No single idea, but the combination of ideas gives<br />
an effective language design point.<br />
• This talk highlights issues caused by current multi-core<br />
processors.<br />
Why does multi-core affect things?<br />
• We want more than “call and wait for result”; otherwise we’re<br />
only using one CPU. So: threads, spawn . . .<br />
• Threads, locking, races, sequentially-inconsistent memory . . .<br />
• Caches mean that where computations are done greatly affects<br />
speed (doing all computations on one CPU is clearly bad).<br />
Method placement: by user or by system?<br />
• Communication costs, heterogeneous multi-core: placement is a<br />
big issue.<br />
How do programming languages help?<br />
• Specify invariants which analysers may find hard to discover<br />
• Provide compiler-enforced invariants<br />
• Examples: types, “this method always executes on processor 42”.<br />
• Needs careful design: types are ubiquitously accepted but<br />
“always execute on processor 42” is likely to be too low-level and<br />
inflexible during system evolution.<br />
• This talk: type-like invariants saying “no other thread has an<br />
alias to my data”.<br />
Talk structure<br />
• an example<br />
• modern hardware<br />
• sub-structural type systems, ownership<br />
• Kilim<br />
Example banking code<br />
void checktotal(AccountList p, int expected)<br />
{ int sum = 0;<br />
  for (i in p) sum += i.balance;<br />
  if (sum != expected) report_oddity();<br />
}<br />
void movemoney(AccountList p, int amount)<br />
{ p.first.balance -= amount;<br />
  p.second.balance += amount;<br />
}<br />
Questions:<br />
1. when can movemoney() and checktotal() be run in parallel?<br />
2. when can movemoney() be run in parallel with itself?<br />
3. Locking? Do you know how much this costs?<br />
It helps if compilers understand aliasing.<br />
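The questions above hinge on whether two calls can reach the same account object. The following is a minimal sketch in plain Java, not code from the talk: the `Account` and `AliasCheck` classes and the `mayInterfere` helper are hypothetical names introduced for illustration. It tests aliasing at run time by reference identity, which is the fact a compiler would need to establish statically.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical classes for illustration; the slide only shows pseudocode.
final class Account {
    int balance;
    Account(int balance) { this.balance = balance; }
}

final class AliasCheck {
    // True if some Account object occurs in both lists (compared by identity, not equals).
    static boolean mayInterfere(List<Account> p, List<Account> q) {
        Set<Account> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(p);
        for (Account a : q) {
            if (seen.contains(a)) return true;
        }
        return false;
    }

    // movemoney from the slide, acting on the first two accounts of the list.
    static void moveMoney(List<Account> p, int amount) {
        p.get(0).balance -= amount;
        p.get(1).balance += amount;
    }

    // The sum that checktotal compares against its expected value.
    static int total(List<Account> p) {
        int sum = 0;
        for (Account a : p) sum += a.balance;
        return sum;
    }
}
```

When `mayInterfere` is false, two `moveMoney` calls touch disjoint objects and need no locking. The run-time test costs time linear in the list sizes on every call, which is one reason a static guarantee is attractive.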
Introduction to multi-core hardware semantics<br />
Two ‘obvious’ but incorrect statements:<br />
1. If a program with two threads runs on a single-core processor<br />
then it will run unchanged on a two-core processor.<br />
2. (After correcting problems in 1.) a two-thread program will run<br />
faster on a two-core processor than it runs on a single-core<br />
processor.<br />
Brief answer: we need to think of multi-core processors as distributed<br />
systems, not merely as concurrent systems.<br />
Multi-core and sequential consistency<br />
On a single-core (implementing tasking using interrupts) the<br />
instructions in each thread are interleaved.<br />
volatile int x=0,y=0;<br />
thread1: { x=1; print "y=",y; }<br />
thread2: { y=1; print "x=",x; }<br />
For most executions this program prints x=0, y=1 or x=1, y=0;<br />
• relatively rarely it prints x=1, y=1;<br />
• but it never prints x=0, y=0.<br />
But on multi-core x=0, y=0 can happen! Not all executions interleave<br />
instructions from the two threads (they do on a single-core processor<br />
with interrupt-driven scheduling). Failure of sequential consistency.<br />
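To see why interleaving alone can never print x=0, y=0, one can enumerate every interleaving of the four operations that preserves each thread's program order. The sketch below is an illustrative simulation in Java, not a concurrent program; the class and method names are assumptions made for this example.

```java
import java.util.Set;
import java.util.TreeSet;

final class ScInterleavings {
    // Thread 1 runs: x = 1; then reads y. Thread 2 runs: y = 1; then reads x.
    // Returns every printable outcome under pure interleaving.
    static Set<String> outcomes() {
        Set<String> results = new TreeSet<>();
        explore(0, 0, 0, 0, 0, 0, results);
        return results;
    }

    // i1, i2: index of the next instruction of each thread (program order is preserved).
    // r1 is the value of y seen by thread 1; r2 is the value of x seen by thread 2.
    private static void explore(int i1, int i2, int x, int y, int r1, int r2, Set<String> out) {
        if (i1 == 2 && i2 == 2) {
            out.add("x=" + r2 + ", y=" + r1);
            return;
        }
        if (i1 == 0) explore(1, i2, 1, y, r1, r2, out);      // thread 1: x = 1
        else if (i1 == 1) explore(2, i2, x, y, y, r2, out);  // thread 1: read y
        if (i2 == 0) explore(i1, 1, x, 1, r1, r2, out);      // thread 2: y = 1
        else if (i2 == 1) explore(i1, 2, x, y, r1, x, out);  // thread 2: read x
    }
}
```

The six interleavings yield only three distinct outcomes, and x=0, y=0 is not among them: whichever read happens last must follow the other thread's write.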
Multi-core and sequential consistency (2)<br />
volatile int x=0,y=0;<br />
thread1: { x=1; print "y=",y; }<br />
thread2: { y=1; print "x=",x; }<br />
Why can x=0, y=0 happen?<br />
• On an isolated computation it is quite valid for a read and a<br />
write to distinct locations to be re-ordered<br />
• Single-core processors exploit this for speed (pipelining)<br />
• Manufacturers of multi-core processors want each core of a<br />
multi-core processor to be as fast as a single-core CPU.<br />
Solution: programmer’s responsibility(!) to fix such races, e.g. by<br />
locking, or using mfence [both expensive in time].<br />
Search for “[Relaxed] Memory Model” for guidance.<br />
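The effect of re-ordering can be replayed deterministically. In this illustrative Java sketch (the class name and the one-letter operation codes are assumptions for the example), the four operations are executed in a given global order; hoisting each thread's read above its own write, which is harmless for that thread in isolation, produces the outcome that interleaving forbids.

```java
final class ReorderedExecution {
    // Executes the operations in the given global order and returns {r1, r2}:
    // r1 is the value of y seen by thread 1, r2 the value of x seen by thread 2.
    // Codes: 'X' = x=1 (thread 1), 'a' = read y (thread 1),
    //        'Y' = y=1 (thread 2), 'b' = read x (thread 2).
    static int[] run(String order) {
        int x = 0, y = 0, r1 = -1, r2 = -1;
        for (char op : order.toCharArray()) {
            switch (op) {
                case 'X': x = 1; break;
                case 'a': r1 = y; break;
                case 'Y': y = 1; break;
                case 'b': r2 = x; break;
                default: throw new IllegalArgumentException("unknown op " + op);
            }
        }
        return new int[] { r1, r2 };
    }
}
```

`run("XaYb")` keeps program order and returns r1=0, r2=1, whereas `run("abXY")` performs both reads before both writes and returns r1=0, r2=0.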
Multi-core hardware semantics (2)<br />
Incorrect statement: a two-thread program will run faster on a<br />
two-core processor than it runs on a single-core processor.<br />
Reason: caches [ . . . implicit data copying]<br />
A programmer’s view of memory<br />
[Figure: CPU ↔ MEMORY, 1 cycle per access]<br />
(This model was pretty accurate in 1985.)<br />
A 2004-era single-core view of memory and timings<br />
[Figure: CPU ↔ L1 cache ↔ L2 cache ↔ MEMORY, with the three links labelled 2, 10 and 200 respectively]<br />
Multi-core-chip memory models<br />
Today’s model (cache simplified to one level):<br />
[Figure: block diagram showing CPU 1 with CACHE 1, CPU 2–15 with CACHES 2–15, “other CPU or GPU etc”, FAST MEMORY, MEMORY and DMA. Links are labelled 2 and 200, and connections are annotated “coherency” and “incoherency”.]<br />
Why can a program run slower on multicore?<br />
• CPU1 writes a cache line ⇒ data in CPU2’s cache is discarded.<br />
• CPU2 now writes this cache line ⇒ the line must be reloaded<br />
(even from memory) and data in CPU1’s cache discarded.<br />
• A two-core version of virtual memory ‘thrashing’ (or of repeated<br />
cache reloading in naive big-matrix multiplication).<br />
It’s harmless (or even good) if two threads running on the same<br />
processor core both access a cache line (since all accesses will hit the<br />
cache), but . . .<br />
it’s bad if two threads running on different processor cores access the<br />
same cache line. Program runs slower on two cores than one!<br />
Multicore cache model is like multiple-reader/single-writer . . .<br />
How should we react?<br />
Combining shared memory and parallelism is problematic:<br />
• hard to write correct code (locks, mfence etc. needed)<br />
• sequential consistency surprises in semantics<br />
• cache effect surprises in performance<br />
But lots of these problems go away if the programmer knows when<br />
shared data is logically being transferred between cores.<br />
Type systems (richer than usual) can express and enforce this . . .<br />
Aside: shared memory vs. message passing<br />
Which is more efficient?<br />
Classical answer is shared memory (because message passing is<br />
‘implemented on top of it’).<br />
However, note that on a dual-core processor the illusion of shared<br />
memory is implemented via message passing in the cache-coherency<br />
protocol!<br />
Caches are multiple-reader/single-writer<br />
Caches are like ownership:<br />
• A core writing seizes ownership of the cache line (rescinding all<br />
ownership held by other cores)<br />
• A core reading shares ownership with the most recent writer and<br />
all intervening readers of that cache line.<br />
The transition from no-ownership to ownership costs time.<br />
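The ownership rules above can be turned into a toy counting model. This is a deliberately simplified sketch in Java of a single cache line, not a description of any real coherency protocol; the class and method names are assumptions for illustration. It counts ownership transitions, the events that cost time.

```java
import java.util.BitSet;

// Toy model of ONE cache line under the multiple-reader/single-writer rule.
final class CacheLineModel {
    private final BitSet holders = new BitSet(); // cores currently holding a valid copy
    private int transfers = 0;                   // transitions from no-ownership to ownership

    // A reading core shares ownership; it pays only if it held no copy.
    void read(int core) {
        if (!holders.get(core)) {
            holders.set(core);
            transfers++;
        }
    }

    // A writing core seizes ownership, rescinding every other core's copy.
    void write(int core) {
        boolean soleOwner = holders.cardinality() == 1 && holders.get(core);
        if (!soleOwner) {
            holders.clear();
            holders.set(core);
            transfers++;
        }
    }

    int transfers() { return transfers; }

    // Two cores alternately write the same line: every write is a transfer.
    static int pingPongWrites(int rounds) {
        CacheLineModel line = new CacheLineModel();
        for (int i = 0; i < rounds; i++) { line.write(0); line.write(1); }
        return line.transfers();
    }

    // Each core writes its own line: one transfer per line, then only hits.
    static int disjointWrites(int rounds) {
        CacheLineModel a = new CacheLineModel();
        CacheLineModel b = new CacheLineModel();
        for (int i = 0; i < rounds; i++) { a.write(0); b.write(1); }
        return a.transfers() + b.transfers();
    }
}
```

In this model 1000 rounds of alternating writes cost 2000 transfers, while the same number of writes to disjoint lines costs 2. Repeated reads by several cores cost one transfer per core.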
Effective use of modern hardware – conclusions<br />
• Need careful control of concurrent memory access.<br />
• For efficiency, the program has to be aware when data (a cache line)<br />
is newly accessed by a different CPU. Writes invalidate all but the<br />
writer CPU’s copy of a cache line. Single-writer multiple-reader<br />
view.<br />
• Solution: need to express and control inter-processor aliasing at<br />
source level.<br />
(understanding aliasing is also a good idea for software<br />
engineering purposes!)<br />
• Best speed-up is when separate cores access disjoint memory.<br />
• Minimise the number of ownership transfers (i.e. cache-level data<br />
movement).<br />
Classical Type Systems<br />
Compilers and the theory of type systems model the variables in scope<br />
with an environment of type assumptions Γ, a set of pairs x : t.<br />
We have the judgement form Γ ⊢ e : t, which holds when expression e<br />
has type t under assumptions Γ.<br />
Note that Γ changes only at scope entry/exit.<br />
The standard rule for variables says:<br />
Γ ⊢ x : t (provided x : t ∈ Γ)<br />
Note that together these say (almost too obvious to notice/question):<br />
“each use of a variable in a scope has the same type”.<br />
Type Systems – weakness<br />
Variables keeping the same type throughout a block means that the<br />
two errors in the following program can’t be detected by the type<br />
system:<br />
{ char *x = malloc(10); // x has type char *<br />
foo(x); // x has type char *<br />
free(x); // x has type char * (but should not??)<br />
foo(x); // a well-known disaster...<br />
x = malloc(20); // x has type char * (and should)<br />
// problem: a memory leak now occurs<br />
} // as x goes out-of-scope<br />
Replacing free(x) with pass_ownership(x,task42) shows the<br />
weakness of classical types at controlling sharing of data.<br />
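What the type system cannot express statically can at least be checked dynamically. The following Java sketch uses a hypothetical `Owned` wrapper (not part of Kilim or of the talk) whose payload may be handed over once; any later use fails at run time. A substructural type system aims to reject the same programs at compile time.

```java
// Hypothetical run-time stand-in for a linearly used variable.
final class Owned<T> {
    private T value;
    private boolean consumed = false;

    Owned(T value) { this.value = value; }

    // Reading is allowed only while the value is still owned here.
    T peek() {
        if (consumed) throw new IllegalStateException("use after ownership transfer");
        return value;
    }

    // Models free(x) or pass_ownership(x, ...): the holder gives up the value.
    T take() {
        T v = peek();
        consumed = true;
        value = null; // drop the reference so no alias remains in this wrapper
        return v;
    }
}
```

Calling `peek()` after `take()` throws, mirroring the second `foo(x)` in the C fragment. The sketch does nothing about the leak at scope exit; detecting that needs the "at least once" half of the discipline.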
Type Systems – weakness (2)<br />
• We want each variable to be free’d exactly (∗) once.<br />
• ‘Once’ means ‘once on each possible control path’.<br />
• Let’s develop a type system in which each variable can only be<br />
used once – and then refine it to “at most one free-like<br />
operation”<br />
(∗) at most once if we don’t care about space leaks!<br />
Classical and Linear Type Systems (1)<br />
Standard ML-like type system:<br />
\[ \mathrm{(VAR)}\quad \Gamma[x : t] \vdash x : t \qquad \mathrm{(INT)}\quad \Gamma \vdash n : \mathit{int} \qquad \mathrm{(BOOL)}\quad \Gamma \vdash b : \mathit{bool} \]
\[ \mathrm{(LAM)}\quad \frac{\Gamma[x : t] \vdash e : t'}{\Gamma \vdash \lambda x.e : t \to t'} \qquad \mathrm{(APP)}\quad \frac{\Gamma \vdash e_1 : t \to t' \qquad \Gamma \vdash e_2 : t}{\Gamma \vdash e_1\, e_2 : t'} \]
\[ \mathrm{(COND)}\quad \frac{\Gamma \vdash e_1 : \mathit{bool} \qquad \Gamma \vdash e_2 : t \qquad \Gamma \vdash e_3 : t}{\Gamma \vdash \mathbf{if}\ e_1\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 : t} \]
Note that Γ only changes on scope change (here LAM).<br />
Classical and Linear Type Systems (2)<br />
An equivalent type system. Assumptions Γ, ∆ are now multi-sets (i.e.<br />
permutable lists):<br />
\[ \mathrm{(VAR)}\quad [x : t] \vdash x : t \qquad \mathrm{(INT)}\quad [\,] \vdash n : \mathit{int} \qquad \mathrm{(BOOL)}\quad [\,] \vdash b : \mathit{bool} \]
\[ \mathrm{(LAM)}\quad \frac{\Gamma[x : t] \vdash e : t'}{\Gamma \vdash \lambda x.e : t \to t'} \qquad \mathrm{(APP)}\quad \frac{\Gamma \vdash e_1 : t \to t' \qquad \Delta \vdash e_2 : t}{\Gamma, \Delta \vdash e_1\, e_2 : t'} \]
\[ \mathrm{(COND)}\quad \frac{\Gamma \vdash e_1 : \mathit{bool} \qquad \Delta \vdash e_2 : t \qquad \Delta \vdash e_3 : t}{\Gamma, \Delta \vdash \mathbf{if}\ e_1\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 : t} \]
\[ \mathrm{(WEAKEN)}\quad \frac{\Gamma \vdash e : t}{\Gamma, \Delta \vdash e : t} \qquad \mathrm{(CONTRACT)}\quad \frac{\Gamma, \Delta, \Delta \vdash e : t}{\Gamma, \Delta \vdash e : t} \]
Why do this? Answer: substructural type systems. We can now adjust<br />
WEAKEN/CONTRACT.<br />
Substructural type systems<br />
(WEAKEN) and (CONTRACT) are known as structural rules.<br />
\[ \mathrm{(WEAKEN)}\quad \frac{\Gamma \vdash e : t}{\Gamma, \Delta \vdash e : t} \qquad \mathrm{(CONTRACT)}\quad \frac{\Gamma, \Delta, \Delta \vdash e : t}{\Gamma, \Delta \vdash e : t} \]
They control how the assumptions Γ can be passed within an<br />
inference tree. [There’s a correspondence between combinators S and<br />
K and (CONTRACT) and (WEAKEN).]<br />
Using both WEAKEN and CONTRACT gives a system equivalent to<br />
the previous one.<br />
Without CONTRACT every variable must be used at most once.<br />
Without WEAKEN every variable must be used at least once.<br />
This is a good start for banning use-after-free and memory-leak<br />
errors. [Apology: I’m slightly cheating in that higher-order functions<br />
are considerably more complicated.]<br />
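The "at most once" and "at least once" readings can be checked by counting occurrences. The sketch below is a hedged illustration in Java, far short of a type checker: it handles only variables, abstractions and applications, checks only the outermost binder, and all class names are assumptions introduced for the example.

```java
// Minimal lambda-term syntax: variables, abstractions and applications only.
abstract class Term {
    // Number of free occurrences of variable x in this term.
    abstract int uses(String x);
}

final class Var extends Term {
    final String name;
    Var(String name) { this.name = name; }
    int uses(String x) { return name.equals(x) ? 1 : 0; }
}

final class Lam extends Term {
    final String param;
    final Term body;
    Lam(String param, Term body) { this.param = param; this.body = body; }
    // An inner binding of the same name shadows x.
    int uses(String x) { return param.equals(x) ? 0 : body.uses(x); }
    // How often this abstraction's own parameter is used in its body.
    int paramUses() { return body.uses(param); }
}

final class App extends Term {
    final Term fn;
    final Term arg;
    App(Term fn, Term arg) { this.fn = fn; this.arg = arg; }
    int uses(String x) { return fn.uses(x) + arg.uses(x); }
}

final class Substructural {
    // Without CONTRACT a bound variable may not be duplicated.
    static boolean okWithoutContract(Lam l) { return l.paramUses() <= 1; }
    // Without WEAKEN a bound variable may not be discarded.
    static boolean okWithoutWeaken(Lam l) { return l.paramUses() >= 1; }
}
```

For instance, λx. f x x duplicates x and so needs CONTRACT, λx. y discards x and so needs WEAKEN, and λx. x passes both checks.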
How to read this intuitively<br />
Consider:<br />
\[ \mathrm{(COND)}\quad \frac{\Gamma \vdash e_1 : \mathit{bool} \qquad \Delta \vdash e_2 : t \qquad \Delta \vdash e_3 : t}{\Gamma, \Delta \vdash \mathbf{if}\ e_1\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 : t} \]
• Use of ∆ says both arms of if-then-else get the same variables.<br />
• The use of Γ, ∆ in the conclusion of the inference rule says: “share<br />
out the variables in scope: passing some (Γ) to e1 and some (∆)<br />
to e2 (and e3).”<br />
It helps to read inference rules from the bottom (goal-oriented).<br />
However, we want to refine this all-or-nothing partition: we’d like to<br />
allow (e.g.) x.f to be tested in e1 before x is freed in e2 (and not<br />
vice-versa).<br />
An algebra of capabilities<br />
Above in Γ, ∆, the “,” operator shares each x : t into either Γ or ∆.<br />
(Cf. partial function.)<br />
We’d like an algebra of capabilities, e.g. {unused, free, readonly}<br />
where unused models partiality above, and free means “this variable is<br />
allowed to be free’d”. These are combined with the “;” (afterwards)<br />
operator, giving for example:<br />
• unused ; x = x<br />
• x ; unused = x<br />
• readonly ; readonly = readonly<br />
• readonly ; free = free<br />
• free ; readonly is undefined<br />
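The table above is small enough to write down as a partial function. This Java sketch encodes the five listed equations; the enum and method names are illustrative, and the case free ; free, which the slide does not list, is assumed here to be undefined (it would correspond to freeing twice).

```java
import java.util.Optional;

enum Cap {
    UNUSED, READONLY, FREE;

    // The ";" (afterwards) operator: this use first, then `next`.
    // An empty result means the combination is undefined.
    Optional<Cap> then(Cap next) {
        if (this == UNUSED) return Optional.of(next);   // unused ; x = x
        if (next == UNUSED) return Optional.of(this);   // x ; unused = x
        if (this == READONLY) return Optional.of(next); // readonly ; readonly, readonly ; free
        return Optional.empty(); // free ; readonly (from the slide), free ; free (assumed)
    }
}
```

The operator is not commutative: `READONLY.then(FREE)` is `FREE`, but `FREE.then(READONLY)` is empty, reflecting that a value may be inspected before it is freed and not afterwards.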
An algebra of capabilities (2)<br />
[A subtlety in the type system.]<br />
We actually need another operator on environments to deal with<br />
passing the same variable as multiple arguments to a procedure:<br />
int g(free x, readonly y)<br />
{ if (...) { deallocate(x); return y.field; }<br />
  else { int z = y.field; deallocate(x); return z; }<br />
}<br />
int f(free x) { return g(x,x); }<br />
The function g() is perfectly fine, but a problem arises when its<br />
arguments alias (as in the call from f()). The “+” operator on<br />
environments forbids f() by being slightly more restrictive than “;”<br />
on argument tuples.<br />
• both free + readonly and readonly + free are undefined.<br />
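A call-site check based on “+” might look as follows. This is a sketch under stated assumptions, not Kilim's checker: the slide only fixes that free + readonly and readonly + free are undefined, so the remaining entries (unused as identity, readonly + readonly = readonly, free + free undefined) and all names here are guesses made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

enum ArgCap {
    UNUSED, READONLY, FREE;

    // The "+" operator: capabilities demanded of the SAME variable within one argument tuple.
    Optional<ArgCap> plus(ArgCap other) {
        if (this == UNUSED) return Optional.of(other);
        if (other == UNUSED) return Optional.of(this);
        if (this == READONLY && other == READONLY) return Optional.of(READONLY);
        return Optional.empty(); // free + readonly, readonly + free (slide); free + free (assumed)
    }
}

final class CallCheck {
    // args[i] names the variable passed in position i;
    // params[i] is the capability that parameter position demands.
    static boolean argumentsCompatible(String[] args, ArgCap[] params) {
        Map<String, ArgCap> demanded = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            ArgCap soFar = demanded.getOrDefault(args[i], ArgCap.UNUSED);
            Optional<ArgCap> sum = soFar.plus(params[i]);
            if (!sum.isPresent()) return false; // the same variable is over-committed
            demanded.put(args[i], sum.get());
        }
        return true;
    }
}
```

With parameter capabilities (free, readonly), the call g(x,y) is accepted while g(x,x) is rejected, which is the behaviour the slide asks of “+”.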
From algebra to algorithm<br />
This ‘immaculate’ splitting “;” is great for type theorists, but<br />
it is not obvious how to implement it. Observe that it behaves like a sum on<br />
usages, e.g. the sum effect of a readonly use and a free use is a free.<br />
When implementing this in a traditional type-checking left-right<br />
depth-first tree walk we implement the sum above as difference:<br />
• if x has type free [able to be freed] and the LHS operand only<br />
uses x as readonly then free still remains for the RHS<br />
• I.e. free − readonly = free<br />
• But readonly − free is undefined<br />
• Similarly x − unused = x<br />
• But unused − x is undefined (for x ≠ unused).<br />
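One way to see that the difference is determined by the sum is to derive it by search. In this Java sketch (names are illustrative), `minus(available, used)` returns the strongest remainder `rest` satisfying used ; rest = available. The ordering unused < readonly < free and the treatment of free ; free as undefined are assumptions of the example, not statements from the slides.

```java
import java.util.Optional;

enum Usage {
    UNUSED, READONLY, FREE; // declared from weakest to strongest (assumed ordering)

    // The ";" operator from the capability algebra; free ; free is assumed undefined.
    Optional<Usage> then(Usage next) {
        if (this == UNUSED) return Optional.of(next);
        if (next == UNUSED) return Optional.of(this);
        if (this == READONLY) return Optional.of(next);
        return Optional.empty();
    }

    // available − used: the strongest `rest` such that used ; rest == available.
    static Optional<Usage> minus(Usage available, Usage used) {
        Usage[] all = values();
        for (int i = all.length - 1; i >= 0; i--) {
            Optional<Usage> sum = used.then(all[i]);
            if (sum.isPresent() && sum.get() == available) return Optional.of(all[i]);
        }
        return Optional.empty(); // no remainder exists: the difference is undefined
    }
}
```

The search reproduces the four bullet points above. Under the stated assumptions it also gives free − free = unused: once the left operand has freed x, nothing is left for the right operand.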
Aside: Separation logic<br />
People familiar with judgements in separation logic will recognise its<br />
rules as being a substructural system: H1 ∗ H2 on heaps (partial<br />
maps from locations to values) is only defined when<br />
dom H1 ∩ dom H2 = {}.<br />
Indeed Parkinson’s generalisation to fractional ownership, e.g.<br />
\[ [x \stackrel{0.4}{\mapsto} a,\; y \stackrel{0.5}{\mapsto} b] \ast [x \stackrel{0.3}{\mapsto} a,\; z \stackrel{0.6}{\mapsto} c] = [x \stackrel{0.7}{\mapsto} a,\; y \stackrel{0.5}{\mapsto} b,\; z \stackrel{0.6}{\mapsto} c] \]
is similarly inspired by avoiding “all-or-nothing” ownership and<br />
parallels ‘;’ and ‘+’ above.<br />
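The fractional example can be replayed with a few lines of Java. This sketch tracks permissions only, stored in tenths to avoid floating-point rounding, and omits the values held at each location. It follows the usual convention that a combination exceeding full ownership is undefined; the class and method names are assumptions for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class FractionalHeap {
    // A heap maps a location name to the permission held on it, in tenths (10 = full ownership).
    // Values stored at the locations are omitted; a fuller model would require them to agree.
    static Optional<Map<String, Integer>> star(Map<String, Integer> h1, Map<String, Integer> h2) {
        Map<String, Integer> out = new TreeMap<>(h1);
        for (Map.Entry<String, Integer> e : h2.entrySet()) {
            int total = out.getOrDefault(e.getKey(), 0) + e.getValue();
            if (total > 10) return Optional.empty(); // more than full ownership: undefined
            out.put(e.getKey(), total);
        }
        return Optional.of(out);
    }
}
```

Combining {x: 4, y: 5} with {x: 3, z: 6} gives {x: 7, y: 5, z: 6}, matching the equation above, while {x: 6} combined with {x: 5} is undefined.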
Talk structure<br />
• an example<br />
• modern hardware<br />
• sub-structural type systems, ownership<br />
• Kilim: a Java-based practical solution<br />
Kilim syntax<br />
• Syntax same as Java, with additional type qualifiers: @free,<br />
@cuttable, @safe and we add @readonly.<br />
• These qualify classes marked as Message.<br />
• Fields of Message types can only be other Message types and<br />
Java primitive types.<br />
• Design choice: Message types are trees and have no heap<br />
aliasing. Subversive question: what does<br />
class Tree { int val; Tree left,right; }<br />
define in Java? A tree? A graph?<br />
• Only Message types can be transferred between actors [next<br />
slide].<br />
Kilim qualifiers<br />
Kilim enforces that Messages are tree-like. Qualifiers further restrict<br />
this:<br />
@free forbids an object to have heap-pointers to it (a tree root).<br />
[Not if-and-only-if, for static analysis reasons.]<br />
@cuttable a value which is @free if its parent’s pointer is nullified<br />
@safe forbids structural modification of the Message recursively<br />
@readonly forbids any modification of the Message recursively.<br />
Note these are deep qualifiers, unlike const in C.<br />
• Only @free Message types can be transferred between actors.<br />
(And, more subtly, as method return types!)<br />
Examples<br />
void send(@free x) { ... builtin ...}<br />
bool send2(@free x) { send(x); send(x); } // illegal<br />
bool foo(@safe x) { x.count++; return length(x) != 42; }<br />
void maybesend1(@free x) { if (foo(x)) send(x); } // OK<br />
void maybesend2(@free x) { @safe y = x;<br />
  if (foo(y)) send(x);<br />
  // accessing y is illegal here.<br />
}<br />
Note the last subtlety. Although qualifiers appear to act on variables,<br />
they actually operate on values (capabilities).<br />
So qualifiers on variables need to be downgraded due to dataflow.<br />
Core Kilim syntax<br />
FuncDcl ::= free_opt m(p : α) { (lb : Stmt)∗; }<br />
Stmt ::= x := new | x := y<br />
| x := y.f | x.f := y | x := cut(y.f)<br />
| x := y[·] | x[·] := y | x := cut(y[·])<br />
| x := m(y) | if/goto lb | return x<br />
x, y, p ∈ variable names; f ∈ field names<br />
lb ∈ label names; m ∈ function names<br />
sel ∈ field names ∪ {[·]}, where [·] is a pseudo field name for array access<br />
α, β ∈ isolation qualifiers {free, cuttable, safe}<br />
null is treated as a special readonly variable<br />
Kilim – aliasing and de-aliasing (role of cut)<br />
• Note that the x.sel = y form is the only creator of heap aliases.<br />
This can degrade y (e.g.) from @free to @cuttable.<br />
Note also that attempts to create heap-sharing will be faulted<br />
(next slide).<br />
• The cut operator severs a subtree from its @cuttable parent<br />
thus:<br />
y = cut(x.sel) ≝ { y = x.sel; x.sel = null; }<br />
Crucially it also marks y as @free, which the right-hand side above<br />
would not do (especially on arrays). Doing only y = x.sel<br />
would leave y as @cuttable; the x.sel = null makes good<br />
the semantic promise that y has no pointers to it and is hence<br />
@free.<br />
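The run-time half of cut is easy to show in plain Java. The sketch below is not Kilim's API: `MsgTree` and `cutLeft` are hypothetical names, and the static re-qualification of the result as @free, which is the part Kilim's checker contributes, has no counterpart in plain Java.

```java
// Plain-Java illustration of the run-time effect of cut; class and method names are hypothetical.
final class MsgTree {
    int val;
    MsgTree left, right;

    MsgTree(int val) { this.val = val; }

    // y = cut(x.left): detach the left subtree and return it.
    // After the call, no field of this node refers to the returned subtree.
    MsgTree cutLeft() {
        MsgTree y = left;
        left = null;
        return y;
    }
}
```

After `MsgTree y = x.cutLeft();` the parent's `left` field is null, so the detached subtree is reachable only through `y` (provided the tree had no other aliases to begin with).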
Kilim isolation type checking<br />
Many fine details [ECOOP’2008]. Principal steps:<br />
1. Do a simplified shape analysis on the program to over-estimate<br />
possible aliasing. At each program point get a shape graph:<br />
• Nodes: finite, abstracting run-time heap locations.<br />
• Nodes labelled by which local variables may point to them.<br />
Special node ∅ for unknown internal structure.<br />
[Core Kilim syntax enforces “no unnamed temporaries”]<br />
• Edges represent possible run-time references.<br />
• Edges labelled with field names.<br />
A simple forward-dataflow iteration.<br />
2. At each program point downgrade declared qualifier information<br />
by dataflow and fault if inconsistent.<br />
Talk structure<br />
• an example<br />
• modern hardware<br />
• sub-structural type systems, ownership<br />
• Kilim: a Java-based practical solution<br />
• Some random observations<br />
One neat point<br />
When values are used linearly (no longer referenced (∗) by the caller<br />
after being passed to another method/thread) then:<br />
• call-by-reference (pass-by-identity) = call-by-value (pass-by-copy)<br />
This sidesteps the well-known problem that RMI is a non-trivial<br />
solution to exploiting multicore. It’s heavyweight and its marshalling<br />
implies pass-by-copy rather than pass-by-identity.<br />
(∗) Of course, the callee can later pass such a value back to the caller.<br />
Another neat point<br />
Having a non-modified data-structure shared by several threads is a<br />
common programming idiom.<br />
Remember we said that caches work like multiple-reader<br />
single-writer? Hence they work efficiently on the above idiom (each<br />
CPU gets a private cache copy of it, and no invalidations occur).<br />
• Kilim above only allows Message-type objects to be transferred<br />
between actors – other Java <strong>types</strong> are faulted.<br />
• Some of these are harmless (e.g. String)<br />
• Marking these as Sharable directs Kilim to allow them to be<br />
passed between actors.<br />
What I didn’t say about Kilim<br />
• There are also classical Java objects (no restrictions on aliasing)<br />
but these cannot be passed between threads – except via an<br />
(unsafe) loophole.<br />
• Implementation embedding in Java via bytecode re-writer.<br />
• Ultra-fast task switching (one million threads) implemented by<br />
stack reflect/reify using @pausable attribute.<br />
• Thread creation and messaging compares favourably with Erlang.<br />
What I didn’t say about ownership type systems<br />
• The Kilim ownership system is based on heap-pointer<br />
uniqueness; each heap object has at most one pointer to it.<br />
• In addition there are also ownership type systems based on:<br />
– owners as dominators<br />
– owners as modifiers<br />
Conclusions<br />
• Multi-core works best under process memory isolation: each<br />
process owns disjoint memory. A good model (actor-like).<br />
• In general we need ownership transfer; the hardware cost of this is that<br />
of a copy.<br />
• Good if ownership transfer is policed by a type system; syntactic<br />
markers give the compiler the right to optimise actual data<br />
transfer.<br />
• Substructural types provide such a type system.<br />
• Might need loopholes/exceptions to the type system for real<br />
programs (most of the data is still protected).<br />
• Kilim is one solution (a sweet-spot).<br />