
Using Kilim’s isolation types for multicore efficiency

Alan Mycroft (∗)
Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/users/am/
5 October 2011

(∗) I gratefully acknowledge the contributions of Sriram Srinivasan [Kilim, ECOOP’2008] and Robert Ennals and Richard Sharp [PacLang, ESOP’2004].

Using Kilim’s isolation types for multicore efficiency, FoVeOOS’2011


AM’s abstract

We identify a ‘memory isolation’ property which enables multi-core programs to avoid slowdown due to cache contention.

We give a tutorial on existing work on Kilim and its isolation-type system, building bridges with both substructural types and memory isolation.


The big picture

• Traditional object-oriented mode of thinking:
  – everything is an object
  – any object can reference (or even alias) any other, limited only by type
  Spaghetti data!
• Modern hardware isn’t like this:
  – memory isn’t uniform (or sequentially consistent)
  – multi-core parallelism
• Spaghetti is bad for software engineering too
• Programming languages can help [PacLang, Singularity, Kilim]


What’s Kilim, and what’s new?

• Kilim is a message-passing (actor) framework for Java with ultra-lightweight threads and zero-copy message passing [ECOOP’2008].
• A type system constrains pointer aliasing (using Java annotations) to make message passing (pass-by-value) and pass-by-reference equivalent.
• What’s new? No single idea, but the combination of ideas gives an effective language design point.
• This talk highlights issues caused by current multi-core processors.


Why does multi-core affect things?

• We want more than “call and wait for result”; otherwise we’re only using one CPU. So: threads, spawn . . .
• Threads, locking, races, sequentially-inconsistent memory . . .
• Caches mean that where computations are done greatly affects speed (doing all computations on one CPU is clearly bad).
  Method placement: by user or by system?
• Communication costs, heterogeneous multi-core: placement is a big issue.


How do programming languages help?

• Specify invariants which analysers may find hard to discover
• Provide compiler-enforced invariants
• Examples: types, “this method always executes on processor 42”.
• Needs careful design: types are ubiquitously accepted but “always execute on processor 42” is likely to be too low-level and inflexible during system evolution.
• This talk: type-like invariants saying “no other thread has an alias to my data”.


Talk structure

• an example
• modern hardware
• sub-structural type systems, ownership
• Kilim


Example banking code

void checktotal(AccountList p, int expected)
{ int sum = 0;
  for (i in p) sum += i.balance;
  if (sum != expected) report_oddity();
}

void movemoney(AccountList p, int amount)
{ p.first.balance -= amount;
  p.second.balance += amount;
}

Questions:
1. when can movemoney() and checktotal() be run in parallel?
2. when can movemoney() be run in parallel with itself?
3. Locking? Do you know how much this costs?

It helps for compilers to understand aliasing.


Talk structure

• an example
• modern hardware
• sub-structural type systems, ownership
• Kilim


Introduction to multi-core hardware semantics

Two ‘obvious’ but incorrect statements:
1. If a program with two threads runs on a single-core processor then it will run unchanged on a two-core processor.
2. (After correcting problems in 1.) a two-thread program will run faster on a two-core processor than it runs on a single-core processor.

Brief answer: need to think of multi-core processors as distributed systems, not merely as concurrent systems.


Multi-core and sequential consistency

On a single-core (implementing tasking using interrupts) the instructions in each thread are interleaved.

volatile int x=0,y=0;
thread1: { x=1; print "y=",y; }
thread2: { y=1; print "x=",x; }

For most executions this program prints x=0, y=1 or x=1, y=0;
• relatively rarely it prints x=1, y=1;
• but it never prints x=0, y=0.

But on multi-core x=0, y=0 can happen! Not all executions interleave instructions from the two threads (they do on a single-core processor with interrupt-driven scheduling). Failure of sequential consistency.
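The litmus test can be tried in Java. The following is a minimal sketch under stated assumptions, not code from the talk: the class `StoreBufferingLitmus` and its methods are hypothetical names, and Java’s `volatile` is not the C-style `volatile` of the slide. Java volatile accesses are sequentially consistent, so with both fields declared `volatile` the Java memory model rules out the x=0, y=0 outcome. Removing `volatile` from the two fields makes that outcome permitted again, although whether it is actually observed depends on the JIT compiler and the hardware.

```java
import java.util.concurrent.CountDownLatch;

class StoreBufferingLitmus {
    // Java 'volatile' accesses are sequentially consistent. This is stronger
    // than the C-style 'volatile' assumed on the slide.
    static volatile int x, y;

    /** Runs the test once; returns {y as read by thread1, x as read by thread2}. */
    static int[] runOnce() {
        x = 0;
        y = 0;
        int[] seen = new int[2];
        CountDownLatch start = new CountDownLatch(1);
        Thread t1 = new Thread(() -> { awaitQuietly(start); x = 1; seen[0] = y; });
        Thread t2 = new Thread(() -> { awaitQuietly(start); y = 1; seen[1] = x; });
        t1.start();
        t2.start();
        start.countDown();   // release both threads at (roughly) the same time
        joinQuietly(t1);
        joinQuietly(t2);
        return seen;
    }

    /** Counts the runs in which both threads read 0. */
    static int countBothZero(int runs) {
        int bothZero = 0;
        for (int i = 0; i < runs; i++) {
            int[] seen = runOnce();
            if (seen[0] == 0 && seen[1] == 0) bothZero++;
        }
        return bothZero;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("runs with x=0, y=0: " + countBothZero(1000));
    }
}
```

The price of the guarantee is the one the next slide names: on x86 a JVM typically implements a volatile store with a fence or a locked instruction.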


Multi-core and sequential consistency (2)

volatile int x=0,y=0;
thread1: { x=1; print "y=",y; }
thread2: { y=1; print "x=",x; }

Why can x=0, y=0 happen?
• On an isolated computation it is quite valid for a read and a write to distinct locations to be re-ordered
• Single-core processors exploit this for speed (pipelining)
• Manufacturers of multi-core processors want a single CPU of a multicore processor to be as fast as a CPU of a single core.

Solution: programmer’s responsibility(!) to fix such races, e.g. by locking, or using mfence [both expensive in time].

Find “[Relaxed] Memory Model” for guidance.


Multi-core hardware semantics (2)

Incorrect statement: a two-thread program will run faster on a two-core processor than it runs on a single-core processor.

Reason: caches [ . . . implicit data copying]


A programmer’s view of memory

[Figure: CPU ↔ MEMORY, access time 1 cycle]

(This model was pretty accurate in 1985.)

A 2004-era single-core view of memory and timings

[Figure: CPU ↔ L1 cache (2) ↔ L2 cache (10) ↔ MEMORY (200)]


Multi-core-chip memory models

Today’s model (cache simplified to one level):

[Figure: CPU 1 ↔ CACHE 1 (2) and CPU 2–15 ↔ CACHES 2-15 (2), with “coherency” between the caches; other CPU or GPU etc ↔ FAST MEMORY (2), marked “incoherency”; the caches, the fast memory and DMA all connect to MEMORY (200).]


Why can a program run slower on multicore?

• CPU1 writes a cache line ⇒ data in CPU2’s cache is discarded.
• CPU2 now writes this cache line ⇒ the line must be reloaded (even from memory) and data in CPU1’s cache discarded.
• A two-core version of virtual memory ‘thrashing’ (or of repeated cache reloading in naive big-matrix multiplication).

It’s harmless (or even good) if two threads running on the same processor core both access a cache line (since all accesses will hit the cache), but . . .
it’s bad if two threads running on different processor cores access the same cache line (with at least one of them writing). Program runs slower on two cores than one!

Multicore cache model is like multiple-reader/single-writer . . .
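The effect can be provoked deliberately. The sketch below is an illustration under assumptions, not a benchmark from the talk: `FalseSharingDemo` and `timeIncrements` are hypothetical names, a 64-byte cache line is assumed, and the JVM is assumed to lay the array elements out contiguously. Two threads increment two different counters; the only thing that changes between the two measurements is whether the counters are likely to share a cache line. `AtomicLongArray` is used so that every increment really reaches memory instead of being kept in a register by the JIT compiler.

```java
import java.util.concurrent.atomic.AtomicLongArray;

class FalseSharingDemo {
    /**
     * Two threads each increment their own slot of 'slots'.
     * Returns the elapsed wall-clock time in nanoseconds.
     */
    static long timeIncrements(AtomicLongArray slots, int indexA, int indexB, int iterations) {
        slots.set(indexA, 0);
        slots.set(indexB, 0);
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) slots.incrementAndGet(indexA);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) slots.incrementAndGet(indexB);
        });
        long begin = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        AtomicLongArray slots = new AtomicLongArray(64);
        int n = 20_000_000;
        // Slots 0 and 1 are 8 bytes apart: very likely the same cache line.
        long sameLine = timeIncrements(slots, 0, 1, n);
        // Slots 0 and 32 are 256 bytes apart: assumed to lie on different lines.
        long differentLines = timeIncrements(slots, 0, 32, n);
        System.out.println("same line: " + sameLine / 1_000_000 + " ms, "
                + "different lines: " + differentLines / 1_000_000 + " ms");
    }
}
```

No timings are quoted here because they vary with the processor, the JVM and the scheduler; the program only makes the comparison easy to run.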


How should we react?

Shared memory and parallelism is problematic:
• hard to write correct code (locks, mfence etc. needed)
• sequential consistency surprises in semantics
• cache effect surprises in performance

But lots of these problems go away if the programmer knows when shared data is logically being transferred between cores.

Type systems (richer than usual) can express and enforce this . . .


Aside: shared memory vs. message passing

Which is more efficient?

Classical answer is shared memory (because message passing is ‘implemented on top of it’).

However, note that on a dual-core processor the illusion of shared memory is implemented via message passing in the cache-coherency protocol!


Caches are multiple-reader/single-writer

Caches are like ownership:
• A core writing seizes ownership of the cache line (rescinding all ownership held by other cores)
• A core reading shares ownership with the most recent writer and all intervening readers of that cache line.

The transition from no-ownership to ownership costs time.
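The ownership reading of caches can be made concrete with a toy model. This is a sketch of the two rules above only, not of a real coherency protocol such as MESI; the class `CacheLineModel` and its methods are hypothetical names. It tracks which cores currently own one cache line and counts the costly transitions from no-ownership to ownership.

```java
import java.util.HashSet;
import java.util.Set;

class CacheLineModel {
    private final Set<Integer> owners = new HashSet<>(); // cores holding the line
    private int transfers = 0;   // no-ownership -> ownership transitions so far

    /** A reading core joins the current owners. */
    void read(int core) {
        if (owners.add(core)) transfers++;
    }

    /** A writing core seizes the line and rescinds all other ownership. */
    void write(int core) {
        if (!owners.contains(core)) transfers++;
        owners.clear();
        owners.add(core);
    }

    int transferCount() {
        return transfers;
    }

    /** Cores 1 and 2 take turns writing the same line. */
    static int pingPongWrites(int rounds) {
        CacheLineModel line = new CacheLineModel();
        for (int i = 0; i < rounds; i++) {
            line.write(1);
            line.write(2);
        }
        return line.transferCount();
    }

    /** Several cores repeatedly read a line that nobody writes. */
    static int sharedReads(int cores, int rounds) {
        CacheLineModel line = new CacheLineModel();
        for (int r = 0; r < rounds; r++) {
            for (int c = 1; c <= cores; c++) line.read(c);
        }
        return line.transferCount();
    }

    public static void main(String[] args) {
        System.out.println(pingPongWrites(10)); // 20: every write moves the line
        System.out.println(sharedReads(4, 10)); // 4: one transfer per core, then none
    }
}
```

In this model alternating writers pay for a transfer on every access, while any number of readers pay once each.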


Effective use of modern hardware – conclusions

• Need careful control of concurrent memory access.
• For efficiency the program has to be aware when data (a cache line) is newly accessed by a new CPU. Writes invalidate all but the writer CPU’s copy of a cache line. Single-writer multiple-reader view.
• Solution: need to express and control inter-processor aliasing at source level.
  (understanding aliasing is also a good idea for software engineering purposes!)
• Best speed-up is when separate cores access disjoint memory.
• Minimise the number of ownership transfers (i.e. cache-level data movement).


Talk structure

• an example
• modern hardware
• sub-structural type systems, ownership
• Kilim


Classical Type Systems

Compilers and theory of type systems model the variables in scope with an environment of type assumptions Γ (pairs x : t).

We have the judgement form Γ ⊢ e : t which holds when expression e has type t under assumptions Γ.

Note that Γ changes only at scope entry/exit.

The standard rule for variables says:

Γ ⊢ x : t   (provided x : t ∈ Γ)

Note that together these say (almost too obvious to notice/question): “each use of a variable in a scope has the same type”.


Type Systems – weakness

Variables keeping the same type throughout a block means that the two errors in the following program can’t be detected by the type system:

{ char *x = malloc(10);  // x has type char *
  foo(x);                // x has type char *
  free(x);               // x has type char * (but should not??)
  foo(x);                // a well-known disaster...
  x = malloc(20);        // x has type char * (and should)
                         // problem: a memory leak now occurs
}                        // as x goes out-of-scope

Replacing free(x) with pass_ownership(x,task42) shows the weakness of classical types at controlling sharing of data.


Type Systems – weakness (2)

• We want each variable to be free’d exactly (∗) once.
• ‘Once’ means ‘once on each possible control path’.
• Let’s develop a type system in which each variable can only be used once, and then refine it to “at most one free-like operation”

(∗) at most once if we don’t care about space leaks!


Classical and Linear Type Systems (1)

Standard ML-like type system:

\[
\text{(VAR)}\;\; \Gamma[x:t] \vdash x : t
\qquad
\text{(INT)}\;\; \Gamma \vdash n : \mathit{int}
\qquad
\text{(BOOL)}\;\; \Gamma \vdash b : \mathit{bool}
\]
\[
\text{(LAM)}\;\; \dfrac{\Gamma[x:t] \vdash e : t'}{\Gamma \vdash \lambda x.e : t \to t'}
\qquad
\text{(APP)}\;\; \dfrac{\Gamma \vdash e_1 : t \to t' \quad \Gamma \vdash e_2 : t}{\Gamma \vdash e_1\, e_2 : t'}
\]
\[
\text{(COND)}\;\; \dfrac{\Gamma \vdash e_1 : \mathit{bool} \quad \Gamma \vdash e_2 : t \quad \Gamma \vdash e_3 : t}{\Gamma \vdash \mathbf{if}\; e_1\; \mathbf{then}\; e_2\; \mathbf{else}\; e_3 : t}
\]

Note that Γ only changes on scope change (here LAM).


Classical and Linear Type Systems (2)

An equivalent type system. Assumptions Γ, ∆ are now multi-sets (i.e. permutable lists):

\[
\text{(VAR)}\;\; [x:t] \vdash x : t
\qquad
\text{(INT)}\;\; [\,] \vdash n : \mathit{int}
\qquad
\text{(BOOL)}\;\; [\,] \vdash b : \mathit{bool}
\]
\[
\text{(LAM)}\;\; \dfrac{\Gamma[x:t] \vdash e : t'}{\Gamma \vdash \lambda x.e : t \to t'}
\qquad
\text{(APP)}\;\; \dfrac{\Gamma \vdash e_1 : t \to t' \quad \Delta \vdash e_2 : t}{\Gamma, \Delta \vdash e_1\, e_2 : t'}
\]
\[
\text{(COND)}\;\; \dfrac{\Gamma \vdash e_1 : \mathit{bool} \quad \Delta \vdash e_2 : t \quad \Delta \vdash e_3 : t}{\Gamma, \Delta \vdash \mathbf{if}\; e_1\; \mathbf{then}\; e_2\; \mathbf{else}\; e_3 : t}
\]
\[
\text{(WEAKEN)}\;\; \dfrac{\Gamma \vdash e : t}{\Gamma, \Delta \vdash e : t}
\qquad
\text{(CONTRACT)}\;\; \dfrac{\Gamma, \Delta, \Delta \vdash e : t}{\Gamma, \Delta \vdash e : t}
\]

Why do this? Answer: substructural type systems. We can now adjust WEAKEN/CONTRACT.


Substructural type systems

(WEAKEN) and (CONTRACT) are known as structural rules.

\[
\text{(WEAKEN)}\;\; \dfrac{\Gamma \vdash e : t}{\Gamma, \Delta \vdash e : t}
\qquad
\text{(CONTRACT)}\;\; \dfrac{\Gamma, \Delta, \Delta \vdash e : t}{\Gamma, \Delta \vdash e : t}
\]

They control how the assumptions Γ can be passed within an inference tree. [There’s a correspondence between combinators S and K and (CONTRACT) and (WEAKEN).]

Using both WEAKEN and CONTRACT gives a system equivalent to the previous one.

Without CONTRACT every variable must be used at most once.
Without WEAKEN every variable must be used at least once.

This is a good start for banning use-after-free and memory-leak errors. [Apology: I’m slightly cheating in that higher-order functions are considerably more complicated.]


How to read this intuitively

Consider:

\[
\text{(COND)}\;\; \dfrac{\Gamma \vdash e_1 : \mathit{bool} \quad \Delta \vdash e_2 : t \quad \Delta \vdash e_3 : t}{\Gamma, \Delta \vdash \mathbf{if}\; e_1\; \mathbf{then}\; e_2\; \mathbf{else}\; e_3 : t}
\]

• Use of ∆ says both arms of if-then-else get the same variables.
• The use of Γ, ∆ in the result of the inference rule says: “share out the variables in scope: passing some (Γ) to e1 and some (∆) to e2 (and e3).”

It helps to read inference rules from the bottom (goal-oriented).

However, we want to refine this all-or-nothing partition: we’d like to allow (e.g.) x.f to be tested in e1 before x is freed in e2 (and not vice-versa).


An algebra of capabilities

Above in Γ, ∆, the “,” operator shares each x : t into either Γ or ∆. (Cf. partial function.)

We’d like an algebra of capabilities, e.g. {unused, free, readonly} where unused models partiality above, and free means ‘this variable is allowed to be free’d’. These are combined with the “;” (afterwards) operator, giving for example:
• unused ; x = x
• x ; unused = x
• readonly ; readonly = readonly
• readonly ; free = free
• free ; readonly is undefined
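The five equations can be written down directly as a partial function. The sketch below is an illustration, not code from Kilim or the talk: `CapabilitySeq`, `Cap` and `seq` are hypothetical names, an empty `Optional` stands for “undefined”, and the case free ; free, which the list above does not mention, is assumed to be undefined (it would be a double free).

```java
import java.util.Optional;

class CapabilitySeq {
    enum Cap { UNUSED, READONLY, FREE }

    /** a ; b : use a variable with capability a, and afterwards with b. Empty means undefined. */
    static Optional<Cap> seq(Cap a, Cap b) {
        if (a == Cap.UNUSED) return Optional.of(b);   // unused ; x = x
        if (b == Cap.UNUSED) return Optional.of(a);   // x ; unused = x
        if (a == Cap.READONLY && b == Cap.READONLY) return Optional.of(Cap.READONLY);
        if (a == Cap.READONLY && b == Cap.FREE) return Optional.of(Cap.FREE);
        // free ; readonly is undefined (from the list above).
        // free ; free is not listed there; treating it as undefined is an assumption.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(seq(Cap.READONLY, Cap.FREE)); // Optional[FREE]
        System.out.println(seq(Cap.FREE, Cap.READONLY)); // Optional.empty
    }
}
```

Reading a field and then freeing the object is accepted, while freeing it and then reading it is not, which is exactly the asymmetry the previous slide asked for.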


An algebra of capabilities (2)

[A subtlety in the type system.]

We actually need another operator on environments to deal with passing the same variable as multiple arguments to a procedure:

int g(free x, readonly y)
{ if (...) { deallocate(x); return y.field; }
  else { int z = y.field; deallocate(x); return z; }
}

int f(free x) { return g(x,x); }

The function g() is perfectly fine, but a problem arises when its arguments alias (as in the call from f()). The “+” operator on environments forbids f() by being slightly more restrictive than “;” on argument tuples.
• both free + readonly and readonly + free are undefined.
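A possible reading of “+” as code, to show how it rejects the call g(x,x). This is a sketch under assumptions: the names `CapabilityPlus`, `plus` and `aliasedCallAllowed` are hypothetical; only the two undefined cases come from the slide, while the unused and readonly + readonly cases are carried over from “;”, free + free is assumed to be undefined, and the ordering unused < readonly < free used to compare capabilities is also an assumption.

```java
import java.util.Optional;

class CapabilityPlus {
    enum Cap { UNUSED, READONLY, FREE }   // declared from weakest to strongest

    /** a + b : one variable passed for two parameters needing a and b. Empty means undefined. */
    static Optional<Cap> plus(Cap a, Cap b) {
        if (a == Cap.UNUSED) return Optional.of(b);   // assumption: as for ';'
        if (b == Cap.UNUSED) return Optional.of(a);   // assumption: as for ';'
        if (a == Cap.READONLY && b == Cap.READONLY) return Optional.of(Cap.READONLY);
        // free + readonly and readonly + free are undefined (slide);
        // free + free is assumed to be undefined as well.
        return Optional.empty();
    }

    /** May a variable held with capability 'held' be passed for both parameters of a call? */
    static boolean aliasedCallAllowed(Cap held, Cap param1, Cap param2) {
        Optional<Cap> needed = plus(param1, param2);
        return needed.isPresent() && held.compareTo(needed.get()) >= 0;
    }

    public static void main(String[] args) {
        // f(free x) { return g(x, x); } with g(free x, readonly y)
        System.out.println(aliasedCallAllowed(Cap.FREE, Cap.FREE, Cap.READONLY)); // false
    }
}
```

Note the contrast with “;”: readonly ; free is defined, but readonly + free is not, because with aliased arguments the callee may perform the free before the read.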


From algebra to algorithm

This ‘immaculate’ splitting “;” is great for type theorists, but it is non-obvious how it’s implemented. Observe it behaves like a sum on usages, e.g. the sum effect of a readonly use and a free use is a free.

When implementing this in a traditional type-checking left-right depth-first tree walk we implement the sum above as difference:
• if x has type free [able to be freed] and the LHS operand only uses x as readonly then free still remains for the RHS
• I.e. free − readonly = free
• But readonly − free is undefined
• Similarly x − unused = x
• But unused − x is undefined (for x ≠ unused).
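The left-to-right walk can be sketched as follows. The names `CapabilityMinus`, `minus` and `checkUses` are hypothetical. The four listed equations are taken from the slide; the two remaining cases are assumptions chosen to agree with “;”: readonly − readonly = readonly, and free − free = unused (once the left operand has freed x, nothing remains for the right operand).

```java
import java.util.Optional;

class CapabilityMinus {
    enum Cap { UNUSED, READONLY, FREE }

    /** available − used : what is left for the RHS after the LHS used 'used'. Empty means undefined. */
    static Optional<Cap> minus(Cap available, Cap used) {
        if (used == Cap.UNUSED) return Optional.of(available);   // x − unused = x
        if (available == Cap.UNUSED) return Optional.empty();    // unused − x, x ≠ unused
        if (available == Cap.FREE && used == Cap.READONLY) return Optional.of(Cap.FREE);
        if (available == Cap.READONLY && used == Cap.FREE) return Optional.empty();
        if (available == Cap.READONLY) return Optional.of(Cap.READONLY); // assumption
        return Optional.of(Cap.UNUSED);                                  // free − free: assumption
    }

    /** Threads the remaining capability of one variable through its uses, left to right. */
    static Optional<Cap> checkUses(Cap declared, Cap... uses) {
        Optional<Cap> remaining = Optional.of(declared);
        for (Cap use : uses) {
            if (!remaining.isPresent()) return remaining;   // an earlier use was already illegal
            remaining = minus(remaining.get(), use);
        }
        return remaining;
    }

    public static void main(String[] args) {
        System.out.println(checkUses(Cap.FREE, Cap.READONLY, Cap.FREE)); // Optional[UNUSED]
        System.out.println(checkUses(Cap.FREE, Cap.FREE, Cap.READONLY)); // Optional.empty
    }
}
```

The second call is the use-after-free pattern: after the free nothing is left, so the later read is rejected.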


Aside: Separation logic

People familiar with judgements in separation logic will recognise its rules as being a substructural system: H1 ∗ H2 on heaps (partial maps from locations to values) is only defined when dom H1 ∩ dom H2 = {}.

Indeed Parkinson’s generalisation to fractional ownership, e.g.

\[
[x \xrightarrow{0.4} a,\; y \xrightarrow{0.5} b] \ast [x \xrightarrow{0.3} a,\; z \xrightarrow{0.6} c]
= [x \xrightarrow{0.7} a,\; y \xrightarrow{0.5} b,\; z \xrightarrow{0.6} c]
\]

is similarly inspired by avoiding “all-or-nothing” ownership and parallels ‘;’ and ‘+’ above.
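The fractional example can be replayed in code. This is a sketch, not a faithful rendering of any particular separation-logic formalisation: `FractionalHeaps`, `Cell` and `star` are hypothetical names, permissions are stored as integer hundredths to avoid floating-point rounding, and the two side conditions (both heaps must agree on the value, and the summed permission must not exceed 1) are the usual ones for fractional permissions but are not stated on the slide.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class FractionalHeaps {
    /** One heap cell: a value held with a permission in hundredths (100 = full ownership). */
    static final class Cell {
        final int permission;
        final String value;

        Cell(int permission, String value) {
            this.permission = permission;
            this.value = value;
        }
    }

    /** H1 ∗ H2 : permissions on shared locations are added. Empty means undefined. */
    static Optional<Map<String, Cell>> star(Map<String, Cell> h1, Map<String, Cell> h2) {
        Map<String, Cell> result = new TreeMap<>(h1);
        for (Map.Entry<String, Cell> entry : h2.entrySet()) {
            Cell mine = result.get(entry.getKey());
            Cell theirs = entry.getValue();
            if (mine == null) {
                result.put(entry.getKey(), theirs);
                continue;
            }
            int total = mine.permission + theirs.permission;
            // Assumed side conditions: same value on both sides, at most full ownership.
            if (!mine.value.equals(theirs.value) || total > 100) return Optional.empty();
            result.put(entry.getKey(), new Cell(total, mine.value));
        }
        return Optional.of(result);
    }

    /** The example from the slide, with 0.4 written as 40 and so on. */
    static Optional<Map<String, Cell>> example() {
        Map<String, Cell> h1 = new TreeMap<>();
        h1.put("x", new Cell(40, "a"));
        h1.put("y", new Cell(50, "b"));
        Map<String, Cell> h2 = new TreeMap<>();
        h2.put("x", new Cell(30, "a"));
        h2.put("z", new Cell(60, "c"));
        return star(h1, h2);
    }

    public static void main(String[] args) {
        example().get().forEach((loc, cell) ->
                System.out.println(loc + " " + cell.permission + "/100 -> " + cell.value));
    }
}
```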


Talk structure

• an example
• modern hardware
• sub-structural type systems, ownership
• Kilim – a Java-based practical solution


Kilim syntax

• Syntax same as Java, with additional type qualifiers: @free, @cuttable, @safe and we add @readonly.
• These qualify classes marked as Message.
• Fields of Message types can only be other Message types and Java primitive types.
• Design choice: Message types are trees and have no heap aliasing. Subversive question: what does
    class Tree { int val; Tree left,right; }
  define in Java? A tree? A graph?
• Only Message types can be transferred between actors [next slide].


Kilim qualifiers

Kilim enforces Messages to be tree-like. Qualifiers further restrict this:

@free forbids an object to have heap-pointers to it (a tree root). [Not if-and-only-if for static analysis reasons.]
@cuttable a value which is @free if its parent’s pointer is nullified
@safe forbids structural modification of the Message recursively
@readonly forbids any modification of the Message recursively.

Note these are deep qualifiers, unlike const in C.
• Only @free Message types can be transferred between actors. (And, more subtly, as method return types!)


Examples

void send(@free x) { ... builtin ...}
bool send2(@free x) { send(x); send(x); } // illegal
bool foo(@safe x) { x.count++; return length(x) != 42; }
void maybesend1(@free x) { if (foo(x)) send(x); } // OK
void maybesend2(@free x) { @safe y = x;
                           if (foo(y)) send(x);
                           // accessing y is illegal here.
                         }

Note the last subtlety. Although qualifiers appear to act on variables, they actually operate on values (capabilities).

So qualifiers on variables need to be downgraded due to dataflow.
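Kilim rejects send2 at compile time. Plain Java has no such check, but the discipline can be imitated at run time, which may help to see what the static rule guarantees. The sketch below is not Kilim’s API: `LinearHandle`, `peek` and `take` are hypothetical names, and the check happens dynamically rather than in a type checker.

```java
import java.util.Objects;

class LinearHandle<T> {
    private T value;   // null once ownership has been given away

    LinearHandle(T value) {
        this.value = Objects.requireNonNull(value);
    }

    /** Access while we still own the value (roughly the role of a @safe use). */
    T peek() {
        if (value == null) throw new IllegalStateException("value already sent");
        return value;
    }

    /** Gives the value away (roughly the role of send on a @free value). */
    T take() {
        T v = peek();
        value = null;   // every later peek() or take() now fails
        return v;
    }

    public static void main(String[] args) {
        LinearHandle<StringBuilder> x = new LinearHandle<>(new StringBuilder("msg"));
        if (x.peek().length() != 42) {
            StringBuilder sent = x.take();   // like maybesend1: inspect, then send
            System.out.println("sent " + sent);
        }
        try {
            x.take();                        // like send2: the second send is an error
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The static system is stronger than this sketch: it reports the second send before the program runs, and it also tracks aliases such as y in maybesend2.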


Core Kilim syntax

FuncDcl ::= free_opt m(p : α) { (lb : Stmt)∗ ; }
Stmt    ::= x := new | x := y
          | x := y.f | x.f := y | x := cut(y.f)
          | x := y[·] | x[·] := y | x := cut(y[·])
          | x := m(y) | if/goto lb | return x

x, y, p ∈ variable names     f ∈ field names
lb ∈ label names             m ∈ function names
sel ∈ field names ∪ {[·]}    [·] pseudo field name for array access
α, β ∈ isolation qualifier {free, cuttable, safe}
null is treated as a special readonly variable


Kilim – aliasing and de-aliasing (role of cut)

• Note that the x.sel = y form is the only creator of heap aliases. This can degrade y (e.g.) from @free to @cuttable.
  Note also that attempts to create heap-sharing will be faulted (next slide).
• The cut operator severs a subtree from its @cuttable parent thus:
    y = cut(x.sel)   ≝   y = x.sel; x.sel = null;
  Crucially it also marks y as @free; the right-hand side above would not do this (especially on arrays). Doing y = x.sel alone would leave y as @cuttable; the x.sel = null makes good the semantic promise that y has no pointers to it and is hence @free.
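The run-time effect of cut is small enough to show in plain Java. This sketch is not Kilim code: `CutDemo`, `Node` and `cutLeft` are hypothetical names, and nothing here marks the result as @free, since that part exists only in Kilim’s static checker.

```java
class CutDemo {
    /** A tree-shaped node; plain Java, not a Kilim Message class. */
    static final class Node {
        int val;
        Node left, right;

        Node(int val) {
            this.val = val;
        }
    }

    /** y = cut(x.left): detach the left subtree and return it. */
    static Node cutLeft(Node x) {
        Node y = x.left;   // y = x.sel
        x.left = null;     // x.sel = null: the parent no longer points to y
        return y;
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        Node subtree = cutLeft(root);
        System.out.println(subtree.val + " " + (root.left == null)); // 2 true
    }
}
```

Because Message values are trees, the parent’s field was the only heap pointer to the subtree, so after the cut the subtree is a root again and can be sent to another actor.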


Kilim isolation type checking

Many fine details [ECOOP’2008]. Principal steps:

1. Do a simplified shape analysis on the program to over-estimate possible aliasing. At each program point get a shape graph:
   • Nodes: finite, abstracting run-time heap locations.
   • Nodes labelled by which local variables may point to them. Special node ∅ for unknown internal structure.
     [Core Kilim syntax enforces “no unnamed temporaries”]
   • Edges represent possible run-time references.
   • Edges labelled with field names.
   A simple forward-dataflow iteration.
2. At each program point downgrade declared qualifier information by dataflow and fault if inconsistent.


Talk structure

• an example
• modern hardware
• sub-structural type systems, ownership
• Kilim – a Java-based practical solution
• Some random observations


One neat point

When values are used linearly (no longer referenced (∗) by the caller after being passed to another method/thread) then:
• call-by-reference (pass-by-identity) = call-by-value (pass-by-copy)

This sidesteps the well-known problem that RMI is a non-trivial solution to exploiting multicore. It’s heavyweight and its marshalling implies pass-by-copy rather than pass-by-identity.

(∗) Of course, the callee can later pass such a value back to the caller.
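The equivalence can be seen in a hand-off between two Java threads. This is a sketch using the standard `java.util.concurrent.BlockingQueue`, not Kilim’s mailboxes; `ZeroCopyHandoff`, `Message` and `sendAndSum` are hypothetical names. Only a reference travels through the queue, so no payload is copied. Because the sender drops its reference after sending, the receiver cannot tell whether it got the original or a copy. In plain Java that discipline is a convention; in Kilim the @free qualifier enforces it.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ZeroCopyHandoff {
    static final class Message {
        final int[] payload;

        Message(int[] payload) {
            this.payload = payload;
        }
    }

    /** Sends one message of 'size' ones to a consumer thread; returns the sum it computed. */
    static long sendAndSum(int size) {
        BlockingQueue<Message> mailbox = new ArrayBlockingQueue<>(1);
        long[] result = new long[1];
        Thread consumer = new Thread(() -> {
            try {
                Message m = mailbox.take();   // the very same object, not a copy
                long sum = 0;
                for (int v : m.payload) sum += v;
                result[0] = sum;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        int[] data = new int[size];
        Arrays.fill(data, 1);
        Message msg = new Message(data);
        try {
            mailbox.put(msg);   // pass by reference: only a pointer moves
            msg = null;         // linear use: the sender keeps no alias
            data = null;
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(sendAndSum(1_000_000)); // 1000000
    }
}
```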


Another neat point

Having a non-modified data-structure shared by several threads is a common programming idiom.

Remember we said that caches work like multiple-reader single-writer? Hence they work efficiently on the above idiom (each CPU gets a private cache copy of it, and no invalidations occur).
• Kilim above only allows Message-type objects to be transferred between actors; other Java types are faulted.
• Some of these are harmless (e.g. String)
• Marking these as Sharable directs Kilim to allow them to be passed between actors.


What I didn’t say about Kilim

• There are also classical Java objects (no restrictions on aliasing) but these cannot be passed between threads, except via an (unsafe) loophole.
• Implementation: embedded in Java via a bytecode re-writer.
• Ultra-fast task switching (one million threads) implemented by stack reflect/reify using @pausable attribute.
• Thread creation and messaging compares favourably with Erlang.


What I didn’t say about ownership type systems

• The Kilim ownership system is based on heap-pointer uniqueness; each heap object has at most one pointer to it.
• In addition there are also ownership type systems based on:
  – owners as dominators
  – owners as modifiers


Conclusions

• multi-core works best under process memory isolation: each process owns disjoint memory. A good model (actor-like).
• in general need ownership transfer; the hardware cost of this is that of a copy.
• good if ownership transfer is policed by a type system; syntactic markers give the compiler the right to optimise actual data transfer.
• substructural types provide such a type system
• might need loopholes/exceptions to the type system for real programs (most of the data is still protected).
• Kilim is one solution (a sweet-spot).
