
The Software Stack for<br />

Transactional Memory<br />

Challenges and Opportunities<br />

Brian D. Carlstrom<br />

JaeWoong Chung<br />

Christos Kozyrakis, Kunle Olukotun<br />

Computer Systems Lab<br />

Stanford University<br />

http://tcc.stanford.edu<br />



Parallel Programming on Shared Memory<br />

Traditionally done using locks<br />

But locks are hard to use<br />

Semantic problems<br />

Deadlock<br />

Priority inversion<br />

Performance problems<br />

Simplicity at the expense of concurrency<br />

High concurrency at the expense of simplicity<br />

Pessimistic concurrency<br />



Transactional Memory<br />

Allows for lock-free parallel programming<br />

Transactions mark critical sections<br />

Same properties as database transactions<br />

Atomicity: all or nothing<br />

Isolation: no partial updates<br />

Transactions are easier to use than locks<br />

Coarse-grained non-blocking synchronization<br />

Optimistic concurrency<br />



Opportunities and Challenges<br />

TM is a promising solution for easy and efficient parallel programming on multi-core systems<br />

TM brings both opportunities and challenges to the software stack<br />

Today’s talk focuses on, but is not limited to, the software stack on top of hardware TM<br />



Contents<br />

Transactional Memory Overview<br />

What is TM<br />

Why it is interesting for multi-core systems<br />

TM example and primitives<br />

Software Stack<br />

Data Structure<br />

Programming Composition<br />

Operating System<br />

Language Implementation<br />

Programming Models<br />

Distributed Transactions<br />

Conclusion<br />



TM execution model example<br />

(Transactional Coherence and Consistency)<br />

[Figure: execution timeline for CPU 0, CPU 1, and CPU 2. CPU 0 executes code (ld 0xdddd, ld 0xeeee, st 0xbeef), validates, and commits 0xbeef. CPU 1 executes code (ld 0xaaaa, ld 0xbbbb), is unaffected by the commit of 0xbeef, validates, and commits. CPU 2 executes code (ld 0xbeef), is violated by the commit of 0xbeef, and re-executes its code.]
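The execute, validate, commit, and violate steps in the example can be sketched in a few lines of Python. This is a simplified illustration, not the TCC hardware protocol: the `Memory` and `Transaction` classes, the per-address version counters, and the stored value 42 are all assumptions made for the sketch, and conflicts are checked lazily at commit time rather than detected when another processor's commit is observed.

```python
class Memory:
    """Shared memory with a version counter per address (hypothetical model)."""

    def __init__(self):
        self.values = {}
        self.versions = {}


class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.read_set = {}   # address -> version seen at first read
        self.write_set = {}  # address -> value buffered until commit

    def load(self, addr):
        if addr in self.write_set:  # read our own buffered store
            return self.write_set[addr]
        self.read_set.setdefault(addr, self.memory.versions.get(addr, 0))
        return self.memory.values.get(addr, 0)

    def store(self, addr, value):
        self.write_set[addr] = value  # not visible to others yet

    def validate(self):
        # Violated if any address we read was committed to since we read it.
        return all(self.memory.versions.get(addr, 0) == version
                   for addr, version in self.read_set.items())

    def commit(self):
        if not self.validate():
            return False  # violation: the caller must re-execute
        for addr, value in self.write_set.items():
            self.memory.values[addr] = value
            self.memory.versions[addr] = self.memory.versions.get(addr, 0) + 1
        return True


# Replay the three-CPU example: all three execute, then try to commit in order.
mem = Memory()
cpu0, cpu1, cpu2 = Transaction(mem), Transaction(mem), Transaction(mem)
cpu0.load(0xdddd)
cpu0.load(0xeeee)
cpu0.store(0xbeef, 42)
cpu1.load(0xaaaa)
cpu1.load(0xbbbb)
cpu2.load(0xbeef)
committed = [cpu0.commit(), cpu1.commit(), cpu2.commit()]

# CPU 2 was violated by CPU 0's commit of 0xbeef, so it re-executes.
retry = Transaction(mem)
retry_value = retry.load(0xbeef)
retry_committed = retry.commit()
```

CPU 1 commits because nothing it read was written by CPU 0, while CPU 2 only succeeds on its second attempt, when it observes the committed value.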



Advanced TM primitives<br />

Nesting [isca06]<br />

< Closed Nesting ><br />

Core 0:
    // A is initially 0;
    Atomic {
        ....
        Atomic {
            A++; // 1
            ….
        }
        A++; // 2
    }

Core 1:
    = A; // 0 (read while the outer transaction on Core 0 is still running)
    = A; // 2 (read after the outer transaction commits)

< Open Nesting ><br />

Core 0:
    // A is initially 0;
    Atomic {
        ....
        Open_Atomic {
            A++; // 1
            ….
        }
        A++; // 2
    }

Core 1:
    = A; // 1 (read after the open-nested transaction commits)
    = A; // 2 (read after the outer transaction commits)

Callback<br />

Violation Handler and Commit Handler<br />
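The difference between closed and open nesting can be sketched as follows. This is a minimal model under stated assumptions: the `Tx` class and its `commit_closed` and `commit_open` methods are hypothetical names, and conflict detection and the compensation that open nesting needs when the parent aborts are left out. It only shows when the write to A becomes visible to another core.

```python
class Tx:
    """A transaction with a private write buffer and an optional parent."""

    def __init__(self, shared, parent=None):
        self.shared = shared
        self.parent = parent
        self.writes = {}

    def read(self, key):
        tx = self
        while tx is not None:  # look through our own and our ancestors' buffers
            if key in tx.writes:
                return tx.writes[key]
            tx = tx.parent
        return self.shared[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit_closed(self):
        # Closed nesting: a child merges into its parent's buffer, so its
        # updates stay invisible until the outermost transaction commits.
        target = self.parent.writes if self.parent else self.shared
        target.update(self.writes)

    def commit_open(self):
        # Open nesting: the child publishes directly to shared memory.
        self.shared.update(self.writes)


def demo(open_nested):
    shared = {"A": 0}
    outer = Tx(shared)
    inner = Tx(shared, parent=outer)
    inner.write("A", inner.read("A") + 1)  # A++ inside the nested transaction
    if open_nested:
        inner.commit_open()
    else:
        inner.commit_closed()
    seen_by_core1 = shared["A"]            # what another core reads right now
    outer.write("A", outer.read("A") + 1)  # A++ in the outer transaction
    outer.commit_closed()
    return seen_by_core1, shared["A"]


closed_result = demo(open_nested=False)
open_result = demo(open_nested=True)
```

With closed nesting the other core still reads 0 after the inner commit; with open nesting it reads 1. In both cases the final value is 2.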



Data Structure<br />

(Opportunity)<br />

Coarse-grain non-blocking synchronization<br />

Both ease of use and performance<br />

Nesting – reduces violation overhead<br />

Open nesting reduces the frequency of conflicts<br />

Closed nesting reduces the penalty of violation<br />

Callback – provides more programmability<br />

Violation Handler<br />

Automatic clean-up at conflicts<br />

Application specific conflict handler<br />
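A rough sketch of how violation and commit handlers might be used for automatic clean-up is shown below. The `atomic` helper, the `Tx.on_commit` and `Tx.on_violation` registration methods, and the `Violation` exception are hypothetical names for illustration only; the conflict is simulated by raising the exception on the first attempt.

```python
class Violation(Exception):
    """Stands in for a conflict detected by the TM system."""


class Tx:
    def __init__(self):
        self.commit_handlers = []
        self.violation_handlers = []

    def on_commit(self, handler):
        self.commit_handlers.append(handler)

    def on_violation(self, handler):
        self.violation_handlers.append(handler)


def atomic(body, max_retries=5):
    """Run body(tx, attempt) until it completes without a violation."""
    for attempt in range(max_retries):
        tx = Tx()
        try:
            result = body(tx, attempt)
        except Violation:
            # Clean up in reverse registration order, then re-execute.
            for handler in reversed(tx.violation_handlers):
                handler()
            continue
        for handler in tx.commit_handlers:
            handler()
        return result
    raise RuntimeError("transaction did not commit")


scratch = []  # non-transactional resource that needs explicit clean-up
log = []


def body(tx, attempt):
    scratch.append(f"tmp{attempt}")
    tx.on_violation(scratch.pop)                 # release it if we are violated
    tx.on_commit(lambda: log.append("committed"))
    if attempt == 0:
        raise Violation()                        # simulated conflict
    return attempt


result = atomic(body)
```

The scratch entry created by the violated attempt is removed by its handler, so only the entry from the committed attempt remains.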



Data Structure<br />

(Challenge)<br />

How to hide the advanced techniques from novice programmers<br />

TM-based library<br />

like the GNU Classpath Java library<br />



Programming Composition<br />

(Opportunity)<br />

Transaction nesting<br />

Flexibility in composing transactions<br />

Speculative parallel loops<br />

To avoid the hassle of setting up and tearing down multiple threads<br />

T_FOR(..)<br />

{<br />

// loop iteration<br />

}<br />

[Figure: loop iterations Iter1, Iter2, ..., IterN run as transactions tx1, tx2, ..., txN]
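One way to picture a speculative parallel loop is the sketch below, where every iteration runs as a transaction and commits in iteration order. The slide only names `T_FOR`; the `t_for` and `execute` functions, the ordered-commit policy, and the `read`/`write` callbacks are assumptions. The speculative phase is run sequentially here to keep the example deterministic, whereas a real system would run the iterations on separate cores.

```python
def execute(i, body, state):
    """Run one iteration, recording its read set and buffering its writes."""
    reads, writes = set(), {}

    def read(key):
        if key in writes:
            return writes[key]
        reads.add(key)
        return state.get(key, 0)

    def write(key, value):
        writes[key] = value

    body(i, read, write)
    return reads, writes


def t_for(n, body, memory):
    """Run body for i in range(n) as ordered transactions over memory."""
    # Phase 1: every iteration executes speculatively on the same snapshot.
    snapshot = dict(memory)
    speculative = [execute(i, body, snapshot) for i in range(n)]

    # Phase 2: commit in iteration order; re-execute any violated iteration.
    dirty = set()       # keys committed by earlier iterations
    reexecuted = []
    for i, (reads, writes) in enumerate(speculative):
        if reads & dirty:
            reads, writes = execute(i, body, memory)
            reexecuted.append(i)
        memory.update(writes)
        dirty.update(writes)
    return reexecuted


def loop_body(i, read, write):
    write(("sq", i), i * i)                 # independent work per iteration
    if i % 3 == 0:
        write("hits", read("hits") + 1)     # occasional shared update


memory = {}
reexecuted = t_for(6, loop_body, memory)
```

Only iteration 3 conflicts with an earlier iteration (both it and iteration 0 update `hits`), so it is the only one that re-executes, and the final state matches a sequential run of the loop.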



Programming Composition<br />

(Challenge)<br />

Transactional I/O<br />

I/O buffering<br />

Defer I/O operations until the commit<br />

Execute the deferred operations at commit<br />

Conditional Waits<br />

Wait() is related to lock objects<br />

Composable conditions for atomic regions<br />

Overflows<br />

Deep call stacks make transactions long<br />

Buffer overflow mechanism<br />
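The I/O buffering idea above, deferring output during the transaction and performing it at commit, can be sketched as follows. The `TxOutput` class and its `commit` and `abort` methods are hypothetical names; the sketch covers output only, since input cannot be deferred the same way.

```python
import io


class TxOutput:
    """Buffers writes during a transaction and performs them at commit."""

    def __init__(self, sink):
        self.sink = sink
        self.pending = []

    def write(self, text):
        self.pending.append(text)  # deferred: nothing reaches the sink yet

    def commit(self):
        for text in self.pending:  # execute the deferred operations
            self.sink.write(text)
        self.pending.clear()

    def abort(self):
        self.pending.clear()       # a violated attempt leaves no output


sink = io.StringIO()
out = TxOutput(sink)

out.write("attempt 1\n")
out.abort()                        # the first attempt is rolled back
out.write("attempt 2\n")
before_commit = sink.getvalue()
out.commit()
after_commit = sink.getvalue()
```

Because the sink is only touched at commit, a re-executed transaction cannot produce duplicate or partial output.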



Operating System<br />

(Opportunity)<br />

Non-blocking synchronization<br />

Easier kernel construction<br />

Potential for speedup<br />

Atomicity for fault tolerance<br />

TM undoes instructions at rollback<br />

Easy check-pointing<br />

Isolation for security [sosp03]<br />

TM isolates the effects of instructions until commit<br />



Operating System<br />

(Challenge)<br />

Context-switch<br />

In hardware TMs, transactional state has affinity to processors<br />

Interrupt [hpca06]<br />

I/O<br />

Swapping in/out transactions<br />

Software TM runs in the virtual address space<br />



Programming Models<br />

(Opportunity)<br />

TM-based models tuned for parallelism<br />

Atomos [pldi06]<br />

Atomos = Java - old synchronization APIs + new TM primitives<br />

Support for nesting, callbacks, and high-level language constructs<br />

X10, Fortress, and Chapel also explore<br />

transactions<br />



Programming Models<br />

(Challenge)<br />

Many different semantics for TM<br />

Different definitions for the same term<br />

Strong vs. weak consistency<br />

We prefer strong consistency<br />

No need to worry about possible bugs due to interaction between transactional code and non-transactional code<br />

APIs for application and system programming<br />



Language Implementation<br />

(Opportunity)<br />

Aggressive JIT compiler optimization<br />

Try unsafe optimization<br />

Constant Propagation<br />

Rollback the computation if there is a problem<br />

Restart with safe code<br />

Speculative Parallelism<br />

Make a code segment run in parallel<br />
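The "try unsafe optimization, roll back, restart with safe code" pattern can be illustrated with a small sketch. Everything here is hypothetical: `run_speculatively`, the `safe` and `fast` versions of the loop, and the `scale` field are made up for the example, and a deep copy stands in for the checkpoint that TM hardware would provide. The fast version treats `scale` as a constant (as constant propagation would), which is wrong whenever the loop itself changes it.

```python
import copy


def safe(state):
    """Unoptimized code: re-reads scale on every iteration."""
    for x in state["items"]:
        if x < 0:
            state["scale"] = -x  # rare write that breaks the assumption
        else:
            state["total"] += x * state["scale"]


def fast(state):
    """Optimized code: assumes scale keeps its value on entry."""
    scale = state["scale"]
    for x in state["items"]:
        if x < 0:
            state["scale"] = -x
        else:
            state["total"] += x * scale


def run_speculatively(state):
    trial = copy.deepcopy(state)          # checkpoint: work on a private copy
    fast(trial)
    if trial["scale"] == state["scale"]:  # the assumption held throughout
        state.update(trial)               # commit the speculative result
        return "fast"
    safe(state)                           # discard the trial, rerun safe code
    return "safe"


common = {"scale": 2, "items": [1, 2, 3], "total": 0}
common_path = run_speculatively(common)

rare = {"scale": 2, "items": [1, -5, 2], "total": 0}
rare_path = run_speculatively(rare)
```

In the common case the optimized code commits; in the rare case its result is thrown away and the safe code produces the correct total.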



Language Implementation<br />

(Challenge)<br />

Memory allocation<br />

Private memory pool or Nesting<br />

Incremental/Concurrent garbage collection<br />

Use violation handlers to deal with conflicts between collectors and mutators<br />



Distributed Transactions<br />

(Opportunity)<br />

Integration with distributed transaction systems<br />

Transaction Service in .NET, J2EE, and CORBA<br />

Extracting parallelism from distributed objects with transactional properties<br />

Enterprise Java Beans<br />

TX_REQUIRED, TX_BEAN_MANAGED<br />



Distributed Transactions<br />

(Challenge)<br />

E-commerce transactions are long<br />

Longer than time quanta<br />

I/O operations<br />

TM virtualization can be helpful<br />



Conclusion<br />

Transactional memory is a promising solution for parallel programming<br />

Transactional memory brings both opportunities and challenges to the software stack<br />

We hope researchers from many areas will join the effort on transactional memory<br />
