Lecture Notes in Computer Science 4917

8 Experimental Results

MIPS MT: A Multithreaded RISC Architecture

The following are experimental results obtained on an FPGA implementation of the 34K processor with 5 TCs, running on the MIPS Malta development board, using the ROPEbench framework developed by Jacob Leverich of Stanford. Each benchmark is run for a constant large number of iterations, divided among some number of software threads. The results are the calculated cycles-per-iteration.
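As a rough illustration of the measurement described above, the following C sketch splits a fixed iteration budget across threads and turns an elapsed cycle count into cycles-per-iteration. The source does not describe how ROPEbench does this internally; the function names `iterations_for` and `cycles_per_iteration` and the even-split policy are assumptions made only for this example.

```c
#include <assert.h>

/* Hypothetical helper: iterations assigned to thread `tid` when `total`
 * iterations are split as evenly as possible across `nthreads` threads.
 * The first (total % nthreads) threads each take one extra iteration. */
long iterations_for(long total, int nthreads, int tid)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    return total / nthreads + (tid < total % nthreads ? 1 : 0);
}

/* Hypothetical helper: the reported figure, i.e. elapsed cycles for the
 * whole run divided by the total number of iterations. */
double cycles_per_iteration(unsigned long long elapsed_cycles, long total)
{
    assert(total > 0);
    return (double)elapsed_cycles / (double)total;
}
```

Because the total iteration count stays constant, results for different thread counts can be compared directly: any change in the figure reflects the cost of running the same work on more threads.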

On a uniprocessor configuration, each software thread is a pthread, time-sharing a single virtual CPU. In the SMTC configurations, each pthread represents a kernel thread scheduled according to standard SMP algorithms across 5 virtual CPUs. In the ROPE configuration, each pthread represents a ROPE microthread, of which the kernel has no direct knowledge. The uniprocessor and “SMTC-PT” systems use the pthread mutex implementation of the Linux glibc 2.4. The “SMTC-ITC” and ROPE systems use an experimental library using MIPS MT ITC cells mapped into the program’s address space.

8.1 Synchronization

The “Ferris wheel” benchmark measures synchronization costs between threads, where N threads are organized as a logical ring, each repeatedly acquiring a lock that must first be released by its predecessor. Its inner loop is:

    for (i = 0; i < count; i++) {
        lock(wheel, me);
        unlock(wheel, next);
    }
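The ring structure behind the inner loop above can be sketched as a self-contained C program fragment. This is only an illustration, not the ROPEbench code: it assumes a Linux system with POSIX unnamed semaphores, and it uses semaphores rather than pthread mutexes because POSIX leaves unlocking a default mutex from a thread that does not own it undefined. The names `rider_t`, `rider`, `run_wheel`, and `MAX_THREADS` are hypothetical, each thread is assumed to make `count` laps, and cycle timing is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define MAX_THREADS 16   /* arbitrary cap for this sketch */

typedef struct {
    sem_t *wheel;  /* one semaphore per position on the ring */
    int me;        /* this thread's position */
    int next;      /* successor's position */
    long count;    /* laps this thread should make */
    long done;     /* laps actually completed */
} rider_t;

static void *rider(void *arg)
{
    rider_t *r = arg;
    for (long i = 0; i < r->count; i++) {
        sem_wait(&r->wheel[r->me]);    /* plays the role of lock(wheel, me) */
        r->done++;
        sem_post(&r->wheel[r->next]);  /* plays the role of unlock(wheel, next) */
    }
    return NULL;
}

/* Runs n threads around the ring and returns the total laps completed. */
long run_wheel(int n, long count)
{
    assert(n >= 1 && n <= MAX_THREADS);
    sem_t wheel[MAX_THREADS];
    pthread_t tid[MAX_THREADS];
    rider_t r[MAX_THREADS];

    /* Only position 0 starts released, so exactly one thread can run at a time. */
    for (int i = 0; i < n; i++)
        sem_init(&wheel[i], 0, i == 0 ? 1 : 0);

    for (int i = 0; i < n; i++) {
        r[i] = (rider_t){ wheel, i, (i + 1) % n, count, 0 };
        pthread_create(&tid[i], NULL, rider, &r[i]);
    }

    long total = 0;
    for (int i = 0; i < n; i++) {
        pthread_join(tid[i], NULL);
        total += r[i].done;
    }
    for (int i = 0; i < n; i++)
        sem_destroy(&wheel[i]);
    return total;
}
```

With a single thread, `me` and `next` coincide, so the loop measures an uncontended acquire and release. With more threads, every iteration hands control to the successor, so the cost is dominated by the cross-thread wakeup.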

Table 1. Ferris Wheel (cycles/iteration)

               1 Thread  2 Threads  3 Threads  4 Threads  5 Threads
Uniprocessor        414       2046       2494       2792       3004
SMTC-PT             572       2052      11833      13556      14451
SMTC-ITC             27         19         19         19         19
ROPE                 26         18         18         18         18

There are two noteworthy phenomena here. One is that the classical software pthread implementation degrades significantly as SMP threads are added. In the uniprocessor case, it is only by a rare accident of pre-emption that there will be contention for a low-level lock, but with multiple concurrent instruction streams active, such contention becomes increasingly likely.

The second phenomenon worth noting is that using the MIPS MT ITC store to implement the mutex in hardware is more than an order of magnitude faster, and does not suffer from the same scaling problems.
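The size of the gap can be checked directly from the Table 1 figures. The short C sketch below computes the ratio of SMTC-PT to SMTC-ITC cycles per iteration at each thread count. The numbers are copied from the table; the names `speedup` and `print_speedups` are made up for this example.

```c
#include <stdio.h>

/* Cycles per iteration from Table 1, indexed by thread count minus one. */
static const double smtc_pt[5]  = { 572, 2052, 11833, 13556, 14451 };
static const double smtc_itc[5] = {  27,   19,    19,    19,    19 };

/* Ratio of software-mutex cost to ITC-based cost. */
double speedup(double software_cycles, double itc_cycles)
{
    return software_cycles / itc_cycles;
}

/* Prints one line per thread count with the SMTC-PT / SMTC-ITC ratio. */
void print_speedups(void)
{
    for (int i = 0; i < 5; i++)
        printf("%d thread(s): %.1fx\n", i + 1,
               speedup(smtc_pt[i], smtc_itc[i]));
}
```

Even with one thread the ratio is above 20, and at five threads it exceeds 700, since the SMTC-PT cost keeps growing while the ITC cost stays flat at 19 cycles.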