05.08.2013

Power Point Slides in PDF - University of California, Santa Cruz

The Return of the SIMD Computers: UCSC Kestrel and Beyond

Andrea Di Blas

School of Engineering

University of California, Santa Cruz


Outline

Introduction: UCSC Kestrel

“Synchronous” applications

“Asynchronous” applications

A. Di Blas 2


INTRODUCTION


In a not-too-distant past…


A long time ago (late 1980s), the computing community had high hopes and expectations for a new kind of architecture: “Single Instruction-Multiple Data” (SIMD) parallel computers.


However, almost all were short-lived. Their high cost, the ever-increasing power of the evil serial CPUs and, above all, the effort required to program such an unfamiliar architecture forced big SIMD machines into early retirement.


By the mid-1990s, SIMD machines were already disappearing from the TOP500, the list of the world’s largest supercomputers.


But in late 1998, a small group at UC Santa Cruz finally had the first working prototype of a new kind of high-performance, low-cost SIMD co-processor. Originally designed for computational biology, it proved extremely powerful in a variety of other applications.


Kestrel


MIMD and SIMD

Multiple Instruction-Multiple Data (MIMD)

Single Instruction-Multiple Data (SIMD)
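The MIMD/SIMD comparison was a figure that did not survive extraction. As a rough illustration of the SIMD side, the Python sketch below models a single controller that broadcasts each instruction to every processing element (PE), each PE holding its own datum. A data-dependent `if`/`else` becomes two masked passes, so the whole array pays for both branches. The function name `simd_select` and the one-step-per-broadcast cost model are assumptions for illustration, not Kestrel's instruction set.

```python
# Sketch of SIMD-style execution under an assumed cost model: one time step
# per broadcast instruction for the whole array. Not Kestrel's actual ISA.


def simd_select(data, cond, then_op, else_op):
    """Apply then_op where cond holds and else_op elsewhere, SIMD style.

    Returns (results, steps). Every PE sees every broadcast instruction;
    a per-PE mask decides whether it commits the result.
    """
    steps = 0

    # Broadcast 1: every PE evaluates the condition on its own datum.
    mask = [cond(x) for x in data]
    steps += 1

    # Broadcast 2: the "then" branch; only PEs with the mask set commit.
    out = [then_op(x) if m else x for x, m in zip(data, mask)]
    steps += 1

    # Broadcast 3: the "else" branch; only PEs with the mask clear commit.
    out = [y if m else else_op(x) for x, y, m in zip(data, out, mask)]
    steps += 1

    return out, steps


if __name__ == "__main__":
    vals, steps = simd_select([1, -2, 3, -4],
                              cond=lambda x: x < 0,
                              then_op=lambda x: -x,
                              else_op=lambda x: 10 * x)
    print(vals, steps)  # [10, 2, 30, 4] 3
```

The step count stays at 3 whether no PE or every PE takes a branch, whereas a MIMD processor would run only its own branch. That lockstep cost becomes important for the data-dependent (“asynchronous”) applications later in the talk.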


Image Filters on Kestrel

2D Gaussian filter

Edge detector


2D Gaussian convolution

“Red Rocks Canyon”


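2D Gaussian convolution

The 2D Gaussian kernel is separable: it is the product of two 1D Gaussians, G(x, y) = g(x) g(y), so a k×k convolution can be computed as a horizontal 1D pass followed by a vertical 1D pass, using 2k instead of k² multiply-accumulates per pixel.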
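A small Python check of the separability property, assuming zero padding at the image borders and a kernel normalized to sum to 1 (the slide specifies neither). `gaussian_1d`, `convolve_2d`, and `convolve_separable` are illustrative names, not Kestrel code; the two convolution routines should agree up to floating-point rounding.

```python
import math


def gaussian_1d(k, sigma):
    """Normalized 1D Gaussian of odd length k."""
    r = k // 2
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(w)
    return [v / s for v in w]


def convolve_2d(img, kernel):
    """Direct k x k filter with zero padding: up to k*k MACs per pixel.
    (The Gaussian kernel is symmetric, so correlation equals convolution.)"""
    h, w, k = len(img), len(img[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * img[yy][xx]
            out[y][x] = acc
    return out


def convolve_separable(img, g):
    """Row pass, then column pass with the 1D kernel g: up to 2*k MACs per pixel."""
    h, w, k = len(img), len(img[0]), len(g)
    r = k // 2
    rows = [[sum(g[d] * img[y][x + d - r] for d in range(k) if 0 <= x + d - r < w)
             for x in range(w)] for y in range(h)]
    return [[sum(g[d] * rows[y + d - r][x] for d in range(k) if 0 <= y + d - r < h)
             for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    g = gaussian_1d(5, 1.0)
    kernel = [[a * b for b in g] for a in g]  # G(x, y) = g(x) * g(y)
    img = [[(3 * x + 7 * y) % 11 for x in range(8)] for y in range(6)]
    a, b = convolve_2d(img, kernel), convolve_separable(img, g)
    print(max(abs(a[y][x] - b[y][x]) for y in range(6) for x in range(8)))
```

For the 11x11 kernel in the timing table below, separability means 22 rather than 121 multiply-accumulates per pixel.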


2D Gaussian convolution

512x512-pixel image (8 bpp), time in s by kernel size

                 5x5     7x7     9x9     11x11
CPU time        0.050   0.070   0.070   0.080
Kestrel time    0.016   0.017   0.018   0.019
SPEEDUP          3.12    4.12    3.89    4.21

Kestrel runs at 20 MHz!

CPU: 1 GHz Pentium III, 256 MB RAM, cc -O2


Edge detector

“Big Sur”


Edge detector

512x512-pixel image (8 bpp)

             time [s]
CPU            0.040
Kestrel        0.018
SPEEDUP         2.22

CPU: 1 GHz Pentium III, 256 MB RAM, cc -O2
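The slides do not say which edge detector was used. As an assumption, the sketch below uses the common 3x3 Sobel operator with the |gx| + |gy| magnitude approximation (which avoids a square root) and a fixed threshold; `sobel_edges` and its `threshold` parameter are hypothetical. Like the Gaussian filter, it performs the same fixed amount of work at every interior pixel, which is the kind of uniform workload that suits a lockstep SIMD array.

```python
def sobel_edges(img, threshold):
    """Binary edge map from the 3x3 Sobel operator (assumed detector).

    Uses |gx| + |gy| as the gradient magnitude. Border pixels, where the
    3x3 window would fall outside the image, are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, weights 1-2-1.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, weights 1-2-1.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if abs(gx) + abs(gy) > threshold:
                out[y][x] = 1
    return out


if __name__ == "__main__":
    step = [[0, 0, 0, 9, 9, 9] for _ in range(5)]  # vertical step edge
    for row in sobel_edges(step, threshold=10):
        print(row)
```

On the vertical step, the two interior columns that straddle the edge are marked and the flat regions are not.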


Asynchronous applications

Mandelbrot set

2D Median filter


Mandelbrot set


Mandelbrot set (synchronous)
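The Mandelbrot escape-time count varies enormously from pixel to pixel: points inside the set run to the iteration limit, while points far outside escape after one or two iterations. The sketch below assumes a simple synchronous mapping (one pixel per PE per batch, all PEs iterating in lockstep until the slowest pixel in the batch finishes) and counts one time step per iteration. `escape_time` and `synchronous_steps` are illustrative names; this is a cost model, not the implementation behind the slides.

```python
def escape_time(c, max_iter):
    """Iterations of z <- z*z + c (from z = 0) until |z| > 2, capped at max_iter."""
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter


def synchronous_steps(counts, num_pe):
    """Assumed lockstep cost: pixels are dealt out num_pe at a time and each
    batch takes as many steps as its slowest pixel. Returns (steps, utilization),
    where utilization is useful iterations / (steps * num_pe)."""
    steps = 0
    for i in range(0, len(counts), num_pe):
        steps += max(counts[i:i + num_pe])
    utilization = sum(counts) / (steps * num_pe)
    return steps, utilization


if __name__ == "__main__":
    # One row of a small image along Re(c) in [-2, 0.5], with Im(c) = 0.1.
    row = [escape_time(complex(-2.0 + 2.5 * x / 31, 0.1), 1000) for x in range(32)]
    steps, util = synchronous_steps(row, num_pe=8)
    print(steps, round(util, 2))
```

With counts `[1, 100, 1, 1]` on two PEs, the first batch takes 100 steps although one of its PEs is done after one, so only 103 of 202 PE-steps do useful work (about 51%).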


“SIMD Phase Programming Model” (SPPM)

Simple methodology to turn a sequential, data-dependent algorithm into a SIMD-parallel one

Can be used with “partitionable” problems

Provides dynamic load balancing without the need for a high-level support system
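The slides give no details of SPPM's phases, so what follows is only a toy model of the dynamic load-balancing idea, under stated assumptions. The work is taken to be a queue of independent items (per-pixel iteration counts, reading “partitionable” as “splits into such items”). Each step, every busy PE performs one iteration in lockstep. Whenever some PEs have gone idle, a masked reload phase costing `reload_cost` steps hands each of them the next item. The synchronous baseline deals items out in fixed batches. `sppm_steps`, `batch_steps`, and `reload_cost` are hypothetical names.

```python
from collections import deque


def batch_steps(counts, num_pe, reload_cost=1):
    """Synchronous baseline: load num_pe items, run until the slowest is done."""
    return sum(reload_cost + max(counts[i:i + num_pe])
               for i in range(0, len(counts), num_pe))


def sppm_steps(counts, num_pe, reload_cost=1):
    """Toy dynamic load balancing: idle PEs pick up new work between steps."""
    if any(c < 1 for c in counts):
        raise ValueError("every work item needs at least one iteration")
    queue = deque(counts)
    # Iterations left on each PE's current item; 0 means the PE is idle.
    left = [queue.popleft() if queue else 0 for _ in range(num_pe)]
    steps = reload_cost  # initial load phase
    while any(left):
        # Compute phase: every busy PE does one iteration in lockstep.
        left = [n - 1 if n > 0 else 0 for n in left]
        steps += 1
        # Reload phase (masked): only idle PEs take a new item.
        idle = [p for p, n in enumerate(left) if n == 0]
        if idle and queue:
            for p in idle:
                if queue:
                    left[p] = queue.popleft()
            steps += reload_cost
    return steps


if __name__ == "__main__":
    work = [50] + [5] * 9  # one slow item, many fast ones
    print(batch_steps(work, 2), sppm_steps(work, 2))  # 75 59
```

In this model the per-step reload overhead can cancel the gain when items are short (for `[10] + [1] * 9` on two PEs with `reload_cost=1`, both schemes take 19 steps), and the benefit grows as iteration counts become longer and more uneven. That is at least consistent with the widening SPPM advantage at higher iteration limits in the table below.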


Mandelbrot set (SPPM)


Mandelbrot set

512x512-pixel image (16 bpp), time in s by maximum number of iterations

                           1000     5000    10000
CPU time                   4.88    22.21    44.37
Kestrel time (synch)       3.65    17.18    34.79
Kestrel time (SPPM)        3.55     8.73    15.11
SPEEDUP (SPPM vs CPU)      1.37     2.54     2.94
SPEEDUP (SPPM vs synch)    1.03     1.97     2.30

CPU: 500 MHz UltraSPARC-II, 640 MB RAM, cc -xO3


2D Median filter

“Office Hours”


2D Median filter

512x512-pixel image (8 bpp), time by window size

                 5x5     7x7     9x9     11x11
CPU time        0.190   0.370   0.540   0.760
Kestrel time    0.054   0.076   0.105   0.141
SPEEDUP          3.52    4.87    5.14    5.39

CPU: 1 GHz Pentium III, 256 MB RAM, cc -O2
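The slides do not describe the Kestrel median algorithm. For reference, here is a plain sort-based k×k median filter in Python, assuming an odd window size and edge pixels replicated at the borders (both assumptions); `median_filter` is an illustrative name. Unlike the Gaussian, the 2D median is not separable, and this sketch sorts all k² window values at every pixel.

```python
def median_filter(img, k):
    """k x k median filter (k odd); borders handled by replicating edge pixels."""
    if k % 2 == 0:
        raise ValueError("window size must be odd")
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the k*k neighborhood, clamping coordinates at the borders.
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            window.sort()
            out[y][x] = window[len(window) // 2]  # middle of k*k (odd) values
    return out


if __name__ == "__main__":
    noisy = [[0] * 5 for _ in range(5)]
    noisy[2][2] = 255  # isolated "salt" pixel
    print(median_filter(noisy, 3))  # the outlier is removed: all zeros
```

The same filter leaves a clean step edge unchanged, which is why median filtering is popular for removing impulse noise without blurring edges.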


The end
