Power Point Slides in PDF - University of California, Santa Cruz
The Return of the SIMD<br />
Computers:<br />
UCSC Kestrel and Beyond<br />
Andrea Di Blas<br />
School of Engineering<br />
University of California<br />
Santa Cruz
Outline<br />
Introduction: UCSC Kestrel<br />
“Synchronous” applications<br />
“Asynchronous” applications<br />
A. Di Blas 2
INTRODUCTION<br />
In a not-too-distant past…<br />
A long time ago (late 1980s)<br />
the computing community<br />
had high hopes and<br />
expectations for a new kind of<br />
architecture, the “Single<br />
Instruction-Multiple Data”<br />
(SIMD) parallel computer.<br />
However, almost all were<br />
short-lived. Their high cost,<br />
the ever-increasing power of<br />
the evil serial CPUs and,<br />
above all, the effort required<br />
to program such an<br />
unfamiliar architecture,<br />
forced big SIMD machines into<br />
an early retirement.<br />
By the mid-90s, SIMD<br />
machines were already<br />
disappearing from the<br />
Top500, the list of the<br />
world’s largest<br />
supercomputers.<br />
But in late 1998, a small group<br />
at UC Santa Cruz finally had<br />
the first working prototype of a<br />
new kind of high-performance,<br />
low-cost SIMD co-processor.<br />
Originally designed for<br />
computational biology, it<br />
proved extremely powerful in a<br />
variety of other applications.<br />
Kestrel<br />
MIMD and SIMD<br />
MIMD: Multiple Instruction-Multiple Data<br />
SIMD: Single Instruction-Multiple Data<br />
Image Filters on Kestrel<br />
2D Gaussian filter<br />
Edge detector<br />
2D Gaussian convolution<br />
“Red Rocks Canyon”<br />
2D Gaussian convolution<br />
The 2D Gaussian kernel is separable<br />
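Separability means the 2D kernel is the outer product of a 1D kernel with itself, so one pass along rows followed by one pass along columns gives the same result as the full 2D convolution, with 2k multiplies per pixel instead of k*k. The sketch below illustrates this in plain Python; it is not Kestrel code, and the function names, the sigma value, and the edge-replicating border handling are assumptions made for the example.

```python
import math

def gaussian_1d(size, sigma):
    """Normalized 1D Gaussian taps for an odd kernel size (sigma is an assumed parameter)."""
    half = size // 2
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def clamp(i, n):
    """Replicate border pixels (an assumed border policy)."""
    return max(0, min(n - 1, i))

def convolve_2d(image, taps):
    """Direct 2D convolution with the outer-product kernel: k*k multiplies per pixel."""
    h, w, half = len(image), len(image[0]), len(taps) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    acc += (taps[dy + half] * taps[dx + half]
                            * image[clamp(y + dy, h)][clamp(x + dx, w)])
            out[y][x] = acc
    return out

def convolve_separable(image, taps):
    """Row pass followed by column pass: 2*k multiplies per pixel."""
    h, w, half = len(image), len(image[0]), len(taps) // 2
    rows = [[sum(taps[d + half] * image[y][clamp(x + d, w)]
                 for d in range(-half, half + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(taps[d + half] * rows[clamp(y + d, h)][x]
                 for d in range(-half, half + 1))
             for x in range(w)] for y in range(h)]

if __name__ == "__main__":
    # Small synthetic image; both methods should agree up to rounding.
    image = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(6)]
    taps = gaussian_1d(5, 1.0)
    full = convolve_2d(image, taps)
    sep = convolve_separable(image, taps)
    print(max(abs(full[y][x] - sep[y][x]) for y in range(6) for x in range(8)))
```

Each of the two passes applies the same short 1D filter to every pixel, which is the kind of regular, data-independent work that suits a SIMD array.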
2D Gaussian convolution<br />
512x512-pixel image (8bpp), time in s

Kernel size     5x5     7x7     9x9     11x11
CPU time        0.050   0.070   0.070   0.080
Kestrel time    0.016   0.017   0.018   0.019
Speedup         3.12    4.12    3.89    4.21

Kestrel runs at 20 MHz!
CPU: 1 GHz Pentium-III, 256 MB RAM, cc -O2
Edge detector<br />
“Big Sur”<br />
Edge detector<br />
512x512-pixel image (8bpp)

           Time [s]
CPU        0.040
Kestrel    0.018
Speedup    2.22

CPU: 1 GHz Pentium-III, 256 MB RAM, cc -O2
Asynchronous applications<br />
Mandelbrot Set<br />
2D Median filter<br />
Mandelbrot set<br />
Mandelbrot set (synchronous)<br />
“SIMD Phase Programming Model” (SPPM)<br />
Simple methodology to turn a sequential,<br />
data-dependent algorithm into a SIMD-parallel one<br />
Can be used with “partitionable” problems<br />
Provides dynamic load balancing without<br />
the need for a high-level support system<br />
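The slides do not spell out the phases, so the following is only a rough sketch of one way such a scheme could work for the Mandelbrot set, simulated in plain Python. It assumes each lane (processing element) owns a private queue of pixels and reloads a new pixel as soon as its current one finishes, instead of waiting for the slowest lane in a batch. The names `run_synchronous`, `run_phased`, `lanes`, and the step-counting cost model are all hypothetical, and the model ignores the cost of the load and store phases.

```python
def mandel_step(z, c):
    """One Mandelbrot iteration: z <- z*z + c."""
    return z * z + c

def run_synchronous(points, lanes, max_iter):
    """All lanes start a batch together and wait for the slowest pixel in it."""
    counts, steps = [0] * len(points), 0
    for start in range(0, len(points), lanes):
        batch = range(start, min(start + lanes, len(points)))
        z = {i: 0j for i in batch}
        active = set(batch)
        while active:
            steps += 1  # one broadcast instruction for the whole array
            for i in list(active):
                z[i] = mandel_step(z[i], points[i])
                counts[i] += 1
                if abs(z[i]) > 2.0 or counts[i] >= max_iter:
                    active.discard(i)  # lane idles until the batch ends
    return counts, steps

def run_phased(points, lanes, max_iter):
    """Each lane reloads from its own queue as soon as its pixel is done."""
    counts, steps = [0] * len(points), 0
    # Static partition of the pixels; lane k gets pixels k, k+lanes, ...
    queues = [list(range(lane, len(points), lanes)) for lane in range(lanes)]
    current = [None] * lanes
    z = [0j] * lanes
    while True:
        # Load phase: idle lanes fetch their next pixel, busy lanes do nothing.
        for lane in range(lanes):
            if current[lane] is None and queues[lane]:
                current[lane] = queues[lane].pop(0)
                z[lane] = 0j
        if all(c is None for c in current):
            break
        # Compute phase: every busy lane performs the same single iteration.
        steps += 1
        for lane in range(lanes):
            i = current[lane]
            if i is None:
                continue
            z[lane] = mandel_step(z[lane], points[i])
            counts[i] += 1
            # Store phase: a finished lane records its result and frees itself.
            if abs(z[lane]) > 2.0 or counts[i] >= max_iter:
                current[lane] = None
    return counts, steps

if __name__ == "__main__":
    points = [complex(-2.0 + 3.0 * x / 15, -1.5 + 3.0 * y / 15)
              for y in range(16) for x in range(16)]
    sync_counts, sync_steps = run_synchronous(points, 8, 100)
    phase_counts, phase_steps = run_phased(points, 8, 100)
    print(sync_counts == phase_counts, sync_steps, phase_steps)
```

Under this simplified model both versions compute identical iteration counts, and the phased version never needs more broadcast steps than the synchronous one, because a lane that finishes early moves on rather than idling.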
Mandelbrot set (SPPM)<br />
Mandelbrot set<br />
512x512-pixel image (16bpp), time in s

Max # of iterations        1000    5000    10000
CPU time                   4.88    22.21   44.37
Kestrel time (synch)       3.65    17.18   34.79
Kestrel time (SPPM)        3.55    8.73    15.11
Speedup (SPPM vs CPU)      1.37    2.54    2.94
Speedup (SPPM vs synch)    1.03    1.97    2.30

CPU: 500 MHz UltraSPARC-II, 640 MB RAM, cc -xO3
2D Median filter<br />
“Office Hours”<br />
2D Median filter<br />
512x512-pixel image (8bpp)

Window size     5x5     7x7     9x9     11x11
CPU time        0.190   0.370   0.540   0.760
Kestrel time    0.054   0.076   0.105   0.141
Speedup         3.52    4.97    5.14    5.39

CPU: 1 GHz Pentium-III, 256 MB RAM, cc -O2
The end<br />