Par4all: Auto-Parallelizing C and Fortran for the CUDA Architecture


CUDA generation (I)

• Launching a GPU kernel is expensive
  ◮ so we need to launch only kernels with a significant speed-up
    (launch overhead, CPU-GPU memory copy overhead...)
• Some systems use a #pragma to give go/no-go information for parallel
  execution, for example:

  #pragma omp parallel if (size > 100)
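
For concreteness, a minimal sketch of that idiom in C with OpenMP; the function, the array name, and the threshold of 100 are illustrative choices, not taken from Par4All:

/* Scale a vector in place. The OpenMP if clause makes the parallel
   region conditional: below the (arbitrary) threshold the loop runs
   sequentially, avoiding the thread-management overhead. */
void scale(double *x, long n, double a)
{
    #pragma omp parallel for if (n > 100)
    for (long i = 0; i < n; ++i)
        x[i] *= a;
}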

• ∃ a phase in PIPS to symbolically estimate the complexity of statements
  ◮ based on preconditions
  ◮ uses a SuperSparc2 model from the '90s...
  ◮ the model can be changed, but it is precise enough to give coarse
    go/no-go information (see the sketch after this list)
• To be refined: use memory-usage complexity to get information about
  memory reuse (even a big kernel can be more efficient on a CPU if
  there is good cache reuse)
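
As a rough sketch, and assuming a simple linear cost model rather than PIPS's actual one, this is the kind of guard such an estimate could drive around a generated CUDA kernel. The kernel, the work estimate, and the overhead constant below are made up for illustration; a real tool would plug in the symbolic complexity computed at compile time.

#include <cuda_runtime.h>
#include <stddef.h>

__global__ void saxpy_kernel(float a, const float *x, float *y, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] += a * x[i];
}

static void saxpy_host(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

void saxpy(float a, const float *x, float *y, size_t n)
{
    /* Illustrative cost model: estimated work (2 flops per element)
       compared against a fixed figure standing in for kernel launch
       plus the CPU<->GPU copies. The numbers are placeholders. */
    const double estimated_work = 2.0 * (double)n;
    const double overhead = 2.0e5;
    if (estimated_work < overhead) {
        saxpy_host(a, x, y, n);   /* no-go: stay on the CPU */
        return;
    }
    float *dx, *dy;               /* error checking omitted for brevity */
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, n * sizeof(float), cudaMemcpyHostToDevice);
    saxpy_kernel<<<(unsigned int)((n + 255) / 256), 256>>>(a, dx, dy, n);
    cudaMemcpy(y, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}

The same shape of guard would apply once memory-usage complexity is taken into account: the threshold would then also reflect expected cache behaviour of the host loop.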

