Par4all: Auto-Parallelizing C and Fortran for the CUDA Architecture
• CUDA generation (I)
• Launching a GPU kernel is expensive
◮ so we should launch only kernels with a significant speed-up
(launch overhead, CPU-GPU memory-copy overhead...)
• Some systems use a #pragma to give go/no-go information for
parallel execution:
#pragma omp parallel if (size > 100)
• ∃ phase in PIPS to symbolically estimate the complexity of
statements
• Based on preconditions
• Uses a SuperSparc2 timing model from the '90s...
• The model can be changed, but it is precise enough for a coarse
go/no-go decision
• To be refined: use memory-usage complexity to get information
about memory reuse (even a big kernel can be more efficient
on a CPU if cache use is good)
Par4All in CUDA — GPU conference 10/1/2009
HPC Project, Mines ParisTech, TÉLÉCOM Bretagne, RPI Ronan KERYELL et al. 23 / 46