Par4all: Auto-Parallelizing C and Fortran for the CUDA Architecture

More documents

Recommendations

Info

•Conclusion ◮ (I) • BaC99k to C99imple C99ings • Avoiding using C99umbersome old C C99onstruC99ts C99an lead to C99leaner C99ode, more effiC99ient and more parallelizable C99ode with Par4All • C99 adds C99ympaC99etiC99 features that are unfortunately not well known: ◮ MultidimenC99ional arrays with non-statiC99c size � Avoid malloc() and useless pointer C 99onstruC 99ions � You C 99an have arrays with a dynamiC 99 size in funC 99tion parameters (as in Fortran) � Avoid useless linearizing C 99omputaC 99ions (a[i][j] instead of a[i+n*j]...) � Avoid most of alloca() ◮ TLS (Thread-Local Storage) ExtenC99ion C99an express independent storage ◮ C99omplex numbers and booleans avoid further struC99tures or enums �Par4All in CUDA — GPU conference 10/1/2009 HPC Project, Mines ParisTech, TÉLÉCOM Bretagne, RPI Ronan KERYELL et al. 39 / 46
•Conclusion ◮ Why open source? • Once upon a time, a research team in Luxembourg decided to experiment program transformations with genetic algorithms... • Main idea: ◮ Represent the transformations applied to a program as a gene ◮ Make the genes to evolve ◮ Measure the performance ◮ Select and iterate • They used PIPS to implement this • Side effects ◮ Now PIPS is scriptable in Python (PYPS) ◮ Algorithm written in Python ◮ We have an automated way to try many different optimizations to generate better CUDA kernels! Stay tuned... �Par4All in CUDA — GPU conference 10/1/2009 HPC Project, Mines ParisTech, TÉLÉCOM Bretagne, RPI Ronan KERYELL et al. 40 / 46
Page 1 and 2: • ◮ Par4All: Auto-Parallelizing
Page 3 and 4: •Introduction ◮ • Parallelize
Page 5 and 6: •Par4All ◮ 1 Par4All 2 CUDA gen
Page 7 and 8: •Par4All ◮ (I) • PIPS (Interp
Page 9 and 10: •Par4All ◮ • Automatic parall
Page 11 and 12: •CUDA generation ◮ 1 Par4All 2
Page 13 and 14: •CUDA generation ◮ • Find par
Page 15 and 16: •CUDA generation ◮ (I) Parallel
Page 17 and 18: •CUDA generation ◮ (I) • Memo
Page 19 and 20: •CUDA generation ◮ (I) Conserva
Page 21 and 22: •CUDA generation ◮ • Hardware
Page 23 and 24: •CUDA generation ◮ (II) • So
Page 25 and 26: •CUDA generation ◮ • Reductio
Page 27 and 28: •CUDA generation ◮ (II) • For
Page 29 and 30: •CUDA generation ◮ (I) • CUDA
Page 31 and 32: •CUDA generation ◮ (III) 21 [..
Page 33 and 34: •Results ◮ 1 Par4All 2 CUDA gen
Page 35 and 36: •Results ◮ (I) Average time in
Page 37 and 38: •Results ◮ (I) Average time in
Page 39: •Conclusion ◮ 1 Par4All 2 CUDA
Page 43 and 44: •Conclusion ◮ (II) • We have
Page 45 and 46: •Conclusion ◮ (II) • Capitali
Page 47 and 48: •Conclusion ◮ • HPC Project

Par4all: Auto-Parallelizing C and Fortran for the CUDA Architecture

Create successful ePaper yourself

Delete template?

Save as template?