Par4all: Auto-Parallelizing C and Fortran for the CUDA Architecture
• CUDA generation (I)
• Fortran 77 parser available in PIPS
• CUDA is C/C++ with some restrictions on the GPU-executed parts
• Need a Fortran to C translator (f2c...)?
• Only one internal representation is used in PIPS
◮ Use the Fortran parser
◮ Use the C pretty-printer
• But the Fortran IO library is complex to use... and to translate
◮ If a Fortran loop nest contains IO instructions, it is not parallelized anyway because of the sequential side effects
◮ So keep the Fortran output everywhere but in the parallel CUDA kernels
◮ Apply a memory access transposition phase a(i,j) → a[j-1][i-1] inside the kernels so they can be pretty-printed as C
• We could output CUDA Fortran directly when available
Par4All in CUDA — GPU conference 10/1/2009
HPC Project, Mines ParisTech, TÉLÉCOM Bretagne, RPI Ronan KERYELL et al. 25 / 46