Tutorial: Introduction to CUDA Fortran | GTC 2013
Reductions on GPU
• Parallelism across blocks
• Parallelism within a block
• No global synchronization
– two-stage approach (two kernel launches), same code for both stages
[Figure: reduction tree within one block]
3 1 7 0 4 1 6 3
4 7 5 9
11 14
25
[Figure: two-level reduction. Each block is drawn as the same reduction tree: 3 1 7 0 4 1 6 3 → 4 7 5 9 → 11 14 → 25]
Level 0: 8 blocks
Level 1: 1 block
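The two-stage approach can be sketched in CUDA Fortran as follows. This is a minimal illustration, not the tutorial's own code: the names `reduce_m`, `blockSum`, `nThreads`, and `nBlocks` are hypothetical, the block size of 8 is chosen only to match the figure, and the kernel assumes a power-of-two block size. The in-block loop pairs elements half a block apart rather than adjacent neighbours as drawn, so the intermediate values differ from the figure while the per-block total is the same.

```fortran
module reduce_m
  use cudafor
  implicit none
  ! Hypothetical block size; must be a power of two for the loop below
  integer, parameter :: nThreads = 8
contains
  ! Each block sums its own segment of a and writes one partial result
  attributes(global) subroutine blockSum(a, partial, n)
    integer, value :: n
    integer :: a(*), partial(*)
    integer, shared :: s(nThreads)
    integer :: tid, i, stride

    tid = threadIdx%x
    i = (blockIdx%x - 1)*blockDim%x + tid

    ! Load one element per thread into shared memory (0 if out of range)
    if (i <= n) then
       s(tid) = a(i)
    else
       s(tid) = 0
    end if
    call syncthreads()

    ! Parallelism within a block: halve the active threads each step
    stride = blockDim%x/2
    do while (stride >= 1)
       if (tid <= stride) s(tid) = s(tid) + s(tid + stride)
       call syncthreads()
       stride = stride/2
    end do

    ! Thread 1 holds the block's sum
    if (tid == 1) partial(blockIdx%x) = s(1)
  end subroutine blockSum
end module reduce_m

program twoStageReduce
  use cudafor
  use reduce_m
  implicit none
  integer, parameter :: nBlocks = 8, n = nBlocks*nThreads
  integer :: a(n), total(1), k
  integer, device :: a_d(n), partial_d(nBlocks), total_d(1)

  ! Give every block the example values from the figure
  do k = 0, nBlocks - 1
     a(k*nThreads + 1:(k + 1)*nThreads) = [3, 1, 7, 0, 4, 1, 6, 3]
  end do
  a_d = a

  ! Level 0: 8 blocks, one partial sum per block
  call blockSum<<<nBlocks, nThreads>>>(a_d, partial_d, n)
  ! Level 1: 1 block reduces the 8 partial sums with the same kernel
  call blockSum<<<1, nThreads>>>(partial_d, total_d, nBlocks)

  total = total_d   ! blocking copy; also waits for both kernels
  print *, 'GPU sum:', total(1), '  CPU sum:', sum(a)
end program twoStageReduce
```

Because blocks cannot synchronize with each other inside a kernel, the end of the first launch acts as the global synchronization point: the second launch starts only after every level-0 block has written its partial sum. Assuming the file is saved with a `.cuf` extension, it should build with a CUDA Fortran compiler such as `nvfortran` (or `pgfortran` in toolchains from the time of the tutorial).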