Tutorial: Introduction to CUDA Fortran | GTC 2013
Tutorial: Introduction to CUDA Fortran | GTC 2013
Tutorial: Introduction to CUDA Fortran | GTC 2013
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Thread-Level Parallelism<br />
• Mimic high resource use<br />
– Specify enough shared memory so only one thread block can reside on<br />
a multiprocessor at a time<br />
ccall copy(b_d, a_d)<br />
No Shared Memory<br />
Shared Memory<br />
Thread Block Occupancy Bandwidth Occupancy Bandwidth<br />
32 0.25 96 0.016 8<br />
64 0.5 125 0.031 15<br />
128 1.0 136 0.063 29<br />
256 1.0 137 0.125 53<br />
512 1.0 137 0.25 91<br />
1024 1.0 133 0.5 123