19.11.2014 Views

Tutorial: Introduction to CUDA Fortran | GTC 2013


Thread-Level Parallelism

• Mimic high resource use

  – Specify enough shared memory so only one thread block can reside on a multiprocessor at a time

call copy(b_d, a_d)
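The kernel call above appears without its launch configuration. As a minimal sketch, dynamic shared memory can be requested through the third chevron argument. The kernel body, the array size `N`, the 256-thread block, and the 0.9 fraction of `sharedMemPerBlock` are all assumptions made for illustration and are not taken from the slide. Whether this really limits a multiprocessor to one resident block depends on how much shared memory the device has per multiprocessor.

```fortran
module copy_m
  implicit none
  integer, parameter :: N = 1024*1024      ! assumed problem size
contains
  ! Each thread copies one element. The kernel never touches shared
  ! memory; the launch only reserves it to limit resident blocks.
  attributes(global) subroutine copy(odata, idata)
    real :: odata(*), idata(*)
    integer :: i
    i = (blockIdx%x-1)*blockDim%x + threadIdx%x
    odata(i) = idata(i)
  end subroutine copy
end module copy_m

program limit_occupancy
  use cudafor
  use copy_m
  implicit none
  type(cudaDeviceProp) :: prop
  type(dim3) :: grid, threadBlock
  real, device :: a_d(N), b_d(N)
  real :: b(N)
  integer :: istat, smBytes

  ! Query the per-block shared memory limit of device 0.
  istat = cudaGetDeviceProperties(prop, 0)
  smBytes = int(0.9*prop%sharedMemPerBlock)   ! assumed fraction

  a_d = 1.0
  b_d = 0.0

  threadBlock = dim3(256, 1, 1)               ! assumed block size
  grid = dim3(N/threadBlock%x, 1, 1)          ! N is a multiple of 256

  ! Third chevron argument: bytes of dynamic shared memory per block.
  call copy<<<grid, threadBlock, smBytes>>>(b_d, a_d)

  b = b_d                                     ! copy result back to host
  print *, 'max error: ', maxval(abs(b - 1.0))
end program limit_occupancy
```

Changing `threadBlock` to the sizes in the table and timing the kernel with and without the third argument would reproduce the kind of comparison the slide reports.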

                  No Shared Memory          Shared Memory
Thread Block    Occupancy   Bandwidth    Occupancy   Bandwidth
    32            0.25         96          0.016         8
    64            0.5         125          0.031        15
   128            1.0         136          0.063        29
   256            1.0         137          0.125        53
   512            1.0         137          0.25         91
  1024            1.0         133          0.5         123
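The occupancy columns can be checked with simple arithmetic: occupancy is the number of resident threads divided by the maximum number of threads a multiprocessor can hold. The sketch below assumes limits of 2048 threads and 16 blocks per multiprocessor. Those limits are not stated on the slide; they are chosen because they are consistent with the table values. The function name `occupancy` is hypothetical.

```fortran
program occupancy_table
  implicit none
  ! Assumed hardware limits (not given in the slide).
  integer, parameter :: maxThreadsPerSM = 2048
  integer, parameter :: maxBlocksPerSM  = 16
  integer, parameter :: sizes(6) = [32, 64, 128, 256, 512, 1024]
  integer :: k

  ! Columns: block size, occupancy without the shared memory limit,
  ! occupancy when only one block fits on a multiprocessor.
  do k = 1, size(sizes)
     print '(i6, 2f10.4)', sizes(k), &
          occupancy(sizes(k), maxBlocksPerSM), &
          occupancy(sizes(k), 1)
  end do

contains

  ! Fraction of thread slots filled when at most blockLimit blocks
  ! of blockSize threads can be resident on one multiprocessor.
  pure real function occupancy(blockSize, blockLimit)
    integer, intent(in) :: blockSize, blockLimit
    integer :: blocks
    blocks = min(blockLimit, maxThreadsPerSM / blockSize)
    occupancy = real(blocks * blockSize) / real(maxThreadsPerSM)
  end function occupancy

end program occupancy_table
```

Under these assumptions, a 32-thread block gives 16 × 32 / 2048 = 0.25 without the shared memory limit and 32 / 2048 ≈ 0.016 with it, matching the first row of the table.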
