30.07.2015 Views

Slides - Par Lab

Slides - Par Lab

Slides - Par Lab

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Imperatives for Efficient CUDA Code ¡ Expose abundant fine-­‐grained parallelism § need 1000’s of threads for full utilization ¡ Maximize on-­‐chip work § on-­‐chip memory orders of magnitude faster ¡ Minimize execution divergence § SIMT execution of threads in 32-­‐thread warps ¡ Minimize memory divergence § warp loads and consumes complete 128-­‐byte cache line 30/73

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!