
Slides - Par Lab


Mapping CUDA to Nvidia GPUs

- CUDA is designed to be functionally forgiving
  - First priority: make things work. Second priority: get performance.
- However, to get good performance, one must understand how CUDA is mapped to Nvidia GPUs
- Threads: each thread is a SIMD vector lane
- Warps: a SIMD instruction acts on a "warp"
  - Warp width is 32 elements: the LOGICAL SIMD width
- Thread blocks: each thread block is scheduled onto an SM (streaming multiprocessor)
  - Peak efficiency requires multiple thread blocks per SM
