Tutorial CUDA
Example:<br />
Square Matrix Multiplication<br />
C = A · B of size N x N<br />
Without blocking:<br />
© NVIDIA Corporation 2008<br />
One thread handles one element of C<br />
A and B are loaded N times from global memory<br />
Wastes bandwidth<br />
Poor balance of work to bandwidth<br />
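The points above can be sketched as a kernel. This is a minimal illustration, not the code from the original slides: the names `matMulNaive` and `matMul`, the row-major layout, and the 16 x 16 thread block are assumptions made for the example, and error checking is omitted for brevity. Each thread computes one element of C and issues two global loads per multiply-add, so the N x N threads together perform about 2N^3 global loads and every element of A and B is fetched N times.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Non-blocked version: one thread computes one element C[row][col].
// Every operand is read straight from global memory (row-major layout assumed).
__global__ void matMulNaive(const float* A, const float* B, float* C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N || col >= N) return;  // guard threads outside the matrix

    float sum = 0.0f;
    for (int k = 0; k < N; ++k) {
        // Two global loads for each multiply-add.
        sum += A[row * N + k] * B[k * N + col];
    }
    C[row * N + col] = sum;
}

// Hypothetical host helper: copies A and B to the device, launches the
// kernel, and returns C. Error checking is omitted for brevity.
std::vector<float> matMul(const std::vector<float>& A,
                          const std::vector<float>& B, int N)
{
    size_t bytes = static_cast<size_t>(N) * N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block shape, not taken from the slides
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matMulNaive<<<grid, block>>>(dA, dB, dC, N);

    std::vector<float> C(static_cast<size_t>(N) * N);
    // This blocking copy waits for the kernel to finish first.
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Calling `matMul(A, B, N)` with two row-major N x N arrays returns their product. The inner loop shows the imbalance the slide describes: one multiply-add is paid for with two global memory reads, and nothing is reused between threads.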
[Figure: matrices A, B, and C, each of size N x N]