Tutorial CUDA
Example: Square Matrix Multiplication
C = A · B, where A, B, and C are each of size N x N
With blocking:
One thread block handles one M x M sub-matrix C_sub of C
Each element of A and B is loaded only N / M times from global memory, instead of N times without blocking
Much less global memory bandwidth is needed
Much better balance of work to bandwidth
© NVIDIA Corporation 2008
[Figure: the N x N matrices A, B, and C, each divided into M x M tiles, with the sub-matrix C_sub marked in C.]
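The load count claimed on the slide can be checked with a small CPU-side sketch. This is not the tutorial's kernel: it is plain C++ that only mimics the blocking scheme, with two local buffers standing in for shared memory and a counter standing in for global memory traffic. The names `blocked_matmul`, `BlockedResult`, and `global_loads` are hypothetical, and the sketch assumes row-major storage and that N is a multiple of M.

```cpp
#include <cstddef>
#include <vector>

// Product matrix plus the number of element reads from the "global" arrays
// A and B (an assumed stand-in for global memory traffic on the GPU).
struct BlockedResult {
    std::vector<float> c;
    std::size_t global_loads;
};

// Hypothetical helper: multiplies two n x n row-major matrices using m x m tiles.
// Assumes n is a multiple of m.
BlockedResult blocked_matmul(const std::vector<float>& a,
                             const std::vector<float>& b,
                             std::size_t n, std::size_t m) {
    BlockedResult r{std::vector<float>(n * n, 0.0f), 0};
    // These two buffers play the role of per-block shared memory.
    std::vector<float> a_tile(m * m), b_tile(m * m);

    // Each (by, bx) pair corresponds to one thread block, i.e. one tile C_sub.
    for (std::size_t by = 0; by < n; by += m) {
        for (std::size_t bx = 0; bx < n; bx += m) {
            // Walk along the tile row of A and the tile column of B.
            for (std::size_t k0 = 0; k0 < n; k0 += m) {
                // Stage one tile of A and one tile of B.
                // Every element is read from "global" memory exactly once here.
                for (std::size_t i = 0; i < m; ++i) {
                    for (std::size_t j = 0; j < m; ++j) {
                        a_tile[i * m + j] = a[(by + i) * n + (k0 + j)];
                        b_tile[i * m + j] = b[(k0 + i) * n + (bx + j)];
                        r.global_loads += 2;
                    }
                }
                // Accumulate partial products using only the staged tiles.
                for (std::size_t i = 0; i < m; ++i) {
                    for (std::size_t j = 0; j < m; ++j) {
                        float sum = 0.0f;
                        for (std::size_t k = 0; k < m; ++k) {
                            sum += a_tile[i * m + k] * b_tile[k * m + j];
                        }
                        r.c[(by + i) * n + (bx + j)] += sum;
                    }
                }
            }
        }
    }
    return r;
}
```

In this sketch there are (N / M)^2 tiles of C, and each one stages N / M tile pairs of M x M elements, so the total number of reads is 2 · N^3 / M. Spread over the 2 · N^2 elements of A and B, that is N / M reads per element. Calling the helper with m = 1 reproduces the unblocked case, where each element is read N times.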