DirectCompute Optimizations and Best Practices - Nvidia
Matrix Multiplication (cont.)
Optimization                                                                  GeForce GTX 280   GeForce GTX 8800
No optimization                                                               8.8 GBps          0.7 GBps
Coalesced using local memory to store a tile of A                             14.3 GBps         8.2 GBps
Using thread group shared memory to eliminate redundant reads of a tile of B  29.7 GBps         15.7 GBps
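The last row's idea, staging tiles in thread group shared memory so each element is fetched from global memory once per group instead of once per thread, can be illustrated with a small CPU simulation. This is a sketch under stated assumptions, not the kernel behind the measurements above: it uses the textbook scheme where both A and B tiles are staged in shared memory (the slide's intermediate step keeps the A tile in per-thread local memory instead). It assumes square n x n matrices with n divisible by the tile size. The function names `matmul_naive` and `matmul_tiled` are hypothetical, and "global reads" here simply counts element loads from A and B.

```python
# Hypothetical CPU simulation of thread-group tiling for C = A x B.
# Assumptions: square n x n matrices, n divisible by `tile`; one "thread group"
# computes one tile x tile block of C; reads from the staged "shared" buffers are
# not counted, only loads from A and B ("global memory").

def matmul_naive(A, B):
    """Each output element reads a full row of A and a full column of B."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    reads = 0
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += A[i][k] * B[k][j]
                reads += 2  # one element of A and one element of B
            C[i][j] = acc
    return C, reads


def matmul_tiled(A, B, tile):
    """Each thread group stages tile x tile blocks of A and B in 'shared' buffers."""
    n = len(A)
    assert n % tile == 0, "this sketch assumes n is a multiple of tile"
    C = [[0.0] * n for _ in range(n)]
    reads = 0
    for gi in range(0, n, tile):          # row offset of this thread group's block
        for gj in range(0, n, tile):      # column offset of this thread group's block
            acc = [[0.0] * tile for _ in range(tile)]  # one accumulator per "thread"
            for kk in range(0, n, tile):
                # Cooperative load: each tile element is read from global memory
                # once per group, then reused by every thread in the group.
                sA = [[A[gi + r][kk + c] for c in range(tile)] for r in range(tile)]
                sB = [[B[kk + r][gj + c] for c in range(tile)] for r in range(tile)]
                reads += 2 * tile * tile
                # In an HLSL compute shader, GroupMemoryBarrierWithGroupSync()
                # calls would separate this load phase from the compute phase.
                for r in range(tile):
                    for c in range(tile):
                        for k in range(tile):
                            acc[r][c] += sA[r][k] * sB[k][c]
            for r in range(tile):
                for c in range(tile):
                    C[gi + r][gj + c] = acc[r][c]
    return C, reads


if __name__ == "__main__":
    n, tile = 8, 4
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i * j % 5) for j in range(n)] for i in range(n)]
    C1, r1 = matmul_naive(A, B)
    C2, r2 = matmul_tiled(A, B, tile)
    print(C1 == C2, r1, r2)  # prints: True 1024 256
```

Both versions produce the same product, but the naive version performs 2n^3 global reads while the tiled version performs 2n^3 / tile, so in this model a 4 x 4 tile cuts global traffic by a factor of 4. How much of that shows up as real throughput on a given GPU also depends on coalescing and occupancy, which this simulation does not model.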