DirectCompute Optimizations and Best Practices - Nvidia
Performance for 4M element reduction

Shader                                                     Time (2^22 ints)  Bandwidth    Step Speedup  Cumulative Speedup
Shader 1: interleaved addressing with divergent branching  8.054 ms          2.083 GB/s
Shader 2: interleaved addressing with bank conflicts       3.456 ms          4.854 GB/s   2.33x         2.33x
Shader 3: sequential addressing                            1.722 ms          9.741 GB/s   2.01x         4.68x
Shader 4: first add during global load                     0.965 ms          17.377 GB/s  1.78x         8.34x
Shader 5: unroll last warp                                 0.536 ms          31.289 GB/s  1.8x          15.01x
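The bandwidth and speedup figures above can be re-derived from the reported times. The following is a minimal Python sketch of that arithmetic, not part of the original slides. It assumes each int is 4 bytes, that bandwidth counts only the bytes of the input array read once, and that 1 GB means 10^9 bytes. The names `N_ELEMENTS`, `BYTES_PER_INT`, `bandwidth_gb_per_s`, and `speedups` are hypothetical helpers introduced here for illustration. Because the slide's times are rounded to three decimals, the recomputed values agree with the slide only to within a small rounding error.

```python
# Assumptions (not stated on the slide): 4-byte ints, input read once, 1 GB = 1e9 bytes.
N_ELEMENTS = 2 ** 22      # "2^22 ints" from the slide (about 4M elements)
BYTES_PER_INT = 4         # assumed element size

# Reported times in milliseconds, copied from the slide.
TIMES_MS = [
    ("Shader 1", 8.054),
    ("Shader 2", 3.456),
    ("Shader 3", 1.722),
    ("Shader 4", 0.965),
    ("Shader 5", 0.536),
]


def bandwidth_gb_per_s(time_ms):
    """Effective bandwidth: input bytes divided by elapsed time, in GB/s."""
    total_bytes = N_ELEMENTS * BYTES_PER_INT
    return total_bytes / (time_ms * 1e-3) / 1e9


def speedups(times_ms):
    """Return (name, step speedup, cumulative speedup) for every shader after the first."""
    baseline = times_ms[0][1]
    rows = []
    for (_, prev), (name, cur) in zip(times_ms, times_ms[1:]):
        rows.append((name, prev / cur, baseline / cur))
    return rows


if __name__ == "__main__":
    for name, t in TIMES_MS:
        print(f"{name}: {t:.3f} ms, {bandwidth_gb_per_s(t):.3f} GB/s")
    for name, step, cumulative in speedups(TIMES_MS):
        print(f"{name}: step {step:.2f}x, cumulative {cumulative:.2f}x")
```

Each step speedup is the previous shader's time divided by the current one, and the cumulative speedup is Shader 1's time divided by the current one, so the cumulative column is the running product of the step column.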