26.12.2014

DirectCompute Optimizations and Best Practices - Nvidia

Performance for 4M element reduction

| Shader | Technique | Time (2^22 ints) | Bandwidth | Step speedup | Cumulative speedup |
|---|---|---|---|---|---|
| Shader 1 | interleaved addressing with divergent branching | 8.054 ms | 2.083 GB/s | | |
| Shader 2 | interleaved addressing with bank conflicts | 3.456 ms | 4.854 GB/s | 2.33x | 2.33x |
| Shader 3 | sequential addressing | 1.722 ms | 9.741 GB/s | 2.01x | 4.68x |
| Shader 4 | first add during global load | 0.965 ms | 17.377 GB/s | 1.78x | 8.34x |
| Shader 5 | unroll last warp | 0.536 ms | 31.289 GB/s | 1.8x | 15.01x |
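The Bandwidth and speedup columns can be reproduced from the measured times alone. The Python sketch below assumes two things the slide does not state: each shader reads the 2^22 input ints once as 4-byte values, and GB means 10^9 bytes. Under those assumptions the recomputed bandwidths agree with the table to within 0.1%, and the cumulative speedups agree to within 0.2%.

```python
N = 2**22         # elements reduced (from the table header)
BYTES = N * 4     # assumption: 4-byte ints, each read exactly once

# Measured times in milliseconds, copied from the table (Shaders 1-5).
TIMES_MS = [8.054, 3.456, 1.722, 0.965, 0.536]

def bandwidth_gb_s(ms):
    """Effective read bandwidth in GB/s, taking 1 GB = 10**9 bytes (assumption)."""
    return BYTES / (ms * 1e-3) / 1e9

def speedups(times):
    """Step speedup (vs. previous shader) and cumulative speedup (vs. Shader 1)."""
    step = [times[i - 1] / times[i] for i in range(1, len(times))]
    cumulative = [times[0] / times[i] for i in range(1, len(times))]
    return step, cumulative

if __name__ == "__main__":
    step, cumulative = speedups(TIMES_MS)
    for i, ms in enumerate(TIMES_MS):
        row = f"Shader {i + 1}: {ms:.3f} ms  {bandwidth_gb_s(ms):.3f} GB/s"
        if i > 0:
            row += f"  {step[i - 1]:.2f}x  {cumulative[i - 1]:.2f}x"
        print(row)
```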
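Shaders 1 to 3 differ in which threads are active at each step of the shared-memory tree and which addresses those threads touch. The following Python sketch emulates a single thread group. For each stride `s`, it counts the warps whose branch diverges and the worst bank conflict on the `sdata[index]` access. The `sdata[index + s]` operand follows the same stride, so it is not counted separately. This is an emulation, not the slides' HLSL. Several details are assumptions: the 256-thread group, 32-thread warps, and 32 banks of 4-byte words. The loop structure is taken from the well-known CUDA parallel-reduction kernels that these labels describe. The function names are hypothetical.

```python
# Emulate one thread group's shared-memory reduction under the three
# addressing schemes of Shaders 1-3. Assumptions (not from the slides):
# 256 threads per group, 32-thread warps, 32 banks of 4-byte words.
GROUP = 256
WARP = 32
BANKS = 32

def interleaved_divergent(s):
    """Shader 1: thread tid is active if tid % (2*s) == 0 and updates sdata[tid]."""
    return {tid: tid for tid in range(GROUP) if tid % (2 * s) == 0}

def interleaved_strided(s):
    """Shader 2: thread tid updates sdata[2*s*tid] while that index is in range."""
    return {tid: 2 * s * tid for tid in range(GROUP) if 2 * s * tid < GROUP}

def sequential(s):
    """Shader 3: thread tid updates sdata[tid] while tid < s."""
    return {tid: tid for tid in range(GROUP) if tid < s}

def warps():
    """Yield the thread ids belonging to each warp of the group."""
    for w in range(GROUP // WARP):
        yield range(w * WARP, (w + 1) * WARP)

def divergent_warps(active):
    """Count warps in which some, but not all, threads take the branch."""
    return sum(1 for lanes in warps() if 0 < sum(t in active for t in lanes) < WARP)

def worst_conflict(active):
    """Most distinct sdata addresses any one warp sends to a single bank."""
    worst = 1
    for lanes in warps():
        per_bank = {}
        for t in lanes:
            if t in active:
                per_bank.setdefault(active[t] % BANKS, set()).add(active[t])
        worst = max([worst] + [len(addrs) for addrs in per_bank.values()])
    return worst

def strides(ascending):
    """Strides 1, 2, ..., GROUP/2 in the loop order of the given scheme."""
    s_values = [1 << k for k in range(GROUP.bit_length() - 1)]
    return s_values if ascending else s_values[::-1]

if __name__ == "__main__":
    for name, scheme, ascending in [
        ("Shader 1", interleaved_divergent, True),
        ("Shader 2", interleaved_strided, True),
        ("Shader 3", sequential, False),
    ]:
        print(name)
        for s in strides(ascending):
            active = scheme(s)
            print(f"  s={s:3d}: divergent warps={divergent_warps(active)}, "
                  f"worst conflict={worst_conflict(active)}-way")
```

Under these assumptions, the results line up with the table's labels:

- Shader 1 diverges in up to 8 warps but has no bank conflicts.
- Shader 2 diverges in at most one warp, but reaches 8-way conflicts.
- Shader 3 is conflict-free and diverges only once fewer than 32 threads remain active.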
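Shaders 4 and 5 keep sequential addressing but change how the work is split:

- **First add during global load:** each thread adds two input elements as it loads them, which halves the number of groups needed.
- **Unroll last warp:** the final six steps, where only one warp is still active, run without barriers or the `tid < s` test.

The following Python sketch emulates that schedule on the CPU and checks that it still produces the correct sum. Several details are assumptions for illustration: the group size of 256, the 32-thread warp, zero padding of the last chunk, and the hypothetical names `reduce_group` and `reduce_all`. The unrolled loop models lockstep execution within a warp by performing all reads before any writes. That lockstep behavior is a hardware property the technique relies on; this emulation cannot demonstrate it.

```python
# Emulate Shaders 4 and 5 on the CPU. Assumptions (not from the slides):
# 256 threads per group, 32-thread warps, zero padding of the last chunk.
GROUP = 256
WARP = 32

def reduce_group(chunk):
    """Reduce 2*GROUP inputs to one partial sum, as one thread group would."""
    # Shader 4: first add during global load; thread tid sums two inputs.
    sdata = [chunk[tid] + chunk[tid + GROUP] for tid in range(GROUP)]
    # Sequential addressing with a group-wide barrier after each step,
    # until only one warp's worth of active threads would remain.
    s = GROUP // 2
    while s > WARP:
        for tid in range(s):
            sdata[tid] += sdata[tid + s]
        s //= 2
    # Shader 5: last warp unrolled. All WARP lanes run every step, with no
    # tid < s test and no barrier; reads precede writes to model lockstep.
    s = WARP
    while s > 0:
        reads = [sdata[tid + s] for tid in range(WARP)]
        for tid in range(WARP):
            sdata[tid] += reads[tid]
        s //= 2
    return sdata[0]

def reduce_all(data):
    """Run group reductions pass after pass until a single value remains."""
    span = 2 * GROUP
    data = list(data)
    while len(data) > 1:
        # Zero is the identity for +, so padding does not change the sum.
        data += [0] * (-len(data) % span)
        data = [reduce_group(data[i:i + span]) for i in range(0, len(data), span)]
    return data[0]

if __name__ == "__main__":
    values = list(range(1, 10001))
    print(reduce_all(values), sum(values))
```

For example, `reduce_all(list(range(1, 10001)))` returns 50005000, matching `sum`. The lanes with `tid >= s` in the unrolled warp do redundant additions, but their results never feed into `sdata[0]`. That is why dropping the `tid < s` test does not change the answer.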
