26.12.2014 Views

DirectCompute Optimizations and Best Practices - Nvidia

DirectCompute Optimizations and Best Practices - Nvidia

DirectCompute Optimizations and Best Practices - Nvidia

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Complete Unrolling<br />

• Assuming we know the number of iterations at compile<br />

time, we could completely unroll the reduction<br />

— Luckily, the block size is limited by the GPU to 512 threads<br />

— Also, we are sticking to power-of-2 block sizes<br />

• So we can easily unroll for a fixed block size

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!