26.12.2014 Views

DirectCompute Optimizations and Best Practices - Nvidia

Problem: Global Synchronization

• If we could synchronize across all thread groups, we could run reduce on a very large array
— A global sync after each group produces its result
— Once all groups reach the sync, continue recursively

• But GPUs have no global synchronization. Why?
— Expensive to build in hardware for GPUs with a high processor count
— Would force the programmer to run fewer groups (no more than # multiprocessors × # resident groups per multiprocessor) to avoid deadlock, which may reduce overall efficiency
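
To make the deadlock limit concrete: a group waiting at a global barrier keeps occupying its slot on a multiprocessor, so any group that has not yet been scheduled can never start and the barrier never completes. All groups would therefore have to be resident at once. The figures below are hypothetical and chosen only for illustration; $N_{\mathrm{MP}}$ (multiprocessor count), $R$ (resident groups per multiprocessor), and $G_{\max}$ are names introduced here, not taken from the slide.

```latex
G_{\max} = N_{\mathrm{MP}} \times R
\qquad \text{e.g. } N_{\mathrm{MP}} = 16,\; R = 8 \;\Rightarrow\; G_{\max} = 128

% An array of 2^{22} elements reduced with 256 elements per group
% would need 2^{22} / 2^{8} = 2^{14} = 16384 groups, far more than 128.
```

Under these assumed numbers, each group would have to process many more elements than it otherwise would, which is the kind of efficiency loss the bullet refers to.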

• Solution: decompose into multiple shader dispatches
— A Dispatch() call serves as a global synchronization point
— Dispatch() has negligible HW overhead, low SW overhead
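
The multi-dispatch idea can be sketched on the CPU. The C++ below is a minimal model, not DirectCompute code: `reducePass` and `reduceMultiPass` are hypothetical names, and each call to `reducePass` stands in for one compute-shader launch (in Direct3D 11 that launch would be an `ID3D11DeviceContext::Dispatch` call). The point it illustrates is that a pass only begins once the previous pass has written all of its partial results, so the boundary between passes plays the role of the global sync.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Models ONE dispatch: every "thread group" sums up to groupSize
// consecutive inputs and writes a single partial result.
// Hypothetical helper; on the GPU the groups would run in parallel.
std::vector<long long> reducePass(const std::vector<long long>& in,
                                  std::size_t groupSize) {
    const std::size_t groups = (in.size() + groupSize - 1) / groupSize;
    std::vector<long long> out(groups, 0);
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t begin = g * groupSize;
        const std::size_t end = std::min(begin + groupSize, in.size());
        for (std::size_t i = begin; i < end; ++i) {
            out[g] += in[i];
        }
    }
    return out;
}

// Host-side loop: repeat passes until one value is left.
// Returning from reducePass is the stand-in for the global
// synchronization point between dispatches.
long long reduceMultiPass(std::vector<long long> data,
                          std::size_t groupSize,
                          int* passes = nullptr) {
    if (groupSize < 2) {
        // With groupSize < 2 the array would never shrink.
        throw std::invalid_argument("groupSize must be at least 2");
    }
    int count = 0;
    while (data.size() > 1) {
        data = reducePass(data, groupSize);
        ++count;
    }
    if (passes != nullptr) {
        *passes = count;
    }
    return data.empty() ? 0 : data[0];
}
```

For example, reducing 1000 elements with a group size of 256 takes two passes in this model: the first produces 4 partial sums and the second combines them into one. In general the pass count grows only logarithmically with the array size, which is why low per-dispatch overhead makes this decomposition practical.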
