11.07.2015 Views

Memory Bandwidth Limited Kernels - GPU Technology Conference

Memory Bandwidth Limited Kernels - GPU Technology Conference

Memory Bandwidth Limited Kernels - GPU Technology Conference

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Launch Configuration• Conclusion: Have enough loads in flight to saturate the bus,or handle multiple elements per thread with independentloads, same performance. Typically, you’d like 512+ threadsper SM• Independent loads and stores from the same thread• Loads and stores from different threads• Larger word sizes can also help (float2 is twice the transactions of float,for example)• For more details:• Vasily Volkov’s GTC2010 talk “Better Performance at Lower Occupancy”© NVIDIA Corporation 2011

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!