GPU Performance Analysis and Optimization - GPU Technology ...
Case Study 4: Results
• Original code (Array of Structures):
  – Time: 22.8 ms
  – 24.5 transactions per load, 6 transactions per store
• Code using read-only loads:
  – Time: 15.7 ms (1.45x speedup over original)
• Code with Structure of Arrays data layout:
  – Time: 9.3 ms (2.45x speedup over original)
  – Successive threads access successive words
    • 3 transactions per load request
      – Due to offset halo reads; addressed with non-caching loads: 8.9 ms (2.56x speedup)
    • 2 transactions per store request
© 2012, NVIDIA
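To make the transaction counts above more concrete, here is a minimal host-side sketch of why the data layout matters. It is a simplified model, not the profiler's actual accounting: it assumes a 32-thread warp, 4-byte words, and 128-byte memory segments, and it counts how many distinct segments one warp-wide load touches. The function name `segments_touched`, its parameters, and the 64-byte structure size used in the note below are hypothetical and are not taken from the case study.

```cpp
#include <cstddef>
#include <set>

// Simplified model (assumption): one warp = 32 threads, each loading one
// 4-byte word; memory is fetched in aligned 128-byte segments. Real hardware
// behaviour depends on the GPU generation and cache configuration.
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kSegmentBytes = 128;
constexpr std::size_t kWordBytes = 4;

// stride_bytes:      distance between addresses read by consecutive threads
//                    (4 for Structure of Arrays, sizeof(struct) for Array of
//                    Structures).
// base_offset_bytes: offset of thread 0's address from a segment boundary,
//                    e.g. a halo read shifted by one word.
std::size_t segments_touched(std::size_t stride_bytes,
                             std::size_t base_offset_bytes = 0) {
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        const std::size_t first = base_offset_bytes + lane * stride_bytes;
        const std::size_t last = first + kWordBytes - 1;
        // A word may straddle a segment boundary, so record both ends.
        segments.insert(first / kSegmentBytes);
        segments.insert(last / kSegmentBytes);
    }
    return segments.size();
}
```

Under this model, `segments_touched(4)` gives 1 (successive threads reading successive words from an aligned base), `segments_touched(4, 4)` gives 2 (the same access shifted by one word, as an offset halo read would be), and `segments_touched(64)` gives 16 (one field read from a hypothetical 64-byte structure per thread). The measured figures on the slide come from the real kernel and will not match a toy model exactly, but the trend is the same: the closer successive threads are to reading successive words, the fewer transactions each request needs.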