GPU Performance Analysis and Optimization - GPU Technology ...
Case Study 4: Diagnosing
• Double-precision code, so the ideal ratio is 2.0 transactions per request for both loads and stores
• Measured values:
  – 24.5 L1 lines per load
  – 73% L1 hit rate
  – 6.0 transactions per store
  – Throughputs:
    • 23% of DRAM bandwidth
    • 13% of instruction bandwidth
• Conclusion
  – Performance is latency-limited
    • Both throughputs are small percentages of the theoretical peak
    • Recall that frequent reissues of memory instructions increase latency
  – The address pattern wastes bandwidth:
    • Transactions per request are much higher than 2.0
    • Even with a 73% hit rate, (1 - 0.73) × 24.5 ≈ 6.6 L1 load misses per request
© 2012, NVIDIA
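The slide's ratios can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 32-thread warps and 128-byte L1 cache lines (neither is stated on the slide); the helper names ideal_lines_per_request and misses_per_request are hypothetical. It shows where the ideal 2.0 comes from (32 threads × 8 bytes = 256 bytes = two 128-byte lines) and how the ~6.6 misses-per-request figure follows from the measured 24.5 lines and 73% hit rate.

```python
# Sketch of the Case Study 4 arithmetic.
# Assumptions (not stated on the slide): 32 threads per warp,
# 128-byte L1 cache lines. Doubles are 8 bytes.

WARP_SIZE = 32          # threads per warp (assumed)
BYTES_PER_DOUBLE = 8    # size of one double-precision value
L1_LINE_BYTES = 128     # L1 cache line size (assumed)


def ideal_lines_per_request(bytes_per_thread: int) -> float:
    """Fewest L1 lines a warp touches when its threads read consecutive, aligned elements."""
    return WARP_SIZE * bytes_per_thread / L1_LINE_BYTES


def misses_per_request(lines_per_request: float, hit_rate: float) -> float:
    """Expected L1 misses per load request: the lines that do not hit in L1."""
    return (1.0 - hit_rate) * lines_per_request


ideal = ideal_lines_per_request(BYTES_PER_DOUBLE)

# Measured values from the slide.
measured_load_lines = 24.5
measured_hit_rate = 0.73
measured_store_transactions = 6.0

misses = misses_per_request(measured_load_lines, measured_hit_rate)
load_ratio = measured_load_lines / ideal
store_ratio = measured_store_transactions / ideal

print(f"ideal lines per request: {ideal:.1f}")
print(f"loads: {load_ratio:.2f}x ideal, stores: {store_ratio:.1f}x ideal")
print(f"L1 load misses per request: {misses:.1f}")
```

With these inputs the sketch reports loads at 12.25× and stores at 3× the ideal transaction count, consistent with the slide's conclusion that the address pattern wastes bandwidth.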