GPU Performance Analysis and Optimization - GPU Technology ...

Case Study 4: Diagnosing
• Double-precision code, so the ideal ratio is 2.0 for both loads and stores (a fully coalesced warp request covers 32 threads × 8 bytes = 256 bytes, i.e. two 128-byte lines)
• Measured values:
  – 24.5 L1 lines per load
  – 73% L1 hit rate
  – 6.0 transactions per store
  – Throughputs:
    • 23% of DRAM bandwidth
    • 13% of instruction bandwidth
• Conclusion
  – Performance is latency-limited
    • Both throughputs are small percentages of the theoretical peak
    • Recall that high reissue rates of memory instructions increase latency
  – The address pattern wastes bandwidth:
    • Transactions per request are much higher than the ideal 2.0
    • Even with a 73% hit rate, (1 - 0.73) × 24.5 ≈ 6.6 L1 load misses per request
© 2012, NVIDIA
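The arithmetic behind this diagnosis can be reproduced in a few lines. This is a minimal sketch, assuming 32-thread warps, 128-byte L1 lines, 8-byte doubles, and arrays aligned to a line boundary; the helper names (`ideal_lines_per_request`, `l1_misses_per_request`, `excess_over_ideal`, `lines_touched`) are hypothetical, and the strided access in `lines_touched` is an illustrative pattern, not the actual kernel from the case study.

```python
# Back-of-the-envelope checks for the Case Study 4 numbers.
# Assumptions (not from the slide): 32-thread warps, 128-byte L1 lines,
# 8-byte (double) elements, arrays starting on a line boundary.

WARP_SIZE = 32
L1_LINE_BYTES = 128
DOUBLE_BYTES = 8


def ideal_lines_per_request(word_bytes: int = DOUBLE_BYTES,
                            warp_size: int = WARP_SIZE,
                            line_bytes: int = L1_LINE_BYTES) -> float:
    """Lines touched by a fully coalesced, aligned warp request."""
    return warp_size * word_bytes / line_bytes


def l1_misses_per_request(lines_per_load: float, hit_rate: float) -> float:
    """Estimated L1 misses per request: the lines not served by L1."""
    return (1.0 - hit_rate) * lines_per_load


def excess_over_ideal(measured: float, ideal: float = 2.0) -> float:
    """How many times more transactions than ideal a request generates."""
    return measured / ideal


def lines_touched(stride_elems: int, word_bytes: int = DOUBLE_BYTES,
                  warp_size: int = WARP_SIZE,
                  line_bytes: int = L1_LINE_BYTES) -> int:
    """Distinct L1 lines touched when thread t reads element
    t * stride_elems (hypothetical strided access pattern)."""
    addrs = (t * stride_elems * word_bytes for t in range(warp_size))
    return len({a // line_bytes for a in addrs})


if __name__ == "__main__":
    print(f"ideal lines/request: {ideal_lines_per_request():.1f}")
    print(f"L1 misses/request: {l1_misses_per_request(24.5, 0.73):.2f}")
    print(f"load excess: {excess_over_ideal(24.5):.2f}x, "
          f"store excess: {excess_over_ideal(6.0):.2f}x")
    for stride in (1, 2, 4, 8, 16):
        print(f"stride {stride:2d}: {lines_touched(stride)} lines")
```

Under these assumptions, doubling the stride doubles the lines touched per request until every thread lands in its own line (32 lines per request). This is one way a warp's address pattern can push lines per load far above the ideal 2.0, as in the measured 24.5.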