12.07.2015 Views

GPU Performance Analysis and Optimization - GPU Technology ...


Case Study 6: Kepler 8-byte SMEM Access

  • TTI Reverse Time Migration
      – A seismic processing code, 3DFD
          • The fundamental component is applying a 3D stencil to 2 wavefields to compute discrete derivatives
      – Natural to interleave the wavefields in shared memory:
          • Store as a float2 structure
          • Also a slight benefit to global memory performance, on both Fermi and Kepler
  • Impact on performance from enabling 8-byte mode:
      – More SMEM operations as order in space increases
      – 8th order in space:
          • 2 kernels, only one uses shared memory
          • 1.14x full code speedup (1.18x kernel speedup)
      – 16th order in space:
          • 3 kernels, only one uses shared memory
          • 1.20x full code speedup (1.29x kernel speedup)

© 2012, NVIDIA
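The interleaving idea can be sketched in a few lines of CUDA. This is a minimal illustration, not the RTM code from the case study: it uses a 1D stencil instead of the 3D one, and the kernel name `stencil_two_fields`, the helper `max_interior_error`, the block size, and the test data are all hypothetical. The coefficients are the standard 8th-order central-difference weights for a second derivative with unit grid spacing. The only piece taken from the slide is the technique itself: store both wavefields as one `float2` per grid point in shared memory and request 8-byte banks with `cudaDeviceSetSharedMemConfig`. That call only changes behavior on devices with configurable bank size (Kepler-class) and is marked deprecated in recent CUDA toolkits. Error checking is omitted for brevity.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int RADIUS = 4;    // 8th order in space: 4 neighbors per side
constexpr int BLOCK  = 128;  // threads per block; the launch must use this size

// Stencil weights, filled from the host (assumed: second derivative, h = 1).
__constant__ float c_coef[RADIUS + 1];

__device__ __forceinline__ int clamp_index(int i, int n)
{
    return min(max(i, 0), n - 1);
}

// Hypothetical 1D kernel: .x and .y hold the two interleaved wavefields,
// so each shared memory access moves 8 bytes per thread.
__global__ void stencil_two_fields(const float2* __restrict__ in,
                                   float2* __restrict__ out, int n)
{
    __shared__ float2 tile[BLOCK + 2 * RADIUS];
    const int g = blockIdx.x * BLOCK + threadIdx.x;  // global index
    const int s = threadIdx.x + RADIUS;              // index inside the tile

    // Every thread loads its own point; the first RADIUS threads also load
    // the left and right halos. Out-of-range indices are clamped.
    tile[s] = in[clamp_index(g, n)];
    if (threadIdx.x < RADIUS) {
        tile[s - RADIUS] = in[clamp_index(g - RADIUS, n)];
        tile[s + BLOCK]  = in[clamp_index(g + BLOCK, n)];
    }
    __syncthreads();

    if (g >= n) return;
    float2 acc = make_float2(c_coef[0] * tile[s].x, c_coef[0] * tile[s].y);
    for (int r = 1; r <= RADIUS; ++r) {
        const float2 a = tile[s - r];
        const float2 b = tile[s + r];
        acc.x += c_coef[r] * (a.x + b.x);
        acc.y += c_coef[r] * (a.y + b.y);
    }
    out[g] = acc;
}

// Runs the kernel on x = i*i and y = 1 and returns the largest deviation of
// interior results from the exact second derivatives (2 and 0).
float max_interior_error(int n)
{
    const float h_coef[RADIUS + 1] = { -205.0f / 72.0f, 8.0f / 5.0f,
                                       -1.0f / 5.0f, 8.0f / 315.0f,
                                       -1.0f / 560.0f };
    cudaMemcpyToSymbol(c_coef, h_coef, sizeof(h_coef));

    // Request 8-byte shared memory banks before launching the kernel.
    cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);

    std::vector<float2> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = make_float2(float(i) * float(i), 1.0f);

    float2 *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float2));
    cudaMalloc(&d_out, n * sizeof(float2));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float2), cudaMemcpyHostToDevice);

    stencil_two_fields<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float2), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float err = 0.0f;
    for (int i = RADIUS; i < n - RADIUS; ++i) {
        err = fmaxf(err, fabsf(h_out[i].x - 2.0f));
        err = fmaxf(err, fabsf(h_out[i].y));
    }
    return err;
}

int main()
{
    printf("max interior error: %g\n", max_interior_error(256));
    return 0;
}
```

With the default 4-byte banks on Kepler, a warp reading consecutive `float2` elements needs two passes through shared memory; with 8-byte banks the same access can be served in one, which is consistent with the kernel speedups growing as the stencil order (and so the number of shared memory reads) increases.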
