12.07.2015 Views

GPU Performance Analysis and Optimization - GPU Technology ...


Pattern Category 2: Large Inter-thread Stride

  • Cause:
      – Successive threads access words at regular distance, distance greater than one word
          • GPU access words: 1, 2, 4, 8, 16 bytes
      – Example cases:
          • Data transpose (warp accessing a column in a row-major data structure)
          • Cases where some data is accessed in transposed fashion, other isn’t
  • Issues:
      – Wasted bandwidth: moves more bytes than needed
      – Substantially increased latency:
          • If a warp address pattern requires N transactions, the instruction is issued N times
  • Symptoms:
      – Transactions per request much greater than ideal
  • Remedies:
      – Full: change data layout, stage accesses via SMEM
      – Partial: non-caching loads, read-only loads

© 2012, NVIDIA
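The "transactions per request" symptom can be illustrated with a small counting model. This is a rough sketch, not the hardware's actual behavior: it assumes a 32-thread warp, 128-byte aligned memory segments, 4-byte words by default, and a base address of 0, none of which are stated on the slide. The function name `transactions_per_request` and its parameters are hypothetical.

```python
def transactions_per_request(stride_words, word_bytes=4,
                             warp_size=32, segment_bytes=128):
    """Count the distinct memory segments touched by one warp-wide load.

    Simplified model (all defaults are assumptions): thread t reads the
    word at byte address t * stride_words * word_bytes, and every distinct
    segment_bytes-aligned segment costs one transaction.
    """
    addresses = [t * stride_words * word_bytes for t in range(warp_size)]
    segments = {addr // segment_bytes for addr in addresses}
    return len(segments)


def bandwidth_efficiency(stride_words, word_bytes=4,
                         warp_size=32, segment_bytes=128):
    """Fraction of the bytes moved that the warp actually requested."""
    needed = warp_size * word_bytes
    moved = transactions_per_request(
        stride_words, word_bytes, warp_size, segment_bytes) * segment_bytes
    return needed / moved


if __name__ == "__main__":
    # Stride 1 is the ideal contiguous case; stride 32 mimics a warp
    # reading a column of a row-major array that is 32 words wide.
    for stride in (1, 2, 8, 32):
        print(stride,
              transactions_per_request(stride),
              bandwidth_efficiency(stride))
```

Under these assumptions a unit stride needs one transaction per request, while a stride of 32 words needs 32 transactions and uses only 1/32 of the bytes moved, which corresponds to the wasted bandwidth and repeated instruction issue listed under Issues.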
