11.07.2015 Views

Evolution of the NVIDIA GPU Architecture



Warp Shuffle Instructions

∗ In Fermi, threads could exchange data only through shared memory.
  ∗ This resulted in additional synchronization time.
∗ Kepler introduces the shuffle functions, which:
  ∗ Exchange data between threads without using shared memory
  ∗ Handle the store-and-load operation as a single step
  ∗ Can share data only within the same warp
∗ In their example, an FFT algorithm saw a 6% performance increase when using this instruction.
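As a rough illustration of the shuffle functions described above, the following sketch sums one value per thread across a single warp without touching shared memory. It is not taken from the slides. The names `warpSumKernel` and `warpSumDemo` are hypothetical, the launch uses one 32-thread block for simplicity, and error checking is omitted. It uses `__shfl_down_sync`, the form current CUDA toolkits provide; Kepler-era code used the older `__shfl_down`, which has since been deprecated.

```cuda
#include <cuda_runtime.h>

// Sums one value per lane across a single 32-thread warp using shuffles only.
// Assumption: the kernel is launched with exactly one warp (blockDim.x == 32).
__global__ void warpSumKernel(const int *in, int *out) {
    int lane = threadIdx.x;
    int value = in[lane];

    // Tree reduction: at each step a lane reads the register of the lane
    // `offset` positions above it. No shared memory and no __syncthreads().
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        value += __shfl_down_sync(0xffffffffu, value, offset);
    }

    // After the loop, lane 0 holds the sum over all 32 lanes.
    if (lane == 0) {
        *out = value;
    }
}

// Hypothetical host helper: fills the lanes with 0..31, runs the kernel,
// and returns the warp-wide sum. Error checking is omitted for brevity.
int warpSumDemo() {
    const int kWarp = 32;
    int hostIn[kWarp];
    for (int i = 0; i < kWarp; ++i) {
        hostIn[i] = i;
    }

    int *devIn = nullptr;
    int *devOut = nullptr;
    int hostOut = -1;

    cudaMalloc(&devIn, sizeof(hostIn));
    cudaMalloc(&devOut, sizeof(int));
    cudaMemcpy(devIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);

    warpSumKernel<<<1, kWarp>>>(devIn, devOut);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&hostOut, devOut, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(devIn);
    cudaFree(devOut);
    return hostOut;
}
```

Calling `warpSumDemo()` from a host `main` on a device of compute capability 3.0 or later should return 496, the sum of 0 through 31. An equivalent shared-memory reduction would need a store, a barrier, and a load at every step, which is the overhead the slide attributes to Fermi.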
