Evolution of the NVIDIA GPU Architecture
Warp Shuffle Instructions
∗ In Fermi, threads could exchange data only through shared memory.
  ∗ This resulted in additional synchronization time.
∗ Kepler adds the shuffle instructions, which
  ∗ exchange data between threads without using shared memory
  ∗ handle the store-and-load operation as a single step
  ∗ can share data only among threads within the same warp
∗ In NVIDIA's example, an FFT algorithm saw a 6% performance increase when using this instruction.
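To make the shuffle idea concrete, here is a minimal sketch of a warp-level sum in which each thread reads a register value from another lane instead of going through shared memory. It is an illustration, not code from the slides: the names `warp_sum_kernel`, `warp_sum`, and `check` are hypothetical, the launch assumes a single block of exactly 32 threads, and it uses the `__shfl_down_sync` intrinsic from CUDA 9 and later (Kepler-era code used the unmasked `__shfl_down`, which is now deprecated).

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;                 // assumption: one full warp
constexpr unsigned kFullMask = 0xffffffffu;   // all 32 lanes participate

// Each lane starts with one input value. After log2(32) = 5 shuffle steps,
// lane 0 holds the sum of all 32 values. No shared memory and no
// __syncthreads() are needed, because the exchange stays inside one warp.
__global__ void warp_sum_kernel(const float* in, float* out) {
    float value = in[threadIdx.x];
    for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // Read `value` from the lane `offset` positions above this one.
        value += __shfl_down_sync(kFullMask, value, offset);
    }
    if (threadIdx.x == 0) {
        *out = value;
    }
}

// Hypothetical helper: stop on any CUDA runtime error (debug builds only).
static void check(cudaError_t err) {
    assert(err == cudaSuccess);
    (void)err;
}

// Hypothetical host wrapper: sums exactly kWarpSize floats on the device.
float warp_sum(const float* host_in) {
    float* d_in = nullptr;
    float* d_out = nullptr;
    float result = 0.0f;

    check(cudaMalloc(&d_in, kWarpSize * sizeof(float)));
    check(cudaMalloc(&d_out, sizeof(float)));
    check(cudaMemcpy(d_in, host_in, kWarpSize * sizeof(float),
                     cudaMemcpyHostToDevice));

    warp_sum_kernel<<<1, kWarpSize>>>(d_in, d_out);
    check(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost));

    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return result;
}
```

Calling `warp_sum` on the values 1 through 32 should return 528. A shared-memory version of the same reduction would need a store, a synchronization, and a load at each step, which is the overhead the shuffle instruction removes.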