Evolution of the NVIDIA GPU Architecture

Warp Shuffle Instructions
∗ In Fermi, data could only be exchanged between threads using shared memory.
  ∗ This resulted in additional synchronization time.
∗ Kepler adds shuffle instructions, which
  ∗ Exchange data between threads without using shared memory
  ∗ Handle the store-and-load operation as a single step
∗ Data can only be shared within the same warp.
∗ In their example, an FFT algorithm saw a 6% performance increase when using this instruction.

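The shuffle-based exchange described above can be sketched as a warp-level sum. This is a minimal illustration, not code from the slides: the names warpReduceSum and sumOneWarp are hypothetical, and it assumes CUDA 9 or later, where the intrinsic is __shfl_down_sync (Kepler-era toolkits exposed it as __shfl_down without the mask argument).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sum the values held by the 32 lanes of one warp.
// Each step reads a register from the lane `offset` positions higher,
// so no shared memory and no __syncthreads() are involved.
__inline__ __device__ int warpReduceSum(int val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;  // only lane 0 is guaranteed to hold the full sum
}

// Hypothetical kernel: launched with exactly one warp (32 threads).
__global__ void sumOneWarp(const int *in, int *out) {
    int lane = threadIdx.x;
    int total = warpReduceSum(in[lane]);
    if (lane == 0) {
        *out = total;
    }
}

int main() {
    int h_in[32];
    int h_out = 0;
    for (int i = 0; i < 32; ++i) {
        h_in[i] = i + 1;  // 1 + 2 + ... + 32 = 528
    }

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    sumOneWarp<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    printf("warp sum = %d\n", h_out);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out == 528 ? 0 : 1;
}
```

Because the exchange stays inside a warp, a reduction over a larger block would still need shared memory or atomics to combine the per-warp results.
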
Kepler Hardware Features
∗ Dynamic Parallelism
  ∗ Any kernel can launch more kernels from within itself
  ∗ Takes additional load off of the CPU
∗ Hyper-Q
  ∗ 32 hardware-managed work queues
  ∗ Fermi had 1 queue
∗ Grid Management Unit (GMU)
  ∗ Needed to manage the number of grids that are executed
  ∗ Introduced to handle all of the grids that can be active at one time
∗ NVIDIA GPUDirect™
  ∗ Ability for CUDA-enabled GPUs to interact without the need for CPU intervention
  ∗ The GPU can interact directly with the NIC
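
Dynamic Parallelism, the first feature listed above, can be sketched as a parent kernel that launches a child grid itself. This is a minimal illustration under stated assumptions: the kernel names parent and child are hypothetical, the device must have compute capability 3.5 or higher, and the file is assumed to be built with relocatable device code, for example nvcc -arch=sm_35 -rdc=true file.cu -lcudadevrt (adjust -arch to the target GPU).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child grid: doubles every element of the array.
__global__ void child(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2;
    }
}

// Parent grid: a single thread decides on the device how much work to
// launch, without returning control to the CPU.
__global__ void parent(int *data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        child<<<blocks, threads>>>(data, n);
    }
}

int main() {
    const int n = 1000;
    int h[n];
    for (int i = 0; i < n; ++i) {
        h[i] = i;
    }

    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    parent<<<1, 1>>>(d, n);
    // The parent grid is not complete until its child grids finish,
    // so one host-side synchronization covers both.
    cudaDeviceSynchronize();

    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    // Verify on the host that every element was doubled.
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2 * i) {
            ++errors;
        }
    }
    printf("mismatches = %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

The CPU issues only one launch here; the launch configuration of the child grid is computed and submitted on the GPU, which is the load reduction the slide refers to.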
