Tutorial CUDA
Tutorial CUDA
Tutorial CUDA
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Reduce 3<br />
Each thread stores its result in an array of<br />
numThreadsPerBlock elements in shared memory<br />
Each block performs a parallel reduction on this<br />
array<br />
reduce_kernel is called only 2 times:<br />
First call reduces from numValues to numBlocks<br />
© NVIDIA Corporation 2008<br />
Second call performs final reduction using one thread<br />
block