Tutorial CUDA
Back to Reduce Exercise:<br />
Problem with Reduce 2<br />
Reduce 2 does not take advantage of shared memory!<br />
Reduce 3 fixes this by implementing parallel<br />
reduction in shared memory<br />
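The exercise code for Reduce 3 is not shown here, so the following is only a minimal sketch of what a parallel reduction in shared memory might look like. The names `reduce_shared`, `reduce_on_gpu`, and `BLOCK_SIZE`, the `int` element type, and the choice to add up the per-block partial sums on the host are all assumptions made for illustration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed threads per block; must be a power of two

// Each block loads up to BLOCK_SIZE elements into shared memory and
// reduces them to a single partial sum, written to out[blockIdx.x].
__global__ void reduce_shared(const int *in, int *out, unsigned int n)
{
    __shared__ int sdata[BLOCK_SIZE];

    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    // One global memory read per thread; out-of-range threads contribute 0.
    sdata[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction: halve the number of active threads at every step.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.x] = sdata[0];
    }
}

// Hypothetical host helper: sums n ints on the GPU and finishes the
// reduction of the per-block partial sums on the CPU.
int reduce_on_gpu(const int *h_in, unsigned int n)
{
    if (n == 0) return 0;

    unsigned int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    reduce_shared<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);
    cudaError_t err = cudaGetLastError();
    assert(err == cudaSuccess);
    (void)err;

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

In this sketch each input element is read from global memory once, and every later addition works on `sdata` in shared memory, which is the advantage the slide attributes to Reduce 3.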
Runtime shared memory allocation:<br />
size_t SharedMemBytes = 64; // 64 bytes of shared memory<br />
KernelFunc<<<DimGrid, DimBlock, SharedMemBytes>>>(...);<br />
© NVIDIA Corporation 2008<br />
The optional SharedMemBytes bytes are:<br />
Allocated in addition to the compiler-allocated shared memory<br />
Mapped to any variable declared as:<br />
extern __shared__ float DynamicSharedMem[];
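
As a rough illustration of the points above, the sketch below sizes an `extern __shared__` array at launch time through the third execution-configuration argument. The kernel `reverse_block`, the helper `reverse_on_gpu`, and the 512-thread limit are assumptions chosen for the example, not part of the tutorial.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define MAX_THREADS 512  // assumed conservative per-block thread limit

// Reverses n floats in place using a single block. The staging buffer
// lives in dynamically allocated shared memory.
__global__ void reverse_block(float *data, int n)
{
    // Size is not known at compile time; it comes from the launch.
    extern __shared__ float DynamicSharedMem[];

    int t = threadIdx.x;
    if (t < n) {
        DynamicSharedMem[t] = data[t];
    }
    __syncthreads();  // all loads must finish before any thread reads back

    if (t < n) {
        data[t] = DynamicSharedMem[n - 1 - t];
    }
}

// Hypothetical host helper: reverses h_data[0..n-1] on the GPU.
void reverse_on_gpu(float *h_data, int n)
{
    assert(n > 0 && n <= MAX_THREADS);

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Bytes of dynamic shared memory requested per block.
    size_t SharedMemBytes = n * sizeof(float);
    reverse_block<<<1, n, SharedMemBytes>>>(d_data, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
}
```

With `n = 16`, `SharedMemBytes` works out to 64, the same value used in the slide's example.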