Tutorial CUDA
Reduce 5: More Optimizations through Unrolling
Parallel reduction inner loop:
    for (int stride = numThreadsPerBlock / 2;
         stride > 0; stride /= 2)
    {
        __syncthreads();
        if (threadID < stride)
            sresult[threadID] += sresult[threadID + stride];
    }
There are only so many possible values for numThreadsPerBlock:
a multiple of 32, less than or equal to 512.
So, templatize on numThreadsPerBlock:
    template <uint numThreadsPerBlock>
    __global__ void reduce_kernel(const float* valuesIn,
                                  uint numValues,
                                  float* valuesOut)
And unroll:
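One way the unrolled kernel could look, as a sketch under stated assumptions rather than the tutorial's own listing. It assumes power-of-two block sizes from 32 to 512, because the halving scheme only covers every element when the block size is a power of two. It keeps a __syncthreads() after every step instead of a warp-synchronous tail, and it sums the per-block results on the host. The names reduce_step and reduce_on_device are hypothetical helpers introduced here, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

typedef unsigned int uint;  // same alias the slide's signature relies on

// Hypothetical helper: one step of the reduction tree. The outer test only
// involves template constants, so the compiler removes steps that do not apply.
template <uint numThreadsPerBlock, uint stride>
__device__ void reduce_step(float* sresult, uint threadID)
{
    if (numThreadsPerBlock >= 2 * stride) {
        if (threadID < stride)
            sresult[threadID] += sresult[threadID + stride];
        __syncthreads();  // uniform across the block: the outer test is compile-time
    }
}

template <uint numThreadsPerBlock>
__global__ void reduce_kernel(const float* valuesIn,
                              uint numValues,
                              float* valuesOut)
{
    __shared__ float sresult[numThreadsPerBlock];
    const uint threadID = threadIdx.x;
    const uint i = blockIdx.x * numThreadsPerBlock + threadID;

    // Each thread loads one value; out-of-range threads contribute zero.
    sresult[threadID] = (i < numValues) ? valuesIn[i] : 0.0f;
    __syncthreads();

    // The loop over stride, written out for every stride a block of at most
    // 512 threads can need.
    reduce_step<numThreadsPerBlock, 256>(sresult, threadID);
    reduce_step<numThreadsPerBlock, 128>(sresult, threadID);
    reduce_step<numThreadsPerBlock,  64>(sresult, threadID);
    reduce_step<numThreadsPerBlock,  32>(sresult, threadID);
    reduce_step<numThreadsPerBlock,  16>(sresult, threadID);
    reduce_step<numThreadsPerBlock,   8>(sresult, threadID);
    reduce_step<numThreadsPerBlock,   4>(sresult, threadID);
    reduce_step<numThreadsPerBlock,   2>(sresult, threadID);
    reduce_step<numThreadsPerBlock,   1>(sresult, threadID);

    // Thread 0 writes this block's partial sum.
    if (threadID == 0)
        valuesOut[blockIdx.x] = sresult[0];
}

// Hypothetical host wrapper: the template argument must be a compile-time
// constant, so a switch maps the runtime block size to an instantiation.
float reduce_on_device(const std::vector<float>& values, uint numThreadsPerBlock)
{
    const uint numValues = static_cast<uint>(values.size());
    if (numValues == 0) return 0.0f;
    const uint numBlocks = (numValues + numThreadsPerBlock - 1) / numThreadsPerBlock;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, numValues * sizeof(float));
    cudaMalloc(&dOut, numBlocks * sizeof(float));
    cudaMemcpy(dIn, values.data(), numValues * sizeof(float), cudaMemcpyHostToDevice);

    switch (numThreadsPerBlock) {
        case 512: reduce_kernel<512><<<numBlocks, 512>>>(dIn, numValues, dOut); break;
        case 256: reduce_kernel<256><<<numBlocks, 256>>>(dIn, numValues, dOut); break;
        case 128: reduce_kernel<128><<<numBlocks, 128>>>(dIn, numValues, dOut); break;
        case  64: reduce_kernel< 64><<<numBlocks,  64>>>(dIn, numValues, dOut); break;
        case  32: reduce_kernel< 32><<<numBlocks,  32>>>(dIn, numValues, dOut); break;
        default:
            cudaFree(dIn);
            cudaFree(dOut);
            throw std::invalid_argument("block size must be 32, 64, 128, 256 or 512");
    }

    // Copy back one partial sum per block and finish the sum on the host.
    std::vector<float> partial(numBlocks);
    cudaMemcpy(partial.data(), dOut, numBlocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling reduce_on_device(values, 128) launches the reduce_kernel<128> instantiation, in which only the steps for strides 64 down to 1 remain after compilation.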
© NVIDIA Corporation 2008