29.01.2013 Views

Tutorial CUDA


Reduce 5: More Optimizations through Unrolling

Parallel reduction inner loop:

    for (int stride = numThreadsPerBlock / 2;
         stride > 0; stride /= 2)
    {
        __syncthreads();
        if (threadID < stride)
            sresult[threadID] += sresult[threadID + stride];
    }
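For context, here is a minimal sketch of a complete kernel built around this loop. Only the loop itself comes from the slide. The shared-memory load, the per-block output, the kernel name `reduce_baseline_kernel`, and the host helper `reduce_baseline` are assumptions made for illustration, and the sketch assumes the block size is a power of two so that the halving strides cover every element.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Assumed layout: one input value per thread, one partial sum per block.
__global__ void reduce_baseline_kernel(const float* valuesIn,
                                       unsigned int numValues,
                                       float* valuesOut)
{
    extern __shared__ float sresult[];  // sized at launch: blockDim.x floats
    const unsigned int threadID = threadIdx.x;
    const unsigned int numThreadsPerBlock = blockDim.x;  // known only at run time here
    const unsigned int globalID = blockIdx.x * numThreadsPerBlock + threadID;

    // Out-of-range threads contribute the additive identity.
    sresult[threadID] = (globalID < numValues) ? valuesIn[globalID] : 0.0f;

    // The inner loop from the slide.
    for (unsigned int stride = numThreadsPerBlock / 2; stride > 0; stride /= 2)
    {
        __syncthreads();
        if (threadID < stride)
            sresult[threadID] += sresult[threadID + stride];
    }

    // Thread 0 performed the final addition itself, so it can read sresult[0].
    if (threadID == 0)
        valuesOut[blockIdx.x] = sresult[0];
}

// Hypothetical host helper: reduces each block on the GPU, then sums the
// per-block partial results on the CPU. Error checking is omitted for brevity.
float reduce_baseline(const std::vector<float>& values, unsigned int threads = 256)
{
    assert(!values.empty());
    const unsigned int n = static_cast<unsigned int>(values.size());
    const unsigned int blocks = (n + threads - 1) / threads;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduce_baseline_kernel<<<blocks, threads, threads * sizeof(float)>>>(dIn, n, dOut);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial)
        total += p;
    return total;
}
```

Because `numThreadsPerBlock` is read from `blockDim.x` at run time in this version, the compiler cannot know the trip count of the loop, which is the limitation the rest of the slide addresses.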

numThreadsPerBlock can take only a handful of values:

A multiple of 32, less than or equal to 512

So, templatize on numThreadsPerBlock:

    template <uint numThreadsPerBlock>
    __global__ void reduce_kernel(const float* valuesIn,
                                  uint numValues,
                                  float* valuesOut)
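A template argument must be a compile-time constant, so a block size chosen at run time has to be mapped to one of the instantiations on the host. The sketch below shows one way this could look. The kernel body, the names `reduce_templated_kernel`, `launch_reduce`, and `reduce_with_block_size`, and the set of supported sizes are all assumptions. Only powers of two are instantiated, because the halving loop is exact only for those.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

template <unsigned int numThreadsPerBlock>
__global__ void reduce_templated_kernel(const float* valuesIn,
                                        unsigned int numValues,
                                        float* valuesOut)
{
    __shared__ float sresult[numThreadsPerBlock];  // size now known at compile time
    const unsigned int threadID = threadIdx.x;
    const unsigned int globalID = blockIdx.x * numThreadsPerBlock + threadID;

    sresult[threadID] = (globalID < numValues) ? valuesIn[globalID] : 0.0f;

    // Same loop as before, but the bound is a compile-time constant.
    for (unsigned int stride = numThreadsPerBlock / 2; stride > 0; stride /= 2)
    {
        __syncthreads();
        if (threadID < stride)
            sresult[threadID] += sresult[threadID + stride];
    }

    if (threadID == 0)
        valuesOut[blockIdx.x] = sresult[0];
}

// Hypothetical dispatcher: picks the instantiation matching the run-time
// block size. Returns false for sizes this sketch does not instantiate.
bool launch_reduce(unsigned int threads, unsigned int blocks,
                   const float* dIn, unsigned int n, float* dOut)
{
    switch (threads)
    {
    case 512: reduce_templated_kernel<512><<<blocks, 512>>>(dIn, n, dOut); return true;
    case 256: reduce_templated_kernel<256><<<blocks, 256>>>(dIn, n, dOut); return true;
    case 128: reduce_templated_kernel<128><<<blocks, 128>>>(dIn, n, dOut); return true;
    case 64:  reduce_templated_kernel<64><<<blocks, 64>>>(dIn, n, dOut);   return true;
    case 32:  reduce_templated_kernel<32><<<blocks, 32>>>(dIn, n, dOut);   return true;
    default:  return false;
    }
}

// Hypothetical host helper; sums the per-block partial results on the CPU.
// Error checking is omitted for brevity.
float reduce_with_block_size(const std::vector<float>& values, unsigned int threads)
{
    assert(!values.empty());
    const unsigned int n = static_cast<unsigned int>(values.size());
    const unsigned int blocks = (n + threads - 1) / threads;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const bool launched = launch_reduce(threads, blocks, dIn, n, dOut);
    assert(launched);  // unsupported block size
    (void)launched;    // keeps builds with NDEBUG free of unused-variable warnings

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial)
        total += p;
    return total;
}
```

The switch costs one compiled kernel per supported size, which is acceptable precisely because so few block sizes are possible.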

And unroll:
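The unrolled code itself is not part of this excerpt. As a rough sketch of what unrolling on the template parameter could look like, each loop iteration becomes one guarded step, and the compiler can discard the steps whose guard is false for a given instantiation. The names `reduce_unrolled_kernel` and `reduce_unrolled`, the load and store around the steps, and the fixed block size of 256 are assumptions. This version keeps a `__syncthreads()` before every step, as the loop does, instead of relying on warp-synchronous execution for the last steps.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

template <unsigned int numThreadsPerBlock>
__global__ void reduce_unrolled_kernel(const float* valuesIn,
                                       unsigned int numValues,
                                       float* valuesOut)
{
    static_assert(numThreadsPerBlock >= 32 && numThreadsPerBlock <= 512 &&
                  (numThreadsPerBlock & (numThreadsPerBlock - 1)) == 0,
                  "this sketch assumes a power-of-two block size in [32, 512]");

    __shared__ float sresult[numThreadsPerBlock];
    const unsigned int threadID = threadIdx.x;
    const unsigned int globalID = blockIdx.x * numThreadsPerBlock + threadID;

    sresult[threadID] = (globalID < numValues) ? valuesIn[globalID] : 0.0f;

    // Each line is one iteration of the original loop. The outer conditions
    // depend only on the template parameter, so they are resolved at compile time.
    if (numThreadsPerBlock >= 512) { __syncthreads(); if (threadID < 256) sresult[threadID] += sresult[threadID + 256]; }
    if (numThreadsPerBlock >= 256) { __syncthreads(); if (threadID < 128) sresult[threadID] += sresult[threadID + 128]; }
    if (numThreadsPerBlock >= 128) { __syncthreads(); if (threadID <  64) sresult[threadID] += sresult[threadID +  64]; }
    if (numThreadsPerBlock >=  64) { __syncthreads(); if (threadID <  32) sresult[threadID] += sresult[threadID +  32]; }
    // The remaining strides apply to every supported block size.
    __syncthreads(); if (threadID < 16) sresult[threadID] += sresult[threadID + 16];
    __syncthreads(); if (threadID <  8) sresult[threadID] += sresult[threadID +  8];
    __syncthreads(); if (threadID <  4) sresult[threadID] += sresult[threadID +  4];
    __syncthreads(); if (threadID <  2) sresult[threadID] += sresult[threadID +  2];
    __syncthreads(); if (threadID <  1) sresult[threadID] += sresult[threadID +  1];

    if (threadID == 0)
        valuesOut[blockIdx.x] = sresult[0];
}

// Hypothetical host helper using a fixed block size of 256 threads.
// Error checking is omitted for brevity.
float reduce_unrolled(const std::vector<float>& values)
{
    assert(!values.empty());
    const unsigned int threads = 256;
    const unsigned int n = static_cast<unsigned int>(values.size());
    const unsigned int blocks = (n + threads - 1) / threads;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduce_unrolled_kernel<256><<<blocks, threads>>>(dIn, n, dOut);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial)
        total += p;
    return total;
}
```

With 256 threads per block, the first guarded step is compiled out and the remaining eight run without any loop counter or loop branch.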

© NVIDIA Corporation 2008
