Tutorial CUDA
Reduce 5: Unrolled Loop
if (numThreadsPerBlock >= 512)
{ __syncthreads(); if (threadID < 256) sresult[threadID] += sresult[threadID + 256]; }
if (numThreadsPerBlock >= 256)
{ __syncthreads(); if (threadID < 128) sresult[threadID] += sresult[threadID + 128]; }
if (numThreadsPerBlock >= 128)
{ __syncthreads(); if (threadID < 64) sresult[threadID] += sresult[threadID + 64]; }
if (numThreadsPerBlock >= 64)
{ __syncthreads(); if (threadID < 32) sresult[threadID] += sresult[threadID + 32]; }
if (numThreadsPerBlock >= 32)
{ __syncthreads(); if (threadID < 16) sresult[threadID] += sresult[threadID + 16]; }
…
if (numThreadsPerBlock >= 4)
{ __syncthreads(); if (threadID < 2) sresult[threadID] += sresult[threadID + 2]; }
if (numThreadsPerBlock >= 2)
{ __syncthreads(); if (threadID < 1) sresult[threadID] += sresult[threadID + 1]; }
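All code in blue on the original slide (the numThreadsPerBlock comparisons) is evaluated at compile time, provided numThreadsPerBlock is a compile-time constant such as a template parameter.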
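A minimal sketch of how the unrolled steps could sit inside a complete kernel. The slide does not show the kernel signature, so several things here are assumptions: that numThreadsPerBlock is a template parameter (which is what lets the compiler fold the comparisons), that sresult is a statically sized shared array, and that each thread loads exactly one element. The names reduceUnrolled, input, blockSums, and sumOnDevice are hypothetical, and the steps for 16 and 8 threads are written out by following the same pattern as their neighbours.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Assumption: numThreadsPerBlock is a template parameter, so every
// `numThreadsPerBlock >= N` test is a constant and dead steps are removed.
template <unsigned int numThreadsPerBlock>
__global__ void reduceUnrolled(const float* input, float* blockSums)
{
    __shared__ float sresult[numThreadsPerBlock];
    const unsigned int threadID = threadIdx.x;

    // Assumption: one element per thread is loaded into shared memory.
    sresult[threadID] = input[blockIdx.x * numThreadsPerBlock + threadID];

    if (numThreadsPerBlock >= 512)
    { __syncthreads(); if (threadID < 256) sresult[threadID] += sresult[threadID + 256]; }
    if (numThreadsPerBlock >= 256)
    { __syncthreads(); if (threadID < 128) sresult[threadID] += sresult[threadID + 128]; }
    if (numThreadsPerBlock >= 128)
    { __syncthreads(); if (threadID < 64) sresult[threadID] += sresult[threadID + 64]; }
    if (numThreadsPerBlock >= 64)
    { __syncthreads(); if (threadID < 32) sresult[threadID] += sresult[threadID + 32]; }
    if (numThreadsPerBlock >= 32)
    { __syncthreads(); if (threadID < 16) sresult[threadID] += sresult[threadID + 16]; }
    if (numThreadsPerBlock >= 16)
    { __syncthreads(); if (threadID < 8) sresult[threadID] += sresult[threadID + 8]; }
    if (numThreadsPerBlock >= 8)
    { __syncthreads(); if (threadID < 4) sresult[threadID] += sresult[threadID + 4]; }
    if (numThreadsPerBlock >= 4)
    { __syncthreads(); if (threadID < 2) sresult[threadID] += sresult[threadID + 2]; }
    if (numThreadsPerBlock >= 2)
    { __syncthreads(); if (threadID < 1) sresult[threadID] += sresult[threadID + 1]; }

    // Thread 0 holds the block's sum and writes it out.
    if (threadID == 0) blockSums[blockIdx.x] = sresult[0];
}

// Hypothetical host helper: reduces each block of 256 values on the device,
// then adds the per-block sums on the host. Assumes host.size() is a
// non-zero multiple of 256; error checking is omitted for brevity.
float sumOnDevice(const std::vector<float>& host)
{
    constexpr unsigned int threads = 256;
    const unsigned int numBlocks = static_cast<unsigned int>(host.size() / threads);
    const size_t inBytes = host.size() * sizeof(float);
    const size_t outBytes = numBlocks * sizeof(float);

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, inBytes);
    cudaMalloc(&dOut, outBytes);
    cudaMemcpy(dIn, host.data(), inBytes, cudaMemcpyHostToDevice);

    // The template argument must match the launch block size.
    reduceUnrolled<threads><<<numBlocks, threads>>>(dIn, dOut);

    std::vector<float> partial(numBlocks);
    cudaMemcpy(partial.data(), dOut, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

With this layout, instantiating reduceUnrolled<256> should let the compiler discard the `>= 512` step entirely, since its condition is a false constant. The template argument and the block size passed at launch have to agree; launching with a different block size would read or skip the wrong shared-memory entries.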
© NVIDIA Corporation 2008