26.12.2014 Views

DirectCompute Optimizations and Best Practices - Nvidia


Reduction #7: Multiple Adds / Thread

Replace load and add of two elements:

    unsigned int tid = threadIdx.x;
    unsigned int i = groupIdx.x*(groupDim_x*2) + threadIdx.x;
    sdata[tid] = g_idata[i] + g_idata[i+groupDim_x];
    GroupMemoryBarrierWithGroupSync();

With a while loop to add as many as necessary:

    unsigned int tid = threadIdx.x;
    unsigned int i = groupIdx.x*(groupDim_x*2) + threadIdx.x;
    // Elements covered by one pass of the whole dispatch
    unsigned int dispatchSize = groupDim_x*2*gridDim.x;
    sdata[tid] = 0;
    // Each thread accumulates two elements per pass until the input is exhausted
    while (i < n) {
        sdata[tid] += g_idata[i] + g_idata[i+groupDim_x];
        i += dispatchSize;
    }
    GroupMemoryBarrierWithGroupSync();
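The indexing in the while loop can be checked on the CPU. The following is a minimal C++ sketch, not the slide's shader: it emulates each thread group serially, runs the same load loop, and then applies a simple tree reduction within the group. The function name `reduce_partial_sums`, the `numGroups` parameter (standing in for `gridDim.x`), and the two preconditions (power-of-two group size, `n` a multiple of `2 * groupDim`) are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of the "multiple adds per thread" load phase followed by an
// in-group tree reduction. Returns one partial sum per thread group.
// Assumptions (not from the slide): groupDim is a power of two, and the input
// length is a multiple of 2 * groupDim so in[i + groupDim] stays in bounds.
std::vector<unsigned int> reduce_partial_sums(const std::vector<unsigned int>& in,
                                              unsigned int groupDim,
                                              unsigned int numGroups) {
    const std::size_t n = in.size();
    assert(groupDim > 0 && (groupDim & (groupDim - 1)) == 0);
    assert(n % (std::size_t(groupDim) * 2) == 0);

    // Number of elements consumed by one pass of the whole dispatch.
    const std::size_t dispatchSize = std::size_t(groupDim) * 2 * numGroups;
    std::vector<unsigned int> out(numGroups, 0);

    for (unsigned int group = 0; group < numGroups; ++group) {
        // Stands in for the group's shared memory array.
        std::vector<unsigned int> sdata(groupDim, 0);

        // Load phase: every "thread" adds two elements per pass.
        for (unsigned int tid = 0; tid < groupDim; ++tid) {
            std::size_t i = std::size_t(group) * (groupDim * 2) + tid;
            while (i < n) {
                sdata[tid] += in[i] + in[i + groupDim];
                i += dispatchSize;
            }
        }
        // On the GPU, a group barrier would separate the load phase from the tree.

        // Tree reduction: halve the number of active threads each step.
        for (unsigned int s = groupDim / 2; s > 0; s >>= 1) {
            for (unsigned int tid = 0; tid < s; ++tid) {
                sdata[tid] += sdata[tid + s];
            }
            // On the GPU, a group barrier would follow each step.
        }
        out[group] = sdata[0];
    }
    return out;
}
```

With 1024 inputs holding the values 0 to 1023, a group size of 128, and 2 groups, each group makes two passes over the input and the two partial sums add up to 523776, the sum of the whole array. Fewer groups than input blocks is the point of the loop: each thread does more sequential adds before the tree reduction starts.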
