26.12.2014 Views

DirectCompute Optimizations and Best Practices - Nvidia

DirectCompute Optimizations and Best Practices - Nvidia

DirectCompute Optimizations and Best Practices - Nvidia

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Reduction #4: First Add During Load<br />

Halve the number of groups, <strong>and</strong> replace single load:<br />

// each thread loads one element from global to shared mem<br />

unsigned int tid = threadIdx.x;<br />

unsigned int i = groupIdx.x*groupDim_x + threadIdx.x;<br />

sdata[tid] = g_idata[i];<br />

GroupMemoryBarrierWithGroupSync();<br />

With two loads <strong>and</strong> first add of the reduction:<br />

// perform first level of reduction,<br />

// reading from global memory, writing to shared memory<br />

unsigned int tid = threadIdx.x;<br />

unsigned int i = groupIdx.x*(groupDim_x*2) + threadIdx.x;<br />

sdata[tid] = g_idata[i] + g_idata[i+groupDim_x];<br />

GroupMemoryBarrierWithGroupSync();

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!