29.01.2013 Views

Tutorial CUDA


Back to Reduce Exercise:

Problem with Reduce 2

Reduce 2 does not take advantage of shared memory!

Reduce 3 fixes this by implementing parallel reduction in shared memory
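
To make the idea of a parallel reduction in shared memory concrete, here is a minimal sketch. It is not the exercise's Reduce 3 code: the names block_sum, sum_static and BLOCK_SIZE are hypothetical, the block size is assumed to be a power of two, and the per-block partial sums are simply added up on the host. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed threads per block; must be a power of two

// Each block loads BLOCK_SIZE elements into shared memory, reduces them
// with a tree-shaped sum, and writes one partial sum per block.
__global__ void block_sum(const float *in, float *out, int n)
{
    __shared__ float sdata[BLOCK_SIZE];  // statically sized shared array

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0f;  // pad the tail with zeros
    __syncthreads();

    // Halve the number of active threads at every step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();  // all threads reach this barrier, active or not
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: sums n floats and finishes the per-block
// partial sums on the CPU.
float sum_static(const float *h_in, int n)
{
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);

    float *h_out = new float[blocks];
    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += h_out[b];

    delete[] h_out;
    cudaFree(d_in);
    cudaFree(d_out);
    return total;
}

int main()
{
    const int n = 1000;
    float h_in[n];
    for (int k = 0; k < n; ++k)
        h_in[k] = 1.0f;
    printf("sum = %.1f\n", sum_static(h_in, n));
    return 0;
}
```

The inner loop touches only shared memory; global memory is read once per element and written once per block, which is the point of moving the reduction into shared memory.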

Runtime shared memory allocation:

size_t SharedMemBytes = 64; // 64 bytes of shared memory

KernelFunc<<<DimGrid, DimBlock, SharedMemBytes>>>(...);

© NVIDIA Corporation 2008

The optional SharedMemBytes bytes are:

Allocated in addition to the compiler allocated shared memory

Mapped to any variable declared as:

extern __shared__ float DynamicSharedMem[];
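
As an illustration of the runtime allocation described above, the following sketch sizes the shared array at launch time through the third execution-configuration parameter. The kernel and helper names (block_sum_dynamic, sum_dynamic) and the threads argument are hypothetical; the thread count is assumed to be a power of two, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sums blockDim.x elements per block using a shared array whose size
// is supplied at launch time rather than fixed at compile time.
__global__ void block_sum_dynamic(const float *in, float *out, int n)
{
    // Mapped to the SharedMemBytes passed in the kernel launch.
    extern __shared__ float DynamicSharedMem[];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    DynamicSharedMem[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            DynamicSharedMem[tid] += DynamicSharedMem[tid + s];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = DynamicSharedMem[0];
}

// Hypothetical host helper: threads must be a power of two.
float sum_dynamic(const float *h_in, int n, int threads)
{
    int blocks = (n + threads - 1) / threads;
    size_t SharedMemBytes = threads * sizeof(float);  // one float per thread

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    block_sum_dynamic<<<blocks, threads, SharedMemBytes>>>(d_in, d_out, n);

    float *h_out = new float[blocks];
    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += h_out[b];

    delete[] h_out;
    cudaFree(d_in);
    cudaFree(d_out);
    return total;
}

int main()
{
    const int n = 1000;
    float h_in[n];
    for (int k = 0; k < n; ++k)
        h_in[k] = 1.0f;
    printf("sum = %.1f\n", sum_dynamic(h_in, n, 128));
    return 0;
}
```

Because the array size is no longer a compile-time constant, the same kernel can be launched with different block sizes, as long as SharedMemBytes is adjusted to match.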
