Lecture 2 – Threads - many-core.group
GPU strategy 2 – what went wrong?
• The shared memory kernel performed worse than expected, mainly because many threads do not compute (they only load data into shared memory):
• For a 16x16 block, 60 of the 256 threads do not compute (23%)
• A larger block would lower that fraction, but the maximum number of threads per block is 512, so a square block cannot exceed 22x22 (sqrt(512) = 22.6)
(Also: the stencil is small, so there is little data reuse)
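
The 60-thread figure can be checked with a little arithmetic. The sketch below assumes a halo one cell wide on each side of the block, so that only the inner (n-2)x(n-2) threads compute. That assumption is inferred from the numbers on the slide (256 - 14x14 = 60), not stated there. The function name `idle_threads` and the block sizes tried are illustrative only.

```python
import math

# Limit quoted on the slide; newer GPUs may allow more threads per block.
MAX_THREADS_PER_BLOCK = 512


def idle_threads(block_dim, halo=1):
    """Return (count, fraction) of threads in a square block that only load.

    Assumes every thread loads one element into shared memory and only the
    interior threads, those at least `halo` cells from the block edge, compute.
    """
    total = block_dim * block_dim
    interior = (block_dim - 2 * halo) ** 2
    idle = total - interior
    return idle, idle / total


# Largest square block that fits the limit: floor(sqrt(512)) = 22.
largest_dim = math.isqrt(MAX_THREADS_PER_BLOCK)

for dim in (8, 16, largest_dim):
    idle, fraction = idle_threads(dim)
    print(f"{dim}x{dim}: {idle} of {dim * dim} threads only load ({fraction:.1%})")
```

Under these assumptions, even the largest square block that fits the 512-thread limit (22x22) leaves 84 of its 484 threads, roughly a sixth, doing no computation, so enlarging the block alone does not remove the overhead.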