28.11.2014

Lecture 2 – Threads - many-core.group


GPU strategy 2 – what went wrong?

• The shared memory kernel performed worse than expected, mainly because many threads do not compute; they only load data into shared memory:

• For a 16x16 block (256 threads), 60 threads do not compute (23%)

• But the maximum number of threads per block is 512, so a square block can be at most 22x22 (sqrt(512) = 22.6)

(Also: the stencil is small, so there is little data reuse)
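The thread counts above can be checked with a short calculation. This is a minimal sketch, not the lecture's code: it assumes a halo one cell wide on every side of a square block (the assumption that reproduces the slide's 60 idle threads, since 16x16 - 14x14 = 60), and the function names are hypothetical.

```python
import math


def idle_threads(block_dim: int, halo: int = 1) -> int:
    """Threads in a square block that only load halo cells and do not compute.

    Assumes (not stated on the slide) a halo of `halo` cells on each side.
    """
    if block_dim <= 2 * halo:
        raise ValueError("block is too small to contain any computing threads")
    total = block_dim * block_dim
    computing = (block_dim - 2 * halo) ** 2  # interior threads only
    return total - computing


def idle_fraction(block_dim: int, halo: int = 1) -> float:
    """Share of the block's threads that do not compute."""
    return idle_threads(block_dim, halo) / (block_dim * block_dim)


def largest_square_block(max_threads: int) -> int:
    """Largest side length n such that n * n <= max_threads."""
    return math.isqrt(max_threads)


if __name__ == "__main__":
    for dim in (16, largest_square_block(512)):
        print(f"{dim}x{dim}: {idle_threads(dim)} idle threads "
              f"({idle_fraction(dim):.1%})")
```

Under these assumptions a 16x16 block has 60 idle threads out of 256 (about 23%), and the largest square block allowed by a 512-thread limit, 22x22, still has 84 idle threads out of 484 (about 17%). Growing the block therefore reduces the overhead only modestly.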
