31.07.2014 Views

Implementing Finite Volume algorithms on GPUs - many-core.group ...

Implementing Finite Volume algorithms on GPUs - many-core.group ...

Implementing Finite Volume algorithms on GPUs - many-core.group ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Template over blockDim<br />

Initially allocated shared memory dynamically<br />

Drawback:<br />

Offsets <strong>on</strong>ly calculated at run-time<br />

Overflow of shared memory <strong>on</strong>ly found at run-time<br />

So template all functi<strong>on</strong>s over blockDim.x and blockDim.y:<br />

template<br />

global void getSLICflux GPU(Grid u, Grid flux, float dt, int<br />

coord)<br />

{<br />

shared float temp[4][NUM VARS][blockDim y][blockDim x];<br />

}<br />

Compiler can pre-compute memory offsets<br />

Use of excess shared-memory can be detected at compile-time<br />

Drawback: Must determine block-size at compile-time<br />

Time now: 8.23s (109.7×)<br />

<str<strong>on</strong>g>Finite</str<strong>on</strong>g> <str<strong>on</strong>g>Volume</str<strong>on</strong>g> Methods<br />

Laboratory for Scientific<br />

Computing<br />

13 / 22

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!