Implementing Finite Volume algorithms on GPUs - many-core.group ...
Implementing Finite Volume algorithms on GPUs - many-core.group ...
Implementing Finite Volume algorithms on GPUs - many-core.group ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Template over blockDim<br />
Initially allocated shared memory dynamically<br />
Drawback:<br />
Offsets <strong>on</strong>ly calculated at run-time<br />
Overflow of shared memory <strong>on</strong>ly found at run-time<br />
So template all functi<strong>on</strong>s over blockDim.x and blockDim.y:<br />
template<br />
global void getSLICflux GPU(Grid u, Grid flux, float dt, int<br />
coord)<br />
{<br />
shared float temp[4][NUM VARS][blockDim y][blockDim x];<br />
}<br />
Compiler can pre-compute memory offsets<br />
Use of excess shared-memory can be detected at compile-time<br />
Drawback: Must determine block-size at compile-time<br />
Time now: 8.23s (109.7×)<br />
<str<strong>on</strong>g>Finite</str<strong>on</strong>g> <str<strong>on</strong>g>Volume</str<strong>on</strong>g> Methods<br />
Laboratory for Scientific<br />
Computing<br />
13 / 22