Implementing Finite Volume algorithms on GPUs - many-core.group ...
What block size should we use? (x-direction)
When performing the flux calculation, we want to reduce the number of overlap cells relative to the block size.
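The overlap argument can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes a stencil that needs `ghost` extra cells on each side of a block in the x-direction (the default of 2 is a hypothetical value, not taken from the slides) and reports the overlap cells as a fraction of the block width.

```python
def overlap_fraction(block_x: int, ghost: int = 2) -> float:
    """Overlap (halo) cells per block row, relative to the block width in x.

    `ghost` is a hypothetical halo width per side; the slides do not state it.
    """
    if block_x <= 0 or ghost < 0:
        raise ValueError("block_x must be positive and ghost non-negative")
    # One halo of `ghost` cells on the left and one on the right.
    return 2 * ghost / block_x


# Block widths in x that appear in the timing table.
for block_x in (16, 32, 64):
    print(f"block width {block_x:2d}: overlap fraction {overlap_fraction(block_x):.4f}")
```

Under this assumption, doubling the block width in x halves the relative overlap, which is why wide, flat blocks are attractive for the x-direction flux calculation.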
Block size    Overall time (s)
16 × 8        7.13
32 × 4        6.93
64 × 2        6.95
32 × 8        7.02
64 × 4        6.98
First 4 lines can have 4 thread blocks per multiprocessor (register limitation).
Last two lines are limited to 3 thread blocks per multiprocessor.
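How a resource limit caps the number of resident thread blocks can be sketched with a simplified model. Everything numeric below is a placeholder: the register count per thread, the register file size, and the thread and block limits are hypothetical, and real hardware allocates registers with a granularity this model ignores. The counts quoted on the slide come from the author's kernel and device, which this sketch does not attempt to model.

```python
def blocks_per_multiprocessor(
    threads_per_block: int,
    regs_per_thread: int,
    regs_per_sm: int,
    max_threads_per_sm: int,
    max_blocks_per_sm: int,
) -> int:
    """Simplified estimate of resident thread blocks per multiprocessor.

    Takes the tightest of three limits: registers, threads, and block slots.
    All hardware parameters are supplied by the caller (hypothetical here).
    """
    by_registers = regs_per_sm // (regs_per_thread * threads_per_block)
    by_threads = max_threads_per_sm // threads_per_block
    return min(by_registers, by_threads, max_blocks_per_sm)


# Hypothetical device and kernel: 16384 registers per multiprocessor,
# 32 registers per thread, at most 1024 threads and 8 blocks resident.
for bx, by in ((16, 8), (32, 4), (64, 2), (32, 8), (64, 4)):
    n = blocks_per_multiprocessor(bx * by, 32, 16384, 1024, 8)
    print(f"{bx} x {by}: {bx * by} threads per block, {n} resident blocks")
```

In this model, block shapes with the same thread count always get the same number of resident blocks, so a choice between, say, 16 × 8 and 64 × 2 has to be made on other grounds such as the overlap cells.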
Finite Volume Methods
Laboratory for Scientific Computing