Tree Poisson solver
<strong>Tree</strong>-based self-gravity <strong>solver</strong><br />
R. Wünsch I. Berentzen<br />
A. P. Whitworth R. Banerjee<br />
Features:<br />
• two versions for Flash2 and Flash3<br />
• Barnes & Hut octal tree with monopole moments<br />
• works with 3D Cartesian coordinates, AMR tree needed<br />
• isolated and periodic boundaries<br />
◮ periodic: Ewald method<br />
• efficient MPI communication<br />
• interaction lists<br />
◮ nearby cells and tree nodes are not tested for opening angle criterion<br />
• ported to GPUs (by Ingo Berentzen)<br />
Richard Wünsch, Flash workshop at the Hamburg Observatory, 15th February 2012 1/26
Algorithm overview<br />
• Parameters:<br />
◮ tree_limangle (θ_lim) . . . REAL . . . 1.0 - 0.5<br />
◮ tree_ilist . . . INTEGER . . . 0 or 1<br />
• Algorithm: (as in Grid_solvePoisson())<br />
if (grid_changed .eq. 1) then<br />
call treeComBlkProperties() (nodetype, lrefine, child, neigh, coords) 1%<br />
if (tree_ilist .eq. 1) call treeFindNeighbours() 0%<br />
endif<br />
call gr_treeBuildTree() 3%<br />
call gr_treeExchangeTrees() 3%<br />
call gr_treePotential(idensvar, ipotvar) 93%<br />
call gr_treeDestroyTree() 0%<br />
◮ routines that include MPI communication marked red<br />
◮ percentages: relative times for a collapse of the BE sphere<br />
(512 CPUs, θ_lim = 0.5, 76000 leaf blocks)<br />
Communication of grid properties<br />
• treeComBlkProperties()<br />
◮ nodetype, lrefine, child, neigh and cell coords in each block<br />
◮ memory: MAXBLOCKS × nCPUs × (16×INT + 27×REAL)<br />
◮ for MAXBLOCKS = 1000, nCPU = 1000 → 300 MB<br />
◮ only active values communicated<br />
• treeFindNeighbours() (needed by interaction lists)<br />
◮ finds 26 neighbours in all (incl. diagonal) directions<br />
◮ tr_surbox(2, 27, MAXBLOCKS)<br />
→ records block number and cpu of each neighbour<br />
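The memory estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 4-byte INTEGER and 8-byte REAL values (the slide does not state the sizes); the function name `grid_property_bytes` is hypothetical.

```python
# Rough size of the grid-property arrays held on every CPU.
# Assumption (not stated on the slide): 4-byte INTEGER, 8-byte REAL.
INT_BYTES = 4
REAL_BYTES = 8


def grid_property_bytes(maxblocks, ncpus, n_int=16, n_real=27):
    """Bytes for MAXBLOCKS x nCPUs x (n_int INT + n_real REAL)."""
    per_block = n_int * INT_BYTES + n_real * REAL_BYTES
    return maxblocks * ncpus * per_block


size = grid_property_bytes(maxblocks=1000, ncpus=1000)
print(f"{size / 1e6:.0f} MB")  # 280 MB, i.e. roughly the 300 MB quoted
```

Under these assumptions the array grows linearly with both MAXBLOCKS and the number of CPUs.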
Build <strong>Tree</strong><br />
1. build trees in blocks<br />
◮ octal tree with log2(nxb) levels<br />
2. communicate mass and mass-centre position of all leaf blocks<br />
3. build ParentTree on each CPU<br />
◮ ParentTree(4, MAXBLOCKS, nCPU)<br />
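Step 1 (building the tree inside one block) can be sketched as a bottom-up pass over the cells. This is an illustrative Python sketch, not the solver's Fortran code: the function `build_block_tree`, the dictionary layout and the uniform test data are all assumptions made for the example. Each node stores a monopole moment only, i.e. its mass and mass-centre position.

```python
from itertools import product


def build_block_tree(mass, coords, nxb):
    """Build the octree of one nxb^3 block from the cells upwards.

    mass[(i, j, k)]   -> cell mass
    coords[(i, j, k)] -> (x, y, z) of the cell centre
    Returns levels, where levels[l][(i, j, k)] = (m, x_mc, y_mc, z_mc)
    and levels[0] holds the single root node.
    """
    depth = nxb.bit_length() - 1  # log2(nxb); nxb must be a power of two
    # Lowest level: one node per cell, mass centre = cell centre.
    current = {idx: (m, *coords[idx]) for idx, m in mass.items()}
    levels = [current]
    for level in range(depth - 1, -1, -1):
        parent = {}
        for i, j, k in product(range(2 ** level), repeat=3):
            m = mx = my = mz = 0.0
            for a, b, c in product(range(2), repeat=3):
                cm, cx, cy, cz = current[(2 * i + a, 2 * j + b, 2 * k + c)]
                m += cm
                mx += cm * cx
                my += cm * cy
                mz += cm * cz
            # Assumes every node holds some mass (true for a gas-filled grid).
            parent[(i, j, k)] = (m, mx / m, my / m, mz / m)
        levels.append(parent)
        current = parent
    levels.reverse()  # levels[0] is now the root
    return levels


nxb = 8
cells = list(product(range(nxb), repeat=3))
mass = {idx: 1.0 for idx in cells}
coords = {idx: tuple(c + 0.5 for c in idx) for idx in cells}
levels = build_block_tree(mass, coords, nxb)
print(len(levels), levels[0][(0, 0, 0)])  # 4 (512.0, 4.0, 4.0, 4.0)
```

For an 8 × 8 × 8 block this gives levels 0 to 3, matching log2(nxb) = 3 levels below the root.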
Block tree in RAM<br />
• for 8 × 8 × 8 blocks:<br />
◮ level 0 (1 node): m, x_mc, y_mc, z_mc at indices 1, 2, 3, 4<br />
◮ level 1 (8 nodes, 4 values each): nodes start at indices 5, 9, 13, 17, 21, 25, 29, 33<br />
◮ level 2 (64 nodes, 4 values each): groups of 8 sibling nodes follow offsets 36, 68, 100, 132, 164, 196, 228, 260<br />
◮ level 3: masses only (mc given by cell coordinates), 8^3 = 512 values, from offset 292 to 804<br />
• Tree size = $8^L + 4 \sum_{i=0}^{L-1} 8^i = 8^L + 4 \, \frac{8^L - 1}{7}$<br />
◮ L . . . number of the lowest level (e.g. 3 for 8 × 8 × 8 blocks)<br />
• tree nodes are identified by a multi-index, an integer array of size L: (l_1, l_2, l_3), where l_i is<br />
◮ 1-8 . . . number of the node on the i-th level<br />
◮ 0 . . . the multi-index (i.e. node) is of level i-1<br />
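The size formula and the level offsets of the block-tree array can be reproduced in a few lines. The sketch below is only a check of the arithmetic; the helper names `tree_size`, `level_offsets` and `node_level` are hypothetical, and the offsets are 0-based (number of values stored before a level begins).

```python
def tree_size(L):
    """Number of values in a block tree whose lowest level is L.

    Levels 0..L-1 store 4 values per node (m, x_mc, y_mc, z_mc);
    level L stores masses only.
    """
    return 8 ** L + 4 * (8 ** L - 1) // 7


def level_offsets(L):
    """Number of values stored before each level 0..L begins."""
    offsets = [0]
    for i in range(L):
        offsets.append(offsets[-1] + 4 * 8 ** i)
    return offsets


def node_level(multi_index):
    """Level of a node: count of leading non-zero multi-index entries."""
    level = 0
    for l_i in multi_index:
        if l_i == 0:
            break
        level += 1
    return level


print(tree_size(3))           # 804
print(level_offsets(3))       # [0, 4, 36, 292]
print(node_level((3, 5, 0)))  # 2
```

For L = 3 the offsets 4, 36 and 292 and the total of 804 agree with the numbers in the memory layout for 8 × 8 × 8 blocks.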
Communication of block trees<br />
1. determine tree levels to be sent<br />
◮ distance between a given block and all blocks on a given CPU<br />
2. communicate tree levels<br />
◮ all values for a given CPU packed into a single message<br />
3. allocate space for block trees from other CPUs<br />
◮ dynamic memory allocation to avoid wasting memory<br />
4. communicate block trees<br />
◮ all block trees for a given CPU packed into a single message<br />
[Figure: blocks on CPU 0 and CPU 1, labelled with tree levels 0 to 3.]<br />
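Step 1 decides how deep a block tree has to be sent to another CPU. The slides only say that the decision uses the distance between the block and all blocks on that CPU; the exact rule below is an assumption for illustration: send down to the first level whose nodes would already be accepted by the opening-angle criterion at the smallest such distance. The function name `levels_to_send` is hypothetical.

```python
def levels_to_send(block_size, min_distance, theta_lim, max_level):
    """Deepest tree level a remote CPU may need from one block.

    A node at level l has edge length block_size / 2**l. If such a node
    seen from min_distance satisfies size / distance < theta_lim, no cell
    on the remote CPU would open it, so deeper levels need not be sent.
    (Assumed rule; the slides do not spell it out.)
    """
    if min_distance <= 0.0:
        return max_level  # touching blocks: send the whole tree
    for level in range(max_level + 1):
        node_size = block_size / 2 ** level
        if node_size / min_distance < theta_lim:
            return level
    return max_level


print(levels_to_send(1.0, 10.0, 0.5, 3))  # 0: the block's total mass is enough
print(levels_to_send(1.0, 1.5, 0.5, 3))   # 1
print(levels_to_send(1.0, 0.3, 0.5, 3))   # 3
```

With a rule of this kind, distant CPUs receive only the top of each block tree, which keeps the packed messages small.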
<strong>Tree</strong> walk<br />
• 5 types of interactions:<br />
1. cell – cell<br />
2. cell – block tree node<br />
◮ θ_lim checked for each cell<br />
3. cell – block<br />
◮ θ_lim checked once for the whole block<br />
4. cell – cell (with interaction lists)<br />
◮ distance pre-calculated<br />
5. cell – block tree node (with interaction lists)<br />
◮ θ_lim criterion pre-calculated<br />
[ 02-13-2012 15:15:36.682 ] [TREE]: cell-cell distances: 0.548E+08, per zone: 111.4<br />
[ 02-13-2012 15:15:36.692 ] [TREE]: cell-node distances: 0.756E+09, per zone: 1537.1<br />
[ 02-13-2012 15:15:36.733 ] [TREE]: cell-block distances: 0.169E+08, per zone: 34.3<br />
[ 02-13-2012 15:15:36.773 ] [TREE]: IL cell-cell distances: 0.169E+09, per zone: 343.4<br />
[ 02-13-2012 15:15:36.803 ] [TREE]: IL cell-node distances: 0.461E+09, per zone: 938.4<br />
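The core of the tree walk is the opening-angle test: a node of size h at distance d is accepted as a single point mass when h/d < θ_lim, otherwise its children are visited. The following is a generic Barnes & Hut monopole walk written as a sketch; the `Node` class, the code units (G = 1) and the skipping of a cell's self-interaction are assumptions for the example and do not reproduce the solver's five specialised interaction types.

```python
import math
from dataclasses import dataclass, field

G = 1.0  # gravitational constant in code units (assumption)


@dataclass
class Node:
    mass: float
    com: tuple   # mass centre (x, y, z)
    size: float  # edge length of the node
    children: list = field(default_factory=list)


def potential(point, node, theta_lim):
    """Monopole potential at `point` due to `node` and its subtree."""
    if node.mass == 0.0:
        return 0.0
    d = math.dist(point, node.com)
    is_leaf = not node.children
    # Accept the node if it is a leaf or passes the opening-angle test.
    if d > 0.0 and (is_leaf or node.size / d < theta_lim):
        return -G * node.mass / d
    if is_leaf:
        return 0.0  # the cell itself: self-interaction skipped
    return sum(potential(point, child, theta_lim) for child in node.children)


left = Node(1.0, (-0.25, 0.0, 0.0), 0.5)
right = Node(1.0, (0.25, 0.0, 0.0), 0.5)
root = Node(2.0, (0.0, 0.0, 0.0), 1.0, [left, right])

far = (10.0, 0.0, 0.0)
print(potential(far, root, theta_lim=0.5))  # root accepted: -0.2
print(potential(far, root, theta_lim=0.0))  # always opened: direct sum
```

Setting θ_lim = 0 forces every node to be opened, so the walk degenerates to a direct sum over cells.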
Interaction lists<br />
• relative positions of cells and tree nodes are known<br />
• for each cell can be found:<br />
◮ list of cells with which it interacts<br />
◮ list of block tree nodes with which it interacts<br />
• lists do not depend on lrefine<br />
• for cells/nodes within a block and 26 surrounding blocks<br />
• makes tree walk faster by ∼ 25%<br />
◮ more efficient for smaller simulations<br />
• costs memory:<br />
◮ 8×8×8 blocks: ∼ 200 MB<br />
◮ 16×16×16 blocks: ∼ 700 MB<br />
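Why the lists do not depend on lrefine can be seen from a small sketch: if the opening-angle test uses only geometric sizes and distances, every length scales with the cell size and the ratio size/distance is unchanged. The code below assumes exactly that (node geometric centres rather than mass centres, and blocks of equal size); `accepted` and `interaction_list` are hypothetical helper names, not routines of the solver.

```python
import math
from itertools import product


def accepted(cell, node, level, offset, nxb, theta_lim, cell_size=1.0):
    """Opening-angle test between a cell and a tree node of a nearby block.

    cell   -- (i, j, k) of the target cell in the central block
    node   -- (i, j, k) of the node on `level` of the source block's tree
    offset -- position of the source block in units of blocks, e.g. (1, 0, 0)
    """
    node_size = cell_size * nxb / 2 ** level
    cell_centre = [cell_size * (c + 0.5) for c in cell]
    node_centre = [cell_size * o * nxb + node_size * (n + 0.5)
                   for o, n in zip(offset, node)]
    d = math.dist(cell_centre, node_centre)
    return d > 0.0 and node_size / d < theta_lim


def interaction_list(cell, offset, nxb, theta_lim, cell_size=1.0):
    """(level, node) pairs of one block's tree that `cell` interacts with."""
    depth = nxb.bit_length() - 1  # lowest level = individual cells
    result, stack = [], [(0, (0, 0, 0))]
    while stack:
        level, node = stack.pop()
        if level == depth:
            if offset != (0, 0, 0) or node != cell:  # skip the cell itself
                result.append((level, node))
        elif accepted(cell, node, level, offset, nxb, theta_lim, cell_size):
            result.append((level, node))
        else:
            stack.extend(
                (level + 1, tuple(2 * n + a for n, a in zip(node, abc)))
                for abc in product(range(2), repeat=3))
    return sorted(result)


coarse = interaction_list((0, 0, 0), (1, 0, 0), 8, 0.5, cell_size=1.0)
fine = interaction_list((0, 0, 0), (1, 0, 0), 8, 0.5, cell_size=0.125)
print(coarse == fine)  # True: the list is independent of the cell size
```

Because the result is the same for every refinement level, such a list can be computed once per cell position and reused, at the cost of the memory quoted above.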
Test: Bonnor-Ebert sphere<br />
• mass: M = 1 M⊙<br />
• temperature: T = 10 K (BES), T_amb = 10^4 K (ambient)<br />
• unstable: ξ_0 = 10 (threshold value is 6.5)<br />
• radius: R = 0.041 pc<br />
• accuracy tests:<br />
◮ "uniform grid": lrefine_min = lrefine_max = 5<br />
→ 4096 leaf blocks<br />
◮ "AMR grid": lrefine_min = 1, lrefine_max = 5<br />
→ 1240 leaf blocks<br />
→ refinement controlled by Jeans length<br />
• performance tests:<br />
◮ "AMR grid": lrefine_min = 1, lrefine_max = 8<br />
→ 76000 leaf blocks (run on 64 − 512 CPUs)<br />
◮ integrated for 10 time-steps<br />
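The leaf-block count of the uniform-grid test follows from the refinement level. The sketch below assumes a single root block at lrefine = 1 and 8 × 8 × 8 cells per block (the block size used elsewhere in these slides); the function name `uniform_grid_counts` is hypothetical.

```python
def uniform_grid_counts(lrefine, nxb=8, root_blocks=1):
    """Leaf blocks and cells of a grid refined uniformly to `lrefine`.

    Each refinement step splits a block into 8 children.
    """
    leaf_blocks = root_blocks * 8 ** (lrefine - 1)
    cells = leaf_blocks * nxb ** 3
    return leaf_blocks, cells


print(uniform_grid_counts(5))  # (4096, 2097152)
```

Under these assumptions lrefine = 5 gives the 4096 leaf blocks quoted for the uniform grid, equivalent to 128^3 cells.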
Flash 3: error in Φ (lref_min = lref_max = 5)<br />
[Figure: log10[(Φ − Φ_anl)/|Φ_anl(0)|], y axis from −0.005 to 0.025, versus r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5, 0.2 and 0.0; Multigrid with mpole_lmax = 0, 8 and 15.]<br />
Flash 3: error in F_r (lref_min = lref_max = 5)<br />
[Figure: log10[(F_r − F_r,anl)/|F_r,anl|], y axis from −0.1 to 0.1, versus r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5, 0.2 and 0.0; Multigrid with mpole_lmax = 0, 8 and 15.]<br />
Flash 3: error in F_r (lref_min = 1, lref_max = 6)<br />
[Figure: log10[(F_r − F_r,anl)/|F_r,anl|], y axis from −0.1 to 0.1, versus r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5 and 0.2; Multigrid with mpole_lmax = 0, 8 and 15.]<br />
Flash 2: error in Φ (lref_min = 1, lref_max = 6)<br />
[Figure: log10[(Φ − Φ_anl)/|Φ_anl(0)|], y axis from −0.005 to 0.025, versus r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5 and 0.2; Multigrid with mpole_lmax = 0, 8 and 15.]<br />
Flash 2: error in F_r (lref_min = 1, lref_max = 6)<br />
[Figure: log10[(F_r − F_r,anl)/|F_r,anl|], y axis from −0.3 to 0.3, versus r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5 and 0.2; Multigrid with mpole_lmax = 0, 8 and 15.]<br />
Flash 3, BES: time(nCPUs)<br />
[Figure: stacked bars of seconds per timestep (0 to 350) versus number of CPUs (64, 128, 256, 512) for the runs tree_1.0, tree_0.5 and mg. Components: hydro, tree walk/fft, com/gcell, other.]<br />
Flash 3, BES: relative time<br />
[Figure: stacked bars of relative time per timestep (0 to 1) versus number of CPUs (64, 128, 256, 512) for the runs tree_1.0, tree_0.5 and mg. Components: hydro, tree walk/fft, com/gcell, other.]<br />
Flash 2, test: expanding shell<br />
• M_shell = 2 × 10^4 M⊙<br />
• T_shell = 10 K<br />
• R_shell,0 = 10 pc<br />
• V_shell,0 = 2.2 km s^-1<br />
• R_shell,max = 23 pc<br />
• P_ext = 10^-17, 10^-13 or 5 × 10^-13 dyne cm^-2<br />
[Figure: radial profiles versus r [pc] from 0 to 30: log(ρ) [g/cm^3], log(T) − 32 [K] and v [km/s]. Annotations: ρ ∝ sech^2, T = 10^4 K, T = 10 K.]<br />
Flash 2, shell: time(nCPUs)<br />
[Figure: stacked bars of seconds per evolution (0 to 70,000) versus number of CPUs (64, 128, 256) for the runs tree_1.0, tree_0.5 and mg. Components: hydro, tree walk/fft, com/gcell, other.]<br />
Flash 3, BES: speedup<br />
[Figure: speedup (64 × t_64/t), y axis from 50 to 700, versus number of CPUs (64, 128, 256, 512). Curves: hydro, multigrid, tree with θ_lim = 0.5, tree with θ_lim = 1.0.]<br />
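The speedup on this slide is defined relative to the 64-CPU run as 64 × t_64/t. A short sketch of that definition follows; the parallel-efficiency helper and the timings in the usage lines are hypothetical placeholders chosen for illustration, not measurements from the talk.

```python
def speedup(t, t_ref, n_ref=64):
    """Speedup normalised so that the reference run scores n_ref."""
    return n_ref * t_ref / t


def efficiency(n_cpus, t, t_ref, n_ref=64):
    """Parallel efficiency: 1.0 means ideal scaling from the reference."""
    return speedup(t, t_ref, n_ref) / n_cpus


# Hypothetical timings in seconds per timestep (not from the slides).
t_64, t_128, t_256 = 100.0, 50.0, 40.0
print(speedup(t_128, t_64), efficiency(128, t_128, t_64))  # 128.0 1.0
print(speedup(t_256, t_64), efficiency(256, t_256, t_64))  # 160.0 0.625
```

With this normalisation, ideal scaling is the line speedup = number of CPUs.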
Flash 3: time(number of blocks)<br />
[Figure: seconds per time-step (0.001 to 1000) versus number of blocks (64, 512, 4096, 32768); 64 CPUs, lrefine_min = lrefine_max, ilist=1. Curves: hydro and tree with θ_lim = 1.0; reference lines N log(N) and N.]<br />
Flash 3, BES: time(θ_lim)<br />
[Figure: seconds per time-step (1 to 10000) versus θ_lim (0.1 to 1); 64 CPUs, lrefine_min = lrefine_max = 5, ilist=1. Reference slopes: θ_lim^-2 and θ_lim^-3.]<br />
GPU version<br />
• tree walk ported to GPUs by Ingo Berentzen<br />
• 5-10 times faster, comparable to hydro<br />
Future<br />
• elimination of MAXBLOCKS×nCPU size arrays<br />
• uniform grid<br />
• quadrupole (higher) moments<br />
Download at:<br />
http://galaxy.asu.cas.cz/~richard/tree-solver/<br />
Thank you!<br />