
Tree Poisson solver


Tree-based self-gravity solver

R. Wünsch, I. Berentzen, A. P. Whitworth, R. Banerjee

Features:

• two versions, for Flash2 and Flash3
• Barnes & Hut octal tree with monopole moments
• works with 3D Cartesian coordinates; AMR tree needed
• isolated and periodic boundaries
  ◮ periodic: Ewald method
• efficient MPI communication
• interaction lists
  ◮ nearby cells and tree nodes are not tested for the opening-angle criterion
• ported to GPUs (by Ingo Berentzen)

Richard Wünsch, Flash workshop at the Hamburg Observatory, 15th February 2012 1/26


Algorithm overview

• Parameters:
  ◮ tree_limangle (θ_lim) . . . REAL . . . 1.0 - 0.5
  ◮ tree_ilist . . . INTEGER . . . 0 or 1
• Algorithm (as in Grid_solvePoisson()); the percentage after each call is its share of the run time:

    if (grid_changed .eq. 1) then
      call treeComBlkProperties()   ! nodetype, lrefine, child, neigh, coords: 1%
      if (tree_ilist .eq. 1) call treeFindNeighbours()   ! 0%
    endif
    call gr_treeBuildTree()   ! 3%
    call gr_treeExchangeTrees()   ! 3%
    call gr_treePotential(idensvar, ipotvar)   ! 93%
    call gr_treeDestroyTree()   ! 0%

  ◮ on the slide, routines that include MPI communication are marked in red
  ◮ relative times are for a collapse of the BE sphere (512 CPUs, θ_lim = 0.5, 76000 leaf blocks)



Communication of grid properties

• treeComBlkProperties()
  ◮ communicates nodetype, lrefine, child, neigh and cell coordinates of each block
  ◮ memory: MAXBLOCKS × nCPUs × (16×INT + 27×REAL)
  ◮ for MAXBLOCKS = 1000, nCPU = 1000 → 300 MB
  ◮ only active values are communicated
• treeFindNeighbours() (needed by interaction lists)
  ◮ finds 26 neighbours in all (incl. diagonal) directions
  ◮ tr_surbox(2, 27, MAXBLOCKS)
    → records block number and CPU of each neighbour
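
The memory estimate above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming 4-byte INTEGER and 8-byte REAL values (the slide does not state the sizes); the function name `grid_property_bytes` is hypothetical.

```python
# Assumed storage sizes (not stated on the slide): 4-byte INTEGER, 8-byte REAL.
INT_BYTES = 4
REAL_BYTES = 8


def grid_property_bytes(maxblocks, ncpus, n_int=16, n_real=27):
    """Bytes for MAXBLOCKS x nCPUs x (16 x INT + 27 x REAL)."""
    per_block = n_int * INT_BYTES + n_real * REAL_BYTES
    return maxblocks * ncpus * per_block


total = grid_property_bytes(1000, 1000)
print(f"{total / 1e6:.0f} MB")  # prints "280 MB"
```

With these assumed sizes the result is 280 MB, in line with the roughly 300 MB quoted above; the array grows linearly with both MAXBLOCKS and the number of CPUs.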



Build Tree

1. build trees in blocks
   ◮ octal tree with log_2(nxb) levels
2. communicate mass and mass-centre position of all leaf blocks
3. build ParentTree on each CPU
   ◮ ParentTree(4, MAXBLOCKS, nCPU)
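
With monopole moments only, a parent node needs just the total mass and the mass centre of its children. The sketch below shows that aggregation for nodes stored as (m, x, y, z) tuples, mirroring the four values per node in ParentTree; the function name `combine_monopoles` is hypothetical and this is not the solver's actual routine.

```python
def combine_monopoles(children):
    """Merge child nodes (m, x, y, z) into one parent node (M, X, Y, Z).

    M is the summed mass; (X, Y, Z) is the mass-weighted mean position.
    """
    total = sum(c[0] for c in children)
    if total == 0.0:
        # An empty parent has no meaningful mass centre.
        return (0.0, 0.0, 0.0, 0.0)
    centre = tuple(sum(c[0] * c[k] for c in children) / total for k in (1, 2, 3))
    return (total, *centre)


# Two children: mass 1 at the origin and mass 3 at x = 4.
parent = combine_monopoles([(1.0, 0.0, 0.0, 0.0), (3.0, 4.0, 0.0, 0.0)])
print(parent)  # (4.0, 3.0, 0.0, 0.0)
```

Applying the same step level by level, from the leaf blocks upwards, yields the monopole data for every node of the parent tree.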



Block tree in RAM

• for 8 × 8 × 8 blocks, the tree is stored level by level in a single array:
  ◮ level 0: one node holding (m, x_mc, y_mc, z_mc) at positions 1 to 4
  ◮ level 1: 8 nodes with 4 values each, starting at positions 5, 9, 13, 17, 21, 25, 29 and 33
  ◮ level 2: 64 nodes with 4 values each, in 8 groups of 32 values; the diagram marks positions 36, 68, 100, 132, 164, 196, 228 and 260, and the level ends at position 292
  ◮ level 3: masses only (mc given by cell coordinates), 8^3 = 512 values; the array ends at position 804
• tree size:

  \[ \text{Tree size} = 8^L + 4 \sum_{i=0}^{L-1} 8^i = 8^L + 4\,\frac{8^L - 1}{7} \]

  L . . . number of the lowest level (e.g. 3 for 8 × 8 × 8 blocks)
• tree nodes are identified by a multi-index, an integer array of size L: (l_1, l_2, l_3), where l_i =
  ◮ 1-8 . . . number of the node on the i-th level
  ◮ 0 . . . the multi-index (i.e. node) is of level i-1
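
The tree-size formula and the level layout can be reproduced with a short calculation. This is a sketch under the assumption that levels are stored one after another with 1-based positions, four values per node on levels 0 to L-1 and one value per node on level L; the function names are hypothetical.

```python
def block_tree_size(L, values_per_node=4):
    """Tree size = 8^L + 4 * sum_{i=0}^{L-1} 8^i."""
    inner_nodes = sum(8 ** i for i in range(L))  # nodes on levels 0 .. L-1
    return 8 ** L + values_per_node * inner_nodes


def level_offsets(L, values_per_node=4):
    """1-based start position of each level in the flat array."""
    offsets, pos = [], 1
    for level in range(L + 1):
        offsets.append(pos)
        width = 1 if level == L else values_per_node  # lowest level: masses only
        pos += width * 8 ** level
    return offsets


def multi_index_level(multi_index):
    """Level of a node: number of non-zero entries before the first 0."""
    level = 0
    for l_i in multi_index:
        if l_i == 0:
            break
        level += 1
    return level


print(block_tree_size(3))            # 804
print(level_offsets(3))              # [1, 5, 37, 293]
print(multi_index_level((3, 0, 0)))  # 1
```

For L = 3 the size agrees with the closed form 8^L + 4 (8^L - 1)/7 = 512 + 292 = 804.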



Communication of block trees

1. determine tree levels to be sent
   ◮ distance between a given block and all blocks on a given CPU
2. communicate tree levels
   ◮ all values for a given CPU are packed into a single message
3. allocate space for block trees from other CPUs
   ◮ dynamic memory allocation to avoid wasting memory
4. communicate block trees
   ◮ all block trees for a given CPU are packed into a single message

[Figure: block trees on CPU 0 and CPU 1, with tree levels 0 to 3 labelled.]



Tree walk

• 5 types of interactions:
  1. cell – cell
  2. cell – block tree node
     ◮ θ_lim checked for each cell
  3. cell – block
     ◮ θ_lim checked once for the whole block
  4. cell – cell (with interaction lists)
     ◮ distance pre-calculated
  5. cell – block tree node (with interaction lists)
     ◮ θ_lim criterion pre-calculated
• log output with the number of distances evaluated for each interaction type:

[ 02-13-2012 15:15:36.682 ] [TREE]: cell-cell distances: 0.548E+08, per zone: 111.4
[ 02-13-2012 15:15:36.692 ] [TREE]: cell-node distances: 0.756E+09, per zone: 1537.1
[ 02-13-2012 15:15:36.733 ] [TREE]: cell-block distances: 0.169E+08, per zone: 34.3
[ 02-13-2012 15:15:36.773 ] [TREE]: IL cell-cell distances: 0.169E+09, per zone: 343.4
[ 02-13-2012 15:15:36.803 ] [TREE]: IL cell-node distances: 0.461E+09, per zone: 938.4
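
The role of θ_lim in a tree walk can be illustrated with a generic Barnes & Hut walk over point masses. This is a minimal sketch, not the solver's code: it assumes the textbook acceptance test (node size / distance < θ_lim), monopole moments only, positive masses and a non-empty point set; the `Node` class and `potential` function are hypothetical names.

```python
import math


class Node:
    """Cubic tree node carrying monopole data only (total mass, mass centre)."""

    def __init__(self, center, size, points):
        self.size = size          # edge length of the cube
        self.points = points      # list of (m, x, y, z), assumed non-empty
        self.mass = sum(p[0] for p in points)
        self.com = tuple(sum(p[0] * p[k] for p in points) / self.mass
                         for k in (1, 2, 3))
        self.children = []
        if len(points) > 1 and size > 1e-12:
            octants = {}
            for p in points:
                key = tuple(p[k + 1] >= center[k] for k in range(3))
                octants.setdefault(key, []).append(p)
            for key, pts in octants.items():
                sub = tuple(center[k] + (size / 4 if key[k] else -size / 4)
                            for k in range(3))
                self.children.append(Node(sub, size / 2, pts))


def potential(node, pos, theta_lim, G=1.0):
    """Potential at pos from all mass in node, opening nodes as needed."""
    if not node.children:
        # Leaf: direct sum over its points, skipping a point located at pos.
        return sum(-G * m / math.dist(pos, (x, y, z))
                   for m, x, y, z in node.points
                   if (x, y, z) != tuple(pos))
    d = math.dist(pos, node.com)
    if d > 0.0 and node.size / d < theta_lim:
        return -G * node.mass / d  # node accepted: use its monopole
    return sum(potential(c, pos, theta_lim, G) for c in node.children)


# 4 x 4 x 4 unit masses at cell centres of the unit cube, probed from far away.
n = 4
points = [(1.0, (i + 0.5) / n, (j + 0.5) / n, (k + 0.5) / n)
          for i in range(n) for j in range(n) for k in range(n)]
root = Node((0.5, 0.5, 0.5), 1.0, points)
pos = (5.0, 5.0, 5.0)
exact = sum(-m / math.dist(pos, (x, y, z)) for m, x, y, z in points)
approx = potential(root, pos, theta_lim=0.5)
print("relative error:", abs(approx - exact) / abs(exact))
```

With θ_lim = 0 no node is ever accepted, so the walk degenerates into a direct sum; larger values of θ_lim accept more nodes and trade accuracy for fewer distance evaluations.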



Interaction lists

• relative positions of cells and tree nodes are known
• for each cell, the following can be found in advance:
  ◮ list of cells with which it interacts
  ◮ list of block tree nodes with which it interacts
• lists do not depend on lrefine
• built for cells/nodes within a block and its 26 surrounding blocks
• makes the tree walk faster by ∼ 25%
  ◮ more efficient for smaller simulations
• costs memory:
  ◮ 8×8×8 blocks: ∼ 200 MB
  ◮ 16×16×16 blocks: ∼ 700 MB
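
The reason such lists can be independent of lrefine is that relative cell positions, measured in units of the cell size, are the same at every refinement level. The sketch below precomputes a distance table for all relative cell offsets within a block and its 26 neighbours. It is an illustration only: it assumes all 27 blocks share the same cell size, and the names `distance_table` and `cell_distance` are hypothetical, not the solver's data structures.

```python
import math


def distance_table(nxb):
    """Distances, in units of the cell size, for every relative cell offset
    between a cell in a block and a cell in that block or its 26 neighbours."""
    r = 2 * nxb - 1  # largest offset along one axis
    return {
        (i, j, k): math.sqrt(i * i + j * j + k * k)
        for i in range(-r, r + 1)
        for j in range(-r, r + 1)
        for k in range(-r, r + 1)
    }


table = distance_table(8)


def cell_distance(offset, cell_size):
    """Physical distance: table lookup scaled by the actual cell size."""
    return table[offset] * cell_size


print(len(table))                     # 29791 offsets (31^3)
print(cell_distance((3, 4, 0), 0.5))  # 2.5
```

The table is built once and reused for every block; only the scaling by the cell size changes with the refinement level, which is the trade of memory for speed described above.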



Test: Bonnor-Ebert sphere

• mass: M = 1 M⊙
• temperature: T = 10 K (BES), T_amb = 10^4 K (ambient)
• unstable: ξ_0 = 10 (threshold value is 6.5)
• radius: R = 0.041 pc
• accuracy tests:
  ◮ "uniform grid": lrefine_min = lrefine_max = 5
    → 4096 leaf blocks
  ◮ "AMR grid": lrefine_min = 1, lrefine_max = 5
    → 1240 leaf blocks
    → refinement controlled by Jeans length
• performance tests:
  ◮ "AMR grid": lrefine_min = 1, lrefine_max = 8
    → 76000 leaf blocks (run on 64 − 512 CPUs)
  ◮ integrated for 10 time-steps



Flash 3: error in Φ (lref_min = lref_max = 5)

[Figure: error in the potential, axis label log_10[(Φ − Φ_anl)/|Φ_anl(0)|] with ticks from −0.005 to 0.025, against r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5, 0.2 and 0.0; Multigrid with mpole_lmax = 0, 8 and 15.]


Flash 3: error in F_r (lref_min = lref_max = 5)

[Figure: error in F_r, axis label log_10[(F_r − F_r,anl)/|F_r,anl|] with ticks from −0.1 to 0.1, against r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5, 0.2 and 0.0; Multigrid with mpole_lmax = 0, 8 and 15.]


Flash 3: error in F_r (lref_min = 1, lref_max = 6)

[Figure: error in F_r, axis label log_10[(F_r − F_r,anl)/|F_r,anl|] with ticks from −0.1 to 0.1, against r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5 and 0.2; Multigrid with mpole_lmax = 0, 8 and 15.]


Flash 2: error in Φ (lref_min = 1, lref_max = 6)

[Figure: error in the potential, axis label log_10[(Φ − Φ_anl)/|Φ_anl(0)|] with ticks from −0.005 to 0.025, against r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5 and 0.2; Multigrid with mpole_lmax = 0, 8 and 15.]


Flash 2: error in F_r (lref_min = 1, lref_max = 6)

[Figure: error in F_r, axis label log_10[(F_r − F_r,anl)/|F_r,anl|] with ticks from −0.3 to 0.3, against r [pc] from 0 to 0.05. Curves: Tree with θ_lim = 1.0, 0.5 and 0.2; Multigrid with mpole_lmax = 0, 8 and 15.]


Flash 3, BES: time(nCPUs)

[Figure: seconds per timestep (0 to 350) for the runs tree_1.0, tree_0.5 and mg on 64, 128, 256 and 512 CPUs, split into hydro, tree walk/fft, com/gcell and other.]


Flash 3, BES: relative time

[Figure: relative time (0 to 1) for the runs tree_1.0, tree_0.5 and mg on 64, 128, 256 and 512 CPUs, split into hydro, tree walk/fft, com/gcell and other.]


Flash 2, test: expanding shell

M_shell = 2 × 10^4 M⊙
T_shell = 10 K
R_shell,0 = 10 pc
V_shell,0 = 2.2 km s^−1
R_shell,max = 23 pc
P_ext = 10^−17, 10^−13 or 5 × 10^−13 dyne cm^−2

[Figure: radial profiles of log ρ [g/cm^3], log(T) − 32 [K] and v [km/s] against r [pc] from 0 to 30; annotations: ρ ∝ sech^2, T = 10^4 K, T = 10 K.]


Flash 2, shell: time(nCPUs)

[Figure: seconds per evolution (0 to 70,000) for the runs tree_1.0, tree_0.5 and mg on 64, 128 and 256 CPUs, split into hydro, tree walk/fft, com/gcell and other.]


Flash 3, BES: speedup

[Figure: speedup, defined as 64 × t_64 / t, with axis ticks from 50 to 700, against the number of CPUs (64, 128, 256, 512). Curves: hydro; multigrid; tree with θ_lim = 0.5; tree with θ_lim = 1.0.]
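
The speedup measure used here, 64 × t_64 / t, normalises against the 64-CPU run so that ideal scaling gives a speedup equal to the number of CPUs. A minimal sketch follows; the function names are hypothetical and the timings are made-up illustrative values, not measurements from the slides.

```python
def speedup(t_ref, t, n_ref=64):
    """Speedup relative to the reference run: n_ref * t_ref / t."""
    return n_ref * t_ref / t


def efficiency(t_ref, t, n_cpus, n_ref=64):
    """Parallel efficiency: speedup divided by the number of CPUs (1.0 = ideal)."""
    return speedup(t_ref, t, n_ref) / n_cpus


# Hypothetical wall-clock times per time-step in seconds (NOT from the slides).
timings = {64: 100.0, 128: 52.0, 256: 28.0, 512: 16.0}
for n_cpus, t in timings.items():
    s = speedup(timings[64], t)
    e = efficiency(timings[64], t, n_cpus)
    print(f"{n_cpus:4d} CPUs: speedup {s:6.1f}, efficiency {e:4.2f}")
```

By construction the 64-CPU run has a speedup of exactly 64, so curves for different solvers share a common starting point.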



Flash 3: time(number of blocks)

[Figure: seconds per time-step (logarithmic axis, 0.001 to 1000) against the number of blocks (64, 512, 4096, 32768), for 64 CPUs, lrefine_min = lrefine_max, ilist=1. Curves: hydro; tree with θ_lim = 1.0; reference lines N log(N) and N.]
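
The block counts on the horizontal axis are powers of 8, as expected for a uniform grid in which every refinement step splits a block into 8 children. The sketch below reproduces them and tabulates the two reference scalings relative to the smallest run. It assumes a single root block and refinement levels 3 to 6, which the slide does not state explicitly; the function names are hypothetical.

```python
import math


def uniform_leaf_blocks(lrefine, root_blocks=1):
    """Leaf blocks of a fully refined grid: each level multiplies the count by 8."""
    return root_blocks * 8 ** (lrefine - 1)


def relative_cost(n, n0, law):
    """Cost at n blocks relative to n0 blocks for the scaling law 'N' or 'NlogN'."""
    if law == "N":
        return n / n0
    if law == "NlogN":
        return (n * math.log(n)) / (n0 * math.log(n0))
    raise ValueError(f"unknown scaling law: {law}")


blocks = [uniform_leaf_blocks(lref) for lref in range(3, 7)]
print(blocks)  # [64, 512, 4096, 32768]
for n in blocks:
    print(n, relative_cost(n, blocks[0], "N"), relative_cost(n, blocks[0], "NlogN"))
```

The same counting gives 4096 leaf blocks for lrefine = 5, matching the uniform-grid accuracy test of the Bonnor-Ebert sphere.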



Flash 3, BES: time(θ_lim)

[Figure: seconds per time-step (logarithmic axis, 1 to 10000) against θ_lim (0.1 to 1), for 64 CPUs, lrefine_min = lrefine_max = 5, ilist=1. Reference slopes: θ_lim^−2 and θ_lim^−3.]


GPU version

• tree walk ported to GPUs by Ingo Berentzen
• 5-10 times faster, comparable to hydro

Future

• elimination of MAXBLOCKS×nCPU size arrays
• uniform grid
• quadrupole (higher) moments



Download at:

http://galaxy.asu.cas.cz/~richard/tree-solver/

Thank you!
