Scalability of the Poisson solvers

J. Alberdi-Rodriguez (UPV/EHU), Poisson solver in Octopus, October 23

Outline

• From where we came?
• Where we go?
• Survey
• Conclusions


From where we came?


Introduction

This presentation is based on a paper that we are going to publish, "A survey of the parallel performance and accuracy of Poisson solvers for electronic structure calculations", by the following authors:

• Pablo García Risueño
• Joseba Alberdi-Rodriguez
• Micael J. T. Oliveira
• Xavier Andrade
• Michael Pippig
• Javier Muguerza
• Agustin Arruabarrena
• Angel Rubio


What do we want?

• Simulate the chlorophyll molecule efficiently

[Figure: chlorophyll systems of 180, 441, 650, and 1365 atoms]


What was the problem?

• Time-dependent runs did not scale ideally [?]

[Figure: TD time (s) with the ISF solver vs. number of CPU cores]


• Blue Gene/P: the Poisson solver did not scale,

[Figure: TD time with ISF and Poisson-solver time (s) vs. number of CPU cores on Blue Gene/P]


• ...or the Poisson solver was not optimised

[Figure: Poisson solver as a percentage of the TD time vs. MPI processes: 3% (512), 4% (1024), 14% (2048), 27% (4096), 48% (8192)]


Where we go?




Possible solution

Change to a new scalable Poisson solver:

• Fast multipole method
• Parallel fast Fourier transform

So, we have implemented two new Poisson solvers, based on parallel FFT [?] and fast multipole method (FMM) [?] libraries.


Parallel fast Fourier transform

Based on serial FFT:
• Works with parallelepiped grid shapes only
• Padding of the Octopus grid is needed
• Done with MPI_Gather and MPI_Scatter
• Never better than constant time

PFFT implementation:
• Also a parallelepiped grid shape
• But with a new grid partitioning (next slide)
• Highly parallel data movement, implemented with MPI_Alltoall
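As a point of reference for what any FFT-based Poisson solver does, here is a minimal serial sketch in plain NumPy (periodic boundary conditions; the function name and arguments are illustrative). It is not the Octopus implementation, which adds free-space corrections and, in the PFFT case, performs the transform in a distributed way.

```python
import numpy as np

def hartree_potential_fft(rho, spacing):
    """Solve nabla^2 v = -4*pi*rho on a periodic parallelepiped grid in
    reciprocal space: v(G) = 4*pi*rho(G) / G^2, with the divergent G = 0
    term set to zero. Free-space corrections are deliberately omitted."""
    nx, ny, nz = rho.shape
    rho_g = np.fft.fftn(rho)
    gx = 2 * np.pi * np.fft.fftfreq(nx, d=spacing)
    gy = 2 * np.pi * np.fft.fftfreq(ny, d=spacing)
    gz = 2 * np.pi * np.fft.fftfreq(nz, d=spacing)
    g2 = gx[:, None, None]**2 + gy[None, :, None]**2 + gz[None, None, :]**2
    g2[0, 0, 0] = 1.0                      # avoid dividing by zero at G = 0
    v_g = 4 * np.pi * rho_g / g2
    v_g[0, 0, 0] = 0.0                     # drop the average (G = 0) component
    return np.real(np.fft.ifftn(v_g))
```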


PFFT grid partitioning

Simplified domain decomposition of the simulation meshes:

• A) Octopus mesh with a 3D domain decomposition
• B) PFFT mesh with a 2D decomposition

[Figure: A) Octopus mesh, B) PFFT mesh, with X, Y, and Z axes shown]
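As a rough picture of the 2D decomposition in panel B), the toy sketch below (purely illustrative, not the PFFT API) assigns x-y "pencils" of an n×n×n cube to a p1×p2 process grid; each process then typically keeps the full z extent, so 1D FFTs along z need no communication.

```python
def pencil_extents(n, p1, p2, rank):
    """Index ranges in x and y owned by `rank` in a p1 x p2 pencil
    decomposition of an n x n x n grid (toy sketch, even splitting)."""
    row, col = divmod(rank, p2)            # position of `rank` in the process grid
    x_lo, x_hi = row * n // p1, (row + 1) * n // p1
    y_lo, y_hi = col * n // p2, (col + 1) * n // p2
    return (x_lo, x_hi), (y_lo, y_hi)
```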


Serial FFT solver

[Figure: previous execution: the mesh data of the X MPI processes is gathered on the root (mesh_to_cube), the FFT is applied there, and the result is scattered back (cube_to_mesh)]
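The data flow above can be sketched with mpi4py as follows (illustrative names; equal point counts per process assumed, whereas the real code handles uneven domains). The root process is the serial bottleneck, which is why this scheme is never better than constant time.

```python
from mpi4py import MPI
import numpy as np

def serial_fft_flow(local_rho, solve_on_cube):
    """Gather the distributed density on the root (mesh_to_cube), solve there
    with a serial FFT, and scatter the potential back (cube_to_mesh)."""
    comm = MPI.COMM_WORLD
    n_local = local_rho.size
    cube_rho = np.empty(comm.size * n_local) if comm.rank == 0 else None
    comm.Gather(local_rho, cube_rho, root=0)      # mesh -> cube on the root
    cube_v = solve_on_cube(cube_rho) if comm.rank == 0 else None
    local_v = np.empty(n_local)
    comm.Scatter(cube_v, local_v, root=0)         # cube -> mesh on all processes
    return local_v
```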


Parallel FFT solver

[Figure: comparison of the previous execution and the PFFT execution across the X MPI processes (allgather/scatter, root, mesh_to_cube, cube_to_mesh, FFT); annotations: 1) do gather of pfft%data_in, 2) RS = pfft%data_in]
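In contrast, the PFFT path keeps the cube distributed and only redistributes data collectively (MPI_Alltoall, as noted on the PFFT slide). Below is a minimal mpi4py sketch of such a redistribution, assuming equal block sizes per process; it is a toy illustration, not the actual Octopus mesh_to_cube/cube_to_mesh routines.

```python
from mpi4py import MPI
import numpy as np

def redistribute_alltoall(local_block):
    """Every process sends one block to every other process; no single
    process ever has to hold the whole cube, unlike the gather/scatter
    scheme of the serial FFT solver."""
    comm = MPI.COMM_WORLD
    send = np.ascontiguousarray(local_block, dtype=np.float64)
    send = send.reshape(comm.size, -1)            # one equally sized block per peer
    recv = np.empty_like(send)
    comm.Alltoall(send, recv)
    return recv
```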


Parallel FFT solver execution


Fast multipole method

• Uses an external library, developed at JSC
• Uses the adaptive grid shape of Octopus
• Implements a correction term
• O(N) complexity
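To see why a multipole approach can reach O(N), here is a toy illustration of the core idea only (it is not the error-controlled JSC library used here): distant charges are replaced by an aggregate description, the leading term being their total charge placed at the group centre.

```python
import numpy as np

def potential_direct(q, pos, r_eval):
    """Exact pairwise sum at one point: O(N) per point, O(N^2) for all points."""
    return np.sum(q / np.linalg.norm(pos - r_eval, axis=1))

def potential_monopole(q, pos, r_eval):
    """Leading (monopole) term of a multipole expansion for a distant group:
    the whole group acts as its total charge located at the group centre."""
    center = pos.mean(axis=0)
    return q.sum() / np.linalg.norm(r_eval - center)
```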


Fast multipole method (II)

Scheme of the inclusion of the semi-neighbours of point P:

• A) Scheme without semi-neighbours
• B) Considering the semi-neighbours of point P

[Figure: panels A), B), C) showing point P, its neighbours and semi-neighbours (spaced L/2 apart), and the error in the integral for V; the charges are equispaced]


Fast multipole method (III)

2D example of the position of the cells containing semi-neighbours and of the r0-centred cell itself.

[Figure: A) original box, where every cell has size L² (2D); B) new box, where the semi-neighbour cells have size L²/4, the side cells 7/8·L², and the central cell L²/2 (all in 2D)]


Survey


Survey

We have measured the accuracy and the performance of the six Poisson solvers available in Octopus. The compared solvers are:

• PFFT
• Serial FFT
• ISF
• FMM
• CG corrected
• Multigrid


Accuracy

We measure the accuracy of a Poisson solver with two quantities, E_pot and E_energy, defined as follows:

E_{pot} := \frac{\sum_{ijk} |v_a(\vec{r}_{ijk}) - v_n(\vec{r}_{ijk})|}{\sum_{ijk} |v_a(\vec{r}_{ijk})|} ,
\qquad
E_{energy} := \frac{E_a - E_n}{E_a} ,

with

E_a = \frac{1}{2} \sum_{ijk} \rho(\vec{r}_{ijk}) \, v_a(\vec{r}_{ijk}) ,
\qquad
E_n = \frac{1}{2} \sum_{ijk} \rho(\vec{r}_{ijk}) \, v_n(\vec{r}_{ijk}) ,

where:

• \vec{r}_{ijk}: all the points of the analysed grid
• v_a: analytically calculated potential
• v_n: numerically calculated potential
• E_a: analytical Hartree energy
• E_n: numerical Hartree energy
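In array form, the two error measures can be evaluated as in the following sketch (NumPy, with v_a, v_n, and rho given on the same grid; the names are illustrative):

```python
import numpy as np

def accuracy_metrics(v_a, v_n, rho):
    """E_pot and E_energy as defined above, with the sums running over all
    grid points r_ijk (here simply over the whole arrays)."""
    e_pot = np.sum(np.abs(v_a - v_n)) / np.sum(np.abs(v_a))
    e_a = 0.5 * np.sum(rho * v_a)      # analytical Hartree energy E_a
    e_n = 0.5 * np.sum(rho * v_n)      # numerical Hartree energy E_n
    e_energy = (e_a - e_n) / e_a
    return e_pot, e_energy
```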


Accuracy (E_pot error)

E_pot errors of different Poisson solvers in the calculation of the Hartree potential created by a Gaussian charge distribution, represented on a grid of spacing 0.2 Å, for different edge lengths Le.

Le (Å)   PFFT      ISF       FMM       CG        Multigrid
7        9·10^-3   4·10^-3   4·10^-3   4·10^-3   4·10^-3
10       1·10^-4   2·10^-5   8·10^-5   3·10^-5   2·10^-5
15.8     6·10^-5   2·10^-7   1·10^-4   5·10^-5   6·10^-6
22.1     1·10^-5   4·10^-10  3·10^-4   3·10^-4   2·10^-6
25.9     4·10^-7   7·10^-10  3·10^-4   5·10^-3   4·10^-7
31.7     4·10^-8   1·10^-9   2·10^-4   1·10^-2   4·10^-7


Accuracy (E_energy error)

E_energy errors of different Poisson solvers in the calculation of the Hartree potential created by a Gaussian charge distribution, represented on a grid of spacing 0.2 Å, for different edge lengths Le.

Le (Å)   PFFT      ISF       FMM       CG        Multigrid
7        2·10^-3   2·10^-3   2·10^-3   2·10^-3   2·10^-3
10       9·10^-6   9·10^-6   4·10^-6   4·10^-6   9·10^-6
15.8     2·10^-12  3·10^-7   3·10^-6   3·10^-5   5·10^-6
22.1


Performance

• We have used two different architectures:
  • x86-64: Curie and Corvo
  • Blue Gene/P: Jugene and Genius
• We have run the Hartree test and measured the time of the Poisson solver


Poisson solver, Blue Gene/P

Execution times for the calculation of the Hartree potential created by a Gaussian charge distribution on a Blue Gene/P machine, for the six different Poisson solvers, as a function of the number of MPI processes.

[Figure: time (s) vs. MPI processes (1 to 4096) for PFFT, Serial FFT, ISF, CG corrected, Multigrid, and FMM]


Poisson solver, machine comparison

Execution times of the PFFT solver on Genius (Blue Gene/P), Curie, and Corvo (x86-64), for a system size of Le = 15.8 Å, as a function of the number of MPI processes.

[Figure: time (s) vs. MPI processes (1 to 1000) for Corvo (x86-64), Curie (x86-64), and Genius (Blue Gene/P)]


TD execution time

[Figure: TD execution time (s) vs. number of CPU cores for 180, 650, and 1365 atoms]


Speed-up of the TD

[Figure: speed-up vs. number of CPU cores (10 to 100,000), ideal line and systems of 180, 650, and 1365 atoms]


Conclusions


Conclusions

• FFT-based methods are the most accurate ones
• Although all of them are accurate enough
• Time-dependent execution has taken a step forward
• The PFFT solver has overcome the problem of the previous implementations
• The FMM solver is also highly parallel, but with a big prefactor


Acknowledgements

• PRACE Research Infrastructure
• Rechenzentrum Garching (RZG) of the Max Planck Society
• European Research Council Advanced Grant DYNamo (ERC-2010-AdG-Proposal No. 267374)
• Spanish Grants (FIS2011-65702-C02-01 and PIB2010US-00652)
• ACI-Promociona (ACI2009-1036)
• General funding for research groups UPV/EHU (ALDAPA, GIU10/02)
• Grupos Consolidados UPV/EHU del Gobierno Vasco (IT-319-07)
• European Commission project CRONOS (280879-2 CRONOS CP-FP7)
• Scholarship of the University of the Basque Country UPV/EHU


Bibliography

• Joseba Alberdi-Rodriguez. Analysis of performance and scaling of the scientific code Octopus. LAP LAMBERT Academic Publishing, 2010.

• I. Kabadshow and H. Dachsel. The Error-Controlled Fast Multipole Method for Open and Periodic Boundary Conditions. In Fast Methods for Long-Range Interactions in Complex Systems, IAS Series, Volume 6, Forschungszentrum Jülich, Germany, 2010. CECAM.

• Michael Pippig. An Efficient and Flexible Parallel FFT Implementation Based on FFTW. In Christian Bischof, Heinz-Gerd Hegering, Wolfgang E. Nagel, and Gabriel Wittum, editors, Competence in High Performance Computing, pages 125–134. Springer, 2010.
