Scalability of the Poisson solvers - TDDFT.org
Outline<br />
• From where we came?<br />
• Where we go?<br />
• Survey<br />
• Conclusions<br />
J. Alberdi-Rodriguez (UPV/EHU) Poisson solver in Octopus October 23 2 / 33
From where we came?<br />
Introduction<br />
This presentation is based on a paper that we are about to publish, “A<br />
survey of the parallel performance and accuracy of Poisson solvers for<br />
electronic structure calculations”, by the following authors:<br />
• Pablo García Risueño<br />
• Joseba Alberdi-Rodriguez<br />
• Micael J. T. Oliveira<br />
• Xavier Andrade<br />
• Michael Pippig<br />
• Javier Muguerza<br />
• Agustin Arruabarrena<br />
• Angel Rubio<br />
What do we want?<br />
• Simulate chlorophyll molecules efficiently<br />
[Figure: four chlorophyll systems of 180, 441, 650 and 1365 atoms.]<br />
What was the problem?<br />
• Time-dependent (TD) runs did not scale ideally [?]<br />
[Figure: time (s) versus number of CPU cores, logarithmic axes; curve for TD with ISF.]<br />
The Poisson solver did not scale<br />
• Blue Gene/P<br />
[Figure: time (s) versus number of CPU cores, logarithmic axes; curves for TD with ISF and for the Poisson solver.]<br />
Or: the Poisson solver was not optimised<br />
[Figure: speed-up versus number of MPI processes, annotated with the share of the TD time spent in the Poisson solver.]<br />
MPI processes                512    1024   2048   4096   8192<br />
Poisson solver % of the TD   3%     4%     14%    27%    48%<br />
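The growth of the Poisson share can be compared with a simple model. The sketch below assumes (this is not stated in the slides) that the Poisson solver takes a constant time while the rest of the TD step scales ideally, starting from the 3% share at 512 processes. The function name `poisson_share` is hypothetical, and the pairing of shares with process counts is read off the slide.

```python
def poisson_share(base_share, base_procs, procs):
    """Share of the run time spent in a part that does not scale.

    Assumption (not from the slides): the Poisson solver time stays constant
    while the rest of the time-dependent step scales ideally with `procs`.
    """
    poisson_time = base_share                              # constant part
    rest_time = (1.0 - base_share) * base_procs / procs    # ideally scaling part
    return poisson_time / (poisson_time + rest_time)


# Shares as read from the slide, keyed by number of MPI processes.
measured = {512: 0.03, 1024: 0.04, 2048: 0.14, 4096: 0.27, 8192: 0.48}
for procs, share in measured.items():
    model = poisson_share(0.03, 512, procs)
    print(f"{procs:5d} processes: measured {share:.0%}, constant-time model {model:.0%}")
```

Under this assumption the share would reach about a third at 8192 processes; the measured 48% is higher still.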
Where we go?<br />
Possible solution<br />
Change to a new scalable Poisson solver:<br />
• Fast multipole method<br />
• Parallel fast Fourier transform<br />
So we have implemented two new Poisson solvers, based on parallel<br />
FFT [?] and fast multipole method (FMM) [?] libraries.<br />
Parallel fast Fourier transform<br />
Based on serial FFT<br />
• Works with parallelepiped grid shapes only<br />
• Padding from the octopus grid is needed<br />
• Data movement is done with MPI_Gather and MPI_Scatter<br />
• Never better than constant time<br />
PFFT implementation<br />
• Also requires a parallelepiped grid shape<br />
• But uses a new grid partitioning (next slide)<br />
• Implements highly parallel data movement, using MPI_Alltoall<br />
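As an illustration of why Fourier-based solvers want a regular box, here is a minimal sketch of a Poisson solve in Fourier space. It is not the octopus or PFFT implementation: it is 1D, assumes periodic boundary conditions, uses the convention d²v/dx² = -4πρ, and replaces the FFT library by a naive DFT to stay dependency-free. All function names are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, value in enumerate(values):
            total += value * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(total / n if inverse else total)
    return out


def solve_poisson_periodic_1d(density, length):
    """Solve d2v/dx2 = -4*pi*rho on a periodic, equispaced 1D grid."""
    n = len(density)
    rho_k = dft(density)
    v_k = [0j] * n  # the k = 0 mode is set to zero (neutralising background assumed)
    for m in range(1, n):
        # Map the array index to a signed frequency, then to a wavenumber.
        freq = m if m <= n // 2 else m - n
        k = 2.0 * math.pi * freq / length
        v_k[m] = 4.0 * math.pi * rho_k[m] / (k * k)
    return [value.real for value in dft(v_k, inverse=True)]
```

For a density cos(x) on a box of length 2π this returns 4π cos(x), since the only non-zero modes have k² = 1.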
PFFT grid partitioning<br />
Simplified domain decomposition of the simulation meshes:<br />
• A) Octopus mesh with a 3D domain decomposition<br />
• B) PFFT mesh with a 2D decomposition<br />
[Figure: panels A) Octopus mesh and B) PFFT mesh, drawn on X, Y and Z axes.]<br />
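A 2D decomposition of a 3D box can be sketched as follows. This is only an illustration of the idea: the choice of keeping the X axis whole and splitting Y and Z, the rank ordering, and the function names are assumptions, not the layout actually used by PFFT or octopus.

```python
def block_range(n_points, n_blocks, block):
    """Half-open index range [start, stop) of one block when n_points are
    split as evenly as possible into n_blocks contiguous blocks."""
    base, extra = divmod(n_points, n_blocks)
    start = block * base + min(block, extra)
    stop = start + base + (1 if block < extra else 0)
    return start, stop


def pencil_decomposition(shape, proc_grid):
    """Map each rank of a 2D process grid to its part of a 3D mesh.

    Every process keeps the full X axis; Y and Z are split (illustrative choice).
    """
    nx, ny, nz = shape
    py, pz = proc_grid
    layout = {}
    for iy in range(py):
        for iz in range(pz):
            rank = iy * pz + iz
            layout[rank] = ((0, nx), block_range(ny, py, iy), block_range(nz, pz, iz))
    return layout
```

With this layout each process holds complete lines of points along one axis, so a 1D transform along that axis needs no communication.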
Serial FFT solver<br />
[Diagram: the X MPI processes of the previous execution gather their data on the root process (Mesh_to_cube), the root performs the FFT, and the result is scattered back to the X MPI processes (Cube_to_mesh).]<br />
Parallel FFT solver<br />
[Diagram: data flow between the X MPI processes of the previous execution and PFFT. Labels: Allgather, root, Mesh_to_cube, FFT, scatter, Cube_to_mesh. Annotated steps: 1) do a gather of pfft%data_in; 2) RS = pfft%data_in.]<br />
Parallel FFT solver execution<br />
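The difference between moving the data through a root process and exchanging it with an all-to-all can be made concrete with a rough count. The model below is an assumption for illustration only: every mesh point is moved once in each direction, and latency, message sizes and network topology are ignored. The function name is hypothetical.

```python
def points_through_busiest_process(n_points, n_procs, scheme):
    """Approximate number of mesh points the busiest process sends or receives.

    Simplified model (an assumption): each point is moved once towards the
    FFT layout and once back; latency and topology are ignored.
    """
    if scheme == "gather_scatter":
        # The root process receives the whole mesh and later sends it back.
        return 2 * n_points
    if scheme == "alltoall":
        # Each process only exchanges its own share of the mesh.
        per_process = -(-n_points // n_procs)  # ceiling division
        return 2 * per_process
    raise ValueError(f"unknown scheme: {scheme}")
```

In this model the gather/scatter load on the root does not shrink as processes are added, while the all-to-all load per process falls roughly as 1/P.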
Fast multipole method<br />
• Uses an external library, developed at JSC<br />
• Uses the adaptive grid shape of octopus<br />
• Implements a correction term<br />
• O(N) complexity<br />
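For comparison with the O(N) complexity quoted above, the reference that a fast multipole method approximates is the direct pairwise sum, which costs O(N²). The sketch below is that direct sum for point charges, in units where the potential of a charge q at distance r is q/r. It is not the FMM and not the library's API; the function name is hypothetical.

```python
import math


def direct_potential(positions, charges):
    """Potential at each point from all other point charges.

    v_i = sum over j != i of q_j / |r_i - r_j|, an O(N^2) double loop.
    """
    potentials = []
    for i, r_i in enumerate(positions):
        total = 0.0
        for j, r_j in enumerate(positions):
            if i != j:
                total += charges[j] / math.dist(r_i, r_j)
        potentials.append(total)
    return potentials
```

Doubling the number of points roughly quadruples the work of this loop, which is the cost an O(N) method avoids.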
Fast multipole method (II)<br />
Scheme of the inclusion of semi-neighbours of point P:<br />
• A) Scheme without semi-neighbours<br />
• B) Scheme considering semi-neighbours of point P<br />
[Figure: panels A), B) and C). Labels: point P, P neighbours, P semi-neighbours, L/2, error in the integral for V, charges are equispaced.]<br />
Fast multipole method (III)<br />
2D example of the position of cells containing semi-neighbours. $\vec{r}_0$-centred cell itself.<br />
[Figure: panels A) and B).]<br />
• Original box: every cell has size L² (2D)<br />
• New: semi-neighbour cells have size L²/4, side cells have size 7/8 L², and the central cell has size L²/2 (all in 2D)<br />
Survey<br />
We have measured the accuracy and the performance of the 6 Poisson<br />
solvers available in octopus. The compared solvers are:<br />
• PFFT<br />
• ISF<br />
• FMM<br />
• CG corrected<br />
• Multigrid<br />
Accuracy<br />
We measured the accuracy of a Poisson solver with two quantities, E_pot and E_energy, defined as follows:<br />
\[
E_{pot} := \frac{\sum_{ijk} |v_a(\vec{r}_{ijk}) - v_n(\vec{r}_{ijk})|}{\sum_{ijk} |v_a(\vec{r}_{ijk})|} ,
\]
\[
E_a = \frac{1}{2} \sum_{ijk} \rho(\vec{r}_{ijk}) \, v_a(\vec{r}_{ijk}) , \qquad
E_n = \frac{1}{2} \sum_{ijk} \rho(\vec{r}_{ijk}) \, v_n(\vec{r}_{ijk}) ,
\]
\[
E_{energy} := \frac{E_a - E_n}{E_a} ,
\]
where:<br />
• $\vec{r}_{ijk}$: all the points of the analysed grid<br />
• $\rho$: charge density<br />
• $v_a$: analytically calculated potential<br />
• $v_n$: numerically calculated potential<br />
• $E_a$: analytical Hartree energy<br />
• $E_n$: numerical Hartree energy<br />
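The two error measures can be written down directly: E_pot is the summed absolute difference between the analytical and numerical potentials divided by the summed absolute analytical potential, and E_energy is the relative difference of the two Hartree energies. The sketch below follows the definitions as they appear on the slide (no volume element); the function names and the flat-list representation of the grid are assumptions.

```python
def potential_error(v_analytic, v_numeric):
    """E_pot: sum |v_a - v_n| over grid points, divided by sum |v_a|."""
    numerator = sum(abs(a - n) for a, n in zip(v_analytic, v_numeric))
    denominator = sum(abs(a) for a in v_analytic)
    return numerator / denominator


def hartree_energy(density, potential):
    """E = 1/2 * sum rho * v over grid points, as written on the slide."""
    return 0.5 * sum(rho * v for rho, v in zip(density, potential))


def energy_error(density, v_analytic, v_numeric):
    """E_energy: (E_a - E_n) / E_a."""
    e_a = hartree_energy(density, v_analytic)
    e_n = hartree_energy(density, v_numeric)
    return (e_a - e_n) / e_a
```

Note that E_pot adds up absolute differences point by point, whereas in E_energy errors of opposite sign can cancel.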
Accuracy (Epot error)<br />
Epot errors of different Poisson solvers in the calculation of the Hartree<br />
potential created by a Gaussian charge distribution represented on a<br />
grid of edge Le = 15.8 Å and spacing 0.2 Å.<br />
Le (Å)   PFFT       ISF        FMM        CG         Multigrid<br />
7        9·10^-3    4·10^-3    4·10^-3    4·10^-3    4·10^-3<br />
10       1·10^-4    2·10^-5    8·10^-5    3·10^-5    2·10^-5<br />
15.8     6·10^-5    2·10^-7    1·10^-4    5·10^-5    6·10^-6<br />
22.1     1·10^-5    4·10^-10   3·10^-4    3·10^-4    2·10^-6<br />
25.9     4·10^-7    7·10^-10   3·10^-4    5·10^-3    4·10^-7<br />
31.7     4·10^-8    1·10^-9    2·10^-4    1·10^-2    4·10^-7<br />
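A Gaussian charge distribution is a convenient test case because its potential is known in closed form. The sketch below gives the standard expressions for a normalised, spherically symmetric Gaussian of total charge Q and width sigma, in units where a point charge has potential Q/r. The parameter values and function names are assumptions; the slides do not state the width or normalisation that was used.

```python
import math


def gaussian_density(r, charge=1.0, sigma=1.0):
    """rho(r) = Q / ((2*pi)^(3/2) * sigma^3) * exp(-r^2 / (2*sigma^2))."""
    norm = charge / ((2.0 * math.pi) ** 1.5 * sigma ** 3)
    return norm * math.exp(-r * r / (2.0 * sigma * sigma))


def gaussian_potential(r, charge=1.0, sigma=1.0):
    """v(r) = Q * erf(r / (sqrt(2)*sigma)) / r, with its finite limit at r = 0."""
    if r == 0.0:
        return charge * math.sqrt(2.0 / math.pi) / sigma
    return charge * math.erf(r / (math.sqrt(2.0) * sigma)) / r
```

Far from the centre the potential tends to Q/r, so a box that is too small cuts off a non-negligible part of the charge.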
Accuracy (Eenergy error)<br />
Eenergy errors of different Poisson solvers in the calculation of the<br />
Hartree potential created by a Gaussian charge distribution represented<br />
on a grid of edge Le = 15.8 Å and spacing 0.2 Å.<br />
Le (Å)   PFFT       ISF        FMM        CG         Multigrid<br />
7        2·10^-3    2·10^-3    2·10^-3    2·10^-3    2·10^-3<br />
10       9·10^-6    9·10^-6    4·10^-6    4·10^-6    9·10^-6<br />
15.8     2·10^-12   3·10^-7    3·10^-6    3·10^-5    5·10^-6<br />
22.1
Performance<br />
• We have used 2 different architectures:<br />
  • x86-64: Curie and Corvo<br />
  • Blue Gene/P: Jugene and Genius<br />
• We have run the Hartree test and measured the time taken by the Poisson solver<br />
Poisson solver, Blue Gene/P<br />
Execution times for the calculation of the Hartree potential created by a<br />
Gaussian charge distribution on a Blue Gene/P machine, as a function of<br />
the number of MPI processes, for six different Poisson solvers.<br />
[Figure: t (s) versus number of MPI processes, logarithmic axes; curves for PFFT, Serial FFT, ISF, CG corrected, Multigrid and FMM.]<br />
Poisson solver, machine comparison<br />
Execution times of the PFFT solver on Genius (Blue Gene/P), Curie and<br />
Corvo (x86-64) for a system size of Le = 15.8 Å as a function of the<br />
number of MPI processes.<br />
[Figure: t (s) versus number of MPI processes, logarithmic axes; curves for Corvo (x86-64), Curie (x86-64) and Genius (Blue Gene/P).]<br />
TD execution-time<br />
[Figure: time (s) versus number of CPU cores, logarithmic axes; curves for 180, 650 and 1365 atoms.]<br />
Speed-up of the TD<br />
[Figure: speed-up versus number of CPU cores, logarithmic axes; curves for the ideal case and for 180, 650 and 1365 atoms.]<br />
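Speed-up curves of this kind are derived from execution times. A minimal sketch, assuming the common convention that the run on the smallest core count is taken as the reference and assigned a speed-up equal to its own core count, so that the ideal curve is speed-up = cores. The convention, the function names and the timings are assumptions, not data from the slides.

```python
def speed_up(times_by_cores):
    """Speed-up relative to the smallest run, scaled so that ideal = cores."""
    ref_cores = min(times_by_cores)
    ref_time = times_by_cores[ref_cores]
    return {cores: ref_cores * ref_time / t for cores, t in times_by_cores.items()}


def efficiency(times_by_cores):
    """Parallel efficiency: achieved speed-up divided by the ideal one."""
    return {cores: s / cores for cores, s in speed_up(times_by_cores).items()}


# Hypothetical timings in seconds, not measurements from the slides.
timings = {256: 40.0, 512: 21.0, 1024: 12.0}
for cores, eff in efficiency(timings).items():
    print(f"{cores:5d} cores: efficiency {eff:.2f}")
```

An efficiency that stays close to 1 as cores are added corresponds to a curve that follows the ideal line.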
Conclusions<br />
• FFT-based methods are the most accurate<br />
• However, all the solvers are accurate enough<br />
• Time-dependent execution has taken a step forward<br />
• The PFFT solver has overcome the problem of the previous<br />
implementations<br />
• The FMM solver is also highly parallel, but has a large prefactor<br />
Acknowledgements<br />
• PRACE Research Infrastructure<br />
• Rechenzentrum Garching (RZG) of the Max Planck Society<br />
• European Research Council Advanced Grant DYNamo (ERC-2010-AdG-Proposal No. 267374)<br />
• Spanish Grants (FIS2011-65702-C02-01 and PIB2010US-00652)<br />
• ACI-Promociona (ACI2009-1036)<br />
• General funding for research groups UPV/EHU (ALDAPA, GIU10/02)<br />
• Grupos Consolidados UPV/EHU del Gobierno Vasco (IT-319-07)<br />
• European Commission project CRONOS (280879-2 CRONOS CP-FP7)<br />
• Scholarship of the University of the Basque Country UPV/EHU<br />
Bibliography<br />
Joseba Alberdi-Rodriguez. Analysis of performance and scaling of the scientific code Octopus. LAP LAMBERT Academic Publishing, 2010.<br />
I. Kabadshow and H. Dachsel. The Error-Controlled Fast Multipole Method for Open and Periodic Boundary Conditions. In Fast Methods for Long-Range Interactions in Complex Systems, IAS Series, Volume 6, Forschungszentrum Jülich, Germany, 2010. CECAM.<br />
Michael Pippig. An Efficient and Flexible Parallel FFT Implementation Based on FFTW. In Christian Bischof, Heinz-Gerd Hegering, Wolfgang E. Nagel and Gabriel Wittum, editors, Competence in High Performance Computing, pages 125–134. Springer, 2010.<br />