Large Scale Biomolecular Dynamics Simulations - prace
XXL-BIOMD
Large Scale Biomolecular Dynamics Simulations
David van der Spoel (PI), Uppsala, Sweden
Aatto Laaksonen, Stockholm, Sweden
Peter Coveney, London, UK
Siewert-Jan Marrink, Groningen, Netherlands
Mikael Peräkylä, Kuopio, Finland
Wednesday, 13 May 2009
Molecular Dynamics
1. Start with a system of particles with given coordinates
2. Compute the force on each particle from a (classical) energy function
3. Integrate the particle positions
4. Save coordinates, energies, etc.
5. Go to 2.
Particles ≠ atoms
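The five steps above can be sketched as a minimal loop. This is a toy illustration, not GROMACS code: the "energy function" is a hypothetical 1D harmonic potential, and the integrator is velocity Verlet, one common choice (the slide does not say which integrator is meant).

```python
def forces(x, k=1.0):
    """Step 2: forces from a toy energy function U = sum(0.5 * k * x**2)."""
    return [-k * xi for xi in x]

def energy(x, v, m=1.0, k=1.0):
    """Total (kinetic + potential) energy of the toy system."""
    kin = sum(0.5 * m * vi * vi for vi in v)
    pot = sum(0.5 * k * xi * xi for xi in x)
    return kin + pot

def run_md(x, v, dt=0.01, nsteps=1000, m=1.0, save_every=100):
    """Minimal MD loop following the five steps on the slide."""
    x, v = list(x), list(v)          # step 1: initial coordinates (and velocities)
    f = forces(x)                    # step 2: initial forces
    saved = []
    for step in range(nsteps):
        # step 3: integrate positions and velocities (velocity Verlet)
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = forces(x)                # step 2 again, at the new positions
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        if step % save_every == 0:   # step 4: save coordinates and energies
            saved.append((step, list(x), energy(x, v, m)))
    return x, v, saved               # step 5 ("go to 2") is the loop itself
```

For example, `run_md([1.0], [0.0])` follows one oscillating particle for 1000 steps; the saved total energy should stay close to its initial value of 0.5, which is a quick sanity check on the integrator.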
Protein BBA5: 400 atoms
4000 waters: 12000 atoms
Total: 12400 atoms
BBA5 folding: CPU Time
• 12400 atoms
• 100 interactions per atom
• 50 flops per interaction
• 2 × 10^7 integration steps
• Total: 1.24 × 10^15 floating-point operations, i.e. 1.24 Petaflop (now: 1 day on 8 cores)
• 10,000 copies of the simulation (run in 2004 using Folding@Home with GROMACS)
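The estimate above is a simple product. The sketch below reproduces it; the per-atom interaction count and flops per interaction are the slide's rough averages, and the function name is ours. The second figure is the sustained rate per core implied by "1 day on 8 cores".

```python
def md_flop_count(n_atoms, interactions_per_atom, flops_per_interaction, n_steps):
    """Crude total floating-point operation count for an MD run."""
    return n_atoms * interactions_per_atom * flops_per_interaction * n_steps

# BBA5 in water: 400 protein atoms + 12000 water atoms
total = md_flop_count(12400, 100, 50, 2 * 10**7)
print(f"{total:.3g} flop")   # 1.24e+15 flop, i.e. 1.24 Petaflop

# Implied sustained rate if the run takes 1 day on 8 cores
per_core = total / (8 * 24 * 3600)
print(f"{per_core / 1e9:.1f} Gflop/s per core")
```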
Fraction folded structures
[Figure: fraction of folded structures vs. simulation time (0-30 ns). After a lag time the fraction rises; the folding time estimated from the slope is 4-5 μs.]
Rhee et al. Proc. Natl. Acad. Sci. U.S.A. 101 (2004) 6456
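Why a slope gives a folding time: assuming single-exponential kinetics, the folded fraction is f(t) = 1 - exp(-(t - t_lag)/τ), which for t ≪ τ is approximately (t - t_lag)/τ. A straight-line fit to the early data therefore gives τ = 1/slope and t_lag from the intercept. The sketch below is a plain least-squares fit; the function name and the synthetic, noise-free data are hypothetical and are not the data from the paper.

```python
def fit_folding_time(times, fractions):
    """Fit f = a*t + b by least squares; return (tau, t_lag) with tau = 1/a.

    Assumes single-exponential folding observed at t << tau, where
    1 - exp(-(t - t_lag)/tau) is well approximated by (t - t_lag)/tau.
    """
    n = len(times)
    mt = sum(times) / n
    mf = sum(fractions) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (f - mf) for t, f in zip(times, fractions))
    slope = sxy / sxx
    intercept = mf - slope * mt
    tau = 1.0 / slope
    t_lag = -intercept / slope   # time where the fitted line crosses f = 0
    return tau, t_lag

# Synthetic data (hypothetical): tau = 4500 ns, lag time = 5 ns
times = [10.0, 15.0, 20.0, 25.0, 30.0]          # ns
fractions = [(t - 5.0) / 4500.0 for t in times]
tau, t_lag = fit_folding_time(times, fractions)
```

With many independent runs (10,000 copies here) even a small folded fraction within 30 ns is enough to estimate a microsecond folding time this way.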
Simulation vs. Experiments
[Figure: timescale axis from 10^-15 s to 10^3 s. Processes placed along it: bond vibrations, torsions, single ion passes through a channel, fast protein folding, normal protein folding, interesting protein folding, biologically interesting stuff. Markers label "where we are", "where we need to be" and "where we want to be".]
• Simulations: extreme detail; open questions: sampling issues? parameter quality?
• Experiments: efficient averaging, less detail
GROMACS
• The world’s fastest MD code, and it’s GPL!
• Estimated 5,000-10,000 academic and industrial users
• Used in Folding@Home: 250,000 CPUs (2008)
• Part of the SPECfp and PRACE benchmark suites
• PRACE project for improvements to GROMACS (CSC, Erik Lindahl)
DPPC & cholesterol: 130k atoms
BlueGene: 6 ns/day, using 2000 CPUs
GROMACS only achieves 2 ns/day ...
...on a single node with two dual-core Opteron processors!
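The point of the comparison is throughput per processor. A back-of-the-envelope sketch, assuming the Opteron node has 4 cores (two dual-core CPUs) and ignoring differences in clock speed and hardware generation:

```python
def ns_per_day_per_core(ns_per_day, cores):
    """Simulated nanoseconds per day delivered by each core."""
    return ns_per_day / cores

bluegene = ns_per_day_per_core(6.0, 2000)   # BlueGene run on 2000 CPUs
opteron = ns_per_day_per_core(2.0, 4)       # assumed: 2 sockets x 2 cores
print(f"per-core ratio: {opteron / bluegene:.0f}x")
```

Roughly two orders of magnitude more simulated time per core is what makes the single-node GROMACS number the punchline.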
Parallel Domain Decomposition
• Partition space, instead of atoms, over nodes
• Supported in version 4 of GROMACS
• Good for load balancing
• Bad for communication bandwidth
• Each node imports coordinates from, and exports forces to, neighbors within a sphere with radius = cut-off (expensive)
Data must be imported from the whole sphere, although this can be optimized to half a sphere.
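A minimal sketch of what "importing from neighbors within the cut-off" means, assuming a cubic periodic box split into n × n × n equal domains and coordinates already wrapped into [0, box). It returns the particles a domain must import using the full import zone (no half- or eighth-shell reduction) via a plain scan over all particles; the function names are hypothetical and this is not how GROMACS implements it.

```python
import math

def domain_of(pos, box, n):
    """Index (i, j, k) of the domain owning a particle."""
    cell = box / n
    return tuple(int(c // cell) % n for c in pos)

def periodic_gap(x, lo, hi, box):
    """Shortest periodic distance from coordinate x to the interval [lo, hi)."""
    if lo <= x < hi:
        return 0.0
    return min((lo - x) % box, (x - hi) % box)

def halo_particles(positions, home, box, n, rc):
    """Indices of particles owned by other domains that lie within the
    cut-off rc of the home domain, i.e. the data the home node must import."""
    cell = box / n
    lo = [h * cell for h in home]
    hi = [(h + 1) * cell for h in home]
    result = []
    for idx, pos in enumerate(positions):
        if domain_of(pos, box, n) == tuple(home):
            continue  # owned by the home domain, nothing to import
        d = math.sqrt(sum(periodic_gap(x, l, u, box) ** 2
                          for x, l, u in zip(pos, lo, hi)))
        if d < rc:
            result.append(idx)
    return result
```

For instance, in a box of side 3 split 3 × 3 × 3 with a cut-off of 0.5, domain (0, 0, 0) must import a particle at x = 1.2 (0.2 from its face) and one at x = 2.9 (0.1 away through the periodic boundary).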
The Eighth-Sphere Method
• A smarter way to communicate
• Don’t calculate interactions on a home node, but in general on “neutral territory” (David Shaw)
• Drastically reduced communication bandwidth needs for domain decomposition
• 2D example in the figure: red/yellow cells send data to the central (purple) cell, where the interactions are calculated
• In 3D, we need to import data from 1/8 of a sphere into the central cell
• Implemented in GROMACS 4
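How much does the eighth-sphere scheme save? A simple geometric estimate, assuming cubic domains of side a and cut-off r_c: the full import zone consists of 6 face slabs, 12 edge quarter-cylinders and 8 corner eighth-spheres; the eighth-sphere scheme imports only 3 faces, 3 edges and 1 corner. The "1/8 sphere" on the slide is the limit of small domains; for large domains the saving approaches a factor of 2. The function name is ours.

```python
import math

def import_volumes(a, rc):
    """Import-zone volumes (full, half, eighth) for a cubic domain of side a.

    Full zone: 6 faces (a*a*rc), 12 edge quarter-cylinders (pi/4 * rc**2 * a)
    and 8 corner eighth-spheres (together one sphere, 4/3 * pi * rc**3).
    Eighth-sphere scheme: 3 faces, 3 edges and a single corner eighth-sphere.
    """
    full = 6 * a * a * rc + 3 * math.pi * a * rc ** 2 + 4 / 3 * math.pi * rc ** 3
    half = full / 2
    eighth = 3 * a * a * rc + 0.75 * math.pi * a * rc ** 2 + math.pi / 6 * rc ** 3
    return full, half, eighth

for a in (0.5, 1.0, 2.0, 4.0):   # domain side in units of the cut-off
    full, half, eighth = import_volumes(a, 1.0)
    print(f"a/rc = {a}: eighth/full = {eighth / full:.2f}")
```

The smaller the domains (i.e. the more nodes), the closer the saving gets to 8x, which is exactly the regime where communication limits scaling.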
Efficient PME Parallelization
• Almost all accurate simulations today use Particle-Mesh Ewald (PME) lattice summation
• Small 3D Fourier transforms scale badly: they need all-to-all communication
• Direct space and PME are mostly independent, though!
• Dedicate a subset of nodes to run a separate PME-only version of the program to improve scaling
• FFT over 5 instead of 25 nodes!
• Original implementation with help from RZG (Max Planck Society, Garching)
[Figure: node layout in X and Y with the dedicated PME nodes marked]
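A crude way to see why running the FFT on fewer nodes helps: if each transpose of the distributed 3D FFT is an all-to-all in which every participating rank sends one message to every other rank, the message count grows quadratically with the number of FFT ranks. This simplified model ignores message sizes and network topology; for small FFTs the per-message latency tends to dominate.

```python
def all_to_all_messages(p):
    """Point-to-point messages in one all-to-all exchange among p ranks."""
    return p * (p - 1)

# FFT transposes on all 25 nodes versus on 5 dedicated PME nodes
print(all_to_all_messages(25), all_to_all_messages(5))
```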
Scaling: DHFR, 23,558 atoms
[Figure: scaling of NAMD, Desmond and GROMACS for DHFR]
• 1 fs time step
• Constraints on H-bonds
• PME every other step (NAMD, Desmond), every step (GROMACS)
Hess et al. J. Chem. Theory Comput. 4 (2008) 435-447
Blue Gene/P scaling
[Figure: steps/second vs. number of cores (0-2048) for systems of 0.43M and 3.4M atoms]
• Coarse-grained system
• Cut-off 2.6 σ
Amdahl’s law:
• 0.43 Matoms on 1024 cores: 28% of the time in the global sum
• 3.4 Matoms on 2048 cores: 33% of the time in the global sum
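To see what 28% of the time in the global sum means, a sketch that treats the global sum as non-parallelizable (a simplification: in practice its cost typically grows with the core count, which makes things worse). By Amdahl's law, adding n times more cores can then give at most 1/(s + (1 - s)/n) extra speedup, never more than 1/s ≈ 3.6 for s = 0.28.

```python
def amdahl_speedup(serial_fraction, n):
    """Amdahl's law: speedup on n times more processors when a fraction
    serial_fraction of the run time cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# Treat the 28% global-sum time at 1024 cores as serial
for factor in (2, 4, 1000):
    print(factor, round(amdahl_speedup(0.28, factor), 2))
```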
DEISA Extreme Computing Initiative
• Larger systems
• Longer simulation times
Dynamics of a Virus Capsid
Daniel Larsson, Lars Liljas, David van der Spoel
Uppsala University, Sweden
Why study viruses?
• New viral diseases may be on the way: bird flu, SARS, Mexican flu
• 40 existing antiviral drugs, 20 of them against AIDS
• Most drugs target the viral reproduction cycle (reverse transcriptases, proteases)
• Interesting features: self-assembly
• Packaging of RNA/DNA
Satellite Tobacco Necrosis Virus
• Discovered in 1967
• Icosahedral plant virus
• Satellite virus; tobacco necrosis virus (TNV) is the helper
• Transmitted by a fungus
• Small: 18 nm diameter
• Simple model system
5 ns simulation. Blue: protein. Green: Cl⁻. Red: Na⁺.
Computer time estimate: 12 Petaflop
What can simulations contribute?
• Non-averaged aspects
• Non-symmetrical aspects
• Missing pieces of the structure (residues 1-11)
• Effect of structural Ca²⁺ on stability
• RNA binding / salt effects
• Dynamics: the fourth dimension
RNA Secondary Structure Prediction
Bringloe et al. J. Gen. Virol. 79 (1998) 1539
The goal is not merely to reproduce experimental results.
Unfortunately, reproducing experimental results is difficult.
Simulation Details
• 1,000,000+ particles
• OPLS/AA force field + TIP3P water
• Particle-mesh Ewald
• 5 fs time step
• Dodecahedral simulation box
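Why a dodecahedral box for a roughly spherical virus: for the same distance d between periodic images, a rhombic dodecahedron has volume (√2/2)·d³ ≈ 0.707·d³, versus d³ for a cube, so about 29% less solvent has to be simulated. This sketch assumes the box is a rhombic dodecahedron (the shape GROMACS builds for a "dodecahedron" box); the image distance is a hypothetical value.

```python
import math

def box_volume(d, shape):
    """Volume of a periodic box whose nearest periodic images are d apart."""
    if shape == "cubic":
        return d ** 3
    if shape == "rhombic dodecahedron":
        return math.sqrt(2) / 2 * d ** 3
    raise ValueError(f"unknown box shape: {shape}")

d = 30.0   # hypothetical image distance in nm
saving = 1 - box_volume(d, "rhombic dodecahedron") / box_volume(d, "cubic")
print(f"{saving:.0%} less volume (mostly water) than a cube")
```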
Hardware Specs
• Louhi (CSC, Espoo): vendor Cray; CPU AMD Opteron 2.3 GHz; 10800 cores; Cray interconnect; Cray Linux; Top500 (11-08) rank 32; price ?; time (2008) 400 kh (DECI)
• HECToR (Edinburgh): vendor Cray; CPU AMD Opteron; 12000 cores; Cray interconnect; Cray Linux; Top500 (11-08) rank 47; price ?; time (2008) 800 kh (DECI)
• Neolith (Linköping): vendor HP; CPU Intel Xeon E534 (2.33 GHz, 8 MB L2 cache); 6440 cores; Infiniband ConnectX interconnect; CentOS 5 Linux; Top500 (11-08) rank 56; price ?; time (2008) 3600 kh (VR)
Superposition of 60 monomers after 100 ns MD
Superposition of 12 pentamers after 100 ns MD
• The main part of the protein acts like a rigid body.
• Residues 1-24 form a flexible arm.
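One way to put numbers on "rigid body vs. flexible arm" from such a superposition is the per-residue root-mean-square fluctuation (RMSF) across the 60 copies: low values for the rigid core, high values for residues 1-24. A minimal sketch; the function name is hypothetical, and it assumes the copies have already been least-squares fitted onto a common reference (the fitting step is not shown).

```python
import math

def rmsf(copies):
    """Per-residue root-mean-square fluctuation over superimposed copies.

    copies[c][r] is the (x, y, z) position of residue r (e.g. its C-alpha
    atom) in copy c; all copies must already be fitted onto one reference.
    """
    n_copies = len(copies)
    n_res = len(copies[0])
    result = []
    for r in range(n_res):
        mean = [sum(c[r][k] for c in copies) / n_copies for k in range(3)]
        msd = sum(sum((c[r][k] - mean[k]) ** 2 for k in range(3))
                  for c in copies) / n_copies
        result.append(math.sqrt(msd))
    return result
```

For two copies that agree on residue 0 but differ by 2 nm in x on residue 1, the result is [0.0, 1.0].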
500 ns: ~1.2 Exaflop, 30 core-years
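The 500 ns figure follows from the 12 Petaflop estimate for the 5 ns run by linear scaling in simulated time (assuming that estimate refers to the 5 ns run). The sketch also derives two numbers the slides do not state: the sustained per-core rate implied by 30 core-years, and the number of integration steps at a 5 fs time step.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

cost_5ns = 12e15                       # flop for 5 ns (estimate from the 5 ns slide)
cost_500ns = cost_5ns * 500 / 5        # cost grows linearly with simulated time
core_seconds = 30 * SECONDS_PER_YEAR   # 30 core-years
rate = cost_500ns / core_seconds       # implied sustained flop/s per core
steps = 500e-9 / 5e-15                 # integration steps at a 5 fs time step
print(f"{cost_500ns:.1e} flop, {rate / 1e9:.2f} Gflop/s per core, {steps:.0e} steps")
```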
Size of the virus particle
Water flow analysis
• Is the virus capsid leaky?
• Can water molecules pass through it?
• Can ions pass?
• Is the flow concentrated in specific regions?
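A minimal sketch of how passages through the capsid could be counted, assuming the capsid is approximated as a spherical shell between two radii r_in and r_out around its centre (both hypothetical parameters) and that molecule positions are unwrapped relative to that centre. Frames inside the shell keep the previous state, so a molecule that only dips into the shell is not counted. Summing over all waters gives the flux; filtering by molecule type gives the same count for ions, and recording where each passage occurs would show whether the flow is concentrated in specific regions.

```python
def count_crossings(radii, r_in, r_out):
    """Count complete passages of one molecule through a spherical shell.

    radii: distance of the molecule from the capsid centre in each frame.
    Inside means r < r_in, outside means r > r_out.
    Returns (outward, inward) passage counts.
    """
    outward = inward = 0
    state = None
    for r in radii:
        if r < r_in:
            new = "in"
        elif r > r_out:
            new = "out"
        else:
            continue  # within the shell: keep the previous state
        if state == "in" and new == "out":
            outward += 1
        elif state == "out" and new == "in":
            inward += 1
        state = new
    return outward, inward
```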
Physiological Conditions
Conclusions
• The N-terminal arm is flexible
• Ionic strength is important for stability, but may not be enough: RNA!
• The capsid is very leaky
• (Interfering with capsid stability is a possible route to antiviral therapy, but this is not proven yet)
Questions & future plans...
• What structural changes facilitate the water flow?
• The RNA structure: from 2D to 3D structure
• Affinities (ΔG_bind) between the proteins: quantitative stability analysis
• Pathways for virus assembly (simplified models?)
Extreme Computing
• If you cannot solve a small problem, try a bigger problem
• What about the accuracy of the results and their predictive power?
• HPC will force many codes to reconsider the physics
• Amdahl’s law needs to be dealt with in MD
Molecular Biophysics, Uppsala
Daniel Larsson