30.04.2015

Large Scale Biomolecular Dynamics Simulations - prace


XXL-BIOMD

Large Scale Biomolecular Dynamics Simulations

David van der Spoel (PI), Uppsala, Sweden
Aatto Laaksonen, Stockholm, Sweden
Peter Coveney, London, UK
Siewert-Jan Marrink, Groningen, Netherlands
Mikael Peräkylä, Kuopio, Finland

Wednesday, 13 May 2009


Molecular Dynamics

1. Start with a system of particles with given coordinates
2. Compute the force on each particle from a (classical) energy function
3. Integrate the equations of motion to update the particle positions
4. Save coordinates, energies, etc.
5. Go to step 2

Particles ≠ atoms: a particle can also stand for a group of atoms, as in coarse-grained models.
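The five steps above can be sketched as a toy MD loop. This is a minimal sketch, not GROMACS code. It assumes a Lennard-Jones pair potential in reduced units, no cutoff, no periodic boundaries, and a leap-frog integrator (the scheme behind GROMACS's default `md` integrator). The function names `compute_forces` and `run_md` are hypothetical.

```python
# Toy MD loop: Lennard-Jones particles in reduced units, no cutoff, no periodic box.
def compute_forces(x, epsilon=1.0, sigma=1.0):
    """Step 2: pairwise Lennard-Jones forces and total potential energy."""
    n = len(x)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    e_pot = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [x[i][k] - x[j][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            sr6 = (sigma * sigma / r2) ** 3
            e_pot += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # (-dU/dr) / r: positive means repulsion, negative means attraction
            f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                f[i][k] += f_over_r * d[k]   # force on i
                f[j][k] -= f_over_r * d[k]   # equal and opposite force on j
    return f, e_pot


def run_md(x0, v0, mass=1.0, dt=0.001, n_steps=1000, save_every=100):
    """Steps 1-5 with a leap-frog integrator; v0 is taken as v(t - dt/2)."""
    x = [list(p) for p in x0]            # step 1: initial coordinates
    v = [list(u) for u in v0]
    saved = []
    for step in range(n_steps):          # step 5: loop back to step 2
        f, e_pot = compute_forces(x)     # step 2: forces from the energy function
        if step % save_every == 0:       # step 4: save state at time t, before advancing
            saved.append(([p[:] for p in x], e_pot))
        for i in range(len(x)):          # step 3: integrate
            for k in range(3):
                v[i][k] += f[i][k] / mass * dt   # v(t + dt/2)
                x[i][k] += v[i][k] * dt          # x(t + dt)
    return x, v, saved


# Two particles starting at rest, 1.5 sigma apart (attractive part of the potential)
x, v, saved = run_md([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]],
                     [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                     n_steps=200, save_every=50)
```

Production codes add neighbor lists, cutoffs, periodic boundaries, long-range electrostatics and constraints to this skeleton. The pair loop is the part that dominates the cost.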



Protein BBA5: 400 atoms
4000 waters: 12000 atoms
Total: 12400 atoms



BBA5 folding: CPU Time

• 12400 atoms
• 100 interactions per atom
• 50 flops per interaction
• 2 × 10^7 integration steps
• 1.24 Petaflop, i.e. 1.24 × 10^15 floating-point operations (now: 1 day on 8 cores)
• 10,000 copies of the simulation (in 2004, using Folding@Home with GROMACS)
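The estimate above is a product of four numbers. The short calculation below reproduces it. It also derives the sustained rate implied by "1 day on 8 cores" and the total for the 10,000 copies. Those last two figures are derived here and do not appear on the slide.

```python
# Back-of-the-envelope cost of the BBA5 folding run, using the numbers on the slide.
atoms = 12_400
interactions_per_atom = 100
flops_per_interaction = 50
steps = 2 * 10**7

total_flop = atoms * interactions_per_atom * flops_per_interaction * steps
print(f"one trajectory: {total_flop:.3e} flop")  # 1.24 petaflop

# Sustained rate implied by "1 day on 8 cores"
seconds_per_day = 24 * 3600
rate_per_core = total_flop / (seconds_per_day * 8)
print(f"implied rate: {rate_per_core / 1e9:.2f} GFLOP/s per core")

# Total work for the 10,000 Folding@Home copies
ensemble_flop = 10_000 * total_flop
print(f"ensemble: {ensemble_flop:.3e} flop")
```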



[Figure: Fold fraction over time. Fraction of folded structures versus simulation time (0 to 30 ns); after a lag time, the slope gives a folding time of 4-5 μs.]

Rhee et al. Proc. Natl. Acad. Sci. U.S.A. 101 (2004) p. 6456
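Reading a folding time off a slope relies on a short-time approximation. Assume single-exponential (two-state) kinetics, f(t) = 1 - exp(-(t - t_lag)/τ). For t - t_lag ≪ τ this gives f(t) ≈ (t - t_lag)/τ, so τ ≈ 1/slope. The sketch below applies a least-squares fit to the points after the lag time. It checks itself on synthetic data generated from the model with τ = 4.5 μs; this is not the data of Rhee et al. The function names are hypothetical.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def folding_time_from_slope(times_ns, fraction_folded, lag_ns):
    """Estimate the folding time tau (ns) as 1/slope of the folded fraction after the lag.

    Assumes f(t) = 1 - exp(-(t - lag)/tau) observed only for t - lag << tau,
    where f(t) is approximately (t - lag)/tau.
    """
    pts = [(t, f) for t, f in zip(times_ns, fraction_folded) if t > lag_ns]
    ts = [t for t, _ in pts]
    fs = [f for _, f in pts]
    return 1.0 / ols_slope(ts, fs)


# Self-check on synthetic data generated from the model itself
tau_true, lag = 4500.0, 10.0  # ns
times = [float(t) for t in range(0, 31)]
frac = [max(0.0, 1.0 - math.exp(-(t - lag) / tau_true)) for t in times]
tau_est = folding_time_from_slope(times, frac, lag)
```

With only ~20 ns of data after the lag, the curvature of the exponential biases the estimate by a fraction of a percent here. With 10,000 independent trajectories, the statistical error on the folded fraction is usually the bigger concern.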



Simulation vs. Experiments

[Figure: time scales from 10^-15 s to 10^3 s. Simulations offer extreme detail (sampling issues? parameter quality?); experiments offer efficient averaging but less detail. Processes marked along the axis: bond vibrations, torsions, single ion passes through a channel, fast protein folding, normal protein folding, interesting protein folding, biologically interesting stuff. Labels: "Where we are", "Where we need to be", "Where we want to be".]



GROMACS

• The world's fastest MD code, and it's GPL!
• Estimated 5,000-10,000 academic and industrial users
• Used in Folding@Home: 250,000 CPUs (2008)
• Part of the SPECfp and PRACE benchmark suites
• PRACE project for improvements to GROMACS (CSC, Erik Lindahl)



DPPC & Cholesterol: 130k atoms

BlueGene: 6 ns/day, using 2000 CPUs

GROMACS only achieves 2 ns/day...

...on a single dual-CPU, dual-core Opteron node!
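The point of the comparison shows up in throughput per processor. The arithmetic below makes three assumptions. It reads "dual dual core" as two dual-core Opterons (4 cores). It counts each BlueGene CPU as one processor. It ignores differences in clock speed. The slide does not say which code produced the BlueGene number.

```python
# Per-processor throughput for the DPPC/cholesterol benchmark (130k atoms) quoted above.
bluegene_ns_per_day, bluegene_cpus = 6.0, 2000
gromacs_ns_per_day = 2.0
gromacs_cores = 2 * 2  # assumption: "dual dual core" = 2 sockets x 2 cores

bg_per_cpu = bluegene_ns_per_day / bluegene_cpus    # ns/day per CPU
gmx_per_core = gromacs_ns_per_day / gromacs_cores   # ns/day per core
ratio = gmx_per_core / bg_per_cpu
print(f"BlueGene: {bg_per_cpu:.4f} ns/day per CPU")
print(f"GROMACS:  {gmx_per_core:.2f} ns/day per core")
print(f"per-processor throughput ratio: about {ratio:.0f}x")
```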



Parallel Domain Decomposition

• Partition space, instead of atoms, over nodes
• Supported in version 4 of GROMACS
• Good for load balancing
• Bad for communication bandwidth
• Each node 'imports' coordinates from, and exports forces to, neighbors within a sphere with radius = cutoff (expensive)

Data must be imported from the whole sphere, although this can be optimized to half a sphere.
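The partitioning and the full-sphere import described above can be sketched for an idealized case. The sketch assumes a cubic periodic box cut into n × n × n equal domains. Each domain imports every particle owned by another domain that lies within the cutoff of its own volume. This is not how GROMACS organizes its zones. The helper names `home_domain`, `distance_to_domain` and `halo_import` are hypothetical.

```python
import math


def home_domain(pos, box, n):
    """(ix, iy, iz) of the domain owning a particle: cubic periodic box cut into n^3 cells."""
    size = box / n
    return tuple(min(int((c % box) // size), n - 1) for c in pos)


def distance_to_domain(pos, dom, box, n):
    """Shortest distance from a particle to domain `dom`, taking periodic images into account."""
    size = box / n
    d2 = 0.0
    for c, i in zip(pos, dom):
        c = c % box                      # wrap the coordinate into [0, box)
        lo, hi = i * size, (i + 1) * size
        # per-axis gap to the slab [lo, hi], minimised over the nearest periodic images
        gap = min(max(lo - x, 0.0, x - hi) for x in (c - box, c, c + box))
        d2 += gap * gap
    return math.sqrt(d2)


def halo_import(positions, dom, box, n, cutoff):
    """Indices of particles owned by other domains but within `cutoff` of `dom`.

    This is the full-sphere import; a half-sphere scheme would import only a subset.
    """
    return [idx for idx, p in enumerate(positions)
            if home_domain(p, box, n) != dom
            and distance_to_domain(p, dom, box, n) < cutoff]


positions = [[1.0, 1.0, 1.0],   # inside domain (0, 0, 0)
             [2.5, 1.0, 1.0],   # neighbouring domain, 0.5 from the boundary
             [5.5, 1.0, 1.0],   # across the periodic boundary, 0.5 away
             [4.0, 4.0, 4.0]]   # far away
imports = halo_import(positions, (0, 0, 0), box=6.0, n=3, cutoff=1.0)
```

Every imported particle costs bandwidth twice: once for its coordinates and once more when its force contribution is sent back. This is why the size of the import region matters.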



The Eighth-Sphere Method

• Smarter way to communicate
• Don't calculate interactions on a home node, but in general on "neutral territory" (David Shaw)
• Drastically reduced communication bandwidth needs for domain decomposition
• In 3D, we need to import data from 1/8 of a sphere to the central cell

[Figure: 2D example. Red/yellow cells send data to the central (purple) cell, where interactions are calculated.]

• Implemented in GROMACS 4
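Importing from one octant helps in a way that can be seen by comparing the volume of the import region around a cubic home cell of side L for cutoff r_c. The sketch below assumes an idealized geometry built from three kinds of pieces:

* face slabs, each L·L·r_c
* edge quarter-cylinders, each L·π·r_c²/4
* corner sphere-octants, each π·r_c³/6

The full shell has 6 faces, 12 edges and 8 corners. The half shell has half of that. The eighth shell has 3 faces, 3 edges and 1 corner. The function name and this zone counting are illustrative assumptions, not GROMACS code.

```python
import math


def import_volume(L, rc, scheme):
    """Volume a cubic home cell of side L must import for cutoff rc (idealised geometry).

    The region is built from face slabs (L*L*rc), edge quarter-cylinders
    (L*pi*rc^2/4) and corner sphere-octants (pi*rc^3/6).
    """
    face = L * L * rc
    edge = L * math.pi * rc ** 2 / 4.0
    corner = math.pi * rc ** 3 / 6.0
    if scheme == "full":     # whole sphere around the cell: 6 faces, 12 edges, 8 corners
        return 6 * face + 12 * edge + 8 * corner
    if scheme == "half":     # half of the full shell
        return 3 * face + 6 * edge + 4 * corner
    if scheme == "eighth":   # one octant: 3 faces, 3 edges, 1 corner
        return 3 * face + 3 * edge + corner
    raise ValueError(f"unknown scheme: {scheme}")


for L in (0.5, 1.0, 2.0):  # domain size in units of the cutoff
    ratio = import_volume(L, 1.0, "eighth") / import_volume(L, 1.0, "half")
    print(f"L = {L} rc: eighth-shell / half-shell volume = {ratio:.2f}")
```

As L shrinks relative to r_c, the eighth-shell volume approaches exactly 1/8 of the full sphere, which is the "1/8 sphere" on the slide. For larger cells the face slabs dominate, and the saving over the half shell comes from the edges and the corner.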



Efficient PME Parallelization

• Almost all accurate simulations today use Particle-Mesh Ewald (PME) lattice summation
• Small 3D Fourier transforms scale badly because they need all-to-all communication
• Direct space and PME are mostly independent, though!
• Dedicate a subset of nodes to run a separate PME-only version of the program to improve scaling
• FFT over 5 instead of 25 nodes!

[Figure: nodes laid out along X and Y, with a subset marked as PME nodes.]

Original implementation with help from RZG of MPI
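Shrinking the FFT group pays off because an all-to-all exchange grows roughly quadratically with the number of participants. The sketch below rests on three assumptions. It uses a naive all-to-all in which every rank sends one message to every other rank. It uses a hypothetical split of 25 ranks with the last 5 dedicated to PME. It ignores the extra traffic between direct-space and PME ranks. Real FFT libraries and GROMACS's own rank placement differ in detail.

```python
def all_to_all_messages(p):
    """Point-to-point messages in one naive all-to-all exchange among p ranks."""
    return p * (p - 1)


def split_ranks(n_ranks, n_pme):
    """Hypothetical role assignment: the last n_pme ranks do only the PME (reciprocal-space) part."""
    return ["pme" if r >= n_ranks - n_pme else "pp" for r in range(n_ranks)]


for p in (25, 5):
    print(f"FFT transpose over {p:2d} ranks: {all_to_all_messages(p)} messages")

roles = split_ranks(25, 5)
print(roles.count("pp"), "particle-particle ranks,", roles.count("pme"), "PME-only ranks")
```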



Scaling - DHFR 23558 atoms

[Figure: parallel scaling benchmark for DHFR (23,558 atoms).]

• 1 fs time step
• Constraints on bonds involving hydrogen
• PME every other step (NAMD, Desmond) versus every step (GROMACS)

Hess et al. J. Chem. Theory Comput. 4 (2008) 435-447



Blue Gene/P scaling

[Figure: steps/second versus number of cores (0-2048) for a 0.43M-atom and a 3.4M-atom system.]

• Coarse-grained system
• Cut-off 2.6 σ

Amdahl's law:
• 0.43 Matoms on 1024 cores: 28% of the time in global sums
• 3.4 Matoms on 2048 cores: 33% of the time in global sums
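These fractions bound further scaling through Amdahl's law, S = 1 / (s + (1 - s)/k). Here s is the fraction of run time that does not speed up and k is the speed-up of the rest. The sketch takes the measured runs as the baseline. It treats the global-sum time as a fixed serial part, which is optimistic, since the cost of a global sum typically grows with the number of cores.

```python
def amdahl_speedup(serial_fraction, factor):
    """Amdahl's law: overall speedup when the parallel part runs `factor` times faster
    and the serial part (here: the global sum) does not speed up at all."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / factor)


for label, s in (("0.43 Matoms @ 1024 cores", 0.28), ("3.4 Matoms @ 2048 cores", 0.33)):
    print(f"{label}: 2x more cores -> {amdahl_speedup(s, 2):.2f}x, "
          f"upper bound -> {1.0 / s:.2f}x")
```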



DEISA Extreme Computing Initiative

• Larger systems
• Longer simulation times



Dynamics of a Virus Capsid

Daniel Larsson, Lars Liljas, David van der Spoel
Uppsala University, Sweden



Why study viruses?

• New viral diseases may be on the way: bird flu, SARS, Mexican flu
• 40 existing drugs against viruses, 20 of them against AIDS
• Most drugs target the viral reproduction cycle (reverse transcriptases, proteases)
• Interesting features: self-assembly
• Packaging of RNA/DNA



Satellite Tobacco Necrosis Virus

• Discovered in 1967
• Icosahedral plant virus
• Satellite virus; TNV is the helper
• Transmitted by a fungus
• Small: 18 nm diameter
• Simple model system



5 ns simulation. Blue: protein. Green: Cl⁻. Red: Na⁺.

Computer time estimate: 12 Petaflop



What can simulations contribute?

• Non-averaged aspects
• Non-symmetrical aspects
• Missing pieces of the structure (residues 1-11)
• Effect of structural Ca²⁺ on stability
• RNA binding / salt effects
• Dynamics: the fourth dimension



RNA Secondary Structure Prediction

Bringloe et al. J. Gen. Virol. 79 (1998) p. 1539



The goal is not merely reproducing experimental results.



Unfortunately, reproducing experimental results is difficult.



Simulation Details

• 1,000,000+ particles
• OPLS/AA force field + TIP3P water
• Particle-mesh Ewald
• 5 fs time step
• Dodecahedron simulation box
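A dodecahedral box saves solvent for a roughly spherical solute such as the capsid. The slides do not say which dodecahedron was used. The sketch below assumes the rhombic dodecahedron that GROMACS supports. With periodic image distance d, its volume is (√2/2)·d³ ≈ 0.707·d³, against d³ for a cube. The 18 nm diameter comes from the STNV slide. The 2 nm minimum gap between periodic images is a hypothetical choice.

```python
import math


def box_volumes(solute_diameter, min_gap):
    """Volumes of a cubic and a rhombic-dodecahedron box with the same periodic image distance.

    Image distance d = solute diameter + minimum gap between periodic images of the solute.
    """
    d = solute_diameter + min_gap
    cube = d ** 3
    rhombic_dodecahedron = math.sqrt(2.0) / 2.0 * d ** 3  # about 0.707 d^3
    return cube, rhombic_dodecahedron


# 18 nm capsid diameter from the STNV slide; the 2 nm gap is a hypothetical choice
cube, dodec = box_volumes(18.0, 2.0)
print(f"cube: {cube:.0f} nm^3, rhombic dodecahedron: {dodec:.0f} nm^3 "
      f"({100 * (1 - dodec / cube):.0f}% less volume to fill with water)")
```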



Hardware Specs

                          Louhi (CSC, Espoo)    HECToR (Edinburgh)    Neolith (Linköping)
Vendor                    Cray                  Cray                  HP
CPU type                  AMD Opteron 2.3 GHz   AMD Opteron           Intel Xeon E534 (2.33 GHz, 8 MB L2 cache)
#Cores                    10800                 12000                 6440
Interconnect              Cray                  Cray                  InfiniBand ConnectX
OS                        Cray Linux            Cray Linux            CentOS 5 Linux
Top500 rank (Nov. 2008)   32                    47                    56
Price                     ?                     ?                     ?
Time (2008)               400 kh (DECI)         800 kh (DECI)         3600 kh (VR)

(kh = thousand CPU hours)



[Figure: superposition of the 60 monomers after 100 ns MD, and superposition of the 12 pentamers after 100 ns MD.]

• The main part of the protein acts like a rigid body.
• Residues 1-24 form a flexible arm.


500 ns: ~1.2 Exaflop, 30 core-years
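The 500 ns figure follows from the earlier "12 Petaflop" estimate for 5 ns by linear scaling. The sketch below assumes that cost is proportional to simulated time and that a year has 365.25 days. It also uses the 5 fs time step from the Simulation Details slide. The per-core rate it prints is derived here and is not stated on the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def extrapolate_flop(flop_ref, ns_ref, ns_target):
    """Scale the operation count linearly with simulated time (same system, same time step)."""
    return flop_ref * ns_target / ns_ref


def implied_rate_per_core(total_flop, core_years):
    """Average sustained flop/s per core implied by an operation count and a core-year budget."""
    return total_flop / (core_years * SECONDS_PER_YEAR)


def md_steps(ns, timestep_fs):
    """Number of integration steps needed to cover `ns` nanoseconds."""
    return round(ns * 1_000_000 / timestep_fs)


flop_500 = extrapolate_flop(12e15, 5.0, 500.0)   # 12 petaflop per 5 ns
rate = implied_rate_per_core(flop_500, 30.0)     # 30 core-years
steps = md_steps(500.0, 5.0)                     # 5 fs time step
print(f"{flop_500:.2e} flop, {steps:.1e} steps, {rate / 1e9:.2f} GFLOP/s per core")
```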



Size of the virus particle


Water flow analysis

• Is the virus capsid leaky?
• Can water molecules pass?
• Can ions pass?
• Is the flow concentrated in specific regions?
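One way to answer "can water molecules pass?" is to count complete passages through the capsid wall. The sketch below models the wall as a spherical shell between radii r_in and r_out around the capsid centre. It then counts, for one molecule, each move from inside to outside (or back), even when the passage is spread over several frames. Several things here are assumptions: the shell model, the hypothetical radii in the example, and the function names. The sketch also assumes frames are saved often enough that no passage is missed. It is not necessarily the analysis used in this work. Summing over all waters, or all ions, gives the fluxes. Binning the crossing positions by direction would show whether the flow is concentrated in specific regions.

```python
import math


def radial_distances(frames, centre):
    """Distance of one molecule from the capsid centre in every frame (frames: list of (x, y, z))."""
    return [math.dist(pos, centre) for pos in frames]


def count_passages(radii, r_in, r_out):
    """Count complete passages of one molecule through the shell r_in <= r <= r_out.

    Outward passage: last seen inside (r < r_in), now outside (r > r_out).
    Inward passage: the reverse. Frames spent within the shell do not change the side.
    """
    outward = inward = 0
    side = None  # "in", "out", or None if the molecule started inside the shell
    for r in radii:
        if r < r_in:
            if side == "out":
                inward += 1
            side = "in"
        elif r > r_out:
            if side == "in":
                outward += 1
            side = "out"
    return outward, inward


# A hypothetical trajectory (nm) that leaves the capsid interior and comes back again
radii = [6.0, 7.5, 8.5, 9.5, 8.0, 6.5, 8.0, 6.9]
print(count_passages(radii, r_in=7.0, r_out=9.0))
```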


Physiological Conditions


Conclusions

• The N-terminal arm is flexible
• Ionic strength is important for stability but may not be enough: RNA!
• The capsid is very leaky
• (Interfering with capsid stability is a possible route to antiviral therapy, but this has not been proven yet)



Questions & future plans...

• What structural changes facilitate the water flow?
• The RNA structure: from 2D to 3D structure
• Affinities (ΔG_bind) between the proteins: quantitative stability analysis
• Pathways for virus assembly (simplified models?)



Extreme Computing

• If you cannot solve a small problem, try a bigger problem
• What about the accuracy of the results and their predictive power?
• HPC will force many codes to reconsider the physics
• Amdahl's law must be dealt with in MD codes



Molecular Biophysics, Uppsala

Daniel Larsson
