23.12.2014 Views

Implementing MD Simulations and Principal Component Analysis to ...

Implementing MD Simulations and Principal Component Analysis to ...

Implementing MD Simulations and Principal Component Analysis to ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Implementing</strong> <strong>MD</strong><br />

<strong>Simulations</strong> <strong>and</strong> <strong>Principal</strong><br />

<strong>Component</strong> <strong>Analysis</strong> <strong>to</strong><br />

Calculate Protein Motions<br />

Myrna I. Merced Serrano<br />

Computational <strong>and</strong> Systems Biology<br />

Summer Institute<br />

Iowa State University<br />

Ames, IA 50010


<strong>MD</strong><br />

Molecular Dynamics is a way <strong>to</strong> study a<strong>to</strong>ms<br />

<strong>and</strong> molecules with computer simulations.<br />

This study of a<strong>to</strong>ms <strong>and</strong> molecules is based<br />

on the laws of physics.<br />

PCA<br />

<strong>Principal</strong> <strong>Component</strong> <strong>Analysis</strong> is a technique<br />

<strong>to</strong> reduce multidimensional data sets <strong>to</strong> lower<br />

dimensional data.


Goal<br />

To perform <strong>MD</strong> <strong>Simulations</strong> <strong>to</strong><br />

study large domain protein<br />

motions.


Data<br />

Protein pdb’s files:<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

1AL3→ CysB<br />

1LST→ LAO (closed form)<br />

2LAO→ LAO (open form)<br />

1PDA→ PBGD<br />

2DRI→ RBP<br />

1TFA→ OVOT<br />

2LIV→ LIVBP


Software<br />

Cygwin<br />

<br />

Linux environment for Windows.<br />

NA<strong>MD</strong><br />

<br />

<strong>MD</strong> code <strong>to</strong> perform simulations of large<br />

biomolecular systems.<br />

MatLab<br />

<br />

Programming language <strong>and</strong> numerical computing<br />

environment for algorithm development, data<br />

visualization, <strong>and</strong> data analysis.


Methods<br />

Download the pdb files of the proteins from http://<br />

www.pdb.org.<br />

Development of a bash script <strong>to</strong> generate the files <strong>to</strong><br />

run the minimizations in NA<strong>MD</strong>.<br />

Minimize each protein for 2,500 steps.<br />

Development of a bash script <strong>to</strong> generate the<br />

configuration file <strong>to</strong> run the <strong>MD</strong>.<br />

Run each protein for 3ns (1,500,000 steps).<br />

Development of a bash script <strong>to</strong> generate the input<br />

matrix for the PCA.<br />

Analyze each trajec<strong>to</strong>ry with the PCA in MatLab.<br />

Plot the results of the PCA (MatLab <strong>and</strong> Excel).


Script <strong>to</strong> generate the files<br />

#Script <strong>to</strong> generate the files for the minimization in NA<strong>MD</strong><br />

#!/bin/bash<br />

COMMON_DIR=/home/mymese/inv_summer_2007/data/common<br />

DISTANCE_PROT_BOX=5<br />

#write de pdb protein file<br />

echo -e "mol load pdb $1.pdb\\n set prot [a<strong>to</strong>mselect <strong>to</strong>p protein]\\n \$prot writepdb $1_p.pdb\\n quit\\n" |<br />

/usr/vmd/vmd -dispdev text<br />

#create the pgn file<br />

/bin/sed -n -e "s/MOL/$1/" -e "w $1.pgn" $COMMON_DIR/template.pgn<br />

#run the pgn file<br />

echo -e "source $1.pgn\\n quit\\n" | /usr/vmd/vmd -dispdev text<br />

#put the protein in a box of water<br />

echo -e "package require solvate\\n solvate $1_psfgen.psf $1_psfgen.pdb -t $DISTANCE_PROT_BOX -o $1_wb \n quit\\n" |<br />

/usr/vmd/vmd -dispdev text<br />

#load psf <strong>and</strong> pdb in<strong>to</strong> V<strong>MD</strong> <strong>and</strong> calculate the center of the box<br />

box_center=`echo -e "mol load psf $1_wb.psf\\n mol load pdb $1_wb.pdb\\n set prot [a<strong>to</strong>mselect <strong>to</strong>p \"all\"]\\n<br />

measure center \\$prot\\n quit\\n" | /usr/vmd/vmd -dispdev text | tail -3 | head -1 | cut -d ' ' -f3-5`<br />

#load psf <strong>and</strong> pdb in<strong>to</strong> V<strong>MD</strong> <strong>and</strong> calculate the minmax of the box<br />

box_minmax=`echo -e "mol load psf $1_wb.psf\\n mol load pdb $1_wb.pdb\\n set prot [a<strong>to</strong>mselect <strong>to</strong>p \"all\"]\\n<br />

measure minmax \\$prot\\n quit\\n" | /usr/vmd/vmd -dispdev text | tail -3 | head -1 | cut -d ' ' -f3-8`<br />

#values of box center <strong>and</strong> measures of minmax<br />

echo "Box center: $box_center"<br />

echo "Box minmax: $box_minmax"<br />

#copy the psf <strong>and</strong> pdb <strong>to</strong> common direc<strong>to</strong>ry<br />

cp $1_psfgen* $COMMON_DIR/.<br />

cp $1_w* $COMMON_DIR/.<br />

#create the configuration file<br />

/bin/sed -n -e "s/MOL/$1/" -e "s/BOX_CENTER/$box_center/" -e "w $1_conf_wb.conf" $COMMON_DIR/template_wb.conf


Script <strong>to</strong> generate the matrix<br />

#Script <strong>to</strong> generate the matrix for the PCA<br />

#!/bin/bash<br />

#You need <strong>to</strong> change the name of the input file's names<br />

PSF_FILE=$1_wb.psf<br />

DCD_FILE=$1<br />

#load the dcd trajec<strong>to</strong>ry file <strong>and</strong> saved in a pdb format<br />

echo -e "mol load psf $PSF_FILE dcd $DCD_FILE.dcd \\n animate write pdb DCD_$DCD_FILE.pdb \\n quit\\n" | /usr/vmd/vmd -dispdev text<br />

#remove the residues in the trajec<strong>to</strong>ry<br />

grep "CA" DCD_$DCD_FILE.pdb > DCD_CA_$DCD_FILE.coords<br />

#first residue<br />

CA=`head -1 DCD_CA_$DCD_FILE.coords | cut -d ' ' -f1-14`<br />

#count the number of residues<br />

grep -c "CA" $1_wb.pdb > res.temp<br />

#count the number of frames<br />

grep -c "${CA}" DCD_CA_$DCD_FILE.coords > fram.temp<br />

#set the number of residues<br />

<strong>to</strong>tal_residues=`cat res.temp`<br />

echo <strong>to</strong>tal: $<strong>to</strong>tal_residues<br />

#set the number of frames<br />

<strong>to</strong>tal_frames=`cat fram.temp`<br />

echo <strong>to</strong>tal: $<strong>to</strong>tal_frames<br />

#remove the coordinates of the CA<br />

cut -c32-54 DCD_CA_$DCD_FILE.coords > CA_$DCD_FILE.coods<br />

#create the matrix with the coordinates of the CA<br />

cat CA_$DCD_FILE.coods | for i in `seq 1 $<strong>to</strong>tal_frames`; do<br />

for i in `seq 1 $<strong>to</strong>tal_residues`; do<br />

read coords;<br />

echo -n "$coords " >> matrix_$DCD_FILE.txt;<br />

done<br />

echo " " >> matrix_$DCD_FILE.txt;<br />

done<br />

#remove the not necessary files<br />

rm CA_* DCD_CA* *.temp


<strong>Analysis</strong><br />

In this study we worked with 7 different<br />

proteins, here are the results of 1AL3.


RMSD of the trajec<strong>to</strong>ry<br />

RMSD of 1AL3's trajec<strong>to</strong>ry<br />

RMSD<br />

1.7<br />

1.5<br />

1.3<br />

1.1<br />

0.9<br />

0.7<br />

0.5<br />

0 200000 400000 600000 800000 1000000 1200000 1400000<br />

Time Step (0.002ps)<br />

This figure shows the RMSD for the 3ns of 1AL3’s<br />

trajec<strong>to</strong>ry.<br />

The protein has relatively small fluctuations over<br />

time.


Fraction of Variance<br />

MatLab code:<br />

<br />

PCA<br />

1<br />

0.9<br />

0.8<br />

Fraction of 1AL3's Variance<br />

<br />

[pc, score, latent, tsquare] =<br />

princomp(matrix_1AL3);<br />

Individual Fraction<br />

Fraction<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

0<br />

1 2 3 4 5<br />

PC<br />

Individual<br />

Cumulative<br />

<br />

<br />

<br />

latent = the eigenvalues of the<br />

matrix_1AL3 covariance matrix<br />

latent(1:5)/sum(latent)<br />

Total Cumulative Fraction<br />

<br />

sum(latent(1:5))/sum(latent)


Scatter Plots of the Data<br />

3 0<br />

P C 1 v s P C 2 ( 1 A L 3 )<br />

3 D S c a t t e r P l o t o f 1 A L 3<br />

2 0<br />

3 0<br />

1 0<br />

2 0<br />

1 0<br />

P C 2<br />

0<br />

- 1 0<br />

P C 3<br />

0<br />

- 1 0<br />

- 2 0<br />

- 2 0<br />

- 3 0<br />

- 3 0<br />

4 0<br />

2 0<br />

0<br />

5 0<br />

1 0 0<br />

- 4 0<br />

- 6 0 - 4 0 - 2 0 0 2 0 4 0 6 0<br />

P C 1<br />

P C 2<br />

- 2 0<br />

- 4 0<br />

- 5 0<br />

0<br />

P C 1<br />

MatLab code:<br />

<br />

<br />

score = Z-scores<br />

scatter(score(:,1), score(:,2))<br />

scatter3(score(:,1), score(:,2), score(:, 3))


PC1 Residue Fluctuations<br />

0 . 0 4<br />

R e s i d u e F l u c t u a t i o n s o f 1 A L 3 ' s P C 1<br />

0 . 0 3<br />

0 . 0 2<br />

M e a n - S q u a r e F l u c t u a t i o n s<br />

0 . 0 1<br />

0<br />

- 0 . 0 1<br />

- 0 . 0 2<br />

- 0 . 0 3<br />

- 0 . 0 4<br />

- 0 . 0 5<br />

2 0 4 0 6 0 8 0 1 0 0 1 2 0 1 4 0 1 6 0 1 8 0 2 0 0 2 2 0<br />

R e s i d u e I n d e x


Visualization of the PC 1 Motion<br />

Picture <strong>and</strong> video by Lei Yang.


Preliminary Conclusions<br />

These results suggest that minimized protein<br />

structures are stable.<br />

<strong>Principal</strong> components can be used <strong>to</strong> extract<br />

the important motions of a protein; however,<br />

long simulation times are required <strong>to</strong> obtain<br />

larger motions.


Acknowledgements<br />

Lei Yang<br />

Dr. Xuefeng Zhao<br />

Dr. Robert Jernigan<br />

CyBlue Staff<br />

BBSI 2007


And now what<br />

<strong>Principal</strong> <strong>Component</strong>s <strong>Analysis</strong> in<br />

the trajec<strong>to</strong>ries of DNA-CNT<br />

Hybrids.


DNA-CNT Hybrid (5 monomers)<br />

Orthographic View Perspective View<br />

Initial Structure<br />

Final Structure


Some preliminary results<br />

Poly-C of 5 monomers<br />

Poly-C of 15 monomers<br />

There is a transition <strong>to</strong> the final structure.<br />

The process is irreversible.<br />

Graphics by José Sotero

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!