
Computer Modelling in Molecular Biology

Edited by Julia M. Goodfellow


© VCH Verlagsgesellschaft mbH, D-69451 Weinheim (Federal Republic of Germany), 1995

Distribution:
VCH, P.O. Box 10 11 61, D-69451 Weinheim (Federal Republic of Germany)
Switzerland: VCH, P.O. Box, CH-4020 Basel (Switzerland)
United Kingdom and Ireland: VCH, 8 Wellington Court, Cambridge CB1 1HZ (United Kingdom)
USA and Canada: VCH, 220 East 23rd Street, New York, NY 10010-4606 (USA)
Japan: VCH, Eikow Building, 10-9 Hongo 1-chome, Bunkyo-ku, Tokyo 113 (Japan)

ISBN 3-527-30062-7


Professor Julia M. Goodfellow
Department of Crystallography
Birkbeck College
University of London
Malet Street
London WC1E 7HX
United Kingdom

This book was carefully produced. Nevertheless, authors, editor and publisher do not warrant the information contained therein to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Published jointly by
VCH Verlagsgesellschaft, Weinheim (Federal Republic of Germany)
VCH Publishers, New York, NY (USA)

Editorial Director: Dr. Hans-Joachim Kraus
Production Manager: Claudia Grössl

Library of Congress Card No. applied for

British Library Cataloguing-in-Publication Data:
A catalogue record for this book is available from the British Library

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Computer modelling in molecular biology / ed. by Julia M. Goodfellow. - Weinheim ; New York ; Basel ; Cambridge ; Tokyo : VCH, 1995
ISBN 3-527-30062-7
NE: Goodfellow, Julia M. [Hrsg.]

© VCH Verlagsgesellschaft mbH, D-69451 Weinheim (Federal Republic of Germany), 1995

Printed on acid-free and low-chlorine paper

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form - by photoprinting, microfilm, or any other means - nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Composition: Filmsatz Unger & Sommer GmbH, D-69469 Weinheim
Printing: betz-druck GmbH, D-64291 Darmstadt
Bookbinding: Großbuchbinderei Josef Spinner, D-77831 Ottersweier
Printed in the Federal Republic of Germany


Preface

Computer simulation studies have advanced a long way from the studies on monatomic fluids such as argon and molecular liquids such as water. Even the early pioneering explorations of macromolecular conformation, which were only achieved with the greatest technical expertise, look simple in comparison to what can be achieved in the 1990s. These advances have occurred because of the increasing speed of computer hardware, whether scalar, vector or parallel, as well as the availability of software.

This book provides a series of snapshots of the use of molecular simulation techniques to study a wide range of biological problems. I hope you will not only see the current successes but also realize that there is still much further to go. We are still limited to the picosecond and nanosecond time range and to relatively small macromolecules when the biology demands that we study the behaviour of macromolecular assemblies on millisecond timescales. There are many other grand challenges ahead, including the better incorporation of quantum mechanical effects within large molecular systems and the use of more realistic electrostatic models for both short and long range calculations. It will be of interest to see how molecular modelling develops over the next few years, whether it continues to progress as fast as in the previous ten years, and whether it can keep pace with the exponential increase in experimental data on both the sequences and structures of biological macromolecules.

London, May 1995
Julia M. Goodfellow


Contents

Colour Illustrations

Introduction to Computer Simulation: Methods and Applications
Julia M. Goodfellow and Mark A. Williams

Modelling Protein Structures
Tim J. P. Hubbard and Arthur M. Lesk

Molecular Dynamics Simulations of Peptides
D. J. Osguthorpe and P. K. C. Paul

Molecular Dynamics and Free Energy Calculations Applied to the Enzyme Barnase and One of its Stability Mutants
Shoshana J. Wodak, Daniel van Belle, and Martine Prévost

The Use of Molecular Dynamics Simulations for Modelling Nucleic Acids
E. Westhof, C. Rubin-Carrez, and V. Fritsch

Theory of Transport in Ion Channels
Benoit Roux

Molecular Modelling and Simulations of Major Histocompatibility Complex Class I Protein-Peptide Interactions
Christopher J. Thorpe and David S. Moss

Path Energy Minimization: A New Method for the Simulation of Conformational Transitions of Large Molecules
Oliver S. Smart

Index


Contributors

V. Fritsch
Institut de Biologie Moléculaire et Cellulaire
Centre National de la Recherche Scientifique
15, Rue R. Descartes
F-67084 Strasbourg
France

Julia M. Goodfellow
Department of Crystallography
Birkbeck College
University of London
Malet Street
London WC1E 7HX
U.K.

Tim J. P. Hubbard
Centre for Protein Engineering
Medical Research Council Centre
Hills Road
Cambridge CB2 2QH
U.K.

Arthur M. Lesk
Department of Haematology
Medical Research Council Centre
Hills Road
Cambridge CB2 2QH
U.K.

David S. Moss
Department of Crystallography
Birkbeck College
University of London
Malet Street
London WC1E 7HX
U.K.

D. J. Osguthorpe
Molecular Graphics Unit
School of Chemistry
University of Bath
Bath BA2 7AY
U.K.

P. K. C. Paul
Molecular Graphics Unit
School of Chemistry
University of Bath
Bath BA2 7AY
U.K.

Martine Prévost
Université Libre de Bruxelles
Unité de Conformation de Macromolécules Biologiques, CP160/16, P2
Avenue P. Héger
B-1050 Bruxelles
Belgium


Benoit Roux
Groupe de Recherche en Transport Membranaire
Département de Physique
Université de Montréal
C.P. 6128, succ. A
Canada H3C 3J7

C. Rubin-Carrez
Institut de Biologie Moléculaire et Cellulaire
Centre National de la Recherche Scientifique
15, Rue R. Descartes
F-67084 Strasbourg
France

Oliver S. Smart
Department of Crystallography
Birkbeck College
University of London
Malet Street
London WC1E 7HX
U.K.

Christopher J. Thorpe
Mikrobiologiskt och Tumörbiologiskt Centrum
Karolinska Institutet
Doktorsringen 13
S-17177 Stockholm
Sweden

Daniel Van Belle
Université Libre de Bruxelles
Unité de Conformation de Macromolécules Biologiques, CP160/16, P2
Avenue P. Héger
B-1050 Bruxelles
Belgium

E. Westhof
Institut de Biologie Moléculaire et Cellulaire
Centre National de la Recherche Scientifique
15, Rue R. Descartes
F-67084 Strasbourg
France

Mark A. Williams
Department of Crystallography
Birkbeck College
University of London
Malet Street
London WC1E 7HX
U.K.

Shoshana J. Wodak
Université Libre de Bruxelles
Unité de Conformation de Macromolécules Biologiques, CP160/16, P2
Avenue P. Héger
B-1050 Bruxelles
Belgium


Colour Illustrations


Figure 4-4. The simulated barnase-water system. (a) The starting conformation of the system, consisting of one of the three barnase molecules in the asymmetric unit (molecule C) from the 2 Å resolution refined crystal structure (represented by its molecular surface), 94 crystallographically determined water positions located within 4 Å of a protein atom (in blue) and 2265 randomly oriented water molecules (in red) placed on a cubic lattice in a rectangular box (dimensions: 49.68 × 37.16 × 49.68 Å); (b) the same system after 50 ps. Colouring of the protein surface is chosen according to the value of the mean square displacement of the main-chain atoms: small displacement < 0.6 Å (blue), medium < 1.2 Å (yellow), large < 4.0 Å (red). Water molecules displaying a very large displacement (> 4.0 Å) are coloured in pink and represent bulk water molecules. It can also be seen that water molecules trapped in cavities display the same mean displacement as the surrounding protein atoms. (Text see page 69.)


Figure 7-4. Molecular model for the interaction of HLA-A*0201 with peptide. The peptide, shown as a pink tube with side chains in ball and stick representation, is sited in the antigen binding cleft between the α-helices of the α1 and α2 domains (coloured red and green respectively). The domains coloured dark blue and light blue are α3 and β2-microglobulin respectively and are membrane proximal in their orientation to the antigen presenting cell (APC). Both the N- and C-termini are heavily buried within the protein. In the 2.1 Å structure of HLA-B*2705 it has been demonstrated that 48% of the 2003 Å² of buried surface of the RRIKAITLK model peptide would be buried by the chelation of alanine residues from P1-P3 and P8-P9 with no residues built into positions P4-P7. This buried surface was increased to 57% by the substitution of AlaP2 to ArgP2. This suggests that the predominant direct interactions are at the termini and that the central bulge of the peptide is raised on a solvent bed to maximise its contact with the T-cell receptor (TCR). (Text see page 180.)

Figure 7-5. Side view of the model of HLA-A*0201 with peptide showing the asymmetry of the molecule. In the immunoglobulin-like α3 and β2-microglobulin domains the asymmetry of the packing gives rise to an atypical immunoglobulin (Ig) pairing. A slight shift in domain disposition, with respect to that in HLA-A2, in the membrane proximal Ig-like domains has been observed in the structures of HLA-B27 and H-2Kb. (Text see page 180.)


Figure 7-18. Overlay of five conformers (displayed in magenta, red, purple, green and cyan) from the simulation of the Arg-P2 residue of the EBNA 3C peptide in the “45-pocket” of HLA-B*2705. It may be clearly observed that TIP451 is stationary throughout the simulation. The movements of the Arg-P2 side chain are small and are co-operative with those of TIP456, a water residue which is involved in the square planar hydrogen bonding network in the P2 pocket. (Text see page 207.)


1 Introduction to Computer Simulation: Methods and Applications

Julia M. Goodfellow and Mark A. Williams

Department of Crystallography, Birkbeck College, University of London, Malet Street, London WC1E 7HX, England

Contents

1.1 Introduction
1.2 What is Computer Simulation?
1.3 Methods
1.4 Prediction of Molecular Conformation
1.5 Flexibility and Dynamics
1.6 Thermodynamics
1.7 Summary
References


1.1 Introduction

There have been considerable advances in the field of computer simulation of molecular systems since the basic algorithms and early applications to simple liquid systems were published [1-5]. The complexity and size of the systems being studied, and the number of applications published annually, have increased enormously during the past decade. This expansion is correlated with improvements in computer hardware [6] and the wider availability of simulation software. Projects which were not practicable in the 1970s became feasible using supercomputers during the 1980s, and have now become routine applications on high speed workstations. Thus, we can now reserve the current supercomputers for the most sophisticated, and hopefully most realistic, modelling applications at the cutting edge of computational science.

The number of computer simulation and molecular modelling studies reported in the past five years is so large that it is difficult to cover comprehensively even the subset of studies involving the behaviour of the macromolecules, such as proteins and DNA, that are the subject of this book. In this area of structural molecular biology, brief annual reviews of both methodological advances and applications are now available [7-9].
In this book, the focus is on an in-depth consideration of particular applications, chosen to highlight current areas of research, rather than on methodologies, which are considered only briefly in this chapter. This selection has been made in the full knowledge that many other interesting areas of research in this very active subject have been omitted.

1.2 What is Computer Simulation?

The term ‘computer simulation and modelling of macromolecules’ is used to encompass a wide range of techniques. It is possible to divide these, somewhat artificially, into three classes: knowledge or rule based methods, quantum mechanical methods, and classical ‘potential energy’ based techniques. These three areas are not mutually exclusive, and it may be necessary to combine quantum mechanical and classical methods to tackle particular problems, as well as to refine rule-based models using classical ‘potential energy’ methods. However, each class of methods forms a substantial research area in itself and it would be difficult to give detailed consideration to all of these classes in one book. Consequently, the focus of this book is on one of them, the classical potential energy based methods, which include energy minimization, Monte Carlo methods, and molecular dynamics. No attempt has been made to describe the numerous methods and applications in the area of quantum mechanics. However, it has been impossible to resist the temptation to include an example from the extremely important area of rule based modelling of protein structure.

1.3 Methods

The classical methods involving energy calculations are continually being updated and improved. However, the underlying principles remain the same. They all require input of an atomistic starting model (i.e. atomic coordinates), a mathematical description of the interaction energy between all pairs of atom types (i.e. a force field), and the definition of the constraints on the system, usually in terms of size, temperature, and volume or pressure. The basic simulation algorithms provide different methods for sampling the equilibrium configurations of a molecular system, and detailed descriptions of the methods are available in the literature (e.g. [10-12]). Protocols such as simulated annealing have been developed to deal with flexible peptides such as those described in Chapter 3. Long timescale simulated annealing studies have also been used in attempts to simulate protein unfolding [13-16].
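To make the ingredients listed above concrete, the following is a minimal sketch of a classical pairwise energy calculation: atomic coordinates go in, and a simple force field returns the interaction energy summed over all atom pairs. It assumes a Lennard-Jones 12-6 term plus a Coulomb term, which is one common functional form and not necessarily that of any force field cited in this chapter. The function names, the single shared `epsilon` and `sigma`, and the optional `cutoff` are illustrative assumptions; real force fields assign parameters per atom type and include bonded terms.

```python
import math

# Conversion factor giving Coulomb energies in kcal/mol for charges in units
# of e and distances in Å (a widely used value; treat as an assumption here).
COULOMB_CONSTANT = 332.06


def pair_energy(r, epsilon, sigma, qi, qj):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair at distance r (Å)."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_CONSTANT * qi * qj / r
    return lennard_jones + coulomb


def total_energy(coords, charges, epsilon=0.15, sigma=3.4, cutoff=None):
    """Sum pair energies over all unique atom pairs.

    coords  : list of (x, y, z) tuples in Å (the atomistic starting model)
    charges : list of partial charges in units of e
    cutoff  : if given, pairs farther apart than this distance are skipped
    epsilon, sigma : hypothetical parameters shared by every atom
    """
    energy = 0.0
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if cutoff is not None and r > cutoff:
                continue  # interaction ignored beyond the cutoff
            energy += pair_energy(r, epsilon, sigma, charges[i], charges[j])
    return energy
```

For two uncharged atoms placed at the Lennard-Jones minimum distance, `2 ** (1 / 6) * sigma`, `total_energy` returns `-epsilon`, which is a quick sanity check on the functional form. The double loop scales with the square of the number of atoms, which is why system size dominates the cost of such calculations.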
Other time-consuming calculations are those needed for the estimation of free energy differences between similar molecular systems [17], and applications of such calculations are given for mutants of barnase (Chapter 4) and ion transport in channels (Chapter 6).

Many force fields have been developed during the past twenty years [18-27]. These are often associated with a particular software package, although several recent programs offer a choice. There are several studies in the literature which compare the different force fields [28-31] and their underlying approximations. One frequently used approximation involves representing the solvent implicitly by modification of intra- and intersolute forces, rather than explicitly including solvent molecules. This approximation is made because the explicit inclusion of solvent substantially increases the number of atoms in the system and is consequently very time-consuming. The use of implicit solvent representations is not, however, without drawbacks, and the effects of different solvent representations are considered in detail in Chapter 5 for nucleic acids and in Chapter 6 for proteins.

Another almost universal assumption has been the use of isotropic potentials to describe the electrostatic interactions between the molecule’s constituent atoms. Techniques are now available which take account of anisotropic electrostatic effects, such as the use of electrostatic multipoles on each atomic site [32-34].

Long range electrostatic interactions are generally omitted in order to minimise the time required to carry out a simulation of a large macromolecular system. The effects of this truncation of the electrostatic interactions in the system have recently been assessed [35] and found to alter some properties of the system. Methods to efficiently approximate the long range interactions have been applied recently to biological systems [36] in an effort to minimise these truncation effects.

The underlying constraint in an application of simulation techniques to macromolecular problems is often computer resources, which may limit the size of the system, the sophistication of the force field and/or the length of time for which the simulation can be undertaken. The latter is important, as recent papers emphasise the need for simulations of a nanosecond or longer [37, 38] in order to obtain representative samples of the equilibrium configurations of large biological macromolecules. Important recent developments which will help to reduce the user time necessary to carry out macromolecular simulations include the use of multiple timestep algorithms [39] and parallel architecture computers. Specific changes to algorithms have been made to take advantage of a number of different types of parallel hardware [40-43].

1.4 Prediction of Molecular Conformation

One of the major applications of molecular simulation algorithms is the prediction of the conformation of macromolecules.
Simulated annealing methods are now used routinely to refine both X-ray crystallographic [44] and NMR solution structures [45] of proteins and DNA, often prior to conventional least-squares restrained refinement. In other applications, only limited experimental data may be available for the required structure and these data may only be used to generate the starting model which is refined using molecular dynamics. For example, one may be interested in predicting the solution structure of a molecule given its crystal structure (Chapter 6), predicting the conformation of a complex between receptor and ligand when only the structures of the uncomplexed molecules are known [46], or predicting the effect of chemical modification on a structure [47, 48]. There are numerous examples of these types of applications [7-9]. An example of the use of simulation to study the important problem of the binding of peptides to the MHC class I antigen is described in Chapter 7. Energy minimization and molecular dynamics may also be used to refine rule-based or homology built models of proteins such as those described in Chapters 2 and 7.
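The idea behind simulated annealing refinement can be illustrated on a deliberately tiny problem. The sketch below is an assumption-laden toy, not the protocol used in the cited refinement studies: the "conformation" is a single coordinate `x`, the energy is a hypothetical double-well function, trial moves are accepted with the Metropolis criterion [1], and the temperature is lowered on a geometric schedule so that the search can escape the shallower well early on and settle into the deeper one.

```python
import math
import random


def energy(x):
    """Toy double-well energy: a shallow minimum near x = +1, a deeper one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(x0, t_start=2.0, t_end=0.01, steps=20000,
                        step_size=0.5, seed=0):
    """Return the lowest-energy (x, energy) pair visited during annealing."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-dE / t).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Starting from `x0 = 1.0`, inside the shallower well, the search is expected to finish with its best point in the deeper well at negative `x`. A plain downhill minimiser started from the same point would remain trapped near `x = +1`, which is the practical reason for annealing before conventional refinement.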


1.5 Flexibility and Dynamics

Molecular dynamics methods offer the possibility of studying changes in conformation over a period of time ranging from picoseconds to nanoseconds, as well as estimating average properties of a system at equilibrium. For example, one can see changes in the pucker of the ribose ring in oligonucleotide conformations or the flexibility of loop regions in proteins. On a longer timescale, some of the most recent applications are in the related areas of protein unfolding [13-16] and stability of peptides [49].

It is interesting to study the minimum energy path by which changes in conformation can occur, as such a study provides considerable insight into the behaviour of large molecules and can often be carried out much more rapidly than a full simulation of the process. Techniques to find such reaction coordinates for conformational transitions form an important advance in methodology which has only recently been applied to biological systems [50, 51]. One promising method is described in Chapter 8 together with an application related to changes in sugar pucker.

1.6 Thermodynamics

As simulation techniques are compatible with the principles of statistical mechanics, it is possible to estimate enthalpies and some free energies from simulations of molecular systems.
As thermodynamic data are usually easier to obtain than structural information for macromolecules in solution, comparison of calculated and experimental thermodynamic quantities is an important process which gives some indication of the reliability of a particular simulation. Developments in the 1980s led to practical ways of estimating differences in free energy between similar systems using thermodynamic cycles. As well as providing information which can help validate simulation, such methods are of interest in rational drug design and the design of novel molecules. The use of free energy methods allows one to estimate the difference in equilibrium behaviour of similar proteins (e.g. the stability of wild type versus mutant proteins) as in Chapter 4, and also to compare the binding of different ligands to the same receptor. Other examples of the application of free energy calculations to understanding biological processes include the study of the transport of ions and solvent through membrane channels, such as that of the simple membrane spanning polypeptide gramicidin considered in Chapter 6.
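The thermodynamic cycle argument can be written out for the ligand-binding case mentioned above. The notation is introduced here for illustration only: R is the receptor, L1 and L2 are two similar ligands, and the "alchemical" legs, which convert L1 into L2 in solution and in the complex, are the quantities a simulation can estimate. Because free energy is a state function, the changes around the closed cycle sum to zero, so the difference in binding free energy equals the difference between the two alchemical legs. The last line gives the free energy perturbation estimator, one standard way of evaluating such a leg; it is shown as an example and is not necessarily the estimator used in later chapters.

```latex
% Cycle:  R + L_1 --(\Delta G_{bind,1})--> RL_1
%         R + L_2 --(\Delta G_{bind,2})--> RL_2
%         L_1 --> L_2 in solution (\Delta G_{sol}),  RL_1 --> RL_2 (\Delta G_{complex})
\begin{align}
  \Delta G_{\mathrm{bind},1} + \Delta G_{\mathrm{complex}}
    - \Delta G_{\mathrm{bind},2} - \Delta G_{\mathrm{sol}} &= 0 \\
  \Delta\Delta G_{\mathrm{bind}}
    \equiv \Delta G_{\mathrm{bind},2} - \Delta G_{\mathrm{bind},1}
    &= \Delta G_{\mathrm{complex}} - \Delta G_{\mathrm{sol}} \\
  \Delta G_{A \to B}
    &= -k_{B} T \,\ln \left\langle
       \exp\!\left[-\frac{U_{B} - U_{A}}{k_{B} T}\right]
       \right\rangle_{A}
\end{align}
```

Here U_A and U_B are the potential energies of the same configuration evaluated with the force field parameters of states A and B, and the angle brackets denote an average over configurations sampled from state A. The same cycle applies to protein stability if "bound" and "free" are replaced by folded and unfolded states and the ligand change by a wild type to mutant substitution.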


1.7 Summary

Computer simulation is one of a number of techniques which can be applied in the area of structural molecular biology. It is possible to both understand and predict conformational, dynamic and thermodynamic properties of macromolecules from knowledge about the interactions of their constituent atoms. There are continuous technical developments and comparisons with experimental data which are leading to more precise and accurate understanding of these complex systems.

References

[1] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., Teller, E., J. Chem. Phys. 1953, 21, 1087-1092.
[2] Verlet, L., Phys. Rev. 1967, 159, 98-103.
[3] Gear, C. W., Numerical Initial Value Problems in Ordinary Differential Equations, Prentice Hall, New York, 1971.
[4] Beeman, D., J. Comp. Phys. 1976, 20, 130-.
[5] Rahman, A., Phys. Rev. 1964, 136A, 405-411.
[6] Kaufmann, W. J. III, Smarr, L. L., Supercomputing and the transformation of science, Scientific American Library, Freeman and Co., New York, 1993.
[7] Berendsen, H., Curr. Opin. Struct. Biol. 1991, 1, 191-195.
[8] Goodfellow, J. M., Williams, M. A., Curr. Opin. Struct. Biol. 1992, 2, 211-216.
[9] van Gunsteren, W. F., Curr. Opin. Struct. Biol. 1993, 3, 277-282.
[10] Allen, M. P., Tildesley, D. J., Computer Simulation of Liquids, Clarendon Press, Oxford, 1987.
[11] McCammon, J. A., Harvey, S. C., Dynamics of proteins and nucleic acids, Cambridge University Press, Cambridge, 1987.
[12] van Gunsteren, W. F., Berendsen, H. J. C., Angew. Chem. Int. Ed. Engl. 1990, 29, 992-1023.
[13] Mark, A. E., van Gunsteren, W. F., Biochemistry 1992, 31, 7745-7748.
[14] Daggett, V., Levitt, M., Proc. Natl. Acad. Sci. USA 1992, 89, 5142-5146.
[15] Tirado-Rives, J., Jorgensen, W. L., Biochemistry 1991, 30, 3864-3871.
[16] Tirado-Rives, J., Jorgensen, W. L., Biochemistry 1993, 32, 4175-4184.
[17] Beveridge, D. L., DiCapua, F. M., Annu. Rev. Biophys. Biophys. Chem. 1989, 18, 431-492.
[18] Momany, F. A., Carruthers, L. M., McGuire, R. F., Scheraga, H. A., J. Phys. Chem. 1974, 78, 1595-1620.
[19] Momany, F. A., J. Phys. Chem. 1975, 79, 2361-2381.
[20] Lifson, S., Hagler, A. T., Dauber, P., J. Am. Chem. Soc. 1979, 101, 5111-5121.
[21] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., Karplus, M., J. Comp. Chem. 1983, 4, 187-217.
[22] Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profeta, S., Weiner, P., J. Am. Chem. Soc. 1984, 106, 765-784.


[23] Weiner, S. J., Kollman, P. A., Nguyen, D. T., Case, D. A., J. Comp. Chem. 1986, 7, 230-252.
[24] Nilsson, L., Karplus, M., J. Comp. Chem. 1986, 7, 591-616.
[25] Jorgensen, W. L., Tirado-Rives, J., J. Am. Chem. Soc. 1988, 110, 1657-1666.
[26] Hagler, A. T., Maple, J. R., Thacher, T. S., Fitzgerald, G. B., Dinur, U., Potential energy functions for organic and biomolecular systems in: Computer simulation of biomolecular systems. Theoretical and experimental applications, van Gunsteren, W., Weiner, P. K. (eds.), 149-167, Escom, Leiden, 1990.
[27] Lii, J.-H., Allinger, N. L., J. Comp. Chem. 1991, 12, 186-199.
[28] Hall, Pavritt, N., J. Comp. Chem. 1984, 5, 441-450.
[29] Roterman, Gibson, K. D., Scheraga, H. A., J. Biomol. Struct. & Dyn. 1989, 7, 391-420.
[30] Roterman, Lambert, M. H., Gibson, K. D., Scheraga, H. A., J. Biomol. Struct. & Dyn. 1989, 7, 421-454.
[31] Beveridge, D. L., Swaminathan, S., Ravishanker, G., Withka, J. M., Srinivasan, J., Prevost, C., Louise-May, S., Langley, D. R., DiCapua, F. M., Bolton, P. H., ‘Molecular Dynamics simulations on the Hydration, Structure and Motion of DNA oligomers’ in: Water in Biological Macromolecules, Westhof, E. (ed.), Macmillan, 1993.
[32] Stone, A. J., Alderton, M., Mol. Phys. 1985, 56, 1047-1064.
[33] Price, S. L., Mol. Simul. 1988, 1, 135-156.
[34] Faerman, C., Price, S. L., J. Am. Chem. Soc. 1990, 112, 4915-4926.
[35] Guenot, J., Kollman, P. A., J. Comp. Chem. 1993, 14, 295-311.
[36] Shimada, J., Kaneko, H., Takado, T., J. Comp. Chem. 1994, 15, 28-43.
[37] Pearlman, D. A., Kollman, P., J. Mol. Biol. 1991, 220, 457-479.
[38] Soman, K. V., Karimi, A., Case, D. A., Biopolymers 1991, 31, 1351-1361.
[39] Tuckerman, Berne, B., J. Chem. Phys. 1991, 95, 8362-8364.
[40] Merz, J. E., Tobias, D. J., Brooks, C. L. III, Singh, U. C., J. Comp. Chem. 1991, 12, 1270-1277.
[41] Raine, A. R. C., Mol. Simul. 1991, 7, 59-69.
[42] Windemuth, A., Schulten, K., Mol. Simul. 1991, 12, 175-179.
[43] Jones, D. M., Goodfellow, J. M., J. Comp. Chem. 1993, 14, 127-137.
[44] Kuriyan, J., Osapay, K., Burley, S. K., Brunger, A., Hendrickson, W. A., Karplus, M., Proteins 1991, 10, 340-358.
[45] Torda, A. E., van Gunsteren, W. F., Comput. Phys. Commun. 1991, 62, 289-296.
[46] Herzyk, P., Neidle, S., Goodfellow, J. M., J. Biomol. Struct. & Dyn. 1992, 10, 97-140.
[47] Parker, K., Cruzeiro-Hansson, L., Goodfellow, J. M., J. Chem. Soc. Faraday Trans. 1993, 89, 2637-2650.
[48] Cruzeiro-Hansson, L., Swann, P. F., Pearl, L., Goodfellow, J. M., Carcinogenesis 1992, 13, 2067-2073.
[49] Hermans, J., Curr. Opin. Struct. Biol. 3, 270-276.
[50] Verkhivker, G., Elber, R., Gibson, Q. H., J. Am. Chem. Soc. 1992, 114, 7866-7878.
[51] Ech-Cherif El-Kettani, M. A., Durup, J., Biopolymers 1992, 32, 561-575.


Computer Modelling in Molecular Biology
Edited by Julia M. Goodfellow
© VCH Verlagsgesellschaft mbH, 1995

2 Modelling Protein Structures

Tim J. P. Hubbard¹ and Arthur M. Lesk²

Centre for Protein Engineering and Department of Haematology, Medical Research Council Centre, Hills Road, Cambridge, CB2 2QH, England

Contents

2.1 Introduction
2.1.1 The Difficulty of Protein Structure Prediction
2.1.2 The Idea of Homology Between Proteins
2.1.3 A Summary of What Can and Cannot be Predicted
2.2 Proving a Sequence/Structural Relationship
2.3 Modelling Starting from a Known Structure
2.3.1 Evolution of Protein Structures
2.3.2 Techniques
2.3.2.1 Alignment and Division into SCR’s and SVR’s
2.3.2.2 Modelling Loop Regions
2.3.2.3 Side Chain Building and Optimisation of Side Chain Conformation
2.3.3 Available Modelling Programs
2.4 Modelling de novo: Structure Prediction
2.4.1 A Family of Similar Sequences
2.4.2 A Lone Sequence or a Designed Sequence: no Multiple Sequence, no Known Relatives
2.5 Future Possibilities
2.6 Summary
References


2.1 Introduction

The modelling of protein structures comprises a wide variety of activities, covering a multitude of sins plus an occasional good deed. Thus, although we shall include discussion of claims in addition to hard results (else there would be little to write about) we emphasise that “what you get is what you see”, not what people tell you that they get: Any method that has not been subjected to controlled blind tests is of dubious worth.

It is useful to start by classifying the types of methods used according to the starting information, or “input”; and the expected nature and quality of the results, the “output”. This paper is not intended as a comprehensive review of research on the protein folding and protein structure prediction problems. Nor is it a step-by-step guide to constructing a protein model. Instead it is an introduction to a number of methods that can be applied now to problems of modelling the structure of a protein sequence, with the emphasis on allowing the reader who wants to build a model to decide whether his goals are currently practicable.
Improvements that can reasonably be expected in the future are also outlined.

2.1.1 The Difficulty of Protein Structure Prediction

It is generally accepted that the amino acid sequences of proteins contain sufficient information to specify how the linear chain folds up into a compact 3-D structure. The evidence for this is the type of protein refolding experiments carried out first by Anfinsen [1] and extended by many workers to other systems. The existence of “chaperone” proteins, which are in some cases necessary for protein folding, modifies but does not overturn the fundamental general principle [2]. It should be noted that the ability of many intact proteins to refold after denaturation proves that their folding does not depend on the process of protein synthesis: it cannot be true that the initially-synthesised N-terminus must serve as a nucleus for folding.

We take as our starting point, therefore, that the amino acid sequence of a protein determines the conformation. Nature therefore has an “algorithm” for mapping a set of amino acid sequences into three-dimensional structures.
There are two possible approaches to trying to predict structure from sequence: deductive methods based on general physico-chemical principles (this is what nature does), and inductive approaches based on studies of the known protein structures, including modelling by homology.

Since the folding of an extended chain is a dynamic process the most physically realistic approach to the problem is the simulation of the motion of all atoms of a protein chain (molecular dynamics: MD) [3]. There are a number of currently insurmountable problems with this approach that have deprived it of the success it should in principle someday achieve. The major problems are limitations on computer time (currently time intervals of at most nanoseconds can be simulated with today’s fastest computers, whereas protein folding is thought to occur over seconds); and the problem of incomplete and inexact representation of the thermodynamic interactions - particularly the problem of appropriately representing the protein-water interaction (proteins fold in an aqueous environment which must be represented explicitly and accurately) - and other uncertainties about the interaction potentials used between atoms in the system, particularly the electrostatic terms. MD can however be usefully applied to modelling problems where there are sufficient conformational restraints.

All other a priori prediction methods are less physically realistic than the above and involve either unproven assumptions or unlikely generalisations (or both). No method that might be expected to become generally successful is in sight.

As a priori prediction is unsuccessful, we are fortunate that - unlike a protein chain folding in vivo - modellers can make use of knowledge from all known protein sequences and structures when predicting a protein fold.
It is the extraction and application of this information that is the basis for all successful prediction methods. Central to such methods is the idea of homology.

2.1.2 The Idea of Homology Between Proteins

Two proteins are homologous if they are related by natural evolutionary processes. Many homologous sequences are sufficiently closely related that their amino acid sequences can be aligned so that the number of similar or identical pairs of amino acids at aligned positions is greater than expected by chance. An alignment may contain gaps because as a protein sequence evolves both mutation and insertion/deletion events can occur in the encoding DNA.

Forming an accurate alignment of the amino acid sequences is absolutely essential for useful model building. When the divergence of the sequences has left no fewer than about 40% of the residues identical in an optimal alignment, it is likely that the standard sequence-alignment methods will provide a correct alignment [4]. Indeed, such an alignment is prima facie evidence for homology, which of course can only rarely be detected directly. (The exceptions are cases of obvious gene duplication and divergence. A classic example of this is the two domains of rhodanese [5].) When sequences have diverged substantially farther than this 40% threshold, it may be impossible to determine the correct alignment from pairs of sequences only, but it may be possible to determine a correct alignment from a comparison of the structures, provided of course that they are available. Also, multiple sequence alignments are much more informative than alignments of only a single pair of sequences. In extreme cases, the sequences may have diverged so far that even the fact of a relationship cannot be detected from the sequences alone, and it is usually impossible to distinguish true homology from convergent evolution. We shall discuss this point in more detail below.

Close similarities among protein sequences allow proteins of different function or from different organisms to be clustered into families. This clustering provides evidence for homology, indicating that each member of a family is evolutionarily related and derived from a common ancestor. It is observed that homologous protein sequences also have similar 3-D structures and the relationship between sequence divergence and structure divergence has been quantified [6, 7]. But structural similarity is often observed even if no sequence homology can be detected. A well-known example is the family containing hexokinase, heat shock protein 70 (hsp70) and actin: although many sequences were available, these were not known to be related until the structures of hsp70 and actin were solved and found to be very similar [8]. The converse - sequence similarity without structural similarity - has not been observed for anything other than very short peptides [9]. The conclusion is that protein structure is better conserved in evolution than protein sequence, and sufficient overall sequence similarity between proteins implies homology and a similarity of conformation.
This idea underlies the methods of protein modelling we shall discuss.

2.1.3 A Summary of What Can and Cannot be Predicted

Figure 2-1 contains a representation of the current state of the art of protein structure prediction. Vertically we characterise the input information: how much is known about proteins related to the target protein. Horizontally we characterise the output information: the more information we have about relatives of the target sequence, the more - and the higher the quality - of the predictions we can make.

Both input (unknown sequence) and output (structure prediction) axes can be divided into two distinct parts:

For an input above the dashed horizon line, the sequence to be modelled can be shown to be homologous to at least one other sequence the 3-D structure of which is known. An input below the horizon line is homologous only to other sequences the structures of which are unknown, or no homologies are known at all.

For an output to the left of the dashed vertical line 3-D structures are predictable with some degree of confidence and accuracy. Right of the vertical line a 1-D structure (i.e. secondary structure prediction), with perhaps hints of super-secondary structure, is the best result that is likely to be obtained.

[Figure 2-1: State of Protein Structure Prediction. Input sequence categories (vertical axis), from most to least information: sequence same as protein of known structure in different state of ligation; sequence differs from proteins of known structure only at point mutations; sequence differs from proteins of known structure only in loop regions; sequence closely related (high residue identity) to at least one protein of known structure; sequence distantly related to several proteins of known structure; sequence related to many sequences but none of known structure; sequence unrelated to any other known sequence. Output prediction categories (horizontal axis): high resolution coordinate set; rough model (3-D structure); general fold (2-D structure); secondary structure (1-D structure). Methods marked on the figure: energy minimization and molecular dynamics; specialized homology modelling; conformational searching; general homology modelling; ‘threading’; motifs (Prosite); predicted interactions between secondary structural units; secondary structure prediction. The dashed horizontal line separates “structural information” from “no structural information”; the dashed vertical line separates n-D prediction from 1-D prediction.]

Figure 2-1. An overview of the state of the art in structure prediction. This figure is organised into different categories of potential input information and different categories of potential output information. It describes the dependence of the quality and extent of detail of possible inferences on the information available when undertaking a modelling project. The terms 1-D, 2-D and 3-D structure under “Output Prediction” refer to the information predicted: 1-D means that only the structural state (e.g. α-helix, β-strand, turn, coil) of a residue is predicted. 2-D means that a list of interactions to each residue is predicted (e.g. contact map). 3-D means that the geometry of interactions of each residue is predicted (e.g. full three-dimensional model). Note that the prediction of 1-D information may involve contributions from 2-D information (e.g. Secondary Structure Prediction (1-D) involves terms linking residue i with residues i - n ... i + n).

The main message of this figure is that if the input sequence cannot be linked to any protein of known structure then no >1-D structure can be predicted with any confidence. This is a restatement of the protein folding problem outlined in Section 2.1.2. What is interesting is how the dashed lines are moving: For the input the horizontal is moving with respect to the proportion of the sequences of unknown structure which lie on each side. Progress is being made in aligning all known protein sequences to at least one known structure, as the sensitivity of the alignment programs improves and the database of known structures increases in size. A recent test case analysis of the 182 proteins identified in Yeast chromosome III [10, 11] showed that


14% of sequences could be associated with a protein of known structure using standard sequence alignment methods.

For the output, the dividing line may soon become blurred if it becomes possible to predict more than just secondary structure (1-D) when families of homologous sequences are considered together. The Yeast chromosome III analysis found that 24% of sequences could be associated with an existing sequence family that had no known structure. As more proteins are sequenced such families are coming to have increasingly large numbers of members with wider sequence diversity. Since related sequences may all be expected to adopt the same fold, any prediction must be consistent with each sequence in such a family. This is a considerable restriction and has allowed significant improvements in 1-D secondary structure prediction [12-14], the latter method being available to anyone with access to electronic mail (send “help” to Predictprotein@embl-heidelberg.de). Since the number of natural folds is thought to be finite and may be as small as 1000 [15] there will come a time when all new sequences can be associated with a known protein structure. There is therefore something of a race between various methods - fold recognition versus fold prediction - that seek to eliminate the current “unpredictable” region of sequence space.

Figure 2-1 does not include all protein modelling exercises, as it omits designed sequences.
It is important to realise that even if methods to predict a structure consistent with a large family of sequences are developed, this is not a solution of the folding problem. The assumptions that (1) any sequence folds and (2) folds are similar among homologous sequences are based on evolutionary reasoning, for sequences that do not fold would be selected against and would not therefore be observed by chance, and significant sequence homologies are only likely to occur through divergent evolution, i.e. from a single fold. Designed sequences may not fold like the sequence to which they appear to be related and in many cases may not fold at all. In order to be able to predict the structure of a designed sequence it will be necessary to predict structure from individual sequences, ignoring evolutionary relations, i.e. to solve the a priori folding problem [16].

2.2 Proving a Sequence/Structural Relationship

The first stage in any modelling project should be to compare the sequence of the protein of interest with the contents of sequence databases. There are many sequence alignment programs available that can do this with varying speed and sensitivity. The objective is to find homologous sequences of known structure, but finding any homologous sequence is useful since it provides additional information about the protein to be modelled.
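
Once a database search has returned a candidate relative, the residue-identity figure quoted in Section 2.1.2 (about 40% identical residues in an optimal alignment) can be checked directly from the pairwise alignment. The following is a minimal sketch, not a substitute for an alignment program: it assumes the two sequences have already been aligned and padded with a gap character, and it assumes that identity is counted over columns in which both sequences have a residue (other conventions for the denominator exist). The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns without a gap.

    Assumption: both strings come from the same pairwise alignment and
    therefore have equal length; columns with a gap in either sequence
    are ignored when counting.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def above_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 40.0) -> bool:
    """True if the alignment meets the (assumed) identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy alignment: six gap-free columns, five of them identical.
    print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```

A value above the threshold suggests only that standard alignment methods are likely to be reliable for the pair; it says nothing about alignments that fall below it, which need the more sensitive methods discussed in this section.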


The database of known structures, the PDB (Protein Data Bank) [17, 18] contains more than 3000 experimentally determined protein structures (Jan95 release) although by sequence homology these can be clustered into fewer than 400 distinct families [19], and by structural superposition into perhaps not more than 150 folds [20]. For each of these structures a file exists which contains all clearcut alignments between the sequence of that structure and all sequences in the protein sequence database Swissprot [21]. These “HSSP” files (homology-derived secondary structure of proteins) [22] are available by anonymous FTP over internet from ftp.embl-heidelberg.de. If the sequence in question is not listed in any HSSP file it does not necessarily mean there is no relationship to any known structure: either the sequence is too new to be in the version of Swissprot used to generate the HSSP files or any homology is too weak to be identified by such a method. Clearcut homology is considered to exist where more than 40% of residues are identical in both sequences after alignment. HSSP files contain weaker homologies than this (down to around 30%) although such alignments should be evaluated carefully. Still weaker alignments may be detected by other methods: The detection of weak homologies by sequence methods alone is a science in itself.
A large number of methods are available but it requires experience to distinguish a real alignment from a false, random one [23, 24].

To try to detect structural similarities where sequence homologies are near the noise threshold, additional information must be included in the alignment procedure. The two possible sources are multiple sequence information and structural information.

Rather than try to detect overall homologies, an alternative approach is to look for conserved sequence motifs. These are short regions of conserved sequence and can be found by examining a multiple sequence alignment. If a number of conserved motifs can be found, a search “template” can be constructed, being a series of motifs linked by variable lengths of connecting sequence. If an input sequence is a member of a sequence family, a template can be constructed for that family and used to search the sequence database looking for a match to a sequence of known structure. There are also motif databases, collected in Amos Bairoch’s Prosite [25], that can be scanned with appropriate software [26]. An example of the use of this technique to produce a successful fold recognition and subsequent modelling was the recognition that the HIV protease resembles half of an aspartic protease and as a dimer has the same active site [27].
In this case, although it was only the active site sequence Asp-Thr-Gly that was clearly a conserved motif, the protein chains turned out to have very similar folds.

Nevertheless, a conserved motif or motifs does not prove a global structural similarity and could even be a result of convergent rather than divergent evolution. For instance, the GTP binding site motif GxGxxG is common to a large number of protein families with folds of substantially different topology but which share a common active site [25]. Therefore identification of a protein by a motif may permit inferences about and even a model of a binding site, but it may not be possible to extend the model to the entire structure. In general, matching folds based on small fragments must be done with care.

Although the use of sequence templates has made weak structural relationships detectable, the weakness of the method is that it tends to concentrate on small regions of the sequence and does not test if two sequences are likely to have the same fold. Recently it was realised that traditional sequence alignment ignores a lot of potential extra information if the structure corresponding to one of the sequences is known [28, 29]. Rather than aligning sequences using global residue exchange matrices as is done in normal sequence alignment, different exchange matrices are used for different residue positions depending on structural environment [30, 31]. In many ways this is similar to sequence template methods except that the substitution pattern used at each position in the sequence is derived from the expected substitution pattern for the structural environment at that point (e.g. helix/sheet/coil; buried/exposed; polar/nonpolar neighbours, Figure 2-2) rather than the observed substitution pattern at a position in a multiple sequence alignment. An extension of this approach is to consider residue-residue interactions in the known structure in the form of a potential for further constraints on an alignment [32-34].

[Figure 2-2: diagram of residue environment classes. Recoverable labels: fraction polar; environment; buried; partly exposed.]

Figure 2-2. Bowie, Luethy and Eisenberg [30] characterise the environments of residues in proteins in three categories: the degree of their exposure to solvent, the polarity of the atoms with which they are in contact (six classes are shown here), and secondary structure: helix, sheet and other. This gives a total of 3 × 6 = 18 classes. The statistical preference of certain amino acids for certain classes can be applied to methods for identifying folding patterns and detection of errors in structures.
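
The counting in the caption of Figure 2-2, and the way such environment classes can be used to score a sequence against a known structure, can be illustrated with a short sketch. Everything specific below is an assumption made for illustration: the class labels are placeholders, the preference table is a toy with invented values (a real table would be derived statistically from known structures), and the scoring function handles only an ungapped, position-by-position comparison rather than a full alignment.

```python
from itertools import product

# Six burial/polarity classes and three secondary-structure states.
# The labels are illustrative placeholders, not taken from the text.
BURIAL_POLARITY = ("B1", "B2", "B3", "P1", "P2", "E")
SECONDARY = ("helix", "sheet", "other")

# Every combination of the two descriptors: 6 x 3 = 18 environment classes.
ENVIRONMENTS = [f"{bp}/{ss}" for bp, ss in product(BURIAL_POLARITY, SECONDARY)]


def profile_score(sequence, environments, table, default=0.0):
    """Sum the preference of each residue for the environment at its position.

    `table` maps (environment, residue) to a preference score; pairs that
    are missing from the table contribute `default`.
    """
    if len(sequence) != len(environments):
        raise ValueError("need exactly one environment class per residue")
    return sum(
        table.get((env, res), default)
        for env, res in zip(environments, sequence)
    )


if __name__ == "__main__":
    # Toy preference table with invented values, for illustration only.
    toy_table = {
        ("B1/helix", "L"): 1.0,   # hydrophobic residue in a buried helix
        ("B1/helix", "K"): -1.0,  # charged residue in a buried helix
        ("E/other", "K"): 0.5,    # charged residue in an exposed loop
    }
    structure_envs = ["B1/helix", "E/other"]
    print(len(ENVIRONMENTS))
    print(profile_score("LK", structure_envs, toy_table))
    print(profile_score("KL", structure_envs, toy_table))
```

In this toy case the sequence that places the hydrophobic residue in the buried position scores higher than the reversed sequence, which is the kind of discrimination the environment-based methods rely on.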


The use of structural information in sequence alignment can be thought of as “threading” a sequence of unknown structure onto the fold of the known structure and measuring the quality of the fit [35, 36]. Such techniques are more sensitive than traditional sequence alignment techniques but it is not yet clear by how much. At present no method can cluster folds in the structural database as well as searches using structural superposition [37]. For example, a very common fold is that of the α/β (TIM) barrel. At least 20 examples of this fold exist, the relationship between which cannot be recognised by sequence alignment methods [38]. Although it seems likely that at least some barrels may have evolved by convergent evolution [39], a perfect fold recognition algorithm should be able to cluster them all together in an unambiguous way due to the particular symmetry of this fold. Although there has been progress on this front [40] no method can currently achieve this without either missing some or including unrelated folds.

If after use of these methods an input sequence can be linked to a known structure, prediction of the structure can proceed through homology modelling (Section 2.3). If not, pure structural prediction methods must be used (Section 2.4). In many cases it will be unclear whether a fold has been recognised or not. The only solution is to try to build a model based on the presumed structural similarity and then attempt to test its validity (in a similar way that a designed sequence is tested for compatibility with the fold it is meant to adopt (Section 2.4.2)). Even then the results may well be ambiguous.
A current example of such uncertainty is the model of Hsp70 C-terminal domain based on the HLA binding site [41]. Despite considerable detailed analysis it remains difficult to decide, without an experimental structure determination, whether this model is right or wrong.

2.3 Modelling Starting from a Known Structure

2.3.1 Evolution of Protein Structures

In order to understand the possibilities and limitations of model-building by homology, it is necessary to appreciate the kinds of structural changes that occur as proteins diverge. Numerous studies analysing the structural relationships between related proteins [42], and the dependence of structural divergence on sequence divergence [6, 7], provide us with the ability to estimate quantitatively how successful a model-building exercise can be, knowing how closely the target protein and its homologues are related.

Natural variations in families of homologous proteins reveal how the structures accommodate changes in amino acid sequence. There are several accepted measures of


divergence of sequences; we have used the percent identical residues in the alignment of the sequences. There is the corresponding problem of calibrating a measure of the similarity of two or more structures, or portions of structures. A useful mathematical technique is to determine the optimal “least-squares” superposition of a pair of structures or parts of structures. By this we mean the following: We fix the position and orientation of one of the structures, and vary the position and orientation of the other to find the minimum value of the sum of the squares of the distances between the corresponding atoms. The square root of the average value of the squared distances between corresponding atoms is the root-mean-square (r.m.s.) deviation. If the two objects were precisely congruent, it would be possible to superimpose them exactly, and the r.m.s. deviation would be zero. In real cases, the “fit” of two nonidentical structures is never exact, and the minimal r.m.s. deviation is a quantitative measure of the structural difference.

Included in the approximately 3000 protein structures now known are several members of families in which the molecules maintain the same basic folding pattern over ranges of sequence homology from near-identity down to below 20%. In both closely and distantly related proteins the general response to mutation is conformational change.
The maintenance of function in widely divergent sequences requires the integration of the response to mutations over all or at least a large portion of the molecule.

It is the ability of protein structures to accommodate mutations in nonfunctional residues that permits a large amount of apparently nonadaptive change to occur. Residues active in function, such as the proximal histidine of the globins or the catalytic serine, histidine and aspartate of the serine proteases, are resistant to mutation because changing them would interfere, explicitly and directly, with function. Most buried residues are in the well-packed interfaces between helices and sheets. During the course of evolution, the buried residues remain hydrophobic, but can change size. Mutations that change the volumes of buried residues generally do not change the conformations of individual helices or sheets, but produce distortions of their spatial assembly. These tend to take the form of rigid-body shifts and rotations, which may be as large as 7 Å, but more typically are 3-5 Å. Surface residues not involved in function are usually free to mutate. Loops on the surface can often accommodate changes by local refolding [43].

The nature of the forces that stabilise protein structures sets general limitations on these conformational changes; other constraints derived from function vary from case to case. In some protein families large movements are coupled to conserve the structure of the active site (e.g., the globins); in others, active sites of alternative structure are found (e.g., cytochromes c). In proteins that for functional reasons cannot tolerate conformational change - such as those with multiple binding sites that must maintain a relative spatial disposition, or those that must maintain a surface involved in complex formation - amino acid sequences are more highly conserved.
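
The least-squares superposition described earlier in this section can be restated compactly. The symbols below are introduced here for illustration and are not taken from the original text: x_i and y_i are the positions of the N pairs of corresponding atoms in the fixed and the moving structure, R is a proper rotation, and t is a translation applied to the moving structure.

```latex
\[
\text{r.m.s. deviation}
  \;=\; \min_{R,\,t}\;
  \sqrt{\frac{1}{N}\sum_{i=1}^{N}
        \bigl\lVert x_i - \left(R\,y_i + t\right) \bigr\rVert^{2}}
\]
```

The minimum is zero only when the two sets of atoms are exactly congruent; otherwise its value measures the structural difference that remains after the best possible fit.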


Families of related proteins tend to retain similar folding patterns. If one examines sets of related proteins (see Figures 2-3 and 2-4) it is clear that although the general folding pattern is preserved, there are distortions which increase progressively as the amino acid sequences diverge. These distortions are not uniformly distributed throughout the structure. Instead, in any family of proteins there is a core of the structure that retains the same qualitative fold, and other parts of the structure that change conformation radically. To explain the idea of the common core of two structures, consider the letters B and R. Considered as structures they have a common core which corresponds to the letter P. Outside the common core they differ: at the bottom right B has a loop and R has a diagonal stroke.

Figure 2-3. Two closely-related proteins: (a) actinidin (crystal structure by E. N. Baker and E. J. Dodson [90]) and (b) papain (crystal structure by I. G. Kamphuis et al. [91]). The amino acid sequences of these molecules have about 50% identical residues.

Figure 2-4. Two distantly-related proteins: (a) poplar leaf plastocyanin (crystal structure by J. M. Guss and H. C. Freeman [92]) and (b) A. denitrificans azurin (crystal structure by G. E. Norris, B. F. Anderson and E. N. Baker [93]). The circle near the top of the structure marks the position of the copper. In this case the double β-sheet portion of these molecules retains the same fold, but the long loop at the left changes its conformation completely.
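
The percent residue identity quoted in these captions can be computed from a pairwise alignment. The following is a minimal sketch rather than the authors' procedure: the function name `percent_identity`, the use of `-` as the gap character, and the decision to count only columns in which both sequences have a residue are assumptions made here, and the toy sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue (assumption:
    # gapped columns are treated as non-alignable and are not counted).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Invented toy alignment: 7 gap-free columns, 6 of them identical.
print(round(percent_identity("ACDE-FGHK", "ACDQWFGH-"), 1))
```

Restricting the count to gap-free columns mirrors the idea that identity is only meaningful over residues that can be aligned; other conventions (for example, dividing by the length of the shorter sequence) give different numbers for the same alignment.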


It should be emphasised that only the residues of the core can be aligned, that is, there is some correct residue-residue correspondence that links residues with the same structural context in two proteins. If one deals with sequences only, the standard sequence alignment procedures will reproduce this correct structure alignment provided the sequences are sufficiently closely related. However, an error that is often made is to suggest, on the basis of sequences alone, that residues outside the core “align poorly”. What is in fact happening is that the residues outside the core cannot be aligned at all, because so much insertion and deletion has taken place that the trace of evolution has been entirely obscured. The examples of plastocyanin and azurin illustrate this point well. Of course, without analysis of the structures, this is not easy to detect, but it would be well if molecular biologists would stop thinking about “well-aligning” regions and “poorly-aligning” regions, and think instead about “alignable” and “nonalignable” regions.

Figure 2-3, showing actinidin and papain, illustrates two structures that are quite closely related. The sequences of these molecules have 49% residue identity in the common core. The common core consists of almost the entire structure except for small loop regions on the surface. The structural deviation is very small: the Cα atoms of the residues of the common core can be superposed to within an average deviation of 0.77 Å.

Figure 2-4, showing plastocyanin and azurin, shows two distantly-related proteins. In this case the common core is limited to less than 50% of the structure. It is clear that the long loop at the left has entirely refolded. (The fact that this region contains a helix in each molecule does not imply that the helices are homologous: in fact they are independent.) Nevertheless, the selective constraint on function has preserved the geometry of the copper-binding site.

Systematic studies of the structural differences between pairs of related proteins have defined a quantitative relationship between the divergence of amino acid sequence of the core of a family of structures and the divergence of structure. As the sequence diverges, there are progressively increasing distortions in the main chain conformation, and the fraction of the residues in the core usually decreases. Until the fraction of identical residues in the sequence drops below about 40-50%, these effects are relatively modest: almost all the structure remains in the core, and the deformations of the main chain atoms are on average no more than 1.0 Å. Actinidin and papain illustrate this regime (Figure 2-3).
With increasing sequence divergence, some regions refold entirely, reducing the size of the core, and the distortions of the residues remaining within the core increase in magnitude. Plastocyanin and azurin illustrate this effect (Figure 2-4).

Figure 2-5 shows results from comparing pairs of homologous proteins from related families, including globins, cytochromes c, immunoglobulin domains, serine proteases, lysozymes, sulphydryl proteases, dihydrofolate reductases, and plastocyanin-azurin. Each point corresponds to a pair of proteins: After determining the core of the structure, the number of identical residues in the aligned sequences of the core was counted, and the root-mean-square deviation of the main chain atoms of the core was calculated. (The points corresponding to 100% residue identity are proteins for which the structure was determined in two or more crystal environments, and the deviations show that crystal packing forces can modify slightly the conformation of the proteins.) Figure 2-6 shows the changes in the fraction of residues in the core as a function of sequence divergence. In pairs of distantly related proteins the size of the cores can vary: In some cases the fraction of residues in the core remains high, in others it can drop to below 50% of the structure.

Figure 2-5. The relationship between the divergence of the amino-acid sequence of the core of related proteins and the divergence of the main chain conformation of the core. (Horizontal axis: percent residue identity.)

Figure 2-6. The relationship between the divergence of the amino-acid sequence of the core of related proteins and the relative size of the core. (Horizontal axis: sequence identity, %.)

2.3.2 Techniques

A general outline of the steps involved in model building by homology is as follows:

1. The sequence of unknown structure is aligned to the sequence(s) of known structure and the sequence of unknown structure is divided into SCR’s and SVR’s: regions where the alignment has sufficient sequence conservation to be conserved structurally (alignable) are defined as SCR’s (Structurally Conserved Regions). The remaining regions (nonalignable - including but not restricted to loop regions) are defined as SVR’s (Structurally Variable Regions).
2. The main chain conformation and spatial relationship of the SCR’s are taken from the coordinates of the known structure to which they were aligned. Conformations are generated for each SVR in the sequence, with correct endpoint geometry and length, which do not clash sterically with the rest of the structure, either using a database search method [44] or any alternative approach. This creates a complete continuous main chain model.
3. Side chains are built onto the main chain model and their conformations optimised.

2.3.2.1 Alignment and Division into SCR’s and SVR’s

Discovering a relationship between an input sequence and a structure does not necessarily give a full or accurate alignment. Frequently more sensitive alignment techniques not designed for fold recognition can give a more accurate alignment once the sequences to align have been identified. It is important to keep the lessons from evolution in mind (Section 2.3.1): There will be regions where there are several possible alternative alignments and there will be regions that cannot be aligned (nonalignable) because they are structurally different.

The first of these problems can be tackled by using as much information as possible (multiple sequence alignments for both known and unknown) and by exploring significant alternative sub-optimal alignments [45, 46], and if necessary, by building multiple structural models using alternative alignments.

The second problem is to distinguish the regions in which the input sequence has the same fold as the model (SCR’s) and where it is different (SVR’s). Clearly where there are insertions and deletions the chain trace of the model must be different; however, regions on either side of any deletion may also have a changed conformation. Secondary structure prediction (SSP) may be useful at this point to see whether there is a strong change in the prediction at any point where the alignment is weak. Regions that have been identified as SVR’s must be predicted by methods described in the next section, as “loops” connecting the two chain ends of the preceding and following SCR’s.

2.3.2.2 Modelling Loop Regions

The term “loops” refers to sections of the polypeptide chain that connect regions of secondary structure. Frequently, helices and strands of sheet run across a protein or domain from one surface to another, and loops are characterised by (a) appearing on the surfaces of proteins and (b) reversing the direction of the chain. A typical globular protein contains one third of its residues in loops.

In model-building by homology, loops often present special problems because they are often the sites of insertions and deletions. Frequently the residues cannot be aligned with those of the parent molecule because of this. Special techniques have therefore been developed to build loops, assuming that the core of the target protein has already been modelled.

Hairpin loops (those that connect successive strands of antiparallel β-sheet) have been studied extensively to classify them and to elucidate the determinants of their conformations [47-57].
Most residues of proteins have their main chains in one of two sterically-favourable conformations (these correspond to the conformations of α-helices and β-sheets). However, in order for a short region of polypeptide chain 3-4 residues in length to reverse direction, and fold back on itself to form a loop, a residue that takes up a conformation outside these usual states is generally required. The conformations of short loops therefore depend primarily on the position within the loop of special residues - usually Gly, Asn or Pro - that allow the chain to take up an unusual conformation. As pointed out by Sibanda and Thornton [55], the conformation of a short hairpin can often be deduced from the position in the sequence of such special residues.

These general rules are however of limited utility for the understanding and prediction of the conformations of many functionally important loop regions, for instance the antigen-binding loops of immunoglobulins. Many loops are not short, or not hairpins, or neither; and the determinants of their conformations are not entirely intrinsic to the amino acid sequence of the loop itself, but involve tertiary interactions: hydrogen bonding and packing. Indeed, even for some short hairpins, tertiary interactions can override the predisposition of the sequence, to determine a conformation of the loop that does not follow these sequence-structure correlations.
An example important in immunoglobulin structure is the second hypervariable region of the VH domain (H2). The size of the residue at site 71, a site in the conserved β-sheet of the VH domain, is a major determinant of the conformation and position of this loop [58].

Several general methods have been developed for prediction of the conformations of loops in proteins. The antigen-binding loops of antibodies have received special attention, and some special methods have been developed for them [52].

Prediction of Loop Conformations by Energy Calculations. The main chain conformation of a loop attached to a given framework must obey the constraint that the chain must connect two fixed endpoints using a specified number of residues. For loops of fewer than about six residues, it is possible to enumerate a fairly complete set of main chain and side chain conformations that bridge the given endpoints and do not make steric collisions within the loop or between the loop and the rest of the molecule. The search procedure can be fine enough to be sure to produce a loop close to the correct one.

However, there are in general many possible loops of different internal conformations that bridge a given pair of endpoints. To choose one of them as the predicted conformation, it is possible to estimate conformational energies and evaluate the accessible surface areas of each loop - in the context of the remainder of the protein - and set criteria for selecting the one that appears the most favourable.
Typical conformational energy calculations include terms representing hydrogen bonding, van der Waals, and electrostatic interactions. Accessible surface area calculations give estimates of the interaction between the protein and the solvent. This is in principle a completely general, automatic and objective procedure.

Procedures for conformation generation and evaluation have been implemented in a number of computer programs, of which the best known is CONGEN, by Bruccoleri and Karplus [59]. (Other similar procedures have been developed by Fine et al. [60], and by Moult and James [61].) An application to predicting all six antigen-binding loops of McPC603 and HyHEL5, based on the program CONGEN, has been described by Bruccoleri, Haber and Novotny [62]. The CONGEN procedure generates conformations for a single loop, and calculates energies of that loop in the context of the fixed portion of the structure. To apply this procedure to the prediction of several loops - for instance the six loops of an antigen-binding site - a protocol must be used that involves a sequential prediction of the loops, starting with the loops that interact primarily with the known parts of the molecule, and then proceeding to the loops that interact with each other.

Prediction of Loop Conformations by Data Base Screening. Jones and Thirup [44] developed a method of building loops, based on selecting from proteins in the database of known structures loops that span the given endpoints and overlap with peptides at the loop termini. Typically, a user selects the endpoints and initiates a data base search. The results are displayed interactively at a graphics terminal.
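
The idea of screening known structures for fragments that span two fixed endpoints can be sketched as follows. This is a deliberately simplified illustration and not the Jones and Thirup method: it only compares the distance between the two anchor Cα atoms, whereas the published method also requires overlap with the peptides at the loop termini. The function `screen_fragments`, its parameters, and the toy coordinates are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def screen_fragments(
    chains: Sequence[Sequence[Point]],
    loop_length: int,
    anchor_distance: float,
    tolerance: float = 1.0,
) -> List[Tuple[int, int, float]]:
    """Return (chain index, start index, mismatch) for candidate fragments.

    A fragment is the loop plus one anchor residue on each side, so it spans
    loop_length + 2 C-alpha positions. It is kept when the distance between
    its two anchor C-alphas differs from anchor_distance by no more than
    tolerance (all distances in angstroms).
    """
    span = loop_length + 2
    hits = []
    for c, chain in enumerate(chains):
        for start in range(len(chain) - span + 1):
            d = distance(chain[start], chain[start + span - 1])
            mismatch = abs(d - anchor_distance)
            if mismatch <= tolerance:
                hits.append((c, start, mismatch))
    # Best-matching fragments first.
    return sorted(hits, key=lambda hit: hit[2])


# Invented toy "database": one extended chain and one chain that turns back.
extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0),
            (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
turn = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0), (0.0, 4.0, 0.0)]
print(screen_fragments([extended, turn], loop_length=2, anchor_distance=4.0,
                       tolerance=0.5))
```

In a real application the retrieved fragments would still have to be superposed on the flanking residues and checked for steric clashes with the rest of the model before one of them is adopted.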


Two general possibilities may arise: All the main chains of the loops found lie within a narrow “sheaf” of trajectories between the fixed endpoints. Provided the common structure thus indicated does not have steric clashes with the rest of the protein, one can adopt the main chain with some confidence as an approximate model for the target loop. In many cases there is conservation of a special residue - such as Gly, Asn or Pro - that is responsible for the conformation of the loop.

Alternatively, the loops retrieved from the data base may “fan out” broadly. In this case, the selection of the model is more hazardous. One can look for the presence of special residues - again, Gly, Asn or Pro - at the same positions as in the target structure. Alternatively, attempts have been made to select the best loop by calculating the conformational energies of the candidates.

Database searching is the most widely available method for loop building. It is offered as a loop-building option by a large number of computer graphics programs and has been incorporated into many automatic model building programs such as Composer [63].

A particular problem that arises in building a model by grafting loops into a model of a set of SCR’s is the possibility of error at the junctions between loops and SCR’s. It is useful to apply programs that build a main chain with Cβ atoms directly from Cα coordinates alone, by automatic chain fitting procedures [37, 64, 65]. One may test a model of the main chain of an entire structure for “self-consistency”, by extracting the Cα’s, rebuilding the complete backbone from them, and comparing with the starting model. A large number of main chain peptide “flips” between the original and automatically generated main chain may show errors in the original assumptions from which the model was built. (Such procedures are also useful in refinement of models during structure determinations.)

Special-Purpose Technique for Antigen-Binding Loops of Immunoglobulins. Analysis of the antigen-binding loops in known structures has shown that the main chain conformations are determined by a few particular residues and that only these residues, and the overall length of the loop, need to be conserved to maintain the conformation of the loop [52]. The conserved residues may be those that can adopt special main chain conformations - Gly, Asn or Pro - or that form special hydrogen-bonding or packing interactions.
Other residues in the sequences of the loops are thus left free to vary, to modulate the surface topography and charge distribution of the antigen-binding site.

The ability to isolate the determinants of loop conformation in a few particular residues in the sequence makes it possible to analyse the distribution of loop conformations in the many known immunoglobulin sequences [66]. It appears that at least five of the hypervariable regions of antibodies have only a few main chain conformations or “canonical structures”. Most sequence variations only modify the surface by altering the side chains on the same canonical main chain structure. Sequence changes at a few specific sets of positions switch the main chain to a different canonical conformation.

As an example Figure 2-7 shows the L3 loop from Vκ McPC603. In this, the most common Vκ L3 conformation, there is a proline at position 95 in the loop, in a cis conformation. Hydrogen bonds between the side chain of the residue at position 90, just N-terminal to the loop, and the main chain atoms of residues in the loop, stabilise the conformation. The side chain is an Asn in McPC603; it can also be a Gln or His in other Vκ chains. The combination of the polar side chain at position 90 and the proline at position 95 constitutes the “signature” of this conformation in this loop, from which it can be recognised in a sequence of an immunoglobulin of unknown structure.

Figure 2-7. An antigen-binding loop from the Vκ domain of the immunoglobulin McPC603. This loop contains a cis-proline, and is stabilised by hydrogen bonding between a polar side chain just N-terminal to the loop and inward-pointing main chain atoms in the loop.

The observed conformations are determined by the interactions of a few residues at specific sites in the hypervariable regions and, for certain loops, in the framework regions. Hypervariable regions that have the same conformations in different immunoglobulins have the same or very similar residues at these sites. On the basis of the canonical structural model, it has been possible to create a detailed roster of the canonical conformations of each loop - with the possible exception of H3 which is more complicated and still uncertain - and the sets of “signature” residues that permit discrimination among them.

A procedure to predict the structures of the variable domains of immunoglobulins has been formulated based on the structures of solved immunoglobulins and the canonical structure model of the conformations of the hypervariable loops [65]:


1. Align the sequence of the VL and VH chains of the target immunoglobulin with the sequences of the corresponding domains in the known immunoglobulin structures.
2. For each domain (VL and VH), select a parent domain from among the corresponding domains of known structure. The percent residue identity with the target domain is usually in the range 45% to 85%.
3. If the selected parent structures for VL and VH domains come from different immunoglobulins, pack them together by a least-squares fit of the main chain atoms of residues conserved in the VL-VH interface.
4. Identify the canonical structure of each loop by checking the sequence for the particular sets of residues that form the signature of each canonical structure. H3 is a special case, far more variable in length, sequence and structure, and must be modelled by other methods.
5. Graft a loop from a known immunoglobulin structure - preferably the domain from which the framework was built - into the framework model.
6. If a canonical structure for any loop cannot be identified, the loop must be modelled by other means.
7. To build the side chains: At sites where the parent structure and the model have the same residue, retain the conformation of the parent structure. If the side chain is different, take its conformation, if possible, from an immunoglobulin having the same residue in the corresponding position; within hypervariable loops, take the side chain conformation only from a loop with the same canonical structure.
8. Subject the model to limited energy refinement, only to tidy up the stereochemistry.

How good a model can be expected from this procedure, assuming the hypothesis that for the three loops of the light chain and for the first two of the heavy chain a canonical structure present in the data base can be identified?

The first of several “blind” tests was made on the antilysozyme antibody D1.3 [6]. Comparison of this prediction with the best available crystal structure of D1.3 has shown that all six hypervariable regions had the predicted main chain conformations. Other tests are described in Chothia et al. [66]. The general conclusion is that if the structures used as parent structures for the two domains and the loops are high resolution, well-refined structures, one can expect the backbone of the framework to be correct within 1.0 Å r.m.s. deviation, and the backbone of the predicted loops, not including the special case of H3, to differ by about 0.7 Å r.m.s. deviation on average, and by no more than 1.0-1.2 Å in all cases. In addition, one can expect the positions of Cα atoms of residues in the loops to shift, relative to the frameworks of VL and VH domains, by 1.0-2.0 Å typically and by up to 3 Å in the worst cases.
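
Step 4 of the procedure above, checking a sequence for the signature of a canonical structure, can be sketched in a few lines. Only the signature itself comes from the text (a polar Asn, Gln or His at position 90 and a proline at position 95 for the most common Vκ L3 conformation); the data layout, the names `VKAPPA_L3_SIGNATURE` and `matches_signature`, and the assumption that residues are supplied already numbered as in the text are illustrative choices made here. Loop length, which the text says must also be conserved, is not checked in this sketch.

```python
from typing import Dict, Set

# Signature described in the text for the most common V-kappa L3 canonical
# structure, in one-letter amino acid codes: Asn/Gln/His at 90, Pro at 95.
VKAPPA_L3_SIGNATURE: Dict[int, Set[str]] = {
    90: {"N", "Q", "H"},
    95: {"P"},
}


def matches_signature(residues: Dict[int, str],
                      signature: Dict[int, Set[str]]) -> bool:
    """True if every signature position carries one of the allowed residues.

    residues maps a position number to a one-letter residue code; a position
    missing from residues counts as a mismatch.
    """
    return all(residues.get(position) in allowed
               for position, allowed in signature.items())


# McPC603 has Asn at 90 and Pro at 95 according to the text.
print(matches_signature({90: "N", 95: "P"}, VKAPPA_L3_SIGNATURE))
# A hypothetical sequence with Ala at 90 does not carry the signature.
print(matches_signature({90: "A", 95: "P"}, VKAPPA_L3_SIGNATURE))
```

A sequence match of this kind cannot by itself confirm the cis conformation of the proline; it only indicates that the sequence is compatible with the canonical structure.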


2.3.2.3 Side Chain Building and Optimisation of Side Chain Conformation

Once a complete main chain model has been constructed, side chains need to be built and their conformations determined.

For closely-related proteins, it is observed that most side chains tend to retain conformation - even mutated ones. This is because each side chain, even those on the surface, is packed in a cage formed by its neighbours. In closely-related proteins, a mutated side chain is likely to find itself in a cage created largely by nonmutated neighbours, and must conform itself to it. Therefore the first approximation should be: for side chains that have not been changed from the parent structure, retain the same conformation; for mutated side chains, retain the same conformation as far as the stereochemical similarity will allow. Of course application of this rule will produce some sterically impossible combinations.

An essential step therefore is to adjust the side chain conformations to achieve a low-energy conformation. There are now a large number of programs available to carry out such building and packing automatically [37, 64, 65, 67] which are probably more accurate (and of course much quicker) than manual manipulation [68].

Currently available procedures for automatic side chain modelling and packing perform quite well when starting from experimental main chain atoms or Cα's. It is not so clear what happens as the position of the main chain in the model becomes less and less accurate (as the closeness of the relationship between the input sequence and the sequence of known structure decreases). Exact Cα/main chain positions may force a unique side chain packing. In contrast, in a real modelling situation Cα/main chain positions will be inexact, and packing errors are more likely. In particular, the incorrect positioning of a large buried hydrophobic residue can result in serious packing errors within a whole region of the hydrophobic core of a protein.

Finally, once all atoms have been built, the model can be subjected to Energy Minimisation (EM) or Molecular Dynamics (MD). EM is a purely cosmetic operation. It will remove some bad atom contacts in a model but will not significantly alter even side chain conformations. It can be useful however to “clean up” a model by small local adjustments. MD can be used to explore a much greater conformational space around the model structure than EM. If the changes in conformation are to be at all realistic, it is necessary to simulate in the presence of water. However, it is important to realise that if the starting model has substantial errors, even a very long MD run is very unlikely to improve it. Perhaps the most useful result of such simulations is to observe how a model moves with time. If the model is wrong it is likely to be more unstable when subjected to simulation.
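The remark that EM only removes bad contacts by small local adjustments can be made concrete with a toy example. The sketch below is a minimal illustration, not the procedure of any program cited here. It assumes a single pair of atoms interacting through a Lennard-Jones 12-6 potential in reduced units (epsilon = sigma = 1) and applies steepest descent with a capped step; the function names and parameter values are hypothetical. Starting from an overlapping pair, the minimiser slides downhill to the nearest minimum at r = 2^(1/6) sigma and stops there.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy of one atom pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimise(r0, step=1e-3, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent on the pair separation, with a cap on each move."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        move = -step * g                             # move against the gradient
        move = max(-max_move, min(max_move, move))   # cap large steps
        r += move
    return r


r_start = 0.9                 # overlapping atoms: a "bad contact"
r_final = minimise(r_start)   # relaxes to the nearby minimum
```

Because each step only follows the local gradient, the result depends entirely on the starting point, which is why minimisation of this kind cannot correct a model whose side chains or backbone start in the wrong place.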


2.3.3 Available Modelling Programs

Homology modelling programs of various degrees of automation are now readily available. These include: Insight II [69] (commercial, graphics based, Homology modelling module: semi automatic); Quanta (commercial, graphics based, semi automatic module); What If [70] (academic, graphics based, semi automatic module); O [71] (academic, graphics based, essentially crystallographic modelling program but with database loop modelling features); Sybyl (commercial, graphics based, semi automatic module based on Composer [72, 73], also available as a nongraphical, academic program). All these packages contain essentially a sequence alignment program, a database loop searching program and features for optimising side chain conformations. For an assessment of the errors associated with such modelling procedures see Topham et al. [63]. Such semi-automatic modelling packages can very quickly produce models with no bad atom-atom contacts but which are partially or completely wrong due to the errors associated with alignment and loop building already discussed. Programs for evaluating homology models are not generally included in such packages: methods such as those described below (Section 2.4.2) should be used to look for errors.
Regardless of the results of any tests, any user of models built in this way should always be mindful of the likely errors.

2.4 Modelling de novo: Structure Prediction

When no specific relationship can be found between a sequence of unknown structure and any known structure, only direct structure prediction methods remain an option and, as shown in Figure 2-1, only the secondary structure can be predicted with any degree of accuracy at present.

2.4.1 A Family of Similar Sequences

Secondary structure prediction (SSP) can be carried out on single sequences; however, where a family of homologous sequences exists, more accurate results can be obtained. This has been known for some time [74, 75], but it is with the successful prediction [12] of the catalytic subunit of cyclic AMP dependent protein kinase [76] and the development of a neural-network based multiple sequence SSP method available over the internet by e-mail [13, 14] that use of such methods has become widespread. Such methods make use of multiple sequence information, looking for consistency between the predictions for different sequences. It should be noted that even given a correct secondary structure assignment, it is very difficult to determine how the units fit together in three dimensions [77]. A table of aligned sequences may well contain derivable information about the 3-D structure of a protein, but attempts to recover it have so far met with no more than sporadic success [12]. However, useful deductions about the most likely folded structure can be made in a systematic way from a combination of analysis of SSP results and the conservation patterns observed in a multiple sequence alignment. Successful predictions using such an approach have been made for the annexin [78] and Src homology 2 (SH2) protein families [79].

2.4.2 A Lone Sequence or a Designed Sequence: no Multiple Sequence, no Known Relatives

This situation is the most unfavourable for model building, as one has no way of applying known sequence or structure information. In effect the problem can only be handled by a priori methods. Even secondary structure prediction is inaccurate for single sequences and therefore the likelihood of building a correct three-dimensional model is small.

If there is any suspicion (perhaps on functional grounds) that a natural sequence has a certain fold, or in the case of a designed sequence, built to fold in a particular way, the situation is slightly better since it is possible to test the likelihood that a sequence can match a particular fold.
Methods for doing this include checking polarity [80]; packing quality and residue-residue contact frequencies [81]; various free energy functions incorporating solvation effects [82, 83], hydration and heat stability effects; and, more recently, using threading techniques to establish if the sequence is compatible with the fold [MI].

The disadvantages of these methods are that (1) most provide only an assessment of the structure as a whole rather than of local regions (models are frequently only partially right, e.g. [86]); (2) even at this level they are inaccurate, i.e. some experimentally determined structures are classified as incorrect whereas some misfolded models are classed as correct in blind tests; and (3) the results are essentially dependent on the quality of the model rather than the correctness of its fold. Moreover, these tests only look at the final state and do not assess if the sequence is compatible with any pathway to that state. For natural sequences it can be assumed that folding to a compact state can be achieved, but this is less likely to be the case for designed sequences. Current experimental [87] and theoretical work [88] on the folding pathways of proteins suggests that there are clear folding initiation sites specified in a protein chain. No method for identifying such sequences directly has yet been developed but it would seem clear that many sequences that are compatible with the desired final fold may contain no such folding signals. Until an understanding of the requirements for such sites is developed, de novo protein design [89] will remain very difficult, particularly for large proteins.

2.5 Future Possibilities

This review has tried to present a snapshot of what is possible now. What are the prospects for the near future?

Research is most active in the area of threading. The many groups developing potentials for fold recognition have taken a number of slightly different approaches, each with its own advantages. There will be more variations and those in the field anticipate substantial further improvements in the potentials. Fold recognition is however only the first stage in building a 3-D model.

Threading methods should ultimately be able to incorporate almost all the methods discussed for model building and evaluation so that a single sequence (Section 2.4.2) may be tested against all known folds. It is therefore anticipated that what will emerge will not only provide more accurate fold recognition, but also incorporate other, more detailed, model-building techniques to produce a specific three-dimensional structural prediction.
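The idea of testing one sequence against a set of known folds, whether by a polarity check (Section 2.4.2) or by threading, can be sketched in a deliberately crude form. The example below is an illustration under stated assumptions, not any of the published potentials cited here: each fold is reduced to a hypothetical string of buried ("B") and exposed ("E") positions, the sequence is placed on it without gaps, and the Kyte-Doolittle hydropathy scale is used as a stand-in scoring function. The fold names, burial patterns and test sequence are invented for illustration.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in potential.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_compatibility(sequence, burial):
    """Mean score for placing `sequence` on a burial pattern without gaps.

    Hydrophobic residues score well at buried ('B') positions and
    polar residues score well at exposed ('E') positions.
    """
    if len(sequence) != len(burial):
        raise ValueError("this ungapped sketch needs equal lengths")
    total = 0.0
    for residue, environment in zip(sequence, burial):
        hydropathy = KYTE_DOOLITTLE[residue]
        total += hydropathy if environment == "B" else -hydropathy
    return total / len(sequence)


def rank_folds(sequence, folds):
    """Return (fold name, score) pairs, best-scoring fold first."""
    scored = [(name, fold_compatibility(sequence, pattern))
              for name, pattern in folds.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical burial patterns and sequence, for illustration only.
folds = {
    "fold_A": "BEBEBEBE",
    "fold_B": "EBEBEBEB",
    "fold_C": "BBBBEEEE",
}
ranking = rank_folds("VKVEIRLD", folds)
```

A score of this kind judges only the whole placement, which mirrors the first disadvantage listed in Section 2.4.2; real threading methods also have to handle gaps and use considerably more detailed potentials.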
It is by bringing together the various techniques for prediction and testing of structures that the interaction between them will ultimately generate the most satisfactory results.

2.6 Summary

The explosion of protein sequence and structural information has generated in its wake a number of significant advances in protein modelling methods. If a relationship can be demonstrated between the sequence to be modelled and some known structure, a 3-D model of predictable quality can be constructed. If no such relationship can be shown, models can still be constructed but with little quantification of the chance of their being correct.


Note Added in Proof

There is a serious ‘catch-22’-like problem in evaluating the effectiveness of protein modelling: if you model a structure that is known, you cannot be sure how biased you were by that prior knowledge (since almost no modelling system is entirely a black box), whereas if you model a structure that is unknown you cannot assess the accuracy of your models. One way around this is to build models ‘just in time’, i.e. immediately before publication of an experimentally-determined structure, so that it is possible to evaluate the accuracy of your model with the confidence that it was a blind prediction. When this chapter was written there had been isolated examples of this sort of arrangement between theoreticians and experimentalists but they were quite rare.

In the last month there has been a meeting to evaluate the first ever large scale protein structure prediction competition, which ran for most of 1994 [94]. Approximately 35 groups made about 150 predictions for about 25 target proteins. The predictions were considered in three categories: homology modelling, fold recognition and ab initio prediction.
The results were instructive:

Homology modelling naturally gives the most reliable predictions, but despite the efforts made to automate the modelling process, it is clear that where the template structure used to build the model differs substantially from the experimental structure the model is generally wrong: we are unable to model the variations that commonly occur between homologous proteins (loops, main chain shifts and the associated different side chain packing) with much greater accuracy than was possible by hand 10-15 years ago.

Although the accuracy of homology modelling was disappointing, the number of targets that could potentially be modelled based on a template structure is going to increase, since the meeting demonstrated that fold recognition techniques (using new methods such as ‘threading’) can already identify the most similar fold in the structure database in a substantial number of cases. Threading is still a very young technique and it is clear that many improvements can be made, so the accuracy and sensitivity can only increase.

Finally, it does appear that useful ab initio structure predictions can be made for targets where there are many homologous sequences. Secondary structure prediction by the PHD method [13, 14] in such cases is sufficiently reliable for predictors to consider how these secondary structural elements might be assembled (i.e. to attempt a full tertiary prediction) and new techniques are emerging to predict such long range interactions based on specialized potentials [95] and correlation information [96].
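As an illustration of what “correlation information” from a family of homologous sequences can mean, the sketch below scores how strongly two columns of a multiple sequence alignment vary together. It uses mutual information, a generic measure chosen here for simplicity; it is not claimed to be the measure used in [96]. The function names and the four-sequence alignment are hypothetical.

```python
import math
from collections import Counter


def column(alignment, index):
    """Residues found at one column of an alignment (list of equal-length rows)."""
    return [row[index] for row in alignment]


def mutual_information(col_a, col_b):
    """Mutual information, in bits, between two alignment columns."""
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    total = 0.0
    for (a, b), count in count_ab.items():
        p_ab = count / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        total += p_ab * math.log2(p_ab / (p_a * p_b))
    return total


# Hypothetical alignment: columns 0 and 1 change together, column 2 does not follow them.
alignment = ["ALKD", "ALRE", "VIKD", "VIRE"]
coupled = mutual_information(column(alignment, 0), column(alignment, 1))
uncoupled = mutual_information(column(alignment, 0), column(alignment, 2))
```

With so few sequences any such statistic is unreliable, which is consistent with the observation above that these predictions are useful mainly where many homologous sequences are available.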


Acknowledgements

TJPH thanks the Medical Research Council and Zeneca Pharmaceuticals and AML thanks the Kay Kendall Foundation for generous support.

References

[1] Anfinsen, C. B., Science 1973, 181, 223-230.
[2] Hubbard, T. J., Sander, C., Protein Eng. 1991, 4, 711-717.
[3] Karplus, M., Petsko, G. A., Nature 1990, 347, 631-639.
[4] Barton, G. J., Sternberg, M. J., Protein Eng. 1987, 1, 89-94.
[5] Ploegman, J. H. et al., J. Mol. Biol. 1978, 123, 557-565.
[6] Chothia, C., Lesk, A. M., EMBO J. 1986, 5, 823-826.
[7] Hubbard, T. J., Blundell, T. L., Protein Eng. 1987, 1, 159-171.
[8] Flaherty, K. M. et al., Proc. Natl. Acad. Sci. USA 1991, 88, 5041-5045.
[9] Kabsch, W., Sander, C., Proc. Natl. Acad. Sci. USA 1984, 81, 1075-1078.
[10] Bork, P. et al., Protein Sci. 1992, 1, 1677-1690.
[11] Bork, P. et al., Nature 1992, 358, 287.
[12] Benner, S. A., Gerloff, D., Adv. Enz. Regul. 1990, 31, 121-181.
[13] Rost, B., Sander, C., Nature 1992, 360.
[14] Rost, B. et al., TIBS 1993, 18, 120-123.
[15] Chothia, C., Nature 1992, 357, 543-544.
[16] Pastore, A., Lesk, A. M., Curr. Opin. Biotech. 1991, 2, 592-598.
[17] Bernstein, F. C. et al., J. Mol. Biol. 1977, 112, 535-542.
[18] Abola, E. et al., in: Crystallographic Databases - Information Content, Software Systems, Scientific Applications, Allen, F. H. et al. (eds.), Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester, 1987, pp. 107-132.
[19] Hobohm, U. et al., Protein Sci. 1992, 1, 409-417.
[20] Holm, L. et al., Protein Sci. 1992, 1, 1691-1698.
[21] Bairoch, A., Boeckmann, B., Nucl. Acids Res. 1991, 19, 2247-2250.
[22] Sander, C., Schneider, R., Proteins 1991, 9, 56-68.
[23] Argos, P. et al., Protein Eng. 1991, 4, 375-383.
[24] von Heijne, G., Eur. J. Biochem. 1991, 199, 253-256.
[25] Bairoch, A., Nucl. Acids Res. 1992, 20 (Suppl.), 2013-2018.
[26] Sibbald, P. R., Argos, P., Comput. Appl. Biosci. 1990, 6, 279-288.
[27] Pearl, L. H., Taylor, W. R., Nature 1987, 329, 351-354.
[28] Overington, J. et al., Proc. R. Soc. London B 1990, 241, 132-145.
[29] Luethy, R. et al., Proteins 1991, 10, 229-239.
[30] Bowie, J. U. et al., Science 1991, 253, 164-170.
[31] Overington, J. et al., Protein Sci. 1992, 1, 216-226.
[32] Sippl, M. J., J. Mol. Biol. 1990, 213, 859-883.
[33] Finkelstein, A. V., Reva, B. A., Nature 1991, 351, 497-499.
[34] Sippl, M. J., Weitckus, S., Proteins 1992, 13, 258-271.
[35] Jones, D. T. et al., Nature 1992, 358, 86-89.
[36] Bryant, S. H., Lawrence, C. E., Proteins 1993, 16, 92-112.
[37] Holm, L., Sander, C., Proteins 1992, 14, 213-223.
[38] Pickett, S. D. et al., J. Mol. Biol. 1992, 228, 170-187.
[39] Lesk, A. M. et al., Proteins 1989, 5, 139-148.
[40] Wilmanns, M., Eisenberg, D., Proc. Natl. Acad. Sci. USA 1993, 90, 1379-1383.
[41] Rippmann, F. et al., EMBO J. 1991, 10, 1053-1059.
[42] Lesk, A. M., Protein Architecture: A Practical Approach, IRL Press, Oxford, 1991.
[43] Lesk, A. M., Chothia, C., Philos. Trans. R. Soc. (London) 1986, 317, 345-356.
[44] Jones, T. A., Thirup, S., EMBO J. 1986, 5, 819-822.
[45] Zuker, M., J. Mol. Biol. 1991, 221, 403-420.
[46] Saqi, M. A. et al., Protein Eng. 1992, 5, 305-311.
[47] Venkatachalam, C., Biopolymers 1968, 6, 1425-1436.
[48] Rose, G. D. et al., Adv. Protein Chem. 1985, 37, 1-109.
[49] Sibanda, B. L., Thornton, J. M., J. Mol. Biol. 1985, 316, 170-174.
[50] Efimov, A. V., Mol. Biol. (USSR) 1986, 20, 208-216.
[51] Leszczynski, J. F., Rose, G. D., Science 1986, 234, 849-855.
[52] Chothia, C., Lesk, A. M., J. Mol. Biol. 1987, 196, 901-917.
[53] Wilmot, C. M., Thornton, J. M., J. Mol. Biol. 1988, 203, 221-232.
[54] Milner-White, E. J. et al., J. Mol. Biol. 1988, 204, 777-782.
[55] Sibanda, B. L. et al., J. Mol. Biol. 1989, 206, 759-777.
[56] Sibanda, B. L., Thornton, J. M., J. Mol. Biol. 1993, 229, 428-447.
[57] Tramontano, A. et al., Proteins 1989, 6, 382-394.
[58] Tramontano, A. et al., J. Mol. Biol. 1990, 215, 175-182.
[59] Bruccoleri, R. E., Karplus, M., Biopolymers 1987, 26, 137-168.
[60] Fine, R. M. et al., Proteins 1986, 1, 342-362.
[61] Moult, J., James, M. N., Proteins 1986, 1, 146-163.
[62] Bruccoleri, R. E. et al., Nature 1988, 335, 564-568.
[63] Topham, C. M. et al., Biochem. Soc. Symp. 1990, 57, 1-9.
[64] Summers, N. L., Karplus, M., Methods Enzymol. 1991, 202, 156-204.
[65] Levitt, M., J. Mol. Biol. 1992, 226, 507-533.
[66] Chothia, C. et al., Nature 1989, 342, 877-883.
[67] Wilson, C. et al., J. Mol. Biol. 1993, 229, 996-1006.
[68] Reid, L. S., Thornton, J. M., Proteins 1989, 5, 170-182.
[69] Dayringer, H. E. et al., J. Mol. Graphics 1986, 4, 82-87.
[70] Vriend, G., J. Mol. Graphics 1990, 8, 52-56.
[71] Jones, T. A. et al., in: Crystallographic and Modelling Methods in Molecular Design, Bugg, C. E. and Ealick, S. E. (eds.), New York, Springer-Verlag, 1990, pp. 189-199.
[72] Sutcliffe, M. J. et al., Protein Eng. 1987a, 1, 377-384.
[73] Sutcliffe, M. J. et al., Protein Eng. 1987b, 1, 385-392.
[74] Zvelebil, M. J. et al., J. Mol. Biol. 1987, 195, 957-961.
[75] Niermann, T., Kirschner, K., Protein Eng. 1991, 4, 359-370.
[76] Knighton, D. R. et al., Science 1991, 253, 407-414.
[77] Fasman, G., in: Prediction of Protein Structure and the Principles of Protein Conformation, Fasman, G. (ed.), New York, Plenum, 1989, pp. 193-316.
[78] Barton, G. J. et al., Eur. J. Biochem. 1991, 198, 749-760.
[79] Russell, R. B. et al., FEBS Lett. 1992, 304, 15-20.
[80] Baumann, G. et al., Protein Eng. 1989, 2, 329-334.
[81] Gregoret, L. M., Cohen, F. E., J. Mol. Biol. 1990, 211, 959-974.
[82] Novotny, J. et al., Proteins 1988, 4, 19-30.
[83] Chiche, L. et al., Proc. Natl. Acad. Sci. USA 1990, 87, 3240-3243.
[84] Oobatake, M., Ooi, T., Prog. Biophys. Mol. Biol. 1993, 59, 237-284.
[85] Luethy, R. et al., Nature 1992, 356, 83-85.
[86] Bates, P. A. et al., Protein Eng. 1989, 3, 13-21.
[87] Matouschek, A. et al., Nature 1990, 346, 440-445.
[88] Moult, J., Unger, R., Biochemistry 1991, 34, 3816-3824.
[89] Sander, C. et al., Proteins 1992, 12, 105-110.
[90] Baker, E. N., Dodson, E. J., Acta Crystallogr. Sect. A 1980, 36, 559.
[91] Kamphuis, I. G. et al., J. Mol. Biol. 1984, 179, 233.
[92] Guss, J. M., Freeman, H. C., J. Mol. Biol. 1983, 169, 521.
[93] Norris, G. E. et al., J. Am. Chem. Soc. 1986, 108, 2184.
[94] Conclusions of the Meeting for the Critical Assessment of Techniques for Protein Structure Prediction, Asilomar 1994, Proteins 1995, (in preparation).
[95] Hubbard, T. J., in: Proceedings of the Biotechnology Computing Track, Protein Structure Prediction MiniTrack of the 27th HICSS, R. H. (ed.), IEEE Computer Society Press, 1994, pp. 336-354.
[96] Gobel, U. et al., Proteins 1994, 18, 309-317.


Computer Modelling in Molecular Biology
Edited by Julia M. Goodfellow
© VCH Verlagsgesellschaft mbH, 1995

3 Molecular Dynamics Simulations of Peptides

D. J. Osguthorpe and P. K. C. Paul*

Molecular Graphics Unit, School of Chemistry, University of Bath, Bath BA2 7AY, England
* Present address: Unilever Research, Port Sunlight Laboratory, Wirral, L63 3JW, England

Contents

3.1 Introduction
3.2 Energy Calculation Methods
3.3 Applications of Molecular Dynamics
3.3.1 Pharmaceutical Applications of Conformational Studies of Peptides
3.3.2 Luteinising Hormone Releasing Hormone (LHRH)
3.3.3 Structural Studies on LHRH
3.3.4 LHRH Agonists
3.3.5 Melanin Concentrating Hormone (MCH)
3.3.6 De Novo Peptide and Protein Design
3.3.7 Molecular Dynamics Calculations on Synthetic Ion Channels
3.3.7.1 Spacer in (LSSLLSL)3 Helix
3.4 Conclusions
References


3.1 Introduction

Conformational studies of peptides have a long history, with some of the first calculations of biological molecules being studies of dipeptides. The first major understanding in the field of peptide and protein conformation came with the finding that nearest neighbour interactions have an important part to play in shaping protein conformation [1-3]. Today, with hundreds of protein crystal structures solved, it is still found that most of the residues fall within the allowed (φ, ψ) region of the Ramachandran map. The use of energy parameters [2-5] and minimisation techniques [6-8] to search for global minimum energy conformations and the subsequent application of molecular dynamics techniques [9] to study conformational transitions and pathways was the next stage in the understanding of peptide and protein conformation. Currently, with the advent of more and more sophisticated forcefields and integrating algorithms coupled with extremely good graphics visualisation programs, molecular modelling using dynamics simulations is becoming a routine tool in the study of peptides and proteins. The number of molecular dynamics studies on peptides has burgeoned in the last few years. Table 3-1 gives a list (by no means complete) of recent peptide modelling studies.
As is evident, the combined use of NMR data with molecular dynamics simulations is on the increase, and represents a powerful method to obtain structural pointers in explaining bio-activity.

Peptide hormones are currently of great interest as they control many of the homeostatic mechanisms of animals and humans. By modulating the behaviour of peptide hormones with agonists or antagonists, it is possible to control many medical conditions. There is currently much interest in the pharmaceutical industry in designing drugs by mimicking the peptide controlling the faulty or diseased biological mechanism. We know that hormones bind to a receptor, which is generally a protein, by non-bonding interactions. The receptor is only capable of binding molecules if they can adopt a specific conformation, the active conformation. However, the structure of most receptors is generally not known. This is partly because there are only a small number of receptor molecules per cell, which makes it difficult to get enough material for structural studies, and partly because many receptors are membrane-bound proteins, whose structures are difficult to determine. Our only way out of this situation, if no receptor structure is likely to be available, is to attempt to “guess” the structure of the active conformation by looking at ligands which are known to bind to the receptor. Since peptide hormones are highly flexible, a major part of designing the drug is to find out what is the “active conformation” of the hormone, by investigating the energetically accessible conformations of the hormone and its analogues.
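Conformations of a flexible peptide are conveniently described by backbone torsion angles such as the (φ, ψ) pair of the Ramachandran map mentioned in this introduction. As a minimal sketch, the function below computes a torsion angle from four atom positions; for φ these would be C(i-1), N(i), Cα(i), C(i) and for ψ they would be N(i), Cα(i), C(i), N(i+1). The function name and the test coordinates are hypothetical, and the coordinates are simple geometric points rather than atoms from a real peptide.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric checks: eclipsed (cis), perpendicular and trans arrangements.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applied to every residue of each frame of a simulation, a function of this kind yields the (φ, ψ) trajectories from which energetically accessible conformations can be compared between a hormone and its analogues.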


Table 3-1. Some recent examples of molecular dynamics studies on small peptides.

Examples of system studied | Remarks | Reference^a
Boc-Cys-Val-Pro-Pro-Phe-Phe-Cys-OMe | MD in water | Zannoti et al., 1993
cyclo enkephalin analogues | MD in polar and apolar environs | Chew et al., 1993
cyclic hexapeptide NK-2 analogues | NOE-restrained MD | Wollborn et al., 1993
neuropeptide Y analogue | MD used to confirm structures | Beck-Sickinger et al., 1993
Tyr-Pro-Gly-Asp-Val | MD in solution | Karpen et al., 1993, Tobias et al., 1991
deltorphin-II | NOE-restrained MD | Ohno et al., 1993
antifreeze peptide 38 residue HPLC-6 | MD on ice | Jorgensen et al., 1993
antamanide | MD in chloroform | Bruschweiler et al., 1992
LHRH antagonists | NOE-restrained MD in vacuo and solvent | Rizo et al., 1992
anti-adhesive RGD peptides | restrained and free MD | Gurrath et al., 1992
defensin antimicrobials HNP-1 and NP-2 | NOE-restrained MD | Pardi et al., 1992
c(Gly-Pro-D-Phe-Gly-Val) | MD in vacuo compared with NMR | Liu and Giersch, 1992
oxytocin | restrained MD | Bhaskaran et al., 1992
Boc-Ala-Aib-Ala-OMe | MD with time-averaged constraints | Brunne and Leibfritz, 1992
Ac-Pro-D-Ala-Ala-NMe with Ca2+ | MD coupled with random search | Michel et al., 1992
lanthionine-bridged enkephalin analogues | MD compared with NOE data | Polinsky et al., 1992
dermorphin analogues | MD at 600 K with 4 starting points | Wilkes and Schiller, 1992
zinc-finger peptide | MD in vacuo and water | Palmer and Case, 1992
cyclic analogue of substance P | NOE-restrained MD in DMSO | Saulitius et al., 1992
O-glycosylated cyclic peptides | NOE-restrained MD in vacuo and solvents | Kessler et al., 1991, 1992
mastoparan-X | NOE-restrained MD | Sukumar and Higashijima, 1992
ribonuclease-S peptide | MD and free energies | Simonson and Brunger, 1992
c(Gly-Pro-Phe-Val-Phe-Phe) | MD in vacuo and DMSO | Kessler et al., 1992
cyclic peptide templates | MD at 900 K | Floegel and Mutter, 1992
bombesin and GRP analogues | NOE-restrained MD | Malikayil et al., 1992
tuftsin and analogues | high temperature quenched MD | O’Connor et al., 1992
neurokinin antagonist | NOE-restrained MD | Malikayil and Harbeson, 1992
GRF analogues | NOE-restrained MD | Fry et al., 1992
Boc-Gly-Val-Gly-Gly-Leu-OMe | MD in vacuo | Lelj et al., 1992
polyalanine 13 residues | MD in vacuo and solvent | Daggett and Levitt, 1992
RGD containing peptides | MD and NMR data | Bogusky et al., 1992
cyclic endothelin antagonist | MD with NMR distance constraints | Krystek et al., 1992
endothelin-I | MD and NMR derived constraints | Saudek et al., 1991, Krystek et al., 1991, Reily and Dunbar, 1991
enkephalin derivative | MD in solvent | Smith and Pettitt, 1991, Smith et al., 1991
H helix of myoglobin | MD in water | Soman et al., 1991
motilin | NOE-restrained MD | Edmonson et al., 1991
c(D-Ala-Phe-Val-Lys-Trp-Phe) | MD with DMSO as solvent | Mierke and Kessler, 1991
neuropeptide Y | MD at 600 K | MacKerell, 1991
vasopressin and antagonists | MD using NMR data | Schmidt et al., 1991
polyalanine | long MD at high temperature | Daggett et al., 1991
Ala and Val peptides | MD with specialised sampling | Tobias and Brooks, 1991
Tyr-Pro-Gly-Asp-Val | 2.2 ns MD in water | Tobias et al., 1991
cyclolinopeptide A | MD in vacuo and solvent | Saviano et al., 1991, Castiglione et al., 1991
gramicidin channels | MD with water | Chiu et al., 1991
physalaemin analogues | NOE-restrained MD | Holzemann et al., 1991
oxytocin | conformational search and MD | Ward et al., 1991
blocked dipeptides | MD in water | Tobias et al., 1990
IgG1 hinge peptide derivative | NOE-restrained MD in vacuo and in solvent | Kessler et al., 1991
ribonuclease A S-peptide | MD in water | Tirado-Rives and Jorgensen, 1991
(Gly)30 and (Ala)30 | MD in vacuo and water | DiCapua et al., 1991
human calcitonin gene-related peptide | NOE-restrained MD simulation in water | Breeze et al., 1991
alamethicin | MD compared with NMR data | Fraternali, 1990
vasoactive intestinal peptide analogue | NOE-restrained MD simulation in solvents | Fry et al., 1989
MCH and analogues | MD compared with NMR | Paul et al., 1989
antagonists of LHRH | conformational searching and MD | Paul et al., 1989


^a References for Table 3-1:
Beck-Sickinger, A. G., Koppen, H., Hoffman, E., Gaida, W., and Jung, G., J. Receptor Research 1993, 13, 215-228.
Bhaskaran, R., Chuang, L. C., and Yu, C., Biopolymers 1992, 32, 1599-1608.
Bogusky, M. J., Naylor, A. M., Pitzenberger, S. M., Nutt, R. F., Brady, S. F., Colton, C. D., Sisko, J. T., Anderson, P. S., and Veber, D. F., Int. J. Peptide Protein Res. 1992, 39, 63-76.
Breeze, A. L., Harvey, T. S., Bazzo, R., and Campbell, I. D., Biochemistry 1991, 30, 575-582.
Brunne, R. M. and Leibfritz, D., Int. J. Peptide Protein Res. 1992, 40, 401-406.
Bruschweiler, R., Roux, B., Blackledge, M., Griesinger, C., Karplus, M., and Ernst, R. R., J. Am. Chem. Soc. 1992, 114, 2289-2302.
Castiglione, M. A. M., Pastore, A., Pedone, C., Temussi, P. A., Zanotti, G., and Tancredi, T., Int. J. Peptide Protein Res. 1991, 37, 81-89.
Chew, C., Villar, H. O., and Loew, G. H., Biopolymers 1993, 33, 647-657.
Chiu, S. W., Jakobsson, E., Subramaniam, S., and McCammon, J. A., Biophys. J. 1991, 60, 273-285.
Daggett, V., Kollman, P. A., and Kuntz, I. D., Biopolymers 1991, 31, 1115-1134.
Daggett, V. and Levitt, M., J. Mol. Biol. 1992, 223, 1121-1138.
DiCapua, F. M., Swaminathan, S., and Beveridge, D. L., J. Am. Chem. Soc. 1991, 113, 6145-6155.
Edmonson, S., Khan, N., Shriver, J., Zdunek, J., and Graslund, A., Biochemistry 1991, 30, 11271-11279.
Floegel, R. and Mutter, M., Biopolymers 1992, 32, 1283-1310.
Fraternali, F., Biopolymers 1990, 30, 1083-1099.
Fry, D. C., Madison, V. S., Bolin, D. R., Greely, D. N., Toome, V., and Wegrzynski, D. B., Biochemistry 1989, 28, 2399-2409.
Fry, D. C., Madison, V. S., Greely, D. N., Felix, A. M., Heimer, E. P., Frohman, L., Campbell, R. M., Mowles, T. F., Toome, V., and Wegrzynski, B. B., Biopolymers 1992, 32, 649-666.
Gurrath, M., Muller, G., Kessler, H., Aumailley, M., and Timpl, R., Eur. J. Biochem. 1992, 210, 911-921.
Holzemann, G., Jonczyk, A., Eiermann, V., Pachler, K. G. R., Barnickel, G., and Regoli, D., Biopolymers 1991, 31, 691-697.
Jorgensen, H., Mori, M., Matsui, H., Kanaoka, M., Yanagi, H., Yabusaki, Y., and Kikuzono, Y., Prot. Eng. 1993, 6, 19-27.
Kessler, H., Geyer, A., Matter, H., and Kock, M., Int. J. Peptide Protein Res. 1992, 40, 25-40.
Kessler, H., Matter, H., Gemmecker, G., Bats, J. W., and Kottenhahn, M., J. Am. Chem. Soc. 1991, 113, 7550-7563.
Kessler, H., Matter, H., Gemmecker, G., Kottenhahn, M., and Bats, J. W., J. Am. Chem. Soc. 1992, 114, 4805-4818.
Kessler, H., Mronga, S., Muller, G., Moroder, L., and Huber, R., Biopolymers 1991, 31, 1189-1124.
Krystek, S. R., Bassolino, D. A., Bruccoleri, R. E., Hunt, J. T., Porbucan, M. A., Wandler, C. F., and Anderson, N. H., FEBS Letters 1992, 299, 255-261.
Krystek, Jr, S. R., Bassolino, D. A., Novotny, J., Chen, C., Marscher, T. M., and Anderson, N. H., FEBS Letters 1991, 281, 212-218.
Lelj, F., Tamburro, A. M., Villani, V., Grimaldi, P., and Guantieri, V., Biopolymers 1992, 32, 161-172.
Liu, Z. P. and Gierasch, L. M., Biopolymers 1992, 32, 1727-1739.
MacKerell Jr, A. D., Methods in Enzymology 1991, 202, 449-470.
Malikayil, J. A., Edwards, J. V., and McLean, L. R., Biochemistry 1992, 31, 7043-7049.


Malikayil, J. A. and Harbeson, S. L., Int. J. Peptide Protein Res. 1992, 39, 497-505.
Michel, A. G., Jeandenans, C., and Ananthanarayanan, V. S., J. Biomol. Str. Dyn. 1992, 10, 281-293.
Mierke, D. F. and Kessler, H., J. Am. Chem. Soc. 1991, 113, 9466-9470.
O'Connor, S. D., Smith, P. E., Al-Obeidi, F., and Pettitt, B. M., J. Med. Chem. 1992, 35, 2870-2881.
Ohno, Y., Segawa, M., Oshishi, H., Doi, M., Kitamura, K., Ishida, T., Inoue, M., and Iwashita, T., Eur. J. Biochem. 1993, 212, 185-191.
Palmer, A. G. and Case, D. A., J. Am. Chem. Soc. 1992, 114, 9059-9067.
Pardi, A., Zhang, X. L., Selsted, M. E., Skalicky, J. J., and Yip, P. F., Biochemistry 1992, 31, 11357-11364.
Paul, P. K. C., Dauber-Osguthorpe, P., Campbell, M. M., Brown, D. W., Kinsman, R. G., Moss, C., and Osguthorpe, D. J., Biopolymers 1990, 29, 623.
Paul, P. K. C., Dauber-Osguthorpe, P., Campbell, M. M., and Osguthorpe, D. J., Biochem. Biophys. Res. Comm. 1989, 165, 1051.
Polinsky, A., Cooney, M. G., Toy-Palmer, A., Osapay, G., and Goodman, M., J. Med. Chem. 1992, 35, 4185-4194.
Reily, M. D. and Dunbar Jr, J. B., Biochem. Biophys. Res. Comm. 1991, 178, 570-577.
Saudek, V., Hoflack, J., and Pelton, J. T., Int. J. Peptide Protein Res. 1991, 37, 174-179.
Saulitius, J., Mierke, D. F., Byk, G., Gilon, C., and Kessler, H., J. Am. Chem. Soc. 1992, 114, 4818-4827.
Saviano, M., Aida, M., and Corongiu, G., Biopolymers 1991, 31, 1017-1024.
Schmidt, J. M., Ohlenschlager, O., Ruterjans, H., Grzonka, Z., Kojro, E., Pavo, I., and Fahrenholz, F., Eur. J. Biochem. 1991, 201, 355-371.
Simonson, T. and Brunger, A. T., Biochemistry 1992, 31, 8661-8614.
Smith, P. E. and Dang, L. X., J. Am. Chem. Soc. 1991, 113, 67-73.
Smith, P. E. and Pettitt, B. M., J. Am. Chem. Soc. 1991, 113, 6029-6037.
Soman, K. V., Karimi, A., and Case, D. A., Biopolymers 1991, 31, 1351-1361.
Sukumar, M. and Higashijima, T., J. Biol. Chem. 1992, 21421-21424.
Tirado-Rives, J. and Jorgensen, W. L., Biochemistry 1991, 30, 3864-3871.
Tobias, D. J. and Brooks III, C. L., Biochemistry 1991, 30, 6059-6070.
Tobias, D. J., Mertz, J. E., and Brooks III, C. L., Biochemistry 1991, 30, 6054-6058.
Tobias, D. J., Sneddon, S. F., and Brooks III, C. L., J. Mol. Biol. 1990, 216, 783-796.
Ward, D. J., Chen, Y., Platt, E., and Robson, B., J. Theor. Biol. 1991, 148, 193-227.
Wilkes, B. C. and Schiller, P. W., Int. J. Peptide Protein Res. 1992, 40, 249-254.
Wollborn, U., Brunne, R. M., Harting, J., Holzemann, G., and Leibfritz, D., Int. J. Peptide Protein Res. 1993, 41, 316-384.
Zanotti, G., Majone, A., Rossi, F., Savino, M., Pedone, C., and Tancredi, T., Biopolymers 1993, 33, 1083-1091.

3.2 Energy Calculation Methods

Most peptide hormones have had some form of energetic study of their conformational behaviour. The first studies of peptide hormones used simple energy minimisation to determine local energy minimum conformations. Unfortunately, the flexible nature of peptide hormones means that they have many structurally very different conformations which are within 1-2 kcal/mol of each other. For this reason various conformational search methods have been used to explore more of conformational space without performing a full search. Two important methods are those based on distance geometry [10] and those based on molecular dynamics [11, 12].

Distance geometry techniques are essentially non-energetic. They set up random distance matrices, i.e. the matrix of all distances between all atom pairs in the molecule, with constraints on certain distances (e.g. bond lengths, NOE data), and compute Cartesian coordinates from these matrices.

Molecular dynamics is a deterministic simulation process in which the positions and velocities of the atoms in a molecule are integrated forward in time using Newton's laws of motion. The initial velocities are randomly assigned to atoms from a Maxwellian distribution consistent with the temperature at which the simulation is being performed. The movement of the atoms is then governed by the kinetic energy put into the system and the restoring forces that act on the molecule when it is displaced from a minimum energy conformation. These restoring forces are described by a forcefield, from which the potential energy of the system can be determined. The forcefield consists of strain energies (bond length and bond angle deformations, torsional components etc.), van der Waals non-bonded interactions and electrostatic terms. The Valence Force Field (VFF) [13], AMBER [14] and CHARMM [15] are examples of forcefields in use to study peptide and protein conformations.

3.3 Applications of Molecular Dynamics

As seen from Table 3-1, the most common use of calculations on peptides at the moment is to aid in generating conformational hypotheses in association with data from NMR studies. Many of these studies involve NOE-restrained molecular dynamics, in which the experimental proton-proton distance constraints corresponding to NOE cross-relaxation rates, obtained from 2D NOESY experiments, are used directly in the simulation [16]. Typically a harmonic term is included in the forcefield to penalise deviations from the observed proton-proton distances.

Another application is the use of relative free energies to compare chemically distinct systems using finite difference thermodynamic integration (FDTI) [17]. In practice, this is done by introducing a coupling parameter λ into the forcefield, which changes from 0 to 1 as the Hamiltonian correspondingly changes from state A to state B. Results from these calculations can be directly related to experimentally obtained thermodynamic properties.

A number of systems have been studied in solvent, mainly water, but some have involved other solvents commonly used in NMR studies of peptides.
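The harmonic NOE penalty described above can be illustrated with a minimal sketch. It assumes the simplest possible form, E = 0.5 k (d - d_obs)^2 for a single proton pair; the function name, the force constant value and the units are hypothetical choices for illustration, and forcefields used in practice often apply more elaborate restraint forms.

```python
import math


def noe_restraint_energy(pos_a, pos_b, d_obs, k=10.0):
    """Harmonic penalty for one proton-proton distance restraint.

    pos_a, pos_b : (x, y, z) coordinates of the two protons, in angstroms
    d_obs        : target distance derived from the NOE, in angstroms
    k            : force constant (hypothetical value, kcal/mol/A^2)

    Returns (energy, force_on_a), where force_on_a = -dE/d(pos_a).
    """
    delta = [a - b for a, b in zip(pos_a, pos_b)]
    d = math.sqrt(sum(c * c for c in delta))
    energy = 0.5 * k * (d - d_obs) ** 2
    if d == 0.0:
        # Direction is undefined for coincident atoms; return zero force.
        return energy, (0.0, 0.0, 0.0)
    # dE/d(pos_a) = k * (d - d_obs) * delta / d, and the force is its negative.
    force_on_a = tuple(-k * (d - d_obs) * c / d for c in delta)
    return energy, force_on_a


if __name__ == "__main__":
    e, f = noe_restraint_energy((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), d_obs=3.0)
    print(e, f)
```

When the current distance exceeds the target, the force on the first proton points towards the second, so the restraint pulls the pair together; the term is simply added to the forcefield energy and forces at every step.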


3.3.1 Pharmaceutical Applications of Conformational Studies of Peptides

A very common problem in the pharmaceutical industry is how to determine the conformational requirements for activity in systems where the structure of the receptor is not known. Much time and effort has been devoted to this problem. Here we concentrate on the application of molecular dynamics to this problem and look at the specific example of peptide hormones, systems which are very difficult to handle with standard methods because of the highly flexible nature of peptides. Two examples of peptide hormones that we have studied will bring out the techniques and the information one can get from such studies.

Molecular dynamics was first used in this way to study the hormone vasopressin [18]. These procedures were developed based on the idea that conformational recognition is the basis of receptor-ligand interactions. Initially the receptor recognises a "binding" conformation, which the ligand must be able to adopt to bind to the receptor. The receptor may recognise this conformation by looking for the correct positioning of certain functional groups, the binding groups. This means that the ligand has to be able to adopt a conformation in which these functional groups are positioned correctly. Following binding, the response is generated either by a conformational change occurring in the ligand-receptor complex (which involves the ligand) or by the correct positioning of certain functional groups (the active groups). Thus, agonists are capable of undergoing this conformational change or they have the active functional groups, whereas antagonists do not.

Therefore, if we find the accessible conformations of a peptide hormone and those of an antagonist, and then perform a structural cross comparison of these two sets of structures, conformations that both the peptide and the antagonist adopt are putative binding conformations, whereas conformations that agonists adopt but antagonists do not are putative conformations necessary for activity. We have used molecular dynamics simulations to access the conformational space of a molecule, and use the resulting minimised conformations to give us pointers to the molecule's function. In this case we are simply using the properties of molecular dynamics to search conformational space in an energetically directed manner. This is not a "simulation" of the molecular properties of the system, but a means to determine which conformations the trajectory has passed through. Hence we can use high temperatures and we do not have to worry about equilibration; indeed it is better to use the non-equilibrated parts of the trajectory.

We use two major methods of tracing accessed conformations. The first is to follow the conformational transitions in the molecule by plotting backbone torsional angles such as φ and ψ of key or "interesting" residues and minimising from that point in the trajectory. The second is to periodically minimise conformations accessed during the simulation, at intervals of, say, 5.0 or 1 picosecond, and cluster them into families depending on rms deviations. We have found the former technique useful for constrained systems like cyclic peptides, with the latter being more useful for linear molecules with a lot of conformational flexibility. Other parameters that can be used to follow the overall shape of the molecule include end-to-end distance, radius of gyration and moment of inertia.

3.3.2 Luteinising Hormone Releasing Hormone (LHRH)

LHRH is a decapeptide synthesised in the hypothalamus which, on arrival at the anterior pituitary gland, selectively stimulates the release of the gonadotropins Luteinising Hormone (LH) and Follicle Stimulating Hormone (FSH). These glycoproteins in turn stimulate the gonadal production of sex steroids and gametogenesis respectively. In 1977 the Nobel prize was awarded to Drs. Schally and Guillemin for the isolation and chemical characterisation of LHRH. The chemical structure elucidation of LHRH was a major breakthrough in peptide chemistry and LHRH was found to have the sequence p-Glu-His-Trp-Ser-Tyr-Gly-Leu-Arg-Pro-Gly-NH2. LHRH supposedly has a half life of 3 to 6 minutes in the hypothalamic-pituitary blood portal circulation and is degraded by a combination of endopeptidases acting at Tyr5-Gly6, enzymes that hydrolyse p-Glu and enzymes that cleave the carboxyl side of Pro.

Not long after the sequence of LHRH was first established and its synthesis successfully performed, analogues were made which turned out to be (a) superagonists or (b) mild antagonists. In the case of agonists the replacement of Gly6 by hydrophobic D-residues had the greatest impact on activity, as did substitutions at the C-terminal end. Most of these analogues turned out to be superagonists. However, when Gly6 was replaced by L-residues there was a considerable drop in activity. This led to the implication that, while resistance to enzymatic activity may be responsible for the increased potency by substitution of the Gly at the C-terminal end, it is the conformational constraint provided by the D-Ala6 residue while binding to the receptor that is responsible for the superagonist action. Superagonist analogues are now commercially available and have a variety of clinical applications.

Since the receptor of this hormone is not known, the "active conformation" has to be deduced from data about the hormone and putative agonists and antagonists. The working hypothesis is that the hormone, its agonists and antagonists all have common conformational features which are necessary for binding to the receptor. The hormone and its agonists (but not antagonists) have additional features which elicit the receptor's biological response. The possibility of using an LHRH antagonist to control fertility and also treat hormonal disorders had spurred the search for a highly potent, long acting, low toxicity analogue. To this end thousands of antagonist analogues of LHRH, incorporating various substitutions at different positions in its sequence, have been made and tested. Furthermore, over the years the binding and activity of the analogues have increased by several orders of magnitude, with the second generation of antagonists 100 to 1000 times more active than the earliest analogues and the present day antagonists a similar proportion more active than the second generation analogues.

3.3.3 Structural Studies on LHRH

The early energy calculations on LHRH by Momany, published as two papers in JACS in 1976, supported a conformation for LHRH which was characterised by a Tyr5-Gly6-Leu7-Arg8 modified type II' β-turn [19, 20]. Later calculations by other groups also tended to support this conformation [21]. However, unequivocal NMR evidence for such a conformation of LHRH itself was not forthcoming, though many groups had worked on the problem. This could be attributed to the solvent in which NMR was performed (mainly water or D2O) and also to the conformational variety exhibited by the molecule. Nevertheless, it was generally accepted that the low energy conformations of LHRH were characterised by the 5-8 β-turn.

In the design of antagonists, therefore, it was thought essential to retain this conformational feature of LHRH, as it was expected that a similar affinity to the receptor would be conferred by the presence of the type II' 5-8 β-turn. A common practice in designing peptide analogues is to design analogues with restricted conformational space. Cyclisation is one of the ways to achieve this restriction [22]. A set of LHRH analogues, cyclised through the side chains of residues 5 and 8, were found to be potent agonists [23]. The analogue Ac-D-Phe1(Cl)-D-Phe2(Cl)-D-Trp3-Ser4-Glu5-D-Arg6-Leu7-Lys8-Pro9-D-Ala10-NH2 was found to be a highly potent LHRH antagonist. This antagonist was conformationally constrained to form a cyclic peptide between residues 5 and 8 in order to mimic the β-turn that was shown by energy calculations to be present in the native peptide.

An exhaustive study of the conformational preferences of this analogue was carried out using conformational search, MD and minimisation techniques [24]. A conformational search of the cyclic part of the analogue was carried out first, followed by minimisations of all the generated conformations that were consistent with cyclisation. Among the 22 conformations obtained after minimisation, only two had a β-turn conformation at position 6-7, and these were ≈10 kcal/mol higher in energy than the most stable conformation. In all other generated conformations the central two residues were in extended or γ-turn conformations. An MD study of this analogue was performed next. The initial structure was based on the lowest energy conformation found for the cyclic part. Additional investigations of the conformational preferences of the N and C terminals of the peptide defined a plausible conformation for 3 more residues. The others were set to an extended conformation. After about 24 ps the peptide adopted a β-sheet conformation, with a β-turn between residues 3-6. The evolution of the β-turn, β-sheet conformation from an extended one is illustrated in Figure 3-1. Additional molecular dynamics simulations starting from the conformation with a β-turn at residues 5-8 resulted in higher energy conformations. Similar studies were carried out on other antagonists cyclised through the side chains of residues 5 and 8. A total of 1.5 nanoseconds of MD simulations were carried out, starting from different initial conformations as suggested from conformational searches and previous studies. A master list of accessible conformations for the peptide and its cyclised analogue was compiled. Each of the compounds was template forced onto all conformations, thus investigating the ability of each analogue to adopt these conformations. A consistent trend emerged indicating that, for this series of analogues, a β-sheet structure with a turn at residues 3-6 is most favourable. All structures with a 5-8 turn were of higher energies.

Figure 3-1. Results of a conformational search and molecular dynamics study of an LHRH antagonist. The backbone of the peptide is denoted by the filled bonds. Snapshots of conformations captured in this figure show the evolution of a β-turn, β-sheet conformation at 50 ps from a fairly extended conformation at 18 ps. The final conformation (lower right hand corner) was found to fold into a 3-6 type I β-turn which included the Glu-Lys sidechain cyclisation residues.

In the preferred structure that emerged from these simulations the N and C terminals are in close proximity. This suggested that the next step towards a more active antagonist might be to join the two ends by an amide bond. Calculations [25] showed that the bicyclic analogue maintained the preference for a 3-6 rather than a 5-8 turn. In addition, any other possible patterns of hydrogen bonding consistent with a β-sheet structure were ruled out by using an N-methyl residue at position 10. The stable conformation for this analogue is depicted in Figure 3-2. The two bicyclic analogues, with D-Ala10 and with D-MeAla10, were synthesised. The first analogue was not active; however, the methylated analogue showed considerable activity [25]. Thus the design of an analogue based on the molecular dynamics calculations confirms our structural predictions regarding the location of the β-turn and its overall influence on the structure.

Figure 3-2. Proposed bicyclic analogue of LHRH based on the results of molecular dynamics conformational searching.

3.3.4 LHRH Agonists

This leads to the following question. Is the conformation of LHRH proposed earlier from energy calculations [19, 20], i.e. the one with a 5-8 β-turn, a truly low energy conformation? Constrained molecular dynamics on LHRH was performed starting with conformations forced to adopt β-turns between residues 2-5, 3-6, 4-7 and 5-8. All possible types of β-turns appropriate to the sequence of the middle residues were tried in the calculations. Thus the starting conformations consisted of
(a) a type I His2-Trp3 β-turn
(b) a type I Trp3-Ser4 β-turn
(c) a type I Ser4-Tyr5 β-turn
(d) a type I Tyr5-Gly6 β-turn
(e) a type II Tyr5-Gly6 β-turn
(f) a type I Gly6-Leu7 β-turn
(g) a type II' Gly6-Leu7 β-turn
(h) an extended conformation

The simulations were performed for 50 ps each, at 600 K, and the conformations accessed were sampled at appropriate intervals. The results are shown in Table 3-2. The lowest energy conformations do indicate some β-turn character between residues 5 to 8, though without the associated 4→1 hydrogen bond, vindicating the earlier calculations on LHRH. Conformations with a hydrogen bonded β-turn between residues 3 to 6 (which were found to be low energy conformations in the antagonist series) are about 10 kcal/mol higher in energy. The calculations were extended to D- and L-Ala6 substituted LHRH analogues and it was found that a 5-8 type II' β-turn conformation is indeed the one with the lowest energy. Though both analogues can adopt this β-turn, the D-Ala analogue is about 6 kcal/mol lower in energy than the L-Ala analogue.

Why do agonists show this apparent predilection for one type of β-turn, while the most potent antagonists known to date adopt another? One possibility is that in the antagonist conformations with the 3-6 β-turn, though the binding is good, the effector system is not triggered owing to the different sidechains that come to bear on the receptor, whereas in the agonist with a 5-8 type II' β-turn it is some other residues that interact with the receptor and are responsible for gonadotropin secretion. In the absence of a structure for the LHRH receptor all this will remain highly speculative.


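Table 3-2. Conformations of LHRH from different starting β-turn conformations (backbone dihedral angles φ, ψ in degrees).

| Start conf. | pGlu1 φ | pGlu1 ψ | His2 φ | His2 ψ | Trp3 φ | Trp3 ψ | Ser4 φ | Ser4 ψ | Tyr5 φ | Tyr5 ψ | Gly6 φ | Gly6 ψ | Leu7 φ | Leu7 ψ | Arg8 φ | Arg8 ψ | Pro9 φ | Pro9 ψ | Gly10 φ | Gly10 ψ | Energy (kcal/mol) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34-type I | 141 | 174 | -85 | 85 | -71 | -12 | -65 | -27 | -83 | 82 | -84 | 77 | -88 | 100 | -102 | 104 | -82 | 82 | -85 | 78 | 165.5 |
| | 109 | 75 | -89 | 82 | -95 | -72 | -78 | 81 | -127 | 96 | 88 | -92 | -126 | 75 | -148 | 116 | -79 | 95 | -165 | 79 | 144.4 |
| | 114 | -47 | -83 | 74 | -77 | -21 | -67 | -24 | -100 | -59 | -86 | 101 | -76 | 82 | -136 | 95 | -67 | 97 | 88 | -77 | 142.6 |
| 45-type I | 143 | 90 | -89 | 78 | -108 | 139 | -57 | -45 | -80 | -7 | -173 | 84 | -88 | 106 | -90 | 116 | -80 | 74 | -84 | 90 | 144.9 |
| | 142 | 91 | -87 | 75 | -109 | 146 | -61 | -36 | -106 | 93 | 98 | 71 | -84 | 104 | -104 | 104 | -80 | 82 | -160 | 93 | 140.2 |
| | 138 | 100 | -79 | 79 | -103 | 142 | -61 | -18 | -92 | -67 | -90 | 87 | -88 | 92 | -112 | 155 | -75 | 89 | -175 | 96 | 130.6 |
| | 139 | 97 | -83 | 87 | -108 | 154 | -58 | -29 | -116 | 55 | 160 | 77 | -89 | 90 | -105 | 152 | -67 | -27 | -79 | 74 | 133.6 |
| 56-type I | 109 | 75 | -91 | 83 | -99 | 100 | -79 | 168 | -58 | -38 | -100 | 61 | -145 | 112 | -98 | 107 | -81 | 82 | -84 | 79 | 161.8 |
| | 143 | 93 | -88 | 91 | -79 | 91 | -163 | -79 | -127 | -33 | -62 | -58 | -94 | 102 | -86 | 124 | -80 | 78 | 87 | -86 | 140.1 |
| | 144 | -43 | -81 | 91 | -80 | 79 | -171 | -59 | -129 | -47 | -61 | -59 | -93 | 105 | -89 | 115 | -84 | 67 | 79 | -80 | 127.2 |
| 56-type II | 145 | 71 | -89 | 80 | -97 | 86 | -91 | 150 | -63 | 117 | 81 | -4 | -149 | 85 | -119 | 96 | -83 | 78 | -84 | 78 | 148.5 |
| | 112 | -54 | -88 | 84 | -95 | 82 | -104 | 160 | -62 | 119 | 81 | -3 | -150 | 95 | -111 | 93 | -78 | 84 | 146 | -86 | 139.9 |
| 67-type I | 145 | 69 | -90 | 81 | -95 | 71 | -89 | 89 | -130 | 133 | -59 | -29 | -88 | -4 | -101 | 108 | -80 | 78 | -85 | 78 | 161.1 |
| | 109 | 74 | -88 | 83 | -95 | 65 | -93 | 77 | -105 | 147 | -76 | -67 | -84 | 102 | -98 | 123 | -52 | -43 | -86 | 81 | 152.8 |
| | 144 | -52 | -143 | 130 | -84 | 92 | -82 | 76 | -127 | 142 | -56 | -47 | -72 | -34 | -136 | 86 | -86 | 78 | -169 | -89 | 132.6 |
| 67-type II' | 144 | 90 | -115 | 82 | -96 | 99 | -79 | 99 | -85 | 81 | 66 | -112 | -90 | -5 | -100 | 108 | -81 | 113 | -147 | 86 | 134.7 |
| | 146 | -50 | -93 | 88 | -86 | 125 | -77 | 102 | -84 | 80 | 77 | -104 | -122 | 66 | -148 | 87 | -71 | 111 | -144 | 113 | 128.9 |
| | 144 | -51 | -88 | 86 | -131 | 158 | -76 | 97 | -81 | 80 | 77 | -106 | -121 | 67 | -148 | 94 | -76 | 100 | 84 | -89 | 121.7 |
| | 110 | 100 | -77 | 100 | -143 | 160 | -78 | 96 | -83 | 84 | 80 | -69 | -129 | -48 | -71 | 118 | -76 | 100 | 170 | -82 | 132.1 |
| extended | 1445 [sic] | 68 | -90 | 82 | -94 | 66 | -95 | 78 | -94 | 83 | -84 | 78 | -88 | 101 | -102 | 104 | -82 | 82 | -85 | 78 | 158.3 |
| | 145 | 69 | -94 | 76 | -114 | 163 | -77 | 90 | -82 | 97 | -84 | 47 | -100 | 79 | -125 | 91 | -83 | 77 | -178 | 83 | 149.1 |
| | 145 | -53 | -99 | 94 | -132 | 171 | -79 | 80 | -88 | 157 | -75 | -69 | -84 | 113 | -101 | 107 | -80 | 92 | 143 | -101 | 137.7 |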
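The φ, ψ pairs in Table 3-2 can be compared with idealised β-turn dihedrals to see which turn type a given row represents. The sketch below is a hedged illustration: the ideal angles for the two central residues (types I, I', II, II') are the standard textbook values, while the uniform 30 degree tolerance and the function names are assumptions made here, not the authors' procedure.

```python
# Ideal (phi, psi) for the two central residues (i+1, i+2) of each turn type.
IDEAL_TURNS = {
    "I": ((-60.0, -30.0), (-90.0, 0.0)),
    "I'": ((60.0, 30.0), (90.0, 0.0)),
    "II": ((-60.0, 120.0), (80.0, 0.0)),
    "II'": ((60.0, -120.0), (-80.0, 0.0)),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_turn(res1, res2, tol=30.0):
    """Return the beta-turn type matching the (phi, psi) pairs, or None.

    res1, res2 : (phi, psi) of the two central residues, in degrees
    tol        : allowed deviation per angle (assumed value)
    """
    observed = tuple(res1) + tuple(res2)
    for name, (ideal1, ideal2) in IDEAL_TURNS.items():
        ideal = ideal1 + ideal2
        if all(angle_diff(o, i) <= tol for o, i in zip(observed, ideal)):
            return name
    return None


if __name__ == "__main__":
    # Tyr5 and Gly6 values from the first "56-type II" row of Table 3-2.
    print(classify_turn((-63, 117), (81, -4)))
    # Gly6 and Leu7 values from the first "67-type II'" row.
    print(classify_turn((66, -112), (-90, -5)))
```

Applied to the first rows of the "56-type II" and "67-type II'" groups, the Tyr5/Gly6 and Gly6/Leu7 angles fall within the tolerance of the ideal type II and type II' values respectively, which agrees with the row labels.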


3 Molecular Dynamics Simulations of Peptides 513.3.5 Melan<strong>in</strong> Concentrat<strong>in</strong>g Hormone (MCH)MCH is a neuropeptide produced <strong>in</strong> the hypothalamus. In teleosts it concentratesmelan<strong>in</strong> with<strong>in</strong> the pigment cells of the sk<strong>in</strong> [26], hence caus<strong>in</strong>g the fish sk<strong>in</strong> to appearpaler, but it is not a simple antagonist of melan<strong>in</strong> stimulat<strong>in</strong>g hormone (MSH).It also <strong>in</strong>duces melanosome dispersion with<strong>in</strong> tetrapod melanophores [27]. MCHalso acts as a potent pituitary hormone, <strong>in</strong>hibit<strong>in</strong>g the release of ACTH <strong>in</strong> mammals[28], and stimulat<strong>in</strong>g growth hormone release <strong>in</strong> rats [29]. Although it has beenshown to be present <strong>in</strong> man its function <strong>in</strong> man is, as of yet, unknown. MCH is anoligopeptide of 17 residues with the sequence:Asp-Thr-Met-Arg-Cys-Met-Val-Gly-Arg-Val-Tyr-Arg-Pro-Cys-Trp-Glu-ValA disulphide bridge between Cys' and Cys14 forms an <strong>in</strong>tra<strong>molecular</strong> r<strong>in</strong>g of 10residues.From an arbitrary start<strong>in</strong>g conformation, built us<strong>in</strong>g <strong>molecular</strong> graphics such thatthe Cys-Cys disulphide could be made, we generated a total of 150 picoseconds of<strong>molecular</strong> dynamics trajectories on MCH and the 2 fragments, the cyclic MCH5-14r<strong>in</strong>g and the l<strong>in</strong>ear MCH,-,, fragment [30, 311. The next stage of the <strong>molecular</strong>design procedure is to determ<strong>in</strong>e the local m<strong>in</strong>ima be<strong>in</strong>g passed through. This is doneby tak<strong>in</strong>g the <strong>in</strong>stantaneous coord<strong>in</strong>ates every picosecond and m<strong>in</strong>imis<strong>in</strong>g each ofthese structures. 
One of the major features of these conformations is an internal cross-ring hydrogen bond from the Tyr11 side chain hydroxyl to the backbone carbonyl of Cys5, which may, to a certain extent, be responsible for the rigidity of the ring. In all the conformations we generated, the largest conformational changes occurred in the Gly8-Arg9 region. Some regions of the peptide were seen to be constrained in the simulations; one such area was the section Val10 to Cys14, around the Pro13 residue. A sample minimised conformation of cyclic MCH5-14 showing the cross-ring hydrogen bond is given in Figure 3-3.

Thus, from the molecular dynamics we have been able to identify conformationally constrained features and regions of conformational flexibility of MCH, which may have a bearing on the conformation required for activity. It should be noted that these studies have been carried out as part of experimental programs. This is important for many biological systems, as modelling with limited information must always be checked against experimental data, and structural hypotheses tested by synthesis (see [30-32] for full details).

At this stage of the investigation, we cannot determine which of the conformations may be the binding or active conformations. Additional information, from a known antagonist or additional agonists or inactive compounds, is needed before we can make a more specific prediction of the active conformation.
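The snapshot-and-minimise step used to locate the local minima visited by a trajectory can be sketched in a few lines. The Python fragment below is an illustration only, under stated assumptions: a one-dimensional double-well potential U(x) = (x^2 - 1)^2 stands in for a peptide force field, the minimiser is plain steepest descent, and the names `grad_double_well`, `minimise` and `quench` are hypothetical, not taken from the programs used in the study.

```python
def grad_double_well(x):
    """Gradient of the toy potential U(x) = (x**2 - 1)**2 (assumed stand-in for a force field)."""
    return [4.0 * xi * (xi * xi - 1.0) for xi in x]


def minimise(x, grad, step=0.05, gtol=1e-8, max_iter=10000):
    """Steepest descent: follow -grad until the largest gradient component is below gtol."""
    x = list(x)
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < gtol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quench(snapshots, grad, ndigits=3):
    """Minimise every trajectory snapshot and return the distinct local minima found."""
    minima = set()
    for snap in snapshots:
        xmin = minimise(snap, grad)
        minima.add(tuple(round(xi, ndigits) for xi in xmin))
    return sorted(minima)


# Hypothetical "instantaneous coordinates", e.g. one saved every picosecond.
snapshots = [[-1.4], [-0.6], [0.3], [1.2]]
distinct_minima = quench(snapshots, grad_double_well)
```

In this toy case the four snapshots collapse onto the two wells of the potential. For a peptide the same loop would run over Cartesian coordinates of all atoms and would typically use a more efficient minimiser than steepest descent.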


Figure 3-3. The cyclic ring in MCH from Cys5 to Cys14, the part considered essential for activity. Molecular dynamics shows conformational mobility around Gly8 and conformational rigidity around Pro13. The cross-ring hydrogen bond involving Tyr11 is shown by the dotted line.

3.3.6 De Novo Peptide and Protein Design

So far we have considered the molecular dynamics studies of peptides with a specific sequence and known functionality. We would like to be able to design arbitrary peptide sequences which would have some specific function, e.g. a certain secondary structure. Although such de novo design is in its very early stages, a recent study of ours has attempted to do just this by designing amphiphilic peptides which would adopt an α-helical structure in a membrane.

Electronic devices have led to rapid advances in technology by virtue of their ability to be densely packed. However, electronic devices are now at a stage where further reductions in size are limited by fundamental physical laws. The search is on for novel devices that could replace current electronic devices such as transistors and integrated circuits, and yet be much smaller in size. Molecular-based devices would allow very small devices to be created, and many such devices would occupy a very small amount of space. Living creatures use molecular behaviour to manage their


interaction with the environment and have done this very successfully. Using nature as a model, we can look for analogues of the systems we would like to develop as molecular devices. One important function required is a switch which acts on some information flow. An important switch in nature is the ligand gated ion-channel, which is found in many organs but in particular is an important component of brain cells. A switchable ion channel could form the basis of molecular electronics components based on ion gating, rather than electron gating.

Switchable transmembrane ion channels are used in nature to control ion concentrations on either side of a membrane. One of the best-studied ion channel families is the ligand-gated ion channels [33], of which the most studied member is the nicotinic acetylcholine receptor. From experimental studies of this channel, the ion channel is believed to involve four α-helices which pack together in a helix-bundle type arrangement to form a central channel. From amino-acid sequence information of the ion channel, model α-helices have been created which suggest that the polar amino-acids of these helices aggregate on one face and the apolar amino-acids on the opposite face. Synthetic peptide ion channel models based on these native sequences also have a similar positioning of polar and apolar amino-acids.
Such peptides have been shown to act as short life-time channels in synthetic lipid bilayers [34].

Hence, we wished to design novel peptides which would form stable α-helices in membranes and would aggregate to form a helix-bundle which would allow ions to pass through a central channel. Our primary design suggestion was to add hydrophilic polyethoxy spacers to link the hydrophilic residues (Ser) of the synthetic sequence based on the natural ion channel which showed channel activity. A number of questions were raised by this suggestion, which we used molecular dynamics simulations to answer. The two main questions were: what length of spacer would stabilise the helix with the least perturbation of the helix structure, and how many of these helices would form the bundle best able to transport ions?

3.3.7 Molecular Dynamics Calculations on Synthetic Ion Channels

Simulations were performed on synthetic ion channels consisting of bundles of 4, 5 and 6 helices of the (LSSLLSL)*3 peptide, with and without bridging spacers between Ser2 and Ser6 of each LSSLLSL fragment. All the helices were aligned parallel to each other in the starting conformations. The initial conformation of the residues was set to (φ, ψ) = (-57°, -47°). In addition, in the case of the four helix bundle an anti-parallel arrangement was tried. A simulation of the four helix bundle without spacers but including three Na+ ions within the channel was also attempted. Thus


a total of eight molecular dynamics simulations were undertaken. The simulations were run for 100 picoseconds (ps) for the 4 and 5 helix bundles and for 50 ps for the 6 helix bundle. The temperature of the simulations was 300 K, the timestep was 1 femtosecond, and the leap-frog algorithm was used to update the coordinates and velocities. The starting structure in each case was built by eye using the program Insight [35].

The results of the dynamics simulations were analysed in terms of the conformations accessed by the different systems under study. These conformations were generated by energy minimisation at various instants during the simulation. The torsion angles φ and ψ of each residue were tabulated for each minimised structure. The principal components of the moment of inertia and the associated radius of gyration were also calculated for each structure. In the case of the four helix bundles, the anti-parallel arrangement was found to be more stable than the parallel one from energy considerations. Table 3-3 compares the lowest energy conformations obtained from the 100 ps molecular dynamics simulations of the parallel and anti-parallel four helix bundles. Clearly the more favourable electrostatic interaction (25.3 against 117.7 kcal/mol) is the cause of the lower energy of the anti-parallel arrangement. One explanation is that α-helices can be thought of as macrodipoles [36], which interact more favourably when neighbouring helices point in opposite directions.

Table 3-3. Lowest energy conformations of 4 helical (LSSLLSL)*3 bundles.

Energy (kcal/mol)    Parallel    Anti-parallel
Bond                   164.2         169.1
Angle                  310.7         318.9
Torsion                 52.4          63.9
Out-of-Plane             1.9           3.0
Non-Bonded             -12.5          16.0
Electrostatic          117.7          25.3
Total                  634.4         596.3

Figure 3-4 shows the principal components of the moment of inertia of the parallel and anti-parallel four helix bundles. In these figures the continuous line, which has the smallest value, corresponds to the direction of the helical axis, while the dashed and dotted lines represent elongations along directions normal to the helical axis. Clearly the four helix bundle in both parallel and anti-parallel arrangements shows an overall channel-like structure over the simulation period. However, the constancy of these values for the anti-parallel case suggests that the original arrangement of the helix bundle, built by eye on a picture system, is better retained throughout the 100 ps in the anti-parallel case. The parallel arrangement of four helical bundles seems a plausible structural arrangement for channel forming activity of synthetic peptides [37, 38]. However, there is no consensus regarding the parallel or anti-parallel arrangement in naturally


occurring systems like alamethicin [39]. A ribbon diagram of the lowest energy conformation of the four helical parallel bundle is shown in Figure 3-5 from two different perspectives.

[Figure 3-4. Two panels, "Principal MI 4 helical parallel" and "Principal MI 4 helical anti-parallel", each with a horizontal axis running from 0.0 to 100.0.]
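The principal moments of inertia and the radius of gyration used in this analysis are straightforward to compute from a set of coordinates. The following Python sketch is an illustration under assumptions of our choosing: six unit point masses stand in for a structure, the eigenvalues of the symmetric 3x3 inertia tensor are obtained with the standard closed-form trigonometric solution, and the function names are hypothetical. It is not the analysis code used for the figures.

```python
import math


def symmetric_eigenvalues_3x3(a):
    """Eigenvalues (ascending) of a real symmetric 3x3 matrix, closed-form solution."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:                                   # matrix is already diagonal
        return sorted([a[0][0], a[1][1], a[2][2]])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))            # guard against rounding outside [-1, 1]
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e_min, 3.0 * q - e_max - e_min, e_max]


def principal_moments(coords, masses):
    """Return (principal moments of inertia, radius of gyration) about the centre of mass."""
    total = sum(masses)
    com = [sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)]
    a = [[0.0] * 3 for _ in range(3)]               # inertia tensor
    rg2 = 0.0
    for m, r in zip(masses, coords):
        x, y, z = (r[k] - com[k] for k in range(3))
        a[0][0] += m * (y * y + z * z)
        a[1][1] += m * (x * x + z * z)
        a[2][2] += m * (x * x + y * y)
        a[0][1] -= m * x * y
        a[0][2] -= m * x * z
        a[1][2] -= m * y * z
        rg2 += m * (x * x + y * y + z * z)
    a[1][0], a[2][0], a[2][1] = a[0][1], a[0][2], a[1][2]
    return symmetric_eigenvalues_3x3(a), math.sqrt(rg2 / total)


# Toy elongated object: long axis along z, unequal widths along x and y.
coords = [(0, 0, 5), (0, 0, -5), (2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0)]
masses = [1.0] * 6
moments, rg = principal_moments(coords, masses)
```

For an elongated object such as a helix bundle, the smallest principal moment is the one about the long axis, which is why the smallest-valued curve in plots of this kind is associated with the helical axis direction.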


Figure 3-5. Side-on and end-on views of the lowest energy conformation of a four helix (LSSLLSL)*3 bundle. The ribbon represents spline curves fitted to the positions of the backbone atoms of the peptide, with the plane of the ribbon determined by the plane of the amide bonds.

3.3.7.1 Spacer in (LSSLLSL)*3 Helix

In order to confer additional stability on the α-helix, a synthetic program was started in which the residues Ser2 and Ser6 in each (LSSLLSL) fragment were bridged. Three bridges were tried, and calculations showed that the bridge -CH2-CO-CH2- between the Ser2 and Ser6 sidechains was the optimum one for retaining an α-helical arrangement. Molecular dynamics simulations of the four helix bundles with spacers were performed as before. The moment of inertia plot for a four helical bridged bundle in a parallel arrangement is shown in Figure 3-6. Once again, we see that the overall channel arrangement is maintained during the simulation period. Comparing Figures 3-6 and 3-4, it can be seen that the helix is shortened along the helix axis direction and broadened in one of the directions normal to the helix axis.

Finally, in the four helical bundle case, simulations were performed with 3 Na+ ions incorporated into the channel. Even here, the overall channel structure was retained throughout the simulation period, showing that it is feasible for these synthetic peptides to conduct small cations.
This is illustrated in Figure 3-7.

The calculations were extended to five and six helix bundles with and without bridging spacers in the parallel arrangement. In all cases except the unbridged five helical bundle, the overall channel structure was found to be maintained. We do not know at present whether this was an artifact of our starting conformation or whether the five helical arrangement is an unfavourable one for a channel, although each of its α-helices is not far removed from its starting conformation. The bridged version of the five helical bundle, however, retains its overall structure. The pore size increased as the number of peptides forming the bundle increased. We are currently awaiting the conclusion of experimental work on these peptides so as to be able to predict all the structural characteristics with greater authority.
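That the pore widens with the number of helices is what simple packing geometry would suggest. As a rough illustration only, assume each helix is a rigid cylinder of radius r and that N such cylinders sit on a regular polygon, each touching its two neighbours; the largest circle that fits between them then has radius r(1/sin(π/N) - 1). The helix radius of 5 Å used below is an assumed round number and `pore_radius` is a hypothetical helper; neither comes from the simulations described here.

```python
import math


def pore_radius(n_helices, helix_radius):
    """Radius of the circle inscribed between n touching cylinders on a regular polygon."""
    if n_helices < 3:
        raise ValueError("need at least three helices to enclose a pore")
    # Cylinder centres lie on a circle of radius helix_radius / sin(pi / n).
    return helix_radius * (1.0 / math.sin(math.pi / n_helices) - 1.0)


# Assumed helix radius of 5 angstroms, for illustration only.
radii = {n: pore_radius(n, 5.0) for n in (4, 5, 6)}
```

Under these assumptions the pore radius grows from roughly 2.1 Å for four helices to about 3.5 Å for five and 5.0 Å for six. Real bundles are tilted and have side chains lining the pore, so the simulated values need not follow this idealised trend closely.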


Figure 3-6. Principal components of the moment of inertia of the four helical bundle with spacers. The continuous line, which has the smallest value, corresponds to the direction of the helical axis. The dotted and dashed lines are extensions along directions normal to the helical axis.

Figure 3-7. Principal components of the moment of inertia of the four helical bundle with Na+ ions. See Figure 3-6 for a description of the meanings of the lines.


3.4 Conclusions

Molecular dynamics is a useful tool in the ever-growing arsenal of computational chemists and molecular modellers. It finds wide application in the area of drug design. With the advent of more powerful computers it is now possible to include solvent routinely in molecular dynamics simulations, which makes direct comparison with certain experimental conditions feasible. While such simulations are no doubt useful in certain conditions, it is worth keeping in perspective the overall aim of a molecular dynamics study. As shown in this article, much insight that is helpful in designing more active analogues can be gathered even from in vacuo calculations. This is especially relevant in cases where the structure of the receptor is not known and one has to attempt to infer the binding site structure from the allowed conformations of analogues. In vacuo calculations would bring out most of the underlying conformations without being side-tracked by the solvent used in the study, which in turn may not be representative of the binding environment.

References

[1] Ramachandran, G. N., Ramakrishnan, C., Sasisekharan, V., J. Mol. Biol. 1963, 7, 95.
[2] Ramachandran, G. N., Sasisekharan, V., Adv. Protein Chem. 1968, 23, 283.
[3] Scheraga, H. A., Adv. Phys. Org. Chem. 1968, 6, 103.
[4] Flory, P. J., Statistical Mechanics of Chain Molecules, Interscience, J. Wiley and Sons, New York, 1969.
[5] Liquori, A. M., Q. Rev. Biophys. 1969, 2, 65.
[6] Fletcher, R., Powell, M. J. D., Comput. J. 1963, 6, 163.
[7] Ermer, O., Struct. Bond. Berlin 1976, 27, 161.
[8] Fletcher, R., Practical Methods of Optimization 1, Wiley, New York, 1980.
[9] McCammon, J. A., Gelin, B. R., Karplus, M., Nature 1977, 267, 585.
[10] Crippen, G. M., Havel, T. F., Distance Geometry and Molecular Conformation, John Wiley and Sons, New York, 1988.
[11] McCammon, J. A., Harvey, S. C., Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1987.
[12] Tildesley, D. J., Allen, M. P., Computer Simulations of Liquids, Clarendon Press, Oxford, 1987.
[13] Dauber-Osguthorpe, P., Roberts, V. A., Osguthorpe, D. J., Wolff, Genest, M., Hagler, A. T., Protein Struct. Funct. Genet. 1988, 4, 31-47.
[14] Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profeta, S., Jr., Weiner, P., J. Am. Chem. Soc. 1984, 106, 765-784.
[15] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., Karplus, M., J. Comp. Chem. 1983, 4, 187.
[16] Wüthrich, K., NMR of Proteins and Nucleic Acids, Wiley-Interscience, 1986.


[17] Mezei, M., J. Chem. Phys. 1987, 86, 7084-7088.
[18] Hagler, A. T., Osguthorpe, D. J., Dauber-Osguthorpe, P., Hempel, J. C., Science 1985, 227, 1309-1315.
[19] Momany, F. A., J. Am. Chem. Soc. 1976, 98, 2990-3000.
[20] Momany, F. A., J. Am. Chem. Soc. 1976, 98, 2996.
[21] Struthers, R. S., Rivier, J., Hagler, A. T., Ann. N.Y. Acad. Sci. 1984, 439, 81.
[22] Rizo, J., Koerber, S. C., Bienstock, R. J., Rivier, J., Hagler, A. T., Gierasch, L. M., J. Am. Chem. Soc. 1992, 114, 2852-2859.
[23] Dutta, A. S., Gormley, J. J., Mclachlan, P. F., Woodburn, J. R., Biochem. Biophys. Res. Comm. 1989, 159, 1114-1120.
[24] Paul, P. K. C., Dauber-Osguthorpe, P., Campbell, M. M., Osguthorpe, D. J., Biochem. Biophys. Res. Comm. 1989, 165, 1051-1058.
[25] Dutta, A. S., Gormley, J. J., Woodburn, J. R., Paul, P. K. C., Osguthorpe, D. J., Campbell, M. M., Bioorg. Med. Chem. Lett. 1993, 33, 943-948.
[26] Gilham, I. D., Baker, B. I., J. Endocrinol. 1984, 102, 237.
[27] Wilkes, B. W., Hruby, V. J., Sherbrooke, W. C., Castrucci, A. M., Hadley, M. E., Science 1984, 224, 1111.
[28] Baker, B. I., Bird, D. J., Buckingham, J. C., J. Endocrinol. 1985, R5-R8.
[29] Skotfitsch, G., Jacobowitz, M., Zamir, N., Brain Res. Bull. 1985, 635.
[30] Brown, D. W., Campbell, M. M., Kinsman, R. G., Moss, C., Paul, P. K. C., Osguthorpe, D. J., Baker, B., J. Chem. Soc. Chem. Commun. 1988, 1543-1545.
[31] Paul, P. K. C., Dauber-Osguthorpe, P., Campbell, M. M., Brown, D. W., Kinsman, R. G., Moss, C., Osguthorpe, D. J., Biopolymers 1990, 29, 623-637.
[32] Brown, D. W., Campbell, M. M., Kinsman, R. G., White, P. D., Moss, C. A., Osguthorpe, D. J., Paul, P. K. C., Baker, B. I., Biopolymers 1990, 29, 609-622.
[33] Cockcroft, V. B., Osguthorpe, D. J., Friday, A. F., Barnard, E. A., Lunt, G. G., Mol. Neurobiol. 1990, 4, 129-169.
[34] Lear, J. D., Wasserman, Z. R., DeGrado, W. F., Science 1988, 240, 1177-1181.
[35] BIOSYM Technologies
[36] Hol, W. G. J., Halie, L. M., Sander, C., Nature 1981, 294, 532-536.
[37] Montal, M. S., Blewitt, R., Tomich, J. M., Montal, M., FEBS Lett. 1992, 313, 12-18.
[38] Oiki, S., Madison, V., Montal, M., Proteins 1990, 8, 226-236.
[39] Wooley, G. A., Wallace, B. A., J. Membr. Biol. 1992, 129, 109-136.


Computer Modelling in Molecular Biology
Edited by Julia M. Goodfellow
© VCH Verlagsgesellschaft mbH, 1995

4 Molecular Dynamics and Free Energy Calculations Applied to the Enzyme Barnase and One of its Stability Mutants

Shoshana J. Wodak, Daniel van Belle, and Martine Prévost

Université Libre de Bruxelles, Unité de Conformation de Macromolécules Biologiques, CP160/16, P2, Ave. P. Héger, B-1050 Bruxelles, Belgium

Contents

4.1 Introduction
4.2 Molecular Dynamics Simulations of Barnase in Water
4.2.1 Simulation Methodology
4.2.1.1 Integration Algorithm
4.2.1.2 The Force-Field
4.2.1.3 Treatment of Long-Range Coulombic Interactions
4.2.1.4 Starting Conformation and Simulation Conditions
4.2.2 Analysis of the 250 ps Trajectory of Barnase in Water
4.2.2.1 Protein Motion
4.2.2.2 Variations of the Protein Accessible Surface Area and Volume
4.2.2.3 Hydrogen Bonds
4.2.3 Structural and Dynamic Properties of Water Molecules Near the Protein Surface
4.2.3.1 Structural Properties
4.2.3.2 Dynamic Properties
4.3 Computing the Free Energy Change Associated with a Hydrophobic Mutation in Barnase
4.3.1 The Free Energy Perturbation Method
4.3.2 Computing Free Energy Differences: Practical Aspects
4.3.2.1 Implementation of the Perturbation Method
4.3.2.2 The Molecular Systems and Simulation Procedure
4.3.3 Computed Changes in Protein Stability for the Ile 96 → Ala Mutation
4.3.4 Error Estimation
4.3.5 Protein Stability and the Hydrophobic Effect
4.4 Concluding Remarks
References


4.1 Introduction

Molecular dynamics (MD) techniques have found many useful applications in an increasing variety of problems involving biological molecules (see Goodfellow and Williams [1] for a recent review).

One very popular application, which requires faithful reproduction of structural properties, has been the refinement of protein models obtained by X-ray diffraction [2] and NMR spectroscopy [3-5]. It involves techniques termed restrained MD simulations [6, 7], in which simulated annealing protocols [8, 9] are used to optimize the correspondence between the model and the experimental data.

Much interest has also been generated by the use of MD simulations to evaluate free energy changes in proteins caused by sequence modification [10-13], differences in association constants for protein-protein [14] and enzyme-ligand complexes [15, 16], or the energetics of enzyme reactions [17-20]. With the need to reproduce energetic and thermodynamic quantities to within only a few kilocalories, this latter category of applications demands an accurate representation of both structural and dynamic properties of the system, as well as an adequate sampling of its phase space.

Substantial progress has been achieved in MD simulations since their first application to biological systems in 1976-1979 [21]. Clearly, however, they are still not able to achieve all this reliably.
Many problems, such as (1) use of inadequate or incomplete potential functions, (2) improper representation of the environment (solvent, ions), (3) sensitivity to simulation conditions, and (4) insufficient sampling of relevant regions of phase space, are still being tackled by many researchers.

As an illustration of the progress in the field we describe a room temperature MD study of a small protein, barnase, in the presence of explicit water molecules, with emphasis on practical aspects of the simulations and on the analysis of the results. The length of the studied trajectory is 250 ps (1 ps = 10⁻¹² s). Until recently, only two other studies described trajectories of comparable length for a protein-water system: the 60 ps trajectory of protease A from Streptomyces griseus in the crystalline state [25], and the 200 ps trajectory of Bovine Pancreatic Trypsin Inhibitor (BPTI) in water [22]. Only lately have trajectories of 500 ps and longer been generated for a number of protein- and peptide-water systems [12-14, 23-26]. These trajectories were computed both at room temperature and at higher temperatures used to simulate protein denaturation.
Their detailed analysis is still in progress, and should provide valuable new insight into the dynamics of the native state and the events associated with unfolding.

The second part of this chapter describes the use of MD simulations in computing the differences in unfolding free energy between wild-type barnase and a mutant where isoleucine 96 in the hydrophobic core of the protein is replaced by alanine. Various aspects of the computational analysis will be described and results will be


discussed in light of the experimentally determined measures of protein stability changes caused by the mutation.

4.2 Molecular Dynamics Simulations of Barnase in Water

Barnase is an extracellular ribonuclease from Bacillus amyloliquefaciens. It is of particular interest because, being small (110 residues), it is readily amenable to physical, chemical, spectroscopic and structural studies. Indeed, it has been analyzed by nuclear magnetic resonance spectroscopy (NMR) [27, 28]. The crystal structure of the free enzyme is known to 2.0 Å resolution [29] and the structure determinations of its complexes with nucleotides [30, 31] and with the protein inhibitor barstar [30] are being completed. Furthermore, barnase undergoes reversible thermal and urea induced denaturation [32] and has served as a model system for studying protein stability and folding by site directed mutagenesis [28]. It contains a significant amount of secondary structure, which comprises a β-sheet composed of 5 strands (residues 50-55, 70-75, 85-91, 94-101, and 106-108) and two α-helices (residues 6-18 and 26-34), as depicted in Figure 4-1.

The 250 ps MD simulation of solvated barnase described in this section is used to analyze the detailed motional behaviour of this protein in solution and its interactions with the solvent. The major aim of such theoretical analyses is to assess the computational methodologies, and to gain new insights which experimental studies cannot provide.

Figure 4-1. Ribbon drawing [51] of the barnase crystal structure [29].

4.2.1 Simulation Methodology

4.2.1.1 Integration Algorithm

To compute the time dependent trajectory of the protein barnase in the presence of explicit solvent molecules, Newton's equations of motion of all the particles in the system were integrated using the Verlet algorithm [33, 34]:

$$\mathbf{r}_i(t+\Delta t) = 2\,\mathbf{r}_i(t) - \mathbf{r}_i(t-\Delta t) + \frac{\Delta t^2}{m_i}\,\mathbf{F}_i(t) \tag{4-1a}$$

$$\mathbf{v}_i(t) = \frac{\mathbf{r}_i(t+\Delta t) - \mathbf{r}_i(t-\Delta t)}{2\,\Delta t} \tag{4-1b}$$

To speed up MD simulations of complex biological systems, a common practice is to "freeze" the very fast vibrational motion associated with bond stretching in the protein molecule and with all intra-molecular modes (bond stretching and angle bending) of the water molecules. Freezing such high frequency motions should in principle not affect the properties of the system above the picosecond time scale [35]. It has the advantage of allowing the use of longer time steps in the numerical integration procedure (Δt = 2 × 10⁻¹⁵ s instead of the 0.5 × 10⁻¹⁵ s used otherwise), and is achieved by applying constraints. This is done using the SHAKE procedure of Ryckaert et al. [36], which is implemented here in the context of the Verlet algorithm. In the presence of constraints, the equation of motion of a particle becomes:

$$m_i\,\frac{d^2\mathbf{r}_i}{dt^2} = \mathbf{F}_i - \sum_{k=1}^{l} \lambda_k \nabla_i \sigma_k \tag{4-2}$$

The second term on the right hand side is an extra force due to the $l$ holonomic constraints imposed on atom $i$.
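As an illustration of Eqs. (4-1a) and (4-1b), and of how a single distance constraint can be enforced after an unconstrained Verlet step, the following Python sketch may help. It is written under simplifying assumptions: reduced units, two particles, one rigid bond, and hypothetical function names. The constraint update is the usual SHAKE-style iteration for one bond, in which the quantity `g` plays the role of the Lagrange multiplier up to a constant factor; it is not the implementation used for the barnase simulations.

```python
import math


def verlet_step(r, r_prev, force, mass, dt):
    """Eq. (4-1a): r(t+dt) = 2 r(t) - r(t-dt) + dt^2 F(t) / m."""
    return [2.0 * x - xp + dt * dt * f / mass for x, xp, f in zip(r, r_prev, force)]


def verlet_velocity(r_next, r_prev, dt):
    """Eq. (4-1b): v(t) = [r(t+dt) - r(t-dt)] / (2 dt)."""
    return [(xn - xp) / (2.0 * dt) for xn, xp in zip(r_next, r_prev)]


def shake_pair(ri, rj, ri_old, rj_old, mi, mj, d, tol=1e-10, max_iter=500):
    """Shift unconstrained positions ri, rj along the old bond until |ri - rj| = d."""
    s_old = [a - b for a, b in zip(ri_old, rj_old)]      # bond vector at time t
    for _ in range(max_iter):
        s = [a - b for a, b in zip(ri, rj)]
        diff = sum(c * c for c in s) - d * d             # constraint violation
        if abs(diff) < tol * d * d:
            return ri, rj
        dot = sum(a * b for a, b in zip(s_old, s))
        g = diff / (2.0 * (1.0 / mi + 1.0 / mj) * dot)
        ri = [a - g / mi * c for a, c in zip(ri, s_old)]
        rj = [b + g / mj * c for b, c in zip(rj, s_old)]
    raise RuntimeError("constraint iteration did not converge")


def oscillator_position(t_end=10.0, dt=0.01):
    """Unit-mass harmonic oscillator (F = -x), started so that x(t) = cos(t)."""
    x_prev, x = [math.cos(-dt)], [1.0]
    for _ in range(round(t_end / dt)):
        x_prev, x = x, verlet_step(x, x_prev, [-x[0]], 1.0, dt)
    return x[0]


def rigid_rotor_bond_length(steps=1000, dt=0.01, d=1.0):
    """Two force-free unit masses held at distance d while rotating about their midpoint."""
    ri, rj = [0.5 * d, 0.0], [-0.5 * d, 0.0]
    ri_prev = [0.5 * d * math.cos(-dt), 0.5 * d * math.sin(-dt)]
    rj_prev = [-c for c in ri_prev]
    zero = [0.0, 0.0]
    for _ in range(steps):
        ri_new = verlet_step(ri, ri_prev, zero, 1.0, dt)   # unconstrained prediction
        rj_new = verlet_step(rj, rj_prev, zero, 1.0, dt)
        ri_new, rj_new = shake_pair(ri_new, rj_new, ri, rj, 1.0, 1.0, d)  # correction
        ri_prev, rj_prev, ri, rj = ri, rj, ri_new, rj_new
    return math.hypot(ri[0] - rj[0], ri[1] - rj[1])
```

Without the correction step the two force-free particles would simply fly apart along straight lines; with it, the bond length stays at `d` to within the chosen tolerance while the pair rotates.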
$\lambda_k$ represents the Lagrange multipliers, which must be chosen so as to always satisfy the constraint equations:

$$\sigma_k = |\mathbf{r}_i - \mathbf{r}_j|^2 - d_{ij}^2 = 0 \tag{4-3}$$

where $d_{ij}$ is the fixed distance between particles $i$ and $j$, taken here as that observed in the starting conformation. The latter is the crystal structure of barnase after being subjected to energy minimization so as to regularize bond distances and angles and to relieve close contacts. To freeze bond stretching, the distance constraints are directly applied to the atoms defining the relevant bonds. To freeze the bending of a valence angle between atoms i - j - k, they are applied to the distance between atoms i and k.

Inserting the right hand side of Eq. (4-2) into the Verlet algorithm, Eq. (4-1a), gives:

$$\mathbf{r}_i(t+\Delta t) = 2\,\mathbf{r}_i(t) - \mathbf{r}_i(t-\Delta t) + \frac{\Delta t^2}{m_i}\left[\mathbf{F}_i(t) - \sum_{k=1}^{l} \lambda_k \nabla_i \sigma_k\right] \tag{4-4}$$

where one readily recognizes the standard Verlet formulation of Eq. (4-1a), plus a correction term due to the constraints. This allows the integration of the positions to be "decoupled" into two steps:

Step 1: the equations of motion are solved in the absence of constraints using the Verlet algorithm to predict new "unconstrained" positions $\mathbf{r}'(t+\Delta t)$.

Step 2: the "unconstrained" positions $\mathbf{r}'$ are modified by an increment $\delta\mathbf{r}$ to yield the new "constrained" positions $\mathbf{r}(t+\Delta t)$:

$$\mathbf{r}_i(t+\Delta t) = \mathbf{r}'_i(t+\Delta t) + \delta\mathbf{r}_i \tag{4-5}$$

This involves an iterative procedure which is applied until all the constraints are satisfied to within a predefined tolerance.

4.2.1.2 The Force-Field

One of the major challenges in simulations of large biological systems is the availability of force-fields that adequately represent their physical properties. The use of
The use of classical empirical force-fields offers the important advantage that different functional forms can be tested, and parameters fitted, against a variety of experimental data and results from detailed quantum mechanical calculations. Obtaining improved parameter sets and better potential functions is an ongoing effort in many laboratories [37-42]. Here, all the atoms of the system, including aliphatic and polar hydrogens, were considered explicitly. Forces and interaction energies between protein atoms, and between atoms of the protein and water, were calculated using a relatively recent version (version 19) of the CHARMM potentials [38] developed at Harvard. These potentials are expressed as a sum of bonded and non-bonded energy terms:

$$U_{tot} = U_b + U_{nb} \qquad (4\text{-}6)$$

The bonded terms represent the potential energy contributions associated with bond, bond-angle and torsion-angle deformations:

$$U_b = \sum_{bonds} K_b\,(b - b_0)^2 + \sum_{angles} K_\theta\,(\theta - \theta_0)^2 + \sum_{torsions} K_\varphi\left[1 + \cos(n\varphi - \delta)\right] \qquad (4\text{-}7)$$

where $b_0$ is the equilibrium bond length in Å and $K_b$ is the force constant in kcal/mol/Å²; $\theta_0$ is the equilibrium angle in degrees and $K_\theta$ is the force constant in kcal/mol/degree²; $\varphi$ is the torsion angle measured between two planes defined by four atoms, $K_\varphi$ is the force constant in kcal/mol, n is the periodicity and $\delta$ is a phase angle. Terms representing the deformation of planar groups (for example in aromatic groups) are also considered, but not shown here.

The non-bonded terms represent the potential energy contribution from non-bonded interactions. They are expressed as powers of the inverse distance between pairs of atoms that are separated by at least three chemical bonds. The first term in Eq. (4-8) describes the coulombic interactions and the second term the Lennard-Jones dispersion-repulsion interactions:

$$U_{nb} = \sum_{i<j} \frac{q_i q_j}{r_{ij}} + \sum_{i<j} 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \qquad (4\text{-}8)$$
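As an illustration of the two non-bonded terms, the sketch below evaluates a Coulomb plus Lennard-Jones energy over an explicit list of atom pairs. It is not the CHARMM implementation: the function name, the conversion constant of 332.0637 kcal·Å/(mol·e²) and the Lorentz-Berthelot combining rules for the pair parameters are assumptions made for the example, and the caller is assumed to supply only pairs separated by at least three bonds.

```python
import math

# Assumed conversion factor: charges in units of e and distances in angstrom
# give energies in kcal/mol. A commonly quoted value, not taken from the text.
COULOMB_CONSTANT = 332.0637


def nonbonded_energy(coords, charges, epsilons, sigmas, pairs):
    """Sum Coulomb and Lennard-Jones energies over the given atom pairs.

    coords   : list of (x, y, z) tuples in angstrom
    charges  : partial charges in units of the elementary charge
    epsilons : per-atom well depths in kcal/mol
    sigmas   : per-atom zero-crossing distances in angstrom
    pairs    : iterable of (i, j) index pairs; the caller is assumed to have
               excluded pairs separated by fewer than three bonds
    """
    coulomb = 0.0
    lennard_jones = 0.0
    for i, j in pairs:
        r = math.dist(coords[i], coords[j])
        coulomb += COULOMB_CONSTANT * charges[i] * charges[j] / r
        # Lorentz-Berthelot combining rules (an assumption of this sketch).
        eps_ij = math.sqrt(epsilons[i] * epsilons[j])
        sig_ij = 0.5 * (sigmas[i] + sigmas[j])
        sr6 = (sig_ij / r) ** 6
        lennard_jones += 4.0 * eps_ij * (sr6 * sr6 - sr6)
    return coulomb, lennard_jones
```

With this form the Lennard-Jones term vanishes when the pair distance equals $\sigma_{ij}$ and reaches its minimum of $-\varepsilon_{ij}$ at $2^{1/6}\sigma_{ij}$, which matches the meaning given to these two parameters in the text.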


Figure 4-2. The TIP3P water molecule [40]. b is the value of the (fixed) O-H bond length in angstrom, θ is the value of the (fixed) H-O-H bond angle in degrees. $q_{O,H}$ are the partial charges assigned to the nuclei (in fractions of the elementary charge, [e] = 1.6 × 10⁻¹⁹ C). A single Lennard-Jones interaction center is positioned at the oxygen nucleus; ε is the well depth in kcal/mol and σ the interatomic distance at which the potential is zero (in angstrom).

4.2.1.3 Treatment of Long-Range Coulombic Interactions

Coulombic interactions vary as $r_{ij}^{-1}$ and therefore belong to the category of long-range interactions [44]. One of the major problems in simulating dilute protein solutions, which contain a large polar water phase, is to achieve an adequate treatment of such interactions. To save computer time, a common practice is to neglect interactions between atoms that are further apart than a given cutoff distance, $r_c$, which is usually no more than 7-10 Å. Whereas van der Waals interactions are negligible at such distances, Coulomb interactions still make substantial contributions, and simply applying the cutoff rule causes spurious fields that influence the behavior of the system in a non-physical way. To avoid this problem, a radial atom-atom cutoff distance of 7 Å was used here, together with a truncation scheme that modifies the Coulomb interaction over the entire distance range so that it decays smoothly to zero at the cutoff distance:

$$U'(r) = U(r)\,S(r) \qquad (4\text{-}9)$$


with U' and U being, respectively, the modified and unmodified Coulomb potentials and r the interatomic distance. S(r) is the ME14 shifting function [45], which has the following form:

$$S(r) = 0 \qquad \text{for } r > r_c \qquad (4\text{-}10)$$

with $r_c$ being the cutoff distance.

The modification produced by S(r) on the Coulomb potential is displayed in Figure 4-3. In a recent study [46], several truncation schemes for long-range interactions were calibrated against the Ewald-Kornfeld summation method [47] in simulations of pure liquid water. It was shown that with three-center models such as SPC and TIPS, the ME14 function performs best with respect to both structural and thermodynamic properties, hence the choice of ME14 for protein-water simulations, in which water is expected to play an important role.

Figure 4-3. Coulombic potential as a function of the reduced distance $r/r_c$: unmodified potential U(r) (solid line) and modified potential U'(r) (dashed line) [46].

4.2.1.4 Starting Conformation and Simulation Conditions

The starting conformation of the system consisted of the following components: (1) one of the three molecules (molecule C) in the asymmetric unit of the 2 Å resolution refined crystal structure of barnase [29]; this molecule was subjected to 100 steps of energy minimization using the conjugate gradient algorithm [48], in


order to relieve possible close contacts and to adjust bond distances and angles; (2) crystallographically determined water positions located within 4 Å of a protein atom; and (3) randomly oriented water molecules placed on a cubic lattice in a rectangular box of dimensions 49.68 Å × 37.16 Å × 49.68 Å, as illustrated in Figure 4-4a. The system contained a total of 8777 atoms. These included 1700 protein atoms, comprising all hydrogen positions, generated using standard bond distances and angles [49], 94 water molecules positioned crystallographically, and 2265 generated water molecules.

Relevant simulation data and parameters are summarized in Table 4-1. In performing the simulations, periodic boundary conditions [73] were applied. With the chosen box dimensions, atoms of proteins in adjacent boxes were separated by at least three water layers, and therefore did not influence each other. The simulations were performed in the microcanonical ensemble (N, V, E), at room temperature (⟨T⟩ = 304 K), using vectorized procedures implemented in the Brugel package [51]. They started with a thermalization period of 4 ps, during which the temperature was adjusted using a velocity rescaling procedure in which the velocity of each atom is multiplied at regular intervals by the factor $\sqrt{T_0/T}$, where T and $T_0$ are respectively the current and the desired equilibrium temperatures.
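The rescaling factor above requires the current temperature, which is usually obtained from the kinetic energy. The following minimal sketch shows one way to do this. It is not the Brugel procedure: the function names, the unit system (amu, Å, ps) and the count of three degrees of freedom per atom are assumptions of the example.

```python
import math

# Boltzmann constant in amu * angstrom^2 / (ps^2 * K). The unit system is an
# assumption of this sketch, not something specified in the text.
K_B = 0.8314462618


def instantaneous_temperature(masses, velocities, n_dof=None):
    """Temperature from the kinetic energy: T = 2 * E_kin / (n_dof * k_B)."""
    if n_dof is None:
        # Assumes 3 degrees of freedom per atom; distance constraints such
        # as those imposed by SHAKE would reduce this number.
        n_dof = 3 * len(masses)
    kinetic = 0.0
    for m, (vx, vy, vz) in zip(masses, velocities):
        kinetic += 0.5 * m * (vx * vx + vy * vy + vz * vz)
    return 2.0 * kinetic / (n_dof * K_B)


def rescale_velocities(masses, velocities, target_temperature):
    """Multiply every velocity by sqrt(T0 / T), T being the current temperature."""
    current = instantaneous_temperature(masses, velocities)
    factor = math.sqrt(target_temperature / current)
    return [(vx * factor, vy * factor, vz * factor) for vx, vy, vz in velocities]
```

Because the kinetic energy scales with the square of the velocities, a single application brings the instantaneous temperature exactly to the target value. The sketch does not guard against a system with zero kinetic energy.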
The thermalization was followed by an equilibration period of 40 ps, after which the simulations were continued for 250 ps. The trajectory generated during this latter interval was used to analyze the properties of the system. A snapshot of the system after the first 50 ps of the production run is shown in Figure 4-4b.

Table 4-1. Relevant data and simulation parameters for the 250 ps molecular dynamics simulation of barnase in water.

Protein-solvent simulation conditions
Thermodynamic ensemble: Microcanonical (N, V, E)
Integration algorithm: Verlet [33]
Integration time-step: 0.002 ps (1 ps = 10⁻¹² s)
Constraints (Shake [35, 36]): Protein: bond distances; Water: bond distances and bond angles
Long-range interactions: 7 Å cutoff distance
Shifting function: ME14 [45, 46]
Periodic boundary conditions: Box dimensions: 49.68 × 37.16 × 49.68 Å
Thermalization: 4 ps
Equilibration: 40 ps
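The cutoff distance and shifting function listed in Table 4-1 combine as in Eq. (4-9). The sketch below illustrates the idea with $S(r) = [1 - (r/r_c)^2]^2$ for $r \le r_c$, a shifting function widely used with CHARMM-type force-fields. That this is the exact form of the ME14 function of refs. [45, 46] is an assumption, as are the function names and the unit-conversion constant.

```python
# Assumed conversion factor (kcal*angstrom/(mol*e^2)); not taken from the text.
COULOMB_CONSTANT = 332.0637


def shift_function(r, r_cut):
    """Assumed shifting function: [1 - (r/r_cut)^2]^2 inside the cutoff, 0 outside."""
    if r > r_cut:
        return 0.0
    x = r / r_cut
    return (1.0 - x * x) ** 2


def shifted_coulomb(q_i, q_j, r, r_cut=7.0):
    """Modified potential U'(r) = U(r) * S(r) for two point charges."""
    unmodified = COULOMB_CONSTANT * q_i * q_j / r
    return unmodified * shift_function(r, r_cut)
```

A function of this kind modifies the potential over the whole range 0 < r ≤ r_c rather than only near the cutoff, which is the behavior described for the truncation scheme in Section 4.2.1.3.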


Figure 4-4. The simulated barnase-water system. (a) The starting conformation of the system, consisting of one of the three barnase molecules in the asymmetric unit (molecule C) from the 2 Å resolution refined crystal structure (represented by its molecular surface), 94 crystallographically determined water positions located within 4 Å of a protein atom (in blue) and 2265 randomly oriented water molecules (in red) placed on a cubic lattice in a rectangular box (dimensions: 49.68 × 37.16 × 49.68 Å); (b) the same system after 50 ps. Colouring of the protein surface is chosen according to the value of the mean square displacement of the mainchain atoms: small displacement


4.2.2 Analysis of the 250 ps Trajectory of Barnase in Water

4.2.2.1 Protein Motion

Analysis of the movements displayed by the protein structure during the MD simulation is generally revealing. Such movements can be measured by computing the deviations of the atomic positions between conformations generated in the simulation and a reference state, usually taken to be the crystal structure. To eliminate contributions from the rigid-body tumbling and translational motion of the entire molecule, these deviations are computed after performing coordinate superpositions [52].

Large structural deformations uniformly distributed across the 3D structure could result from shortcomings in the potential functions or from inadequate simulation conditions. Significant localized deformations could be due to such problems as well, but may also represent structural re-adjustments that occur in certain regions upon transfer from the crystalline environment to solution conditions, or reflect the inherent local flexibility of such regions.

Figure 4-5. Root mean square deviations (rmsd) of barnase backbone atoms in conformations along the 250 ps molecular dynamics trajectory of the protein-water system. The rmsd values are computed relative to the barnase crystal structure after coordinate superposition (McLachlan [52]). The significant rmsd value at t = 0 reflects movements in the structure that occurred during the thermalization and equilibration phases that precede the 250 ps production period (see text).

The root mean square deviation (rmsd) of the backbone atoms of individual conformations of barnase along the 250 ps trajectory is displayed in Figure 4-5. This rmsd, measured relative to the starting crystal conformation, displays a value of about 1.2 Å at the beginning of the trajectory (t = 0), indicating that movements


away from the crystal structure have occurred during the thermalization and equilibration periods that precede the production run. Thereafter, only a small overall drift is observed, with fluctuations of the order of 0.3 Å. The backbone rmsd of the average protein conformation computed from the entire 250 ps trajectory is 1.2 Å, and that of all the atoms (including aliphatic hydrogens) is 1.7 Å.

The rmsd values obtained in the present simulations are among the lowest reported so far for simulations of solvated proteins, where the rmsd of Cα atoms usually ranges around 1.4-1.9 Å [23-26]. Somewhat smaller rmsd values (1.18 Å) were recently reported for the non-hydrogen atoms of BPTI in a 200 ps MD simulation [2].

Inspection of the deviations between the backbone atoms in the average simulated protein conformation and the crystal structure (Figure 4-6) shows that the largest deviations occur in regions that participate in crystal contacts. These regions are located at the N terminus and in three loops: the loop preceding the second α-helix, the loop between β-strands 2 and 3, and that between strands 4 and 5. As expected, residues belonging to the β-sheet, which are involved in numerous tertiary interactions, display the lowest deviations (rmsd ~0.6 Å). The deviations displayed by the recognition loop (residues 54-60) are intermediate in range, with the largest one (~2.0 Å) exhibited by residue Lys 57.
The nearly uniform 1 Å displacement of residues 6-17 in the first α-helix represents a rigid-body movement of this helix relative to the β-sheet.

Figure 4-6. Deviations (in Å) of Cα atoms of the 250 ps average conformation of barnase from the crystal structure, as a function of their position N along the sequence. Bars indicate the limits of secondary structure elements, and stars indicate residues involved in intermolecular contacts in the barnase crystal. Both are displayed below the abscissa.
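The rmsd values of Figures 4-5 and 4-6 are computed after coordinate superposition, so that rigid-body translation and rotation do not contribute. The text uses McLachlan's procedure [52]. The sketch below swaps in a different but related technique, Horn's quaternion formulation, in which the minimum rmsd follows from the largest eigenvalue of a 4 × 4 matrix, found here by plain power iteration. Function names, the starting vector and the iteration count are illustrative choices, not taken from the text.

```python
import math


def centered(points):
    """Return the points translated so that their centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def superposed_rmsd(mobile, reference, iterations=500):
    """Minimum rmsd between two equally ordered point sets after optimal
    rigid-body superposition (translation plus rotation)."""
    p = centered(mobile)
    q = centered(reference)
    n = len(p)
    # 3x3 correlation matrix: s[a][b] = sum over atoms of p[a] * q[b].
    s = [[sum(pi[a] * qi[b] for pi, qi in zip(p, q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue gives the best fit.
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift the spectrum so that no eigenvalue is negative, then power-iterate.
    shift = max(sum(abs(v) for v in row) for row in k)
    vec = [1.0, 0.7, 0.4, 0.2]  # arbitrary, generic starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(k[a][b] * vec[b] for b in range(4)) + shift * vec[a]
             for a in range(4)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # degenerate input, e.g. all points coincide
        vec = [c / norm for c in w]
        eigenvalue = norm - shift
    gp = sum(x * x + y * y + z * z for x, y, z in p)
    gq = sum(x * x + y * y + z * z for x, y, z in q)
    # Clamp tiny negative values caused by rounding.
    msd = max(0.0, (gp + gq - 2.0 * eigenvalue) / n)
    return math.sqrt(msd)
```

Applied to each trajectory frame against the crystal coordinates of the backbone atoms, a function of this kind would yield a curve of the type shown in Figure 4-5. Power iteration is used only to keep the example free of external libraries; a production code would normally call a linear-algebra routine.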


4.2.2.2 Variations of the Protein Accessible Surface Area and Volume

Figure 4-7. Accessible surface areas (Å²) and volumes (Å³) of barnase conformations along the 250 ps trajectory of the protein-water system. Areas (a) and volumes (b) were computed with an analytical algorithm [54], using a probe radius of 1.4 Å. The accessible volume is defined as the volume contained within the accessible surface of the protein [54]. Values corresponding to the accessible surface area and the accessible volume of the crystal structure, respectively 5800 Å² and 20050 Å³, are indicated by arrows. The largest changes in both quantities occur during the thermalization and equilibration periods, which precede the 250 ps production run.

Other parameters, such as the protein accessible surface area and accessible volume, were also monitored along the trajectory (see Figure 4-7). These quantities, defined in the legend of Figure 4-7, were computed using an analytic algorithm implemented


in the Brugel package [51]. Here too, a large increase in the protein accessible surface area and volume occurs during the thermalization and equilibration phases that precede the 250 ps production run. Considering all the simulation phases, the largest change relative to the crystal structure is 17% for the accessible surface area and 8% for the accessible volume. The volume changes are consistent with a slight expansion of the molecule (an increase of about 0.5 Å in the radius of gyration), which is not incompatible with the known physical properties of protein systems [53]. The surface area changes are larger than expected from the expansion. They result from movements of surface side chains and the N-terminus, which lead to their increased exposure. The fluctuations of the surface areas and volumes of individual conformations relative to the mean values of the corresponding quantities, evaluated over the entire trajectory, are significantly smaller and do not exceed 2%.

4.2.2.3 Hydrogen Bonds

Figure 4-8. Hydrogen bonds formed between barnase atoms during the 250 ps trajectory. (a) “Persistent” hydrogen bonds, those present in more than 60% of the conformations along the trajectory; (b) “medium” hydrogen bonds, appearing in 30% to 60% of the conformations; (c) “weak” hydrogen bonds, formed in less than 30% of the conformations. Different hatchings are used to represent hydrogen bond counts according to the location of the donor and the acceptor on the polypeptide chain: mainchain-mainchain, mainchain-sidechain and sidechain-sidechain. Hydrogen bonds present in the original crystal structure are labeled “crystal”; those formed during the simulation are labeled “new”. Criteria used to define hydrogen bonds were as follows: the hydrogen-acceptor distance must be ≤ 2.5 Å, and the donor-hydrogen-acceptor and hydrogen-acceptor-“from” angles must be larger than or equal to 90° (“from” stands for the atom covalently linked to the acceptor atom).

Hydrogen Bonds Between Protein Atoms. Hydrogen bonds are important landmarks in protein conformation. They contribute to the stability of secondary structures and of interactions between specific sidechain and mainchain polar atoms. A comparative analysis between the hydrogen bonds formed during an MD trajectory and those found in the crystal structure provides an additional means for assessing the


correspondence between the simulated dynamical system and the experimental static picture. Such analysis nearly always involves the use of somewhat arbitrary (geometric or energetic) criteria to decide when an H-bond is formed and when it is not. With the criteria used here (see legend of Figure 4-8) we define a total of 140 intra-molecular hydrogen bonds in our starting barnase crystal structure. Among these, 63 are in the mainchain, 53 between mainchain and sidechain groups and 24 between sidechain groups only.

Table 4-2. Mainchain-mainchain, mainchain-sidechain and sidechain-sidechain hydrogen bond behavior in the 250 ps trajectory of solvated barnase.

Main-main (146 in total)
            Persistent   Medium   Weak   Total
crystal         49          8       6      63
new              8         10      65      83

Main-side (221 in total)
            Persistent   Medium   Weak   Total
crystal         23          7      23      53
new             10         27     131     168

Side-side (96 in total)
            Persistent   Medium   Weak   Total
crystal          4          9      11      24
new              2          8      62      72

“Crystal” refers to hydrogen bonds present in the original crystal structure; “new” refers to hydrogen bonds formed during the simulation. Hydrogen bonds formed in more than 60% of the conformations in the trajectory are labelled “persistent”, “medium” hydrogen bonds are those found in 30-60% of the conformations, and “weak” hydrogen bonds are found in less than 30% of the conformations.

Table 4-2 summarizes the behavior of these hydrogen bonds in the 250 ps trajectory. We see that a majority (78%) of the 63 mainchain hydrogen bonds present in


the crystal structure persist during the simulation (they occur in more than 60% of the conformations in the trajectory). The proportion of persisting crystallographic hydrogen bonds decreases to 43% in the sidechain-mainchain category, and then further to 17% in the sidechain-sidechain category. In parallel with the loss of crystallographic hydrogen bonds, we observe the formation of new intra-molecular hydrogen bonds not present in the crystal structure. However, most of them occur transiently, with only a very small number persisting in 60% or more of the generated conformations. For example, among the 83 new hydrogen bonds formed between mainchain atoms, only about 10% persist, and among the 72 new sidechain-sidechain hydrogen bonds only 2 (about 3%) persist. Figure 4-9 summarizes these data for the mainchain hydrogen bonds in barnase.

Figure 4-9. Hydrogen bonds formed between amide and carbonyl groups of the polypeptide backbone during the 250 ps molecular dynamics simulation of solvated barnase. The sequence position of the amide groups is given vertically, and the hydrogen bonds that they form are plotted in the upper left diagonal. The sequence position of the carbonyl groups is given horizontally, and their hydrogen bonds are plotted in the lower right diagonal. Only four categories of hydrogen bonds are plotted: hydrogen bonds observed in the crystal structure and maintained in more than 60% of the conformations in the trajectory; hydrogen bonds observed in the crystal structure and maintained in less than 60% of the conformations; new hydrogen bonds generated during the simulation and maintained in more than 60% of the conformations; and new hydrogen bonds generated during the simulation and maintained in less than 60% of the conformations.

Protein-Solvent Hydrogen Bonds Involving the Backbone. In addition to intramolecular interactions, polar groups can also form hydrogen bonds with water molecules. The average number of hydrogen bonds formed during the simulation by mainchain amide and carbonyl groups with water molecules is displayed in Figure 4-10a. Data on intra-molecular hydrogen bonds formed between the same


groups are shown in Figure 4-10b for comparison.

Figure 4-10. The average number of hydrogen bonds formed by backbone amide and carbonyl groups of barnase during the 250 ps simulation of the protein-water system. The average number of hydrogen bonds for each group ($\langle N_{Hb}\rangle_{NH}$ or $\langle N_{Hb}\rangle_{CO}$) is defined as the number of conformations in which the corresponding group makes a hydrogen bond divided by the total number of conformations in the trajectory. The criteria for hydrogen bond formation are given in the legend of Figure 4-8. (a) displays the average number of hydrogen bonds made with other protein atoms by the backbone amides (above) and carbonyl groups (below) along the amino-acid sequence of barnase. (b) displays the same data, but counting only hydrogen bonds made with water molecules.

We see that overall, backbone groups are involved more often in hydrogen bonds with water molecules (a total of 144 hydrogen bonds on average) than with other protein atoms (a total of 60 hydrogen bonds on average). We find furthermore that carbonyl groups are more often involved in hydrogen bonds with water molecules than amide groups. This fits with the general tendency of carbonyl groups to form more hydrogen bonds than


amide groups, observed previously in surveys of proteins [55, 56]. It is also consistent with the known chemical properties of these groups. Carbonyl oxygens can indeed use their sp² lone-pair orbitals as acceptors, and hence form two hydrogen bonds simultaneously, while the amide nitrogens can act only as a single hydrogen bond donor.

Not unexpectedly, most of the backbone-solvent hydrogen bonds are located in loop regions and at the extremities of secondary structure elements. In several instances the same group (more often a carbonyl than an amide) makes, on average, hydrogen bonds with both the protein and the surrounding solvent. This occurs, for example, for several peptide groups in the loop involved in guanine recognition (residues 57-61), and at the N termini of the α-helices. In general it reflects exchange between intra- and intermolecular hydrogen bonds, which takes place during the simulation as a result of thermal motion. Detailed comparison of these results with experimental data on amide proton-deuterium exchange rates measured by nuclear magnetic resonance [57, 58] should establish to what extent they reflect the actual behavior of the protein. Analysis of such data, which have recently become available for barnase [31], is in progress.

Hydrogen Bonds Involving Protein Sidechains. Figure 4-11 displays the average number of hydrogen bonds formed by the sidechains of barnase during the simulation as a function of their position in the sequence.
Sidechain groups too are in general more often involved in hydrogen bonds with water molecules (a total of 94 hydrogen bonds on average) than in hydrogen bonds with other protein atoms (a total of 33 hydrogen bonds on average), as seen from Figures 4-11a-c. Sidechain-sidechain hydrogen bonds (Figure 4-11b) are less common than sidechain-mainchain hydrogen bonds (Figure 4-11a). Interestingly, persistent sidechain-mainchain hydrogen bonds (those present in 90% or more of the computed conformations) involve primarily short sidechains such as Asn, Thr and Ser. Those sidechains contribute an average number of hydrogen bonds equal to or higher than 0.5 in Figure 4-11a. Among the fewer sidechain-sidechain hydrogen bonds, the most persistent ones belong preferentially to charged sidechains such as Arg, Glu and Asp, and one belongs also to Tyr 103. Among these sidechains, several (Arg 83, 87; Glu 60, 73; Tyr 103) are believed to be involved in the catalytic function of barnase [59-61]. Others, such as those between Asp 8, 12 and Arg 110, which bridge the chain ends, are likely to play an important role in protein stability [62, 63].
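The geometric hydrogen-bond criteria used throughout this section (legend of Figure 4-8) translate directly into a distance test and two angle tests. The sketch below is one possible reading of those criteria. The function and argument names are illustrative, and the order in which the tests are applied is a choice of the example.

```python
import math


def angle_degrees(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[k] - b[k] for k in range(3)]
    v2 = [c[k] - b[k] for k in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hydrogen_bond(donor, hydrogen, acceptor, acceptor_from,
                     max_distance=2.5, min_angle=90.0):
    """Apply the three criteria: H...A distance <= 2.5 angstrom,
    donor-H-acceptor angle >= 90 degrees, and H-acceptor-"from" angle
    >= 90 degrees, where "from" is the atom covalently linked to the acceptor."""
    if math.dist(hydrogen, acceptor) > max_distance:
        return False
    if angle_degrees(donor, hydrogen, acceptor) < min_angle:
        return False
    return angle_degrees(hydrogen, acceptor, acceptor_from) >= min_angle
```

Evaluating such a test for every donor-acceptor pair in every stored conformation gives the per-frame presence or absence of each hydrogen bond, from which the counts discussed in this section are built.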


Figure 4-11. The average number of hydrogen bonds formed by individual residues along the sequence during the 250 ps molecular dynamics simulation of barnase in water. (a) Average number of hydrogen bonds made by the sidechain of each residue with backbone atoms. (b) Average number of hydrogen bonds made by the sidechain of each residue with other sidechains. (c) Average number of hydrogen bonds made by the sidechain of each residue with water molecules. In each case the average number of the corresponding hydrogen bonds is computed by summing the average number of hydrogen bonds, defined as in the legend of Figure 4-10, made by all donor and acceptor groups of the sidechain and dividing it by the number of donors and acceptors in the sidechain.
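The averages defined in the legends of Figures 4-10 and 4-11, and the persistence classes of Table 4-2, reduce to simple bookkeeping once each hydrogen bond (or group) has a per-conformation present/absent flag. The helper names below are hypothetical, and the handling of a value of exactly 60% follows the wording “more than 60%” for persistent bonds.

```python
def occupancy(present_per_frame):
    """Fraction of conformations in which a group makes a hydrogen bond."""
    frames = list(present_per_frame)
    return sum(1 for present in frames if present) / len(frames)


def persistence_class(fraction):
    """Classify a hydrogen bond by the fraction of conformations containing it."""
    if fraction > 0.60:
        return "persistent"
    if fraction >= 0.30:
        return "medium"
    return "weak"


def sidechain_average(group_occupancies):
    """Average over all donor and acceptor groups of one sidechain: the sum of
    the per-group averages divided by the number of groups."""
    return sum(group_occupancies) / len(group_occupancies)
```

For example, a bond flagged as present in three frames out of four has an occupancy of 0.75 and is classed as persistent.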


4.2.3 Structural and Dynamic Properties of Water Molecules Near the Protein Surface

MD simulations can provide a detailed picture of the structure and dynamics of water molecules near the protein surface. A number of recent simulation studies [26, 64-66] have already contributed to changing our view from that provided by X-ray and neutron diffraction studies, where hundreds of molecules are considered to be statically bound to the protein, to one in which water at the protein surface retains a significant degree of mobility, implying that many fewer water molecules, if any, are truly immobilized at the protein surface. A similar picture is emerging from experimental analyses by NMR [67-69].

4.2.3.1 Structural Properties

Here we describe results of the analysis of water structure at the surface of barnase as seen in our 250 ps simulation. Radial distribution functions of water oxygens and hydrogens around specific protein atoms in non-polar, polar and charged groups have been computed for sidechains whose atoms are exposed to solvent [51].
A sample of these distributions is shown in Figure 4-12a-c.

We see that for polar groups, the maxima of the distributions, which correspond to the most probable position of nearest hydrogen bonding partners, are within the expected hydrogen bonding distances of the groups involved (see, for example, Figure 4-12a). The peak in the water oxygen distribution around the non-polar methyl group of Ala 32 in barnase (Figure 4-12c) occurs at 3.55 Å, corresponding roughly to the sum of the van der Waals radii of the methyl and water groups. Similar distances of 3.6 and 3.7 Å were obtained previously for the methane-water first peak [70] and for both the butane-water [71] and the peptide methyl-water [72] first peaks, respectively.

A useful pictorial representation of the first solvation shell around specific protein sidechain groups can be obtained by representing water molecules from individual snapshots along the trajectory in a common local reference frame attached to the corresponding sidechains, as illustrated in Figure 4-13. Such a representation is particularly helpful in analyzing the water structure in the vicinity of planar groups, as illustrated in Figure 4-13a. This figure shows the near planar arrangement of the water molecules relative to the aromatic plane of Phe 82 in barnase, with water hydrogens pointing towards the plane on its more exposed side. In contrast, Figure 4-13b shows the spherical arrangement of the water molecules around the sidechain of Ala 32.
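As an illustration of how a radial distribution function such as those in Figure 4-12 can be accumulated, the following is a minimal Python sketch, not the authors' analysis code. It assumes that the distances between one reference protein atom and the water atoms have already been extracted from the saved frames. The function names, the bin width, and the normalisation by a full spherical shell at bulk density are all assumptions made for the example. Near a protein surface part of each shell is occupied by protein, so absolute peak heights obtained this way would be underestimated.

```python
import math


def radial_distribution(distances, n_frames, bulk_density, r_max=8.0, dr=0.1):
    """Histogram site-to-water distances into g(r).

    distances    : reference-atom-to-water-atom distances (in Å), pooled over frames
    n_frames     : number of frames that were pooled
    bulk_density : number density of the water atom type (atoms per Å^3)
    Returns (bin centres, g values).
    """
    n_bins = int(round(r_max / dr))
    counts = [0] * n_bins
    for d in distances:
        k = int(d / dr)
        if 0 <= k < n_bins:
            counts[k] += 1

    r_mid, g = [], []
    for k, c in enumerate(counts):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Count expected in this shell for an ideal fluid at bulk density.
        ideal = bulk_density * shell_volume * n_frames
        r_mid.append(0.5 * (r_lo + r_hi))
        g.append(c / ideal)
    return r_mid, g


def highest_peak(r_mid, g):
    """Return the bin centre at which g(r) is largest."""
    k = max(range(len(g)), key=g.__getitem__)
    return r_mid[k]
```

With real trajectory data, `highest_peak` would be the quantity compared with the 3.55 Å value quoted above for the methyl group of Ala 32.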


Figure 4-12. Radial distribution functions g(r) for water atoms around solvent exposed groups of barnase (horizontal axis: r in Å). (a) g(r) between water oxygens (Ow) and the Oδ atom in the carbonyl group of Asn 22; (b) g(r) between water oxygens (Ow) and the Hη group of Tyr 17; (c) g(r) between water oxygens (Ow) and the Cβ of Ala 32.


Figure 4-13. Pictorial representation of water structure around selected solvent exposed groups in barnase. Individual snapshots along the trajectory are represented in a common local reference frame attached to the respective sidechain groups. Atoms from the sidechain groups appear superimposed in the center of the picture, with water molecules arranged around them. Water structure around Phe 82 (a) and around Ala 32 (b).

4.2.3.2 Dynamic Properties

Detailed information about the dynamic properties of water molecules surrounding the protein can be obtained from the simulations by computing their translational and rotational diffusion coefficients. In doing so it is furthermore possible to analyze separately water molecules interacting with different groups on the protein surface (hydrophobic, charged, polar, flexible, rigid, etc.), thereby providing insight into how various physical parameters of the protein surface influence the solvent dynamic properties. As a first step in this direction, the translational self diffusion coefficients of molecules in the first water shells around selected protein groups were evaluated.
The self diffusion coefficient D was computed using the Einstein relation [73], which requires monitoring the displacements of the center of mass of individual water molecules ($\mathbf{r}_i$) in a given solvation shell as a function of time (s), and averaging over time origins (t) and molecules in the shell:

$$D = \lim_{s \to \infty} \frac{1}{6s} \left\langle \left| \mathbf{r}_i(t+s) - \mathbf{r}_i(t) \right|^2 \right\rangle_{i,t} \tag{4-11}$$

A schematic representation showing how the water molecules belonging to the solvation shell are selected along the trajectory is given in Figure 4-14. The resulting ensemble of water molecules is then used to calculate the time correlation function


in Eq. (4-11). A typical correlation function for such water molecules is given in Figure 4-15. According to Eq. (4-11), D is proportional to the slope of this function at long time scales. Due to the conditions applied for selecting the water molecules (see legend of Figure 4-14), D does not represent simply the average diffusion coefficient of water molecules in the first solvation shell. The diffusion coefficient is dominated by the most slowly moving water molecules in this shell, of which there may be only a few.

Figure 4-14. Definition of a dynamical solvation shell around protein groups. A water molecule is considered as belonging to the first shell if it is within a distance R of a reference protein atom P, and has not left this perimeter during the simulation for a continuous period longer than 10% of a maximum correlation time s [74] (see Eq. (4-11)). In all cases R is fixed to 4 Å and the correlation time s to 5-10 ps.

Table 4-3 lists the results obtained for water molecules around, respectively, the Hβ protons of alanines, the Hδ protons of isoleucines and the Hζ protons of lysines in barnase. The value for the diffusion coefficient computed for bulk water in the same simulation is given in the column on the far right.
In these calculations, only protons in groups that expose more than 30% of their surface to solvent were considered (see legend of Table 4-3 for details).

We find that the self diffusion coefficients of water surrounding the amide protons and the Ala β protons are very small (0.3 and 0.4 × 10⁻⁹ m² s⁻¹, respectively) relative to the diffusion coefficient of 4.8 × 10⁻⁹ m² s⁻¹ computed for bulk water in the same simulations. It is noteworthy that the latter value, which agrees with those previously obtained for the pure liquid using the SPC or TIPS water models [43, 46, 75], is a factor of 2 larger than the experimental measure (2.4 × 10⁻⁹ m² s⁻¹). Water in contact with the Ile δ and the Lys ζ protons is


significantly more mobile (D = 1.6 × 10⁻⁹ m² s⁻¹), but still much less mobile than bulk water.

To help interpret these observations, Eq. (4-11) was also used to compute the "self diffusion coefficient" of each type of reference proton around which the water shells were analyzed. These coefficients are given in column 2 of Table 4-3.

Figure 4-15. Mean square displacement correlation function of the water molecules around the Hδ's of Ile in barnase. Only isoleucine Hδ's that expose more than 30% of their surface to solvent (residues 4 and 55) were considered in the computations. The diffusion coefficient D is proportional to the slope of the correlation function at long time scales.

Table 4-3. Diffusion coefficients at the surface of barnase.

| Group | N-H | Ala Hβ | Ile Hδ | Lys Hζ |
|---|---|---|---|---|
| D_intrinsic (10⁻⁹ m² s⁻¹) | 0.3 | 0.4 | 1.4 | 0.65 |
| D_water (10⁻⁹ m² s⁻¹) | 0.3 | 0.4 | 1.6 | 1.6 |

Intrinsic refers to the diffusion coefficient of the reference protein atoms and water refers to the diffusion coefficient of the water molecules belonging to the first solvation shell around these reference atoms (see Figure 4-14). In the calculation 21 amide protons were considered; 8 Ala Hβ protons belonging to Ala 32, 37, 43 (3 out of the 7 Ala residues in barnase); 6 Ile Hδ in Ile 4, 55 (2 out of the 8 Ile in barnase) and 14 Lys Hζ in Lys 19, 39, 49, 62, 66, 98, 108 (7 out of the 8 Lys residues in barnase).

We see that the "diffusion coefficient" of the Ile Hδ is about 4 times larger than those of the amide and β protons. This is consistent with the δ protons being at the extremity of a flexible side-chain and therefore moving faster than the amide and


β protons, which are part of the backbone, whose movement reflects more closely the slower overall tumbling motion of the protein. The small diffusion coefficient of the Lys ζ protons can be explained by the fact that 3 out of the 4 considered lysines are involved in electrostatic interactions with negatively charged protein groups and therefore do not move freely.

Comparison of the data in columns 1 and 2 of Table 4-3 reveals clearly that the diffusion coefficients of the reference protons are virtually identical to those of the slowest water molecules in their first solvent shell. There is one exception however: the Lys ζ protons diffuse 3 times slower than their most slowly moving first shell waters.

The above results taken together indicate that the motion of water molecules in the first solvation shell is influenced by the presence of protein groups and possibly even coupled to their movements.

It is interesting that no difference is detected in the dynamic behavior of water molecules surrounding polar and hydrophobic groups, a result which, if confirmed, may have an important bearing on our understanding of the hydrophobic effect. One sees instead that water molecules in contact with groups that are close to or part of the protein backbone (for example, the amide and β protons) move more slowly than those in contact with flexible sidechains (for example, the δ and ζ protons).
This may be due to the fact that the movement of water molecules near the polypeptide backbone is hampered through partial shielding from the bulk by nearby sidechains, or through the combined influence of interactions with other backbone atoms. Similar observations about the slower movement of water molecules near the backbone than near sidechain groups were also made in a recent analysis of a 1 ns (1 ns = 10⁻⁹ s) simulation of solvated Bovine Pancreatic Trypsin Inhibitor (BPTI) [26].

The slow diffusion of the ζ protons relative to that of their surrounding water molecules is interesting. It suggests loose coupling between the movement of these water molecules and that of the positively charged protein group. Whether this is a general behavior of water surrounding positively charged groups remains to be seen.

A more exhaustive analysis, where averaging is performed on larger ensembles, is clearly needed to verify these observations and to gain further insight.
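The Einstein-relation estimate used throughout this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure actually coded by the authors: the function names are hypothetical, positions are assumed to be center-of-mass coordinates in Å saved at a fixed interval, the shell-membership filtering of Figure 4-14 is assumed to have been applied beforehand, and the long-time slope is taken from a simple least-squares line.

```python
ANGSTROM2_PER_PS_TO_M2_PER_S = 1.0e-8  # 1 Å^2/ps = 1e-20 m^2 / 1e-12 s


def mean_square_displacement(trajectories, max_lag):
    """Average |r_i(t+s) - r_i(t)|^2 over molecules i and time origins t.

    trajectories : one list of (x, y, z) positions per molecule, one entry per frame
    Returns msd[s] for s = 0 .. max_lag (in frames).
    """
    msd = []
    for lag in range(max_lag + 1):
        total, count = 0.0, 0
        for traj in trajectories:
            for t0 in range(len(traj) - lag):
                total += sum((a - b) ** 2 for a, b in zip(traj[t0 + lag], traj[t0]))
                count += 1
        msd.append(total / count if count else float("nan"))
    return msd


def diffusion_coefficient(msd, dt_ps, fit_start):
    """D in m^2/s from the least-squares slope of msd (Å^2) versus time (ps).

    Only lags >= fit_start are fitted, so that the short-time regime
    does not bias the slope; D = slope / 6 in three dimensions.
    """
    times = [k * dt_ps for k in range(fit_start, len(msd))]
    values = msd[fit_start:]
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
             / sum((t - t_mean) ** 2 for t in times))
    return slope / 6.0 * ANGSTROM2_PER_PS_TO_M2_PER_S
```

On this scale the bulk value of 4.8 × 10⁻⁹ m² s⁻¹ quoted above corresponds to a mean square displacement slope of 2.88 Å² per ps.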


4.3 Computing the Free Energy Change Associated with a Hydrophobic Mutation in Barnase

The thermodynamic stability of proteins, defined by their denaturation free energy, is not very large, with observed values in the range of about 10 kcal/mol [76]. This stability is thought to result from a delicate balance between different types of interactions. Among those, hydrophobic interactions are believed to play a major role [77-79]. Site directed mutagenesis combined with thermal and spectroscopic stability measurements has provided valuable insights into the contributions to protein stability from individual amino-acids [80]. But the information these experiments provide on the physical origins of these contributions is often incomplete. Theoretical analyses based on a detailed microscopic description of the molecular systems should therefore be helpful in gaining further understanding of the underlying physical phenomena. Methods for computing free energy changes by using MD [81, 82] or Monte-Carlo techniques [83, 84] are particularly well suited for this purpose, as they evaluate thermodynamic quantities that can be directly compared with the experimental values.
These methods are however still not routine [85-87] and the reliability of the results, especially when they are obtained for complex systems such as proteins, depends critically on the validity of the basic assumptions made in the simulations [88].

In the following, the free energy simulation method is outlined, and its application to the analysis of the stability change produced by a hydrophobic mutation in barnase is illustrated. The latter involved the substitution of isoleucine at position 96 by alanine, in the hydrophobic core of the protein, and was shown to reduce the stability of the protein from 10.5 kcal/mol (wild type) to 6.5-7.3 kcal/mol [89, 90].

4.3.1 The Free Energy Perturbation Method

To compute the free energy difference, ΔG, between two states A and B representing, respectively, the wild type and mutant proteins, a "computer alchemy" operation is undertaken where one amino acid is transformed into another [11]. This is achieved using a hybrid potential:

$$V(\mathbf{r}^N, \lambda) = (1 - \lambda)\, V_A(\mathbf{r}^N) + \lambda\, V_B(\mathbf{r}^N) \tag{4-12}$$

This potential is a linear combination of $V_A(\mathbf{r}^N)$ and $V_B(\mathbf{r}^N)$, the empirical potentials describing the wild type and the mutant proteins respectively, with λ being a coupling parameter, varied from 0 to 1. $\mathbf{r}^N$ are the atomic coordinates of the system, encompassing here the protein and solvent molecules. With this simple form of the hybrid potential, problems of convergence are not uncommon and can be overcome using more complex forms, non-linear in λ [85].

The free energy difference ΔG between the two states A (wild type) and B (mutant) can be obtained from either of two formally exact expressions. One is the so called "exponential formula" (EF) [91, 92]:

$$\Delta G = -k_B T \sum_i \ln \left\langle \exp\left( -\frac{\Delta V \, d\lambda_i}{k_B T} \right) \right\rangle_{\lambda_i} \tag{4-13}$$

where $\Delta V = V_B(\mathbf{r}^N) - V_A(\mathbf{r}^N)$, $d\lambda_i = \lambda_{i+1} - \lambda_i$, $k_B$ is the Boltzmann constant and T the absolute temperature; the angle brackets represent an ensemble average obtained using the potential $V(\mathbf{r}^N, \lambda_i)$ from Eq. (4-12) to represent the system. To implement Eq. (4-13), a series of simulations is set up corresponding to a succession of discrete λ values between 0 and 1. From each simulation at a given λ, ensemble averages of the exponential factor are evaluated. Summation of the natural logarithm of these averages then leads to the required free energy value. Equation (4-13) is formally exact provided that the configuration ensemble generated at a given $\lambda_i$ is representative of the potential corresponding to $\lambda_{i+1}$. This implies conserving reversibility of the perturbation at each step along the pathway and requires taking small enough λ intervals. It is also the main reason why the method is restricted to small size perturbations. The reversibility condition also stipulates that the EF expression, when used in both the forward and reverse directions along a given perturbation, should yield the same result in absolute value.
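The exponential formula and its reverse-direction counterpart can be sketched as follows. This is a hedged illustration rather than the implementation used in the study: the function names and the data layout (one list of sampled ΔV values per λ window) are assumptions, and the constant is the molar gas constant in kcal mol⁻¹ K⁻¹, which plays the role of k_B when energies are expressed per mole. A log-sum-exp helper is used only to keep the exponential average numerically stable.

```python
import math

K_B = 0.0019872041  # kcal/(mol K); molar form of the Boltzmann constant


def _log_mean_exp(values):
    """Numerically stable ln(mean(exp(v))) over a list of values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def ef_forward(delta_v_samples, d_lambdas, temperature=300.0):
    """Forward estimate: samples of dV = V_B - V_A drawn at lambda_i.

    delta_v_samples[i] : list of dV values (kcal/mol) sampled at lambda_i
    d_lambdas[i]       : lambda_{i+1} - lambda_i
    """
    kt = K_B * temperature
    return sum(-kt * _log_mean_exp([-dv * dl / kt for dv in samples])
               for samples, dl in zip(delta_v_samples, d_lambdas))


def ef_backward(delta_v_samples_next, d_lambdas, temperature=300.0):
    """Reverse-direction estimate: samples of dV drawn at lambda_{i+1}."""
    kt = K_B * temperature
    return sum(kt * _log_mean_exp([dv * dl / kt for dv in samples])
               for samples, dl in zip(delta_v_samples_next, d_lambdas))
```

For well converged windows the two functions should return nearly the same number; a large gap between them signals that the λ intervals are too wide or the sampling too short.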
This forward and reverse comparison is often done to monitor convergence and as a consistency check. It involves using the mirror expression of Eq. (4-13), which gives the free energy difference evaluated from an ensemble obtained from a simulation performed at $\lambda_{i+1}$:

$$\Delta G = k_B T \sum_i \ln \left\langle \exp\left( \frac{\Delta V \, d\lambda_i}{k_B T} \right) \right\rangle_{\lambda_{i+1}} \tag{4-14}$$

The other, equivalent formulation is called thermodynamic integration (TI). It has often been applied to compute the free energy change of systems where a parameter specifying the thermodynamic state, such as the volume or temperature, is varied slowly. It uses the following expression [93]:


$$\Delta G = \int_0^1 \left\langle \frac{\partial V}{\partial \lambda} \right\rangle_{\lambda} d\lambda \approx \sum_i \left\langle \frac{\partial V}{\partial \lambda} \right\rangle_{\lambda_i} \Delta\lambda_i \tag{4-15}$$

where the canonical average $\langle \partial V / \partial \lambda \rangle_{\lambda_i}$ is equal to $\langle \Delta V \rangle_{\lambda_i}$ provided the hybrid potential energy function $V(\mathbf{r}^N, \lambda_i)$ is linear in λ (as in Eq. (4-12)). Ensemble averages of ΔV are computed at each λ value. As in the EF procedure, these averages are obtained from a series of simulations performed at successive λ values. The integration over λ, which is also an exact expression, approximated here by a summation over a finite number of Δλ "windows", then yields the requested free energy value.

The accuracy of the results obtained with either the EF or TI procedures will depend on the quality of the empirical potential function $V(\mathbf{r}^N, \lambda_i)$, on whether or not it is appropriate to use the linear form for the hybrid potential (in particular for the TI procedure), and on the method employed to sample the configuration ensemble. Differences due to changes in the number of particles may arise due to the kinetic energy contribution to ΔG. These differences cancel out however when identical alchemical transformations are considered in two different states. Here they are the native and unfolded states described below.

4.3.2 Computing Free Energy Differences: Practical Aspects

4.3.2.1 Implementation of the Perturbation Method

Previous studies have shown that the linear dependence of the hybrid potential on the coupling parameter λ is adequate for treating charged to non-polar (e.g. Asp → Ala [11]) and charged to charged (e.g. Arg → His [94]) mutations.
However, in cases where van der Waals interactions dominate the transformation, non-linear forms of the hybrid potential were shown to lead to faster convergence [85]. In the application illustrated here, which involves a non-polar to non-polar mutation (Ile → Ala), dominance of van der Waals interactions was also expected. To ensure convergence of the calculations, a variant of the classical procedure described in Section 4.3.1 was therefore used. Several intermediate states were defined along a pathway from A (wild type) to B (mutant), as illustrated in Figure 4-16. These states were generated by gradually modifying van der Waals parameters and bond lengths of the relevant sidechains in the direction of the transformation. The free energy differences $\Delta G_{i \to i+1}$ between successive states along the transformation pathway were then computed with Eq. (4-13) or Eq. (4-15), using three values of λ (λ = 1/6, 3/6,


5/6). Individual $\Delta G_{i \to i+1}$ values were then summed to give the overall free energy difference for the transformation. This corresponds to taking discrete points along a pathway between A and B that is described by a non-linear hybrid potential, and performing a linear interpolation between these points using the classical procedure.

Figure 4-16. Schematic representation of the pathway used for the Ile → Ala alchemical transformation. The pathway between the end states A (Ile) and B (Ala) is defined by six intermediate states labelled S1-S6. These states are generated by gradually modifying van der Waals parameters and bond lengths of the sidechain representing state A into the sidechain representing state B (see Prévost et al. for details [13]). The overall free energy difference is computed as the sum of the free energy differences between successive states along this pathway ($\Delta G_j$), yielding $\Delta G(A \to B) = \sum_j \Delta G_j$.

4.3.2.2 The Molecular Systems and Simulation Procedure

The ensemble averages over configuration space required for Eqs. (4-13) and (4-15) were computed at room temperature (300 K) using a time saving molecular dynamics procedure called Stochastic Boundary Molecular Dynamics (SBMD) [95]. This procedure is valid for simulating localized events and is appropriate for analyzing small conformational changes that would occur around and near the mutated sidechain. An essential feature of this method is the partitioning of the system into several regions: only the region of interest is simulated in full detail, while the effect of the rest of the system is represented by appropriately chosen mean and stochastic forces.

The perturbation protocol and the SBMD procedure were applied to the folded and unfolded states of barnase. The folded state was represented by the high resolution crystallographic coordinates of barnase [29]. Unlike the folded state, the unfolded state is rather ill-defined. Given that the mutated residue at position 96 is part of a β strand, and following experimental evidence that the unfolded state of barnase contains an appreciable amount of β structure [96], the unfolded state was modelled as an extended heptapeptide in water, comprising in addition to residue 96 the three residues on each side of the mutated position.

Calculations were performed in the presence of water for both the folded and unfolded states. To remove close contacts, all initial conformations were subjected to 100 steps of energy minimization.

In both the folded and unfolded states, the system was partitioned into two approximately spherical regions; the inner region with a radius of 9 Å was simulated by molecular dynamics and the region between 9 and 11 Å was modelled by Langevin dynamics [95].
The spheres were centered on the initial Cβ position of Ile 96, the residue undergoing mutation. The empty space inside the 11 Å sphere was filled with water molecules by overlaying a previously equilibrated box of TIP3P waters [97], and removing all molecules within 2.5 Å of any protein atom. After a 5 ps equilibration of the water structure in the presence of the fixed protein or heptapeptide, a second TIP3P overlay was made to fill any voids in the solvent. The simulated system for the folded protein included 472 protein atoms and 21 water molecules while that of the unfolded state included 76 protein atoms and 145 water molecules. Simulations were also performed for the unfolded state in vacuum, taken to represent a gas phase reference state.

All the free energy calculations described in this section were performed with the CHARMM program [38]. The potentials $V_A$ and $V_B$ in Eq. (4-12) were represented by standard empirical energy functions and parameters for the polar hydrogen model, which treats aliphatic groups as extended atoms. Long-range interactions were smoothly truncated at 8.5 Å with a shifting function for the electrostatic interaction and a switching function for the van der Waals interaction. A dielectric constant of unity was used.

Calculations of ΔG for a given λ consist of an equilibration simulation of 5 ps followed by an averaging period of 10 ps. Coordinates from every fifth time step were saved and used in Eqs. (4-13) and (4-15).
Each full alchemical transformation involved 105 ps of equilibration and 210 ps of averaging. With an integration step of 1 fs, a 5 ps simulation required about 30 minutes of CPU time on the Cyber 205 computer at the former John von Neumann Center.
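The window protocol described above lends itself to a short bookkeeping sketch in Python. Two points are assumptions rather than statements from the text: each of the three λ values is treated as the midpoint of a window of equal width 1/3 (a midpoint-rule reading of the TI summation), and the number of steps along the pathway is taken to be 7, which is inferred from 105 ps of equilibration at 5 ps per window and three windows per step. All names are hypothetical.

```python
LAMBDAS = (1 / 6, 3 / 6, 5 / 6)  # lambda values quoted in the text
D_LAMBDA = 1 / 3                 # assumed equal window width (midpoint rule)


def ti_step(mean_delta_v):
    """Midpoint-rule TI estimate for one step between successive states.

    mean_delta_v[k] is the ensemble average <dV> (kcal/mol) at LAMBDAS[k].
    """
    if len(mean_delta_v) != len(LAMBDAS):
        raise ValueError("expected one <dV> value per lambda window")
    return sum(D_LAMBDA * m for m in mean_delta_v)


def ti_total(per_step_means):
    """Sum the per-step estimates along the whole A -> B pathway."""
    return sum(ti_step(m) for m in per_step_means)


def protocol_cost(n_steps, equil_ps=5.0, avg_ps=10.0, cpu_min_per_5ps=30.0):
    """Return (windows, equilibration ps, averaging ps, approximate CPU hours)."""
    windows = n_steps * len(LAMBDAS)
    equil = windows * equil_ps
    avg = windows * avg_ps
    cpu_hours = (equil + avg) / 5.0 * cpu_min_per_5ps / 60.0
    return windows, equil, avg, cpu_hours
```

Under these assumptions `protocol_cost(7)` reproduces the quoted 105 ps and 210 ps totals over 21 windows, and puts one full transformation at roughly 31.5 CPU hours at the quoted rate.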


4.3.3 Computed Changes in Protein Stability for the Ile 96 → Ala Mutation

Table 4-4 lists the computed free energy values obtained by the exponential formula (EF) and thermodynamic integration (TI) procedures for the Ile 96 into Ala alchemical transformation. These are $\Delta G_f^{I \to A}$ for the solvated native protein, $\Delta G_u^{I \to A}$ for the solvated unfolded state, and $\Delta G_r^{I \to A}$ for the gas phase reference state.

Table 4-4. Computed free energy changes (in kcal/mol) for the Ile → Ala mutation in barnase.

| Contribution | $\Delta G_f^{I \to A}$ (protein) | $\Delta G_u^{I \to A}$ (water) | $\Delta G_r^{I \to A}$ (reference) | $\Delta\Delta G_{f \to u}$ | $\Delta\Delta G_{r \to u}$ | $\Delta\Delta G_{r \to f}$ |
|---|---|---|---|---|---|---|
| TI | -3.09 | -8.30 | -7.15 | -5.21 | -1.15 | 4.06 |
| EF | -3.39 | -6.81 | -6.56 | -3.42 | -0.25 | 3.17 |
| EXP | | | | -3.3 (a); -4.0 (b) | -0.21 (c) | |

Negative values in ΔΔG correspond to contributions in which the wild type (Ile) is stabilized relative to the mutant (Ala). $\Delta G_f$, $\Delta G_u$ and $\Delta G_r$ are the free energies for the alchemical transformation Ile → Ala in the folded state (protein), the unfolded state (water), and the unfolded state in the gas phase (reference) respectively. $\Delta\Delta G_{f \to u}$ is the unfolding free energy difference for the alchemical transformation. $\Delta\Delta G_{r \to u}$ and $\Delta\Delta G_{r \to f}$ are, respectively, the solvation free energy differences for Ile versus Ala in water and in the protein interior. The ΔΔG values are derived from the corresponding ΔG values as explained in the text. EF, TI and EXP stand for the exponential formula Eq. (4-13), the thermodynamic integration procedure Eq. (4-15), and experimental results respectively. The labels a, b and c indicate that values are taken from Kellis et al.
[89, 90] and Wolfenden et al. [106] respectively.

By use of the thermodynamic cycle (Figure 4-17a), the corresponding difference in unfolding free energy, $\Delta\Delta G_{f \to u}$, is obtained from the following expression [10]:

$$\Delta\Delta G_{f \to u} = \Delta G_u^{I \to A} - \Delta G_f^{I \to A} = \Delta G_{f \to u}^{A} - \Delta G_{f \to u}^{I} \tag{4-16}$$

The calculated $\Delta\Delta G_{f \to u}$ values are -3.41 and -5.21 kcal/mol, obtained using the EF and TI procedures respectively. These values agree in sign and magnitude with the values of -3.3 and -4.0 kcal/mol measured in the experimental unfolding studies of Kellis et al. [89, 90]. The negative values indicate that unfolding the mutant is energetically more favorable than unfolding the wild type, and hence that the wild type folded state is more stable than the mutant folded state.

Using another thermodynamic cycle, the solvation free energy difference $\Delta\Delta G_{r \to u}$ is given by (Figure 4-17b):

$$\Delta\Delta G_{r \to u} = \Delta G_u^{I \to A} - \Delta G_r^{I \to A} = \Delta G_{solv}^{A} - \Delta G_{solv}^{I} \tag{4-17}$$




where $\Delta G_{solv}$ are the solvation free energies, or equivalently, those of the transfer from the gas phase to aqueous solution. The $\Delta\Delta G_{r \to u}$ values calculated by the EF and TI procedures are -0.25 and -1.15 kcal/mol respectively. The corresponding difference obtained from experimental measures is -0.21 kcal/mol. Experiments and calculations hence agree about the small difference in solvation free energy between the Ile and Ala residues. The negative sign of the computed (and experimental) values indicates that transferring the Ile containing peptide from the gas phase to water is less favorable than transferring the Ala containing peptide.

In addition, the free energy change for going from the gas phase to the folded protein was also calculated by use of the relation (Figure 4-17c):

$$\Delta\Delta G_{r \to f} = \Delta G_f^{I \to A} - \Delta G_r^{I \to A} \tag{4-18}$$

Figure 4-17. The thermodynamic cycles in the calculations and the experiments.

(a) The thermodynamic cycle for the unfolding process of barnase. The vertical direction concerns alchemical processes corresponding to the Ile → Ala transformation. On the left hand side the transformation occurs in the folded protein, yielding the free energy difference $\Delta G_f^{I \to A}$. On the right hand side, the transformation occurs in the unfolded state (modeled in our computation by an extended heptapeptide), yielding the free energy difference $\Delta G_u^{I \to A}$.
The horizontal direction concerns chemical steps of the unfold<strong>in</strong>g reaction.Unfold<strong>in</strong>g of the wild type prote<strong>in</strong> <strong>in</strong> presence of Ile is shown on top; it correspondsto the free energy difference dG;+:,,. Unfold<strong>in</strong>g of the Ala conta<strong>in</strong><strong>in</strong>g mutant is shownon the bottom, with the correspond<strong>in</strong>g change <strong>in</strong> free energy of AGj’,,.(b) The thermodynamic cycle for the solvation process of the unfolded prote<strong>in</strong> (the extendedheptapeptide). The vertical direction concerns alchemical processes correspond<strong>in</strong>g to theIle + Ala transformation. On the left hand side, the transformation occurs <strong>in</strong> the gasphase, yield<strong>in</strong>g the free energy difference LIG~’~. On the right hand side (which is identicalto the right hand side of cycle <strong>in</strong> (a)), the reaction occurs <strong>in</strong> water, yield<strong>in</strong>g the freeenergy difference d Gt’A. The horizontal direction concerns the solvation process for theunfolded prote<strong>in</strong>. Tak<strong>in</strong>g the Ile conta<strong>in</strong><strong>in</strong>g unfolded prote<strong>in</strong> from the gas phase to wateris shown on top. It yields the free energy difference AGf,,. The solvation process for theAla conta<strong>in</strong><strong>in</strong>g unfolded prote<strong>in</strong> is shown on the bottom, with the correspond<strong>in</strong>g change<strong>in</strong> free energy of AG:’,,.(c) The thermodynamic cycle for a fold<strong>in</strong>g/solvation process which br<strong>in</strong>gs the unfolded prote<strong>in</strong><strong>in</strong> the reference (gas phase) to the solvated folded state. The vertical direction concernsalchemical processes correspond<strong>in</strong>g to the Ile --t Ala transformation. 
On the left hand side(which is identical to the left hand side of the cycle <strong>in</strong> (b)), the transformation occurs <strong>in</strong>the gas phase, yield<strong>in</strong>g the free energy difference dGf‘A. The right hand side (which isidentical to the left hand side of the cycle <strong>in</strong> (a)), shows the same transformation <strong>in</strong> thefolded prote<strong>in</strong>, which yields the free energy difference d Gf”A. The horizontal directionconcerns the fold<strong>in</strong>g/solvation reaction. The process of transferr<strong>in</strong>g the Ile conta<strong>in</strong><strong>in</strong>g unfoldedprote<strong>in</strong> from the gas phase to the Ile conta<strong>in</strong><strong>in</strong>g folded state is shown on top. Ityields the free energy difference d Gf+j.. The same process <strong>in</strong> presence of Ala is shown onthe bottom, with the correspond<strong>in</strong>g change <strong>in</strong> free energy of ~lGfl,~.
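Eq. 4-18 follows from the fact that free energy is a state function, so the two routes around the cycle of Figure 4-17c from the Ile-containing reference state to the Ala-containing folded state must give the same total. The short derivation below is a sketch; the leg symbols are reconstructed from the figure caption and should be read as assumed notation.

```latex
\begin{align}
% Route 1: fold with Ile, then mutate in the folded protein.
% Route 2: mutate in the gas-phase reference, then fold with Ala.
\Delta G^{I}_{r \to f} + \Delta G^{f}_{I \to A}
  &= \Delta G^{r}_{I \to A} + \Delta G^{A}_{r \to f} \\
% Rearranging gives the relation used in Eq. 4-18.
\Delta\Delta G_{r \to f}
  \equiv \Delta G^{A}_{r \to f} - \Delta G^{I}_{r \to f}
  &= \Delta G^{f}_{I \to A} - \Delta G^{r}_{I \to A}
\end{align}
```

The middle expression involves only the two physical (horizontal) processes, while the right-hand side involves only the two alchemical (vertical) legs, which are the quantities obtained from the simulations.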


The computed values of $\Delta\Delta G_{r \to f}$ are +3.17 and +4.06 kcal/mol. No direct measurement of this quantity is available from experiment. It can, however, be obtained by difference from the other experimental data given in Table 4-4. This yields values of +3.1 and +3.8 kcal/mol, corresponding to the two different experimental measurements of $\Delta\Delta G_{f \to u}$.

We thus see that, although hydration effects provide important contributions to the free energy of transfer of the individual sidechains from water to the protein interior, they seem to play only a minor role in the overall free energy balance when the difference between the sidechains of Ile and Ala is considered. The major contribution to this free energy balance is instead provided by terms pertaining to the native folded state.

It should be noted, however, that for mutations involving charged or polar sidechains one would expect the contributions from hydration effects, or alternatively those resulting from the alchemical transformation in the unfolded state, to be much more important [11] than they are in the case discussed here.

4.3.4 Error Estimation

The free energy values computed from ensemble averages of complex systems such as those considered here may be subject to uncertainties and errors of different origins.
This being a difficult and controversial aspect of free energy calculations, much effort has recently been devoted to investigating it [98-103], and a detailed description of the error analysis made for the computations described above would extend the discussion well beyond the scope of the present chapter. Only a very limited account of this analysis is therefore given here.

One obvious source of error is the incomplete convergence of the simulations [98, 103], which can be assessed from estimates of their statistical precision. Computations based on the standard deviation of the individual $\Delta G_i$ free energy values obtained here by the TI procedure estimate the precision of the overall free energy difference $\Delta G$ to be about 1 kcal/mol. The precision of the corresponding $\Delta\Delta G$ values is thus about 2 kcal/mol. Also, the computations have been performed with two formally equivalent procedures (the EF and TI methods) that would give identical results if full convergence had been achieved. The discrepancy between the free energy differences for isoleucine and alanine obtained by these methods is 1.8 kcal/mol for denaturation and 0.9 kcal/mol for solvation (see Table 4-4).

In addition to random errors, systematic errors may also arise. Owing to the severe limitations on simulation times, rotational barriers are crossed infrequently, and sampling of configuration space is often restricted to the potential wells of only one or a few of the possible isomers. As a consequence, computed free energy values often neglect the contributions from multiple isomeric states, which could be a


potential problem [102]. Assuming that only the rotational isomers of the mutated sidechains contribute significantly to the free energy difference, the procedure of Straatsma and McCammon [100] was used to evaluate the correction that one would need to apply to the computed free energy change for the Ile → Ala mutation to account adequately for rotational isomer sampling. This correction was evaluated to be about 0.5 kcal/mol [104], in accord with results obtained previously when computing binding free energies of antiviral drugs to human rhinovirus [105].

Another source of systematic error could be the use of an inadequate model for the unfolded state, about which detailed information is not available. The magnitude of the error thus introduced was evaluated in the context of a different mutation of the same residue: Ile 96 → Val [104]. The corresponding alchemical transformation was computed in the gas phase for both the extended heptapeptide and the same heptapeptide taken in the α-helical conformation. The computed free energy values were found to differ by less than 0.5 kcal/mol between the two conformations.

In conclusion, the statistical imprecision of the free energy calculations described in this section is estimated to be 1-2 kcal/mol. Several of the possible systematic errors are estimated not to exceed 0.5 kcal/mol and hence to lie well within the overall statistical error of the calculations.
Systematic errors due to shortcomings of the potential function may also be of consequence, but have not been considered here.

4.3.5 Protein Stability and the Hydrophobic Effect

In analyzing the effects of amino-acid substitutions on protein stability, parallels are often drawn between protein denaturation and transfer processes from organic solvents to water, or hydration free energies. In a good number of cases [89, 90, 107-110] the change in thermodynamic stability between the wild type and mutant was found to be roughly proportional to the free energies of transfer. But the correlation with hydration free energy changes was much poorer [107].

Similar trends are observed for the Ile → Ala mutation analyzed here. The difference in the transfer free energies of Ala versus Ile ranges from 1.5 to 3.11 kcal/mol, values obtained for octanol and cyclohexane respectively [111, 112]. These values are of the same order as the computed or experimental folding free energy differences between the wild type and the Ala mutant. On the other hand, the hydration free energy differences between Ile and Ala are significantly smaller. The experimental values differ by -0.21 kcal/mol and the computed difference ($\Delta\Delta G_{sol}$) is -0.25/-1.15 kcal/mol.

To understand the origins of these observations it is useful to consider separately the transfer of the side chain from the gas phase to the interior of the protein and into the aqueous solvent (Figure 4-18). To do so, the water to protein transfer process


is decomposed into two separate solvation processes which use the gas phase as a reference state. We see that the total free energy difference when going from the gas phase to the protein interior ($\Delta\Delta G_{r \to f}$) is significantly larger than the corresponding difference in hydration free energies ($\Delta\Delta G_{sol}$). As already outlined in Section 4.3.3, this indicates that hydration effects play only a minor role in the overall free energy balance when the difference between the sidechains of Ile and Ala is considered, despite the fact that they provide important contributions to the free energy of transfer of individual sidechains from water to the protein interior. It further suggests that important contributions to this free energy balance are provided by effects pertaining to the native protein.

The interesting question then arises of what such effects may be. In order to address this question it is useful to examine critically the correlation between the changes in unfolding free energies upon mutation and small molecule transfer data.

Figure 4-18. Thermodynamic cycle incorporating the solvation and unfolding processes. The water to protein transfer process (the unfolding process) is decomposed into two separate solvation processes: gas phase to water solvation process, and gas phase to protein solvation process. Boxed values correspond to the differences (Ile versus Ala) in the experimental transfer or solution free energy values for the organic solvents ethanol (Ethol), octanol (Octol) and cyclohexane (Chx), as well as the difference in water solution free energies. The computed $\Delta\Delta G$ values for the Ile → Ala transformation are given in bold (see legend of Figure 4-16 for further details).
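The decomposition shown in Figure 4-18 can be sketched numerically with the computed differences quoted earlier in this section. The snippet below is an illustration only: the function and variable names are hypothetical, the sign convention (Ala minus Ile, gas phase as reference) is assumed from Eq. 4-18, and the pairing of +3.17 with EF and +4.06 with TI is an assumption, since the text does not state the order explicitly.

```python
# Differences (Ile -> Ala) quoted in the text, in kcal/mol.
# Assumption: the two gas -> folded values are listed in the same
# EF, TI order as the two solvation values.
DDG_SOL = {"EF": -0.25, "TI": -1.15}    # gas phase -> water (unfolded model)
DDG_R_TO_F = {"EF": 3.17, "TI": 4.06}   # gas phase -> folded protein


def water_to_protein(ddg_r_to_f: float, ddg_sol: float) -> float:
    """Water -> protein difference from the two gas-phase-referenced legs.

    Subtracting the legs cancels the common gas-phase term.
    """
    return ddg_r_to_f - ddg_sol


def main() -> dict:
    """Print and return the water -> protein difference for each method."""
    result = {m: water_to_protein(DDG_R_TO_F[m], DDG_SOL[m]) for m in DDG_SOL}
    for method, value in result.items():
        # Size of the hydration term relative to the overall difference.
        share = abs(DDG_SOL[method]) / abs(value)
        print(f"{method}: water->protein = {value:+.2f} kcal/mol, "
              f"hydration share = {share:.0%}")
    return result


if __name__ == "__main__":
    main()
```

Under these assumptions the sketch gives about +3.4 kcal/mol (EF) and +5.2 kcal/mol (TI). Their difference of about 1.8 kcal/mol matches the EF/TI discrepancy for denaturation quoted in Section 4.3.4, which supports the assumed pairing, and in both cases the hydration term is the smaller of the two legs.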


Analyses of a number of "non-disruptive" mutations [89, 90, 107, 110, 113-115], where a large hydrophobic amino acid was replaced by a smaller hydrophobic residue, as in the Ile → Ala case studied here, showed that the changes in unfolding free energy were highly variable, but that they were generally larger in magnitude than the corresponding difference in transfer free energies. This can be attributed in part to the use of transfer data obtained with different organic solvents [90, 107]. But it is more likely to result from the fact that transfer experiments may not always be a good model for protein folding. It is, for example, questionable that the protein interior resembles non-polar, or even slightly polar, organic liquids. Proteins are more densely packed than these liquids, with densities similar to those of crystals of small molecules [116]. The polymeric nature of proteins could furthermore severely constrain the packing rearrangements needed to accommodate changes in residue size. Also, the environment of a given amino acid may differ from one protein to another, or between different sites within the same protein.

Arguments in favor of these hypotheses have been recently provided.
The analysis of six non-disruptive hydrophobic mutations in the core of phage T4 lysozyme [113] presents compelling evidence that the excess in stabilization free energy over the transfer free energies observed in these mutants is linearly related to the size of the cavity created upon the mutation, evaluated from the corresponding mutant crystal structure. This suggests that the protein core may indeed differ from organic solvents by its reduced ability to adjust packing around the modified group. Rather convincing theoretical arguments were also presented [117] that this excess can be attributed to contributions from cavity formation and protein reorganization processes.

In view of the above considerations, what can be said about the Ile 96 → Ala mutation in barnase? The measured change in unfolding free energy (3.2/4.0 kcal/mol) could conceivably be in excess relative to the difference in the transfer free energies (1.5-3.11 kcal/mol) if the lower value (obtained for octanol) is considered. In this case the possibility that substitution of the bulky Ile sidechain by the smaller Ala creates a cavity at the helix-sheet interface should be examined. To this end, atomic volume calculations [54] were performed using coordinates from the end points of the simulation pathway. They show that a cavity of about 60-90 Å³ is formed around residue 96 in the Ala-containing mutant.
Furthermore, rms atomic fluctuations were computed from several independent 30 ps vacuum molecular dynamics trajectories of the wild type and the mutant proteins. While these were found to be similar on average, fluctuations of the mainchain and Cβ atoms of residue 96 were about twice as large in the Ala mutant, suggesting that the latter is more mobile. These results suggest that the Ile 96 → Ala mutant in barnase follows the trends observed for other non-disruptive mutations. But given that this conclusion is based on simulated models, it must await confirmation from a detailed analysis of the mutant crystal structure.
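The rms atomic fluctuations mentioned above can be illustrated with a minimal sketch. The chapter does not describe how the frames were superimposed or averaged, so the code assumes frames that are already aligned on a common reference; the function name and the toy coordinates are hypothetical and are not data from the barnase simulations.

```python
import math


def rms_fluctuations(frames):
    """Return one RMS fluctuation per atom.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, assumed already superimposed on a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    rmsf = []
    for a in range(n_atoms):
        # Mean position of atom a over the trajectory.
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        rmsf.append(math.sqrt(msd))
    return rmsf


# Hypothetical two-atom, four-frame trajectory (coordinates in angstroms):
# atom 0 oscillates by +/-0.1 along x, atom 1 by +/-0.2 along x.
toy_frames = [
    [(0.1, 0.0, 0.0), (5.2, 0.0, 0.0)],
    [(-0.1, 0.0, 0.0), (4.8, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (5.2, 0.0, 0.0)],
    [(-0.1, 0.0, 0.0), (4.8, 0.0, 0.0)],
]
toy_rmsf = rms_fluctuations(toy_frames)
ratio = toy_rmsf[1] / toy_rmsf[0]  # how much more mobile atom 1 is
```

Comparing per-atom values between two sets of trajectories in this way is the kind of calculation behind a statement such as "about twice as large"; with several independent trajectories, one would typically average the per-atom values over the runs before taking the ratio.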


4.4 Concluding Remarks

This chapter described the application of molecular dynamics simulation techniques to the study of the dynamic and thermodynamic properties of wild type barnase and of one of its stability mutants.

Methodological and practical aspects involved in generating a 250 ps trajectory of wild type barnase in water were described. This trajectory was then analyzed, focusing on aspects of protein motion and protein-solvent interactions. The analysis suggests that the simulated system of 8777 atoms displayed physically reasonable behavior, a tribute to the progress achieved in improving the force fields and simulation methodology. Protein conformations deviated relatively little from the starting crystal structure during the simulations; the structure of the first water layer at the protein surface followed patterns previously observed for dilute solutions of small molecules; and the simulations revealed a reduced mobility of the water molecules at the protein surface, in agreement with experimental evidence from NMR spectroscopy and with other simulation studies.

It is, however, clear that the time scale of the simulations (250 ps) is still much too short to allow the system to sample the conformational states that are accessible to it under experimental conditions and/or that are biologically relevant. It may thus not always be possible to relate the detailed picture provided by simulations of this length to experimental data, a problem which also hampers its validation.
This situation is improving, however, as computational tools that allow a more efficient sampling of conformational space are developed, and as the available computer power increases.

The second part of this chapter illustrated the application of molecular dynamics simulations to the calculation of the free energy change produced by substituting Ile 96 by Ala in the hydrophobic core of barnase. Despite the well-known sampling problems discussed above, free energy perturbation methods relying on molecular dynamics trajectories of 315 ps are shown to yield computed free energy differences which are in satisfactory agreement with the experimentally measured changes in denaturation free energies and in solvation free energies. It is important to realize, however, that the error associated with the computed free energy values is of the same order as the values themselves, and hence that the reliability of the results depends critically on the underlying assumptions. With due vigilance, however, the described free energy computation analysis is seen to provide useful insights into the origins of the hydrophobic stabilization of proteins.
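The remark that the error is of the same order as the computed values can be made concrete with the figures quoted in Section 4.3.4. The sketch below is illustrative: the function name is hypothetical, and treating the quoted 2 kcal/mol as the linear sum of two 1 kcal/mol uncertainties is an assumption about how that figure arises, not a statement of the authors' procedure. The quadrature result, which would apply if the two legs had independent errors, is shown for comparison.

```python
import math

SIGMA_DG = 1.0  # kcal/mol, precision quoted for each computed Delta G


def combined_error(sigmas, independent):
    """Combine the uncertainties of the legs entering a difference.

    independent=True  -> quadrature (assumes uncorrelated errors)
    independent=False -> linear sum (worst-case bound)
    """
    if independent:
        return math.sqrt(sum(s * s for s in sigmas))
    return sum(abs(s) for s in sigmas)


worst_case = combined_error([SIGMA_DG, SIGMA_DG], independent=False)
quadrature = combined_error([SIGMA_DG, SIGMA_DG], independent=True)

# Computed TI differences quoted in the chapter (kcal/mol), compared
# with the worst-case error estimate.
for label, value in [("solvation, TI", -1.15), ("gas -> folded, TI", 4.06)]:
    print(f"{label}: |value| / error = {abs(value) / worst_case:.2f}")
```

With these inputs the linear sum reproduces the quoted value of about 2 kcal/mol, while quadrature would give roughly 1.4 kcal/mol. Either way, the solvation difference is smaller than its estimated error and the gas-to-folded difference is only about twice as large, which is what "of the same order" amounts to here.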


Acknowledgements

The free energy simulations were carried out in close collaboration with M. Karplus (Harvard University) and B. Tidor (Whitehead Institute). M. P. is research associate at the National Fund for Scientific Research (Belgium). This work was supported in part by the European Bridge Program (BIOT-CT91-0270).

References

[1] Goodfellow, J. M., Williams, M. A., Curr. Opin. Struct. Biol. 1992, 2, 211.
[2] Gros, P., van Gunsteren, W. F., Hol, W. G. J., Science 1990, 249, 1149.
[3] Torda, A. E., Scheek, R. M., van Gunsteren, W. F., J. Mol. Biol. 1990, 214, 223.
[4] Nilges, M., Habazettl, J., Brünger, A. T., Holak, T. A., J. Mol. Biol. 1991, 219, 499.
[5] Pearlman, D. A., Kollman, P. A., J. Mol. Biol. 1991, 220, 457.
[6] Clore, G. M., Gronenborn, A. M., Protein Eng. 1987, 1, 275.
[7] Kaptein, R., Boelens, R., Scheek, R. M., van Gunsteren, W. F., Biochemistry 1988, 27, 5389.
[8] Kuriyan, J., Osapay, K., Burley, S. K., Brünger, A. T., Hendrickson, W. A., Karplus, M., Proteins 1991, 10, 340.
[9] Chou, K.-C., Carlacci, I., Protein Eng. 1991, 4, 661.
[10] Wong, C. F., McCammon, A., J. Am. Chem. Soc. 1986, 108, 3380.
[11] Gao, J., Kuczera, K., Tidor, B., Karplus, M., Science 1989, 244, 1069.
[12] Simonson, T., Brünger, A. T., Biochemistry 1992, 31, 8661.
[13] Prévost, M., Wodak, S. J., Tidor, B., Karplus, M., Proc. Natl. Acad. Sci. 1991, 88, 10880.
[14] Kuczera, K., Gao, J., Tidor, B., Karplus, M., Proc. Natl. Acad. Sci. 1990, 87, 8481.
[15] Mizushima, N., Spellmeyer, D., Hirono, S., Pearlman, D., Kollman, P. A., J. Biol. Chem. 1991, 266, 11801.
[16] Caldwell, J. W., Agard, D. A., Kollman, P. A., Proteins 1991, 10, 140.
[17] Warshel, A., Sussman, F., Hwang, J.-K., J. Mol. Biol. 1988, 201, 139.
[18] Yadav, A., Jackson, R. M., Holbrook, J. J., Warshel, A., J. Am. Chem. Soc. 1991, 113, 4800.
[19] Aqvist, J., Warshel, A., J. Mol. Biol. 1992, 224, 7.
[20] Aqvist, J., J. Phys. Chem. 1991, 95, 4587.
[21] CECAM Workshop, 'Models for Protein Dynamics', Orsay, France, 1976.
[22] Levitt, M., Sharon, R., Proc. Natl. Acad. Sci. USA 1988, 85, 7557.
[23] Chandrasekhar, I., Clore, G. M., Szabo, A., Gronenborn, A. M., Brooks, B. R., J. Mol. Biol. 1992, 226, 239.
[24] Loncharich, R. J., Brooks, B. R., J. Mol. Biol. 1990, 215, 439.
[25] Avbelj, F., Moult, J., Kitson, D. H., James, M. N. G., Hagler, A. T., Biochemistry 1990, 29, 8658.
[26] Brunne, R. M., Liepinsh, E., Otting, G., Wüthrich, K., van Gunsteren, W. F., J. Mol. Biol. 1993, 231, 1040.
[27] Meiering, E. M., Bycroft, M., Fersht, A. R., Biochemistry 1991, 30, 11348.
[28] Fersht, A. R., J. Mol. Biol. 1992, 224:3; this issue is entirely dedicated to the study of stability and folding of barnase.


[29] Mauguen, Y., Hartley, R. W., Dodson, E. J., Dodson, G. G., Bricogne, G., Chothia, C., Jack, A., Nature 1982, 297, 162.
[30] Mauguen, Y., personal communication.
[31] Fersht, A. R., personal communication.
[32] Hartley, R. W., Biochemistry 1968, 7, 2401.
[33] Verlet, L., Phys. Rev. 1967, 159, 98.
[34] Swope, W. C., Andersen, H. C., Berens, H., Wilson, K. R., J. Chem. Phys. 1982, 76, 637.
[35] van Gunsteren, W. F., Berendsen, H. J. C., Mol. Phys. 1977, 34, 1311.
[36] Ryckaert, J. P., Ciccotti, G., Berendsen, H. J. C., J. Comp. Phys. 1977, 23, 327.
[37] Blaney, J. M., Weiner, P. K., Dearing, A., Kollman, P. A., Jorgensen, E. C., Oatley, S. J., Burridge, J. M., Blake, C. C. F., J. Am. Chem. Soc. 1982, 104, 6424.
[38] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., Swaminathan, S., Karplus, M., J. Comp. Chem. 1983, 87, 1883.
[39] Hermans, J., Berendsen, H. J. C., van Gunsteren, W. F., Biopolymers 1984, 23, 1513.
[40] Jorgensen, W. L., Tirado-Rives, J., J. Am. Chem. Soc. 1988, 110, 1657.
[41] Clementi, E., Cavallone, F., Scordamaglia, R., J. Am. Chem. Soc. 1977, 99, 5531.
[42] Carozzo, L., Corongiu, G., Petrongolo, C., Clementi, E., J. Chem. Phys. 1978, 68, 787.
[43] Jorgensen, W. L., J. Am. Chem. Soc. 1981, 79, 926.
[44] De Leeuw, S. W., Perram, J. W., Smith, E. R., Proc. R. Soc. London A 1980, 373, 27.
[45] Brooks, C. L., Pettitt, B. M., Karplus, M., J. Chem. Phys. 1985, 83, 5897.
[46] Prévost, M., van Belle, D., Lippens, G., Wodak, S. J., Mol. Phys. 1990, 71, 587.
[47] Ewald, P. P., Ann. Phys. 1921, 64, 253.
[48] Fletcher, R., Reeves, C. M., Comput. J. 1964, 7, 149.
[49] Alard, P., Mémoire de Licence, Université Libre de Bruxelles, Belgium, 1982.
[50] Pangali, C., Rao, M., Berne, B. J., Mol. Phys. 1980, 40, 661.
[51] Delhaise, P., van Belle, D., Bardiaux, M., Alard, P., Hamers, P., van Cutsem, E., Wodak, S. J., J. Mol. Graphics 1985, 3, 116.
[52] McLachlan, A. D., J. Mol. Biol. 1979, 128, 49.
[53] Cooper, A., Proc. Natl. Acad. Sci. 1976, 73, 2740.
[54] Alard, P., Ph. D. Thesis, Université Libre de Bruxelles, Belgium, 1991.
[55] Holmes, M. A., Matthews, B. W., J. Mol. Biol. 1982, 160, 623.
[56] Baker, E. N., Hubbard, R. E., Prog. Biophys. Mol. Biol. 1984, 44, 97.
[57] Otting, G., Liepinsh, E., Wüthrich, K., J. Am. Chem. Soc. 1991, 113, 4363.
[58] Jeng, M. F., Englander, S. W., Elove, G. A., Wang, J. A., Roder, H., Biochemistry 1990, 29, 10433.
[59] Mossakowska, D., Nyberg, K., Fersht, A., Biochemistry 1989, 28, 3843.
[60] Meiering, E., Bycroft, M., Fersht, A., Biochemistry 1991, 30, 11348.
[61] Meiering, E., Serrano, L., Fersht, A., J. Mol. Biol. 1992, 225, 585.
[62] Horowitz, A., Serrano, L., Avron, B., Bycroft, M., Fersht, A., J. Mol. Biol. 1990, 216, 1031.
[63] Horowitz, A., Fersht, A., J. Mol. Biol. 1992, 224, 733.
[64] Berendsen, H. J. C., van Gunsteren, W. F., Zwinderman, H. R. J., Geurtsen, R. G., Ann. N. Y. Acad. Sci. 1986, 482, 269.
[65] Ahlstrom, P., Teleman, O., Jonsson, B., J. Am. Chem. Soc. 1988, 110, 4198.
[66] Brooks, C. L. III, Karplus, M., J. Mol. Biol. 1989, 208, 159.
[67] Otting, G., Wüthrich, K., J. Am. Chem. Soc. 1989, 111, 1871.
[68] Otting, G., Liepinsh, E., Farmer, B. T., Wüthrich, K., J. Biomol. NMR 1991, 1, 209.
[69] Otting, G., Liepinsh, E., Wüthrich, K., Science 1991, 254, 974.
[70] Swaminathan, S., Harrison, S. W., Beveridge, D. C., J. Am. Chem. Soc. 1978, 100, 5705.
[71] Jorgensen, W. L., J. Chem. Phys. 1982, 77, 5787.


[72] Rossky, P., Karplus, M., J. Am. Chem. Soc. 1978, 101, 1913.
[73] Hansen, J.-P., McDonald, I. R., Theory of Simple Liquids, Academic Press, 1987, p. 201.
[74] van Belle, D., Wodak, S. J., J. Am. Chem. Soc. 1993, 115, 647.
[75] Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., Hermans, J., Intermolecular Forces, B. Pullman (Ed.), Dordrecht, 1981, p. 331.
[76] Privalov, P. L., Adv. Protein Chem. 1988, 39, 191.
[77] Kauzmann, W., Adv. Protein Chem. 1959, 14, 1.
[78] Kauzmann, W., Nature 1987, 325, 763.
[79] Dill, K. A., Biochemistry 1990, 29, 7133.
[80] Alber, T., Ann. Rev. Biochem. 1989, 58, 765.
[81] Tembe, B. L., McCammon, J. A., Comp. Chem. 1984, 8, 281.
[82] Straatsma, T. P., Berendsen, H. J. C., Postma, J. P. M., J. Chem. Phys. 1986, 85, 6720.
[83] Brooks III, C. L., J. Phys. Chem. 1986, 90, 6680.
[84] Jorgensen, W. L., Ravimohan, C., J. Chem. Phys. 1985, 83, 3050.
[85] Cross, A. J., Chem. Phys. Lett. 1986, 128, 198.
[86] Straatsma, T. P., McCammon, J. A., J. Chem. Phys. 1989, 90, 3300.
[87] van Gunsteren, W. F., Modelling of Molecular Structures and Properties. Proceedings of an International Meeting, Nancy, France, 1.-15. September 1989, J.-L. Rivail (Ed.), Studies in Physical and Theoretical Chemistry, 1990, 71, 463.
[88] Shi, Y. Y., Mark, A. E., Cun-Xin, W., Fuhua, H., Berendsen, H. J. C., van Gunsteren, W. F., Protein Eng. 1993, 6, 289.
[89] Kellis, J. T. Jr., Nyberg, K., Sali, D., Fersht, A. R., Nature 1988, 333, 784.
[90] Kellis, J. T. Jr., Nyberg, K., Fersht, A. R., Biochemistry 1989, 28, 4914.
[91] Unless stated otherwise, the methods explained below are applied to the canonical ensemble, for which the number N of particles, the volume and the temperature T are fixed parameters.
[92] Zwanzig, R. W., J. Chem. Phys. 1954, 22, 1420.
[93] Kirkwood, J. G., J. Chem. Phys. 1935, 3, 300.
[94] Tidor, B., Karplus, M., Biochemistry 1991, 30, 3217.
[95] Brünger, A. T., Brooks III, C. L., Karplus, M., Proc. Natl. Acad. Sci. USA 1985, 82, 8458.
[96] Matouschek, A., Kellis, Jr., J. T., Serrano, L., Fersht, A. R., Nature 1989, 340, 122.
[97] Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., Klein, M. L., J. Chem. Phys. 1983, 79, 926.
[98] Beveridge, D. L., DiCapua, F. M., Ann. Rev. Biophys. Chem. 1989, 18, 431.
[99] Pearlman, D. A., Kollman, P. A., J. Chem. Phys. 1991, 94, 4532.
[100] Straatsma, T. P., McCammon, J. A., J. Chem. Phys. 1991, 95, 1175.
[101] Wood, R. H., J. Phys. Chem. 1991, 95, 4838.
[102] Hodel, A., Simonson, T., Fox, R. O., Brünger, A. T., J. Phys. Chem. 1993, 97, 3409.
[103] Mitchell, M. J., McCammon, J. A., J. Comp. Chem. 1991, 12, 271.
[104] Prévost, M., Wodak, S. J., Tidor, B., Karplus, M., 1993, to be submitted.
[105] Wade, R. C., McCammon, J. A., J. Mol. Biol. 1992, 225, 679-696 and 697.
[106] Wolfenden, R., Andersson, L., Cullis, P. M., Southgate, C. C. B., Biochemistry 1981, 20, 849.
[107] Matsumura, M., Becktel, W. J., Matthews, B. W., Nature 1988, 334, 406.
[108] Yutani, K., Ogasahara, K., Tsujita, T., Sugino, Y., J. Biol. Chem. 1984, 259, 14076.
[109] Yutani, K., Ogasahara, K., Tsujita, T., Sugino, Y., Proc. Natl. Acad. Sci. USA 1987, 84, 4441.
[110] Sandberg, P. J., Terwilliger, T. C., Science 1989, 245, 54.


[111] Fauchere, J. L., Pliska, V., Eur. J. Med. Chem. Chim. Theor. 1983, 18, 369.
[112] Radzicka, A., Wolfenden, R., Biochemistry 1988, 27, 1664.
[113] Eriksson, A. E., Baase, W. A., Zhang, X.-J., Heinz, D. W., Blaber, M., Baldwin, E. P., Matthews, B. W., Science 1992, 255, 178.
[114] Serrano, L., Kellis, Jr., J. T., Cann, P., Matouschek, A., Fersht, A. R., J. Mol. Biol. 1992, 224, 783.
[115] Shortle, D., Chan, H. S., Dill, K. A., Protein Sci. 1992, 1, 201.
[116] Richards, F. M., J. Mol. Biol. 1974, 82, 1.
[117] Lee, B., Protein Sci. 1993, 2, 733.


Computer Modelling in Molecular Biology
Edited by Julia M. Goodfellow
© VCH Verlagsgesellschaft mbH, 1995

5 The Use of Molecular Dynamics Simulations for Modelling Nucleic Acids

E. Westhof, C. Rubin-Carrez, and V. Fritsch

Modélisation et Simulation des Acides Nucléiques, UPR "Structure des Macromolécules Biologiques et Mécanismes de Reconnaissance", Institut de Biologie Moléculaire et Cellulaire, Centre National de la Recherche Scientifique, 15 rue R. Descartes, F-67084 Strasbourg, France

Contents

5.1 Introduction
5.2 Relevance of Molecular Dynamics Simulations
5.3 Water: An Integral Part of Nucleic Acids
5.4 Potential Energy Function
5.5 Implicit Treatment of the Solvent
5.6 Explicit Treatment of the Solvent
5.7 Choice of the Ensemble
5.8 Choice of Cut-Offs
5.9 Choice of Counterions
5.10 MD of DNA Oligomers with Implicit Solvent Treatment
5.11 MD of DNA Oligomers with Explicit Solvent Treatment
5.12 MD of the Anticodon Hairpin with Implicit Solvent Treatment
5.13 MD of the Anticodon Hairpin with Explicit Solvent Treatment
5.14 Modelling of Large Nucleic Acid Molecules
5.15 Conclusions
References


5.1 Introduction

An understanding of the functional mechanism of a biological macromolecule requires knowledge not only of its precise molecular organization in space but also of its internal dynamics. Molecular modelling attempts to construct the three-dimensional structure of a macromolecule on the basis of the experimental as well as theoretical data available on that macromolecule and on the family to which it belongs. The validity, the scope, and the predictive power of the model obtained will depend on the nature of the experimental observations collected (X-ray diffraction, sequence data, biochemical and chemical information, ...). High-resolution X-ray crystallographic analysis (diffraction data at 1.5 to 1.0 Å resolution) yields a wealth of unequalled structural information on the crystallized macromolecule. However, this requires not only the crystallization of the macromolecule but also the solution of the phase problem. Generally, with biological macromolecules, the problem is compounded by their size and complexity.
Besides, nucleic acids are very difficult to crystallize, since they are highly charged macromolecules which, in the case of RNA molecules, can undergo spontaneous cleavage. In addition, large nucleic acids, especially RNAs, often exchange between various base pairings and foldings.

5.2 Relevance of Molecular Dynamics Simulations

The very fact that X-ray crystallography produces well-defined structures, characterized by a set of coordinates, fosters a rather rigid and static view of macromolecules. On the other hand, various spectroscopic methods (especially nuclear magnetic resonance and fluorescence methods) and hydrogen exchange studies provide ample evidence for motions of various frequencies and amplitudes in macromolecules in solution. Nevertheless, X-ray diffraction can contribute to our knowledge of small-scale dynamic properties of proteins and nucleic acids. This advance was made possible by the development of refinement methods [1], which allow the precise determination of atomic coordinates and of the atomic Debye-Waller factors (B-factors or thermal parameters). Indeed, atoms with thermal motions have their contributions to the intensities of the diffracted X-rays reduced by an exponential factor which depends on the Debye-Waller parameter and on the resolution. This Debye-Waller parameter itself depends on the mean-square displacement of the atom, i.e. the mean of the squares of the differences between all positions occupied by the atom and its mean position. Since thermal parameters measure the mean-square atomic displacement, they depend on the physico-chemical potential in which each atom moves and, consequently, can be computed from trajectories obtained by molecular dynamics calculations [2, 3]. With the impetus given by molecular dynamics simulations of macromolecules, B-factors determined by X-ray refinement were taken seriously and shown to represent a meaningful measure of atomic fluctuations in macromolecules [4]. Refinement of crystal structures of macromolecules also allows the determination of solvent sites, i.e. of sites frequently occupied by water molecules or ions. However, the residence times of those solvent molecules are not accessible by X-ray crystallographic techniques. Recently, however, NMR methods have made it possible to assign residence times to some structural water molecules in proteins and nucleic acids [5]. Furthermore, since hydrogen atoms are normally not seen with X-ray crystallography of macromolecules, hydrogen bonds are ascribed on the basis of distance criteria (for a discussion and references see Frey [6]). With molecular dynamics simulations performed in aquo, the behavior of the water molecules can be studied in great detail. Molecular dynamics simulations also give information not only on instantaneous hydrogen-bonding networks but also on the lifetimes of hydrogen bonds.

Moreover, in order to function, biological macromolecules have to interact with each other in a specific and controlled way.
The specific interactions between macromolecules occur through complementary surfaces or templates held together by various physico-chemical forces such as hydrogen bonding, van der Waals, or electrostatic forces. Again, the precise description of the specific interactions present in the formed complex can be obtained by techniques like X-ray crystallography. On the other hand, the physico-chemical processes underlying recognition and occurring before complex formation are inherently more dynamic and consequently more difficult to investigate by most physico-chemical techniques. Molecular dynamics simulations could yield tremendous insights into the phenomena occurring prior to complex formation [7]. One of them, of particular interest and importance, is the desolvation step in each partner of a future complex. The roles of the solvent and solute dynamics in the desolvation step are still unclear. Nor is it settled whether the thermal fluctuations occurring in both the ligand and the macromolecular recognition site are selectively amplified so as to help complex formation by facilitating the interplay of the various physico-chemical forces in the search for a minimum in free energy [8, 9].

In this review, we will concentrate on our own experience and work on molecular dynamics simulations of nucleic acids from a methodological point of view.
Progress towards the goals stated above is slow, and we will try to emphasize the difficulties and pitfalls of the method with the aim of delineating directions for future research. Extensive reviews on molecular dynamics (MD) of nucleic acids exist and we refer to them for a broader coverage [10, 11]. One of our goals in developing MD techniques is to use simulations for generating and evaluating the reliability of ab initio structures of nucleic acids, i.e. of structures which are not derived from X-ray crystallography. These aspects will be discussed at the end of the article. First, however, one should calibrate the method and assess the capability of MD techniques to reproduce structures and dynamical behaviors observed by experimental methods.

5.3 Water: An Integral Part of Nucleic Acids

Nucleic acids are highly charged macromolecules with numerous polar atoms on the heterocyclic bases and on the sugar-phosphate backbone. The tertiary structures of nucleic acids therefore result from equilibria between (1) electrostatic forces due to the negatively charged phosphates; (2) stacking interactions between the bases due to hydrophobic and dispersion forces as well as to hydrogen bonding interactions between the polar atoms of the bases and water molecules; and (3) the conformational energy of the sugar-phosphate backbone. In its preferred conformations, the polynucleotide backbone exposes the negatively charged phosphates to the dielectric screening by the solvent and promotes the stacked helical arrangement of adjacent bases. In this way, a hydrophobic core is created where hydrogen bond formation between the nucleic acid bases as well as additional sugar-base and sugar-sugar interactions are favored. Further, via variations in torsion angles of the sugar-phosphate backbone and through re-orientations of the bases, nucleic acids adapt their structures so that their polar hydrophilic atoms form favorable interactions with the molecules of the solvent. This interdependence between solvent and nucleic acid structure constitutes the physicochemical basis for DNA polymorphism.
In such helical structures, only the internal atoms involved in hydrogen bonding between the bases are protected from solvent while most of the other atoms are accessible to water. Thus, water molecules contribute to the overall stability of helical conformations of nucleic acids by (1) screening the charges of the phosphates; (2) bonding to and bridging between the polar exocyclic atoms of the bases; and (3) influencing the conformations of residues with methyl groups via hydrophobic interactions. Besides, due to the periodicity of the helical structures of nucleic acids, water sites and water bridges involving polar base atoms or phosphate oxygens lead to structured arrangements of water molecules, called columns, chains, filaments [12], or spines [13].

Extensive reviews have appeared on nucleic acid hydration [10, 14-16] and we will only recall some salient points. Similar water binding sites and water bridges are found repeatedly in small as well as in large nucleic acid crystals. The anionic phosphate oxygen atoms are the most hydrated, the sugar ring oxygen atom O4' is intermediate, and the esterified O3' and O5' backbone atoms are the least hydrated. The hydrophilic atoms of the bases are about equally well hydrated, at half the level of the phosphate oxygen atoms in DNA helices. The relative order of hydration affinities is thus: anionic phosphate oxygens, polar base atoms, and sugar oxygen atoms. The most frequent water bridges appear mainly in the minor groove of helical nucleic acids. The systematic use of the sugar-water-base and base-water-base bridges in the minor groove contrasts with the versatility and mobility of the base-water-base bridges and water occupation in the major groove of helical nucleic acids. The hydration around phosphate groups of helical structures is characterized by "cones of hydration" centered on each anionic phosphate oxygen, by water bridges between anionic phosphate oxygen atoms of successive residues on the same strand, and by 5'-phosphate-water-base bridges. Water bridges are also observed in non-standard conformations and very systematically around non-canonical base pairs (e.g. A-G, A-A, U-G pairs). In RNA, the hydroxyl O2' atom makes several types of water bridges or direct hydrogen bonds to other residues. Those water molecules participating in or mediating structural bridges between atoms of the nucleic acid should be regarded as an integral constituent of nucleic acids in aqueous solution. Consequently, in helical nucleic acids, water molecules might strongly influence fine structural parameters like stacking geometries, twist, and roll angles between base pairs as well as some propeller-twist angles of base pairs.
In non-helical elements, like loops and bends, water molecules participate in the stabilization of non-canonical base pairs, which often close hairpins, and in bridging approaching phosphate groups.

5.4 Potential Energy Function

We have used the potential energy function of the modelling program AMBER 3.0, which includes terms describing the covalent structure deformations (bond stretches, bond angle deformations, torsional rotations) and terms representing the non-bonded interactions broken down into van der Waals, electrostatic, and hydrogen bonding contributions [17]. The potential energy function has the following form:

$$
E_{\mathrm{total}} = \sum_{\mathrm{bonds}} K_r (r - r_{eq})^2
+ \sum_{\mathrm{angles}} K_\theta (\theta - \theta_{eq})^2
+ \sum_{\mathrm{dihedrals}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
+ \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right]
+ \sum_{\mathrm{H\,bonds}} \left[ \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right]
$$

In the electrostatic term, ε is either a constant or a function of the distance between the charges.
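To make the non-bonded part of this function concrete, here is a minimal Python sketch of the three pair terms for a single atom pair. It is an illustration, not AMBER code: the function names, the parameter `eps_scale`, and the well depth and distance used in the example are hypothetical, and the conversion factor 332 is assumed so that charges in units of e and distances in Å give energies in kcal/mol.

```python
COULOMB_KCAL = 332.0  # assumed conversion factor: e^2/angstrom -> kcal/mol


def van_der_waals(r, a_ij, b_ij):
    """12-6 term A_ij/r^12 - B_ij/r^6 (r in angstroms)."""
    return a_ij / r**12 - b_ij / r**6


def hydrogen_bond(r, c_ij, d_ij):
    """10-12 term C_ij/r^12 - D_ij/r^10 (r in angstroms)."""
    return c_ij / r**12 - d_ij / r**10


def electrostatic(r, q_i, q_j, eps_scale=1.0, distance_dependent=False):
    """Coulomb term with eps = eps_scale (constant) or eps = eps_scale * r."""
    eps = eps_scale * r if distance_dependent else eps_scale
    return COULOMB_KCAL * q_i * q_j / (eps * r)


# Hypothetical example: two charges of 0.3 e separated by 5 angstroms.
e_const = electrostatic(5.0, 0.3, 0.3, eps_scale=1.0)
e_linear = electrostatic(5.0, 0.3, 0.3, eps_scale=4.0, distance_dependent=True)

# Hypothetical 12-6 parameters built from a well depth and a minimum distance.
well_depth, r_min = 0.1, 3.5
a_ij = well_depth * r_min**12
b_ij = 2.0 * well_depth * r_min**6
e_vdw_min = van_der_waals(r_min, a_ij, b_ij)  # equals -well_depth at r_min
```

With a distance-dependent ε the electrostatic term falls off as 1/r² instead of 1/r, which is why the two choices give very different energies at the same separation.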


The description above implies that the main theoretical bottleneck in MD simulations of nucleic acids resides in the treatment of the electrostatics and of the solvent (water molecules and counterions). In an aqueous solution, two charges are screened from one another by two distinct effects: the local water structure, or orientation of water dipoles, and the effect of other ions. Macroscopically, these two effects are handled respectively by the dielectric constant and the Debye-Hückel screened potential. In MD simulations, two paths have been followed: either the macroscopic one, with an implicit treatment of the solvent effects, or the microscopic one, with an explicit atomic description of the solvent molecules.

5.5 Implicit Treatment of the Solvent

Theoretically, the implicit approach is the least satisfying, since it blends an atomistic description of the solute with a macroscopic treatment of the solvent. Besides, it is generally based on a distance-dependent "dielectric constant" ε(r) in the term describing electrostatic interactions.

The peculiarities of a distance-dependent dielectric function are well described by Rogers [18]. However, despite those caveats, several dielectric functions have been suggested and used because of the tremendous reduction in computer time and the simplicity of the calculations. The most common ones are those offered by the program AMBER, where ε(r) = a or ε(r) = ar, with a a scalar usually equal to either 1 or 4.
Such functions, however, do not take into account dielectric saturation of the water dipoles, since they are either constant or increase linearly with distance. Recently, two other dielectric functions which include dielectric saturation have been introduced in MD simulations, one for proteins and another one mainly for nucleic acids (Figure 5-1). The one suggested by Mehler and Eichele [19] has the following form:

$$\varepsilon(r) = A + \frac{B}{1 + k\, e^{-\lambda B r}}$$

with A = -20.929, B = ε_H2O - A, λ = 0.001787, k = 3.4781, and ε_H2O = 78.4.

We have used the sigmoidal dielectric function proposed by Lavery et al. [20] and we modified the program AMBER to allow the use of this distance-dependent function, which has the form:

$$\varepsilon(r) = D - \frac{D - 1}{2} \left[ (Ar)^2 + 2Ar + 2 \right] e^{-Ar}$$

where D = 78 and A = 0.36 or A = 0.16. The first value gives the same dependence on distance as the function developed by Hingerty et al. [21].

Figure 5-1. Variations with the distance separating the charges of different dielectric functions (above) and of the corresponding electrostatic energies (below). The curves are, respectively, ε1 = 4r, ε2 = Lavery et al. [20], ε3 = Mehler and Eichele [19], ε4 = r, ε5 = 1. The electrostatic energies are computed with q1 = q2 = 0.3 e and $E_i(r) = 332\, q_1 q_2 / (\varepsilon_i(r)\, r)$. (From Fritsch [64]).

The second effect, the damping of ionic interactions due to screening by salt ions, is usually handled by a Debye-Hückel screened potential of the form [22]:

$$\frac{e^{\kappa\sigma}}{1 + \kappa\sigma}\, e^{-\kappa r}$$

where 1/κ is the Debye screening distance, i.e. the distance at which the electrostatic potential is reduced to 1/e of its value in the absence of screening by the mobile ions, and σ the ionic radius of the counterions. At the physiological ionic strength of 100 mM, the Debye screening distance is about 10 Å. With filamentous polyions like DNA, the phenomenon of counterion condensation must be considered. When the polyion is modelled as an infinitely long and uniformly spaced line of charges, there is a condensed fraction of counterions which depends only on the linear charge density of the polyion and the valence of the counterion and which is invariant to salt concentrations in excess of 0.1 M. For NaCl aqueous solutions of B-DNA, 76% of the phosphate charge is theoretically compensated by condensed counterions. In a recent work, Fenley et al. [23] investigated the effect of the finite length of the polyion: at 10⁻⁵ M, a 10 bp DNA oligomer has about 45% of its phosphate charge compensated. These two macroscopic approaches are incorporated in two ways in MD simulations. First, the cut-off for the calculation of the electrostatic terms is set to about 10 Å and, secondly, phosphate charges are scaled down to about -0.2. When applied crudely, the latter change has the drawback of decreasing the charges on the phosphate to values below those of some polar atoms in the bases (e.g. the amino group of guanine residues).

We have recently described a fast algorithm for calculating any dielectric function in MD simulations of biological macromolecules [24].
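The dielectric functions and the ion-condensation estimate discussed in this section can be explored numerically. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the Coulomb prefactor 332 follows the caption of Figure 5-1, and the condensed fraction uses the standard line-charge expression 1 - b/(z·l_B) with an assumed Bjerrum length l_B of about 7.1 Å for water at room temperature and an assumed axial charge spacing b of about 1.7 Å for B-DNA, neither of which is given in the text.

```python
import math


def eps_mehler_eichele(r, eps_water=78.4, a=-20.929, lam=0.001787, k=3.4781):
    """Sigmoidal dielectric function of Mehler and Eichele (r in angstroms)."""
    b = eps_water - a
    return a + b / (1.0 + k * math.exp(-lam * b * r))


def eps_lavery(r, d=78.0, slope=0.16):
    """Sigmoidal dielectric function of Lavery et al. (r in angstroms)."""
    x = slope * r
    return d - 0.5 * (d - 1.0) * (x * x + 2.0 * x + 2.0) * math.exp(-x)


def coulomb_energy(r, q1, q2, eps):
    """Electrostatic energy in kcal/mol for charges in units of e."""
    return 332.0 * q1 * q2 / (eps * r)


def condensed_fraction(valence=1, bjerrum=7.1, spacing=1.7):
    """Assumed line-charge estimate of the compensated phosphate charge."""
    return 1.0 - spacing / (valence * bjerrum)


if __name__ == "__main__":
    for r in (2.0, 5.0, 10.0, 20.0):
        e_lav = coulomb_energy(r, 0.3, 0.3, eps_lavery(r))
        e_me = coulomb_energy(r, 0.3, 0.3, eps_mehler_eichele(r))
        print(f"r = {r:5.1f} A  Lavery: {e_lav:7.3f}  Mehler-Eichele: {e_me:7.3f}")
    print(f"condensed fraction: {condensed_fraction():.2f}")
```

Both sigmoidal functions start near 1 at contact and tend to the bulk value at long range. With the assumed spacing and Bjerrum length the condensed fraction comes out near 0.76, in line with the 76% quoted above, and the residual charge per phosphate is then of the order of the -0.2 used when scaling charges.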
With no increase in CPU time, one can add a constant to take care of ion condensation, a term representing Debye-Hückel screening, and any dielectric function which accounts for saturation of water dipoles.

5.6 Explicit Treatment of the Solvent

The complete microscopic description of the solvent is theoretically the only firmly grounded method. In principle, it sounds easy to implement, although in practice it is fraught with difficulties. A long period of equilibration and thermalization is necessary in order to avoid strongly unfavorable energetic interactions between the solute and the solvent molecules with concomitant deformation of the solute. Generally, the conformational changes introduced by collisions between the solute and the solvent molecules are irreversible and the ensuing simulation does not reflect molecular dynamics around the equilibrium (Figures 5-2 and 5-3).

Figure 5-2. Breakage of a Watson-Crick pair during a MD simulation seen from the major groove (top) and from the minor groove (below). The starting conformation is at the left and the conformation after 53 ps of simulation at the right. Such large deformations are obtained when the system is not properly equilibrated before starting the simulation. (From Fritsch [64]).

Figure 5-3. Two views of a DNA fragment with its associated sodium counterions (black spheres). At left is shown the starting conformation. The state obtained after 40 ps of MD simulation is shown at the right. A counterion has been strongly displaced from the DNA and is about to cross the "wall" of the solvation box, leading to a halt of the simulation. Again, such situations occur when the whole system is not adequately equilibrated. (From Fritsch [64]).

For DNA systems [25] the following equilibration protocol was developed (Figure 5-4). First, in order to eliminate close contacts and unfavorable geometric strains in the initial system, 200 steps of steepest descent minimization are applied to the water molecules. To disorder the periodicity of the box, which is built from smaller boxes (each box containing 216 Monte Carlo water molecules), this step is followed by one picosecond of MD simulation on the water molecules at 300 K and constant volume (N, V, T). In those two steps, the DNA and counterions are held fixed at their initial positions using the Belly option of AMBER. To relax the possible strains created at 300 K, one picosecond of MD simulation is performed at 10 K at constant pressure (N, P, T) on the solvent only, with the solute (i.e. the DNA atoms and the counterions) still fixed. Then the water bath is thermalized by a gradual increase of the temperature from 10 K to 300 K in steps of 50 K (3 ps at each temperature), again with the Belly option for constraining the DNA and the counterions. The complete system is then cooled back to 10 K and gradually heated from 10 to 300 K at (N, P, T) in steps of 50 K (3 ps at each temperature) with constraints on each hydrogen bond involved in Watson-Crick base pairing (with a force constant K_H = 10 kcal·mol⁻¹·Å⁻² and an equilibrium distance of 3 Å). Finally, the heating step is completed by 10 ps of equilibration at 300 K followed by the production period.

Figure 5-4. Protocol for simulating MD of a hydrated DNA fragment: (T) step for thermalizing the water molecules; (H) gradual heating of the complete system (DNA + counterions + water molecules); (E) equilibration step at 300 K; (P) production step. (From Fritsch [64]).

For an RNA system, the anticodon hairpin, a similar but simplified protocol has been used successfully (Rubin-Carrez and Westhof, in preparation).
First, 100 steps of steepest descent minimization are applied to the solvent molecules only, in order to relax the starting strains present at the interface between the RNA and the solvent molecules. Then, with the Belly option, water and counterion molecules are equilibrated around the fixed RNA solute for 2.5 ps at 50 K. The temperature of the system is progressively increased to 250 K in steps of 50 K with 2.5 ps simulations at each step. At 250 K, constraints on the Watson-Crick H-bonding distances are set in the helical part of the solute and 2.5 ps of MD simulation are performed. This is followed by 5 ps at 275 K and 10 ps at 300 K. With this protocol, the thermalization and equilibration of the system last 30 ps.
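The RNA protocol just described can be written down as a simple schedule, which makes it easy to check the total equilibration time. The Python sketch below is only a bookkeeping illustration with hypothetical names (`Stage`, `rna_hairpin_schedule`); it is not AMBER input, and it assumes that the heating from 50 K to 250 K proceeds through 100, 150, 200 and 250 K, the reading that reproduces the quoted total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    label: str
    temperature_k: float
    duration_ps: float


def rna_hairpin_schedule():
    """Thermalization and equilibration stages that follow the minimization."""
    stages = [Stage("solvent around fixed RNA (Belly option)", 50.0, 2.5)]
    # Assumed reading: heating in 50 K steps, 2.5 ps at each temperature.
    for temperature in (100.0, 150.0, 200.0, 250.0):
        stages.append(Stage("heating", temperature, 2.5))
    stages.append(Stage("Watson-Crick distance constraints set", 250.0, 2.5))
    stages.append(Stage("equilibration", 275.0, 5.0))
    stages.append(Stage("equilibration", 300.0, 10.0))
    return stages


schedule = rna_hairpin_schedule()
total_ps = sum(stage.duration_ps for stage in schedule)
```

Summing the stage durations gives 30 ps, matching the figure stated for the whole thermalization and equilibration phase.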


5.7 Choice of the Ensemble

Statistical mechanics defines several ensembles (an ensemble being the assembly of all possible microstates consistent with a given set of constraints characterizing the macroscopic state) over which averages are evaluated in order to obtain properties of the system [26]. In the thermodynamic limit (systems containing a number of particles close to the Avogadro number), those ensembles produce equivalent average properties. Since MD treats systems with a finite number of particles, simulations done in different ensembles will not give similar fluctuations. Three ensembles are generally considered. The microcanonical ensemble (or (N, V, E)) is appropriate for closed and isolated systems with fixed total energy (E) and fixed size (i.e. number of particles, N, and volume, V, constant), in which the corresponding thermodynamic function is the negative of the entropy. The canonical ensemble (or (N, V, T)) refers to closed systems with fixed size (N and V constant) but kept at a constant temperature by contact with a heat bath. The corresponding thermodynamic function is then the Helmholtz free energy. The isothermal-isobaric ensemble (or (N, P, T)) is the assembly of all microstates with T and P constant, in which both volume and energy fluctuations may thus occur, the corresponding thermodynamic function being the Gibbs free energy. It should be remembered that in computer simulations one can compute instantaneous mechanical quantities like the energy, the temperature, or the pressure, but these should not be confused with thermodynamic concepts like E, T, or P, which are defined as ensemble averages.
This is particularly important in the case of the pressure, for which large fluctuations in the instantaneous pressure will be observed even at a constant pressure [27].

In the AMBER program, simulations at constant temperature follow the algorithm due to Berendsen et al. [28], in which the velocities v are scaled to values λv with

$$\lambda = \left[ 1 + \frac{\Delta t}{\tau_T} \left( \frac{T_0}{T} - 1 \right) \right]^{1/2}$$

where T_0 is the temperature of the bath, T the instantaneous temperature, Δt the time step, and τ_T the relaxation time characteristic of the coupling to the bath. In simulations at constant energy, the velocities are linearly rescaled when the temperature deviation from the reference temperature is larger than a given tolerance. The scale factor is then the one given in [29].
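A minimal Python sketch of the weak-coupling temperature scaling follows. It assumes the usual form of the Berendsen factor, λ = [1 + (Δt/τ_T)(T_0/T - 1)]^(1/2); the function names and the time step and coupling time used in the example are hypothetical, chosen only to show the relaxation towards the bath temperature.

```python
import math


def berendsen_factor(t_inst, t_bath, dt, tau):
    """Velocity scaling factor for weak coupling to a heat bath."""
    return math.sqrt(1.0 + (dt / tau) * (t_bath / t_inst - 1.0))


def relax_temperature(t_start, t_bath, dt, tau, n_steps):
    """Apply the scaling repeatedly, ignoring all other dynamics.

    The kinetic temperature is proportional to v^2, so scaling the
    velocities by the factor multiplies the temperature by its square.
    """
    temperature = t_start
    for _ in range(n_steps):
        temperature *= berendsen_factor(temperature, t_bath, dt, tau) ** 2
    return temperature


# Hypothetical values: 2 fs time step, 0.2 ps coupling time, 250 K -> 300 K.
t_after = relax_temperature(250.0, 300.0, dt=0.002, tau=0.2, n_steps=100)
```

Each application moves the temperature a fraction Δt/τ_T of the way towards T_0, so in this idealized setting the deviation from the bath temperature decays roughly exponentially with time constant τ_T.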


Figure 5-5. Time evolutions of the total energy, the potential energy, and the temperature for three types of MD simulations: (A) the fully hydrated model at constant temperature; (B) the implicit model, with dielectric function ε, at constant T; and (C) the implicit model, with dielectric function ε, at constant energy. (From Fritsch et al. [25]).


With in vacuo simulations, the microcanonical ensemble appears preferable, while in aquo simulations with periodic boundary conditions are better handled in a constant-temperature ensemble (Figures 5-5 and 5-6).

Figure 5-6. Root-mean-square deviations (in Å) versus time (in ps) during the heating, the equilibration, and the production steps for the three types of MD simulations described in Figure 5-5. The deviations are calculated on all atoms between the starting conformation and the instantaneous one. (From Fritsch et al. [25]).

5.8 Choice of Cut-Offs

Although, ideally, no cut-off limiting inter-particle interactions should be applied, in practice computer limitations impose them, especially in the case of in aquo simulations. The notion of Debye distance mentioned above sets a cut-off on electrostatic interactions around 10 Å. For in aquo simulations, the cut-off for solute-solvent interactions was generally chosen around 8 to 9 Å, while no cut-off was set for solute-solute interactions (i.e. interactions between nucleic acid atoms as well as between nucleic acid atoms and counterions, which are considered in AMBER as belonging to the solute).
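The value of about 10 Å quoted for the Debye distance can be checked from the standard expression κ² = 2 N_A e² I / (ε_0 ε_r k_B T), where I is the ionic strength. The Python sketch below assumes water at 25 °C with a relative permittivity of 78.4; the function name and these defaults are illustrative assumptions.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro number, 1/mol
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m


def debye_length_angstrom(ionic_strength_molar, eps_r=78.4, temperature_k=298.15):
    """Debye screening distance 1/kappa in angstroms."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_squared = (2.0 * N_A * E_CHARGE**2 * ionic_strength_si
                     / (EPS_0 * eps_r * K_B * temperature_k))
    return 1.0e10 / math.sqrt(kappa_squared)


length_100mm = debye_length_angstrom(0.1)
```

At 0.1 M the function returns a value a little below 10 Å, consistent with the cut-off used for electrostatic interactions; since the length scales as the inverse square root of the ionic strength, quadrupling the salt concentration halves it.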


5.9 Choice of Counterions

The importance of ions for the stabilization of deoxy- and ribonucleic acids has long been recognized. The theoretical treatment of specific ion binding is difficult. When discussing metal ion binding to nucleotide ligands, two limiting cases are usually considered: (a) site binding, with a direct coordination of the metal ion to ligands of the nucleotide, leading to the formation of inner-sphere complexes after partial dehydration of the metal ion; (b) ion atmosphere binding, wherein the metal ions with their intact hydration shells interact indirectly through water molecules with the nucleotides, forming outer-sphere complexes [30]. The kinetics of magnesium binding to polynucleotides have been well studied [31] and follow the preceding structural description. First, an outer-sphere complex is formed with a rate close to the limit of diffusion control. In a second step, one or more water molecules exchange with ligands of the polynucleotide, leading to inner-sphere complexation with a rate determined by the process of water dissociation, around 100 000 per second for magnesium and 1000 times faster for calcium ions.
The second step was observed only with short oligoriboadenylates, indicating that inner-sphere complexation requires particular conformations of the nucleotide chains and the presence of the adenine base and of the ribose hydroxyl group O2'.

Two features observed in crystal structures appear interesting [10] (Figure 5-7). First, the ion is rarely bound directly to the phosphate anionic oxygen atoms in ribonucleotide complexes, but more often in deoxynucleotide complexes. Direct contacts to nucleic acid ligands occur mainly at N7 of purines and at anionic oxygen atoms, while fully hydrated ions interact as often with the bases as with the sugar-phosphate backbone. Secondly, one or two water molecules of the ion hydration shell are often common to two ions, so that the ions share an apex or an edge of their coordination spheres (two such sodium ions are separated by 3.3 to 3.8 Å). While water molecules are weakly held to sodium ions (they exchange at the diffusion-controlled value and thus have residence times on the ion of around a nanosecond), water molecules are held more strongly to magnesium ions. The very high rate constants for magnesium ion association with polynucleotides indicate "outer sphere" complexation. Therefore, not surprisingly, crystal structures of nucleotides and polynucleotides reveal "inner sphere" complexation with sodium ions and "outer sphere" complexation with magnesium ions.
However, while magnesium binding to bases in helices appears to be preferentially of the "outer sphere" type, binding to phosphate groups occurs in loops and bends of the sugar-phosphate backbone and is often of the "inner sphere" type.

As counterions, we have favored the use of ammonium ions. First, the tetrahedral stereochemistry of the ammonium ion resembles that of water, and its geometry as well as its empirical parameters have been determined [32]. Secondly, as described above, other types of counterions possess coordinated water molecules (e.g. the octahedral arrangement of water molecules around a sodium or magnesium ion), so that binding can be due either to "outer sphere" complexation (via water molecules) or to "inner sphere" complexation (after removal of one or two water molecules), depending on several energetic factors, time scales, and the particular geometry of the nucleic acid. A correct parametrization of such phenomena is far from being available for sodium and magnesium ions.

5 Modelling Nucleic Acids

Figure 5-7. Example of an "inner" complex between a partially hydrated sodium ion and two nucleic acid fragments (above; adapted from Chevrier et al. [38]) and of an "outer" complex between fully hydrated magnesium ions and the deep groove of the anticodon helix (below; from Westhof and Sundaralingam [65]).
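The exchange rates quoted in this section can be turned into mean residence times of a water molecule in the first hydration shell, which makes the contrast between the ions concrete. The sketch below assumes simple first-order exchange (mean residence time = 1 / rate). The function name and the dictionary are illustrative, and the calcium value is simply the stated factor of 1000 applied to the magnesium rate.

```python
# Mean residence time of an inner-shell water molecule, assuming
# first-order exchange: tau = 1 / k_exchange (an illustrative simplification).

def residence_time_s(exchange_rate_per_s: float) -> float:
    """Return the mean residence time in seconds for a given exchange rate."""
    if exchange_rate_per_s <= 0:
        raise ValueError("exchange rate must be positive")
    return 1.0 / exchange_rate_per_s


# Rates as quoted in the text: about 1e5 per second for magnesium,
# and 1000 times faster for calcium.
K_MG = 1.0e5
EXCHANGE_RATES = {"Mg2+": K_MG, "Ca2+": 1000.0 * K_MG}

if __name__ == "__main__":
    for ion, rate in EXCHANGE_RATES.items():
        tau_us = residence_time_s(rate) * 1.0e6  # seconds -> microseconds
        print(f"{ion}: k = {rate:.1e} s^-1, residence time = {tau_us:.3g} microseconds")
```

Under this assumption, magnesium holds an inner-shell water molecule for about 10 microseconds and calcium for about 10 nanoseconds, to be compared with the residence time of around a nanosecond quoted for sodium.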


5.10 MD of DNA Oligomers with Implicit Solvent Treatment

The least amount of distortion in double-helical DNA fragments, as compared to crystallographic results, is observed for simulations with a sigmoidal distance-dependent dielectric function. In (A, T)-rich DNA oligomers, the use of this dielectric function has shown that the thymine sugars prefer the O4'-endo pucker, while the adenine sugars preferentially adopt the C2'-endo pucker [33]. The hydrogen bonds of the Watson-Crick base pairs are also stable during such simulations, while the three-center hydrogen bonds typical of dA-dT sequences have lifetimes at least 20 times shorter than standard Watson-Crick H-bonds [34]. These simulation results were recently further supported by the analysis of the crystal structure of a B-DNA dodecamer d(CGCAAATTTGCG) [35].

In MD simulations using a dielectric function with a linear dependence on the distance (ε = r or ε = 4r), severe distortions of the DNA fragments can be observed. In addition, rupture of Watson-Crick base pairs, especially terminal ones, is systematically observed. The lifetimes of the Watson-Crick pairs are therefore unrealistically short (less than 50 ps). The disruption of the Watson-Crick hydrogen bonds leads to the stabilisation of three-center hydrogen bonding and, often, to the formation of odd inter-base pairings. With ε = r, there is a strong tendency to form intramolecular H-bonds between exocyclic amino groups (of C or G) and anionic phosphate oxygens or carbonyl groups, leading inevitably to severe distortion of the DNA fragments. This tendency is prominent in minimization studies.
Such additional H-bonds were also observed after in vacuo minimization studies of the Z(WC)-DNA model proposed by Ansevin and Wang [36]. Such intramolecular H-bonds are most probably artefactual and result from the simultaneous occurrence of two factors. First, the electrostatic interaction between the amino groups and the closest phosphate group contributes too strongly. Secondly, the absence of explicit water molecules prevents the insertion and bridging of water molecules between amino groups and anionic phosphate oxygens, as commonly observed in crystal structures [10, 37, 38] and in recent MD simulations [39]. Figure 5-8 shows examples of such intramolecular H-bonds for an A-A self-pair using the sigmoidal dependence function. It is interesting to note that those artefactual H-bonds were not observed when using ε = 4r.

Figure 5-8. Top: Stereo view of some water molecules around the unusual A-A base pair in the complex between cytidylyl-3',5'-adenosine and proflavine (Westhof et al. [66]). Notice the water bridge between the amino groups of adenosine residues and opposite phosphate anionic oxygen atoms. Below: Hydrogen bonding system around the same unusual A-A base pair obtained after minimization of the parallel helix d(AGAGAGAGAG), using the sigmoidal dielectric function. (From Fritsch and Westhof, submitted [54]).

Figure 5-9 shows a plot of ln τ versus 1/T for the mean lifetimes of the three-center hydrogen bonds compared to the lifetimes of the adenine sugars in the C2'-endo domain for poly(dA)-poly(dT) simulated with the sigmoidal dielectric function. For comparison purposes, the theoretical curves expected on the basis of absolute rate theory are given. The temperature dependences of the two types of lifetimes are similar and much weaker than the theoretical one. One would thus conclude that the activation energies governing the three-center H-bonds are of the same order of magnitude as those governing the vibrational and pseudorotational movements in the puckered adenine sugars. Very low activation energies (< 1 kcal/mol) are also observed for hydrogen-bond lifetimes in simulations of liquid water [40].

Figure 5-9. Plot of ln τ versus 1/T (abscissa: 10³/T in K⁻¹) for the mean lifetimes of the three-center hydrogen bonds and for the mean lifetimes of the adenine sugar in the C2'-endo pseudorotational domain during a simulation of poly(dA)-poly(dT) with the sigmoidal dielectric function. The theoretical curves, given by the equation $\tau\ (\mathrm{ps}) = \nu^{-1} \exp(\Delta G^{*}/RT)$ with $\nu^{-1} = 0.16$ ps, correspond to ΔG* equal to 0.2, 0.5, and 1 kcal/mol. (From Fritsch and Westhof [34]).

To analyze further the dynamics of the hydrogen bonds, autocorrelation functions were evaluated. For calculations of autocorrelation functions, the history of each potential hydrogen bond was recorded as a series of 1 (if present) and 0 (if absent), defining the quantity S(t) [41]. The autocorrelation function of S(t) is then evaluated over time origins t0, where t0, a multiple of δt, is the time at which the measurement begins along the simulation run (with δt = 0.05 ps and t_min = n_min δt = 5 ps). With such a definition, bonds not formed at time t0 are ignored and, more importantly, bonds present at time t, whatever the number of intervening "breakage and re-formation" events, are included. Two examples are shown in Figure 5-10. With ε(r) = 4r, the autocorrelation curves depend strongly on the geometrical criteria used for defining the three-center hydrogen bonds. This is much less the case with the sigmoidal dielectric function, indicating again its superiority. In all curves, there is a very rapid drop of autocorrelation followed by a smooth transition to a plateau value of 0.5, reached after about 0.5 ps in the case of the sigmoidal dielectric function. Thus, when observing a three-center H-bond, there is a 50% chance of observing it again after 0.5 ps. This value increases to 80% at low temperature. Despite this high probability of occurrence, three-centre H-bonds appear, energetically, more as geometrical consequences of the anomalous structures adopted by dA-dT homopolymers than as structurally governing factors.

Figure 5-10. Autocorrelation functions for the three-center hydrogen bond between A5 and T15 in the (dA)-(dT) decamer with two dielectric functions (panels: model I with the sigmoidal dielectric function and model I with ε = 4r, at 300 K; time axis in ps) and for (1) soft geometrical criteria (r2 < 3 Å, θ2 > 90°); (2) medium geometrical criteria (r1 < r2 < 3 Å, θ1 > θ2 > 90°, 350° < θ1 + θ2 + α < 360°). (From Fritsch and Westhof [34]).

5.11 MD of DNA Oligomers with Explicit Solvent Treatment

A simulation of a poly(dA)-poly(dT) decamer including 18 ammonium counterions and 4109 water molecules was performed in the (N, P, T)-ensemble [25]. The results show that the DNA remains essentially in the B conformation, with a tendency to adopt a slightly distorted, unwound and stretched conformation in comparison to standard B-DNA. Surprisingly, a peculiar behavior is observed for the adenine strand, since the terminal bases oscillate between the C2'-endo and O4'-endo domains while the central ones are blocked in the C3'-endo pucker.
In the minor groove (Figure 5-11), a spine of hydration is found, as observed by X-ray crystallography [13, 16] and in other theoretical simulations [42-44].
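The absolute-rate-theory curves of Figure 5-9 (Section 5.10) follow directly from the expression given in its caption, τ = ν⁻¹ exp(ΔG*/RT) with ν⁻¹ = 0.16 ps. The following is a minimal sketch, assuming ΔG* in kcal/mol, a temperature-independent prefactor as in the caption, and the gas constant R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹. The function and constant names are illustrative and are not taken from the original work.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
PREFACTOR_PS = 0.16             # nu^-1 in ps, as quoted in the caption of Figure 5-9


def lifetime_ps(delta_g_kcal_per_mol: float, temperature_k: float) -> float:
    """Lifetime tau = nu^-1 * exp(dG* / RT), in picoseconds."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return PREFACTOR_PS * math.exp(
        delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k)
    )


def ln_tau_vs_inverse_t(delta_g_kcal_per_mol, thousand_over_t_values):
    """Return (10^3/T, ln tau) pairs, the coordinates used in Figure 5-9.

    Every abscissa value must be strictly positive (it equals 1000 / T).
    """
    points = []
    for x in thousand_over_t_values:
        temperature_k = 1000.0 / x
        points.append((x, math.log(lifetime_ps(delta_g_kcal_per_mol, temperature_k))))
    return points


if __name__ == "__main__":
    for dg in (0.2, 0.5, 1.0):  # barrier heights quoted in the caption
        print(f"dG* = {dg} kcal/mol: tau(300 K) = {lifetime_ps(dg, 300.0):.3f} ps")
```

In these coordinates each theoretical curve is a straight line of slope ΔG*/(10³ R). At 300 K the three barriers correspond to lifetimes of roughly 0.22, 0.37 and 0.86 ps.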


Figure 5-11. Illustration of the spine of hydration, i.e. of water molecules bound to N3 and O2 atoms in the minor groove of the (dA)-(dT) decamer, along the MD simulation in aquo. The black circles correspond to the water oxygen atom positions. Notice how the periodicity at the end of the thermalization step is slowly broken up during the simulation, although water molecules still bind systematically in the minor groove up to the end of the production step. (From Fritsch et al. [25]).
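The hydrogen-bond autocorrelation analysis of Section 5.10 can be sketched as follows. The normalization used here is an assumption chosen to match the verbal description: every time origin t0 at which the bond is formed contributes, and the bond only needs to be present again at t0 + t, regardless of any intervening breakage. The function name, the `start` argument (playing the role of t_min) and the toy series are hypothetical.

```python
def hbond_autocorrelation(s, max_lag, start=0):
    """Autocorrelation of a 0/1 hydrogen-bond history.

    s        sequence of 1 (bond present) and 0 (bond absent), one per time step
    max_lag  largest lag, in time steps, for which C is evaluated
    start    index of the first time origin (frames before it are skipped)

    Assumed definition: C(lag) = sum_t0 S(t0) * S(t0 + lag) / sum_t0 S(t0),
    i.e. the fraction of bonds formed at t0 that are found again at t0 + lag.
    """
    n = len(s)
    c = []
    for lag in range(max_lag + 1):
        origins = range(start, n - lag)
        formed = sum(s[t0] for t0 in origins)
        found_again = sum(s[t0] * s[t0 + lag] for t0 in origins)
        # Undefined if the bond is never formed at any usable origin.
        c.append(found_again / formed if formed else float("nan"))
    return c


if __name__ == "__main__":
    history = [1, 1, 0, 1, 0, 0, 1, 1]  # toy series, one entry per time step
    dt_ps = 0.05                        # sampling interval quoted in the text
    for lag, value in enumerate(hbond_autocorrelation(history, max_lag=2)):
        print(f"t = {lag * dt_ps:.2f} ps  C = {value:.3f}")
```

With this normalization C(t) is the probability of finding the bond again a time t after observing it, which is how the plateau value of 0.5 is read in the text.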


5.12 MD of the Anticodon Hairpin with Implicit Solvent Treatment

Several molecular dynamics simulations were performed on the anticodon stem (i.e. the five-base-pair helix with the seven-residue loop) of the yeast tRNA-Asp under various conditions (Rubin-Carrez and Westhof, in preparation). Without counterions (Figure 5-12), the use of dielectric functions equal to r or to 4r leads to a complete loss of the anticodon-like conformation of the loop, in which the bases splay apart and point independently toward the solvent. At the same time, there is a strong unwinding of the helix. With the sigmoidal dielectric function, the system evolves to a globular and compact state in which the sugar-phosphate backbone of the loop folds back on the deep and narrow groove of the helix. Whatever the dielectric function, the addition of counterions accelerates this phenomenon of molecular collapse by bridging the phosphate groups of the 5'-helical strand and the 3'-end of the loop.

What are the origins of this molecular contraction? RNA helices are characterized by a deep and narrow groove and a shallow and wide groove corresponding, respectively, to the major and minor grooves of the B-DNA helix. This geometrical difference arises from the displacement of the base pairs away from the helical axis and toward the minor groove side. Such a movement fills in the minor groove and deepens the major groove side, leading to an asymmetric distribution of matter, especially in a helix with less than one turn.
In the simulations performed, it is apparent that the weakest electrostatic damping (ε = r) leads to the least dramatic results, emphasizing the fact that electrostatic repulsion keeps the strands apart in such simulations, but without guaranteeing realistic dynamic behavior. In other words, implicit treatment of the screening of electrostatic repulsions by the solvent gives too much weight to the van der Waals attraction and leads to a globular molecule in the absence of the space-filling property of bulk solvent.

Figure 5-12. Stereo views of the anticodon hairpin after 50 ps of MD simulation in vacuo with different dielectric functions. Panel (a): ε = r, T = 300 K; panel (c): T = 300 K. (From Rubin-Carrez [67]).

5.13 MD of the Anticodon Hairpin with Explicit Solvent Treatment

Even in aqueous solution, simulations of the anticodon hairpin lead to severe distortions of the helical parameters. It should be recalled, however, that the sequence of the anticodon helix contains a wobble G-U base pair, which forms the epicentre of distortion. Also, the helix is only 5 base pairs long, barely half a helical turn, and this could be responsible for artefactual values. The most deviant global parameters (with respect to the helical axis) are the inclination and the tip (i.e. the rotations about the short and long axes of the base pair). In standard RNA helices, the values of those parameters are 15.9° and 0°, respectively. The values averaged over the last 50 ps of simulation are -24.3° and 18.7°, respectively. The change in sign of the inclination parameter is particularly striking and difficult to rationalize.

Despite those unexplained structural deformations, the analysis of the hydration of the RNA fragment was highly instructive. Cones of hydration around each anionic phosphate oxygen (Figure 5-13) were systematically observed, as well as water bridges connecting successive phosphate groups with lifetimes of around 10 ps and more. The structuring power of the ribose O2' atom on the polar environment was also apparent in the simulations, since the O2' preferentially accepts hydrogen bonds from water instead of being a hydrogen donor. The organized hydration in the shallow groove of RNA helices and around unusual base pairs, like G-U pairs, already discussed on the basis of crystallographic results [45], could be described more precisely. The active structuring role of water molecules is not restricted to helical regions, since very organized water networks were observed in the anticodon loop, involving especially the pseudouridine 32 and the methyl guanosine 37 (Figure 5-14). Table 5-1 gives the maximum lifetimes observed in the MD simulations for some of the systematically observed water bridges. The values obtained are in agreement with the expected strength of occurrence of each water bridge. However, further studies are required to establish more precisely the values of the water bridge lifetimes.

Figure 5-14. Stereo view of a snapshot of the partial aqueous environment in the anticodon loop during the in aquo MD simulation of the anticodon hairpin. The drawing emphasizes the water bridges between residues 32-37 and residues 33-36. (From Rubin-Carrez [67]).

Table 5-1. Maximum lifetimes of some important water bridges observed in the MD simulation of the anticodon hairpin.

Type of bridge                          Maximum lifetime (ps)
OP (i) ... W ... OP (i + 1)             25-45
OP (i) ... W ... N7 (i)                 5
O2' (i) ... W ... OP (i + 1)            2
O2' (i) ... W ... N3/O2 (i)             2
N3/O2 (i) ... W ... N3/O2 (j + 1)       1
O2' (i) ... W ... O4' (i + 1)           1

The i index refers to the residue number in the 5' to 3' direction and the j index to a residue on the complementary strand.
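A maximum lifetime such as those in Table 5-1 can be extracted from a trajectory once each water bridge is recorded frame by frame as present or absent. The sketch below is one simple way to do this and is not necessarily the procedure used by the authors: it takes each uninterrupted run of "present" frames and multiplies its length by the sampling interval. The function names, the sampling interval and the toy series are hypothetical.

```python
def bridge_lifetimes_ps(present, dt_ps):
    """Return the duration (ps) of every uninterrupted run of 'present' frames.

    present  sequence of truthy/falsy flags, one per saved frame
    dt_ps    time between saved frames, in picoseconds
    """
    lifetimes = []
    run = 0
    for flag in present:
        if flag:
            run += 1
        elif run:
            lifetimes.append(run * dt_ps)  # bridge just broke: close the run
            run = 0
    if run:
        lifetimes.append(run * dt_ps)      # bridge still present at the last frame
    return lifetimes


def maximum_lifetime_ps(present, dt_ps):
    """Longest continuous lifetime in ps, or 0.0 if the bridge never forms."""
    return max(bridge_lifetimes_ps(present, dt_ps), default=0.0)


if __name__ == "__main__":
    frames = [0, 1, 1, 1, 0, 1, 0, 0, 1, 1]  # toy presence record
    print(bridge_lifetimes_ps(frames, dt_ps=0.5))
    print(maximum_lifetime_ps(frames, dt_ps=0.5))
```

A run that is still open at the end of the trajectory only gives a lower bound on the true lifetime, which is one reason why such maxima depend on the length of the simulation.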


5.14 Modelling of Large Nucleic Acid Molecules

For nucleic acids, starting with the first model of B-DNA [46], modelling has been used extensively, even before detailed X-ray crystallographic data on fragments were abundantly available [47]. Specifically, for RNA structures, the modelling of the anticodon-codon interaction [48] and of a full tRNA [49] both constitute remarkable achievements.

The structure proposed by Fuller and Hodgson [48] for the anticodon loop was basically correct. The main assumption was that the number of stacked bases in the anticodon loop is a maximum. Although the first two residues of the loop (positions 32 and 33) were wrongly exposed toward the solvent, the structure provided most of the stereochemical explanations for Crick's "wobble" hypothesis [50] and for the role of the modified purine at position 38. Levitt's model [49] was the only "topologically" correct model. Its main features were: AA- and T-stems co-axial; D- and AC-stems co-axial; D- and T-loop interactions (especially G19-C56); U8-A14 with Hoogsteen pairing in trans; the R15-Y48 Levitt pair with Watson-Crick pairing, but incorrectly in cis instead of trans; the AC-loop à la "Fuller-Hodgson".
The main errors were in the conformation of the T-loop and in the use of five bases in the syn conformation.

Following the tremendous success of X-ray crystallography in our understanding of tRNA structure, the modelling of tRNA structures was neglected. In the early 1980s, a revival of interest started following the accumulation of chemical, biochemical, and phylogenetic data on RNA molecules. Indeed, for RNA molecules, a variety of biochemical and chemical probes have been developed that allow one to explore the accessibility of most of the atoms involved in maintaining the secondary and tertiary structure of nucleic acids [51]. Also, one could exploit systematically the enormous amount of information contained in biological sequences from various organisms. Again, the basic assumption is that partly divergent, but nevertheless functionally and historically related, RNA molecules from various biological origins fold into similar tertiary structures.
The comparison and alignment of such sequences give information about the position and nature of invariant residues, which are considered important either for maintaining the 3D fold or for the function, and about those regions presenting insertions or deletions. Besides leading to consensus secondary structures, comparative sequence analysis can be employed to identify tertiary contacts, as pioneered for tRNA by Levitt [49]. Clearly, the method of comparative sequence analysis is restricted to molecules with either structural or functional roles conserved across the phylogeny (e.g. tRNA, 5S rRNA, ...).

It should be understood that the use and display of atomic models for large RNA molecules does not imply that the constructions are valid at atomic resolution in the crystallographic sense. Being meticulous about distances, angles, contacts, etc. ensures at least that the structural model is precise. The accuracy of a model can only be assessed either by X-ray crystallography or a posteriori by the incentives and new ideas it gave rise to, since energetic and stereochemical considerations alone cannot be taken as proof. Whatever the approach, modelling has to be complemented by extensive directed mutagenesis, first for testing its structural validity and, if a functional test is available, for appraising the structure-function relationships it suggests [52].

Recently, Rippe et al. [53] presented experimental evidence for the formation of a parallel-stranded double helix for the alternating sequence d(A, G). By gel electrophoresis, UV absorption, and vacuum UV circular dichroism, they demonstrated the formation of a double helix with a parallel orientation for such an alternating sequence. This experimental study was suggested by a model built on the basis of molecular mechanics and dynamics calculations [54]. Minimization results displayed a pronounced dependence on the electrostatic parameters and led to structures with several internal H-bonds between amino groups and anionic phosphate oxygens. The only ones conserved after a 50 ps molecular dynamics trajectory correspond to the guanine intrastrand H-bonds HN2 (G) ... OP (G), also present in the initial model.
As discussed above, we tend to think that several (if not all) of the intra-residue (for G) or inter-residue (for A) H-bonds between amino groups and anionic phosphate oxygens are artefacts of the force field and that, in case such interactions do exist, they are mediated instead by water bridges. This study illustrates how difficult it is to treat properly localized and bridging water molecules involving at least one charged group during minimization and molecular dynamics simulations when using an implicit treatment of the solvent. It is not yet clear whether the problem will be solved by using an explicit representation of the solvent.

5.15 Conclusions

Even with the simplified representations presently used in molecular mechanics and dynamics simulations, detailed treatments at the atomic level are, for the time being, ruled out for handling the global folding of large nucleic acid molecules (above 100 nucleotides). As discussed above, in programs based on force field calculations, the main problems reside in the proper handling of the electrostatics and the solvation of nucleic acids.
Indeed, in all forms of nucleic acids, water molecules should be considered as an integral part of the structure, since intra- and inter-residue water bridges fulfill the hydrogen bonding capacity of the polar atoms, forming strings, spines, or filaments in which water molecules have enough reorientational mobility for additional screening of the phosphate charges [16]. However, molecular dynamics simulations should greatly improve our insight into and understanding of the interactions and dynamics of water molecules around nucleic acids.

The present intractability of modelling such an overwhelming number of mutually coupled interactions led us to favour interactive graphics techniques for modelling large RNA molecules, despite the unavoidable heavy reliance on human judgment for selecting local conformations leading to global compactness. With modelling conceived as a tool, one might as well take advantage of the capability of the human mind to think globally and act locally. Energy minimization and restrained least-squares are local techniques which perform only within a small radius of convergence. Although a more global technique, distance geometry, was recently introduced in RNA modelling [55], it leads to improbable tangled or knotted structures that the algorithm cannot remove from the set of solutions. Malhotra et al. [56] have proposed an automatic RNA folding procedure in which a nucleotide is represented as a pseudoatom located at the phosphate atom. Such an approach neglects all the finely grained interactions between helices or between loops and helices which govern and stabilize the three-dimensional folding [57]. A technique based on a constraint satisfaction algorithm has been put forward [58].
For small systems, this approach could be useful, although the software and CPU requirements are heavy (10 hours of CPU time on a sophisticated workstation for folding a T-loop). The drawbacks of building RNA models manually on graphics systems are real: the process is laborious and can be subjective, since it depends on the judgment of the modeller, which is itself based on the modeller's knowledge of RNA structures. However, up to now, it is still the most successful method for large RNAs (see for example the models of 5S rRNA [59, 60] and the model for the core of group I introns [61-63]), especially when modelling is viewed as producing three-dimensional hypotheses destined to be confronted with experimental verification via directed mutagenesis and chemical or enzymatic probing.

References

[1] Konnert, J. H., Hendrickson, W. A., Acta Crystallogr. Sect. A 1980, 36, 344-349.
[2] Northrup, S. H., Pear, M. R., McCammon, J. A., Karplus, M., Takano, T., Nature 1980, 287, 659-660.
[3] Levy, R. M., Sheridan, R. P., Keepers, J. W., Dubey, G. S., Swaminathan, S., Karplus, M., Biophys. J. 1985, 48, 509-518.
[4] Brooks, C. L., Karplus, M., Pettitt, B. M., Adv. Chem. Phys. 1988, LXXI.
[5] Otting, G., Liepinsh, E., Wüthrich, K., Science 1991, 254, 974-980.
[6] Frey, M., in: Water and Biological Macromolecules, Westhof, E. (ed.), Macmillan Press, London 1993, pp. 98-147.
[7] Case, D. A., Karplus, M., J. Mol. Biol. 1979, 132, 343-368.


[8] Westhof, E., Moras, D., in: Structure, Dynamics and Function of Biomolecules, Ehrenberg, A., Rigler, R., Grasslund, A., Nillson, L. (eds.), Springer-Verlag, Berlin 1987, pp. 208-211.
[9] Warshel, A., Russell, S. T., Q. Rev. Biophys. 1984, 17, 283-422.
[10] Westhof, E., Beveridge, D. L., Water Sci. Rev. 1990, 5, 24-135.
[11] Beveridge, D. L., Swaminathan, S., Ravishanker, G., Withka, J. M., Srinivasan, J., Prevost, C., Louise-May, S., Langley, D. R., DiCapua, F. M., Bolton, P. H., in: Water and Biological Macromolecules, Westhof, E. (ed.), Macmillan Press, London, 1993, pp. 165-225.
[12] Clementi, E., Corongiu, G., in: Biomolecular Stereodynamics, Sarma, R. H. (ed.), Adenine Press, New York, 1981, vol. 1, pp. 209-259.
[13] Drew, H. R., Dickerson, R. E., J. Mol. Biol. 1981, 151, 535-556.
[14] Saenger, W., Annu. Rev. Biophys. Biophys. Chem. 1987, 16, 93-114.
[15] Westhof, E., Int. J. Biol. Macromol. 1987, 9, 186-192.
[16] Westhof, E., Annu. Rev. Biophys. Biophys. Chem. 1988, 17, 125-141.
[17] Weiner, S. J., Kollman, P. A., Nguyen, D. T., Case, D. A., J. Comput. Chem. 1986, 7, 230-252.
[18] Rogers, N. K., Prog. Biophys. Mol. Biol. 1986, 48, 37-66.
[19] Mehler, E. L., Eichele, G., Biochemistry 1984, 23, 3887-3891.
[20] Lavery, R., Sklenar, H., Zakrzewska, K., Pullman, B., J. Biomol. Struct. Dyn. 1986, 3, 989-1014.
[21] Hingerty, B. E., Ritchie, R. H., Ferrell, T. L., Turner, J. E., Biopolymers 1985, 24, 427-439.
[22] MacQuarrie, D., Statistical Mechanics, Harper and Row, New York, 1976.
[23] Fenley, M. O., Manning, G. S., Olson, W. K., Biopolymers 1990, 30, 1191-1203.
[24] Toulouse, M., Fritsch, V., Westhof, E., Mol. Simul. 1992, 9, 193-200.
[25] Fritsch, V., Ravishanker, G., Beveridge, D. L., Westhof, E., Biopolymers 1992, 33, 1337-1552.
[26] Chandler, D., Introduction to Modern Statistical Mechanics, Oxford University Press, New York 1987.
[27] Allen, M. P., Tildesley, D. J., Computer Simulation of Liquids, Clarendon Press, Oxford 1987.
[28] Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A., Haak, J. R., J. Chem. Phys. 1984, 81, 3684-3690.
[29] Fincham, D., Heyes, D. M., Adv. Chem. Phys. 1985, 63, 493-575.
[30] Frey, C. M., Stuehr, J., in: Metal Ions in Biological Systems, Sigel, H. (ed.), Marcel Dekker, New York, 1974, pp. 51-116.
[31] Porschke, D., Nucl. Acids Res. 1979, 6, 883-898.
[32] Singh, U. C., Brown, F. K., Bash, P. A., Kollman, P., J. Am. Chem. Soc. 1987, 109, 1607-1614.
[33] Brahms, S., Fritsch, V., Brahms, J. G., Westhof, E., J. Mol. Biol. 1992, 223, 455-476.
[34] Fritsch, V., Westhof, E., J. Am. Chem. Soc. 1991, 113, 8271-8277.
[35] Edwards, K. J., Brown, D. G., Spink, N., Skelly, J. V., Neidle, S., J. Mol. Biol. 1992, 226, 1161-1173.
[36] Ansevin, A. T., Wang, A. H., Nucl. Acids Res. 1990, 18, 6119-6126.
[37] Wang, A. H.-J., Quigley, G. J., Kolpak, F. J., Crawford, J. L., van Boom, J. H., van der Marel, G. A., Rich, A., Nature 1979, 282, 680-686.
[38] Chevrier, B., Dock, A. C., Hartmann, B., Leng, M., Moras, D., Thuong, M. T., Westhof, E., J. Mol. Biol. 1986, 188, 707-719.
[39] Eriksson, M. A. L., Laaksonen, A., Biopolymers 1992, 32, 1035-1059.


5 Modell<strong>in</strong>n Nucleic Acids 131[40] Geiger, A., Mausbach, P., Schnitker, J., Blumberg, R. L., Stanley, H. E., J. Phys. Colloq.C7 1984, 45, 13-30.[41] Rapaport, D. C., Mol. Phys. 1983, 50, 1151-1162.[42] Chupr<strong>in</strong>a, V. P., Nucl. Acids Res. 1987, 15, 293-311.[43] Ghupr<strong>in</strong>a, V. P., He<strong>in</strong>emann, U., Nurislamov, A. A., Zielenkiewicz, P., Dickerson,R. E., Proc. Natl. Sci. USA 1991, 88, 593-597.[44] Teplukh<strong>in</strong>, A. V., Poltev, V. I., Chupr<strong>in</strong>a, V. P., Biopolymers 1992, 32, 1445-1453.[45] Westhof, E., Dumas, P., Moras, D., Biochimie 1988, 70, 145-165.[46] Watson, J. D., Crick, F. H. C., Nature 1953, 171, 737-738.[47 Sundaral<strong>in</strong>gam, M., <strong>in</strong>: Conformation of Biological Molecules and Polymers,Bergmann, E. D., Pullman, B. (eds.), Israel Academy of Sciences, Jerusalem 1973,pp. 417-456.[48] Fuller, W., Hodgson, A., Nature 1967, 215, 817-821.[49] Levitt, M., Nature 1969, 224, 759-763.[SO] Crick, F. H. C., J. Mol. Biol. 1966, 19, 548-555.[51] Ehresmann, C., Baud<strong>in</strong>, F., Mougel, M., Romby, P., Ebel, J. P., Ehresmann, B., Nucl.Acids Res. 1987, 15, 9109-9116.[52] Westhof, E., Romby, P., Ehresmann, C., Ehresmann, B., <strong>in</strong>: Theoretical Biochemistryand Molecular Biophysics, Beveridge, D. L., Lavery, R. (eds.), Aden<strong>in</strong>e, New York, 1990,pp. 399-409.[53] Rippe, K., Fritsch, V., Westhof, E., Jov<strong>in</strong>, T. M., EMBO J. 1992, 11, 3777-3786.[54] Fritsch, V., Westhof, E., submitted for publication.[55] Hubbard, J. M., Hearst, J. E., Biochemistry 1991, 34 5458-5465.[56] Malhotra, A., Tan, R. K. Z., Harvey, S. C., Proc. Natl. Acad. Sci. USA 1990, 87,1950- 1954.[57] Westhof, E., Michel, F., <strong>in</strong>: Structural Tools for the Analysis of Prote<strong>in</strong>-Nucleic AcidComplexes, Lilley, D. M. J., Heumann, H., Suck, D. (eds.), Birkhauser Verlag, Basel,1992, pp. 
255-267.[58] Major, F., Turcotte, M., Gautheret, D., Lapalme, G., Fillion, E., Cedergren, R., Science1991, 253, 1255- 1260.[59] Westhof, E., Romby, P., Romaniuk, P. J., Ebel, J. P., Ehresmann, C., Ehresmann, B.,1 Mol. Biol. 1989, 207, 417-431.[60] Brunel, C., Romby, P., Westhof, E., Ehresmann, C., Ehresmann, B., J. Mol. Biol. 1991,221, 293-308.[61] Michel, F., Westhof, E., J. Mol. Biol. 1990, 216, 585-610.[62] Michel, F., Jaeger, L., Westhof, E., Kuras, R., Tihy, F., Xu, M. Q., Shub, D., Genes Dev.1992, 6, 1373-1385.[63] Jaeger, L., Westhof, E., Michel, F., J. Mol. Biol. 1992, 221, 1153-1164.[64] Fritsch, V., Dissertation, Universite Louis Pasteur, Strasbourg, 1992.[65] Westhof, E., Sundaral<strong>in</strong>gam, M., Biochemistry 1986, 25, 4868-4878.[66] Westhof, E., Rao, S. T., Sundaraligam, M., J Mol. Biol. 1980, 142, 331-361.[67] Rub<strong>in</strong>-Carrez, C., Dissertation, Universite Louis Pasteur, Strasbourg, 1992.


Computer Modelling in Molecular Biology
Edited by Julia M. Goodfellow
© VCH Verlagsgesellschaft mbH, 1995

6 Theory of Transport in Ion Channels
From Molecular Dynamics Simulations to Experiments

Benoît Roux

Groupe de Recherche en Transport Membranaire
Département de Physique, Université de Montréal
C.P. 6128, succ. A, Canada H3C 3J7

Contents

6.1 Introduction
6.2 Traditional Phenomenological Descriptions
6.3 The Gramicidin A Channel: A Model System for Molecular Dynamics
6.3.1 The Potential Energy Function
6.3.1.1 Ab Initio Calculations
6.3.1.2 Functional Form and Parametrization
6.3.1.3 Limitations of the Potential Energy Function
6.3.2 The Water-Filled Channel
6.3.2.1 Building the Initial System
6.3.2.2 Analysis of the Trajectory
6.3.2.3 Rotational Mobility of Internal Water Molecules
6.3.2.4 Rare Events: Ethanolamine Tail Isomerization
6.3.3 Calculation of the Potential of Mean Force: Free Energy Simulation
6.3.3.1 Application and Techniques
6.3.3.2 Analysis of the Results
6.3.4 Calculation of a Transition Rate: Activated Dynamics Technique
6.3.4.1 Application and Techniques
6.3.4.2 Analysis of the Results
6.3.4.3 Comparison with Experiments
6.3.5 Relation Between NP and ERT
6.3.5.1 Analysis of the Results
6.4 Conclusions
References


6.1 Introduction

The movement of ions across biological membranes is one of the most fundamental processes occurring in living cells [1]. Without specific macromolecular structures, “ion channels”, the lipid membrane would present a prohibitively high energy barrier to the passage of any ion [2]. Selectivity for specific ions and a remarkably high rate of transport are key features of biological ion channels. For example, an average of one hundred million K+ ions per second, selected over Na+ ions by a factor of one hundred to one, can cross a frog node delayed rectifier K channel under physiological conditions [1]. Considerable efforts are now dedicated to characterizing ion permeation in molecular terms and to identifying structural motifs that carry out specific functions. Advances due to the patch clamp technique [3] have permitted the detection, at least on the millisecond time-scale, of the unitary events governing the permeation of ions, such as the opening and the closing of a single membrane channel. Primary amino acid sequences have been deduced for several biological channel molecules, and site-directed mutagenesis is used to identify the key residues involved in the function of biological channels [4]. Recently, dramatic examples showed that mutation of a single residue can alter the Na+ channel to a Ca2+-permeable channel [5], and that a substitution of three amino acids is able to convert a cation-selective channel into an anion-selective channel [6].

The relation of structure to function in ion channels is of central concern for physiologists.
Results from biochemical dissection, site-directed mutagenesis, chemical modifications and ion-flux measurements are being used, in combination with structure prediction algorithms and molecular mechanics calculations, to determine the overall topology and the three-dimensional structure of important biological channels [4, 7-11]. Nevertheless, the relative intractability of biological membrane proteins still poses severe problems. Experimentally determined structures with atomic resolution are available only for a few membrane proteins: the photosynthetic reaction center [12], bacteriorhodopsin [13], a porin from Rhodobacter capsulatus [14], and the OmpF and PhoE porins of the bacterium E. coli [15]; there is also some information about the general macromolecular shape of the acetylcholine receptor [16] and the fast Na+ channel [17] from high resolution electron microscopy. Membrane proteins are difficult to characterize structurally, primarily because the requirement for maintaining a membrane environment hinders purification and crystallization and complicates spectroscopic measurements. The task is further complicated by the fact that biological ion channels are particularly complex multisubunit membrane proteins. This is the reason why much of the progress in understanding the relation of structure to function in ion channels has been gained by studying small artificial pore-forming molecules such as gramicidin A (Figure 6-1).
This small pentadecapeptide forms a membrane channel that appears to be ideally selective for small univalent cations, while it is blocked by divalent cations and impermeable to anions [18-20]. The gramicidin channel exhibits functional behavior similar to far more complex macromolecular biological structures and, for this reason, has proved to be an extremely useful model system to study the principles governing ion transport across lipid membranes. It has been the object of numerous experimental [21-27] as well as theoretical investigations [28-40], and a great wealth of information is known about its membrane-bound ion-conducting conformation [41-50]. Gramicidin is at this moment the best characterized molecular pore [51].

Figure 6-1. The gramicidin A molecule is a linear antibiotic pentadecapeptide produced by Bacillus brevis consisting of alternating L- and D-amino acids: HCO-L-Val1-Gly2-L-Ala3-D-Leu4-L-Ala5-D-Val6-L-Val7-D-Val8-L-Trp9-D-Leu10-L-Trp11-D-Leu12-L-Trp13-D-Leu14-L-Trp15-NHCH2CH2OH. The ion-conducting channel is an N-terminal to N-terminal (head-to-head) dimer formed by two single-stranded β6.3-helices [51]. A stereo picture of the energy-minimized left-handed dimer is shown in the figure (see also Section 6.3.2.1). The dimer channel is stabilized by the formation of 20 intra-monomer and 6 inter-monomer -NH···O- backbone hydrogen bonds to form a pore about 2.6 nm long and 0.4 nm in diameter. The hydrogen-bonded carbonyls line the pore and the amino acid side chains, most of them hydrophobic, extend away into the membrane lipid.

Comprehension of how any ion channel works in terms of its underlying atomic structure, even the one formed by the relatively simple gramicidin molecule, represents an outstanding challenge. Research on ion channels has now reached the point where the design and interpretation of experiments depend upon the availability of theoretical calculations to help formulate and develop a detailed, realistic microscopic picture of ion permeation. Despite their usefulness, traditional phenomenological descriptions of ion permeation based on Eyring Rate Theory [52] or Nernst-Planck diffusion [53] cannot fulfill this purpose. For example, attempts to explain the observed effects of amino acid substitution on current-voltage measurements often need to rely on atomic models of the channel structure [8-10, 54, 55]. Molecular dynamics simulation is a powerful theoretical approach to investigate the function of complex macromolecular structures [56].
It consists in calculating the positions of the atoms as a function of time, using detailed models of the microscopic forces operating between them, by numerically integrating the classical equations of motion. Although molecular dynamics is used to study biological systems of increasing complexity [56], theoretical investigations of ion transport are faced with particularly difficult and serious problems. A first problem arises from the magnitude of the interactions involved. The large hydration energies of ions, around -400 kJ/mol for Na+, contrast with the activation energies deduced from experimentally observed ion fluxes, which generally do not exceed $10\,k_\mathrm{B}T$ [1]. This implies that the energetics of ion transport results from a delicate balance of very large interactions. Therefore, special care must be taken to construct an accurate and realistic potential energy function to be used in the calculations. A second problem arises from the time-scales involved. The passage of one ion across a channel takes place on a microsecond time-scale, and realistic simulations of biological systems, which typically do not exceed a few nanoseconds, are too short. Although the most exact and realistic information is provided by straight molecular dynamics trajectories, simple “brute force” simulations cannot account for the time-scales of ion permeation. A variety of special computational approaches called “biased sampling” techniques are necessary to extract information about these slower and more complex processes.
A last difficulty is the translation of the results obtained from a microscopic model into macroscopic observables such as channel conductance and current-voltage relations. Here, the traditional phenomenologies play an important role. As in the fundamental formulation of non-equilibrium statistical mechanical theories of transport coefficients [57], the purpose of the phenomenology is to serve as a bridge between the microscopic model and the macroscopic observables. In the case of ion channels, Eyring Rate Theory [52] and Nernst-Planck diffusion [53] provide an effective conceptual framework to make full use of the information gathered from the molecular dynamics simulations.

The goal of this chapter is to provide an introduction to the modern molecular dynamics simulation techniques that are particularly useful in theoretical studies of ion channels. The chapter is divided into four sections. The phenomenological theories traditionally used to describe experimental data are briefly introduced in Section 6.2. The general methodology applied to the gramicidin channel is explained in Section 6.3. The chapter is concluded with an outlook on future applications in Section 6.4.

6.2 Traditional Phenomenological Descriptions

Traditional approaches, such as Eyring Rate Theory (ERT) [52] or the Nernst-Planck (NP) continuum diffusion equation [53], are useful phenomenological tools to account for the experimentally observed current-voltage relation, $I[\Delta\psi]$ [1]. Both approaches describe the movements of ions across membrane channels as chaotic random displacements driven by an electrochemical free energy potential, $W_{\mathrm{tot}}(x)$. ERT describes the movements of ions as a sequence of sudden stochastic “hopping events” across barriers separating energetically favorable discrete wells [52]; in contrast, the one-dimensional NP equation describes the movements of ions along the axis of the channel as a random continuous diffusion process [53].
ERT models are expressed in terms of a set of equations relating the net stationary flux, $J$, and the occupation probability of the i-th and (i+1)-th sites, $P_i$ and $P_{i+1}$,

$$J = k_i P_i - k'_{i+1} P_{i+1},$$

where $k_i$ and $k'_{i+1}$ are forward and backward jump rates, respectively [52]. It is generally assumed that the rates have an Arrhenius-like form with a voltage-independent dynamical pre-exponential frequency factor, $F_p$ [52, 58],

$$k_i = F_p \exp\left\{ -\left[ W_{\mathrm{tot}}(x_i^{\ddagger}) - W_{\mathrm{tot}}(x_i) \right] / k_\mathrm{B}T \right\},$$

where $W_{\mathrm{tot}}(x_i^{\ddagger})$ and $W_{\mathrm{tot}}(x_i)$ are the electrochemical free energy at the i-th barrier and well, respectively. Similarly, the NP equation relates the net stationary flux, $J$, to the probability density per unit length, $P(x)$,

$$J = -D(x) \left[ \frac{dP(x)}{dx} + \frac{1}{k_\mathrm{B}T} \frac{dW_{\mathrm{tot}}(x)}{dx} P(x) \right],$$

where $D(x)$ is the local diffusion coefficient.

The net electrical current under a transmembrane voltage difference, $\Delta\psi$, is given by the stationary flux of ions times the electric charge carried by them,

$$I = qJ.$$

For a small voltage difference, the net current obeys a linear relationship following Ohm's law,

$$I = \Lambda \, \Delta\psi,$$

where $\Lambda$ is the conductance of the channel. To describe the effects of a voltage difference across the membrane it is convenient to separate the electrochemical free energy potential, $W_{\mathrm{tot}}(x)$, into a first contribution, $W(x)$, arising from the interactions of the permeating ion with the nearby channel and water in the absence of a transmembrane voltage, and a second contribution, $W_{\mathrm{elec}}(x)$, corresponding to the free energy associated with the presence of the electrostatic potential difference across the membrane. In principle, $W(x)$, called the “free energy profile”, can be calculated from molecular dynamics simulations of an atomic model using the techniques described in the next sections. An exact treatment of $W_{\mathrm{elec}}(x)$ is more difficult because the transmembrane voltage results from long-range electrostatic interactions involving a very small imbalance of net charge among the mobile ions in the bulk solutions on each side of the membrane [59]. For simplicity, it is usually assumed that the mobile ions are uniformly distributed at the bulk-membrane interface, giving rise to a constant electric field inside the channel.
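The Arrhenius-like rate expression and the constant-field assumption can be illustrated numerically. The following Python sketch is not taken from the chapter: the function names, the well and barrier energies, the prefactor and the temperature are hypothetical, and the voltage contribution is assumed to be q Δψ x / L with any dipole term neglected. It only shows how a transmembrane voltage tilts the forward and backward hopping rates over a single barrier.

```python
import math

KB_T = 2.494      # kJ/mol, thermal energy near 300 K (assumed temperature)
FARADAY = 96.485  # kJ/(mol V), converts elementary charge times volts to kJ/mol


def w_elec(x, voltage, length, charge=1.0):
    """Voltage contribution under a constant field: linear in x (assumption)."""
    return charge * FARADAY * voltage * x / length


def arrhenius_rate(w_well, w_barrier, prefactor):
    """k = F_p * exp(-(W_barrier - W_well) / kBT)."""
    return prefactor * math.exp(-(w_barrier - w_well) / KB_T)


def hop_rates(wells, barrier, voltage, length, prefactor=1.0e12):
    """Forward and backward rates between two wells separated by one barrier.

    wells   : ((x_left, W_left), (x_right, W_right)), x in nm, W in kJ/mol
    barrier : (x_top, W_top)
    """
    (x_l, w_l), (x_r, w_r) = wells
    x_t, w_t = barrier
    # Total free energy = zero-voltage profile + voltage contribution.
    tot_l = w_l + w_elec(x_l, voltage, length)
    tot_r = w_r + w_elec(x_r, voltage, length)
    tot_t = w_t + w_elec(x_t, voltage, length)
    k_forward = arrhenius_rate(tot_l, tot_t, prefactor)
    k_backward = arrhenius_rate(tot_r, tot_t, prefactor)
    return k_forward, k_backward


if __name__ == "__main__":
    wells = ((0.5, 0.0), (1.5, 0.0))  # hypothetical symmetric wells
    barrier = (1.0, 15.0)             # hypothetical 15 kJ/mol barrier
    for dv in (0.0, -0.1):            # transmembrane voltage in volts
        kf, kb = hop_rates(wells, barrier, dv, length=2.6)
        print(f"voltage={dv:+.2f} V  k_forward={kf:.3e}  k_backward={kb:.3e}")
```

In this sketch the ratio of the two rates depends only on the total free energy difference between the wells, as detailed balance requires, while the barrier height and the prefactor set the absolute time scale.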
With this approximation, the total free energy associated with the transmembrane electric field when the ion is at position $x$ is [38],

$$W_{\mathrm{elec}}(x) = \frac{\Delta\psi}{L} \left[ q x + \langle \mu(x) \rangle \right],$$

where $\Delta\psi$ is the voltage difference, $L$ is the length of the channel and $\langle \mu(x) \rangle$ is the average dipole of the channel and its solvent content when the ion is at position $x$. The contribution of $\langle \mu(x) \rangle$ is generally neglected [1, 38].

The channel conductance, $\Lambda$, can be calculated from both phenomenological theories once the boundary conditions are specified; these are generally based on the assumption that the extremities of the pore are in equilibrium with the solution with which they are in contact [52, 53]. One particular kind of boundary condition can be constructed such that no more than one ion can occupy the pore. The conductance obtained from such models, called “one-ion pore” [1], exhibits simple first-order saturation as a function of the permeant ion concentration. With the same concentration, $[C]$, on both sides of the membrane, the concentration-dependent conductance, $\Lambda([C])$, is

$$\Lambda([C]) = \Lambda_{\max} \frac{K_b [C]}{1 + K_b [C]},$$

where $K_b$ is the equilibrium binding constant to the pore. This Michaelis-Menten form is obtained from both the NP and the ERT approaches and follows from the one-ion pore assumption. The maximum conductance, $\Lambda_{\max}$, reached at saturating concentrations can be written as

$$\Lambda_{\max} = \frac{q^2}{k_\mathrm{B}T} \begin{cases} k_{\mathrm{eff}} / [(n+1)\,n] & \text{ERT} \\ D_{\mathrm{eff}} / L^2 & \text{NP} \end{cases} \tag{6-9}$$

In Eq. (6-9), $k_{\mathrm{eff}}$ is an effective hopping rate constant, defined in terms of the weighted averages over all the discrete sites of the pore,

$$k_{\mathrm{eff}} = \left[ \left( \frac{1}{n} \sum_{i=1}^{n} e^{-W(x_i)/k_\mathrm{B}T} \right) \left( \frac{1}{n+1} \sum_{i=0}^{n} k_i^{-1} e^{+W(x_i)/k_\mathrm{B}T} \right) \right]^{-1}, \tag{6-10}$$

obtained by assuming that sites 1 to $n$ are inside the pore and sites 0 and $n+1$ belong to the aqueous phases [52]; $D_{\mathrm{eff}}$ is an effective diffusion constant of the ion inside the pore, defined in terms of weighted spatial averages over the full length of the pore,

$$D_{\mathrm{eff}} = \left[ \left\langle e^{-W(x)/k_\mathrm{B}T} \right\rangle \left\langle D(x)^{-1} e^{+W(x)/k_\mathrm{B}T} \right\rangle \right]^{-1}, \qquad \langle f \rangle = \frac{1}{L} \int_0^L f(x)\, dx. \tag{6-11}$$

These expressions, which may seem unwieldy and difficult to understand, deserve a few explanations. First, the expression for $\Lambda_{\max}$ from ERT is considerably simpler in the special case where the transition rates are all equal, since $k_{\mathrm{eff}}$ then reduces to $k$. Moreover, despite their very different appearances, the expressions for $\Lambda_{\max}$ from ERT and NP are closely related. For example, a $D_{\mathrm{eff}}$ corresponding to $kL^2/n^2$ is obtained if the diffusion takes place as a sequence of random jumps occurring at a rate, $k$, between a large number, $n$, of identical sites separated by $L/n$ (see also Section 6.3.5). Lastly, a simple physical interpretation of $\Lambda_{\max}$ from NP can be given.
It is the conductance of a cylinder of length, $L$, and area, $S$, containing a “one-ion” conducting solution of concentration, $[C^*] = 1/LS$; from Ohm's law, the conductance of the cylinder is equal to $S\sigma/L$, where $\sigma$, equal to $[C^*] D_{\mathrm{eff}} q^2 / k_\mathrm{B}T$, is the conductivity of the solution.

The free energy profile, $W(x)$, the diffusion constant, $D(x)$, and the jump rates, $k_i$, are all microscopic quantities that are needed as inputs in the phenomenologies. Given an estimate of these microscopic quantities, the phenomenological theories provide a complete description of the physiological properties of a channel for different ions, including the permeabilities, the rate of transport and the response to an applied electrical potential. The free energy profile, $W(x)$, is particularly important since it plays a fundamental role in both the ERT and NP approaches. Special simulation techniques to calculate $W(x)$, $D(x)$ and $k_i$ from detailed microscopic models are described in the next sections.

6.3 The Gramicidin A Channel: A Model System for Molecular Dynamics

6.3.1 The Potential Energy Function

In a molecular simulation the classical equations of motion, i.e.,

$$m_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\nabla_i U(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N), \tag{6-12}$$

are integrated numerically with small discrete time-steps to obtain the positions and velocities of the particles in the system as a function of time. Here, $m_i$ and $\mathbf{r}_i$ are the mass and position of particle $i$, and $U(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$ represents the potential energy function that depends on the positions of the $N$ particles in the system. For detailed molecular dynamics studies to be meaningful it is essential to use an accurate potential energy function that is appropriate for the system of interest.
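To make the idea of integrating Eq. (6-12) with small discrete time-steps concrete, the following Python sketch applies the velocity Verlet scheme to a one-dimensional harmonic oscillator. The choice of velocity Verlet, the harmonic potential, and all names and parameter values are assumptions made for illustration; the chapter does not state here which integrator was used.

```python
def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Integrate m_i * d2r_i/dt2 = F_i(r) with the velocity Verlet scheme."""
    x = list(positions)
    v = list(velocities)
    f = force_fn(x)
    for _ in range(n_steps):
        # Half-step velocity update with the current forces.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        # Full-step position update.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute forces at the new positions, then finish the velocity step.
        f = force_fn(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
    return x, v


def harmonic_force(x, k=1.0):
    """Force -dU/dx for the toy potential U = (k/2) * sum(x_i^2)."""
    return [-k * xi for xi in x]


def total_energy(x, v, masses, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    kinetic = sum(0.5 * m * vi * vi for m, vi in zip(masses, v))
    potential = sum(0.5 * k * xi * xi for xi in x)
    return kinetic + potential


if __name__ == "__main__":
    masses = [1.0]
    x, v = velocity_verlet([1.0], [0.0], masses, harmonic_force,
                           dt=0.001, n_steps=6283)  # about one period
    print("position:", x[0], "energy:", total_energy(x, v, masses))
```

For a real macromolecular system `force_fn` would evaluate the gradient of the full potential energy function; the toy oscillator is used only because its exact solution makes the accuracy of the time-stepping easy to check.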
In the long and narrow gramicidin channel the permeation process involves the translocation of ion and water in single file through the interior of the pore. The cation-channel interactions are dominated by the carbonyl oxygens of the peptide backbone (see Figure 6-1); the side chains extend away into the membrane lipid and their energy contributions amount to secondary effects [34]. Hydrogen bonds are of central importance for the gramicidin channel. The structure of the β6.3-helix is stabilized by -NH···O- backbone hydrogen bonds; water molecules, present inside the channel, can also form hydrogen bonds. Coordination of Na+ by carbonyl and water oxygens is possible at the expense of breaking hydrogen bonds. Thus, in the present system the relative strengths of the strong ion-carbonyl and ion-water interactions must be preserved and well balanced with respect to the relatively weaker but complex water-water, water-peptide and peptide-peptide hydrogen bond interactions.

6.3.1.1 Ab Initio Calculations

There is no unique method to construct the potential energy function [56, 60-64]. The approach chosen in the present study relies primarily on experimental data, supplemented by high-level ab initio calculations with small model systems when the necessary information is not available [65]. Since no experimental estimate of the interaction of Na+ with the carbonyl group of the peptide backbone is available at the present time, it is necessary to rely on ab initio calculations on small model systems to supply this important information. The N-methylacetamide molecule (NMA) was chosen to model the peptide backbone amide plane groups. The interaction of Na+ with water, for which a fair amount of experimental gas phase as well as bulk liquid information is available, was calculated to test the accuracy of the approach. The results of the ab initio calculations involving the Na+ ion, water and the NMA molecule are given in Table 6-1. The calculated interaction between Na+ and water is in very good agreement with the experimental gas phase data [66]. This indicates that the approach used to calculate the Na+-NMA interaction is valid and can be used to develop the empirical energy function.
For comparison, results on hydrogen bonding interactions involving water and NMA are also given in Table 6-1 [67].

6.3.1.2 Functional Form and Parametrization

It is desirable to keep the empirical potential function as simple as possible for reasons of computational efficiency. The functional form of the empirical force field that was adopted is similar to others used in molecular dynamics of biological macromolecules [56]; in addition to internal energy functions (bonds, angles and dihedrals), the non-bonded interactions are represented as a sum of radially symmetric, pair-decomposable site-site functions including Coulomb electrostatics between partial charges, core repulsion, van der Waals dispersion and induced polarization,

$$U_{\mathrm{nonbonded}} = \sum_{\mathrm{pairs}} \left( E_{\mathrm{elec}} + E_{\mathrm{core}} + E_{\mathrm{vdW}} + E_{\mathrm{pol}} \right). \tag{6-13}$$
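The pairwise sum of Eq. (6-13) can be sketched in a few lines of Python. The individual functional forms below (Coulomb, an r^-12 core repulsion, an r^-6 dispersion term and an r^-4 charge-induced-dipole term) are common textbook choices assumed here for illustration; the chapter does not spell them out at this point, and the charges, polarizabilities and coefficients are hypothetical values, not the published parameters.

```python
import itertools
import math

COULOMB = 138.935  # kJ mol^-1 nm e^-2, electrostatic conversion factor


def pair_terms(r, q_i, q_j, alpha_i, alpha_j, core_a, disp_c):
    """Four assumed contributions for one pair of sites at separation r (nm).

    Charges are in units of e; polarizabilities are volumes in nm^3.
    """
    return {
        "elec": COULOMB * q_i * q_j / r,
        "core": core_a / r**12,
        "vdw": -disp_c / r**6,
        # First-order charge-induced dipole: each charge polarizes the other site.
        "pol": -0.5 * COULOMB * (alpha_j * q_i**2 + alpha_i * q_j**2) / r**4,
    }


def nonbonded_energy(sites, core_a, disp_c):
    """Sum all pair terms; each site is (position, charge, polarizability)."""
    total = 0.0
    for (p_i, q_i, a_i), (p_j, q_j, a_j) in itertools.combinations(sites, 2):
        r = math.dist(p_i, p_j)
        total += sum(pair_terms(r, q_i, q_j, a_i, a_j, core_a, disp_c).values())
    return total


if __name__ == "__main__":
    # Hypothetical ion next to a carbonyl-like O/C pair (positions in nm).
    sites = [
        ((0.00, 0.0, 0.0), +1.0, 0.000),  # ion, treated as non-polarizable
        ((0.22, 0.0, 0.0), -0.5, 0.001),  # oxygen-like site
        ((0.34, 0.0, 0.0), +0.5, 0.001),  # carbon-like site
    ]
    print("U_nonbonded (kJ/mol):", nonbonded_energy(sites, 1.0e-6, 1.0e-3))
```

A single pair of repulsion and dispersion coefficients is used for every pair only to keep the sketch short; a real force field assigns them per atom type, and bonded neighbours are normally excluded from the non-bonded sum.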


Table 6-1. Interaction energies for the gramicidin channel system (a).

System            Model          Energy (kJ/mol)   Distance (nm)
H2O to Na+        exp            100.3             -
                  HF/6-31G*      100.3             0.221
                  CHARMM-TIP3P   114.5             0.220
NMA C=O to Na+    HF/6-31G*      160.5             0.210
                  CHARMM         158.0             0.210
H2O to H2O        exp            22.6              -
                  HF/6-31G*      27.4              0.202
                  TIP3P          27.4              0.181
NMA C=O to HOH    HF/6-31G*      32.0              0.198
                  CHARMM-TIP3P   30.0              0.191
NMA N-H to OHH    HF/6-31G*      26.3              0.213
                  CHARMM-TIP3P   28.5              0.193
NMA to NMA        HF/6-31G*      32.4              0.208
                  CHARMM         34.0              0.193

(a) The ab initio interaction energies of Na+ with water and NMA were calculated using the Gaussian 82 [72] and Gaussian 86 [73] programs with the 6-31G* basis sets [74]. To account for the overestimated bond polarities the ab initio interaction energies were scaled according to $E_{\mathrm{scaled}} = (E_{\mathrm{scf}} - E_{\mathrm{b.s.}})\, \mu_{\mathrm{exp}} / \mu_{\mathrm{scf}}$, where $\mu_{\mathrm{exp}}$ and $\mu_{\mathrm{scf}}$ are the experimental and the computed ab initio electric dipole of the molecule in the absence of the ion and $E_{\mathrm{b.s.}}$ is the correction for the basis set superposition error. The experimental ion-water affinity is from [66], the experimental water-water affinity is from [76], and the results on hydrogen bonding systems are taken from [67].

An ion-induced polarization term was found to be required to obtain the steeper-than-Coulombic distance dependence of the ion-peptide interaction in the range of 0.2 to 0.5 nm. The parameters of the potential function, i.e., the polarizability coefficients, the van der Waals and the core size, were adjusted to reproduce the salient features of the ab initio potential surface accurately; the parameters of the ion-peptide and ion-water energy functions are given in [38], and the partial charges and the Lennard-Jones parameters assigned to the peptide backbone have been taken from previous work on the peptide-peptide potential [65, 68]. All the interaction energies involving Na+, water and NMA are given in Table 6-1. The agreement between the empirical potential energy function and the ab initio results is satisfactory.

6.3.1.3 Limitations of the Potential Energy Function

The present ion-peptide energy function only represents a first step in obtaining an accurate potential function. Geometries and interaction energies obtained from small isolated fragments are not always sufficient to construct an appropriate potential energy function. Further empirical adjustments, based on experimental data, are often necessary to yield correct properties in dense systems. One example is given by the TIP3P potential function, which yields relatively good properties for liquid water although it results in an overestimated interaction energy for the isolated water dimer [69]. It is possible that such discrepancies with gas phase interactions are necessary to account, in an average way, for many-body polarization effects present in bulk solution. In the present case only the dominant first-order polarization induced by the ion on the peptide was included, i.e., $E_{\mathrm{pol}} \sim 1/r^4$. In this approximation the partial charges of the peptide and the water, as well as other induced dipoles, do not influence a particular induced dipole, and the polarization is not calculated self-consistently, so as to keep the original pairwise additive form of the water-water (TIP3P) [69], peptide-peptide and water-peptide (CHARMM) [65] potential functions. An important part of the nonadditive effects neglected here is included in the second-order induced polarization energy [70].
In future work the effects of self-consistent polarization will be considered; experimental information concerning ion binding in the channel will be used to refine the parametrization of the potential function [71].

6.3.2 The Water-Filled Channel

The importance of water in the permeation process through the gramicidin channel is suggested by numerous experimental observations [22-24]. For example, the relative selectivity of the gramicidin channel for monovalent cations, Li+ < Na+ < K+ < Rb+ < Cs+ < H+, is similar to the mobility of these ions in bulk water [1]. This observation could indicate that the gramicidin channel simply acts as a water-filled cylinder and that, inside the pore, water and ion movements take place with essentially bulk-like mobilities. Further consideration of the permeation process reveals that such a view is too simplistic. From a comparison of the osmotic and the diffusional permeability coefficients, it has been estimated that the channel contains 5 to 6 water molecules on average [24]; streaming potential measurements show that 7 to 9 water molecules move through the channel with each permeating ion [23]. Moreover, although the effective diffusion constants of the Na+ ion and of water molecules inside the channel appear to be very similar [24], the diffusional permeability coefficient measured from the flux of isotopically labelled water across the channel (1.82 x cm3/s) could correspond to an effective diffusion constant that is almost 50 times slower for water molecules in the channel than in the bulk [22, 24].

Clearly, the relation of the structure and dynamics of the water molecules to the permeation process deserves special attention. As suggested by several theoretical studies, internal water molecules must adopt a linear configuration due to the confinement of the pore [28, 31]; ion and water cannot pass each other as they move through the channel and permeation proceeds by a single-file mechanism [29, 33, 36]. It may be expected that the hydrogen bonds involving internal waters and the channel backbone must be modified as a result of the displacement of the ion with its single-file hydration complex. To gain more insight into the properties of the water molecules inside the channel, a molecular dynamics simulation of the fully solvated channel in a simplified membrane-like environment was performed. The simulated system, shown in Figure 6-2, includes one gramicidin dimer channel, 183 TIP3P water molecules and 85 electrically neutral spheres introduced to mimic the hydrocarbon region of the membrane. The construction of the initial water configuration in biomolecular simulations is a critical step. Because this is particularly true for a narrow membrane channel, the technique employed to generate the starting coordinates is described in the next section.

Figure 6-2. Solvated gramicidin A dimer. Full system (top); cut-away close-up view (bottom). The system consists of 314 peptide atoms for the gramicidin dimer, 183 TIP3P [69] water molecules and 85 Lennard-Jones spheres; there are 948 particles. Periodic boundary conditions were applied along the channel axis to mimic the effect of infinite bulk water (shown in darker light). A confining potential evaluated for the cylindrical geometry was applied in the radial direction on the water oxygen to maintain proper pressure in the system. The diameter of the cylindrical system is 2.1 nm and the length of the elementary unit is 4.1 nm. Water molecules within 0.25 nm of the surface of the cylinder and all the Lennard-Jones spheres of the model membrane were submitted to dissipative and stochastic Langevin forces corresponding to velocity relaxation rates of 62 and 150 ps^-1, respectively; molecular dynamics was applied to the other atoms. The integration time step was 0.001 picosecond (ps) and a non-bonded group-by-group based cut-off of 1.2 nm was used. (Colour illustration see page XIV.)

6.3.2.1 Building the Initial System

There remains some uncertainty about the membrane-bound ion-conducting structure of the gramicidin A channel in lipid bilayers.
Proton-proton NOE distances determined by two-dimensional NMR experiments have demonstrated that the structure of gramicidin A is a right-handed β-helix head-to-head dimer when incorporated in SDS micelles [45], in contrast to the left-handed β-helix that was originally proposed by D. W. Urry [41] and supported by 13C Na+-induced chemical shift measurements in lecithin vesicles [43]. Recent results from solid-state 15N and 13C NMR of gramicidin in oriented dimyristoylphosphatidylcholine (DMPC) membranes have been used to support both the left-handed [47, 77] and right-handed [48, 50] structures. In view of the extreme sensitivity of gramicidin to the environment [46], it may well be that both the left- and right-handed helices, which are plausible on energetic and structural grounds, are found in membranes. For the present study the gramicidin channel was constructed as a left-handed head-to-head dimer, although the conclusions are not expected to depend essentially on this choice because the properties of the right- and left-handed dimers are very similar.

The initial conformation was constructed from the backbone dihedral angles of Venkatachalam and Urry [30], and was further optimized by energy minimization using the ABNR algorithm [65]. The resulting structure is shown in Figure 6-1. The primary solvation of the channel structure was introduced by building 25 water molecules in the vicinity of the structure, i.e., 10 water molecules along the axis of the channel inside the pore and the rest at the extremities of the channel. The 10 waters inside the pore were constructed along the channel axis in single-file fashion, separated by 0.27 nm. The bulk-like water regions at either end of the channel were constructed by overlaying water molecules taken from the coordinates of a pure water box equilibrated at 300 K and deleting the water molecules overlapping with the channel or the first 25 waters.

To produce a membrane-like environment and to prevent waters from reaching the lateral side of the dimer, a model hydrocarbon region made of Lennard-Jones spheres corresponding to the size of a CH, group was included. A similar overlay method was used to construct the model membrane.

To remove large unrealistic forces, the initial coordinates of the system were optimized with several cycles of energy minimization and molecular dynamics. At the last stage of equilibration, the waters were thermalized with a 10 picosecond (ps) trajectory keeping the gramicidin dimer and the membrane fixed in space. The complete system was finally equilibrated at 300 K with a 10 ps trajectory during which all the atoms were allowed to move. After equilibration of the full system, a 100 ps molecular dynamics trajectory was computed. A computer graphics image of the simulation system is shown in Figure 6-2 (see the figure caption for further details).

6.3.2.2 Analysis of the Trajectory

In accord with previous simulations, the 10 water molecules located inside the channel are arranged approximately in single file and their translational motion along the channel axis is highly correlated [28, 29, 31, 32]. Occurrences of water-water and peptide-water hydrogen bonds are observed; because the backbone carbonyls are more easily accessible than the amide N-H, peptide-water hydrogen bonds involved C=O···H-O-H almost exclusively.
The translational displacement of the water molecules along the channel axis during the 100 ps simulation is around 0.05 nm (±0.02 nm). The first internal water at the mouth of each monomer has the smallest rms displacement (0.035 nm), with each such water making a stable hydrogen bond to the hydroxyl hydrogen of the ethanolamine tail (water oxygen pointing outward). In addition, there are three other water molecules associated with the entrance to each monomer. They have intermediate mobilities and make hydrogen bonds alternately with the ethanolamine and with the carbonyl and amide groups of the terminal amino acids pointing toward the bulk solution. There is an increase in fluctuation for the internal waters, reaching a maximum at the junction of the gramicidin monomers.

The displacement of water molecules along the channel axis corresponds to a constant of self-diffusion, D_diff, of 2.5 x cm2/s. This is much smaller than in the bulk phase, where molecular dynamics with TIP3P yields 3.2 x 10^-5 cm2/s [69], close to the experimental value of 2.1 x 10^-5 cm2/s [1]. Assuming that the permeation of water molecules is limited only by passage through the channel, the diffusional permeability coefficient of water can be expressed as

$$ P_{\mathrm{diff}} = \frac{S\, D_{\mathrm{diff}}}{L} \qquad (6\text{-}14) $$

where L and S are, respectively, the length and the cross-section area of the pore. The length L corresponds to the region of the channel that contains water molecules with a markedly slower mobility. This length is 2.3 nm from the molecular dynamics simulation. Taking a cross-section, S, corresponding to the area of a circle with the radius of a water molecule, i.e., S = π (0.14 nm)^2, yields a diffusional permeability coefficient of 0.85 x lo-’’ cm3/s. Reported experimental values are 1.82 x cm3/s, from Andersen and Finkelstein [24], and 6.6 x cm3/s, from Dani and Levitt [22]. Although there is some uncertainty about the experimental estimate, the calculated diffusional permeability of water has the correct order of magnitude. Because the diffusional permeability is strongly influenced by the value of the constant of self-diffusion of water inside the channel, the present trajectory indicates that it is plausible that diffusion of the water molecules inside the channel is much slower than in the bulk phase.

6.3.2.3 Rotational Mobility of Internal Water Molecules

It is of interest to analyze the ability of the internal water to re-orient with respect to the axis of the channel. This provides some insight into the response of the channel and its internal waters to a transmembrane potential. To analyze the rotational mobility of the water inside the channel, a unit vector antiparallel to the water electric dipole, pointing from the mid-point between the two hydrogens to the oxygen, was defined.
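The orientation measure just defined can be illustrated with a minimal sketch. It assumes that the channel axis lies along x (as in the free energy sections of this chapter) and that coordinates are given in nm; the function name `orientation_projection` and the sample water geometry are illustrative choices, not taken from the original work.

```python
import math


def orientation_projection(oxygen, h1, h2, axis=(1.0, 0.0, 0.0)):
    """Projection on `axis` of the unit vector pointing from the
    mid-point between the two hydrogens to the oxygen.

    Returns a value between -1 and +1; the default axis (x) is an
    assumption about how the channel is oriented.
    """
    midpoint = [(a + b) / 2.0 for a, b in zip(h1, h2)]
    vec = [o - m for o, m in zip(oxygen, midpoint)]
    vec_norm = math.sqrt(sum(c * c for c in vec))
    axis_norm = math.sqrt(sum(c * c for c in axis))
    dot = sum(v * a for v, a in zip(vec, axis))
    return dot / (vec_norm * axis_norm)


# Hypothetical water with its oxygen at the origin and both hydrogens
# on the -x side, so the orientation vector points along +x.
water_o = (0.0, 0.0, 0.0)
water_h1 = (-0.0586, 0.0757, 0.0)
water_h2 = (-0.0586, -0.0757, 0.0)
projection = orientation_projection(water_o, water_h1, water_h2)
```

Evaluating this quantity for each stored configuration gives a time series of the kind plotted for each water molecule, and a histogram of the values gives the orientation distribution.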
The projection of this vector along the channel axis is plotted in Figure 6-3 for the 10 internal waters; one mouth (1) and one bulk-like (12) water molecule are included for comparison. It is observed that the rotational mobility of water inside the channel varies significantly depending on the position along the channel. The water molecules hydrogen bonded to the ethanolamine hydrogen and their first neighbors maintain their orientation throughout the simulation and show strong motional anisotropy. The rotational motions increase steadily as one progresses into the channel; the water located at the monomer-monomer contact is seen to have the largest motion. Its orientation distribution is bimodal. In contrast, the water molecules in the bulk and at the mouth have isotropic orientation, as shown in their distribution histograms. Their rotational correlation time is on the order of 5 to 10 ps. The rotational correlation time of the first internal waters is larger than 100 ps and cannot be inferred from the simulation. This picture contrasts with previous results obtained from molecular dynamics in which all the inside waters were seen to point in the same direction in a relatively ordered linear hydrogen-bonded network [28, 29, 31, 32]. Here there is an approximate mirror plane for the waters, consistent with the dimer symmetry. Such differences may be due to several factors: firstly, it has been observed that the structure and dynamics of the internal water are very sensitive to the flexibility of the channel [32]; secondly, it may be that the dynamics and structure of the internal waters depend on detailed aspects of the potential function, in particular those representing the hydrogen bonding interaction.

[Figure 6-3: projection (-1 to 1) versus t in ps (0 to 100) for each water, with relative-probability histograms]
Figure 6-3. Projection of the water orientation vector along the channel axis as a function of time during the simulation. The ordering corresponds to: number 1 is a mouth water; numbers 2 to 11 are the 10 waters inside the channel (numbers 6 and 7 are at the inter-monomer junction); number 12 is a bulk water included for comparison. Water number 2 is hydrogen bonded with the ethanolamine of monomer 1, which undergoes a conformational change at 25 ps, and shows an increase in fluctuations compared to water number 11, the equivalent entrance water of monomer 2.

6.3.2.4 Rare Events: Ethanolamine Tail Isomerization

Observation of rare events in a simulation, even though they are not statistically significant, can give useful information. During the 100 ps trajectory, the ethanolamine tail of one of the monomers changed its conformation (the ethanolamine is located at x = +1.4 nm and is labeled residue +16). As can be seen in Figure 6-4, the transitions of the three dihedral angles take place in a concerted fashion. The changes in the successive dihedral angles are anticorrelated along the ethanolamine tail. This can be observed clearly for the major transition at 25 ps and, to a lesser degree, around 50 ps for the second transition of the angle ψ. The anticorrelation is related to the fact that the transition takes place without large and sudden displacements of the O-H group at the end of the tail [78]. The hydrogen bond with the entrance water is lost in the time interval from 20 to 25 ps (see water 2 in Figure 6-3). The latter shows an increase in fluctuations compared to the water at the other entrance (water 11), which does not lose its ethanolamine hydrogen bond.
From 25 to 50 ps the ethanolamine hydrogen bonds alternately to the three mouth waters and the two bulk waters. A stable hydrogen bond is formed with a bulk water at 50 ps and is maintained for the rest of the trajectory, leaving the entrance of the channel free. It is possible that this type of rare event plays a role in the entrance and exit of water molecules. Appropriate techniques based on biased sampling must be used to study rare events accurately. This is the object of the next section.

[Figure 6-4: three panels of dihedral angle versus t in ps]
Figure 6-4. Dihedral angle transitions of the ethanolamine of monomer 1 (-C-N-Cα-Cβ-O-H). The angles are defined as C-N-Cα-Cβ, N-Cα-Cβ-O and Cα-Cβ-O-H.

6.3.3 Calculation of the Potential of Mean Force: Free Energy Simulation

In Section 6.2, it was shown that the free energy profile, W(x), plays a fundamental role in the phenomenological description of ion transport. From its definition, W(x) is the free energy of the system when the ion is at position x along the channel axis, i.e., [79],

$$ W(x) = W_0 - k_B T \ln \left[ \frac{\int dR\; e^{-U(R,x)/k_B T}}{\int dR\; e^{-U(R,x_0)/k_B T}} \right] \qquad (6\text{-}15) $$

where U(R,x) is the total potential energy of the system and R represents all the coordinates of the system other than the x coordinate of the ion (the x, y, z coordinates of all the channel and water atoms, including the y and z coordinates of the ion); x_0 is an arbitrary reference position chosen such that W(x_0) = W_0. The free energy profile is often called the "potential of mean force", following from the property that the average force exerted on the ion when it is located at position x, i.e., ⟨-∂U(R,x)/∂x⟩, is equal to -∂W(x)/∂x [79]. Because the potential of mean force, based on Eq. (6-15), can be related to the average probability distribution function of the ion,

$$ W(x) = W_0 - k_B T \ln \left[ \frac{\langle \rho(x) \rangle}{\langle \rho(x_0) \rangle} \right] \qquad (6\text{-}16) $$

it can, in principle, be extracted directly from ⟨ρ(x)⟩ through a normal molecular dynamics simulation. However, because the movements of the ion along the channel axis are expected to take place on a very long time-scale compared to realistic simulation times, a direct approach is not practical. It is necessary to use a biased sampling technique to obtain accurate results on the potential of mean force. One such technique is free energy perturbation, which is illustrated in the next section.
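The direct route mentioned above, extracting W(x) from the probability distribution of the ion, can be sketched as follows. This is a minimal illustration, not the procedure used in this chapter: the samples are drawn from a hypothetical harmonic well with `random.gauss` rather than from a trajectory, the reference point is taken as the most populated bin, and the names `pmf_from_samples`, `sigma` and the bin settings are assumptions made for the example.

```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def pmf_from_samples(samples, x_min, x_max, n_bins, temperature=300.0):
    """Estimate W(x) = -kB*T*ln[rho(x)/rho(x_ref)] from sampled positions.

    The reference bin is the most populated one, so min(W) = 0.
    Empty bins are reported as +infinity (never visited).
    """
    width = (x_max - x_min) / n_bins
    counts = [0] * n_bins
    for x in samples:
        i = math.floor((x - x_min) / width)
        if 0 <= i < n_bins:
            counts[i] += 1
    reference = max(counts)
    kt = KB * temperature
    centers = [x_min + (i + 0.5) * width for i in range(n_bins)]
    pmf = [-kt * math.log(c / reference) if c > 0 else float("inf")
           for c in counts]
    return centers, pmf


# Hypothetical data: positions (nm) in a harmonic well of width sigma.
sigma = 0.02
rng = random.Random(0)
samples = [rng.gauss(0.0, sigma) for _ in range(200000)]
centers, pmf = pmf_from_samples(samples, -0.1, 0.1, 100)
```

In this toy case every bin near the minimum is visited many times. For an ion that must cross barriers of many k_BT, the bins near the barrier top would be almost empty in a trajectory of realistic length, which is the practical reason for turning to biased sampling.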


6.3.3.1 Application and Techniques

The potential of mean force of the ion along the channel axis is calculated using the free energy simulation technique [80, 81]. From Eq. (6-15), the potential of mean force at a position x + Δx can be expressed in terms of W(x),

$$ \Delta W(x \to x + \Delta x) = W(x + \Delta x) - W(x) = -k_B T \ln \left\langle e^{-\Delta U / k_B T} \right\rangle_{(x)} \qquad (6\text{-}17) $$

where the bracket with subscript x represents a canonical average with the reaction coordinate held fixed at x, i.e.,

$$ \left\langle e^{-\Delta U / k_B T} \right\rangle_{(x)} = \frac{\int dR\; e^{-\Delta U / k_B T}\, e^{-U(R,x)/k_B T}}{\int dR\; e^{-U(R,x)/k_B T}} \qquad (6\text{-}18) $$

and ΔU is the change in potential energy obtained by displacing the ion from x to x + Δx, i.e., ΔU is equal to U(R, x + Δx) - U(R, x). The average in Eq. (6-18) is calculated from an ensemble of configurations generated by a computer simulation of the system in thermal equilibrium with the ion fixed at x. Although Eq. (6-17) is formally valid for any Δx, convergence in achievable computer times limits the calculation to free energy differences in the neighborhood of x: the perturbations have to be relatively small to obtain rapid convergence. In practice it is therefore necessary to generate several trajectories of the system with the ion fixed at various values of x. The complete profile is constructed by joining the free energy differences obtained with Eq. (6-17) at the mid-points between neighboring simulations, that is,

$$ W(x_{n+1}) = W(x_n) + \Delta W(x_n \to x_n + \Delta x) - \Delta W(x_{n+1} \to x_{n+1} - \Delta x) = \sum_{i=1}^{n} \left[ \Delta W(x_i \to x_i + \Delta x) - \Delta W(x_{i+1} \to x_{i+1} - \Delta x) \right] \qquad (6\text{-}19) $$

where Δx is the mid-distance between two neighboring simulations,

$$ \Delta x = \frac{x_{n+1} - x_n}{2} \qquad (6\text{-}20) $$

W(x) can also be calculated by integrating the reversible work done by the mean force ⟨F(x)⟩ acting on the ion in the x direction, i.e.,

$$ W(x) = W(x_0) - \int_{x_0}^{x} \langle F(x') \rangle \, dx' \qquad (6\text{-}21) $$
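As a rough numerical illustration of this last route, the sketch below integrates a tabulated mean force with the trapezoid rule, using the relation that the mean force equals -dW/dx. The cosine-shaped test profile, its barrier height and period, and the name `integrate_mean_force` are hypothetical choices made only to check the integration; they are not results from this chapter.

```python
import math


def integrate_mean_force(xs, mean_forces, w0=0.0):
    """Return W at each grid point from W(x_k) = W(x_0) - integral of <F>.

    xs           positions of the constrained ion (any consistent unit)
    mean_forces  average force on the ion at each position
    w0           value assigned to W at the first grid point
    """
    profile = [w0]
    for k in range(1, len(xs)):
        dx = xs[k] - xs[k - 1]
        # trapezoid rule on the interval [x_{k-1}, x_k]
        profile.append(profile[-1] - 0.5 * (mean_forces[k] + mean_forces[k - 1]) * dx)
    return profile


# Hypothetical periodic test profile W(x) = (B/2) * (1 - cos(2*pi*x/a)),
# whose mean force is <F> = -dW/dx.
barrier = 10.0   # kJ/mol, arbitrary
period = 0.2     # nm, arbitrary
xs = [period * k / 200 for k in range(201)]
forces = [-(barrier / 2.0) * (2.0 * math.pi / period) * math.sin(2.0 * math.pi * x / period)
          for x in xs]
profile = integrate_mean_force(xs, forces)
```

With noisy simulation averages in place of the analytic force, the same loop applies unchanged; the statistical error of each mean force then accumulates along the integration path.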


One advantage of this formulation is that the mean force can be decomposed linearly into a sum of contributions, e.g.,

$$ \langle F(x) \rangle = \sum_{\alpha} \langle F_{\alpha}(x) \rangle \qquad (6\text{-}22) $$

and the linearity, preserved by the integral of the mean force along the reaction coordinate, allows one to determine the contribution W_α(x) of any interaction term to the potential of mean force,

$$ W(x) = W(x_0) + \sum_{\alpha} W_{\alpha}(x) \qquad (6\text{-}23) $$

with

$$ W_{\alpha}(x) = -\int_{x_0}^{x} \langle F_{\alpha}(x') \rangle \, dx' \qquad (6\text{-}24) $$

This method can be very useful in finding the contribution of the solvent molecules or specific residues to the free energy profile. The total potential of mean force can be computed equivalently with the free energy simulation technique, via Eq. (6-17). However, the integrated mean force decomposition can only be obtained using Eqs. (6-22), (6-23) and (6-24).

The free energy simulation technique is computationally intensive, i.e., to reach convergence and obtain accurate results it is necessary to generate long trajectories with the ion constrained at various positions along the x-axis. For this reason it is of interest to use a system with as small a number of atoms as possible. To avoid the large number of water molecules necessary to solvate the mouth of the channel, the potential of mean force was calculated for a Na+ ion along the axis of a periodic (L, D) poly-alanine β-helix.
Such a periodic helix is appropriate for investigating the single-file translocation of the ion and its neighboring waters in the interior of the channel, where end effects and side chains are expected to play a secondary role. Moreover, this choice provides a test of the methodology since, by virtue of the helix symmetry, any deviation from periodicity of the average properties indicates a lack of convergence.

The periodic helix structure was first refined in the absence of any ion and solvent molecules. The helical parameters, the rise and the rotation angle per (L-Ala, D-Ala) unit, were optimized by energy minimization with the ABNR algorithm [65]. It was found that the optimum rise per unit, ΔL, is 0.155 nm and the optimum turn per unit, θ, is 114.4 degrees, yielding 6.29 residues per turn (360/114.4 = 3.15 dipeptide units per turn, each containing two residues); this value is similar to that of the Urry helix model (6.3) [30]. The total number of particles is 229. The fundamental unit, treated with periodic boundary conditions to avoid end effects (images of the system are repeated along the helix axis), includes 34 alanine residues, a cation and 8 water molecules. The initial coordinates of the full periodic system were optimized by energy minimization. The system was then equilibrated at 300 K for 10 ps with the ion constrained in x but free to move in the y and z directions (see [38] for more details).

Eight simulations were carried out to determine the free energy profile along the axis of the β-helix for one (L-Ala, D-Ala) repeat unit of the periodic system, i.e., for x between 0.0 and 0.155 nm. The perturbation distance Δx is equal to (1/16) x 0.155 nm. A ninth simulation, which should be equivalent to the first one by symmetry, was calculated to determine the statistical convergence. The free energy simulation protocol (after standard initial stages of equilibration) consisted of the following steps:

1. A 25 ps trajectory is generated with the ion constrained at x, and the free energy difference is calculated for -Δx and +Δx.
2. The ion is displaced by +2Δx (0.019 nm) along the x axis for the next simulation.
3. A short energy minimization (ABNR) [65] is applied to the water and the channel atoms surrounding the ion to remove local strains, and the system is equilibrated during 5 ps of molecular dynamics with the ion constrained at the new x position.
4. The cycle is repeated starting with step (1).

Finally, all the free energy differences are pieced together using Eq. (6-19) to generate the free energy profile, W(x). The simulation time-step was 0.001 ps and averages were evaluated using configurations separated by 0.02 ps.
For each constrained position along the x-axis the instantaneous forces exerted by the channel and the water molecules were stored to calculate the mean force contributions to the free energy profile using Eqs. (6-22), (6-23) and (6-24).

6.3.3.2 Analysis of the Results

The results are shown in Figure 6-5. Minima exist near x = 0 and x = 0.155 nm, separated midway by an energy barrier of 18.9 kJ/mol. By periodicity, it follows that the potential of mean force for Na+ along the axis of the helix is made up of a sequence of well-defined binding sites and energy barriers separated by 0.155 nm; there is one such binding site for every two carbonyl oxygens. The integrated mean force decomposition method shows that average water and channel forces each contribute about one half of the activation free energy. To analyze the free energy profile, a search was made to find the nearest neighbors of the Na+ in the two equivalent binding sites and at the intervening barrier. There are four carbonyl oxygens and two water molecules in close contact with the ion in each binding site. The solvation structure around the ion is transformed in a continuous fashion as the ion moves from a binding site through the transition state to the adjacent binding site. During the transition the ion remains in close contact with two of the four carbonyls involved in the first binding site and the two other carbonyls are replaced by two new carbonyls. At the transition state, the ion remains in close contact with two of the four carbonyls of the nearest binding sites; the contact with the remaining two carbonyls is essentially lost. The ion contact with two water oxygens is maintained through the entire transition with an average ion-oxygen distance of 0.232 nm. The pattern of ion-carbonyl contacts during the translocation of Na+ described here is not a particular attribute of the periodic β-helix and was also observed in recent calculations involving a model of the full gramicidin A dimer channel [40].

[Figure 6-5: free energy profile versus x in nm (-0.05 to 0.2)]
Figure 6-5. Free energy profile of Na+ ion along the axis of the periodic (L, D) poly-alanine β-helix, as obtained from the perturbation technique described in Section 6.3.3 (solid line). The helix axis is oriented along the x-axis, the odd-numbered L-alanine carbonyls pointing toward the N-terminus (+x), the even-numbered D-alanine carbonyls pointing toward the C-terminus (-x). Water (dotted line) and channel (dashed line) contributions to the free energy profile obtained from the integrated mean force decomposition, Eqs. (6-22), (6-23) and (6-24), are shown. The arbitrary zero of the potential of mean force, x_0, was chosen at the position of the free energy minimum of W(x). A small hysteresis of 3.0 kJ/mol was linearly corrected (see [38] for more details).

Water-channel hydrogen bonding interactions are responsible for the contributions of the water molecules to W(x). In the binding site, the two water molecules in contact with the ion are able to make stable hydrogen bonds with the carbonyls. At the transition state, the Na+ ion and its two water neighbors are displaced by 0.075 nm along the helix axis where similar hydrogen bonds are no longer possible. Likewise, peptide-peptide hydrogen bonds are the origin of the mean force contribution of the channel. The linear spacing along the channel axis of the four carbonyls coordinating the Na+ ion in the binding site is such that good oxygen-Na+ contact is achieved with small dihedral distortions and little stress on the helix. At the transition state the Na+ ion is also making contact with four carbonyl oxygens, but larger helix distortions are necessary to achieve as good a coordination as in the binding


site, resulting in a slightly less stable situation for the whole system, because they are located on opposite sides of the channel.

The local flexibility and plasticity of the structure (i.e., the ability to deform without generating large energy stress) seems to be an essential feature in determining its response to the presence of an ion. As was first noted in a normal mode study [37], spontaneous fluctuations and distortions of the helix are significant but limited to a few carbonyls. Even though the interaction of the Na+ with each individual carbonyl is quite large (on the order of 160 kJ/mol), the local flexibility of the helical structure is able to remove the very unfavorable steps of binding and unbinding to successive carbonyls that would generate a large energy barrier opposing the translocation.

In conclusion, the calculation of the potential of mean force of a Na+ ion in a β-helix representing the interior of the gramicidin channel has revealed that the activation free energy is not controlled by the strong ion-carbonyl and ion-water interactions, but by the water-peptide and peptide-peptide hydrogen bonding interactions. This conclusion differs essentially from previous approaches to the study of ion selectivity in channels, where the channel structure was essentially rigid and the ion-channel interaction energy was thought to be the controlling factor in ion selectivity [34, 82].

The potential of mean force, rather than dynamical factors, is thought to be mainly responsible for the
particular selectivity of a channel [21, 24, 58, 82, 831.However, to provide a complete description of the transport properties through achannel it is also necessary to <strong>in</strong>vestigate the nature of the dynamics of ions <strong>in</strong>sidethe channel. This question is addressed <strong>in</strong> the next section.6.3.4 Calculation of a Transition Rate:Activated Dynamics TechniqueThe potential of mean force of Na’ along the axis of a periodic /3-helix is made ofa sequence of free energy barriers of 18.9 kJ/mol [38]. Because each barrierrepresents an activation free energy that is significantly larger than kBI; it is appropriateto express the long time transport of Na’ <strong>in</strong> the /3-helix <strong>in</strong> terms of a“hopp<strong>in</strong>g” process between discrete states as <strong>in</strong> ERT models. However, Eyr<strong>in</strong>g’sTransition State Theory (TST) rate [84] often used to express the hopp<strong>in</strong>g rates, mustbe modified. The TST rate is [84, 851,(6-25)


where $x$ and $v$ are the position and the x-component of the velocity of the ion (i.e., along the reaction coordinate $x$); $x_b$ is the position of the barrier; $\theta(v)$ is a Heaviside step function; and the integral in the denominator is over the "reactant" well along the reaction coordinate. The TST rate is based on the assumption that all trajectories initiated with a positive velocity at the barrier top will be reactive. It necessarily represents an upper bound because some attempts to cross the barrier top may fail due to dissipative and collisional forces. To account for non-reactive trajectories the transition rate is written as [86, 87],

$$k = \kappa\, k_{\mathrm{TST}} \tag{6-26}$$

where $\kappa$ is called the "transmission coefficient". The transmission coefficient is always less than or equal to one, and setting $\kappa = 1$ is equivalent to TST. In principle the exact transition rate could be calculated by directly monitoring the number of transitions per unit of time in a normal molecular dynamics simulation. However, this is impractical since a significant number of transitions would not be expected to take place during a normal molecular dynamics trajectory, due to the large activation free energy. Moreover, in such a simulation the ion would spend most of its time at the bottom of the energy wells, and only a very small fraction of the calculated trajectory would provide information about the dynamical events responsible for the deviations from TST. The "activated" trajectory technique is more appropriate for studying such problems [86, 88]. The method is illustrated in the next section.

6.3.4.1 Application and Techniques

To calculate the transmission coefficient, it is necessary to generate a set of activated trajectories. To produce one activated trajectory, the initial conditions (i.e., initial configuration and velocities) must be obtained from a biased ensemble. The ion is at the top of the free energy barrier in all the configurations of this biased ensemble (such an ensemble of configurations can be generated from a molecular dynamics trajectory in thermal equilibrium during which the ion is constrained at $x_b$). The initial velocity of the ion along the channel axis is sampled from the non-Maxwellian velocity distribution $v\,\theta(v)\,e^{-mv^2/2k_BT}$; all other initial velocities are sampled from a Maxwell distribution at room temperature. The initial velocity of the ion can be generated from random numbers $R$ uniformly distributed between 0 and 1 using,

$$v = \sqrt{-\frac{2k_BT}{m}\,\ln R} \tag{6-27}$$
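Sampling from the flux-weighted distribution $v\,\theta(v)\,e^{-mv^2/2k_BT}$ can be sketched with inverse-transform sampling. The block below is a minimal illustration, not the chapter's own code: the function names are hypothetical, the closed form is obtained by inverting the cumulative distribution of the stated density, and the mass of Na+ (22.99 g/mol) and the temperature (300 K) are assumed values chosen only for the check.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def sample_barrier_velocity(mass, temperature, rng=random):
    """Return one velocity v > 0 in nm/ps drawn from v * exp(-m v^2 / 2 k_B T).

    mass is in g/mol and temperature in K; (kJ/mol)/(g/mol) equals (nm/ps)^2.
    """
    # The cumulative distribution of the flux-weighted density is
    # F(v) = 1 - exp(-m v^2 / (2 k_B T)); inverting it with a uniform random
    # number R gives v = sqrt(-(2 k_B T / m) ln R).
    r = 1.0 - rng.random()  # uniform in (0, 1], so log(r) is always finite
    return math.sqrt(-2.0 * K_B * temperature / mass * math.log(r))


def mean_barrier_velocity(mass, temperature):
    """Analytic mean of the same distribution, sqrt(pi k_B T / (2 m))."""
    return math.sqrt(math.pi * K_B * temperature / (2.0 * mass))


rng = random.Random(0)  # fixed seed so the comparison is reproducible
samples = [sample_barrier_velocity(22.99, 300.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples), mean_barrier_velocity(22.99, 300.0))
```

The sample mean should agree with the analytic mean to within statistical noise, which is a quick way to confirm that the velocities are not simply Maxwellian.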


Two trajectories are generated using the set of initial conditions taken from the biased ensemble. The first one is propagated forwards in time with these initial conditions; the second one is propagated backwards in time, starting with the same initial configuration (this can be done by inverting the sign of all the initial velocities in the system at $t = 0$). The forwards and backwards trajectories are calculated from 0 to $+\tau$ and $-\tau$ respectively, and joined together as a single activated trajectory. In this manner a trajectory starting at time $-\tau$, ending at time $+\tau$ and going through the transition state at time $t = 0$ with a positive velocity is generated. The simulation time, $\tau$, must be sufficiently long that the barrier crossing events are completed [86]. In practice the dynamics at the barrier top relaxes rapidly and $\tau$ is relatively short. The procedure is repeated many times to obtain a large number of activated trajectories. In the present application 100 activated dynamics trajectories were generated from -1.0 ps to +1.0 ps. Typical examples of activated trajectories are shown in Figure 6-6; the fate of the trajectories is determined in 0.5 ps or less. From the position of the ion at time $\pm\tau$ the activated trajectory can be assigned to one of four types: reactant to product, product to reactant, reactant to reactant and product to product.
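The bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the reactant side is taken to be $x < x_b$ (a convention chosen here, not stated in the text), and the labels are built directly from the counts quoted in the caption of Figure 6-6 rather than from real trajectories.

```python
from collections import Counter


def classify(x_start, x_end, x_barrier=0.0):
    """Label one activated trajectory from the ion position at -tau and +tau.

    Assumes the reactant well lies at x < x_barrier (illustrative convention).
    """
    origin = "reactant" if x_start < x_barrier else "product"
    fate = "reactant" if x_end < x_barrier else "product"
    return f"{origin} to {fate}"


def transmission_coefficient(labels):
    """Net number of reactive trajectories divided by the total number."""
    counts = Counter(labels)
    net = counts["reactant to product"] - counts["product to reactant"]
    return net / len(labels)


# Counts quoted in the caption of Figure 6-6 (100 activated trajectories).
labels = (["reactant to product"] * 26 + ["product to reactant"] * 15
          + ["reactant to reactant"] * 14 + ["product to product"] * 45)
kappa = transmission_coefficient(labels)
print(kappa)  # 0.11
```

Recrossing trajectories (reactant to reactant, product to product) enter only through the denominator, which is why they lower the net reactive fraction.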
The transmission coefficient, $\kappa$, is calculated as the net number of "reactive" trajectories over the total number of activated trajectories,

$$\kappa = \frac{N_{R\to P} - N_{P\to R}}{N_{\mathrm{total}}}$$

where $N_{R\to P}$ and $N_{P\to R}$ are the numbers of reactant-to-product and product-to-reactant trajectories. To obtain the rate constant $k$, the transmission coefficient $\kappa$ is combined with $k_{\mathrm{TST}}$ [84].

6.3.4.2 Analysis of the Results

The quantities relevant to the transition rate are summarized in Table 6-2. Combining $k_{\mathrm{TST}} = 2.1 \times 10^{9}$ s$^{-1}$ with $\kappa = 0.11$, a transition rate of $k = 2.3 \times 10^{8}$ s$^{-1}$ is obtained. The TST rate is thus reduced by one order of magnitude. From the calculated transition rate it is possible to obtain an estimate of the maximum conductance of the gramicidin channel. In the β-helix there is one free energy barrier per (L, D) unit. This implies that the total number of barriers in the complete gramicidin channel is around 15. This number is confirmed by more recent calculations of the potential of mean force of Na+ along the axis of the dimer channel [40]. From Eq. (6-9) the maximal conductance, $\Lambda_{\max}$, is 7 pmho. Errors in the transition rate and in the estimated channel conductance are dominated by the activation energy calculated from the potential of mean force technique [38]. For example, a plausible error on the order of $k_BT = 2.49$ kJ/mol in the activation energy of 18.9 kJ/mol leads to a factor of 3 in the estimated transport rate. Thus, it is less the absolute value than the analysis that is of interest here.

[Figure 6-6: panels "Reactant to Product", "Product to Product" and "Product to Reactant"; horizontal axis: t in ps.]

Figure 6-6. Samples of typical activated trajectories of Na+ ion. The fate of the crossing event is decided within one ps. Among the 100 activated trajectories, there were 26 forward crossing reactive trajectories (reactant to product), 15 backward trajectories (product to reactant), 14 recrossing trajectories (reactant to reactant) and 45 recrossing trajectories (product to product). The transmission coefficient, $\kappa$, is calculated from the net number of reactive trajectories: 0.11 = (26 - 15)/100.

To analyze the factors responsible for the deviations from transition state theory, the instantaneous forces acting on Na+ during a reactive and a non-reactive activated dynamics trajectory are shown in Figure 6-7. The channel forces appear to be directly responsible for the recrossing of the non-reactive trajectory. During the successful crossing the water forces oscillate rapidly and are predominant, whereas the channel forces have moderate amplitude. During the failed crossing the channel forces are much larger, while the water forces have amplitudes similar to those during the reactive trajectory. The nearest water molecules remain in close contact with the Na+ ion during both the reactive and non-reactive trajectories, and it is clear that the permeation process does not proceed by a vacancy diffusion mechanism as is often suggested [21, 89].

[Figure 6-7: panels "Reactant to Product" and "Reactant to Reactant"; horizontal axis: t in ps.]

Figure 6-7. Typical example of the instantaneous forces acting on Na+ ion during the reactive trajectory, reactant to product, and the non-reactive trajectory, reactant to reactant, shown in Figure 6-6. The forces exerted by the channel and the two nearest water molecules are shown. The carbonyl forces are not dominant during the crossing event. The dominant forces causing the recrossing arise primarily from the nearest carbonyl oxygens.

Table 6-2. Transition rate of Na+.

| Property | Value |
|---|---|
| Activation energy $\Delta W^{*}$ | 18.9 kJ/mol |
| Well frequency $\nu_{\mathrm{well}}$ | 4.0 ps$^{-1}$ |
| Eyring's Transition State Theory rate $k_{\mathrm{TST}}$ | $2.1 \times 10^{9}$ s$^{-1}$ |
| Transmission coefficient $\kappa$ | 0.11 |
| Transition rate $k = \kappa k_{\mathrm{TST}}$ | $2.3 \times 10^{8}$ s$^{-1}$ |

6.3.4.3 Comparison with Experiments

Comparison of the calculated channel conductance with experimental values should be made with caution since many of the features of the full gramicidin channel are not included in the periodic β-helix system. At best the periodic β-helix model is a valid description of the interior of the pore, and end effects have been neglected. The expression used for $\Lambda_{\max}$ is based on the assumption that the maximum conductance under a saturating concentration is determined only by the translocation rate of Na+ inside a singly-occupied channel (see Section 6.2). Experimental observations indicate that permeation of Na+ through the gramicidin channel does not perfectly obey this simple model. Deviations of Na+ fluxes from the one-ion pore saturation have been attributed to double-occupancy of the gramicidin channel [27]. Furthermore, it is possible that the maximum conductance, $\Lambda_{\max}$, is affected by factors other than the translocation through the channel.
For instance, it has been observed that the maximum conductance of Na+ depends on the composition of the lipid membrane; the maximum conductance of Na+ through the gramicidin A channel measured in phosphatidylethanolamine (PE) membranes is 14.6 pS [24, 26], about a factor of two smaller than the value of 27 pS measured in glycerylmonooleate (GMO) [21]. Nevertheless, in spite of these uncertainties, the calculated value of 7 pmho has the correct order of magnitude, showing that the local interactions in the β-helix per se during the translocation of Na+ give rise to significant activation barriers that could account for the observed diffusion rate.
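The arithmetic behind Table 6-2 can be checked with a few lines. This sketch assumes the familiar harmonic form of the TST rate mentioned in note [85], $k_{\mathrm{TST}} \approx \nu_{\mathrm{well}}\, e^{-\Delta W^{*}/k_BT}$; the variable and function names are illustrative. With the rounded inputs from the table it lands close to, but not exactly on, the tabulated $k_{\mathrm{TST}}$.

```python
import math

KT = 2.49            # kJ/mol, value of k_B T quoted in Section 6.3.4.2
BARRIER = 18.9       # kJ/mol, activation free energy (Table 6-2)
WELL_FREQ = 4.0e12   # s^-1, well frequency of 4.0 ps^-1 (Table 6-2)
KAPPA = 0.11         # transmission coefficient (Table 6-2)


def tst_rate(prefactor, barrier, kt=KT):
    """Harmonic TST estimate: prefactor * exp(-barrier / k_B T)."""
    return prefactor * math.exp(-barrier / kt)


k_tst_estimate = tst_rate(WELL_FREQ, BARRIER)   # about 2e9 s^-1
k_from_estimate = KAPPA * k_tst_estimate        # corrected by kappa
k_from_table = KAPPA * 2.1e9                    # using the tabulated k_TST

print(f"k_TST (harmonic estimate) = {k_tst_estimate:.2e} s^-1")
print(f"k = kappa * k_TST         = {k_from_estimate:.2e} s^-1")
print(f"k from tabulated k_TST    = {k_from_table:.2e} s^-1")
```

The small gap between the harmonic estimate and the tabulated rate is consistent with the rounding of the inputs and with the prefactor in the chapter being evaluated from the full integral over the well rather than from a quadratic expansion.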


6.3.5 Relation Between NP and ERT

Traditionally, the ERT and NP approaches take opposite viewpoints in their description of ion movements in terms of microscopic events (see Section 6.2) [1]. In fact, under certain conditions, they can be very similar. Although it is clearly appropriate to express the long time transport of Na+ in the β-helix in terms of a hopping process between discrete states, the NP diffusion equation can also provide a meaningful description of the translocation process and a good estimate of the long time transport rate in the present case. In Eq. (6-9) the $\Lambda_{\max}$-NP involves the effective diffusion constant, $D_{\mathrm{eff}}$, defined in Eq. (6-11) as a spatial average over the length of the pore. Since the potential of mean force is made of a sequence of identical wells and barriers, $D_{\mathrm{eff}}$ can be approximated as,

(6-31)

where $\Delta W^{*}$ is the activation free energy, $[W(x_b) - W(x_w)]$, and $W''(x_w)$ and $-W''(x_b)$ are the second derivatives of the potential of mean force at the bottom of the well ($x_w$) and at the top of the barrier ($x_b$), respectively. Under these conditions $\Lambda_{\max}$-NP is equivalent to $\Lambda_{\max}$-ERT provided the pre-exponential factor is taken as,

(6-32)

An essential aspect of this approximation, called the "high friction limit", is that inertial dynamical effects are neglected [90], as indicated by the fact that $F_p$ is independent of the mass of the ion.

The value of the diffusion constant at the barrier top can be obtained, via the Einstein relation $D = k_BT/\zeta$, [...] is the deviation of the instantaneous force relative to the average force acting on the ion constrained at the barrier top [39, 91, 92]. The integrand in Eq. (6-33), i.e., [...]

[Figure 6-8: horizontal axis: t in ps, from 0 to 0.4.]

Figure 6-8. Decomposition of the normalized time-dependent friction of Na+ ion, $\zeta(t)/\zeta(0)$, calculated at the transition state (solid line). The water-water contribution $\zeta_{ww}(t)/\zeta(0)$ (dashed line), the channel-channel $\zeta_{cc}(t)/\zeta(0)$ (short-dashed line), and the water-channel cross terms $[\zeta_{cw}(t) + \zeta_{wc}(t)]/\zeta(0)$ (dotted line) are also shown.

The root-mean-square force is equal to 483 kJ/mol/nm. The total friction calculated from Eq. (6-33) corresponds to a diffusion constant of $0.64 \times 10^{-5}$ cm$^2$/s, i.e., less than one third of the experimental diffusion constant of Na+ in bulk water [1]; with $\zeta_{ww}$, $\zeta_{cc}$ and $[\zeta_{cw}(t) + \zeta_{wc}(t)]$ [...] coupling contribute equally to the total static friction acting on Na+. The importance of the water-channel cross-coupling contribution suggests that the ion-ligand complex is tightly structured.

Table 6-3. Pre-exponential frequency factors.

| Expression | $F_p$ in ps$^{-1}$ |
|---|---|
| Activated dynamics | 0.44 |
| High friction limit | 0.32 |
| Eyring's Transition State Theory | 4.00 |
| Gas phase $k_BT/h$ | 6.25 |

It is of interest to compare the pre-exponential factor, $F_p$, obtained from approximate expressions with the "exact" result of the activated dynamics trajectories. Various expressions for $F_p$ are given in Table 6-3. It is seen that the TST rate is wrong by one order of magnitude in the case of Na+ movements in the interior of the β-helix. The reason is that the TST pre-exponential dynamical factor, solely determined by the potential of mean force and the mass of the ion, ignores all frictional and collisional effects and represents an upper bound to the exact $F_p$. Generally the TST rate overestimates the exact rate in dense liquid systems [94]. Although it does not yield the exact result, the high friction limit provides a much better estimate than TST in the present case. Inertial and "memory" effects due to the finite decay time of the time-dependent friction, neglected in the high friction approximation, are responsible for the remaining discrepancy. More sophisticated approximations have been proposed to account for such effects [94, 95].
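Two of the entries in Table 6-3 can be reproduced from numbers already given in the chapter, which makes the comparison easier to follow. In this sketch the activated dynamics value is taken as $\kappa$ times the TST prefactor, and the gas phase value is evaluated as $k_BT/h$ at an assumed temperature of 300 K (the text only says "room temperature"); the high friction value is copied from the table, and all names are illustrative.

```python
K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s


def gas_phase_prefactor(temperature):
    """Return k_B T / h in ps^-1 for a temperature in K."""
    return K_B * temperature / H * 1.0e-12


F_TST = 4.00   # ps^-1, TST prefactor (Table 6-3)
KAPPA = 0.11   # transmission coefficient (Table 6-2)

prefactors = {
    "activated dynamics": KAPPA * F_TST,              # kappa * F_TST
    "high friction limit": 0.32,                      # copied from Table 6-3
    "TST": F_TST,
    "gas phase k_B T / h": gas_phase_prefactor(300.0),  # 300 K is assumed
}

reference = prefactors["activated dynamics"]
for name, value in prefactors.items():
    print(f"{name:22s} {value:5.2f} ps^-1   ratio to activated: {value / reference:5.1f}")
```

Printed this way, the ratios show at a glance that the high friction limit is within a factor of about 1.4 of the activated dynamics result, while TST and the gas phase expression are too large by roughly an order of magnitude.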
One expression for the pre-exponential dynamical factor, often mentioned in early publications using ERT models to describe ion channels [52], is $k_BT/h$, where $h$ is Planck's constant. This expression, designed for the gas phase, is not valid for reactions taking place in dense liquids and yields an $F_p$ that is greatly overestimated.

6.4 Conclusions

A coherent, though incomplete, picture of the permeation process through the gramicidin channel has emerged from molecular dynamics simulations based on detailed atomic models with realistic microscopic interactions. Based on our experience with the gramicidin channel, it is possible to propose a general strategy for theoretical studies of ion transport in complex biological systems. First, the important microscopic interactions should be identified, and an accurate empirical energy function developed, based on available experimental data in combination with accurate ab initio quantum mechanical calculations on small molecular fragments. Second, biased simulation techniques, necessary to overcome the sampling problems, are used to determine the potential of mean force (free energy simulation) and to calculate the transition rate (activated dynamics trajectory). Finally, traditional phenomenologies (ERT and NP) provide a conceptual framework to relate the information gained from the detailed molecular dynamics to the experimental macroscopic observables. Very similar approaches, sometimes involving other quantum chemical methods or biased sampling techniques, are now used to investigate dynamical transport properties in widely different systems (see [96] and references therein).

Despite their sophistication, it is also important to realize that modern computational methods have limitations, particularly in studying complex biological systems. Experimentally measured differences in ion permeabilities can often be accounted for by changes of only a few kJ/mol in the activation energies. It should be clear that such small differences may be beyond the accuracy of present computational methods. Thus, in a theoretical study of ion permeation based on a detailed atomic model, it is less the absolute transport rate than the analysis of the microscopic factors not directly accessible to experimental measurements that is of interest.
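The sensitivity of a rate to small changes in the activation energy follows from the exponential dependence on $\Delta W^{*}/k_BT$. The sketch below simply evaluates $e^{\Delta\Delta W/k_BT}$ for a few barrier shifts, using $k_BT = 2.49$ kJ/mol as quoted in Section 6.3.4.2; the function name and the chosen shifts are illustrative.

```python
import math

KT = 2.49  # kJ/mol, value of k_B T quoted in Section 6.3.4.2


def rate_ratio(delta_barrier, kt=KT):
    """Factor by which a rate changes when the barrier shifts by delta_barrier (kJ/mol)."""
    return math.exp(delta_barrier / kt)


for delta in (1.0, 2.49, 5.0):
    print(f"barrier shift of {delta:4.2f} kJ/mol -> rate changes by a factor of {rate_ratio(delta):.1f}")
```

A shift of one $k_BT$ changes the rate by a factor of e, which is the "factor of 3" quoted earlier, and a shift of 5 kJ/mol already changes it several-fold.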
In trying to understand the transport properties of an ion channel, the potential of mean force, $W(x)$, an important concept in modern discussions of dynamical and rate processes in liquids [86, 94], provides essential insight. Although the description of the transport properties remains incomplete without an investigation of the dynamics, the nature and the overall time-scale of the ion movements inside the channel are often largely determined by the character of the free energy landscape in $W(x)$. To gain more insight, the various contributions to the potential of mean force can be analyzed with the integrated mean force decomposition method; similarly, the friction constant can be linearly decomposed in terms of cross-coupling of the fluctuating forces. The decomposition method, in combination with site-directed mutagenesis experiments, may be very useful to extract the effect of particular residues on the factors affecting permeability. Decomposition of the microscopic quantities in terms of various contributions is important because it allows a detailed understanding of how an ion channel works in terms of its molecular structure.

In future work the growing body of experimental data on the properties of modified gramicidin "channels" will be exploited. Gramicidin A is particularly well suited for such structure-function studies in view of its structural and functional simplicity.
Experiments involving amino acid substitutions will allow the investigation of the influence of particular side-chains on the permeation process [27, 97-99], and on the stability of monomer-monomer association in the lipid membrane [27]. A tartaric acid linked channel showing rapid interruptions in ion flux measurements, similar to the "flickering" observed in biological channels [1], will permit the investigation of channel gating kinetics [100-102]. Simulations of the GA channel embedded in realistic models of the phospholipid bilayer environment will be performed [103].


This chapter demonstrates that modern computational methods and molecular dynamics techniques provide powerful tools to study ion transport in complex biological systems. Already, macromolecular modeling with atomic models is increasingly used, in combination with experimental studies, in the rational design and synthesis of artificial channels [104-106]. As the three-dimensional structures of biological channels become available, it is hoped that the theoretical methods outlined in this chapter will provide a "roadmap" for studying their function.

Acknowledgements

The support of the Medical Research Council of Canada is gratefully acknowledged.

References

[1] Hille, B., Ionic Channels of Excitable Membranes, Sinauer, Sunderland, MA, 1984.
[2] Parsegian, A., Nature 1969, 221, 844-846.
[3] Neher, E., Sackmann, B., Sci. Am. 1992, 266, 44-51.
[4] Catterall, W. A., Science 1988, 242, 50-61.
[5] Heineman, S. H., Terlau, H., Stuhmer, W., Imoto, K., Numa, S., Nature 1992, 356, 441-443.
[6] Galzi, J. L., Devillers-Thiery, A., Hussy, N., Bertrand, S., Changeux, J. P., Bertrand, D., Nature 1992, 359, 500.
[7] Guy, H. R., Seetharamulu, P., Proc. Natl. Acad. Sci. USA 1986, 83, 508-512.
[8] Guy, H. R., Conti, F., TINS 1990, 13, 201-206.
[9] Durrel, S. R., Guy, H. R., Biophys. J. 1992, 62, 238-250.
[10] Boxer, A., Bogusz, S., Busath, D., Protein Eng. 1992, 5, 285-293.
[11] Bogusz, S., Busath, D., Biophys. J. 1992, 62, 19-21.
[12] Deisenhofer, J., Epp, O., Miki, K., Huber, R., Michel, H., Nature 1985, 328, 618-624.
[13] Henderson, R., Baldwin, J. M., Ceska, T. A., Zemlin, F., Beckmann, E., Downing, K. H., J. Mol. Biol. 1990, 213, 899-929.
[14] Weiss, M. S., Wacker, T., Weckesser, J., Welte, W., Schultz, G. E., FEBS Lett. 1990, 267, 269-272.
[15] Cowan, S. W., Schirmer, T., Rummel, G., Steirt, M., Ghosh, R., Paupit, R. A., Jansonius, J. N., Rosenbusch, J. P., Nature 1992, 358, 727-733.
[16] Brisson, A., Unwin, P. N. T., Nature 1985, 315, 474-477.
[17] Demin, V. V., Grishin, E. V., Kovalenko, V. A., Spadar, S. N., in: Chemistry of peptides and proteins, Vol. 3, Ovchinnikov, Y. A., Voelter, W., Bayer, E., Ivanov, V. T. (eds.), W. de Gruyter and Co., Berlin, Germany, 1986, pp. 363-370.
[18] Chappel, J. B., Crofts, A. R., Biochem. J. 1965, 95, 393-402.
[19] Pressman, B. C., Proc. Natl. Acad. Sci. USA 1965, 53, 1076-1080.
[20] Hladky, S. B., Haydon, D. A., Biochim. Biophys. Acta 1972, 274, 294-312.
[21] Hladky, S. B., Haydon, D. A., Curr. Top. Membr. Transp. 1984, 21, 327-372.


[22] Dani, J. A., Levitt, D., Biophys. J. 1981, 35, 501-508.
[23] Rosenberg, P. A., Finkelstein, A., J. Gen. Physiol. 1978, 72, 327-340.
[24] Finkelstein, A., Andersen, O. S., J. Membr. Biol. 1981, 59, 155-171.
[25] Andersen, O. S., Annu. Rev. Physiol. 1984, 46, 531-548.
[26] Andersen, O. S., Procopio, J., Acta Physiol. Suppl. 1980, 481, 27-35.
[27] Becker, M. D., Koeppe II, R. E., Andersen, O. S., Biophys. J. 1992, 62, 25-27.
[28] Mackay, D. H., Berens, P. H., Wilson, K. R., Biophys. J. 1983, 46, 229-248.
[29] Kim, K. S., Clementi, E., J. Am. Chem. Soc. 1985, 107, 5504-5513.
[30] Urry, D. W., Venkatachalam, C. M., J. Comp. Chem. 1983, 4, 461-469.
[31] Chiu, S. W., Subramaniam, S., Jakobsson, E., McCammon, J. A., Biophys. J. 1989, 56, 253-261.
[32] Chiu, S. W., Jakobsson, E., Subramaniam, S., McCammon, J. A., Biophys. J. 1991, 60, 273-285.
[33] Etchebest, C., Pullman, A., J. Biomol. Struct. Dyn. 1986, 3, 805-825.
[34] Pullman, A., Q. Rev. Biophys. 1987, 20, 173-200.
[35] Jordan, P. C., Transport through membranes: Carriers, channels and pumps, A. Pullman et al. (eds.), Kluwer Academic Publisher, 1988, pp. 237-251.
[36] Aqvist, J., Warshel, A., Biophys. J. 1989, 56, 171-182.
[37] Roux, B., Karplus, M., Biophys. J. 1988, 53, 297-309.
[38] Roux, B., Karplus, M., Biophys. J. 1991, 59, 961-981.
[39] Roux, B., Karplus, M., J. Phys. Chem. 1991, 95, 4856-4868.
[40] Roux, B., Karplus, M., J. Am. Chem. Soc. 1993, 115, 3250-3262.
[41] Urry, D. W., Proc. Natl. Acad. Sci. USA 1971, 68, 672-676.
[42] Urry, D. W., Walker, J. T., Trapane, T. L., J. Membr. Biol. 1982, 69, 225-231.
[43] Urry, D. W., Prasad, K. U., Trapane, T. L., Proc. Natl. Acad. Sci. USA 1982, 79, 390-394.
[44] Urry, D. W., Trapane, T. L., Prasad, K. U., Science 1983, 221, 1064-1067.
[45] Arseniev, A. S., Bystrov, V. F., Ivanov, T. V., Ovchinnikov, Y. A., FEBS Lett. 1985, 186, 168-174.
[46] Bystrov, V. F., Arseniev, A. S., Tetrahedron 1988, 44, 925-940.
[47] Cornell, B. A., Separovic, F., Baldassi, A. J., Smith, R., Biophys. J. 1988, 53, 67-76.
[48] Smith, R., Thomas, D. E., Separovic, F., Atkins, A. R., Cornell, B. A., Biophys. J. 1989, 56, 307-314.
[49] Nicholson, L. K., LoGrasso, P. V., Cross, T. A., J. Am. Chem. Soc. 1989, 111, 400-401.
[50] Nicholson, L. K., Cross, T. A., Biochem. 1989, 28, 9379-9385.
[51] Wooley, G. A., Wallace, B. A., J. Membr. Biol. 1992, 129, 109-136.
[52] Lauger, P., Biochim. Biophys. Acta 1973, 311, 423-441.
[53] Levitt, D. G., Ann. Rev. Biophys. Chem. 1987, 15, 29-57.
[54] Furois-Corbin, S., Pullman, A., Biochim. Biophys. Acta 1989, 984, 339-350.
[55] Furois-Corbin, S., Pullman, A., Biophys. Chem. 1991, 39, 153-159.
[56] Brooks III, C. L., Karplus, M., Pettitt, B. M., Advances in Chemical Physics Vol. LXXI, Prigogine, J., Rice, S. A. (eds.), John Wiley & Sons, New York, 1988.
[57] Helfand, E., Phys. Rev. 1960, 119, 1-9.
[58] Cooper, K. E., Jakobsson, E., Wolynes, P. G., Prog. Biophys. Mol. Biol. 1985, 46, 51-96.
[59] The net charge per unit area, $Q_{\mathrm{net}}$, necessary to generate the transmembrane potential is $\epsilon_M \Delta V/\Delta x$, where $\epsilon_M$ and $\Delta x$ are the dielectric constant and the thickness of the hydrocarbon region. For a typical potential of 100 millivolts the net charge per area is approximately one unit charge per square of (25 nm)². The charge imbalance is distributed within a distance comparable to the Debye length from the membrane, i.e., on the order of (~0.9 nm). Since a physiological concentration of KCl (150 mM) represents approximately one K+ cation and one Cl- anion per volume of (2.2 nm)³, the Nernst potential is caused by a strikingly small accumulation of net charge relative to the average density. Reproducing the membrane voltage with a detailed simulation would require a prohibitively large number of counter ions.
[60] Hagler, A. T., Huler, E., Lifson, S., J. Am. Chem. Soc. 1976, 96, 5319-5335.
[61] Burkert, U., Allinger, N. L., Molecular Mechanics, American Chemical Society, Washington, D. C., 1982.
[62] Momany, F. A., McGuire, R. F., Burgess, A. W., Scheraga, H. A., J. Am. Chem. Soc. 1975, 79, 2361-2381.
[63] Jorgensen, W. L., Tirado-Rives, J., J. Am. Chem. Soc. 1988, 110, 1657-1666.
[64] Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profeta Jr., S., Weiner, P., J. Am. Chem. Soc. 1984, 106, 765-784.
[65] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., Karplus, M., J. Comput. Chem. 1983, 4, 187.
[66] Džidić, I., Kebarle, P., J. Phys. Chem. 1970, 74, 1466-1474.
[67] Mackerell, A. D., Karplus, M., J. Phys. Chem. 1991, 95, 10559-10560.
[68] Reiher III, W. E., Theoretical studies of hydrogen bonding, Ph. D. Thesis, Harvard University, Cambridge, Massachusetts, 1985.
[69] Jorgensen, W. L., Impey, R. W., Chandrasekhar, J., Madura, J. D., Klein, M. L., J. Chem. Phys. 1983, 79, 926-935.
[70] Goodfellow, J. M., Proc. Natl. Acad. Sci. USA 1982, 79, 4977.
[71] Hinton, J. F., Fernandez, J. Q., Shungu, D. C., Whaley, W. L., Koeppe II, R. E., Millett, F. S., Biophys. J. 1988, 54, 527-533.
[72] Binkley, J. S., Frisch, M., Krishnan, R., DeFrees, D. J., Schlegel, H. B., Whiteside, R. A., Fluder, E., Seeger, R., Pople, J. A., Gaussian 82, Carnegie-Mellon University Quantum Chemistry Publishing Unit, Pittsburgh PA 1982.
[73] Frisch, M. J., Binkley, J. S., Schlegel, H. B., Raghavachari, K., Melius, C. F., Martin, R. L., Steward, J. J. P., Bobrowicz, F. W., Rohlfing, C. M., Kahn, L. R., Defrees, D. J., Seeger, R., Whiteside, R. A., Fox, D. J., Fleuder, E. M., Pople, J. A., Gaussian 86, Carnegie-Mellon University, Quantum Chemistry Publishing Unit, Pittsburgh PA 1986.
[74] Hariharan, P. C., Pople, J. A., Theor. Chim. Acta 1973, 23, 213-222.
[75] Dyke, T. R., Mack, K. R., Muenter, J. S., J. Chem. Phys. 1977, 66, 498-510.
[76] Curtiss, L. A., Frurip, D. J., Blander, M., J. Chem. Phys. 1979, 71, 2703-2711.
[77] LoGrasso, P. V., Nicholson, L. K., Cross, T. A., J. Am. Chem. Soc. 1989, 111, 1910-1912.
[78] McCammon, J. A., Northrup, S. H., Karplus, M., Levy, R. M., Biopolymers 1980, 19, 2033-2045.
[79] McQuarrie, D. A., Statistical Mechanics, Harper and Row, New York, 1976.
[80] Zwanzig, R. W., J. Chem. Phys. 1954, 22, 1420-1426.
[81] Tobias, D. J., Brooks III, C. L., Chem. Phys. Lett. 1987, 142, 472-476.
[82] Eisenman, G., Horn, R., J. Membr. Biol. 1983, 76, 197-225.
[83] Vertenstein, M., Ronis, D., J. Chem. Phys. 1986, 85, 1628-1649.
[84] Glasstone, S., Laidler, K. J., Eyring, H., Theory of rate processes, McGraw-Hill, New York, 1941.
[85] In transition state theory, the pre-exponential factor is $F_p = \sqrt{k_BT/2\pi m}\,\left[\int e^{-[W(x)-W(x_w)]/k_BT}\,dx\right]^{-1}$, where $m$ is the mass of the ion and $x_w$ is the position of the energy well in $W(x)$. The familiar expression, where $F_p$ is related to the oscillation frequency at the bottom of the reactant well, is obtained by approximating the integral over the reactant well with a quadratic expansion around $x_w$.
[86] Chandler, D., J. Chem. Phys. 1978, 68, 2959-2970.


6 Theory of Transport <strong>in</strong> Ion Channels 169[87] Chandler, D., Introduction to Modern Statistical Mechanics, Oxford University Press,1987.[88] Berne, B. J., <strong>in</strong>: Multiple Time Scales, Academic Press, New York, 1985, pp. 419-436.[89] Lauger, P., Stephan, W., Frehland, E., Biochim. Biophys. Acta 1980, 602, 167-180.[90] Kramers, H. A., Physica 1940, 7, 284-304.[91] Bergsma, J. P., Reimers, J. R., Wilson, K. R., Hynes, J. T., 1 Chem. Phys. 1986, 85,5625 - 5643.[92] Berne, B. J., Tuckerman, M., Straub, J., Bug, A. L. R., 1 Chem. Phys. 1990, 92,5084- 5095.[93] Wilson, M. A., Pohorille, A., Pratt, L. R., 1 Chem. Phys. 1985, 83, 5832-5836.[94] Hynes, J. T., <strong>in</strong>: Theory of chemical reaction dynamics, Baer, M. (ed.), CRC Press, BocaRaton, FL, USA, 1985, pp. 171-235.[95] Grote, R. F., Hynes, J. T., J Chem. Phys. 1980, 73, 2715-2732.[96] Meyer, M., Pontikis, V., (eds.) Proceed<strong>in</strong>gs of the NATO Advanced Study Institute onComputer Simulation <strong>in</strong> Material Science: Interatomic Potentials, Simulations Techniquesand Applications. Kluwer Academic Press, 1991.[97] Mazet, J. L., Andersen, 0. S., Koeppe 11, R. E., Biophys. 1 1984, 45, 263-276.[98] Koeppe 11, R. E., Andersen, 0. S., Maddock, A. K., <strong>in</strong>: Transport through membranes:Carriers, channels and pumps, Pullman, A. et al. (eds.), Kluwer Academic Publisher,1988, 133-145.[99] Durk<strong>in</strong>, J. T., Koeppe 11, R. E., Andersen, 0. S., J Mol. Biol. 1990, 211, 221-234.[loo] Stankovic, C. J., He<strong>in</strong>emann, S. H., Delf<strong>in</strong>o, J. M., Sigworth, F. J., Sigworth, Schreiber,S. L., Science 1989, 214, 813-817.[loll Stankovic, C. J., He<strong>in</strong>emann, S. H., Schreiber, S. L., J Am. Chem. SOC. 1990, 112,3702- 3704.[lo21 Crouzy, S., Woolf, T. B., Roux, B., Biophys. 1 1994, 67, 1370-1386.[lo31 Woolf, T., Roux, B., Proc. Natl. Acad. Sci. USA, 1994, 92 11631-11635.[lo41 Chung, L. A., Lear, J. D., deGrado, W. 
F., Biochemistry 1992, 31, 6608-6616.[lo51 Tomich, J. M., Grove, A., Montal, M., Proc. Natl. Acad. Sci. USA 1991, 88, 6418-6422.[lo61 Madison, V., Oiki, S., Montal, M., Prote<strong>in</strong> Struc. Funct. Genet. 1990, 8, 226-236.


7 Molecular Modelling and Simulations of Major Histocompatibility Complex Class I Protein-Peptide Interactions

Christopher J. Thorpe and David S. Moss
Laboratory of Molecular Biology, Department of Crystallography, Birkbeck College, Malet Street, London WC1E 7HX, England

Contents
7.1 Introduction
7.2 The Structure of the MHC Molecule
7.3 The Structure of the Peptide in the Groove
7.4 Rationale Used for the Modelling and Simulation of MHC Class I Molecules
7.5 General Principles of Modelling MHC Class I-Peptide Interactions
7.6 Modelling and Simulation of an Influenza Virus Peptide with the Human MHC Class I Molecule HLA-Aw68
7.7 Modelling of an Epstein-Barr Virus Nuclear Antigen Peptide with HLA-B27 Sub-Type Molecules
7.8 Conclusions
References


7.1 Introduction

The Major Histocompatibility Complex (MHC) is a gene cluster and is, to date, the most polymorphic section of the genome to have been encountered. The class I and class II molecules of the MHC perform a pivotal role in the cellular, or T-cell mediated, immune response. The function of any immune system is self/non-self recognition to defeat invasion by pathogens. The immune systems of vertebrates are divided into two branches, humoral and cellular. The humoral arm of the immune system involves the antibody response to pathogens in the extracellular space, and the cellular arm encompasses the response to intracellular pathogens by T-cells and MHC molecules. The function of the MHC molecules in the cellular immune system is to present antigen, in the form of a processed peptide, to T-lymphocytes restricted for the combination of MHC molecule and peptide [1].

The MHC was discovered in 1936 by Peter Gorer during early research into the phenomenon of tumor rejection in mice [2, 3]. As a consequence of the nature of its discovery, the MHC is somewhat anachronistically named by the term Histocompatibility, as it was considered to be the factor that determined tissue type compatibility. Tissue incompatibility is, however, merely an extreme case of self/non-self recognition, and one for which the MHC was not naturally intended. However, MHC evolution has produced a highly specific system that may precisely recognise an antigen when complexed to a specific MHC molecule.
Thus if the donor is not of the exact tissue haplotype of the recipient, the T-cell receptors, whose specificity is solely for the MHC molecules of the host, will recognise the MHC molecules of the donor as non-self and start an immune response against any tissues bearing them. This phenomenon is termed allorecognition. Graft rejection is common due to two features that make the cellular immune system highly efficient: the class I molecules are ubiquitous in their tissue distribution, and are highly allelic within a population. Thus the chance of finding an exact haplotype match, other than among siblings, is small.

Since structural data for the class II molecule has only recently been published [4], the focus of this review will be upon the class I molecule and its interaction with peptide. Recent crystallographic data on the structure of the class II molecule demonstrates that it has a high structural similarity to the class I molecule, but crystals of a class II complex with a single antigenic peptide show that the class II molecule has an entirely different mode of peptide chelation to that of class I (Larry Stern, personal communication). The interaction between peptides and MHC class I molecules has been well characterised by X-ray crystallography [5-7], but the vast diversity of MHC class I molecules, and the universe of peptides of different length and sequence that may be bound by any one MHC class I allele [8], allows great scope for modelling and simulations.


Figure 7-1. Diagram to represent the intracellular trafficking of peptides and loaded MHC class I molecules. The peptide fragments are derived from antigenic proteins in the cytosol by proteolysis (box labelled P). The peptide is then transported into the endoplasmic reticulum (ER) by the TAP transporters. Within the ER the three components of the nascent MHC class I molecule come together and assemble (see Figure 7-2). The folded MHC class I molecule then egresses from the ER through the default pathway to the cell surface, where it is expressed with mature sugar moieties and replete with the antigenic peptide for presentation to the CD8+ cytolytic T-lymphocytes.

Figure 7-2. Diagrammatic representation of the folding pathway of the MHC class I heavy chain (black) with peptide (white) and the light chain, β2-microglobulin (grey). Both possible folding pathways have been shown, with peptide bound second in the top pathway and first in the lower pathway. The molecule represented by the striped object, which is proposed to be associated with the MHC class I heavy chain, is the p88 chaperone molecule (calnexin). This molecule is believed to dissociate from the class I molecule before the folding of the MHC molecule is complete. Then either peptide or β2-microglobulin binds to the heavy chain and the molecule is presumed to fold further to form a more compact state before the addition of the final component of the tripartite assembly and the final point of the folding pathway. The sugar moiety appears to make no difference to the overall structure of the MHC class I molecules, as recombinant mouse class I molecules produced in insect cells and human class I molecules over-expressed in E. coli appear to have an identical fold to the human class I molecules purified from human lymphoblastoid cell lines.

The class I molecules of the MHC survey protein synthesis in the cytosol as a means of signalling cell subversion by viral pathogens (Figure 7-1). Viral proteins synthesized in the cytosol are proteolytically degraded and are actively transported by the TAP molecules into the endoplasmic reticulum (ER). The TAP molecules are members of the ATP binding cassette super-family and are related to molecules such as the multiple drug resistance transporters [9]. The peptide binds to nascent class I molecules in the ER, and the tripartite assembly of MHC class I heavy chain, β2-microglobulin and peptide (Figure 7-2) egresses to the cell surface via the default pathway to be presented to the cytotoxic T-lymphocyte (CTL) population, as a prelude to the killing of the infected cell (see Figure 7-1). An excess of MHC class I molecules is maintained in the ER to facilitate a rapid response to pathogens. When there is no viral infection in the cell, self peptides that fit the class I molecule, produced by the degradation of intracellular proteins in the cytosol or ER, will be presented on the cell surface.
Aberrant T-cell response to these self-peptides is avoided by clonal deletion [10] of T-cell receptor genes that recognise self-peptides during the process of thymic education.

Due to the different chemical and steric properties of the peptide binding clefts of different alleles, the peptides presented by class I molecules show different patterns of residues required for binding at positions along the peptide backbone. This pattern of residues is termed a “motif”, and where one or two residues are always found at one position in a large number of peptides this position is termed an anchor [11]. One particularly striking motif is that of peptides presented by HLA-B27, which has an absolute specificity for an arginine residue at position 2 (P2) of the peptide and a preference for either a positively charged straight chain residue (Lys/Arg) or a moderate sized aliphatic residue (Leu/Ile/Val) at the C-terminal position (PC) of the peptide [12-15]. In the HLA-B27 motif, the arginine at P2 is termed the primary anchor, and the PC residue and other positions that demonstrate a preference for a specific group of side chains are termed secondary anchors (Figure 7-3).

Figure 7-3.
(a) A molecular image of the model peptide (RRIKAITLK) from the 2.1 Å resolution crystal structure of HLA-B*2705 to display the concept of anchor residues in the MHC class I associated peptides. The side chain atoms of positions that demonstrate a preference for a small group of amino acids are coloured in mid-grey (P3 and P9) and are termed secondary anchors. The primary anchor in this allele (P2: coloured black) is unusual in having a specificity for only one amino acid: arginine. Co-ordinates kindly provided by Dean Madden, Joan Gorga, Jack Strominger & Don Wiley.
(b) Motif data for HLA-B*2705 demonstrating the totally conserved arginine side chain at P2, the predominance of hydrophobic side chains at P3, and the dichotomy of preferences at the secondary anchor at P9, which prefers both long aliphatic positively charged residues and small hydrophobic residues.

P1 P2 P3 P4 P5 P6 P7 P8 P9
R R F K I I R R R
K I E V L E K K
S L D P V D E N
F V Q A P T D Y
G A N G A I Q L
A W T R T V H A
Y S K S G T
S W E L
F
I
V
A
G
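The anchor concept described above can be illustrated with a minimal sketch. It encodes only the HLA-B27 preferences stated in the text (Arg at P2; Lys/Arg or Leu/Ile/Val at the C-terminal position) and tests a peptide against them. The function and constant names are hypothetical, and the check is a simplification for illustration, not a binding predictor used in this chapter.

```python
# Illustrative sketch only: anchor sets are transcribed from the prose description
# of the HLA-B27 motif; names such as matches_b27_motif are hypothetical.

P2_ANCHOR = {"R"}                          # primary anchor: arginine at P2
PC_ANCHOR = {"K", "R", "L", "I", "V"}      # secondary anchor at the C-terminal position


def matches_b27_motif(peptide: str) -> bool:
    """Return True if the peptide has Arg at P2 and an allowed C-terminal residue."""
    peptide = peptide.upper()
    if len(peptide) < 2:
        return False
    # Positions are 1-based in the text, so P2 is index 1; PC is the last residue.
    return peptide[1] in P2_ANCHOR and peptide[-1] in PC_ANCHOR


if __name__ == "__main__":
    for candidate in ("RRIKAITLK", "RAIKAITLK", "RRIKAITLD"):
        print(candidate, matches_b27_motif(candidate))
```

The model peptide RRIKAITLK from Figure 7-3 satisfies both anchors, whereas replacing the arginine at P2 or the lysine at PC makes the check fail. Real motifs are statistical preferences rather than strict rules, so a set-membership test of this kind is only a first filter.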


The class I molecule requires peptide to complete its folding, and in general class I molecules are only observed on the cell surface loaded with peptide. However, at low temperatures empty class I molecules have been observed on the cell surface, suggesting that empty dimers may be stabilised by low temperatures [16]. This emergence of empty class I molecules in the cold is clearly not the typical in vivo situation.

The peptide mediated folding and cell surface expression of the class I molecule is inextricably linked with the subtle effect of transporter allelisms, as demonstrated by the cim (class I modifier) phenomenon described by the group of Jonathan Howard, where the rat class I RT-1Aa molecule has reduced surface expression when transporters of a different cim haplotype are transfected and an apparently different type of peptide is transported [17]. A similar phenomenon has recently been observed in the case of the HLA-B5/B35 family, where a difference of eight amino acids in the α2 domain determines the difference between a fast, efficient assembly phenotype, such as HLA-B35/B53, and a slow, inefficient assembly phenotype, HLA-B51/B52/B78 (Andrew McMichael, personal communication). All of these alleles have highly similar motifs and identical anchor residues. It is considered that a slow assembling phenotype may render HLA-B51 susceptible to transporter mismatching, and this hypothesis is partially supported by the inability of HLA-B51 specific CTL to recognise different B-cell lines expressing HLA-B51 molecules of identical sequence.
It has been observed that both the TAP1 and TAP2 transporters in humans are allelic, but that in an ethnically heterogeneous population no distinct pairings that had been maintained together could be observed (John Trowsdale, personal communication). This suggests that transporter allelism, although important in determining the classes of peptides which the class I molecules present [18], does not appear to be genetically linked to class I allelisms in humans as was first suggested. Originally, from findings in the cim system in rats, the transporters were deemed to fine-tune the specificity of binding of the class I molecules. In this fine-tuning model the MHC class I molecules and the TAP molecules would need to co-evolve, and to date in humans or mice there is no direct evidence for this.

Due to the diverse nature of the peptide populations that can be derived from viral genomes, and moreover due to the relatively fast evolution of viruses, the MHC class I genes are under a high selective pressure. Before the discovery of any of the MHC gene or amino acid sequences, J. B. S. Haldane, in a typically visionary and synthetic statement, suggested that disease-resistance systems, such as the MHC, should devise not only polymorphisms but also mechanisms by which allelic novelty could be generated at a high rate [19]. The MHC has a high speed of evolution and appears from the sequence evidence to have more than one mode of evolution. In the HLA-B locus there is strong evidence for evolution by way of the mechanism of segmental exchange.
Recent data collected from several isolated Amerindian tribes eloquently displays the segmental exchange between alleles within a population, leading to the generation of new alleles that co-exist in the population with the “parent” alleles [20, 21]. The Americas were populated relatively recently by the crossing of the paleo-Indian populations from the Asian land mass via the Bering land bridge approximately 11 000 to 40 000 years ago. Analysis of North American tribal units and Eskimo populations, which would have been derived from similar mongoloid stock, has demonstrated that none of the new alleles are present in their populations but that similar “starting” haplotypes can be observed. This suggests that the MHC molecules of these tribes, with which there has been negligible admixture of the Old World gene pool, have evolved rapidly to present peptides derived from pathogens restricted to their habitat. Modelling of these recombinant MHC molecules can give some insight into the reasons why certain gene conversion events have occurred by providing three dimensional information about their peptide binding preferences. In addition the predominance of certain sub-type alleles in ethnic groups may lead to information, on the scale of peptide binding, about differential disease susceptibility between ethnic groups. Modelling of known viral epitopes between sub-type molecules can give intimate details of the forces of peptide selection by MHC molecules.

In the majority of cases the uptake of one MHC class I molecule by the whole population would be detrimental to the function of the immune system of the population.
In one case, however, there does appear to be a dominance of one allele occurring due to selective pressures from one infective agent. HLA-B53 affords protection from falciparum malaria by being able to present CTL epitopes derived from the liver-stage-specific antigen-1 (LSA-1) of the infective agent Plasmodium falciparum [22]. For the past quarter of a century the protective or causative associations of diseases with certain MHC molecules have been postulated from genetic analysis. However, in few diseases has the causal link between the HLA molecule and the phenomenon observed been characterised or understood. If one is to attempt to manipulate an immune response by the use of an immunising vaccine, one must first have an idea of the function of the antigenic structures which are being used or mimicked in the vaccine. Recent research into the propensity of Gambian individuals to gain immunity to severe falciparum or pernicious malaria has demonstrated a protective link between the disease and African haplotypes which contain HLA-B53. In pernicious malaria, caused by Plasmodium falciparum, the infected red blood cells (RBCs) develop strange knob-like surface protrusions which facilitate the adhesion of the infected RBCs to endothelial tissues of the circulatory system. The resultant occlusion of the vascular system causes severe damage primarily to the kidneys, liver, brain and gastrointestinal tract. RBCs do not express class I molecules on their surfaces and thus are not susceptible to CTL responses against pathogens.
The alternative erythrocytic stage targets for CTL action are macrophages which have ingested malarial particles. However, the removal by CTL action of infected macrophages, which are themselves part of the immune system, being one group of the MHC class II bearing antigen presenting cells in the T-cell mediated immune response, would be a counter-productive measure. Therefore the most efficacious point at which an immune response could be mounted against P. falciparum would be at the pre-erythrocytic stage. Fortunately for Gambian individuals carrying the HLA-B53 gene, this is the point where CTL mediated attack on the pathogens occurs, by the presentation of liver-stage antigens by HLA-B53. The highly similar molecule HLA-B35, which also belongs to the HLA-B5 Creg (Cross-Reacting Group) family, has an extremely similar though slightly more restrictive motif, which differs from the motif of HLA-B53 at the PC end of the peptide, and consequently HLA-B35 cannot present the same liver stage antigen peptides which give protection to the HLA-B53 bearing individuals. Thus modelling and comparison of these two molecules will give us information on how MHC polymorphisms affect disease susceptibility in individuals on the scale of peptide binding.

7.2 The Structure of the MHC Molecule

The structure of the class I molecule is one of the principal icons of molecular immunology. The class I heavy chain comprises three extracellular domains, a segment, termed the linking peptide, which separates the extracellular domains from a transmembrane region, the transmembrane region itself and a cytoplasmic tail. The light chain, β2-microglobulin, is encoded outside the MHC, typically on a different chromosome. Structural data exists for the extracellular portion of the class I molecule, which encompasses the α1, α2 and α3 domains and β2-microglobulin.
The linking peptide, transmembrane region and cytoplasmic tail are absent from all the class I crystallographic structures. The extracellular portion of the molecule may effectively be divided into two parts: the membrane distal antigen binding domains, α1 and α2, and the membrane proximal domains, α3 and β2-microglobulin, which are immunoglobulin-like in their nature. Of the membrane proximal domains, α3 has a biochemical function besides its structural function, containing the binding site for the CD8 cell surface accessory molecule, which acts as a coreceptor with the T-cell receptor (TCR) for the class I molecule [23].

To date the structures of three human leukocyte antigen (HLA) class I alleles have been resolved by X-ray crystallography and cryo-crystallography in the laboratory of Don Wiley at the Howard Hughes Medical Institute at Harvard. The structures of HLA-A2 [24-27] (HLA-A*0201), HLA-Aw68 [28-30] (formerly HLA-A28, HLA-A*6801), and HLA-B27 [3, 31] (HLA-B*2705) at various resolutions have enhanced our knowledge of the structure of the HLA molecule and its interaction with peptide. In addition the structure of the mouse class I molecule H-2Kb has been solved with two peptides. One complex, with an octamer peptide derived from the Vesicular Stomatitis virus Nucleoprotein, was solved concurrently at the laboratories of Stan Nathenson [32] and Ian Wilson [33]. A second structure of H-2Kb, with a nonamer peptide from the Sendai virus Nucleoprotein, has demonstrated the high structural similarity in both MHC molecule and peptide in structures of peptides of different length.

The view of the HLA-A2 molecule in Figure 7-4 clearly shows the peptide binding site formed in the cleft that is walled by the α-helices from the α1 and α2 domains and has the β-sheet as its floor. It can be observed in Figure 7-5 that the overall fold of the molecule is not symmetrical in the face that will be displayed for recognition by the T-cell. On the finer level the molecule is highly asymmetrical in the TCR recognition site due to the poor sequence identity between the α-helices of the α1 and α2 domains; however, a crude steric repulsion of any T-cell approaching the MHC molecule with the wrong orientation will speed up the recognition process.

Some finer observations drawn from the structures are of immense importance in performing and analysing molecular simulations of MHC class I-peptide interaction. Some of the most important and interesting observations can be drawn from the initial structures of HLA-A2, HLA-Aw68 and HLA-B27 with collections of endogenous peptides. The material used for these studies was derived from human cell lines, and the peptide binding grooves of the crystallised molecules contained collections of peptides that were representative of the self-peptides present in the cytosol of the cell.

Figure 7-4. Molecular model for the interaction of HLA-A*0201 with peptide. The peptide, shown as a pink tube with side chains in ball and stick representation, is sited in the antigen binding cleft between the α-helices of the α1 and α2 domains (coloured red and green respectively). The domains coloured dark blue and light blue are α3 and β2-microglobulin respectively and are membrane proximal in their orientation to the antigen presenting cell (APC). Both the N- and C-termini are heavily buried within the protein. In the 2.1 Å structure of HLA-B*2705 it has been demonstrated that 48% of the 2003 Å² of buried surface of the RRIKAITLK model peptide would be buried by the chelation of alanine residues from P1-P3 and P8-P9 with no residues built into positions P4-P7. This buried surface was increased to 57% by the substitution of AlaP2 to ArgP2. This suggests that the predominant direct interactions are at the termini and that the central bulge of the peptide is raised on a solvent bed to maximise its contact with the T-cell receptor (TCR). (Colour illustration see page XV).

Figure 7-5. Side view of the model of HLA-A*0201 with peptide showing the asymmetry of the molecule. In the immunoglobulin-like α3 and β2-microglobulin domains the asymmetry of the packing gives rise to an atypical immunoglobulin (Ig) pairing. A slight shift in domain disposition, with respect to that in HLA-A2, in the membrane proximal Ig-like domains has been observed in the structures of HLA-B27 and H-2Kb. (Colour illustration see page XV).
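The buried-surface percentages quoted in the caption of Figure 7-4 can be turned into absolute areas with simple arithmetic. The short sketch below does only that; it uses the three numbers given in the caption (2003 Å², 48% and 57%), and the variable and function names are hypothetical.

```python
# Arithmetic on the figures quoted in the Figure 7-4 caption; nothing here is
# computed from coordinates, and all names are illustrative.

TOTAL_BURIED_SURFACE = 2003.0  # Å^2, buried surface of the RRIKAITLK model peptide


def buried_area(fraction: float, total: float = TOTAL_BURIED_SURFACE) -> float:
    """Convert a fraction of the buried surface into an area in Å^2."""
    return fraction * total


ala_termini = buried_area(0.48)      # alanines at P1-P3 and P8-P9 only
arg_p2_termini = buried_area(0.57)   # same, with AlaP2 replaced by ArgP2
gain_from_arg_p2 = arg_p2_termini - ala_termini

if __name__ == "__main__":
    print(f"Ala termini:        {ala_termini:.1f} A^2")
    print(f"With Arg at P2:     {arg_p2_termini:.1f} A^2")
    print(f"Gain from Arg side chain: {gain_from_arg_p2:.1f} A^2")
```

On these figures the terminal alanine residues alone account for roughly 961 Å², and the arginine side chain at P2 adds about 180 Å² more, which is consistent with the caption's point that the direct interactions are concentrated at the termini.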
One striking feature of these structures is that the amount of disorder in both the main chain and the side chain atoms of the molecule is very low. If one supposes that, for the complex to form, both MHC molecule and peptide must move slightly to accommodate one another, then the different conformations for the side chains and main chains would be averaged out over the different combinations of MHC molecules and peptides present in the crystal. This would lead to the resulting “consensus” structure having weak electron density, poor occupancy and high temperature factors for some of the side chains. However, from the initial structures of HLA-A2 and HLA-Aw68 it was apparent that the side chains of the MHC molecules were distinctly visible in the electron density. The peptide, however, was not clearly discernible in the additional electron density in the cleft, and these two observations suggest that it is the peptide that plays the major part in adapting to the environment of the MHC molecule, and that the MHC molecule makes only small movements to adapt to different peptides. This supposition is borne out by analysis of one structure against the other, which demonstrates that the side chain orientations of the majority of positions are nearly identical (Figures 7-6 and 7-7). This demonstrates the inherent adaptability of the fold of the protein. Alleles complex peptides in highly similar manners so that the fold of the molecule is virtually identical.
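One way to make the comparison of side chain orientations concrete is to compare the χ1 torsion angle (N, Cα, Cβ, Cγ) of equivalent residues in two structures. The sketch below is an assumption about how such a comparison could be set up, not the analysis used for Figures 7-6 and 7-7; the coordinates in the usage section are invented for illustration, and the function names are hypothetical.

```python
import math

# Minimal sketch: torsion angle from four atom positions, plus a wrapped
# angular difference. Coordinates are (x, y, z) tuples in Å.


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


if __name__ == "__main__":
    # Invented coordinates standing in for N, CA, CB, CG of one residue
    # in two superficially different structures.
    chi1_a = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
    chi1_b = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-0.2, 1, 1))
    print(chi1_a, chi1_b, angular_difference(chi1_a, chi1_b))
```

Because a torsion angle is an internal coordinate, the comparison does not depend on how the two structures are superimposed, which makes it a convenient complement to an RMS deviation calculated after fitting.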
An MHC molecule may bind several different peptides that may, apart from the conserved anchor residues, have highly divergent sequences, and yet when the peptide is removed hypothetically from the structure there are no discernible differences between fine features, such as side chain orientation, in the MHC molecules. In the complexation of different inhibitors or substrates by proteases, there are often subtle but distinct conformational changes. In a system such as the MHC, where recognition by a third body, the T-cell receptor, is required to give the system a functionality, a change in the structure of the MHC molecule may remove the response of the T-cell or may indeed cause an aberrant response; it is therefore a sign of the highly adaptive nature of the MHC molecule that this sort of allogenic reaction appears not to be the norm.

This high similarity of the class I molecule between alleles and between structures of an individual allele works both for and against those who wish to model and simulate these structures. The high similarity of the crystal structures aids the initial building and optimisation of the structure. Many knowledge-based modelling programs take note of the orientation of the Cα to Cβ, and Cβ to Cγ bonds, where appropriate, for building the side chain geometries of non-homologous residues. It can be observed from an overlay of the residues involved in the region surrounding the B-pocket of HLA-A2, HLA-Aw68 and HLA-B27 that one structure may be accurately modelled from another (see Figures 7-6 and 7-7). Indeed the analysis of the model for HLA-Aw68 built from the structures of HLA-A2 and HLA-B27 demonstrates the fit of the model structure to its crystallographic counterpart (Figure 7-8). The analysis of RMS deviations between HLA-A2 and HLA-Aw68

Figure 7-6. Comparison of residues in the region surrounding the “45” or B-pocket in the HLA-A2 (3hla; pale grey) and HLA-Aw68 (2hla; dark grey) structures. It can be readily observed from this view that several of the residues are found in identical or near-identical orientations. This facilitates modelling of as yet uncrystallised alleles with a high degree of confidence in the accuracy of the outcome. It can be observed in both this view and the companion view (Figure 7-7) that an endo-exo flip has occurred in the proline geometry at position 50.
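The side chain comparison made in Figures 7-6 and 7-7 can be put on a numerical footing by comparing χ1 torsion angles (N-Cα-Cβ-Cγ) of equivalent residues in two structures. The following Python fragment is a minimal sketch of such a comparison and is not the procedure used by the authors; the function names `dihedral` and `angle_difference` and the test coordinates are illustrative assumptions.

```python
import math


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle (degrees) defined by four points, e.g. N, CA, CB, CG.

    Positive values follow the usual convention: looking along p2 -> p3,
    a clockwise rotation of the front bond eclipses the rear bond.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)      # normals to the two planes
    b2_len = math.sqrt(dot(b2, b2))
    b2_unit = tuple(x / b2_len for x in b2)
    x = dot(n1, n2)
    y = dot(b2_unit, cross(n1, n2))
    return math.degrees(math.atan2(y, x))


def angle_difference(a, b):
    """Difference a - b between two angles, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


# Hypothetical coordinates (not taken from any deposited structure).
gauche = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(round(gauche, 1), round(abs(trans), 1))
```

Two side chains would be regarded as similarly oriented when the wrapped difference between their χ1 values is small; the threshold is a matter of choice and is not specified in the text.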


Figure 7-7. Companion comparison to Figure 7-6 of an identical region in HLA-B27 (dark grey) against HLA-A2 (pale grey). It is particularly striking that, despite the highly different environment of the B-pocket in HLA-B27 and the completely different class of residue bound in this pocket, position 45 of the heavy chain has a highly similar side chain orientation in HLA-A2, HLA-Aw68 and HLA-B27. As in Figure 7-9 the side chains of 51-59 and Trp51 and Trp60 are in highly homologous orientations. The side chain of Tyr59 has an exceptionally similar orientation, presumably due to its role in anchoring the N-terminus of the peptide. The labelling within this figure is emboldened for HLA-B27 residues.

Figure 7-8 (a-e). Legend given after the figure panels.


[Figure 7-8, panels (b) to (d). Panel (b): table of RMSD values between the H-2Kb model, the H-2Kb-VSV8 structure and the H-2Kb-SEV9 structure (values in the order printed: 0.629, 0.647, 0.316). Panel (c): RMSD plot; dark line, RMSD calculated for superposition of whole protein; broken line, mean RMSD. Panel (d): RMSD plot; dark line, RMSD calculated for superposition of antigen binding domains only; broken line, mean RMSD.]


[Figure 7-8, panel (e): RMSD plot; dark line, RMSD calculated for superposition of whole protein; broken line, RMSD calculated for superposition of antigen binding domains only.]

Figure 7-8 (a-e).
(a) Model of HLA-Aw68, built using Composer and the techniques outlined in the text from the structures of HLA-B27 and HLA-A2, overlaid onto the 2.6 Å crystallographic structure for HLA-Aw68 (2hla). Only the antigen binding domains were overlaid in this diagram. The side chains of the conserved peptide binding ligands and the polymorphic position 45 are displayed to demonstrate the highly similar nature of the model and structure. Models of HLA-Aw68 built from the HLA-A2 crystal structure alone give better agreement with the HLA-Aw68 crystal structure than those built from both HLA-A2 and HLA-B27, owing to the slight rearrangement in secondary structural elements and domain dispositions that occurs in HLA-B27. An RMS deviation of 0.541 is observed for the Cα positions of the model and crystal structure over all domains. For the antigen binding domains alone an RMSD of 0.507 was observed. This compares favourably with the RMSD value of 0.599 between the crystal structures of HLA-A2 and HLA-Aw68 and demonstrates the bias of including HLA-A2 in the database of molecules for construction.
(b) RMSD values for the Composer built model of the mouse molecule H-2Kb. This model was built from the co-ordinates of HLA-A2, HLA-Aw68 and HLA-B27 before the publication of the H-2Kb crystal structure.
The deviation between the model and the two crystal structures is relatively low. This suggests that the Composer built models have a high degree of geometrical accuracy in relating the fold of the molecules and the position of biochemically important residues.
(c-e) Plots of RMSD values for Cα positions versus sequence number for the antigen binding domains of HLA-Aw68 (residues 1-182). Slightly different RMSD profiles are observed between the alignment of only the α1 and α2 domains and the alignment of the whole protein, suggesting that the α3 and β2-microglobulin domains are biasing the overlay. This observation led to the use of the antigen binding domains only when structurally superposing the model proteins on the peptide-bound HLA-B27 structure to transfer the peptide and solvent molecules to the model in the modelling studies performed at Birkbeck.
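The distinction drawn in this legend between an RMSD over the whole protein and one restricted to the antigen binding domains (residues 1-182) can be illustrated with a short calculation. The Python sketch below assumes that the two sets of Cα coordinates have already been superposed by some fitting program; it performs no fitting itself. The function names and the four-residue toy coordinates are hypothetical and serve only to show the arithmetic.

```python
import math


def rmsd(coords_a, coords_b, residues=None):
    """RMSD over Ca positions shared by two structures.

    coords_a, coords_b: dicts mapping residue number -> (x, y, z), assumed
    to be already superposed. residues: optional container restricting the
    calculation to a subset, e.g. range(1, 183) for residues 1-182.
    """
    keys = sorted(set(coords_a) & set(coords_b))
    if residues is not None:
        keys = [k for k in keys if k in residues]
    if not keys:
        raise ValueError("no residues in common")
    total = sum(
        sum((p - q) ** 2 for p, q in zip(coords_a[k], coords_b[k]))
        for k in keys
    )
    return math.sqrt(total / len(keys))


def per_residue_deviation(coords_a, coords_b):
    """Ca-Ca distance for each shared residue, as plotted against sequence number."""
    shared = sorted(set(coords_a) & set(coords_b))
    return {k: math.dist(coords_a[k], coords_b[k]) for k in shared}


# Toy data: two residues inside the 1-182 range and two outside it.
model = {1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0),
         200: (0.0, 8.0, 0.0), 201: (4.0, 8.0, 0.0)}
crystal = {1: (0.5, 0.0, 0.0), 2: (4.5, 0.0, 0.0),
           200: (1.5, 8.0, 0.0), 201: (5.5, 8.0, 0.0)}

whole = rmsd(model, crystal)
binding_domains = rmsd(model, crystal, residues=range(1, 183))
print(round(whole, 3), round(binding_domains, 3))
```

In practice the choice of residues used for the fit, as well as for the RMSD itself, affects the result; that is the effect described in panels (c-e).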


demonstrates that, despite the inherent bias of the model-built HLA-Aw68 structure towards the HLA-A2 structure, it displays RMS deviations from HLA-A2 of the same order of magnitude as the crystallographic structure. The RMS deviations for the H-2Kb model in comparison to the two H-2Kb crystallographic structures show that models of MHC molecules of other loci or species may be built which are an accurate reflection of the true structure.

7.3 The Structure of the Peptide in the Groove

In order to simulate the complexed state of MHC class I molecule and peptide accurately we must fully understand the role played by the peptide, the functionality of the complex, and the shape of the peptide observed in crystal structures to date. The first comment which may be made about the structure of the peptide bound to the class I molecule is that it is, in essence, a linear structure. The φ/ψ torsion angles of the peptide backbone lie largely within the β-sheet region of the Ramachandran plot. The ends of the peptide are heavily buried within the MHC molecule and the centre of the peptide bulges out of the groove. The magnitude of the bulge is apparently a function of the length of the peptide bound; octamers are essentially linear, nonamers have a larger, more pronounced bulge at P4 and P5 and presumably decamers bulge even more.

Figures 7-9 and 7-10 show the highly conserved residues which ligate the N- and C-termini of the peptide. The majority of direct hydrogen bonding contacts made to the MHC molecule from the peptide are made by the termini, with the majority of contacts within the central bulge of the peptide being mediated by water. In this manner a certain degree of promiscuity is afforded to the binding of peptide without compromising the structural integrity of the peptide-MHC interaction. If the termini, or residues close to them, are the only parts of the epitope which need be conserved between peptides, a far greater number of different peptides may be bound than if there were restrictions both close to the termini and in the centre of the peptide.

In a system where the peptide is such an integral part of the folding pathway and stability of the molecule, the structure and the accuracy of simulation of the peptide component are of paramount importance. The ground state of the MHC molecule is different from the state in which peptide is bound and cannot be accessed by crystallography. The different structure of the ground state can be directly shown by the inability of conformationally dependent monoclonal antibodies to recognise different states within the folding pathway to those against which they were generated. In order to achieve the “compact” conformation, the class I molecule must bind both peptide and β2-microglobulin. However the shape of the molecule before and during assembly, and which of these components binds first, are two of the largest questions still remaining unsolved in the field of structural immunology. The lack of data on the nature of the unfolded class I molecule removes the possibility of performing calculations on the free energy of binding, but gives us some additional information about the importance of certain interactions between the peptide and MHC molecule.

Figure 7-9. View of the N-terminal binding ligands in the 2.1 Å HLA-B*2705 structure. It can be observed in Figure 2a of Guo et al., 1992 [30] that the change from glutamic acid to asparagine at position 63 has little bearing on the chelation of the backbone amine of position P2 in the peptide.

The ability to accommodate different length peptides in the same binding cleft with no apparent conformational change in the molecule suggests that only certain portions of the peptide are required structurally. This hypothesis can initially be supported by motif data: in systems where different length peptides may be accommodated it is only the spacing between anchor residues that is altered. In the recent motif data elucidated from a collection of peptides eluted from HLA-Aw68 we can observe that between the principal anchors at P2 and PC loops of different length can be accommodated.
The ability to incorporate different loop lengths between anchoring residues confers an important advantage on any allele that can bind different length peptides. Since the T-cell epitopes presented by MHC molecules are derived from protein sequences, the ability to choose, without significant constraint, the spacing between two discrete residues statistically enhances the probability of finding an epitope within a sequence.

Figure 7-10. View of the C-terminal binding ligands in HLA-B*2705. In the structures of HLA-B*2705 and HLA-A*6801 Thr80 interacts indirectly with the C-terminus of the peptide via a water molecule and Tyr123 interacts with Tyr84. In some HLA-B locus alleles where position 80 is the subject of a non-conservative threonine to leucine substitution the distribution of water molecules appears from molecular dynamics studies to be perturbed (see Section 7.8). In some mouse and human non-classical or medial class I MHC molecules there are substitutions for positions 143 and 147 that would appear to disturb this network considerably.

The structure of HLA-Aw68 with a collection of endogenous peptides at 1.9 Å clearly demonstrates that the termini of the peptide are highly similar in peptides of dissimilar length. In this structure clear interpretable density is present for positions P1-P3 and PC-1 to PC, but in the central bulge of the peptide the lack of contiguous electron density suggests that several conformations occur in the region of the peptide between the anchors. This finding is in agreement with observations which may be made from an overlay of the octamer and nonamer peptide structures bound to the H-2Kb molecule. For these peptides the backbones for positions P1 to P3 and PC-1 to PC are virtually identical. The peptides are also coincident at the PC-3 position, which is an anchor in the H-2Kb motif. Thus the only difference between the octamer and nonamer peptides is the shape of the bulge which joins P3 to PC-3. In the octamer peptide this region is simply more extended than in the nonamer peptide, as might have been expected from the difference in length of the peptides.
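The statistical argument made in this section, that freedom in the spacing between the P2 and PC anchors increases the chance of finding an epitope in a protein sequence, can be illustrated with a simple scan. The Python sketch below is an illustration only. The anchor residue sets (T or V at P2, R or K at PC) are assumptions chosen for the example and are not taken from the text; the flanking alanines around the nonamer KTGGPIYKR (the Np91-99 peptide discussed in Section 7.6) are likewise hypothetical.

```python
def find_candidate_peptides(sequence, p2_anchors, c_anchors, lengths=(8, 9, 10)):
    """Return (start, peptide) pairs whose P2 and C-terminal residues match.

    start is the 1-based position of P1 in the sequence. Only the two
    anchor positions are checked; the residues between them are free.
    """
    hits = []
    for length in lengths:
        for start in range(len(sequence) - length + 1):
            peptide = sequence[start:start + length]
            if peptide[1] in p2_anchors and peptide[-1] in c_anchors:
                hits.append((start + 1, peptide))
    return hits


# Hypothetical test sequence: the nonamer with two alanines on each side.
sequence = "AAKTGGPIYKRAA"
p2_anchors = {"T", "V"}   # assumed for illustration
c_anchors = {"R", "K"}    # assumed for illustration

fixed_spacing = find_candidate_peptides(sequence, p2_anchors, c_anchors, lengths=(9,))
free_spacing = find_candidate_peptides(sequence, p2_anchors, c_anchors)
print(len(fixed_spacing), len(free_spacing))
```

With the nonamer length alone one candidate is found; allowing octamers and decamers as well yields an additional candidate from the same stretch of sequence, which is the effect described above.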


7.4 Rationale Used for the Modelling and Simulation of MHC Class I Molecules

At Birkbeck the following observations and principles were considered when modelling and simulating class I molecules and their interaction with peptide antigen.

1. The X-ray crystallographic structures of HLA-A2, HLA-Aw68 and HLA-B27 display very small Cα position deviations (Figure 7-11). The modelling and simulations should represent this feature and the perturbation of the backbone should be kept to a minimum.
2. The structures display a highly similar orientation for all side chains and a near identical orientation for conserved side chains. In addition, between complexes of different peptides with the same allele the side chain positions in the MHC molecule are highly similar, suggesting that the MHC molecule is invariant and that any deviations in geometry required to complex the different peptide side chain characteristics should be largely accommodated by small changes in the shape of the peptide. Thus the orientation of homologous side chains should be retained in preference to the orientation of the non-homologous side chains, and preferentially there should be no side chain movements. In simulations where no MHC side chain or main chain atoms are held rigid the deviations observed between the starting structure and the final structure are higher than the deviations observed between the crystal structures of the MHC molecules from two loci or even from two species [34].
3. The shape of the peptide and the position of the bed of water molecules beneath the peptide are highly similar between alleles and between peptides of different length and sequence bound to the same allele.

To achieve simulations that adhere to these observations, the conserved side chains and backbone atoms of the molecule are initially held rigid so that the non-homologous side chains of the model are the ones that are moved by the geometry optimisation procedure. In addition the backbone of the molecule is not perturbed during the optimisation of the side chain geometry. Initially the binding of the peptide is performed with constraints placed on the peptide backbone and with the MHC molecule held rigid. This totally preserves the structure of the MHC molecule, and partially preserves the structure of the peptide backbone, although, like peptides in the crystal structures, the post-optimisation model peptides display small sequence dependent changes in their backbone structure when compared to the HLA-B27 peptide which is used in all cases as the starting structure.

            HLA-A2    HLA-Aw68
HLA-Aw68    0.599
HLA-B27     0.876     0.807

Figure 7-11. RMSD values for the X-ray crystallographic structures of HLA-A2, HLA-Aw68 and HLA-B27 demonstrating the significantly higher values for the HLA-B locus molecule.

7.5 General Principles of Modelling MHC Class I-Peptide Interactions

For all of the modelling and simulations of MHC class I-peptide interactions performed at Birkbeck several basic principles and techniques have been applied. In order to be brief in the description of the models that follow, these techniques will be covered in depth here.

In cases where the molecule has been studied by X-ray crystallography, the highest resolution crystal structure available is used as the starting structure for the modelling. If the molecule needs to be modelled it is built from the structures of HLA-A2 and HLA-Aw68 if it is an HLA-A allele and from HLA-B27 if it is an HLA-B allele. The structure of HLA-B27 showed some slight deviations from the structures of HLA-A2 and HLA-Aw68 in domain disposition and in loop structure; thus the HLA-A and HLA-B models are built from members of their own loci.
The models are built using the Composer suite of programs [35-37] and are geometry optimised using the following procedures. Side chains conserved between the sequence to be modelled and the homologue are fixed in union with the backbone of the protein, and geometry optimisation using the Powell algorithm [38], a torsional gradient optimiser, is performed to a convergence point of 0.05 RMS deviations in the total energy term. The restraints on all side chain atoms except for tetrahedral carbon atoms are then removed and the molecule is geometry optimised to an identical convergence point with constraints on the backbone. Finally the side chain atoms of the model are released and the molecule is refined to a final convergence point. If the molecule is outside of the family, that is, it is from another locus or another species, the backbone is refined for 100 iterations. At all stages the dielectric model described below is invoked. All calculations are performed within the SYBYL 5.5 package (Tripos Associates) on a Silicon Graphics Iris Indigo R4000 platform. The SYBYL implementation of the Amber 3.0a combined atom forcefield [39, 40] was used throughout. The dielectric model consisted of a distance dependent dielectric cut-off of 9 Å with an ε value of 4.0 to simulate solvent effects [41] with charges taken from the internal dictionary. Figure 7-12 clearly demonstrates the improvement in side chain geometries in the model for HLA-B53 at the different stages of the minimisation procedure described above.

Figure 7-12a. Ramachandran plot (horizontal axis: Phi, degrees); legend given below with part (b). Plot statistics:

Residues in most favoured regions [A,B,L]                 149    92.5%
Residues in additional allowed regions [a,b,l,p]           11     6.8%
Residues in generously allowed regions [~a,~b,~l,~p]        1     0.6%
Residues in disallowed regions                              0     0.0%
Number of non-glycine and non-proline residues            161   100.0%
Number of end-residues, glycine residues (shown as triangles) and proline residues: not legible
Total number of residues                                  182

Based on an analysis of 118 structures of resolution of at least 2.0 Ångströms and R-factor no greater than 23%, a good quality model would be expected to have over 90% in the most favoured regions.

The peptides were modelled into the clefts of the molecules built and optimised using the techniques described above by superposing the model structure onto the crystallographic structure of HLA-B27 at 2.1 Å resolution with a collection of endogenous nonamer peptides. The peptide and associated water molecules in the binding cleft could then be transferred into the cleft of the model. Implicit water molecules were treated as TIP3P waters.
The side chains of the peptide were mutated to their new sequence and the peptide side chains were geometry optimised using the minimisation procedures described above to a convergence point of 0.05 RMS deviations in either the energy, gradient or force terms, keeping the MHC molecule completely fixed and allowing movement in the water molecules and the peptide side chains only. After this step the side chains of the MHC molecule within 4 Å of the side chains of the peptide, the water molecules and the peptide backbone were allowed to move. Both of the above steps were performed using the dielectric model, but with a smaller distance dependent cut-off of 4 Å and an ε value of 1.0. The finished model could then be interrogated for contacts to the MHC molecule and potential interactions with the TCR.

Stage of        Bad contacts /   χ-1 gauche   χ-1      χ-1 gauche   χ-1      χ-2
optimisation    100 residues     minus        trans    plus         pooled   trans
1               14               -1.4         -2.0     -1.0         -1.4     -1.6
2               3                -1.4         -1.9     -1.3         -1.6     -1.5
3               0                -1.8         -1.8     -1.5         -1.7     -2.0
4               0                -1.9         -1.9     -2.3         -2.2     -2.8

Figure 7-12.
(a) Ramachandran plot, produced using Procheck, for the antigen binding domains of the model of HLA-B*5301 built using Composer from the 2.1 Å HLA-B27 structure. It can be observed that models built from this high-resolution structure easily satisfy the rule defined by Laskowski et al. that all high resolution models should have over 90% of non-glycine residues in the most favourable areas [A, B, L] of the Ramachandran plot. It can also be observed that none of the residues lie in the disallowed regions of the plot, and that Asp29, a residue heavily implicated in the interface between the membrane proximal and membrane distal domains of the class I molecule, is the only residue in the generously allowed ~p region.
(b) Table of the different geometrical parameters analysed by Procheck at different phases of the minimisation procedure. The constraint sets were as follows: Phase 1, starting structure direct from Composer; Phase 2, all atoms except for non-homologous side chain atoms constrained; Phase 3, all atoms except for non-tetrahedral carbon atoms in the side chains constrained; Phase 4, only backbone atoms constrained. The use of this slow removal of constraints produces highly accurate models that have excellent geometrical characteristics in both the main chain and the side chain. The models produced in this manner have parameters similar to those predicted from the Procheck dataset for 1.3 Å crystal structures. Apart from the figure for bad contacts per 100 residues, which is quoted as its absolute value, all variables are quoted as bandwidths from the mean values defined in the Procheck dataset.

If simulations were to be performed on the system, the finished model produced by the methods described above was placed into a constant pressure box which was filled using shells of random TIP3P waters to achieve a density equivalent to that predicted for solvent and solute. The system was geometry optimised, with periodic boundary conditions and electrostatic effects invoked and with the MHC molecule and peptide defined as a static aggregate, to a convergence point of 0.05 RMS deviations in the energy terms. An active zone was then defined which encompassed the peptide and any water molecules within a 9 Å sphere radiating from the peptide. In addition a set of atoms was defined that included any water molecules within an 11 Å sphere radiating from the peptide and any part of the protein within 9 Å of the peptide. This set of atoms within the molecule was, unlike the active water set and the peptide, held static in the simulation.
The complete set of static atoms and active atoms was taken into account in the energy terms of the simulation. The static water solvent and any remaining water molecules in the box were only allowed to move cooperatively with the walls of the box to relieve changes in pressure within the system. The water molecules between 9 Å and 11 Å effectively acted as an ice cap, being fixed in their positions and possessing no velocity; however, owing to their contribution to the energy terms, they provided an inelastic boundary to prevent the active water set boiling off and to provide a solvent mediated restraining influence on the peptide to keep it in the cleft. The system with these sets initialised was geometry optimised to the usual convergence criteria. The sets were re-assessed following convergence and no substructures in any of the simulations were observed to have moved between shells.

Molecular dynamics was then performed on the optimised system. Temperature and pressure were kept close to constant by connecting the system to temperature and pressure baths at 10 fs intervals using an integrator time step of 1 fs. The system was equilibrated to 300 K using the stability of the kinetic energy and temperature curves as a measure of the fidelity of the equilibration. Following suitable equilibration the system was simulated and conformers collected at 1 ps intervals. The conformers thus obtained could then be analysed in the context of the MHC molecule in order to look for transient interactions, new lasting interactions and the movement of water molecules around the cleft.
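The partition of the system into an active zone and a static "ice cap" described above amounts to a distance test against the peptide atoms. The Python sketch below shows one way such a partition could be expressed; it is not the SYBYL procedure used by the authors. The function name `assign_sets`, the label "outer" for everything beyond the stated radii, and the toy coordinates are assumptions made for the example.

```python
import math


def assign_sets(peptide_xyz, atoms, active_radius=9.0, cap_radius=11.0):
    """Sort non-peptide atoms into 'active', 'static' and 'outer' sets.

    peptide_xyz: list of (x, y, z) for the peptide atoms.
    atoms: list of (label, kind, (x, y, z)) with kind 'water' or 'protein'.
    Waters within active_radius of any peptide atom are active; waters out
    to cap_radius and protein within active_radius are held static.
    Everything else is labelled 'outer' (an assumption of this sketch).
    """
    sets = {"active": [], "static": [], "outer": []}
    for label, kind, xyz in atoms:
        distance = min(math.dist(xyz, p) for p in peptide_xyz)
        if kind == "water" and distance <= active_radius:
            sets["active"].append(label)
        elif kind == "water" and distance <= cap_radius:
            sets["static"].append(label)
        elif kind == "protein" and distance <= active_radius:
            sets["static"].append(label)
        else:
            sets["outer"].append(label)
    return sets


# Toy system: a single peptide atom at the origin, distances along x in Å.
peptide = [(0.0, 0.0, 0.0)]
atoms = [
    ("W1", "water", (5.0, 0.0, 0.0)),
    ("W2", "water", (10.0, 0.0, 0.0)),
    ("W3", "water", (15.0, 0.0, 0.0)),
    ("P1", "protein", (8.0, 0.0, 0.0)),
    ("P2", "protein", (12.0, 0.0, 0.0)),
]
zones = assign_sets(peptide, atoms)
print(zones)
```

Re-running the same assignment on the optimised coordinates would correspond to the re-assessment step mentioned in the text, in which no substructures were found to have moved between shells.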


7.6 Modelling and Simulation of an Influenza Virus Peptide with the Human MHC Class I Molecule HLA-Aw68

The human class I allele HLA-Aw68 provides a good basis for molecular simulations. It presents a wide variety of peptides of differing lengths, and there are X-ray crystallographic structures for the molecule both with a collection of endogenous peptides, at 2.6 Å and 1.9 Å resolution, and with a single peptide derived from the influenza virus nucleoprotein (Np91-99) at 2.8 Å resolution. The high resolution structure for HLA-Aw68 with multiple endogenous peptides of different length clearly demonstrates that the termini of the peptide are in highly similar positions regardless of the length of the central bulge. This allows for the construction of a series of models which have the different length peptides bound to the HLA-Aw68 molecule. In addition we have constructed a model of HLA-Aw68 complexed with the flu peptide which may be used for model validation purposes by comparison to the recently determined crystallographic structure.

The major epitope derived from the influenza virus to elicit a response from CTL when presented by HLA-Aw68 is a nonamer fragment (KTGGPIYKR) comprising residues 91-99 of the influenza virus nucleoprotein [42] (Np91-99). This peptide has recently been studied bound in the groove of HLA-Aw68 by the group of Don Wiley using X-ray cryo-crystallography.
The result<strong>in</strong>g peptide structure hasbeen compared to that of the model nonamer peptide built <strong>in</strong>to the observed density<strong>in</strong> HLA-B27 at 2.1 A resolution, and the deviations between the two peptide modelsappear to be limited to small sequence dependant changes <strong>in</strong> the backbones of thepeptides.The peptide model and the 15 solvent molecules <strong>in</strong> the cleft of HLA-B27 weremoved <strong>in</strong>to the start<strong>in</strong>g structure and mutated to the sequence of the <strong>in</strong>fluenza viruspeptide. The waters were redef<strong>in</strong>ed as TIP3P waters and the assembly of MHCmolecule peptide and waters was geometry optimised. The SYBYL 5.5 implementationof the Kollman united atom force field was used with polar atoms def<strong>in</strong>ed astheir Kollman all atom counterparts. Charges were taken from the <strong>in</strong>ternal dictionaryand the 1-4 scal<strong>in</strong>g term was def<strong>in</strong>ed to give a direct parity between the energiesgenerated by SYBYL and by Amber. Once the convergence criteria of an RMS deviationof 0.05 <strong>in</strong> the total energy term had been achieved the molecule was soaked <strong>in</strong>a bath of random water molecules compris<strong>in</strong>g 2304 solvent molecules <strong>in</strong> a box whose<strong>in</strong>itial periodic boundary size was 49.29 x 55.50 x 34.44 A (xyz). This assembly wasenergy m<strong>in</strong>imised to convergence with periodic boundary conditions <strong>in</strong>voked andHLA-Aw68 held rigid. This geometry optimisation step was used to remove holesand van der Waals (VDW) clashes with<strong>in</strong> the random solvent field. A sphere of <strong>in</strong>-


194 Christopher L Thorpe and David S. Mossterest of 9 A radius from the atoms of the peptide was generated. Atoms with<strong>in</strong> thiscyl<strong>in</strong>der were allowed to move if they were not part of the HLA-Aw68 molecule, andany atoms between the 9 A sphere and an 11 A sphere were considered passively <strong>in</strong>the calculation <strong>in</strong> order to act as an ice-cap to prevent the solvent from boil<strong>in</strong>g outof the cyl<strong>in</strong>der surround<strong>in</strong>g the peptide. The system was equilibrated to 300 K anda constant pressure of 1 atm over 5 ps and was simulated for a further 100 ps underconstant temperature and pressure.Figure 7-13a and b show the plots of time versus temperature, k<strong>in</strong>etic, potentialand total energy for the dynamics trajectory. It can be readily observed thatk<strong>in</strong>etically the simulation stabilises swiftly. Thermal stabilisation of the simulationis achieved well with<strong>in</strong> the first tenth of the simulation time, after which conformersare collected. In the stacked plot all four graphs demonstrate a cont<strong>in</strong>ual plateauafter approximately 1 ps.The movements of the peptide backbone are documented <strong>in</strong> Figure 7-14. It canbe observed from the time course presented here that the majority of movements aresmall and are largely centred around the prol<strong>in</strong>e residue at position P5 of the peptide.In the start<strong>in</strong>g structure of the <strong>in</strong>teraction of the Np91-99 peptide with HLA-Aw68the region around P3, P4 and P5 of the peptide is relatively flat with a moderatebulge. With<strong>in</strong> the first 15 ps this bulge becomes more pronounced and the prol<strong>in</strong>eresidue beg<strong>in</strong>s to orient the Ca carbon towards the MHC molecule. 
By 65 ps the bulge is pronounced around the P3 residue and the proline ring is pointing directly down into the groove of HLA-Aw68. This pronounced bulge and downward facing proline are both observed in the 2.8 Å structure of HLA-Aw68 with the Np91-99 peptide. The major observable difference between the model structure and the crystal structure is the orientation of the tyrosine side chain at P7 of the peptide. However, other crystal structures of MHC class I molecules show that this position has a dichotomy of orientations, and one peptide-bound structure, that of the mouse molecule H-2Kb with the VSV octamer peptide, displays both orientations in the one complex, with both TCR-facing and MHC-facing conformers observable in the electron density maps. An overlay of six conformers taken at 5, 25, 45, 65, 85 and 105 ps shows the overall stability of the bound Np91-99 peptide structure (Figure 7-15). The orientations of the anchored side chains P2 and P9 are virtually unchanged throughout the simulation, whereas the more accessible side chains such as P1, P6 and P8 display small but noticeable differences in their orientations (Figure 7-15a). A comparison of the initial and final conformers from the molecular dynamics simulation with the peptide observed in HLA-B27 demonstrates the formation of the more pronounced bulge at positions P3 and P4, and the stability of the position of the C-terminal residue (Figure 7-15b).
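
The partition of atoms used in the simulation set-up (free to move within 9 Å of the peptide unless they belong to HLA-Aw68, passive between 9 Å and 11 Å) can be illustrated with a short sketch. This is not the SYBYL procedure itself: the record layout, the function names and the treatment of atoms beyond 11 Å as ignored are assumptions made for illustration, and the coordinates are invented.

```python
import math


def min_distance(xyz, peptide_coords):
    """Shortest distance (in Å) from one atom to any peptide atom."""
    return min(math.dist(xyz, p) for p in peptide_coords)


def classify_atoms(atoms, peptide_coords, inner=9.0, outer=11.0):
    """Label each atom by its role in the restrained simulation.

    atoms: iterable of (name, is_mhc, (x, y, z)); a hypothetical layout.
    """
    labels = {}
    for name, is_mhc, xyz in atoms:
        d = min_distance(xyz, peptide_coords)
        if d <= inner:
            # MHC atoms stay rigid; solvent atoms inside the region may move.
            labels[name] = "fixed" if is_mhc else "mobile"
        elif d <= outer:
            # The "ice-cap": present in the energy calculation but not moved.
            labels[name] = "passive"
        else:
            # Assumption: atoms beyond the outer boundary are left out.
            labels[name] = "ignored"
    return labels


# Toy coordinates, not taken from any structure.
peptide = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
atoms = [
    ("WAT1", False, (5.0, 2.0, 0.0)),
    ("MHC_CA", True, (2.0, 6.0, 0.0)),
    ("WAT2", False, (13.8, 0.0, 0.0)),
    ("WAT3", False, (30.0, 0.0, 0.0)),
]
labels = classify_atoms(atoms, peptide)
print(labels)
```

Measuring the distance to the nearest peptide atom, rather than to a single centre, makes the mobile region follow the extended shape of the peptide.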


Figure 7-13. (a) Plot of time vs. kinetic energy (top), temperature (second from top), total energy (second from bottom) and potential energy (bottom) for the equilibration of the influenza virus nucleoprotein peptide with HLA-A*6801. (b) Plot of time vs. kinetic energy (top), temperature (second from top), total energy (second from bottom) and potential energy (bottom) for the simulation of the influenza virus nucleoprotein peptide with HLA-A*6801.
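
The temperature and kinetic energy traces of Figure 7-13 are linked by the equipartition relation T = 2 KE / (N_dof k_B), and the time at which a trace settles can be estimated with a simple tolerance test. The sketch below assumes energies in kcal/mol, three degrees of freedom per atom with no constraints, and an arbitrary tolerance of 10 K. None of these choices is stated in the text, and the temperature trace is invented.

```python
K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def temperature_from_ke(ke, n_atoms):
    """Instantaneous temperature (K) from kinetic energy (kcal/mol)."""
    n_dof = 3 * n_atoms  # assumption: no constraints removed
    return 2.0 * ke / (n_dof * K_B)


def first_stable_time(times, values, target, tol):
    """Earliest time from which every later sample lies within tol of target."""
    stable_from = None
    for t, v in zip(times, values):
        if abs(v - target) <= tol:
            if stable_from is None:
                stable_from = t
        else:
            stable_from = None  # excursion: restart the search
    return stable_from


# Invented temperature trace (ps, K), for illustration only.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
temps = [120.0, 250.0, 298.0, 303.0, 299.0]
stable_from = first_stable_time(times, temps, target=300.0, tol=10.0)

# Kinetic energy of 1000 atoms at 300 K, then the round trip back to T.
ke_300 = 0.5 * 3 * 1000 * K_B * 300.0
print(stable_from, temperature_from_ke(ke_300, 1000))
```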


Figure 7-14. Series of snapshots of the peptide bound to HLA-A*6801 at 10 ps intervals, demonstrating the progression of conformers and the formation of the pronounced bulge in this peptide.


Figure 7-15. (a) Overlay of 5 conformers from the dynamics trajectory demonstrating the stability of the structure of the peptide in the cleft. (b) Initial and final conformations of the flu peptide from the molecular dynamics trajectory compared to the peptide observed in the crystal structure of HLA-B27.
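
One way to put a number on the stability shown in such an overlay is the root-mean-square deviation (RMSD) between corresponding atoms of two conformers. The sketch below is a minimal illustration, not the analysis used for the figure: it assumes the conformers already share a common frame (plausible here only because the MHC molecule is held rigid), performs no superposition, and uses invented coordinates.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally ordered coordinate lists, no fitting."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must contain the same number of atoms")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def largest_shift(coords_a, coords_b):
    """Index and size (Å) of the largest per-atom displacement."""
    shifts = [math.dist(p, q) for p, q in zip(coords_a, coords_b)]
    index = max(range(len(shifts)), key=shifts.__getitem__)
    return index, shifts[index]


# Three invented backbone positions; only the middle atom moves.
start = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
later = [(0.0, 0.0, 0.0), (3.8, 0.0, 2.0), (7.6, 0.0, 0.0)]

print(rmsd(start, later))
print(largest_shift(start, later))
```

Reporting the largest single displacement alongside the RMSD helps to locate a localised change, such as a bulge forming at one position, that an average over all atoms would dilute.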


7.7 Modelling of an Epstein-Barr Virus Nuclear Antigen Peptide with HLA-B27 Sub-Type Molecules

HLA-B27 is an enigma in the field of immunology, in both its disease associations and the primary structure of the peptides which it presents. It is to date the only human class I molecule to have a strong genetic linkage to an autoimmune disease, in its well known, but poorly characterised, association with ankylosing spondylitis (AS) [43, 44]. In contrast to the link between HLA-B27 and AS, the majority of organ specific T-cell mediated autoimmune diseases display a strong association with the class II genes of the MHC and a weak association with class I genes via linkage disequilibria (see Table 7-1).

Table 7-1. The MHC class I associations to autoimmune diseases known to date.

Disorder                          HLA allele    Relative risk
Hodgkin's disease                 HLA-A1         1.4
Idiopathic hemochromatosis        HLA-A3         8.2
                                  HLA-B14        4.2
Behçet's disease                  HLA-B5         6.3
Congenital adrenal hyperplasia    HLA-B47       15.4
Ankylosing spondylitis            HLA-B27       87.4
Reiter's disease                  HLA-B27       37.0
Acute anterior uveitis            HLA-B27       10.4
Subacute thyroiditis              HLA-B35       13.7
Psoriasis vulgaris                HLA-Cw6       13.3

In addition to these associations HLA-B7, HLA-A3 and HLA-Cw7 have been demonstrated to be linked to narcolepsy, probably through their linkage disequilibrium with haplotypes containing the class II HLA-DR2 alleles that have been observed in all narcoleptics typed to date. Another class I allele, HLA-B18, has been weakly linked to C2 complement deficiency.

The structure of the HLA-B27 molecule determined by X-ray crystallography is virtually identical to that of HLA-A2.
However, the initial structure of HLA-B*2705 was the first to define the shape of the peptide in the cleft. In the structures of HLA-A2 and HLA-Aw68 with collections of endogenous peptides, electron density for the peptide was visible but was not interpretable. This feature was suggested to be caused by the vast diversity of peptides bound, diverse both in length and sequence. All of the peptides eluted from HLA-B*2705 were nonamers and had an identical residue at position P2. It was considered to be these features which allowed the visualisation of an average backbone peptide conformation in the 3.0 Å structure. However, octamer and decamer HLA-B27 epitopes have also been characterised; all have arginine at P2, which has the best electron density for a peptide side chain in the high resolution crystal structure.

To date the HLA-B27 family are the only molecules to have an absolute specificity for a sole amino acid type, arginine, at position P2 of the motif [3, 18, 45]. In the P2 pocket the side chain of the arginine is held rigidly by a planar hydrogen bonding network which involves three of the four polymorphic residues, unique in their combination in this family of alleles: His9, Thr24 and Glu45. The Sγ of Cys67, the fourth polymorphic residue, is positioned 3.6 Å above the Cζ of the guanidinium group and contributes to the hydrophobic environment of the pocket. These four important residues are conserved in all HLA-B27 sub-type molecules sequenced to date, suggesting that in all seven cases the pocket will be identical. The four residues are all observed in other HLA-B locus molecules but are not found in the same combination, suggesting that amongst the HLA-B specificities HLA-B27 is distinctive in its antigen binding properties. Several alleles share the glutamic acid at position 45 of the heavy chain but have polymorphisms at positions 9 and 67. HLA-B8, for example, has Glu45, but has Phe67, which largely occludes the entrance to the pocket, allowing the binding of only quite small amino acid side chains [46] (A. J. McMichael, personal communication).
Several HLA-B molecules have a tyrosine or phenylalanine residue at position 9 of the heavy chain instead of histidine, and analysis of the crystal structures of HLA-A2, HLA-Aw68 and HLA-B27 shows that these residues adopt slightly different rotamers. Mutants of HLA-B*2705 with a His → Tyr substitution at position 9 of the heavy chain show impaired binding of the influenza A nucleoprotein epitope, contained within the dodecamer (383-394), which is known to bind to HLA-B27 [47]. In the high resolution structure a water molecule can be observed in the P2 binding pocket interacting with His9 and Tyr99, and it is this solvent molecule that provides the fourth chelator of the arginine. It is the precise nature of the planar hydrogen bonding network generated in the B-pocket which stipulates arginine as the P2 anchor in HLA-B27 rather than lysine, which would require a trigonal hydrogen bonding network. It has been suggested from model building experiments [3] that this vital water molecule would have a steric clash with the hydroxyl group of tyrosine 9 in alleles with this substitution, leading to the displacement of the solvent molecule. The tyrosine residue has the potential for hydrogen bonding to the guanidinium group of the P2 arginine, but the bond would have poor geometry.
This may therefore explain the observed reduction in binding of the influenza epitope.

The Epstein-Barr virus (EBV) is a member of the herpes family of DNA viruses and induces powerful class I mediated CTL responses. It is responsible for the highly infectious and debilitating condition infectious mononucleosis, commonly known as glandular fever. EBV infection is also associated with the pathogenesis of two human cancers: Burkitt's lymphoma and nasopharyngeal cancer. Burkitt's lymphoma is a predominant non-Hodgkin's lymphoma (NHL) and is a tumor of surface Ig positive B-cells, endemic in some parts of Africa. Virtually all tumor cells of patients carry the EBV genome, and in patients with immune dysregulation caused by infection by EBV, sustained B-cell proliferation occurs. Burkitt's lymphoma mainly affects children or young adults, and with modern treatment a long term survival rate of 50% can be expected. In Africa tumors in the areas of the maxilla and mandible are prevalent, but in North America abdominal tumors are more common. It is unlikely that EBV infection is the sole causative agent for Burkitt's lymphoma. Nasopharyngeal cancer is endemic in southern China. As in Burkitt's lymphoma the EBV genome is found in all tumors but is probably not the sole causative agent, acting in concert with other as yet undetermined factors.

Recent evidence has suggested that the Epstein-Barr nuclear antigen 3C (EBNA 3C) is the major target for a response directed by HLA-B27 specific T-cells [48]. The predominant epitope from EBNA 3C has been elucidated by the standard technique of synthesising overlapping peptides from the antigen and mapping the immunodominant epitopes [49]. For HLA-B*2702 and HLA-B*2705 restricted CTL the immunodominant peptide from EBNA 3C was the nonamer RRIYDLIEL. For HLA-B*2704 restricted CTL populations RRIYDLIEL was one of several peptides to which a response was elicited and would appear to be a minor determinant. The existence of peptides which bind to several HLA-B27 sub-types has an obvious bearing on the pathogenesis of AS.
If a spondylitic peptide is the causative agent for AS it must bind to all HLA-B27 sub-types associated with the disease. HLA-B27 sub-types are quite divergent in the ligands for the C-terminal side chain of the peptide. For HLA-B*2706 and HLA-B*2707 it would appear unlikely that these alleles would be capable of binding large charged residues such as lysine and arginine, due to the Asp → Tyr substitution at position 116 of the heavy chain. This aspartate residue appears to be of paramount importance in binding the charged terminal residue of the peptide, and its substitution by a large residue diminishes the size of the PC binding pocket. Thus, although the polar hydroxyl group of the tyrosine could support hydrogen bonding, it would be impossible on purely steric grounds for this to occur. From this line of thinking it would appear to be a necessity for the AS peptide to have a non-charged hydrophobic side chain at the PC residue in order to bind to all of the sub-type molecules. Therefore studies of peptides such as the EBNA 3C epitope aid the overall visualisation of any putative AS peptide.
The sequence alignment of all seven HLA-B27 sub-type molecules (Figure 7-16a) and the schematic diagram of the HLA-B27 polymorphisms displayed on the structure of HLA-B*2705 (Figure 7-16b) clearly demonstrate the clustering of polymorphic residues in the region of the class I molecule which binds the C-terminal portion of the peptide.

Due to the hydrophobic nature of the majority of the side chains in the EBNA 3C peptide (RRIYDLIEL) there are very few noteworthy hydrogen bonding contacts between the peptide and the MHC molecule. Lists of contacts observed in the models of the EBNA 3C peptide with HLA-B*2705, HLA-B*2702 and HLA-B*2704 are given in Tables 7-2(a-c). The majority of hydrogen bonding contacts, apart from the previously mentioned planar bonding network around the arginine residue at P2, are


Figure 7-16. (a) Sequence alignment of the seven HLA-B27 subtype molecules (HLA-B*2701 to HLA-B*2707). From HLA-B*2705, the archetypal HLA-B27 allele, the evolution of all the other molecules can be defined. (b) Molecular graphics image of HLA-B*2705 displaying, in dark-grey patches, the regions of polymorphism. The clustering of polymorphic residues in the regions that surround and chelate the C-terminal residue PC and the penultimate residue PC-1 (P8 in a nonamer such as the EBNA 3C epitope) is clearly visible.


Table 7-2(a). Contacts and hydrogen bonds displayed by the interaction of the EBNA 3C peptide with the cleft of the HLA-B27 sub-type molecule HLA-B*2702.

Position in peptide | Residues within 4 Å | Waters within 4 Å | mc hydrogen bonds | sc hydrogen bonds
ArgP1 | Y59, R62, E63, Y159, E163, W167, Y171 | TIP461, TIP465 | NH-OH(sc) Y171; C=O-OH(sc) Y159 | NH-C=O E163
ArgP2 | Y7, H9, T24, V25, V34, E45, R62, E63, I66, C67, Y99, Y159, E163 | TIP456, TIP461, TIP465 | NH-C=O(sc) E63; C=O-NH(sc) R62 (x2) | NH-C=O(sc) E45 (x2); NH-OH(sc) T24 (x2); NH-tip TIP456-(Y99, H9)
IleP3 | I66, Y99, L156, Y159 | TIP456, TIP458, TIP459, TIP465 | NH-OH(sc) Y99; C=O-tip TIP458 | -
TyrP4 | R62, Q65, I66 | TIP457, TIP458, TIP464, TIP465 | NH-tip TIP465-(E163, R62) | -
AspP5 | V152, Q155, L156 | TIP457, TIP460, TIP464 | C=O-tip TIP464 | -
LeuP6 | T73, V152, Q155, L156 | - | - | -
IleP7 | T73, N77, H114, W147, V152 | TIP453, TIP454, TIP455, TIP460 | C=O-tip TIP453 | -
GluP8 | T73, E76, N77, I80, W147 | TIP451, TIP453, TIP460 | C=O-NH(sc) W147 | C=O-tip TIP451-(N77, LeuP9)
LeuP9 | N77, I80, Y84, L95, Y123, T143, W147 | TIP451, TIP452, TIP453 | NH-C=O(sc) N77; C=O-OH(sc) Y84; C=O-OH(sc) T143; C=O-tip TIP452-K146 | -


Table 7-2(b). Contacts and hydrogen bonds displayed by the interaction of the EBNA 3C peptide with the cleft of the HLA-B27 sub-type molecule HLA-B*2704.

Position in peptide | Residues within 4 Å | Waters within 4 Å | mc hydrogen bonds | sc hydrogen bonds
ArgP1 | Y59, R62, E63, Y159, E163, W167, Y171 | TIP461, TIP465 | NH-OH(sc) Y171; C=O-OH(sc) Y159 | NH-C=O E163
ArgP2 | Y7, H9, T24, V25, V34, E45, R62, E63, I66, C67, Y99, Y159, E163 | TIP456, TIP461, TIP465 | NH-C=O(sc) E63; C=O-NH(sc) R62 (x2) | NH-C=O(sc) E45 (x2); NH-OH(sc) T24 (x2); NH-tip TIP456-(Y99, H9)
IleP3 | I66, Y99, L156, Y159 | TIP456, TIP458, TIP459, TIP465 | NH-OH(sc) Y99; C=O-tip TIP458 | -
TyrP4 | R62, Q65, I66 | TIP457, TIP458, TIP464, TIP465 | NH-tip TIP465-(E163, R62) | -
AspP5 | E152, Q155, L156 | TIP457, TIP460, TIP464 | C=O-tip TIP464 | -
LeuP6 | T73, E152, Q155, L156 | - | - | -
IleP7 | T73, S77, H114, W147, E152 | TIP453, TIP454, TIP455, TIP460 | C=O-tip TIP453 | -
GluP8 | T73, E76, S77, T80, W147 | TIP451, TIP453, TIP460 | C=O-NH(sc) W147 | C=O-tip TIP451-LeuP9
LeuP9 | S77, T80, Y84, L95, Y123, T143, W147 | TIP451, TIP452, TIP453 | C=O-OH(sc) Y84; C=O-OH(sc) T143; C=O-tip TIP452-(K146, T80) | -


Table 7-2(c). Contacts and hydrogen bonds displayed by the interaction of the EBNA 3C peptide with the cleft of the HLA-B27 sub-type molecule HLA-B*2705.

Position in peptide | Residues within 4 Å | Waters within 4 Å | mc hydrogen bonds | sc hydrogen bonds
ArgP1 | Y59, R62, E63, Y159, E163, W167, Y171 | TIP461, TIP465 | NH-OH(sc) Y171; C=O-OH(sc) Y159 | NH-C=O E163
ArgP2 | Y7, H9, T24, V25, V34, E45, R62, E63, I66, C67, Y99, Y159, E163 | TIP456, TIP461, TIP465 | NH-C=O(sc) E63; C=O-NH(sc) R62 (x2) | NH-C=O(sc) E45 (x2); NH-OH(sc) T24 (x2); NH-tip TIP456-(Y99, H9)
IleP3 | I66, Y99, L156, Y159 | TIP456, TIP458, TIP459, TIP465 | NH-OH(sc) Y99; C=O-tip TIP458 | -
TyrP4 | R62, Q65, I66 | TIP457, TIP458, TIP464, TIP465 | NH-tip TIP465-(E163, R62) | -
AspP5 | V152, Q155, L156 | TIP457, TIP460, TIP464 | C=O-tip TIP464 | -
LeuP6 | T73, V152, Q155, L156 | - | - | -
IleP7 | T73, D77, H114, W147, V152 | TIP453, TIP454, TIP455, TIP460 | C=O-tip TIP453 | -
GluP8 | T73, E76, D77, T80, W147 | TIP451, TIP453, TIP460 | C=O-NH(sc) W147 | C=O-tip TIP451-(D77, LeuP9)
LeuP9 | D77, T80, Y84, L95, Y123, T143, W147 | TIP451, TIP452, TIP453 | NH-C=O(sc) D77; C=O-OH(sc) Y84; C=O-OH(sc) T143; C=O-tip TIP452-(K146, T80) | -
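
Comparing Tables 7-2(a-c) row by row shows where the sub-types differ in the heavy-chain residues that contact a given peptide position. As an illustration, the sketch below takes the residues listed within 4 Å of GluP8 in the three tables and reports the heavy-chain positions at which the amino acid changes. The function name and data layout are assumptions made for this example.

```python
def polymorphic_positions(contacts_by_subtype):
    """Return {position: {subtype: amino acid}} for positions that vary.

    Residues are one-letter code plus number, e.g. "D77".
    """
    by_position = {}
    for subtype, residues in contacts_by_subtype.items():
        for residue in residues:
            amino_acid, position = residue[0], int(residue[1:])
            by_position.setdefault(position, {})[subtype] = amino_acid
    return {
        position: found
        for position, found in sorted(by_position.items())
        if len(set(found.values())) > 1
    }


# Residues within 4 Å of GluP8, transcribed from Tables 7-2(a-c).
glu_p8_contacts = {
    "HLA-B*2702": ["T73", "E76", "N77", "I80", "W147"],
    "HLA-B*2704": ["T73", "E76", "S77", "T80", "W147"],
    "HLA-B*2705": ["T73", "E76", "D77", "T80", "W147"],
}

diff = polymorphic_positions(glu_p8_contacts)
for position, found in diff.items():
    print(position, found)
```

For this row the differences fall at positions 77 and 80, in line with the clustering of sub-type polymorphisms around PC-1 noted in the caption of Figure 7-16.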


mediated through solvent. Position P1 of the peptide makes only one hydrogen bonding contact with a side chain from HLA-B27, using the terminal NH to bond to the C=O of Glu163. Glu163 also makes a hydrogen bond to Arg62, a residue which is found in all HLA-B and HLA-C alleles except for HLA-B57 and HLA-B58, and has two potential contacts to the water molecule TIP465, which interacts with the main chain NH of position P4 of the peptide. Arg62 makes two additional contacts to the peptide at the main chain C=O of position P2. Glu63 aids this interaction by interacting itself with the terminal NH of Arg62, holding the guanidinium group of the arginine side chain in a plane perpendicular to the β-sheet and aligning it so that it may interact with good geometry with both the peptide and its salt bridge partner in the α2 α-helix, Glu163. The interaction of the arginine side chain at P1 with the strong salt bridge which holds the α1 and α2 helices together appears to strengthen the interaction of the peptide with the protein and helps to stabilise the protein itself. This salt bridge is not found in all HLA-B alleles but can be predicted to form in certain members of the HLA-B7, HLA-B13, HLA-B40, HLA-B47 and HLA-B48 families in addition to HLA-B27.
Amongst the HLA-A and HLA-C alleles, HLA-A*2501, HLA-A*2601, HLA-A*4301, HLA-A*6601, HLA-Cw*0201 and HLA-Cw*0202 also possess the potential for this stabilising salt bridge. Glu63 makes a good hydrogen bond to the main chain NH of ArgP2, furnishing the backbone of this position with an excellent network of contacts which aid the stability of the arginine side chain in the P2 pocket. The hydrogen bonding contacts within the P2 pocket are identical to those which have been discussed previously [3]. The main chain NH of the isoleucine residue at position P3 of the peptide hydrogen bonds with the side chain hydroxyl group of Tyr99, a residue conserved in all classical HLA class I sequences except for HLA-Cw1 and HLA-A*0207, which have a cysteine, and HLA-Bw41, HLA-A*0210 and HLA-Aw24, which have a phenylalanine. The main chain C=O of P3 interacts with TIP464, which is part of the complex bed of water molecules which support the flexible and largely un-chelated central bulge of the peptide [3-5]. This water molecule interacts with TIP457, which is part of the general net of water molecules, and TIP456, which interacts with ArgP2 of the peptide and His9 and Tyr99 of the heavy chain as part of the planar hydrogen bonding network in the P2 binding pocket. The main chain NH of P4 hydrogen bonds to TIP465, which mediates an interaction to the side chain of Glu163.
Both the main chain carbonyl of P4 and the main chain amide of P5 are devoid of any contacts, and the main chain carbonyl of P5 interacts via TIP459 with the general solvent field. The aspartic acid side chain of P5 makes no contacts to the MHC but is pointing directly upwards, perpendicular to the β-sheet, probably facing the T-cell receptor in the in vivo complex. The main chain of LeuP6 makes no hydrogen bonding contacts to the MHC and the main chain of IleP7 interacts only with the solvent field inside the cleft. The main chain amide of PC-1 (P8) does not interact with either the MHC or solvent, and the main chain carbonyl interacts with the pyrrole NH of Trp147 as observed in the high resolution HLA-B*2705 [3] and HLA-Aw68 [16, 17] structures. The side chain of the glutamic acid residue at position PC-1 of the peptide (P8) interacts via a solvent molecule with the aspartic acid residue at position 77 of the heavy chain of HLA-B*2705. Position 77, along with position 73 of the heavy chain, appears from motif data to exert some selection on the nature of the side chain which is present at PC-1. The use of a solvent molecule to mediate an interaction between the glutamic acid at PC-1 and the aspartic acid at position 77 permits an interaction without the perturbing effects of charge repulsion.

Molecular dynamics studies have been performed on the section of the EBNA 3C peptide from P5 to P8 bound to HLA-B*2702 and HLA-B*2705. The dynamics studies were simple in their nature, using only the solvent molecules observed in the cleft of the 2.1 Å HLA-B*2705 structure. Despite this caveat some interesting features can be observed which indicate how the solvent molecules in the cleft may move to accommodate the amino acid changes between sub-types. In particular the solvent field around P8 of the peptide is of extreme interest due to the large number of sub-type polymorphisms in this region. In the dynamics of HLA-B*2705 the majority of solvent molecules remain in the positions in which they occur in the 2.1 Å crystal structure.
During the molecular dynamics TIP453 moves towards a position between those previously occupied by TIP455 and the lysine side chain at P9 of the RRIKAITLK peptide modelled into the electron density of the 2.1 Å structure. During this process TIP453 exchanges briefly with TIP454 and is subsequently displaced to its new location (Figure 7-17). Both TIP453 and TIP454 are stable for the final 20 ps of the simulation. As a part of this process TIP455 is displaced to the opposite side of the cleft, where it forms hydrogen bonds to Asp74 as part of a network involving Lys70 and TIP463, which has maintained its position. GluP8 displays two conformations, defined as C1 and C2 in Figure 7-17b, which are slightly unequally populated in favour of C2. TIP451, which chelates the side chain of GluP8 in both C1 and C2 orientations, has a limited movement which is best described as a tumbling motion about an axis which intersects both of the O-H bonds in the plane of the water molecule in the starting structure. TIP451 makes either one or two contacts with the side chain of GluP8 in the C1 and C2 conformations respectively, and complexes the side chain carbonyl of Asp77 and the hydroxyl of Thr80 when the GluP8 side chain is in either conformation. TIP452 displays a dichotomy of positions between which it exchanges at several points in the simulation, which appear to have no correlation to the orientation of the P8 side chain.
The primary conformation is that observed in the 2.1 Å HLA-B*2705 structure, where TIP452 ligates the carboxy terminus of the peptide and the side chain of Lys146. The secondary conformation has a pair of hydrogen bonds, one to the terminal amide of Arg83 and the other to the hydroxyl of Thr80.

The solvent molecules in the HLA-B*2702 model move during the 5 ps equilibration from their positions in the HLA-B*2705 structure to adopt new positions which may be adequately explained by the changes in amino acid character at positions 77, 80 and 81 (see Figure 7-17c). The water molecules under the peptide move in a slightly different manner to those in the HLA-B*2705 dynamics. In the HLA-B*2705 model TIP453 moves to fill the space left by the shorter hydrophobic side chain of the EBNA 3C peptide. In the HLA-B*2702 model this does not occur, as TIP453 is tightly bound throughout the trajectory by Asn77, the main chain carbonyl of Thr73, the main chain amide of Leu-P9 and the side chain of GluP8. GluP8 appears to spend the majority of the dynamics simulation in an orientation which is approximately equidistant between the C1 and C2 conformers observed in the HLA-B*2705 simulation (see Figure 7-17d). This conformation is itself stabilised by the movement of TIP451 from its starting position, where in HLA-B*2705 it made hydrogen bonds with the side chains of Asp77, Thr80 and GluP8, to a new position where the side chains of both GluP8 and Arg83 can be complexed. This movement occurs during the first 10 ps of the simulation and is stable for the final 20 ps of the simulation. In the period between 10 ps and 25 ps TIP451 is exchanging between its final orientation and that observed in the HLA-B*2705 crystal structure.
TIP451 remains largely in the position observed in the crystallographic co-ordinates, but has a minor secondary conformation where, like the pyrrole amine of Trp147, it is chelated by the main chain carbonyl of P8.

In addition to the above simulations, another molecular dynamics trajectory was calculated to determine the extent of the movement available to the arginine side chain in the P2 binding pocket. The P2 binding pocket of HLA-B27 displays a highly specific hydrogen bonding network, and presumably this specificity maintains a rigidity of side chain orientation. The experiment was performed fixing all MHC atoms and allowing movement only in the side chain atoms of ArgP2, the main chain atoms of ArgP2, the C=O and Cα of P1 and the N-H and Cα of P3. This led to a picture of the dynamics of the arginine side chain within the confines of the well defined pocket. As can be observed in Figure 7-18, the arginine side chain has very little movement inside the pocket. The movement is also usually cooperative with movements performed by the water molecule, although which movement is cause and which is effect is impossible to determine. The slight movements of TIP456 can be compared to those exhibited by TIP461 and the “bulk” solvents of the cleft.
TIP461 has virtually no movement throughout the dynamics trajectory, and the hydrogen bonds between TIP461, Glu45, Glu63 and Tyr171 are stable throughout the trajectory. Amongst the “bulk” solvent around the P2 pocket markedly greater movement is observed, and at several points in the dynamics trajectory movements towards the exchange of solvent positions are observed. In Figure 7-18 the hydrogen bonding patterns which are shown are for the starting conformation of the complex.

The molecular dynamics calculations carried out on the complexes of the EBNA 3C peptide and HLA-B*2702 and HLA-B*2704 clearly demonstrate the effects of polymorphisms in the heavy chain on the movements of non-anchoring side chains within the peptide and solvent molecules. This view of the dynamics of the solvent molecules in the C-terminal binding region of the molecule demonstrates how the solvent “bed” may aid closely related MHC molecules to bind the same peptide. In




7 Major Histocompatibility Complex Class I Protein-Peptide Interactions

Figure 7-18. Overlay of five conformers (displayed in magenta, red, purple, green and cyan) from the simulation of the Arg-P2 residue of the EBNA 3C peptide in the “45-pocket” of HLA-B*2705. It may be clearly observed that TIP461 is stationary throughout the simulation. The movements of the Arg-P2 side chain are small and are co-operative with those of TIP456, a water residue which is involved in the square planar hydrogen bonding network in the P2 pocket. (Colour illustration see page XVI.)

Figure 7-17. Overlay of the initial and final conformers of the EBNA 3C peptide. Initial conformers are coloured dark-grey and final conformers are coloured light-grey.
(a) The EBNA 3C complexed to HLA-B*2702, demonstrating the different conformations of GluP8 and the movement of water molecules around positions 77 and 80.
(b) Top view of the EBNA 3C peptide as complexed to HLA-B*2702. The conformation termed C1 in the text is represented by the dark-grey side chain at P8. The C2 conformation is represented by the light-grey side chain.
Slight movements of the peptide backbone appear to have occurred due to the movement of water molecules in the “bed” underneath the peptide.
(c) Orthologous view to (a) for the interaction of the EBNA 3C peptide with HLA-B*2705. The movement of water molecules in the simulation of these two alleles may be directly contrasted between the two figures.
(d) Orthologous view to (b) for the interaction of the EBNA 3C peptide with HLA-B*2705. In contrast to the different distribution of water molecules around the peptide in the cleft of the two isotype molecules, it may be observed that the peptide has a highly similar conformation in the two environments, suggesting that it is the movement of solvent within the cleft of the isotype molecules which allows for cross-reactivity of this peptide.


the case of HLA-B*2702, Thr80, which forms contacts mediated by water to the PC residue of the peptide in HLA-B*2705, is non-conservatively substituted to isoleucine. This loss of a hydrogen bonding contact appears, from the molecular dynamics simulation described above, to be adjusted for by the movement of a solvent molecule to a position where a hydrogen bonding bridge between the MHC molecule and the peptide may be made.

7.8 Conclusions

The simulations and models described above all aid the visualisation of real biochemical data, which is one of the prime aims of molecular modelling as a technique. The analysis of peptides binding to MHC class I molecules using computational techniques has been limited to date. In the case of HLA-B27 sub-type molecules we have demonstrated how movements of the “bed” of water molecules present in the cleft effectively compensate for the polymorphisms in the cleft and negate their effect on peptide binding. In direct contrast, it has been recently observed that in the case of some sub-type molecules of HLA-B27 the polymorphisms in the cleft change the motif of peptide presented [15]. At first these two concepts appear dichotomous and the gulf between them appears unbridgeable. In fact what we are observing is the subtle effect of peptide selection. The binding of the same peptide to several sub-type molecules is merely a demonstration of the adaptability of the complex, whereby a peptide will be bound as long as it fits the general motif.
The general HLA-B27 motif is the same for HLA-B*2705 and HLA-B*2702, but the HLA-B*2702 molecule may bind additional peptides which would not naturally be bound to HLA-B*2705 in large enough concentration for their presence to be observed in the motif data, and indeed these peptides may not be bound at all. Conversely, there may be peptides which may be bound to HLA-B*2705 which are not observed at high levels in the context of HLA-B*2702. This effect has been observed in HLA-B*2704, where an unusual peptide may bind to this isotype but not to the HLA-B*2702 and HLA-B*2705 isotypes, despite the fact that all three molecules will bind the EBNA 3C peptide described above [49] (CJT, J. M. Brooks, DSM, A. B. Rickenson & P. J. Travers, manuscript in preparation). Often it is unusual peptides which make a motif seem unclear or suggest that the motif may be vastly different, when in fact the sub-type molecules will all bind similar epitopes, have closely related motifs and will probably present the same immunodominant epitopes.

The models and simulations described above clearly demonstrate that by the use of homology modelling techniques, appropriately restrained energy minimisation and molecular dynamics, and explicit solvent fields, the modelling of the interaction of MHC class I molecules and peptides may be performed to a high degree of accuracy and gives


answers which are extremely similar to those obtained by X-ray crystallography. Due to the vast difference in time scale between performing a modelling exercise and crystallising a complex of an MHC class I molecule and its peptide partner, the models produced using the techniques described here are more than adequate for cases where a qualitative answer is required to questions of the function and position of residues in the MHC class I molecule or the peptide for mutagenesis studies. Where a quantitative answer is required, as in all modelling exercises and simulations, only direct structural analysis will provide an unequivocal solution. In the case of many MHC class I peptide studies this unequivocal answer is highly similar in its conclusions to those drawn from the model.

Acknowledgements

The contributions of Alan Rickenson, Andreas Ziegler, Jonathan Howard, Alan Lamont, Neera Borkakoti, Julia Goodfellow and especially Paul Travers to this work are gratefully acknowledged. In addition the contributions of Tom Blundell and members of the ICRF Unit for Structural Molecular Biology to interesting and stimulating discussions are acknowledged. CJT is the recipient of an SERC CASE award in collaboration with Roche Products U.K. CJT and DSM acknowledge the continuing support of Tripos Associates.

References

[1] Townsend, A., Bodmer, H., Annu. Rev. Immunol. 1989, 7, 601-624.
[2] Gorer, P. A., J. Pathol. Bacteriol. 1937, 44, 691.
[3] Klein, J., Natural history of the major histocompatibility complex. J. Wiley & Sons Inc., New York, 1986.
[4] Brown, J. H., Jardetzky, T. S., Gorga, J. C., Stern, L. J., Urban, R. G., Strominger, J. L., Wiley, D. C., Nature 1993, 364, 33-39.
[5] Madden, D. R., Gorga, J. C., Strominger, J. L., Wiley, D. C., Cell 1992, 70, 1035-1048.
[6] Thorpe, C. J., Immunology Today 1993, 14, 51-52.
[7] Travers, P. J., Thorpe, C. J., Curr. Biol. 1992, 2, 679-681.
[8] Elliot, T., Smith, M., Driscoll, P., McMichael, A., Curr. Biol. 1993, 3, 854-866.
[9] Trowsdale, J., Hanson, I., Mockridge, I., Beck, S., Townsend, A., Kelly, A., Nature 1990, 348, 741-744.
[10] Robson MacDonald, H., Curr. Biol. 1992, 2, 653-655.
[11] Falk, K., Rotzschke, O., Stevanovic, S., Jung, G., Rammensee, H.-G., Nature 1991, 351, 290-296.
[12] Jardetzky, T. S., Lane, W. S., Robinson, R. A., Madden, D. R., Wiley, D. C., Nature 1991, 353, 326-329.


[13] Madden, D. R., Wiley, D. C., Curr. Opin. Struct. Biol. 1992, 2, 300-304.
[14] Ohno, S., Proc. Natl. Acad. Sci. USA 1992, 89, 4643-4647.
[15] Rotzschke, O., Falk, K., Stevanovic, S., Gnau, V., Jung, G., Rammensee, H.-G., Immunogenet. 1994, 39, 74-77.
[16] Ljunggren, H.-G., Stam, N. S., Ohlen, C., Neefjes, J. J., Hogland, P., Heemels, M. T., Bastin, J., Schumacher, T. N. M., Townsend, A., Karre, K., Ploegh, H. L., Nature 1990, 346, 476-480.
[17] Powis, S. J., Deverson, E. V., Coadwell, W. J., Ciruela, A., Huskisson, N. S., Smith, H., Butcher, G. W., Howard, J. C., Nature 1992, 357, 211-215.
[18] Spies, T., Cerundolo, V., Colonna, M., Cresswell, P., Townsend, A., DeMars, R., Nature 1992, 644-646.
[19] Haldane, J. B. S., La Ricerca Scientifica 1949, 19, Suppl. 19, 68-75.
[20] Watkins, D. I., McAdam, S. N., Liu, X., Strang, C. R., Milford, E. L., Levine, C. G., Garber, T. L., Dogon, A. L., Lord, C. I., Ghim, S. H., Troup, G. M., Hughes, A. L., Letvin, N. L., Nature 1992, 357, 329-333.
[21] Belich, M. P., Madrigal, J. A., Hildebrand, W. H., Zemmour, J., Williams, R. C., Luz, R., Petzl-Erler, M. L., Parham, P., Nature 1992, 357, 326-329.
[22] Hill, A. V. S., Elvin, J., Willis, A. C., Aidoo, M., Allsopp, C. E., Gotch, F. M., Gao, X. M., Takiguchi, M., Greenwood, B. M., Townsend, A. R. et al., Nature 1992, 360, 434-439.
[23] Salter, R. D., Benjamin, R. J., Wesley, P. K., Buxton, S. E., Garrett, T. P. J., Clayberger, C., Krensky, A., Norment, A. M., Littman, D. R., Parham, P., Nature 1990, 345, 41.
[24] Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennet, W. S., Strominger, J. L., Wiley, D. C., Nature 1987, 329, 506-512.
[25] Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennet, W. S., Strominger, J. L., Wiley, D. C., Nature 1987, 329, 512.
[26] Saper, M. A., Bjorkman, P. J., Wiley, D. C., J. Mol. Biol. 1991, 219, 277.
[27] Madden, D. R., Garboczi, D. N., Wiley, D. C., Cell 1993, 75, 693-708.
[28] Garrett, T. P. J., Saper, M. A., Bjorkman, P. J., Strominger, J. L., Wiley, D. C., Nature 1989, 342, 692.
[29] Silver, M. L., Guo, H.-C., Strominger, J. L., Wiley, D. C., Nature 1992, 360, 367-369.
[30] Guo, H.-C., Jardetzky, T. S., Garrett, T. P. J., Lane, W. S., Strominger, J. L., Wiley, D. C., Nature 1992, 360, 364-366.
[31] Madden, D. R., Gorga, J. C., Strominger, J. L., Wiley, D. C., Nature 1991, 353, 321.
[32] Zhang, W., Young, A. C. M., Imarai, M., Nathenson, S. G., Sacchetini, J. C., Proc. Natl. Acad. Sci. USA 1992, 89, 8403-8407.
[33] Fremont, D. H., Matsumura, M., Stura, E. A., Peterson, P. A., Wilson, I., Science 1992, 257, 919-927.
[34] Thorpe, C. J., Travers, P. J., Moss, D. S., unpublished results.
[35] Overington, J., Johnson, M. S., Sali, A., Blundell, T. L., Proc. R. Soc. London 1990, 241, 132-145.
[36] Sutcliffe, M. J., Haneef, I., Carney, D., Blundell, T. L., Protein Eng. 1987, 1, 377-384.
[37] Sutcliffe, M. J., Hayes, F. R. F., Blundell, T. L., Protein Eng. 1987, 1, 385-392.
[38] SYBYL 5.5 Manual (Tripos Associates).
[39] Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profeta, S., Weiner, P. K., J. Amer. Chem. Soc. 1984, 106, 765-784.
[40] Weiner, S. J., Kollman, P. A., Nguyen, D. T., Case, D. A., J. Comp. Chem. 1986, 7, 230-252.
[41] Karplus, M., McCammon, A., Ann. Rev. Biochem. 1983, 52, 263.


[42] Cerundolo, V., Tse, A. G. D., Salter, R. D., Parham, P., Townsend, A., Proc. R. Soc. London, 244, 169-177.
[43] Gilliland, B. C., Harrison’s principles of internal medicine (11th edn.), McGraw Hill, 1987, pp. 1434-1436.
[44] Benjamin, R., Parham, P., Immunology Today 1990, 11, 137-142.
[45] Jardetzky, T. S., Lane, W. S., Robinson, R. A., Madden, D. R., Wiley, D. C., Nature 1991, 353, 326-329.
[46] Sutton, J., Rowland-Jones, S., Rosenberg, W., Nixon, D., Gotch, F., Gao, M., Murray, N., Spoonas, N., Driscoll, P., Smith, M., Willis, A., McMichael, A. J., Eur. J. Immunol. 1993, 23, 447-453.
[47] Huet, S. H., Nixon, D. F., Rothbard, J. B., Townsend, A., Ellis, S. A., McMichael, A. J., International Immunology 1990, 4, 311-316.
[48] Murray, R. J., Kurilla, M. G., Brooks, J. M., Thomas, W. A., Rowe, M., Kieff, E., Rickenson, A. B., J. Exp. Med. 1992, 176, 157-168.
[49] Brooks, J. M., Murray, R. J., Thomas, W. A., Kurilla, M. G., Rickenson, A. B., J. Exp. Med. 1993, in press.


Computer Modelling in Molecular Biology
Edited by Julia M. Goodfellow
© VCH Verlagsgesellschaft mbH, 1995

8 Path Energy Minimization: A New Method for the Simulation of Conformational Transitions of Large Molecules

Oliver S. Smart
Department of Crystallography, Birkbeck College, University of London, Malet Street, London WC1E 7HX, England

Contents

8.1 Introduction
8.1.1 Simulation Methods for Conformational Transitions
8.2 Theory
8.3 Method
8.4 Applications
8.4.1 A Pucker Angle Change in α-D-Xylulofuranose
8.4.2 Conformation Change of the Substrate in the Active Site of D-Xylose Isomerase
8.5 Conclusions: Potential Developments
8.6 Summary
References


8.1 Introduction

Conformational transitions play an important part in many processes involving biological macromolecules. The sigmoidal binding kinetics of oxygen to hemoglobin, which has major physiological significance, has been successfully explained in terms of an allosteric model [1]. Structures of hemoglobin derived by X-ray crystallography [2, 3] show that the allosteric transition is a change in the quaternary structure of the protein which accompanies ligation. Allosteric regulation is also involved in the behavior of many multi-subunit enzymes [2-4]. Although structural studies can provide a detailed picture of the conformations of the protein involved, they provide little information on the pathways between the states [4].

Examples of more dramatic conformational transitions are provided by the folding/unfolding of proteins and by the changes between different structural forms in helical molecules, such as DNA [5-7] and gramicidin A [8-10]. Processes which involve the motion of a small molecule relative to a protein can also be thought of as conformational transitions. Examples include substrate binding to enzymes, which may also involve concurrent large scale motions of the protein, and the translocation of ions through lipid bilayers by ion channels.

In some ways the field of macromolecular conformational transitions is an ideal area for the application of simulation methods.
Microscopic models for the changes during a transition are difficult or impossible to obtain by experimental means. However, detailed structural models are often available for the end points together with good information about the macroscopic behavior of the system (such as free energies or enthalpies of activation). This provides an excellent opportunity for simulation to provide a detailed model whilst being constrained by experimental information. The next section briefly examines existing simulation techniques. A powerful new method, the PEM procedure, is then set out.

8.1.1 Simulation Methods for Conformational Transitions

The most direct way of simulating a conformational rearrangement is to run a molecular dynamics (MD) simulation under reasonable conditions of temperature and pressure and to hope that the system switches between the states of interest. Clearly this method is limited to modelling transitions that occur on a time scale significantly shorter than the length of the simulation. It also has the disadvantage that analysis of the movements necessary for the transition will be complicated by the presence of all the other concurrent motions of the protein. Little direct information as to the energy barriers involved is obtained. To extend the effective time course


of a simulation high temperatures are often used. Karplus [11] examined the motion of a loop which closes over the active site of triose phosphate isomerase (TIM) using dynamics at room temperature, 500 K and 1000 K. Another example where this approach may be useful is in the simulation of protein unfolding [12], as there is no detailed model available for the unfolded state of a protein. However, simulations at unrealistically high temperatures may favor pathways which are different from those at low temperatures. An alternative approach, also used to examine loop motions in TIM, is to simplify the model used for the protein and perform Brownian dynamics, which allows simulations of up to 100 ns to be performed [13].

Other simulation methods do not initially include dynamical effects but concentrate on finding a “reaction coordinate” for the change, passing through a transition state. The reaction coordinate is a variable or function of variables which smoothly changes between the end point conformations of interest. The transition state is the peak energy position on the reaction coordinate. Transition state theory [14-16] states that the most favorable route for the change is the one with the lowest transition state potential energy. Once a suitable reaction coordinate has been identified it is possible to run dynamical simulations to obtain the potential of mean force for the change (free energy profile along the reaction coordinate) and such variables as the transmission coefficient [17-20].
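As a rough numerical illustration of why the lowest transition state energy identifies the most favorable route, and of why high-temperature runs can blur that preference, the relative Boltzmann weights of two competing routes can be compared. The sketch below is only a back-of-the-envelope estimate: the barrier heights (40 and 50 kJ/mol) are hypothetical, and prefactors, entropic contributions and the transmission coefficient are all ignored.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def boltzmann_factor(barrier, temperature):
    """Return exp(-E/RT) for a barrier E in kJ/mol at temperature T in K."""
    return math.exp(-barrier / (GAS_CONSTANT * temperature))


def route_preference(low_barrier, high_barrier, temperature):
    """Ratio of Boltzmann factors: how strongly the lower-barrier route is favored."""
    return (boltzmann_factor(low_barrier, temperature)
            / boltzmann_factor(high_barrier, temperature))


# Hypothetical barriers, chosen only for illustration.
LOW_BARRIER, HIGH_BARRIER = 40.0, 50.0

for temperature in (300.0, 500.0, 1000.0):
    print(temperature, route_preference(LOW_BARRIER, HIGH_BARRIER, temperature))
```

With these assumed barriers the lower route is favored roughly 55-fold at 300 K but only about 3-fold at 1000 K, which is one way to see how unrealistically high temperatures may open up pathways that are unimportant at room temperature.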
In this context, Elber [19] has shown that a series of contiguous positions for the molecule which link the end points can be used to effectively define a reaction coordinate for complex transitions.

Most methods for identifying reaction coordinates have been developed by quantum chemists interested in chemical reactions of small molecules. Although this problem may appear very similar to identifying a reaction coordinate for a conformational transition, there are some important differences. The quantum chemist in general deals with problems with a relatively small number of variables and thus a fairly simple energy hypersurface. Evaluating the energy of a position of a molecule is often computationally very expensive (though accurate). In contrast, macromolecular potential energy functions are approximate but cheap. However, the energy hypersurface is invariably horrendously complicated, with large numbers of energy minima thermally accessible to each other [21]. This makes the requirements of routines to find reaction coordinates very different in the two cases. The quantum chemist requires a technique to be as efficient as possible but the method need not be completely robust. In contrast, methods for obtaining reaction coordinates for large molecules must be robust even at the expense of computational efficiency.

A number of methods developed in quantum chemistry concentrate on locating transition state configurations [22-27].
The multitude of minima on the energy hypersurface of a large molecule leads one to expect that there would be at least as many transition states. It is unlikely that a routine which attempts to locate a transition state would find the one of interest. In addition most of these routines require that the matrix of the second derivatives of the potential energy function (the Hessian) be calculated and manipulated. This also precludes their use for problems involving large molecules, as the evaluation, storage and manipulation of the large Hessians becomes difficult.

A technique commonly used in both fields is the reaction coordinate or adiabatic mapping method [18, 28]. A variable or function of the variables judged to be important in the transition of interest is controlled and energy minimization is applied changing all other variables. By judicious control of the “reaction coordinate” it is hoped that the transition of interest can be provoked. Often two coordinates are used and a two dimensional contour map is built up. The results are referred to as the adiabatic surface, as the method gives a reasonable approximation to the potential of mean force (free energy profile) along the reaction coordinate, provided the time scale for the motion along the coordinate is much larger than for the other variables [18]. The method is widely used [18, 29-34] despite being prone to failure, even in the case of a two dimensional model function [35]. It is important to note that the route obtained by adiabatic mapping may be quite distinct from the steepest descents path even if the same transition state conformation is found [36].

A set of procedures which are related to the method proposed by Sinclair and Fletcher [37] are more applicable to large systems. In their original method an initial search was made to find the maximum energy position along the line connecting the two given end points.
The system at this point was then subjected to energy minimization along directions conjugate to the original search direction. This is meant to ensure that the gradient norm drops to zero while the energy profile along the original search direction is a maximum, thus locating a transition state. The problem with the method is that on a complex energy hypersurface, as soon as a single step is taken from the original point, the energy profile along the original search direction ceases to be a maximum [38]. In similar methods proposed by Halgren and Lipscomb [39] and Bell and Crighton [28] this problem is avoided by making the search for a maximum along the parabola through the current point joining the end points. If the end points are energy minima there is guaranteed to be at least one maximum along the parabola. The problem with this procedure on complicated energy hypersurfaces is that the energy profile can display two maxima [38]. Whatever choice is made as to which of the maxima is accepted, the routine eventually ends up taking a large displacement along the parabola and does not converge [38]. To avoid this problem an adapted procedure has been proposed [38]. In common with the other methods an initial search is made for a maximum along the line section joining the end points. This is followed by energy minimization along all directions orthogonal to the original search direction.
A further search for maxima is then made along the lines joining the resulting point to the end points. In practice, the procedure was found to produce a reasonable result, but was cumbersome and each minimization had to be restrained to avoid jumping to remote parts of conformational space. The methods set out by Fischer and Karplus [40] and Liotard [41] are based on broadly similar ideas but have proved to be more successful.
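The first two steps shared by these line-search procedures can be sketched on a model surface. The example below is a minimal Python illustration, not the published algorithm of any of the cited authors: the two dimensional energy function, the sampling density and the golden-section minimizer are all assumptions chosen so that the result is easy to check by hand.

```python
def energy(x, y):
    """Hypothetical 2D surface: minima at (-1, 0) and (1, 0), saddle at (0, 1)."""
    return (x * x - 1.0) ** 2 + 2.0 * (y - (1.0 - x * x)) ** 2


def line_maximum(start, end, n_points=201):
    """Return the highest-energy sampled point on the segment start -> end."""
    best = None
    for i in range(n_points):
        t = i / (n_points - 1)
        point = (start[0] + t * (end[0] - start[0]),
                 start[1] + t * (end[1] - start[1]))
        if best is None or energy(*point) > energy(*best):
            best = point
    return best


def minimise_orthogonal(point, direction, half_width=3.0, tol=1e-8):
    """Golden-section search along the normal to `direction` through `point`."""
    norm = (direction[0] ** 2 + direction[1] ** 2) ** 0.5
    normal = (-direction[1] / norm, direction[0] / norm)

    def along(s):
        return energy(point[0] + s * normal[0], point[1] + s * normal[1])

    ratio = (5 ** 0.5 - 1) / 2
    lo, hi = -half_width, half_width
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if along(a) < along(b):
            hi = b      # minimum lies in [lo, b]
        else:
            lo = a      # minimum lies in [a, hi]
    s = 0.5 * (lo + hi)
    return (point[0] + s * normal[0], point[1] + s * normal[1])


start, end = (-1.0, 0.0), (1.0, 0.0)
peak = line_maximum(start, end)                      # step 1: maximum on the line
refined = minimise_orthogonal(                       # step 2: relax off the line
    peak, (end[0] - start[0], end[1] - start[1]))
print(peak, energy(*peak))
print(refined, energy(*refined))
```

On this hypothetical surface the maximum along the straight line lies at (0, 0) with energy 3, and a single minimization orthogonal to the line lands on the saddle at (0, 1) with energy 1. On a realistic macromolecular hypersurface one such pass is generally not enough, which is why the methods above iterate and restrain each minimization.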


A further set of methods is based on adapting an MD simulation from one state so that the molecule is forced to move towards the state of interest. At least three techniques have been proposed independently. The Contra MD method [33, 34] takes a short free MD run from one state. If the run moves “toward” the desired target conformation it is “accepted”; if this is not the case, the run is restarted with another set of initial atomic velocities. This process is repeated until the target conformation is reached. The measure of distance between two conformations is based on the same coordinates used for the adiabatic mapping study undertaken in the work [33]. Ech-Cherif El-Kettani and Durup [42], working on a much larger problem, take a more direct approach in that the MD run is started with the initial velocities set so that the system moves toward the desired state. The r.m.s. atomic displacement of the molecule with respect to the target conformation is monitored during dynamics and the run is halted when this quantity starts to rise. Velocities are then reset toward the target conformation and the run is restarted. Once again the procedure is repeated until the desired conformation is reached. In contrast, Schlitter et al. [43] use an MD simulation from one conformation with the constraint that the distance to the target should decrease slowly with time. All these methods assume that the best path between two states is reasonably direct, which may not be true (e.g., consider a case where the best route for a molecule to undergo a simple transition of a single dihedral angle from 0 to 90 degrees is to go through a transition state at 180 degrees). Furthermore, the transition state configuration is not identified and at best only a “ball park estimate” of the energy barrier involved is obtained. These methods may prove useful for providing initial routes to which further optimization can be applied.

The most effective method developed to date for reaction path generation in large systems is the Self Penalty Walk (SPW) procedure due to Elber and co-workers [44, 45]. A chain of positions is considered between the fixed end points. These are simultaneously energy minimized with the restraint that the distances between adjacent points should be equal. Further restraints are added to avoid problems caused by rigid body translations and rotations, and to avoid the route folding back on itself. The method presented here, Path Energy Minimization (PEM), is also based on an optimization of a series of conformations lying between the fixed end points but has a number of novel features. The method is discussed at length in the next section.


8.2 Theory

Consider a molecule whose spatial conformation is described by a vector $X$, which could be in terms of Cartesian or internal coordinates, and whose potential energy can be expressed as a well behaved function of these coordinates, $E(X)$. The PEM procedure aims to find an optimal path which links two different fixed conformers of the molecule (which we shall describe by the conformation vectors $X_0$ and $X_{N_{move}+1}$). These positions could be based on experimental results or some modelling procedure. The function $E(X)$ must give a reasonable representation of the potential energy of the molecule at the two conformers and at intermediate positions in conformational space. The current implementation of the procedure is limited to the simulation of simple conformational transitions, as the standard macromolecular potential energy function used cannot adequately describe changes in which bonds are made or broken. If the method were implemented with a more sophisticated representation it would be possible to simulate chemical reactions.

We define a path as a set of $N_{move}$ molecular spatial conformations $[X_1, X_2, X_3, \ldots, X_{N_{move}}]$ and all points on the line sections through conformational space linking adjacent points $[X_0 \to X_1, X_1 \to X_2, \ldots, X_{N_{move}} \to X_{N_{move}+1}]$. That is to say, a path is defined as the set of all molecular conformations $Y$:

$$Y = z X_n + (1 - z) X_{n+1}, \tag{8-1}$$

where the integer $n$ runs from 0 to $N_{move}$ and the number $z$ runs continuously from 0.0 to 1.0. Figure 8-1 shows two paths on a 2-dimensional model potential.

It is clearly possible to find the potential energy for all points on the path and consequently the peak energy of the route, $E^*$. Transition state theory [14-16] states that the most favorable route is that with the smallest $E^*$. The PEM method minimizes the peak energy of a quasi-continuous route through conformational space linking the two fixed end point conformers, by making adjustments to the moving conformations $[X_1, X_2, X_3, \ldots, X_{N_{move}}]$. The procedure will therefore locate the optimal route local to the initial set of positions taken as the moving conformations.

A simplifying assumption is made in that, instead of sampling the energy of all points along the line sections, a discrete number of equally spaced sample positions is taken from each. For instance, three sample points could be taken in each section, in addition to the moving conformations. This would require the procedure to consider all the positions given by Eq. (8-1) with $z$ set to 0.00, 0.25, 0.50 and 0.75.

It would be possible to minimize the peak energy of such a path by considering the objective function:


Figure 8-1. The application of the PEM procedure to the two-parametric model potential given by Müller and Brown, 1979 [24]; the final optimized route is given for (a) one moving point and (b) two moving points. The two lowest minima and the saddle point between them are marked by circles, triangles mark the moving points and crosses the intermediate sample points. The paths pass within 0.02 units of the saddle point.


(8-2)

and applying some optimization technique. However, as Eq. (8-2) involves the maximum function, it is impossible to find its derivatives. This would preclude the use of efficient optimization methods, which require the derivative vector of the objective function to be known.

This problem can be avoided by noting that for any set of $n$ positive numbers $[\xi_1, \xi_2, \xi_3, \ldots, \xi_n]$ the expression

$$\sqrt[M]{\sum_{i=1}^{n} \xi_i^{M}} \tag{8-3}$$

tends towards the maximum of $[\xi_1, \xi_2, \xi_3, \ldots, \xi_n]$ as $M$ tends to infinity. This expression is used to construct the PEM objective function $S$ (Eq. (8-4)), where $N_{inter}$ is the number of sample conformations to be taken in each line section. This function is continuous and differentiable, and the higher the power $M$ is set, the more closely it approaches the peak energy. This allows the use of conventional optimization techniques such as the Polak-Ribière conjugate gradients minimization [46-48] used in this study. Alternatively, simulated annealing [49] could be employed to reduce the dependence of the final result on the initial set of moving conformations. Importantly, it can be shown, by application of the chain rule, that the derivative vector $\nabla S$ can be calculated analytically from the molecular force and potential energy functions. Note that all the potential energies in Eq. (8-4) are relative to the energy of fixed position 0:

$$E'(X) = E(X) - E(X_0), \tag{8-5}$$

which is assumed to be the lowest energy position for the molecule, so that $E'(X)$ is always greater than zero.

If it is necessary to locate the transition state conformation exactly, an adaptation of the PEM procedure (focusing down) can be used [52].

The PEM objective function is dominated by the high areas of the energy profile of the path (provided the power $M$ is set to a reasonably high value). The procedure will concentrate on lowering these regions, resulting in a path which will ultimately follow the optimal vector close to a saddle point (see Figure 8-1). However, as the low energy parts of the path contribute very little to the objective function, these regions will not be optimized. If it is necessary to find the smooth overall path, a variety of methods can be used to locate routes downhill from the transition state [36]. These include “PEM descents”, an adaptation of the PEM procedure which is used below. The method uses the fact that points immediately adjacent to the peak energy position contribute to the PEM objective function. Only one fixed conformation and one moving conformation are considered (linked by a number of intermediates). The fixed position is initially set to the transition state conformation and the moving position to some point on one or other side of the transition state. The PEM procedure is then used to optimize the line section between the moving and fixed conformations. The highest energy intermediate conformation is “accepted”, output, and used to replace the fixed conformation. The process is then restarted. On each cycle the potential energy of the accepted position will be lowered. In this way PEM is used to descend from the transition state along a locally optimized path to the local energy minimum.

The PEM method is inspired by, and shares some features with, the self penalty walk (SPW) procedure developed by Elber and co-workers [44, 45] from the earlier Gaussian chain approach [50]. The SPW procedure has been applied to study conformational transformations in peptides [44], ligand diffusion in leghemoglobin [45] and conformational transitions of DNA [51].
The Gaussian chain approach, with low temperature molecular-dynamics simulated annealing, has recently been applied to determine reaction paths for a conformational change of citrate synthase [42].

The SPW procedure also considers a set of moving conformations $[X_1, X_2, X_3, \ldots, X_{N_{move}}]$ between fixed end points $X_0$ and $X_{N_{move}+1}$, but unlike the PEM technique no sampled positions are taken between the moving conformations. The SPW method finds an optimal position for each intermediate by optimizing the objective function:

$$S_{SPW} = \sum_{i=1}^{N_{move}} E(X_i) + S_{chain} + S_{repulsion} + (S_{rigid\ body}) \tag{8-6}$$

The first term of the expression is a summation of all the energies of the moving conformations. The term $S_{chain}$ (Eq. (8-7)) imposes the requirement that the distances between adjacent moving positions $d_{i,i+1}$ should be equal (but does not specify the absolute value for the average distance).


The repulsion term $S_{repulsion}$ (Eq. (8-8)) ensures that moving conformations do not aggregate in low energy regions.

As the SPW procedure imposes restraints on the distances between moving positions, it is necessary to ensure that “rigid body” motions do not affect the result. In this study this is accomplished by imposing a penalty function (Eq. (8-9)) [44, 50] defined in terms of $r_j$, the Cartesian vector giving the position of atom $j$, and a reference set of positions for atoms $j = 1, \ldots, N_{atom}$, which is set to the arithmetic average of the end positions. Czerminski and Elber [44] introduced an alternative to this penalty function, a gradient projection constraint technique which is much more efficient (but more difficult to implement).

The SPW method differs fundamentally from PEM in that it favors routes with the lowest average energy rather than the lowest peak energy. It is easy to imagine the case of a large molecular system with many distinct routes between two conformations where the route with the lowest average energy may not be the most favorable [50]. As noted by Fischer and Karplus [40], the SPW method results in routes with molecular conformations which lie as low as possible on each side of the saddle point. As shown below, the PEM formalism encourages exactly the opposite behavior: if moving a point from a low energy region to the high energy part of a profile leads to improved sampling and a reduction of peak energy, then the objective function (Eq. (8-4)) will favor this. The PEM approach will avoid the problems of abrupt shifts in atomic position sometimes found with the SPW procedure [45], because if such a shift produced a path with a high energy intermediate sample position, this would automatically be disfavored by an increase in the objective function.

The methods also differ in that SPW requires the concept of distance to be applicable to the coordinate system under consideration. The PEM method does not rely on any such notion and is immediately applicable to a non-Cartesian coordinate representation, such as internal coordinates. In particular, it would be difficult to apply the SPW formalism to a case in which the conformational transition of interest also involved a rigid body displacement and rotation under the influence of a mild external field, such as that imposed by a crystalline environment.
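As a worked illustration of the chain-rule remark made earlier in this section, suppose the PEM objective function applies the expression of Eq. (8-3) to the relative energies $E'$ of Eq. (8-5) at all sampled positions of Eq. (8-1). The exact form written below, the sample index $j$ and the weights $z_j = j/(N_{inter}+1)$ are assumptions made for this sketch; the published Eq. (8-4) may differ, for example by a normalization factor.

```latex
% Assumed form of the objective function (a sketch, not the published Eq. (8-4)):
S = \left[ \sum_{n=0}^{N_{move}} \sum_{j=0}^{N_{inter}}
      E'\!\left(Y_{n,j}\right)^{M} \right]^{1/M},
\qquad
Y_{n,j} = z_j X_n + (1 - z_j) X_{n+1},
\qquad
z_j = \frac{j}{N_{inter} + 1}.

% Differentiating with respect to a moving conformation X_k (1 <= k <= N_move).
% Only the two line sections that touch X_k contribute:
\frac{\partial S}{\partial X_k}
  = S^{\,1-M} \left[
      \sum_{j=0}^{N_{inter}} z_j \, E'\!\left(Y_{k,j}\right)^{M-1}
        \nabla E\!\left(Y_{k,j}\right)
    + \sum_{j=0}^{N_{inter}} (1 - z_j) \, E'\!\left(Y_{k-1,j}\right)^{M-1}
        \nabla E\!\left(Y_{k-1,j}\right)
    \right].
```

Under these assumptions every term involves only the energy $E$ and its gradient $\nabla E$ (the negative of the force) at a sampled position, which is consistent with the statement that $\nabla S$ can be obtained analytically from the molecular force and potential energy functions. The factor $E'^{\,M-1}$ also shows why samples near the peak dominate the search direction when $M$ is large.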


Both PEM and SPW techniques produce a set of molecular positions and a value for the potential energy barrier which can be thought of as an upper estimate of the enthalpy of activation [18]. Neither technique includes dynamical effects or calculates the free energy profile (also known as the potential of mean force), which is crucial for predicting the behavior of the macroscopic system. Both procedures effectively identify a reaction coordinate for the transformation of interest, which can then be used to run simulations to calculate the potential of mean force, rate constants and other dynamical properties, either by the free energy perturbation method [19] or by the umbrella sampling technique [20].

8.3 Method

The PEM procedure was incorporated into the macromolecular energy minimization program TIC [38]. The implementation uses Cartesian coordinates to describe molecular conformation, as the program has routines to calculate potential energy and first derivative functions in this space. The procedure allows the use of steepest descents and/or Polak-Ribière conjugate gradients minimization procedures [46-48] for a specified number of steps or until the norm of the gradient of the objective function (Eq. (8-4)) falls below a specified level. At the end of the optimization procedure a check is made as to whether the sampling density is high enough by considering an additional sample conformation at the mid-point of each interval used in the optimization (this is achieved by setting $N_{inter}$ to $2N_{inter} + 1$). A facility exists to automatically produce an initial set of moving conformations equally spaced on the line through Cartesian space which links two specified end points. Applications of the method to transitions involving a change in pucker angle for a pentulofuranose sugar and a change in the conformation of the substrate in the active site of D-xylose isomerase are described below. In addition, the method has been used to simulate a substantial conformational transition of the ion-channel forming peptide gramicidin A, as described elsewhere [52].
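Two of the bookkeeping steps described in this section, generating equally spaced initial moving conformations and checking the sampling density with $2N_{inter} + 1$ intermediates, can be sketched in a few lines. This is a minimal illustration, not code from TIC: all function names, the tolerance argument and the one-line toy energy function are hypothetical, and conformations are represented as plain lists of coordinates.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Point a + t * (b - a) on the line section from a to b."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def initial_moving_conformations(
    x_start: Sequence[float], x_end: Sequence[float], n_move: int
) -> List[Vector]:
    """n_move conformations equally spaced on the line linking the end points."""
    return [lerp(x_start, x_end, k / (n_move + 1)) for k in range(1, n_move + 1)]


def sample_path(
    x_start: Sequence[float],
    moving: Sequence[Sequence[float]],
    x_end: Sequence[float],
    n_inter: int,
) -> List[Vector]:
    """All sampled positions in path order: every node of the path plus
    n_inter equally spaced intermediates in each line section."""
    nodes = [list(x_start)] + [list(m) for m in moving] + [list(x_end)]
    samples: List[Vector] = []
    for a, b in zip(nodes, nodes[1:]):
        for j in range(n_inter + 1):  # j = 0 is the node that starts the section
            samples.append(lerp(a, b, j / (n_inter + 1)))
    samples.append(list(x_end))
    return samples


def peak_energy(energy: Callable[[Sequence[float]], float],
                samples: Sequence[Sequence[float]]) -> float:
    """Highest energy among the sampled positions."""
    return max(energy(p) for p in samples)


def sampling_is_adequate(energy, x_start, moving, x_end, n_inter: int, tol: float) -> bool:
    """Compare the peak found with n_inter intermediates against the peak found
    when a midpoint is added to every interval (2 * n_inter + 1 intermediates)."""
    coarse = peak_energy(energy, sample_path(x_start, moving, x_end, n_inter))
    fine = peak_energy(energy, sample_path(x_start, moving, x_end, 2 * n_inter + 1))
    return fine - coarse <= tol


def toy_energy(p: Sequence[float]) -> float:
    """Hypothetical energy with a single peak at x = 1.9."""
    return -(p[0] - 1.9) ** 2


start, end = [0.0, 0.0], [4.0, 0.0]
moving = initial_moving_conformations(start, end, n_move=3)
coarse_samples = sample_path(start, moving, end, n_inter=3)
fine_samples = sample_path(start, moving, end, n_inter=7)
print(moving, len(coarse_samples), len(fine_samples))
```

The parameter `t` used here runs from the first node of a section to the second, which is the reverse of $z$ in Eq. (8-1) but covers the same set of points. Because the finer grid contains every point of the coarser one, the refined peak can never be lower than the original, so the difference between the two is a direct measure of how much of the barrier the coarser sampling missed.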


8.4 Applications

8.4.1 A Pucker Angle Change in α-D-Xylulofuranose

The pucker of five-membered rings can be completely described by a pucker angle and a displacement [53]. Using this description it is possible to produce “adiabatic” surfaces for pucker angle changes in sugars with furanose rings [31, 33, 34]. The system provides an opportunity to apply the PEM and SPW procedures to a nontrivial problem with small dimensionality and to test the effect of changing the control parameters on the results of simulation. The pentulofuranose monosaccharide α-D-xylulofuranose was modelled in an arbitrary conformation and energy minimized to yield the conformation shown in Figure 8-2. The Amber united atom energy function [54] was used with a dielectric constant of three and no non-bonded cut-offs. Starting from this conformation, restrained energy minimization [32] was used to produce an adiabatic energy profile. A restraint on the Cremer-Pople pucker angle $\theta$ of the ring, with target pucker angle $\theta_0$ and restraint constant $K_\theta$ (set to 50 kcal/(mol·rad²)), was added to the molecular potential energy function. By adjusting the target pucker angle in 5° steps through 360° and applying energy minimization after each change, an energy profile against pucker angle was obtained (Figure 8-3). This shows a second energy minimum with an E conformation [31] at 33.5°, which has an r.m.s. displacement of 0.98 Å from the lower energy minimum. The second energy minimum conformation was then subjected to further energy minimization and the resulting structure is shown in Figure 8-2. Transition states are also found at 324° (potential energy 4.96 kcal/mol relative to start) and, with a higher energy peak, at 124° (this part of the profile is not shown in the figure).

Figure 8-2. A stereographic picture showing the structures of the two conformations of α-D-xylulofuranose used to test the PEM procedure. Thick lines are used for the lower energy minimum, which has a Cremer-Pople pucker angle of 224° (between the E and T forms [31], labelled A in Figure 8-3), and thin lines show the E minimum with a pucker angle of 33.5° (labelled B in Figure 8-3).

Figure 8-3. The energy profile obtained for the conformational change between two forms of α-D-xylulofuranose, plotted against pucker angle in degrees. The energy minima marked A and B are the conformations shown in Figure 8-2. The dotted line shows the “adiabatic” route. The solid line shows the result of applying the PEM procedure with a power of 100, 3 moving conformations and 3 intermediate sampled points. Moving conformations are marked by circles. The dashed lines are the result of applying the SPW procedure.
Short dashes mark the result of using a weak repulsion parameter ($\rho$ = 2 kcal/mol) and longer dashes that of using stronger repulsion ($\rho$ = 8 kcal/mol). Squares mark the moving conformations.

The two energy minima were then used as fixed end points for the SPW and PEM techniques, which were required to find paths linking the two conformations. Information that the Cremer-Pople pucker angle of the ring provides a suitable reaction coordinate for the change was not used in either of the procedures. All runs used an initial set of moving conformations taken from a linear path through Cartesian space linking the two minima (this path has a peak energy of 34.8 kcal/mol relative to the lower energy minimum). The SPW technique was applied with the following parameter values (referring to Eqs. (8-6) to (8-9)): $\lambda$ = 2, $\gamma$ = 128 kcal/mol [45], and the rigid-body penalty function constant was set to 128 kcal/(mol Å² a.m.u.²). One of two values for the repulsion constant ($\rho$) was used: a weak value of 2 kcal/mol or a stronger one of 8 kcal/mol.

Figure 8-3 shows the results of applying the PEM and SPW techniques with 3 moving conformations. In general the results are encouraging: both the PEM and SPW procedures converge to routes close to the lower energy transition state position found by adiabatic mapping. The SPW method, with a high repulsion parameter, results in a route which lies above the adiabatic surface. In contrast, the lower repulsion parameter produces a route where two moving conformations lie on the adiabatic line but one has “slipped back” toward the lower energy minimum. The PEM procedure locates the saddle point and the conformations immediately adjacent to it (the curvature around the saddle point is the same as the adiabatic result) but results in a path with spiked artefacts in lower energy regions. It is not possible to immediately ascribe a value for the potential energy barrier of a conformational rearrangement based on an SPW result, as no sampled points are taken between moving conformations. The SPW method can, however, produce an immediate approximation to the smooth overall path, in contrast with the PEM approach where further smoothing runs are required.

Table 8-1 shows the effect of changing the power, the number of moving conformations and the number of sampled points taken.
In general, it can be seen that the higher the value of the power used, the more the PEM procedure concentrates on lowering the peak energy of the path in comparison to the average energy. High values for the power result in an increase in computational cost, as it becomes necessary to recalculate some molecular potential energies to avoid floating point overflow in the calculation of the objective function (Eq. (8-4)). A power of 100 provides a reasonable compromise: lower values result in paths which do not locate the transition state conformation and larger values produce paths with greater average energies at a higher computational cost. Three moving conformations and three intermediate sample points are sufficient to sample the change. Unlike the SPW procedure, the PEM procedure is not particularly sensitive to the values taken for the controlling parameters: provided that the power and the numbers of moving/sampled conformations are high enough, reasonable sampling of the saddle point is obtained.

To obtain a smooth overall path the PEM descents procedure (discussed above) was applied from the transition state conformation found by PEM. The results are close to the route found by adiabatic mapping, as shown in Figure 8-4. Interestingly, as discussed elsewhere [36], the steepest descents path from the transition state is quite distinct from the PEM descents or adiabatic routes.
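The overflow problem mentioned above for large powers is easy to reproduce with the expression of Eq. (8-3). One common numerical remedy, shown here only as an illustration and not necessarily the approach taken in the implementation described in this chapter, is to factor the largest value out of the sum before raising to the power $M$. The function names are hypothetical, and the sample values are loosely based on energies quoted in this section.

```python
from typing import Sequence


def power_norm_naive(values: Sequence[float], power: int) -> float:
    """Direct evaluation of (sum_i v_i**M)**(1/M); overflows for large M."""
    return sum(v ** power for v in values) ** (1.0 / power)


def power_norm_scaled(values: Sequence[float], power: int) -> float:
    """Same quantity, evaluated as v_max * (sum_i (v_i / v_max)**M)**(1/M).

    Every scaled term lies between 0 and 1, so raising it to a large power
    cannot overflow. Assumes all values are positive, as in Eq. (8-3)."""
    largest = max(values)
    if largest <= 0.0:
        raise ValueError("values must be positive")
    scaled_sum = sum((v / largest) ** power for v in values)
    return largest * scaled_sum ** (1.0 / power)


# Moderate power: both forms agree and sit just above the largest value.
moderate_naive = power_norm_naive([4.96, 4.0, 3.0], 100)
moderate_scaled = power_norm_scaled([4.96, 4.0, 3.0], 100)

# Large power with energies comparable to the initial linear path:
# the naive form raises OverflowError, the scaled form does not.
try:
    power_norm_naive([34.8, 30.0, 20.0], 1000)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

large_scaled = power_norm_scaled([34.8, 30.0, 20.0], 1000)
print(moderate_naive, moderate_scaled, naive_overflowed, large_scaled)
```

The scaled form also makes the limiting behavior explicit: as $M$ grows, every term except the largest shrinks towards zero and the result approaches the maximum value.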


Table 8-1. Applications of PEM/SPW procedures to a change in pucker angle in α-D-xylulofuranose.

(a) Effect of objective function power and comparison to SPW^a

Power (M)         cpu^c   Objective function^b   Maximum energy^b   Average energy^b
1000              1840    4.96 (4.97)            4.96 (4.96)        (3.92)
300               1798    4.96 (4.97)            4.95 (4.96)        (3.84)
100               1072    4.98 (5.02)            4.96 (4.96)        (3.75)
30                984     5.13 (5.24)            4.97 (4.97)        (3.63)
10                956     5.74 (6.13)            5.05 (5.05)        (3.40)
3                 1246    9.08 (11.4)            5.36 (5.36)        (2.43)
1                 1575    33.9 (68.4)            6.42 (6.42)        (2.17)
1: SPW strong^d   -       86.1                   5.27 (5.70)        (3.89)
1: SPW weak^d     -       19.8                   4.93 (6.56)        (3.59)
adiabatic         -       -                      4.96               -

(b) Number of moving positions/intermediates required^a

cpu^c   No. moving positions   N_inter   Objective function^e   Maximum energy^e
440     1                      3         6.31 (6.56)            6.25 (6.56)
679     2                      3         5.00 (5.03)            4.97 (4.97)
1072    3                      3         4.98 (5.02)            4.96 (4.96)
784     4                      3         4.99 (5.02)            4.95 (4.97)
876     5                      3         4.96 (5.03)            4.97 (4.97)
770     3                      1         4.94 (5.03)            4.91 (5.01)
285     3                      2         4.96 (5.01)            4.95 (5.01)
1072    3                      3         4.98 (5.02)            4.96 (4.96)
1258    3                      4         5.00 (5.03)            4.96 (4.96)
1554    3                      5         5.00 (5.04)            4.96 (4.96)

a Conditions used: (a) 3 moving molecular configurations were considered; except for the SPW runs, 3 intermediate sampled points were taken in each interval (N_inter = 3). (b) Power of 100. Minimization was terminated when the root mean square gradient of the objective function fell below 0.01 kcal/(mol Å).
b All energies and function values are given in kcal/mol relative to the lower energy minimum (1 cal = 4.184 J). The figures in brackets give energy values for paths in which 7 intermediates are taken for each interval.
c Reported seconds cpu (or equivalent) on a Convex C220 (code is unvectorized).
d SPW restraints applied (as given by Eq. (8-6)). The first result is for a route with a strong repulsion term (parameter ρ set to 8 kcal/mol), the second for a weaker one (ρ = 2 kcal/mol).
e The figures in brackets give values for energies (in kcal/mol) with an additional sampled molecular configuration taken at the midpoint of each interval used in optimization.


Figure 8-4. Obtaining an overall smooth route for a pucker angle change in α-D-xylulofuranose, plotted against pucker angle in degrees. The PEM descents procedure (solid lines) is applied from the transition state position (marked with a circle) found in the run shown in Figure 8-3. For comparison the adiabatic route is marked with a dotted line.

8.4.2 Conformation Change of the Substrate in the Active Site of D-Xylose Isomerase

The enzyme D-xylose isomerase catalyzes the isomerization of the sugar D-xylose to its ketose form D-xylulose. The enzyme has an absolute requirement for one of the small divalent metal ions Mg²⁺, Co²⁺ or Mn²⁺ [55]. A reaction mechanism has been proposed based on X-ray crystal structures of the Arthrobacter enzyme with various substrates, inhibitors and metal ions bound [56], and a similar mechanism has been detailed based on high resolution crystal structures of the enzyme from Streptomyces rubiginosus [57]. The initial stages of the mechanism involve the substrate binding in an α-D-xylulofuranose form (whose binding conformation is assumed to resemble that determined for the inhibitor 5-thio-α-D-glucose). The substrate is then ring-opened by a base-catalyzed mechanism involving an active site histidine residue.
The substrate subsequently undergoes a conformational rearrangement from a ring-opened pseudo-cyclic (pC) conformation to the experimentally observed extended open chain (eoc) form.

This conformational rearrangement was modelled in a previous study by a reaction coordinate method [32, 38]. This exploited the alteration during the rearrangement in the coordination of the substrate to a divalent metal ion found in the active site. This metal ion is coordinated by atoms O2 and O4 of the substrate in the eoc form. In contrast, in the cyclic and pseudo-cyclic forms of the substrate, atoms O3 and O4 coordinate the ion (Figure 8-5). Starting from the eoc model, semi-harmonic restraint terms on the distances between the metal ion and the substrate’s ligating oxygen atoms were added to the molecular potential energy function. By adjusting the restraint distances in small steps and applying energy minimization after each change, the eoc model was gradually forced to a form in which atoms O3 and O4 coordinate the metal ion.

Problems were found with this reaction coordinate approach. The end point of the distance restraints procedure was a pseudo-cyclic form (designated pC2) distinct from the lower energy pseudo-cyclic form pC1. To link the two positions it was necessary to apply additional restraints and minimization. Attempts to apply the procedure in reverse (starting from a pseudo-cyclic form and forcing it to adopt O2/O4 coordination to the metal ion) did not produce reasonable results [38]. The reaction coordinate method also requires considerable manual input in identifying suitable coordinates, plotting data and checking results.

The PEM procedure was applied using positions for the eoc, pC1 and pC2 forms determined previously [32, 38].
All residues with an atom with<strong>in</strong> 15 A of the cyclicconformation of the substrate where <strong>in</strong>cluded <strong>in</strong> the models. Most of the enzyme waskept fixed with only am<strong>in</strong>o acid side-cha<strong>in</strong> atoms with any atom with<strong>in</strong> 5 A of thesubstrate allowed flexibility (Figure 8-5). In addition to the substrate and prote<strong>in</strong>atoms, the model <strong>in</strong>cluded the two divalent metal ions <strong>in</strong> the active site and a watermolecule found coord<strong>in</strong>ated to the metal ion <strong>in</strong> site [2]. The models had a total of105 mov<strong>in</strong>g atoms with 1481 kept fixed. The Amber united atom energy function [54]was used with the exception of the metal ion/ligand <strong>in</strong>teractions which wererepresented by the potential of Dietz, Reide and He<strong>in</strong>z<strong>in</strong>ger [58]. As no explicitrepresentation of solvent was made, an implicit treatment by a dielectric constant ofthree was used [32, 381.The procedure was <strong>in</strong>itially applied to generate a reaction path l<strong>in</strong>k<strong>in</strong>g the eocform and the higher energy pseudocyclic form (pC2). An <strong>in</strong>itial set of three mov<strong>in</strong>gconformations was taken from the l<strong>in</strong>e through Cartesian space jo<strong>in</strong><strong>in</strong>g the endpo<strong>in</strong>ts. The power M <strong>in</strong> the objective function (Eq. (8-4)) was set to 100, a valuefound to be reasonable <strong>in</strong> the previous section. At the end of the m<strong>in</strong>imization a newrun was started from the optimized result with an additional mov<strong>in</strong>g conformationat the peak energy position found. 
This process was repeated until no further drop<strong>in</strong> the peak energy of the path was obta<strong>in</strong>ed (Table 8-2).The f<strong>in</strong>al path had a peak energy of 9.90 kcal/mol (relative to the eoc form)compar<strong>in</strong>g favorably with the value of 11.0 kcal/mol obta<strong>in</strong>ed by the previous reactioncoord<strong>in</strong>ate study. The improvement is due to better optimization of the peakenergy rather than the location of a dist<strong>in</strong>ct path. The closest approach of the newpath to the peak energy position obta<strong>in</strong>ed <strong>in</strong> the reaction coord<strong>in</strong>ate study is0.17 A r. m. s.
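The path set-up and refinement cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TIC implementation: conformations are taken to be flat lists of Cartesian coordinates, `energy` stands for any user-supplied potential energy function, the helper names are invented here, and the minimization of the objective function between refinement steps is left out.

```python
from typing import Callable, List, Sequence

Conformation = List[float]  # flattened Cartesian coordinates: x1, y1, z1, x2, ...


def interpolate(a: Sequence[float], b: Sequence[float], t: float) -> Conformation:
    """Conformation a fraction t of the way along the straight line from a to b."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def linear_initial_path(start: Sequence[float], end: Sequence[float],
                        n_moving: int) -> List[Conformation]:
    """Fixed end points plus n_moving equally spaced conformations on the line."""
    return [interpolate(start, end, k / (n_moving + 1)) for k in range(n_moving + 2)]


def insert_at_peak(path: List[Conformation],
                   energy: Callable[[Conformation], float],
                   n_inter: int = 3) -> List[Conformation]:
    """Copy of path with one extra conformation at the highest-energy sample.

    Each interval between neighbouring conformations is sampled at n_inter
    equally spaced interior points (an assumed reading of N_inter in Table 8-2).
    """
    best_energy = float("-inf")
    best_point: Conformation = []
    best_index = 0
    for i in range(len(path) - 1):
        for k in range(1, n_inter + 1):
            point = interpolate(path[i], path[i + 1], k / (n_inter + 1))
            e = energy(point)
            if e > best_energy:
                best_energy, best_point, best_index = e, point, i + 1
    return path[:best_index] + [best_point] + path[best_index:]
```

In a full cycle the objective function would be minimized with respect to the moving conformations before each call to `insert_at_peak`, and the loop would stop once the peak energy of the path no longer drops.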


Figure 8-5. A stereographic picture comparing different energy minima for the substrate and active-site protein atoms in D-xylose isomerase. The substrate is marked by thick lines. Only protein atoms allowed to move during the simulation are shown; each Cβ atom is labelled with the one-letter code for the residue. The two divalent metal ions found in the active site are labelled "M1" and "M2". The substrate adopts an eoc conformation in (a); (b) shows the pC1 form and (c) the new low energy intermediate form.
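The division of the model into moving and fixed atoms that underlies Figure 8-5 follows a simple distance criterion, which might be sketched as below. The `Atom` record and its field names are hypothetical, and the criterion is read here at the residue level (a residue is included if any of its atoms lies within 15 Å of the substrate, and its side-chain atoms move if any of its atoms lies within 5 Å); the original set-up may have differed in detail.

```python
import math
from typing import List, NamedTuple, Sequence, Set, Tuple

Point = Tuple[float, float, float]


class Atom(NamedTuple):
    residue: int       # residue number (hypothetical field)
    name: str          # atom name, e.g. "CA" or "CB"
    side_chain: bool   # True for side-chain atoms
    xyz: Point         # Cartesian coordinates in angstroms


def residues_near(atoms: Sequence[Atom], substrate: Sequence[Point],
                  cutoff: float) -> Set[int]:
    """Residues having at least one atom within cutoff of any substrate atom."""
    return {a.residue for a in atoms
            if any(math.dist(a.xyz, s) <= cutoff for s in substrate)}


def partition_atoms(atoms: Sequence[Atom], substrate: Sequence[Point],
                    include_cutoff: float = 15.0,
                    flexible_cutoff: float = 5.0) -> Tuple[List[Atom], List[Atom]]:
    """Split protein atoms into (moving, fixed); distant residues are dropped."""
    included = residues_near(atoms, substrate, include_cutoff)
    flexible = residues_near(atoms, substrate, flexible_cutoff)
    moving = [a for a in atoms if a.residue in flexible and a.side_chain]
    fixed = [a for a in atoms
             if a.residue in included and not (a.residue in flexible and a.side_chain)]
    return moving, fixed
```

Backbone atoms of nearby residues stay fixed in this sketch, consistent with the statement that only side-chain atoms were allowed flexibility.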


Table 8-2. Progress of the PEM procedure for the simulation of a conformational rearrangement of D-xylose in D-xylose isomerase.

No. moving    Steps c.g.          Objective    Maximum
positions     minimization (a)    function     energy (b)

eoc → pC2
 3            500 s.d. + 345      11.36        11.21 (11.21)
 4            215                 10.32        10.15 (10.21)
 5            100                 10.12         9.97 (9.97)
 6            4                   10.16         9.95 (9.95)
 7*           131                 10.07         9.91 (9.91)
 8*           40                  10.10         9.90 (9.90)
12*           6                   10.15         9.90 (9.90)

eoc → pC1 (1)
 3            500 s.d. + 433      16.48        16.30 (16.48)
 4            346                 15.17        14.94 (14.99)
 5            106                 14.91        14.73 (14.73)
 6            41                  14.88        14.70 (14.70)
 7            40                  14.89        14.68 (14.69)
 8*           71                  14.88        14.69 (14.69)
12*           70                  14.89        14.69 (14.69)

eoc → pC1 (2)
 3            600 s.d. + 563      14.83        14.66 (15.14)
 4            376                 11.68        11.54 (11.90)
 5            348                 10.45        10.33 (10.38)
 6            164                 10.18        10.04 (10.06)
 7            51                  10.08         9.91 (9.91)
 8*           89                  10.08         9.90 (9.90)
 9*           1                   10.13         9.90 (9.90)
13*           38                  10.18         9.89 (9.89)

Ref. [32]                                      11.01

(a) c.g. conjugate gradients, s.d. steepest descents.
(b) Energies in kcal/mol relative to the lower energy minimum (eoc). Three intermediate sample points were taken in each interval (N_inter = 3). The figures given in brackets are a check on the result with N_inter set to 7. Minimization was continued until the r.m.s. objective derivative fell below 0.03 kcal/(mol Å) except for the runs marked * where a 0.01 kcal/(mol Å) limit was used.
(1) Path obtained starting from linearly interpolated route in Cartesian space.
(2) Path obtained starting from 3*pC2 intermediate (see text for details).

The procedure was then applied to find a path linking the eoc and pC1 conformations. Once again the initial path was set to a series of conformations taken from the line through Cartesian space. The final result of the procedure was a path with a peak energy of 14.69 kcal/mol relative to the eoc form. The result shows that the PEM method, in common with all optimization procedures, will find the optimum route local to the starting position. To show this a further trial was undertaken to link the eoc and pC1 conformations. The starting path was set to three copies of the pC2 conformation. Table 8-2 shows that this run converged to the same result as the eoc to pC2 trial, clearly demonstrating the dependence of the result on the initial set of moving positions taken. This dependence could be reduced by the use of a more powerful optimization technique such as simulated annealing [49]: Table 8-2 shows that the path with the higher peak energy has a higher objective function. It is interesting to note that the final path does not approach the pC2 conformation: the closest position is 0.156 Å r.m.s., with atom H2 being over 1 Å from its position in pC2.

Figure 8-6 shows the distinction between the two routes obtained for the transition between the eoc and pC1 forms. The principal difference between the paths is the behavior of the hydroxyl moiety H3-O3. On the higher energy path this group turns and points toward the divalent metal ion, creating an unfavorable interaction. In contrast, the lower energy route has the dihedral angle O3-C3 changing in the opposite sense and the interaction between the hydroxyl group and the metal ion never becomes unfavorable.

Figure 8-6. A comparison of the routes taken by substrate atoms in the different paths obtained for the conformational change between the eoc (dark lines) and pC1 (open lines) forms in D-xylose isomerase. Part (a) shows the routes for the higher energy path obtained from linearly interpolated starting conformations and (b) shows the lower energy path obtained by using the pC2 energy minimum for the starting conformations. In each case the transition state conformation is marked by grey lines and the routes through space taken by hydrogen and oxygen atoms are shown by thin and dashed lines respectively.
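The closest-approach distances quoted in the text (the smallest r.m.s. deviation between any conformation on a path and a reference conformation) can be computed as in the following sketch. It assumes that all conformations list the same atoms in the same order and share one coordinate frame, so that no superposition is needed; that seems reasonable here because most of the enzyme is held fixed, but it is an assumption, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Conformation = Sequence[Point]


def rms_deviation(a: Conformation, b: Conformation) -> float:
    """Root mean square atomic displacement between two conformations."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("conformations must be non-empty and of equal size")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def closest_approach(path: Sequence[Conformation],
                     reference: Conformation) -> Tuple[int, float]:
    """Index and r.m.s. deviation of the path conformation nearest to reference."""
    distances = [rms_deviation(c, reference) for c in path]
    index = min(range(len(path)), key=distances.__getitem__)
    return index, distances[index]
```

Applied to the sample conformations of a converged path with pC2 as the reference, a result well above zero would indicate, as in the text, that the path bypasses that minimum.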


A smooth overall route was obtained based on the low energy eoc to pC1 path using a predecessor of the PEM descents method described above (Figure 8-7c). The procedure differed from the PEM descents method in that runs were started manually and more than one intermediate position was accepted. Once the intermediates marked "A" and "B" were identified, further PEM runs were started to connect them. Small artefacts in the low energy regions of the path were accepted. A comparison between the overall route (Figure 8-7c) and the initial unsmoothed PEM path (Figure 8-7b) shows an important feature of the PEM technique. The initial path is very much shorter than the final result. By initially concentrating on finding a path through the transition state rather than the overall intrinsic reaction coordinate, the method reduces the size of any problem enormously.

The overall optimal route linking the eoc and pC1 forms found here (Figure 8-7c) differs markedly from the results previously found. As well as having a markedly smaller peak energy, the route goes through a new low energy intermediate (marked "B"). This form has a slightly lower energy than pC1 and differs in the position of atoms O1 and H2, as shown in Figure 8-5.

Table 8-3 compares the relative potential energies of important positions found in this study and previously. The estimate of the energy barrier to the rearrangement of the substrate's coordination to the metal ion has been reduced to 7.2 kcal/mol. Comparison with the Arrhenius activation energy for the enzyme reaction of 14.6 kcal/mol [59] adds weight to the conclusion that the transition is unlikely to be rate-determining [32].

The potential energy function used to date to simulate the rearrangement takes a rather inconsistent approach, mixing the empirically derived Amber potential energy function with the Mg2+ representation of Dietz et al. [58] derived by quantum mechanical calculation. Åqvist [60] has derived a consistent set of van der Waals radii and well depths for the alkali and alkaline earth ions using a fit to the free energy of solvation for the ion. Energy minimization and PEM were used to repeat the simulation of the conformational transition between the eoc and pC1 forms using the Åqvist parameters for Mg2+ rather than the representation of Dietz et al. [58].

Table 8-3. Conformational rearrangement of substrate in D-xylose isomerase: potential energies (a) of important forms.

                        Old results (b,d)    New results (d)    New parameters
pC1                     3.1                  (3.1)              1.3
pC2                     4.8                  (4.8)              5.0
New intermediate        -                    2.7                3.2
Lowest peak             11.0                 9.9                8.3
Barrier O3 → O2 (c)     7.9                  7.2                7.0

(a) Potential energy in kcal/mol (1 cal = 4.184 J), relative to the eoc model.
(b) Calculated by the reaction coordinate methods presented by Smart, Akins & Blow [32].
(c) The potential energy barrier for a conformational rearrangement between substrate with atoms O3 and O4 coordinated to metal ion [1] and an O2/O4 coordinated form.
(d) Values calculated in the first two columns use the parametrization of the problem as presented by Smart, Akins & Blow [32], whereas new parameters use the Mg2+ representation due to Åqvist [60].

Figure 8-7. Graphs showing the final energy profiles for the simulation of a conformational rearrangement between the eoc and pC1 forms of D-xylose in D-xylose isomerase. In each case the abscissa is the running r.m.s. displacement of the substrate along the path (in Å) and the ordinate is the potential energy of the system relative to the eoc form. The running r.m.s. displacement is an indicator of the distance moved by the substrate and is defined for position number n by:

$$\sum_{i=2}^{n} \mathrm{rms}_{i/i-1},$$

where $\mathrm{rms}_{i/i-1}$ is the r.m.s. atomic displacement of substrate atoms between positions i-1 and i. Graph (a) shows the results of the reaction coordinate methods presented by Smart et al. 1992 [32]. The final result of the application of the PEM method with eight moving conformations is shown as (b). This route was smoothed by the PEM descents method to yield the result shown in (c).
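The running r.m.s. displacement used as the abscissa in Figure 8-7 is a cumulative sum of step lengths along the path, and a minimal Python sketch of it follows. It takes the subscripted term to mean the displacement between consecutive positions i-1 and i, assigns the value zero to the first position, and uses illustrative function names; only substrate atoms should be passed in, as in the figure.

```python
import math
from itertools import accumulate
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Conformation = Sequence[Point]


def rms_step(a: Conformation, b: Conformation) -> float:
    """R.m.s. atomic displacement between two consecutive path positions."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("conformations must be non-empty and of equal size")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def running_rms(path: Sequence[Conformation]) -> List[float]:
    """Running r.m.s. displacement for every position on the path.

    Position 1 gets 0; position n gets the sum over i = 2..n of the
    r.m.s. displacement between positions i-1 and i.
    """
    steps = [rms_step(path[i - 1], path[i]) for i in range(1, len(path))]
    return [0.0] + list(accumulate(steps))
```

Plotting the potential energy of each position against `running_rms(path)` gives an energy profile of the kind shown in the figure; because the measure accumulates step lengths, a longer, smoothed route extends further along the abscissa than the initial PEM path.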


The results, shown in Table 8-3, show that the energy of individual forms changes by up to 2 kcal/mol but the energy barrier remains around 7 kcal/mol. The consistency between the results using a different representation for the metal ion is encouraging.

8.5 Conclusions: Potential Developments

The PEM method has been shown to be a robust and powerful technique for finding reaction paths for conformational changes in macromolecules. It has advantages over the SPW method (the only comparable technique), in that sampling is biased towards high energy regions of the path (thus ensuring that the transition state is located), the continuity of the final result is assured and the result is not dependent on the values taken for restraint parameters.

The implementation of the PEM method presented here suffers from poor computational efficiency. This is in part due to the fact that the energy and force calculation routines in TIC have not been adjusted for maximum speed; for example, an atom based non-bonded cut-off procedure is used instead of the more efficient group based formalism and the code has not been vectorized. The PEM method was also designed with robustness rather than efficiency in mind. The present procedure takes a specified number of equally spaced sample conformations between each pair of moving points but this could be changed so that the routine occasionally adjusts the spacing to ensure the peak energy position of each interval is sampled.

As shown in the D-xylose isomerase example, the PEM procedure can converge to different paths depending on the starting set of conformations. It would be very desirable for the method to always locate the path with the lowest peak energy. This would require that the procedure should find the global minimum of the objective function: a ubiquitous problem [48]. There are two partial solutions: to start minimization from different starting conformations (see below) and simulated annealing [49]. The molecular dynamics simulated annealing procedure has been applied in a variant of the SPW method [42] and could be expected to be useful in minimizing the PEM objective function, allowing the method to avoid being trapped in local minima.

An important potential improvement to the method would be to apply the procedure in dihedral angle coordinate space. This would require the molecular potential energy and force functions to be calculable in terms of dihedral angle variables [61]. This approach would result in several advantages compared to the present application in Cartesian space. Firstly, conformational change is clearly more naturally represented in internal coordinate space because of the division between "soft" (dihedral angles) and "hard" (bond lengths and angles) variables. This would allow the use of fewer moving conformations and intermediates to represent a conformational change, in comparison to the Cartesian application where small atomic deviations produce large increases in bond energy. Secondly, the use of dihedral angle space allows the description of the conformation of a molecule with approximately one-eighth the number of independent variables. Finally, the use of dihedral space would enable more reasonable starting conformations to be generated. Different starting conformations could be modelled by trying alternative possibilities for dihedral angles which change markedly between the fixed end points. Such a method has been used by Ech-Cherif El-Kettani and Durup [42] to generate initial routes, before applying a variant of the SPW procedure in Cartesian space. For the reasons set out, an application of the PEM method in dihedral angle space can be expected to be very much more efficient than the present application and should allow simulation of interesting conformational transitions of macromolecules.

8.6 Summary

A new method for the generation of reaction coordinates for conformational transitions in large molecular systems is presented. The path energy minimization (PEM) technique optimizes the peak energy of a quasi-continuous route through conformational space between two given minima, locating the transition state and the optimal vector through this conformation. The method produces a series of conformations which effectively define a reaction coordinate for the change. A transition involving a pucker angle change for the sugar α-D-xylulofuranose is used to test the procedure. The results are compared with those obtained by both adiabatic mapping and the Self Penalty Walk procedure developed by Elber and co-workers. The method is applied to recalculate the energy barrier for a conformational rearrangement of the substrate in the active site of D-xylose isomerase, where it is shown to outperform an earlier adiabatic mapping study. Potential improvements to the method are discussed.

Acknowledgements

This work was supported by the UK Science and Engineering Research Council under project grant GR/G49494 and the Molecular Recognition and Computational Science Initiatives. I thank Julia Goodfellow, Bonnie Wallace and David Blow for encouragement and many discussions. The D-xylose isomerase coordinates and reaction mechanism are the result of years of hard work by Charles Collyer, Kim Henrick and Jonathon Goldberg.


References

[1] Monod, J., Wyman, J., Changeux, J. P., J. Mol. Biol. 1965, 12, 88-118.
[2] Perutz, M., Q. Rev. Biophys. 1989, 22, 138-236.
[3] Baldwin, J., Chothia, C., J. Mol. Biol. 1979, 129, 175-220.
[4] Stevens, R. C., Lipscomb, W. N., in: Molecular Structures in Biology, Diamond, R., Koetzle, T. F., Prout, K., Richardson, J. A. (eds.), Oxford University Press, Oxford, 1993, pp. 223-259.
[5] Harvey, S. C., Nucl. Acids Res. 1983, 11, 4867-4878.
[6] Leroy, J. L., Kochoyan, M., Huynh-Dinh, T., Guéron, M., J. Mol. Biol. 1988, 200, 223-238.
[7] Moe, J. G., Russu, I. M., Biochemistry 1992, 31, 8421-8428.
[8] Urry, D. W., Long, M. M., Jacobs, M., Harris, R. D., Ann. N. Y. Acad. Sci. 1975, 264, 203-220.
[9] Wallace, B. A., Biophys. J. 1984, 45, 114-116.
[10] Killian, J. A., de Kruijff, B., Biophys. J. 1988, 53, 111-117.
[11] Karplus, M., Evanseck, J. D., Joseph, D., Bash, P. A., Field, M. J., Faraday Discuss. R. Soc. Chem. 1992, 93, 239-248.
[12] Brooks, C. L. III, Curr. Opin. Struct. Biol. 1993, 3, 92-98.
[13] Wade, R. C., Davis, M. E., Luty, B. A., Madura, J. D., McCammon, J. A., Biophys. J. 1993, 64, 9-15.
[14] Eyring, H., J. Chem. Phys. 1935, 3, 107-115.
[15] Evans, M. G., Polanyi, M., Trans. Faraday Soc. 1935, 31, 875-894.
[16] Laidler, K. J., Chemical Kinetics, Third edition, Harper & Row, New York, 1987.
[17] Northrup, S. H., Pear, M. R., Lee, C.-Y., McCammon, J. A., Karplus, M., Proc. Natl. Acad. Sci. USA 1982, 79, 4035-4039.
[18] McCammon, J. A., Harvey, S. C., Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1987.
[19] Elber, R., J. Chem. Phys. 1990, 93, 4312-4321.
[20] Verkhivker, G., Elber, R., Nowak, W., J. Chem. Phys. 1992, 97, 7838-7841.
[21] Elber, R., Karplus, M., Science 1987, 235, 318-321.
[22] McIver, J. W. Jr., Komornicki, A., J. Am. Chem. Soc. 1972, 94, 2625-2633.
[23] Poppinger, D., Chem. Phys. Lett. 1975, 35, 550-554.
[24] Müller, K., Brown, L. D., Theor. Chim. Acta 1979, 53, 75-93.
[25] Cerjan, C. J., Miller, W. H., J. Chem. Phys. 1981, 75, 2800-2806.
[26] Simons, J., Jorgensen, P., Taylor, H., Ozment, J., J. Phys. Chem. 1983, 87, 2745-2753.
[27] Nguyen, D. T., Case, D. A., J. Phys. Chem. 1985, 89, 4020-4026.
[28] Bell, S., Crighton, J. S., J. Chem. Phys. 1984, 80, 2464-2475.
[29] Gelin, B. R., Karplus, M., Proc. Natl. Acad. Sci. USA 1975, 72, 2002-2006.
[30] Ha, S. N., Madsen, L. J., Brady, J. W., Biopolymers 1988, 27, 1927-1952.
[31] French, A. D., Tran, V., Biopolymers 1990, 29, 1599-1611.
[32] Smart, O. S., Akins, J., Blow, D. M., Proteins 1992, 13, 100-111.
[33] Gabb, H. A., Harvey, S. C., J. Am. Chem. Soc. 1993, 115, 4218-4227.
[34] Harvey, S. C., Gabb, H. A., Biopolymers 1993, 33, 1167-1172.
[35] Müller, K., Angew. Chem. Int. Ed. Engl. 1980, 19, 1-78.
[36] Smart, O. S., Goodfellow, J. M., Mol. Simul. 1995, in press.
[37] Sinclair, J. E., Fletcher, R., J. Phys. C 1974, 7, 864-870.
[38] Smart, O. S., Ph. D. Thesis, University of London, 1991.
[39] Halgren, T. A., Lipscomb, W. N., Chem. Phys. Lett. 1977, 49, 225-232.
[40] Fischer, S., Karplus, M., Chem. Phys. Lett. 1992, 194, 252-261.
[41] Liotard, D. A., Int. J. Quantum Chem. 1992, 44, 723-741.
[42] Ech-Cherif El-Kettani, M. A., Durup, J., Biopolymers 1992, 32, 561-574.
[43] Schlitter, J., Engels, M., Kruger, P., Jacoby, E., Wollmer, A., Mol. Simul. 1993, 10, 291-308.
[44] Czerminski, R., Elber, R., Int. J. Quantum Chem. 1990, S24, 167-186.
[45] Nowak, W., Czerminski, R., Elber, R., J. Am. Chem. Soc. 1991, 113, 5627-5637.
[46] Fletcher, R., Reeves, C. M., Comput. J. 1964, 7, 149-154.
[47] Polak, E., Computational Methods in Optimization, Academic Press, New York, 1971.
[48] Press, W. H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T., Numerical Recipes, The Art of Scientific Computing, Cambridge University Press, Cambridge, 1986.
[49] Kirkpatrick, S., Gelatt, C. D. Jr., Vecchi, M. P., Science 1983, 220, 671-680.
[50] Elber, R., Karplus, M., Chem. Phys. Lett. 1987, 139, 375-380.
[51] Czerminski, R., Roitberg, A., Choi, C., Ulitsky, A., Elber, R., in: AIP Conference Proceedings 239 - Advances in Biomolecular Simulations, Lavery, R., Rivail, J.-L., Smith, J. (eds.), American Institute of Physics, New York, 1991, pp. 153-173.
[52] Smart, O. S., Chem. Phys. Lett. 1994, 222, 503-512.
[53] Cremer, D., Pople, J. A., J. Am. Chem. Soc. 1975, 97, 1354-1358.
[54] Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profeta, S. Jr., Weiner, P., J. Am. Chem. Soc. 1984, 106, 765-784.
[55] Chen, W.-P., Proc. Biochem. 1980, 15, 36-41.
[56] Collyer, C. A., Henrick, K., Blow, D. M., J. Mol. Biol. 1990, 212, 211-235.
[57] Whitlow, M., Howard, A. J., Finzel, B. C., Poulos, T. L., Winbourne, E., Gilliland, G. L., Proteins 1991, 9, 153-173.
[58] Dietz, W., Riede, W. O., Heinzinger, K., Z. Naturforsch. 1982, 37a, 1038-1048.
[59] Danno, G., Agr. Biol. Chem. 1970, 34, 1805-1814.
[60] Åqvist, J., J. Phys. Chem. 1990, 94, 8021-8024.
[61] Wako, H., Gō, N., J. Comp. Chem. 1987, 8, 625-635.


