
Development and Application of Novel Bioinformatics and 

Computational Modeling Tools for Protein Engineering 

by 

Rajni Verma 

A thesis submitted for the degree of 

Doctor of Philosophy 

in 

Computational Chemistry & Bioinformatics 

Date of Defense: December 14th, 2012 

Supervisor 

Prof. Dr. Danilo Roccatano 

Jacobs University Bremen, Germany 

Co-supervisor 

Prof. Dr. Ulrich Schwaneberg 

RWTH Aachen University, Germany 

External committee member 

Dr. Steven Hayward 

University of East Anglia, UK 

School of Engineering and Science

Acknowledgement 

I express my sincere gratitude to my PhD supervisor, Prof. Dr. Danilo Roccatano, for his expert and continuous guidance. In particular, I thank him for his patience and for the time he spent explaining the concepts and ideas that helped me to accomplish my work. His constant support, understanding, motivation and valuable discussions gave me a wonderful learning experience during my PhD. 

I am thankful to Prof. Dr. Ulrich Schwaneberg for his constructive comments, fruitful discussions and support during this endeavor. It has been a great honor and pleasure to work with him, and I am deeply grateful for his trust in me. I express my respectful gratitude to Dr. Steven Hayward for being a member of my PhD committee. I am thankful to Dr. Achim Gelessus for his technical support, which allowed me to use the CLAMV facility at Jacobs University Bremen efficiently for scientific computation throughout this work. 

I express my whole-hearted thanks to the members of Prof. Roccatano's group, Samira Hezaveh, Khadga Karki, Susruta Samanta and Edita Sarukhanyan, for their wonderful company and support. I also thank the members of Prof. Schwaneberg's group at RWTH Aachen for their cooperation. I express my special thanks to my friends Kavita, Amit, Amol, Sagar, Hemanshu, Susruta, Usha and Steffi for providing such a friendly environment. Above all, I express my heartfelt gratitude to Amol for his continuous support, encouragement, motivation, patience and care throughout my PhD. 


Funding 

The work described in this PhD thesis was financially supported by the European Union 7th Framework Programme through the project entitled “Effective redesign of oxidative enzymes for green chemistry” (Project reference: 212281), in collaboration with Prof. Dr. Ulrich Schwaneberg from RWTH Aachen University. 


List of Publications 

1. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics of the FMN-binding reductase domain of monooxygenase P450BM-3. J Chem Theory Comput 2012, DOI: 10.1021/ct300723x. 

2. Verma R, Schwaneberg U, Roccatano D. Computer-aided protein directed evolution: a review of web servers, databases and other computational tools for protein engineering. Computational and Structural Biotechnology Journal 2012, 2 (3), e201209008. 

3. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser HG, Niemann P, Shivange AV, Schwaneberg U. dRTP and dPTP a complementary nucleotide couple for the Sequence Saturation Mutagenesis (SeSaM) method. J Mol Catal B-Enzym 2012, 84, 40-47. 

4. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a sequence/structure based server for protein engineering. ACS Synth Biol 2012, 1 (4), 139-150. 

5. Zhu L, Verma R, Roccatano D, Ni Y, Sun Z, Schwaneberg U. A potential antitumor drug (arginine deiminase) reengineered for efficient operation under physiological conditions. ChemBioChem 2010, 11, 2294-2301. [inside cover page] 

6. Verma R, Schwaneberg U, Roccatano D. Insight into the redox partner interaction mechanism in cytochrome P450BM-3 using molecular dynamics simulations. (manuscript in preparation) 

7. Verma R, Schwaneberg U, Roccatano D. A molecular dynamics study of the interactions between P450BM-3 domains and Cobalt(II)Sepulchrate as an electron transfer mediator. (manuscript in preparation) 


Abstract 

In recent decades, enzymatic catalysis has emerged as a convenient and environmentally friendly substitute for traditional chemical processes, ranging from the synthesis of pharmaceutical and agrochemical building blocks to fine and bulk chemicals and, more recently, the components of biofuels. The combination of experimental and computational methods holds particular promise in the field of enzymatic catalysis for tailoring enzymes to tasks not yet exploited by natural selection. It is therefore important to develop computational tools that help to achieve this goal. The scope of this thesis is to propose novel bioinformatics tools and to explore computational methods that support and guide protein evolution experiments. The thesis is divided into two parts. The first part (Part I, Chapters 1 and 2) focuses on extending the benchmarking system for random mutagenesis methods (MAP: Mutagenesis Assistant Program) towards sequence/structure and structure/function analysis, and on evaluating this approach on enzymes commonly used as biocatalysts. Chapter 1 gives a comprehensive overview of the computational methods used to assist protein engineering experiments. Chapter 2 describes a completely renewed and improved version of the MAP server, named MAP 2.0 3D, which correlates the generated amino acid substitution patterns with the structural information of the target protein. The server thereby helps to identify in advance the random mutagenesis method that introduces fewer deleterious mutations and that can improve protein fitness towards a desired property, e.g. charged amino acid substitutions to increase the solubility of a protein in water. The capability of the server was illustrated by in-silico screening of different enzymes, and the predicted results were in agreement with the experimental findings. 


An atomic-level understanding of the subtle interplay among structure, dynamics and function of enzymes plays an important role in the rational design of new or improved functions. The second part of the thesis (Part II, Chapters 3 to 6) uses a molecular modeling approach to gain insight into the structural and dynamic properties of the P450BM-3 (CYP102) complex in water and in the presence of cobalt(II)sepulchrate (CoSep) as an electron transfer (ET) mediator. P450BM-3, isolated from Bacillus megaterium, is an attractive target and model system for biochemical applications (it catalyzes the conversion of a wide variety of industrially attractive substrates) and biomedical applications (it is a bacterial model for microsomal P450 systems). Chapter 3 presents the theoretical aspects of MD simulation, together with an overview of the system preparation and of the analysis of protein conformation and dynamics in the generated trajectories. Chapter 4 investigates the structural and dynamic properties of the P450BM-3 FMN (flavin mononucleotide) domain as holo-protein, with the cofactor in the oxidized and reduced states, and as apo-protein. The results illustrate the effect of the FMN cofactor and its protonation state on the conformation and dynamics of the FMN domain, which can be related to the ET pathway from the FMN to the HEME cofactor. The study is then extended to gain insight into the binding modes and the structural determinants of inter-domain ET in the HEME/FMN complex of P450BM-3. MD simulations were performed on the FMN and HEME domains, both isolated and in their crystallographic complex, and the results are reported in Chapter 5. The HEME/FMN complex undergoes a rearrangement that decreases the distance between the redox centers and thereby favors ET under physiological conditions. In Chapter 6, MD simulations of the P450BM-3 domains (isolated HEME domain and HEME/FMN complex) were performed in the presence of CoSep as ET mediator. The results illustrate the preferential binding modes of CoSep on the P450BM-3 domains and the putative ET pathways from CoSep to the iron center of the HEME cofactor, and they are in agreement with the experimental findings. 


Table of Contents 

Acknowledgement
Funding
List of Publications
Abstract

Chapter 1
    1.1. Abstract
    1.2. Background
    1.3. Generated diversity and library size
    1.4. Evolutionary conservation based focused library
    1.5. Structure-based focused library
    1.6. Mutational effects in protein
    1.7. Summary and outlook
    1.8. References

Chapter 2
    2.1. Abstract
    2.2. Introduction
    2.3. Methods
        2.3.1. Mutational probability and statistics
        2.3.2. MAP indicators
        2.3.3. Local chemical diversity and protein structure components
        2.3.4. MAP 2.0 3D server description
        2.3.5. MAP 2.0 3D output
        2.3.6. Model proteins
    2.4. Results and discussions
        2.4.1. D-amino acid oxidase
        2.4.2. Phytase
        2.4.3. N-acetylneuraminic acid aldolase
    2.5. Conclusions
    2.6. References

Chapter 3
    3.1. Background
    3.2. Setup of the simulated systems
    3.3. Equilibration procedure
    3.4. Structural and dynamical analysis
    3.5. Cluster analysis
    3.6. Principal component analysis
    3.7. References

Chapter 4
    4.1. Abstract
    4.1. Introduction
    4.2. Methods
        4.2.1. Starting coordinates
        4.2.2. Molecular dynamics simulation
        4.2.3. FMN binding site analysis
        4.2.4. Multiple structural alignment of FMN domain
    4.3. Results
        4.3.1. FMN domain: structural and dynamical properties
        4.3.2. Cluster analysis of FMN domain
        4.3.3. FMN binding site
        4.3.4. Conservation profile of FMN binding site
        4.3.5. Principal component analysis of FMN domain
        4.3.6. FMN cofactor: structural and dynamical properties
        4.3.7. Cluster analysis of FMN cofactor
        4.3.8. Principal component analysis of FMN cofactor
    4.4. Discussions and conclusions
    4.5. References
    Supporting information

Chapter 5
    5.1. Abstract
    5.2. Introduction
    5.3. Methods
        5.3.1. Starting coordinates
        5.3.2. Molecular dynamic simulations
        5.3.2. Electron transfer tunneling
    5.4. Results and discussion
        5.4.1. Structural properties
        5.4.2. Cluster analysis
        5.4.3. Substrate access channel
        5.4.4. ET tunneling pathways
        5.4.5. Essential dynamics
    5.5. Conclusions
    5.6. References

Chapter 6
    6.1. Abstract
    6.2. Introduction
    6.3. Methods
        6.3.1. Starting coordinates
        6.3.2. Molecular dynamics simulation and modeling
    6.4. Results and discussion
        6.4.1. CoSep binding on P450BM-3 domains
        6.4.2. Effect of CoSep binding on substrate access channel
        6.4.3. Effect of CoSep binding on ET tunneling
        6.4.4. Effect of CoSep binding on P450BM-3 dynamics
    6.5. Conclusions
    6.6. References

Summary and outlook
Curriculum vitae

PART I: CAPDE 

Chapter 1 

Computer-Aided Protein Directed Evolution: a Review of 

Web Servers, Databases and other Computational Tools 

for Protein Engineering 

1.1. Abstract 

The combination of computational and directed evolution methods has proven to be a winning strategy for protein engineering. We refer to this approach as computer-aided protein directed evolution (CAPDE), and this chapter summarizes recent developments in this rapidly growing field. We restrict ourselves to an overview of the availability, usability and limitations of web servers, databases and other computational tools proposed in the last five years. The goal of the chapter is to provide concise information about currently available computational resources that assist the design of directed evolution based protein engineering experiments. 

1.2. Background 

Protein engineering comprises a large number of techniques applied to evolve or design proteins with desired functions.[1] The primary objective of any protein engineering experiment is to identify specific sequence changes that alter the protein towards the desired functional properties.[1,2] Generally, two main approaches are used to design novel proteins or enzymes: rational design and directed evolution. The first approach uses information on the protein structure to focus mutagenesis on specific parts of the protein scaffold (e.g. the active site of the biocatalyst). This approach requires knowledge of the target amino acids, which can be provided by visual inspection or in-silico prescreening.[3] The success of both depends on the nature of the problem, and a high success rate is obtained only for the prediction of single or double mutations. Indeed, multiple mutations involve cooperative effects on protein structure and function that are almost inaccessible to current computational screening methods as well. 

A more challenging approach, the de novo design or redesign of synthetic proteins or peptides, uses solely structural information and the folding rules of proteins.[4,5] Although the method offers the broadest possibility to design novel folds and functions, its success for large proteins is limited.[6,7] The reasons lie in the limited number of available three-dimensional protein structures (in particular of membrane proteins) and the lack of a unifying theory of protein folding mechanisms. Computational approaches based on microsecond to millisecond atomistic molecular dynamics (MD) simulations of protein folding [8-10] have recently had some encouraging success in the ab-initio folding of peptides and small proteins. In addition, combined quantum mechanics and molecular dynamics approaches have shown the superior capability of physics-based methods to design new enzymatic reactions.[11] However, the application of these methods is still limited because they are computationally very demanding.[12] De novo design, quantum mechanics and molecular dynamics approaches are not covered in this chapter; the reader can refer to recent papers and reviews on these topics.[13-16] 

The second approach is directed evolution, one of the most powerful methods to improve or create new protein functions by redesigning the protein structure.[17] It can, for example, improve the activity or stability of a biocatalyst under non-natural conditions (e.g. in the presence of organic solvents) by accumulating multiple mutations.[17,18] Directed evolution involves multiple rounds of random mutagenesis or gene shuffling followed by screening of the mutant library.[19] Prior knowledge of the protein structure is not required. However, structural information can focus and restrict the approach to specific subsets of amino acids (e.g. active site residues). A common problem of directed evolution methods is the limited distribution of the generated sequence diversity, which reduces the efficiency with which functional sequence space is sampled.[19,20] 

In summary, rational design via site-directed or saturation mutagenesis and directed evolution via random mutagenesis are key tools in protein engineering. In both approaches, the sequence diversity is generated directly as point mutations, insertions or deletions within a single parental gene. Consequently, improvements in the quality of rationally designed libraries and in the techniques for sequence space exploration and diversity generation are critical for future advances. 

The combination of experimental and computational methods holds particular promise for tailoring proteins to tasks not yet exploited by natural selection.[21,22] In fact, most of the computational tools and web servers for directed evolution use structural data, when available, to assist library generation. Since it is impossible to test more than a very small fraction of the vast number of possible protein sequences, a directed evolution strategy is needed that generates sequence libraries with the highest chance of containing variants with the desired enzymatic properties. Such libraries can be designed by applying current knowledge of the protein response to mutations and of sequence-structure-function relationships. 
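To give a sense of the scale behind this statement, the following minimal Python sketch counts the possible sequences of a protein and the variants that carry a fixed number of substitutions. The protein length (100 residues), the library size (one million clones) and all function names are hypothetical values chosen for illustration; they are not taken from the studies cited in this chapter. The screened fraction is an upper bound, since it assumes that every screened clone is a distinct variant.

```python
import math

AMINO_ACIDS = 20  # number of canonical amino acids


def sequence_space_size(length: int) -> int:
    """Number of distinct protein sequences of the given length."""
    return AMINO_ACIDS ** length


def n_variants(length: int, n_mutations: int) -> int:
    """Number of variants with exactly n_mutations amino acid substitutions."""
    positions = math.comb(length, n_mutations)           # which residues change
    replacements = (AMINO_ACIDS - 1) ** n_mutations      # what they change into
    return positions * replacements


def max_screened_fraction(library_size: int, length: int, n_mutations: int) -> float:
    """Upper bound on the fraction of variants covered by a screened library."""
    return min(1.0, library_size / n_variants(length, n_mutations))


if __name__ == "__main__":
    length = 100              # hypothetical protein length
    library_size = 10 ** 6    # hypothetical number of screened clones
    print(f"complete sequence space: {float(sequence_space_size(length)):.2e}")
    for k in (1, 2, 3, 4):
        total = n_variants(length, k)
        frac = max_screened_fraction(library_size, length, k)
        print(f"{k} substitution(s): {total:.3e} variants, at most {frac:.2%} screened")
```

Under these assumptions, single and double substitutions can in principle be screened exhaustively, whereas the fraction covered drops sharply from three simultaneous substitutions onwards, which is the motivation for focused libraries.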

Thermostability, solvent stability (pH and salt stability or co-solvent tolerance) and enzymatic activity (improvement in both binding affinity and catalytic activity) are the properties commonly targeted by protein engineering experiments. The first two are difficult to predict because they arise from effects distributed over the protein structure. For enzymatic activity, different mutagenesis studies indicate that most of the mutations affecting enzyme properties such as substrate specificity, enantioselectivity and new catalytic activities are located in or near the active site.[21] Rational design is successful in targeting relevant active site residues for site-directed mutagenesis but is less effective for important residues located in the second coordination sphere of the active site. For these cases, the combination of random mutagenesis and computer-aided protein directed evolution (CAPDE) approaches can provide a winning strategy. The application of computational methods in conjunction with directed evolution offers the exciting promise of generating libraries with a high frequency of active and improved variants.[23] 

Figure 1.1: Schematic representation of four CAPDE approaches (as the quarters of the circle): (1) 

generated diversity and library size (in red), (2) evolutionary conservation based focused library 

(in green), (3) structure-based focused library (in purple) and (4) mutational effects in protein (in 

cyan). The servers, tools and databases associated with the approaches are shown in boxes. 


In this chapter, for the sake of clarity, the CAPDE approaches are divided into four major areas, schematically represented in Figure 1.1. The first comprises tools that characterize the library generated by mutagenesis methods, mainly through statistical approaches. The second and third areas comprise tools that use the evolutionary and structural information of the target protein to design a focused library. Multiple sequence or structure alignment (MSA) is the key approach used by these tools to identify variable or conserved positions in the target protein. The fourth area covers tools for the prediction of mutational effects on protein structure and function. These tools and web servers are based on machine learning, statistical or empirical approaches and predict the effect of mutations on protein stability and/or activity by estimating relative free energy changes.[24] 
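The identification of conserved and variable positions from a multiple sequence alignment can be illustrated with a per-column Shannon entropy, a common and simple conservation measure. The sketch below is only an illustration under stated assumptions: the toy alignment and the function names are hypothetical, gaps are simply ignored, and the scoring is not claimed to be the one implemented by any of the servers discussed in this chapter.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column of an alignment given as equal-length strings."""
    columns = ("".join(col) for col in zip(*alignment))
    return [column_entropy(col) for col in columns]


if __name__ == "__main__":
    # Hypothetical toy alignment of four homologous sequence fragments.
    toy_alignment = ["MKTA", "MKSA", "MRTA", "MKTG"]
    for position, entropy in enumerate(conservation_profile(toy_alignment), start=1):
        label = "conserved" if entropy == 0 else "variable"
        print(f"position {position}: entropy = {entropy:.3f} bits ({label})")
```

A column with zero entropy is fully conserved in the alignment, whereas higher values mark positions that tolerate substitutions and are therefore candidates for a focused library.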

This chapter is divided into four parts, following the division of the CAPDE approaches. It aims to provide concise information about currently available CAPDE methods that assist the design of directed evolution experiments, with the final goal of increasing the probability of identifying mutants with the desired properties. In particular, the reader will find a short overview and classification of novel databases, web servers and other computational tools that have been developed in the last few years in the field of molecular modeling of protein structure and that can provide relevant information for the interpretation of experimental results. Finally, as previously mentioned, we do not consider methods that involve physical approaches based on QM/MM or MD simulations. 

1.3. Generated diversity and library size 

Unbiased diversity generation followed by the screening of a statistically meaningful fraction of the generated sequence space is a fundamental challenge in directed evolution experiments.[25] The directed evolution strategy comprises two key steps: 1) generating diverse mutant libraries and 2) screening them to identify improved protein variants. The success of a directed evolution method depends on the quality of the mutant library. The challenges and advances in generating functionally diverse libraries have been reviewed in past years.[20,26] Computational tools can assist directed evolution in these two steps through in-silico analysis and screening of the protein sequence space expected to be sampled by the generated libraries (summarized in Table 1.1). The publicly available web servers MAP (Mutagenesis Assistant Program)[25,27] and PEDEL-AA[28] were developed to estimate the diversity at the protein level in a library generated by a random mutagenesis method. 
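How a nucleotide-level mutational bias limits the diversity at the protein level can be illustrated with a small Python sketch that lists the amino acids reachable from one codon by a single nucleotide substitution. This is not the algorithm of MAP or PEDEL-AA; the function name and the transition-only spectrum used in the example are assumptions made for illustration, and the standard genetic code is used.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed example of a biased spectrum: transitions only.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def single_nt_substitutions(codon, allowed=None):
    """Count the amino acids ('*' = stop) reachable by one nucleotide change.

    allowed: optional set of (from_base, to_base) pairs that the mutagenesis
    method can introduce; None means all twelve substitutions are possible.
    """
    counts = Counter()
    for pos, old in enumerate(codon):
        for new in BASES:
            if new == old:
                continue
            if allowed is not None and (old, new) not in allowed:
                continue
            mutant = codon[:pos] + new + codon[pos + 1:]
            counts[CODON_TABLE[mutant]] += 1
    return counts


if __name__ == "__main__":
    codon = "GCT"  # alanine
    print("unbiased:        ", dict(single_nt_substitutions(codon)))
    print("transitions only:", dict(single_nt_substitutions(codon, TRANSITIONS)))
```

For the alanine codon GCT, the nine possible single nucleotide changes give only six different amino acids plus three silent changes, and a method restricted to transitions reaches only two of them (Thr and Val). Statistics of this kind, accumulated over all codons of a gene and weighted by the mutational spectrum of each method, are what the servers above report at the protein level.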

Table 1.1: Computational tools to analyze the amino acid diversity, size and completeness of libraries generated by mutagenesis methods. 

Approach | Name | Input | Case study examples | URL
--- | --- | --- | --- | ---
Statistics of generated diversity | MAP 2.0 3D [25,27] | Nucleotide sequence or protein structure. | Cytochrome P450BM-3,[25] D-amino acid oxidase, Phytase [27] | http://map.jacobs-university.de/submission.html
Statistics of generated diversity | PEDEL-AA [28] | Nucleotide sequence, mutation rate, library size, indel rate, nucleotide mutation matrix. | α-synuclein, Phosphoribosylpyrophosphate amidotransferase (purF) [29] | http://guinevere.otago.ac.nz/cgi-bin/aef/pedel-AA.pl
Library size and completeness | GLUE-IT [28] | Library size and randomization techniques. | Randomization scheme: NNK, NDT, NNB, NAY [28] | http://guinevere.otago.ac.nz/cgi-bin/aef/glue-IT.pl
Library size and completeness | TopLib [30] | Probability required by library size and randomization techniques. | Randomization scheme: NNN, NNB, NNK, MAX [30] | http://stat.haifa.ac.il/~yuval/toplib/
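The randomization schemes listed in Table 1.1 differ in the number of codons, encoded amino acids and stop codons, which in turn determines how many clones must be screened. The following Python sketch expands a degenerate codon using the IUPAC nucleotide codes and the standard genetic code, and estimates the number of clones needed for a given expected coverage. It is a minimal sketch, not the implementation of GLUE-IT or TopLib: the function names are hypothetical, and the estimate L = -V ln(1 - F) assumes that all V variants are equally probable.

```python
import math
from itertools import product

# Standard genetic code, codons ordered with the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide codes needed for the schemes in Table 1.1.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "B": "CGT", "D": "AGT", "Y": "CT"}


def expand(scheme):
    """All codons encoded by a three-letter degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]


def summarize(scheme):
    """Number of codons, distinct amino acids and stop codons of a scheme."""
    encoded = [CODON_TABLE[codon] for codon in expand(scheme)]
    return {"codons": len(encoded),
            "amino_acids": len(set(encoded) - {"*"}),
            "stops": encoded.count("*")}


def clones_for_coverage(n_variants, coverage):
    """Clones to sample so that the expected fraction `coverage` of
    n_variants equally probable variants is observed at least once."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    for scheme in ("NNN", "NNB", "NNK", "NDT", "NAY"):
        info = summarize(scheme)
        clones = clones_for_coverage(info["codons"], 0.95)
        print(f"{scheme}: {info}, clones for 95% codon coverage at one site: {clones}")
```

With these assumptions, NNK encodes all 20 amino acids with 32 codons and one stop codon, whereas NDT encodes 12 amino acids with 12 codons and no stop codon, so that a smaller library is sufficient at the price of a reduced amino acid alphabet. For several randomized sites, the number of variants is the product of the per-site codon counts.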

Figure 1.2: a) MAP 2.0 3D analysis of the amino acid diversity generated by the balanced epPCR method (Taq, MnCl2, G=A=C=T). The Y-axis shows the original amino acid species and the X-axis shows the amino acid substitution patterns, colored from red (lowest probability) to blue (highest probability). The MAP 2.0 3D analysis is restricted to the active site residues (Ala11, Ser47, Thr48, Tyr137, Ile139, Lys165, Thr167, Gly189, Tyr190). For this analysis, the amino acids are grouped into four classes according to their chemical nature (charged, neutral, aromatic and aliphatic), with stop codons (structure disrupting) and glycine/proline (helix destabilizing) as separate classes. The probabilities of amino acid substitutions were mapped onto the protein sequence and structure (PDB ID: 1NAL) of N-acetylneuraminic acid aldolase and are represented in b and c, respectively. b) The Jmol [33] applet is used to visualize the amino acid substitution patterns using the RWB (red-white-blue) color gradient scheme, with the active site residues shown as sticks. The Y-axis shows the sequence ID, PDB ID and amino acid name, together with c) the secondary structure elements (T: hydrogen bonded turn and bend, *: loop or irregular structure), d) the normalized Cα B-factor to differentiate flexible (F) and rigid (R) residues, and e) the relative solvent accessibility to identify exposed (E) or buried (B) residues. 

MAP [25] takes a nucleotide sequence as input and helps to design a better directed evolution strategy by providing a statistical analysis of random mutagenesis methods at the protein level. The capabilities of MAP were extended in the MAP 2.0 3D [27] server, which predicts the residue mutability resulting from the mutational bias of random mutagenesis methods and correlates the generated amino acid substitution patterns with the structural information of the target protein. In this way, the server makes it possible to analyze the consequences of the limited mutational preferences of random mutagenesis methods at the protein level and their effects on protein structure.[25] The capability of the server was illustrated by the in-silico screening of different enzymes, and the predicted results were in agreement with the experimental results.[27,31,32] Figure 1.2 shows an example of the MAP 2.0 3D output for the active site residues of N-acetylneuraminic acid lyase using an epPCR method.[27]
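The kind of protein-level statistics discussed here can be illustrated with a minimal Python sketch. It enumerates the nine single-nucleotide substitutions of each codon and counts how many are synonymous, missense or stop mutations. The sketch assumes an unbiased mutational spectrum (every substitution equally likely) and the standard genetic code. It is not the MAP implementation, which accounts for the mutational bias of each mutagenesis method, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_mutant_spectrum(codon):
    """Classify the nine single-nucleotide substitutions of one codon.

    Assumption: all substitutions are equally likely (no mutational bias).
    """
    original = CODON_TABLE[codon]
    counts = Counter()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa == "*":
                counts["stop"] += 1
            elif aa == original:
                counts["synonymous"] += 1
            else:
                counts["missense"] += 1
    return counts


def sequence_spectrum(dna):
    """Sum the single-mutant spectra over all complete codons of a sequence."""
    total = Counter()
    for i in range(0, len(dna) - len(dna) % 3, 3):
        total += single_mutant_spectrum(dna[i:i + 3])
    return total


if __name__ == "__main__":
    # Toy open reading frame, chosen only for illustration.
    print(sequence_spectrum("ATGGCTAAATGG"))
```

Replacing the uniform weighting with a method-specific nucleotide substitution matrix would be the natural next step towards the biased statistics that the servers report.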

PEDEL-AA returns statistics at the amino acid level for libraries generated by epPCR methods, given the nucleotide sequence together with the library size, mutation rate, indel rate and nucleotide mutation matrix.[28] CodonCalculator and AA-Calculator are two algorithms developed by Patrick et al. to select an appropriate randomization scheme for library construction.[28] Two servers, GLUE-IT and GLUE, estimate amino acid diversity and completeness in the generated library. Finally, the TopLib [30] web server assists in designing saturation mutagenesis experiments by predicting the size or completeness of the generated library for a user-defined codon randomization scheme using a probabilistic approach.
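The relation between randomization scheme, library size and completeness can be sketched in a few lines of Python. The example below counts the codons, amino acids and stop codons encoded by a degenerate codon (IUPAC notation) and estimates the expected completeness of a library under the simplifying assumption that all variants are equally likely. This is an illustrative model only and does not reproduce the algorithms of GLUE-IT or TopLib; all function names are hypothetical.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def scheme_stats(scheme):
    """Count codons, distinct amino acids and stop codons of a degenerate codon."""
    codons = ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]
    amino_acids = [CODON_TABLE[c] for c in codons]
    return {
        "codons": len(codons),
        "amino_acids": len(set(amino_acids) - {"*"}),
        "stop_codons": amino_acids.count("*"),
    }


def expected_completeness(n_variants, library_size):
    """Expected fraction of variants sampled at least once.

    Assumption: all n_variants are equally likely and clones are independent.
    """
    return 1.0 - (1.0 - 1.0 / n_variants) ** library_size


def library_size_for(n_variants, completeness):
    """Smallest library size reaching the requested expected completeness.

    Requires n_variants > 1 and 0 < completeness < 1.
    """
    return math.ceil(math.log(1.0 - completeness) / math.log(1.0 - 1.0 / n_variants))


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNB", "NDT"):
        print(scheme, scheme_stats(scheme))
    # Clones needed to cover 95% of the 32 NNK codons at one position.
    print(library_size_for(32, 0.95))
```

Under this equiprobable model, roughly three times as many clones as variants are needed for about 95% expected completeness, which illustrates why schemes with fewer codons are attractive when several positions are randomized simultaneously.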

1.4. Evolutionary conservation based focused library 

Multiple sequence or structure alignment (MSA) is the most common approach to identify functionally significant or evolutionarily variable regions in proteins.[34] In CAPDE, several servers and databases use MSA together with the physical and structural information of proteins or protein superfamilies. Table 1.2 contains a list of the tools considered in this chapter. The ConSurf 2010 [35] server provides the evolutionary conservation profiles of a protein or nucleic acid sequence or structure by first identifying the conserved positions using MSA and then calculating the evolutionary conservation rate using empirical Bayesian inference. The ConSurf-DB [36] database makes available the evolutionary conservation profiles of the available protein structures, pre-calculated by the ConSurf web server. The 3DM [37] server performs structure-based multiple sequence alignments of the members of a protein superfamily and provides the consensus data combined with other useful information about amino acid positions in the protein, such as interactions and solvent accessibility, together with published mutation data.
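To make the idea of an MSA-derived conservation profile concrete, the following Python sketch scores each alignment column by its normalized Shannon entropy. This is a deliberately simple stand-in and not the empirical Bayesian rate estimation used by ConSurf; the function name, the gap handling and the normalization by log2(20) are assumptions made for the example.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one score in [0, 1] per alignment column (1 = fully conserved).

    The score is 1 - H / log2(20), where H is the Shannon entropy of the
    residue frequencies in the column. Gap characters ('-') are ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # column made of gaps only
            continue
        counts = Counter(residues)
        entropy = -sum(
            (n / len(residues)) * math.log2(n / len(residues))
            for n in counts.values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


if __name__ == "__main__":
    # Toy alignment, invented for illustration.
    msa = ["MKTA", "MKSA", "MRTA", "MK-A"]
    for position, score in enumerate(column_conservation(msa), start=1):
        print(position, round(score, 3))
```

Positions with scores close to 1 would be treated as conserved and therefore as risky mutagenesis targets, whereas low-scoring positions are candidates for focused libraries.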

For a more focused analysis of protein hotspots or amino acid patches, three interesting tools are available as standalone programs or web servers. The Joint Evolutionary Tree (JET) method is tuned to identify conserved amino acid patches on protein interfaces by taking into account the physical-chemical properties and evolutionary conservation of the surface residues.[38] The predicted protein interaction sites or core residues might be used in site-specific mutagenesis experiments. The HotSprint [39] database provides information on hotspots in protein interfaces using the sequence conservation score (calculated by the Rate4Site algorithm [40]) of the residues and their solvent accessible surface area. HotSpot Wizard predicts the suitability for mutagenesis of the amino acids in or near the active site using their evolutionary conservation information.[41] The server takes a protein structure as input and provides a platform for experimentalists to select target amino acids for site-directed mutagenesis to improve enzymatic properties such as substrate specificity, activity and enantioselectivity.[41] MAP 2.0 3D [27] (Table 1.1, see previous section) also provides information on the mutagenic hotspots generated by the mutational preferences of random mutagenesis methods, together with sequence and structural information of the protein. The Selecton [42] web server predicts the selective forces at each amino acid position in a protein. The server performs a codon-based alignment on a set of homologous nucleotide sequences and uses the ratio of amino acid altering to silent substitutions (Ka/Ks) to estimate both positive (>1) and purifying (<1) selection.

Table 1.2

Approach | Name | Description | Case study examples | URL
[...] | ConSurf 2010 [35] | The web server performs MSA and calculates evolutionary conservation rate to identify conserved positions in protein or nucleotide sequence/structure. | GAL4 transcription factor [35] | http://consurf.tau.ac.il/
[...] | ConSurf DB [36] | The database provides the predicted results of ConSurf [35] server for known protein structures. | Cytochrome c [36] | http://consurfdb.tau.ac.il/index.php
Hotspot identification | JET [38] | The Evolutionary trace based method performs MSA on a set of homologous sequences (from PSI-BLAST) after Gibbs like sampling. The aligned homologous sequences are used to construct distance tree based on Neighbor Joining algorithm. The clustering method is parameterized to identify protein interface or core residues by taking into account the physical-chemical properties and evolutionary conservation. | DNA polymerase I, DNA transferase, allophycocyanin, Leucine dehydrogenase, β-trypsin proteinase, phosphotransferase, human CDC42 gene regulation protein, oncogene protein, signal transduction protein etc [38] | http://www.ihes.fr/~carbone/data6/legenda.htm
Hotspot identification | HotSprint Database [39] | [...] information about hotspots in protein interface using conservation rate and solvent accessibility of the residues. | Numb PTB domain [39] | http://prism.ccbb.ku.edu.tr/hotsprint/
Hotspot identification | HotSpot wizard [41] | The web server predicts residue mutability of functionally important residues and visualizes it on protein sequence and structure. | Haloalkane dehalogenase, Phosphotriesterase, 1,3-1,4-β-D-Glucan 4-glucanohydrolase, β-Lactamase [41] | http://loschmidt.chemi.muni.cz/hotspotwizard/
Hotspot identification | Selecton [42] | The web server detects selection forces on biologically significant sites in the target protein during evolutionary process. | TRIM5α protein [42] | http://selecton.tau.ac.il/index.html
Protein superfamily based MSA | 3DM [37] | The database performs structure based MSA for a protein superfamily with sequence, structural, molecular interaction and mutational information from literature. | α/β hydrolase fold [53] | http://3dmcsis.systemsbiology.nl/
Protein superfamily based MSA | The Lipase Engineering Database [43,54,55] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Lipases [43,54,55] | http://www.led.uni-stuttgart.de/
Protein superfamily based MSA | The database of epoxide hydrolases and haloalkane dehalogenase [56] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Epoxide hydrolases and haloalkane dehalogenase [56] | http://www.l
Protein superfamily based MSA | The Laccase Engineering database [45] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Laccases [45] | http://www.lcced.uni-stuttgart.de/
Protein superfamily based MSA | The Cytochrome P450 engineering database [57] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Cytochrome P450s [57] | http://www.cyped.uni-stuttgart.de/
Protein superfamily based MSA | The PHA Depolymerase Engineering Database [44] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Polyhydroxyalkanoates depolymerase [44] | http://www.d
Protein superfamily based MSA | The Lactamase Engineering database [46] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Lactamases [46] | http://www.laced.uni-stuttgart.de/
Protein superfamily based MSA | SHV Lactamase Engineering Database [47] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | SHV lactamases [47] | http://www.laced.uni-stuttgart.de/classA/SHVED/
Literature based protein mutant data | PMD [48] | [...] literature based protein mutant information with structure and functional annotation. | | http://pmd.ddbj.nig.ac.jp/~pmd/pmd.html
Literature based protein mutant data | ProTherm [49-51] | [...] mutant information with thermodynamic parameters and experimental conditions integrated with sequence, structure and function annotation. | | http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html
Literature based protein mutant data | MuteinDB [52] | [...] mutant information, kinetic parameters and experimental conditions integrated with user-friendly and flexible query system to fetch data using reaction name or substrate or inhibitor name or structure and mutations. | Cytochrome P450s [52] | https://muteindb.genome.tugraz.at/muteindb-web-2.0/faces/init/index.seam

1.5. Structure-based focused library 

Structure-based approaches assist rational design and random mutagenesis by predicting the regions in a protein that are responsible for stability and activity.[2,58] Computational tools such as 3DLigandSite [59], ProBiS [60,61] (Protein Binding Site) and SiteComp [62] predict ligand binding sites in proteins [63]. In the absence of a crystal structure, all these tools can use a homology model of the target protein, and they help to design and tune the ligand binding site by identifying the key residues for activity and the properties of their molecular interactions. 3DLigandSite [59] performs alignment and clustering of homologous structures to predict the ligand binding site. ProBiS [60,61] uses MSA to detect structurally similar binding sites in proteins and also performs local structural pairwise alignment to identify functionally relevant binding regions. The pre-calculated results of the ProBiS analysis are available via the ProBiS-database [64], a repository of structurally similar binding sites. SiteComp [62] characterizes protein binding sites using descriptors based on molecular interaction fields. The server evaluates differences between similar binding sites, identifies sub-sites and estimates residue contributions to ligand binding. TRITON [65,66] provides protein engineers with a single platform to model mutants, perform protein-ligand docking and calculate reaction pathways. In this way, these methods facilitate the study of the properties of protein-ligand complexes.

Knowledge of the molecular interactions that contribute to the relevant free energy barrier, and of the design of the surface charge distribution, can help to understand the molecular basis of kinetic stability and to enhance protein stability efficiently.[58,67] The PIC (Protein Interaction Calculator) server [68] calculates inter- or intra-protein interactions using published criteria integrated with solvent accessibility and residue depth calculations. A recently introduced web server, COCOMAPS (bioCOmplexes COntact MAPS) [69], uses intermolecular interactions to analyze interfaces in biological complexes. The identification of exposed and buried amino acids also helps to gain insight into protein stability and to explore the effects of mutations on a protein. DEPTH [70] employs distance information between residues and bulk solvent to predict protein stability, conservation or binding cavities, based on residue depth and solvent accessibility. SRide [71] identifies the contribution of residues to protein stability using their interactions and the evolutionary conservation and hydrophobicity of their neighboring residues. Patch finder plus [72] identifies residues that contribute to positively charged patches on the protein surface and that might interact with DNA, membranes or other proteins. ConPlex [73] utilizes the protein solvent accessible surface area to identify surface or interface residues and assigns residue-specific conservation scores on the sequence and structure of the protein complex. The server also provides pre-calculated ConPlex results for known protein complexes as a repository.

Recent studies have suggested that protein flexibility and protein function are strongly linked.[24,74,75] Protein flexibility plays an important role in both catalytic activity and molecular recognition processes. The effect of protein flexibility is particularly relevant in proteins from extremophiles, which must balance the rigidity required for stability with the flexibility necessary for activity [76-78]. In addition, numerous proteins have regions that adopt different conformations under different conditions, allowing them to take part in cellular and molecular regulation.[24,79] Residue flexibility has been taken into account to describe a variety of protein properties, including thermal stability, catalytic activity, ligand binding (induced fit), domain motion, preferential solvation and molecular recognition in intrinsically disordered protein systems. The Debye–Waller factor (B-factor), reported in crystallographic atomic resolution structures, provides a rough estimate of local residue flexibility [80], and different servers provide this information as an indicator (for example, the MAP 2.0 3D server [27]). If a crystallographic structure is not available, different tools can be used to estimate flexibility profiles using different approaches.
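As an illustration of how crystallographic B-factors can be turned into a simple flexibility indicator, the Python sketch below normalizes a list of Cα B-factors to z-scores and labels each residue as flexible (F) or rigid (R). The z-score normalization and the threshold of zero are assumptions made for this example and are not necessarily the scheme used by MAP 2.0 3D or any other server; the function names are hypothetical.

```python
from statistics import mean, pstdev


def normalized_b_factors(b_factors):
    """Convert raw B-factors into z-scores (zero mean, unit variance)."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        # All residues have the same B-factor: no relative information.
        return [0.0 for _ in b_factors]
    return [(b - mu) / sigma for b in b_factors]


def classify_flexibility(b_factors, threshold=0.0):
    """Label residues 'F' (flexible) or 'R' (rigid).

    Assumption: residues whose normalized B-factor exceeds the threshold
    are considered flexible; the threshold value is a free parameter.
    """
    return ["F" if z > threshold else "R" for z in normalized_b_factors(b_factors)]


if __name__ == "__main__":
    # Invented Cα B-factors (in Å^2) for four consecutive residues.
    example = [10.0, 12.0, 30.0, 11.0]
    print(classify_flexibility(example))
```

Normalizing within one structure makes the labels comparable between residues of that structure only, because absolute B-factors depend on resolution and refinement protocol.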

The RosettaBackrub [81] server can generate the protein backbone structural variability that results from amino acid variations [82], which can be used to design sequence libraries for experimental screening and to predict protein or peptide interaction specificity. The server generates Rosetta-scored modeled structures for variants with single or multiple point mutations in monomeric proteins. It also generates near-native structural ensembles of protein backbone conformations and sequences consistent with those ensembles. Finally, it can predict the sequences tolerated by proteins or protein interfaces using flexible backbone design methods. The tCONCOORD [83] method generates conformational ensembles to gain insight into the conformational flexibility and conformational space of the protein.

FlexPred [84] predicts residue flexibility using a pattern recognition approach to identify residue positions in conformational switches, with their evolutionary conservation and normalized solvent accessibility (if a structure is available) as the predictors for a Support Vector Machine (SVM).

Different simplified methods have been proposed to identify local flexibility or large-scale motions in proteins at a coarse-grained level.[85-87] Many of these methods are based on the Gaussian network model (GNM) [88] or its extension, the anisotropic network model (ANM) [89], to study protein dynamics using Normal Mode Analysis (NMA) (see the review [90] for a general overview of these topics). Table 1.3 shows the tools available to analyze conformational flexibility on protein structure (for more details see [91]). The ElNemo [92] and WEBnm@ [93] servers are reported here to complete the information about NMA-based tools. Both servers perform NMA using a coarse-grained model to analyze conformational changes in proteins. The FlexServ [94] server estimates protein flexibility using three different coarse-grained approaches: 1) discrete molecular dynamics (DMD), 2) normal mode analysis (NMA) and 3) Brownian dynamics (BD). The server characterizes protein flexibility by analyzing different structural and dynamic properties of the protein, such as structural variations, essential modes, stiffness between interacting residues, dynamic domains and hinge points. Different tools are available to identify hinge bending residues in large-scale protein motions. The HINGEprot [95] server predicts hinge motions in proteins using the coarse-grained GNM and ANM models. DynDom [96] uses a rigorous approach to describe domain motion. The method determines hinge axes and hinge bending residues using two conformations of the protein. A recent addition to DynDom is the database of ligand-induced domain movements in enzymes.[97] Furthermore, the DynDom3D [98] server provides a more advanced and generic tool that can be used to study any kind of polymer.
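For orientation, the central quantities of the GNM mentioned above can be written compactly. In the usual formulation, residues are represented by their Cα atoms, and two residues are connected by a spring of force constant γ when their distance R_ij is below a cutoff r_c (values around 7 Å are commonly used; the exact value is a modeling choice). The notation below is a standard textbook form given as a reminder, not a formula taken from the servers discussed here.

```latex
\Gamma_{ij} =
\begin{cases}
-1 & \text{if } i \neq j \text{ and } R_{ij} \le r_c \\
0 & \text{if } i \neq j \text{ and } R_{ij} > r_c \\
-\sum_{k \neq i} \Gamma_{ik} & \text{if } i = j
\end{cases}
\qquad
\langle \Delta R_i^2 \rangle = \frac{3 k_B T}{\gamma} \left[ \Gamma^{-1} \right]_{ii},
\qquad
B_i = \frac{8 \pi^2}{3} \langle \Delta R_i^2 \rangle
```

Here Γ is the Kirchhoff (connectivity) matrix, Γ⁻¹ denotes its pseudo-inverse (Γ has one zero eigenvalue), and the last relation links the predicted mean-square fluctuations to the crystallographic B-factors, which is how GNM predictions are typically compared with experiment.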

The reader should note that the connection between protein flexibility and function has been investigated theoretically and experimentally only in the last few years.[87,99-101] The methods based on this approach provide a qualitative estimate of protein dynamical properties, but they do not take into account many effects (such as direct solvent effects) that are important for protein functionality. To date, atomistic simulation (MD or QM/MD) is the best approach to study protein flexibility and dynamics quantitatively.[8,87,99] Nevertheless, even at this level of accuracy, the connection between flexibility and functionality is still puzzling. In addition, the simulation approaches are still time-consuming and impractical for high-throughput modeling and analysis of protein structural dynamics.

Table 1.3: Summarizing the computational tools for structure-based focused library generation.

Approach | Name | Description | Case study examples | URL
Ligand binding site | 3DLigandSite [59] | identifies ligand binding site via MSA and clustering algorithm. | Target T0483 in CASP8 | http://www.sbg.bio.ic.ac.uk/~3dligandsite/
Ligand binding site | ProBiS [60,61] | detects binding site using MSA and characterizes it using local structural pairwise alignment. | Biotin carboxylase, TATA binding protein [60], D-alanine–D-alanine ligase, Protein kinases C [61] | http://probis.cmm.ki.si/
Ligand binding site | ProBiS-database [64] | The database provides structurally similar protein binding site using ProBiS algorithm. | Cytochrome c [64] | http://probis.cmm.ki.si/?what=database
Ligand binding site | SiteComp [62] | characterizes ligand binding site using molecular interaction descriptors. | Cyclooxygenase, adenylate kinase [62] | http://scbx.mssm.edu/sitecomp/sitecomp-web/Input.html
Ligand binding site | TRITON [65,66] | The method facilitates to model mutant, dock ligand in the protein and calculates reaction pathways for the characterization of protein-ligand interactions using semi-empirical quantum-mechanics approach. | PA-IIL lectin and its mutants [65] | http://www.ncbr.muni.cz/triton/description.html
Protein interaction | PIC [68] | calculates the molecular interactions using published criteria. | - | http://pic.mbu.iisc.ernet.in/job.html
Protein interaction | COCOMAPS [69] | analyzes and visualizes interfaces in biological complexes using intermolecular contact maps based on distance or physicochemical properties. | Hen egg lysozyme interaction with two antibodies [69] | https://www.molnac.unisa.it/BioTools/cocomaps/
Residue depth and stability | DEPTH [70] | predicts binding cavity and mutational effect on protein stability using residue depth and solvent accessible surface area. | West Nile Virus NS2B/NS3 protease [70] | http://mspc.bii.a-star.edu.sg/tankp/intro.html
Residue depth and stability | SRide [71] | The web server predicts the contribution of residues in protein stability using interactions with its spatial neighbors and their evolutionary conservation. | TIM-barrel proteins [102] | http://sride.enzim.hu/
Protein surface and interface | Patch finder plus [72] | identifies large positively charged electrostatic patches on protein surface using Poisson Boltzmann electrostatic potential. | DNA binding domain of TATA binding protein [72] | http://pfp.technion.ac.il/
Protein surface and interface | ConPlex [73] | performs evolutionary conservation analysis of the protein complex. | Rho–RhoGAP complex [73] | http://sbi.postech.ac.kr/ConPlex/
Protein flexibility | RosettaBackrub [81] | performs flexible backbone modeling using Backrub [103] method to design tolerated protein sequences. | hGH-hGHr interface [104] | https://kortemmelab.ucsf.edu/backrub/cgi-bin/rosettaweb.py?query=index
Protein flexibility | tCONCOORD [83] | The method generates conformation ensemble and transitions using geometrical constraints based prediction of protein conformational flexibility. | Osmoprotection protein [83] | http://wwwuser.gwdg.de/~dseelig/tconcoord.html
Protein flexibility | FlexPred [84] | predicts residue flexibility in the protein using SVM approach. | Human PrP [105] | http://flexpred.rit.albany.edu/
Protein flexibility | ElNemo [92] | predicts large amplitude motions in the protein using NMA. | HIV-1 protease, E. coli membrane channel protein TolC | http://igs-server.cnrs-mrs.fr/elnemo/index.html
Protein flexibility | WEBnm@ [93] | predicts large amplitude motions in the protein using NMA. | Calcium ATPase [93] | http://apps.cbu.uib.no/webnma/home
Protein flexibility | FlexServ [94] | determines and analyzes protein flexibility using coarse-grained modeling approach. | - | http://mmb.pcb.ub.es/FlexServ/input.php
Protein flexibility | HINGEprot [95] | detects hinge region in the protein using both GNM and ANM. | Calmodulin protein, hemoglobin [95] | http://www.prc.boun.edu.tr/appserv/prc/hingeprot/
Protein flexibility | DynDom3D [98] | predicts domain motions using conformational changes in the protein. | Hemoglobin, 70S ribosome [98] | http://fizz.cmp.uea.ac.uk/dyndom/3D/

1.6. Mutational effects in protein 

For biotechnological applications, the enhancement of protein thermal stability or tolerance is a commonly requested task in protein engineering.[106] A highly stable structure correlates with a well-packed, highly compact structure and has an increased tolerance to mutation, because most mutations are deleterious, i.e., they destabilize the protein.[107] Generally, the effect of a mutation on a protein is calculated from the free energy difference between two states of the protein, for example the thermodynamic stability expressed as the change in the free energy difference between the folded and unfolded states (ΔΔG). The mutational effect has been predicted by using different machine learning and selection methods (such as SVM, Decision Tree (DT) or Random Forest (RF) [108]) for classification or regression of data, or by using statistical or empirical methods that take into account atomic interactions or structural properties such as solvent accessibility. Most of the servers based on these approaches use the available information on mutational effects (fetched from databases like PMD [48] and ProTherm [51]) to predict the effect of new substitutions. Table 1.4 summarizes the available tools to predict mutational effects on protein stability and activity using different methods. I-Mutant2.0 [109] and MUpro [110] are SVM-based methods that predict stabilizing or destabilizing amino acid substitutions based on the free energy change (ΔΔG). The iPTREE-STAB [111] server employs a DT approach to predict the effect of a single mutation on protein stability, considering the physicochemical properties of the substituted amino acid and its contact information with neighboring amino acids. The WET-STAB [112] server performs a similar prediction, with the additional feature of predicting protein stability changes upon double mutations from the amino acid sequence. ProMAYA [113] uses the RF machine learning algorithm to predict protein stability based on the free energy difference. MuD (Mutation detector) uses the same algorithm to classify amino acid substitutions as neutral or deleterious by taking into account structure- and sequence-based features such as solvent accessibility, binding site and sequence identity.[114] SDM (Site Directed Mutator) [115] and PopMuSic2.1 [116] are methods based on statistically derived force field potentials that predict protein stability using relative free energy differences. In PopMuSic2.1 [116], however, the parameters of the statistically derived potential depend on the protein solvent accessibility. The FoldX plugin [117] and the PEAT-SA [118] program suite use an empirical force field to calculate, from three-dimensional protein or peptide structures, the relative free energy difference determined by the changes of interactions in the mutated structures. CUPSAT [119] estimates the effect of mutations on protein stability using protein environment specific mean force potentials. The potentials are derived from the statistical analysis of protein structure data sets. AUTO-MUTE [120,121] provides either energy-based or machine learning methods for the prediction of protein stability, given the protein structure, the mutation and the experimental conditions. The SIFT (Sorts Intolerant From Tolerant) [122] server helps to explore the effect of mutations on protein function using a sequence homology approach. The multiple alignment information is used to identify tolerated and deleterious substitutions in the query sequence.
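Since the stability predictors above are all expressed in terms of ΔΔG, it may help to state the quantity explicitly. One common definition, given here as an assumption about notation rather than as the convention of any particular server, is:

```latex
\Delta G_{\mathrm{fold}} = G_{\mathrm{folded}} - G_{\mathrm{unfolded}},
\qquad
\Delta\Delta G = \Delta G_{\mathrm{fold}}^{\mathrm{mutant}} - \Delta G_{\mathrm{fold}}^{\mathrm{wild\,type}}
```

With this definition, a negative ΔΔG indicates a stabilizing substitution and a positive ΔΔG a destabilizing one. The opposite sign convention, based on unfolding free energies, is also in use, so the sign of the reported ΔΔG should be checked for each tool before the predictions of different servers are compared.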

A quantitative in-silico screening of virtual libraries based on the cooperative effect of multiple mutations on stability and functionality is still out of reach. However, the current methods give a qualitative indication of possible mutation sites that can increase the chances of obtaining a higher population of stable and functionally active variants in the library. The knowledge of mutational effects on proteins provided by all these CAPDE approaches helps to limit the library size and to focus on generating unpredictable substitutions that may lead to large effects. Libraries based on in-silico screening generally show a higher success rate when the starting protein has sufficient stability.

Table 1.4: Summarizing the computational tools to analyze the mutational effect on protein stability and activity.

Approach | Name | Description | URL
SVM | I-Mutant2.0 [109] | [...] protein stability change upon point mutation. | http://folding.uib.es/imutant/i-mutant2.0.html
SVM | MUpro [110] | [...] protein stability change upon point mutation. | http://mupro.proteomics.ics.uci.edu/
Decision tree (DT) | iPTREE-STAB [111] | [...] with residues information. | http://210.60.98.19/IPTREEr/iptree.htm
Decision tree (DT) | WET-STAB [112] | [...] upon double mutation with residue information. | http://210.60.98.19/WETr/wet.htm
Random forests (RF) | ProMAYA [113] | [...] | http://bental.tau.ac.il/ProMaya/
Random forests (RF) | MuD [114] | [...] protein function. | http://mud.tau.ac.il/
Statistical potential based method | SDM [115] | [...] protein stability. | http://mordred.bioc.cam.ac.uk/sdm/sdm.php
Statistical potential based method | PopMuSic2.1 [116] | [...] thermodynamic stability change upon mutation. | http://babylone.ulb.ac.be/popmusic/
Empirical force field | FoldX [117] | The plugin predicts [...] protein and facilitates in-silico alanine screening, mutant homology modeling and interaction energy calculation. | http://foldx.crg.es/
Empirical force field | PEAT-SA [118] | The program suite predict mutational effect on protein stability, ligand affinity and pKa values. | http://enzyme.ucd.ie/PEATSA/Pages/FrontPage.php
[...] | CUPSAT [119] | [...] protein stability. | http://cupsat.tu-bs.de/
RF, SVM, Tree and SVM regression | AUTO-MUTE [121] | [...] protein stability and activity (up to 19 mutations). | http://proteins.gmu.edu/automute/
Evolutionary conservation | SIFT [122] | [...] protein function. | http://sift.jcvi.org/

1.7. Summary and outlook 

In this chapter, the recent additions to the CAPDE arsenal of computational tools, servers and databases have been briefly reviewed. The rapid accumulation of knowledge on protein structures and sequence-structure-function relationships promises the continuous improvement of these methods. In particular, machine-learning approaches, in which the volume of data is the heuristic key to access the hidden knowledge, and statistically based force fields for coarse-grained approaches will surely benefit from this trend. These approaches are not only convenient aids to support lab experiments but also a workbench for heuristically blueprinting novel molecules. In addition, the availability of low-cost, high-performance computers will soon transform currently expensive physically based simulations into convenient and very accurate high-throughput computational tools. This will make it possible to predict the structural stability and folds of small or medium-sized proteins and will open a new working paradigm in protein engineering. In addition, the physically based approach has already shown promising results in understanding enzyme activity.[123,124]

1.8. References 

1. Bornscheuer UT, Huisman GW, Kazlauskas RJ, Lutz S, Moore JC, et al. (2012) 

Engineering the third wave of biocatalysis. Nature 485: 185-194. 

2. Lutz S (2010) Beyond directed evolution-semi-rational protein engineering and 

design. Curr Opin Biotech 21: 734-743. 

3. Gerlt JA, Babbitt PC (2009) Enzyme (re)design: lessons from natural evolution and 

computation. Curr Opin Chem Biol 13: 10-18. 

4. Jackel C, Kast P, Hilvert D (2008) Protein design by directed evolution. Annu Rev 

Biophys 37: 153-173. 

5. Damborsky J, Brezovsky J (2009) Computational tools for designing and engineering 

biocatalysts. Curr Opin Chem Biol 13: 26-34. 

6. Suarez M, Jaramillo A (2009) Challenges in the computational design of proteins. J R 

Soc Interface 6 (Suppl 4): S477–S491. 

7. Pantazes RJ, Grisewood MJ, Maranas CD (2011) Recent advances in computational 

protein design. Curr Opin Struct Biol 21: 467-472. 

8. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE (2012) Biomolecular simulation: a 

computational microscope for molecular biology. Annu Rev Biophys 41: 429-452. 

9. Lee EH, Hsin J, Sotomayor M, Comellas G, Schulten K (2009) Discovery through the 

computational microscope. Structure 17: 1295-1306. 

10. Schlick T, Collepardo-Guevara R, Halvorsen LA, Jung S, Xiao X (2011) 

Biomolecular modeling and simulation: a field coming of age. Q Rev Biophys 44: 191-228.

11. McGeagh JD, Ranaghan KE, Mulholland AJ (2011) Protein dynamics and enzyme 

catalysis: Insights from simulations. BBA-Proteins Proteom 1814: 1077-1092. 

12. Klepeis JL, Lindorff-Larsen K, Dror RO, Shaw DE (2009) Long-timescale molecular 

dynamics simulations of protein structure and function. Curr Opin Struct Biol 19: 

120-127. 

13. Barrozo A, Borstnar R, Marloie Gl, Kamerlin SCL (2012) Computational protein 

engineering: bridging the gap between rational design and laboratory evolution. Int 

J Mol Sci 13: 12428-12460. 

14. Frushicheva MP, Cao J, Warshel A (2011) Challenges and advances in validating 

enzyme design proposals: the case of kemp eliminase catalysis. Biochemistry 50: 

3849-3858. 

15. Frushicheva MP, Warshel A (2012) Towards quantitative computer-aided studies of 

enzymatic enantioselectivity: the case of Candida antarctica lipase A. Chembiochem 

13: 215-223. 

16. van der Kamp MW, Mulholland AJ (2008) Computational enzymology: insight into 

biological catalysts from modelling. Nat Prod Rep 25: 1001-1014. 

17. Turner NJ (2009) Directed evolution drives the next generation of biocatalysts. Nat 

Chem Biol 5: 567-573. 

18. Arnold FH, Moore JC (1997) Optimizing industrial enzymes by directed evolution. 

Adv Biochem Eng Biotechnol 58: 1-14. 

19. Tracewell CA, Arnold FH (2009) Directed enzyme evolution: climbing fitness peaks 

one amino acid at a time. Curr Opin Chem Biol 13: 3-9. 

20. Wong TS, Roccatano D, Schwaneberg U (2007) Steering directed protein evolution: 

strategies to manage combinatorial complexity of mutant libraries. Environ 

Microbiol 9: 2645-2659. 

21. Chica RA, Doucet N, Pelletier JN (2005) Semi-rational approaches to engineering 

enzyme activity: combining the benefits of directed evolution and rational design. 

Curr Opin Biotech 16: 378-384. 

22. Kazlauskas RJ, Bornscheuer UT (2009) Finding better protein engineering 

strategies. Nat Chem Biol 5: 526-529. 

23. Romero PA, Arnold FH (2009) Exploring protein fitness landscapes by directed 

evolution. Nat Rev Mol Cell Biol 10: 866-876. 

24. Tokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability. 

Curr Opin Struct Biol 19: 596-604. 

25. Wong TS, Roccatano D, Zacharias M, Schwaneberg U (2006) A statistical analysis of 

random mutagenesis methods used for directed protein evolution. J Mol Biol 355: 

858-871. 

26. Shivange AV, Marienhagen J, Mundhada H, Schenk A, Schwaneberg U (2009) 

Advances in generating functional diversity for directed protein evolution. Curr 

Opin Chem Biol 13: 19-25. 

27. Verma R, Schwaneberg U, Roccatano D (2012) MAP2.03D: a sequence/structure 

based server for protein engineering. ACS Synth Biol 1: 139-150. 

28. Firth AE, Patrick WM (2008) GLUE-IT and PEDEL-AA: new programmes for 

analyzing protein diversity in randomized libraries. Nucleic Acids Res 36: W281- 

W285. 

29. Patrick WM, Matsumura I (2008) A study in molecular contingency: glutamine 

phosphoribosylpyrophosphate amidotransferase is a promiscuous and evolvable 

phosphoribosylanthranilate isomerase. J Mol Biol 377: 323-336. 

30. Nov Y (2011) When second best is good enough: another probabilistic look at 

saturation mutagenesis. Appl Environ Microbiol 78: 258-262. 

31. Rasila TS, Pajunen MI, Savilahti H (2009) Critical evaluation of random mutagenesis 

by error-prone polymerase chain reaction protocols, Escherichia coli mutator strain, 

and hydroxylamine treatment. Anal Biochem 388: 71-80. 

32. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser H-G, et al. (2012) dRTP and 

dPTP a complementary nucleotide couple for the Sequence Saturation Mutagenesis 

(SeSaM) method. J Mol Catal B-Enzym 84: 40-47. 

33. Jmol: an open-source Java viewer for chemical structures in 3D. 

http://www.jmol.org/ 

34. Pei J (2008) Multiple protein sequence alignment. Curr Opin Struct Biol 18: 382-386. 

35. Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N (2010) ConSurf 2010: calculating 

evolutionary conservation in sequence and structure of proteins and nucleic acids. 

Nucleic Acids Res 38: W529-533. 

36. Goldenberg O, Erez E, Nimrod G, Ben-Tal N (2009) The ConSurf-DB: pre-calculated 

evolutionary conservation profiles of protein structures. Nucleic Acids Res 37: 

D323-D327. 

37. Kuipers RK, Joosten H-J, van Berkel WJH, Leferink NGH, Rooijen E, et al. (2010) 3DM: 

Systematic analysis of heterogeneous superfamily data to discover protein 

functionalities. Proteins 78: 2101-2113. 

38. Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A (2009) Joint 

Evolutionary Trees: a large-scale method to predict protein interfaces based on 

sequence sampling. PLoS Comput Biol 5: e1000267. 

39. Guney E, Tuncbag N, Keskin O, Gursoy A (2008) HotSprint: database of 

computational hot spots in protein interfaces. Nucleic Acids Res 36: D662-D666. 

40. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic 

tool for the identification of functional regions in proteins by surface mapping of 

evolutionary determinants within their homologues. Bioinformatics 18: S71-S77. 

41. Pavelka A, Chovancova E, Damborsky J (2009) HotSpot Wizard: a web server for 

identification of hot spots in protein engineering. Nucleic Acids Res 37: W376- 

W383. 

42. Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, et al. (2007) Selecton 

2007: advanced models for detecting positive and purifying selection using a 

Bayesian inference approach. Nucleic Acids Res 35: W506-W511. 

43. Pleiss J, Fischer M, Peiker M, Thiele C, Schmid RD (2000) Lipase Engineering Database: 

understanding and exploiting sequence-structure-function relationships. J Mol Catal 

B-Enzym 10: 491-508. 

44. Knoll M, Hamm TM, Wagner F, Martinez V, Pleiss J (2009) The PHA Depolymerase 

Engineering Database: a systematic analysis tool for the diverse family of 

polyhydroxyalkanoate (PHA) depolymerases. BMC Bioinformatics 10: 89. 

45. Sirim D, Wagner F, Wang L, Schmid RD, Pleiss J (2010) The Laccase Engineering 

Database: a classification and analysis system for laccases and related multicopper 

oxidases. Database 2011: bar006. 

46. Thai QK, Bos F, Pleiss J (2009) The Lactamase Engineering Database: a critical 

survey of TEM sequences in public databases. BMC Genomics 10: 390. 

47. Thai QK, Pleiss J (2010) SHV Lactamase Engineering Database: a reconciliation tool 

for SHV beta-lactamases in public databases. BMC Genomics 11: 563. 

48. Kawabata T, Ota M, Nishikawa K (1999) The Protein Mutant Database. Nucleic Acids 

Res 27: 355-357. 

49. Gromiha MM, Uedaira H, An J, Selvaraj S, Prabakaran P, et al. (2002) ProTherm, 

thermodynamic database for proteins and mutants: developments in version 3.0. 

Nucleic Acids Res 30: 301-302. 

50. Gromiha MM, An J, Kono H, Oobatake M, Uedaira H, et al. (2000) ProTherm, version 

2.0: thermodynamic database for proteins and mutants. Nucleic Acids Res 28: 283- 

285. 

51. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A (2004) ProTherm, version 4.0: 

thermodynamic database for proteins and mutants. Nucleic Acids Res 32: D120-121. 

52. Braun A, Halwachs B, Geier M, Weinhandl K, Guggemos M, et al. (2012) MuteinDB: 

the mutein database linking substrates, products and enzymatic reactions directly 

with genetic variants of enzymes. Database 2012. 

53. Kourist R, Jochens H, Bartsch S, Kuipers R, Padhi SK, et al. (2010) The alpha/beta-hydrolase 

fold 3DM database (ABHDB) as a tool for protein engineering. 

Chembiochem 11: 1635-1643. 

54. Fischer M, Pleiss J (2003) The Lipase Engineering Database: a navigation and 

analysis tool for protein families. Nucleic Acids Res 31: 319-321. 

55. Widmann M, Juhl PB, Pleiss J (2010) Structural classification by the Lipase 

Engineering Database: a case study of Candida antarctica lipase A. BMC Genomics 

11: 123. 

56. Barth S, Fischer M, Schmid RD, Pleiss J (2004) The database of epoxide hydrolases 

and haloalkane dehalogenases: one structure, many functions. Bioinformatics 20: 

2845-2847. 

57. Sirim D, Wagner F, Lisitsa A, Pleiss J (2009) The cytochrome P450 engineering 

database: Integration of biochemical properties. BMC Biochem 10: 27. 

58. Gong S, Worth CL, Bickerton GR, Lee S, Tanramluk D, et al. (2009) Structural and 

functional restraints in the evolution of protein families and superfamilies. Biochem 

Soc Trans 37: 727-733. 

59. Wass MN, Kelley LA, Sternberg MJ (2010) 3DLigandSite: predicting ligand-binding 

sites using similar structures. Nucleic Acids Res 38: W469-473. 

60. Konc J, Janezic D (2010) ProBiS algorithm for detection of structurally similar 

protein binding sites by local structural alignment. Bioinformatics 26: 1160-1168. 

61. Konc J, Janezic D (2012) ProBiS-2012: web server and web services for detection of 

structurally similar binding sites in proteins. Nucleic Acids Res 40: W214-221. 

62. Lin Y, Yoo S, Sanchez R (2012) SiteComp: a server for ligand binding site analysis in 

protein structures. Bioinformatics. 

63. Liang J, Tseng YY, Dundas J, Binkowski TA, Joachimiak A, et al. (2008) Predicting and 

characterizing protein functions through matching geometric and evolutionary 

patterns of binding surfaces. Adv Protein Chem Struct Biol 75: 107-141. 

64. Konc J, Cesnik T, Konc JT, Penca M, Janezic D (2012) ProBiS-database: precalculated 

binding site similarities and local pairwise alignments of PDB structures. J Chem Inf 

Model 52: 604-612. 

65. Prokop M, Damborsky J, Koca J (2000) TRITON: in silico construction of protein 

mutants and prediction of their activities. Bioinformatics 16: 845-846. 

66. Prokop M, Adam J, Kriz Z, Wimmerova M, Koca J (2008) TRITON: a graphical tool for 

ligand-binding protein engineering. Bioinformatics 24: 1955-1956. 

67. Sanchez-Ruiz JM (2010) Protein kinetic stability. Biophys Chem 148: 1-15. 

68. Tina KG, Bhadra R, Srinivasan N (2007) PIC: Protein Interactions Calculator. Nucleic 

Acids Res 35: W473-476. 

69. Vangone A, Spinelli R, Scarano V, Cavallo L, Oliva R (2011) COCOMAPS: a web 

application to analyse and visualize contacts at the interface of biomolecular 

complexes. Bioinformatics. 

70. Tan KP, Varadarajan R, Madhusudhan MS (2011) DEPTH: a web server to compute 

depth and predict small-molecule binding cavities in proteins. Nucleic Acids Res. 

71. Magyar C, Gromiha MM, Pujadas G, Tusnady GE, Simon I (2005) SRide: a server for 

identifying stabilizing residues in proteins. Nucleic Acids Res 33: W303-305. 

72. Shazman S, Celniker G, Haber O, Glaser F, Mandel-Gutfreund Y (2007) Patch Finder 

Plus (PFplus): a web server for extracting and displaying positive electrostatic 

patches on protein surfaces. Nucleic Acids Res 35: W526-W530. 

73. Choi YS, Han SK, Kim J, Yang J-S, Jeon J, et al. (2010) ConPlex: a server for the 

evolutionary conservation analysis of protein complex structures. Nucleic Acids Res 

38: W450-W456. 

74. Teilum K, Olsen JG, Kragelund BB (2011) Protein stability, flexibility and function. 

Biochim Biophys Acta 1814: 969-976. 

75. Teilum K, Olsen JG, Kragelund BB (2009) Functional aspects of protein flexibility. 

Cell Mol Life Sci 66: 2231-2247. 

76. Henzler-Wildman K, Kern D (2007) Dynamic personalities of proteins. Nature 450: 

964-972. 

77. Mittermaier AK, Kay LE (2009) Observing biological dynamics at atomic resolution 

using NMR. Trends Biochem Sci 34: 601-611. 

78. Martinez R, Schwaneberg U, Roccatano D (2011) Temperature effects on structure 

and dynamics of the psychrophilic protease subtilisin S41 and its thermostable 

mutants in solution. Protein Eng Des Sel 24: 533-544. 

79. Ma B, Nussinov R (2010) Enzyme dynamics point to stepwise conformational 

selection in catalysis. Curr Opin Chem Biol 14: 652-659. 

80. Zhang H, Zhang T, Chen K, Shen SY, Ruan JS, et al. (2009) On the relation between 

residue flexibility and local solvent accessibility in proteins. Proteins 76: 617-636. 

81. Lauck F, Smith CA, Friedland GF, Humphris EL, Kortemme T (2010) RosettaBackrub: a 

web server for flexible backbone protein structure modeling and design. Nucleic 

Acids Res 38: W569-W575. 

82. Mandell DJ, Kortemme T (2009) Backbone flexibility in computational protein 

design. Curr Opin Biotech 20: 420-428. 

83. Seeliger D, Haas J, de Groot BL (2007) Geometry-based sampling of conformational 

transitions in proteins. Structure 15: 1482-1492. 

84. Kuznetsov IB, McDuffie M (2008) FlexPred: a web-server for predicting residue 

positions involved in conformational switches in proteins. Bioinformation 3: 134- 

136. 

85. Bahar I, Lezon TR, Yang L-W, Eyal E (2010) Global dynamics of proteins: bridging 

between structure and function. Ann Rev Biophys 39: 23-42. 

86. Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology. 


87. Kamerlin SCL, Vicatos S, Dryga A, Warshel A (2011) Coarse-grained (multiscale) 

simulations in studies of biophysical and chemical systems. Annu Rev Phys Chem 

62: 41-64. 

88. Bahar I, Atilgan AR, Erman B (1997) Direct evaluation of thermal fluctuations in 

proteins using a single-parameter harmonic potential. Fold Des 2: 173-181. 

89. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, et al. (2001) Anisotropy of 

fluctuation dynamics of proteins with an elastic network model. Biophys J 80: 505- 

515. 

90. Skjaerven L, Hollup SM, Reuter N (2009) Normal mode analysis for proteins. J Mol 

Struc-Theochem 898: 42-48. 

91. Liu X, Karimi HA (2007) High-throughput modeling and analysis of protein 

structural dynamics. Brief Bioinform 8: 432-445. 

92. Suhre K, Sanejouand Y-H (2004) ElNemo: a normal mode web server for protein 

movement analysis and the generation of templates for molecular replacement. 

Nucleic Acids Res 32: W610-W614. 

93. Hollup S, Salensminde G, Reuter N (2005) WEBnm@: a web application for normal 

mode analyses of proteins. BMC Bioinformatics 6: 52. 

94. Camps J, Carrillo O, Emperador A, Orellana L, Hospital A, et al. (2009) FlexServ: an 

integrated tool for the analysis of protein flexibility. Bioinformatics 25: 1709-1710. 

95. Emekli U, Schneidman-Duhovny D, Wolfson HJ, Nussinov R, Haliloglu T (2008) 

HingeProt: automated prediction of hinges in protein structures. Proteins 70: 1219- 

1227. 

96. Hayward S, Berendsen HJC (1998) Systematic analysis of domain motions in 

proteins from conformational change: New results on citrate synthase and T4 

lysozyme. Proteins 30: 144-154. 

97. Qi GY, Hayward S (2009) Database of ligand-induced domain movements in 

enzymes. BMC Struct Biol 9. 

98. Poornam GP, Matsumoto A, Ishida H, Hayward S (2009) A method for the analysis of 

domain movements in large biomolecular complexes. Proteins 76: 201-212. 

99. Glowacki DR, Harvey JN, Mulholland AJ (2012) Taking Ockham's razor to enzyme 

dynamics and catalysis. Nature Chemistry 4: 169-176. 

100. Pisliakov AV, Cao J, Kamerlin SCL, Warshel A (2009) Enzyme millisecond 

conformational dynamics do not catalyze the chemical step. Proc Natl Acad Sci USA 

106: 17359-17364. 

101. Roca M, Vardi-Kilshtain A, Warshel A (2009) Toward accurate screening in 

computer-aided enzyme design. Biochemistry 48: 3046-3056. 

102. Gromiha MM, Pujadas G, Magyar C, Selvaraj S, Simon I (2004) Locating the 

stabilizing residues in (α/β)8 barrel proteins based on hydrophobicity, long-range 

interactions, and sequence conservation. Proteins 55: 316-329. 

103. Davis IW, Arendall WB, 3rd, Richardson DC, Richardson JS (2006) The backrub 

motion: how protein backbone shrugs when a sidechain dances. Structure 14: 265- 

274. 

104. Humphris EL, Kortemme T (2008) Prediction of Protein-Protein Interface Sequence 

Diversity Using Flexible Backbone Computational Protein Design. Structure 16: 

1777-1788. 

105. Kuznetsov IB (2008) Ordered conformational change in the protein backbone: 

prediction of conformationally variable positions from sequence and low-resolution 

structural data. Proteins 72: 74-87. 

106. Bloom JD, Arnold FH (2009) In the light of directed evolution: pathways of adaptive 

protein evolution. Proc Natl Acad Sci USA 106 Suppl 1: 9995-10000. 

107. Tokuriki N, Stricher F, Serrano L, Tawfik DS (2008) How protein stability and new 

functions trade off. PLoS Comput Biol 4: e1000002. 

108. Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in 

bioinformatics. Bioinformatics 23: 2507-2517. 

109. Capriotti E, Fariselli P, Casadio R (2005) I-Mutant2.0: predicting stability changes 

upon mutation from the protein sequence or structure. Nucleic Acids Res 33: W306- 

310. 

110. Cheng J, Randall A, Baldi P (2006) Prediction of protein stability changes for single-site 

mutations using support vector machines. Proteins 62: 1125-1132. 

111. Huang LT, Gromiha MM, Ho SY (2007) iPTREE-STAB: interpretable decision tree 

based method for predicting protein stability changes upon mutations. 

Bioinformatics 23: 1292-1293. 

112. Huang LT, Gromiha MM (2009) Reliable prediction of protein thermostability 

change upon double mutation from amino acid sequence. Bioinformatics 25: 2181- 

2187. 

113. Wainreb G, Wolf L, Ashkenazy H, Dehouck Y, Ben-Tal N (2011) Protein stability: a 

single recorded mutation aids in predicting the effects of other mutations in the 

same amino acid site. Bioinformatics 27: 3286-3292. 

114. Wainreb G, Ashkenazy H, Bromberg Y, Starovolsky-Shitrit A, Haliloglu T, et al. 

(2010) MuD: an interactive web server for the prediction of non-neutral 

substitutions using protein structural data. Nucleic Acids Res 38: W523-W528. 

115. Worth CL, Preissner R, Blundell TL (2011) SDM: a server for predicting effects of 

mutations on protein stability and malfunction. Nucleic Acids Res 39: W215-W222. 

116. Dehouck Y, Kwasigroch J, Gilis D, Rooman M (2011) PoPMuSiC 2.1: a web server for 

the estimation of protein stability changes upon mutation and sequence optimality. 

BMC Bioinformatics 12: 151. 

117. Van Durme J, Delgado J, Stricher F, Serrano L, Schymkowitz J, et al. (2011) A 

graphical interface for the FoldX forcefield. Bioinformatics 27: 1711-1712. 

118. Johnston MA, Søndergaard CR, Nielsen JE (2011) Integrated prediction of the effect 

of mutations on multiple protein characteristics. Proteins 79: 165-178. 

119. Parthiban V, Gromiha MM, Schomburg D (2006) CUPSAT: prediction of protein 

stability upon point mutations. Nucleic Acids Res 34: W239-242. 

120. Masso M, Vaisman II (2008) Accurate prediction of stability changes in protein 

mutants by combining machine learning with structure based computational 

mutagenesis. Bioinformatics 24: 2002-2009. 

121. Masso M, Vaisman II (2010) AUTO-MUTE: web-based tools for predicting stability 

changes in proteins due to single amino acid replacements. Protein Eng Des Sel 23: 

683-687. 

122. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous 

variants on protein function using the SIFT algorithm. Nat Protoc 4: 1073-1081. 

123. Adamczyk AJ, Cao J, Kamerlin SC, Warshel A (2011) Catalysis by dihydrofolate 

reductase and other enzymes arises from electrostatic preorganization, not 

conformational motions. Proc Natl Acad Sci USA 108: 14115-14120. 

124. Ishikita H, Warshel A (2008) Predicting drug-resistant mutations of HIV protease. 

Angew Chem Int Edit 47: 697-700. 

Part of this chapter is adapted with permission from ‘Verma R, Schwaneberg U, 

Roccatano D. Computational and Structural Biotechnology Journal 2012, 2 (3), 

e201209008.’ 

PART I: MAP 2.0 3D 

Chapter 2 

MAP 2.0 3D: A Sequence/Structure Based Server for 

Protein Engineering 

2.1. Abstract 

The Mutagenesis Assistant Program (MAP) is a web-based tool that provides statistical analyses of how the mutational biases of random mutagenesis methods used in directed evolution experiments affect amino acid substitution patterns. MAP analysis assists protein engineers in benchmarking random mutagenesis methods that generate a single nucleotide mutation in a codon. Herein, we describe a completely renewed and improved version of the MAP server, named MAP 2.0 3D, that correlates the generated amino acid substitution patterns with the structural information of the target protein. This correlation helps to select a more suitable random mutagenesis method with specific biases in amino acid substitution patterns. In particular, the new server represents MAP indicators on the secondary and tertiary structure, and correlates them with specific structural components such as hydrogen bonds, hydrophobic contacts, salt bridges, solvent accessibility and crystallographic B-factors. Three model proteins (D-amino acid oxidase, phytase and N-acetylneuraminic acid aldolase) are used to illustrate the novel capabilities of the server. The MAP 2.0 3D server is publicly available at http://map.jacobs-university.de/map3d.html. 


2.2. Introduction 

Over the past two decades, directed protein evolution has proven to be a powerful algorithm to tailor protein properties through iterative rounds of random mutagenesis and screening for improved protein variants.[1,2] Directed evolution methods are especially useful for improving properties that are difficult to rationalize and, hence, for identifying amino acids and protein regions that can guide further enhancements using site-directed and saturation mutagenesis methods.[3,4] The success of a directed evolution campaign depends strongly on the quality of the mutant library and on the random mutagenesis method employed. Random mutagenesis methods are based on error-prone polymerases (enzymatic methods), DNA-modifying chemicals (e.g. nitrous acid) or mutator strains (e.g. Escherichia coli mutA).[5] The quality of a mutant library is determined by the generated genetic diversity and the corresponding protein sequence space.[6] Since the number of possible muteins grows steeply with the number of amino acids exchanged in the protein, protein engineers face an astronomically vast sequence space.[7] Despite advances in high-throughput screening, it is very difficult to screen the theoretically generated diversity even for a small protein.[8,9] Therefore, generating high-quality mutant libraries enriched in functional variants is of high importance. To deal with the challenge of accessing and screening such a large sequence space, protein engineers usually adopt one of two strategies.[10-12] The first consists of random mutagenesis of the target protein and subsequent identification of ‘mutagenic hot spots’. Random mutagenesis can be followed by recombination of the best variants by site-directed mutagenesis or saturation mutagenesis.[13] The second involves the identification of a subset of specific residues using rational or semi-rational design with the help of computational tools.[14] Up to five amino acid positions can be efficiently targeted with focused mutagenesis methods, which yield focused mutant libraries small enough to be screened with state-of-the-art flow cytometry methods.[13] Focused mutagenesis is normally employed to improve properties of the target protein, such as activity or selectivity, by mutating residues in close proximity to a specific protein region like the active site. In this case, random mutagenesis methods are complementary to rational design since they can identify important amino acid positions, especially in the second and third coordination sphere, that would have been overlooked by rational design. Nevertheless, random mutagenesis methods are biased toward certain nucleotide exchanges (e.g. many epPCR methods prefer transition mutations). The mutagenic preferences of biased random mutagenesis methods affect the generated diversity. Analyzing the effects of mutational bias on amino acid diversity therefore provides a useful indicator for selecting mutagenesis methods with diverse and complementary amino acid substitution patterns. The resulting complementary mutant libraries extend the sampling of the vast protein sequence space and enhance the chance of obtaining improved variants.[15,16] 
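
The scale of the sequence space discussed above can be made concrete with a short calculation. The sketch below is illustrative only: it counts distinct protein variants, assumes that all 19 alternative amino acids are allowed at every substituted position, and ignores codon redundancy. The function names and the protein length of 300 residues are hypothetical choices, not values taken from this work.

```python
from math import comb


def variants_with_k_substitutions(length: int, k: int) -> int:
    """Distinct protein variants carrying exactly k amino acid substitutions."""
    # Choose which k positions change, then one of 19 alternatives at each.
    return comb(length, k) * 19 ** k


def focused_library_size(positions: int) -> int:
    """Protein variants when `positions` sites are fully saturated (20 options each)."""
    return 20 ** positions


if __name__ == "__main__":
    # Hypothetical 300-residue protein.
    for k in (1, 2, 3):
        print(k, variants_with_k_substitutions(300, k))
    # Five simultaneously saturated positions, as in focused mutagenesis.
    print(focused_library_size(5))
```

Under these assumptions a 300-residue protein has 5,700 single and about 1.6 × 10^7 double substitution variants, whereas saturating five chosen positions gives 3.2 × 10^6 protein variants.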

Recently, we introduced a freely available web-based statistical analysis tool (MAP[17]). The server statistically analyzes the effects of the mutational bias of 19 different random mutagenesis methods at the level of amino acid substitutions for a given nucleotide sequence of the target protein. The analysis is returned in terms of MAP indicators that allow a rapid comparison of different random mutagenesis methods at the protein level. It has been shown that this approach can be used to predict the type, extent, and chemical nature of the genetic diversity generated by different mutagenesis methods.[17,18] Recently, Rasila and co-workers[19] reported a comparative evaluation of commonly used random mutagenesis methods. They found the experimentally induced substitution patterns to be very similar to those obtained with the MAP server and suggested combining mutagenesis methods to generate high diversity.[19] 

One of the limitations of the original MAP server is the absence of analysis tools relating the MAP indicators to the structural properties of the target protein. The nature of an amino acid change in different regions of the protein can affect its global and local structural and thermodynamic properties.[14,20,21] Therefore, the possibility to correlate the generated diversity with structural properties helps to identify in advance the random mutagenesis method that introduces the fewest “deleterious” mutations with respect to protein stability and has the highest probability of introducing amino acid substitutions that may improve the fitness toward a desired property, e.g. substitutions to charged amino acid residues to increase solubility in water. For this reason, we have expanded the capabilities of the server by introducing these new features. The new server (MAP 2.0 3D) can correlate the mutational propensity at the amino acid level of a gene for 19 random mutagenesis methods (and now also for a user-customized random mutagenesis method) with the crystallographic or homology-modeled structure of the target protein (if available in Protein Data Bank[22] format). MAP 2.0 3D analyses the three-dimensional structure of the target protein by calculating secondary structure elements, important local interactions (such as hydrogen bonds, hydrophobic contacts, salt bridges and disulphide bridges), solvent accessibility, and amino acid mobilities from the crystallographic B-factors. This combined information helps to identify biased amino acid substitutions that may improve the stability and function of the protein.[23-25] 

To correlate the sequence-based analysis with the structural data analysis, a new indicator, the residue mutability indicator ‘µ’ (the probability that a substitution leads to an amino acid change at a specific position), has been introduced (see Methods). The mutability indicator allows rapid identification of mutagenic hot spots and easier comparison of experimental data with predictions. 

This chapter illustrates the new features of the MAP 2.0 3D server in detail by performing the analysis on three model proteins. The results of the MAP 2.0 3D analysis are compared with the results of protein engineering experiments reported in the literature. The three examples show possible uses of the server for computational pre-screening of the target protein in order to evaluate and select a mutagenesis method for directed evolution. 

2.3. Methods 

2.3.1. Mutational probability and statistics 


The MAP 2.0 3D server performs a statistical analysis on a given nucleotide sequence based on the mutational spectra of different random mutagenesis methods, which were slightly processed as follows before being used in the analysis.[17] First, insertions and deletions, with an occurrence frequency between 0.80 % and 13.9 %, were neglected and the remaining nucleotide substitution frequencies were scaled proportionally to 100 %. Second, mutations in the upper and lower DNA strands were considered to occur with equal frequency. The scaled mutational frequencies are used to calculate the probability of amino acid substitutions resulting from one nucleotide exchange in one codon of the gene. The analysis is performed as follows. Consider a gene coding for a protein of L amino acids. For each nucleotide of a codon (named X, Y, Z) in the gene sequence, the corresponding single nucleotide substitutions X′, Y′, Z′ (with X, Y, Z, X′, Y′, Z′ ∈ {A, T, G, C} and X′ ≠ X, Y′ ≠ Y, Z′ ≠ Z) are considered. For each of the 19 random mutagenesis methods, matrix P (equation 2.1) gives the 16 mutational probability values for the substitution of a given nucleotide by another (e.g. X → X′); the diagonal elements (e.g. A → A) correspond to unchanged nucleotides. The values of matrix P are reported in Table 1 of our previous publication.[17] 

\[
P = \begin{pmatrix}
A \to A & A \to T & A \to G & A \to C \\
T \to A & T \to T & T \to G & T \to C \\
G \to A & G \to T & G \to G & G \to C \\
C \to A & C \to T & C \to G & C \to C
\end{pmatrix}
\tag{2.1}
\]
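
The two processing steps applied to the mutational spectra (dropping insertions and deletions, then rescaling to 100 %, with both strands treated equally) can be sketched as follows. This is a minimal illustration and not the server's code: the function name `build_matrix`, the input format and the numbers in `RAW` are hypothetical, and the way strand equivalence is enforced here (pooling each substitution with its complementary counterpart and sharing the total equally) is an assumption, since the text does not specify the exact procedure.

```python
BASES = "ATGC"  # row/column order of matrix P in equation 2.1
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def build_matrix(raw_substitutions):
    """Return P[original][mutated] in percent from reported substitution frequencies.

    raw_substitutions maps (original, mutated) to a frequency; insertions and
    deletions are neglected simply by not being part of the input.
    """
    pooled = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                continue
            # Assumption: X->Y on one strand is equivalent to the complementary
            # substitution on the other strand, so both share their total equally.
            partner = (COMPLEMENT[x], COMPLEMENT[y])
            total = raw_substitutions.get((x, y), 0.0) + raw_substitutions.get(partner, 0.0)
            pooled[(x, y)] = total / 2.0
    grand_total = sum(pooled.values())
    if grand_total == 0:
        raise ValueError("no substitution frequencies given")
    # Rescale proportionally so that all substitutions sum to 100 %.
    scale = 100.0 / grand_total
    return {x: {y: pooled.get((x, y), 0.0) * scale for y in BASES} for x in BASES}


# Made-up spectrum for illustration (percent of all observed mutations); the
# missing 10 % would be insertions/deletions.
RAW = {("A", "G"): 40.0, ("T", "C"): 20.0, ("G", "A"): 30.0}
P = build_matrix(RAW)
```

With this made-up input, A → G and T → C end up with the same value, and the twelve off-diagonal elements of the matrix sum to 100 %.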

In equation 2.2, the binary vectors U and V are used to select a given probability (f) from the matrix P. The four elements of U and V correspond to the nucleotides (A, T, G, C), one of which is selected by setting its element to one and the others to zero. U selects the original nucleotide and V the mutated one. Equation 2.2 gives an example in which matrix P holds the values for the epPCR method with Taq polymerase (unbalanced dNTPs). In this example, the mutational probability for the substitution A → T is f = 9.7 %. 


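\[
f = U P V^{T} =
\begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix}
0.0 & 9.70 & 19.34 & 16.14 \\
9.70 & 0.0 & 16.14 & 19.34 \\
4.82 & 0.0 & 0.0 & 0.0 \\
0.0 & 4.82 & 0.0 & 0.0
\end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
= 9.7
\tag{2.2}
\]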
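
The selection of a single element of P by the two binary vectors can be reproduced in a few lines. The following is a minimal sketch, not the server's implementation; it uses the example matrix quoted in equation 2.2 with the nucleotide order (A, T, G, C), and the helper names `one_hot` and `mutational_probability` are hypothetical.

```python
BASES = "ATGC"  # order of rows (original) and columns (mutated) in matrix P

# Example values from equation 2.2 (epPCR, Taq polymerase, unbalanced dNTPs), in percent.
P_TAQ = [
    [0.0, 9.70, 19.34, 16.14],  # A -> A, T, G, C
    [9.70, 0.0, 16.14, 19.34],  # T -> A, T, G, C
    [4.82, 0.0, 0.0, 0.0],      # G -> A, T, G, C
    [0.0, 4.82, 0.0, 0.0],      # C -> A, T, G, C
]


def one_hot(base):
    """Binary selection vector with a one at the position of `base`."""
    return [1 if b == base else 0 for b in BASES]


def mutational_probability(matrix, original, mutated):
    """Evaluate f = U P V^T, with U selecting the original and V the mutated nucleotide."""
    u = one_hot(original)
    v = one_hot(mutated)
    # Row vector U times matrix P ...
    up = [sum(u[r] * matrix[r][c] for r in range(4)) for c in range(4)]
    # ... times column vector V^T.
    return sum(up[c] * v[c] for c in range(4))


f_a_to_t = mutational_probability(P_TAQ, "A", "T")
```

In practice the same value is obtained by indexing the matrix directly; the vector form only mirrors the notation of the equation.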

By applying this procedure to each single nucleotide substitution in the codon, nine probability values (three for each nucleotide) are obtained. Each of these values gives the k-th mutational probability $f^{k}_{\alpha\to\beta}(i)$ that changes the i-th amino acid (α), expressed by the native codon, into the one (β, which also comprises the stop codon) expressed by the mutated codon (e.g. X,Y,Z → X′,Y,Z). The nine probabilities $f^{k}_{\alpha\to\beta}(i)$ are therefore summed to obtain the normalization factor ($N_i$) for the i-th residue of the protein sequence:

\[
N_i = \sum_{k=1}^{9} f^{k}_{\alpha\to\beta}(i)
\tag{2.3}
\]

Hence, the normalized probability for the substitution of amino acid α → β is given by

\[
\varphi^{k}_{\alpha\to\beta}(i) = \frac{f^{k}_{\alpha\to\beta}(i)}{N_i}
\tag{2.4}
\]
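
The codon-level procedure (nine single nucleotide substitutions, their sum N_i and the normalized probabilities) can be sketched as below. This is an illustration under stated assumptions rather than the server's code: it uses the standard genetic code, the example Taq matrix quoted in equation 2.2, and a hypothetical function name `normalized_substitutions`. Substitutions that yield the same amino acid β are added together, which is an interpretation of how the per-residue probability of α → β is obtained.

```python
BASES = "ATGC"

# Example matrix from equation 2.2, P[original][mutated] in percent.
P_TAQ = {
    "A": {"A": 0.0, "T": 9.70, "G": 19.34, "C": 16.14},
    "T": {"A": 9.70, "T": 0.0, "G": 16.14, "C": 19.34},
    "G": {"A": 4.82, "T": 0.0, "G": 0.0, "C": 0.0},
    "C": {"A": 0.0, "T": 4.82, "G": 0.0, "C": 0.0},
}

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
_ORDER = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_ORDER)
    for j, b in enumerate(_ORDER)
    for k, c in enumerate(_ORDER)
}


def normalized_substitutions(codon, matrix):
    """Return (N_i, phi) for one codon; phi maps each product beta to its probability."""
    raw = {}
    for pos, original in enumerate(codon):
        for mutated in BASES:
            if mutated == original:
                continue  # only the nine true substitutions are considered
            new_codon = codon[:pos] + mutated + codon[pos + 1:]
            beta = CODON_TABLE[new_codon]
            # Substitutions giving the same beta are pooled (interpretation).
            raw[beta] = raw.get(beta, 0.0) + matrix[original][mutated]
    n_i = sum(raw.values())  # normalization factor, equation 2.3
    if n_i == 0:
        return 0.0, {beta: 0.0 for beta in raw}
    return n_i, {beta: f / n_i for beta, f in raw.items()}  # equation 2.4


n_met, phi_met = normalized_substitutions("ATG", P_TAQ)
```

For the methionine codon ATG, the nine values sum to N = 95.18, and leucine is reached through two different substitutions (TTG and CTG).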

2.3.2. MAP indicators 

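Three indicators (the protein structure indicator, the amino acid diversity indicator and the chemical diversity indicator) are used to summarize the characteristics of a random mutagenesis method for the target gene at the amino acid level. The amino acid diversity for the substitution of amino acid α → β in a protein sequence of length L is calculated by averaging over the sequence the normalized probabilities of equation 2.4 (understood as summed over those of the nine substitutions k that yield β): 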


\[
\Delta_{\alpha\to\beta} = \frac{1}{L} \sum_{i=1}^{L} \varphi_{\alpha\to\beta}(i)
\tag{2.5}
\]

The amino acid diversities are summed together to calculate the values for MAP indicators. 

\[
I_{\alpha\to S} = \sum_{r=1}^{r'} \Delta_{\alpha\to\beta(r)}
\tag{2.6}
\]

where S denotes a subset of amino acids or stop codons and r´ is the number of elements in that subset. The chemical diversity indicator quantifies the chemical diversity generated by the random mutagenesis method. For this indicator, S is one of the following subsets of amino acids: charged (D, E, H, K, R; S = ch), neutral (C, M, S, P, T, N, Q; S = ne), aromatic (F, Y, W; S = ar) and aliphatic (G, A, V, L, I; S = al). For example, Iα→ch, the total probability that a given amino acid α is substituted by a charged amino acid (ch), is calculated by summing Δα→β(r), where the substituted residue β(r) can be any one of the charged residues (E, D, R, K and H), i.e. r´ = 5. The protein structure indicator gives the fraction of single nucleotide substitutions that result in protein structure/function-disrupting (stop codons; S = st) and likely destabilizing (glycine or proline; S = gp) amino acid substitutions. Finally, the amino acid diversity indicator measures the fraction of variants with preserved amino acid substitutions (S = pr) and the average number of amino acid substitutions per residue. It is complemented by the codon diversity coefficient, which measures the distribution of random mutations among the codons of the gene.
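As a rough illustration of equations 2.5 and 2.6, the following Python sketch averages per-residue substitution probabilities over a sequence and sums them over a subset S. The data layout (a list of wild-type residues with their φ values), the convention that positions with a different wild-type residue contribute zero, and the toy numbers are assumptions made for this example only.

```python
# Amino acid subsets as defined in the text ('*' marks a stop codon).
SUBSETS = {
    "ch": set("DEHKR"),
    "ne": set("CMSPTNQ"),
    "ar": set("FYW"),
    "al": set("GAVLI"),
    "gp": set("GP"),
    "st": {"*"},
}


def amino_acid_diversity(phi_per_residue, alpha, beta):
    """Equation 2.5: average of phi(i) for alpha -> beta over the sequence.

    phi_per_residue is a list of (wild-type residue, {target: probability}).
    Positions whose wild-type residue is not alpha contribute zero
    (an assumption of this sketch).
    """
    length = len(phi_per_residue)
    total = sum(phi.get(beta, 0.0)
                for residue, phi in phi_per_residue if residue == alpha)
    return total / length


def map_indicator(phi_per_residue, alpha, subset):
    """Equation 2.6: sum the diversities over all members beta(r) of S."""
    return sum(amino_acid_diversity(phi_per_residue, alpha, beta)
               for beta in SUBSETS[subset])


# Hypothetical four-residue protein with made-up probabilities.
toy = [
    ("E", {"G": 0.5, "D": 0.25, "K": 0.25}),
    ("E", {"G": 0.25, "E": 0.75}),
    ("Y", {"H": 1.0}),
    ("A", {"A": 1.0}),
]
delta_e_to_g = amino_acid_diversity(toy, "E", "G")
i_e_to_ch = map_indicator(toy, "E", "ch")
i_e_to_gp = map_indicator(toy, "E", "gp")
```

Here `SUBSETS["ch"]` has five members, matching r´ = 5 in the charged example given above.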

2.3.3. Local chemical diversity and protein structure components 

Two new sequence based indicators are introduced with the MAP 2.0 3D server to complement the structural analysis of single amino acids. The probability that a substitution at the i-th amino acid (α) leads to an amino acid (β) from a group S of residues whose side chains share the same chemical nature is calculated by

$$\delta(i)^{\alpha \to S} = \sum_{r=1}^{r'} \varphi(i)^{\alpha \to \beta(r)} \tag{2.7}$$

where S and r´ represent the amino acid group and the number of its members, respectively (as described for equation 2.6). The amino acid mutability of the i-th amino acid (a special case of equation 2.7 with r´ = 1) is given by

$$\mu(i) = 1 - \varphi(i)^{\alpha \to \alpha} \tag{2.8}$$

where φ(i)α→α is the normalized probability that a substitution does not lead to an amino acid change (α → α) at the i-th residue. The local structural environment of an amino acid residue influences the acceptance of amino acid substitutions.[23,24] The local structural environment of the protein comprises the secondary structure element, the residue flexibility and the solvent accessibility. Intra-protein interactions contribute to defining the secondary structure elements and the residue flexibility in a target protein and help to understand the molecular basis of the stability and activity of the protein.[26] To illustrate the effect of the generated chemical diversity on the protein structural environment, these factors are mapped onto the amino acid substitution patterns.
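The two per-residue indicators of equations 2.7 and 2.8 reduce to a few lines of Python. In this sketch the substitution probabilities of a single tyrosine residue are invented for illustration, and the function names are hypothetical.

```python
# Chemical groups as defined for equation 2.6.
CHEMICAL_GROUPS = {
    "ch": set("DEHKR"),
    "ne": set("CMSPTNQ"),
    "ar": set("FYW"),
    "al": set("GAVLI"),
}


def local_chemical_diversity(phi_i, group):
    """Equation 2.7: sum phi(i) over all members of the chosen group."""
    return sum(phi_i.get(beta, 0.0) for beta in CHEMICAL_GROUPS[group])


def mutability(phi_i, wild_type):
    """Equation 2.8: one minus the probability of no amino acid change."""
    return 1.0 - phi_i.get(wild_type, 0.0)


# Made-up probabilities for one tyrosine residue ('*' = stop codon).
phi_tyr = {"Y": 0.1, "H": 0.2, "D": 0.1, "N": 0.2,
           "S": 0.1, "C": 0.1, "F": 0.1, "*": 0.1}

delta_ch = local_chemical_diversity(phi_tyr, "ch")   # H + D
delta_ne = local_chemical_diversity(phi_tyr, "ne")   # N + S + C
mu = mutability(phi_tyr, "Y")
```

In this toy case the residue would count as highly mutable, since only a small share of the substitutions leaves tyrosine unchanged.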

The secondary structure elements are derived using DSSP[27], while the Relative Solvent Accessibility (RSA) is calculated as the number of water molecules in contact with the residue[27] divided by the total surface area of the residue.[28] A threshold value of 0.16 is used to differentiate between exposed (RSA >= 0.16) and buried residues (RSA < 0.16). Crystallographic B-factors are used as indicators of the residue flexibility.[29] The B-factors of the Cα atoms are normalized by

$$B' = \frac{B - \langle B \rangle}{\sigma} \tag{2.9}$$

where ‹B› is the average B-factor of the Cα atoms (after omitting the first and last 3 residues) and σ is the standard deviation.[30] The relative B-factor values after normalization are employed to differentiate between flexible and rigid residues.[31]
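The normalization of equation 2.9 and the RSA threshold can be sketched in Python as follows. The text does not say whether the population or the sample standard deviation is used, nor whether the trimmed terminal residues also receive a normalized value. This sketch assumes the population standard deviation and normalizes every residue, and the B-factor values are invented.

```python
from statistics import mean, pstdev

RSA_THRESHOLD = 0.16  # threshold quoted in the text


def normalize_b_factors(b_factors, trim=3):
    """Equation 2.9 for a list of C-alpha B-factors (one per residue).

    The mean and standard deviation are computed after omitting the first
    and last `trim` residues. Applying them to every residue and using the
    population standard deviation are assumptions of this sketch.
    """
    core = b_factors[trim:len(b_factors) - trim]
    if len(core) < 2:
        raise ValueError("need at least two residues after trimming")
    b_mean, sigma = mean(core), pstdev(core)
    if sigma == 0:
        raise ValueError("B-factors are constant; cannot normalize")
    return [(b - b_mean) / sigma for b in b_factors]


def classify_exposure(rsa):
    """Label a residue as exposed (RSA >= 0.16) or buried (RSA < 0.16)."""
    return "exposed" if rsa >= RSA_THRESHOLD else "buried"


# Invented B-factors: three terminal residues on each side are ignored
# when the mean and standard deviation are estimated.
b_values = [50, 50, 50, 10, 20, 30, 40, 50, 50, 50]
b_norm = normalize_b_factors(b_values)
```

The cutoff on B´ that separates flexible from rigid residues is not given in this section, so it is deliberately left out of the sketch.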

Finally, the new server calculates the following intra-protein interactions from the crystallographic protein structure, using criteria reported in the literature: disulphide bonds,[27] salt bridges,[32] hydrophobic interactions,[33] aromatic interactions[34] and side chain hydrogen bonds.[35] The default parameters for the calculation of molecular interactions are taken from the widely accepted primary literature and can be modified by the user.

2.3.4. MAP 2.0 3D server description 

A MAP 2.0 3D analysis is performed on a gene sequence together with the 3D coordinates of the target protein, for one random mutagenesis method at a time. Figure 2.1 shows the query interface of the server available at http://map.jacobs-university.de/map3d.html.

The server accepts the gene sequence in commonly used sequence formats (FASTA, GenBank, GCG) or as a raw sequence. The 3D coordinates are accepted in PDB file format[36]. The protein sequence obtained by translating the gene sequence is aligned with the protein sequence extracted from the protein coordinates using the Smith-Waterman algorithm[37] for local sequence alignment. For the complete analysis, the sequences should have a sufficient identity (default >= 70 %). For files containing multiple protein chains, the analysis is performed on the first chain or on a chain defined by the user. The analysis is performed for a user-selected mutagenesis method (chosen from the MAP library of commonly used methods or, as a feature of the server, defined by directly entering the values of the transformation probability matrix P). By default the results include the analysis of all residues. This can be changed by selecting a predefined group of amino acids (charged, neutral, aromatic or aliphatic or, according to their relative solvent accessibility, exposed or buried) or by providing a set of amino acid residues, which can be extended to residues within a given range (in Å) of the given set of amino acids. Finally, the advanced user interface section allows the parameters used for the calculation of molecular interactions to be changed.
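The alignment and identity check described above can be pictured with a compact Smith-Waterman sketch in Python. The scoring scheme (match +2, mismatch -1, linear gap penalty -2) and the definition of identity as matches divided by the aligned length are assumptions of this example. The scoring matrix and identity definition used by the server are not stated here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (aligned a, aligned b, score) of the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if h[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), best


def identity(aligned_a, aligned_b):
    """Fraction of aligned columns with identical residues (gaps excluded)."""
    if not aligned_a:
        return 0.0
    matches = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return matches / len(aligned_a)


# Hypothetical translated gene sequence vs. sequence from the coordinates.
translated = "ACDEFGHIK"
from_structure = "ACDEYGHIK"
al_a, al_b, score = smith_waterman(translated, from_structure)
passes_default = identity(al_a, al_b) >= 0.70
```

A local alignment is a natural choice in this setting because coordinate files often lack terminal or disordered residues that are present in the gene sequence.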

Figure 2.1: Query interface for MAP 2.0 3D. Black boxes show the two ways to query the server: (1) sequence based analysis, which takes a nucleotide sequence as input (red box), and (2) structure based analysis, which takes protein coordinates (crystallographic structure or homology model), a nucleotide sequence and a random mutagenesis method as input (red boxes). The options given in the green boxes can be used to customize the query: (1) 19 commonly used mutagenesis methods are included in the server by default, and a new method can be added by defining its mutational spectrum; (2) selection of the chain in the case of a multi-chain protein; (3) restriction of the search to a group of amino acids, either by selecting predefined groups based on (a) the chemical property of their side chain (charged, neutral, aromatic or aliphatic) or (b) the solvent accessible area (buried or exposed), or by (c) giving a set of amino acids and a cutoff (in Å) so that residues within the defined range of the given residues are included in the analysis; and (4) altering the thresholds used for the calculation of molecular interactions.
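The distance-based extension of a residue selection (option 3c) can be pictured with a short Python sketch. It assumes that a residue is included when any of its atoms lies within the cutoff of any atom of a selected residue. The distance criterion actually used by the server is not specified here, and the coordinates and residue labels below are made up for illustration.

```python
import math


def extend_selection(coords, selected, cutoff):
    """Return the selected residues plus all residues within `cutoff` Å.

    coords maps a residue label to a list of (x, y, z) atom coordinates.
    The any-atom-to-any-atom criterion is an assumption of this sketch.
    """
    chosen = set(selected)
    for residue, atoms in coords.items():
        if residue in chosen:
            continue
        near = any(math.dist(a, b) <= cutoff
                   for s in selected
                   for a in atoms
                   for b in coords[s])
        if near:
            chosen.add(residue)
    return sorted(chosen)


# Hypothetical coordinates (in Å) for three residues.
toy_coords = {
    "Y223": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "R285": [(4.0, 0.0, 0.0)],
    "A100": [(20.0, 0.0, 0.0)],
}
nearby = extend_selection(toy_coords, ["Y223"], cutoff=3.0)
```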

2.3.5. MAP 2.0 3D output 

Along with the sequence based MAP analysis indicators, the indicators implemented in MAP 2.0 3D correlate the amino acid substitution patterns generated by random mutagenesis methods with the protein structure (visualized using the Jmol applet, http://www.jmol.org/). They include a residue mutability indicator and take secondary structure elements, residue flexibility, relative solvent accessibility and intra-protein interactions into account (see above). The results can be downloaded in text format for further use. The modified coordinate files (with amino acid substitution probabilities) are also available for download in PDB format.

2.3.6. Model proteins 

The enzymes selected for the analysis by MAP 2.0 3D are: 1) D-amino acid oxidase from Rhodotorula gracilis (EC: 1.4.3.1; EMBL-Bank: AAB93974.1[36]; PDB Id: 1C0I[37]), 2) phytase from Escherichia coli (EC: 3.1.3.2; EMBL-Bank: AY496073.1[38]; PDB Id: 1DKP[39]) and 3) N-acetylneuraminic acid aldolase from Escherichia coli (EC: 4.1.3.3; EMBL-Bank: X03345.1[40]; PDB Id: 1NAL[41]). The sequence composition of the enzymes is: 1) D-amino acid oxidase (1107 bases: A 19.96 %; T 17.52 %; G 31.17 %; C 31.35 %; 369 residues), 2) phytase (1299 bases: A 24.25 %; T 22.09 %; G 27.25 %; C 26.40 %; 433 residues) and 3) N-acetylneuraminic acid aldolase (894 bases: A 24.38 %; T 23.60 %; G 27.07 %; C 24.94 %; 298 residues). The secondary structure content of the enzymes is: 1) D-amino acid oxidase (30 % helical, 28 % beta sheet), 2) phytase (42 % helical, 15 % beta sheet) and 3) N-acetylneuraminic acid aldolase (50 % helical, 13 % beta sheet).

2.4. Results and discussions 

The use of the MAP 2.0 3D server is illustrated by analyzing three different enzymes that were evolved for different properties by directed protein evolution. The first example describes how to decrease the effects of mutational bias and to generate a mutant library with a higher fraction of active clones. The second and third examples show how the server can be used to analyze the influence of mutational preferences on the evolution of a desired property. Outputs of the complete MAP 2.0 3D analysis are provided as examples in the instruction link of the server (http://map.jacobs-university.de/instruction.html).

2.4.1. D-amino acid oxidase 

D-amino acid oxidase (DAAO) is a flavin adenine dinucleotide (FAD) dependent flavoenzyme. DAAO catalyses the dehydrogenation of D-amino acids to the corresponding α-keto acids, producing ammonia and hydrogen peroxide.[42,43] The high turnover rate, the stable FAD binding and the broad substrate specificity of DAAO from Rhodotorula gracilis (RgDAAO) make it an attractive catalyst for biotechnological applications such as biosensing (i.e. the rapid and reliable detection of the D-amino acid content in food specimens or of the neurotransmitter D-serine in the brain).[43] We performed a MAP 2.0 3D analysis on RgDAAO to evaluate the amino acid diversity generated by random mutagenesis methods.

Table 2.1: Summary of the MAP 2.0 3D analysis for the oxidase, the phytase and the aldolase, targeting different epPCR methods for random mutagenesis.

| Indicator | RgDAAO (1st round) | RgDAAO (2nd round) | Phytase | N-acetylneuraminic acid aldolase |
|---|---|---|---|---|
| epPCR method | Taq (+, G=A=C=T) | Taq (+, G=A, C=T) | Taq (+, G=A, C=T) | Taq (+, G=A=C=T) |
| Average amino acid substitution (a) | 7.40 | 7.40 | 7.45 | 7.20 |
| Preserved amino acid substitution (b) | 24.53 % | 23.38 % | 25.40 % | 28.47 % |
| Codon diversity coefficient (c) | 42.48 | 34.04 | 36.49 | 43.70 |
| Stop codons frequency (d) | 2.30 % | 4.38 % | 4.69 % | 2.12 % |
| Gly/Pro frequency (e) | 20.58 % | 13.23 % | 11.60 % | 16.26 % |
| Charged amino acid diversity (f) | -0.34 % (25.00 %) | -2.62 % (25.00 %) | 1.39 % (19.21 %) | 5.00 % (22.22 %) |
| Neutral amino acid diversity (g) | 3.37 % (27.99 %) | 4.47 % (27.99 %) | -4.14 % (35.65 %) | 1.73 % (26.94 %) |
| Aromatic amino acid diversity (h) | -3.19 % (7.34 %) | -0.23 % (7.34 %) | 0.91 % (5.79 %) | -3.14 % (8.08 %) |
| Aliphatic amino acid diversity (i) | -2.13 % (39.67 %) | -6.00 % (39.67 %) | -2.86 % (39.35 %) | -5.72 % (42.76 %) |

(a) average number of amino acid substitutions per residue; (b) Iα→pr: fraction of variants with preserved amino acid substitutions; (c) codon diversity coefficient; (d) Iα→st: fraction of variants with stop codons; (e) Iα→gp: fraction of variants with Gly/Pro. The chemical diversity generated by the mutagenesis methods is presented as (f) Iα→ch: charged, (g) Iα→ne: neutral, (h) Iα→ar: aromatic and (i) Iα→al: aliphatic amino acid diversity, given as the deviation from the amino acid composition of the target protein sequence after mutagenesis, with that composition in parentheses.

MAP 2.0 3D analysis 

The sequence based MAP 2.0 3D analysis was performed using the following 

descriptors: i) protein structure indicators, ii) amino acid diversity indicator with codon 

diversity coefficient and iii) chemical diversity indicator. 

Figure 2.2: Statistical analysis of stop codon frequencies (a) and Gly/Pro substitutions (b) for RgDAAO. The random mutagenesis methods enclosed in the black rectangles (epPCR (Taq (MnCl2, G=A=C=T)) and epPCR (Taq (MnCl2, G=A, C=T))) are used for the MAP 2.0 3D analysis.

In Figure 2.2, the values of the stop codon indicator (Iα→st) and the Gly/Pro indicator (Iα→gp) for different random mutagenesis methods are reported. The methods show opposite trends in the generation of stop codons (sequence truncation) and Gly/Pro (α-helix destabilizers), i.e. the higher the stop codon frequency, the lower the frequency of Gly/Pro substitutions, and vice versa.[17] The two epPCR methods (indicated in Figure 2.2 with black rectangles) were found to be more appropriate for RgDAAO, with balanced frequencies of stop codons and Gly/Pro in comparison to the other mutagenesis methods. In Table 2.1, the sequence-based analysis of the server for the selected random mutagenesis methods is summarized. The first method, the balanced epPCR Taq-Pol (Mn2+, balanced dNTP)[44], has a strong preference for a specific nucleotide exchange, ~32 % AT → GC (transition mutations), while the second method, the unbalanced epPCR Taq-Pol (Mn2+, unbalanced dNTP)[45], is expected to produce more transversion (21.41 % AT → TA) than transition (14.45 % AT → GC) mutations. The balanced epPCR is expected to generate a lower fraction of stop codons (Iα→st = 2.30 %) and a higher Gly/Pro content (Iα→gp = 20.58 %) than the unbalanced epPCR (Iα→st = 4.38 % and Iα→gp = 13.23 %) (see Table 2.1). For both methods, an average of 7.4 amino acid substitutions per residue was calculated.

In Figure 2.3, cartoon representations of the RgDAAO crystallographic structure, colored according to Iα→gp using the Jmol[46] visualization feature of the new server, are shown. Of the 30 % of the residues involved in helix formation, 51 % have a higher Iα→gp value (when α is S, L, E or D), with a prevalence of negatively charged residues (E and D, highlighted in stick format in Figure 2.3). In comparison to the unbalanced epPCR, the balanced epPCR method shows a higher probability of substituting charged residues by Gly/Pro (represented by the color code used to define the amino acid substitution probability in Figure 2.3). The mapping of the charged amino acid substitution patterns onto the structure of RgDAAO is reported in Figure 2.4 and is consistent with this observation for the Gly/Pro substitution patterns. The balanced epPCR (Figure 2.4a) shows a lower probability of charged amino acid substitutions than the unbalanced epPCR (Figure 2.4b), which is the opposite of the Gly/Pro substitution patterns of the two methods (Figure 2.3). Hence, substitutions of charged residues by residues that are unfavorable for forming molecular interactions result in destabilization of the protein. For example, charged residues were found to be involved in molecular interactions such as salt bridges (15 out of 21 have a probability of more than 0.5 of being substituted by glycine) and side chain H-bonds (5 out of 26 have a probability of more than 0.5 of glycine substitution). In Figure 2.5, the charged residues involved in salt bridge formation are reported together with the amino acid diversity generated by the balanced (a1 and a2) and unbalanced epPCR (b1 and b2) methods. The balanced epPCR method shows lower probabilities of substitution into charged residues than the unbalanced epPCR method. The unbalanced epPCR is less biased towards transitions (AT → GC), which results in a higher probability of substitutions into charged residues (for E coded by GAG and D coded by GAC, a transition mutation often leads to a substitution into glycine (GGG, GGC)). These effects of the mutational preferences of the mutagenesis methods might be minimized by codon optimization, for example by using the GAA codon for E and the GAT codon for D.
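The effect of codon choice on glycine substitutions can be checked with a small Python sketch. It uses a hypothetical spectrum in which A→G and T→C transitions are four times more frequent than any other nucleotide exchange. These weights are illustrative and are not the measured epPCR spectra, and the normalization follows equations 2.3 and 2.4 under the assumption that k runs over the nine single-nucleotide variants of a codon.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transition_biased(old, new):
    """Hypothetical weights: A->G and T->C are favoured four-fold."""
    return 4.0 if (old, new) in {("A", "G"), ("T", "C")} else 1.0


def glycine_probability(codon, weight):
    """Normalized probability that one nucleotide exchange yields glycine."""
    total = gly = 0.0
    for pos, old in enumerate(codon):
        for new in BASES:
            if new == old:
                continue
            w = weight(old, new)
            total += w
            if CODON_TABLE[codon[:pos] + new + codon[pos + 1:]] == "G":
                gly += w
    return gly / total


# Glu: GAG vs. GAA, Asp: GAC vs. GAT.
gly_prob = {codon: glycine_probability(codon, transition_biased)
            for codon in ("GAG", "GAA", "GAC", "GAT")}
```

Under these assumed weights, GAA and GAT give a lower normalized glycine probability than GAG and GAC, because part of the transition load falls on the third codon position, where it is silent. This is one way to read the codon choice suggested in the text.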

Figure 2.3: Gly/Pro amino acid substitutions mapping on RgDAAO structure for (a) epPCR 

(Taq (MnCl2, G=A=C=T)) and (b) epPCR (Taq (MnCl2, G=A, C=T)). For the balanced epPCR 

method (a) the red colored regions of RgDAAO structure indicate an overall higher 

probability of charged residues substitutions, mainly for negatively charged residues (in 

stick representation), into Gly/Pro than the unbalanced epPCR (b). 

Figure 2.4: Amino acid substitution mapping of charged residues (E, D, R, K, H) on RgDAAO for (a) epPCR (Taq (MnCl2, G=A=C=T)) and (b) epPCR (Taq (MnCl2, G=A, C=T)).

Figure 2.5: Chemical diversity and mutability of the charged amino acid positions (E, D, R, K, H) of D-amino acid oxidase that are involved in salt bridge formation, (a1) and (a2) for epPCR (Taq (MnCl2, G=A=C=T)) and (b1) and (b2) for epPCR (Taq (MnCl2, G=A, C=T)). The y-axis shows, for each residue, (i) the sequence id, (ii) the PDB id, (iii) the residue name, (iv) the secondary structure element (H: alpha helix; B: beta bridge and extended strand; T: hydrogen bonded turn and bend; *: loop or irregular structure) and (v) the amino acid category according to the chemical property of its side chain (P: charged, Y: neutral, C: aromatic and B: aliphatic), with stop codon (R) and Gly/Pro (G) as separate classes.

Using the new focused analysis feature of the server, the amino acid substitution patterns for the active site residues (Y223, Y238 and R285) were also evaluated. Y223 and Y238 are involved in substrate binding and product release, while R285 (arginine) forms a pair with the carboxylate portion of the substrate in RgDAAO.[37] R285 has a very low residue mutability indicator (μ(285) < 0.3), i.e. a low probability of a substitution leading to an amino acid change, for both methods. Y223 and Y238 have µ(223/238) = 0.9 and therefore a higher probability of being substituted by another amino acid. With the balanced epPCR, Y223 and Y238 are preferentially substituted by charged (δ(223/238)Y→ch = 0.37) and neutral (δ(223/238)Y→ne = 0.46) amino acids. With the unbalanced epPCR, the chemical nature at Y223/238 is better preserved (δ(223/238)Y→ne = 0.44 and δ(223/238)Y→ar = 0.31). The tendency to substitute active site aromatic residues by chemically different amino acids might increase the number of inactive clones in the mutant library. In summary, MAP 2.0 3D provides a qualitative indication that the balanced epPCR method might be less beneficial (or of lower quality) than the unbalanced one in the directed evolution of RgDAAO.

RgDAAO directed evolution 

In a directed evolution study by Pollegioni et al.[47], the substrate specificity of RgDAAO was altered to develop it as a biosensor for the analytical determination of D-amino acids in biological samples. Two rounds of directed evolution were performed employing epPCR mutant libraries (balanced dNTP), followed by another round of directed evolution employing epPCR (unbalanced dNTP) for diversity generation. In the first round (1st set of epPCR: balanced), 91 % and in the second round (2nd set of epPCR: unbalanced), 63 % of the clones were reported to be inactive. The results of these experiments are in agreement with the predictions of the MAP 2.0 3D server. In fact, the mutational preferences of the balanced method induce more structurally destabilizing substitutions and resulted in a higher number of inactive clones than the unbalanced epPCR. In addition, the MAP 2.0 3D analysis suggests that most of the inactive clones should be the result of substitutions into Gly/Pro (destabilizing amino acids), which can destabilize the secondary structure of a helix or weaken intra-molecular interactions.

The best variant obtained from the experiments was the triple mutant (T60A Q144R K152E) with broader substrate specificity. The amino acid substitution patterns calculated by the MAP 2.0 3D server at these positions were also found to be in agreement with the experimental results. All mutated positions were assigned a high residue mutability value by MAP 2.0 3D (μ(60/144/152) > 0.8), i.e. they lie within mutagenic hotspots generated by the mutagenesis methods. The Q144R substitution was identified in the first round with the balanced epPCR. Q144 has a high probability of being substituted by a charged residue (δ(144)Q→ch = 0.67), and experimentally the Q144R substitution (φ(144)Q→R = 0.58) was found. In the second round of random mutagenesis, with the unbalanced epPCR, T60A (φ(60)T→A = 0.58) and K152E (φ(152)K→E = 0.36) were obtained. These residues have a high preference for substitution into aliphatic (δ(60)T→al = 0.6) and charged residues (δ(152)K→ch = 0.7), respectively.

In summary, the RgDAAO case illustrates how the MAP 2.0 3D server can be used to develop efficient mutagenesis strategies before and during directed evolution experiments, for instance by selecting the mutagenesis method with the least unfavorable effects on the structure or function of the protein encoded by the target gene, and by codon engineering. In this way, the gene can be synthesized prior to the directed evolution experiment to reduce highly destabilizing substitutions at key amino acid positions.

2.4.2. Phytase 

Phytases are a class of phosphatase enzymes that catalyse the hydrolysis of phytic acid (myoinositol hexakisphosphate) to release inorganic phosphorus in a usable form. Phytases have been used as a feed supplement for decades.[48] The industrial feed pelleting process in which phytases are applied requires high temperatures. For this reason, directed evolution methods have been used to increase the thermal resistance of phytases while maintaining high activity at ambient temperature.[49]


A MAP 2.0 3D analysis was performed on the phytase appA2 (the full analysis is given on the MAP 2.0 3D server as an example). In comparison to the other 18 random mutagenesis methods, epPCR Taq (+, G=A, C=T) was found to be the preferred choice for the directed evolution of appA2. As reported in Table 2.1, the sequence based MAP 2.0 3D analysis gives a stop codon frequency of Iα→st = 4.69 % and a frequency of substitutions into Gly/Pro of Iα→gp = 11.60 %. An average of 7.45 amino acid substitutions per residue was calculated. The value of the codon diversity coefficient was 36.49 %, resulting in preserved amino acid substitutions of Iα→pr = 25.40 %. Charged (19.21 %) and aromatic (5.79 %) residues were overrepresented, with deviations of 1.39 % and 0.91 % from their share in the protein composition, respectively. The aliphatic (39.35 %) and neutral (35.65 %) residues were underrepresented, with deviations of -2.86 % and -4.14 %, respectively.

With the structural data, a different conclusion emerges than from the sequence analysis alone. One rule of thumb for enhancing the thermostability of an enzyme is to increase the number of charged residues in the loop regions at the protein surface. Reducing the mobility of these flexible regions by strengthening them with electrostatic and hydrogen bonding interactions usually improves the thermal stability.[50] Hence, the amino acid substitution patterns of charged residues were analyzed using the residue mutability indicator, the normalized B-factors (B´) as a residue flexibility indicator and the relative solvent accessibility (RSA) to differentiate exposed and buried residues. In Figure 2.6, the mapping of the amino acid substitution patterns generated by epPCR Taq (+, G=A, C=T) for the different amino acid substitution classes (charged, neutral, aromatic and aliphatic), stop codons and Gly/Pro onto the phytase appA2 is reported, with charged residues shown in stick representation. A high probability of substitution of charged residues by Gly/Pro, aliphatic and neutral residues was observed in the MAP 2.0 3D analysis. In Figure 2.7, the detailed amino acid substitution patterns for charged residues are reported with three MAP 2.0 3D structural indicators for the epPCR Taq (+, G=A, C=T) method. The experimentally determined mutations are highlighted with black rectangles in Figure 2.7. Most of the charged residues have a mutability value µ > 0.6, i.e. a high probability of being changed into another amino acid. Figure 2.7 shows high probabilities of substitution of charged residues by glycine or proline (alpha helix destabilizers) and by aliphatic and neutral residues (less favorable for improving thermostability).

Figure 2.6: MAP 2.0 3D analysis of the amino acid substitution probabilities of phytase appA2 subjected to epPCR (Taq (MnCl2, G=A, C=T)), in cartoon representation; charged residues (D, E, H, K, R) are shown in stick representation. The probability values increase from blue (lowest probability) to red (highest probability). Amino acids are grouped according to the chemical nature of their side chain: charged (c), neutral (d), aromatic (e) or aliphatic (f), with sequence interrupting (stop codons (a)) and structure destabilizing amino acids (glycine and proline (b)).

Phytase directed evolution 

In one example, Kim et al. performed directed evolution on the phytase appA2 from E. coli to generate variants with increased thermostability using epPCR with unbalanced dNTPs.[51] Two variants (K46E and K65E K97M S209G) with 20 % improved thermostability were found after screening 5000 clones. Of the four mutated positions, three resulted from substitutions of a charged residue, all occurring at lysine residues.

The MAP 2.0 3D analysis of the amino acid substitution patterns for these positions was found to be in agreement with the experimental findings, with all four positions having a high mutability indicator value (µ > 0.8) and relative solvent accessibility (RSA > 0.4). Furthermore, all lysine residues at the mutated positions can undergo a nucleotide exchange that results in a stop codon. K46 and K97 have the same substitution preference for stop codons (δ(46/97)K→st = 0.24) but differ for charged (δ(46)K→ch = 0.40; φ(46)K→E = 0.16) and neutral residues (δ(97)K→ne = 0.35; φ(97)K→M = 0.24). K65 has different substitution values for residues with aliphatic (δ(65)K→al = 0.18), charged (δ(65)K→ch = 0.36; φ(65)K→E = 0.12) and neutral (δ(65)K→ne = 0.27) side chains, and δ(65)K→st = 0.18 for stop codons. S209 has a high probability of preserving the chemical property of its side chain, with a high preference for neutral substitutions (δ(209)S→ne = 0.60). The substitution of S209 by glycine alone has a probability of φ(209)S→G = 0.24. Experimentally, the mutations generated with the epPCR Taq (+, G=A, C=T) mutagenesis method resulted in only 20 % active clones in the library, and only 80 were found to have improved thermal stability. Phytase appA2 has a high helical content (42 %), and substitutions into Gly/Pro residues might reduce thermal stability by destabilizing the structure and might increase the number of inactive clones. In general, substitutions of charged residues by aliphatic or neutral residues are less favorable for improving thermal stability.

Figure 2.7: Amino acid substitution patterns for charged residues in phytase together with the parameters residue mutability, residue flexibility and relative solvent accessibility of the amino acids. The experimentally determined mutations are highlighted in black boxes. The y-axis shows the sequence id, PDB id and amino acid name, and in (a) the secondary structure elements (H: alpha helix; B: beta bridge and extended strand; T: hydrogen bonded turn and bend; *: loop or irregular structure), (b) the normalized Cα B-factor to differentiate between flexible (F) and rigid (R) residues and (c) the relative solvent accessibility to identify exposed (E) or buried (B) residues.

2.4.3. N-acetylneuraminic acid aldolase 

N-acetylneuraminic acid aldolase (Neu5Ac aldolase) catalyses the aldol 

condensation of N-acetyl-D-mannosamine and pyruvate to give N-acetyl-D-neuraminic acid 

(D-sialic acid).[52] Neu5Ac aldolase is used in the synthesis of sialic acid, a complex sugar 

with many pharmaceutical applications. 


Based on the sequence based analysis of the MAP 2.0 3D server, the balanced epPCR method (Taq (MnCl2, G=A=C=T)) was found suitable for the directed evolution of Neu5Ac aldolase (summarized in Table 2.1). For this method, the value of the codon diversity coefficient was 43.12, which resulted in Iα→pr = 28.47 % preserved amino acid substitutions with an average of 7.20 amino acid substitutions per residue. The frequency of stop codons was Iα→st = 2.12 % and that of Gly/Pro substitutions was Iα→gp = 16.26 %. The structure based analysis was focused on the active site residues (A11, S47, T48, Y137, I139, K165, T167, G189, Y190) using the new option of the MAP 2.0 3D server to restrict the analysis to selected amino acids. Figure 2.8 shows the expected amino acid substitutions for the active site residues and, highlighted in boxes, the experimentally determined mutation positions[52] (G70, T84, Y98, F115, V251, Q282). With the exception of A11 and G189, the active site residues have a high residue mutability value (μ > 0.6). The values of the RSA and B´ indicate that A11 and G189 are buried in the protein active site and highly rigid. The residue I139, another aliphatic residue of the active site, shows a moderately high preference for substitution into neutral amino acids (δ(139)I→ne = 0.36). The active site residues Y137 and Y190 have a high residue mutability value (µ(137/190) = 0.94) and are substituted by charged (δ(137/190)Y→ch = 0.37) or neutral (δ(137/190)Y→ne = 0.46) amino acids. S47 has a mutability value µ = 0.88 with a substitution probability φ(47)S→G = 0.6 to change into glycine. K165 has a preference (µ(165) = 0.73) for substitution into charged residues (φ(165)K→R/K/E = 0.26).

Figure 2.8: Amino acid substitution patterns for the active site residues (A11, S47, T48, Y137, I139, K165, T167, G189, Y190) of Neu5Ac aldolase and the experimentally determined mutations (1st generation: Y98H, F115L; 2nd generation: V251I; 3rd generation: G70A, T84S, Q282L), highlighted in boxes, for the random mutagenesis method epPCR (Taq (MnCl2, G=A=C=T)). The y-axis representations are the same as described in Figure 2.7.

Figures 2.9 and 2.10 show the analysis of hydrophobic contacts and hydrogen bonds, respectively, for the active site residues and the experimentally determined mutation positions. The results of the analysis strongly suggest an involvement of A11 in hydrophobic interactions with Y43 or I206 (see Figure 2.9) and in side-chain hydrogen bond formation with Y43, G207 or N211 (see Figure 2.10). In short, the substitution spectra analysis of the active site residues (A11, S47, T48, Y137, I139, K165, T167, G189, Y190) indicates that the chemical environment of the active site residues is not substantially modified by the epPCR random mutagenesis method.

Figure 2.9: Amino acid substitution patterns for the active site residues (A11, Y137, I139) of Neu5Ac aldolase and the mutations (1st generation: Y98H and F115L; highlighted in the box frames) involved in hydrophobic interactions. Panels (a) and (b) show the interaction partners for the hydrophobic interactions. The y-axis representations are the same as described in Figure 2.5.

Figure 2.10: Amino acid substitution patterns for the active site residues (A11, S47, T48, Y137) of Neu5Ac aldolase and the mutation (1st generation: Y98H; highlighted in a black box) involved in side chain hydrogen bonds. Panels (a) and (b) show the interaction partners for the side chain hydrogen bonds. The y-axis representations are the same as described in Figure 2.5.

Neu5Ac Aldolase directed evolution 

Neu5Ac aldolase was engineered by Wada et al.[52], applying epPCR with balanced dNTP, for a complete reversal of enantioselectivity. Three rounds of random mutagenesis resulted in a variant more effective toward both D/L enantiomeric substrates (3-deoxy-L/D-manno-2-octulosonic acid). The two mutation positions from the first round of random mutagenesis (Y98H and F115L, see Figure 2.8) were found to be involved in hydrophobic interactions (Figure 2.9, from Y98 to L63/A67/F100 and from F115 to L147/L155) and in side chain hydrogen bond formation (Figure 2.10, from E64 to Y98) in the wild type. Y98 and F115 are located outside the active site, partially exposed to the solvent (RSA = 0.26), moderately flexible (normalized B-factor B´ = 0.91) and "variable" in their amino acid substitutions (residue mutability indicator µ(98/115) = 0.75). The substitutions at these positions into comparatively more hydrophobic residues resulted in increased activity relative to the wild type aldolase. In MAP 2.0 3D, Y98 is preferably preserved or substituted by charged (δ(98)Y→ch = 0.27; φ(98)Y→H = 0.25) or neutral (δ(98)Y→ne = 0.33) residues, while F115 shows a slightly higher preference for aliphatic substitution (δ(115)F→al = 0.42; φ(115)F→L = 0.33) and cannot be substituted by charged residues (δ(115)F→ch = 0.00). The substitution at residue V251 was obtained in the second round of the directed evolution experiment and resulted in a partially inverted enantiomeric preference of the enzyme. In the MAP 2.0 3D analysis, this position was found to be more conserved (µ(251) = 0.54) or substituted by more hydrophobic residues. The third generation mutations (G70A, T84S, Q282L) resulted in a complete reversal of the enzymatic enantioselectivity for use in the synthesis of both D- and L-sugars. G70 has a high flexibility (B´ = 2.59) and a low probability of being substituted by an aliphatic residue (δ(70)G→al = 0.10; φ(70)G→A = 0.03). T84 is part of a turn with high flexibility (B´ = 0.77) and solvent exposure (RSA = 0.25), with a residue mutability µ(84) = 0.87. T84 has a high preference for substitution by aliphatic residues (δ(84)T→al = 0.58) and, to a lesser extent, by "neutral" amino acids (δ(84)T→ne = 0.34; φ(84)T→S = 0.13). Q282 is part of a helix and is rigid (B´ = -1.01) but partially exposed to the solvent (RSA = 0.23). Q282 has a high preference for substitution by a charged residue (δ(282)Q→ch = 0.67) and a very low preference for aliphatic (δ(282)Q→al = 0.13; φ(282)Q→L = 0.13) or neutral (δ(282)Q→ne = 0.08) substitution.

In the case of aldolase, the MAP 2.0 3D analysis also shows good agreement with the experimental results. The variability in the amino acid substitution patterns of the active site residues allowed more sequence space to be explored for the catalytic activity of the enzyme and resulted in a high fraction of beneficial mutations in the first generation.

2.5. Conclusions 

In this manuscript, we introduced the MAP 2.0 3D server and its use to assist the design of directed evolution experiments. MAP 2.0 3D correlates the traditional sequence based MAP indicators with the structural information of the target protein. The combined information can help to improve the chances of finding functional and stable enzyme variants. MAP 2.0 3D helps to guide directed evolution experiments by focusing the analysis on a set of residues that are important for the specific enhancement of enzymatic properties, for example to improve substrate specificity by targeting residues located in or near the active site, or to enhance the thermal stability or water solubility of proteins by increasing the number of charged amino acid substitutions. The new structure oriented features of the MAP 2.0 3D server have been applied to the analysis of three different proteins (phytase, oxidase and aldolase), and the predicted results were compared with the experimental results. The results of the RgDAAO analysis indicate that selecting the random mutagenesis method by pre-screening the generated library can help to elucidate the effects of mutational bias on the structural environment of the protein and how these effects can be optimized. The analyses of phytase and Neu5Ac aldolase illustrate how the structural analysis features included in the MAP 2.0 3D server can now help to correlate the effect of mutational biases with the protein structural environment and to evolve a desired property. In this way, the MAP 2.0 3D server facilitates the ‘in-silico’ pre-screening of the target gene and can also promote an increase of the active population in random mutagenesis libraries, thereby decreasing screening efforts and increasing the probability of obtaining desirable mutations even in a small mutant library.


1. Bornscheuer UT, Pohl M (2001) Improved biocatalysts by directed evolution and 

rational protein design. Curr Opin Chem Biol 5: 137-143. 

2. Brakmann S (2001) Discovery of superior enzymes by directed molecular evolution. 


3. Wong TS, Arnold FH, Schwaneberg U (2004) Laboratory evolution of cytochrome 

p450 BM-3 monooxygenase for organic cosolvents. Biotechnol Bioeng 85: 351-358. 

4. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward understanding 

the inactivation mechanism of monooxygenase P450 BM-3 by organic cosolvents: a 

molecular dynamics simulation study. Biopolymers 83: 467-476. 

5. Wong TS, Zhurina D, Schwaneberg U (2006) The diversity challenge in directed 

protein evolution. Comb Chem High T Scr 9: 271-288. 

6. Wong TS, Roccatano D, Schwaneberg U (2007) Steering directed protein evolution: 

strategies to manage combinatorial complexity of mutant libraries. Environ 

Microbiol 9: 2645-2659. 

7. Smith JM (1970) Natural Selection and Concept of a Protein Space. Nature 225: 563-564.

8. Olsen M, Iverson B, Georgiou G (2000) High-throughput screening of enzyme 

libraries. Curr Opin Biotech 11: 331-337. 

9. Tawfik DS, Bershtein S (2008) Advances in laboratory evolution of enzymes. Curr 


10. Shivange AV, Marienhagen J, Mundhada H, Schenk A, Schwaneberg U (2009) 

Advances in generating functional diversity for directed protein evolution. Curr 




11. Turner NJ (2009) Directed evolution drives the next generation of biocatalysts. Nat 

Chem Biol 5: 567-573. 

12. Wong TS, Roccatano D, Loakes D, Tee KL, Schenk A, et al. (2008) Transversion-enriched sequence saturation mutagenesis (SeSaM-Tv+): a random mutagenesis method with consecutive nucleotide exchanges that complements the bias of error-prone PCR. Biotechnol J 3: 74-82.

13. Dennig A, Shivange AV, Marienhagen J, Schwaneberg U (2011) OmniChange: The 

Sequence Independent Method for Simultaneous Site-Saturation of Five Codons. 

PLoS ONE 6: e26222. 

14. Chica RA, Doucet N, Pelletier JN (2005) Semi-rational approaches to engineering 

enzyme activity: combining the benefits of directed evolution and rational design. 

Curr Opin Biotech 16: 378-384. 

15. Zumarraga M, Camarero S, Shleev S, Martinez-Arias A, Ballesteros A, et al. (2008) 

Altering the laccase functionality by in vivo assembly of mutant libraries with 

different mutational spectra. Proteins 71: 250-260. 

16. Vanhercke T, Ampe C, Tirry L, Denolf P (2005) Reducing mutational bias in random 

protein libraries. Anal Biochem 339: 9-14. 

17. Wong TS, Roccatano D, Zacharias M, Schwaneberg U (2006) A statistical analysis of 

random mutagenesis methods used for directed protein evolution. J Mol Biol 355: 

858-871. 

18. Wong TS, Roccatano D, Schwaneberg U (2007) Are transversion mutations better? A 

Mutagenesis Assistant Program analysis on P450 BM-3 heme domain. Biotechnol J 

2: 133-142. 

19. Rasila TS, Pajunen MI, Savilahti H (2009) Critical evaluation of random mutagenesis 

by error-prone polymerase chain reaction protocols, Escherichia coli mutator strain, 

and hydroxylamine treatment. Anal Biochem 388: 71-80. 

20. Ditursi MK, Kwon SJ, Reeder PJ, Dordick JS (2006) Bioinformatics-driven, rational 

engineering of protein thermostability. Protein Eng Des Sel 19: 517-524. 

21. Shoichet BK, Beadle BM (2002) Structural bases of stability-function tradeoffs in enzymes. J Mol Biol 321: 285-296.



22. Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data 

Bank. Nat Struct Biol 10: 980-980. 

23. Zhang H, Zhang T, Chen K, Shen S, Ruan J, et al. (2009) On the relation between 

residue flexibility and local solvent accessibility in proteins. Proteins 76: 617-636. 

24. Teilum K, Olsen JG, Kragelund BB (2009) Functional aspects of protein flexibility. 

Cell Mol Life Sci 66: 2231-2247. 

25. Gromiha MM, Oobatake M, Kono H, Uedaira H, Sarai A (1999) Role of structural and 

sequence information in the prediction of protein stability changes: comparison 

between buried and partially buried mutations. Protein Eng Des Sel 12: 549-555. 

26. Gromiha MM, Selvaraj S (2004) Inter-residue interactions in protein folding and 

stability. Prog Biophys Mol Bio 86: 235-277. 

27. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.

28. Chothia C (1976) The nature of the accessible and buried surfaces in proteins. J Mol 

Biol 105: 1-12. 

29. Peisajovich SG, Tawfik DS (2007) Protein engineers turned evolutionists. Nat 

Methods 4: 991-994. 

30. Karplus PA, Schulz GE (1985) Prediction of Chain Flexibility in Proteins - a Tool for 

the Selection of Peptide Antigens. Naturwissenschaften 72: 212-213. 

31. Yuan Z, Zhao J, Wang ZX (2003) Flexibility analysis of enzyme active sites by 

crystallographic temperature factors. Protein Eng Des Sel 16: 109-114. 

32. Kumar S, Nussinov R (2002) Close-range electrostatic interactions in proteins. 


33. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic 

character of a protein. J Mol Biol 157: 105-132. 

34. Burley SK, Petsko GA (1985) Aromatic-aromatic interaction: a mechanism of protein 

structure stabilization. Science 229: 23-28. 

35. Overington J, Johnson MS, Sali A, Blundell TL (1990) Tertiary structural constraints 

on protein evolutionary diversity: templates, key residues and structure prediction. 

P Roy Soc Lond B Bio 241: 132-145. 



36. Liao GJ, Lee YJ, Lee YH, Chen LL, Chu WS (1998) Structure and expression of the D-amino-acid oxidase gene from the yeast Rhodosporidium toruloides. Biotechnol Appl Bioc 27 (Pt 1): 55-61.

37. Pollegioni L, Diederichs K, Molla G, Umhau S, Welte W, et al. (2002) Yeast D-amino 

acid oxidase: structural basis of its catalytic properties. J Mol Biol 324: 535-546. 

38. Rodriguez E, Han Y, Lei XG (1999) Cloning, sequencing, and expression of an 

Escherichia coli acid phosphatase/phytase gene (appA2) isolated from pig colon. 

Biochem Bioph Res Co 257: 117-123. 

39. Lim D, Golovan S, Forsberg CW, Jia Z (2000) Crystal structures of Escherichia coli 

phytase and its complex with phytate. Nat Struct Biol 7: 108-113. 

40. Ohta Y, Watanabe K, Kimura A (1985) Complete nucleotide sequence of the E. coli N-acetylneuraminate lyase. Nucleic Acids Res 13: 8843-8852.

41. Izard T, Lawrence MC, Malby RL, Lilley GG, Colman PM (1994) The three-dimensional structure of N-acetylneuraminate lyase from Escherichia coli. Structure 2: 361-369.

42. Pilone MS (2000) D-Amino acid oxidase: new findings. Cell Mol Life Sci 57: 1732-1747.

43. Pollegioni L, Molla G (2011) New biotech applications from evolved D-amino acid 

oxidases. Trends Biotechnol 29: 276-283. 

44. Lin-Goerke JL, Robbins DJ, Burczak JD (1997) PCR-based random mutagenesis using 

manganese and reduced dNTP concentration. Biotechniques 23: 409-412. 

45. Vartanian JP, Henry M, Wain-Hobson S (1996) Hypermutagenic PCR involving all 

four transitions and a sizeable proportion of transversions. Nucleic Acids Res 24: 

2627-2631. 

46. Jmol: an open-source Java viewer for chemical structures in 3D. 

http://www.jmol.org/ 

47. Sacchi S, Rosini E, Molla G, Pilone MS, Pollegioni L (2004) Modulating D-amino acid 

oxidase substrate specificity: production of an enzyme for analytical determination 

of all D-amino acids by directed evolution. Protein Eng Des Sel 17: 517-525. 



48. Rao DE, Rao KV, Reddy TP, Reddy VD (2009) Molecular characterization, 

physicochemical properties, known and potential applications of phytases: An 

overview. Crit Rev Biotechnol 29: 182-198. 

49. Garrett JB, Kretz KA, O'Donoghue E, Kerovuo J, Kim W, et al. (2004) Enhancing the 

thermal tolerance and gastric performance of a microbial phytase for use as a 

phosphate-mobilizing monogastric-feed supplement. Appl Environ Microb 70: 

3041-3046. 

50. Fields PA (2001) Review: Protein function at thermal extremes: balancing stability 

and flexibility. Comp Biochem Phys A 129: 417-431. 

51. Kim MS, Lei XG (2008) Enhancing thermostability of Escherichia coli phytase AppA2 

by error-prone PCR. Appl Microbiol Biotechnol 79: 69-75. 

52. Wada M, Hsu CC, Franke D, Mitchell M, Heine A, et al. (2003) Directed evolution of 

N-acetylneuraminic acid aldolase to catalyze enantiomeric aldol reactions. Bioorgan 

Med Chem 11: 2091-2098. 


Roccatano D. ACS Synthetic Biology 2012, 1 (4), 139-150.’ 


PART II: MD Simulation 

Chapter 3 

Introduction to Molecular Dynamics Simulation of 

Biomolecules 

This chapter provides a brief introduction to molecular dynamics (MD) simulation, followed by a description of the system preparation for MD simulation and of the analysis of the generated trajectories. In the following chapters of this thesis, the same procedure is used to perform MD simulations, with P450BM-3 monooxygenase as the model system, and to analyze the trajectories.

3.1. Background 

The last decades have witnessed rapid development in the field of MD simulations of biological molecules for studying their dynamic processes at the atomic level. Far from their infancy, computer simulation methods can nowadays provide important insight into the molecular basis of the relationships between protein structure, function and dynamics.[1-4]

MD is a computational chemistry method that describes the dynamics of a molecular system by integrating Newton’s equations of motion for a system of N interacting atoms. In an MD simulation, the force acting on the i-th particle (Fi) is calculated as the negative derivative of a potential energy function (V) with respect to the particle position (equation 3.1). This function, called the force field, describes the atomic interactions in an approximate way. In equation 3.1, ri represents the position of the i-th atom.



$$ \mathbf{F}_i = -\frac{\partial V(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)}{\partial \mathbf{r}_i} \tag{3.1} $$

The dynamics of the system is calculated according to Newton’s second law by numerically integrating the differential equations of motion (equation 3.2). In this way, a new set of atomic positions and velocities (vi) is generated at each successive integration time step dt.

$$ \frac{\partial \mathbf{v}_i}{\partial t} = \frac{\mathbf{F}_i}{m_i} \tag{3.2} $$

The so-called leap-frog algorithm[5] is commonly used in MD simulations to integrate the equations of motion. It updates the velocity (equation 3.3) and the position (equation 3.4) of the i-th atom of mass mi using the force Fi(t) at position ri(t).

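$$ \mathbf{v}_i\left(t + \tfrac{1}{2}dt\right) = \mathbf{v}_i\left(t - \tfrac{1}{2}dt\right) + \frac{dt}{m_i}\,\mathbf{F}_i(t) \tag{3.3} $$

$$ \mathbf{r}_i(t + dt) = \mathbf{r}_i(t) + dt\,\mathbf{v}_i\left(t + \tfrac{1}{2}dt\right) \tag{3.4} $$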
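
The leap-frog update of equations 3.3 and 3.4 can be illustrated with a minimal sketch. It is not taken from the simulation code used in this thesis: the one-dimensional harmonic oscillator, the function names and the reduced units are assumptions chosen only to keep the example short.

```python
def leapfrog_step(r, v_half, force, mass, dt):
    """Advance one leap-frog step for a 1D particle.

    r       position at time t
    v_half  velocity at time t - dt/2
    force   callable returning F(r) (hypothetical, supplied by the caller)
    """
    v_new = v_half + dt / mass * force(r)   # eq. 3.3: velocity at t + dt/2
    r_new = r + dt * v_new                  # eq. 3.4: position at t + dt
    return r_new, v_new


def harmonic_force(r, k=1.0):
    """Illustrative force F = -dV/dr for V = 0.5 * k * r**2 (cf. eq. 3.1)."""
    return -k * r


def run(n_steps=1000, dt=0.01, mass=1.0):
    """Integrate a harmonic oscillator started at r = 1 with zero velocity."""
    r = 1.0
    # Start-up: shift the initial velocity v(0) = 0 back by half a time step.
    v_half = 0.0 - 0.5 * dt / mass * harmonic_force(r)
    trajectory = [r]
    for _ in range(n_steps):
        r, v_half = leapfrog_step(r, v_half, harmonic_force, mass, dt)
        trajectory.append(r)
    return trajectory
```

Because positions and velocities are stored half a time step apart, the oscillation amplitude stays bounded over the run, which is one reason this family of integrators is popular in MD.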

The simulation generates an ensemble of molecular configurations (trajectory) that describes the evolution of the coordinates and velocities of the system as a function of time. To generate an equilibrium ensemble consistent with the experimental conditions at which the system was studied, the temperature and pressure of the simulated system are controlled and kept constant during the simulation.[3,4]



Force field equation 

In MD simulations, the force field divides the atomic interactions into bonded and non-bonded terms. Bonded interactions include bond, angle, dihedral and improper dihedral terms, while non-bonded interactions comprise van der Waals (vdW) and electrostatic terms (equation 3.5).

$$ V = V_{bond} + V_{angle} + V_{dihedral} + V_{improper} + V_{vdW} + V_{es} \tag{3.5} $$

Bonded interactions 

Bond stretching between two covalently bound atoms (i and j, with bond length b) is described by the covalent bond potential (Vbond) of GROMOS-96 (equation 3.6), with force constant $k^{b}_{ij}$ and equilibrium bond length b0. Angle vibrations within triplets of atoms (i, j and k) with bond angle θ are represented by a cosine based angle potential (Vangle) (equation 3.7), with force constant $k^{\theta}_{ijk}$ and equilibrium bond angle θ0.

$$ V_{bond} = \frac{1}{4}\, k^{b}_{ij} \left(b^{2} - b_0^{2}\right)^{2} \tag{3.6} $$

$$ V_{angle} = \frac{1}{2}\, k^{\theta}_{ijk} \left(\cos\theta - \cos\theta_0\right)^{2} \tag{3.7} $$
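
As a rough illustration of equations 3.6 and 3.7, the two bonded terms can be evaluated as below. The function names and any parameter values passed to them are hypothetical and are not taken from the GROMOS96 parameter files; lengths are assumed to be in nm and angles in degrees.

```python
import math


def bond_energy(b, b0, k_b):
    """Quartic bond term of eq. 3.6: V = 1/4 * k_b * (b^2 - b0^2)^2."""
    return 0.25 * k_b * (b * b - b0 * b0) ** 2


def angle_energy(theta_deg, theta0_deg, k_theta):
    """Cosine based angle term of eq. 3.7: V = 1/2 * k_theta * (cos(theta) - cos(theta0))^2."""
    delta = math.cos(math.radians(theta_deg)) - math.cos(math.radians(theta0_deg))
    return 0.5 * k_theta * delta * delta
```

Both terms vanish at the equilibrium geometry and grow as the bond length or angle departs from it; neither form requires a square root or an inverse cosine, which keeps the evaluation cheap.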

Torsional interactions (equation 3.8) involve four atoms (i, j, k and l) and are defined in terms of the dihedral angle φ, the angle between the two planes formed by the first three (i, j and k) and the last three (j, k and l) atoms. The torsional potential describes the interactions arising from the rotation of two functional groups connected by a bond and is defined in equation 3.8, where $k^{\phi}_{ijkl}$ is the force constant, n is the multiplicity and φ0 is the phase shift.

$$ V_{dihedral} = k^{\phi}_{ijkl} \left(1 + \cos(n\phi - \phi_0)\right) \tag{3.8} $$

Improper dihedrals are used to maintain the planarity of four atoms (i, j, k and l) and are described by the harmonic potential in equation 3.9, where $k^{\xi}_{ijkl}$ is the force constant and ξ is the improper dihedral angle defined on the four atoms to keep them in a specific configuration. For example, ξ0 is set to 0° to keep the four atoms (i, j, k and l) in a planar configuration; improper dihedrals are also used to maintain tetrahedral configurations.

$$ V_{improper} = \frac{1}{2}\, k^{\xi}_{ijkl} \left(\xi - \xi_0\right)^{2} \tag{3.9} $$
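
The two dihedral terms of equations 3.8 and 3.9 can be sketched in the same spirit. The function names are hypothetical, angles are given in degrees, and the improper force constant is assumed to be expressed per squared radian.

```python
import math


def proper_dihedral_energy(phi_deg, k_phi, n, phi0_deg):
    """Periodic torsion of eq. 3.8: V = k_phi * (1 + cos(n*phi - phi0))."""
    return k_phi * (1.0 + math.cos(math.radians(n * phi_deg - phi0_deg)))


def improper_dihedral_energy(xi_deg, xi0_deg, k_xi):
    """Harmonic improper term of eq. 3.9: V = 1/2 * k_xi * (xi - xi0)^2."""
    delta = math.radians(xi_deg - xi0_deg)
    return 0.5 * k_xi * delta * delta
```

With n = 3 and φ0 = 0°, for instance, the periodic term has minima at 60°, 180° and 300°, whereas the harmonic improper term has a single minimum at ξ0.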

Non-bonded interactions 

vdW interactions result from induced atomic dipoles and from the excluded volumes of atom pairs. They are attractive at long range but become repulsive at short distances between the atom pairs. In GROMACS, they are described by the Lennard-Jones (LJ) potential (VLJ) (see equation 3.10). In equation 3.10, εij and σij are empirical parameters, where ε is the depth of the potential well and σ is the distance at which the potential is zero, and rij is the distance between the i-th and j-th atoms.

$$ V_{LJ} = 4\varepsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] \tag{3.10} $$
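
A short sketch of equation 3.10 makes the roles of ε and σ concrete. The function name is hypothetical; any consistent set of units can be used.

```python
def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones pair energy of eq. 3.10 for two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The energy is zero at r = σ, reaches its minimum value of -ε at r = 2^(1/6) σ, and is repulsive at shorter distances.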



The electrostatic potential (Ves) describes the Coulombic interaction between two charged atoms (i and j) and is calculated by equation 3.11, where ε0 is the vacuum permittivity, εr is the relative dielectric constant, qi and qj are the atomic charges on the i-th and j-th atoms and rij is the distance between them.

$$ V_{es} = \frac{1}{4\pi\varepsilon_0}\, \frac{q_i q_j}{\varepsilon_r r_{ij}} \tag{3.11} $$
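
Equation 3.11 can be evaluated directly once the prefactor 1/(4πε0) is expressed in MD units. The sketch below assumes charges in units of the elementary charge, distances in nm and energies in kJ/mol; the function name is hypothetical.

```python
# 1 / (4 * pi * eps0) in kJ mol^-1 nm e^-2 (assumed unit system, see lead-in)
COULOMB_FACTOR = 138.935458


def coulomb_energy(q_i, q_j, r_ij, eps_r=1.0):
    """Coulomb pair energy of eq. 3.11 for charges q_i, q_j at distance r_ij."""
    return COULOMB_FACTOR * q_i * q_j / (eps_r * r_ij)
```

Unlike the Lennard-Jones term, this energy decays only as 1/r, which is why long-range electrostatics need a dedicated treatment rather than a plain cutoff.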

The topology of a simulated biomolecule is the ordered list of all these interactions and their predefined parameters for the selected force field. In this thesis, all MD simulations were performed using the GROMOS96 43a1 force field.[4]

3.2. Setup of the simulated systems 

The crystal structure of the protein was used as the starting coordinates of the MD simulation (in this thesis, PDB ID: 1BVY [6] is used for the simulation of the P450BM-3 domains). The proteins were centered in a cubic periodic box, with a minimal distance between the protein and any side of the box larger than 0.80 nm so that the protein cannot see its periodic image across the boundary of the box. They were solvated by stacking equilibrated boxes of solvent molecules to completely fill the simulation box. All solvent molecules with any atom within 0.15 nm of the protein atoms were removed. The SPC water model, a simple three-site model, was used for the water molecules.[7] Sodium counter ions were added by replacing solvent molecules at the most negative electrostatic potential to bring the total charge of the box to zero. The protonation state of the residues in the protein was assumed to be the same as that of the isolated amino acids in solution at pH 7. Hence, the water molecules closest to the charges in the protein structure were replaced by the counter ions to neutralize the system. The LINCS (Linear Constraint Solver)[8] algorithm was used to constrain all bond lengths. LINCS provides a stable and fast way to reset the bond lengths after an unconstrained update. The SETTLE[9] algorithm was used to constrain the solvent molecules as rigid bodies.
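
The box construction described above can be sketched as follows. This is only an illustration of the geometric idea (a cubic box with a fixed margin, and distances measured with the minimum image convention); the helper names are hypothetical and the actual systems were prepared with the simulation package, not with this code.

```python
def cubic_box_edge(coords, margin=0.8):
    """Edge length (nm) of a cubic box leaving `margin` nm on every side of the solute."""
    extents = [max(c[k] for c in coords) - min(c[k] for c in coords) for k in range(3)]
    return max(extents) + 2.0 * margin


def minimum_image_distance(a, b, edge):
    """Distance between a and b using the nearest periodic image in a cubic box."""
    d2 = 0.0
    for k in range(3):
        d = a[k] - b[k]
        d -= edge * round(d / edge)   # wrap the component into [-edge/2, edge/2]
        d2 += d * d
    return d2 ** 0.5
```

With a margin of 0.8 nm on each side, two periodic copies of the solute are separated by at least 1.6 nm along the longest solute dimension.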

Electrostatic interactions were calculated using the particle mesh Ewald (PME) method.[10] For the calculation of the long-range interactions, a grid spacing of 0.12 nm combined with a fourth-order B-spline interpolation was used to compute the potential and forces between grid points. A non-bonded pair-list cutoff of 1.3 nm was used, and the pair list was updated every 5 time steps.

3.3. Equilibration procedure 

Simulated systems were first energy minimized, using the steepest descent 

algorithm [11], for at least 2000 steps in order to remove clashes between atoms that were 

too close. After energy minimization, all atoms were given an initial velocity obtained from 

a Maxwell-Boltzmann velocity distribution at 300 K to start MD simulations. 
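
Drawing initial velocities from a Maxwell-Boltzmann distribution amounts to sampling each Cartesian component from a Gaussian of width sqrt(kB T / m). The sketch below assumes masses in atomic mass units and velocities in nm/ps, so that kinetic energies come out in kJ/mol; the function names are hypothetical, and the removal of centre-of-mass motion and the correction for constrained degrees of freedom done by real MD codes are omitted.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def maxwell_boltzmann_velocities(masses, temperature, rng):
    """Sample one velocity vector (nm/ps) per atom at the given temperature (K)."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_B * temperature / m)   # width of each Cartesian component
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from the kinetic energy, using 3 degrees of freedom per atom."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return 2.0 * kinetic / (3.0 * len(masses) * K_B)
```

For a large number of atoms the instantaneous temperature of the sampled velocities lies close to the target value, with fluctuations that shrink as the system size grows.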

All systems were initially equilibrated by a 100 ps MD run with position restraints on the heavy atoms of the solute to allow relaxation of the solvent molecules. With position restraints, the protein atoms are held near their reference positions by a force constant in each spatial dimension while the solvent relaxes around the protein. Berendsen’s thermostat[12] was used to keep the temperature at 300 K by weakly coupling the systems to an external thermal bath with a relaxation time constant τ = 0.1 ps. The pressure of the system was kept at 1 bar using Berendsen’s barostat[12] with a time constant of 1 ps. After the equilibration procedure, the position restraints were removed and the system was gradually heated from 50 K to 300 K during 200 ps of simulation. Finally, a production run was performed at 300 K. The analysis of the trajectories was performed using the GROMACS software package (http://www.gromacs.org/).[13]
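
The weak coupling scheme can be summarized by the velocity scaling factor λ = sqrt(1 + (dt/τ)(T0/T - 1)) applied at every step. The sketch below only illustrates how the temperature relaxes toward the bath temperature under this scaling; the time step of 0.002 ps and the function names are assumptions, and the example is not the heating protocol used in this work.

```python
import math


def berendsen_lambda(t_current, t_ref, dt, tau):
    """Velocity scaling factor of the Berendsen weak coupling thermostat."""
    return math.sqrt(1.0 + dt / tau * (t_ref / t_current - 1.0))


def relax_temperature(t_start, t_ref=300.0, dt=0.002, tau=0.1, n_steps=500):
    """Follow the temperature when only the thermostat scaling acts on the velocities."""
    t = t_start
    for _ in range(n_steps):
        lam = berendsen_lambda(t, t_ref, dt, tau)
        t *= lam * lam   # kinetic energy, and hence T, scales with lambda squared
    return t
```

The deviation from the reference temperature decays approximately exponentially with time constant τ, so a small τ such as 0.1 ps gives tight temperature control.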



3.4. Structural and dynamical analysis 

The structural stability and convergence of the protein were examined by analyzing the root mean square deviation (RMSD), the radius of gyration (Rg) and the secondary structure elements with respect to the crystal structure as a function of time. The per-residue root mean square fluctuation (RMSF) was used to assess the dynamics of the target protein during the simulation.
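
The three geometric quantities named above have simple definitions, sketched here for coordinate sets given as lists of (x, y, z) tuples. The sketch assumes that the structures have already been superimposed on the reference; the function names are hypothetical and the analyses in this thesis were run with the GROMACS tools rather than with this code.

```python
import math


def rmsd(coords, ref):
    """Root mean square deviation between two already superimposed structures."""
    total = sum((a[k] - b[k]) ** 2 for a, b in zip(coords, ref) for k in range(3))
    return math.sqrt(total / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration around the centre of mass."""
    total_mass = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass for k in range(3)]
    weighted = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
                   for m, c in zip(masses, coords))
    return math.sqrt(weighted / total_mass)


def rmsf(frames):
    """Per-atom root mean square fluctuation around the time-averaged position."""
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msf = sum((f[i][k] - mean[k]) ** 2 for f in frames for k in range(3)) / n_frames
        result.append(math.sqrt(msf))
    return result
```

RMSD and Rg give one number per frame and are plotted against time, while RMSF gives one number per atom (or residue) for the whole trajectory.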

3.5. Cluster analysis 

Cluster analysis was performed to characterize the conformational diversity of the structures generated during the MD simulations, using the Gromos clustering algorithm[14] on the backbone atoms (Cα, C and N) of the protein. The analysis is performed on conformations extracted at regular time intervals from the generated trajectory. The resulting conformations were superimposed on the backbone atoms of the reference structure to remove the overall translation and rotation of the protein in space. The similarity (RMSD-distance) matrix is then prepared for all pairs of selected conformers.

In the Gromos clustering algorithm, an RMSD cutoff criterion is used to add similar structures to a cluster: structures whose RMSD is less than the defined cutoff are assigned to the same cluster. The structure having the smallest RMSD from the other members of the cluster is known as the representative structure of that cluster. The same process is repeated until all the structures selected for the analysis are assigned to clusters. In this way, the large clusters represent the ensembles of frequently populated configurations in conformational space during the MD simulation.
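
The procedure can be sketched on a precomputed RMSD matrix. The sketch follows the commonly described form of the Gromos method, in which the structure with the most neighbours within the cutoff is taken as the centre of each cluster; this choice of centre, the tie-breaking rule and the function name are assumptions of the example rather than details stated above.

```python
def gromos_clusters(rmsd_matrix, cutoff):
    """Cluster structures from a symmetric RMSD matrix.

    Returns a list of (centre, members) tuples, largest cluster first.
    """
    remaining = set(range(len(rmsd_matrix)))
    clusters = []
    while remaining:
        # Neighbours of every unassigned structure (itself included) within the cutoff.
        neighbours = {
            i: [j for j in remaining if rmsd_matrix[i][j] < cutoff]
            for i in remaining
        }
        # Centre: structure with the most neighbours; ties go to the lowest index.
        centre = min(remaining, key=lambda i: (-len(neighbours[i]), i))
        members = sorted(neighbours[centre])
        clusters.append((centre, members))
        remaining -= set(members)   # assigned structures leave the pool
    return clusters
```

Each pass removes one cluster from the pool, so the loop ends once every structure has been assigned, and the first clusters returned are the most populated ones.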



3.6. Principal component analysis 

Principal component analysis (PCA, also called essential dynamics analysis) was performed to assess the conformational space by identifying collective motions in the biomolecules during the MD simulation. PCA correlates the atomic positional fluctuations in proteins and can enhance the molecular level understanding of protein function.[15] The covariance matrix (C) of the atomic coordinates (3N x 3N) is constructed to identify collective motions in the biomolecules.[16,17] The backbone atoms (Cα, C and N) of the target proteins were used to explore the conformational subspace in solution. For PCA, the translational and rotational motions are first eliminated from the trajectory (to consider internal motions only) by least squares fitting of the atomic coordinates (using backbone atoms) to the crystal structure. The resulting set of atomic coordinates is used to construct the covariance matrix C of positional deviations using equation 3.12.[17]

$$ C = \left\langle \left(\mathbf{x} - \langle \mathbf{x} \rangle\right)\left(\mathbf{x} - \langle \mathbf{x} \rangle\right)^{T} \right\rangle \tag{3.12} $$

where x is the vector of coordinates of the selected subset of atoms in the trajectory x(t) and ⟨ ⟩ represents the ensemble average over time. The eigenvectors, or essential modes, are identified by diagonalizing the symmetric matrix C using an orthogonal transformation matrix T. The displacements along the different eigenvectors were calculated by projecting the atomic coordinates onto the eigenvectors. The eigenvectors obtained from different simulations were compared using the root-mean-square inner product (RMSIP)[18], which is defined in equation 3.13.

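$$ \mathrm{RMSIP} = \sqrt{\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{m} \left(\mathbf{v}_i \cdot \mathbf{u}_j\right)^{2}} \tag{3.13} $$

where vi and uj are the i-th and j-th eigenvectors of the m-dimensional essential subspaces of the two systems and N is the normalization factor (N = m in the standard definition, so that RMSIP ranges from 0 for orthogonal subspaces to 1 for identical ones). RMSIP gives a simple measure to assess the dynamical similarity of eigenvectors.[18]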
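
The two central quantities of this section, the covariance matrix of equation 3.12 and the RMSIP of equation 3.13, can be sketched in plain Python. The sketch assumes that each frame has already been fitted to the reference and flattened into a vector of length 3N, that the eigenvectors passed to the RMSIP function are orthonormal, and that N = m in equation 3.13; the function names are hypothetical, and the diagonalization itself is left to a linear algebra routine.

```python
import math


def covariance_matrix(frames):
    """Covariance matrix of eq. 3.12 from fitted, flattened coordinate vectors."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n_frames for k in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        dev = [f[k] - mean[k] for k in range(dim)]   # deviation from the time average
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += dev[a] * dev[b] / n_frames
    return cov


def rmsip(set_v, set_u):
    """RMSIP of eq. 3.13 between two sets of m orthonormal eigenvectors (N = m assumed)."""
    m = len(set_v)
    total = sum(sum(a * b for a, b in zip(v, u)) ** 2 for v in set_v for u in set_u)
    return math.sqrt(total / m)
```

In practice the m eigenvectors with the largest eigenvalues of each covariance matrix are passed to the RMSIP function, so that only the essential subspaces of the two simulations are compared.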


1. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE (2012) Biomolecular simulation: a 

computational microscope for molecular biology. Annu Rev Biophys 41: 429-452. 

2. McCammon JA, Gelin BR, Karplus M (1977) Dynamics of Folded Proteins. Nature 267: 585-590.

3. Karplus M, McCammon JA (2002) Molecular dynamics simulations of biomolecules. 

Nat Struct Biol 9: 646-652. 

4. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996) 

Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF: 

Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen. 

5. Hockney RW, Goel SP, Eastwood JW (1974) Quiet high-resolution computer models 

of a plasma. J Comp Phys 14: 148-158. 

6. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure of a 

cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96: 

1863-1868. 

7. Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J (1981) Interaction 

models for water in relation to protein hydration. Intermolecular Forces: 331-342. 

8. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM (1997) LINCS: A linear constraint 

solver for molecular simulations. J Comput Chem 18: 1463-1472. 

9. Miyamoto S, Kollman PA (1992) Settle - an Analytical Version of the Shake and 

Rattle Algorithm for Rigid Water Models. J Comput Chem 13: 952-962. 

10. Darden T, York D, Pedersen L (1993) Particle Mesh Ewald - an N.Log(N) Method for 

Ewald Sums in Large Systems. J Chem Phys 98: 10089-10092. 



11. Cauchy A (1847) Méthode générale pour la résolution des systèmes d'équations 

simultanées. C R Acad Sci Paris 25: 536-538. 

12. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR (1984) Molecular-Dynamics with Coupling to an External Bath. J Chem Phys 81: 3684-3690.

13. Hess B, Kutzner C, van der Spoel D, Lindahl E (2008) GROMACS 4: Algorithms for 

highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory 

Comput 4: 435-447. 

14. Daura X, Gademann K, Jaun B, Seebach D, van Gunsteren WF, et al. (1999) Peptide 

folding: When simulation meets experiment. Angew Chem Int Edit 38: 236-240. 

15. Berendsen HJ, Hayward S (2000) Collective protein dynamics in relation to function. 


16. Ichiye T, Karplus M (1991) Collective motions in proteins: a covariance analysis of 

atomic fluctuations in molecular dynamics and normal mode simulations. Proteins 

11: 205-217. 

17. Amadei A, Linssen AB, Berendsen HJ (1993) Essential dynamics of proteins. Proteins 

17: 412-425. 

18. Amadei A, Ceruso MA, Di Nola A (1999) On the convergence of the conformational 

coordinates basis set obtained by the essential dynamics analysis of proteins' 

molecular dynamics simulations. Proteins 36: 419-424. 


PART II: P450BM-3 Reductase Domain 

Chapter 4 

Conformational Dynamics of the FMN-binding Reductase 

Domain of Monooxygenase P450BM-3 

4.1. Abstract 

In cytochrome P450BM-3, the flavin mononucleotide (FMN) binding domain is an intermediate electron donor between the flavin adenine dinucleotide (FAD) binding domain and the HEME domain. Experimental evidence has shown that different redox states of the FMN cofactor induce conformational changes in the FMN domain. Herein, molecular dynamics (MD) simulation is used to gain insight into this phenomenon at the atomistic level. We have studied the effect of the FMN cofactor and its redox states (oxidized and reduced) on the structure and dynamics of the FMN domain. The results of our study show significant differences in the atomic fluctuation amplitude of the FMN domain in both the holo- and the apo-protein. The change in the protonation state of the FMN cofactor mostly affects its binding in the holo-protein. In particular, the loops involved in the binding of the isoalloxazine ring (Lβ4) and the ribityl side chain (Lβ1) adopt different conformations in the reduced and oxidized states. In addition, the reduced FMN cofactor mainly induces a conformational change in residue Trp574 (Lβ4), which is essential to control electron transfer (ET) within the P450BM-3 domains. The structure of the apo-protein in solution remains mostly unchanged with respect to the crystal structure of the holo-protein. However, the FMN binding loops were more flexible in the apo-protein, which might favor the rebinding of the FMN cofactor. In the holo-protein simulations, the largest conformational changes in the FMN cofactor are caused by the ribityl side chain. The isoalloxazine ring of the FMN cofactor remains almost planar (~177°) in the oxidized state and bends along the N5-N10 axis at an angle of ~160° in the reduced state. The collective modes of the isoalloxazine ring were identical in both protonation states of the FMN cofactor except for the first eigenvector. In the reduced state, the isoalloxazine ring shows the butterfly motion as the dominant collective motion in the first eigenvector due to the bending along the N5-N10 axis.


Cytochrome P450 monooxygenases, the largest superfamily of heme-containing soluble proteins, are widespread in almost all domains of life, e.g. bacteria, yeast, insects, mammalian tissues and plants.[1-3] They catalyze the oxidation of a wide variety of substrates involved in biosynthesis and biodegradation pathways, or in xenobiotic metabolism.[4] Cytochrome P450BM-3, isolated from Bacillus megaterium, is a multidomain, self-sufficient, NADPH dependent flavoenzyme (class III bacterial P450).[5] As a pivotal member of its superfamily, it has been studied in depth as an important model system for understanding structure/function relationships, and many structural and kinetic data are available in the literature.[6,7] The peculiar catalytic properties of this enzyme have also been successfully enhanced for industrial applications by protein engineering.[8,9]

This enzyme is composed of two reductase domains (with FAD and FMN cofactors) and a P450 HEME domain (with a HEME cofactor), arranged on a single polypeptide chain as HEME-FMN-FAD from the N- to the C-terminus. The transfer of two successive electrons from NADPH to the HEME cofactor is essential for the oxygenation reaction.[10-12] During the oxygenation reaction, the enzyme is reduced by NADPH, with electrons first transferred to the FAD cofactor of the FAD-binding domain, then to the FMN cofactor of the FMN-binding domain and finally to the HEME iron in the substrate bound HEME domain. In this ET process, the FMN domain serves as a one- or two-electron mediator from the FAD cofactor to the heme iron.[13] The FMN cofactor switches between the fully oxidized and semiquinone states during catalytic turnover. The thermodynamically unstable anionic semiquinone state can reduce the HEME iron. However, other P450s utilize the FMN hydroquinone as the reducing species.[11-13] In the substrate free P450BM-3, the FMN cofactor stays in a thermodynamically stable hydroquinone state, which is not able to reduce the HEME iron.[11] The use of the anionic semiquinone state of the FMN cofactor as the reducing species makes P450BM-3 more efficient, with a high turnover rate in comparison to other members of this family.[14] Therefore, the thermodynamic properties of the FMN moiety are mainly responsible for the unusual redox properties of P450BM-3. The protein environment has a strong influence on this mechanism by changing the redox potential of the FMN and HEME cofactors. Mutagenesis studies have shown that the conformation of the FMN binding loops plays a critical role in stabilizing the different redox states of the FMN cofactor in the protein environment.[14,15] The insertion of a glycine residue in the re-face (inner FMN binding) loop is able to stabilize the neutral semiquinone state in P450BM-3, as observed in other diflavin reductases.[16]

Although experimental data of the isolated FMN domain in solution and the 

crystallographic structure[17,18] are available, no molecular dynamics (MD) study of this 

protein has been reported. In this paper, the structural and dynamic properties of the 

FMN domain as holo-protein, with the FMN cofactor in oxidized and reduced states, and as apo-protein 

are investigated using classical MD simulations. The aim of this study is to 

understand the structural and dynamic properties of the protein in solution and the effect 

of the protonation state of the FMN cofactor on the conformational dynamics of the cofactor 

and the whole protein. The paper is organized as follows. In the Methods section, the 

details of the simulations, in particular the refinement of the original GROMOS96 FMN 

parameters for the oxidized and reduced states of the FMN cofactor, are reported. In the 

Results section, the analysis of the simulation trajectories for the apo- and the holo-protein in 

the oxidized and reduced states is reported. The analysis focuses on the structural 

and dynamic properties of the overall protein structure and of the FMN binding site as 

well as the FMN cofactor. Finally, in the Discussion and conclusions, the results of the 

simulations are discussed in the context of the experimental knowledge of the FMN 

domain and a summary of the paper is provided. 

4.2. Methods 

4.2.1. Starting coordinates 

The starting coordinates of the FMN domain (residues 479 - 630) were taken from a 

non-stoichiometric complex of P450BM-3 (PDB ID: 1BVY, 0.203 nm resolution), which has 

one FMN domain with two HEME domains.[18] The crystallographic water molecules (within a 

distance of 0.6 nm from the FMN domain) were also kept during system preparation for MD 

simulations. 

4.2.2. Molecular dynamics simulation 

Table 4.1 summarizes the systems used to perform the MD simulations with the GROMOS96 

43a1 force field.[19] The crystal structure[18] has the FMN cofactor in the oxidized state (FOX) 

(see Figure 4.1a). The protonation of the FOX isoalloxazine ring at the N1 and N5 positions 

represents the reduced state of the FMN cofactor (FHQ), as indicated in Figure 4.1b. For the 

preparation of the apo-protein simulation, the FMN cofactor was removed from the crystal 

structure of the FMN domain. The GROMOS96 force field[20] was used for the FMN cofactor. Additional 

improper dihedrals were introduced to reproduce the conformation of the isoalloxazine ring as 

observed in the crystallographic structure and in molecular geometry optimizations of flavin in 

both redox states; they are reported in Table S4.1 (for FOX) and Table S4.2 (for FHQ) of the supporting 

information (SI).[21,22] 

In the isoalloxazine ring of the FMN cofactor, the bending angle (δ) and puckering angle (ρ) 

were calculated along the N5-N10 axis using the v1, v2, a1, a2 and c vectors shown in Figure 4.2. 

In FOX, the isoalloxazine ring was kept close to planar, while a bending angle of ~160° was 

used for FHQ. 
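
As an illustration of how a bending angle about the N5-N10 hinge can be evaluated from atomic coordinates, a minimal sketch is given here. It does not reproduce the exact vector construction of Figure 4.2: the function name `bending_angle` and the use of the centroids of the two outer rings are assumptions made only for this example. With this convention a planar ring gives 180° and a butterfly-bent ring gives a smaller value.

```python
import numpy as np

def bending_angle(n5, n10, centroid_a, centroid_b):
    """Angle (degrees) between the two ring halves about the N5-N10 hinge.

    n5, n10      : coordinates of the hinge atoms
    centroid_a/b : centroids of the two outer rings (hypothetical choice of
                   reference points, not the vectors of Figure 4.2)
    """
    axis = (n10 - n5) / np.linalg.norm(n10 - n5)
    mid = 0.5 * (n5 + n10)

    def perpendicular(point):
        # Vector from the hinge midpoint to the point, with the component
        # along the hinge axis removed, normalised to unit length.
        v = point - mid
        v = v - np.dot(v, axis) * axis
        return v / np.linalg.norm(v)

    a = perpendicular(centroid_a)
    b = perpendicular(centroid_b)
    cos_angle = np.clip(np.dot(a, b), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

# Example: hinge along y, one wing tilted 20 degrees out of the ring plane.
n5 = np.array([0.0, 0.0, 0.0])
n10 = np.array([0.0, 1.0, 0.0])
wing_a = np.array([-1.0, 0.5, 0.0])
tilt = np.radians(20.0)
wing_b = np.array([np.cos(tilt), 0.5, np.sin(tilt)])
delta = bending_angle(n5, n10, wing_a, wing_b)
```

Applied to every frame of a trajectory, such a function yields the time series from which a distribution of δ can be built.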

Table 4.1: Summary of the MD simulations of the P450BM-3 FMN domain in water. 

FMN domain*          No. of atoms   No. of solvent molecules   No. of counter ions   Simulation length (ns)
Oxidized (FOX)       33483          10650                      14                    50
Reduced (FHQ)        33491          10652                      14                    50
Apo-protein (APO)    33482          10662                      13                    50

*The abbreviations FOX, FHQ and APO are used in the rest of the paper for the FMN domain in the oxidized 

and reduced states, and as apo-protein, respectively. 

Figure 4.1: Schematic representation of the FMN cofactor in the oxidized (a) and reduced (b) states 

with the atomic numbering of the isoalloxazine ring.[23] The blue ovals around the N1 and N5 atoms highlight the 

protonation positions. ChemSketch[24] was used to draw the figures. 

Figure 4.2: Schematic structure of the isoalloxazine ring defining the bending angle (δ) and 

puckering angle (ρ) using the vectors v1, v2, a1, a2 and c. 

4.2.3. FMN binding site analysis 

The FMN binding sites of the holo-protein and apo-protein were tracked throughout the 

MD simulation using the MDpocket method.[25] The analysis was performed on a total of 5000 

snapshots obtained by taking every 50th frame from the trajectories. The pocket volume analysis 

was performed on aligned MD snapshots with a minimum alpha sphere size of 3 Å and the 

minimum number of alpha spheres close to each other for clustering of alpha spheres equal 

to [...]. The number of iterations used to calculate the pocket volume with the Monte Carlo 

algorithm was set to 5000. The grid file generated by the first MDpocket run (iso-value = 

0.7) was used to extract the grid points for FMN binding pocket. The grid file of FMN 

binding pocket was edited manually by deleting some grid points using PyMOL[26] for 

better representation of FMN binding site in FMN domain. The FMN binding grid file was 

used with the aligned snapshots to track the changes in FMN binding site during the 

simulation. 

4.2.4. Multiple structural alignment of FMN domain 

The homologous structures of the FMN domain were obtained by performing a BlastP[27] search 

against the PDB database.[28] The protein sequences with identity greater than 20% were 

taken into account for further analysis. Six structures were selected as homologous to the FMN 

domain of P450BM-3 after manually removing the redundant entries from the BlastP results 

(summarized in Table 4.2). Multiple structural alignment (MSA) was performed on the selected 

structures, taking the FMN domain as reference structure, with a maximum RMSD cutoff of 

0.5 nm using UCSF Chimera.[29] 

4.3. Results 

4.3.1. FMN domain: structural and dynamical properties 

Figure 4.3 shows the backbone root mean square deviation (RMSD) of the FMN 

domain as apo- and holo-protein with the FMN cofactor in oxidized and reduced states. In FOX, the 

RMSD curve stabilizes to a plateau with an average value of 0.24 ± 0.04 nm after a rapid 

increase in the first 15 ns of simulation. In the first 10 ns of simulation, FHQ follows the same 

trend as observed in FOX. However, after 15 ns, the RMSD of FHQ stabilizes to an average value 

of 0.16 ± 0.01 nm, lower than the one in FOX. In APO, the FMN domain remains stable 

throughout the simulation, with an average RMSD value of 0.19 ± 0.01 nm after a short 

equilibration of ~5 ns. For all the simulations, the average radius of gyration (Rg) did not 

show appreciable variations from the crystal structure value (1.45 nm). 
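
The two quantities discussed in this paragraph can be sketched in a few lines. The example below is only illustrative: it assumes that the RMSD is computed after a least-squares superposition (here the Kabsch algorithm) and that the radius of gyration is optionally mass-weighted. The function names `kabsch_rmsd` and `radius_of_gyration` are hypothetical and do not refer to the analysis tools used for the simulations.

```python
import numpy as np

def kabsch_rmsd(mobile, reference):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    p = mobile - mobile.mean(axis=0)        # remove translation
    q = reference - reference.mean(axis=0)
    h = p.T @ q                             # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q) ** 2, axis=1))))

def radius_of_gyration(coords, masses=None):
    """Radius of gyration of an (n_atoms, 3) coordinate set."""
    if masses is None:
        masses = np.ones(len(coords))
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Example: a rotated and translated copy of a structure has zero RMSD.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([1.0, -2.0, 0.5])
rmsd_value = kabsch_rmsd(moved, ref)
```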

Figure 4.3: Backbone RMSD with respect to crystal structure as a function of time for APO (in 

green) and holo-protein in FOX (in black) and FHQ (in red). 

The FMN domain has a classical flavodoxin fold with a five-stranded parallel β-sheet (β1 – 

β5) that is surrounded by four α-helices (α1 – α4). The loop regions together with 

irregular structures (coils and turns) are named according to the secondary structure 

element, α(–helix) or β(–strand), preceding them. The FMN cofactor is surrounded by three 

loops that follow β-strands, hence named Lβ1, Lβ3 and Lβ4 for the ribityl binding loop 

(residues 488 – 491), the inner (re face, residues 534 – 544) and the outer (si face, residues 571 – 

579) FMN binding loops, respectively. The crystallographic protein secondary structure, 

calculated using the DSSP method[30], was preserved during the simulations of FOX and FHQ 

and even in the APO simulation (see Figure S4.1 of SI). 

Figure 4.4: Backbone RMSD (a) and RMSF (b) per residue with respect to crystal structure for 

FOX (black), FHQ (red) and APO (green). Vertical bars in grey color show the loop regions. Loops 

surrounding the FMN cofactor are shown in black horizontal bars. (c) FMN domain in pink and FMN 

cofactor in cyan color with labeled helices and loop regions. FMN binding loops are labeled in 

orange color. N and C represent the amino and carboxy terminus of FMN domain. 

In Figure 4.4a and 4.4b, the backbone RMSD and RMSF per residue with respect to the 

crystal structure are reported, respectively. The FMN domain with labeled helices and loops is 

reported in Figure 4.4c. Residues with large RMSD and RMSF values correspond to loop 

regions (represented by grey bars in Figure 4.4a and 4.4b) and to the N- and C-termini. 

In all simulations, the largest deviations and fluctuations were observed in the Lβ3 

and Lβ4 FMN binding loops (black horizontal bars in Figure 4.4a and 4.4b) and in the Lβ2 and 

Lα2 loops, which are located opposite to the FMN binding site. The Lα2 loop shows the highest 

deviation in FOX. In APO, the deviations and fluctuations are mainly observed in the FMN 

binding loops (especially in the Lβ1 loop), which also occupy the FMN binding cavity during the 

simulation. 
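
For reference, the per-atom RMSF used in this kind of analysis can be sketched as follows. The example assumes a trajectory that has already been superimposed on a reference structure and stored as an array of shape (frames, atoms, 3); the function name `rmsf` is hypothetical.

```python
import numpy as np

def rmsf(trajectory):
    """Root mean square fluctuation of each atom around its mean position.

    trajectory: array of shape (n_frames, n_atoms, 3), already aligned.
    Returns an array of length n_atoms.
    """
    mean_positions = trajectory.mean(axis=0)
    sq_dev = np.sum((trajectory - mean_positions) ** 2, axis=2)
    return np.sqrt(sq_dev.mean(axis=0))

# Example: atom 0 is fixed, atom 1 oscillates by +/- 0.1 nm along x.
traj = np.zeros((4, 2, 3))
traj[:, 1, 0] = [0.1, -0.1, 0.1, -0.1]
fluct = rmsf(traj)
```

Averaging the atomic values over the backbone atoms of each residue would give a per-residue profile of the kind shown in Figure 4.4b.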

4.3.2. Cluster analysis of FMN domain 

The total numbers of clusters observed in FOX, FHQ and APO are 11, 12 and 7, 

respectively. The first two clusters account for 80.26 % and 79.25 % of the population for 

FOX and FHQ, respectively. The first cluster contributes 47.10 % in FOX and 60.07 % in 

FHQ, while the second cluster represents 33.16 % and 19.18 % of the total population in 

FOX and FHQ, respectively. In APO, the first cluster alone covers 85.17 % of the 

population. The apo-protein thus shows the least conformational diversity during the 

simulation. However, the difference in the cluster populations of the holo-protein indicates 

that the FMN protonation state notably influences the conformation of the FMN domain. 
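
The clustering algorithm is not specified in this section. Purely as an illustration of how cluster populations such as those above can be obtained, the sketch below implements a neighbour-counting scheme in the spirit of the GROMOS method: the structure with the most neighbours within a cutoff becomes a cluster centre, it is removed together with its neighbours, and the procedure is repeated. The function name `neighbour_clusters` and the input RMSD matrix are assumptions of this example.

```python
import numpy as np

def neighbour_clusters(rmsd_matrix, cutoff):
    """Cluster structures from a symmetric pairwise RMSD matrix.

    Returns a list of (centre_index, member_indices) tuples, largest first.
    """
    n = rmsd_matrix.shape[0]
    remaining = np.ones(n, dtype=bool)
    clusters = []
    while remaining.any():
        # Neighbour relations restricted to structures not yet assigned.
        neighbours = (rmsd_matrix <= cutoff) & remaining[None, :] & remaining[:, None]
        counts = neighbours.sum(axis=1)
        centre = int(np.argmax(counts))
        members = np.flatnonzero(neighbours[centre])
        clusters.append((centre, members))
        remaining[members] = False
    return clusters

# Example with five one-dimensional "structures" forming two groups.
points = np.array([0.00, 0.01, 0.02, 1.00, 1.01])
rmsd = np.abs(points[:, None] - points[None, :])
result = neighbour_clusters(rmsd, cutoff=0.05)
populations = [100.0 * len(members) / len(points) for _, members in result]
```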

In Figure 4.5a, 4.5b and 4.5c, the representative conformations of the first two 

clusters of FOX and FHQ, and the first cluster of APO are superimposed with the crystal 

structure (in sky blue) and shown in cartoon representation. Major differences occur in 

loop regions as well as in N- and C- terminus. In particular, slightly larger deviations are 

present in FMN binding loops in FOX and FHQ. On the contrary, FMN binding region in APO 

is the most deviating part of FMN domain. The loop regions Lβ2 and Lα2 are also affected 

by the presence and the protonation state of FMN cofactor. In FOX, Lβ2 and Lα2 show 

larger deviations than in FHQ and APO as evidenced by the conformation of the first two 

clusters (see Figure 4.5). Lα2 loop flips inwards with higher deviation from crystal 

structure in the first two clusters of FOX. 

Figure 4.5: The conformations of the first two clusters of (a) oxidized (black and gray) and (b) reduced 

(red and coral) FMN domain and (c) the first cluster of the apo-protein (green), superimposed with the crystal structure (in sky blue). 

Loops and helices (α1, α2, α3 and α4) are labeled. FMN binding loops are labeled in red. The 

loops are labeled according to the secondary structure element that precedes them. 

4.3.3. FMN binding site 

Figure 4.6 shows the FMN binding site in detail using the representative structures 

of the first cluster of FOX (4.6a) and FHQ (4.6b). The hydrogen bonds between the 

FMN domain and the cofactor were calculated using a donor-acceptor distance 

≤ 0.35 nm and a hydrogen-donor-acceptor angle ≤ 30°. 
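
A geometric hydrogen bond criterion of this kind can be sketched as below. The example assumes that the distance is measured between donor and acceptor and that the angle is the one at the donor between the hydrogen and the acceptor, which is a common convention; the function name `is_hydrogen_bond` and its default arguments are hypothetical.

```python
import numpy as np

def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=0.35, max_angle=30.0):
    """Return True if the geometry satisfies the distance and angle cutoffs.

    Coordinates are in nm. The angle (degrees) is measured at the donor,
    between the donor-hydrogen and donor-acceptor vectors (assumed convention).
    """
    v_dh = hydrogen - donor
    v_da = acceptor - donor
    dist = np.linalg.norm(v_da)
    cos_angle = np.dot(v_dh, v_da) / (np.linalg.norm(v_dh) * dist)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(dist <= max_dist and angle <= max_angle)

# Example geometries: linear and close, bent by 90 degrees, and too far apart.
donor = np.array([0.0, 0.0, 0.0])
hydrogen = np.array([0.1, 0.0, 0.0])
linear = is_hydrogen_bond(donor, hydrogen, np.array([0.3, 0.0, 0.0]))
bent = is_hydrogen_bond(donor, hydrogen, np.array([0.0, 0.3, 0.0]))
far = is_hydrogen_bond(donor, hydrogen, np.array([0.4, 0.0, 0.0]))
```

Evaluating the criterion for each donor-acceptor pair in every saved frame gives an existence map of the kind shown in Figure 4.7.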

The occurrence of the hydrogen bonds observed in the crystal structure between the 

FMN domain and the isoalloxazine ring of the FMN cofactor is reported in Figure 4.7 for the 

simulations of FOX (in blue) and FHQ (in red). NMR spectroscopy studies of the protein in 

solution have also provided evidence of a hydrogen-bonding network involving the N1, C2O, N3, C4O 

and N5 atoms of the isoalloxazine ring.[14,31] In the crystal structure, the hydrogen atom 

from the backbone amino group (NH) of Asn537 was involved in a hydrogen bond 

with the N5 atom of the isoalloxazine ring (NH – N5) with a distance of 0.175 nm. The 

same hydrogen bond was observed in the first 15 ns of the FHQ simulation (Figure 4.7a). 

However, because the N5 atom in FHQ is a hydrogen donor owing to its protonation, a hydrogen 

bond with the oxygen from the side chain carboxamide group (-CONH2) of Asn537 

(-(NH2)CO – HN5) was observed in the last 25 ns of the simulation. 

Figure 4.6: (a) and (b) show the FMN binding site from the representative structures of the first 

cluster for FOX and FHQ, respectively. The represented residues are within 0.4 nm from the FMN 

cofactor (in black). FMN binding loops Lβ1, Lβ3 and Lβ4 are shown as the ribbon of yellow, pink 

and cyan, respectively. The residues are labeled in red, blue and green as the part of Lβ1, Lβ3 and 

Lβ4, respectively. Dashed lines show hydrogen bonds between isoalloxazine and surrounding 

residues. The underlined labels indicate the residues that have major change in conformation after 

the change in the redox state of FMN cofactor. 

In the crystal structure and simulations, the stable hydrogen bonds were observed 

at oxygen (O2) and nitrogen (N3H) of isoalloxazine ring with the hydrogen of backbone 

amino group of Gln579 (NH – O2) and the oxygen from backbone carbonyl group (CO) of 

Thr577 (N3H – OC), respectively. O4 position was involved in hydrogen bond formation 

with hydrogen of hydroxyl group of Thr577 (OH – O4) in the first 15 ns simulation of FOX 

but it occurred throughout the whole FHQ simulation. Atom N1 was observed to form 

hydrogen bond with backbone amino group of Asp571 (NH – N1) in the crystal structure 

and FOX. However, in the FHQ simulation, the occurrence of this bond (NH – N1H) was very 

low. 

Figure 4.7: Hydrogen bond existence between FMN binding residues and the isoalloxazine ring (a) 

and the ribityl side chain (b) of the FMN cofactor throughout the MD simulations, calculated using a frame 

every 50 ps. Blue and red lines show hydrogen bond occurrences in FOX and FHQ, 

respectively, as a function of time. The hydrogen bond partners are labeled on the Y-axis. 

The change in the protonation state of the FMN cofactor affects its binding in the FMN domain. 

FHQ strengthens the hydrogen-bonding network between the ribityl side chain and phosphate 

moiety of the FMN cofactor and the domain. In FHQ, the phosphate group of the FMN cofactor forms 

stronger hydrogen bonds with the residues of the Lβ1 loop than in FOX (shown in Figure 4.7b). 

The first hydroxyl group of the ribityl side chain was involved in strong hydrogen bonding with 

the carbonyl oxygen of Ser537 (OH – OC) in FOX and FHQ. The third hydroxyl group of the 

ribityl side chain shows a stronger hydrogen bond with the Thr492 hydroxyl in FOX than in FHQ. 

The latter was also observed to form a hydrogen bond with the carbonyl oxygen of Cys569 in 

FHQ. 

In FHQ, the tighter binding of the FMN cofactor, with a stronger hydrogen-bonding 

network between flavin and protein than in FOX, induces a conformational change in the FMN 

binding loops Lβ1, Lβ3 and Lβ4. The major conformational change was observed in the 

orientation of the Trp574 residue due to the protonated N5 position of the isoalloxazine ring in 

FHQ (Figure 4.6b). The indole ring of Trp574 was nearly coplanar to the isoalloxazine ring 

of the FMN cofactor in the crystal structure.[17] Experimentally, the Trp574 conformation was 

observed to be critical to FMN binding and found to be involved in ET tunneling from 

FMN to HEME. In FOX, the indole ring of Trp574 remains in the same conformation as in 

the crystal. In FHQ, it rotates to another configuration that is not aligned with the isoalloxazine ring. 

This rotation is a consequence of the steric hindrance induced by the conformational change 

in the reduced isoalloxazine ring. 

The changes in the volume, hydrophobicity, solvent accessibility and polarity of the FMN 

binding site during the simulations of FOX (in black), FHQ (in red) and APO (in green) are 

reported in Figure 4.8a, 4.8b, 4.8c and 4.8d, respectively. In APO, the absence of the FMN 

cofactor promotes a rearrangement of the FMN binding site. After the rearrangement, the side 

chains of the amino acids of the FMN binding loops replaced the initial water molecules and 

occupied the cavity after ~5 ns of simulation. FOX shows larger variation in the geometric 

properties of the FMN binding pocket than FHQ. Major changes were observed in the volume of the 

FMN binding pocket after the change in protonation state of the FMN cofactor, with averages of 

429 ± 86 and 357 ± 45 for FOX and FHQ, respectively. In FHQ, the pocket volume showed 

less variation than in FOX (see Figure 4.8a). The hydrophobicity (Figure 4.8b) and polarity 

(Figure 4.8d) of the FMN pocket are slightly perturbed in the first 15 ns of simulation and then 

converge to the same values in both the FHQ and FOX simulations. 

Figure 4.8: FMN binding pocket (a) volume, (b) hydrophobicity, (c) solvent accessibility 

and (d) polarity for FOX (in black), FHQ (in red) and APO (in green). 

4.3.4. Conservation profile of FMN binding site 

In Table 4.2, the summary of the MSA for the FMN domain of P450BM-3 and its homologous 

structures is reported (in the SI, see Figure S4.2 for the MSA and Figure S4.3 for the conservation patterns mapped 

on the FMN domain of P450BM-3). The ribityl side-chain binding region is the most 

conserved region in the FMN domain, since it is responsible for the tight binding of the FMN 

cofactor. Among the cytochrome P450 reductases (CPR), that of P450BM-3 has the largest 

FMN binding site volume, with the highest hydrophobicity and a low polarity. The solvent 

accessibility of its FMN binding pocket is intermediate among the homologous 

proteins. Together, these differences in the properties of the FMN binding pocket of P450BM-3 

result in a better catalytic turnover than in other P450 monooxygenases.[14,17] 

Table 4.2: Summary of the MSA of the FMN domain of P450BM-3 and its homologous structures 

and the characterization of their FMN binding pockets. 

                                              Max. RMSD  Sequence      FMN binding pocket
PDB Id  Protein*   Organism                   (nm)       identity (%)  Volume  Hydrophobicity  Solvent accessibility  Polarity
1BVY    CPR a      Bacillus megaterium        0.000      100.00        601.82  358.17          16.69                  11
1B1C    CPR a      Homo sapiens               0.134      30.26         508.28  299.54          25.26                  11
1JA1    CPR a      Rattus norvegicus          0.135      30.26         594.28  313.50          24.65                  12
2BF4    CPR a      Saccharomyces cerevisiae   0.135      25.00         554.08  354.37          11.29                  12
1YKG    SiR-FP b   Escherichia coli           0.172      19.86         639.03  354.39          20.88                  15
3HR4    NOS c      Homo sapiens               0.144      23.68         466.78  258.09          14.44                  12
1F4P    Fld d      Desulfovibrio vulgaris     0.214      23.13         558.02  340.19          10.82                  15

*CPR a: cytochrome P450 reductase, SiR-FP b: sulfite reductase, NOS c: nitric oxide synthase, Fld d: flavodoxin 

4.3.5. Principal component analysis of FMN domain 

The cumulative relative positional fluctuation (RPF) of the first 20 eigenvectors accounts 

for 82 %, 79 % and 75 % of the total RPF in the simulations of FOX, FHQ and APO, 

respectively. The convergence of the trajectories has been analyzed by comparing the root mean square inner product (RMSIP) 

values for the first 20 eigenvectors obtained from the PCA of the MD trajectories (50 ns). The 

RMSIP values calculated between the two halves of the trajectories were 0.563, 0.624 and 

0.594 for the simulations of FOX, FHQ and APO, respectively. The relatively high values of 

the RMSIP indicate a good convergence of the essential eigenvectors. 
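
As a reminder of how this overlap measure is commonly defined, a minimal sketch is given below. It assumes the usual definition, in which the squared inner products between all pairs of eigenvectors from two sets are summed, divided by the number of eigenvectors in a set, and the square root is taken. The function name `rmsip` and the row-wise storage of the eigenvectors are choices made only for this example.

```python
import numpy as np

def rmsip(eigvecs_a, eigvecs_b):
    """Root mean square inner product between two sets of eigenvectors.

    eigvecs_a, eigvecs_b: arrays of shape (n_vectors, n_coordinates) whose
    rows are orthonormal eigenvectors (for example the first 20 of each half
    of a trajectory). Returns 1.0 for identical subspaces and 0.0 for
    orthogonal ones.
    """
    overlaps = eigvecs_a @ eigvecs_b.T            # all pairwise inner products
    return float(np.sqrt(np.sum(overlaps ** 2) / eigvecs_a.shape[0]))

# Example with unit vectors of a 4-dimensional space.
basis = np.eye(4)
same = rmsip(basis[:2], basis[:2])        # identical subspaces
disjoint = rmsip(basis[:2], basis[2:])    # orthogonal subspaces
partial = rmsip(basis[:2], basis[1:3])    # one shared direction out of two
```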

The first two eigenvectors cover 50 %, 41 % and 32 % of the total fluctuations in FOX, FHQ and 

APO, respectively (with contributions of 32 %, 29 % and 20 % from the first eigenvector alone). 

The inner product (IP) values for the first two eigenvectors obtained 

from the inner product matrix of two trajectories are reported in Table 4.3. The inner 

product of the first eigenvector in all simulations was found to be less than 0.350, which 

shows that the most important essential mode is different for the three systems. 
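
The fraction of the total fluctuation carried by each eigenvector can be illustrated with the short sketch below. It assumes that the PCA is performed by diagonalizing the covariance matrix of the aligned Cartesian coordinates and that the fraction associated with an eigenvector is its eigenvalue divided by the sum of all eigenvalues. The function name `pca_fractions` is hypothetical.

```python
import numpy as np

def pca_fractions(trajectory):
    """PCA of an aligned trajectory of shape (n_frames, n_atoms, 3).

    Returns (eigenvalues, eigenvectors, cumulative_fraction), sorted by
    decreasing eigenvalue; eigenvectors are stored as columns.
    """
    x = trajectory.reshape(len(trajectory), -1)
    x = x - x.mean(axis=0)                     # fluctuations around the mean
    covariance = x.T @ x / len(x)
    eigvals, eigvecs = np.linalg.eigh(covariance)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative noise
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, eigvecs[:, order], cumulative

# Example: one atom moving mostly along x and less along y.
frames = np.array([[[2.0, 0.0, 0.0]],
                   [[-2.0, 0.0, 0.0]],
                   [[0.0, 1.0, 0.0]],
                   [[0.0, -1.0, 0.0]]])
values, vectors, cumulative = pca_fractions(frames)
```

In this toy example the first eigenvector carries 80 % of the total fluctuation and the first two together carry all of it.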

Table 4.3: RMSIP values of the first 2 eigenvectors obtained from the last 50 ns trajectories 

of two different simulations. 

Inner product (FOX/FHQ, FOX/APO, FHQ/APO) 

                   1st eigenvector   2nd eigenvector
1st eigenvector    0.143             0.400

Figure 4.9a shows the RMSF of the backbone atoms along the first and second 

eigenvectors of FOX (black), FHQ (red) and APO (green). The corresponding three-dimensional 

representations obtained after the projection of the MD trajectories on the first and second 

eigenvectors are reported in Figure 4.9b, 4.9c and 4.9d for FOX, FHQ and APO, respectively. 

In all simulations, the residues of the C-terminus and the long loop regions Lβ2 and Lα2, which are 

located opposite to the FMN binding site, show higher fluctuations and together constitute the 

collective motions in the first eigenvector. In the first eigenvector of FOX, the collective 

motion is restricted to the Lβ2, Lβ3 and Lα2 loops and the C-terminal region, which show higher 

fluctuations. In the second eigenvector of FOX, the higher fluctuations were shown by the C-terminal 

residues and the Lα1, Lβ2 and Lα2 loops. FHQ shows higher fluctuations in the Lα1, Lα2 and Lβ4 

loops and in the C- and N-termini in the first eigenvector. The second eigenvector of FHQ shows 

higher fluctuations in the Lα2, Lβ2 and Lβ4 loops. The first eigenvector of APO shows higher 

fluctuations in the Lβ1, Lβ2 and Lα2 loops and the C-terminus, while the second eigenvector shows 

them in the Lα2 loop and in all the FMN binding loops. Among the FMN binding loops, the higher fluctuations 

that constitute the collective motion in the first eigenvector were observed in the inner loop Lβ3 for 

FOX, the outer loop Lβ4 for FHQ and Lβ1 for APO. 

Figure 4.10 shows the crystallographic structure of the complex of the HEME domain with the FMN 

domain, with labeled helices, cofactors and FMN binding loops. In the crystal structure, the 

α1 helix is involved in direct or water-mediated contacts with the HEME domain, and the outer 

FMN binding loop (Lβ4) interacts with the peptide preceding the HEME binding loop (K/L 

loop).[18] These interaction sites are crucial for the ET from FMN to HEME. The higher 

fluctuations observed in the Lβ4 and α1 helix regions in the first eigenvector of FHQ might be 

related to the inhibition of electron transfer from FMN to HEME by the reduced state of the FMN 

cofactor, as observed experimentally.[11] In the first eigenvector of FOX, the higher 

fluctuations were restricted to the residues of the inner FMN binding loop Lβ3 and of the loops 

Lα2 and Lβ2, located opposite to the FMN binding site. This region lies opposite to 

the probable HEME binding surface in the crystal structure, so the collective 

motion of this region in FOX might be related to the electron transfer from FAD 

to FMN, and the region could be the probable binding site for the FAD domain. In APO, the local structure 

remains conserved, with highly flexible FMN binding loops that help the apo-protein to rebind the FMN cofactor 

and to work again as holo-protein, as found experimentally.[41] 

Figure 4.9: (a) RPF associated with the eigenvectors. Vertical bars in grey show the loop 

regions. FMN binding loops are shown as black horizontal bars. Representation of the RMSF of the 

protein backbone atoms along the first and second eigenvectors after projection of the trajectories of 

FOX (black), FHQ (red) and APO (green) on the corresponding eigenvectors. The 10 sequential 

frames representing the extent of the fluctuations in the FOX (b), FHQ (c) and APO (d) trajectories 

along the first and second eigenvectors are reported. The first extreme is shown in blue and the 

last extreme in cyan. Loops and helices are labeled. Labels in red show the FMN binding loops. N 

and C indicate the N- and C-terminus of the protein. 

Figure 4.10: FMN domain (in pink) complex with P450 HEME domain (in blue) in crystal 

structure (1BVY[18]). HEME cofactor represented in orange and FMN cofactor in green. Helices and 

N- and C-terminus are labeled in both domains. Labels Lβ1, Lβ3 and Lβ4 show the FMN cofactor 

binding loops in the FMN domain. 

4.3.6. FMN cofactor: structural and dynamical properties 

The conformational changes of the FMN cofactor induced by the surrounding 

protein environment were studied for both redox states. The RMSD and RMSF of the phosphate 

group atoms for both states show higher fluctuations and deviations than those of the other FMN 

cofactor heavy atoms (see Figure 4.11a and 4.11b). Furthermore, the phosphate group of the 

FMN cofactor in the oxidized state deviates more from the crystal structure, and with higher 

fluctuations, than it does in the reduced state. This is consistent with the observed 

variations of the hydrogen bonding network between the FMN cofactor and the protein. 

Figure 4.11: (a) RMSF and RMSD of heavy atoms calculated with respect to the crystal 

structure. The vertical line shows the beginning of the ribityl side chain. (b) Schematic diagram of the 

FMN cofactor with the numbering used in the plots (4.11a) for the atomic positions of heavy 

atoms. 

Figure 4.12a shows the distribution of the δ angle of the isoalloxazine ring 

(see Figure 4.2) in FOX and FHQ. For the oxidized state of the FMN cofactor, the values are 

normally distributed in the range from 170° to 180° with the peak centered at 177°. In the 

reduced state, the distribution of δ has a larger width, 

ranging from 154° to 171° with the peak at 162°. For the latter case, the 

average value is consistent with quantum mechanical calculations in vacuum of the 

isoalloxazine ring in the reduced state[22,32], which give a value of δ = ~160°. 

Figure 4.12: Distribution of (a) the δ angle calculated at the N5-N10 axis of the isoalloxazine ring along the 

50 ns simulations of FOX (in black) and FHQ (in red) with a bin width of 0.3 and (b) the 

beginning-to-end distance of the ribityl side chain along the 50 ns simulations of FOX and FHQ (bin width of 0.007) 

and in X-ray (in green) and NMR (in blue) homologous structures of the FMN 

domain. 

Figure 4.12b shows the distribution of the beginning-to-end distance of the ribityl side 

chain of the FMN cofactor in FOX, FHQ and the homologous structures of the FMN domain with the 

FMN cofactor in the oxidized state. For the crystallographic and NMR homologous structures, 

the distance ranges from 0.73 to 0.80 nm and from 0.70 to 0.81 nm, respectively. The distance 

observed in the crystal structure of the FMN domain was 0.77 nm. The distances obtained from 

the FOX and FHQ simulations are distributed in the range of 0.61 to 0.90 nm with the main 

peaks at 0.78 nm and 0.80 nm, respectively. The beginning-to-end distances of the ribityl side 

chain in the simulations and in the NMR and crystallographic studies are thus consistent and 

distributed in the same range. 

4.3.7. Cluster analysis of FMN cofactor 

The cluster analysis was performed on the heavy atoms of the FMN cofactor in the protein environment, using the 

crystal structure as reference and a cutoff of 0.04 nm. The first 

cluster comprises 87 % and 99 % of the population, out of a total of 13 and 8 clusters, in FOX and FHQ, 

respectively. The cumulative number of clusters obtained from the different 

simulations as a function of time is reported in Figure S4.4 of the SI. Both simulations 

reached a plateau, which indicates a sufficient sampling of the conformational space along the 

trajectories. The representative structures of the first cluster of FOX (in black) and FHQ (in red) 

are reported in Figure S4.4 in SI. The conformational flexibility of the ribityl side chain was 

observed to be mainly responsible for the conformational diversity of the clusters in both 

simulations. The reduced number of clusters in the FHQ simulation is consistent with the 

small RMSF of the ribityl side chain. The preferred conformation of the FMN 

cofactor in both redox states is characterized by a partially elongated ribityl side chain (see 

Figure S4.4 in SI). 

4.3.8. Principal component analysis of FMN cofactor 

The first eight eigenvectors cover ~79 % of the total RPF in both simulations. In 

the protein environment, the first eigenvector covers ~25 % and ~27 % of the RPF in FOX and 

FHQ, respectively. The superimposition of the first and last extreme structures of the 

isoalloxazine ring generated by the projection of the trajectory of FOX on the first eight 

eigenvectors is shown in Figure 4.13. In the first eigenvector, a symmetric bending 

mode of the ring is present in both simulations. However, the reduced state (Figure 

4.13 (1b)) manifests a “butterfly wing” bending mode around the N5-N10 axis. This type 

of vibrational mode has also been reported in previous experimental and quantum 

mechanical studies.[21,22,32] The other seven modes show similar eigenvectors in both 

states. The second most dominant motion is the twisting of the isoalloxazine ring along the 

main isoalloxazine axis. The observed collective modes are in qualitative agreement 

with the vibrational normal modes obtained from resonance Raman spectroscopy 

measurements and QM calculations for lumiflavin.[23,32] In addition, surface 

enhanced resonance Raman scattering studies of the FMN cofactor, free and in the FMN domain, 

provide evidence of vibrational modes resulting from the displacement of 

atoms such as C4, O, C4a, C10a, C5a and C9a.[33] These experimental observations are 

consistent with the higher fluctuations of these atoms observed in the PC 

modes from the first 8 eigenvectors of our simulations. 

Figure 4.13: The superimposition of two extreme structures generated after projecting the FOX 

trajectory on the first eight eigenvectors (structures from 1 - 8). 1a and 1b show the different 

collective motion of the 1st eigenvector in the oxidized and reduced state, respectively. 

4.4. Discussion and conclusions 

MD simulations have been performed on the FMN binding reductase domain of the 

monooxygenase P450BM-3 with the FMN cofactor in the oxidized and reduced states to 

understand the effect of the change in protonation state of the isoalloxazine ring on the 

conformation and dynamics of the FMN domain and cofactor. The results of the simulations 

showed that the change of the protonation state in the reduced FMN affects the overall 

structure and dynamics of the FMN domain in solution. In particular, the structural and 

dynamic properties of the re-face FMN binding loop (Lβ3) are strongly influenced by the 

change in protonation of the FMN cofactor (in FOX). In the apo-protein, the overall local 

structure of the protein remains conserved but higher fluctuations were observed in the FMN 

binding loops. The latter effect can explain the experimental finding of reversible rebinding 

of the FMN cofactor in the apo-protein.[31,34,35] The loop Lβ2 was observed to contribute 

mainly to the collective modes of the FMN domain as holo-protein or apo-protein, which is 

also in agreement with the solution structure of the flavodoxin-like domain of E. coli 

determined by NMR.[35] The inner FMN binding loop (Lβ3) contributed to the prominent 

collective mode of the FMN domain in the oxidized state, while the outer FMN binding loop (Lβ4) 

contributed to the prominent collective mode of the FMN domain in the reduced state and in the apo-protein. 

In FHQ, a major conformational change of the FMN binding site residue Trp574 

was observed. Trp574 is critical to FMN cofactor binding and electron transfer in P450BM-3. 

In FHQ, this residue does not remain coplanar to the isoalloxazine ring, avoiding the steric 

hindrance induced by the conformational change in the FMN cofactor upon protonation, as 

also suggested by 15N-NMR[31] and surface enhanced resonance Raman scattering[33] 

experiments. Hence, the change in the conformation of Trp574 might be the major factor 

that makes the reduced state kinetically unfavorable in P450BM-3 for transferring the 

electron from FMN to HEME, as observed in previous studies.[36] 

During the simulations, the FMN cofactor adopts different conformations that are 

mainly influenced by the movement of the ribityl side chain. The binding region of the ribityl 

side chain is evolutionarily more conserved. In general, the oxidized state was observed to be 

more flexible and to adopt different conformations in the protein environment. The latter might be 

the result of the change in the FMN binding site properties and hydrogen bond environment in 

the reduced state. The isoalloxazine ring of the FMN cofactor exhibits mainly 8 collective 

motions. Except for the first mode, all modes were identical in both redox states. The FMN 

cofactor in the reduced state exhibits the so-called “butterfly motion” as the first 

collective motion, due to the bending of the isoalloxazine ring along the N5-N10 axis. 



In summary, we have analyzed for the first time the dynamics of the FMN binding
domain of P450BM-3 in water. In particular, we have studied the effect of FMN binding on
the fluctuation modes of the FMN domain. The FMN cofactor is involved in electron transfer
in P450BM-3, and its dynamics can play an important role in this process. The results of
our study indicate a difference in the fluctuation amplitude of the FMN cofactor between
the redox states. This difference results from the change in the conformation of the
FMN binding site caused by the protonation state of the isoalloxazine ring.


1. Chefson A, Auclair K (2006) Progress towards the easier use of P450 enzymes. Mol 

Biosyst 2: 462-469. 

2. Wong LL (1998) Cytochrome P450 monooxygenases. Curr Opin Chem Biol 2: 263- 

268. 

3. Guengerich FP (2001) Common and uncommon cytochrome P450 reactions related 

to metabolism and chemical toxicity. Chem Res Toxicol 14: 611-650. 

4. Kumar S (2010) Engineering cytochrome P450 biocatalysts for biotechnology, 

medicine and bioremediation. Expert Opin Drug Metab Toxicol 6: 115-131. 

5. Warman AJ, Roitel O, Neeli R, Girvan HM, Seward HE, et al. (2005) Flavocytochrome 

P450 BM3: an update on structure and mechanism of a biotechnologically important 

enzyme. Biochem Soc Trans 33: 747-753. 

6. Munro AW, Leys DG, McLean KJ, Marshall KR, Ost TW, et al. (2002) P450 BM3: the 

very model of a modern flavocytochrome. Trends Biochem Sci 27: 250-257. 

7. Girvan HM, Waltham TN, Neeli R, Collins HF, McLean KJ, et al. (2006) 

Flavocytochrome P450 BM3 and the origin of CYP102 fusion species. Biochem Soc 

Trans 34: 1173-1177. 

8. Jung ST, Lauchli R, Arnold FH (2011) Cytochrome P450: taming a wild type enzyme. 

Curr Opin Biotechnol 22: 809–817. 



9. Whitehouse CJC, Bell SG, Wong L-L (2012) P450BM3 (CYP102A1): connecting the 

dots. Chem Soc Rev 41: 1218-1260. 

10. Sevrioukova I, Peterson JA (1996) Domain-domain interaction in cytochrome 

P450BM-3. Biochimie 78: 744-751. 

11. Sevrioukova I, Shaffer C, Ballou DP, Peterson JA (1996) Equilibrium and transient 

state spectrophotometric studies of the mechanism of reduction of the flavoprotein 

domain of P450BM-3. Biochemistry 35: 7058-7068. 

12. Sevrioukova I, Truan G, Peterson JA (1996) The flavoprotein domain of P450BM-3: 

Expression, purification, and properties of the flavin adenine dinucleotide- and 

flavin mononucleotide-binding subdomains. Biochemistry 35: 7528-7535. 

13. Hazzard JT, Govindaraj S, Poulos TL, Tollin G (1997) Electron transfer between the 

FMN and heme domains of cytochrome P450BM-3. Effects of substrate and CO. J Biol 

Chem 272: 7922-7926. 

14. Narhi LO, Fulco AJ (1987) Identification and characterization of two functional 

domains in cytochrome P-450BM-3, a catalytically self-sufficient monooxygenase 

induced by barbiturates in Bacillus megaterium. J Biol Chem 262: 6683-6690. 

15. Pylypenko O, Schlichting I (2004) Structural aspects of ligand binding to and 

electron transfer in bacterial and fungal P450s. Annu Rev Biochem 73: 991-1018. 

16. Chen HC, Swenson RP (2008) Effect of the Insertion of a Glycine Residue into the 

Loop Spanning Residues 536-541 on the Semiquinone State and Redox Properties of 

the Flavin Mononucleotide-Binding Domain of Flavocytochrome P450BM-3 from 

Bacillus megaterium. Biochemistry 47: 13788-13799. 

17. Sevrioukova IF, Hazzard JT, Tollin G, Poulos TL (1999) The FMN to heme electron 

transfer in cytochrome P450BM-3 - Effect of chemical modification of cysteines 

engineered at the FMN-heme domain interaction site. J Biol Chem 274: 36097- 

36106. 



1863-1868. 





Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen: 1. 




21. Zheng Y-J, Ornstein RL (1996) A Theoretical Study of the Structures of Flavin in 

Different Oxidation and Protonation States. J Am Chem Soc 118: 9402-9408. 

22. Walsh JD, Miller AF (2003) Flavin reduction potential tuning by substitution and 

bending. J Mol Struc-Theochem 623: 185-195. 

23. Abe M, Kyogoku Y (1987) Vibrational Analysis of Flavin Derivatives - Normal 

Coordinate Treatments of Lumiflavin. Spectrochim Acta A 43: 1027-1037. 

24. (2010) ACD/ChemSketch Freeware, version 12.01, Advanced Chemistry
Development, Inc, Toronto, ON, Canada, www.acdlabs.com.

25. Schmidtke P, Bidon-Chanal A, Luque FJ, Barril X (2011) MDpocket : Open Source 

Cavity Detection and Characterization on Molecular Dynamics Trajectories. 

Bioinformatics 27: 3276-3285. 

26. (2012) The PyMOL Molecular Graphics System, Version 1.5.0.4 Schrödinger, LLC,
http://www.pymol.org/.

27. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment 

search tool. J Mol Biol 215: 403-410. 

28. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Jr., Brice MD, et al. (1977) The 

Protein Data Bank: a computer-based archival file for macromolecular structures. J 

Mol Biol 112: 535-542. 

29. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF 

Chimera--a visualization system for exploratory research and analysis. J Comput 

Chem 25: 1605-1612. 



2637. 



31. Kasim M, Chen HC, Swenson RP (2009) Functional characterization of the re-face 

loop spanning residues 536-541 and its interactions with the cofactor in the flavin 

mononucleotide-binding domain of flavocytochrome P450 from Bacillus 

megaterium. Biochemistry 48: 5131-5141. 

32. Nakai S, Yoneda F, Yamabe T (1999) Theoretical study on the lowest-frequency 

mode of the flavin ring. Theor Chem Acc 103: 109-116. 

33. Macdonald IDG, Smith WE, Munro AW (1999) Analysis of the structure of the flavin 

binding sites of flavocytochrome P450BM3 using surface enhanced resonance 

Raman scattering. Eur Biophys J 28: 437-445. 

34. Wittung-Stafshede P (2002) Role of cofactors in protein folding. Acc Chem Res 35: 

201-208. 

35. Sibille N, Blackledge M, Brutscher B, Coves J, Bersch B (2005) Solution structure of 

the sulfite reductase flavodoxin-like domain from Escherichia coli. Biochemistry 44: 

9086-9095. 

36. Klein ML, Fulco AJ (1993) Critical residues involved in FMN binding and catalytic 

activity in cytochrome P450BM-3. J Biol Chem 268: 7553-7561. 


Roccatano D. Journal of Chemical Theory and Computation DOI: 10.1021/ct300723x.’ 


PART II: P450BM-3 Reductase Domain SI 

Supporting Information 

Conformational Dynamics of the FMN-binding Reductase 

Domain of Monooxygenase P450BM-3 

Table S4.1: Force field parameters for FMN cofactor in oxidized state for GROMOS96 43a1 

force field.[1] 

Atom number Atom type Atom name Charge group Partial charge 

1 C FC9A 1 0.200 

2 NR FN10 1 -0.200 

3 C FC10A 2 0.360 

4 NR FN1 2 -0.360 

5 C FC2 3 0.380 

6 O FO2 3 -0.380 

7 NR FN3 4 -0.280 

8 H FH3 4 0.280 

9 C FC4 5 0.380 

10 O FO4 5 -0.380 

11 C FC4A 6 0.180 

12 NR FN5 6 -0.280 

13 C FC5A 6 0.100 

14 CR1 FC6 7 0.000 

15 C FC7 8 0.000 



16 CH3 FCM7 8 0.000 

17 C FC8 9 0.000 

18 CH3 FCM8 9 0.000 

19 CR1 FC9 10 0.000 

20 CH2 FCA 11 0.000 

21 CH1 FCB 12 0.150 

22 OA FOB 12 -0.548 

23 H FHB 12 0.398 

24 CH1 FCG 13 0.150 

25 OA FOG 13 -0.548 

26 H FHG 13 0.398 

27 CH1 FCD 14 0.150 

28 OA FOD 14 -0.548 

29 H FHD 14 0.398 

30 CH2 FCE 15 0.150 

31 OA FOZ 15 -0.36 

32 P FPH 15 0.630 

33 OA FOH 15 -0.548 

34 H FHH 15 0.398 

35 OM FOT1 15 -0.635 

36 OM FOT2 15 -0.635 

Dihedral parameters 

ai aj ak al function c0 c1 c2 

13 2 1 3 1 180.000 33.5 2 

1 2 3 11 1 180.000 33.5 2 

3 11 12 13 1 180.000 33.5 2 

11 13 12 1 1 180.000 33.5 2 

2 20 21 24 1 0.000 5.86 3 

20 22 21 23 1 0.000 1.26 3 

20 21 24 27 1 0.000 5.86 3 



21 24 25 26 1 0.000 1.26 3 

21 24 27 30 1 0.000 5.86 3 

24 27 28 29 1 0.000 1.26 3 

24 30 27 31 1 0.000 5.86 3 

27 30 31 32 1 0.000 3.77 3 

30 31 32 33 1 0.000 1.05 3 

31 33 32 34 1 0.000 1.05 3 

Improper dihedral parameters 

ai aj ak al function c0 c1 

1 2 19 13 2 0.0 167.42 

1 12 14 13 2 0.0 167.42 

1 13 14 15 2 0.0 167.42 

1 19 17 15 2 0.0 167.42 

1 2 3 4 2 180.0 167.42 

1 12 13 11 2 0.0 167.42 

1 14 13 12 2 180.0 167.42 

1 2 3 11 2 0.0 167.42 

1 17 19 18 2 180.0 167.42 

2 1 13 12 2 0.0 167.42 

2 11 3 12 2 0.0 167.42 

2 1 13 14 2 180.0 167.42 

2 3 11 9 2 180.0 167.42 

2 19 1 17 2 180.0 167.42 

2 4 3 5 2 180.0 167.42 

2 1 3 20 2 0.0 167.42 

3 2 1 19 2 180.0 167.42 

3 5 4 7 2 0.0 167.42 

3 4 5 6 2 180.0 167.42 

3 11 9 7 2 0.0 167.42 

3 2 4 11 2 0.0 167.42 



3 12 9 11 2 0.0 167.42 

3 11 12 13 2 0.0 167.42 

3 1 2 13 2 0.0 167.42 

3 9 11 10 2 180.0 167.42 

4 11 3 9 2 0.0 167.42 

4 5 7 9 2 0.0 167.42 

4 5 7 8 2 180.0 167.42 

4 3 2 11 2 180.0 167.42 

4 11 3 12 2 180.0 167.42 

5 4 7 6 2 0.0 167.42 

5 4 3 11 2 0.0 167.42 

5 9 7 11 2 0.0 167.42 

5 9 7 10 2 180.0 167.42 

6 7 5 8 2 0.0 167.42 

6 5 7 9 2 180.0 167.42 

7 9 5 8 2 0.0 167.42 

7 9 10 11 2 180.0 167.42 

8 7 9 20 2 0.0 167.42 

8 7 9 11 2 180.0 167.42 

8 10 11 12 2 180.0 167.42 

9 7 11 10 2 0.0 167.42 

9 12 11 13 2 180.0 167.42 

12 1 13 19 2 180.0 167.42 

12 11 9 7 2 180.0 167.42 

12 13 14 15 2 180.0 167.42 

13 19 1 17 2 0.0 167.42 

13 15 14 17 2 0.0 167.42 

13 15 14 16 2 180.0 167.42 

13 12 1 14 2 0.0 167.42 

14 12 13 11 2 180.0 167.42 



14 1 13 19 2 0.0 167.42 

14 17 15 19 2 0.0 167.42 

14 15 17 18 2 180.0 167.42 

15 17 14 16 2 0.0 167.42 

16 15 17 18 2 0.0 167.42 

17 15 19 18 2 0.0 167.42 

19 17 15 16 2 180.0 167.42 

20 24 22 21 2 35.0 167.42 

21 27 25 24 2 35.0 167.42 

24 30 28 27 2 35.0 167.42 
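
As a consistency check on the atom list of Table S4.1, the partial charges can be summed per charge group. The short Python sketch below transcribes the atom name, charge group and partial charge columns of the table; the tuple layout and the names `FMN_OX` and `charge_per_group` are illustrative only and are not part of the original topology files.

```python
from collections import defaultdict

# (atom name, charge group, partial charge), transcribed from Table S4.1
FMN_OX = [
    ("FC9A", 1, 0.200), ("FN10", 1, -0.200), ("FC10A", 2, 0.360), ("FN1", 2, -0.360),
    ("FC2", 3, 0.380), ("FO2", 3, -0.380), ("FN3", 4, -0.280), ("FH3", 4, 0.280),
    ("FC4", 5, 0.380), ("FO4", 5, -0.380), ("FC4A", 6, 0.180), ("FN5", 6, -0.280),
    ("FC5A", 6, 0.100), ("FC6", 7, 0.000), ("FC7", 8, 0.000), ("FCM7", 8, 0.000),
    ("FC8", 9, 0.000), ("FCM8", 9, 0.000), ("FC9", 10, 0.000), ("FCA", 11, 0.000),
    ("FCB", 12, 0.150), ("FOB", 12, -0.548), ("FHB", 12, 0.398),
    ("FCG", 13, 0.150), ("FOG", 13, -0.548), ("FHG", 13, 0.398),
    ("FCD", 14, 0.150), ("FOD", 14, -0.548), ("FHD", 14, 0.398),
    ("FCE", 15, 0.150), ("FOZ", 15, -0.36), ("FPH", 15, 0.630), ("FOH", 15, -0.548),
    ("FHH", 15, 0.398), ("FOT1", 15, -0.635), ("FOT2", 15, -0.635),
]


def charge_per_group(atoms):
    """Return {charge group: summed partial charge}, rounded to 3 decimals."""
    totals = defaultdict(float)
    for _name, group, charge in atoms:
        totals[group] += charge
    return {group: round(total, 3) for group, total in totals.items()}


groups = charge_per_group(FMN_OX)
net_charge = round(sum(charge for _name, _group, charge in FMN_OX), 3)

for group in sorted(groups):
    print(f"charge group {group:2d}: {groups[group]:+.3f}")
print(f"net charge: {net_charge:+.3f}")
```

With the values as listed, every charge group of the ring and the ribityl chain is neutral, and only the phosphate-containing group 15 carries a net charge, so the cofactor as a whole has a charge of -1.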

Table S4.2: Force field parameters for FMN cofactor in reduced state for GROMOS96 43a1 

force field.[1] 

Atom number Atom type Atom name Charge group Partial charge


1 C FC9A 1 0.1 

2 NR FN10 1 -0.2 

3 C FC10A 1 0.1 

4 NR FN1 2 -0.28 

5 H FH1 2 0.28 

6 C FC2 3 0.38 

7 O FO2 3 -0.38 

8 NR FN3 4 -0.28 

9 H FH3 4 0.28 

10 C FC4 5 0.38 

11 O FO4 5 -0.38 

12 C FC4A 6 0.00 

13 NR FN5 7 -0.28 

14 H FH5 7 0.28 



15 C FC5A 8 0.00 

16 CR1 FC6 9 0.00 

17 C FC7 10 0.00 

18 CH3 FCM7 10 0.00 

19 C FC8 11 0.00 

20 CH3 FCM8 11 0.00 

21 CR1 FC9 12 0.00 

22 CH2 FCA 13 0.00 

23 CH1 FCB 14 0.15 

24 OA FOB 14 -0.548 

25 H FHB 14 0.398 

26 CH1 FCG 14 0.15 

27 OA FOG 15 -0.548 

28 H FHG 15 0.398 

29 CH1 FCD 15 0.15 

30 OA FOD 16 -0.548 

31 H FHD 16 0.398 

32 CH2 FCE 16 0.15 

33 OA FOZ 17 -0.36 

34 P FPH 17 0.63 

35 OA FOH 17 -0.548 

36 H FHH 17 0.398 

37 OM FOT1 17 -0.635 

38 OM FOT2 17 -0.635 

Dihedral parameters


ai aj ak al function c0 c1 c2 

15 1 3 2 1 180.00 33.5 2 

1 12 2 3 1 180.00 33.5 2 

3 15 12 13 1 180.00 33.5 2 

12 13 1 15 1 180.00 33.5 2 



2 22 23 26 1 0.00 5.86 2 

22 24 23 25 1 0.00 1.26 2 

22 23 26 29 1 0.00 5.86 2 

23 26 27 28 1 0.00 1.26 2 

23 26 30 29 1 0.00 5.86 2 

26 29 30 31 1 0.00 1.26 2 

26 32 29 33 1 0.00 5.86 2 

29 32 33 34 1 0.00 3.77 2 

32 33 34 35 1 0.00 1.05 2 

33 35 34 36 1 0.00 1.05 2 

Improper dihedral parameters 

ai aj ak al function c0 c1 

1 15 2 21 2 5 167.42 

1 13 16 15 2 0 167.42 

1 15 16 17 2 0 167.42 

1 21 19 17 2 0 167.42 

1 2 3 4 2 160 167.42 

1 13 15 12 2 50 167.42 

1 16 15 13 2 180 167.42 

1 3 2 12 2 50 167.42 

1 19 21 20 2 180 167.42 

2 1 15 13 2 60 167.42 

2 12 3 13 2 60 167.42 

2 15 1 16 2 180 167.42 

2 3 12 10 2 180 167.42 

2 21 1 19 2 180 167.42 

2 4 3 6 2 180 167.42 

2 3 1 22 2 5 167.42 

3 1 2 21 2 160 167.42 

3 6 4 8 2 0 167.42 



3 4 6 7 2 180 167.42 

3 12 10 8 2 0 167.42 

3 2 4 12 2 5 167.42 

3 13 10 12 2 0 167.42 

3 12 13 15 2 50 167.42 

3 1 2 15 2 50 167.42 

4 12 3 10 2 0 167.42 

4 6 8 10 2 0 167.42 

4 6 8 9 2 180 167.42 

4 3 2 12 2 180 167.42 

4 12 3 13 2 180 167.42 

5 6 4 3 2 180 167.42 

6 4 8 7 2 0 167.42 

6 4 3 12 2 0 167.42 

6 10 8 12 2 0 167.42 

6 10 8 11 2 180 167.42 

7 8 6 9 2 0 167.42 

7 6 8 10 2 180 167.42 

8 10 6 9 2 0 167.42 

8 10 11 12 2 180 167.42 

9 8 10 11 2 0 167.42 

9 8 10 12 2 180 167.42 

10 8 12 11 2 0 167.42 

10 13 12 15 2 160 167.42 

11 12 10 3 2 180 167.42 

13 1 15 21 2 180 167.42 

13 10 12 8 2 180 167.42 

13 16 15 17 2 180 167.42 

14 15 13 12 2 135 167.42 

15 21 1 19 2 0 167.42 



15 17 16 19 2 0 167.42 

15 16 17 18 2 180 167.42 

15 13 1 16 2 0 167.42 

16 15 13 12 2 160 167.42 

16 1 15 21 2 0 167.42 

16 19 17 21 2 0 167.42 

16 17 19 20 2 180 167.42 

17 19 16 18 2 0 167.42 

18 17 19 20 2 0 167.42 

19 17 21 20 2 0 167.42 

21 19 17 18 2 180 167.42 

22 26 24 23 2 35 334.84 

23 29 27 26 2 35 334.84 

26 32 30 29 2 35 334.84 
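
The dihedral columns of Tables S4.1 and S4.2 can be read as energy terms. The sketch below assumes the GROMACS topology conventions, which the column layout (ai aj ak al function c0 c1 c2) suggests but which the tables do not state explicitly: function type 1 is a periodic dihedral with c0 the phase in degrees, c1 the force constant in kJ/mol and c2 the multiplicity, and function type 2 is a harmonic improper dihedral with c0 the reference angle in degrees and c1 the force constant in kJ/mol/rad^2. The function names are illustrative.

```python
import math


def proper_dihedral_energy(phi_deg, phase_deg, k, multiplicity):
    """Periodic dihedral (assumed function type 1): k * (1 + cos(n*phi - phase))."""
    phi = math.radians(phi_deg)
    phase = math.radians(phase_deg)
    return k * (1.0 + math.cos(multiplicity * phi - phase))


def improper_dihedral_energy(xi_deg, xi0_deg, k):
    """Harmonic improper (assumed function type 2): 0.5 * k * (xi - xi0)^2."""
    # Wrap the angular difference into [-180, 180) degrees before squaring.
    delta_deg = (xi_deg - xi0_deg + 180.0) % 360.0 - 180.0
    return 0.5 * k * math.radians(delta_deg) ** 2


# Ring dihedral row "180.000 33.5 2": minimum at the planar geometry.
e_planar = proper_dihedral_energy(0.0, 180.0, 33.5, 2)
e_twisted = proper_dihedral_energy(90.0, 180.0, 33.5, 2)
# Improper row "0.0 167.42": cost of a 10 degree out-of-plane distortion.
e_improper = improper_dihedral_energy(10.0, 0.0, 167.42)

print(f"{e_planar:.2f} {e_twisted:.2f} {e_improper:.2f} kJ/mol")
```

Under these assumptions, the non-zero reference angles of the improper dihedrals in Table S4.2 (for example 50, 60 or 160 degrees) shift the energy minimum away from a planar ring, which is how a bent isoalloxazine geometry can be encoded for the reduced state.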



Figure S4.1: Secondary structure per residue calculated by DSSP[2] along the trajectory as a 

function of time for (a) FOX, (b) FHQ and (c) APO. Color code represents different secondary 

structure elements. 



Figure S4.2: (a) Multiple structure alignment of the FMN domain (1BVY) and its homologous
structures (summarized in Table 2) with the conservation profile, root mean square deviation
(RMSD) and charge variation per residue, created using Chimera[3]. Green and yellow boxes
show the helices and beta strands, respectively, and purple boxes mark the residues involved
in FMN binding. (b) The phylogenetic tree of the FMN domain and its homologous structures
generated by ClustalW2[4].



Figure S4.3: Evolutionary conservation profile mapped on the FMN domain of P450BM-3 using
the RWB color scheme. Red regions show the highly conserved residues in the domain. The FMN
domain is shown in cartoon representation with the FMN cofactor in green and with labeled
helices and loop regions. FMN binding loops are labeled in blue. N and C denote the amino
and carboxy termini of the FMN domain.

Figure S4.4: Cumulative sum of the number of clusters obtained from the simulations. The
sampling of clusters was performed over 50 ns of FOX (black) and FHQ (red) using an RMSD
cutoff of 0.04 nm. The representative conformations of the FMN cofactor in the first cluster
of FOX (black) and FHQ (red) are shown.



References 






2637. 

3. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF 

Chimera--a visualization system for exploratory research and analysis. J Comput 

Chem 25: 1605-1612. 

4. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) Clustal 

W and Clustal X version 2.0. Bioinformatics 23: 2947-2948. 


Roccatano D. Journal of Chemical Theory and Computation DOI: 10.1021/ct300723x.’ 


PART II: P450BM-3 HEME/FMN Complex 

Chapter 5 

Insight into the redox partner interaction mechanism in 

cytochrome P450BM-3 using molecular dynamics 

simulation 

5.1. Abstract 

Flavocytochrome P450BM-3 is a soluble bacterial reductase composed of two flavin
(FAD/FMN) domains and one HEME domain. Understanding the atomic details of the
inter-domain electron transfer (ET) mechanism is a prerequisite for better exploitation of
the enzyme in biotechnological applications and for extending the knowledge of the P450
protein family in general. In this paper, we have performed molecular dynamics (MD)
simulations of the FMN and HEME domains, both isolated and in their crystallographic
complex, to study their binding modes and to gain insight into the structural determinants
of inter-domain ET. In the simulation of the complex, we observed conformational
rearrangements in both domains that reduce the separation between the FMN and HEME
cofactors. In particular, the closest FMN/HEME distance decreased from 1.84 nm (in the
crystal structure) to an average of 1.41 ± 0.09 nm during the simulation, with a minimum
distance of 1.02 nm. These distances are within the range for ET tunneling between the two
redox centers. The analysis of the possible ET pathways in the crystal complex indicates
Met490 of the ribityl tail binding loop of the FMN domain, and Ala399 and Cys400 of the
HEME domain, as possible mediators of ET. During the simulation, however, at an FMN/HEME
distance of ~1.41 nm, Phe393 takes part in ET along with Met490 instead of Ala399 and
Cys400, while at the minimum FMN/HEME distance of ~1.02 nm only Met490 mediates the ET
tunneling. The results of the simulations agree with the previously proposed hypothesis
that the crystal complex of the FMN/HEME domains is not in the optimal arrangement for the
electron transfer rate required under physiological conditions.


5.2. Introduction

Cytochrome P450s are regio- and stereoselective monooxygenases of the oxidoreductase
family that act on a wide variety of substrates.[1-3] They have been studied as potential
catalysts for the enzyme-mediated production of high-value oxygenated organic
molecules.[4-6] In particular, cytochrome P450BM-3 is an NADPH-dependent fatty acid
hydroxylase system isolated from the soil bacterium Bacillus megaterium.[7,8] This enzyme
is an attractive target and model system for biochemical and biomedical applications for
several reasons. First, it is a stable, catalytically self-sufficient protein with a
convenient multidomain structure that allows easier production and handling than other
monooxygenases of the same family. Second, it is a water-soluble enzyme with a high
catalytic efficiency and oxygenase rate, and it is readily expressed recombinantly.[9,10]
Third, it resembles eukaryotic diflavin reductase systems such as the human microsomal
P450s. As a pivotal member of its superfamily, it has been widely studied as an important
model system for understanding structure-function-dynamics relationships, supported by a
wealth of structural and kinetic data.[11,12]

P450BM-3, being a multidomain protein, has two reductase domains, the flavin adenine
dinucleotide (FAD)- and flavin mononucleotide (FMN)-binding domains, and a HEME domain,
arranged as HEME-FMN-FAD on a single polypeptide chain.[13,14] The main catalytic function
of P450s is to transfer an oxygen atom from molecular oxygen to their substrates. During
the reaction, the enzyme is reduced by NADPH: electrons are first transferred to the FAD
cofactor of the FAD-binding domain and then, mediated by the FMN cofactor of the
FMN-binding domain, to the HEME iron in the substrate-bound HEME domain. The
crystallization of the whole P450BM-3 protein has proven difficult because of the flexible
linker regions between the domains. However, the crystallographic structures of the
isolated HEME domain[15], the FAD domain[16] and a non-stoichiometric complex with one FMN
and two HEME domains[15] are available in the PDB database. In the FMN/HEME complex (PDB
ID: 1BVY), the smallest edge-to-edge distance between the redox centers is 1.81 nm.[15]
However, a survey of electron transfer (ET) in oxidoreductase protein structures has shown
that this distance should be less than 1.40 nm for efficient ET tunneling between redox
centers in the protein environment.[17] Munro et al. used a modeling approach to
rationalize the electron transfer from FMN to HEME and postulated that a movement of the
FMN domain is essential to bring the distance between the FMN and HEME cofactors within
the physiological range for ET (less than 1.40 nm).[11]

In this study, we aim to extend our knowledge of the structure-function-dynamics
relationships in P450BM-3 at the atomistic level using molecular dynamics (MD)
simulations of the isolated HEME and FMN domains and of their complex in water. It has
been shown experimentally that the specific arrangement of the HEME and FMN domains is
responsible for the catalytic efficiency and high oxygenase rate of P450BM-3.[18] In this
paper, the dynamics in solution of the complex and of the isolated HEME and FMN domains
are comparatively investigated for the first time. In particular, we analyze the relative
rearrangement of the FMN/HEME domains and how it affects the ET pathways from the
isoalloxazine ring of the FMN cofactor to the HEME iron.

The chapter is organized as follows. The details of the MD simulations and of the
analysis of the trajectories are reported in the Methods section. In the Results and
discussion section, the general structural properties of the simulated systems are
reported first to assess the quality of the simulations. Cluster analysis is then used to
identify representative structures that highlight the differences between the domains in
solution and in the complex. The following paragraphs focus on the ET pathways between
FMN and HEME, calculated on conformations selected from the cluster analysis of the
trajectory. The structural behavior of the substrate access channel is also reported. The
collective dynamics of the system are then analyzed using principal component analysis of
the trajectories. Finally, the Conclusions section summarizes the outcome of the study.

5.3. Methods 


The crystal structure of the non-stoichiometric, substrate-free FMN/(HEME)2 complex,
which contains one FMN domain and two HEME domains (PDB ID: 1BVY, resolution 0.203
nm),[15] was used to obtain the starting coordinates for the MD simulations. Of the two
HEME domains, chain A (residues 20 - 450) is in close proximity to the FMN domain (chain
F, residues 479 - 630) in the crystal structure. Hence, the A and F chains were extracted
from the crystal structure (including crystallographic water within 0.60 nm of the
proteins) and used as starting coordinates for the MD simulations. 1,2-Ethanediol
molecules were removed from the crystallographic structure and replaced by water
molecules.

5.3.2. Molecular dynamics simulations

The GROMOS96 43a1 force field[19] was used for all simulations. The MD simulations
performed in this study are summarized in Table 5.1. Figure 5.1 shows the FMN and HEME
cofactors in stick representation. The parameters for the ferric iron of the HEME cofactor
were adopted from Helms et al.[20] and have already been employed for MD simulations of
the P450BM-3 HEME domain by Roccatano et al.[21,22] The partial charges were redistributed
on the porphyrin ring of the HEME cofactor to adapt the parameters to the GROMOS96 43a1
force field,[19] with hydrogen atoms bound to the bridging carbons of the porphyrin ring
(see Table S5.1 in Supplementary Information (SI)). The FMN cofactor was in the oxidized
state in the FMN domain. Additional improper dihedrals were introduced to reproduce the
conformation of the isoalloxazine ring observed in the crystallographic structure and in
molecular geometry optimizations of flavin in both redox states.[23-25]

Table 5.1: Summary of the MD simulations of P450BM-3 in water

Starting coordinates     No. of atoms   No. of solvent molecules   No. of counter ions (Na+)   Simulation length (ns)
HEME Domain (A chain)    65650          20365                      16                          100
FMN Domain (F chain)     33483          10650                      14                          100
Complex (AF chain)       86101          26671                      30                          100

*The abbreviation A, F and AF chain will be used in rest of the paper for HEME domain, FMN domain and
HEME/FMN complex, respectively.



Figure 5.1: (a) HEME cofactor and (b) FMN cofactor are in stick representation, colored by 

elements such as, oxygen in red, nitrogen in blue, hydrogen in green, iron or phosphorus in orange 

and carbon in gray, with atomic labeling according to GROMOS96[19] topology. 

5.3.3. Electron transfer tunneling

ET tunneling from the FMN to the HEME cofactor was analyzed with the program
Pathways.[26,27] The method estimates the donor-to-acceptor electronic coupling as
mediated by the protein structure and uses graph theory to identify the electron transfer
pathways in biological electron transfer reactions.[26] The FMN to HEME ET pathway was
identified in the crystal structure and in P450BM-3 conformations after rearrangement,
taking the C8 atom of the isoalloxazine ring of the FMN cofactor as donor and the HEME
iron as acceptor, with the default parameters of Pathways. The ET pathways were visualized
with VMD.[28]
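
The graph search behind a pathway analysis of this kind can be sketched in a few lines of Python. The example is not the Pathways program itself: it uses the commonly quoted decay factors of the Pathways model (0.6 per covalent bond, 0.36·exp[-1.7(R - 2.8)] per hydrogen bond and 0.6·exp[-1.7(R - 1.4)] per through-space jump, with R in Å), which are an assumption here and may differ from the program defaults. The toy graph, its node names and the function names are hypothetical.

```python
import heapq
import math


def decay_factor(kind, distance_angstrom=None):
    """Per-step coupling decay (assumed Pathways-model values, distances in angstrom)."""
    if kind == "covalent":
        value = 0.6
    elif kind == "hbond":
        value = 0.36 * math.exp(-1.7 * (distance_angstrom - 2.8))
    elif kind == "space":
        value = 0.6 * math.exp(-1.7 * (distance_angstrom - 1.4))
    else:
        raise ValueError(f"unknown contact type: {kind}")
    return min(1.0, value)  # cap at 1 so that the edge costs below stay non-negative


def best_pathway(edges, donor, acceptor):
    """Dijkstra search on -ln(decay); returns (coupling product, list of nodes)."""
    graph = {}
    for a, b, kind, dist in edges:
        cost = -math.log(decay_factor(kind, dist))
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {donor: 0.0}
    previous = {}
    queue = [(0.0, donor)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == acceptor:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if acceptor not in best:
        return 0.0, []
    path = [acceptor]
    while path[-1] != donor:
        path.append(previous[path[-1]])
    path.reverse()
    return math.exp(-best[acceptor]), path


# Hypothetical toy graph: two competing routes from donor "D" to acceptor "A".
toy_edges = [
    ("D", "a1", "covalent", None), ("a1", "a2", "covalent", None), ("a2", "A", "space", 3.4),
    ("D", "b1", "space", 4.0), ("b1", "A", "covalent", None),
]
coupling, path = best_pathway(toy_edges, "D", "A")
print(path, f"{coupling:.4f}")
```

Maximizing a product of decay factors is equivalent to minimizing the sum of their negative logarithms, which is why a shortest-path algorithm can be used. In the toy graph the route with two covalent bonds and a short through-space jump wins over the route with a single long jump.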

5.4. Results and discussion 

5.4.1. Structural properties 

The structural stability and convergence of the P450BM-3 domains were examined by
analyzing the root mean square deviation (RMSD), the radius of gyration (Rg) and the
secondary structure elements with respect to the crystal structure during the MD
simulations. Figure 5.2a shows the backbone RMSD of the proteins as a function of time.
The RMSD curves of the AF chains and of the A chain in the complex reach a plateau with
average values of 0.41 ± 0.03 nm and 0.36 ± 0.03 nm, respectively. The RMSD of the
isolated A chain plateaus at a slightly lower value of 0.33 ± 0.02 nm. The RMSD of the F
chain in the complex increases to an average of 0.25 ± 0.02 nm over the last 10 ns of the
simulation. The RMSD of the isolated F chain increases rapidly and stabilizes after 10 ns
of simulation at an average value of 0.26 ± 0.02 nm. The radius of gyration is reported in
Figure 5.2b. In the first 10 ns of the simulation, the Rg of the complex decreases by
~3.7% from the crystallographic value (2.42 nm) to an average value of 2.33 ± 0.01 nm. The
variation of the A domain in the complex and in solution with respect to the initial
structure (2.16 nm) is less than 1.8% (2.12 ± 0.01 nm and 2.14 ± 0.01 nm, respectively).
The F chain shows no variation from the crystal structure (1.45 nm), with averages of
1.45 ± 0.01 nm and 1.46 ± 0.01 nm for the isolated F chain and the complex simulation,
respectively.
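
For reference, the two quantities used above can be written down compactly. The following Python sketch is illustrative only and is not the analysis code used for the thesis: it assumes that each frame has already been least-squares fitted to the reference structure (no superposition is done here), and the function names are hypothetical.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally ordered coordinate lists (no fitting is performed)."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    total = sum((a - b) ** 2 for p, q in zip(coords, reference) for a, b in zip(p, q))
    return math.sqrt(total / len(coords))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration around the center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    com = [sum(m * p[i] for m, p in zip(masses, coords)) / total_mass for i in range(3)]
    weighted = sum(
        m * sum((p[i] - com[i]) ** 2 for i in range(3)) for m, p in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


def percent_change(initial, final):
    """Relative change in percent, e.g. of Rg with respect to the crystal value."""
    return 100.0 * (final - initial) / initial


# Rg of the complex: crystal value 2.42 nm, simulation average 2.33 nm.
print(f"{percent_change(2.42, 2.33):.1f} %")
```

Applied to the values quoted in the text, `percent_change(2.42, 2.33)` reproduces the reported decrease of about 3.7% in the Rg of the complex.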

Figure 5.2: (a) Backbone RMSD and (b) Rg with respect to crystal structure as a function of time 

for AF chain (black), A of AF chain (red), F of AF chain (green), A chain (blue) and F chain (orange). 

In P450BM-3, the A and F chains have the structurally conserved P450 and flavodoxin-like
protein folds, respectively. Figure 5.3c shows the structure with the labeled helices of
the A (A to L) and F (α1 to α4) chains and the FMN binding loops (Lβ1, Lβ3 and Lβ4). The
loop regions, together with irregular structures (coils and turns), are named according to
the secondary structure element (α helix or β sheet) preceding them. DSSP criteria[29]
were used to follow the secondary structure of the P450BM-3 domains in the isolated and
complex MD simulations (Figure S5.1 in SI). The secondary structure remains fairly
conserved during the simulations.



Figures 5.3a and 5.3b show the per-residue RMSD and RMSF with respect to the crystal
structure, respectively. The regions involved in cofactor binding show smaller deviations
and fluctuations from the crystal structure in both the isolated (red) and complex (black)
simulations. For both domains, the loop regions and the N- and C-termini show higher
deviations. The isolated domains deviate more than those in the complex, except for the
regions between helices A - B, B’ - C, H - I and K - L and the G helix in the A chain, and
Lβ3 in the F chain. The isolated F chain shows the largest deviation in the Lβ2 and Lβ4
regions. In both systems, the F chain shows higher fluctuations in the Lβ2 and Lα2 loops.
In the complex simulation, the A/B and F/G loop regions fluctuate more, whereas in the
isolated F chain simulation the inner FMN cofactor binding loop Lβ3 fluctuates slightly
more.
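
A per-atom fluctuation profile of the kind plotted in Figure 5.3b can be computed as sketched below. This is a minimal illustration, assuming the usual definition of the RMSF as the fluctuation of each atom around its time-averaged position in a trajectory that has already been fitted to a common reference; the trajectory layout (a list of frames, each a list of (x, y, z) tuples) and the function name are hypothetical.

```python
import math


def rmsf(trajectory):
    """Return the RMSF of every atom around its time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Two-frame toy trajectory (nm): atom 0 is rigid, atom 1 oscillates along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (-0.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))
```

Averaging the atomic values over the backbone atoms of each residue then gives a per-residue profile.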

Figure 5.3: Backbone RMSD (a) and RMSF (b) per residue with respect to crystal structure for 

isolated domains (in red) and in complex (in black) MD simulations. The green vertical line 

separates HEME and FMN domains. Horizontal bars, in blue and orange color represent helices 

(labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by 

horizontal bars in purple color. (c) HEME and FMN domain are in cartoon representation in sky 

blue and tan color, respectively. HEME and FMN cofactor are in red and green color, respectively. 

Helices, cofactors, FMN binding regions and, N- and C- terminus are labeled. 



5.4.2. Cluster analysis 

For the AF chain, the first two clusters account for 46.16 % and 27.12 % of the sampled
conformations, respectively. The A and F chains each have 6 clusters in the isolated
domain simulations, and 7 and 8 clusters, respectively, in the complex simulation. The
first two clusters of the A chain cover 76.87 % and 10.99 % in the complex and 46.53 % and
30.12 % as isolated domain, respectively. For the F chain, the first cluster covers ~64 %
in both simulations, and the second cluster covers 21.05 % and 23.05 % in the complex and
isolated simulations, respectively. The A chain is more prone to conformational change in
the isolated simulation than in the complex, while the F chain shows a negligible
difference in conformational space between the two simulations.
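
The clustering algorithm is not restated in this section. As an illustration of how cluster populations such as those above can be obtained from a pairwise RMSD matrix, the sketch below implements a neighbour-counting scheme in the spirit of the method of Daura et al., which is a common choice for MD trajectories; this choice is an assumption, and the toy data, cutoff and function name are hypothetical.

```python
def neighbour_count_clusters(rmsd_matrix, cutoff):
    """Repeatedly pick the structure with most neighbours within `cutoff` as a cluster center.

    Returns a list of (center index, member indices), largest cluster first.
    """
    remaining = set(range(len(rmsd_matrix)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            members = [j for j in sorted(remaining) if rmsd_matrix[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)  # remove the whole cluster from the pool
    return clusters


# Toy "structures" on a line; their pairwise distance stands in for the RMSD (nm).
positions = [0.00, 0.01, 0.02, 0.50, 0.51, 1.00]
toy_matrix = [[abs(a - b) for b in positions] for a in positions]

clusters = neighbour_count_clusters(toy_matrix, cutoff=0.04)
populations = [100.0 * len(members) / len(positions) for _center, members in clusters]
print([members for _center, members in clusters], [f"{p:.1f} %" for p in populations])
```

The population of a cluster is the fraction of frames assigned to it, so a first cluster that covers a large share of the trajectory indicates that the system samples a narrow region of conformational space.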

Figure 5.4: The representative conformation of first cluster of (a) AF, (b) A and (c) F chain in 

cartoon representation superimposed with crystal structure. In the crystal structure, A and F chain 

are in sky blue and tan color, respectively. For the complex simulation, A and F chain are in dark 

blue and brown color, respectively. For the isolated domain simulation, A and F chain are in orange 



and purple color, respectively. HEME and FMN cofactors are in green, red and blue color in crystal 

structure, isolated domain and in complex structure, respectively. The helices and FMN cofactor 

binding loops are labeled. Amino and carboxy terminal of the domain are labeled in red color. 

Figures 5.4a, 5.4b and 5.4c show the crystal structure superimposed with the
representative conformation of the first cluster of the AF chain, and of the A and F
chains in the isolated and complex simulations, respectively. Major differences were
observed in the loop regions of the domains in both simulations. The N-terminal region
(residues 20 - 82, including the A, B and B’ helices) of the A chain deviates more from
the crystal structure in both simulations. In the complex simulation, larger deviations
were observed in the G helix and in the H/I and K/L (residues 380 - 390) loop regions of
the A chain. Residues 380 - 390 of the K/L loop region precede the HEME cofactor binding
region. The H/I and K/L loops are involved in the binding of the FMN domain. The α2 helix
of the F chain shows a larger deviation in the isolated F chain simulation, resulting in a
more compact conformation of the FMN domain in solution than in the complex. In the
complex simulation, the representative structure of the first cluster of the AF chain
shows a conformational rearrangement of both domains that increases the compactness of the
AF complex. The deviations of both domains from the crystal structure mainly involve the
G helix and the H/I and K/L loops of the A chain and a displacement of the F chain towards
the HEME domain, which decreases the minimum distance between the two cofactors.

5.4.3. Substrate access channel 

Pro45 and Ala191 are located at the mouth of the substrate access channel. In the
crystal structure of the P450BM-3 complex, the P45Cα - A191Cα distance is 1.61 nm (0.87 nm
in the A chain of 1BU7). Using MD simulation and docking approaches, Chang et al. observed
that substrate binding was not dramatically affected by the closure of the substrate
access channel in P450BM-3.[30] Roccatano et al. assessed the behavior of the substrate
access channel by monitoring the distance between these two residues.[22] The P45Cα -
A191Cα minimum distance during the isolated domain and complex simulations is reported in
Figure 5.5. Both simulations show larger variations of the P45Cα - A191Cα distance in the
first 20 ns. After that, an average distance of 1.11 ± 0.10 nm with slight variations was
observed in the isolated A chain. In the A chain of the AF chain, the P45Cα - A191Cα
distance continues to decrease until 32 ns and then reaches an average distance of 0.59 ±
0.10 nm. Compared with the isolated A chain, the substrate access channel was partially
closed in the A chain of the complex, which might result from the larger deviation of the
F/G loop in the complex than in the isolated domain.

Figure 5.5: Minimum distance between P45Cα and A191Cα as a function of time for A chain in 

isolated (in red color) and complex (in black color) simulations. 

In the crystal structure, the crystallographic water molecule was not ligated to the heme
iron (Fe) (distance in 1BVY > 6 nm and 1BU7 0.24 nm). When the A chain (crystal structure)
was solvated in water, a water molecule was present at a distance of 0.47 nm and 0.34 nm
from the heme iron in the isolated and complex simulations, respectively. Figure S5.2 in
SI shows the minimum distance between Fe and the water molecules (sampled every 100 ps).
Average distances of 0.28 ± 0.13 nm and 0.34 ± 0.14 nm were observed for the A chain in
the complex and in the isolated domain simulation, respectively.



5.4.4. ET tunneling pathways 

The minimum distance between the heavy atoms of the isoalloxazine ring of FMN and the HEME cofactor is shown in Figure 5.6 (the AF chain simulation was extended to 150 ns to check the convergence of the distance). During the simulation, the FMN/HEME distance decreases from 1.81 nm (in the crystal structure) to an average of 1.41 ± 0.09 nm, with a minimum of 1.02 nm. This is within the range expected for ET between redox centers[17] (1.40 - 1.50 nm) and proposed by Munro et al.[11] The decreased distance might result in an ET rate of 10⁸ to 10¹¹ s⁻¹, which is consistent with experimental and theoretical observations.[11,17]

Figure 5.6: Minimum distance between heavy atoms of isoalloxazine ring of FMN and HEME 

cofactor as a function of time. Red color horizontal line shows the distance observed in crystal 

structure.[15] 



Figures 5.7a, 5.7b and 5.7c show the ET pathways identified by the Pathways VMD plugin[27] in the crystal structure (minimum distance 1.80 nm), in the representative structure of the first cluster of the AF chain (minimum distance 1.41 nm) and in the AF chain conformation with the minimum distance between the FMN and HEME cofactors (minimum distance 1.10 nm), respectively. The results of the analysis are summarized in Table 5.2. In the crystal structure, FMN to HEME ET tunneling is also mediated by solvent molecules, but after the rearrangement of the AF conformation the FMN cofactor comes closer to the HEME cofactor, which eliminates the involvement of water molecules in ET tunneling. In Figure 5.7c, where the FMN to HEME distance is ~ 1 nm, ET tunneling is mediated by the Met490 residue only; the ET pathway length decreases from 2.7 nm (in the crystal structure) to 1.8 nm and the electronic coupling increases from 4.00 x 10⁻¹⁰ to 2.68 x 10⁻⁸.

Table 5.2: Electron transfer tunneling pathway in AF chain complex calculated by Pathways[27] VMD plugin.

Coordinates                 FMN/HEME minimum   Max. coupling   Distance along ET   Amino acids involved in the ET pathway
                            distance (nm)      (a.u.)          pathway (nm)
Crystal structure           1.8                4.00 x 10⁻¹⁰    2.70                FMN(C8) → M490 → Sol → Sol → A399 → C400 → HEME(FE)
First cluster               1.4                9.07 x 10⁻⁹     1.96                FMN(C8) → M490 → F393 → HEME(FE)
Minimum FMN/HEME distance   1.1                2.68 x 10⁻⁸     1.79                FMN(C8) → M490 → HEME(FE)
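To put the couplings of Table 5.2 into perspective, recall that in the nonadiabatic limit the ET rate scales with the square of the donor-acceptor electronic coupling. The sketch below compares the three conformations on that basis only. It assumes that the driving force and the reorganization energy are the same in all conformations, which is a simplification introduced here for illustration; the function and dictionary names are likewise hypothetical.

```python
# Maximum couplings (a.u.) as listed in Table 5.2
couplings = {
    "crystal structure": 4.00e-10,
    "first cluster": 9.07e-9,
    "minimum FMN/HEME distance": 2.68e-8,
}

def relative_rate(coupling, reference):
    """Rate ratio k/k_ref, assuming k is proportional to the squared coupling."""
    return (coupling / reference) ** 2

reference = couplings["crystal structure"]
for name, t_da in couplings.items():
    ratio = t_da / reference
    print(f"{name}: coupling x {ratio:.1f}, estimated rate x {relative_rate(t_da, reference):.0f}")
```

Because the Pathways couplings are relative estimates, only the ratios between conformations are meaningful in this comparison, not absolute rates.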



Figure 5.7: ET tunneling from the isoalloxazine ring (C8 atom) of FMN cofactor (in gray color) to 

iron center of HEME cofactor (in black color) represented by red color tubes in a) crystal structure, 

b) conformation of first cluster and c) conformation with minimum distance between HEME to FMN 

cofactor. The amino acids within a distance of 0.50 nm from both cofactors are labeled and

shown in licorice representation colored by element type (oxygen in red, carbon in cyan and 

nitrogen in blue color) and their associated secondary structure in cartoon representation in sky 

blue for HEME domain and in orange color for FMN domain. The residues involved in electron 

tunneling are represented and labeled in green color. 

5.4.5. Essential dynamics 

The cumulative sum of the relative positional fluctuation (RPF) of the first 50 eigenvectors of the A and F chains in the isolated and complex simulations is greater than 69% and is reported in Figure S5.3 of SI. The RMSIP for the first twenty eigenvectors of the A chain and F chain in both simulations was less than 0.53. The inner products of the first three eigenvectors for the A and F chains were less than 0.25 and 0.43, respectively. The overlap and inner product analyses indicate that different sets of collective motions are present in the eigenvectors of the same time windows of the two trajectories.
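For reference, the root mean square inner product (RMSIP) between two sets of N eigenvectors is commonly computed as the square root of the sum of all squared inner products divided by N, so that it equals 1 for identical subspaces and 0 for orthogonal ones. The following sketch assumes this standard definition and orthonormal eigenvectors stored as plain sequences of numbers; the function names are illustrative and not those of the analysis tools used in this work.

```python
import math

def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def rmsip(set_a, set_b):
    """RMSIP between two equally sized sets of orthonormal eigenvectors."""
    if len(set_a) != len(set_b) or not set_a:
        raise ValueError("both sets must contain the same non-zero number of vectors")
    total = sum(dot(u, v) ** 2 for u in set_a for v in set_b)
    return math.sqrt(total / len(set_a))

if __name__ == "__main__":
    # Toy three-dimensional basis vectors, used only to illustrate the limits
    e1, e2, e3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    print(rmsip([e1, e2], [e1, e2]))  # identical subspaces
    print(rmsip([e1, e2], [e3, e1]))  # subspaces sharing one direction
```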

Figures 5.8a, 5.8b and 5.8c show the RPF associated with the first three eigenvectors of the A and F chains in the isolated (in red color) and complex (in black color) simulations. Figure 5.9 shows the RMSF associated with the first three eigenvectors (a, b and c) of the A (in sky blue) and F (in tan color) chains in the complex (a1, b1 and c1) and isolated (a2, b2 and c2) simulations, respectively.

Figure 5.8: RPF for the (a) first, (b) second and (c) third eigenvector of the A and F chains in the isolated (red color) and complex (black color) simulations. The green vertical line separates the HEME and FMN domains. Horizontal bars in blue and orange color represent helices (labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by horizontal bars in purple color.

In the complex simulation, the first collective motion (Figure 5.9a1) of the A chain involves the turn following beta sheet 1 (residues 44 – 48, with the highest RPF for Arg47, which is involved in substrate binding), the D/E loop (residues 130 – 138), the F/G loop (residues 190 – 196), the K/L loop (residues 385 – 390) and the C-terminus loops (residues 425 – 432 and 452 – 458). The cooperative motion of the turn following beta sheet 1 and the F/G loop is related to the opening and closing of the substrate access channel. Residue F393, in the latter part of the K/L loop, is involved in FMN domain binding and was found to take part in ET tunneling in the average structure of the first cluster of the AF chain. The first collective mode of the F chain in the complex is dominated by the Lα2 loop, with slightly higher RPF for Lβ2 and Lβ3 (inner FMN binding loop). The cooperative motion of Lα2 and Lβ3 might facilitate ET tunneling from the FMN to the HEME cofactor. In the complex, the collective motions of the two domains were synchronized, relating ET tunneling to the change in substrate binding. The effect was clearly seen when the first eigenvector of the AF chain was compared with those of the A and F chains (reported in Figure S5.4a and S5.5a in SI). For the AF chain and for the separate A and F chains, the first eigenvector shows fluctuations in the same regions, with higher fluctuations in the collective mode of the AF chain and a cooperative effect due to their binding. The second collective mode of the A chain mainly involves motion in the D/E and G/H loops, the beta sheets in the K/L region and the A/B region, whereas the third collective motion was restricted to the D/E and G/H loops and the C-terminus loop (residues 425 – 432). The F chain shows involvement of the Lα2 and Lβ2 loops and the C-terminus region in the second collective mode and of Lα2, Lβ3 and Lβ5 in the third eigenvector. In the AF chain, the collective motion associated with the first two eigenvectors corresponds to the movement of the F chain towards the A chain, which decreases the distance between the FMN and HEME cofactors, and shows slightly higher fluctuations than in the individual chains. In the third eigenvector, the major difference was observed mainly in the Lβ3 and Lα2 loops of the F chain, with higher fluctuations. The collective motions associated with the first three eigenvectors of the AF chain are reported in Figure S5.5a, S5.5b and S5.5c in SI, respectively.



Figure 5.9: RMSF of protein backbone atoms along first (a), second (b) and third (c) 

eigenvector after projection of the trajectory on the corresponding eigenvector of A and F 



chain in complex simulation (a1, b1 and c1) and in isolated simulation (a2, b2 and c2). The 

10 sequential frames represent the extension of the fluctuations in trajectories along the 

eigenvectors. The first extreme conformation is shown in green color and last extreme in 

violet color. Other conformations of A and F chain are in sky blue and tan color, 

respectively. Helices and loops in FMN domain are labeled. N and C indicate the N- and C- 

terminus of the protein (labeled in red color). 

In the isolated A chain, the first collective mode (Figure 5.9a2) has higher RPF at the end of the C helix (residues 103 – 107) and at the C-terminus (residues 452 – 458). Other regions involved in the first collective mode are the D/E, E/F, F/G and K/L (residues 385 – 390) loops. Together, the motion is related to changes in the substrate binding region and in the FMN domain binding region. The first collective mode of the isolated F chain shows higher RPF in Lα2 and Lβ2, and slightly higher RPF in Lβ3 and Lβ4. In the isolated domains, the collective motions were more independent, i.e. related to binding of the FMN cofactor in the F chain and restricted to the substrate binding region in the A chain. The second collective mode of the A chain mainly involves motion in the D/E, E/F and F/G loops, and the third collective motion involves only the F/G region. The F chain shows involvement of Lα2 and Lβ2 in the second collective mode and of Lα2 and Lβ3 in the third eigenvector.


We performed MD simulations of the HEME and FMN domains as isolated domains and in complex. The structure remains conserved in both systems throughout the simulations. During the simulation, the HEME/FMN complex undergoes a conformational rearrangement in the first 10 ns (with a decrease in Rg from 2.42 nm to 2.33 nm) that makes the complex more compact and decreases the FMN/HEME distance from 1.81 nm to an average of 1.41 nm. The FMN domain in solution shows a major conformational change in the Lα2 loop in the absence of the HEME domain. In the isolated HEME domain, major conformational changes were observed in the FMN binding region, especially in the C helix and the H/I and K/L (residues 385 – 395) loops. The G helix and the inner FMN cofactor loop (Lβ3) fluctuate more in both simulations. Both domains differ in atomic fluctuation amplitude between the isolated and complex simulations. In the complex, the collective motion was dominated by the interaction mechanism between the HEME and FMN domains and the associated change in the substrate access channel. The movement of the FMN domain over the HEME domain might be related to the ET mechanism in P450BM-3, as proposed earlier, and might be responsible for an ET rate between the two domains in the range from 10⁸ to 10¹¹ s⁻¹ under physiological conditions, as observed experimentally and proposed theoretically.[11]



Biosyst 2: 462-469. 


268. 



4. Urlacher VB, Eiben S (2006) Cytochrome P450 monooxygenases: perspectives for 

synthetic application. Trends biotechnol 24: 324-330. 

5. Bernhardt R (2006) Cytochromes P450 as versatile biocatalysts. J Biotechnol 124: 

128-145. 

6. Coon MJ (2005) Cytochrome P450: nature's most versatile biological catalyst. Annu 

Rev Pharmacol Toxicol 45: 1-25. 

7. Narhi LO, Fulco AJ (1986) Characterization of a catalytically self-sufficient 119,000- 

dalton cytochrome P-450 monooxygenase induced by barbiturates in Bacillus 

megaterium. J Biol Chem 261: 7160-7169. 



8. Narhi LO, Fulco AJ (1987) Identification and Characterization of 2 Functional 

Domains in Cytochrome-P-450bm-3, a Catalytically Self-Sufficient Monooxygenase 

Induced by Barbiturates in Bacillus-Megaterium. J Biol Chem 262: 6683-6690. 

9. Munro AW, Lindsay JG, Coggins JR, Kelly SM, Price NC (1994) Structural and 

Enzymological Analysis of the Interaction of Isolated Domains of Cytochrome-P-450 

Bm3. Febs Letters 343: 70-74. 




11. Munro AW, Leys DG, McLean KJ, Marshall KR, Ost TW, et al. (2002) P450 BM3: the 

very model of a modern flavocytochrome. Trends Biochem Sci 27: 250-257. 



Trans 34: 1173-1177. 

13. Peterson JA, Sevrioukova I, Truan G, GrahamLorence SE (1997) P450BM-3: A tale of 

two domains - Or is it three? Steroids 62: 117-123. 

14. Munro AW, Daff S, Coggins JR, Lindsay JG, Chapman SK (1996) Probing electron 

transfer in flavocytochrome P-450 BM3 and its component domains. Eur J Biochem 

239: 403-409. 



1863-1868. 

16. Joyce MG, Ekanem IS, Roitel O, Dunford AJ, Neeli R, et al. (2012) The crystal 

structure of the FAD/NADPH-binding domain of flavocytochrome P450 BM3. FEBS 

Journal 279: 1694-1706. 

17. Page CC, Moser CC, Chen X, Dutton PL (1999) Natural engineering principles of 

electron tunnelling in biological oxidation-reduction. Nature 402: 47-52. 

18. Hazzard JT, Govindaraj S, Poulos TL, Tollin G (1997) Electron transfer between the 

FMN and heme domains of cytochrome P450BM-3. Effects of substrate and CO. J Biol 

Chem 272: 7922-7926. 






20. Helms V, Deprez E, Gill E, Barret C, Hui Bon Hoa G, et al. (1996) Improved binding of 

cytochrome P450cam substrate analogues designed to fill extra space in the 

substrate binding pocket. Biochemistry 35: 1485-1499. 

21. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2005) Structural and dynamic 

properties of cytochrome P450 BM-3 in pure water and in a 

dimethylsulfoxide/water mixture. Biopolymers 78: 259-267. 




23. Verma R, Schwaneberg U, Roccatano D Conformational Dynamics of the FMNbinding 

Reductase Domain of Monooxygenase P450BM-3. Unpublished. 





26. Beratan DN, Betts JN, Onuchic JN (1991) Protein electron transfer rates set by the 

bridging secondary and tertiary structure. Science 252: 1285-1288. 

27. Balabin IA, Hu X, Beratan DN (2012) Exploring biological electron transfer pathway 

dynamics with the Pathways Plugin for VMD. J Comput Chem 33: 906-910. 

28. Humphrey W, Dalke A, Schulten K (1996) VMD: Visual molecular dynamics. J Mol 

Graphics 14: 33-&. 



2637. 

30. Chang YT, Loew GH (1999) Molecular dynamics simulations of P450 BM3-- 

examination of substrate-induced conformational change. J Biomol Struct Dyn 16: 

1189-1203. 


PART II: P450BM-3 HEME/FMN Complex SI 




simulation 

Table S5.1: Partial charge on HEME cofactor with ferric iron.[1-3] 


1 FE FE 1 1.0 

2 NR NA 1 -0.4 

3 NR NB 1 -0.4 

4 NR NC 1 -0.4 

5 NR ND 1 -0.4 

6 C CHA 2 -0.2 

7 HC HHA 2 0.2 

8 C C1A 3 0.2 

9 C C2A 3 -0.1 

10 C C3A 3 0.0 

11 C C4A 3 0.1 

12 CH3 CMA 4 0.0 

13 CH2 CAA 5 0.0 

14 CH2 CBA 5 0.0 



15 C CGA 6 0.27 

16 OM O1A 6 -0.635 

17 OM O2A 6 -0.635 

18 C CHB 7 -0.2 

19 HC HHB 7 0.2 

20 C C1B 8 0.05 

21 C C2B 8 0.05 

22 C C3B 8 -0.1 

23 C C4B 8 0.2 

24 CH3 CMB 9 0.0 

25 CR1 CAB 10 0.0 

26 CH2 CBB 10 0.0 

27 C CHC 11 -0.2 

28 HC HHC 11 0.2 

29 C C1C 12 0.2 

30 C C2C 12 0.0 

31 C C3C 12 -0.1 

32 C C4C 12 0.2 

33 CH3 CMC 13 0.0 

34 CR1 CAC 14 0.0 

35 CH2 CBC 14 0.0 

36 C CHD 15 -0.2 

37 HC HHD 15 0.2 

38 C C1D 16 0.2 

39 C C2D 16 0.1 

40 C C3D 16 -0.2 

41 C C4D 16 0.2 

42 CH3 CMD 17 0.0 

43 CH2 CAD 18 0.0 

44 CH2 CBD 18 0.0 

45 C CGD 19 0.27

46 OM O1D 19 -0.635

47 OM O2D 19 -0.635

Figure S5.1: Secondary structure per residue calculated by DSSP[4] along the trajectory as a 

function of time for HEME domain and FMN domain (a) in complex simulation and (b) isolated. 

Color code represents different secondary structures. 



Figure S5.2: Minimum distance between water molecules and HEME iron as a function of 

time (every 100 ps) in isolated (in red color) and complex (in black color) simulation. 



Figure S5.3: Relative positional fluctuation of first 50 eigenvectors of the A and F chains in 

isolation and complex simulation. In AF chain, the first 50 eigenvectors account for 80.45 % of total 

RPF with 25.96 % contribution by the first eigenvector. A chain has 79.28 % and 86.77 % 

cumulative RPF with 27.19 % and 48.54 % contribution by first eigenvector in complex and 

isolated domain simulation, respectively. For F chain, cumulative RPF of first 50 eigenvectors was 

90.96 % and 89.19 % with 33.98 % and 35.02 % RPF of first eigenvector in complex and isolated 

domain simulation. 
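The cumulative percentages quoted in this caption can be obtained from the eigenvalues of the covariance matrix. The sketch below assumes that the RPF of an eigenvector is its eigenvalue expressed as a percentage of the sum of all eigenvalues; this definition and the function name are assumptions made for illustration, and the eigenvalues in the example are toy numbers rather than results of this work.

```python
def cumulative_rpf(eigenvalues, n_first):
    """Cumulative RPF (%) of the n_first largest eigenvalues.

    Assumes RPF_i = 100 * eigenvalue_i / sum of all eigenvalues.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    ordered = sorted(eigenvalues, reverse=True)
    running = 0.0
    out = []
    for lam in ordered[:n_first]:
        running += 100.0 * lam / total
        out.append(running)
    return out

if __name__ == "__main__":
    # Hypothetical eigenvalues (nm^2), for illustration only
    print(cumulative_rpf([1.0, 4.0, 2.0, 3.0], n_first=3))
```

The first element of the returned list is the contribution of the first eigenvector alone, and the last element is the cumulative value for the selected subspace.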



Figure S5.4: RPF for (a) first, (b) second and (c) third eigenvector of AF chain (cyan color), and A 

and F chain in complex (black color) simulation. The green vertical line separates Heme and FMN 

domain. Horizontal bars, in blue and orange color represent helixes (labeled) and beta sheets, 

respectively. The regions involved in cofactor binding are represented by horizontal bars in purple

color. 



Figure S5.5: RMSF of protein backbone atoms along first, second and third eigenvector after 

projection of the trajectory on the corresponding eigenvector of AF chain in complex simulation in 

(a), (b) and (c), respectively. The 10 sequential frames represent the extension of the fluctuations in 

trajectories along the eigenvectors. The first extreme conformation is shown in green color and last 

extreme in violet color. Other conformations of Heme and FMN domain are in sky blue and tan 

color, respectively. Helixes and loops are labeled. N and C indicate the N- and C-terminus of the 

protein (labeled in red color). 



References: 












2637. 


PART II: P450BM-3 HEME/FMN & CoSep 

Chapter 6 

A molecular dynamics study of the effect of 

cobalt(II)sepulchrate as an electron transfer mediator 

on the conformation and dynamics of P450BM-3

6.1. Abstract 

The major limitation to the exploitation of P450BM-3 in industrial processes is the consumption of expensive NADPH as a reduction equivalent in the catalytic cycle. Experimentally, NADPH has also been found to inactivate the enzyme in the absence of substrate. The use of an alternative, cost-effective cofactor such as cobalt(III)sepulchrate (CoSep), with zinc dust as the source of electrons, has been proposed as a possible solution to overcome the latter limitation. The mechanism of interaction of cobalt(III)sepulchrate with the protein has not yet been elucidated at the molecular level. In this paper, we propose a novel model of CoSep and use it in molecular dynamics simulations to study its interaction with the isolated HEME domain and the HEME/FMN complex of P450BM-3. The aim of the study is to identify the putative binding modes of CoSep on the P450BM-3 domains and their effect on conformation, dynamics and electron transfer (ET) tunneling. The results of this study indicate that CoSep preferentially binds to negatively charged residues in the surface-exposed regions of the P450BM-3 domains. Two ET tunneling pathways were observed for the HEME/FMN complex in the presence of CoSep. The first one runs from CoSep to the FMN isoalloxazine ring via Trp574 and then to the HEME iron, mediated by a water molecule at the interface, Met490 (ribityl tail of FMN cofactor binding loop) and Phe393; the same residues were involved in ET tunneling in the water simulation of the P450BM-3 domains. The second ET pathway runs from CoSep to the HEME iron via Ile102 and Leu103 (C helix residues), Ile401 and Cys400. In the isolated HEME domain, ET tunneling involved the residues of the B’/C loop, Asp84, Gly85, Leu86 and Phe87. Collective motions of different amplitude were observed in both systems and were found to facilitate ET tunneling from CoSep to the HEME iron.


Cytochrome P450 monooxygenases, the largest superfamily of heme-containing soluble proteins, are widespread in almost all domains of life, e.g. bacteria, yeast, insects, mammalian tissues and plants.[1-3] Using molecular oxygen, they catalyze the oxidation of a wide variety of substrates involved in biosynthesis and biodegradation pathways, or in xenobiotic metabolism,[4] in the presence of reduction equivalents. The high stereoselectivity and the large variety of possible substrates make these enzymes particularly interesting for industrial applications. However, their complexity, low solubility, low catalytic turnover and, in particular, their need for an expensive source of electrons have so far limited their use.[4] Cytochrome P450BM-3, from the soil bacterium Bacillus megaterium, is one of the most widely studied members of this family.[5,6] Being soluble and self-sufficient (P450 and reductase domains linked together on a single polypeptide chain), P450BM-3 has a higher catalytic turnover and is easily expressed and purified in cell-free medium.[7] Protein engineering approaches have been used successfully to increase the technological viability of P450BM-3 by fine-tuning its catalytic parameters and substrate recognition.[8,9] In past years, rapid advances have also been made towards the cost effectiveness of the P450BM-3 catalytic reaction by the regeneration or substitution of the expensive cofactor (NADPH or NADH) as a source of electrons.[7] The electrochemistry of P450BM-3 has received considerable attention, and various methods have enabled direct electron transfer systems (from electrode to protein via conducting polymer films like BaytronP)[10] or mediated electron transfer systems (shuttling electrons from electrode to protein via small electroactive compounds, like Zn dust (as a source of electrons) with Co(III)sepulchrate (as electron mediator)) for driving the catalytic cycle.[11,12] Protein engineering via directed evolution and rational design offers an attractive solution to improve the enzymatic properties and to enhance the electrochemical performance of the enzyme.[11-13] In this paper, we performed molecular dynamics simulations to gain insight into the interaction mechanism of the P450BM-3 domains with cobalt(II)sepulchrate (CoSep) as an electron transfer mediator. The results will help to investigate the effect of CoSep binding on conformation, dynamics and ET tunneling in the P450BM-3 domains.

The chapter is organized as follows. The details of the MD simulations and the force field modeling of CoSep are reported in the Methods section. In the Results and Discussion section, the preferential binding sites of CoSep on the P450BM-3 domains are reported first. The following paragraph provides information about ET tunneling from CoSep to the P450BM-3 domains. Then, the collective dynamics of the system is analyzed using principal component analysis of the trajectories. Finally, the Conclusions section provides a summary of the outcome of the study.

6.3. Methods 


The non-stoichiometric complex of one FMN domain with two HEME domains without substrate was used as the starting coordinates (PDB ID: 1BVY with resolution 0.203 nm).[14] For the MD simulation, the HEME domain (chain A: 20 - 450) associated with the FMN domain (chain F: 479 - 630) was extracted from the starting coordinates, together with the crystallographic water within 0.60 nm from the protein, using VMD software.[15] The 1,2-ethanediol molecules were removed from the crystallographic structure and replaced by water molecules. MD simulations were set up for the isolated HEME-binding domain and the HEME/FMN complex in a water-CoSep mixture.

6.3.2. Molecular dynamics simulation and modeling 

The GROMOS96 43a1 force field[16] was used for all simulations. The MD simulations performed in this study are summarized in Table 6.1. The HEME cofactor parameters for ferric iron were adopted from Helms et al.,[17] and had already been employed for the MD simulation of the P450BM-3 HEME domain by Roccatano et al.[18,19] The partial charges were redistributed on the porphyrin ring of the HEME cofactor to adapt the parameters to the GROMOS96 43a1 force field,[16] with hydrogen atoms bound to the bridging carbons in the porphyrin ring.[20] The FMN cofactor was in the oxidized state in the FMN domain. Additional improper dihedrals were introduced to reproduce the conformation of the isoalloxazine ring as observed in the crystallographic structure and in molecular geometry optimizations of flavin in both redox states.[21,22] Details of the modified force field for FMN are reported in a previous paper.[23]

For CoSep (schematically represented in Figure 6.1), the force field parameters for bonds and bond angles were adapted from Dehayes et al.[24] (the values are reported in Table S6.1 in Supporting Information (SI)). The non-bonded parameters were adopted from the GROMOS96 43a1 force field.[16]

Density functional theory calculations using the Becke3LYP method[25] with the LanL2DZ basis set[26] were used for the geometry optimization. Atomic partial charges were derived using the CHelpG scheme[27] while constraining them to reproduce the dipole moment (partial charges are reported in Table S6.2 in SI). An ionic radius of 0.075 nm for Co²⁺ was used to fit the electrostatic potential. All calculations were performed using the Gaussian09 package.[28]

Forty molecules of CoSep were randomly placed in the simulation box and solvated by stacking equilibrated boxes of solvent molecules to fill the simulation box. The CoSep concentration was equal to ~0.5 mM and corresponds to the one used experimentally for the fastest biotransformation in P450BM-3 using Zn dust and cobalt(III)sepulchrate as an alternative electron transfer system.[12]

Figure 6.1: CoSep is in ball and stick representation, colored by elements such as, nitrogen 

in blue, hydrogen in green, and carbon in gray with labeled atom name and number (except 

hydrogen). 

Table 6.1: Summary of the MD simulations of P450BM-3 domains in water and CoSep solution.

Starting coordinates     No. of atoms   No. of CoSep   No. of solvent   No. of counter   Simulation
                                                       molecules        ions             length (ns)
Heme domain (A chain)    65650          -              20365            16 Na⁺           100
FMN domain (F chain)     33483          -              10650            14 Na⁺           100
Complex (AF chain)       86101          -              26671            30 Na⁺           100
A chain & CoSep          64597          40             19638            64 Cl⁻           100
AF chain & CoSep         85275          40             26029            50 Cl⁻           100

*The abbreviations A, F and AF are used in the rest of the paper for HEME domain, FMN domain and HEME/FMN complex, respectively.

6.4. Results and discussion 

The differences in conformation and dynamics between the isolated FMN and HEME domains and the HEME/FMN complex have been discussed in our previous papers.[20,23] Herein, we focus on the effect of CoSep binding on the conformation, dynamics and ET tunneling of the P450BM-3 domains in the isolated HEME domain and in the HEME/FMN complex. The presence of CoSep does not affect the structure of the P450BM-3 domains significantly. The structural stability and convergence of the P450BM-3 domains in CoSep solution were compared with those in water and are reported in SI through the backbone root mean square deviation (RMSD) (Figure S6.1), the radius of gyration (Rg) (Figure S6.2) and the backbone RMSD and RMSF per residue (Figure S6.3a and S6.3b, respectively), using the crystal structure as reference. In CoSep solution, both the HEME domain and the complex show the same behavior as in pure water. The backbone RMSD of the HEME domain in the CoSep solution reaches a plateau with an average value of 0.25 ± 0.01 nm after 10 ns of simulation (Figure S6.1 in SI), and it shows the lowest RMS deviations and fluctuations (Figure S6.3a and S6.3b in SI). On the contrary, the AF complex shows the largest deviation in the residues of the H helix (Figure S6.3a in SI).



6.4.1. CoSep binding on P450BM-3 domains 

At the end of the simulations of both the isolated HEME domain and the complex, CoSep molecules were found bound mainly at the surface-exposed loop regions of the protein (see Figure S6.4 of SI). The average minimum distances between the CoSep molecules and the HEME iron and the isoalloxazine ring along the simulations are reported in Figure S6.5 of SI. After 20 ns of simulation, CoSep molecules approach the HEME domain within an average distance of 1.72 ± 0.44 nm and 1.94 ± 0.19 nm in the isolated protein and in the complex, respectively. Figures 6.2a, 6.2b and 6.2c show the minimum distance between CoSep and the residues of the isolated A chain, and of the A and F chains in the complex, respectively.

Cluster analysis was used to select representative structures for the isolated HEME domain and for the complex. In CoSep solution, the first cluster of the A chain and of the complex accounts for ~83 % and 99 %, respectively. The binding of CoSep to the isolated A chain and to the AF complex is shown in Figure 6.2d and 6.2e, respectively. In the isolated A chain, CoSep molecules bind mainly at the HEME/FMN interface, in contact with the C (94 – 107) and H (233 – 238) helices and the B’/C (82 – 94), H/I (239 – 251), K/L (359 – 367) and C-terminus turn (441 – 445) regions. Other regions of CoSep binding on the isolated chain were the A/B region (32 – 38 and 51 – 55), the B helix (55 – 61), the B/B’ loop (61 – 68), the E helix (139 – 143), the F helix (181, 182) and the F/G loop (192 – 198). In the AF chain, the binding of the F chain slightly influences the distribution of CoSep on the A chain. CoSep molecules were more abundant at the F chain; two of them were present near (≤ 0.50 nm) the FMN cofactor binding loops Lβ3 and Lβ4 at the FMN/HEME interface. In the P450BM-3 domains, the regions of CoSep binding were found to be rich in charged residues, especially negatively charged residues (aspartic acid and glutamic acid). This was obtained from the analysis of the number of contacts between CoSep and P450BM-3 residues within a distance of 0.50 nm (reported in Figure S6.6 and also shown in Figure 6.2d and 6.2e by the abundance of oxygen (red color surface) in the CoSep binding regions).
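A contact analysis of this kind can be sketched as follows: for each frame, a residue counts as being in contact if any of its atoms lies within the cutoff of any CoSep atom, and the counts are accumulated over the frames. The data layout (dictionaries of coordinates in nm), the function names and the neglect of periodic boundary conditions are simplifying assumptions made for this illustration and do not describe the tools used in this work.

```python
import math
from collections import Counter

def residues_in_contact(ligand_xyz, residues, cutoff_nm=0.50):
    """Labels of residues with any atom within cutoff_nm of any ligand atom."""
    hits = []
    for label, atoms in residues.items():
        if any(math.dist(a, l) <= cutoff_nm for a in atoms for l in ligand_xyz):
            hits.append(label)
    return hits

def contact_counts(frames, cutoff_nm=0.50):
    """Accumulate per-residue contact counts over (ligand_xyz, residues) frames."""
    counts = Counter()
    for ligand_xyz, residues in frames:
        counts.update(residues_in_contact(ligand_xyz, residues, cutoff_nm))
    return counts

if __name__ == "__main__":
    # Hypothetical two-frame toy system; coordinates in nm, no periodic images
    residues = {"ASP84": [(0.3, 0.0, 0.0)], "LEU86": [(1.0, 0.0, 0.0)]}
    frames = [([(0.0, 0.0, 0.0)], residues), ([(0.6, 0.0, 0.0)], residues)]
    print(contact_counts(frames))
```

Grouping the resulting counts by residue type would then show which classes of residues dominate the contacts.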



Figure 6.2: Minimum distance (≤ 1.0 nm) between CoSep and residues of (a) isolated A chain, (b) 

A chain and (c) F chain in complex. Horizontal bars, in blue and orange color represent helices 

(labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by 

horizontal bars in purple color. (d) and (e) show the binding sites for CoSep in the structure of the first
cluster of the isolated A chain and AF chain, respectively. CoSep molecules are in ball and stick

representation and colored by element type (nitrogen in blue color, hydrogen in white color, and 

carbon in black color). HEME and FMN domain are in cartoon representation in sky blue and tan 

color, respectively with surface colored according to element type. FMN and HEME cofactors are in 

green and red, respectively. Helices, cofactors, loops and, N- and C- terminus (in red color) are 

labeled. 

6.4.2. Effect of CoSep binding on substrate access channel 

The accessibility of the active site has been monitored by Roccatano et al. through the dynamic behavior of residues Pro45 and Ala191, which line the substrate access channel.[19] The P45Cα - A191Cα minimum distance (1.61 nm in the crystal structure) was calculated and is reported in Figure S6.7 of SI. After 30 ns of simulation, only small variations in the distance were observed in all simulations. CoSep binding to the A chain of the AF complex induces a larger deviation in the G helix and F/G loop region and results in a wider substrate access channel, with an average distance of 1.87 ± 0.15 nm compared with 0.59 ± 0.10 nm for the A chain of the AF complex in water. The isolated A chain was less affected by CoSep binding and shows a slightly higher P45Cα - A191Cα distance in CoSep solution (1.50 ± 0.14 nm) than in water (1.11 ± 0.10 nm). In the isolated A chain, the reverse effect of CoSep binding was observed for the average distance between water and the HEME iron, which was 0.22 ± 0.03 nm compared with 0.34 ± 0.14 nm in the water simulation. Hence, CoSep binding to the isolated HEME domain makes its structure slightly more compact and decreases the size of the substrate access channel.

6.4.3. Effect of CoSep binding on ET tunneling 

In CoSep solution, the average distance between the FMN and HEME cofactors was 1.35 ± 0.01 nm (with a minimum distance of 0.95 nm), lower than in water (1.41 ± 0.09 nm) (reported in Figure S6.8 in SI). ET tunneling in the AF chain in the crystal structure and in the simulation has been discussed in detail in our previous paper.[20] In CoSep solution, ET tunneling from CoSep to the HEME iron was identified in the representative structures of the isolated and complex simulations obtained via cluster analysis and is reported in Table 6.2.

Table 6.2: Electron transfer tunneling in AF chain and isolated A chain in CoSep solution
calculated by the Pathways[29] VMD plugin.

Coordinates   Redox partners   Max. coupling (a.u.)   Distance along ET pathway (nm)   Amino acids involved in the ET pathway
A chain       CoSep/HEME       6.39 × 10⁻⁹            3.08                             CoSep → D84 → G85 → L86 → F87 → HEME (FE)
AF chain      CoSep/HEME       2.25 × 10⁻⁹            2.93                             CoSep → I102 → L103 → I401 → C400 → HEME (FE)
AF chain      CoSep/FMN        4.00 × 10⁻⁶            1.72                             CoSep → W574 → FMN (C7)
AF chain      FMN/HEME         1.38 × 10⁻⁹            2.08                             FMN (C7) → SOL → M490 → F393 → HEME (FE)
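To make the coupling values in Table 6.2 easier to interpret, the following sketch shows how a tunneling coupling is built up as a product of per-step decay factors in the Pathways model of Beratan and co-workers. The decay parameters are the values commonly quoted for that model (0.6 per covalent bond, with exponential distance penalties for hydrogen bonds and through-space jumps, distances in Å). Whether they match the settings used for this table is an assumption, and the function names are hypothetical.

```python
import math

EPS_BOND = 0.6  # commonly quoted decay factor per covalent bond (assumed here)

def step_decay(kind, r_angstrom=None):
    """Decay factor of one step; r_angstrom is the heavy-atom distance in Angstrom."""
    if kind == "bond":
        return EPS_BOND
    if kind == "hbond":
        # hydrogen bond: two covalent-bond equivalents at 2.8 A, decaying with distance
        return EPS_BOND ** 2 * math.exp(-1.7 * (r_angstrom - 2.8))
    if kind == "space":
        # through-space jump: one bond equivalent at 1.4 A, decaying with distance
        return EPS_BOND * math.exp(-1.7 * (r_angstrom - 1.4))
    raise ValueError(f"unknown step kind: {kind}")

def path_coupling(steps):
    """Relative coupling of a pathway given as a list of (kind, distance) steps."""
    coupling = 1.0
    for kind, r in steps:
        coupling *= step_decay(kind, r)
    return coupling
```

Because every factor is below one, longer pathways and through-space jumps rapidly reduce the coupling, which is why the shorter CoSep/FMN pathway in the table has a coupling several orders of magnitude larger than the pathways ending at the HEME iron.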

Figures 6.3a and 6.3b show the possible ET tunneling from CoSep to the HEME iron of the A
chain in the isolated and complex simulations. In the isolated A chain, ET was mediated by the
residues of the B’/C loop: Asp84, Gly85, Leu86 and Phe87. In the A chain of the AF complex, ET
tunneling can be mediated by two pathways. The first ET pathway is from CoSep to the HEME
iron, mediated by Ile102, Leu103, Ile401 and Cys400. Figures 6.3c and 6.3d show the
second possible pathway, first from CoSep to the isoalloxazine ring of FMN (C7 atom) and then
from the C7 atom to the HEME iron, mediated by a water molecule and involving Met490 of the Lβ1
FMN binding loop and Phe393 of the K/L loop. The same FMN/HEME ET pathway was observed in the
water simulation of the AF complex, but without the involvement of a water molecule.[20]



Figure 6.3: ET tunneling from CoSep (in purple) to the HEME iron in (a) the AF complex and
(b) the isolated A chain in CoSep solution. (c) ET from CoSep to the isoalloxazine ring (C7 atom)
of the FMN cofactor and (d) from the C7 atom of the FMN cofactor to the HEME iron in the AF
complex. ET is represented by red tubes. The HEME and FMN cofactors are in black and pink,
respectively. The conformation of the first cluster with the minimum distance between the HEME
and FMN cofactors is used. The amino acids within 0.50 nm of both cofactors are labeled and shown
in licorice representation colored by element type (oxygen in red, carbon in cyan and nitrogen in
blue), with their associated secondary structure in cartoon representation in sky blue for the
HEME domain and in orange for the FMN domain. The residues involved in electron tunneling are
represented and labeled in green.

6.4.4. Effect of CoSep binding on P450BM-3 dynamics 

The subspace overlap and inner product of the first ten eigenvectors of the A chain (which
together account for ~60% of the total residue positional fluctuation) between the isolated and
complex simulations were less than 0.20 and 0.34, respectively. These values indicate that
different sets of collective motions exist in the eigenvectors of the same time windows of the
two trajectories. The first three eigenvectors of the A chain together represent ~41% of the
total relative positional fluctuation (RPF). Figures 6.4a, 6.4b and 6.4c show the RPF associated
with the first three eigenvectors of the A chain in the isolated (in green) and complex (in
orange) simulations (a comparison to the P450BM-3 domain in water is reported in Figure S6.9).
Figure 6.5 shows the RMSF associated with the first three eigenvectors of the A chain in the
isolated (a1, a2 and a3) and complex (b1, b2 and b3) simulations, respectively, in CoSep solution.
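The comparison of eigenvector sets described above can be sketched as follows. The exact definitions of "subspace overlap" and "inner product" used in this chapter are not stated here, so the sketch assumes two widely used measures: the overlap (1/n) Σ_ij (v_i · w_j)² between two sets of n orthonormal eigenvectors, and its square root, the root-mean-square inner product (RMSIP). The input is assumed to be a trajectory already fitted to a reference structure and flattened to shape (frames, 3 × atoms); all function names are illustrative.

```python
import numpy as np

def leading_modes(coords, n_modes=10):
    """Return (variance fraction of the first n_modes, eigenvectors as columns)."""
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)                    # remove the average structure
    cov = x.T @ x / (x.shape[0] - 1)          # positional covariance matrix
    evals, evecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]           # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    fraction = float(evals[:n_modes].sum() / evals.sum())
    return fraction, evecs[:, :n_modes]

def subspace_overlap(v, w):
    """(1/n) * sum_ij (v_i . w_j)^2 for two sets of n orthonormal column vectors."""
    n = v.shape[1]
    return float(np.sum((v.T @ w) ** 2) / n)

def rmsip(v, w):
    """Root-mean-square inner product of two eigenvector sets."""
    return float(np.sqrt(subspace_overlap(v, w)))
```

Both measures equal 1 when the two sets span the same subspace and 0 when they are orthogonal, so low values correspond to the different collective motions discussed in the text.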

In the isolated A chain, the first collective motion (Figure 6.4a and 6.5a1) mainly involves
the N-terminus region (residues 20 – 26), the turn (residues 35 – 38) between the A helix and
beta sheet 1, the C-terminus (residues 450 – 458) and the K/L loop region (residues 325 – 380,
involved in HEME cofactor binding), with slight motion in the B helix and the D/E, F/G and G/H
loops. The second eigenvector (Figure 6.4b and 6.5a2) involves collective motion along the turn
(residues 35 – 38) between the A helix and the beta sheet, the F/G loop, the K/L loop (residues
366 – 385) and the C-terminus (residues 425 – 430 and 450 – 458). The third eigenvector (Figure
6.4c and 6.5a3) mainly involves the C-terminus (residues 425 – 458) and, slightly, the D/E, F/G
and K/L loops (residues 390 – 402). The first three eigenvectors show that the substrate channel
remains open (as also found from the P45Cα - A191Cα distance) and that the collective motion is
related to the interaction of residues with the HEME cofactor and facilitates ET tunneling from
CoSep to the HEME iron through the B’/C loop in the isolated A chain.

Figure 6.4: RPF for (a) first, (b) second and (c) third eigenvector of A chain in isolated (green
color) and complex (orange color) simulation. Horizontal bars in blue and orange represent
helices (labeled) and beta sheets, respectively. The regions involved in cofactor binding are
represented by horizontal bars in purple.



In the complex simulation, the first eigenvector of the A chain (Figure 6.4a and 6.5b1)
involves RPF in the B’/C loop (residues 83 – 94), the C and D helices and C/D loop (residues
100 – 130), the E/F loop (residues 159 – 171), the G helix and G/H loop (residues 198 – 230), the
I helix (residues 250 – 268) and the K/L loop (residues 335 – 340 and 392 – 400). The first
collective motion involves residues in contact with the HEME cofactor and is related to the
interaction of the A chain with the F chain, with the largest RPF in the regions at the interface
of the A chain, mainly constituted by the C and D helices (found to be involved in ET tunneling
from CoSep to the HEME iron) and the K/L loop (F393 is involved in water-mediated ET tunneling
from FMN to the HEME iron). In the second eigenvector (Figure 6.4b and 6.5b2), the collective
motion mainly involves the G helix and, slightly, the B’/C loop, G/H loop and K/L loop (residue
495 - 400). The larger RPF in the G helix results from the slight kink formed in the G helix upon
CoSep binding. The collective motion in the third eigenvector (Figure 6.4c and 6.5b3) mainly
involves the G/H loop and, slightly, the G helix and K/L loop regions.

The AF chain in CoSep solution shows collective motions of different amplitude than in
water. The RPF of the first three eigenvectors of the AF chain in both water and CoSep
solution is reported in Figure S6.10 of the SI. The collective motion associated with the first
two eigenvectors of the AF chain in the presence of CoSep does not belong solely to the
movement of the F chain towards the A chain, as observed in the water simulation. The collective
motion associated with the first three eigenvectors of the AF chain in the presence of CoSep is
reported in Figure S6.11 in the SI. In the first eigenvector in CoSep, the A chain of the AF
complex shows higher fluctuation in the C helix, D helix, beginning of the F helix (residues
170 – 175) and G/H loop, and lower fluctuation for the F chain, than in water. In the second
eigenvector, the collective motion in the A chain in the presence of CoSep mainly involves the G
helix, with slightly higher RMSF for the B’/C loop and K/L loop (residues 390 – 400); both loops
are involved in HEME cofactor binding. The third collective motion in the A chain of the AF
complex involves the N-terminus residues (A – C helix), G helix and K/L loop. The third
eigenvector of the AF chain shows a slight collective motion of the F chain towards the A chain.



Figure 6.5: RMSF of protein backbone atoms along the first (a), second (b) and third (c)
eigenvector after projection of the trajectory on the corresponding eigenvector of isolated A
chain in water (a1, a2 and a3) and in CoSep solution (b1, b2 and b3). The 10 sequential frames
represent the extent of the fluctuations in the trajectories along the eigenvectors. The first
extreme conformation is shown in green and the last extreme in violet. Other conformations of the
A chain are in sky blue. Helices and loops are labeled. The N-terminus of the protein is labeled
in red.
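The projection described in the caption above can be sketched in a few lines. The sketch assumes a fitted trajectory flattened to shape (frames, 3 × atoms) and an eigenvector of matching length; it returns evenly spaced conformations between the two extreme projections and the per-atom RMSF of the motion filtered along that eigenvector. The function names and the even spacing of the frames are assumptions made for illustration, not the exact procedure used for the figure.

```python
import numpy as np

def extreme_frames(coords, eigvec, n_frames=10):
    """Conformations interpolated between the extreme projections on one eigenvector."""
    x = np.asarray(coords, dtype=float)
    mean = x.mean(axis=0)
    v = np.asarray(eigvec, dtype=float)
    v = v / np.linalg.norm(v)                  # make sure the mode is normalized
    proj = (x - mean) @ v                      # projection of every frame
    steps = np.linspace(proj.min(), proj.max(), n_frames)
    return mean + steps[:, None] * v[None, :]  # shape (n_frames, 3 * n_atoms)

def rmsf_along_mode(coords, eigvec):
    """Per-atom RMSF of the trajectory filtered along a single eigenvector."""
    x = np.asarray(coords, dtype=float)
    v = np.asarray(eigvec, dtype=float)
    v = v / np.linalg.norm(v)
    proj = (x - x.mean(axis=0)) @ v
    disp = proj[:, None] * v[None, :]          # displacement restricted to the mode
    sq = (disp ** 2).reshape(len(proj), -1, 3).sum(axis=2)
    return np.sqrt(sq.mean(axis=0))
```

The first and last rows returned by `extreme_frames` correspond to the two extreme conformations that the caption shows in green and violet.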




We performed simulations of the isolated HEME domain and the HEME/FMN complex in
CoSep solution. The structure remained conserved in both systems throughout the
simulation. CoSep was found to bind mainly to surface-exposed loop regions (rich in
charged amino acids, mainly the negatively charged E and D) in both systems. CoSep binding
affects the substrate access channel, which was found to be relatively more open in comparison
to the one in the water simulation.[20] The isolated HEME domain adopts ET tunneling from CoSep
to the HEME iron mediated by residues of the B’/C loop. The HEME/FMN complex has two possible ET
tunneling pathways. The first one was from CoSep to FMN and then from FMN to HEME, involving the
same residues as observed in the water simulation (Met490 and Phe393) but mediated by a water
molecule. However, the average FMN/HEME distance was smaller (1.35 ± 0.11 nm) than the one
observed in the water simulation (1.41 ± 0.09 nm; 1.81 nm in the crystal structure). The second
ET tunneling pathway was directly from CoSep to the HEME iron. Both systems show atomic
fluctuations of different amplitude in CoSep solution and in the water simulation. Hence, the
presence of CoSep does not dramatically affect the conformation of the P450BM-3 domains but
mainly results in the stabilization of the loops on the surface. The exception is the HEME/FMN
complex, in which CoSep binding induced a conformational change in the G helix and resulted in
higher fluctuation in the F/G and G/H loop regions during the simulation. In the HEME/FMN
complex, the preferable ET pathway is from CoSep to FMN and then to the HEME iron, and in this
process a surface water molecule plays an important role. However, in the isolated HEME domain,
direct ET from CoSep to the HEME iron might speed up the ET tunneling and hence the performance
of the enzyme, consistent with protein engineering experiments on P450BM-3 in which the isolated
HEME domain performs better in the presence of Zn/Co(III)sep. The results of this study provide
an indication of the mechanism of ET by CoSep. These results are in agreement with the findings
of directed evolution and site-directed mutagenesis experiments on the whole P450BM-3 and the
HEME domain.





Biosyst 2: 462-469. 


268. 



4. Kumar S Engineering cytochrome P450 biocatalysts for biotechnology, medicine and 

bioremediation. Expert Opin Drug Metab Toxicol 6: 115-131. 

5. Narhi LO, Fulco AJ (1986) Characterization of a catalytically self-sufficient 119,000- 

dalton cytochrome P-450 monooxygenase induced by barbiturates in Bacillus 

megaterium. J Biol Chem 261: 7160-7169. 

6. Narhi LO, Fulco AJ (1987) Identification and Characterization of 2 Functional 

Domains in Cytochrome-P-450bm-3, a Catalytically Self-Sufficient Monooxygenase 

Induced by Barbiturates in Bacillus-Megaterium. J Biol Chem 262: 6683-6690. 

7. Whitehouse CJC, Bell SG, Wong L-L (2012) P450BM3 (CYP102A1): connecting the 

dots. Chem Soc Rev. 






Trans 34: 1173-1177. 

10. Schuhmann W (2002) Amperometric enzyme biosensors based on optimised 

electron-transfer pathways and non-manual immobilisation procedures. J 

Biotechnol 82: 425-441. 

11. Nazor J, Dannenmann S, Adjei RO, Fordjour YB, Ghampson IT, et al. (2008)
Laboratory evolution of P450 BM3 for mediated electron transfer yielding an
activity-improved and reductase-independent variant. Protein Eng Des Sel 21: 29-35.

12. Schwaneberg U, Appel D, Schmitt J, Schmid RD (2000) P450 in biotechnology: zinc 

driven omega-hydroxylation of p-nitrophenoxydodecanoic acid using P450 BM-3 

F87A as a catalyst. J Biotechnol 84: 249-257. 

13. Wong TS, Schwaneberg U (2003) Protein engineering in bioelectrocatalysis. Curr 

Opin Biotechnol 14: 590-596. 



1863-1868. 

15. Humphrey W, Dalke A, Schulten K (1996) VMD: Visual molecular dynamics. J Mol
Graphics 14: 33-38.













20. Verma R, Schwaneberg U, Roccatano D Insight into the redox partner interaction 

mechanism in cytochrome P450BM-3 using molecular dynamics simulation. 

Unpublished. 







23. Verma R, Schwaneberg U, Roccatano D Conformational Dynamics of the FMN-binding
Reductase Domain of Monooxygenase P450BM-3. Unpublished.

24. Dehayes LJ, Busch DH (1973) Conformational Studies of Metal-Chelates .1. Intra- 

Ring Strain in 5-Membered and 6-Membered Chelate Rings. Inorganic Chemistry 12: 

1505-1513. 

25. Becke AD (1993) Density-functional thermochemistry. III. The role of exact 

exchange. The Journal of Chemical Physics 98: 5648-5652. 

26. Hay PJ, Wadt WR (1985) Ab initio effective core potentials for molecular

calculations. Potentials for the transition metal atoms Sc to Hg. The Journal of 

Chemical Physics 82: 270-283. 

27. Breneman CM, Wiberg KB (1990) Determining Atom-Centered Monopoles from 

Molecular Electrostatic Potentials - the Need for High Sampling Density in 

Formamide Conformational-Analysis. Journal of Computational Chemistry 11: 361- 

373. 

28. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, et al. (2009) Gaussian 09,
Revision B.01. Gaussian, Inc., Wallingford CT.

29. Balabin IA, Hu X, Beratan DN (2012) Exploring biological electron transfer pathway 

dynamics with the Pathways Plugin for VMD. J Comput Chem 33: 906-910. 


PART II: P450BM-3 HEME/FMN & CoSep SI 




simulation 

Table S6.1: Force field parameters for cobalt(II)sepulchrate adopted from the force
constants calculated on the energy minimized geometries by Dehayes et al.[1]

Bond stretching parameters
Bond type          Force constant (kJ mol⁻¹ nm⁻¹)
Co – N             885.234
N – C              2342.558
N – H              2962.824
C – C              2709.900
C – H              2740.010

Angle bending parameters
Angle type         Force constant (kJ mol⁻¹ rad⁻²)
N – Co – N         240.88
Co – N – C         167.41
Co – N – H         167.41
C – C – C          417.92
C – C – H          292.06
C – N – H          167.41
C – N – C          417.92
N – C – C          417.92
N – C – H          292.06
N – C – N          334.22
H – N – H          251.11

Dihedral type      Force constant (kJ mol⁻¹)
H – C – C – H      2.729
H – C – N – Co     2.729
H – C – N – C      2.729
H – N – C – C      2.729
H – C – N – H      1.807
H – C – C – C      4.103
H – C – C – N      4.103
Co – N – C – C     1.373
N – Co – N – C     0.000
N – C – C – N      2.060
N – C – C – C      2.060
C – C – C – C      2.060



Table S6.2: Partial charges on cobalt(II)sepulchrate calculated by DFT calculations and 

adopted for GROMOS96 43a1 force field.[2] 


1 CH2 C 1 0.514 

2 N N 2 -0.544 

3 CH2 C 3 0.149 

4 CH2 C 4 0.149 

5 N N 5 -0.542 

6 CH2 C 6 0.511 

7 N N 7 -0.945 

8 CH2 C 8 0.639 

9 N N 9 -0.785 

10 CH2 C 10 0.143 

11 CH2 C 11 0.253 

12 N N 12 -0.944 

13 CH2 C 13 0.697 

14 N N 14 -0.947 

15 CH2 C 15 0.639 

16 N N 16 -0.785 

17 CH2 C 17 0.143 

18 CH2 C 18 0.253 

19 N N 19 -0.944 

20 CH2 C 20 0.697 

21 CO CO 21 1.682 

22 H H 22 0.252 

23 H H 23 0.381 

24 H H 24 0.351 

25 H H 25 0.381 

26 H H 26 0.351 

27 H H 27 0.251

Figure S6.1: Backbone root mean square deviation (RMSD) with respect to the reference structure
as a function of time for AF chain (black), A of AF chain (red), F of AF chain (green), A (blue)
and F (orange) chain in water (dotted line) and in CoSep solution (solid line). The P450BM-3
domains deviated less in CoSep solution than in water. The major difference was observed for the
isolated A chain (green color solid line) in the presence of CoSep, which showed lower deviation
than in water and reached a plateau after 25 ns with little variation.



Figure S6.2: Radius of gyration with respect to reference structure as a function of time for AF 

chain (black), A of AF chain (red), F of AF chain (green), A chain (blue) and F chain (orange) in 

water (as dotted line), and CoSep solution (straight line). 
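For readers who want to reproduce a curve such as the one in Figure S6.2, the radius of gyration of a single frame can be computed as in the sketch below. It uses the standard mass-weighted definition about the center of mass and falls back to equal weights when no masses are given; the function name and argument layout are illustrative assumptions rather than the tool used for the figure.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one frame (same length unit as coords)."""
    x = np.asarray(coords, dtype=float)               # shape (n_atoms, 3)
    if masses is None:
        m = np.ones(len(x))                           # equal weights by default
    else:
        m = np.asarray(masses, dtype=float)
    com = (m[:, None] * x).sum(axis=0) / m.sum()      # center of mass
    sq_dist = ((x - com) ** 2).sum(axis=1)            # squared distance to the center
    return float(np.sqrt((m * sq_dist).sum() / m.sum()))
```

Calling the function once per frame and plotting the result against time gives the kind of trace shown in the figure.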



Figure S6.3: Backbone RMSD (a) and RMSF (b) per residue with respect to crystal structure for 

isolated A and F chain (in red color), AF complex (in black color) in water, and A chain (in green 

color) and AF complex (in orange color) in CoSep solution. The maroon vertical line separates 

HEME and FMN domains. Horizontal bars, in blue and orange color represent helices (labeled) and 

beta sheets, respectively. The regions involved in cofactor binding are represented by horizontal 

bars in purple color. 



Figure S6.4: The binding of CoSep (in ball and stick representation) on a) A chain and b) AF chain 

of P450BM-3 in cartoon representation with surface colored by element type (carbon in gray, 

oxygen in red, nitrogen in blue and hydrogen in white). FMN and HEME cofactors are in licorice 

representation in red and green color, respectively. Helices, loops and N- and C- terminus (in red) 

are labeled. 



Figure S6.5: Average over the minimum distance (less than 2.3 nm) between CoSep and 

isoalloxazine ring of FMN cofactor (in green color), and HEME iron of A chain in isolated domain (in 

black color) and complex simulation (in red color) as a function of time. 

Figure S6.6: Number of contacts between CoSep and amino acids of P450BM-3 domains within 

the distance of 0.50 nm. 
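A contact count of the kind plotted in Figure S6.6 can be sketched as follows. The sketch counts protein atom - CoSep atom pairs closer than 0.50 nm in one frame and lists the residues that contribute at least one such pair. Whether the figure counts atom pairs or residues is not stated here, so both are returned; periodic boundary conditions are ignored and the function name is hypothetical.

```python
import numpy as np

def count_contacts(ligand_xyz, protein_xyz, residue_ids, cutoff_nm=0.50):
    """Return (number of atom pairs within the cut-off, sorted contacting residue ids)."""
    lig = np.asarray(ligand_xyz, dtype=float)         # (n_ligand_atoms, 3) in nm
    prot = np.asarray(protein_xyz, dtype=float)       # (n_protein_atoms, 3) in nm
    res = np.asarray(residue_ids)                     # residue id of each protein atom
    # all protein-ligand atom distances, shape (n_protein_atoms, n_ligand_atoms)
    dist = np.linalg.norm(prot[:, None, :] - lig[None, :, :], axis=2)
    within = dist < cutoff_nm
    n_pairs = int(within.sum())
    residues = sorted(set(res[within.any(axis=1)].tolist()))
    return n_pairs, residues
```

Accumulating the residue lists over all frames would indicate which loop regions are contacted most often.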



Figure S6.7: Minimum distance between P45Cα and A191Cα (1.61 nm in crystal structure) as a
function of time for isolated A and F chain (in red color), AF complex (in black color), and A
chain (in green color) and AF complex (in orange color) with CoSep.



Figure S6.8: Minimum distance between heavy atoms of isoalloxazine ring of FMN and HEME 

cofactor as a function of time in AF complex in water (in black color) and AF complex in CoSep 

solution (in red color). Green color horizontal line shows the distance observed in crystal 

structure.[3] 



Figure S6.9: RPF for first, second and third eigenvector of isolated A and F chain (in red
color), AF complex (in black color) in water, and A chain (in green color) and AF complex (in
orange color) in CoSep solution. The maroon vertical line separates HEME and FMN domain.
Horizontal bars, in blue and orange color represent helices (labeled) and beta sheets,
respectively. The regions involved in cofactor binding are represented by horizontal bars in
purple color.



Figure S6.10: RPF for first, second and third eigenvector of AF complex in water (black color) and 

in CoSep solution (orange color). The maroon vertical line separates HEME and FMN domain. 

Horizontal bars, in blue and orange color represent helices (labeled) and beta sheets, respectively. 

The regions involved in cofactor binding are represented by horizontal bars in purple color. 



Figure S6.11: RMSF of protein backbone atoms along first (a), second (b) and third (c) 

eigenvector after projection of the trajectory on the corresponding eigenvector of AF complex in 

CoSep solution. The 10 sequential frames represent the extension of the fluctuations in trajectories 

along the eigenvectors. The first extreme conformation is shown in green color and last extreme in 

violet color. Other conformations of A and F chain are in sky blue and tan color, respectively. Helices 

and loops in FMN domain are labeled. N and C indicate the N- and C-terminus of the protein 

(labeled in red color). 



References 

1. Dehayes LJ, Busch DH (1973) Conformational Studies of Metal-Chelates .1. Intra- 

Ring Strain in 5-Membered and 6-Membered Chelate Rings. Inorganic Chemistry 12: 

1505-1513. 






1863-1868. 


Summary and outlook 

In the first part of the thesis, the importance of combining computational and
directed evolution methods has been reviewed as a winning strategy for protein
engineering. Computational approaches can assist the design of protein engineering
experiments and hold particular promise for tailoring proteins to specific functions.

The MAP 2.0 3D server has been introduced to assist the development of directed evolution
experiments for generating sequence libraries with the highest chance of containing variants
with desired enzymatic properties. This task is accomplished by correlating the amino acid
substitution patterns generated for a specific random mutagenesis method with the
structural information of the target protein. The combined information can help to select
an experimental strategy that improves the chances of obtaining functionally efficient and/or
stable enzyme variants. Hence, the MAP 2.0 3D server facilitates the ‘in-silico’ pre-screening
of the target gene by predicting the amino acid diversity population in random mutagenesis
libraries. Currently, the MAP 2.0 3D server provides sequence/structure based analysis using
the protein sequence/structure (crystallographic structure or homology model) provided by
the user. In future, the capability of the server can be further extended by (1) dynamically
identifying the functionally important regions, e.g. active site residues and trans-membrane
regions, in the target protein and focusing the analysis only on those regions, (2)
providing MAP 2.0 3D structural analysis results in the absence of a crystallographic or
model structure using the secondary structure elements predicted from the protein sequence,
and (3) predicting the flexible regions in the protein structure using e.g. the Gaussian
network model (GNM) and correlating them with the MAP 2.0 3D analysis.


In the second part of the thesis, molecular dynamics simulations were used to
understand the interaction mechanism of the HEME and FMN domains of P450BM-3 in
solution and in the presence of the electron mediator cobalt(II)sepulchrate (CoSep).
Cytochrome P450BM-3 is a pivotal member of the cytochrome P450 monooxygenase
superfamily, particularly because it is a bacterial P450 fused with its eukaryotic-like
redox partners (FMN and FAD binding domains). This structural feature makes the enzyme
catalytically self-sufficient. In addition, it is soluble in water and has a high catalytic
efficiency and monooxygenase rate. These characteristics make the enzyme particularly
interesting for possible biotechnological applications. For this reason, understanding the
structure-function-dynamics relationships in P450BM-3 is relevant. In this thesis we have
analyzed different dynamic and structural properties of the HEME domain, the FMN domain
and their complex in solution.

In the first study, the effect of the protonation states (oxidized and reduced) of the FMN
cofactor on the conformation and dynamics of the FMN-binding domain of P450BM-3 was
analyzed by performing MD simulations of the holo- and apo-protein in solution. In the
holo-protein, the protonation state of the isoalloxazine ring influences the conformation and
dynamics of the FMN cofactor and resulted in changes in the FMN binding site. In particular, the
dynamics of the FMN domain showed significant differences in the atomic fluctuation
amplitude between the oxidized and reduced states. In the apo-protein, the overall structure
remained conserved but high fluctuations were observed in the FMN binding region, which can
promote the rebinding of the FMN cofactor as observed experimentally.

MD simulations of the HEME and FMN domains were performed to gain insight into
the interaction mechanism and inter-domain electron transfer in the HEME/FMN complex.
Simulations of the isolated HEME and FMN domains were also performed to compare their
behavior in solution and in the HEME/FMN complex. The HEME/FMN complex undergoes a
conformational rearrangement during the simulation that decreases the distance between the
FMN and HEME cofactors to within the range expected for ET between the two redox centers.
In the complex, the main collective motion was dominated by the interaction mechanism
between the HEME and FMN domains.

MD simulations of the HEME/FMN complex and the isolated HEME domain were
performed to investigate the binding modes between CoSep and the P450BM-3 domains and
their effect on the ET pathway. CoSep prefers to bind to surface-exposed loop regions that
mainly contain negatively charged residues. CoSep binding to the HEME domain was observed to
affect the substrate access channel and keep it more open in comparison to the one observed in
solution. Putative ET pathways were proposed between CoSep and the HEME iron in the
HEME/FMN complex and the isolated HEME domain.

The results of the P450BM-3 simulations can enhance our basic understanding, with
possible applications in enzyme catalysis, of (1) the effect of the protonation state on the
dynamics of the P450BM-3 reductase domain, (2) the interaction mechanism of the redox partners
and its effect on ET tunneling between the redox centers, and (3) the effect of the presence of
an ET mediator on redox partner interaction and ET tunneling. The study can be further
extended by performing the MD simulation of the HEME/FMN complex with the FMN cofactor in
the reduced state. Modeling the linker regions connecting the HEME and FMN domains, followed
by their simulation, will help to further enhance our understanding of the HEME/FMN
binding interaction mechanism and ET tunneling. The recent release of the structure of the FAD
domain of P450BM-3 with NADPH also offers a chance to perform the simulation of the whole
complex.



Personal Details 

Name: 

Rajni Verma 

Address: 

School of Engineering and Science, 

Jacobs University Bremen, 

28759 Bremen, Germany 

Tel.: +49 421 200 3208 

Email: 

ra.verma@jacobs-university.de 

Date of Birth: 15 th April 1984 

Nationality 

Indian 

Linguistic skills: 

Hindi, English 

__________________________________ 

Employment & Education 

_______________________________ 

Since 06/09 

PhD Fellow in Computational Chemistry and Bioinformatics, 

Jacobs University Bremen, Bremen, Germany 

04/08 – 03/09 Project Assistant, Bioinformatics, 

Institute of Genomics and Integrative Biology, New Delhi, India 

07/05 – 03/08 Master of Science in Bioinformatics, 

CCS CS University, Meerut, India 

06/04 – 06/05 Advanced Diploma in Computer Application, 

CCSCS University, Meerut, India 

07/01 – 06/05 Bachelor of Science in Life Sciences, 

CCSCS University, Meerut, India 

Publications 

1. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics of the FMN-binding reductase
domain of monooxygenase P450BM-3. J Chem Theory and Comput 2012, DOI: 10.1021/ct300723x.

2. Verma R, Schwaneberg U, Roccatano D. Computer-aided protein directed evolution: a review of
web servers, databases and other computational tools for protein engineering. Computational and
Structural Biotechnology Journal 2012, 2 (3), e201209008.

3. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser HG, Niemann P, Shivange AV,
Schwaneberg U. dRTP and dPTP a complementary nucleotide couple for the Sequence Saturation
Mutagenesis (SeSaM) method. J Mol Catal B-Enzym 2012, 84, 40-47.

4. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a sequence/structure based server for
protein engineering. ACS Synth Bio. 2012, 1 (4), 139-150.


5. Ramachandran S, Chaudhuri R, Verma R, Shah AR, Sen R, Paul C. Systems Immunology: Data 

modeling and scripting in R Book Chapter: Encyclopedia of Systems Biology, Springer Science & 

Business Media, LLC 2011. Edited by W. Dubitsky, O. Wolkenhauer, K. Cho, & H. Yokota. 

6. Zhu L, Verma R, Roccatano D, Ni Y, Sun Z, Schwaneberg U. A potential antitumor drug (arginine 

deiminase) reengineered for efficient operation under physiological conditions. ChemBioChem 2010, 

11, 2294-2301. [inside cover page] 

Conferences/Abstracts 

1. Verma R, Schwaneberg U, Roccatano D. Molecular dynamics simulations of P450BM-3 reductase 

domain. Computer simulation and theory of macromolecules 2012, Hunfeld, Germany. 

2. Verma R, Schwaneberg U, Roccatano D. Protein and cofactor conformational dynamics of
FMN-binding reductase domain of monooxygenase P450BM-3. 5th Meeting of the North German
Biophysicist 2012, Borstel, Germany.

3. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a structure based substitution spectra analyses 

of mutagenesis methods. 10 th International Symposium on Biocatalysis- Biotrans 2011, Sicily, Italy. 

4. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Mundhada H, Shivange AV, Schwaneberg U.
Ribavirin: A complementary universal base to P for Sequence Saturation Mutagenesis Method
(SeSaM). 10 th International Symposium on Biocatalysis- Biotrans 2011, Sicily, Italy.

5. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics of oxidized and reduced FMN in 

water and methanol. MoLife Center Jacobs University Bremen 2011, Seefeld, Germany. 

6. Verma R, Schwaneberg U, Roccatano D. MAP2.0: Evolution of Mutagenesis Assistant Program. 5 th 

International Congress on Biocatalysis- Biocat 2010, Hamburg, Germany. 

7. WE-Heraeus Summer School June 2009, ‘Quantum and classical simulation of biological systems and 

their interaction with technical materials’, Bremen, Germany. (Participation) 

References 

Prof. Dr. Danilo Roccatano 

Assistant Professor 

School of Engineering and Science, 

Jacobs University Bremen, 

Campus Ring 1, Research II, 

28759 Bremen, Germany 

Tel: +49-421 200-3144 

Fax: +49-421-200-3249 

Email: d.roccatano@jacobs-university.de 

Web: http://ses.jacobs-university.de/ses/droccatano 

Prof. Dr. Ulrich Schwaneberg 

Head of the Institute 

Department of Biotechnology, 

RWTH Aachen University, 

Worringer Weg 1, 

52056 Aachen, Germany 

Tel.: +49-241-80-24176 

Fax: +49-241-80-22387 

E-Mail: u.schwaneberg@biotec.rwth-aachen.de 

Web: www.biotec.rwth-aachen.de

Statutory Declaration 

I, RAJNI VERMA, hereby declare that I have written this PhD thesis independently, 

unless where clearly stated otherwise. I have used only the sources, the data and the 

support that I have clearly mentioned. This PhD thesis has not been submitted for 

conferral of degree elsewhere. 

Bremen, December 19, 2012 

Signature ____________________________________________________________
