Development and Application of Novel ... - Jacobs University
Development and Application of Novel Bioinformatics and
Computational Modeling Tools for Protein Engineering
by
Rajni Verma
A thesis submitted for the degree of
Doctor of Philosophy
in
Computational Chemistry & Bioinformatics
Date of Defense: December 14th, 2012
Supervisor
Prof. Dr. Danilo Roccatano
Jacobs University Bremen, Germany
Co-supervisor
Prof. Dr. Ulrich Schwaneberg
RWTH Aachen University, Germany
External committee member
Dr. Steven Hayward
University of East Anglia, UK
School of Engineering and Science
Acknowledgement
I express my sincere gratitude to my PhD supervisor, Prof. Dr. Danilo Roccatano, for
his expert and continuous guidance. In particular, I thank him for his patience and the time he
spent explaining the concepts and ideas that really helped me to accomplish my work. His
constant support, understanding, motivation and valuable discussions provided me with a
wonderful learning experience during my PhD.
I am thankful to Prof. Dr. Ulrich Schwaneberg for his constructive comments, fruitful
discussions and his support during this endeavor. It has been a great honor and pleasure to
work with him. I am deeply grateful for his trust in me. I express my respectful gratitude to
Dr. Steven Hayward for being a member of my PhD committee. I am thankful to Dr. Achim
Gelessus for his technical support in making efficient use of the CLAMV facility for scientific
computation throughout this work at Jacobs University Bremen.
I express my whole-hearted thanks to the members of Prof. Roccatano's group, Samira
Hezaveh, Khadga Karki, Susruta Samanta and Edita Sarukhanyan, for their wonderful
company and support. I also convey my thanks to the members of Prof. Schwaneberg's group
at RWTH Aachen for their cooperation. I express my special thanks to my friends, Kavita,
Amit, Amol, Sagar, Hemanshu, Susruta, Usha and Steffi, for providing such a friendly
environment. Especially, I express my whole-souled gratitude to Amol for his continuous
support, encouragement, motivation, patience and care throughout my PhD.
Funding
The work described in this PhD thesis was financially supported by the European
Union 7th Framework Programme through the project entitled “Effective redesign of oxidative
enzymes for green chemistry” (Project reference: 212281), in collaboration with Prof. Dr.
Ulrich Schwaneberg from RWTH Aachen University.
List of Publications
1. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics of the FMN-binding
reductase domain of monooxygenase P450BM-3. J Chem Theory Comput 2012, DOI:
10.1021/ct300723x.
2. Verma R, Schwaneberg U, Roccatano D. Computer-aided protein directed evolution: a
review of web servers, databases and other computational tools for protein
engineering. Computational and Structural Biotechnology Journal 2012, 2 (3),
e201209008.
3. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser HG, Niemann P, Shivange AV,
Schwaneberg U. dRTP and dPTP a complementary nucleotide couple for the Sequence
Saturation Mutagenesis (SeSaM) method. J Mol Catal B-Enzym 2012, 84, 40-47.
4. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a sequence/structure based server
for protein engineering. ACS Synth Biol 2012, 1 (4), 139-150.
5. Zhu L, Verma R, Roccatano D, Ni Y, Sun Z, Schwaneberg U. A potential antitumor drug
(arginine deiminase) reengineered for efficient operation under physiological
conditions. ChemBioChem 2010, 11, 2294-2301. [inside cover page]
6. Verma R, Schwaneberg U, Roccatano D. Insight into the redox partner interaction
mechanism in cytochrome P450BM-3 using molecular dynamics simulations.
(manuscript under preparation)
7. Verma R, Schwaneberg U, Roccatano D. A molecular dynamics study of the interactions
between P450BM-3 domains and Cobalt(II)Sepulchrate as an electron transfer
mediator. (manuscript under preparation)
Abstract
In recent decades, enzymatic catalysis has emerged as a convenient and
environmentally friendly substitute for traditional chemical processes, ranging from the
synthesis of many pharmaceutical and agrochemical building blocks to fine and bulk
chemicals and, more recently, the components of biofuel. The combination of experimental
and computational methods holds particular promise in the field of enzymatic catalysis to
tailor enzymes for tasks not yet exploited by natural selection. Therefore, it is
important to develop computational tools that help to achieve this goal. The scope of this
thesis is to propose novel bioinformatics tools and to explore computational methods
aimed at supporting and guiding protein evolution experiments. The thesis is divided into two
parts. The first part of the thesis (Part I, Chapter 1 and Chapter 2) focuses on extending the
benchmarking system of random mutagenesis methods (MAP: Mutagenesis Assistant
Program) towards sequence/structure and structure/function analysis, and on evaluating
this approach on enzymes commonly used as biocatalysts. Chapter 1 offers
comprehensive information about the computational methods used to assist protein
engineering experiments. Chapter 2 describes a completely renewed and improved version
of the MAP server, named the MAP 2.0 3D server, that correlates the generated amino acid
substitution patterns with the structural information of the target protein. The
server therefore helps to identify in advance the random mutagenesis method that can introduce
mutations with less deleterious effects and improve protein fitness towards an
expected property, e.g. charged amino acid substitutions to increase the solubility of the protein in
water. The capability of the server was illustrated by in-silico screening of different
enzymes, and the predicted results were in agreement with the experimental findings.
An atomic-level understanding of the subtle intertwining among the structure,
dynamics and function of enzymes plays an important role in the rational design of new or
improved functions. The second part of the thesis (Part II, Chapters 3-6) is based on a molecular
modeling approach to gain insight into the structural and dynamic properties of the P450BM-3
(CYP102) complex in water and in the presence of cobalt(II)sepulchrate (CoSep) as an
electron transfer (ET) mediator. P450BM-3, isolated from Bacillus megaterium, is an
attractive target and model system for biochemical (it catalyzes reactions on a wide variety of
industrially attractive substrates) and biomedical (it is a bacterial model for the microsomal
P450 system) applications. The theoretical aspects of molecular dynamics (MD) simulation are
provided in Chapter 3, with an overview of the system preparation for MD simulation
and the analysis of protein conformation and dynamics in the generated trajectory. In
Chapter 4, the structural and dynamic properties of the P450BM-3 FMN (flavin
mononucleotide) domain as a holo-protein, with the cofactor in its oxidized and reduced states,
and as an apo-protein are investigated. The results illustrate the effect of the FMN cofactor and its
protonation state on the conformation and dynamics of the FMN domain, which can be
related to the ET pathway from the FMN to the HEME cofactor. The study is further extended to gain
insight into the binding modes and the structural determinants of inter-domain ET in the
HEME/FMN complex of P450BM-3. MD simulations were performed on both the FMN and
HEME domains, isolated and in their crystallographic complex, and the results are reported in
Chapter 5. The HEME/FMN complex undergoes a rearrangement that decreases the
distance between the redox centers, promoting a favorable ET rate under physiological
conditions. In Chapter 6, MD simulations of P450BM-3 domains (the isolated HEME domain and
the HEME/FMN complex) were performed in the presence of CoSep as an ET mediator. The
results illustrate the preferential binding modes of CoSep in P450BM-3 domains and the
putative ET pathways from CoSep to the iron center of the HEME cofactor, and are in agreement
with the experimental findings.
Table of Contents
Acknowledgement
Funding
List of Publications
Abstract
Chapter 1
1.1. Abstract
1.2. Background
1.3. Generated diversity and library size
1.4. Evolutionary conservation based focused library
1.5. Structure-based focused library
1.6. Mutational effects in protein
1.7. Summary and outlook
1.8. References
Chapter 2
2.1. Abstract
2.2. Introduction
2.3. Methods
2.3.1. Mutational probability and statistics
2.3.2. MAP indicators
2.3.3. Local chemical diversity and protein structure components
2.3.4. MAP 2.0 3D server description
2.3.5. MAP 2.0 3D output
2.3.6. Model proteins
2.4. Results and discussions
2.4.1. D-amino acid oxidase
2.4.2. Phytase
2.4.3. N-acetylneuraminic acid aldolase
2.5. Conclusions
2.6. References
Chapter 3
3.1. Background
3.2. Setup of the simulated systems
3.3. Equilibration procedure
3.4. Structural and dynamical analysis
3.5. Cluster analysis
3.6. Principal component analysis
3.7. References
Chapter 4
4.1. Abstract
4.1. Introduction
4.2. Methods
4.2.1. Starting coordinates
4.2.2. Molecular dynamics simulation
4.2.3. FMN binding site analysis
4.2.4. Multiple structural alignment of FMN domain
4.3. Results
4.3.1. FMN domain: structural and dynamical properties
4.3.2. Cluster analysis of FMN domain
4.3.3. FMN binding site
4.3.4. Conservation profile of FMN binding site
4.3.5. Principal component analysis of FMN domain
4.3.6. FMN cofactor: structural and dynamical properties
4.3.7. Cluster analysis of FMN cofactor
4.3.8. Principal component analysis of FMN cofactor
4.4. Discussions and conclusions
4.5. References
Supporting information
Chapter 5
5.1. Abstract
5.2. Introduction
5.3. Methods
5.3.1. Starting coordinates
5.3.2. Molecular dynamic simulations
5.3.2. Electron transfer tunneling
5.4. Results and discussion
5.4.1. Structural properties
5.4.2. Cluster analysis
5.4.3. Substrate access channel
5.4.4. ET tunneling pathways
5.4.5. Essential dynamics
5.5. Conclusions
5.6. References
Supporting information
Chapter 6
6.1. Abstract
6.2. Introduction
6.3. Methods
6.3.1. Starting coordinates
6.3.2. Molecular dynamics simulation and modeling
6.4. Results and discussion
6.4.1. CoSep binding on P450BM-3 domains
6.4.2. Effect of CoSep binding on substrate access channel
6.4.3. Effect of CoSep binding on ET tunneling
6.4.4. Effect of CoSep binding on P450BM-3 dynamics
6.5. Conclusions
6.6. References
Supporting information
Summary and outlook
Curriculum vitae
PART I: CAPDE
Chapter 1
Computer-Aided Protein Directed Evolution: a Review of
Web Servers, Databases and other Computational Tools
for Protein Engineering
1.1. Abstract
The combination of computational and directed evolution methods has proven to be a
winning strategy for protein engineering. We refer to this approach as computer-aided
protein directed evolution (CAPDE), and this chapter summarizes the recent developments
in this rapidly growing field. We restrict ourselves to an overview of the availability,
usability and limitations of web servers, databases and other computational tools proposed
in the last five years. The goal of this chapter is to provide concise information about
currently available computational resources that assist the design of directed evolution
based protein engineering experiments.
1.2. Background
Protein engineering comprises a large number of techniques applied to evolve or
design proteins with a desired function.[1] The primary objective of any protein engineering
experiment is to identify specific sequence changes and alter the protein for desired
functional properties.[1,2] Generally, two main approaches are used to design novel
proteins or enzymes: rational design and directed evolution. The first approach employs
information on the protein structure and focuses mutagenesis on modifying protein scaffolds
(e.g. the active site of the biocatalyst). For this approach, knowledge of the target amino
acid is necessary and can be provided by visual inspection or in-silico prescreening.[3] Both
cases depend on the nature of the problem and show a high success rate only for the
prediction of single or double mutations. Indeed, multiple mutations involve cooperative
effects on protein structure and function that remain almost inaccessible even to current
computational screening methods.
A more challenging approach, the de novo design or redesign of synthetic proteins or peptides, uses
solely structural information and the folding rules of proteins.[4,5] Although this method
offers the broadest possibility to design novel folds and functions, its success for large proteins
is limited.[6,7] The reasons lie in the limited number of three-dimensional protein
structures (in particular of membrane proteins) and the lack of a unifying theory of protein
folding mechanisms. Computational approaches based on microsecond to millisecond
atomistic molecular dynamics (MD) simulations [8-10] of protein folding have recently
given some encouraging successes in the ab-initio folding of peptides and small proteins. In
addition, the combined approach of quantum mechanics and molecular dynamics methods
has shown the superior capability of physics-based methods to design new enzymatic
reactions.[11] However, the application of these methods is still limited since they are
computationally very demanding.[12] In this chapter, the approaches based on
de novo design, quantum mechanics and molecular dynamics will not be covered. The
reader can refer to recent papers and reviews on these topics.[13-16]
The second approach is the so-called directed evolution. This method is one of the
most powerful approaches to improve or create new protein functions by redesigning the
protein structure.[17] It can, for example, improve the activity or stability of a biocatalyst under
unnatural conditions (e.g. the presence of organic solvent) by accumulating multiple
mutations.[17,18] Directed evolution involves multiple rounds of random mutagenesis or
gene shuffling followed by screening of the mutant library.[19] Prior knowledge
of the protein structure is not required in directed protein evolution. However, structural
information can focus and restrict the approach to specific subsets of amino acids (e.g.
active site residues). A common problem of directed evolution methods is the limited
distribution of the generated sequence diversity, which reduces the efficiency of sampling
functional sequence space.[19,20]
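One commonly cited contributor to this limited diversity can be illustrated with a short sketch. It assumes, as a simplification not stated in the text above, that a random mutagenesis method changes only one nucleotide per codon; under the standard genetic code, each codon can then reach only a subset of the 19 alternative amino acids. The function name `reachable_amino_acids` and the example codons are illustrative choices, not part of any tool discussed in this chapter.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reachable_amino_acids(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide substitution."""
    original = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            mutant_aa = CODON_TABLE[mutant]
            # Skip stop codons and silent (synonymous) changes.
            if mutant_aa not in ("*", original):
                reachable.add(mutant_aa)
    return reachable


if __name__ == "__main__":
    # Hypothetical example codons, chosen only for illustration.
    for codon in ("ATG", "TGG", "GAA"):
        aas = sorted(reachable_amino_acids(codon))
        print(codon, CODON_TABLE[codon], "->", "".join(aas), f"({len(aas)} of 19)")
```

The sketch ignores the nucleotide-specific mutational bias of real mutagenesis methods, which narrows the accessible substitutions further.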
In summary, rational design via site-directed or saturation mutagenesis and directed
evolution via random mutagenesis are used as key tools in protein engineering. In both
approaches, the sequence diversity is directly generated as point mutations, insertions or
deletions within a single parental gene. Consequently, improvements in the quality of
rationally designed libraries and in the techniques for sequence space exploration and diversity
generation are critical for future advances.
The combination of experimental and computational methods holds particular
promise to tailor proteins for tasks not yet exploited by natural selection.[21,22] In fact,
most of the computational tools or web servers for directed evolution utilize structural data,
when available, to assist the library generation process. Since it is impossible to test
more than a very small fraction of the vast number of possible protein sequences, there is a pressing need for
a directed evolution strategy that generates sequence libraries with the highest
chance of containing variants with the desired enzymatic properties. Such libraries can be designed
by applying the current knowledge of the protein response to mutations and of
sequence-structure-function relationships.
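The size of the sequence space can be made concrete with a small calculation. The sketch below counts the variants that differ from a parent protein at exactly k positions, using the standard combinatorial expression C(N, k) · 19^k for a protein of N residues. The length of 300 residues is a hypothetical value chosen only for illustration, and the function name is likewise illustrative.

```python
from math import comb, log10


def variants_with_k_substitutions(length, k):
    """Number of distinct sequences differing from the parent at exactly k positions."""
    # Choose the k mutated positions, then one of 19 alternative residues at each.
    return comb(length, k) * 19 ** k


if __name__ == "__main__":
    length = 300  # hypothetical protein length, chosen only for illustration
    for k in range(1, 5):
        count = variants_with_k_substitutions(length, k)
        print(f"{k} substitution(s): {count:.3e} variants")
    # 20**length overflows a float, so report its order of magnitude instead.
    print(f"all sequences of this length: about 10^{length * log10(20):.0f}")
```

Even at two simultaneous substitutions the count is already in the tens of millions for this hypothetical length, which is why focused libraries are attractive.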
Thermostability, solvent stability (pH and salt stability or co-solvent tolerance)<br />
and enzymatic activity (improved binding affinity and catalytic activity) are<br />
the properties most commonly targeted by protein engineering experiments. The first two<br />
are difficult to predict because their determinants are distributed over the protein structure. For<br />
enzymatic activity, mutagenesis studies indicate that most mutations<br />
affecting enzyme properties such as substrate specificity, enantioselectivity and new<br />
catalytic activities are located in or near the active site.[21] Rational design is<br />
successful in targeting relevant active site residues for site-directed mutagenesis but less<br />
PART I: CAPDE<br />
effective for important residues located in the second coordination sphere of the active site.<br />
For these cases, combining random mutagenesis with computer-aided protein<br />
directed evolution (CAPDE) approaches can provide a winning strategy. Computational<br />
methods used in conjunction with directed evolution offer the exciting promise<br />
of generating libraries with a high frequency of active and improved variants.[23]<br />
Figure 1.1: Schematic representation of the four CAPDE approaches (as the quarters of the circle): (1)<br />
generated diversity and library size (in red), (2) evolutionary conservation based focused library<br />
(in green), (3) structure-based focused library (in purple) and (4) mutational effects in protein (in<br />
cyan). The servers, tools and databases associated with the approaches are shown in boxes.<br />
In this chapter, for the sake of clarity, the CAPDE approaches are divided into<br />
four major areas, shown schematically in Figure 1.1. The first comprises tools<br />
that characterize the library generated by mutagenesis methods, mainly through<br />
statistical approaches. The second and third areas comprise tools that use<br />
evolutionary and structural information on the target protein to design a focused<br />
library. Multiple sequence or structure alignment (MSA) is the key approach used by these<br />
tools to identify variable or conserved positions in the target protein. The fourth area<br />
covers tools for predicting mutational effects on protein structure and<br />
function. These tools and web servers are based on machine learning, statistical or<br />
empirical approaches and predict the effect of mutations on protein stability and/or activity by<br />
estimating relative free energy changes.[24]<br />
This chapter is divided into four parts following the division of the CAPDE approaches. It<br />
aims to provide concise information about currently available CAPDE methods for assisting<br />
and designing directed evolution experiments, with the final goal of enhancing the probability<br />
of identifying mutants with the desired properties. In particular, the reader will find a<br />
short overview and classification of novel databases, web servers and other computational<br />
tools that can provide relevant information for the interpretation of experimental results<br />
and that have been developed in the last few years in the field of molecular modeling of protein<br />
structure. Finally, as previously mentioned, methods based on physical approaches<br />
such as QM/MM or MD simulations are not considered.<br />
1.3. Generated diversity and library size<br />
Unbiased diversity generation followed by screening of a statistically<br />
meaningful fraction of the generated sequence space are fundamental challenges in directed<br />
evolution experiments.[25] The directed evolution strategy comprises two key steps: 1)<br />
generating diverse mutant libraries and 2) screening to identify improved protein variants.<br />
The success of a directed evolution method depends on the quality of the mutant<br />
library. The challenges and advances in generating functionally diverse libraries have<br />
been reviewed in recent years.[20,26] Computational tools can assist directed evolution in<br />
these two steps by in silico analysis and screening of the protein sequence space expected<br />
to be sampled by the generated libraries (summarized in Table 1.1). The publicly available web servers<br />
MAP (Mutagenesis Assistant Program)[25,27] and PEDEL-AA[28] were developed to<br />
estimate the diversity at the protein level in libraries generated by random mutagenesis<br />
methods.<br />
Table 1.1: Summary of computational tools to analyze amino acid diversity, size and<br />
completeness of the library generated by mutagenesis methods.<br />
Approach | Name | Input | Case study examples | URL<br />
Statistics of generated diversity | MAP 2.0 3D [25,27] | Nucleotide sequence or protein structure. | Cytochrome P450BM-3,[25] D-amino acid oxidase, Phytase [27] | http://map.jacobs-university.de/submission.html<br />
 | PEDEL-AA [28] | Nucleotide sequence, mutation rate, library size, indel rate, nucleotide mutation matrix. | α-synuclein, Phosphoribosylpyrophosphate amidotransferase (purF) [29] | http://guinevere.otago.ac.nz/cgi-bin/aef/pedel-AA.pl<br />
Library size and completeness | GLUE-IT [28] | Library size and randomization techniques. | Randomization scheme: NNK, NDT, NNB, NAY [28] | http://guinevere.otago.ac.nz/cgi-bin/aef/glue-IT.pl<br />
 | TopLib [30] | Probability required by library size and randomization techniques. | Randomization scheme: NNN, NNB, NNK, MAX [30] | http://stat.haifa.ac.il/~yuval/toplib/<br />
Figure 1.2: a) The MAP 2.0 3D analysis of the amino acid diversity generated by the balanced epPCR<br />
(Taq (MnCl2, G=A=C=T)) method. The Y-axis shows the original amino acid species and the X-axis shows<br />
the amino acid substitution patterns, colored from red (lowest probability) to blue (highest<br />
probability). The MAP 2.0 3D analysis is restricted to the active site residues (Ala11, Ser47, Thr48,<br />
Tyr137, Ile139, Lys165, Thr167, Gly189, Tyr190). For this analysis, the amino acids are grouped<br />
into four classes according to their chemical nature (charged, neutral, aromatic and aliphatic), with<br />
stop codon (structure disrupting) and glycine/proline (helix destabilizing) as separate classes. The<br />
probabilities of amino acid substitutions were mapped on the protein sequence and structure (PDB<br />
Id: 1NAL) of N-acetylneuraminic acid lyase and are represented in b and c, respectively. b) The Jmol [33]<br />
applet is used to visualize the amino acid substitution patterns using the RWB (red-white-blue)<br />
color gradient scheme, with active site residues shown as sticks. The Y-axis shows sequence id, PDB id, amino<br />
acid name and, in c), secondary structure elements (T: hydrogen bonded turn and bend, *: loop or<br />
irregular structure), d) normalized Cα B-factor to differentiate flexible (F) and rigid (R) residues,<br />
and e) relative solvent accessibility to identify exposed (E) or buried (B) residues.<br />
MAP [25] takes a nucleotide sequence as input and helps to design a better directed<br />
evolution strategy by providing a statistical analysis of random mutagenesis methods at the<br />
protein level. The capabilities of MAP were extended in the MAP 2.0 3D [27] server, which predicts<br />
the residue mutability resulting from the mutational bias of random mutagenesis methods and<br />
correlates the generated amino acid substitution patterns with the structural information<br />
of the target protein. In this way, the server makes it possible to analyze the<br />
consequences of the mutational preferences of random mutagenesis methods<br />
at the protein level and their effects on protein structure.[25] The capability of the server was<br />
illustrated by the in silico screening of different enzymes, and the predicted results were in<br />
agreement with the experimental results.[27,31,32] Figure 1.2 shows an example of the<br />
MAP 2.0 3D output for the active site residues of N-acetylneuraminic acid lyase using an epPCR<br />
method.[27]<br />
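The link between nucleotide-level mutagenesis and protein-level diversity can be illustrated with a small Python sketch.<br />
It enumerates the nine single-nucleotide substitutions of a codon and classifies each outcome using the standard genetic code.<br />
This is not the MAP algorithm: it assumes that all substitutions are equally likely, whereas MAP weights them by the<br />
mutational bias of each method. The function and variable names are hypothetical.<br />

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_substitution_spectrum(codon):
    """Classify the nine single-nucleotide substitutions of a codon.

    Returns (counts, reachable): counts holds the number of silent,
    missense and stop outcomes; reachable is the set of new amino acids.
    """
    original = CODON_TABLE[codon]
    counts = {"silent": 0, "missense": 0, "stop": 0}
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid == original:
                counts["silent"] += 1
            elif amino_acid == "*":
                counts["stop"] += 1
            else:
                counts["missense"] += 1
                reachable.add(amino_acid)
    return counts, reachable


if __name__ == "__main__":
    print(single_substitution_spectrum("TGG"))  # tryptophan codon
```

Because a single nucleotide exchange reaches only a subset of the 19 alternative amino acids, the amino acid diversity of an epPCR library is restricted even before any method-specific bias is considered.<br />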
PEDEL-AA returns amino acid level statistics for libraries generated by<br />
epPCR, given the nucleotide sequence together with the library size, mutation rate,<br />
indel rate and nucleotide mutation matrix.[28] CodonCalculator and AA-Calculator are two<br />
algorithms developed by Patrick et al. to select an appropriate randomization scheme for<br />
library construction.[28] Two servers, GLUE-IT and GLUE, estimate amino acid diversity and<br />
completeness in the generated library. Finally, the TopLib [30] web server assists the design of<br />
saturation mutagenesis experiments by predicting the size or completeness of the generated<br />
library for a user-defined codon randomization scheme using a probabilistic approach.<br />
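As a rough illustration of how randomization schemes differ, the Python sketch below expands a degenerate codon (for example NNK or NDT)<br />
into its codons, counts the encoded amino acids and stop codons, and estimates the number of clones needed for a given expected coverage.<br />
The coverage estimate uses the simple Poisson approximation T = -V ln(1 - F) for V equiprobable variants, which is an assumption of this<br />
sketch and not necessarily the model implemented in GLUE-IT or TopLib. All function names are hypothetical.<br />

```python
from itertools import product
from math import ceil, log

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Subset of the IUPAC nucleotide codes needed for the schemes in Table 1.1.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "K": "GT", "B": "CGT", "D": "AGT", "Y": "CT",
}


def expand(scheme):
    """List all codons encoded by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[symbol] for symbol in scheme))]


def scheme_summary(scheme):
    """Count codons, distinct amino acids and stop codons for a scheme."""
    translated = [CODON_TABLE[codon] for codon in expand(scheme)]
    return {
        "codons": len(translated),
        "amino_acids": len(set(translated) - {"*"}),
        "stop_codons": translated.count("*"),
    }


def clones_for_coverage(scheme, positions, coverage=0.95):
    """Clones needed so that the expected fraction of variants seen is `coverage`.

    Assumes equiprobable codon variants and the approximation T = -V ln(1 - F).
    """
    variants = len(expand(scheme)) ** positions
    return ceil(-variants * log(1.0 - coverage))


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNB", "NDT", "NAY"):
        print(scheme, scheme_summary(scheme), clones_for_coverage(scheme, positions=2))
```

The comparison shows the usual trade-off: NDT removes stop codons and reduces the screening effort, at the cost of encoding only 12 amino acids.<br />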
1.4. Evolutionary conservation based focused library<br />
Multiple sequence or structure alignment (MSA) is the most common approach to<br />
identify functionally significant or evolutionarily variable regions in a protein.[34] In CAPDE,<br />
several servers and databases use MSA together with the physical and structural information of a<br />
protein or protein superfamily. Table 1.2 lists the tools considered in this<br />
chapter. The ConSurf 2010 [35] server provides evolutionary conservation profiles of a<br />
protein or nucleic acid sequence or structure by first identifying the conserved positions<br />
using MSA and then calculating the evolutionary conservation rate using empirical<br />
Bayesian inference. The ConSurf-DB [36] database makes available the evolutionary<br />
conservation profiles of known protein structures pre-calculated by the ConSurf web<br />
server. The 3DM [37] server performs structure-based multiple sequence alignments<br />
of the members of a protein superfamily and provides the consensus data combined with<br />
other useful information about amino acid positions in the protein, such as interactions and<br />
solvent accessibility, together with published mutation data.<br />
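The idea of scoring positions in an MSA as conserved or variable can be sketched with a per-column Shannon entropy.<br />
This is a deliberately simpler measure than the empirical Bayesian rate estimation used by ConSurf, and it ignores phylogeny<br />
and sequence weighting. The toy alignment and the function names are assumptions made for illustration.<br />

```python
from collections import Counter
from math import log2


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    Returns None when the column contains only gaps. Low values indicate
    conserved positions, high values indicate variable positions.
    """
    residues = [residue for residue in column if residue != gap]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy for every column of an alignment given as equal-length strings."""
    if len({len(sequence) for sequence in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(column) for column in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "AGDE", "AGDF"]  # hypothetical sequences
    print(conservation_profile(toy_alignment))
```

Positions with an entropy near zero would be candidates to leave untouched, while high-entropy positions are more likely to tolerate substitutions in a focused library.<br />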
For a more focused analysis of protein hotspots or amino acid patches, three<br />
tools are available as standalone programs or web servers. The Joint<br />
Evolutionary Tree (JET) method is tuned to identify conserved amino acid<br />
patches on protein interfaces by taking into account the physicochemical properties and<br />
evolutionary conservation of the surface residues.[38] The predicted protein interaction<br />
sites or core residues can be used in site-specific mutagenesis experiments. The HotSprint<br />
[39] database provides information on hotspots in protein interfaces using the sequence<br />
conservation score of the residues (calculated by the Rate4Site algorithm [40]) and their<br />
solvent accessible surface area. HotSpot Wizard predicts the suitability for mutagenesis<br />
of amino acids in or near the active site using their evolutionary conservation<br />
information.[41] The server takes a protein structure as input and provides a platform for<br />
experimentalists to select target amino acids for site-directed mutagenesis to improve<br />
enzymatic properties such as substrate specificity, activity and enantioselectivity.[41]<br />
MAP 2.0 3D [27] (Table 1.1, see previous section) also provides information on<br />
mutagenic hotspots generated by the mutational preferences of random<br />
mutagenesis methods, together with sequence and structural information of the protein. The Selecton [42]<br />
web server predicts the selective forces at each amino acid position in a protein. The server<br />
performs a codon-based alignment on a set of homologous nucleotide sequences and<br />
uses the ratio of amino acid altering to silent substitutions (Ka/Ks) to estimate both the<br />
positive (>1) and purifying (
Approach | Name | Description | Case study examples | URL<br />
 | ConSurf 2010 [35] | The web server performs MSA and calculates evolutionary conservation rate to identify conserved positions in protein or nucleotide sequence/structure. | GAL4 transcription factor [35] | http://consurf.tau.ac.il/<br />
 | ConSurf DB [36] | The database provides the predicted results of ConSurf [35] server for known protein structures. | Cytochrome c [36] | http://consurfdb.tau.ac.il/index.php<br />
Hotspot identification | JET [38] | The Evolutionary trace based method performs MSA on a set of homologous sequences (from PSI-BLAST) after Gibbs like sampling. The aligned homologous sequences are used to construct distance tree based on Neighbor Joining algorithm. The clustering method is parameterized to identify protein interface or core residues by taking into account the physical-chemical properties and evolutionary conservation. | DNA polymerase I, DNA transferase, allophycocyanin, Leucine dehydrogenase, β-trypsin proteinase, phosphotransferase, human CDC42 gene regulation protein, oncogene protein, signal transduction protein etc [38] | http://www.ihes.fr/~carbone/data6/legenda.htm<br />
 | HotSprint Database [39] | The database provides information about hotspots in protein interface using conservation rate and solvent accessibility of the residues. | Numb PTB domain [39] | http://prism.ccbb.ku.edu.tr/hotsprint/<br />
 | HotSpot Wizard [41] | The web server predicts residue mutability of functionally important residues and visualizes it on protein sequence and structure. | Haloalkane dehalogenase, Phosphotriesterase, 1,3-1,4-β-D-Glucan 4-glucanohydrolase, β-Lactamase [41] | http://loschmidt.chemi.muni.cz/hotspotwizard/<br />
 | Selecton [42] | The web server detects selection forces on biologically significant sites in the target protein during evolutionary process. | TRIM5α protein [42] | http://selecton.tau.ac.il/index.html<br />
Protein superfamily based MSA | 3DM [37] | The database performs structure based MSA for a protein superfamily with sequence, structural, molecular interaction and mutational information from literature. | α/β hydrolase fold [53] | http://3dmcsis.systemsbiology.nl/<br />
 | The Lipase Engineering Database [43,54,55] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Lipases [43,54,55] | http://www.led.uni-stuttgart.de/<br />
 | The database of epoxide hydrolases and haloalkane dehalogenase [56] | | Epoxide hydrolases and haloalkane dehalogenase [56] | http://www.led.uni-stuttgart.de/<br />
 | The Laccase Engineering database [45] | | Laccases [45] | http://www.lcced.uni-stuttgart.de/<br />
 | The Cytochrome P450 engineering database [57] | | Cytochrome P450s [57] | http://www.cyped.uni-stuttgart.de/<br />
 | The PHA Depolymerase Engineering Database [44] | | Polyhydroxyalkanoates depolymerase [44] | http://www.ded.uni-stuttgart.de/<br />
 | The Lactamase Engineering database [46] | | Lactamases [46] | http://www.laced.uni-stuttgart.de/<br />
 | SHV Lactamase Engineering Database [47] | | SHV lactamases [47] | http://www.laced.uni-stuttgart.de/classA/SHVED/<br />
Literature based protein mutant data | PMD [48] | The database provides literature based protein mutant information with structure and functional annotation. | | http://pmd.ddbj.nig.ac.jp/~pmd/pmd.html<br />
 | ProTherm [49-51] | The database provides literature based protein mutant information with thermodynamic parameters and experimental conditions integrated with sequence, structure and function annotation. | | http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html<br />
 | MuteinDB [52] | The database provides literature based protein mutant information, kinetic parameters and experimental conditions integrated with user-friendly and flexible query system to fetch data using reaction name or substrate or inhibitor name or structure and mutations. | Cytochrome P450s [52] | https://muteindb.genome.tugraz.at/muteindb-web-2.0/faces/init/index.seam<br />
1.5. Structure-based focused library<br />
The structure-based approaches assist rational design and random mutagenesis by<br />
predicting regions in the protein responsible for stability and activity.[2,58]<br />
Computational tools such as 3DLigandSite [59], ProBiS [60,61] (Protein Binding Site) and<br />
SiteComp [62] predict ligand binding sites in proteins [63]. In the absence of a<br />
crystal structure, all these tools can use a homology model of the target protein, and they aid the design and tuning<br />
of the ligand binding site by identifying key residues for activity and their molecular interaction<br />
properties. 3DLigandSite [59] performs alignment and clustering of homologous<br />
structures to predict the ligand binding site. ProBiS [60,61] uses MSA to detect structurally<br />
similar binding sites in proteins and also performs local structural pairwise alignment to<br />
identify functionally relevant binding regions. The pre-calculated results of ProBiS analysis<br />
are available via the ProBiS-database [64], a repository of structurally similar binding sites.<br />
SiteComp [62] characterizes protein binding sites using descriptors based on molecular<br />
interaction fields. The server evaluates differences between similar binding sites, identifies subsites<br />
and estimates residue contributions to ligand binding. TRITON [65,66] provides a single<br />
platform for protein engineers to model mutants, perform protein-ligand docking and<br />
calculate reaction pathways. In this way, these methods facilitate the study of the properties of<br />
protein-ligand complexes.<br />
Knowledge of the molecular interactions that contribute to the relevant free energy barrier,<br />
and the design of the surface charge distribution, can help to understand the molecular basis of<br />
kinetic stability and to enhance protein stability efficiently.[58,67] The PIC<br />
(Protein Interaction Calculator) server [68] calculates inter- or intra-protein interactions<br />
using published criteria, integrated with solvent accessibility and residue depth<br />
calculations. The recently introduced web server COCOMAPS (bioCOmplexes COntact MAPS)<br />
[69] uses intermolecular interactions to analyze interfaces in biological complexes. The<br />
identification of exposed and buried amino acids also helps to gain insight into protein<br />
stability and to explore the effect of mutations on the protein. DEPTH [70] employs the distance<br />
between residues and bulk solvent to predict protein stability, conservation or<br />
binding cavities, based on residue depth and solvent accessibility. SRide<br />
[71] estimates residue contributions to protein stability using the interactions, evolutionary<br />
conservation and hydrophobicity of neighboring residues. Patch Finder Plus [72]<br />
identifies residues that contribute to positively charged patches on the protein surface and<br />
might interact with DNA, membranes or other proteins. ConPlex [73] uses the protein<br />
solvent accessible surface area to identify surface or interface residues and assigns residue-specific<br />
conservation scores on the sequence and structure of the protein complex. The server<br />
also provides pre-calculated ConPlex results for known protein complexes as a repository.<br />
Recent studies have suggested that protein flexibility and protein function are<br />
strongly linked.[24,74,75] Protein flexibility plays an important role in both catalytic<br />
activity and molecular recognition processes. Flexibility is particularly<br />
relevant in proteins from extremophiles, which must balance the rigidity required for stability with the<br />
flexibility necessary for activity [76-78]. In addition, numerous proteins have regions that<br />
adopt different conformations under different conditions, allowing them to take part in<br />
cellular and molecular regulation.[24,79] Residue flexibility has been used<br />
to describe a variety of protein properties, including thermal<br />
stability, catalytic activity, ligand binding (induced fit), domain motion, preferential<br />
solvation and molecular recognition in intrinsically disordered protein systems. The Debye–<br />
Waller factor (B-factor), reported in crystallographic atomic resolution structures, provides a rough<br />
estimate of local residue flexibility [80], and different servers provide this information as<br />
an indicator (for example, the MAP 2.0 3D server [27]). If a crystallographic structure is not<br />
available, different tools can be used to estimate flexibility profiles using different<br />
approaches.<br />
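A minimal Python sketch of how crystallographic B-factors can be turned into a per-residue flexibility indicator is given below.<br />
It standardizes Cα B-factors as z-scores and labels residues as flexible (F) or rigid (R). The z-score normalization and the<br />
threshold of zero are assumptions of this sketch. The exact normalization used by any particular server is not specified here,<br />
and the input values are invented for illustration.<br />

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Return z-scores of the given C-alpha B-factors."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        raise ValueError("B-factors are all identical; z-scores are undefined")
    return [(b - mu) / sigma for b in b_factors]


def classify_flexibility(b_factors, threshold=0.0):
    """Label each residue 'F' (flexible) or 'R' (rigid).

    A residue is called flexible when its normalized B-factor exceeds
    `threshold`; the default of 0.0 is an illustrative choice.
    """
    return ["F" if z > threshold else "R" for z in normalize_b_factors(b_factors)]


if __name__ == "__main__":
    example = [12.5, 14.0, 35.2, 40.1, 13.3]  # hypothetical B-factors in square angstroms
    print(classify_flexibility(example))
```

Normalizing within a structure makes the labels comparable between crystal structures that differ in resolution and overall B-factor scale.<br />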
The RosettaBackrub [81] server can generate protein backbone structural variability<br />
as a consequence of amino acid variations [82], which can be used to design sequence libraries<br />
for experimental screening and to predict protein or peptide interaction specificity. The<br />
server generates Rosetta-scored modeled structures for variants with single or multiple<br />
point mutations in monomeric proteins. It also generates near-native structural ensembles<br />
of protein backbone conformations and sequences consistent with those ensembles.<br />
Finally, it can predict sequences tolerated by proteins or protein interfaces using flexible<br />
backbone design methods. The tCONCOORD [83] method generates conformational<br />
ensembles to gain insight into the conformational flexibility and conformational space of the<br />
protein.<br />
FlexPred [84] predicts residue flexibility using a pattern recognition<br />
approach that identifies residue positions in conformational switches, with their<br />
evolutionary conservation and normalized solvent accessibility (if a structure is available) as<br />
the Support Vector Machine (SVM) predictors.<br />
Different simplified methods have been proposed to identify local flexibility or large-scale<br />
motions in proteins at the coarse-grained level.[85-87] Many of these methods are based<br />
on the Gaussian network model (GNM) [88] or its extension, the anisotropic network model<br />
(ANM) [89], and study protein dynamics using Normal Mode Analysis (NMA) (see the review<br />
[90] for a general overview of these topics). Table 1.3 shows the tools available to<br />
analyze conformational flexibility on protein structure (for more details see [91]). The ElNemo<br />
[92] and WEBnm@ [93] servers are reported here to complete the information about NMA-based<br />
tools. Both servers perform NMA using a coarse-grained model to analyze the<br />
conformational changes in proteins. The FlexServ [94] server estimates protein flexibility using<br />
three different coarse-grained approaches: 1) discrete molecular dynamics (DMD), 2)<br />
normal mode analysis (NMA) and 3) Brownian dynamics (BD). The server characterizes<br />
protein flexibility by analyzing different structural and dynamic properties of the protein,<br />
such as structural variations, essential modes, stiffness between interacting residues,<br />
dynamic domains and hinge points. Different tools are available to identify hinge<br />
bending residues in large-scale protein motions. The HINGEprot [95] server predicts hinge<br />
motions in proteins using the coarse-grained GNM and ANM models. DynDom [96] uses a rigorous<br />
approach to describe domain motion. The method determines hinge axes and hinge<br />
bending residues using two conformations of the protein. A recent addition to DynDom is<br />
the database of ligand-induced domain movements in enzymes.[97] Furthermore, the<br />
DynDom3D [98] server provides a more advanced and generic tool that can be used to<br />
study any kind of polymer.<br />
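As an illustration of the coarse-grained network models mentioned above, the Python sketch below builds the connectivity (Kirchhoff) matrix<br />
that underlies the Gaussian network model from Cα coordinates. Residue pairs closer than a cutoff are treated as connected by identical springs.<br />
The 7 Å cutoff and the toy coordinates are assumptions of this sketch. A full GNM analysis would additionally require<br />
the pseudo-inverse or eigen-decomposition of this matrix, which is not shown here.<br />

```python
from math import dist


def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Build the GNM connectivity matrix from C-alpha coordinates.

    Off-diagonal entries are -1 for residue pairs within `cutoff`
    (in angstroms) and 0 otherwise; each diagonal entry is the number
    of contacts of that residue.
    """
    n = len(ca_coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(ca_coords[i], ca_coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


if __name__ == "__main__":
    # Hypothetical coordinates: three residues on a line, 4 angstroms apart.
    toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
    for row in kirchhoff_matrix(toy_coords):
        print(row)
```

The diagonal already gives a crude picture: residues with few contacts (small diagonal entries) tend to be the more mobile ones in network models.<br />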
The reader should note that the connection between protein flexibility and<br />
function has been investigated theoretically and experimentally only in the last few years.<br />
[87,99-101] Methods based on this approach provide a qualitative estimate of<br />
protein dynamical properties, but they do not take into account many effects (such as direct<br />
solvent effects) that are important for protein function. At present, atomistic<br />
simulation (MD or QM/MM) is the best approach to study protein flexibility<br />
and dynamics quantitatively.[8,87,99] Nevertheless, even at this level of accuracy, the connection<br />
between flexibility and functionality is still puzzling. In addition, simulation approaches<br />
are still time consuming and impractical for high-throughput modeling and analysis of<br />
protein structural dynamics.<br />
Table 1.3: Summary of the computational tools for structure-based focused library<br />
generation.<br />
Approach | Name | Description | Case study examples | URL<br />
Ligand binding site | 3DLigandSite [59] | The web server identifies ligand binding site via MSA and clustering algorithm. | Target T0483 in CASP8 | http://www.sbg.bio.ic.ac.uk/~3dligandsite/<br />
 | ProBiS [60,61] | The web server detects binding site using MSA and characterizes it using local structural pairwise alignment. | Biotin carboxylase, TATA binding protein [60], D-alanine–D-alanine ligase, Protein kinases C [61] | http://probis.cmm.ki.si/<br />
 | ProBiS-database [64] | The database provides structurally similar protein binding site using ProBiS algorithm. | Cytochrome c [64] | http://probis.cmm.ki.si/?what=database<br />
 | SiteComp [62] | The web server characterizes ligand binding site using molecular interaction descriptors. | Cyclooxygenase, adenylate kinase [62] | http://scbx.mssm.edu/sitecomp/sitecomp-web/Input.html<br />
 | TRITON [65,66] | The method facilitates to model mutant, dock ligand in the protein and calculates reaction pathways for the characterization of protein-ligand interactions using semi-empirical quantum-mechanics approach. | PA-IIL lectin and its mutants [65] | http://www.ncbr.muni.cz/triton/description.html<br />
Protein interaction | PIC [68] | The web server calculates the molecular interactions using published criteria. | - | http://pic.mbu.iisc.ernet.in/job.html<br />
 | COCOMAPS [69] | The web server analyzes and visualizes interfaces in biological complexes using intermolecular contact maps based on distance or physicochemical properties. | Hen egg lysozyme interaction with two antibodies [69] | https://www.molnac.unisa.it/BioTools/cocomaps/<br />
Residue depth and stability | DEPTH [70] | The web server predicts binding cavity and mutational effect on protein stability using residue depth and solvent accessible surface area. | West Nile Virus NS2B/NS3 protease [70] | http://mspc.bii.a-star.edu.sg/tankp/intro.html<br />
 | SRide [71] | The web server predicts the contribution of residues in protein stability using interactions with its spatial neighbors and their evolutionary conservation. | TIM-barrel proteins [102] | http://sride.enzim.hu/<br />
Protein surface and interface | Patch finder plus [72] | The web server identifies large positively charged electrostatic patches on protein surface using Poisson Boltzmann electrostatic potential. | DNA binding domain of TATA binding protein [72] | http://pfp.technion.ac.il/<br />
The web server<br />
performs<br />
Rho–<br />
ConPlex [73]<br />
evolutionary<br />
RhoGAP<br />
http://sbi.postech.ac.<br />
conservation<br />
complex<br />
kr/ConPlex/<br />
analysis <strong>of</strong> the<br />
[73]<br />
protein complex.<br />
The web server<br />
performs flexible<br />
https://kortemmelab<br />
Protein<br />
flexibility<br />
RosettaBackru<br />
b [81]<br />
backbone modeling<br />
using Backrub [103]<br />
method to design<br />
hGH-hGHr<br />
interface<br />
[104]<br />
.ucsf.edu/backrub/cg<br />
i-<br />
bin/rosettaweb.py?q<br />
tolerated protein<br />
uery=index<br />
sequences.<br />
tCONCOORD<br />
The method<br />
Osmoprotec<br />
http://wwwuser.gw<br />
[83]<br />
generates<br />
tion protein<br />
dg.de/~dseelig/tcon<br />
21
PART I: CAPDE<br />
FlexPred [84]<br />
ElNemo [92]<br />
WEBnm@<br />
[93]<br />
FlexServ [94]<br />
HINGEprot<br />
[95]<br />
conformation<br />
ensemble <strong>and</strong><br />
transitions using<br />
geometrical<br />
constrains based<br />
prediction <strong>of</strong><br />
protein<br />
conformational<br />
flexibility<br />
The web server<br />
predicts residue<br />
flexibility in the<br />
protein using SVM<br />
approach.<br />
The web server<br />
predicts large<br />
amplitude motions<br />
in the protein using<br />
NMA.<br />
The web server<br />
determines <strong>and</strong><br />
analyzes protein<br />
flexibility using<br />
coarse-grained<br />
modeling approach.<br />
The web server<br />
detects hinge region<br />
[83] coord.html<br />
Human PrP http://flexpred.rit.al<br />
[105] bany.edu/<br />
HIV-1<br />
protease, E. http://igs-<br />
coli server.cnrs-<br />
membrane<br />
mrs.fr/elnemo/index<br />
channel .html<br />
protein TolC<br />
Calcium http://apps.cbu.uib.n<br />
ATPase [93] o/webnma/home<br />
http://mmb.pcb.ub.e<br />
-<br />
s/FlexServ/input.ph<br />
p<br />
Calmodulin http://www.prc.bou<br />
protein, n.edu.tr/appserv/prc<br />
22
PART I: CAPDE<br />
in the protein using<br />
hemoglobin<br />
/hingeprot/<br />
both GNM <strong>and</strong> ANM.<br />
[95]<br />
The web server<br />
predicts domain<br />
Hemoglobin,<br />
DynDom3D<br />
motions using<br />
70S<br />
http://fizz.cmp.uea.a<br />
[98]<br />
conformational<br />
ribosome<br />
c.uk/dyndom/3D/<br />
changes in the<br />
[98]<br />
protein.<br />
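Several entries in Table 1.3 rest on simple geometric criteria. As an illustration of the<br />
distance-based intermolecular contact map listed for COCOMAPS, the following Python sketch<br />
reports which residue pairs of two chains lie within a cutoff. It is not the COCOMAPS<br />
implementation: the function name, the 8 Å default cutoff and the toy coordinates are<br />
assumptions made only for this example.<br />

```python
from math import dist


def intermolecular_contacts(chain_a, chain_b, cutoff=8.0):
    """Return (i, j) index pairs whose points are at most `cutoff` apart.

    chain_a, chain_b: lists of (x, y, z) tuples, one representative point
    per residue (for example the C-alpha atom). The cutoff value is an
    assumption of this sketch, not a published criterion.
    """
    contacts = []
    for i, point_a in enumerate(chain_a):
        for j, point_b in enumerate(chain_b):
            if dist(point_a, point_b) <= cutoff:
                contacts.append((i, j))
    return contacts


# Toy coordinates (hypothetical, not taken from any structure).
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(3.0, 4.0, 0.0), (30.0, 0.0, 0.0)]
contacts = intermolecular_contacts(chain_a, chain_b)
```

With real data the coordinate lists would be read from a structure file, and the resulting<br />
pairs could be drawn as a two-dimensional map with one axis per chain.<br />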
1.6. Mutational effects in protein<br />
For biotechnological applications, enhancing the thermal stability or tolerance of a protein is<br />
a commonly requested task in protein engineering.[106] A highly stable structure correlates<br />
with a well-packed, highly compact fold and shows increased tolerance to mutation, because<br />
most mutations are deleterious, i.e. related to instability of the protein.[107] The effect of a<br />
mutation on a protein is generally calculated as the free energy difference between two states<br />
of the protein; for thermodynamic stability, this is the change in the free energy difference<br />
between the folded and unfolded states (ΔΔG). The mutational effect has been predicted either<br />
with machine learning and selection methods (such as SVM, Decision Tree (DT) or Random<br />
Forest (RF) [108]) for classification or regression of data, or with statistical or empirical<br />
methods that take into account atomic interactions or structural properties such as solvent<br />
accessibility. Most of the servers based on these approaches use available information on<br />
mutational effects (fetched from databases such as PMD [48] and ProTherm [51]) to predict the<br />
effect of new substitutions. Table 1.4 summarizes the available tools to predict mutational<br />
effects on protein stability and activity using different methods. I-Mutant2.0 [109] and MUpro<br />
[110] are SVM based methods that predict stabilizing or destabilizing amino acid substitutions<br />
based on the free energy change (ΔΔG). The iPTREE-STAB [111] server employs a DT approach to<br />
predict the effect of a single mutation on protein stability, considering the physicochemical<br />
properties and contact information of the substituted amino acid with its neighboring amino<br />
acids. The WET-STAB [112] server performs a similar prediction with an additional feature to<br />
predict protein stability changes upon double mutations from the amino acid sequence. ProMAYA<br />
[113] uses the RF machine learning algorithm to predict protein stability based on the free<br />
energy difference. MuD (Mutation detector) uses the same algorithm to classify amino acid<br />
substitutions as neutral or deleterious by taking into account structure- and sequence-based<br />
features such as solvent accessibility, binding site and sequence identity.[114] SDM (Site<br />
Directed Mutator) [115] and PopMuSic2.1 [116] are statistically derived force field potential<br />
based methods for protein stability prediction using relative free energy differences. In<br />
PopMuSic2.1 [116], however, the parameters of the statistically derived force field potential<br />
depend on protein solvent accessibility. The FoldX plugin [117] and the PEAT-SA [118] program<br />
suite use an empirical force field to calculate, from three-dimensional protein or peptide<br />
structures, the relative free energy difference determined by the changes of interactions in<br />
the mutated structures. CUPSAT [119] estimates the effect of mutations on protein stability<br />
using protein environment specific mean force potentials. The potentials are derived from<br />
statistical analysis of protein structure data sets. AUTO-MUTE [120,121] provides either energy<br />
based or machine learning methods for the prediction of protein stability, given the protein<br />
structure, the mutation and the experimental condition. The SIFT (Sorts Intolerant From<br />
Tolerant) [122] server helps to explore the effect of mutations on protein function using a<br />
sequence homology approach. The multiple alignment information is used to identify tolerated<br />
and deleterious substitutions in the query sequence.<br />
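The ΔΔG-based labelling of substitutions described above can be sketched in a few lines of<br />
Python. This is a minimal illustration, not the scheme of any server named in the text: the<br />
sign convention (ΔΔG taken as the unfolding free energy of the mutant minus that of the wild<br />
type, so that positive values mean stabilization), the 0.5 kcal/mol neutral band and both<br />
function names are assumptions. Individual tools differ in their sign convention, so it should<br />
be checked before results are compared.<br />

```python
def ddg_from_unfolding(dg_wild_type, dg_mutant):
    """ddG under the assumed convention: dG_unfolding(mutant) - dG_unfolding(wild type)."""
    return dg_mutant - dg_wild_type


def classify_ddg(ddg_kcal_mol, neutral_band=0.5):
    """Label a substitution from its ddG in kcal/mol.

    Values inside +/- neutral_band are reported as neutral; the width of
    the band is an illustrative assumption.
    """
    if ddg_kcal_mol > neutral_band:
        return "stabilizing"
    if ddg_kcal_mol < -neutral_band:
        return "destabilizing"
    return "neutral"


# Hypothetical unfolding free energies in kcal/mol.
label_up = classify_ddg(ddg_from_unfolding(5.0, 6.2))
label_down = classify_ddg(ddg_from_unfolding(5.0, 3.0))
```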
A quantitative in-silico screening of virtual libraries based on the cooperative effect of<br />
multiple mutations on stability and functionality is still out of reach. However, the current<br />
methods give a qualitative indication of possible mutation sites that can increase the chance<br />
of obtaining a higher population of stable and functionally active variants in the library. The<br />
available knowledge of mutational effects on proteins provided by all these CAPDE approaches<br />
helps to limit library size and to focus on generating unpredictable substitutions that may<br />
lead to large effects. Libraries based on in-silico screening generally show a higher success<br />
rate when the starting protein has sufficient stability.<br />
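Why limiting library size matters can be made concrete with a short calculation. The sketch<br />
below assumes full randomization of each selected position and equally probable variants,<br />
which is a simplification; the function names and the 95% coverage target are illustrative<br />
choices and do not come from the tools reviewed here.<br />

```python
from math import ceil, log


def library_size(n_positions, alphabet=20):
    """Number of distinct variants when each position takes `alphabet` states."""
    return alphabet ** n_positions


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so that a given variant is sampled with probability `coverage`.

    Assumes every variant is equally likely, i.e. T = ln(1 - P) / ln(1 - 1/V).
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    if n_variants <= 1:
        return 1
    return ceil(log(1.0 - coverage) / log(1.0 - 1.0 / n_variants))


# Three fully randomized positions at the amino acid level.
size_three_sites = library_size(3)
# Screening effort for one position under the equal-probability assumption.
clones_one_site = clones_for_coverage(library_size(1))
```

The exponential growth of `library_size` with the number of positions is the reason why<br />
restricting randomization to a few well-chosen sites keeps the screening effort tractable.<br />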
Table 1.4: Summary of the computational tools to analyze the mutational effect on protein stability and activity.<br />
Approach | Name | Description | URL<br />
SVM | I-Mutant2.0 [109] | The web server predicts protein stability change upon point mutation. | http://folding.uib.es/imutant/i-mutant2.0.html<br />
SVM | MUpro [110] | The web server predicts protein stability change upon point mutation. | http://mupro.proteomics.ics.uci.edu/<br />
Decision tree (DT) | iPTREE-STAB [111] | The web server predicts protein stability change with residues information. | http://210.60.98.19/IPTREEr/iptree.htm<br />
Decision tree (DT) | WET-STAB [112] | The web server predicts protein stability change upon double mutation with residue information. | http://210.60.98.19/WETr/wet.htm<br />
Random forests (RF) | ProMAYA [113] | The web server predicts mutational effect on protein function. | http://bental.tau.ac.il/ProMaya/<br />
Random forests (RF) | MuD [114] | The web server predicts mutational effect on protein function. | http://mud.tau.ac.il/<br />
Statistical potential based method | SDM [115] | The web server predicts mutational effect on protein stability. | http://mordred.bioc.cam.ac.uk/sdm/sdm.php<br />
Statistical potential based method | PopMuSic2.1 [116] | The web server predicts thermodynamic stability change upon mutation. | http://babylone.ulb.ac.be/popmusic/<br />
Empirical force field | FoldX [117] | The plugin predicts mutational effect on protein and facilitates in-silico alanine screening, mutant homology modeling and interaction energy calculation. | http://foldx.crg.es/<br />
Empirical force field | PEAT-SA [118] | The program suite predicts mutational effect on protein stability, ligand affinity and pKa values. | http://enzyme.ucd.ie/PEATSA/Pages/FrontPage.php<br />
Empirical force field | CUPSAT [119] | The web server predicts mutational effect on protein stability. | http://cupsat.tu-bs.de/<br />
RF, SVM, Tree and SVM regression | AUTO-MUTE [121] | The web server predicts mutational effect on protein stability and activity (up to 19 mutations). | http://proteins.gmu.edu/automute/<br />
Evolutionary conservation | SIFT [122] | The web server predicts mutational effect on protein function. | http://sift.jcvi.org/<br />
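The evolutionary conservation approach in the last row of Table 1.4 uses a multiple alignment<br />
to separate tolerated from deleterious substitutions. The following Python sketch shows the<br />
general idea on a single alignment column. It is a deliberately simplified illustration and<br />
not the SIFT algorithm: the observed-frequency criterion, the 0.2 threshold used in the<br />
example, the function name and the column itself are all assumptions.<br />

```python
from collections import Counter


def tolerated_residues(column, min_fraction=0.05):
    """Residues seen in an alignment column with frequency >= min_fraction.

    column: string with one character per aligned sequence; '-' marks a gap
    and is ignored. Returns an empty set if the column holds only gaps.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return set()
    counts = Counter(residues)
    total = len(residues)
    return {res for res, n in counts.items() if n / total >= min_fraction}


# Hypothetical column from an alignment of ten homologous sequences.
column = "LLLLIIVL-M"
tolerated = tolerated_residues(column, min_fraction=0.2)
```

Under this toy criterion, a substitution to a residue outside the returned set would be<br />
flagged as potentially deleterious at that position.<br />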
1.7. Summary and outlook<br />
In this chapter, the recent additions to the CAPDE arsenal of computational tools, servers and<br />
databases have been briefly reviewed. The rapid accumulation of knowledge on protein<br />
structures and sequence-structure-function relationships promises the continuous improvement<br />
of these methods. In particular, machine-learning approaches, in which the volume of data is<br />
the heuristic key to access the hidden knowledge, and statistically based force fields for<br />
coarse-grained approaches will surely benefit from this trend. These approaches are not only<br />
convenient aids to support lab experiments but also a workbench for heuristically blueprinting<br />
novel molecules. In addition, the availability of low cost and high performance computers will<br />
soon transform currently expensive physically based simulations into convenient and very<br />
accurate high-throughput computational tools. This will make it possible to predict the<br />
structural stability and folds of small or medium sized proteins and will open a new working<br />
style paradigm in protein engineering. Moreover, the physically based approach has already<br />
shown promising results for understanding enzyme activity.[123,124]<br />
1.8. References<br />
1. Bornscheuer UT, Huisman GW, Kazlauskas RJ, Lutz S, Moore JC, et al. (2012)<br />
Engineering the third wave of biocatalysis. Nature 485: 185-194.<br />
2. Lutz S (2010) Beyond directed evolution-semi-rational protein engineering and<br />
design. Curr Opin Biotech 21: 734-743.<br />
3. Gerlt JA, Babbitt PC (2009) Enzyme (re)design: lessons from natural evolution and<br />
computation. Curr Opin Chem Biol 13: 10-18.<br />
4. Jackel C, Kast P, Hilvert D (2008) Protein design by directed evolution. Annu Rev<br />
Biophys 37: 153-173.<br />
5. Damborsky J, Brezovsky J (2009) Computational tools for designing and engineering<br />
biocatalysts. Curr Opin Chem Biol 13: 26-34.<br />
6. Suarez M, Jaramillo A (2009) Challenges in the computational design of proteins. J R<br />
Soc Interface 6 (Suppl 4): S477–S491.<br />
7. Pantazes RJ, Grisewood MJ, Maranas CD (2011) Recent advances in computational<br />
protein design. Curr Opin Struct Biol 21: 467-472.<br />
8. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE (2012) Biomolecular simulation: a<br />
computational microscope for molecular biology. Annu Rev Biophys 41: 429-452.<br />
9. Lee EH, Hsin J, Sotomayor M, Comellas G, Schulten K (2009) Discovery through the<br />
computational microscope. Structure 17: 1295-1306.<br />
10. Schlick T, Collepardo-Guevara R, Halvorsen LA, Jung S, Xiao X (2011)<br />
Biomolecular modeling and simulation: a field coming of age. Q Rev Biophys 44: 191-<br />
228.<br />
11. McGeagh JD, Ranaghan KE, Mulholland AJ (2011) Protein dynamics and enzyme<br />
catalysis: Insights from simulations. BBA-Proteins Proteom 1814: 1077-1092.<br />
12. Klepeis JL, Lindorff-Larsen K, Dror RO, Shaw DE (2009) Long-timescale molecular<br />
dynamics simulations of protein structure and function. Curr Opin Struct Biol 19:<br />
120-127.<br />
13. Barrozo A, Borstnar R, Marloie Gl, Kamerlin SCL (2012) Computational protein<br />
engineering: bridging the gap between rational design and laboratory evolution. Int<br />
J Mol Sci 13: 12428-12460.<br />
14. Frushicheva MP, Cao J, Warshel A (2011) Challenges and advances in validating<br />
enzyme design proposals: the case of kemp eliminase catalysis. Biochemistry 50:<br />
3849-3858.<br />
15. Frushicheva MP, Warshel A (2012) Towards quantitative computer-aided studies of<br />
enzymatic enantioselectivity: the case of Candida antarctica lipase A. Chembiochem<br />
13: 215-223.<br />
16. van der Kamp MW, Mulholland AJ (2008) Computational enzymology: insight into<br />
biological catalysts from modelling. Nat Prod Rep 25: 1001-1014.<br />
17. Turner NJ (2009) Directed evolution drives the next generation of biocatalysts. Nat<br />
Chem Biol 5: 567-573.<br />
18. Arnold FH, Moore JC (1997) Optimizing industrial enzymes by directed evolution.<br />
Adv Biochem Eng Biotechnol 58: 1-14.<br />
19. Tracewell CA, Arnold FH (2009) Directed enzyme evolution: climbing fitness peaks<br />
one amino acid at a time. Curr Opin Chem Biol 13: 3-9.<br />
20. Wong TS, Roccatano D, Schwaneberg U (2007) Steering directed protein evolution:<br />
strategies to manage combinatorial complexity of mutant libraries. Environ<br />
Microbiol 9: 2645-2659.<br />
21. Chica RA, Doucet N, Pelletier JN (2005) Semi-rational approaches to engineering<br />
enzyme activity: combining the benefits of directed evolution and rational design.<br />
Curr Opin Biotech 16: 378-384.<br />
22. Kazlauskas RJ, Bornscheuer UT (2009) Finding better protein engineering<br />
strategies. Nat Chem Biol 5: 526-529.<br />
23. Romero PA, Arnold FH (2009) Exploring protein fitness landscapes by directed<br />
evolution. Nat Rev Mol Cell Biol 10: 866-876.<br />
24. Tokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability.<br />
Curr Opin Struct Biol 19: 596-604.<br />
25. Wong TS, Roccatano D, Zacharias M, Schwaneberg U (2006) A statistical analysis of<br />
random mutagenesis methods used for directed protein evolution. J Mol Biol 355:<br />
858-871.<br />
26. Shivange AV, Marienhagen J, Mundhada H, Schenk A, Schwaneberg U (2009)<br />
Advances in generating functional diversity for directed protein evolution. Curr<br />
Opin Chem Biol 13: 19-25.<br />
27. Verma R, Schwaneberg U, Roccatano D (2012) MAP2.03D: a sequence/structure<br />
based server for protein engineering. ACS Synth Biol 1: 139-150.<br />
28. Firth AE, Patrick WM (2008) GLUE-IT and PEDEL-AA: new programmes for<br />
analyzing protein diversity in randomized libraries. Nucleic Acids Res 36: W281-<br />
W285.<br />
29. Patrick WM, Matsumura I (2008) A study in molecular contingency: glutamine<br />
phosphoribosylpyrophosphate amidotransferase is a promiscuous and evolvable<br />
phosphoribosylanthranilate isomerase. J Mol Biol 377: 323-336.<br />
30. Nov Y (2011) When second best is good enough: another probabilistic look at<br />
saturation mutagenesis. Appl Environ Microbiol 78: 258-262.<br />
31. Rasila TS, Pajunen MI, Savilahti H (2009) Critical evaluation of random mutagenesis<br />
by error-prone polymerase chain reaction protocols, Escherichia coli mutator strain,<br />
and hydroxylamine treatment. Anal Biochem 388: 71-80.<br />
32. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser H-G, et al. (2012) dRTP and<br />
dPTP a complementary nucleotide couple for the Sequence Saturation Mutagenesis<br />
(SeSaM) method. J Mol Catal B-Enzym 84: 40-47.<br />
33. Jmol: an open-source Java viewer for chemical structures in 3D.<br />
http://www.jmol.org/<br />
34. Pei J (2008) Multiple protein sequence alignment. Curr Opin Struct Biol 18: 382-386.<br />
35. Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N (2010) ConSurf 2010: calculating<br />
evolutionary conservation in sequence and structure of proteins and nucleic acids.<br />
Nucleic Acids Res 38: W529-533.<br />
36. Goldenberg O, Erez E, Nimrod G, Ben-Tal N (2009) The ConSurf-DB: pre-calculated<br />
evolutionary conservation profiles of protein structures. Nucleic Acids Res 37:<br />
D323-D327.<br />
37. Kuipers RK, Joosten H-J, van Berkel WJH, Leferink NGH, Rooijen E, et al. (2010) 3DM:<br />
Systematic analysis of heterogeneous superfamily data to discover protein<br />
functionalities. Proteins 78: 2101-2113.<br />
38. Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A (2009) Joint<br />
Evolutionary Trees: a large-scale method to predict protein interfaces based on<br />
sequence sampling. PLoS Comput Biol 5: e1000267.<br />
39. Guney E, Tuncbag N, Keskin O, Gursoy A (2008) HotSprint: database of<br />
computational hot spots in protein interfaces. Nucleic Acids Res 36: D662-D666.<br />
40. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic<br />
tool for the identification of functional regions in proteins by surface mapping of<br />
evolutionary determinants within their homologues. Bioinformatics 18: S71-S77.<br />
41. Pavelka A, Chovancova E, Damborsky J (2009) HotSpot Wizard: a web server for<br />
identification of hot spots in protein engineering. Nucleic Acids Res 37: W376-<br />
W383.<br />
42. Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, et al. (2007) Selecton<br />
2007: advanced models for detecting positive and purifying selection using a<br />
Bayesian inference approach. Nucleic Acids Res 35: W506-W511.<br />
43. Pleiss Jr, Fischer M, Peiker M, Thiele C, Rolf D (2000) Lipase Engineering Database:<br />
understanding and exploiting sequence-structure-function relationships. J Mol Catal<br />
B-Enzym 10: 491-508.<br />
44. Knoll M, Hamm TM, Wagner F, Martinez V, Pleiss J (2009) The PHA Depolymerase<br />
Engineering Database: a systematic analysis tool for the diverse family of<br />
polyhydroxyalkanoate (PHA) depolymerases. BMC Bioinformatics 10: 89.<br />
45. Sirim D, Wagner F, Wang L, Schmid RD, Pleiss J (2010) The Laccase Engineering<br />
Database: a classification and analysis system for laccases and related multicopper<br />
oxidases. Database 2011: bar006.<br />
46. Thai QK, Bos F, Pleiss J (2009) The Lactamase Engineering Database: a critical<br />
survey of TEM sequences in public databases. BMC Genomics 10: 390.<br />
47. Thai QK, Pleiss J (2010) SHV Lactamase Engineering Database: a reconciliation tool<br />
for SHV beta-lactamases in public databases. BMC Genomics 11: 563.<br />
48. Kawabata T, Ota M, Nishikawa K (1999) The Protein Mutant Database. Nucleic Acids<br />
Res 27: 355-357.<br />
49. Gromiha MM, Uedaira H, An J, Selvaraj S, Prabakaran P, et al. (2002) ProTherm,<br />
thermodynamic database for proteins and mutants: developments in version 3.0.<br />
Nucleic Acids Res 30: 301-302.<br />
50. Gromiha MM, An J, Kono H, Oobatake M, Uedaira H, et al. (2000) ProTherm, version<br />
2.0: thermodynamic database for proteins and mutants. Nucleic Acids Res 28: 283-<br />
285.<br />
51. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A (2004) ProTherm, version 4.0:<br />
thermodynamic database for proteins and mutants. Nucleic Acids Res 32: D120-121.<br />
52. Braun A, Halwachs B, Geier M, Weinhandl K, Guggemos M, et al. (2012) MuteinDB:<br />
the mutein database linking substrates, products and enzymatic reactions directly<br />
with genetic variants of enzymes. Database 2012.<br />
53. Kourist R, Jochens H, Bartsch S, Kuipers R, Padhi SK, et al. (2010) The<br />
alpha/beta-hydrolase fold 3DM database (ABHDB) as a tool for protein engineering.<br />
Chembiochem 11: 1635-1643.<br />
54. Fischer M, Pleiss J (2003) The Lipase Engineering Database: a navigation and<br />
analysis tool for protein families. Nucleic Acids Res 31: 319-321.<br />
55. Widmann M, Juhl PB, Pleiss J (2010) Structural classification by the Lipase<br />
Engineering Database: a case study of Candida antarctica lipase A. BMC Genomics<br />
11: 123.<br />
56. Barth S, Fischer M, Schmid RD, Pleiss J (2004) The database of epoxide hydrolases<br />
and haloalkane dehalogenases: one structure, many functions. Bioinformatics 20:<br />
2845-2847.<br />
57. Sirim D, Wagner F, Lisitsa A, Pleiss J (2009) The cytochrome P450 engineering<br />
database: Integration of biochemical properties. BMC Biochem 10: 27.<br />
58. Gong S, Worth CL, Bickerton GR, Lee S, Tanramluk D, et al. (2009) Structural and<br />
functional restraints in the evolution of protein families and superfamilies. Biochem<br />
Soc Trans 37: 727-733.<br />
59. Wass MN, Kelley LA, Sternberg MJ (2010) 3DLigandSite: predicting ligand-binding<br />
sites using similar structures. Nucleic Acids Res 38: W469-473.<br />
60. Konc J, Janezic D (2010) ProBiS algorithm for detection of structurally similar<br />
protein binding sites by local structural alignment. Bioinformatics 26: 1160-1168.<br />
61. Konc J, Janezic D (2012) ProBiS-2012: web server and web services for detection of<br />
structurally similar binding sites in proteins. Nucleic Acids Res 40: W214-221.<br />
62. Lin Y, Yoo S, Sanchez R (2012) SiteComp: a server for ligand binding site analysis in<br />
protein structures. Bioinformatics.<br />
63. Liang J, Tseng YY, Dundas J, Binkowski TA, Joachimiak A, et al. (2008) Predicting and<br />
characterizing protein functions through matching geometric and evolutionary<br />
patterns of binding surfaces. Adv Protein Chem Struct Biol 75: 107-141.<br />
64. Konc J, Cesnik T, Konc JT, Penca M, Janezic D (2012) ProBiS-database: precalculated<br />
binding site similarities and local pairwise alignments of PDB structures. J Chem Inf<br />
Model 52: 604-612.<br />
65. Prokop M, Damborsky J, Koca J (2000) TRITON: in silico construction of protein<br />
mutants and prediction of their activities. Bioinformatics 16: 845-846.<br />
66. Prokop M, Adam J, Kriz Z, Wimmerova M, Koca J (2008) TRITON: a graphical tool for<br />
ligand-binding protein engineering. Bioinformatics 24: 1955-1956.<br />
67. Sanchez-Ruiz JM (2010) Protein kinetic stability. Biophys Chem 148: 1-15.<br />
68. Tina KG, Bhadra R, Srinivasan N (2007) PIC: Protein Interactions Calculator. Nucleic<br />
Acids Res 35: W473-476.<br />
69. Vangone A, Spinelli R, Scarano V, Cavallo L, Oliva R (2011) COCOMAPS: a web<br />
application to analyse and visualize contacts at the interface of biomolecular<br />
complexes. Bioinformatics.<br />
70. Tan KP, Varadarajan R, Madhusudhan MS (2011) DEPTH: a web server to compute<br />
depth and predict small-molecule binding cavities in proteins. Nucleic Acids Res.<br />
71. Magyar C, Gromiha MM, Pujadas G, Tusnady GE, Simon I (2005) SRide: a server for<br />
identifying stabilizing residues in proteins. Nucleic Acids Res 33: W303-305.<br />
72. Shazman S, Celniker G, Haber O, Glaser F, Mandel-Gutfreund Y (2007) Patch Finder<br />
Plus (PFplus): a web server for extracting and displaying positive electrostatic<br />
patches on protein surfaces. Nucleic Acids Res 35: W526-W530.<br />
73. Choi YS, Han SK, Kim J, Yang J-S, Jeon J, et al. (2010) ConPlex: a server for the<br />
evolutionary conservation analysis of protein complex structures. Nucleic Acids Res<br />
38: W450-W456.<br />
74. Teilum K, Olsen JG, Kragelund BB (2011) Protein stability, flexibility and function.<br />
Biochim Biophys Acta 1814: 969-976.<br />
75. Teilum K, Olsen JG, Kragelund BB (2009) Functional aspects of protein flexibility.<br />
Cell Mol Life Sci 66: 2231-2247.<br />
76. Henzler-Wildman K, Kern D (2007) Dynamic personalities of proteins. Nature 450:<br />
964-972.<br />
77. Mittermaier AK, Kay LE (2009) Observing biological dynamics at atomic resolution<br />
using NMR. Trends Biochem Sci 34: 601-611.<br />
78. Martinez R, Schwaneberg U, Roccatano D (2011) Temperature effects on structure<br />
and dynamics of the psychrophilic protease subtilisin S41 and its thermostable<br />
mutants in solution. Protein Eng Des Sel 24: 533-544.<br />
79. Ma B, Nussinov R (2010) Enzyme dynamics point to stepwise conformational<br />
selection in catalysis. Curr Opin Chem Biol 14: 652-659.<br />
80. Zhang H, Zhang T, Chen K, Shen SY, Ruan JS, et al. (2009) On the relation between<br />
residue flexibility and local solvent accessibility in proteins. Proteins 76: 617-636.<br />
81. Lauck F, Smith CA, Friedland GF, Humphris EL, Kortemme T (2010) RosettaBackrub - a<br />
web server for flexible backbone protein structure modeling and design. Nucleic<br />
Acids Res 38: W569-W575.<br />
82. Mandell DJ, Kortemme T (2009) Backbone flexibility in computational protein<br />
design. Curr Opin Biotech 20: 420-428.<br />
83. Seeliger D, Haas Jr, de Groot BL (2007) Geometry-based sampling of conformational<br />
transitions in proteins. Structure 15: 1482-1492.<br />
84. Kuznetsov IB, McDuffie M (2008) FlexPred: a web-server for predicting residue<br />
positions involved in conformational switches in proteins. Bioinformation 3: 134-<br />
136.<br />
85. Bahar I, Lezon TR, Yang L-W, Eyal E (2010) Global dynamics of proteins: bridging<br />
between structure and function. Ann Rev Biophys 39: 23-42.<br />
86. Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology.<br />
Curr Opin Struct Biol 15: 586-592.<br />
87. Kamerlin SCL, Vicatos S, Dryga A, Warshel A (2011) Coarse-grained (multiscale)<br />
simulations in studies of biophysical and chemical systems. Annu Rev Phys Chem<br />
62: 41-64.<br />
88. Bahar I, Atilgan AR, Erman B (1997) Direct evaluation of thermal fluctuations in<br />
proteins using a single-parameter harmonic potential. Fold Des 2: 173-181.<br />
89. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, et al. (2001) Anisotropy of<br />
fluctuation dynamics of proteins with an elastic network model. Biophys J 80: 505-<br />
515.<br />
90. Skjaerven L, Hollup SM, Reuter N (2009) Normal mode analysis for proteins. J Mol<br />
Struc-Theochem 898: 42-48.<br />
91. Liu X, Karimi HA (2007) High-throughput modeling and analysis of protein<br />
structural dynamics. Brief Bioinform 8: 432-445.<br />
92. Suhre K, Sanejouand Y-H (2004) ElNemo: a normal mode web server for protein<br />
movement analysis and the generation of templates for molecular replacement.<br />
Nucleic Acids Res 32: W610-W614.<br />
93. Hollup S, Salensminde G, Reuter N (2005) WEBnm@: a web application for normal<br />
mode analyses of proteins. BMC Bioinformatics 6: 52.<br />
94. Camps J, Carrillo O, Emperador A, Orellana L, Hospital A, et al. (2009) FlexServ: an<br />
integrated tool for the analysis of protein flexibility. Bioinformatics 25: 1709-1710.<br />
34
PART I: CAPDE<br />
95. Emekli U, Schneidman-Duhovny D, Wolfson HJ, Nussinov R, Haliloglu T (2008)<br />
HingeProt: automated prediction <strong>of</strong> hinges in protein structures. Proteins 70: 1219-<br />
1227.<br />
96. Hayward S, Berendsen HJC (1998) Systematic analysis <strong>of</strong> domain motions in<br />
proteins from conformational change: New results on citrate synthase <strong>and</strong> T4<br />
lysozyme. Proteins 30: 144-154.<br />
97. Qi GY, Hayward S (2009) Database <strong>of</strong> lig<strong>and</strong>-induced domain movements in<br />
enzymes. BMC Struct Biol 9.<br />
98. Poornam GP, Matsumoto A, Ishida H, Hayward S (2009) A method for the analysis <strong>of</strong><br />
domain movements in large biomolecular complexes. Proteins 76: 201-212.<br />
99. Glowacki DR, Harvey JN, Mulholl<strong>and</strong> AJ (2012) Taking Ockham's razor to enzyme<br />
dynamics <strong>and</strong> catalysis. Nature Chemistry 4: 169-176.<br />
100. Pisliakov AV, Cao J, Kamerlin SCL, Warshel A (2009) Enzyme millisecond<br />
conformational dynamics do not catalyze the chemical step. Proc Natl Acad Sci USA<br />
106: 17359-17364.<br />
101. Roca M, Vardi-Kilshtain A, Warshel A (2009) Toward accurate screening in<br />
computer-aided enzyme design. Biochemistry 48: 3046-3056.<br />
102. Gromiha MM, Pujadas G, Magyar C, Selvaraj S, Simon I (2004) Locating the<br />
stabilizing residues in (α/β)8 barrel proteins based on hydrophobicity, long-range<br />
interactions, <strong>and</strong> sequence conservation. Proteins 55: 316-329.<br />
103. Davis IW, Arendall WB, 3rd, Richardson DC, Richardson JS (2006) The backrub<br />
motion: how protein backbone shrugs when a sidechain dances. Structure 14: 265-<br />
274.<br />
104. Humphris EL, Kortemme T (2008) Prediction <strong>of</strong> Protein-Protein Interface Sequence<br />
Diversity Using Flexible Backbone Computational Protein Design. Structure 16:<br />
1777-1788.<br />
105. Kuznetsov IB (2008) Ordered conformational change in the protein backbone:<br />
prediction <strong>of</strong> conformationally variable positions from sequence <strong>and</strong> low-resolution<br />
structural data. Proteins 72: 74-87.<br />
106. Bloom JD, Arnold FH (2009) In the light <strong>of</strong> directed evolution: pathways <strong>of</strong> adaptive<br />
protein evolution. Proc Natl Acad Sci USA 106 Suppl 1: 9995-10000.<br />
35
PART I: CAPDE<br />
107. Tokuriki N, Stricher F, Serrano L, Tawfik DS (2008) How protein stability <strong>and</strong> new<br />
functions trade <strong>of</strong>f. PLoS Comput Biol 4: e1000002.<br />
108. Saeys Y, Inza I, Larranaga P (2007) A review <strong>of</strong> feature selection techniques in<br />
bioinformatics. Bioinformatics 23: 2507-2517.<br />
109. Capriotti E, Fariselli P, Casadio R (2005) I-Mutant2.0: predicting stability changes<br />
upon mutation from the protein sequence or structure. Nucleic Acids Res 33: W306-<br />
310.<br />
110. Cheng J, R<strong>and</strong>all A, Baldi P (2006) Prediction <strong>of</strong> protein stability changes for singlesite<br />
mutations using support vector machines. Proteins 62: 1125-1132.<br />
111. Huang LT, Gromiha MM, Ho SY (2007) iPTREE-STAB: interpretable decision tree<br />
based method for predicting protein stability changes upon mutations.<br />
Bioinformatics 23: 1292-1293.<br />
112. Huang LT, Gromiha MM (2009) Reliable prediction <strong>of</strong> protein thermostability<br />
change upon double mutation from amino acid sequence. Bioinformatics 25: 2181-<br />
2187.<br />
113. Wainreb G, Wolf L, Ashkenazy H, Dehouck Y, Ben-Tal N (2011) Protein stability: a<br />
single recorded mutation aids in predicting the effects <strong>of</strong> other mutations in the<br />
same amino acid site. Bioinformatics 27: 3286-3292.<br />
114. Wainreb G, Ashkenazy H, Bromberg Y, Starovolsky-Shitrit A, Haliloglu T, et al.<br />
(2010) MuD: an interactive web server for the prediction <strong>of</strong> non-neutral<br />
substitutions using protein structural data. Nucleic Acids Res 38: W523-W528.<br />
115. Worth CL, Preissner R, Blundell TL (2011) SDM:a server for predicting effects <strong>of</strong><br />
mutations on protein stability <strong>and</strong> malfunction. Nucleic Acids Res 39: W215-W222.<br />
116. Dehouck Y, Kwasigroch J, Gilis D, Rooman M (2011) PoPMuSiC 2.1: a web server for<br />
the estimation <strong>of</strong> protein stability changes upon mutation <strong>and</strong> sequence optimality.<br />
BMC Bioinformatics 12: 151.<br />
117. Van Durme J, Delgado J, Stricher F, Serrano L, Schymkowitz J, et al. (2011) A<br />
graphical interface for the FoldX forcefield. Bioinformatics 27: 1711-1712.<br />
118. Johnston MA, Søndergaard CR, Nielsen JE (2011) Integrated prediction <strong>of</strong> the effect<br />
<strong>of</strong> mutations on multiple protein characteristics. Proteins 79: 165-178.<br />
36
PART I: CAPDE<br />
119. Parthiban V, Gromiha MM, Schomburg D (2006) CUPSAT: prediction <strong>of</strong> protein<br />
stability upon point mutations. Nucleic Acids Res 34: W239-242.<br />
120. Masso M, Vaisman II (2008) Accurate prediction <strong>of</strong> stability changes in protein<br />
mutants by combining machine learning with structure based computational<br />
mutagenesis. Bioinformatics 24: 2002-2009.<br />
121. Masso M, Vaisman II (2010) AUTO-MUTE: web-based tools for predicting stability<br />
changes in proteins due to single amino acid replacements. Protein Eng Des Sel 23:<br />
683-687.<br />
122. Kumar P, Henik<strong>of</strong>f S, Ng PC (2009) Predicting the effects <strong>of</strong> coding non-synonymous<br />
variants on protein function using the SIFT algorithm. Nat Protoc 4: 1073-1081.<br />
123. Adamczyk AJ, Cao J, Kamerlin SC, Warshel A (2011) Catalysis by dihydr<strong>of</strong>olate<br />
reductase <strong>and</strong> other enzymes arises from electrostatic preorganization, not<br />
conformational motions. Proc Natl Acad Sci USA 108: 14115-14120.<br />
124. Ishikita H, Warshel A (2008) Predicting drug-resistant mutations <strong>of</strong> HIV protease.<br />
Angew Chem Int Edit 47: 697-700.<br />
Part of this chapter is adapted with permission from ‘Verma R, Schwaneberg U,<br />
Roccatano D. Computational and Structural Biotechnology Journal 2012, 2 (3),<br />
e201209008.’<br />
PART I: MAP 2.0 3D<br />
Chapter 2<br />
MAP 2.0 3D: A Sequence/Structure Based Server for<br />
Protein Engineering<br />
2.1. Abstract<br />
The Mutagenesis Assistant Program (MAP) is a web-based tool that provides statistical<br />
analyses of the mutational biases of directed evolution experiments on amino acid<br />
substitution patterns. MAP analysis assists protein engineering in the benchmarking of<br />
random mutagenesis methods that generate single nucleotide mutations in a codon. Herein,<br />
we describe a completely renewed and improved version of the MAP server, named<br />
MAP 2.0 3D, which correlates the generated amino acid substitution patterns to the<br />
structural information of the target protein. This correlation helps to select a more suitable<br />
random mutagenesis method with specific biases on amino acid substitution patterns. In<br />
particular, the new server represents MAP indicators on secondary and tertiary structure,<br />
and correlates them to specific structural components such as hydrogen bonds, hydrophobic<br />
contacts, salt bridges, solvent accessibility and crystallographic B-factors. Three model<br />
proteins (D-amino acid oxidase, phytase and N-acetylneuraminic acid aldolase) are used to<br />
illustrate the novel capabilities of the server. The MAP 2.0 3D server is publicly available at<br />
http://map.jacobs-university.de/map3d.html.<br />
2.2. Introduction<br />
Over the past two decades directed protein evolution has proven to be a<br />
powerful algorithm to tailor protein properties through iterative rounds of random<br />
mutagenesis and screening for improved protein variants.[1,2] Directed evolution methods<br />
are especially useful for improving properties that are difficult to rationalize and, hence, for identifying<br />
amino acids and protein regions that can guide further enhancements using site-directed<br />
and saturation mutagenesis methods.[3,4] The success of a directed evolution campaign<br />
depends highly on the quality of the mutant library and on the employed random<br />
mutagenesis method. Random mutagenesis methods are based on specific error-prone<br />
polymerases (enzymatic methods), DNA-modifying chemicals (e.g. nitrous acid) or mutator<br />
strains (e.g. Escherichia coli mutA).[5] The quality of a mutant library is determined by the<br />
generated genetic diversity and the corresponding protein sequence space.[6] Since the<br />
number of muteins grows rapidly with the number of amino acids exchanged in the<br />
protein, protein engineers are challenged with an astronomically vast sequence space.[7]<br />
Despite advances in high-throughput screening, it is very difficult to screen the<br />
theoretically generated diversity even in the case of a small protein sequence.[8,9]<br />
Therefore, generating high-quality mutant libraries enriched with functional traits is of high<br />
importance. To deal with the challenge of accessing and screening such a large sequence space,<br />
protein engineers usually adopt two strategic approaches.[10-12] The first approach<br />
consists of the random mutagenesis of the target protein and the subsequent identification<br />
of ‘mutagenic hot spots’. Random mutagenesis can be followed by recombination of the<br />
best variants by site-directed mutagenesis or saturation mutagenesis.[13] The second<br />
approach involves the identification of a subset of specific residues using rational or semi-rational<br />
design with the help of computational tools.[14] Up to five amino acid positions<br />
can be efficiently targeted with focused mutagenesis methods, allowing the generation of<br />
focused mutant libraries with a number of variants that can be screened with state-of-the-art<br />
flow cytometry methods.[13] Focused mutagenesis is normally employed to improve<br />
properties of the target protein, such as activity or selectivity, by mutating residues in close<br />
proximity to a specific protein region like the active site. In this case, random mutagenesis<br />
methods are complementary to rational design since they can identify important amino<br />
acid positions, especially in the second and third coordination sphere, which would have<br />
been overlooked by rational design. Nevertheless, random mutagenesis methods are biased toward<br />
certain nucleotide exchanges (e.g. many epPCR methods prefer transition mutations). The<br />
mutagenic preferences of biased random mutagenesis methods affect the<br />
generated diversity. The analysis of the effects of mutational bias on the amino acid<br />
diversity provides a useful indicator for the selection of mutagenesis methods with<br />
diverse and complementary amino acid substitution patterns. The generated<br />
complementary mutant libraries extend the sampling of the vast protein sequence space and<br />
enhance the chance to obtain improved variants.[15,16]<br />
Recently, we introduced a freely available web-based statistical analysis tool<br />
(MAP[17]). The server statistically analyzes the effects of the mutational bias of 19 different<br />
random mutagenesis methods at the level of amino acid substitutions for a given<br />
nucleotide sequence of the target protein. The analysis is returned in terms of MAP<br />
indicators that allow a rapid comparison of different random mutagenesis methods at the<br />
protein level. It has been shown that this approach can be used to predict the type, extent,<br />
and chemical nature of the genetic diversity generated by different mutagenesis<br />
methods.[17,18] Rasila and co-workers[19] reported a comparative evaluation of<br />
commonly used random mutagenesis methods. They found the experimentally induced<br />
substitution patterns to be very similar to those obtained with the MAP server and suggested the use<br />
of a combination of mutagenesis methods to generate high diversity.[19]<br />
One of the limitations of the original MAP server is the absence of analysis tools<br />
relating the MAP indicators to the structural properties of the target protein. The nature of<br />
the amino acid change in different regions of the protein can affect its global and local<br />
structural and thermodynamic properties.[14,20,21] Therefore, the possibility to<br />
correlate the generated diversity with structural properties helps to identify in advance the<br />
random mutagenesis method that has the least number of “deleterious” mutations for<br />
protein stability and the highest probability of introducing amino acid substitutions that may<br />
improve the fitness toward an expected property, e.g. substitutions to charged amino acid<br />
residues to increase solubility in water. For this reason, we have expanded the capability of<br />
the server by introducing these new features. The new server (MAP 2.0 3D) can correlate the<br />
mutational propensity at the amino acid level of a gene for 19 random mutagenesis methods<br />
(and now also for a user-customized random mutagenesis method) with the<br />
crystallographic or homology-modeled structure (if available in Protein Data Bank[22]<br />
format) of the target protein. MAP 2.0 3D analyses the three-dimensional structure of the<br />
target protein by calculating secondary structure elements, important local interactions<br />
(such as hydrogen bonds, hydrophobic contacts, salt bridges and disulphide bridges), solvent<br />
accessibility, and amino acid mobilities from the crystallographic B-factors. This<br />
combined information helps to identify biased amino acid substitutions that may improve<br />
the stability and function of the protein.[23-25]<br />
To correlate the sequence-based analysis to the structural data analysis, a new<br />
indicator, the residue mutability indicator ‘µ’ (the probability that a substitution leads<br />
to an amino acid change at a specific position), has been introduced (see Methods). The<br />
mutability indicator allows a rapid identification of mutagenic hot spots and an easier<br />
comparison of experimental data with the predicted ones.<br />
This chapter illustrates the new features of the MAP 2.0 3D server in detail by performing<br />
the analysis on three model proteins. The results of the MAP 2.0 3D analysis are compared<br />
with the results of protein engineering experiments reported in the literature. The three<br />
examples show possible uses of the server for computational pre-screening of the target<br />
protein to evaluate and select a mutagenesis method for directed evolution.<br />
2.3. Methods<br />
2.3.1. Mutational probability and statistics<br />
The MAP 2.0 3D server performs statistical analysis on a given nucleotide sequence<br />
based on the mutational spectra of different random mutagenesis methods, which were<br />
slightly processed as follows before being used in the analysis.[17] First, insertions and deletions<br />
with an occurrence frequency between 0.80 % and 13.9 % were neglected and the remaining<br />
nucleotide substitution frequencies were scaled proportionally to 100 %. Second,<br />
mutations in the upper and lower DNA strands were considered to occur with equal frequency.<br />
The scaled mutational frequencies are used in the analysis to calculate the probability of<br />
amino acid substitutions resulting from one nucleotide exchange in one codon of the gene.<br />
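The two preprocessing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scale_spectrum`, the input numbers and the way the two strands are combined (averaging each substitution with its complementary substitution) are not taken from the server and are chosen here for clarity.<br />

```python
# Minimal sketch of the spectrum preprocessing, not the server's actual code.
# Assumption: strand symmetry is imposed by averaging each substitution with
# its complementary substitution (e.g. A->G with T->C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def scale_spectrum(raw_substitutions):
    """Rescale substitution frequencies to 100 % and symmetrize both strands.

    raw_substitutions maps (original, mutated) nucleotide pairs to percentages.
    Insertions and deletions are assumed to have been left out already.
    """
    total = sum(raw_substitutions.values())
    scaled = {pair: 100.0 * value / total
              for pair, value in raw_substitutions.items()}
    symmetric = {}
    for original in "ATGC":
        for mutated in "ATGC":
            if original == mutated:
                continue
            partner = (COMPLEMENT[original], COMPLEMENT[mutated])
            symmetric[(original, mutated)] = 0.5 * (
                scaled.get((original, mutated), 0.0) + scaled.get(partner, 0.0)
            )
    return symmetric


# Made-up input percentages (they sum to 80 %, as if 20 % indels were dropped).
raw = {("A", "G"): 30.0, ("T", "C"): 20.0, ("G", "A"): 10.0, ("A", "T"): 20.0}
spectrum = scale_spectrum(raw)
```

After this step the twelve substitution frequencies sum to 100 % and complementary substitutions carry the same value, which is the pattern visible in the matrix of equation 2.2.<br />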
The analysis is performed as follows. Consider a gene coding for a protein of L amino acids.<br />
For each nucleotide of a codon (named X, Y, Z) in the gene sequence, the corresponding<br />
single nucleotide substitutions X´, Y´, Z´ (with X, Y, Z, X´, Y´, Z´ ∈ {A, T, G, C} and X´ ≠ X, Y´ ≠ Y, Z´ ≠ Z)<br />
are considered. For each one of the 19 random mutagenesis methods, matrix P (in<br />
equation 2.1) gives the 16 mutational probability values for the substitution of a given<br />
nucleotide by each of the other three (e.g. X → X´); the diagonal elements correspond to an<br />
unchanged nucleotide. The values of matrix P have already been<br />
reported in Table 1 of our previous publication.[17]<br />
\[
P = \begin{pmatrix}
A \to A & A \to T & A \to G & A \to C \\
T \to A & T \to T & T \to G & T \to C \\
G \to A & G \to T & G \to G & G \to C \\
C \to A & C \to T & C \to G & C \to C
\end{pmatrix}
\tag{2.1}
\]<br />
In equation 2.2, the binary vectors U and V are then used to select a given probability<br />
(f) from the matrix P. The four elements of U and V correspond to the nucleotides (A, T, G, C),<br />
each of which is selected by assigning it a value of one and a value of zero to the others. U selects the original nucleotide<br />
and V the mutated one. In equation 2.2, the values for the epPCR method with Taq<br />
polymerase (unbalanced dNTPs) are given as matrix P. In this example, the mutational<br />
probability for the substitution of nucleotide A → T is f = 9.7.<br />
\[
f = U P V^{T} =
\begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix}
0.0 & 9.70 & 19.34 & 16.14 \\
9.70 & 0.0 & 16.14 & 19.34 \\
4.82 & 0.0 & 0.0 & 0.0 \\
0.0 & 4.82 & 0.0 & 0.0
\end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
= 9.7
\tag{2.2}
\]<br />
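The selection in equation 2.2 can be reproduced with a short Python sketch. The matrix values are those printed in equation 2.2; the names `P_TAQ`, `one_hot` and `substitution_probability` are illustrative and not part of the server.<br />

```python
# Sketch of f = U P V^T for the Taq epPCR example of equation 2.2.
NUCLEOTIDES = "ATGC"  # row and column order of matrix P

# Rows: original nucleotide; columns: mutated nucleotide (order A, T, G, C).
P_TAQ = [
    [0.0, 9.70, 19.34, 16.14],
    [9.70, 0.0, 16.14, 19.34],
    [4.82, 0.0, 0.0, 0.0],
    [0.0, 4.82, 0.0, 0.0],
]


def one_hot(nucleotide):
    """Binary selection vector with a one at the position of `nucleotide`."""
    return [1 if n == nucleotide else 0 for n in NUCLEOTIDES]


def substitution_probability(matrix, original, mutated):
    """Evaluate U P V^T, where U selects the original and V the mutated base."""
    u = one_hot(original)
    v = one_hot(mutated)
    return sum(u[r] * matrix[r][c] * v[c] for r in range(4) for c in range(4))


f_a_to_t = substitution_probability(P_TAQ, "A", "T")
```

In practice the double sum reduces to a single table lookup; the vector form is kept here only to mirror the notation of the equation.<br />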
By applying this procedure to each single nucleotide substitution in the codon, nine<br />
probability values (three for each nucleotide) are obtained. Each of these values gives the<br />
k-th mutational probability ($f(i)^{k}_{\alpha \to \beta}$) that changes the i-th amino acid (α) expressed by the<br />
native codon into the one (β, which also comprises the stop codon) expressed by the<br />
mutated codon (e.g. X,Y,Z → X´,Y,Z). Therefore, the 9 probabilities ($f(i)^{k}_{\alpha \to \beta}$) are summed to<br />
get the normalization factor ($N_i$) for the i-th residue of the protein sequence<br />
\[
N_i = \sum_{k=1}^{9} f(i)^{k}_{\alpha \to \beta}
\tag{2.3}
\]<br />
hence, the normalized probability for the substitution of amino acid α → β is given by<br />
\[
\varphi(i)^{k}_{\alpha \to \beta} = \frac{f(i)^{k}_{\alpha \to \beta}}{N_i}
\tag{2.4}
\]<br />
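As a worked illustration of equations 2.3 and 2.4, the following Python sketch enumerates the nine single nucleotide substitutions of one codon and normalizes their probabilities. It uses the standard genetic code and the Taq matrix of equation 2.2; the function name `codon_substitutions` and the final step of summing the normalized values per resulting amino acid are assumptions made for this example.<br />

```python
from collections import defaultdict

# Standard genetic code, built from the usual TCAG-ordered string ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# P_TAQ[x][y]: probability of x -> y, values from equation 2.2.
P_TAQ = {
    "A": {"A": 0.0, "T": 9.70, "G": 19.34, "C": 16.14},
    "T": {"A": 9.70, "T": 0.0, "G": 16.14, "C": 19.34},
    "G": {"A": 4.82, "T": 0.0, "G": 0.0, "C": 0.0},
    "C": {"A": 0.0, "T": 4.82, "G": 0.0, "C": 0.0},
}


def codon_substitutions(codon, matrix):
    """Return the native amino acid, the nine events and phi per amino acid."""
    native = CODON_TABLE[codon]
    events = []  # (mutated codon, resulting amino acid, f)
    for position, original in enumerate(codon):
        for mutated in "ATGC":
            if mutated == original:
                continue
            new_codon = codon[:position] + mutated + codon[position + 1:]
            events.append((new_codon, CODON_TABLE[new_codon],
                           matrix[original][mutated]))
    n_i = sum(f for _, _, f in events)  # equation 2.3
    if n_i == 0:
        raise ValueError("no substitution is possible for this codon")
    phi = defaultdict(float)
    for _, amino_acid, f in events:
        phi[amino_acid] += f / n_i  # equation 2.4, pooled per amino acid
    return native, events, dict(phi)


native, events, phi = codon_substitutions("ATG", P_TAQ)
```

For the ATG (Met) codon, none of the nine substitutions is silent, so `phi` contains no entry for M and all of the normalized probability leads to an amino acid change.<br />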
2.3.2. MAP indicators<br />
Three indicators, the protein structure indicator, the amino acid diversity indicator and the<br />
chemical diversity indicator, are used to summarize the characteristics of a random<br />
mutagenesis method for the target gene at the amino acid level. The amino acid diversity for<br />
the substitution of amino acid α → β in the protein sequence (of length L) is calculated by<br />
\[
\Delta_{\alpha \to \beta} = \frac{1}{L} \sum_{i=1}^{L} \varphi(i)_{\alpha \to \beta}
\tag{2.5}
\]<br />
The amino acid diversities are summed to calculate the values of the MAP indicators,<br />
\[
I_{\alpha \to S} = \sum_{r=1}^{r'} \Delta^{r}_{\alpha \to \beta(r)}
\tag{2.6}
\]<br />
where S indicates a subset of amino acids or stop codons and r´ is the number of<br />
elements in this subset. The chemical diversity indicator quantifies the chemical diversity<br />
generated by the random mutagenesis method. For this indicator, S is<br />
one of the following subsets of amino acids: charged (D, E, H, K, R; S = ch), neutral (C, M, S, P, T, N, Q; S<br />
= ne), aromatic (F, Y, W; S = ar) and aliphatic (G, A, V, L, I; S = al). For example, $I_{\alpha \to \mathrm{ch}}$, the<br />
total probability that a given amino acid α is substituted by a charged amino acid (ch), is<br />
calculated by summing $\Delta^{r}_{\alpha \to \beta(r)}$, where the substituted residue β(r) can be one of the charged residues<br />
(E, D, R, K and H), i.e. r´ = 5. The protein structure indicator gives the fraction of single<br />
nucleotide substitutions resulting in protein structure/function-disrupting (stop codons; S =<br />
st) and likely destabilizing (glycine or proline; S = gp) amino acid substitutions. Finally, the<br />
amino acid diversity indicator measures the fraction of variants with preserved amino acids<br />
(S = pr) and the average number of amino acid substitutions per residue. This is<br />
complemented by the codon diversity coefficient, which measures the distribution of random<br />
mutations among the codons of the gene.<br />
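Equations 2.5 and 2.6 can be sketched as follows. The amino acid subsets are those listed above; the per-residue probabilities in `toy_residues` are made-up values for a three-residue sequence, and the function names are illustrative only.<br />

```python
from collections import defaultdict

# Amino acid subsets S as defined in the text ('*' denotes a stop codon).
GROUPS = {
    "ch": set("DEHKR"),
    "ne": set("CMSPTNQ"),
    "ar": set("FYW"),
    "al": set("GAVLI"),
    "st": {"*"},
    "gp": set("GP"),
}


def amino_acid_diversity(residues):
    """Equation 2.5: average phi(i) over the L residues of the sequence.

    residues is a list of (native amino acid, {target amino acid: phi}) pairs.
    """
    length = len(residues)
    diversity = defaultdict(float)
    for native, phi in residues:
        for target, value in phi.items():
            diversity[(native, target)] += value / length
    return dict(diversity)


def map_indicator(diversity, native, subset):
    """Equation 2.6: sum the diversities over the members of subset S."""
    return sum(value for (alpha, beta), value in diversity.items()
               if alpha == native and beta in GROUPS[subset])


# Made-up normalized probabilities, for illustration only.
toy_residues = [
    ("M", {"L": 0.5, "K": 0.3, "I": 0.2}),
    ("M", {"L": 0.1, "R": 0.9}),
    ("G", {"D": 0.4, "G": 0.6}),
]
diversity = amino_acid_diversity(toy_residues)
met_to_charged = map_indicator(diversity, "M", "ch")
```

With these toy values, the charged-residue indicator for methionine is (0.3 + 0.9) / 3 = 0.4.<br />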
2.3.3. Local chemical diversity and protein structure components<br />
Two new sequence-based indicators are introduced with the MAP 2.0 3D server to<br />
complement the single amino acid structural analysis. The probability that a substitution of the i-th<br />
amino acid (α) leads to an amino acid (β) belonging to a group S of side chains with the same<br />
chemical nature is calculated by<br />
\[
\delta(i)_{\alpha \to S} = \sum_{r=1}^{r'} \varphi(i)^{r}_{\alpha \to \beta}
\tag{2.7}
\]<br />
where S and r´ denote the amino acid group and the number of its members, respectively (as<br />
described for equation 2.6). The amino acid mutability of the i-th amino acid (a special case<br />
of equation 2.7 with r´ = 1) is given by<br />
\[
\mu(i) = 1 - \varphi(i)_{\alpha \to \alpha}
\tag{2.8}
\]<br />
where $\varphi(i)_{\alpha \to \alpha}$ is the normalized probability of the substitutions that do not lead to an amino<br />
acid change (α → α) at the i-th residue. The local structural environment of an amino acid<br />
residue influences the acceptance of amino acid substitutions.[23,24] The local structural<br />
environment in a protein comprises the secondary structure element, residue flexibility and<br />
solvent accessibility. Intra-protein interactions contribute to defining the secondary structure<br />
elements and residue flexibility in a target protein and help to understand the molecular basis<br />
of the stability and activity of the protein.[26] To illustrate the effect of the generated chemical<br />
diversity on the protein structural environment, these factors are mapped onto the amino acid<br />
substitution patterns.<br />
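The two per-residue indicators of equations 2.7 and 2.8 reduce to simple sums over the normalized probabilities of one residue. The sketch below assumes these probabilities are available as a dictionary keyed by the resulting amino acid; the values in `phi_gly` are made up, and the function names are illustrative.<br />

```python
CHARGED = set("DEHKR")  # subset S = ch from section 2.3.2


def local_diversity(phi, group):
    """Equation 2.7: probability of a substitution into a member of `group`."""
    return sum(value for amino_acid, value in phi.items()
               if amino_acid in group)


def mutability(native, phi):
    """Equation 2.8: one minus the probability of a silent substitution."""
    return 1.0 - phi.get(native, 0.0)


# Made-up normalized probabilities for a glycine residue, for illustration.
phi_gly = {"G": 0.6, "D": 0.3, "A": 0.1}
mu_gly = mutability("G", phi_gly)
delta_gly_charged = local_diversity(phi_gly, CHARGED)
```

Here 60 % of the single nucleotide substitutions are silent, so the mutability is 0.4, of which 0.3 leads to a charged residue.<br />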
The secondary structure elements are derived using DSSP[27], while the Relative<br />
Solvent Accessibility (RSA) is calculated as the number of water molecules in<br />
contact with the residue[27] divided by the total surface area of the residue.[28] A threshold value of<br />
0.16 is used to differentiate between exposed (RSA >= 0.16) and buried (RSA < 0.16) residues.<br />
Crystallographic B-factors are used as indicators of residue flexibility.[29] The B-factors<br />
of Cα atoms are normalized by<br />
\[
B' = \frac{B - \langle B \rangle}{\sigma}
\tag{2.9}
\]<br />
where ‹B› is the average value for the Cα atoms (after omitting the first and last 3 residues) and σ<br />
the standard deviation.[30] The relative B-factor values after normalization are employed to<br />
differentiate flexible from rigid residues.[31]<br />
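The normalization of equation 2.9 and the RSA threshold can be sketched as below. Two details are assumptions of this example, since the text does not specify them: the population standard deviation is used, and the terminal residues are excluded only from the mean and standard deviation while still receiving a normalized value. The B-factor values are made up.<br />

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors, trim=3):
    """Equation 2.9 for a list of C-alpha B-factors.

    The mean and standard deviation ignore the first and last `trim` residues
    (assumption: population standard deviation, all residues are normalized).
    """
    core = b_factors[trim:len(b_factors) - trim]
    if len(core) < 2:
        raise ValueError("need at least two residues after trimming")
    average = mean(core)
    sigma = pstdev(core)
    if sigma == 0:
        raise ValueError("B-factors are constant and cannot be normalized")
    return [(b - average) / sigma for b in b_factors]


def classify_rsa(rsa, threshold=0.16):
    """Label a residue as exposed (RSA >= threshold) or buried."""
    return "exposed" if rsa >= threshold else "buried"


# Made-up B-factors for a nine-residue chain, for illustration only.
b_values = [50.0, 50.0, 50.0, 10.0, 20.0, 30.0, 50.0, 50.0, 50.0]
normalized = normalize_b_factors(b_values)
```

Residues with a positive normalized value are more mobile than the chain average and those with a negative value are more rigid.<br />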
Finally, the new server calculates the following intra-protein interactions from the<br />
crystallographic protein structure, using criteria reported in the literature: disulphide<br />
bonds,[27] salt bridges,[32] hydrophobic interactions,[33] aromatic interactions[34] and side<br />
chain hydrogen bonds.[35] The default parameters for the calculation of molecular interactions<br />
are taken from the widely accepted primary literature and can be modified by the<br />
user.<br />
2.3.4. MAP 2.0 3D server description<br />
MAP 2.0 3D analysis is performed on a gene sequence along with the 3D coordinates<br />
of the target protein for one random mutagenesis method at a time. Figure 2.1 shows the query<br />
interface of the server available at http://map.jacobs-university.de/map3d.html.<br />
The server accepts the gene sequence in commonly used sequence<br />
formats (FASTA, GenBank, GCG) or as a raw sequence. The 3D coordinates are accepted in<br />
PDB file format[36]. The protein sequence obtained by translating the gene sequence is aligned<br />
with the protein sequence extracted from the protein coordinates using the Smith-Waterman<br />
algorithm[37] for local sequence alignment. For the complete analysis, the sequences<br />
should have sufficient identity (default >= 70 %). In the case of multi-chain files, the<br />
analysis is performed on the first chain or on a chain defined by the user. The analysis is performed for<br />
a user-selected mutagenesis method (chosen from the MAP library of commonly used<br />
methods or, as a new feature of the server, defined by directly entering the values of the<br />
transformation probability matrix P). By default, the results include the analysis of all residues.<br />
The analysis can be restricted by selecting a predefined group of amino acids (charged, neutral, aromatic<br />
and aliphatic or, according to their relative solvent accessibility, exposed or buried) or by<br />
providing a set of amino acid residues, which can be extended to residues within a given<br />
distance (in Å) from the given set of amino acids. Finally, the advanced user interface section<br />
allows the parameters used for the calculation of molecular interactions to be changed.<br />
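The sequence check described above (local alignment followed by an identity threshold) can be illustrated with a minimal Smith-Waterman sketch. The scoring values (match +2, mismatch -1, linear gap -2) and the definition of identity as identical positions divided by the length of the local alignment are assumptions of this example; the text does not state which scoring scheme or identity definition the server uses.<br />

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of two sequences with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap positions as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


# Translated gene sequence versus a shorter sequence taken from coordinates.
best_score, seq_gene, seq_pdb = smith_waterman("MKTAYIAKQR", "KTAYIAK")
identity = percent_identity(seq_gene, seq_pdb)
passes_default = identity >= 70.0
```

A local alignment is a natural choice here because crystal structures often lack terminal residues or tags that are present in the gene sequence.<br />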
Figure 2.1: Query interface for MAP 2.0 3D. Black boxes show two ways to query the server: (1)<br />
sequence-based analysis, which takes a nucleotide sequence as input (red box), and (2) structure-based<br />
analysis, which takes protein coordinates (crystallographic structure or homology model),<br />
a nucleotide sequence and a random mutagenesis method as input (red boxes). The options given in<br />
the green boxes can be used to customize the query: (1) 19 commonly used mutagenesis<br />
methods are included in the server by default, and a new method can be included by defining its<br />
mutational spectrum; (2) selection of the chain in the case of a multi-chain protein; (3) restricting the search to a<br />
group of amino acids by selecting predefined groups based on (a) the chemical property of<br />
their side chain (charged, neutral, aromatic or aliphatic), (b) the solvent accessible area<br />
(buried or exposed) or (c) a given set of amino acids together with a cutoff (in Å) to include residues within<br />
the defined cutoff of the given residues in the analysis; and (4) altering the thresholds used for the<br />
calculation of molecular interactions.<br />
2.3.5. MAP 2.0 3D output<br />
Along with the sequence-based MAP indicators, the indicators implemented<br />
in MAP 2.0 3D correlate the amino acid substitution patterns generated by random<br />
mutagenesis methods to the protein structure (displayed using the Jmol applet,<br />
http://www.jmol.org/). They include a residue mutability indicator and take secondary<br />
structure elements, residue flexibility, relative solvent accessibility and intra-protein<br />
interactions into account (see above). The generated results are also available for download in<br />
text format for further use. The modified coordinate files (with amino acid substitution<br />
probabilities) in PDB format are also available for download.<br />
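One common way to attach a per-residue value to a coordinate file is to write it into the temperature factor field (columns 61-66) of the ATOM records, so that molecular viewers can color the structure by it. The sketch below follows this convention as an assumption; the text does not state which field the server uses, and the function name `write_mutability` and the example values are hypothetical.<br />

```python
def write_mutability(pdb_lines, mutability, chain="A"):
    """Return PDB lines with the B-factor field replaced by the mutability.

    mutability maps residue sequence numbers to values. Only well-formed
    ATOM/HETATM records (at least 66 characters) of the given chain are changed.
    """
    output = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM")) and len(line) >= 66
        if is_atom and line[21] == chain:
            residue_number = int(line[22:26])
            if residue_number in mutability:
                # Columns 61-66 (0-based slice 60:66) hold the B-factor.
                line = line[:60] + f"{mutability[residue_number]:6.2f}" + line[66:]
        output.append(line)
    return output


# Build one fixed-column ATOM record for illustration (made-up coordinates).
atom_line = (
    "ATOM  {:5d} {:<4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
).format(1, " CA", "", "MET", "A", 1, "", 11.104, 13.207, 9.447, 1.0, 20.0)

modified = write_mutability([atom_line], {1: 0.85})
```

Only the six B-factor columns are rewritten, so coordinates and occupancy are left untouched.<br />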
2.3.6. Model proteins<br />
The enzymes selected for the analysis by MAP 2.0 3D were: 1) D-amino acid oxidase from<br />
Rhodotorula gracilis (EC: 1.4.3.1; EMBL-Bank: AAB93974.1[36]; PDB Id: 1C0I[37]), 2)<br />
phytase from Escherichia coli (EC: 3.1.3.2; EMBL-Bank: AY496073.1[38]; PDB Id:<br />
1DKP[39]) and 3) N-acetylneuraminic acid aldolase from Escherichia coli (EC: 4.1.3.3;<br />
EMBL-Bank: X03345.1[40]; PDB Id: 1NAL[41]). Sequence composition of the enzymes: 1)<br />
D-amino acid oxidase (1107 bases: A 19.96 %; T 17.52 %; G 31.17 %; C 31.35 %; 369<br />
residues), 2) phytase (1299 bases: A 24.25 %; T 22.09 %; G 27.25 %; C 26.40 %; 433<br />
residues) and 3) N-acetylneuraminic acid aldolase (894 bases: A 24.38 %; T 23.60 %; G<br />
27.07 %; C 24.94 %; 298 residues). Secondary structure of the enzymes: 1) D-amino acid<br />
oxidase (30 % helical, 28 % beta sheet), 2) phytase (42 % helical, 15 % beta sheet) and 3)<br />
N-acetylneuraminic acid aldolase (50 % helical, 13 % beta sheet).<br />
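The relation between the base counts and the residue counts quoted above can be checked with a few lines of code. The following Python sketch is an illustration only and is not part of the MAP 2.0 3D server. The function names and the toy sequence are hypothetical, and the sketch assumes that the quoted base counts cover whole codons without the stop codon, as the numbers imply.<br />

```python
from collections import Counter


def base_composition(dna: str) -> dict:
    """Return the percentage of A, T, G and C in a nucleotide sequence."""
    dna = dna.upper()
    counts = Counter(dna)
    return {base: 100.0 * counts[base] / len(dna) for base in "ATGC"}


def residue_count(n_bases: int) -> int:
    """Number of encoded residues for a base count made of whole codons."""
    if n_bases % 3:
        raise ValueError("base count is not a multiple of three")
    return n_bases // 3


# Base counts quoted in the text for the three model enzymes.
for name, n_bases in [("RgDAAO", 1107), ("phytase", 1299), ("Neu5Ac aldolase", 894)]:
    print(name, n_bases, "bases ->", residue_count(n_bases), "residues")

# Hypothetical toy sequence, used only to show the composition output format.
print(base_composition("ATGGCC"))
```

With the real coding sequences (not reproduced here), `base_composition` would return the A/T/G/C percentages in the same form as listed above.<br />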
2.4. Results and discussions<br />
The use of the MAP 2.0 3D server is illustrated by the analysis of three different<br />
enzymes evolved for different properties by directed protein evolution. The first<br />
example describes how to decrease the effects of mutational bias and to generate a mutant<br />
library with a higher fraction of active clones. The second and third examples show the<br />
usability of the server for analyzing the influence of mutational preferences on the evolution<br />
of a desirable property. Outputs of the complete MAP 2.0 3D analysis are provided as examples<br />
in the instruction link of the server (http://map.jacobs-university.de/instruction.html).<br />
2.4.1. D-amino acid oxidase<br />
D-amino acid oxidase (DAAO) is a flavin adenine dinucleotide (FAD) dependent<br />
flavoenzyme. DAAO catalyses the dehydrogenation of D-amino acids to the corresponding<br />
α-keto acids, producing ammonia and hydrogen peroxide.[42,43] The high turnover rate, the<br />
stable FAD binding and the broad substrate specificity of DAAO from Rhodotorula gracilis<br />
(RgDAAO) make it an attractive catalyst for biotechnological applications such as biosensing<br />
(e.g. the rapid and reliable detection of the D-amino acid content in food specimens or of the<br />
neurotransmitter D-serine in the brain).[43] We performed a MAP 2.0 3D analysis on<br />
RgDAAO to evaluate the amino acid diversity generated by random mutagenesis methods.<br />
Table 2.1: Summary of the MAP 2.0 3D analysis for the oxidase, the phytase and the aldolase,<br />
targeting different epPCR methods for random mutagenesis.<br />
| Indicator | RgDAAO (1st round) | RgDAAO (2nd round) | Phytase | N-acetylneuraminic acid aldolase |<br />
| epPCR method | Taq (+, G=A=C=T) | Taq (+, G=A, C=T) | Taq (+, G=A, C=T) | Taq (+, G=A=C=T) |<br />
| Average amino acid substitution (a) | 7.40 | 7.40 | 7.45 | 7.20 |<br />
| Preserved amino acid substitution (b) | 24.53 % | 23.38 % | 25.40 % | 28.47 % |<br />
| Codon diversity coefficient (c) | 42.48 | 34.04 | 36.49 | 43.70 |<br />
| Stop codons frequency (d) | 2.30 % | 4.38 % | 4.69 % | 2.12 % |<br />
| Gly/Pro frequency (e) | 20.58 % | 13.23 % | 11.60 % | 16.26 % |<br />
| Charged amino acid diversity (f) | -0.34 % (25.00 %) | -2.62 % (25.00 %) | 1.39 % (19.21 %) | 5.00 % (22.22 %) |<br />
| Neutral amino acid diversity (g) | 3.37 % (27.99 %) | 4.47 % (27.99 %) | -4.14 % (35.65 %) | 1.73 % (26.94 %) |<br />
| Aromatic amino acid diversity (h) | -3.19 % (7.34 %) | -0.23 % (7.34 %) | 0.91 % (5.79 %) | -3.14 % (8.08 %) |<br />
| Aliphatic amino acid diversity (i) | -2.13 % (39.67 %) | -6.00 % (39.67 %) | -2.86 % (39.35 %) | -5.72 % (42.76 %) |<br />
(a) average number of amino acid substitutions per residue, (b) Iα→pr: fraction of variants with<br />
preserved amino acid substitutions, (c) codon diversity coefficient, (d) Iα→st: fraction of variants with<br />
stop codons, (e) Iα→gp: fraction of variants with Gly/Pro, and chemical diversity generated by the<br />
mutagenesis methods presented as (f) Iα→ch: charged, (g) Iα→ne: neutral, (h) Iα→ar: aromatic and (i) Iα→al:<br />
aliphatic amino acid diversity, with the amino acid composition of the target protein sequence (in<br />
parentheses) and the deviation from this composition after mutagenesis.<br />
MAP 2.0 3D analysis<br />
The sequence-based MAP 2.0 3D analysis was performed using the following<br />
descriptors: i) protein structure indicators, ii) the amino acid diversity indicator with the codon<br />
diversity coefficient and iii) the chemical diversity indicator.<br />
Figure 2.2: Statistical analysis of stop codon frequencies (a) and Gly/Pro substitutions (b) for<br />
RgDAAO. The random mutagenesis methods enclosed in the black rectangles (epPCR (Taq (MnCl2,<br />
G=A=C=T)) and epPCR (Taq (MnCl2, G=A, C=T))) are used for the MAP 2.0 3D analysis.<br />
In Figure 2.2, the values of the stop codon indicator (Iα→st) and the Gly/Pro<br />
indicator (Iα→gp) are reported for different random mutagenesis methods. The methods<br />
show opposite trends in the generation of stop codons (sequence truncation) and Gly/Pro<br />
(α-helix destabilizers), i.e. the higher the stop codon frequency, the lower the frequency of<br />
Gly/Pro substitutions and vice versa.[17] The two epPCR methods (indicated in Figure 2.2 by<br />
the black rectangles) were found to be more appropriate for RgDAAO, with balanced<br />
frequencies of stop codons and Gly/Pro, than the other mutagenesis methods.<br />
Table 2.1 summarizes the sequence-based analysis of the server for the selected random<br />
mutagenesis methods. The first method, the balanced epPCR Taq-Pol (Mn2+, balanced<br />
dNTP)[44], has a strong preference for a specific nucleotide exchange, ~32 % AT → GC<br />
(transition mutations), whereas the second method, the unbalanced epPCR Taq-Pol (Mn2+,<br />
unbalanced dNTP)[45], is expected to produce more transversion (21.41 % AT → TA) than<br />
transition (14.45 % AT → GC) mutations. Balanced epPCR was expected to generate a lower<br />
fraction of stop codons (Iα→st = 2.30 %) and a higher Gly/Pro content (Iα→gp = 20.58 %) than<br />
unbalanced epPCR (Iα→st = 4.38 % and Iα→gp = 13.23 %) (see Table 2.1). For both<br />
methods, an average of 7.4 amino acid substitutions per residue was calculated.<br />
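The distinction between transition and transversion exchanges used above can be made explicit in code. The following Python sketch is an illustration only and not the implementation of the MAP 2.0 3D server. It assumes that a base-pair notation such as AT → GC denotes the exchange of the first base (A → G) together with its complement (T → C); the function names are hypothetical.<br />

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def classify_base_exchange(ref: str, alt: str) -> str:
    """Classify a single-base exchange as transition or transversion."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("expected two different bases out of A, C, G, T")
    exchanged = {ref, alt}
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    if exchanged <= PURINES or exchanged <= PYRIMIDINES:
        return "transition"
    return "transversion"


def classify_pair_exchange(ref_pair: str, alt_pair: str) -> str:
    """Classify an exchange written in base-pair notation, e.g. 'AT' -> 'GC'."""
    for pair in (ref_pair, alt_pair):
        if len(pair) != 2 or COMPLEMENT.get(pair[0]) != pair[1]:
            raise ValueError(f"{pair!r} is not a complementary base pair")
    return classify_base_exchange(ref_pair[0], alt_pair[0])


print(classify_pair_exchange("AT", "GC"))  # exchange favored by balanced epPCR
print(classify_pair_exchange("AT", "TA"))  # exchange favored by unbalanced epPCR
```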
In Figure 2.3, cartoon representations of the RgDAAO crystallographic structure,<br />
colored according to Iα→gp using the Jmol[46] visualization feature of the new server, are<br />
shown. Of the 30 % of residues involved in helix formation, 51 % have a higher Iα→gp<br />
value (if α is S, L, E or D), with a prevalence of negatively charged residues (E and D,<br />
highlighted in stick format in Figure 2.3). In comparison to the unbalanced epPCR, the<br />
balanced epPCR method showed a higher probability of substituting charged residues<br />
with Gly/Pro (represented by the color code used to define the amino acid<br />
substitution probability in Figure 2.3). The mapping of charged amino acid substitution<br />
patterns on the structure of RgDAAO is reported in Figure 2.4 and is consistent<br />
with this observation on the Gly/Pro substitution patterns. The balanced epPCR (Figure<br />
2.4a) shows a lower probability of charged amino acid substitutions than the unbalanced epPCR<br />
(Figure 2.4b), which is opposite to the Gly/Pro substitution patterns of the two<br />
methods (Figure 2.3). Hence, the substitution of charged residues with residues<br />
unfavorable for forming molecular interactions results in destabilization of the protein. For<br />
example, charged residues were found to be involved in molecular interactions such as salt<br />
bridges (15 out of 21 show a probability of more than 0.5 of being substituted with glycine) and side<br />
chain H-bonds (5 out of 26 with a probability of more than 0.5 for glycine substitution). In<br />
Figure 2.5, charged residues involved in salt bridge formation are reported with the amino acid<br />
diversity generated by the balanced (a1 and a2) and unbalanced (b1 and b2) epPCR methods.<br />
The balanced epPCR method shows lower probabilities of substitution into<br />
charged residues than the unbalanced epPCR method. The unbalanced<br />
epPCR is less transition biased (AT → GC), which results in a higher probability of substitutions<br />
into charged residues (for E coded by GAG and D coded by GAC, a transition mutation often<br />
leads to a substitution into glycine (GGG, GGC)). These effects of the mutational<br />
preferences of mutagenesis methods might be minimized by codon optimization, for example by<br />
using the GAA codon for E and the GAT codon for D.<br />
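The codon-level argument above (a transition in GAG or GAC can yield a glycine codon) can be reproduced by enumerating the single-transition neighbours of a codon. The Python sketch below is an illustration under stated assumptions and not the algorithm of the MAP 2.0 3D server: it uses the standard genetic code, ignores the method-specific exchange probabilities, and the function name is hypothetical.<br />

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Transitions exchange purine with purine (A <-> G) or pyrimidine with pyrimidine (C <-> T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def transition_neighbours(codon: str) -> dict:
    """Map each codon reachable by one transition to the amino acid it encodes."""
    neighbours = {}
    for position, base in enumerate(codon):
        mutant = codon[:position] + TRANSITION[base] + codon[position + 1:]
        neighbours[mutant] = CODON_TABLE[mutant]
    return neighbours


for codon in ("GAG", "GAC"):
    print(codon, CODON_TABLE[codon], "->", transition_neighbours(codon))
```

For both codons, the transition at the second position gives a glycine codon (GGG or GGC), which is the case discussed in the text.<br />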
Figure 2.3: Gly/Pro amino acid substitution mapping on the RgDAAO structure for (a) epPCR<br />
(Taq (MnCl2, G=A=C=T)) and (b) epPCR (Taq (MnCl2, G=A, C=T)). For the balanced epPCR<br />
method (a), the red colored regions of the RgDAAO structure indicate an overall higher<br />
probability of substituting charged residues, mainly negatively charged residues (in<br />
stick representation), with Gly/Pro than for the unbalanced epPCR (b).<br />
Figure 2.4: Amino acid substitution mapping of charged residues (E, D, R, K, H) on RgDAAO<br />
for (a) epPCR (Taq (MnCl2, G=A=C=T)) and (b) epPCR (Taq (MnCl2, G=A, C=T)).<br />
Figure 2.5: Chemical diversity and mutability of the charged amino acid positions (E, D, R, K, H) of<br />
D-amino acid oxidase that are involved in salt bridge formation, (a1) and (a2) for epPCR (Taq<br />
(MnCl2, G=A=C=T)) and (b1) and (b2) for epPCR (Taq (MnCl2, G=A, C=T)). The Y-axis shows the residue (i)<br />
sequence id, (ii) PDB id, (iii) residue name, (iv) secondary structure element (H: alpha helix; B:<br />
beta bridge and extended strand; T: hydrogen bonded turn and bend; *: loop or irregular structure)<br />
and (v) amino acid category according to the chemical property of its side chain (P: charged, Y:<br />
neutral, C: aromatic and B: aliphatic), with stop codon (R) and Gly/Pro (G) as separate classes.<br />
Using the new focused analysis feature of the server, the amino acid substitution<br />
patterns for the active site residues (Y223, Y238 and R285) were also evaluated. Y223 and<br />
Y238 are involved in substrate binding and product release, while R285 (arginine) forms a pair<br />
with the carboxylate portion of the substrate in RgDAAO.[37] R285 has a very low<br />
residue mutability indicator (µ(285) < 0.3), i.e. a low probability of a substitution leading to an<br />
amino acid change, for both methods. Y223 and Y238 have µ(223/238) = 0.9 and therefore<br />
a higher probability of being substituted by another amino acid. For the balanced<br />
epPCR, Y223 and Y238 are preferentially substituted by charged (δ(223/238)Y→ch = 0.37)<br />
and neutral (δ(223/238)Y→ne = 0.46) amino acids. In the unbalanced epPCR, the chemical<br />
diversity at Y223/238 is better preserved (δ(223/238)Y→ne = 0.44 and δ(223/238)Y→ar =<br />
0.31). The tendency to substitute active site aromatic residues with chemically<br />
different amino acids might increase the number of inactive clones in the mutant<br />
library. In summary, MAP 2.0 3D provides a qualitative indication that the balanced epPCR<br />
method might be less beneficial (or of lower quality) than the unbalanced one in the<br />
directed evolution of RgDAAO.<br />
RgDAAO directed evolution<br />
In a directed evolution study by Pollegioni et al.[47], the substrate specificity of<br />
RgDAAO was altered to make it suitable as a biosensor for the analytical determination of D-amino<br />
acids in biological samples. Two rounds of directed evolution were performed employing<br />
epPCR mutant libraries (balanced dNTP), followed by another round of directed evolution<br />
employing epPCR (unbalanced dNTP) for diversity generation. In the first round (1st set of<br />
epPCR: balanced), 91 % of the clones and in the second round (2nd set of epPCR: unbalanced), 63 %<br />
of the clones were reported to be inactive. The results of these experiments are in agreement<br />
with the predictions of the MAP 2.0 3D server: the mutational preferences of the balanced<br />
method induce more structurally destabilizing substitutions and resulted in a higher number<br />
of inactive clones than the unbalanced epPCR. In addition, the MAP 2.0 3D analysis suggests that most<br />
of the inactive clones result from substitutions into Gly/Pro (destabilizing amino<br />
acids), which can destabilize the secondary structure of a helix or weaken intramolecular<br />
interactions.<br />
The best variant obtained from the experiments was the triple mutant (T60A Q144R<br />
K152E) with broader substrate specificity. The amino acid substitution patterns calculated by<br />
the MAP 2.0 3D server at these positions were also in agreement with the experimental<br />
results. All mutated positions were assigned a high residue mutability value by MAP 2.0 3D<br />
(µ(60/144/152) > 0.8), i.e. they lie within mutagenic hotspots generated by the mutagenesis<br />
methods. The Q144R substitution was identified in the first round with balanced epPCR. Q144<br />
has a high probability of being substituted by a charged residue (δ(144)Q→ch = 0.67), and<br />
experimentally the Q144R substitution (φ(144)Q→R = 0.58) was found. In the second round of<br />
random mutagenesis with unbalanced epPCR, the T60A (φ(60)T→A = 0.58) and K152E<br />
(φ(152)K→E = 0.36) substitutions were obtained. These residues have a high preference for<br />
substitution by aliphatic (δ(60)T→al = 0.6) and charged (δ(152)K→ch = 0.7) residues,<br />
respectively.<br />
In summary, the RgDAAO case illustrates how the MAP 2.0 3D server can be used to<br />
develop efficient mutagenesis strategies before and during directed evolution<br />
experiments, for instance by selecting the mutagenesis method that is most efficient for the<br />
target gene and has the least unfavorable effects on the protein structure or function, and by<br />
codon engineering. In this way, the gene can be synthesized prior to the directed evolution<br />
experiment to reduce highly destabilizing substitutions at key amino acid positions.<br />
2.4.2. Phytase<br />
Phytases are a class of phosphatase enzymes that catalyse the hydrolysis of phytic<br />
acid (myo-inositol hexakisphosphate) to release inorganic phosphorus in a usable form.<br />
Phytases have been used as a feed supplement for decades.[48] The industrial feed pelleting<br />
process in which phytases are applied involves high temperatures. For this reason, directed<br />
evolution methods have been used to increase the thermal resistance of phytases while<br />
maintaining high activity at ambient temperature.[49]<br />
MAP 2.0 3D analysis<br />
A MAP 2.0 3D analysis was performed on the phytase appA2 (the full analysis is given on the<br />
MAP 2.0 3D server as an example). In comparison to the other 18 random mutagenesis methods,<br />
epPCR Taq (+, G=A, C=T) was found to be the preferred choice for the directed evolution of<br />
appA2. As reported in Table 2.1, the sequence-based MAP 2.0 3D analysis shows a stop codon<br />
frequency of Iα→st = 4.69 % and a frequency of substitutions into Gly/Pro of Iα→gp = 11.60 %. An<br />
average of 7.45 amino acid substitutions per residue was calculated. The value of the codon<br />
diversity coefficient was 36.49, which resulted in Iα→pr = 25.40 % preserved amino acid<br />
substitutions. Charged (19.21 %) and aromatic (5.79 %) residues were overrepresented, with<br />
deviations of 1.39 % and 0.91 % from their share in the target sequence, respectively. The aliphatic<br />
(39.35 %) and neutral (35.65 %) residues were underrepresented, with deviations of -2.86 % and<br />
-4.14 %, respectively.<br />
When the structural data are used, a different conclusion emerges than from the<br />
sequence analysis alone. One rule of thumb for enhancing the thermostability of<br />
an enzyme is to increase the number of charged residues in the loop regions at the protein<br />
surface. Reducing the mobility of these flexible regions by strengthening them with<br />
electrostatic and hydrogen bonding interactions usually improves the thermal<br />
stability.[50] Hence, the amino acid substitution patterns of charged residues were<br />
analyzed using the residue mutability indicator, the normalized B-factors (B´) as a residue<br />
flexibility indicator and the relative solvent accessibility (RSA) to differentiate exposed and<br />
buried residues. In Figure 2.6, the mapping of the amino acid substitution patterns generated<br />
by epPCR Taq (+, G=A, C=T) on the phytase appA2 is reported for the different amino acid<br />
substitution classes (charged, neutral, aromatic and aliphatic), stop codons and Gly/Pro, with<br />
charged residues shown in stick representation. A high probability of substituting charged<br />
residues with Gly/Pro, aliphatic and neutral residues was observed in the<br />
MAP 2.0 3D analysis. In Figure 2.7, detailed information on the amino acid substitution<br />
patterns of charged residues is reported together with three MAP 2.0 3D structural indicators for the<br />
epPCR Taq (+, G=A, C=T) method. The experimentally determined mutations are<br />
highlighted with black rectangles in Figure 2.7. Most of the charged residues had a<br />
mutability value µ > 0.6, i.e. a high probability of being substituted by another amino<br />
acid. Figure 2.7 also shows high probabilities of substituting charged residues<br />
with glycine or proline (alpha helix destabilizers) and with aliphatic and neutral residues (less<br />
favorable for improving thermostability).<br />
Figure 2.6: MAP 2.0 3D analysis of the amino acid substitution probability of phytase appA2 after being<br />
subjected to epPCR (Taq (MnCl2, G=A, C=T)) in cartoon representation; charged residues (D, E, H, K,<br />
R) are shown in stick representation. The probability values increase from blue (lowest probability) to<br />
red (highest probability). Amino acids were grouped according to the chemical nature of their side<br />
chain: charged (c), neutral (d), aromatic (e) or aliphatic (f), with sequence interrupting (stop codons<br />
(a)) and structure destabilizing amino acids (glycine and proline (b)).<br />
Phytase directed evolution<br />
In one example, Kim et al. performed directed evolution on phytase appA2 from E.<br />
coli to generate variants with increased thermostability by using epPCR with unbalanced<br />
dNTPs.[51] Two variants (K46E and K65E K97M S209G) with 20 % improved<br />
thermostability were found after screening 5000 clones. Three of the four mutated positions<br />
were lysine residues, i.e. the substitutions occurred at charged residues.<br />
The MAP 2.0 3D analysis of the amino acid substitution patterns for these positions was<br />
in agreement with the experimental findings, with all four positions having a high mutability<br />
indicator value (µ > 0.8) and a high relative solvent accessibility (RSA > 0.4). Furthermore, all<br />
lysine residues at the mutated positions have a probability of undergoing a nucleotide exchange that<br />
results in a stop codon. K46 and K97 have the same substitution preference for a stop codon<br />
(δ(46/97)K→st = 0.24) but differ for charged<br />
(δ(46)K→ch = 0.40; φ(46)K→E = 0.16) and neutral residues (δ(97)K→ne = 0.35; φ(97)K→M =<br />
0.24). K65 has different substitution values for residues with<br />
aliphatic (δ(65)K→al = 0.18), charged (δ(65)K→ch = 0.36; φ(65)K→E = 0.12) and neutral<br />
(δ(65)K→ne = 0.27) side chains, and δ(65)K→st = 0.18 for a stop codon. S209 has a high<br />
probability of preserving the chemical property of its side chain, with a high preference for<br />
neutral substitutions (δ(209)S→ne = 0.60). The substitution of S209 by glycine alone has a<br />
probability of φ(209)S→G = 0.24. The mutations generated with the epPCR Taq (+, G=A,<br />
C=T) mutagenesis method experimentally resulted in only 20 % active clones in the library,<br />
and only 80 were found to be improved in thermal stability. Phytase appA2 has a high helical<br />
content (42 %), and substitutions into Gly/Pro residues might reduce thermal stability by<br />
destabilizing the structure and increase the number of inactive clones. In general,<br />
substitutions of charged residues by aliphatic or neutral residues are less favorable<br />
for improving thermal stability.<br />
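The statement that every lysine position can reach a stop codon by a nucleotide exchange follows directly from the genetic code. The Python sketch below is an illustration only and not the MAP 2.0 3D implementation: it uses the standard genetic code, treats all single-nucleotide exchanges as possible without weighting them by the mutational spectrum of the method, and the function names are hypothetical.<br />

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_substitution_neighbours(codon: str) -> list:
    """Return the nine codons that differ from `codon` at exactly one position."""
    return [
        codon[:position] + base + codon[position + 1:]
        for position in range(3)
        for base in BASES
        if base != codon[position]
    ]


def stop_neighbours(codon: str) -> list:
    """Return the stop codons reachable from `codon` by one nucleotide exchange."""
    return [m for m in single_substitution_neighbours(codon) if CODON_TABLE[m] == "*"]


# Both lysine codons have one stop codon among their single-exchange neighbours.
for codon in ("AAA", "AAG"):
    print(codon, CODON_TABLE[codon], "->", stop_neighbours(codon))
```

In both cases the stop codon arises from an A → T exchange at the first codon position (AAA → TAA, AAG → TAG).<br />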
Figure 2.7: Amino acid substitution patterns for charged residues in phytase with the<br />
parameters residue mutability, residue flexibility and relative solvent accessibility of amino<br />
acids. The experimentally determined mutations are highlighted in black boxes. The Y-axis shows<br />
sequence id, PDB id, amino acid name and (a) secondary structure elements (H: alpha helix; B:<br />
beta bridge and extended strand; T: hydrogen bonded turn and bend; *: loop or irregular structure),<br />
(b) normalized Cα B-factor to differentiate between flexible (F) and rigid (R) residues and (c) relative<br />
solvent accessibility to identify exposed (E) or buried (B) residues.<br />
2.4.3. N-acetylneuraminic acid aldolase<br />
N-acetylneuraminic acid aldolase (Neu5Ac aldolase) catalyses the aldol<br />
condensation of N-acetyl-D-mannosamine and pyruvate to give N-acetyl-D-neuraminic acid<br />
(D-sialic acid).[52] Neu5Ac aldolase is used in the synthesis of sialic acid, a complex sugar<br />
with many pharmaceutical applications.<br />
MAP 2.0 3D analysis<br />
Based on the sequence-based analysis of the MAP 2.0 3D server, the balanced epPCR<br />
method (Taq (MnCl2, G=A=C=T)) was found suitable for the directed evolution of Neu5Ac<br />
aldolase (summarized in Table 2.1). For this method, the value of the codon diversity<br />
coefficient was 43.12, which resulted in Iα→pr = 28.47 % preserved amino acid<br />
substitutions, with an average of 7.20 amino acid substitutions per residue. The frequency of<br />
stop codons was Iα→st = 2.12 % and that of Gly/Pro substitutions was Iα→gp = 16.26 %.<br />
The structure-based analysis was focused on the active site residues (A11, S47, T48, Y137,<br />
I139, K165, T167, G189, Y190) using the new option of the MAP 2.0 3D server to restrict the<br />
analysis to selected amino acids. Figure 2.8 shows the expected amino acid substitutions<br />
for the active site residues and, highlighted in boxes, the experimentally determined mutation<br />
positions[52] (G70, T84, Y98, F115, V251, Q282). With the exception of A11 and<br />
G189, the active site residues have a residue mutability value μ > 0.6. The values of<br />
the RSA and B´ indicate that A11 and G189 are buried in the protein active site and highly<br />
rigid. The residue I139, another aliphatic active site residue, shows a moderately<br />
high preference for substitution by neutral amino acids (δ(139)I→ne = 0.36). The active site<br />
residues Y137 and Y190 have a high residue mutability value (µ(137/190) = 0.94) and<br />
are substituted by charged (δ(137/190)Y→ch = 0.37) or neutral (δ(137/190)Y→ne = 0.46) amino<br />
acids. S47 has a mutability value µ = 0.88, with a probability φ(47)S→G = 0.6 of being<br />
substituted by glycine. K165 (µ(165) = 0.73) has a preference for substitution by charged<br />
residues (φ(165)K→R/K/E = 0.26).<br />
Figure 2.8: Amino acid substitution patterns for active site residues (A11, S47, T48, Y137, I139,<br />
K165, T167, G189, Y190) of Neu5Ac aldolase and experimentally determined mutations (1st<br />
generation: Y98H, F115L; 2nd generation: V251I; 3rd generation: G70A, T84S, Q282L) in the boxes<br />
for the random mutagenesis method epPCR (Taq (MnCl2, G=A=C=T)). Y-axis representations are the same<br />
as described in Figure 2.7.<br />
Figures 2.9 and 2.10 show the analysis of hydrophobic contacts and hydrogen bonds,<br />
respectively, for active site residues and experimentally determined mutation positions.<br />
The results of the analysis strongly suggest an involvement of A11 in hydrophobic<br />
interactions with Y43 or I206 (see Figure 2.9) and in side chain hydrogen bond formation<br />
with Y43, G207 or N211 (see Figure 2.10). In short, the substitution spectra analysis of the<br />
active site residues (A11, S47, T48, Y137, I139, K165, T167, G189, Y190) indicates that the<br />
chemical environment of the active site is not substantially modified by the epPCR<br />
random mutagenesis method.<br />
Figure 2.9: Amino acid substitution patterns for active site residues (A11, Y137, I139) of Neu5Ac<br />
aldolase and mutations (1st generation: Y98H and F115L; highlighted in the box frames) involved in<br />
hydrophobic interactions. Panels (a) and (b) show the interaction partners for the hydrophobic<br />
interactions. Y-axis representations are the same as described in Figure 2.5.<br />
Figure 2.10: Amino acid substitution patterns for active site residues (A11, S47, T48, Y137) of<br />
Neu5Ac aldolase and the mutation (1st generation: Y98H; highlighted in a black box) involved in side<br />
chain hydrogen bonds. Panels (a) and (b) show the interaction partners for the side chain hydrogen<br />
bonds. Y-axis representations are the same as described in Figure 2.5.<br />
Neu5Ac Aldolase directed evolution<br />
Neu5Ac aldolase was engineered by Wada et al.[52] for a complete reversal of<br />
enantioselectivity by applying epPCR with balanced dNTPs. Three rounds of random<br />
mutagenesis resulted in a variant more effective toward both D/L enantiomeric substrates<br />
(3-deoxy-L/D-manno-2-octulosonic acid). The two mutation positions from the first round of<br />
random mutagenesis (Y98H and F115L, see Figure 2.8) were found to be involved in<br />
hydrophobic interactions (Figure 2.9, from Y98 to L63/A67/F100 and from F115 to<br />
L147/L155) and side chain hydrogen bond formation (Figure 2.10, from E64 to Y98) in the<br />
wild type. Y98 and F115 lie outside the active site, are partially exposed to the<br />
solvent (relative solvent accessibility RSA = 0.26), moderately flexible<br />
(normalized B-factor B´ = 0.91) and “variable” in their amino acid substitutions, with a residue<br />
mutability indicator µ(98/115) = 0.75. The substitutions at these positions into<br />
comparatively more hydrophobic residues resulted in increased activity relative to the wild type<br />
aldolase. In MAP 2.0 3D, Y98 is preferably preserved or substituted by charged (δ(98)Y→ch =<br />
0.27; φ(98)Y→H = 0.25) or neutral (δ(98)Y→ne = 0.33) residues, while F115 shows a slightly<br />
higher preference for aliphatic substitution (δ(115)F→al = 0.42; φ(115)F→L = 0.33) and<br />
cannot be substituted by charged residues (δ(115)F→ch = 0.00). The substitution at V251<br />
was obtained in the second round of the directed evolution experiment and resulted in a<br />
partially inverted enantiomeric preference of the enzyme. In the MAP 2.0 3D analysis, this position<br />
was found to be more conserved (µ(251) = 0.54) or to be substituted by more<br />
hydrophobic residues. The third generation mutations (G70A, T84S, Q282L) resulted in a<br />
complete reversal of the enzymatic enantioselectivity for use in the synthesis of both D- and<br />
L-sugars. G70 has a high flexibility (B´ = 2.59) and a low probability of being substituted by an<br />
aliphatic residue (δ(70)G→al = 0.10; φ(70)G→A = 0.03). T84 is part of a turn with a high<br />
flexibility (B´ = 0.77) and exposure to the solvent (RSA = 0.25), with a residue mutability<br />
µ(84) = 0.87. T84 has a high preference for being substituted by aliphatic residues<br />
(δ(84)T→al = 0.58) and, to a lesser extent, by a “neutral” amino acid (δ(84)T→ne = 0.34;<br />
φ(84)T→S = 0.13). Q282 is part of a helix and is rigid (B´ = -1.01) but partially exposed to the<br />
solvent (RSA = 0.23). Q282 has a high preference for being substituted by a charged residue<br />
(δ(282)Q→ch = 0.67) and a very low preference for aliphatic (δ(282)Q→al = 0.13; φ(282)Q→L = 0.13) or<br />
neutral (δ(282)Q→ne = 0.08) substitution.<br />
In the case of the aldolase, the MAP 2.0 3D analysis also shows good agreement with the<br />
experimental results. The variability in the amino acid substitution patterns of the active site<br />
residues allowed more sequence space to be explored for the catalytic activity of the enzyme and<br />
resulted in a high fraction of beneficial mutations in the first generation.<br />
2.5. Conclusions<br />
In this manuscript, we introduced the MAP 2.0 3D server and its use to assist the design of<br />
directed evolution experiments. MAP 2.0 3D correlates the traditional sequence-based MAP<br />
indicators with the structural information of the target protein. The combined information<br />
can improve the chances of finding functional and stable enzyme variants. MAP 2.0 3D<br />
helps to guide directed evolution experiments by focusing the analysis on a set of<br />
residues that are important for the specific enhancement of enzymatic properties, for example<br />
improving substrate specificity by targeting residues located in or near the active site, or<br />
enhancing the thermal stability or water solubility of proteins by increasing the number of<br />
charged amino acid substitutions. The new structure-oriented features of the MAP 2.0 3D<br />
server have been applied to the analysis of three different proteins (phytase, oxidase and<br />
aldolase), and the predicted results were compared with the experimental results. The<br />
results of the RgDAAO analysis indicate that selecting the random mutagenesis method<br />
by pre-screening the generated library can help to elucidate the effects of mutational<br />
bias on the structural environment of the protein and how these effects can be optimized.<br />
The analyses of phytase and Neu5Ac aldolase illustrate how the structural analysis<br />
features included in the MAP 2.0 3D server can now help to correlate the effect of mutational<br />
biases with the protein structural environment and to evolve a desired property. In this way,<br />
the MAP 2.0 3D server facilitates the ‘in-silico’ pre-screening of the target gene and can also<br />
promote an increase of the active population in random mutagenesis libraries, thereby<br />
decreasing screening efforts and increasing the probability of obtaining desirable mutations<br />
even in a small mutant library.<br />
2.6. References<br />
1. Bornscheuer UT, Pohl M (2001) Improved biocatalysts by directed evolution <strong>and</strong><br />
rational protein design. Curr Opin Chem Biol 5: 137-143.<br />
2. Brakmann S (2001) Discovery <strong>of</strong> superior enzymes by directed molecular evolution.<br />
Chembiochem 2: 865-871.<br />
3. Wong TS, Arnold FH, Schwaneberg U (2004) Laboratory evolution <strong>of</strong> cytochrome<br />
p450 BM-3 monooxygenase for organic cosolvents. Biotechnol Bioeng 85: 351-358.<br />
4. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward underst<strong>and</strong>ing<br />
the inactivation mechanism <strong>of</strong> monooxygenase P450 BM-3 by organic cosolvents: a<br />
molecular dynamics simulation study. Biopolymers 83: 467-476.<br />
5. Wong TS, Zhurina D, Schwaneberg U (2006) The diversity challenge in directed<br />
protein evolution. Comb Chem High T Scr 9: 271-288.<br />
6. Wong TS, Roccatano D, Schwaneberg U (2007) Steering directed protein evolution:<br />
strategies to manage combinatorial complexity <strong>of</strong> mutant libraries. Environ<br />
Microbiol 9: 2645-2659.<br />
7. Smith JM (1970) Natural Selection <strong>and</strong> Concept <strong>of</strong> a Protein Space. Nature 225: 563-<br />
564.<br />
8. Olsen M, Iverson B, Georgiou G (2000) High-throughput screening <strong>of</strong> enzyme<br />
libraries. Curr Opin Biotech 11: 331-337.<br />
9. Tawfik DS, Bershtein S (2008) Advances in laboratory evolution <strong>of</strong> enzymes. Curr<br />
Opin Chem Biol 12: 151-158.<br />
10. Shivange AV, Marienhagen J, Mundhada H, Schenk A, Schwaneberg U (2009)<br />
Advances in generating functional diversity for directed protein evolution. Curr<br />
Opin Chem Biol 13: 19-25.<br />
66
PART I: MAP 2.0 3D<br />
11. Turner NJ (2009) Directed evolution drives the next generation <strong>of</strong> biocatalysts. Nat<br />
Chem Biol 5: 567-573.<br />
12. Wong TS, Roccatano D, Loakes D, Tee KL, Schenk A, et al. (2008) Transversionenriched<br />
sequence saturation mutagenesis (SeSaM-Tv+): a r<strong>and</strong>om mutagenesis<br />
method with consecutive nucleotide exchanges that complements the bias <strong>of</strong> errorprone<br />
PCR. Biotechnol J 3: 74-82.<br />
13. Dennig A, Shivange AV, Marienhagen J, Schwaneberg U (2011) OmniChange: The<br />
Sequence Independent Method for Simultaneous Site-Saturation <strong>of</strong> Five Codons.<br />
PLoS ONE 6: e26222.<br />
14. Chica RA, Doucet N, Pelletier JN (2005) Semi-rational approaches to engineering<br />
enzyme activity: combining the benefits <strong>of</strong> directed evolution <strong>and</strong> rational design.<br />
Curr Opin Biotech 16: 378-384.<br />
15. Zumarraga M, Camarero S, Shleev S, Martinez-Arias A, Ballesteros A, et al. (2008)<br />
Altering the laccase functionality by in vivo assembly <strong>of</strong> mutant libraries with<br />
different mutational spectra. Proteins 71: 250-260.<br />
16. Vanhercke T, Ampe C, Tirry L, Denolf P (2005) Reducing mutational bias in r<strong>and</strong>om<br />
protein libraries. Anal Biochem 339: 9-14.<br />
17. Wong TS, Roccatano D, Zacharias M, Schwaneberg U (2006) A statistical analysis <strong>of</strong><br />
r<strong>and</strong>om mutagenesis methods used for directed protein evolution. J Mol Biol 355:<br />
858-871.<br />
18. Wong TS, Roccatano D, Schwaneberg U (2007) Are transversion mutations better? A<br />
Mutagenesis Assistant Program analysis on P450 BM-3 heme domain. Biotechnol J<br />
2: 133-142.<br />
19. Rasila TS, Pajunen MI, Savilahti H (2009) Critical evaluation <strong>of</strong> r<strong>and</strong>om mutagenesis<br />
by error-prone polymerase chain reaction protocols, Escherichia coli mutator strain,<br />
<strong>and</strong> hydroxylamine treatment. Anal Biochem 388: 71-80.<br />
20. Ditursi MK, Kwon SJ, Reeder PJ, Dordick JS (2006) Bioinformatics-driven, rational<br />
engineering <strong>of</strong> protein thermostability. Protein Eng Des Sel 19: 517-524.<br />
21. Shoichet BK, Beadle BM (2002) Structural bases <strong>of</strong> stability-function trade<strong>of</strong>fs in<br />
enzymes. J Mol Bio 321: 285-296.<br />
67
PART I: MAP 2.0 3D<br />
22. Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data<br />
Bank. Nat Struct Biol 10: 980-980.<br />
23. Zhang H, Zhang T, Chen K, Shen S, Ruan J, et al. (2009) On the relation between<br />
residue flexibility <strong>and</strong> local solvent accessibility in proteins. Proteins 76: 617-636.<br />
24. Teilum K, Olsen JG, Kragelund BB (2009) Functional aspects <strong>of</strong> protein flexibility.<br />
Cell Mol Life Sci 66: 2231-2247.<br />
25. Gromiha MM, Oobatake M, Kono H, Uedaira H, Sarai A (1999) Role <strong>of</strong> structural <strong>and</strong><br />
sequence information in the prediction <strong>of</strong> protein stability changes: comparison<br />
between buried <strong>and</strong> partially buried mutations. Protein Eng Des Sel 12: 549-555.<br />
26. Gromiha MM, Selvaraj S (2004) Inter-residue interactions in protein folding <strong>and</strong><br />
stability. Prog Biophys Mol Bio 86: 235-277.<br />
27. Kabsch W, S<strong>and</strong>er C (1983) Dictionary <strong>of</strong> protein secondary structure: pattern<br />
recognition <strong>of</strong> hydrogen-bonded <strong>and</strong> geometrical features. Biopolymers 22: 2577-<br />
2637.<br />
28. Chothia C (1976) The nature <strong>of</strong> the accessible <strong>and</strong> buried surfaces in proteins. J Mol<br />
Biol 105: 1-12.<br />
29. Peisajovich SG, Tawfik DS (2007) Protein engineers turned evolutionists. Nat<br />
Methods 4: 991-994.<br />
30. Karplus PA, Schulz GE (1985) Prediction <strong>of</strong> Chain Flexibility in Proteins - a Tool for<br />
the Selection <strong>of</strong> Peptide Antigens. Naturwissenschaften 72: 212-213.<br />
31. Yuan Z, Zhao J, Wang ZX (2003) Flexibility analysis <strong>of</strong> enzyme active sites by<br />
crystallographic temperature factors. Protein Eng Des Sel 16: 109-114.<br />
32. Kumar S, Nussinov R (2002) Close-range electrostatic interactions in proteins.<br />
Chembiochem 3: 604-617.<br />
33. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic<br />
character <strong>of</strong> a protein. J Mol Biol 157: 105-132.<br />
34. Burley SK, Petsko GA (1985) Aromatic-aromatic interaction: a mechanism <strong>of</strong> protein<br />
structure stabilization. Science 229: 23-28.<br />
35. Overington J, Johnson MS, Sali A, Blundell TL (1990) Tertiary structural constraints<br />
on protein evolutionary diversity: templates, key residues <strong>and</strong> structure prediction.<br />
P Roy Soc Lond B Bio 241: 132-145.<br />
68
PART I: MAP 2.0 3D<br />
36. Liao GJ, Lee YJ, Lee YH, Chen LL, Chu WS (1998) Structure <strong>and</strong> expression <strong>of</strong> the D-<br />
amino-acid oxidase gene from the yeast Rhodosporidium toruloides. Biotechnol<br />
Appl Bioc 27 ( Pt 1): 55-61.<br />
37. Pollegioni L, Diederichs K, Molla G, Umhau S, Welte W, et al. (2002) Yeast D-amino<br />
acid oxidase: structural basis <strong>of</strong> its catalytic properties. J Mol Biol 324: 535-546.<br />
38. Rodriguez E, Han Y, Lei XG (1999) Cloning, sequencing, <strong>and</strong> expression <strong>of</strong> an<br />
Escherichia coli acid phosphatase/phytase gene (appA2) isolated from pig colon.<br />
Biochem Bioph Res Co 257: 117-123.<br />
39. Lim D, Golovan S, Forsberg CW, Jia Z (2000) Crystal structures <strong>of</strong> Escherichia coli<br />
phytase <strong>and</strong> its complex with phytate. Nat Struct Biol 7: 108-113.<br />
40. Ohta Y, Watanabe K, Kimura A (1985) Complete nucleotide sequence <strong>of</strong> the E. coli N-<br />
acetylneuraminate lyase. Nucleic Acids Res 13: 8843-8852.<br />
41. Izard T, Lawrence MC, Malby RL, Lilley GG, Colman PM (1994) The threedimensional<br />
structure <strong>of</strong> N-acetylneuraminate lyase from Escherichia coli. Structure<br />
2: 361-369.<br />
42. Pilone MS (2000) D-Amino acid oxidase: new findings. Cell Mol Life Sci 57: 1732-<br />
1747.<br />
43. Pollegioni L, Molla G (2011) New biotech applications from evolved D-amino acid<br />
oxidases. Trends Biotechnol 29: 276-283.<br />
44. Lin-Goerke JL, Robbins DJ, Burczak JD (1997) PCR-based r<strong>and</strong>om mutagenesis using<br />
manganese <strong>and</strong> reduced dNTP concentration. Biotechniques 23: 409-412.<br />
45. Vartanian JP, Henry M, Wain-Hobson S (1996) Hypermutagenic PCR involving all<br />
four transitions <strong>and</strong> a sizeable proportion <strong>of</strong> transversions. Nucleic Acids Res 24:<br />
2627-2631.<br />
46. Jmol: an open-source Java viewer for chemical structures in 3D.<br />
http://www.jmol.org/<br />
47. Sacchi S, Rosini E, Molla G, Pilone MS, Pollegioni L (2004) Modulating D-amino acid<br />
oxidase substrate specificity: production <strong>of</strong> an enzyme for analytical determination<br />
<strong>of</strong> all D-amino acids by directed evolution. Protein Eng Des Sel 17: 517-525.<br />
69
PART I: MAP 2.0 3D<br />
48. Rao DE, Rao KV, Reddy TP, Reddy VD (2009) Molecular characterization,<br />
physicochemical properties, known <strong>and</strong> potential applications <strong>of</strong> phytases: An<br />
overview. Crit Rev Biotechnol 29: 182-198.<br />
49. Garrett JB, Kretz KA, O'Donoghue E, Kerovuo J, Kim W, et al. (2004) Enhancing the<br />
thermal tolerance <strong>and</strong> gastric performance <strong>of</strong> a microbial phytase for use as a<br />
phosphate-mobilizing monogastric-feed supplement. Appl Environ Microb 70:<br />
3041-3046.<br />
50. Fields PA (2001) Review: Protein function at thermal extremes: balancing stability<br />
<strong>and</strong> flexibility. Comp Biochem Phys A 129: 417-431.<br />
51. Kim MS, Lei XG (2008) Enhancing thermostability <strong>of</strong> Escherichia coli phytase AppA2<br />
by error-prone PCR. Appl Microbiol Biotechnol 79: 69-75.<br />
52. Wada M, Hsu CC, Franke D, Mitchell M, Heine A, et al. (2003) Directed evolution <strong>of</strong><br />
N-acetylneuraminic acid aldolase to catalyze enantiomeric aldol reactions. Bioorgan<br />
Med Chem 11: 2091-2098.<br />
Part of this chapter is adapted with permission from ‘Verma R, Schwaneberg U,<br />
Roccatano D. ACS Synthetic Biology 2012, 1 (4), 139-150.’<br />
PART II: MD Simulation<br />
Chapter 3<br />
Introduction to Molecular Dynamics Simulation of<br />
Biomolecules<br />
This chapter provides a brief introduction to molecular dynamics (MD) simulation,<br />
followed by a description of the system preparation for MD simulation and the<br />
analysis of the generated trajectories. In the following chapters of this thesis, the same<br />
procedure is used to perform MD simulations with P450BM-3 monooxygenase as the model<br />
system and to analyze the resulting trajectories.<br />
3.1. Background<br />
Recent decades have witnessed rapid development in the field of MD simulations of<br />
biological molecules for studying their dynamic processes at the atomic level. Far from its<br />
infancy, computer simulation can nowadays provide important insight into the<br />
molecular basis of the relationships between protein structure, function and dynamics.[1-4]<br />
MD is a computational chemistry method that describes the dynamics of a molecular<br />
system by integrating Newton’s equations of motion for a system of N interacting atoms. In<br />
an MD simulation, the force acting on the i th particle (Fi) is calculated as the negative derivative<br />
of a potential energy function (V) (equation 3.1), called the force field, which describes the atomic<br />
interactions in an approximate way. In equation 3.1, ri represents the position of the i th atom.<br />
$$F_i = -\frac{\partial V(r_1, r_2, \ldots, r_N)}{\partial r_i} \qquad (3.1)$$<br />
The dynamics of the system are calculated according to Newton’s second law by<br />
numerically integrating the differential equations of motion (equation 3.2). In this way, a<br />
new set of atomic positions and velocities (vi) is generated at each successive integration<br />
time step dt.<br />
$$\frac{\partial v_i}{\partial t} = \frac{F_i}{m_i} \qquad (3.2)$$<br />
The so-called leap-frog algorithm[5] is commonly used in MD simulation to<br />
integrate the equations of motion. It updates the velocities (equation 3.3) and positions<br />
(equation 3.4) of the i th atom of mass mi using the force Fi(t) evaluated at position ri(t).<br />
$$v_i\left(t + \tfrac{1}{2}dt\right) = v_i\left(t - \tfrac{1}{2}dt\right) + \frac{dt}{m_i} F_i(t) \qquad (3.3)$$<br />
$$r_i(t + dt) = r_i(t) + dt\, v_i\left(t + \tfrac{1}{2}dt\right) \qquad (3.4)$$<br />
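As an illustration of equations 3.3 and 3.4, the following minimal Python sketch integrates a<br />
single particle in one dimension with the leap-frog scheme. The harmonic test force, the unit<br />
mass and force constant, and the function name leap_frog are assumptions made for this<br />
example only; they are not part of the simulation protocol used in this thesis.<br />

```python
import math

def leap_frog(force, mass, r0, v_half, dt, n_steps):
    """Integrate one particle in 1D with the leap-frog scheme.

    v_half is the velocity at t - dt/2. Returns the list of positions
    r(0), r(dt), ... and the last half-step velocity.
    """
    r, v = r0, v_half
    positions = [r]
    for _ in range(n_steps):
        v = v + dt / mass * force(r)   # equation 3.3: velocity at t + dt/2
        r = r + dt * v                 # equation 3.4: position at t + dt
        positions.append(r)
    return positions, v

# Hypothetical test system: harmonic oscillator with k = m = 1 (period 2*pi)
k, m, dt = 1.0, 1.0, 0.01
force = lambda r: -k * r

# Start at r(0) = 1 with v(0) = 0; estimate v(-dt/2) by stepping half a step back
v_half = 0.0 - 0.5 * dt * force(1.0) / m

n_steps = round(2 * math.pi / dt)      # roughly one oscillation period
positions, v_last = leap_frog(force, m, 1.0, v_half, dt, n_steps)
```

Because the velocities live on half steps, the starting value v(-dt/2) has to be estimated from<br />
v(0); the sketch does this with a simple half step backwards.<br />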
The simulation generates an ensemble of molecular configurations (a trajectory) that<br />
describes the evolution of the coordinates and velocities of the system as a function of time.<br />
To generate an equilibrium ensemble consistent with the experimental conditions under<br />
which the system was studied, the temperature and pressure of the simulated system are<br />
controlled and kept constant during the simulation.[3,4]<br />
Force field equation<br />
In MD simulation, the force field divides the atomic interactions into bonded and<br />
non-bonded terms. Bonded interactions include bond, angle, dihedral and improper<br />
dihedral terms, while non-bonded interactions comprise van der Waals (vdW) and<br />
electrostatic terms (equation 3.5).<br />
$$V = V_{\mathrm{bond}} + V_{\mathrm{angle}} + V_{\mathrm{dihedral}} + V_{\mathrm{improper}} + V_{\mathrm{vdW}} + V_{\mathrm{es}} \qquad (3.5)$$<br />
Bonded interactions<br />
Bond stretching between covalently bound atoms (i and j, with bond length b) is<br />
described by the covalent bond potential (Vbond) of GROMOS-96 (equation 3.6), with force<br />
constant $k^{b}_{ij}$ and equilibrium bond length b0. Angle vibrations between triplets of atoms<br />
(i, j and k) with bond angle θ are represented by a cosine-based angle potential (Vangle)<br />
(equation 3.7), with force constant $k^{\theta}_{ijk}$ and equilibrium bond angle θ0.<br />
$$V_{\mathrm{bond}} = \frac{1}{4} k^{b}_{ij} \left(b^{2} - b_0^{2}\right)^{2} \qquad (3.6)$$<br />
$$V_{\mathrm{angle}} = \frac{1}{2} k^{\theta}_{ijk} \left(\cos\theta - \cos\theta_0\right)^{2} \qquad (3.7)$$<br />
Torsional interactions (equation 3.8) involve four atoms (i, j, k and l) and define the<br />
dihedral angle φ as the angle between the two planes formed by the first three (i, j and<br />
k) and the last three (j, k and l) atoms. The torsional potential describes the interactions<br />
arising from the rotation of two functional groups connected by a bond; in equation 3.8,<br />
$k^{\phi}_{ijkl}$ is the force constant, n is the multiplicity and φ0 is the phase shift.<br />
$$V_{\mathrm{dihedral}} = k^{\phi}_{ijkl} \left(1 + \cos(n\phi - \phi_0)\right) \qquad (3.8)$$<br />
Improper dihedrals are used to maintain the planarity of four atoms (i, j, k and l)<br />
and are described by the harmonic potential in equation 3.9, where $k^{\xi}_{ijkl}$ is the force<br />
constant, ξ is the improper dihedral angle defined by the four atoms and ξ0 is its reference<br />
value, chosen to keep the atoms in a specific configuration. For example, ξ0 = 0° keeps the<br />
four atoms planar; improper dihedrals are also used to maintain a tetrahedral configuration<br />
of the four atoms.<br />
$$V_{\mathrm{improper}} = \frac{1}{2} k^{\xi}_{ijkl} \left(\xi - \xi_0\right)^{2} \qquad (3.9)$$<br />
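The four bonded terms of equations 3.6 to 3.9 can be written down directly as functions. The<br />
sketch below is only meant to make the functional forms concrete: the function names and<br />
all numerical parameters are assumed illustrative values, not the GROMOS96 43a1<br />
parameters used in this thesis, and angles are taken in radians.<br />

```python
import math

def v_bond(b, b0, kb):
    """Quartic bond-stretching term, equation 3.6."""
    return 0.25 * kb * (b**2 - b0**2) ** 2

def v_angle(theta, theta0, ktheta):
    """Cosine-based angle term, equation 3.7 (angles in radians)."""
    return 0.5 * ktheta * (math.cos(theta) - math.cos(theta0)) ** 2

def v_dihedral(phi, phi0, kphi, n):
    """Periodic torsional term, equation 3.8 (angles in radians)."""
    return kphi * (1.0 + math.cos(n * phi - phi0))

def v_improper(xi, xi0, kxi):
    """Harmonic improper dihedral term, equation 3.9 (angles in radians)."""
    return 0.5 * kxi * (xi - xi0) ** 2

# Assumed, purely illustrative geometry and force constants
e_bonded = (
    v_bond(0.155, 0.153, 7.15e6)
    + v_angle(math.radians(112.0), math.radians(109.5), 450.0)
    + v_dihedral(math.radians(60.0), 0.0, 5.9, 3)
    + v_improper(math.radians(2.0), 0.0, 167.0)
)
```

Each term vanishes at its reference geometry (b = b0, θ = θ0, ξ = ξ0), while the torsional<br />
term oscillates between 0 and twice its force constant.<br />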
Non-bonded interactions<br />
vdW interactions result from induced atomic dipoles and the excluded<br />
volumes of atom pairs. They are attractive at long range but become repulsive at<br />
short distances between the atoms of a pair. In GROMACS, they are described by the<br />
Lennard-Jones (LJ) potential (VLJ) (equation 3.10), where εij and σij are empirical<br />
parameters: ε is the depth of the potential well, σ is the distance at which the<br />
potential is zero, and rij is the distance between the i th and j th atoms.<br />
$$V_{\mathrm{LJ}} = 4\varepsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] \qquad (3.10)$$<br />
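A short numerical check helps to read equation 3.10: the potential is zero at r = σ, and<br />
setting its derivative to zero gives a minimum of depth ε at r = 2^(1/6) σ. The sketch below<br />
evaluates both points; the function name and the parameter values are assumptions chosen<br />
for illustration.<br />

```python
def v_lj(r, sigma, epsilon):
    """Lennard-Jones pair potential, equation 3.10."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Assumed illustrative parameters (nm and kJ/mol)
sigma, epsilon = 0.35, 0.5

r_min = 2.0 ** (1.0 / 6.0) * sigma      # position of the potential minimum
v_at_sigma = v_lj(sigma, sigma, epsilon)   # zero crossing
v_at_min = v_lj(r_min, sigma, epsilon)     # well depth, equal to -epsilon
```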
The electrostatic potential (Ves) describes the Coulombic interaction between two<br />
charged atoms (i and j) and is calculated with equation 3.11, where ε0 is the vacuum<br />
permittivity, εr is the relative dielectric constant, qi and qj are the atomic charges of the<br />
i th and j th atoms, and rij is the distance between them.<br />
$$V_{\mathrm{es}} = \frac{1}{4\pi\varepsilon_0} \frac{q_i q_j}{\varepsilon_r r_{ij}} \qquad (3.11)$$<br />
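Equation 3.11 is easiest to use in the unit system of the simulation package. The sketch<br />
below assumes GROMACS-style units (nm, elementary charges, kJ/mol), in which the<br />
prefactor 1/(4πε0) is about 138.935 kJ mol^-1 nm e^-2. It evaluates a single pair in a<br />
uniform dielectric and is not a substitute for the lattice-sum treatment of long-range<br />
electrostatics; the function and constant names are hypothetical.<br />

```python
# Electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
# (value as tabulated for GROMACS units; treated here as an assumption)
ONE_4PI_EPS0 = 138.935458

def v_coulomb(qi, qj, r, eps_r=1.0):
    """Coulomb pair energy of equation 3.11 (charges in e, r in nm)."""
    return ONE_4PI_EPS0 * qi * qj / (eps_r * r)

# Opposite unit charges 1 nm apart in vacuum attract with about -138.9 kJ/mol
v_pair = v_coulomb(1.0, -1.0, 1.0)
```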
The topology of a simulated biomolecule is the ordered list of all these interactions<br />
and their predefined parameters for the selected force field. In this thesis, all MD<br />
simulations were performed using the GROMOS96 43a1 force field.[4]<br />
3.2. Setup of the simulated systems<br />
The crystal structure of the protein was used as the starting coordinates for the MD<br />
simulation (in this thesis, PDB ID: 1BVY [6] is used for the simulation of the P450BM-3<br />
domains). The proteins were centered in a cubic periodic box sized so that the minimal<br />
distance between the protein and any side of the box was larger than 0.80 nm, so that<br />
the protein cannot see its periodic image across the boundary of the box. They were<br />
solvated by stacking equilibrated boxes of solvent molecules to completely fill the<br />
simulation box. All solvent molecules with any atom within 0.15 nm of a protein atom<br />
were removed. The SPC water model, a simple three-site model, was used for the water<br />
molecules.[7] Sodium counter ions were added by replacing solvent molecules at the most<br />
negative electrostatic potential to bring the total charge of the box to zero. The<br />
protonation state of the residues in the protein was assumed to be the same as that of the<br />
isolated amino acids in solution at pH 7. Hence, the water molecules closest to the charged<br />
groups of the protein structure were replaced by counter ions to neutralize the system. The<br />
LINCS (Linear Constraint Solver)[8] algorithm was used to constrain all bond lengths; it<br />
provides a stable and fast way to reset the bond lengths after an unconstrained<br />
update. The SETTLE [9] algorithm was used to constrain the solvent molecules as rigid<br />
bodies.<br />
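The box-size criterion described above can be sketched in a few lines. The example below<br />
computes the smallest cubic box edge that leaves a given margin between the solute and<br />
each box face, and shows the minimum-image mapping that underlies periodic boundary<br />
conditions. The function names and the toy coordinates are assumptions; in practice the<br />
simulation package performs these steps.<br />

```python
def cubic_box_edge(coords, min_dist=0.80):
    """Smallest cubic box edge (nm) that keeps min_dist between the
    solute and every face of the box when the solute is centered."""
    extents = []
    for axis in range(3):
        values = [atom[axis] for atom in coords]
        extents.append(max(values) - min(values))
    return max(extents) + 2.0 * min_dist

def minimum_image(delta, edge):
    """Map one component of a distance vector onto its nearest periodic image."""
    return delta - edge * round(delta / edge)

# Toy solute with a largest extent of 4.0 nm (assumed coordinates)
solute = [(0.0, 0.0, 0.0), (4.0, 1.0, 2.0)]
edge = cubic_box_edge(solute)           # 4.0 nm + 2 * 0.80 nm
wrapped = minimum_image(5.0, edge)      # a 5.0 nm separation maps to a shorter image
```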
Electrostatic interactions were calculated using the Particle Mesh Ewald<br />
method.[10] For the calculation of the long-range interactions, a grid spacing of 0.12 nm<br />
combined with fourth-order B-spline interpolation was used to compute the potential<br />
and forces between grid points. A non-bonded pair-list cutoff of 1.3 nm was used, and the<br />
pair list was updated every 5 time steps.<br />
3.3. Equilibration procedure<br />
The simulated systems were first energy minimized using the steepest descent<br />
algorithm [11] for at least 2000 steps in order to remove clashes between atoms that were<br />
too close. After energy minimization, all atoms were assigned initial velocities drawn from<br />
a Maxwell-Boltzmann distribution at 300 K to start the MD simulations.<br />
All systems were initially equilibrated by a 100 ps MD run with position restraints<br />
on the heavy atoms of the solute to allow relaxation of the solvent molecules. With position<br />
restraints, the protein atoms are held near their reference positions by a force constant in<br />
each spatial dimension while the solvent relaxes around the protein. Berendsen’s<br />
thermostat[12] was used to keep the temperature at 300 K by weakly coupling the systems<br />
to an external thermal bath with a relaxation time constant τ = 0.1 ps. The pressure of the<br />
system was kept at 1 bar using Berendsen’s barostat[12] with a time constant of 1 ps. After<br />
the equilibration procedure, the position restraints were removed and the system was<br />
gradually heated from 50 K to 300 K during 200 ps of simulation. Finally, a production run<br />
was performed at 300 K. The analysis of the trajectories was performed using the GROMACS<br />
software package (http://www.gromacs.org/).[13]<br />
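The weak-coupling idea behind Berendsen’s thermostat[12] can be illustrated with its<br />
velocity scaling factor: at each step the velocities are multiplied by λ = [1 + (dt/τ)(T0/T - 1)]^(1/2),<br />
so that the instantaneous temperature T relaxes toward the bath temperature T0 with time<br />
constant τ. The sketch below evaluates λ; the time step of 0.002 ps is an assumed value for<br />
illustration and the function name is hypothetical.<br />

```python
import math

def berendsen_lambda(t_inst, t_ref, dt, tau):
    """Velocity scaling factor of the Berendsen weak-coupling thermostat."""
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_inst - 1.0))

tau = 0.1    # ps, relaxation time constant quoted in the text
dt = 0.002   # ps, assumed integration time step

lam_cold = berendsen_lambda(150.0, 300.0, dt, tau)   # system too cold: lambda > 1
lam_hot = berendsen_lambda(450.0, 300.0, dt, tau)    # system too hot: lambda < 1
lam_eq = berendsen_lambda(300.0, 300.0, dt, tau)     # at the bath temperature
```

With dt much smaller than τ, λ stays close to 1, which is why the coupling perturbs the<br />
dynamics only weakly.<br />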
3.4. Structural and dynamical analysis<br />
The structural stability and convergence of the protein were examined by analyzing<br />
the root mean square deviation (RMSD), the radius of gyration (Rg) and the secondary<br />
structure elements with respect to the crystal structure as a function of time. The<br />
per-residue root mean square fluctuation (RMSF) was used to assess the dynamics of the<br />
target protein during the simulation.<br />
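For reference, the three quantities can be computed from coordinates as sketched below.<br />
This is a minimal illustration under simplifying assumptions: the structures are taken to be<br />
already superimposed on the reference (no fitting is done), the RMSF is evaluated per atom<br />
rather than per residue, and the function names and toy coordinates are hypothetical. The<br />
analyses in this thesis were carried out with the GROMACS tools.<br />

```python
import math

def rmsd(coords, ref):
    """RMSD between two pre-superimposed coordinate sets."""
    sq = sum((a - b) ** 2 for p, q in zip(coords, ref) for a, b in zip(p, q))
    return math.sqrt(sq / len(coords))

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the center of mass."""
    total = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    sq = sum(m * sum((p[k] - com[k]) ** 2 for k in range(3))
             for m, p in zip(masses, coords))
    return math.sqrt(sq / total)

def rmsf(trajectory):
    """Per-atom RMSF over a list of frames (each frame: list of (x, y, z))."""
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        sq = sum((frame[i][k] - mean[k]) ** 2 for frame in trajectory for k in range(3))
        result.append(math.sqrt(sq / n_frames))
    return result

# Toy data: two atoms, reference shifted by 1 along z
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(frame, reference)
```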
3.5. Cluster analysis<br />
Cluster analysis was performed to characterize the conformational diversity of the<br />
structures generated during the MD simulations, using the Gromos clustering<br />
algorithm[14] on the backbone atoms (Cα, C and N) of the protein. The<br />
analysis was performed on conformations extracted at regular time intervals from the<br />
generated trajectory. The extracted conformations were superimposed on the<br />
backbone atoms of the reference structure to remove the overall translation and rotation<br />
of the protein in space. A similarity (RMSD-distance) matrix was then computed for all<br />
pairs of selected conformers.<br />
In the Gromos clustering algorithm, an RMSD cutoff criterion is used to group similar<br />
structures: structures whose RMSD is below the defined cutoff are added to the same<br />
cluster. The structure with the smallest RMSD from the other members of the cluster is<br />
taken as the representative structure of that cluster. The same process is repeated until all<br />
the structures selected for the analysis are assigned to clusters. In this way, the large<br />
clusters represent the ensembles of frequently populated configurations in conformational<br />
space during the MD simulation.<br />
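The procedure can be sketched on a precomputed RMSD matrix. The example follows the<br />
neighbour-counting formulation in which the algorithm of Daura et al.[14] is commonly<br />
described: the structure with the most neighbours within the cutoff becomes the cluster<br />
center, it is removed from the pool together with its neighbours, and the step is repeated.<br />
The function name, the tie-breaking rule and the toy data are assumptions made for this<br />
illustration.<br />

```python
def gromos_cluster(rmsd, cutoff):
    """Cluster structures from a symmetric RMSD matrix.

    Returns a list of (center_index, sorted_member_indices), largest
    neighbourhood first. Ties are resolved in favour of the lowest index.
    """
    pool = set(range(len(rmsd)))
    clusters = []
    while pool:
        # Neighbours of each remaining structure (a structure counts itself)
        neighbours = {i: [j for j in pool if rmsd[i][j] < cutoff] for i in pool}
        center = max(sorted(pool), key=lambda i: len(neighbours[i]))
        members = neighbours[center]
        clusters.append((center, sorted(members)))
        pool -= set(members)   # assigned structures leave the pool
    return clusters

# Toy example: four "structures" placed on a line, RMSD = separation
points = [0.0, 0.1, 0.2, 1.0]
matrix = [[abs(a - b) for b in points] for a in points]
clusters = gromos_cluster(matrix, cutoff=0.15)
```

In this toy case the middle structure of the first three has the most neighbours and<br />
becomes the center of the largest cluster, while the distant fourth structure forms a<br />
cluster of its own.<br />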
3.6. Principal component analysis<br />
Principal component analysis (PCA, also called essential dynamics analysis) was<br />
performed to explore the conformational space by identifying collective motions in the<br />
biomolecules during MD simulation. PCA correlates the atomic positional fluctuations in<br />
proteins and can enhance the molecular-level understanding of protein function.[15] The<br />
covariance matrix (C) of the atomic coordinates (3N x 3N) is constructed to identify<br />
collective motions in the biomolecules.[16,17] The backbone atoms (Cα, C and N) of the<br />
target proteins were used to explore the conformational subspace in solution. For PCA,<br />
the translational and rotational motions are first eliminated from the trajectory (to<br />
consider internal motions only) by least-squares fitting of the atomic coordinates (using<br />
backbone atoms) to the crystal structure. The resulting set of atomic coordinates is used to<br />
construct the covariance matrix C of positional deviations using equation 3.12.[17]<br />
$$C = \left\langle (x - \langle x \rangle)(x - \langle x \rangle)^{T} \right\rangle \qquad (3.12)$$<br />
where x denotes the coordinates of the selected subset of atoms along the trajectory x(t) and<br />
⟨ ⟩ represents the ensemble average over time. The eigenvectors, or essential modes, are<br />
obtained by diagonalizing the symmetric matrix C with an orthogonal transformation<br />
matrix T. The displacements along the different eigenvectors were calculated by projecting<br />
the atomic coordinates onto the eigenvectors. Eigenvectors obtained from different<br />
simulations were compared using the root-mean-square inner product (RMSIP)[18], which<br />
is defined in equation 3.13.<br />
$$\mathrm{RMSIP} = \sqrt{\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{m} (v_i \cdot u_j)^{2}} \qquad (3.13)$$<br />
where vi and uj are the i th and j th eigenvectors of the m-dimensional essential<br />
subspaces of the two systems, and the normalization factor N equals the subspace<br />
dimension m. RMSIP gives a simple measure of the dynamical similarity of the two sets of<br />
eigenvectors.[18]<br />
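Equations 3.12 and 3.13 can be made concrete with a small sketch. The first function builds<br />
the covariance matrix from a list of fitted, flattened coordinate vectors; the second computes<br />
the RMSIP between two sets of orthonormal eigenvectors, normalized by the subspace<br />
dimension m. The diagonalization step itself is omitted because it would require a linear<br />
algebra library, and the function names and toy vectors are assumptions for illustration.<br />

```python
import math

def covariance(frames):
    """Covariance matrix of positional deviations (equation 3.12).

    frames: list of flattened coordinate vectors of equal length,
    assumed to be already fitted to the reference structure.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[a] for f in frames) / n for a in range(dim)]
    return [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in frames) / n
             for b in range(dim)] for a in range(dim)]

def rmsip(vecs_v, vecs_u):
    """RMSIP between two sets of m orthonormal eigenvectors (equation 3.13)."""
    m = len(vecs_v)
    total = sum(sum(a * b for a, b in zip(v, u)) ** 2
                for v in vecs_v for u in vecs_u)
    return math.sqrt(total / m)

# Toy subspaces in three dimensions (m = 2)
xy_plane = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
yz_plane = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

same = rmsip(xy_plane, xy_plane)      # identical subspaces
partial = rmsip(xy_plane, yz_plane)   # subspaces sharing one direction
```

An RMSIP of 1 indicates identical subspaces and 0 indicates orthogonal ones; the two toy<br />
planes share one of their two directions and therefore give an intermediate value.<br />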
3.7. References<br />
1. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE (2012) Biomolecular simulation: a<br />
computational microscope for molecular biology. Annu Rev Biophys 41: 429-452.<br />
2. McCammon JA, Gelin BR, Karplus M (1977) Dynamics of Folded Proteins. Nature<br />
267: 585-590.<br />
3. Karplus M, McCammon JA (2002) Molecular dynamics simulations of biomolecules.<br />
Nat Struct Biol 9: 646-652.<br />
4. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />
Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF:<br />
Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen.<br />
5. Hockney RW, Goel SP, Eastwood JW (1974) Quiet high-resolution computer models<br />
of a plasma. J Comp Phys 14: 148-158.<br />
6. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure of a<br />
cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96:<br />
1863-1868.<br />
7. Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J (1981) Interaction<br />
models for water in relation to protein hydration. Intermolecular Forces: 331-342.<br />
8. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM (1997) LINCS: A linear constraint<br />
solver for molecular simulations. J Comput Chem 18: 1463-1472.<br />
9. Miyamoto S, Kollman PA (1992) Settle - an Analytical Version of the Shake and<br />
Rattle Algorithm for Rigid Water Models. J Comput Chem 13: 952-962.<br />
10. Darden T, York D, Pedersen L (1993) Particle Mesh Ewald - an N.Log(N) Method for<br />
Ewald Sums in Large Systems. J Chem Phys 98: 10089-10092.<br />
11. Cauchy A (1847) Méthode générale pour la résolution des systèmes d'équations<br />
simultanées. C R Acad Sci Paris 25: 536-538.<br />
12. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR (1984)<br />
Molecular-Dynamics with Coupling to an External Bath. J Chem Phys 81: 3684-3690.<br />
13. Hess B, Kutzner C, van der Spoel D, Lindahl E (2008) GROMACS 4: Algorithms for<br />
highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory<br />
Comput 4: 435-447.<br />
14. Daura X, Gademann K, Jaun B, Seebach D, van Gunsteren WF, et al. (1999) Peptide<br />
folding: When simulation meets experiment. Angew Chem Int Edit 38: 236-240.<br />
15. Berendsen HJ, Hayward S (2000) Collective protein dynamics in relation to function.<br />
Curr Opin Struct Biol 10: 165-169.<br />
16. Ichiye T, Karplus M (1991) Collective motions in proteins: a covariance analysis of<br />
atomic fluctuations in molecular dynamics and normal mode simulations. Proteins<br />
11: 205-217.<br />
17. Amadei A, Linssen AB, Berendsen HJ (1993) Essential dynamics of proteins. Proteins<br />
17: 412-425.<br />
18. Amadei A, Ceruso MA, Di Nola A (1999) On the convergence of the conformational<br />
coordinates basis set obtained by the essential dynamics analysis of proteins'<br />
molecular dynamics simulations. Proteins 36: 419-424.<br />
PART II: P450BM-3 Reductase Domain<br />
Chapter 4<br />
Conformational Dynamics <strong>of</strong> the FMN-binding Reductase<br />
Domain <strong>of</strong> Monooxygenase P450BM-3<br />
Abstract<br />
In cytochrome P450BM-3, the flavin mononucleotide (FMN) binding domain is an<br />
intermediate electron donor between the flavin adenine dinucleotide (FAD) binding<br />
domain and the HEME domain. Experimental evidence has shown that different redox<br />
states of the FMN cofactor induce conformational changes in the FMN domain.<br />
Herein, molecular dynamics (MD) simulation is used to gain insight into this<br />
phenomenon at the atomistic level. We have studied the effect of the FMN cofactor and its redox<br />
states (oxidized and reduced) on the structure and dynamics of the FMN domain. The results of<br />
our study show significant differences in the atomic fluctuation amplitudes of the FMN domain<br />
in both the holo- and apo-protein. The change in the protonation state of the FMN cofactor mostly<br />
affects its binding in the holo-protein. In particular, the loops involved in the binding of the<br />
isoalloxazine ring (Lβ4) and the ribityl side chain (Lβ1) adopt different conformations in the<br />
reduced and oxidized states. In addition, the reduced FMN cofactor mainly induces a<br />
conformational change in the Trp574 residue (Lβ4), which is essential to control electron<br />
transfer (ET) within the P450BM-3 domains. The structure of the apo-protein in solution<br />
remains mostly unchanged with respect to the crystal structure of the holo-protein.<br />
However, the FMN binding loops were more flexible in the apo-protein, which might favor the<br />
rebinding of the FMN cofactor. In the holo-protein simulations, the largest conformational<br />
changes in the FMN cofactor arise from the ribityl side chain. The isoalloxazine ring of the FMN<br />
cofactor remains almost planar (~177°) in the oxidized state and bends along the N5–N10 axis<br />
at an angle of ~160° in the reduced state. The collective modes of the isoalloxazine ring were<br />
identical in both protonation states of the FMN cofactor except for the first eigenvector. In the<br />
reduced state, the isoalloxazine ring shows a butterfly motion as the dominant collective<br />
motion in the first eigenvector due to the bending along the N5–N10 axis.<br />
4.1. Introduction<br />
Cytochrome P450 monooxygenases, the largest superfamily of heme-containing<br />
soluble proteins, are widespread in almost all domains of life, e.g. bacteria, yeast, insects,<br />
mammalian tissues, and plants.[1-3] They catalyze the oxidation of a wide variety of<br />
substrates involved in biosynthesis and biodegradation pathways, or in xenobiotic<br />
metabolism.[4] Cytochrome P450BM-3, isolated from Bacillus megaterium, is a<br />
multidomain, self-sufficient, NADPH-dependent flavoenzyme (class III bacterial P450).[5] As<br />
a pivotal member of its superfamily, it has been studied extensively as an important model<br />
system for understanding structure/function relationships, and many structural and<br />
kinetic data are available in the literature.[6,7] The catalytic properties of this enzyme<br />
have also been successfully enhanced for industrial applications by protein<br />
engineering.[8,9]<br />
This enzyme is composed of two reductase domains (with FAD and FMN cofactors)<br />
and a P450 HEME domain (with a HEME cofactor), arranged on a single polypeptide<br />
chain as HEME-FMN-FAD from the N- to the C-terminus. The transfer of two successive electrons<br />
from NADPH to the HEME cofactor is essential for the oxygenation reaction.[10-12] During the<br />
oxygenation reaction, the enzyme is reduced by NADPH, with electrons first transferred to the<br />
FAD cofactor of the FAD-binding domain, then to the FMN cofactor of the FMN-binding domain and<br />
finally to the HEME iron in the substrate-bound HEME domain. In this ET process, the FMN<br />
domain serves as a one- or two-electron mediator from the FAD cofactor to the heme<br />
iron.[13] The FMN cofactor switches between the fully oxidized and semiquinone states during<br />
catalytic turnover. The thermodynamically unstable anionic semiquinone state can reduce the<br />
HEME iron. However, other P450s utilize the FMN hydroquinone as the reducing species.[11-13]<br />
In the substrate-free P450BM-3, the FMN cofactor stays in a thermodynamically stable<br />
hydroquinone state, which is not able to reduce the HEME iron.[11] The use of the anionic<br />
semiquinone state of the FMN cofactor as the reducing species makes P450BM-3 more efficient,<br />
with a high turnover rate in comparison to other members of this family.[14] Therefore, the<br />
thermodynamic properties of the FMN moiety are mainly responsible for the unusual<br />
redox properties of P450BM-3. The protein environment has a strong influence on this<br />
mechanism by changing the redox potential of the FMN and HEME cofactors.<br />
Mutagenesis studies have shown that the conformation of the FMN binding loops plays a<br />
critical role in stabilizing the different redox states of the FMN cofactor in the protein<br />
environment.[14,15] The insertion of a glycine residue in the re-face (inner FMN binding)<br />
loop is able to stabilize the neutral semiquinone state in P450BM-3, as observed in other diflavin<br />
reductases.[16]<br />
Although experimental data on the isolated FMN domain in solution and the<br />
crystallographic structure[17,18] are available, no molecular dynamics (MD) study of this<br />
protein has been reported. In this paper, the structural and dynamic properties of the<br />
FMN domain as holo-protein, with the FMN cofactor in the oxidized and reduced states, and as apo-protein<br />
are investigated using classical MD simulations. The aim of this study is to<br />
understand the structural and dynamic properties of the protein in solution and the effect<br />
of the protonation state of the FMN cofactor on the conformational dynamics of the cofactor<br />
and the whole protein. The paper is organized as follows. In the Methods section, the<br />
details of the simulations, in particular the refinement of the original GROMOS96 FMN<br />
parameters for the oxidized and reduced states of the FMN cofactor, are reported. In the<br />
Results section, the analysis of the simulation trajectories for the apo- and the holo-protein in<br />
the oxidized and reduced states is reported. The analysis focuses on the structural<br />
and dynamic properties of the overall protein structure and of the FMN binding site as<br />
well as the FMN cofactor. Finally, in the discussion and conclusions, the results of the<br />
simulations will be discussed in the context <strong>of</strong> the experimental knowledge <strong>of</strong> the FMN<br />
domain <strong>and</strong> a summary <strong>of</strong> the paper will be provided.<br />
4.2. Methods<br />
4.2.1. Starting coordinates<br />
The starting coordinates of the FMN domain (residues 479–630) were taken from a<br />
non-stoichiometric complex of P450BM-3 (PDB ID: 1BVY, 2.03 Å resolution), which has<br />
one FMN domain with two HEME domains.[18] The crystallographic water molecules (within a<br />
distance of 0.6 nm from the FMN domain) were also kept during system preparation for the MD<br />
simulations.<br />
4.2.2. Molecular dynamics simulation<br />
Table 4.1 summarizes the systems simulated with the GROMOS96<br />
43a1 force field.[19] The crystal structure[18] has the FMN cofactor in the oxidized state (FOX)<br />
(see Figure 4.1a). Protonation of the FOX isoalloxazine ring at the N1 and N5 positions<br />
gives the reduced state of the FMN cofactor (FHQ), as indicated in Figure 4.1b. For the<br />
apo-protein simulation, the FMN cofactor was removed from the crystal<br />
structure of the FMN domain. The GROMOS96 force field[20] was used for the FMN cofactor. Additional<br />
improper dihedrals were introduced to reproduce the conformation of the isoalloxazine ring<br />
observed in the crystallographic structure and in molecular geometry optimizations of flavin in<br />
both redox states; they are reported in Table S4.1 (for FOX) and Table S4.2 (for FHQ) of the supporting<br />
information (SI).[21,22]<br />
In the isoalloxazine ring of the FMN cofactor, the bending angle (δ) and the puckering angle (ρ)<br />
were calculated along the N5–N10 axis using the vectors v1, v2, a1, a2 and c, as shown in Figure 4.2.<br />
In FOX, the isoalloxazine ring was kept close to planar, while a bending angle of ~160° was<br />
used for FHQ.<br />
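The bending of the isoalloxazine ring along the N5–N10 axis can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the angle is measured at the midpoint of the N5–N10 axis between the centroids of the two outer rings; the function name and this three-point definition are hypothetical and do not reproduce the exact vector construction (v1, v2, a1, a2, c) of Figure 4.2. Under this assumption a planar ring gives 180° and a butterfly bend gives smaller values.<br />

```python
import numpy as np

def bending_angle(left_centroid, hinge_midpoint, right_centroid):
    """Angle (degrees) between the two ring 'wings' about a hinge point.

    Hypothetical three-point definition (an assumption, not the thesis
    construction): the hinge is the midpoint of the N5-N10 axis and the
    wings are represented by the centroids of the two outer rings.
    """
    hinge = np.asarray(hinge_midpoint, dtype=float)
    u = np.asarray(left_centroid, dtype=float) - hinge
    v = np.asarray(right_centroid, dtype=float) - hinge
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Illustrative coordinates only: each wing tilted 10 degrees out of plane.
tilt = np.radians(10.0)
left = [-np.cos(tilt), 0.0, -np.sin(tilt)]
right = [np.cos(tilt), 0.0, -np.sin(tilt)]
angle = bending_angle(left, [0.0, 0.0, 0.0], right)
```

Applied frame by frame to a trajectory, such a function would give a time series of the bending angle that could be compared with the ~177° (FOX) and ~160° (FHQ) values quoted in the text.<br />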
Table 4.1: Summarizing MD Simulation of P450BM-3 FMN domain in water.<br />
FMN domain | No. of atoms | No. of solvent molecules | No. of counter ions | Simulation length (ns)<br />
Oxidized (FOX) | 33483 | 10650 | 14 | 50<br />
Reduced (FHQ) | 33491 | 10652 | 14 | 50<br />
Apo protein (APO) | 33482 | 10662 | 13 | 50<br />
*The abbreviations FOX, FHQ and APO are used in the rest of the paper for the FMN domain in the oxidized<br />
and reduced states, and as apo-protein, respectively.<br />
Figure 4.1: Schematic representation of the FMN cofactor in the oxidized (a) and reduced (b) states<br />
with the atomic numbering of the isoalloxazine ring.[23] The blue ovals highlight the<br />
protonation positions, N1 and N5. ChemSketch[24] was used to draw the figures.<br />
Figure 4.2: Schematic structure of the isoalloxazine ring defining the bending angle (δ) and<br />
puckering angle (ρ) using the vectors v1, v2, a1, a2 and c.<br />
4.2.3. FMN binding site analysis<br />
The FMN binding sites of the holo-protein and apo-protein were tracked throughout the<br />
MD simulations using the MDpocket method.[25] The analysis was performed on a total of 5000<br />
snapshots obtained by taking every 50th frame from the trajectories. The pocket volume analysis<br />
was performed on aligned MD snapshots with the minimum alpha sphere size of 3 Å, the<br />
minimum number of alpha spheres close to each other for clustering of alpha spheres equal<br />
to The number of iterations for the pocket volume calculation using the Monte Carlo<br />
algorithm was set to 5000. The grid file generated by the first MDpocket run (iso-value =<br />
0.7) was used to extract the grid points of the FMN binding pocket. The grid file of the FMN<br />
binding pocket was edited manually in PyMOL[26] by deleting some grid points for a<br />
better representation of the FMN binding site in the FMN domain. The resulting FMN binding grid file was<br />
used with the aligned snapshots to track the changes in the FMN binding site during the<br />
simulation.<br />
4.2.4. Multiple structural alignment <strong>of</strong> FMN domain<br />
The homologous structures <strong>of</strong> FMN domain were obtained by performing BlastP[27]<br />
against PDB database.[28] The protein sequences with identity greater than 20% were<br />
taken into account for further analysis. Six structures were selected as homologous to the FMN<br />
domain of P450BM-3 after manually removing the redundant entries from the BlastP results<br />
(summarized in Table 4.2). Multiple structural alignment (MSA) was performed on the selected<br />
structures, taking the FMN domain as the reference structure, with a maximum RMSD cutoff of<br />
0.5 nm using UCSF Chimera.[29]<br />
4.3. Results<br />
4.3.1. FMN domain: structural <strong>and</strong> dynamical properties<br />
Figure 4.3 shows the backbone root mean square deviation (RMSD) of the FMN<br />
domain as apo- and holo-protein with the FMN cofactor in the oxidized and reduced states. In FOX,<br />
the RMSD curve stabilizes to a plateau with an average value of 0.24 ± 0.04 nm after a rapid<br />
increase in the first 15 ns of simulation. In the first 10 ns of simulation, FHQ follows the same<br />
trend as observed in FOX. However, after 15 ns, the RMSD of FHQ stabilizes to an average value<br />
of 0.16 ± 0.01 nm, lower than the one in FOX. In APO, the FMN domain remains stable<br />
throughout the simulation with an average RMSD value of 0.19 ± 0.01 nm after a short<br />
equilibration of ~5 ns. For all the simulations, the average radius of gyration (Rg) did not<br />
show appreciable variations from the crystal structure value (1.45 nm).<br />
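As an illustration of how a backbone RMSD with respect to a reference structure can be obtained, the following minimal NumPy sketch superimposes one set of coordinates on another with the Kabsch algorithm before computing the deviation. The function name and the (n_atoms, 3) array layout are assumptions made for this example; it is not the implementation of the analysis tool used for Figure 4.3.<br />

```python
import numpy as np

def kabsch_rmsd(coords, reference):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition.

    Both sets are centered, the optimal rotation is found by singular value
    decomposition (Kabsch algorithm), and the RMSD is computed on the fitted
    coordinates. Units follow the input (e.g. nm).
    """
    P = np.asarray(coords, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P = P - P.mean(axis=0)            # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_fit = P @ R.T                   # rotate every atom of P
    return float(np.sqrt(((P_fit - Q) ** 2).sum(axis=1).mean()))
```

Evaluating such a function for every trajectory frame against the crystal structure yields an RMSD time series of the kind averaged in the text.<br />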
Figure 4.3: Backbone RMSD with respect to crystal structure as a function <strong>of</strong> time for APO (in<br />
green) <strong>and</strong> holo-protein in FOX (in black) <strong>and</strong> FHQ (in red).<br />
The FMN domain has a classical flavodoxin fold with five parallel β-sheets (β1 –<br />
β5) that are surrounded by four α-helices (α1 – α4). The loop regions together with<br />
irregular structures (coils and turns) are named according to the secondary structure<br />
element, α(–helix) or β(–sheet), preceding them. The FMN cofactor is surrounded by three<br />
loops that follow β-sheets, hence named Lβ1, Lβ3 and Lβ4 for the ribityl binding loop<br />
(residues 488 – 491), the inner (re face, residues 534 – 544) and the outer (si face, residues 571 –<br />
579) FMN binding loops, respectively. The crystallographic protein secondary structure,<br />
calculated using the DSSP method[30], was preserved during the FOX, FHQ<br />
and even the APO simulations (see Figure S4.1 of SI).<br />
Figure 4.4: Backbone RMSD (a) <strong>and</strong> RMSF (b) per residue with respect to crystal structure for<br />
FOX (black), FHQ (red) <strong>and</strong> APO (green). Vertical bars in grey color show the loop regions. Loops<br />
surrounding the FMN c<strong>of</strong>actor are shown in black horizontal bars. (c) FMN domain in pink <strong>and</strong> FMN<br />
c<strong>of</strong>actor in cyan color with labeled helices <strong>and</strong> loop regions. FMN binding loops are labeled in<br />
orange color. N <strong>and</strong> C represent the amino <strong>and</strong> carboxy terminus <strong>of</strong> FMN domain.<br />
In Figure 4.4a and 4.4b, the backbone RMSD and RMSF per residue with respect to the<br />
crystal structure are reported, respectively. The FMN domain with labeled helices and loops is<br />
shown in Figure 4.4c. Residues with large RMSD and RMSF values correspond to loop<br />
regions (represented by grey bars in Figure 4.4a and 4.4b) and to the N- and<br />
C-termini. In all simulations, the largest deviations and fluctuations were observed in the Lβ3<br />
and Lβ4 FMN binding loops (black horizontal bars in Figure 4.4a and 4.4b) and in the Lβ2 and<br />
Lα2 loops, which are located opposite to the FMN binding site. The Lα2 loop shows the highest<br />
deviation in FOX. In APO, the deviations and fluctuations are mainly observed in the FMN<br />
binding loops (especially in the Lβ1 loop), which also occupy the FMN binding cavity during the<br />
simulation.<br />
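The per-atom root mean square fluctuation (RMSF) discussed above can be sketched as follows. The example assumes that the trajectory has already been fitted to a common reference and is stored as an array of shape (n_frames, n_atoms, 3), and it measures fluctuations around the time-averaged positions; the function name, the array layout and the choice of reference are assumptions for illustration.<br />

```python
import numpy as np

def rmsf(trajectory):
    """Per-atom RMSF for a fitted trajectory of shape (n_frames, n_atoms, 3).

    The fluctuation of each atom is measured around its time-averaged
    position; the result has shape (n_atoms,) in the units of the input.
    """
    X = np.asarray(trajectory, dtype=float)
    mean_positions = X.mean(axis=0)                    # (n_atoms, 3)
    sq_dev = ((X - mean_positions) ** 2).sum(axis=2)   # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))

# Toy trajectory: atom 0 moves between x = 0 and x = 2, atom 1 is static.
toy = np.array([[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                [[2.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
toy_rmsf = rmsf(toy)
```

Averaging the atomic values over the backbone atoms of each residue would give a per-residue profile comparable to Figure 4.4b.<br />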
4.3.2. Cluster analysis <strong>of</strong> FMN domain<br />
The total numbers of clusters observed in FOX, FHQ and APO are 11, 12 and 7,<br />
respectively. The first two clusters account for 80.26 % and 79.25 % of the population in<br />
FOX and FHQ, respectively. The first cluster contributes 47.10 % in FOX and 60.07 % in<br />
FHQ, while the second cluster represents 33.16 % and 19.18 % of the total population in<br />
FOX and FHQ, respectively. In APO, the first cluster alone covers 85.17 % of the<br />
population. The apo-protein thus shows the least conformational diversity during the<br />
simulation. However, the difference in the cluster populations of the holo-protein indicates<br />
that the FMN protonation state notably influences the conformation of the FMN domain.<br />
In Figure 4.5a, 4.5b and 4.5c, the representative conformations of the first two<br />
clusters of FOX and FHQ, and of the first cluster of APO, are superimposed on the crystal<br />
structure (in sky blue) and shown in cartoon representation. Major differences occur in the<br />
loop regions as well as at the N- and C-termini. In particular, slightly larger deviations are<br />
present in the FMN binding loops in FOX and FHQ. In contrast, the FMN binding region in APO<br />
is the most deviating part of the FMN domain. The loop regions Lβ2 and Lα2 are also affected<br />
by the presence and the protonation state of the FMN cofactor. In FOX, Lβ2 and Lα2 show<br />
larger deviations than in FHQ and APO, as evidenced by the conformation of the first two<br />
clusters (see Figure 4.5). Lα2 loop flips inwards with higher deviation from crystal<br />
structure in the first two clusters <strong>of</strong> FOX.<br />
Figure 4.5: Conformations of the first two clusters of the (a) oxidized (black and gray) and (b) reduced<br />
(red and coral) holo-protein and of the first cluster of the (c) apo-protein (green), superimposed on the crystal structure (in sky blue).<br />
Loops and helices (α1, α2, α3 and α4) are labeled. FMN binding loops are labeled in red.<br />
Loops are labeled according to the secondary structure element preceding them.<br />
4.3.3. FMN binding site<br />
Figure 4.6 shows the FMN binding site in detail using the representative structures<br />
of the first cluster of FOX (4.6a) and FHQ (4.6b). The hydrogen bonds between the<br />
FMN domain and the cofactor were calculated using a donor–acceptor distance<br />
≤ 0.35 nm and a hydrogen–donor–acceptor angle ≤ 30°.<br />
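The geometric hydrogen bond criterion can be written as a short function. This sketch assumes that the distance is measured between donor and acceptor heavy atoms and that the angle is the hydrogen–donor–acceptor angle, which is the convention commonly used in GROMACS-style analyses; the function name and argument order are hypothetical.<br />

```python
import numpy as np

def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=0.35, max_angle=30.0):
    """Geometric hydrogen bond test on three atomic positions (nm).

    Assumed convention: donor-acceptor distance <= max_dist (nm) and
    hydrogen-donor-acceptor angle <= max_angle (degrees).
    """
    d = np.asarray(donor, dtype=float)
    h = np.asarray(hydrogen, dtype=float)
    a = np.asarray(acceptor, dtype=float)
    donor_acceptor = a - d
    donor_hydrogen = h - d
    dist = np.linalg.norm(donor_acceptor)
    cos_angle = np.dot(donor_acceptor, donor_hydrogen) / (
        dist * np.linalg.norm(donor_hydrogen))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(dist <= max_dist and angle <= max_angle)

# Illustrative geometry: a linear N-H...O arrangement 0.30 nm apart.
linear_case = is_hydrogen_bond([0, 0, 0], [0.1, 0, 0], [0.3, 0, 0])
```

Counting the frames in which the test returns True for a given donor/acceptor pair gives the occurrence of that hydrogen bond over the trajectory.<br />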
The occurrence of the hydrogen bonds observed in the crystal structure between the<br />
FMN domain and the isoalloxazine ring of the FMN cofactor is reported in Figure 4.7 for the<br />
simulations of FOX (in blue) and FHQ (in red). NMR spectroscopy studies of the protein in<br />
solution have also provided evidence for the hydrogen-bonding network involving the N1, C2O, N3, C4O<br />
and N5 atoms of the isoalloxazine ring.[14,31] In the crystal structure, the hydrogen atom<br />
from the backbone amino group (NH) of Asn537 was involved in a hydrogen bond<br />
with the N5 atom of the isoalloxazine ring (NH – N5) at a distance of 0.175 nm. The<br />
same hydrogen bond was observed in the first 15 ns of the FHQ simulation (Figure 4.7a).<br />
However, because the protonated N5 atom in FHQ is a hydrogen donor, a hydrogen<br />
bond with the oxygen of the side chain carboxamide group (-CONH2) of Asn537<br />
(-(NH2)CO – HN5) was observed in the last 25 ns of the simulation.<br />
Figure 4.6: (a) and (b) show the FMN binding site from the representative structures of the first<br />
cluster for FOX and FHQ, respectively. The residues shown are within 0.4 nm of the FMN<br />
cofactor (in black). The FMN binding loops Lβ1, Lβ3 and Lβ4 are shown as yellow, pink<br />
and cyan ribbons, respectively. The residues are labeled in red, blue and green as part of Lβ1, Lβ3 and<br />
Lβ4, respectively. Dashed lines show hydrogen bonds between the isoalloxazine ring and surrounding<br />
residues. The underlined labels indicate the residues that undergo a major change in conformation after<br />
the change in the redox state of the FMN cofactor.<br />
In the crystal structure <strong>and</strong> simulations, the stable hydrogen bonds were observed<br />
at oxygen (O2) <strong>and</strong> nitrogen (N3H) <strong>of</strong> isoalloxazine ring with the hydrogen <strong>of</strong> backbone<br />
amino group <strong>of</strong> Gln579 (NH – O2) <strong>and</strong> the oxygen from backbone carbonyl group (CO) <strong>of</strong><br />
Thr577 (N3H – OC), respectively. O4 position was involved in hydrogen bond formation<br />
with hydrogen <strong>of</strong> hydroxyl group <strong>of</strong> Thr577 (OH – O4) in the first 15 ns simulation <strong>of</strong> FOX<br />
but it occurred throughout the whole FHQ simulation. Atom N1 was observed to form<br />
hydrogen bond with backbone amino group <strong>of</strong> Asp571 (NH – N1) in the crystal structure<br />
<strong>and</strong> FOX. However, in the FHQ simulation, the occurrence <strong>of</strong> this bond (NH – N1H) was very<br />
low.<br />
Figure 4.7: Hydrogen bond existence between FMN binding residues and the isoalloxazine ring (a)<br />
and the ribityl side chain (b) of the FMN cofactor throughout the MD simulations, calculated from frames<br />
taken every 50 ps. Blue and red lines show hydrogen bond occurrences in FOX and FHQ,<br />
respectively, as a function of time. The hydrogen bond partners are labeled on the y-axis.<br />
The change in the protonation state of the FMN cofactor affects its binding in the FMN domain.<br />
FHQ strengthens the hydrogen-bonding network between the ribityl side chain and phosphate<br />
moiety of the FMN cofactor and the domain. In FHQ, the phosphate group of the FMN cofactor forms<br />
stronger hydrogen bonds with the residues of the Lβ1 loop than in FOX (shown in Figure 4.7b).<br />
The first hydroxyl group of the ribityl side chain was involved in strong hydrogen bonding with<br />
the carbonyl oxygen of Ser537 (OH – OC) in FOX and FHQ. The third hydroxyl group of the<br />
ribityl side chain shows a stronger hydrogen bond with the Thr492 hydroxyl in FOX than in FHQ.<br />
The latter was also observed to form a hydrogen bond with the carbonyl oxygen of Cys569 in<br />
FHQ.<br />
In FHQ, the tighter binding of the FMN cofactor, with a stronger hydrogen-bonding<br />
network between flavin and protein than in FOX, induces a conformational change in the FMN<br />
binding loops Lβ1, Lβ3 and Lβ4. The major conformational change was observed in the<br />
orientation of the Trp574 residue due to the protonated N5 position of the isoalloxazine ring in<br />
FHQ (Figure 4.6b). The indole ring of Trp574 was nearly coplanar with the isoalloxazine ring<br />
of the FMN cofactor in the crystal structure.[17] Experimentally, the Trp574 conformation was<br />
observed to be critical to FMN binding and found to be involved in ET tunneling from<br />
FMN to HEME. In FOX, the indole ring of Trp574 remains in the same conformation as in<br />
the crystal. In FHQ, it rotates to another configuration not aligned with the isoalloxazine ring.<br />
This rotation is a consequence of the steric hindrance induced by the conformational change<br />
in the reduced isoalloxazine ring.<br />
The changes in the volume, hydrophobicity, solvent accessibility and polarity of the FMN<br />
binding site during the simulations of FOX (in black), FHQ (in red) and APO (in green) are<br />
reported in Figure 4.8a, 4.8b, 4.8c and 4.8d, respectively. In APO, the absence of the FMN<br />
cofactor promotes a rearrangement of the FMN binding site. After the rearrangement, the side<br />
chains of amino acids of the FMN binding loops replaced the initial water molecules and<br />
occupied the cavity after ~5 ns of simulation. FOX shows larger variation in the geometric<br />
properties of the FMN binding pocket than FHQ. Major changes were observed in the volume of the<br />
FMN binding pocket after the change in the protonation state of the FMN cofactor, with averages of<br />
429 ± 86 and 357 ± 45 for FOX and FHQ, respectively. In FHQ, the pocket volume showed<br />
less variation than in FOX (see Figure 4.8a). The hydrophobicity (Figure 4.8b) and polarity<br />
(Figure 4.8d) of the FMN pocket are slightly perturbed in the first 15 ns of simulation and then<br />
converge to the same values in both the FHQ and FOX simulations.<br />
Figure 4.8: FMN binding pocket (a) volume, (b) hydrophobicity, (c) solvent accessibility<br />
and (d) polarity for FOX (in black), FHQ (in red) and APO (in green).<br />
4.3.4. Conservation pr<strong>of</strong>ile <strong>of</strong> FMN binding site<br />
Table 4.2 summarizes the MSA of the FMN domain of P450BM-3 and its homologous<br />
structures (in SI, see Figure S4.2 for the MSA and Figure S4.3 for conservation patterns mapped<br />
on the FMN domain of P450BM-3). The ribityl side-chain binding region is the most<br />
conserved region in the FMN domain since it is responsible for the tight binding of the FMN<br />
cofactor. Among the cytochrome P450 reductases (CPR), the P450BM-3 one has the largest<br />
FMN binding site volume and the highest hydrophobicity, together with the lowest polarity<br />
(equal to that of 1B1C). The solvent accessibility of the FMN binding pocket was found to be<br />
intermediate among the homologous proteins. Together, these differences in the properties of<br />
the FMN-binding pocket of P450BM-3 result in a better catalytic turnover than in other P450<br />
monooxygenases.[14,17]<br />
Table 4.2: Summarizing MSA of FMN domain of P450BM-3 and its homologous structures<br />
and the characterization of their FMN-binding pocket.<br />
PDB Id | Protein | RMSD (nm) | Max. sequence identity (%) | Pocket volume | Pocket hydrophobicity | Pocket solvent accessibility | Pocket polarity | Organism<br />
1BVY | CPR a | 0.000 | 100.00 | 601.82 | 358.17 | 16.69 | 11 | Bacillus megaterium<br />
1B1C | CPR a | 0.134 | 30.26 | 508.28 | 299.54 | 25.26 | 11 | Homo sapiens<br />
1JA1 | CPR a | 0.135 | 30.26 | 594.28 | 313.50 | 24.65 | 12 | Rattus norvegicus<br />
2BF4 | CPR a | 0.135 | 25.00 | 554.08 | 354.37 | 11.29 | 12 | Saccharomyces cerevisiae<br />
1YKG | SiR-FP b | 0.172 | 19.86 | 639.03 | 354.39 | 20.88 | 15 | Escherichia coli<br />
3HR4 | NOS c | 0.144 | 23.68 | 466.78 | 258.09 | 14.44 | 12 | Homo sapiens<br />
1F4P | Fld d | 0.214 | 23.13 | 558.02 | 340.19 | 10.82 | 15 | Desulfovibrio vulgaris<br />
*a CPR: cytochrome P450 reductase; b SiR-FP: sulfite reductase; c NOS: nitric oxide synthase; d Fld: flavodoxin<br />
4.3.5. Principal component analysis <strong>of</strong> FMN domain<br />
The first 20 eigenvectors account for 82 %, 79 % and 75 % of the total relative positional<br />
fluctuation (RPF) in the simulations of FOX, FHQ and APO, respectively. The convergence of<br />
the trajectories was assessed by computing the root mean square inner product (RMSIP) of the<br />
first 20 eigenvectors obtained from the PCA of the two halves of each MD trajectory (50 ns).<br />
The resulting RMSIP values were 0.563, 0.624 and 0.594 for the simulations of FOX, FHQ and<br />
APO, respectively. These relatively high RMSIP values indicate good convergence of the<br />
essential eigenvectors.<br />
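The two quantities used above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the analysis pipeline used for the thesis: it assumes that the RPF fraction of a set of eigenvectors is the corresponding fraction of the summed covariance eigenvalues, and it uses a synthetic random "trajectory" in place of fitted MD coordinates. The function names `pca`, `cumulative_fraction` and `rmsip` are hypothetical.

```python
import numpy as np

def pca(coords):
    """PCA of fitted coordinates with shape (n_frames, 3 * n_atoms).

    Returns eigenvalues in descending order and the eigenvectors as columns.
    """
    centered = coords - coords.mean(axis=0)
    covariance = centered.T @ centered / len(coords)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

def cumulative_fraction(eigenvalues, n):
    """Fraction of the total fluctuation captured by the first n eigenvectors."""
    return eigenvalues[:n].sum() / eigenvalues.sum()

def rmsip(eigvecs_a, eigvecs_b, n=20):
    """Root mean square inner product of the first n eigenvectors of two sets."""
    overlap = eigvecs_a[:, :n].T @ eigvecs_b[:, :n]
    return float(np.sqrt((overlap ** 2).sum() / n))

# Synthetic stand-in for a trajectory: 400 frames, 10 atoms (30 coordinates).
rng = np.random.default_rng(0)
trajectory = rng.normal(size=(400, 30)) * np.linspace(3.0, 0.1, 30)

# Compare the essential subspaces of the two halves, as done for convergence.
vals_first, vecs_first = pca(trajectory[:200])
vals_second, vecs_second = pca(trajectory[200:])
print("fraction in first 5 modes:", cumulative_fraction(vals_first, 5))
print("RMSIP of the two halves:", rmsip(vecs_first, vecs_second, n=5))
```

An RMSIP of 1 means that the two sets of eigenvectors span the same subspace, and 0 means that they are orthogonal, which is why values around 0.6 for 20 eigenvectors are read as reasonable convergence.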
The first two eigenvectors cover 50 %, 41 % and 32 % of the total fluctuations in FOX, FHQ and<br />
APO, respectively (with 32 %, 29 % and 20 % contributed by the first eigenvector alone). The<br />
inner product (IP) values of the first two eigenvectors, taken from the inner product matrix of<br />
each pair of trajectories, are reported in Table 4.3. The inner product between the first<br />
eigenvectors did not exceed ~0.35 for any pair of simulations, which shows that the most<br />
important essential mode differs among the three systems.<br />
Table 4.3: Inner product values of the first two eigenvectors obtained from the last 50 ns of the<br />
trajectories of two different simulations.<br />
Inner product | Eigenvector | 1st eigenvector | 2nd eigenvector<br />
FOX/FHQ | 1st eigenvector | 0.143 | 0.400<br />
FOX/FHQ | 2nd eigenvector | 0.485 | 0.414<br />
FOX/APO | 1st eigenvector | 0.268 | 0.156<br />
FOX/APO | 2nd eigenvector | 0.033 | 0.156<br />
FHQ/APO | 1st eigenvector | 0.351 | 0.281<br />
FHQ/APO | 2nd eigenvector | 0.183 | 0.169<br />
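As an illustration of how such an inner product matrix can be obtained, the following sketch compares the first two eigenvectors of two simulations. It is not taken from the thesis: the function name `inner_product_matrix` is hypothetical, the absolute value is an assumption (the sign of an eigenvector is arbitrary), and the two bases are toy examples rather than eigenvectors of FOX, FHQ or APO.

```python
import numpy as np

def inner_product_matrix(eigvecs_a, eigvecs_b, n=2):
    """|v_i . w_j| for the first n eigenvectors (stored as columns) of two sets."""
    return np.abs(eigvecs_a[:, :n].T @ eigvecs_b[:, :n])

# Toy orthonormal bases: the second one is the first rotated by 30 degrees
# in the plane spanned by its first two vectors.
theta = np.deg2rad(30.0)
basis_a = np.eye(4)
basis_b = np.eye(4)
basis_b[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]]

ip = inner_product_matrix(basis_a, basis_b)
print(ip)
```

A value close to 1 on the diagonal would indicate that the corresponding modes of the two simulations describe the same motion, whereas small values, as in Table 4.3, indicate different directions.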
Figure 4.9a shows the RMSF of the backbone atoms along the first and second eigenvectors of<br />
FOX (black), FHQ (red) and APO (green). The corresponding three-dimensional representations,<br />
obtained by projecting the MD trajectories onto the first and second eigenvectors, are reported<br />
in Figures 4.9b, 4.9c and 4.9d for FOX, FHQ and APO, respectively. In all simulations, the<br />
residues of the C-terminal region and of the long loops Lβ2 and Lα2, which lie opposite to the<br />
FMN binding site, show higher fluctuations and together constitute the collective motion of the<br />
first eigenvector. In the first eigenvector of FOX, the collective motion is restricted to the Lβ2,<br />
Lβ3 and Lα2 loops and the C-terminal region. In the second eigenvector of FOX, higher<br />
fluctuations are shown by the C-terminal residues and the Lα1, Lβ2 and Lα2 loops. FHQ shows<br />
higher fluctuations in the Lα1, Lα2 and Lβ4 loops and in the C and N termini in the first<br />
eigenvector, and in the Lα2, Lβ2 and Lβ4 loops in the second eigenvector. The first eigenvector<br />
of APO shows higher fluctuations in the Lβ1, Lβ2 and Lα2 loops and the C terminus, while the<br />
second eigenvector shows them in the Lα2 loop and in all the FMN binding loops. Among the<br />
FMN binding loops, the higher fluctuations contributing to the collective motion of the first<br />
eigenvector were observed in the inner FMN binding loop Lβ3 for FOX, in the outer FMN<br />
binding loop Lβ4 for FHQ and in Lβ1 for APO.<br />
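The per-atom RMSF along a single eigenvector, and the sequence of frames between the two extremes of the projection, can be sketched as follows. This is a minimal illustration under stated assumptions: the coordinates are taken to be already fitted, the mode is a unit vector, and the names `rmsf_along_mode` and `extreme_frames` as well as the two-atom toy system are hypothetical.

```python
import numpy as np

def rmsf_along_mode(coords, mode):
    """Per-atom RMSF of the motion filtered along one eigenvector.

    coords: (n_frames, 3 * n_atoms) fitted coordinates; mode: unit vector.
    """
    mean = coords.mean(axis=0)
    projection = (coords - mean) @ mode             # one value per frame
    displacement = np.outer(projection, mode)       # motion along the mode only
    per_coordinate = (displacement ** 2).mean(axis=0)
    return np.sqrt(per_coordinate.reshape(-1, 3).sum(axis=1))

def extreme_frames(coords, mode, n_frames=10):
    """Frames interpolated between the two extreme projections on the mode."""
    mean = coords.mean(axis=0)
    projection = (coords - mean) @ mode
    amplitudes = np.linspace(projection.min(), projection.max(), n_frames)
    return mean + np.outer(amplitudes, mode)

# Toy system: two atoms, only atom 0 moves (along x).
time = np.linspace(0.0, 2.0 * np.pi, 200)
coords = np.zeros((200, 6))
coords[:, 0] = 0.1 * np.sin(time)
mode = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

rmsf = rmsf_along_mode(coords, mode)
frames = extreme_frames(coords, mode)
print(rmsf, frames.shape)
```

Residues with a large RMSF along the first eigenvector are those that dominate the corresponding collective motion, which is how the loops listed above are identified.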
Figure 4.10 shows the crystallographic structure of the complex between the HEME domain and<br />
the FMN domain, with labeled helices, cofactors and FMN binding loops. In the crystal<br />
structure, the α1 helix is involved in direct or water-mediated contacts with the HEME domain,<br />
and the outer FMN binding loop (Lβ4) interacts with the peptide preceding the HEME binding<br />
loop (K/L loop).[18] These interaction sites are crucial for the electron transfer (ET) from FMN<br />
to HEME. The higher fluctuations observed in the Lβ4 and α1 helix regions in the first<br />
eigenvector of FHQ might be related to the inhibition of electron transfer from FMN to HEME<br />
by the reduced state of the FMN cofactor, as observed experimentally.[11] In the first<br />
eigenvector of FOX, the higher fluctuations were restricted to the residues of the inner FMN<br />
binding loop Lβ3 and of the loops opposite to the FMN binding site, Lα2 and Lβ2. The latter<br />
region lies opposite to the probable HEME binding surface in the crystal structure, so the<br />
collective motion of this region in FOX might be related to the electron transfer from FAD to<br />
FMN, and it could be the probable binding site for the FAD domain. In APO, the local structure<br />
remains conserved, with highly flexible FMN binding loops that help the apo-protein to rebind<br />
the FMN cofactor and to work again as holo-protein, as found experimentally.[41]<br />
Figure 4.9: (a) RPF associated with the eigenvectors: RMSF of the protein backbone atoms along the<br />
first and second eigenvectors after projection of the trajectories of FOX (black), FHQ (red) and APO<br />
(green) on the corresponding eigenvectors. Vertical grey bars show the loop regions. The FMN binding<br />
loops are shown as black horizontal bars. The 10 sequential frames representing the extension of the<br />
fluctuations in the FOX (b), FHQ (c) and APO (d) trajectories along the first and second eigenvectors<br />
are reported. The first extreme is shown in blue and the last extreme in cyan. Loops and helices are<br />
labeled. Labels in red show the FMN binding loops. N and C indicate the N- and C-terminus of the<br />
protein.<br />
Figure 4.10: FMN domain (in pink) in complex with the P450 HEME domain (in blue) in the crystal<br />
structure (1BVY[18]). The HEME cofactor is represented in orange and the FMN cofactor in green.<br />
Helices and the N- and C-termini are labeled in both domains. The labels Lβ1, Lβ3 and Lβ4 mark the<br />
FMN cofactor binding loops in the FMN domain.<br />
4.3.6. FMN c<strong>of</strong>actor: structural <strong>and</strong> dynamical properties<br />
The conformational changes of the FMN cofactor induced by the surrounding protein<br />
environment were studied for both redox states. In both states, the phosphate group atoms<br />
show higher RMSD and RMSF values than the other heavy atoms of the FMN cofactor (see<br />
Figures 4.11a and 4.11b). Furthermore, the phosphate group of the FMN cofactor in the<br />
oxidized state deviates more from the crystal structure, and with higher fluctuations, than it<br />
does in the reduced state. This is consistent with the observed variations of the hydrogen<br />
bonding network between the FMN cofactor and the protein.<br />
Figure 4.11: (a) RMSF and RMSD of heavy atoms calculated with respect to the crystal<br />
structure. The vertical line marks the beginning of the ribityl side chain. (b) Schematic diagram<br />
of the FMN cofactor with the numbering used in the plots of panel (a) for the atomic positions<br />
of heavy atoms.<br />
Figure 4.12a shows the distribution of the δ angle of the isoalloxazine ring (see Figure 4.2)<br />
in FOX and FHQ. For the oxidized state of the FMN cofactor, the values are normally<br />
distributed in the range from 170° to 180°, with the peak centered at 177°. In the reduced<br />
state, the distribution of δ is broader, ranging from 154° to 171° with the peak at 162°. For<br />
the latter case, the average value is consistent with quantum mechanical calculations in<br />
vacuum of the isoalloxazine ring in the reduced state,[22,32] which give δ ≈ 160°.<br />
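One simple way to quantify such a bend is sketched below. The exact definition of δ is given in Figure 4.2 and is not reproduced here, so this sketch assumes a generic fold angle: the angle between the least-squares planes of the two outer rings on either side of the N5-N10 axis, with 180° corresponding to a planar ring system. The function names and the toy coordinates are hypothetical.

```python
import numpy as np

def plane_normal(points):
    """Unit normal of the least-squares plane through a set of 3D points."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]  # direction of smallest variance

def fold_angle(ring_a, ring_b):
    """Fold angle in degrees between two ring planes (180 = planar).

    The sign of each normal is arbitrary, so the absolute value of the dot
    product is used; bends of more than 90 degrees are not distinguished.
    """
    cos_between = abs(np.dot(plane_normal(ring_a), plane_normal(ring_b)))
    return 180.0 - np.degrees(np.arccos(np.clip(cos_between, 0.0, 1.0)))

# Toy geometry: two flat "wings" sharing a hinge along the y axis,
# the second one tilted out of the plane by 20 degrees.
tilt = np.deg2rad(20.0)
wing_a = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [-1.0, 0.0, 0.0], [-1.0, 1.0, 0.0]])
wing_b = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [np.cos(tilt), 0.0, np.sin(tilt)],
                   [np.cos(tilt), 1.0, np.sin(tilt)]])

print(fold_angle(wing_a, wing_b))
```

Applied frame by frame to a trajectory, the resulting angles could then be binned into a histogram to give a distribution of the kind discussed above.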
Figure 4.12: Distribution of (a) the δ angle calculated about the N5-N10 axis of the isoalloxazine ring<br />
along the 50 ns simulations of FOX (in black) and FHQ (in red), with a bin width of 0.3°, and (b) the<br />
beginning-to-end distance of the ribityl side chain along the 50 ns simulations of FOX and FHQ (bin<br />
width of 0.007 nm) and in the X-ray (in green) and NMR (in blue) homologous structures of the FMN<br />
domain.<br />
Figure 4.12b shows the distribution of the beginning-to-end distance of the ribityl side chain<br />
of the FMN cofactor in FOX, FHQ and the homologous structures of the FMN domain with the<br />
FMN cofactor in the oxidized state. For the crystallographic and NMR homologous structures,<br />
the distance ranges from 0.73 to 0.80 nm and from 0.70 to 0.81 nm, respectively. The distance<br />
observed in the crystal structure of the FMN domain was 0.77 nm. The distances obtained from<br />
the FOX and FHQ simulations are distributed in the range of 0.61 to 0.90 nm, with the main<br />
peaks at 0.78 nm and 0.80 nm, respectively. The main peaks of the simulated distributions thus<br />
fall within the range found in the NMR and crystallographic structures, so the simulations are<br />
consistent with the experimental data.<br />
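A distance distribution of this kind can be sketched as follows. The atoms that define the beginning and the end of the ribityl side chain are not specified in this section, so the two position arrays are hypothetical inputs, and the data used here are synthetic values generated only to make the sketch runnable. The 0.007 nm bin width is assumed from the caption of Figure 4.12.

```python
import numpy as np

def end_to_end_distances(first_atom, last_atom):
    """Distance per frame between two atoms; inputs have shape (n_frames, 3), in nm."""
    return np.linalg.norm(last_atom - first_atom, axis=1)

def distance_histogram(distances, bin_width=0.007, lower=0.60, upper=0.91):
    """Normalized histogram; returns the bin centers and the probability density."""
    edges = np.arange(lower, upper + bin_width, bin_width)
    density, edges = np.histogram(distances, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

# Synthetic positions: the chain end fluctuates around 0.78 nm from the start.
rng = np.random.default_rng(1)
start = np.zeros((5000, 3))
end = np.zeros((5000, 3))
end[:, 0] = rng.normal(loc=0.78, scale=0.03, size=5000)

distances = end_to_end_distances(start, end)
centers, density = distance_histogram(distances)
print("most populated bin (nm):", centers[np.argmax(density)])
```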
4.3.7. Cluster analysis <strong>of</strong> FMN c<strong>of</strong>actor<br />
The cluster analysis was performed on the heavy atoms of the FMN cofactor in the protein<br />
environment, using the crystal structure as reference and a cutoff of 0.04 nm. The analysis<br />
yielded 13 and 8 clusters in total for FOX and FHQ, with the first cluster comprising 87 % and<br />
99 % of the conformations, respectively. The cumulative number of clusters obtained from the<br />
different simulations as a function of time is reported in Figure S4.4 of the SI. Both<br />
simulations reached a plateau, which indicates sufficient sampling of the conformational space<br />
along the trajectories. The representative structures of the first cluster of FOX (in black) and<br />
FHQ (in red) are reported in Figure S4.4 in the SI. The conformational flexibility of the ribityl<br />
side chain was observed to be mainly responsible for the conformational diversity of the<br />
clusters in both simulations. The smaller number of clusters in the FHQ simulation is<br />
consistent with the smaller RMSF of the ribityl side chain. The preferred conformation of the<br />
FMN cofactor in both redox states is characterized by a partially elongated ribityl side chain<br />
(see Figure S4.4 in SI).<br />
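The clustering algorithm is not named in this section. The sketch below assumes a common neighbour-counting scheme with an RMSD cutoff: the structure with the most neighbours within the cutoff becomes the centre of a cluster, it is removed together with its neighbours, and the procedure is repeated on the remaining structures. The function names and the toy frames are hypothetical, and the frames are assumed to be already superimposed on the reference.

```python
import numpy as np

def pairwise_rmsd(frames):
    """RMSD matrix for superimposed frames of shape (n_frames, n_atoms, 3)."""
    diff = frames[:, None, :, :] - frames[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))

def neighbour_clustering(rmsd, cutoff=0.04):
    """Return a list of (centre index, member indices), largest cluster first."""
    remaining = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while remaining.any():
        # Neighbour relation restricted to structures not yet assigned.
        neighbours = (rmsd <= cutoff) & remaining[None, :] & remaining[:, None]
        centre = int(np.argmax(neighbours.sum(axis=1)))
        members = np.flatnonzero(neighbours[centre])
        clusters.append((centre, members))
        remaining[members] = False
    return clusters

# Toy data: eight identical frames and two frames shifted by 0.1 nm per axis.
frames = np.zeros((10, 5, 3))
frames[8:] += 0.1

clusters = neighbour_clustering(pairwise_rmsd(frames), cutoff=0.04)
populations = [len(members) / len(frames) for _, members in clusters]
print(len(clusters), populations)
```

The population of the first cluster, computed in this way, corresponds to the percentages quoted above for FOX and FHQ.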
4.3.8. Principal component analysis <strong>of</strong> FMN c<strong>of</strong>actor<br />
The first eight eigenvectors cover ~79 % of the total RPF in both simulations. In the protein<br />
environment, the first eigenvector alone accounts for ~25 % and ~27 % of the RPF in FOX and<br />
FHQ, respectively. The superimposition of the two extreme structures of the isoalloxazine ring<br />
generated by projecting the trajectory of FOX onto each of the first eight eigenvectors is<br />
shown in Figure 4.13. In the first eigenvector, a symmetric bending mode of the ring is present<br />
in both simulations. However, the reduced state (Figure 4.13 (1b)) manifests a “butterfly wing”<br />
bending mode around the N5-N10 axis. This type of vibrational mode has also been reported in<br />
previous experimental and quantum mechanical studies.[21,22,32] The other seven modes are<br />
similar in both states. The second most dominant motion is the twisting of the isoalloxazine<br />
ring along its main axis. The observed collective modes are in qualitative agreement with the<br />
vibrational normal modes obtained from resonance Raman spectroscopy measurements and QM<br />
calculations for lumiflavin.[23,32] In addition, surface enhanced resonance Raman scattering<br />
studies of the FMN cofactor, free and in the FMN domain, provide evidence for vibrational<br />
modes arising from the displacement of atoms such as C4, O, C4a, C10a, C5a and C9a.[33]<br />
These experimental observations are consistent with the higher fluctuations of these atoms<br />
observed in the PC modes of the first 8 eigenvectors of our simulations.<br />
Figure 4.13: Superimposition of the two extreme structures generated by projecting the FOX<br />
trajectory on each of the first eight eigenvectors (structures 1 - 8). 1a and 1b show the different<br />
collective motions of the 1st eigenvector in the oxidized and reduced states, respectively.<br />
4.4. Discussion <strong>and</strong> conclusions<br />
MD simulations have been performed on the FMN-binding reductase domain of the<br />
monooxygenase P450BM-3 with the FMN cofactor in the oxidized and reduced states, to<br />
understand the effect of the change in the protonation state of the isoalloxazine ring on the<br />
conformation and dynamics of the FMN domain and cofactor. The results of the simulations<br />
showed that the change of the protonation state in the reduced FMN affects the overall<br />
structure and dynamics of the FMN domain in solution. In particular, the structural and<br />
dynamic properties of the si-face FMN binding loop (Lβ3) are strongly influenced by the<br />
change in protonation of the FMN cofactor (in FOX). In the apo-protein, the overall local<br />
structure of the protein remains conserved, but higher fluctuations were observed in the FMN<br />
binding loops. The latter effect can explain the experimental finding of reversible rebinding<br />
of the FMN cofactor to the apo-protein.[31,34,35] The loop Lβ2 was observed to contribute<br />
mainly to the collective modes of the FMN domain, both as holo-protein and as apo-protein,<br />
which is also in agreement with the solution structure of the flavodoxin-like domain of E. coli<br />
determined by NMR.[35] The inner FMN binding loop (Lβ3) contributed to the prominent<br />
collective mode of the FMN domain in the oxidized state, while the outer FMN binding loop<br />
(Lβ4) contributed to the prominent collective mode of the FMN domain in the reduced state<br />
and in the apo-protein.<br />
In FHQ, a major conformational change of the FMN binding site residue Trp574 was<br />
observed. Trp574 is critical to FMN cofactor binding and electron transfer in P450BM-3. In<br />
FHQ, this residue does not remain coplanar with the isoalloxazine ring, which avoids the<br />
steric hindrance induced by the conformational change of the FMN cofactor upon protonation,<br />
as also suggested by 15N-NMR[31] and surface enhanced resonance Raman scattering[33]<br />
experiments. Hence, the change in the conformation of Trp574 might be the major factor<br />
that makes the reduced state kinetically unfavorable in P450BM-3 for transferring the<br />
electron from FMN to HEME, as observed in previous studies.[36]<br />
During the simulations, the FMN cofactor adopts different conformations that are mainly<br />
determined by the movement of the ribityl side chain. The binding region of the ribityl side<br />
chain is evolutionarily more conserved. In general, the oxidized state was observed to be more<br />
flexible and to adopt a wider variety of conformations in the protein environment. This might<br />
be the result of the change in the FMN binding site properties and hydrogen bond environment<br />
in the reduced state. The isoalloxazine ring of the FMN cofactor exhibits mainly 8 collective<br />
motions. Except for the first mode, all modes were similar in both redox states. In the reduced<br />
state, the first collective motion of the FMN cofactor is the so-called “butterfly motion”, due<br />
to the bending of the isoalloxazine ring along the N5-N10 axis.<br />
In summary, we have analyzed for the first time the dynamics of the FMN binding<br />
domain of P450BM-3 in water. In particular, we have studied the effect of FMN binding on<br />
the fluctuation modes of the FMN domain. The FMN cofactor is involved in the electron<br />
transfer in P450BM-3, and its dynamics can play an important role in this process. The results<br />
of our study indicate a difference in the fluctuation amplitude of the FMN cofactor in the<br />
different redox states. This effect results from the change in the conformation of the FMN<br />
binding site caused by the protonation state of the isoalloxazine ring.<br />
4.5. References<br />
1. Chefson A, Auclair K (2006) Progress towards the easier use <strong>of</strong> P450 enzymes. Mol<br />
Biosyst 2: 462-469.<br />
2. Wong LL (1998) Cytochrome P450 monooxygenases. Curr Opin Chem Biol 2: 263-<br />
268.<br />
3. Guengerich FP (2001) Common <strong>and</strong> uncommon cytochrome P450 reactions related<br />
to metabolism <strong>and</strong> chemical toxicity. Chem Res Toxicol 14: 611-650.<br />
4. Kumar S (2010) Engineering cytochrome P450 biocatalysts for biotechnology,<br />
medicine <strong>and</strong> bioremediation. Expert Opin Drug Metab Toxicol 6: 115-131.<br />
5. Warman AJ, Roitel O, Neeli R, Girvan HM, Seward HE, et al. (2005) Flavocytochrome<br />
P450 BM3: an update on structure <strong>and</strong> mechanism <strong>of</strong> a biotechnologically important<br />
enzyme. Biochem Soc Trans 33: 747-753.<br />
6. Munro AW, Leys DG, McLean KJ, Marshall KR, Ost TW, et al. (2002) P450 BM3: the<br />
very model <strong>of</strong> a modern flavocytochrome. Trends Biochem Sci 27: 250-257.<br />
7. Girvan HM, Waltham TN, Neeli R, Collins HF, McLean KJ, et al. (2006)<br />
Flavocytochrome P450 BM3 <strong>and</strong> the origin <strong>of</strong> CYP102 fusion species. Biochem Soc<br />
Trans 34: 1173-1177.<br />
8. Jung ST, Lauchli R, Arnold FH (2011) Cytochrome P450: taming a wild type enzyme.<br />
Curr Opin Biotechnol 22: 809–817.<br />
9. Whitehouse CJC, Bell SG, Wong L-L (2012) P450BM3 (CYP102A1): connecting the<br />
dots. Chem Soc Rev 41: 1218-1260.<br />
10. Sevrioukova I, Peterson JA (1996) Domain-domain interaction in cytochrome<br />
P450BM-3. Biochimie 78: 744-751.<br />
11. Sevrioukova I, Shaffer C, Ballou DP, Peterson JA (1996) Equilibrium <strong>and</strong> transient<br />
state spectrophotometric studies <strong>of</strong> the mechanism <strong>of</strong> reduction <strong>of</strong> the flavoprotein<br />
domain <strong>of</strong> P450BM-3. Biochemistry 35: 7058-7068.<br />
12. Sevrioukova I, Truan G, Peterson JA (1996) The flavoprotein domain <strong>of</strong> P450BM-3:<br />
Expression, purification, <strong>and</strong> properties <strong>of</strong> the flavin adenine dinucleotide- <strong>and</strong><br />
flavin mononucleotide-binding subdomains. Biochemistry 35: 7528-7535.<br />
13. Hazzard JT, Govindaraj S, Poulos TL, Tollin G (1997) Electron transfer between the<br />
FMN <strong>and</strong> heme domains <strong>of</strong> cytochrome P450BM-3. Effects <strong>of</strong> substrate <strong>and</strong> CO. J Biol<br />
Chem 272: 7922-7926.<br />
14. Narhi LO, Fulco AJ (1987) Identification <strong>and</strong> characterization <strong>of</strong> two functional<br />
domains in cytochrome P-450BM-3, a catalytically self-sufficient monooxygenase<br />
induced by barbiturates in Bacillus megaterium. J Biol Chem 262: 6683-6690.<br />
15. Pylypenko O, Schlichting I (2004) Structural aspects <strong>of</strong> lig<strong>and</strong> binding to <strong>and</strong><br />
electron transfer in bacterial <strong>and</strong> fungal P450s. Annu Rev Biochem 73: 991-1018.<br />
16. Chen HC, Swenson RP (2008) Effect <strong>of</strong> the Insertion <strong>of</strong> a Glycine Residue into the<br />
Loop Spanning Residues 536-541 on the Semiquinone State <strong>and</strong> Redox Properties <strong>of</strong><br />
the Flavin Mononucleotide-Binding Domain <strong>of</strong> Flavocytochrome P450BM-3 from<br />
Bacillus megaterium. Biochemistry 47: 13788-13799.<br />
17. Sevrioukova IF, Hazzard JT, Tollin G, Poulos TL (1999) The FMN to heme electron<br />
transfer in cytochrome P450BM-3 - Effect <strong>of</strong> chemical modification <strong>of</strong> cysteines<br />
engineered at the FMN-heme domain interaction site. J Biol Chem 274: 36097-<br />
36106.<br />
18. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure <strong>of</strong> a<br />
cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96:<br />
1863-1868.<br />
19. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />
Biomolecular Simulation: The GROMOS96 Manual <strong>and</strong> User Guide. VdF:<br />
Hochschulverlag AG an der ETH Zurich <strong>and</strong> BIOMOS bv, Zurich, Groningen: 1.<br />
20. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />
Biomolecular Simulation: The GROMOS96 Manual <strong>and</strong> User Guide. VdF:<br />
Hochschulverlag AG an der ETH Zurich <strong>and</strong> BIOMOS bv, Zurich, Groningen.<br />
21. Zheng Y-J, Ornstein RL (1996) A Theoretical Study <strong>of</strong> the Structures <strong>of</strong> Flavin in<br />
Different Oxidation <strong>and</strong> Protonation States. J Am Chem Soc 118: 9402-9408.<br />
22. Walsh JD, Miller AF (2003) Flavin reduction potential tuning by substitution <strong>and</strong><br />
bending. J Mol Struc-Theochem 623: 185-195.<br />
23. Abe M, Kyogoku Y (1987) Vibrational Analysis <strong>of</strong> Flavin Derivatives - Normal<br />
Coordinate Treatments <strong>of</strong> Lumiflavin. Spectrochim Acta A 43: 1027-1037.<br />
24. (2010) ACD/ChemSketch Freeware, version 12.01, Advanced Chemistry<br />
Development, Inc., Toronto, ON, Canada, www.acdlabs.com.<br />
25. Schmidtke P, Bidon-Chanal A, Luque FJ, Barril X (2011) MDpocket : Open Source<br />
Cavity Detection <strong>and</strong> Characterization on Molecular Dynamics Trajectories.<br />
Bioinformatics 27: 3276-3285.<br />
26. (2012) The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC,<br />
http://www.pymol.org/.<br />
27. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment<br />
search tool. J Mol Biol 215: 403-410.<br />
28. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Jr., Brice MD, et al. (1977) The<br />
Protein Data Bank: a computer-based archival file for macromolecular structures. J<br />
Mol Biol 112: 535-542.<br />
29. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF<br />
Chimera--a visualization system for exploratory research <strong>and</strong> analysis. J Comput<br />
Chem 25: 1605-1612.<br />
30. Kabsch W, S<strong>and</strong>er C (1983) Dictionary <strong>of</strong> protein secondary structure: pattern<br />
recognition <strong>of</strong> hydrogen-bonded <strong>and</strong> geometrical features. Biopolymers 22: 2577-<br />
2637.<br />
31. Kasim M, Chen HC, Swenson RP (2009) Functional characterization <strong>of</strong> the re-face<br />
loop spanning residues 536-541 <strong>and</strong> its interactions with the c<strong>of</strong>actor in the flavin<br />
mononucleotide-binding domain <strong>of</strong> flavocytochrome P450 from Bacillus<br />
megaterium. Biochemistry 48: 5131-5141.<br />
32. Nakai S, Yoneda F, Yamabe T (1999) Theoretical study on the lowest-frequency<br />
mode <strong>of</strong> the flavin ring. Theor Chem Acc 103: 109-116.<br />
33. Macdonald IDG, Smith WE, Munro AW (1999) Analysis <strong>of</strong> the structure <strong>of</strong> the flavin<br />
binding sites <strong>of</strong> flavocytochrome P450BM3 using surface enhanced resonance<br />
Raman scattering. Eur Biophys J 28: 437-445.<br />
34. Wittung-Stafshede P (2002) Role <strong>of</strong> c<strong>of</strong>actors in protein folding. Acc Chem Res 35:<br />
201-208.<br />
35. Sibille N, Blackledge M, Brutscher B, Coves J, Bersch B (2005) Solution structure <strong>of</strong><br />
the sulfite reductase flavodoxin-like domain from Escherichia coli. Biochemistry 44:<br />
9086-9095.<br />
36. Klein ML, Fulco AJ (1993) Critical residues involved in FMN binding <strong>and</strong> catalytic<br />
activity in cytochrome P450BM-3. J Biol Chem 268: 7553-7561.<br />
Part <strong>of</strong> this chapter is adapted with permission from ‘Verma R, Schwaneberg U,<br />
Roccatano D. Journal <strong>of</strong> Chemical Theory <strong>and</strong> Computation DOI: 10.1021/ct300723x.’<br />
PART II: P450BM-3 Reductase Domain SI<br />
Supporting Information<br />
Conformational Dynamics <strong>of</strong> the FMN-binding Reductase<br />
Domain <strong>of</strong> Monooxygenase P450BM-3<br />
Table S4.1: Force field parameters for FMN c<strong>of</strong>actor in oxidized state for GROMOS96 43a1<br />
force field.[1]<br />
Atom number Atom type Atom name Charge group Partial charge<br />
1 C FC9A 1 0.200<br />
2 NR FN10 1 -0.200<br />
3 C FC10A 2 0.360<br />
4 NR FN1 2 -0.360<br />
5 C FC2 3 0.380<br />
6 O FO2 3 -0.380<br />
7 NR FN3 4 -0.280<br />
8 H FH3 4 0.280<br />
9 C FC4 5 0.380<br />
10 O FO4 5 -0.380<br />
11 C FC4A 6 0.180<br />
12 NR FN5 6 -0.280<br />
13 C FC5A 6 0.100<br />
14 CR1 FC6 7 0.000<br />
15 C FC7 8 0.000<br />
16 CH3 FCM7 8 0.000<br />
17 C FC8 9 0.000<br />
18 CH3 FCM8 9 0.000<br />
19 CR1 FC9 10 0.000<br />
20 CH2 FCA 11 0.000<br />
21 CH1 FCB 12 0.150<br />
22 OA FOB 12 -0.548<br />
23 H FHB 12 0.398<br />
24 CH1 FCG 13 0.150<br />
25 OA FOG 13 -0.548<br />
26 H FHG 13 0.398<br />
27 CH1 FCD 14 0.150<br />
28 OA FOD 14 -0.548<br />
29 H FHD 14 0.398<br />
30 CH2 FCE 15 0.150<br />
31 OA FOZ 15 -0.36<br />
32 P FPH 15 0.630<br />
33 OA FOH 15 -0.548<br />
34 H FHH 15 0.398<br />
35 OM FOT1 15 -0.635<br />
36 OM FOT2 15 -0.635<br />
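As a quick consistency check of the atom list above, the partial charges can be summed per charge group. The sketch below copies the charge-group and partial-charge columns of Table S4.1; the variable and function names are hypothetical, and the check itself is an illustration added here rather than part of the original parametrization procedure.

```python
from collections import defaultdict

# (charge group, partial charge) pairs copied from the atom list of Table S4.1,
# in the order of atom numbers 1 to 36.
FMN_OX_CHARGES = [
    (1, 0.200), (1, -0.200), (2, 0.360), (2, -0.360),
    (3, 0.380), (3, -0.380), (4, -0.280), (4, 0.280),
    (5, 0.380), (5, -0.380), (6, 0.180), (6, -0.280), (6, 0.100),
    (7, 0.000), (8, 0.000), (8, 0.000), (9, 0.000), (9, 0.000),
    (10, 0.000), (11, 0.000),
    (12, 0.150), (12, -0.548), (12, 0.398),
    (13, 0.150), (13, -0.548), (13, 0.398),
    (14, 0.150), (14, -0.548), (14, 0.398),
    (15, 0.150), (15, -0.360), (15, 0.630), (15, -0.548),
    (15, 0.398), (15, -0.635), (15, -0.635),
]

def charge_group_sums(entries):
    """Sum the partial charges of each charge group."""
    sums = defaultdict(float)
    for group, charge in entries:
        sums[group] += charge
    return dict(sums)

group_sums = charge_group_sums(FMN_OX_CHARGES)
net_charge = sum(charge for _, charge in FMN_OX_CHARGES)
print({group: round(total, 3) for group, total in group_sums.items()})
print("net charge:", round(net_charge, 3))
```

With the values listed in the table, charge groups 1 to 14 are neutral and the whole net charge of the cofactor is carried by the phosphate-containing group 15.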
Dihedral parameters<br />
ai aj ak al function c0 c1 c2<br />
13 2 1 3 1 180.000 33.5 2<br />
1 2 3 11 1 180.000 33.5 2<br />
3 11 12 13 1 180.000 33.5 2<br />
11 13 12 1 1 180.000 33.5 2<br />
2 20 21 24 1 0.000 5.86 3<br />
20 22 21 23 1 0.000 1.26 3<br />
20 21 24 27 1 0.000 5.86 3<br />
21 24 25 26 1 0.000 1.26 3<br />
21 24 27 30 1 0.000 5.86 3<br />
24 27 28 29 1 0.000 1.26 3<br />
24 30 27 31 1 0.000 5.86 3<br />
27 30 31 32 1 0.000 3.77 3<br />
30 31 32 33 1 0.000 1.05 3<br />
31 33 32 34 1 0.000 1.05 3<br />
Improper dihedral parameters<br />
ai aj ak al function c0 c1<br />
1 2 19 13 2 0.0 167.42<br />
1 12 14 13 2 0.0 167.42<br />
1 13 14 15 2 0.0 167.42<br />
1 19 17 15 2 0.0 167.42<br />
1 2 3 4 2 180.0 167.42<br />
1 12 13 11 2 0.0 167.42<br />
1 14 13 12 2 180.0 167.42<br />
1 2 3 11 2 0.0 167.42<br />
1 17 19 18 2 180.0 167.42<br />
2 1 13 12 2 0.0 167.42<br />
2 11 3 12 2 0.0 167.42<br />
2 1 13 14 2 180.0 167.42<br />
2 3 11 9 2 180.0 167.42<br />
2 19 1 17 2 180.0 167.42<br />
2 4 3 5 2 180.0 167.42<br />
2 1 3 20 2 0.0 167.42<br />
3 2 1 19 2 180.0 167.42<br />
3 5 4 7 2 0.0 167.42<br />
3 4 5 6 2 180.0 167.42<br />
3 11 9 7 2 0.0 167.42<br />
3 2 4 11 2 0.0 167.42<br />
3 12 9 11 2 0.0 167.42<br />
3 11 12 13 2 0.0 167.42<br />
3 1 2 13 2 0.0 167.42<br />
3 9 11 10 2 180.0 167.42<br />
4 11 3 9 2 0.0 167.42<br />
4 5 7 9 2 0.0 167.42<br />
4 5 7 8 2 180.0 167.42<br />
4 3 2 11 2 180.0 167.42<br />
4 11 3 12 2 180.0 167.42<br />
5 4 7 6 2 0.0 167.42<br />
5 4 3 11 2 0.0 167.42<br />
5 9 7 11 2 0.0 167.42<br />
5 9 7 10 2 180.0 167.42<br />
6 7 5 8 2 0.0 167.42<br />
6 5 7 9 2 180.0 167.42<br />
7 9 5 8 2 0.0 167.42<br />
7 9 10 11 2 180.0 167.42<br />
8 7 9 20 2 0.0 167.42<br />
8 7 9 11 2 180.0 167.42<br />
8 10 11 12 2 180.0 167.42<br />
9 7 11 10 2 0.0 167.42<br />
9 12 11 13 2 180.0 167.42<br />
12 1 13 19 2 180.0 167.42<br />
12 11 9 7 2 180.0 167.42<br />
12 13 14 15 2 180.0 167.42<br />
13 19 1 17 2 0.0 167.42<br />
13 15 14 17 2 0.0 167.42<br />
13 15 14 16 2 180.0 167.42<br />
13 12 1 14 2 0.0 167.42<br />
14 12 13 11 2 180.0 167.42<br />
14 1 13 19 2 0.0 167.42<br />
14 17 15 19 2 0.0 167.42<br />
14 15 17 18 2 180.0 167.42<br />
15 17 14 16 2 0.0 167.42<br />
16 15 17 18 2 0.0 167.42<br />
17 15 19 18 2 0.0 167.42<br />
19 17 15 16 2 180.0 167.42<br />
20 24 22 21 2 35.0 167.42<br />
21 27 25 24 2 35.0 167.42<br />
24 30 28 27 2 35.0 167.42<br />
Table S4.2: Force field parameters for the FMN cofactor in the reduced state for the GROMOS96 43a1<br />
force field.[1]<br />
Atom number Atom type Atom name Charge group Partial charge<br />
1 C FC9A 1 0.1<br />
2 NR FN10 1 -0.2<br />
3 C FC10A 1 0.1<br />
4 NR FN1 2 -0.28<br />
5 H FH1 2 0.28<br />
6 C FC2 3 0.38<br />
7 O FO2 3 -0.38<br />
8 NR FN3 4 -0.28<br />
9 H FH3 4 0.28<br />
10 C FC4 5 0.38<br />
11 O FO4 5 -0.38<br />
12 C FC4A 6 0.00<br />
13 NR FN5 7 -0.28<br />
14 H FH5 7 0.28<br />
15 C FC5A 8 0.00<br />
16 CR1 FC6 9 0.00<br />
17 C FC7 10 0.00<br />
18 CH3 FCM7 10 0.00<br />
19 C FC8 11 0.00<br />
20 CH3 FCM8 11 0.00<br />
21 CR1 FC9 12 0.00<br />
22 CH2 FCA 13 0.00<br />
23 CH1 FCB 14 0.15<br />
24 OA FOB 14 -0.548<br />
25 H FHB 14 0.398<br />
26 CH1 FCG 14 0.15<br />
27 OA FOG 15 -0.548<br />
28 H FHG 15 0.398<br />
29 CH1 FCD 15 0.15<br />
30 OA FOD 16 -0.548<br />
31 H FHD 16 0.398<br />
32 CH2 FCE 16 0.15<br />
33 OA FOZ 17 -0.36<br />
34 P FPH 17 0.63<br />
35 OA FOH 17 -0.548<br />
36 H FHH 17 0.398<br />
37 OM FOT1 17 -0.635<br />
38 OM FOT2 17 -0.635<br />
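A simple consistency check on the atom list of Table S4.2 is to sum the partial charges.<br />
The sketch below copies the 38 listed values into a Python list (the variable names are<br />
illustrative and not part of the thesis) and adds them up; as transcribed here, the values<br />
sum to -1.0 e.<br />

```python
# Partial charges (e) of the reduced FMN cofactor, copied in atom order
# (atoms 1-38) from Table S4.2. The list name is illustrative only.
partial_charges = [
    0.1, -0.2, 0.1,            # FC9A, FN10, FC10A
    -0.28, 0.28,               # FN1, FH1
    0.38, -0.38,               # FC2, FO2
    -0.28, 0.28,               # FN3, FH3
    0.38, -0.38,               # FC4, FO4
    0.00,                      # FC4A
    -0.28, 0.28,               # FN5, FH5
    0.00, 0.00, 0.00, 0.00,    # FC5A, FC6, FC7, FCM7
    0.00, 0.00, 0.00, 0.00,    # FC8, FCM8, FC9, FCA
    0.15, -0.548, 0.398,       # FCB, FOB, FHB
    0.15, -0.548, 0.398,       # FCG, FOG, FHG
    0.15, -0.548, 0.398,       # FCD, FOD, FHD
    0.15, -0.36, 0.63,         # FCE, FOZ, FPH
    -0.548, 0.398,             # FOH, FHH
    -0.635, -0.635,            # FOT1, FOT2
]

# Net charge of the cofactor as listed in the table.
total_charge = sum(partial_charges)
print(f"{len(partial_charges)} atoms, net charge = {total_charge:+.3f} e")
```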
Dihedral parameters<br />
ai aj ak al function c0 c1 c2<br />
15 1 3 2 1 180.00 33.5 2<br />
1 12 2 3 1 180.00 33.5 2<br />
3 15 12 13 1 180.00 33.5 2<br />
12 13 1 15 1 180.00 33.5 2<br />
2 22 23 26 1 0.00 5.86 2<br />
22 24 23 25 1 0.00 1.26 2<br />
22 23 26 29 1 0.00 5.86 2<br />
23 26 27 28 1 0.00 1.26 2<br />
23 26 30 29 1 0.00 5.86 2<br />
26 29 30 31 1 0.00 1.26 2<br />
26 32 29 33 1 0.00 5.86 2<br />
29 32 33 34 1 0.00 3.77 2<br />
32 33 34 35 1 0.00 1.05 2<br />
33 35 34 36 1 0.00 1.05 2<br />
Improper dihedral parameters<br />
ai aj ak al function c0 c1<br />
1 15 2 21 2 5 167.42<br />
1 13 16 15 2 0 167.42<br />
1 15 16 17 2 0 167.42<br />
1 21 19 17 2 0 167.42<br />
1 2 3 4 2 160 167.42<br />
1 13 15 12 2 50 167.42<br />
1 16 15 13 2 180 167.42<br />
1 3 2 12 2 50 167.42<br />
1 19 21 20 2 180 167.42<br />
2 1 15 13 2 60 167.42<br />
2 12 3 13 2 60 167.42<br />
2 15 1 16 2 180 167.42<br />
2 3 12 10 2 180 167.42<br />
2 21 1 19 2 180 167.42<br />
2 4 3 6 2 180 167.42<br />
2 3 1 22 2 5 167.42<br />
3 1 2 21 2 160 167.42<br />
3 6 4 8 2 0 167.42<br />
3 4 6 7 2 180 167.42<br />
3 12 10 8 2 0 167.42<br />
3 2 4 12 2 5 167.42<br />
3 13 10 12 2 0 167.42<br />
3 12 13 15 2 50 167.42<br />
3 1 2 15 2 50 167.42<br />
4 12 3 10 2 0 167.42<br />
4 6 8 10 2 0 167.42<br />
4 6 8 9 2 180 167.42<br />
4 3 2 12 2 180 167.42<br />
4 12 3 13 2 180 167.42<br />
5 6 4 3 2 180 167.42<br />
6 4 8 7 2 0 167.42<br />
6 4 3 12 2 0 167.42<br />
6 10 8 12 2 0 167.42<br />
6 10 8 11 2 180 167.42<br />
7 8 6 9 2 0 167.42<br />
7 6 8 10 2 180 167.42<br />
8 10 6 9 2 0 167.42<br />
8 10 11 12 2 180 167.42<br />
9 8 10 11 2 0 167.42<br />
9 8 10 12 2 180 167.42<br />
10 8 12 11 2 0 167.42<br />
10 13 12 15 2 160 167.42<br />
11 12 10 3 2 180 167.42<br />
13 1 15 21 2 180 167.42<br />
13 10 12 8 2 180 167.42<br />
13 16 15 17 2 180 167.42<br />
14 15 13 12 2 135 167.42<br />
15 21 1 19 2 0 167.42<br />
15 17 16 19 2 0 167.42<br />
15 16 17 18 2 180 167.42<br />
15 13 1 16 2 0 167.42<br />
16 15 13 12 2 160 167.42<br />
16 1 15 21 2 0 167.42<br />
16 19 17 21 2 0 167.42<br />
16 17 19 20 2 180 167.42<br />
17 19 16 18 2 0 167.42<br />
18 17 19 20 2 0 167.42<br />
19 17 21 20 2 0 167.42<br />
21 19 17 18 2 180 167.42<br />
22 26 24 23 2 35 334.84<br />
23 29 27 26 2 35 334.84<br />
26 32 30 29 2 35 334.84<br />
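The column layout of the dihedral tables (ai aj ak al function c0 c1 c2) resembles a GROMACS<br />
topology; that convention is assumed here and is not stated in the tables themselves. Under<br />
this assumption, function 1 is a periodic proper dihedral, V = k (1 + cos(n φ - φs)), with c0 = φs<br />
(degrees), c1 = k (kJ/mol) and c2 = n, and function 2 is a harmonic improper dihedral,<br />
V = 0.5 k (ξ - ξ0)^2, with c0 = ξ0 (degrees) and c1 = k (kJ/mol/rad^2). A minimal sketch with<br />
illustrative function names:<br />

```python
import math

def proper_dihedral_energy(phi_deg, phase_deg, k, multiplicity):
    """Periodic dihedral (assumed GROMACS type 1): k * (1 + cos(n*phi - phase))."""
    angle = math.radians(multiplicity * phi_deg - phase_deg)
    return k * (1.0 + math.cos(angle))

def improper_dihedral_energy(xi_deg, xi0_deg, k):
    """Harmonic improper (assumed GROMACS type 2): 0.5 * k * (xi - xi0)^2, xi in rad."""
    # Wrap the deviation into [-180, 180) so that 179 and -179 degrees are 2 degrees apart.
    delta = (xi_deg - xi0_deg + 180.0) % 360.0 - 180.0
    return 0.5 * k * math.radians(delta) ** 2

# Example values taken from the tables: a planar improper (xi0 = 0, k = 167.42)
# distorted by 10 degrees, and a torsion with phase 0, k = 5.86, n = 2
# evaluated at phi = 0 and phi = 90 degrees.
e_improper = improper_dihedral_energy(10.0, 0.0, 167.42)
e_proper_0 = proper_dihedral_energy(0.0, 0.0, 5.86, 2)
e_proper_90 = proper_dihedral_energy(90.0, 0.0, 5.86, 2)
print(round(e_improper, 2), round(e_proper_0, 2), round(e_proper_90, 2))
```

With these forms, the force constant of 334.84 used for the last three impropers is exactly twice<br />
the value of 167.42 used for the ring impropers.<br />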
Figure S4.1: Secondary structure per residue calculated by DSSP[2] along the trajectory as a<br />
function of time for (a) FOX, (b) FHQ and (c) APO. The color code represents the different secondary<br />
structure elements.<br />
Figure S4.2: (a) Multiple structure alignment of the FMN domain (1BVY) and its homologous<br />
structures (summarized in Table 2) with the conservation profile, root mean square deviation<br />
(RMSD) and charge variation per residue, created using Chimera[3]. Green and yellow<br />
boxes show the helices and beta strands, respectively, while the purple boxes represent<br />
the residues involved in FMN binding. (b) The phylogenetic tree of the FMN domain and its homologous<br />
structures generated by ClustalW2[4].<br />
Figure S4.3: Evolutionary conservation profile of the FMN domain of P450BM-3 using the<br />
RWB color scheme. Red regions show the highly conserved residues in the domain. The FMN<br />
domain is shown in cartoon representation with the FMN cofactor in green and with labeled<br />
helices and loop regions. FMN binding loops are labeled in blue. N and C represent<br />
the amino and carboxy termini of the FMN domain.<br />
Figure S4.4: Cumulative sum of the number of clusters obtained from the simulations. The<br />
sampling of clusters was performed over 50 ns of FOX (black) and FHQ (red) using an RMSD cutoff of<br />
0.04 nm. The representative conformations of the FMN cofactor in the first cluster of FOX (black) and<br />
FHQ (red) are shown.<br />
References<br />
1. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />
Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF:<br />
Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen.<br />
2. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern<br />
recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.<br />
3. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF<br />
Chimera--a visualization system for exploratory research and analysis. J Comput<br />
Chem 25: 1605-1612.<br />
4. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) Clustal<br />
W and Clustal X version 2.0. Bioinformatics 23: 2947-2948.<br />
Part of this chapter is adapted with permission from ‘Verma R, Schwaneberg U,<br />
Roccatano D. Journal of Chemical Theory and Computation DOI: 10.1021/ct300723x.’<br />
PART II: P450BM-3 HEME/FMN Complex<br />
Chapter 5<br />
Insight into the redox partner interaction mechanism in<br />
cytochrome P450BM-3 using molecular dynamics<br />
simulation<br />
5.1. Abstract<br />
Flavocytochrome P450BM-3 is a soluble bacterial reductase composed of two flavin<br />
(FAD/FMN) domains and one HEME domain. Understanding the atomic details of the<br />
inter-domain electron transfer (ET) mechanism is a prerequisite for better exploitation of<br />
the enzyme in biotechnological applications and for extending the knowledge of the P450<br />
protein family in general. In this paper, we have performed molecular dynamics (MD)<br />
simulations of the FMN and HEME domains, both isolated and in their crystallographic<br />
complex, to study their binding modes and to gain insight into the structural determinants<br />
of inter-domain ET. In the simulation of the complex, we observed conformational<br />
rearrangements in both domains that reduce the separation between the FMN and HEME<br />
cofactors. In particular, the closest FMN/HEME distance decreased from 1.84 nm (in the<br />
crystal structure) to an average of 1.41 ± 0.09 nm during the simulation, with a minimum<br />
distance of 1.02 nm. These distances are within the range for ET tunneling between the two<br />
redox centers. The analysis of the possible ET pathways in the crystal complex indicates<br />
Met490 of the ribityl tail binding loop of the FMN domain, and Ala399 and Cys400 of the HEME<br />
domain, as possible mediators of ET. However, during simulation, at ~1.41 nm FMN/HEME<br />
distance, in spite <strong>of</strong> Ala399, Cys400, Phe393 along with Met490 take part in ET while, with<br />
the minimum FMN/HEME distance of ~1.02 nm, only Met490 mediates the ET tunneling.<br />
The results of the simulations are in agreement with the previously proposed hypothesis that<br />
the crystal complex of the FMN/HEME domains is not in the optimal arrangement for the<br />
electron transfer rate needed under physiological conditions.<br />
5.2. Introduction<br />
Cytochrome P450s are regio- and stereoselective monooxygenases of the oxidoreductase<br />
family that act on a wide variety of substrates.[1-3] They have been studied as<br />
potential catalysts for the production of high-value oxygenated organic molecules through<br />
enzyme-mediated product formation.[4-6] In particular, cytochrome P450BM-3 is<br />
an NADPH-dependent fatty acid hydroxylase system isolated from the soil bacterium Bacillus<br />
megaterium.[7,8] This enzyme is an attractive target and model system for biochemical and<br />
biomedical applications for several reasons. First, it is a stable, catalytically self-sufficient<br />
protein with a convenient multidomain structure that allows easier production and<br />
handling than other monooxygenases of the same family. Second, it is a water-soluble<br />
enzyme with high catalytic efficiency and oxygenase rate that is readily expressed<br />
recombinantly.[9,10] Third, it resembles eukaryotic diflavin reductase systems such as the human<br />
microsomal P450s. As a pivotal member of its superfamily, it has been widely studied as an<br />
important model system for understanding structure-function-dynamics<br />
relationships, supported by a wealth of structural and kinetic data.[11,12]<br />
P450BM-3 is a multidomain protein with two reductase domains, the flavin adenine<br />
dinucleotide (FAD)- and flavin mononucleotide (FMN)-binding domains, and a HEME<br />
domain, arranged as HEME-FMN-FAD on a single polypeptide chain.[13,14] The main<br />
catalytic function of P450s is to transfer an oxygen atom from molecular oxygen to their<br />
substrates. During the reaction, the enzyme is reduced by NADPH, with electrons first<br />
transferred to the FAD cofactor of the FAD-binding domain and then, mediated by the FMN<br />
cofactor of the FMN-binding domain, to the HEME iron in the substrate-bound HEME domain. The<br />
crystallization of the whole P450BM-3 protein has proven difficult due to the<br />
presence of flexible linker regions between the domains. However, the crystallographic<br />
structures of the isolated HEME domain[15], the FAD domain[16] and a non-stoichiometric<br />
complex with one FMN and two HEME domains[15] are available in the PDB. In<br />
the FMN/HEME complex (PDB ID: 1BVY) the smallest edge-to-edge distance between the redox<br />
centers is 1.81 nm.[15] However, a survey of electron transfer<br />
(ET) in oxidoreductase protein structures has shown that this distance should be less than 1.40 nm for<br />
efficient ET tunneling between redox centers in the protein environment.[17] Munro et al.<br />
used a modeling approach to rationalize the electron transfer from FMN to HEME and<br />
postulated that movement of the FMN domain is essential to bring the distance between the<br />
FMN and HEME cofactors within the physiological range (less than 1.40 nm) for ET.[11]<br />
In this study, we aim to extend our knowledge of the structure-function-dynamics<br />
relationships in P450BM-3 at the atomistic level using molecular dynamics (MD)<br />
simulations of the isolated HEME and FMN domains and of their complex in water. It has<br />
been shown experimentally that the specific arrangement of the HEME and FMN domains is<br />
responsible for the catalytic efficiency and high oxygenase rate of P450BM-3.[18] In this<br />
paper, the dynamics in solution of the complex and of the isolated HEME and<br />
FMN domains are compared for the first time. In particular, we analyze the relative rearrangement<br />
of the FMN/HEME domains and how it affects the ET pathways from the isoalloxazine ring<br />
of the FMN cofactor to the HEME iron.<br />
The chapter is organized as follows. The details of the MD simulations and the<br />
analysis of the trajectories are reported in the Methods section. In the first part of the<br />
Results and discussion section, the general structural properties of the<br />
simulated systems are reported to assess the quality of the simulations. Cluster analysis is<br />
then used to identify representative structures that highlight the differences between the domains in<br />
solution and in the complex. The following paragraphs focus on the ET pathways<br />
between the FMN and HEME cofactors, calculated on selected conformations from the cluster analysis<br />
of the trajectory. The structural behavior of the substrate access channel is also<br />
reported. Then, the collective dynamics of the system is analyzed using<br />
principal component analysis of the trajectories. Finally, the conclusion section provides a<br />
summary of the outcome of the study.<br />
5.3. Methods<br />
5.3.1. Starting coordinates<br />
The non-stoichiometric FMN/(HEME)2 complex (one FMN and two HEME domains) without<br />
substrate (PDB ID: 1BVY, resolution 0.203 nm)[15] was used to obtain the starting<br />
coordinates for the MD simulations. Of the two HEME domains, one (chain A: residues 20 - 450) was in close<br />
proximity to the FMN domain (chain F: residues 479 - 630) in the crystal structure. Hence, the A and<br />
F chains were extracted from the crystal structure (including crystallographic water within<br />
0.60 nm of the proteins) and used as starting coordinates for the MD simulations.<br />
1,2-Ethanediol molecules were removed from the crystallographic structure and replaced by<br />
water molecules.<br />
5.3.2. Molecular dynamics simulations<br />
The GROMOS96 43a1 force field[19] was used for all simulations. The MD<br />
simulations performed in this study are summarized in Table 5.1. Figure 5.1 shows the<br />
FMN and HEME cofactors in stick representation. The parameters for the ferric iron of the<br />
HEME cofactor were adopted from Helms et al.[20] and have already been employed for the<br />
MD simulation of the P450BM-3 HEME domain by Roccatano et al.[21,22] The partial charges<br />
were redistributed on the porphyrin ring of the HEME cofactor to adapt the parameters to the<br />
GROMOS96 43a1 force field[19], with hydrogen atoms bound to the bridging carbons of the<br />
porphyrin ring (see Table S5.1 in Supplementary Information (SI)). The FMN cofactor was in the<br />
oxidized state in the FMN domain. Additional improper dihedrals were introduced to reproduce<br />
the conformation of the isoalloxazine ring as observed in the crystallographic structure and in<br />
molecular geometry optimizations of flavin in both redox states.[23-25]<br />
Table 5.1: Summary of the MD simulations of P450BM-3 in water<br />
Starting coordinates | No. of atoms | No. of solvent molecules | No. of counter ions (Na+) | Simulation length (ns)<br />
HEME Domain (A chain) | 65650 | 20365 | 16 | 100<br />
FMN Domain (F chain) | 33483 | 10650 | 14 | 100<br />
Complex (AF chain) | 86101 | 26671 | 30 | 100<br />
*The abbreviations A, F and AF chain are used in the rest of the paper for the HEME domain, the FMN domain and the<br />
HEME/FMN complex, respectively.<br />
Figure 5.1: (a) HEME cofactor and (b) FMN cofactor in stick representation, colored by<br />
element (oxygen in red, nitrogen in blue, hydrogen in green, iron or phosphorus in orange<br />
and carbon in gray), with atom labels according to the GROMOS96[19] topology.<br />
5.3.3. Electron transfer tunneling<br />
Electron transfer (ET) tunneling from the FMN to the HEME cofactor was calculated with the program<br />
Pathways.[26,27] The method uses graph theory to calculate the donor to acceptor partial electronic<br />
coupling as influenced by the protein structure and to identify the electron transfer<br />
pathways in biological electron transfer reactions.[26] The FMN to HEME ET pathway<br />
was identified in the crystal structure and in the P450BM-3 conformations after rearrangement by<br />
taking the C8 atom of the isoalloxazine ring of the FMN cofactor as donor and the HEME iron as acceptor,<br />
with the default parameters of Pathways. The ET pathways were visualized with VMD.[28]<br />
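The graph-theoretical idea behind this kind of calculation can be illustrated with a toy example.<br />
In the sketch below each contact between atoms carries a decay factor, the coupling along a path is<br />
the product of its decay factors, and the strongest path is found with Dijkstra's algorithm on the<br />
weights -ln(decay). The decay functions use the values commonly quoted for the Pathways model<br />
(0.6 per covalent bond, distance-dependent penalties for hydrogen bonds and through-space jumps,<br />
distances in Å); they are assumed here for illustration and are not taken from the thesis, and the<br />
function names, atom labels and toy graph are hypothetical. This is not the Pathways plugin itself.<br />

```python
import heapq
import math

def covalent_decay():
    # Assumed Pathways-model factor for one covalent bond.
    return 0.6

def hbond_decay(r_angstrom):
    # Assumed factor for a hydrogen bond of heavy-atom distance r (in Angstrom).
    return 0.36 * math.exp(-1.7 * (r_angstrom - 2.8))

def through_space_decay(r_angstrom):
    # Assumed factor for a through-space jump of length r (in Angstrom).
    return 0.6 * math.exp(-1.7 * (r_angstrom - 1.4))

def best_pathway(edges, donor, acceptor):
    """Return (coupling, path) for the strongest donor-to-acceptor pathway.

    edges maps (atom_u, atom_v) to a decay factor in (0, 1]; the graph is
    treated as undirected. Maximising the product of decay factors equals
    minimising the sum of -ln(decay), which Dijkstra's algorithm handles.
    """
    graph = {}
    for (u, v), eps in edges.items():
        weight = -math.log(eps)
        graph.setdefault(u, []).append((v, weight))
        graph.setdefault(v, []).append((u, weight))

    dist = {donor: 0.0}
    prev = {}
    heap = [(0.0, donor)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        if u == acceptor:
            break
        for v, weight in graph.get(u, []):
            nd = d + weight
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if acceptor not in dist:
        return 0.0, []  # donor and acceptor are not connected

    path = [acceptor]
    while path[-1] != donor:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[acceptor]), path

# Toy graph: donor D and acceptor A are linked through a bridging atom X by
# two covalent steps, or directly by a 4.0 Angstrom through-space jump.
toy_edges = {
    ("D", "X"): covalent_decay(),
    ("X", "A"): covalent_decay(),
    ("D", "A"): through_space_decay(4.0),
}
coupling, path = best_pathway(toy_edges, "D", "A")
print(path, coupling)
```

In this toy case the two-bond route through X is preferred over the direct through-space jump,<br />
because the exponential distance penalty of the jump outweighs the cost of an extra covalent step.<br />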
5.4. Results and discussion<br />
5.4.1. Structural properties<br />
The structural stability and convergence of the P450BM-3 domains were examined by<br />
analyzing the root mean square deviation (RMSD), the radius of gyration (Rg) and the secondary<br />
structure elements with respect to the crystal structure during the MD simulations. Figure 5.2a<br />
shows the backbone RMSD of the proteins as a function of time. The RMSD curves of<br />
the AF complex and of the A chain within the complex reach a plateau with average values of 0.41 ±<br />
0.03 nm and 0.36 ± 0.03 nm, respectively. The RMSD of the isolated A chain<br />
plateaus at a slightly lower value of 0.33 ± 0.02 nm. The RMSD of the F chain in the complex increases<br />
to an average of 0.25 ± 0.02 nm over the last 10 ns of simulation. The RMSD of the<br />
isolated F chain increases rapidly and stabilizes after 10 ns of simulation at an average value<br />
of 0.26 ± 0.02 nm. Figure 5.2b reports the radius of gyration. In the first 10 ns<br />
of the simulation, the Rg of the complex decreases by ~3.7% from the crystallographic<br />
value (2.42 nm) to an average value of 2.33 ± 0.01 nm. The Rg of the A<br />
domain in the complex and in solution varies by less than 1.8% with respect to the initial<br />
structure (2.16 nm), with averages of 2.12 ± 0.01 nm and 2.14 ± 0.01 nm, respectively. The F chain shows<br />
no variation from the crystal structure (1.45 nm), with an average of 1.45 ± 0.01 nm and 1.46 ±<br />
0.01 nm for the isolated F chain and the complex simulation, respectively.<br />
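For reference, the two quantities discussed above can be written down in a few lines. The sketch<br />
below is a minimal pure-Python illustration, assuming the coordinates have already been superimposed<br />
on the reference (no fitting step is included) and that the radius of gyration is mass-weighted; the<br />
function names and test coordinates are illustrative and do not come from the thesis. The last line<br />
reproduces the ~3.7% decrease in Rg quoted for the complex.<br />

```python
import math

def rmsd(coords, reference):
    """RMSD between two equally sized, already superimposed coordinate sets."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords))

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    total_mass = sum(masses)
    centre = [
        sum(m * p[i] for m, p in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    squared = sum(
        m * sum((p[i] - centre[i]) ** 2 for i in range(3))
        for m, p in zip(masses, coords)
    )
    return math.sqrt(squared / total_mass)

def percent_change(initial, final):
    """Relative change from initial to final, in percent."""
    return 100.0 * (final - initial) / initial

# Illustrative coordinates (nm): a rigid shift of 0.3 nm along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.3, y, z) for x, y, z in reference]

rmsd_value = rmsd(shifted, reference)
rg_value = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0])
rg_change = percent_change(2.42, 2.33)  # Rg of the complex: crystal vs. average
print(rmsd_value, rg_value, round(rg_change, 1))
```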
Figure 5.2: (a) Backbone RMSD and (b) Rg with respect to crystal structure as a function of time<br />
for AF chain (black), A of AF chain (red), F of AF chain (green), A chain (blue) and F chain (orange).<br />
In P450BM-3, the A and F chains have the structurally conserved P450 and flavodoxin-like<br />
protein folds, respectively. Figure 5.3c shows the structure with labeled helices of the A (A to L)<br />
and F (α1 to α4) chains and the FMN binding loops (Lβ1, Lβ3 and Lβ4). The loop regions,<br />
together with irregular structures (coils and turns), are named according to the secondary<br />
structure element (α helix or β sheet) preceding them. DSSP criteria[29] were used to<br />
follow the secondary structure of the P450BM-3 domains in the isolated and complex MD<br />
simulations (Figure S5.1 in SI). The secondary structure remains fairly conserved during<br />
the simulations.<br />
Figure 5.3a and 5.3b show the per-residue RMSD and RMSF with respect to the crystal<br />
structure, respectively. The regions involved in cofactor binding show smaller deviations<br />
and fluctuations from the crystal structure in both the isolated (red) and complex (black)<br />
simulations. For both domains, the loop regions and the N- and C-termini show higher<br />
deviations. The isolated domains deviate more than those in the complex, except in the regions<br />
between helices A - B, B’ - C, H - I and K - L and in the G helix of the A chain, and in Lβ3 of the F chain.<br />
The isolated F chain shows the largest deviations in the Lβ2 and Lβ4 regions. In both systems, the F chain<br />
shows higher fluctuations in the Lβ2 and Lα2 loops. In the complex simulation, the loop regions<br />
A/B and F/G fluctuate more, while in the isolated F chain simulation the inner FMN cofactor<br />
binding loop Lβ3 fluctuates slightly more.<br />
Figure 5.3: Backbone RMSD (a) and RMSF (b) per residue with respect to the crystal structure for the<br />
isolated domain (red) and complex (black) MD simulations. The green vertical line<br />
separates the HEME and FMN domains. Horizontal bars in blue and orange represent helices<br />
(labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by<br />
horizontal bars in purple. (c) The HEME and FMN domains are in cartoon representation in sky<br />
blue and tan, respectively. The HEME and FMN cofactors are in red and green, respectively.<br />
Helices, cofactors, FMN binding regions and the N- and C-termini are labeled.<br />
5.4.2. Cluster analysis<br />
The first two clusters of the AF chain account for 46.16 % and 27.12 % of the sampled conformations, respectively.<br />
In the isolated domain simulations, the A chain and the F chain each have 6 clusters; in the complex<br />
simulation they have 7 and 8 clusters, respectively. The first two clusters of the A chain cover<br />
76.87 % and 10.99 % in the complex, and 46.53 % and 30.12 % in the isolated domain simulation,<br />
respectively. The first cluster of the F chain covers ~64 % in both simulations, and the second cluster covers 21.05 %<br />
and 23.05 % in the complex and isolated simulations, respectively. The A chain is more prone to<br />
conformational change in the isolated simulation than in the complex, while the F chain shows a<br />
negligible difference in conformational space between the two simulations.<br />
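The clustering algorithm is not specified in this section. As an illustration only, the sketch below<br />
assumes a neighbour-counting scheme of the kind often used for MD trajectories: the structure with the<br />
most neighbours within an RMSD cutoff becomes the centre of the first cluster, its members are removed,<br />
and the procedure is repeated. The function name, the cutoff and the toy distance matrix are<br />
hypothetical; cluster populations such as those quoted above are the member counts divided by the<br />
number of frames.<br />

```python
def neighbour_count_clustering(distance_matrix, cutoff):
    """Greedy clustering on a precomputed pairwise distance (e.g. RMSD) matrix.

    Returns a list of (centre_index, member_indices), largest cluster first
    among the structures still unassigned at each step.
    """
    remaining = set(range(len(distance_matrix)))
    clusters = []
    while remaining:
        # Pick the structure with the most unassigned neighbours within the
        # cutoff; ties are broken in favour of the lowest index.
        centre = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if distance_matrix[i][j] <= cutoff),
        )
        members = sorted(j for j in remaining if distance_matrix[centre][j] <= cutoff)
        clusters.append((centre, members))
        remaining -= set(members)
    return clusters

# Toy example: five "frames" described by a single coordinate, so the
# pairwise distance is just the absolute difference.
frames = [0.00, 0.01, 0.02, 0.50, 0.51]
matrix = [[abs(a - b) for b in frames] for a in frames]

clusters = neighbour_count_clustering(matrix, cutoff=0.04)
populations = [100.0 * len(members) / len(frames) for _, members in clusters]
print(clusters, populations)
```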
Figure 5.4: The representative conformation of the first cluster of the (a) AF, (b) A and (c) F chain in<br />
cartoon representation superimposed with the crystal structure. In the crystal structure, the A and F chains<br />
are in sky blue and tan, respectively. For the complex simulation, the A and F chains are in dark<br />
blue and brown, respectively. For the isolated domain simulations, the A and F chains are in orange<br />
and purple, respectively. The HEME and FMN cofactors are in green, red and blue in the crystal<br />
structure, the isolated domains and the complex, respectively. The helices and FMN cofactor<br />
binding loops are labeled. The amino and carboxy termini of the domains are labeled in red.<br />
Figure 5.4a, 5.4b and 5.4c show the crystal structure superimposed with the<br />
representative conformation of the first cluster of the AF chain, and of the A and F chains in the isolated<br />
and complex simulations, respectively. Major differences were observed in the loop regions<br />
of the domains in both simulations. The N-terminal region (residues 20 – 82, including the A, B and<br />
B’ helices) of the A chain deviates more from the crystal structure in both simulations. In the<br />
complex simulation, larger deviations are observed in the G helix and in the H/I and K/L (residues 380 - 390) loop<br />
regions of the A chain. Residues 380 – 390 in the K/L loop region precede the HEME cofactor binding<br />
region. The H/I and K/L loops are involved in the binding of the FMN domain. The α2 helix of the F chain<br />
shows a larger deviation in the isolated F chain simulation, resulting in a more compact<br />
conformation of the FMN domain in solution than in the complex. In the complex simulation, the<br />
representative structure of the first cluster of the AF chain shows the conformational<br />
rearrangement in both domains that increases the compactness of the AF complex. The<br />
deviations from the crystal structure mainly involve the G helix and the H/I and<br />
K/L loops of the A chain and the displacement of the F chain towards the HEME domain, which results in<br />
a decrease in the minimum distance between the two cofactors.<br />
5.4.3. Substrate access channel<br />
Pro45 and Ala191 are located at the mouth of the substrate access channel. In the<br />
crystal structure of the P450BM-3 complex, the P45Cα - A191Cα distance is 1.61 nm (0.87 nm in the A<br />
chain of 1BU7). Using MD simulation and docking, Chang et al. observed that substrate binding was not<br />
dramatically affected by the closure of the substrate access channel in P450BM-3.[30] The behavior<br />
of the substrate access channel has previously been assessed by Roccatano et al. by monitoring<br />
the distance between these two residues.[22] The P45Cα - A191Cα minimum distance during the isolated<br />
domain and complex simulations is reported in Figure 5.5. Both simulations show larger variations<br />
in the P45Cα - A191Cα distance in the first 20 ns of simulation. After that, the isolated A chain shows<br />
an average distance of 1.11 ± 0.10 nm with slight variations. In the A chain of the AF complex, the<br />
P45Cα - A191Cα distance continues to decrease until 32 ns of simulation and reaches an<br />
average distance of 0.59 ± 0.10 nm. Compared with the isolated A chain, the substrate access<br />
channel was partially closed in the A chain of the complex, which might result from the larger<br />
deviation of the F/G loop in the complex than in the isolated domain.<br />
Figure 5.5: Minimum distance between P45Cα and A191Cα as a function of time for the A chain in the<br />
isolated (red) and complex (black) simulations.<br />
In the crystal structure, the crystallographic water molecule was not ligated to the heme<br />
iron (Fe) (distance > 6 nm in 1BVY and 0.24 nm in 1BU7). When the A chain (crystal structure) was<br />
solvated in water, a water molecule was present at a distance of 0.47 nm and<br />
0.34 nm from the heme iron in the isolated and complex simulations, respectively. Figure S5.2 in SI shows the<br />
minimum distance between Fe and water molecules (every 100 ps). An average distance of<br />
0.28 ± 0.13 nm and of 0.34 ± 0.14 nm was observed for the A chain in the complex and isolated<br />
domain simulations, respectively.<br />
5.4.4. ET tunneling pathways<br />
The minimum distance between the heavy atoms of the isoalloxazine ring of FMN and the<br />
HEME cofactor is shown in Figure 5.6 (the AF chain simulation was extended to 150<br />
ns to check the convergence of the distance). During the simulation, the FMN/HEME distance<br />
decreases from 1.81 nm (in the crystal structure) to an average of 1.41 ± 0.09 nm, with<br />
a minimum of 1.02 nm, which is within the range expected for ET between redox<br />
centers[17] (1.40 - 1.50 nm) and proposed by Munro et al.[11] The decreased distance might<br />
result in an ET rate of $10^{8}$ to $10^{11}$ s$^{-1}$, which is consistent with experimental and theoretical<br />
observations.[11,17]<br />
Figure 5.6: Minimum distance between the heavy atoms of the isoalloxazine ring of FMN and the HEME<br />
cofactor as a function of time. The red horizontal line shows the distance observed in the crystal<br />
structure.[15]<br />
Figures 5.7a, 5.7b and 5.7c show the ET pathway identified by the Pathways VMD<br />
plugin[27] in the crystal structure (minimum distance 1.80 nm), in the representative structure of<br />
the first cluster of the AF chain (minimum distance 1.41 nm) and in the AF chain conformation with<br />
the shortest FMN to HEME cofactor distance (minimum distance 1.10 nm), respectively. The results of<br />
the analysis are summarized in Table 5.2. In the crystal structure, FMN to HEME ET tunneling is<br />
also mediated by solvent molecules, but after the rearrangement in the AF conformation the FMN<br />
cofactor comes closer to the HEME cofactor, which eliminates the involvement of water molecules in<br />
ET tunneling. In Figure 5.7c, where the FMN to HEME distance is ~ 1 nm, ET tunneling is<br />
mediated by the Met490 residue only. Relative to the crystal structure, the ET pathway length<br />
decreases from 2.7 nm to 1.8 nm and the electronic coupling increases from 4.00 x 10^-10 to<br />
2.68 x 10^-8.<br />
Table 5.2: Electron transfer tunneling pathway in the AF chain complex calculated by the<br />
Pathways[27] VMD plugin.<br />
Coordinates | FMN/HEME minimum distance (nm) | Max. coupling (a.u.) | Distance along ET pathway (nm) | Amino acids involved in the ET pathway<br />
Crystal structure | 1.8 | 4.00 x 10^-10 | 2.70 | FMN(C8) → M490 → Sol → Sol → A399 → C400 → HEME(FE)<br />
First cluster | 1.4 | 9.07 x 10^-9 | 1.96 | FMN(C8) → M490 → F393 → HEME(FE)<br />
Minimum FMN/HEME distance | 1.1 | 2.68 x 10^-8 | 1.79 | FMN(C8) → M490 → HEME(FE)<br />
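To make the coupling values in Table 5.2 easier to interpret, the following is a minimal sketch of the decay-factor product used in the Pathways model of Beratan et al.[26] The decay parameters are the commonly quoted ones for that model (0.6 per covalent bond, with exponential penalties for hydrogen bonds and through-space jumps). The function names, the toy pathway and the nm-based interface are assumptions for illustration and are not the plugin's API. The absolute couplings reported by the plugin also include a prefactor that is not modeled here.

```python
import math

EPS_BOND = 0.6  # decay factor per covalent bond (Pathways model)

def eps_hbond(r_nm):
    """Decay factor for a hydrogen-bond step; r_nm is the heavy-atom distance in nm."""
    return 0.36 * math.exp(-1.7 * (10.0 * r_nm - 2.8))

def eps_space(r_nm):
    """Decay factor for a through-space jump of length r_nm (nm)."""
    return 0.6 * math.exp(-1.7 * (10.0 * r_nm - 1.4))

def pathway_coupling(steps):
    """Product of decay factors along one pathway.

    steps: list of (kind, r_nm) with kind in {"bond", "hbond", "space"};
    r_nm is ignored for covalent bonds.
    """
    coupling = 1.0
    for kind, r_nm in steps:
        if kind == "bond":
            coupling *= EPS_BOND
        elif kind == "hbond":
            coupling *= eps_hbond(r_nm)
        elif kind == "space":
            coupling *= eps_space(r_nm)
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return coupling

def rate_ratio(coupling_a, coupling_b):
    """Ratio of ET rates if only the coupling differs (rate ~ coupling squared)."""
    return (coupling_a / coupling_b) ** 2

# Hypothetical pathway: four bonds, one 0.30 nm through-space jump, two bonds.
toy_path = [("bond", None)] * 4 + [("space", 0.30)] + [("bond", None)] * 2
print(pathway_coupling(toy_path))

# Couplings from Table 5.2: minimum-distance conformation vs. crystal structure.
print(rate_ratio(2.68e-8, 4.00e-10))
```

Under the assumption that driving force and reorganization energy are unchanged, the rate scales with the square of the coupling, so the 67-fold larger coupling in the minimum-distance conformation would correspond to a rate increase of about 4.5 x 10^3.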
Figure 5.7: ET tunneling from the isoalloxazine ring (C8 atom) of the FMN cofactor (gray) to the<br />
iron center of the HEME cofactor (black), represented by red tubes, in a) the crystal structure,<br />
b) the conformation of the first cluster and c) the conformation with the minimum distance between<br />
the HEME and FMN cofactors. The amino acids within 0.50 nm of both cofactors are labeled and<br />
shown in licorice representation colored by element type (oxygen in red, carbon in cyan and<br />
nitrogen in blue), with their associated secondary structure in cartoon representation in sky<br />
blue for the HEME domain and in orange for the FMN domain. The residues involved in electron<br />
tunneling are represented and labeled in green.<br />
5.4.5. Essential dynamics<br />
The cumulative relative positional fluctuation (RPF) of the first 50 eigenvectors<br />
of the A and F chains in the isolated and complex simulations is greater than 69% and is reported in<br />
Figure S5.3 of the SI. The RMSIP for the first twenty eigenvectors of the A chain and the F chain in<br />
the two simulations was less than 0.53. The inner products of the first three eigenvectors for the A<br />
and F chains were less than 0.25 and 0.43, respectively. The overlap and inner product<br />
analyses indicate that the eigenvectors of the<br />
same time windows of the two trajectories describe different sets of collective motions.<br />
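For readers unfamiliar with the overlap measure, the following is a minimal sketch of the root mean square inner product (RMSIP) between two sets of orthonormal eigenvectors, assuming the standard definition (the square root of the summed squared inner products divided by the number of eigenvectors per set). The three-dimensional vectors are toy data, not eigenvectors from the simulations.

```python
import math

def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def rmsip(set_a, set_b):
    """RMSIP between two equally sized sets of orthonormal eigenvectors."""
    if len(set_a) != len(set_b):
        raise ValueError("both sets must contain the same number of eigenvectors")
    n = len(set_a)
    total = sum(dot(u, v) ** 2 for u in set_a for v in set_b)
    return math.sqrt(total / n)

# Toy orthonormal vectors in three dimensions.
ex, ey, ez = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

same = rmsip([ex, ey], [ex, ey])      # identical subspaces
partial = rmsip([ex, ey], [ex, ez])   # subspaces sharing one direction
print(same, partial)
```

A value of 1 means the two subspaces coincide and a value of 0 means they are orthogonal, so values below 0.53 for twenty eigenvectors indicate only partial overlap of the essential subspaces.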
Figures 5.8a, 5.8b and 5.8c show the RPF associated with the first three eigenvectors of the<br />
A and F chains in the isolated (red) and complex (black) simulations. Figure 5.9<br />
shows the RMSF associated with the first three eigenvectors (a, b and c) of the A (sky blue) and F<br />
(tan) chains in the complex (a1, b1 and c1) and isolated (a2, b2 and c2) simulations,<br />
respectively.<br />
Figure 5.8: RPF for the (a) first, (b) second and (c) third eigenvector of the A and F chains in the<br />
isolated (red) and complex (black) simulations. The green vertical line separates the HEME and FMN<br />
domains. Horizontal bars in blue and orange represent helices (labeled) and beta sheets,<br />
respectively. The regions involved in cofactor binding are represented by horizontal bars in<br />
purple.<br />
In the complex simulation, the first collective motion (Figure 5.9a1) of the A chain involves<br />
the turn following beta sheet 1 (residues 44 – 48, with the highest RPF for Arg47, which is involved in<br />
substrate binding), the D/E loop (residues 130 – 138), the F/G loop (residues 190 – 196), the K/L loop<br />
(residues 385 – 390) and the C-terminus loop (residues 425 – 432 and 452 – 458). The<br />
cooperative motion of the turn following beta sheet 1 and the F/G loop is related to the<br />
opening and closing of the substrate access channel. Residue F393, in the latter<br />
part of the K/L loop, is involved in FMN domain binding and was found to be involved in ET<br />
tunneling in the average structure of the first cluster of the AF chain. The first collective mode of<br />
the F chain in the complex is dominated by the Lα2 loop, with slightly higher RPF in<br />
Lβ2 and Lβ3 (the inner FMN binding loop). The cooperative motion of Lα2 and Lβ3 might<br />
facilitate ET tunneling from the FMN to the HEME cofactor. In the complex, the collective motions of<br />
the two domains were synchronized, relating ET tunneling to the change in substrate<br />
binding. The effect was clearly seen when the first eigenvector of the AF chain was compared<br />
with those of the A and F chains (reported in Figures S5.4a and S5.5a in the SI). In the AF chain and<br />
in the separate A and F chains, the first eigenvector shows fluctuations in the same regions, with<br />
higher fluctuations in the collective mode of the AF chain and a cooperative effect due to their<br />
binding. The second collective mode of the A chain mainly involves motion in the D/E and G/H loops,<br />
the beta sheets in the K/L region and the A/B region, whereas the third collective motion was<br />
restricted to the D/E and G/H loops and the C-terminus loop (residues 425 – 432). The F chain shows<br />
involvement of the Lα2 and Lβ2 loops and the C-terminus region in the second collective mode, and<br />
of Lα2, Lβ3 and Lβ5 in the third eigenvector. In the AF chain, the collective motion associated<br />
with the first two eigenvectors corresponds to the movement of the F chain towards the A chain,<br />
which decreases the distance between the FMN and HEME cofactors, and shows slightly higher<br />
fluctuations than in the individual chains. In the third eigenvector, the major difference was<br />
observed in the Lβ3 and Lα2 loops of the F chain, which show higher fluctuations. The collective<br />
motions associated with the first three eigenvectors of the AF chain are reported in Figures S5.5a,<br />
S5.5b and S5.5c in the SI, respectively.<br />
Figure 5.9: RMSF of protein backbone atoms along the first (a), second (b) and third (c)<br />
eigenvectors after projection of the trajectory on the corresponding eigenvector of the A and F<br />
chains in the complex simulation (a1, b1 and c1) and in the isolated simulation (a2, b2 and c2). The<br />
10 sequential frames represent the extent of the fluctuations in the trajectories along the<br />
eigenvectors. The first extreme conformation is shown in green and the last extreme in<br />
violet. Other conformations of the A and F chains are in sky blue and tan,<br />
respectively. Helices and loops in the FMN domain are labeled. N and C indicate the N- and<br />
C-terminus of the protein (labeled in red).<br />
In the isolated A chain, the first collective mode (Figure 5.9a2) has higher RPF at<br />
the end of the C helix (residues 103 – 107) and at the C-terminus (residues 452 – 458). Other regions<br />
involved in the first collective mode were the D/E, E/F, F/G and K/L (residues 385 – 390) loops.<br />
Together, these motions are related to changes in the substrate binding region and the FMN domain<br />
binding region. The first collective mode of the isolated F chain shows higher RPF in Lα2<br />
and Lβ2, and slightly higher RPF in Lβ3 and Lβ4. In the isolated domains, the collective<br />
motions were more independent, i.e. related to binding of the FMN cofactor in the F chain and<br />
restricted to the substrate binding region in the A chain. The second collective mode of the A chain<br />
mainly involves motion in the D/E, E/F and F/G loops, and the third collective<br />
motion only the F/G region. The F chain shows involvement of Lα2 and Lβ2 in the second collective<br />
mode and of Lα2 and Lβ3 in the third eigenvector.<br />
5.5. Conclusions<br />
We performed MD simulations of the HEME and FMN domains, both isolated and in<br />
complex. The structure remained conserved in both systems throughout the simulations.<br />
The HEME/FMN complex underwent a conformational rearrangement<br />
in the first 10 ns of simulation (with a decrease in Rg from 2.42 nm to 2.33 nm) that made<br />
the complex more compact, with a decrease in the FMN/HEME distance from 1.81 nm<br />
to an average of 1.41 nm. The FMN domain in solution shows a major conformational change in the Lα2<br />
loop in the absence of the HEME domain. In the isolated HEME domain, major conformational<br />
changes were observed in the FMN binding region, especially in the C helix and the H/I and K/L<br />
(residues 385 – 395) loops. The G helix and the inner FMN cofactor loop (Lβ3) fluctuate more in both<br />
simulations. Both domains differ in atomic fluctuation amplitude between the isolated and<br />
complex simulations. In the complex, the collective motion was dominated by the interaction<br />
mechanism between the HEME and FMN domains and the associated change in the substrate access<br />
channel. The movement of the FMN domain over the HEME domain might be related to the ET<br />
mechanism in P450BM-3, as proposed earlier, and might account for an ET rate between<br />
the domains in the range from 10^8 to 10^11 s^-1 under physiological conditions, as observed<br />
experimentally and proposed theoretically earlier.[11]<br />
5.6. References<br />
1. Chefson A, Auclair K (2006) Progress towards the easier use of P450 enzymes. Mol Biosyst 2: 462-469.<br />
2. Wong LL (1998) Cytochrome P450 monooxygenases. Curr Opin Chem Biol 2: 263-268.<br />
3. Guengerich FP (2001) Common and uncommon cytochrome P450 reactions related to metabolism and chemical toxicity. Chem Res Toxicol 14: 611-650.<br />
4. Urlacher VB, Eiben S (2006) Cytochrome P450 monooxygenases: perspectives for synthetic application. Trends Biotechnol 24: 324-330.<br />
5. Bernhardt R (2006) Cytochromes P450 as versatile biocatalysts. J Biotechnol 124: 128-145.<br />
6. Coon MJ (2005) Cytochrome P450: nature's most versatile biological catalyst. Annu Rev Pharmacol Toxicol 45: 1-25.<br />
7. Narhi LO, Fulco AJ (1986) Characterization of a catalytically self-sufficient 119,000-dalton cytochrome P-450 monooxygenase induced by barbiturates in Bacillus megaterium. J Biol Chem 261: 7160-7169.<br />
8. Narhi LO, Fulco AJ (1987) Identification and Characterization of 2 Functional Domains in Cytochrome-P-450bm-3, a Catalytically Self-Sufficient Monooxygenase Induced by Barbiturates in Bacillus-Megaterium. J Biol Chem 262: 6683-6690.<br />
9. Munro AW, Lindsay JG, Coggins JR, Kelly SM, Price NC (1994) Structural and Enzymological Analysis of the Interaction of Isolated Domains of Cytochrome-P-450 Bm3. FEBS Letters 343: 70-74.<br />
10. Warman AJ, Roitel O, Neeli R, Girvan HM, Seward HE, et al. (2005) Flavocytochrome P450 BM3: an update on structure and mechanism of a biotechnologically important enzyme. Biochem Soc Trans 33: 747-753.<br />
11. Munro AW, Leys DG, McLean KJ, Marshall KR, Ost TW, et al. (2002) P450 BM3: the very model of a modern flavocytochrome. Trends Biochem Sci 27: 250-257.<br />
12. Girvan HM, Waltham TN, Neeli R, Collins HF, McLean KJ, et al. (2006) Flavocytochrome P450 BM3 and the origin of CYP102 fusion species. Biochem Soc Trans 34: 1173-1177.<br />
13. Peterson JA, Sevrioukova I, Truan G, Graham-Lorence SE (1997) P450BM-3: A tale of two domains - Or is it three? Steroids 62: 117-123.<br />
14. Munro AW, Daff S, Coggins JR, Lindsay JG, Chapman SK (1996) Probing electron transfer in flavocytochrome P-450 BM3 and its component domains. Eur J Biochem 239: 403-409.<br />
15. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure of a cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96: 1863-1868.<br />
16. Joyce MG, Ekanem IS, Roitel O, Dunford AJ, Neeli R, et al. (2012) The crystal structure of the FAD/NADPH-binding domain of flavocytochrome P450 BM3. FEBS Journal 279: 1694-1706.<br />
17. Page CC, Moser CC, Chen X, Dutton PL (1999) Natural engineering principles of electron tunnelling in biological oxidation-reduction. Nature 402: 47-52.<br />
18. Hazzard JT, Govindaraj S, Poulos TL, Tollin G (1997) Electron transfer between the FMN and heme domains of cytochrome P450BM-3. Effects of substrate and CO. J Biol Chem 272: 7922-7926.<br />
19. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996) Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF: Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen.<br />
20. Helms V, Deprez E, Gill E, Barret C, Hui Bon Hoa G, et al. (1996) Improved binding of cytochrome P450cam substrate analogues designed to fill extra space in the substrate binding pocket. Biochemistry 35: 1485-1499.<br />
21. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2005) Structural and dynamic properties of cytochrome P450 BM-3 in pure water and in a dimethylsulfoxide/water mixture. Biopolymers 78: 259-267.<br />
22. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward understanding the inactivation mechanism of monooxygenase P450 BM-3 by organic cosolvents: a molecular dynamics simulation study. Biopolymers 83: 467-476.<br />
23. Verma R, Schwaneberg U, Roccatano D. Conformational Dynamics of the FMN-binding Reductase Domain of Monooxygenase P450BM-3. Unpublished.<br />
24. Walsh JD, Miller AF (2003) Flavin reduction potential tuning by substitution and bending. J Mol Struc-Theochem 623: 185-195.<br />
25. Zheng Y-J, Ornstein RL (1996) A Theoretical Study of the Structures of Flavin in Different Oxidation and Protonation States. J Am Chem Soc 118: 9402-9408.<br />
26. Beratan DN, Betts JN, Onuchic JN (1991) Protein electron transfer rates set by the bridging secondary and tertiary structure. Science 252: 1285-1288.<br />
27. Balabin IA, Hu X, Beratan DN (2012) Exploring biological electron transfer pathway dynamics with the Pathways Plugin for VMD. J Comput Chem 33: 906-910.<br />
28. Humphrey W, Dalke A, Schulten K (1996) VMD: Visual molecular dynamics. J Mol Graphics 14: 33-38.<br />
29. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.<br />
30. Chang YT, Loew GH (1999) Molecular dynamics simulations of P450 BM3--examination of substrate-induced conformational change. J Biomol Struct Dyn 16: 1189-1203.<br />
PART II: P450BM-3 HEME/FMN Complex SI<br />
Supporting Information<br />
Insight into the redox partner interaction mechanism in<br />
cytochrome P450BM-3 using molecular dynamics<br />
simulation<br />
Table S5.1: Partial charges on the HEME cofactor with ferric iron.[1-3]<br />
Atom number Atom type Atom name Charge group Partial charge<br />
1 FE FE 1 1.0<br />
2 NR NA 1 -0.4<br />
3 NR NB 1 -0.4<br />
4 NR NC 1 -0.4<br />
5 NR ND 1 -0.4<br />
6 C CHA 2 -0.2<br />
7 HC HHA 2 0.2<br />
8 C C1A 3 0.2<br />
9 C C2A 3 -0.1<br />
10 C C3A 3 0.0<br />
11 C C4A 3 0.1<br />
12 CH3 CMA 4 0.0<br />
13 CH2 CAA 5 0.0<br />
14 CH2 CBA 5 0.0<br />
15 C CGA 6 0.27<br />
16 OM O1A 6 -0.635<br />
17 OM O2A 6 -0.635<br />
18 C CHB 7 -0.2<br />
19 HC HHB 7 0.2<br />
20 C C1B 8 0.05<br />
21 C C2B 8 0.05<br />
22 C C3B 8 -0.1<br />
23 C C4B 8 0.2<br />
24 CH3 CMB 9 0.0<br />
25 CR1 CAB 10 0.0<br />
26 CH2 CBB 10 0.0<br />
27 C CHC 11 -0.2<br />
28 HC HHC 11 0.2<br />
29 C C1C 12 0.2<br />
30 C C2C 12 0.0<br />
31 C C3C 12 -0.1<br />
32 C C4C 12 0.2<br />
33 CH3 CMC 13 0.0<br />
34 CR1 CAC 14 0.0<br />
35 CH2 CBC 14 0.0<br />
36 C CHD 15 -0.2<br />
37 HC HHD 15 0.2<br />
38 C C1D 16 0.2<br />
39 C C2D 16 0.1<br />
40 C C3D 16 -0.2<br />
41 C C4D 16 0.2<br />
42 CH3 CMD 17 0.0<br />
43 CH2 CAD 18 0.0<br />
44 CH2 CBD 18 0.0<br />
45 C CGD 19 0.27<br />
46 OM O1D 19 -0.635<br />
47 OM O2D 19 -0.635<br />
Figure S5.1: Secondary structure per residue calculated by DSSP[4] along the trajectory as a<br />
function of time for the HEME domain and FMN domain (a) in the complex simulation and (b) isolated.<br />
The color code represents the different secondary structures.<br />
Figure S5.2: Minimum distance between water molecules and the HEME iron as a function of<br />
time (every 100 ps) in the isolated (red) and complex (black) simulations.<br />
Figure S5.3: Relative positional fluctuation of the first 50 eigenvectors of the A and F chains in the<br />
isolated and complex simulations. In the AF chain, the first 50 eigenvectors account for 80.45 % of the<br />
total RPF, with a 25.96 % contribution from the first eigenvector. The A chain has 79.28 % and 86.77 %<br />
cumulative RPF, with 27.19 % and 48.54 % contributed by the first eigenvector, in the complex and<br />
isolated domain simulations, respectively. For the F chain, the cumulative RPF of the first 50<br />
eigenvectors was 90.96 % and 89.19 %, with 33.98 % and 35.02 % RPF of the first eigenvector, in the<br />
complex and isolated domain simulations.<br />
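The percentages in this caption can be read as eigenvalue fractions. As a minimal sketch, assuming the RPF of an eigenvector is its eigenvalue divided by the sum of all eigenvalues of the covariance matrix, the per-eigenvector and cumulative values could be computed as below. The eigenvalues are invented toy numbers and the function names are hypothetical.

```python
from itertools import accumulate

def relative_fluctuation(eigenvalues):
    """Percentage of the total positional fluctuation carried by each eigenvector."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]

def cumulative(percentages):
    """Running sum of the per-eigenvector percentages."""
    return list(accumulate(percentages))

# Hypothetical eigenvalues (nm^2), sorted from largest to smallest.
eigenvalues = [5.0, 3.0, 1.0, 1.0]

rpf = relative_fluctuation(eigenvalues)
cum = cumulative(rpf)
for index, (single, running) in enumerate(zip(rpf, cum), start=1):
    print(f"eigenvector {index}: {single:.2f} % (cumulative {running:.2f} %)")
```

For a real system the list would hold all 3N eigenvalues, and the cumulative value at index 50 would correspond to the figures quoted in the caption.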
Figure S5.4: RPF for the (a) first, (b) second and (c) third eigenvector of the AF chain (cyan), and of<br />
the A and F chains in the complex (black) simulation. The green vertical line separates the HEME and<br />
FMN domains. Horizontal bars in blue and orange represent helices (labeled) and beta sheets,<br />
respectively. The regions involved in cofactor binding are represented by horizontal bars in<br />
purple.<br />
Figure S5.5: RMSF of protein backbone atoms along the first, second and third eigenvectors after<br />
projection of the trajectory on the corresponding eigenvector of the AF chain in the complex<br />
simulation in (a), (b) and (c), respectively. The 10 sequential frames represent the extent of the<br />
fluctuations in the trajectories along the eigenvectors. The first extreme conformation is shown in<br />
green and the last extreme in violet. Other conformations of the HEME and FMN domains are in sky blue<br />
and tan, respectively. Helices and loops are labeled. N and C indicate the N- and C-terminus of the<br />
protein (labeled in red).<br />
References:<br />
1. Helms V, Deprez E, Gill E, Barret C, Hui Bon Hoa G, et al. (1996) Improved binding of cytochrome P450cam substrate analogues designed to fill extra space in the substrate binding pocket. Biochemistry 35: 1485-1499.<br />
2. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2005) Structural and dynamic properties of cytochrome P450 BM-3 in pure water and in a dimethylsulfoxide/water mixture. Biopolymers 78: 259-267.<br />
3. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward understanding the inactivation mechanism of monooxygenase P450 BM-3 by organic cosolvents: a molecular dynamics simulation study. Biopolymers 83: 467-476.<br />
4. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.<br />
PART II: P450BM-3 HEME/FMN & CoSep<br />
Chapter 6<br />
A molecular dynamics study of the effect of<br />
cobalt(II)sepulchrate as an electron transfer mediator<br />
on the conformation and dynamics of P450BM-3<br />
6.1. Abstract<br />
The major limitation to the exploitation of P450BM-3 in industrial processes is<br />
the consumption of expensive NADPH as a reduction equivalent in the catalytic cycle.<br />
Experimentally, NADPH has also been found to inactivate the enzyme in the absence of<br />
substrate. The use of an alternative, cost-effective cofactor such as cobalt(III)sepulchrate (CoSep)<br />
with zinc dust as the source of electrons has been proposed as a possible<br />
solution to overcome the latter limitation. The mechanism of interaction of<br />
cobalt(III)sepulchrate with the protein has not yet been elucidated at the molecular level. In this<br />
paper, we propose a novel model of CoSep and use it in molecular dynamics simulations to study its<br />
interaction with the isolated HEME domain and the HEME/FMN complex of P450BM-3. The<br />
aim of the study is to identify the putative binding modes of CoSep on the P450BM-3<br />
domains and their effect on the conformation, dynamics and electron transfer (ET)<br />
tunneling of the domains. The results of this study indicate that CoSep preferentially binds to<br />
negatively charged residues in the surface-exposed regions of the P450BM-3 domains. Two ET tunneling<br />
pathways were observed for the HEME/FMN complex in the presence of CoSep. The first runs<br />
from CoSep to the FMN isoalloxazine ring via Trp574 and then to the HEME iron, mediated by a<br />
water molecule at the interface, Met490 (ribityl tail of the FMN cofactor binding loop) and<br />
Phe394; the same residues were involved in ET tunneling in the water simulation of the P450BM-3<br />
domains. The second ET pathway runs from CoSep to the HEME iron via Ile102 and Leu103 (C<br />
helix residues) and Ile401 and Cys400. In the isolated HEME domain, ET tunneling involved the<br />
residues of the B'/C loop: Asp84, Gly85, Leu86 and Phe87. Collective motions of different<br />
amplitude were observed in both systems and were found to facilitate ET tunneling<br />
from CoSep to the HEME iron.<br />
6.2. Introduction<br />
Cytochrome P450 monooxygenases, the largest superfamily of heme-containing<br />
soluble proteins, are widespread in almost all domains of life, e.g. bacteria, yeast, insects,<br />
mammalian tissues and plants.[1-3] Using molecular oxygen, they catalyze the oxidation of a<br />
wide variety of substrates involved in biosynthesis and biodegradation pathways, or in<br />
xenobiotic metabolism[4], in the presence of reduction equivalents. Their high<br />
stereoselectivity and the large variety of possible substrates make these enzymes particularly<br />
interesting for industrial applications. However, their complexity, low solubility, low<br />
catalytic turnover and, in particular, their need for an expensive source of electrons have so<br />
far limited their use.[4] Cytochrome P450BM-3, from the soil bacterium Bacillus<br />
megaterium, is one of the most widely studied members of this family.[5,6] Being soluble<br />
and self-sufficient (P450 and reductase domains linked together on a single polypeptide<br />
chain), P450BM-3 has a higher catalytic turnover and is easy to express and purify in<br />
cell-free medium.[7] Protein engineering approaches have been used successfully to<br />
increase the technological viability of P450BM-3 by fine-tuning its catalytic parameters and<br />
substrate recognition.[8,9] In recent years, rapid progress has also been made towards the cost<br />
effectiveness of the P450BM-3 catalytic reaction by the regeneration or substitution of the<br />
expensive cofactor (NADPH or NADH) as a source of electrons.[7] The electrochemistry of<br />
P450BM-3 has received considerable attention, and various methods have enabled direct<br />
electron transfer systems (from electrode to protein via conducting polymer films like<br />
BaytronP)[10] or mediated electron transfer systems (to shuttle electrons from electrode to<br />
protein via small electroactive compounds, like Zn dust (as a source of electrons) with<br />
Co(III)sepulchrate (as electron mediator)) for driving the catalytic cycle.[11,12] Protein<br />
engineering via directed evolution and rational design offers an attractive solution to<br />
improve the enzymatic properties and to enhance the electrochemical performance of the<br />
enzyme.[11-13] In this paper, we performed molecular dynamics simulations to gain insight<br />
into the interaction mechanism of the P450BM-3 domains with cobalt(II)sepulchrate (CoSep)<br />
as an electron transfer mediator. The results help to investigate the effect of CoSep<br />
binding on conformation, dynamics and ET tunneling in the P450BM-3 domains.<br />
The chapter is organized as follows. The details of the MD simulations and the force field<br />
modeling of CoSep are reported in the Methods section. In the Results and Discussion section,<br />
the preferential binding sites of CoSep on the P450BM-3 domains are<br />
reported first, followed by information about the ET tunneling from<br />
CoSep to the P450BM-3 domains. Then, the collective dynamics of the systems are analyzed<br />
using principal component analysis of the trajectories. Finally, the Conclusions section<br />
provides a summary of the outcome of the study.<br />
6.3. Methods<br />
6.3.1. Starting coordinates<br />
The non-stoichiometric complex of one FMN domain with two HEME domains, without<br />
substrate, was used as the starting structure (PDB ID: 1BVY, resolution 0.203 nm).[14]<br />
For the MD simulations, the HEME domain (chain A: 20 – 450) associated with the FMN domain (chain<br />
F: 479 – 630) was extracted from the starting coordinates together with the crystallographic water<br />
molecules within 0.60 nm of the protein, using the VMD software.[15] The 1,2-ethanediol<br />
molecules were removed and replaced by water molecules from the crystallographic<br />
PART II: P450BM-3 HEME/FMN & CoSep<br />
structure. MD simulations were set up for the isolated HEME domain and for the<br />
HEME/FMN complex in a water-CoSep mixture.<br />
6.3.2. Molecular dynamics simulation and modeling<br />
The GROMOS96 43a1 force field[16] was used for all simulations. The MD<br />
simulations performed in this study are summarized in Table 6.1. The HEME cofactor<br />
parameters for ferric iron were adopted from Helms et al.,[17] as previously employed<br />
for the MD simulation of the P450BM-3 HEME domain by Roccatano et al.[18,19] The partial<br />
charges were redistributed on the porphyrin ring of the HEME cofactor to adapt the parameters<br />
to the GROMOS96 43a1 force field[16] with hydrogen atoms bound to the bridging carbons of the<br />
porphyrin ring.[20] The FMN cofactor was in the oxidized state in the FMN domain. Additional<br />
improper dihedrals were introduced to reproduce the conformation of the isoalloxazine ring<br />
observed in the crystallographic structure and in molecular geometry optimizations of flavin in<br />
both redox states.[21,22] Details of the modified force field for FMN are reported in a<br />
previous paper.[23]<br />
For CoSep (schematically represented in Figure 6.1), the force field parameters for<br />
bonds and bond angles were adapted from Dehayes et al.[24] (the values are reported in<br />
Table S6.1 in Supporting Information (SI)). The non-bonded parameters were adopted from<br />
the GROMOS96 43a1 force field.[16]<br />
Density functional theory calculations using the Becke3LYP method[25] with the LanL2DZ<br />
basis set[26] were used for the geometry optimization. Atomic partial charges were derived<br />
using the CHelpG scheme[27] with the constraint that they reproduce the dipole moment (partial<br />
charges are reported in Table S6.2 in SI). An ionic radius of 0.075 nm for Co2+ was used to fit<br />
the electrostatic potential. All calculations were performed using the Gaussian09 package.[28]<br />
Forty CoSep molecules were randomly placed in the simulation box, which was then filled<br />
by stacking equilibrated boxes of solvent molecules. The CoSep<br />
concentration was ~0.5 mM, which corresponds to the one used experimentally for<br />
the fastest biotransformation by P450BM-3 using Zn dust and cobalt(III)sepulchrate as an<br />
alternative electron transfer system.[12]<br />
Figure 6.1: CoSep in ball and stick representation, colored by element (nitrogen<br />
in blue, hydrogen in green and carbon in gray), with atom names and numbers labeled (except<br />
for hydrogen).<br />
Table 6.1: Summary of the MD simulations of P450BM-3 domains in water and in CoSep<br />
solution.<br />
Starting coordinates | No. of atoms | No. of CoSep | No. of solvent molecules | No. of counter ions | Simulation length (ns)<br />
Heme domain (A chain) | 65650 | - | 20365 | 16 Na+ | 100<br />
FMN domain (F chain) | 33483 | - | 10650 | 14 Na+ | 100<br />
Complex (AF chain) | 86101 | - | 26671 | 30 Na+ | 100<br />
A chain & CoSep | 64597 | 40 | 19638 | 64 Cl- | 100<br />
AF chain & CoSep | 85275 | 40 | 26029 | 50 Cl- | 100<br />
*The abbreviations A, F and AF are used in the rest of the paper for the HEME domain, the FMN domain and<br />
the HEME/FMN complex, respectively.<br />
6.4. Results and discussion<br />
The differences in conformation and dynamics between the isolated FMN and HEME domains<br />
and the HEME/FMN complex have been discussed in our previous papers.[20,23] Herein, we<br />
focus on the effect of CoSep binding on the conformation, dynamics and ET tunneling of the<br />
P450BM-3 domains in the isolated HEME domain and in the HEME/FMN complex. The presence of<br />
CoSep does not significantly affect the structure of the P450BM-3 domains. The structural<br />
stability and convergence of the P450BM-3 domains in CoSep solution were compared with those<br />
in water and are reported in SI through the backbone root mean square deviation (RMSD)<br />
(Figure S6.1), the radius of gyration (Rg) (Figure S6.2) and the backbone RMSD and RMSF per<br />
residue (Figure S6.3a and S6.3b, respectively), using the crystal structure as reference. In CoSep<br />
solution, both the HEME domain and the complex show the same behavior as in pure water.<br />
The backbone RMSD of the HEME domain in CoSep solution reaches a plateau with an<br />
average value of 0.25 ± 0.01 nm after 10 ns of simulation (Figure S6.1 in SI), and it shows<br />
the lowest RMS deviations and fluctuations (Figure S6.3a and S6.3b in SI). In contrast,<br />
the AF complex shows the largest deviation in the residues of the H helix (Figure S6.3a in SI).<br />
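The two structural measures used above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: it assumes that each frame has already been superimposed on the reference structure, that coordinates are given in nm, and the function names are hypothetical.<br />

```python
import math

def rmsd(coords, ref):
    """Root mean square deviation (nm) between two equally long lists of
    (x, y, z) positions. Assumes the frame is already fitted to the reference."""
    if len(coords) != len(ref):
        raise ValueError("coordinate sets must have the same length")
    squared = sum((a - b) ** 2 for p, q in zip(coords, ref) for a, b in zip(p, q))
    return math.sqrt(squared / len(ref))

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration (nm) around the centre of mass."""
    total = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    squared = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for m, p in zip(masses, coords)
    )
    return math.sqrt(squared / total)
```

Applied frame by frame to the backbone atoms, these functions would give time series of the kind plotted in Figures S6.1 and S6.2.<br />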
6.4.1. CoSep binding on P450BM-3 domains<br />
At the end of the simulations of both the isolated HEME domain and the complex,<br />
CoSep molecules were found bound mainly at the surface-exposed loop regions of the<br />
protein (see Figure S6.4 of SI). The average minimum distances between the CoSep<br />
molecules and the HEME iron and the isoalloxazine ring along the simulations are reported in<br />
Figure S6.5 of SI. After 20 ns of simulation, the CoSep molecules approach the HEME domain<br />
to within an average distance of 1.72 ± 0.44 nm in the isolated protein and 1.94 ± 0.19 nm<br />
in the complex. Figure 6.2a, 6.2b and 6.2c show the minimum distance<br />
between CoSep and the residues of the isolated A chain and of the A and F chains in the complex,<br />
respectively.<br />
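As an illustration of how an average minimum distance of this kind can be obtained, the following Python sketch computes the minimum distance per frame and then its mean and standard deviation. The toy coordinates are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.<br />

```python
import math
import statistics

def minimum_distance(group_a, group_b):
    """Smallest pairwise distance (nm) between two lists of (x, y, z) positions."""
    return min(math.dist(a, b) for a in group_a for b in group_b)

def mean_and_std(values):
    """Mean and population standard deviation of a distance time series."""
    return statistics.fmean(values), statistics.pstdev(values)

# Hypothetical toy trajectory: one mediator atom approaching a fixed metal centre.
iron = [(0.0, 0.0, 0.0)]
frames = [[(2.0, 0.0, 0.0)], [(1.8, 0.0, 0.0)], [(1.6, 0.0, 0.0)]]
series = [minimum_distance(frame, iron) for frame in frames]
mean, std = mean_and_std(series)
```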
Cluster analysis was used to select representative structures for the isolated<br />
HEME domain and for the complex. In CoSep solution, the first cluster of the A chain and of the complex accounts for<br />
~83 % and 99 %, respectively. The binding of CoSep to the isolated A<br />
chain and to the AF chain complex is shown in Figure 6.2d and 6.2e, respectively. In the isolated A<br />
chain, CoSep molecules bind mainly at the HEME/FMN interface, in contact with the C (94 – 107)<br />
and H (233 – 238) helices and with the B’/C (82 – 94), H/I (239 – 251), K/L (359 – 367) and C-terminus<br />
turn (441 – 445) regions. Other CoSep binding regions on the isolated chain were A/B (32 –<br />
38 and 51 – 55), the B helix (55 – 61), the B/B’ loop (61 – 68), the E helix (139 – 143), the F<br />
helix (181, 182) and the F/G loop (192 – 198). In the AF chain, the binding of the F chain slightly<br />
influences the distribution of CoSep on the A chain. CoSep molecules were more abundant at the F chain; two<br />
of them were located near (≤ 0.50 nm) the FMN cofactor binding loops Lβ3 and Lβ4 at the<br />
FMN/HEME interface. In the P450BM-3 domains, the CoSep binding regions were found to<br />
be rich in charged residues, especially the negatively charged residues aspartic acid and<br />
glutamic acid, as obtained from the analysis of the number of contacts between CoSep and<br />
P450BM-3 residues within a distance of 0.50 nm (reported in Figure S6.6 and also shown<br />
in Figure 6.2d and 6.2e by the abundance of oxygen (red surface) in the CoSep binding<br />
regions).<br />
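The contact analysis described above can be sketched in Python as follows. This is an illustrative sketch only: the 0.50 nm cutoff is taken from the text, while the data layout (a dictionary mapping residue labels to atom positions), the function name and the toy coordinates are assumptions.<br />

```python
import math

def contacts_per_residue(residue_atoms, ligand_atoms, cutoff=0.50):
    """Count atom pairs within `cutoff` (nm) between each residue and the ligand.

    residue_atoms: dict mapping a residue label to a list of (x, y, z) positions.
    ligand_atoms:  list of (x, y, z) positions of the ligand atoms.
    Residues without any contact are left out of the result.
    """
    counts = {}
    for label, atoms in residue_atoms.items():
        n = sum(1 for a in atoms for b in ligand_atoms if math.dist(a, b) <= cutoff)
        if n:
            counts[label] = n
    return counts

# Hypothetical toy system: two residues and a single ligand atom.
protein = {"ASP_1": [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)], "LEU_2": [(2.0, 0.0, 0.0)]}
ligand = [(0.3, 0.0, 0.0)]
result = contacts_per_residue(protein, ligand)
```

Summing such counts over the frames of a trajectory and grouping them by residue type would show which amino acids dominate the binding regions.<br />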
Figure 6.2: Minimum distance (≤ 1.0 nm) between CoSep and the residues of (a) the isolated A chain, (b)<br />
the A chain and (c) the F chain in the complex. Horizontal bars in blue and orange represent helices<br />
(labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by<br />
purple horizontal bars. (d) and (e) show the CoSep binding sites in the structure of the first<br />
cluster of the isolated A chain and of the AF chain, respectively. CoSep molecules are in ball and stick<br />
representation and colored by element type (nitrogen in blue, hydrogen in white and<br />
carbon in black). The HEME and FMN domains are in cartoon representation in sky blue and tan,<br />
respectively, with the surface colored according to element type. The FMN and HEME cofactors are in<br />
green and red, respectively. Helices, cofactors, loops and the N- and C-termini (in red) are<br />
labeled.<br />
6.4.2. Effect of CoSep binding on substrate access channel<br />
The accessibility of the active site was monitored through the dynamic behavior of<br />
residues Pro45 and Ala191, which line the substrate access channel, as done by Roccatano et al.[19]<br />
The P45Cα - A191Cα minimum distance (1.61 nm in the crystal structure) was calculated and is reported<br />
in Figure S6.7 of SI. After 30 ns of simulation, little variation in the distance was observed<br />
in all the simulations. CoSep binding to the A chain of the AF complex induces a larger deviation in the G<br />
helix and F/G loop region and results in a wider substrate access channel, with an average<br />
distance of 1.87 ± 0.15 nm compared with 0.59 ± 0.10 nm for the A chain of the AF complex in water. The isolated<br />
A chain was less affected by CoSep binding and shows a slightly higher P45Cα - A191Cα<br />
distance in CoSep solution (1.50 ± 0.14 nm) than in water<br />
(1.11 ± 0.10 nm). In the isolated A chain, the reverse effect of CoSep binding was observed for the<br />
average distance between water and the HEME iron, which was 0.22 ± 0.03 nm in CoSep solution and 0.34 ±<br />
0.14 nm in the water simulation. Hence, CoSep binding to the isolated HEME domain makes its<br />
structure slightly more compact and decreases the size of the substrate access channel.<br />
6.4.3. Effect of CoSep binding on ET tunneling<br />
In CoSep solution, the average distance between the FMN and HEME cofactors was<br />
1.35 ± 0.01 nm (with a minimum distance of 0.95 nm), lower than the one in water (1.41<br />
± 0.09 nm) (reported in Figure S6.8 in SI). The ET tunneling in the AF chain in the crystal structure<br />
and in the simulation has been discussed in detail in our previous paper.[20] In CoSep<br />
solution, the ET tunneling from CoSep to the HEME iron was identified in the representative<br />
structures of the isolated and complex simulations obtained via cluster analysis and is reported in<br />
Table 6.2.<br />
Table 6.2: Electron transfer tunneling in AF chain and isolated A chain in CoSep solution<br />
calculated by Pathways[29] VMD plugin.<br />
Coordinates | Redox partners | Max. coupling (a.u.) | Distance along ET pathway (nm) | Amino acids involved in the ET pathway<br />
A chain | CoSep/HEME | 6.39 × 10^-9 | 3.08 | CoSep → D84 → G85 → L86 → F87 → HEME (FE)<br />
AF chain | CoSep/HEME | 2.25 × 10^-9 | 2.93 | CoSep → I102 → L103 → I401 → C400 → HEME (FE)<br />
AF chain | CoSep/FMN | 4.00 × 10^-6 | 1.72 | CoSep → W574 → FMN (C7)<br />
AF chain | FMN/HEME | 1.38 × 10^-9 | 2.08 | FMN (C7) → SOL → M490 → F393 → HEME (FE)<br />
Figure 6.3a and 6.3b show the possible ET tunneling from CoSep to the HEME iron of the A<br />
chain in the isolated and complex simulations. In the isolated A chain, ET was mediated by the<br />
residues of the B’/C loop: Asp84, Gly85, Leu86 and Phe87. In the A chain of the AF complex, ET<br />
tunneling can be mediated by two pathways. The first ET pathway runs from CoSep to the HEME<br />
iron and is mediated by Ile102, Leu103, Ile401 and Cys400. Figure 6.3c and 6.3d show the<br />
second possible pathway, first from CoSep to the isoalloxazine ring of FMN (C7 atom) and then<br />
from the C7 atom to the HEME iron, mediated by a water molecule and involving Met490 of the Lβ1 FMN<br />
binding loop and Phe393 of the K/L loop. The same FMN/HEME ET pathway was observed in the<br />
water simulation of the AF complex, but without the involvement of a water molecule.[20]<br />
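To illustrate how a pathway coupling such as those in Table 6.2 arises, the sketch below multiplies one decay factor per step along a pathway, in the spirit of the tunneling pathway model implemented in the Pathways plugin.[29] The decay parameters are the values commonly quoted for that model and are an assumption here, since the settings used in this study are not stated; the function names are hypothetical, and the step lists in the example are not the pathways of Table 6.2.<br />

```python
import math

# Commonly quoted pathway-model parameters (assumed, not taken from this study).
EPS_COVALENT = 0.6  # decay factor per covalent bond

def hbond_decay(r_nm):
    """Decay factor for a hydrogen-bond step of heavy-atom distance r_nm."""
    r = r_nm * 10.0  # the model is parameterized in angstrom
    return 0.36 * math.exp(-1.7 * (r - 2.8))

def through_space_decay(r_nm):
    """Decay factor for a through-space jump of length r_nm."""
    r = r_nm * 10.0
    return 0.6 * math.exp(-1.7 * (r - 1.4))

def pathway_coupling(steps):
    """Relative coupling of one pathway as the product of its step decays.

    steps: list of (kind, distance_nm) with kind in
    {"covalent", "hbond", "space"}; the distance is ignored for covalent steps.
    """
    coupling = 1.0
    for kind, r_nm in steps:
        if kind == "covalent":
            coupling *= EPS_COVALENT
        elif kind == "hbond":
            coupling *= hbond_decay(r_nm)
        elif kind == "space":
            coupling *= through_space_decay(r_nm)
        else:
            raise ValueError(f"unknown step type: {kind}")
    return coupling
```

Because every step contributes a factor below one, longer pathways and pathways with large through-space jumps give smaller couplings, which is why the pathway with the largest product is reported as the dominant one.<br />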
Figure 6.3: ET tunneling from CoSep (in purple) to the HEME iron in (a) the AF complex and<br />
(b) the isolated A chain in CoSep solution. (c) ET from CoSep to the isoalloxazine ring (C7 atom) of the FMN<br />
cofactor and (d) from the C7 atom of the FMN cofactor to the HEME iron in the AF complex. ET is represented by<br />
red tubes. The HEME and FMN cofactors are in black and pink, respectively. The<br />
conformation of the first cluster with the minimum distance between the HEME and FMN cofactors is used. The<br />
amino acids within a distance of 0.50 nm from both cofactors are labeled and shown in<br />
licorice representation colored by element type (oxygen in red, carbon in cyan and nitrogen in blue),<br />
with their associated secondary structure in cartoon representation in sky blue for the HEME<br />
domain and in orange for the FMN domain. The residues involved in electron tunneling are<br />
represented and labeled in green.<br />
6.4.4. Effect of CoSep binding on P450BM-3 dynamics<br />
The subspace overlap and the inner product of the first ten eigenvectors of the A chain (which together<br />
account for ~60 % of the total residue positional fluctuation) in the isolated and complex simulations<br />
were less than 0.20 and 0.34, respectively. These values indicate that different sets of<br />
collective motions exist in the eigenvectors of the same time windows of the two trajectories. The<br />
first three eigenvectors of the A chain together represent ~41 % of the total relative positional<br />
fluctuation (RPF). Figure 6.4a, 6.4b and 6.4c show the RPF associated with the first three<br />
eigenvectors of the A chain in the isolated (green) and complex (orange)<br />
simulations (a comparison with the P450BM-3 domains in water is reported in Figure S6.9). Figure<br />
6.5 shows the RMSF associated with the first three eigenvectors of the A chain in the isolated (a1, a2 and<br />
a3) and complex (b1, b2 and b3) simulations in CoSep solution.<br />
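The two quantities used in this comparison can be sketched in Python. The first function gives the fraction of the total fluctuation captured by the leading eigenvectors; the second gives the root mean square inner product (RMSIP), one common measure of the overlap between two sets of eigenvectors. Whether the RMSIP coincides with the exact overlap measure used in this study is an assumption, and the function names are hypothetical.<br />

```python
import math

def cumulative_fraction(eigenvalues, n):
    """Fraction of the total fluctuation captured by the n largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:n]) / sum(ordered)

def rmsip(vectors_a, vectors_b):
    """Root mean square inner product between two sets of orthonormal vectors.

    Returns 1.0 when both sets span the same subspace and 0.0 when the
    subspaces are orthogonal. Both sets are assumed to be orthonormal and to
    contain the same number of vectors (for example, the first ten eigenvectors
    of each trajectory).
    """
    total = sum(
        sum(x * y for x, y in zip(v, w)) ** 2
        for v in vectors_a
        for w in vectors_b
    )
    return math.sqrt(total / len(vectors_a))
```

In this picture, low overlap values between the isolated and complex simulations mean that the dominant collective motions of the A chain differ between the two systems.<br />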
In the isolated A chain, the first collective motion (Figure 6.4a and 6.5a1) involves<br />
mainly the N-terminus region (residues 20 – 26), the turn (residues 35 – 38) between the A helix and<br />
beta sheet 1, the C-terminus (residues 450 – 458) and the K/L loop region (residues 325 – 380,<br />
involved in HEME cofactor binding), with slight motion in the B helix and the D/E, F/G and G/H<br />
loops. The second eigenvector (Figure 6.4b and 6.5a2) involves collective motion along<br />
the turn (residues 35 – 38) between the A helix and the beta sheet, the F/G loop, the K/L loop (residues 366<br />
– 385) and the C-terminus (residues 425 – 430 and 450 – 458). The third eigenvector (Figure<br />
6.4c and 6.5a3) involves mainly the C-terminus (residues 425 – 458) and, slightly, the D/E, F/G and K/L<br />
loops (residues 390 – 402). The first three eigenvectors show that the substrate channel<br />
remains open (as also found from the P45Cα - A191Cα<br />
distance) and that the collective motion is related to the interaction of residues with the HEME<br />
cofactor and facilitates ET tunneling from CoSep to the HEME iron through the B’/C loop in the<br />
isolated A chain.<br />
Figure 6.4: RPF for (a) first, (b) second and (c) third eigenvector of A chain in isolated (green<br />
color) and complex (orange color) simulation. Horizontal bars in blue and orange represent<br />
helices (labeled) and beta sheets, respectively. The regions involved in cofactor binding are<br />
represented by horizontal bars in purple color.<br />
In the complex simulation, the first eigenvector of the A chain (Figure 6.4a and 6.5b1)<br />
involves RPF in the B’/C loop (residues 83 – 94), the C and D helices and C/D loop (residues 100 – 130),<br />
the E/F loop (residues 159 – 171), the G helix and G/H loop (residues 198 – 230), the I helix (residues<br />
250 – 268) and the K/L loop (residues 335 – 340 and 392 – 400). The first collective motion<br />
involves residues in contact with the HEME cofactor and is related to the interaction of the A chain<br />
with the F chain; the largest RPF occurs in the interface regions of the A chain, mainly constituted by the<br />
C – D helices (found to be involved in ET tunneling from CoSep to the HEME iron) and the K/L loop<br />
(F393 is involved in water-mediated ET tunneling from FMN to the HEME iron). In the second<br />
eigenvector (Figure 6.4b and 6.5b2), the collective motion involves mainly the G helix and,<br />
slightly, the B’/C loop, G/H loop and K/L loop (residue 495 - 400). The larger RPF in the G helix<br />
results from CoSep binding and the slight kink formation in the G helix. The collective motion in the<br />
third eigenvector (Figure 6.4c and 6.5b3) involves mainly the G/H loop and, slightly, the G<br />
helix and K/L loop regions.<br />
The AF chain in CoSep solution shows collective motions of different amplitude than<br />
those in water. The RPF of the first three eigenvectors of the AF chain in both water and CoSep<br />
solution is reported in Figure S6.10 of SI. The collective motion associated with the first<br />
two eigenvectors of the AF chain in the presence of CoSep does not belong solely to the<br />
movement of the F chain towards the A chain, as observed in the water simulation. The collective<br />
motion associated with the first three eigenvectors of the AF chain in the presence of CoSep is<br />
reported in Figure S6.11 in SI. In the first eigenvector in CoSep, the A chain of the AF complex shows<br />
higher fluctuation in the C helix, the D helix, the beginning of the F helix (residues 170 – 175) and the G/H loop,<br />
and lower fluctuation for the F chain, than in water. In the second eigenvector, the<br />
collective motion in the A chain in the presence of CoSep mainly involves the G helix, with slightly<br />
higher RMSF for the B’/C loop and K/L loop (residues 390 – 400); both loops are involved in<br />
HEME cofactor binding. The third collective motion in the A chain of the AF complex involves the<br />
N-terminus residues (A – C helix), the G helix and the K/L loop. The third eigenvector of the AF chain shows<br />
a slight collective motion of the F chain towards the A chain.<br />
Figure 6.5: RMSF of protein backbone atoms along first (a), second (b) and third (c) eigenvector<br />
after projection of the trajectory on the corresponding eigenvector of isolated A chain in water (a1,<br />
a2 and a3) and in CoSep solution (b1, b2 and b3). The 10 sequential frames represent the extension<br />
of the fluctuations in trajectories along the eigenvectors. The first extreme conformation is shown<br />
in green color and last extreme in violet color. Other conformations of A chain are in sky blue.<br />
Helices and loops are labeled. N-terminus of the protein is labeled in red color.<br />
6.5. Conclusions<br />
We performed simulations of the isolated HEME domain and of the HEME/FMN complex in<br />
CoSep solution. The structure remains conserved in both systems throughout the<br />
simulations. CoSep was found to bind mainly to surface-exposed loop regions (rich in<br />
charged amino acids, mainly the negatively charged E and D) in both systems. CoSep binding<br />
affects the substrate access channel, which was found to be relatively more open in comparison to the<br />
one in the water simulation.[20] The isolated HEME domain adopts ET tunneling from CoSep to the<br />
HEME iron mediated by residues of the B’/C loop. The HEME/FMN complex has two possible ET<br />
tunneling pathways. The first runs from CoSep to FMN and then from FMN to HEME, involving<br />
the same residues as observed in the water simulation (Met490 and Phe393) but<br />
mediated by a water molecule. However, the average FMN/HEME distance was<br />
shorter (1.35 ± 0.11 nm) than the one observed in the water simulation (1.41 ± 0.09 nm; 1.81<br />
nm in the crystal structure). The second ET tunneling pathway runs directly from CoSep to the HEME iron. Both<br />
systems show atomic fluctuations of different amplitude in CoSep solution and in the water<br />
simulation. Hence, the presence of CoSep does not dramatically affect the conformation of the<br />
P450BM-3 domains but mainly results in the stabilization of the loops on the surface.<br />
The exception is the HEME/FMN complex, where CoSep binding induced a conformational change in the G helix<br />
and resulted in higher fluctuations in the F/G and G/H loop regions during the simulation. In the<br />
HEME/FMN complex, the preferred ET pathway is from CoSep to FMN and then to the HEME<br />
iron, and in this process a surface water molecule plays an important role. In the<br />
isolated HEME domain, however, direct ET from CoSep to the HEME iron might speed up the ET tunneling<br />
and hence the performance of the enzyme, consistent with the observation in protein engineering experiments on<br />
P450BM-3 that the isolated HEME domain performs better in the presence of Zn/Co(III)sep. The<br />
results of this study provide an indication of the mechanism of ET by CoSep. These results<br />
are in agreement with the findings of directed evolution and site-directed mutagenesis<br />
experiments on the whole P450BM-3 and on the HEME domain.<br />
6.6. References<br />
1. Chefson A, Auclair K (2006) Progress towards the easier use of P450 enzymes. Mol<br />
Biosyst 2: 462-469.<br />
2. Wong LL (1998) Cytochrome P450 monooxygenases. Curr Opin Chem Biol 2: 263-<br />
268.<br />
3. Guengerich FP (2001) Common and uncommon cytochrome P450 reactions related<br />
to metabolism and chemical toxicity. Chem Res Toxicol 14: 611-650.<br />
4. Kumar S Engineering cytochrome P450 biocatalysts for biotechnology, medicine and<br />
bioremediation. Expert Opin Drug Metab Toxicol 6: 115-131.<br />
5. Narhi LO, Fulco AJ (1986) Characterization of a catalytically self-sufficient 119,000-<br />
dalton cytochrome P-450 monooxygenase induced by barbiturates in Bacillus<br />
megaterium. J Biol Chem 261: 7160-7169.<br />
6. Narhi LO, Fulco AJ (1987) Identification and Characterization of 2 Functional<br />
Domains in Cytochrome-P-450bm-3, a Catalytically Self-Sufficient Monooxygenase<br />
Induced by Barbiturates in Bacillus-Megaterium. J Biol Chem 262: 6683-6690.<br />
7. Whitehouse CJC, Bell SG, Wong L-L (2012) P450BM3 (CYP102A1): connecting the<br />
dots. Chem Soc Rev.<br />
8. Warman AJ, Roitel O, Neeli R, Girvan HM, Seward HE, et al. (2005) Flavocytochrome<br />
P450 BM3: an update on structure and mechanism of a biotechnologically important<br />
enzyme. Biochem Soc Trans 33: 747-753.<br />
9. Girvan HM, Waltham TN, Neeli R, Collins HF, McLean KJ, et al. (2006)<br />
Flavocytochrome P450 BM3 and the origin of CYP102 fusion species. Biochem Soc<br />
Trans 34: 1173-1177.<br />
10. Schuhmann W (2002) Amperometric enzyme biosensors based on optimised<br />
electron-transfer pathways and non-manual immobilisation procedures. J<br />
Biotechnol 82: 425-441.<br />
11. Nazor J, Dannenmann S, Adjei RO, Fordjour YB, Ghampson IT, et al. (2008)<br />
Laboratory evolution of P450 BM3 for mediated electron transfer yielding an<br />
activity-improved and reductase-independent variant. Protein Eng Des Sel 21: 29-35.<br />
12. Schwaneberg U, Appel D, Schmitt J, Schmid RD (2000) P450 in biotechnology: zinc<br />
driven omega-hydroxylation of p-nitrophenoxydodecanoic acid using P450 BM-3<br />
F87A as a catalyst. J Biotechnol 84: 249-257.<br />
13. Wong TS, Schwaneberg U (2003) Protein engineering in bioelectrocatalysis. Curr<br />
Opin Biotechnol 14: 590-596.<br />
14. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure of a<br />
cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96:<br />
1863-1868.<br />
15. Humphrey W, Dalke A, Schulten K (1996) VMD: Visual molecular dynamics. J Mol<br />
Graphics 14: 33-38.<br />
16. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />
Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF:<br />
Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen.<br />
17. Helms V, Deprez E, Gill E, Barret C, Hui Bon Hoa G, et al. (1996) Improved binding of<br />
cytochrome P450cam substrate analogues designed to fill extra space in the<br />
substrate binding pocket. Biochemistry 35: 1485-1499.<br />
18. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2005) Structural and dynamic<br />
properties of cytochrome P450 BM-3 in pure water and in a<br />
dimethylsulfoxide/water mixture. Biopolymers 78: 259-267.<br />
19. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward understanding<br />
the inactivation mechanism of monooxygenase P450 BM-3 by organic cosolvents: a<br />
molecular dynamics simulation study. Biopolymers 83: 467-476.<br />
20. Verma R, Schwaneberg U, Roccatano D Insight into the redox partner interaction<br />
mechanism in cytochrome P450BM-3 using molecular dynamics simulation.<br />
Unpublished.<br />
21. Walsh JD, Miller AF (2003) Flavin reduction potential tuning by substitution and<br />
bending. J Mol Struc-Theochem 623: 185-195.<br />
22. Zheng Y-J, Ornstein RL (1996) A Theoretical Study of the Structures of Flavin in<br />
Different Oxidation and Protonation States. J Am Chem Soc 118: 9402-9408.<br />
23. Verma R, Schwaneberg U, Roccatano D Conformational Dynamics of the FMN-binding<br />
Reductase Domain of Monooxygenase P450BM-3. Unpublished.<br />
24. Dehayes LJ, Busch DH (1973) Conformational Studies of Metal-Chelates .1. Intra-Ring<br />
Strain in 5-Membered and 6-Membered Chelate Rings. Inorganic Chemistry 12:<br />
1505-1513.<br />
25. Becke AD (1993) Density-functional thermochemistry. III. The role of exact<br />
exchange. The Journal of Chemical Physics 98: 5648-5652.<br />
26. Hay PJ, Wadt WR (1985) Ab initio effective core potentials for molecular<br />
calculations. Potentials for the transition metal atoms Sc to Hg. The Journal of<br />
Chemical Physics 82: 270-283.<br />
27. Breneman CM, Wiberg KB (1990) Determining Atom-Centered Monopoles from<br />
Molecular Electrostatic Potentials - the Need for High Sampling Density in<br />
Formamide Conformational-Analysis. Journal of Computational Chemistry 11: 361-<br />
373.<br />
28. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, et al. (2009) Gaussian 09,<br />
Revision B.01. Gaussian, Inc, Wallingford CT.<br />
29. Balabin IA, Hu X, Beratan DN (2012) Exploring biological electron transfer pathway<br />
dynamics with the Pathways Plugin for VMD. J Comput Chem 33: 906-910.<br />
PART II: P450BM-3 HEME/FMN & CoSep SI<br />
Supporting Information<br />
Insight into the redox partner interaction mechanism in<br />
cytochrome P450BM-3 using molecular dynamics<br />
simulation<br />
Table S6.1: Force field parameters for cobalt(II)sepulchrate, adopted from the force<br />
constants calculated on energy-minimized geometries by Dehayes and Busch.[1]<br />
Bond stretching parameters<br />
Bond type Force constant (kJ mol⁻¹ nm⁻¹)<br />
Co – N 885.234<br />
N – C 2342.558<br />
N – H 2962.824<br />
C – C 2709.900<br />
C – H 2740.010<br />
Angle bending parameters<br />
Angle type Force constant (kJ mol⁻¹ rad⁻²)<br />
N – Co – N 240.88<br />
Co – N – C 167.41<br />
Co – N – H 167.41<br />
C – C – C 417.92<br />
C – C – H 292.06<br />
C – N – H 167.41<br />
C – N – C 417.92<br />
N – C – C 417.92<br />
N – C – H 292.06<br />
N – C – N 334.22<br />
H – N – H 251.11<br />
Dihedral parameters<br />
Dihedral type Force constant (kJ mol⁻¹)<br />
H – C – C – H 2.729<br />
H – C – N – Co 2.729<br />
H – C – N – C 2.729<br />
H – N – C – C 2.729<br />
H – C – N – H 1.807<br />
H – C – C – C 4.103<br />
H – C – C – N 4.103<br />
Co – N – C – C 1.373<br />
N – Co – N – C 0.000<br />
N – C – C – N 2.060<br />
N – C – C – C 2.060<br />
C – C – C – C 2.060<br />
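As a rough illustration of how an angle-bending constant from Table S6.1 enters an energy evaluation, the sketch below assumes a simple harmonic form, E = ½ k (θ − θ₀)², which is consistent with the kJ mol⁻¹ rad⁻² units of the table. The functional form, the function name and the 90° equilibrium angle are assumptions made for this example; the table does not list equilibrium angles.

```python
import math

# Angle-bending force constants copied from Table S6.1 (kJ mol^-1 rad^-2).
ANGLE_K = {
    ("N", "Co", "N"): 240.88,
    ("Co", "N", "C"): 167.41,
    ("C", "C", "C"): 417.92,
}


def harmonic_angle_energy(k, theta_deg, theta0_deg):
    """Return 0.5 * k * (theta - theta0)^2 in kJ/mol; angles are given in degrees."""
    delta = math.radians(theta_deg - theta0_deg)
    return 0.5 * k * delta * delta


# Hypothetical geometry: an N-Co-N angle bent 5 degrees away from an
# assumed (not tabulated) 90 degree minimum.
energy = harmonic_angle_energy(ANGLE_K[("N", "Co", "N")], 95.0, 90.0)
print(f"{energy:.3f} kJ/mol")
```

The same function applies to any row of the angle table once an equilibrium angle for that angle type is supplied.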
Table S6.2: Partial charges on cobalt(II)sepulchrate calculated by DFT and<br />
adapted for the GROMOS96 43a1 force field.[2]<br />
Atom number Atom type Atom name Charge group Partial charge<br />
1 CH2 C 1 0.514<br />
2 N N 2 -0.544<br />
3 CH2 C 3 0.149<br />
4 CH2 C 4 0.149<br />
5 N N 5 -0.542<br />
6 CH2 C 6 0.511<br />
7 N N 7 -0.945<br />
8 CH2 C 8 0.639<br />
9 N N 9 -0.785<br />
10 CH2 C 10 0.143<br />
11 CH2 C 11 0.253<br />
12 N N 12 -0.944<br />
13 CH2 C 13 0.697<br />
14 N N 14 -0.947<br />
15 CH2 C 15 0.639<br />
16 N N 16 -0.785<br />
17 CH2 C 17 0.143<br />
18 CH2 C 18 0.253<br />
19 N N 19 -0.944<br />
20 CH2 C 20 0.697<br />
21 CO CO 21 1.682<br />
22 H H 22 0.252<br />
23 H H 23 0.381<br />
24 H H 24 0.351<br />
25 H H 25 0.381<br />
26 H H 26 0.351<br />
27 H H 27 0.251<br />
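A quick consistency check on Table S6.2 is to sum the partial charges and compare the total with the formal charge of the complex. The sketch below copies the 27 tabulated values; the function name is chosen for this example only.

```python
# Partial charges from Table S6.2, in atom-number order (1 to 27).
PARTIAL_CHARGES = [
    0.514, -0.544, 0.149, 0.149, -0.542, 0.511, -0.945, 0.639, -0.785,
    0.143, 0.253, -0.944, 0.697, -0.947, 0.639, -0.785, 0.143, 0.253,
    -0.944, 0.697, 1.682, 0.252, 0.381, 0.351, 0.381, 0.351, 0.251,
]


def net_charge(charges):
    """Sum the partial charges, rounded to the table's three-decimal precision."""
    return round(sum(charges), 3)


print(len(PARTIAL_CHARGES), net_charge(PARTIAL_CHARGES))
```

The tabulated values add up to +2.000, which is what one would expect for a cobalt(II) complex with a neutral cage ligand.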
Figure S6.1: Backbone root mean square deviation (RMSD) with respect to the reference structure<br />
as a function of time for the AF chain (black), A of the AF chain (red), F of the AF chain (green), and the A (blue) and F<br />
(orange) chains in water (dotted lines) and in CoSep solution (solid lines). The P450BM-3 domains<br />
deviated less in CoSep solution than in water alone. The major difference was observed for the<br />
isolated A chain (green solid line) in the presence of CoSep, which deviated less than in<br />
water and reached a plateau with less variation after 25 ns.<br />
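For readers unfamiliar with the quantity plotted in Figure S6.1, the following minimal sketch computes an RMSD between one frame and a reference. It assumes the two coordinate sets are already superimposed (no least-squares fitting is performed here) and uses toy coordinates, not data from the simulations.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally long lists of (x, y, z) points.

    Assumes both structures are already superimposed; no fitting is done here.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(squared / len(coords))


# Toy data in nm (hypothetical): every atom is shifted by 0.1 nm along z.
reference = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.1), (0.1, 0.0, 0.1), (0.2, 0.0, 0.1)]
print(rmsd(frame, reference))
```

Evaluating this function for every trajectory frame against the same reference gives a time series of the kind shown in the figure.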
Figure S6.2: Radius of gyration with respect to the reference structure as a function of time for the AF<br />
chain (black), A of the AF chain (red), F of the AF chain (green), the A chain (blue) and the F chain (orange) in<br />
water (dotted lines) and in CoSep solution (solid lines).<br />
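The radius of gyration in Figure S6.2 measures how compact a structure is. A minimal sketch of the mass-weighted definition, with toy coordinates and masses chosen only for illustration, could look like this:

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a set of (x, y, z) points."""
    total_mass = sum(masses)
    # Centre of mass, one component at a time.
    center = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    weighted = sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / total_mass)


# Toy example (hypothetical): two equal masses 2 nm apart.
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [12.0, 12.0]))
```

A decreasing value over time indicates that the structure becomes more compact, an increasing value that it expands.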
Figure S6.3: Backbone RMSD (a) and RMSF (b) per residue with respect to the crystal structure for the<br />
isolated A and F chains (red) and the AF complex (black) in water, and for the A chain (green)<br />
and the AF complex (orange) in CoSep solution. The maroon vertical line separates the<br />
HEME and FMN domains. Horizontal bars in blue and orange represent helices (labeled) and<br />
beta sheets, respectively. The regions involved in cofactor binding are represented by horizontal<br />
bars in purple.<br />
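The per-residue RMSF in panel (b) of Figure S6.3 describes how strongly each position fluctuates around its time-averaged location. The sketch below shows one common way to compute it from a list of frames. It assumes the frames have already been fitted to a common reference, and the data layout and toy values are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation.

    trajectory: list of frames, each a list of (x, y, z) tuples, with the same
    atom order in every frame. Frames are assumed to be pre-aligned.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msd = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory (hypothetical): atom 0 is fixed, atom 1 oscillates along x.
toy = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)],
]
print(rmsf(toy))
```

Residues with large values correspond to flexible regions such as loops, while small values mark rigid secondary-structure elements.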
Figure S6.4: Binding of CoSep (ball-and-stick representation) to a) the A chain and b) the AF chain<br />
of P450BM-3, shown in cartoon representation with the surface colored by element type (carbon in gray,<br />
oxygen in red, nitrogen in blue and hydrogen in white). The FMN and HEME cofactors are in licorice<br />
representation in red and green, respectively. Helices, loops and the N- and C-termini (in red)<br />
are labeled.<br />
Figure S6.5: Average over the minimum distance (less than 2.3 nm) between CoSep and the<br />
isoalloxazine ring of the FMN cofactor (green), and between CoSep and the HEME iron of the A chain in the isolated domain<br />
(black) and in the complex simulation (red), as a function of time.<br />
Figure S6.6: Number of contacts between CoSep and amino acids of the P450BM-3 domains within<br />
a distance of 0.50 nm.<br />
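A contact count such as the one in Figure S6.6 can be obtained by counting atom pairs closer than a cutoff. The sketch below uses the 0.50 nm cutoff from the caption. Counting every atom pair (rather than, say, one contact per residue) is an assumption of this example, and the coordinates are toy values.

```python
import math


def count_contacts(group_a, group_b, cutoff=0.50):
    """Count atom pairs (one atom from each group) closer than cutoff (nm)."""
    return sum(1 for a in group_a for b in group_b if math.dist(a, b) < cutoff)


# Toy coordinates in nm (hypothetical): one ligand atom and three protein atoms.
ligand = [(0.0, 0.0, 0.0)]
protein = [(0.3, 0.0, 0.0), (0.6, 0.0, 0.0), (0.0, 0.4, 0.0)]
print(count_contacts(ligand, protein))
```

Applying the function frame by frame yields the number of contacts as a function of time.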
Figure S6.7: Minimum distance between P45 Cα and A191 Cα (1.61 nm in the crystal structure) as a<br />
function of time for the isolated A and F chains (red), the AF complex (black), and the A chain<br />
(green) and the AF complex (orange) with CoSep.<br />
Figure S6.8: Minimum distance between the heavy atoms of the isoalloxazine ring of FMN and the HEME<br />
cofactor as a function of time in the AF complex in water (black) and in the AF complex in CoSep<br />
solution (red). The green horizontal line shows the distance observed in the crystal<br />
structure.[3]<br />
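The minimum distance between two groups of atoms, as plotted in Figure S6.8, is the smallest of all pairwise distances between them. A minimal sketch with hypothetical coordinates:

```python
import math


def minimum_distance(group_a, group_b):
    """Smallest distance between any atom of group_a and any atom of group_b."""
    return min(math.dist(a, b) for a in group_a for b in group_b)


# Toy coordinates in nm (hypothetical stand-ins for two cofactors).
ring_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
heme_atoms = [(1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(minimum_distance(ring_atoms, heme_atoms))
```

This brute-force loop scales with the product of the two group sizes, which is acceptable for small groups such as cofactors.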
Figure S6.9: RPF for the first, second and third eigenvectors of the isolated A and F chains (red) and the AF<br />
complex (black) in water, and of the A chain (green) and the AF complex (orange) in<br />
CoSep solution. The maroon vertical line separates the HEME and FMN domains. Horizontal bars in blue<br />
and orange represent helices (labeled) and beta sheets, respectively. The regions involved in<br />
cofactor binding are represented by horizontal bars in purple.<br />
Figure S6.10: RPF for the first, second and third eigenvectors of the AF complex in water (black) and<br />
in CoSep solution (orange). The maroon vertical line separates the HEME and FMN domains.<br />
Horizontal bars in blue and orange represent helices (labeled) and beta sheets, respectively.<br />
The regions involved in cofactor binding are represented by horizontal bars in purple.<br />
Figure S6.11: RMSF of the protein backbone atoms along the first (a), second (b) and third (c)<br />
eigenvectors after projection of the trajectory onto the corresponding eigenvector for the AF complex in<br />
CoSep solution. The 10 sequential frames represent the extent of the fluctuations in the trajectories<br />
along the eigenvectors. The first extreme conformation is shown in green and the last extreme in<br />
violet. The other conformations of the A and F chains are in sky blue and tan, respectively. Helices<br />
and loops in the FMN domain are labeled. N and C indicate the N- and C-termini of the protein<br />
(labeled in red).<br />
References<br />
1. Dehayes LJ, Busch DH (1973) Conformational Studies of Metal-Chelates .1. Intra-<br />
Ring Strain in 5-Membered and 6-Membered Chelate Rings. Inorganic Chemistry 12:<br />
1505-1513.<br />
2. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />
Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF:<br />
Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen.<br />
3. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure of a<br />
cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96:<br />
1863-1868.<br />
Summary and outlook<br />
In the first part of the thesis, the combination of computational and<br />
directed evolution methods has been reviewed as a winning strategy for protein<br />
engineering. Computational approaches can assist the design of protein engineering<br />
experiments and hold particular promise for tailoring proteins to specific functions.<br />
The MAP 2.0 3D server has been introduced to assist the development of directed evolution<br />
experiments that generate sequence libraries with the highest chance of containing variants<br />
with the desired enzymatic properties. This task is accomplished by correlating the<br />
amino acid substitution patterns generated by a specific random mutagenesis method with the<br />
structural information of the target protein. The combined information can help to select<br />
an experimental strategy that improves the chances of obtaining functionally efficient and/or<br />
stable enzyme variants. Hence, the MAP 2.0 3D server facilitates the ‘in silico’ pre-screening of<br />
the target gene by predicting the amino acid diversity in random mutagenesis<br />
libraries. Currently, the MAP 2.0 3D server provides sequence- and structure-based analysis using the<br />
protein sequence or structure (crystallographic structure or homology model) provided by<br />
the user. In future, the capability of the server can be extended further by (1) dynamically<br />
identifying functionally important regions, e.g. active site residues and transmembrane<br />
regions, in the target protein and focusing the analysis only on those regions, (2)<br />
providing MAP 2.0 3D structural analysis results in the absence of a crystallographic or<br />
model structure by using secondary structure elements predicted from the protein sequence,<br />
and (3) predicting the flexible regions in the protein structure using, e.g., a Gaussian network<br />
model (GNM) and correlating them with the MAP 2.0 3D analysis.<br />
In the second part of the thesis, molecular dynamics (MD) simulations were used to<br />
understand the interaction mechanism in the HEME and FMN domains of P450BM-3 in<br />
solution and in the presence of the electron mediator cobalt(II)sepulchrate (CoSep).<br />
Cytochrome P450BM-3 is a pivotal member of the cytochrome P450 monooxygenase<br />
superfamily, in particular because it is a bacterial P450 fused with its eukaryotic-like<br />
redox partners (the FMN- and FAD-binding domains). This structural feature makes the enzyme<br />
catalytically self-sufficient. In addition, it is soluble in water and has a high catalytic<br />
efficiency and monooxygenase rate. These characteristics make the enzyme particularly<br />
interesting for possible biotechnological applications. For this reason, understanding the<br />
structure-function-dynamics relationships in P450BM-3 is relevant. In this thesis we have<br />
analyzed different dynamic and structural properties of the HEME domain, the FMN domain<br />
and their complex in solution.<br />
In the first study, the effect of the protonation state (oxidized and reduced) of the FMN<br />
cofactor on the conformation and dynamics of the FMN-binding domain of P450BM-3 was<br />
analyzed by performing MD simulations of the holo- and apo-protein in solution. In the holo-protein,<br />
the protonation state of the isoalloxazine ring influenced the conformation and<br />
dynamics of the FMN cofactor and resulted in changes in the FMN binding site. In particular, the<br />
dynamics of the FMN domain showed significant differences in atomic fluctuation<br />
amplitude between the oxidized and reduced states. In the apo-protein, the overall structure remained<br />
conserved, but high fluctuations were observed in the FMN binding region, which can promote the<br />
rebinding of the FMN cofactor observed experimentally.<br />
MD simulations of the HEME and FMN domains were performed to gain insight into<br />
the interaction mechanism and inter-domain electron transfer (ET) in the HEME/FMN complex.<br />
Simulations of the isolated HEME and FMN domains were also performed to compare their<br />
behavior in solution and in the HEME/FMN complex. The HEME/FMN complex undergoes a<br />
conformational rearrangement during the simulation that decreases the distance between the<br />
FMN and HEME cofactors to within the range expected for ET between the two redox centers.<br />
In the complex, the main collective motion was dominated by the interaction mechanism<br />
between the HEME and FMN domains.<br />
MD simulations of the HEME/FMN complex and the isolated HEME domain were<br />
performed to investigate the binding modes between CoSep and the P450BM-3 domains and<br />
their effect on the ET pathway. CoSep prefers to bind to surface-exposed loop regions that mainly<br />
contain negatively charged residues. CoSep binding to the HEME domain was observed to affect<br />
the substrate access channel and to keep it more open than in the simulation without CoSep<br />
in solution. Putative ET pathways were proposed between CoSep and the HEME iron in the<br />
HEME/FMN complex and in the isolated HEME domain.<br />
The results of the P450BM-3 simulations can enhance our basic understanding, with<br />
possible applications in enzyme catalysis, of (1) the effect of the protonation state on the<br />
dynamics of the P450BM-3 reductase domain, (2) the interaction mechanism of the redox partners<br />
and its effect on ET tunneling between the redox centers, and (3) the effect of the presence of an<br />
ET mediator on redox partner interaction and ET tunneling. The study can be<br />
extended further by performing MD simulations of the HEME/FMN complex with the FMN cofactor in the<br />
reduced state. Modeling the linker region connecting the HEME and FMN domains, followed<br />
by its simulation, will help to further enhance our understanding of the HEME/FMN<br />
binding interaction mechanism and ET tunneling. The recent release of the FAD domain of<br />
P450BM-3 with NADPH also offers a chance to simulate the whole<br />
complex.<br />
Curriculum vitae<br />
Personal Details<br />
Name:<br />
Rajni Verma<br />
Address:<br />
School of Engineering and Science,<br />
Jacobs University Bremen,<br />
28759 Bremen, Germany<br />
Tel.: +49 421 200 3208<br />
Email:<br />
ra.verma@jacobs-university.de<br />
Date of Birth: 15th April 1984<br />
Nationality<br />
Indian<br />
Linguistic skills:<br />
Hindi, English<br />
__________________________________<br />
Employment & Education<br />
_______________________________<br />
Since 06/09<br />
PhD Fellow in Computational Chemistry and Bioinformatics,<br />
Jacobs University Bremen, Bremen, Germany<br />
04/08 – 03/09 Project Assistant, Bioinformatics,<br />
Institute of Genomics and Integrative Biology, New Delhi, India<br />
07/05 – 03/08 Master of Science in Bioinformatics,<br />
CCS CS University, Meerut, India<br />
06/04 – 06/05 Advanced Diploma in Computer Application,<br />
CCSCS University, Meerut, India<br />
07/01 – 06/05 Bachelor of Science in Life Sciences,<br />
CCSCS University, Meerut, India<br />
Publications<br />
1. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics of the FMN-binding reductase<br />
domain of monooxygenase P450BM-3. J Chem Theory Comput 2012, DOI: 10.1021/ct300723x.<br />
2. Verma R, Schwaneberg U, Roccatano D. Computer-aided protein directed evolution: a review of<br />
web servers, databases and other computational tools for protein engineering. Computational and<br />
Structural Biotechnology Journal 2012, 2 (3), e201209008.<br />
3. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser HG, Niemann P, Shivange AV,<br />
Schwaneberg U. dRTP and dPTP a complementary nucleotide couple for the Sequence Saturation<br />
Mutagenesis (SeSaM) method. J Mol Catal B-Enzym 2012, 84, 40-47.<br />
4. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a sequence/structure based<br />
server for protein engineering. ACS Synth Biol 2012, 1 (4), 139-150.<br />
5. Ramachandran S, Chaudhuri R, Verma R, Shah AR, Sen R, Paul C. Systems Immunology: Data<br />
modeling and scripting in R. Book chapter: Encyclopedia of Systems Biology, Springer Science &<br />
Business Media, LLC 2011. Edited by W. Dubitsky, O. Wolkenhauer, K. Cho, & H. Yokota.<br />
6. Zhu L, Verma R, Roccatano D, Ni Y, Sun Z, Schwaneberg U. A potential antitumor drug (arginine<br />
deiminase) reengineered for efficient operation under physiological conditions. ChemBioChem 2010,<br />
11, 2294-2301. [inside cover page]<br />
Conferences/Abstracts<br />
1. Verma R, Schwaneberg U, Roccatano D. Molecular dynamics simulations of P450BM-3 reductase<br />
domain. Computer simulation and theory of macromolecules 2012, Hunfeld, Germany.<br />
2. Verma R, Schwaneberg U, Roccatano D. Protein and cofactor conformational dynamics of FMN-binding<br />
reductase domain of monooxygenase P450BM-3. 5th Meeting of the North German<br />
Biophysicist 2012, Borstel, Germany.<br />
3. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a structure based substitution spectra analyses<br />
of mutagenesis methods. 10th International Symposium on Biocatalysis- Biotrans 2011, Sicily, Italy.<br />
4. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Mundhada H, Shivange AV, Schwaneberg U.<br />
Ribavirin: A complementary universal base to P for Sequence Saturation Mutagenesis Method<br />
(SeSaM). 10th International Symposium on Biocatalysis- Biotrans 2011, Sicily, Italy.<br />
5. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics of oxidized and reduced FMN in<br />
water and methanol. MoLife Center Jacobs University Bremen 2011, Seefeld, Germany.<br />
6. Verma R, Schwaneberg U, Roccatano D. MAP2.0: Evolution of Mutagenesis Assistant Program. 5th<br />
International Congress on Biocatalysis- Biocat 2010, Hamburg, Germany.<br />
7. WE-Heraeus Summer School June 2009, ‘Quantum and classical simulation of biological systems and<br />
their interaction with technical materials’, Bremen, Germany. (Participation)<br />
References<br />
Prof. Dr. Danilo Roccatano<br />
Assistant Professor<br />
School of Engineering and Science,<br />
Jacobs University Bremen,<br />
Campus Ring 1, Research II,<br />
28759 Bremen, Germany<br />
Tel: +49-421 200-3144<br />
Fax: +49-421-200-3249<br />
Email: d.roccatano@jacobs-university.de<br />
Web: http://ses.jacobs-university.de/ses/droccatano<br />
Prof. Dr. Ulrich Schwaneberg<br />
Head of the Institute<br />
Department of Biotechnology,<br />
RWTH Aachen University,<br />
Worringer Weg 1,<br />
52056 Aachen, Germany<br />
Tel.: +49-241-80-24176<br />
Fax: +49-241-80-22387<br />
E-Mail: u.schwaneberg@biotec.rwth-aachen.de<br />
Web: www.biotec.rwth-aachen.de
Statutory Declaration<br />
I, RAJNI VERMA, hereby declare that I have written this PhD thesis independently,<br />
except where clearly stated otherwise. I have used only the sources, the data and the<br />
support that I have clearly mentioned. This PhD thesis has not been submitted for<br />
conferral of a degree elsewhere.<br />
Bremen, December 19, 2012<br />
Signature ____________________________________________________________