
Development and Application of Novel Bioinformatics and Computational Modeling Tools for Protein Engineering

by

Rajni Verma

A thesis submitted for the degree of Doctor of Philosophy in Computational Chemistry & Bioinformatics

Date of Defense: December 14th, 2012

Supervisor
Prof. Dr. Danilo Roccatano
Jacobs University Bremen, Germany

Co-supervisor
Prof. Dr. Ulrich Schwaneberg
RWTH Aachen University, Germany

External committee member
Dr. Steven Hayward
University of East Anglia, UK

School of Engineering and Science


Acknowledgement

I express my sincere gratitude to my PhD supervisor, Prof. Dr. Danilo Roccatano, for his expert and continuous guidance. In particular, I thank him for his patience and for the time he spent explaining the concepts and ideas that helped me to accomplish my work. His constant support, understanding, motivation and valuable discussions gave me a wonderful learning experience during my PhD.

I am thankful to Prof. Dr. Ulrich Schwaneberg for his constructive comments, fruitful discussions and his support during this endeavor. It has been a great honor and pleasure to work with him, and I am deeply grateful for his trust in me. I express my respectful gratitude to Dr. Steven Hayward for being a member of my PhD committee. I am thankful to Dr. Achim Gelessus for his technical support in making efficient use of the CLAMV facility for scientific computation throughout this work at Jacobs University Bremen.

I express my whole-hearted thanks to the members of Prof. Roccatano's group, Samira Hezaveh, Khadga Karki, Susruta Samanta and Edita Sarukhanyan, for their wonderful company and support. I also convey my thanks to the members of Prof. Schwaneberg's group at RWTH Aachen for their cooperation. I express my special thanks to my friends Kavita, Amit, Amol, Sagar, Hemanshu, Susruta, Usha and Steffi for providing such a friendly environment. In particular, I express my heartfelt gratitude to Amol for his continuous support, encouragement, motivation, patience and care throughout my PhD.



Funding

The work described in this PhD thesis was financially supported by the European Union 7th Framework Programme through the project entitled “Effective redesign of oxidative enzymes for green chemistry” (Project reference: 212281), in collaboration with Prof. Dr. Ulrich Schwaneberg from RWTH Aachen University.



List of Publications

1. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics of the FMN-binding reductase domain of monooxygenase P450BM-3. J Chem Theory Comput 2012, DOI: 10.1021/ct300723x.

2. Verma R, Schwaneberg U, Roccatano D. Computer-aided protein directed evolution: a review of web servers, databases and other computational tools for protein engineering. Computational and Structural Biotechnology Journal 2012, 2 (3), e201209008.

3. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser HG, Niemann P, Shivange AV, Schwaneberg U. dRTP and dPTP a complementary nucleotide couple for the Sequence Saturation Mutagenesis (SeSaM) method. J Mol Catal B-Enzym 2012, 84, 40-47.

4. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a sequence/structure based server for protein engineering. ACS Synth Biol 2012, 1 (4), 139-150.

5. Zhu L, Verma R, Roccatano D, Ni Y, Sun Z, Schwaneberg U. A potential antitumor drug (arginine deiminase) reengineered for efficient operation under physiological conditions. ChemBioChem 2010, 11, 2294-2301. [inside cover page]

6. Verma R, Schwaneberg U, Roccatano D. Insight into the redox partner interaction mechanism in cytochrome P450BM-3 using molecular dynamics simulations. (manuscript in preparation)

7. Verma R, Schwaneberg U, Roccatano D. A molecular dynamics study of the interactions between P450BM-3 domains and Cobalt(II)Sepulchrate as an electron transfer mediator. (manuscript in preparation)



Abstract

In recent decades, enzymatic catalysis has emerged as a convenient and environmentally friendly substitute for traditional chemical processes, ranging from the synthesis of many pharmaceutical and agrochemical building blocks to fine and bulk chemicals and, more recently, the components of biofuel. The combination of experimental and computational methods holds particular promise in the field of enzymatic catalysis for tailoring enzymes to tasks not yet exploited by natural selection. It is therefore important to develop computational tools that help to achieve this goal. The scope of this thesis is to propose novel bioinformatics tools and to explore computational methods aimed at supporting and guiding protein evolution experiments. The thesis is divided into two parts. The first part (Part I, Chapter 1 and Chapter 2) focuses on extending the benchmarking system for random mutagenesis methods (MAP: Mutagenesis Assistant Program) towards sequence/structure and structure/function analysis, and on evaluating this approach on enzymes commonly used as biocatalysts. Chapter 1 gives a comprehensive overview of the computational methods used to assist protein engineering experiments. Chapter 2 describes a completely renewed and improved version of the MAP server, named the MAP 2.0 3D server, which correlates the generated amino acid substitution patterns with the structural information of the target protein. The server thus helps to identify in advance the random mutagenesis method that introduces mutations with less deleterious effects and is more likely to improve protein fitness towards a desired property, e.g. charged amino acid substitutions to increase the solubility of a protein in water. The capability of the server was illustrated by in-silico screening of different enzymes, and the predicted results were in agreement with the experimental findings.

An atomic-level understanding of the subtle interplay among the structure, dynamics and function of enzymes plays an important role in the rational design of new or improved functions. The second part of the thesis (Part II, Chapters 3 to 6) uses a molecular modeling approach to gain insight into the structural and dynamic properties of the P450BM-3 (CYP102) complex in water and in the presence of cobalt(II)sepulchrate (CoSep) as an electron transfer (ET) mediator. P450BM-3, isolated from Bacillus megaterium, is an attractive target and model system for biochemical applications (it converts a wide variety of industrially attractive substrates) and biomedical applications (it is a bacterial model for the microsomal P450 system). The theoretical aspects of molecular dynamics (MD) simulation are presented in Chapter 3, together with an overview of the system preparation for MD simulation and of the analysis of protein conformation and dynamics in the generated trajectory. In Chapter 4, the structural and dynamic properties of the P450BM-3 FMN (flavin mononucleotide) domain are investigated as holo-protein, with the cofactor in the oxidized and reduced states, and as apo-protein. The results illustrate the effect of the FMN cofactor and its protonation state on the conformation and dynamics of the FMN domain, which can be related to the ET pathway from the FMN to the HEME cofactor. The study is further extended to gain insight into the binding modes and the structural determinants of inter-domain ET in the HEME/FMN complex of P450BM-3. MD simulations were performed on both the FMN and HEME domains, isolated and in their crystallographic complex, and the results are reported in Chapter 5. The HEME/FMN complex undergoes a rearrangement that decreases the distance between the redox centers, which promotes a favorable ET rate under physiological conditions. In Chapter 6, MD simulations of P450BM-3 domains (the isolated HEME domain and the HEME/FMN complex) were performed in the presence of CoSep as ET mediator. The results illustrate the preferential binding modes of CoSep in the P450BM-3 domains and the putative ET pathways from CoSep to the iron center of the HEME cofactor, and they are in agreement with the experimental findings.



Table of Contents

Acknowledgement
Funding
List of Publications
Abstract

Chapter 1
1.1. Abstract
1.2. Background
1.3. Generated diversity and library size
1.4. Evolutionary conservation based focused library
1.5. Structure-based focused library
1.6. Mutational effects in protein
1.7. Summary and outlook
1.8. References

Chapter 2
2.1. Abstract
2.2. Introduction
2.3. Methods
2.3.1. Mutational probability and statistics
2.3.2. MAP indicators
2.3.3. Local chemical diversity and protein structure components
2.3.4. MAP 2.0 3D server description
2.3.5. MAP 2.0 3D output
2.3.6. Model proteins
2.4. Results and discussions
2.4.1. D-amino acid oxidase
2.4.2. Phytase
2.4.3. N-acetylneuraminic acid aldolase
2.5. Conclusions
2.6. References

Chapter 3
3.1. Background
3.2. Setup of the simulated systems
3.3. Equilibration procedure
3.4. Structural and dynamical analysis
3.5. Cluster analysis
3.6. Principal component analysis
3.7. References

Chapter 4
4.1. Abstract
4.1. Introduction
4.2. Methods
4.2.1. Starting coordinates
4.2.2. Molecular dynamics simulation
4.2.3. FMN binding site analysis
4.2.4. Multiple structural alignment of FMN domain
4.3. Results
4.3.1. FMN domain: structural and dynamical properties
4.3.2. Cluster analysis of FMN domain
4.3.3. FMN binding site
4.3.4. Conservation profile of FMN binding site
4.3.5. Principal component analysis of FMN domain
4.3.6. FMN cofactor: structural and dynamical properties
4.3.7. Cluster analysis of FMN cofactor
4.3.8. Principal component analysis of FMN cofactor
4.4. Discussions and conclusions
4.5. References
Supporting information

Chapter 5
5.1. Abstract
5.2. Introduction
5.3. Methods
5.3.1. Starting coordinates
5.3.2. Molecular dynamic simulations
5.3.2. Electron transfer tunneling
5.4. Results and discussion
5.4.1. Structural properties
5.4.2. Cluster analysis
5.4.3. Substrate access channel
5.4.4. ET tunneling pathways
5.4.5. Essential dynamics
5.5. Conclusions
5.6. References
Supporting information

Chapter 6
6.1. Abstract
6.2. Introduction
6.3. Methods
6.3.1. Starting coordinates
6.3.2. Molecular dynamics simulation and modeling
6.4. Results and discussion
6.4.1. CoSep binding on P450BM-3 domains
6.4.2. Effect of CoSep binding on substrate access channel
6.4.3. Effect of CoSep binding on ET tunneling
6.4.4. Effect of CoSep binding on P450BM-3 dynamics
6.5. Conclusions
6.6. References
Supporting information

Summary and outlook
Curriculum vitae


PART I: CAPDE

Chapter 1

Computer-Aided Protein Directed Evolution: a Review of Web Servers, Databases and other Computational Tools for Protein Engineering

1.1. Abstract

The combination of computational and directed evolution methods has proven a winning strategy for protein engineering. We refer to this approach as computer-aided protein directed evolution (CAPDE), and this chapter summarizes recent developments in this rapidly growing field. We restrict ourselves to an overview of the availability, usability and limitations of web servers, databases and other computational tools proposed in the last five years. The goal of this chapter is to provide concise information about currently available computational resources that assist the design of directed evolution based protein engineering experiments.

1.2. Background

Protein engineering comprises a large number of techniques applied to evolve or design proteins with a desired function.[1] The primary objective of any protein engineering experiment is to identify specific sequence changes that alter the protein towards the desired functional properties.[1,2] Generally, two main approaches are used to design novel proteins or enzymes: rational design and directed evolution. The first approach employs information on the protein structure and focuses mutagenesis on modifying protein scaffolds (e.g. the active site of the biocatalyst). For this approach, knowledge of the target amino acid is necessary and can be provided by visual inspection or in-silico prescreening.[3] Both cases depend on the nature of the problem and show a high success rate only for the prediction of single or double mutations. Indeed, multiple mutations involve cooperative effects on protein structure and function that are almost inaccessible to current computational screening methods as well.

A more challenging approach, the de novo design or redesign of synthetic proteins or peptides, uses solely structural information and the folding rules of proteins.[4,5] Although this method offers the broadest possibility to design novel folds and functions, its success for large proteins is limited.[6,7] The reasons lie in the limited number of known three-dimensional protein structures (in particular of membrane proteins) and the lack of a unifying theory of protein folding mechanisms. Computational approaches based on microsecond-to-millisecond atomistic molecular dynamics (MD) simulations of protein folding[8-10] have recently had some encouraging success in the ab-initio folding of peptides and small proteins. In addition, the combination of quantum mechanics and molecular dynamics methods has shown the superior capability of physics-based methods to design new enzymatic reactions.[11] However, the application of these methods is still limited because they are computationally very demanding.[12] Approaches based on de novo design, quantum mechanics and molecular dynamics are not covered in this chapter. The reader can refer to recent papers and reviews on these topics.[13-16]

The second approach is directed evolution. It is one of the most powerful approaches to improve or create new protein function by redesigning the protein structure.[17] It can, for example, improve the activity or stability of a biocatalyst under unnatural conditions (e.g. in the presence of organic solvent) by accumulating multiple mutations.[17,18] Directed evolution involves multiple rounds of random mutagenesis or gene shuffling followed by screening of the mutant library.[19] Prior knowledge of the protein structure is not required in directed protein evolution. However, structural information can be used to focus and restrict the approach to specific subsets of amino acids (e.g. active site residues). A common problem of directed evolution methods is the limited distribution of the generated sequence diversity, which reduces the efficiency with which functional sequence space is sampled.[19,20]
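One well-known contributor to this limited diversity is the genetic code itself: a method that changes only one nucleotide per codon can reach only a subset of the 19 alternative amino acids at each position. The following minimal sketch illustrates this under simplifying assumptions (standard genetic code, all single-nucleotide changes equally possible, no method-specific mutational bias). The function name `reachable_amino_acids` is hypothetical and is not taken from any tool discussed in this chapter.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reachable_amino_acids(codon: str) -> set:
    """Amino acids reachable from `codon` by exactly one nucleotide change.

    Synonymous changes and stop codons ('*') are excluded.
    """
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue  # not a mutation
            mutant = codon[:position] + base + codon[position + 1:]
            aa = CODON_TABLE[mutant]
            if aa not in (wild_type, "*"):
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    # Illustrative codons only: Met (ATG), Trp (TGG) and one Ala codon (GCT).
    for codon in ("ATG", "TGG", "GCT"):
        aas = reachable_amino_acids(codon)
        print(codon, CODON_TABLE[codon], len(aas), "".join(sorted(aas)))
```

Because each codon has only nine single-nucleotide neighbours, no codon can reach all 19 alternative amino acids in this idealized model; real mutagenesis methods add their own nucleotide biases on top of this constraint.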

In summary, rational design via site-directed or saturation mutagenesis and directed evolution via random mutagenesis are the key tools in protein engineering. In both approaches, sequence diversity is generated directly as point mutations, insertions or deletions within a single parental gene. Consequently, improvements in the quality of rationally designed libraries and in the techniques for sequence space exploration and diversity generation are critical for future advances.

The combination of experimental and computational methods holds particular promise for tailoring proteins to tasks not yet exploited by natural selection.[21,22] In fact, most of the computational tools or web servers for directed evolution use structural data, when available, to assist the library generation process. Since it is impossible to test more than a very small fraction of the vast number of possible protein sequences, there is a pressing need for directed evolution strategies that generate sequence libraries with the highest chance of containing variants with the desired enzymatic properties. Such libraries can be designed by applying current knowledge of the protein response to mutations and of sequence-structure-function relationships.
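The scale of the problem can be made concrete with a short calculation. The sketch below counts the sequences that differ from a parent protein by exactly k amino acid substitutions, C(N, k) · 19^k, and estimates how many clones would have to be screened to sample a given fraction of them. The protein length of 300 residues is an arbitrary illustrative value, the function names are hypothetical, and the coverage estimate assumes an idealized library in which every variant is equally likely, which real mutagenesis methods do not achieve.

```python
from math import ceil, comb, log


def count_variants(length: int, substitutions: int) -> int:
    """Number of distinct sequences with exactly `substitutions` amino acid changes.

    Choose which positions are mutated, then one of 19 alternative
    amino acids at each chosen position.
    """
    return comb(length, substitutions) * 19 ** substitutions


def clones_for_coverage(n_variants: int, coverage: float) -> int:
    """Clones to screen so that a fraction `coverage` of variants is expected
    to be sampled at least once.

    Idealized assumption: all variants are equally likely, so the expected
    sampled fraction after T clones is 1 - exp(-T / n_variants).
    """
    return ceil(-n_variants * log(1.0 - coverage))


if __name__ == "__main__":
    length = 300  # hypothetical protein length, for illustration only
    for k in (1, 2, 3):
        variants = count_variants(length, k)
        clones = clones_for_coverage(variants, 0.95)
        print(f"{k} substitution(s): {variants:,} variants, "
              f"about {clones:,} clones for 95% coverage")
```

Even under these favorable assumptions the required screening effort grows by several orders of magnitude with each additional simultaneous substitution, which is why focused libraries are attractive.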

Thermostability, solvent stability (pH and salt stability or co-solvent tolerance) and enzymatic activity (improvement in both binding affinity and catalytic activity) are the properties commonly targeted by protein engineering experiments. The first two properties are difficult to predict because they depend on effects distributed over the whole protein structure. For enzymatic activity, different mutagenesis studies indicate that most of the mutations affecting certain enzyme properties, such as substrate specificity, enantioselectivity and new catalytic activities, are located in or near the active site.[21] The rational design approach is successful in targeting relevant active site residues for site-directed mutagenesis but less effective for important residues located in the second coordination sphere of the active site. For these cases, the combination of random mutagenesis and computer-aided protein directed evolution (CAPDE) approaches can provide a winning strategy. The application of computational methods in conjunction with directed evolution offers the exciting promise of generating libraries with a high frequency of active and improved variants.[23]

Figure 1.1: Schematic representation of the four CAPDE approaches (as the quarters of the circle): (1) generated diversity and library size (in red), (2) evolutionary conservation based focused library (in green), (3) structure-based focused library (in purple) and (4) mutational effects in protein (in cyan). The servers, tools and databases associated with the approaches are shown in boxes.


In this chapter, for the sake of clarity, the CAPDE approaches have been divided into four major areas, schematically represented in Figure 1.1. The first comprises tools that characterize the library generated by mutagenesis methods, mainly through statistical approaches. The second and third areas comprise tools that use the evolutionary and structural information of the target protein to design a focused library. Multiple sequence or structure alignment (MSA) is the key approach used by these tools to identify variable or conserved positions in the target protein. The fourth area is dedicated to tools for the prediction of mutational effects on protein structure and function. These tools and/or web servers are based on machine learning, statistical or empirical approaches and predict the mutational effect on protein stability and/or activity by estimating relative free energy changes.[24]

This chapter is divided into four parts following the division of CAPDE approaches. It aims to provide concise information about currently available CAPDE methods that assist the design of directed evolution experiments, with the final goal of enhancing the probability of identifying mutants with the desired properties. In particular, the reader will find a short overview and classification of novel databases, web servers and other computational tools that have been developed in the last few years in the field of molecular modeling of protein structure and that can provide relevant information for the interpretation of experimental results. Finally, as previously mentioned, methods that involve physical approaches based on QM/MM or MD simulations are not considered.

1.3. Generated diversity and library size

Unbiased diversity generation and the subsequent screening of a statistically meaningful fraction of the generated sequence space are fundamental challenges in directed evolution experiments.[25] The directed evolution strategy comprises two key steps: 1) generating diverse mutant libraries and 2) screening to identify improved protein variants. The success of a directed evolution method depends on the quality of the mutant library. The challenges and advances in generating functionally diverse libraries have been reviewed in recent years.[20,26] Computational tools can assist directed evolution in these two steps by in-silico analysis and screening of the expected protein sequence space sampled by the generated libraries (summarized in Table 1.1). The publicly available web servers MAP (Mutagenesis Assistant Program)[25,27] and PEDEL-AA[28] were developed to estimate the diversity at the protein level in libraries generated by random mutagenesis methods.

Table 1.1: Computational tools for analyzing the amino acid diversity, size and completeness of libraries generated by mutagenesis methods.

| Approach | Name | Input | Case study examples | URL |
|---|---|---|---|---|
| Statistics of generated diversity | MAP 2.0 3D [25,27] | Nucleotide sequence or protein structure. | Cytochrome P450BM-3,[25] D-amino acid oxidase, Phytase [27] | http://map.jacobs-university.de/submission.html |
| | PEDEL-AA [28] | Nucleotide sequence, mutation rate, library size, indel rate, nucleotide mutation matrix. | α-synuclein, Phosphoribosylpyrophosphate amidotransferase (purF) [29] | http://guinevere.otago.ac.nz/cgi-bin/aef/pedel-AA.pl |
| Library size and completeness | GLUE-IT [28] | Library size and randomization techniques. | Randomization scheme: NNK, NDT, NNB, NAY [28] | http://guinevere.otago.ac.nz/cgi-bin/aef/glue-IT.pl |
| | TopLib [30] | Probability required by library size and randomization techniques. | Randomization scheme: NNN, NNB, NNK, MAX [30] | http://stat.haifa.ac.il/~yuval/toplib/ |

Figure 1.2: a) The MAP 2.0 3D analysis of the amino acid diversity generated by the balanced epPCR (Taq (MnCl2, G=A=C=T)) method. The Y-axis shows the original amino acid species and the X-axis shows the amino acid substitution patterns, colored from red (lowest probability) to blue (highest probability). The MAP 2.0 3D analysis is restricted to the active site residues (Ala11, Ser47, Thr48, Tyr137, Ile139, Lys165, Thr167, Gly189, Tyr190). For this analysis, the amino acids are grouped into four classes according to their chemical nature (charged, neutral, aromatic and aliphatic), with stop codon (structure disrupting) and glycine/proline (helix destabilizing) as separate classes. The probabilities of amino acid substitutions were mapped on the protein sequence and structure (PDB Id: 1NAL) of N-acetylneuraminic acid and are represented in b and c, respectively. b) The Jmol [33] applet is used for the visualization of amino acid substitution patterns using the RWB (red-white-blue) color gradient scheme, with active site residues shown as sticks. The Y-axis shows sequence id, PDB id, amino acid name and in c) secondary structure elements (T: hydrogen bonded turn and bend, *: loop or irregular structure), d) normalized Cα B-factor to differentiate flexible (F) and rigid (R) residues, and e) relative solvent accessibility to identify exposed (E) or buried (B) residues.

MAP [25] takes a nucleotide sequence as input and assists in designing a better directed evolution strategy by providing a statistical analysis of random mutagenesis methods at the protein level. The capabilities of MAP were extended in the MAP 2.0 3D [27] server, which predicts the residue mutability resulting from the mutational bias of random mutagenesis methods and correlates the generated amino acid substitution patterns with the structural information of the target protein. In this way, the server offers the possibility to analyze the consequences of the limited mutational preferences of random mutagenesis methods at the protein level and their effects on protein structure.[25] The capability of the server was illustrated by the in-silico screening of different enzymes, and the predicted results were in agreement with the experimental results.[27,31,32] Figure 1.2 shows an example of the MAP 2.0 3D output for the active site residues of N-acetylneuraminic acid using the epPCR method.[27]
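The reason why nucleotide-level mutagenesis limits the diversity at the protein level can be illustrated with a small Python sketch. It enumerates the nine single-nucleotide neighbours of a codon and collects the amino acids (or stop signal) they encode under the standard genetic code. This is only an illustration under the assumption of unbiased substitutions; it is not the algorithm implemented in MAP, which additionally weights each substitution by the method-specific mutational bias.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_substitution_outcomes(codon: str) -> set:
    """Amino acids (or '*') reachable from `codon` by one nucleotide change."""
    outcomes = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                outcomes.add(CODON_TABLE[mutant])
    outcomes.discard(CODON_TABLE[codon])  # silent changes are not substitutions
    return outcomes


if __name__ == "__main__":
    for codon in ("TGG", "ATG"):
        reachable = single_substitution_outcomes(codon)
        print(codon, CODON_TABLE[codon], sorted(reachable), len(reachable))
```

For the tryptophan codon TGG, for example, only five amino acids and a stop codon are reachable, far fewer than the 19 possible substitutions.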

PEDEL-AA returns statistics at the amino acid level for libraries generated by the epPCR method; it requires the nucleotide sequence together with the library size, mutation rate, indel rate and nucleotide mutation matrix.[28] CodonCalculator and AA-Calculator are two algorithms developed by Patrick et al. to select an appropriate randomization scheme for library construction.[28] Two servers, GLUE-IT and GLUE, estimate the amino acid diversity and completeness of the generated library. Finally, the TopLib [30] web server assists in designing saturation mutagenesis experiments by predicting the size or completeness of the generated library for a user-defined codon randomization scheme using a probabilistic approach.
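The relation between library size and completeness can be sketched with a simple probabilistic model. The Python example below assumes that all V codon variants of a randomization scheme are equally likely, so that the expected completeness of a library of L clones is 1 − (1 − 1/V)^L. This equiprobable model is a simplifying assumption made for illustration; it is not the exact procedure of GLUE-IT or TopLib. The function names are hypothetical, and the IUPAC degeneracy codes are restricted to those needed for the schemes listed in Table 1.1.

```python
import math

# IUPAC nucleotide codes needed for the schemes NNK, NDT, NNB, NAY and NNN.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "B": "CGT", "D": "AGT", "Y": "CT"}


def codon_count(scheme: str) -> int:
    """Number of distinct codons encoded by a degenerate codon such as 'NNK'."""
    count = 1
    for symbol in scheme:
        count *= len(IUPAC[symbol])
    return count


def expected_completeness(n_variants: int, library_size: int) -> float:
    """Expected fraction of variants present, assuming equiprobable variants."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** library_size


def clones_for_completeness(n_variants: int, target: float) -> int:
    """Smallest library size whose expected completeness reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / n_variants))


if __name__ == "__main__":
    for scheme in ("NNK", "NDT", "NNB", "NAY"):
        v = codon_count(scheme)
        print(scheme, v, clones_for_completeness(v, 0.95))
```

Randomizing several positions at once multiplies the number of variants (V = 32^n for n NNK positions), so the required library size grows exponentially with the number of sites.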


1.4. Evolutionary conservation based focused library

Multiple sequence or structure alignment (MSA) is the most common approach to identify functionally significant or evolutionarily variable regions in proteins.[34] In CAPDE, several servers and databases use MSA together with the physical and structural information of proteins or protein superfamilies. Table 1.2 contains a list of the tools considered in this chapter. The ConSurf 2010 [35] server provides the evolutionary conservation profiles of a protein or nucleic acid sequence or structure by first identifying the conserved positions using MSA and then calculating the evolutionary conservation rate using empirical Bayesian inference. The ConSurf-DB [36] database makes available the evolutionary conservation profiles of the available protein structures, pre-calculated by the ConSurf web server. The 3DM [37] server performs structure based multiple sequence alignments of the members of a protein superfamily and provides the consensus data combined with other useful information about amino acid positions in the protein, such as interactions, solvent accessibility and published mutation data.
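How an alignment yields a per-position conservation profile can be illustrated with a deliberately simple measure. The Python sketch below computes the Shannon entropy of every column of a toy alignment, where low entropy indicates a conserved position. This is a stand-in chosen for brevity and is not the method of ConSurf, which estimates evolutionary rates by empirical Bayesian inference on a phylogenetic tree. The sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0  # all-gap column: treated as 0.0 in this sketch
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment) -> list:
    """Entropy for every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["MKTAY", "MKSAY", "MRTAY", "MKTGY"]  # hypothetical
    for position, h in enumerate(conservation_profile(toy_alignment), start=1):
        label = "conserved" if h == 0.0 else "variable"
        print(position, round(h, 3), label)
```

In a focused library design, fully conserved positions would typically be left untouched, whereas variable positions are candidates for randomization.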

For a more focused analysis of protein hotspots or amino acid patches, three interesting tools are available as standalone programs or web servers. The Joint Evolutionary Tree (JET) method is tuned to identify conserved amino acid patches on protein interfaces by taking into account the physical-chemical properties and evolutionary conservation of the surface residues.[38] The predicted protein interaction sites or core residues might be used in site-specific mutagenesis experiments. The HotSprint [39] database provides information on the hotspots in protein interfaces using the sequence conservation score (calculated by the Rate4Site algorithm [40]) of the residues and their solvent accessible surface area. HotSpot Wizard predicts the suitability of amino acids in or near the active site for mutagenesis using their evolutionary conservation information.[41] The server takes a protein structure as input and provides a platform for experimentalists to select target amino acids for site-directed mutagenesis to improve enzymatic properties such as substrate specificity, activity and enantioselectivity.[41] MAP 2.0 3D [27] (Table 1.1, see previous section) also provides information on the mutagenic hotspots generated by the mutational preferences of random mutagenesis methods, together with sequence and structural information of the protein. The Selecton [42] web server predicts the selective forces at each amino acid position in a protein. The server performs a codon-based alignment on a set of homologous nucleotide sequences and uses the ratio of amino acid altering to silent substitutions (Ka/Ks) to estimate both positive (Ka/Ks > 1) and purifying (Ka/Ks < 1) selection.


Table 1.2

| Approach | Name | Description | Case study examples | URL |
|---|---|---|---|---|
| | ConSurf 2010 [35] | The web server performs MSA and calculates evolutionary conservation rate to identify conserved positions in protein or nucleotide sequence/structure. | GAL4 transcription factor [35] | http://consurf.tau.ac.il/ |
| | ConSurf DB [36] | The database provides the predicted results of ConSurf [35] server for known protein structures. | Cytochrome c [36] | http://consurfdb.tau.ac.il/index.php |
| Hotspot identification | JET [38] | The Evolutionary trace based method performs MSA on a set of homologous sequences (from PSI-BLAST) after Gibbs like sampling. The aligned homologous sequences are used to construct distance tree based on Neighbor Joining algorithm. The clustering method is parameterized to identify protein interface or core residues by taking into account the physical-chemical properties and evolutionary conservation. | DNA polymerase I, DNA transferase, allophycocyanin, Leucine dehydrogenase, β-trypsin proteinase, phosphotransferase, human CDC42 gene regulation protein, oncogene protein, signal transduction protein etc [38] | http://www.ihes.fr/~carbone/data6/legenda.htm |
| | HotSprint Database [39] | The database provides information about hotspots in protein interface using conservation rate and solvent accessibility of the residues. | Numb PTB domain [39] | http://prism.ccbb.ku.edu.tr/hotsprint/ |
| | HotSpot wizard [41] | The web server predicts residue mutability of functionally important residues and visualizes it on protein sequence and structure. | Haloalkane dehalogenase, Phosphotriesterase, 1,3-1,4-β-D-Glucan 4-glucanohydrolase, β-Lactamase [41] | http://loschmidt.chemi.muni.cz/hotspotwizard/ |
| | Selecton [42] | The web server detects selection forces on biologically significant sites in the target protein during evolutionary process. | TRIM5α protein [42] | http://selecton.tau.ac.il/index.html |
| Protein superfamily based MSA | 3DM [37] | The database performs structure based MSA for a protein superfamily with sequence, structural, molecular interaction and mutational information from literature. | α/β hydrolase fold [53] | http://3dmcsis.systemsbiology.nl/ |
| | The Lipase Engineering Database [43,54,55] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Lipases [43,54,55] | http://www.led.uni-stuttgart.de/ |
| | The database of Epoxide hydrolases and haloalkane dehalogenase [56] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Epoxide hydrolases and haloalkane dehalogenase [56] | http://www.led.uni-stuttgart.de/ |
| | The Laccase Engineering database [45] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Laccases [45] | http://www.lcced.uni-stuttgart.de/ |
| | The Cytochrome P450 engineering database [57] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Cytochrome P450s [57] | http://www.cyped.uni-stuttgart.de/ |
| | The PHA Depolymerase Engineering Database [44] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Polyhydroxyalkanoates depolymerase [44] | http://www.ded.uni-stuttgart.de/ |
| | The Lactamase Engineering database [46] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Lactamases [46] | http://www.laced.uni-stuttgart.de/ |
| | SHV Lactamase Engineering Database [47] | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | SHV lactamases [47] | http://www.laced.uni-stuttgart.de/classA/SHVED/ |
| Literature based protein mutant data | PMD [48] | The database provides literature based protein mutant information with structure and functional annotation. | | http://pmd.ddbj.nig.ac.jp/~pmd/pmd.html |
| | ProTherm [49-51] | The database provides literature based protein mutant information with thermodynamic parameters and experimental conditions integrated with sequence, structure and function annotation. | | http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html |
| | MuteinDB [52] | The database provides literature based protein mutant information, kinetic parameters and experimental conditions integrated with user-friendly and flexible query system to fetch data using reaction name or substrate or inhibitor name or structure and mutations. | Cytochrome P450s [52] | https://muteindb.genome.tugraz.at/muteindb-web-2.0/faces/init/index.seam |

1.5. Structure-based focused library

Structure based approaches assist rational design and random mutagenesis by predicting the regions of the protein responsible for stability and activity.[2,58] Computational tools such as 3DLigandSite [59], ProBiS [60,61] (Protein Binding Site) and SiteComp [62] predict ligand binding sites in proteins [63]. In the absence of a crystal structure, all these tools can use a homology model of the target protein, and they help to design and tune the ligand binding site by identifying key residues for activity and their molecular interaction properties. 3DLigandSite [59] performs alignment and clustering of homologous structures to predict the ligand binding site. ProBiS [60,61] uses MSA to detect structurally similar binding sites in proteins and also performs local structural pairwise alignment to identify functionally relevant binding regions. The pre-calculated results of the ProBiS analysis are available via the ProBiS-database [64], a repository of structurally similar binding sites. SiteComp [62] characterizes protein binding sites using descriptors based on molecular interaction fields. The server evaluates differences between similar binding sites, identifies subsites and estimates residue contributions to ligand binding. TRITON [65,66] provides protein engineers with a single platform to model mutants, perform protein-ligand docking and calculate reaction pathways. In this way, these methods facilitate the study of the properties of protein-ligand complexes.

Knowledge of the molecular interactions that contribute to the relevant free energy barrier, and the design of the surface charge distribution, can help to understand the molecular basis of kinetic stability and to efficiently enhance protein stability.[58,67] The PIC (Protein Interaction Calculator) server [68] calculates inter- or intra-protein interactions using published criteria integrated with solvent accessibility and residue depth calculations. A recently introduced web server, COCOMAPS (bioCOmplexes COntact MAPS) [69], uses intermolecular interactions to analyze interfaces in biological complexes. The identification of exposed and buried amino acids also helps to gain insight into protein stability and to explore mutational effects on the protein. DEPTH [70] employs distance information between residues and bulk solvent to predict protein stability, conservation or binding cavities based on information about residue depth and solvent accessibility. SRide [71] provides the contribution of individual residues to protein stability using the interactions, evolutionary conservation and hydrophobicity of their neighboring residues. Patch Finder Plus [72] identifies residues that contribute to positively charged patches on the protein surface and might interact with DNA, membranes or other proteins. ConPlex [73] uses the protein solvent accessible surface area to identify surface or interface residues and assigns residue-specific conservation scores on the sequence and structure of the protein complex. The server also provides pre-calculated ConPlex results for known protein complexes as a repository.
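The idea of a distance-based intermolecular contact map, as used for interface analysis, can be sketched in a few lines of Python. The example marks a pair of residues from two chains as in contact when their representative atoms are closer than a cutoff. The 8 Å cutoff, the use of one point per residue and the coordinates are illustrative assumptions; the code does not reproduce the implementation of COCOMAPS or PIC.

```python
import math


def contact_map(chain_a, chain_b, cutoff=8.0):
    """Boolean matrix: True where a residue pair is within `cutoff` (in Å)."""
    return [[math.dist(a, b) <= cutoff for b in chain_b] for a in chain_a]


def interface_residues(cmap):
    """Indices of residues in each chain that make at least one contact."""
    in_a = [i for i, row in enumerate(cmap) if any(row)]
    n_b = len(cmap[0]) if cmap else 0
    in_b = [j for j in range(n_b) if any(row[j] for row in cmap)]
    return in_a, in_b


if __name__ == "__main__":
    # Hypothetical C-alpha coordinates (Å) for two short chains.
    chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    chain_b = [(0.0, 6.0, 0.0), (20.0, 0.0, 0.0)]
    cmap = contact_map(chain_a, chain_b)
    print(cmap)
    print(interface_residues(cmap))
```

Residues identified in this way define the interface and are natural candidates when the aim is to modulate binding rather than catalysis.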

Recent studies have suggested that protein flexibility and protein function are strongly linked.[24,74,75] Protein flexibility plays an important role in both catalytic activity and molecular recognition processes. The effect of protein flexibility is particularly relevant in proteins from extremophiles, which must balance the rigidity required for stability with the flexibility necessary for activity [76-78]. In addition, numerous proteins have regions that adopt different conformations under different conditions, allowing them to take part in cellular and molecular regulation.[24,79] Residue flexibility has been taken into account to describe a variety of protein properties, including thermal stability, catalytic activity, ligand binding (induced fit), domain motion, preferential solvation and molecular recognition in intrinsically disordered protein systems. The Debye–Waller factor, reported in crystallographic atomic resolution structures, provides a rough estimate of local residue flexibility [80], and different servers provide this information as an indicator (for example, the MAP 2.0 3D server [27]). If a crystallographic structure is not available, different tools can be used to estimate flexibility profiles using different approaches.
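The use of crystallographic B-factors as a flexibility indicator can be illustrated with a minimal Python sketch. It converts Cα B-factors into z-scores and labels residues above a threshold as flexible (F) and the others as rigid (R). The z-score normalization and the threshold of 0 are assumptions made for this example; the exact normalization used by a given server may differ. The B-factor values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_bfactors(bfactors):
    """Z-score normalization of per-residue B-factors."""
    mu = mean(bfactors)
    sigma = pstdev(bfactors)
    if sigma == 0:
        return [0.0] * len(bfactors)  # identical values carry no information
    return [(b - mu) / sigma for b in bfactors]


def classify_flexibility(bfactors, threshold=0.0):
    """Label each residue 'F' (flexible) or 'R' (rigid) by its z-score."""
    return ["F" if z > threshold else "R" for z in normalize_bfactors(bfactors)]


if __name__ == "__main__":
    ca_bfactors = [10.0, 12.0, 30.0, 11.0, 45.0]  # hypothetical values in Å^2
    print(classify_flexibility(ca_bfactors))
```

Normalizing within one structure makes the labels comparable between crystal structures that differ in resolution and overall B-factor level.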

The RosettaBackrub [81] server can generate the structural variability of the protein backbone that results from amino acid variations [82]; this variability can be used to design sequence libraries for experimental screening and to predict protein or peptide interaction specificity. The server generates Rosetta-scored modeled structures for variants with single or multiple point mutations in monomeric proteins. It also generates near-native structural ensembles of protein backbone conformations and sequences consistent with those ensembles. Finally, it can predict sequences tolerated by proteins or protein interfaces using flexible backbone design methods. The tCONCOORD [83] method generates conformational ensembles to gain insight into the conformational flexibility and conformational space of the protein.

FlexPred [84] specifically predicts residue flexibility using a pattern recognition approach that identifies residue positions in conformational switches, with their evolutionary conservation and normalized solvent accessibility (if a structure is available) as the Support Vector Machine (SVM) predictors.

Different simplified methods have been proposed to identify local flexibility or large scale motions in proteins at the coarse-grained level.[85-87] Many of these methods are based on the Gaussian network model (GNM) [88] or its extension, the anisotropic network model (ANM) [89], to study protein dynamics using Normal Mode Analysis (NMA) (see the review [90] for a general overview of these topics). Table 1.3 shows the tools available to analyze conformational flexibility on the protein structure (for more details see [91]). The ElNemo [92] and WEBnm@ [93] servers are reported here to complete the information about NMA based tools. Both servers perform NMA using a coarse-grained model to analyze conformational changes in proteins. The FlexServ [94] server estimates protein flexibility using three different coarse-grained approaches: 1) discrete molecular dynamics (DMD), 2) normal mode analysis (NMA) and 3) Brownian dynamics (BD). The server characterizes protein flexibility by analyzing different structural and dynamic properties of the protein such as structural variations, essential modes, stiffness between interacting residues, dynamic domains and hinge points. Different tools are available to identify hinge bending residues in large-scale protein motions. The HINGEprot [95] server predicts hinge motions in proteins using coarse-grained GNM and ANM models. DynDom [96] uses a rigorous approach to describe domain motion. The method determines hinge axes and hinge bending residues using two conformations of the protein. A recent addition to DynDom is the database of ligand-induced domain movements in enzymes.[97] Furthermore, the DynDom3D [98] server provides a more advanced and generic tool that can be used to study any kind of polymer.
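For reference, the Gaussian network model mentioned above is usually formulated as follows. The symbols are the conventional ones from the GNM literature and are not defined in this chapter: $R_{ij}$ is the distance between the Cα atoms of residues $i$ and $j$, $r_c$ is a cutoff distance, $\gamma$ is a uniform spring constant and $\Gamma^{-1}$ denotes the pseudo-inverse of the Kirchhoff (connectivity) matrix.

```latex
\Gamma_{ij} =
\begin{cases}
-1, & i \neq j \ \text{and}\ R_{ij} \le r_c \\
\phantom{-}0, & i \neq j \ \text{and}\ R_{ij} > r_c \\
-\sum_{k \neq i} \Gamma_{ik}, & i = j
\end{cases}
\qquad
\langle \Delta R_i^2 \rangle = \frac{3 k_B T}{\gamma}\,\bigl[\Gamma^{-1}\bigr]_{ii},
\qquad
B_i = \frac{8 \pi^2}{3}\,\langle \Delta R_i^2 \rangle
```

The last relation links the predicted mean-square fluctuation of residue $i$ to its crystallographic B-factor, which is why GNM predictions are commonly compared with experimental B-factor profiles.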

The reader should note that the connection between protein flexibility and function has been investigated theoretically and experimentally only in the last few years.[87,99-101] The methods based on this approach provide a qualitative estimate of protein dynamical properties, but they do not take into account many effects (such as direct solvent effects) that are important for protein functionality. Until now, atomistic simulation (MD or QM/MM) has been the best approach to quantitatively study protein flexibility and dynamics.[8,87,99] Nevertheless, even at this level of accuracy, the connection between flexibility and functionality is still puzzling. In addition, simulation approaches are still time consuming and impractical for high-throughput modeling and analysis of protein structural dynamics.

Table 1.3: Computational tools for structure-based focused library generation.

| Approach | Name | Description | Case study examples | URL |
|---|---|---|---|---|
| Ligand binding site | 3DLigandSite [59] | The web server identifies ligand binding site via MSA and clustering algorithm. | Target T0483 in CASP8 | http://www.sbg.bio.ic.ac.uk/~3dligandsite/ |
| | ProBiS [60,61] | The web server detects binding site using MSA and characterizes it using local structural pairwise alignment. | Biotin carboxylase, TATA binding protein [60], D-alanine–D-alanine ligase, Protein kinases C [61] | http://probis.cmm.ki.si/ |
| | ProBiS-database [64] | The database provides structurally similar protein binding site using ProBiS algorithm. | Cytochrome c [64] | http://probis.cmm.ki.si/?what=database |
| | SiteComp [62] | The web server characterizes ligand binding site using molecular interaction descriptors. | Cyclooxygenase, adenylate kinase [62] | http://scbx.mssm.edu/sitecomp/sitecomp-web/Input.html |
| | TRITON [65,66] | The method facilitates to model mutant, dock ligand in the protein and calculates reaction pathways for the characterization of protein-ligand interactions using Semi-empirical quantum-mechanics approach. | PA-IIL lectin and its mutants [65] | http://www.ncbr.muni.cz/triton/description.html |
| Protein interaction | PIC [68] | The web server calculates the molecular interactions using published criteria. | - | http://pic.mbu.iisc.ernet.in/job.html |
| | COCOMAPS [69] | The web server analyzes and visualizes interfaces in biological complexes using intermolecular contact maps based on distance or physicochemical properties. | Hen egg lysozyme interaction with two antibodies [69] | https://www.molnac.unisa.it/BioTools/cocomaps/ |

The web server<br />

predicts binding<br />

cavity <strong>and</strong><br />

West Nile<br />

mutational effect on<br />

Virus<br />

http://mspc.bii.a-<br />

DEPTH [70]<br />

protein stability<br />

NS2B/NS3<br />

star.edu.sg/tankp/int<br />

Residue<br />

using residue depth<br />

protease<br />

ro.html<br />

depth <strong>and</strong><br />

<strong>and</strong> solvent<br />

[70]<br />

stability<br />

accessible surface<br />

area.<br />

SRIde [71]<br />

The web serve<br />

predicts the<br />

contribution <strong>of</strong><br />

residues in protein<br />

TIM-barrel<br />

proteins<br />

[102]<br />

http://sride.enzim.h<br />

u/<br />

20


PART I: CAPDE<br />

stability using<br />

interactions with its<br />

spatial neighbors<br />

<strong>and</strong> their<br />

evolutionary<br />

conservation.<br />

The web server<br />

identifies large<br />

positively charged<br />

DNA binding<br />

Patch finder<br />

plus [72]<br />

electrostatic patches<br />

on protein surface<br />

using Poisson<br />

domain <strong>of</strong><br />

TATA<br />

binding<br />

http://pfp.technion.a<br />

c.il/<br />

Protein<br />

Boltzmann<br />

protein [72]<br />

surface <strong>and</strong><br />

electrostatic<br />

interface<br />

potential.<br />

The web server<br />

performs<br />

Rho–<br />

ConPlex [73]<br />

evolutionary<br />

RhoGAP<br />

http://sbi.postech.ac.<br />

conservation<br />

complex<br />

kr/ConPlex/<br />

analysis <strong>of</strong> the<br />

[73]<br />

protein complex.<br />

The web server<br />

performs flexible<br />

https://kortemmelab<br />

Protein<br />

flexibility<br />

RosettaBackru<br />

b [81]<br />

backbone modeling<br />

using Backrub [103]<br />

method to design<br />

hGH-hGHr<br />

interface<br />

[104]<br />

.ucsf.edu/backrub/cg<br />

i-<br />

bin/rosettaweb.py?q<br />

tolerated protein<br />

uery=index<br />

sequences.<br />

tCONCOORD<br />

The method<br />

Osmoprotec<br />

http://wwwuser.gw<br />

[83]<br />

generates<br />

tion protein<br />

dg.de/~dseelig/tcon<br />

21


PART I: CAPDE<br />

FlexPred [84]<br />

ElNemo [92]<br />

WEBnm@<br />

[93]<br />

FlexServ [94]<br />

HINGEprot<br />

[95]<br />

conformation<br />

ensemble <strong>and</strong><br />

transitions using<br />

geometrical<br />

constrains based<br />

prediction <strong>of</strong><br />

protein<br />

conformational<br />

flexibility<br />

The web server<br />

predicts residue<br />

flexibility in the<br />

protein using SVM<br />

approach.<br />

The web server<br />

predicts large<br />

amplitude motions<br />

in the protein using<br />

NMA.<br />

The web server<br />

determines <strong>and</strong><br />

analyzes protein<br />

flexibility using<br />

coarse-grained<br />

modeling approach.<br />

The web server<br />

detects hinge region<br />

[83] coord.html<br />

Human PrP http://flexpred.rit.al<br />

[105] bany.edu/<br />

HIV-1<br />

protease, E. http://igs-<br />

coli server.cnrs-<br />

membrane<br />

mrs.fr/elnemo/index<br />

channel .html<br />

protein TolC<br />

Calcium http://apps.cbu.uib.n<br />

ATPase [93] o/webnma/home<br />

http://mmb.pcb.ub.e<br />

-<br />

s/FlexServ/input.ph<br />

p<br />

Calmodulin http://www.prc.bou<br />

protein, n.edu.tr/appserv/prc<br />

22


PART I: CAPDE<br />

in the protein using<br />

hemoglobin<br />

/hingeprot/<br />

both GNM <strong>and</strong> ANM.<br />

[95]<br />

The web server<br />

predicts domain<br />

Hemoglobin,<br />

DynDom3D<br />

motions using<br />

70S<br />

http://fizz.cmp.uea.a<br />

[98]<br />

conformational<br />

ribosome<br />

c.uk/dyndom/3D/<br />

changes in the<br />

[98]<br />

protein.<br />

1.6. Mutational effects in protein

For biotechnological applications, enhancing protein thermal stability or tolerance is a commonly requested task in protein engineering.[106] A highly stable structure correlates with a well-packed, highly compact structure and has increased tolerance to mutation, because most mutations are deleterious, i.e. they destabilize the protein.[107] Generally, the effect of a mutation on a protein is calculated from the free energy difference between two states of the protein; for thermodynamic stability, this is the change in the free energy difference between the folded and unfolded states (ΔΔG). The mutational effect has been predicted using different machine learning and selection methods (such as SVM, Decision Tree (DT) or Random Forest (RF) [108]) for classification or regression of data, or using statistical or empirical methods that take into account atomic interactions or structural properties such as solvent accessibility. Most of the servers based on these approaches use available information on mutational effects (fetched from databases such as PMD [48] and ProTherm [51]) to predict the effect of new substitutions. Table 1.4 summarizes the available tools to predict mutational effects on protein stability and activity using different methods. I-Mutant2.0 [109] and MUpro [110] are SVM-based methods that predict stabilizing or destabilizing amino acid substitutions based on the free energy change (ΔΔG). The iPTREE-STAB [111] server employs a DT approach to predict the effect of a single mutation on protein stability, considering physicochemical properties and contact information of the substituted amino acid with its neighboring amino acids. The WET-STAB [112] server performs a similar prediction, with an additional feature to predict protein stability changes upon double mutations from the amino acid sequence. ProMAYA [113] uses the RF machine learning algorithm to predict protein stability based on the free energy difference. MuD (Mutation detector) uses the same algorithm to classify amino acid substitutions as neutral or deleterious, taking into account structure- and sequence-based features such as solvent accessibility, binding site and sequence identity.[114] SDM (Site Directed Mutator) [115] and PopMuSic2.1 [116] are methods based on statistically derived force field potentials that predict protein stability using relative free energy differences. In PopMuSic2.1 [116], however, the parameters of the statistically derived force field potential depend on protein solvent accessibility. The FoldX plugin [117] and the PEAT-SA [118] program suite use an empirical force field to calculate, from three-dimensional protein or peptide structures, the relative free energy difference determined by the changes of interactions in the mutated structures.
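The ΔΔG quantity that several of these predictors report can be illustrated with a small sketch. It assumes the convention ΔΔG = ΔG_unfold(mutant) − ΔG_unfold(wild type), so that a negative value means a destabilizing mutation; sign conventions differ between tools, so this choice is an assumption of the example. The function names and the 0.5 neutral band are hypothetical and are not taken from any of the servers discussed.

```python
def ddg_unfolding(dg_unfold_wild_type, dg_unfold_mutant):
    """Return ddG = dG_unfold(mutant) - dG_unfold(wild type).

    Both inputs are unfolding free energies in the same unit (e.g. kcal/mol).
    Assumed convention: negative ddG means the mutant is less stable.
    """
    return dg_unfold_mutant - dg_unfold_wild_type


def classify(ddg, neutral_band=0.5):
    """Label a mutation from its ddG value.

    neutral_band is a hypothetical threshold chosen for illustration only.
    """
    if ddg > neutral_band:
        return "stabilizing"
    if ddg < -neutral_band:
        return "destabilizing"
    return "neutral"


if __name__ == "__main__":
    # Illustrative numbers, not experimental data.
    ddg = ddg_unfolding(dg_unfold_wild_type=5.0, dg_unfold_mutant=3.5)
    print(ddg, classify(ddg))
```

A regression-type predictor would estimate the ΔΔG value itself, whereas a classification-type predictor would only return the label.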

CUPSAT [119] estimates the effect of mutations on protein stability using protein environment-specific mean force potentials. The potentials are derived from statistical analysis of protein structure data sets. AUTO-MUTE [120,121] provides either energy-based or machine learning methods for predicting protein stability from a given protein structure, mutation and experimental condition. The SIFT (Sorts Intolerant From Tolerant) [122] server helps to explore the effect of a mutation on protein function using a sequence homology approach. The multiple alignment information is used to identify tolerated and deleterious substitutions in the query sequence.
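The idea of reading tolerated substitutions from a multiple alignment can be sketched in a few lines. This is a deliberately simplified frequency cutoff and not the scoring scheme of the SIFT server; the function names, the toy alignment and the `min_fraction` threshold are all assumptions made for illustration.

```python
from collections import Counter


def tolerated_residues(alignment, position, min_fraction=0.05):
    """Return residues seen at an alignment column with at least min_fraction frequency.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    position: zero-based column index.
    """
    column = [seq[position] for seq in alignment if seq[position] != "-"]
    if not column:
        return set()
    counts = Counter(column)
    total = len(column)
    return {aa for aa, n in counts.items() if n / total >= min_fraction}


def is_tolerated(alignment, position, new_residue, min_fraction=0.05):
    """True if new_residue is among the residues observed at that column."""
    return new_residue in tolerated_residues(alignment, position, min_fraction)


if __name__ == "__main__":
    # Toy alignment invented for this example.
    toy_alignment = ["MKVL", "MRVL", "MKIL", "MK-L"]
    print(tolerated_residues(toy_alignment, 1))
    print(is_tolerated(toy_alignment, 1, "D"))
```

A substitution to a residue never observed among the homologs is flagged as not tolerated, which mirrors the qualitative logic described above without reproducing any server's actual statistics.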

A quantitative in-silico screening of virtual libraries based on the cooperative effect of multiple mutations on stability and functionality is still out of reach. However, the current methods give a qualitative indication of possible mutation sites that can increase the chances of obtaining a higher population of stable and functionally active variants in the library. The available knowledge of mutational effects on proteins provided by all these CAPDE approaches helps to limit the library size and to focus on generating unpredictable substitutions that may lead to large effects. Libraries based on in-silico screening generally show a higher success rate when the starting protein has sufficient stability.
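The benefit of limiting library size can be made concrete with simple combinatorics. The sketch below assumes 20 canonical amino acids at each fully randomized site and counts protein-level variants only (codon redundancy is ignored); the function names and the per-site counts in the example are hypothetical.

```python
def library_size(n_sites, residues_per_site=20):
    """Number of protein variants when n_sites are fully randomized."""
    return residues_per_site ** n_sites


def restricted_library_size(allowed_per_site):
    """Number of variants when each site is limited to a chosen residue subset.

    allowed_per_site: list with the number of allowed residues at each site.
    """
    size = 1
    for n_allowed in allowed_per_site:
        size *= n_allowed
    return size


if __name__ == "__main__":
    print(library_size(3))                     # three fully randomized sites
    print(restricted_library_size([4, 6, 3]))  # same sites, pre-filtered subsets
```

Restricting three sites to a handful of predicted-tolerated residues each reduces the variant count by about two orders of magnitude in this toy case, which is the kind of reduction a focused library aims for.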

Table 1.4: Summary of the computational tools to analyze the mutational effect on protein stability and activity.

Approach | Name | Description | URL
SVM | I-Mutant2.0 [109] | The web server predicts protein stability change upon point mutation. | http://folding.uib.es/imutant/i-mutant2.0.html
 | MUpro [110] | The web server predicts protein stability change upon point mutation. | http://mupro.proteomics.ics.uci.edu/
Decision tree (DT) | iPTREE-STAB [111] | The web server predicts protein stability change with residues information. | http://210.60.98.19/IPTREEr/iptree.htm
 | WET-STAB [112] | The web server predicts protein stability change upon double mutation with residue information. | http://210.60.98.19/WETr/wet.htm
Random forests (RF) | ProMAYA [113] | The web server predicts mutational effect on protein function. | http://bental.tau.ac.il/ProMaya/
 | MuD [114] | The web server predicts mutational effect on protein function. | http://mud.tau.ac.il/
Statistical potential based method | SDM [115] | The web server predicts mutational effect on protein stability. | http://mordred.bioc.cam.ac.uk/sdm/sdm.php
 | PopMuSic2.1 [116] | The web server predicts thermodynamic stability change upon mutation. | http://babylone.ulb.ac.be/popmusic/
Empirical force field | FoldX [117] | The plugin predicts mutational effect on protein and facilitates in-silico alanine screening, mutant homology modeling and interaction energy calculation. | http://foldx.crg.es/
 | PEAT-SA [118] | The program suite predicts mutational effect on protein stability, ligand affinity and pKa values. | http://enzyme.ucd.ie/PEATSA/Pages/FrontPage.php
 | CUPSAT [119] | The web server predicts mutational effect on protein stability. | http://cupsat.tu-bs.de/
RF, SVM, Tree and SVM regression | AUTO-MUTE [121] | The web server predicts mutational effect on protein stability and activity (up to 19 mutations). | http://proteins.gmu.edu/automute/
Evolutionary conservation | SIFT [122] | The web server predicts mutational effect on protein function. | http://sift.jcvi.org/

1.7. Summary and outlook

In this chapter, the recent additions to the CAPDE arsenal of computational tools, servers and databases have been briefly reviewed. The rapid accumulation of knowledge on protein structures and sequence-structure-function relationships foreshadows the continuous improvement of these methods. In particular, machine-learning approaches, in which the volume of data is the heuristic key to access the hidden knowledge, and statistically based force fields for coarse-grained approaches will surely benefit from this trend. These approaches are not only convenient aids to support lab experiments but also a workbench for heuristically blueprinting novel molecules. In addition, the availability of low-cost, high-performance computers will soon transform currently expensive physically based simulations into convenient and very accurate high-throughput computational tools. This will make it possible to predict the structural stability and folds of small or medium-sized proteins and will open a new working paradigm in protein engineering. Moreover, the physically based approach has already shown promising results in understanding enzyme activity.[123,124]

1.8. References

1. Bornscheuer UT, Huisman GW, Kazlauskas RJ, Lutz S, Moore JC, et al. (2012) Engineering the third wave of biocatalysis. Nature 485: 185-194.
2. Lutz S (2010) Beyond directed evolution-semi-rational protein engineering and design. Curr Opin Biotech 21: 734-743.
3. Gerlt JA, Babbitt PC (2009) Enzyme (re)design: lessons from natural evolution and computation. Curr Opin Chem Biol 13: 10-18.
4. Jackel C, Kast P, Hilvert D (2008) Protein design by directed evolution. Annu Rev Biophys 37: 153-173.
5. Damborsky J, Brezovsky J (2009) Computational tools for designing and engineering biocatalysts. Curr Opin Chem Biol 13: 26-34.
6. Suarez M, Jaramillo A (2009) Challenges in the computational design of proteins. J R Soc Interface 6 (Suppl 4): S477–S491.
7. Pantazes RJ, Grisewood MJ, Maranas CD (2011) Recent advances in computational protein design. Curr Opin Struct Biol 21: 467-472.
8. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE (2012) Biomolecular simulation: a computational microscope for molecular biology. Annu Rev Biophys 41: 429-452.
9. Lee EH, Hsin J, Sotomayor M, Comellas G, Schulten K (2009) Discovery through the computational microscope. Structure 17: 1295-1306.
10. Schlick T, Collepardo-Guevara R, Halvorsen LA, Jung S, Xiao X (2011) Biomolecular modeling and simulation: a field coming of age. Q Rev Biophys 44: 191-228.
11. McGeagh JD, Ranaghan KE, Mulholland AJ (2011) Protein dynamics and enzyme catalysis: Insights from simulations. BBA-Proteins Proteom 1814: 1077-1092.
12. Klepeis JL, Lindorff-Larsen K, Dror RO, Shaw DE (2009) Long-timescale molecular dynamics simulations of protein structure and function. Curr Opin Struct Biol 19: 120-127.
13. Barrozo A, Borstnar R, Marloie Gl, Kamerlin SCL (2012) Computational protein engineering: bridging the gap between rational design and laboratory evolution. Int J Mol Sci 13: 12428-12460.
14. Frushicheva MP, Cao J, Warshel A (2011) Challenges and advances in validating enzyme design proposals: the case of kemp eliminase catalysis. Biochemistry 50: 3849-3858.
15. Frushicheva MP, Warshel A (2012) Towards quantitative computer-aided studies of enzymatic enantioselectivity: the case of Candida antarctica lipase A. Chembiochem 13: 215-223.
16. van der Kamp MW, Mulholland AJ (2008) Computational enzymology: insight into biological catalysts from modelling. Nat Prod Rep 25: 1001-1014.
17. Turner NJ (2009) Directed evolution drives the next generation of biocatalysts. Nat Chem Biol 5: 567-573.
18. Arnold FH, Moore JC (1997) Optimizing industrial enzymes by directed evolution. Adv Biochem Eng Biotechnol 58: 1-14.
19. Tracewell CA, Arnold FH (2009) Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Curr Opin Chem Biol 13: 3-9.
20. Wong TS, Roccatano D, Schwaneberg U (2007) Steering directed protein evolution: strategies to manage combinatorial complexity of mutant libraries. Environ Microbiol 9: 2645-2659.
21. Chica RA, Doucet N, Pelletier JN (2005) Semi-rational approaches to engineering enzyme activity: combining the benefits of directed evolution and rational design. Curr Opin Biotech 16: 378-384.
22. Kazlauskas RJ, Bornscheuer UT (2009) Finding better protein engineering strategies. Nat Chem Biol 5: 526-529.
23. Romero PA, Arnold FH (2009) Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 10: 866-876.
24. Tokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 19: 596-604.
25. Wong TS, Roccatano D, Zacharias M, Schwaneberg U (2006) A statistical analysis of random mutagenesis methods used for directed protein evolution. J Mol Biol 355: 858-871.
26. Shivange AV, Marienhagen J, Mundhada H, Schenk A, Schwaneberg U (2009) Advances in generating functional diversity for directed protein evolution. Curr Opin Chem Biol 13: 19-25.
27. Verma R, Schwaneberg U, Roccatano D (2012) MAP2.03D: a sequence/structure based server for protein engineering. ACS Synth Biol 1: 139-150.
28. Firth AE, Patrick WM (2008) GLUE-IT and PEDEL-AA: new programmes for analyzing protein diversity in randomized libraries. Nucleic Acids Res 36: W281-W285.
29. Patrick WM, Matsumura I (2008) A study in molecular contingency: glutamine phosphoribosylpyrophosphate amidotransferase is a promiscuous and evolvable phosphoribosylanthranilate isomerase. J Mol Biol 377: 323-336.
30. Nov Y (2011) When second best is good enough: another probabilistic look at saturation mutagenesis. Appl Environ Microbiol 78: 258-262.
31. Rasila TS, Pajunen MI, Savilahti H (2009) Critical evaluation of random mutagenesis by error-prone polymerase chain reaction protocols, Escherichia coli mutator strain, and hydroxylamine treatment. Anal Biochem 388: 71-80.
32. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser H-G, et al. (2012) dRTP and dPTP a complementary nucleotide couple for the Sequence Saturation Mutagenesis (SeSaM) method. J Mol Catal B-Enzym 84: 40-47.
33. Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
34. Pei J (2008) Multiple protein sequence alignment. Curr Opin Struct Biol 18: 382-386.
35. Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N (2010) ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res 38: W529-533.
36. Goldenberg O, Erez E, Nimrod G, Ben-Tal N (2009) The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures. Nucleic Acids Res 37: D323-D327.
37. Kuipers RK, Joosten H-J, van Berkel WJH, Leferink NGH, Rooijen E, et al. (2010) 3DM: Systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins 78: 2101-2113.
38. Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A (2009) Joint Evolutionary Trees: a large-scale method to predict protein interfaces based on sequence sampling. PLoS Comput Biol 5: e1000267.
39. Guney E, Tuncbag N, Keskin O, Gursoy A (2008) HotSprint: database of computational hot spots in protein interfaces. Nucleic Acids Res 36: D662-D666.
40. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18: S71-S77.
41. Pavelka A, Chovancova E, Damborsky J (2009) HotSpot Wizard: a web server for identification of hot spots in protein engineering. Nucleic Acids Res 37: W376-W383.
42. Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, et al. (2007) Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res 35: W506-W511.
43. Pleiss Jr, Fischer M, Peiker M, Thiele C, Rolf D (2000) Lipase Engineering Database: understanding and exploiting sequence-structure-function relationships. J Mol Catal B-Enzym 10: 491-508.
44. Knoll M, Hamm TM, Wagner F, Martinez V, Pleiss J (2009) The PHA Depolymerase Engineering Database: a systematic analysis tool for the diverse family of polyhydroxyalkanoate (PHA) depolymerases. BMC Bioinformatics 10: 89.
45. Sirim D, Wagner F, Wang L, Schmid RD, Pleiss J (2010) The Laccase Engineering Database: a classification and analysis system for laccases and related multicopper oxidases. Database 2011: bar006.
46. Thai QK, Bos F, Pleiss J (2009) The Lactamase Engineering Database: a critical survey of TEM sequences in public databases. BMC Genomics 10: 390.
47. Thai QK, Pleiss J (2010) SHV Lactamase Engineering Database: a reconciliation tool for SHV beta-lactamases in public databases. BMC Genomics 11: 563.
48. Kawabata T, Ota M, Nishikawa K (1999) The Protein Mutant Database. Nucleic Acids Res 27: 355-357.
49. Gromiha MM, Uedaira H, An J, Selvaraj S, Prabakaran P, et al. (2002) ProTherm, thermodynamic database for proteins and mutants: developments in version 3.0. Nucleic Acids Res 30: 301-302.
50. Gromiha MM, An J, Kono H, Oobatake M, Uedaira H, et al. (2000) ProTherm, version 2.0: thermodynamic database for proteins and mutants. Nucleic Acids Res 28: 283-285.
51. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A (2004) ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Res 32: D120-121.
52. Braun A, Halwachs B, Geier M, Weinhandl K, Guggemos M, et al. (2012) MuteinDB: the mutein database linking substrates, products and enzymatic reactions directly with genetic variants of enzymes. Database 2012.
53. Kourist R, Jochens H, Bartsch S, Kuipers R, Padhi SK, et al. (2010) The alpha/beta-hydrolase fold 3DM database (ABHDB) as a tool for protein engineering. Chembiochem 11: 1635-1643.
54. Fischer M, Pleiss J (2003) The Lipase Engineering Database: a navigation and analysis tool for protein families. Nucleic Acids Res 31: 319-321.
55. Widmann M, Juhl PB, Pleiss J (2010) Structural classification by the Lipase Engineering Database: a case study of Candida antarctica lipase A. BMC Genomics 11: 123.
56. Barth S, Fischer M, Schmid RD, Pleiss J (2004) The database of epoxide hydrolases and haloalkane dehalogenases: one structure, many functions. Bioinformatics 20: 2845-2847.
57. Sirim D, Wagner F, Lisitsa A, Pleiss J (2009) The cytochrome P450 engineering database: Integration of biochemical properties. BMC Biochem 10: 27.
58. Gong S, Worth CL, Bickerton GR, Lee S, Tanramluk D, et al. (2009) Structural and functional restraints in the evolution of protein families and superfamilies. Biochem Soc Trans 37: 727-733.
59. Wass MN, Kelley LA, Sternberg MJ (2010) 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 38: W469-473.
60. Konc J, Janezic D (2010) ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics 26: 1160-1168.
61. Konc J, Janezic D (2012) ProBiS-2012: web server and web services for detection of structurally similar binding sites in proteins. Nucleic Acids Res 40: W214-221.
62. Lin Y, Yoo S, Sanchez R (2012) SiteComp: a server for ligand binding site analysis in protein structures. Bioinformatics.
63. Liang J, Tseng YY, Dundas J, Binkowski TA, Joachimiak A, et al. (2008) Predicting and characterizing protein functions through matching geometric and evolutionary patterns of binding surfaces. Adv Protein Chem Struct Biol 75: 107-141.
64. Konc J, Cesnik T, Konc JT, Penca M, Janezic D (2012) ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. J Chem Inf Model 52: 604-612.
65. Prokop M, Damborsky J, Koca J (2000) TRITON: in silico construction of protein mutants and prediction of their activities. Bioinformatics 16: 845-846.
66. Prokop M, Adam J, Kriz Z, Wimmerova M, Koca J (2008) TRITON: a graphical tool for ligand-binding protein engineering. Bioinformatics 24: 1955-1956.
67. Sanchez-Ruiz JM (2010) Protein kinetic stability. Biophys Chem 148: 1-15.
68. Tina KG, Bhadra R, Srinivasan N (2007) PIC: Protein Interactions Calculator. Nucleic Acids Res 35: W473-476.
69. Vangone A, Spinelli R, Scarano V, Cavallo L, Oliva R (2011) COCOMAPS: a web application to analyse and visualize contacts at the interface of biomolecular complexes. Bioinformatics.
70. Tan KP, Varadarajan R, Madhusudhan MS (2011) DEPTH: a web server to compute depth and predict small-molecule binding cavities in proteins. Nucleic Acids Res.
71. Magyar C, Gromiha MM, Pujadas G, Tusnady GE, Simon I (2005) SRide: a server for identifying stabilizing residues in proteins. Nucleic Acids Res 33: W303-305.
72. Shazman S, Celniker G, Haber O, Glaser F, Mandel-Gutfreund Y (2007) Patch Finder Plus (PFplus): a web server for extracting and displaying positive electrostatic patches on protein surfaces. Nucleic Acids Res 35: W526-W530.
73. Choi YS, Han SK, Kim J, Yang J-S, Jeon J, et al. (2010) ConPlex: a server for the evolutionary conservation analysis of protein complex structures. Nucleic Acids Res 38: W450-W456.
74. Teilum K, Olsen JG, Kragelund BB (2011) Protein stability, flexibility and function. Biochim Biophys Acta 1814: 969-976.
75. Teilum K, Olsen JG, Kragelund BB (2009) Functional aspects of protein flexibility. Cell Mol Life Sci 66: 2231-2247.
76. Henzler-Wildman K, Kern D (2007) Dynamic personalities of proteins. Nature 450: 964-972.
77. Mittermaier AK, Kay LE (2009) Observing biological dynamics at atomic resolution using NMR. Trends Biochem Sci 34: 601-611.
78. Martinez R, Schwaneberg U, Roccatano D (2011) Temperature effects on structure and dynamics of the psychrophilic protease subtilisin S41 and its thermostable mutants in solution. Protein Eng Des Sel 24: 533-544.
79. Ma B, Nussinov R (2010) Enzyme dynamics point to stepwise conformational selection in catalysis. Curr Opin Chem Biol 14: 652-659.
80. Zhang H, Zhang T, Chen K, Shen SY, Ruan JS, et al. (2009) On the relation between residue flexibility and local solvent accessibility in proteins. Proteins 76: 617-636.
81. Lauck F, Smith CA, Friedland GF, Humphris EL, Kortemme T (2010) RosettaBackrub-a web server for flexible backbone protein structure modeling and design. Nucleic Acids Res 38: W569-W575.
82. Mandell DJ, Kortemme T (2009) Backbone flexibility in computational protein design. Curr Opin Biotech 20: 420-428.
83. Seeliger D, Haas Jr, de Groot BL (2007) Geometry-based sampling of conformational transitions in proteins. Structure 15: 1482-1492.
84. Kuznetsov IB, McDuffie M (2008) FlexPred: a web-server for predicting residue positions involved in conformational switches in proteins. Bioinformation 3: 134-136.
85. Bahar I, Lezon TR, Yang L-W, Eyal E (2010) Global dynamics of proteins: bridging between structure and function. Ann Rev Biophys 39: 23-42.
86. Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology. Curr Opin Struct Biol 15: 586-592.
87. Kamerlin SCL, Vicatos S, Dryga A, Warshel A (2011) Coarse-grained (multiscale) simulations in studies of biophysical and chemical systems. Annu Rev Phys Chem 62: 41-64.
88. Bahar I, Atilgan AR, Erman B (1997) Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des 2: 173-181.
89. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, et al. (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J 80: 505-515.
90. Skjaerven L, Hollup SM, Reuter N (2009) Normal mode analysis for proteins. J Mol Struc-Theochem 898: 42-48.
91. Liu X, Karimi HA (2007) High-throughput modeling and analysis of protein structural dynamics. Brief Bioinform 8: 432-445.
92. Suhre K, Sanejouand Y-H (2004) ElNemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res 32: W610-W614.
93. Hollup S, Salensminde G, Reuter N (2005) WEBnm@: a web application for normal mode analyses of proteins. BMC Bioinformatics 6: 52.
94. Camps J, Carrillo O, Emperador A, Orellana L, Hospital A, et al. (2009) FlexServ: an integrated tool for the analysis of protein flexibility. Bioinformatics 25: 1709-1710.
95. Emekli U, Schneidman-Duhovny D, Wolfson HJ, Nussinov R, Haliloglu T (2008)<br />

HingeProt: automated prediction <strong>of</strong> hinges in protein structures. Proteins 70: 1219-<br />

1227.<br />

96. Hayward S, Berendsen HJC (1998) Systematic analysis <strong>of</strong> domain motions in<br />

proteins from conformational change: New results on citrate synthase <strong>and</strong> T4<br />

lysozyme. Proteins 30: 144-154.<br />

97. Qi GY, Hayward S (2009) Database <strong>of</strong> lig<strong>and</strong>-induced domain movements in<br />

enzymes. BMC Struct Biol 9.<br />

98. Poornam GP, Matsumoto A, Ishida H, Hayward S (2009) A method for the analysis <strong>of</strong><br />

domain movements in large biomolecular complexes. Proteins 76: 201-212.<br />

99. Glowacki DR, Harvey JN, Mulholl<strong>and</strong> AJ (2012) Taking Ockham's razor to enzyme<br />

dynamics <strong>and</strong> catalysis. Nature Chemistry 4: 169-176.<br />

100. Pisliakov AV, Cao J, Kamerlin SCL, Warshel A (2009) Enzyme millisecond<br />

conformational dynamics do not catalyze the chemical step. Proc Natl Acad Sci USA<br />

106: 17359-17364.<br />

101. Roca M, Vardi-Kilshtain A, Warshel A (2009) Toward accurate screening in<br />

computer-aided enzyme design. Biochemistry 48: 3046-3056.<br />

102. Gromiha MM, Pujadas G, Magyar C, Selvaraj S, Simon I (2004) Locating the<br />

stabilizing residues in (α/β)8 barrel proteins based on hydrophobicity, long-range<br />

interactions, <strong>and</strong> sequence conservation. Proteins 55: 316-329.<br />

103. Davis IW, Arendall WB, 3rd, Richardson DC, Richardson JS (2006) The backrub<br />

motion: how protein backbone shrugs when a sidechain dances. Structure 14: 265-<br />

274.<br />

104. Humphris EL, Kortemme T (2008) Prediction <strong>of</strong> Protein-Protein Interface Sequence<br />

Diversity Using Flexible Backbone Computational Protein Design. Structure 16:<br />

1777-1788.<br />

105. Kuznetsov IB (2008) Ordered conformational change in the protein backbone:<br />

prediction <strong>of</strong> conformationally variable positions from sequence <strong>and</strong> low-resolution<br />

structural data. Proteins 72: 74-87.<br />

106. Bloom JD, Arnold FH (2009) In the light <strong>of</strong> directed evolution: pathways <strong>of</strong> adaptive<br />

protein evolution. Proc Natl Acad Sci USA 106 Suppl 1: 9995-10000.<br />

35


PART I: CAPDE<br />

107. Tokuriki N, Stricher F, Serrano L, Tawfik DS (2008) How protein stability <strong>and</strong> new<br />

functions trade <strong>of</strong>f. PLoS Comput Biol 4: e1000002.<br />

108. Saeys Y, Inza I, Larranaga P (2007) A review <strong>of</strong> feature selection techniques in<br />

bioinformatics. Bioinformatics 23: 2507-2517.<br />

109. Capriotti E, Fariselli P, Casadio R (2005) I-Mutant2.0: predicting stability changes<br />

upon mutation from the protein sequence or structure. Nucleic Acids Res 33: W306-<br />

310.<br />

110. Cheng J, R<strong>and</strong>all A, Baldi P (2006) Prediction <strong>of</strong> protein stability changes for singlesite<br />

mutations using support vector machines. Proteins 62: 1125-1132.<br />

111. Huang LT, Gromiha MM, Ho SY (2007) iPTREE-STAB: interpretable decision tree<br />

based method for predicting protein stability changes upon mutations.<br />

Bioinformatics 23: 1292-1293.<br />

112. Huang LT, Gromiha MM (2009) Reliable prediction <strong>of</strong> protein thermostability<br />

change upon double mutation from amino acid sequence. Bioinformatics 25: 2181-<br />

2187.<br />

113. Wainreb G, Wolf L, Ashkenazy H, Dehouck Y, Ben-Tal N (2011) Protein stability: a<br />

single recorded mutation aids in predicting the effects <strong>of</strong> other mutations in the<br />

same amino acid site. Bioinformatics 27: 3286-3292.<br />

114. Wainreb G, Ashkenazy H, Bromberg Y, Starovolsky-Shitrit A, Haliloglu T, et al.<br />

(2010) MuD: an interactive web server for the prediction <strong>of</strong> non-neutral<br />

substitutions using protein structural data. Nucleic Acids Res 38: W523-W528.<br />

115. Worth CL, Preissner R, Blundell TL (2011) SDM:a server for predicting effects <strong>of</strong><br />

mutations on protein stability <strong>and</strong> malfunction. Nucleic Acids Res 39: W215-W222.<br />

116. Dehouck Y, Kwasigroch J, Gilis D, Rooman M (2011) PoPMuSiC 2.1: a web server for<br />

the estimation <strong>of</strong> protein stability changes upon mutation <strong>and</strong> sequence optimality.<br />

BMC Bioinformatics 12: 151.<br />

117. Van Durme J, Delgado J, Stricher F, Serrano L, Schymkowitz J, et al. (2011) A<br />

graphical interface for the FoldX forcefield. Bioinformatics 27: 1711-1712.<br />

118. Johnston MA, Søndergaard CR, Nielsen JE (2011) Integrated prediction <strong>of</strong> the effect<br />

<strong>of</strong> mutations on multiple protein characteristics. Proteins 79: 165-178.<br />

36


PART I: CAPDE<br />

119. Parthiban V, Gromiha MM, Schomburg D (2006) CUPSAT: prediction <strong>of</strong> protein<br />

stability upon point mutations. Nucleic Acids Res 34: W239-242.<br />

120. Masso M, Vaisman II (2008) Accurate prediction <strong>of</strong> stability changes in protein<br />

mutants by combining machine learning with structure based computational<br />

mutagenesis. Bioinformatics 24: 2002-2009.<br />

121. Masso M, Vaisman II (2010) AUTO-MUTE: web-based tools for predicting stability<br />

changes in proteins due to single amino acid replacements. Protein Eng Des Sel 23:<br />

683-687.<br />

122. Kumar P, Henik<strong>of</strong>f S, Ng PC (2009) Predicting the effects <strong>of</strong> coding non-synonymous<br />

variants on protein function using the SIFT algorithm. Nat Protoc 4: 1073-1081.<br />

123. Adamczyk AJ, Cao J, Kamerlin SC, Warshel A (2011) Catalysis by dihydr<strong>of</strong>olate<br />

reductase <strong>and</strong> other enzymes arises from electrostatic preorganization, not<br />

conformational motions. Proc Natl Acad Sci USA 108: 14115-14120.<br />

124. Ishikita H, Warshel A (2008) Predicting drug-resistant mutations <strong>of</strong> HIV protease.<br />

Angew Chem Int Edit 47: 697-700.<br />

Part of this chapter is adapted with permission from ‘Verma R, Schwaneberg U, Roccatano D. Computational and Structural Biotechnology Journal 2012, 2 (3), e201209008.’



PART I: MAP 2.0 3D<br />

Chapter 2

MAP 2.0 3D: A Sequence/Structure Based Server for Protein Engineering

2.1. Abstract

The Mutagenesis Assistant Program (MAP) is a web-based tool that provides statistical analyses of how the mutational biases of directed evolution experiments shape amino acid substitution patterns. MAP analysis assists protein engineers in benchmarking random mutagenesis methods that generate a single nucleotide mutation in a codon. Herein, we describe a completely renewed and improved version of the MAP server, named MAP 2.0 3D, which correlates the generated amino acid substitution patterns with the structural information of the target protein. This correlation helps in selecting a more suitable random mutagenesis method with specific biases in its amino acid substitution patterns. In particular, the new server displays the MAP indicators on the secondary and tertiary structure and correlates them with specific structural components such as hydrogen bonds, hydrophobic contacts, salt bridges, solvent accessibility and crystallographic B-factors. Three model proteins (D-amino acid oxidase, phytase and N-acetylneuraminic acid aldolase) are used to illustrate the novel capabilities of the server. The MAP 2.0 3D server is publicly available at http://map.jacobs-university.de/map3d.html.


2.2. Introduction

Over the past two decades, directed protein evolution has proven to be a powerful approach for tailoring protein properties through iterative rounds of random mutagenesis and screening for improved protein variants.[1,2] Directed evolution methods are especially useful for improving properties that are difficult to rationalize and, hence, for identifying amino acids and protein regions that can guide further enhancements using site-directed and saturation mutagenesis methods.[3,4] The success of a directed evolution campaign depends strongly on the quality of the mutant library and on the random mutagenesis method employed. Random mutagenesis methods are based on specific error-prone polymerases (enzymatic methods), DNA-modifying chemicals (e.g. nitrous acid) or mutator strains (e.g. Escherichia coli mutA).[5] The quality of a mutant library is determined by the generated genetic diversity and the corresponding protein sequence space.[6] Since the number of muteins grows steeply with the number of amino acids exchanged in the protein, protein engineers are confronted with an astronomically vast sequence space.[7] Despite advances in high-throughput screening, it is very difficult to screen the theoretically generated diversity even for a small protein.[8,9] Therefore, generating high-quality mutant libraries enriched in functional variants is of high importance. To access and screen such a large sequence space, protein engineers usually adopt one of two strategies.[10-12] The first consists of random mutagenesis of the target protein and the subsequent identification of ‘mutagenic hot spots’. Random mutagenesis can be followed by recombination of the best variants by site-directed mutagenesis or saturation mutagenesis.[13] The second involves the identification of a subset of specific residues using rational or semi-rational design with the help of computational tools.[14] Up to five amino acid positions can be efficiently targeted with focused mutagenesis methods, yielding focused mutant libraries whose number of variants can be screened with state-of-the-art flow cytometry methods.[13] Focused mutagenesis is normally employed to improve properties of the target protein such as activity or selectivity by mutating residues in close proximity to a specific protein region like the active site. In this case, random mutagenesis methods are complementary to rational design since they can identify important amino acid positions, especially in the second and third coordination spheres, which would have been overlooked by rational analysis. Nevertheless, random mutagenesis methods are biased toward certain nucleotide exchanges (e.g. many epPCR methods prefer transition mutations). The mutagenic preferences resulting from biased random mutagenesis methods affect the generated diversity. Analyzing the effects of mutational bias on amino acid diversity provides a useful indicator for selecting mutagenesis methods with diverse and complementary amino acid substitution patterns. The resulting complementary mutant libraries extend the sampling of the vast protein sequence space and enhance the chance of obtaining improved variants.[15,16]

Recently, we introduced a freely available web-based statistical analysis tool (MAP[17]). The server statistically analyzes the effects of the mutational bias of 19 different random mutagenesis methods at the level of amino acid substitutions for a given nucleotide sequence of the target protein. The analysis is returned in terms of MAP indicators that allow a rapid comparison of different random mutagenesis methods at the protein level. It has been shown that this approach can be used to predict the type, extent and chemical nature of the genetic diversity generated by different mutagenesis methods.[17,18] Recently, Rasila and co-workers[19] reported a comparative evaluation of commonly used random mutagenesis methods. They found the experimentally induced substitution patterns to be very similar to those obtained with the MAP server and suggested combining mutagenesis methods to generate high diversity.[19]

One of the limitations of the original MAP server is the absence of analysis tools relating the MAP indicators to the structural properties of the target protein. The nature of an amino acid change in different regions of the protein can affect its global and local structural and thermodynamic properties.[14,20,21] Therefore, the possibility of correlating the generated diversity with structural properties helps to identify in advance the random mutagenesis method that produces the fewest mutations “deleterious” to protein stability and has the highest probability of introducing amino acid substitutions that may improve the fitness toward a desired property, e.g. substitutions to charged amino acid residues to increase solubility in water. For this reason, we have expanded the capabilities of the server by introducing these new features. The new server (MAP 2.0 3D) can correlate the mutational propensity at the amino acid level of a gene for 19 random mutagenesis methods (and now also for a user-customized random mutagenesis method) with the crystallographic or homology-modeled structure (if available in Protein Data Bank[22] format) of the target protein. MAP 2.0 3D analyzes the three-dimensional structure of the target protein by calculating secondary structure elements, important local interactions (such as hydrogen bonds, hydrophobic contacts, salt bridges and disulphide bridges), solvent accessibility, and amino acid mobilities from the crystallographic B-factors. This combined information helps to identify biased amino acid substitutions that may improve the stability and function of the protein.[23-25]

To correlate the sequence-based analysis with the structural data analysis, a new indicator has been introduced: the residue mutability indicator ‘µ’, the probability that a substitution leads to an amino acid change at a specific position (see Methods). The mutability indicator allows rapid identification of mutagenic hot spots and easier comparison of experimental data with predicted ones.

This chapter illustrates the new features of the MAP 2.0 3D server in detail by performing the analysis on three model proteins. The results of the MAP 2.0 3D analysis are compared with the results of protein engineering experiments reported in the literature. The three examples show possible uses of the server for computational pre-screening of a target protein in order to evaluate and select a mutagenesis method for directed evolution.

2.3. Methods

2.3.1. Mutational probability and statistics


The MAP 2.0 3D server performs a statistical analysis of a given nucleotide sequence based on the mutational spectra of different random mutagenesis methods, which were preprocessed as follows before being used in the analysis.[17] First, insertions and deletions (occurrence frequencies between 0.80 % and 13.9 %) were neglected and the remaining nucleotide substitution frequencies were scaled proportionally to 100 %. Second, mutations in the upper and lower DNA strands were considered to occur with equal frequency. The scaled mutational frequencies are used to calculate the probability of amino acid substitutions resulting from one nucleotide exchange in one codon of the gene. The analysis is performed as follows. Consider a gene coding for a protein of L amino acids. For each nucleotide of a codon (denoted X, Y, Z) in the gene sequence, the corresponding single nucleotide substitutions X´, Y´, Z´ (with X, Y, Z, X´, Y´, Z´ ∈ {A, T, G, C} and X´ ≠ X, Y´ ≠ Y, Z´ ≠ Z) are considered. For each of the 19 random mutagenesis methods, matrix P (equation 2.1) contains the 16 mutational probability values for the transformation of a given nucleotide into each of the four nucleotides; the off-diagonal entries describe its substitution by one of the other three (e.g. X → X´). The values of matrix P were reported in Table 1 of our previous publication.[17]

$$
P = \begin{pmatrix}
A \to A & A \to T & A \to G & A \to C \\
T \to A & T \to T & T \to G & T \to C \\
G \to A & G \to T & G \to G & G \to C \\
C \to A & C \to T & C \to G & C \to C
\end{pmatrix}
\tag{2.1}
$$
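The first preprocessing step described above (neglecting insertions and deletions and rescaling the remaining substitution frequencies to 100 %) can be sketched as follows. This is only an illustration: the function name `rescale_spectrum`, the event labels and the numbers in `raw` are hypothetical and are not taken from the server or from Table 1 of reference [17].

```python
def rescale_spectrum(raw_frequencies):
    """Drop indel entries and rescale the remaining frequencies to sum to 100 %."""
    substitutions = {
        event: freq
        for event, freq in raw_frequencies.items()
        if event not in ("insertion", "deletion")
    }
    total = sum(substitutions.values())
    return {event: 100.0 * freq / total for event, freq in substitutions.items()}


# Hypothetical raw spectrum in percent (illustrative numbers, not measured data).
raw = {"A>G": 40.0, "A>T": 30.0, "G>A": 20.0, "deletion": 10.0}
scaled = rescale_spectrum(raw)
print(scaled)
```

After rescaling, the three substitution frequencies keep their relative proportions and sum to 100 %.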

In equation 2.2, the binary vectors U and V are used to select a given probability (f) from the matrix P. The four elements of U and V correspond to the nucleotides (A, T, G, C), one of which is selected by setting its element to one and the others to zero. U selects the original nucleotide and V the mutated one. In equation 2.2, the epPCR method with Taq polymerase (unbalanced dNTPs) is used as an example for matrix P. In this example, the mutational probability for the transformation of nucleotide A → T is f = 9.7.


$$
f = U P V^{T} =
\begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix}
0.0 & 9.70 & 19.34 & 16.14 \\
9.70 & 0.0 & 16.14 & 19.34 \\
4.82 & 0.0 & 0.0 & 0.0 \\
0.0 & 4.82 & 0.0 & 0.0
\end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
= 9.7
\tag{2.2}
$$
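The selection in equation 2.2 can be reproduced with a few lines of plain Python. The matrix values are those printed in equation 2.2; the helper names `selector` and `mutational_probability` are hypothetical and only serve this sketch.

```python
NUCLEOTIDES = "ATGC"  # row and column order of matrix P (equation 2.1)

# Matrix P as given in equation 2.2 (epPCR, Taq polymerase, unbalanced dNTPs).
P = [
    [0.00, 9.70, 19.34, 16.14],  # A -> A, T, G, C
    [9.70, 0.00, 16.14, 19.34],  # T -> A, T, G, C
    [4.82, 0.00, 0.00, 0.00],    # G -> A, T, G, C
    [0.00, 4.82, 0.00, 0.00],    # C -> A, T, G, C
]


def selector(nucleotide):
    """Binary vector with a one at the position of the chosen nucleotide."""
    return [1 if n == nucleotide else 0 for n in NUCLEOTIDES]


def mutational_probability(original, mutated, matrix=P):
    """Evaluate f = U P V^T for the substitution original -> mutated."""
    u = selector(original)
    v = selector(mutated)
    return sum(u[r] * matrix[r][c] * v[c] for r in range(4) for c in range(4))


f = mutational_probability("A", "T")
print(f)
```

Because U and V each contain a single one, the double sum simply picks the matrix element in the row of the original nucleotide and the column of the mutated one.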

By applying this procedure to each single nucleotide substitution in the codon, nine probability values (three for each nucleotide) are obtained. Each of these values gives the k-th mutational probability ($f(i)^{k}_{\alpha\to\beta}$) that changes the i-th amino acid (α), expressed by the native codon, into the one (β, which also comprises the stop codon) expressed by the mutated codon (e.g. X,Y,Z → X´,Y,Z). The nine probabilities ($f(i)^{k}_{\alpha\to\beta}$) are then summed to obtain the normalization factor ($N_i$) for the i-th residue of the protein sequence:

$$
N_i = \sum_{k=1}^{9} f(i)^{k}_{\alpha\to\beta}
\tag{2.3}
$$

Hence, the normalized probability for the substitution of amino acid α → β is given by

$$
\varphi(i)^{k}_{\alpha\to\beta} = \frac{f(i)^{k}_{\alpha\to\beta}}{N_i}
\tag{2.4}
$$
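A minimal sketch of equations 2.3 and 2.4 for a single codon is given below. It assumes the standard genetic code and uses the example matrix of equation 2.2. The function names are hypothetical, and adding up the normalized values of different k that lead to the same amino acid β is an assumption of this sketch, not a statement about the server's implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

ORDER = "ATGC"  # row and column order of matrix P
P = [  # example matrix from equation 2.2
    [0.00, 9.70, 19.34, 16.14],
    [9.70, 0.00, 16.14, 19.34],
    [4.82, 0.00, 0.00, 0.00],
    [0.00, 4.82, 0.00, 0.00],
]


def codon_substitutions(codon):
    """Return (beta, f) for the nine single-nucleotide substitutions of a codon."""
    results = []
    for pos, original in enumerate(codon):
        for mutated in ORDER:
            if mutated == original:
                continue
            new_codon = codon[:pos] + mutated + codon[pos + 1:]
            f = P[ORDER.index(original)][ORDER.index(mutated)]
            results.append((CODON_TABLE[new_codon], f))
    return results


def normalized_probabilities(codon):
    """Normalize the nine f values by N_i and collect them per resulting amino acid."""
    subs = codon_substitutions(codon)
    n_i = sum(f for _, f in subs)  # equation 2.3
    phi = {}
    for beta, f in subs:
        phi[beta] = phi.get(beta, 0.0) + f / n_i  # equation 2.4
    return phi


phi_met = normalized_probabilities("ATG")  # methionine codon
print(phi_met)
```

For the ATG codon every single-nucleotide substitution changes the amino acid, so the resulting dictionary has no entry for methionine itself.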

2.3.2. MAP indicators

Three indicators (the protein structure indicator, the amino acid diversity indicator and the chemical diversity indicator) are used to summarize the characteristics of a random mutagenesis method for the target gene at the amino acid level. The amino acid diversity for the substitution of amino acid α → β in a protein sequence of length L is calculated by


$$
\Delta_{\alpha\to\beta} = \frac{1}{L} \sum_{i=1}^{L} \varphi(i)_{\alpha\to\beta}
\tag{2.5}
$$

The amino acid diversities are summed to calculate the values of the MAP indicators:

$$
I_{\alpha\to S} = \sum_{r=1}^{r'} \Delta_{\alpha\to\beta(r)}
\tag{2.6}
$$

where S indicates a subset of amino acids or stop codons and r´ is the number of elements in the subset. The chemical diversity indicator quantifies the chemical diversity generated by the random mutagenesis method. For this indicator, S is one of the following subsets of amino acids: charged (D, E, H, K, R; S = ch), neutral (C, M, S, P, T, N, Q; S = ne), aromatic (F, Y, W; S = ar) and aliphatic (G, A, V, L, I; S = al). For example, $I_{\alpha\to ch}$, the total probability that a given amino acid α is substituted by a charged amino acid (ch), is calculated by summing $\Delta_{\alpha\to\beta(r)}$ over the charged residues β(r) (E, D, R, K and H), i.e. r´ = 5. The protein structure indicator gives the fraction of single nucleotide substitutions that result in protein structure/function-disrupting (stop codons; S = st) and likely destabilizing (glycine or proline; S = gp) amino acid substitutions. Finally, the amino acid diversity indicator measures the fraction of variants with preserved amino acid substitutions (S = pr) and the average number of amino acid substitutions per residue. This is complemented by the codon diversity coefficient, which measures the distribution of random mutations among the codons of the gene.
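Equations 2.5 and 2.6 can be illustrated with a small sketch that starts from per-residue normalized probabilities. The `profile` below is a hypothetical three-residue example with made-up values, and the function names are not part of the server. The sketch also assumes that residues whose native amino acid differs from α contribute zero to the sum in equation 2.5.

```python
# Amino acid subsets S as defined in the text ("*" stands for a stop codon).
SUBSETS = {
    "ch": "DEHKR",
    "ne": "CMSPTNQ",
    "ar": "FYW",
    "al": "GAVLI",
    "st": "*",
    "gp": "GP",
}


def amino_acid_diversity(profile, alpha, beta):
    """Equation 2.5: average of phi(i) for alpha -> beta over all L residues."""
    length = len(profile)
    total = sum(phi.get(beta, 0.0) for native, phi in profile if native == alpha)
    return total / length


def indicator(profile, alpha, subset):
    """Equation 2.6: sum of the diversities over the members of subset S."""
    return sum(amino_acid_diversity(profile, alpha, beta) for beta in SUBSETS[subset])


# Hypothetical profile: (native amino acid, {substituted amino acid: phi}).
profile = [
    ("M", {"L": 0.50, "K": 0.25, "I": 0.25}),
    ("M", {"L": 0.25, "R": 0.50, "T": 0.25}),
    ("A", {"A": 0.50, "V": 0.50}),
]

i_met_charged = indicator(profile, "M", "ch")
i_met_aliphatic = indicator(profile, "M", "al")
print(i_met_charged, i_met_aliphatic)
```

In this toy profile, methionine is replaced by a charged residue (K or R) with a total indicator value of 0.25 and by an aliphatic residue (L or I) with a value of one third.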

2.3.3. Local chemical diversity and protein structure components

Two new sequence-based indicators are introduced with the MAP 2.0 3D server to complement the structural analysis at the single amino acid level. The probability that a substitution of the i-th amino acid (α) leads to an amino acid (β) whose side chain has a given chemical nature (subset S) is calculated by

$$
\delta(i)_{\alpha\to S} = \sum_{r=1}^{r'} \varphi(i)_{\alpha\to\beta(r)}
\tag{2.7}
$$

where S and r´ denote the amino acid group and the number of its members, respectively (as described for equation 2.6). The amino acid mutability of the i-th amino acid (a special case of equation 2.7 with r´ = 1) is given by

$$
\mu(i) = 1 - \varphi(i)_{\alpha\to\alpha}
\tag{2.8}
$$

where $\varphi(i)_{\alpha\to\alpha}$ is the normalized probability of substitutions that do not lead to an amino acid change (α → α) at the i-th residue. The local structural environment of an amino acid residue influences the acceptance of amino acid substitutions.[23,24] The local structural environment comprises the secondary structure element, residue flexibility and solvent accessibility. Intra-protein interactions contribute to defining secondary structure elements and residue flexibility in a target protein and help in understanding the molecular basis of the stability and activity of the protein.[26] To illustrate the effect of the generated chemical diversity on the protein structural environment, these factors are mapped onto the amino acid substitution patterns.
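The mutability of equation 2.8 can be sketched as follows, again starting from hypothetical per-residue probabilities. Ranking residues by µ and keeping the `top` entries is only one possible way to list candidate hot spots; it is an assumption of this sketch and not a criterion stated for the server.

```python
def mutability(native, phi):
    """Equation 2.8: one minus the probability that the amino acid is unchanged."""
    return 1.0 - phi.get(native, 0.0)


def hot_spots(profile, top=2):
    """Return the 1-based positions of the `top` residues with the highest mutability."""
    ranked = sorted(
        (
            (mutability(native, phi), index)
            for index, (native, phi) in enumerate(profile, start=1)
        ),
        reverse=True,
    )
    return [index for _, index in ranked[:top]]


# Hypothetical profile: (native amino acid, {resulting amino acid: phi}).
profile = [
    ("A", {"A": 0.40, "V": 0.35, "T": 0.25}),
    ("M", {"L": 0.60, "I": 0.40}),
    ("G", {"G": 0.20, "D": 0.50, "S": 0.30}),
]

mu = [mutability(native, phi) for native, phi in profile]
print(mu, hot_spots(profile))
```

A residue such as the methionine in this example, for which no single-nucleotide substitution is synonymous, reaches the maximum mutability of one.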

The secondary structure elements are derived using DSSP,[27] while the Relative Solvent Accessibility (RSA) is calculated as the number of water molecules in contact with the residue[27] divided by the total surface area of the residue.[28] A threshold value of 0.16 is used to differentiate between exposed (RSA ≥ 0.16) and buried (RSA < 0.16) residues. Crystallographic B-factors are used as indicators of residue flexibility.[29] The B-factors of the Cα atoms are normalized by

$$
B' = \frac{B - \langle B \rangle}{\sigma}
\tag{2.9}
$$

where ⟨B⟩ is the average B-factor of the Cα atoms (after omitting the first and last three residues) and σ is the standard deviation.[30] The normalized B-factor values are used to distinguish flexible from rigid residues.[31]
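The two residue-level descriptors above, the RSA threshold and the B-factor normalization of equation 2.9, can be sketched as follows. The B-factor values are hypothetical. Computing both the mean and the standard deviation on the trimmed set, using the population standard deviation, and still normalizing the terminal residues are assumptions of this sketch; the text does not specify these details.

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors, trim=3):
    """Equation 2.9, with mean and sigma taken after omitting `trim` residues per end."""
    core = b_factors[trim:len(b_factors) - trim]
    average = mean(core)
    sigma = pstdev(core)  # population standard deviation (assumption)
    return [(b - average) / sigma for b in b_factors]


def burial_state(rsa, threshold=0.16):
    """Classify a residue as exposed (RSA >= threshold) or buried (RSA < threshold)."""
    return "exposed" if rsa >= threshold else "buried"


# Hypothetical C-alpha B-factors for a nine-residue peptide.
b_factors = [50.0, 40.0, 30.0, 10.0, 20.0, 30.0, 35.0, 45.0, 55.0]
normalized = normalize_b_factors(b_factors)
print(normalized)
print(burial_state(0.16), burial_state(0.10))
```

Positive normalized values mark residues that are more mobile than the average of the trimmed chain, negative values mark more rigid ones.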

Finally, the new server calculates the following intra-protein interactions from the crystallographic protein structure, using criteria reported in the literature: disulphide bonds,[27] salt bridges,[32] hydrophobic interactions,[33] aromatic interactions[34] and side chain hydrogen bonds.[35] The default parameters for the calculation of molecular interactions are taken from the widely accepted primary literature and can be modified by the user.

2.3.4. MAP 2.0 3D server description

The MAP 2.0 3D analysis is performed on the gene sequence together with the 3D coordinates of the target protein, for one random mutagenesis method at a time. Figure 2.1 shows the query interface of the server, available at http://map.jacobs-university.de/map3d.html.

The server accepts the gene sequence in commonly used sequence formats (FASTA, GenBank, GCG) or as a raw sequence. The 3D coordinates are accepted in PDB file format.[36] The protein sequence obtained by translating the gene sequence is aligned with the protein sequence extracted from the protein coordinates using the Smith-Waterman algorithm[37] for local sequence alignment. For the complete analysis, the two sequences should have sufficient identity (default ≥ 70 %). For files containing multiple protein chains, the analysis is performed on the first chain unless the user selects another one. The analysis is performed for a user-selected mutagenesis method, either chosen from the MAP library of commonly used methods or, as a feature of the server, defined by directly entering the values of the transformation probability matrix P. By default, the results include the analysis of all residues. This can be changed by selecting a predefined group of amino acids (charged, neutral, aromatic and aliphatic or, according to their relative solvent accessibility, exposed or buried) or by providing a set of amino acid residues, which can be extended to residues within a given range (in Å) of that set. Finally, the advanced user interface section allows the parameters used for the calculation of molecular interactions to be changed.
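The alignment and identity check described above can be sketched with a compact Smith-Waterman implementation. The scoring values (match 2, mismatch -1, linear gap -2), the definition of identity as matches divided by the length of the local alignment, and the example sequences are all assumptions of this sketch; the scoring scheme and identity definition used by the server are not stated in the text.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return one optimal local alignment of a and b as two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of the local alignment length."""
    if not aligned_a:
        return 0.0
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b) if x != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical sequences: translated gene versus a structure with extra terminal residues.
translated = "MKTAYIAKQR"
from_structure = "GSMKTAYIAKQRLE"
aln_a, aln_b = smith_waterman(translated, from_structure)
identity = percent_identity(aln_a, aln_b)
print(aln_a, aln_b, identity, identity >= 70.0)
```

Because the alignment is local, extra residues at the termini of the structure (for example a purification tag) do not lower the identity, but a very short local hit can also score 100 %, so an additional coverage check may be advisable.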


Figure 2.1: Query interface for MAP 2.0 3D. Black boxes show the two ways to query the server: (1) sequence-based analysis, which takes a nucleotide sequence as input (red box), and (2) structure-based analysis, which takes protein coordinates (crystallographic structure or homology model), a nucleotide sequence and a random mutagenesis method as input (red boxes). The options in the green boxes can be used to customize the query: (1) choosing among the 19 commonly used mutagenesis methods included in the server by default, or adding a new method by defining its mutational spectrum; (2) selecting the chain in the case of a multi-chain protein; (3) restricting the search to a group of amino acids, selected either from the predefined groups based on (a) the chemical property of the side chain (charged, neutral, aromatic or aliphatic) or (b) the solvent accessible area (buried or exposed), or (c) as a user-given set of amino acids with a cutoff (in Å) to include residues within that distance of the given residues; and (4) altering the thresholds used for the calculation of molecular interactions.

2.3.5. MAP 2.0 3D output

Along with the sequence-based MAP analysis indicators, the indicators implemented in MAP 2.0 3D correlate the amino acid substitution patterns generated by random mutagenesis methods with the protein structure (using the Jmol applet, http://www.jmol.org/). They include a residue mutability indicator and take secondary structure elements, residue flexibility, relative solvent accessibility and intra-protein interactions into account (see above). The generated results can be downloaded in text format for further use. The modified coordinate files (with amino acid substitution probabilities) are also available for download in PDB format.

2.3.6. Model proteins

The enzymes selected for the analysis by MAP 2.0 3D are: 1) D-amino acid oxidase from Rhodotorula gracilis (EC: 1.4.3.1; EMBL-Bank: AAB93974.1[36]; PDB ID: 1C0I[37]), 2) phytase from Escherichia coli (EC: 3.1.3.2; EMBL-Bank: AY496073.1[38]; PDB ID: 1DKP[39]) and 3) N-acetylneuraminic acid aldolase from Escherichia coli (EC: 4.1.3.3; EMBL-Bank: X03345.1[40]; PDB ID: 1NAL[41]). The sequence composition of the enzymes is: 1) D-amino acid oxidase (1107 bases: A 19.96 %; T 17.52 %; G 31.17 %; C 31.35 %; 369 residues), 2) phytase (1299 bases: A 24.25 %; T 22.09 %; G 27.25 %; C 26.40 %; 433 residues) and 3) N-acetylneuraminic acid aldolase (894 bases: A 24.38 %; T 23.60 %; G 27.07 %; C 24.94 %; 298 residues). The secondary structure of the enzymes is: 1) D-amino acid oxidase (30 % helical, 28 % beta sheet), 2) phytase (42 % helical, 15 % beta sheet) and 3) N-acetylneuraminic acid aldolase (50 % helical, 13 % beta sheet).
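The composition figures above follow from simple counting, which the sketch below illustrates. It is a minimal example, not the procedure used by the server; the function names `base_composition` and `residue_count` are hypothetical, and the input sequence is a toy string rather than one of the model genes.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, G and C in a nucleotide sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {base: round(100.0 * counts[base] / len(seq), 2) for base in "ATGC"}


def residue_count(n_bases):
    """Number of codons (one per residue) encoded by n_bases nucleotides."""
    return n_bases // 3


# Toy sequence, not one of the model genes.
print(base_composition("ATGGCC"))
# Gene lengths quoted in the text: 1107, 1299 and 894 bases.
print([residue_count(n) for n in (1107, 1299, 894)])  # [369, 433, 298]
```

The residue counts quoted for the three enzymes are exactly one third of the quoted gene lengths.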

2.4. Results and discussions

The use of the MAP 2.0 3D server is illustrated by analyzing three different enzymes that were evolved for different properties by directed protein evolution. The first example describes how to decrease the effects of mutational bias and to generate a mutant library with a higher fraction of active clones. The second and third examples show how the server can be used to analyze the influence of mutational preferences on the evolution of a desired property. Outputs of the complete MAP 2.0 3D analysis are provided as examples under the instruction link of the server (http://map.jacobs-university.de/instruction.html).

2.4.1. D-amino acid oxidase

D-amino acid oxidase (DAAO) is a flavin adenine dinucleotide (FAD)-dependent flavoenzyme. DAAO catalyses the dehydrogenation of D-amino acids to the corresponding α-keto acids, producing ammonia and hydrogen peroxide.[42,43] The high turnover rate, the stable FAD binding and the broad substrate specificity of DAAO from Rhodotorula gracilis (RgDAAO) make it an attractive catalyst for biotechnological applications such as biosensing (i.e. the rapid and reliable detection of D-amino acid content in food specimens or of the neurotransmitter D-serine in the brain).[43] We performed a MAP 2.0 3D analysis on RgDAAO to evaluate the amino acid diversity generated by random mutagenesis methods.


Table 2.1: Summary of the MAP 2.0 3D analysis for the oxidase, the phytase and the aldolase, targeting different epPCR methods for random mutagenesis.

| Indicator | RgDAAO (1st round) | RgDAAO (2nd round) | Phytase | N-acetylneuraminic acid aldolase |
|---|---|---|---|---|
| epPCR method | Taq (+, G=A=C=T) | Taq (+, G=A, C=T) | Taq (+, G=A, C=T) | Taq (+, G=A=C=T) |
| Average amino acid substitution (a) | 7.40 | 7.40 | 7.45 | 7.20 |
| Preserved amino acid substitution (b) | 24.53 % | 23.38 % | 25.40 % | 28.47 % |
| Codon diversity coefficient (c) | 42.48 | 34.04 | 36.49 | 43.70 |
| Stop codons frequency (d) | 2.30 % | 4.38 % | 4.69 % | 2.12 % |
| Gly/Pro frequency (e) | 20.58 % | 13.23 % | 11.60 % | 16.26 % |
| Charged amino acid diversity (f) | -0.34 % (25.00 %) | -2.62 % (25.00 %) | 1.39 % (19.21 %) | 5.00 % (22.22 %) |
| Neutral amino acid diversity (g) | 3.37 % (27.99 %) | 4.47 % (27.99 %) | -4.14 % (35.65 %) | 1.73 % (26.94 %) |
| Aromatic amino acid diversity (h) | -3.19 % (7.34 %) | -0.23 % (7.34 %) | 0.91 % (5.79 %) | -3.14 % (8.08 %) |
| Aliphatic amino acid diversity (i) | -2.13 % (39.67 %) | -6.00 % (39.67 %) | -2.86 % (39.35 %) | -5.72 % (42.76 %) |

(a) average number of amino acid substitutions per residue; (b) Iα→pr: fraction of variants with preserved amino acid substitutions; (c) codon diversity coefficient; (d) Iα→st: fraction of variants with stop codons; (e) Iα→gp: fraction of variants with Gly/Pro; chemical diversity generated by the mutagenesis methods presented as (f) Iα→ch: charged, (g) Iα→ne: neutral, (h) Iα→ar: aromatic and (i) Iα→al: aliphatic amino acid diversity, with the amino acid composition of the target protein sequence (in parentheses) and the deviation from this composition after mutagenesis.

MAP 2.0 3D analysis

The sequence-based MAP 2.0 3D analysis was performed using the following descriptors: i) protein structure indicators, ii) amino acid diversity indicator with codon diversity coefficient and iii) chemical diversity indicator.

Figure 2.2: Statistical analysis of stop codon frequencies (a) and Gly/Pro substitutions (b) for RgDAAO. The random mutagenesis methods enclosed in the black rectangles (epPCR (Taq (MnCl2, G=A=C=T)) and epPCR (Taq (MnCl2, G=A, C=T))) are used for the MAP 2.0 3D analysis.

In Figure 2.2, the values of the stop codon indicator (Iα→st) and the Gly/Pro indicator (Iα→gp) for different random mutagenesis methods are reported. The methods show opposite trends in the generation of stop codons (sequence truncation) and Gly/Pro substitutions (α-helix destabilizers), i.e. the higher the stop codon frequency, the lower the Gly/Pro substitution frequency, and vice versa.[17] The two epPCR methods (indicated in Figure 2.2 by the black rectangles) were found to be more appropriate for RgDAAO, with balanced frequencies of stop codons and Gly/Pro substitutions in comparison to the other mutagenesis methods. Table 2.1 summarizes the sequence-based analysis of the server for the selected random mutagenesis methods. The first method, the balanced epPCR Taq-Pol (Mn2+, balanced dNTP)[44], has a strong preference for a specific nucleotide exchange (~32 % AT → GC, transition mutations), while the second method, the unbalanced epPCR Taq-Pol (Mn2+, unbalanced dNTP)[45], is expected to produce more transversion (21.41 % AT → TA) than transition (14.45 % AT → GC) mutations. Balanced epPCR was expected to generate a lower fraction of stop codons (Iα→st = 2.30 %) and a higher Gly/Pro content (Iα→gp = 20.58 %) than the unbalanced epPCR (Iα→st = 4.38 % and Iα→gp = 13.23 %) (see Table 2.1). For both methods, an average of 7.4 amino acid substitutions per residue was calculated.
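The distinction between the two kinds of nucleotide exchange can be made explicit with a few lines of Python. This is only an illustrative sketch: the function name `exchange_type` is hypothetical and not part of MAP 2.0 3D. In the base-pair notation used above, AT → GC corresponds to an A → G exchange on one strand (purine to purine, a transition), whereas AT → TA corresponds to A → T (purine to pyrimidine, a transversion).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def exchange_type(old, new):
    """Classify a single-nucleotide exchange as 'transition' or 'transversion'."""
    old, new = old.upper(), new.upper()
    valid = PURINES | PYRIMIDINES
    if old not in valid or new not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if old == new:
        raise ValueError("no exchange: both bases are identical")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


print(exchange_type("A", "G"))  # transition
print(exchange_type("A", "T"))  # transversion
```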

Figure 2.3 shows cartoon representations of the RgDAAO crystallographic structure colored according to Iα→gp using the Jmol[46] visualization feature of the new server. Of the 30 % of residues involved in helix formation, 51 % have a higher Iα→gp value (if α is S, L, E or D), with a prevalence of negatively charged residues (E and D, highlighted in stick format in Figure 2.3). In comparison to the unbalanced epPCR, the balanced epPCR method showed a higher probability of substitution of charged residues into Gly/Pro (represented by the color code used to define the amino acid substitution probability in Figure 2.3). The mapping of charged amino acid substitution patterns onto the structure of RgDAAO is reported in Figure 2.4 and is consistent with this observation on the Gly/Pro substitution patterns. The balanced epPCR (Figure 2.4a) shows a lower probability of charged amino acid substitutions than the unbalanced epPCR (Figure 2.4b), which is the opposite of the Gly/Pro substitution patterns of the two methods (Figure 2.3). Hence, the substitutions of charged residues into residues unfavorable for forming molecular interactions result in destabilization of the protein. For example, charged residues were found to be involved in molecular interactions such as salt bridges (15 out of 21 have a probability of more than 0.5 of being substituted into glycine) and side chain H-bonds (5 out of 26 have a probability of more than 0.5 of glycine substitution). Figure 2.5 reports the charged residues involved in salt bridge formation together with the amino acid diversity generated by the balanced (a1 and a2) and unbalanced (b1 and b2) epPCR methods. The balanced epPCR method shows lower probabilities of substitution into charged residues than the unbalanced epPCR method. The unbalanced epPCR is less transition biased (AT → GC), which results in a higher probability of substitutions into charged residues (for E encoded by GAG and D encoded by GAC, a transition mutation often leads to a substitution into glycine (GGG, GGC)). These effects of the mutational preferences of the mutagenesis methods might be minimized by codon optimization, for example by using the GAA codon for E and the GAT codon for D.
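The codon-level effect described above can be reproduced with a small sketch based on the standard genetic code. It is an illustration under stated assumptions, not the server's algorithm: the helper `transition_products` is hypothetical, it considers only single transitions (A↔G, C↔T) and it ignores the method-specific mutation frequencies that MAP 2.0 3D takes into account.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def transition_products(codon):
    """Translate the three codons reachable from `codon` by one transition."""
    codon = codon.upper()
    products = []
    for pos, base in enumerate(codon):
        mutant = codon[:pos] + TRANSITION[base] + codon[pos + 1:]
        products.append((mutant, CODON_TABLE[mutant]))
    return products


for codon in ("GAG", "GAC"):
    print(codon, CODON_TABLE[codon], transition_products(codon))
```

For GAG (Glu) and GAC (Asp), the transition at the middle position gives GGG and GGC, both of which are glycine codons.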

Figure 2.3: Mapping of Gly/Pro amino acid substitutions on the RgDAAO structure for (a) epPCR (Taq (MnCl2, G=A=C=T)) and (b) epPCR (Taq (MnCl2, G=A, C=T)). For the balanced epPCR method (a), the red colored regions of the RgDAAO structure indicate an overall higher probability of substitution of charged residues, mainly negatively charged residues (in stick representation), into Gly/Pro than for the unbalanced epPCR (b).


Figure 2.4: Mapping of amino acid substitutions of charged residues (E, D, R, K, H) on RgDAAO for (a) epPCR (Taq (MnCl2, G=A=C=T)) and (b) epPCR (Taq (MnCl2, G=A, C=T)).


Figure 2.5: Chemical diversity and mutability of the charged amino acid positions (E, D, R, K, H) of D-amino acid oxidase that are involved in salt bridge formation: (a1) and (a2) for epPCR (Taq (MnCl2, G=A=C=T)), and (b1) and (b2) for epPCR (Taq (MnCl2, G=A, C=T)). The Y-axis shows, for each residue, (i) sequence id, (ii) PDB id, (iii) residue name, (iv) secondary structure element (H: alpha helix; B: beta bridge and extended strand; T: hydrogen-bonded turn and bend; *: loop or irregular structure) and (v) amino acid category according to the chemical property of its side chain (P: charged, Y: neutral, C: aromatic and B: aliphatic), with stop codon (R) and Gly/Pro (G) as separate classes.

Using the new focused analysis feature of the server, the amino acid substitution patterns for the active site residues (Y223, Y238 and R285) were also evaluated. Y223 and Y238 are involved in substrate binding and product release, while R285 forms a pair with the carboxylate portion of the substrate (arginine) in RgDAAO.[37] R285 has a very low residue mutability indicator (µ(285) < 0.3), i.e. a low probability of a substitution leading to an amino acid change, for both methods. Y223 and Y238 have µ(223/238) = 0.9 and therefore a higher probability of being substituted into another amino acid. For the balanced epPCR, Y223 and Y238 are preferentially substituted into charged (δ(223/238)Y→ch = 0.37) and neutral (δ(223/238)Y→ne = 0.46) amino acids. For the unbalanced epPCR, the chemical character of Y223/238 is better preserved (δ(223/238)Y→ne = 0.44 and δ(223/238)Y→ar = 0.31). The tendency to substitute aromatic active site residues into chemically different amino acids might increase the number of inactive clones in the mutant library. In summary, MAP 2.0 3D provides a qualitative indication that the balanced epPCR method might be less beneficial (or of lower quality) than the unbalanced one in the directed evolution of RgDAAO.

RgDAAO directed evolution

In a directed evolution study by Pollegioni et al.[47], the substrate specificity of RgDAAO was altered in order to use it as a biosensor for the analytical determination of D-amino acids in biological samples. Two rounds of directed evolution were performed: the first employed epPCR mutant libraries (balanced dNTP) and was followed by another round employing epPCR (unbalanced dNTP) for diversity generation. In the first round (1st set of epPCR: balanced), 91 % of the clones and in the second round (2nd set of epPCR: unbalanced), 63 % of the clones were reported to be inactive. The results of these experiments are in agreement with the predictions of the MAP 2.0 3D server. In fact, the mutational preferences of the balanced method induce more structurally destabilizing substitutions and resulted in a higher number of inactive clones than the unbalanced epPCR. In addition, the MAP 2.0 3D analysis suggests that most of the inactive clones should result from substitutions into Gly/Pro (destabilizing amino acids), which can destabilize the secondary structure of a helix or weaken intramolecular interactions.

The best variant obtained from the experiments was the triple mutant (T60A Q144R K152E) with broader substrate specificity. The amino acid substitution patterns calculated by the MAP 2.0 3D server at these positions were also in agreement with the experimental results. All mutated positions were assigned a high residue mutability value by MAP 2.0 3D (µ(60/144/152) > 0.8), i.e. they lie within mutagenic hotspots generated by the mutagenesis methods. The Q144R substitution was identified in the first round with the balanced epPCR. Q144 has a high probability of being substituted into a charged residue (δ(144)Q→ch = 0.67), and experimentally the Q144R substitution (φ(144)Q→R = 0.58) was found. In the second round of random mutagenesis with the unbalanced epPCR, the T60A (φ(60)T→A = 0.58) and K152E (φ(152)K→E = 0.36) substitutions were obtained. These residues have a high preference for substitution into aliphatic (δ(60)T→al = 0.6) and charged residues (δ(152)K→ch = 0.7), respectively.
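The class-level bookkeeping behind the δ values can be illustrated by sorting each observed mutation into the chemical class of its wild-type and mutant residue. The sketch below is not the server's implementation, and the class table is partly an assumption: the text states the charged set (D, E, H, K, R) and treats Gly/Pro as a separate class, while the remaining assignments are inferred from the substitutions discussed in this chapter and may differ from those used by MAP 2.0 3D. The names `CLASS_OF` and `classify_substitution` are hypothetical.

```python
# Assumed side-chain classes; only the charged set and the separate
# Gly/Pro class are stated explicitly in the text.
CLASS_OF = {}
for members, label in (
    ("DEHKR", "charged"),
    ("FWY", "aromatic"),
    ("AVLI", "aliphatic"),
    ("STNQCM", "neutral"),
    ("GP", "Gly/Pro"),
):
    for aa in members:
        CLASS_OF[aa] = label


def classify_substitution(mutation):
    """Split a mutation such as 'Q144R' into position and side-chain classes."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    return position, CLASS_OF[wild_type], CLASS_OF[mutant]


for m in ("T60A", "Q144R", "K152E"):
    print(m, classify_substitution(m))
```

Under these assumed classes, T60A is a neutral-to-aliphatic exchange, Q144R a neutral-to-charged exchange and K152E a charged-to-charged exchange, which matches the classes of the δ values quoted for the three positions.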

In summary, the RgDAAO case illustrates how the MAP 2.0 3D server can be used to develop efficient mutagenesis strategies before and during directed evolution experiments, for instance by selecting the most efficient mutagenesis method for the target gene, with the fewest unfavorable effects on its protein structure or function, and by codon engineering. In this way, the gene can be synthesized prior to the directed evolution experiment to reduce highly destabilizing substitutions at key amino acid positions.


2.4.2. Phytase

Phytases are a class of phosphatase enzymes that catalyse the hydrolysis of phytic acid (myo-inositol hexakisphosphate) to release inorganic phosphorus in a usable form. Phytases have been used as a feed supplement for decades.[48] The industrial feed pelleting process in which phytases are applied involves high temperatures. For this reason, directed evolution methods have been used to increase the thermal resistance of phytases while maintaining high activity at ambient temperature.[49]

MAP 2.0 3D analysis

MAP 2.0 3D analysis was performed on the phytase appA2 (the full analysis is given on the MAP 2.0 3D server as an example). In comparison to the other 18 random mutagenesis methods, epPCR Taq (+, G=A, C=T) was found to be the preferred choice for the directed evolution of appA2. As reported in Table 2.1, the sequence-based MAP 2.0 3D analysis shows a stop codon frequency of Iα→st = 4.69 % and a frequency of substitutions into Gly/Pro of Iα→gp = 11.60 %. An average of 7.45 amino acid substitutions per residue was calculated. The codon diversity coefficient was 36.49, with a fraction of preserved amino acid substitutions of Iα→pr = 25.40 %. Charged (19.21 %) and aromatic (5.79 %) residues were overrepresented, with deviations of 1.39 % and 0.91 % from their chemical distribution, respectively. The aliphatic (39.35 %) and neutral (35.65 %) residues were underrepresented, with deviations of -2.86 % and -4.14 %, respectively.

Using the structural data, a different conclusion emerges from that of the sequence analysis alone. One rule of thumb for enhancing the thermostability of an enzyme is to increase the number of charged residues in the loop regions at the protein surface. Reducing the mobility of these flexible regions by strengthening them with electrostatic and hydrogen bonding interactions usually has a stabilizing effect on the thermal stability.[50] Hence, the amino acid substitution patterns of charged residues were analyzed using the residue mutability indicator, the normalized B-factors (B´) as a residue flexibility indicator and the relative solvent accessibility (RSA) to differentiate exposed and buried residues. Figure 2.6 reports the mapping onto the phytase appA2 of the amino acid substitution patterns generated by epPCR Taq (+, G=A, C=T) for the different amino acid substitution classes (charged, neutral, aromatic and aliphatic), stop codons and Gly/Pro, with charged residues shown in stick representation. A high probability of substitution of charged residues into Gly/Pro, aliphatic and neutral residues was observed in the MAP 2.0 3D analysis. Figure 2.7 reports detailed information on the amino acid substitution patterns of charged residues, together with three MAP 2.0 3D structural indicators, for the epPCR Taq (+, G=A, C=T) method. The experimentally determined mutations are highlighted with black rectangles in Figure 2.7. Most of the charged residues were found to have a mutability value µ > 0.6, i.e. a high probability of being substituted into another amino acid. Figure 2.7 shows high probabilities of substitution of charged residues into glycine or proline (alpha helix destabilizers) and into aliphatic and neutral residues (less favorable for improving thermostability).
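How the three structural indicators might be combined per residue is sketched below. Several points are assumptions rather than facts from the text: B´ is taken to be a z-score of the B-factors (population standard deviation), the cut-offs for flexible/rigid and exposed/buried are placeholders, and only the mutability cut-off of 0.6 is borrowed from the paragraph above. The function names and the input numbers are made up for illustration.

```python
from statistics import mean, pstdev


def normalized_b_factors(b_factors):
    """Z-score normalisation (assumed here): B' = (B - mean) / standard deviation."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        raise ValueError("all B-factors are identical; B' is undefined")
    return [(b - mu) / sigma for b in b_factors]


def flag_residue(b_norm, rsa, mutability, b_cut=0.0, rsa_cut=0.25, mu_cut=0.6):
    """Label one residue as flexible/rigid, exposed/buried, mutable/conserved.

    b_cut and rsa_cut are placeholder defaults chosen for this sketch;
    mu_cut follows the value of 0.6 mentioned in the text.
    """
    return (
        "F" if b_norm > b_cut else "R",
        "E" if rsa > rsa_cut else "B",
        "mutable" if mutability > mu_cut else "conserved",
    )


# Made-up B-factors, only to show the normalisation.
print(normalized_b_factors([10.0, 20.0, 30.0]))
# Made-up indicator values for a single residue.
print(flag_residue(b_norm=0.5, rsa=0.45, mutability=0.85))
```

Keeping the cut-offs as parameters makes it easy to adapt the labels once the thresholds actually used by the server are known.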


Figure 2.6: MAP 2.0 3D analysis of the amino acid substitution probability of phytase appA2 after being subjected to epPCR (Taq (MnCl2, G=A, C=T)), in cartoon representation; charged residues (D, E, H, K, R) are shown in stick representation. The probability values increase from blue (lowest probability) to red (highest probability). Amino acids were grouped according to the chemical nature of their side chain: charged (c), neutral (d), aromatic (e) or aliphatic (f), with sequence-interrupting stop codons (a) and structure-destabilizing amino acids (glycine and proline) (b).

Phytase directed evolution

In one example, Kim et al. performed directed evolution on phytase appA2 from E. coli to generate variants with increased thermostability by using epPCR with unbalanced dNTPs.[51] Two variants (K46E and K65E K97M S209G) with 20 % improved thermostability were found after screening 5000 clones. Of the four mutated positions, three are substitutions of a charged residue, in each case lysine.

The MAP 2.0 3D analysis of the amino acid substitution patterns for these positions was in agreement with the experimental findings, with all four positions having a high mutability indicator value (µ > 0.8) and relative solvent accessibility (RSA > 0.4). Furthermore, all lysine residues in the mutated positions have a probability of undergoing a nucleotide exchange that results in a stop codon. K46 and K97 have the same substitution preference for a stop codon (δ(46/97)K→st = 0.24) but differ for charged (δ(46)K→ch = 0.40; φ(46)K→E = 0.16) and neutral residues (δ(97)K→ne = 0.35; φ(97)K→M = 0.24). K65 has different amino acid substitution values for changing into residues with aliphatic (δ(65)K→al = 0.18), charged (δ(65)K→ch = 0.36; φ(65)K→E = 0.12) and neutral (δ(65)K→ne = 0.27) side chains, and δ(65)K→st = 0.18 for a stop codon. S209 has a high probability of preserving the chemical property of its side chain, with a high preference for neutral substitutions (δ(209)S→ne = 0.60). The substitution of S209 into glycine alone has a probability of φ(209)S→G = 0.24. The mutations generated with the epPCR Taq (+, G=A, C=T) mutagenesis method experimentally resulted in only 20 % active clones in the library, and only 80 were found to be improved in thermal stability. Phytase appA2 has a high helical content (42 %), and substitutions into Gly/Pro residues might reduce thermal stability by destabilizing the structure, increasing the number of inactive clones. In general, substitutions of charged residues into aliphatic or neutral residues are less favorable for improving thermal stability.


Figure 2.7: Amino acid substitution patterns for charged residues in phytase together with the parameters residue mutability, residue flexibility and relative solvent accessibility of the amino acids. The experimentally determined mutations are highlighted in black boxes. The Y-axis shows sequence id, PDB id, amino acid name and, in (a), secondary structure elements (H: alpha helix; B: beta bridge and extended strand; T: hydrogen-bonded turn and bend; *: loop or irregular structure), in (b), the normalized Cα B-factor to differentiate between flexible (F) and rigid (R) residues and, in (c), the relative solvent accessibility to identify exposed (E) or buried (B) residues.

2.4.3. N-acetylneuraminic acid aldolase

N-acetylneuraminic acid aldolase (Neu5Ac aldolase) catalyses the aldol condensation of N-acetyl-D-mannosamine and pyruvate to give N-acetyl-D-neuraminic acid (D-sialic acid).[52] Neu5Ac aldolase is used in the synthesis of sialic acid, a complex sugar with many pharmaceutical applications.

MAP 2.0 3D analysis

Based on the sequence-based analysis of the MAP 2.0 3D server, the balanced epPCR method (Taq (MnCl2, G=A=C=T)) was found to be suitable for the directed evolution of Neu5Ac aldolase (summarized in Table 2.1). For this method, the codon diversity coefficient was 43.12, which resulted in Iα→pr = 28.47 % preserved amino acid substitutions, with an average of 7.20 amino acid substitutions per residue. The frequency of stop codons was Iα→st = 2.12 % and that of Gly/Pro substitutions was Iα→gp = 16.26 %. The structure-based analysis focused on the active site residues (A11, S47, T48, Y137, I139, K165, T167, G189, Y190), using the new option of the MAP 2.0 3D server to restrict the analysis to selected amino acids. Figure 2.8 shows the expected amino acid substitutions for the active site residues and, highlighted in boxes, the experimentally determined mutation positions[52] (G70, T84, Y98, F115, V251, Q282). With the exception of residues A11 and G189, the active site residues have a residue mutability value µ > 0.6. The values of RSA and B´ indicate that A11 and G189 are buried in the protein active site and highly rigid. Residue I139, another aliphatic residue of the active site, showed a moderately high preference for substitution into neutral amino acids (δ(139)I→ne = 0.36). The active site residues Y137 and Y190 have a high residue mutability value (µ(137/190) = 0.94) and are substituted into charged (δ(137/190)Y→ch = 0.37) or neutral (δ(137/190)Y→ne = 0.46) amino acids. S47 has a mutability value µ = 0.88 with a substitution probability φ(47)S→G = 0.6 of changing into glycine. K165 has a preference (µ(165) = 0.73) for substitution into charged residues (φ(165)K→R/K/E = 0.26).

Figure 2.8: Amino acid substitution patterns for the active site residues (A11, S47, T48, Y137, I139, K165, T167, G189, Y190) of Neu5Ac aldolase and the experimentally determined mutations (1st generation: Y98H, F115L; 2nd generation: V251I; 3rd generation: G70A, T84S, Q282L), shown in boxes, for the random mutagenesis method epPCR (Taq (MnCl2, G=A=C=T)). Y-axis representations are the same as described in Figure 2.7.

Figures 2.9 and 2.10 show the analysis of hydrophobic contacts and hydrogen bonds, respectively, for the active site residues and the experimentally determined mutation positions. The results of the analysis strongly suggest an involvement of A11 in hydrophobic interactions with Y43 or I206 (see Figure 2.9) and in side chain hydrogen bond formation with Y43, G207 or N211 (see Figure 2.10). In short, the substitution spectra analysis of the active site residues (A11, S47, T48, Y137, I139, K165, T167, G189, Y190) indicates that the chemical environment of the active site residues is not substantially modified by the epPCR random mutagenesis method.

Figure 2.9: Amino acid substitution patterns for the active site residues (A11, Y137, I139) of Neu5Ac aldolase and the mutations (1st generation: Y98H and F115L; highlighted in the box frames) involved in hydrophobic interactions. Panels (a) and (b) show the interaction partners for the hydrophobic interactions. Y-axis representations are the same as described in Figure 2.5.

Figure 2.10: Amino acid substitution patterns for the active site residues (A11, S47, T48, Y137) of Neu5Ac aldolase and the mutation (1st generation: Y98H; highlighted in a black box) involved in a side chain hydrogen bond. Panels (a) and (b) show the interaction partners for the side chain hydrogen bond. Y-axis representations are the same as described in Figure 2.5.


Neu5Ac aldolase directed evolution

Neu5Ac aldolase was engineered by Wada et al.[52] using epPCR with balanced dNTPs to achieve a complete reversal of enantioselectivity. Three rounds of random mutagenesis resulted in a variant that was more effective toward both enantiomeric substrates (3-deoxy-L/D-manno-2-octulosonic acid). The two positions mutated in the first round of random mutagenesis (Y98H and F115L, see Figure 2.8) are involved, in the wild type, in hydrophobic interactions (Figure 2.9, from Y98 to L63/A67/F100 and from F115 to L147/L155) and in side chain hydrogen bond formation (Figure 2.10, from E64 to Y98). Both positions lie outside the active site, are partially exposed to the solvent (relative solvent accessibility, RSA = 0.26), are moderately flexible (normalized B-factor, B´ = 0.91) and show “variable” amino acid substitutions, with a residue mutability indicator µ(98/115) = 0.75. The substitutions at these positions into comparatively more hydrophobic residues increased the activity of the wild type aldolase. In MAP 2.0 3D, Y98 is preferably preserved or substituted into charged (δ(98)Y→ch = 0.27; φ(98)Y→H = 0.25) or neutral (δ(98)Y→ne = 0.33) residues, while F115 shows a slightly higher preference toward aliphatic substitution (δ(115)F→al = 0.42; φ(115)F→L = 0.33) and cannot be substituted into charged residues (δ(115)F→ch = 0.00). The substitution at residue V251 was obtained in the second round of the directed evolution experiment and resulted in a partially inverted enantiomeric preference of the enzyme. In the MAP 2.0 3D analysis, this position was found to be more conserved (µ(251) = 0.54) or substituted into more hydrophobic residues. The third generation mutations (G70A, T84S, Q282L) resulted in a complete reversal of enzymatic enantioselectivity for use in the synthesis of both D- and L-sugars. G70 has a high flexibility (B´ = 2.59) and a low probability of being substituted into an aliphatic residue (δ(70)G→al = 0.10; φ(70)G→A = 0.03). T84 is part of a turn with high flexibility (B´ = 0.77), is exposed to the solvent (RSA = 0.25) and has a residue mutability µ(84) = 0.87. T84 has a high preference for substitution by aliphatic residues (δ(84)T→al = 0.58) and, to a lesser extent, by a “neutral” amino acid (δ(84)T→ne = 0.34; φ(84)T→S = 0.13). Q282 is part of a helix and is rigid (B´ = -1.01) but partially exposed to the solvent (RSA = 0.23). Q282 has a high preference for substitution into a charged residue (δ(282)Q→ch = 0.67) and a very low preference for aliphatic (δ(282)Q→al = 0.13; φ(282)Q→L = 0.13) or neutral (δ(282)Q→ne = 0.08) substitution.

In the case of aldolase, the MAP 2.0 3D analysis also shows good agreement with the experimental results. The variability in the amino acid substitution patterns of the active site residues allowed more sequence space to be explored for the catalytic activity of the enzyme and yielded a high fraction of beneficial mutations in the first generation.

2.5. Conclusions<br />

In this manuscript, we introduced the MAP 2.0 3D server and its use in assisting the design of directed evolution experiments. MAP 2.0 3D correlates the traditional sequence based MAP indicators with the structural information of the target protein. The combined information can improve the chances of finding functional and stable enzyme variants. MAP 2.0 3D helps to guide directed evolution experiments by focusing the analysis on a set of residues that are important for the specific enhancement of enzymatic properties, for example to improve substrate specificity by targeting residues located in or near the active site, or to enhance the thermal stability or water solubility of proteins by increasing the number of charged amino acid substitutions. The new structure oriented features of the MAP 2.0 3D server have been applied to the analysis of three different proteins (phytase, oxidase and aldolase), and the predicted results were compared with the experimental results. The results of the RgDAAO analysis indicate that selecting the random mutagenesis method by pre-screening the generated library can help to elucidate the effects of mutational bias on the structural environment of the protein and how these effects can be optimized. The analyses of phytase and Neu5Ac aldolase illustrate how the structural analysis features included in the MAP 2.0 3D server can help to correlate the effect of mutational biases with the protein structural environment and to evolve a desired property. In this way, the MAP 2.0 3D server facilitates the ‘in-silico’ pre-screening of the target gene and can also promote an increase of the active population in random mutagenesis libraries, thereby decreasing screening efforts and increasing the probability of obtaining desirable mutations even in a small mutant library.

2.6. References<br />

1. Bornscheuer UT, Pohl M (2001) Improved biocatalysts by directed evolution <strong>and</strong><br />

rational protein design. Curr Opin Chem Biol 5: 137-143.<br />

2. Brakmann S (2001) Discovery <strong>of</strong> superior enzymes by directed molecular evolution.<br />

Chembiochem 2: 865-871.<br />

3. Wong TS, Arnold FH, Schwaneberg U (2004) Laboratory evolution <strong>of</strong> cytochrome<br />

p450 BM-3 monooxygenase for organic cosolvents. Biotechnol Bioeng 85: 351-358.<br />

4. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward underst<strong>and</strong>ing<br />

the inactivation mechanism <strong>of</strong> monooxygenase P450 BM-3 by organic cosolvents: a<br />

molecular dynamics simulation study. Biopolymers 83: 467-476.<br />

5. Wong TS, Zhurina D, Schwaneberg U (2006) The diversity challenge in directed<br />

protein evolution. Comb Chem High T Scr 9: 271-288.<br />

6. Wong TS, Roccatano D, Schwaneberg U (2007) Steering directed protein evolution:<br />

strategies to manage combinatorial complexity <strong>of</strong> mutant libraries. Environ<br />

Microbiol 9: 2645-2659.<br />

7. Smith JM (1970) Natural Selection <strong>and</strong> Concept <strong>of</strong> a Protein Space. Nature 225: 563-<br />

564.<br />

8. Olsen M, Iverson B, Georgiou G (2000) High-throughput screening <strong>of</strong> enzyme<br />

libraries. Curr Opin Biotech 11: 331-337.<br />

9. Tawfik DS, Bershtein S (2008) Advances in laboratory evolution <strong>of</strong> enzymes. Curr<br />

Opin Chem Biol 12: 151-158.<br />

10. Shivange AV, Marienhagen J, Mundhada H, Schenk A, Schwaneberg U (2009)<br />

Advances in generating functional diversity for directed protein evolution. Curr<br />

Opin Chem Biol 13: 19-25.<br />


11. Turner NJ (2009) Directed evolution drives the next generation <strong>of</strong> biocatalysts. Nat<br />

Chem Biol 5: 567-573.<br />

12. Wong TS, Roccatano D, Loakes D, Tee KL, Schenk A, et al. (2008) Transversion-enriched sequence saturation mutagenesis (SeSaM-Tv+): a random mutagenesis method with consecutive nucleotide exchanges that complements the bias of error-prone PCR. Biotechnol J 3: 74-82.

13. Dennig A, Shivange AV, Marienhagen J, Schwaneberg U (2011) OmniChange: The<br />

Sequence Independent Method for Simultaneous Site-Saturation <strong>of</strong> Five Codons.<br />

PLoS ONE 6: e26222.<br />

14. Chica RA, Doucet N, Pelletier JN (2005) Semi-rational approaches to engineering<br />

enzyme activity: combining the benefits <strong>of</strong> directed evolution <strong>and</strong> rational design.<br />

Curr Opin Biotech 16: 378-384.<br />

15. Zumarraga M, Camarero S, Shleev S, Martinez-Arias A, Ballesteros A, et al. (2008)<br />

Altering the laccase functionality by in vivo assembly <strong>of</strong> mutant libraries with<br />

different mutational spectra. Proteins 71: 250-260.<br />

16. Vanhercke T, Ampe C, Tirry L, Denolf P (2005) Reducing mutational bias in r<strong>and</strong>om<br />

protein libraries. Anal Biochem 339: 9-14.<br />

17. Wong TS, Roccatano D, Zacharias M, Schwaneberg U (2006) A statistical analysis <strong>of</strong><br />

r<strong>and</strong>om mutagenesis methods used for directed protein evolution. J Mol Biol 355:<br />

858-871.<br />

18. Wong TS, Roccatano D, Schwaneberg U (2007) Are transversion mutations better? A<br />

Mutagenesis Assistant Program analysis on P450 BM-3 heme domain. Biotechnol J<br />

2: 133-142.<br />

19. Rasila TS, Pajunen MI, Savilahti H (2009) Critical evaluation <strong>of</strong> r<strong>and</strong>om mutagenesis<br />

by error-prone polymerase chain reaction protocols, Escherichia coli mutator strain,<br />

<strong>and</strong> hydroxylamine treatment. Anal Biochem 388: 71-80.<br />

20. Ditursi MK, Kwon SJ, Reeder PJ, Dordick JS (2006) Bioinformatics-driven, rational<br />

engineering <strong>of</strong> protein thermostability. Protein Eng Des Sel 19: 517-524.<br />

21. Shoichet BK, Beadle BM (2002) Structural bases <strong>of</strong> stability-function trade<strong>of</strong>fs in<br />

enzymes. J Mol Bio 321: 285-296.<br />


22. Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data<br />

Bank. Nat Struct Biol 10: 980-980.<br />

23. Zhang H, Zhang T, Chen K, Shen S, Ruan J, et al. (2009) On the relation between<br />

residue flexibility <strong>and</strong> local solvent accessibility in proteins. Proteins 76: 617-636.<br />

24. Teilum K, Olsen JG, Kragelund BB (2009) Functional aspects <strong>of</strong> protein flexibility.<br />

Cell Mol Life Sci 66: 2231-2247.<br />

25. Gromiha MM, Oobatake M, Kono H, Uedaira H, Sarai A (1999) Role <strong>of</strong> structural <strong>and</strong><br />

sequence information in the prediction <strong>of</strong> protein stability changes: comparison<br />

between buried <strong>and</strong> partially buried mutations. Protein Eng Des Sel 12: 549-555.<br />

26. Gromiha MM, Selvaraj S (2004) Inter-residue interactions in protein folding <strong>and</strong><br />

stability. Prog Biophys Mol Bio 86: 235-277.<br />

27. Kabsch W, S<strong>and</strong>er C (1983) Dictionary <strong>of</strong> protein secondary structure: pattern<br />

recognition <strong>of</strong> hydrogen-bonded <strong>and</strong> geometrical features. Biopolymers 22: 2577-<br />

2637.<br />

28. Chothia C (1976) The nature <strong>of</strong> the accessible <strong>and</strong> buried surfaces in proteins. J Mol<br />

Biol 105: 1-12.<br />

29. Peisajovich SG, Tawfik DS (2007) Protein engineers turned evolutionists. Nat<br />

Methods 4: 991-994.<br />

30. Karplus PA, Schulz GE (1985) Prediction <strong>of</strong> Chain Flexibility in Proteins - a Tool for<br />

the Selection <strong>of</strong> Peptide Antigens. Naturwissenschaften 72: 212-213.<br />

31. Yuan Z, Zhao J, Wang ZX (2003) Flexibility analysis <strong>of</strong> enzyme active sites by<br />

crystallographic temperature factors. Protein Eng Des Sel 16: 109-114.<br />

32. Kumar S, Nussinov R (2002) Close-range electrostatic interactions in proteins.<br />

Chembiochem 3: 604-617.<br />

33. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic<br />

character <strong>of</strong> a protein. J Mol Biol 157: 105-132.<br />

34. Burley SK, Petsko GA (1985) Aromatic-aromatic interaction: a mechanism <strong>of</strong> protein<br />

structure stabilization. Science 229: 23-28.<br />

35. Overington J, Johnson MS, Sali A, Blundell TL (1990) Tertiary structural constraints<br />

on protein evolutionary diversity: templates, key residues <strong>and</strong> structure prediction.<br />

P Roy Soc Lond B Bio 241: 132-145.<br />


36. Liao GJ, Lee YJ, Lee YH, Chen LL, Chu WS (1998) Structure <strong>and</strong> expression <strong>of</strong> the D-<br />

amino-acid oxidase gene from the yeast Rhodosporidium toruloides. Biotechnol<br />

Appl Bioc 27 ( Pt 1): 55-61.<br />

37. Pollegioni L, Diederichs K, Molla G, Umhau S, Welte W, et al. (2002) Yeast D-amino<br />

acid oxidase: structural basis <strong>of</strong> its catalytic properties. J Mol Biol 324: 535-546.<br />

38. Rodriguez E, Han Y, Lei XG (1999) Cloning, sequencing, <strong>and</strong> expression <strong>of</strong> an<br />

Escherichia coli acid phosphatase/phytase gene (appA2) isolated from pig colon.<br />

Biochem Bioph Res Co 257: 117-123.<br />

39. Lim D, Golovan S, Forsberg CW, Jia Z (2000) Crystal structures <strong>of</strong> Escherichia coli<br />

phytase <strong>and</strong> its complex with phytate. Nat Struct Biol 7: 108-113.<br />

40. Ohta Y, Watanabe K, Kimura A (1985) Complete nucleotide sequence <strong>of</strong> the E. coli N-<br />

acetylneuraminate lyase. Nucleic Acids Res 13: 8843-8852.<br />

41. Izard T, Lawrence MC, Malby RL, Lilley GG, Colman PM (1994) The three-dimensional structure of N-acetylneuraminate lyase from Escherichia coli. Structure 2: 361-369.

42. Pilone MS (2000) D-Amino acid oxidase: new findings. Cell Mol Life Sci 57: 1732-<br />

1747.<br />

43. Pollegioni L, Molla G (2011) New biotech applications from evolved D-amino acid<br />

oxidases. Trends Biotechnol 29: 276-283.<br />

44. Lin-Goerke JL, Robbins DJ, Burczak JD (1997) PCR-based r<strong>and</strong>om mutagenesis using<br />

manganese <strong>and</strong> reduced dNTP concentration. Biotechniques 23: 409-412.<br />

45. Vartanian JP, Henry M, Wain-Hobson S (1996) Hypermutagenic PCR involving all<br />

four transitions <strong>and</strong> a sizeable proportion <strong>of</strong> transversions. Nucleic Acids Res 24:<br />

2627-2631.<br />

46. Jmol: an open-source Java viewer for chemical structures in 3D.<br />

http://www.jmol.org/<br />

47. Sacchi S, Rosini E, Molla G, Pilone MS, Pollegioni L (2004) Modulating D-amino acid<br />

oxidase substrate specificity: production <strong>of</strong> an enzyme for analytical determination<br />

<strong>of</strong> all D-amino acids by directed evolution. Protein Eng Des Sel 17: 517-525.<br />


48. Rao DE, Rao KV, Reddy TP, Reddy VD (2009) Molecular characterization,<br />

physicochemical properties, known <strong>and</strong> potential applications <strong>of</strong> phytases: An<br />

overview. Crit Rev Biotechnol 29: 182-198.<br />

49. Garrett JB, Kretz KA, O'Donoghue E, Kerovuo J, Kim W, et al. (2004) Enhancing the<br />

thermal tolerance <strong>and</strong> gastric performance <strong>of</strong> a microbial phytase for use as a<br />

phosphate-mobilizing monogastric-feed supplement. Appl Environ Microb 70:<br />

3041-3046.<br />

50. Fields PA (2001) Review: Protein function at thermal extremes: balancing stability<br />

<strong>and</strong> flexibility. Comp Biochem Phys A 129: 417-431.<br />

51. Kim MS, Lei XG (2008) Enhancing thermostability <strong>of</strong> Escherichia coli phytase AppA2<br />

by error-prone PCR. Appl Microbiol Biotechnol 79: 69-75.<br />

52. Wada M, Hsu CC, Franke D, Mitchell M, Heine A, et al. (2003) Directed evolution <strong>of</strong><br />

N-acetylneuraminic acid aldolase to catalyze enantiomeric aldol reactions. Bioorgan<br />

Med Chem 11: 2091-2098.<br />

Part <strong>of</strong> this chapter is adapted with permission from ‘Verma R, Schwaneberg U,<br />

Roccatano D. ACS Synthetic Biology 2012, 1 (4), 139-150.’<br />



PART II: MD Simulation<br />

Chapter 3<br />

Introduction to Molecular Dynamics Simulation of Biomolecules

This chapter provides a brief introduction to molecular dynamics (MD) simulation, followed by a description of the system preparation for MD simulation and of the analysis of the generated trajectories. In the following chapters of this thesis, the same procedure is used to perform MD simulations with P450BM-3 monooxygenase as the model system and to analyze the resulting trajectories.

3.1. Background<br />

The last decades have witnessed rapid development in the field of MD simulations of biological molecules for studying their dynamic processes at the atomic level. Far from their infancy, computer simulation methods can nowadays provide important insight into the molecular basis of protein structure, function and dynamics relationships.[1-4]

MD is a computational chemistry method that describes the dynamics of a molecular system by integrating Newton’s equations of motion for a system of $N$ interacting atoms. In MD simulation, the force acting on the $i$-th particle ($\mathbf{F}_i$) is calculated as the negative derivative of a potential energy function ($V$) (equation 3.1), called the force field, which describes the atomic interactions in an approximate way. In equation 3.1, $\mathbf{r}_i$ represents the position of the $i$-th atom.

$$\mathbf{F}_i = -\frac{\partial V(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)}{\partial \mathbf{r}_i} \tag{3.1}$$

The dynamics of the system is calculated according to Newton’s second law by numerically integrating the differential equations of motion (equation 3.2). In this way, a new set of atomic positions and velocities ($\mathbf{v}_i$) is generated at each successive integration time step $dt$.

$$\frac{\partial \mathbf{v}_i}{\partial t} = \frac{\mathbf{F}_i}{m_i} \tag{3.2}$$

The so-called leap-frog algorithm[5] is commonly used in MD simulation to integrate the equations of motion. It updates the velocity (equation 3.3) and the position (equation 3.4) of the $i$-th atom of mass $m_i$ using the force $\mathbf{F}_i(t)$ at position $\mathbf{r}_i(t)$.

$$\mathbf{v}_i\left(t + \tfrac{1}{2}dt\right) = \mathbf{v}_i\left(t - \tfrac{1}{2}dt\right) + \frac{dt}{m_i}\,\mathbf{F}_i(t) \tag{3.3}$$

$$\mathbf{r}_i(t + dt) = \mathbf{r}_i(t) + dt\,\mathbf{v}_i\left(t + \tfrac{1}{2}dt\right) \tag{3.4}$$
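The two update rules in equations 3.3 and 3.4 can be illustrated with a minimal Python sketch for a single particle in one dimension. The function name `leapfrog`, the harmonic test force and the reduced units are assumptions made for illustration only; production MD codes apply the same updates to all atoms in three dimensions.

```python
def leapfrog(x, v_half, force, mass, dt, n_steps):
    """Propagate one particle in 1D with the leap-frog scheme.

    x      -- position r(t)
    v_half -- velocity at the previous half step, v(t - dt/2)
    force  -- callable returning F(r); hypothetical, supplied by the caller
    """
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v_half + dt / mass * force(x)  # equation 3.3
        x = x + dt * v_half                     # equation 3.4
        trajectory.append(x)
    return trajectory, v_half


# Toy system: harmonic oscillator with F(r) = -k r, in arbitrary reduced units.
k, mass, dt = 1.0, 1.0, 0.01
# Start at x = 1 with v(0) = 0; v(-dt/2) is estimated by half a step backwards.
v_start = 0.0 - 0.5 * dt * (-k * 1.0) / mass
positions, _ = leapfrog(1.0, v_start, lambda r: -k * r, mass, dt, 628)
```

After 628 steps (t ≈ 2π in these units) the particle has completed roughly one period and is back close to its starting position, which is a quick check that positions and half-step velocities are being advanced consistently.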

The simulation generates an ensemble of molecular configurations (trajectory) that describes the evolution of the coordinates and velocities of the system as a function of time. To generate an equilibrium ensemble consistent with the experimental conditions under which the system was studied, the temperature and pressure of the simulated system are controlled and kept constant during the simulation.[3,4]


Force field equation<br />

In MD simulation, the force field divides the atomic interactions into bonded and non-bonded terms. Bonded interactions include bond, angle, dihedral and improper dihedral terms, and non-bonded interactions comprise van der Waals (vdW) and electrostatic terms (equation 3.5).

$$V = V_{\mathrm{bond}} + V_{\mathrm{angle}} + V_{\mathrm{dihedral}} + V_{\mathrm{improper}} + V_{\mathrm{vdW}} + V_{\mathrm{es}} \tag{3.5}$$

Bonded interactions<br />

Bond stretching between covalently bound atoms ($i$ and $j$, with bond length $b$) is described in GROMOS-96 by the covalent bond potential ($V_{\mathrm{bond}}$, equation 3.6) with force constant $k^{b}_{ij}$ and equilibrium bond length $b_0$. Angle vibrations between triplets of atoms ($i$, $j$ and $k$) with bond angle $\theta$ are represented by the cosine based angle potential ($V_{\mathrm{angle}}$, equation 3.7) with force constant $k^{\theta}_{ijk}$ and equilibrium bond angle $\theta_0$.

$$V_{\mathrm{bond}} = \frac{1}{4}\,k^{b}_{ij}\left(b^{2} - b_{0}^{2}\right)^{2} \tag{3.6}$$

$$V_{\mathrm{angle}} = \frac{1}{2}\,k^{\theta}_{ijk}\left(\cos\theta - \cos\theta_{0}\right)^{2} \tag{3.7}$$
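As a hedged illustration of equations 3.6 and 3.7, the short Python sketch below evaluates both potentials for a single bond and a single angle. The function names and the numerical parameter values are hypothetical and are not taken from the GROMOS96 parameter set.

```python
import math


def v_bond(b, b0, kb):
    """Quartic bond potential of equation 3.6 (b and b0 in the same length unit)."""
    return 0.25 * kb * (b**2 - b0**2) ** 2


def v_angle(theta_deg, theta0_deg, ktheta):
    """Cosine based angle potential of equation 3.7 (angles given in degrees)."""
    c = math.cos(math.radians(theta_deg))
    c0 = math.cos(math.radians(theta0_deg))
    return 0.5 * ktheta * (c - c0) ** 2


# Illustrative values only: a bond stretched from 0.10 to 0.11 nm
# and an angle opened from 109.5 to 115 degrees.
e_bond = v_bond(0.11, 0.10, kb=1.0e7)
e_angle = v_angle(115.0, 109.5, ktheta=450.0)
```

Both functions return zero at the equilibrium geometry and a positive energy for any distortion. Because neither form requires a square root or an inverse cosine, they are inexpensive to evaluate.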

Torsional interactions (equation 3.8) involve four atoms ($i$, $j$, $k$ and $l$) and depend on the dihedral angle $\phi$, defined as the angle between the two planes formed by the first three ($i$, $j$ and $k$) and the last three ($j$, $k$ and $l$) atoms. The torsional potential describes the interactions arising from the rotation of two functional groups connected by a bond and is defined in equation 3.8, where $k^{\phi}_{ijkl}$ is the force constant, $n$ is the multiplicity and $\phi_0$ is the phase shift.

$$V_{\mathrm{dihedral}} = k^{\phi}_{ijkl}\left(1 + \cos(n\phi - \phi_{0})\right) \tag{3.8}$$

Improper dihedrals are used to maintain the geometry of four atoms ($i$, $j$, $k$ and $l$) through the harmonic potential in equation 3.9, where $k^{\xi}_{ijkl}$ is the force constant and $\xi$ is the dihedral angle defined on the four atoms to keep them in a specific configuration. For example, $\xi_0 = 0°$ keeps the four atoms planar, whereas a non-zero $\xi_0$ keeps them in a tetrahedral configuration.

$$V_{\mathrm{improper}} = \frac{1}{2}\,k^{\xi}_{ijkl}\left(\xi - \xi_{0}\right)^{2} \tag{3.9}$$
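Equations 3.8 and 3.9 can be sketched in Python in the same way. The function names, the parameter values and the choice of expressing the improper force constant per squared radian are assumptions made for this illustration.

```python
import math


def v_dihedral(phi_deg, phi0_deg, n, kphi):
    """Periodic torsional potential of equation 3.8 (angles in degrees)."""
    return kphi * (1.0 + math.cos(math.radians(n * phi_deg - phi0_deg)))


def v_improper(xi_deg, xi0_deg, kxi):
    """Harmonic improper dihedral potential of equation 3.9.

    The deviation is converted to radians, so kxi is assumed to be
    given in energy per squared radian.
    """
    d = math.radians(xi_deg - xi0_deg)
    return 0.5 * kxi * d**2


# A threefold torsion (n = 3, zero phase) has maxima at 0, 120 and 240 degrees
# and minima at 60, 180 and 300 degrees.
barrier = v_dihedral(0.0, 0.0, n=3, kphi=5.0)
minimum = v_dihedral(60.0, 0.0, n=3, kphi=5.0)
# Penalty for bending a planar group (xi0 = 0) out of plane by 10 degrees.
out_of_plane = v_improper(10.0, 0.0, kxi=100.0)
```

The periodic form allows full rotation around the bond with $n$ equivalent minima, while the harmonic improper term only tolerates small deviations from $\xi_0$.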

Non-bonded interactions<br />

vdW interactions result from induced atomic dipoles and from the excluded volumes of atom pairs. They are attractive at long range but become repulsive at short range. In GROMACS, they are described by the Lennard-Jones (LJ) potential ($V_{\mathrm{LJ}}$, equation 3.10), in which $\varepsilon_{ij}$ and $\sigma_{ij}$ are empirical parameters: $\varepsilon_{ij}$ is the depth of the potential well, $\sigma_{ij}$ is the distance at which the potential is zero, and $r_{ij}$ is the distance between the $i$-th and $j$-th atoms.

$$V_{\mathrm{LJ}} = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \tag{3.10}$$
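The roles of the two LJ parameters in equation 3.10 can be checked numerically with a minimal Python sketch. The function name and the parameter values are hypothetical.

```python
def v_lj(r, sigma, epsilon):
    """Lennard-Jones potential of equation 3.10 for one atom pair."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)


sigma, epsilon = 0.30, 1.0            # illustrative values (nm, kJ/mol)
r_min = 2.0 ** (1.0 / 6.0) * sigma    # position of the energy minimum

at_sigma = v_lj(sigma, sigma, epsilon)   # zero crossing of the potential
at_min = v_lj(r_min, sigma, epsilon)     # equals -epsilon, the well depth
```

The potential crosses zero at $r_{ij} = \sigma_{ij}$ and reaches its minimum value $-\varepsilon_{ij}$ at $r_{ij} = 2^{1/6}\sigma_{ij}$. It is repulsive at shorter distances and weakly attractive at longer ones.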


The electrostatic potential ($V_{\mathrm{es}}$) describes the Coulombic interaction between two charged atoms ($i$ and $j$) and is calculated by equation 3.11, where $\varepsilon_0$ is the vacuum permittivity, $\varepsilon_r$ is the relative dielectric constant, $q_i$ and $q_j$ are the atomic charges on the $i$-th and $j$-th atoms and $r_{ij}$ is the distance between them.

$$V_{\mathrm{es}} = \frac{1}{4\pi\varepsilon_{0}}\,\frac{q_{i}\,q_{j}}{\varepsilon_{r}\,r_{ij}} \tag{3.11}$$
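A minimal Python sketch of equation 3.11 is given below. It assumes charges in units of the elementary charge, distances in nm and energies in kJ/mol, for which the prefactor $1/(4\pi\varepsilon_0)$ is approximately 138.935 kJ mol⁻¹ nm e⁻². The function and constant names are hypothetical.

```python
# Approximate value of 1/(4 pi eps0) in kJ mol^-1 nm e^-2 (assumed unit system).
F_ELEC = 138.935


def v_es(qi, qj, r_nm, eps_r=1.0):
    """Coulomb energy of equation 3.11 for one pair of point charges."""
    return F_ELEC * qi * qj / (eps_r * r_nm)


# Opposite unit charges 0.3 nm apart in vacuum (eps_r = 1) ...
salt_bridge_vacuum = v_es(+1.0, -1.0, 0.3)
# ... and the same pair screened by a uniform medium with eps_r = 80.
salt_bridge_screened = v_es(+1.0, -1.0, 0.3, eps_r=80.0)
```

The interaction decays only as $1/r_{ij}$, much more slowly than the LJ terms, which is why long-range electrostatics need a dedicated treatment in simulations with periodic boundaries.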

The topology of a simulated biomolecule is the ordered list of all these interactions and their predefined parameters for the selected force field. In this thesis, all MD simulations were performed using the GROMOS96 43a1 force field.[4]

3.2. Setup <strong>of</strong> the simulated systems<br />

The crystal structure of the protein was used as the starting coordinates for the MD simulations (in this thesis, PDB ID: 1BVY [6] is used for the simulation of the P450BM-3 domains). The proteins were centered in a cubic periodic box with a minimal distance of more than 0.80 nm between the protein and any side of the box, so that the protein cannot see its periodic image across the boundary of the box. They were solvated by stacking equilibrated boxes of solvent molecules to fill the simulation box completely. All solvent molecules with any atom within 0.15 nm of the protein atoms were removed. The SPC water model, a simple three-atom model, was used for the water molecules.[7] The protonation state of the residues in the protein was assumed to be the same as that of the isolated amino acids in solution at pH 7. Sodium counter ions were then added to neutralize the system by replacing the solvent molecules at the most negative electrostatic potential, so that the total charge of the box was zero. The LINCS (Linear Constraint Solver)[8] algorithm was used to constrain all bond lengths; it provides a stable and fast way to reset the bond lengths after an unconstrained update. The SETTLE [9] algorithm was used to constrain the solvent molecules as rigid bodies.

Electrostatic interactions were calculated using the particle mesh Ewald (PME) method.[10] For the calculation of the long-range interactions, a grid spacing of 0.12 nm combined with fourth-order B-spline interpolation was used to compute the potential and forces between grid points. A non-bonded pair-list cutoff of 1.3 nm was used, and the pair list was updated every 5 time steps.
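As a small worked example of the box set-up described in this section: with a margin of 0.80 nm on every side, the solute and its nearest periodic image are separated by at least 1.6 nm, which exceeds the 1.3 nm pair-list cutoff. The Python sketch below computes the edge of such a cubic box. The function name and the toy coordinates are hypothetical and are not part of the thesis workflow.

```python
def cubic_box_edge(coords, margin):
    """Edge (nm) of the smallest cubic box that leaves `margin` nm
    between the solute and every face of the box."""
    extents = [
        max(c[k] for c in coords) - min(c[k] for c in coords)
        for k in range(3)
    ]
    return max(extents) + 2.0 * margin


# Toy solute spanning 5 x 3 x 4 nm (hypothetical corner coordinates, in nm).
solute = [(0.0, 0.0, 0.0), (5.0, 3.0, 4.0)]
edge = cubic_box_edge(solute, margin=0.80)
# Smallest gap between the solute and its nearest periodic image,
# measured along the longest solute dimension.
image_gap = edge - 5.0
```

Checking that `image_gap` is larger than the interaction cutoff is a quick sanity test before solvating the box.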

3.3. Equilibration procedure<br />

Simulated systems were first energy minimized, using the steepest descent<br />

algorithm [11], for at least 2000 steps in order to remove clashes between atoms that were<br />

too close. After energy minimization, all atoms were given an initial velocity obtained from<br />

a Maxwell-Boltzmann velocity distribution at 300 K to start MD simulations.<br />
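Drawing initial velocities from a Maxwell-Boltzmann distribution amounts to sampling each Cartesian velocity component from a Gaussian with variance $k_B T / m$. The Python sketch below illustrates this under stated assumptions: masses in atomic mass units, velocities in nm/ps, no constraints and no removal of centre-of-mass motion. All function names are hypothetical.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def maxwell_boltzmann_velocities(masses, temperature, rng):
    """Sample one velocity vector (nm/ps) per atom at the given temperature (K)."""
    velocities = []
    for m in masses:
        sd = math.sqrt(K_B * temperature / m)  # std. dev. of each component
        velocities.append(tuple(rng.gauss(0.0, sd) for _ in range(3)))
    return velocities


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from T = 2 E_kin / (3 N k_B)."""
    e_kin = sum(
        0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities)
    )
    return 2.0 * e_kin / (3 * len(masses) * K_B)


rng = random.Random(42)                 # fixed seed for reproducibility
masses = [12.011] * 1000                # 1000 carbon-like atoms (illustrative)
vel = maxwell_boltzmann_velocities(masses, 300.0, rng)
t_inst = kinetic_temperature(masses, vel)
```

For a finite number of atoms the instantaneous temperature fluctuates around the target value, so `t_inst` is close to, but not exactly, 300 K.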

All systems were initially equilibrated by 100 ps of MD with position restraints on the heavy atoms of the solute to allow relaxation of the solvent molecules. With position restraints, the protein is held at its reference position by force constants in each spatial dimension, which lets the solvent relax around the protein. Berendsen’s thermostat[12] was used to keep the temperature at 300 K by weakly coupling the systems to an external thermal bath with a relaxation time constant τ = 0.1 ps. The pressure of the system was kept at 1 bar using Berendsen’s barostat[12] with a time constant of 1 ps. After the equilibration procedure, the position restraints were removed and the system was gradually heated from 50 K to 300 K during 200 ps of simulation. Finally, a production run was performed at 300 K. The analysis of the trajectories was performed using the GROMACS software package (http://www.gromacs.org/).[13]
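The weak-coupling idea behind the Berendsen thermostat can be sketched as follows: at every step the velocities are scaled by a factor $\lambda = \sqrt{1 + (dt/\tau)(T_0/T - 1)}$, so that the temperature relaxes exponentially toward the bath temperature $T_0$ with time constant τ. The Python sketch below applies only this scaling, without any dynamics. The 2 fs time step and the function names are assumptions for illustration and are not taken from the thesis.

```python
import math


def berendsen_lambda(t_current, t_target, dt, tau):
    """Velocity scaling factor of the Berendsen weak-coupling thermostat."""
    return math.sqrt(1.0 + dt / tau * (t_target / t_current - 1.0))


def relax_temperature(t_start, t_target, dt, tau, n_steps):
    """Temperature after n_steps of pure thermostat scaling (no dynamics)."""
    t = t_start
    for _ in range(n_steps):
        lam = berendsen_lambda(t, t_target, dt, tau)
        t *= lam**2  # kinetic energy, and hence T, scales with lambda squared
    return t


# Assumed time step of 0.002 ps; tau = 0.1 ps as in the text.
t_after_one_tau = relax_temperature(50.0, 300.0, dt=0.002, tau=0.1, n_steps=50)
t_after_long = relax_temperature(50.0, 300.0, dt=0.002, tau=0.1, n_steps=1000)
```

After one relaxation time (50 steps here) the temperature has covered roughly two thirds of the gap to the target, and after many relaxation times it is indistinguishable from 300 K.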


3.4. Structural and dynamical analysis

The structural stability and convergence of the protein were examined by analyzing the root mean square deviation (RMSD), the radius of gyration (Rg) and the secondary structure elements with respect to the crystal structure as a function of time. The per-residue root mean square fluctuation (RMSF) was used to assess the dynamics of the target protein during the simulation.
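The three scalar measures named in this section can be written compactly with NumPy. The sketch below assumes that all frames have already been superimposed on the reference structure, so no fitting is performed. The function names and array layouts are hypothetical.

```python
import numpy as np


def rmsd(coords, ref):
    """RMSD between two superimposed coordinate sets of shape (N, 3)."""
    return float(np.sqrt(np.mean(np.sum((coords - ref) ** 2, axis=1))))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame of shape (N, 3)."""
    com = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - com) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))


def rmsf(trajectory):
    """Per-atom RMSF from a fitted trajectory of shape (n_frames, N, 3)."""
    mean_structure = trajectory.mean(axis=0)
    sq_dev = np.sum((trajectory - mean_structure) ** 2, axis=2)
    return np.sqrt(sq_dev.mean(axis=0))
```

RMSD and Rg yield one value per frame and are therefore plotted against time, whereas RMSF yields one value per atom (or per residue after averaging) and summarizes the whole trajectory.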

3.5. Cluster analysis<br />

Cluster analysis was performed to characterize the conformational diversity of the structures generated during the MD simulations, using the Gromos clustering algorithm[14] on the backbone atoms (Cα, C and N) of the protein. The analysis was performed on conformations extracted at regular time intervals from the generated trajectory. The extracted conformations were superimposed on the backbone atoms of the reference structure to remove the overall translation and rotation of the protein in space. A similarity (RMSD-distance) matrix was then computed for all pairs of selected conformers.

In the Gromos clustering algorithm, an RMSD cutoff criterion is used to group structures: conformations whose RMSD from a given structure is less than the defined cutoff are added to the same cluster. The structure with the smallest RMSD from the other members of the cluster is taken as the representative structure of that cluster. The same process is repeated until all structures selected for the analysis are assigned to clusters. In this way, the large clusters represent the ensembles of frequently populated configurations in conformational space during the MD simulation.
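The clustering loop can be sketched in plain Python. The version below follows the commonly described form of the algorithm, in which the structure with the largest number of neighbours within the cutoff becomes the cluster centre, is removed from the pool together with its neighbours, and the procedure is repeated on the remaining structures. This is our reading of the published method rather than the exact implementation used in the thesis, and the function name and tie-breaking rule are assumptions.

```python
def gromos_cluster(dist, cutoff):
    """Cluster structures from a symmetric RMSD matrix (list of lists).

    Returns a list of (centre_index, sorted_member_indices) tuples,
    ordered from the first (largest) cluster to the last.
    """
    pool = set(range(len(dist)))
    clusters = []
    while pool:
        # Neighbours of every remaining structure (each counts itself).
        neighbours = {
            i: [j for j in pool if j == i or dist[i][j] < cutoff]
            for i in pool
        }
        # Centre = structure with most neighbours; ties go to the lowest index.
        centre = min(pool, key=lambda i: (-len(neighbours[i]), i))
        members = sorted(neighbours[centre])
        clusters.append((centre, members))
        pool -= set(members)
    return clusters


# Toy example: five "structures" placed on a line, RMSD replaced by distance.
x = [0.0, 0.1, 0.2, 1.0, 1.1]
dist = [[abs(a - b) for b in x] for a in x]
clusters = gromos_cluster(dist, cutoff=0.15)
```

In the toy example, structure 1 has the most neighbours and becomes the centre of the first cluster (members 0, 1 and 2); the two remaining structures form the second cluster.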


3.6. Principal component analysis<br />

Principal component analysis (PCA, also called essential dynamics analysis) was performed to assess the conformational space by identifying collective motions in the biomolecules during the MD simulations. PCA correlates the atomic positional fluctuations in proteins and can enhance the molecular level understanding of protein function.[15] The covariance matrix ($C$) of the atomic coordinates ($3N \times 3N$) is constructed to identify collective motions in the biomolecules.[16,17] The backbone atoms (Cα, C and N) of the target proteins were used to explore the conformational subspace in solution. For the PCA, the translational and rotational motions are first eliminated from the trajectory (to consider internal motions only) by least squares fitting of the atomic coordinates (using the backbone atoms) to the crystal structure. The resulting set of atomic coordinates is used to construct the matrix $C$ of positional deviations using equation 3.12.[17]

$$C = \left\langle \left(\mathbf{x} - \langle \mathbf{x} \rangle\right)\left(\mathbf{x} - \langle \mathbf{x} \rangle\right)^{\mathsf{T}} \right\rangle \tag{3.12}$$

where $\mathbf{x}$ denotes the coordinates of the selected subset of atoms along the trajectory $\mathbf{x}(t)$ and $\langle \cdot \rangle$ represents the ensemble average over time. The eigenvectors, or essential modes, are obtained by diagonalizing the symmetric matrix $C$ with an orthogonal transformation matrix $T$. The displacements along the different eigenvectors were calculated by projecting the atomic coordinates onto the eigenvectors. Eigenvectors obtained from different simulations were compared using the root-mean-square inner product (RMSIP)[18], which is defined in equation 3.13.

$$\mathrm{RMSIP} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( v_i \cdot u_j \right)^{2}} \qquad (3.13)$$<br />

PART II: MD Simulation<br />

where v_i and u_j are the i-th and j-th eigenvectors of the m-dimensional essential<br />

subspaces of the two systems. RMSIP gives a simple measure of the dynamical<br />

similarity of the two sets of eigenvectors.[18]<br />
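Equations 3.12 and 3.13 can be illustrated with a minimal Python sketch. It assumes that the frames<br />
have already been fitted to the reference structure, that the eigenvectors come from a standard<br />
symmetric eigensolver (not shown), and that the RMSIP sum is normalized by the subspace dimension m.<br />
The function names `covariance_matrix`, `project` and `rmsip` are illustrative and are not part of<br />
the analysis tools used in this work.<br />

```python
import math


def covariance_matrix(frames):
    """Equation 3.12: C = <(x - <x>)(x - <x>)^T> over pre-fitted frames.

    frames: list of flattened coordinate vectors (length 3N), one per frame.
    Returns the time-averaged vector <x> and the 3N x 3N covariance matrix.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n_frames for k in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for frame in frames:
        dev = [frame[k] - mean[k] for k in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += dev[a] * dev[b] / n_frames
    return mean, cov


def project(frame, mean, eigenvector):
    """Displacement of one frame along one eigenvector of C."""
    return sum((x - mu) * e for x, mu, e in zip(frame, mean, eigenvector))


def rmsip(vs, us):
    """Equation 3.13 for two sets of m orthonormal eigenvectors."""
    m = len(vs)
    total = sum(sum(a * b for a, b in zip(v, u)) ** 2 for v in vs for u in us)
    return math.sqrt(total / m)
```

With orthonormal eigenvectors, an RMSIP of 1 corresponds to identical subspaces and 0 to mutually<br />
orthogonal ones.<br />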

3.7. References<br />

1. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE (2012) Biomolecular simulation: a<br />

computational microscope for molecular biology. Annu Rev Biophys 41: 429-452.<br />

2. McCammon JA, Gelin BR, Karplus M (1977) Dynamics of folded proteins. Nature<br />

267: 585-590.<br />

3. Karplus M, McCammon JA (2002) Molecular dynamics simulations of biomolecules.<br />

Nat Struct Biol 9: 646-652.<br />

4. van Gunsteren WF, Billeter SR, Eising AA, Hünenberger PH, Krüger P, et al. (1996)<br />

Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF:<br />

Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen.<br />

5. Hockney RW, Goel SP, Eastwood JW (1974) Quiet high-resolution computer models<br />

of a plasma. J Comput Phys 14: 148-158.<br />

6. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure of a<br />

cytochrome P450-redox partner electron-transfer complex. Proc Natl Acad Sci USA 96:<br />

1863-1868.<br />

7. Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J (1981) Interaction<br />

models for water in relation to protein hydration. Intermolecular Forces: 331-342.<br />

8. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM (1997) LINCS: A linear constraint<br />

solver for molecular simulations. J Comput Chem 18: 1463-1472.<br />

9. Miyamoto S, Kollman PA (1992) SETTLE: an analytical version of the SHAKE and<br />

RATTLE algorithm for rigid water models. J Comput Chem 13: 952-962.<br />

10. Darden T, York D, Pedersen L (1993) Particle mesh Ewald: an N·log(N) method for<br />

Ewald sums in large systems. J Chem Phys 98: 10089-10092.<br />

11. Cauchy A (1847) Méthode générale pour la résolution des systèmes d'équations<br />

simultanées. C R Acad Sci Paris 25: 536-538.<br />

12. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR (1984) Molecular<br />

dynamics with coupling to an external bath. J Chem Phys 81: 3684-3690.<br />

13. Hess B, Kutzner C, van der Spoel D, Lindahl E (2008) GROMACS 4: Algorithms for<br />

highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory<br />

Comput 4: 435-447.<br />

14. Daura X, Gademann K, Jaun B, Seebach D, van Gunsteren WF, et al. (1999) Peptide<br />

folding: When simulation meets experiment. Angew Chem Int Ed 38: 236-240.<br />

15. Berendsen HJC, Hayward S (2000) Collective protein dynamics in relation to function.<br />

Curr Opin Struct Biol 10: 165-169.<br />

16. Ichiye T, Karplus M (1991) Collective motions in proteins: a covariance analysis of<br />

atomic fluctuations in molecular dynamics and normal mode simulations. Proteins<br />

11: 205-217.<br />

17. Amadei A, Linssen AB, Berendsen HJC (1993) Essential dynamics of proteins. Proteins<br />

17: 412-425.<br />

18. Amadei A, Ceruso MA, Di Nola A (1999) On the convergence of the conformational<br />

coordinates basis set obtained by the essential dynamics analysis of proteins'<br />

molecular dynamics simulations. Proteins 36: 419-424.<br />

PART II: P450BM-3 Reductase Domain<br />

Chapter 4<br />

Conformational Dynamics of the FMN-binding Reductase<br />

Domain of Monooxygenase P450BM-3<br />

4.1. Abstract<br />

In cytochrome P450BM-3, the flavin mononucleotide (FMN) binding domain is an<br />

intermediate electron donor between the flavin adenine dinucleotide (FAD) binding<br />

domain and the HEME domain. Experimental evidence has shown that the different redox<br />

states of the FMN cofactor induce conformational changes in the FMN domain.<br />

Herein, molecular dynamics (MD) simulation is used to gain insight into this<br />

phenomenon at the atomistic level. We have studied the effect of the FMN cofactor and its redox<br />

states (oxidized and reduced) on the structure and dynamics of the FMN domain. The results of<br />

our study show significant differences in the atomic fluctuation amplitude of the FMN domain<br />

in both the holo- and apo-protein. The change in the protonation state of the FMN cofactor mostly<br />

affects its binding in the holo-protein. In particular, the loops involved in the binding of the<br />

isoalloxazine ring (Lβ4) and the ribityl side chain (Lβ1) adopt different conformations in the<br />

reduced and oxidized states. In addition, the reduced FMN cofactor mainly induces a<br />

conformational change in residue Trp574 (Lβ4), which is essential for controlling electron<br />

transfer (ET) within the P450BM-3 domains. The structure of the apo-protein in solution<br />

remains mostly unchanged with respect to the crystal structure of the holo-protein.<br />

However, the FMN binding loops were more flexible in the apo-protein, which might favor the<br />

rebinding of the FMN cofactor. In the holo-protein simulations, the largest conformational<br />

changes in the FMN cofactor are caused by the ribityl side chain. The isoalloxazine ring of the FMN<br />

cofactor remains almost planar (~177°) in the oxidized state and bends along the N5–N10 axis<br />

at an angle of ~160° in the reduced state. The collective modes of the isoalloxazine ring were<br />

identical in both protonation states of the FMN cofactor except for the first eigenvector. In<br />

the reduced state, the butterfly motion of the isoalloxazine ring becomes the dominant collective<br />

motion in the first eigenvector due to the bending along the N5–N10 axis.<br />

4.1. Introduction<br />

Cytochrome P450 monooxygenases, the largest superfamily of heme-containing<br />

soluble proteins, are widespread in almost all domains of life, e.g. bacteria, yeast, insects,<br />

mammalian tissues, and plants.[1-3] They catalyze the oxidation of a wide variety of<br />

substrates involved in biosynthesis and biodegradation pathways, or in xenobiotic<br />

metabolism.[4] Cytochrome P450BM-3, isolated from Bacillus megaterium, is a<br />

multidomain, self-sufficient, NADPH-dependent flavoenzyme (class III bacterial P450).[5] As<br />

a pivotal member of its superfamily, it has been studied extensively as an important model<br />

system for understanding structure/function relationships, and many structural and<br />

kinetic data are available in the literature.[6,7] The catalytic properties of this enzyme<br />

relevant to industrial applications have also been successfully enhanced by protein<br />

engineering.[8,9]<br />

This enzyme is composed of two reductase domains (with FAD and FMN cofactors)<br />

and a P450 HEME domain (with a HEME cofactor), which are arranged on a single polypeptide<br />

chain as HEME-FMN-FAD from the N- to the C-terminus. The transfer of two successive electrons<br />

from NADPH to the HEME cofactor is essential for the oxygenation reaction.[10-12] During the<br />

oxygenation reaction, the enzyme is reduced by NADPH, with electrons first transferred to the<br />

FAD cofactor of the FAD-binding domain, then to the FMN cofactor of the FMN-binding domain and<br />

finally to the HEME iron in the substrate-bound HEME domain. In this ET process, the FMN<br />

domain serves as a one- or two-electron mediator from the FAD cofactor to the heme<br />

iron.[13] The FMN cofactor switches between the fully oxidized and semiquinone states during<br />

catalytic turnover. The thermodynamically unstable anionic semiquinone state can reduce<br />

the HEME iron. However, other P450s utilize FMN hydroquinone as the reducing species.[11-13]<br />

In the substrate-free P450BM-3, the FMN cofactor stays in a thermodynamically stable<br />

hydroquinone state, which is not able to reduce the HEME iron.[11] The use of the anionic<br />

semiquinone state of the FMN cofactor as the reducing species makes P450BM-3 more efficient,<br />

with a high turnover rate in comparison to other members of this family.[14] Therefore, the<br />

thermodynamic properties of the FMN moiety are mainly responsible for the unusual<br />

redox properties of P450BM-3. The protein environment has a strong influence on this<br />

mechanism by changing the redox potential of the FMN and HEME cofactors.<br />

Mutagenesis studies have shown that the conformation of the FMN binding loops plays a<br />

critical role in stabilizing the different redox states of the FMN cofactor in the protein<br />

environment.[14,15] The insertion of a glycine residue in the re-face (inner FMN binding)<br />

loop is able to stabilize the neutral semiquinone state in P450BM-3, as observed in other diflavin<br />

reductases.[16]<br />

Although experimental data on the isolated FMN domain in solution and the<br />

crystallographic structure[17,18] are available, a molecular dynamics (MD) study of this<br />

protein has not been reported. In this paper, the structural and dynamic properties of the<br />

FMN domain as holo-protein, with the FMN cofactor in the oxidized and reduced states, and as<br />

apo-protein are investigated using classical MD simulations. The aim of this study is to<br />

understand the structural and dynamic properties of the protein in solution and the effect<br />

of the protonation state of the FMN cofactor on the conformational dynamics of the cofactor<br />

and the whole protein. The paper is organized as follows. In the Methods section, the<br />

details of the simulations, in particular the refinement of the original GROMOS96 FMN<br />

parameters for the oxidized and reduced states of the FMN cofactor, are reported. In the<br />

Results section, the analysis of the simulation trajectories for the apo-protein and the holo-protein in<br />

the oxidized and reduced states is reported. The analysis focuses on the structural<br />

and dynamic properties of the overall protein structure and of the FMN binding site as<br />

well as the FMN cofactor. Finally, in the discussion and conclusions, the results of the<br />

simulations will be discussed in the context of the experimental knowledge of the FMN<br />

domain, and a summary of the paper will be provided.<br />

4.2. Methods<br />

4.2.1. Starting coordinates<br />

The starting coordinates of the FMN domain (residues 479–630) were taken from a<br />

non-stoichiometric complex of P450BM-3 (PDB ID: 1BVY, 2.03 Å resolution), which has<br />

one FMN domain with two HEME domains.[18] The crystallographic water molecules (within a<br />

distance of 0.6 nm from the FMN domain) were also kept during system preparation for the MD<br />

simulations.<br />

4.2.2. Molecular dynamics simulation<br />

Table 4.1 summarizes the systems used to perform MD simulations with the GROMOS96<br />

43a1 force field.[19] The crystal structure[18] has the FMN cofactor in the oxidized state (FOX)<br />

(see Figure 4.1a). Protonation of the FOX isoalloxazine ring at the N1 and N5 positions<br />

gives the reduced state of the FMN cofactor (FHQ), as indicated in Figure 4.1b. For the<br />

apo-protein simulation, the FMN cofactor was removed from the crystal<br />

structure of the FMN domain. The GROMOS96 force field[20] was used for the FMN cofactor. Additional<br />

improper dihedrals were introduced to reproduce the conformation of the isoalloxazine ring<br />

observed in the crystallographic structure and in molecular geometry optimizations of flavin in<br />

both redox states; they are reported in Table S4.1 (for FOX) and Table S4.2 (for FHQ) of the supporting<br />

information (SI).[21,22]<br />

In the isoalloxazine ring of the FMN cofactor, the bending angle (δ) and puckering angle (ρ)<br />

were calculated along the N5–N10 axis using the vectors v1, v2, a1, a2 and c, as shown in Figure 4.2.<br />

In FOX, the isoalloxazine ring was kept close to planar, while a bending angle of ~160° was<br />

used for FHQ.<br />
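The bending about the N5–N10 axis can be illustrated with a short Python sketch. It does not reproduce<br />
the vector construction of Figure 4.2; as a simplifying assumption, the angle is measured between one<br />
reference point on each side of the ring system (for example the centroids of the two outer rings)<br />
after projection onto the plane perpendicular to the N5–N10 axis. The function name `bending_angle`<br />
and the choice of reference points are hypothetical.<br />

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _reject(v, axis):
    """Component of v perpendicular to the unit vector axis."""
    scale = _dot(v, axis)
    return tuple(x - scale * u for x, u in zip(v, axis))


def bending_angle(wing_a, n5, n10, wing_b):
    """Angle in degrees between the two ring halves about the N5-N10 axis.

    180 corresponds to a planar ring system; smaller values mean a
    stronger butterfly bend. wing_a and wing_b are assumed reference
    points on either side of the axis.
    """
    axis = _sub(n10, n5)
    length = math.sqrt(_dot(axis, axis))
    axis = tuple(x / length for x in axis)
    a = _reject(_sub(wing_a, n5), axis)
    b = _reject(_sub(wing_b, n5), axis)
    cos_angle = _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))
```

With this convention, a planar ring gives 180° and a wing tilted by 20° out of the plane gives 160°.<br />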

Table 4.1: Summary of the MD simulations of the P450BM-3 FMN domain in water.<br />

FMN domain | No. of atoms | No. of solvent molecules | No. of counter ions | Simulation length (ns)<br />

Oxidized (FOX) | 33483 | 10650 | 14 | 50<br />

Reduced (FHQ) | 33491 | 10652 | 14 | 50<br />

Apo-protein (APO) | 33482 | 10662 | 13 | 50<br />

*The abbreviations FOX, FHQ and APO are used in the rest of the paper for the FMN domain in the oxidized<br />

and reduced states and as apo-protein, respectively.<br />

Figure 4.1: Schematic representation of the FMN cofactor in the oxidized (a) and reduced (b) states<br />

with the atomic numbering of the isoalloxazine ring.[23] The N1 and N5 atoms, in blue ovals, highlight the<br />

protonation positions. ChemSketch[24] was used to draw the figures.<br />

Figure 4.2: Schematic structure of the isoalloxazine ring defining the bending angle (δ) and<br />

puckering angle (ρ) using the vectors v1, v2, a1, a2 and c.<br />

4.2.3. FMN binding site analysis<br />

The FMN binding sites of the holo-protein and apo-protein were tracked throughout the<br />

MD simulations using the MDpocket method.[25] The analysis was performed on a total of 5000<br />

snapshots obtained by taking every 50th frame from the trajectories. The pocket volume analysis<br />

was performed on aligned MD snapshots with a minimum alpha sphere size of 3 Å and the<br />

minimum number of neighboring alpha spheres required for clustering equal<br />

to [...]. The number of iterations for the pocket volume calculation using the Monte Carlo<br />

algorithm was set to 5000. The grid file generated by the first MDpocket run (iso-value =<br />

0.7) was used to extract the grid points for the FMN binding pocket. The grid file of the FMN<br />

binding pocket was edited manually by deleting some grid points using PyMOL[26] for a<br />

better representation of the FMN binding site in the FMN domain. The FMN binding grid file was<br />

used with the aligned snapshots to track the changes in the FMN binding site during the<br />

simulation.<br />

4.2.4. Multiple structural alignment <strong>of</strong> FMN domain<br />

The homologous structures of the FMN domain were obtained by performing a BlastP[27] search<br />

against the PDB database.[28] The protein sequences with identity greater than 20% were<br />

taken into account for further analysis. Six structures were selected as homologous to the FMN<br />

domain of P450BM-3 after manually removing the redundant entries from the BlastP results<br />

(summarized in Table 4.2). Multiple structural alignment (MSA) was performed on the selected<br />

structures, taking the FMN domain as the reference structure, with a maximum RMSD cutoff of<br />

0.5 nm using UCSF Chimera.[29]<br />

4.3. Results<br />

4.3.1. FMN domain: structural <strong>and</strong> dynamical properties<br />

Figure 4.3 shows the backbone root mean square deviation (RMSD) of the FMN<br />

domain as apo- and holo-protein with the FMN cofactor in the oxidized and reduced states. In FOX,<br />

the RMSD curve stabilizes to a plateau with an average value of 0.24 ± 0.04 nm after a rapid<br />

increase in the first 15 ns of simulation. In the first 10 ns of simulation, FHQ follows the same<br />

trend as observed in FOX. However, after 15 ns, the RMSD of FHQ stabilizes to an average value<br />

of 0.16 ± 0.01 nm, lower than that in FOX. In APO, the FMN domain remains stable<br />

throughout the simulation, with an average RMSD value of 0.19 ± 0.01 nm after a short<br />

equilibration of ~5 ns. For all the simulations, the average radius of gyration (Rg) did not<br />

show appreciable variations from the crystal structure value (1.45 nm).<br />
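The two quantities discussed above can be sketched in a few lines of Python. The sketch assumes that<br />
each frame has already been least-squares fitted to the crystal structure and that coordinates are<br />
given in nm; the function names `rmsd` and `radius_of_gyration` are illustrative and do not refer to<br />
the analysis tools used in this work.<br />

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally ordered, pre-fitted
    coordinate sets (same units as the input, e.g. nm)."""
    total = sum(
        sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(coords, reference)
    )
    return math.sqrt(total / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    m_tot = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / m_tot for k in range(3)]
    total = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3)) for m, p in zip(masses, coords)
    )
    return math.sqrt(total / m_tot)
```

Evaluating `rmsd` for the backbone atoms of every frame gives a time series of the kind plotted in Figure 4.3.<br />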

Figure 4.3: Backbone RMSD with respect to the crystal structure as a function of time for APO (in<br />

green) and the holo-protein in FOX (in black) and FHQ (in red).<br />

The FMN domain has a classical flavodoxin fold with five parallel β-strands (β1 –<br />

β5) that are surrounded by four α-helices (α1 – α4). The loop regions, together with<br />

irregular structures (coils and turns), are named according to the secondary structure<br />

element, α (helix) or β (strand), preceding them. The FMN cofactor is surrounded by three<br />

loops that follow β-strands, hence named Lβ1, Lβ3 and Lβ4 for the ribityl binding loop<br />

(residues 488 – 491), the inner (re face, residues 534 – 544) and the outer (si face, residues 571 –<br />

579) FMN binding loops, respectively. The crystallographic protein secondary structure,<br />

calculated using the DSSP method[30], was preserved during the simulations of FOX, FHQ<br />

and even APO (see Figure S4.1 of SI).<br />

Figure 4.4: Backbone RMSD (a) and RMSF (b) per residue with respect to the crystal structure for<br />

FOX (black), FHQ (red) and APO (green). Vertical grey bars show the loop regions. Loops<br />

surrounding the FMN cofactor are marked by black horizontal bars. (c) FMN domain in pink and FMN<br />

cofactor in cyan, with labeled helices and loop regions. FMN binding loops are labeled in<br />

orange. N and C represent the amino and carboxy termini of the FMN domain.<br />

Figure 4.4a and 4.4b report the backbone RMSD and RMSF per residue with respect to the<br />

crystal structure, respectively. The FMN domain with labeled helices and loops is<br />

shown in Figure 4.4c. Residues with large RMSD and RMSF values correspond to loop<br />

regions (represented by grey bars in Figure 4.4a and 4.4b) and to the N- and C-termini.<br />

In all simulations, the largest deviations and fluctuations were observed in the Lβ3<br />

and Lβ4 FMN binding loops (black horizontal bars in Figure 4.4a and 4.4b) and in the Lβ2 and<br />

Lα2 loops, which are located opposite to the FMN binding site. The Lα2 loop shows the highest<br />

deviation in FOX. In APO, the deviations and fluctuations are mainly observed in the FMN<br />

binding loops (especially in the Lβ1 loop), which also occupy the FMN binding cavity during the<br />

simulation.<br />
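The fluctuation measure used above can be written as a minimal Python sketch. It assumes a trajectory<br />
that has already been fitted to a common reference and returns one value per atom; averaging over the<br />
backbone atoms of each residue would give a per-residue profile. The function name `rmsf` is illustrative.<br />

```python
import math


def rmsf(frames):
    """Per-atom root mean square fluctuation about the time-averaged position.

    frames: list of frames, each a list of (x, y, z) tuples, already fitted
    to a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msf))
    return result
```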

4.3.2. Cluster analysis <strong>of</strong> FMN domain<br />

The total numbers of clusters observed in FOX, FHQ and APO are 11, 12 and 7,<br />

respectively. The first two clusters account for 80.26 % and 79.25 % of the population for<br />

FOX and FHQ, respectively. The first cluster contributes 47.10 % in FOX and 60.07 % in<br />

FHQ, while the second cluster represents 33.16 % and 19.18 % of the total population in<br />

FOX and FHQ, respectively. In APO, the first cluster alone covers 85.17 % of the<br />

population. The apo-protein thus shows the least conformational diversity during the<br />

simulation. However, the difference in the cluster populations of the holo-protein indicates<br />

that the FMN protonation state notably influences the conformation of the FMN domain.<br />
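How cluster populations of this kind arise can be sketched in Python. The clustering method and cutoff<br />
are not stated in this section, so the sketch assumes a neighbor-counting scheme in the spirit of<br />
Daura et al. (reference 14 of the previous chapter): the structure with the most neighbors within an<br />
RMSD cutoff becomes a cluster center, its neighbors are removed, and the procedure is repeated. The<br />
function names and the toy cutoff are hypothetical.<br />

```python
def neighbor_clusters(dist, cutoff):
    """Cluster structures from a symmetric matrix of pairwise RMSD values.

    Returns a list of (center, members) tuples, largest cluster first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Neighbours of every remaining structure (the structure itself included).
        neigh = {i: [j for j in remaining if dist[i][j] <= cutoff] for i in remaining}
        # Ties are broken in favour of the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neigh[i]))
        members = sorted(neigh[center])
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


def populations(clusters, n_structures):
    """Percentage of all structures that falls into each cluster."""
    return [100.0 * len(members) / n_structures for _, members in clusters]
```

For five structures at positions 0.0, 0.1, 0.2, 1.0 and 1.1 on a line and a cutoff of 0.15, the sketch<br />
yields two clusters holding 60 % and 40 % of the structures.<br />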

In Figure 4.5a, 4.5b and 4.5c, the representative conformations of the first two<br />

clusters of FOX and FHQ and of the first cluster of APO are superimposed on the crystal<br />

structure (in sky blue) and shown in cartoon representation. Major differences occur in the<br />

loop regions as well as at the N- and C-termini. In particular, slightly larger deviations are<br />

present in the FMN binding loops in FOX and FHQ. In contrast, the FMN binding region in APO<br />

is the most deviating part of the FMN domain. The loop regions Lβ2 and Lα2 are also affected<br />

by the presence and the protonation state of the FMN cofactor. In FOX, Lβ2 and Lα2 show<br />

larger deviations than in FHQ and APO, as evidenced by the conformations of the first two<br />

clusters (see Figure 4.5). The Lα2 loop flips inwards, with a higher deviation from the crystal<br />

structure, in the first two clusters of FOX.<br />

Figure 4.5: Conformations of the first two clusters of (a) the oxidized (black and gray) and (b) the reduced<br />

(red and coral) holo-protein and (c) the apo-protein (green), superimposed on the crystal structure (in sky blue).<br />

Loops and helices (α1, α2, α3 and α4) are labeled. FMN binding loops are labeled in red. The<br />

loops are labeled according to the secondary structure element that precedes them.<br />

4.3.3. FMN binding site<br />

Figure 4.6 shows the FMN binding site in detail using the representative structure<br />

of the first cluster of FOX (4.6a) and FHQ (4.6b). The hydrogen bonds between the<br />

FMN domain and the cofactor were identified using a donor–acceptor distance<br />

≤ 0.35 nm and a hydrogen–donor–acceptor angle ≤ 30°.<br />
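A geometric criterion of this kind can be sketched in Python as follows. The sketch assumes that the<br />
distance is measured between the donor and acceptor heavy atoms and that the angle is the<br />
hydrogen–donor–acceptor angle, as in the common geometric definition; the function names<br />
`is_hydrogen_bond` and `occupancy` are illustrative.<br />

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=0.35, max_angle=30.0):
    """Geometric hydrogen-bond test with distances in nm and angles in degrees."""
    d_a = _sub(acceptor, donor)
    d_h = _sub(hydrogen, donor)
    if _norm(d_a) > max_dist:
        return False
    cos_angle = sum(x * y for x, y in zip(d_a, d_h)) / (_norm(d_a) * _norm(d_h))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= max_angle


def occupancy(flags):
    """Percentage of frames in which the hydrogen bond is present."""
    return 100.0 * sum(flags) / len(flags)
```

Applying `is_hydrogen_bond` to every analyzed frame and passing the resulting flags to `occupancy`<br />
gives the fraction of the simulation in which a given bond exists.<br />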

The occurrence of the hydrogen bonds observed in the crystal structure between the<br />

FMN domain and the isoalloxazine ring of the FMN cofactor is reported in Figure 4.7 for the<br />

simulations of FOX (in blue) and FHQ (in red). NMR spectroscopy studies of the protein in<br />

solution have also provided evidence of a hydrogen-bonding network involving the N1, C2O, N3, C4O<br />

and N5 atoms of the isoalloxazine ring.[14,31] In the crystal structure, the hydrogen atom<br />

from the backbone amide group (NH) of Asn537 was involved in a hydrogen bond<br />

with the N5 atom of the isoalloxazine ring (NH – N5) at a distance of 0.175 nm. The<br />

same hydrogen bond was observed in the first 15 ns of the FHQ simulation (Figure 4.7a).<br />

However, because the N5 atom in FHQ is a hydrogen donor due to its protonation, a hydrogen<br />

bond with the oxygen of the side-chain carboxamide group (-CONH2) of Asn537<br />

(-(NH2)CO – HN5) was observed in the last 25 ns of the simulation.<br />

Figure 4.6: FMN binding site from the representative structures of the first<br />

cluster of (a) FOX and (b) FHQ. The residues shown are within 0.4 nm of the FMN<br />

cofactor (in black). The FMN binding loops Lβ1, Lβ3 and Lβ4 are shown as yellow, pink<br />

and cyan ribbons, respectively. The residues are labeled in red, blue and green as part of Lβ1, Lβ3 and<br />

Lβ4, respectively. Dashed lines show hydrogen bonds between the isoalloxazine ring and surrounding<br />

residues. Underlined labels indicate the residues that undergo a major change in conformation after<br />

the change in the redox state of the FMN cofactor.<br />

In the crystal structure and the simulations, stable hydrogen bonds were observed<br />

at the oxygen (O2) and nitrogen (N3H) of the isoalloxazine ring with the hydrogen of the backbone<br />

amide group of Gln579 (NH – O2) and the oxygen of the backbone carbonyl group (CO) of<br />

Thr577 (N3H – OC), respectively. The O4 position was involved in a hydrogen bond<br />

with the hydroxyl hydrogen of Thr577 (OH – O4) only in the first 15 ns of the FOX simulation,<br />

whereas this bond persisted throughout the whole FHQ simulation. Atom N1 formed a<br />

hydrogen bond with the backbone amide group of Asp571 (NH – N1) in the crystal structure<br />

and in FOX. However, in the FHQ simulation, the occurrence of this bond (NH – N1H) was very<br />

low.<br />

Figure 4.7: Hydrogen bond existence between the FMN binding residues and (a) the isoalloxazine ring<br />

and (b) the ribityl side chain of the FMN cofactor throughout the MD simulations, calculated using one<br />

frame every 50 ps. Blue and red lines show hydrogen bond occurrences in FOX and FHQ,<br />

respectively, as a function of time. The hydrogen bond partners are labeled on the Y-axis.<br />

The change in the protonation state of the FMN cofactor affects its binding in the FMN domain.<br />

FHQ strengthens the hydrogen-bonding network between the ribityl side chain and phosphate<br />

moiety of the FMN cofactor and the domain. In FHQ, the phosphate group of the FMN cofactor forms<br />

stronger hydrogen bonds with the residues of the Lβ1 loop than in FOX (shown in Figure 4.7b).<br />

The first hydroxyl group of the ribityl side chain was involved in strong hydrogen bonding with<br />

the carbonyl oxygen of Ser537 (OH – OC) in FOX and FHQ. The third hydroxyl group of the<br />

ribityl side chain shows a stronger hydrogen bond with the Thr492 hydroxyl in FOX than in FHQ.<br />

The latter was also observed to form a hydrogen bond with the carbonyl oxygen of Cys569 in<br />

FHQ.<br />

In FHQ, the tighter binding of the FMN cofactor, with a stronger hydrogen-bonding<br />

network between flavin and protein than in FOX, induces conformational changes in the FMN<br />

binding loops Lβ1, Lβ3 and Lβ4. The major conformational change was observed in the<br />

orientation of residue Trp574 due to the protonated N5 position of the isoalloxazine ring in<br />

FHQ (Figure 4.6b). The indole ring of Trp574 was nearly coplanar with the isoalloxazine ring<br />

of the FMN cofactor in the crystal structure.[17] Experimentally, the Trp574 conformation was<br />

observed to be critical to FMN binding and found to be involved in ET tunneling from<br />

FMN to HEME. In FOX, the indole ring of Trp574 remains in the same conformation as in<br />

the crystal. In FHQ, it rotates to another configuration that is not aligned with the isoalloxazine ring.<br />

This rotation is a consequence of the steric hindrance induced by the conformational change<br />

in the reduced isoalloxazine ring.<br />

The changes in the volume, hydrophobicity, solvent accessibility and polarity of the FMN<br />

binding site during the simulations of FOX (in black), FHQ (in red) and APO (in green) are<br />

reported in Figure 4.8a, 4.8b, 4.8c and 4.8d, respectively. In APO, the absence of the FMN<br />

cofactor promotes a rearrangement of the FMN binding site: the side<br />

chains of amino acids of the FMN binding loops replaced the initial water molecules and<br />

occupied the cavity after ~5 ns of simulation. FOX shows larger variation in the geometric<br />

properties of the FMN binding pocket than FHQ. Major changes were observed in the volume of<br />

the FMN binding pocket after the change in the protonation state of the FMN cofactor, with averages of<br />

429 ± 86 and 357 ± 45 for FOX and FHQ, respectively. In FHQ, the pocket volume showed<br />


PART II: P450BM-3 Reductase Domain<br />

less variation than in FOX (see Figure 4.8a). The hydrophobicity (Figure 4.8b) and polarity<br />

(Figure 4.8d) of the FMN pocket are slightly perturbed in the first 15 ns of simulation and then<br />

converge to the same values in both the FHQ and FOX simulations.<br />

Figure 4.8: FMN binding pocket (a) volume, (b) hydrophobicity, (c) solvent accessibility<br />

and (d) polarity for FOX (in black), FHQ (in red) and APO (in green).<br />

4.3.4. Conservation pr<strong>of</strong>ile <strong>of</strong> FMN binding site<br />

Table 4.2 summarizes the MSA of the FMN domain of P450BM-3 and its homologous<br />

structures (in the SI, see Figure S4.2 for the MSA and Figure S4.3 for the conservation patterns mapped<br />

onto the FMN domain of P450BM-3). The ribityl side-chain binding region is the most<br />


Table 4.2: Summarizing MSA of FMN domain of P450BM-3 and its homologous structures<br />

and the characterization of their FMN-binding pocket.<br />

PDB Id | Protein | Organism | RMSD (nm) | Max. sequence identity (%) | Volume | Hydrophobicity | Solvent accessibility | Polarity<br />
1BVY | CPR a | Bacillus megaterium | 0.000 | 100.00 | 601.82 | 358.17 | 16.69 | 11<br />
1B1C | CPR a | Homo sapiens | 0.134 | 30.26 | 508.28 | 299.54 | 25.26 | 11<br />
1JA1 | CPR a | Rattus norvegicus | 0.135 | 30.26 | 594.28 | 313.50 | 24.65 | 12<br />
2BF4 | CPR a | Saccharomyces cerevisiae | 0.135 | 25.00 | 554.08 | 354.37 | 11.29 | 12<br />
1YKG | SiR-FP b | Escherichia coli | 0.172 | 19.86 | 639.03 | 354.39 | 20.88 | 15<br />
3HR4 | NOS c | Homo sapiens | 0.144 | 23.68 | 466.78 | 258.09 | 14.44 | 12<br />
1F4P | Fld d | Desulfovibrio vulgaris | 0.214 | 23.13 | 558.02 | 340.19 | 10.82 | 15<br />

Volume, hydrophobicity, solvent accessibility and polarity refer to the FMN binding pocket.<br />

conserved region in the FMN domain since it is responsible for the tight binding of the FMN<br />

cofactor. Among the cytochrome P450 reductases (CPR), that of P450BM-3 has the largest<br />

FMN binding site volume, the highest hydrophobicity and the lowest polarity (a value shared with 1B1C). The solvent<br />

accessibility of its FMN binding pocket lies in the middle of the range of the homologous<br />

proteins. Together, these differences in the properties of the FMN-binding pocket of<br />

P450BM-3 result in a better catalytic turnover than in other P450 monooxygenases.[14,17]<br />

*CPR a : cytochrome P450 reductase, SiR-FP b : sulfite reductase, NOS c : nitric oxide synthase, Fld d : flavodoxin<br />


4.3.5. Principal component analysis <strong>of</strong> FMN domain<br />

The first 20 eigenvectors cumulatively account for 82 %, 79 % and 75 % of the total<br />

relative positional fluctuation (RPF) in the simulations of FOX, FHQ and APO,<br />

respectively. The convergence of the trajectories was analyzed by comparing the RMSIP<br />

values for the first 20 eigenvectors obtained from the PCA of the MD trajectories (50 ns). The<br />

RMSIP values calculated between the two halves of each trajectory were 0.563, 0.624 and<br />

0.594 for the simulations of FOX, FHQ and APO, respectively. The relatively high<br />

RMSIP values indicate good convergence of the essential eigenvectors.<br />
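The convergence check described above can be illustrated with a short sketch. It assumes the usual definition RMSIP = sqrt((1/N) Σ_i Σ_j (η_i · ν_j)²) for two sets of N orthonormal eigenvectors. The function names and the random toy trajectory are hypothetical and are not taken from the simulations discussed here; a real trajectory would first have to be fitted to a reference structure.

```python
import numpy as np

def top_eigenvectors(coords, n_vectors):
    """Return the n_vectors leading covariance eigenvectors as rows."""
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)                      # remove the average structure
    cov = x.T @ x / (x.shape[0] - 1)            # coordinate covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1][:n_vectors]
    return eigenvectors[:, order].T

def rmsip(vecs_a, vecs_b):
    """Root mean square inner product of two sets of orthonormal row vectors."""
    a = np.asarray(vecs_a, dtype=float)
    b = np.asarray(vecs_b, dtype=float)
    overlaps = a @ b.T                          # inner product of every pair
    return float(np.sqrt(np.sum(overlaps ** 2) / a.shape[0]))

# Toy data standing in for a fitted trajectory: 1000 frames, 30 coordinates.
rng = np.random.default_rng(0)
toy_traj = rng.normal(size=(1000, 30)) * np.linspace(3.0, 0.1, 30)
half = toy_traj.shape[0] // 2
first_half = top_eigenvectors(toy_traj[:half], 5)
second_half = top_eigenvectors(toy_traj[half:], 5)
print("RMSIP between halves:", rmsip(first_half, second_half))
print("RMSIP of a set with itself:", rmsip(first_half, first_half))
```

An RMSIP of 1 means that the two sets span the same subspace, and 0 means that they are orthogonal, so values between the two halves of a trajectory give a rough measure of how well the essential subspace has converged.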

The first two eigenvectors cover 50 %, 41 % and 32 % of the total fluctuations in FOX, FHQ and<br />

APO, respectively (with 32 %, 29 % and 20 % contributed by the first eigenvector alone).<br />

The inner product (IP) values for the first two eigenvectors, obtained<br />

from the inner product matrix of each pair of trajectories, are reported in Table 4.3. The inner<br />

product between the first eigenvectors of any two simulations did not exceed 0.351, which<br />

shows that the most important essential mode is different for the three systems.<br />
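The two quantities used in this paragraph can be sketched as follows: the cumulative share of the total fluctuation covered by the leading eigenvectors, and the matrix of absolute inner products between the leading eigenvectors of two simulations. This is only an illustration under stated assumptions. The eigenvalue spectrum and the vectors below are hypothetical toy values, not data from FOX, FHQ or APO, and the function names are not from any particular package.

```python
import numpy as np

def cumulative_fraction(eigenvalues):
    """Cumulative share of the total fluctuation, largest eigenvalue first."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return np.cumsum(ev) / ev.sum()

def inner_product_matrix(vecs_a, vecs_b):
    """Absolute inner products |a_i . b_j| of unit vectors stored as rows."""
    a = np.asarray(vecs_a, dtype=float)
    b = np.asarray(vecs_b, dtype=float)
    return np.abs(a @ b.T)

# Hypothetical eigenvalue spectrum in arbitrary units.
toy_eigenvalues = [3.2, 1.8, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2]
fractions = cumulative_fraction(toy_eigenvalues)
print("share of the first two eigenvectors:", fractions[1])

# Two hypothetical pairs of eigenvectors, the second rotated by 30 degrees.
theta = np.deg2rad(30.0)
set_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
set_b = np.array([[np.cos(theta), np.sin(theta), 0.0],
                  [-np.sin(theta), np.cos(theta), 0.0]])
print(inner_product_matrix(set_a, set_b))
```

The absolute value is taken because the sign of an eigenvector is arbitrary; a small first-row, first-column entry indicates that the dominant modes of the two simulations point in different directions.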

Table 4.3: Inner product values of the first two eigenvectors obtained from the last 50 ns of the trajectories<br />

of two different simulations.<br />

Simulation pair | Eigenvector | 1st eigenvector | 2nd eigenvector<br />
FOX/FHQ | 1st eigenvector | 0.143 | 0.400<br />
FOX/FHQ | 2nd eigenvector | 0.485 | 0.414<br />
FOX/APO | 1st eigenvector | 0.268 | 0.156<br />
FOX/APO | 2nd eigenvector | 0.033 | 0.156<br />
FHQ/APO | 1st eigenvector | 0.351 | 0.281<br />
FHQ/APO | 2nd eigenvector | 0.183 | 0.169<br />


Figure 4.9a shows the RMSF of the backbone atoms along the first and second<br />

eigenvectors of FOX (black), FHQ (red) and APO (green). The corresponding three-dimensional<br />

representations, obtained by projecting the MD trajectories onto the first and second eigenvectors,<br />

are reported in Figures 4.9b, 4.9c and 4.9d for FOX, FHQ and APO, respectively.<br />

In all simulations, the residues of the C-terminal region and of the long loops Lβ2 and Lα2, which<br />

lie opposite to the FMN binding site, show higher fluctuations and together constitute the<br />

collective motion in the first eigenvector. In the first eigenvector of FOX, the collective<br />

motion is restricted to the Lβ2, Lβ3 and Lα2 loops and the C-terminal region, which show higher<br />

fluctuations. In the second eigenvector of FOX, higher fluctuations were shown by the<br />

C-terminal residues and the Lα1, Lβ2 and Lα2 loops. FHQ shows higher fluctuations in the Lα1, Lα2 and Lβ4<br />

loops and in the C- and N-termini in the first eigenvector. The second eigenvector of FHQ shows<br />

higher fluctuations in the Lα2, Lβ2 and Lβ4 loops. The first eigenvector of APO shows higher<br />

fluctuations in the Lβ1, Lβ2 and Lα2 loops and the C-terminus, while the second eigenvector shows<br />

them in the Lα2 loop and in all the FMN binding loops. Among the FMN binding loops, the higher<br />

fluctuations contributing to the collective motion in the first eigenvector were observed in the inner loop Lβ3 for<br />

FOX, the outer loop Lβ4 for FHQ and Lβ1 for APO.<br />

Figure 4.10 shows the crystal structure of the complex of the HEME domain with the FMN<br />

domain, with helices, cofactors and FMN binding loops labeled. In the crystal structure, the<br />

α1 helix is involved in direct or water-mediated contacts with the HEME domain, and the outer<br />

FMN binding loop (Lβ4) interacts with the peptide preceding the HEME binding loop (K/L<br />

loop).[18] These interaction sites are crucial for the ET from FMN to HEME. The higher<br />

fluctuations observed in the Lβ4 and α1 helix regions in the first eigenvector of FHQ might be<br />

related to the inhibition of electron transfer from FMN to HEME in the reduced state of the FMN<br />

cofactor, as observed experimentally.[11] In the first eigenvector of FOX, the higher<br />

fluctuations were restricted to the residues of the inner FMN binding loop Lβ3 and of the loops<br />

opposite to the FMN binding site, Lα2 and Lβ2. The latter region lies opposite to<br />

the probable HEME binding surface in the crystal structure, so the collective<br />

motion of this region in FOX might be related to the electron transfer from FAD<br />

to FMN, and the region could be the probable binding site for the FAD domain. In APO the local structure<br />


remains conserved with highly flexible FMN binding loops, which help the apo-protein to rebind the FMN cofactor<br />

and to work again as holo-protein, as found experimentally.[41]<br />

Figure 4.9: (a) RPF associated with eigenvectors. Vertical bars in grey show the loop<br />

regions. FMN binding loops are shown as black horizontal bars. Representation of the RMSF of the<br />


protein backbone atoms along the first and second eigenvectors after projection of the trajectories of<br />

FOX (black), FHQ (red) and APO (green) onto the corresponding eigenvectors. The 10 sequential<br />

frames representing the extent of the fluctuations in the FOX (b), FHQ (c) and APO (d) trajectories<br />

along the first and second eigenvectors are reported. The first extreme is shown in blue and the<br />

last extreme in cyan. Loops and helices are labeled. Labels in red show the FMN binding loops. N<br />

and C indicate the N- and C-terminus of the protein.<br />

Figure 4.10: FMN domain (in pink) in complex with the P450 HEME domain (in blue) in the crystal<br />

structure (1BVY[18]). The HEME cofactor is represented in orange and the FMN cofactor in green. Helices and<br />

the N- and C-termini are labeled in both domains. Labels Lβ1, Lβ3 and Lβ4 show the FMN cofactor<br />

binding loops in the FMN domain.<br />

4.3.6. FMN c<strong>of</strong>actor: structural <strong>and</strong> dynamical properties<br />

The conformational changes of the FMN cofactor induced by the surrounding<br />

protein environment were studied for both redox states. In both states, the phosphate<br />

group atoms show higher deviations (RMSD) and fluctuations (RMSF) than the other heavy atoms<br />

of the FMN cofactor (see Figure 4.11a and 4.11b). Furthermore, the phosphate group of<br />


Figure 4.11: (a) RMSF and RMSD of heavy atoms calculated with respect to the crystal<br />

structure. The vertical line shows the beginning of the ribityl side chain. (b) Schematic diagram of the<br />

FMN cofactor with the numbering used in the plots of panel (a) for the atomic positions of the heavy<br />

atoms.<br />

the FMN cofactor in the oxidized state deviates more from the crystal structure and shows higher<br />

fluctuations than in the reduced state. This is consistent with the observed<br />

variations of the hydrogen bonding network between the FMN cofactor and the protein.<br />

Figure 4.12a shows the distribution of the δ angle of the isoalloxazine ring<br />

(see Figure 4.2) in FOX and FHQ. For the oxidized state of the FMN cofactor, the values are<br />

normally distributed in the range from 170º to 180º, with the peak centered at 177º. In the<br />

reduced state, the distribution of δ is wider,<br />

ranging from 154º to 171º with the peak at 162º. For the latter case, the<br />

average value is consistent with quantum mechanical calculations in vacuum of the<br />

isoalloxazine ring in the reduced state,[22,32] which give a value of δ ≈ 160º.<br />
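A bending angle of this kind can be estimated from atomic coordinates as in the sketch below. It assumes, as an illustration only, that δ is the angle between the two halves of the ring measured about the N5-N10 axis, with 180º corresponding to a planar ring; the definition actually used is the one given in Figure 4.2. The function name, the choice of the two flanking points and the toy values are hypothetical.

```python
import numpy as np

def fold_angle(side_a, n5, n10, side_b):
    """Angle in degrees between the two half-planes that share the N5-N10 axis.

    side_a and side_b are points on either side of the axis (for example ring
    centroids). 180 means a planar ring; smaller values mean a bent ring.
    """
    side_a, n5, n10, side_b = (np.asarray(v, dtype=float)
                               for v in (side_a, n5, n10, side_b))
    axis = (n10 - n5) / np.linalg.norm(n10 - n5)
    # Keep only the components perpendicular to the N5-N10 axis.
    u = (side_a - n5) - np.dot(side_a - n5, axis) * axis
    w = (side_b - n5) - np.dot(side_b - n5, axis) * axis
    cos_angle = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Toy geometry: a flat ring and a ring folded by 20 degrees about the axis.
n5, n10 = [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]
tilt = np.deg2rad(20.0)
flat = fold_angle([-1.0, 0.5, 0.0], n5, n10, [1.0, 0.5, 0.0])
bent = fold_angle([-1.0, 0.5, 0.0], n5, n10, [np.cos(tilt), 0.5, np.sin(tilt)])
print(flat, bent)  # close to 180 and 160

# Histogram of toy angles with 0.3 degree bins, in the spirit of Figure 4.12a.
rng = np.random.default_rng(1)
toy_angles = rng.normal(162.0, 3.0, size=5000)
edges = np.linspace(150.0, 180.0, 101)  # 100 bins, each 0.3 wide
counts, _ = np.histogram(toy_angles, bins=edges)
distribution = counts / counts.sum()
```

Evaluating `fold_angle` for every frame of a trajectory and histogramming the result would give a distribution comparable in form to the one discussed above.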


Figure 4.12: Distribution of (a) the δ angles calculated at the N5-N10 axis of the isoalloxazine ring along the<br />

50 ns simulations of FOX (in black) and FHQ (in red) with a bin width of 0.3, and (b) the<br />

beginning-to-end distance of the ribityl side chain along the 50 ns simulations of FOX and FHQ (bin width of 0.007)<br />

and in the X-ray (in green) and NMR (in blue) homologous structures of the FMN<br />

domain.<br />

Figure 4.12b shows the distribution of the beginning-to-end distance of the ribityl side<br />

chain of the FMN cofactor in FOX, FHQ and the homologous structures of the FMN domain with the<br />

FMN cofactor in the oxidized state. For the crystallographic and NMR homologous structures,<br />

the distance ranges from 0.73 to 0.8 nm and from 0.70 to 0.81 nm, respectively. The distance<br />

observed in the crystal structure of the FMN domain was 0.77 nm. The distances obtained from<br />

the FOX and FHQ simulations are distributed in the range of 0.61 to 0.90 nm, with the main<br />

peaks at 0.78 nm and 0.80 nm, respectively. The beginning-to-end distances of the ribityl side<br />

chain in the simulations and in the NMR and crystallographic studies are thus consistent and<br />

distributed in the same range.<br />


4.3.7. Cluster analysis <strong>of</strong> FMN c<strong>of</strong>actor<br />

The cluster analysis was performed on the heavy atoms of the FMN cofactor in the protein<br />

environment, using the crystal structure as reference and a cutoff of 0.04 nm. In total, 13 and 8<br />

clusters were obtained for FOX and FHQ, and the first cluster comprises 87 % and 99 % of the<br />

conformations, respectively. The cumulative number of clusters obtained from the different<br />

simulations as a function of time is reported in Figure S4.4 of the SI. In both simulations it<br />

reached a plateau, which indicates sufficient sampling of the conformational space along the<br />

trajectories. The representative structures of the first cluster of FOX (in black) and FHQ (in red)<br />

are reported in Figure S4.4 in the SI. The conformational flexibility of the ribityl side chain was<br />

observed to be mainly responsible for the conformational diversity of the clusters in both<br />

simulations. The smaller number of clusters in the FHQ simulation is consistent with the<br />

small RMSF of the ribityl side chain. The preferred conformation of the FMN<br />

cofactor in both redox states is characterized by a partially elongated ribityl side chain (see<br />

Figure S4.4 in SI).<br />
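A cutoff-based clustering of this type can be sketched as below. The text does not state which algorithm was used, so the greedy neighbour-count scheme shown here is an assumption chosen for illustration: the frame with the most neighbours within the cutoff becomes a cluster centre, it is removed together with its neighbours, and the procedure is repeated. The function name and the one-dimensional toy data are hypothetical.

```python
import numpy as np

def cluster_by_cutoff(rmsd, cutoff):
    """Greedy neighbour-count clustering on a symmetric pairwise RMSD matrix.

    Returns a list of (centre_index, member_indices), largest cluster first.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    remaining = np.ones(rmsd.shape[0], dtype=bool)
    clusters = []
    while remaining.any():
        # Neighbour relations restricted to frames that are not yet assigned.
        neighbours = (rmsd <= cutoff) & remaining[None, :] & remaining[:, None]
        centre = int(np.argmax(neighbours.sum(axis=1)))
        members = np.flatnonzero(neighbours[centre])
        clusters.append((centre, members.tolist()))
        remaining[members] = False
    return clusters

# Toy "conformations" described by a single coordinate (nm); the pairwise
# distance stands in for the RMSD between fitted structures.
toy = np.array([0.00, 0.01, 0.02, 0.03, 0.20, 0.21, 0.50])
toy_rmsd = np.abs(toy[:, None] - toy[None, :])
clusters = cluster_by_cutoff(toy_rmsd, cutoff=0.04)
sizes = [len(members) for _, members in clusters]
print(sizes)                                   # cluster populations
print(sizes[0] / len(toy))                     # share of the first cluster
```

The share of frames in the first cluster and the total number of clusters are the two quantities quoted above for FOX and FHQ; counting the clusters found up to each time point would give the cumulative curve used to judge sampling.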

4.3.8. Principal component analysis <strong>of</strong> FMN c<strong>of</strong>actor<br />

The first eight eigenvectors cover ~79 % of the total RPF in both simulations. In<br />

the protein environment, the first eigenvector accounts for ~25 % and ~27 % of the RPF in FOX and<br />

FHQ, respectively. The superimposition of the first and last extreme structures of the<br />

isoalloxazine ring, generated by projecting the trajectory of FOX onto the first eight<br />

eigenvectors, is shown in Figure 4.13. In the first eigenvector, a symmetric bending<br />

mode of the ring is present in both simulations. However, the reduced state (Figure<br />

4.13 (1b)) manifests a “butterfly wing” bending mode around the N5-N10 axis. This type<br />

of vibrational mode has also been reported in previous experimental and quantum<br />

mechanical studies.[21,22,32] The other seven modes show similar eigenvectors in both<br />

states. The second most dominant motion is the twisting of the isoalloxazine ring along the<br />

main isoalloxazine axis. The observed collective modes are in qualitative agreement<br />

with the vibrational normal modes obtained from resonance Raman spectroscopy<br />

measurements and QM calculations for lumiflavin.[23,32] In addition, surface<br />


enhanced resonance Raman scattering studies of the FMN cofactor, free and in the FMN domain,<br />

provide evidence for the presence of vibrational modes resulting from the displacement of<br />

atoms such as C4, O, C4a, C10a, C5a and C9a.[33] These experimental observations are<br />

consistent with the higher fluctuations of these atoms observed in the PC<br />

modes from the first 8 eigenvectors of our simulations.<br />

Figure 4.13: The superimposition of two extreme structures generated after projecting the FOX<br />

trajectory on the first eight eigenvectors (structures 1 to 8). 1a and 1b show the different<br />

collective motion of the 1st eigenvector in the oxidized and reduced state, respectively.<br />

4.4. Discussion <strong>and</strong> conclusions<br />

MD simulations have been performed on the FMN-binding reductase domain of<br />

monooxygenase P450BM-3 with the FMN cofactor in the oxidized and reduced states to<br />

understand the effect of the change in protonation state of the isoalloxazine ring on the<br />


conformation and dynamics of the FMN domain and cofactor. The results of the simulations<br />

showed that the change of the protonation state in the reduced FMN affects the overall<br />

structure and dynamics of the FMN domain in solution. In particular, the structural and<br />

dynamic properties of the si-face FMN binding loop (Lβ3) are strongly influenced by the<br />

change in protonation of the FMN cofactor (in FOX). In the apo-protein, the overall local<br />

structure of the protein remains conserved but higher fluctuations were observed in the FMN<br />

binding loops. The latter effect can explain the experimental finding of reversible rebinding<br />

of the FMN cofactor to the apo-protein.[31,34,35] Loop Lβ2 was observed to contribute<br />

mainly to the collective modes of the FMN domain as holo-protein or apo-protein, which is<br />

also in agreement with the solution structure of the flavodoxin-like domain of E. coli<br />

determined by NMR.[35] The inner FMN binding loop (Lβ3) contributed to the prominent<br />

collective mode of the FMN domain in the oxidized state, while the outer FMN binding loop (Lβ4)<br />

contributed to the prominent collective mode of the FMN domain in the reduced state and in the apo-protein.<br />

In FHQ, a major conformational change of the FMN binding site residue Trp574<br />

was observed. Trp574 is critical to FMN cofactor binding and electron transfer in P450BM-3.<br />

In FHQ, Trp574 does not remain coplanar with the isoalloxazine ring, avoiding the steric<br />

hindrance induced by the conformational change of the FMN cofactor upon protonation, as<br />

also suggested by ¹⁵N-NMR[31] and surface enhanced resonance Raman scattering[33]<br />

experiments. Hence, the change in the conformation of Trp574 might be the major factor<br />

that makes the reduced state kinetically unfavorable in P450BM-3 for transferring the<br />

electron from FMN to HEME, as observed in previous studies.[36]<br />

During the simulations, the FMN cofactor adopts different conformations that are<br />

mainly determined by the movement of the ribityl side chain. The binding region of the ribityl<br />

side chain was evolutionarily more conserved. In general, the oxidized state was observed<br />

to be more flexible and to adopt different conformations in the protein environment. This might be<br />

the result of the change in the FMN binding site properties and hydrogen bond environment in<br />

the reduced state. The isoalloxazine ring of the FMN cofactor exhibits mainly 8 collective<br />

motions. Except for the first mode, all modes were similar in both redox states. In the<br />

reduced state, the first collective motion of the FMN cofactor is the so-called “butterfly motion”,<br />

a bending of the isoalloxazine ring along the N5-N10 axis.<br />


In summary, we have analyzed for the first time the dynamics of the FMN binding<br />

domain of P450BM-3 in water. In particular, we have studied the effect of FMN binding on<br />

the fluctuation modes of the FMN domain. The FMN cofactor is involved in the electron transfer in<br />

P450BM-3 and its dynamics can play an important role in this process. The results of<br />

our study indicate a difference in the fluctuation amplitude of the FMN cofactor in the<br />

different redox states. This difference results from the change in the conformation of the<br />

FMN binding site due to the protonation state of the isoalloxazine ring.<br />

4.5. References<br />

1. Chefson A, Auclair K (2006) Progress towards the easier use <strong>of</strong> P450 enzymes. Mol<br />

Biosyst 2: 462-469.<br />

2. Wong LL (1998) Cytochrome P450 monooxygenases. Curr Opin Chem Biol 2: 263-<br />

268.<br />

3. Guengerich FP (2001) Common <strong>and</strong> uncommon cytochrome P450 reactions related<br />

to metabolism <strong>and</strong> chemical toxicity. Chem Res Toxicol 14: 611-650.<br />

4. Kumar S (2010) Engineering cytochrome P450 biocatalysts for biotechnology,<br />

medicine <strong>and</strong> bioremediation. Expert Opin Drug Metab Toxicol 6: 115-131.<br />

5. Warman AJ, Roitel O, Neeli R, Girvan HM, Seward HE, et al. (2005) Flavocytochrome<br />

P450 BM3: an update on structure <strong>and</strong> mechanism <strong>of</strong> a biotechnologically important<br />

enzyme. Biochem Soc Trans 33: 747-753.<br />

6. Munro AW, Leys DG, McLean KJ, Marshall KR, Ost TW, et al. (2002) P450 BM3: the<br />

very model <strong>of</strong> a modern flavocytochrome. Trends Biochem Sci 27: 250-257.<br />

7. Girvan HM, Waltham TN, Neeli R, Collins HF, McLean KJ, et al. (2006)<br />

Flavocytochrome P450 BM3 <strong>and</strong> the origin <strong>of</strong> CYP102 fusion species. Biochem Soc<br />

Trans 34: 1173-1177.<br />

8. Jung ST, Lauchli R, Arnold FH (2011) Cytochrome P450: taming a wild type enzyme.<br />

Curr Opin Biotechnol 22: 809–817.<br />


9. Whitehouse CJC, Bell SG, Wong L-L (2012) P450BM3 (CYP102A1): connecting the<br />

dots. Chem Soc Rev 41: 1218-1260.<br />

10. Sevrioukova I, Peterson JA (1996) Domain-domain interaction in cytochrome<br />

P450BM-3. Biochimie 78: 744-751.<br />

11. Sevrioukova I, Shaffer C, Ballou DP, Peterson JA (1996) Equilibrium <strong>and</strong> transient<br />

state spectrophotometric studies <strong>of</strong> the mechanism <strong>of</strong> reduction <strong>of</strong> the flavoprotein<br />

domain <strong>of</strong> P450BM-3. Biochemistry 35: 7058-7068.<br />

12. Sevrioukova I, Truan G, Peterson JA (1996) The flavoprotein domain <strong>of</strong> P450BM-3:<br />

Expression, purification, <strong>and</strong> properties <strong>of</strong> the flavin adenine dinucleotide- <strong>and</strong><br />

flavin mononucleotide-binding subdomains. Biochemistry 35: 7528-7535.<br />

13. Hazzard JT, Govindaraj S, Poulos TL, Tollin G (1997) Electron transfer between the<br />

FMN <strong>and</strong> heme domains <strong>of</strong> cytochrome P450BM-3. Effects <strong>of</strong> substrate <strong>and</strong> CO. J Biol<br />

Chem 272: 7922-7926.<br />

14. Narhi LO, Fulco AJ (1987) Identification <strong>and</strong> characterization <strong>of</strong> two functional<br />

domains in cytochrome P-450BM-3, a catalytically self-sufficient monooxygenase<br />

induced by barbiturates in Bacillus megaterium. J Biol Chem 262: 6683-6690.<br />

15. Pylypenko O, Schlichting I (2004) Structural aspects <strong>of</strong> lig<strong>and</strong> binding to <strong>and</strong><br />

electron transfer in bacterial <strong>and</strong> fungal P450s. Annu Rev Biochem 73: 991-1018.<br />

16. Chen HC, Swenson RP (2008) Effect <strong>of</strong> the Insertion <strong>of</strong> a Glycine Residue into the<br />

Loop Spanning Residues 536-541 on the Semiquinone State <strong>and</strong> Redox Properties <strong>of</strong><br />

the Flavin Mononucleotide-Binding Domain <strong>of</strong> Flavocytochrome P450BM-3 from<br />

Bacillus megaterium. Biochemistry 47: 13788-13799.<br />

17. Sevrioukova IF, Hazzard JT, Tollin G, Poulos TL (1999) The FMN to heme electron<br />

transfer in cytochrome P450BM-3 - Effect <strong>of</strong> chemical modification <strong>of</strong> cysteines<br />

engineered at the FMN-heme domain interaction site. J Biol Chem 274: 36097-<br />

36106.<br />

18. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure <strong>of</strong> a<br />

cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96:<br />

1863-1868.<br />


19. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />

Biomolecular Simulation: The GROMOS96 Manual <strong>and</strong> User Guide. VdF:<br />

Hochschulverlag AG an der ETH Zurich <strong>and</strong> BIOMOS bv, Zurich, Groningen: 1.<br />

20. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />

Biomolecular Simulation: The GROMOS96 Manual <strong>and</strong> User Guide. VdF:<br />

Hochschulverlag AG an der ETH Zurich <strong>and</strong> BIOMOS bv, Zurich, Groningen.<br />

21. Zheng Y-J, Ornstein RL (1996) A Theoretical Study <strong>of</strong> the Structures <strong>of</strong> Flavin in<br />

Different Oxidation <strong>and</strong> Protonation States. J Am Chem Soc 118: 9402-9408.<br />

22. Walsh JD, Miller AF (2003) Flavin reduction potential tuning by substitution <strong>and</strong><br />

bending. J Mol Struc-Theochem 623: 185-195.<br />

23. Abe M, Kyogoku Y (1987) Vibrational Analysis <strong>of</strong> Flavin Derivatives - Normal<br />

Coordinate Treatments <strong>of</strong> Lumiflavin. Spectrochim Acta A 43: 1027-1037.<br />

24. (2010) ACD/ChemSketch Freeware, version 12.01, Advanced Chemistry<br />

Development, Inc., Toronto, ON, Canada, www.acdlabs.com.<br />

25. Schmidtke P, Bidon-Chanal A, Luque FJ, Barril X (2011) MDpocket : Open Source<br />

Cavity Detection <strong>and</strong> Characterization on Molecular Dynamics Trajectories.<br />

Bioinformatics 27: 3276-3285.<br />

26. (2012) The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC,<br />

http://www.pymol.org/.<br />

27. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment<br />

search tool. J Mol Biol 215: 403-410.<br />

28. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Jr., Brice MD, et al. (1977) The<br />

Protein Data Bank: a computer-based archival file for macromolecular structures. J<br />

Mol Biol 112: 535-542.<br />

29. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF<br />

Chimera--a visualization system for exploratory research <strong>and</strong> analysis. J Comput<br />

Chem 25: 1605-1612.<br />

30. Kabsch W, S<strong>and</strong>er C (1983) Dictionary <strong>of</strong> protein secondary structure: pattern<br />

recognition <strong>of</strong> hydrogen-bonded <strong>and</strong> geometrical features. Biopolymers 22: 2577-<br />

2637.<br />


31. Kasim M, Chen HC, Swenson RP (2009) Functional characterization <strong>of</strong> the re-face<br />

loop spanning residues 536-541 <strong>and</strong> its interactions with the c<strong>of</strong>actor in the flavin<br />

mononucleotide-binding domain <strong>of</strong> flavocytochrome P450 from Bacillus<br />

megaterium. Biochemistry 48: 5131-5141.<br />

32. Nakai S, Yoneda F, Yamabe T (1999) Theoretical study on the lowest-frequency<br />

mode <strong>of</strong> the flavin ring. Theor Chem Acc 103: 109-116.<br />

33. Macdonald IDG, Smith WE, Munro AW (1999) Analysis <strong>of</strong> the structure <strong>of</strong> the flavin<br />

binding sites <strong>of</strong> flavocytochrome P450BM3 using surface enhanced resonance<br />

Raman scattering. Eur Biophys J 28: 437-445.<br />

34. Wittung-Stafshede P (2002) Role <strong>of</strong> c<strong>of</strong>actors in protein folding. Acc Chem Res 35:<br />

201-208.<br />

35. Sibille N, Blackledge M, Brutscher B, Coves J, Bersch B (2005) Solution structure <strong>of</strong><br />

the sulfite reductase flavodoxin-like domain from Escherichia coli. Biochemistry 44:<br />

9086-9095.<br />

36. Klein ML, Fulco AJ (1993) Critical residues involved in FMN binding <strong>and</strong> catalytic<br />

activity in cytochrome P450BM-3. J Biol Chem 268: 7553-7561.<br />

Part <strong>of</strong> this chapter is adapted with permission from ‘Verma R, Schwaneberg U,<br />

Roccatano D. Journal <strong>of</strong> Chemical Theory <strong>and</strong> Computation DOI: 10.1021/ct300723x.’<br />


PART II: P450BM-3 Reductase Domain SI<br />

Supporting Information<br />

Conformational Dynamics <strong>of</strong> the FMN-binding Reductase<br />

Domain <strong>of</strong> Monooxygenase P450BM-3<br />

Table S4.1: Force field parameters for FMN c<strong>of</strong>actor in oxidized state for GROMOS96 43a1<br />

force field.[1]<br />

Atom number Atom type Atom name Charge group Partial charge<br />

1 C FC9A 1 0.200<br />

2 NR FN10 1 -0.200<br />

3 C FC10A 2 0.360<br />

4 NR FN1 2 -0.360<br />

5 C FC2 3 0.380<br />

6 O FO2 3 -0.380<br />

7 NR FN3 4 -0.280<br />

8 H FH3 4 0.280<br />

9 C FC4 5 0.380<br />

10 O FO4 5 -0.380<br />

11 C FC4A 6 0.180<br />

12 NR FN5 6 -0.280<br />

13 C FC5A 6 0.100<br />

14 CR1 FC6 7 0.000<br />

15 C FC7 8 0.000<br />

16 CH3 FCM7 8 0.000<br />

17 C FC8 9 0.000<br />

18 CH3 FCM8 9 0.000<br />

19 CR1 FC9 10 0.000<br />

20 CH2 FCA 11 0.000<br />

21 CH1 FCB 12 0.150<br />

22 OA FOB 12 -0.548<br />

23 H FHB 12 0.398<br />

24 CH1 FCG 13 0.150<br />

25 OA FOG 13 -0.548<br />

26 H FHG 13 0.398<br />

27 CH1 FCD 14 0.150<br />

28 OA FOD 14 -0.548<br />

29 H FHD 14 0.398<br />

30 CH2 FCE 15 0.150<br />

31 OA FOZ 15 -0.36<br />

32 P FPH 15 0.630<br />

33 OA FOH 15 -0.548<br />

34 H FHH 15 0.398<br />

35 OM FOT1 15 -0.635<br />

36 OM FOT2 15 -0.635<br />

Dihedral parameters<br />

ai aj ak al function c0 c1 c2<br />

13 2 1 3 1 180.000 33.5 2<br />

1 2 3 11 1 180.000 33.5 2<br />

3 11 12 13 1 180.000 33.5 2<br />

11 13 12 1 1 180.000 33.5 2<br />

2 20 21 24 1 0.000 5.86 3<br />

20 22 21 23 1 0.000 1.26 3<br />

20 21 24 27 1 0.000 5.86 3<br />

21 24 25 26 1 0.000 1.26 3<br />

21 24 27 30 1 0.000 5.86 3<br />

24 27 28 29 1 0.000 1.26 3<br />

24 30 27 31 1 0.000 5.86 3<br />

27 30 31 32 1 0.000 3.77 3<br />

30 31 32 33 1 0.000 1.05 3<br />

31 33 32 34 1 0.000 1.05 3<br />

Improper dihedral parameters<br />

ai aj ak al function c0 c1<br />

1 2 19 13 2 0.0 167.42<br />

1 12 14 13 2 0.0 167.42<br />

1 13 14 15 2 0.0 167.42<br />

1 19 17 15 2 0.0 167.42<br />

1 2 3 4 2 180.0 167.42<br />

1 12 13 11 2 0.0 167.42<br />

1 14 13 12 2 180.0 167.42<br />

1 2 3 11 2 0.0 167.42<br />

1 17 19 18 2 180.0 167.42<br />

2 1 13 12 2 0.0 167.42<br />

2 11 3 12 2 0.0 167.42<br />

2 1 13 14 2 180.0 167.42<br />

2 3 11 9 2 180.0 167.42<br />

2 19 1 17 2 180.0 167.42<br />

2 4 3 5 2 180.0 167.42<br />

2 1 3 20 2 0.0 167.42<br />

3 2 1 19 2 180.0 167.42<br />

3 5 4 7 2 0.0 167.42<br />

3 4 5 6 2 180.0 167.42<br />

3 11 9 7 2 0.0 167.42<br />

3 2 4 11 2 0.0 167.42<br />

3 12 9 11 2 0.0 167.42<br />

3 11 12 13 2 0.0 167.42<br />

3 1 2 13 2 0.0 167.42<br />

3 9 11 10 2 180.0 167.42<br />

4 11 3 9 2 0.0 167.42<br />

4 5 7 9 2 0.0 167.42<br />

4 5 7 8 2 180.0 167.42<br />

4 3 2 11 2 180.0 167.42<br />

4 11 3 12 2 180.0 167.42<br />

5 4 7 6 2 0.0 167.42<br />

5 4 3 11 2 0.0 167.42<br />

5 9 7 11 2 0.0 167.42<br />

5 9 7 10 2 180.0 167.42<br />

6 7 5 8 2 0.0 167.42<br />

6 5 7 9 2 180.0 167.42<br />

7 9 5 8 2 0.0 167.42<br />

7 9 10 11 2 180.0 167.42<br />

8 7 9 20 2 0.0 167.42<br />

8 7 9 11 2 180.0 167.42<br />

8 10 11 12 2 180.0 167.42<br />

9 7 11 10 2 0.0 167.42<br />

9 12 11 13 2 180.0 167.42<br />

12 1 13 19 2 180.0 167.42<br />

12 11 9 7 2 180.0 167.42<br />

12 13 14 15 2 180.0 167.42<br />

13 19 1 17 2 0.0 167.42<br />

13 15 14 17 2 0.0 167.42<br />

13 15 14 16 2 180.0 167.42<br />

13 12 1 14 2 0.0 167.42<br />

14 12 13 11 2 180.0 167.42<br />

14 1 13 19 2 0.0 167.42<br />

14 17 15 19 2 0.0 167.42<br />

14 15 17 18 2 180.0 167.42<br />

15 17 14 16 2 0.0 167.42<br />

16 15 17 18 2 0.0 167.42<br />

17 15 19 18 2 0.0 167.42<br />

19 17 15 16 2 180.0 167.42<br />

20 24 22 21 2 35.0 167.42<br />

21 27 25 24 2 35.0 167.42<br />

24 30 28 27 2 35.0 167.42<br />
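As a consistency check on the atom list of Table S4.1, the partial charges can be summed per charge group. The following is a minimal sketch and not part of the original parameterization: the tuple layout and the function name `charge_per_group` are assumptions made for illustration, while the numbers are copied from the table.

```python
from collections import defaultdict

# (atom name, charge group, partial charge), copied from Table S4.1.
FMN_OX = [
    ("FC9A", 1, 0.200), ("FN10", 1, -0.200),
    ("FC10A", 2, 0.360), ("FN1", 2, -0.360),
    ("FC2", 3, 0.380), ("FO2", 3, -0.380),
    ("FN3", 4, -0.280), ("FH3", 4, 0.280),
    ("FC4", 5, 0.380), ("FO4", 5, -0.380),
    ("FC4A", 6, 0.180), ("FN5", 6, -0.280), ("FC5A", 6, 0.100),
    ("FC6", 7, 0.000), ("FC7", 8, 0.000), ("FCM7", 8, 0.000),
    ("FC8", 9, 0.000), ("FCM8", 9, 0.000), ("FC9", 10, 0.000),
    ("FCA", 11, 0.000),
    ("FCB", 12, 0.150), ("FOB", 12, -0.548), ("FHB", 12, 0.398),
    ("FCG", 13, 0.150), ("FOG", 13, -0.548), ("FHG", 13, 0.398),
    ("FCD", 14, 0.150), ("FOD", 14, -0.548), ("FHD", 14, 0.398),
    ("FCE", 15, 0.150), ("FOZ", 15, -0.360), ("FPH", 15, 0.630),
    ("FOH", 15, -0.548), ("FHH", 15, 0.398),
    ("FOT1", 15, -0.635), ("FOT2", 15, -0.635),
]


def charge_per_group(atoms):
    """Sum the partial charges of each charge group (rounded to 3 decimals)."""
    totals = defaultdict(float)
    for _name, group, charge in atoms:
        totals[group] += charge
    return {group: round(q, 3) for group, q in sorted(totals.items())}


groups = charge_per_group(FMN_OX)
net_charge = round(sum(charge for _name, _group, charge in FMN_OX), 3)
```

With the values listed above, charge groups 1 to 14 are neutral and charge group 15 (the phosphate) sums to -1, so the cofactor as tabulated carries a net charge of -1.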

Table S4.2: Force field parameters for FMN c<strong>of</strong>actor in reduced state for GROMOS96 43a1<br />

force field.[1]<br />

Atom number Atom type Atom name Charge group Partial charge<br />

1 C FC9A 1 0.1<br />

2 NR FN10 1 -0.2<br />

3 C FC10A 1 0.1<br />

4 NR FN1 2 -0.28<br />

5 H FH1 2 0.28<br />

6 C FC2 3 0.38<br />

7 O FO2 3 -0.38<br />

8 NR FN3 4 -0.28<br />

9 H FH3 4 0.28<br />

10 C FC4 5 0.38<br />

11 O FO4 5 -0.38<br />

12 C FC4A 6 0.00<br />

13 NR FN5 7 -0.28<br />

14 H FH5 7 0.28<br />

15 C FC5A 8 0.00<br />

16 CR1 FC6 9 0.00<br />

17 C FC7 10 0.00<br />

18 CH3 FCM7 10 0.00<br />

19 C FC8 11 0.00<br />

20 CH3 FCM8 11 0.00<br />

21 CR1 FC9 12 0.00<br />

22 CH2 FCA 13 0.00<br />

23 CH1 FCB 14 0.15<br />

24 OA FOB 14 -0.548<br />

25 H FHB 14 0.398<br />

26 CH1 FCG 14 0.15<br />

27 OA FOG 15 -0.548<br />

28 H FHG 15 0.398<br />

29 CH1 FCD 15 0.15<br />

30 OA FOD 16 -0.548<br />

31 H FHD 16 0.398<br />

32 CH2 FCE 16 0.15<br />

33 OA FOZ 17 -0.36<br />

34 P FPH 17 0.63<br />

35 OA FOH 17 -0.548<br />

36 H FHH 17 0.398<br />

37 OM FOT1 17 -0.635<br />

38 OM FOT2 17 -0.635<br />

Dihedral parameters<br />

ai aj ak al function c0 c1 c2<br />

15 1 3 2 1 180.00 33.5 2<br />

1 12 2 3 1 180.00 33.5 2<br />

3 15 12 13 1 180.00 33.5 2<br />

12 13 1 15 1 180.00 33.5 2<br />

2 22 23 26 1 0.00 5.86 2<br />

22 24 23 25 1 0.00 1.26 2<br />

22 23 26 29 1 0.00 5.86 2<br />

23 26 27 28 1 0.00 1.26 2<br />

23 26 30 29 1 0.00 5.86 2<br />

26 29 30 31 1 0.00 1.26 2<br />

26 32 29 33 1 0.00 5.86 2<br />

29 32 33 34 1 0.00 3.77 2<br />

32 33 34 35 1 0.00 1.05 2<br />

33 35 34 36 1 0.00 1.05 2<br />

Improper dihedral parameters<br />

ai aj ak al function c0 c1<br />

1 15 2 21 2 5 167.42<br />

1 13 16 15 2 0 167.42<br />

1 15 16 17 2 0 167.42<br />

1 21 19 17 2 0 167.42<br />

1 2 3 4 2 160 167.42<br />

1 13 15 12 2 50 167.42<br />

1 16 15 13 2 180 167.42<br />

1 3 2 12 2 50 167.42<br />

1 19 21 20 2 180 167.42<br />

2 1 15 13 2 60 167.42<br />

2 12 3 13 2 60 167.42<br />

2 15 1 16 2 180 167.42<br />

2 3 12 10 2 180 167.42<br />

2 21 1 19 2 180 167.42<br />

2 4 3 6 2 180 167.42<br />

2 3 1 22 2 5 167.42<br />

3 1 2 21 2 160 167.42<br />

3 6 4 8 2 0 167.42<br />

3 4 6 7 2 180 167.42<br />

3 12 10 8 2 0 167.42<br />

3 2 4 12 2 5 167.42<br />

3 13 10 12 2 0 167.42<br />

3 12 13 15 2 50 167.42<br />

3 1 2 15 2 50 167.42<br />

4 12 3 10 2 0 167.42<br />

4 6 8 10 2 0 167.42<br />

4 6 8 9 2 180 167.42<br />

4 3 2 12 2 180 167.42<br />

4 12 3 13 2 180 167.42<br />

5 6 4 3 2 180 167.42<br />

6 4 8 7 2 0 167.42<br />

6 4 3 12 2 0 167.42<br />

6 10 8 12 2 0 167.42<br />

6 10 8 11 2 180 167.42<br />

7 8 6 9 2 0 167.42<br />

7 6 8 10 2 180 167.42<br />

8 10 6 9 2 0 167.42<br />

8 10 11 12 2 180 167.42<br />

9 8 10 11 2 0 167.42<br />

9 8 10 12 2 180 167.42<br />

10 8 12 11 2 0 167.42<br />

10 13 12 15 2 160 167.42<br />

11 12 10 3 2 180 167.42<br />

13 1 15 21 2 180 167.42<br />

13 10 12 8 2 180 167.42<br />

13 16 15 17 2 180 167.42<br />

14 15 13 12 2 135 167.42<br />

15 21 1 19 2 0 167.42<br />

15 17 16 19 2 0 167.42<br />

15 16 17 18 2 180 167.42<br />

15 13 1 16 2 0 167.42<br />

16 15 13 12 2 160 167.42<br />

16 1 15 21 2 0 167.42<br />

16 19 17 21 2 0 167.42<br />

16 17 19 20 2 180 167.42<br />

17 19 16 18 2 0 167.42<br />

18 17 19 20 2 0 167.42<br />

19 17 21 20 2 0 167.42<br />

21 19 17 18 2 180 167.42<br />

22 26 24 23 2 35 334.84<br />

23 29 27 26 2 35 334.84<br />

26 32 30 29 2 35 334.84<br />
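The dihedral entries in Tables S4.1 and S4.2 can be turned into energies once a functional form is fixed. The sketch below assumes that the columns follow the GROMACS topology convention, which the tables do not state explicitly: for function type 1 (proper dihedral) c0 is the phase angle in degrees, c1 the force constant in kJ/mol and c2 the multiplicity; for function type 2 (harmonic improper dihedral) c0 is the reference angle in degrees and c1 the force constant in kJ/(mol rad^2). The function names are illustrative.

```python
import math


def proper_dihedral_energy(phi_deg, phase_deg, k, multiplicity):
    """Periodic form V = k * (1 + cos(n*phi - phase)), in kJ/mol."""
    phi = math.radians(phi_deg)
    phase = math.radians(phase_deg)
    return k * (1.0 + math.cos(multiplicity * phi - phase))


def improper_dihedral_energy(xi_deg, xi0_deg, k):
    """Harmonic form V = 0.5 * k * (xi - xi0)**2 with angles in radians."""
    delta = math.radians(xi_deg - xi0_deg)
    return 0.5 * k * delta * delta


# Ring torsion from the tables: c0 = 180, c1 = 33.5, c2 = 2.
v_planar = proper_dihedral_energy(180.0, 180.0, 33.5, 2)
v_twisted = proper_dihedral_energy(90.0, 180.0, 33.5, 2)

# Planar improper (c0 = 0, c1 = 167.42) displaced by 10 degrees.
v_improper = improper_dihedral_energy(10.0, 0.0, 167.42)
```

Under these assumptions the ring torsion term vanishes for the planar arrangement and reaches its maximum of twice the force constant at 90 degrees, which is how such terms keep the isoalloxazine ring close to the geometry encoded by the reference angles.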

Figure S4.1: Secondary structure per residue calculated by DSSP[2] along the trajectory as a<br />

function <strong>of</strong> time for (a) FOX, (b) FHQ <strong>and</strong> (c) APO. Color code represents different secondary<br />

structure elements.<br />

Figure S4.2: (a) Multiple structure alignment of the FMN domain (1BVY) and its homologous structures (summarized in Table 2) with the conservation profile, root mean square deviation (RMSD) and charge variation per residue, created using Chimera[3]. Green and yellow boxes show the helices and beta strands, respectively, and purple boxes mark the residues involved in FMN binding. (b) The phylogenetic tree of the FMN domain and its homologous structures generated by ClustalW2[4].

Figure S4.3: Evolutionary conservation profile on the FMN domain of P450BM-3 using the RWB color scheme. Red regions show the highly conserved residues in the domain. The FMN domain is shown in cartoon representation with the FMN cofactor in green and with labeled helices and loop regions. FMN binding loops are labeled in blue. N and C represent the amino and carboxy termini of the FMN domain.

Figure S4.4: Cumulative sum of the number of clusters obtained from the simulations. The sampling of clusters was performed over 50 ns of FOX (black) and FHQ (red) using an RMSD cutoff of 0.04 nm. The representative conformations of the FMN cofactor in the first cluster of FOX (black) and FHQ (red) are shown.

References<br />

1. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />

Biomolecular Simulation: The GROMOS96 Manual <strong>and</strong> User Guide. VdF:<br />

Hochschulverlag AG an der ETH Zurich <strong>and</strong> BIOMOS bv, Zurich, Groningen.<br />

2. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.

3. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF<br />

Chimera--a visualization system for exploratory research <strong>and</strong> analysis. J Comput<br />

Chem 25: 1605-1612.<br />

4. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) Clustal<br />

W <strong>and</strong> Clustal X version 2.0. Bioinformatics 23: 2947-2948.<br />

Part <strong>of</strong> this chapter is adapted with permission from ‘Verma R, Schwaneberg U,<br />

Roccatano D. Journal <strong>of</strong> Chemical Theory <strong>and</strong> Computation DOI: 10.1021/ct300723x.’<br />



PART II: P450BM-3 HEME/FMN Complex<br />

Chapter 5<br />

Insight into the redox partner interaction mechanism in<br />

cytochrome P450BM-3 using molecular dynamics<br />

simulation<br />

5.1. Abstract<br />

Flavocytochrome P450BM-3 is a soluble bacterial reductase composed of two flavin (FAD/FMN) domains and one HEME domain. Understanding the atomic details of the inter-domain electron transfer (ET) mechanism is required for better exploitation of the enzyme in biotechnological applications and for extending knowledge of the P450 protein family in general. In this paper, we have performed molecular dynamics (MD) simulations of the FMN and HEME domains, both isolated and in their crystallographic complex, to study their binding modes and to gain insight into the structural determinants of inter-domain ET. In the simulation of the complex, we observed conformational rearrangements in both domains that reduce the separation between the FMN and HEME cofactors. In particular, the closest FMN/HEME distance decreased from 1.84 nm (in the crystal structure) to an average of 1.41 ± 0.09 nm during the simulation, with a minimum distance of 1.02 nm. These distances are within the range for ET tunneling between the two redox centers. The analysis of the possible ET pathways in the crystal complex indicates Met490 of the ribityl tail binding loop of the FMN domain, and Ala399 and Cys400 of the HEME domain, as possible mediators of ET. During the simulation, however, at an FMN/HEME distance of ~1.41 nm, Phe393 takes part in ET along with Met490 instead of Ala399 and Cys400, while at the minimum FMN/HEME distance of ~1.02 nm only Met490 mediates the ET tunneling. The results of the simulations agree with previously proposed hypotheses that the crystal complex of the FMN/HEME domains is not in the optimal arrangement for the electron transfer rate needed under physiological conditions.

5.2. Introduction<br />

Cytochrome P450s are regio- and stereoselective monooxygenases of the oxidoreductase family that act on a wide variety of substrates.[1-3] They have been studied as potential catalysts for the production of high-value oxygenated organic molecules by enzyme-mediated synthesis.[4-6] In particular, cytochrome P450BM-3 is an NADPH-dependent fatty acid hydroxylase system isolated from the soil bacterium Bacillus megaterium.[7,8] This enzyme is an attractive target and model system for biochemical and biomedical applications for several reasons. First, it is a stable, catalytically self-sufficient protein with a convenient multidomain structure that allows easier production and handling than other monooxygenases of the same family. Second, it is a water-soluble enzyme with a high catalytic efficiency and oxygenase rate, and it is readily expressed recombinantly.[9,10] Third, it resembles eukaryotic diflavin reductases such as human microsomal P450s. As a pivotal member of its superfamily, it has been widely studied as a model system for understanding structure-function-dynamics relationships, supported by a wealth of structural and kinetic data.[11,12]

P450BM-3 is a multidomain protein with two reductase domains, which bind flavin adenine dinucleotide (FAD) and flavin mononucleotide (FMN), and a HEME domain, arranged as HEME-FMN-FAD on a single polypeptide chain.[13,14] The main catalytic function of P450s is to transfer an oxygen atom from molecular oxygen to their substrates. During the reaction, the enzyme is reduced by NADPH: electrons are first transferred to the FAD cofactor of the FAD-binding domain and then, mediated by the FMN cofactor of the FMN-binding domain, to the HEME iron in the substrate-bound HEME domain. Crystallization of the whole P450BM-3 protein has proven difficult because of the flexible linker regions between the domains. However, crystallographic structures of the isolated HEME domain[15], the FAD domain[16] and a non-stoichiometric complex of one FMN and two HEME domains[15] are available in the PDB. In the FMN/HEME complex (PDB ID: 1BVY), the smallest edge-to-edge distance between the redox centers is 1.81 nm.[15] However, a survey of electron transfer (ET) in oxidoreductase protein structures has shown that this distance should be less than 1.40 nm for efficient ET tunneling between redox centers in the protein environment.[17] Munro et al. used a modeling approach to rationalize the electron transfer from FMN to HEME and postulated that movement of the FMN domain is essential to bring the distance between the FMN and HEME cofactors into the physiological range for ET (less than 1.40 nm).[11]
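To give a feeling for why a few tenths of a nanometer matter here, the distance dependence of tunneling can be illustrated with a simple exponential decay. The sketch below is not taken from this chapter: it assumes the empirical decay of roughly 0.6 per Å in log10(rate) that is widely quoted for tunneling through protein, and it compares only the distance-dependent factor, leaving driving force and reorganization energy aside. The function name and default value are illustrative.

```python
def tunneling_rate_ratio(r_far_nm, r_near_nm, decay_per_angstrom=0.6):
    """Ratio k(near)/k(far) for a rate that falls off as 10**(-decay * R).

    Distances are edge-to-edge separations in nm; the decay constant is an
    assumed empirical value in log10 units per Angstrom.
    """
    delta_angstrom = 10.0 * (r_far_nm - r_near_nm)
    return 10.0 ** (decay_per_angstrom * delta_angstrom)


# Crystal-structure distance versus the 1.40 nm threshold quoted above.
ratio_to_threshold = tunneling_rate_ratio(1.81, 1.40)
```

Under this assumption, shortening the separation from 1.81 nm to 1.40 nm changes the distance-dependent factor by more than two orders of magnitude, which is consistent with the emphasis placed on domain movement in the text.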

In this study, we aim to extend our knowledge of the structure-function-dynamics relationships in P450BM-3 at the atomistic level using molecular dynamics (MD) simulations of the isolated HEME and FMN domains and of their complex in water. It has been shown experimentally that a specific arrangement of the HEME and FMN domains is responsible for the catalytic efficiency and high oxygenase rate of P450BM-3.[18] In this paper, the dynamics in solution of the complex and of the isolated HEME and FMN domains are compared for the first time. In particular, we analyze the relative rearrangement of the FMN and HEME domains and how it affects the ET pathways from the isoalloxazine ring of the FMN cofactor to the HEME iron.

The chapter is organized as follows. The details of the MD simulations and of the trajectory analysis are reported in the Methods section. In the Results and discussion section, the general structural properties of the simulated systems are reported first to assess the quality of the simulations. Cluster analysis is then used to identify representative structures that highlight the differences between the domains in solution and in the complex. The following paragraphs focus on the ET pathways between FMN and HEME, calculated on conformations selected from the cluster analysis of the trajectory. The structural behavior of the substrate access channel is also reported. The collective dynamics of the system are then analyzed using principal component analysis of the trajectories. Finally, the conclusion section summarizes the outcome of the study.

5.3. Methods<br />

5.3.1. Starting coordinates<br />

The crystal structure of the substrate-free, non-stoichiometric FMN/(HEME)2 complex (PDB ID: 1BVY, resolution 0.203 nm)[15] was used to obtain the starting coordinates for the MD simulations. Of the two HEME domains, one (chain A: residues 20 - 450) is in close proximity to the FMN domain (chain F: residues 479 - 630) in the crystal structure. Hence, chains A and F were extracted from the crystal structure (including crystallographic water within 0.60 nm of the proteins) and used as starting coordinates for the MD simulations. 1,2-ethanediol molecules were removed from the crystallographic structure and replaced by water molecules.

5.3.2. Molecular dynamics simulations

The GROMOS96 43a1 force field[19] was used for all simulations. The MD simulations performed in this study are summarized in Table 5.1. Figure 5.1 shows the FMN and HEME cofactors in stick representation. The parameters for the ferric iron of the HEME cofactor were adopted from Helms et al.[20] and have already been employed for MD simulations of the P450BM-3 HEME domain by Roccatano et al.[21,22] The partial charges on the porphyrin ring of the HEME cofactor were redistributed to adapt the parameters to the GROMOS96 43a1 force field[19], with hydrogen atoms bound to the bridging carbons of the porphyrin ring (see Table S5.1 in the Supplementary Information (SI)). The FMN cofactor was in the oxidized state in the FMN domain. Additional improper dihedrals were introduced to reproduce the conformation of the isoalloxazine ring observed in the crystallographic structure and in molecular geometry optimizations of flavin in both redox states.[23-25]

Table 5.1: Summary of the MD simulations of P450BM-3 in water

Starting coordinates      No. of atoms   No. of solvent molecules   No. of counter ions (Na+)   Simulation length (ns)
HEME Domain (A chain)     65650          20365                      16                          100
FMN Domain (F chain)      33483          10650                      14                          100
Complex (AF chain)        86101          26671                      30                          100

*The abbreviations A, F and AF chain will be used in the rest of the paper for the HEME domain, the FMN domain and the HEME/FMN complex, respectively.

Figure 5.1: (a) HEME cofactor and (b) FMN cofactor in stick representation, colored by element (oxygen in red, nitrogen in blue, hydrogen in green, iron or phosphorus in orange and carbon in gray), with atomic labeling according to the GROMOS96[19] topology.

5.3.3. Electron transfer tunneling

Electron tunneling from the FMN to the HEME cofactor was calculated with the program Pathways.[26,27] The method uses graph theory to estimate the donor-to-acceptor electronic coupling as influenced by the protein structure and thereby identifies the electron transfer pathways in biological electron transfer reactions.[26] The FMN-to-HEME ET pathway was identified in the crystal structure and in the P450BM-3 conformation after rearrangement, taking the C8 atom of the isoalloxazine ring of the FMN cofactor as donor and the HEME iron as acceptor, with the default parameters of Pathways. The ET pathways were visualized with VMD.[28]
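The graph-theoretical idea behind such a pathway search can be sketched as follows. This is an illustration only and not the Pathways program itself: each contact between atoms is assigned a decay factor, the coupling along a path is the product of its decay factors, and the strongest path is found with Dijkstra's algorithm on the negative logarithms. The decay expressions are the values commonly quoted for the Pathways model (covalent bond 0.6, hydrogen bond and through-space jump with an exponential distance dependence, distances in Å) and should be treated as assumptions here; the toy graph and atom labels are hypothetical.

```python
import heapq
import math

EPS_COVALENT = 0.6  # assumed decay factor per covalent bond


def eps_hbond(r_angstrom):
    """Assumed decay factor for a hydrogen-bond step."""
    return 0.36 * math.exp(-1.7 * (r_angstrom - 2.8))


def eps_space(r_angstrom):
    """Assumed decay factor for a through-space jump."""
    return 0.6 * math.exp(-1.7 * (r_angstrom - 1.4))


def best_pathway(edges, donor, acceptor):
    """Return (path, coupling decay) of the strongest donor-acceptor path.

    edges maps a node to a list of (neighbor, decay factor in (0, 1]).
    Maximizing a product of factors equals minimizing the sum of -log(factor).
    """
    best = {donor: 0.0}
    previous = {}
    heap = [(0.0, donor)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == acceptor:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, eps in edges.get(node, []):
            new_cost = cost - math.log(eps)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if acceptor not in best:
        return None, 0.0
    path = [acceptor]
    while path[-1] != donor:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best[acceptor])


# Hypothetical toy graph: a short route with one 3.0 Angstrom through-space
# jump competes with a longer, fully covalent route.
toy_edges = {
    "D": [("X", EPS_COVALENT), ("Y", EPS_COVALENT)],
    "X": [("A", eps_space(3.0))],
    "Y": [("Z", EPS_COVALENT)],
    "Z": [("W", EPS_COVALENT)],
    "W": [("A", EPS_COVALENT)],
}
toy_path, toy_coupling = best_pathway(toy_edges, "D", "A")
```

In this toy case the four-bond covalent route wins over the shorter route with a through-space jump, which illustrates why the identity of the mediating residues can change when the inter-cofactor geometry changes.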

5.4. Results <strong>and</strong> discussion<br />

5.4.1. Structural properties<br />

The structural stability and convergence of the P450BM-3 domains were examined by analyzing the root mean square deviation (RMSD), the radius of gyration (Rg) and the secondary structure elements with respect to the crystal structure during the MD simulations. Figure 5.2a shows the backbone RMSD of the proteins as a function of time. In the complex simulation, the RMSD curves of the whole AF chain and of the A chain alone reach a plateau with average values of 0.41 ± 0.03 nm and 0.36 ± 0.03 nm, respectively. The RMSD of the isolated A chain plateaus at a slightly lower value of 0.33 ± 0.02 nm. The RMSD of the F chain in the complex increases to an average of 0.25 ± 0.02 nm over the last 10 ns of simulation. The RMSD of the isolated F chain increases rapidly and stabilizes after 10 ns of simulation at an average value of 0.26 ± 0.02 nm. The radius of gyration is reported in Figure 5.2b. In the first 10 ns of the simulation, the Rg of the complex decreases by ~3.7% from the crystallographic value (2.42 nm) to an average value of 2.33 ± 0.01 nm. The Rg of the A chain, in the complex and in solution, varies by less than 1.8% with respect to the initial structure (2.16 nm; 2.12 ± 0.01 nm and 2.14 ± 0.01 nm, respectively). The F chain shows no variation from the crystal structure (1.45 nm), with averages of 1.45 ± 0.01 nm and 1.46 ± 0.01 nm for the isolated F chain and the complex simulation, respectively.
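The two quantities used above can be written down compactly. The following is a minimal sketch with hypothetical function names; it assumes that the coordinates have already been superimposed on the reference structure (the least-squares fit that normally precedes an RMSD calculation is omitted) and it uses mass weighting for Rg.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two pre-superimposed structures."""
    squared = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the center of mass."""
    total = sum(masses)
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total
        for k in range(3)
    ]
    squared = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(squared / total)


def percent_change(initial, final):
    """Relative change of a quantity in percent."""
    return 100.0 * (final - initial) / initial


# Rg of the complex: crystallographic value versus simulation average (nm).
rg_change = percent_change(2.42, 2.33)
```

Applied to the values quoted in the text, the change in Rg of the complex comes out at about -3.7%, in line with the reported compaction.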

Figure 5.2: (a) Backbone RMSD <strong>and</strong> (b) Rg with respect to crystal structure as a function <strong>of</strong> time<br />

for AF chain (black), A <strong>of</strong> AF chain (red), F <strong>of</strong> AF chain (green), A chain (blue) <strong>and</strong> F chain (orange).<br />

In P450BM-3, the A and F chains have the structurally conserved P450 and flavodoxin-like protein folds, respectively. Figure 5.3c shows the structure with labeled helices of the A chain (A to L) and the F chain (α1 to α4) and the FMN binding loops (Lβ1, Lβ3 and Lβ4). The loop regions, together with irregular structures (coils and turns), are named according to the secondary structure element (α helix or β sheet) preceding them. DSSP criteria[29] were used to follow the secondary structure of the P450BM-3 domains in the isolated and complex MD simulations (Figure S5.1 in SI). The secondary structure remains fairly conserved during the simulations.

Figures 5.3a and 5.3b show the per-residue RMSD and RMSF with respect to the crystal structure, respectively. The regions involved in cofactor binding show smaller deviations and fluctuations from the crystal structure in both the isolated (red) and complex (black) simulations. For both domains, the loop regions and the N- and C-termini show higher deviations. The isolated domains deviate more than those in the complex, except for the regions between helices A - B, B’ - C, H - I and K - L and the G helix in the A chain, and Lβ3 in the F chain. The isolated F chain shows its largest deviations in the Lβ2 and Lβ4 regions. In both systems, the F chain shows higher fluctuations in the Lβ2 and Lα2 loops. In the complex simulation, the A/B and F/G loop regions fluctuate more, while in the isolated F chain simulation the inner FMN cofactor binding loop Lβ3 fluctuates slightly more.

Figure 5.3: Backbone RMSD (a) <strong>and</strong> RMSF (b) per residue with respect to crystal structure for<br />

isolated domains (in red) <strong>and</strong> in complex (in black) MD simulations. The green vertical line<br />

separates HEME <strong>and</strong> FMN domains. Horizontal bars, in blue <strong>and</strong> orange color represent helices<br />

(labeled) <strong>and</strong> beta sheets, respectively. The regions involved in c<strong>of</strong>actor binding are represented by<br />

horizontal bars in purple color. (c) HEME <strong>and</strong> FMN domain are in cartoon representation in sky<br />

blue <strong>and</strong> tan color, respectively. HEME <strong>and</strong> FMN c<strong>of</strong>actor are in red <strong>and</strong> green color, respectively.<br />

Helices, c<strong>of</strong>actors, FMN binding regions <strong>and</strong>, N- <strong>and</strong> C- terminus are labeled.<br />

5.4.2. Cluster analysis<br />

The first two clusters of the AF chain account for 46.16 % and 27.12 %, respectively. In the isolated domain simulations, the A chain and F chain have 6 clusters, whereas in the complex simulation they have 7 and 8 clusters, respectively. The first two clusters of the A chain cover 76.87 % and 10.99 % in the complex and 46.53 % and 30.12 % as an isolated domain, respectively. The first cluster of the F chain covers ~64 % in both simulations, and the second cluster covers 21.05 % and 23.05 % in the complex and isolated simulations, respectively. The A chain is therefore more prone to conformational change in the isolated simulation than in the complex, while the F chain shows a negligible difference in conformational space between the two simulations.
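The text does not state which clustering algorithm produced these populations. As an illustration of how cluster populations of this kind arise, the sketch below implements one common cutoff-based scheme on a precomputed pairwise RMSD matrix: the structure with the most neighbors within the cutoff becomes a cluster center, it is removed together with its neighbors, and the procedure is repeated. The function names, the toy matrix and the cutoff value are assumptions.

```python
def cluster_by_cutoff(dist, cutoff):
    """Greedy neighbor-count clustering on a symmetric distance matrix.

    Returns a list of (center index, sorted member indices), largest first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Neighbors of every remaining structure, itself included.
        neighbors = {
            i: [j for j in remaining if dist[i][j] <= cutoff]
            for i in remaining
        }
        # Center: most neighbors; ties are resolved by the lowest index.
        center = min(remaining, key=lambda i: (-len(neighbors[i]), i))
        members = sorted(neighbors[center])
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


def populations(clusters, n_total):
    """Cluster populations in percent of all structures."""
    return [100.0 * len(members) / n_total for _center, members in clusters]


# Hypothetical pairwise RMSD matrix (nm) for five structures.
toy_dist = [
    [0.00, 0.01, 0.01, 0.50, 0.50],
    [0.01, 0.00, 0.01, 0.50, 0.50],
    [0.01, 0.01, 0.00, 0.50, 0.50],
    [0.50, 0.50, 0.50, 0.00, 0.02],
    [0.50, 0.50, 0.50, 0.02, 0.00],
]
toy_clusters = cluster_by_cutoff(toy_dist, cutoff=0.04)
toy_populations = populations(toy_clusters, len(toy_dist))
```

In the toy example the first cluster holds three of the five structures and the second holds the remaining two. Percentages such as those quoted above are obtained in the same way from the number of trajectory frames assigned to each cluster.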

Figure 5.4: The representative conformation <strong>of</strong> first cluster <strong>of</strong> (a) AF, (b) A <strong>and</strong> (c) F chain in<br />

cartoon representation superimposed with crystal structure. In the crystal structure, A <strong>and</strong> F chain<br />

are in sky blue <strong>and</strong> tan color, respectively. For the complex simulation, A <strong>and</strong> F chain are in dark<br />

blue <strong>and</strong> brown color, respectively. For the isolated domain simulation, A <strong>and</strong> F chain are in orange<br />

<strong>and</strong> purple color, respectively. HEME <strong>and</strong> FMN c<strong>of</strong>actors are in green, red <strong>and</strong> blue color in crystal<br />

structure, isolated domain <strong>and</strong> in complex structure, respectively. The helices <strong>and</strong> FMN c<strong>of</strong>actor<br />

binding loops are labeled. Amino <strong>and</strong> carboxy terminal <strong>of</strong> the domain are labeled in red color.<br />

Figures 5.4a, 5.4b and 5.4c show the crystal structure superimposed on the representative conformation of the first cluster of the AF chain, the A chain and the F chain, respectively, from the isolated and complex simulations. Major differences were observed in the loop regions of the domains in both simulations. The N-terminal region of the A chain (residues 20 – 82, including the A, B and B’ helices) deviates more from the crystal structure in both simulations. In the complex simulation, larger deviations were observed in the G helix and in the H/I and K/L (residues 380 - 390) loop regions of the A chain. Residues 380 – 390 in the K/L loop region precede the HEME cofactor binding region. The H/I and K/L loops are involved in the binding of the FMN domain. The α2 helix of the F chain shows a larger deviation in the isolated F chain simulation, resulting in a more compact conformation of the FMN domain in solution than in the complex. In the complex simulation, the representative structure of the first cluster of the AF chain shows a conformational rearrangement of both domains that increases the compactness of the AF complex. The deviations of both domains from the crystal structure mainly involve the G helix and the H/I and K/L loops of the A chain and a displacement of the F chain towards the HEME domain, which decreases the minimum distance between the two cofactors.

5.4.3. Substrate access channel<br />

Pro45 and Ala191 are located at the mouth of the substrate access channel. In the crystal structure of the P450BM-3 complex, the P45Cα - A191Cα distance is 1.61 nm (0.87 nm in the A chain of 1BU7). Using MD simulation and docking, Chang et al. observed that substrate binding was not dramatically affected by closure of the substrate access channel in P450BM-3.[30] Roccatano et al. assessed the behavior of the substrate access channel by monitoring the distance between these two residues.[22] The P45Cα - A191Cα minimum distance during the isolated domain and complex simulations is reported in Figure 5.5. Both simulations show larger variations in the P45Cα - A191Cα distance during the first 20 ns. After that, the isolated A chain maintains an average distance of 1.11 ± 0.10 nm with slight variations. In the A chain of the AF complex, the P45Cα - A191Cα distance continues to decrease until 32 ns and then reaches an average distance of 0.59 ± 0.10 nm. Compared with the isolated A chain, the substrate access channel is therefore partially closed in the A chain of the complex, which might result from the larger deviation of the F/G loop in the complex than in the isolated domain.<br />
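The distance analysis described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis tool used in the thesis: the coordinates are synthetic stand-ins, the function names `pair_distance` and `mean_sd_after` are hypothetical, periodic boundary conditions are ignored, and the 20 ns cut-off simply mirrors the equilibration window mentioned in the text.

```python
import numpy as np


def pair_distance(traj_a, traj_b):
    """Per-frame distance between two atoms; inputs are (n_frames, 3) arrays in nm."""
    diff = np.asarray(traj_a, dtype=float) - np.asarray(traj_b, dtype=float)
    return np.linalg.norm(diff, axis=1)


def mean_sd_after(times_ns, values, t_start_ns):
    """Mean and standard deviation of the values sampled at or after t_start_ns."""
    times_ns = np.asarray(times_ns, dtype=float)
    values = np.asarray(values, dtype=float)
    selected = values[times_ns >= t_start_ns]
    return float(selected.mean()), float(selected.std())


# Synthetic stand-in coordinates (illustrative only, NOT thesis data).
rng = np.random.default_rng(0)
times = np.arange(0.0, 100.0, 0.1)  # ns
ca_p45 = rng.normal(0.0, 0.02, size=(times.size, 3))
ca_a191 = np.array([1.1, 0.0, 0.0]) + rng.normal(0.0, 0.02, size=(times.size, 3))

dist = pair_distance(ca_p45, ca_a191)
avg, sd = mean_sd_after(times, dist, t_start_ns=20.0)
print(f"average distance after 20 ns: {avg:.2f} +/- {sd:.2f} nm")
```

With real data, the two coordinate arrays would hold the Cα positions of Pro45 and Ala191 extracted from the trajectory.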

Figure 5.5: Minimum distance between P45Cα and A191Cα as a function of time for the A chain in the isolated (in red) and complex (in black) simulations.<br />

In the crystal structure, the crystallographic water molecule is not ligated to the heme iron (Fe) (distance > 6 nm in 1BVY and 0.24 nm in 1BU7). When the A chain (crystal structure) was solvated in water, a water molecule was present at a distance of 0.47 nm and 0.34 nm from the heme iron in the isolated and complex simulations, respectively. Figure S5.2 in the SI shows the minimum distance between Fe and the water molecules (sampled every 100 ps). Average distances of 0.28 ± 0.13 nm and 0.34 ± 0.14 nm were observed for the A chain in the complex and isolated domain simulations, respectively.<br />


5.4.4. ET tunneling pathways<br />

The minimum distance between the heavy atoms of the isoalloxazine ring of the FMN cofactor and the HEME cofactor is shown in Figure 5.6 (the AF chain simulation was extended to 150 ns to check the convergence of this distance). During the simulation, the FMN/HEME distance decreases from 1.81 nm (in the crystal structure) to an average of 1.41 ± 0.09 nm, with a minimum of 1.02 nm. The average distance lies within the range expected for ET between redox centers[17] (1.40 - 1.50 nm), as also proposed by Munro et al.[11] The decreased distance might result in an ET rate of 10⁸ to 10¹¹ s⁻¹, which is consistent with experimental and theoretical observations.[11,17]<br />
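A minimum distance between two groups of atoms, as used above for the cofactors, is the smallest of all pairwise distances in each frame. The sketch below shows the idea with NumPy. The function names are hypothetical, the coordinates in the usage lines are made up, and periodic boundary conditions are not handled.

```python
import numpy as np


def min_group_distance(group_a, group_b):
    """Smallest pairwise distance between two coordinate sets of shape (n, 3) and (m, 3)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Broadcast to an (n, m, 3) array of difference vectors.
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


def min_distance_series(traj_a, traj_b):
    """Per-frame minimum distance; each trajectory has shape (n_frames, n_atoms, 3)."""
    return np.array([min_group_distance(fa, fb) for fa, fb in zip(traj_a, traj_b)])


# Made-up coordinates in nm (illustrative only, NOT thesis data).
ring_atoms = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
heme_atoms = [[1.5, 0.0, 0.0], [1.9, 0.2, 0.0]]
print(min_group_distance(ring_atoms, heme_atoms))
```

For a trajectory, `min_distance_series` returns one value per frame, which can then be averaged in the same way as any other time series.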

Figure 5.6: Minimum distance between the heavy atoms of the isoalloxazine ring of the FMN cofactor and the HEME cofactor as a function of time. The red horizontal line shows the distance observed in the crystal structure.[15]<br />


Figures 5.7a, 5.7b and 5.7c show the ET pathway identified by the Pathways VMD plugin[27] in the crystal structure (minimum distance 1.80 nm), in the representative structure of the first cluster of the AF chain (minimum distance 1.41 nm) and in the AF chain conformation with the minimum FMN to HEME cofactor distance (1.10 nm), respectively. The results of the analysis are summarized in Table 5.2. In the crystal structure, FMN to HEME ET tunneling is also mediated by solvent molecules, but after the rearrangement of the AF conformation the FMN cofactor comes closer to the HEME cofactor, which eliminates the involvement of water molecules in ET tunneling. In Figure 5.7c, where the FMN to HEME distance is ~ 1 nm, ET tunneling is mediated by the Met490 residue only; the ET pathway length decreases from 2.7 nm (in the crystal structure) to 1.8 nm and the electronic coupling increases from 4.00 × 10⁻¹⁰ to 2.68 × 10⁻⁸ a.u.<br />

Table 5.2: Electron transfer tunneling pathway in the AF chain complex calculated by the Pathways[27] VMD plugin.<br />

Coordinates | FMN/HEME minimum distance (nm) | Max. coupling (a.u.) | Distance along ET pathway (nm) | Amino acids involved in the ET pathway<br />
Crystal structure | 1.8 | 4.00 × 10⁻¹⁰ | 2.70 | FMN(C8) → M490 → Sol → Sol → A399 → C400 → HEME(FE)<br />
First cluster | 1.4 | 9.07 × 10⁻⁹ | 1.96 | FMN(C8) → M490 → F393 → HEME(FE)<br />
Minimum FMN/HEME distance | 1.1 | 2.68 × 10⁻⁸ | 1.79 | FMN(C8) → M490 → HEME(FE)<br />
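To illustrate how a pathway coupling of this kind is built up, the sketch below multiplies per-step decay factors in the spirit of the Pathways model of Beratan et al.[26] The decay parameters are the commonly quoted ones (0.6 per covalent bond, with exponential distance penalties for hydrogen bonds and through-space jumps). They are an assumption here, since the plugin settings used for Table 5.2 are not stated. The step lists are hypothetical and do not reproduce the pathways in the table.

```python
import math

EPS_COVALENT = 0.6  # assumed decay factor per covalent bond


def step_decay(kind, distance_nm=None):
    """Decay factor for one step of a tunneling pathway (distances given in nm)."""
    if kind == "covalent":
        return EPS_COVALENT
    r_angstrom = distance_nm * 10.0  # the decay constants below are per angstrom
    if kind == "hbond":
        return EPS_COVALENT ** 2 * math.exp(-1.7 * (r_angstrom - 2.8))
    if kind == "space":
        return EPS_COVALENT * math.exp(-1.7 * (r_angstrom - 1.4))
    raise ValueError(f"unknown step kind: {kind}")


def path_coupling(steps):
    """Relative coupling of a pathway: the product of its step decay factors."""
    product = 1.0
    for kind, distance_nm in steps:
        product *= step_decay(kind, distance_nm)
    return product


# Hypothetical pathways (illustrative only): a direct route and a solvent-mediated one.
direct = [("covalent", None)] * 4 + [("space", 0.35)]
via_solvent = direct + [("hbond", 0.30), ("hbond", 0.30)]
print(path_coupling(direct), path_coupling(via_solvent))
```

Every additional step multiplies the coupling by a factor below one, which is why a shorter pathway without solvent bridges gives a larger coupling. As a rough guide, and assuming the nonadiabatic ET rate scales with the square of the coupling with all other factors equal, the 67-fold increase in coupling between the first and last rows of Table 5.2 would correspond to a rate increase of about 4.5 × 10³.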


Figure 5.7: ET tunneling from the isoalloxazine ring (C8 atom) of the FMN cofactor (in gray) to the iron center of the HEME cofactor (in black), represented by red tubes in (a) the crystal structure, (b) the conformation of the first cluster and (c) the conformation with the minimum distance between the HEME and FMN cofactors. The amino acids within 0.50 nm of both cofactors are labeled and shown in licorice representation colored by element type (oxygen in red, carbon in cyan and nitrogen in blue); their associated secondary structure is shown in cartoon representation, in sky blue for the HEME domain and in orange for the FMN domain. The residues involved in electron tunneling are represented and labeled in green.<br />

5.4.5. Essential dynamics<br />

The cumulative sum of the relative positional fluctuation (RPF) of the first 50 eigenvectors of the A and F chains in the isolated and complex simulations is greater than 69% and is reported in Figure S5.3 of the SI. The root mean square inner product (RMSIP) for the first twenty eigenvectors of the A chain and the F chain between the two simulations was less than 0.53. The inner product values of the first three eigenvectors for the A and F chains were less than 0.25 and 0.43, respectively. The overlap and inner product analyses indicate that the eigenvectors from the same time windows of the two trajectories describe different sets of collective motions.<br />
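The quantities used above can be sketched as follows. The example diagonalizes the covariance matrix of a coordinate trajectory, reports the cumulative fraction of the total fluctuation captured by the first eigenvectors, and computes the RMSIP between two eigenvector sets. It is a minimal sketch: the function names are hypothetical, the trajectories are random stand-ins, no structural fitting or mass weighting is applied, and treating the cumulative RPF as an eigenvalue fraction is an assumption about the definition used in the thesis.

```python
import numpy as np


def essential_modes(coords):
    """PCA of a trajectory of shape (n_frames, n_dof).

    Returns eigenvalues in descending order and the eigenvectors as columns.
    """
    x = np.asarray(coords, dtype=float)
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / (x.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


def cumulative_fraction(evals, n):
    """Fraction of the total fluctuation captured by the first n eigenvectors."""
    evals = np.asarray(evals, dtype=float)
    return float(evals[:n].sum() / evals.sum())


def rmsip(evecs_a, evecs_b, n):
    """Root mean square inner product of the first n eigenvectors of two sets."""
    overlap = np.asarray(evecs_a)[:, :n].T @ np.asarray(evecs_b)[:, :n]
    return float(np.sqrt((overlap ** 2).sum() / n))


# Random stand-in trajectories with different dominant directions (NOT thesis data).
rng = np.random.default_rng(1)
scales = np.array([5.0, 1.0, 0.1])
traj_a = rng.normal(size=(500, 3)) * scales
traj_b = rng.normal(size=(500, 3)) * scales[::-1]
evals_a, evecs_a = essential_modes(traj_a)
evals_b, evecs_b = essential_modes(traj_b)
print(cumulative_fraction(evals_a, 1), rmsip(evecs_a, evecs_b, 1))
```

An RMSIP of 1 means the two subspaces coincide and 0 means they are orthogonal, so values such as the 0.53 reported above indicate only partial overlap of the essential subspaces.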

Figures 5.8a, 5.8b and 5.8c show the RPF associated with the first three eigenvectors of the A and F chains in the isolated (in red) and complex (in black) simulations. Figure 5.9 shows the RMSF associated with the first three eigenvectors (a, b and c) of the A (in sky blue) and F (in tan) chains in the complex (a1, b1 and c1) and isolated (a2, b2 and c2) simulations, respectively.<br />
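The per-atom fluctuation along a single eigenvector, and the sequence of frames between its two extremes, can be obtained by projecting the trajectory onto that eigenvector. The sketch below is one way to do this under stated assumptions: the function names are hypothetical, the two-atom trajectory is a toy example, and the eigenvector is taken to be a unit vector over all 3N coordinates.

```python
import numpy as np


def rmsf_along_mode(coords, mode):
    """Per-atom RMSF of a trajectory filtered along one eigenvector.

    coords has shape (n_frames, n_atoms, 3); mode is a unit vector of length 3 * n_atoms.
    """
    x = np.asarray(coords, dtype=float)
    mode = np.asarray(mode, dtype=float)
    n_frames, n_atoms, _ = x.shape
    flat = x.reshape(n_frames, -1)
    projection = (flat - flat.mean(axis=0)) @ mode  # one amplitude per frame
    displacement = np.outer(projection, mode).reshape(n_frames, n_atoms, 3)
    return np.sqrt((displacement ** 2).sum(axis=2).mean(axis=0))


def frames_between_extremes(coords, mode, n_frames_out=10):
    """Structures interpolated between the two extreme projections along the mode."""
    x = np.asarray(coords, dtype=float)
    mode = np.asarray(mode, dtype=float)
    n_frames, n_atoms, _ = x.shape
    flat = x.reshape(n_frames, -1)
    mean = flat.mean(axis=0)
    projection = (flat - mean) @ mode
    amplitudes = np.linspace(projection.min(), projection.max(), n_frames_out)
    return (mean + np.outer(amplitudes, mode)).reshape(n_frames_out, n_atoms, 3)


# Toy trajectory: atom 0 moves along x, atom 1 is fixed (illustrative only).
toy = np.array([[[-1.0, 0.0, 0.0], [2.0, 2.0, 2.0]],
                [[1.0, 0.0, 0.0], [2.0, 2.0, 2.0]]])
x_mode = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(rmsf_along_mode(toy, x_mode))
```

In the toy case only atom 0 fluctuates along the chosen mode, so its RMSF is non-zero while that of atom 1 vanishes. With a real trajectory, the mode would be one of the eigenvectors of the covariance matrix.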

Figure 5.8: RPF for the (a) first, (b) second and (c) third eigenvector of the A and F chains in the isolated (red) and complex (black) simulations. The green vertical line separates the HEME and FMN domains. Horizontal bars in blue and orange represent helices (labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by horizontal bars in purple.<br />

In the complex simulation, the first collective motion (Figure 5.9a1) of the A chain involves the turn following beta sheet 1 (residues 44 – 48, with the highest RPF for Arg47, which is involved in substrate binding), the D/E loop (residues 130 – 138), the F/G loop (residues 190 – 196), the K/L loop (residues 385 – 390) and the C-terminal loops (residues 425 – 432 and 452 – 458). The cooperative motion of the turn following beta sheet 1 and the F/G loop is related to the opening and closing of the substrate access channel. Residue F393, in the latter part of the K/L loop, is involved in FMN domain binding and was found to take part in ET tunneling in the average structure of the first cluster of the AF chain. The first collective mode of the F chain in the complex is dominated by the Lα2 loop, with slightly higher RPF for Lβ2 and Lβ3 (the inner FMN binding loop). The cooperative motion of Lα2 and Lβ3 might facilitate ET tunneling from the FMN to the HEME cofactor. In the complex, the collective motions of the two domains are synchronized, coupling ET tunneling to changes in substrate binding. This effect is clearly seen when the first eigenvector of the AF chain is compared with those of the A and F chains (reported in Figures S5.4a and S5.5a in the SI). In the AF chain and in the A and F chains, the first eigenvector shows fluctuations in the same regions, with higher fluctuations in the collective mode of the AF chain that reflect the cooperative effect of binding.<br />

The second collective mode of the A chain mainly involves motion in the D/E and G/H loops, the beta sheets in the K/L region and the A/B region, whereas the third collective motion is restricted to the D/E and G/H loops and the C-terminal loop (residues 425 – 432). The F chain shows involvement of the Lα2 and Lβ2 loops and the C-terminal region in the second collective mode, and of Lα2, Lβ3 and Lβ5 in the third eigenvector. In the AF chain, the collective motion associated with the first two eigenvectors corresponds to the movement of the F chain towards the A chain, which decreases the distance between the FMN and HEME cofactors, and shows slightly higher fluctuations than in the individual chains. In the third eigenvector, the major difference is observed mainly in the Lβ3 and Lα2 loops of the F chain, which show higher fluctuations. The collective motions associated with the first three eigenvectors of the AF chain are reported in Figures S5.5a, S5.5b and S5.5c of the SI, respectively.<br />


Figure 5.9: RMSF of protein backbone atoms along the first (a), second (b) and third (c) eigenvector after projection of the trajectory on the corresponding eigenvector of the A and F chains in the complex simulation (a1, b1 and c1) and in the isolated simulation (a2, b2 and c2). The 10 sequential frames represent the extent of the fluctuations in the trajectories along the eigenvectors. The first extreme conformation is shown in green and the last extreme in violet. Other conformations of the A and F chains are in sky blue and tan, respectively. Helices and loops in the FMN domain are labeled. N and C indicate the N- and C-terminus of the protein (labeled in red).<br />

In the isolated A chain, the first collective mode (Figure 5.9a2) shows higher RPF at the end of the C helix (residues 103 – 107) and at the C-terminus (residues 452 – 458). Other regions involved in the first collective mode are the D/E, E/F, F/G and K/L (residues 385 – 390) loops. Together, these motions are related to changes in the substrate binding region and the FMN domain binding region. The first collective mode of the isolated F chain shows higher RPF in Lα2 and Lβ2, and slightly higher RPF in Lβ3 and Lβ4. In the isolated domains, the collective motions are more independent: in the F chain they are related to binding of the FMN cofactor, and in the A chain they are restricted to the substrate binding region. The second collective mode of the A chain mainly involves motion in the D/E, E/F and F/G loops, and the third collective motion involves only the F/G region. The F chain shows involvement of Lα2 and Lβ2 in the second collective mode and of Lα2 and Lβ3 in the third eigenvector.<br />

5.5. Conclusions<br />

We performed MD simulations of the HEME and FMN domains, both as isolated domains and in complex. The structure remains conserved in both systems throughout the simulations. The HEME/FMN complex undergoes a conformational rearrangement in the first 10 ns of simulation (with a decrease in Rg from 2.42 nm to 2.33 nm), which results in a more compact complex and a decrease in the FMN/HEME distance from 1.81 nm to an average of 1.41 nm. In the absence of the HEME domain, the FMN domain in solution shows a major conformational change in the Lα2 loop. In the isolated HEME domain, major conformational changes were observed in the FMN binding region, especially in the C helix and the H/I and K/L (residues 385 – 395) loops. The G helix and the inner FMN cofactor binding loop (Lβ3) fluctuate more in both simulations. Both domains differ in atomic fluctuation amplitude between the isolated and complex simulations. In the complex, the collective motion is dominated by the interaction between the HEME and FMN domains and the associated change in the substrate access channel. The movement of the FMN domain over the HEME domain might be related to the ET mechanism in P450BM-3, as proposed earlier, and might be responsible for an ET rate between the two domains in the range of 10⁸ to 10¹¹ s⁻¹ under physiological conditions, as observed experimentally and proposed theoretically.[11]<br />
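The compaction summarized above is measured by the radius of gyration (Rg), the mass-weighted root mean square distance of the atoms from their center of mass. A minimal sketch of that definition follows. The function name is hypothetical and the two-atom input is a toy example, not thesis data.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for coordinates of shape (n_atoms, 3)."""
    r = np.asarray(coords, dtype=float)
    m = np.asarray(masses, dtype=float)
    center_of_mass = (m[:, None] * r).sum(axis=0) / m.sum()
    squared = ((r - center_of_mass) ** 2).sum(axis=1)
    return float(np.sqrt((m * squared).sum() / m.sum()))


# Toy example: two equal masses 2 nm apart give Rg = 1 nm.
print(radius_of_gyration([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [12.0, 12.0]))
```

Evaluating this function frame by frame gives the Rg time series from which a decrease such as 2.42 nm to 2.33 nm would be read.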

5.6. References<br />

1. Chefson A, Auclair K (2006) Progress towards the easier use of P450 enzymes. Mol Biosyst 2: 462-469.<br />

2. Wong LL (1998) Cytochrome P450 monooxygenases. Curr Opin Chem Biol 2: 263-268.<br />

3. Guengerich FP (2001) Common and uncommon cytochrome P450 reactions related to metabolism and chemical toxicity. Chem Res Toxicol 14: 611-650.<br />

4. Urlacher VB, Eiben S (2006) Cytochrome P450 monooxygenases: perspectives for synthetic application. Trends Biotechnol 24: 324-330.<br />

5. Bernhardt R (2006) Cytochromes P450 as versatile biocatalysts. J Biotechnol 124: 128-145.<br />

6. Coon MJ (2005) Cytochrome P450: nature's most versatile biological catalyst. Annu Rev Pharmacol Toxicol 45: 1-25.<br />

7. Narhi LO, Fulco AJ (1986) Characterization of a catalytically self-sufficient 119,000-dalton cytochrome P-450 monooxygenase induced by barbiturates in Bacillus megaterium. J Biol Chem 261: 7160-7169.<br />


8. Narhi LO, Fulco AJ (1987) Identification and Characterization of 2 Functional Domains in Cytochrome-P-450bm-3, a Catalytically Self-Sufficient Monooxygenase Induced by Barbiturates in Bacillus-Megaterium. J Biol Chem 262: 6683-6690.<br />

9. Munro AW, Lindsay JG, Coggins JR, Kelly SM, Price NC (1994) Structural and Enzymological Analysis of the Interaction of Isolated Domains of Cytochrome-P-450 Bm3. FEBS Letters 343: 70-74.<br />

10. Warman AJ, Roitel O, Neeli R, Girvan HM, Seward HE, et al. (2005) Flavocytochrome P450 BM3: an update on structure and mechanism of a biotechnologically important enzyme. Biochem Soc Trans 33: 747-753.<br />

11. Munro AW, Leys DG, McLean KJ, Marshall KR, Ost TW, et al. (2002) P450 BM3: the very model of a modern flavocytochrome. Trends Biochem Sci 27: 250-257.<br />

12. Girvan HM, Waltham TN, Neeli R, Collins HF, McLean KJ, et al. (2006) Flavocytochrome P450 BM3 and the origin of CYP102 fusion species. Biochem Soc Trans 34: 1173-1177.<br />

13. Peterson JA, Sevrioukova I, Truan G, GrahamLorence SE (1997) P450BM-3: A tale of two domains - Or is it three? Steroids 62: 117-123.<br />

14. Munro AW, Daff S, Coggins JR, Lindsay JG, Chapman SK (1996) Probing electron transfer in flavocytochrome P-450 BM3 and its component domains. Eur J Biochem 239: 403-409.<br />

15. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure of a cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96: 1863-1868.<br />

16. Joyce MG, Ekanem IS, Roitel O, Dunford AJ, Neeli R, et al. (2012) The crystal structure of the FAD/NADPH-binding domain of flavocytochrome P450 BM3. FEBS Journal 279: 1694-1706.<br />

17. Page CC, Moser CC, Chen X, Dutton PL (1999) Natural engineering principles of electron tunnelling in biological oxidation-reduction. Nature 402: 47-52.<br />

18. Hazzard JT, Govindaraj S, Poulos TL, Tollin G (1997) Electron transfer between the FMN and heme domains of cytochrome P450BM-3. Effects of substrate and CO. J Biol Chem 272: 7922-7926.<br />


19. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996) Biomolecular Simulation: The GROMOS96 Manual and User Guide. VdF: Hochschulverlag AG an der ETH Zurich and BIOMOS bv, Zurich, Groningen.<br />

20. Helms V, Deprez E, Gill E, Barret C, Hui Bon Hoa G, et al. (1996) Improved binding of cytochrome P450cam substrate analogues designed to fill extra space in the substrate binding pocket. Biochemistry 35: 1485-1499.<br />

21. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2005) Structural and dynamic properties of cytochrome P450 BM-3 in pure water and in a dimethylsulfoxide/water mixture. Biopolymers 78: 259-267.<br />

22. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward understanding the inactivation mechanism of monooxygenase P450 BM-3 by organic cosolvents: a molecular dynamics simulation study. Biopolymers 83: 467-476.<br />

23. Verma R, Schwaneberg U, Roccatano D Conformational Dynamics of the FMN-binding Reductase Domain of Monooxygenase P450BM-3. Unpublished.<br />

24. Walsh JD, Miller AF (2003) Flavin reduction potential tuning by substitution and bending. J Mol Struc-Theochem 623: 185-195.<br />

25. Zheng Y-J, Ornstein RL (1996) A Theoretical Study of the Structures of Flavin in Different Oxidation and Protonation States. J Am Chem Soc 118: 9402-9408.<br />

26. Beratan DN, Betts JN, Onuchic JN (1991) Protein electron transfer rates set by the bridging secondary and tertiary structure. Science 252: 1285-1288.<br />

27. Balabin IA, Hu X, Beratan DN (2012) Exploring biological electron transfer pathway dynamics with the Pathways Plugin for VMD. J Comput Chem 33: 906-910.<br />

28. Humphrey W, Dalke A, Schulten K (1996) VMD: Visual molecular dynamics. J Mol Graphics 14: 33-38.<br />

29. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.<br />

30. Chang YT, Loew GH (1999) Molecular dynamics simulations of P450 BM3--examination of substrate-induced conformational change. J Biomol Struct Dyn 16: 1189-1203.<br />

PART II: P450BM-3 HEME/FMN Complex SI<br />

Supporting Information<br />

Insight into the redox partner interaction mechanism in<br />

cytochrome P450BM-3 using molecular dynamics<br />

simulation<br />

Table S5.1: Partial charge on HEME c<strong>of</strong>actor with ferric iron.[1-3]<br />

Atom number Atom type Atom name Charge group Partial charge<br />

1 FE FE 1 1.0<br />

2 NR NA 1 -0.4<br />

3 NR NB 1 -0.4<br />

4 NR NC 1 -0.4<br />

5 NR ND 1 -0.4<br />

6 C CHA 2 -0.2<br />

7 HC HHA 2 0.2<br />

8 C C1A 3 0.2<br />

9 C C2A 3 -0.1<br />

10 C C3A 3 0.0<br />

11 C C4A 3 0.1<br />

12 CH3 CMA 4 0.0<br />

13 CH2 CAA 5 0.0<br />

14 CH2 CBA 5 0.0<br />


15 C CGA 6 0.27<br />

16 OM O1A 6 -0.635<br />

17 OM O2A 6 -0.635<br />

18 C CHB 7 -0.2<br />

19 HC HHB 7 0.2<br />

20 C C1B 8 0.05<br />

21 C C2B 8 0.05<br />

22 C C3B 8 -0.1<br />

23 C C4B 8 0.2<br />

24 CH3 CMB 9 0.0<br />

25 CR1 CAB 10 0.0<br />

26 CH2 CBB 10 0.0<br />

27 C CHC 11 -0.2<br />

28 HC HHC 11 0.2<br />

29 C C1C 12 0.2<br />

30 C C2C 12 0.0<br />

31 C C3C 12 -0.1<br />

32 C C4C 12 0.2<br />

33 CH3 CMC 13 0.0<br />

34 CR1 CAC 14 0.0<br />

35 CH2 CBC 14 0.0<br />

36 C CHD 15 -0.2<br />

37 HC HHD 15 0.2<br />

38 C C1D 16 0.2<br />

39 C C2D 16 0.1<br />

40 C C3D 16 -0.2<br />

41 C C4D 16 0.2<br />

42 CH3 CMD 17 0.0<br />

43 CH2 CAD 18 0.0<br />

44 CH2 CBD 18 0.0<br />

45 C CGD 19 0.27<br />

46 OM O1D 19 -0.635<br />

47 OM O2D 19 -0.635<br />

Figure S5.1: Secondary structure per residue calculated by DSSP[4] along the trajectory as a function of time for the HEME domain and FMN domain (a) in the complex simulation and (b) in isolation. The color code represents the different secondary structures.<br />


Figure S5.2: Minimum distance between water molecules and the HEME iron as a function of time (every 100 ps) in the isolated (in red) and complex (in black) simulations.<br />


Figure S5.3: Relative positional fluctuation of the first 50 eigenvectors of the A and F chains in the isolated and complex simulations. In the AF chain, the first 50 eigenvectors account for 80.45 % of the total RPF, with a 25.96 % contribution from the first eigenvector. The A chain has 79.28 % and 86.77 % cumulative RPF, with 27.19 % and 48.54 % contributed by the first eigenvector, in the complex and isolated domain simulations, respectively. For the F chain, the cumulative RPF of the first 50 eigenvectors is 90.96 % and 89.19 %, with 33.98 % and 35.02 % from the first eigenvector, in the complex and isolated domain simulations, respectively.<br />


Figure S5.4: RPF for the (a) first, (b) second and (c) third eigenvector of the AF chain (cyan), and of the A and F chains in the complex simulation (black). The green vertical line separates the HEME and FMN domains. Horizontal bars in blue and orange represent helices (labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by horizontal bars in purple.<br />


Figure S5.5: RMSF of protein backbone atoms along the first, second and third eigenvector after projection of the trajectory on the corresponding eigenvector of the AF chain in the complex simulation in (a), (b) and (c), respectively. The 10 sequential frames represent the extent of the fluctuations in the trajectories along the eigenvectors. The first extreme conformation is shown in green and the last extreme in violet. Other conformations of the HEME and FMN domains are in sky blue and tan, respectively. Helices and loops are labeled. N and C indicate the N- and C-terminus of the protein (labeled in red).<br />


References:<br />

1. Helms V, Deprez E, Gill E, Barret C, Hui Bon Hoa G, et al. (1996) Improved binding of cytochrome P450cam substrate analogues designed to fill extra space in the substrate binding pocket. Biochemistry 35: 1485-1499.<br />

2. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2005) Structural and dynamic properties of cytochrome P450 BM-3 in pure water and in a dimethylsulfoxide/water mixture. Biopolymers 78: 259-267.<br />

3. Roccatano D, Wong TS, Schwaneberg U, Zacharias M (2006) Toward understanding the inactivation mechanism of monooxygenase P450 BM-3 by organic cosolvents: a molecular dynamics simulation study. Biopolymers 83: 467-476.<br />

4. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.<br />

PART II: P450BM-3 HEME/FMN & CoSep<br />

Chapter 6<br />

A molecular dynamics study of the effect of cobalt(II)sepulchrate as an electron transfer mediator on the conformation and dynamics of P450BM-3<br />

6.1. Abstract<br />

The major limitation to the exploitation of P450BM-3 in industrial processes is the consumption of expensive NADPH as a reduction equivalent in the catalytic cycle. Experimentally, NADPH has also been found to inactivate the enzyme in the absence of substrate. The use of an alternative, cost-effective cofactor such as cobalt(III)sepulchrate (CoSep), with zinc dust as the electron source, has been proposed as a possible solution to overcome the latter limitation. The mechanism of interaction of cobalt(III)sepulchrate with the protein has not yet been elucidated at the molecular level. In this paper, we propose a novel model of CoSep and use it in molecular dynamics simulations to study its interaction with the isolated HEME domain and the HEME/FMN complex of P450BM-3. The aim of the study is to identify the putative binding modes of CoSep on the P450BM-3 domains and their effect on the conformation, dynamics and electron transfer (ET) tunneling of these domains. The results indicate that CoSep preferentially binds to negatively charged residues in the surface-exposed regions of the P450BM-3 domains. Two ET tunneling pathways were observed for the HEME/FMN complex in the presence of CoSep. The first runs from CoSep to the FMN isoalloxazine ring via Trp574 and then to the HEME iron, mediated by a water molecule at the interface, Met490 (ribityl tail of the FMN cofactor binding loop) and Phe394; the same residues were involved in ET tunneling in the water simulation of the P450BM-3 domains. The second ET pathway runs from CoSep to the HEME iron via Ile102 and Leu103 (C helix residues) and Ile401 and Cys400. In the isolated HEME domain, ET tunneling involves the residues of the B’/C loop, Asp84, Gly85, Leu86 and Phe87. Collective motions of different amplitude were observed in both systems and were found to facilitate ET tunneling from CoSep to the HEME iron.<br />

6.2. Introduction<br />

Cytochrome P450 monooxygenases, the largest superfamily of heme-containing soluble proteins,<br />
are widespread in almost all domains of life, e.g. bacteria, yeast, insects, mammalian tissues,<br />
and plants.[1-3] Using molecular oxygen, they catalyze the oxidation of a wide variety of<br />
substrates involved in biosynthesis and biodegradation pathways, or in xenobiotic<br />
metabolism,[4] in the presence of reducing equivalents. Their high stereoselectivity and the<br />
large variety of possible substrates make these enzymes particularly interesting for industrial<br />
applications. However, their complexity, low solubility, low catalytic turnover and, in<br />
particular, their dependence on an expensive source of electrons have so far limited their<br />
use.[4] Cytochrome P450BM-3, from the soil bacterium Bacillus megaterium, is one of the most<br />
widely studied members of this family.[5,6] Being soluble and self-sufficient (the P450 and<br />
reductase domains are linked together on a single polypeptide chain), P450BM-3 has a higher<br />
catalytic turnover and is easy to express and purify in cell-free medium.[7] Protein engineering<br />
approaches have been used successfully to increase the technological viability of P450BM-3 by<br />
fine-tuning its catalytic parameters and substrate recognition.[8,9] In recent years, rapid<br />
progress has also been made towards improving the cost-effectiveness of the P450BM-3 catalytic<br />
reaction by regenerating or substituting the expensive cofactor (NADPH or NADH) used as the<br />
source of electrons.[7] The electrochemistry of<br />

P450BM-3 has received considerable attention, and various methods have enabled either direct<br />
electron transfer (from electrode to protein via conducting polymer films such as BaytronP)[10]<br />
or mediated electron transfer (shuttling electrons from the electrode to the protein via small<br />
electroactive compounds, e.g. Zn dust as the source of electrons with Co(III) sepulchrate as the<br />
electron mediator) to drive the catalytic cycle.[11,12] Protein engineering via directed<br />
evolution and rational design offers an attractive way to improve the enzymatic properties and<br />
to enhance the electrochemical performance of the enzyme.[11-13] In this paper, we performed<br />
molecular dynamics simulations to gain insight into the interaction mechanism of the P450BM-3<br />
domains with cobalt(II) sepulchrate (CoSep) as an electron transfer mediator. The results help<br />
to clarify the effect of CoSep binding on the conformation, dynamics and ET tunneling of the<br />
P450BM-3 domains.<br />

The chapter is organized as follows. The details of the MD simulations and of the force field<br />
modeling of CoSep are reported in the Methods section. In the Results and Discussion section,<br />
the preferential binding sites of CoSep on the P450BM-3 domains are reported first, followed by<br />
the ET tunneling from CoSep to the P450BM-3 domains. The collective dynamics of the systems are<br />
then analyzed using principal component analysis of the trajectories. Finally, the Conclusions<br />
section summarizes the outcome of the study.<br />

6.3. Methods<br />

6.3.1. Starting coordinates<br />

The non-stoichiometric complex of one FMN domain with two HEME domains, without substrate,<br />
was used as the starting structure (PDB ID: 1BVY, resolution 0.203 nm).[14] For the MD<br />
simulations, the HEME domain (chain A: 20 - 450) associated with the FMN domain (chain F:<br />
479 - 630) was extracted from the starting coordinates together with the crystallographic<br />
water molecules within 0.60 nm of the protein, selected using the VMD software.[15] The<br />
1,2-ethanediol molecules of the crystallographic structure were removed and replaced by<br />
water molecules. MD simulations were set up for the isolated HEME-binding domain and for<br />
the HEME/FMN complex in a water-CoSep mixture.<br />

6.3.2. Molecular dynamics simulation and modeling<br />

The GROMOS96 43a1 force field[16] was used for all simulations. The MD simulations performed<br />
in this study are summarized in Table 6.1. The HEME cofactor parameters for ferric iron were<br />
adopted from Helms et al.[17]; they had already been employed for the MD simulation of the<br />
P450BM-3 HEME domain by Roccatano et al.[18,19] The partial charges were redistributed over<br />
the porphyrin ring of the HEME cofactor to adapt the parameters to the GROMOS96 43a1 force<br />
field,[16] with hydrogen atoms bound to the bridging carbons of the porphyrin ring.[20] The FMN<br />
cofactor in the FMN domain was in the oxidized state. Additional improper dihedrals were<br />
introduced to reproduce the conformation of the isoalloxazine ring observed in the<br />
crystallographic structure and in the molecular geometry optimization of flavin in both redox<br />
states.[21,22] Details of the modified force field for FMN are reported in a previous<br />
paper.[23]<br />

For CoSep (schematically represented in Figure 6.1), the force field parameters for bonds and<br />
bond angles were adapted from Dehayes et al.[24] (the values are reported in Table S6.1 in the<br />
Supporting Information (SI)). The non-bonded parameters were adopted from the GROMOS96 43a1<br />
force field.[16]<br />

Density functional theory calculations using the Becke3LYP method[25] with the LanL2DZ basis<br />
set[26] were used for the geometry optimization. Atomic partial charges were derived using the<br />
CHelpG scheme[27] and constrained to reproduce the dipole moment (the partial charges are<br />
reported in Table S6.2 in SI). An ionic radius of 0.075 nm for Co2+ was used to fit the<br />
electrostatic potential. All calculations were performed using the Gaussian09 package.[28]<br />

Forty molecules of CoSep were randomly placed in the simulation box, which was then solvated<br />
by stacking equilibrated boxes of solvent molecules to fill it. The CoSep concentration was<br />
equal to ~0.5 mM, which corresponds to the one used experimentally for the fastest<br />
biotransformation by P450BM-3 using Zn dust and cobalt(III) sepulchrate as the alternative<br />
electron transfer system.[12]<br />

Figure 6.1: CoSep in ball-and-stick representation, colored by element (nitrogen in blue,<br />
hydrogen in green and carbon in gray), with atom names and numbers labeled (except for<br />
hydrogen).<br />

Table 6.1: Summary of the MD simulations of P450BM-3 domains in water and in CoSep solution.<br />
Starting coordinates | No. of atoms | No. of CoSep | No. of solvent molecules | No. of counter ions | Simulation length (ns)<br />
Heme domain (A chain) | 65650 | - | 20365 | 16 Na+ | 100<br />
FMN domain (F chain) | 33483 | - | 10650 | 14 Na+ | 100<br />
Complex (AF chain) | 86101 | - | 26671 | 30 Na+ | 100<br />
A chain & CoSep | 64597 | 40 | 19638 | 64 Cl- | 100<br />
AF chain & CoSep | 85275 | 40 | 26029 | 50 Cl- | 100<br />
*The abbreviations A, F and AF are used in the rest of the paper for the HEME domain, the FMN domain and the<br />
HEME/FMN complex, respectively.<br />
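The counter-ion counts in Table 6.1 can be cross-checked with a simple charge balance. The<br />
sketch below is illustrative only: the function name is hypothetical, the net protein charges<br />
(-16 for A, -30 for AF) are inferred here from the Na+ counts of the water simulations, and a<br />
charge of +2 per CoSep is assumed, in line with the Co2+ used for the charge fitting.<br />

```python
def counter_ions(protein_charge, n_cosep=0, cosep_charge=2):
    """Return (ion_type, count) needed to neutralize the simulation box.

    protein_charge : net charge of the protein (assumed, inferred from Table 6.1)
    n_cosep        : number of CoSep molecules in the box
    cosep_charge   : charge per CoSep molecule (assumption: +2)
    """
    net = protein_charge + n_cosep * cosep_charge
    if net < 0:
        return ("Na+", -net)   # negative system: add sodium ions
    if net > 0:
        return ("Cl-", net)    # positive system: add chloride ions
    return (None, 0)           # already neutral


# Rows of Table 6.1 under the assumptions stated above.
a_water = counter_ions(-16)
af_water = counter_ions(-30)
a_cosep = counter_ions(-16, n_cosep=40)
af_cosep = counter_ions(-30, n_cosep=40)
print(a_water, af_water, a_cosep, af_cosep)
```

Under these assumptions the balance reproduces the 64 Cl- and 50 Cl- listed for the two CoSep systems.<br />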

6.4. Results and discussion<br />

The differences in conformation and dynamics between the isolated FMN and HEME domains and<br />
the HEME/FMN complex have been discussed in our previous papers.[20,23] Herein, we focus on<br />
the effect of CoSep binding on the conformation, dynamics and ET tunneling of the P450BM-3<br />
domains in the isolated HEME domain and in the HEME/FMN complex. The presence of CoSep does<br />
not significantly affect the structure of the P450BM-3 domains. The structural stability and<br />
convergence of the P450BM-3 domains in CoSep solution were compared with those in water and<br />
are reported in SI through the backbone root mean square deviation (RMSD) (Figure S6.1), the<br />
radius of gyration (Rg) (Figure S6.2) and the backbone RMSD and RMSF per residue (Figure S6.3a<br />
and S6.3b, respectively), using the crystal structure as reference. In CoSep solution, both the<br />
HEME domain and the complex show the same behavior as in pure water. The backbone RMSD of the<br />
HEME domain in CoSep solution reaches a plateau with an average value of 0.25 ± 0.01 nm after<br />
10 ns of simulation (Figure S6.1 in SI), and the domain shows the lowest RMS deviations and<br />
fluctuations (Figure S6.3a and S6.3b in SI). In contrast, the AF complex shows the largest<br />
deviation in the residues of the H helix (Figure S6.3a in SI).<br />
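For reference, the two global stability measures used above can be written in a few lines.<br />
This is a minimal sketch, not the analysis code used for the thesis: the function names are<br />
hypothetical, the coordinates are toy values in nm, and the least-squares superposition that<br />
normally precedes an RMSD calculation is deliberately left out.<br />

```python
import math


def rmsd(coords, ref):
    """RMSD between two equally sized coordinate sets (no superposition)."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords, ref))
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; unit masses if none are given."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    # Center of mass, one component at a time.
    com = tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )
    squared = sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(squared / total)


# Toy example: two atoms, the second frame shifted by 0.3 nm along y.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.3, 0.0), (1.0, 0.3, 0.0)]
frame_rmsd = rmsd(frame, reference)
frame_rg = radius_of_gyration(frame)
```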


6.4.1. CoSep binding on P450BM-3 domains<br />

At the end of the simulations of both the isolated HEME domain and the complex, CoSep<br />
molecules were found bound mainly to the surface-exposed loop regions of the protein (see<br />
Figure S6.4 of SI). The average minimum distances between the CoSep molecules and the HEME<br />
iron and the isoalloxazine ring along the simulations are reported in Figure S6.5 of SI. After<br />
20 ns of simulation, CoSep molecules approach the HEME domain to within an average distance of<br />
1.72 ± 0.44 nm in the isolated protein and 1.94 ± 0.19 nm in the complex. Figure 6.2a, 6.2b and<br />
6.2c show the minimum distance between CoSep and the residues of the isolated A chain, of the<br />
A chain in the complex and of the F chain in the complex, respectively.<br />
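The per-residue minimum distance plotted in Figure 6.2 can be sketched as follows. The code<br />
is only an illustration under stated assumptions: the function name, the residue labels and<br />
the coordinates (in nm) are hypothetical toy values, and a single frame is treated instead of<br />
a full trajectory.<br />

```python
import math


def min_distance_per_residue(residue_atoms, ligand_atoms):
    """Map each residue label to its minimum atom-atom distance to the ligand.

    residue_atoms : dict of residue label -> list of (x, y, z) tuples in nm
    ligand_atoms  : list of (x, y, z) tuples in nm for all ligand atoms
    """
    return {
        residue: min(math.dist(a, b) for a in atoms for b in ligand_atoms)
        for residue, atoms in residue_atoms.items()
    }


# Toy frame: one residue next to the ligand, one far away.
residues = {
    "RES_A": [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    "RES_B": [(2.0, 0.0, 0.0)],
}
ligand = [(0.4, 0.0, 0.0), (0.5, 0.2, 0.0)]

distances = min_distance_per_residue(residues, ligand)
# Keep only residues within the 1.0 nm window shown in Figure 6.2.
close_residues = [r for r, d in distances.items() if d <= 1.0]
```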

Cluster analysis was used to select a representative structure for the isolated HEME domain<br />
and for the complex. In CoSep solution, the first cluster of the A chain and of the complex<br />
accounts for ~83 % and 99 %, respectively. The binding of CoSep to the isolated A chain and to<br />
the AF complex is shown in Figure 6.2d and 6.2e, respectively. In the isolated A chain, CoSep<br />
molecules bind mainly at the HEME/FMN interface, in contact with the C (94 – 107) and H (233 –<br />
238) helices and with the B’/C (82 – 94), H/I (239 – 251), K/L (359 – 367) and C-terminal turn<br />
(441 – 445) regions. Other CoSep binding regions on the isolated chain were A/B (32 – 38 and<br />
51 – 55), the B helix (55 – 61), the B/B’ loop (61 – 68), the E helix (139 – 143), the F helix<br />
(181, 182) and the F/G loop (192 – 198). In the AF complex, the binding of the F chain slightly<br />
influences the distribution of CoSep on the A chain. CoSep molecules were more abundant on the<br />
F chain; two of them were located close (≤ 0.50 nm) to the FMN cofactor binding loops Lβ3 and<br />
Lβ4 at the FMN/HEME interface. In the P450BM-3 domains, the CoSep binding regions were found to<br />
be rich in charged residues, especially negatively charged residues (aspartic acid and glutamic<br />
acid). This was obtained from the analysis of the number of contacts between CoSep and P450BM-3<br />
residues within a distance of 0.50 nm (reported in Figure S6.6 and also shown in Figure 6.2d<br />
and 6.2e by the abundance of oxygen (red surface) in the CoSep binding regions).<br />
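The clustering algorithm is not specified in this section. As an illustration of how a<br />
representative structure and the population of the first cluster can be obtained, the sketch<br />
below assumes a greedy neighbor-count scheme in the style of Daura et al. (the approach behind<br />
the "gromos" method of the GROMACS cluster tool). The function name, the RMSD matrix and the<br />
cutoff are hypothetical toy values.<br />

```python
def neighbor_count_clusters(rmsd_matrix, cutoff):
    """Greedy clustering: repeatedly take the frame with the most neighbors.

    rmsd_matrix : square list of lists with pairwise RMSD values (nm)
    cutoff      : two frames are neighbors if their RMSD is <= cutoff
    Returns a list of (center_frame, member_frames), largest cluster first.
    """
    remaining = set(range(len(rmsd_matrix)))
    clusters = []
    while remaining:
        # Neighbors (including the frame itself) among unassigned frames.
        neighbors = {
            i: [j for j in remaining if rmsd_matrix[i][j] <= cutoff]
            for i in remaining
        }
        # Ties are resolved in favor of the lowest frame index.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = sorted(neighbors[center])
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


# Toy pairwise RMSD matrix for five frames.
matrix = [
    [0.00, 0.05, 0.12, 0.30, 0.30],
    [0.05, 0.00, 0.05, 0.30, 0.30],
    [0.12, 0.05, 0.00, 0.30, 0.30],
    [0.30, 0.30, 0.30, 0.00, 0.08],
    [0.30, 0.30, 0.30, 0.08, 0.00],
]
clusters = neighbor_count_clusters(matrix, cutoff=0.10)
first_cluster_fraction = len(clusters[0][1]) / len(matrix)
```

The center of the first cluster plays the role of the representative structure, and the<br />
fraction of frames it contains corresponds to the percentages quoted above.<br />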


Figure 6.2: Minimum distance (≤ 1.0 nm) between CoSep and the residues of (a) the isolated A chain, (b)<br />
the A chain and (c) the F chain in the complex. Horizontal bars in blue and orange represent helices<br />
(labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by<br />
horizontal bars in purple. (d) and (e) show the binding sites for CoSep in the structure of the first<br />
cluster of the isolated A chain and of the AF chain, respectively. CoSep molecules are in ball-and-stick<br />
representation, colored by element type (nitrogen in blue, hydrogen in white and carbon in black). The<br />
HEME and FMN domains are in cartoon representation in sky blue and tan, respectively, with the surface<br />
colored according to element type. The FMN and HEME cofactors are in green and red, respectively.<br />
Helices, cofactors, loops and the N- and C-termini (in red) are labeled.<br />

6.4.2. Effect of CoSep binding on substrate access channel<br />

The accessibility of the active site has been monitored by Roccatano et al.[19] through the<br />
dynamic behavior of residues Pro45 and Ala191, which line the substrate access channel. The<br />
P45Cα - A191Cα minimum distance (1.61 nm in the crystal structure) was calculated and is<br />
reported in Figure S6.7 of SI. After 30 ns of simulation, only small variations in the distance<br />
were observed in all the simulations. CoSep binding to the A chain of the AF complex induces a<br />
larger deviation in the G helix and F/G loop region and results in a wider substrate access<br />
channel, with an average distance of 1.87 ± 0.15 nm compared with 0.59 ± 0.10 nm for the A chain<br />
of the AF complex in water. The isolated A chain was less affected by CoSep binding and shows a<br />
slightly higher P45Cα - A191Cα distance in CoSep solution (1.50 ± 0.14 nm) than in water (1.11 ±<br />
0.10 nm). In the isolated A chain, the reverse effect of CoSep binding was observed for the<br />
average distance between water and the HEME iron, which was 0.22 ± 0.03 nm compared with 0.34 ±<br />
0.14 nm in the water simulation. Hence, CoSep binding to the isolated HEME domain makes its<br />
structure slightly more compact and decreases the size of the substrate access channel.<br />
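Averages of the form "mean ± standard deviation" for an inter-atomic distance can be obtained<br />
as sketched below. This is a hedged illustration: the function name and the toy trajectory are<br />
hypothetical, and discarding the first 30 ns before averaging is an assumption made here, not<br />
a protocol stated in the text.<br />

```python
import math
import statistics


def distance_stats(atom_a, atom_b, times_ns, skip_ns=30.0):
    """Mean and population standard deviation of the a-b distance.

    atom_a, atom_b : per-frame (x, y, z) positions in nm of the two atoms
    times_ns       : per-frame simulation time in ns
    skip_ns        : frames earlier than this are ignored (assumed cutoff)
    """
    distances = [
        math.dist(a, b)
        for a, b, t in zip(atom_a, atom_b, times_ns)
        if t >= skip_ns
    ]
    return statistics.mean(distances), statistics.pstdev(distances)


# Toy trajectory of two C-alpha atoms (four frames).
times = [0.0, 30.0, 60.0, 90.0]
ca_first = [(0.0, 0.0, 0.0)] * 4
ca_second = [(1.0, 0.0, 0.0), (1.4, 0.0, 0.0), (1.5, 0.0, 0.0), (1.6, 0.0, 0.0)]
mean_d, std_d = distance_stats(ca_first, ca_second, times)
```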

6.4.3. Effect of CoSep binding on ET tunneling<br />

In CoSep solution, the average distance between the FMN and HEME cofactors was 1.35 ± 0.01 nm<br />
(with a minimum distance of 0.95 nm), lower than the one in water (1.41 ± 0.09 nm) (reported in<br />
Figure S6.8 in SI). The ET tunneling in the AF chain in the crystal structure and in the<br />
simulation has been discussed in detail in our previous paper.[20] In CoSep solution, the ET<br />
tunneling from CoSep to the HEME iron was identified in representative structures of the<br />
isolated and complex simulations, obtained via cluster analysis, and is reported in Table 6.2.<br />

Table 6.2: Electron transfer tunneling in the AF chain and in the isolated A chain in CoSep solution,<br />
calculated with the Pathways[29] VMD plugin.<br />
Coordinates | Redox partners | Max. coupling (a.u.) | Distance along ET pathway (nm) | Amino acids involved in the ET pathway<br />
A chain | CoSep/HEME | 6.39 x 10^-9 | 3.08 | CoSep → D84 → G85 → L86 → F87 → HEME (FE)<br />
AF chain | CoSep/HEME | 2.25 x 10^-9 | 2.93 | CoSep → I102 → L103 → I401 → C400 → HEME (FE)<br />
AF chain | CoSep/FMN | 4.00 x 10^-6 | 1.72 | CoSep → W574 → FMN (C7)<br />
AF chain | FMN/HEME | 1.38 x 10^-9 | 2.08 | FMN (C7) → SOL → M490 → F393 → HEME (FE)<br />

Figure 6.3a and 6.3b show the possible ET tunneling from CoSep to the HEME iron of the A chain<br />
in the complex and isolated simulations, respectively. In the isolated A chain, ET was mediated<br />
by the B’/C loop residues Asp84, Gly85, Leu86 and Phe87. In the A chain of the AF complex, ET<br />
tunneling can be mediated by two pathways. The first ET pathway runs from CoSep to the HEME iron<br />
and is mediated by Ile102, Leu103, Ile401 and Cys400. Figure 6.3c and 6.3d show the second<br />
possible pathway, first from CoSep to the isoalloxazine ring of FMN (C7 atom) and then from the<br />
C7 atom to the HEME iron, mediated by a water molecule and involving Met490 of the Lβ1 FMN<br />
binding loop and Phe393 of the K/L loop. The same FMN/HEME ET pathway was observed in the water<br />
simulation of the AF complex, but without the involvement of a water molecule.[20]<br />
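The idea behind a pathway search of this kind can be sketched as a strongest-path problem on a<br />
graph of atoms. The code below is a minimal illustration and not the Pathways plugin itself. It<br />
assumes the commonly quoted Pathways-model decay factors (0.6 per covalent bond, 0.36 exp[-1.7<br />
(R - 2.8)] per hydrogen bond and 0.6 exp[-1.7 (R - 1.4)] per through-space jump, with R in<br />
angstrom rather than nm); the settings actually used for Table 6.2 are not given here. All<br />
names and the toy graph are hypothetical.<br />

```python
import heapq
import math

EPS_COVALENT = 0.6  # assumed decay per covalent bond (dimensionless)


def step_decay(kind, r=None):
    """Decay factor for one step; r is the distance in angstrom."""
    if kind == "covalent":
        return EPS_COVALENT
    if kind == "hbond":
        return min(1.0, EPS_COVALENT ** 2 * math.exp(-1.7 * (r - 2.8)))
    if kind == "space":
        return min(1.0, EPS_COVALENT * math.exp(-1.7 * (r - 1.4)))
    raise ValueError(f"unknown step type: {kind}")


def best_pathway(steps, donor, acceptor):
    """Return (coupling, path) of the strongest donor-acceptor pathway.

    steps: list of (atom_i, atom_j, kind, r). The coupling is the product of
    step decays, so maximizing it equals minimizing the sum of -log(decay),
    which Dijkstra's algorithm solves because every cost is non-negative.
    """
    graph = {}
    for i, j, kind, r in steps:
        cost = -math.log(step_decay(kind, r))
        graph.setdefault(i, []).append((j, cost))
        graph.setdefault(j, []).append((i, cost))
    queue = [(0.0, donor, [donor])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == acceptor:
            return math.exp(-cost), path
        if node in visited:
            continue
        visited.add(node)
        for nxt, step_cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + step_cost, nxt, path + [nxt]))
    return 0.0, []


# Toy graph: a bridged route (D -> B1 -> B2 -> A) versus a long direct jump.
steps = [
    ("D", "B1", "space", 3.0),
    ("B1", "B2", "covalent", None),
    ("B2", "A", "hbond", 2.8),
    ("D", "A", "space", 6.0),
]
coupling, path = best_pathway(steps, "D", "A")
```

In this toy case the bridged route wins over the direct through-space jump, which mirrors why<br />
intervening residues and water molecules appear in the pathways of Table 6.2.<br />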


Figure 6.3: ET tunneling from CoSep (in purple) to the HEME iron in (a) the AF complex and (b) the<br />
isolated A chain in CoSep solution. (c) ET from CoSep to the isoalloxazine ring (C7 atom) of the FMN<br />
cofactor and (d) from the C7 atom of the FMN cofactor to the HEME iron in the AF complex. ET is<br />
represented by red tubes. The HEME and FMN cofactors are in black and pink, respectively. The<br />
conformation of the first cluster with the minimum distance between the HEME and FMN cofactors is<br />
used. The amino acids within a distance of 0.50 nm from both cofactors are labeled and shown in<br />
licorice representation, colored by element type (oxygen in red, carbon in cyan and nitrogen in blue),<br />
and their associated secondary structure is shown in cartoon representation, in sky blue for the HEME<br />
domain and in orange for the FMN domain. The residues involved in electron tunneling are represented<br />
and labeled in green.<br />

6.4.4. Effect of CoSep binding on P450BM-3 dynamics<br />

The subspace overlap and the inner product of the first ten eigenvectors of the A chain (which<br />
together account for ~60 % of the total residue positional fluctuation) in the isolated and<br />
complex simulations were less than 0.20 and 0.34, respectively. This indicates the existence of<br />
different sets of collective motions in the eigenvectors of the same time windows of the two<br />
trajectories. The first three eigenvectors of the A chain together represent ~41 % of the total<br />
relative positional fluctuation (RPF). Figure 6.4a, 6.4b and 6.4c show the RPF associated with<br />
the first three eigenvectors of the A chain in the isolated (green) and complex (orange)<br />
simulations (a comparison with the P450BM-3 domain in water is reported in Figure S6.9). Figure<br />
6.5 shows the RMSF associated with the first three eigenvectors of the A chain in the isolated<br />
(a1, a2 and a3) and complex (b1, b2 and b3) simulations, respectively, in CoSep solution.<br />
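The quantities compared above can be illustrated with a short sketch. The exact definitions<br />
used in the thesis are not given in this section, so the code assumes one common convention:<br />
the inner products are the absolute dot products between eigenvectors of the two trajectories,<br />
the subspace overlap is the sum of their squares divided by the number of eigenvectors, and the<br />
fraction of fluctuation is the share of the eigenvalue sum. All function names and toy vectors<br />
are hypothetical.<br />

```python
def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def inner_product_matrix(set_a, set_b):
    """Absolute inner products between two sets of unit eigenvectors."""
    return [[abs(dot(u, v)) for v in set_b] for u in set_a]


def subspace_overlap(set_a, set_b):
    """Mean squared projection of set_a onto the space spanned by set_b."""
    total = sum(dot(u, v) ** 2 for u in set_a for v in set_b)
    return total / len(set_a)


def cumulative_fraction(eigenvalues, k):
    """Fraction of the total fluctuation carried by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Toy example: two 3-dimensional subspaces sharing a single direction.
vectors_isolated = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
vectors_complex = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
products = inner_product_matrix(vectors_isolated, vectors_complex)
overlap = subspace_overlap(vectors_isolated, vectors_complex)
fraction = cumulative_fraction([5.0, 3.0, 1.0, 1.0], k=2)
```

An overlap of 1 would mean that the two sets of eigenvectors span the same subspace, so low<br />
values such as those reported above point to different collective motions.<br />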

In the isolated A chain, the first collective motion (Figure 6.4a and 6.5a1) involves mainly<br />
the N-terminal region (residues 20 – 26), the turn (residues 35 – 38) between the A helix and<br />
beta sheet 1, the C-terminus (residues 450 – 458) and the K/L loop region (residues 325 – 380,<br />
involved in HEME cofactor binding), with slight motion in the B helix and in the D/E, F/G and<br />
G/H loops. The second eigenvector (Figure 6.4b and 6.5a2) involves collective motion along the<br />
turn (residues 35 – 38) between the A helix and the beta sheet, the F/G loop, the K/L loop<br />
(residues 366 – 385) and the C-terminus (residues 425 – 430 and 450 – 458). The third<br />
eigenvector (Figure 6.4c and 6.5a3) involves mainly the C-terminus (residues 425 – 458) and,<br />
slightly, the D/E, F/G and K/L loops (residues 390 – 402). The first three eigenvectors show<br />
that the substrate channel remains open (as also found from the P45Cα - A191Cα distance), and<br />
that the collective motion is related to the interaction of residues with the HEME cofactor and<br />
facilitates ET tunneling from CoSep to the HEME iron through the B’/C loop in the isolated A<br />
chain.<br />

Figure 6.4: RPF for the (a) first, (b) second and (c) third eigenvector of the A chain in the isolated<br />
(green) and complex (orange) simulations. Horizontal bars in blue and orange represent helices<br />
(labeled) and beta sheets, respectively. The regions involved in cofactor binding are represented by<br />
horizontal bars in purple.<br />


In the complex simulation, the first eigenvector of the A chain (Figure 6.4a and 6.5b1)<br />
involves RPF in the B’/C loop (residues 83 – 94), the C and D helices and C/D loop (residues<br />
100 – 130), the E/F loop (residues 159 – 171), the G helix and G/H loop (residues 198 – 230),<br />
the I helix (residues 250 – 268) and the K/L loop (residues 335 – 340 and 392 – 400). The first<br />
collective motion involves residues in contact with the HEME cofactor and is related to the<br />
interaction of the A chain with the F chain; the largest RPF is found in the regions at the<br />
interface of the A chain, mainly constituted by the C – D helices (found to be involved in ET<br />
tunneling from CoSep to the HEME iron) and the K/L loop (F393 is involved in water-mediated ET<br />
tunneling from FMN to the HEME iron). In the second eigenvector (Figure 6.4b and 6.5b2), the<br />
collective motion involves mainly the G helix and, slightly, the B’/C loop, the G/H loop and<br />
the K/L loop (residues 395 - 400). The larger RPF in the G helix results from CoSep binding,<br />
which induces a slight kink in the G helix. The collective motion in the third eigenvector<br />
(Figure 6.4c and 6.5b3) involves mainly the G/H loop and, slightly, the G helix and K/L loop<br />
regions.<br />

The AF chain in CoSep solution shows collective motions of different amplitude from those in<br />
water. The RPF of the first three eigenvectors of the AF chain in both water and CoSep solution<br />
is reported in Figure S6.10 of SI. The collective motion associated with the first two<br />
eigenvectors of the AF chain in the presence of CoSep does not belong solely to the movement of<br />
the F chain towards the A chain, as observed in the water simulation. The collective motion<br />
associated with the first three eigenvectors of the AF chain in the presence of CoSep is<br />
reported in Figure S6.11 in SI. In the first eigenvector in CoSep, the A chain of the AF complex<br />
shows higher fluctuation in the C helix, the D helix, the beginning of the F helix (residues<br />
170 – 175) and the G/H loop, and the F chain shows lower fluctuation, than in water. In the<br />
second eigenvector, the collective motion in the A chain in the presence of CoSep mainly<br />
involves the G helix, with slightly higher RMSF for the B’/C loop and the K/L loop (residues<br />
390 – 400); both loops are involved in HEME cofactor binding. The third collective motion in<br />
the A chain of the AF complex involves the N-terminal residues (A – C helices), the G helix and<br />
the K/L loop. The third eigenvector of the AF chain shows a slight collective motion of the F<br />
chain towards the A chain.<br />


Figure 6.5: RMSF of the protein backbone atoms along the first (a), second (b) and third (c) eigenvector<br />
after projection of the trajectory on the corresponding eigenvector of the isolated A chain in water (a1,<br />
a2 and a3) and in CoSep solution (b1, b2 and b3). The 10 sequential frames represent the extent of the<br />
fluctuations in the trajectories along the eigenvectors. The first extreme conformation is shown in<br />
green and the last extreme in violet. Other conformations of the A chain are in sky blue. Helices and<br />
loops are labeled. The N-terminus of the protein is labeled in red.<br />


6.5. Conclusions<br />

We performed simulations of the isolated HEME domain and of the HEME/FMN complex in CoSep<br />
solution. The structure remains conserved in both systems throughout the simulation. CoSep was<br />
found to bind mainly to surface-exposed loop regions (rich in charged amino acids, mainly the<br />
negatively charged E and D) in both systems. CoSep binding affects the substrate access<br />
channel, which was found to be relatively more open in comparison with the one in the water<br />
simulation.[20] The isolated HEME domain adopts ET tunneling from CoSep to the HEME iron<br />
mediated by residues of the B’/C loop. The HEME/FMN complex has two possible ET tunneling<br />
pathways. The first runs from CoSep to FMN and then from FMN to HEME, involving the same<br />
residues as observed in the water simulation (Met490 and Phe393) but mediated by a water<br />
molecule. However, the average FMN/HEME distance was smaller (1.35 ± 0.11 nm) than the one<br />
observed in the water simulation (1.41 ± 0.09 nm; 1.81 nm in the crystal structure). The second<br />
ET tunneling pathway runs directly from CoSep to the HEME iron. Both systems show atomic<br />
fluctuations of different amplitude in CoSep solution and in the water simulation. Hence, the<br />
presence of CoSep does not dramatically affect the conformation of the P450BM-3 domains but<br />
mainly results in the stabilization of the loops on the surface. The exception is the HEME/FMN<br />
complex, in which CoSep binding induced a conformational change in the G helix and resulted in<br />
higher fluctuation in the F/G and G/H loop regions during the simulation. In the HEME/FMN<br />
complex, the preferred ET pathway runs from CoSep to FMN and then to the HEME iron, and in this<br />
process a surface water molecule plays an important role. However, in the isolated HEME domain,<br />
direct ET from CoSep to the HEME iron might speed up the ET tunneling and hence the performance<br />
of the enzyme, as observed in protein engineering experiments on P450BM-3, in which the isolated<br />
HEME domain performs better in the presence of Zn/Co(III)sep. The results of this study provide<br />
an indication of the mechanism of ET by CoSep. These results are in agreement with the findings<br />
of directed evolution and site-directed mutagenesis experiments on the whole P450BM-3 and on the<br />
HEME domain.<br />


PART II: P450BM-3 HEME/FMN & CoSep SI<br />

Supporting Information<br />

Insight into the redox partner interaction mechanism in<br />

cytochrome P450BM-3 using molecular dynamics<br />

simulation<br />

Table S6.1: Force field parameters for cobalt(II)sepulchrate adopted from the force<br />

constants calculated on the energy minimized geometries by Dehayes et al.[1]<br />

Bond stretching parameters<br />

Bond type Force constant (kJ mol⁻¹ nm⁻¹)<br />

Co – N 885.234<br />

N – C 2342.558<br />

N – H 2962.824<br />

C – C 2709.900<br />

C – H 2740.010<br />

Angle bending parameters<br />

Angle type Force constant (kJ mol⁻¹ rad⁻²)<br />

N – Co – N 240.88<br />

Co – N – C 167.41<br />

Co – N – H 167.41<br />

C – C – C 417.92<br />

C – C – H 292.06<br />

C – N – H 167.41<br />

C – N – C 417.92<br />

N – C – C 417.92<br />

N – C – H 292.06<br />

N – C – N 334.22<br />

H – N – H 251.11<br />

Dihedral parameters<br />

Dihedral type Force constant (kJ mol⁻¹)<br />

H – C – C – H 2.729<br />

H – C – N – Co 2.729<br />

H – C – N – C 2.729<br />

H – N – C – C 2.729<br />

H – C – N – H 1.807<br />

H – C – C – C 4.103<br />

H – C – C – N 4.103<br />

Co – N – C – C 1.373<br />

N – Co – N – C 0.000<br />

N – C – C – N 2.060<br />

N – C – C – C 2.060<br />

C – C – C – C 2.060<br />
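
To give a feeling for the magnitude of the angle-bending constants in Table S6.1, the following minimal sketch evaluates a simple harmonic term, E = 0.5 k (Δθ)², for a small distortion. The harmonic functional form and the 5 degree deviation are assumptions made only for this illustration; the table does not list equilibrium angles, and the sketch is not the force field implementation used in the simulations.

```python
import math

# Angle-bending force constants copied from Table S6.1 (kJ mol^-1 rad^-2).
ANGLE_FORCE_CONSTANTS = {
    ("N", "Co", "N"): 240.88,
    ("Co", "N", "C"): 167.41,
    ("C", "C", "C"): 417.92,
}


def harmonic_angle_energy(force_constant, deviation_deg):
    """Return 0.5 * k * dtheta^2 in kJ/mol.

    Assumption: a plain harmonic angle term; deviation_deg is a
    hypothetical displacement from the (unlisted) equilibrium angle.
    """
    dtheta = math.radians(deviation_deg)
    return 0.5 * force_constant * dtheta ** 2


energy = harmonic_angle_energy(ANGLE_FORCE_CONSTANTS[("N", "Co", "N")], 5.0)
print(f"N-Co-N, 5 degree deviation: {energy:.3f} kJ/mol")
```

Because the force constants are given per rad², the deviation has to be converted from degrees to radians before it is squared.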

Table S6.2: Partial charges on cobalt(II)sepulchrate calculated by DFT <strong>and</strong><br />

adopted for the GROMOS96 43a1 force field.[2]<br />

Atom number Atom type Atom name Charge group Partial charge<br />

1 CH2 C 1 0.514<br />

2 N N 2 -0.544<br />

3 CH2 C 3 0.149<br />

4 CH2 C 4 0.149<br />

5 N N 5 -0.542<br />

6 CH2 C 6 0.511<br />

7 N N 7 -0.945<br />

8 CH2 C 8 0.639<br />

9 N N 9 -0.785<br />

10 CH2 C 10 0.143<br />

11 CH2 C 11 0.253<br />

12 N N 12 -0.944<br />

13 CH2 C 13 0.697<br />

14 N N 14 -0.947<br />

15 CH2 C 15 0.639<br />

16 N N 16 -0.785<br />

17 CH2 C 17 0.143<br />

18 CH2 C 18 0.253<br />

19 N N 19 -0.944<br />

20 CH2 C 20 0.697<br />

21 CO CO 21 1.682<br />

22 H H 22 0.252<br />

23 H H 23 0.381<br />

24 H H 24 0.351<br />

25 H H 25 0.381<br />

26 H H 26 0.351<br />

27 H H 27 0.251<br />
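
As a quick consistency check on Table S6.2, the short sketch below sums the 27 listed partial charges. The values are copied from the table; the variable and function names are chosen only for this illustration. The charges add up to +2.000 e, as expected for the cobalt(II) complex.

```python
# Partial charges (in units of e) copied from Table S6.2, atoms 1 to 27.
PARTIAL_CHARGES = [
    0.514, -0.544, 0.149, 0.149, -0.542, 0.511, -0.945, 0.639, -0.785,
    0.143, 0.253, -0.944, 0.697, -0.947, 0.639, -0.785, 0.143, 0.253,
    -0.944, 0.697, 1.682, 0.252, 0.381, 0.351, 0.381, 0.351, 0.251,
]


def net_charge(charges):
    """Sum the partial charges, rounded to the table's three decimals."""
    return round(sum(charges), 3)


total = net_charge(PARTIAL_CHARGES)
print(f"{len(PARTIAL_CHARGES)} atoms, net charge = {total:+.3f} e")
```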

Figure S6.1: Backbone root mean square deviation (RMSD) with respect to the reference structure<br />

as a function <strong>of</strong> time for the AF chain (black), A <strong>of</strong> the AF chain (red), F <strong>of</strong> the AF chain (green), the A chain (blue) <strong>and</strong> the F<br />

chain (orange) in water (dotted line) <strong>and</strong> in CoSep solution (solid line). The P450BM-3 domains<br />

deviated less in CoSep solution than in water only. The major difference was observed for the<br />

isolated A chain (green color solid line) in the presence <strong>of</strong> CoSep, which showed a lower deviation than<br />

in water <strong>and</strong> reached a plateau after 25 ns with less variation.<br />
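
The RMSD plotted in Figure S6.1 can be illustrated with a minimal sketch. It assumes that each frame has already been superimposed on the reference structure, so no fitting step is included. The toy coordinates and the function name are hypothetical, and this is not the analysis code used for the figure.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equal-length lists of (x, y, z).

    Assumption: both structures are already superimposed, so no rotational
    or translational fit is performed here.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point, ref_point in zip(coords, reference)
        for a, b in zip(point, ref_point)
    )
    return math.sqrt(squared / len(coords))


# Toy example (nm): every atom displaced by 0.1 nm along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(x + 0.1, y, z) for x, y, z in reference]
print(f"RMSD = {rmsd(frame, reference):.3f} nm")
```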

Figure S6.2: Radius <strong>of</strong> gyration with respect to reference structure as a function <strong>of</strong> time for AF<br />

chain (black), A <strong>of</strong> AF chain (red), F <strong>of</strong> AF chain (green), A chain (blue) <strong>and</strong> F chain (orange) in<br />

water (dotted line) <strong>and</strong> in CoSep solution (solid line).<br />
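
For reference, a mass-weighted radius of gyration of the kind shown in Figure S6.2 can be computed as in the following sketch. The two-atom example and all names are hypothetical and serve only to show the definition.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a set of (x, y, z) coordinates."""
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = tuple(
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    )
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy example: two equal masses placed symmetrically about the origin.
rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0])
print(f"Rg = {rg:.3f}")
```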

Figure S6.3: Backbone RMSD (a) <strong>and</strong> RMSF (b) per residue with respect to crystal structure for<br />

isolated A <strong>and</strong> F chain (in red color), AF complex (in black color) in water, <strong>and</strong> A chain (in green<br />

color) <strong>and</strong> AF complex (in orange color) in CoSep solution. The maroon vertical line separates<br />

HEME <strong>and</strong> FMN domains. Horizontal bars, in blue <strong>and</strong> orange color represent helices (labeled) <strong>and</strong><br />

beta sheets, respectively. The regions involved in c<strong>of</strong>actor binding are represented by horizontal<br />

bars in purple color.<br />

Figure S6.4: The binding <strong>of</strong> CoSep (in ball <strong>and</strong> stick representation) on a) A chain <strong>and</strong> b) AF chain<br />

<strong>of</strong> P450BM-3 in cartoon representation with surface colored by element type (carbon in gray,<br />

oxygen in red, nitrogen in blue <strong>and</strong> hydrogen in white). FMN <strong>and</strong> HEME c<strong>of</strong>actors are in licorice<br />

representation in red <strong>and</strong> green color, respectively. Helices, loops <strong>and</strong> N- <strong>and</strong> C- terminus (in red)<br />

are labeled.<br />

Figure S6.5: Average over the minimum distance (less than 2.3 nm) between CoSep <strong>and</strong><br />

isoalloxazine ring <strong>of</strong> FMN c<strong>of</strong>actor (in green color), <strong>and</strong> HEME iron <strong>of</strong> A chain in isolated domain (in<br />

black color) <strong>and</strong> complex simulation (in red color) as a function <strong>of</strong> time.<br />

Figure S6.6: Number <strong>of</strong> contacts between CoSep <strong>and</strong> amino acids <strong>of</strong> P450BM-3 domains within<br />

the distance <strong>of</strong> 0.50 nm.<br />
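
The distance-based quantities in Figures S6.5 and S6.6 can be sketched as follows. The 0.50 nm cutoff is taken from the caption of Figure S6.6. Counting atom pairs (rather than residues) and the toy coordinates are assumptions made only for this illustration.

```python
import math

CUTOFF_NM = 0.50  # contact cutoff quoted in the caption of Figure S6.6


def count_contacts(ligand_atoms, protein_atoms, cutoff=CUTOFF_NM):
    """Count ligand-protein atom pairs closer than the cutoff (nm)."""
    return sum(
        1
        for a in ligand_atoms
        for b in protein_atoms
        if math.dist(a, b) < cutoff
    )


def minimum_distance(group_a, group_b):
    """Smallest pairwise distance (nm) between two groups of atoms."""
    return min(math.dist(a, b) for a in group_a for b in group_b)


# Toy coordinates in nm (hypothetical, not taken from the simulations).
ligand = [(0.0, 0.0, 0.0)]
protein = [(0.3, 0.0, 0.0), (0.6, 0.0, 0.0), (0.0, 0.4, 0.0)]
contacts = count_contacts(ligand, protein)
closest = minimum_distance(ligand, protein)
print(f"{contacts} contacts, minimum distance {closest:.2f} nm")
```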

Figure S6.7: Minimum distance between P45 Cα <strong>and</strong> A191 Cα (1.61 nm in crystal structure) as a<br />

function <strong>of</strong> time for isolated A <strong>and</strong> F chain (in red color), AF complex (in black color), <strong>and</strong> A chain<br />

(in green color) <strong>and</strong> AF complex (in orange color) with CoSep.<br />

Figure S6.8: Minimum distance between heavy atoms <strong>of</strong> isoalloxazine ring <strong>of</strong> FMN <strong>and</strong> HEME<br />

c<strong>of</strong>actor as a function <strong>of</strong> time in AF complex in water (in black color) <strong>and</strong> AF complex in CoSep<br />

solution (in red color). Green color horizontal line shows the distance observed in crystal<br />

structure.[3]<br />

Figure S6.9: RPF for first, second <strong>and</strong> third eigenvector <strong>of</strong> isolated A <strong>and</strong> F chain (in red color), AF<br />

complex (in black color) in water, <strong>and</strong> A chain (in green color) <strong>and</strong> AF complex (in orange color) in<br />

CoSep solution. The maroon vertical line separates HEME <strong>and</strong> FMN domain. Horizontal bars, in blue<br />

<strong>and</strong> orange color represent helices (labeled) <strong>and</strong> beta sheets, respectively. The regions involved in<br />

c<strong>of</strong>actor binding are represented by horizontal bars in purple color.<br />

Figure S6.10: RPF for first, second <strong>and</strong> third eigenvector <strong>of</strong> AF complex in water (black color) <strong>and</strong><br />

in CoSep solution (orange color). The maroon vertical line separates HEME <strong>and</strong> FMN domain.<br />

Horizontal bars, in blue <strong>and</strong> orange color represent helices (labeled) <strong>and</strong> beta sheets, respectively.<br />

The regions involved in c<strong>of</strong>actor binding are represented by horizontal bars in purple color.<br />

Figure S6.11: RMSF <strong>of</strong> protein backbone atoms along first (a), second (b) <strong>and</strong> third (c)<br />

eigenvector after projection <strong>of</strong> the trajectory on the corresponding eigenvector <strong>of</strong> AF complex in<br />

CoSep solution. The 10 sequential frames represent the extension <strong>of</strong> the fluctuations in trajectories<br />

along the eigenvectors. The first extreme conformation is shown in green color <strong>and</strong> last extreme in<br />

violet color. Other conformations <strong>of</strong> A <strong>and</strong> F chain are in sky blue <strong>and</strong> tan color, respectively. Helices<br />

<strong>and</strong> loops in FMN domain are labeled. N <strong>and</strong> C indicate the N- <strong>and</strong> C-terminus <strong>of</strong> the protein<br />

(labeled in red color).<br />

References<br />

1. Dehayes LJ, Busch DH (1973) Conformational Studies <strong>of</strong> Metal-Chelates .1. Intra-<br />

Ring Strain in 5-Membered <strong>and</strong> 6-Membered Chelate Rings. Inorganic Chemistry 12:<br />

1505-1513.<br />

2. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, et al. (1996)<br />

Biomolecular Simulation: The GROMOS96 Manual <strong>and</strong> User Guide. VdF:<br />

Hochschulverlag AG an der ETH Zurich <strong>and</strong> BIOMOS bv, Zurich, Groningen.<br />

3. Sevrioukova IF, Li HY, Zhang H, Peterson JA, Poulos TL (1999) Structure <strong>of</strong> a<br />

cytochrome P450-redox partner electron-transfer complex. P Natl Acad Sci USA 96:<br />

1863-1868.<br />

Summary <strong>and</strong> outlook<br />

In the first part <strong>of</strong> the thesis, the importance <strong>of</strong> combining computational <strong>and</strong><br />

directed evolution methods has been reviewed as a winning strategy for protein<br />

engineering. Computational approaches can assist the design <strong>of</strong> protein engineering<br />

experiments <strong>and</strong> hold particular promise for tailoring proteins to specific functions.<br />

The MAP 2.0 3D server has been introduced to assist the design <strong>of</strong> directed evolution<br />

experiments that generate sequence libraries with the highest chance <strong>of</strong> containing variants<br />

with the desired enzymatic properties. This task is accomplished by correlating the<br />

amino acid substitution patterns generated by a specific r<strong>and</strong>om mutagenesis method with the<br />

structural information <strong>of</strong> the target protein. The combined information can help to select<br />

an experimental strategy that improves the chances <strong>of</strong> obtaining functionally efficient <strong>and</strong>/or<br />

stable enzyme variants. Hence, the MAP 2.0 3D server facilitates the ‘in-silico’ pre-screening <strong>of</strong><br />

the target gene by predicting the amino acid diversity in r<strong>and</strong>om mutagenesis<br />

libraries. Currently, the MAP 2.0 3D server provides sequence/structure based analysis using the<br />

protein sequence/structure (crystallographic structure or homology model) provided by<br />

the user. In the future, the capability <strong>of</strong> the server can be further extended by (1) dynamically<br />

identifying the functionally important regions, e.g. active site residues <strong>and</strong> trans-membrane<br />

regions, in the target protein <strong>and</strong> focusing the analysis only on those regions, (2)<br />

providing MAP 2.0 3D structural analysis results in the absence <strong>of</strong> a crystallographic or<br />

model structure by using the secondary structure elements predicted from the protein sequence,<br />

<strong>and</strong> (3) predicting the flexible regions in the protein structure using e.g. a Gaussian network<br />

model (GNM) <strong>and</strong> correlating them with the MAP 2.0 3D analysis.<br />
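
The idea of an amino acid substitution pattern can be illustrated with a minimal sketch that lists the amino acids reachable from one codon by a single nucleotide change, using the standard genetic code. It assumes that every substitution is equally likely. Real mutagenesis methods have specific mutational biases, so this is not the algorithm of the MAP 2.0 3D server, and all names are chosen only for the example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_substitutions(codon):
    """Count the amino acids encoded by all nine one-nucleotide mutants.

    Assumption: uniform substitution probabilities (no method-specific bias).
    """
    outcomes = Counter()
    for position, base in product(range(3), BASES):
        if base == codon[position]:
            continue  # skip the unmutated base
        mutant = codon[:position] + base + codon[position + 1:]
        outcomes[CODON_TABLE[mutant]] += 1
    return outcomes


pattern = single_base_substitutions("ATG")  # methionine codon
print(dict(pattern))
```

Combining such per-codon patterns over a whole gene, weighted by the bias of a given method, gives the kind of diversity estimate that can then be mapped onto the protein structure.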

In the second part <strong>of</strong> the thesis, molecular dynamics simulations were used to<br />

underst<strong>and</strong> the interaction mechanism between the HEME <strong>and</strong> FMN domains <strong>of</strong> P450BM-3 in<br />

solution <strong>and</strong> in the presence <strong>of</strong> the electron mediator cobalt(II)sepulchrate (CoSep).<br />

Cytochrome P450BM-3 is a pivotal member <strong>of</strong> the cytochrome P450 monooxygenase<br />

superfamily, particularly because it is a bacterial P450 fused with its eukaryotic-like<br />

redox partners (FMN <strong>and</strong> FAD binding domains). This structural feature makes the enzyme<br />

catalytically self-sufficient. In addition, it is soluble in water <strong>and</strong> has a high catalytic<br />

efficiency <strong>and</strong> monooxygenase rate. These characteristics make the enzyme particularly<br />

interesting for possible biotechnological applications. For this reason, the comprehension <strong>of</strong><br />

structure-function-dynamics relationships in P450BM-3 is relevant. In this thesis we have<br />

analyzed different dynamic <strong>and</strong> structural properties <strong>of</strong> the HEME domain, the FMN domain<br />

<strong>and</strong> their complex in solution.<br />

In the first study, the effect <strong>of</strong> the protonation state (oxidized <strong>and</strong> reduced) <strong>of</strong> the FMN<br />

c<strong>of</strong>actor on the conformation <strong>and</strong> dynamics <strong>of</strong> the FMN-binding domain <strong>of</strong> P450BM-3 was<br />

analyzed by performing MD simulations <strong>of</strong> the holo- <strong>and</strong> apo-protein in solution. In the holo-protein,<br />

the protonation state <strong>of</strong> the isoalloxazine ring influences the conformation <strong>and</strong><br />

dynamics <strong>of</strong> the FMN c<strong>of</strong>actor <strong>and</strong> results in a change in the FMN binding site. In particular, the<br />

dynamics <strong>of</strong> the FMN domain showed significant differences in the atomic fluctuation<br />

amplitude between the oxidized <strong>and</strong> reduced states. In the apo-protein, the overall structure remained<br />

conserved, but high fluctuations were observed in the FMN binding region, which can promote the<br />

rebinding <strong>of</strong> the FMN c<strong>of</strong>actor, as observed experimentally.<br />

MD simulations <strong>of</strong> the HEME <strong>and</strong> FMN domains were performed to gain insight into<br />

the interaction mechanism <strong>and</strong> inter-domain electron transfer in the HEME/FMN complex.<br />

Simulations <strong>of</strong> the isolated HEME <strong>and</strong> FMN domains were also performed to compare their<br />

behavior in solution <strong>and</strong> in the HEME/FMN complex. The HEME/FMN complex undergoes a<br />

conformational rearrangement during the simulation that decreases the distance between the<br />

FMN <strong>and</strong> HEME c<strong>of</strong>actors to within the range expected for ET between the two redox centers.<br />

In the complex, the main collective motion was dominated by the interaction<br />

between the HEME <strong>and</strong> FMN domains.<br />

MD simulations <strong>of</strong> the HEME/FMN complex <strong>and</strong> the isolated HEME domain were<br />

performed to investigate the binding modes between CoSep <strong>and</strong> the P450BM-3 domains <strong>and</strong><br />

their effect on the ET pathway. CoSep prefers to bind to surface-exposed loop regions that mainly<br />

contain negatively charged residues. CoSep binding to the HEME domain was observed to affect<br />

the substrate access channel <strong>and</strong> to keep it more open than the one observed in<br />

solution. Putative ET pathways were proposed between CoSep <strong>and</strong> the HEME iron in the<br />

HEME/FMN complex <strong>and</strong> in the isolated HEME domain.<br />

The results <strong>of</strong> the P450BM-3 simulations can enhance our basic underst<strong>and</strong>ing, with<br />

possible applications in enzyme catalysis, <strong>of</strong> (1) the effect <strong>of</strong> the protonation state on the<br />

dynamics <strong>of</strong> the P450BM-3 reductase domain, (2) the interaction mechanism <strong>of</strong> the redox partners<br />

<strong>and</strong> its effect on ET tunneling between the redox centers, <strong>and</strong> (3) the effect <strong>of</strong> the presence <strong>of</strong> an<br />

ET mediator on redox partner interaction <strong>and</strong> ET tunneling. The study can be further<br />

extended by performing MD simulations <strong>of</strong> the HEME/FMN complex with the FMN c<strong>of</strong>actor in the<br />

reduced state. Modeling the linker region connecting the HEME <strong>and</strong> FMN domains, followed<br />

by its simulation, will further enhance our underst<strong>and</strong>ing <strong>of</strong> the HEME/FMN<br />

binding interaction mechanism <strong>and</strong> ET tunneling. The recent release <strong>of</strong> the FAD domain <strong>of</strong><br />

P450BM-3 with NADPH also <strong>of</strong>fers a chance to perform the simulation <strong>of</strong> the whole<br />

complex.<br />

Curriculum vitae<br />

Personal Details<br />

Name: Rajni Verma<br />

Address: School <strong>of</strong> Engineering <strong>and</strong> Science,<br />

<strong>Jacobs</strong> <strong>University</strong> Bremen,<br />

28759 Bremen, Germany<br />

Tel.: +49 421 200 3208<br />

Email: ra.verma@jacobs-university.de<br />

Date <strong>of</strong> Birth: 15th April 1984<br />

Nationality: Indian<br />

Linguistic skills: Hindi, English<br />

__________________________________<br />

Employment & Education<br />

_______________________________<br />

Since 06/09 PhD Fellow in Computational Chemistry <strong>and</strong> Bioinformatics,<br />

<strong>Jacobs</strong> <strong>University</strong> Bremen, Bremen, Germany<br />

04/08 – 03/09 Project Assistant, Bioinformatics,<br />

Institute <strong>of</strong> Genomics <strong>and</strong> Integrative Biology, New Delhi, India<br />

07/05 – 03/08 Master <strong>of</strong> Science in Bioinformatics,<br />

CCSCS <strong>University</strong>, Meerut, India<br />

06/04 – 06/05 Advanced Diploma in Computer <strong>Application</strong>,<br />

CCSCS <strong>University</strong>, Meerut, India<br />

07/01 – 06/05 Bachelor <strong>of</strong> Science in Life Sciences,<br />

CCSCS <strong>University</strong>, Meerut, India<br />

Publications<br />

1. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics <strong>of</strong> the FMN-binding reductase<br />

domain <strong>of</strong> monooxygenase P450BM-3. J Chem Theory Comput 2012, DOI: 10.1021/ct300723x.<br />

2. Verma R, Schwaneberg U, Roccatano D. Computer-aided protein directed evolution: a review <strong>of</strong><br />

web servers, databases <strong>and</strong> other computational tools for protein engineering. Computational <strong>and</strong><br />

Structural Biotechnology Journal 2012, 2 (3), e201209008.<br />

3. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser HG, Niemann P, Shivange AV,<br />

Schwaneberg U. dRTP <strong>and</strong> dPTP a complementary nucleotide couple for the Sequence Saturation<br />

Mutagenesis (SeSaM) method. J Mol Catal B-Enzym 2012, 84, 40-47.<br />

4. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a sequence/structure based server for protein<br />

engineering. ACS Synth Biol 2012, 1 (4), 139-150.


5. Ramach<strong>and</strong>ran S, Chaudhuri R, Verma R, Shah AR, Sen R, Paul C. Systems Immunology: Data<br />

modeling <strong>and</strong> scripting in R Book Chapter: Encyclopedia <strong>of</strong> Systems Biology, Springer Science &<br />

Business Media, LLC 2011. Edited by W. Dubitsky, O. Wolkenhauer, K. Cho, & H. Yokota.<br />

6. Zhu L, Verma R, Roccatano D, Ni Y, Sun Z, Schwaneberg U. A potential antitumor drug (arginine<br />

deiminase) reengineered for efficient operation under physiological conditions. ChemBioChem 2010,<br />

11, 2294-2301. [inside cover page]<br />

Conferences/Abstracts<br />

1. Verma R, Schwaneberg U, Roccatano D. Molecular dynamics simulations <strong>of</strong> P450BM-3 reductase<br />

domain. Computer simulation <strong>and</strong> theory <strong>of</strong> macromolecules 2012, Hunfeld, Germany.<br />

2. Verma R, Schwaneberg U, Roccatano D. Protein <strong>and</strong> c<strong>of</strong>actor conformational dynamics <strong>of</strong> FMN-binding<br />

reductase domain <strong>of</strong> monooxygenase P450BM-3. 5th Meeting <strong>of</strong> the North German<br />

Biophysicist 2012, Borstel, Germany.<br />

3. Verma R, Schwaneberg U, Roccatano D. MAP 2.0 3D: a structure based substitution spectra analyses<br />

<strong>of</strong> mutagenesis methods. 10th International Symposium on Biocatalysis- Biotrans 2011, Sicily, Italy.<br />

4. Ruff AJ, Marienhagen J, Verma R, Roccatano D, Mundhada H, Shivange AV, Schwaneberg U.<br />

Ribavirin: A complementary universal base to P for Sequence Saturation Mutagenesis Method<br />

(SeSaM). 10th International Symposium on Biocatalysis- Biotrans 2011, Sicily, Italy.<br />

5. Verma R, Schwaneberg U, Roccatano D. Conformational dynamics <strong>of</strong> oxidized <strong>and</strong> reduced FMN in<br />

water <strong>and</strong> methanol. MoLife Center <strong>Jacobs</strong> <strong>University</strong> Bremen 2011, Seefeld, Germany.<br />

6. Verma R, Schwaneberg U, Roccatano D. MAP2.0: Evolution <strong>of</strong> Mutagenesis Assistant Program. 5th<br />

International Congress on Biocatalysis- Biocat 2010, Hamburg, Germany.<br />

7. WE-Heraeus Summer School June 2009, ‘Quantum <strong>and</strong> classical simulation <strong>of</strong> biological systems <strong>and</strong><br />

their interaction with technical materials’, Bremen, Germany. (Participation)<br />

References<br />

Pr<strong>of</strong>. Dr. Danilo Roccatano<br />

Assistant Pr<strong>of</strong>essor<br />

School <strong>of</strong> Engineering <strong>and</strong> Science,<br />

<strong>Jacobs</strong> <strong>University</strong> Bremen,<br />

Campus Ring 1, Research II,<br />

28759 Bremen, Germany<br />

Tel: +49-421 200-3144<br />

Fax: +49-421-200-3249<br />

Email: d.roccatano@jacobs-university.de<br />

Web: http://ses.jacobs-university.de/ses/droccatano<br />

Pr<strong>of</strong>. Dr. Ulrich Schwaneberg<br />

Head <strong>of</strong> the Institute<br />

Department <strong>of</strong> Biotechnology,<br />

RWTH Aachen <strong>University</strong>,<br />

Worringer Weg 1,<br />

52056 Aachen, Germany<br />

Tel.: +49-241-80-24176<br />

Fax: +49-241-80-22387<br />

E-Mail: u.schwaneberg@biotec.rwth-aachen.de<br />

Web: www.biotec.rwth-aachen.de


Statutory Declaration<br />

I, RAJNI VERMA, hereby declare that I have written this PhD thesis independently,<br />

unless where clearly stated otherwise. I have used only the sources, the data <strong>and</strong> the<br />

support that I have clearly mentioned. This PhD thesis has not been submitted for<br />

conferral <strong>of</strong> degree elsewhere.<br />

Bremen, December 19, 2012<br />

Signature ____________________________________________________________
