entire set of comprehensive, pre-computed structural alignments is obtained with the VAST algorithm [36], whereas
the visualization tools for 3D structure and structure/sequence alignment are provided by the molecular graphics viewer
Cn3D [37].
As of April 29, 2013, MMDB held 89,571 structure records in total, including 22,084 records containing only proteins,
715 containing only DNA and 508 containing only RNA molecules. MMDB also contains 2,571 protein-DNA complexes,
1,113 protein-RNA complexes and 116 protein-DNA-RNA complexes, in addition to more than 60,000 proteins bound to chemicals.
2.2.2.3 ModBase
ModBase is a database of annotated, homology-based protein structure models. Models in ModBase are generated by
an automated software pipeline for comparative protein structure modeling, known as ModPipe [3], which relies mostly on
modules of Modeller [38]. For fold assignment and target–template alignment, ModPipe uses sequence–sequence [39],
sequence–profile [40,41] and profile–profile [40,42] methods with an E-value cut-off of 1.0 to increase the likelihood
of identifying the best available template structure. Ten models are generated [38] for each target–template alignment, and
the model with the best Discrete Optimized Protein Energy (DOPE) statistical potential [43] score is selected and further
assessed by several additional quality criteria: (i) target–template sequence identity, (ii) GA341 score [44], (iii) Z-DOPE
score [43], (iv) ModPipe Quality Score (MPQS) and (v) TSVMod score [45].
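The selection step described above can be sketched in a few lines. This is only an illustration, not ModPipe's actual code: the `CandidateModel` class, the model names and the score values are hypothetical, and the sketch assumes the usual convention for energy-like potentials that a lower (more negative) DOPE score indicates a better model.

```python
from dataclasses import dataclass


@dataclass
class CandidateModel:
    name: str          # hypothetical label for one of the generated models
    dope_score: float  # DOPE statistical potential; assumed: lower is better


def select_best_model(models):
    """Return the candidate with the lowest DOPE score."""
    if not models:
        raise ValueError("no candidate models supplied")
    return min(models, key=lambda m: m.dope_score)


# Made-up scores for three of the candidate models of one alignment.
candidates = [
    CandidateModel("model_01", -11234.5),
    CandidateModel("model_02", -11890.2),
    CandidateModel("model_03", -10977.8),
]

best = select_best_model(candidates)
print(best.name, best.dope_score)
```

The selected model would then be passed on to the additional quality criteria listed above, which are not modeled here.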
Because of the rapid growth of the public sequence databases, models in ModBase are organized into data sets that are
useful for specific projects. Currently, ModBase includes 27,288,148 models and 4,332,658 unique sequences modeled
for more than 50 complete genomes [46]. ModBase can be queried through its web interface using UniProtKB
[47] and GI [48] identifiers, gene names, annotation keywords, PDB codes [49], data set names, organism names, sequence
similarity to the modeled sequences (BLAST [41]) and model-specific criteria such as model reliability, model size and
target–template sequence identity. The coordinate and alignment files can also be retrieved as text files [50].
2.2.2.4 SWISS-MODEL Repository
SWISS-MODEL Repository is a database of 3D protein structure models constructed using the SWISS-MODEL
homology-modelling pipeline based on protein sequences from the UniProt database [47]. The SWISS-MODEL pipeline
integrates several steps: template selection, alignment of the target sequence with the template structure, model building,
energy minimization and/or refinement, and model quality assessment [51]. Model target sequences are uniquely identified by
the MD5 cryptographic hash of the full-length raw amino acid sequence, which allows the redundancy among protein sequence
database entries to be reduced and, in turn, assists cross-referencing with databases that use different accession codes.
The current SWISS-MODEL Repository release contains 3,143,784 model entries for 2,286,870 unique sequences in the
UniProt database (2013_02).
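A minimal sketch of this kind of hash-based identification is shown below. It illustrates the idea only and is not the Repository's own code: the function name, the accession labels and the normalization step (removing whitespace and upper-casing before hashing) are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def sequence_md5(sequence: str) -> str:
    """Return the MD5 hex digest of an amino acid sequence.

    Assumption: whitespace is stripped and letters are upper-cased first,
    so that formatting differences do not change the key.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Hypothetical entries: two accession codes point at the same sequence.
entries = {
    "ACC_A": "MKTAYIAKQR",
    "ACC_B": "mktayiakqr\n",
    "ACC_C": "MKTAYIAKQK",
}

# Group accession codes by sequence hash to collapse redundant entries.
by_hash = defaultdict(list)
for accession, seq in entries.items():
    by_hash[sequence_md5(seq)].append(accession)

for digest, accessions in by_hash.items():
    print(digest, accessions)
```

Because identical sequences always produce the same digest, entries from different source databases collapse onto one key, which is what makes cross-referencing by accession code straightforward.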
The database can be queried for particular proteins using various database accession codes (e.g. UniProt AC and
ID, GenBank, IPI, RefSeq) or directly by means of the protein amino acid sequence. For a particular query protein, a
graphical overview shows the segments for which models or experimental structures are available. SWISS-MODEL
Repository users can review the quality of the models in the database, search for alternative template structures,
and construct models interactively using SWISS-MODEL Workspace [52]. The Repository is updated on a regular basis
to reflect the growth of the sequence and structure databases.
3. 3D Structure Prediction
3.1 A brief history of molecular modelling
The first homology-based model dates back to 1969, when a wire-and-plastic model of the bonds and atoms of α-lactalbumin
was constructed using the coordinates of hen egg-white lysozyme and physically adjusting those amino acids that
did not match the structure [53]. The two proteins share 39% sequence identity. Afterwards, the crystal structure of
lysozyme was used to generate a model for α-lactalbumin [54]. These models were created by taking the existing coordinates
of the known structure and mutating the side chains that were not identical in the protein to be modeled. This approach to
protein modeling is still used at present with substantial success, particularly when the proteins share a considerable degree
of sequence similarity [55].
McLachlan and Shotton [56] used the structures of mammalian chymotrypsin and elastase to model the structure
of α-lytic proteinase of the bacterium Myxobacter 495. The modeling was difficult because the sequence similarity between
the target and the templates was only about 18%. Subsequently, the crystal structure of α-lytic proteinase was determined and
compared with the homology model [57]. Although the domains of the model were constructed accurately, it was found
that misalignment of the sequences led to local errors.
The modeling of variable regions in proteins on the basis of equivalent regions from homologous
proteins of known structure was introduced subsequently [58,59]. To construct homology models of various serine proteases,
the structures of trypsin, chymotrypsin and elastase were superimposed, and many equivalent Cα atoms were found to lie
within 1.0 Å of one another. The regions comprising the amino acids of these Cα atoms were described as structurally
conserved regions (SCRs). All remaining sites correspond to structurally variable or loop regions (VRs), where
the insertions/deletions were located. The backbone of SCRs and VRs was generated from fragments of known serine
proteases, whereas the side chains were modeled on the conformation found at the equivalent locations for those
identical side chains in the known structures.
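The SCR criterion described above can be illustrated with a short sketch. It is not the procedure of the original studies: it assumes the structures are already superimposed and reduced to lists of equivalent Cα coordinates, it treats a position as conserved only when every pair of structures lies within the 1.0 Å cut-off, and the function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def conserved_positions(structures, cutoff=1.0):
    """Flag positions whose equivalent C-alpha atoms all lie within `cutoff` angstroms.

    `structures` is a list of equally long lists of (x, y, z) tuples,
    assumed to be already superimposed and aligned position by position.
    """
    n = len(structures[0])
    if any(len(s) != n for s in structures):
        raise ValueError("all structures must have the same number of positions")
    flags = []
    for i in range(n):
        points = [s[i] for s in structures]
        flags.append(
            all(math.dist(a, b) <= cutoff for a, b in combinations(points, 2))
        )
    return flags


def segments(flags):
    """Group runs of conserved positions into inclusive (start, end) index pairs."""
    runs, start = [], None
    for i, conserved in enumerate(flags):
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flags) - 1))
    return runs


# Toy coordinates for three superimposed structures, four positions each.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
b = [(0.2, 0.0, 0.0), (3.9, 0.3, 0.0), (7.6, 2.5, 0.0), (11.5, 0.0, 0.4)]
c = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.5), (7.0, 0.0, 3.0), (11.4, 0.2, 0.0)]

flags = conserved_positions([a, b, c])
print(flags)            # conserved / variable flag per position
print(segments(flags))  # candidate SCRs as index ranges
```

Positions not covered by any returned segment would correspond to the variable regions, where insertions and deletions are expected.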
OMICS Group eBooks<br />