29.10.2014 Views

advances-in-protein-chemistry


entire set of comprehensive, pre-computed structural alignments is obtained with the VAST algorithm [36], whereas the visualization tools for 3D structure and structure/sequence alignment are provided by the molecular graphics viewer Cn3D [37].

As of April 29, 2013, MMDB held 89,571 structure records in total, including 22,084 records containing only proteins, 715 containing only DNA and 508 containing only RNA molecules. MMDB also contains 2,571 protein-DNA complexes, 1,113 protein-RNA complexes and 116 protein-DNA-RNA complexes, in addition to more than 60,000 proteins bound to chemicals.

2.2.2.3 ModBase

ModBase is a database of annotated, homology-based protein structure models. Models in ModBase are generated by an automated software pipeline for comparative protein structure modeling, known as ModPipe [3], which relies mostly on modules of Modeller [38]. For fold assignment and target–template alignment, ModPipe uses sequence–sequence [39], sequence–profile [40,41] and profile–profile [40,42] methods with an E-value cut-off of 1.0 to increase the chance of identifying the best available template structure. Ten models are generated [38] for each target–template alignment, and the model with the best Discrete Optimized Protein Energy (DOPE) statistical potential [43] score is selected and further assessed by several additional quality criteria: (i) target–template sequence identity, (ii) GA341 score [44], (iii) Z-DOPE score [43], (iv) ModPipe Quality Score (MPQS) and (v) TSVMod score [45].
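The template filtering and model selection steps described above can be illustrated with a minimal Python sketch. It is not ModPipe code: the class and function names are hypothetical, the E-value cut-off is treated as inclusive (an assumption), and the "best" DOPE score is assumed to be the lowest one, since DOPE is an energy-like potential.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1.0  # permissive cut-off quoted in the text


@dataclass
class TemplateHit:
    """Hypothetical record for one fold-assignment hit."""
    template_id: str
    e_value: float


@dataclass
class CandidateModel:
    """Hypothetical record for one model built from an alignment."""
    model_id: str
    dope_score: float  # assumption: lower (more negative) is better


def filter_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose E-value does not exceed the cut-off (inclusive by assumption)."""
    return [hit for hit in hits if hit.e_value <= cutoff]


def select_best_model(models):
    """Return the candidate with the lowest DOPE score."""
    if not models:
        raise ValueError("no candidate models to choose from")
    return min(models, key=lambda model: model.dope_score)
```

In this sketch the ten models built for one alignment would be passed to `select_best_model`, and the returned model would then go on to the additional quality checks (sequence identity, GA341, Z-DOPE, MPQS, TSVMod), which are not modeled here.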

Because of the rapid growth of the public sequence databases, models in ModBase are organized into data sets that are useful for specific projects. ModBase currently includes 27,288,148 models and 4,332,658 unique sequences modeled for more than 50 complete genomes [46]. ModBase can be queried through its web interface using UniProtKB [47] and GI [48] identifiers, gene names, annotation keywords, PDB codes [49], data set names, organism names, sequence similarity to the modeled sequences (BLAST [41]) and model-specific criteria such as model reliability, model size and target–template sequence identity. The coordinate and alignment files can also be retrieved as text files [50].

2.2.2.4 SWISS-MODEL Repository

SWISS-MODEL Repository is a database of 3D protein structure models constructed with the SWISS-MODEL homology-modelling pipeline from protein sequences in the UniProt database [47]. The SWISS-MODEL pipeline integrates several steps: template selection, alignment of the target sequence to the template structure, model building, energy minimization and/or refinement, and model quality assessment [51]. Model target sequences are uniquely identified by the MD5 cryptographic hash of the full-length raw amino acid sequence, which allows redundancy among protein sequence database entries to be reduced and in turn facilitates cross-referencing with databases that use different accession codes. The current SWISS-MODEL Repository release contains 3,143,784 model entries for 2,286,870 unique sequences in the UniProt database (2013_02).
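As an illustration of how hashing the raw sequence collapses redundant database entries, the following Python sketch groups accession codes by the MD5 digest of their sequence. The function names and the normalization step (removing whitespace and upper-casing) are assumptions made for this example; the text does not state how the Repository normalizes sequences before hashing.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Return the MD5 hex digest of an amino acid sequence.

    Whitespace removal and upper-casing are assumptions of this sketch.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def group_by_sequence(entries):
    """Map each sequence digest to the accession codes that share that sequence.

    `entries` is an iterable of (accession, sequence) pairs.
    """
    groups = {}
    for accession, sequence in entries:
        groups.setdefault(sequence_md5(sequence), []).append(accession)
    return groups
```

Entries from different databases that carry the same full-length sequence end up under a single key, so one model record can be cross-referenced from all of their accession codes.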

The database can be queried for particular proteins using various database accession codes (e.g. UniProt AC and ID, GenBank, IPI, RefSeq) or directly with the protein amino acid sequence. For a given query protein, a graphical overview shows the segments for which models or experimental structures are available. SWISS-MODEL Repository users can review the quality of the models in the database, search for alternative template structures, and build models interactively using SWISS-MODEL Workspace [52]. The Repository is updated on a regular basis to reflect the growth of the sequence and structure databases.

3. 3D Structure Prediction

3.1 A brief history of molecular modelling

The first homology-based model dates back to 1969, when a wire-and-plastic model of the bonds and atoms of α-lactalbumin was constructed by taking the coordinates of hen egg-white lysozyme and physically adjusting those amino acids that did not match the structure [53]. The two proteins share 39% sequence identity. Afterwards, the crystal structure of lysozyme was used to generate a model for α-lactalbumin [54]. These models were created by taking the existing coordinates of the known structure and mutating the side chains that were not identical in the protein to be modeled. This approach to protein modeling is still used today with substantial success, particularly when the proteins share a considerable degree of sequence similarity [55].

McLachlan and Shotton [56] used the structures of mammalian chymotrypsin and elastase to model the structure of α-lytic proteinase from the bacterium Myxobacter 495. The modeling was difficult because the sequence similarity between the target and the template was only about 18%. The crystal structure of α-lytic proteinase was subsequently determined and compared with the homology model [57]. Although the domains of the model had been constructed accurately, misalignment of the sequences was found to have led to local errors.

Modeling of variable regions in proteins was then introduced, based on equivalent regions from homologous proteins of known structure [58,59]. To construct homology models of various serine proteases, the structures of trypsin, chymotrypsin and elastase were superimposed, and many equivalent Cα atoms were found to lie within 1.0 Å of one another. The regions comprising the amino acids of these Cα atoms were described as structurally conserved regions (SCRs). All remaining sites corresponded to structurally variable regions (VRs), or loops, where the insertions and deletions were located. The backbones of the SCRs and VRs were generated from fragments of known serine proteases, whereas the side chains were modeled on the conformations found at the equivalent positions for identical side chains in the known structures.
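The 1.0 Å criterion for structurally conserved regions can be sketched in a few lines of Python. This is only an illustration, not the procedure of [58,59]: it assumes the two structures are already superimposed and that their Cα coordinates are supplied as equal-length lists of aligned positions. The function name and the `min_length` parameter (the minimum run length reported as an SCR) are hypothetical.

```python
import math


def conserved_regions(ca_a, ca_b, cutoff=1.0, min_length=3):
    """Return (start, end) index ranges where aligned C-alpha atoms stay within `cutoff` angstroms.

    `ca_a` and `ca_b` are equal-length lists of (x, y, z) tuples for aligned
    positions in two structures that have already been superimposed.
    Runs shorter than `min_length` (an assumption of this sketch) are ignored.
    """
    if len(ca_a) != len(ca_b):
        raise ValueError("coordinate lists must cover the same aligned positions")
    regions = []
    start = None
    for i, (p, q) in enumerate(zip(ca_a, ca_b)):
        close = math.dist(p, q) <= cutoff
        if close and start is None:
            start = i  # a new candidate region begins here
        elif not close and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a region that runs to the end of the alignment.
    if start is not None and len(ca_a) - start >= min_length:
        regions.append((start, len(ca_a) - 1))
    return regions
```

Positions that fall outside the returned ranges would correspond to the variable regions (VRs) in the terminology above.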

OMICS Group eBooks
