12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf

3 Comparative Protein Structure Modelling

facilitated by structural information for all or almost all proteins. Much of the structural information will be provided by structural genomics (Burley et al. 2008; Chance et al. 2002), a large-scale determination of protein structures by X-ray crystallography and nuclear magnetic resonance spectroscopy, combined efficiently with accurate, automated and large-scale comparative protein structure modelling techniques. Given the performance of the current modelling techniques, it seems reasonable to require models based on at least 30% sequence identity (Vitkup et al. 2001), corresponding to one experimentally determined structure per sequence family, rather than per fold family.

To enable the large-scale comparative modelling needed for structural genomics, the steps of comparative modelling are being assembled into completely automated pipelines such as the SWISS-MODEL repository or MODBASE (Kopp and Schwede 2006; Pieper et al. 2006), each of which contains more than a million models. Statistics of these databases show that domains in approximately 70% of the known protein sequences can be modelled. This is due substantially to the almost 2,000 structures that were deposited by the structural genomics centres, which focus on new folds or novel structures. These depositions contributed 73% of all novel structural features in the PDB in the last 7 years (Burley et al. 2008).

While the current number of at least partially modelled proteins may look impressive, usually only one domain per protein is modelled. In contrast, proteins have two or three domains on average. For example, the average length of a yeast open reading frame (ORF) is 472 amino acids, while the average size of domains in CATH, a database of structural domains, is 175 amino acids. The average model size in MODBASE, a database of comparative models, is only slightly longer at 192 residues. Furthermore, in two thirds of the modelling cases the target sequence shares less than 30% sequence identity with the closest template.

3.5 Summary

Comparative modelling has already proven to be a useful tool in many biological applications, and its importance among structure prediction methods is expected to be further accentuated because of the many experimental structures emerging from Protein Structure Initiative projects and the continuous improvements in methodologies. The average sequence identity between structurally related proteins in general is just around 8–9%, and most of them share less than 15% identity (Rost 1997). Comparative modelling is largely restricted to that subset of sequences that share a recognizable sequence similarity to a protein with a known structure; therefore it is safe to assume that this approach is still only scratching the surface of possibilities in terms of recognizing and utilizing useful structural information. Fold recognition methods discussed in Chapter 2 will have an important role in extending the possibilities for comparative modelling towards ever more remote homologues and even structural analogues.
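The identity figures quoted in this chapter (at least 30% for acceptable models, less than 15% for most structurally related pairs) presuppose some way of measuring identity from a target-template alignment. The following is a minimal Python sketch of one such measure, not the procedure used by any particular modelling pipeline. It assumes that identity is counted only over columns where neither sequence has a gap (other conventions divide by the full alignment length or by the shorter sequence length). The function names `percent_identity` and `meets_identity_cutoff`, the gap symbol, and the example sequences are all hypothetical and chosen for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both strings are rows of the same pairwise alignment,
    so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def meets_identity_cutoff(aligned_target: str, aligned_template: str,
                          cutoff: float = 30.0) -> bool:
    """True if the pair reaches the cutoff (default: the 30% figure in the text)."""
    return percent_identity(aligned_target, aligned_template) >= cutoff


# Hypothetical aligned pair: 9 gap-free columns, 7 of them identical.
target = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"
print(round(percent_identity(target, template), 1))  # 77.8
print(meets_identity_cutoff(target, template))       # True
```

Because the denominator convention changes the result, any cutoff such as 30% should be read together with the definition of identity used to compute it.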
