From Protein Structure to Function with Bioinformatics

30 L.A. Kelley

modelling chapter of this book. The advantages of this approach are clear. It is computationally quick, and the resulting model will be highly accurate provided that the sequence similarity between query and template is high. This immediately points to the method's main limitation: if no similar sequence has yet had its structure solved, we can make no progress at all.

So we have two lines of attack on the protein structure prediction problem. One approach, based on general physical principles, aims to provide a well-understood, universal technique for predicting structure from sequence, with the added benefit of enabling protein design, the study of dynamics and much more. However, it is extremely difficult and will probably remain computationally intractable for years to come. At the other extreme lies a straightforward but highly limited heuristic technique, homology modelling, which can give highly accurate models, but only in a limited number of cases. It is against this backdrop that the term 'fold recognition' was coined, to act as a bridge between these two extremes.

2.1.3 The Limits of Fold Space

Several key observations about the nature of proteins are in order. The Protein Data Bank (Berman et al. 2000) holds approximately 50,000 experimentally determined protein structures. The Structural Classification of Proteins (SCOP; Murzin et al. 1995) has grouped these structures into just 1,100 unique structural folds (unique topologies) and ∼1,800 superfamilies (evolutionarily related protein families). As more and more structures are solved experimentally, the number of new folds discovered increases only very slowly, and the rate of new fold discovery appears to be declining (Fig. 2.2).
These findings have led to the broad acceptance of the view that nature contains a finite and relatively small number of folds (Marsden et al. 2006). There are hundreds, if not thousands, of examples in the structure database demonstrating that highly similar structures may have radically different sequences. So although it is true that highly similar sequences adopt highly similar structures, highly dissimilar sequences sometimes adopt similar structures too.

Thus, it appears that any sequence we choose from the database of sequenced genomes has a high probability of adopting a structure we have already seen. The big question is how to work out which of the 50,000 structures is the right template and how to align our sequence to that structure. Fold recognition is concerned with the search for scoring functions that can reliably detect the compatibility of a sequence with a known structure and align the two accurately when simple sequence similarity cannot be detected.

The space of all possible protein sequences is vast, yet the space of protein structures appears to be considerably smaller. Whether this is related to thermodynamics, to the kinetics of folding or to evolutionary selection is difficult to say and beyond the scope of this chapter. Nevertheless, it is a highly fortuitous fact that has been of great benefit to the field of protein structure prediction.
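The core idea of fold recognition is a scoring function that measures how compatible a sequence is with a known structure, together with an alignment procedure. The following toy sketch illustrates that idea in the spirit of 3D-1D profile methods, under stated assumptions. It is not the method of any particular program. The template is reduced to a string of per-position environment classes: 'B' for buried, 'I' for intermediate and 'E' for exposed. A global dynamic-programming (Needleman-Wunsch style) alignment then maximises the total compatibility. The three environment classes, the residue groupings, the score values, the gap penalty and the names `compatibility` and `thread` are all hypothetical and were chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical residue groupings, for illustration only.
HYDROPHOBIC = set("AVILMFWC")
CHARGED = set("DEKR")


def compatibility(residue: str, env: str) -> float:
    """Toy 3D-1D score for placing `residue` in a template environment.

    env is 'B' (buried), 'I' (intermediate) or 'E' (exposed).
    The values are invented: hydrophobic residues are rewarded when
    buried, charged residues when exposed. Every other residue is
    treated as polar.
    """
    if residue in HYDROPHOBIC:
        return {"B": 2.0, "I": 0.5, "E": -1.0}[env]
    if residue in CHARGED:
        return {"B": -2.0, "I": 0.0, "E": 1.5}[env]
    return {"B": -0.5, "I": 0.5, "E": 1.0}[env]


def thread(seq: str, envs: str, gap: float = -2.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Globally align a query sequence to a template environment string.

    Returns the best total compatibility score and the list of aligned
    (query_index, template_index) pairs.
    """
    n, m = len(seq), len(envs)
    # score[i][j] = best score for aligning seq[:i] with envs[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + compatibility(seq[i - 1], envs[j - 1]),
                score[i - 1][j] + gap,  # query residue left unaligned
                score[i][j - 1] + gap,  # template position skipped
            )
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + compatibility(seq[i - 1], envs[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Usage: rank two hypothetical templates for one query sequence.
query = "LIVKDEAL"
templates = {"template_1": "BBBEEIBB", "template_2": "EEEBBBEE"}
ranked = sorted(templates, key=lambda t: thread(query, templates[t])[0], reverse=True)
print(ranked)
```

Raw scores like these are not directly comparable across templates of very different lengths. Practical fold recognition methods therefore also estimate the statistical significance of a match, which this sketch omits.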