12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


2 Fold Recognition

The idea of combining sequence and secondary structure when searching a database is schematically represented in Fig. 2.4b. This general idea demonstrated significantly superior performance over standard sequence searching and was orders of magnitude faster to compute than most threading algorithms, a feature which is particularly important when searching large databases of templates.

In the early days of the international CASP competition, threading approaches were generally the top performers, with the hybrid sequence-structure approaches just described following closely behind. However, two factors were to push threading off the top spot in the structure prediction game: (1) the explosion in the size of the sequence database and (2) the development of PSI-BLAST.

2.3.2 Sequence Profiles and Hidden Markov Models

As the sequence databases were rapidly growing in size due to worldwide efforts at genome sequencing, technological developments geared towards using this information efficiently were underway. A simple approach by Park et al. (1997) illustrated how two homologous sequences, which have diverged beyond the point where their homology can be recognised by a simple direct comparison, can be related through a third sequence that is suitably intermediate between the two. Known as ‘intermediate sequence search’, this ‘hopping’ through sequence space was clearly going to be powerful, and a more refined approach was developed in PSI-BLAST (Altschul et al. 1997). Instead of using a fixed 20 × 20 scoring matrix for every protein, and for every position in a protein, one could construct an n × 20 scoring matrix, or profile, that captures the specific mutational propensities of each position in a specific protein sequence. For this reason such a profile is often called a position specific scoring matrix or PSSM.

After an initial standard BLAST scan to collect relatively close homologues, the (pseudo) multiple sequence alignment of these homologues to the query sequence permits one to calculate statistics based on the observed mutations at each position in the query sequence. These statistics form the basis of a new scoring matrix which can be used for a subsequent round of searching. This process of collecting homologues, building a new scoring function and searching again with this new scoring function can be iterated many times (usually between 5 and 10) and is called Position Specific Iterated BLAST (PSI-BLAST). Coupling this powerful iterative approach with the growing sequence database permitted a substantial improvement in the detection of extremely remote homology, and this was reflected in the

Fig. 2.4 (continued) a profile used for secondary structure, i.e. each position in each sequence has a probability for each of the three types of secondary structure. Note, in this case one would probably use predicted secondary structure for the templates even though one knows the true secondary structure. This has been shown to perform well (e.g. Bennett-Lovsey et al. 2008)
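The profile construction and iteration described in Section 2.3.2 can be illustrated with a toy sketch. It is not the PSI-BLAST implementation: it assumes ungapped sequences of equal length, a uniform background frequency of 1/20, a fixed pseudocount and a raw score threshold, whereas the real program uses gapped local alignment, sequence weighting and E-value cut-offs. The function names (`build_pssm`, `score`, `iterative_search`) and the example sequences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(aligned, pseudocount=1.0):
    """Build an n x 20 log-odds profile from equal-length, ungapped sequences."""
    n = len(aligned[0])
    if any(len(s) != n for s in aligned):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for i in range(n):
        # Count only the 20 standard residues observed in this column.
        counts = Counter(s[i] for s in aligned if s[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # Log-odds score of each residue at position i versus background.
        row = {a: math.log2((counts[a] + pseudocount) / total / BACKGROUND)
               for a in AMINO_ACIDS}
        pssm.append(row)
    return pssm


def score(pssm, seq):
    """Sum the position-specific scores of seq against the profile (no gaps)."""
    if len(seq) != len(pssm):
        raise ValueError("sequence and profile must have the same length")
    return sum(row[res] for row, res in zip(pssm, seq))


def iterative_search(query, database, threshold, max_rounds=5):
    """Collect hits, rebuild the profile from them, and search again."""
    hits = {query}
    for _ in range(max_rounds):
        pssm = build_pssm(sorted(hits))
        new_hits = hits | {s for s in database if score(pssm, s) >= threshold}
        if new_hits == hits:  # converged: no new homologues were found
            break
        hits = new_hits
    return hits


# Toy data: the remote sequence shares only 4 of 8 residues with the query,
# but 6 of 8 with the intermediate, which in turn shares 6 of 8 with the query.
query = "ACDEFGHI"
intermediate = "ACDEFGKL"
remote = "ACDEWYKL"
unrelated = "MNPQRSTV"

with_intermediate = iterative_search(
    query, [intermediate, remote, unrelated], threshold=5.0)
without_intermediate = iterative_search(
    query, [remote, unrelated], threshold=5.0)
print(sorted(with_intermediate))
print(sorted(without_intermediate))
```

With these toy sequences and this threshold, the remote sequence is picked up only when the intermediate is present in the database: the first round recruits the intermediate, and the profile rebuilt from query plus intermediate then scores the remote sequence above the cut-off. This mirrors, in a very simplified form, the 'hopping' through sequence space described above.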
