12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


48 L.A. Kelley

consistency is analyzed. For regions where one dominant alignment variant is produced, the alignment is considered reliable, while the regions where the consistency of query–template alignment is lacking are deemed unreliable. Thus alignment accuracy is increased by searching for a consensus of alignments, similar in spirit to the idea of 3D-Jury where consensus in structure space is sought. Prasad et al. (2004) take a similar approach using five different methods for alignment generation and searching for a consensus between them.

Tress et al. (2003) looked at the distribution of residue-residue profile scores along the length of an alignment. They found that accurate regions of alignment could be reliably discriminated based on contiguous stretches of high scoring residues.

As mentioned earlier, a dynamic programming or HMM approach guarantees an ‘optimal’ alignment given a scoring function. However, because the scoring functions are not perfect, there may be many similar alignments to the ‘optimal’ one with slightly poorer scores which may in fact be more accurate from a structural point of view. Similarly, alignment algorithms require some parameterisation of the likelihood of insertions and deletions and these parameters will not be optimal for all proteins. For these reasons Jaroszewski et al. (2002) performed a systematic investigation of the ‘sub-optimal’ alignments near the ‘optimal’ one by varying alignment parameters and weakening the strongest path through the dynamic programming matrix. In doing so they discovered that alignments far more accurate than the ‘optimal’ one according to the scoring function may be found by a modest search of alignments ‘near’ the optimal one.
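
The two ideas above, generating alternative alignments by varying alignment parameters and treating regions where the alternatives agree as reliable, can be illustrated with a toy sketch. The Python below is only an illustration under stated assumptions and is not the procedure of any of the studies cited: it uses a plain Needleman-Wunsch global alignment with a linear gap penalty and identity-based match/mismatch scores in place of profile-profile scoring, and the function names (`needleman_wunsch`, `consensus_reliability`), the scores and the set of gap penalties are all hypothetical choices made for the example.

```python
from collections import Counter

def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Toy global alignment with a linear gap penalty.

    Returns a dict mapping each aligned position in `a` to its partner in `b`.
    Positions of `a` that are aligned to a gap are absent from the dict.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal path, preferring the diagonal move on ties.
    pairs = {}
    i, j = n, m
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            pairs[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs

def consensus_reliability(query, template, gap_penalties=(-1, -2, -4, -8)):
    """Fraction of alignment variants agreeing on each query position's partner.

    One alignment is produced per gap penalty (an assumed, arbitrary set).
    A value of 1.0 means every variant aligned that query residue the same way.
    """
    alignments = [needleman_wunsch(query, template, gap=g) for g in gap_penalties]
    reliability = []
    for i in range(len(query)):
        # None stands for "aligned to a gap" in a given variant.
        partners = Counter(aln.get(i) for aln in alignments)
        _, count = partners.most_common(1)[0]
        reliability.append(count / len(alignments))
    return reliability

if __name__ == "__main__":
    # Hypothetical sequences chosen only to exercise the code.
    for pos, r in enumerate(consensus_reliability("MKTAYIAKQR", "MKTAYKQR")):
        print(pos, round(r, 2))
```

Positions with a low agreement fraction would be the candidates for an "unreliable" label. A realistic implementation would use profile-based scores, affine gap penalties and a broader sampling of sub-optimal paths than this sketch does.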
This left open the question of how one could reliably pick such improved alignments out of the large pool of alternatives. Chivian and Baker (2006) tackled this problem by building models based on each alignment and assessing the models using a combination of structural clustering (e.g. 3D-Jury) and their finely tuned 3D protein energy function. Similarly, Wallner and Elofsson (2006) trained a neural network on the residue environments and profile-profile scores from a set of protein models to generate a predictor of model quality. Finally, McGuffin (2008) has used several programs for assessing model quality together with structural clustering techniques such as 3D-Jury as input to a neural network predictor.

2.4.2 Estimation of Statistical Significance

For the techniques described in this chapter to be of practical use to the general bioscience community, reliable estimates of error are required. If a molecular biologist is confronted with a prediction without indication of the likelihood of the prediction’s accuracy, the prediction is next to useless. In a sequence search, or a search of a fold library, or a set of threading models, the common result is a list of scores. We know that, after comparing a sequence with a library of potential models, the vast majority of these models must be incorrect. Thus the majority of the sequence-structure scores can be treated as background noise. One may then use
