12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf



196 E.C. Meng et al.

used in matching. An alpha-carbon-only description is less specific than one that includes side chain atoms. Three-dimensional motifs with fewer residues are less specific. Stringent match settings (only allowing residues of identical types to pair, a low RMSD cutoff) can restrict results to closely related proteins even if meaningful matches to more distantly related proteins could be obtained with looser criteria.

Most methods provide numerical scores to rank hits and indicate match quality. For example, RMSD values indicate the geometric fit between points in a 3D motif and the corresponding points in a structure. RMSD is an appropriate measure for ranking matches to a given motif, but it is not useful for comparing among motifs of different sizes. Further, some motifs are more likely to be matched merely because they contain more common residues. To account for these issues and provide a better ranking of hits, some methods calculate statistical significance or expectation values (e.g., p-values or E-values). Some limitations must be kept in mind, however, as these values depend on any underlying assumptions of a statistical model and on the data used to parameterize the model.

Regardless of how a 3D motif was generated, it can be evaluated against a sample of structures. When the sample includes validated positive and negative examples, the results can be expressed in terms of sensitivity, the ability to identify the positive examples, and specificity, the ability to exclude the negative examples. When the sample includes just negative examples, the resulting RMSD distribution can be used to estimate the statistical significance of matches to that motif. The usefulness of these derived quantities depends on an adequately large and representative sample.

A consensus approach may be helpful, where multiple hits to related motifs or similar results obtained with different programs or databases (Table 8.1) may converge to a common prediction.

Finally, common sense must be applied. For example, there may be a significant match to an active site motif but no pocket for binding the substrate. Further, statistical significance is not the same thing as biological significance: a biologically significant motif may not score as statistically significant compared to a motif that has no biological importance. Thus, it is prudent to inspect matches visually and to evaluate them using biologically relevant criteria before using them to infer function or any other characteristic of a protein.

8.3 Specific Methods

The studies of 3D motifs can be described according to how they treat the problem of choosing motifs. First are those studies that focus on evaluating methods for finding matches to any user-defined motif, and leave the problem of choosing a relevant motif to the users of the method or later studies. Second are those studies that treat motif discovery or generation of motif libraries as essential to their method, if not the primary goal of their method.
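As a brief aside on the scoring and evaluation quantities described before Section 8.3 (RMSD, sensitivity, specificity, and significance estimated from an RMSD distribution of negative examples), the following is a minimal Python sketch and not the procedure of any particular program. It rests on several assumptions that are not taken from the text: the function names are hypothetical, the motif and structure points are assumed to be already paired and superimposed, a hit is assumed to mean an RMSD at or below a user-chosen cutoff, and the empirical p-value uses a simple add-one correction.

```python
import math


def rmsd(motif_points, matched_points):
    """RMSD between paired 3D points (assumed already superimposed)."""
    if not motif_points or len(motif_points) != len(matched_points):
        raise ValueError("point lists must be non-empty and of equal length")
    squared_sum = sum(
        (a - b) ** 2
        for p, q in zip(motif_points, matched_points)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared_sum / len(motif_points))


def sensitivity_specificity(positive_rmsds, negative_rmsds, cutoff):
    """Treat RMSD <= cutoff as a hit (an assumption of this sketch).

    Sensitivity: fraction of validated positives that are hit.
    Specificity: fraction of validated negatives that are not hit.
    """
    true_positives = sum(r <= cutoff for r in positive_rmsds)
    true_negatives = sum(r > cutoff for r in negative_rmsds)
    return (
        true_positives / len(positive_rmsds),
        true_negatives / len(negative_rmsds),
    )


def empirical_p_value(observed_rmsd, negative_rmsds):
    """Fraction of negative examples matching at least as well as observed.

    The add-one correction (an assumption here) avoids reporting p = 0
    from a finite sample.
    """
    as_good = sum(r <= observed_rmsd for r in negative_rmsds)
    return (as_good + 1) / (len(negative_rmsds) + 1)


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from any study.
    motif = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    hit = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(motif, hit))

    positives = [0.5, 0.8, 1.6]
    negatives = [1.2, 2.0, 2.5, 3.0]
    print(sensitivity_specificity(positives, negatives, cutoff=1.5))
    print(empirical_p_value(1.2, negatives))
```

The p-value estimate is only as reliable as the negative sample is large and representative, and because RMSD values for motifs of different sizes are not directly comparable, a single cutoff would have to be chosen separately for each motif.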
