12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


8 3D Motifs

from a training set of structures, are expressed as allowed ranges of interatomic distances and other geometric scalars rather than as 3D coordinates. Sensitivity and specificity values for the motifs are available at the web site, and the program (Perl scripts and associated data files) can also be downloaded (Table 8.2).

8.3.2.5 Positive and Negative Examples

The main difference between the “positive and negative examples” and “positive examples” approaches is that the former explicitly considers structures outside the classification of interest during motif discovery. In other words, motif generation and evaluation of motif specificity are intertwined.

Geometric sieving refines an existing motif or list of putatively important residues based on geometric uniqueness (Chen et al. 2007a). RMSD distributions for candidate motifs (subsets of the input list) are obtained by comparing them to a representative sample of structures. Geometric sieving does not require segregation of the positive and negative examples; instead, it is assumed that the low-RMSD tail in a distribution represents true positives and the rest of the distribution represents false positives. For motifs of a given number of residues, the one with the highest median RMSD is taken as the most geometrically unique, as it provides the best separation between the main part of the distribution and the low-RMSD tail. The main limitation is that the “right” residues must have been included in the starting motif.

Given positive and negative example structures, GASPS (Genetic Algorithm Search for Patterns in Structures) finds patterns of residues that best separate the two groups (Polacco and Babbitt 2006). No prior residues list is required, and how the positive/negative groups are defined is independent of the method.
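As an aside on the geometric sieving rule described above (among candidate motifs of a given size, keep the one with the highest median RMSD against a representative sample), a minimal Python sketch may help. It is not the published implementation: the function name `most_unique_motif`, the callback `rmsd_to_sample`, the residue labels and the RMSD values are all hypothetical and invented for illustration.

```python
from itertools import combinations
from statistics import median


def most_unique_motif(residues, size, rmsd_to_sample):
    """Return (subset, median RMSD) for the candidate with the highest median.

    `rmsd_to_sample(subset)` is a hypothetical callback that returns the
    best-match RMSD of `subset` against each structure in a representative
    sample. A real pipeline would obtain these values from a structure
    search; that step is outside this sketch.
    """
    best_subset, best_median = None, float("-inf")
    # Candidate motifs are subsets of the input residue list.
    for subset in combinations(residues, size):
        m = median(rmsd_to_sample(subset))
        if m > best_median:
            best_subset, best_median = subset, m
    return best_subset, best_median


# Made-up RMSD distributions (in angstroms) for three two-residue candidates.
# Each list has one low value (the assumed true-positive tail) and a bulk of
# higher values (the assumed false positives).
TOY_RMSDS = {
    ("R1", "R2"): [0.3, 1.1, 1.2, 1.3],
    ("R1", "R3"): [0.4, 2.0, 2.2, 2.4],
    ("R2", "R3"): [0.5, 1.6, 1.7, 1.9],
}


def toy_rmsd_to_sample(subset):
    """Look up the invented RMSD distribution for a candidate."""
    return TOY_RMSDS[subset]
```

Calling `most_unique_motif(["R1", "R2", "R3"], 2, toy_rmsd_to_sample)` selects `("R1", "R3")`, the candidate whose bulk of the distribution lies farthest from its low-RMSD tail. The exhaustive enumeration is only practical for short input lists.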
The underlying search tool is SPASM (Kleywegt 1999), with residues represented by alpha-carbons and side chain centroids and only identical types allowed to match. To limit the search space, GASPS considers only the 100 most conserved residues in a structure chain, based on an automatically constructed sequence alignment. An initial candidate motif is constructed by picking one residue at random and then choosing four more, also at random but restricted to the vicinity of the first. Each of 50 initial candidates is scored on how well it separates the positive and negative structures in terms of best-match RMSD values. In each round of the genetic algorithm, the 16 highest-scoring motifs are used as the parents of 36 new motifs, and the top-scoring motif after 50 rounds is declared the winner. Motifs are allowed to contain from three to ten residues. Sensitive and specific motifs were obtained for diverse superfamilies (Babbitt and Gerlt 2000) and serine proteases. Most of the residues in the motifs were functionally important, but in some cases, residues with no known functional role were found to be equally predictive (Polacco and Babbitt 2006).

The GASPSdb server (Polacco and Babbitt, in preparation) (Table 8.1) compares a query structure to databases of 3D motifs previously generated by GASPS for several protein classification schemes: SCOP superfamilies, SCOP families,
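The GASPS search schedule described above (50 initial candidates, 16 parents producing 36 new motifs per round, 50 rounds, motifs of three to ten residues) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the GASPS code: the text does not specify how children are derived from parents, so the `mutate` operator (add, drop or swap one residue) and the choice to carry parents into the next round are assumptions, and `neighbors` and `score` are hypothetical callbacks standing in for the structural vicinity test and the SPASM-based separation score.

```python
import random

# Values taken from the description of GASPS in the text.
N_INITIAL, N_PARENTS, N_CHILDREN, N_ROUNDS = 50, 16, 36, 50
MIN_SIZE, MAX_SIZE = 3, 10


def initial_motif(residues, neighbors, rng):
    """One random seed residue plus up to four random residues near it."""
    seed = rng.choice(residues)
    nearby = [r for r in neighbors(seed) if r != seed]
    return frozenset([seed] + rng.sample(nearby, min(4, len(nearby))))


def mutate(motif, residues, rng):
    """Assumed operator: add, drop or swap one residue within size limits."""
    motif = set(motif)
    outside = [r for r in residues if r not in motif]
    ops = []
    if len(motif) < MAX_SIZE and outside:
        ops.append("add")
    if len(motif) > MIN_SIZE:
        ops.append("drop")
    if outside:
        ops.append("swap")
    if not ops:
        return frozenset(motif)
    op = rng.choice(ops)
    if op in ("drop", "swap"):
        motif.remove(rng.choice(sorted(motif)))
    if op in ("add", "swap"):
        motif.add(rng.choice(outside))
    return frozenset(motif)


def evolve(residues, neighbors, score, seed=0):
    """Run the schedule and return the top-scoring motif after the last round."""
    rng = random.Random(seed)
    population = [initial_motif(residues, neighbors, rng)
                  for _ in range(N_INITIAL)]
    for _ in range(N_ROUNDS):
        # The 16 highest-scoring motifs become the parents of 36 new motifs.
        parents = sorted(population, key=score, reverse=True)[:N_PARENTS]
        children = [mutate(rng.choice(parents), residues, rng)
                    for _ in range(N_CHILDREN)]
        # Assumption: parents survive alongside their children.
        population = parents + children
    return max(population, key=score)


# Toy stand-ins (all invented): residues are integers, "vicinity" means within
# five positions in sequence, and the score rewards overlap with a hidden set.
TOY_RESIDUES = list(range(30))
TOY_TARGET = frozenset({3, 4, 5, 6})


def toy_neighbors(r):
    return [x for x in TOY_RESIDUES if abs(x - r) <= 5]


def toy_score(motif):
    return len(motif & TOY_TARGET) - 0.1 * len(motif - TOY_TARGET)
```

A run such as `evolve(TOY_RESIDUES, toy_neighbors, toy_score)` returns a `frozenset` of three to ten residue indices. In a real setting `score` would compare best-match RMSD values between the positive and negative structure sets, which is by far the most expensive step and the reason the search space is restricted to conserved residues.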
