12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf

8 3D Motifs

8.3.2 Motif Discovery

8.3.2.1 Literature

Perhaps the most reliable but least automatable approach to motif discovery is to mine the published literature for experimental evidence on which residues are important for the function of a protein. For 3D motifs, the focus is more on residues that provide a specific binding or catalytic capability rather than stabilizing the structure, although it is not always possible to separate these aspects of function.

The Catalytic Site Atlas (CSA) (Table 8.1) contains several hundred families of enzymes, each comprised of a structure with catalytic residue annotations from the literature and a set of related sequences (Porter et al. 2004). Representative structural templates (3D motifs) based on side chain functional atoms or on alpha- and beta-carbons are available for a subset of the families (Torrance et al. 2005). These can be searched with a structure of interest or downloaded (Table 8.1). Searching is performed with the program JESS (Barker and Thornton 2003); chemically similar residue types such as aspartate and glutamate are allowed to match. Statistical significance is evaluated with a formula that incorporates the number of residues in a motif, the number of points per residue, residue abundances, and parameters determined empirically by treating the RMSD distributions as exponents of power functions (Stark et al. 2003).
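To make the idea of template matching with interchangeable residue types concrete, the following is a minimal Python sketch. It is not the JESS algorithm. It assumes a brute-force search over residue assignments and uses a distance RMSD (comparison of pairwise inter-point distances) in place of a superposition RMSD, so that no external libraries are needed. The names `SIMILAR`, `compatible`, `drmsd` and `best_match`, the equivalence groups other than aspartate/glutamate, the one-point-per-residue representation and the cutoff value are all hypothetical choices made for illustration.

```python
import math
from itertools import permutations

# Hypothetical equivalence groups. Only ASP/GLU is named in the text;
# the other groups are assumptions added for illustration.
SIMILAR = [{"ASP", "GLU"}, {"SER", "THR"}, {"ASN", "GLN"}]


def compatible(a, b):
    """True if residue types a and b are identical or share a group."""
    return a == b or any(a in g and b in g for g in SIMILAR)


def drmsd(coords_a, coords_b):
    """Distance RMSD over all point pairs (needs at least two points).

    Unlike a superposition RMSD, this cannot tell mirror images apart.
    """
    n = len(coords_a)
    sq = [
        (math.dist(coords_a[i], coords_a[j])
         - math.dist(coords_b[i], coords_b[j])) ** 2
        for i in range(n) for j in range(i + 1, n)
    ]
    return math.sqrt(sum(sq) / len(sq))


def best_match(template, residues, cutoff=1.0):
    """Return (dRMSD, matched residues) for the best assignment, or None.

    template and residues are lists of (residue_type, (x, y, z)), with one
    representative point per residue (a simplification).
    """
    best = None
    for combo in permutations(residues, len(template)):
        if not all(compatible(t[0], r[0]) for t, r in zip(template, combo)):
            continue
        score = drmsd([t[1] for t in template], [r[1] for r in combo])
        if score <= cutoff and (best is None or score < best[0]):
            best = (score, combo)
    return best


# Toy data: a Ser-His-Asp template and a structure with Glu in place of Asp.
template = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (0, 4, 0))]
structure = [
    ("GLY", (10, 10, 10)),
    ("SER", (1, 1, 1)),
    ("HIS", (4, 1, 1)),
    ("GLU", (1, 5, 1)),
    ("ALA", (-5, 0, 0)),
]
hit = best_match(template, structure)
```

In this toy case the template is found in the structure even though glutamate stands in for aspartate. A geometric deviation of this kind is the sort of quantity whose significance the formula described above is meant to assess.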
This formula estimates background RMSD distributions a priori so it is not necessary to compare each motif to a random or reference set of structures.

8.3.2.2 Undirected Mining

Undirected mining refers to finding common patterns in an unbiased set of structures, where “unbiased” means not chosen based on any common feature or function. In practice, there are too many possible combinations of amino acids in structures to consider them all, and the search space must be restricted.

Russell performed all-by-all pairwise comparisons among a representative set of structures (Russell 1998). The search space was limited with distance constraints and by disregarding nonpolar residues, disulphide-bonded cysteines, and residues not well conserved in sequence alignments. To detect cases of convergent evolution, matches between proteins of the same fold were ignored. The process identified several metal-binding sites and active site patterns, including the catalytic triad.

The program TRILOGY also disregards residues that are not well conserved in sequence alignments (Bradley et al. 2002). Patterns are required to occur in at least three different SCOP superfamilies. Triplets of potentially matching residues, including conservative substitutions, are identified and merged into larger patterns. However, TRILOGY is designed to identify sequence-structure patterns, not simply 3D motifs; residue patterns must be similar in sequence spacing and order as well as in 3D.
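The ways of restricting the search space described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reimplementation of Russell's method or of TRILOGY. Residues are filtered by type, disulphide status and conservation, triplets are kept only if all three members lie within a distance cutoff, and a triplet is reported only if it occurs in at least three superfamilies. The residue dictionary keys, the `NONPOLAR` set, the thresholds and all function names are hypothetical. Geometric comparison between triplets and the sequence spacing and order requirement are not modelled.

```python
import math
from collections import defaultdict
from itertools import combinations

# Illustrative choice of residue types treated as nonpolar (an assumption).
NONPOLAR = {"ALA", "VAL", "LEU", "ILE", "PRO", "PHE", "MET", "GLY"}


def res(rtype, xyz, conservation=1.0, disulphide=False):
    """Build a residue record; the keys are hypothetical."""
    return {"type": rtype, "xyz": xyz,
            "conservation": conservation, "disulphide": disulphide}


def candidate_residues(residues, min_conservation=0.8):
    """Drop nonpolar, disulphide-bonded cysteine and poorly conserved residues."""
    return [r for r in residues
            if r["type"] not in NONPOLAR
            and not (r["type"] == "CYS" and r["disulphide"])
            and r["conservation"] >= min_conservation]


def triplets(residues, max_dist=8.0):
    """Yield sorted residue-type triplets whose members are mutually close."""
    for trio in combinations(candidate_residues(residues), 3):
        if all(math.dist(a["xyz"], b["xyz"]) <= max_dist
               for a, b in combinations(trio, 2)):
            yield tuple(sorted(r["type"] for r in trio))


def recurrent_triplets(structures, min_superfamilies=3):
    """Keep triplets seen in at least min_superfamilies superfamilies.

    structures is a list of (superfamily_label, residues).
    """
    seen = defaultdict(set)
    for superfamily, residues in structures:
        for t in triplets(residues):
            seen[t].add(superfamily)
    return {t: s for t, s in seen.items() if len(s) >= min_superfamilies}


# Toy data: the same compact triad in three superfamilies, plus one
# structure where the three residues are too far apart to count.
triad = [res("SER", (0, 0, 0)), res("HIS", (3, 0, 0)), res("ASP", (0, 4, 0))]
structures = [
    ("sf_a", triad + [res("LEU", (1, 1, 1))]),
    ("sf_b", triad + [res("CYS", (1, 1, 0), disulphide=True)]),
    ("sf_c", triad + [res("LYS", (2, 2, 0), conservation=0.2)]),
    ("sf_d", [res("SER", (0, 0, 0)), res("HIS", (30, 0, 0)),
              res("ASP", (0, 40, 0))]),
]
patterns = recurrent_triplets(structures)
```

Filtering before enumerating triplets is what keeps the combinatorics manageable: the number of triplets grows with the cube of the number of candidate residues, so every residue removed up front pays off.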
