12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


10 Integrated Servers for Structure-Informed Function Prediction

of relevance. ProFunc does this by comparing the environment around the template residues in their parent structure with the environment around the residues that were matched. Residues within 10 Å of the template’s geometrical centre in both structures are paired off according to their degree of similarity and overlap. Where alternative pairings are possible an optimization procedure is applied to maximize the numbers of paired identical or similar residues in equivalent 3D positions. The number of paired residues gives a crude measure of the local similarity of the matched sites in the two proteins (Figs. 10.6b and 10.7b). However, this crude measure still lets through too many false positives. Therefore the measure that is actually used takes into account the relative positions of the paired residues in their respective amino acid sequences. If the paired residues appear in the same order in both sequences then the likelihood of the sequences being homologues is high.

To see why this is so, consider two sequences descended from a common ancestor protein which have diverged so much that their relationship cannot be detected by sequence methods. However, if both have retained the same function, then the region that will have changed least is likely to be the active site. Any significant change here will have altered the function. The net result will be that the highest level of similarity between the two proteins will be among the residues in the vicinity of the active site. These residues will be close in 3D, but may be scattered along the lengths of the two sequences.
That is why the similarity can be detected in 3D, but may be virtually impossible to pick up from comparison of the sequences.

Figure 10.7c provides an illustration of this. It shows a sequence alignment between 2fck and its top reverse template hit, 1s7f. The alignment has been driven by the residues determined to be equivalent in the local matching procedure described above. The residues are marked by the double dots between the sequences. (The single dots correspond to residues that have lost their 3D-equivalent partners in the alignment.) One can see that the paired residues, which lie in a compact region in 3D, are spread out across nearly the full length of both sequences.

More interestingly, while the whole alignment gives a sequence identity of 24.7% between the two proteins, 16 of the 44 residues within 10 Å of the template centre are identical, giving a local sequence identity of 36.4%. As this region corresponds to a significant part of the CoA binding site in the 1s7f structure, it provides strong structural evidence that 2fck also binds CoA. It also covers part of the putative substrate binding site, but not enough to suggest the substrates of both proteins are the same nor, indeed, that they perform the same function.

In addition to the local similarity score, various other statistics are quoted by ProFunc. One of these is an estimated E-value associated with the score. For the reverse templates these are calculated from the distribution of all scores obtained in a given search using the same procedure that FASTA uses for computing its E-values (Pearson 1998). For the other template searches the E-values are calculated using pre-computed parameters.
The hits are ranked by E-value and categorized into four groups: certain matches (E < 10⁻⁶), probable matches (10⁻⁶ < E < 0.01), possible matches (0.01 < E < 0.1) and long shots (0.1 < E < 10.0).
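
The ranking and four-way categorization described above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the text, not ProFunc's own code. The function names (`classify_e_value`, `rank_hits`), the hit labels and the example E-values are hypothetical. The text does not say how values falling exactly on a boundary are treated, so the sketch assumes they fall into the less confident group, and it returns `None` for E-values of 10.0 or more, which lie outside the listed ranges.

```python
from typing import List, Optional, Tuple

# Upper bounds of the four categories quoted in the text, in ascending order.
# Assumption: a value exactly on a boundary goes to the less confident group.
CATEGORY_BOUNDS = [
    (1e-6, "certain match"),
    (0.01, "probable match"),
    (0.1, "possible match"),
    (10.0, "long shot"),
]


def classify_e_value(e_value: float) -> Optional[str]:
    """Return the category for an E-value, or None if it is 10.0 or larger."""
    if e_value < 0:
        raise ValueError("E-values cannot be negative")
    for upper_bound, label in CATEGORY_BOUNDS:
        if e_value < upper_bound:
            return label
    return None  # outside the ranges listed in the text


def rank_hits(
    hits: List[Tuple[str, float]]
) -> List[Tuple[str, float, Optional[str]]]:
    """Sort (name, E-value) pairs by ascending E-value and attach a category."""
    ranked = sorted(hits, key=lambda hit: hit[1])
    return [(name, e, classify_e_value(e)) for name, e in ranked]


if __name__ == "__main__":
    # Made-up hits for illustration only; these are not real ProFunc results.
    example_hits = [("hit_A", 0.5), ("hit_B", 3e-9), ("hit_C", 0.02)]
    for name, e, label in rank_hits(example_hits):
        print(f"{name}\t{e:.1e}\t{label}")
```

Keeping the thresholds in a single ordered list means the category boundaries can be changed in one place if the cut-offs differ from those quoted here.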
