
From Protein Structure to Function with Bioinformatics.pdf



258 R.A. Laskowski

10.2.5 Protein Interactions

The final set of features extracted by ProKnow relate to protein-protein interactions taken from the Database of Interacting Proteins (DIP) (Xenarios et al. 2002) and functional annotations from the Prolinks Database (Bowers et al. 2004). Any sequence matched by the PSI-BLAST search can return a functional linkage if present in either DIP or Prolinks.

10.2.6 Combining the Predictions

Once all processes have completed, the functions (i.e. GO terms) associated with any extracted features that reoccur are combined using Bayes’ Theorem weighting. This provides an estimate of the significance of any predicted GO term. Only terms relating to molecular function and biological process are considered – i.e. terms relating to cellular component are ignored.

The significance of any predicted GO term is reflected by three numbers. The first is the Bayesian weight which represents the probability, 0.0–1.0, of the predicted GO term being correct. The second is the evidence rank and relates to how reliable a particular GO assignment is deemed to be in the first place. GO assignments come from various sources: inferred by the curator, inferred from direct assay, inferred from sequence or structural similarity, and so on. These have a range of reliabilities, the most reliable being any that have direct experimental evidence to support them. The source of the annotation is recorded by the evidence code in the GO data. In ProKnow, each type of evidence code is assigned a rank to quantify its reliability, and the ranks from several predictions are averaged to give the evidence rank. The third measure of significance is the clue count which is the number of weights used to calculate the Bayesian weight and is related to how many of the ProKnow sequence and structure methods contributed to a given GO prediction.

10.2.7 Prediction Success

Figure 10.3 shows some of the output on our example structure, 2fck. The Dali fold matches were almost exclusively to acetyltransferases. The UniProt BLAST searches found a number of strong sequence matches to acetyltransferases. The RIGOR search threw in a few red herrings with matches to fibroblast growth factors, a lipid-binding protein called lipovitellin, and an integrase. More red herrings were contributed by the PROSITE hits, all to short motifs, two being phosphorylation sites and one a myristoylation site (all are annotated on the PROSITE web site with the comment: “This entry can, in some cases, be ignored by a program (because it is too unspecific)”). The DIP search returned nothing. Nevertheless, the overwhelmingly strongest prediction was the one that appears to be the correct one; namely, that the protein is an acetyltransferase. So in this case the overall prediction looks right.
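
To make the three significance numbers from Section 10.2.6 concrete, the following is a minimal sketch of how per-method clues might be merged into a Bayesian weight, an evidence rank and a clue count. It is not ProKnow's actual implementation: the text does not give the exact formula, so the sketch assumes that the clues are independent and that the prior is uniform. The names `Clue`, `Prediction`, `combine` and `EVIDENCE_RANK` are hypothetical, the numeric ranks are invented for illustration, and the GO identifiers are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import prod

# Hypothetical reliability ranks (1 = most reliable). IDA, IC and ISS are real
# GO evidence codes, but these numeric ranks are assumptions for illustration.
EVIDENCE_RANK = {"IDA": 1, "IC": 2, "ISS": 3}


@dataclass
class Clue:
    go_term: str        # GO identifier (placeholder strings in this sketch)
    aspect: str         # "F" molecular function, "P" biological process, "C" cellular component
    weight: float       # one method's probability, assumed strictly between 0 and 1
    evidence_code: str  # key into EVIDENCE_RANK


@dataclass
class Prediction:
    bayesian_weight: float  # combined probability that the term is correct
    evidence_rank: float    # mean rank of the contributing evidence codes
    clue_count: int         # number of weights that were combined


def combine(clues):
    """Group clues by GO term and compute the three significance numbers."""
    grouped = defaultdict(list)
    for clue in clues:
        # Cellular-component terms are ignored, as described in the text.
        if clue.aspect in ("F", "P"):
            grouped[clue.go_term].append(clue)

    results = {}
    for term, group in grouped.items():
        # Assumed combination rule: independent clues, uniform prior.
        p_true = prod(c.weight for c in group)
        p_false = prod(1.0 - c.weight for c in group)
        ranks = [EVIDENCE_RANK[c.evidence_code] for c in group]
        results[term] = Prediction(
            bayesian_weight=p_true / (p_true + p_false),
            evidence_rank=sum(ranks) / len(ranks),
            clue_count=len(group),
        )
    return results
```

Under these assumptions, two agreeing clues with weights 0.8 and 0.7 reinforce each other and give a combined weight of about 0.90, which mirrors how the repeated acetyltransferase matches in the 2fck example can outweigh isolated red herrings.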
