12.07.2015 Views

View - ResearchGate

Estimating Protein Function Using Protein–Protein Relationships

and a threshold needs to be adopted for discarding false positives and functional linkages that conflict with known biological facts. Unfortunately, it is difficult to devise a cutoff that confidently describes both the profile similarity and the biological validity of the linkages. One way of obtaining a primary cutoff value relies on the use of shuffled profiles; mutual information scores between normal profiles that match or fall below the highest score observed when comparing shuffled profiles can be discarded. Certainly, other techniques can also be imagined, such as using a set of known linkages to derive a true-positive to false-positive ratio, which can then be used as a threshold.

Once a reasonable set of matching profiles is obtained, annotations of the included proteins can be searched for overrepresentation of a particular function. Overrepresented annotations reveal functional links to particular pathways, suggesting a putative role for the query protein, especially if the query protein is uncharacterized. As a test case, the profile of the P. falciparum protein PFB0445c was generated and compared with profiles of all known P. falciparum proteins. The results capture functional links between PFB0445c and other helicases in the parasite genome:

Query PFB0445c (helicase, putative)
0.70 PF10_0309 (hypothetical protein)
0.69 MAL6P1.119 (DEAD/DEAH box ATP-dependent RNA helicase, putative)
0.61 MAL7P1.113 (DEAD box helicase, putative)
0.58 PF14_0436 (helicase, truncated, putative)
0.57 PFE0215w (ATP-dependent helicase, putative)

In this example, mutual information scores in the left column indicate confidence in the functional links; the greater the mutual information values, the greater the confidence in the predicted linkages.
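The shuffled-profile cutoff described above can be illustrated with a minimal sketch. It assumes that profiles are encoded as equal-length lists of discrete values (here, binary presence/absence), which the text does not specify. The function names (mutual_information, shuffled_cutoff, significant_links) and the rounds and seed parameters are hypothetical, and the scores are raw mutual information in bits, not necessarily on the same scale as the values listed in the example.

```python
import math
import random
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    count_ab = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi


def shuffled_cutoff(query, profiles, rounds=100, seed=0):
    """Highest score seen when the query is compared with shuffled profiles.

    Assumption: each candidate profile is permuted independently in every
    round; 'rounds' and 'seed' are illustrative parameters.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(rounds):
        for profile in profiles.values():
            permuted = list(profile)
            rng.shuffle(permuted)
            best = max(best, mutual_information(query, permuted))
    return best


def significant_links(query, profiles, cutoff):
    """Keep only scores strictly above the cutoff, highest first."""
    scored = ((mutual_information(query, p), name) for name, p in profiles.items())
    return sorted((pair for pair in scored if pair[0] > cutoff), reverse=True)
```

Scores that match or fall below the value returned by shuffled_cutoff are dropped by the strict comparison in significant_links, mirroring the rule stated in the text. With very short toy profiles a shuffle can reproduce the original ordering, so the cutoff is only informative for profiles of realistic length.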
Comparison against the Protein families (Pfam) database (http://www.sanger.ac.uk/Software/Pfam/) reveals that the hypothetical protein PF10_0309 included in the results also contains helicase domains, demonstrating that the method captures biologically valid functional links. Profile data used for this example are available for download from the plasmoMAP website (http://cbil.upenn.edu/plasmoMAP/) (8). A score of 0.559, derived from a comparison of permuted profiles, was used as the cutoff in this example.

As described previously, the input query set can be expanded to include the entire protein complement of any given genome. After profiles are constructed for all proteins, an all vs all comparison of profile similarity reveals functional linkages on a local and genome-wide scale. This is highly useful in understanding relationships between genes and, in some cases, can reveal new systems and pathways, especially if a majority of the components involved are of unknown function (5).
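The all vs all comparison can be sketched along the same lines. This is an illustration under stated assumptions, not the procedure used for plasmoMAP: profiles are assumed to be equal-length lists of discrete values keyed by protein identifier, the all_vs_all helper and its cutoff parameter are hypothetical names, and mutual information is computed from entropies as I(A;B) = H(A) + H(B) - H(A,B).

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two equal-length profiles."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def all_vs_all(profiles, cutoff):
    """Score every unordered pair of proteins once; keep pairs above the cutoff."""
    links = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        score = mutual_information(prof_a, prof_b)
        if score > cutoff:
            links.append((name_a, name_b, score))
    # Strongest predicted linkages first
    return sorted(links, key=lambda link: link[2], reverse=True)
```

The number of pairs grows quadratically with the number of proteins, so a genome-scale run of this naive loop may need batching or a vectorized implementation.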
