12.07.2015 Views

View - ResearchGate


Estimating Protein Function Using Protein–Protein Relationships

3.2.2. The Rosetta Stone Method

The Rosetta stone fusion sequences can also be used to identify functional links between proteins. The method entails looking for two or more proteins that appear as a fused protein either in the same genome, or in the genome of some other organism. The presence of a fusion protein indicates that the independent proteins are very likely to share a pathway, or are parts of pathways that are interlinked in some way, or are even likely to physically interact with each other.

3.2.2.1. APPLICATION OF THE METHOD

Initial steps of the protocol for identifying functional links based on fusion proteins are similar to the protocol for generating phylogenetic profiles. The method requires a database of completely sequenced reference genomes, and computer programs that take an input file containing multiple amino acid sequences and compare them against the database. The BLAST results need to be parsed, or generated in a form where attributes such as E-values and start–stop coordinates are retained. Once parsed BLAST output is available, it can be searched for sequences with nonoverlapping regions of similarity for two or more independent proteins. This can be algorithmically described as follows:

For any two proteins X and Y in a genome, identify all proteins (R) from a set of completely sequenced genomes (N), sharing similarities with both X and Y in distinctly different regions, where:

$X_p \neq Y_p \neq R_{ij}$; and

$S(R_{ij}, X_p)_{\mathrm{BEGIN}} > S(R_{ij}, Y_p)_{\mathrm{END}}$ or $S(R_{ij}, Y_p)_{\mathrm{BEGIN}} > S(R_{ij}, X_p)_{\mathrm{END}}$; and

$p \in N$

In this formulation, $S$ represents the region of similarity spanning all identified HSPs between the fusion protein $R_i$ from genome $j$ (contained in $N$), and proteins $X$ and $Y$, from genome $p$, whereas BEGIN and END denote amino acid positions of the similarity span on protein $R_{ij}$.
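The formulation above can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the record layout `Hsp` and the function names `similarity_spans` and `rosetta_stone_links` are hypothetical, and the parsed BLAST output is assumed to be available as a list of such records. The sketch also assumes that protein identifiers are unique across genomes, and it interprets the E-value rule stated in the text (all HSP E-values below 10^-5, span E-value equal to the minimum) as discarding individual HSPs at or above the cutoff.

```python
from collections import defaultdict
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text


class Hsp(NamedTuple):
    """One parsed BLAST HSP (hypothetical record layout)."""
    query: str     # query protein (X or Y) from genome p
    subject: str   # candidate fusion protein R_i
    genome: str    # genome j that contains the subject
    s_start: int   # HSP start position on the subject
    s_end: int     # HSP end position on the subject
    evalue: float


def similarity_spans(hsps):
    """Collapse HSPs into one span S per (subject, genome, query)."""
    spans = {}
    for h in hsps:
        if h.evalue >= E_VALUE_CUTOFF:
            continue  # assumption: HSPs at or above the cutoff are ignored
        key = (h.subject, h.genome, h.query)
        lo, hi = sorted((h.s_start, h.s_end))
        if key in spans:
            begin, end, ev = spans[key]
            spans[key] = (min(begin, lo), max(end, hi), min(ev, h.evalue))
        else:
            spans[key] = (lo, hi, h.evalue)
    return spans


def rosetta_stone_links(hsps):
    """Report pairs X, Y whose spans on the same R do not overlap."""
    by_subject = defaultdict(list)
    for (subject, genome, query), (begin, end, ev) in similarity_spans(hsps).items():
        if query != subject:  # X and Y must differ from R itself
            by_subject[(subject, genome)].append((query, begin, end, ev))

    links = []
    for (subject, genome), hits in sorted(by_subject.items()):
        hits.sort()
        for i, (x, xb, xe, xev) in enumerate(hits):
            for y, yb, ye, yev in hits[i + 1:]:
                # S(R, X)_BEGIN > S(R, Y)_END or S(R, Y)_BEGIN > S(R, X)_END
                if xb > ye or yb > xe:
                    links.append({
                        "x": x, "y": y,
                        "rosetta": subject, "genome": genome,
                        "x_span": (xb, xe), "y_span": (yb, ye),
                        "x_evalue": xev, "y_evalue": yev,
                    })
    return links
```

Each reported link carries the identifiers of both proteins, the candidate Rosetta stone sequence, its genome, and the BEGIN and END positions of the two spans on that sequence. Comparing every pair of queries per subject is quadratic in the number of hits for that subject, which may need a sort-and-sweep refinement for large genomes.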
The E-value assigned to the span is the minimum E-value observed among all HSPs that constitute the match, provided all E-values are lower than $10^{-5}$ (see Note 5).

This algorithm needs to be coded as a computer program by the user, and set to use parsed BLAST results as the input. The output should ideally contain identifiers of the individual proteins and that of the Rosetta stone sequence, the name of the genome in which the fusion sequence was identified, and the BEGIN and END positions associated with the Rosetta stone sequence.

The following is an example of results obtained when the protocol was applied to the genome of P. falciparum. Amino acid sequences of all 5334
