12.07.2015 Views

View - ResearchGate



116 Date

>hsapiens|gi|20093443 >hsapiens|gi|14556780 raw_score: 220 | E-value: 1e-138 | query_start: 1 | query_end: 105 | subject_start: 15 | subject_end: 155 | match_length: 105 | identity_percentage: 78 | similarity_percentage: 91 | query_length: 140 | subject_length: 244
>hsapiens|gi|20093443 >celegans|gi|85444128 raw_score: 132 | E-value: 1e-66 | query_start: 22 | query_end: 80 | subject_start: 107 | subject_end: 165 | match_length: 58 | identity_percentage: 70 | similarity_percentage: 88 | query_length: 140 | subject_length: 111

In this illustration, each line represents a BLAST hit to the query in the database of reference genomes. The output is divided into three columns: the first column is the identifier for the query, the second is the identifier for the hit (the subject) in the database, and the third describes details of the match, such as E-values and start–stop coordinates. Herein, besides the required attributes, raw scores, sequence identities and similarities, and subject sequence length are also captured. Users are free to experiment with the various stand-alone parser programs available for free through the Internet, or write their own (see Note 2). One advantage of writing a custom parser program is that it proves helpful in getting acquainted with the raw BLAST results.

3.2. Steps Specific to Individual Methods

3.2.1. The Phylogenetic Profiling Method

Phylogenetic profiles for each of the query input sequences can be created using the parsed BLAST results. In this protocol, transformed BLAST E-values will be used to construct the profile vector, rather than representing presence or absence of the query in a genome using simple binary values of 0 and 1. This use of BLAST E-values in generating profiles results in profile vectors with a higher resolution, wherein the similarity or the distance between the vectors can be measured more accurately.

3.2.1.1. GENERATING PHYLOGENETIC PROFILES FROM BLAST DATA

Generating profiles involves checking BLAST results for information about best matches to the query sequence, from each genome included in the database. The E-value of this best match is retained, transformed, and used in profile construction. One method of transforming E-values, as described by Pellegrini and coworkers (2), uses the following formulation:
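The best-match step described in this section can be sketched in Python without committing to a particular E-value transformation. The sketch is not part of the protocol. It assumes that the three columns of the parsed output are separated by whitespace, that the third column holds "key: value" pairs separated by " | " as in the illustration above, that the genome name is the text before the first "|" in the subject identifier, and that "best match" means the lowest E-value. The names Hit, parse_hit, best_evalues, and build_profile are hypothetical, and the transformation is left as a caller-supplied function rather than the formulation of Pellegrini and coworkers.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    query: str               # identifier of the query sequence
    subject: str             # identifier of the database hit
    genome: str              # assumed: text before the first "|" of the subject id
    attrs: Dict[str, float]  # match details, e.g. "E-value", "raw_score"


def parse_hit(line: str) -> Hit:
    """Parse one record: query id, subject id, then 'key: value | ...' details."""
    query, subject, details = line.strip().split(None, 2)
    attrs: Dict[str, float] = {}
    for field in details.split("|"):
        key, value = field.split(":")
        attrs[key.strip()] = float(value)
    query, subject = query.lstrip(">"), subject.lstrip(">")
    return Hit(query, subject, subject.split("|")[0], attrs)


def best_evalues(hits: Iterable[Hit]) -> Dict[str, Dict[str, float]]:
    """For each query, keep the lowest E-value observed in each genome."""
    best: Dict[str, Dict[str, float]] = {}
    for hit in hits:
        per_genome = best.setdefault(hit.query, {})
        evalue = hit.attrs["E-value"]
        if evalue < per_genome.get(hit.genome, float("inf")):
            per_genome[hit.genome] = evalue
    return best


def build_profile(best_for_query: Dict[str, float],
                  genomes: List[str],
                  transform: Callable[[float], float],
                  missing: Optional[float] = None) -> List[Optional[float]]:
    """Order transformed best E-values by genome; 'missing' marks genomes with no hit
    (how to score such genomes is an assumption left to the caller)."""
    return [transform(best_for_query[g]) if g in best_for_query else missing
            for g in genomes]


# Two records from the illustration, with the detail column truncated for brevity.
SAMPLE = [
    ">hsapiens|gi|20093443 >hsapiens|gi|14556780 "
    "raw_score: 220 | E-value: 1e-138 | query_start: 1 | query_end: 105",
    ">hsapiens|gi|20093443 >celegans|gi|85444128 "
    "raw_score: 132 | E-value: 1e-66 | query_start: 22 | query_end: 80",
]

best = best_evalues(parse_hit(line) for line in SAMPLE)
genomes = ["hsapiens", "celegans", "scerevisiae"]  # hypothetical genome order
# The identity transform is only a placeholder for the real transformation.
print(build_profile(best["hsapiens|gi|20093443"], genomes, transform=lambda e: e))
```

Passing the transformation in as a function keeps the parsing and best-match logic usable with whichever E-value transformation is chosen. The third genome name in the usage lines is invented only to show how a genome without a hit is represented.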
