12.07.2015 Views

View - ResearchGate


100 Crabtree et al.

3.1.5.1. FILTER BLASTP GSPS

1. For each pair of proteins in the cluster, P1 and P2, find the highest-scoring BLASTP GSPs, or high-scoring segment pairs (HSPs), that align P1 and P2.

3.1.5.2. AVERAGE PERCENT IDENTITY SCORE

1. Calculate the (unweighted) average percent identity of all the high-scoring GSPs from Subheading 3.1.5.1.; this is the cluster's average percent identity score.

3.1.5.3. AVERAGE PERCENT COVERAGE SCORE

1. Retrieve all high-scoring BLASTP HSPs/GSPs from Subheading 3.1.5.1. for a single pair of polypeptides (P1 and P2) in the cluster.
2. Create a list of all the intervals on P1 that are aligned to P2 by an HSP/GSP.
3. Merge (take the union of) any intervals that overlap until no overlaps remain.
4. Sum the lengths of the merged intervals and divide this quantity by the length of P1. The result should be a number between zero and one.
5. Repeat steps 2–4 for P2.
6. Repeat steps 1–5 for all pairs of polypeptides in the cluster.
7. Compute the average of all the values computed in step 4, multiplying by 100 to obtain a percentage value. This is defined to be the cluster's average percent coverage score (see Note 13).

3.2. Protein Cluster Visualization

This section describes how to generate a multiple-genome graphical display like the one that appears in the Sybil protein cluster report page shown in Fig. 2. Each individual genomic sequence or genomic sequence fragment that appears in the display is rendered using the Perl package Bio::Graphics::Panel, which is part of the Bioperl (11) toolkit. The technique allows several Bio::Graphics::Panels to appear in the same image, with additional shaded areas used to indicate which genes belong to the same cluster. Given a cluster identifier, the algorithm proceeds as follows:

1. Retrieve all proteins in the cluster (see Note 4).
2. Retrieve the gene models that correspond to the proteins in step 1 and determine their respective genomic locations.
3. Retrieve all gene models and any other features of interest within a specified vicinity (see Note 14) of the clustered genes (see Note 15).
4. Convert all genomic sequence fragments, gene models, and other sequence features into Bioperl objects (see Note 16).
5. Create a Bio::Graphics::Panel for each of the genomic sequence fragments to appear in the figure (see Note 17). This is done in a top-to-bottom fashion so that the vertical offset of each successive panel can be set so that it does not overlap with the panels above it (see Note 18). Calling the height() method of a panel forces it to compute the layout of all the features contained within it, but without actually drawing any of those features.
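
The top-to-bottom stacking in step 5 of Subheading 3.2. comes down to a running sum of panel heights. The following is a minimal sketch in Python rather than the Perl used by Sybil. The function name stack_panels and the gap parameter are hypothetical, and the plain integers stand in for the values that each panel's height() method would return.

```python
def stack_panels(heights, gap=10):
    """Return (offsets, total_height) for panels stacked top to bottom.

    heights: pixel heights of the panels, in drawing order (stand-ins for
             the values a panel's height() call would report).
    gap:     vertical spacing between consecutive panels (an assumption;
             the source does not specify one).
    """
    offsets = []
    y = 0
    for h in heights:
        offsets.append(y)   # this panel starts below everything placed so far
        y += h + gap        # reserve room for the panel plus the spacing
    total_height = y - gap if heights else 0  # no trailing gap after the last panel
    return offsets, total_height


offsets, total = stack_panels([40, 65, 30], gap=10)
print(offsets, total)
```

Because each offset depends only on the heights of the panels above it, the heights must be known before anything is drawn, which is why the layout is computed first.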
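
The average percent coverage score of Subheading 3.1.5.3. can be sketched in a few lines. This is an illustration in Python, not the Sybil implementation. All function names and the input layout are assumptions: lengths maps a protein identifier to its length, and hits maps a protein pair to a list of aligned interval pairs with 1-based, inclusive coordinates. Only the pairs present in hits contribute to the average; how pairs with no alignment are treated is not stated here, so a pair that should count as zero coverage must be supplied with an empty list.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it (step 3).
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage(intervals, length):
    """Fraction of a protein covered by the union of its intervals (step 4)."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / length


def average_percent_coverage(lengths, hits):
    """Average coverage over both members of every pair, as a percentage."""
    values = []
    for (p1, p2), pairs in hits.items():
        values.append(coverage([a for a, _ in pairs], lengths[p1]))  # steps 2-4
        values.append(coverage([b for _, b in pairs], lengths[p2]))  # step 5
    return 100.0 * sum(values) / len(values)                         # step 7


lengths = {"P1": 100, "P2": 200}
hits = {("P1", "P2"): [((1, 50), (1, 50)), ((40, 80), (101, 140))]}
print(average_percent_coverage(lengths, hits))
```

In this example the two intervals on P1 overlap and merge into 1..80 (coverage 0.8), while the two on P2 stay separate and cover 90 of 200 residues (0.45), so the score is the mean of the two values expressed as a percentage.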
