
Sybil: Multiple Genome Comparison and Visualization

6. Once the individual panels have been initialized, it is possible to determine the overall dimensions of the combined image. This information is used to allocate a drawing area (in the form of a Perl GD::Image object) that is large enough to hold all of the panels, arranged vertically as described.
7. Generate a complete set of “matching gene pairs.” Two genes are considered matching if both have a protein product in the same cluster (JAC or JOC).
8. Transform the genes in the figure into a graph, creating a node for each distinct (panel, gene) pair. An edge is drawn between (pA, geneA) and (pB, geneB) for each matching gene pair (geneA, geneB) identified in step 7 and each pair of panels pA, pB in which geneA and geneB, respectively, appear. Each edge is assigned a weight using the following formula, which depends only on the panels in which geneA and geneB appear (see Note 19):

   edge_weight[(pA, geneA), (pB, geneB)] = [distance(pA, pB)]^2 - 1

   where distance(pA, pB) = (number of panels between pA and pB) + 1.
9. Use any minimum spanning tree (MST) algorithm (18) to select a minimal set of edges from those calculated in step 8 (see Note 20). These edges represent the gene–gene matches that will be drawn in the figure (see Note 21 and Fig. 7).
10. Draw the filtered set of matches into the background of the image (see Note 22), using the boxes() method of Bio::Graphics::Panel to determine the on-screen locations of the matching gene pairs.
11. Draw the individual Bio::Graphics::Panels on top of the previously drawn matches from step 10 (see Note 23).
12. Generate a Portable Network Graphics (PNG) or Joint Photographic Experts Group (JPEG) (see Note 24) image suitable for display on a web page (see Fig. 2) using the standard GD::Image methods.

4. Notes

1. As the protein clustering algorithm uses a bidirectional best hit analysis to compute orthologs, it is important that the respective polypeptide sets be as complete as possible, lest one of the polypeptides not find its true “mate” owing to an incompletely sequenced or annotated genome. The algorithms can be, and have been, used on partial polypeptide sets, but the limitation of such data sets is that they cannot reliably be used to ask questions about the absence of an ortholog for a particular gene or protein.
2. An automated gene prediction algorithm may be used for this purpose. It is not critical that all the gene models are completely accurate; indeed, if a sufficiently similar and well-annotated genome is included in the analysis, then a subsequent comparative analysis of the gene calls can be used to identify many of the omissions and discrepancies. To this end, a comparative “structural annotation tool” that allows curators to examine several genomes at once and tag common annotation discrepancies for later correction has been developed. The annotation tool also allows one to manually add proteins to or remove them from any protein cluster, and to create or delete entire clusters.
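
The match-filtering procedure in steps 7 through 9 of the method can be illustrated with a small sketch. It is written in Python rather than the Perl that Sybil itself uses, and it is not Sybil's implementation. The function names, the `panels` and `cluster_of` inputs, and the gene identifiers are all hypothetical. The sketch assumes that panels are numbered from top to bottom starting at 0, that each gene appears in a single panel, and that edges are only built between genes in different panels (the text does not say how two matching genes in the same panel are handled). Kruskal's algorithm is used here only because step 9 allows any MST algorithm; since the graph may be disconnected, the result is a minimum spanning forest.

```python
from itertools import combinations


def edge_weight(pa, pb):
    """Step 8 weight: distance(pa, pb)**2 - 1, where distance is the number
    of panels between pa and pb plus one (|pa - pb| for distinct indices)."""
    distance = abs(pa - pb)
    return distance ** 2 - 1


def build_graph(panels, cluster_of):
    """Steps 7-8: one node per (panel, gene) pair and one weighted edge per
    matching gene pair, i.e. genes whose products share a cluster."""
    nodes = [(p, gene) for p, genes in enumerate(panels) for gene in genes]
    edges = []
    for (pa, ga), (pb, gb) in combinations(nodes, 2):
        if pa == pb:
            continue  # assumption: same-panel matches are not drawn
        cluster = cluster_of.get(ga)
        if cluster is not None and cluster == cluster_of.get(gb):
            edges.append((edge_weight(pa, pb), (pa, ga), (pb, gb)))
    return nodes, edges


def minimum_spanning_forest(nodes, edges):
    """Step 9: Kruskal's algorithm with a simple union-find structure."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    kept = []
    for weight, a, b in sorted(edges):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components, so keep it
            parent[root_a] = root_b
            kept.append((weight, a, b))
    return kept


# Hypothetical data: three panels; cluster X has a member in every panel,
# cluster Y only in the top and bottom panels.
panels = [["a1", "a2"], ["b1"], ["c1", "c2"]]
cluster_of = {"a1": "X", "b1": "X", "c1": "X", "a2": "Y", "c2": "Y"}

nodes, edges = build_graph(panels, cluster_of)
for weight, a, b in minimum_spanning_forest(nodes, edges):
    print(weight, a, b)
```

In this example the a1 to c1 edge (weight 3) is discarded because a1 and c1 are already connected through b1 by two weight-0 edges between adjacent panels, whereas the a2 to c2 edge is kept because it is the only link between those two genes. This is the effect the weighting is designed to produce: matches are drawn between adjacent panels where possible and skip over a panel only when no intermediate match exists.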
