12.07.2015 Views

View - ResearchGate


Sybil: Multiple Genome Comparison and Visualization

Fig. 1. A six-way genome comparison generated using one of the Sybil tools available at http://www.tigr.org/sybil/rcd. The genes in the reference sequence (bottom row) are color-coded according to their position in the genome. Those in all the other sequences are assigned the color of their orthologs in the reference sequence (or are left uncolored if they have none). Therefore, the figure provides a very high-level view of deletions, insertions, and rearrangements of any number of sequences compared with a fixed reference. For more examples of this type of figure see (29) (Fig. 3) and (30) (Fig. 2).

described in Subheading 3.1. Therefore, in that section, an attempt is made to provide a largely implementation-neutral description of the technique, relegating any comments on specific implementation choices and strategies to Subheading 4. On the other hand, Subheading 3.2. describes a technique that would take significantly longer to implement without using Bioperl, and which might be of general interest in its own right. Therefore, in that section, one pays closer attention to the specific technical details that must be observed in order to interoperate with the Bio::Graphics package.

2. Materials

2.1. Protein Clustering

1. Genome sequences: two or more sequenced genomes, preferably in a finished or nearly finished state (see Note 1).
2. Gene models/predictions: a complete set of gene models for each of the sequenced genomes (see Note 2). At minimum each gene model should consist of a set of protein-coding exon locations, plus the translation start and stop positions if either the 5′- or 3′-exon contains untranslated sequence, i.e., the same information that is typically encoded in a GenBank (Protein-coding sequence) CDS feature.
3. Polypeptides: a polypeptide sequence for each of the protein-coding genes in step 2. If the polypeptide sequences are not specified explicitly then they can be computed from the gene model information supplied in this section.

2.2. Protein Cluster Visualization

1. A set of protein clusters in which no protein is a member of more than one cluster (see Note 3).
2. A database that contains (at least) the protein clusters in addition to the genome sequence data, gene models, and polypeptides for each input genome from Subheading 2.1.1. (see Note 4).
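The requirement in Subheading 2.2 that no protein belong to more than one cluster can be checked before the clusters are loaded into a database. The following is a minimal sketch in Python, not part of Sybil or Bioperl. It assumes the clusters are available as a mapping from cluster identifier to a list of protein identifiers. The function name find_multi_cluster_proteins and all identifiers in the sample data are hypothetical.

```python
from collections import defaultdict


def find_multi_cluster_proteins(clusters):
    """Return {protein_id: [cluster_ids]} for every protein found in more than one cluster.

    `clusters` maps a cluster identifier to an iterable of protein identifiers.
    An empty result means the clusters are disjoint, as Subheading 2.2 requires.
    """
    membership = defaultdict(set)
    for cluster_id, protein_ids in clusters.items():
        for protein_id in protein_ids:
            membership[protein_id].add(cluster_id)
    return {
        protein_id: sorted(cluster_ids)
        for protein_id, cluster_ids in membership.items()
        if len(cluster_ids) > 1
    }


# Hypothetical sample data: gB_0042 is deliberately placed in two clusters.
clusters = {
    "cluster_1": ["gA_0001", "gB_0042"],
    "cluster_2": ["gA_0002", "gC_0310"],
    "cluster_3": ["gB_0042", "gC_0007"],
}

conflicts = find_multi_cluster_proteins(clusters)
for protein_id, cluster_ids in conflicts.items():
    print(f"{protein_id} appears in: {', '.join(cluster_ids)}")
```

If any conflicts are reported, the clustering output would need to be revised (for example by merging or splitting the affected clusters) before it is used as input for visualization. How to resolve such conflicts depends on the clustering method and is not specified here.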
