12.07.2015 Views

View - ResearchGate


Sybil: Multiple Genome Comparison and Visualization

Fig. 7. Using an MST algorithm to remove redundant matches. Protein cluster image before (left) and after (right) applying the MST filter.

polypeptides P1 and P2 such that P1 has 100 amino acids and P2 has only 50, but the two match each other perfectly over the region of the match, and therefore the cluster is assigned a percent identity score of 100% and a percent coverage score of 100%. This is not a completely satisfactory result, so in order to distinguish this case from one in which the polypeptides in a cluster match perfectly and are of the same length, one frequently calculates and stores a third quantity, namely the ratio of the length of the shortest polypeptide in the cluster to the length of the longest polypeptide in the cluster.

14. The default sequence "neighborhood" size is configurable and may also be changed by clicking on the links ("5 kb," "10 kb," and so on) that appear above the graphical display on the protein cluster report page (see Fig. 2). The amount of additional sequence to display is calculated by taking the extent of the longest gene in the cluster and then adding the specified neighborhood size (e.g., 5 kb) on either side of it. For the other (shorter) genes in the cluster, slightly more sequence must be displayed on either side in order to make all of the sequences line up at the left and right edges of the display (assuming that the ends of the contigs are not reached before the edge of the display).

15. The features are retrieved from the chado comparative database using a standard Structured Query Language (SQL; see http://en.wikipedia.org/wiki/SQL) range query on the chado featureloc.fmin and featureloc.fmax columns. This has produced acceptable performance, but in the future one may adopt a binning scheme for faster retrieval of sequence features within a given range, as is done in GBrowse (see http://www.bioperl.org/wiki/Lincoln_Stein and http://www.bioperl.org/wiki/Gbrowse) (23) and the University of California, Santa Cruz (UCSC) Genome Browser (UCSC Genome Bioinformatics Group) (24).

16. The Bioperl features (instances of Bio::SeqFeatureI) that are created are "skeleton" features that contain the coordinates and unique identifiers of the features read from the database. A mapping is stored that allows each Bioperl feature to be mapped back to the data that were read from the database.
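
The range query described in note 15 can be illustrated with a minimal sketch. It is not Sybil's actual query: the table below is a cut-down stand-in for chado's featureloc table (only the feature_id, srcfeature_id, fmin, and fmax columns), an in-memory SQLite database is assumed purely so that the example is self-contained, and the helper names `features_in_range` and `demo_connection` are hypothetical. The overlap test assumes that fmin and fmax delimit a half-open interval.

```python
import sqlite3


def features_in_range(conn, srcfeature_id, start, end):
    """Return (feature_id, fmin, fmax) rows that overlap [start, end).

    A feature overlaps the window when it begins before the window ends
    and ends after the window begins.
    """
    sql = (
        "SELECT feature_id, fmin, fmax FROM featureloc "
        "WHERE srcfeature_id = ? AND fmin < ? AND fmax > ? "
        "ORDER BY fmin"
    )
    return conn.execute(sql, (srcfeature_id, end, start)).fetchall()


def demo_connection():
    """Build a toy in-memory table (hypothetical data, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE featureloc ("
        "feature_id INTEGER, srcfeature_id INTEGER, "
        "fmin INTEGER, fmax INTEGER)"
    )
    conn.executemany(
        "INSERT INTO featureloc VALUES (?, ?, ?, ?)",
        [
            (1, 10, 0, 1000),      # left of the window
            (2, 10, 4000, 6000),   # overlaps the window start
            (3, 10, 9000, 12000),  # overlaps the window end
            (4, 11, 4500, 5500),   # on a different source sequence
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # Features on source sequence 10 overlapping positions 5000..10000.
    print(features_in_range(conn, 10, 5000, 10000))
```

Without an index, or with only single-column indexes on fmin and fmax, a query of this shape may have to examine many rows on a long sequence, which is presumably the motivation for the binning scheme mentioned in the note.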
