29.07.2013

Computational tools and Interoperability in Comparative ... - CBS


Comparative Genomics

the publicly available Vibrio genomes. The output of the command is listed in appendix D.2.

Listing 2.3: Using queryGenomes to obtain genome metadata.

# download client script
wget http://www.cbs.dtu.dk/ws/GenomeAtlas/examples/querygenomes.pl

# download XML::Compile helper script
wget http://www.cbs.dtu.dk/ws/GenomeAtlas/examples/xml-compile.pl

# extract AT-content and number of genes for all vibrio genomes
perl querygenomes.pl -hideMerged -organism vibrio -output ATCONTENT,NGENES

2.2.3 Tools contigsort and contigmap

For some applications in the analysis of unfinished or partially sequenced genomes, it is desirable to obtain approximate coordinates of the contigs within the complete chromosome. The contigsort program was written for this purpose. It accepts any number of entries (contigs) in one FASTA file together with a backbone sequence, given as a single contig in a second FASTA file. The entries of the contig file are then mapped to the backbone sequence using nucleotide BLAST (BLASTN); a contig can only be placed if it has at least one significant hit. The tool then sorts all contigs by the backbone coordinate of the center-point of each alignment. Contigs spanning the origin of circular backbones are automatically split in two.
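As a hedged illustration of the ordering step, the shell sketch below places each contig by the backbone coordinate of the center-point of its best-scoring BLASTN hit. It is not the contigsort implementation: it assumes the alignments are already available in BLAST+ tabular format (-outfmt 6), it does not split contigs at the origin of a circular backbone, and the function name order_contigs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of contig ordering by alignment center-point (not the CBS contigsort code).
# Assumption: BLASTN tabular output (-outfmt 6) with the 12 default columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# Usage (assumes BLAST+ is installed):
#   blastn -query AANK.fsa -subject AL111168.fsa -outfmt 6 > hits.tsv
#   order_contigs hits.tsv

# order_contigs [BLAST_TABLE...]: print "contig<TAB>center" in backbone order
order_contigs() {
  awk -F '\t' '
    # keep only the highest-bitscore hit (column 12) for each contig
    !($1 in best) || $12 + 0 > best[$1] {
      best[$1] = $12 + 0
      # center-point on the backbone; for minus-strand hits sstart > send,
      # but the midpoint is computed the same way
      center[$1] = ($9 + $10) / 2
    }
    END { for (c in center) printf "%s\t%.1f\n", c, center[c] }
  ' "$@" | sort -t "$(printf '\t')" -k2,2n
}
```

Keeping only the highest-scoring hit is a simplifying choice; a contig with several comparable hits (for example one containing a repeated region) would need a more careful placement rule.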

The tool genomemap was written to visualize homology between two genome sequences. Each genome may consist of one or more contigs, and all contigs are aligned using BLASTN. This tool allows users to validate the output of the backbone mapping from contigsort. The resulting plot resembles the one produced by the Artemis Comparison Tool (ACT) (Rutherford et al., 2000); however, genomemap writes a vector graphics file (PostScript) and allows multiple sequence entries within each of the two compared sequences.
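The plotting principle can be sketched in a few lines of shell and awk: both sequences are drawn as horizontal bars, and every BLASTN hit becomes a grey quadrilateral joining the aligned region on one bar to the matching region on the other. This is a hypothetical illustration (homology_ps is not part of genomemap); it assumes BLAST+ tabular output and treats each sequence as a single entry, so it does not reproduce the multiple-entry support described above.

```shell
#!/usr/bin/env bash
# Sketch of an ACT-style homology plot written as PostScript (not the genomemap code).
# Assumptions: one entry per sequence; hits in BLASTN tabular output (-outfmt 6),
# where columns 7-8 are query start/end and columns 9-10 are subject start/end.

# homology_ps QLEN SLEN [BLAST_TABLE...]: print a PostScript page to stdout
homology_ps() {
  local qlen=$1 slen=$2
  shift 2
  awk -F '\t' -v qlen="$qlen" -v slen="$slen" '
    BEGIN {
      x0 = 50; width = 500              # left margin and bar width in points
      print "%!PS"
      print "0.6 setgray"               # hits are filled in grey
    }
    {
      # scale query (top bar, y=400) and subject (bottom bar, y=200) coordinates
      q1 = x0 + $7 / qlen * width;  q2 = x0 + $8 / qlen * width
      s1 = x0 + $9 / slen * width;  s2 = x0 + $10 / slen * width
      # quadrilateral joining the two aligned segments; for a minus-strand hit
      # (sstart > send) two edges cross, which marks the inversion
      printf "newpath %.2f 400 moveto %.2f 400 lineto %.2f 200 lineto %.2f 200 lineto closepath fill\n", q1, q2, s2, s1
    }
    END {
      # draw both sequence bars in black on top of the hits
      print "0 setgray"
      print x0 " 400 moveto " width " 0 rlineto stroke"
      print x0 " 200 moveto " width " 0 rlineto stroke"
      print "showpage"
    }
  ' "$@"
}
```

Writing PostScript directly keeps the output a scalable vector graphic, which matches the motivation given for genomemap over a screen-based viewer.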

Example: Campylobacter jejuni str. 260.94<br />

The 10 contigs of the currently unpublished sequence of Campylobacter jejuni str. 260.94 (GenBank accession nos. AANK01000001-AANK01000010) were downloaded and converted into a FASTA file with saco_convert, an in-house program at CBS that converts between different sequence formats. In this example, Campylobacter jejuni str. NCTC 11168 (Parkhill et al., 2000) is used as the backbone (see listing 2.4).

Listing 2.4: Using contigsort to map assembled contigs to a backbone.

# add the contigsort and fetchgbk script directories to the path (csh/tcsh syntax)
set path = (~pfh/scripts/contigsort ~pfh/scripts/fetchgbk $path)
# fetch the ten C. jejuni 260.94 contigs and convert them to FASTA
fetchgbk -a AANK01000001-AANK01000010 > AANK.gbk
saco_convert -I genbank -O fasta AANK.gbk > AANK.fsa
# fetch the C. jejuni NCTC 11168 backbone (AL111168) and convert it to FASTA
fetchgbk -a AL111168 > AL111168.gbk
saco_convert -I genbank -O fasta AL111168.gbk > AL111168.fsa
# map the contigs to the backbone and write them in backbone order
contigsort -c -i AANK.fsa -b AL111168.fsa > mapped.fsa
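Since saco_convert is an in-house CBS program, readers without access to it need another way to perform the GenBank-to-FASTA steps of the listing. The shell/awk sketch below is a hypothetical substitute, not saco_convert itself: it assumes well-formed GenBank flat files, names each FASTA entry after the LOCUS name, and the function name gbk2fasta is illustrative.

```shell
#!/usr/bin/env bash
# Minimal GenBank-to-FASTA converter, a hypothetical stand-in for saco_convert.
# Assumptions: standard flat-file layout; the record name is taken from the
# LOCUS line and the sequence lines lie between ORIGIN and the "//" terminator.
# Usage: gbk2fasta AANK.gbk > AANK.fsa

# gbk2fasta [FILE...]: write every GenBank record as one FASTA entry to stdout
gbk2fasta() {
  awk '
    /^LOCUS/  { name = $2 }                         # record name, e.g. the accession
    /^ORIGIN/ { inseq = 1; printf ">%s\n", name; next }
    /^\/\//   { inseq = 0; next }                   # "//" ends the record
    inseq {
      # sequence lines: a position number followed by blocks of 10 bases
      seq = ""
      for (i = 2; i <= NF; i++) seq = seq $i
      print toupper(seq)
    }
  ' "$@"
}
```

Because each GenBank sequence line holds at most 60 bases, the FASTA output keeps the same 60-column line width without any extra wrapping logic.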

To visualize the result of the contig mapping, the mapped and unmapped contigs were processed by contigmap. The output of the comparison is a PostScript document (figure 2.1 and listing 2.5).
