
Computational tools and Interoperability in Comparative ... - CBS


74 Tools for Comparison of Bacterial Genomes

6 Codon Usage Comparisons

Once the genes of a given genome have been defined, their codon usage can be analyzed. Since the genetic code is redundant, with up to 6 codons per amino acid, synonymous codons are used at different frequencies. Much of the redundancy in the genetic code is due to variation at the third base. Figure 4 displays the codon usage for three prokaryotic genomes: Methanosphaera stadtmanae (27.6% GC), an archaeal methanogen that uses methanol and hydrogen to produce methane; Desulfitobacterium hafniense (47.4% GC), a Firmicute that efficiently dehalogenates tetrachloroethene and polychloroethanes; and Anaeromyxobacter dehalogenans (75% GC). This species, the first myxobacterium to be grown as a pure culture, can use ortho-substituted mono- and dichlorinated phenols. The frequency of each possible codon is plotted in a wheel plot in the upper part of the figure, arranged so that all codons within a quarter share the same third base. The codon usage bias at the third position can also be seen in the sequence logo plots in the lower part of Fig. 4. Both graphics make it evident that genomic GC content strongly affects codon usage (or vice versa). Based on a genome's codon usage bias, it is possible to predict its likely environmental niche (Willenbrock et al., 2006). Moreover, analyses of metagenomic samples have shown that amino acid usage (not shown here) also depends on the environment (Musto et al., 2006, Foerstner et al., 2005).
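The quantities behind a plot like Fig. 4 can be approximated from a set of coding sequences. The sketch below is a minimal illustration, not the tool used to draw the figure: it assumes the genes are already available as in-frame DNA strings, and the function names (`codon_counts`, `gc_by_codon_position`, `positional_information`) are hypothetical. It counts all 64 codons (stop codons included), computes the G+C fraction at each codon position (GC3 being the third-position value discussed above), and scores each codon position as a sequence logo would, as 2 bits minus the Shannon entropy of the base distribution, without any small-sample correction.

```python
from collections import Counter
from itertools import product
import math

BASES = "TCAG"
ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # all 64 codons


def codon_counts(cds_list):
    """Count codons across coding sequences, reading in frame from position 0."""
    counts = Counter({codon: 0 for codon in ALL_CODONS})
    for cds in cds_list:
        seq = cds.upper()
        # Ignore a trailing partial codon if the length is not a multiple of 3.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # skip codons containing N or other ambiguity codes
                counts[codon] += 1
    return counts


def codon_frequencies(counts):
    """Relative frequency of each codon (the values a codon wheel would plot)."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def gc_by_codon_position(counts):
    """G+C fraction at codon positions 1, 2 and 3 (often called GC1, GC2, GC3)."""
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    gc = [0, 0, 0]
    for codon, n in counts.items():
        for pos, base in enumerate(codon):
            if base in "GC":
                gc[pos] += n
    return [g / total for g in gc]


def positional_information(counts):
    """Information content in bits per codon position: 2 - Shannon entropy."""
    info = []
    for pos in range(3):
        base_counts = Counter()
        for codon, n in counts.items():
            base_counts[codon[pos]] += n
        total = sum(base_counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in base_counts.values() if c)
        info.append(2.0 - entropy)
    return info


if __name__ == "__main__":
    genes = ["ATGGCCGCGTAA", "ATGGCGGCCTGA"]  # toy sequences, not real genes
    counts = codon_counts(genes)
    print(gc_by_codon_position(counts))  # [0.5, 0.625, 0.75]
    print([round(x, 3) for x in positional_information(counts)])  # [0.5, 0.25, 0.5]
```

With real genomes, the GC3 value tracks overall genomic GC content much more closely than GC1 or GC2, which is the pattern the wheel and logo plots make visible.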

7 Protein Sequence Comparisons

Each individual gene in a given genome can be compared by BLAST against a set of genomes. This produces a huge amount of data that can be represented graphically in a BLAST Matrix (Binnewies et al., 2005, Ussery et al., 2009). A BLAST Matrix is not symmetrical, because the outcome depends on which genome is used as the query. The diagonal of a BLAST Matrix represents a BLAST of a genome against itself. The self-match (each gene finding itself) is discarded, so the reported scores reflect internal homologues within a given genome. Most of these arose through gene duplication and are thus paralogs.
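The asymmetry and the treatment of the diagonal can be illustrated with a small sketch. It assumes all-against-all BLAST hits have already been parsed into records; the `Hit` fields, the function names and the cut-off (at least 50% identity over at least 50% of the longer protein) are assumptions for illustration, and the published BLAST Matrix may count and normalize hits differently. For each ordered pair of genomes (A, B) it reports the fraction of A's proteins with at least one qualifying hit in B, so the denominator depends on the query genome, and on the diagonal each protein's match to itself is skipped so that only internal homologues remain.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record layout)."""
    query_genome: str
    query_protein: str
    subject_genome: str
    subject_protein: str
    identity: float      # percent identity of the alignment
    aln_length: int      # alignment length in residues
    query_length: int
    subject_length: int


def is_homolog(hit: Hit, min_identity: float = 50.0, min_coverage: float = 0.5) -> bool:
    """Assumed cut-off: identity and alignment length relative to the longer protein."""
    longer = max(hit.query_length, hit.subject_length)
    return hit.identity >= min_identity and hit.aln_length >= min_coverage * longer


def blast_matrix(proteins: Dict[str, Set[str]],
                 hits: Iterable[Hit]) -> Dict[Tuple[str, str], float]:
    """matrix[(A, B)]: fraction of A's proteins with a homolog in B (A is the query)."""
    found: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for hit in hits:
        if hit.query_genome == hit.subject_genome and hit.query_protein == hit.subject_protein:
            continue  # discard the self-match (a protein finding itself)
        if is_homolog(hit):
            found[(hit.query_genome, hit.subject_genome)].add(hit.query_protein)
    genomes = sorted(proteins)
    return {
        (a, b): len(found[(a, b)]) / len(proteins[a]) if proteins[a] else 0.0
        for a in genomes for b in genomes
    }


if __name__ == "__main__":
    proteins = {"A": {"a1", "a2"}, "B": {"b1", "b2", "b3", "b4"}}
    hits = [
        Hit("A", "a1", "B", "b1", 80.0, 100, 100, 100),
        Hit("B", "b1", "A", "a1", 80.0, 100, 100, 100),
        Hit("A", "a1", "A", "a1", 100.0, 100, 100, 100),  # self-match, ignored
        Hit("A", "a1", "A", "a2", 60.0, 90, 100, 120),    # paralog within A
    ]
    m = blast_matrix(proteins, hits)
    print(m[("A", "B")], m[("B", "A")])  # 0.5 0.25
```

The same reciprocal hit (a1 with b1) gives 0.5 when A is the query but 0.25 when B is the query, simply because B has twice as many proteins.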

When more information needs to be visualized, a BLAST Atlas is helpful. Such an atlas uses one genome as a reference against which the gene conservation of other genomes is plotted (Hallin and Ussery, 2004, Skovgaard et al., 2002). In this case, gene locations refer only to positions in the reference genome; the reference can of course be varied across multiple BLAST Atlases.
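Because every lane is drawn in the coordinates of the reference genome, building an atlas amounts to projecting per-gene conservation scores onto reference positions. The sketch below shows that projection only; drawing the circular plot is left out. The function name `atlas_lanes`, the fixed window size and the assumption that each score is already normalized to the range 0 to 1 (for example, best-hit identity divided by 100) are illustrative choices, not the published implementation. It also assumes 1-based gene coordinates with start not greater than end and within the genome, so genes spanning the origin are not handled.

```python
from typing import Dict, List, Tuple


def atlas_lanes(
    genome_length: int,
    ref_genes: Dict[str, Tuple[int, int]],   # reference gene id -> (start, end), 1-based
    best_hits: Dict[str, Dict[str, float]],  # lane (compared genome) -> {ref gene id -> score}
    window: int = 1000,
) -> Dict[str, List[float]]:
    """Project per-gene conservation scores onto fixed windows of the reference.

    Each lane becomes a list of window values; a window takes the highest score of
    any reference gene overlapping it, and 0.0 where no overlapping gene has a hit."""
    n_windows = (genome_length + window - 1) // window
    lanes: Dict[str, List[float]] = {}
    for lane, scores in best_hits.items():
        values = [0.0] * n_windows
        for gene, (start, end) in ref_genes.items():
            score = scores.get(gene, 0.0)  # genes without a hit stay uncolored
            first, last = (start - 1) // window, (end - 1) // window
            for w in range(first, last + 1):
                values[w] = max(values[w], score)
        lanes[lane] = values
    return lanes


if __name__ == "__main__":
    genes = {"g1": (1, 900), "g2": (950, 1500), "g3": (2100, 2900)}
    hits = {"genomeX": {"g1": 0.9, "g3": 0.4}, "genomeY": {"g2": 0.7}}
    print(atlas_lanes(3000, genes, hits))
    # {'genomeX': [0.9, 0.0, 0.4], 'genomeY': [0.7, 0.7, 0.0]}
```

Switching to a different reference genome only changes `ref_genes` and the hits computed against it, which is why several atlases with different references can be drawn from the same set of genomes.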

A BLAST Atlas is also a suitable platform for visualizing metagenomic data. So far, we have not dealt with metagenomics extensively, mainly because this approach very rarely yields completely assembled microbial genomes. For a BLAST Atlas, however, that is not a problem: all the metagenomic DNA can be combined in one lane, disregarding which organism the detected genes originated from. All BLAST hits obtained are plotted around a reference genome. An example of a BLAST Atlas is given in Fig. 5, centered on Pelotomaculum thermopropionicum, a thermophilic, syntrophic Firmicute that can utilize 1-butanol, 1-propanol, 1-pentanol or 1,3-propanediol as a carbon source. Note that despite the high number of lanes, conserved and variable genes can still be easily distinguished by eye.
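Combining all metagenomic DNA in one lane means that the source fragment, and therefore the unknown source organism, is simply dropped before the hits are mapped onto the reference. A minimal sketch, assuming hits have been reduced to (fragment id, reference gene id, score) triples with scores normalized to the range 0 to 1; the function name `pooled_metagenome_lane` and the rule of keeping the best score per reference gene are illustrative assumptions, since the text only states that all metagenomic hits share one lane.

```python
from typing import Dict, Iterable, Tuple


def pooled_metagenome_lane(
    hits: Iterable[Tuple[str, str, float]],  # (metagenomic fragment id, ref gene id, score)
) -> Dict[str, float]:
    """Collapse hits from all metagenomic fragments into a single atlas lane.

    The fragment id is deliberately ignored: each reference gene keeps only its
    best score from any fragment; genes absent from the result have no hit."""
    lane: Dict[str, float] = {}
    for _fragment, gene, score in hits:
        if score > lane.get(gene, 0.0):
            lane[gene] = score
    return lane


if __name__ == "__main__":
    sample = [
        ("read1", "g1", 0.3),
        ("read2", "g1", 0.8),
        ("contig7", "g3", 0.5),
        ("read9", "g1", 0.6),
    ]
    print(pooled_metagenome_lane(sample))  # {'g1': 0.8, 'g3': 0.5}
```

The resulting gene-to-score mapping has the same shape as the per-genome scores of an ordinary lane, so unassembled reads or contigs can be shown alongside complete genomes.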

Having compacted a single genome into a Genome Atlas, we have now moved several levels up and compacted multiple genomes into a single atlas. In Fig. 5, the P. thermopropionicum genome is compared to many species of Clostridia, as well as to other bacteria. Unfortunately, very few BLAST hits were found in the metagenomic samples, so there is very little color in those three lanes. Compared to well-characterized genomes (like E. coli), relatively few hits are
