Computational tools and Interoperability in Comparative ... - CBS
Tools for Comparison of Bacterial Genomes
6 Codon Usage Comparisons
Once the genes of a given genome have been defined, their codon usage can be analyzed. Since the genetic code is redundant, with up to six codons per amino acid, synonymous codons are used at different frequencies. Most of the redundancy in the genetic code lies in the third codon position. Figure 4 displays the codon usage for three prokaryotic genomes: Methanosphaera
stadtmanae (27.6% GC), an archaeal methanogen that uses methanol and hydrogen to produce methane; Desulfitobacterium hafniense (47.4% GC), a Firmicute that efficiently dehalogenates tetrachloroethene and polychloroethanes; and Anaeromyxobacter dehalogenans (75% GC). This species, the first myxobacterium to be grown in pure culture, can use ortho-substituted mono- and dichlorinated phenols. The frequency of each possible codon is plotted
in a wheel plot in the upper part of the figure, arranged so that all codons within a given quarter share the same third base. The bias in codon usage at the third position can also be seen in the sequence logo plots in the lower part of Fig. 4. Both graphics show that genomic GC content strongly affects codon usage (or vice versa). Based on a genome's codon usage bias, it is possible to predict its likely environmental niche (Willenbrock et al., 2006).
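The two quantities discussed above, codon frequencies and the GC content at each codon position, can be computed directly from a coding sequence. The following Python sketch is illustrative only and is not the tool used to draw Fig. 4; it assumes a single in-frame coding sequence and skips codons containing ambiguous bases. The function names are hypothetical.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]  # skip ambiguous codons such as NNN
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def gc_by_codon_position(cds: str) -> tuple[float, float, float]:
    """GC fraction at codon positions 1, 2 and 3 (often called GC1, GC2, GC3)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)
```

For a genome-wide view like Fig. 4, codon counts would be summed over all annotated genes before normalizing. One would typically expect GC3 to lie above the overall GC content in a GC-rich genome and below it in an AT-rich one, which is the third-position bias visible in the figure.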
Moreover, analyses of metagenomic samples have shown that amino acid usage (not shown here) depends on the environment (Musto et al., 2006; Foerstner et al., 2005).
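Amino acid usage can be derived from the same coding sequences by translating each codon. A minimal sketch follows, assuming the standard genetic code (NCBI translation tables 1 and 11 assign the same amino acids and differ only in their start codons) and in-frame input; it is not the method used in the cited metagenomic studies. The degeneracy count at the end also confirms the "up to six codons per amino acid" stated earlier.

```python
from collections import Counter
from itertools import product

# Standard genetic code. Codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
# with the first base varying slowest, matching the order of AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def amino_acid_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each amino acid encoded by one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    residues = []
    for i in range(0, usable, 3):
        aa = CODON_TABLE.get(cds[i:i + 3])
        if aa is not None and aa != "*":  # skip stop codons and codons with ambiguous bases
            residues.append(aa)
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Degeneracy of the code: number of codons per amino acid (Leu, Ser and Arg have six).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
```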
7 Protein Sequence Comparisons
One can compare each individual gene in a given genome against a set of genomes using BLAST. This produces a large amount of data that can be represented graphically in a BLAST Matrix (Binnewies et al., 2005; Ussery et al., 2009). A BLAST Matrix is not symmetrical, because the outcome depends on which genome is used as the query. The diagonal of a BLAST Matrix represents a BLAST of a genome against itself. Because the self-match (each gene finding itself) is discarded, the reported scores reflect internal homologues present in that genome. Most of these have arisen from gene duplication and are therefore paralogs.
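How BLAST Matrix cells might be derived from all-against-all BLAST results is sketched below. This is a sketch under stated assumptions, not the published implementation: it assumes the BLAST output has already been parsed into simple `Hit` records (a hypothetical structure), and the 50% identity and 50% query-coverage cutoffs are illustrative defaults, not necessarily the criteria used by Binnewies et al. (2005). Each cell is the fraction of the query genome's genes with at least one qualifying hit in the subject genome. Because the normalization uses the query genome, cell (A, B) generally differs from cell (B, A). Discarding self-matches leaves only paralogs on the diagonal.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit (hypothetical record layout)."""
    query_genome: str
    query_gene: str
    subject_genome: str
    subject_gene: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the query length covered by the alignment


def blast_matrix(
    hits: Iterable[Hit],
    gene_counts: dict[str, int],
    min_identity: float = 50.0,  # assumed cutoff, for illustration only
    min_coverage: float = 50.0,  # assumed cutoff, for illustration only
) -> dict[tuple[str, str], float]:
    """Return {(query_genome, subject_genome): fraction of query genes with a homolog}."""
    found: dict[tuple[str, str], set[str]] = {}
    for h in hits:
        # Discard the self-match: every gene trivially finds itself in its own genome.
        if h.query_genome == h.subject_genome and h.query_gene == h.subject_gene:
            continue
        if h.identity >= min_identity and h.coverage >= min_coverage:
            found.setdefault((h.query_genome, h.subject_genome), set()).add(h.query_gene)
    return {
        (q, s): len(found.get((q, s), ())) / gene_counts[q]
        for q in gene_counts
        for s in gene_counts
    }
```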
When more information needs to be visualized, a BLAST Atlas is helpful. Such an atlas uses one genome as a reference against which the gene conservation of other genomes is plotted (Hallin and Ussery, 2004; Skovgaard et al., 2002). Here, gene locations refer only to positions in the reference genome; the choice of reference can, of course, be varied across multiple BLAST Atlases.
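The defining property of a BLAST Atlas, that every lane is indexed by positions in the reference genome, can be illustrated with a small Python sketch. It assumes hypothetical inputs: a list of reference genes with start coordinates, and, for each compared genome, the best percent identity its BLAST hits achieve against each reference gene. Genes without a hit receive a score of 0, which would be drawn as an empty segment in that lane.

```python
def atlas_lanes(
    reference_genes: list[tuple[str, int]],
    genome_length: int,
    best_identity: dict[str, dict[str, float]],
) -> dict[str, list[tuple[float, float]]]:
    """Place per-gene conservation scores around a circular reference genome.

    reference_genes: (gene_id, start coordinate) for every gene in the reference.
    best_identity:   {compared genome: {reference gene_id: best % identity}}.
    Every lane is indexed by reference-genome positions only.
    """
    ordered = sorted(reference_genes, key=lambda gene: gene[1])
    lanes = {}
    for genome, scores in best_identity.items():
        lane = []
        for gene_id, start in ordered:
            angle = 360.0 * start / genome_length  # position on the circle, in degrees
            lane.append((angle, scores.get(gene_id, 0.0)))  # 0.0 = no hit (blank segment)
        lanes[genome] = lane
    return lanes
```

Choosing a different reference genome means rebuilding every lane from that genome's gene list, which is why several BLAST Atlases with different references give complementary views.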
A BLAST Atlas is also a suitable platform for visualizing metagenomic data. So far, we have not dealt with metagenomics extensively, mainly because this approach very rarely yields completely assembled microbial genomes. For a BLAST Atlas, however, that is not a problem, because all the metagenomic DNA can be combined in one lane, ignoring which organism each detected gene came from. All BLAST hits obtained are plotted around a reference genome. An example of a BLAST Atlas is given in Fig. 5, centered on Pelotomaculum thermopropionicum, a thermophilic, syntrophic Firmicute that can use 1-butanol, 1-propanol, 1-pentanol or 1,3-propanediol as a carbon source. Note that despite the large number of lanes, conserved and variable genes can still be easily identified by visual inspection.
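Pooling a metagenomic sample into one lane can be sketched as follows. The sketch assumes, hypothetically, that the reads of one sample have already been searched against the reference genome's genes and reduced to (read, reference gene, percent identity) triples. The organism of origin is never consulted: all reads are treated as one pooled "genome", and only the best hit per reference gene is kept.

```python
from collections.abc import Iterable


def metagenome_lane(read_hits: Iterable[tuple[str, str, float]]) -> dict[str, float]:
    """Collapse BLAST hits from all reads of one metagenomic sample into a single lane.

    read_hits: (read_id, reference_gene_id, percent_identity) triples.
    Which organism a read came from is deliberately ignored.
    """
    lane: dict[str, float] = {}
    for _read_id, gene_id, identity in read_hits:
        # Keep only the strongest hit seen so far for each reference gene.
        if identity > lane.get(gene_id, 0.0):
            lane[gene_id] = identity
    return lane
```

The result maps reference genes to scores, the same shape as a per-genome set of best hits, so each sample can be drawn as one more lane around the reference.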
Having compacted a single genome into a Genome Atlas, we have now moved several levels up and compacted multiple genomes into a single atlas. In Fig. 5, the P. thermopropionicum genome is compared to many species of Clostridia, as well as to other bacteria. Unfortunately, very few BLAST hits were found in the metagenomic samples, so there is very little color in those three lanes. Compared to well-characterized genomes (such as E. coli), relatively few hits are