29.07.2013 Views

Computational tools and Interoperability in Comparative ... - CBS


Tools for Comparison of Bacterial Genomes 74

base composition in a Base Atlas). Such specialized atlases are explained in detail in a book that we recently produced (Ussery et al., 2008).

As can be seen in Fig. 2, the genes in this chromosome strongly favor one strand: the positive strand for the first (right) half and the negative strand for the second (left) half of the chromosome. In each half, the favored strand is the leading strand during replication. Replication starts at the origin (the 12 o’clock position here) and proceeds in both directions along the circle, with both a leading and a lagging strand, until the bubble reaches the terminus, at 6 o’clock, and the ends are combined. The positive strand represented by a genome sequence is the leading strand, but only for the first half, up to the terminus. Reading across the terminus along the sequence on the same strand, one enters the lagging strand. Gene preference for the leading strand is a general feature of Firmicutes and of some other bacteria.
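The relationship between sequence strand and replication strand described above can be illustrated with a minimal Python sketch. It assumes a circular chromosome with a single origin and terminus whose coordinates are known; the function name `on_leading_strand` and all of its parameters are hypothetical and do not come from any atlas tool.

```python
def on_leading_strand(gene_start, strand, origin, terminus, genome_length):
    """Return True if a gene lies on the leading strand.

    Assumption: reading the deposited sequence from the origin towards the
    terminus, the positive strand is the leading strand; past the terminus,
    the negative strand is the leading strand.
    """
    # Distance from the origin, measured along the deposited sequence.
    offset = (gene_start - origin) % genome_length
    terminus_offset = (terminus - origin) % genome_length
    in_first_half = offset < terminus_offset
    return (strand == "+") == in_first_half


# Hypothetical 1,000 bp circle with the origin at 0 and the terminus at 500.
print(on_leading_strand(100, "+", 0, 500, 1000))  # first half, positive strand
print(on_leading_strand(700, "-", 0, 500, 1000))  # second half, negative strand
```

Counting the fraction of genes for which this returns True gives a simple measure of the strand preference that the atlas shows visually.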

In Fig. 2 the two outward lanes identify some regions with strong structural properties (for instance the region around 2 o’clock, indicated by a black line). The observed strong curvature (blue in the outward lane) where the DNA would easily melt (red in the second lane) suggests this region contains genes that are highly expressed.

There are a number of global repeats, notably in the first quarter of the chromosome. Note that the ribosomal RNA genes (light blue in the annotation lane) are located here, as indicated by the arrows, and these are picked up as global repeats, as indeed they are repeated genes.

The GC skew lane shows the bias of G’s towards one strand or the other, averaged over a 10,000 bp window. In contrast to many Firmicutes with a strong GC skew, this genome has only a weak GC skew (the right half is light blue and the left half is light pink). The innermost circle colors the local AT content where it is more than three standard deviations from the global average. Note a light red color around the 2 o’clock region: this local deviation in AT content is related to the structural features located here.
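A minimal Python sketch of the two innermost lanes follows. It assumes the common definition of GC skew, (G − C)/(G + C), computed over non-overlapping windows, and it takes the standard deviation across those windows; the text does not specify either detail, and the actual atlas software may use sliding windows or a different normalization. The function names are hypothetical.

```python
from statistics import pstdev


def windows(seq, size):
    """Yield consecutive, non-overlapping windows of the sequence."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def gc_skew(seq, size=10_000):
    """Return (G - C) / (G + C) for each window (assumed definition)."""
    skews = []
    for chunk in windows(seq.upper(), size):
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def at_outliers(seq, size=10_000, n_sd=3.0):
    """Return indices of windows whose AT fraction lies more than n_sd
    standard deviations from the global AT fraction of the sequence."""
    seq = seq.upper()
    if not seq:
        return []
    fractions = [(w.count("A") + w.count("T")) / len(w)
                 for w in windows(seq, size)]
    global_at = (seq.count("A") + seq.count("T")) / len(seq)
    sd = pstdev(fractions)  # assumption: spread measured across windows
    if sd == 0:
        return []
    return [i for i, f in enumerate(fractions)
            if abs(f - global_at) > n_sd * sd]


# Toy example with 4 bp windows: a G-rich window, then a C-rich window.
print(gc_skew("GGGCCCCG", size=4))
```

On a real chromosome, a sign change in the windowed skew values is what the atlas renders as the switch between the blue and pink halves.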

The Genome Atlas of the archaeon Methanosarcina acetivorans, shown in Fig. 3, tells a different story. This strictly anaerobic organism produces methane so efficiently that it is held responsible for virtually all biogenic methane. It can also oxidize CO to CO2 (Lessner et al., 2006). Strain C2A (the type strain of the species) was isolated from a marine sediment (Galagan et al., 2002). Its genome is 5.7 Mbp long and contains 42.7% GC. The Genome Atlas shows that its genes are evenly distributed over the two strands, and a GC skew is absent. Instead, the lower quarter of the genome contains many strong structural features. The genome contains only three rRNA gene copies (indicated by arrows), one of which is located on the negative strand (but as discussed above, this is actually the leading strand, as is preferred for nearly all bacterial rRNA genes). Many other global repeats are visible, notably in the region around 1.2 Mbp, which is strongly curved and easily melted, and is slightly more AT rich than the rest of the genome. Here, the important carbon-monoxide dehydrogenase gene locus is present, as are multiple transposases, which could be an indication of horizontally acquired DNA. The genome is relatively poorly annotated, with many genes given as “predicted protein” only, which is not uncommon for archaeal genomes.

In conclusion, a Genome Atlas combines a number of features in a single figure that summarizes a very long and detailed story about a chromosome or plasmid.

4 Whole Genome Alignment Methods


Another way to compare genomes is based on alignment of nucleotide or amino acid sequences. Sequence alignment is a common tool to identify similarities, with BLAST, for
