29.07.2013 Views

Computational tools and Interoperability in Comparative ... - CBS


... replication tends to be more GC-rich, and the region around the replication terminus is usually more AT-rich. AT-rich sequences melt more easily than GC-rich sequences, due in part to the extra hydrogen bond present in a GC base pair. Counterintuitively, this would make the origin of replication the region least likely to start replication. However, within the “large region” around the origin, which covers approximately 5% of the chromosome, there is a short stretch of more AT-rich base pairs where the replication origin bubble opens up. Second, zooming in at the level of genes, the average GC content of intergenic regions is generally lower than that of coding sequences. These regions melt more readily and are more curved and more rigid than the chromosomal average, which enables gene expression (Pedersen et al., 2000; Ussery and Hallin, 2004). This is true for nearly all of the bacterial genomes sequenced, regardless of GC content. To calculate relative or local %GC, a window has to be defined (say, 100 base pairs) for which the %GC is calculated. This window is then moved along the genome in single-nucleotide steps, and each %GC score is assigned to the middle position of its window. These scores can then be represented graphically. A web-based tool for this is available at the Genome Atlas website (footnote 2), in which local %GC is visualized by color codes, as discussed below.
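The sliding-window calculation of local %GC can be sketched in a few lines of Python. This is a minimal illustration, not the Genome Atlas implementation: the function name `local_gc_percent`, the treatment of the sequence as linear (no wrap-around for a circular chromosome), and the 0-based coordinate of the window middle are all assumptions made for the example.

```python
def local_gc_percent(sequence, window=100):
    """Return a list of (middle_position, percent_gc) tuples.

    The window is moved along the sequence one nucleotide at a time and
    each score is assigned to the middle of its window (0-based).
    Assumption: the sequence is treated as linear, so the first and last
    window // 2 positions receive no score.
    """
    seq = sequence.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")

    half = window // 2
    # Count G and C in the first window once.
    gc = sum(base in "GC" for base in seq[:window])
    scores = [(half, 100.0 * gc / window)]

    for start in range(1, len(seq) - window + 1):
        # Update the count incrementally: one base leaves, one base enters.
        gc -= seq[start - 1] in "GC"
        gc += seq[start + window - 1] in "GC"
        scores.append((start + half, 100.0 * gc / window))
    return scores
```

For the toy sequence `GGCCAATT` and a window of 4, the five windows score 100, 75, 50, 25 and 0 %GC. The incremental update keeps the cost linear in the sequence length, which matters for a chromosome of several Mbp.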

3 Visualization of Genomic Data: The Genome Atlas

Genome atlases are circular plots of chromosomes or plasmids (a linear version is available when applicable) on which general properties of the DNA molecule are plotted as colors. Genome atlases are available from our web server (footnote 2) for many of the currently sequenced bacterial genomes. Figure 2 shows a Genome Atlas for the chromosome of Geobacillus kaustophilus strain HTA426 (a thermophilic Firmicute that also contains a plasmid of 4.8 kb). This isolate was obtained from a deep-sea sediment of the Mariana Trench in the Pacific Ocean (Takami et al., 2004a, b). Its genome is 3.5 Mbp long and contains 52.1% GC. G. kaustophilus has been suggested as a possible solution to paraffin deposition problems in oil production (Sood and Lal, 2008). A Genome Atlas maps four different aspects of the chromosomal DNA sequence in various lanes in a standard manner: DNA structural features are represented in the three outer lanes, all coding sequences are indicated in the next lane, two kinds of repeats are mapped in the next two lanes, and base composition properties are plotted in the two innermost lanes (Jensen et al., 1999). The scale in the center corresponds to the sequence numbering in GenBank. The DNA structural features of the three outermost circles are based on the physicochemical properties of the DNA helix. Annotated protein-coding genes are shown in blue when oriented clockwise and in red when located on the other strand (counterclockwise). The tRNA and rRNA genes have their own color. The clockwise strand corresponds to the sequence stored in GenBank (genes on the other strand are annotated there as “complement”). To identify global repeats (sequences that are repeated somewhere else on the chromosome), we search for the best match of a 100 bp window against the entire chromosome. Searching on the positive strand yields direct repeats (both sequences run in the same direction), whilst searching on the negative strand yields inverted repeats (the two repeat units run in opposite directions). For most of the general properties summarized in a Genome Atlas (structural properties, repeats, base composition), dedicated atlases are also available, in which more features are shown (such as local and simple repeats in a Repeat Atlas, or

2 http://www.cbs.dtu.dk/services/GenomeAtlas/
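The global repeat search described in Section 3 (best match of a window against the entire chromosome, on the positive strand for direct repeats and on the negative strand for inverted repeats) can be illustrated with a deliberately naive Python sketch. The text does not state how a "best match" is scored, so the ungapped fraction of identical bases used here is an assumption, as are the function names and the choice to skip only the window's own start position. The brute-force search is only practical for short toy sequences.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite (negative) strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_match_identity(window_seq, target, skip=None):
    """Highest fraction of identical bases between window_seq and any
    ungapped stretch of the same length in target.

    skip is an optional start position in target to ignore, used to
    exclude the trivial match of a window with itself.
    """
    width = len(window_seq)
    best = 0.0
    for start in range(len(target) - width + 1):
        if start == skip:
            continue
        matches = sum(a == b for a, b in zip(window_seq, target[start:start + width]))
        best = max(best, matches / width)
    return best


def global_repeats(chromosome, window=100):
    """Return two lists of best-match scores, one value per window start.

    direct:   window searched against the positive strand
    inverted: window searched against the negative strand
    """
    plus = chromosome.upper()
    minus = reverse_complement(plus)
    direct, inverted = [], []
    for start in range(len(plus) - window + 1):
        win = plus[start:start + window]
        direct.append(best_match_identity(win, plus, skip=start))
        inverted.append(best_match_identity(win, minus))
    return direct, inverted
```

With the toy sequence `AACGTCGGGGAACGTC` and a window of 6, the first window finds a perfect direct repeat (score 1.0) at the other end of the sequence. With `AACGTCTTTTGACGTT`, the first window instead finds a perfect inverted repeat, because `GACGTT` is the reverse complement of `AACGTC`. A real implementation would need an indexed or heuristic search and a rule for near-overlapping windows, neither of which is specified in the text.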

Tools for Comparison of Bacterial Genomes 74

