29.07.2013 Views

Computational tools and Interoperability in Comparative ... - CBS


Basic Local Alignment Search Tool, the most common (Altschul et al., 1990). However, BLAST is not automatically suitable for large DNA input segments such as complete genomes. A more suitable program for aligning sequences in the megabase range is MUMmer, developed at TIGR, of which version 3 is now publicly available (Kurtz et al., 2004). Further, this method has recently been extended to include the average nucleotide identity in the conserved core genes of a set of genomes (Deloger et al., 2009). Moreover, graphical representation of the resulting alignment becomes an issue. Specific tools have been designed to align genome sequences and visualize such events. The Artemis Comparison Tool (ACT) is worth mentioning; two versions are available: a downloadable version to be used on a local computer (Carver et al., 2005) and a web-based version with pre-computed comparisons between several hundred bacterial genomes (footnote 3). BLAST comparisons of entire bacterial chromosomes against each other have also been used to construct phylogenetic trees (Henz et al., 2005). BLAST comparisons will be treated in Section 7 of this chapter.
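To make the idea of an average nucleotide identity over conserved core genes concrete, the following is a minimal sketch. It assumes that each core gene has already been aligned pairwise between two genomes and that gaps are written as "-". The function names and the weighting by compared columns are illustrative assumptions; this is not the procedure of Deloger et al. (2009), and it does not call MUMmer or BLAST.

```python
def count_matches(aligned_a, aligned_b):
    """Return (matches, compared_columns) for two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted in the identity
        compared += 1
        if x == y:
            matches += 1
    return matches, compared


def average_nucleotide_identity(aligned_pairs):
    """Percent identity pooled over all aligned core-gene pairs.

    aligned_pairs: iterable of (aligned_a, aligned_b) strings, one pair
    per core gene (hypothetical input format).
    """
    total_matches = 0
    total_compared = 0
    for a, b in aligned_pairs:
        m, c = count_matches(a, b)
        total_matches += m
        total_compared += c
    if total_compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * total_matches / total_compared
```

Pooling matches over all genes weights long genes more heavily than short ones; averaging the per-gene identities instead would be an equally defensible choice.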

5 Comparing the Coding Fraction of Genomes

The typical coding density for a bacterial genome is about 90%, ranging from 95% for Pelagibacter ubique (an alphaproteobacterial marine bacterium that ranks among the most numerous bacteria in the world) (Giovannoni et al., 2005) to around 75% for M. acetivorans. Intracellular bacteria can have a coding density as low as 50%. This means that the majority of bacterial DNA codes for genes, which are mostly not spliced, so that introns are absent (with very few exceptions). However, not every open reading frame is a gene, and it appears that many bacterial genomes are over-annotated, predicting 10–15% more genes than are real (Skovgaard et al., 2001). These over-annotated genes are frequently short open reading frames. In addition, genes can be missed in the annotation. A frequent mistake is that genes are annotated on the wrong strand, which can happen if the reading frame is open in both directions. The intergenic regions separating genes regulate transcription, and in intracellular bacteria they frequently contain pseudogenes or repeats. Genes not coding for proteins include tRNA and rRNA genes, and some parts of intergenic regions can be transcribed into stable RNAs that do not code for proteins. E. coli contains several hundred small non-coding RNA (ncRNA) genes (Chen et al., 2002) that can act as regulators (Gottesman, 2005). Their role in environmental bacteria is virtually unexplored.
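Coding density as used here can be computed from an annotation by taking the fraction of the genome covered by at least one annotated gene. The sketch below is one way to do this, assuming gene coordinates are 1-based and inclusive, that both strands are pooled, and that no gene spans the origin of a circular chromosome. The function name and the input format are illustrative assumptions, not part of any annotation tool.

```python
def coding_density(genome_length, genes):
    """Fraction of the genome covered by at least one annotated gene.

    genes: iterable of (start, end) pairs, 1-based inclusive, start <= end.
    Overlapping genes (on either strand) are merged so that no base is
    counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = None
    current_end = None
    for start, end in sorted(genes):
        if current_end is None or start > current_end:
            # No overlap with the running interval: close it, start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / genome_length
```

For example, on a hypothetical 1,000 bp sequence with genes at 1-300, 250-500 and 601-900, the first two genes merge into one 500 bp interval and the function returns 0.8. Because over-annotated short open reading frames add to the covered length, a density computed this way inherits any over-annotation present in the input.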

Although tRNA and rRNA genes are essential to life, they are sometimes missed in the annotation of a genome, a rather embarrassing omission, or occasionally annotated on the wrong strand (Lagesen et al., 2007). The number and location of rRNA operons in a genome can say something about an organism. It appears that organisms with short doubling times have larger numbers of rRNA and tRNA genes. Comparing Figs. 2 and 3, it is likely that G. kaustophilus, with nine rRNA copies, nearly all located close to the origin of replication (which boosts expression during replication as their copy number increases), can divide more quickly than M. acetivorans, which has only three copies. Some really fast-growing bacteria can have 14 or more rRNA copies, as can be seen from our list of genomes (footnote 4).
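Whether rRNA operons cluster near the origin of replication can be checked from their coordinates alone. The sketch below assumes a circular chromosome, one representative coordinate per rRNA operon, and a known origin position. The function names and the default window of 25% of the genome length are arbitrary, illustrative choices.

```python
def circular_distance(position, origin, genome_length):
    """Shortest distance in bp between two positions on a circular chromosome."""
    d = abs(position - origin) % genome_length
    return min(d, genome_length - d)


def fraction_near_origin(rrna_positions, origin, genome_length,
                         window_fraction=0.25):
    """Fraction of rRNA operons within window_fraction * genome_length of the origin.

    rrna_positions: one coordinate per rRNA operon (hypothetical input).
    """
    if not rrna_positions:
        raise ValueError("no rRNA positions given")
    limit = window_fraction * genome_length
    near = sum(
        1 for p in rrna_positions
        if circular_distance(p, origin, genome_length) <= limit
    )
    return near / len(rrna_positions)
```

With made-up coordinates on a 1,000 bp circle, an origin at 950 and operons at 10, 900 and 500, the distances are 60, 50 and 450 bp, so two of the three operons fall within a 10% window. The distance must be measured around the circle; a plain difference of coordinates would place the operon at position 10 far from an origin at 950.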

3 http://www.webact.org/WebACT/home

4 www.cbs.dtu.dk/services/GenomeAtlas/

Tools for Comparison of Bacterial Genomes 74

