29.07.2013

Computational tools and Interoperability in Comparative ... - CBS


Abstract

The scientific community is witnessing an explosion in both the number and the complexity of DNA sequencing projects. As sequencing equipment becomes more reliable, faster and less expensive, new possibilities for applying the technology are opening up. The early genome sequencing projects, dating back almost 15 years, presented only individual microbial strains, and the large efforts and scientific achievements of that time qualified for publication in high-ranking journals. Today, however, projects like the Human Microbiome Project (HMP), the Human Gut Microbiome Initiative (HGMI) and the Genomic Encyclopedia of Bacteria and Archaea (GEBA) take sequencing into a new era, studying the genomes and ecological niches of entire populations consisting of thousands of microorganisms. These initiatives create a demand for new analysis tools to process and derive knowledge from the wealth of genomic information.

This thesis describes the development of new tools and methods to study these types of data. When the genomes of characterized strains and environmental samples are sequenced, the ribosomal RNA genes are commonly chosen as a starting point to describe the phylogeny and diversity. The rRNA genes are often interpreted as an ‘evolutionary chronometer’, and the RNAmmer software was developed as a tool to quickly and consistently identify the rRNA genes, allowing for large-scale phylogenetic analysis of complex data sets. RNAmmer resolved the gene boundary inaccuracy that is observed when BLAST approaches are used to map rRNA genes. The ability to accurately map the start of rRNA transcripts has allowed the investigation of promoter structures of these highly expressed operons, and a promoter analysis in E. coli K12 is demonstrated by applying a mathematical model of the energetics involved in DNA helix opening.

But a single gene, such as the 16S rRNA, cannot by its nature describe the phenotype or the full coding potential of an organism. This thesis describes the development of the BLASTatlas tool, a visualization tool that gives an overview of similarities and differences between any number of genomes, metagenomic samples or sequence databases from the viewpoint of a reference genome. This software has proved to be a powerful tool for studying the localization and gain/loss of gene clusters, such as pathogenicity islands in virulent organisms. The tool has been used in several research projects and collaborations, was described in a cover article in Molecular BioSystems in 2008, and was highlighted in the journal Chemical Biology. Despite the usefulness of this tool, it became obvious that a web-based version, more “biologist friendly” and with zooming capability, was needed. This led to the GeneWiz browser, which was developed in a joint effort with the IT staff at CBS. The tool enables the user to interactively zoom from a global chromosomal scale down to the nucleotide level, while maintaining the overview of all data being presented in the plot. It features disproportional zooming as known from Google Maps. At the time of writing this

