Computational tools and Interoperability in Comparative ... - CBS
Abstract
The scientific community is witnessing an explosion in both the number and the complexity
of DNA sequencing projects. As sequencing equipment becomes more reliable,
faster and less expensive, new possibilities for applying the technology are opening up.
The early genome sequencing projects, dating back almost 15 years, presented only individual
microbial strains, and the large efforts and scientific achievements at that time
merited publication in high-ranking journals. Today, however, projects like the Human
Microbiome Project (HMP), the Human Gut Microbiome Initiative (HGMI) and the Genomic
Encyclopedia of Bacteria and Archaea (GEBA) take sequencing into a new era, studying
the genomes and ecological niches of entire populations consisting of thousands of microorganisms.
These initiatives create a demand for new analysis tools to process and derive
knowledge from the wealth of genomic information.
This thesis describes the development of new tools and methods to study these types
of data. When the genomes of characterized strains and environmental samples are sequenced,
the ribosomal RNA genes are commonly chosen as a starting point to describe
the phylogeny and diversity. The rRNA genes are often interpreted as an ‘evolutionary
chronometer’, and the RNAmmer software was developed as a tool to quickly and
consistently identify the rRNA genes, allowing for large-scale phylogenetic analysis of complex
data sets. RNAmmer solved the gene boundary accuracy problems that
are observed when using BLAST approaches to map rRNA genes. The ability to
accurately map the start of rRNA transcripts has allowed the investigation of promoter
structures of these highly expressed operons, and a promoter analysis in E. coli K12 is
demonstrated by applying a mathematical model of the energetics involved in DNA helix
opening.
But a single gene, such as the 16S rRNA, cannot by its nature describe the phenotype
or the full coding potential of an organism. This thesis describes the development of
the BLASTatlas tool, a visualization tool that gives an overview of similarities and differences
between any number of genomes, metagenomic samples or sequence databases from the
viewpoint of a reference genome. This software has proved to be a powerful tool to study
the localization and gain/loss of gene clusters, such as pathogenicity islands in virulent
organisms. The tool has been used in several research projects and collaborations,
was described in a cover article in Molecular BioSystems in 2008, and was highlighted in the
journal Chemical Biology. Despite the usefulness of this tool, it became obvious that a
web-based version, more “biologist friendly” and with zooming capability, was needed. This led
to the GeneWiz browser, which was developed in a joint effort with the IT staff at CBS.
The tool enables the user to interactively zoom from a global chromosomal scale down
to the nucleotide level, while maintaining the overview of all data being presented in the plot. It
features disproportional zooming as known from Google Maps. At the time of writing this