Data integration in microbial genomics ... - Jacobs University
Thesis abstract
Deoxyribonucleic acid (DNA) is the primary structure that carries the genetic
information of organisms in genomes. The introduction of the first
DNA sequencing methods in 1977 marked a major breakthrough in life sciences.
Today, these methods are widely applied and grant insight into the
‘blueprints’ of organisms from all domains of life.
The analysis of environmental microbial sequence data is becoming increasingly
important in times of global climate change, because microbes are
central catalysts in nutrient cycles such as the carbon cycle, which profoundly
affects Earth’s climate. Microbes perform almost all metabolic processes
that are thermodynamically possible.
DNA sequencing is carried out around the globe, and the resulting data
are submitted to the public repositories of the International Nucleotide Sequence
Database Collaboration (INSDC). Data in the INSDC are accumulating
exponentially. This trend shows the need for efficient data processing
strategies to extract knowledge from this ever-increasing amount of
sequence data.
For this, it is important to annotate sequence data with as much contextual
data as possible. Contextual data describe the environmental
context of a sample and the processing steps that were applied to it. They range from
data about the geographic location, sampling time, habitat, or the experimental
procedures used to obtain the sequences to video data recorded
during sampling. Data about the geographic location (x, y, z)
and the point in time (t) at which samples are taken from the environment
are especially essential. When contextual and sequence data are integrated,
comparability and interpretability are preserved, and a wide range of
analysis approaches becomes possible.
In this doctoral thesis, data integration is promoted in three ways: first,
through the development of contextual data capture, submission, and integration
tools; second, through the development of standards for contextual
data; and third, through the demonstration of in silico hypothesis generation
for a large metagenomic data set.