11.03.2014

Data integration in microbial genomics ... - Jacobs University



Thesis abstract

Deoxyribonucleic acid (DNA) is the primary structure that carries the genetic information of organisms in genomes. The introduction of the first DNA sequencing methods in 1977 marked a major breakthrough in life sciences. Today, these methods are widely applied and grant insight into the ‘blueprints’ of organisms from all domains of life.

The analysis of environmental microbial sequence data is becoming increasingly important in times of global climate change, because microbes are central catalysts in nutrient cycles such as the carbon cycle, which profoundly affects Earth’s climate. Microbes perform almost all metabolic processes that are thermodynamically possible.

DNA sequencing is carried out around the globe and the resulting data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration (INSDC). Data in the INSDC is accumulating exponentially. This trend shows the need for efficient data processing strategies in order to gain knowledge from this ever-increasing amount of sequence data.

For this, it is important to annotate sequence data with as much contextual data as possible. Contextual data are data about the environmental context and the processing steps that were applied. These can range from data about the geographic location, sampling time, habitat, or experimental procedures used to obtain the sequences, up to video data recorded during sampling. Especially essential are data about the geographic location (x, y, z) and the point in time (t) at which samples are taken from the environment. With them, comparability and interpretability are preserved. Many analysis approaches become possible when contextual and sequence data are integrated.
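As an illustration only, the idea of attaching contextual data to a sequence can be sketched as a small data structure. This is not the tooling developed in the thesis: the class names `ContextualData` and `SequenceRecord`, the field names, the helper `comparable`, and all values (accessions, coordinates, dates) are hypothetical placeholders chosen for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ContextualData:
    """Hypothetical record of the context in which a sample was taken."""
    latitude: Optional[float] = None        # y, decimal degrees
    longitude: Optional[float] = None       # x, decimal degrees
    depth_m: Optional[float] = None         # z, metres below the surface
    sampled_at: Optional[datetime] = None   # t, point in time of sampling
    habitat: Optional[str] = None

    def has_xyzt(self) -> bool:
        """True if location (x, y, z) and time (t) are all present."""
        required = (self.latitude, self.longitude, self.depth_m, self.sampled_at)
        return all(value is not None for value in required)


@dataclass
class SequenceRecord:
    """A sequence together with its (possibly empty) contextual data."""
    accession: str
    sequence: str
    context: ContextualData = field(default_factory=ContextualData)


def comparable(records: List[SequenceRecord]) -> List[SequenceRecord]:
    """Keep only records whose context includes x, y, z and t."""
    return [r for r in records if r.context.has_xyzt()]


# Placeholder data, not taken from any real repository entry.
records = [
    SequenceRecord(
        "EXAMPLE_1",
        "ACGT",
        ContextualData(
            latitude=54.18,
            longitude=7.90,
            depth_m=1.0,
            sampled_at=datetime(2010, 6, 21, 9, 30, tzinfo=timezone.utc),
            habitat="marine",
        ),
    ),
    SequenceRecord("EXAMPLE_2", "TTGA"),  # no contextual data attached
]

kept = comparable(records)
print([r.accession for r in kept])
```

In this sketch, a record without location and time is dropped from comparative analysis, which mirrors the point that sequences lacking (x, y, z, t) are hard to compare or interpret.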

In this doctoral thesis, data integration is promoted in three ways: firstly, through the development of contextual data capture, submission and integration tools; secondly, through the development of standards for contextual data; and thirdly, through demonstration of in silico hypothesis generation for a large metagenomic data set.
