11.03.2014 Views

Data integration in microbial genomics ... - Jacobs University

Data integration in microbial genomics ... - Jacobs University

Data integration in microbial genomics ... - Jacobs University

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

5.3. New database structure and content 69<br />

ERA [Seshadri et al., 2007], IMG/M [Markowitz et al., 2008] and<br />

MG-RAST [Meyer et al., 2008].<br />

Nevertheless, it is <strong>in</strong>creas<strong>in</strong>gly apparent that the full potential of comparative<br />

genome and metagenome analysis can be achieved only if the<br />

geographic and environmental context of the sequence data is considered<br />

[Field et al., 2008, Field, 2008]. The metadata describ<strong>in</strong>g a<br />

sample’s geographic location and habitat, the details of its process<strong>in</strong>g,<br />

from the time of sampl<strong>in</strong>g to sequenc<strong>in</strong>g and subsequent analyses are<br />

important, e.g. modell<strong>in</strong>g species’ responses to environmental change<br />

or the spread and niche adaptation of bacteria and viruses. This suite<br />

of metadata is collectively referred as contextual data [Kottmann et al.,<br />

2008].<br />

Megx.net is the first database to <strong>in</strong>tegrate curated contextual data<br />

with their respective genes, genomes and metagenomes <strong>in</strong> the mar<strong>in</strong>e<br />

environment [Lombardot et al., 2006]. Now, the extended megx.net<br />

database resource allows post factum retrieval of <strong>in</strong>terpolated environmental<br />

parameters, such as temperature, nitrate, phosphate, etc. for<br />

any location <strong>in</strong> the ocean waters based on profile and remote sens<strong>in</strong>g<br />

data. Furthermore, the content has been significantly updated to <strong>in</strong>clude<br />

prokaryote and mar<strong>in</strong>e phage genomes, metagenomes from the<br />

GOS project [Rusch et al., 2007] and all georeferenced small and large<br />

subunit ribosomal RNA (rRNA) sequences from the SILVA database<br />

project [Pruesse et al., 2007].<br />

The extended megx.net portal is the first resource of its k<strong>in</strong>d to offer<br />

access to this unique comb<strong>in</strong>ation of data, <strong>in</strong>clud<strong>in</strong>g manually curated<br />

habitat descriptors for genomes, metagenomes and marker genes, their<br />

respective contextual data and additionally <strong>in</strong>tegrated environmental<br />

data. See the megx.net onl<strong>in</strong>e video tutorial for a guided <strong>in</strong>troduction<br />

and overview at http://www.megx.net/portal/tutorial.html.<br />

5.3 New database structure and content<br />

The Microbial Ecological Genomics <strong>Data</strong>Base (MegDB), the backbone<br />

of megx.net, is a centralized database based on the PostgreSQL<br />

database management system. The georeferenced data concern<strong>in</strong>g geographic<br />

coord<strong>in</strong>ates and time are managed with the PostGIS extension

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!