Data integration in microbial genomics ... - Jacobs University
ERA [Seshadri et al., 2007], IMG/M [Markowitz et al., 2008] and<br />
MG-RAST [Meyer et al., 2008].<br />
Nevertheless, it is increasingly apparent that the full potential of comparative<br />
genome and metagenome analysis can be achieved only if the<br />
geographic and environmental context of the sequence data is considered<br />
[Field et al., 2008, Field, 2008]. The metadata describing a<br />
sample’s geographic location and habitat, together with the details of its processing<br />
from the time of sampling to sequencing and subsequent analyses, are<br />
important for, e.g., modelling species’ responses to environmental change<br />
or the spread and niche adaptation of bacteria and viruses. This suite<br />
of metadata is collectively referred to as contextual data [Kottmann et al.,<br />
2008].<br />
Megx.net is the first database to integrate curated contextual data<br />
with their respective genes, genomes and metagenomes in the marine<br />
environment [Lombardot et al., 2006]. Now, the extended megx.net<br />
database resource allows post factum retrieval of interpolated environmental<br />
parameters, such as temperature, nitrate and phosphate, for<br />
any location in the ocean waters based on profile and remote sensing<br />
data. Furthermore, the content has been significantly updated to include<br />
prokaryote and marine phage genomes, metagenomes from the<br />
GOS project [Rusch et al., 2007] and all georeferenced small and large<br />
subunit ribosomal RNA (rRNA) sequences from the SILVA database<br />
project [Pruesse et al., 2007].<br />
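To illustrate what retrieving an interpolated parameter for an arbitrary location can involve, the following is a minimal sketch of bilinear interpolation on a regular latitude/longitude grid. The text does not state which interpolation scheme megx.net applies, so the method, the function name `interpolate_bilinear`, and the temperature values are all assumptions made purely for illustration.

```python
from bisect import bisect_right


def interpolate_bilinear(lats, lons, grid, lat, lon):
    """Bilinearly interpolate grid[i][j] (the value at lats[i], lons[j]) at (lat, lon).

    Hypothetical helper for illustration only; not the megx.net implementation.
    `lats` and `lons` must be sorted in ascending order.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("query point lies outside the grid")
    # Index of the lower grid node in each dimension, clamped so that
    # the neighbouring nodes i + 1 and j + 1 always exist.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position of the query point inside the grid cell, in [0, 1].
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Weighted average of the four surrounding grid nodes.
    return ((1 - t) * (1 - u) * grid[i][j]
            + (1 - t) * u * grid[i][j + 1]
            + t * (1 - u) * grid[i + 1][j]
            + t * u * grid[i + 1][j + 1])


# Made-up sea-surface temperatures (degrees Celsius) on a 2 x 2 grid.
lats = [50.0, 60.0]
lons = [0.0, 10.0]
temperature = [[10.0, 12.0],
               [6.0, 8.0]]

# Value at the centre of the cell: the mean of the four corner values.
value = interpolate_bilinear(lats, lons, temperature, 55.0, 5.0)
```

A sampling site recorded only with coordinates could be assigned an estimated value in this way after the fact, which is the sense of "post factum" retrieval above; a production system would additionally have to handle depth, time, and gaps in the underlying profile and remote sensing data.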
The extended megx.net portal is the first resource of its kind to offer<br />
access to this unique combination of data, including manually curated<br />
habitat descriptors for genomes, metagenomes and marker genes, their<br />
respective contextual data and additionally integrated environmental<br />
data. See the megx.net online video tutorial for a guided introduction<br />
and overview at http://www.megx.net/portal/tutorial.html.<br />
5.3 New database structure and content<br />
The Microbial Ecological Genomics DataBase (MegDB), the backbone
of megx.net, is a centralized database based on the PostgreSQL
database management system. The georeferenced data concerning geographic
coordinates and time are managed with the PostGIS extension