Data integration in microbial genomics ... - Jacobs University
2. MetaBar
is given a unique identifier and at any stage the sheets can be uploaded
to the MetaBar database server. To label samples, identifiers
can be printed as barcodes. An intuitive web interface provides quick
access to the contextual data in the MetaBar database as well as user
and project management capabilities. Export functions facilitate contextual
and sequence data submission to the International Nucleotide
Sequence Database Collaboration (INSDC), comprising the DNA
Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory
database (EMBL) and GenBank.
MetaBar requests and stores
contextual data in compliance with the Genomic Standards Consortium
specifications.
The MetaBar open source code base for local installation
is available under the GNU General Public License version 3<br />
(GNU GPL3).<br />
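The identifier and barcode labelling step described above can be illustrated with a minimal sketch. It is not taken from the MetaBar code base: the `MB-` prefix, the identifier length, and the `SampleSheet` class are hypothetical names chosen only for illustration, and MetaBar's actual identifier scheme may differ.

```python
import uuid
from dataclasses import dataclass, field


def new_sample_id() -> str:
    """Return a short, unique, barcode-friendly identifier (hypothetical format)."""
    # Twelve hexadecimal characters from a random UUID, prefixed for readability.
    return "MB-" + uuid.uuid4().hex[:12].upper()


@dataclass
class SampleSheet:
    """Hypothetical stand-in for one acquisition sheet; not MetaBar's schema."""
    description: str
    sample_id: str = field(default_factory=new_sample_id)


# Each sheet receives its identifier on creation, before any upload takes place.
sheets = [SampleSheet("surface water"), SampleSheet("sediment core")]

# Mapping from identifier to description, e.g. as input for a barcode printer.
labels = {sheet.sample_id: sheet.description for sheet in sheets}
```

Generating the identifier when the sheet is created, rather than on upload, matches the idea that samples can be labelled in the field and uploaded at any later stage.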
Conclusion: The MetaBar software supports the typical workflow
from data acquisition and field sampling to the submission of
contextual-data-enriched sequences to an INSDC database.
The integration with
the megx.net marine Ecological Genomics database and portal facilitates
georeferenced data integration and metadata-based comparisons
of sampling sites as well as interactive data visualization.
The ample
export functionalities and the INSDC submission support enable the
exchange of data across disciplines and the safeguarding of contextual data.
2.2 Background<br />
Technological advances in molecular biology facilitate investigations
of biodiversity and functions on a temporal and geospatial
scale. Improved sampling and laboratory methods, together with fast
and affordable sequencing technologies [Hall, 2007], provide the framework
to create a network of data points capable of answering basic ecological
questions such as: ‘Who is out there?’ and ‘What are these organisms
doing?’ To shed light on the complex interplay, adaptation and
survival mechanisms of organisms in times of global change, contextual
data describing the surrounding environment of sampling locations are
of crucial importance [Field et al., 2008]. At the very least, the latitude
and longitude (x, y), the depth/altitude (z) in relation to sea level, and
the sampling date and time (t) must be provided to allow anchoring