Data integration in microbial genomics ... - Jacobs University
4. MIMARKS
Doug Wendel, Owen White, Andrew Whiteley, Andreas Wilke, Jennifer
R Wortman, Tanya Yatsunenko and Frank Oliver Glöckner
Submitted to: Nature Biotechnology, accepted April 2011
Personal Contribution: Gave the initial talk, titled “Survey results:
MInimal list of contextual data fields for ENvironmental Sequences
(MIENS)”, at the 6th meeting of the GSC at the EBI (Hinxton, UK) in
October 2008. This talk was the starting point for the development of
the standard, which was later renamed MIMARKS. Contributed suggestions
for improving the data fields while implementing the standard
in the tools MetaBar and CDinFusion.
Relevance: Standards development for contextual data.
4.1 Abstract
Here we present a standard developed by the Genomic Standards Consortium
(GSC) for reporting marker gene sequences: the minimum
information about a marker gene sequence (MIMARKS). We also introduce
a system for describing the environment from which a biological
sample originates. The ‘environmental packages’ apply to any
genome sequence of known origin and can be used in combination
with MIMARKS and other GSC checklists. Finally, to establish a
unified standard for describing sequence data and to provide a single
point of entry for the scientific community to access and learn about
GSC checklists, we present the minimum information about any (x)
sequence (MIxS). Adoption of MIxS will enhance our ability to analyze
natural genetic diversity documented by massive DNA sequencing
efforts from myriad ecosystems in our ever-changing biosphere.
4.2 Introduction
Without specific guidelines, most genomic, metagenomic and marker
gene sequences in databases are sparsely annotated with the information
required to guide data integration, comparative studies and knowl-