Data integration in microbial genomics ... - Jacobs University
3. CDinFusion
sortium (GSC), an international consortium, which promotes mechanisms
to standardize the description of genomes and the exchange
of genomic data, to create a series of checklists defining the minimal
set of CD that should accompany a sequence submission. The Minimum
Information About a (Meta)Genome Sequence (MIGS/MIMS)
checklist [Field et al., 2008] outlines a conceptual structure for extending
the core information that has been traditionally captured by the
INSDC (DDBJ/EMBL/GenBank) to describe genomic and metagenomic
sequences. The Minimum Information about a MARKer gene
Sequence (MIMARKS) standard complements the MIGS/MIMS specification
by adding two new “report types”, a “MIMARKS-survey”
and a “MIMARKS-specimen”. The former is the checklist for uncultured
diversity marker gene surveys; the latter is designed for marker
gene sequences obtained from any material identifiable via specimens.
The standards also cover sets of measurements and observations describing
particular habitats, termed “environmental packages”. Collectively,
the MIGS/MIMS/MIMARKS standards are now called MIxS
(Minimum Information about any (x) Sequence) (Yilmaz et al., The
Minimum information about a marker gene sequence (MIMARKS)
and minimum information about any (x) sequence (MIxS) specifications,
accepted).
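
To make the checklist idea concrete, the following is a minimal sketch of how a record of contextual data could be checked against a report type. It is not the GSC specification: the item names, the split into core and report-type items, and the function `missing_items` are assumptions chosen for illustration only, and the real checklists define many more items together with their requirement levels.

```python
# Illustrative subset of items assumed to be required for every report type.
# These names are placeholders for this sketch, not the authoritative checklist.
CORE_ITEMS = {"project_name", "collection_date", "lat_lon", "geo_loc_name"}

# Hypothetical additional items per report type (assumed for illustration).
REPORT_TYPE_ITEMS = {
    "mimarks-survey": {"target_gene", "seq_meth"},
    "mimarks-specimen": {"target_gene", "seq_meth", "isol_growth_condt"},
}


def missing_items(record, report_type):
    """Return the sorted names of required items that are absent or blank."""
    required = CORE_ITEMS | REPORT_TYPE_ITEMS[report_type]
    return sorted(
        item for item in required
        if not str(record.get(item, "")).strip()
    )


record = {
    "project_name": "demo project",
    "collection_date": "2011-03-16",
    "geo_loc_name": "Germany",
    "target_gene": "16S rRNA",
    "seq_meth": "pyrosequencing",
}
print(missing_items(record, "mimarks-survey"))
```

The same record can be tested against either report type, which mirrors how one set of core items is shared while each report type adds its own requirements.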

Through collaboration with the GSC, the INSDC now offers the structures
to store the data items specified in the GSC checklists. This
facilitates an early integration of sequence data and CD. However,
specialized tools to allow this integration for different user scenarios
are needed.

The European Nucleotide Archive (ENA) provides an on-line submission
system called Webin, which contains prepared web forms for
the submission of GSC-compliant data. It shows all fields with descriptions,
explanations and examples, and validates the data entered in the
forms (https://www.ebi.ac.uk/embl/genomes/submission/login.jsf, accessed:
16.03.2011). The Investigation Study Assay (ISA) Infrastructure
offers a software suite that produces documents that can be submitted
to the Sequence Read Archive (SRA) repository [Rocca-Serra
et al., 2010]. With the Quantitative Insights Into Microbial Ecology
(QIIME) web application [Caporaso et al., 2010], users can generate
and validate MIMARKS-compliant templates. Finally, MetaBar is a