11.03.2014 Views

Data integration in microbial genomics ... - Jacobs University


3. CDinFusion

sortium (GSC), an international consortium, which promotes mechanisms to standardize the description of genomes and the exchange of genomic data, to create a series of checklists defining the minimal set of CD that should accompany a sequence submission. The Minimum Information About a (Meta)Genome Sequence (MIGS/MIMS) checklist [Field et al., 2008] outlines a conceptual structure for extending the core information that has been traditionally captured by the INSDC (DDBJ/EMBL/GenBank) to describe genomic and metagenomic sequences. The Minimum Information about a MARKer gene Sequence (MIMARKS) standard complements the MIGS/MIMS specification by adding two new “report types”: “MIMARKS-survey”, the checklist for uncultured diversity marker gene surveys, and “MIMARKS-specimen”, designed for marker gene sequences obtained from any material identifiable via specimens. The standards also cover sets of measurements and observations describing particular habitats, termed “environmental packages”. Collectively, the MIGS/MIMS/MIMARKS standards are now called MIxS (Minimum Information about any (x) Sequence) (Yilmaz et al., The Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications, accepted).
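
The idea of a checklist with report types can be illustrated with a minimal sketch. The field names and the contents of each checklist below are assumptions chosen for illustration, not the official MIxS definitions, and the helper `missing_fields` is hypothetical.

```python
from typing import Dict, List

# Hypothetical core fields shared by all report types (illustrative only,
# not the official MIxS field list).
CORE_FIELDS: List[str] = ["investigation_type", "collection_date", "lat_lon"]

# Hypothetical per-report-type checklists: core fields plus a few extras.
CHECKLISTS: Dict[str, List[str]] = {
    "MIMARKS-survey": CORE_FIELDS + ["target_gene"],
    "MIMARKS-specimen": CORE_FIELDS + ["isol_growth_condt"],
}


def missing_fields(record: Dict[str, str], report_type: str) -> List[str]:
    """Return the required fields that are absent or empty in the record."""
    required = CHECKLISTS[report_type]
    return [name for name in required if not record.get(name, "").strip()]


# Example contextual data (CD) for one sequence submission.
record = {
    "investigation_type": "mimarks-survey",
    "collection_date": "2010-06-15",
    "lat_lon": "54.18 7.90",
}

print(missing_fields(record, "MIMARKS-survey"))
```

With this example record, the only field reported as missing for the survey report type is the assumed `target_gene` entry; a real validator would load the field definitions from the published checklists instead of hard-coding them.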

Through collaboration with the GSC, the INSDC now offers the structures to store the data items specified in the GSC checklists. This facilitates an early integration of sequence data and CD. However, specialized tools to allow this integration for different user scenarios are needed.

The European Nucleotide Archive (ENA) provides an on-line submission system called Webin, which contains prepared web forms for the submission of GSC-compliant data. It shows all fields with descriptions, explanations and examples, and validates the data in the forms (https://www.ebi.ac.uk/embl/genomes/submission/login.jsf, accessed: 16.03.2011). The Investigation Study Assay (ISA) Infrastructure offers a software suite that produces documents that can be submitted to the Sequence Read Archive (SRA) repository [Rocca-Serra et al., 2010]. With the Quantitative Insights Into Microbial Ecology (QIIME) web application [Caporaso et al., 2010] users can generate and validate MIMARKS-compliant templates. Finally, MetaBar is a
