Data integration in microbial genomics ... - Jacobs University
3.3. Results
sequence obtained from seawater. Subsequently, the web forms were filled with all the CD available for this particular sequence (see the example in Figure 3.2). After the output file had been generated and downloaded, the CD-enriched FASTA file was imported into Sequin version 11.00. CDinFusion inserted the qualifiers specified by GenBank into the header line of the FASTA file and placed the remaining CD into a tab-delimited structured comment file. This file was loaded into Sequin with the “Advanced Table Readers” option in the “Annotate” menu, and the CD then appeared in the metadata section between the header and the feature table section. Selecting “Done” saved the Sequin file, and the complete submission was prepared. The INSDC database entry for this submission is available under [Accession number: JF681370].

Figure 3.2: CDinFusion web user interface. The CD are entered into the auto-generated web forms. Details about each parameter are accessible via the “more info” link. These details are retrieved using a web service that accesses the GSC database and are therefore always up to date.
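The split of CD between the FASTA header and a separate tab-delimited file can be sketched as follows. This is a minimal illustration under stated assumptions, not CDinFusion's actual implementation: the set of keys routed to the header (`HEADER_QUALIFIERS`), the helper names `build_fasta` and `build_comment_table`, and the plain two-column key/value layout of the comment file are all hypothetical. The header uses NCBI's `[key=value]` source-modifier convention for FASTA definition lines; the exact qualifier names and the table layout Sequin's table readers expect should be checked against NCBI documentation.

```python
import csv
import io

# Hypothetical choice of CD keys treated as GenBank source qualifiers;
# every other key is written to the structured comment table instead.
HEADER_QUALIFIERS = {"organism", "isolation_source"}


def build_fasta(seq_id, sequence, cd, width=60):
    """Return one FASTA record whose header carries [key=value] modifiers."""
    mods = " ".join(
        f"[{key}={value}]" for key, value in cd.items() if key in HEADER_QUALIFIERS
    )
    # rstrip() drops the trailing space when no modifiers apply
    header = f">{seq_id} {mods}".rstrip()
    # Wrap the sequence at a fixed line width
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


def build_comment_table(cd):
    """Return the remaining CD as tab-delimited key/value rows (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for key, value in cd.items():
        if key not in HEADER_QUALIFIERS:
            writer.writerow([key, value])
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only, not real submission data
    cd = {"organism": "Example organism", "isolation_source": "seawater", "depth": "10 m"}
    print(build_fasta("seq1", "ACGT" * 20, cd), end="")
    print(build_comment_table(cd), end="")
```

Keeping the two outputs in separate functions mirrors the two files described in the text: the FASTA file is imported into Sequin directly, while the comment table is loaded separately through the table-reader option.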
This use case exemplifies submission scenarios in which a single sequence and its CD are to be submitted to the INSDC databases. Single sequences can, for example, be marker genes or genomes that consist of a single sequence or contig.
In the second use case, a permanent draft genome from a Rhodopirellula baltica strain, along with its associated CD, was prepared for submission.
After the 6.9 Mb MultiFASTA file was uploaded, the user was