11.03.2014 Views

Data integration in microbial genomics ... - Jacobs University


3. CDinFusion

able separately and is also released under the GNU LGPL 3 license. It may also be used for other projects.

If only one sequence is detected in the FASTA upload, the control flow is directed to the 2a GSC SELECTED 1to1 JSP (use case 1 in the Results section), shown in Figure 3. If the user opts to annotate all sequences of a MultiFASTA file (Figure 3.4) with either one CD set or many CD sets, the control flow is directed to the 3b CD INPUT 1tom JSP (use case 2 in the Results section) or to the 3b1 CD INPUT ntom JSP (use case 3 in the Results section), respectively.
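The branching described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the JSP names are copied from the text, but the enum `Branch`, the class `WorkflowRouter`, and the flag `manyCdSets` are hypothetical names and do not come from the CDinFusion source code.

```java
import java.util.List;

/** Hypothetical enum; only the JSP names are taken from the text. */
enum Branch {
    ONE_TO_ONE("2a GSC SELECTED 1to1"),   // use case 1
    ONE_TO_MANY("3b CD INPUT 1tom"),      // use case 2
    MANY_TO_MANY("3b1 CD INPUT ntom");    // use case 3

    final String jsp;

    Branch(String jsp) {
        this.jsp = jsp;
    }
}

/** Hypothetical helper that mirrors the routing rule in the paragraph above. */
final class WorkflowRouter {
    private WorkflowRouter() {}

    /** Counts FASTA records by counting header lines, which start with '>'. */
    static int countSequences(List<String> fastaLines) {
        int count = 0;
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                count++;
            }
        }
        return count;
    }

    /**
     * Picks the workflow branch.
     *
     * @param sequenceCount number of sequences found in the upload
     * @param manyCdSets    true if the user wants many CD sets rather than one
     */
    static Branch route(int sequenceCount, boolean manyCdSets) {
        if (sequenceCount < 1) {
            throw new IllegalArgumentException("no sequences found in upload");
        }
        if (sequenceCount == 1) {
            return Branch.ONE_TO_ONE;
        }
        return manyCdSets ? Branch.MANY_TO_MANY : Branch.ONE_TO_MANY;
    }
}
```

For a single-sequence upload the user's choice is irrelevant in this sketch, which matches the text: only MultiFASTA uploads offer the one-set or many-sets option.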

After the CD have been entered into the web forms, these data may be downloaded as comma-separated values (CSV) files. The CSV files may serve as local backups and can be edited off-line and uploaded to CDinFusion to re-populate the web forms. Each session concludes with a confirmation step, where users can revisit any previous step and correct CD input if necessary. This holds true for all three branches of the workflow (Figures 3.3 and 3.4). If the user chooses to proceed to the file download, a CD FASTA file and a structured comment file are generated and can, depending on their size, either be imported into Sequin or merged on the command line using tbl2asn (http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2.html, accessed: 30.03.2011) before submission.
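The CSV round trip (download, off-line editing, upload) can be illustrated with a minimal sketch. The actual column layout of the CDinFusion CSV files is not given in the text, so the two-column "parameter,value" layout, the class `CdCsv`, and its method names are assumptions made for illustration. The sketch also assumes that values contain no line breaks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical CSV helper; the two-column layout is an assumption. */
final class CdCsv {
    private CdCsv() {}

    /** Quotes a field if it contains a comma or a double quote. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Splits one CSV line into fields, honouring quotes and doubled quotes. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++; // skip the second quote of an escaped pair
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Writes one "parameter,value" row per entry. */
    static String toCsv(Map<String, String> cd) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : cd.entrySet()) {
            out.append(escape(e.getKey())).append(',')
               .append(escape(e.getValue())).append('\n');
        }
        return out.toString();
    }

    /** Reads the rows back into an insertion-ordered map. */
    static Map<String, String> fromCsv(String csv) {
        Map<String, String> cd = new LinkedHashMap<>();
        for (String line : csv.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            List<String> fields = parseLine(line);
            if (fields.size() != 2) {
                throw new IllegalArgumentException("expected 2 fields: " + line);
            }
            cd.put(fields.get(0), fields.get(1));
        }
        return cd;
    }
}
```

Because `fromCsv(toCsv(cd))` returns a map equal to `cd`, a file written by this sketch could be edited by hand and loaded again to re-populate a form, which is the behaviour the paragraph describes.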

Implementation of the GSC checklists in CDinFusion

Once the user has uploaded a MultiFASTA file and its contents have been validated, the data are processed according to the data model shown in Figure 3.5. For each CD set, a CDElement is created that contains an object for a “type of report” and an object for an “environmental package”. These Java classes were auto-generated from the relations in the PostgreSQL GSC database using the Ibator tool from the iBatis project (http://ibatis.apache.org). The GSC database is hosted and maintained at the Max Planck Institute for Marine Microbiology, Bremen, Germany. The Java classes cover the MIGS, MIMS and MIMARKS (MIxS) specifications. The GSC plans to refine these standards annually. With every new version of the standards, the Java classes can easily be updated using the Ibator tool.
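The composition described above, one CDElement holding a report type and an environmental package, might look roughly as follows. Only the name CDElement comes from the text. The classes `ReportType` and `EnvironmentalPackage`, their fields, and the example values are hypothetical stand-ins, and the real Ibator-generated classes will differ.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the generated "type of report" class. */
final class ReportType {
    final String name;

    ReportType(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for the generated "environmental package" class. */
final class EnvironmentalPackage {
    final String name;
    private final Map<String, String> parameters = new LinkedHashMap<>();

    EnvironmentalPackage(String name) {
        this.name = name;
    }

    /** Stores a value under a parameter short name. */
    void set(String shortName, String value) {
        parameters.put(shortName, value);
    }

    /** Read-only view of the parameters entered so far. */
    Map<String, String> parameters() {
        return Collections.unmodifiableMap(parameters);
    }
}

/** One CD set: exactly one report type plus one environmental package. */
final class CDElement {
    final ReportType reportType;
    final EnvironmentalPackage environmentalPackage;

    CDElement(ReportType reportType, EnvironmentalPackage environmentalPackage) {
        if (reportType == null || environmentalPackage == null) {
            throw new IllegalArgumentException("both parts of a CD set are required");
        }
        this.reportType = reportType;
        this.environmentalPackage = environmentalPackage;
    }
}
```

A CD set for a marker gene survey of a water sample could then be built as `new CDElement(new ReportType("MIMARKS"), new EnvironmentalPackage("water"))`, with values chosen purely for illustration.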

The short names of the parameters are resolved using a web service
