11.03.2014

Data integration in microbial genomics ... - Jacobs University



7. Summary and discussion

and descriptions using this web service.

The standard was developed in a transparent and open process. The strength of a community-based development process is that many people can contribute. The drawback is the slow pace, which results from frequent iterations and a very elaborate decision-making process: it took almost three years to publish this standard. On the other hand, because many people and institutions all over the world were involved, the chance that the community will actually use the standard increases.

A confusing aspect of the MIxS standards is that they are intended to be minimal, which is also reflected in their names. The community-based development process, however, led to standards that are comprehensive rather than minimal. It is, of course, difficult to estimate which parameters are relevant and which are not, and not all parameters in the standards are mandatory. Users are nevertheless confronted with long lists of parameters and need to decide which ones to use. This process might be overwhelming at first.

For data integration and the contextualization of sequence data, the publication of the MIMARKS standard is a very important step forward. However, the standards leave plenty of room for improvement.

Currently, most of the MIxS parameters are free-text fields. Extracting information from these fields automatically is not trivial from a computational point of view [Hirschman et al., 2008]. All possible variations and ambiguities of natural language have to be taken into account to interpret the information in these fields, and even then correct interpretation cannot be guaranteed. The use of SI units (http://www.bipm.org/en/CGPM/db/11/12/) is recommended for measurements, but this is not strictly enforced. With the exception of Environment Ontology (EnvO) terms, controlled vocabularies or ontology terms are not offered to users. These issues should be taken into account in the planned annual updates of the standard.
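The difficulty of interpreting free-text measurement fields can be illustrated with a minimal Python sketch. It is not part of the MIxS standards or of any existing tool: the function name `parse_depth`, the unit table `TO_METRES`, and the choice of a depth value as the example are all assumptions made for illustration only.

```python
import re
from typing import Optional

# Hypothetical unit table (an assumption for this sketch): factors to metres.
TO_METRES = {
    "m": 1.0, "meter": 1.0, "meters": 1.0, "metre": 1.0, "metres": 1.0,
    "cm": 0.01, "km": 1000.0, "ft": 0.3048,
}

# Accepts only "<number> <unit>", e.g. "10 m", "10m" or "1,5 km".
_PATTERN = re.compile(r"^\s*([-+]?\d+(?:[.,]\d+)?)\s*([A-Za-z]+)\s*$")


def parse_depth(text: str) -> Optional[float]:
    """Return a depth in metres, or None if the free text cannot be interpreted."""
    match = _PATTERN.match(text)
    if match is None:
        return None  # e.g. "approx. 10 metres below surface"
    number, unit = match.groups()
    factor = TO_METRES.get(unit.lower())
    if factor is None:
        return None  # unit not in the table
    # A comma is read as a decimal separator here; a value such as "1,500 m"
    # (comma as thousands separator) would therefore be misread as 1.5 m.
    return float(number.replace(",", ".")) * factor
```

Even for a single numeric parameter, the sketch has to reject values without a unit, values with surrounding prose, and unknown units, and it can still misread an ambiguous number format. A field restricted to a number in a fixed SI unit, or to a term from a controlled vocabulary, would avoid this guesswork.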

Stability versus ’liv<strong>in</strong>g standards’<br />

It has to be noted that even though the MIxS standards are intended to be ’living standards’, stability is very important from a programmer’s and user’s point of view. The planned annual refinement
