Data integration in microbial genomics ... - Jacobs University
7. Summary and discussion
and descriptions using this web service.
The standard was developed in a transparent and open process. The
strength of a community-based standard development process is that
many people can contribute. The drawback is its slow pace, which
results from frequent iterations and a very elaborate decision-making
process: it took almost three years to publish this standard. On the
other hand, because many people and institutions all over the world
were involved, the chance that the standard will be used by the
community increases.
A confusing aspect of the MIxS standards is that they are intended
to be minimal, which is also reflected in the names of the standards.
The community-based development process, however, led to standards
that are comprehensive rather than minimal. It is, of course,
difficult to estimate which parameters are relevant and which are not,
and not all parameters in the standards are mandatory. Still, users
are confronted with long lists of parameters and need to decide which
ones to use. This process might be overwhelming at first.
For data integration and the contextualization of sequence data, the
publication of the MIMARKS standard is a very important step forward.
However, the standards leave plenty of room for improvement.
Currently, most of the MIxS parameters are free-text fields.
Extracting information from these fields automatically is not trivial
from a computational point of view [Hirschman et al., 2008]. All
possible variations and ambiguities of natural language have to be
taken into account to interpret the information in these fields
correctly, and correct interpretation cannot be guaranteed. The use of
SI units (http://www.bipm.org/en/CGPM/db/11/12/) is recommended for
measurements, but this is not strictly enforced. With the exception of
Environment Ontology (EnvO) terms, controlled vocabularies or ontology
terms are not offered to the users. These points should be taken into
account in the planned annual updates of the standard.
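To make the difficulty with free-text fields concrete, the following is a minimal sketch of a parser for a depth-like measurement entered as free text. The function name `parse_depth`, the unit table, and the sample inputs are assumptions made for illustration only; none of them are part of the MIxS specification or of any existing tool.

```python
import re

# Hypothetical conversion table to metres (illustration only, not from MIxS).
UNIT_TO_METRES = {
    "m": 1.0, "meter": 1.0, "meters": 1.0, "metre": 1.0, "metres": 1.0,
    "km": 1000.0,
}

# Accepts exactly one decimal number followed by one alphabetic unit token.
_PATTERN = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([a-zA-Z]+)\s*$")


def parse_depth(text):
    """Return the depth in metres, or None if the free text is not understood."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    value, unit = match.groups()
    factor = UNIT_TO_METRES.get(unit.lower())
    if factor is None:
        return None  # unit not in the (assumed) table, e.g. a non-SI unit
    return float(value) * factor


if __name__ == "__main__":
    # Variants a human reads without effort; only some are parsed.
    samples = ["10 m", "10m", "0.5 km", "ten metres", "10-20 m", "approx. 10 m", "10 ft"]
    for sample in samples:
        print(repr(sample), "->", parse_depth(sample))
```

In this sketch only the first three inputs are interpreted. Spelled-out numbers, ranges, qualifiers, and non-SI units all return `None`, and every additional rule written to cover them introduces new ambiguities. A field restricted to a number plus a unit from a controlled list would not need such heuristics.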
Stability versus ‘living standards’
It has to be noted that, even though the MIxS standards are intended
to be ‘living standards’, it is very important from a programmer’s and
user’s point of view to have stability. The planned annual refinement