Annual Scientific Report 2015
EMBL_EBI_ASR_2015_DigitalEdition
EMBL_EBI_ASR_2015_DigitalEdition
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Cheminformatics and Metabolism<br />
Our team works on methods to decipher, organise and disseminate information<br />
about the metabolism of organisms. We develop and maintain MetaboLights,<br />
a metabolomics reference database and archive, and ChEBI, the database and<br />
ontology of chemical entities of biological interest.<br />
We develop algorithms to: process chemical information,<br />
predict metabolomes based on genomic and other<br />
information, determine the structure of metabolites<br />
by stochastic screening of large candidate spaces, and<br />
enable the identification of molecules with desired<br />
properties. This requires algorithms based on machine<br />
learning and other statistical methods for the prediction<br />
of spectroscopic and other physicochemical properties<br />
for compounds represented in chemical graphs.<br />
Our research is dedicated to the elucidation of<br />
metabolomes, Computer-Assisted Structure Elucidation<br />
(CASE), the reconstruction of metabolic networks,<br />
biomedical and biochemical ontologies and algorithm<br />
development in cheminformatics and bioinformatics.<br />
The chemical diversity of the metabolome and a lack of<br />
accepted reporting standards currently make analysis<br />
challenging and time-consuming. Part of our research<br />
comprises the development and implementation of<br />
methods to analyse spectroscopic data in metabolomics.<br />
Major achievements<br />
ChEBI<br />
We added over 5 200 fully curated entries to the ChEBI<br />
database during <strong>2015</strong> (total number of fully curated<br />
entries, over 47 500). Around 30% of new entries are<br />
from direct data submissions, and the remaining entries<br />
are from a variety of sources. Our manual curation<br />
concentrated on natural products, including compounds<br />
identified in submissions to the MetaboLights database,<br />
together with requests from individual users and<br />
research groups. We also put considerable curation<br />
efforts into the Species table, introduced to store<br />
taxonomy and citation information for natural products,<br />
which now contains over 11 000 entries.<br />
On the ChEBI public website, we enhanced main entity<br />
pages to allow users to view information about an entity<br />
in a wider biological context. All biological reactions<br />
and pathways in which the entity participates displayed<br />
using Rhea and Reactome, respectively. ChEBI became<br />
entirely open source (again) in <strong>2015</strong> when we replaced<br />
the Marvin chemistry sketchpad (used for drawing<br />
chemical structures when performing exact structure,<br />
substructure, and similarity searches) with Ketcher.<br />
From its first release in July 2004, ChEBI used<br />
SourceForge as an issue tracker for logging and tracking<br />
requests for new entries, updates to existing entries, new<br />
feature requests, bugs and other queries. During <strong>2015</strong>,<br />
several press reports emerged regarding problems with<br />
SourceForge, which gave rise to worries about its future<br />
stability. In response to user requests, we migrated the<br />
ChEBI issue tracker to GitHub as it offers advanced<br />
features and flexibility.<br />
MetaboLights<br />
We developed new features to help researchers explore<br />
the reference compounds, studies and associated<br />
organism information in MetaboLights. These included<br />
species search functionality based on model organisms,<br />
free-text search and a browsable ‘tree of life’ with<br />
automatic classification compatible with any taxonomic<br />
identifier present in 89 different taxonomy sources.<br />
We linked over 18 000 compounds in MetaboLights<br />
to ChEBI, enriching compounds with 5355 spectra for<br />
Mass Spectrometry and 2890 for NMR. MetaboLights<br />
exported just over 150 datasets (265 total as of December<br />
<strong>2015</strong>) to MetabolomeXchange, the global EBI Search<br />
and the Omics Discovery Index.<br />
In <strong>2015</strong> we completely redesigned MetaboLights to<br />
accommodate a more flexible submission system,<br />
which now supports incremental submissions.<br />
The status of a study in our curation process is<br />
now clearly visible. We also designed a new online<br />
validation system that displays a detailed status of<br />
our current minimum reporting requirements. We<br />
incorporated new technology frameworks, for example<br />
AngularJS, and built new Web Services. We worked on<br />
integrating online analysis tools, for example adapting<br />
MetaboAnalyst (David Wishart group, University of<br />
Alberta). We incorporated tools into our architecture<br />
using R packages and the EMBL-EBI R-cloud. In<br />
tandem, we developed an authenticated Web Service<br />
that enables improved visualisations and response time,<br />
especially for larger datasets.<br />
Community standards<br />
The COSMOS consortium, an EU-funded endeavour<br />
for the coordination of standards for metabolomics,<br />
concluded its work in September <strong>2015</strong>. The<br />
consortium, led by our team, combined the efforts<br />
113<br />
<strong>2015</strong> EMBL-EBI <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>