Annual Scientific Report 2015
EMBL_EBI_ASR_2015_DigitalEdition
EMBL_EBI_ASR_2015_DigitalEdition
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Functional Genomics Development<br />
Our team develops software for ArrayExpress, a core EMBL-EBI resource,<br />
and the BioStudies database, a resource for biological datasets that do not<br />
have a dedicated home within the institute’s services. We also contribute to the<br />
development of BioSamples, which centralises biological sample data.<br />
Together with the Expression Atlas team, we build<br />
and maintain data management tools, user interfaces,<br />
programmatic interfaces, and annotation and data<br />
submission systems for functional genomics resources.<br />
We also collaborate on a number of European<br />
‘multi-omics’ and medical informatics projects in a datamanagement<br />
capacity.<br />
Major achievements<br />
BioStudies<br />
The BioStudies database holds descriptions of biological<br />
studies, and links to data from these studies in other<br />
databases at EMBL-EBI and beyond. It is a repository<br />
for supplementary data files from published life-science<br />
experiments that do not fit in the structured archives at<br />
EMBL-EBI.<br />
In <strong>2015</strong> we redeveloped the BioStudies user interface<br />
to reflect the recommendations of a user experience<br />
study. We drew on a wide range of information<br />
sources to compile data for a comprehensive initial<br />
release of the database. This included supplementary<br />
information from articles in Europe PMC, studies from<br />
the EurocanPlatform project, datasets from diXa and<br />
HeCaToS toxicogenomics studies, imaging data from the<br />
Image Data Repository project and original, unsolicited<br />
submissions to BioStudies. This initial work enabled us<br />
to further refine our data-management principles and<br />
fine-tune the functionality of the user interface.<br />
A simple data submission tool will be launched in<br />
early 2016.<br />
ArrayExpress<br />
We continued to maintain ArrayExpress datamanagement<br />
tools and access interfaces, focusing on<br />
a major redesign of the post-submission experiment<br />
management tool. We continued to improve and<br />
automate the management of sequencing data, and<br />
collaborated with the European Nucleotide Archive<br />
team on methods for storing raw sequence data<br />
and information.<br />
We put significant efforts into further improving<br />
Annotare, our main data-submission tool (originally<br />
released in 2014). The emphasis of our work was on<br />
addressing the needs of curators in handling submitted<br />
datasets of varying levels of quality, and improving the<br />
file-upload functionality.<br />
BioSamples<br />
We continued to collaborate with the Samples,<br />
Phenotypes and Ontologies team on the basic<br />
infrastructure components of the BioSamples<br />
database, perhaps most notably in the context of the<br />
BioMedBridges project. We built a prototype system that<br />
links information from biobanks across BioSamples,<br />
the Biobanking and Biomolecular Resources Research<br />
Infrastructure (BBMRI) Hub and the Resource<br />
Entitlement Management System (REMS). The system<br />
now uses the Shibboleth system for authentication and<br />
access-rights control.<br />
Toxicogenomics and<br />
biomedical informatics<br />
Our work on the diXa toxicogenomics data warehouse—<br />
an Innovative Medicines Initiative (IMI) project—fed<br />
into the development of BioStudies from the outset.<br />
In <strong>2015</strong> we transferred datasets from the dedicated<br />
diXa data warehouse to the BioStudies database, and<br />
developed new functionality. We also started work on<br />
the HeCaToS project, which represents the continuation<br />
of diXa in terms of data management.<br />
We participate in a number of medical informatics<br />
projects, for example providing data management<br />
solutions for the EU-AIMS project on autism spectrum<br />
disorder. As a part of the European Medical Information<br />
Framework (EMIF) project for the reuse of patient<br />
health records in clinical research, we are building a<br />
cloud environment for multi-omics data management<br />
and analysis, providing and integrating (among other<br />
components) our R-cloud scientific computation<br />
infrastructure, several Docker-enabled data-analysis<br />
pipelines, and the iRODS data-management system.<br />
97<br />
<strong>2015</strong> EMBL-EBI <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>