Annual Scientific Report 2015
EMBL_EBI_ASR_2015_DigitalEdition
EMBL_EBI_ASR_2015_DigitalEdition
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Samples, Phenotypes and Ontologies<br />
The Samples, Phenotypes and Ontologies team has three major activity<br />
strands: BioSamples and semantic data integration, mouse informatics, and<br />
the Gene Ontology (GO) Editorial Office. We focus on metadata integration,<br />
ontology development and supporting tooling, as well as content development<br />
and delivery for the BioSamples database and mouse data for the biomedical<br />
research community.<br />
Our team’s activities diversified as the team grew in<br />
<strong>2015</strong>, when we received new funding for two large-scale<br />
collaborative projects: ELIXIR Excelerate and CORBEL.<br />
Helen Parkinson leads informatics work packages<br />
in both these projects and co-leads the ELIXIR<br />
interoperability platform, which delivers cross-domain<br />
infrastructure for life sciences in Europe.<br />
Major achievements<br />
Supporting target validation with<br />
semantics and data integration<br />
In <strong>2015</strong> we enhanced ontology services substantially and<br />
integrated them with industry efforts such as the Open<br />
Targets (formerly CTTV) and Roche drug discovery. We<br />
deployed the Experimental Factor Ontology in the first<br />
public release of the new Target Validation platform,<br />
and developed mapping tools and new ontology content<br />
to support its source disease annotations in EMBL-EBI<br />
databases, including the European Variation Archive,<br />
UniProt and Reactome. We also designed a phenotype–<br />
disease annotation representation that allows this<br />
public–private partnership to integrate rare and<br />
common diseases according to shared phenotype.<br />
The BioSolr project brought Solr and ElasticSearch<br />
users together from EMBL-EBI and several other<br />
organisations, including the NCBI. Our team provided<br />
use cases, data and expertise to assist with the release<br />
of Solr and ElasticSearch expansion plugins that aim to<br />
bring ontology-enabled search to a wide range of users.<br />
The European Bank for induced pluripotent Stem<br />
Cells (EBiSC) project and may others have adopted<br />
these plugins.<br />
Our outreach events in this domain included running<br />
a workshop at Semantic Web Applications and Tools<br />
for the Life Sciences (SWAT4LS) International in<br />
Cambridge, UK and presenting the BioSolr project at<br />
the Bioinformatics Open Source Conference in Dublin,<br />
Ireland.<br />
The Gene Ontology now has content for apoptosis, cilia<br />
and viruses, and its representation of human intestinal<br />
parasites is much improved. An industry collaboration<br />
with F. Hoffmann-La Roche AG to deliver improvements<br />
of an existing in-house mapping to GO led to the<br />
addition or revision of 200 terms. We also standardised<br />
protein-complex annotations based on consultation<br />
with experts from the IntAct resource..<br />
NHGRI-EBI GWAS Catalog<br />
In <strong>2015</strong> we relaunched the GWAS Catalog at EMBL-EBI<br />
based on a completely new infrastructure. We<br />
provided a new search interface, improved facilities<br />
for downloading data and a new curation platform to<br />
improve search functionality, enhance the presentation<br />
of the catalog content, improve links to other resources<br />
(especially Ensembl) and expand scientific content. The<br />
resource now provides enriched ontology-driven search<br />
capabilities, accurate display of complex interaction<br />
studies and haplotype analyses, and structured ancestry<br />
information for studies published from 2011 onwards.<br />
Biosamples<br />
The BioSamples database has grown to more than 4<br />
million samples, and we have incorporated it into several<br />
projects that access biological sample data at the point<br />
of acquisition. We worked closely with developers and<br />
data providers in the EBiSC consortium to ensure that<br />
Biosamples can model induced pluripotent stem cells<br />
accurately, and that cell lines can be registered as soon<br />
as they are generated. This facilitates the submission<br />
of experimental data to the various assay databases at<br />
EMBL-EBI.<br />
The Biosamples database has a new RESTful API to<br />
enable programmatic submission of sample data, and<br />
is more tightly integrated with the ENA to ensure that<br />
all sample data is accessioned and cross-referenced<br />
between these two resources.<br />
In collaboration with colleagues Peter Robinson at<br />
the Institute for Medical Genetics and Human<br />
Genetics in Berlin, Germany and Tudor Groza at<br />
the Garvan Institute of Medical Research in Sydney,<br />
Australia, we performed text mining to associate<br />
diseases with phenotypes based on co-occurence in the<br />
scientific literature.<br />
121<br />
<strong>2015</strong> EMBL-EBI <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>