22.08.2016 Views

Annual Scientific Report 2015

EMBL_EBI_ASR_2015_DigitalEdition

EMBL_EBI_ASR_2015_DigitalEdition

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Samples, Phenotypes and Ontologies<br />

The Samples, Phenotypes and Ontologies team has three major activity<br />

strands: BioSamples and semantic data integration, mouse informatics, and<br />

the Gene Ontology (GO) Editorial Office. We focus on metadata integration,<br />

ontology development and supporting tooling, as well as content development<br />

and delivery for the BioSamples database and mouse data for the biomedical<br />

research community.<br />

Our team’s activities diversified as the team grew in<br />

<strong>2015</strong>, when we received new funding for two large-scale<br />

collaborative projects: ELIXIR Excelerate and CORBEL.<br />

Helen Parkinson leads informatics work packages<br />

in both these projects and co-leads the ELIXIR<br />

interoperability platform, which delivers cross-domain<br />

infrastructure for life sciences in Europe.<br />

Major achievements<br />

Supporting target validation with<br />

semantics and data integration<br />

In <strong>2015</strong> we enhanced ontology services substantially and<br />

integrated them with industry efforts such as the Open<br />

Targets (formerly CTTV) and Roche drug discovery. We<br />

deployed the Experimental Factor Ontology in the first<br />

public release of the new Target Validation platform,<br />

and developed mapping tools and new ontology content<br />

to support its source disease annotations in EMBL-EBI<br />

databases, including the European Variation Archive,<br />

UniProt and Reactome. We also designed a phenotype–<br />

disease annotation representation that allows this<br />

public–private partnership to integrate rare and<br />

common diseases according to shared phenotype.<br />

The BioSolr project brought Solr and ElasticSearch<br />

users together from EMBL-EBI and several other<br />

organisations, including the NCBI. Our team provided<br />

use cases, data and expertise to assist with the release<br />

of Solr and ElasticSearch expansion plugins that aim to<br />

bring ontology-enabled search to a wide range of users.<br />

The European Bank for induced pluripotent Stem<br />

Cells (EBiSC) project and may others have adopted<br />

these plugins.<br />

Our outreach events in this domain included running<br />

a workshop at Semantic Web Applications and Tools<br />

for the Life Sciences (SWAT4LS) International in<br />

Cambridge, UK and presenting the BioSolr project at<br />

the Bioinformatics Open Source Conference in Dublin,<br />

Ireland.<br />

The Gene Ontology now has content for apoptosis, cilia<br />

and viruses, and its representation of human intestinal<br />

parasites is much improved. An industry collaboration<br />

with F. Hoffmann-La Roche AG to deliver improvements<br />

of an existing in-house mapping to GO led to the<br />

addition or revision of 200 terms. We also standardised<br />

protein-complex annotations based on consultation<br />

with experts from the IntAct resource..<br />

NHGRI-EBI GWAS Catalog<br />

In <strong>2015</strong> we relaunched the GWAS Catalog at EMBL-EBI<br />

based on a completely new infrastructure. We<br />

provided a new search interface, improved facilities<br />

for downloading data and a new curation platform to<br />

improve search functionality, enhance the presentation<br />

of the catalog content, improve links to other resources<br />

(especially Ensembl) and expand scientific content. The<br />

resource now provides enriched ontology-driven search<br />

capabilities, accurate display of complex interaction<br />

studies and haplotype analyses, and structured ancestry<br />

information for studies published from 2011 onwards.<br />

Biosamples<br />

The BioSamples database has grown to more than 4<br />

million samples, and we have incorporated it into several<br />

projects that access biological sample data at the point<br />

of acquisition. We worked closely with developers and<br />

data providers in the EBiSC consortium to ensure that<br />

Biosamples can model induced pluripotent stem cells<br />

accurately, and that cell lines can be registered as soon<br />

as they are generated. This facilitates the submission<br />

of experimental data to the various assay databases at<br />

EMBL-EBI.<br />

The Biosamples database has a new RESTful API to<br />

enable programmatic submission of sample data, and<br />

is more tightly integrated with the ENA to ensure that<br />

all sample data is accessioned and cross-referenced<br />

between these two resources.<br />

In collaboration with colleagues Peter Robinson at<br />

the Institute for Medical Genetics and Human<br />

Genetics in Berlin, Germany and Tudor Groza at<br />

the Garvan Institute of Medical Research in Sydney,<br />

Australia, we performed text mining to associate<br />

diseases with phenotypes based on co-occurence in the<br />

scientific literature.<br />

121<br />

<strong>2015</strong> EMBL-EBI <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!