Annual Scientific Report 2015


Gene Expression

The Gene Expression team handles the acquisition, curation, quality control, statistical analysis and visualisation of functional genomics data at EMBL-EBI, focusing on microarray, high-throughput sequencing-based gene expression and related proteomics data.

We are responsible for several core EMBL-EBI resources, including the Expression Atlas, which enables users to query for information about gene expression, and the ArrayExpress archive of functional genomics data. We contribute substantially to online and face-to-face training in transcriptomics, in particular relating to our team’s resources but also on related topics such as next-generation sequencing.

We are a centre of excellence for RNA-sequencing quality control and analysis, the results of which are used by numerous resources at EMBL-EBI and externally. We are increasingly interested in epigenetic analysis, for example methylation, and are working towards placing transcriptomic data in a broader regulatory context.

We are part of Open Targets (formerly the Centre for Therapeutic Target Validation, CTTV) and the Cancer Genome Atlas Pan-Cancer analysis project. Analysis and visualisation of plant data is also a major component of our work through our involvement in the Gramene project.

We collaborate closely with the Brazma, Marioni, Stegle and Teichmann research groups at EMBL-EBI and with the Choudhary group at the Wellcome Trust Sanger Institute, developing new methods and algorithms, integrating new types of data across multiple platforms, and investigating relationships between transcriptomics and proteomics data in the context of cancer genomics.

Major achievements<br />

ArrayExpress, Expression Atlas and related projects

In 2015 we capitalised on the deployment and continual improvement of Annotare, the ArrayExpress submission tool, and focused our curation efforts on datasets in the Expression Atlas, which held 100 000 assays in December 2015 (a six-fold increase compared to 2014). These assays included 157 RNA-seq experiments, over 7000 differential comparisons across 26 organisms, and 568 plant experiments.

At the end of 2015 the Baseline Expression Atlas contained 46 RNA-seq studies, including data from many high impact studies (e.g. GTEx and FANTOM5) and its first proteomics study.

We improved the Expression Atlas interface substantially, with many enhancements to its presentation of search results (e.g. faceting). We developed new functionalities that will be available to users in early 2016, for example gene co-expression and a new Bioconductor package for easy access to Atlas data in R. The Expression Atlas now contributes transcriptomic data and visualisations to many resources, including the Open Targets (formerly CTTV), Ensembl, Reactome, Plant Reactome and International Mouse Phenotyping Consortium portals.

We developed an RNA-seq pipeline and adapted it to help analyse public RNA-seq data for major species in the European Nucleotide Archive’s Sequence Read Archive. By the end of the year this pipeline had processed 148 000 sequencing runs in 85 species. Where applicable, this data is included in both the Expression Atlas and Ensembl.

Future plans<br />

In 2016 our development efforts for the biology-centric Expression Atlas will centre on integration of baseline RNA-sequencing gene expression and proteomics data. The BioStudies database, developed by the Sarkans team, will serve as the back-end for dealing with new types of data, including molecular imaging data.

We will continue to expand our analyses and develop intuitive visualisation methods both for the existing data in the Expression Atlas and for novel data types, such as epigenetic (methylation), genetic (eQTL), single-cell RNA-seq and smallRNA-seq. We will also complete the analysis of public RNA-seq data in major species and make the raw results available publicly.

As part of the pan-cancer project of the ICGC, we will continue to investigate aberrant transcription patterns across many cancer types.
