Annual Scientific Report 2015
Gene Expression<br />
The Gene Expression team handles the acquisition, curation, quality control,<br />
statistical analysis and visualisation of functional genomics data at EMBL-EBI,<br />
focusing on microarray, high-throughput sequencing-based gene expression and<br />
related proteomics data.<br />
We are responsible for several core EMBL-EBI<br />
resources, including the Expression Atlas, which enables<br />
users to query for information about gene expression,<br />
and the ArrayExpress archive of functional genomics<br />
data. We contribute substantially to online and<br />
face-to-face training in transcriptomics, in particular<br />
relating to our team’s resources but also for related<br />
topics such as next-generation sequencing.<br />
We are a centre of excellence for RNA-sequencing<br />
quality control and analysis, the results of which<br />
are used by numerous resources at EMBL-EBI and<br />
externally. We are increasingly interested in<br />
epigenetic analysis, for example methylation, and<br />
work towards placing transcriptomic data in a broader<br />
regulatory context.<br />
We are part of Open Targets (formerly the Centre<br />
for Therapeutic Target Validation, CTTV) and the<br />
Cancer Genome Atlas Pan-Cancer analysis project.<br />
Analysis and visualisation of plant data is also a major<br />
component of our work through our involvement in<br />
the Gramene project.<br />
We collaborate closely with the Brazma, Marioni,<br />
Stegle and Teichmann research groups at EMBL-EBI<br />
and with the Choudhary group at the Wellcome<br />
Trust Sanger Institute, developing new methods and<br />
algorithms, integrating new types of data across multiple<br />
platforms, and investigating relationships between<br />
transcriptomics and proteomics data in the context of<br />
cancer genomics.<br />
Major achievements<br />
ArrayExpress, Expression Atlas and<br />
related projects<br />
In 2015 we capitalised on the deployment and continual<br />
improvement of Annotare, the ArrayExpress submission<br />
tool, and focused our curation efforts on datasets in<br />
the Expression Atlas, which held 100 000 assays in<br />
December 2015 (a six-fold increase compared to 2014).<br />
These assays included 157 RNA-seq experiments, over<br />
7000 differential comparisons across 26 organisms, and<br />
568 plant experiments.<br />
At the end of 2015 the Baseline Expression Atlas<br />
contained 46 RNA-seq studies, including data from<br />
many high-impact studies (e.g. GTEx and FANTOM5)<br />
and its first proteomics study.<br />
We improved the Expression Atlas interface<br />
substantially, with many enhancements to its<br />
presentation of search results (e.g. faceting). We<br />
developed new functionalities that will be available<br />
to users in early 2016, for example gene co-expression<br />
and a new Bioconductor package for easy access to<br />
Atlas data in the R language. The Expression Atlas now<br />
contributes transcriptomic data and visualisations to<br />
many resources, including the Open Targets (formerly<br />
CTTV), Ensembl, Reactome, Plant Reactome and<br />
International Mouse Phenotyping Consortium portals.<br />
We developed an RNA-seq pipeline and adapted it to<br />
help analyse public RNA-seq data for major species<br />
in the European Nucleotide Archive’s Sequence<br />
Read Archive. This functionality resulted in 148 000<br />
processed sequencing runs in 85 species by the end of<br />
the year. Where applicable, this data is included in both<br />
the Expression Atlas and Ensembl.<br />
Future plans<br />
In 2016 our development efforts for the biology-centric<br />
Expression Atlas will centre on integration of baseline<br />
RNA-sequencing gene expression and proteomics data.<br />
The BioStudies database, developed by the Sarkans<br />
team, will serve as the back-end for dealing with new<br />
types of data, including molecular imaging data.<br />
We will continue to expand our analyses and develop<br />
intuitive visualisation methods for both the existing<br />
data in Expression Atlas and for novel data types, such<br />
as epigenetic (methylation), genetic (eQTL), single-cell<br />
RNA-seq and small RNA-seq. We will also complete the<br />
analysis of public RNA-seq data in major species and<br />
make the raw results available publicly.<br />
As part of the pan-cancer project of the ICGC, we will<br />
continue to investigate aberrant transcription patterns<br />
across many cancer types.<br />
2015 EMBL-EBI Annual Scientific Report