Annual Scientific Report 2015
Functional Genomics<br />
The Functional Genomics team provides bioinformatics services and conducts<br />
research in functional genomics data analysis, concentrating on high-throughput<br />
sequencing-based gene-expression and related proteomics data.<br />
We are responsible for core EMBL-EBI resources<br />
including the Expression Atlas, which enables users to<br />
query for gene expression; the ArrayExpress archive of<br />
functional genomics data; and the emerging<br />
BioStudies database. We contribute substantially to<br />
training in transcriptomics and in the use of EMBL-EBI<br />
bioinformatics tools.<br />
The Brazma research group complements the Functional<br />
Genomics service team, developing new methods and<br />
algorithms and integrating new types of data across<br />
multiple platforms. We are particularly interested in<br />
cancer genomics and elucidating relationships between<br />
transcriptomics and proteomics. We collaborate closely<br />
with several research groups at EMBL-EBI, including<br />
the Marioni and Stegle groups.<br />
Major achievements<br />
BioStudies Database and the<br />
Expression Atlas<br />
In 2015 we released BioStudies (McEntyre et al., 2015):<br />
a new database that holds descriptions of biological<br />
studies, links to data from these studies in other<br />
databases and data that do not fit in the structured<br />
archives at EMBL-EBI. BioStudies can accept a wide<br />
range of study types described using a simple format.<br />
Developed jointly with the Literature Services team, it<br />
enables authors to submit supplementary information<br />
and link to it from the manuscript publication. Data<br />
from 558 182 studies are available from the BioStudies<br />
Database.<br />
We increased the content of the RNA-sequencing-based<br />
Baseline Expression Atlas significantly, releasing the<br />
first large-scale proteomics data on protein expression<br />
in human tissues (Petryszak et al., 2015). Taken together,<br />
the Baseline and Differential Expression Atlases now<br />
offer data from 2620 studies and over 97 484 assays. In<br />
addition to the growth of data volume and diversity, we<br />
implemented many user interface improvements.<br />
Research<br />
In research we focused on two related areas: comparison<br />
of transcript and protein expression levels, and data<br />
integration for cancer genomics. We compared gene<br />
expression profiles, at both transcript and protein levels,<br />
across multiple human tissues, using several large-scale<br />
datasets. Overall, we showed a higher level of correlation<br />
than has been reported previously, even in cases where<br />
different samples were used in transcriptomics and<br />
proteomics experiments. We continued our research<br />
into isoform-level gene expression, comparing data at<br />
transcript and proteome levels. A publication describing<br />
this research is under review, and several more are<br />
in preparation.<br />
Together with colleagues at the University of California<br />
Santa Cruz and Memorial Sloan Kettering Cancer<br />
Center, we led a working group on RNA and DNA<br />
data integration for the Pan-cancer project of the<br />
International Cancer Genome Consortium. As a part of<br />
this work we also collaborated closely on a number of<br />
investigations with the Stegle group at EMBL-EBI and<br />
the Korbel group at EMBL Heidelberg, and prepared the<br />
results of these analyses for publication.<br />
Future plans<br />
Integration of baseline RNA sequencing gene<br />
expression and proteomics data will be the focus of<br />
our development of the Expression Atlas. The new<br />
BioStudies database will serve as the back-end for<br />
dealing with new types of data, including molecular<br />
imaging data.<br />
Large-scale data integration and systems biology will<br />
remain the focus of our research. We will extend our<br />
work on cancer genomics as a part of the pan-cancer<br />
project of the ICGC, in which we are co-leading the<br />
transcriptomics/genomics integration working group<br />
that aims to study aberrant transcription patterns across<br />
many cancer types. We will also extend our research<br />
on dominant transcripts to protein abundance data.<br />
2015 EMBL-EBI Annual Scientific Report