22.08.2016 Views

Annual Scientific Report 2015

EMBL_EBI_ASR_2015_DigitalEdition

Functional Genomics

The Functional Genomics team provides bioinformatics services and conducts research in functional genomics data analysis, concentrating on high-throughput sequencing-based gene-expression and related proteomics data.

We are responsible for core EMBL-EBI resources, including the Expression Atlas, which enables users to query gene expression data; the ArrayExpress archive of functional genomics data; and the emerging BioStudies database. We contribute substantially to training in transcriptomics and in the use of other EMBL-EBI bioinformatics tools.

The Brazma research group complements the Functional Genomics service team, developing new methods and algorithms and integrating new types of data across multiple platforms. We are particularly interested in cancer genomics and in elucidating relationships between transcriptomics and proteomics. We collaborate closely with several research groups at EMBL-EBI, including the Marioni and Stegle groups.

Major achievements

BioStudies Database and the Expression Atlas

In 2015 we released BioStudies (McEntyre et al., 2015), a new database that holds descriptions of biological studies, links to data from these studies in other databases, and data that do not fit in the structured archives at EMBL-EBI. BioStudies can accept a wide range of study types described using a simple format. Developed jointly with the Literature Services team, it enables authors to submit supplementary information and link to it from the manuscript publication. Data from 558 182 studies are available from the BioStudies database.

We significantly increased the content of the RNA-sequencing-based Baseline Expression Atlas and released the first large-scale proteomics data on protein expression in human tissues (Petryszak et al., 2015). Taken together, the Baseline and Differential Expression Atlases now offer data from 2620 studies and over 97 484 assays. In addition to the growth in data volume and diversity, we implemented many user interface improvements.

Research

In research we focused on two related areas: comparison of transcript and protein expression levels, and data integration for cancer genomics. We compared gene expression profiles, at both the transcript and protein levels, across multiple human tissues, using several large-scale datasets. Overall we showed a higher level of correlation than has been reported previously, even in cases where different samples were used in the transcriptomics and proteomics experiments. We continued our research into isoform-level gene expression, comparing data at the transcript and proteome levels. A publication describing this research is under review, and several more are in preparation.
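
The report does not state which correlation measure was used for the transcript-protein comparison. As a minimal sketch, assuming a rank-based (Spearman) correlation computed per gene across tissues, the idea could look like the following. The abundance values are made-up illustrative numbers, not data from the Expression Atlas, and the function names are hypothetical.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical abundances of one gene in five tissues (illustrative only).
transcript = [12.0, 85.0, 40.0, 3.5, 60.0]  # e.g. RNA-seq expression units
protein = [0.8, 5.1, 3.9, 0.2, 2.7]         # e.g. arbitrary intensity units

print(f"Spearman correlation: {spearman(transcript, protein):.2f}")
```

A rank-based measure is used here only because transcript and protein abundances are on different scales; other measures may be equally appropriate.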

Together with colleagues at the University of California, Santa Cruz and Memorial Sloan Kettering Cancer Center, we led a working group on RNA and DNA data integration for the Pan-cancer project of the International Cancer Genome Consortium. As a part of this work we also collaborated closely on a number of investigations with the Stegle group at EMBL-EBI and the Korbel group at EMBL Heidelberg, and prepared the results of these analyses for publication.

Future plans

Integration of baseline RNA sequencing gene expression and proteomics data will be the focus of our development of the Expression Atlas. The new BioStudies database will serve as the back-end for dealing with new types of data, including molecular imaging data.

Large-scale data integration and systems biology will remain the focus of our research. We will extend our work on cancer genomics as a part of the pan-cancer project of the ICGC, in which we are co-leading the transcriptomics/genomics integration working group that aims to study aberrant transcription patterns across many cancer types. We will also extend our research on dominant transcripts to protein abundance data.

2015 EMBL-EBI Annual Scientific Report
