22.08.2016 Views

Annual Scientific Report 2015

EMBL_EBI_ASR_2015_DigitalEdition

EMBL_EBI_ASR_2015_DigitalEdition

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Genome Analysis<br />

The Genome Analysis team is responsible for the development of scalable<br />

solutions for the functional annotation of genomes, with a focus on genetics<br />

and epigenomics. Within Ensembl, we maintain the Variation and Regulation<br />

databases that collect all available data on vertebrate genetic markers and gene<br />

expression regulation.<br />

Whereas the cost of sequencing a genome has dropped,<br />

the downstream analysis is still costly and error<br />

prone. Outside of the coding sequences, which are<br />

comparatively well understood, our knowledge of what<br />

the vast majority of the genome does is still hazy, and<br />

our ability to predict the functional consequence of<br />

most variants approximate. For example, most genetic<br />

markers that are revealed by genome wide association<br />

studies (GWAS) lie outside of coding regions. It is<br />

therefore very difficult to understand the corresponding<br />

disease mechanisms, let alone devise effective<br />

therapeutic strategies to counter them.<br />

The first obstacle that needs to be addressed is easy<br />

access to the data. For a number of practical, commercial<br />

or ethical reasons, many genetic datasets are isolated<br />

in separate databases. It is now clear that to detect<br />

minute genetic associations researchers must pool their<br />

resources so as to create a greater common dataset. It<br />

is similarly clear that to understand the full dynamics<br />

of the epigenome, the world research community needs<br />

to coordinate a very large number of assays. With<br />

this in mind, we are active members in a number of<br />

international collaborations to collect and share data.<br />

Having gained access to the relevant datasets, we<br />

develop technical solutions to efficiently process them,<br />

and cross-examine large numbers of experiments. We<br />

therefore develop specialised infrastructure to process<br />

multidimensional genome-wide datasets. This includes<br />

user-friendly webtools as well as more advanced<br />

command-line tools and micro-services. In particular,<br />

the Variant Effect Predictor (VEP) stands out as the<br />

most popular way of querying Ensembl for a set of<br />

DNA variants.<br />

Downstream of our data and our tools, we deliver<br />

integrative analyses that summarise all the data that<br />

we processed. Our users can rely on these results, with<br />

the knowledge that we integrated as many experiments<br />

as possible to produce the most precise annotation<br />

possible. For example, the Ensembl Regulatory Build is<br />

the first published annotation that integrates ChIP-Seq<br />

and open chromatin experiments from multiple<br />

consortia into a single synthetic summary.<br />

Major achievements<br />

A lot of our efforts were focused on obtaining more<br />

datasets and encouraging healthy data sharing between<br />

research partners. We are active members of the<br />

Global Alliance for Genomics and Health (GA4GH),<br />

an organisation that strives to break the technical and<br />

legal barriers to genomic data sharing across the world.<br />

This year, the Global Alliance finished its first standard<br />

exchange protocols. We also play an important role<br />

in the International Human Epigenome Consortium<br />

(IHEC) and the Functional Annotation of Animal<br />

Genomes (FAANG) consortium, that set the standards<br />

and collect references to epigenomic datasets produced<br />

by all major consortia. Finally, we are collaborating with<br />

the GWAS Catalog project to help collect and curate<br />

available GWAS results.<br />

We also continued to improve access to our data. We<br />

continued to improve the VEP, for example by speeding<br />

it up. We also helped develop and release user-friendly<br />

tools for epigenomics as produced by the BLUEPRINT<br />

consortium, GenomeStats and EpiExplorer. We<br />

also improved the querying of the datasets that we<br />

store locally, for example by acceleration linkage<br />

disequilibrium calculations and displaying them as<br />

Manhattan plots. Cis-regulatory interactions are making<br />

their way into Ensembl, and we are currently able to<br />

display user datasets (see Figure 1).<br />

Finally, we kept abreast of technical development in<br />

the field. Our Ensembl Regulatory Build algorithm was<br />

published in Genome Biology. We collaborated with<br />

Peter Fraser and Mikhail Spivakov of the Babraham<br />

Institute on the downstream analysis of an exciting<br />

technology, called Promoter Capture Hi-C, that produces<br />

high-resolution maps of the 3 dimensional conformation<br />

of the chromatin. We are also working with the CTTV<br />

Epigenome Project, where we compare and contrast cell<br />

line epigenomic data.<br />

Future plans<br />

In the next few years, we aim to greatly expand the<br />

number of species, tissues and cell types covered by<br />

the Ensembl Regulatory Build. Having spent much of<br />

85<br />

<strong>2015</strong> EMBL-EBI <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!