
Annual Scientific Report 2015

EMBL_EBI_ASR_2015_DigitalEdition


Variation Archive

Genetic variation describes how the individual genomes of studied organisms and human patients differ from one another and from the reference sequence for the species. It is a fundamental data type in molecular biology, population research and clinical investigation, and our team manages two resources that make it available to researchers worldwide: the European Variation Archive (EVA) and the European Genome-phenome Archive (EGA).

Human genetic data presents particular challenges for protecting participant privacy when individually unique genomes are archived for scientific research, and archiving it often requires controlled-access approval systems to ensure compliance with data access policies.

The EGA supports secure, controlled-access data management for human genomes and variation data, offering a standard mechanism for giving a wide range of research users secure access to data. It is jointly developed by EMBL-EBI and the Centre for Genomic Regulation (CRG) in Barcelona, Spain.

EMBL-EBI is a member of the Global Alliance for Genomics and Health (GA4GH), which partnered with ELIXIR in 2015 to provide genomics services that balance data protection with efficient data sharing. Together they develop ‘Beacons’, which provide consent-based access to genomic data in the EGA as well as in national resources in Finland, Sweden and the Netherlands.

Variation data is the primary analysis product of the sequencing, alignment and variant-calling pipeline, and the starting point for studies of population genetics, genotype-to-phenotype association and functional analysis linking the genome to molecular pathways. The EVA, the global reference catalogue of genetic variation, provides a basis for interpreting each new genome and variant observed in research and clinical studies. It includes the Database of Genomic Variants archive (DGVa) project, provides a primary archive service for genetic variation data and builds on EMBL-EBI’s sequence-level archives (i.e. ENA and EGA), supporting value-added analysis and visualisation resources.
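To make the idea of variation "as compared against the reference sequence" concrete, here is a deliberately simplified sketch of what a variant-calling step produces: a list of single-nucleotide differences between a sample sequence and the reference. It assumes the two sequences are already aligned, gap-free and of equal length, and the function name `call_snvs` is hypothetical. Real pipelines instead infer variants from many sequencing reads with quality scores, and also handle insertions and deletions.

```python
def call_snvs(reference: str, sample: str, chrom: str = "1", start: int = 1):
    """Toy variant calling: report positions where an aligned sample differs from the reference.

    Returns (chromosome, 1-based position, reference base, alternate base) tuples,
    mirroring the CHROM, POS, REF and ALT columns of a VCF record.
    Assumes both sequences are pre-aligned, gap-free and the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (chrom, start + i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    # Made-up sequences: differences at positions 3 (G>C) and 6 (C>G).
    print(call_snvs("ACGTAC", "ACCTAG"))
```

Running the example prints `[('1', 3, 'G', 'C'), ('1', 6, 'C', 'G')]`; each tuple is one candidate variant relative to the reference.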


Together with international partners, the EVA provides a stable, accessioned database that catalogues and provides access to genetic variation in all species. This makes it a powerful tool for clinical, agricultural, biotechnological and ecological research.

Major achievements

European Genome-phenome Archive

Our team handled a 50% increase in the volume of data archived in the EGA and a 65% increase in the number of files submitted; the resource grew to over three petabytes of human genomic and variation data. We deployed a new EGA downloader service, which distributed over 1.7 petabytes of data in 2015. In collaboration with the GA4GH, we implemented a tiered Beacon for the EGA, enabling both anonymous and registered access to a limited collection of variation data. Because the EGA is a member of the Beacon Network, researchers can now easily discover relevant datasets from multiple institutions through a single access point.
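As a rough illustration of the tiered Beacon idea, the sketch below answers only a yes/no question ("does any dataset I am allowed to see contain this allele?") while ignoring datasets above the caller's access tier. It is not the EGA or GA4GH Beacon API: the tier names (`ANONYMOUS`, `REGISTERED`, `CONTROLLED`), the `Dataset` record and the `beacon_query` function are hypothetical, and the variants are made-up toy data held in memory.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical access tiers: a user at a given tier sees datasets at that tier or below."""
    ANONYMOUS = 0
    REGISTERED = 1
    CONTROLLED = 2


@dataclass(frozen=True)
class Dataset:
    name: str
    required_tier: Tier
    # Each variant is (chromosome, 1-based position, reference allele, alternate allele).
    variants: frozenset


def beacon_query(datasets, user_tier, chrom, pos, ref, alt):
    """Return only True or False: is the allele present in any dataset visible at user_tier?

    No sample identifiers, genotypes or dataset names are returned to the caller.
    """
    key = (chrom, pos, ref, alt)
    return any(
        key in ds.variants
        for ds in datasets
        if user_tier >= ds.required_tier
    )


# Made-up toy data for illustration only.
DATASETS = [
    Dataset("open_demo", Tier.ANONYMOUS, frozenset({("1", 1000, "A", "G")})),
    Dataset("registered_cohort", Tier.REGISTERED, frozenset({("2", 2000, "C", "T")})),
]

if __name__ == "__main__":
    print(beacon_query(DATASETS, Tier.ANONYMOUS, "2", 2000, "C", "T"))   # False: dataset not visible
    print(beacon_query(DATASETS, Tier.REGISTERED, "2", 2000, "C", "T"))  # True
```

Returning a single boolean keeps the anonymous tier lightweight; the same allele query gives a different answer depending only on which datasets the caller's tier unlocks.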

We rebuilt the EGA pipeline to improve capacity and reliability, reducing the quarterly average processing time from three weeks to one and a half days. As part of the NIH-funded Big Data to Knowledge (BD2K) project, we made variation data available and delivered EGA content to the Omics Data Discovery Index.

Together with our federation partners and co-developers at the CRG in Barcelona, we substantially increased the resource’s capacity to distribute data. The CRG now distributes files via FTP or Aspera, and EMBL-EBI distributes files via the EGA downloader. In addition, a new programmatic interface hosted at CRG provides access to publicly available metadata about studies, samples and datasets held in the Archive.

Together with our partners in ELIXIR, the GA4GH and CRG, we developed a three-tiered Beacon that provides a single point of access to datasets from multiple institutions. The Beacon Project allows users to make simple, anonymous queries on controlled-


2015 EMBL-EBI Annual Scientific Report
