Annual Scientific Report 2015
Variation Archive
Genetic variation represents the individual genomes of studied organisms and
human patients as compared with one another and against the reference sequence
for the species. It is a fundamental data type in molecular biology, population
research and clinical investigation, and our team manages resources that make
it available to researchers worldwide: the European Variation Archive (EVA)
and the European Genome-phenome Archive (EGA).
Human genetic data presents particular challenges<br />
in terms of protecting participant privacy when<br />
individually unique genomes are archived for scientific<br />
research, often requiring controlled-access approval<br />
systems to ensure compliance with data access policies.<br />
The EGA supports secure, controlled-access data
management for human genomes and variation data,
offering a standard mechanism for giving a wide set
of research users secure access to data.
It is jointly developed by EMBL-EBI and the Centre for<br />
Genomic Regulation (CRG) in Barcelona, Spain.<br />
EMBL-EBI is a member of the Global Alliance for<br />
Genomics and Health (GA4GH), which partnered<br />
with ELIXIR in 2015 to provide genomics services that
balance data protection and efficient data sharing. The
project develops ‘Beacons’, which provide consent-based
access to genomic data in the EGA as well as national
resources in Finland, Sweden and the Netherlands.
Variation data is the primary analysis product of the
sequencing, alignment and variant-calling pipeline, and it feeds
studies of population genetics, genotype-to-phenotype
association and functional analysis linking the genome<br />
to molecular pathways. The EVA, the global reference<br />
catalogue of genetic variation, provides a basis for<br />
interpreting each new genome and variant observed in<br />
research and clinical studies. It includes the Database<br />
of Genomic Variants archive (DGVa) project, provides a primary
archive service for genetic variation data and builds<br />
on EMBL-EBI’s sequence-level archives (i.e. ENA and<br />
EGA), supporting value-added analysis and<br />
visualisation resources.<br />
Together with international partners, the EVA provides<br />
a stable, accessioned database that catalogues and<br />
provides access to genetic variation in all species. This<br />
is a powerful tool for researchers working in clinical,<br />
agricultural, biotechnological and ecological research.<br />
Major achievements<br />
European Genome-phenome Archive<br />
Our team handled a 50% increase in the volume of<br />
data archived in the EGA and a 65% increase in the<br />
number of files submitted; the resource grew to over<br />
three Petabytes of human genomic and variation<br />
data. We deployed a new EGA downloader service,<br />
which distributed over 1.7 Petabytes of data in 2015. In
collaboration with the GA4GH, we implemented a tiered<br />
Beacon for the EGA, enabling both anonymous and<br />
registered access to a limited collection of variation data.
Because the EGA Beacon is a member of the Beacon Network,
researchers at multiple institutions can now easily discover
relevant datasets from a single access point.
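The idea of a tiered Beacon described above can be illustrated with a minimal sketch. It is not the EGA implementation: the dataset names, the variants, the `beacon_query` function and the name of the third tier (`controlled`) are all hypothetical, chosen only to show how a yes/no variant lookup might be restricted by the caller's access tier.

```python
from dataclasses import dataclass

# Access tiers ordered from least to most privileged.
# "anonymous" and "registered" come from the text; "controlled" is an assumed name.
TIERS = {"anonymous": 0, "registered": 1, "controlled": 2}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical holdings: dataset name -> (minimum tier required, variants held).
DATASETS = {
    "open_demo": ("anonymous", {Variant("1", 1000, "A", "G")}),
    "registered_demo": ("registered", {Variant("2", 2000, "C", "T")}),
    "controlled_demo": ("controlled", {Variant("3", 3000, "G", "A")}),
}


def beacon_query(variant: Variant, user_tier: str = "anonymous") -> bool:
    """Answer only yes/no: is the variant present in any dataset visible at this tier?"""
    level = TIERS[user_tier]
    return any(
        variant in variants
        for min_tier, variants in DATASETS.values()
        if TIERS[min_tier] <= level
    )
```

In this sketch an anonymous caller asking about the variant on chromosome 2 gets `False`, while the same query at the `registered` tier gets `True`. The response never reveals which dataset matched, only whether a match exists.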
We rebuilt the EGA pipeline to improve capacity and
reliability, reducing the quarterly average processing
time from three weeks to one and a half days.
In the context of the NIH-funded Big Data
to Knowledge (BD2K) project, we exposed variation<br />
data and delivered EGA content for the Omics Data<br />
Discovery Index.<br />
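To put the processing-time figures quoted above in proportion, the short calculation below converts them into a speed-up factor. It assumes that "three weeks" means 21 calendar days; the variable names are illustrative only.

```python
DAYS_PER_WEEK = 7  # assumption: calendar weeks, not working weeks

before_days = 3 * DAYS_PER_WEEK  # "three weeks"
after_days = 1.5                 # "one and a half days"

speedup = before_days / after_days        # how many times faster
reduction = 1 - after_days / before_days  # fraction of time saved

print(f"speed-up: {speedup:.0f}x, reduction: {reduction:.1%}")
# prints "speed-up: 14x, reduction: 92.9%"
```

Under a five-day working week the same figures would give a tenfold speed-up instead, so the factor depends on how the original averages were measured.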
Together with our federation partners and co-developers<br />
at the CRG in Barcelona, we substantially increased the<br />
resource’s capacity to distribute data. The CRG now<br />
distributes files via FTP or Aspera, and EMBL-EBI<br />
distributes files via the EGA downloader. In addition,<br />
a new programmatic interface hosted at CRG provides<br />
access to publicly available metadata about studies,<br />
samples and datasets held in the Archive.<br />
Together with our partners in ELIXIR, the GA4GH<br />
and CRG, we developed a three-tiered Beacon that<br />
provides a single point of access to datasets from<br />
multiple institutions. The Beacon Project allows users<br />
to make simple, anonymous queries on controlled-<br />
2015 EMBL-EBI Annual Scientific Report