22.08.2016 Views

Annual Scientific Report 2015

EMBL_EBI_ASR_2015_DigitalEdition


Justin Paschall

Variation Archive

MA 2008, Washington University St. Louis.

Team Leader at EMBL-EBI from 2012 to 2015.

access datasets, for example a simple yes/no question as to whether a certain allele exists at a certain position within a Beacon. We helped develop a tiered-access beacon that grants an authorised, registered user access to additional data, such as allele frequencies or extra datasets. We also co-developed a third level for controlled access, which serves users who have been granted full access to specific datasets.
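The three access levels described above can be illustrated with a minimal Python sketch. It is not the GA4GH Beacon API or the EGA implementation: the `Tier` enum, the `query_beacon` function, the response fields and the in-memory data are all hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical access levels, mirroring the three tiers in the text."""
    PUBLIC = 1      # anonymous user: yes/no answer only
    REGISTERED = 2  # authorised, registered user: allele frequency as well
    CONTROLLED = 3  # user granted full access to specific datasets


# Toy data (assumption): (chromosome, position, allele) -> record
VARIANTS = {
    ("1", 1000, "A"): {"frequency": 0.02, "dataset": "DS1"},
    ("2", 5000, "T"): {"frequency": 0.31, "dataset": "DS2"},
}


def query_beacon(chrom, pos, allele, tier=Tier.PUBLIC, granted=frozenset()):
    """Answer an allele query with as much detail as the tier allows."""
    record = VARIANTS.get((chrom, pos, allele))
    response = {"exists": record is not None}
    if record is None:
        return response
    if tier >= Tier.REGISTERED:
        # Registered users also see the allele frequency.
        response["frequency"] = record["frequency"]
    if tier >= Tier.CONTROLLED and record["dataset"] in granted:
        # Dataset details only for datasets this user was granted.
        response["dataset"] = record["dataset"]
    return response
```

In this sketch an anonymous call such as `query_beacon("1", 1000, "A")` returns only the yes/no answer, while higher tiers add fields to the same response rather than using a separate endpoint. That is a design choice of the sketch, not a statement about how the real services are structured.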

European Variation Archive

In 2015 the number of species represented in the EVA increased to 22, representing substantial growth in variation data for animals and plants. Work supported by the BBSRC and the UK’s Confederation of British Industry enabled us to offer datasets for plants and crops, including barley, maize, rice, sorghum and tomato. Thanks to our participation in the NextGen project to preserve farm-animal biodiversity, the archive also offers variation data on animals including cow, goat and sheep. Other submitted datasets include chicken, chimpanzee, dog, mosquito, mouse and vervet monkey. We made available the highly accessed human datasets from Phase 3 of the 1000 Genomes Project and from the Exome Aggregation Consortium (ExAC) v0.3.

We improved the Variant Browser in a number of ways, for example making datasets from 13 different species more accessible and integrating the variant annotations generated by Ensembl’s Variant Effect Predictor (VEP) tool. To help users detect infrequently occurring variants with potentially high impact, we updated the Variant Browser to filter by consequence type, protein substitution score and minor allele frequency, and to display population statistics such as allele frequencies.
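As a rough illustration of how the three filters combine to surface rare, potentially high-impact variants, consider the following Python sketch. It is not the Variant Browser's code: the `Variant` record, the `filter_variants` function and the default thresholds are hypothetical, and the sketch assumes a SIFT-style substitution score in which low values suggest a damaging change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record."""
    variant_id: str
    consequence: str           # consequence type, e.g. "missense_variant"
    substitution_score: float  # assumed SIFT-style: lower = more likely damaging
    maf: float                 # minor allele frequency, between 0 and 0.5


def filter_variants(variants, consequences, max_score=0.05, max_maf=0.01):
    """Keep variants that pass all three filters at once."""
    wanted = set(consequences)
    return [
        v for v in variants
        if v.consequence in wanted            # consequence type filter
        and v.substitution_score <= max_score  # substitution score filter
        and v.maf <= max_maf                   # rare variants only
    ]
```

A call such as `filter_variants(variants, ["missense_variant"])` would keep only rare missense variants with a low score. The thresholds are arbitrary defaults for the sketch and would need to be chosen per study.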

To help ensure the correctness of submitted data and improve data quality, we implemented a Variant Call Format (VCF) validator. Users will be able to download the results of website queries in VCF format in 2016.
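To give a sense of what a VCF validator checks, here is a minimal Python sketch of a few structural tests: the `##fileformat` declaration, the `#CHROM` column header, the eight fixed columns and an integer `POS`. It is not the EVA validator, which covers far more of the specification, and the function name `validate_vcf_lines` and the error messages are hypothetical.

```python
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def validate_vcf_lines(lines):
    """Return a list of error messages for a few basic structural problems."""
    errors = []
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if number == 1 and not line.startswith("##fileformat=VCF"):
            errors.append(f"line {number}: first line must declare ##fileformat")
        if line.startswith("##") or not line:
            continue  # meta-information and blank lines are not checked here
        fields = line.split("\t")
        if line.startswith("#"):
            # Column header line: the first eight names are fixed.
            if fields[:8] != REQUIRED_COLUMNS:
                errors.append(f"line {number}: malformed column header")
            seen_header = True
            continue
        if not seen_header:
            errors.append(f"line {number}: data line before #CHROM header")
        if len(fields) < 8:
            errors.append(
                f"line {number}: expected at least 8 columns, found {len(fields)}"
            )
            continue
        if not fields[1].isdigit():
            errors.append(f"line {number}: POS is not a non-negative integer")
    return errors
```

An empty list means none of these particular checks failed; it does not mean the file is fully valid VCF.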

We improved the representation of clinical information with a new display for data from ClinVar. To support drug discovery, we also helped standardise ClinVar data, providing ontology-based representation of disease to the new Target Validation Platform by Open Targets (formerly CTTV). In addition, as part of the GA4GH, we developed global standards and delivered a new API for archived variation data.

Helen Parkinson<br />

Head of Molecular Archival Resources<br />

PhD Genetics, 1997. Research Associate in Genetics, University of Leicester 1997-2000.

At EMBL-EBI since 2000.

Future plans<br />

We will complete our work to facilitate access to data in the EGA by authorised users, enabling them to analyse these datasets more easily by integrating them with genome viewers such as Ensembl and open-source biomedical research platforms such as Galaxy. We will also extend the downloader to allow access to indexed, encrypted files. As part of the Accelerating Medicines Partnership, we will provide federated access to the Type 2 Diabetes Portal. We also plan to develop a new front end for EGA, improve the submissions workflow, and optimise file quality control through better feedback processes with submitters and users.

Funding from the EU's CORBEL and ELIXIR-EXCELERATE grants will enable our team to carry out important work on establishing standards and best practice for secure access to sensitive data. To allow federated access to the EGA, we will develop and implement standards for authentication and authorisation.

Integrating the EGA with EMBL-EBI services including Ensembl, BioSamples and the ENA will be an important part of our work in 2016. We will also integrate the IOBIO visualisation tool to provide more user-friendly display of file statistics.

The EVA will continue brokering submissions to dbSNP, saving users an extra step. Including dbSNP data on multiple species will enable us to connect users with a catalogue of variation data that was submitted through external services, and with all the reference SNP identifiers (rs) generated by dbSNP.

The EVA will move into a rolling release cycle in 2016, making datasets available as soon as their processing is finished. We will improve the performance of the processing pipeline and the website, creating a better experience for submitters and end users alike.

Selected publication<br />

Lappalainen I, et al. (2015) The European Genome-phenome Archive of human data consented for biomedical research. Nature Genetics 47:692-695

2015 EMBL-EBI Annual Scientific Report
