Annual Scientific Report 2015
Justin Paschall
Variation Archive
MA 2008, Washington University in St. Louis.
Team Leader at EMBL-EBI from 2012 to 2015.
access datasets, for example a simple yes/no question about whether a particular allele exists at a particular position within a Beacon. We helped develop a tiered-access beacon that grants authorised, registered users access to additional data, such as allele frequencies or extra datasets. We also co-developed a third level for controlled access, which serves users who have been granted full access to specific datasets.
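The tiered model can be illustrated with a minimal Python sketch. It is not the GA4GH Beacon API or EMBL-EBI's implementation: the `AccessTier` levels, the `Dataset` structure and the `query_beacon` function are hypothetical, and a real beacon would authenticate users rather than trust a tier argument. Anonymous queries get only a yes/no answer, registered users also see allele frequencies and registered-only datasets, and controlled-access datasets are searched only for users holding an explicit per-dataset grant.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, FrozenSet, List


class AccessTier(IntEnum):
    """Hypothetical access levels mirroring the three tiers described above."""
    PUBLIC = 0      # anonymous users: yes/no answer only
    REGISTERED = 1  # authorised, registered users: frequencies, extra datasets
    CONTROLLED = 2  # users granted full access to specific datasets


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass
class Dataset:
    dataset_id: str
    min_tier: AccessTier                # lowest tier allowed to query this dataset
    frequencies: Dict[Variant, float]   # allele frequency of each variant present


def _can_query(dataset: Dataset, tier: AccessTier, granted: FrozenSet[str]) -> bool:
    """Decide whether a user at `tier` may search `dataset`."""
    if dataset.min_tier == AccessTier.CONTROLLED:
        # Controlled datasets additionally require an explicit per-dataset grant.
        return tier == AccessTier.CONTROLLED and dataset.dataset_id in granted
    return tier >= dataset.min_tier


def query_beacon(datasets: List[Dataset], variant: Variant, tier: AccessTier,
                 granted: FrozenSet[str] = frozenset()) -> dict:
    """Answer an allele query, returning more detail at higher access tiers."""
    hits = [d for d in datasets
            if _can_query(d, tier, granted) and variant in d.frequencies]
    response: dict = {"exists": bool(hits)}
    if tier >= AccessTier.REGISTERED:
        # Registered and controlled users also see per-dataset allele frequencies.
        response["allele_frequencies"] = {
            d.dataset_id: d.frequencies[variant] for d in hits
        }
    return response
```

Keeping the per-dataset access rule in one helper (`_can_query`) means that adding a further tier only requires another branch there, while the response-building logic stays unchanged.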
European Variation Archive<br />
In 2015 the number of species represented in the EVA increased to 22, reflecting substantial growth in variation data for animals and plants. Work supported by the BBSRC and the UK’s Confederation of British Industry enabled us to offer datasets for plants and crops, including barley, maize, rice, sorghum and tomato. Thanks to our participation in the NextGen project to preserve farm-animal biodiversity, the archive also offers variation data on animals including cow, goat and sheep. Other submitted datasets include chicken, chimpanzee, dog, mosquito, mouse and vervet monkey.
We made available two heavily used human datasets: Phase 3 of the 1000 Genomes Project and the Exome Aggregation Consortium (ExAC) v0.3 release.
We improved the Variant Browser in a number of ways, for example by making datasets from 13 different species more accessible and by integrating the variant annotations generated by Ensembl’s Variant Effect Predictor (VEP) tool. To help users detect infrequently occurring variants with potentially high impact, we updated the Variant Browser to filter by consequence type, protein-substitution score and minor allele frequency, and to display population statistics such as allele frequencies.
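These filters can be pictured as successive predicates over VEP-annotated variants. The sketch below is an illustration under stated assumptions, not the EVA code: the `AnnotatedVariant` fields and `filter_variants` are hypothetical, SIFT and PolyPhen stand in for the protein-substitution scores, and thresholds are left to the caller. Note that the two scores run in opposite directions: lower SIFT scores and higher PolyPhen scores both indicate a more damaging substitution.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Optional


@dataclass(frozen=True)
class AnnotatedVariant:
    variant_id: str
    consequence_types: FrozenSet[str]        # e.g. VEP terms such as "missense_variant"
    sift_score: Optional[float]              # lower = more likely deleterious
    polyphen_score: Optional[float]          # higher = more likely damaging
    minor_allele_frequency: Optional[float]  # None if no frequency is recorded


def filter_variants(
    variants: Iterable[AnnotatedVariant],
    consequences: FrozenSet[str] = frozenset(),
    max_sift: Optional[float] = None,
    min_polyphen: Optional[float] = None,
    max_maf: Optional[float] = None,
) -> Iterator[AnnotatedVariant]:
    """Yield variants passing every filter that is set (None disables a filter)."""
    for v in variants:
        # Consequence type: keep variants carrying at least one requested term.
        if consequences and not (v.consequence_types & consequences):
            continue
        # Protein-substitution scores: variants without a score fail the filter.
        if max_sift is not None and (v.sift_score is None or v.sift_score > max_sift):
            continue
        if min_polyphen is not None and (
                v.polyphen_score is None or v.polyphen_score < min_polyphen):
            continue
        # Minor allele frequency: variants with no recorded MAF are kept
        # (an assumption: they may be rare or novel).
        if (max_maf is not None and v.minor_allele_frequency is not None
                and v.minor_allele_frequency > max_maf):
            continue
        yield v
```

For example, `filter_variants(variants, frozenset({"missense_variant"}), max_sift=0.05, max_maf=0.01)` would keep rare missense variants that SIFT predicts to be deleterious.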
To check the correctness of submitted data and improve data quality, we implemented a Variant Call Format (VCF) validator. In 2016, users will be able to download the results of website queries in VCF format.
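To give a flavour of what a format validator checks, here is a minimal Python sketch covering a handful of VCF rules: the `##fileformat` line, the eight mandatory header columns, and the POS, REF, ALT and QUAL fields of each data line. It is a hypothetical illustration, not the EVA validator, and it ignores most of the specification (INFO and FORMAT definitions, genotype columns, breakend ALT alleles, contig consistency).

```python
import re
from typing import Iterable, List

# The eight mandatory columns of the VCF header line, in order.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
BASES = re.compile(r"^[ACGTNacgtn]+$")
SYMBOLIC = re.compile(r"^<[^<>]+>$")   # symbolic alleles such as <DEL>
POSITION = re.compile(r"^[0-9]+$")


def _valid_alt(allele: str) -> bool:
    """Accept plain bases, the '*' allele or a symbolic allele."""
    return allele == "*" or bool(BASES.match(allele)) or bool(SYMBOLIC.match(allele))


def validate_vcf(lines: Iterable[str]) -> List[str]:
    """Return human-readable errors for a small subset of VCF rules."""
    errors: List[str] = []
    header_seen = False
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if n == 1 and not line.startswith("##fileformat=VCFv"):
            errors.append("line 1: first line must be the ##fileformat meta-information line")
        if line.startswith("##"):
            continue  # other meta-information lines are not checked here
        if line.startswith("#"):
            if line.split("\t")[:8] != REQUIRED_COLUMNS:
                errors.append(f"line {n}: header must begin with the 8 mandatory columns")
            header_seen = True
            continue
        if not header_seen:
            errors.append(f"line {n}: data line appears before the #CHROM header")
        fields = line.split("\t")
        if len(fields) < 8:
            errors.append(f"line {n}: expected at least 8 tab-separated columns, found {len(fields)}")
            continue
        _chrom, pos, _id, ref, alt, qual = fields[:6]
        if not POSITION.match(pos):
            errors.append(f"line {n}: POS must be an integer, found {pos!r}")
        if not BASES.match(ref):
            errors.append(f"line {n}: REF must contain only A, C, G, T or N, found {ref!r}")
        if alt != "." and not all(_valid_alt(a) for a in alt.split(",")):
            errors.append(f"line {n}: invalid ALT allele in {alt!r}")
        if qual != ".":
            try:
                float(qual)
            except ValueError:
                errors.append(f"line {n}: QUAL must be a number or '.', found {qual!r}")
    return errors
```

Collecting every error rather than stopping at the first lets a submitter fix a file in one pass.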
We improved the representation of clinical information with a new display for data from ClinVar. To support drug discovery, we also helped standardise ClinVar data, providing an ontology-based representation of disease to the new Target Validation Platform by Open Targets (formerly CTTV). In addition, as part of the Global Alliance for Genomics and Health (GA4GH), we developed global standards and delivered a new API for archived variation data.
Helen Parkinson
Head of Molecular Archival Resources
PhD Genetics, 1997. Research Associate in Genetics, University of Leicester, 1997-2000.
At EMBL-EBI since 2000.
Future plans<br />
We will complete our work to facilitate access to data in<br />
the EGA by authorised users, enabling them to analyse<br />
these datasets more easily by integrating them with<br />
genome viewers such as Ensembl and open-source<br />
biomedical research platforms such as Galaxy. We will<br />
also extend the downloader to allow access to indexed,<br />
encrypted files. As part of the Accelerating Medicines<br />
Partnership, we will provide federated access to the Type<br />
2 Diabetes Portal. We also plan to develop a new front
end for the EGA, improve the submissions workflow, and
optimise file quality control through better feedback<br />
processes with submitters and users.<br />
Funding from the EU CORBEL and ELIXIR-EXCELERATE
grants will enable our team to carry out important work
on establishing standards and best practice for secure
access to sensitive data.
To allow federated access to the EGA, we will develop<br />
and implement standards for authentication and<br />
authorisation.<br />
Integrating the EGA with EMBL-EBI services including<br />
Ensembl, BioSamples and the ENA will be an important<br />
part of our work in 2016. We will also integrate the<br />
IOBIO visualisation tool to provide a more user-friendly
display of file statistics.
The EVA will continue brokering submissions to dbSNP,<br />
saving users an extra step. Including dbSNP data on
multiple species will enable us to connect users with
a catalogue of variation data submitted through
external services, and with all the reference SNP (rs)
identifiers generated by dbSNP.
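One way to picture this connection is an index from a normalised variant description to its rs identifier. The sketch below is hypothetical throughout: the key layout (assembly, chromosome, position, reference and alternate allele), `make_key` and `RsCatalogue` are assumptions for illustration, and real variant matching also has to normalise indel representation, which is not attempted here.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# rs identifiers are "rs" followed by a positive integer, e.g. rs123.
RS_ID = re.compile(r"^rs[1-9][0-9]*$")

# Hypothetical normalised key: (assembly, chromosome, position, ref, alt).
VariantKey = Tuple[str, str, int, str, str]


def make_key(assembly: str, chrom: str, pos: int, ref: str, alt: str) -> VariantKey:
    """Normalise allele case so that 'a>g' and 'A>G' map to the same key."""
    return (assembly, chrom, pos, ref.upper(), alt.upper())


class RsCatalogue:
    """Hypothetical in-memory index from variant keys to dbSNP rs identifiers."""

    def __init__(self) -> None:
        self._by_key: Dict[VariantKey, str] = {}

    def add(self, key: VariantKey, rs_id: str) -> None:
        # Reject anything that is not shaped like a reference SNP identifier.
        if not RS_ID.match(rs_id):
            raise ValueError(f"not a reference SNP identifier: {rs_id!r}")
        self._by_key[key] = rs_id

    def annotate(self, submitted: Iterable[VariantKey]) -> List[Tuple[VariantKey, Optional[str]]]:
        """Pair each submitted variant with its rs ID, or None if none is known."""
        return [(key, self._by_key.get(key)) for key in submitted]
```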
The EVA will move into a rolling release cycle in 2016,<br />
making datasets available as soon as their processing<br />
is finished. We will improve the performance of the<br />
processing pipeline and the website, creating a better<br />
experience for submitters and end users alike.<br />
Selected publication<br />
Lappalainen I, et al. (2015) The European Genome-phenome Archive of human data consented for biomedical research. Nature Genetics 47:692-695.
2015 EMBL-EBI Annual Scientific Report