Annual Scientific Report 2015
Vertebrate Genomics
Paul Flicek
Bronwen Aken
Andrew Yates
Daniel Zerbino
• Welcomed three new Team Leaders in September: Bronwen
Aken, Andrew Yates and Daniel Zerbino;
• Updated the data from several major projects (e.g. 1000
Genomes Project, BLUEPRINT) to reflect the new GRCh38
human reference assembly;
• Issued five major releases of Ensembl, and provided updates
to other highly used resources, e.g. human (now assembly v.
GRCh37.p8) and mouse (now GRCm38.p4) genomes;
• Published the Ensembl regulatory build and the genome of
the vervet monkey;
• Released important annotation updates to the rat (Rnor_6.0)
and zebrafish (GRCz10) genome assemblies, and introduced
a dynamic gene gain/loss view of these datasets;
• Released BLUEPRINT data via GenomeStats, a web-based
tool for carrying out analyses of epigenomic data;
• Released the beta version of our TrackHub registry;
• Developed new views and tools, enhanced performance and
usability of existing views, extended support for track hubs,
and improved our mirror sites;
• Developed a new BioMart system to provide fast access to all
regulatory data from the new Ensembl Regulatory Build and
added new bindings in Bioconductor;
• Helped launch the Functional Annotation of Animal
Genomes (FAANG) project, in which we lead efforts to define
data and metadata standards;
• Upgraded the HipSci project website to improve the
discoverability of individual cell lines and related data;
• Improved display of variation data tables and introduced
Manhattan plots for linkage disequilibrium data;
• Managed the growth in usage of the Ensembl REST API,
which received over 70 million requests in 2015;
• Introduced a new visualisation tool for long-range
connections between genomic regions;
• Promoted Ensembl resources through social media,
conferences, webinars and 97 workshops;
• Helped complete the relocation of the GWAS Catalog
software infrastructure from the NHGRI in the US to
EMBL-EBI;
• Improved the GWAS Catalog website by updating the search
interface with Solr technology and supporting ontology
expansion queries.
Non-vertebrate Genomics
Paul Kersey
• Issued six public releases of Ensembl Genomes;
• Contributed to the regular data releases of VectorBase,
WormBase and PomBase;
• Increased the number of bacterial genomes available through
the Ensembl public interface to nearly 30 000;
• Increased the number of fungal genomes 10-fold and protist
genomes 5-fold;
• Made major contributions to the paper describing the
genome of Anopheles stephensi, the primary mosquito
vector of malaria in urban India;
• Extended community curation to plant pathogens,
and released new data-mining tools for PhytoPath and
WormBase ParaSite;
• Issued a new, more contiguous and complete assembly of the
bread wheat genome to the research community.
Variation<br />
Justin Paschall and Helen Parkinson<br />
• Handled a 50% increase in the volume of data archived in<br />
the European Genome-phenome Archive (EGA) and a 65%<br />
increase in the number of files submitted;<br />
• Deployed a new EGA downloader service, which distributed<br />
over 1.7 petabytes of data;
• Implemented a Global Alliance for Genomics and Health<br />
‘Beacon’ for the EGA, enabling users to access a limited<br />
collection of variation data through a single, three-tiered<br />
entry point;<br />
• Re-built the EGA pipeline, reducing the quarterly average<br />
processing time from three weeks to one and a half days;<br />
• In collaboration with colleagues at CRG Barcelona, increased<br />
the EGA’s capacity to distribute data via FTP, Aspera and a<br />
customised downloader;<br />
• Managed the growth of the European Variation Archive to<br />
22 datasets on various organisms, including crop species<br />
and domesticated animals;<br />
• Made available datasets from Phase 3 of the 1000 Genomes<br />
Project and from the Exome Aggregation Consortium<br />
(ExAC);<br />
• Improved the EVA browser by integrating variant<br />
annotations generated by the Ensembl Variant Effect<br />
Predictor tool and applying advanced search filters;<br />
• Improved the representation of clinical information with a<br />
new display for data from ClinVar;<br />
• Helped standardise ClinVar data in the context of Open<br />
Targets (formerly CTTV) and developed global standards for<br />
variation data as part of the GA4GH.<br />
2015 EMBL-EBI Annual Scientific Report