
Annual Scientific Report 2015


Guy Cochrane

European Nucleotide Archive

PhD University of East Anglia, 1999.

At EMBL-EBI since 2002.

Team Leader since 2009.

Quality through standards

The ENA team tracks growth in the uptake of sequencing and the emergence of innovations in the field, as these directly impact the growth and evolution of our services. In 2015 we entered the final stages of a transition from direct, manual submission processing to a system that focuses curator input through formally defined checklists and validation rules. Checklists offer a structured way to collect information about a sample, presenting attribute names alongside their definitions, usage conventions and syntactic rules (e.g. regular expression, controlled vocabulary). This approach allows us to optimise datasets for discoverability and reanalysis across classes of data submission. It also allows ENA curators to create and edit attributes efficiently, supports concurrent working on the system and allows for safety operations such as rollback. The system makes it possible for a single editing event to drive a change to the attribute across all checklists in which it appears, where appropriate. Such normalisation ensures a consistent experience for the data submitter, supports the capture of consistent and reliable data and, ultimately, improves the presentation of search services.
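To make the checklist idea concrete, the following is a minimal sketch in Python. It assumes a simplified model in which each attribute carries an optional regular expression and an optional controlled vocabulary, and in which checklists hold references to shared attribute definitions. The class names, attribute names and rules are hypothetical and do not reflect the actual ENA schema or tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    """A shared attribute definition (hypothetical structure, not the ENA schema)."""
    name: str
    definition: str
    pattern: Optional[str] = None            # syntactic rule: regular expression
    vocabulary: Optional[frozenset] = None   # syntactic rule: controlled vocabulary

    def validate(self, value: str) -> bool:
        # A value must satisfy every rule that is defined for the attribute.
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            return False
        if self.vocabulary is not None and value not in self.vocabulary:
            return False
        return True


@dataclass
class Checklist:
    """A named collection of references to shared Attribute objects."""
    name: str
    attributes: list

    def errors(self, sample: dict) -> list:
        # Report missing attributes and values that break a syntactic rule.
        problems = []
        for attr in self.attributes:
            if attr.name not in sample:
                problems.append(f"{attr.name}: missing")
            elif not attr.validate(sample[attr.name]):
                problems.append(f"{attr.name}: invalid value {sample[attr.name]!r}")
        return problems


# Illustrative attributes and checklists (assumed names and rules).
collection_date = Attribute(
    "collection_date", "Date the sample was collected",
    pattern=r"\d{4}(-\d{2}(-\d{2})?)?",
)
environment = Attribute(
    "environment", "Broad environment the sample came from",
    vocabulary=frozenset({"marine", "soil", "freshwater"}),
)
marine = Checklist("marine", [collection_date, environment])
pathogen = Checklist("pathogen", [collection_date])

sample = {"collection_date": "2015-06", "environment": "marine"}
print(marine.errors(sample))

# One edit to the shared definition takes effect in every checklist using it.
collection_date.pattern = r"\d{4}-\d{2}-\d{2}"
print(marine.errors(sample))
print(pathogen.errors(sample))
```

Because both checklists point at the same `Attribute` object, tightening the date rule once changes validation everywhere the attribute appears, which is the normalisation effect described above. A production system would also need versioning to support rollback; that is left out of this sketch.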

Data compression

We advanced our CRAM reference-based sequence data compression technology in 2015. We continued to offer and support CRAM as a public software package for its broadest possible use, extended the technology itself and adopted it more deeply across ENA services. We transitioned to CRAM version 3; extended the software to include more effective, faster compression; adopted new compression codecs; improved the treatment of unmapped reads; established greater controls on data integrity under random access; and provided more support for external tools such as the widely used htsjdk. We enriched services for CRAM as a core data format within the Webin and ENA systems, providing full support across the Webin interfaces for CRAM submission and the systematic reference indexing of all submitted raw read CRAM data files to make these reads available through genomic coordinate-based queries.
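The general principle behind reference-based compression can be illustrated with a toy sketch in Python: a mapped read is stored as its position on the reference plus only the bases that differ from it. This is not the CRAM format or its codecs. It handles substitutions only (no insertions, deletions, quality scores or unmapped reads), and the function names and record layout are assumptions made for illustration.

```python
def encode_read(reference: str, pos: int, read: str):
    """Store a mapped read as (position, length, mismatches against the reference)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    mismatches = [
        (offset, base)
        for offset, (ref_base, base) in enumerate(zip(window, read))
        if ref_base != base
    ]
    return pos, len(read), mismatches


def decode_read(reference: str, record) -> str:
    """Rebuild the read from the reference and the stored differences."""
    pos, length, mismatches = record
    bases = list(reference[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


def overlaps(record, start: int, end: int) -> bool:
    """Coordinate-based test: does the read overlap the half-open region [start, end)?"""
    pos, length, _ = record
    return pos < end and pos + length > start


reference = "ACGTACGTTAGCCGATAGGC"
read = "CGTTAGACGA"  # aligns at position 5 with a single substitution

record = encode_read(reference, 5, read)
print(record)
print(decode_read(reference, record) == read)
print(overlaps(record, 12, 20))
```

The stored record holds one mismatch instead of ten bases, and because each record carries its reference position, reads overlapping a genomic region can be selected without decoding the sequence itself.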

Future plans

In 2016 we will continue to work with user communities on data standards, for example extending the established Marine Microbial Biodiversity, Bioinformatics and Biotechnology (M2B3) standard, including coverage of aquaculture and blue biotechnology-related studies. We also expect further work on pathogen-related standards. We will actively seek to collaborate with further communities to target coverage gaps, with a view to having checklist coverage across all classes of incoming data. Curation of data submissions representing non-assembly annotated sequence will become a fully autonomous strand of activity in 2016, which will complete our transition to having all major submission workflows operating in a scalable, quality-assured mode.

We will implement specific computational workflows in the COMPARE Embassy Cloud system, initially covering bacterial assembly and functional annotation and typing/resistance profiling. We will further develop the COMPARE Data Hub concept to allow simpler user management and more integrated access. We will begin to construct a data portal for the pathogen surveillance community, with tailored search, browse and visualisation tools, and will continue to support data sharing and analysis efforts around emerging outbreaks.

We will extend the existing ENA system for structured analysis output data, for example for antimicrobial drug-resistance profiles and abundance profiles from ecological studies. This will allow an agile response to submissions and data presentation for as-yet-unsupported data types. It has already been used as the basis for submission and indexing support for assembly data, and for variation data in the EVA. Extending this system to serve as data infrastructure for EBI Metagenomics will help us improve submission and retrieval flexibility.

Selected publications

Gibson R, et al. (2016) Biocuration of functional annotation at the European nucleotide archive. Nucleic Acids Res. 44:D58-D66. doi:10.1093/nar/gkv1311

Cochrane G, et al. (2016) The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 44:D48-D50. doi:10.1093/nar/gkv1323

Ten Hoopen P, et al. (2015) Marine microbial biodiversity, bioinformatics and biotechnology (M2B3) data reporting and service standards. Stand Genomic Sci. 10:20. doi:10.1186/s40793-015-0001-5

Ip CL, et al. (2015) MinION Analysis and Reference Consortium: Phase 1 data release and analysis. F1000Res. 4:1075. doi:10.12688/f1000research.7201.1

Mitchell A, et al. (2016) EBI metagenomics in 2016 - an expanding and evolving resource for the analysis and archiving of metagenomic data. Nucleic Acids Res. 44:D595-D603. doi:10.1093/nar/gkv1195

2015 EMBL-EBI Annual Scientific Report, page 82
