Annual Scientific Report 2015

Protein Function Development

The work of our team spans several major resources under the umbrella of UniProt, the comprehensive resource of protein sequences and functional annotation: the UniProt Knowledgebase, the UniProt Archive and the UniProt Reference Clusters. We develop software and services for protein information in the UniProt, Gene Ontology (GO) annotation and enzyme data resources at EMBL-EBI. We are also responsible for developing tools for UniProt and GO Annotation (GOA) curation, and for the study of novel, automatic methods for protein annotation.

Major achievements

The UniProt website facilitates the search, identification and analysis of gene products. In 2015 our team released new web interfaces and functionalities, all built in response to user feedback gathered in a number of user workshops, usability interviews/sessions, helpdesk reviews and surveys. We now offer better ways to customise search and present results. A new UniProt course in Train online allows users to browse, explore and analyse the profoundly rich, integrated collection of protein sequence data in this resource.

Prior to the April 2015 release of UniProt, the UniProt Knowledgebase (UniProtKB) had doubled in size over the previous year to over 90 million entries, with a high level of redundancy. This was especially the case for bacterial species, where different genomes of the same bacterium have been sequenced and submitted independently (e.g. 4080 proteomes for Staphylococcus aureus, comprising 10.88 million entries). To deal with this redundancy, we developed a procedure to identify highly redundant proteomes within species groups. We implemented this procedure for bacterial species and the sequences corresponding to redundant proteomes (approximately 47 million entries) were moved from UniProtKB to the UniProt Archive (UniParc), where they are still available.
This is the first concerted effort in a public protein database to deal firmly and effectively with redundancy in big data.

We released a new version of the UniProt Java API that addressed several issues, for example frequent library updates, retrieval speeds and server availability. With the new API, users can create their own UniProt service, and query and retrieve sets of proteins of interest, for instance all records updated in the past few months, or belonging to a particular family or species.

Our team worked with the UniProt user community as well as the NCBI RefSeq, Ensembl and Ensembl Genomes teams to provide a collection of non-redundant reference proteomes, and to maintain well-annotated organisms for biomedical and biotechnological research. New species released in 2015 include Theobroma cacao (cacao / cocoa), Brassica napus (rapeseed) and Papio anubis (olive baboon), among others.

In collaboration with genomics resources Ensembl and COSMIC, we created data links between DNA sequences and the functional proteins they encode. Cross-references to specific genomic sequences are now provided for each protein isoform. We also began distributing variants with consequences at the protein level for human and other species, and released variants from external resources including the Exome Aggregation Consortium (ExAC) and the Exome Sequencing Project (ESP) in the protein context.

We introduced new genome annotation track files in two formats, BED and bigBed, which allow users to map and visualise UniProtKB sequence feature annotations, including domains, sites and post-translational modifications, as genome browser tracks. These can be visualised in Ensembl, the UCSC Genome Browser and NCBI Genome. This beta release of the UniProt genome annotation tracks resource contains sequence annotations only for human; other species will be added in future.
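As an illustration of how track files of this kind can be consumed, the following is a minimal Python sketch. It assumes only the standard tab-separated BED layout (chromosome, 0-based start, exclusive end, optional name). The example records, the feature names and the helper functions `parse_bed` and `features_overlapping` are hypothetical and are not taken from the UniProt track files or any UniProt API.

```python
from typing import List, NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str   # optional fourth BED column; empty if absent


def parse_bed(text: str) -> List[BedFeature]:
    """Parse BED records from a string, skipping header and comment lines."""
    features = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and genome browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"BED record needs at least 3 columns: {raw!r}")
        name = fields[3] if len(fields) > 3 else ""
        features.append(BedFeature(fields[0], int(fields[1]), int(fields[2]), name))
    return features


def features_overlapping(features: List[BedFeature], chrom: str, pos: int) -> List[BedFeature]:
    """Return features on `chrom` that cover the 0-based position `pos`."""
    return [f for f in features if f.chrom == chrom and f.start <= pos < f.end]


# Hypothetical records for illustration only; not real UniProtKB annotations.
EXAMPLE = (
    "track name=example\n"
    "chr1\t100\t160\tdomain_A\n"
    "chr1\t150\t153\tsite_B\n"
    "chr2\t10\t40\tdomain_C\n"
)

if __name__ == "__main__":
    feats = parse_bed(EXAMPLE)
    for hit in features_overlapping(feats, "chr1", 151):
        print(hit.name, hit.start, hit.end)
```

Because BED intervals are half-open, a feature ending at 160 does not cover position 160; a lookup of this kind has to respect that convention when relating a genomic position to an annotated protein feature.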
We worked with the ProteomeXchange resources such as PeptideAtlas and MaxQB to provide experimental peptides from publicly available mass-spectrometry studies for UniProt proteins for several reference species.

Maria-Jesus Martin, Protein Function Development. BSc in Veterinary Medicine, University Autonoma in Madrid. PhD in Molecular Biology (Bioinformatics), 2003. At EMBL-EBI since 1996. Team Leader since 2009.

In 2015 our team extended the functionality of our automated annotation system, which assists in the curation of the millions of proteins in UniProt. Informed by specialist biocurators, the automated system adds as much useful information as possible to imported sequences, which now include domains, signal, transmembrane and coil regions. We extended UniRule and the Statistical Automatic Annotation System (SAAS), two systems for the automatic annotation of large volumes of uncharacterised proteins. These are now available through newly implemented interactive web pages, allowing our users to browse annotation rules. We also started work on a service to download and/or use these rules as a system for genome annotation. We extended our collaborations with external automatic annotation communities including the Biofunction Prediction and Critical Assessment of Function Annotation initiatives, which will expand our knowledge and use of functional prediction methods.

In 2015 we further extended the scope of GO annotation to support annotations to RNA, identified by RNAcentral identifiers. We made significant changes to our database and to Protein2GO, the web-based GO curation tool used by UniProt and GO Consortium curators to contribute annotations to the GOA project, in order to support a number of changes to annotation format and rules agreed by the GO Consortium. We re-engineered our pipeline that verifies the taxonomic correctness of GO annotations using a much-extended set of taxonomic constraints that originate from both GO and other ontologies, principally UBERON. To make GO protein–protein interaction annotations available for visualisation in tools such as Cytoscape, we implemented a PSICQUIC (Proteomics Standards Initiative Common Query Interface) server, available through EMBL-EBI’s PSICQUIC portal.
Our team maintains the Enzyme Portal, a resource that integrates enzyme-related data for all relevant EMBL-EBI resources and the underlying functional and genomic data. We re-launched the service, which now features improved interfaces and functionalities and provides a one-stop shop for all information available on enzymes. To further improve the discoverability of enzyme data, we collaborated with the Web Production team to refine the enzyme search within the EBI-Search and EBI Blast sequence search tools.

Future plans

In 2016 we plan to release a protein-sequence feature viewer that summarises functional sites in the UniProt website. We will continue to engage with user communities working in functional prediction, and explore methods and data-exchange mechanisms to improve accuracy and coverage of protein annotations. We will maintain our focus on usability and engage with our users to ensure we maintain a global genome/proteome- and gene-product-centric view of the sequence space.

We aim to expand our collaboration with the ProteomeXchange resources in the integration of post-translational modifications in UniProtKB, and in the provision of experimental, unique peptide mappings for reference species. We will continue to co-operate with variation projects such as ExAC to integrate relevant genome and proteome information.

Restructuring GO electronic annotation pipelines, principally those based on orthology supplied by Ensembl, will help us improve the quality of the projected annotations. We will continue the work undertaken on behalf of the GO Consortium in 2015 to transition to using UniProt cross-references rather than MOD-supplied mapping files to map from “foreign” identifiers to UniProtKB accessions. We also plan to revise the set of annotation files that we publish and submit to the GO Consortium. We plan to release a new QuickGO with re-designed interfaces and new features to improve the overall user experience.
We will also continue to develop Protein2GO to keep it in line with changes in annotation strategy agreed by the GO Consortium, and to introduce additional functionality to enhance curators’ workflow. We will make use of the enhanced set of Web Services provided by the new QuickGO to provide improved searching capabilities, and the ability to use any available ECO evidence code.

Following the successful relaunch of the Enzyme Portal in 2015, we will expand its functionalities in response to user needs and create new training activities.

Selected publications

Alpi E, Griss J, et al. (2015) Analysis of the tryptic search space in UniProt databases. Proteomics 15:48-57

Huntley RP, et al. (2015) The GOA database: Gene Ontology annotation updates for 2015. Nucleic Acids Res. 43:D1057-D1063

Pundir S, Magrane M, Martin MJ, O’Donovan C, UniProt Consortium (2015) Searching and navigating UniProt databases. Curr. Protoc. Bioinform. 50:1.27.1–1.27.10

UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res. 43:D204-D212