Annual Scientific Report 2015

Protein Function Content

One of the central activities of the Protein Function Content team is the biocuration of our databases, interpreting and integrating information relevant to biology. The primary goals of biocuration are accurate and comprehensive representation of biological knowledge, as well as facilitating easy access to this data for working scientists and providing a basis for computational analysis. The curation methods we apply to UniProtKB/Swiss-Prot include manual extraction and structuring of experimental information from the literature, manual verification of results from computational analyses, quality assessment, integration of large-scale datasets and continuous updating as new information becomes available.

UniProt has two complementary approaches to automatic annotation of protein sequences with a high degree of accuracy. UniRule is a collection of manually curated annotation rules, which define annotations that can be propagated based on specific conditions. The Statistical Automatic Annotation System (SAAS) is an automatic, decision-tree-based, rule-generating system. The central components of these approaches are rules based on the manually curated data in UniProtKB/Swiss-Prot from the experimental literature and InterPro classification.

The UniProt GO annotation (GOA) program aims to add high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). We supplement UniProt manual and electronic GO annotations with manual annotations supplied by external collaborating GO Consortium groups. This ensures that users have a comprehensive GO annotation dataset. UniProt is a member of the GO Consortium.

Major achievements

As a core contributor to the Consensus CDS project, UniProt is creating an authoritative complete proteome set for Homo sapiens in close collaboration with the RefSeq annotation group at the National Center for Biotechnology Information (NCBI) and the Ensembl and HAVANA teams at EMBL-EBI and the Wellcome Trust Sanger Institute. A component of this effort involves ensuring a curated and complete synchronisation with the HUGO Gene Nomenclature Committee (HGNC), which has assigned unique gene symbols and names to 39 000 human loci (19 003 of which are listed as coding for proteins). Information on the reviewed set of 20 199 entries is available on the UniProt website.

We play a major role in establishing minimum standards for genome annotation across the taxonomic range, largely thanks to collaborations arising from the annual NCBI Genome Annotation Workshops, which are attended by researchers from life science organisations worldwide. These standards have contributed significantly to the annotation of complete genomes and proteomes and are helping scientists exploit these data to their full potential.

The UniProt Automatic Annotation effort made great strides in 2015. We increased the number of UniRules significantly, with an emphasis on enzymes across the taxonomic space to enable us to respond to the need for annotation of uncharacterised genomes. We began establishing relationships with sequencing and annotation centres such as Genoscope to share these rules and to expand into new approaches.

The UniProt GO annotation program provides high-quality GO annotations to proteins in UniProtKB. The assignment of GO terms to UniProt records is an integral part of UniProt biocuration. UniProt manual and electronic GO annotations are supplemented with manual annotations supplied by external collaborating GO Consortium groups, to ensure a comprehensive GO annotation dataset is supplied to users.

Our curators are key members of the GO Consortium Reference Genomes Initiative for the human proteome and provide high-quality annotations for human proteins. In 2014, we provided a manually curated set of human proteins for the validation of the computational approaches submitted to the Critical Assessment of Function Annotation (CAFA) experiment, and presented a guide on how best to use and interpret Gene Ontology data at the Automated Function Prediction SIG of the International Conference on Intelligent Systems for Molecular Biology (ISMB).

2015 EMBL-EBI Annual Scientific Report

Claire O’Donovan
Protein Function Content
BSc (Hons) in Biochemistry, University College Cork, 1992. Diploma in Computer Science, University College Cork, 1993. At EMBL since 1993, at EMBL-EBI since 1994. Team Leader since 2009.

Future plans

In 2016 we will continue work on a ‘gold-standard’ dataset across the taxonomic range, with a particular focus on the UniProt proteomes set to fully address the requirements of the biochemical community. We will also continue to expand and refine our Ensembl and Genome Reference Consortium collaborations to ensure that UniProtKB provides the most appropriate gene-centric view of the protein space, allowing a cleaner and more logical mapping of gene and genomic resources to UniProtKB. We will continue to co-operate with diverse data providers (e.g., Ensembl, RefSeq, PRIDE) to integrate relevant genome and proteome information, and will import variation information from COSMIC. We also plan to extend our nomenclature collaborations to include higher-level organisms.

We will prioritise the extraction of experimental data from the literature and extend our use of data-mining methods to identify scientific literature of particular interest with regard to our annotation priorities. We are committed to expanding UniRule by extending the number and range of rules with additional curator resources, both internal and external, and providing these rules to external collaborators for use in their systems. In 2016 we also plan to extend the scope of GO annotation to encompass entities other than proteins, in particular RNA and protein complexes.

Selected publications

Alam-Faruque Y, Hill DP, Dimmer EC, et al. (2014) Representing kidney development using the gene ontology. PLoS One 9:e99864
Alpi E, Griss J, da Silva AW, et al. (2014) Analysis of the tryptic search space in UniProt databases. Proteomics 15:48-57
Huntley RP, Sawford T, Martin MJ and O’Donovan C (2014) Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt. Gigascience 3:4
Huntley RP, Sawford T, Mutowo-Meullenet P, et al. (2014) The GOA database: gene ontology annotation updates for 2015. Nucleic Acids Res 43(Database issue):D1057-63
Poux S, Magrane M, Arighi CN, et al. (2014) Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database (Oxford) 2014:bau016

UniProt, the Universal Protein Resource, is integrated with data resources spanning all of molecular biology.