Proteomics servicesThe Proteomics Services team develops tools and resources for therepresentation, deposition, distribution and analysis of proteomics andsystems biology data.We follow an open-source, open-data approach: all of the resources we develop are freely available. The team is a majorcontributor to community standards, in particular the Proteomics Standards Initiative (PSI) of the international HumanProteome Organisation (HUPO) and systems biology standards (COMBINE Network). We provide public databases asreference implementations for community standards: the PRIDE proteomics identifications database, the IntAct molecularinteraction database, the Reactome pathway database and BioModels Database, a repository of computational models ofbiological systems.As a result of long-term engagement with the community, journal editors and funding organisations, data deposition in ourstandards-compliant data resources is becoming a strongly recommended part of the publishing process. This has resulted ina rapid increase in the data content of our resources. Our curation teams ensure consistency and appropriate annotation of alldata, whether from direct depositions or literature curation, to provide the community with high-quality reference datasets.We also contribute to the development of data integration technologies, using protocols like Distributed Annotation System(DAS) and semantic web technologies, and provide stable identifiers for biomolecular entities through identifiers.org.Major achievementsA major success in <strong>2012</strong> was achieving full production modefor the ProteomeXchange consortium. In this EU-fundedconsortium, PRIDE works with a number of internationalpartners (e.g., PeptideAtlas, UniProt, University of Ghent,University of Liverpool, ETH Zurich, University of Michigan,Wiley-VCH) to co-ordinate data deposition and disseminationstrategies for mass spectrometry data, providing a singleentry point for data deposition, a shared accession numberspace and a deposition metadata format. In spring <strong>2012</strong>,ProteomeXchange started full production and achieved gooduser acceptance, reaching 77 submissions with more than 20million spectra by November <strong>2012</strong> (see Figure).ProteomeXchange submission strategies include thecapability to deposit large raw datasets, and in <strong>2012</strong> wecompletely redeveloped PRIDE data deposition support,publishing Java libraries (Griss et al, <strong>2012</strong>; Reisinger et al,<strong>2012</strong>), an updated PRIDE Converter submission tool (Côté etal, <strong>2012</strong>), and the PRIDE Inspector data access tool (Wang etal, <strong>2012</strong>).IMEx is an international consortium of major molecularinteraction data providers that globally synchronise theirdata deposition and curation efforts (Orchard et al, <strong>2012</strong>).IMEx partners share formats, identifier spaces and curationstrategies and directly share the web-based IntAct curationinfrastructure (Kerrien et al., <strong>2012</strong>). This avoids redundantdevelopment while retaining the value of each individualresource. IMEx partners include UniProt (Switzerland andthe United Kingdom), I2D (Canada), InnateDB (Ireland andCanada), Molecular Connections (India) and MechanoBio(Singapore).Reactome provides review-style, curated and peer-reviewedhuman pathways in a computationally accessible form. In<strong>2012</strong> we collaborated with the Ouwehand group to providea detailed update of platelet-related pathways (Jupe et al,<strong>2012</strong>), as well as many other high-profile pathways. Wecurate disease variants of normal physiological pathways toprovide users with information about mutations and how theyaffect pathways. We also focus on cancer-relatedsignalling pathways.Our BioModels team contributed to a large-scale collaborativeeffort for the semi-automated generation of systems biologymodels based on KEGG pathways, Biocarta, MetaCyc andSABIO-RK data. More than 142 000 models, covering 1852species, were generated, opening up new opportunities46 <strong>2012</strong> <strong>EMBL</strong>-<strong>EBI</strong> <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>
Henning HermjakobMSc Bioinformatics University of Bielefeld, Germany,1995. Research Assistant at the German NationalCentre for Biotechnology (GBF), 1996.At <strong>EMBL</strong>-<strong>EBI</strong> since 1997.to explore and refine models. We rose to the challenge ofintegrating these models into BioModels Database in <strong>2012</strong>,increasing the number of available models by two orders ofmagnitude.Future plansFollowing the upgrade of the PRIDE submission system, wehave turned our attention to redeveloping the core databaseand web interface. This is essential for maintaining goodresponse times and helping users access increasingly largeand complex proteomics datasets. We will begin providingquality-controlled subsets and derived datasets from PRIDE,evolving this resource from a primary database into a systemsbiology source of protein expression data.IntAct will increasingly provide confidence-scored interactiondatasets derived from integration of individual publications.The same strategies will be applied, where possible, tointeraction data from multiple sources, which we accessthrough the PSICQUIC interface. This will ensure we canprovide integrated, up-to-date interaction datasets. We willredesign the IntAct website in 2013 to improve accessibility.In 2013 we will make IntAct datasets accessible throughReactome. This will enhance Reactome’s visualisation ofmolecular interactions in the context of molecular pathways.The next Reactome website release will include a newinteractive pathway viewer, and will feature improved overlayof external information such as expression data. Reactomecuration will focus on disease-induced modifications ofpathways, supported by improved visualisation toolsand close collaboration with disease-oriented researchcommunities.We will develop the new storage infrastructure for theBioModels Database so that the resource can cope with theinflux of computationally generated models. We will developnew features (e.g., full model versioning, support for moremodelling formats), update the user interface, enhance searchfacilities and improve overall performance.We will continue to work with journals, editors and dataproducers to make more data publicly available by utilisingcommunity-supported standards.Figure. Distribution of ProteomeXchange submissions in <strong>2012</strong><strong>2012</strong> <strong>EMBL</strong>-<strong>EBI</strong> <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>47
- Page 1 and 2: EMBL-European Bioinformatics Instit
- Page 3: Table of contentsIntroduction & ove
- Page 6 and 7: EMBL-EBI 2012It was a year of trans
- Page 8 and 9: New service developments• Underst
- Page 10 and 11: Organisation ofEMBL-EBI Leadership
- Page 12 and 13: dGenes, genomes and variationThe Eu
- Page 14 and 15: dGenes, genomes and variationSummar
- Page 16 and 17: European Nucleotide ArchiveOur team
- Page 18 and 19: Vertebrate genomicsThe Vertebrate G
- Page 20 and 21: Nonvertebrate genomicsWe provide to
- Page 22 and 23: gMolecular atlasLife scientists are
- Page 24 and 25: Functional genomicsThe Functional G
- Page 26 and 27: Functional genomics productionOur t
- Page 28 and 29: Functional genomics developmentOur
- Page 30 and 31: PProteins and protein familiesUniPr
- Page 32 and 33: UniProt contentOne of the central a
- Page 34 and 35: UniProt developmentOur team provide
- Page 36 and 37: InterProOur team co-ordinates the I
- Page 38 and 39: sMolecular and cellular structureUn
- Page 40 and 41: Protein Data Bank in EuropeThe majo
- Page 42 and 43: PDBe content and integrationOur goa
- Page 44 and 45: PDBe databases and servicesOur team
- Page 46 and 47: yMolecular systemsThe genes and gen
- Page 50 and 51: Chemical biologyThe importance of s
- Page 52 and 53: ChEMBLThe ChEMBL team develops and
- Page 54 and 55: Cheminformatics and metabolismOur t
- Page 56 and 57: cCross-domain toolsand resourcesSci
- Page 58 and 59: Literature servicesScientific liter
- Page 60 and 61: Research2012 has seen the further t
- Page 62 and 63: Bertone groupPluripotency, reprogra
- Page 64 and 65: Birney groupNucleotide dataDNA sequ
- Page 66 and 67: Enright groupFunctional genomics an
- Page 68 and 69: Goldman groupEvolutionary tools for
- Page 70 and 71: Le Novère groupComputational syste
- Page 72 and 73: Luscombe groupGenomics and regulato
- Page 74 and 75: Marioni groupComputational and evol
- Page 76 and 77: Rebholz groupPhenotypes and multili
- Page 78 and 79: Saez-Rodriguez groupSystems biomedi
- Page 80 and 81: Thornton groupProteins: structure,
- Page 82 and 83: The EMBL International PhDProgramme
- Page 84 and 85: SupportOur support teams provide fo
- Page 86 and 87: T TrainingAs part of EMBL-EBI’s m
- Page 88 and 89: IIndustry programmeSince 1996 the I
- Page 90 and 91: NExternal relationsAs a European In
- Page 92 and 93: sExternal servicesOur team manages
- Page 94 and 95: SSystems and networkingOur team man
- Page 96 and 97: q AdministrationThe EMBL-EBI Admini
- Page 98 and 99:
Funding and resource allocationDesp
- Page 100 and 101:
Growth of core resourcesIn 2012 the
- Page 102 and 103:
CollaborationsEMBL-EBI is a highly
- Page 104 and 105:
Staff growthOur organisational stru
- Page 106 and 107:
Scientific advisory commiteesEMBL S
- Page 108 and 109:
The International Nucleotide Sequen
- Page 110 and 111:
EMDataBank Advisory Committee• Jo
- Page 112 and 113:
Major database collaborationsARRAYE
- Page 114 and 115:
THE GENE ONTOLOGY CONSORTIUM• Agb
- Page 116 and 117:
REACTOME• New York University Med
- Page 118 and 119:
Publications in 2012In 2012, EMBL-E
- Page 120 and 121:
Doreleijers, J. F., Vranken W. F.,
- Page 122 and 123:
Kruger, F. A., Rostom R. and Overin
- Page 124 and 125:
Sahakyan, Aleksandr B., Cavalli And
- Page 128:
EMBL - European Bioinformatics Inst