PDBe content and integrationOur goal is to ensure that PDBe truly serves the needs of the biomedicalcommunity. We are constantly improving PDBe’s web interface anddesigning new tools to make structural data available toall.In the context of the SIFTS project, we integrate structural data with other biological data to facilitate discovery. Theseintegrated data form the basis for many query interfaces that allow biomacromolecular structure data to be presented inits biological context. Our specific focus areas are: data integrity, data quality, integration and data dissemination to thenon-expert biomedical community.Major achievementsIn <strong>2012</strong> PDBe annotation staff curated a record number ofPDB entries (1676 entries, up 12% from 2011), the majorityof which were annotated within one working day of being deposited.PDBe staff also annotated 253 EMDB entries - 58%of all EMDB entries deposited in <strong>2012</strong>. One of these entrieswas the 1000th EMDB entry to be processed at <strong>EMBL</strong>-<strong>EBI</strong>since the EMDB archive was established here in 2002.PDBe staff, in collaboration with other wwPDB members,carried out a remediation project to improve therepresentation of the biologically important peptide-likeantibiotic and inhibitor molecules in the PDB. The remediationefforts addressed issues related to consistent representationand functional annotation of these molecules. A collectionof almost 600 reference dictionary files, containing detailedchemical and functional description of antibiotic and inhibitormolecules, was made available through the PDB ftp area.We remediated EMDB data to improve the quality andconsistency of the archive. Our team addressed issues relatedto taxonomy information, author and citation information,map-symmetry records and consistent representation of allEMDB map files.Validation information about newly deposited structures (aswell as those already in the public archive) is a critical elementof usability. PDBe implemented an initial version of thevalidation-software pipeline as part of the new Deposition andAnnotation system, developed in collaboration with the otherwwPDB partners.Since 2000, PDBe has provided valuable integration servicesvia SIFTS (Structure Integration with Function, Taxonomyand Sequences), which is maintained in close collaborationwith the UniProt team. SIFTS supports several PDBe toolsand services and is used by many other major bioinformaticsresources. We improved the SIFTS infrastructure and thecoverage of mapping information, upgrading the versioningand adding new data on GO and InterPro assignments.Our team undertook the redesign of the PDBe website andsearch infrastructure, which is scheduled to launch in 2013.We worked with many users in both academic and industrysectors to better understand how they use and navigatestructure-related information; our findings formed the basis ofthe redesign. We are also revamping the search infrastructurefor PDB and EMDB data and developing an API for accessingstructure data.Future plansWe strive to make biomacromolecular structure data availablein useful ways to a broad biomedical community. Wecontinually improve the quality and integrity of the data in ourcare, and endeavor to integrate them with other sources ofbiomedical information.We will work with our wwPDB partners to simplify thequality assessment of crystal structure entries in the PDBby progressing an X-ray validation pipeline, which wasdeveloped at PDBe. We will also develop similar pipelinesfor NMR and 3DEM data, again making quality assessmenteasier for non-expert users. We will design a more intuitiveinterface to display validation information for all kinds of3D biomacromolecular structure data, and will launch acompletely redesigned web site in 2013.A new wwPDB Deposition and Annotation system will bereleased in 2013, and we expect this to result in a majorchange in annotation practices. We will work closely with theother wwPDB partners to ensure an efficient transition.40 <strong>2012</strong> <strong>EMBL</strong>-<strong>EBI</strong> <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>
Sameer VelankarPhD, Indian Institute of Science, 1997. Postdoctoralresearcher, Oxford University, UK,1997-2000.At <strong>EMBL</strong>-<strong>EBI</strong> since 2000. Team leadersince 2011.Figure. The wwPDB X-ray validation pipeline assesses the quality of experimental data, the atomic model and the fit of the model to the data,using community-standard software. The pipeline will be run for all structures already in the PDB archive and also for all newly depositedstructures. It produces high-quality statistics, graphs, plots and a summary report to help expert and non-expert users assess the global andlocal quality of individual PDB entries so they can decide if it is suitable for their needs. The validation results can be used to compare andselect the most suitable model of a particular biomacromolecule in the entire PDB.Selected referencesVelankar, S., et al. (<strong>2012</strong>) PDBe: Protein Data Bank in Europe.Nucleic Acids Res 40, D445-D452.Gore, S., Velankar, S. and Kleywegt, G.J. (<strong>2012</strong>) Implementingan X-ray validation pipeline for the Protein Data Bank. ActaCrystallogr D68, 478-483.<strong>2012</strong> <strong>EMBL</strong>-<strong>EBI</strong> <strong>Annual</strong> <strong>Scientific</strong> <strong>Report</strong>41
- Page 1 and 2: EMBL-European Bioinformatics Instit
- Page 3: Table of contentsIntroduction & ove
- Page 6 and 7: EMBL-EBI 2012It was a year of trans
- Page 8 and 9: New service developments• Underst
- Page 10 and 11: Organisation ofEMBL-EBI Leadership
- Page 12 and 13: dGenes, genomes and variationThe Eu
- Page 14 and 15: dGenes, genomes and variationSummar
- Page 16 and 17: European Nucleotide ArchiveOur team
- Page 18 and 19: Vertebrate genomicsThe Vertebrate G
- Page 20 and 21: Nonvertebrate genomicsWe provide to
- Page 22 and 23: gMolecular atlasLife scientists are
- Page 24 and 25: Functional genomicsThe Functional G
- Page 26 and 27: Functional genomics productionOur t
- Page 28 and 29: Functional genomics developmentOur
- Page 30 and 31: PProteins and protein familiesUniPr
- Page 32 and 33: UniProt contentOne of the central a
- Page 34 and 35: UniProt developmentOur team provide
- Page 36 and 37: InterProOur team co-ordinates the I
- Page 38 and 39: sMolecular and cellular structureUn
- Page 40 and 41: Protein Data Bank in EuropeThe majo
- Page 44 and 45: PDBe databases and servicesOur team
- Page 46 and 47: yMolecular systemsThe genes and gen
- Page 48 and 49: Proteomics servicesThe Proteomics S
- Page 50 and 51: Chemical biologyThe importance of s
- Page 52 and 53: ChEMBLThe ChEMBL team develops and
- Page 54 and 55: Cheminformatics and metabolismOur t
- Page 56 and 57: cCross-domain toolsand resourcesSci
- Page 58 and 59: Literature servicesScientific liter
- Page 60 and 61: Research2012 has seen the further t
- Page 62 and 63: Bertone groupPluripotency, reprogra
- Page 64 and 65: Birney groupNucleotide dataDNA sequ
- Page 66 and 67: Enright groupFunctional genomics an
- Page 68 and 69: Goldman groupEvolutionary tools for
- Page 70 and 71: Le Novère groupComputational syste
- Page 72 and 73: Luscombe groupGenomics and regulato
- Page 74 and 75: Marioni groupComputational and evol
- Page 76 and 77: Rebholz groupPhenotypes and multili
- Page 78 and 79: Saez-Rodriguez groupSystems biomedi
- Page 80 and 81: Thornton groupProteins: structure,
- Page 82 and 83: The EMBL International PhDProgramme
- Page 84 and 85: SupportOur support teams provide fo
- Page 86 and 87: T TrainingAs part of EMBL-EBI’s m
- Page 88 and 89: IIndustry programmeSince 1996 the I
- Page 90 and 91: NExternal relationsAs a European In
- Page 92 and 93:
sExternal servicesOur team manages
- Page 94 and 95:
SSystems and networkingOur team man
- Page 96 and 97:
q AdministrationThe EMBL-EBI Admini
- Page 98 and 99:
Funding and resource allocationDesp
- Page 100 and 101:
Growth of core resourcesIn 2012 the
- Page 102 and 103:
CollaborationsEMBL-EBI is a highly
- Page 104 and 105:
Staff growthOur organisational stru
- Page 106 and 107:
Scientific advisory commiteesEMBL S
- Page 108 and 109:
The International Nucleotide Sequen
- Page 110 and 111:
EMDataBank Advisory Committee• Jo
- Page 112 and 113:
Major database collaborationsARRAYE
- Page 114 and 115:
THE GENE ONTOLOGY CONSORTIUM• Agb
- Page 116 and 117:
REACTOME• New York University Med
- Page 118 and 119:
Publications in 2012In 2012, EMBL-E
- Page 120 and 121:
Doreleijers, J. F., Vranken W. F.,
- Page 122 and 123:
Kruger, F. A., Rostom R. and Overin
- Page 124 and 125:
Sahakyan, Aleksandr B., Cavalli And
- Page 128:
EMBL - European Bioinformatics Inst