Annual Scientific Report 2015

ChEMBL

Drug discovery is more costly than ever, and innovation in efficacy and safety remains a significant challenge. Changes in the pharmaceutical industry over the past decade have led to an increase in drug-discovery activities in organisations that typically have access neither to large databases of legacy bioactivity data nor to the experienced staff needed to manage them.

Our team develops and manages ChEMBL, EMBL-EBI’s database of quantitative small-molecule bioactivity data focused on drug discovery; SureChEMBL, a patent resource containing chemical structures extracted from patents on a daily basis; and UniChem, a resource that links chemical structures across databases, both internal and external to EMBL-EBI. ChEMBL contains data on curated chemical structures, bioactivity values and their relationship to biological targets and phenotypic assays. SureChEMBL combines full patent text and automatically data-mined chemical structures, significantly extending the speed and scope of public data available to drug-discovery researchers. Together, structure–activity relationship (SAR) data from the scientific literature, deposited data from neglected-disease high-throughput screens and now the patent literature make ChEMBL an important enabling resource for scientists working in pharmaceutical R&D.

Our research interests centre on data mining the ChEMBL database for applications relevant to translational drug discovery, including aspects of genetic variability, drug safety and neglected diseases.

Major achievements

In 2015 there was a major change in the ChEMBL Group when John Overington, who had been Team Leader since the database was taken on by EMBL-EBI in 2008, left to join the London-based biotech company Stratified Medical. Since April 2015 Anne Hersey has been Acting Team Leader.
We continued to expand the data coverage of ChEMBL to include drug-metabolism and pharmacokinetic (DMPK) data, and undertook extensive target and disease annotations on approved drugs and clinical candidates. We also developed methods to enhance and streamline the curation of data and significantly updated our Web Services as a flexible way for users to access ChEMBL data. We further refined the SureChEMBL patent annotation pipeline to improve its robustness and provided new methods to access the annotations. The number of databases indexed in UniChem has increased to 27, and we put in place a process to update the resource automatically every week.

During 2015 ChEMBL data content continued to expand, with the number of compounds reaching 1.7 million and the number of bioactivities nearly 14 million. The full ChEMBL data remain freely available in a wide variety of technical formats, including a web interface, data downloads, web services and Semantic Web technologies. During the year the web interface received approximately 15,000 unique visitors per month on average. There were substantial increases in the extraction of data from the scientific literature; in particular, we extracted data on drug metabolism and disposition and integrated it into the database.

ChEMBL Web Services were significantly expanded and re-implemented to expose more data types and provide new functionality. In addition, we added cheminformatics Web Services based on RDKit that allow users to perform more complex queries and to combine data and chemistry-aware queries.

UniChem grew to contain links to over 100 million chemical structures from 27 source databases. For example, the UniChem web services are used on the ChEMBL web interface to provide dynamic links to other resources via the matching of the InChI/InChI Key.
We fully automated the mechanism for updating and registering compounds in UniChem, and since the start of 2015 weekly updates have been provided via the web interface, web services and as downloadable files.

At the end of 2015 the number of novel chemical entities annotated in SureChEMBL stood at approximately 17 million, growing at a rate of around 80,000 novel chemicals per month from roughly 50,000 new patents. Previously, the patent data in SureChEMBL was available only via a web interface. In 2015, in response to user demand, we increased the options for users to access the data. We now provide a quarterly download of files containing the mapping between compound structures and the patents they appear in. We also developed a data client feed that enables users to maintain a regular stream of the patent data behind a firewall and integrate it with their in-house data. Using SciBite’s Termite software, we annotated the patent corpus with dictionaries and ontologies for biological terms including genes and diseases. The mapping is available through the OpenPHACTS API.

In 2015 we intensified our work on the annotation of marketed drugs and clinical candidates with their intended therapeutic targets and diseases. This provides a rich source of data for researchers interested in validating therapeutic targets and identifying novel ones. This was carried out as part of the NIH-funded Illuminating the Druggable Genome (IDG) project and the Open Targets (formerly CTTV) project.

We continued to participate in two EU-funded projects, eTOX and HeCaToS, which aim to better identify and curate toxicity data and apply it to the prediction of toxicological endpoints. We continued our work on OpenPHACTS, an Innovative Medicines Initiative project that integrates pharmacological data across diverse resources. We also participated in the IDG project, Open Targets and Corbel, the European infrastructure project that follows on from BioMedBridges.

Research

We developed methods to predict molecular targets and off-targets using structural information from phenotypic screening data, an essential step in lead optimisation, polypharmacology and the study of side effects. Through collaborations, we validated three Mycobacterium tuberculosis (Mtb) targets from our predictions for potential tuberculosis drug leads, using biochemical and biophysical methods and by generating target-ligand structures.
In addition to developing a general theoretical model of drug resistance and drug combinations, we performed extensive data mining to retrieve information on the relationship between the type of target and the physicochemical properties of antibiotics, information that is important for the development of new drugs.

John Overington
Chemogenomics
BSc Chemistry, Bath. PhD in Crystallography, Birkbeck College, London, 1991. Postdoctoral research, ICRF, 1990-1992. Pfizer 1992-2000. Inpharmatica 2000-2008. At EMBL-EBI from 2008 to 2015.

Anne Hersey
Acting Team Leader, ChEMBL
BSc Chemistry, University of Kent. PhD in Physical Chemistry, University of Kent, 1982. GlaxoSmithKline and former companies 1982-2009. At EMBL-EBI since 2009.

Future projects and goals

In 2016 we will continue to broaden the utility and content of ChEMBL and SureChEMBL by adding further annotation, for example on diseases, targets and data measured using genetic variants of proteins. We will expand our use of ontologies to increase indexing of the ChEMBL data, particularly for complex and high-value endpoints such as ADMET and in vivo pharmacology assays. We will develop technologies that enable us to build curation and data submission interfaces in a flexible and extendable way, and use text-mining methodologies to identify journal articles that enhance our coverage of chemical space. We will continue to develop automation methods for ChEMBL to enable the database to be updated more regularly and simply. We will also develop a sub-structure and similarity search facility for UniChem.

Selected publications

Davies M, et al. (2015) ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res. 43:W612-W620

Gaulton A, et al. (2015) A large-scale crop protection bioassay data set. Scientific Data 2:150032

Mugumbate G, et al. (2015) Mycobacterial dihydrofolate reductase inhibitors identified using chemogenomic methods and in vitro validation. PLoS One 10:e0121492

Papadatos G, et al. (2016) SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Res. 44:D1220-D1228

Papadatos G, et al. (2015) Activity, assay and target data curation and quality in the ChEMBL database. J. Computer-Aided Mol. Design 29:885-896

2015 EMBL-EBI Annual Scientific Report