Annual Scientific Report 2015

ChEMBL

Drug discovery is more costly than ever, and innovation in efficacy and safety remains a significant challenge. Changes in the pharmaceutical industry over the past decade have led to an increase in drug-discovery activities in organisations that typically have access neither to large databases of legacy bioactivity data nor to the experienced staff needed to manage them. Our team develops and manages ChEMBL, EMBL-EBI’s database of quantitative small-molecule bioactivity data focused on drug discovery; SureChEMBL, a patent resource containing chemical structures extracted from patents on a daily basis; and UniChem, a resource that links chemical structures across databases, both internal and external to EMBL-EBI.

ChEMBL contains data on curated chemical structures, bioactivity values and their relationship to biological targets and phenotypic assays. SureChEMBL combines full patent text and automatically data-mined chemical structures, significantly extending the speed and scope of public data available to drug-discovery researchers. The combination of structure–activity relationship (SAR) data from the scientific literature, deposited data from neglected-disease high-throughput screens and now the patent literature makes ChEMBL an important and enabling resource for scientists working in pharmaceutical R&D.

Our research interests centre on data mining the ChEMBL database for applications relevant to translational drug discovery, including aspects of genetic variability, drug safety and neglected diseases.

Major achievements

In 2015 there was a major change in the ChEMBL Group when John Overington, who had been Team Leader since the database was taken on by EMBL-EBI in 2008, left to join the London-based biotech company Stratified Medical. Since April 2015 Anne Hersey has been Acting Team Leader.

We continued to expand the data coverage of ChEMBL to include drug-metabolism and pharmacokinetic (DMPK) data, and undertook extensive target and disease annotations on approved drugs and clinical candidates. We also developed methods to enhance and streamline the curation of data and significantly updated our Web Services as a flexible way for users to access ChEMBL data.

We further refined the SureChEMBL patent annotation pipeline to improve its robustness and provided new methods to access the annotations.

The number of databases indexed in UniChem has increased to 27. We put in place a process to update the resource automatically every week.

During 2015 ChEMBL data content continued to expand, with the number of compounds reaching 1.7 million and the number of bioactivities nearly 14 million. The full ChEMBL data remains freely available in a wide variety of technical formats, including a web interface, data downloads, web services and Semantic Web technologies. During the year the web interface received, on average, approximately 15,000 unique visitors per month. There were substantial increases in the extraction of data from the scientific literature; in particular, we extracted data on drug metabolism and disposition and integrated it into the database. ChEMBL Web Services were significantly expanded and re-implemented to expose more data types and provide new functionality. In addition, we added cheminformatics Web Services based on RDKit that allow users to perform more complex queries and to combine data and chemistry-aware queries.

UniChem grew to contain links to over 100 million chemical structures from 27 source databases. For example, the UniChem web services are used on the ChEMBL web interface to provide dynamic links to other resources by matching the InChI/InChIKey. We fully automated the mechanism of updating and registering compounds in UniChem, and since the start of 2015 weekly updates have been provided via the web interface, web services and as downloadable files.
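The idea of linking entries across databases by matching the InChIKey can be illustrated with a minimal Python sketch. This is not UniChem's implementation or API: the function names `build_index` and `cross_references` are hypothetical, and the three sample records are a small hand-picked illustration rather than an extract from the resource. It only shows the general principle that two database entries carrying the same InChIKey are treated as the same structure.

```python
from collections import defaultdict

# Illustrative sample records (an assumption for this sketch):
# (source database, source identifier, InChIKey).
RECORDS = [
    ("chembl", "CHEMBL25", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),   # aspirin
    ("drugbank", "DB00945", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),  # aspirin
    ("chembl", "CHEMBL113", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),  # caffeine
]


def build_index(records):
    """Group (source, identifier) pairs by their shared InChIKey."""
    index = defaultdict(list)
    for source, identifier, inchikey in records:
        index[inchikey].append((source, identifier))
    return index


def cross_references(index, source, identifier):
    """Return entries from other records that share an InChIKey with the query."""
    query = (source, identifier)
    links = []
    for entries in index.values():
        if query in entries:
            links.extend(entry for entry in entries if entry != query)
    return links


index = build_index(RECORDS)
print(cross_references(index, "chembl", "CHEMBL25"))
# → [('drugbank', 'DB00945')]
```

Keying the index on the InChIKey means that adding a new source database only requires appending its records; no pairwise mapping between sources has to be maintained.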

At the end of 2015 the number of novel chemical entities annotated in SureChEMBL stood at approximately 17 million, growing at a rate of around 80,000 novel chemicals per month from roughly 50,000 new patents. Previously, the patent data in SureChEMBL was available only via a web interface. In 2015, in response to user demand, we increased options for users to access the data. We now provide a quarterly download of files
