Annual Scientific Report 2015
EMBL_EBI_ASR_2015_DigitalEdition
ChEMBL<br />
Drug discovery is more costly than ever, and innovation in efficacy and safety<br />
remains a significant challenge. Changes in the pharmaceutical industry over the<br />
past decade have led to an increase in drug-discovery activities in organisations<br />
that typically have access neither to large databases of legacy bioactivity data nor<br />
the experienced staff needed to manage them. Our team develops and manages<br />
ChEMBL, EMBL-EBI’s database of quantitative small-molecule bioactivity<br />
data focused on drug discovery; SureChEMBL, a patent resource<br />
containing chemical structures extracted from patents on a daily basis; and<br />
UniChem, a resource to link chemical structures across databases, both internal<br />
and external to EMBL-EBI.<br />
ChEMBL contains data on curated chemical structures,<br />
bioactivity values and their relationship to biological<br />
targets and phenotypic assays. SureChEMBL combines<br />
full patent text and automatically data-mined chemical<br />
structures, significantly extending the speed and scope<br />
of public data available to drug-discovery researchers.<br />
The combination of structure–activity relationship<br />
(SAR) data from the scientific literature, deposited<br />
data from neglected-disease high-throughput screens<br />
and now the patent literature makes ChEMBL an<br />
important and enabling resource for scientists working<br />
in pharmaceutical R&D.<br />
Our research interests centre on data mining the<br />
ChEMBL database for applications relevant to<br />
translational drug discovery, including aspects of genetic<br />
variability, drug safety and neglected diseases.<br />
Major achievements<br />
In 2015 there was a major change in the ChEMBL Group<br />
when John Overington, who had been Team Leader<br />
since the database was taken on by EMBL-EBI in 2008,<br />
left to join the London-based biotech company Stratified<br />
Medical. Since April 2015 Anne Hersey has been Acting<br />
Team Leader.<br />
We continued to expand the data coverage of ChEMBL<br />
to include drug-metabolism and pharmacokinetic<br />
(DMPK) data, and undertook extensive target and<br />
disease annotations on approved drugs and clinical<br />
candidates. We also developed methods to enhance and<br />
streamline the curation of data and significantly updated<br />
our Web Services as a flexible way for users to access<br />
ChEMBL data.<br />
We further refined the SureChEMBL patent annotation<br />
pipeline to improve its robustness and<br />
provided new methods to access the annotations.<br />
The number of databases indexed in UniChem has<br />
increased to 27. We put in place a process to update the<br />
resource automatically every week.<br />
During 2015 ChEMBL data content continued to<br />
expand, with the number of compounds reaching<br />
1.7 million and the number of bioactivities nearly 14<br />
million. Access to the full ChEMBL data continues to<br />
be freely available in a wide variety of technical formats<br />
including a web interface, data downloads, web services<br />
and Semantic Web technologies. During the year the<br />
web interface received approximately 15,000 unique<br />
visitors per month on average. There were substantial<br />
increases in the extraction of data from the scientific<br />
literature; in particular, we extracted data on drug<br />
metabolism and disposition and integrated it into the<br />
database. ChEMBL Web Services were significantly<br />
expanded and re-implemented to expose more data<br />
types and provide new functionality. In addition, we<br />
added cheminformatics Web Services based on RDKit<br />
that allow users to perform more complex queries and to<br />
combine data and chemistry-aware queries.<br />
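As a rough illustration of how a filtered data query against REST-style web services such as these might be composed, the sketch below only builds a request URL and does not contact any server. The base URL, the resource names (`activity`, `molecule`), the filter fields (`target_chembl_id`, `standard_type`) and the helper `build_query_url` are assumptions made for this example and should be checked against the current ChEMBL Web Services documentation.

```python
from urllib.parse import urlencode

# Assumed base URL for the ChEMBL data web services; verify before use.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_query_url(resource, filters=None, fmt="json"):
    """Return a URL for a REST resource, with optional field filters.

    `resource` and the filter names are illustrative assumptions,
    not a verified list of what the service exposes.
    """
    # Sort the filters so the generated URL is deterministic.
    query = urlencode(sorted((filters or {}).items()))
    url = f"{BASE_URL}/{resource}.{fmt}"
    return f"{url}?{query}" if query else url


# Hypothetical query: IC50 measurements recorded against one target.
# "CHEMBL240" is used here purely as an illustrative target identifier.
url = build_query_url(
    "activity",
    {"target_chembl_id": "CHEMBL240", "standard_type": "IC50"},
)
print(url)
```

Keeping URL construction separate from the HTTP call makes the query easy to inspect and test before any request is sent.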
UniChem grew to contain links to over 100 million<br />
chemical structures from 27 source databases. For<br />
example, the UniChem web services are used on the<br />
ChEMBL web interface to provide dynamic links to<br />
other resources by matching the InChI or InChIKey.<br />
We fully automated the mechanism of updating and<br />
registering compounds in UniChem and, since the start<br />
of 2015, weekly updates have been provided via the web<br />
interface, web services and as downloadable files.<br />
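The idea of linking records across databases by matching a structure key can be sketched with a small in-memory index. This is a toy illustration under stated assumptions, not UniChem's implementation or API: the records, the source labels and the helper names (`build_index`, `cross_references`) are hypothetical, and each source is assumed to hold at most one identifier per InChIKey.

```python
from collections import defaultdict

# Toy records: (source database, identifier in that source, InChIKey).
# Values are illustrative only; a real index would be loaded from each source.
RECORDS = [
    ("chembl", "CHEMBL25", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("drugbank", "DB00945", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("pubchem", "2244", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("chembl", "CHEMBL113", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
    ("pubchem", "2519", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
]


def build_index(records):
    """Group source identifiers under their shared InChIKey."""
    index = defaultdict(dict)
    for source, identifier, inchikey in records:
        index[inchikey][source] = identifier
    return index


def cross_references(index, source, identifier):
    """Return identifiers in other sources that share the same InChIKey."""
    for sources in index.values():
        if sources.get(source) == identifier:
            return {s: i for s, i in sources.items() if s != source}
    return {}  # No structure with that identifier in the given source.


index = build_index(RECORDS)
links = cross_references(index, "chembl", "CHEMBL25")
print(links)
```

Because the key is derived from the structure itself, two sources that register the same structure independently end up under the same index entry without any manual mapping.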
At the end of 2015 the number of novel chemical entities<br />
annotated in SureChEMBL stood at approximately<br />
17 million, growing at a rate of around 80,000 novel<br />
chemicals per month from roughly 50,000 new patents.<br />
Previously, the patent data in SureChEMBL was<br />
available only via a web interface. In 2015, in response to<br />
user demand, we increased options for users to access<br />
the data. We now provide a quarterly download of files<br />
2015 EMBL-EBI Annual Scientific Report