5th German Conference on Cheminformatics: 23 ... - BioMed Central
5th German Conference on Cheminformatics: 23 ... - BioMed Central
5th German Conference on Cheminformatics: 23 ... - BioMed Central
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
MEETING ABSTRACTS Open Access<br />
<str<strong>on</strong>g>5th</str<strong>on</strong>g> <str<strong>on</strong>g>German</str<strong>on</strong>g> <str<strong>on</strong>g>C<strong>on</strong>ference</str<strong>on</strong>g> <strong>on</strong> <strong>Cheminformatics</strong>:<br />
<strong>23</strong>. CIC-Workshop<br />
Goslar, <str<strong>on</strong>g>German</str<strong>on</strong>g>y. 8-10 November 2009<br />
Edited by Frank Oellien, Uli Fechner and Thomas Engel<br />
Published: 04 May 2010<br />
These abstracts are available <strong>on</strong>line at http://www.jcheminf.com/supplements/2/S1<br />
INTRODUCTION<br />
A1<br />
<str<strong>on</strong>g>5th</str<strong>on</strong>g> <str<strong>on</strong>g>German</str<strong>on</strong>g> <str<strong>on</strong>g>C<strong>on</strong>ference</str<strong>on</strong>g> <strong>on</strong> Chemoinformatics: <strong>23</strong>. CIC-Workshop.<br />
November 8-10, 2009, Goslar, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Frank Oellien 1* , Uli Fechner 2 , Thomas Engel 3<br />
1 GDCh-CIC Divisi<strong>on</strong> Chair, Intervet Innovati<strong>on</strong> GmbH, Zur Propstei, 55270<br />
Schwabenheim, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 GDCh-CIC divisi<strong>on</strong> board member, Beilstein-<br />
Institut zur Förderung der Chemischen Wissenschaften, Trakehner Str. 7-9,<br />
60487 Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 3 GDCh-CIC Divisi<strong>on</strong> Co-Chair, Fakultät für Chemie<br />
und Pharmazie, Universität München, Butenandtstr. 5-13, 81377 München,<br />
<str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
E-mail: frank.oellien@sp.intervet.com<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):A1<br />
From the 8 th to the 10 th November 2009, the Chemistry-Informati<strong>on</strong>-<br />
Computers (CIC) divisi<strong>on</strong> of the <str<strong>on</strong>g>German</str<strong>on</strong>g> Chemical Society (GDCh) has invited<br />
the chemoinformatics and modeling community to Goslar, <str<strong>on</strong>g>German</str<strong>on</strong>g>y to<br />
participate in the <str<strong>on</strong>g>5th</str<strong>on</strong>g> <str<strong>on</strong>g>German</str<strong>on</strong>g> <str<strong>on</strong>g>C<strong>on</strong>ference</str<strong>on</strong>g> <strong>on</strong> Chemoinformatics<br />
(GCC 2009). The internati<strong>on</strong>al symposium addressed a broad range of<br />
modern research topics in the field of computers and chemistry. The focus<br />
was <strong>on</strong> recent developments and trends in the fields of Chemoinformatics<br />
and Drug Discovery, Chemical Informati<strong>on</strong>, Patents and Databases, Molecular<br />
Modeling, Computati<strong>on</strong>al Material Science and Nanotechnology. In additi<strong>on</strong>,<br />
other c<strong>on</strong>tributi<strong>on</strong>s from the field of Computati<strong>on</strong>al Chemistry were<br />
welcome.<br />
The c<strong>on</strong>ference was opened traditi<strong>on</strong>ally with a “Free-Software-Sessi<strong>on</strong>” <strong>on</strong><br />
Sunday afterno<strong>on</strong> right before the official c<strong>on</strong>ference opening at 5 pm<br />
including three talks about the Open Source projects Bingo, Dingo and<br />
OrChem. In parallel the “Chemoinformatics Market Place” took place<br />
including software tutorials by Chemical Computing Group, Hemlholtz-<br />
Center Munich and the Cambridge Crystallograhic Data Center.<br />
The scientific program was opened by an evening talk giving an overview<br />
<strong>on</strong> the field of Systems Chemistry (Günter v<strong>on</strong> Kiedrowski). In additi<strong>on</strong>,<br />
the program included six plenary lectures (Eberhard Voit (USA), Knut<br />
Baumann (<str<strong>on</strong>g>German</str<strong>on</strong>g>y), Thomas Kostka (<str<strong>on</strong>g>German</str<strong>on</strong>g>y), Anth<strong>on</strong>y J. Williams<br />
(USA), Kalr-Heinz Baringhaus (<str<strong>on</strong>g>German</str<strong>on</strong>g>y), Christoph Sotriffer (<str<strong>on</strong>g>German</str<strong>on</strong>g>y)],<br />
17 general lectures as well as 54 poster presentati<strong>on</strong>s.<br />
Besides the scientific program a special highlight of the c<strong>on</strong>ference were<br />
the FIZ-CHEMIE-Berlin 2009 awards <strong>on</strong> M<strong>on</strong>day afterno<strong>on</strong> (Figure 1). The<br />
CIC divisi<strong>on</strong> awards this price each year to the best diploma thesis and<br />
the best PhD in the field of Computati<strong>on</strong>al Chemistry. The price for the<br />
PhD thesis was awarded to Dr. José Batista from the group of Prof.<br />
Dr. Jürgen Bajorath, University of B<strong>on</strong>n for his dissertati<strong>on</strong> “Analysis of<br />
Random Fragment Profiles for the Detacti<strong>on</strong> of Structure-Activity<br />
Relati<strong>on</strong>ships”. The award for the best diploma thesis has g<strong>on</strong>e to Frank<br />
Tristram from the group of Dr. Wolfgang Wenzel, Karlsruher Institute of<br />
Figure 1 (abstract A1). Fiz CHEMIE Berlin Awards 2009: from left to<br />
right, Frank Tristram (award for the best diploma thesis), Rene de<br />
Planque (Head of FIZ CHEMIE Verlin), Frank Oellien (Chair of the<br />
GDCh-CIC divisi<strong>on</strong>), José Batista (award for the best PhD thesis).<br />
Technology with the title “Modellierubng der Hauptkettenbeweglichkeit in<br />
der rechnergestützten Medikamentenentwicklung”.<br />
FREE SOFTWARE SESSION<br />
PRESENTATIONS<br />
F1<br />
Bingo from SciTouch LLC: chemistry cartridge for Oracle database<br />
Dmitry Pavlov * , Mikhail Rybalkin, Boris Karulin<br />
SciTouch LLC, Budapeshtskaya st. <strong>23</strong>-2-82, RUS-192212 St. Petersburg, Russia<br />
E-mail: dpavlov@scitouch.net<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):F1<br />
Bingo is a data cartridge for Oracle database that provides the industry’s<br />
next-generati<strong>on</strong>, fast, scalable, and efficient storage and searching<br />
soluti<strong>on</strong> for chemical informati<strong>on</strong>.<br />
Bingo seamlessly integrates the chemistry into Oracle databases. Its<br />
extensible indexing is designed to enable scientists to store, index, and
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
search chemical moieties al<strong>on</strong>gside numbers and text within <strong>on</strong>e<br />
underlying relati<strong>on</strong>al database server.<br />
For molecule structure searching, Bingo supports 2D and 3D exact and<br />
substructure searches, as well as similarity, tautomer, Markush, formula,<br />
molecular weight, and flexmatch searches. For reacti<strong>on</strong> searches, Bingo<br />
supports reacti<strong>on</strong> substructure search (RSS) with opti<strong>on</strong>al automatic<br />
generati<strong>on</strong> of atom-to-atom mapping. All of these techniques are<br />
available through extensi<strong>on</strong>s to the SQL and PL/SQL syntax.<br />
Bingo also has features not present in other cartridges, for example,<br />
advanced tautomer search, res<strong>on</strong>ance substructure search, and fast<br />
updating of the index when adding new structures.<br />
The presentati<strong>on</strong> itself you can download from our site: http://<br />
opensource.scitouch.net/downloads/bingo-cic.pdf<br />
F2<br />
Dingo: 2D molecule and reacti<strong>on</strong> structural formula rendering library<br />
Mikhail Kozhevnikov * , Boris Karulin<br />
SciTouch LLC, Budapeshtskaya st. <strong>23</strong>-2-82, RUS-192212 St. Petersburg, Russia<br />
E-mail: kozhevnikov@scitouch.net<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):F2<br />
2D structural formulae are widely accepted as a way of representing<br />
chemical compounds and reacti<strong>on</strong>s. High-quality visualizati<strong>on</strong> of those<br />
formulae can greatly facilitate scientist percepti<strong>on</strong>.<br />
We are proud to present a tool for such visualizati<strong>on</strong>, called Dingo [1]. It<br />
is capable of rendering molecules, reacti<strong>on</strong>s and queries al<strong>on</strong>g with a<br />
wide range of attributes <strong>on</strong> jpgWindows, Linux, Solaris and Mac OS in<br />
accordance with IUPAC recommendati<strong>on</strong>s [2].<br />
Dingo accepts comm<strong>on</strong> file formats, such as MDL Molfile [3] and Rxnfile,<br />
SMILES [4] with several extensi<strong>on</strong>s and more. The result can be stored as<br />
a PDF, SVG, PNG or EMF file. One can also embed Dingo in an applicati<strong>on</strong><br />
and use it via simple C interface or a C# wrapper to produce image files<br />
or render directly to a Windows HDC.<br />
Dingo is a str<strong>on</strong>g competitor for existing proprietary tools <strong>on</strong> the market,<br />
while it is available under the terms of GPLv3 free of charge.<br />
The aim of the project is to provide a simple and efficient mechanism for<br />
2D structural formula rendering. High quality and the variety of<br />
supported formats allows inclusi<strong>on</strong> of resulting images in scientific<br />
papers, further editing of the output in a vector form or embedding the<br />
library in more complex applicati<strong>on</strong>s.<br />
References<br />
1. [http://opensource.scitouch.net/indigo/dingo].<br />
2. [http://www.iupac.org/objID/Article/pac8002x0277].<br />
3. [http://www.symyx.com/downloads/public/ctfile/ctfile.pdf].<br />
4. [http://www.daylight.com/smiles/].<br />
F3<br />
OrChem<br />
Mark Rijnbeek<br />
Protein and Nucleotide Database Group, Chemoinformatics and Metabolism<br />
Team, EMBL Outstati<strong>on</strong> - Hinxt<strong>on</strong>, European Bioinformatics Institute,<br />
Wellcome Trust Genome Campus, Hinxt<strong>on</strong>, Cambridge, CB10 1SD, UK<br />
E-mail: markr@ebi.ac.uk<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):F3<br />
Background: Registrati<strong>on</strong>, indexing and searching of chemical structures<br />
in relati<strong>on</strong>al databases is <strong>on</strong>e of the core areas of cheminformatics.<br />
However, little detail has been published <strong>on</strong> the inner workings of search<br />
engines and their development has been mostly closed-source. We<br />
decided to develop an open source chemistry extensi<strong>on</strong> for Oracle, the<br />
de facto database platform in the commercial world.<br />
Results: Here we present OrChem, an extensi<strong>on</strong> for the Oracle 11G<br />
database that adds registrati<strong>on</strong> and indexing of chemical structures to<br />
support fast substructure and similarity searching. The cheminformatics<br />
functi<strong>on</strong>alityisprovidedbytheChemistryDevelopmentKit.OrChem<br />
provides similarity searching with resp<strong>on</strong>se times in the order of sec<strong>on</strong>ds<br />
for databases with milli<strong>on</strong>s of compounds, depending <strong>on</strong> a given<br />
similarity cut-off. For substructure searching, it can make use of multiple<br />
processor cores <strong>on</strong> today’s powerful database servers to provide fast<br />
resp<strong>on</strong>se times in equally large data sets.<br />
Availability: OrChem is free software and can be redistributed and/or<br />
modified under the terms of the GNU Lesser General Public License as<br />
published by the Free Software Foundati<strong>on</strong>. All software is available via<br />
http://orchem.sourceforge.net.<br />
ORAL PRESENTATIONS<br />
Page 2 of 25<br />
O1<br />
Systems chemistry: from chemical self-replicati<strong>on</strong> to trisoligo-based<br />
nanoc<strong>on</strong>structi<strong>on</strong><br />
Günter v<strong>on</strong> Kiedrowski<br />
Lehrstuhl für Organische Chemie I - Bioorganische Chemie, Ruhr-Universität,<br />
Universitätsstrasse 150, 44780 Bochum, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
E-mail: kiedro@rub.de<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O1<br />
Self-replicati<strong>on</strong> is <strong>on</strong>e of the major principles without life could not exist.<br />
The emergence of self-replicating systems <strong>on</strong> the early earth is generally<br />
believed to have taken place before the advent of instructed protein<br />
synthesis based <strong>on</strong> a complex translati<strong>on</strong> machinery. Whether the origin<br />
of self-replicati<strong>on</strong> is identical to the origin of the hypothetical RNA world<br />
or whether it existed at an earlier stage of evoluti<strong>on</strong> is an open questi<strong>on</strong><br />
that has stimulated chemists to search for chemical systems capable of<br />
making copies of itselves via autocatalytic reacti<strong>on</strong>s. As self-replicati<strong>on</strong><br />
means autocatalysis plus informati<strong>on</strong> transfer, the reacti<strong>on</strong> products must<br />
necessarily be able to store more structural informati<strong>on</strong> than their<br />
precursors. Templating as a means to transfer structural informati<strong>on</strong> has<br />
been exploited since the first successful example of a chemical selfreplicating<br />
system almost two decades ago [1]. Today we have a broad<br />
variety of such systems employing olig<strong>on</strong>ucleotides, peptides, and small<br />
organic molecules as templates and autocatalytic [1-3], cross-catalytic [4],<br />
collectively autocatalytic and n<strong>on</strong>-aut<strong>on</strong>omous (stepwise) schemes of selfreplicati<strong>on</strong><br />
[5].<br />
A link between the chemical self-replicati<strong>on</strong> and nanotechnology was first<br />
pointed out by G.M.Whitesides in his debate with Drexler. The ribosome<br />
is an example for a nanomachine which may be viewed as a threedimensi<strong>on</strong>ally<br />
defined array of 51 modular proteins positi<strong>on</strong>ed by the<br />
rRNA scaffold. Biomimetic approaches towards the 3D-nano-scaffolding of<br />
modular functi<strong>on</strong>s may be based <strong>on</strong> trisolig<strong>on</strong>ucleotides [6], viz. synthetic<br />
3-arm juncti<strong>on</strong>s in which the 3’-ends of three olig<strong>on</strong>ucleotides are<br />
c<strong>on</strong>nected by a suitable linker. We report <strong>on</strong> trisoligos as building blocks<br />
for the n<strong>on</strong>covalent c<strong>on</strong>structi<strong>on</strong> of nanoobjects. Kinetic c<strong>on</strong>trol - applied<br />
by rapid cooling during self-assembly - was found to favor small and<br />
defined nano-structures instead of large polymeric networks [6]. Maximal<br />
instructi<strong>on</strong> was employed as design principles to generate n<strong>on</strong>covalent<br />
3D-nano-objects in which both, the topology and the geometry is<br />
defined [7]. Examples include dodecahedral nano-scaffolds composed<br />
from 20 trisoligos each bearing three individually defined sequences [8].<br />
It was dem<strong>on</strong>strated that the c<strong>on</strong>nectivity informati<strong>on</strong> in the nanoscaffold<br />
juncti<strong>on</strong>s can be copied by chemical means [9]. Chemical<br />
copyingschemesmaybeseeninc<strong>on</strong>juncti<strong>on</strong>with“surface-promoted<br />
replicati<strong>on</strong> and exp<strong>on</strong>ential amplificati<strong>on</strong> of DNA analogues” (SPREAD) [5].<br />
Applicati<strong>on</strong>s of such scaffolds include the positi<strong>on</strong>ing of modular<br />
functi<strong>on</strong>s such as multidentate thioether-based gold cluster labels<br />
(RUBiGold) [10] which have been tailored for m<strong>on</strong>oc<strong>on</strong>jugability and<br />
thermostability.<br />
My lecture will introduce chemical self-replicati<strong>on</strong> and multicomp<strong>on</strong>ent<br />
assembly as facets of systems chemistry [3,11,12] - a nascent field which<br />
understands itself as the bottom-up pendant of systems biology towards<br />
synthetic biology [14]. The field is clearly inspired by the origin-of-life<br />
problem but goes bey<strong>on</strong>d traditi<strong>on</strong>al prebiotic chemistry in its missi<strong>on</strong><br />
towards a quantititive dynamic understanding of complex reacti<strong>on</strong><br />
networks with autocatalytic comp<strong>on</strong>ents. Software tools like our SimFit<br />
and kinetic NMR titrati<strong>on</strong> [2] may significantly help to decipher signatures<br />
of interesting dynamic phenomena such as self-replicati<strong>on</strong>, chiral<br />
symmetry-breaking, and metabolic autocatalysis found in organic [13]<br />
and biomolecular [12] reacti<strong>on</strong> systems.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
References<br />
1. v<strong>on</strong> Kiedrowski G: Angew Chem Int Ed Engl 1986, 25:932.<br />
2. Stahl I, v<strong>on</strong> Kiedrowski G: J Am Chem Soc 2006, 28:14014-14015.<br />
3. Kindermann M, et al: Angew Chem 2005, 117:6908-6913.<br />
4. Sievers D, v<strong>on</strong> Kiedrowski G: Nature 1994, 369:221.<br />
5. Luther A, Brandsch R, v<strong>on</strong> Kiedrowski G: Nature 1998, 396:245.<br />
6. Scheffler M, et al: Angew Chem Int Ed Engl 1999, 38:3312.<br />
7. v<strong>on</strong> Kiedrowski G, et al: Pure Appl Chem 2003, 75:609.<br />
8. Zimmermann J, et al: Angew Chem 2008, 120:3682-3686.<br />
9. Eckardt L, et al: Nature 2002, 420:286.<br />
10. Pankau WM, Mönninghoff S, v<strong>on</strong> Kiedrowski G: Angew Chem 2006,<br />
118:19<strong>23</strong>-1926.<br />
11. Stankiewicz J, Eckardt LH: Angew Chem 2006, 118:350-352.<br />
12. Hayden EJ, v<strong>on</strong> Kiedrowski G, Lehman N: Angew Chem 2008, 120:8552-8556.<br />
13. Kramer D, Pankau WM, v<strong>on</strong> Kiedrowski G: in press.<br />
14. MoU.<br />
O2<br />
The role of systems modeling in drug discovery and predictive health<br />
Eberhard O Voit<br />
Integrative BioSystems Institute and Wallace H. Coulter Department of<br />
Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive,<br />
Atlanta, GA 30332-0535, USA<br />
E-mail: eberhard.voit@bme.gatech.edu<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O2<br />
Systems biology is the result of a c<strong>on</strong>fluence of recent advances in<br />
molecular biology, engineering, and the computati<strong>on</strong>al sciences. It can<br />
loosely be categorized into experimental and computati<strong>on</strong>al systems<br />
biology. Experimental high-throughput methods, assisted by robotics, image<br />
analysis, and bioinformatics, have been used in the drug industry for quite a<br />
while, and current screening tests for drug efficacy and toxicity regularly<br />
involve genomic, proteomic, and molecular modeling approaches. By<br />
c<strong>on</strong>trast, the role of computati<strong>on</strong>al methods of biological systems analysis is<br />
still emerging. This presentati<strong>on</strong> focuses <strong>on</strong> computati<strong>on</strong>al systems<br />
modeling and its increasingly important role at several junctures of the drug<br />
development pipeline. Examples to be discussed include mathematical<br />
models for receptor dynamics, pharmacokinetics, and metabolic and<br />
signaling pathway analysis. In the c<strong>on</strong>text of the latter, Biochemical Systems<br />
Theory is proposed as a highly advantageous default framework for model<br />
design, diagnostics, manipulati<strong>on</strong>, and system optimizati<strong>on</strong>. The<br />
development of dynamic models for complex disease processes permits the<br />
straightforward inclusi<strong>on</strong> of methods for custom-tailoring models, which is a<br />
key step toward pers<strong>on</strong>alized medicine and predictive health [1-5].<br />
References<br />
1. Voit EO: Computati<strong>on</strong>al Analysis of Biochemical Systems. A Practical Guide<br />
for Biochemists and Molecular Biologists Cambridge University Press,<br />
Cambridge, UK 2000, xii + 530.<br />
2. Voit EO: Metabolic modeling: A tool of drug discovery in the postgenomic<br />
era. Drug Discovery Today 2002, 7(11):621-628.<br />
3. Voit EO: The dawn of a new era of metabolic systems analysis. Drug<br />
Discovery Today BioSilico 2004, 2(5):182-189.<br />
4. Voit EO, Brigham KL: The role of systems biology in predictive health and<br />
pers<strong>on</strong>alized medicine. The Open Path. J 2008, 2:68-70.<br />
5. Voit EO: A systems-theoretical framework for health and disease. Math<br />
Biosc 2009, 217:11-18.<br />
O3<br />
Molecular bioactivity extrapolati<strong>on</strong> to novel targets by support vector<br />
machines<br />
Gerard JP Van Westen 1* ,JKWegner 2 , AP IJzerman 1 , HWT Van Vlijmen 2 , A Bender 1<br />
1 Amsterdam Center for Drug Research, Einsteinweg 55, <strong>23</strong>33 CC, Leiden, The<br />
Netherlands; 2 Tibotec, Gen De Wittelaan L 11B 3, 2800 Mechelen, Belgium<br />
E-mail: gerard@lacdr.leidenuniv.nl<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O3<br />
The early phases of drug discovery use in silico models to rati<strong>on</strong>alize<br />
structure activity relati<strong>on</strong>ships, and to predict the activity of novel<br />
Page 3 of 25<br />
compounds. However, the performance of these models is not always<br />
acceptable and the reliability of external predicti<strong>on</strong>s - both to novel<br />
compounds and to related protein targets - is often limited.<br />
Proteochemometric modeling [1] adds a target descripti<strong>on</strong>, based <strong>on</strong><br />
physicochemical properties of the binding site, to these models.<br />
Our proteochemometric models [2] are based <strong>on</strong> Scitegic circular<br />
fingerprints <strong>on</strong> the compound side and <strong>on</strong> a customized protein<br />
fingerprint <strong>on</strong> the target side. This protein fingerprint is based <strong>on</strong> a<br />
selecti<strong>on</strong> of physicochemical descriptors obtained from the AAindex<br />
database. Through PCA we selected a number of physicochemical<br />
properties which are hashed in a fingerprint using the Scitegic hashing<br />
algorithm. We compared this fingerprint to a number of protein<br />
descriptors previously published, including the Z-scales, the FASGAI and<br />
the BLOSUM descriptors. Our fingerprint performs superior to all of<br />
these. In additi<strong>on</strong>, we show that proteochemometric models improve<br />
external predicti<strong>on</strong> capabilities. In the case of classificati<strong>on</strong> this leads to<br />
models with a higher specificity when compared to c<strong>on</strong>venti<strong>on</strong>al QSAR.<br />
In the case of regressi<strong>on</strong> our models show an average lower RMSE of<br />
0.12 log units when based <strong>on</strong> a pIC50 output variable compared to<br />
c<strong>on</strong>venti<strong>on</strong>al QSAR modeling the same data-set. Furthermore, our<br />
models enable target extrapolati<strong>on</strong>. As a result we can predict the<br />
activity of known and new compounds <strong>on</strong> new targets while retaining<br />
the same model quality as when performing external validati<strong>on</strong> without<br />
target extrapolati<strong>on</strong>.<br />
References<br />
1. Freyhult E, Prusis P, Lapinsh M, Wikberg JE, Moult<strong>on</strong> V, Gustafss<strong>on</strong> MG:<br />
Unbiased descriptor and parameter selecti<strong>on</strong> c<strong>on</strong>firms the<br />
potential of proteochemometric modelling. BMC Bioinformatics 2005,<br />
6:50.<br />
2. Doddareddy MKR, Van Westen GJP, Horst Van der E, Peir<strong>on</strong>cely JE,<br />
Corthals F, IJzerman AP, Emmerich M, Jenkins JL, Bender A:<br />
Chemogenomics: Looking at Biology through the Lens of Chemistry. Stat<br />
Anal Data Mining 2009 in press.<br />
O4<br />
Representati<strong>on</strong> and searching of biomolecules<br />
Joeseph L Durant * , WL Chen, BD Christie, DL Grier, BA Leland, JG Nourse<br />
Symyx Technologies, 2440 Camino Ram<strong>on</strong>, San Ram<strong>on</strong>, California, USA<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O4<br />
Biomolecules present challenges to chemical informati<strong>on</strong> systems<br />
designed for small molecules. Their sizes, up to tens of thousands of<br />
atoms, overwhelm representati<strong>on</strong>/storage/searching soluti<strong>on</strong>s built <strong>on</strong><br />
explicit chemical representati<strong>on</strong> of the structures. But biomolecules are<br />
largely made up of many repeats of a limited number of buildingblock<br />
molecules, a fact which has been used to provide a compressed<br />
representati<strong>on</strong> for biomolecules using templates for the building<br />
blocks.<br />
We have adopted a modified template-based representati<strong>on</strong> for<br />
biomolecules. Our primary interest is in the chemically modified porti<strong>on</strong>s<br />
of biomolecules, for which we choose to use explicit chemistry. These<br />
areas of explicit chemistry are then embedded in the templatecompressed,<br />
unmodified porti<strong>on</strong>s of the full biomolecule.<br />
The regi<strong>on</strong>s c<strong>on</strong>taining explicit chemistry are indexed, and thus can be<br />
structure searched with good performance. A limited number of residues<br />
surrounding explicit chemistry regi<strong>on</strong>s are included in the index for<br />
searching the c<strong>on</strong>text of these explicit regi<strong>on</strong>s. By using explicit chemistry<br />
to represent modified regi<strong>on</strong>s we can search across classes of<br />
modificati<strong>on</strong>s for comm<strong>on</strong> features. For example a single substructure<br />
search query will find green fluorescent protein, and its histidine,<br />
phenylalanine and tryptophan analogs.<br />
Templates are stored with the structure providing a self-c<strong>on</strong>tained file<br />
format. The use of NEMA keys allows templates from different structures<br />
to be compared, and allows storage of structures c<strong>on</strong>taining a can<strong>on</strong>ical<br />
list of templates. The residues have defined attachment points, allowing<br />
automated traversal of a protein backb<strong>on</strong>e, or locati<strong>on</strong> of n<strong>on</strong>-backb<strong>on</strong>e<br />
b<strong>on</strong>ds to residues.<br />
We will present example structures and structural queries highlighting<br />
capabilities of our representati<strong>on</strong>.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
O5<br />
Cross-validati<strong>on</strong> is dead. L<strong>on</strong>g live cross-validati<strong>on</strong>! Model validati<strong>on</strong><br />
based <strong>on</strong> resampling<br />
Knut Baumann<br />
Institute of Pharmaceutical Chemistry, University of Technology<br />
Braunschweig, Beethovenstr. 55, D-38106 Braunschweig, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O5<br />
Cross-validati<strong>on</strong> was originally invented to estimate the predicti<strong>on</strong> error of<br />
a mathematical modelling procedure. It can be shown that crossvalidati<strong>on</strong><br />
estimates the predicti<strong>on</strong> error almost unbiasedly. N<strong>on</strong>etheless,<br />
there are numerous reports in the chemoinformatic literature that crossvalidated<br />
figures of merit cannot be trusted and that a so-called external<br />
test set has to be used to estimate the predicti<strong>on</strong> error of a mathematical<br />
model. In most cases where cross-validati<strong>on</strong> fails to estimate the<br />
predicti<strong>on</strong> error correctly, this can be traced back to the fact that it was<br />
employed as an objective functi<strong>on</strong> for model selecti<strong>on</strong>. Typically each<br />
model has some meta-parameters that need to be tuned such as the<br />
choice of the actual descriptors and the number of variables in a QSAR<br />
equati<strong>on</strong>, the network topology of a neural net, or the complexity of a<br />
decisi<strong>on</strong> tree. In this case the meta-parameter is varied and the crossvalidated<br />
predicti<strong>on</strong> error is determined for each setting. Finally, the<br />
parameter setting is chosen that optimizes the cross-validated predicti<strong>on</strong><br />
error in an attempt to optimize the predictivity of the model. However, in<br />
these cases cross-validati<strong>on</strong> is no l<strong>on</strong>ger an unbiased estimator of the<br />
predicti<strong>on</strong> error and may grossly deviate from the result of an external<br />
test set. It can be shown that the “amount” of model selecti<strong>on</strong> can<br />
directly be related to the inflati<strong>on</strong> of cross-validated figures of merit.<br />
Hence, the model selecti<strong>on</strong> step has to be separated from the step of<br />
estimating the predicti<strong>on</strong> error. If this is d<strong>on</strong>e correctly, cross-validati<strong>on</strong><br />
(or resampling in general) retains its property of unbiasedly estimating<br />
the predicti<strong>on</strong> error. Matter of factly, it can be shown that data splitting<br />
into a training set and an external test set often estimates the predicti<strong>on</strong><br />
error less precise than proper cross-validati<strong>on</strong>. It is this variabability of<br />
predicti<strong>on</strong> errors, which depends <strong>on</strong> test set size, that causes seemingly<br />
paradox phenomena such as the so-called “Kubinyi’s paradox<strong>on</strong>” for small<br />
data sets.<br />
O6<br />
A unified approach to the applicability domain problem of<br />
QSAR models<br />
Dragos Horvath * , Gilles Marcou, Alexandre Varnek<br />
Laboratoire d’InfoChimie, Université de Strasbourg - CNRS, Institut de Chimie,<br />
4 rue Blaise Pascal, 67000 Strasbourg, France<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O6<br />
The present work proposes a unified c<strong>on</strong>ceptual framework to describe<br />
and quantify the important issue of the Applicability Domains (AD) of<br />
Quantitative Structure-Activity Relati<strong>on</strong>ships (QSARs). AD models are<br />
c<strong>on</strong>ceived as meta-models designed to associate an untrustworthiness<br />
score to any molecule M subject to property predicti<strong>on</strong> by a QSAR<br />
model. Untrustworthiness scores or “AD metrics” are an expressi<strong>on</strong> of the<br />
relati<strong>on</strong>ship between M (represented by its descriptors in chemical space)<br />
and the space z<strong>on</strong>es populated by the training molecules at the basis of<br />
model μ. Scores integrating some of the classical AD criteria (similaritybased,<br />
box-based) were c<strong>on</strong>sidered in additi<strong>on</strong> to newly invented terms,<br />
such as the dissimilarity to outlier-free training sets and the correlati<strong>on</strong><br />
breakdown count.<br />
A loose correlati<strong>on</strong> is expected to exist between this untrustworthiness<br />
and the error affecting the predicted property. While high<br />
untrustworthiness does not preclude correct predicti<strong>on</strong>s, inaccurate<br />
predicti<strong>on</strong>s at low untrustworthiness must be imperatively avoided. This<br />
kind of relati<strong>on</strong>ship is characteristic for the Neighborhood Behavior (NB)<br />
problem: dissimilar molecule pairs may or may not display similar<br />
properties, but similar molecule pairs with different properties are<br />
explicitly “forbidden”. Therefore, statistical tools developed to tackle this<br />
latter aspect were applied, and lead to a unified AD metric<br />
benchmarking scheme.<br />
A first use of untrustworthiness scores resides in prioritizati<strong>on</strong> of<br />
predicti<strong>on</strong>s, without need to specify a hard AD border. Moreover, if a<br />
Page 4 of 25<br />
significant set of external compounds is available, the formalism allows<br />
optimal AD borderlines to be fitted. Eventually, c<strong>on</strong>sensus AD definiti<strong>on</strong>s<br />
were built by means of a n<strong>on</strong>parametric mixing scheme of two AD<br />
metrics of comparable quality, and shown to outperform their respective<br />
parents.<br />
O7<br />
Quantifying model errors using similarity to training data<br />
Rob D Brown 1* , JD H<strong>on</strong>eycutt 1 , SL Aar<strong>on</strong> 2<br />
1 2<br />
Accelrys Inc, 10188 Telesis Court, San Diego, CA 92121, USA; Accelrys Inc,<br />
Cambridge, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O7<br />
When making a predicti<strong>on</strong> with a statistical model, it is not sufficient to<br />
know that the model is “good”, in the sense that it is able to make<br />
accurate predicti<strong>on</strong>s <strong>on</strong> test data. Another relevant questi<strong>on</strong> is: How good<br />
is the model for a specific sample whose properties we wish to predict?<br />
Stated another way: Is the sample within or outside the model’s domain<br />
of applicability or what is the degree to which a test compound is within<br />
the model’s domain of applicability. Numerous studies have been d<strong>on</strong>e<br />
<strong>on</strong> determining appropriate measures to address this questi<strong>on</strong> [1-4]. Here<br />
we focus <strong>on</strong> a derivative questi<strong>on</strong>: Can we determine an applicability<br />
domain measure suitable for deriving quantitative error bars – that is,<br />
error bars which accurately reflect the expected error when making<br />
predicti<strong>on</strong>s for specified values of the domain measure? Such a measure<br />
could then be used to provide an indicati<strong>on</strong> of the c<strong>on</strong>fidence in a given<br />
predicti<strong>on</strong> (i.e. the likely error in a predicti<strong>on</strong> based <strong>on</strong> to what degree<br />
the test compound is part of the model’s domain of applicability). Ideally,<br />
we wish such a measure to be simple to calculate and to understand, to<br />
apply to models of all types – including classificati<strong>on</strong> and regressi<strong>on</strong><br />
models for both molecular and n<strong>on</strong>-molecular data - and to be free of<br />
adjustable parameters. C<strong>on</strong>sistent with recent work by others [5,6], the<br />
measures we have seen that best meet these criteria are distances to<br />
individual samples in the training data. We describe our attempts to<br />
c<strong>on</strong>struct a recipe for deriving quantitative error bars from these<br />
distances.<br />
References<br />
1. Erikss<strong>on</strong> L, Jaworska J, Worth AP, Cr<strong>on</strong>in MTD, McDowell RM, Gramatica P:<br />
Methods for Reliability and Uncertainty Assessment and for Applicability<br />
Evaluati<strong>on</strong>s of Classificati<strong>on</strong>- and Regressi<strong>on</strong>-Based QSARs. Envir<strong>on</strong>mental<br />
Health Perspectives 2003, 111:1361.<br />
2. Tropsha A, Gramatica P, Gombar VK: The Importance of Being Earnest:<br />
Validati<strong>on</strong> is the Absolute Essential for Successful Applicati<strong>on</strong> and<br />
Interpretati<strong>on</strong> of QSPR Models. QSAR Comb Sci 2003, 22:69.<br />
3. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: QSAR applicabilty domain<br />
estimati<strong>on</strong> by projecti<strong>on</strong> of the training set descriptor space: a review.<br />
Altern Lab Anim 2005, 33:445-59.<br />
4. Stanforth RW, Kolossov E, Mirkin B: A Measure of Domain of Applicability<br />
for QSAR Modelling Based <strong>on</strong> Intelligent K-Means Clustering. QSAR &<br />
Combinatorial Science 2007, 26:837.<br />
5. Sheridan RP, Feust<strong>on</strong> BP, Maiorov VN, Kearsley SK: Similarity to Molecules<br />
in the Training Set Is a Good Discriminator for Predicti<strong>on</strong> Accuracy in<br />
QSAR. J Chem Inf Comput Sci 2004, 44:1912.<br />
6. Horvath D, Marcou G, Varnek A: Predicting the Predictability: A Unified<br />
Approach to the Applicability Domain Problem of QSAR Models. J Chem<br />
Inf Comput Sci 2009, 49:49.<br />
O8<br />
A Branch-and-Bound approach for tautomer enumerati<strong>on</strong><br />
Torsten Thalheim * , R-U Ebert, R Kühne, G Schüürmann<br />
UFZ Department Ecological Chemistry - Helmholtz Centre for Envir<strong>on</strong>mental<br />
Research, Permoser Str. 15, 04318 Leipzig, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O8<br />
For molecules with mobile H atoms, the result of quantitative structureactivity<br />
relati<strong>on</strong>ships (QSARs) depends <strong>on</strong> the positi<strong>on</strong> of the respective<br />
hydrogen atoms. Thus to obtain reliable results, tautomerism needs to be<br />
taken into account.<br />
In the last years, many approaches were introduced to achieve this. In<br />
this work we present a further development of our previous algorithm
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
based <strong>on</strong> InChI-layers. While the InChI ansatz supports <strong>on</strong>ly heteroatomtautomerism,<br />
we suggest an extensi<strong>on</strong> regarding carb<strong>on</strong> atoms too.<br />
Whereas with other tautomer generating algorithms the hydrogen shifts<br />
are based <strong>on</strong> pattern-rules, we try to overcome the rule c<strong>on</strong>stricti<strong>on</strong> and<br />
evolve a more comm<strong>on</strong> soluti<strong>on</strong>. The advantage of our approach is quite<br />
simple. Due to the avoidance of a rule system with its necessity for<br />
excepti<strong>on</strong>s to the rules, we can apply our soluti<strong>on</strong> to any kind of<br />
tautomerism definiti<strong>on</strong>.<br />
We set up a Branch-and-Bound approach, which is optimized to generate<br />
a complete enumerati<strong>on</strong> of all tautomers, with regard to a certain<br />
definiti<strong>on</strong>, from any structure. With few and easy decisi<strong>on</strong>s like symmetry<br />
detecti<strong>on</strong>, we avoid a lot of calculati<strong>on</strong> overhead. Decisi<strong>on</strong>s with<br />
significant influence <strong>on</strong> the algorithm efficiency are made as early as<br />
possible.<br />
We have set up several kinds of tautomer definiti<strong>on</strong>s and derived a stable<br />
definiti<strong>on</strong> covering the major kinds of protropic tautomerism.<br />
Furthermore we analyzed, what expenditure of time for large databases<br />
(case study: more than 70,000 entries) is needed to investigate which<br />
structures have tautomers and which not for more than 99% of the<br />
database entries.<br />
This study has been financially supported by the EU project OSIRIS<br />
(IP, c<strong>on</strong>tract no. 037017).<br />
O9<br />
KnowledgeSpace - a publicly available virtual chemistry space<br />
Carsten Detering * , H Claussen, M Gastreich, C Lemmen<br />
BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O9<br />
Virtual high throughput screening of in-house compound collecti<strong>on</strong>s and<br />
vendor catalogs is a validated approach in the quest for novel molecular<br />
entities. However, these libraries are small compared to the overall<br />
synthesizable number of compounds from validated “wet” chemical<br />
reacti<strong>on</strong>s in pharma companies or the public domain.<br />
In order to overcome this limitati<strong>on</strong>, we designed a large virtual<br />
combinatorial chemistry space from publicly available combinatorial<br />
libraries that gives access to billi<strong>on</strong>s of synthetically accessible<br />
compounds. Together with FTrees, a fuzzy similarity calculator, the<br />
researcher has a means of searching this KnowledgeSpace for analogues<br />
to <strong>on</strong>e or several query molecules within a few minutes. The resulting<br />
compounds not <strong>on</strong>ly exhibit similar properties to the query molecule(s),<br />
but also feature an annotati<strong>on</strong> through which of the synthetic routes<br />
these molecules can be made. Results can be expected to be diverse,<br />
based <strong>on</strong> FTrees scaffold hopping capabilities, and provide ideas for hit<br />
follow-up into novel compound classes.<br />
In this c<strong>on</strong>tributi<strong>on</strong> we present the design and properties of the<br />
KnowledgeSpace and other in-house chemistry spaces that build <strong>on</strong> the<br />
same strategy as well as validati<strong>on</strong> of results and a number of successful<br />
applicati<strong>on</strong>s including prospective results.<br />
O10<br />
3D pharmacophore alignments: does improved geometric accuracy<br />
affect virtual screening performance?<br />
Gerhard Wolber 1* , T Seidel 2 , F Bendix 2<br />
1 University of Innsbruck, Institute of Pharmacy, Dept. Pharmaceutical<br />
Chemistry, Innrain 52, 6020 Innsbruck, Austria; 2 Inte:Ligand GmbH,<br />
Mariahilferstrasse 74B/11, A-1070 Vienna, Austria<br />
E-mail: gerhard.wolber@uibk.ac.at<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O10<br />
Virtual screening using three-dimensi<strong>on</strong>al arrangements of chemical<br />
features (3D pharmacophores) has become an important method in<br />
computer-aided drug design. Although frequently used, c<strong>on</strong>siderable<br />
differences exist in the interpretati<strong>on</strong> of these chemical features and their<br />
corresp<strong>on</strong>ding 3D overlay algorithms. We have recently developed an<br />
efficient and accurate 3D alignment algorithm based <strong>on</strong> a pattern<br />
recogniti<strong>on</strong> technique [1]. In the presented work, we extend this<br />
algorithm to be used for high-performance virtual database screening<br />
and investigate, whether applying this geometrically more accurate 3D<br />
Page 5 of 25<br />
alignment algorithm improves virtual screening results over c<strong>on</strong>venti<strong>on</strong>al<br />
incremental n-point distance matching approaches.<br />
Reference<br />
1. Wolber G, Dornhofer A, Langer T: Efficient overlay of small molecules<br />
using 3-D pharmacophores. J Comput-Aided Mol Design 2006,<br />
20(12):773-788.<br />
O11<br />
The protein flexibility in receptor-ligand docking simulati<strong>on</strong>s<br />
Frank Tristram<br />
Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermannv<strong>on</strong>-Helmholtz-Platz<br />
1, 76344 Eggenstein-Leopoldshafen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O11<br />
Many small-molecule drugs work by binding specifically to a target<br />
protein in the cell. It is known for over a century that both the ligand<br />
and protein receptor change their c<strong>on</strong>formati<strong>on</strong> in the associati<strong>on</strong><br />
process, which is called the induced-fit effect. Ligand c<strong>on</strong>formati<strong>on</strong>al<br />
change is routinely treated in methods for in-silico drug discovery,<br />
because typical drug molecules have seldom more than 100 atoms. The<br />
flexibility of proteins, which often have more than 10000 atoms, is much<br />
harder to treat and was therefore neglected in most high-speed docking<br />
methods, limiting their accuracy and predictive value.<br />
In this project we have developed an efficient numerical search<br />
procedure that succeeds to model the interacti<strong>on</strong> between a flexible<br />
ligand and important flexible parts of the protein in a computati<strong>on</strong>ally<br />
affordable protocol. Our virtual screening software FlexScreen thus<br />
minimizes the total interacti<strong>on</strong> energy of the emerging protein-ligand<br />
complex including flexible backb<strong>on</strong>e regi<strong>on</strong>s. We have dem<strong>on</strong>strated the<br />
reliability and accuracy of this approach <strong>on</strong> several examples, for which at<br />
least two different receptor-c<strong>on</strong>formati<strong>on</strong>s have been experimentally<br />
observed. We succeeded to predict the correct binding pose of a proteinligand<br />
complex starting from the crystal structure of an unrelated protein<br />
c<strong>on</strong>formati<strong>on</strong>, where large c<strong>on</strong>formati<strong>on</strong>al changes of the protein are<br />
necessary to bind the ligand. Correctly predicting such c<strong>on</strong>formati<strong>on</strong>al<br />
changes makes our approach attractive for virtual screening of medium<br />
sized databases, in particular for kinases and other target receptors which<br />
have flexible binding pockets.<br />
O12<br />
Random molecular substructures as fragment-type descriptors<br />
José Batista 1,2<br />
1 Jado Technologies GmbH, Tatzberg 47-51, D-01307 Dresden, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />
2 Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical<br />
Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universtität<br />
B<strong>on</strong>n, Dahlmannstr. 2, D-53113 B<strong>on</strong>n, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O12<br />
A novel approach for analysis of structure-activity relati<strong>on</strong>ships for sets of<br />
active compounds is reported. Molecular similarity relati<strong>on</strong>ships are<br />
analyzed am<strong>on</strong>g compounds with related biological activity based <strong>on</strong> the<br />
evaluati<strong>on</strong> of randomly generated fragment populati<strong>on</strong>s. Fragments are<br />
randomly generated by iterative b<strong>on</strong>d deleti<strong>on</strong> in molecular graphs and<br />
hence depart from those obtained by well-defined fragmentati<strong>on</strong><br />
schemes [1]. A major finding has been that for any given activity class,<br />
dependency relati<strong>on</strong>ships of fragment co-occurrence exists within<br />
random fragment populati<strong>on</strong>s. Through the analysis of these relati<strong>on</strong>ships<br />
the chemical informati<strong>on</strong> c<strong>on</strong>tent of generated substructures can be<br />
quantified [2]. This has lead to the identificati<strong>on</strong> of small numbers (10-<br />
40) of so-called activity class characteristic substructures (ACCS) that have<br />
been successfully utilized in virtual screening applicati<strong>on</strong>s [3,4].<br />
References<br />
1. Batista J, Godden JW, Bajorath J: J Chem Inf Model 2006, 46(5):1937-1944,<br />
(PMID: 16995724).<br />
2. Batista J, Bajorath J: J Chem Inf Model 2007, 47(4):1405-1413, (PMID:<br />
17585755).<br />
3. Batista J, Bajorath J: Chem Med Chem 2008, 3(1):67-73, (PMID: 17952883).<br />
4. Stumpfe D, Frizler M, Sisay MT, Batista J, Vogt I, Gütschow M, Bajorath J:<br />
Chem Med Chem 2009, 4(1):52-54, (PMID: 19053132).
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
O13<br />
Making people’s lives easier, better and more beautiful - and how<br />
computer-aided materials design may help<br />
Thomas Kostka<br />
Henkel AG & KGaA, Henkelstrasse 67, 40191 Düsseldorf, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O13<br />
Computer-based simulati<strong>on</strong>s provide new insights at the molecular scale<br />
of chemical processes and material properties - new products may be<br />
developed more ec<strong>on</strong>omically by a more focused strategy. Thus, the<br />
overall product development cycle for new materials and active<br />
ingredients can be reduced significantly.<br />
R&D explores and evaluates c<strong>on</strong>tinuously these possibilities and utilizes<br />
the enormous potential linked with scientific computing technologies.<br />
The method portfolio ranges from quantum chemical, molecular<br />
mechanics and mesoscopic simulati<strong>on</strong>s to QSAR approaches.<br />
Examples will be given how scientific computing methods c<strong>on</strong>tribute to<br />
innovative products with state-of-the-art molecular modeling and<br />
mathematical methods - both, for c<strong>on</strong>sumer goods and industrial<br />
applicati<strong>on</strong>s.<br />
O14<br />
Biomaterialcoatings - a challenging task studied by the molecular<br />
fragment dynamics<br />
Carsten Wittekindt * , H Kuhn<br />
CAM-D Technologies GmbH, Schützenbahn 70, 45117 Essen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
E-mail: wittekindt@molecular-dynamics.de<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O14<br />
A special and important task in medicinal implant technology is to<br />
preventthematerialfromfoulingormakeitcompatiblefortissuesby<br />
coating the material with functi<strong>on</strong>alized lipid biomolecules. In order to<br />
rati<strong>on</strong>alize the design of such new materials structural informati<strong>on</strong> about<br />
the film is needed. In this presentati<strong>on</strong> we give an account <strong>on</strong> the<br />
investigati<strong>on</strong> of coating of the tetraetherlipid GDNT <strong>on</strong> a borofluorate<br />
surface in different solvents. To gain insights into the dynamic process <strong>on</strong><br />
the microsec<strong>on</strong>d and micrometer scale we applied our recently<br />
developed Molecular Fragment Dynamics method (MFD) [1].<br />
Aside of the size of the system and the durati<strong>on</strong> of the simulati<strong>on</strong>, the<br />
challenging task of the investigati<strong>on</strong> lies in the divers chemical<br />
functi<strong>on</strong>ality of the compounds. The applied MFD algorithm is developed<br />
to resolve especially these problems. The MFD algorithm is based <strong>on</strong> the<br />
Dissipative-Particle-Dynamics method (DPD). Whereas in the DPD method<br />
the system is divided into different regi<strong>on</strong>s of fluid subsystems, the MFD<br />
algorithm calculates the interacti<strong>on</strong> of molecular fragments.<br />
C<strong>on</strong>sequently, the MFD method is well suited for the molecular<br />
modelling simulati<strong>on</strong> of systems having a diverse variety of chemical<br />
functi<strong>on</strong>alities thus allowing for investigati<strong>on</strong> of biocoatings, polymers<br />
and tensids. Figure 1.<br />
Reference<br />
1. Ryjkina E, Kuhn H, Rehage H, Müller F, Peggau J: Molecular Dynamic<br />
Computer Simulati<strong>on</strong>s of Phase Behavior of N<strong>on</strong>-I<strong>on</strong>ic Surfactants.<br />
Angew Chem Int Ed 2002, 41:983.<br />
Figure 1 (abstract O14).<br />
O15<br />
Mechanistic studies <strong>on</strong> the ring-opening polymerisati<strong>on</strong> of D,L-lactide<br />
with zinc guanidine complexes<br />
S Herres-Pawlis 1* , J Börner 2 , Vieira I dos Santos 2 , U Flörke 2<br />
1 2<br />
TU Dortmund, Fakultät Chemie, 44221 Dortmund, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; University of<br />
Paderborn, Paderborn, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O15<br />
The increasing shortage of resources has enforced the search for<br />
producti<strong>on</strong> techniques for biodegradable polymers made of renewable<br />
raw materials. An example is polylactide (PLA), an aliphatic polyester,<br />
which can be obtained by the metal-catalysed ring-opening<br />
polymerisati<strong>on</strong> (ROP) of lactide (Fig. 1). The m<strong>on</strong>omer for PLA producti<strong>on</strong><br />
is available from corn or sugar beets by a bacterial fermentati<strong>on</strong> process<br />
in few steps [1] The PLA can be either recycled or composted after use,<br />
making the use CO 2-emissi<strong>on</strong>-neutral. Most large-scale processes are<br />
based <strong>on</strong> the use of tin compounds as initiators [2]. However, for use in<br />
food packaging or similar applicati<strong>on</strong>s, heavy metals are undesirable [3].<br />
In order to substitute heavy-metal-based catalyst systems, especially zinc<br />
guanidine complexes are suited for the polymerisati<strong>on</strong> due to their<br />
polymerisati<strong>on</strong> activity and n<strong>on</strong>-toxicity [4]. The investigati<strong>on</strong>s <strong>on</strong> the<br />
catalytic potential in the bulk polymerisati<strong>on</strong> of lactide have shown that<br />
these compounds are able to act as initiators for lactide polymerisati<strong>on</strong><br />
with <strong>on</strong>ly few excepti<strong>on</strong>s. Polylactides with molecular weights (MW)<br />
between 20.000 and 170.000 g/mol could be obtained.<br />
As the polymerisati<strong>on</strong> activity depends str<strong>on</strong>gly <strong>on</strong> the ligand structure<br />
(spacer, guanidine and amine unit), DFT studies <strong>on</strong> the complex<br />
properties have been performed [5]. A correlati<strong>on</strong> between calculated<br />
partial charges <strong>on</strong> the zinc and the guanidine N atoms and the catalytic<br />
activity has been found. The guanidine is supposed to open<br />
nucleophilically the lactide ring and supersede the presence of alkoxides<br />
which is traditi<strong>on</strong>ally required. This crucial step in the ROP has been<br />
analysed by DFT and a mecha-nism will be presented. The deeper<br />
understanding of the catalytic transiti<strong>on</strong> state will enable a more efficient<br />
access to catalyst design. Figure 2.<br />
Figure 1 (abstract O15). ROP of D,L-lactide<br />
Figure 2 (abstract O15). General motif for hybridquanidines<br />
Page 6 of 25
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
References<br />
1. Bigg DM: Adv Polym Tech 2005, 24:69.<br />
2. Kricheldorf HR, Meier-Haak J: Macromol Chem 1993, 194:715.<br />
3. Kricheldorf HR: Chemosphere 2001, 43:49.<br />
4. Börner J, Herres-Pawlis S, Flörke U, Huber K: Eur J Inorg Chem 2007, 5645.<br />
5. Börner J, Flörke U, Huber K, Döring A, Kuckling D, Herres-Pawlis S: Chem Eur<br />
J 2009, 15:<strong>23</strong>62.<br />
O16<br />
ChemSpider - building a foundati<strong>on</strong> for the semantic web by hosting<br />
a crowd sourced databasing platform for chemistry<br />
Anth<strong>on</strong>y J Williams 1* , Valery Tkachenko 2 , Sergey Golotvin 2 , Richard Kidd 3 ,<br />
Graham McCann 3<br />
1<br />
Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC-27587, USA;<br />
2 3<br />
NCBI, NLM, Nati<strong>on</strong>al Institutes of Health, Washingt<strong>on</strong>, USA; Royal Society of<br />
Chemistry, Cambridge, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O16<br />
There is an increasing availability of free and open access resources for<br />
chemists to use <strong>on</strong> the internet. Coupled with the increasing availability<br />
of Open Source software tools we are in the middle of a revoluti<strong>on</strong> in<br />
data availability and tools to manipulate these data. ChemSpider is a free<br />
access website for chemists built with the intenti<strong>on</strong> of providing a<br />
structure centric community for chemists. It was developed with the<br />
intenti<strong>on</strong> of aggregating and indexing available sources of chemical<br />
structures and their associated informati<strong>on</strong> into a single searchable<br />
repository and making it available to everybody, at no charge.<br />
There are tens if not hundreds of chemical structure databases such as<br />
literature data, chemical vendor catalogs, molecular properties,<br />
envir<strong>on</strong>mental data, toxicity data, analytical data etc. and no single way<br />
to search across them. Despite the fact that there were a large number of<br />
databases c<strong>on</strong>taining chemical compounds and data available <strong>on</strong>line their<br />
inherent quality, accuracy and completeness was lacking in many regards.<br />
The intenti<strong>on</strong> with ChemSpider was to provide a platform whereby the<br />
chemistry community could c<strong>on</strong>tribute to cleaning up the data,<br />
improving the quality of data <strong>on</strong>line and expanding the informati<strong>on</strong><br />
available to include data such as reacti<strong>on</strong> syntheses, analytical data,<br />
experimental properties and linking to other valuable resources. It has<br />
grown into a resource c<strong>on</strong>taining over 21 milli<strong>on</strong> unique chemical<br />
structures from over 200 data sources.<br />
ChemSpider has enabled real time curati<strong>on</strong> of the data, associati<strong>on</strong> of<br />
analytical data with chemical structures, real-time depositi<strong>on</strong> of single<br />
or batch chemical structures (including with activity data) and<br />
transacti<strong>on</strong>-based predicti<strong>on</strong>s of physicochemical data. The social<br />
community aspects of the system dem<strong>on</strong>strate the potential of this<br />
approach. Curati<strong>on</strong> of the data c<strong>on</strong>tinues daily and thousands of edits<br />
and depositi<strong>on</strong>s by members of the community have dramatically<br />
improved the quality of the data relative to other public resources for<br />
chemistry.<br />
This presentati<strong>on</strong> will provide an overview of the history of ChemSpider,<br />
the present capabilities of the platform and how it can become <strong>on</strong>e of<br />
the primary foundati<strong>on</strong>s of the semantic web for chemistry. It will also<br />
discuss some of the present projects underway since the acquisiti<strong>on</strong> of<br />
ChemSpider by the Royal Society of Chemistry.<br />
O17<br />
Integrati<strong>on</strong> of chemical informati<strong>on</strong> with protein sequences and<br />
3D structures<br />
Adel Golovin * , K Henrick, G Kleywegt<br />
EMBL-EBI/PDBe, Hinxt<strong>on</strong> Hall, Genome Campus, Cambridge, Cams,<br />
CB10 1SD, UK<br />
E-mail: golovin@ebi.ac.uk<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O17<br />
The Protein Data Bank (PDB) c<strong>on</strong>tains a wealth of small molecule - macro<br />
molecule complexes the study of which c<strong>on</strong>tribute enormously to our<br />
understanding of the interacti<strong>on</strong>s. However, exploiting and mining this<br />
treasure trove of data requires advanced analysis and retrieval methods<br />
that take into account both types of molecules. One such method is<br />
PDBeMotif, that has been developed by the Protein Data Bank in Europe<br />
Page 7 of 25<br />
(PDBe) at EMBL-EBI. Utilizing a relati<strong>on</strong>al database model at the back-end,<br />
the data structure represents a network of molecule, residue and motif<br />
interacti<strong>on</strong>s as well as their relative positi<strong>on</strong>s in the sequence and in 3D.<br />
The loader applies a number of algorithms to analyse PDB and derive<br />
necessary informati<strong>on</strong>, such as planarity and aromaticity of the chemical<br />
compounds, hydrogen-b<strong>on</strong>ds network, coordinati<strong>on</strong> geometry, b<strong>on</strong>d<br />
types (including pi electr<strong>on</strong> interacti<strong>on</strong>s), 3D structural motifs, sequence<br />
domains and families. It collects informati<strong>on</strong> about sequence features,<br />
motifs and catalytic sites from available Distributed Annotati<strong>on</strong> System<br />
(DAS) resources. The web applicati<strong>on</strong> allows for a wide variety of searches<br />
and data analysis including protein motifs with chemical fragments<br />
associati<strong>on</strong>, protein sites characterisati<strong>on</strong>, correlating properties, hits<br />
multiple sequence and 3D alignments. The whole system is released<br />
under GPL and available with the source code from http://sourceforge.<br />
net/projects/pdbsam and <strong>on</strong> line at http://www.ebi.ac.uk/pdbe-site/<br />
PDBeMotif/<br />
References<br />
1. Golovin A, Henrick K: Chemical Substructure Search in SQL. J Chem Inf<br />
Model 2009, 49(1):22-27.<br />
2. Golovin A, Henrick K: MSDmotif: exploring protein sites and motifs.<br />
BMC Bioinformatics 2008, 9:312.<br />
3. Golovin A, Dimitropoulos D, Oldfield T, Rachedi A, Henrick K: MSDsite: A<br />
Database Search and Retrieval System for the Analysis and Viewing of<br />
Bound Ligands and Active Sites. PROTEINS: Structure, Functi<strong>on</strong>, and<br />
Bioinformatics 2005, 58(1):190-9.<br />
4. Golovin A, Oldfield TJ, Tate JG, Velankar S, Bart<strong>on</strong> GJ, Boutselakis H,<br />
Dimitropoulos D, Fill<strong>on</strong> J, Hussain A, I<strong>on</strong>ides JMC, John M, Keller PA,<br />
Krissinel E, McNeil P, Naim A, Newman R, Paj<strong>on</strong> A, Pineda J, Rachedi A,<br />
Copeland J, Sitnov A, Sobhany S, Suarez-Uruena A, Swaminathan J,<br />
Tagari M, Tromm S, Vranken W, Henrick K: E-MSD: an integrated data<br />
resource for bioinformatics. Nucleic Acids Research 2004, 32 Database:<br />
D211-D216.<br />
O18<br />
Pers<strong>on</strong>alized informati<strong>on</strong> spaces: improved access to<br />
chemical digital libraries<br />
O Koepler * , W-T Balke, B Köhncke, I Sens, S Tönnies<br />
Technische Informati<strong>on</strong>sbibliothek Hannover (TIB), Welfengarten,1B, 30167<br />
Hannover, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O18<br />
Today digital libraries provide access to a vast, but largely unstructured,<br />
amount of document collecti<strong>on</strong>s. Facing the ever increasing challenge of<br />
the informati<strong>on</strong> overload c<strong>on</strong>tent providers have to focus <strong>on</strong> new ways in<br />
user-centered retrieval, not <strong>on</strong>ly providing tools for searching informati<strong>on</strong>,<br />
but tools for pers<strong>on</strong>alizing, managing, evaluating and working with the<br />
returned search results.<br />
Within the ViFaChem II research project the TIB and the L3S Research<br />
Center, Hannover, have developed an enhanced retrieval platform for<br />
chemical digital libraries.<br />
The prototypical interface processes full text and bibliographic metadata<br />
collecti<strong>on</strong>s to create semantically enriched document collecti<strong>on</strong>s with<br />
chemical metadata. Processing steps include text mining for chemical<br />
entities, reacti<strong>on</strong> types and chemical structure rec<strong>on</strong>structi<strong>on</strong>. However,<br />
the ViFaChem digital library does not <strong>on</strong>ly offer the classical document<br />
access via a text or chemical structure search, but also semantic search<br />
based <strong>on</strong> the faceted browsing paradigm. In particular it includes the<br />
Semantic GrowBag algorithm, which automatically creates facets for<br />
navigati<strong>on</strong>al access and query refinement using the relati<strong>on</strong>ship between<br />
documents and author keywords. Based <strong>on</strong> higher-order co-occurrences<br />
of keywords in documents these graphs hierarchically arrange all related<br />
topics dynamically with respect to the underlying c<strong>on</strong>tent collecti<strong>on</strong>.<br />
Having different pers<strong>on</strong>al views <strong>on</strong> the document collecti<strong>on</strong> the user now<br />
can search and navigate through search results using individual retrieval<br />
strategies. Facets for chemical reacti<strong>on</strong>s, chemical entities or topics<br />
combined with an underlying <strong>on</strong>tology thus allow a semantic driven<br />
access to documents.<br />
Combined with Web 2.0 features the interface of ViFaChem II provides<br />
the user with a new experience in searching large document collecti<strong>on</strong>s,<br />
where the user can navigate through query results based <strong>on</strong> his<br />
pers<strong>on</strong>alized knowledge space.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
O19<br />
Current aspects and future trends of computer-aided rescaffolding<br />
Karl-Heinz Baringhaus * , Gerhard Hessler, Thomas Klabunde<br />
Sanofi-Aventis Deutschland GmbH, CAS Drug Design, Building G 878, 65926<br />
Frankfurt am Main, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O19<br />
The competitive pressure in pharmaceutical industry is reflected by l<strong>on</strong>ger<br />
development times and increasing development costs yielding less new<br />
chemical entities. In particular, identificati<strong>on</strong> of suitable lead compounds<br />
is <strong>on</strong>e of the key challenges in early drug discovery. Next to well<br />
established techniques like high throughput screening (HTS) and virtual<br />
screening computer-aided rescaffolding has become an important<br />
approach in detecting novel chemical structures [1,2].<br />
The development of New Chemical Entities (NCEs) requires the<br />
explorati<strong>on</strong> of novel landscapes in chemical space. Rescaffolding is in<br />
particular important in lead identificati<strong>on</strong> but also in lead optimizati<strong>on</strong>. If<br />
a lead series cannot be optimized in multidimensi<strong>on</strong>al space, scaffoldrescuing<br />
is often perceived as back-up strategy to transfer hitherto<br />
available SAR into a new scaffold.<br />
With a binding site or a pharmacophore in hand often de novo design<br />
methods are applied to identify novel chemical matter. However, de novo<br />
design differs from rescaffolding in that the goal of the first is primarily to<br />
generate new molecules in chemical space while the latter aims to design<br />
new scaffolds under c<strong>on</strong>straints: high similarity in the desired property<br />
space and novelty in scaffold space. This allows jumping out of the known<br />
regi<strong>on</strong> of chemical space towards new regi<strong>on</strong>s of chemistry resulting in<br />
molecules with similar properties and activities but with novel frameworks.<br />
This talk will focus <strong>on</strong> ligand-based rescaffolding by taking into account<br />
<strong>on</strong>ly the topology of reference molecules. Several descriptors have been<br />
successfully applied for rescaffolding purposes, e.g. CATS (including<br />
CATS3D), Feature Trees (cf. ReCore), Topomers, Gaussian shape (cf. ROCS,<br />
BROOD) and others. In fragment-based rescaffolding new molecules are<br />
assembled from small (sub-) molecular fragments or building blocks often<br />
by applying pseudo-chemical rules in their recombinati<strong>on</strong> (e.g., RECAP).<br />
The use of fragments reduces the sampling rate of chemical space and<br />
provides access to synthesizable new compounds. This will be<br />
exemplified by two rescaffolding examples taking into account 2D as well<br />
as 3D descriptors.<br />
References<br />
1. Krueger BA, Dietrich A, Baringhaus K-H, Schneider G: Scaffold-Hopping<br />
Potential of Fragment-Based De Novo Design: The Chances and Limits of<br />
Variati<strong>on</strong>. Combinatorial Chemistry & High Throughput Screening 2009,<br />
12(4):383-396.<br />
2. Mauser H, Guba W: Recent developments in de novo design and scaffold<br />
hopping. Current Opini<strong>on</strong> in Drug Discovery & Development 2008,<br />
11(3):365-374.<br />
O20<br />
The membrane bound aromatic p-hydroxybenzoic acid<br />
oligoprenyltransferase (UbiA) - how iterative improvements lead to a<br />
realistic structure that offers new insights into functi<strong>on</strong>al aspects of<br />
prenyl transferases and terpene synthases<br />
W Brandt * , J Kufka, D Schulze, E Schulze, F Rausch, L Wessjohann<br />
Leibniz Institute of Plant Biochemistry, Department of Bioorganic Chemistry,<br />
Weinberg 3, D-06120 Halle (Saale), <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
E-mail: wbrandt@ipb-halle.de<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O20<br />
Prenyltransfering enzymes are at the basis of the vast isoprenoid natural<br />
product diversity. 4-Hydroxybenzoate oligoprenyltransferase of E. coli,<br />
encoded in the gene ubiA, is a key enzyme in the biosynthetic pathway<br />
to ubiquin<strong>on</strong>e. No X-ray structure exists of this membrane protein. It<br />
catalyses the prenylati<strong>on</strong> of 4-hydroxybenzoic acid in the 3-positi<strong>on</strong> using<br />
an oligoprenyl diphosphate as sec<strong>on</strong>d substrate.<br />
With homology modeling techniques, a first model indicated two putative<br />
active sites[1]. Based <strong>on</strong> this model, amino acids identified as important<br />
for the catalytic mechanism were selectively replaced to obtain five new<br />
mutants. All mutants were tested for their ability to form geranylated<br />
hydroxybenzoate from geranyl diphosphate, but <strong>on</strong>ly the unmodified<br />
Page 8 of 25<br />
UbiA-enzyme and to minor extent <strong>on</strong>e mutant showed enzymatic activity.<br />
These results indicated the involvement of all mutated amino acids in the<br />
catalytic mechanism but at the same time demanded a remodelling of<br />
the previously proposed enzyme structure combining two active sites.<br />
They have been placed into close proximity in the new model. Based <strong>on</strong><br />
these experimental results and structural classificati<strong>on</strong> of prenyl enzymes,<br />
a highly relevant 3D-model could be developed. This model is able to<br />
explain a wide range of substrate specificities and is in complete<br />
agreement with the results of site directed mutagenesis [2].<br />
Aromatic prenyl transferases, prenyl diphosphate synthases, and terpene<br />
synthases, activate an (olig)prenyl diphosphate to form a stabilized prenyl<br />
cati<strong>on</strong> reactive intermediate that, after additi<strong>on</strong> to a nucleophile (C=C<br />
b<strong>on</strong>d) and deprot<strong>on</strong>ati<strong>on</strong> delivers the product(s). Aromatic amino acids<br />
have been suggested to stabilize the cati<strong>on</strong> intermediate. Ab inito LMP2<br />
calculati<strong>on</strong>s indicate not <strong>on</strong>ly stabilizati<strong>on</strong> of a prenyl cati<strong>on</strong> by aromatic<br />
amino acid side chains but also by a methi<strong>on</strong>ine side chain. This<br />
suggesti<strong>on</strong> is supported by site directed mutagenesis, bioinformatics, and<br />
modelling studies. In additi<strong>on</strong>, a new catalytic diad composed of Tyr and<br />
Asp, represented by a Yx(x)xxD-motif, is identified as important player for<br />
deprot<strong>on</strong>ati<strong>on</strong> and prot<strong>on</strong>-relay in intermediates, or for the finalizing<br />
deprot<strong>on</strong>ati<strong>on</strong> step of many prenyl transferring and cyclizing enzymes.<br />
References<br />
1. Brauer L, Brandt W, Wessjohann L: J Mol Model 2004, 10:317-327.<br />
2. Brauer L, Brandt W, Schulze D, Zakharova S, Wessjohann L: Chembiochem<br />
2008, 9:982-992.<br />
O21<br />
CELLmicrocosmos 2.2: advancements and applicati<strong>on</strong>s in modeling of<br />
three-dimensi<strong>on</strong>al PDB membranes<br />
Björn Sommer 1* , T Dingersen 1 , S Schneider 2 , S Rubert 1 , C Gamroth 1<br />
1<br />
University of Bielefeld, Universitätsstraße 25, 33615 Bielefeld, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />
2<br />
D-85748 Garching, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O21<br />
Background: Today, <strong>on</strong>ly a few programs support membrane<br />
computati<strong>on</strong> and/or modeling in 3D. They enable the user to create very<br />
simple-structured membrane layers and usually assume a high level of<br />
bio-chemical/-physical knowledge. The CELLmicrocosmos 2 project<br />
developed a tool providing a simplified workflow to create membrane<br />
(bi-)layers: The MembraneEditor (CmME).<br />
Results: The geometry-based, scalable and modular computati<strong>on</strong> c<strong>on</strong>cept<br />
supports fast to more complex membrane generati<strong>on</strong>s. CmME is based <strong>on</strong><br />
the integrati<strong>on</strong> of two different types of PDB [1] models: Lipids are<br />
integrated with editable percental distributi<strong>on</strong> values and algorithms.<br />
Proteins are inserted and aligned into the bilayer manually or automatically,<br />
by using data from the PDB_TM [2] or OPM [3] database. Compatibility with<br />
other programs is offered by extensive PDB format export settings. High<br />
lipid densities are possible through advanced packing algorithms. Lipid<br />
distributi<strong>on</strong>s can be developed by using the Plugin-Interface. Originally not<br />
intended to change the atomic structure of the molecules due performance<br />
issues, now it is also possible to access the atomic level for user-defined<br />
computati<strong>on</strong>s. Multiple membrane (bi-)layers and microdomains are<br />
supported as well as a reengineering functi<strong>on</strong> providing the re-editing of<br />
externally simulated PDB membranes.<br />
C<strong>on</strong>clusi<strong>on</strong>s: The capabilities of CmME has been extended and tested to<br />
meet the requirements of different PDB visualizati<strong>on</strong> programs as well<br />
as molecular dynamics (MD) simulati<strong>on</strong> envir<strong>on</strong>ments like Gromacs [4].<br />
The documentati<strong>on</strong> and the Java Webstart applicati<strong>on</strong>, requiring <strong>on</strong>ly<br />
an internet c<strong>on</strong>necti<strong>on</strong> and Java 6, is accessible at: http://Cm2.<br />
CELLmicrocosmos.org<br />
References<br />
1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H,<br />
Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research<br />
Oxford University Press, Oxford 2000, 28:<strong>23</strong>5-242.<br />
2. Tusnády GE, Dosztányi Z, Sim<strong>on</strong> I: Transmembrane proteins in the Protein<br />
Data Bank: identificati<strong>on</strong> and classificati<strong>on</strong>. Bioinformatics 2004,<br />
20(17):2964-2972.<br />
3. Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI: OPM: Orientati<strong>on</strong>s of<br />
Proteins in Membranes database. Bioinformatics 2006, 22:6<strong>23</strong>-625.<br />
4. Lindahl E, Hess B, Spoel van der D: GROMACS 3.0: A package for<br />
molecular simulati<strong>on</strong> and trajectory analysis. J Mol Mod 2001, 7:306-317.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
O22<br />
Approaching the limits: scoring functi<strong>on</strong>s for affinity predicti<strong>on</strong><br />
Christoph Sotriffer<br />
Institut für Pharmazie und Lebensmittelchemie, Universität Würzburg,<br />
Am Hubland, D - 97074 Würzburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O22<br />
One of the main tasks of computati<strong>on</strong>al methods in ligand design and<br />
lead identificati<strong>on</strong> is the elucidati<strong>on</strong> and assessment of interacti<strong>on</strong> modes<br />
between small-molecule ligands and protein target structures. This<br />
generally requires to estimate the relative or absolute affinity of a<br />
protein-ligand complex from its three-dimensi<strong>on</strong>al coordinates. Although<br />
statistical thermodynamics would in principle provide the necessary<br />
equati<strong>on</strong>s to calculate free energies of binding from molecular properties,<br />
these equati<strong>on</strong>s are not readily amenable to computati<strong>on</strong> since<br />
appropriate ensembles of the solvated systems must be generated and<br />
thoroughly sampled, which normally requires prohibitively l<strong>on</strong>g<br />
computing times. For the purpose of drug design, and virtual screening<br />
in particular, simpler and faster methods are needed, which are<br />
comm<strong>on</strong>ly referred to as scoring functi<strong>on</strong>s.<br />
In general, three major classes of scoring functi<strong>on</strong>s can be<br />
distinguished: force-field based methods, knowledge-based potentials,<br />
and empirical scoring functi<strong>on</strong>s. For any of these classes, many different<br />
functi<strong>on</strong>s have been developed over the past years. Although largescale<br />
and truly unbiased comparative assessments of their performance<br />
are relatively rare due to inherent difficulties in setting up appropriate<br />
test sets, the strengths and limitati<strong>on</strong>s of current scoring functi<strong>on</strong>s are<br />
fairly evident from the available data. In general, good results can be<br />
obtained in the identificati<strong>on</strong> or reproducti<strong>on</strong> of experimentally<br />
observed binding modes. The ability to distinguish active ligands from<br />
decoys appears, at least, to be sufficient to make virtual screening a<br />
practically useful endeavour. However, the correlati<strong>on</strong> with the<br />
experimental binding free energy and the possibility to quantitate the<br />
effects of small structural changes <strong>on</strong> the ligand affinity are in many<br />
cases still disappointing.<br />
Based <strong>on</strong> the approximati<strong>on</strong>s used by current scoring functi<strong>on</strong>s, strategies<br />
for improvement can be defined. On the other hand, there are some<br />
fundamental problems, also with respect to the underlying experimental<br />
data, which suggest that the best functi<strong>on</strong>s presently available may<br />
already be close to the limit of what can be achieved with empirical<br />
approaches.<br />
O<strong>23</strong><br />
I<strong>on</strong> permeati<strong>on</strong> through neur<strong>on</strong>al nAChR i<strong>on</strong> channel<br />
Jens Krüger * , G Fels<br />
University of Paderborn, Warburger Strasse 100, 33100 Paderborn, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O<strong>23</strong><br />
Alzheimer’s dementia is related to loss of inter-neur<strong>on</strong> communicati<strong>on</strong> by<br />
nicotinic acetylcholine receptors (nAChR) at post synaptic neur<strong>on</strong>s, and<br />
successful therapy approaches rely e.g. <strong>on</strong> modulati<strong>on</strong>s of resp<strong>on</strong>se<br />
signals initiated by i<strong>on</strong> flux through the receptor integrated i<strong>on</strong> channel.<br />
We have evaluated Umbrella Sampling (US) and Steered Molecular<br />
Dynamics (SMD) pulling for calculati<strong>on</strong> of the potential of mean force<br />
(PMF) of i<strong>on</strong> permeati<strong>on</strong> through neur<strong>on</strong>al alpha7 nAChR, a cati<strong>on</strong><br />
selective channel, in order to understand the changes in electrical<br />
potentials c<strong>on</strong>nected to channel gating. SMD enables the discriminati<strong>on</strong><br />
between i<strong>on</strong> in- and efflux and allows detailed insight into permeati<strong>on</strong><br />
pathways (Figure 1).<br />
Both methods show the same performance characteristics but require a<br />
different number of simulati<strong>on</strong>s with different simulati<strong>on</strong> length. For<br />
c<strong>on</strong>structi<strong>on</strong> of <strong>on</strong>e PMF 300 separate 300 ps simulati<strong>on</strong>s are performed<br />
for US, but <strong>on</strong>ly <strong>on</strong>e 2100 ps run for SMD pulling. Reas<strong>on</strong>able error<br />
estimates require three repetiti<strong>on</strong>s of US but two times twelve repetiti<strong>on</strong>s<br />
for SMD, which sum up to 9360 CPU hours in case of US and 6490 for<br />
SMD. SMD, therefore, needs 30% less time <strong>on</strong> low latency HPC clusters,<br />
but US gains advantage through distributed computing.<br />
The potential energy barriers are calculated for Na + and Cl - based <strong>on</strong> US<br />
and SMD pulling, enabling differentiati<strong>on</strong> between i<strong>on</strong> types and their<br />
in- and efflux through the channel.<br />
Figure 1 (abstract O<strong>23</strong>). Cut through the transmembrane part of<br />
the nAChR. The thin gray lines corresp<strong>on</strong>d to individual permeati<strong>on</strong><br />
pathways.<br />
O24<br />
Fleksy: a flexible approach to induced fit docking<br />
Markus Wagener 1* , SB Nabuurs 2 , J de Vlieg 1<br />
1 Schering-Plough Research Institute, PO Box 20, 5340BH Oss,<br />
The Netherlands; 2 Nijmegen, The Netherlands<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O24<br />
Protein receptor rearrangements up<strong>on</strong> ligand binding are a major<br />
complicating factor in structure-based drug design. An accurate predicti<strong>on</strong><br />
of these so-called induced fit phenomena calls for ligand docking and<br />
virtual screening approaches capable of c<strong>on</strong>sidering receptor flexibility.<br />
We present Fleksy [1], a flexible approach aimed at accurately positi<strong>on</strong>ing<br />
small molecule ligands into a protein receptor, while taking both ligand<br />
and receptor flexibility into account. Our method c<strong>on</strong>sists of an ensemble<br />
docking stage in which the ligand of interest is docked into a structural<br />
ensemble of receptor c<strong>on</strong>formati<strong>on</strong>s, followed by a complex optimizati<strong>on</strong><br />
stage during which both ligand and protein are allowed to move. Pivotal<br />
to our method is the use of receptor ensembles to describe protein<br />
flexibility. To c<strong>on</strong>struct these ensembles we use a backb<strong>on</strong>e dependent<br />
rotamer library and implement the c<strong>on</strong>cept of interacti<strong>on</strong> sampling. The<br />
latter allows for the evaluati<strong>on</strong> of different orientati<strong>on</strong>s and, when<br />
relevant, different tautomers of ambivalent interacti<strong>on</strong> partners in the<br />
binding site such as asparagine, glutamine and histidine side chains. The<br />
docking stage comprises an ensemble-based soft-docking experiment<br />
using FlexX-Ensemble [2], followed by an effective flexible receptor-ligand<br />
complex optimizati<strong>on</strong> using Yasara [3]. Ultimately Fleksy results in a set of<br />
receptor-ligand complexes ranked using a c<strong>on</strong>sensus scoring functi<strong>on</strong><br />
which combines both docking scores and force field energies. Figure 1.<br />
Averaged over three cross-docking datasets, in total c<strong>on</strong>taining 35<br />
different pharmaceutically relevant receptor-ligand complexes, Fleksy<br />
reproduces the observed binding mode within 2.0 Å for 78% of the<br />
complexes. This compares favorably to the rigid receptor FlexX program<br />
[4] which <strong>on</strong> average reaches a success rate of 44% for these datasets.<br />
References<br />
1. Nabuurs SB, Wagener M, de Vlieg J: J Med Chem 2007, 50:6507.<br />
2. Claussen H, Buning C, Rarey M, Lengauer T: J Mol Biol 2001, 308:377.<br />
3. [http://www.yasara.org/].<br />
4. Rarey M, Kramer B, Lengauer T, Klebe G: J Mol Biol 1996, 261:470.<br />
Figure 1 (abstract O24).<br />
Page 9 of 25
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
POSTER PRESENTATIONS<br />
P1<br />
CWM global search - an internet search engine for the chemist<br />
Alexander Kos * , H-J Himmler<br />
AKos GmbH, Austr. 26, D-79585 Steinen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P1<br />
The Internet is a rich source of data and informati<strong>on</strong> for chemist. There are<br />
numerous multidisciplinary databases available for free <strong>on</strong> the Internet. Some<br />
examples of such data repositories are: PubChem, ChemSpider, eMolecules,<br />
Drugbank, KEGG, NIST, ChemSynthesis, PharmGKB, Free patents <strong>on</strong>line ...<br />
It should be obvious that an end user is a) not aware of all the resources,<br />
and b) has not the time to learn every user interface and is unable to<br />
search over all of them. We provide CWM Global Search as an applicati<strong>on</strong><br />
that enables to search by structure, CAS Registry Number and free text<br />
over all these sources. Presently CWM Global Search performs searches in<br />
30 databases and search engines accessing more than 100 milli<strong>on</strong> pages<br />
that associate data with structures.<br />
The user can submit a single query structure or several using SDFiles. In<br />
additi<strong>on</strong> to molecule searches CWM Global Search also allows to submit<br />
reacti<strong>on</strong> queries. In that case several single molecule searches are performed<br />
for the reactants, reagents and products. This makes it easy to find<br />
commercial suppliers and other synthesis relevant informati<strong>on</strong> such as<br />
safety sheets in <strong>on</strong>e query.<br />
Searching is technically less problematic than providing the answers in a<br />
digestible way for the user. Our first approach is to provide profiles for<br />
searching. You can choose “Availability” if you are interested to find a<br />
commercial supplier, or “Biology” if you are looking for biological effects.<br />
The sec<strong>on</strong>d help comes when we display the summary of the results. You<br />
get a table with hyperlinks color coded by topics. If the result page of doing<br />
asearchc<strong>on</strong>tainsalinktoanMSDS,thetopic‘Safety’ is highlighted. With<br />
profiles you limit your search to certain sources, and with topics your<br />
answers will be ordered.<br />
CWM Global Search is not the applicati<strong>on</strong> for exact searches like “give me<br />
the melting point of anthracene”. You will find the melting point <strong>on</strong><br />
many pages, and the topic “Physical Property” might help, but, in this<br />
case the link to Wikipedia gives the result quickest. Internet pages<br />
provide us the data in unordered fashi<strong>on</strong> and finding the exact answers<br />
is time c<strong>on</strong>suming. In some case like commercial suppliers we check<br />
against an internal list if the page really displays a supplier, or, if for<br />
instance PubChem has <strong>on</strong>ly a reference to ChemSpider, and ChemSpider<br />
references again just PubChem, but nowhere you will find a supplier. We<br />
also have to c<strong>on</strong>sider that many providers of the resources would not<br />
allow us to extract data directly without leading the user to their Internet<br />
pages. The nature of the results is fuzzy in CWM Global Search. This is an<br />
advantage if you look for instance for biological effects, which can be<br />
many, and/or if you want to learn why a compound could be important.<br />
We generate both InChI names and keys for the query structure and<br />
afterwards perform fast text searches in the various databases or search<br />
engines. In additi<strong>on</strong> we also generate Smiles. Since prior to the existence<br />
of standard InChI’s (released 2009) the various database providers used<br />
different settings to generate the InChI’s stored in their database, we use<br />
multiple settings when generating an InChI identifier used for a structure<br />
search in CWM Global Search. This way we greatly maximize the chance to<br />
find an InChI independent of the settings used by the database provider.<br />
P2<br />
The status of the InChI project and the InChI trust<br />
Stephen Heller 1* , Alan McNaught 2<br />
1<br />
NIST, PCPD, MS-8380, 100 Bureau Drive, Gaithersburg, 20899-8380, USA;<br />
2<br />
InChI Trust, Cambridge, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P2<br />
The IUPAC InChI/InChIKey project has evolved to the point where its<br />
future development and promulgati<strong>on</strong> require a new management<br />
system that will provide stable and financially viable administrative<br />
arrangements for the foreseeable future. This is necessary to give the<br />
world-wide chemistry community that IUPAC serves the c<strong>on</strong>fidence that<br />
facilities for development, maintenance and support of the InChI/InChIKey<br />
Page 10 of 25<br />
algorithm are firmly established <strong>on</strong> an<strong>on</strong>goingbasis,andwillensure<br />
acceptance and usage of InChI as a mainstream standard. This new<br />
management system is provided by means of a new organizati<strong>on</strong>,<br />
the InChI Trust, an independent not-for-profit entity paid for by<br />
the community and those who use and benefit from the InChI algorithm.<br />
The missi<strong>on</strong> of the Trust is quite simple and limited; its sole purpose is to<br />
create and support administratively and financially a scientifically robust<br />
and comprehensive InChI algorithm and related standards and protocols.<br />
This presentati<strong>on</strong> will describe the current technical state of the InChI<br />
algorithm and how the InChI Trust is working to assure the c<strong>on</strong>tinued<br />
support and delivery of the InChI algorithm.<br />
P3<br />
jsMolEditor: an open source molecule editor for the next<br />
generati<strong>on</strong> web<br />
L Duan<br />
School of Pharmacy, East China University of Science and Technology,<br />
130 Meil<strong>on</strong>g Road, Shanghai 200<strong>23</strong>7, PR China<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P3<br />
Molecule editor have become essential building blocks for modern<br />
chemistry related databases with structural operati<strong>on</strong> available. It was quite<br />
a challenging task to input molecule structures in a web browser before<br />
Java Applets technology was applied in chemoinformatics. Until now, Java<br />
Applets are still the de facto standard for web based databases. With the<br />
development of web technologies and script programming, Ajax, a Web 2.0<br />
technology, is becoming mainstream in web development. This gradually<br />
eliminates the need for plug-ins such as Java and ActiveX. Therefore, it is<br />
possible to develop powerful web-based tools without plug-ins.<br />
This poster introduces a JavaScript based molecule editor: jsMolEditor,<br />
which had made a significant difference from other editors available <strong>on</strong><br />
web pages. Unlike server side editors such as PubChem editor [1],<br />
jsMolEditor implemented all its functi<strong>on</strong>ality in JavaScript <strong>on</strong> the client<br />
side. Thus it is not necessary to maintain a c<strong>on</strong>necti<strong>on</strong> to a server during<br />
operati<strong>on</strong>. jsMolEditor works completely independently <strong>on</strong> the client side.<br />
The JavaScript nature of jsMolEditor also made it able to run in standard<br />
web browsers without specific plugins or virtual machines. The 2D drawing<br />
system deals with web browser differences. By combining Canvas and<br />
VML, jsMolEditor was able to work <strong>on</strong> all comm<strong>on</strong> web browsers including<br />
Internet Explorer, Firefox, Safari, Opera and Chrome. Instead of building<br />
jsMolEditor in JavaScript from scratch, GWT [2] is utilized as a crosscompiler<br />
from Java to JavaScript, which enables the reuse of existing Java<br />
chemoinformatics frameworks. jsMolEditor used a ported versi<strong>on</strong> of MX [3]<br />
as the bottom layer framework to handle basic molfile I/O. Thanks to MX’s<br />
open source license, the development of jsMolEditor saves a lot of time by<br />
eliminating the process of “reinventing the wheel”. The poster represents<br />
not <strong>on</strong>ly jsMolEditor itself, but also the building process of jsMolEditor and<br />
the feasibility of applying the same c<strong>on</strong>cept to other chemistry gadgets <strong>on</strong><br />
web pages, such as spectrum viewers.<br />
jsMolEditor’s <strong>on</strong>line demo is available at http://chemhack.com/jsmoleditor/<br />
and the source code is available at http://github.com/chemhack/jsmoleditor/.<br />
References<br />
1. PubChem Server Side Structure Editor. [http://pubchem.ncbi.nlm.nih.gov/<br />
edit/].<br />
2. GWT, Google Web Toolkit. [http://code.google.com/webtoolkit/].<br />
3. MX Chemoinformatics Toolkit. [http://metamolecular.com/mx].<br />
P4<br />
Mining public-source databases for structure-activity relati<strong>on</strong>ships<br />
Bernd Wendt 1* , Fabian Bös 2 , Ulrike Uhrig 2<br />
1 Elara Pharmaceuticals, Heidelberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Tripos Internati<strong>on</strong>al,<br />
Martin-Kollar-Str. 17, München, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P4<br />
Modeling off-target effects has become a highly important and relevant<br />
comp<strong>on</strong>ent of the computati<strong>on</strong>al chemistry toolset. The presentati<strong>on</strong> will<br />
describe a new c<strong>on</strong>tributi<strong>on</strong> that seeks to allow the extracti<strong>on</strong> of 3Dstructure-activity<br />
relati<strong>on</strong>ships from public informati<strong>on</strong> starting from a<br />
chemical structure. Several public source databases such as PubChem [1]<br />
offering structure as well as activity informati<strong>on</strong> for a number of targets
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
have been examined for their value in extracting useful structure-activity<br />
relati<strong>on</strong>ships (SARs). A Topomer search [2] of public-source databases using<br />
the structures of a set of 255 marketed drugs [3] as queries yielded sets of<br />
shape- and pharmacophore similar hits. SAR-tables were c<strong>on</strong>structed by<br />
collecting hits around each query structure and for a particular reported<br />
activity. A new method: quantitative series enrichment analysis (QSEA) [4]<br />
was applied to these SAR-tables to capture trends and to transform these<br />
trends into 3D-QSAR models. Overall more than 400 SAR-tables with<br />
Topomer CoMFA models were found by extracting trends from the<br />
PubChem and ChemBank [5] database. The resulting models were able to<br />
highlight the structural details of certain off-target effects of marketed drugs<br />
even in those cases where the traditi<strong>on</strong>al structural similarity would<br />
c<strong>on</strong>clude that no off-target effect would exist. This dem<strong>on</strong>strates the<br />
usefulness of the approach in modeling off-target effects.<br />
References<br />
1. [http://pubchem.ncbi.nlm.nih.gov].<br />
2. Cramer RD: J Chem Inf Comp Sci 2004, 44:1221-1227.<br />
3. Cleves AE, Jain AN: J Med Chem 2006, 49:2921-2938.<br />
4. Wendt B, Cramer RD: J Comp Aided Mol Des 2008, 22:541-555.<br />
5. [http://chembank.broad.harvard.edu/].<br />
P5<br />
OCHEM - <strong>on</strong>-line CHEmical database & modeling envir<strong>on</strong>ment<br />
Sergii Novotarskyi 1* , Iurii Sushko 1 , R Körner 1 , Anil Pandey Kumar 1 ,<br />
Matthias Rupp 1 , VV Prokopenko 2 , Igor Tetko 1<br />
1 Institute of Bioinformatics and Systems Biology - Helmholtz Zentrum<br />
Muenchen, <str<strong>on</strong>g>German</str<strong>on</strong>g> Research Center for Envir<strong>on</strong>mental Health, Ingolstaedter<br />
Landstrasse 1, D-85764 Neuherberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Kiev, Ukraine<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P5<br />
The main goal of OCHEM database http://qspr.eu is to collect, store and<br />
manipulate chemical data with the purpose of their use for model<br />
development. Its main features, that distinguish it from other available<br />
databases include:<br />
1. The database is open and it is based <strong>on</strong> Wiki-style principles. We<br />
encourage users to submit data and to correct inaccurate submitted data;<br />
2. The database is aimed at collecting high-quality data. To achieve this<br />
we require users to submit references to the article, where the data was<br />
published. The reference may include the article name, journal name,<br />
date of publicati<strong>on</strong>, page number, line number, etc.<br />
3. Since the compound properties may vary depending <strong>on</strong> the c<strong>on</strong>diti<strong>on</strong>s,<br />
under which they were measured, we store the measurement c<strong>on</strong>diti<strong>on</strong>s<br />
with the data to provide the users with more accurate informati<strong>on</strong> about<br />
each data point.<br />
The modeling framework is being developed to complement the Wiki-style<br />
database of chemical structures. Its main goal is to provide a flexible and<br />
expandable calculati<strong>on</strong> envir<strong>on</strong>ment that would allow a user to create and<br />
manipulate QSAR and QSPR models <strong>on</strong>-line. The modeling framework is<br />
integrated with the database web-interface that allows easy transfer<br />
of database data to the models. The web interface of the modeling<br />
envir<strong>on</strong>ment is aimed to provide to the Web users easy means to create<br />
high-quality predicti<strong>on</strong> models and estimate their accuracy of predicti<strong>on</strong> and<br />
applicability domain. The developed models can be published <strong>on</strong> the Web<br />
and be accessed by other users to predict new molecules <strong>on</strong>-line. This tool is<br />
aimed to generate a new paradigm for structure activity relati<strong>on</strong>ship<br />
knowledge bases, making QSAR/QSPR models active, user-c<strong>on</strong>tributed and<br />
easily accessible for benchmarking, general use and educati<strong>on</strong>al purposes.<br />
The examples of the use of the database within nati<strong>on</strong>al and EU projects will<br />
be exemplified.<br />
P6<br />
ChEBI: a chemistry <strong>on</strong>tology and database<br />
Paula de Matos * , A Dekker, M Ennis, Janna Hastings, K Haug, S Turner,<br />
Christoph Steinbeck<br />
<strong>Cheminformatics</strong> and Metabolism Team, European Bioinformatics Institute,<br />
Hinxt<strong>on</strong> Cambridge, CB10 1SD, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P6<br />
The bioinformatics community has developed a policy of open access and<br />
open data since its incepti<strong>on</strong>. This is c<strong>on</strong>trary to chemoinformatics which<br />
Page 11 of 25<br />
has traditi<strong>on</strong>ally been a closed-access area. In 2004, two complementary<br />
open access databases were initiated by the bioinformatics community,<br />
ChEBI [1] and PubChem. PubChem serves as automated repository <strong>on</strong> the<br />
biological activities of small molecules and ChEBI (Chemical Entities of<br />
Biological Interest) as a manually annotated database of molecular<br />
entities focused <strong>on</strong> ‘small’ chemical compounds. Although ChEBI is<br />
reas<strong>on</strong>ably compact c<strong>on</strong>taining just over 18,000 entities, it provides a<br />
wide range of data items such as chemical nomenclature, an <strong>on</strong>tology<br />
and chemical structures. The ChEBI database has a str<strong>on</strong>g focus <strong>on</strong><br />
quality with excepti<strong>on</strong>al efforts afforded to IUPAC nomenclature rules,<br />
classificati<strong>on</strong> within the <strong>on</strong>tology and best IUPAC practices when drawing<br />
chemical structures.<br />
ChEBI is currently undergoing a period of restructuring which will allow it<br />
to incorporate the small molecule structures from (and link to) EBI’s new<br />
chemogenomics database ChEMBL [2], increasing its small molecules<br />
coverage to over 500,000 entities. We have restructured the chemical<br />
structure search facility to use Orchem [3] an Oracle chemistry plug-in<br />
using the Chemistry Development Kit [4]. The facility allows a user to<br />
draw a chemical structure or load <strong>on</strong>e from a file and then execute either<br />
a substructure or similarity search. Furthermore the ChEBI text search will<br />
have extensive facilities for querying based <strong>on</strong> not <strong>on</strong>ly names but<br />
formula, a range of charges and molecular weight. The ability to query<br />
the ChEBI <strong>on</strong>tology and retrieve all children for a given entity will also be<br />
included.<br />
In order to aid the distributi<strong>on</strong> of ChEBI to the chemoinformatics<br />
community we have extended our export formats to include an MDL sdf<br />
format with a lighter versi<strong>on</strong> c<strong>on</strong>sisting <strong>on</strong>ly of compound structure,<br />
name and identifier. A complete versi<strong>on</strong> is available with all the ChEBI<br />
data properties such as syn<strong>on</strong>yms, cross-references, SMILES and InChI.<br />
Furthermore cross-references in ChEBI have been extended to include<br />
BRENDA the enzyme database, NMRShiftDB the database for organic<br />
structures and their nuclear magnetic res<strong>on</strong>ance (nmr) spectra, Rhea the<br />
biochemical reacti<strong>on</strong> database and IntEnz the enzyme nomenclature<br />
database.<br />
ChEBI is available at http://www.ebi.ac.uk/chebi.<br />
References<br />
1. Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A,<br />
Alcántara R, Darsow M, Guedj M, Ashburner M: ChEBI: a database and<br />
<strong>on</strong>tology for chemical entities of biological interest. Nucleic Acids Res<br />
2008, 36:D344-D350.<br />
2. [http://www.ebi.ac.uk/chembl/].<br />
3. Rijnbeek M, Steinbeck C: An open source chemistry search engine for<br />
Oracle. in press.<br />
4. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen EL: The<br />
Chemistry Development Kit (CDK): An Open-Source Java Library for<br />
Chemo- and Bioinformatics. J Chem Inf Comput Sci 2003, 43(2):493-500.<br />
P7<br />
Comparing manual and automated extracti<strong>on</strong> of chemical entities<br />
from documents<br />
Christian Tyrchan * , Sorel Muresan<br />
AstraZeneca R&D, LG CVGI, Pepparedsleden 1, S-43183 Mölndal, Sweden<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P7<br />
The chemical informati<strong>on</strong> landscape is changing rapidly with a yearly<br />
increase of over 1 milli<strong>on</strong> new compounds and more than 700,000<br />
publicati<strong>on</strong>s related to chemistry [1]. Exploring the chemical space covered<br />
by relevant journals and patents is a crucial step in early stage medicinal<br />
chemistry projects. Extracting chemical entities from unstructured text is a<br />
complex task and different approaches are currently used including<br />
manual extracti<strong>on</strong> by expert curators, text mining supported by chemical<br />
NER or combinati<strong>on</strong>s thereof [2]. The chemical informati<strong>on</strong> and<br />
corresp<strong>on</strong>ding annotati<strong>on</strong>s are subsequently stored in relati<strong>on</strong>al databases<br />
allowing for complex chemical and text queries.<br />
To assess the capability of chemical NER in documents and to understand<br />
thecoverageandaccuracyoftheunderlyingdatawecomparedthe<br />
chemistry extracted by manual curati<strong>on</strong> (GVKBIO) and text mining<br />
(SureChem) from a small patent corpus.<br />
GVKBIO databases are populated with explicit relati<strong>on</strong>ships between<br />
compounds, assays and sequence identifiers that have been manually<br />
extracted from journals and patents <strong>on</strong> a large scale [3].
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
SureChem Portal [4] is a gateway for chemical patent search <strong>on</strong> full text<br />
collecti<strong>on</strong>s for USPTO, EPO and WO. SureChem users can perform structure<br />
and keyword searches <strong>on</strong> more than 9 milli<strong>on</strong> unique compounds.<br />
We have selected a set of 250 patents covering various target classes and for<br />
which a minimum of 25 records per patents were retrieved from GVKBIO<br />
Patent database. The analysis was d<strong>on</strong>e using PipelinePilot protocols [5].<br />
These initial results dem<strong>on</strong>strate the benefits and challenges of text<br />
mining for chemical informati<strong>on</strong> extracti<strong>on</strong> from unstructured text.<br />
References<br />
1. Engel T: J Chem Inf Model 2006, 46:2267-2277.<br />
2. Banville DL, (Ed.): Chemical Informati<strong>on</strong> Mining: Facilitating Literature-Based<br />
Discovery CRC Press: Boca Rat<strong>on</strong>, L<strong>on</strong>d<strong>on</strong> New York 2009.<br />
3. [http://www.gvkbio.com/informatics.html].<br />
4. [http://www.surechem.org].<br />
5. [http://accelrys.com/products/scitegic].<br />
P8<br />
Embedded infrastructure for primary data in chemistry<br />
Igor Grbavac 1* , Oliver Koepler 1 , S Dohmeyer-Fischer 2 , Gregor Fels 2 ,<br />
Irina Sens 1 , J Brase 1<br />
1 <str<strong>on</strong>g>German</str<strong>on</strong>g> Nati<strong>on</strong>al Library of Science and Technology, Welfengarten 1B,<br />
30167 Hannover, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Universität Paderborn, Department Chemistry,<br />
Warburgerstrasse 100, 33098 Paderborn, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P8<br />
Researchers around the world produce vast amounts of experimental data<br />
each day. Primary (experimental) data are the key elements of research<br />
projects and scientific publicati<strong>on</strong>s. During the scientific workflow from the<br />
experiment to the final scientific publicati<strong>on</strong> primary data is analysed,<br />
interpreted and c<strong>on</strong>densed to a final quintessence. But until now the<br />
handling of primary data in chemistry c<strong>on</strong>tains no comm<strong>on</strong>ly approved<br />
standards c<strong>on</strong>cerning reusability and l<strong>on</strong>g-time accessibility. Predominantly<br />
there are no quality c<strong>on</strong>trol, no assured l<strong>on</strong>g-time archival storage, no<br />
proofs and no exploitati<strong>on</strong> of primary data and c<strong>on</strong>sequently there is no<br />
assurance of data given. The c<strong>on</strong>cept study “K<strong>on</strong>zeptstudie Vernetzte<br />
Primärdaten-Infrastruktur für den Wissenschaftler-Arbeitsplatz” of the TIB,<br />
the workgroup of Prof. Fels and the FIZ CHEMIE aims to identify the key<br />
factors for an embedded infrastructure of primary data in the field of<br />
chemistry. In this infrastructure, chemical primary data shall be saved<br />
persistently in a central database, linked and be citable by the use of DOI<br />
(digital object identifier), be accessible and searchable. Figure 1.<br />
We have carried out a questi<strong>on</strong>naire am<strong>on</strong>g researchers to analyse the<br />
scientific workflow and the chemical life-cycle of primary data in chemistry<br />
and to identify the requirements for processes and structures in an<br />
embedded scientific infrastructure for researchers. We have furthermore<br />
defined both a general and extended metadata scheme for the storage,<br />
citati<strong>on</strong> and searching primary datasets. General metadata include<br />
elements necessary for citati<strong>on</strong>s while chemical metadata cover specific<br />
needs of describing and searching within the data itself.<br />
Figure 1 (abstract P8).<br />
P9<br />
Predicti<strong>on</strong> of the partiti<strong>on</strong> coefficient between air and body<br />
compartments from the chemical structure<br />
Stefanie Stöckl * , R Kühne, R-U Ebert, G Schüürmann<br />
UFZ Department of Ecological Chemistry, Helmholtz Centre for<br />
Envir<strong>on</strong>mental Research, Permoserstr. 15, 04318 Leipzig , <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P9<br />
Page 12 of 25<br />
For PBPK modeling, partiti<strong>on</strong> coefficients between tissues and<br />
envir<strong>on</strong>mental compartments are required. A simple approach starts with<br />
the system blood/air, fat/air, and fat/blood. Employing thermodynamic<br />
relati<strong>on</strong>ships, <strong>on</strong>e of the three coefficients can be calculated from the<br />
other two values. With respect to available human and mammal data,<br />
modeling efforts focus <strong>on</strong> the blood/air and fat/air partiti<strong>on</strong> coefficient.<br />
Respective data sets from literature have been collected and evaluated.<br />
The chemical domain of the sets is presented in terms of chemical<br />
structure, complexity, and polarity. Data gaps have been identified.<br />
The blood/air and fat/air partiti<strong>on</strong> coefficient data sets have been applied<br />
to a validati<strong>on</strong> of literature models, with particular remark <strong>on</strong> the<br />
performance for specific compound classes. A new model for the blood/<br />
air partiti<strong>on</strong> coefficient and a preliminary new model for the fat/air<br />
partiti<strong>on</strong> coefficient are presented.<br />
The study was supported by the EU projects 2FUN (c<strong>on</strong>tract No. 036976)<br />
and OSIRIS (IP, c<strong>on</strong>tract No. 037017).<br />
P10<br />
Modelling dissociati<strong>on</strong> c<strong>on</strong>stants of organic acids by local molecular<br />
parameters<br />
Haiying Yu * , R Kühne, R-U Ebert, G Schüürmann<br />
Department of Ecological Chemistry, Helmholtz Centre for Envir<strong>on</strong>mental<br />
Research – UFZ, Permoserstr. 15, 04318, Leipzig, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P10<br />
The dissociati<strong>on</strong> c<strong>on</strong>stant pKa defines the i<strong>on</strong>izati<strong>on</strong> degree of organic<br />
compounds. It is an important parameter that affects their toxicity and<br />
envir<strong>on</strong>mental behavior and fate. Am<strong>on</strong>g the present approaches to<br />
predict pKa values of organic chemicals, increment methods show a<br />
good performance but are limited by missing values of special groups.<br />
The purpose of this study is to develop models for predicting the pKa of<br />
organic acids directly from their molecular structures.<br />
A quantum chemical method was introduced by employing local molecular<br />
parameters. Experimental pKa values of more than 1000 organic acids,<br />
including phenols, aromatic carboxylic acids, aliphatic carboxylic acids and<br />
alcohols were selected, and all molecules were optimized using the semiempirical<br />
AM1 Hamilt<strong>on</strong>ian. Simple models with several descriptors for<br />
different subsets were calibrated by multilinear regressi<strong>on</strong> (MLR). Substituent<br />
positi<strong>on</strong>s <strong>on</strong> aromatic rings were also taken into account.<br />
The obtained models yield good predictive squared correlati<strong>on</strong> coefficients<br />
q2 and superior performance compared with other quantum chemical<br />
models. The predicti<strong>on</strong> capability is further evaluated using cross<br />
validati<strong>on</strong>. The models unravel that the potential of oxygen atom c<strong>on</strong>joint<br />
to i<strong>on</strong>izable hydrogen atom to accept extra electr<strong>on</strong>ic charge is the single<br />
most important c<strong>on</strong>tributi<strong>on</strong> to the dissociati<strong>on</strong> of hydrogen atom.<br />
N<strong>on</strong>-linear statistical analysis methods were also applied because of the<br />
n<strong>on</strong>-linear character of the employed descriptors.<br />
The study was supported by the China Scholarship Council and by the EU<br />
project OSIRIS (IP, c<strong>on</strong>tact No. 037017), which is gratefully acknowledged.<br />
P11<br />
Where are the boundaries? Automated pocket detecti<strong>on</strong> for<br />
druggability studies<br />
Andrea Volkamer 1* , T Grombacher 2 , Matthias Rarey 1<br />
1 University of Hamburg, Center for Bioinformatics, Bundesstr. 43, 20146<br />
Hamburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Bioinformatics, Merck Ser<strong>on</strong>o, Frankfurter Str. 250,<br />
64293 Darmstadt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P11<br />
Computer-based predicti<strong>on</strong> of protein druggability is an essential task in<br />
the drug development process. Early identificati<strong>on</strong> of disease modifying
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
targets that can be modulated by low-molecular weight compounds can<br />
help to speed up and reduce costs in drug discovery. Recently, first<br />
methods have been presented performing a druggability estimati<strong>on</strong><br />
solely based <strong>on</strong> the 3D structure of the protein [1-3]. The essential first<br />
step for such methods is the identificati<strong>on</strong> of the active site. A multitude<br />
of methods exist for automated active site predicti<strong>on</strong> [4-6]. However,<br />
most methods developed for automated docking procedures do not<br />
explicitly focus <strong>on</strong> the definiti<strong>on</strong> of the boundary of the active site. Since<br />
druggability estimates are based <strong>on</strong> structural descriptors of the active<br />
site, a precise descripti<strong>on</strong> of the active site boundaries is vital for correct<br />
predicti<strong>on</strong>s.<br />
In this work, we present a method to predict protein binding pockets and<br />
split them into subpockets such that small molecules are mostly<br />
c<strong>on</strong>tained within <strong>on</strong>e sub-pocket. The method is based <strong>on</strong> a novel<br />
strategy to geometrically detect narrow regi<strong>on</strong>s in pockets. For<br />
druggability predicti<strong>on</strong>s, such pocket descripti<strong>on</strong>s result in more<br />
meaningful structural descriptors like active site surface or volume.<br />
Moreover, if several structures from <strong>on</strong>e protein are known, sub-pockets<br />
can give hints about protein flexibility and induced fit c<strong>on</strong>formati<strong>on</strong>al<br />
changes.<br />
Our method was evaluated <strong>on</strong> 718 proteins from the PDBbind [7] data<br />
set, as well as 5419 proteins from the scPDB [8] data set. Binding pockets<br />
are correctly predicted in 94% and 93% of the datasets. 38%, respectively<br />
45% of the proteins from the two datasets c<strong>on</strong>tain pockets which can be<br />
divided into more than <strong>on</strong>e sub-pocket. In all cases <strong>on</strong>e sub-pocket<br />
completely covers the co-crystallized ligand. Besides the classical overlapmeasure<br />
of ligand versus predicted active site, we additi<strong>on</strong>ally c<strong>on</strong>sidered<br />
the pocket coverage by the co-crystallized ligand. We found that the<br />
number of test cases with more than 30% pocket coverage rises from<br />
30% to 74% (PDBbind) and from 28% to 63% (scPDB), respectively, when<br />
c<strong>on</strong>sidering sub-pockets.<br />
References<br />
1. Nayal M, H<strong>on</strong>ig B: Proteins 2006, 63(4):892-906.<br />
2. Cheng C, et al: Nat Biotechnol 2007, 25(1):71-5.<br />
3. Halgren TA: J Chem Inf Model 2009, 49(2):377-89.<br />
4. Hendlich M: J Mol Graph Model 1997, 15(6):359-63.<br />
5. Weisel M, et al: Chem Cent J 2007, 1:7.<br />
6. An J, et al: Mol Cell Proteomics 2005, 4(6):752-61.<br />
7. Wang R, et al: J Med Chem 2004, 47(12):2977-80.<br />
8. Kellenberger E, et al: J Chem Inf Model 2006, 46(2):717-27.<br />
P12<br />
Protein negative/positively cooperative binding to<br />
zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic vesicles<br />
F Torrens * , G Castellano<br />
Institut Universitari de Ciència Molecular, Universitat de València, Edifici<br />
d’Instituts de Paterna, P. O. Box 22085, 46071 València, Spain<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P12<br />
The role of electrostatics is studied in the adsorpti<strong>on</strong> of cati<strong>on</strong>ic<br />
proteins to zwitteri<strong>on</strong>ic phosphatidylcholine (PC) and ani<strong>on</strong>ic mixed PC/<br />
phosphatidylglycerol (PG) small unilamellar vesicles (SUVs) [1]. For<br />
model proteins the interacti<strong>on</strong> is m<strong>on</strong>itored vs. PG c<strong>on</strong>tent at low i<strong>on</strong>ic<br />
strength [2]. The adsorpti<strong>on</strong> of lysozyme-myoglobin-bovine serum<br />
albumin (BSA) (isoelectric point, pI 5-11) is investigated in SUVs, al<strong>on</strong>g<br />
with changes of the fluorescence emissi<strong>on</strong> spectra of the proteins, via<br />
theiradsorpti<strong>on</strong><strong>on</strong>SUVs[3].IntheGouy-Chapmanformalismthe<br />
activity coefficient goes with the square of charge number [4].<br />
Deviati<strong>on</strong>s from the ideal model indicate asymmetric locati<strong>on</strong> of ani<strong>on</strong>ic<br />
phospholipid in the bilayer inner leaflet, in mixed zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic<br />
SUVs for protein-PC/PG, in agreement with experiments-molecular<br />
dynamics simulati<strong>on</strong>s. Effective SUV charge stays c<strong>on</strong>stant. Myoglobin-,<br />
DNC-melittin- and melittin-zwitteri<strong>on</strong>ic associati<strong>on</strong>s are described by a<br />
partiti<strong>on</strong> model, modulated by electrostatic charging of membrane as<br />
protein accumulates at interface. Provisi<strong>on</strong>al c<strong>on</strong>clusi<strong>on</strong>s follow. (1) In<br />
mixed zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic vesicles the charge effect <strong>on</strong> the protein<br />
binding model was analyzed. For lysozyme-ani<strong>on</strong>ic enough vesicles and<br />
myoglobin the electrostatic repulsi<strong>on</strong> between cati<strong>on</strong>ic ad proteins<br />
dominates over the electrostatic attracti<strong>on</strong> between ad protein dipoles.<br />
(2) The salt effect <strong>on</strong> the protein binding model of mixed zwitteri<strong>on</strong>ic/<br />
ani<strong>on</strong>ic vesicles was analyzed. The cooperativity increases with i<strong>on</strong>ic<br />
Page 13 of 25<br />
strength. The corresp<strong>on</strong>ding interpretati<strong>on</strong> is that the electrostatic<br />
repulsi<strong>on</strong> between cati<strong>on</strong>ic ad proteins decreases with increasing salt<br />
effect, and the electrostatic attracti<strong>on</strong> between ad protein dipoles<br />
becomes dominant over the electrostatic repulsi<strong>on</strong> between ad protein<br />
charges. (3) In ani<strong>on</strong>ic vesicles the effect of vesicle charge <strong>on</strong> protein<br />
binding shows that, with increasing ani<strong>on</strong>ic character of the vesicles,<br />
the protein-protein electrostatic repulsi<strong>on</strong> is decreasingly important vs.<br />
the protein-vesicle attracti<strong>on</strong>, and the electrostatic attracti<strong>on</strong> between<br />
ad protein dipoles becomes dominant over the electrostatic repulsi<strong>on</strong><br />
between ad protein charges. (4) For lysozyme-mixed zwitteri<strong>on</strong>ic/<br />
ani<strong>on</strong>ic vesicles and myoglobin cooperativity increases with pH. With<br />
increasing pH and decreasing cati<strong>on</strong>ic character of the protein, the<br />
protein-protein electrostatic repulsi<strong>on</strong> is decreasingly important against<br />
the protein-SUV attracti<strong>on</strong>, and the electrostatic attracti<strong>on</strong> between ad<br />
protein dipoles becomes dominant over the electrostatic repulsi<strong>on</strong><br />
between ad protein charges. Furthermore the opposed effect is<br />
observed for lysozyme-zwitteri<strong>on</strong>ic vesicles. (5) For protein-mixed<br />
zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic vesicle binding there is more dispersi<strong>on</strong> in the<br />
results, which could indicate asymmetric locati<strong>on</strong> of ani<strong>on</strong>ic<br />
phospholipid.<br />
References<br />
1. Torrens F, Campos A, Abad C: Cell Mol Biol 2003, 49:991.<br />
2. Torrens F, Abad C, Codoñer A, García-Lopera R, Campos A: Eur Polym J<br />
2005, 41:1439.<br />
3. Torrens F, Castellano G, Campos A, Abad C: JMolStruct2007,<br />
216:834-836.<br />
4. Torrens F, Castellano G, Campos A, Abad C: JMolStruct2009,<br />
274:924-926.<br />
P13<br />
CARTESIUS: a group functi<strong>on</strong> based toolkit for hybrid molecular<br />
modelling<br />
Mikhail Rudenko 1* , AL Tchougreeff 1,2<br />
1 Institute of Physics and Technology of RAS, Nakhimovsky ave. 36/1, 117218,<br />
Moscow; Moscow Center for C<strong>on</strong>tinuous Mathematical Educati<strong>on</strong>, 119991,<br />
Moscow, Russia; 2 Aachen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P13<br />
Modern methods of quantum chemistry have achieved significant<br />
successes. Nevertheless, modeling of certain classes of compounds<br />
remains problematic. The main complicati<strong>on</strong>s are both high computati<strong>on</strong>al<br />
complexity of these methods and the lack of a unique way to determine<br />
the ground state with correct asymptotic properties for complex systems.<br />
One of the possible soluti<strong>on</strong>s is development of the hybrid molecular<br />
modeling methods.<br />
Our approach to hybrid molecular modeling is based <strong>on</strong> the ideas of<br />
the group functi<strong>on</strong> approximati<strong>on</strong>, first suggested by McWeeny [1]<br />
and further elaborated in [2]. This approximati<strong>on</strong> lays a solid basis<br />
for simultaneous usage of most appropriate methods for computati<strong>on</strong><br />
of properties of each part of a complex system and c<strong>on</strong>sistent<br />
interpretati<strong>on</strong> of the properties of the system independently of the<br />
methods used.<br />
The CARTESIUS toolkit implements the above menti<strong>on</strong>ed scheme in<br />
both flexible and computati<strong>on</strong>ally efficient way. This is achieved by<br />
means of combining cutting-edge C++ techniques for computati<strong>on</strong>ally<br />
intensive parts of the code and the full power of Pyth<strong>on</strong> for the c<strong>on</strong>trol<br />
interface.<br />
As a result we have a carefully designed software system, which<br />
possesses both the power of the quantum chemistry codes developed<br />
previously with use of the group functi<strong>on</strong> approximati<strong>on</strong> (BF [3], EHCF [4],<br />
BFSCF [5] etc.) and high potential for further enhancements.<br />
This work is partially supported by the RFBR through the grant No 07-<br />
03-01128.<br />
References<br />
1. McWeeny R: Methods of Molecular Quantum Mechanics Academic, L<strong>on</strong>d<strong>on</strong>,<br />
2 1992.<br />
2. Tchougréeff AL: Hybrid Methods of Molecular Modeling Springer 2008.<br />
3. Tokmachev M, Tchougréeff AL: J Phys Chem A 2003, 107:358.<br />
4. Soudackov AV, Tchougréeff AL, Misurkin IA: Theor Chim Acta 1992,<br />
83:389.<br />
5. Tchougréeff AL, Tokmachev AM: Int J Quant Chem 2006, 106:571.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
P14<br />
Molecular fragments chemoinformatics<br />
Hubert Kuhn 1* , Stefan Neumann 2 , Christoph Steinbeck 3 , Carsten Wittekindt 1* ,<br />
Achim Zielesny 4<br />
1 CAM-D, Essen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 GNWI, Oer-Erkenschwick, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 3 European<br />
Bioinformatics Institute (EBI), Hinxt<strong>on</strong>, UK; 4 University of Applied Sciences of<br />
Gelsenkirchen, Institute for Bioinformatics and Chemoinformatics,<br />
Recklinghausen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P14<br />
The descripti<strong>on</strong> of chemical structures as a collecti<strong>on</strong> of c<strong>on</strong>nected<br />
molecular fragments is a basic requirement of coarse grained simulati<strong>on</strong><br />
methods like molecular fragment dynamics. These methods use<br />
molecular fragments as their basic interacting entities (“atoms”) and<br />
allow the modelling and investigati<strong>on</strong> of very large chemical systems.<br />
Therefore a molecular fragments chemoinformatics is in need that<br />
supports the fragment-based representati<strong>on</strong> of chemical structures as<br />
well as the elementary operati<strong>on</strong>s up<strong>on</strong> them. The poster outlines<br />
definiti<strong>on</strong>s and approaches to tackle these issues from an adequate<br />
molecular line notati<strong>on</strong> up to the graphical representati<strong>on</strong> of simulati<strong>on</strong><br />
boxes.<br />
P15<br />
A theoretical investigati<strong>on</strong> of microhydrati<strong>on</strong> of amino acids<br />
Catherine Michaux * , J Wouters, D Jacquemin, Eric A Perpète<br />
Unité de Chimie Physique Théorique et Structurale, Facultés<br />
Universitaires Notre-Dame de la Paix, rue de Bruxelles, 61,<br />
B-5000 Namur, Belgium<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P15<br />
Water plays a role in stabilizing biomolecular structure and facilitating<br />
biological functi<strong>on</strong>. Water can generate small active clusters and<br />
macroscopic assemblies, that are able to transmit informati<strong>on</strong> <strong>on</strong> various<br />
scales [1]. Prot<strong>on</strong>ati<strong>on</strong> and microhydrati<strong>on</strong> of proteins or DNA are of<br />
fundamental importance in biochemical processes such as prot<strong>on</strong><br />
transport, water-mediated catalysis, molecular recogniti<strong>on</strong>, protein folding,<br />
etc. The understanding of these hydrati<strong>on</strong> effects at the molecular level<br />
requires the characterizati<strong>on</strong> of the interacti<strong>on</strong>s between biomolecules<br />
and their envir<strong>on</strong>ment. Due to its relevance in many fields, the<br />
microhydrati<strong>on</strong> process of nucleic acid bases or amino acids has received<br />
a widespread attenti<strong>on</strong> [2].<br />
In this work, we first describe the microhydrati<strong>on</strong> of prot<strong>on</strong>ated<br />
amino acid (AA), and particularly of prot<strong>on</strong>ated glycine (GlyH+) [3][4],<br />
alanine (AlaH+) [5] and proline (ProH+) [6]. First a high-level theoretical<br />
method was setting up in order to compute the structures and<br />
properties of GlyH+-water complexes, Gly being the simplest AA and a<br />
suitable model for such a study. Then complexes with more than <strong>on</strong>e<br />
water molecule as well as other amino acids (Ala and Pro) were<br />
investigated, to extend the validity domain of our computati<strong>on</strong>al<br />
procedure.<br />
We then investigate a series of complexes made up of a deprot<strong>on</strong>ated<br />
(ani<strong>on</strong>ic) AA and a single water molecule [7]. Such species have recently<br />
been identified with mass spectrometry, allowing meaningful<br />
comparis<strong>on</strong>s between theoretical and experimental complexati<strong>on</strong><br />
energies. The selected systems are [Gly-H]-, [Ala-H]-, [Val-H]-, [Asp-H]-,<br />
[Gln-H]-, as well as the acetic acid as a model benchmark.<br />
References<br />
1. Chaplin MF: Nat Rev Mol Cell Biol 2006, 7:861-866.<br />
2. de Vries MS, Hobza P: Annu Rev Phys Chem 2007, 58:585-612.<br />
3. Michaux C, Wouters J, Jacquemin D, Perpète EA: Chem Phys Lett 2007,<br />
445:57-61.<br />
4. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B 2008,<br />
112:2430-2438.<br />
5. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B 2008,<br />
112:9896-9902.<br />
6. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B Letters 2008,<br />
112:7702-7705.<br />
7. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Am Soc Mass Spectrom<br />
2009, 20:632-638.<br />
P16<br />
Predicting the binding type of compounds <strong>on</strong> the 4 adenosine<br />
receptors using proteochemometric models<br />
Olaf van den Hoven * , Gerard van Westen, Andreas Bender<br />
University of Leiden, LACDR, Einsteinweg 55, <strong>23</strong>33 CC Leiden,<br />
The Netherlands<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P16<br />
Page 14 of 25<br />
Aim: The use of in silico models could save a lot of m<strong>on</strong>ey, time and<br />
resources in the laboratories when in search for potent binding<br />
candidates of designated receptors. In this paper a proteochemometric<br />
model was developed that is able to distinguish between full ag<strong>on</strong>ist,<br />
ag<strong>on</strong>ist and antag<strong>on</strong>ist across all 4 Adenosine receptors (A1, A2A, A2B<br />
and A3).<br />
Experimental: The software program Pipeline Pilot was used to create a<br />
proteochemometric model c<strong>on</strong>taining 2 SVMs [1]. The first SVM model<br />
was built to predict binding activity, the sec<strong>on</strong>d SVM model to predict<br />
binding type activity. For the first model an in house database, Glida and<br />
part of the Starlite database were used. The pKi values in these databases<br />
were then capped to create an active and inactive class. For the sec<strong>on</strong>d<br />
model a classificati<strong>on</strong> in full ag<strong>on</strong>ist, ag<strong>on</strong>ist and antag<strong>on</strong>ist derived from<br />
the Glida database was used. Different Proteinfingerprints (ProtFPs) were<br />
tested. These FPs c<strong>on</strong>sisted of amino acid residues at important binding<br />
positi<strong>on</strong>s according to two entropy analysis (TEA) and crystal structure<br />
analysis [2]. The quality of the model was determined by MCC, specificity,<br />
sensitivity, predictive ability (Q2) and the correlati<strong>on</strong> coefficient (R2). The<br />
Tanimoto similarity coefficient was used to visualize the degree of<br />
average and minimal similarity of the three binding type groups.<br />
Results: In c<strong>on</strong>tinuous models FPs having a combinati<strong>on</strong> of TEA and<br />
crystal structure informati<strong>on</strong> scored the best (Q2 = 0.6683), but overall<br />
scores were not high enough to benefit from a c<strong>on</strong>tinuous model. A<br />
categorical model was thus build and it was able to distinguish between<br />
ag<strong>on</strong>ist (99% correctly predicted) and antag<strong>on</strong>ist (98% correct) and<br />
partially full ag<strong>on</strong>ists (63% correct) with a specificity of 0.85, a sensitivity<br />
of 0.83 and a MCC of 0.69. The Tanimoto coefficient showed that indeed<br />
antag<strong>on</strong>ist and full ag<strong>on</strong>ist were the least similar, where ag<strong>on</strong>ist show<br />
more resemblance to the full ag<strong>on</strong>ist. The model finally predicted 2 novel<br />
full ag<strong>on</strong>istic compounds in our in house database, being GIFT136 and<br />
GIFT589.<br />
C<strong>on</strong>clusi<strong>on</strong>: A model that is able to distinguish between binding types<br />
was successfully created.<br />
References<br />
1. K<strong>on</strong>tijevskis A: J Chem Inf Model 2008, 48(9):1840.<br />
2. Ye K: Proteins 2006, 63(4):1018.<br />
P17<br />
pharmACOphore: multiple flexible ligand alignment based <strong>on</strong> ant<br />
col<strong>on</strong>y optimizati<strong>on</strong><br />
Gerhard Hessler 1* , Oliver Korb 2 , P M<strong>on</strong>ecke 3 , T Stützle 4 , TE Exner 5<br />
1 Drug Design, Chemical and Analytical Sciences, Sanofi-Aventis Deutschland<br />
GmbH, Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Cambridge, UK; 3 Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 4 Brussels,<br />
Belgium; 5 K<strong>on</strong>stanz, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P17<br />
Molecular alignment of biologically active compounds is a key technique<br />
in ligand-based drug design. Such alignments can be used for similaritybased<br />
identificati<strong>on</strong> of new compounds by using known active template<br />
structures as seeds. Furthermore, alignments allow for the identificati<strong>on</strong><br />
of potential pharmacophoric features in a set of chemically divers,<br />
biologically active molecules.<br />
Here, pharmACOphore is presented, a new approach for pairwise as well<br />
as multiple flexible alignments of ligands based <strong>on</strong> ant col<strong>on</strong>y<br />
optimizati<strong>on</strong> (ACO) [1]. Translati<strong>on</strong>al, rotati<strong>on</strong>al and torsi<strong>on</strong>al degrees of<br />
freedom of all ligand structures are encoded <strong>on</strong> a pherom<strong>on</strong>e vector<br />
(Figure 1). The artificial ant col<strong>on</strong>y uses this representati<strong>on</strong> to mark<br />
favourable values for each degree of freedom by depositing a<br />
pherom<strong>on</strong>e trail <strong>on</strong>to the corresp<strong>on</strong>ding pherom<strong>on</strong>e vector. The amount<br />
of pherom<strong>on</strong>e deposited directly depends <strong>on</strong> the soluti<strong>on</strong> quality,<br />
assessed by the scoring functi<strong>on</strong>.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
Figure 1 (abstract P17).<br />
The scoring functi<strong>on</strong> was parameterized by reproducing reference<br />
alignments taken from four different proteins. The performance of the<br />
new alignment algorithm will be dem<strong>on</strong>strated by pairwise alignments<br />
obtained for examples from the FlexS data set [2] and by an additi<strong>on</strong>al<br />
example for multiple flexible alignment.<br />
References<br />
1. Dorigo M, Stützle T: Ant Col<strong>on</strong>y Optimizati<strong>on</strong> MIT Press, Cambridge, MA,<br />
USA 2004.<br />
2. Lemmen C, Lengauer T, Klebe G: FLEXS: A Method for Fast Flexible<br />
Ligand Superpositi<strong>on</strong>. J Med Chem 1998, 41:4502-4520.<br />
P18<br />
Bioisosteric similarity of drugs in virtual screening<br />
Michael C Hutter * , Markus Krier<br />
Center of Bioinformatics, Saarland University, Building C7.1, D-66041<br />
Saarbrücken, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P18<br />
Choosing compounds for screening is difficult problem due the vast<br />
chemical space. The questi<strong>on</strong> is thus which compounds are most likely to<br />
be hits. The comparis<strong>on</strong> of drugs that all target the same enzyme,<br />
however, shows reoccurring chemical modificati<strong>on</strong> throughout all<br />
therapeutic categories. These so-called bioisosteric replacements [1]<br />
comprise simple exchanges of terminal atoms as well as more complex<br />
structural modificati<strong>on</strong>s, such as ring closures or rearrangements of larger<br />
fragments. To detect and evaluate all kinds of replacements we have<br />
designed an approach that adopts the algorithmic c<strong>on</strong>cept used to<br />
assess the homology of amino acid sequences to chemical molecules.<br />
The mutual exchange frequencies between distinct atom types are<br />
expressed in a substituti<strong>on</strong> matrix [2]. Likewise, pair-wise alignment<br />
betweenthemoleculesisc<strong>on</strong>structed using dynamic programming [3],<br />
with the compounds being represented as unique SMILES. To obtain the<br />
actual exchange frequencies, we refined an initial matrix based <strong>on</strong><br />
observed chemical replacements [4] by collecting the generated<br />
alignments of 1353 drugs from 33 therapeutic categories in an<br />
automated procedure. To compute the mutual bioisosteric similarity<br />
between two molecules a specific functi<strong>on</strong> has been derived that makes<br />
use of the alignment [5].<br />
To asses the suitability of this bioisosteric similarity for virtual screening,<br />
we compared the recovery of known drugs against the background of<br />
other substances. The majority of drugs possess a higher similarity<br />
within the same class than compared to substances from the ZINC [6]<br />
ortheProusScienceDrugsoftheFuturedatabase[7].Likewise,drugs<br />
for the same target are usually recovered at higher values of similarities<br />
than compared to other methods based <strong>on</strong> most comm<strong>on</strong> substructure<br />
and fingerprint approaches. Moreover, n<strong>on</strong>drugs without any<br />
pharmaceutical functi<strong>on</strong> exhibit c<strong>on</strong>siderably lower similarities than<br />
actual drugs.<br />
Furthermore, this bioisosteric similarity can be used to express the<br />
chemical diversity within a given compound class. We found that e.g.<br />
inhibitors of the HIV Reverse Transcriptase are more divers than<br />
Angiotensin-II Antag<strong>on</strong>ists and Tetracycline Antibiotics.<br />
References<br />
1. Patani GA, LaVoie EJ: Chem Rev 1996, 96:3147.<br />
2. Henikoff S, Henikoff JG: Proc Natl Acad Sci USA 1992, 89:10195.<br />
3. Smith TF, Waterman MS: J Mol Biol 1981, 147:195.<br />
4. Sheridan RP: J Chem Inf Comput Sci 2002, 42:103.<br />
5. Krier M, Hutter MC: J Chem Inf Model 2009, 49:1280.<br />
6. [http://zinc.docking.org].<br />
7. [http://pubchem.ncbi.nlm.nih.gov/].<br />
Page 15 of 25<br />
P19<br />
Updating existing QSAR models: selecti<strong>on</strong> and weighting of new data<br />
Tomas Öberg * , T Liu<br />
University of Kalmar, Sweden<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P19<br />
Computati<strong>on</strong>al chemistry and quantitative structure-activity relati<strong>on</strong>ships<br />
(QSAR) are foreseen to be extensively used in the implementati<strong>on</strong> of the<br />
new REACH regulati<strong>on</strong> for chemicals in Europe. However, for some<br />
compound groups the data are too few in number to permit both<br />
calibrati<strong>on</strong> and testing of a new model. Usage and previously developed<br />
or updated models are then viable alternatives.<br />
Perfluorocarboxylic acids (PFCAs) and fluoroteleomer alcohols (FTOHs) are<br />
two groups of envir<strong>on</strong>mentally relevant compounds, with unique physical<br />
and chemical properties. The subcooled liquid vapour pressure (pL) is <strong>on</strong>e<br />
such property, where experimental determinati<strong>on</strong>s are limited and far from<br />
c<strong>on</strong>sistent [1]. Updating is, however, challenging when the new compounds<br />
are far outside of the original calibrati<strong>on</strong> domain space. But by carefully<br />
selecting and weighting <strong>on</strong>ly three new compounds, we have been able to<br />
update a previously developed general QSAR model [2], to cover the new<br />
domain while maintaining predictive performance for the earlier calibrati<strong>on</strong><br />
and test data. The optimal weighting scheme was determined from the<br />
sample leverages and residuals in the calibrati<strong>on</strong> phase [3].<br />
The performance of this re-calibrated model greatly surpassed previous<br />
modelling attempts [4], when applied to an external test set of two<br />
PFCAsandfourFTOHswithpLintherange0.2-200Pa;withQ2Ext=<br />
0.994 and RMSEP = 0.190 units of log Pa. The domain coverage also<br />
increased from 1% to 51%, for 426 perfluoroalkylated compounds<br />
selected from the REACH registrati<strong>on</strong> list, the PhysProp database, and the<br />
OECD 2006 survey [5]. Selecti<strong>on</strong> and weighting of new calibrati<strong>on</strong> data<br />
can thus facilitate the extensi<strong>on</strong> and use of existing QSAR models.<br />
This investigati<strong>on</strong> was supported by the EU FP7 project CADASTER (grant<br />
agreement no. 212668).<br />
References<br />
1. Goss K-U, Br<strong>on</strong>ner G, Harner T, Hertel M, Schmidt TC: Envir<strong>on</strong> Sci Technol<br />
2006, 40:3572.<br />
2. Öberg T, Liu T: QSAR Comb Sci 2008, 27:273.<br />
3. Stork CL, Kowalski BR: Chemometr Intell Lab Syst 1999, 48:151.<br />
4. Arp HP, Niederer C, Goss K-U: Envir<strong>on</strong> Sci Technol 2006, 40:7298.<br />
5. Lists of PFOS, PFAS, PFOA, PFCA, related compounds and chemicals that<br />
may degrade to PFCA. ENV/JM/MONO(2006)16, OECD. 2007.<br />
P20<br />
DrugScorePPI for scoring protein-protein interacti<strong>on</strong>s: improving<br />
a knowledge-based scoring functi<strong>on</strong> by atomtype-based QSAR<br />
Dennis M Krüger * , Holger Gohlke<br />
Heinrich-Heine-University, Computati<strong>on</strong>al Pharmaceutical Chemistry, 40225<br />
Düsseldorf, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P20<br />
Protein-protein complexes are known to play key roles in many cellular<br />
processes. Therefore, knowledge of the three-dimensi<strong>on</strong>al structure of<br />
protein-complexes is of fundamental importance. A key goal in proteinprotein<br />
docking is to identify near-native protein-complex structures. In<br />
this work, we address this problem by deriving a knowledge-based<br />
scoring functi<strong>on</strong> from protein-protein complex structures and further finetuning<br />
of the statistical potentials against experimentally determined<br />
alanine-scanning results.<br />
Based <strong>on</strong> the formalism of the DrugScore approach1, distance-dependent<br />
pair potentials are derived from 850 crystallographically determined
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
protein-protein complexes 2. These DrugScorePPI potentials display<br />
quantitative differences compared to those of DrugScore, which was<br />
derived from protein-ligand complexes. When used as an objective<br />
functi<strong>on</strong> to score a n<strong>on</strong>-redundant dataset of 54 targets with “unbound<br />
perturbati<strong>on</strong>” soluti<strong>on</strong>s, DrugscorePPI was able to rank a near-native<br />
soluti<strong>on</strong> in the top ten in 89% and in the top five in 65% of the cases.<br />
Applied to a dataset of “unbound docking” soluti<strong>on</strong>s, DrugscorePPI was<br />
able to rank a near-native soluti<strong>on</strong> in the top ten in 100% and in the top<br />
five in 67% of the cases. Furthermore, Drugscore-PPI was used for<br />
computati<strong>on</strong>al alanine-scanning of a dataset of 18 targets with a total of<br />
309 mutati<strong>on</strong>s to predict changes in the binding free energy up<strong>on</strong><br />
mutati<strong>on</strong>s in the interface. Computed and experimental values showed a<br />
correlati<strong>on</strong> of R2 = 0.34. To improve the predictive power, a QSAR-model<br />
was built based <strong>on</strong> 24 residue-specific atom types that improves the<br />
correlati<strong>on</strong> coefficient to a value of 0.53, with a root mean square<br />
deviati<strong>on</strong> of 0.89 kcal/mol. A Leave-One-Out analysis yields a correlati<strong>on</strong><br />
coefficient of 0.41. This clearly dem<strong>on</strong>strates the robustness of the model.<br />
The applicati<strong>on</strong> to an independent validati<strong>on</strong> dataset of alaninemutati<strong>on</strong>s<br />
was used to show the predictive power of the method and<br />
yields a correlati<strong>on</strong> coefficient of 0.51. Based <strong>on</strong> these findings,<br />
Drugscore-PPI was used to successful identify hotspots in multiple<br />
protein-interfaces. These results suggest that DrugscorePPI is an adequate<br />
method to score protein-protein interacti<strong>on</strong>s.<br />
References<br />
1. Gohlke H, Hendlich M, Klebe G: Knowledge-based scoring functi<strong>on</strong> to<br />
predict protein-ligand interacti<strong>on</strong>s. J Mol Biol 2000, 295(2):337-356.<br />
2. Huang SY, Zou X: An iterative knowledge-based scoring functi<strong>on</strong> for<br />
protein-protein recogniti<strong>on</strong>. Proteins 2008, 72(2):557-579.<br />
P21<br />
Adaptati<strong>on</strong> of formal c<strong>on</strong>cept analysis for the systematic explorati<strong>on</strong><br />
of structure-activity and structure-selectivity relati<strong>on</strong>ships<br />
Eugen Lounkine * , Jürgen Bajorath<br />
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical<br />
Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität<br />
B<strong>on</strong>n, Dahlmannstr. 2, D-53113 B<strong>on</strong>n, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P21<br />
Formal C<strong>on</strong>cept Analysis (FCA) is a data mining and visualizati<strong>on</strong><br />
approach originating from informati<strong>on</strong> science. It operates <strong>on</strong> binary<br />
relati<strong>on</strong>ships between objects and attributes, which are reported in a<br />
formal c<strong>on</strong>text. Formal c<strong>on</strong>cepts are sets of objects that share a defined<br />
subset of attributes. FCA organizes these c<strong>on</strong>cepts in lattices that reflect<br />
their relati<strong>on</strong>ship in terms of shared objects and/or attributes and allows<br />
the identificati<strong>on</strong> of objects with defined sets of attributes [1].<br />
Two adaptati<strong>on</strong>s of FCA that allow the systematic analysis of structureactivity<br />
and-selectivity relati<strong>on</strong>ships are presented. Fragment Formal<br />
C<strong>on</strong>cept Analysis (FragFCA) assesses the distributi<strong>on</strong> of molecular fragment<br />
combinati<strong>on</strong>s am<strong>on</strong>g ligands with closely related biological targets. This<br />
allows the identificati<strong>on</strong> of fragment signatures that exclusively occur in<br />
compounds with a defined activity profile. FragFCA also identifies fragment<br />
combinati<strong>on</strong>s that are characteristic of highly potent compounds against<br />
defined targets. Fragment signatures usually represent combinati<strong>on</strong>s of<br />
two or three fragments and can be used to differentiate active compounds<br />
of closely related targets for different target families [2,3].<br />
Molecular Formal C<strong>on</strong>cept Analysis (MolFCA) is introduced for the<br />
systematic comparis<strong>on</strong> of the selectivity of a compound against multiple<br />
targets and the extracti<strong>on</strong> of compounds with complex selectivity profiles<br />
from biologically annotated databases. Selectivity is assessed based <strong>on</strong><br />
pair-wise compound potency ratios. Thisallowsthedefiniti<strong>on</strong>multiple<br />
selectivity queries involving the comparis<strong>on</strong> of an arbitrary number of<br />
targets and compound potency values or ratios. The individual queries are<br />
applied in a sequential manner to retrieve compounds with desired<br />
selectivity against targets of interest. MolFCA operates <strong>on</strong> activity space<br />
representati<strong>on</strong>s of compounds and thus allows the identificati<strong>on</strong> of<br />
structurally diverse compounds matching a given selectivity profile [4].<br />
References<br />
1. Priss U: Ann Rev Inf Sci Technol 2006, 40:521.<br />
2. Lounkine E, Auer J, Bajorath J: J Med Chem 2008, 51:5342.<br />
3. Krüger F, Lounkine E, Bajorath J: Chem Med Chem 2009, 4:1174.<br />
4. Lounkine E, Stumpfe D, Bajorath J: J Chem Inf Model 2009, 49:1359.<br />
Page 16 of 25<br />
P22<br />
Systematic extracti<strong>on</strong> of structure-activity relati<strong>on</strong>ship informati<strong>on</strong> from<br />
biological screening data<br />
Mathias Wawer * , L Peltas<strong>on</strong>, Jürgen Bajorath<br />
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical<br />
Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität<br />
B<strong>on</strong>n, Dahlmannstr. 2, D-53113 B<strong>on</strong>n, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P22<br />
The analysis of high-throughput screening data poses significant<br />
challenges to medicinal and computati<strong>on</strong>al chemists. The number of<br />
compounds assayed in a single screen is prohibitively large for manual<br />
data analysis and no generally applicable computati<strong>on</strong>al methods have<br />
thus far been developed to c<strong>on</strong>sistently solve the problem of how to<br />
best select hits for further chemical explorati<strong>on</strong>.<br />
Focusing <strong>on</strong> the questi<strong>on</strong> of how structure-activity relati<strong>on</strong>ship (SAR)<br />
informati<strong>on</strong> can be used to support this decisi<strong>on</strong>, we present methods<br />
for the descriptive analysis of screening data. Network representati<strong>on</strong>s<br />
visualize the distributi<strong>on</strong> of 2D similarity relati<strong>on</strong>ships and potency in a<br />
data set and give an overview of global and local features of an activity<br />
landscape. Although dominated by many weakly active hits, different<br />
local SAR envir<strong>on</strong>ments can be identified am<strong>on</strong>g screening hits, thus<br />
helping to focus <strong>on</strong> regi<strong>on</strong>s in chemical space that might show favorable<br />
SAR behavior in further explorati<strong>on</strong> [1].<br />
A more detailed analysis of the data is achieved by systematically mining<br />
the network for SAR pathways, i.e. sequences of pairwise similar<br />
compounds that c<strong>on</strong>nect two molecules via a gradually increasing<br />
potency gradient. The SAR pathways are calculated exhaustively for all<br />
possible compound pairs in a data set to identify those having most<br />
significant SAR informati<strong>on</strong> c<strong>on</strong>tent. Often, high-scoring pathways lead to<br />
activity cliffs, i.e. pairs of similar compounds with significant differences in<br />
potency, and scaffold transiti<strong>on</strong>s can be observed al<strong>on</strong>g the pathways.<br />
Furthermore, a tree structure organizes alternative pathways that begin at<br />
the same compound but lead to different molecules and chemotypes.<br />
Similarly, SAR trees can be generated from all pathways that lead<br />
to an activity cliff in order to characterize the surrounding SAR<br />
microenvir<strong>on</strong>ment [2].<br />
References<br />
1. Wawer M, Peltas<strong>on</strong> L, Bajorath J: J Med Chem 2009, 52:1075-1080.<br />
2. Wawer M, Bajorath J: Chem Med Chem in press.<br />
P<strong>23</strong><br />
High throughput in-silico screening against flexible protein receptors<br />
Horacio Pérez-Sánchez 1* , Bernhard Fischer 1 , Daria Kokh 2 , Holger Merlitz 1 ,<br />
Wolfgang Wenzel 1<br />
1 Institut für Nanotechnologie, Forschungszentrum Karlsruhe, Postfach 3640,<br />
76021 Karlsruhe, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Wuppertal, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P<strong>23</strong><br />
Based <strong>on</strong> the stochastic tunneling method (STUN) [1] we have developed<br />
FlexScreen [2], a novel strategy for high-throughput in-silico screening of<br />
large ligand databases. Each ligand of the database is docked against the<br />
receptor using an all-atom representati<strong>on</strong> of both ligand and receptor. The<br />
ligands with the best evaluated affinity are selected as lead candidates for<br />
drug development. Using the thymidine kinase inhibitors as a prototypical<br />
example we documented [3] the shortcomings of rigid receptor screens in a<br />
realistic system. We dem<strong>on</strong>strate a gain in both overall binding energy and<br />
overall rank of the known substrates when two screens with a rigid and<br />
flexible (up to 15 sidechain dihedral angles) receptor are compared. We<br />
note that the STUN suffers <strong>on</strong>ly a comparatively small loss of efficiency<br />
when an increasing number of receptor degrees of freedom is c<strong>on</strong>sidered.<br />
FlexScreen thus offers a viable compromise [4] between docking flexibility<br />
and computati<strong>on</strong>al efficiency to perform fully automated database screens<br />
<strong>on</strong> hundreds of thousands of ligands. We also investigate enrichment rates<br />
[5] of rigid, soft and flexible receptor models [6] for 12 diverse receptors<br />
using libraries c<strong>on</strong>taining up to 13000 molecules. A flexible sidechain model<br />
with flexible dihedral angles for up to 12 aminoacids increased both binding<br />
propensity and enrichment rates: EF_1 values increased by 35% <strong>on</strong> average<br />
with respect to rigid-docking (3-8 flexible sidechains). This methodology will<br />
be so<strong>on</strong> available for the Cell processor and Pipeline Pilot.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
References<br />
1. Wenzel W, Hamacher K: Stochastic tunneling approach for global<br />
minimizati<strong>on</strong> of complex potential energy landscapes. Physical Review<br />
Letters 1999, 82(15):3003-3007.<br />
2. Merlitz H, Burghardt B, Wenzel W: Applicati<strong>on</strong> of the stochastic tunneling<br />
method to high throughput database screening. Chemical Physics Letters<br />
2003, 370(1-2):68-73.<br />
3. Merlitz H, Burghardt B, Wenzel W: Impact of receptor c<strong>on</strong>formati<strong>on</strong> <strong>on</strong> in<br />
silico screening performance. Chemical Physics Letters 2004,<br />
390(4-6):500-505.<br />
4. Fischer B, et al: Accuracy of binding mode predicti<strong>on</strong> with a cascadic<br />
stochastic tunneling method. Proteins-Structure Functi<strong>on</strong> and Bioinformatics<br />
2007, 68(1):195-204.<br />
5. Kokh DB, Wenzel WG: Flexible side chain models improve enrichment<br />
rates in in silico screening. Journal of Medicinal Chemistry 2008,<br />
51(19):5919-5931.<br />
6. Fischer B, Fukuzawa K, Wenzel W: Receptor-specific scoring functi<strong>on</strong>s<br />
derived from quantum chemical models improve affinity estimates for<br />
in-silico drug discovery. Proteins-Structure Functi<strong>on</strong> and Bioinformatics 2008,<br />
70(4):1264-1273.<br />
P24<br />
Virtual screening by high-throughput docking using hydrogen<br />
b<strong>on</strong>ding c<strong>on</strong>straints for targeting a protein-protein interface in<br />
M. tuberculosis<br />
Oliver Koch 1,2* , T Jäger 2 , L Flohé 2 , PM Selzer 1<br />
1 Intervet Innovati<strong>on</strong> GmbH, Zur Propstei, 55270 Schwabenheim, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />
2 MOLISA GmbH, Brenneckestraße 20, 39118 Magdeburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P24<br />
The resurgence of Tuberculosis, caused primarily by Mycobacterium<br />
tuberculosis, and the appearance of drug resistant tuberculosis strains<br />
encourage the need for new drugs with alternative mode of acti<strong>on</strong>s [1]. An<br />
interesting drug-target is the Thioredoxin-Reductase (TrxR)/Thioredoxin (Trx)<br />
System of M. tuberculosis, which is resp<strong>on</strong>sible for providing reducing<br />
equivalents for many cellular processes including bacterial antioxidant<br />
defence [2].<br />
We performed a virtual screening approach for identifying novel drugs for<br />
the treatment of Tuberculosis that used the hydrogen b<strong>on</strong>ding c<strong>on</strong>straint<br />
functi<strong>on</strong>ality of GOLD for pre-processing instead of using pharmacophore<br />
filtering. Two important hydrogen b<strong>on</strong>ding interacti<strong>on</strong>s could be identified<br />
at the protein-protein interface of the TrxR-Trx complex. A high-throughput<br />
docking was applied to filter the Intervet in-house compound library (~6.5<br />
milli<strong>on</strong> compounds) for possible hits that interact with these two hydrogen<br />
b<strong>on</strong>ding acceptors. This reduced the number of interesting compounds to<br />
151768 that were redocked without any c<strong>on</strong>straints and more accurate<br />
docking settings. Finally, the ranking of the reduced compound set was<br />
obtained using a normalisati<strong>on</strong> based <strong>on</strong> molecular weight [3] and a<br />
c<strong>on</strong>sensus scoring in a restrictive way using Chemscore and ASPscore.<br />
So far, six out of the first 25 of the ranked compound list showed an<br />
activity with an IC 50 value upto M range. Especially three compounds<br />
with different scaffolds and a low molecular weight are promising<br />
candidates for further developments.<br />
References<br />
1. World Health Organizati<strong>on</strong> Tuberculosis Programme. [http://who.int/tb/].<br />
2. Jaeger T, Flohé L: The thiol-based redox networks of pathogens:<br />
unexploited targets in the search for new drugs. Biofactors 2006,<br />
27:109-120.<br />
3. Carta A, Knox JS, Lloyd DG: Unbiasing Scoring Functi<strong>on</strong>s: A New<br />
Normalizati<strong>on</strong> and Rescoring Strategy. JCIM 2007, 47:1564-1571.<br />
P25<br />
Ensemble docking revisited<br />
Oliver Korb * , S Bowden, T Olss<strong>on</strong>, D Frenkel, J Liebeschuetz, J Cole<br />
Cambridge Crystallographic Data Centre, 12 Uni<strong>on</strong> Road, Cambridge,<br />
CB2 1EZ, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P25<br />
In recent years, the importance of c<strong>on</strong>sidering induced fit effects in<br />
molecular docking calculati<strong>on</strong>s has been widely recognised in the<br />
Page 17 of 25<br />
molecular modelling community. While small-scale protein side-chain<br />
movements are now accounted for in many state-of-the-art docking<br />
strategies, the explicit modelling of large-scale protein moti<strong>on</strong>s such as<br />
loop movements in kinase domains is still a challenging task. For this<br />
reas<strong>on</strong> ensemble-based methods have been introduced taking into<br />
account several discrete protein c<strong>on</strong>formati<strong>on</strong>s in the c<strong>on</strong>formati<strong>on</strong>al<br />
sampling step. Our protein-ligand docking approach GOLD [1,2] has been<br />
extended to search such c<strong>on</strong>formati<strong>on</strong>al ensembles time-efficiently. The<br />
performance of the approach has been assessed <strong>on</strong> several protein<br />
targets using different scoring functi<strong>on</strong>s. A detailed analysis of pose<br />
predicti<strong>on</strong> and virtual screening results in dependence of the number of<br />
protein structures c<strong>on</strong>sidered in the c<strong>on</strong>formati<strong>on</strong>al ensemble will be<br />
presented and limitati<strong>on</strong>s of the approach will be highlighted.<br />
References<br />
1. J<strong>on</strong>es G, Willett P, Glen RC: Molecular Recogniti<strong>on</strong> of Receptor Sites Using<br />
a Genetic Algorithm with a Descripti<strong>on</strong> of Desolvati<strong>on</strong>. J Mol Biol 1995,<br />
245:43-53.<br />
2. J<strong>on</strong>es G, Willett P, Glen RC, Leach AR, Taylor R: Development and<br />
Validati<strong>on</strong> of a Genetic Algorithm for Flexible Docking. J Mol Biol 1997,<br />
267:727-748.<br />
P26<br />
Picking out polymorphs: H-b<strong>on</strong>d predicti<strong>on</strong> and crystal structure<br />
stability<br />
Peter TA Galek * , Frank H Allen, László Fábián<br />
Cambridge Crystallographic Data Centre, 12 Uni<strong>on</strong> Road, Cambridge,<br />
CB2 1EZ, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P26<br />
A methodology has been developed to predict the propensity for<br />
hydrogen b<strong>on</strong>ds to form in crystal structures, treating each potential<br />
H-b<strong>on</strong>d as a binary resp<strong>on</strong>se variable, and modelling its likelihood using<br />
a set of relevant chemical descriptors [1]. Modelling is tailored to a target<br />
using chemically similar known structures,frome.g.theCambridge<br />
Structural Database [2], making it accessible to the complete spectrum of<br />
organic structures, including solvates, hydrates and cocrystals. Recent<br />
work has developed the approach to predicting inter- and intramolecular<br />
H-b<strong>on</strong>ds when either type can occur.<br />
By way of a comparis<strong>on</strong> between possible and observed H-b<strong>on</strong>ds, the<br />
method has been applied to assess structural stability, which shows much<br />
promiseinthedomainofpolymorphscreeninginthepharmaceutical<br />
industry. We will introduce the methodology and illustrate its applicati<strong>on</strong><br />
using a selecti<strong>on</strong> of pharmaceutical compounds, <strong>on</strong>e of which will be<br />
Abbott’s well-publicised anti-HIV medicati<strong>on</strong> rit<strong>on</strong>avir (Norvir). Owing to a<br />
hidden, more stable form II with much lower bioavailability, rit<strong>on</strong>avir was<br />
temporarily withdrawn from the market with significant financial impact<br />
[3]. Our method quickly suggests a real threat of polymorphism in this<br />
compound, and str<strong>on</strong>gly supports the relative stability of form II over<br />
form I. For all examples, the high predictivity of the method is emphasised.<br />
References<br />
1. Galek PTA, Fábián L, Allen FH, Motherwell WDS, Feeder N: Acta Cryst 2007,<br />
B63:768-782.<br />
2. Allen FH: Acta Cryst 2002, B58:380-388.<br />
3. Bauer J, Spant<strong>on</strong> S, Henry R, Quick J, Dziki W, Porter W, Morris J: Pharm Res<br />
2001, 18:859-866.<br />
P27<br />
Kernel learning for ligand-based virtual screening: discovery of a new<br />
PPARg ag<strong>on</strong>ist<br />
Matthias Rupp 1* , T Schroeter 2 , R Steri 1 , Ewgenij Proschak 1 , K Hansen 2 , H Zettl 1 ,<br />
O Rau 1 , M Schubert-Zsilavecz 1 , K-R Müller 2 , Gisbert Schneider 1<br />
1<br />
University of Frankfurt, Siesmayerstr. 70, 603<strong>23</strong> Frankfurt am Main, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />
2<br />
Berlin, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P27<br />
We dem<strong>on</strong>strate the theoretical and practical applicati<strong>on</strong> of modern<br />
kernel-based machine learning methods to ligand-based virtual<br />
screening by successful prospective screening for novel ag<strong>on</strong>ists of the<br />
peroxisome proliferator-activated receptor g (PPARg) [1].PPARg is a<br />
nuclear receptor involved in lipid and glucose metabolism, and related
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
to type-2 diabetes and dyslipidemia. Applied methods included a graph<br />
kernel designed for molecular similarity analysis [2], kernel principle<br />
comp<strong>on</strong>ent analysis [3], multiple kernel learning [4], and, Gaussian<br />
process regressi<strong>on</strong> [5].<br />
In the machine learning approach to ligand-based virtual screening, <strong>on</strong>e<br />
uses the similarity principle [6] to identify potentially active compounds<br />
based <strong>on</strong> their similarity to known reference ligands. Kernel-based<br />
machine learning [7] uses the “kernel trick”, a systematic approach to the<br />
derivati<strong>on</strong> of n<strong>on</strong>-linear versi<strong>on</strong>s of linear algorithms like separating<br />
hyperplanes and regressi<strong>on</strong>. Prerequisites for kernel learning are similarity<br />
measures with the mathematical property of positive semidefiniteness<br />
(kernels).<br />
The iterative similarity optimal assignment graph kernel (ISOAK) [2] is<br />
defined directly <strong>on</strong> the annotated structure graph, and was designed<br />
specifically for the comparis<strong>on</strong> of small molecules. In our virtual screening<br />
study, its use improved results, e.g., in principle comp<strong>on</strong>ent analysisbased<br />
visualizati<strong>on</strong> and Gaussian process regressi<strong>on</strong>. Following a<br />
thorough retrospective validati<strong>on</strong> using a data set of 176 published<br />
PPARg ag<strong>on</strong>ists [8], we screened a vendor library for novel ag<strong>on</strong>ists.<br />
Subsequent testing of 15 compounds in a cell-based transactivati<strong>on</strong> assay<br />
[9] yielded four active compounds.<br />
The most interesting hit, a natural product derivative with cyclobutane<br />
scaffold, is a full selective PPARg ag<strong>on</strong>ist (EC50 = 10 ± 0.2 μM, inactive <strong>on</strong><br />
PPARa and PPARb/δ at 10 μM). We dem<strong>on</strong>strate how the interplay of<br />
several modern kernel-based machine learning approaches can<br />
successfully improve ligand-based virtual screening results.<br />
References<br />
1. Henke B: Progress in Medicinal Chemistry 2004, 42:1-53.<br />
2. Rupp M, Proschak E, Schneider G: J Chem Inf Model 2007, 47(6):2280-2286.<br />
3. Schölkopf B, Smola A, Müller K-R: Neural Comput 1998, 10(5):1299-1319.<br />
4. S<strong>on</strong>nenburg S, Rätsch G, Schäfer C, Schölkopf B: J Mach Learn Res 2006,<br />
7(7):1531-1565.<br />
5. Rasmussen C, Williams C: MIT Press, Cambridge 2006.<br />
6. Johns<strong>on</strong> M, Maggiora M: Wiley, New York 1990.<br />
7. Schölkopf B, Smola A: MIT Press, Cambridge 2002.<br />
8. Rücker C, Scarsi M, Meringer M: Bioorg Med Chem 2006, 14(15):5178-5195.<br />
9. Rau O, Wurglics M, Paulke A, Zitzkowski J, Meindl N, Bock A,<br />
Dingermann T, Abdel-Tawab M, Schubert-Zsilavecz M: Planta Med 2006,<br />
72(10):881-887.<br />
P28<br />
OrChem: an open source chemistry search engine for Oracle<br />
Mark L Rijnbeek * , Christoph Steinbeck<br />
EMBL EBI, Wellcome Trust Genome Campus, Cambridge, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P28<br />
Registrati<strong>on</strong>, indexing and searching of chemical structures in relati<strong>on</strong>al<br />
databases is <strong>on</strong>e of the core areas of cheminformatics. However, little<br />
detail has been published <strong>on</strong> the inner workings of search engines and<br />
their development is mostly closed-source. We decided to implement an<br />
open source chemical library for Oracle, the de-facto database standard<br />
in the commercial world.<br />
We present OrChem, an extensi<strong>on</strong> for the Oracle 11G database that adds<br />
registrati<strong>on</strong> and indexing of chemical structures to support fast<br />
substructure and similarity searching. The cheminformatics functi<strong>on</strong>ality is<br />
provided by the Chemistry Development Kit [1,2]. OrChem enables<br />
similarity searching with resp<strong>on</strong>se times in the order of sec<strong>on</strong>ds for<br />
databases with milli<strong>on</strong>s of compounds, depending <strong>on</strong> provided similarity<br />
cut-off. For substructure searching, OrChem can make use of multiple<br />
processor cores <strong>on</strong> today’s powerful database servers to provide fast<br />
resp<strong>on</strong>se times in equally large data sets.<br />
OrChem is a mix of PL/SQL and Java that executes inside the database.<br />
The user interacts with OrChem with calls to PL/SQL and Java Stored<br />
Procedures. Starting with Oracle 11g there is a just-in-time (JIT) compiler<br />
for the Oracle JVM envir<strong>on</strong>ment which makes Java run much faster inside<br />
the database than previously.<br />
OrChem is built <strong>on</strong> top of the Chemistry Development Kit (CDK) and<br />
depends <strong>on</strong> this Java library in numerous ways. For example, compounds<br />
are represented internally as CDK molecule objects, the CDK’s I/O<br />
package is used to retrieve compound data, and its subgraph<br />
isomorphism algorithms are used for substructure validati<strong>on</strong>. OrChem<br />
adds its own Java layer <strong>on</strong> top of the CDK to implement fast database<br />
storage and retrieval. With the CDK loaded into Oracle, a large<br />
cheminformatics library becomes readily available to PL/SQL. With little<br />
effort developers can build database functi<strong>on</strong>s around the CDK and so<br />
quickly implement chemistry extensi<strong>on</strong>s for Oracle.<br />
References<br />
1. Steinbeck , et al: J Chem Inf Comput Sci 2003, 43(2):493-500.<br />
2. Steinbeck , et al: Curr Pharm Des 2006, 12(17):2111-2120.<br />
P29<br />
Computati<strong>on</strong>al fragment-based drug design to explore the<br />
hydrophobic subpocket of the mitotic kinesin Eg5 allosteric<br />
binding site<br />
Ksenia Oguievetskaia *† , Laetitia Martin-Chanas *† , Artem Vorotyntsev,<br />
Olivia Doppelt-Azeroual, Xavier Brotel, Stewart A Adcock,<br />
Alexandre G de Brevern, Francois Delfaud, Fabrice Moriaud<br />
MEDIT SA, 2 rue du Belvédère, 91120 Palaiseau, France<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P29<br />
Page 18 of 25<br />
Eg5, a mitotic kinesin exclusively involved in the formati<strong>on</strong> and functi<strong>on</strong><br />
of the mitotic spindle has attracted interest as an anticancer drug target.<br />
Eg5 is co-crystallized with several inhibitors bound to its allosteric binding<br />
pocket. Each of these occupies a pocket formed by loop5/helix a2.<br />
Recently designed inhibitors additi<strong>on</strong>ally occupy a hydrophobic pocket of<br />
this site. The goal of the present study was to identify new fragments<br />
which fill this hydrophobic pocket and might be interesting chemical<br />
moieties to design new inhibitors.<br />
We’ve used the fragment based protocol of MED-SuMo which exploits the<br />
whole PDB <strong>on</strong> the basis that similar protein surfaces might bind the same<br />
fragment. The MED-SuMo software is able to compare and superimpose<br />
similar protein-ligand interacti<strong>on</strong> surfaces at PDB scale. Protein-fragment<br />
complexes are encoded as MED-Porti<strong>on</strong>s by cross-mining 3D PDB ligand<br />
structures with fragment-like PubChem molecules. Inhibitors are designed<br />
by hybridising MED-Porti<strong>on</strong>s with MED-Hybridise.<br />
Without using the known Eg5 PDB ligands that occupy the hydrophobic<br />
pocket, we’ve predicted new potential ligands that would fill<br />
simultaneously both pockets. Some of the latter are identified in libraries<br />
of synthetically accessible molecules by the MED-Search software.<br />
P30<br />
Learning antibacterial activity against S. Aureus <strong>on</strong> the Chimiothèque<br />
Nati<strong>on</strong>ale dataset<br />
G Marcou 1* , N Lachiche 1 , L Brillet 1 , J-M Paris 2 , A Varnek 1<br />
1 Université de Strasbourg, Faculté de Chimie de Strasbourg, Laboratoire<br />
d’infochimie, 4 rue Blaise Pascal, 67000 Strasbourg, France; 2 Paris, France<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P30<br />
“Chimiothèque Nati<strong>on</strong>ale” (CN) represents a library of synthetic and<br />
natural products from various French public laboratories. Recently,<br />
experimental screening for S. Aureus antibacterial activity of the part of<br />
this library has been performed by Dr J.-M. Paris and collaborators in<br />
Ecole des Mines (Paris, France). Here, experimental results of the<br />
screening have been used to build classificati<strong>on</strong> models using SMF<br />
descriptors and ISIDA modeling tools (Naïve Bayes (NB) and SVM<br />
modules). The dataset c<strong>on</strong>sisted in a large and structurally diverse set of<br />
4563 compounds, 62 of which having a dem<strong>on</strong>strated antibacterial<br />
activity. In NB calculati<strong>on</strong>s, several variati<strong>on</strong>s of the algorithm were used.<br />
In particular, the aprioridistributi<strong>on</strong> of SMF descriptors was modeled<br />
using a binomial law, Bernouilli distributi<strong>on</strong>s or first order logic. SVM<br />
calculati<strong>on</strong>s were performed with an RBF kernel which parameters (C,g)<br />
were optimized to maximize the balanced accuracy.<br />
Both NB and SVM models were validated using external 5-fold cross<br />
validati<strong>on</strong>, repeated three times <strong>on</strong> randomized data set. Additi<strong>on</strong>ally<br />
fifteen Y-randomizati<strong>on</strong>s were performed in order to check for chance<br />
correlati<strong>on</strong>. Predictive performance of the models has been assessed by<br />
combinati<strong>on</strong> of Precisi<strong>on</strong> and Recall parameters: the models with Precisi<strong>on</strong><br />
>0.1andRecall > 0.6 corresp<strong>on</strong>d to over 10-fold enrichment and,<br />
therefore, were c<strong>on</strong>sidered as acceptable.<br />
A c<strong>on</strong>sensus model was applied to estimate the antibacterial activity of<br />
122 new compounds for the CN. All predicti<strong>on</strong>s were d<strong>on</strong>e blindly.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
P31<br />
Multi-parameter scoring functi<strong>on</strong>s for ligand- and structure-based<br />
de novo design<br />
Fabian Bös 1* , KM Smith 2 , Q Liu 3<br />
1 Tripos Internati<strong>on</strong>al, Martin-Kollar-Str. 17, 81829 München, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />
2 St. Louis, USA; 3 Shanghai, PR China<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P31<br />
Successful drug discovery often requires optimizati<strong>on</strong> against a set of<br />
biological and physical properties. We describe de novo design studies that<br />
dem<strong>on</strong>strate successful scaffold hops between known classes of ligands for<br />
p38 MAP kinases using ligand-based and structure-based multi-parameter<br />
scoring functi<strong>on</strong>s coupled to the molecular inventi<strong>on</strong> engine Muse.<br />
The ligand-based scoring functi<strong>on</strong> includes pharmacophoric and steric<br />
tuplets and structural (fingerprint based) similarity. In additi<strong>on</strong> various<br />
selectivity or ADME related properties (e.g. Lipinski properties, polar<br />
surface area, activity at off-targets, etc.). can be taken into account to<br />
guide the evoluti<strong>on</strong> of structures meeting multiple design criteria.<br />
The structure-based scoring functi<strong>on</strong> uses Surflex-Dock to pose and score<br />
invented structures inside the target’s active site. In additi<strong>on</strong>, a number of<br />
simple molecular properties (e.g. clogP, Lipinski properties, etc.) are used as<br />
score comp<strong>on</strong>ents to focus the design <strong>on</strong> medicinally relevant chemistries.<br />
With the ability of Surflex-Dock to start the docking process with a single or<br />
multiple placed fragments, this scoring functi<strong>on</strong> can be applied in fragment<br />
based drug discovery to optimize attachments <strong>on</strong>to a pre-placed substructure.<br />
P32<br />
Ligand-side tautomer enumerati<strong>on</strong> and scoring for structure-based<br />
drug-design<br />
Thomas Seidel * , G Wolber<br />
Inte:Ligand GmbH, Mariahilferstrasse 74B/11, A-1070 Vienna, Austria<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P32<br />
Tautomeric rearrangements of a molecule lead to distinct equilibrated<br />
structural states of the same chemical compound that may differ<br />
significantly in molecular shape, surface, nature of functi<strong>on</strong>al groups,<br />
hydrogen-b<strong>on</strong>ding pattern and other derived molecular properties [1].<br />
Especially for the structure-based pharmacophore modeling of ligandprotein<br />
complexes [2], knowledge of the most favorable tautomeric ligand<br />
states may be crucial for the quality and correctness of the generated<br />
pharmacophore models and derived putative binding modes. We will<br />
present a ligand-side tautomer enumerati<strong>on</strong> and ranking algorithm that<br />
c<strong>on</strong>siders both geometrical c<strong>on</strong>straints imposed by the c<strong>on</strong>formati<strong>on</strong><br />
of the bound ligand as well as intra- and inter-molecular energetic<br />
c<strong>on</strong>tributi<strong>on</strong>s to find the preferred tautomeric states of the ligand in the<br />
macromolecular envir<strong>on</strong>ment of the binding-site. The presented tautomer<br />
ranking algorithm is based <strong>on</strong> scores that are derived solely from MMFF94<br />
[3-9] energies (thus the method works for a wide range of organic<br />
molecules) and has proven to be able to top-rank known preferred<br />
tautomeric states of ligands in a series of investigated protein complexes.<br />
References<br />
1. Pospisil P, et al: J Recept Signal Transduct 2003, <strong>23</strong>:361-371.<br />
2. Wolber G, Langer T: J Chem Inf Model 2005, 45:160-169.<br />
3. Halgren TA: J Comput Chem 1996, 17:490-519.<br />
4. Halgren TA: J Comput Chem 1996, 17:520-552.<br />
5. Halgren TA: J Comput Chem 1996, 17:553-586.<br />
6. Halgren TA, Nachbar RB: J Comput Chem 1996, 17:587-615.<br />
7. Halgren TA: J Comput Chem 1996, 17:616-651.<br />
8. Halgren TA: J Comput Chem 1999, 20:720-729.<br />
9. Halgren TA: J Comput Chem 1999, 20:730-748.<br />
P33<br />
Maximum-score diversity selecti<strong>on</strong> for early drug discovery<br />
Thorsten Meinl * , C Ostermann, O Nimz, A Zaliani, MR Berthold<br />
University of K<strong>on</strong>stanz, 78457 K<strong>on</strong>stanz, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P33<br />
Diversity selecti<strong>on</strong> is a comm<strong>on</strong> task in early drug discovery, be it for<br />
removing redundant molecules prior to HTS or reducing the number of<br />
Page 19 of 25<br />
molecules to synthesize from scratch. One drawback of the current<br />
approach, especially with regard to HTS, is, however, that <strong>on</strong>ly the<br />
structural diversity is taken into account. The fact that a molecule may<br />
be highly active or completely inactive is usually ignored. This is<br />
especially remarkable, as quite a lot of research is involved in<br />
improving virtual screening methods in order to forecast activity. We<br />
therefore present a modified versi<strong>on</strong> of diversity selecti<strong>on</strong> – which we<br />
termed Maximum-Score Diversity Selecti<strong>on</strong> – which additi<strong>on</strong>ally takes<br />
the predicted activities of the molecules into account. Not very<br />
surprisingly both objectives – maximizing activity whilst also<br />
maximizing diversity in the selected subset – c<strong>on</strong>flict. As a result,<br />
we end up with a multiobjective optimizati<strong>on</strong> problem. We will show,<br />
that the task of diversity selecti<strong>on</strong> is quite complicated (it is NPcomplete)<br />
and therefore heuristic approaches are needed for typical<br />
dataset sizes.<br />
A comm<strong>on</strong> and popular approach is using multiobjective genetic<br />
algorithms, such as NSGA-II [1], for optimizing both objectives for the<br />
selected subsets. However, we will show that usual implementati<strong>on</strong>s<br />
suffer from severe limitati<strong>on</strong>s that prevent them from finding quite a lot<br />
of possible interesting soluti<strong>on</strong>s. Therefore, we evaluated two other<br />
heuristic for maximum-score diversity selecti<strong>on</strong>. One is special heuristic<br />
(called BB2) that was motivated by the menti<strong>on</strong>ed proof of NPcompleteness<br />
[2]. The other is a novel heuristics called Score Erosi<strong>on</strong><br />
which was specifically developed for our actual problem. Am<strong>on</strong>g all<br />
three heuristics, Score Erosi<strong>on</strong> is by far the fastest <strong>on</strong>e while finding<br />
soluti<strong>on</strong>s of equal quality compared to the genetic algorithm and BB2.<br />
This will be shown <strong>on</strong> several real world datasets, both public and<br />
internal <strong>on</strong>es.<br />
All experiments were carried out using the data analysis platform KNIME<br />
[3] therefore we will also show some example how maximumscore<br />
diversity selecti<strong>on</strong> can be performed inside workflow-based<br />
envir<strong>on</strong>ments.<br />
References<br />
1. Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective<br />
genetic algorithm: NSGA-II. IEEE Transacti<strong>on</strong>s <strong>on</strong> Evoluti<strong>on</strong>ary Computati<strong>on</strong><br />
2002, 6:182-197.<br />
2. Erkut E: The discrete p-dispersi<strong>on</strong> problem. European Journal of<br />
Operati<strong>on</strong>al Research 1990, 46(1):48-60.<br />
3. [http://www.knime.org/].<br />
P34<br />
Progress <strong>on</strong> an open source computer-assisted structure elucidati<strong>on</strong><br />
suite (SENECA)<br />
Stefan Kuhn * , Christoph Steinbeck<br />
Wellcome Trust Genome Campus, Hinxt<strong>on</strong>, Cambridge CB10 1SD, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P34<br />
We suggested and developed various comp<strong>on</strong>ents for Computer-Assisted<br />
Structure Elucidati<strong>on</strong> (CASE) over the years [1,2]. Our current goal is to<br />
integrate these into an easy to use and efficient platform for end users,<br />
called SENECA. This is based <strong>on</strong> Bioclipse [3], an integrated software suite<br />
for chemo- and bioinformatics providing plugins for file handling and<br />
visualisati<strong>on</strong> of compounds and spectra.<br />
The SENECA feature currently comprises the following comp<strong>on</strong>ents and<br />
algorithms:<br />
A deterministic structure generator suitable for small chemical spaces.<br />
Simulated Annealing and Genetic Algorithm for random structure walks<br />
in large spaces.<br />
Simulati<strong>on</strong> of 13C NMR spectra for ranking results.<br />
SENECA is a Bioclipse feature, available via the update site at http://www.<br />
ebi.ac.uk/steinbeck-srv/speclipse/net.bioclipse.seneca-updatesite/.<br />
References<br />
1. Steinbeck C, Krause S, Kuhn S: NMRShiftDB - C<strong>on</strong>structing a free chemical<br />
informati<strong>on</strong> system with open-source comp<strong>on</strong>ents. Journal of Chemical<br />
Informati<strong>on</strong> and Computer Sciences 2003, 43:1733-1739.<br />
2. Kuhn S, Egert B, Neumann S, Steinbeck C: Building blocks for automated<br />
elucidati<strong>on</strong> of metabolites: Machine learning methods for NMR<br />
predicti<strong>on</strong>. BMC Bioinformatics 2008, 9:400.<br />
3. Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, et al: Bioclipse: An<br />
open source workbench for chemo- and bioinformatics. BMC<br />
Bioinformatics 2007, 8:59.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
P35<br />
Predicti<strong>on</strong> of highly-c<strong>on</strong>nected ‘hub’-proteins in protein interacti<strong>on</strong><br />
networks using QSAR<br />
M Hsing 1* , K Byler 2 , A Cherkasov 1<br />
1<br />
University of British Columbia, <strong>23</strong>29 West Mall, Vancouver, BC Canada;<br />
2<br />
Rochester, USA<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P35<br />
Proteins that are most essential for functi<strong>on</strong>ing and viability of bacterial<br />
cell have been shown to exhibit larger number of interacti<strong>on</strong>s with other<br />
cell comp<strong>on</strong>ents. Thus, by identifying the most c<strong>on</strong>nected proteins (or<br />
hubs) in protein interacti<strong>on</strong> networks (PINs), <strong>on</strong>e may discover<br />
prospective drug targets that can be utilized to combat emergent and<br />
drug-resistant pathogens such as Methicillin-Resistant Staphylococcus<br />
aureus 252 (MRSA). The advantage of using such hub proteins as drug<br />
targets lies in their essentiality, n<strong>on</strong>-replaceable positi<strong>on</strong> in the PIN and<br />
lower rate of mutati<strong>on</strong>, which can help to counter bacterial resistance.<br />
However, finding or predicting such hub proteins remains a challenging<br />
task as the corresp<strong>on</strong>ding experiments are very costly, while traditi<strong>on</strong>al<br />
bioinformatics approaches generally fail in forecasting PIN data due to<br />
the general lack of agreement between the existing datasets [1].<br />
Thus, we have decided to utilize various structural and physicochemical<br />
features of proteins, related to traditi<strong>on</strong>al QSAR properties for predicting<br />
highly c<strong>on</strong>nected proteins. Using our own in-house generated PIN for the<br />
MRSA cell we have trained a boosting tree-based classifier that uses 75<br />
physical and chemical QSAR descriptors computed for all proteins in the<br />
interacti<strong>on</strong> network [2]. The utilized parameters included molecular<br />
weight, net charge, isoelectric point, hydrophobicity, surface area, solvent<br />
accessibilities, electr<strong>on</strong>egativity, sec<strong>on</strong>dary structure compositi<strong>on</strong>, surface<br />
coils and flexibility am<strong>on</strong>g other QSAR descriptors.<br />
The developed QSAR model has yielded a high predicti<strong>on</strong> accuracy of<br />
80% for the validati<strong>on</strong> set and was used to predict additi<strong>on</strong>al hubs in the<br />
rest of the MRSA proteome. The predicted hubs have then been<br />
evaluated experimentally and 55% of them were c<strong>on</strong>firmed as high<br />
interactors what corresp<strong>on</strong>ds to >5 fold dataset enrichment for potential<br />
hub-proteins provided by the developed QSAR model.<br />
Thus, the successful development of accurate hub classifiers dem<strong>on</strong>strated<br />
that highly-c<strong>on</strong>nected proteins tend to share certain structural and<br />
physicochemical features that can be characterized and quantified by<br />
c<strong>on</strong>venti<strong>on</strong>al QSAR descriptors.<br />
It is anticipated that the developed hub classifiers will represent a useful<br />
tool for the predicti<strong>on</strong> of highly-interacting proteins and can find broad<br />
applicati<strong>on</strong> for planning and executing large-scale proteomic experiments<br />
and for identificati<strong>on</strong> of novel and prospective antibacterial drug targets –<br />
even in those organisms that currently lack protein interacti<strong>on</strong> data.<br />
References<br />
1. Huang H, Jedynak BM, Bader JS: PLoS Comput Biol 2007, 3:e214.<br />
2. Byler K, Hsing M, Cherkasov A: QSAR & Combinatorial Science 2009,<br />
28:509-519.<br />
P36<br />
Efficient extracti<strong>on</strong> of can<strong>on</strong>ical spatial relati<strong>on</strong>ships using a recursive<br />
enumerati<strong>on</strong> of k-subsets<br />
Georg Hinselmann * , Nikolas Fechner, A Jahn, Andreas Zell<br />
University of Tübingen, Sand 1, 72076 Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P36<br />
The spatial arrangement of a chemical compound plays an important role<br />
regarding the related properties or activities. A straightforward approach to<br />
encode the geometry is to enumerate pairwise spatial relati<strong>on</strong>ships between<br />
k substructures, like functi<strong>on</strong>al groups or subgraphs. This leads to a<br />
combinatorial explosi<strong>on</strong> with the number of features of interest and<br />
redundant informati<strong>on</strong>. The goal of this work is to compute all possible<br />
k-subsets of spatial points and to extract a single can<strong>on</strong>ical descriptor for<br />
each subset in sub-polynomial computati<strong>on</strong> time. More precisely, the<br />
problem is to reduce the complexity of n k = n·(n - 1)...(n - k) possible<br />
relati<strong>on</strong>ships (patterns or descriptors) for n features and k-point<br />
relati<strong>on</strong>ships.<br />
We propose a two-step algorithm to solve this problem. A modified<br />
algorithm for the computati<strong>on</strong> of the binomial coefficient computes<br />
Page 20 of 25<br />
the k-subsets [1] c<strong>on</strong>taining the possible combinati<strong>on</strong>s of the n<br />
relevant features. If a k-subset is completed in the inner recursi<strong>on</strong>, the<br />
algorithm computes a can<strong>on</strong>ical representati<strong>on</strong> for it. By defining a<br />
natural order by means of the geometrical center of gravity of the k<br />
points, we extract k patterns that describe the distance to the center<br />
of gravity and type of the spatial feature k Î F. Then, the algorithm<br />
returns a unique identifier for the lexicographically sorted array of<br />
patterns. If applicable ( f� � f� �� f<br />
0 1 � ), an additi<strong>on</strong>al identifier<br />
k<br />
d�d 0�1 �<br />
�k�1�k is added which has the form ,<br />
f� � f f f<br />
o � ��� �<br />
1 k�1<br />
�k<br />
where dij denotes the geometrical distance between features i, j. Else<br />
( f� � f f<br />
o � ��<br />
1 � ), this step is omitted. Therefore, this approach<br />
k<br />
also c<strong>on</strong>siders stereochemistry. Finally, <strong>on</strong>e feature is returned for each<br />
k-subset resulting in a set of C(n, k) patterns describing the structure.<br />
Themainresultisthatthenumberoffeaturesisreducedfromnkto C(n, k), which equals the binomial coefficient. This procedure is useful<br />
in combinati<strong>on</strong> with similarity approaches that use spatial relati<strong>on</strong>ships,<br />
like pharmacophore searches, fingerprints, or graph kernels. We<br />
experimentally validated the algorithm <strong>on</strong> numerous QSAR benchmark<br />
sets in combinati<strong>on</strong> with the pharmacophore kernel [2].<br />
References<br />
1. Rolfe T: SIGCSE Bull 2001, 33(3):35-36.<br />
2. Mahé P, Ralaivola L, Stoven V, Vert J-P: J Chem Inf Mod 2006,<br />
46(5):2003-2014.<br />
P37<br />
A comparative study of in silico predicti<strong>on</strong> of pKa<br />
C Matijssen<br />
Cancer Research UK Centre for Cancer Therapeutics, The Institute of Cancer<br />
Research, 15 Cotswold Road, Sutt<strong>on</strong>, Surrey, SM2 5NG, UK<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P37<br />
The i<strong>on</strong>izati<strong>on</strong> c<strong>on</strong>stant (pKa) is the measure of the strength of an acid or<br />
base in a soluti<strong>on</strong>. Most ligands act as a weak acid or base. Accurate<br />
determinati<strong>on</strong> of pK a is important as it can improve the pharmaceutical<br />
properties of a compound [1]. In general, charged compounds have better<br />
solubility, but are less effective in membrane permeati<strong>on</strong>. Therefore,<br />
optimizati<strong>on</strong> of the pharmacokinetic profile of a compound can be<br />
performed by increasing or decreasing its i<strong>on</strong>izati<strong>on</strong> by changing functi<strong>on</strong>al<br />
groups. Another role for optimizati<strong>on</strong> of the charged group could be the<br />
interacti<strong>on</strong> with its target. Changing the charge <strong>on</strong> the functi<strong>on</strong>al group<br />
could improve the interacti<strong>on</strong> with its associated target and result in<br />
improved binding affinity.<br />
Different methods have been developed for the computati<strong>on</strong>al<br />
determinati<strong>on</strong> of pK a values. These can be based <strong>on</strong> different methods such<br />
as QSAR [2] or quantum chemistry approaches [3]. These two methods differ<br />
c<strong>on</strong>siderable in terms of computati<strong>on</strong>al resource and hence, the time<br />
needed for a predicti<strong>on</strong>. Depending <strong>on</strong> the number of ligands that needed<br />
predicti<strong>on</strong> and time available, a choice for <strong>on</strong>e method can be made.<br />
A comparis<strong>on</strong> between several pK a predictors (Pipeline Pilot, Moka,<br />
Epik and Jaguar) was made. All methods perform well when a diverse set<br />
of ligands which covers a range of pK a values is c<strong>on</strong>sidered. However,<br />
when optimizing a series of compounds the influence of small changes<br />
to the molecule and its effect <strong>on</strong> pKa becomes more difficult to predict.<br />
References<br />
1. Waterbeemd van de H, Gifford E: Nature Rev Drug Disc 2003, 2:192.<br />
2. Milletti F, Storchi L, Sforna G, Cruciani G: J Chem Inf Model 2007, 47:2172.<br />
3. Kinsella GK, Rodriguez F, Wats<strong>on</strong> GW, Rozas I: Bioorg Med Chem 2007, 15:2850.<br />
P38<br />
Kernel-based estimati<strong>on</strong> of the applicability domain of QSAR models<br />
Nikolas Fechner * , Georg Hinselmann, A Jahn, A Zell<br />
University of Tübingen, Sand 1, 72076 Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P38<br />
Machine learning techniques have become a valuable tool to assess<br />
molecular properties without the need of in vitro experiments. Most of
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
these methods do not give any informati<strong>on</strong> if a molecule that is<br />
predicted can be sufficiently described by the knowledge c<strong>on</strong>tained in<br />
the model. Thus, the estimati<strong>on</strong> of the reliability of a model-based<br />
predicti<strong>on</strong> is an important questi<strong>on</strong> in machine learning based QSAR<br />
modeling.<br />
One approach to solve this problem is to describe the porti<strong>on</strong> of the<br />
chemical space used during the training phase of a model. Any molecule<br />
includedinthesamesubspaceisthen c<strong>on</strong>sidered as a structure for<br />
which the model is regarded as valid. This c<strong>on</strong>cept of the descripti<strong>on</strong> of<br />
the subspace in which a model is regarded as reliable is known as the<br />
estimati<strong>on</strong> of the applicability domain of this model [1].<br />
Most machine learning approaches for QSAR rely <strong>on</strong> a vectorial<br />
representati<strong>on</strong> of the molecules. The applicability domain is expressed as<br />
a subspace of the vector space with <strong>on</strong>e dimensi<strong>on</strong> for each descriptor<br />
used. This c<strong>on</strong>cept can be not directly applied to kernel-based techniques<br />
like support vector machines. These methods rely <strong>on</strong> an implicit feature<br />
space that is <strong>on</strong>ly defined by the applied kernel similarity and with<br />
unknown dimensi<strong>on</strong>s. The applicability domain of a kernel-based model<br />
therefore has to be defined by means of the kernel. C<strong>on</strong>sequently, this<br />
allows to use structured similarity measures, like the Optimal Assignment<br />
Kernel [2] and its extensi<strong>on</strong> [3], instead of a numerical encoding. Thus, it<br />
is possible to describe the complex chemical structure of many drugs<br />
better than it would be using descriptors.<br />
In this work, several approaches to define the applicability domain of a<br />
QSAR model by means of a kernel are presented and compared to each<br />
other. The approach is to extend the c<strong>on</strong>cept of a kernel density<br />
estimati<strong>on</strong> to incorporate further informati<strong>on</strong> c<strong>on</strong>tained in a trained<br />
model. This can be achieved by using a weighted average kernel<br />
similarity of a predicted molecule to the training data set. The weights<br />
can be obtained either by exploiting the knowledge c<strong>on</strong>tained in the<br />
learned model or by approaches that describe the feature space structure<br />
using the kernel.<br />
References<br />
1. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: Altern Lab Anim 2005,<br />
33:445-459.<br />
2. Fröhlich H, Wegner J, Sieker F, Zell A: QSAR & Comb Sci 2006, 25:317-326.<br />
3. Fechner N, Jahn A, Hinselmann G, Zell A: J Chem Inf Mod 2009, 49:549-560.<br />
P39<br />
Expanding and understanding metabolite space<br />
Julio E Peir<strong>on</strong>cely 1* , Andreas Bender 2 , M Rojas-Chertó 2 , T Reijmers 2 ,<br />
L Coulier 1 , T Hankemeier 2<br />
1 TNO, Quality of Life, Utrechtseweg 48, Zeist, The Netherlands; 2 Netherlands<br />
Metabolomics Centre, University of Leiden, LACDR, Einsteinweg 55, <strong>23</strong>33 CC<br />
Leiden, The Netherlands<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P39<br />
In metabolomics the identity and role of low mass molecules called<br />
metabolites that are produced in cell metabolic processes are<br />
investigated. These make them valuable indicators of the phenotype of a<br />
biological system. The ‘Metabolite Space’ is the total chemical universe of<br />
metabolites present in all compartments and in all states from any<br />
organism. These molecules exhibit comm<strong>on</strong> features that form what can<br />
be called ‘metabolite likeness’. Here,wefocus<strong>on</strong>thehumanmetabolite<br />
space, including both endogenous and exogenous (such as drug)<br />
metabolites. In order to analyze the ‘Metabolite Space’, we collected data<br />
from the Human Metabolome Database (HMDB) [1] which is a<br />
comprehensive database for human metabolites c<strong>on</strong>taining over 7000<br />
compounds that were identified in several human biofluids and tissues.<br />
As there still remain many compounds to be identified that lay outside<br />
the boundaries of this known space, exploring this unknown regi<strong>on</strong> is<br />
crucial to evaluate ‘metabolite likeness’.<br />
In order to expand ‘Metabolite Space’ in our approach we employed the<br />
Retrosynthetic Combinatorial Analysis Procedure (RECAP) [2] to generate<br />
new molecules that possess features similar to those present in<br />
metabolites, however in other (but still likely) rearrangements. We studied<br />
how discernible these new molecules are from real metabolites and,<br />
hence, whether synthetic organic chemistry reacti<strong>on</strong>s are indeed able to<br />
expand the known universe of metabolites. We further studied the new<br />
chemistry present in the expanded metabolite space by looking at<br />
Murcko assemblies [3], ring systems and other chemical properties. The<br />
new metabolite space is compared to other small molecules, such as<br />
those obtained from the ZINC database, that are not metabolites. By<br />
combining all the above analyses we expect to characterize better the<br />
metabolite space, and furthermore, to predict the metabolite-likeness of a<br />
molecule and to understand its immanent properties.<br />
References<br />
1. Wishart DS, et al: Nucleic Acids Res 2007, 35 Database: D521-526.<br />
2. Lewell XQ, Judd DB, Wats<strong>on</strong> SP, Hann MM: J Chem Inf Comput Sci 1998,<br />
38:511-522.<br />
3. Bemis GW, Murcko MA: J Med Chem 1996, 39:2887-2893.<br />
P40<br />
Classificati<strong>on</strong> of CYP450 1A2 inhibitors using PubChem data<br />
Sergii Novotarskyi * , Iurii Sushko, R Körner, AK Pandey, Igor Tetko<br />
Helmholtz Zentrum München, Ingolstädter Landstraße 1, D-85764<br />
Neuherberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P40<br />
Page 21 of 25<br />
Cytochromes P450 (CYP450) are a superfamily of enzymes, involved in<br />
metabolism of a large number of xenobiotic compounds. CYP450 are<br />
involved in degradati<strong>on</strong> of a large amount of drugs, currently present <strong>on</strong><br />
the market. The promiscuity with respect to substrates makes the CYP450<br />
enzymes pr<strong>on</strong>e to inhibiti<strong>on</strong> by a large amount of drugs, which gives<br />
way to clinically significant drug-drug interacti<strong>on</strong>s.<br />
In this work different machine learning methods were applied to classify<br />
the inhibitors/n<strong>on</strong>inhibitors of human CYP450 1A2. The structures and<br />
the active/inactive classificati<strong>on</strong> c<strong>on</strong>cerning CYP1A2 inhibiti<strong>on</strong> were taken<br />
from PubChem BioAssay database. This assay uses human CYP1A2 to<br />
measure the demethylati<strong>on</strong> of luciferin 6’ methyl ether (Luciferin-ME;<br />
Promega-Glo) to luciferin.<br />
The tested methods include k nearest neighbors (kNN), decisi<strong>on</strong> tree,<br />
random forest, support vector machine (SVM) and associative neural<br />
networks (ASNN). The descriptors used were those from the Drag<strong>on</strong><br />
software, the fragment descriptors and the E-state indices.<br />
The training and test sets were handled separately to avoid different<br />
possibilities of overfitting - including overfitting by descriptor selecti<strong>on</strong>.<br />
Different applicability domain (AD) approaches were used to estimate the<br />
c<strong>on</strong>fidence of classificati<strong>on</strong>.<br />
As a result the models managed to correctly classify 80% of the test set<br />
instances. The accuracy of classificati<strong>on</strong> was found to be up to 95%, if<br />
<strong>on</strong>ly 30% most c<strong>on</strong>fident predicti<strong>on</strong>s were taken into account. The<br />
model was also applied to an external test set of 187 molecules,<br />
collected from literature and measured using a different etal<strong>on</strong> reacti<strong>on</strong>.<br />
For this set accuracy of 78% was achieved <strong>on</strong> the 30% most c<strong>on</strong>fident<br />
predicti<strong>on</strong>s.<br />
All the developed models are fast enough to be used for virtual screening<br />
of CYP1A2 inhibitors and n<strong>on</strong>inhibitors. The developed models are<br />
publicly available <strong>on</strong>-line at the http://qspr.eu web site.<br />
P41<br />
Applicability domain for classificati<strong>on</strong> problems<br />
Iurii Sushko * , S Novotarskyi, AK Pandey, R Körner, Igor Tetko<br />
Sushko, I., Helmholtz Zentrum München/IBIS, Ingolstädter Landstraße 1,<br />
D-85764 Neuherberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P41<br />
Classificati<strong>on</strong> models are frequent in QSAR modeling. It is of crucial<br />
importance to provide good accuracy estimati<strong>on</strong> for classificati<strong>on</strong>.<br />
Applicability domain provides additi<strong>on</strong>al informati<strong>on</strong> to identify which<br />
compounds are classified with best accuracy and which are expected to<br />
have poor and unreliable predicti<strong>on</strong>s. The selecti<strong>on</strong> of the most reliable<br />
predicti<strong>on</strong>s can dramatically improve performance of methods while<br />
decreasing coverage of predicti<strong>on</strong>s [1].<br />
In binary classificati<strong>on</strong> problems, labels for machine learning methods are<br />
discrete {-1, 1}. N<strong>on</strong>etheless, model usually yields predicti<strong>on</strong> that is<br />
c<strong>on</strong>tinuous. Most apparent metrics for accuracy estimati<strong>on</strong> is distance<br />
between predicti<strong>on</strong> point and edge of a class, i.e. the more is the<br />
distance between predicti<strong>on</strong> the edge of the class, the more reliable and<br />
accurate is the predicti<strong>on</strong> of given compound. This metric has been<br />
already used in several previous studies (e.g., [2]) and dem<strong>on</strong>strated good
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
separati<strong>on</strong> of reliable and n<strong>on</strong>-reliable classificati<strong>on</strong>s. In quantitative<br />
predicti<strong>on</strong>s, the standard deviati<strong>on</strong> of ensemble predicti<strong>on</strong>s has been<br />
found as the most accurate measure distance in a recent benchmarking<br />
[3].<br />
We propose to integrate both metrics. Rather than giving a point<br />
estimate, this approach provides us with a probability distributi<strong>on</strong> of<br />
findingparticularcompoundin<strong>on</strong>eof the classes. Suggested metrics is<br />
probability<br />
�<br />
E<br />
Navxdx ( , , )<br />
where E is class domain a - ensemble’s average predicti<strong>on</strong>, v – variance of<br />
ensemble’s predicti<strong>on</strong>,N(a, v, x) is probability density of the Gaussian<br />
distributi<strong>on</strong>. Performance of this metric and its comparis<strong>on</strong> to the<br />
traditi<strong>on</strong>al <strong>on</strong>es are evaluated for several QSAR/QSPR classificati<strong>on</strong><br />
problems. The developed approach can be freely accessed to develop<br />
and estimate applicability domain of classificati<strong>on</strong> models at http://qspr.eu<br />
web site.<br />
References<br />
1. Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI: Drug Discov Today<br />
2006, 11:700.<br />
2. Manallack DT, Tehan BG, Gancia E, Huds<strong>on</strong> BD, Ford MG, Livingst<strong>on</strong>e DJ,<br />
Whitley DC, Pitt WR: J Chem Inf Comput Sci 2003, 43:674.<br />
3. Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T,<br />
Todeschini R, Fourches D, Varnek A: J Chem Inf Model 2008, 48:1733.<br />
P42<br />
Automatic pharmacophore model generati<strong>on</strong> using weighted<br />
substructure assignments<br />
Andreas Jahn * , H Planatscher, Georg Hinselmann, Nikolas Fechner, A Zell<br />
University of Tübingen, Sand 1, 72076 Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P42<br />
The generati<strong>on</strong> of a pharmacophore model is a challenging process,<br />
which often requires the interacti<strong>on</strong> of medicinal chemists. Given a<br />
number of ligands for a specific target, the aim is to identify the<br />
pharmacophore patterns that are resp<strong>on</strong>sible for the biological activities<br />
of chemical compounds. A recent study of optimal assignment methods<br />
has shown that the assignment of chemical substructures is able to<br />
detect active compounds in a data set [1]. Therefore, we investigated the<br />
possibility to use this technique to identify key features of a set of active<br />
compounds.<br />
To determine important substructures of active compounds, we<br />
integrated n weight factors, where n is the number of substructures. The<br />
substructures were defined using the pharmacophore definiti<strong>on</strong>s of Phase<br />
3.0 [2]. To define the individual weights of the pharmacophore patterns,<br />
we integrated a genetic algorithm which assigns weight factors to the<br />
previously defined patterns. The experimental setup was designed as<br />
follows: Given a data set with active compounds, the most active<br />
compound was selected as query structure for the experiment. The<br />
remaining active compounds were inserted into a background data set<br />
c<strong>on</strong>taining inactive compounds. The genetic algorithm evolved n weights<br />
for the pharmacophore patterns of the query structure. To evaluate the<br />
fitness of an individual, we performed a single query screening with the<br />
weights of the individual. During the optimizati<strong>on</strong> process, the BEDROC<br />
score [3] is optimized which puts emphasis <strong>on</strong> the early recogniti<strong>on</strong><br />
performance. The result of the genetic algorithm was a weight vector<br />
that assigns each pharmacophore feature the weight of the best<br />
individual.<br />
We evaluated our approach <strong>on</strong> a subset of the Directory of Useful Decoys<br />
that is suitable for ligand-based virtual screening [1][4]. The query<br />
structure was extracted from the same complexed crystal structure used<br />
by Huang et al. [4] to determine the binding site of the protein.<br />
The presented method is able to provide valuable informati<strong>on</strong> about key<br />
features that are important for the biological activity of a compound.<br />
Additi<strong>on</strong>ally, informati<strong>on</strong> of the protein structure is not needed.<br />
Therefore, the method can also be used to derive a pharmacophore<br />
model if no protein structure is available (e.g. GPCRs).<br />
Page 22 of 25<br />
References<br />
1. Jahn A, Hinselmann G, Fechner N, Zell A: Journal of <strong>Cheminformatics</strong> 2009,<br />
1:14.<br />
2. Phase, versi<strong>on</strong> 3.0 Schrödinger, LLC, New York, NY 2008.<br />
3. Truch<strong>on</strong> JF, Bayly CI: J Chem Inf Model 2007, 41:488-508.<br />
4. Huang N, Shoichet B, Irwin J: J Med Chem 2006, 49:6789-6801.<br />
P43<br />
A combined combinatorial and pKa-based approach to ligand<br />
prot<strong>on</strong>ati<strong>on</strong> states<br />
Tim ten Brink * , Thomas E Exner<br />
Department for Chemistry, University of K<strong>on</strong>stanz, D-78457 K<strong>on</strong>stanz,<br />
<str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P43<br />
The prot<strong>on</strong>ati<strong>on</strong> of the ligand molecule and the protein binding site has<br />
a significant influence <strong>on</strong> the results obtained by protein-ligand<br />
docking. Due to the inability of X-ray crystallography to resolve the<br />
hydrogen atom in protein and protein complex structures, the correct<br />
prot<strong>on</strong>ati<strong>on</strong> for the protein and the ligand has to be assigned <strong>on</strong> a<br />
theoretical basis before the structures can be used. Because of the local<br />
envir<strong>on</strong>ment inside the binding site and because of the influence of<br />
the ligand and the protein <strong>on</strong>to each other, the ligand prot<strong>on</strong>ati<strong>on</strong><br />
can differ from the prot<strong>on</strong>ati<strong>on</strong> <strong>on</strong>e would expect for the ligand in<br />
soluti<strong>on</strong> under physiological c<strong>on</strong>diti<strong>on</strong>s. Hence for protein-ligand<br />
docking different prot<strong>on</strong>ati<strong>on</strong> states of the ligand have to be taken into<br />
account.<br />
Our recently introduced structure preparati<strong>on</strong> tool SPORES [1] used a rule<br />
based method to generate a standard prot<strong>on</strong>ati<strong>on</strong> for each ligand<br />
molecule and afterwards generated different prot<strong>on</strong>ati<strong>on</strong> states in a<br />
combinatorial way by adding and removing single hydrogen atoms<br />
bel<strong>on</strong>ging to predefined functi<strong>on</strong>al groups. Docking of all these<br />
prot<strong>on</strong>ati<strong>on</strong> states of a given ligand molecule often led to scoring<br />
problems due to the different number of hydrogen interacti<strong>on</strong>s formed<br />
by the different protomers of the ligand molecule. To overcome these<br />
problems two methods to filter highly charged and unstable prot<strong>on</strong>ati<strong>on</strong><br />
states from the docking were implemented. One based <strong>on</strong> the difference<br />
between the standard prot<strong>on</strong>ati<strong>on</strong> of the ligand and the actual<br />
prot<strong>on</strong>ati<strong>on</strong> state and <strong>on</strong>e based <strong>on</strong> pKa values calculated with<br />
ChemAx<strong>on</strong>’s MARVIN software [2]. Here we present a new approach in<br />
which the ligand atoms c<strong>on</strong>sidered for the combinatorial method not<br />
chosen from predefined functi<strong>on</strong>al groups but according to the<br />
calculated pKa values which leads to a wider variety but with a smaller<br />
overall number of prot<strong>on</strong>ati<strong>on</strong> states.<br />
References<br />
1. ten Brink T, Exner TE: J Chem Inf Model 2009, 49(6):1535-1546.<br />
2. MARVIN 5.2.0, ChemAx<strong>on</strong> 2009 [http://www.chemax<strong>on</strong>.com].<br />
P44<br />
Semi-empirical derived descriptors for the modelling of properties of<br />
N-c<strong>on</strong>taining heterocycles<br />
Alexander Entzian 1* , Horst Bögel 1 , Mirko Buchholz 2 , Ulrich Heiser 2<br />
1 Martin-Luther-University Halle-Wittenberg, Institute for Organic Chemistry,<br />
Kurt-Mothes-Str. 2, 06120 Halle, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Probiodrug AG, Dept. Medicinal<br />
Chemistry, Weinbergweg 22, 06120 Halle, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P44<br />
Nitrogen is <strong>on</strong>e of the most prominent hetero atoms found in<br />
heterocycles. The corresp<strong>on</strong>ding electr<strong>on</strong> l<strong>on</strong>e pairs of these nitrogen<br />
atoms are mainly resp<strong>on</strong>sible for properties like the basicity and the pkbvalues<br />
of the investigated heterocycle. Nevertheless, the overall electr<strong>on</strong>ic<br />
state of a molecule is also directly related to observable physico-chemical<br />
properties. This fact underlines a possible c<strong>on</strong>necti<strong>on</strong> between different<br />
investigated properties.<br />
The electr<strong>on</strong>ic properties of nitrogen c<strong>on</strong>taining compounds were<br />
analyzed with the aim to further predict these properties from<br />
quantum-mechanical descriptors. We thereby expect a relati<strong>on</strong>ship<br />
between the prot<strong>on</strong> affinity, the pkb-value and the strength of metal<br />
complexati<strong>on</strong>.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
Because basicity seems to be a fundamental property of these compounds,<br />
the work was first focused <strong>on</strong> the prot<strong>on</strong> affinities of the molecules<br />
in the gas phase. We thereby c<strong>on</strong>sider the fact, that these values<br />
are not influenced by the solvati<strong>on</strong> in liquid phases. L<strong>on</strong>e pairs in<br />
nitrogen c<strong>on</strong>taining heterocycles play also an important role for metal<br />
complexati<strong>on</strong> or due to their nucleophilic attack in chemical reacti<strong>on</strong>s.<br />
Therefore 55 heterocyclic compounds were selected that bel<strong>on</strong>g to the<br />
compound classes of substituted pyridines, pyrazoles and imidazoles.<br />
These compound set was carefully divided into a training (37) and test<br />
set (18), and different descriptors of the electr<strong>on</strong>ic states were<br />
calculated by using the semi-empirical molecular orbital software<br />
MSINDO [1]. In regressi<strong>on</strong> analysis the following descriptors were<br />
identified and marked as important:<br />
Charge of the nitrogen atom,<br />
Energy of the l<strong>on</strong>e pair electr<strong>on</strong>s and<br />
Perturbati<strong>on</strong> treatment of the interacti<strong>on</strong> energy - Klopman [2].<br />
Using this set of descriptors a model was built and validated that predicts<br />
the experimental data for prot<strong>on</strong> affinity with an R 2 of 0.93 and a Q 2 of<br />
0.91. Furthermore, it was possible to successfully develop models for the<br />
predicti<strong>on</strong> of pkb-values and the strength of a metal complex formati<strong>on</strong><br />
for these compounds.<br />
References<br />
1. Bredow T, Geudtner G, Jug K: J Comput Chem 2001, 22:861.<br />
2. Klopman G: J Amer Chem Soc 1968, 90:2<strong>23</strong>.<br />
P45<br />
QSAR of anti-inflammatory drugs<br />
Ver<strong>on</strong>ica R Khayrullina 1* , H Bögel 2<br />
1 Bashkir State University, RU-450074 Ufa, Russia; 2 Halle, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P45<br />
The computer analysis of relati<strong>on</strong>s between molecular structures and their<br />
biological activity using fragment-based methods is very useful to draw<br />
c<strong>on</strong>clusi<strong>on</strong>s for the understanding of drug acti<strong>on</strong> and for the<br />
development of more efficient n<strong>on</strong>-toxic drug candidates.<br />
We used the computer system SARD-21 (Structure Activity Relati<strong>on</strong>ship &<br />
Design) to investigate comm<strong>on</strong> structural features (fragments and<br />
substituents) typical for high- and low-effective n<strong>on</strong>-steroid antiinflammatory<br />
drugs (NSAIDs) succesfully.<br />
This derived informati<strong>on</strong> has been used for the model for predicti<strong>on</strong> of<br />
anti-inflammatory effectiveness of medicines with 76% and 81% level of<br />
recogniti<strong>on</strong> by two methods. This informati<strong>on</strong> could be used for creating<br />
new highly effective NSAIDs, and for increasing effectiveness of already<br />
known comp<strong>on</strong>ents.<br />
In a sec<strong>on</strong>d part of this paper the interrelati<strong>on</strong> between structure and<br />
efficacy for anti-inflammatory drug acti<strong>on</strong> is carried out using traditi<strong>on</strong>al<br />
QSAR with descriptors from topology and from quantum-mechanical<br />
calculati<strong>on</strong>s followed by regressi<strong>on</strong> models from modelling.<br />
The aim of this paper is to compare both molecular approaches of<br />
molecular design of drugs.<br />
Reference<br />
1. Khayrullina VR, et al: Russ Chem Bull, Int Ed 2006, 55:1.<br />
P46<br />
Fingerprint-based detecti<strong>on</strong> of acute aquatic toxicity<br />
Luca Pireddu 1* , L Michielan 2 , M Floris 1 , P Rodriguez-Tomé 1 , S Moro 2<br />
1 CRS4, Pula, Italy; 2 University of Padova, Padova, Italy<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P46<br />
In this work we show the effectiveness of 2D structural fingerprints in the<br />
predicti<strong>on</strong> of aquatic toxicity of chemical compounds, creating a selfc<strong>on</strong>tained<br />
system for structure-based aquatic toxicity classificati<strong>on</strong>. Using<br />
the data from the U.S. Envir<strong>on</strong>mental Protecti<strong>on</strong> Agency Fat Head<br />
Minnow (EPA-FHM) dataset [1] we build a n<strong>on</strong>-linear RBF SVM [2]<br />
classifier that distinguishes acutely toxic compounds from less toxic<br />
compounds, loosely according to the criteri<strong>on</strong> stipulated by the E.U.<br />
Reach legislati<strong>on</strong> [3]. The classifier achieves up to 86% accuracy in leave-<br />
Page <strong>23</strong> of 25<br />
<strong>on</strong>e-out validati<strong>on</strong> using 580 of the dataset’s 614 compounds. This<br />
performance is comparable with models built from the same dataset<br />
using more sophisticated molecular descriptors, such as AutoMEP and<br />
Sterimol descriptors [4]. We apply our classificati<strong>on</strong> model to predict the<br />
aquatic toxicity of 3M compounds in the MMsINC database [5].<br />
Furthermore, we create a linear SVM model using the same technique<br />
and apply it to the MMsINC data, with the additi<strong>on</strong>al integrati<strong>on</strong> of the<br />
EXPLAIN system [6] which allows us to show which structural features are<br />
resp<strong>on</strong>sible for the model classifying a molecule as less toxic or acutely<br />
toxic.<br />
References<br />
1. Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drumm<strong>on</strong>d RA:<br />
Predicting modes of acti<strong>on</strong> from chemical structure: Acute toxicity in<br />
the fathead minnow (Pimephales promelas). Envir<strong>on</strong>mental Toxicology and<br />
Chemistry 1997, 16(5):948-967.<br />
2. Boser B, Guy<strong>on</strong> I, Vapnik V: A Training Algorithm for Optimal Margin<br />
Classifiers. Computati<strong>on</strong>al Learning Theory 1992, 144-152.<br />
3. EU: Corrigendum to Regulati<strong>on</strong> (EC) No 1907/2006 of the European<br />
Parliament and of the Council of 18 December 2006 c<strong>on</strong>cerning the<br />
Registrati<strong>on</strong>, Evaluati<strong>on</strong>, Authorizati<strong>on</strong> and Restricti<strong>on</strong> of Chemicals<br />
(REACH). Off J Eur Uni<strong>on</strong> L136 2007, 50.<br />
4. Michielan L, Pireddu L, Floris M, Bacilieri M, Rodriguez-Tomé P, Moro S:<br />
2009 in press.<br />
5. Masciocchi B, Frau G, Fant<strong>on</strong> M, Sturlese M, Floris M, Pireddu L, Palla P,<br />
Cedrati F, RodriguezTomé P, Moro S: MMsINC: a large-scale<br />
chemoinformatics database. Nucleic Acids Research 2008, 37 Database:<br />
D284-90.<br />
6. Poulin B, Eisner R, Szafr<strong>on</strong> D, Lu P, Greiner R, Wishart D, Fyshe A, Pearcy B,<br />
Macd<strong>on</strong>ell C, Anvik J: Visual Explanati<strong>on</strong> of Evidence in Additive<br />
Classifiers. Innovative Applicati<strong>on</strong>s of Artificial Intelligence 2006.<br />
P47<br />
Adaptive matrix metrics for molecular descriptor assessment in<br />
QSPR classificati<strong>on</strong><br />
Axel J Soto 1* , Marc Strickert 2 , GE Vazquez 1<br />
1 Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Naci<strong>on</strong>al del<br />
Sur, Alem 1250, 8000 Bahía Blanca, Argentina; 2 Leibniz Institute of Plant<br />
Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben,<br />
<str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P47<br />
QSPR methods represent a useful approach in the drug discovery process,<br />
since they allow predicting in advance biological or physicochemical<br />
properties of a candidate drug. For this goal, it is necessary that the QSPR<br />
method be as accurate as possible to provide reliable predicti<strong>on</strong>s.<br />
Moreover, the selecti<strong>on</strong> of the molecular descriptors is an important task<br />
to create QSPR predicti<strong>on</strong> models of low complexity which, at the same<br />
time, provide accurate predicti<strong>on</strong>s.<br />
In this work, a matrix-based method [1] is used to transform the original<br />
data space of chemical compounds into an alternative space where<br />
compounds with different target properties can be better separated. For<br />
usingthisapproach,QSPRisc<strong>on</strong>sideredasaclassificati<strong>on</strong>problem.The<br />
advantage of using adaptive matrix metrics is twofold: it can be used to<br />
identify important molecular descriptors and at the same time it allows<br />
improving the classificati<strong>on</strong> accuracy.<br />
A recently proposed method making use of this c<strong>on</strong>cept [2] is extended<br />
to multi-class data. The new method is related to linear discriminant<br />
analysis and shows better results at yet higher computati<strong>on</strong>al costs. An<br />
applicati<strong>on</strong> for relating chemical descriptors to hydrophobicity property<br />
[3] shows promising results.<br />
References<br />
1. Strickert M, Keilwagen J, Schleif F-M, Villmann T, Biehl M: Matrix Metric<br />
Adaptati<strong>on</strong> Linear Discriminant Analysis of Biomedical Data. Lecture Notes<br />
in Computer Science 2009, 5517/2009:933-940.<br />
2. Strickert M, Soto AJ, Keilwagen J, Vazquez GE: Towards matrix-based<br />
selecti<strong>on</strong> of feature pairs for efficient ADMET predicti<strong>on</strong>. Argentine<br />
Symposium <strong>on</strong> Artificial Intelligence, ASAI 2009, 83-94.<br />
3. Soto AJ, Cecchini RL, Vazquez GE, P<strong>on</strong>z<strong>on</strong>i I: A Wrapper-based Feature<br />
Selecti<strong>on</strong> Method for ADMET Predicti<strong>on</strong> using Evoluti<strong>on</strong>ary Computing.<br />
Lecture Notes in Computer Science 2008, 4973/2008:188-199.
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
P48<br />
Evoluti<strong>on</strong>ary design of selective adenosine receptor ligands<br />
Horst van der Eelke * , J Kruisselbrink, A Aleman, MT Emmerich,<br />
Andreas Bender, AP IJzerman<br />
Divisi<strong>on</strong> of Medicinal Chemistry, Leiden/Amsterdam Center for Drug<br />
Research, Leiden University, Einsteinweg 55, <strong>23</strong>33CC Leiden, The Netherlands<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P48<br />
Evoluti<strong>on</strong>ary de novo design is the design of new active compounds from<br />
scratch, using evoluti<strong>on</strong>ary principles. It involves an iterative cycle of<br />
structure generati<strong>on</strong>, evaluati<strong>on</strong>, and selecti<strong>on</strong> of candidate structures.<br />
Selected candidates serve as input for the next generati<strong>on</strong>. By repeatedly<br />
selecting <strong>on</strong>ly the best structures as basis for structure generati<strong>on</strong>, the<br />
process evolves toward better (not necessarily best) soluti<strong>on</strong>s. Here, we<br />
applied an evoluti<strong>on</strong>ary algorithm (the Molecule Commander) for the design<br />
of selective adenosine receptor ligands. Generated compounds were all ruleof-5<br />
compliant and had a polar surface areas between 0 and 140 Å 2 ,<br />
favorable for intestinal absorpti<strong>on</strong>. In additi<strong>on</strong>, toxic compounds were<br />
filtered out using a categorical SVM model for predicti<strong>on</strong> of mutagenicity<br />
trained <strong>on</strong> Ames-test mutagenicity data (5-fold ROC score: 0.8948). Four<br />
pharmacophores were designed, <strong>on</strong>e for each human adenosine receptor<br />
subtype. The hA 1 receptor pharmacophore served as objective for the<br />
evoluti<strong>on</strong>, while the three other pharmacophores served as negative<br />
objective in order to obtain selectivity. In order to measure similarity with<br />
known adenosine receptor ligands, ring systems and scaffolds in the<br />
generated compounds were compared with those extracted from adenosine<br />
ligands in the StARLITe database. With each new generati<strong>on</strong>, the structures<br />
displayed an increasing number of ring systems also found in adenosine<br />
receptor ligands while the number of uniquecorestructures(scaffolds)<br />
increased as well. Eventually, the best (ADMET and pharmacophore score)<br />
candidate structures will be proposed for synthesis and tested for activity.<br />
P49<br />
I<strong>on</strong>otropic GABA receptors: modelling and design of selective ligands<br />
Vladimir A Palyulin * , EV Radchenko, DE Osolodkin, VI Chupakhin, NS Zefirov<br />
Department of Chemistry, Moscow State University, Moscow 119991, Russia<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P49<br />
I<strong>on</strong>otropic GABA A and GABA C receptors play an important role in the operati<strong>on</strong><br />
of CNS and serve as targets for many neuroactive drugs. Using the homology<br />
modelling and molecular dynamics, the 3D models of the receptors were built<br />
and some aspects of ligand-target interacti<strong>on</strong>s were elucidated [1,2].<br />
To better understand the structural factors c<strong>on</strong>trolling the activity and<br />
selectivity of the ligands, a series of QSAR models [3] were derived based<br />
<strong>on</strong> the Molecular Field Topology Analysis (MFTA) [4], CoMFA and<br />
Topomer CoMFA approaches. They were compared with each other as<br />
well as with the molecular modelling results.<br />
Finally, a number of potential selective ligand structures were identified<br />
by means of the virtual screening [5] from the available chemicals<br />
databases and the generated structure libraries.<br />
References<br />
1. Osolodkin DI, Chupakhin VI, Palyulin VA, Zefirov NS: J Mol Graph Model<br />
2009, 27:813.<br />
2. Chupakhin VI, Palyulin VA, Zefirov NS: Doklady Biochem Biophys 2006, 408:169.<br />
3. Chupakhin VI, Bobrov SV, Radchenko EV, Palyulin VA, Zefirov NS: Doklady<br />
Chemistry 2008, 422:227.<br />
4. Palyulin VA, Radchenko EV, Zefirov NS: J Chem Inf Comp Sci 2000, 40:659.<br />
5. Radchenko EV, Palyulin VA, Zefirov NS: Chemoinformatics Approaches to<br />
Virtual Screening RSC: Varnek A, Tropsha A 2008, 150-181.<br />
P50<br />
PoseView – molecular interacti<strong>on</strong> patterns at a glance<br />
Katrin Stierand * , Matthias Rarey<br />
Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43,<br />
20146 Hamburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P50<br />
Chemists are well trained in perceiving 2D molecular sketches. On the side<br />
of computer assistance, the automated generati<strong>on</strong> of such sketches<br />
becomes very difficult when it comes to multi-molecular arrangements<br />
such as protein-ligand complexes in adrugdesignc<strong>on</strong>text.Existing<br />
soluti<strong>on</strong>s to date suffer from drawbacks such as missing important<br />
interacti<strong>on</strong> types, inappropriate levels of abstracti<strong>on</strong> and layout quality.<br />
During the last few years we have developed PoseView [1,2], a tool which<br />
displays molecular complexes incorporating a simple, easy-to-perceive<br />
arrangement of the ligand and the amino acids towards which it forms<br />
interacti<strong>on</strong>s. Resulting in atomic resoluti<strong>on</strong> diagrams, PoseView operates<br />
<strong>on</strong> a fast tree re-arrangement algorithm to minimize crossing lines in the<br />
sketches. Due to a de-coupling of interacti<strong>on</strong> percepti<strong>on</strong> and the drawing<br />
engine, PoseView can draw any interacti<strong>on</strong>s determined by either<br />
distance-based rules or the FlexX interacti<strong>on</strong> model (which itself is user<br />
accessible). Owing to the small molecule drawing engine 2Ddraw [3],<br />
molecules are drawn in a textbook-like manner following the IUPAC<br />
regulati<strong>on</strong>s.<br />
The tool has a generic file interface for other complexes than proteinligand<br />
arrangements. It can therefore be used as well for the display of,<br />
e.g., RNA/DNA complexes with small molecules. For batch processing, an<br />
additi<strong>on</strong>al command line interface is available; output can be provided in<br />
various formats, am<strong>on</strong>gst them gif, ps, svg and pdf.<br />
Besides the underlying interacti<strong>on</strong> models, we will present new<br />
algorithmic approaches, assess usability issues and a large-scale validati<strong>on</strong><br />
study <strong>on</strong> the PDB.<br />
References<br />
1. Stierand K, Rarey M: ChemMedChem 2007, 2:853.<br />
2. Stierand K, Maaß P, Rarey M: Bioinformatics 2006, 22:1710.<br />
3. Fricker PC, Gastreich M, Rarey M: J Chem Inf Comput Sci 2004, 44(3):1065.<br />
P51<br />
Get the best from substructure mining<br />
Jeroen Kazius<br />
Curios-IT, Anna van Burenhof 63, <strong>23</strong>16GP Leiden, The Netherlands<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P51<br />
Page 24 of 25<br />
The chemical informati<strong>on</strong> that is present in a set of compounds is rarely<br />
fullyexploited.Thisismostlybecause no descriptor set can capture all<br />
biologically important features. As a result, valuable chemical<br />
knowledge can thus stay hidden from hypothesis-based drug design.<br />
The simplest form of a structure-activity relati<strong>on</strong>ship (SAR) is a<br />
substructure that predisposes compounds towards reduced or increased<br />
biological activity. Such simple patterns should not be missed during<br />
drug design.<br />
The aim of substructure mining is to present those substructures that are<br />
most likely related to biological activity. This method thus provides rapid<br />
access to a substantial repertoire of chemical descriptors that otherwise<br />
remains hidden: substructures. In short, substructure mining c<strong>on</strong>sists of a<br />
focused, but exhaustive, series of substructure searches.<br />
This poster describes how AweSuM, the new Awesome Substructure<br />
Mining tool from Curios-IT, was employed to learn the most interesting<br />
substructures. The poster also discusses the value of enriching the data<br />
with 2D pharmacophore informati<strong>on</strong> prior to mining. An enriched,<br />
detailed SAR analysis produced a scaffold that summarises the chemical<br />
c<strong>on</strong>tent of datasets better than any standard substructure. The<br />
pharmacophore that AweSuM extracted shows predictive power and<br />
agrees with published chemical knowledge. These results dem<strong>on</strong>strate<br />
that useful SAR knowledge can be extracted from the vast space of<br />
substructure descriptors. In this way, AweSuM reveals key substructures<br />
(e.g., pharmacophores or toxicophores), which can often be predictive for<br />
biological activities.<br />
P52<br />
Rescoring of docking poses using force field-based methods<br />
Nina M Fischer * , WM Schneider, O Kohlbacher<br />
Eberhard Karls University, Center for Bioinformatics Tübingen, Sand 14, 72076<br />
Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P52<br />
Existing protein-ligand docking methods computati<strong>on</strong>ally screen<br />
thousands to milli<strong>on</strong>s of organic molecules against protein structures,<br />
trying to find those with complementary shapes and highest binding free
Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />
http://www.jcheminf.com/supplements/2/S1<br />
energies. To allow large molecular databases to be screened rapidly,<br />
simple and approximative scoring functi<strong>on</strong>s are used as a fast filter,<br />
resultinginlowhitrates.Therefore, docking hit lists are comm<strong>on</strong>ly<br />
rescored in a final step by more rigorous and time-c<strong>on</strong>suming methods<br />
to gain a more accurate final list of ranked compounds.<br />
Molecular mechanics Poiss<strong>on</strong>-Boltzmann surface area (MM-PBSA) or<br />
generalized Born surface area (MM-GBSA) methods are currently<br />
c<strong>on</strong>sidered to be suitable techniques for rescoring. These physically<br />
realistic approaches incorporate more sophisticated models for solvati<strong>on</strong><br />
and electrostatic interacti<strong>on</strong>s than most scoring functi<strong>on</strong>s. Hence, they can<br />
discriminate more reliably between correct and incorrect docking poses.<br />
A high-throughput rescoring protocol using force field-based methods<br />
has been proposed by Brown and Muchmore [1]. They used 18 in-house<br />
urokinase-ligand crystal structures and their corresp<strong>on</strong>ding experimentally<br />
determined binding free energies as a test set. On the basis of this<br />
rescoring protocol we tested several molecular dynamics simulati<strong>on</strong><br />
protocols in combinati<strong>on</strong> with different MM-PBSA, and MM-GBSA<br />
calculati<strong>on</strong> procedures using Amber 10 [2] and Gromacs 4 [3].<br />
C<strong>on</strong>sidering performance and accuracy, our best rescoring protocol<br />
performs similarly to the <strong>on</strong>e described by Brown and Muchmore [1].<br />
It has a comparable run-time and achieves a correlati<strong>on</strong> between<br />
experimental and rescored values of 0.88 compared to 0.87.<br />
However, we used urokinase-ligand complexes generated using the<br />
docking program Glide [4] instead of crystal structures. Additi<strong>on</strong>ally, this<br />
shows that our rescoring protocol improves the correlati<strong>on</strong> of 0.57<br />
between experimental values and Glide scores significantly to 0.88,<br />
thereby achieving a more accurate list of ranked compounds.<br />
The protocol will be incorporated into BALLView [5], an open-source<br />
molecular viewer and modeling tool. Thus, it will be available free of<br />
charge and can be c<strong>on</strong>veniently used to rescore (docked) protein-ligand<br />
complexes.<br />
References<br />
1. Brown SP: J Chem Inf Model 2007, 47:1493.<br />
2. Case DA: University of California, San Francisco 2008.<br />
3. Hess B: J Chem Theory Comput 2008, 4:435.<br />
4. Friesner RA: J Med Chem 2004, 47:1739.<br />
5. Moll A: J Comput Aided Mol Des 2005, 19:791.<br />
P53<br />
The pipelined metabolite identificati<strong>on</strong> based <strong>on</strong> MS fragmentati<strong>on</strong><br />
Miguel Rojas-Cherto * , PT Kasper, Peir<strong>on</strong>cely Julio E, T Reijmers,<br />
RJ Vreeken, T Hankemeier<br />
Leiden, The Netherlands<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P53<br />
Structural characterizati<strong>on</strong> and identificati<strong>on</strong> of comp<strong>on</strong>ents of complex<br />
biological mixtures c<strong>on</strong>stitutes <strong>on</strong>e of the central aspects of metabolomics.<br />
Metabolite identificati<strong>on</strong> is a challenging but essential task in studies of<br />
biological samples. Mass spectrometry, because of its high sensitivity and<br />
specificity, is widely and successfully used in analysis of biological samples.<br />
Identificati<strong>on</strong> of metabolites can be in principle achieved using high<br />
resoluti<strong>on</strong> multistage mass spectrometry (MSn) because it provides a<br />
feature rich fingerprint of the precursor structure. However, neither general<br />
methodology for the identificati<strong>on</strong> nor extensive databases of metabolites<br />
with multistage mass spectrometric data are available at the moment. We<br />
dem<strong>on</strong>strate in this poster the feasibility of the strategy for metabolite<br />
identificati<strong>on</strong> based <strong>on</strong> analysis of fragmentati<strong>on</strong> trees.<br />
High resoluti<strong>on</strong> multi stage MS experiments were performed <strong>on</strong><br />
LTQ-Orbitrap (Thermo) equipped with Triversa nanoMate (Advi<strong>on</strong>)<br />
nanoelectrospray i<strong>on</strong> source using defined protocol. An in-house<br />
developed software, integrating am<strong>on</strong>g others: Chemistry Development<br />
Kit (CDK) and XCMS libraries, was used for spectral data processing.<br />
Resulting fragmentati<strong>on</strong> trees were stored in a local database.<br />
Multi-stage Molecular Formula (MMF) tool, which uses a method to<br />
resolve the elemental compositi<strong>on</strong> of the compound and fragment i<strong>on</strong>s<br />
derived from MSn data using a cyclic c<strong>on</strong>straining process has been<br />
developed. The process of elemental formula assignment and fitting<br />
within elemental formula paths removes artefacts of the spectra.<br />
Background signals, spikes, c<strong>on</strong>taminati<strong>on</strong>s, side peaks, etc., are rejected<br />
in this stage and are not included in the fragmentati<strong>on</strong> tree. The<br />
resulting fragmentati<strong>on</strong> trees are stored in XML format.<br />
Repeatability, reproducibility and robustness of fragmentati<strong>on</strong> tree<br />
acquisiti<strong>on</strong>s were tested by changing experimental c<strong>on</strong>diti<strong>on</strong>s and<br />
varying the c<strong>on</strong>centrati<strong>on</strong> of the metabolite of interest. An acquisiti<strong>on</strong><br />
protocol was established for the reliable and reproducible acquisiti<strong>on</strong> of<br />
mass spectral trees. It was investigated to which extent the variati<strong>on</strong> of<br />
c<strong>on</strong>diti<strong>on</strong>s such as fragmentati<strong>on</strong> energy, isolati<strong>on</strong> width etc. did change<br />
the fragmentati<strong>on</strong> pattern or topology of hierarchical relati<strong>on</strong>s between<br />
fragments.<br />
We dem<strong>on</strong>strate how the developed analytical strategy based <strong>on</strong><br />
analysis of fragmentati<strong>on</strong> trees can be used to discriminate between<br />
metabolite isomers with the same elemental compositi<strong>on</strong> and an <strong>on</strong>ly<br />
slightly different structure, but with a significantly different biological<br />
functi<strong>on</strong>.<br />
Our results provide firm basis for developing a generic, multi stage mass<br />
spectrometry based platform for efficient identificati<strong>on</strong> of metabolites.<br />
P54<br />
SBE13, a newly identified inhibitor of inactive polo-like kinase 1<br />
Sarah Keppner 1* , Ewgenij Proschak 2 , Gisbert Schneider 2 , B Spänkuch 1<br />
1 Goethe-University, Medical School, Department of Gynecology and<br />
Obstetrics, Theodor-Stern-Kai 7, 60590 Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />
2 Goethe-University, Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />
Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P54<br />
Page 25 of 25<br />
Protein kinases are important targets for drug development. The almost<br />
identical protein folding of kinases and the comm<strong>on</strong> co-substrate ATP<br />
leads to the problem of inhibitor selectivity. Type II inhibitors, targeting<br />
the inactive c<strong>on</strong>formati<strong>on</strong> of kinases, occupy a hydrophobic pocket with<br />
less c<strong>on</strong>served surrounding amino acids [1].<br />
Human polo-like kinase 1 (Plk1) represents a promising target for<br />
approaches to identify new therapeutic agents. Plk1 bel<strong>on</strong>gs to a family<br />
of highly c<strong>on</strong>served serine/thre<strong>on</strong>ine kinases, and is a key player in<br />
mitosis, where it modulates the spindle checkpoint at metaphase/<br />
anaphase transiti<strong>on</strong>. Plk1 is over-expressed in all today analyzed human<br />
tumors of different origin and serves as a negative prognostic marker in<br />
cancer patients. The newly identified inhibitor, SBE13, a vanillin derivative,<br />
targets Plk1 in its inactive c<strong>on</strong>formati<strong>on</strong> [2]. This leads to selectivity<br />
within the Plk family and towards Aurora A. This selectivity can be<br />
explained by docking studies of SBE13 into the binding pocket of<br />
homology models of Plk1, Plk2 and Plk3 in their inactive c<strong>on</strong>formati<strong>on</strong>.<br />
SBE13 showed anti-proliferative effects in cancer cell lines of different<br />
origins with EC 50 values between 5 μM and39μM and induced apoptosis.<br />
Increasing c<strong>on</strong>centrati<strong>on</strong>s of SBE13 result in increasing amounts of cells in<br />
G 2/M phase 13 hours after double thymidin block of HeLa cells. The kinase<br />
activity of Plk1 was inhibited with an IC 50 of 200 pM.<br />
Taken together, we could show that carefully designed structure-based<br />
virtual screening is well-suited to identify selective type II kinase<br />
inhibitors targeting Plk1 as potential anti-cancer therapeutics.<br />
References<br />
1. Liu Y, Gray NS: Nat Chem Biol 2006, 2:358-364.<br />
2. Keppner S, Proschak E, Schneider G, Spänkuch B: Chem Med Chem 2009 in<br />
press.<br />
Cite abstracts in this supplement using the relevant abstract number,<br />
e.g.: Keppner et al.: SBE13, a newly identified inhibitor of inactive pololike<br />
kinase 1. Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P54