23.12.2012 Views

5th German Conference on Cheminformatics: 23 ... - BioMed Central

5th German Conference on Cheminformatics: 23 ... - BioMed Central

5th German Conference on Cheminformatics: 23 ... - BioMed Central

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

MEETING ABSTRACTS Open Access<br />

<str<strong>on</strong>g>5th</str<strong>on</strong>g> <str<strong>on</strong>g>German</str<strong>on</strong>g> <str<strong>on</strong>g>C<strong>on</strong>ference</str<strong>on</strong>g> <strong>on</strong> <strong>Cheminformatics</strong>:<br />

<strong>23</strong>. CIC-Workshop<br />

Goslar, <str<strong>on</strong>g>German</str<strong>on</strong>g>y. 8-10 November 2009<br />

Edited by Frank Oellien, Uli Fechner and Thomas Engel<br />

Published: 04 May 2010<br />

These abstracts are available <strong>on</strong>line at http://www.jcheminf.com/supplements/2/S1<br />

INTRODUCTION<br />

A1<br />

<str<strong>on</strong>g>5th</str<strong>on</strong>g> <str<strong>on</strong>g>German</str<strong>on</strong>g> <str<strong>on</strong>g>C<strong>on</strong>ference</str<strong>on</strong>g> <strong>on</strong> Chemoinformatics: <strong>23</strong>. CIC-Workshop.<br />

November 8-10, 2009, Goslar, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Frank Oellien 1* , Uli Fechner 2 , Thomas Engel 3<br />

1 GDCh-CIC Divisi<strong>on</strong> Chair, Intervet Innovati<strong>on</strong> GmbH, Zur Propstei, 55270<br />

Schwabenheim, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 GDCh-CIC divisi<strong>on</strong> board member, Beilstein-<br />

Institut zur Förderung der Chemischen Wissenschaften, Trakehner Str. 7-9,<br />

60487 Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 3 GDCh-CIC Divisi<strong>on</strong> Co-Chair, Fakultät für Chemie<br />

und Pharmazie, Universität München, Butenandtstr. 5-13, 81377 München,<br />

<str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

E-mail: frank.oellien@sp.intervet.com<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):A1<br />

From the 8 th to the 10 th November 2009, the Chemistry-Informati<strong>on</strong>-<br />

Computers (CIC) divisi<strong>on</strong> of the <str<strong>on</strong>g>German</str<strong>on</strong>g> Chemical Society (GDCh) has invited<br />

the chemoinformatics and modeling community to Goslar, <str<strong>on</strong>g>German</str<strong>on</strong>g>y to<br />

participate in the <str<strong>on</strong>g>5th</str<strong>on</strong>g> <str<strong>on</strong>g>German</str<strong>on</strong>g> <str<strong>on</strong>g>C<strong>on</strong>ference</str<strong>on</strong>g> <strong>on</strong> Chemoinformatics<br />

(GCC 2009). The internati<strong>on</strong>al symposium addressed a broad range of<br />

modern research topics in the field of computers and chemistry. The focus<br />

was <strong>on</strong> recent developments and trends in the fields of Chemoinformatics<br />

and Drug Discovery, Chemical Informati<strong>on</strong>, Patents and Databases, Molecular<br />

Modeling, Computati<strong>on</strong>al Material Science and Nanotechnology. In additi<strong>on</strong>,<br />

other c<strong>on</strong>tributi<strong>on</strong>s from the field of Computati<strong>on</strong>al Chemistry were<br />

welcome.<br />

The c<strong>on</strong>ference was opened traditi<strong>on</strong>ally with a “Free-Software-Sessi<strong>on</strong>” <strong>on</strong><br />

Sunday afterno<strong>on</strong> right before the official c<strong>on</strong>ference opening at 5 pm<br />

including three talks about the Open Source projects Bingo, Dingo and<br />

OrChem. In parallel the “Chemoinformatics Market Place” took place<br />

including software tutorials by Chemical Computing Group, Hemlholtz-<br />

Center Munich and the Cambridge Crystallograhic Data Center.<br />

The scientific program was opened by an evening talk giving an overview<br />

<strong>on</strong> the field of Systems Chemistry (Günter v<strong>on</strong> Kiedrowski). In additi<strong>on</strong>,<br />

the program included six plenary lectures (Eberhard Voit (USA), Knut<br />

Baumann (<str<strong>on</strong>g>German</str<strong>on</strong>g>y), Thomas Kostka (<str<strong>on</strong>g>German</str<strong>on</strong>g>y), Anth<strong>on</strong>y J. Williams<br />

(USA), Kalr-Heinz Baringhaus (<str<strong>on</strong>g>German</str<strong>on</strong>g>y), Christoph Sotriffer (<str<strong>on</strong>g>German</str<strong>on</strong>g>y)],<br />

17 general lectures as well as 54 poster presentati<strong>on</strong>s.<br />

Besides the scientific program a special highlight of the c<strong>on</strong>ference were<br />

the FIZ-CHEMIE-Berlin 2009 awards <strong>on</strong> M<strong>on</strong>day afterno<strong>on</strong> (Figure 1). The<br />

CIC divisi<strong>on</strong> awards this price each year to the best diploma thesis and<br />

the best PhD in the field of Computati<strong>on</strong>al Chemistry. The price for the<br />

PhD thesis was awarded to Dr. José Batista from the group of Prof.<br />

Dr. Jürgen Bajorath, University of B<strong>on</strong>n for his dissertati<strong>on</strong> “Analysis of<br />

Random Fragment Profiles for the Detacti<strong>on</strong> of Structure-Activity<br />

Relati<strong>on</strong>ships”. The award for the best diploma thesis has g<strong>on</strong>e to Frank<br />

Tristram from the group of Dr. Wolfgang Wenzel, Karlsruher Institute of<br />

Figure 1 (abstract A1). Fiz CHEMIE Berlin Awards 2009: from left to<br />

right, Frank Tristram (award for the best diploma thesis), Rene de<br />

Planque (Head of FIZ CHEMIE Verlin), Frank Oellien (Chair of the<br />

GDCh-CIC divisi<strong>on</strong>), José Batista (award for the best PhD thesis).<br />

Technology with the title “Modellierubng der Hauptkettenbeweglichkeit in<br />

der rechnergestützten Medikamentenentwicklung”.<br />

FREE SOFTWARE SESSION<br />

PRESENTATIONS<br />

F1<br />

Bingo from SciTouch LLC: chemistry cartridge for Oracle database<br />

Dmitry Pavlov * , Mikhail Rybalkin, Boris Karulin<br />

SciTouch LLC, Budapeshtskaya st. <strong>23</strong>-2-82, RUS-192212 St. Petersburg, Russia<br />

E-mail: dpavlov@scitouch.net<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):F1<br />

Bingo is a data cartridge for Oracle database that provides the industry’s<br />

next-generati<strong>on</strong>, fast, scalable, and efficient storage and searching<br />

soluti<strong>on</strong> for chemical informati<strong>on</strong>.<br />

Bingo seamlessly integrates the chemistry into Oracle databases. Its<br />

extensible indexing is designed to enable scientists to store, index, and


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

search chemical moieties al<strong>on</strong>gside numbers and text within <strong>on</strong>e<br />

underlying relati<strong>on</strong>al database server.<br />

For molecule structure searching, Bingo supports 2D and 3D exact and<br />

substructure searches, as well as similarity, tautomer, Markush, formula,<br />

molecular weight, and flexmatch searches. For reacti<strong>on</strong> searches, Bingo<br />

supports reacti<strong>on</strong> substructure search (RSS) with opti<strong>on</strong>al automatic<br />

generati<strong>on</strong> of atom-to-atom mapping. All of these techniques are<br />

available through extensi<strong>on</strong>s to the SQL and PL/SQL syntax.<br />

Bingo also has features not present in other cartridges, for example,<br />

advanced tautomer search, res<strong>on</strong>ance substructure search, and fast<br />

updating of the index when adding new structures.<br />

The presentati<strong>on</strong> itself you can download from our site: http://<br />

opensource.scitouch.net/downloads/bingo-cic.pdf<br />

F2<br />

Dingo: 2D molecule and reacti<strong>on</strong> structural formula rendering library<br />

Mikhail Kozhevnikov * , Boris Karulin<br />

SciTouch LLC, Budapeshtskaya st. <strong>23</strong>-2-82, RUS-192212 St. Petersburg, Russia<br />

E-mail: kozhevnikov@scitouch.net<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):F2<br />

2D structural formulae are widely accepted as a way of representing<br />

chemical compounds and reacti<strong>on</strong>s. High-quality visualizati<strong>on</strong> of those<br />

formulae can greatly facilitate scientist percepti<strong>on</strong>.<br />

We are proud to present a tool for such visualizati<strong>on</strong>, called Dingo [1]. It<br />

is capable of rendering molecules, reacti<strong>on</strong>s and queries al<strong>on</strong>g with a<br />

wide range of attributes <strong>on</strong> jpgWindows, Linux, Solaris and Mac OS in<br />

accordance with IUPAC recommendati<strong>on</strong>s [2].<br />

Dingo accepts comm<strong>on</strong> file formats, such as MDL Molfile [3] and Rxnfile,<br />

SMILES [4] with several extensi<strong>on</strong>s and more. The result can be stored as<br />

a PDF, SVG, PNG or EMF file. One can also embed Dingo in an applicati<strong>on</strong><br />

and use it via simple C interface or a C# wrapper to produce image files<br />

or render directly to a Windows HDC.<br />

Dingo is a str<strong>on</strong>g competitor for existing proprietary tools <strong>on</strong> the market,<br />

while it is available under the terms of GPLv3 free of charge.<br />

The aim of the project is to provide a simple and efficient mechanism for<br />

2D structural formula rendering. High quality and the variety of<br />

supported formats allows inclusi<strong>on</strong> of resulting images in scientific<br />

papers, further editing of the output in a vector form or embedding the<br />

library in more complex applicati<strong>on</strong>s.<br />

References<br />

1. [http://opensource.scitouch.net/indigo/dingo].<br />

2. [http://www.iupac.org/objID/Article/pac8002x0277].<br />

3. [http://www.symyx.com/downloads/public/ctfile/ctfile.pdf].<br />

4. [http://www.daylight.com/smiles/].<br />

F3<br />

OrChem<br />

Mark Rijnbeek<br />

Protein and Nucleotide Database Group, Chemoinformatics and Metabolism<br />

Team, EMBL Outstati<strong>on</strong> - Hinxt<strong>on</strong>, European Bioinformatics Institute,<br />

Wellcome Trust Genome Campus, Hinxt<strong>on</strong>, Cambridge, CB10 1SD, UK<br />

E-mail: markr@ebi.ac.uk<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):F3<br />

Background: Registrati<strong>on</strong>, indexing and searching of chemical structures<br />

in relati<strong>on</strong>al databases is <strong>on</strong>e of the core areas of cheminformatics.<br />

However, little detail has been published <strong>on</strong> the inner workings of search<br />

engines and their development has been mostly closed-source. We<br />

decided to develop an open source chemistry extensi<strong>on</strong> for Oracle, the<br />

de facto database platform in the commercial world.<br />

Results: Here we present OrChem, an extensi<strong>on</strong> for the Oracle 11G<br />

database that adds registrati<strong>on</strong> and indexing of chemical structures to<br />

support fast substructure and similarity searching. The cheminformatics<br />

functi<strong>on</strong>alityisprovidedbytheChemistryDevelopmentKit.OrChem<br />

provides similarity searching with resp<strong>on</strong>se times in the order of sec<strong>on</strong>ds<br />

for databases with milli<strong>on</strong>s of compounds, depending <strong>on</strong> a given<br />

similarity cut-off. For substructure searching, it can make use of multiple<br />

processor cores <strong>on</strong> today’s powerful database servers to provide fast<br />

resp<strong>on</strong>se times in equally large data sets.<br />

Availability: OrChem is free software and can be redistributed and/or<br />

modified under the terms of the GNU Lesser General Public License as<br />

published by the Free Software Foundati<strong>on</strong>. All software is available via<br />

http://orchem.sourceforge.net.<br />

ORAL PRESENTATIONS<br />

Page 2 of 25<br />

O1<br />

Systems chemistry: from chemical self-replicati<strong>on</strong> to trisoligo-based<br />

nanoc<strong>on</strong>structi<strong>on</strong><br />

Günter v<strong>on</strong> Kiedrowski<br />

Lehrstuhl für Organische Chemie I - Bioorganische Chemie, Ruhr-Universität,<br />

Universitätsstrasse 150, 44780 Bochum, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

E-mail: kiedro@rub.de<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O1<br />

Self-replicati<strong>on</strong> is <strong>on</strong>e of the major principles without life could not exist.<br />

The emergence of self-replicating systems <strong>on</strong> the early earth is generally<br />

believed to have taken place before the advent of instructed protein<br />

synthesis based <strong>on</strong> a complex translati<strong>on</strong> machinery. Whether the origin<br />

of self-replicati<strong>on</strong> is identical to the origin of the hypothetical RNA world<br />

or whether it existed at an earlier stage of evoluti<strong>on</strong> is an open questi<strong>on</strong><br />

that has stimulated chemists to search for chemical systems capable of<br />

making copies of itselves via autocatalytic reacti<strong>on</strong>s. As self-replicati<strong>on</strong><br />

means autocatalysis plus informati<strong>on</strong> transfer, the reacti<strong>on</strong> products must<br />

necessarily be able to store more structural informati<strong>on</strong> than their<br />

precursors. Templating as a means to transfer structural informati<strong>on</strong> has<br />

been exploited since the first successful example of a chemical selfreplicating<br />

system almost two decades ago [1]. Today we have a broad<br />

variety of such systems employing olig<strong>on</strong>ucleotides, peptides, and small<br />

organic molecules as templates and autocatalytic [1-3], cross-catalytic [4],<br />

collectively autocatalytic and n<strong>on</strong>-aut<strong>on</strong>omous (stepwise) schemes of selfreplicati<strong>on</strong><br />

[5].<br />

A link between the chemical self-replicati<strong>on</strong> and nanotechnology was first<br />

pointed out by G.M.Whitesides in his debate with Drexler. The ribosome<br />

is an example for a nanomachine which may be viewed as a threedimensi<strong>on</strong>ally<br />

defined array of 51 modular proteins positi<strong>on</strong>ed by the<br />

rRNA scaffold. Biomimetic approaches towards the 3D-nano-scaffolding of<br />

modular functi<strong>on</strong>s may be based <strong>on</strong> trisolig<strong>on</strong>ucleotides [6], viz. synthetic<br />

3-arm juncti<strong>on</strong>s in which the 3’-ends of three olig<strong>on</strong>ucleotides are<br />

c<strong>on</strong>nected by a suitable linker. We report <strong>on</strong> trisoligos as building blocks<br />

for the n<strong>on</strong>covalent c<strong>on</strong>structi<strong>on</strong> of nanoobjects. Kinetic c<strong>on</strong>trol - applied<br />

by rapid cooling during self-assembly - was found to favor small and<br />

defined nano-structures instead of large polymeric networks [6]. Maximal<br />

instructi<strong>on</strong> was employed as design principles to generate n<strong>on</strong>covalent<br />

3D-nano-objects in which both, the topology and the geometry is<br />

defined [7]. Examples include dodecahedral nano-scaffolds composed<br />

from 20 trisoligos each bearing three individually defined sequences [8].<br />

It was dem<strong>on</strong>strated that the c<strong>on</strong>nectivity informati<strong>on</strong> in the nanoscaffold<br />

juncti<strong>on</strong>s can be copied by chemical means [9]. Chemical<br />

copyingschemesmaybeseeninc<strong>on</strong>juncti<strong>on</strong>with“surface-promoted<br />

replicati<strong>on</strong> and exp<strong>on</strong>ential amplificati<strong>on</strong> of DNA analogues” (SPREAD) [5].<br />

Applicati<strong>on</strong>s of such scaffolds include the positi<strong>on</strong>ing of modular<br />

functi<strong>on</strong>s such as multidentate thioether-based gold cluster labels<br />

(RUBiGold) [10] which have been tailored for m<strong>on</strong>oc<strong>on</strong>jugability and<br />

thermostability.<br />

My lecture will introduce chemical self-replicati<strong>on</strong> and multicomp<strong>on</strong>ent<br />

assembly as facets of systems chemistry [3,11,12] - a nascent field which<br />

understands itself as the bottom-up pendant of systems biology towards<br />

synthetic biology [14]. The field is clearly inspired by the origin-of-life<br />

problem but goes bey<strong>on</strong>d traditi<strong>on</strong>al prebiotic chemistry in its missi<strong>on</strong><br />

towards a quantititive dynamic understanding of complex reacti<strong>on</strong><br />

networks with autocatalytic comp<strong>on</strong>ents. Software tools like our SimFit<br />

and kinetic NMR titrati<strong>on</strong> [2] may significantly help to decipher signatures<br />

of interesting dynamic phenomena such as self-replicati<strong>on</strong>, chiral<br />

symmetry-breaking, and metabolic autocatalysis found in organic [13]<br />

and biomolecular [12] reacti<strong>on</strong> systems.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

References<br />

1. v<strong>on</strong> Kiedrowski G: Angew Chem Int Ed Engl 1986, 25:932.<br />

2. Stahl I, v<strong>on</strong> Kiedrowski G: J Am Chem Soc 2006, 28:14014-14015.<br />

3. Kindermann M, et al: Angew Chem 2005, 117:6908-6913.<br />

4. Sievers D, v<strong>on</strong> Kiedrowski G: Nature 1994, 369:221.<br />

5. Luther A, Brandsch R, v<strong>on</strong> Kiedrowski G: Nature 1998, 396:245.<br />

6. Scheffler M, et al: Angew Chem Int Ed Engl 1999, 38:3312.<br />

7. v<strong>on</strong> Kiedrowski G, et al: Pure Appl Chem 2003, 75:609.<br />

8. Zimmermann J, et al: Angew Chem 2008, 120:3682-3686.<br />

9. Eckardt L, et al: Nature 2002, 420:286.<br />

10. Pankau WM, Mönninghoff S, v<strong>on</strong> Kiedrowski G: Angew Chem 2006,<br />

118:19<strong>23</strong>-1926.<br />

11. Stankiewicz J, Eckardt LH: Angew Chem 2006, 118:350-352.<br />

12. Hayden EJ, v<strong>on</strong> Kiedrowski G, Lehman N: Angew Chem 2008, 120:8552-8556.<br />

13. Kramer D, Pankau WM, v<strong>on</strong> Kiedrowski G: in press.<br />

14. MoU.<br />

O2<br />

The role of systems modeling in drug discovery and predictive health<br />

Eberhard O Voit<br />

Integrative BioSystems Institute and Wallace H. Coulter Department of<br />

Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive,<br />

Atlanta, GA 30332-0535, USA<br />

E-mail: eberhard.voit@bme.gatech.edu<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O2<br />

Systems biology is the result of a c<strong>on</strong>fluence of recent advances in<br />

molecular biology, engineering, and the computati<strong>on</strong>al sciences. It can<br />

loosely be categorized into experimental and computati<strong>on</strong>al systems<br />

biology. Experimental high-throughput methods, assisted by robotics, image<br />

analysis, and bioinformatics, have been used in the drug industry for quite a<br />

while, and current screening tests for drug efficacy and toxicity regularly<br />

involve genomic, proteomic, and molecular modeling approaches. By<br />

c<strong>on</strong>trast, the role of computati<strong>on</strong>al methods of biological systems analysis is<br />

still emerging. This presentati<strong>on</strong> focuses <strong>on</strong> computati<strong>on</strong>al systems<br />

modeling and its increasingly important role at several junctures of the drug<br />

development pipeline. Examples to be discussed include mathematical<br />

models for receptor dynamics, pharmacokinetics, and metabolic and<br />

signaling pathway analysis. In the c<strong>on</strong>text of the latter, Biochemical Systems<br />

Theory is proposed as a highly advantageous default framework for model<br />

design, diagnostics, manipulati<strong>on</strong>, and system optimizati<strong>on</strong>. The<br />

development of dynamic models for complex disease processes permits the<br />

straightforward inclusi<strong>on</strong> of methods for custom-tailoring models, which is a<br />

key step toward pers<strong>on</strong>alized medicine and predictive health [1-5].<br />

References<br />

1. Voit EO: Computati<strong>on</strong>al Analysis of Biochemical Systems. A Practical Guide<br />

for Biochemists and Molecular Biologists Cambridge University Press,<br />

Cambridge, UK 2000, xii + 530.<br />

2. Voit EO: Metabolic modeling: A tool of drug discovery in the postgenomic<br />

era. Drug Discovery Today 2002, 7(11):621-628.<br />

3. Voit EO: The dawn of a new era of metabolic systems analysis. Drug<br />

Discovery Today BioSilico 2004, 2(5):182-189.<br />

4. Voit EO, Brigham KL: The role of systems biology in predictive health and<br />

pers<strong>on</strong>alized medicine. The Open Path. J 2008, 2:68-70.<br />

5. Voit EO: A systems-theoretical framework for health and disease. Math<br />

Biosc 2009, 217:11-18.<br />

O3<br />

Molecular bioactivity extrapolati<strong>on</strong> to novel targets by support vector<br />

machines<br />

Gerard JP Van Westen 1* ,JKWegner 2 , AP IJzerman 1 , HWT Van Vlijmen 2 , A Bender 1<br />

1 Amsterdam Center for Drug Research, Einsteinweg 55, <strong>23</strong>33 CC, Leiden, The<br />

Netherlands; 2 Tibotec, Gen De Wittelaan L 11B 3, 2800 Mechelen, Belgium<br />

E-mail: gerard@lacdr.leidenuniv.nl<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O3<br />

The early phases of drug discovery use in silico models to rati<strong>on</strong>alize<br />

structure activity relati<strong>on</strong>ships, and to predict the activity of novel<br />

Page 3 of 25<br />

compounds. However, the performance of these models is not always<br />

acceptable and the reliability of external predicti<strong>on</strong>s - both to novel<br />

compounds and to related protein targets - is often limited.<br />

Proteochemometric modeling [1] adds a target descripti<strong>on</strong>, based <strong>on</strong><br />

physicochemical properties of the binding site, to these models.<br />

Our proteochemometric models [2] are based <strong>on</strong> Scitegic circular<br />

fingerprints <strong>on</strong> the compound side and <strong>on</strong> a customized protein<br />

fingerprint <strong>on</strong> the target side. This protein fingerprint is based <strong>on</strong> a<br />

selecti<strong>on</strong> of physicochemical descriptors obtained from the AAindex<br />

database. Through PCA we selected a number of physicochemical<br />

properties which are hashed in a fingerprint using the Scitegic hashing<br />

algorithm. We compared this fingerprint to a number of protein<br />

descriptors previously published, including the Z-scales, the FASGAI and<br />

the BLOSUM descriptors. Our fingerprint performs superior to all of<br />

these. In additi<strong>on</strong>, we show that proteochemometric models improve<br />

external predicti<strong>on</strong> capabilities. In the case of classificati<strong>on</strong> this leads to<br />

models with a higher specificity when compared to c<strong>on</strong>venti<strong>on</strong>al QSAR.<br />

In the case of regressi<strong>on</strong> our models show an average lower RMSE of<br />

0.12 log units when based <strong>on</strong> a pIC50 output variable compared to<br />

c<strong>on</strong>venti<strong>on</strong>al QSAR modeling the same data-set. Furthermore, our<br />

models enable target extrapolati<strong>on</strong>. As a result we can predict the<br />

activity of known and new compounds <strong>on</strong> new targets while retaining<br />

the same model quality as when performing external validati<strong>on</strong> without<br />

target extrapolati<strong>on</strong>.<br />

References<br />

1. Freyhult E, Prusis P, Lapinsh M, Wikberg JE, Moult<strong>on</strong> V, Gustafss<strong>on</strong> MG:<br />

Unbiased descriptor and parameter selecti<strong>on</strong> c<strong>on</strong>firms the<br />

potential of proteochemometric modelling. BMC Bioinformatics 2005,<br />

6:50.<br />

2. Doddareddy MKR, Van Westen GJP, Horst Van der E, Peir<strong>on</strong>cely JE,<br />

Corthals F, IJzerman AP, Emmerich M, Jenkins JL, Bender A:<br />

Chemogenomics: Looking at Biology through the Lens of Chemistry. Stat<br />

Anal Data Mining 2009 in press.<br />

O4<br />

Representati<strong>on</strong> and searching of biomolecules<br />

Joeseph L Durant * , WL Chen, BD Christie, DL Grier, BA Leland, JG Nourse<br />

Symyx Technologies, 2440 Camino Ram<strong>on</strong>, San Ram<strong>on</strong>, California, USA<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O4<br />

Biomolecules present challenges to chemical informati<strong>on</strong> systems<br />

designed for small molecules. Their sizes, up to tens of thousands of<br />

atoms, overwhelm representati<strong>on</strong>/storage/searching soluti<strong>on</strong>s built <strong>on</strong><br />

explicit chemical representati<strong>on</strong> of the structures. But biomolecules are<br />

largely made up of many repeats of a limited number of buildingblock<br />

molecules, a fact which has been used to provide a compressed<br />

representati<strong>on</strong> for biomolecules using templates for the building<br />

blocks.<br />

We have adopted a modified template-based representati<strong>on</strong> for<br />

biomolecules. Our primary interest is in the chemically modified porti<strong>on</strong>s<br />

of biomolecules, for which we choose to use explicit chemistry. These<br />

areas of explicit chemistry are then embedded in the templatecompressed,<br />

unmodified porti<strong>on</strong>s of the full biomolecule.<br />

The regi<strong>on</strong>s c<strong>on</strong>taining explicit chemistry are indexed, and thus can be<br />

structure searched with good performance. A limited number of residues<br />

surrounding explicit chemistry regi<strong>on</strong>s are included in the index for<br />

searching the c<strong>on</strong>text of these explicit regi<strong>on</strong>s. By using explicit chemistry<br />

to represent modified regi<strong>on</strong>s we can search across classes of<br />

modificati<strong>on</strong>s for comm<strong>on</strong> features. For example a single substructure<br />

search query will find green fluorescent protein, and its histidine,<br />

phenylalanine and tryptophan analogs.<br />

Templates are stored with the structure providing a self-c<strong>on</strong>tained file<br />

format. The use of NEMA keys allows templates from different structures<br />

to be compared, and allows storage of structures c<strong>on</strong>taining a can<strong>on</strong>ical<br />

list of templates. The residues have defined attachment points, allowing<br />

automated traversal of a protein backb<strong>on</strong>e, or locati<strong>on</strong> of n<strong>on</strong>-backb<strong>on</strong>e<br />

b<strong>on</strong>ds to residues.<br />

We will present example structures and structural queries highlighting<br />

capabilities of our representati<strong>on</strong>.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

O5<br />

Cross-validati<strong>on</strong> is dead. L<strong>on</strong>g live cross-validati<strong>on</strong>! Model validati<strong>on</strong><br />

based <strong>on</strong> resampling<br />

Knut Baumann<br />

Institute of Pharmaceutical Chemistry, University of Technology<br />

Braunschweig, Beethovenstr. 55, D-38106 Braunschweig, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O5<br />

Cross-validati<strong>on</strong> was originally invented to estimate the predicti<strong>on</strong> error of<br />

a mathematical modelling procedure. It can be shown that crossvalidati<strong>on</strong><br />

estimates the predicti<strong>on</strong> error almost unbiasedly. N<strong>on</strong>etheless,<br />

there are numerous reports in the chemoinformatic literature that crossvalidated<br />

figures of merit cannot be trusted and that a so-called external<br />

test set has to be used to estimate the predicti<strong>on</strong> error of a mathematical<br />

model. In most cases where cross-validati<strong>on</strong> fails to estimate the<br />

predicti<strong>on</strong> error correctly, this can be traced back to the fact that it was<br />

employed as an objective functi<strong>on</strong> for model selecti<strong>on</strong>. Typically each<br />

model has some meta-parameters that need to be tuned such as the<br />

choice of the actual descriptors and the number of variables in a QSAR<br />

equati<strong>on</strong>, the network topology of a neural net, or the complexity of a<br />

decisi<strong>on</strong> tree. In this case the meta-parameter is varied and the crossvalidated<br />

predicti<strong>on</strong> error is determined for each setting. Finally, the<br />

parameter setting is chosen that optimizes the cross-validated predicti<strong>on</strong><br />

error in an attempt to optimize the predictivity of the model. However, in<br />

these cases cross-validati<strong>on</strong> is no l<strong>on</strong>ger an unbiased estimator of the<br />

predicti<strong>on</strong> error and may grossly deviate from the result of an external<br />

test set. It can be shown that the “amount” of model selecti<strong>on</strong> can<br />

directly be related to the inflati<strong>on</strong> of cross-validated figures of merit.<br />

Hence, the model selecti<strong>on</strong> step has to be separated from the step of<br />

estimating the predicti<strong>on</strong> error. If this is d<strong>on</strong>e correctly, cross-validati<strong>on</strong><br />

(or resampling in general) retains its property of unbiasedly estimating<br />

the predicti<strong>on</strong> error. Matter of factly, it can be shown that data splitting<br />

into a training set and an external test set often estimates the predicti<strong>on</strong><br />

error less precise than proper cross-validati<strong>on</strong>. It is this variabability of<br />

predicti<strong>on</strong> errors, which depends <strong>on</strong> test set size, that causes seemingly<br />

paradox phenomena such as the so-called “Kubinyi’s paradox<strong>on</strong>” for small<br />

data sets.<br />

O6<br />

A unified approach to the applicability domain problem of<br />

QSAR models<br />

Dragos Horvath * , Gilles Marcou, Alexandre Varnek<br />

Laboratoire d’InfoChimie, Université de Strasbourg - CNRS, Institut de Chimie,<br />

4 rue Blaise Pascal, 67000 Strasbourg, France<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O6<br />

The present work proposes a unified c<strong>on</strong>ceptual framework to describe<br />

and quantify the important issue of the Applicability Domains (AD) of<br />

Quantitative Structure-Activity Relati<strong>on</strong>ships (QSARs). AD models are<br />

c<strong>on</strong>ceived as meta-models designed to associate an untrustworthiness<br />

score to any molecule M subject to property predicti<strong>on</strong> by a QSAR<br />

model. Untrustworthiness scores or “AD metrics” are an expressi<strong>on</strong> of the<br />

relati<strong>on</strong>ship between M (represented by its descriptors in chemical space)<br />

and the space z<strong>on</strong>es populated by the training molecules at the basis of<br />

model μ. Scores integrating some of the classical AD criteria (similaritybased,<br />

box-based) were c<strong>on</strong>sidered in additi<strong>on</strong> to newly invented terms,<br />

such as the dissimilarity to outlier-free training sets and the correlati<strong>on</strong><br />

breakdown count.<br />

A loose correlati<strong>on</strong> is expected to exist between this untrustworthiness<br />

and the error affecting the predicted property. While high<br />

untrustworthiness does not preclude correct predicti<strong>on</strong>s, inaccurate<br />

predicti<strong>on</strong>s at low untrustworthiness must be imperatively avoided. This<br />

kind of relati<strong>on</strong>ship is characteristic for the Neighborhood Behavior (NB)<br />

problem: dissimilar molecule pairs may or may not display similar<br />

properties, but similar molecule pairs with different properties are<br />

explicitly “forbidden”. Therefore, statistical tools developed to tackle this<br />

latter aspect were applied, and lead to a unified AD metric<br />

benchmarking scheme.<br />

A first use of untrustworthiness scores resides in prioritizati<strong>on</strong> of<br />

predicti<strong>on</strong>s, without need to specify a hard AD border. Moreover, if a<br />

Page 4 of 25<br />

significant set of external compounds is available, the formalism allows<br />

optimal AD borderlines to be fitted. Eventually, c<strong>on</strong>sensus AD definiti<strong>on</strong>s<br />

were built by means of a n<strong>on</strong>parametric mixing scheme of two AD<br />

metrics of comparable quality, and shown to outperform their respective<br />

parents.<br />

O7<br />

Quantifying model errors using similarity to training data<br />

Rob D Brown 1* , JD H<strong>on</strong>eycutt 1 , SL Aar<strong>on</strong> 2<br />

1 2<br />

Accelrys Inc, 10188 Telesis Court, San Diego, CA 92121, USA; Accelrys Inc,<br />

Cambridge, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O7<br />

When making a predicti<strong>on</strong> with a statistical model, it is not sufficient to<br />

know that the model is “good”, in the sense that it is able to make<br />

accurate predicti<strong>on</strong>s <strong>on</strong> test data. Another relevant questi<strong>on</strong> is: How good<br />

is the model for a specific sample whose properties we wish to predict?<br />

Stated another way: Is the sample within or outside the model’s domain<br />

of applicability or what is the degree to which a test compound is within<br />

the model’s domain of applicability. Numerous studies have been d<strong>on</strong>e<br />

<strong>on</strong> determining appropriate measures to address this questi<strong>on</strong> [1-4]. Here<br />

we focus <strong>on</strong> a derivative questi<strong>on</strong>: Can we determine an applicability<br />

domain measure suitable for deriving quantitative error bars – that is,<br />

error bars which accurately reflect the expected error when making<br />

predicti<strong>on</strong>s for specified values of the domain measure? Such a measure<br />

could then be used to provide an indicati<strong>on</strong> of the c<strong>on</strong>fidence in a given<br />

predicti<strong>on</strong> (i.e. the likely error in a predicti<strong>on</strong> based <strong>on</strong> to what degree<br />

the test compound is part of the model’s domain of applicability). Ideally,<br />

we wish such a measure to be simple to calculate and to understand, to<br />

apply to models of all types – including classificati<strong>on</strong> and regressi<strong>on</strong><br />

models for both molecular and n<strong>on</strong>-molecular data - and to be free of<br />

adjustable parameters. C<strong>on</strong>sistent with recent work by others [5,6], the<br />

measures we have seen that best meet these criteria are distances to<br />

individual samples in the training data. We describe our attempts to<br />

c<strong>on</strong>struct a recipe for deriving quantitative error bars from these<br />

distances.<br />

References<br />

1. Erikss<strong>on</strong> L, Jaworska J, Worth AP, Cr<strong>on</strong>in MTD, McDowell RM, Gramatica P:<br />

Methods for Reliability and Uncertainty Assessment and for Applicability<br />

Evaluati<strong>on</strong>s of Classificati<strong>on</strong>- and Regressi<strong>on</strong>-Based QSARs. Envir<strong>on</strong>mental<br />

Health Perspectives 2003, 111:1361.<br />

2. Tropsha A, Gramatica P, Gombar VK: The Importance of Being Earnest:<br />

Validati<strong>on</strong> is the Absolute Essential for Successful Applicati<strong>on</strong> and<br />

Interpretati<strong>on</strong> of QSPR Models. QSAR Comb Sci 2003, 22:69.<br />

3. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: QSAR applicabilty domain<br />

estimati<strong>on</strong> by projecti<strong>on</strong> of the training set descriptor space: a review.<br />

Altern Lab Anim 2005, 33:445-59.<br />

4. Stanforth RW, Kolossov E, Mirkin B: A Measure of Domain of Applicability<br />

for QSAR Modelling Based <strong>on</strong> Intelligent K-Means Clustering. QSAR &<br />

Combinatorial Science 2007, 26:837.<br />

5. Sheridan RP, Feust<strong>on</strong> BP, Maiorov VN, Kearsley SK: Similarity to Molecules<br />

in the Training Set Is a Good Discriminator for Predicti<strong>on</strong> Accuracy in<br />

QSAR. J Chem Inf Comput Sci 2004, 44:1912.<br />

6. Horvath D, Marcou G, Varnek A: Predicting the Predictability: A Unified<br />

Approach to the Applicability Domain Problem of QSAR Models. J Chem<br />

Inf Comput Sci 2009, 49:49.<br />

O8<br />

A Branch-and-Bound approach for tautomer enumerati<strong>on</strong><br />

Torsten Thalheim * , R-U Ebert, R Kühne, G Schüürmann<br />

UFZ Department Ecological Chemistry - Helmholtz Centre for Envir<strong>on</strong>mental<br />

Research, Permoser Str. 15, 04318 Leipzig, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O8<br />

For molecules with mobile H atoms, the result of quantitative structureactivity<br />

relati<strong>on</strong>ships (QSARs) depends <strong>on</strong> the positi<strong>on</strong> of the respective<br />

hydrogen atoms. Thus to obtain reliable results, tautomerism needs to be<br />

taken into account.<br />

In the last years, many approaches were introduced to achieve this. In<br />

this work we present a further development of our previous algorithm


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

based <strong>on</strong> InChI-layers. While the InChI ansatz supports <strong>on</strong>ly heteroatomtautomerism,<br />

we suggest an extensi<strong>on</strong> regarding carb<strong>on</strong> atoms too.<br />

Whereas with other tautomer generating algorithms the hydrogen shifts<br />

are based <strong>on</strong> pattern-rules, we try to overcome the rule c<strong>on</strong>stricti<strong>on</strong> and<br />

evolve a more comm<strong>on</strong> soluti<strong>on</strong>. The advantage of our approach is quite<br />

simple. Due to the avoidance of a rule system with its necessity for<br />

excepti<strong>on</strong>s to the rules, we can apply our soluti<strong>on</strong> to any kind of<br />

tautomerism definiti<strong>on</strong>.<br />

We set up a Branch-and-Bound approach, which is optimized to generate<br />

a complete enumerati<strong>on</strong> of all tautomers, with regard to a certain<br />

definiti<strong>on</strong>, from any structure. With few and easy decisi<strong>on</strong>s like symmetry<br />

detecti<strong>on</strong>, we avoid a lot of calculati<strong>on</strong> overhead. Decisi<strong>on</strong>s with<br />

significant influence <strong>on</strong> the algorithm efficiency are made as early as<br />

possible.<br />

We have set up several kinds of tautomer definiti<strong>on</strong>s and derived a stable<br />

definiti<strong>on</strong> covering the major kinds of protropic tautomerism.<br />

Furthermore we analyzed, what expenditure of time for large databases<br />

(case study: more than 70,000 entries) is needed to investigate which<br />

structures have tautomers and which not for more than 99% of the<br />

database entries.<br />

This study has been financially supported by the EU project OSIRIS<br />

(IP, c<strong>on</strong>tract no. 037017).<br />

O9<br />

KnowledgeSpace - a publicly available virtual chemistry space<br />

Carsten Detering * , H Claussen, M Gastreich, C Lemmen<br />

BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O9<br />

Virtual high throughput screening of in-house compound collecti<strong>on</strong>s and<br />

vendor catalogs is a validated approach in the quest for novel molecular<br />

entities. However, these libraries are small compared to the overall<br />

synthesizable number of compounds from validated “wet” chemical<br />

reacti<strong>on</strong>s in pharma companies or the public domain.<br />

In order to overcome this limitati<strong>on</strong>, we designed a large virtual<br />

combinatorial chemistry space from publicly available combinatorial<br />

libraries that gives access to billi<strong>on</strong>s of synthetically accessible<br />

compounds. Together with FTrees, a fuzzy similarity calculator, the<br />

researcher has a means of searching this KnowledgeSpace for analogues<br />

to <strong>on</strong>e or several query molecules within a few minutes. The resulting<br />

compounds not <strong>on</strong>ly exhibit similar properties to the query molecule(s),<br />

but also feature an annotati<strong>on</strong> through which of the synthetic routes<br />

these molecules can be made. Results can be expected to be diverse,<br />

based <strong>on</strong> FTrees scaffold hopping capabilities, and provide ideas for hit<br />

follow-up into novel compound classes.<br />

In this c<strong>on</strong>tributi<strong>on</strong> we present the design and properties of the<br />

KnowledgeSpace and other in-house chemistry spaces that build <strong>on</strong> the<br />

same strategy as well as validati<strong>on</strong> of results and a number of successful<br />

applicati<strong>on</strong>s including prospective results.<br />

O10<br />

3D pharmacophore alignments: does improved geometric accuracy<br />

affect virtual screening performance?<br />

Gerhard Wolber 1* , T Seidel 2 , F Bendix 2<br />

1 University of Innsbruck, Institute of Pharmacy, Dept. Pharmaceutical<br />

Chemistry, Innrain 52, 6020 Innsbruck, Austria; 2 Inte:Ligand GmbH,<br />

Mariahilferstrasse 74B/11, A-1070 Vienna, Austria<br />

E-mail: gerhard.wolber@uibk.ac.at<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O10<br />

Virtual screening using three-dimensi<strong>on</strong>al arrangements of chemical<br />

features (3D pharmacophores) has become an important method in<br />

computer-aided drug design. Although frequently used, c<strong>on</strong>siderable<br />

differences exist in the interpretati<strong>on</strong> of these chemical features and their<br />

corresp<strong>on</strong>ding 3D overlay algorithms. We have recently developed an<br />

efficient and accurate 3D alignment algorithm based <strong>on</strong> a pattern<br />

recogniti<strong>on</strong> technique [1]. In the presented work, we extend this<br />

algorithm to be used for high-performance virtual database screening<br />

and investigate, whether applying this geometrically more accurate 3D<br />

Page 5 of 25<br />

alignment algorithm improves virtual screening results over c<strong>on</strong>venti<strong>on</strong>al<br />

incremental n-point distance matching approaches.<br />

Reference<br />

1. Wolber G, Dornhofer A, Langer T: Efficient overlay of small molecules<br />

using 3-D pharmacophores. J Comput-Aided Mol Design 2006,<br />

20(12):773-788.<br />

O11<br />

The protein flexibility in receptor-ligand docking simulati<strong>on</strong>s<br />

Frank Tristram<br />

Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermannv<strong>on</strong>-Helmholtz-Platz<br />

1, 76344 Eggenstein-Leopoldshafen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O11<br />

Many small-molecule drugs work by binding specifically to a target<br />

protein in the cell. It is known for over a century that both the ligand<br />

and protein receptor change their c<strong>on</strong>formati<strong>on</strong> in the associati<strong>on</strong><br />

process, which is called the induced-fit effect. Ligand c<strong>on</strong>formati<strong>on</strong>al<br />

change is routinely treated in methods for in-silico drug discovery,<br />

because typical drug molecules have seldom more than 100 atoms. The<br />

flexibility of proteins, which often have more than 10000 atoms, is much<br />

harder to treat and was therefore neglected in most high-speed docking<br />

methods, limiting their accuracy and predictive value.<br />

In this project we have developed an efficient numerical search<br />

procedure that succeeds to model the interacti<strong>on</strong> between a flexible<br />

ligand and important flexible parts of the protein in a computati<strong>on</strong>ally<br />

affordable protocol. Our virtual screening software FlexScreen thus<br />

minimizes the total interacti<strong>on</strong> energy of the emerging protein-ligand<br />

complex including flexible backb<strong>on</strong>e regi<strong>on</strong>s. We have dem<strong>on</strong>strated the<br />

reliability and accuracy of this approach <strong>on</strong> several examples, for which at<br />

least two different receptor-c<strong>on</strong>formati<strong>on</strong>s have been experimentally<br />

observed. We succeeded to predict the correct binding pose of a proteinligand<br />

complex starting from the crystal structure of an unrelated protein<br />

c<strong>on</strong>formati<strong>on</strong>, where large c<strong>on</strong>formati<strong>on</strong>al changes of the protein are<br />

necessary to bind the ligand. Correctly predicting such c<strong>on</strong>formati<strong>on</strong>al<br />

changes makes our approach attractive for virtual screening of medium<br />

sized databases, in particular for kinases and other target receptors which<br />

have flexible binding pockets.<br />

O12<br />

Random molecular substructures as fragment-type descriptors<br />

José Batista 1,2<br />

1 Jado Technologies GmbH, Tatzberg 47-51, D-01307 Dresden, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />

2 Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical<br />

Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universtität<br />

B<strong>on</strong>n, Dahlmannstr. 2, D-53113 B<strong>on</strong>n, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O12<br />

A novel approach for analysis of structure-activity relati<strong>on</strong>ships for sets of<br />

active compounds is reported. Molecular similarity relati<strong>on</strong>ships are<br />

analyzed am<strong>on</strong>g compounds with related biological activity based <strong>on</strong> the<br />

evaluati<strong>on</strong> of randomly generated fragment populati<strong>on</strong>s. Fragments are<br />

randomly generated by iterative b<strong>on</strong>d deleti<strong>on</strong> in molecular graphs and<br />

hence depart from those obtained by well-defined fragmentati<strong>on</strong><br />

schemes [1]. A major finding has been that for any given activity class,<br />

dependency relati<strong>on</strong>ships of fragment co-occurrence exists within<br />

random fragment populati<strong>on</strong>s. Through the analysis of these relati<strong>on</strong>ships<br />

the chemical informati<strong>on</strong> c<strong>on</strong>tent of generated substructures can be<br />

quantified [2]. This has lead to the identificati<strong>on</strong> of small numbers (10-<br />

40) of so-called activity class characteristic substructures (ACCS) that have<br />

been successfully utilized in virtual screening applicati<strong>on</strong>s [3,4].<br />

References<br />

1. Batista J, Godden JW, Bajorath J: J Chem Inf Model 2006, 46(5):1937-1944,<br />

(PMID: 16995724).<br />

2. Batista J, Bajorath J: J Chem Inf Model 2007, 47(4):1405-1413, (PMID:<br />

17585755).<br />

3. Batista J, Bajorath J: Chem Med Chem 2008, 3(1):67-73, (PMID: 17952883).<br />

4. Stumpfe D, Frizler M, Sisay MT, Batista J, Vogt I, Gütschow M, Bajorath J:<br />

Chem Med Chem 2009, 4(1):52-54, (PMID: 19053132).


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

O13<br />

Making people’s lives easier, better and more beautiful - and how<br />

computer-aided materials design may help<br />

Thomas Kostka<br />

Henkel AG & KGaA, Henkelstrasse 67, 40191 Düsseldorf, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O13<br />

Computer-based simulati<strong>on</strong>s provide new insights at the molecular scale<br />

of chemical processes and material properties - new products may be<br />

developed more ec<strong>on</strong>omically by a more focused strategy. Thus, the<br />

overall product development cycle for new materials and active<br />

ingredients can be reduced significantly.<br />

R&D explores and evaluates c<strong>on</strong>tinuously these possibilities and utilizes<br />

the enormous potential linked with scientific computing technologies.<br />

The method portfolio ranges from quantum chemical, molecular<br />

mechanics and mesoscopic simulati<strong>on</strong>s to QSAR approaches.<br />

Examples will be given how scientific computing methods c<strong>on</strong>tribute to<br />

innovative products with state-of-the-art molecular modeling and<br />

mathematical methods - both, for c<strong>on</strong>sumer goods and industrial<br />

applicati<strong>on</strong>s.<br />

O14<br />

Biomaterialcoatings - a challenging task studied by the molecular<br />

fragment dynamics<br />

Carsten Wittekindt * , H Kuhn<br />

CAM-D Technologies GmbH, Schützenbahn 70, 45117 Essen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

E-mail: wittekindt@molecular-dynamics.de<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O14<br />

A special and important task in medicinal implant technology is to<br />

preventthematerialfromfoulingormakeitcompatiblefortissuesby<br />

coating the material with functi<strong>on</strong>alized lipid biomolecules. In order to<br />

rati<strong>on</strong>alize the design of such new materials structural informati<strong>on</strong> about<br />

the film is needed. In this presentati<strong>on</strong> we give an account <strong>on</strong> the<br />

investigati<strong>on</strong> of coating of the tetraetherlipid GDNT <strong>on</strong> a borofluorate<br />

surface in different solvents. To gain insights into the dynamic process <strong>on</strong><br />

the microsec<strong>on</strong>d and micrometer scale we applied our recently<br />

developed Molecular Fragment Dynamics method (MFD) [1].<br />

Aside of the size of the system and the durati<strong>on</strong> of the simulati<strong>on</strong>, the<br />

challenging task of the investigati<strong>on</strong> lies in the divers chemical<br />

functi<strong>on</strong>ality of the compounds. The applied MFD algorithm is developed<br />

to resolve especially these problems. The MFD algorithm is based <strong>on</strong> the<br />

Dissipative-Particle-Dynamics method (DPD). Whereas in the DPD method<br />

the system is divided into different regi<strong>on</strong>s of fluid subsystems, the MFD<br />

algorithm calculates the interacti<strong>on</strong> of molecular fragments.<br />

C<strong>on</strong>sequently, the MFD method is well suited for the molecular<br />

modelling simulati<strong>on</strong> of systems having a diverse variety of chemical<br />

functi<strong>on</strong>alities thus allowing for investigati<strong>on</strong> of biocoatings, polymers<br />

and tensids. Figure 1.<br />

Reference<br />

1. Ryjkina E, Kuhn H, Rehage H, Müller F, Peggau J: Molecular Dynamic<br />

Computer Simulati<strong>on</strong>s of Phase Behavior of N<strong>on</strong>-I<strong>on</strong>ic Surfactants.<br />

Angew Chem Int Ed 2002, 41:983.<br />

Figure 1 (abstract O14).<br />

O15<br />

Mechanistic studies <strong>on</strong> the ring-opening polymerisati<strong>on</strong> of D,L-lactide<br />

with zinc guanidine complexes<br />

S Herres-Pawlis 1* , J Börner 2 , Vieira I dos Santos 2 , U Flörke 2<br />

1 2<br />

TU Dortmund, Fakultät Chemie, 44221 Dortmund, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; University of<br />

Paderborn, Paderborn, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O15<br />

The increasing shortage of resources has enforced the search for<br />

producti<strong>on</strong> techniques for biodegradable polymers made of renewable<br />

raw materials. An example is polylactide (PLA), an aliphatic polyester,<br />

which can be obtained by the metal-catalysed ring-opening<br />

polymerisati<strong>on</strong> (ROP) of lactide (Fig. 1). The m<strong>on</strong>omer for PLA producti<strong>on</strong><br />

is available from corn or sugar beets by a bacterial fermentati<strong>on</strong> process<br />

in few steps [1] The PLA can be either recycled or composted after use,<br />

making the use CO 2-emissi<strong>on</strong>-neutral. Most large-scale processes are<br />

based <strong>on</strong> the use of tin compounds as initiators [2]. However, for use in<br />

food packaging or similar applicati<strong>on</strong>s, heavy metals are undesirable [3].<br />

In order to substitute heavy-metal-based catalyst systems, especially zinc<br />

guanidine complexes are suited for the polymerisati<strong>on</strong> due to their<br />

polymerisati<strong>on</strong> activity and n<strong>on</strong>-toxicity [4]. The investigati<strong>on</strong>s <strong>on</strong> the<br />

catalytic potential in the bulk polymerisati<strong>on</strong> of lactide have shown that<br />

these compounds are able to act as initiators for lactide polymerisati<strong>on</strong><br />

with <strong>on</strong>ly few excepti<strong>on</strong>s. Polylactides with molecular weights (MW)<br />

between 20.000 and 170.000 g/mol could be obtained.<br />

As the polymerisati<strong>on</strong> activity depends str<strong>on</strong>gly <strong>on</strong> the ligand structure<br />

(spacer, guanidine and amine unit), DFT studies <strong>on</strong> the complex<br />

properties have been performed [5]. A correlati<strong>on</strong> between calculated<br />

partial charges <strong>on</strong> the zinc and the guanidine N atoms and the catalytic<br />

activity has been found. The guanidine is supposed to open<br />

nucleophilically the lactide ring and supersede the presence of alkoxides<br />

which is traditi<strong>on</strong>ally required. This crucial step in the ROP has been<br />

analysed by DFT and a mecha-nism will be presented. The deeper<br />

understanding of the catalytic transiti<strong>on</strong> state will enable a more efficient<br />

access to catalyst design. Figure 2.<br />

Figure 1 (abstract O15). ROP of D,L-lactide<br />

Figure 2 (abstract O15). General motif for hybridquanidines<br />

Page 6 of 25


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

References<br />

1. Bigg DM: Adv Polym Tech 2005, 24:69.<br />

2. Kricheldorf HR, Meier-Haak J: Macromol Chem 1993, 194:715.<br />

3. Kricheldorf HR: Chemosphere 2001, 43:49.<br />

4. Börner J, Herres-Pawlis S, Flörke U, Huber K: Eur J Inorg Chem 2007, 5645.<br />

5. Börner J, Flörke U, Huber K, Döring A, Kuckling D, Herres-Pawlis S: Chem Eur<br />

J 2009, 15:<strong>23</strong>62.<br />

O16<br />

ChemSpider - building a foundati<strong>on</strong> for the semantic web by hosting<br />

a crowd sourced databasing platform for chemistry<br />

Anth<strong>on</strong>y J Williams 1* , Valery Tkachenko 2 , Sergey Golotvin 2 , Richard Kidd 3 ,<br />

Graham McCann 3<br />

1<br />

Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC-27587, USA;<br />

2 3<br />

NCBI, NLM, Nati<strong>on</strong>al Institutes of Health, Washingt<strong>on</strong>, USA; Royal Society of<br />

Chemistry, Cambridge, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O16<br />

There is an increasing availability of free and open access resources for<br />

chemists to use <strong>on</strong> the internet. Coupled with the increasing availability<br />

of Open Source software tools we are in the middle of a revoluti<strong>on</strong> in<br />

data availability and tools to manipulate these data. ChemSpider is a free<br />

access website for chemists built with the intenti<strong>on</strong> of providing a<br />

structure centric community for chemists. It was developed with the<br />

intenti<strong>on</strong> of aggregating and indexing available sources of chemical<br />

structures and their associated informati<strong>on</strong> into a single searchable<br />

repository and making it available to everybody, at no charge.<br />

There are tens if not hundreds of chemical structure databases such as<br />

literature data, chemical vendor catalogs, molecular properties,<br />

envir<strong>on</strong>mental data, toxicity data, analytical data etc. and no single way<br />

to search across them. Despite the fact that there were a large number of<br />

databases c<strong>on</strong>taining chemical compounds and data available <strong>on</strong>line their<br />

inherent quality, accuracy and completeness was lacking in many regards.<br />

The intenti<strong>on</strong> with ChemSpider was to provide a platform whereby the<br />

chemistry community could c<strong>on</strong>tribute to cleaning up the data,<br />

improving the quality of data <strong>on</strong>line and expanding the informati<strong>on</strong><br />

available to include data such as reacti<strong>on</strong> syntheses, analytical data,<br />

experimental properties and linking to other valuable resources. It has<br />

grown into a resource c<strong>on</strong>taining over 21 milli<strong>on</strong> unique chemical<br />

structures from over 200 data sources.<br />

ChemSpider has enabled real time curati<strong>on</strong> of the data, associati<strong>on</strong> of<br />

analytical data with chemical structures, real-time depositi<strong>on</strong> of single<br />

or batch chemical structures (including with activity data) and<br />

transacti<strong>on</strong>-based predicti<strong>on</strong>s of physicochemical data. The social<br />

community aspects of the system dem<strong>on</strong>strate the potential of this<br />

approach. Curati<strong>on</strong> of the data c<strong>on</strong>tinues daily and thousands of edits<br />

and depositi<strong>on</strong>s by members of the community have dramatically<br />

improved the quality of the data relative to other public resources for<br />

chemistry.<br />

This presentati<strong>on</strong> will provide an overview of the history of ChemSpider,<br />

the present capabilities of the platform and how it can become <strong>on</strong>e of<br />

the primary foundati<strong>on</strong>s of the semantic web for chemistry. It will also<br />

discuss some of the present projects underway since the acquisiti<strong>on</strong> of<br />

ChemSpider by the Royal Society of Chemistry.<br />

O17<br />

Integrati<strong>on</strong> of chemical informati<strong>on</strong> with protein sequences and<br />

3D structures<br />

Adel Golovin * , K Henrick, G Kleywegt<br />

EMBL-EBI/PDBe, Hinxt<strong>on</strong> Hall, Genome Campus, Cambridge, Cams,<br />

CB10 1SD, UK<br />

E-mail: golovin@ebi.ac.uk<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O17<br />

The Protein Data Bank (PDB) c<strong>on</strong>tains a wealth of small molecule - macro<br />

molecule complexes the study of which c<strong>on</strong>tribute enormously to our<br />

understanding of the interacti<strong>on</strong>s. However, exploiting and mining this<br />

treasure trove of data requires advanced analysis and retrieval methods<br />

that take into account both types of molecules. One such method is<br />

PDBeMotif, that has been developed by the Protein Data Bank in Europe<br />

Page 7 of 25<br />

(PDBe) at EMBL-EBI. Utilizing a relati<strong>on</strong>al database model at the back-end,<br />

the data structure represents a network of molecule, residue and motif<br />

interacti<strong>on</strong>s as well as their relative positi<strong>on</strong>s in the sequence and in 3D.<br />

The loader applies a number of algorithms to analyse PDB and derive<br />

necessary informati<strong>on</strong>, such as planarity and aromaticity of the chemical<br />

compounds, hydrogen-b<strong>on</strong>ds network, coordinati<strong>on</strong> geometry, b<strong>on</strong>d<br />

types (including pi electr<strong>on</strong> interacti<strong>on</strong>s), 3D structural motifs, sequence<br />

domains and families. It collects informati<strong>on</strong> about sequence features,<br />

motifs and catalytic sites from available Distributed Annotati<strong>on</strong> System<br />

(DAS) resources. The web applicati<strong>on</strong> allows for a wide variety of searches<br />

and data analysis including protein motifs with chemical fragments<br />

associati<strong>on</strong>, protein sites characterisati<strong>on</strong>, correlating properties, hits<br />

multiple sequence and 3D alignments. The whole system is released<br />

under GPL and available with the source code from http://sourceforge.<br />

net/projects/pdbsam and <strong>on</strong> line at http://www.ebi.ac.uk/pdbe-site/<br />

PDBeMotif/<br />

References<br />

1. Golovin A, Henrick K: Chemical Substructure Search in SQL. J Chem Inf<br />

Model 2009, 49(1):22-27.<br />

2. Golovin A, Henrick K: MSDmotif: exploring protein sites and motifs.<br />

BMC Bioinformatics 2008, 9:312.<br />

3. Golovin A, Dimitropoulos D, Oldfield T, Rachedi A, Henrick K: MSDsite: A<br />

Database Search and Retrieval System for the Analysis and Viewing of<br />

Bound Ligands and Active Sites. PROTEINS: Structure, Functi<strong>on</strong>, and<br />

Bioinformatics 2005, 58(1):190-9.<br />

4. Golovin A, Oldfield TJ, Tate JG, Velankar S, Bart<strong>on</strong> GJ, Boutselakis H,<br />

Dimitropoulos D, Fill<strong>on</strong> J, Hussain A, I<strong>on</strong>ides JMC, John M, Keller PA,<br />

Krissinel E, McNeil P, Naim A, Newman R, Paj<strong>on</strong> A, Pineda J, Rachedi A,<br />

Copeland J, Sitnov A, Sobhany S, Suarez-Uruena A, Swaminathan J,<br />

Tagari M, Tromm S, Vranken W, Henrick K: E-MSD: an integrated data<br />

resource for bioinformatics. Nucleic Acids Research 2004, 32 Database:<br />

D211-D216.<br />

O18<br />

Pers<strong>on</strong>alized informati<strong>on</strong> spaces: improved access to<br />

chemical digital libraries<br />

O Koepler * , W-T Balke, B Köhncke, I Sens, S Tönnies<br />

Technische Informati<strong>on</strong>sbibliothek Hannover (TIB), Welfengarten,1B, 30167<br />

Hannover, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O18<br />

Today digital libraries provide access to a vast, but largely unstructured,<br />

amount of document collecti<strong>on</strong>s. Facing the ever increasing challenge of<br />

the informati<strong>on</strong> overload c<strong>on</strong>tent providers have to focus <strong>on</strong> new ways in<br />

user-centered retrieval, not <strong>on</strong>ly providing tools for searching informati<strong>on</strong>,<br />

but tools for pers<strong>on</strong>alizing, managing, evaluating and working with the<br />

returned search results.<br />

Within the ViFaChem II research project the TIB and the L3S Research<br />

Center, Hannover, have developed an enhanced retrieval platform for<br />

chemical digital libraries.<br />

The prototypical interface processes full text and bibliographic metadata<br />

collecti<strong>on</strong>s to create semantically enriched document collecti<strong>on</strong>s with<br />

chemical metadata. Processing steps include text mining for chemical<br />

entities, reacti<strong>on</strong> types and chemical structure rec<strong>on</strong>structi<strong>on</strong>. However,<br />

the ViFaChem digital library does not <strong>on</strong>ly offer the classical document<br />

access via a text or chemical structure search, but also semantic search<br />

based <strong>on</strong> the faceted browsing paradigm. In particular it includes the<br />

Semantic GrowBag algorithm, which automatically creates facets for<br />

navigati<strong>on</strong>al access and query refinement using the relati<strong>on</strong>ship between<br />

documents and author keywords. Based <strong>on</strong> higher-order co-occurrences<br />

of keywords in documents these graphs hierarchically arrange all related<br />

topics dynamically with respect to the underlying c<strong>on</strong>tent collecti<strong>on</strong>.<br />

Having different pers<strong>on</strong>al views <strong>on</strong> the document collecti<strong>on</strong> the user now<br />

can search and navigate through search results using individual retrieval<br />

strategies. Facets for chemical reacti<strong>on</strong>s, chemical entities or topics<br />

combined with an underlying <strong>on</strong>tology thus allow a semantic driven<br />

access to documents.<br />

Combined with Web 2.0 features the interface of ViFaChem II provides<br />

the user with a new experience in searching large document collecti<strong>on</strong>s,<br />

where the user can navigate through query results based <strong>on</strong> his<br />

pers<strong>on</strong>alized knowledge space.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

O19<br />

Current aspects and future trends of computer-aided rescaffolding<br />

Karl-Heinz Baringhaus * , Gerhard Hessler, Thomas Klabunde<br />

Sanofi-Aventis Deutschland GmbH, CAS Drug Design, Building G 878, 65926<br />

Frankfurt am Main, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O19<br />

The competitive pressure in pharmaceutical industry is reflected by l<strong>on</strong>ger<br />

development times and increasing development costs yielding less new<br />

chemical entities. In particular, identificati<strong>on</strong> of suitable lead compounds<br />

is <strong>on</strong>e of the key challenges in early drug discovery. Next to well<br />

established techniques like high throughput screening (HTS) and virtual<br />

screening computer-aided rescaffolding has become an important<br />

approach in detecting novel chemical structures [1,2].<br />

The development of New Chemical Entities (NCEs) requires the<br />

explorati<strong>on</strong> of novel landscapes in chemical space. Rescaffolding is in<br />

particular important in lead identificati<strong>on</strong> but also in lead optimizati<strong>on</strong>. If<br />

a lead series cannot be optimized in multidimensi<strong>on</strong>al space, scaffoldrescuing<br />

is often perceived as back-up strategy to transfer hitherto<br />

available SAR into a new scaffold.<br />

With a binding site or a pharmacophore in hand often de novo design<br />

methods are applied to identify novel chemical matter. However, de novo<br />

design differs from rescaffolding in that the goal of the first is primarily to<br />

generate new molecules in chemical space while the latter aims to design<br />

new scaffolds under c<strong>on</strong>straints: high similarity in the desired property<br />

space and novelty in scaffold space. This allows jumping out of the known<br />

regi<strong>on</strong> of chemical space towards new regi<strong>on</strong>s of chemistry resulting in<br />

molecules with similar properties and activities but with novel frameworks.<br />

This talk will focus <strong>on</strong> ligand-based rescaffolding by taking into account<br />

<strong>on</strong>ly the topology of reference molecules. Several descriptors have been<br />

successfully applied for rescaffolding purposes, e.g. CATS (including<br />

CATS3D), Feature Trees (cf. ReCore), Topomers, Gaussian shape (cf. ROCS,<br />

BROOD) and others. In fragment-based rescaffolding new molecules are<br />

assembled from small (sub-) molecular fragments or building blocks often<br />

by applying pseudo-chemical rules in their recombinati<strong>on</strong> (e.g., RECAP).<br />

The use of fragments reduces the sampling rate of chemical space and<br />

provides access to synthesizable new compounds. This will be<br />

exemplified by two rescaffolding examples taking into account 2D as well<br />

as 3D descriptors.<br />

References<br />

1. Krueger BA, Dietrich A, Baringhaus K-H, Schneider G: Scaffold-Hopping<br />

Potential of Fragment-Based De Novo Design: The Chances and Limits of<br />

Variati<strong>on</strong>. Combinatorial Chemistry & High Throughput Screening 2009,<br />

12(4):383-396.<br />

2. Mauser H, Guba W: Recent developments in de novo design and scaffold<br />

hopping. Current Opini<strong>on</strong> in Drug Discovery & Development 2008,<br />

11(3):365-374.<br />

O20<br />

The membrane bound aromatic p-hydroxybenzoic acid<br />

oligoprenyltransferase (UbiA) - how iterative improvements lead to a<br />

realistic structure that offers new insights into functi<strong>on</strong>al aspects of<br />

prenyl transferases and terpene synthases<br />

W Brandt * , J Kufka, D Schulze, E Schulze, F Rausch, L Wessjohann<br />

Leibniz Institute of Plant Biochemistry, Department of Bioorganic Chemistry,<br />

Weinberg 3, D-06120 Halle (Saale), <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

E-mail: wbrandt@ipb-halle.de<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O20<br />

Prenyltransfering enzymes are at the basis of the vast isoprenoid natural<br />

product diversity. 4-Hydroxybenzoate oligoprenyltransferase of E. coli,<br />

encoded in the gene ubiA, is a key enzyme in the biosynthetic pathway<br />

to ubiquin<strong>on</strong>e. No X-ray structure exists of this membrane protein. It<br />

catalyses the prenylati<strong>on</strong> of 4-hydroxybenzoic acid in the 3-positi<strong>on</strong> using<br />

an oligoprenyl diphosphate as sec<strong>on</strong>d substrate.<br />

With homology modeling techniques, a first model indicated two putative<br />

active sites[1]. Based <strong>on</strong> this model, amino acids identified as important<br />

for the catalytic mechanism were selectively replaced to obtain five new<br />

mutants. All mutants were tested for their ability to form geranylated<br />

hydroxybenzoate from geranyl diphosphate, but <strong>on</strong>ly the unmodified<br />

Page 8 of 25<br />

UbiA-enzyme and to minor extent <strong>on</strong>e mutant showed enzymatic activity.<br />

These results indicated the involvement of all mutated amino acids in the<br />

catalytic mechanism but at the same time demanded a remodelling of<br />

the previously proposed enzyme structure combining two active sites.<br />

They have been placed into close proximity in the new model. Based <strong>on</strong><br />

these experimental results and structural classificati<strong>on</strong> of prenyl enzymes,<br />

a highly relevant 3D-model could be developed. This model is able to<br />

explain a wide range of substrate specificities and is in complete<br />

agreement with the results of site directed mutagenesis [2].<br />

Aromatic prenyl transferases, prenyl diphosphate synthases, and terpene<br />

synthases, activate an (olig)prenyl diphosphate to form a stabilized prenyl<br />

cati<strong>on</strong> reactive intermediate that, after additi<strong>on</strong> to a nucleophile (C=C<br />

b<strong>on</strong>d) and deprot<strong>on</strong>ati<strong>on</strong> delivers the product(s). Aromatic amino acids<br />

have been suggested to stabilize the cati<strong>on</strong> intermediate. Ab inito LMP2<br />

calculati<strong>on</strong>s indicate not <strong>on</strong>ly stabilizati<strong>on</strong> of a prenyl cati<strong>on</strong> by aromatic<br />

amino acid side chains but also by a methi<strong>on</strong>ine side chain. This<br />

suggesti<strong>on</strong> is supported by site directed mutagenesis, bioinformatics, and<br />

modelling studies. In additi<strong>on</strong>, a new catalytic diad composed of Tyr and<br />

Asp, represented by a Yx(x)xxD-motif, is identified as important player for<br />

deprot<strong>on</strong>ati<strong>on</strong> and prot<strong>on</strong>-relay in intermediates, or for the finalizing<br />

deprot<strong>on</strong>ati<strong>on</strong> step of many prenyl transferring and cyclizing enzymes.<br />

References<br />

1. Brauer L, Brandt W, Wessjohann L: J Mol Model 2004, 10:317-327.<br />

2. Brauer L, Brandt W, Schulze D, Zakharova S, Wessjohann L: Chembiochem<br />

2008, 9:982-992.<br />

O21<br />

CELLmicrocosmos 2.2: advancements and applicati<strong>on</strong>s in modeling of<br />

three-dimensi<strong>on</strong>al PDB membranes<br />

Björn Sommer 1* , T Dingersen 1 , S Schneider 2 , S Rubert 1 , C Gamroth 1<br />

1<br />

University of Bielefeld, Universitätsstraße 25, 33615 Bielefeld, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />

2<br />

D-85748 Garching, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O21<br />

Background: Today, <strong>on</strong>ly a few programs support membrane<br />

computati<strong>on</strong> and/or modeling in 3D. They enable the user to create very<br />

simple-structured membrane layers and usually assume a high level of<br />

bio-chemical/-physical knowledge. The CELLmicrocosmos 2 project<br />

developed a tool providing a simplified workflow to create membrane<br />

(bi-)layers: The MembraneEditor (CmME).<br />

Results: The geometry-based, scalable and modular computati<strong>on</strong> c<strong>on</strong>cept<br />

supports fast to more complex membrane generati<strong>on</strong>s. CmME is based <strong>on</strong><br />

the integrati<strong>on</strong> of two different types of PDB [1] models: Lipids are<br />

integrated with editable percental distributi<strong>on</strong> values and algorithms.<br />

Proteins are inserted and aligned into the bilayer manually or automatically,<br />

by using data from the PDB_TM [2] or OPM [3] database. Compatibility with<br />

other programs is offered by extensive PDB format export settings. High<br />

lipid densities are possible through advanced packing algorithms. Lipid<br />

distributi<strong>on</strong>s can be developed by using the Plugin-Interface. Originally not<br />

intended to change the atomic structure of the molecules due performance<br />

issues, now it is also possible to access the atomic level for user-defined<br />

computati<strong>on</strong>s. Multiple membrane (bi-)layers and microdomains are<br />

supported as well as a reengineering functi<strong>on</strong> providing the re-editing of<br />

externally simulated PDB membranes.<br />

C<strong>on</strong>clusi<strong>on</strong>s: The capabilities of CmME has been extended and tested to<br />

meet the requirements of different PDB visualizati<strong>on</strong> programs as well<br />

as molecular dynamics (MD) simulati<strong>on</strong> envir<strong>on</strong>ments like Gromacs [4].<br />

The documentati<strong>on</strong> and the Java Webstart applicati<strong>on</strong>, requiring <strong>on</strong>ly<br />

an internet c<strong>on</strong>necti<strong>on</strong> and Java 6, is accessible at: http://Cm2.<br />

CELLmicrocosmos.org<br />

References<br />

1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H,<br />

Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research<br />

Oxford University Press, Oxford 2000, 28:<strong>23</strong>5-242.<br />

2. Tusnády GE, Dosztányi Z, Sim<strong>on</strong> I: Transmembrane proteins in the Protein<br />

Data Bank: identificati<strong>on</strong> and classificati<strong>on</strong>. Bioinformatics 2004,<br />

20(17):2964-2972.<br />

3. Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI: OPM: Orientati<strong>on</strong>s of<br />

Proteins in Membranes database. Bioinformatics 2006, 22:6<strong>23</strong>-625.<br />

4. Lindahl E, Hess B, Spoel van der D: GROMACS 3.0: A package for<br />

molecular simulati<strong>on</strong> and trajectory analysis. J Mol Mod 2001, 7:306-317.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

O22<br />

Approaching the limits: scoring functi<strong>on</strong>s for affinity predicti<strong>on</strong><br />

Christoph Sotriffer<br />

Institut für Pharmazie und Lebensmittelchemie, Universität Würzburg,<br />

Am Hubland, D - 97074 Würzburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O22<br />

One of the main tasks of computati<strong>on</strong>al methods in ligand design and<br />

lead identificati<strong>on</strong> is the elucidati<strong>on</strong> and assessment of interacti<strong>on</strong> modes<br />

between small-molecule ligands and protein target structures. This<br />

generally requires to estimate the relative or absolute affinity of a<br />

protein-ligand complex from its three-dimensi<strong>on</strong>al coordinates. Although<br />

statistical thermodynamics would in principle provide the necessary<br />

equati<strong>on</strong>s to calculate free energies of binding from molecular properties,<br />

these equati<strong>on</strong>s are not readily amenable to computati<strong>on</strong> since<br />

appropriate ensembles of the solvated systems must be generated and<br />

thoroughly sampled, which normally requires prohibitively l<strong>on</strong>g<br />

computing times. For the purpose of drug design, and virtual screening<br />

in particular, simpler and faster methods are needed, which are<br />

comm<strong>on</strong>ly referred to as scoring functi<strong>on</strong>s.<br />

In general, three major classes of scoring functi<strong>on</strong>s can be<br />

distinguished: force-field based methods, knowledge-based potentials,<br />

and empirical scoring functi<strong>on</strong>s. For any of these classes, many different<br />

functi<strong>on</strong>s have been developed over the past years. Although largescale<br />

and truly unbiased comparative assessments of their performance<br />

are relatively rare due to inherent difficulties in setting up appropriate<br />

test sets, the strengths and limitati<strong>on</strong>s of current scoring functi<strong>on</strong>s are<br />

fairly evident from the available data. In general, good results can be<br />

obtained in the identificati<strong>on</strong> or reproducti<strong>on</strong> of experimentally<br />

observed binding modes. The ability to distinguish active ligands from<br />

decoys appears, at least, to be sufficient to make virtual screening a<br />

practically useful endeavour. However, the correlati<strong>on</strong> with the<br />

experimental binding free energy and the possibility to quantitate the<br />

effects of small structural changes <strong>on</strong> the ligand affinity are in many<br />

cases still disappointing.<br />

Based <strong>on</strong> the approximati<strong>on</strong>s used by current scoring functi<strong>on</strong>s, strategies<br />

for improvement can be defined. On the other hand, there are some<br />

fundamental problems, also with respect to the underlying experimental<br />

data, which suggest that the best functi<strong>on</strong>s presently available may<br />

already be close to the limit of what can be achieved with empirical<br />

approaches.<br />

O<strong>23</strong><br />

I<strong>on</strong> permeati<strong>on</strong> through neur<strong>on</strong>al nAChR i<strong>on</strong> channel<br />

Jens Krüger * , G Fels<br />

University of Paderborn, Warburger Strasse 100, 33100 Paderborn, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O<strong>23</strong><br />

Alzheimer’s dementia is related to loss of inter-neur<strong>on</strong> communicati<strong>on</strong> by<br />

nicotinic acetylcholine receptors (nAChR) at post synaptic neur<strong>on</strong>s, and<br />

successful therapy approaches rely e.g. <strong>on</strong> modulati<strong>on</strong>s of resp<strong>on</strong>se<br />

signals initiated by i<strong>on</strong> flux through the receptor integrated i<strong>on</strong> channel.<br />

We have evaluated Umbrella Sampling (US) and Steered Molecular<br />

Dynamics (SMD) pulling for calculati<strong>on</strong> of the potential of mean force<br />

(PMF) of i<strong>on</strong> permeati<strong>on</strong> through neur<strong>on</strong>al alpha7 nAChR, a cati<strong>on</strong><br />

selective channel, in order to understand the changes in electrical<br />

potentials c<strong>on</strong>nected to channel gating. SMD enables the discriminati<strong>on</strong><br />

between i<strong>on</strong> in- and efflux and allows detailed insight into permeati<strong>on</strong><br />

pathways (Figure 1).<br />

Both methods show the same performance characteristics but require a<br />

different number of simulati<strong>on</strong>s with different simulati<strong>on</strong> length. For<br />

c<strong>on</strong>structi<strong>on</strong> of <strong>on</strong>e PMF 300 separate 300 ps simulati<strong>on</strong>s are performed<br />

for US, but <strong>on</strong>ly <strong>on</strong>e 2100 ps run for SMD pulling. Reas<strong>on</strong>able error<br />

estimates require three repetiti<strong>on</strong>s of US but two times twelve repetiti<strong>on</strong>s<br />

for SMD, which sum up to 9360 CPU hours in case of US and 6490 for<br />

SMD. SMD, therefore, needs 30% less time <strong>on</strong> low latency HPC clusters,<br />

but US gains advantage through distributed computing.<br />

The potential energy barriers are calculated for Na + and Cl - based <strong>on</strong> US<br />

and SMD pulling, enabling differentiati<strong>on</strong> between i<strong>on</strong> types and their<br />

in- and efflux through the channel.<br />

Figure 1 (abstract O<strong>23</strong>). Cut through the transmembrane part of<br />

the nAChR. The thin gray lines corresp<strong>on</strong>d to individual permeati<strong>on</strong><br />

pathways.<br />

O24<br />

Fleksy: a flexible approach to induced fit docking<br />

Markus Wagener 1* , SB Nabuurs 2 , J de Vlieg 1<br />

1 Schering-Plough Research Institute, PO Box 20, 5340BH Oss,<br />

The Netherlands; 2 Nijmegen, The Netherlands<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):O24<br />

Protein receptor rearrangements up<strong>on</strong> ligand binding are a major<br />

complicating factor in structure-based drug design. An accurate predicti<strong>on</strong><br />

of these so-called induced fit phenomena calls for ligand docking and<br />

virtual screening approaches capable of c<strong>on</strong>sidering receptor flexibility.<br />

We present Fleksy [1], a flexible approach aimed at accurately positi<strong>on</strong>ing<br />

small molecule ligands into a protein receptor, while taking both ligand<br />

and receptor flexibility into account. Our method c<strong>on</strong>sists of an ensemble<br />

docking stage in which the ligand of interest is docked into a structural<br />

ensemble of receptor c<strong>on</strong>formati<strong>on</strong>s, followed by a complex optimizati<strong>on</strong><br />

stage during which both ligand and protein are allowed to move. Pivotal<br />

to our method is the use of receptor ensembles to describe protein<br />

flexibility. To c<strong>on</strong>struct these ensembles we use a backb<strong>on</strong>e dependent<br />

rotamer library and implement the c<strong>on</strong>cept of interacti<strong>on</strong> sampling. The<br />

latter allows for the evaluati<strong>on</strong> of different orientati<strong>on</strong>s and, when<br />

relevant, different tautomers of ambivalent interacti<strong>on</strong> partners in the<br />

binding site such as asparagine, glutamine and histidine side chains. The<br />

docking stage comprises an ensemble-based soft-docking experiment<br />

using FlexX-Ensemble [2], followed by an effective flexible receptor-ligand<br />

complex optimizati<strong>on</strong> using Yasara [3]. Ultimately Fleksy results in a set of<br />

receptor-ligand complexes ranked using a c<strong>on</strong>sensus scoring functi<strong>on</strong><br />

which combines both docking scores and force field energies. Figure 1.<br />

Averaged over three cross-docking datasets, in total c<strong>on</strong>taining 35<br />

different pharmaceutically relevant receptor-ligand complexes, Fleksy<br />

reproduces the observed binding mode within 2.0 Å for 78% of the<br />

complexes. This compares favorably to the rigid receptor FlexX program<br />

[4] which <strong>on</strong> average reaches a success rate of 44% for these datasets.<br />

References<br />

1. Nabuurs SB, Wagener M, de Vlieg J: J Med Chem 2007, 50:6507.<br />

2. Claussen H, Buning C, Rarey M, Lengauer T: J Mol Biol 2001, 308:377.<br />

3. [http://www.yasara.org/].<br />

4. Rarey M, Kramer B, Lengauer T, Klebe G: J Mol Biol 1996, 261:470.<br />

Figure 1 (abstract O24).<br />

Page 9 of 25


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

POSTER PRESENTATIONS<br />

P1<br />

CWM global search - an internet search engine for the chemist<br />

Alexander Kos * , H-J Himmler<br />

AKos GmbH, Austr. 26, D-79585 Steinen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P1<br />

The Internet is a rich source of data and informati<strong>on</strong> for chemist. There are<br />

numerous multidisciplinary databases available for free <strong>on</strong> the Internet. Some<br />

examples of such data repositories are: PubChem, ChemSpider, eMolecules,<br />

Drugbank, KEGG, NIST, ChemSynthesis, PharmGKB, Free patents <strong>on</strong>line ...<br />

It should be obvious that an end user is a) not aware of all the resources,<br />

and b) has not the time to learn every user interface and is unable to<br />

search over all of them. We provide CWM Global Search as an applicati<strong>on</strong><br />

that enables to search by structure, CAS Registry Number and free text<br />

over all these sources. Presently CWM Global Search performs searches in<br />

30 databases and search engines accessing more than 100 milli<strong>on</strong> pages<br />

that associate data with structures.<br />

The user can submit a single query structure or several using SDFiles. In<br />

additi<strong>on</strong> to molecule searches CWM Global Search also allows to submit<br />

reacti<strong>on</strong> queries. In that case several single molecule searches are performed<br />

for the reactants, reagents and products. This makes it easy to find<br />

commercial suppliers and other synthesis relevant informati<strong>on</strong> such as<br />

safety sheets in <strong>on</strong>e query.<br />

Searching is technically less problematic than providing the answers in a<br />

digestible way for the user. Our first approach is to provide profiles for<br />

searching. You can choose “Availability” if you are interested to find a<br />

commercial supplier, or “Biology” if you are looking for biological effects.<br />

The sec<strong>on</strong>d help comes when we display the summary of the results. You<br />

get a table with hyperlinks color coded by topics. If the result page of doing<br />

asearchc<strong>on</strong>tainsalinktoanMSDS,thetopic‘Safety’ is highlighted. With<br />

profiles you limit your search to certain sources, and with topics your<br />

answers will be ordered.<br />

CWM Global Search is not the applicati<strong>on</strong> for exact searches like “give me<br />

the melting point of anthracene”. You will find the melting point <strong>on</strong><br />

many pages, and the topic “Physical Property” might help, but, in this<br />

case the link to Wikipedia gives the result quickest. Internet pages<br />

provide us the data in unordered fashi<strong>on</strong> and finding the exact answers<br />

is time c<strong>on</strong>suming. In some case like commercial suppliers we check<br />

against an internal list if the page really displays a supplier, or, if for<br />

instance PubChem has <strong>on</strong>ly a reference to ChemSpider, and ChemSpider<br />

references again just PubChem, but nowhere you will find a supplier. We<br />

also have to c<strong>on</strong>sider that many providers of the resources would not<br />

allow us to extract data directly without leading the user to their Internet<br />

pages. The nature of the results is fuzzy in CWM Global Search. This is an<br />

advantage if you look for instance for biological effects, which can be<br />

many, and/or if you want to learn why a compound could be important.<br />

We generate both InChI names and keys for the query structure and<br />

afterwards perform fast text searches in the various databases or search<br />

engines. In additi<strong>on</strong> we also generate Smiles. Since prior to the existence<br />

of standard InChI’s (released 2009) the various database providers used<br />

different settings to generate the InChI’s stored in their database, we use<br />

multiple settings when generating an InChI identifier used for a structure<br />

search in CWM Global Search. This way we greatly maximize the chance to<br />

find an InChI independent of the settings used by the database provider.<br />

P2<br />

The status of the InChI project and the InChI trust<br />

Stephen Heller 1* , Alan McNaught 2<br />

1<br />

NIST, PCPD, MS-8380, 100 Bureau Drive, Gaithersburg, 20899-8380, USA;<br />

2<br />

InChI Trust, Cambridge, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P2<br />

The IUPAC InChI/InChIKey project has evolved to the point where its<br />

future development and promulgati<strong>on</strong> require a new management<br />

system that will provide stable and financially viable administrative<br />

arrangements for the foreseeable future. This is necessary to give the<br />

world-wide chemistry community that IUPAC serves the c<strong>on</strong>fidence that<br />

facilities for development, maintenance and support of the InChI/InChIKey<br />

Page 10 of 25<br />

algorithm are firmly established <strong>on</strong> an<strong>on</strong>goingbasis,andwillensure<br />

acceptance and usage of InChI as a mainstream standard. This new<br />

management system is provided by means of a new organizati<strong>on</strong>,<br />

the InChI Trust, an independent not-for-profit entity paid for by<br />

the community and those who use and benefit from the InChI algorithm.<br />

The missi<strong>on</strong> of the Trust is quite simple and limited; its sole purpose is to<br />

create and support administratively and financially a scientifically robust<br />

and comprehensive InChI algorithm and related standards and protocols.<br />

This presentati<strong>on</strong> will describe the current technical state of the InChI<br />

algorithm and how the InChI Trust is working to assure the c<strong>on</strong>tinued<br />

support and delivery of the InChI algorithm.<br />

P3<br />

jsMolEditor: an open source molecule editor for the next<br />

generati<strong>on</strong> web<br />

L Duan<br />

School of Pharmacy, East China University of Science and Technology,<br />

130 Meil<strong>on</strong>g Road, Shanghai 200<strong>23</strong>7, PR China<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P3<br />

Molecule editor have become essential building blocks for modern<br />

chemistry related databases with structural operati<strong>on</strong> available. It was quite<br />

a challenging task to input molecule structures in a web browser before<br />

Java Applets technology was applied in chemoinformatics. Until now, Java<br />

Applets are still the de facto standard for web based databases. With the<br />

development of web technologies and script programming, Ajax, a Web 2.0<br />

technology, is becoming mainstream in web development. This gradually<br />

eliminates the need for plug-ins such as Java and ActiveX. Therefore, it is<br />

possible to develop powerful web-based tools without plug-ins.<br />

This poster introduces a JavaScript based molecule editor: jsMolEditor,<br />

which had made a significant difference from other editors available <strong>on</strong><br />

web pages. Unlike server side editors such as PubChem editor [1],<br />

jsMolEditor implemented all its functi<strong>on</strong>ality in JavaScript <strong>on</strong> the client<br />

side. Thus it is not necessary to maintain a c<strong>on</strong>necti<strong>on</strong> to a server during<br />

operati<strong>on</strong>. jsMolEditor works completely independently <strong>on</strong> the client side.<br />

The JavaScript nature of jsMolEditor also made it able to run in standard<br />

web browsers without specific plugins or virtual machines. The 2D drawing<br />

system deals with web browser differences. By combining Canvas and<br />

VML, jsMolEditor was able to work <strong>on</strong> all comm<strong>on</strong> web browsers including<br />

Internet Explorer, Firefox, Safari, Opera and Chrome. Instead of building<br />

jsMolEditor in JavaScript from scratch, GWT [2] is utilized as a crosscompiler<br />

from Java to JavaScript, which enables the reuse of existing Java<br />

chemoinformatics frameworks. jsMolEditor used a ported versi<strong>on</strong> of MX [3]<br />

as the bottom layer framework to handle basic molfile I/O. Thanks to MX’s<br />

open source license, the development of jsMolEditor saves a lot of time by<br />

eliminating the process of “reinventing the wheel”. The poster represents<br />

not <strong>on</strong>ly jsMolEditor itself, but also the building process of jsMolEditor and<br />

the feasibility of applying the same c<strong>on</strong>cept to other chemistry gadgets <strong>on</strong><br />

web pages, such as spectrum viewers.<br />

jsMolEditor’s <strong>on</strong>line demo is available at http://chemhack.com/jsmoleditor/<br />

and the source code is available at http://github.com/chemhack/jsmoleditor/.<br />

References<br />

1. PubChem Server Side Structure Editor. [http://pubchem.ncbi.nlm.nih.gov/<br />

edit/].<br />

2. GWT, Google Web Toolkit. [http://code.google.com/webtoolkit/].<br />

3. MX Chemoinformatics Toolkit. [http://metamolecular.com/mx].<br />

P4<br />

Mining public-source databases for structure-activity relati<strong>on</strong>ships<br />

Bernd Wendt 1* , Fabian Bös 2 , Ulrike Uhrig 2<br />

1 Elara Pharmaceuticals, Heidelberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Tripos Internati<strong>on</strong>al,<br />

Martin-Kollar-Str. 17, München, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P4<br />

Modeling off-target effects has become a highly important and relevant<br />

comp<strong>on</strong>ent of the computati<strong>on</strong>al chemistry toolset. The presentati<strong>on</strong> will<br />

describe a new c<strong>on</strong>tributi<strong>on</strong> that seeks to allow the extracti<strong>on</strong> of 3Dstructure-activity<br />

relati<strong>on</strong>ships from public informati<strong>on</strong> starting from a<br />

chemical structure. Several public source databases such as PubChem [1]<br />

offering structure as well as activity informati<strong>on</strong> for a number of targets


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

have been examined for their value in extracting useful structure-activity<br />

relati<strong>on</strong>ships (SARs). A Topomer search [2] of public-source databases using<br />

the structures of a set of 255 marketed drugs [3] as queries yielded sets of<br />

shape- and pharmacophore similar hits. SAR-tables were c<strong>on</strong>structed by<br />

collecting hits around each query structure and for a particular reported<br />

activity. A new method: quantitative series enrichment analysis (QSEA) [4]<br />

was applied to these SAR-tables to capture trends and to transform these<br />

trends into 3D-QSAR models. Overall more than 400 SAR-tables with<br />

Topomer CoMFA models were found by extracting trends from the<br />

PubChem and ChemBank [5] database. The resulting models were able to<br />

highlight the structural details of certain off-target effects of marketed drugs<br />

even in those cases where the traditi<strong>on</strong>al structural similarity would<br />

c<strong>on</strong>clude that no off-target effect would exist. This dem<strong>on</strong>strates the<br />

usefulness of the approach in modeling off-target effects.<br />

References<br />

1. [http://pubchem.ncbi.nlm.nih.gov].<br />

2. Cramer RD: J Chem Inf Comp Sci 2004, 44:1221-1227.<br />

3. Cleves AE, Jain AN: J Med Chem 2006, 49:2921-2938.<br />

4. Wendt B, Cramer RD: J Comp Aided Mol Des 2008, 22:541-555.<br />

5. [http://chembank.broad.harvard.edu/].<br />

P5<br />

OCHEM - <strong>on</strong>-line CHEmical database & modeling envir<strong>on</strong>ment<br />

Sergii Novotarskyi 1* , Iurii Sushko 1 , R Körner 1 , Anil Pandey Kumar 1 ,<br />

Matthias Rupp 1 , VV Prokopenko 2 , Igor Tetko 1<br />

1 Institute of Bioinformatics and Systems Biology - Helmholtz Zentrum<br />

Muenchen, <str<strong>on</strong>g>German</str<strong>on</strong>g> Research Center for Envir<strong>on</strong>mental Health, Ingolstaedter<br />

Landstrasse 1, D-85764 Neuherberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Kiev, Ukraine<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P5<br />

The main goal of OCHEM database http://qspr.eu is to collect, store and<br />

manipulate chemical data with the purpose of their use for model<br />

development. Its main features, that distinguish it from other available<br />

databases include:<br />

1. The database is open and it is based <strong>on</strong> Wiki-style principles. We<br />

encourage users to submit data and to correct inaccurate submitted data;<br />

2. The database is aimed at collecting high-quality data. To achieve this<br />

we require users to submit references to the article, where the data was<br />

published. The reference may include the article name, journal name,<br />

date of publicati<strong>on</strong>, page number, line number, etc.<br />

3. Since the compound properties may vary depending <strong>on</strong> the c<strong>on</strong>diti<strong>on</strong>s,<br />

under which they were measured, we store the measurement c<strong>on</strong>diti<strong>on</strong>s<br />

with the data to provide the users with more accurate informati<strong>on</strong> about<br />

each data point.<br />

The modeling framework is being developed to complement the Wiki-style<br />

database of chemical structures. Its main goal is to provide a flexible and<br />

expandable calculati<strong>on</strong> envir<strong>on</strong>ment that would allow a user to create and<br />

manipulate QSAR and QSPR models <strong>on</strong>-line. The modeling framework is<br />

integrated with the database web-interface that allows easy transfer<br />

of database data to the models. The web interface of the modeling<br />

envir<strong>on</strong>ment is aimed to provide to the Web users easy means to create<br />

high-quality predicti<strong>on</strong> models and estimate their accuracy of predicti<strong>on</strong> and<br />

applicability domain. The developed models can be published <strong>on</strong> the Web<br />

and be accessed by other users to predict new molecules <strong>on</strong>-line. This tool is<br />

aimed to generate a new paradigm for structure activity relati<strong>on</strong>ship<br />

knowledge bases, making QSAR/QSPR models active, user-c<strong>on</strong>tributed and<br />

easily accessible for benchmarking, general use and educati<strong>on</strong>al purposes.<br />

The examples of the use of the database within nati<strong>on</strong>al and EU projects will<br />

be exemplified.<br />

P6<br />

ChEBI: a chemistry <strong>on</strong>tology and database<br />

Paula de Matos * , A Dekker, M Ennis, Janna Hastings, K Haug, S Turner,<br />

Christoph Steinbeck<br />

<strong>Cheminformatics</strong> and Metabolism Team, European Bioinformatics Institute,<br />

Hinxt<strong>on</strong> Cambridge, CB10 1SD, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P6<br />

The bioinformatics community has developed a policy of open access and<br />

open data since its incepti<strong>on</strong>. This is c<strong>on</strong>trary to chemoinformatics which<br />

Page 11 of 25<br />

has traditi<strong>on</strong>ally been a closed-access area. In 2004, two complementary<br />

open access databases were initiated by the bioinformatics community,<br />

ChEBI [1] and PubChem. PubChem serves as automated repository <strong>on</strong> the<br />

biological activities of small molecules and ChEBI (Chemical Entities of<br />

Biological Interest) as a manually annotated database of molecular<br />

entities focused <strong>on</strong> ‘small’ chemical compounds. Although ChEBI is<br />

reas<strong>on</strong>ably compact c<strong>on</strong>taining just over 18,000 entities, it provides a<br />

wide range of data items such as chemical nomenclature, an <strong>on</strong>tology<br />

and chemical structures. The ChEBI database has a str<strong>on</strong>g focus <strong>on</strong><br />

quality with excepti<strong>on</strong>al efforts afforded to IUPAC nomenclature rules,<br />

classificati<strong>on</strong> within the <strong>on</strong>tology and best IUPAC practices when drawing<br />

chemical structures.<br />

ChEBI is currently undergoing a period of restructuring which will allow it<br />

to incorporate the small molecule structures from (and link to) EBI’s new<br />

chemogenomics database ChEMBL [2], increasing its small molecules<br />

coverage to over 500,000 entities. We have restructured the chemical<br />

structure search facility to use Orchem [3] an Oracle chemistry plug-in<br />

using the Chemistry Development Kit [4]. The facility allows a user to<br />

draw a chemical structure or load <strong>on</strong>e from a file and then execute either<br />

a substructure or similarity search. Furthermore the ChEBI text search will<br />

have extensive facilities for querying based <strong>on</strong> not <strong>on</strong>ly names but<br />

formula, a range of charges and molecular weight. The ability to query<br />

the ChEBI <strong>on</strong>tology and retrieve all children for a given entity will also be<br />

included.<br />

In order to aid the distributi<strong>on</strong> of ChEBI to the chemoinformatics<br />

community we have extended our export formats to include an MDL sdf<br />

format with a lighter versi<strong>on</strong> c<strong>on</strong>sisting <strong>on</strong>ly of compound structure,<br />

name and identifier. A complete versi<strong>on</strong> is available with all the ChEBI<br />

data properties such as syn<strong>on</strong>yms, cross-references, SMILES and InChI.<br />

Furthermore cross-references in ChEBI have been extended to include<br />

BRENDA the enzyme database, NMRShiftDB the database for organic<br />

structures and their nuclear magnetic res<strong>on</strong>ance (nmr) spectra, Rhea the<br />

biochemical reacti<strong>on</strong> database and IntEnz the enzyme nomenclature<br />

database.<br />

ChEBI is available at http://www.ebi.ac.uk/chebi.<br />

References<br />

1. Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A,<br />

Alcántara R, Darsow M, Guedj M, Ashburner M: ChEBI: a database and<br />

<strong>on</strong>tology for chemical entities of biological interest. Nucleic Acids Res<br />

2008, 36:D344-D350.<br />

2. [http://www.ebi.ac.uk/chembl/].<br />

3. Rijnbeek M, Steinbeck C: An open source chemistry search engine for<br />

Oracle. in press.<br />

4. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen EL: The<br />

Chemistry Development Kit (CDK): An Open-Source Java Library for<br />

Chemo- and Bioinformatics. J Chem Inf Comput Sci 2003, 43(2):493-500.<br />

P7<br />

Comparing manual and automated extracti<strong>on</strong> of chemical entities<br />

from documents<br />

Christian Tyrchan * , Sorel Muresan<br />

AstraZeneca R&D, LG CVGI, Pepparedsleden 1, S-43183 Mölndal, Sweden<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P7<br />

The chemical informati<strong>on</strong> landscape is changing rapidly with a yearly<br />

increase of over 1 milli<strong>on</strong> new compounds and more than 700,000<br />

publicati<strong>on</strong>s related to chemistry [1]. Exploring the chemical space covered<br />

by relevant journals and patents is a crucial step in early stage medicinal<br />

chemistry projects. Extracting chemical entities from unstructured text is a<br />

complex task and different approaches are currently used including<br />

manual extracti<strong>on</strong> by expert curators, text mining supported by chemical<br />

NER or combinati<strong>on</strong>s thereof [2]. The chemical informati<strong>on</strong> and<br />

corresp<strong>on</strong>ding annotati<strong>on</strong>s are subsequently stored in relati<strong>on</strong>al databases<br />

allowing for complex chemical and text queries.<br />

To assess the capability of chemical NER in documents and to understand<br />

thecoverageandaccuracyoftheunderlyingdatawecomparedthe<br />

chemistry extracted by manual curati<strong>on</strong> (GVKBIO) and text mining<br />

(SureChem) from a small patent corpus.<br />

GVKBIO databases are populated with explicit relati<strong>on</strong>ships between<br />

compounds, assays and sequence identifiers that have been manually<br />

extracted from journals and patents <strong>on</strong> a large scale [3].


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

SureChem Portal [4] is a gateway for chemical patent search <strong>on</strong> full text<br />

collecti<strong>on</strong>s for USPTO, EPO and WO. SureChem users can perform structure<br />

and keyword searches <strong>on</strong> more than 9 milli<strong>on</strong> unique compounds.<br />

We have selected a set of 250 patents covering various target classes and for<br />

which a minimum of 25 records per patents were retrieved from GVKBIO<br />

Patent database. The analysis was d<strong>on</strong>e using PipelinePilot protocols [5].<br />

These initial results dem<strong>on</strong>strate the benefits and challenges of text<br />

mining for chemical informati<strong>on</strong> extracti<strong>on</strong> from unstructured text.<br />

References<br />

1. Engel T: J Chem Inf Model 2006, 46:2267-2277.<br />

2. Banville DL, (Ed.): Chemical Informati<strong>on</strong> Mining: Facilitating Literature-Based<br />

Discovery CRC Press: Boca Rat<strong>on</strong>, L<strong>on</strong>d<strong>on</strong> New York 2009.<br />

3. [http://www.gvkbio.com/informatics.html].<br />

4. [http://www.surechem.org].<br />

5. [http://accelrys.com/products/scitegic].<br />

P8<br />

Embedded infrastructure for primary data in chemistry<br />

Igor Grbavac 1* , Oliver Koepler 1 , S Dohmeyer-Fischer 2 , Gregor Fels 2 ,<br />

Irina Sens 1 , J Brase 1<br />

1 <str<strong>on</strong>g>German</str<strong>on</strong>g> Nati<strong>on</strong>al Library of Science and Technology, Welfengarten 1B,<br />

30167 Hannover, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Universität Paderborn, Department Chemistry,<br />

Warburgerstrasse 100, 33098 Paderborn, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P8<br />

Researchers around the world produce vast amounts of experimental data<br />

each day. Primary (experimental) data are the key elements of research<br />

projects and scientific publicati<strong>on</strong>s. During the scientific workflow from the<br />

experiment to the final scientific publicati<strong>on</strong> primary data is analysed,<br />

interpreted and c<strong>on</strong>densed to a final quintessence. But until now the<br />

handling of primary data in chemistry c<strong>on</strong>tains no comm<strong>on</strong>ly approved<br />

standards c<strong>on</strong>cerning reusability and l<strong>on</strong>g-time accessibility. Predominantly<br />

there are no quality c<strong>on</strong>trol, no assured l<strong>on</strong>g-time archival storage, no<br />

proofs and no exploitati<strong>on</strong> of primary data and c<strong>on</strong>sequently there is no<br />

assurance of data given. The c<strong>on</strong>cept study “K<strong>on</strong>zeptstudie Vernetzte<br />

Primärdaten-Infrastruktur für den Wissenschaftler-Arbeitsplatz” of the TIB,<br />

the workgroup of Prof. Fels and the FIZ CHEMIE aims to identify the key<br />

factors for an embedded infrastructure of primary data in the field of<br />

chemistry. In this infrastructure, chemical primary data shall be saved<br />

persistently in a central database, linked and be citable by the use of DOI<br />

(digital object identifier), be accessible and searchable. Figure 1.<br />

We have carried out a questi<strong>on</strong>naire am<strong>on</strong>g researchers to analyse the<br />

scientific workflow and the chemical life-cycle of primary data in chemistry<br />

and to identify the requirements for processes and structures in an<br />

embedded scientific infrastructure for researchers. We have furthermore<br />

defined both a general and extended metadata scheme for the storage,<br />

citati<strong>on</strong> and searching primary datasets. General metadata include<br />

elements necessary for citati<strong>on</strong>s while chemical metadata cover specific<br />

needs of describing and searching within the data itself.<br />

Figure 1 (abstract P8).<br />

P9<br />

Predicti<strong>on</strong> of the partiti<strong>on</strong> coefficient between air and body<br />

compartments from the chemical structure<br />

Stefanie Stöckl * , R Kühne, R-U Ebert, G Schüürmann<br />

UFZ Department of Ecological Chemistry, Helmholtz Centre for<br />

Envir<strong>on</strong>mental Research, Permoserstr. 15, 04318 Leipzig , <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P9<br />

Page 12 of 25<br />

For PBPK modeling, partiti<strong>on</strong> coefficients between tissues and<br />

envir<strong>on</strong>mental compartments are required. A simple approach starts with<br />

the system blood/air, fat/air, and fat/blood. Employing thermodynamic<br />

relati<strong>on</strong>ships, <strong>on</strong>e of the three coefficients can be calculated from the<br />

other two values. With respect to available human and mammal data,<br />

modeling efforts focus <strong>on</strong> the blood/air and fat/air partiti<strong>on</strong> coefficient.<br />

Respective data sets from literature have been collected and evaluated.<br />

The chemical domain of the sets is presented in terms of chemical<br />

structure, complexity, and polarity. Data gaps have been identified.<br />

The blood/air and fat/air partiti<strong>on</strong> coefficient data sets have been applied<br />

to a validati<strong>on</strong> of literature models, with particular remark <strong>on</strong> the<br />

performance for specific compound classes. A new model for the blood/<br />

air partiti<strong>on</strong> coefficient and a preliminary new model for the fat/air<br />

partiti<strong>on</strong> coefficient are presented.<br />

The study was supported by the EU projects 2FUN (c<strong>on</strong>tract No. 036976)<br />

and OSIRIS (IP, c<strong>on</strong>tract No. 037017).<br />

P10<br />

Modelling dissociati<strong>on</strong> c<strong>on</strong>stants of organic acids by local molecular<br />

parameters<br />

Haiying Yu * , R Kühne, R-U Ebert, G Schüürmann<br />

Department of Ecological Chemistry, Helmholtz Centre for Envir<strong>on</strong>mental<br />

Research – UFZ, Permoserstr. 15, 04318, Leipzig, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P10<br />

The dissociati<strong>on</strong> c<strong>on</strong>stant pKa defines the i<strong>on</strong>izati<strong>on</strong> degree of organic<br />

compounds. It is an important parameter that affects their toxicity and<br />

envir<strong>on</strong>mental behavior and fate. Am<strong>on</strong>g the present approaches to<br />

predict pKa values of organic chemicals, increment methods show a<br />

good performance but are limited by missing values of special groups.<br />

The purpose of this study is to develop models for predicting the pKa of<br />

organic acids directly from their molecular structures.<br />

A quantum chemical method was introduced by employing local molecular<br />

parameters. Experimental pKa values of more than 1000 organic acids,<br />

including phenols, aromatic carboxylic acids, aliphatic carboxylic acids and<br />

alcohols were selected, and all molecules were optimized using the semiempirical<br />

AM1 Hamilt<strong>on</strong>ian. Simple models with several descriptors for<br />

different subsets were calibrated by multilinear regressi<strong>on</strong> (MLR). Substituent<br />

positi<strong>on</strong>s <strong>on</strong> aromatic rings were also taken into account.<br />

The obtained models yield good predictive squared correlati<strong>on</strong> coefficients<br />

q2 and superior performance compared with other quantum chemical<br />

models. The predicti<strong>on</strong> capability is further evaluated using cross<br />

validati<strong>on</strong>. The models unravel that the potential of oxygen atom c<strong>on</strong>joint<br />

to i<strong>on</strong>izable hydrogen atom to accept extra electr<strong>on</strong>ic charge is the single<br />

most important c<strong>on</strong>tributi<strong>on</strong> to the dissociati<strong>on</strong> of hydrogen atom.<br />

N<strong>on</strong>-linear statistical analysis methods were also applied because of the<br />

n<strong>on</strong>-linear character of the employed descriptors.<br />

The study was supported by the China Scholarship Council and by the EU<br />

project OSIRIS (IP, c<strong>on</strong>tact No. 037017), which is gratefully acknowledged.<br />

P11<br />

Where are the boundaries? Automated pocket detecti<strong>on</strong> for<br />

druggability studies<br />

Andrea Volkamer 1* , T Grombacher 2 , Matthias Rarey 1<br />

1 University of Hamburg, Center for Bioinformatics, Bundesstr. 43, 20146<br />

Hamburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Bioinformatics, Merck Ser<strong>on</strong>o, Frankfurter Str. 250,<br />

64293 Darmstadt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P11<br />

Computer-based predicti<strong>on</strong> of protein druggability is an essential task in<br />

the drug development process. Early identificati<strong>on</strong> of disease modifying


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

targets that can be modulated by low-molecular weight compounds can<br />

help to speed up and reduce costs in drug discovery. Recently, first<br />

methods have been presented performing a druggability estimati<strong>on</strong><br />

solely based <strong>on</strong> the 3D structure of the protein [1-3]. The essential first<br />

step for such methods is the identificati<strong>on</strong> of the active site. A multitude<br />

of methods exist for automated active site predicti<strong>on</strong> [4-6]. However,<br />

most methods developed for automated docking procedures do not<br />

explicitly focus <strong>on</strong> the definiti<strong>on</strong> of the boundary of the active site. Since<br />

druggability estimates are based <strong>on</strong> structural descriptors of the active<br />

site, a precise descripti<strong>on</strong> of the active site boundaries is vital for correct<br />

predicti<strong>on</strong>s.<br />

In this work, we present a method to predict protein binding pockets and<br />

split them into subpockets such that small molecules are mostly<br />

c<strong>on</strong>tained within <strong>on</strong>e sub-pocket. The method is based <strong>on</strong> a novel<br />

strategy to geometrically detect narrow regi<strong>on</strong>s in pockets. For<br />

druggability predicti<strong>on</strong>s, such pocket descripti<strong>on</strong>s result in more<br />

meaningful structural descriptors like active site surface or volume.<br />

Moreover, if several structures from <strong>on</strong>e protein are known, sub-pockets<br />

can give hints about protein flexibility and induced fit c<strong>on</strong>formati<strong>on</strong>al<br />

changes.<br />

Our method was evaluated <strong>on</strong> 718 proteins from the PDBbind [7] data<br />

set, as well as 5419 proteins from the scPDB [8] data set. Binding pockets<br />

are correctly predicted in 94% and 93% of the datasets. 38%, respectively<br />

45% of the proteins from the two datasets c<strong>on</strong>tain pockets which can be<br />

divided into more than <strong>on</strong>e sub-pocket. In all cases <strong>on</strong>e sub-pocket<br />

completely covers the co-crystallized ligand. Besides the classical overlapmeasure<br />

of ligand versus predicted active site, we additi<strong>on</strong>ally c<strong>on</strong>sidered<br />

the pocket coverage by the co-crystallized ligand. We found that the<br />

number of test cases with more than 30% pocket coverage rises from<br />

30% to 74% (PDBbind) and from 28% to 63% (scPDB), respectively, when<br />

c<strong>on</strong>sidering sub-pockets.<br />

References<br />

1. Nayal M, H<strong>on</strong>ig B: Proteins 2006, 63(4):892-906.<br />

2. Cheng C, et al: Nat Biotechnol 2007, 25(1):71-5.<br />

3. Halgren TA: J Chem Inf Model 2009, 49(2):377-89.<br />

4. Hendlich M: J Mol Graph Model 1997, 15(6):359-63.<br />

5. Weisel M, et al: Chem Cent J 2007, 1:7.<br />

6. An J, et al: Mol Cell Proteomics 2005, 4(6):752-61.<br />

7. Wang R, et al: J Med Chem 2004, 47(12):2977-80.<br />

8. Kellenberger E, et al: J Chem Inf Model 2006, 46(2):717-27.<br />

P12<br />

Protein negative/positively cooperative binding to<br />

zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic vesicles<br />

F Torrens * , G Castellano<br />

Institut Universitari de Ciència Molecular, Universitat de València, Edifici<br />

d’Instituts de Paterna, P. O. Box 22085, 46071 València, Spain<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P12<br />

The role of electrostatics is studied in the adsorpti<strong>on</strong> of cati<strong>on</strong>ic<br />

proteins to zwitteri<strong>on</strong>ic phosphatidylcholine (PC) and ani<strong>on</strong>ic mixed PC/<br />

phosphatidylglycerol (PG) small unilamellar vesicles (SUVs) [1]. For<br />

model proteins the interacti<strong>on</strong> is m<strong>on</strong>itored vs. PG c<strong>on</strong>tent at low i<strong>on</strong>ic<br />

strength [2]. The adsorpti<strong>on</strong> of lysozyme-myoglobin-bovine serum<br />

albumin (BSA) (isoelectric point, pI 5-11) is investigated in SUVs, al<strong>on</strong>g<br />

with changes of the fluorescence emissi<strong>on</strong> spectra of the proteins, via<br />

theiradsorpti<strong>on</strong><strong>on</strong>SUVs[3].IntheGouy-Chapmanformalismthe<br />

activity coefficient goes with the square of charge number [4].<br />

Deviati<strong>on</strong>s from the ideal model indicate asymmetric locati<strong>on</strong> of ani<strong>on</strong>ic<br />

phospholipid in the bilayer inner leaflet, in mixed zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic<br />

SUVs for protein-PC/PG, in agreement with experiments-molecular<br />

dynamics simulati<strong>on</strong>s. Effective SUV charge stays c<strong>on</strong>stant. Myoglobin-,<br />

DNC-melittin- and melittin-zwitteri<strong>on</strong>ic associati<strong>on</strong>s are described by a<br />

partiti<strong>on</strong> model, modulated by electrostatic charging of membrane as<br />

protein accumulates at interface. Provisi<strong>on</strong>al c<strong>on</strong>clusi<strong>on</strong>s follow. (1) In<br />

mixed zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic vesicles the charge effect <strong>on</strong> the protein<br />

binding model was analyzed. For lysozyme-ani<strong>on</strong>ic enough vesicles and<br />

myoglobin the electrostatic repulsi<strong>on</strong> between cati<strong>on</strong>ic ad proteins<br />

dominates over the electrostatic attracti<strong>on</strong> between ad protein dipoles.<br />

(2) The salt effect <strong>on</strong> the protein binding model of mixed zwitteri<strong>on</strong>ic/<br />

ani<strong>on</strong>ic vesicles was analyzed. The cooperativity increases with i<strong>on</strong>ic<br />

Page 13 of 25<br />

strength. The corresp<strong>on</strong>ding interpretati<strong>on</strong> is that the electrostatic<br />

repulsi<strong>on</strong> between cati<strong>on</strong>ic ad proteins decreases with increasing salt<br />

effect, and the electrostatic attracti<strong>on</strong> between ad protein dipoles<br />

becomes dominant over the electrostatic repulsi<strong>on</strong> between ad protein<br />

charges. (3) In ani<strong>on</strong>ic vesicles the effect of vesicle charge <strong>on</strong> protein<br />

binding shows that, with increasing ani<strong>on</strong>ic character of the vesicles,<br />

the protein-protein electrostatic repulsi<strong>on</strong> is decreasingly important vs.<br />

the protein-vesicle attracti<strong>on</strong>, and the electrostatic attracti<strong>on</strong> between<br />

ad protein dipoles becomes dominant over the electrostatic repulsi<strong>on</strong><br />

between ad protein charges. (4) For lysozyme-mixed zwitteri<strong>on</strong>ic/<br />

ani<strong>on</strong>ic vesicles and myoglobin cooperativity increases with pH. With<br />

increasing pH and decreasing cati<strong>on</strong>ic character of the protein, the<br />

protein-protein electrostatic repulsi<strong>on</strong> is decreasingly important against<br />

the protein-SUV attracti<strong>on</strong>, and the electrostatic attracti<strong>on</strong> between ad<br />

protein dipoles becomes dominant over the electrostatic repulsi<strong>on</strong><br />

between ad protein charges. Furthermore the opposed effect is<br />

observed for lysozyme-zwitteri<strong>on</strong>ic vesicles. (5) For protein-mixed<br />

zwitteri<strong>on</strong>ic/ani<strong>on</strong>ic vesicle binding there is more dispersi<strong>on</strong> in the<br />

results, which could indicate asymmetric locati<strong>on</strong> of ani<strong>on</strong>ic<br />

phospholipid.<br />

References<br />

1. Torrens F, Campos A, Abad C: Cell Mol Biol 2003, 49:991.<br />

2. Torrens F, Abad C, Codoñer A, García-Lopera R, Campos A: Eur Polym J<br />

2005, 41:1439.<br />

3. Torrens F, Castellano G, Campos A, Abad C: JMolStruct2007,<br />

216:834-836.<br />

4. Torrens F, Castellano G, Campos A, Abad C: JMolStruct2009,<br />

274:924-926.<br />

P13<br />

CARTESIUS: a group functi<strong>on</strong> based toolkit for hybrid molecular<br />

modelling<br />

Mikhail Rudenko 1* , AL Tchougreeff 1,2<br />

1 Institute of Physics and Technology of RAS, Nakhimovsky ave. 36/1, 117218,<br />

Moscow; Moscow Center for C<strong>on</strong>tinuous Mathematical Educati<strong>on</strong>, 119991,<br />

Moscow, Russia; 2 Aachen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P13<br />

Modern methods of quantum chemistry have achieved significant<br />

successes. Nevertheless, modeling of certain classes of compounds<br />

remains problematic. The main complicati<strong>on</strong>s are both high computati<strong>on</strong>al<br />

complexity of these methods and the lack of a unique way to determine<br />

the ground state with correct asymptotic properties for complex systems.<br />

One of the possible soluti<strong>on</strong>s is development of the hybrid molecular<br />

modeling methods.<br />

Our approach to hybrid molecular modeling is based <strong>on</strong> the ideas of<br />

the group functi<strong>on</strong> approximati<strong>on</strong>, first suggested by McWeeny [1]<br />

and further elaborated in [2]. This approximati<strong>on</strong> lays a solid basis<br />

for simultaneous usage of most appropriate methods for computati<strong>on</strong><br />

of properties of each part of a complex system and c<strong>on</strong>sistent<br />

interpretati<strong>on</strong> of the properties of the system independently of the<br />

methods used.<br />

The CARTESIUS toolkit implements the above menti<strong>on</strong>ed scheme in<br />

both flexible and computati<strong>on</strong>ally efficient way. This is achieved by<br />

means of combining cutting-edge C++ techniques for computati<strong>on</strong>ally<br />

intensive parts of the code and the full power of Pyth<strong>on</strong> for the c<strong>on</strong>trol<br />

interface.<br />

As a result we have a carefully designed software system, which<br />

possesses both the power of the quantum chemistry codes developed<br />

previously with use of the group functi<strong>on</strong> approximati<strong>on</strong> (BF [3], EHCF [4],<br />

BFSCF [5] etc.) and high potential for further enhancements.<br />

This work is partially supported by the RFBR through the grant No 07-<br />

03-01128.<br />

References<br />

1. McWeeny R: Methods of Molecular Quantum Mechanics Academic, L<strong>on</strong>d<strong>on</strong>,<br />

2 1992.<br />

2. Tchougréeff AL: Hybrid Methods of Molecular Modeling Springer 2008.<br />

3. Tokmachev M, Tchougréeff AL: J Phys Chem A 2003, 107:358.<br />

4. Soudackov AV, Tchougréeff AL, Misurkin IA: Theor Chim Acta 1992,<br />

83:389.<br />

5. Tchougréeff AL, Tokmachev AM: Int J Quant Chem 2006, 106:571.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

P14<br />

Molecular fragments chemoinformatics<br />

Hubert Kuhn 1* , Stefan Neumann 2 , Christoph Steinbeck 3 , Carsten Wittekindt 1* ,<br />

Achim Zielesny 4<br />

1 CAM-D, Essen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 GNWI, Oer-Erkenschwick, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 3 European<br />

Bioinformatics Institute (EBI), Hinxt<strong>on</strong>, UK; 4 University of Applied Sciences of<br />

Gelsenkirchen, Institute for Bioinformatics and Chemoinformatics,<br />

Recklinghausen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P14<br />

The descripti<strong>on</strong> of chemical structures as a collecti<strong>on</strong> of c<strong>on</strong>nected<br />

molecular fragments is a basic requirement of coarse grained simulati<strong>on</strong><br />

methods like molecular fragment dynamics. These methods use<br />

molecular fragments as their basic interacting entities (“atoms”) and<br />

allow the modelling and investigati<strong>on</strong> of very large chemical systems.<br />

Therefore a molecular fragments chemoinformatics is in need that<br />

supports the fragment-based representati<strong>on</strong> of chemical structures as<br />

well as the elementary operati<strong>on</strong>s up<strong>on</strong> them. The poster outlines<br />

definiti<strong>on</strong>s and approaches to tackle these issues from an adequate<br />

molecular line notati<strong>on</strong> up to the graphical representati<strong>on</strong> of simulati<strong>on</strong><br />

boxes.<br />

P15<br />

A theoretical investigati<strong>on</strong> of microhydrati<strong>on</strong> of amino acids<br />

Catherine Michaux * , J Wouters, D Jacquemin, Eric A Perpète<br />

Unité de Chimie Physique Théorique et Structurale, Facultés<br />

Universitaires Notre-Dame de la Paix, rue de Bruxelles, 61,<br />

B-5000 Namur, Belgium<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P15<br />

Water plays a role in stabilizing biomolecular structure and facilitating<br />

biological functi<strong>on</strong>. Water can generate small active clusters and<br />

macroscopic assemblies, that are able to transmit informati<strong>on</strong> <strong>on</strong> various<br />

scales [1]. Prot<strong>on</strong>ati<strong>on</strong> and microhydrati<strong>on</strong> of proteins or DNA are of<br />

fundamental importance in biochemical processes such as prot<strong>on</strong><br />

transport, water-mediated catalysis, molecular recogniti<strong>on</strong>, protein folding,<br />

etc. The understanding of these hydrati<strong>on</strong> effects at the molecular level<br />

requires the characterizati<strong>on</strong> of the interacti<strong>on</strong>s between biomolecules<br />

and their envir<strong>on</strong>ment. Due to its relevance in many fields, the<br />

microhydrati<strong>on</strong> process of nucleic acid bases or amino acids has received<br />

a widespread attenti<strong>on</strong> [2].<br />

In this work, we first describe the microhydrati<strong>on</strong> of prot<strong>on</strong>ated<br />

amino acid (AA), and particularly of prot<strong>on</strong>ated glycine (GlyH+) [3][4],<br />

alanine (AlaH+) [5] and proline (ProH+) [6]. First a high-level theoretical<br />

method was setting up in order to compute the structures and<br />

properties of GlyH+-water complexes, Gly being the simplest AA and a<br />

suitable model for such a study. Then complexes with more than <strong>on</strong>e<br />

water molecule as well as other amino acids (Ala and Pro) were<br />

investigated, to extend the validity domain of our computati<strong>on</strong>al<br />

procedure.<br />

We then investigate a series of complexes made up of a deprot<strong>on</strong>ated<br />

(ani<strong>on</strong>ic) AA and a single water molecule [7]. Such species have recently<br />

been identified with mass spectrometry, allowing meaningful<br />

comparis<strong>on</strong>s between theoretical and experimental complexati<strong>on</strong><br />

energies. The selected systems are [Gly-H]-, [Ala-H]-, [Val-H]-, [Asp-H]-,<br />

[Gln-H]-, as well as the acetic acid as a model benchmark.<br />

References<br />

1. Chaplin MF: Nat Rev Mol Cell Biol 2006, 7:861-866.<br />

2. de Vries MS, Hobza P: Annu Rev Phys Chem 2007, 58:585-612.<br />

3. Michaux C, Wouters J, Jacquemin D, Perpète EA: Chem Phys Lett 2007,<br />

445:57-61.<br />

4. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B 2008,<br />

112:2430-2438.<br />

5. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B 2008,<br />

112:9896-9902.<br />

6. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B Letters 2008,<br />

112:7702-7705.<br />

7. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Am Soc Mass Spectrom<br />

2009, 20:632-638.<br />

P16<br />

Predicting the binding type of compounds <strong>on</strong> the 4 adenosine<br />

receptors using proteochemometric models<br />

Olaf van den Hoven * , Gerard van Westen, Andreas Bender<br />

University of Leiden, LACDR, Einsteinweg 55, <strong>23</strong>33 CC Leiden,<br />

The Netherlands<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P16<br />

Page 14 of 25<br />

Aim: The use of in silico models could save a lot of m<strong>on</strong>ey, time and<br />

resources in the laboratories when in search for potent binding<br />

candidates of designated receptors. In this paper a proteochemometric<br />

model was developed that is able to distinguish between full ag<strong>on</strong>ist,<br />

ag<strong>on</strong>ist and antag<strong>on</strong>ist across all 4 Adenosine receptors (A1, A2A, A2B<br />

and A3).<br />

Experimental: The software program Pipeline Pilot was used to create a<br />

proteochemometric model c<strong>on</strong>taining 2 SVMs [1]. The first SVM model<br />

was built to predict binding activity, the sec<strong>on</strong>d SVM model to predict<br />

binding type activity. For the first model an in house database, Glida and<br />

part of the Starlite database were used. The pKi values in these databases<br />

were then capped to create an active and inactive class. For the sec<strong>on</strong>d<br />

model a classificati<strong>on</strong> in full ag<strong>on</strong>ist, ag<strong>on</strong>ist and antag<strong>on</strong>ist derived from<br />

the Glida database was used. Different Proteinfingerprints (ProtFPs) were<br />

tested. These FPs c<strong>on</strong>sisted of amino acid residues at important binding<br />

positi<strong>on</strong>s according to two entropy analysis (TEA) and crystal structure<br />

analysis [2]. The quality of the model was determined by MCC, specificity,<br />

sensitivity, predictive ability (Q2) and the correlati<strong>on</strong> coefficient (R2). The<br />

Tanimoto similarity coefficient was used to visualize the degree of<br />

average and minimal similarity of the three binding type groups.<br />

Results: In c<strong>on</strong>tinuous models FPs having a combinati<strong>on</strong> of TEA and<br />

crystal structure informati<strong>on</strong> scored the best (Q2 = 0.6683), but overall<br />

scores were not high enough to benefit from a c<strong>on</strong>tinuous model. A<br />

categorical model was thus build and it was able to distinguish between<br />

ag<strong>on</strong>ist (99% correctly predicted) and antag<strong>on</strong>ist (98% correct) and<br />

partially full ag<strong>on</strong>ists (63% correct) with a specificity of 0.85, a sensitivity<br />

of 0.83 and a MCC of 0.69. The Tanimoto coefficient showed that indeed<br />

antag<strong>on</strong>ist and full ag<strong>on</strong>ist were the least similar, where ag<strong>on</strong>ist show<br />

more resemblance to the full ag<strong>on</strong>ist. The model finally predicted 2 novel<br />

full ag<strong>on</strong>istic compounds in our in house database, being GIFT136 and<br />

GIFT589.<br />

C<strong>on</strong>clusi<strong>on</strong>: A model that is able to distinguish between binding types<br />

was successfully created.<br />

References<br />

1. K<strong>on</strong>tijevskis A: J Chem Inf Model 2008, 48(9):1840.<br />

2. Ye K: Proteins 2006, 63(4):1018.<br />

P17<br />

pharmACOphore: multiple flexible ligand alignment based <strong>on</strong> ant<br />

col<strong>on</strong>y optimizati<strong>on</strong><br />

Gerhard Hessler 1* , Oliver Korb 2 , P M<strong>on</strong>ecke 3 , T Stützle 4 , TE Exner 5<br />

1 Drug Design, Chemical and Analytical Sciences, Sanofi-Aventis Deutschland<br />

GmbH, Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Cambridge, UK; 3 Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 4 Brussels,<br />

Belgium; 5 K<strong>on</strong>stanz, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P17<br />

Molecular alignment of biologically active compounds is a key technique<br />

in ligand-based drug design. Such alignments can be used for similaritybased<br />

identificati<strong>on</strong> of new compounds by using known active template<br />

structures as seeds. Furthermore, alignments allow for the identificati<strong>on</strong><br />

of potential pharmacophoric features in a set of chemically divers,<br />

biologically active molecules.<br />

Here, pharmACOphore is presented, a new approach for pairwise as well<br />

as multiple flexible alignments of ligands based <strong>on</strong> ant col<strong>on</strong>y<br />

optimizati<strong>on</strong> (ACO) [1]. Translati<strong>on</strong>al, rotati<strong>on</strong>al and torsi<strong>on</strong>al degrees of<br />

freedom of all ligand structures are encoded <strong>on</strong> a pherom<strong>on</strong>e vector<br />

(Figure 1). The artificial ant col<strong>on</strong>y uses this representati<strong>on</strong> to mark<br />

favourable values for each degree of freedom by depositing a<br />

pherom<strong>on</strong>e trail <strong>on</strong>to the corresp<strong>on</strong>ding pherom<strong>on</strong>e vector. The amount<br />

of pherom<strong>on</strong>e deposited directly depends <strong>on</strong> the soluti<strong>on</strong> quality,<br />

assessed by the scoring functi<strong>on</strong>.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

Figure 1 (abstract P17).<br />

The scoring functi<strong>on</strong> was parameterized by reproducing reference<br />

alignments taken from four different proteins. The performance of the<br />

new alignment algorithm will be dem<strong>on</strong>strated by pairwise alignments<br />

obtained for examples from the FlexS data set [2] and by an additi<strong>on</strong>al<br />

example for multiple flexible alignment.<br />

References<br />

1. Dorigo M, Stützle T: Ant Col<strong>on</strong>y Optimizati<strong>on</strong> MIT Press, Cambridge, MA,<br />

USA 2004.<br />

2. Lemmen C, Lengauer T, Klebe G: FLEXS: A Method for Fast Flexible<br />

Ligand Superpositi<strong>on</strong>. J Med Chem 1998, 41:4502-4520.<br />

P18<br />

Bioisosteric similarity of drugs in virtual screening<br />

Michael C Hutter * , Markus Krier<br />

Center of Bioinformatics, Saarland University, Building C7.1, D-66041<br />

Saarbrücken, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P18<br />

Choosing compounds for screening is difficult problem due the vast<br />

chemical space. The questi<strong>on</strong> is thus which compounds are most likely to<br />

be hits. The comparis<strong>on</strong> of drugs that all target the same enzyme,<br />

however, shows reoccurring chemical modificati<strong>on</strong> throughout all<br />

therapeutic categories. These so-called bioisosteric replacements [1]<br />

comprise simple exchanges of terminal atoms as well as more complex<br />

structural modificati<strong>on</strong>s, such as ring closures or rearrangements of larger<br />

fragments. To detect and evaluate all kinds of replacements we have<br />

designed an approach that adopts the algorithmic c<strong>on</strong>cept used to<br />

assess the homology of amino acid sequences to chemical molecules.<br />

The mutual exchange frequencies between distinct atom types are<br />

expressed in a substituti<strong>on</strong> matrix [2]. Likewise, pair-wise alignment<br />

betweenthemoleculesisc<strong>on</strong>structed using dynamic programming [3],<br />

with the compounds being represented as unique SMILES. To obtain the<br />

actual exchange frequencies, we refined an initial matrix based <strong>on</strong><br />

observed chemical replacements [4] by collecting the generated<br />

alignments of 1353 drugs from 33 therapeutic categories in an<br />

automated procedure. To compute the mutual bioisosteric similarity<br />

between two molecules a specific functi<strong>on</strong> has been derived that makes<br />

use of the alignment [5].<br />

To asses the suitability of this bioisosteric similarity for virtual screening,<br />

we compared the recovery of known drugs against the background of<br />

other substances. The majority of drugs possess a higher similarity<br />

within the same class than compared to substances from the ZINC [6]<br />

ortheProusScienceDrugsoftheFuturedatabase[7].Likewise,drugs<br />

for the same target are usually recovered at higher values of similarities<br />

than compared to other methods based <strong>on</strong> most comm<strong>on</strong> substructure<br />

and fingerprint approaches. Moreover, n<strong>on</strong>drugs without any<br />

pharmaceutical functi<strong>on</strong> exhibit c<strong>on</strong>siderably lower similarities than<br />

actual drugs.<br />

Furthermore, this bioisosteric similarity can be used to express the<br />

chemical diversity within a given compound class. We found that e.g.<br />

inhibitors of the HIV Reverse Transcriptase are more divers than<br />

Angiotensin-II Antag<strong>on</strong>ists and Tetracycline Antibiotics.<br />

References<br />

1. Patani GA, LaVoie EJ: Chem Rev 1996, 96:3147.<br />

2. Henikoff S, Henikoff JG: Proc Natl Acad Sci USA 1992, 89:10195.<br />

3. Smith TF, Waterman MS: J Mol Biol 1981, 147:195.<br />

4. Sheridan RP: J Chem Inf Comput Sci 2002, 42:103.<br />

5. Krier M, Hutter MC: J Chem Inf Model 2009, 49:1280.<br />

6. [http://zinc.docking.org].<br />

7. [http://pubchem.ncbi.nlm.nih.gov/].<br />

Page 15 of 25<br />

P19<br />

Updating existing QSAR models: selecti<strong>on</strong> and weighting of new data<br />

Tomas Öberg * , T Liu<br />

University of Kalmar, Sweden<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P19<br />

Computati<strong>on</strong>al chemistry and quantitative structure-activity relati<strong>on</strong>ships<br />

(QSAR) are foreseen to be extensively used in the implementati<strong>on</strong> of the<br />

new REACH regulati<strong>on</strong> for chemicals in Europe. However, for some<br />

compound groups the data are too few in number to permit both<br />

calibrati<strong>on</strong> and testing of a new model. Usage and previously developed<br />

or updated models are then viable alternatives.<br />

Perfluorocarboxylic acids (PFCAs) and fluoroteleomer alcohols (FTOHs) are<br />

two groups of envir<strong>on</strong>mentally relevant compounds, with unique physical<br />

and chemical properties. The subcooled liquid vapour pressure (pL) is <strong>on</strong>e<br />

such property, where experimental determinati<strong>on</strong>s are limited and far from<br />

c<strong>on</strong>sistent [1]. Updating is, however, challenging when the new compounds<br />

are far outside of the original calibrati<strong>on</strong> domain space. But by carefully<br />

selecting and weighting <strong>on</strong>ly three new compounds, we have been able to<br />

update a previously developed general QSAR model [2], to cover the new<br />

domain while maintaining predictive performance for the earlier calibrati<strong>on</strong><br />

and test data. The optimal weighting scheme was determined from the<br />

sample leverages and residuals in the calibrati<strong>on</strong> phase [3].<br />

The performance of this re-calibrated model greatly surpassed previous<br />

modelling attempts [4], when applied to an external test set of two<br />

PFCAsandfourFTOHswithpLintherange0.2-200Pa;withQ2Ext=<br />

0.994 and RMSEP = 0.190 units of log Pa. The domain coverage also<br />

increased from 1% to 51%, for 426 perfluoroalkylated compounds<br />

selected from the REACH registrati<strong>on</strong> list, the PhysProp database, and the<br />

OECD 2006 survey [5]. Selecti<strong>on</strong> and weighting of new calibrati<strong>on</strong> data<br />

can thus facilitate the extensi<strong>on</strong> and use of existing QSAR models.<br />

This investigati<strong>on</strong> was supported by the EU FP7 project CADASTER (grant<br />

agreement no. 212668).<br />

References<br />

1. Goss K-U, Br<strong>on</strong>ner G, Harner T, Hertel M, Schmidt TC: Envir<strong>on</strong> Sci Technol<br />

2006, 40:3572.<br />

2. Öberg T, Liu T: QSAR Comb Sci 2008, 27:273.<br />

3. Stork CL, Kowalski BR: Chemometr Intell Lab Syst 1999, 48:151.<br />

4. Arp HP, Niederer C, Goss K-U: Envir<strong>on</strong> Sci Technol 2006, 40:7298.<br />

5. Lists of PFOS, PFAS, PFOA, PFCA, related compounds and chemicals that<br />

may degrade to PFCA. ENV/JM/MONO(2006)16, OECD. 2007.<br />

P20<br />

DrugScorePPI for scoring protein-protein interacti<strong>on</strong>s: improving<br />

a knowledge-based scoring functi<strong>on</strong> by atomtype-based QSAR<br />

Dennis M Krüger * , Holger Gohlke<br />

Heinrich-Heine-University, Computati<strong>on</strong>al Pharmaceutical Chemistry, 40225<br />

Düsseldorf, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P20<br />

Protein-protein complexes are known to play key roles in many cellular<br />

processes. Therefore, knowledge of the three-dimensi<strong>on</strong>al structure of<br />

protein-complexes is of fundamental importance. A key goal in proteinprotein<br />

docking is to identify near-native protein-complex structures. In<br />

this work, we address this problem by deriving a knowledge-based<br />

scoring functi<strong>on</strong> from protein-protein complex structures and further finetuning<br />

of the statistical potentials against experimentally determined<br />

alanine-scanning results.<br />

Based <strong>on</strong> the formalism of the DrugScore approach1, distance-dependent<br />

pair potentials are derived from 850 crystallographically determined


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

protein-protein complexes 2. These DrugScorePPI potentials display<br />

quantitative differences compared to those of DrugScore, which was<br />

derived from protein-ligand complexes. When used as an objective<br />

functi<strong>on</strong> to score a n<strong>on</strong>-redundant dataset of 54 targets with “unbound<br />

perturbati<strong>on</strong>” soluti<strong>on</strong>s, DrugscorePPI was able to rank a near-native<br />

soluti<strong>on</strong> in the top ten in 89% and in the top five in 65% of the cases.<br />

Applied to a dataset of “unbound docking” soluti<strong>on</strong>s, DrugscorePPI was<br />

able to rank a near-native soluti<strong>on</strong> in the top ten in 100% and in the top<br />

five in 67% of the cases. Furthermore, Drugscore-PPI was used for<br />

computati<strong>on</strong>al alanine-scanning of a dataset of 18 targets with a total of<br />

309 mutati<strong>on</strong>s to predict changes in the binding free energy up<strong>on</strong><br />

mutati<strong>on</strong>s in the interface. Computed and experimental values showed a<br />

correlati<strong>on</strong> of R2 = 0.34. To improve the predictive power, a QSAR-model<br />

was built based <strong>on</strong> 24 residue-specific atom types that improves the<br />

correlati<strong>on</strong> coefficient to a value of 0.53, with a root mean square<br />

deviati<strong>on</strong> of 0.89 kcal/mol. A Leave-One-Out analysis yields a correlati<strong>on</strong><br />

coefficient of 0.41. This clearly dem<strong>on</strong>strates the robustness of the model.<br />

The applicati<strong>on</strong> to an independent validati<strong>on</strong> dataset of alaninemutati<strong>on</strong>s<br />

was used to show the predictive power of the method and<br />

yields a correlati<strong>on</strong> coefficient of 0.51. Based <strong>on</strong> these findings,<br />

Drugscore-PPI was used to successful identify hotspots in multiple<br />

protein-interfaces. These results suggest that DrugscorePPI is an adequate<br />

method to score protein-protein interacti<strong>on</strong>s.<br />

References<br />

1. Gohlke H, Hendlich M, Klebe G: Knowledge-based scoring functi<strong>on</strong> to<br />

predict protein-ligand interacti<strong>on</strong>s. J Mol Biol 2000, 295(2):337-356.<br />

2. Huang SY, Zou X: An iterative knowledge-based scoring functi<strong>on</strong> for<br />

protein-protein recogniti<strong>on</strong>. Proteins 2008, 72(2):557-579.<br />

P21<br />

Adaptati<strong>on</strong> of formal c<strong>on</strong>cept analysis for the systematic explorati<strong>on</strong><br />

of structure-activity and structure-selectivity relati<strong>on</strong>ships<br />

Eugen Lounkine * , Jürgen Bajorath<br />

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical<br />

Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität<br />

B<strong>on</strong>n, Dahlmannstr. 2, D-53113 B<strong>on</strong>n, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P21<br />

Formal C<strong>on</strong>cept Analysis (FCA) is a data mining and visualizati<strong>on</strong><br />

approach originating from informati<strong>on</strong> science. It operates <strong>on</strong> binary<br />

relati<strong>on</strong>ships between objects and attributes, which are reported in a<br />

formal c<strong>on</strong>text. Formal c<strong>on</strong>cepts are sets of objects that share a defined<br />

subset of attributes. FCA organizes these c<strong>on</strong>cepts in lattices that reflect<br />

their relati<strong>on</strong>ship in terms of shared objects and/or attributes and allows<br />

the identificati<strong>on</strong> of objects with defined sets of attributes [1].<br />

Two adaptati<strong>on</strong>s of FCA that allow the systematic analysis of structureactivity<br />

and-selectivity relati<strong>on</strong>ships are presented. Fragment Formal<br />

C<strong>on</strong>cept Analysis (FragFCA) assesses the distributi<strong>on</strong> of molecular fragment<br />

combinati<strong>on</strong>s am<strong>on</strong>g ligands with closely related biological targets. This<br />

allows the identificati<strong>on</strong> of fragment signatures that exclusively occur in<br />

compounds with a defined activity profile. FragFCA also identifies fragment<br />

combinati<strong>on</strong>s that are characteristic of highly potent compounds against<br />

defined targets. Fragment signatures usually represent combinati<strong>on</strong>s of<br />

two or three fragments and can be used to differentiate active compounds<br />

of closely related targets for different target families [2,3].<br />

Molecular Formal C<strong>on</strong>cept Analysis (MolFCA) is introduced for the<br />

systematic comparis<strong>on</strong> of the selectivity of a compound against multiple<br />

targets and the extracti<strong>on</strong> of compounds with complex selectivity profiles<br />

from biologically annotated databases. Selectivity is assessed based <strong>on</strong><br />

pair-wise compound potency ratios. Thisallowsthedefiniti<strong>on</strong>multiple<br />

selectivity queries involving the comparis<strong>on</strong> of an arbitrary number of<br />

targets and compound potency values or ratios. The individual queries are<br />

applied in a sequential manner to retrieve compounds with desired<br />

selectivity against targets of interest. MolFCA operates <strong>on</strong> activity space<br />

representati<strong>on</strong>s of compounds and thus allows the identificati<strong>on</strong> of<br />

structurally diverse compounds matching a given selectivity profile [4].<br />

References<br />

1. Priss U: Ann Rev Inf Sci Technol 2006, 40:521.<br />

2. Lounkine E, Auer J, Bajorath J: J Med Chem 2008, 51:5342.<br />

3. Krüger F, Lounkine E, Bajorath J: Chem Med Chem 2009, 4:1174.<br />

4. Lounkine E, Stumpfe D, Bajorath J: J Chem Inf Model 2009, 49:1359.<br />

Page 16 of 25<br />

P22<br />

Systematic extracti<strong>on</strong> of structure-activity relati<strong>on</strong>ship informati<strong>on</strong> from<br />

biological screening data<br />

Mathias Wawer * , L Peltas<strong>on</strong>, Jürgen Bajorath<br />

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical<br />

Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität<br />

B<strong>on</strong>n, Dahlmannstr. 2, D-53113 B<strong>on</strong>n, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P22<br />

The analysis of high-throughput screening data poses significant<br />

challenges to medicinal and computati<strong>on</strong>al chemists. The number of<br />

compounds assayed in a single screen is prohibitively large for manual<br />

data analysis and no generally applicable computati<strong>on</strong>al methods have<br />

thus far been developed to c<strong>on</strong>sistently solve the problem of how to<br />

best select hits for further chemical explorati<strong>on</strong>.<br />

Focusing <strong>on</strong> the questi<strong>on</strong> of how structure-activity relati<strong>on</strong>ship (SAR)<br />

informati<strong>on</strong> can be used to support this decisi<strong>on</strong>, we present methods<br />

for the descriptive analysis of screening data. Network representati<strong>on</strong>s<br />

visualize the distributi<strong>on</strong> of 2D similarity relati<strong>on</strong>ships and potency in a<br />

data set and give an overview of global and local features of an activity<br />

landscape. Although dominated by many weakly active hits, different<br />

local SAR envir<strong>on</strong>ments can be identified am<strong>on</strong>g screening hits, thus<br />

helping to focus <strong>on</strong> regi<strong>on</strong>s in chemical space that might show favorable<br />

SAR behavior in further explorati<strong>on</strong> [1].<br />

A more detailed analysis of the data is achieved by systematically mining<br />

the network for SAR pathways, i.e. sequences of pairwise similar<br />

compounds that c<strong>on</strong>nect two molecules via a gradually increasing<br />

potency gradient. The SAR pathways are calculated exhaustively for all<br />

possible compound pairs in a data set to identify those having most<br />

significant SAR informati<strong>on</strong> c<strong>on</strong>tent. Often, high-scoring pathways lead to<br />

activity cliffs, i.e. pairs of similar compounds with significant differences in<br />

potency, and scaffold transiti<strong>on</strong>s can be observed al<strong>on</strong>g the pathways.<br />

Furthermore, a tree structure organizes alternative pathways that begin at<br />

the same compound but lead to different molecules and chemotypes.<br />

Similarly, SAR trees can be generated from all pathways that lead<br />

to an activity cliff in order to characterize the surrounding SAR<br />

microenvir<strong>on</strong>ment [2].<br />

References<br />

1. Wawer M, Peltas<strong>on</strong> L, Bajorath J: J Med Chem 2009, 52:1075-1080.<br />

2. Wawer M, Bajorath J: Chem Med Chem in press.<br />

P<strong>23</strong><br />

High throughput in-silico screening against flexible protein receptors<br />

Horacio Pérez-Sánchez 1* , Bernhard Fischer 1 , Daria Kokh 2 , Holger Merlitz 1 ,<br />

Wolfgang Wenzel 1<br />

1 Institut für Nanotechnologie, Forschungszentrum Karlsruhe, Postfach 3640,<br />

76021 Karlsruhe, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Wuppertal, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P<strong>23</strong><br />

Based <strong>on</strong> the stochastic tunneling method (STUN) [1] we have developed<br />

FlexScreen [2], a novel strategy for high-throughput in-silico screening of<br />

large ligand databases. Each ligand of the database is docked against the<br />

receptor using an all-atom representati<strong>on</strong> of both ligand and receptor. The<br />

ligands with the best evaluated affinity are selected as lead candidates for<br />

drug development. Using the thymidine kinase inhibitors as a prototypical<br />

example we documented [3] the shortcomings of rigid receptor screens in a<br />

realistic system. We dem<strong>on</strong>strate a gain in both overall binding energy and<br />

overall rank of the known substrates when two screens with a rigid and<br />

flexible (up to 15 sidechain dihedral angles) receptor are compared. We<br />

note that the STUN suffers <strong>on</strong>ly a comparatively small loss of efficiency<br />

when an increasing number of receptor degrees of freedom is c<strong>on</strong>sidered.<br />

FlexScreen thus offers a viable compromise [4] between docking flexibility<br />

and computati<strong>on</strong>al efficiency to perform fully automated database screens<br />

<strong>on</strong> hundreds of thousands of ligands. We also investigate enrichment rates<br />

[5] of rigid, soft and flexible receptor models [6] for 12 diverse receptors<br />

using libraries c<strong>on</strong>taining up to 13000 molecules. A flexible sidechain model<br />

with flexible dihedral angles for up to 12 aminoacids increased both binding<br />

propensity and enrichment rates: EF_1 values increased by 35% <strong>on</strong> average<br />

with respect to rigid-docking (3-8 flexible sidechains). This methodology will<br />

be so<strong>on</strong> available for the Cell processor and Pipeline Pilot.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

References<br />

1. Wenzel W, Hamacher K: Stochastic tunneling approach for global<br />

minimizati<strong>on</strong> of complex potential energy landscapes. Physical Review<br />

Letters 1999, 82(15):3003-3007.<br />

2. Merlitz H, Burghardt B, Wenzel W: Applicati<strong>on</strong> of the stochastic tunneling<br />

method to high throughput database screening. Chemical Physics Letters<br />

2003, 370(1-2):68-73.<br />

3. Merlitz H, Burghardt B, Wenzel W: Impact of receptor c<strong>on</strong>formati<strong>on</strong> <strong>on</strong> in<br />

silico screening performance. Chemical Physics Letters 2004,<br />

390(4-6):500-505.<br />

4. Fischer B, et al: Accuracy of binding mode predicti<strong>on</strong> with a cascadic<br />

stochastic tunneling method. Proteins-Structure Functi<strong>on</strong> and Bioinformatics<br />

2007, 68(1):195-204.<br />

5. Kokh DB, Wenzel WG: Flexible side chain models improve enrichment<br />

rates in in silico screening. Journal of Medicinal Chemistry 2008,<br />

51(19):5919-5931.<br />

6. Fischer B, Fukuzawa K, Wenzel W: Receptor-specific scoring functi<strong>on</strong>s<br />

derived from quantum chemical models improve affinity estimates for<br />

in-silico drug discovery. Proteins-Structure Functi<strong>on</strong> and Bioinformatics 2008,<br />

70(4):1264-1273.<br />

P24<br />

Virtual screening by high-throughput docking using hydrogen<br />

b<strong>on</strong>ding c<strong>on</strong>straints for targeting a protein-protein interface in<br />

M. tuberculosis<br />

Oliver Koch 1,2* , T Jäger 2 , L Flohé 2 , PM Selzer 1<br />

1 Intervet Innovati<strong>on</strong> GmbH, Zur Propstei, 55270 Schwabenheim, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />

2 MOLISA GmbH, Brenneckestraße 20, 39118 Magdeburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P24<br />

The resurgence of Tuberculosis, caused primarily by Mycobacterium<br />

tuberculosis, and the appearance of drug resistant tuberculosis strains<br />

encourage the need for new drugs with alternative mode of acti<strong>on</strong>s [1]. An<br />

interesting drug-target is the Thioredoxin-Reductase (TrxR)/Thioredoxin (Trx)<br />

System of M. tuberculosis, which is resp<strong>on</strong>sible for providing reducing<br />

equivalents for many cellular processes including bacterial antioxidant<br />

defence [2].<br />

We performed a virtual screening approach for identifying novel drugs for<br />

the treatment of Tuberculosis that used the hydrogen b<strong>on</strong>ding c<strong>on</strong>straint<br />

functi<strong>on</strong>ality of GOLD for pre-processing instead of using pharmacophore<br />

filtering. Two important hydrogen b<strong>on</strong>ding interacti<strong>on</strong>s could be identified<br />

at the protein-protein interface of the TrxR-Trx complex. A high-throughput<br />

docking was applied to filter the Intervet in-house compound library (~6.5<br />

milli<strong>on</strong> compounds) for possible hits that interact with these two hydrogen<br />

b<strong>on</strong>ding acceptors. This reduced the number of interesting compounds to<br />

151768 that were redocked without any c<strong>on</strong>straints and more accurate<br />

docking settings. Finally, the ranking of the reduced compound set was<br />

obtained using a normalisati<strong>on</strong> based <strong>on</strong> molecular weight [3] and a<br />

c<strong>on</strong>sensus scoring in a restrictive way using Chemscore and ASPscore.<br />

So far, six out of the first 25 of the ranked compound list showed an<br />

activity with an IC 50 value upto M range. Especially three compounds<br />

with different scaffolds and a low molecular weight are promising<br />

candidates for further developments.<br />

References<br />

1. World Health Organizati<strong>on</strong> Tuberculosis Programme. [http://who.int/tb/].<br />

2. Jaeger T, Flohé L: The thiol-based redox networks of pathogens:<br />

unexploited targets in the search for new drugs. Biofactors 2006,<br />

27:109-120.<br />

3. Carta A, Knox JS, Lloyd DG: Unbiasing Scoring Functi<strong>on</strong>s: A New<br />

Normalizati<strong>on</strong> and Rescoring Strategy. JCIM 2007, 47:1564-1571.<br />

P25<br />

Ensemble docking revisited<br />

Oliver Korb * , S Bowden, T Olss<strong>on</strong>, D Frenkel, J Liebeschuetz, J Cole<br />

Cambridge Crystallographic Data Centre, 12 Uni<strong>on</strong> Road, Cambridge,<br />

CB2 1EZ, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P25<br />

In recent years, the importance of c<strong>on</strong>sidering induced fit effects in<br />

molecular docking calculati<strong>on</strong>s has been widely recognised in the<br />

Page 17 of 25<br />

molecular modelling community. While small-scale protein side-chain<br />

movements are now accounted for in many state-of-the-art docking<br />

strategies, the explicit modelling of large-scale protein moti<strong>on</strong>s such as<br />

loop movements in kinase domains is still a challenging task. For this<br />

reas<strong>on</strong> ensemble-based methods have been introduced taking into<br />

account several discrete protein c<strong>on</strong>formati<strong>on</strong>s in the c<strong>on</strong>formati<strong>on</strong>al<br />

sampling step. Our protein-ligand docking approach GOLD [1,2] has been<br />

extended to search such c<strong>on</strong>formati<strong>on</strong>al ensembles time-efficiently. The<br />

performance of the approach has been assessed <strong>on</strong> several protein<br />

targets using different scoring functi<strong>on</strong>s. A detailed analysis of pose<br />

predicti<strong>on</strong> and virtual screening results in dependence of the number of<br />

protein structures c<strong>on</strong>sidered in the c<strong>on</strong>formati<strong>on</strong>al ensemble will be<br />

presented and limitati<strong>on</strong>s of the approach will be highlighted.<br />

References<br />

1. J<strong>on</strong>es G, Willett P, Glen RC: Molecular Recogniti<strong>on</strong> of Receptor Sites Using<br />

a Genetic Algorithm with a Descripti<strong>on</strong> of Desolvati<strong>on</strong>. J Mol Biol 1995,<br />

245:43-53.<br />

2. J<strong>on</strong>es G, Willett P, Glen RC, Leach AR, Taylor R: Development and<br />

Validati<strong>on</strong> of a Genetic Algorithm for Flexible Docking. J Mol Biol 1997,<br />

267:727-748.<br />

P26<br />

Picking out polymorphs: H-b<strong>on</strong>d predicti<strong>on</strong> and crystal structure<br />

stability<br />

Peter TA Galek * , Frank H Allen, László Fábián<br />

Cambridge Crystallographic Data Centre, 12 Uni<strong>on</strong> Road, Cambridge,<br />

CB2 1EZ, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P26<br />

A methodology has been developed to predict the propensity for<br />

hydrogen b<strong>on</strong>ds to form in crystal structures, treating each potential<br />

H-b<strong>on</strong>d as a binary resp<strong>on</strong>se variable, and modelling its likelihood using<br />

a set of relevant chemical descriptors [1]. Modelling is tailored to a target<br />

using chemically similar known structures,frome.g.theCambridge<br />

Structural Database [2], making it accessible to the complete spectrum of<br />

organic structures, including solvates, hydrates and cocrystals. Recent<br />

work has developed the approach to predicting inter- and intramolecular<br />

H-b<strong>on</strong>ds when either type can occur.<br />

By way of a comparis<strong>on</strong> between possible and observed H-b<strong>on</strong>ds, the<br />

method has been applied to assess structural stability, which shows much<br />

promiseinthedomainofpolymorphscreeninginthepharmaceutical<br />

industry. We will introduce the methodology and illustrate its applicati<strong>on</strong><br />

using a selecti<strong>on</strong> of pharmaceutical compounds, <strong>on</strong>e of which will be<br />

Abbott’s well-publicised anti-HIV medicati<strong>on</strong> rit<strong>on</strong>avir (Norvir). Owing to a<br />

hidden, more stable form II with much lower bioavailability, rit<strong>on</strong>avir was<br />

temporarily withdrawn from the market with significant financial impact<br />

[3]. Our method quickly suggests a real threat of polymorphism in this<br />

compound, and str<strong>on</strong>gly supports the relative stability of form II over<br />

form I. For all examples, the high predictivity of the method is emphasised.<br />

References<br />

1. Galek PTA, Fábián L, Allen FH, Motherwell WDS, Feeder N: Acta Cryst 2007,<br />

B63:768-782.<br />

2. Allen FH: Acta Cryst 2002, B58:380-388.<br />

3. Bauer J, Spant<strong>on</strong> S, Henry R, Quick J, Dziki W, Porter W, Morris J: Pharm Res<br />

2001, 18:859-866.<br />

P27<br />

Kernel learning for ligand-based virtual screening: discovery of a new<br />

PPARg ag<strong>on</strong>ist<br />

Matthias Rupp 1* , T Schroeter 2 , R Steri 1 , Ewgenij Proschak 1 , K Hansen 2 , H Zettl 1 ,<br />

O Rau 1 , M Schubert-Zsilavecz 1 , K-R Müller 2 , Gisbert Schneider 1<br />

1<br />

University of Frankfurt, Siesmayerstr. 70, 603<strong>23</strong> Frankfurt am Main, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />

2<br />

Berlin, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P27<br />

We dem<strong>on</strong>strate the theoretical and practical applicati<strong>on</strong> of modern<br />

kernel-based machine learning methods to ligand-based virtual<br />

screening by successful prospective screening for novel ag<strong>on</strong>ists of the<br />

peroxisome proliferator-activated receptor g (PPARg) [1].PPARg is a<br />

nuclear receptor involved in lipid and glucose metabolism, and related


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

to type-2 diabetes and dyslipidemia. Applied methods included a graph<br />

kernel designed for molecular similarity analysis [2], kernel principle<br />

comp<strong>on</strong>ent analysis [3], multiple kernel learning [4], and, Gaussian<br />

process regressi<strong>on</strong> [5].<br />

In the machine learning approach to ligand-based virtual screening, <strong>on</strong>e<br />

uses the similarity principle [6] to identify potentially active compounds<br />

based <strong>on</strong> their similarity to known reference ligands. Kernel-based<br />

machine learning [7] uses the “kernel trick”, a systematic approach to the<br />

derivati<strong>on</strong> of n<strong>on</strong>-linear versi<strong>on</strong>s of linear algorithms like separating<br />

hyperplanes and regressi<strong>on</strong>. Prerequisites for kernel learning are similarity<br />

measures with the mathematical property of positive semidefiniteness<br />

(kernels).<br />

The iterative similarity optimal assignment graph kernel (ISOAK) [2] is<br />

defined directly <strong>on</strong> the annotated structure graph, and was designed<br />

specifically for the comparis<strong>on</strong> of small molecules. In our virtual screening<br />

study, its use improved results, e.g., in principle comp<strong>on</strong>ent analysisbased<br />

visualizati<strong>on</strong> and Gaussian process regressi<strong>on</strong>. Following a<br />

thorough retrospective validati<strong>on</strong> using a data set of 176 published<br />

PPARg ag<strong>on</strong>ists [8], we screened a vendor library for novel ag<strong>on</strong>ists.<br />

Subsequent testing of 15 compounds in a cell-based transactivati<strong>on</strong> assay<br />

[9] yielded four active compounds.<br />

The most interesting hit, a natural product derivative with cyclobutane<br />

scaffold, is a full selective PPARg ag<strong>on</strong>ist (EC50 = 10 ± 0.2 μM, inactive <strong>on</strong><br />

PPARa and PPARb/δ at 10 μM). We dem<strong>on</strong>strate how the interplay of<br />

several modern kernel-based machine learning approaches can<br />

successfully improve ligand-based virtual screening results.<br />

References<br />

1. Henke B: Progress in Medicinal Chemistry 2004, 42:1-53.<br />

2. Rupp M, Proschak E, Schneider G: J Chem Inf Model 2007, 47(6):2280-2286.<br />

3. Schölkopf B, Smola A, Müller K-R: Neural Comput 1998, 10(5):1299-1319.<br />

4. S<strong>on</strong>nenburg S, Rätsch G, Schäfer C, Schölkopf B: J Mach Learn Res 2006,<br />

7(7):1531-1565.<br />

5. Rasmussen C, Williams C: MIT Press, Cambridge 2006.<br />

6. Johns<strong>on</strong> M, Maggiora M: Wiley, New York 1990.<br />

7. Schölkopf B, Smola A: MIT Press, Cambridge 2002.<br />

8. Rücker C, Scarsi M, Meringer M: Bioorg Med Chem 2006, 14(15):5178-5195.<br />

9. Rau O, Wurglics M, Paulke A, Zitzkowski J, Meindl N, Bock A,<br />

Dingermann T, Abdel-Tawab M, Schubert-Zsilavecz M: Planta Med 2006,<br />

72(10):881-887.<br />

P28<br />

OrChem: an open source chemistry search engine for Oracle<br />

Mark L Rijnbeek * , Christoph Steinbeck<br />

EMBL EBI, Wellcome Trust Genome Campus, Cambridge, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P28<br />

Registrati<strong>on</strong>, indexing and searching of chemical structures in relati<strong>on</strong>al<br />

databases is <strong>on</strong>e of the core areas of cheminformatics. However, little<br />

detail has been published <strong>on</strong> the inner workings of search engines and<br />

their development is mostly closed-source. We decided to implement an<br />

open source chemical library for Oracle, the de-facto database standard<br />

in the commercial world.<br />

We present OrChem, an extensi<strong>on</strong> for the Oracle 11G database that adds<br />

registrati<strong>on</strong> and indexing of chemical structures to support fast<br />

substructure and similarity searching. The cheminformatics functi<strong>on</strong>ality is<br />

provided by the Chemistry Development Kit [1,2]. OrChem enables<br />

similarity searching with resp<strong>on</strong>se times in the order of sec<strong>on</strong>ds for<br />

databases with milli<strong>on</strong>s of compounds, depending <strong>on</strong> provided similarity<br />

cut-off. For substructure searching, OrChem can make use of multiple<br />

processor cores <strong>on</strong> today’s powerful database servers to provide fast<br />

resp<strong>on</strong>se times in equally large data sets.<br />

OrChem is a mix of PL/SQL and Java that executes inside the database.<br />

The user interacts with OrChem with calls to PL/SQL and Java Stored<br />

Procedures. Starting with Oracle 11g there is a just-in-time (JIT) compiler<br />

for the Oracle JVM envir<strong>on</strong>ment which makes Java run much faster inside<br />

the database than previously.<br />

OrChem is built <strong>on</strong> top of the Chemistry Development Kit (CDK) and<br />

depends <strong>on</strong> this Java library in numerous ways. For example, compounds<br />

are represented internally as CDK molecule objects, the CDK’s I/O<br />

package is used to retrieve compound data, and its subgraph<br />

isomorphism algorithms are used for substructure validati<strong>on</strong>. OrChem<br />

adds its own Java layer <strong>on</strong> top of the CDK to implement fast database<br />

storage and retrieval. With the CDK loaded into Oracle, a large<br />

cheminformatics library becomes readily available to PL/SQL. With little<br />

effort developers can build database functi<strong>on</strong>s around the CDK and so<br />

quickly implement chemistry extensi<strong>on</strong>s for Oracle.<br />

References<br />

1. Steinbeck , et al: J Chem Inf Comput Sci 2003, 43(2):493-500.<br />

2. Steinbeck , et al: Curr Pharm Des 2006, 12(17):2111-2120.<br />

P29<br />

Computati<strong>on</strong>al fragment-based drug design to explore the<br />

hydrophobic subpocket of the mitotic kinesin Eg5 allosteric<br />

binding site<br />

Ksenia Oguievetskaia *† , Laetitia Martin-Chanas *† , Artem Vorotyntsev,<br />

Olivia Doppelt-Azeroual, Xavier Brotel, Stewart A Adcock,<br />

Alexandre G de Brevern, Francois Delfaud, Fabrice Moriaud<br />

MEDIT SA, 2 rue du Belvédère, 91120 Palaiseau, France<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P29<br />

Page 18 of 25<br />

Eg5, a mitotic kinesin exclusively involved in the formati<strong>on</strong> and functi<strong>on</strong><br />

of the mitotic spindle has attracted interest as an anticancer drug target.<br />

Eg5 is co-crystallized with several inhibitors bound to its allosteric binding<br />

pocket. Each of these occupies a pocket formed by loop5/helix a2.<br />

Recently designed inhibitors additi<strong>on</strong>ally occupy a hydrophobic pocket of<br />

this site. The goal of the present study was to identify new fragments<br />

which fill this hydrophobic pocket and might be interesting chemical<br />

moieties to design new inhibitors.<br />

We’ve used the fragment based protocol of MED-SuMo which exploits the<br />

whole PDB <strong>on</strong> the basis that similar protein surfaces might bind the same<br />

fragment. The MED-SuMo software is able to compare and superimpose<br />

similar protein-ligand interacti<strong>on</strong> surfaces at PDB scale. Protein-fragment<br />

complexes are encoded as MED-Porti<strong>on</strong>s by cross-mining 3D PDB ligand<br />

structures with fragment-like PubChem molecules. Inhibitors are designed<br />

by hybridising MED-Porti<strong>on</strong>s with MED-Hybridise.<br />

Without using the known Eg5 PDB ligands that occupy the hydrophobic<br />

pocket, we’ve predicted new potential ligands that would fill<br />

simultaneously both pockets. Some of the latter are identified in libraries<br />

of synthetically accessible molecules by the MED-Search software.<br />

P30<br />

Learning antibacterial activity against S. Aureus <strong>on</strong> the Chimiothèque<br />

Nati<strong>on</strong>ale dataset<br />

G Marcou 1* , N Lachiche 1 , L Brillet 1 , J-M Paris 2 , A Varnek 1<br />

1 Université de Strasbourg, Faculté de Chimie de Strasbourg, Laboratoire<br />

d’infochimie, 4 rue Blaise Pascal, 67000 Strasbourg, France; 2 Paris, France<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P30<br />

“Chimiothèque Nati<strong>on</strong>ale” (CN) represents a library of synthetic and<br />

natural products from various French public laboratories. Recently,<br />

experimental screening for S. Aureus antibacterial activity of the part of<br />

this library has been performed by Dr J.-M. Paris and collaborators in<br />

Ecole des Mines (Paris, France). Here, experimental results of the<br />

screening have been used to build classificati<strong>on</strong> models using SMF<br />

descriptors and ISIDA modeling tools (Naïve Bayes (NB) and SVM<br />

modules). The dataset c<strong>on</strong>sisted in a large and structurally diverse set of<br />

4563 compounds, 62 of which having a dem<strong>on</strong>strated antibacterial<br />

activity. In NB calculati<strong>on</strong>s, several variati<strong>on</strong>s of the algorithm were used.<br />

In particular, the aprioridistributi<strong>on</strong> of SMF descriptors was modeled<br />

using a binomial law, Bernouilli distributi<strong>on</strong>s or first order logic. SVM<br />

calculati<strong>on</strong>s were performed with an RBF kernel which parameters (C,g)<br />

were optimized to maximize the balanced accuracy.<br />

Both NB and SVM models were validated using external 5-fold cross<br />

validati<strong>on</strong>, repeated three times <strong>on</strong> randomized data set. Additi<strong>on</strong>ally<br />

fifteen Y-randomizati<strong>on</strong>s were performed in order to check for chance<br />

correlati<strong>on</strong>. Predictive performance of the models has been assessed by<br />

combinati<strong>on</strong> of Precisi<strong>on</strong> and Recall parameters: the models with Precisi<strong>on</strong><br />

>0.1andRecall > 0.6 corresp<strong>on</strong>d to over 10-fold enrichment and,<br />

therefore, were c<strong>on</strong>sidered as acceptable.<br />

A c<strong>on</strong>sensus model was applied to estimate the antibacterial activity of<br />

122 new compounds for the CN. All predicti<strong>on</strong>s were d<strong>on</strong>e blindly.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

P31<br />

Multi-parameter scoring functi<strong>on</strong>s for ligand- and structure-based<br />

de novo design<br />

Fabian Bös 1* , KM Smith 2 , Q Liu 3<br />

1 Tripos Internati<strong>on</strong>al, Martin-Kollar-Str. 17, 81829 München, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />

2 St. Louis, USA; 3 Shanghai, PR China<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P31<br />

Successful drug discovery often requires optimizati<strong>on</strong> against a set of<br />

biological and physical properties. We describe de novo design studies that<br />

dem<strong>on</strong>strate successful scaffold hops between known classes of ligands for<br />

p38 MAP kinases using ligand-based and structure-based multi-parameter<br />

scoring functi<strong>on</strong>s coupled to the molecular inventi<strong>on</strong> engine Muse.<br />

The ligand-based scoring functi<strong>on</strong> includes pharmacophoric and steric<br />

tuplets and structural (fingerprint based) similarity. In additi<strong>on</strong> various<br />

selectivity or ADME related properties (e.g. Lipinski properties, polar<br />

surface area, activity at off-targets, etc.). can be taken into account to<br />

guide the evoluti<strong>on</strong> of structures meeting multiple design criteria.<br />

The structure-based scoring functi<strong>on</strong> uses Surflex-Dock to pose and score<br />

invented structures inside the target’s active site. In additi<strong>on</strong>, a number of<br />

simple molecular properties (e.g. clogP, Lipinski properties, etc.) are used as<br />

score comp<strong>on</strong>ents to focus the design <strong>on</strong> medicinally relevant chemistries.<br />

With the ability of Surflex-Dock to start the docking process with a single or<br />

multiple placed fragments, this scoring functi<strong>on</strong> can be applied in fragment<br />

based drug discovery to optimize attachments <strong>on</strong>to a pre-placed substructure.<br />

P32<br />

Ligand-side tautomer enumerati<strong>on</strong> and scoring for structure-based<br />

drug-design<br />

Thomas Seidel * , G Wolber<br />

Inte:Ligand GmbH, Mariahilferstrasse 74B/11, A-1070 Vienna, Austria<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P32<br />

Tautomeric rearrangements of a molecule lead to distinct equilibrated<br />

structural states of the same chemical compound that may differ<br />

significantly in molecular shape, surface, nature of functi<strong>on</strong>al groups,<br />

hydrogen-b<strong>on</strong>ding pattern and other derived molecular properties [1].<br />

Especially for the structure-based pharmacophore modeling of ligandprotein<br />

complexes [2], knowledge of the most favorable tautomeric ligand<br />

states may be crucial for the quality and correctness of the generated<br />

pharmacophore models and derived putative binding modes. We will<br />

present a ligand-side tautomer enumerati<strong>on</strong> and ranking algorithm that<br />

c<strong>on</strong>siders both geometrical c<strong>on</strong>straints imposed by the c<strong>on</strong>formati<strong>on</strong><br />

of the bound ligand as well as intra- and inter-molecular energetic<br />

c<strong>on</strong>tributi<strong>on</strong>s to find the preferred tautomeric states of the ligand in the<br />

macromolecular envir<strong>on</strong>ment of the binding-site. The presented tautomer<br />

ranking algorithm is based <strong>on</strong> scores that are derived solely from MMFF94<br />

[3-9] energies (thus the method works for a wide range of organic<br />

molecules) and has proven to be able to top-rank known preferred<br />

tautomeric states of ligands in a series of investigated protein complexes.<br />

References<br />

1. Pospisil P, et al: J Recept Signal Transduct 2003, <strong>23</strong>:361-371.<br />

2. Wolber G, Langer T: J Chem Inf Model 2005, 45:160-169.<br />

3. Halgren TA: J Comput Chem 1996, 17:490-519.<br />

4. Halgren TA: J Comput Chem 1996, 17:520-552.<br />

5. Halgren TA: J Comput Chem 1996, 17:553-586.<br />

6. Halgren TA, Nachbar RB: J Comput Chem 1996, 17:587-615.<br />

7. Halgren TA: J Comput Chem 1996, 17:616-651.<br />

8. Halgren TA: J Comput Chem 1999, 20:720-729.<br />

9. Halgren TA: J Comput Chem 1999, 20:730-748.<br />

P33<br />

Maximum-score diversity selecti<strong>on</strong> for early drug discovery<br />

Thorsten Meinl * , C Ostermann, O Nimz, A Zaliani, MR Berthold<br />

University of K<strong>on</strong>stanz, 78457 K<strong>on</strong>stanz, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P33<br />

Diversity selecti<strong>on</strong> is a comm<strong>on</strong> task in early drug discovery, be it for<br />

removing redundant molecules prior to HTS or reducing the number of<br />

Page 19 of 25<br />

molecules to synthesize from scratch. One drawback of the current<br />

approach, especially with regard to HTS, is, however, that <strong>on</strong>ly the<br />

structural diversity is taken into account. The fact that a molecule may<br />

be highly active or completely inactive is usually ignored. This is<br />

especially remarkable, as quite a lot of research is involved in<br />

improving virtual screening methods in order to forecast activity. We<br />

therefore present a modified versi<strong>on</strong> of diversity selecti<strong>on</strong> – which we<br />

termed Maximum-Score Diversity Selecti<strong>on</strong> – which additi<strong>on</strong>ally takes<br />

the predicted activities of the molecules into account. Not very<br />

surprisingly both objectives – maximizing activity whilst also<br />

maximizing diversity in the selected subset – c<strong>on</strong>flict. As a result,<br />

we end up with a multiobjective optimizati<strong>on</strong> problem. We will show,<br />

that the task of diversity selecti<strong>on</strong> is quite complicated (it is NPcomplete)<br />

and therefore heuristic approaches are needed for typical<br />

dataset sizes.<br />

A comm<strong>on</strong> and popular approach is using multiobjective genetic<br />

algorithms, such as NSGA-II [1], for optimizing both objectives for the<br />

selected subsets. However, we will show that usual implementati<strong>on</strong>s<br />

suffer from severe limitati<strong>on</strong>s that prevent them from finding quite a lot<br />

of possible interesting soluti<strong>on</strong>s. Therefore, we evaluated two other<br />

heuristic for maximum-score diversity selecti<strong>on</strong>. One is special heuristic<br />

(called BB2) that was motivated by the menti<strong>on</strong>ed proof of NPcompleteness<br />

[2]. The other is a novel heuristics called Score Erosi<strong>on</strong><br />

which was specifically developed for our actual problem. Am<strong>on</strong>g all<br />

three heuristics, Score Erosi<strong>on</strong> is by far the fastest <strong>on</strong>e while finding<br />

soluti<strong>on</strong>s of equal quality compared to the genetic algorithm and BB2.<br />

This will be shown <strong>on</strong> several real world datasets, both public and<br />

internal <strong>on</strong>es.<br />

All experiments were carried out using the data analysis platform KNIME<br />

[3] therefore we will also show some example how maximumscore<br />

diversity selecti<strong>on</strong> can be performed inside workflow-based<br />

envir<strong>on</strong>ments.<br />

References<br />

1. Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective<br />

genetic algorithm: NSGA-II. IEEE Transacti<strong>on</strong>s <strong>on</strong> Evoluti<strong>on</strong>ary Computati<strong>on</strong><br />

2002, 6:182-197.<br />

2. Erkut E: The discrete p-dispersi<strong>on</strong> problem. European Journal of<br />

Operati<strong>on</strong>al Research 1990, 46(1):48-60.<br />

3. [http://www.knime.org/].<br />

P34<br />

Progress <strong>on</strong> an open source computer-assisted structure elucidati<strong>on</strong><br />

suite (SENECA)<br />

Stefan Kuhn * , Christoph Steinbeck<br />

Wellcome Trust Genome Campus, Hinxt<strong>on</strong>, Cambridge CB10 1SD, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P34<br />

We suggested and developed various comp<strong>on</strong>ents for Computer-Assisted<br />

Structure Elucidati<strong>on</strong> (CASE) over the years [1,2]. Our current goal is to<br />

integrate these into an easy to use and efficient platform for end users,<br />

called SENECA. This is based <strong>on</strong> Bioclipse [3], an integrated software suite<br />

for chemo- and bioinformatics providing plugins for file handling and<br />

visualisati<strong>on</strong> of compounds and spectra.<br />

The SENECA feature currently comprises the following comp<strong>on</strong>ents and<br />

algorithms:<br />

A deterministic structure generator suitable for small chemical spaces.<br />

Simulated Annealing and Genetic Algorithm for random structure walks<br />

in large spaces.<br />

Simulati<strong>on</strong> of 13C NMR spectra for ranking results.<br />

SENECA is a Bioclipse feature, available via the update site at http://www.<br />

ebi.ac.uk/steinbeck-srv/speclipse/net.bioclipse.seneca-updatesite/.<br />

References<br />

1. Steinbeck C, Krause S, Kuhn S: NMRShiftDB - C<strong>on</strong>structing a free chemical<br />

informati<strong>on</strong> system with open-source comp<strong>on</strong>ents. Journal of Chemical<br />

Informati<strong>on</strong> and Computer Sciences 2003, 43:1733-1739.<br />

2. Kuhn S, Egert B, Neumann S, Steinbeck C: Building blocks for automated<br />

elucidati<strong>on</strong> of metabolites: Machine learning methods for NMR<br />

predicti<strong>on</strong>. BMC Bioinformatics 2008, 9:400.<br />

3. Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, et al: Bioclipse: An<br />

open source workbench for chemo- and bioinformatics. BMC<br />

Bioinformatics 2007, 8:59.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

P35<br />

Predicti<strong>on</strong> of highly-c<strong>on</strong>nected ‘hub’-proteins in protein interacti<strong>on</strong><br />

networks using QSAR<br />

M Hsing 1* , K Byler 2 , A Cherkasov 1<br />

1<br />

University of British Columbia, <strong>23</strong>29 West Mall, Vancouver, BC Canada;<br />

2<br />

Rochester, USA<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P35<br />

Proteins that are most essential for functi<strong>on</strong>ing and viability of bacterial<br />

cell have been shown to exhibit larger number of interacti<strong>on</strong>s with other<br />

cell comp<strong>on</strong>ents. Thus, by identifying the most c<strong>on</strong>nected proteins (or<br />

hubs) in protein interacti<strong>on</strong> networks (PINs), <strong>on</strong>e may discover<br />

prospective drug targets that can be utilized to combat emergent and<br />

drug-resistant pathogens such as Methicillin-Resistant Staphylococcus<br />

aureus 252 (MRSA). The advantage of using such hub proteins as drug<br />

targets lies in their essentiality, n<strong>on</strong>-replaceable positi<strong>on</strong> in the PIN and<br />

lower rate of mutati<strong>on</strong>, which can help to counter bacterial resistance.<br />

However, finding or predicting such hub proteins remains a challenging<br />

task as the corresp<strong>on</strong>ding experiments are very costly, while traditi<strong>on</strong>al<br />

bioinformatics approaches generally fail in forecasting PIN data due to<br />

the general lack of agreement between the existing datasets [1].<br />

Thus, we have decided to utilize various structural and physicochemical<br />

features of proteins, related to traditi<strong>on</strong>al QSAR properties for predicting<br />

highly c<strong>on</strong>nected proteins. Using our own in-house generated PIN for the<br />

MRSA cell we have trained a boosting tree-based classifier that uses 75<br />

physical and chemical QSAR descriptors computed for all proteins in the<br />

interacti<strong>on</strong> network [2]. The utilized parameters included molecular<br />

weight, net charge, isoelectric point, hydrophobicity, surface area, solvent<br />

accessibilities, electr<strong>on</strong>egativity, sec<strong>on</strong>dary structure compositi<strong>on</strong>, surface<br />

coils and flexibility am<strong>on</strong>g other QSAR descriptors.<br />

The developed QSAR model has yielded a high predicti<strong>on</strong> accuracy of<br />

80% for the validati<strong>on</strong> set and was used to predict additi<strong>on</strong>al hubs in the<br />

rest of the MRSA proteome. The predicted hubs have then been<br />

evaluated experimentally and 55% of them were c<strong>on</strong>firmed as high<br />

interactors what corresp<strong>on</strong>ds to >5 fold dataset enrichment for potential<br />

hub-proteins provided by the developed QSAR model.<br />

Thus, the successful development of accurate hub classifiers dem<strong>on</strong>strated<br />

that highly-c<strong>on</strong>nected proteins tend to share certain structural and<br />

physicochemical features that can be characterized and quantified by<br />

c<strong>on</strong>venti<strong>on</strong>al QSAR descriptors.<br />

It is anticipated that the developed hub classifiers will represent a useful<br />

tool for the predicti<strong>on</strong> of highly-interacting proteins and can find broad<br />

applicati<strong>on</strong> for planning and executing large-scale proteomic experiments<br />

and for identificati<strong>on</strong> of novel and prospective antibacterial drug targets –<br />

even in those organisms that currently lack protein interacti<strong>on</strong> data.<br />

References<br />

1. Huang H, Jedynak BM, Bader JS: PLoS Comput Biol 2007, 3:e214.<br />

2. Byler K, Hsing M, Cherkasov A: QSAR & Combinatorial Science 2009,<br />

28:509-519.<br />

P36<br />

Efficient extracti<strong>on</strong> of can<strong>on</strong>ical spatial relati<strong>on</strong>ships using a recursive<br />

enumerati<strong>on</strong> of k-subsets<br />

Georg Hinselmann * , Nikolas Fechner, A Jahn, Andreas Zell<br />

University of Tübingen, Sand 1, 72076 Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P36<br />

The spatial arrangement of a chemical compound plays an important role<br />

regarding the related properties or activities. A straightforward approach to<br />

encode the geometry is to enumerate pairwise spatial relati<strong>on</strong>ships between<br />

k substructures, like functi<strong>on</strong>al groups or subgraphs. This leads to a<br />

combinatorial explosi<strong>on</strong> with the number of features of interest and<br />

redundant informati<strong>on</strong>. The goal of this work is to compute all possible<br />

k-subsets of spatial points and to extract a single can<strong>on</strong>ical descriptor for<br />

each subset in sub-polynomial computati<strong>on</strong> time. More precisely, the<br />

problem is to reduce the complexity of n k = n·(n - 1)...(n - k) possible<br />

relati<strong>on</strong>ships (patterns or descriptors) for n features and k-point<br />

relati<strong>on</strong>ships.<br />

We propose a two-step algorithm to solve this problem. A modified<br />

algorithm for the computati<strong>on</strong> of the binomial coefficient computes<br />

Page 20 of 25<br />

the k-subsets [1] c<strong>on</strong>taining the possible combinati<strong>on</strong>s of the n<br />

relevant features. If a k-subset is completed in the inner recursi<strong>on</strong>, the<br />

algorithm computes a can<strong>on</strong>ical representati<strong>on</strong> for it. By defining a<br />

natural order by means of the geometrical center of gravity of the k<br />

points, we extract k patterns that describe the distance to the center<br />

of gravity and type of the spatial feature k Î F. Then, the algorithm<br />

returns a unique identifier for the lexicographically sorted array of<br />

patterns. If applicable ( f� � f� �� f<br />

0 1 � ), an additi<strong>on</strong>al identifier<br />

k<br />

d�d 0�1 �<br />

�k�1�k is added which has the form ,<br />

f� � f f f<br />

o � ��� �<br />

1 k�1<br />

�k<br />

where dij denotes the geometrical distance between features i, j. Else<br />

( f� � f f<br />

o � ��<br />

1 � ), this step is omitted. Therefore, this approach<br />

k<br />

also c<strong>on</strong>siders stereochemistry. Finally, <strong>on</strong>e feature is returned for each<br />

k-subset resulting in a set of C(n, k) patterns describing the structure.<br />

Themainresultisthatthenumberoffeaturesisreducedfromnkto C(n, k), which equals the binomial coefficient. This procedure is useful<br />

in combinati<strong>on</strong> with similarity approaches that use spatial relati<strong>on</strong>ships,<br />

like pharmacophore searches, fingerprints, or graph kernels. We<br />

experimentally validated the algorithm <strong>on</strong> numerous QSAR benchmark<br />

sets in combinati<strong>on</strong> with the pharmacophore kernel [2].<br />

References<br />

1. Rolfe T: SIGCSE Bull 2001, 33(3):35-36.<br />

2. Mahé P, Ralaivola L, Stoven V, Vert J-P: J Chem Inf Mod 2006,<br />

46(5):2003-2014.<br />

P37<br />

A comparative study of in silico predicti<strong>on</strong> of pKa<br />

C Matijssen<br />

Cancer Research UK Centre for Cancer Therapeutics, The Institute of Cancer<br />

Research, 15 Cotswold Road, Sutt<strong>on</strong>, Surrey, SM2 5NG, UK<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P37<br />

The i<strong>on</strong>izati<strong>on</strong> c<strong>on</strong>stant (pKa) is the measure of the strength of an acid or<br />

base in a soluti<strong>on</strong>. Most ligands act as a weak acid or base. Accurate<br />

determinati<strong>on</strong> of pK a is important as it can improve the pharmaceutical<br />

properties of a compound [1]. In general, charged compounds have better<br />

solubility, but are less effective in membrane permeati<strong>on</strong>. Therefore,<br />

optimizati<strong>on</strong> of the pharmacokinetic profile of a compound can be<br />

performed by increasing or decreasing its i<strong>on</strong>izati<strong>on</strong> by changing functi<strong>on</strong>al<br />

groups. Another role for optimizati<strong>on</strong> of the charged group could be the<br />

interacti<strong>on</strong> with its target. Changing the charge <strong>on</strong> the functi<strong>on</strong>al group<br />

could improve the interacti<strong>on</strong> with its associated target and result in<br />

improved binding affinity.<br />

Different methods have been developed for the computati<strong>on</strong>al<br />

determinati<strong>on</strong> of pK a values. These can be based <strong>on</strong> different methods such<br />

as QSAR [2] or quantum chemistry approaches [3]. These two methods differ<br />

c<strong>on</strong>siderable in terms of computati<strong>on</strong>al resource and hence, the time<br />

needed for a predicti<strong>on</strong>. Depending <strong>on</strong> the number of ligands that needed<br />

predicti<strong>on</strong> and time available, a choice for <strong>on</strong>e method can be made.<br />

A comparis<strong>on</strong> between several pK a predictors (Pipeline Pilot, Moka,<br />

Epik and Jaguar) was made. All methods perform well when a diverse set<br />

of ligands which covers a range of pK a values is c<strong>on</strong>sidered. However,<br />

when optimizing a series of compounds the influence of small changes<br />

to the molecule and its effect <strong>on</strong> pKa becomes more difficult to predict.<br />

References<br />

1. Waterbeemd van de H, Gifford E: Nature Rev Drug Disc 2003, 2:192.<br />

2. Milletti F, Storchi L, Sforna G, Cruciani G: J Chem Inf Model 2007, 47:2172.<br />

3. Kinsella GK, Rodriguez F, Wats<strong>on</strong> GW, Rozas I: Bioorg Med Chem 2007, 15:2850.<br />

P38<br />

Kernel-based estimati<strong>on</strong> of the applicability domain of QSAR models<br />

Nikolas Fechner * , Georg Hinselmann, A Jahn, A Zell<br />

University of Tübingen, Sand 1, 72076 Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P38<br />

Machine learning techniques have become a valuable tool to assess<br />

molecular properties without the need of in vitro experiments. Most of


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

these methods do not give any informati<strong>on</strong> if a molecule that is<br />

predicted can be sufficiently described by the knowledge c<strong>on</strong>tained in<br />

the model. Thus, the estimati<strong>on</strong> of the reliability of a model-based<br />

predicti<strong>on</strong> is an important questi<strong>on</strong> in machine learning based QSAR<br />

modeling.<br />

One approach to solve this problem is to describe the porti<strong>on</strong> of the<br />

chemical space used during the training phase of a model. Any molecule<br />

includedinthesamesubspaceisthen c<strong>on</strong>sidered as a structure for<br />

which the model is regarded as valid. This c<strong>on</strong>cept of the descripti<strong>on</strong> of<br />

the subspace in which a model is regarded as reliable is known as the<br />

estimati<strong>on</strong> of the applicability domain of this model [1].<br />

Most machine learning approaches for QSAR rely <strong>on</strong> a vectorial<br />

representati<strong>on</strong> of the molecules. The applicability domain is expressed as<br />

a subspace of the vector space with <strong>on</strong>e dimensi<strong>on</strong> for each descriptor<br />

used. This c<strong>on</strong>cept can be not directly applied to kernel-based techniques<br />

like support vector machines. These methods rely <strong>on</strong> an implicit feature<br />

space that is <strong>on</strong>ly defined by the applied kernel similarity and with<br />

unknown dimensi<strong>on</strong>s. The applicability domain of a kernel-based model<br />

therefore has to be defined by means of the kernel. C<strong>on</strong>sequently, this<br />

allows to use structured similarity measures, like the Optimal Assignment<br />

Kernel [2] and its extensi<strong>on</strong> [3], instead of a numerical encoding. Thus, it<br />

is possible to describe the complex chemical structure of many drugs<br />

better than it would be using descriptors.<br />

In this work, several approaches to define the applicability domain of a<br />

QSAR model by means of a kernel are presented and compared to each<br />

other. The approach is to extend the c<strong>on</strong>cept of a kernel density<br />

estimati<strong>on</strong> to incorporate further informati<strong>on</strong> c<strong>on</strong>tained in a trained<br />

model. This can be achieved by using a weighted average kernel<br />

similarity of a predicted molecule to the training data set. The weights<br />

can be obtained either by exploiting the knowledge c<strong>on</strong>tained in the<br />

learned model or by approaches that describe the feature space structure<br />

using the kernel.<br />

References<br />

1. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: Altern Lab Anim 2005,<br />

33:445-459.<br />

2. Fröhlich H, Wegner J, Sieker F, Zell A: QSAR & Comb Sci 2006, 25:317-326.<br />

3. Fechner N, Jahn A, Hinselmann G, Zell A: J Chem Inf Mod 2009, 49:549-560.<br />

P39<br />

Expanding and understanding metabolite space<br />

Julio E Peir<strong>on</strong>cely 1* , Andreas Bender 2 , M Rojas-Chertó 2 , T Reijmers 2 ,<br />

L Coulier 1 , T Hankemeier 2<br />

1 TNO, Quality of Life, Utrechtseweg 48, Zeist, The Netherlands; 2 Netherlands<br />

Metabolomics Centre, University of Leiden, LACDR, Einsteinweg 55, <strong>23</strong>33 CC<br />

Leiden, The Netherlands<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P39<br />

In metabolomics the identity and role of low mass molecules called<br />

metabolites that are produced in cell metabolic processes are<br />

investigated. These make them valuable indicators of the phenotype of a<br />

biological system. The ‘Metabolite Space’ is the total chemical universe of<br />

metabolites present in all compartments and in all states from any<br />

organism. These molecules exhibit comm<strong>on</strong> features that form what can<br />

be called ‘metabolite likeness’. Here,wefocus<strong>on</strong>thehumanmetabolite<br />

space, including both endogenous and exogenous (such as drug)<br />

metabolites. In order to analyze the ‘Metabolite Space’, we collected data<br />

from the Human Metabolome Database (HMDB) [1] which is a<br />

comprehensive database for human metabolites c<strong>on</strong>taining over 7000<br />

compounds that were identified in several human biofluids and tissues.<br />

As there still remain many compounds to be identified that lay outside<br />

the boundaries of this known space, exploring this unknown regi<strong>on</strong> is<br />

crucial to evaluate ‘metabolite likeness’.<br />

In order to expand ‘Metabolite Space’ in our approach we employed the<br />

Retrosynthetic Combinatorial Analysis Procedure (RECAP) [2] to generate<br />

new molecules that possess features similar to those present in<br />

metabolites, however in other (but still likely) rearrangements. We studied<br />

how discernible these new molecules are from real metabolites and,<br />

hence, whether synthetic organic chemistry reacti<strong>on</strong>s are indeed able to<br />

expand the known universe of metabolites. We further studied the new<br />

chemistry present in the expanded metabolite space by looking at<br />

Murcko assemblies [3], ring systems and other chemical properties. The<br />

new metabolite space is compared to other small molecules, such as<br />

those obtained from the ZINC database, that are not metabolites. By<br />

combining all the above analyses we expect to characterize better the<br />

metabolite space, and furthermore, to predict the metabolite-likeness of a<br />

molecule and to understand its immanent properties.<br />

References<br />

1. Wishart DS, et al: Nucleic Acids Res 2007, 35 Database: D521-526.<br />

2. Lewell XQ, Judd DB, Wats<strong>on</strong> SP, Hann MM: J Chem Inf Comput Sci 1998,<br />

38:511-522.<br />

3. Bemis GW, Murcko MA: J Med Chem 1996, 39:2887-2893.<br />

P40<br />

Classificati<strong>on</strong> of CYP450 1A2 inhibitors using PubChem data<br />

Sergii Novotarskyi * , Iurii Sushko, R Körner, AK Pandey, Igor Tetko<br />

Helmholtz Zentrum München, Ingolstädter Landstraße 1, D-85764<br />

Neuherberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P40<br />

Page 21 of 25<br />

Cytochromes P450 (CYP450) are a superfamily of enzymes, involved in<br />

metabolism of a large number of xenobiotic compounds. CYP450 are<br />

involved in degradati<strong>on</strong> of a large amount of drugs, currently present <strong>on</strong><br />

the market. The promiscuity with respect to substrates makes the CYP450<br />

enzymes pr<strong>on</strong>e to inhibiti<strong>on</strong> by a large amount of drugs, which gives<br />

way to clinically significant drug-drug interacti<strong>on</strong>s.<br />

In this work different machine learning methods were applied to classify<br />

the inhibitors/n<strong>on</strong>inhibitors of human CYP450 1A2. The structures and<br />

the active/inactive classificati<strong>on</strong> c<strong>on</strong>cerning CYP1A2 inhibiti<strong>on</strong> were taken<br />

from PubChem BioAssay database. This assay uses human CYP1A2 to<br />

measure the demethylati<strong>on</strong> of luciferin 6’ methyl ether (Luciferin-ME;<br />

Promega-Glo) to luciferin.<br />

The tested methods include k nearest neighbors (kNN), decisi<strong>on</strong> tree,<br />

random forest, support vector machine (SVM) and associative neural<br />

networks (ASNN). The descriptors used were those from the Drag<strong>on</strong><br />

software, the fragment descriptors and the E-state indices.<br />

The training and test sets were handled separately to avoid different<br />

possibilities of overfitting - including overfitting by descriptor selecti<strong>on</strong>.<br />

Different applicability domain (AD) approaches were used to estimate the<br />

c<strong>on</strong>fidence of classificati<strong>on</strong>.<br />

As a result the models managed to correctly classify 80% of the test set<br />

instances. The accuracy of classificati<strong>on</strong> was found to be up to 95%, if<br />

<strong>on</strong>ly 30% most c<strong>on</strong>fident predicti<strong>on</strong>s were taken into account. The<br />

model was also applied to an external test set of 187 molecules,<br />

collected from literature and measured using a different etal<strong>on</strong> reacti<strong>on</strong>.<br />

For this set accuracy of 78% was achieved <strong>on</strong> the 30% most c<strong>on</strong>fident<br />

predicti<strong>on</strong>s.<br />

All the developed models are fast enough to be used for virtual screening<br />

of CYP1A2 inhibitors and n<strong>on</strong>inhibitors. The developed models are<br />

publicly available <strong>on</strong>-line at the http://qspr.eu web site.<br />

P41<br />

Applicability domain for classificati<strong>on</strong> problems<br />

Iurii Sushko * , S Novotarskyi, AK Pandey, R Körner, Igor Tetko<br />

Sushko, I., Helmholtz Zentrum München/IBIS, Ingolstädter Landstraße 1,<br />

D-85764 Neuherberg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P41<br />

Classificati<strong>on</strong> models are frequent in QSAR modeling. It is of crucial<br />

importance to provide good accuracy estimati<strong>on</strong> for classificati<strong>on</strong>.<br />

Applicability domain provides additi<strong>on</strong>al informati<strong>on</strong> to identify which<br />

compounds are classified with best accuracy and which are expected to<br />

have poor and unreliable predicti<strong>on</strong>s. The selecti<strong>on</strong> of the most reliable<br />

predicti<strong>on</strong>s can dramatically improve performance of methods while<br />

decreasing coverage of predicti<strong>on</strong>s [1].<br />

In binary classificati<strong>on</strong> problems, labels for machine learning methods are<br />

discrete {-1, 1}. N<strong>on</strong>etheless, model usually yields predicti<strong>on</strong> that is<br />

c<strong>on</strong>tinuous. Most apparent metrics for accuracy estimati<strong>on</strong> is distance<br />

between predicti<strong>on</strong> point and edge of a class, i.e. the more is the<br />

distance between predicti<strong>on</strong> the edge of the class, the more reliable and<br />

accurate is the predicti<strong>on</strong> of given compound. This metric has been<br />

already used in several previous studies (e.g., [2]) and dem<strong>on</strong>strated good


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

separati<strong>on</strong> of reliable and n<strong>on</strong>-reliable classificati<strong>on</strong>s. In quantitative<br />

predicti<strong>on</strong>s, the standard deviati<strong>on</strong> of ensemble predicti<strong>on</strong>s has been<br />

found as the most accurate measure distance in a recent benchmarking<br />

[3].<br />

We propose to integrate both metrics. Rather than giving a point<br />

estimate, this approach provides us with a probability distributi<strong>on</strong> of<br />

findingparticularcompoundin<strong>on</strong>eof the classes. Suggested metrics is<br />

probability<br />

�<br />

E<br />

Navxdx ( , , )<br />

where E is class domain a - ensemble’s average predicti<strong>on</strong>, v – variance of<br />

ensemble’s predicti<strong>on</strong>,N(a, v, x) is probability density of the Gaussian<br />

distributi<strong>on</strong>. Performance of this metric and its comparis<strong>on</strong> to the<br />

traditi<strong>on</strong>al <strong>on</strong>es are evaluated for several QSAR/QSPR classificati<strong>on</strong><br />

problems. The developed approach can be freely accessed to develop<br />

and estimate applicability domain of classificati<strong>on</strong> models at http://qspr.eu<br />

web site.<br />

References<br />

1. Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI: Drug Discov Today<br />

2006, 11:700.<br />

2. Manallack DT, Tehan BG, Gancia E, Huds<strong>on</strong> BD, Ford MG, Livingst<strong>on</strong>e DJ,<br />

Whitley DC, Pitt WR: J Chem Inf Comput Sci 2003, 43:674.<br />

3. Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T,<br />

Todeschini R, Fourches D, Varnek A: J Chem Inf Model 2008, 48:1733.<br />

P42<br />

Automatic pharmacophore model generati<strong>on</strong> using weighted<br />

substructure assignments<br />

Andreas Jahn * , H Planatscher, Georg Hinselmann, Nikolas Fechner, A Zell<br />

University of Tübingen, Sand 1, 72076 Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P42<br />

The generati<strong>on</strong> of a pharmacophore model is a challenging process,<br />

which often requires the interacti<strong>on</strong> of medicinal chemists. Given a<br />

number of ligands for a specific target, the aim is to identify the<br />

pharmacophore patterns that are resp<strong>on</strong>sible for the biological activities<br />

of chemical compounds. A recent study of optimal assignment methods<br />

has shown that the assignment of chemical substructures is able to<br />

detect active compounds in a data set [1]. Therefore, we investigated the<br />

possibility to use this technique to identify key features of a set of active<br />

compounds.<br />

To determine important substructures of active compounds, we<br />

integrated n weight factors, where n is the number of substructures. The<br />

substructures were defined using the pharmacophore definiti<strong>on</strong>s of Phase<br />

3.0 [2]. To define the individual weights of the pharmacophore patterns,<br />

we integrated a genetic algorithm which assigns weight factors to the<br />

previously defined patterns. The experimental setup was designed as<br />

follows: Given a data set with active compounds, the most active<br />

compound was selected as query structure for the experiment. The<br />

remaining active compounds were inserted into a background data set<br />

c<strong>on</strong>taining inactive compounds. The genetic algorithm evolved n weights<br />

for the pharmacophore patterns of the query structure. To evaluate the<br />

fitness of an individual, we performed a single query screening with the<br />

weights of the individual. During the optimizati<strong>on</strong> process, the BEDROC<br />

score [3] is optimized which puts emphasis <strong>on</strong> the early recogniti<strong>on</strong><br />

performance. The result of the genetic algorithm was a weight vector<br />

that assigns each pharmacophore feature the weight of the best<br />

individual.<br />

We evaluated our approach <strong>on</strong> a subset of the Directory of Useful Decoys<br />

that is suitable for ligand-based virtual screening [1][4]. The query<br />

structure was extracted from the same complexed crystal structure used<br />

by Huang et al. [4] to determine the binding site of the protein.<br />

The presented method is able to provide valuable informati<strong>on</strong> about key<br />

features that are important for the biological activity of a compound.<br />

Additi<strong>on</strong>ally, informati<strong>on</strong> of the protein structure is not needed.<br />

Therefore, the method can also be used to derive a pharmacophore<br />

model if no protein structure is available (e.g. GPCRs).<br />

Page 22 of 25<br />

References<br />

1. Jahn A, Hinselmann G, Fechner N, Zell A: Journal of <strong>Cheminformatics</strong> 2009,<br />

1:14.<br />

2. Phase, versi<strong>on</strong> 3.0 Schrödinger, LLC, New York, NY 2008.<br />

3. Truch<strong>on</strong> JF, Bayly CI: J Chem Inf Model 2007, 41:488-508.<br />

4. Huang N, Shoichet B, Irwin J: J Med Chem 2006, 49:6789-6801.<br />

P43<br />

A combined combinatorial and pKa-based approach to ligand<br />

prot<strong>on</strong>ati<strong>on</strong> states<br />

Tim ten Brink * , Thomas E Exner<br />

Department for Chemistry, University of K<strong>on</strong>stanz, D-78457 K<strong>on</strong>stanz,<br />

<str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P43<br />

The prot<strong>on</strong>ati<strong>on</strong> of the ligand molecule and the protein binding site has<br />

a significant influence <strong>on</strong> the results obtained by protein-ligand<br />

docking. Due to the inability of X-ray crystallography to resolve the<br />

hydrogen atom in protein and protein complex structures, the correct<br />

prot<strong>on</strong>ati<strong>on</strong> for the protein and the ligand has to be assigned <strong>on</strong> a<br />

theoretical basis before the structures can be used. Because of the local<br />

envir<strong>on</strong>ment inside the binding site and because of the influence of<br />

the ligand and the protein <strong>on</strong>to each other, the ligand prot<strong>on</strong>ati<strong>on</strong><br />

can differ from the prot<strong>on</strong>ati<strong>on</strong> <strong>on</strong>e would expect for the ligand in<br />

soluti<strong>on</strong> under physiological c<strong>on</strong>diti<strong>on</strong>s. Hence for protein-ligand<br />

docking different prot<strong>on</strong>ati<strong>on</strong> states of the ligand have to be taken into<br />

account.<br />

Our recently introduced structure preparati<strong>on</strong> tool SPORES [1] used a rule<br />

based method to generate a standard prot<strong>on</strong>ati<strong>on</strong> for each ligand<br />

molecule and afterwards generated different prot<strong>on</strong>ati<strong>on</strong> states in a<br />

combinatorial way by adding and removing single hydrogen atoms<br />

bel<strong>on</strong>ging to predefined functi<strong>on</strong>al groups. Docking of all these<br />

prot<strong>on</strong>ati<strong>on</strong> states of a given ligand molecule often led to scoring<br />

problems due to the different number of hydrogen interacti<strong>on</strong>s formed<br />

by the different protomers of the ligand molecule. To overcome these<br />

problems two methods to filter highly charged and unstable prot<strong>on</strong>ati<strong>on</strong><br />

states from the docking were implemented. One based <strong>on</strong> the difference<br />

between the standard prot<strong>on</strong>ati<strong>on</strong> of the ligand and the actual<br />

prot<strong>on</strong>ati<strong>on</strong> state and <strong>on</strong>e based <strong>on</strong> pKa values calculated with<br />

ChemAx<strong>on</strong>’s MARVIN software [2]. Here we present a new approach in<br />

which the ligand atoms c<strong>on</strong>sidered for the combinatorial method not<br />

chosen from predefined functi<strong>on</strong>al groups but according to the<br />

calculated pKa values which leads to a wider variety but with a smaller<br />

overall number of prot<strong>on</strong>ati<strong>on</strong> states.<br />

References<br />

1. ten Brink T, Exner TE: J Chem Inf Model 2009, 49(6):1535-1546.<br />

2. MARVIN 5.2.0, ChemAx<strong>on</strong> 2009 [http://www.chemax<strong>on</strong>.com].<br />

P44<br />

Semi-empirical derived descriptors for the modelling of properties of<br />

N-c<strong>on</strong>taining heterocycles<br />

Alexander Entzian 1* , Horst Bögel 1 , Mirko Buchholz 2 , Ulrich Heiser 2<br />

1 Martin-Luther-University Halle-Wittenberg, Institute for Organic Chemistry,<br />

Kurt-Mothes-Str. 2, 06120 Halle, <str<strong>on</strong>g>German</str<strong>on</strong>g>y; 2 Probiodrug AG, Dept. Medicinal<br />

Chemistry, Weinbergweg 22, 06120 Halle, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P44<br />

Nitrogen is <strong>on</strong>e of the most prominent hetero atoms found in<br />

heterocycles. The corresp<strong>on</strong>ding electr<strong>on</strong> l<strong>on</strong>e pairs of these nitrogen<br />

atoms are mainly resp<strong>on</strong>sible for properties like the basicity and the pkbvalues<br />

of the investigated heterocycle. Nevertheless, the overall electr<strong>on</strong>ic<br />

state of a molecule is also directly related to observable physico-chemical<br />

properties. This fact underlines a possible c<strong>on</strong>necti<strong>on</strong> between different<br />

investigated properties.<br />

The electr<strong>on</strong>ic properties of nitrogen c<strong>on</strong>taining compounds were<br />

analyzed with the aim to further predict these properties from<br />

quantum-mechanical descriptors. We thereby expect a relati<strong>on</strong>ship<br />

between the prot<strong>on</strong> affinity, the pkb-value and the strength of metal<br />

complexati<strong>on</strong>.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

Because basicity seems to be a fundamental property of these compounds,<br />

the work was first focused <strong>on</strong> the prot<strong>on</strong> affinities of the molecules<br />

in the gas phase. We thereby c<strong>on</strong>sider the fact, that these values<br />

are not influenced by the solvati<strong>on</strong> in liquid phases. L<strong>on</strong>e pairs in<br />

nitrogen c<strong>on</strong>taining heterocycles play also an important role for metal<br />

complexati<strong>on</strong> or due to their nucleophilic attack in chemical reacti<strong>on</strong>s.<br />

Therefore 55 heterocyclic compounds were selected that bel<strong>on</strong>g to the<br />

compound classes of substituted pyridines, pyrazoles and imidazoles.<br />

These compound set was carefully divided into a training (37) and test<br />

set (18), and different descriptors of the electr<strong>on</strong>ic states were<br />

calculated by using the semi-empirical molecular orbital software<br />

MSINDO [1]. In regressi<strong>on</strong> analysis the following descriptors were<br />

identified and marked as important:<br />

Charge of the nitrogen atom,<br />

Energy of the l<strong>on</strong>e pair electr<strong>on</strong>s and<br />

Perturbati<strong>on</strong> treatment of the interacti<strong>on</strong> energy - Klopman [2].<br />

Using this set of descriptors a model was built and validated that predicts<br />

the experimental data for prot<strong>on</strong> affinity with an R 2 of 0.93 and a Q 2 of<br />

0.91. Furthermore, it was possible to successfully develop models for the<br />

predicti<strong>on</strong> of pkb-values and the strength of a metal complex formati<strong>on</strong><br />

for these compounds.<br />

References<br />

1. Bredow T, Geudtner G, Jug K: J Comput Chem 2001, 22:861.<br />

2. Klopman G: J Amer Chem Soc 1968, 90:2<strong>23</strong>.<br />

P45<br />

QSAR of anti-inflammatory drugs<br />

Ver<strong>on</strong>ica R Khayrullina 1* , H Bögel 2<br />

1 Bashkir State University, RU-450074 Ufa, Russia; 2 Halle, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P45<br />

The computer analysis of relati<strong>on</strong>s between molecular structures and their<br />

biological activity using fragment-based methods is very useful to draw<br />

c<strong>on</strong>clusi<strong>on</strong>s for the understanding of drug acti<strong>on</strong> and for the<br />

development of more efficient n<strong>on</strong>-toxic drug candidates.<br />

We used the computer system SARD-21 (Structure Activity Relati<strong>on</strong>ship &<br />

Design) to investigate comm<strong>on</strong> structural features (fragments and<br />

substituents) typical for high- and low-effective n<strong>on</strong>-steroid antiinflammatory<br />

drugs (NSAIDs) succesfully.<br />

This derived informati<strong>on</strong> has been used for the model for predicti<strong>on</strong> of<br />

anti-inflammatory effectiveness of medicines with 76% and 81% level of<br />

recogniti<strong>on</strong> by two methods. This informati<strong>on</strong> could be used for creating<br />

new highly effective NSAIDs, and for increasing effectiveness of already<br />

known comp<strong>on</strong>ents.<br />

In a sec<strong>on</strong>d part of this paper the interrelati<strong>on</strong> between structure and<br />

efficacy for anti-inflammatory drug acti<strong>on</strong> is carried out using traditi<strong>on</strong>al<br />

QSAR with descriptors from topology and from quantum-mechanical<br />

calculati<strong>on</strong>s followed by regressi<strong>on</strong> models from modelling.<br />

The aim of this paper is to compare both molecular approaches of<br />

molecular design of drugs.<br />

Reference<br />

1. Khayrullina VR, et al: Russ Chem Bull, Int Ed 2006, 55:1.<br />

P46<br />

Fingerprint-based detecti<strong>on</strong> of acute aquatic toxicity<br />

Luca Pireddu 1* , L Michielan 2 , M Floris 1 , P Rodriguez-Tomé 1 , S Moro 2<br />

1 CRS4, Pula, Italy; 2 University of Padova, Padova, Italy<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P46<br />

In this work we show the effectiveness of 2D structural fingerprints in the<br />

predicti<strong>on</strong> of aquatic toxicity of chemical compounds, creating a selfc<strong>on</strong>tained<br />

system for structure-based aquatic toxicity classificati<strong>on</strong>. Using<br />

the data from the U.S. Envir<strong>on</strong>mental Protecti<strong>on</strong> Agency Fat Head<br />

Minnow (EPA-FHM) dataset [1] we build a n<strong>on</strong>-linear RBF SVM [2]<br />

classifier that distinguishes acutely toxic compounds from less toxic<br />

compounds, loosely according to the criteri<strong>on</strong> stipulated by the E.U.<br />

Reach legislati<strong>on</strong> [3]. The classifier achieves up to 86% accuracy in leave-<br />

Page <strong>23</strong> of 25<br />

<strong>on</strong>e-out validati<strong>on</strong> using 580 of the dataset’s 614 compounds. This<br />

performance is comparable with models built from the same dataset<br />

using more sophisticated molecular descriptors, such as AutoMEP and<br />

Sterimol descriptors [4]. We apply our classificati<strong>on</strong> model to predict the<br />

aquatic toxicity of 3M compounds in the MMsINC database [5].<br />

Furthermore, we create a linear SVM model using the same technique<br />

and apply it to the MMsINC data, with the additi<strong>on</strong>al integrati<strong>on</strong> of the<br />

EXPLAIN system [6] which allows us to show which structural features are<br />

resp<strong>on</strong>sible for the model classifying a molecule as less toxic or acutely<br />

toxic.<br />

References<br />

1. Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drumm<strong>on</strong>d RA:<br />

Predicting modes of acti<strong>on</strong> from chemical structure: Acute toxicity in<br />

the fathead minnow (Pimephales promelas). Envir<strong>on</strong>mental Toxicology and<br />

Chemistry 1997, 16(5):948-967.<br />

2. Boser B, Guy<strong>on</strong> I, Vapnik V: A Training Algorithm for Optimal Margin<br />

Classifiers. Computati<strong>on</strong>al Learning Theory 1992, 144-152.<br />

3. EU: Corrigendum to Regulati<strong>on</strong> (EC) No 1907/2006 of the European<br />

Parliament and of the Council of 18 December 2006 c<strong>on</strong>cerning the<br />

Registrati<strong>on</strong>, Evaluati<strong>on</strong>, Authorizati<strong>on</strong> and Restricti<strong>on</strong> of Chemicals<br />

(REACH). Off J Eur Uni<strong>on</strong> L136 2007, 50.<br />

4. Michielan L, Pireddu L, Floris M, Bacilieri M, Rodriguez-Tomé P, Moro S:<br />

2009 in press.<br />

5. Masciocchi B, Frau G, Fant<strong>on</strong> M, Sturlese M, Floris M, Pireddu L, Palla P,<br />

Cedrati F, RodriguezTomé P, Moro S: MMsINC: a large-scale<br />

chemoinformatics database. Nucleic Acids Research 2008, 37 Database:<br />

D284-90.<br />

6. Poulin B, Eisner R, Szafr<strong>on</strong> D, Lu P, Greiner R, Wishart D, Fyshe A, Pearcy B,<br />

Macd<strong>on</strong>ell C, Anvik J: Visual Explanati<strong>on</strong> of Evidence in Additive<br />

Classifiers. Innovative Applicati<strong>on</strong>s of Artificial Intelligence 2006.<br />

P47<br />

Adaptive matrix metrics for molecular descriptor assessment in<br />

QSPR classificati<strong>on</strong><br />

Axel J Soto 1* , Marc Strickert 2 , GE Vazquez 1<br />

1 Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Naci<strong>on</strong>al del<br />

Sur, Alem 1250, 8000 Bahía Blanca, Argentina; 2 Leibniz Institute of Plant<br />

Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben,<br />

<str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P47<br />

QSPR methods represent a useful approach in the drug discovery process,<br />

since they allow predicting in advance biological or physicochemical<br />

properties of a candidate drug. For this goal, it is necessary that the QSPR<br />

method be as accurate as possible to provide reliable predicti<strong>on</strong>s.<br />

Moreover, the selecti<strong>on</strong> of the molecular descriptors is an important task<br />

to create QSPR predicti<strong>on</strong> models of low complexity which, at the same<br />

time, provide accurate predicti<strong>on</strong>s.<br />

In this work, a matrix-based method [1] is used to transform the original<br />

data space of chemical compounds into an alternative space where<br />

compounds with different target properties can be better separated. For<br />

usingthisapproach,QSPRisc<strong>on</strong>sideredasaclassificati<strong>on</strong>problem.The<br />

advantage of using adaptive matrix metrics is twofold: it can be used to<br />

identify important molecular descriptors and at the same time it allows<br />

improving the classificati<strong>on</strong> accuracy.<br />

A recently proposed method making use of this c<strong>on</strong>cept [2] is extended<br />

to multi-class data. The new method is related to linear discriminant<br />

analysis and shows better results at yet higher computati<strong>on</strong>al costs. An<br />

applicati<strong>on</strong> for relating chemical descriptors to hydrophobicity property<br />

[3] shows promising results.<br />

References<br />

1. Strickert M, Keilwagen J, Schleif F-M, Villmann T, Biehl M: Matrix Metric<br />

Adaptati<strong>on</strong> Linear Discriminant Analysis of Biomedical Data. Lecture Notes<br />

in Computer Science 2009, 5517/2009:933-940.<br />

2. Strickert M, Soto AJ, Keilwagen J, Vazquez GE: Towards matrix-based<br />

selecti<strong>on</strong> of feature pairs for efficient ADMET predicti<strong>on</strong>. Argentine<br />

Symposium <strong>on</strong> Artificial Intelligence, ASAI 2009, 83-94.<br />

3. Soto AJ, Cecchini RL, Vazquez GE, P<strong>on</strong>z<strong>on</strong>i I: A Wrapper-based Feature<br />

Selecti<strong>on</strong> Method for ADMET Predicti<strong>on</strong> using Evoluti<strong>on</strong>ary Computing.<br />

Lecture Notes in Computer Science 2008, 4973/2008:188-199.


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

P48<br />

Evoluti<strong>on</strong>ary design of selective adenosine receptor ligands<br />

Horst van der Eelke * , J Kruisselbrink, A Aleman, MT Emmerich,<br />

Andreas Bender, AP IJzerman<br />

Divisi<strong>on</strong> of Medicinal Chemistry, Leiden/Amsterdam Center for Drug<br />

Research, Leiden University, Einsteinweg 55, <strong>23</strong>33CC Leiden, The Netherlands<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P48<br />

Evoluti<strong>on</strong>ary de novo design is the design of new active compounds from<br />

scratch, using evoluti<strong>on</strong>ary principles. It involves an iterative cycle of<br />

structure generati<strong>on</strong>, evaluati<strong>on</strong>, and selecti<strong>on</strong> of candidate structures.<br />

Selected candidates serve as input for the next generati<strong>on</strong>. By repeatedly<br />

selecting <strong>on</strong>ly the best structures as basis for structure generati<strong>on</strong>, the<br />

process evolves toward better (not necessarily best) soluti<strong>on</strong>s. Here, we<br />

applied an evoluti<strong>on</strong>ary algorithm (the Molecule Commander) for the design<br />

of selective adenosine receptor ligands. Generated compounds were all ruleof-5<br />

compliant and had a polar surface areas between 0 and 140 Å 2 ,<br />

favorable for intestinal absorpti<strong>on</strong>. In additi<strong>on</strong>, toxic compounds were<br />

filtered out using a categorical SVM model for predicti<strong>on</strong> of mutagenicity<br />

trained <strong>on</strong> Ames-test mutagenicity data (5-fold ROC score: 0.8948). Four<br />

pharmacophores were designed, <strong>on</strong>e for each human adenosine receptor<br />

subtype. The hA 1 receptor pharmacophore served as objective for the<br />

evoluti<strong>on</strong>, while the three other pharmacophores served as negative<br />

objective in order to obtain selectivity. In order to measure similarity with<br />

known adenosine receptor ligands, ring systems and scaffolds in the<br />

generated compounds were compared with those extracted from adenosine<br />

ligands in the StARLITe database. With each new generati<strong>on</strong>, the structures<br />

displayed an increasing number of ring systems also found in adenosine<br />

receptor ligands while the number of uniquecorestructures(scaffolds)<br />

increased as well. Eventually, the best (ADMET and pharmacophore score)<br />

candidate structures will be proposed for synthesis and tested for activity.<br />

P49<br />

I<strong>on</strong>otropic GABA receptors: modelling and design of selective ligands<br />

Vladimir A Palyulin * , EV Radchenko, DE Osolodkin, VI Chupakhin, NS Zefirov<br />

Department of Chemistry, Moscow State University, Moscow 119991, Russia<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P49<br />

I<strong>on</strong>otropic GABA A and GABA C receptors play an important role in the operati<strong>on</strong><br />

of CNS and serve as targets for many neuroactive drugs. Using the homology<br />

modelling and molecular dynamics, the 3D models of the receptors were built<br />

and some aspects of ligand-target interacti<strong>on</strong>s were elucidated [1,2].<br />

To better understand the structural factors c<strong>on</strong>trolling the activity and<br />

selectivity of the ligands, a series of QSAR models [3] were derived based<br />

<strong>on</strong> the Molecular Field Topology Analysis (MFTA) [4], CoMFA and<br />

Topomer CoMFA approaches. They were compared with each other as<br />

well as with the molecular modelling results.<br />

Finally, a number of potential selective ligand structures were identified<br />

by means of the virtual screening [5] from the available chemicals<br />

databases and the generated structure libraries.<br />

References<br />

1. Osolodkin DI, Chupakhin VI, Palyulin VA, Zefirov NS: J Mol Graph Model<br />

2009, 27:813.<br />

2. Chupakhin VI, Palyulin VA, Zefirov NS: Doklady Biochem Biophys 2006, 408:169.<br />

3. Chupakhin VI, Bobrov SV, Radchenko EV, Palyulin VA, Zefirov NS: Doklady<br />

Chemistry 2008, 422:227.<br />

4. Palyulin VA, Radchenko EV, Zefirov NS: J Chem Inf Comp Sci 2000, 40:659.<br />

5. Radchenko EV, Palyulin VA, Zefirov NS: Chemoinformatics Approaches to<br />

Virtual Screening RSC: Varnek A, Tropsha A 2008, 150-181.<br />

P50<br />

PoseView – molecular interacti<strong>on</strong> patterns at a glance<br />

Katrin Stierand * , Matthias Rarey<br />

Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43,<br />

20146 Hamburg, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P50<br />

Chemists are well trained in perceiving 2D molecular sketches. On the side<br />

of computer assistance, the automated generati<strong>on</strong> of such sketches<br />

becomes very difficult when it comes to multi-molecular arrangements<br />

such as protein-ligand complexes in adrugdesignc<strong>on</strong>text.Existing<br />

soluti<strong>on</strong>s to date suffer from drawbacks such as missing important<br />

interacti<strong>on</strong> types, inappropriate levels of abstracti<strong>on</strong> and layout quality.<br />

During the last few years we have developed PoseView [1,2], a tool which<br />

displays molecular complexes incorporating a simple, easy-to-perceive<br />

arrangement of the ligand and the amino acids towards which it forms<br />

interacti<strong>on</strong>s. Resulting in atomic resoluti<strong>on</strong> diagrams, PoseView operates<br />

<strong>on</strong> a fast tree re-arrangement algorithm to minimize crossing lines in the<br />

sketches. Due to a de-coupling of interacti<strong>on</strong> percepti<strong>on</strong> and the drawing<br />

engine, PoseView can draw any interacti<strong>on</strong>s determined by either<br />

distance-based rules or the FlexX interacti<strong>on</strong> model (which itself is user<br />

accessible). Owing to the small molecule drawing engine 2Ddraw [3],<br />

molecules are drawn in a textbook-like manner following the IUPAC<br />

regulati<strong>on</strong>s.<br />

The tool has a generic file interface for other complexes than proteinligand<br />

arrangements. It can therefore be used as well for the display of,<br />

e.g., RNA/DNA complexes with small molecules. For batch processing, an<br />

additi<strong>on</strong>al command line interface is available; output can be provided in<br />

various formats, am<strong>on</strong>gst them gif, ps, svg and pdf.<br />

Besides the underlying interacti<strong>on</strong> models, we will present new<br />

algorithmic approaches, assess usability issues and a large-scale validati<strong>on</strong><br />

study <strong>on</strong> the PDB.<br />

References<br />

1. Stierand K, Rarey M: ChemMedChem 2007, 2:853.<br />

2. Stierand K, Maaß P, Rarey M: Bioinformatics 2006, 22:1710.<br />

3. Fricker PC, Gastreich M, Rarey M: J Chem Inf Comput Sci 2004, 44(3):1065.<br />

P51<br />

Get the best from substructure mining<br />

Jeroen Kazius<br />

Curios-IT, Anna van Burenhof 63, <strong>23</strong>16GP Leiden, The Netherlands<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P51<br />

Page 24 of 25<br />

The chemical informati<strong>on</strong> that is present in a set of compounds is rarely<br />

fullyexploited.Thisismostlybecause no descriptor set can capture all<br />

biologically important features. As a result, valuable chemical<br />

knowledge can thus stay hidden from hypothesis-based drug design.<br />

The simplest form of a structure-activity relati<strong>on</strong>ship (SAR) is a<br />

substructure that predisposes compounds towards reduced or increased<br />

biological activity. Such simple patterns should not be missed during<br />

drug design.<br />

The aim of substructure mining is to present those substructures that are<br />

most likely related to biological activity. This method thus provides rapid<br />

access to a substantial repertoire of chemical descriptors that otherwise<br />

remains hidden: substructures. In short, substructure mining c<strong>on</strong>sists of a<br />

focused, but exhaustive, series of substructure searches.<br />

This poster describes how AweSuM, the new Awesome Substructure<br />

Mining tool from Curios-IT, was employed to learn the most interesting<br />

substructures. The poster also discusses the value of enriching the data<br />

with 2D pharmacophore informati<strong>on</strong> prior to mining. An enriched,<br />

detailed SAR analysis produced a scaffold that summarises the chemical<br />

c<strong>on</strong>tent of datasets better than any standard substructure. The<br />

pharmacophore that AweSuM extracted shows predictive power and<br />

agrees with published chemical knowledge. These results dem<strong>on</strong>strate<br />

that useful SAR knowledge can be extracted from the vast space of<br />

substructure descriptors. In this way, AweSuM reveals key substructures<br />

(e.g., pharmacophores or toxicophores), which can often be predictive for<br />

biological activities.<br />

P52<br />

Rescoring of docking poses using force field-based methods<br />

Nina M Fischer * , WM Schneider, O Kohlbacher<br />

Eberhard Karls University, Center for Bioinformatics Tübingen, Sand 14, 72076<br />

Tübingen, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P52<br />

Existing protein-ligand docking methods computati<strong>on</strong>ally screen<br />

thousands to milli<strong>on</strong>s of organic molecules against protein structures,<br />

trying to find those with complementary shapes and highest binding free


Journal of <strong>Cheminformatics</strong> 2010, Volume 2 Suppl 1<br />

http://www.jcheminf.com/supplements/2/S1<br />

energies. To allow large molecular databases to be screened rapidly,<br />

simple and approximative scoring functi<strong>on</strong>s are used as a fast filter,<br />

resultinginlowhitrates.Therefore, docking hit lists are comm<strong>on</strong>ly<br />

rescored in a final step by more rigorous and time-c<strong>on</strong>suming methods<br />

to gain a more accurate final list of ranked compounds.<br />

Molecular mechanics Poiss<strong>on</strong>-Boltzmann surface area (MM-PBSA) or<br />

generalized Born surface area (MM-GBSA) methods are currently<br />

c<strong>on</strong>sidered to be suitable techniques for rescoring. These physically<br />

realistic approaches incorporate more sophisticated models for solvati<strong>on</strong><br />

and electrostatic interacti<strong>on</strong>s than most scoring functi<strong>on</strong>s. Hence, they can<br />

discriminate more reliably between correct and incorrect docking poses.<br />

A high-throughput rescoring protocol using force field-based methods<br />

has been proposed by Brown and Muchmore [1]. They used 18 in-house<br />

urokinase-ligand crystal structures and their corresp<strong>on</strong>ding experimentally<br />

determined binding free energies as a test set. On the basis of this<br />

rescoring protocol we tested several molecular dynamics simulati<strong>on</strong><br />

protocols in combinati<strong>on</strong> with different MM-PBSA, and MM-GBSA<br />

calculati<strong>on</strong> procedures using Amber 10 [2] and Gromacs 4 [3].<br />

C<strong>on</strong>sidering performance and accuracy, our best rescoring protocol<br />

performs similarly to the <strong>on</strong>e described by Brown and Muchmore [1].<br />

It has a comparable run-time and achieves a correlati<strong>on</strong> between<br />

experimental and rescored values of 0.88 compared to 0.87.<br />

However, we used urokinase-ligand complexes generated using the<br />

docking program Glide [4] instead of crystal structures. Additi<strong>on</strong>ally, this<br />

shows that our rescoring protocol improves the correlati<strong>on</strong> of 0.57<br />

between experimental values and Glide scores significantly to 0.88,<br />

thereby achieving a more accurate list of ranked compounds.<br />

The protocol will be incorporated into BALLView [5], an open-source<br />

molecular viewer and modeling tool. Thus, it will be available free of<br />

charge and can be c<strong>on</strong>veniently used to rescore (docked) protein-ligand<br />

complexes.<br />

References<br />

1. Brown SP: J Chem Inf Model 2007, 47:1493.<br />

2. Case DA: University of California, San Francisco 2008.<br />

3. Hess B: J Chem Theory Comput 2008, 4:435.<br />

4. Friesner RA: J Med Chem 2004, 47:1739.<br />

5. Moll A: J Comput Aided Mol Des 2005, 19:791.<br />

P53<br />

The pipelined metabolite identificati<strong>on</strong> based <strong>on</strong> MS fragmentati<strong>on</strong><br />

Miguel Rojas-Cherto * , PT Kasper, Peir<strong>on</strong>cely Julio E, T Reijmers,<br />

RJ Vreeken, T Hankemeier<br />

Leiden, The Netherlands<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P53<br />

Structural characterizati<strong>on</strong> and identificati<strong>on</strong> of comp<strong>on</strong>ents of complex<br />

biological mixtures c<strong>on</strong>stitutes <strong>on</strong>e of the central aspects of metabolomics.<br />

Metabolite identificati<strong>on</strong> is a challenging but essential task in studies of<br />

biological samples. Mass spectrometry, because of its high sensitivity and<br />

specificity, is widely and successfully used in analysis of biological samples.<br />

Identificati<strong>on</strong> of metabolites can be in principle achieved using high<br />

resoluti<strong>on</strong> multistage mass spectrometry (MSn) because it provides a<br />

feature rich fingerprint of the precursor structure. However, neither general<br />

methodology for the identificati<strong>on</strong> nor extensive databases of metabolites<br />

with multistage mass spectrometric data are available at the moment. We<br />

dem<strong>on</strong>strate in this poster the feasibility of the strategy for metabolite<br />

identificati<strong>on</strong> based <strong>on</strong> analysis of fragmentati<strong>on</strong> trees.<br />

High resoluti<strong>on</strong> multi stage MS experiments were performed <strong>on</strong><br />

LTQ-Orbitrap (Thermo) equipped with Triversa nanoMate (Advi<strong>on</strong>)<br />

nanoelectrospray i<strong>on</strong> source using defined protocol. An in-house<br />

developed software, integrating am<strong>on</strong>g others: Chemistry Development<br />

Kit (CDK) and XCMS libraries, was used for spectral data processing.<br />

Resulting fragmentati<strong>on</strong> trees were stored in a local database.<br />

Multi-stage Molecular Formula (MMF) tool, which uses a method to<br />

resolve the elemental compositi<strong>on</strong> of the compound and fragment i<strong>on</strong>s<br />

derived from MSn data using a cyclic c<strong>on</strong>straining process has been<br />

developed. The process of elemental formula assignment and fitting<br />

within elemental formula paths removes artefacts of the spectra.<br />

Background signals, spikes, c<strong>on</strong>taminati<strong>on</strong>s, side peaks, etc., are rejected<br />

in this stage and are not included in the fragmentati<strong>on</strong> tree. The<br />

resulting fragmentati<strong>on</strong> trees are stored in XML format.<br />

Repeatability, reproducibility and robustness of fragmentati<strong>on</strong> tree<br />

acquisiti<strong>on</strong>s were tested by changing experimental c<strong>on</strong>diti<strong>on</strong>s and<br />

varying the c<strong>on</strong>centrati<strong>on</strong> of the metabolite of interest. An acquisiti<strong>on</strong><br />

protocol was established for the reliable and reproducible acquisiti<strong>on</strong> of<br />

mass spectral trees. It was investigated to which extent the variati<strong>on</strong> of<br />

c<strong>on</strong>diti<strong>on</strong>s such as fragmentati<strong>on</strong> energy, isolati<strong>on</strong> width etc. did change<br />

the fragmentati<strong>on</strong> pattern or topology of hierarchical relati<strong>on</strong>s between<br />

fragments.<br />

We dem<strong>on</strong>strate how the developed analytical strategy based <strong>on</strong><br />

analysis of fragmentati<strong>on</strong> trees can be used to discriminate between<br />

metabolite isomers with the same elemental compositi<strong>on</strong> and an <strong>on</strong>ly<br />

slightly different structure, but with a significantly different biological<br />

functi<strong>on</strong>.<br />

Our results provide firm basis for developing a generic, multi stage mass<br />

spectrometry based platform for efficient identificati<strong>on</strong> of metabolites.<br />

P54<br />

SBE13, a newly identified inhibitor of inactive polo-like kinase 1<br />

Sarah Keppner 1* , Ewgenij Proschak 2 , Gisbert Schneider 2 , B Spänkuch 1<br />

1 Goethe-University, Medical School, Department of Gynecology and<br />

Obstetrics, Theodor-Stern-Kai 7, 60590 Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y;<br />

2 Goethe-University, Frankfurt, <str<strong>on</strong>g>German</str<strong>on</strong>g>y<br />

Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P54<br />

Page 25 of 25<br />

Protein kinases are important targets for drug development. The almost<br />

identical protein folding of kinases and the comm<strong>on</strong> co-substrate ATP<br />

leads to the problem of inhibitor selectivity. Type II inhibitors, targeting<br />

the inactive c<strong>on</strong>formati<strong>on</strong> of kinases, occupy a hydrophobic pocket with<br />

less c<strong>on</strong>served surrounding amino acids [1].<br />

Human polo-like kinase 1 (Plk1) represents a promising target for<br />

approaches to identify new therapeutic agents. Plk1 bel<strong>on</strong>gs to a family<br />

of highly c<strong>on</strong>served serine/thre<strong>on</strong>ine kinases, and is a key player in<br />

mitosis, where it modulates the spindle checkpoint at metaphase/<br />

anaphase transiti<strong>on</strong>. Plk1 is over-expressed in all today analyzed human<br />

tumors of different origin and serves as a negative prognostic marker in<br />

cancer patients. The newly identified inhibitor, SBE13, a vanillin derivative,<br />

targets Plk1 in its inactive c<strong>on</strong>formati<strong>on</strong> [2]. This leads to selectivity<br />

within the Plk family and towards Aurora A. This selectivity can be<br />

explained by docking studies of SBE13 into the binding pocket of<br />

homology models of Plk1, Plk2 and Plk3 in their inactive c<strong>on</strong>formati<strong>on</strong>.<br />

SBE13 showed anti-proliferative effects in cancer cell lines of different<br />

origins with EC 50 values between 5 μM and39μM and induced apoptosis.<br />

Increasing c<strong>on</strong>centrati<strong>on</strong>s of SBE13 result in increasing amounts of cells in<br />

G 2/M phase 13 hours after double thymidin block of HeLa cells. The kinase<br />

activity of Plk1 was inhibited with an IC 50 of 200 pM.<br />

Taken together, we could show that carefully designed structure-based<br />

virtual screening is well-suited to identify selective type II kinase<br />

inhibitors targeting Plk1 as potential anti-cancer therapeutics.<br />

References<br />

1. Liu Y, Gray NS: Nat Chem Biol 2006, 2:358-364.<br />

2. Keppner S, Proschak E, Schneider G, Spänkuch B: Chem Med Chem 2009 in<br />

press.<br />

Cite abstracts in this supplement using the relevant abstract number,<br />

e.g.: Keppner et al.: SBE13, a newly identified inhibitor of inactive pololike<br />

kinase 1. Journal of <strong>Cheminformatics</strong> 2010, 2(Suppl 1):P54

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!