5th German Conference on Cheminformatics: 23 ... - BioMed Central

Journal of Cheminformatics 2010, Volume 2 Suppl 1 

http://www.jcheminf.com/supplements/2/S1 

MEETING ABSTRACTS Open Access 

<strong>5th</strong> <strong>German</strong> <strong>Conference</strong> on Cheminformatics: 

23. CIC-Workshop 

Goslar, <strong>German</strong>y. 8-10 November 2009 

Edited by Frank Oellien, Uli Fechner and Thomas Engel 

Published: 04 May 2010 

These abstracts are available online at http://www.jcheminf.com/supplements/2/S1 

INTRODUCTION 

A1 

<strong>5th</strong> <strong>German</strong> <strong>Conference</strong> on Chemoinformatics: 23. CIC-Workshop. 

November 8-10, 2009, Goslar, <strong>German</strong>y 

Frank Oellien 1* , Uli Fechner 2 , Thomas Engel 3 

1 GDCh-CIC Division Chair, Intervet Innovation GmbH, Zur Propstei, 55270 

Schwabenheim, <strong>German</strong>y; 2 GDCh-CIC division board member, Beilstein- 

Institut zur Förderung der Chemischen Wissenschaften, Trakehner Str. 7-9, 

60487 Frankfurt, <strong>German</strong>y; 3 GDCh-CIC Division Co-Chair, Fakultät für Chemie 

und Pharmazie, Universität München, Butenandtstr. 5-13, 81377 München, 

<strong>German</strong>y 

E-mail: frank.oellien@sp.intervet.com 

Journal of Cheminformatics 2010, 2(Suppl 1):A1 

From the 8 th to the 10 th November 2009, the Chemistry-Information- 

Computers (CIC) division of the <strong>German</strong> Chemical Society (GDCh) has invited 

the chemoinformatics and modeling community to Goslar, <strong>German</strong>y to 

participate in the <strong>5th</strong> <strong>German</strong> <strong>Conference</strong> on Chemoinformatics 

(GCC 2009). The international symposium addressed a broad range of 

modern research topics in the field of computers and chemistry. The focus 

was on recent developments and trends in the fields of Chemoinformatics 

and Drug Discovery, Chemical Information, Patents and Databases, Molecular 

Modeling, Computational Material Science and Nanotechnology. In addition, 

other contributions from the field of Computational Chemistry were 

welcome. 

The conference was opened traditionally with a “Free-Software-Session” on 

Sunday afternoon right before the official conference opening at 5 pm 

including three talks about the Open Source projects Bingo, Dingo and 

OrChem. In parallel the “Chemoinformatics Market Place” took place 

including software tutorials by Chemical Computing Group, Hemlholtz- 

Center Munich and the Cambridge Crystallograhic Data Center. 

The scientific program was opened by an evening talk giving an overview 

on the field of Systems Chemistry (Günter von Kiedrowski). In addition, 

the program included six plenary lectures (Eberhard Voit (USA), Knut 

Baumann (<strong>German</strong>y), Thomas Kostka (<strong>German</strong>y), Anthony J. Williams 

(USA), Kalr-Heinz Baringhaus (<strong>German</strong>y), Christoph Sotriffer (<strong>German</strong>y)], 

17 general lectures as well as 54 poster presentations. 

Besides the scientific program a special highlight of the conference were 

the FIZ-CHEMIE-Berlin 2009 awards on Monday afternoon (Figure 1). The 

CIC division awards this price each year to the best diploma thesis and 

the best PhD in the field of Computational Chemistry. The price for the 

PhD thesis was awarded to Dr. José Batista from the group of Prof. 

Dr. Jürgen Bajorath, University of Bonn for his dissertation “Analysis of 

Random Fragment Profiles for the Detaction of Structure-Activity 

Relationships”. The award for the best diploma thesis has gone to Frank 

Tristram from the group of Dr. Wolfgang Wenzel, Karlsruher Institute of 

Figure 1 (abstract A1). Fiz CHEMIE Berlin Awards 2009: from left to 

right, Frank Tristram (award for the best diploma thesis), Rene de 

Planque (Head of FIZ CHEMIE Verlin), Frank Oellien (Chair of the 

GDCh-CIC division), José Batista (award for the best PhD thesis). 

Technology with the title “Modellierubng der Hauptkettenbeweglichkeit in 

der rechnergestützten Medikamentenentwicklung”. 

FREE SOFTWARE SESSION 

PRESENTATIONS 

F1 

Bingo from SciTouch LLC: chemistry cartridge for Oracle database 

Dmitry Pavlov * , Mikhail Rybalkin, Boris Karulin 

SciTouch LLC, Budapeshtskaya st. 23-2-82, RUS-192212 St. Petersburg, Russia 

E-mail: dpavlov@scitouch.net 

Journal of Cheminformatics 2010, 2(Suppl 1):F1 

Bingo is a data cartridge for Oracle database that provides the industry’s 

next-generation, fast, scalable, and efficient storage and searching 

solution for chemical information. 

Bingo seamlessly integrates the chemistry into Oracle databases. Its 

extensible indexing is designed to enable scientists to store, index, and



search chemical moieties alongside numbers and text within one 

underlying relational database server. 

For molecule structure searching, Bingo supports 2D and 3D exact and 

substructure searches, as well as similarity, tautomer, Markush, formula, 

molecular weight, and flexmatch searches. For reaction searches, Bingo 

supports reaction substructure search (RSS) with optional automatic 

generation of atom-to-atom mapping. All of these techniques are 

available through extensions to the SQL and PL/SQL syntax. 

Bingo also has features not present in other cartridges, for example, 

advanced tautomer search, resonance substructure search, and fast 

updating of the index when adding new structures. 

The presentation itself you can download from our site: http:// 

opensource.scitouch.net/downloads/bingo-cic.pdf 

F2 

Dingo: 2D molecule and reaction structural formula rendering library 

Mikhail Kozhevnikov * , Boris Karulin 

SciTouch LLC, Budapeshtskaya st. 23-2-82, RUS-192212 St. Petersburg, Russia 

E-mail: kozhevnikov@scitouch.net 


2D structural formulae are widely accepted as a way of representing 

chemical compounds and reactions. High-quality visualization of those 

formulae can greatly facilitate scientist perception. 

We are proud to present a tool for such visualization, called Dingo [1]. It 

is capable of rendering molecules, reactions and queries along with a 

wide range of attributes on jpgWindows, Linux, Solaris and Mac OS in 

accordance with IUPAC recommendations [2]. 

Dingo accepts common file formats, such as MDL Molfile [3] and Rxnfile, 

SMILES [4] with several extensions and more. The result can be stored as 

a PDF, SVG, PNG or EMF file. One can also embed Dingo in an application 

and use it via simple C interface or a C# wrapper to produce image files 

or render directly to a Windows HDC. 

Dingo is a strong competitor for existing proprietary tools on the market, 

while it is available under the terms of GPLv3 free of charge. 

The aim of the project is to provide a simple and efficient mechanism for 

2D structural formula rendering. High quality and the variety of 

supported formats allows inclusion of resulting images in scientific 

papers, further editing of the output in a vector form or embedding the 

library in more complex applications. 

References 

1. [http://opensource.scitouch.net/indigo/dingo]. 

2. [http://www.iupac.org/objID/Article/pac8002x0277]. 

3. [http://www.symyx.com/downloads/public/ctfile/ctfile.pdf]. 

4. [http://www.daylight.com/smiles/]. 

F3 

OrChem 

Mark Rijnbeek 

Protein and Nucleotide Database Group, Chemoinformatics and Metabolism 

Team, EMBL Outstation - Hinxton, European Bioinformatics Institute, 

Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK 

E-mail: markr@ebi.ac.uk 


Background: Registration, indexing and searching of chemical structures 

in relational databases is one of the core areas of cheminformatics. 

However, little detail has been published on the inner workings of search 

engines and their development has been mostly closed-source. We 

decided to develop an open source chemistry extension for Oracle, the 

de facto database platform in the commercial world. 

Results: Here we present OrChem, an extension for the Oracle 11G 

database that adds registration and indexing of chemical structures to 

support fast substructure and similarity searching. The cheminformatics 

functionalityisprovidedbytheChemistryDevelopmentKit.OrChem 

provides similarity searching with response times in the order of seconds 

for databases with millions of compounds, depending on a given 

similarity cut-off. For substructure searching, it can make use of multiple 

processor cores on today’s powerful database servers to provide fast 

response times in equally large data sets. 

Availability: OrChem is free software and can be redistributed and/or 

modified under the terms of the GNU Lesser General Public License as 

published by the Free Software Foundation. All software is available via 

http://orchem.sourceforge.net. 

ORAL PRESENTATIONS 

Page 2 of 25 

O1 

Systems chemistry: from chemical self-replication to trisoligo-based 

nanoconstruction 

Günter von Kiedrowski 

Lehrstuhl für Organische Chemie I - Bioorganische Chemie, Ruhr-Universität, 

Universitätsstrasse 150, 44780 Bochum, <strong>German</strong>y 

E-mail: kiedro@rub.de 

Journal of Cheminformatics 2010, 2(Suppl 1):O1 

Self-replication is one of the major principles without life could not exist. 

The emergence of self-replicating systems on the early earth is generally 

believed to have taken place before the advent of instructed protein 

synthesis based on a complex translation machinery. Whether the origin 

of self-replication is identical to the origin of the hypothetical RNA world 

or whether it existed at an earlier stage of evolution is an open question 

that has stimulated chemists to search for chemical systems capable of 

making copies of itselves via autocatalytic reactions. As self-replication 

means autocatalysis plus information transfer, the reaction products must 

necessarily be able to store more structural information than their 

precursors. Templating as a means to transfer structural information has 

been exploited since the first successful example of a chemical selfreplicating 

system almost two decades ago [1]. Today we have a broad 

variety of such systems employing oligonucleotides, peptides, and small 

organic molecules as templates and autocatalytic [1-3], cross-catalytic [4], 

collectively autocatalytic and non-autonomous (stepwise) schemes of selfreplication 

[5]. 

A link between the chemical self-replication and nanotechnology was first 

pointed out by G.M.Whitesides in his debate with Drexler. The ribosome 

is an example for a nanomachine which may be viewed as a threedimensionally 

defined array of 51 modular proteins positioned by the 

rRNA scaffold. Biomimetic approaches towards the 3D-nano-scaffolding of 

modular functions may be based on trisoligonucleotides [6], viz. synthetic 

3-arm junctions in which the 3’-ends of three oligonucleotides are 

connected by a suitable linker. We report on trisoligos as building blocks 

for the noncovalent construction of nanoobjects. Kinetic control - applied 

by rapid cooling during self-assembly - was found to favor small and 

defined nano-structures instead of large polymeric networks [6]. Maximal 

instruction was employed as design principles to generate noncovalent 

3D-nano-objects in which both, the topology and the geometry is 

defined [7]. Examples include dodecahedral nano-scaffolds composed 

from 20 trisoligos each bearing three individually defined sequences [8]. 

It was demonstrated that the connectivity information in the nanoscaffold 

junctions can be copied by chemical means [9]. Chemical 

copyingschemesmaybeseeninconjunctionwith“surface-promoted 

replication and exponential amplification of DNA analogues” (SPREAD) [5]. 

Applications of such scaffolds include the positioning of modular 

functions such as multidentate thioether-based gold cluster labels 

(RUBiGold) [10] which have been tailored for monoconjugability and 

thermostability. 

My lecture will introduce chemical self-replication and multicomponent 

assembly as facets of systems chemistry [3,11,12] - a nascent field which 

understands itself as the bottom-up pendant of systems biology towards 

synthetic biology [14]. The field is clearly inspired by the origin-of-life 

problem but goes beyond traditional prebiotic chemistry in its mission 

towards a quantititive dynamic understanding of complex reaction 

networks with autocatalytic components. Software tools like our SimFit 

and kinetic NMR titration [2] may significantly help to decipher signatures 

of interesting dynamic phenomena such as self-replication, chiral 

symmetry-breaking, and metabolic autocatalysis found in organic [13] 

and biomolecular [12] reaction systems.



References 

1. von Kiedrowski G: Angew Chem Int Ed Engl 1986, 25:932. 

2. Stahl I, von Kiedrowski G: J Am Chem Soc 2006, 28:14014-14015. 

3. Kindermann M, et al: Angew Chem 2005, 117:6908-6913. 

4. Sievers D, von Kiedrowski G: Nature 1994, 369:221. 

5. Luther A, Brandsch R, von Kiedrowski G: Nature 1998, 396:245. 

6. Scheffler M, et al: Angew Chem Int Ed Engl 1999, 38:3312. 

7. von Kiedrowski G, et al: Pure Appl Chem 2003, 75:609. 

8. Zimmermann J, et al: Angew Chem 2008, 120:3682-3686. 

9. Eckardt L, et al: Nature 2002, 420:286. 

10. Pankau WM, Mönninghoff S, von Kiedrowski G: Angew Chem 2006, 

118:1923-1926. 

11. Stankiewicz J, Eckardt LH: Angew Chem 2006, 118:350-352. 

12. Hayden EJ, von Kiedrowski G, Lehman N: Angew Chem 2008, 120:8552-8556. 

13. Kramer D, Pankau WM, von Kiedrowski G: in press. 

14. MoU. 

O2 

The role of systems modeling in drug discovery and predictive health 

Eberhard O Voit 

Integrative BioSystems Institute and Wallace H. Coulter Department of 

Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive, 

Atlanta, GA 30332-0535, USA 

E-mail: eberhard.voit@bme.gatech.edu 


Systems biology is the result of a confluence of recent advances in 

molecular biology, engineering, and the computational sciences. It can 

loosely be categorized into experimental and computational systems 

biology. Experimental high-throughput methods, assisted by robotics, image 

analysis, and bioinformatics, have been used in the drug industry for quite a 

while, and current screening tests for drug efficacy and toxicity regularly 

involve genomic, proteomic, and molecular modeling approaches. By 

contrast, the role of computational methods of biological systems analysis is 

still emerging. This presentation focuses on computational systems 

modeling and its increasingly important role at several junctures of the drug 

development pipeline. Examples to be discussed include mathematical 

models for receptor dynamics, pharmacokinetics, and metabolic and 

signaling pathway analysis. In the context of the latter, Biochemical Systems 

Theory is proposed as a highly advantageous default framework for model 

design, diagnostics, manipulation, and system optimization. The 

development of dynamic models for complex disease processes permits the 

straightforward inclusion of methods for custom-tailoring models, which is a 

key step toward personalized medicine and predictive health [1-5]. 

References 

1. Voit EO: Computational Analysis of Biochemical Systems. A Practical Guide 

for Biochemists and Molecular Biologists Cambridge University Press, 

Cambridge, UK 2000, xii + 530. 

2. Voit EO: Metabolic modeling: A tool of drug discovery in the postgenomic 

era. Drug Discovery Today 2002, 7(11):621-628. 

3. Voit EO: The dawn of a new era of metabolic systems analysis. Drug 

Discovery Today BioSilico 2004, 2(5):182-189. 

4. Voit EO, Brigham KL: The role of systems biology in predictive health and 

personalized medicine. The Open Path. J 2008, 2:68-70. 

5. Voit EO: A systems-theoretical framework for health and disease. Math 

Biosc 2009, 217:11-18. 

O3 

Molecular bioactivity extrapolation to novel targets by support vector 

machines 

Gerard JP Van Westen 1* ,JKWegner 2 , AP IJzerman 1 , HWT Van Vlijmen 2 , A Bender 1 

1 Amsterdam Center for Drug Research, Einsteinweg 55, 2333 CC, Leiden, The 

Netherlands; 2 Tibotec, Gen De Wittelaan L 11B 3, 2800 Mechelen, Belgium 

E-mail: gerard@lacdr.leidenuniv.nl 


The early phases of drug discovery use in silico models to rationalize 

structure activity relationships, and to predict the activity of novel 

Page 3 of 25 

compounds. However, the performance of these models is not always 

acceptable and the reliability of external predictions - both to novel 

compounds and to related protein targets - is often limited. 

Proteochemometric modeling [1] adds a target description, based on 

physicochemical properties of the binding site, to these models. 

Our proteochemometric models [2] are based on Scitegic circular 

fingerprints on the compound side and on a customized protein 

fingerprint on the target side. This protein fingerprint is based on a 

selection of physicochemical descriptors obtained from the AAindex 

database. Through PCA we selected a number of physicochemical 

properties which are hashed in a fingerprint using the Scitegic hashing 

algorithm. We compared this fingerprint to a number of protein 

descriptors previously published, including the Z-scales, the FASGAI and 

the BLOSUM descriptors. Our fingerprint performs superior to all of 

these. In addition, we show that proteochemometric models improve 

external prediction capabilities. In the case of classification this leads to 

models with a higher specificity when compared to conventional QSAR. 

In the case of regression our models show an average lower RMSE of 

0.12 log units when based on a pIC50 output variable compared to 

conventional QSAR modeling the same data-set. Furthermore, our 

models enable target extrapolation. As a result we can predict the 

activity of known and new compounds on new targets while retaining 

the same model quality as when performing external validation without 

target extrapolation. 

References 

1. Freyhult E, Prusis P, Lapinsh M, Wikberg JE, Moulton V, Gustafsson MG: 

Unbiased descriptor and parameter selection confirms the 

potential of proteochemometric modelling. BMC Bioinformatics 2005, 

6:50. 

2. Doddareddy MKR, Van Westen GJP, Horst Van der E, Peironcely JE, 

Corthals F, IJzerman AP, Emmerich M, Jenkins JL, Bender A: 

Chemogenomics: Looking at Biology through the Lens of Chemistry. Stat 

Anal Data Mining 2009 in press. 

O4 

Representation and searching of biomolecules 

Joeseph L Durant * , WL Chen, BD Christie, DL Grier, BA Leland, JG Nourse 

Symyx Technologies, 2440 Camino Ramon, San Ramon, California, USA 


Biomolecules present challenges to chemical information systems 

designed for small molecules. Their sizes, up to tens of thousands of 

atoms, overwhelm representation/storage/searching solutions built on 

explicit chemical representation of the structures. But biomolecules are 

largely made up of many repeats of a limited number of buildingblock 

molecules, a fact which has been used to provide a compressed 

representation for biomolecules using templates for the building 

blocks. 

We have adopted a modified template-based representation for 

biomolecules. Our primary interest is in the chemically modified portions 

of biomolecules, for which we choose to use explicit chemistry. These 

areas of explicit chemistry are then embedded in the templatecompressed, 

unmodified portions of the full biomolecule. 

The regions containing explicit chemistry are indexed, and thus can be 

structure searched with good performance. A limited number of residues 

surrounding explicit chemistry regions are included in the index for 

searching the context of these explicit regions. By using explicit chemistry 

to represent modified regions we can search across classes of 

modifications for common features. For example a single substructure 

search query will find green fluorescent protein, and its histidine, 

phenylalanine and tryptophan analogs. 

Templates are stored with the structure providing a self-contained file 

format. The use of NEMA keys allows templates from different structures 

to be compared, and allows storage of structures containing a canonical 

list of templates. The residues have defined attachment points, allowing 

automated traversal of a protein backbone, or location of non-backbone 

bonds to residues. 

We will present example structures and structural queries highlighting 

capabilities of our representation.



O5 

Cross-validation is dead. Long live cross-validation! Model validation 

based on resampling 

Knut Baumann 

Institute of Pharmaceutical Chemistry, University of Technology 

Braunschweig, Beethovenstr. 55, D-38106 Braunschweig, <strong>German</strong>y 


Cross-validation was originally invented to estimate the prediction error of 

a mathematical modelling procedure. It can be shown that crossvalidation 

estimates the prediction error almost unbiasedly. Nonetheless, 

there are numerous reports in the chemoinformatic literature that crossvalidated 

figures of merit cannot be trusted and that a so-called external 

test set has to be used to estimate the prediction error of a mathematical 

model. In most cases where cross-validation fails to estimate the 

prediction error correctly, this can be traced back to the fact that it was 

employed as an objective function for model selection. Typically each 

model has some meta-parameters that need to be tuned such as the 

choice of the actual descriptors and the number of variables in a QSAR 

equation, the network topology of a neural net, or the complexity of a 

decision tree. In this case the meta-parameter is varied and the crossvalidated 

prediction error is determined for each setting. Finally, the 

parameter setting is chosen that optimizes the cross-validated prediction 

error in an attempt to optimize the predictivity of the model. However, in 

these cases cross-validation is no longer an unbiased estimator of the 

prediction error and may grossly deviate from the result of an external 

test set. It can be shown that the “amount” of model selection can 

directly be related to the inflation of cross-validated figures of merit. 

Hence, the model selection step has to be separated from the step of 

estimating the prediction error. If this is done correctly, cross-validation 

(or resampling in general) retains its property of unbiasedly estimating 

the prediction error. Matter of factly, it can be shown that data splitting 

into a training set and an external test set often estimates the prediction 

error less precise than proper cross-validation. It is this variabability of 

prediction errors, which depends on test set size, that causes seemingly 

paradox phenomena such as the so-called “Kubinyi’s paradoxon” for small 

data sets. 

O6 

A unified approach to the applicability domain problem of 

QSAR models 

Dragos Horvath * , Gilles Marcou, Alexandre Varnek 

Laboratoire d’InfoChimie, Université de Strasbourg - CNRS, Institut de Chimie, 

4 rue Blaise Pascal, 67000 Strasbourg, France 


The present work proposes a unified conceptual framework to describe 

and quantify the important issue of the Applicability Domains (AD) of 

Quantitative Structure-Activity Relationships (QSARs). AD models are 

conceived as meta-models designed to associate an untrustworthiness 

score to any molecule M subject to property prediction by a QSAR 

model. Untrustworthiness scores or “AD metrics” are an expression of the 

relationship between M (represented by its descriptors in chemical space) 

and the space zones populated by the training molecules at the basis of 

model μ. Scores integrating some of the classical AD criteria (similaritybased, 

box-based) were considered in addition to newly invented terms, 

such as the dissimilarity to outlier-free training sets and the correlation 

breakdown count. 

A loose correlation is expected to exist between this untrustworthiness 

and the error affecting the predicted property. While high 

untrustworthiness does not preclude correct predictions, inaccurate 

predictions at low untrustworthiness must be imperatively avoided. This 

kind of relationship is characteristic for the Neighborhood Behavior (NB) 

problem: dissimilar molecule pairs may or may not display similar 

properties, but similar molecule pairs with different properties are 

explicitly “forbidden”. Therefore, statistical tools developed to tackle this 

latter aspect were applied, and lead to a unified AD metric 

benchmarking scheme. 

A first use of untrustworthiness scores resides in prioritization of 

predictions, without need to specify a hard AD border. Moreover, if a 

Page 4 of 25 

significant set of external compounds is available, the formalism allows 

optimal AD borderlines to be fitted. Eventually, consensus AD definitions 

were built by means of a nonparametric mixing scheme of two AD 

metrics of comparable quality, and shown to outperform their respective 

parents. 

O7 

Quantifying model errors using similarity to training data 

Rob D Brown 1* , JD Honeycutt 1 , SL Aaron 2 

1 2 

Accelrys Inc, 10188 Telesis Court, San Diego, CA 92121, USA; Accelrys Inc, 

Cambridge, UK 


When making a prediction with a statistical model, it is not sufficient to 

know that the model is “good”, in the sense that it is able to make 

accurate predictions on test data. Another relevant question is: How good 

is the model for a specific sample whose properties we wish to predict? 

Stated another way: Is the sample within or outside the model’s domain 

of applicability or what is the degree to which a test compound is within 

the model’s domain of applicability. Numerous studies have been done 

on determining appropriate measures to address this question [1-4]. Here 

we focus on a derivative question: Can we determine an applicability 

domain measure suitable for deriving quantitative error bars – that is, 

error bars which accurately reflect the expected error when making 

predictions for specified values of the domain measure? Such a measure 

could then be used to provide an indication of the confidence in a given 

prediction (i.e. the likely error in a prediction based on to what degree 

the test compound is part of the model’s domain of applicability). Ideally, 

we wish such a measure to be simple to calculate and to understand, to 

apply to models of all types – including classification and regression 

models for both molecular and non-molecular data - and to be free of 

adjustable parameters. Consistent with recent work by others [5,6], the 

measures we have seen that best meet these criteria are distances to 

individual samples in the training data. We describe our attempts to 

construct a recipe for deriving quantitative error bars from these 

distances. 

References 

1. Eriksson L, Jaworska J, Worth AP, Cronin MTD, McDowell RM, Gramatica P: 

Methods for Reliability and Uncertainty Assessment and for Applicability 

Evaluations of Classification- and Regression-Based QSARs. Environmental 

Health Perspectives 2003, 111:1361. 

2. Tropsha A, Gramatica P, Gombar VK: The Importance of Being Earnest: 

Validation is the Absolute Essential for Successful Application and 

Interpretation of QSPR Models. QSAR Comb Sci 2003, 22:69. 

3. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: QSAR applicabilty domain 

estimation by projection of the training set descriptor space: a review. 

Altern Lab Anim 2005, 33:445-59. 

4. Stanforth RW, Kolossov E, Mirkin B: A Measure of Domain of Applicability 

for QSAR Modelling Based on Intelligent K-Means Clustering. QSAR & 

Combinatorial Science 2007, 26:837. 

5. Sheridan RP, Feuston BP, Maiorov VN, Kearsley SK: Similarity to Molecules 

in the Training Set Is a Good Discriminator for Prediction Accuracy in 

QSAR. J Chem Inf Comput Sci 2004, 44:1912. 

6. Horvath D, Marcou G, Varnek A: Predicting the Predictability: A Unified 

Approach to the Applicability Domain Problem of QSAR Models. J Chem 

Inf Comput Sci 2009, 49:49. 

O8 

A Branch-and-Bound approach for tautomer enumeration 

Torsten Thalheim * , R-U Ebert, R Kühne, G Schüürmann 

UFZ Department Ecological Chemistry - Helmholtz Centre for Environmental 

Research, Permoser Str. 15, 04318 Leipzig, <strong>German</strong>y 


For molecules with mobile H atoms, the result of quantitative structureactivity 

relationships (QSARs) depends on the position of the respective 

hydrogen atoms. Thus to obtain reliable results, tautomerism needs to be 

taken into account. 

In the last years, many approaches were introduced to achieve this. In 

this work we present a further development of our previous algorithm



based on InChI-layers. While the InChI ansatz supports only heteroatomtautomerism, 

we suggest an extension regarding carbon atoms too. 

Whereas with other tautomer generating algorithms the hydrogen shifts 

are based on pattern-rules, we try to overcome the rule constriction and 

evolve a more common solution. The advantage of our approach is quite 

simple. Due to the avoidance of a rule system with its necessity for 

exceptions to the rules, we can apply our solution to any kind of 

tautomerism definition. 

We set up a Branch-and-Bound approach, which is optimized to generate 

a complete enumeration of all tautomers, with regard to a certain 

definition, from any structure. With few and easy decisions like symmetry 

detection, we avoid a lot of calculation overhead. Decisions with 

significant influence on the algorithm efficiency are made as early as 

possible. 

We have set up several kinds of tautomer definitions and derived a stable 

definition covering the major kinds of protropic tautomerism. 

Furthermore we analyzed, what expenditure of time for large databases 

(case study: more than 70,000 entries) is needed to investigate which 

structures have tautomers and which not for more than 99% of the 

database entries. 

This study has been financially supported by the EU project OSIRIS 

(IP, contract no. 037017). 

O9 

KnowledgeSpace - a publicly available virtual chemistry space 

Carsten Detering * , H Claussen, M Gastreich, C Lemmen 

BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, <strong>German</strong>y 


Virtual high throughput screening of in-house compound collections and 

vendor catalogs is a validated approach in the quest for novel molecular 

entities. However, these libraries are small compared to the overall 

synthesizable number of compounds from validated “wet” chemical 

reactions in pharma companies or the public domain. 

In order to overcome this limitation, we designed a large virtual 

combinatorial chemistry space from publicly available combinatorial 

libraries that gives access to billions of synthetically accessible 

compounds. Together with FTrees, a fuzzy similarity calculator, the 

researcher has a means of searching this KnowledgeSpace for analogues 

to one or several query molecules within a few minutes. The resulting 

compounds not only exhibit similar properties to the query molecule(s), 

but also feature an annotation through which of the synthetic routes 

these molecules can be made. Results can be expected to be diverse, 

based on FTrees scaffold hopping capabilities, and provide ideas for hit 

follow-up into novel compound classes. 

In this contribution we present the design and properties of the 

KnowledgeSpace and other in-house chemistry spaces that build on the 

same strategy as well as validation of results and a number of successful 

applications including prospective results. 

O10 

3D pharmacophore alignments: does improved geometric accuracy 

affect virtual screening performance? 

Gerhard Wolber 1* , T Seidel 2 , F Bendix 2 

1 University of Innsbruck, Institute of Pharmacy, Dept. Pharmaceutical 

Chemistry, Innrain 52, 6020 Innsbruck, Austria; 2 Inte:Ligand GmbH, 

Mariahilferstrasse 74B/11, A-1070 Vienna, Austria 

E-mail: gerhard.wolber@uibk.ac.at 


Virtual screening using three-dimensional arrangements of chemical 

features (3D pharmacophores) has become an important method in 

computer-aided drug design. Although frequently used, considerable 

differences exist in the interpretation of these chemical features and their 

corresponding 3D overlay algorithms. We have recently developed an 

efficient and accurate 3D alignment algorithm based on a pattern 

recognition technique [1]. In the presented work, we extend this 

algorithm to be used for high-performance virtual database screening 

and investigate, whether applying this geometrically more accurate 3D 

Page 5 of 25 

alignment algorithm improves virtual screening results over conventional 

incremental n-point distance matching approaches. 

Reference 

1. Wolber G, Dornhofer A, Langer T: Efficient overlay of small molecules 

using 3-D pharmacophores. J Comput-Aided Mol Design 2006, 

20(12):773-788. 

O11 

The protein flexibility in receptor-ligand docking simulations 

Frank Tristram 

Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermannvon-Helmholtz-Platz 

1, 76344 Eggenstein-Leopoldshafen, <strong>German</strong>y 


Many small-molecule drugs work by binding specifically to a target 

protein in the cell. It is known for over a century that both the ligand 

and protein receptor change their conformation in the association 

process, which is called the induced-fit effect. Ligand conformational 

change is routinely treated in methods for in-silico drug discovery, 

because typical drug molecules have seldom more than 100 atoms. The 

flexibility of proteins, which often have more than 10000 atoms, is much 

harder to treat and was therefore neglected in most high-speed docking 

methods, limiting their accuracy and predictive value. 

In this project we have developed an efficient numerical search 

procedure that succeeds to model the interaction between a flexible 

ligand and important flexible parts of the protein in a computationally 

affordable protocol. Our virtual screening software FlexScreen thus 

minimizes the total interaction energy of the emerging protein-ligand 

complex including flexible backbone regions. We have demonstrated the 

reliability and accuracy of this approach on several examples, for which at 

least two different receptor-conformations have been experimentally 

observed. We succeeded to predict the correct binding pose of a proteinligand 

complex starting from the crystal structure of an unrelated protein 

conformation, where large conformational changes of the protein are 

necessary to bind the ligand. Correctly predicting such conformational 

changes makes our approach attractive for virtual screening of medium 

sized databases, in particular for kinases and other target receptors which 

have flexible binding pockets. 

O12 

Random molecular substructures as fragment-type descriptors 

José Batista 1,2 

1 Jado Technologies GmbH, Tatzberg 47-51, D-01307 Dresden, <strong>German</strong>y; 

2 Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical 

Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universtität 

Bonn, Dahlmannstr. 2, D-53113 Bonn, <strong>German</strong>y 


A novel approach for analysis of structure-activity relationships for sets of 

active compounds is reported. Molecular similarity relationships are 

analyzed among compounds with related biological activity based on the 

evaluation of randomly generated fragment populations. Fragments are 

randomly generated by iterative bond deletion in molecular graphs and 

hence depart from those obtained by well-defined fragmentation 

schemes [1]. A major finding has been that for any given activity class, 

dependency relationships of fragment co-occurrence exists within 

random fragment populations. Through the analysis of these relationships 

the chemical information content of generated substructures can be 

quantified [2]. This has lead to the identification of small numbers (10- 

40) of so-called activity class characteristic substructures (ACCS) that have 

been successfully utilized in virtual screening applications [3,4]. 

References 

1. Batista J, Godden JW, Bajorath J: J Chem Inf Model 2006, 46(5):1937-1944, 

(PMID: 16995724). 

2. Batista J, Bajorath J: J Chem Inf Model 2007, 47(4):1405-1413, (PMID: 

17585755). 

3. Batista J, Bajorath J: Chem Med Chem 2008, 3(1):67-73, (PMID: 17952883). 

4. Stumpfe D, Frizler M, Sisay MT, Batista J, Vogt I, Gütschow M, Bajorath J: 

Chem Med Chem 2009, 4(1):52-54, (PMID: 19053132).



O13 

Making people’s lives easier, better and more beautiful - and how 

computer-aided materials design may help 

Thomas Kostka 

Henkel AG & KGaA, Henkelstrasse 67, 40191 Düsseldorf, <strong>German</strong>y 


Computer-based simulations provide new insights at the molecular scale 

of chemical processes and material properties - new products may be 

developed more economically by a more focused strategy. Thus, the 

overall product development cycle for new materials and active 

ingredients can be reduced significantly. 

R&D explores and evaluates continuously these possibilities and utilizes 

the enormous potential linked with scientific computing technologies. 

The method portfolio ranges from quantum chemical, molecular 

mechanics and mesoscopic simulations to QSAR approaches. 

Examples will be given how scientific computing methods contribute to 

innovative products with state-of-the-art molecular modeling and 

mathematical methods - both, for consumer goods and industrial 

applications. 

O14 

Biomaterialcoatings - a challenging task studied by the molecular 

fragment dynamics 

Carsten Wittekindt * , H Kuhn 

CAM-D Technologies GmbH, Schützenbahn 70, 45117 Essen, <strong>German</strong>y 

E-mail: wittekindt@molecular-dynamics.de 


A special and important task in medicinal implant technology is to 

preventthematerialfromfoulingormakeitcompatiblefortissuesby 

coating the material with functionalized lipid biomolecules. In order to 

rationalize the design of such new materials structural information about 

the film is needed. In this presentation we give an account on the 

investigation of coating of the tetraetherlipid GDNT on a borofluorate 

surface in different solvents. To gain insights into the dynamic process on 

the microsecond and micrometer scale we applied our recently 

developed Molecular Fragment Dynamics method (MFD) [1]. 

Aside of the size of the system and the duration of the simulation, the 

challenging task of the investigation lies in the divers chemical 

functionality of the compounds. The applied MFD algorithm is developed 

to resolve especially these problems. The MFD algorithm is based on the 

Dissipative-Particle-Dynamics method (DPD). Whereas in the DPD method 

the system is divided into different regions of fluid subsystems, the MFD 

algorithm calculates the interaction of molecular fragments. 

Consequently, the MFD method is well suited for the molecular 

modelling simulation of systems having a diverse variety of chemical 

functionalities thus allowing for investigation of biocoatings, polymers 

and tensids. Figure 1. 

Reference 

1. Ryjkina E, Kuhn H, Rehage H, Müller F, Peggau J: Molecular Dynamic 

Computer Simulations of Phase Behavior of Non-Ionic Surfactants. 

Angew Chem Int Ed 2002, 41:983. 

Figure 1 (abstract O14). 

O15 

Mechanistic studies on the ring-opening polymerisation of D,L-lactide 

with zinc guanidine complexes 

S Herres-Pawlis 1* , J Börner 2 , Vieira I dos Santos 2 , U Flörke 2 

1 2 

TU Dortmund, Fakultät Chemie, 44221 Dortmund, <strong>German</strong>y; University of 

Paderborn, Paderborn, <strong>German</strong>y 


The increasing shortage of resources has enforced the search for 

production techniques for biodegradable polymers made of renewable 

raw materials. An example is polylactide (PLA), an aliphatic polyester, 

which can be obtained by the metal-catalysed ring-opening 

polymerisation (ROP) of lactide (Fig. 1). The monomer for PLA production 

is available from corn or sugar beets by a bacterial fermentation process 

in few steps [1] The PLA can be either recycled or composted after use, 

making the use CO 2-emission-neutral. Most large-scale processes are 

based on the use of tin compounds as initiators [2]. However, for use in 

food packaging or similar applications, heavy metals are undesirable [3]. 

In order to substitute heavy-metal-based catalyst systems, especially zinc 

guanidine complexes are suited for the polymerisation due to their 

polymerisation activity and non-toxicity [4]. The investigations on the 

catalytic potential in the bulk polymerisation of lactide have shown that 

these compounds are able to act as initiators for lactide polymerisation 

with only few exceptions. Polylactides with molecular weights (MW) 

between 20.000 and 170.000 g/mol could be obtained. 

As the polymerisation activity depends strongly on the ligand structure 

(spacer, guanidine and amine unit), DFT studies on the complex 

properties have been performed [5]. A correlation between calculated 

partial charges on the zinc and the guanidine N atoms and the catalytic 

activity has been found. The guanidine is supposed to open 

nucleophilically the lactide ring and supersede the presence of alkoxides 

which is traditionally required. This crucial step in the ROP has been 

analysed by DFT and a mecha-nism will be presented. The deeper 

understanding of the catalytic transition state will enable a more efficient 

access to catalyst design. Figure 2. 

Figure 1 (abstract O15). ROP of D,L-lactide 

Figure 2 (abstract O15). General motif for hybridquanidines 

Page 6 of 25



References 

1. Bigg DM: Adv Polym Tech 2005, 24:69. 

2. Kricheldorf HR, Meier-Haak J: Macromol Chem 1993, 194:715. 

3. Kricheldorf HR: Chemosphere 2001, 43:49. 

4. Börner J, Herres-Pawlis S, Flörke U, Huber K: Eur J Inorg Chem 2007, 5645. 

5. Börner J, Flörke U, Huber K, Döring A, Kuckling D, Herres-Pawlis S: Chem Eur 

J 2009, 15:2362. 

O16 

ChemSpider - building a foundation for the semantic web by hosting 

a crowd sourced databasing platform for chemistry 

Anthony J Williams 1* , Valery Tkachenko 2 , Sergey Golotvin 2 , Richard Kidd 3 , 

Graham McCann 3 

1 

Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC-27587, USA; 

2 3 

NCBI, NLM, National Institutes of Health, Washington, USA; Royal Society of 

Chemistry, Cambridge, UK 


There is an increasing availability of free and open access resources for 

chemists to use on the internet. Coupled with the increasing availability 

of Open Source software tools we are in the middle of a revolution in 

data availability and tools to manipulate these data. ChemSpider is a free 

access website for chemists built with the intention of providing a 

structure centric community for chemists. It was developed with the 

intention of aggregating and indexing available sources of chemical 

structures and their associated information into a single searchable 

repository and making it available to everybody, at no charge. 

There are tens if not hundreds of chemical structure databases such as 

literature data, chemical vendor catalogs, molecular properties, 

environmental data, toxicity data, analytical data etc. and no single way 

to search across them. Despite the fact that there were a large number of 

databases containing chemical compounds and data available online their 

inherent quality, accuracy and completeness was lacking in many regards. 

The intention with ChemSpider was to provide a platform whereby the 

chemistry community could contribute to cleaning up the data, 

improving the quality of data online and expanding the information 

available to include data such as reaction syntheses, analytical data, 

experimental properties and linking to other valuable resources. It has 

grown into a resource containing over 21 million unique chemical 

structures from over 200 data sources. 

ChemSpider has enabled real time curation of the data, association of 

analytical data with chemical structures, real-time deposition of single 

or batch chemical structures (including with activity data) and 

transaction-based predictions of physicochemical data. The social 

community aspects of the system demonstrate the potential of this 

approach. Curation of the data continues daily and thousands of edits 

and depositions by members of the community have dramatically 

improved the quality of the data relative to other public resources for 

chemistry. 

This presentation will provide an overview of the history of ChemSpider, 

the present capabilities of the platform and how it can become one of 

the primary foundations of the semantic web for chemistry. It will also 

discuss some of the present projects underway since the acquisition of 

ChemSpider by the Royal Society of Chemistry. 

O17 

Integration of chemical information with protein sequences and 

3D structures 

Adel Golovin * , K Henrick, G Kleywegt 

EMBL-EBI/PDBe, Hinxton Hall, Genome Campus, Cambridge, Cams, 

CB10 1SD, UK 

E-mail: golovin@ebi.ac.uk 


The Protein Data Bank (PDB) contains a wealth of small molecule - macro 

molecule complexes the study of which contribute enormously to our 

understanding of the interactions. However, exploiting and mining this 

treasure trove of data requires advanced analysis and retrieval methods 

that take into account both types of molecules. One such method is 

PDBeMotif, that has been developed by the Protein Data Bank in Europe 

Page 7 of 25 

(PDBe) at EMBL-EBI. Utilizing a relational database model at the back-end, 

the data structure represents a network of molecule, residue and motif 

interactions as well as their relative positions in the sequence and in 3D. 

The loader applies a number of algorithms to analyse PDB and derive 

necessary information, such as planarity and aromaticity of the chemical 

compounds, hydrogen-bonds network, coordination geometry, bond 

types (including pi electron interactions), 3D structural motifs, sequence 

domains and families. It collects information about sequence features, 

motifs and catalytic sites from available Distributed Annotation System 

(DAS) resources. The web application allows for a wide variety of searches 

and data analysis including protein motifs with chemical fragments 

association, protein sites characterisation, correlating properties, hits 

multiple sequence and 3D alignments. The whole system is released 

under GPL and available with the source code from http://sourceforge. 

net/projects/pdbsam and on line at http://www.ebi.ac.uk/pdbe-site/ 

PDBeMotif/ 

References 

1. Golovin A, Henrick K: Chemical Substructure Search in SQL. J Chem Inf 

Model 2009, 49(1):22-27. 

2. Golovin A, Henrick K: MSDmotif: exploring protein sites and motifs. 

BMC Bioinformatics 2008, 9:312. 

3. Golovin A, Dimitropoulos D, Oldfield T, Rachedi A, Henrick K: MSDsite: A 

Database Search and Retrieval System for the Analysis and Viewing of 

Bound Ligands and Active Sites. PROTEINS: Structure, Function, and 

Bioinformatics 2005, 58(1):190-9. 

4. Golovin A, Oldfield TJ, Tate JG, Velankar S, Barton GJ, Boutselakis H, 

Dimitropoulos D, Fillon J, Hussain A, Ionides JMC, John M, Keller PA, 

Krissinel E, McNeil P, Naim A, Newman R, Pajon A, Pineda J, Rachedi A, 

Copeland J, Sitnov A, Sobhany S, Suarez-Uruena A, Swaminathan J, 

Tagari M, Tromm S, Vranken W, Henrick K: E-MSD: an integrated data 

resource for bioinformatics. Nucleic Acids Research 2004, 32 Database: 

D211-D216. 

O18 

Personalized information spaces: improved access to 

chemical digital libraries 

O Koepler * , W-T Balke, B Köhncke, I Sens, S Tönnies 

Technische Informationsbibliothek Hannover (TIB), Welfengarten,1B, 30167 

Hannover, <strong>German</strong>y 


Today digital libraries provide access to a vast, but largely unstructured, 

amount of document collections. Facing the ever increasing challenge of 

the information overload content providers have to focus on new ways in 

user-centered retrieval, not only providing tools for searching information, 

but tools for personalizing, managing, evaluating and working with the 

returned search results. 

Within the ViFaChem II research project the TIB and the L3S Research 

Center, Hannover, have developed an enhanced retrieval platform for 

chemical digital libraries. 

The prototypical interface processes full text and bibliographic metadata 

collections to create semantically enriched document collections with 

chemical metadata. Processing steps include text mining for chemical 

entities, reaction types and chemical structure reconstruction. However, 

the ViFaChem digital library does not only offer the classical document 

access via a text or chemical structure search, but also semantic search 

based on the faceted browsing paradigm. In particular it includes the 

Semantic GrowBag algorithm, which automatically creates facets for 

navigational access and query refinement using the relationship between 

documents and author keywords. Based on higher-order co-occurrences 

of keywords in documents these graphs hierarchically arrange all related 

topics dynamically with respect to the underlying content collection. 

Having different personal views on the document collection the user now 

can search and navigate through search results using individual retrieval 

strategies. Facets for chemical reactions, chemical entities or topics 

combined with an underlying ontology thus allow a semantic driven 

access to documents. 

Combined with Web 2.0 features the interface of ViFaChem II provides 

the user with a new experience in searching large document collections, 

where the user can navigate through query results based on his 

personalized knowledge space.



O19 

Current aspects and future trends of computer-aided rescaffolding 

Karl-Heinz Baringhaus * , Gerhard Hessler, Thomas Klabunde 

Sanofi-Aventis Deutschland GmbH, CAS Drug Design, Building G 878, 65926 

Frankfurt am Main, <strong>German</strong>y 


The competitive pressure in pharmaceutical industry is reflected by longer 

development times and increasing development costs yielding less new 

chemical entities. In particular, identification of suitable lead compounds 

is one of the key challenges in early drug discovery. Next to well 

established techniques like high throughput screening (HTS) and virtual 

screening computer-aided rescaffolding has become an important 

approach in detecting novel chemical structures [1,2]. 

The development of New Chemical Entities (NCEs) requires the 

exploration of novel landscapes in chemical space. Rescaffolding is in 

particular important in lead identification but also in lead optimization. If 

a lead series cannot be optimized in multidimensional space, scaffoldrescuing 

is often perceived as back-up strategy to transfer hitherto 

available SAR into a new scaffold. 

With a binding site or a pharmacophore in hand often de novo design 

methods are applied to identify novel chemical matter. However, de novo 

design differs from rescaffolding in that the goal of the first is primarily to 

generate new molecules in chemical space while the latter aims to design 

new scaffolds under constraints: high similarity in the desired property 

space and novelty in scaffold space. This allows jumping out of the known 

region of chemical space towards new regions of chemistry resulting in 

molecules with similar properties and activities but with novel frameworks. 

This talk will focus on ligand-based rescaffolding by taking into account 

only the topology of reference molecules. Several descriptors have been 

successfully applied for rescaffolding purposes, e.g. CATS (including 

CATS3D), Feature Trees (cf. ReCore), Topomers, Gaussian shape (cf. ROCS, 

BROOD) and others. In fragment-based rescaffolding new molecules are 

assembled from small (sub-) molecular fragments or building blocks often 

by applying pseudo-chemical rules in their recombination (e.g., RECAP). 

The use of fragments reduces the sampling rate of chemical space and 

provides access to synthesizable new compounds. This will be 

exemplified by two rescaffolding examples taking into account 2D as well 

as 3D descriptors. 

References 

1. Krueger BA, Dietrich A, Baringhaus K-H, Schneider G: Scaffold-Hopping 

Potential of Fragment-Based De Novo Design: The Chances and Limits of 

Variation. Combinatorial Chemistry & High Throughput Screening 2009, 

12(4):383-396. 

2. Mauser H, Guba W: Recent developments in de novo design and scaffold 

hopping. Current Opinion in Drug Discovery & Development 2008, 

11(3):365-374. 

O20 

The membrane bound aromatic p-hydroxybenzoic acid 

oligoprenyltransferase (UbiA) - how iterative improvements lead to a 

realistic structure that offers new insights into functional aspects of 

prenyl transferases and terpene synthases 

W Brandt * , J Kufka, D Schulze, E Schulze, F Rausch, L Wessjohann 

Leibniz Institute of Plant Biochemistry, Department of Bioorganic Chemistry, 

Weinberg 3, D-06120 Halle (Saale), <strong>German</strong>y 

E-mail: wbrandt@ipb-halle.de 


Prenyltransfering enzymes are at the basis of the vast isoprenoid natural 

product diversity. 4-Hydroxybenzoate oligoprenyltransferase of E. coli, 

encoded in the gene ubiA, is a key enzyme in the biosynthetic pathway 

to ubiquinone. No X-ray structure exists of this membrane protein. It 

catalyses the prenylation of 4-hydroxybenzoic acid in the 3-position using 

an oligoprenyl diphosphate as second substrate. 

With homology modeling techniques, a first model indicated two putative 

active sites[1]. Based on this model, amino acids identified as important 

for the catalytic mechanism were selectively replaced to obtain five new 

mutants. All mutants were tested for their ability to form geranylated 

hydroxybenzoate from geranyl diphosphate, but only the unmodified 

Page 8 of 25 

UbiA-enzyme and to minor extent one mutant showed enzymatic activity. 

These results indicated the involvement of all mutated amino acids in the 

catalytic mechanism but at the same time demanded a remodelling of 

the previously proposed enzyme structure combining two active sites. 

They have been placed into close proximity in the new model. Based on 

these experimental results and structural classification of prenyl enzymes, 

a highly relevant 3D-model could be developed. This model is able to 

explain a wide range of substrate specificities and is in complete 

agreement with the results of site directed mutagenesis [2]. 

Aromatic prenyl transferases, prenyl diphosphate synthases, and terpene 

synthases, activate an (olig)prenyl diphosphate to form a stabilized prenyl 

cation reactive intermediate that, after addition to a nucleophile (C=C 

bond) and deprotonation delivers the product(s). Aromatic amino acids 

have been suggested to stabilize the cation intermediate. Ab inito LMP2 

calculations indicate not only stabilization of a prenyl cation by aromatic 

amino acid side chains but also by a methionine side chain. This 

suggestion is supported by site directed mutagenesis, bioinformatics, and 

modelling studies. In addition, a new catalytic diad composed of Tyr and 

Asp, represented by a Yx(x)xxD-motif, is identified as important player for 

deprotonation and proton-relay in intermediates, or for the finalizing 

deprotonation step of many prenyl transferring and cyclizing enzymes. 

References 

1. Brauer L, Brandt W, Wessjohann L: J Mol Model 2004, 10:317-327. 

2. Brauer L, Brandt W, Schulze D, Zakharova S, Wessjohann L: Chembiochem 

2008, 9:982-992. 

O21 

CELLmicrocosmos 2.2: advancements and applications in modeling of 

three-dimensional PDB membranes 

Björn Sommer 1* , T Dingersen 1 , S Schneider 2 , S Rubert 1 , C Gamroth 1 

1 

University of Bielefeld, Universitätsstraße 25, 33615 Bielefeld, <strong>German</strong>y; 

2 

D-85748 Garching, <strong>German</strong>y 


Background: Today, only a few programs support membrane 

computation and/or modeling in 3D. They enable the user to create very 

simple-structured membrane layers and usually assume a high level of 

bio-chemical/-physical knowledge. The CELLmicrocosmos 2 project 

developed a tool providing a simplified workflow to create membrane 

(bi-)layers: The MembraneEditor (CmME). 

Results: The geometry-based, scalable and modular computation concept 

supports fast to more complex membrane generations. CmME is based on 

the integration of two different types of PDB [1] models: Lipids are 

integrated with editable percental distribution values and algorithms. 

Proteins are inserted and aligned into the bilayer manually or automatically, 

by using data from the PDB_TM [2] or OPM [3] database. Compatibility with 

other programs is offered by extensive PDB format export settings. High 

lipid densities are possible through advanced packing algorithms. Lipid 

distributions can be developed by using the Plugin-Interface. Originally not 

intended to change the atomic structure of the molecules due performance 

issues, now it is also possible to access the atomic level for user-defined 

computations. Multiple membrane (bi-)layers and microdomains are 

supported as well as a reengineering function providing the re-editing of 

externally simulated PDB membranes. 

Conclusions: The capabilities of CmME has been extended and tested to 

meet the requirements of different PDB visualization programs as well 

as molecular dynamics (MD) simulation environments like Gromacs [4]. 

The documentation and the Java Webstart application, requiring only 

an internet connection and Java 6, is accessible at: http://Cm2. 

CELLmicrocosmos.org 

References 

1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, 

Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research 

Oxford University Press, Oxford 2000, 28:235-242. 

2. Tusnády GE, Dosztányi Z, Simon I: Transmembrane proteins in the Protein 

Data Bank: identification and classification. Bioinformatics 2004, 

20(17):2964-2972. 

3. Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI: OPM: Orientations of 

Proteins in Membranes database. Bioinformatics 2006, 22:623-625. 

4. Lindahl E, Hess B, Spoel van der D: GROMACS 3.0: A package for 

molecular simulation and trajectory analysis. J Mol Mod 2001, 7:306-317.



O22 

Approaching the limits: scoring functions for affinity prediction 

Christoph Sotriffer 

Institut für Pharmazie und Lebensmittelchemie, Universität Würzburg, 

Am Hubland, D - 97074 Würzburg, <strong>German</strong>y 


One of the main tasks of computational methods in ligand design and 

lead identification is the elucidation and assessment of interaction modes 

between small-molecule ligands and protein target structures. This 

generally requires to estimate the relative or absolute affinity of a 

protein-ligand complex from its three-dimensional coordinates. Although 

statistical thermodynamics would in principle provide the necessary 

equations to calculate free energies of binding from molecular properties, 

these equations are not readily amenable to computation since 

appropriate ensembles of the solvated systems must be generated and 

thoroughly sampled, which normally requires prohibitively long 

computing times. For the purpose of drug design, and virtual screening 

in particular, simpler and faster methods are needed, which are 

commonly referred to as scoring functions. 

In general, three major classes of scoring functions can be 

distinguished: force-field based methods, knowledge-based potentials, 

and empirical scoring functions. For any of these classes, many different 

functions have been developed over the past years. Although largescale 

and truly unbiased comparative assessments of their performance 

are relatively rare due to inherent difficulties in setting up appropriate 

test sets, the strengths and limitations of current scoring functions are 

fairly evident from the available data. In general, good results can be 

obtained in the identification or reproduction of experimentally 

observed binding modes. The ability to distinguish active ligands from 

decoys appears, at least, to be sufficient to make virtual screening a 

practically useful endeavour. However, the correlation with the 

experimental binding free energy and the possibility to quantitate the 

effects of small structural changes on the ligand affinity are in many 

cases still disappointing. 

Based on the approximations used by current scoring functions, strategies 

for improvement can be defined. On the other hand, there are some 

fundamental problems, also with respect to the underlying experimental 

data, which suggest that the best functions presently available may 

already be close to the limit of what can be achieved with empirical 

approaches. 

O23 

Ion permeation through neuronal nAChR ion channel 

Jens Krüger * , G Fels 

University of Paderborn, Warburger Strasse 100, 33100 Paderborn, <strong>German</strong>y 

Journal of Cheminformatics 2010, 2(Suppl 1):O23 

Alzheimer’s dementia is related to loss of inter-neuron communication by 

nicotinic acetylcholine receptors (nAChR) at post synaptic neurons, and 

successful therapy approaches rely e.g. on modulations of response 

signals initiated by ion flux through the receptor integrated ion channel. 

We have evaluated Umbrella Sampling (US) and Steered Molecular 

Dynamics (SMD) pulling for calculation of the potential of mean force 

(PMF) of ion permeation through neuronal alpha7 nAChR, a cation 

selective channel, in order to understand the changes in electrical 

potentials connected to channel gating. SMD enables the discrimination 

between ion in- and efflux and allows detailed insight into permeation 

pathways (Figure 1). 

Both methods show the same performance characteristics but require a 

different number of simulations with different simulation length. For 

construction of one PMF 300 separate 300 ps simulations are performed 

for US, but only one 2100 ps run for SMD pulling. Reasonable error 

estimates require three repetitions of US but two times twelve repetitions 

for SMD, which sum up to 9360 CPU hours in case of US and 6490 for 

SMD. SMD, therefore, needs 30% less time on low latency HPC clusters, 

but US gains advantage through distributed computing. 

The potential energy barriers are calculated for Na + and Cl - based on US 

and SMD pulling, enabling differentiation between ion types and their 

in- and efflux through the channel. 

Figure 1 (abstract O23). Cut through the transmembrane part of 

the nAChR. The thin gray lines correspond to individual permeation 

pathways. 

O24 

Fleksy: a flexible approach to induced fit docking 

Markus Wagener 1* , SB Nabuurs 2 , J de Vlieg 1 

1 Schering-Plough Research Institute, PO Box 20, 5340BH Oss, 

The Netherlands; 2 Nijmegen, The Netherlands 


Protein receptor rearrangements upon ligand binding are a major 

complicating factor in structure-based drug design. An accurate prediction 

of these so-called induced fit phenomena calls for ligand docking and 

virtual screening approaches capable of considering receptor flexibility. 

We present Fleksy [1], a flexible approach aimed at accurately positioning 

small molecule ligands into a protein receptor, while taking both ligand 

and receptor flexibility into account. Our method consists of an ensemble 

docking stage in which the ligand of interest is docked into a structural 

ensemble of receptor conformations, followed by a complex optimization 

stage during which both ligand and protein are allowed to move. Pivotal 

to our method is the use of receptor ensembles to describe protein 

flexibility. To construct these ensembles we use a backbone dependent 

rotamer library and implement the concept of interaction sampling. The 

latter allows for the evaluation of different orientations and, when 

relevant, different tautomers of ambivalent interaction partners in the 

binding site such as asparagine, glutamine and histidine side chains. The 

docking stage comprises an ensemble-based soft-docking experiment 

using FlexX-Ensemble [2], followed by an effective flexible receptor-ligand 

complex optimization using Yasara [3]. Ultimately Fleksy results in a set of 

receptor-ligand complexes ranked using a consensus scoring function 

which combines both docking scores and force field energies. Figure 1. 

Averaged over three cross-docking datasets, in total containing 35 

different pharmaceutically relevant receptor-ligand complexes, Fleksy 

reproduces the observed binding mode within 2.0 Å for 78% of the 

complexes. This compares favorably to the rigid receptor FlexX program 

[4] which on average reaches a success rate of 44% for these datasets. 

References 

1. Nabuurs SB, Wagener M, de Vlieg J: J Med Chem 2007, 50:6507. 

2. Claussen H, Buning C, Rarey M, Lengauer T: J Mol Biol 2001, 308:377. 

3. [http://www.yasara.org/]. 

4. Rarey M, Kramer B, Lengauer T, Klebe G: J Mol Biol 1996, 261:470. 

Figure 1 (abstract O24). 

Page 9 of 25



POSTER PRESENTATIONS 

P1 

CWM global search - an internet search engine for the chemist 

Alexander Kos * , H-J Himmler 

AKos GmbH, Austr. 26, D-79585 Steinen, <strong>German</strong>y 

Journal of Cheminformatics 2010, 2(Suppl 1):P1 

The Internet is a rich source of data and information for chemist. There are 

numerous multidisciplinary databases available for free on the Internet. Some 

examples of such data repositories are: PubChem, ChemSpider, eMolecules, 

Drugbank, KEGG, NIST, ChemSynthesis, PharmGKB, Free patents online ... 

It should be obvious that an end user is a) not aware of all the resources, 

and b) has not the time to learn every user interface and is unable to 

search over all of them. We provide CWM Global Search as an application 

that enables to search by structure, CAS Registry Number and free text 

over all these sources. Presently CWM Global Search performs searches in 

30 databases and search engines accessing more than 100 million pages 

that associate data with structures. 

The user can submit a single query structure or several using SDFiles. In 

addition to molecule searches CWM Global Search also allows to submit 

reaction queries. In that case several single molecule searches are performed 

for the reactants, reagents and products. This makes it easy to find 

commercial suppliers and other synthesis relevant information such as 

safety sheets in one query. 

Searching is technically less problematic than providing the answers in a 

digestible way for the user. Our first approach is to provide profiles for 

searching. You can choose “Availability” if you are interested to find a 

commercial supplier, or “Biology” if you are looking for biological effects. 

The second help comes when we display the summary of the results. You 

get a table with hyperlinks color coded by topics. If the result page of doing 

asearchcontainsalinktoanMSDS,thetopic‘Safety’ is highlighted. With 

profiles you limit your search to certain sources, and with topics your 

answers will be ordered. 

CWM Global Search is not the application for exact searches like “give me 

the melting point of anthracene”. You will find the melting point on 

many pages, and the topic “Physical Property” might help, but, in this 

case the link to Wikipedia gives the result quickest. Internet pages 

provide us the data in unordered fashion and finding the exact answers 

is time consuming. In some case like commercial suppliers we check 

against an internal list if the page really displays a supplier, or, if for 

instance PubChem has only a reference to ChemSpider, and ChemSpider 

references again just PubChem, but nowhere you will find a supplier. We 

also have to consider that many providers of the resources would not 

allow us to extract data directly without leading the user to their Internet 

pages. The nature of the results is fuzzy in CWM Global Search. This is an 

advantage if you look for instance for biological effects, which can be 

many, and/or if you want to learn why a compound could be important. 

We generate both InChI names and keys for the query structure and 

afterwards perform fast text searches in the various databases or search 

engines. In addition we also generate Smiles. Since prior to the existence 

of standard InChI’s (released 2009) the various database providers used 

different settings to generate the InChI’s stored in their database, we use 

multiple settings when generating an InChI identifier used for a structure 

search in CWM Global Search. This way we greatly maximize the chance to 

find an InChI independent of the settings used by the database provider. 

P2 

The status of the InChI project and the InChI trust 

Stephen Heller 1* , Alan McNaught 2 

1 

NIST, PCPD, MS-8380, 100 Bureau Drive, Gaithersburg, 20899-8380, USA; 

2 

InChI Trust, Cambridge, UK 


The IUPAC InChI/InChIKey project has evolved to the point where its 

future development and promulgation require a new management 

system that will provide stable and financially viable administrative 

arrangements for the foreseeable future. This is necessary to give the 

world-wide chemistry community that IUPAC serves the confidence that 

facilities for development, maintenance and support of the InChI/InChIKey 

Page 10 of 25 

algorithm are firmly established on anongoingbasis,andwillensure 

acceptance and usage of InChI as a mainstream standard. This new 

management system is provided by means of a new organization, 

the InChI Trust, an independent not-for-profit entity paid for by 

the community and those who use and benefit from the InChI algorithm. 

The mission of the Trust is quite simple and limited; its sole purpose is to 

create and support administratively and financially a scientifically robust 

and comprehensive InChI algorithm and related standards and protocols. 

This presentation will describe the current technical state of the InChI 

algorithm and how the InChI Trust is working to assure the continued 

support and delivery of the InChI algorithm. 

P3 

jsMolEditor: an open source molecule editor for the next 

generation web 

L Duan 

School of Pharmacy, East China University of Science and Technology, 

130 Meilong Road, Shanghai 200237, PR China 


Molecule editor have become essential building blocks for modern 

chemistry related databases with structural operation available. It was quite 

a challenging task to input molecule structures in a web browser before 

Java Applets technology was applied in chemoinformatics. Until now, Java 

Applets are still the de facto standard for web based databases. With the 

development of web technologies and script programming, Ajax, a Web 2.0 

technology, is becoming mainstream in web development. This gradually 

eliminates the need for plug-ins such as Java and ActiveX. Therefore, it is 

possible to develop powerful web-based tools without plug-ins. 

This poster introduces a JavaScript based molecule editor: jsMolEditor, 

which had made a significant difference from other editors available on 

web pages. Unlike server side editors such as PubChem editor [1], 

jsMolEditor implemented all its functionality in JavaScript on the client 

side. Thus it is not necessary to maintain a connection to a server during 

operation. jsMolEditor works completely independently on the client side. 

The JavaScript nature of jsMolEditor also made it able to run in standard 

web browsers without specific plugins or virtual machines. The 2D drawing 

system deals with web browser differences. By combining Canvas and 

VML, jsMolEditor was able to work on all common web browsers including 

Internet Explorer, Firefox, Safari, Opera and Chrome. Instead of building 

jsMolEditor in JavaScript from scratch, GWT [2] is utilized as a crosscompiler 

from Java to JavaScript, which enables the reuse of existing Java 

chemoinformatics frameworks. jsMolEditor used a ported version of MX [3] 

as the bottom layer framework to handle basic molfile I/O. Thanks to MX’s 

open source license, the development of jsMolEditor saves a lot of time by 

eliminating the process of “reinventing the wheel”. The poster represents 

not only jsMolEditor itself, but also the building process of jsMolEditor and 

the feasibility of applying the same concept to other chemistry gadgets on 

web pages, such as spectrum viewers. 

jsMolEditor’s online demo is available at http://chemhack.com/jsmoleditor/ 

and the source code is available at http://github.com/chemhack/jsmoleditor/. 

References 

1. PubChem Server Side Structure Editor. [http://pubchem.ncbi.nlm.nih.gov/ 

edit/]. 

2. GWT, Google Web Toolkit. [http://code.google.com/webtoolkit/]. 

3. MX Chemoinformatics Toolkit. [http://metamolecular.com/mx]. 

P4 

Mining public-source databases for structure-activity relationships 

Bernd Wendt 1* , Fabian Bös 2 , Ulrike Uhrig 2 

1 Elara Pharmaceuticals, Heidelberg, <strong>German</strong>y; 2 Tripos International, 

Martin-Kollar-Str. 17, München, <strong>German</strong>y 


Modeling off-target effects has become a highly important and relevant 

component of the computational chemistry toolset. The presentation will 

describe a new contribution that seeks to allow the extraction of 3Dstructure-activity 

relationships from public information starting from a 

chemical structure. Several public source databases such as PubChem [1] 

offering structure as well as activity information for a number of targets



have been examined for their value in extracting useful structure-activity 

relationships (SARs). A Topomer search [2] of public-source databases using 

the structures of a set of 255 marketed drugs [3] as queries yielded sets of 

shape- and pharmacophore similar hits. SAR-tables were constructed by 

collecting hits around each query structure and for a particular reported 

activity. A new method: quantitative series enrichment analysis (QSEA) [4] 

was applied to these SAR-tables to capture trends and to transform these 

trends into 3D-QSAR models. Overall more than 400 SAR-tables with 

Topomer CoMFA models were found by extracting trends from the 

PubChem and ChemBank [5] database. The resulting models were able to 

highlight the structural details of certain off-target effects of marketed drugs 

even in those cases where the traditional structural similarity would 

conclude that no off-target effect would exist. This demonstrates the 

usefulness of the approach in modeling off-target effects. 

References 

1. [http://pubchem.ncbi.nlm.nih.gov]. 

2. Cramer RD: J Chem Inf Comp Sci 2004, 44:1221-1227. 

3. Cleves AE, Jain AN: J Med Chem 2006, 49:2921-2938. 

4. Wendt B, Cramer RD: J Comp Aided Mol Des 2008, 22:541-555. 

5. [http://chembank.broad.harvard.edu/]. 

P5 

OCHEM - on-line CHEmical database & modeling environment 

Sergii Novotarskyi 1* , Iurii Sushko 1 , R Körner 1 , Anil Pandey Kumar 1 , 

Matthias Rupp 1 , VV Prokopenko 2 , Igor Tetko 1 

1 Institute of Bioinformatics and Systems Biology - Helmholtz Zentrum 

Muenchen, <strong>German</strong> Research Center for Environmental Health, Ingolstaedter 

Landstrasse 1, D-85764 Neuherberg, <strong>German</strong>y; 2 Kiev, Ukraine 


The main goal of OCHEM database http://qspr.eu is to collect, store and 

manipulate chemical data with the purpose of their use for model 

development. Its main features, that distinguish it from other available 

databases include: 

1. The database is open and it is based on Wiki-style principles. We 

encourage users to submit data and to correct inaccurate submitted data; 

2. The database is aimed at collecting high-quality data. To achieve this 

we require users to submit references to the article, where the data was 

published. The reference may include the article name, journal name, 

date of publication, page number, line number, etc. 

3. Since the compound properties may vary depending on the conditions, 

under which they were measured, we store the measurement conditions 

with the data to provide the users with more accurate information about 

each data point. 

The modeling framework is being developed to complement the Wiki-style 

database of chemical structures. Its main goal is to provide a flexible and 

expandable calculation environment that would allow a user to create and 

manipulate QSAR and QSPR models on-line. The modeling framework is 

integrated with the database web-interface that allows easy transfer 

of database data to the models. The web interface of the modeling 

environment is aimed to provide to the Web users easy means to create 

high-quality prediction models and estimate their accuracy of prediction and 

applicability domain. The developed models can be published on the Web 

and be accessed by other users to predict new molecules on-line. This tool is 

aimed to generate a new paradigm for structure activity relationship 

knowledge bases, making QSAR/QSPR models active, user-contributed and 

easily accessible for benchmarking, general use and educational purposes. 

The examples of the use of the database within national and EU projects will 

be exemplified. 

P6 

ChEBI: a chemistry ontology and database 

Paula de Matos * , A Dekker, M Ennis, Janna Hastings, K Haug, S Turner, 

Christoph Steinbeck 

Cheminformatics and Metabolism Team, European Bioinformatics Institute, 

Hinxton Cambridge, CB10 1SD, UK 


The bioinformatics community has developed a policy of open access and 

open data since its inception. This is contrary to chemoinformatics which 

Page 11 of 25 

has traditionally been a closed-access area. In 2004, two complementary 

open access databases were initiated by the bioinformatics community, 

ChEBI [1] and PubChem. PubChem serves as automated repository on the 

biological activities of small molecules and ChEBI (Chemical Entities of 

Biological Interest) as a manually annotated database of molecular 

entities focused on ‘small’ chemical compounds. Although ChEBI is 

reasonably compact containing just over 18,000 entities, it provides a 

wide range of data items such as chemical nomenclature, an ontology 

and chemical structures. The ChEBI database has a strong focus on 

quality with exceptional efforts afforded to IUPAC nomenclature rules, 

classification within the ontology and best IUPAC practices when drawing 

chemical structures. 

ChEBI is currently undergoing a period of restructuring which will allow it 

to incorporate the small molecule structures from (and link to) EBI’s new 

chemogenomics database ChEMBL [2], increasing its small molecules 

coverage to over 500,000 entities. We have restructured the chemical 

structure search facility to use Orchem [3] an Oracle chemistry plug-in 

using the Chemistry Development Kit [4]. The facility allows a user to 

draw a chemical structure or load one from a file and then execute either 

a substructure or similarity search. Furthermore the ChEBI text search will 

have extensive facilities for querying based on not only names but 

formula, a range of charges and molecular weight. The ability to query 

the ChEBI ontology and retrieve all children for a given entity will also be 

included. 

In order to aid the distribution of ChEBI to the chemoinformatics 

community we have extended our export formats to include an MDL sdf 

format with a lighter version consisting only of compound structure, 

name and identifier. A complete version is available with all the ChEBI 

data properties such as synonyms, cross-references, SMILES and InChI. 

Furthermore cross-references in ChEBI have been extended to include 

BRENDA the enzyme database, NMRShiftDB the database for organic 

structures and their nuclear magnetic resonance (nmr) spectra, Rhea the 

biochemical reaction database and IntEnz the enzyme nomenclature 

database. 

ChEBI is available at http://www.ebi.ac.uk/chebi. 

References 

1. Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A, 

Alcántara R, Darsow M, Guedj M, Ashburner M: ChEBI: a database and 

ontology for chemical entities of biological interest. Nucleic Acids Res 

2008, 36:D344-D350. 

2. [http://www.ebi.ac.uk/chembl/]. 

3. Rijnbeek M, Steinbeck C: An open source chemistry search engine for 

Oracle. in press. 

4. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen EL: The 

Chemistry Development Kit (CDK): An Open-Source Java Library for 

Chemo- and Bioinformatics. J Chem Inf Comput Sci 2003, 43(2):493-500. 

P7 

Comparing manual and automated extraction of chemical entities 

from documents 

Christian Tyrchan * , Sorel Muresan 

AstraZeneca R&D, LG CVGI, Pepparedsleden 1, S-43183 Mölndal, Sweden 


The chemical information landscape is changing rapidly with a yearly 

increase of over 1 million new compounds and more than 700,000 

publications related to chemistry [1]. Exploring the chemical space covered 

by relevant journals and patents is a crucial step in early stage medicinal 

chemistry projects. Extracting chemical entities from unstructured text is a 

complex task and different approaches are currently used including 

manual extraction by expert curators, text mining supported by chemical 

NER or combinations thereof [2]. The chemical information and 

corresponding annotations are subsequently stored in relational databases 

allowing for complex chemical and text queries. 

To assess the capability of chemical NER in documents and to understand 

thecoverageandaccuracyoftheunderlyingdatawecomparedthe 

chemistry extracted by manual curation (GVKBIO) and text mining 

(SureChem) from a small patent corpus. 

GVKBIO databases are populated with explicit relationships between 

compounds, assays and sequence identifiers that have been manually 

extracted from journals and patents on a large scale [3].



SureChem Portal [4] is a gateway for chemical patent search on full text 

collections for USPTO, EPO and WO. SureChem users can perform structure 

and keyword searches on more than 9 million unique compounds. 

We have selected a set of 250 patents covering various target classes and for 

which a minimum of 25 records per patents were retrieved from GVKBIO 

Patent database. The analysis was done using PipelinePilot protocols [5]. 

These initial results demonstrate the benefits and challenges of text 

mining for chemical information extraction from unstructured text. 

References 

1. Engel T: J Chem Inf Model 2006, 46:2267-2277. 

2. Banville DL, (Ed.): Chemical Information Mining: Facilitating Literature-Based 

Discovery CRC Press: Boca Raton, London New York 2009. 

3. [http://www.gvkbio.com/informatics.html]. 

4. [http://www.surechem.org]. 

5. [http://accelrys.com/products/scitegic]. 

P8 

Embedded infrastructure for primary data in chemistry 

Igor Grbavac 1* , Oliver Koepler 1 , S Dohmeyer-Fischer 2 , Gregor Fels 2 , 

Irina Sens 1 , J Brase 1 

1 <strong>German</strong> National Library of Science and Technology, Welfengarten 1B, 

30167 Hannover, <strong>German</strong>y; 2 Universität Paderborn, Department Chemistry, 

Warburgerstrasse 100, 33098 Paderborn, <strong>German</strong>y 


Researchers around the world produce vast amounts of experimental data 

each day. Primary (experimental) data are the key elements of research 

projects and scientific publications. During the scientific workflow from the 

experiment to the final scientific publication primary data is analysed, 

interpreted and condensed to a final quintessence. But until now the 

handling of primary data in chemistry contains no commonly approved 

standards concerning reusability and long-time accessibility. Predominantly 

there are no quality control, no assured long-time archival storage, no 

proofs and no exploitation of primary data and consequently there is no 

assurance of data given. The concept study “Konzeptstudie Vernetzte 

Primärdaten-Infrastruktur für den Wissenschaftler-Arbeitsplatz” of the TIB, 

the workgroup of Prof. Fels and the FIZ CHEMIE aims to identify the key 

factors for an embedded infrastructure of primary data in the field of 

chemistry. In this infrastructure, chemical primary data shall be saved 

persistently in a central database, linked and be citable by the use of DOI 

(digital object identifier), be accessible and searchable. Figure 1. 

We have carried out a questionnaire among researchers to analyse the 

scientific workflow and the chemical life-cycle of primary data in chemistry 

and to identify the requirements for processes and structures in an 

embedded scientific infrastructure for researchers. We have furthermore 

defined both a general and extended metadata scheme for the storage, 

citation and searching primary datasets. General metadata include 

elements necessary for citations while chemical metadata cover specific 

needs of describing and searching within the data itself. 

Figure 1 (abstract P8). 

P9 

Prediction of the partition coefficient between air and body 

compartments from the chemical structure 

Stefanie Stöckl * , R Kühne, R-U Ebert, G Schüürmann 

UFZ Department of Ecological Chemistry, Helmholtz Centre for 

Environmental Research, Permoserstr. 15, 04318 Leipzig , <strong>German</strong>y 


Page 12 of 25 

For PBPK modeling, partition coefficients between tissues and 

environmental compartments are required. A simple approach starts with 

the system blood/air, fat/air, and fat/blood. Employing thermodynamic 

relationships, one of the three coefficients can be calculated from the 

other two values. With respect to available human and mammal data, 

modeling efforts focus on the blood/air and fat/air partition coefficient. 

Respective data sets from literature have been collected and evaluated. 

The chemical domain of the sets is presented in terms of chemical 

structure, complexity, and polarity. Data gaps have been identified. 

The blood/air and fat/air partition coefficient data sets have been applied 

to a validation of literature models, with particular remark on the 

performance for specific compound classes. A new model for the blood/ 

air partition coefficient and a preliminary new model for the fat/air 

partition coefficient are presented. 

The study was supported by the EU projects 2FUN (contract No. 036976) 

and OSIRIS (IP, contract No. 037017). 

P10 

Modelling dissociation constants of organic acids by local molecular 

parameters 

Haiying Yu * , R Kühne, R-U Ebert, G Schüürmann 

Department of Ecological Chemistry, Helmholtz Centre for Environmental 

Research – UFZ, Permoserstr. 15, 04318, Leipzig, <strong>German</strong>y 


The dissociation constant pKa defines the ionization degree of organic 

compounds. It is an important parameter that affects their toxicity and 

environmental behavior and fate. Among the present approaches to 

predict pKa values of organic chemicals, increment methods show a 

good performance but are limited by missing values of special groups. 

The purpose of this study is to develop models for predicting the pKa of 

organic acids directly from their molecular structures. 

A quantum chemical method was introduced by employing local molecular 

parameters. Experimental pKa values of more than 1000 organic acids, 

including phenols, aromatic carboxylic acids, aliphatic carboxylic acids and 

alcohols were selected, and all molecules were optimized using the semiempirical 

AM1 Hamiltonian. Simple models with several descriptors for 

different subsets were calibrated by multilinear regression (MLR). Substituent 

positions on aromatic rings were also taken into account. 

The obtained models yield good predictive squared correlation coefficients 

q2 and superior performance compared with other quantum chemical 

models. The prediction capability is further evaluated using cross 

validation. The models unravel that the potential of oxygen atom conjoint 

to ionizable hydrogen atom to accept extra electronic charge is the single 

most important contribution to the dissociation of hydrogen atom. 

Non-linear statistical analysis methods were also applied because of the 

non-linear character of the employed descriptors. 

The study was supported by the China Scholarship Council and by the EU 

project OSIRIS (IP, contact No. 037017), which is gratefully acknowledged. 

P11 

Where are the boundaries? Automated pocket detection for 

druggability studies 

Andrea Volkamer 1* , T Grombacher 2 , Matthias Rarey 1 

1 University of Hamburg, Center for Bioinformatics, Bundesstr. 43, 20146 

Hamburg, <strong>German</strong>y; 2 Bioinformatics, Merck Serono, Frankfurter Str. 250, 

64293 Darmstadt, <strong>German</strong>y 


Computer-based prediction of protein druggability is an essential task in 

the drug development process. Early identification of disease modifying



targets that can be modulated by low-molecular weight compounds can 

help to speed up and reduce costs in drug discovery. Recently, first 

methods have been presented performing a druggability estimation 

solely based on the 3D structure of the protein [1-3]. The essential first 

step for such methods is the identification of the active site. A multitude 

of methods exist for automated active site prediction [4-6]. However, 

most methods developed for automated docking procedures do not 

explicitly focus on the definition of the boundary of the active site. Since 

druggability estimates are based on structural descriptors of the active 

site, a precise description of the active site boundaries is vital for correct 

predictions. 

In this work, we present a method to predict protein binding pockets and 

split them into subpockets such that small molecules are mostly 

contained within one sub-pocket. The method is based on a novel 

strategy to geometrically detect narrow regions in pockets. For 

druggability predictions, such pocket descriptions result in more 

meaningful structural descriptors like active site surface or volume. 

Moreover, if several structures from one protein are known, sub-pockets 

can give hints about protein flexibility and induced fit conformational 

changes. 

Our method was evaluated on 718 proteins from the PDBbind [7] data 

set, as well as 5419 proteins from the scPDB [8] data set. Binding pockets 

are correctly predicted in 94% and 93% of the datasets. 38%, respectively 

45% of the proteins from the two datasets contain pockets which can be 

divided into more than one sub-pocket. In all cases one sub-pocket 

completely covers the co-crystallized ligand. Besides the classical overlapmeasure 

of ligand versus predicted active site, we additionally considered 

the pocket coverage by the co-crystallized ligand. We found that the 

number of test cases with more than 30% pocket coverage rises from 

30% to 74% (PDBbind) and from 28% to 63% (scPDB), respectively, when 

considering sub-pockets. 

References 

1. Nayal M, Honig B: Proteins 2006, 63(4):892-906. 

2. Cheng C, et al: Nat Biotechnol 2007, 25(1):71-5. 

3. Halgren TA: J Chem Inf Model 2009, 49(2):377-89. 

4. Hendlich M: J Mol Graph Model 1997, 15(6):359-63. 

5. Weisel M, et al: Chem Cent J 2007, 1:7. 

6. An J, et al: Mol Cell Proteomics 2005, 4(6):752-61. 

7. Wang R, et al: J Med Chem 2004, 47(12):2977-80. 

8. Kellenberger E, et al: J Chem Inf Model 2006, 46(2):717-27. 

P12 

Protein negative/positively cooperative binding to 

zwitterionic/anionic vesicles 

F Torrens * , G Castellano 

Institut Universitari de Ciència Molecular, Universitat de València, Edifici 

d’Instituts de Paterna, P. O. Box 22085, 46071 València, Spain 


The role of electrostatics is studied in the adsorption of cationic 

proteins to zwitterionic phosphatidylcholine (PC) and anionic mixed PC/ 

phosphatidylglycerol (PG) small unilamellar vesicles (SUVs) [1]. For 

model proteins the interaction is monitored vs. PG content at low ionic 

strength [2]. The adsorption of lysozyme-myoglobin-bovine serum 

albumin (BSA) (isoelectric point, pI 5-11) is investigated in SUVs, along 

with changes of the fluorescence emission spectra of the proteins, via 

theiradsorptiononSUVs[3].IntheGouy-Chapmanformalismthe 

activity coefficient goes with the square of charge number [4]. 

Deviations from the ideal model indicate asymmetric location of anionic 

phospholipid in the bilayer inner leaflet, in mixed zwitterionic/anionic 

SUVs for protein-PC/PG, in agreement with experiments-molecular 

dynamics simulations. Effective SUV charge stays constant. Myoglobin-, 

DNC-melittin- and melittin-zwitterionic associations are described by a 

partition model, modulated by electrostatic charging of membrane as 

protein accumulates at interface. Provisional conclusions follow. (1) In 

mixed zwitterionic/anionic vesicles the charge effect on the protein 

binding model was analyzed. For lysozyme-anionic enough vesicles and 

myoglobin the electrostatic repulsion between cationic ad proteins 

dominates over the electrostatic attraction between ad protein dipoles. 

(2) The salt effect on the protein binding model of mixed zwitterionic/ 

anionic vesicles was analyzed. The cooperativity increases with ionic 

Page 13 of 25 

strength. The corresponding interpretation is that the electrostatic 

repulsion between cationic ad proteins decreases with increasing salt 

effect, and the electrostatic attraction between ad protein dipoles 

becomes dominant over the electrostatic repulsion between ad protein 

charges. (3) In anionic vesicles the effect of vesicle charge on protein 

binding shows that, with increasing anionic character of the vesicles, 

the protein-protein electrostatic repulsion is decreasingly important vs. 

the protein-vesicle attraction, and the electrostatic attraction between 

ad protein dipoles becomes dominant over the electrostatic repulsion 

between ad protein charges. (4) For lysozyme-mixed zwitterionic/ 

anionic vesicles and myoglobin cooperativity increases with pH. With 

increasing pH and decreasing cationic character of the protein, the 

protein-protein electrostatic repulsion is decreasingly important against 

the protein-SUV attraction, and the electrostatic attraction between ad 

protein dipoles becomes dominant over the electrostatic repulsion 

between ad protein charges. Furthermore the opposed effect is 

observed for lysozyme-zwitterionic vesicles. (5) For protein-mixed 

zwitterionic/anionic vesicle binding there is more dispersion in the 

results, which could indicate asymmetric location of anionic 

phospholipid. 

References 

1. Torrens F, Campos A, Abad C: Cell Mol Biol 2003, 49:991. 

2. Torrens F, Abad C, Codoñer A, García-Lopera R, Campos A: Eur Polym J 

2005, 41:1439. 

3. Torrens F, Castellano G, Campos A, Abad C: JMolStruct2007, 

216:834-836. 

4. Torrens F, Castellano G, Campos A, Abad C: JMolStruct2009, 

274:924-926. 

P13 

CARTESIUS: a group function based toolkit for hybrid molecular 

modelling 

Mikhail Rudenko 1* , AL Tchougreeff 1,2 

1 Institute of Physics and Technology of RAS, Nakhimovsky ave. 36/1, 117218, 

Moscow; Moscow Center for Continuous Mathematical Education, 119991, 

Moscow, Russia; 2 Aachen, <strong>German</strong>y 


Modern methods of quantum chemistry have achieved significant 

successes. Nevertheless, modeling of certain classes of compounds 

remains problematic. The main complications are both high computational 

complexity of these methods and the lack of a unique way to determine 

the ground state with correct asymptotic properties for complex systems. 

One of the possible solutions is development of the hybrid molecular 

modeling methods. 

Our approach to hybrid molecular modeling is based on the ideas of 

the group function approximation, first suggested by McWeeny [1] 

and further elaborated in [2]. This approximation lays a solid basis 

for simultaneous usage of most appropriate methods for computation 

of properties of each part of a complex system and consistent 

interpretation of the properties of the system independently of the 

methods used. 

The CARTESIUS toolkit implements the above mentioned scheme in 

both flexible and computationally efficient way. This is achieved by 

means of combining cutting-edge C++ techniques for computationally 

intensive parts of the code and the full power of Python for the control 

interface. 

As a result we have a carefully designed software system, which 

possesses both the power of the quantum chemistry codes developed 

previously with use of the group function approximation (BF [3], EHCF [4], 

BFSCF [5] etc.) and high potential for further enhancements. 

This work is partially supported by the RFBR through the grant No 07- 

03-01128. 

References 

1. McWeeny R: Methods of Molecular Quantum Mechanics Academic, London, 

2 1992. 

2. Tchougréeff AL: Hybrid Methods of Molecular Modeling Springer 2008. 

3. Tokmachev M, Tchougréeff AL: J Phys Chem A 2003, 107:358. 

4. Soudackov AV, Tchougréeff AL, Misurkin IA: Theor Chim Acta 1992, 

83:389. 

5. Tchougréeff AL, Tokmachev AM: Int J Quant Chem 2006, 106:571.



P14 

Molecular fragments chemoinformatics 

Hubert Kuhn 1* , Stefan Neumann 2 , Christoph Steinbeck 3 , Carsten Wittekindt 1* , 

Achim Zielesny 4 

1 CAM-D, Essen, <strong>German</strong>y; 2 GNWI, Oer-Erkenschwick, <strong>German</strong>y; 3 European 

Bioinformatics Institute (EBI), Hinxton, UK; 4 University of Applied Sciences of 

Gelsenkirchen, Institute for Bioinformatics and Chemoinformatics, 

Recklinghausen, <strong>German</strong>y 


The description of chemical structures as a collection of connected 

molecular fragments is a basic requirement of coarse grained simulation 

methods like molecular fragment dynamics. These methods use 

molecular fragments as their basic interacting entities (“atoms”) and 

allow the modelling and investigation of very large chemical systems. 

Therefore a molecular fragments chemoinformatics is in need that 

supports the fragment-based representation of chemical structures as 

well as the elementary operations upon them. The poster outlines 

definitions and approaches to tackle these issues from an adequate 

molecular line notation up to the graphical representation of simulation 

boxes. 

P15 

A theoretical investigation of microhydration of amino acids 

Catherine Michaux * , J Wouters, D Jacquemin, Eric A Perpète 

Unité de Chimie Physique Théorique et Structurale, Facultés 

Universitaires Notre-Dame de la Paix, rue de Bruxelles, 61, 

B-5000 Namur, Belgium 


Water plays a role in stabilizing biomolecular structure and facilitating 

biological function. Water can generate small active clusters and 

macroscopic assemblies, that are able to transmit information on various 

scales [1]. Protonation and microhydration of proteins or DNA are of 

fundamental importance in biochemical processes such as proton 

transport, water-mediated catalysis, molecular recognition, protein folding, 

etc. The understanding of these hydration effects at the molecular level 

requires the characterization of the interactions between biomolecules 

and their environment. Due to its relevance in many fields, the 

microhydration process of nucleic acid bases or amino acids has received 

a widespread attention [2]. 

In this work, we first describe the microhydration of protonated 

amino acid (AA), and particularly of protonated glycine (GlyH+) [3][4], 

alanine (AlaH+) [5] and proline (ProH+) [6]. First a high-level theoretical 

method was setting up in order to compute the structures and 

properties of GlyH+-water complexes, Gly being the simplest AA and a 

suitable model for such a study. Then complexes with more than one 

water molecule as well as other amino acids (Ala and Pro) were 

investigated, to extend the validity domain of our computational 

procedure. 

We then investigate a series of complexes made up of a deprotonated 

(anionic) AA and a single water molecule [7]. Such species have recently 

been identified with mass spectrometry, allowing meaningful 

comparisons between theoretical and experimental complexation 

energies. The selected systems are [Gly-H]-, [Ala-H]-, [Val-H]-, [Asp-H]-, 

[Gln-H]-, as well as the acetic acid as a model benchmark. 

References 

1. Chaplin MF: Nat Rev Mol Cell Biol 2006, 7:861-866. 

2. de Vries MS, Hobza P: Annu Rev Phys Chem 2007, 58:585-612. 

3. Michaux C, Wouters J, Jacquemin D, Perpète EA: Chem Phys Lett 2007, 

445:57-61. 

4. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B 2008, 

112:2430-2438. 

5. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B 2008, 

112:9896-9902. 

6. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Phys Chem B Letters 2008, 

112:7702-7705. 

7. Michaux C, Wouters J, Perpète EA, Jacquemin D: J Am Soc Mass Spectrom 

2009, 20:632-638. 

P16 

Predicting the binding type of compounds on the 4 adenosine 

receptors using proteochemometric models 

Olaf van den Hoven * , Gerard van Westen, Andreas Bender 

University of Leiden, LACDR, Einsteinweg 55, 2333 CC Leiden, 

The Netherlands 


Page 14 of 25 

Aim: The use of in silico models could save a lot of money, time and 

resources in the laboratories when in search for potent binding 

candidates of designated receptors. In this paper a proteochemometric 

model was developed that is able to distinguish between full agonist, 

agonist and antagonist across all 4 Adenosine receptors (A1, A2A, A2B 

and A3). 

Experimental: The software program Pipeline Pilot was used to create a 

proteochemometric model containing 2 SVMs [1]. The first SVM model 

was built to predict binding activity, the second SVM model to predict 

binding type activity. For the first model an in house database, Glida and 

part of the Starlite database were used. The pKi values in these databases 

were then capped to create an active and inactive class. For the second 

model a classification in full agonist, agonist and antagonist derived from 

the Glida database was used. Different Proteinfingerprints (ProtFPs) were 

tested. These FPs consisted of amino acid residues at important binding 

positions according to two entropy analysis (TEA) and crystal structure 

analysis [2]. The quality of the model was determined by MCC, specificity, 

sensitivity, predictive ability (Q2) and the correlation coefficient (R2). The 

Tanimoto similarity coefficient was used to visualize the degree of 

average and minimal similarity of the three binding type groups. 

Results: In continuous models FPs having a combination of TEA and 

crystal structure information scored the best (Q2 = 0.6683), but overall 

scores were not high enough to benefit from a continuous model. A 

categorical model was thus build and it was able to distinguish between 

agonist (99% correctly predicted) and antagonist (98% correct) and 

partially full agonists (63% correct) with a specificity of 0.85, a sensitivity 

of 0.83 and a MCC of 0.69. The Tanimoto coefficient showed that indeed 

antagonist and full agonist were the least similar, where agonist show 

more resemblance to the full agonist. The model finally predicted 2 novel 

full agonistic compounds in our in house database, being GIFT136 and 

GIFT589. 

Conclusion: A model that is able to distinguish between binding types 

was successfully created. 

References 

1. Kontijevskis A: J Chem Inf Model 2008, 48(9):1840. 

2. Ye K: Proteins 2006, 63(4):1018. 

P17 

pharmACOphore: multiple flexible ligand alignment based on ant 

colony optimization 

Gerhard Hessler 1* , Oliver Korb 2 , P Monecke 3 , T Stützle 4 , TE Exner 5 

1 Drug Design, Chemical and Analytical Sciences, Sanofi-Aventis Deutschland 

GmbH, Frankfurt, <strong>German</strong>y; 2 Cambridge, UK; 3 Frankfurt, <strong>German</strong>y; 4 Brussels, 

Belgium; 5 Konstanz, <strong>German</strong>y 


Molecular alignment of biologically active compounds is a key technique 

in ligand-based drug design. Such alignments can be used for similaritybased 

identification of new compounds by using known active template 

structures as seeds. Furthermore, alignments allow for the identification 

of potential pharmacophoric features in a set of chemically divers, 

biologically active molecules. 

Here, pharmACOphore is presented, a new approach for pairwise as well 

as multiple flexible alignments of ligands based on ant colony 

optimization (ACO) [1]. Translational, rotational and torsional degrees of 

freedom of all ligand structures are encoded on a pheromone vector 

(Figure 1). The artificial ant colony uses this representation to mark 

favourable values for each degree of freedom by depositing a 

pheromone trail onto the corresponding pheromone vector. The amount 

of pheromone deposited directly depends on the solution quality, 

assessed by the scoring function.



Figure 1 (abstract P17). 

The scoring function was parameterized by reproducing reference 

alignments taken from four different proteins. The performance of the 

new alignment algorithm will be demonstrated by pairwise alignments 

obtained for examples from the FlexS data set [2] and by an additional 

example for multiple flexible alignment. 

References 

1. Dorigo M, Stützle T: Ant Colony Optimization MIT Press, Cambridge, MA, 

USA 2004. 

2. Lemmen C, Lengauer T, Klebe G: FLEXS: A Method for Fast Flexible 

Ligand Superposition. J Med Chem 1998, 41:4502-4520. 

P18 

Bioisosteric similarity of drugs in virtual screening 

Michael C Hutter * , Markus Krier 

Center of Bioinformatics, Saarland University, Building C7.1, D-66041 

Saarbrücken, <strong>German</strong>y 


Choosing compounds for screening is difficult problem due the vast 

chemical space. The question is thus which compounds are most likely to 

be hits. The comparison of drugs that all target the same enzyme, 

however, shows reoccurring chemical modification throughout all 

therapeutic categories. These so-called bioisosteric replacements [1] 

comprise simple exchanges of terminal atoms as well as more complex 

structural modifications, such as ring closures or rearrangements of larger 

fragments. To detect and evaluate all kinds of replacements we have 

designed an approach that adopts the algorithmic concept used to 

assess the homology of amino acid sequences to chemical molecules. 

The mutual exchange frequencies between distinct atom types are 

expressed in a substitution matrix [2]. Likewise, pair-wise alignment 

betweenthemoleculesisconstructed using dynamic programming [3], 

with the compounds being represented as unique SMILES. To obtain the 

actual exchange frequencies, we refined an initial matrix based on 

observed chemical replacements [4] by collecting the generated 

alignments of 1353 drugs from 33 therapeutic categories in an 

automated procedure. To compute the mutual bioisosteric similarity 

between two molecules a specific function has been derived that makes 

use of the alignment [5]. 

To asses the suitability of this bioisosteric similarity for virtual screening, 

we compared the recovery of known drugs against the background of 

other substances. The majority of drugs possess a higher similarity 

within the same class than compared to substances from the ZINC [6] 

ortheProusScienceDrugsoftheFuturedatabase[7].Likewise,drugs 

for the same target are usually recovered at higher values of similarities 

than compared to other methods based on most common substructure 

and fingerprint approaches. Moreover, nondrugs without any 

pharmaceutical function exhibit considerably lower similarities than 

actual drugs. 

Furthermore, this bioisosteric similarity can be used to express the 

chemical diversity within a given compound class. We found that e.g. 

inhibitors of the HIV Reverse Transcriptase are more divers than 

Angiotensin-II Antagonists and Tetracycline Antibiotics. 

References 

1. Patani GA, LaVoie EJ: Chem Rev 1996, 96:3147. 

2. Henikoff S, Henikoff JG: Proc Natl Acad Sci USA 1992, 89:10195. 

3. Smith TF, Waterman MS: J Mol Biol 1981, 147:195. 

4. Sheridan RP: J Chem Inf Comput Sci 2002, 42:103. 

5. Krier M, Hutter MC: J Chem Inf Model 2009, 49:1280. 

6. [http://zinc.docking.org]. 

7. [http://pubchem.ncbi.nlm.nih.gov/]. 

Page 15 of 25 

P19 

Updating existing QSAR models: selection and weighting of new data 

Tomas Öberg * , T Liu 

University of Kalmar, Sweden 


Computational chemistry and quantitative structure-activity relationships 

(QSAR) are foreseen to be extensively used in the implementation of the 

new REACH regulation for chemicals in Europe. However, for some 

compound groups the data are too few in number to permit both 

calibration and testing of a new model. Usage and previously developed 

or updated models are then viable alternatives. 

Perfluorocarboxylic acids (PFCAs) and fluoroteleomer alcohols (FTOHs) are 

two groups of environmentally relevant compounds, with unique physical 

and chemical properties. The subcooled liquid vapour pressure (pL) is one 

such property, where experimental determinations are limited and far from 

consistent [1]. Updating is, however, challenging when the new compounds 

are far outside of the original calibration domain space. But by carefully 

selecting and weighting only three new compounds, we have been able to 

update a previously developed general QSAR model [2], to cover the new 

domain while maintaining predictive performance for the earlier calibration 

and test data. The optimal weighting scheme was determined from the 

sample leverages and residuals in the calibration phase [3]. 

The performance of this re-calibrated model greatly surpassed previous 

modelling attempts [4], when applied to an external test set of two 

PFCAsandfourFTOHswithpLintherange0.2-200Pa;withQ2Ext= 

0.994 and RMSEP = 0.190 units of log Pa. The domain coverage also 

increased from 1% to 51%, for 426 perfluoroalkylated compounds 

selected from the REACH registration list, the PhysProp database, and the 

OECD 2006 survey [5]. Selection and weighting of new calibration data 

can thus facilitate the extension and use of existing QSAR models. 

This investigation was supported by the EU FP7 project CADASTER (grant 

agreement no. 212668). 

References 

1. Goss K-U, Bronner G, Harner T, Hertel M, Schmidt TC: Environ Sci Technol 

2006, 40:3572. 

2. Öberg T, Liu T: QSAR Comb Sci 2008, 27:273. 

3. Stork CL, Kowalski BR: Chemometr Intell Lab Syst 1999, 48:151. 

4. Arp HP, Niederer C, Goss K-U: Environ Sci Technol 2006, 40:7298. 

5. Lists of PFOS, PFAS, PFOA, PFCA, related compounds and chemicals that 

may degrade to PFCA. ENV/JM/MONO(2006)16, OECD. 2007. 

P20 

DrugScorePPI for scoring protein-protein interactions: improving 

a knowledge-based scoring function by atomtype-based QSAR 

Dennis M Krüger * , Holger Gohlke 

Heinrich-Heine-University, Computational Pharmaceutical Chemistry, 40225 

Düsseldorf, <strong>German</strong>y 


Protein-protein complexes are known to play key roles in many cellular 

processes. Therefore, knowledge of the three-dimensional structure of 

protein-complexes is of fundamental importance. A key goal in proteinprotein 

docking is to identify near-native protein-complex structures. In 

this work, we address this problem by deriving a knowledge-based 

scoring function from protein-protein complex structures and further finetuning 

of the statistical potentials against experimentally determined 

alanine-scanning results. 

Based on the formalism of the DrugScore approach1, distance-dependent 

pair potentials are derived from 850 crystallographically determined



protein-protein complexes 2. These DrugScorePPI potentials display 

quantitative differences compared to those of DrugScore, which was 

derived from protein-ligand complexes. When used as an objective 

function to score a non-redundant dataset of 54 targets with “unbound 

perturbation” solutions, DrugscorePPI was able to rank a near-native 

solution in the top ten in 89% and in the top five in 65% of the cases. 

Applied to a dataset of “unbound docking” solutions, DrugscorePPI was 

able to rank a near-native solution in the top ten in 100% and in the top 

five in 67% of the cases. Furthermore, Drugscore-PPI was used for 

computational alanine-scanning of a dataset of 18 targets with a total of 

309 mutations to predict changes in the binding free energy upon 

mutations in the interface. Computed and experimental values showed a 

correlation of R2 = 0.34. To improve the predictive power, a QSAR-model 

was built based on 24 residue-specific atom types that improves the 

correlation coefficient to a value of 0.53, with a root mean square 

deviation of 0.89 kcal/mol. A Leave-One-Out analysis yields a correlation 

coefficient of 0.41. This clearly demonstrates the robustness of the model. 

The application to an independent validation dataset of alaninemutations 

was used to show the predictive power of the method and 

yields a correlation coefficient of 0.51. Based on these findings, 

Drugscore-PPI was used to successful identify hotspots in multiple 

protein-interfaces. These results suggest that DrugscorePPI is an adequate 

method to score protein-protein interactions. 

References 

1. Gohlke H, Hendlich M, Klebe G: Knowledge-based scoring function to 

predict protein-ligand interactions. J Mol Biol 2000, 295(2):337-356. 

2. Huang SY, Zou X: An iterative knowledge-based scoring function for 

protein-protein recognition. Proteins 2008, 72(2):557-579. 

P21 

Adaptation of formal concept analysis for the systematic exploration 

of structure-activity and structure-selectivity relationships 

Eugen Lounkine * , Jürgen Bajorath 

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical 

Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität 



Formal Concept Analysis (FCA) is a data mining and visualization 

approach originating from information science. It operates on binary 

relationships between objects and attributes, which are reported in a 

formal context. Formal concepts are sets of objects that share a defined 

subset of attributes. FCA organizes these concepts in lattices that reflect 

their relationship in terms of shared objects and/or attributes and allows 

the identification of objects with defined sets of attributes [1]. 

Two adaptations of FCA that allow the systematic analysis of structureactivity 

and-selectivity relationships are presented. Fragment Formal 

Concept Analysis (FragFCA) assesses the distribution of molecular fragment 

combinations among ligands with closely related biological targets. This 

allows the identification of fragment signatures that exclusively occur in 

compounds with a defined activity profile. FragFCA also identifies fragment 

combinations that are characteristic of highly potent compounds against 

defined targets. Fragment signatures usually represent combinations of 

two or three fragments and can be used to differentiate active compounds 

of closely related targets for different target families [2,3]. 

Molecular Formal Concept Analysis (MolFCA) is introduced for the 

systematic comparison of the selectivity of a compound against multiple 

targets and the extraction of compounds with complex selectivity profiles 

from biologically annotated databases. Selectivity is assessed based on 

pair-wise compound potency ratios. Thisallowsthedefinitionmultiple 

selectivity queries involving the comparison of an arbitrary number of 

targets and compound potency values or ratios. The individual queries are 

applied in a sequential manner to retrieve compounds with desired 

selectivity against targets of interest. MolFCA operates on activity space 

representations of compounds and thus allows the identification of 

structurally diverse compounds matching a given selectivity profile [4]. 

References 

1. Priss U: Ann Rev Inf Sci Technol 2006, 40:521. 

2. Lounkine E, Auer J, Bajorath J: J Med Chem 2008, 51:5342. 

3. Krüger F, Lounkine E, Bajorath J: Chem Med Chem 2009, 4:1174. 

4. Lounkine E, Stumpfe D, Bajorath J: J Chem Inf Model 2009, 49:1359. 

Page 16 of 25 

P22 

Systematic extraction of structure-activity relationship information from 

biological screening data 

Mathias Wawer * , L Peltason, Jürgen Bajorath 

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical 

Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität 



The analysis of high-throughput screening data poses significant 

challenges to medicinal and computational chemists. The number of 

compounds assayed in a single screen is prohibitively large for manual 

data analysis and no generally applicable computational methods have 

thus far been developed to consistently solve the problem of how to 

best select hits for further chemical exploration. 

Focusing on the question of how structure-activity relationship (SAR) 

information can be used to support this decision, we present methods 

for the descriptive analysis of screening data. Network representations 

visualize the distribution of 2D similarity relationships and potency in a 

data set and give an overview of global and local features of an activity 

landscape. Although dominated by many weakly active hits, different 

local SAR environments can be identified among screening hits, thus 

helping to focus on regions in chemical space that might show favorable 

SAR behavior in further exploration [1]. 

A more detailed analysis of the data is achieved by systematically mining 

the network for SAR pathways, i.e. sequences of pairwise similar 

compounds that connect two molecules via a gradually increasing 

potency gradient. The SAR pathways are calculated exhaustively for all 

possible compound pairs in a data set to identify those having most 

significant SAR information content. Often, high-scoring pathways lead to 

activity cliffs, i.e. pairs of similar compounds with significant differences in 

potency, and scaffold transitions can be observed along the pathways. 

Furthermore, a tree structure organizes alternative pathways that begin at 

the same compound but lead to different molecules and chemotypes. 

Similarly, SAR trees can be generated from all pathways that lead 

to an activity cliff in order to characterize the surrounding SAR 

microenvironment [2]. 

References 

1. Wawer M, Peltason L, Bajorath J: J Med Chem 2009, 52:1075-1080. 

2. Wawer M, Bajorath J: Chem Med Chem in press. 

P23 

High throughput in-silico screening against flexible protein receptors 

Horacio Pérez-Sánchez 1* , Bernhard Fischer 1 , Daria Kokh 2 , Holger Merlitz 1 , 

Wolfgang Wenzel 1 

1 Institut für Nanotechnologie, Forschungszentrum Karlsruhe, Postfach 3640, 

76021 Karlsruhe, <strong>German</strong>y; 2 Wuppertal, <strong>German</strong>y 

Journal of Cheminformatics 2010, 2(Suppl 1):P23 

Based on the stochastic tunneling method (STUN) [1] we have developed 

FlexScreen [2], a novel strategy for high-throughput in-silico screening of 

large ligand databases. Each ligand of the database is docked against the 

receptor using an all-atom representation of both ligand and receptor. The 

ligands with the best evaluated affinity are selected as lead candidates for 

drug development. Using the thymidine kinase inhibitors as a prototypical 

example we documented [3] the shortcomings of rigid receptor screens in a 

realistic system. We demonstrate a gain in both overall binding energy and 

overall rank of the known substrates when two screens with a rigid and 

flexible (up to 15 sidechain dihedral angles) receptor are compared. We 

note that the STUN suffers only a comparatively small loss of efficiency 

when an increasing number of receptor degrees of freedom is considered. 

FlexScreen thus offers a viable compromise [4] between docking flexibility 

and computational efficiency to perform fully automated database screens 

on hundreds of thousands of ligands. We also investigate enrichment rates 

[5] of rigid, soft and flexible receptor models [6] for 12 diverse receptors 

using libraries containing up to 13000 molecules. A flexible sidechain model 

with flexible dihedral angles for up to 12 aminoacids increased both binding 

propensity and enrichment rates: EF_1 values increased by 35% on average 

with respect to rigid-docking (3-8 flexible sidechains). This methodology will 

be soon available for the Cell processor and Pipeline Pilot.



References 

1. Wenzel W, Hamacher K: Stochastic tunneling approach for global 

minimization of complex potential energy landscapes. Physical Review 

Letters 1999, 82(15):3003-3007. 

2. Merlitz H, Burghardt B, Wenzel W: Application of the stochastic tunneling 

method to high throughput database screening. Chemical Physics Letters 

2003, 370(1-2):68-73. 

3. Merlitz H, Burghardt B, Wenzel W: Impact of receptor conformation on in 

silico screening performance. Chemical Physics Letters 2004, 

390(4-6):500-505. 

4. Fischer B, et al: Accuracy of binding mode prediction with a cascadic 

stochastic tunneling method. Proteins-Structure Function and Bioinformatics 

2007, 68(1):195-204. 

5. Kokh DB, Wenzel WG: Flexible side chain models improve enrichment 

rates in in silico screening. Journal of Medicinal Chemistry 2008, 

51(19):5919-5931. 

6. Fischer B, Fukuzawa K, Wenzel W: Receptor-specific scoring functions 

derived from quantum chemical models improve affinity estimates for 

in-silico drug discovery. Proteins-Structure Function and Bioinformatics 2008, 

70(4):1264-1273. 

P24 

Virtual screening by high-throughput docking using hydrogen 

bonding constraints for targeting a protein-protein interface in 

M. tuberculosis 

Oliver Koch 1,2* , T Jäger 2 , L Flohé 2 , PM Selzer 1 

1 Intervet Innovation GmbH, Zur Propstei, 55270 Schwabenheim, <strong>German</strong>y; 

2 MOLISA GmbH, Brenneckestraße 20, 39118 Magdeburg, <strong>German</strong>y 


The resurgence of Tuberculosis, caused primarily by Mycobacterium 

tuberculosis, and the appearance of drug resistant tuberculosis strains 

encourage the need for new drugs with alternative mode of actions [1]. An 

interesting drug-target is the Thioredoxin-Reductase (TrxR)/Thioredoxin (Trx) 

System of M. tuberculosis, which is responsible for providing reducing 

equivalents for many cellular processes including bacterial antioxidant 

defence [2]. 

We performed a virtual screening approach for identifying novel drugs for 

the treatment of Tuberculosis that used the hydrogen bonding constraint 

functionality of GOLD for pre-processing instead of using pharmacophore 

filtering. Two important hydrogen bonding interactions could be identified 

at the protein-protein interface of the TrxR-Trx complex. A high-throughput 

docking was applied to filter the Intervet in-house compound library (~6.5 

million compounds) for possible hits that interact with these two hydrogen 

bonding acceptors. This reduced the number of interesting compounds to 

151768 that were redocked without any constraints and more accurate 

docking settings. Finally, the ranking of the reduced compound set was 

obtained using a normalisation based on molecular weight [3] and a 

consensus scoring in a restrictive way using Chemscore and ASPscore. 

So far, six out of the first 25 of the ranked compound list showed an 

activity with an IC 50 value upto M range. Especially three compounds 

with different scaffolds and a low molecular weight are promising 

candidates for further developments. 

References 

1. World Health Organization Tuberculosis Programme. [http://who.int/tb/]. 

2. Jaeger T, Flohé L: The thiol-based redox networks of pathogens: 

unexploited targets in the search for new drugs. Biofactors 2006, 

27:109-120. 

3. Carta A, Knox JS, Lloyd DG: Unbiasing Scoring Functions: A New 

Normalization and Rescoring Strategy. JCIM 2007, 47:1564-1571. 

P25 

Ensemble docking revisited 

Oliver Korb * , S Bowden, T Olsson, D Frenkel, J Liebeschuetz, J Cole 

Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, 

CB2 1EZ, UK 


In recent years, the importance of considering induced fit effects in 

molecular docking calculations has been widely recognised in the 

Page 17 of 25 

molecular modelling community. While small-scale protein side-chain 

movements are now accounted for in many state-of-the-art docking 

strategies, the explicit modelling of large-scale protein motions such as 

loop movements in kinase domains is still a challenging task. For this 

reason ensemble-based methods have been introduced taking into 

account several discrete protein conformations in the conformational 

sampling step. Our protein-ligand docking approach GOLD [1,2] has been 

extended to search such conformational ensembles time-efficiently. The 

performance of the approach has been assessed on several protein 

targets using different scoring functions. A detailed analysis of pose 

prediction and virtual screening results in dependence of the number of 

protein structures considered in the conformational ensemble will be 

presented and limitations of the approach will be highlighted. 

References 

1. Jones G, Willett P, Glen RC: Molecular Recognition of Receptor Sites Using 

a Genetic Algorithm with a Description of Desolvation. J Mol Biol 1995, 

245:43-53. 

2. Jones G, Willett P, Glen RC, Leach AR, Taylor R: Development and 

Validation of a Genetic Algorithm for Flexible Docking. J Mol Biol 1997, 

267:727-748. 

P26 

Picking out polymorphs: H-bond prediction and crystal structure 

stability 

Peter TA Galek * , Frank H Allen, László Fábián 

Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, 

CB2 1EZ, UK 


A methodology has been developed to predict the propensity for 

hydrogen bonds to form in crystal structures, treating each potential 

H-bond as a binary response variable, and modelling its likelihood using 

a set of relevant chemical descriptors [1]. Modelling is tailored to a target 

using chemically similar known structures,frome.g.theCambridge 

Structural Database [2], making it accessible to the complete spectrum of 

organic structures, including solvates, hydrates and cocrystals. Recent 

work has developed the approach to predicting inter- and intramolecular 

H-bonds when either type can occur. 

By way of a comparison between possible and observed H-bonds, the 

method has been applied to assess structural stability, which shows much 

promiseinthedomainofpolymorphscreeninginthepharmaceutical 

industry. We will introduce the methodology and illustrate its application 

using a selection of pharmaceutical compounds, one of which will be 

Abbott’s well-publicised anti-HIV medication ritonavir (Norvir). Owing to a 

hidden, more stable form II with much lower bioavailability, ritonavir was 

temporarily withdrawn from the market with significant financial impact 

[3]. Our method quickly suggests a real threat of polymorphism in this 

compound, and strongly supports the relative stability of form II over 

form I. For all examples, the high predictivity of the method is emphasised. 

References 

1. Galek PTA, Fábián L, Allen FH, Motherwell WDS, Feeder N: Acta Cryst 2007, 

B63:768-782. 

2. Allen FH: Acta Cryst 2002, B58:380-388. 

3. Bauer J, Spanton S, Henry R, Quick J, Dziki W, Porter W, Morris J: Pharm Res 

2001, 18:859-866. 

P27 

Kernel learning for ligand-based virtual screening: discovery of a new 

PPARg agonist 

Matthias Rupp 1* , T Schroeter 2 , R Steri 1 , Ewgenij Proschak 1 , K Hansen 2 , H Zettl 1 , 

O Rau 1 , M Schubert-Zsilavecz 1 , K-R Müller 2 , Gisbert Schneider 1 

1 

University of Frankfurt, Siesmayerstr. 70, 60323 Frankfurt am Main, <strong>German</strong>y; 

2 

Berlin, <strong>German</strong>y 


We demonstrate the theoretical and practical application of modern 

kernel-based machine learning methods to ligand-based virtual 

screening by successful prospective screening for novel agonists of the 

peroxisome proliferator-activated receptor g (PPARg) [1].PPARg is a 

nuclear receptor involved in lipid and glucose metabolism, and related



to type-2 diabetes and dyslipidemia. Applied methods included a graph 

kernel designed for molecular similarity analysis [2], kernel principle 

component analysis [3], multiple kernel learning [4], and, Gaussian 

process regression [5]. 

In the machine learning approach to ligand-based virtual screening, one 

uses the similarity principle [6] to identify potentially active compounds 

based on their similarity to known reference ligands. Kernel-based 

machine learning [7] uses the “kernel trick”, a systematic approach to the 

derivation of non-linear versions of linear algorithms like separating 

hyperplanes and regression. Prerequisites for kernel learning are similarity 

measures with the mathematical property of positive semidefiniteness 

(kernels). 

The iterative similarity optimal assignment graph kernel (ISOAK) [2] is 

defined directly on the annotated structure graph, and was designed 

specifically for the comparison of small molecules. In our virtual screening 

study, its use improved results, e.g., in principle component analysisbased 

visualization and Gaussian process regression. Following a 

thorough retrospective validation using a data set of 176 published 

PPARg agonists [8], we screened a vendor library for novel agonists. 

Subsequent testing of 15 compounds in a cell-based transactivation assay 

[9] yielded four active compounds. 

The most interesting hit, a natural product derivative with cyclobutane 

scaffold, is a full selective PPARg agonist (EC50 = 10 ± 0.2 μM, inactive on 

PPARa and PPARb/δ at 10 μM). We demonstrate how the interplay of 

several modern kernel-based machine learning approaches can 

successfully improve ligand-based virtual screening results. 

References 

1. Henke B: Progress in Medicinal Chemistry 2004, 42:1-53. 

2. Rupp M, Proschak E, Schneider G: J Chem Inf Model 2007, 47(6):2280-2286. 

3. Schölkopf B, Smola A, Müller K-R: Neural Comput 1998, 10(5):1299-1319. 

4. Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B: J Mach Learn Res 2006, 

7(7):1531-1565. 

5. Rasmussen C, Williams C: MIT Press, Cambridge 2006. 

6. Johnson M, Maggiora M: Wiley, New York 1990. 

7. Schölkopf B, Smola A: MIT Press, Cambridge 2002. 

8. Rücker C, Scarsi M, Meringer M: Bioorg Med Chem 2006, 14(15):5178-5195. 

9. Rau O, Wurglics M, Paulke A, Zitzkowski J, Meindl N, Bock A, 

Dingermann T, Abdel-Tawab M, Schubert-Zsilavecz M: Planta Med 2006, 

72(10):881-887. 

P28 

OrChem: an open source chemistry search engine for Oracle 

Mark L Rijnbeek * , Christoph Steinbeck 

EMBL EBI, Wellcome Trust Genome Campus, Cambridge, UK 


Registration, indexing and searching of chemical structures in relational 

databases is one of the core areas of cheminformatics. However, little 

detail has been published on the inner workings of search engines and 

their development is mostly closed-source. We decided to implement an 

open source chemical library for Oracle, the de-facto database standard 

in the commercial world. 

We present OrChem, an extension for the Oracle 11G database that adds 

registration and indexing of chemical structures to support fast 

substructure and similarity searching. The cheminformatics functionality is 

provided by the Chemistry Development Kit [1,2]. OrChem enables 

similarity searching with response times in the order of seconds for 

databases with millions of compounds, depending on provided similarity 

cut-off. For substructure searching, OrChem can make use of multiple 

processor cores on today’s powerful database servers to provide fast 

response times in equally large data sets. 

OrChem is a mix of PL/SQL and Java that executes inside the database. 

The user interacts with OrChem with calls to PL/SQL and Java Stored 

Procedures. Starting with Oracle 11g there is a just-in-time (JIT) compiler 

for the Oracle JVM environment which makes Java run much faster inside 

the database than previously. 

OrChem is built on top of the Chemistry Development Kit (CDK) and 

depends on this Java library in numerous ways. For example, compounds 

are represented internally as CDK molecule objects, the CDK’s I/O 

package is used to retrieve compound data, and its subgraph 

isomorphism algorithms are used for substructure validation. OrChem 

adds its own Java layer on top of the CDK to implement fast database 

storage and retrieval. With the CDK loaded into Oracle, a large 

cheminformatics library becomes readily available to PL/SQL. With little 

effort developers can build database functions around the CDK and so 

quickly implement chemistry extensions for Oracle. 

References 

1. Steinbeck , et al: J Chem Inf Comput Sci 2003, 43(2):493-500. 

2. Steinbeck , et al: Curr Pharm Des 2006, 12(17):2111-2120. 

P29 

Computational fragment-based drug design to explore the 

hydrophobic subpocket of the mitotic kinesin Eg5 allosteric 

binding site 

Ksenia Oguievetskaia *† , Laetitia Martin-Chanas *† , Artem Vorotyntsev, 

Olivia Doppelt-Azeroual, Xavier Brotel, Stewart A Adcock, 

Alexandre G de Brevern, Francois Delfaud, Fabrice Moriaud 

MEDIT SA, 2 rue du Belvédère, 91120 Palaiseau, France 


Page 18 of 25 

Eg5, a mitotic kinesin exclusively involved in the formation and function 

of the mitotic spindle has attracted interest as an anticancer drug target. 

Eg5 is co-crystallized with several inhibitors bound to its allosteric binding 

pocket. Each of these occupies a pocket formed by loop5/helix a2. 

Recently designed inhibitors additionally occupy a hydrophobic pocket of 

this site. The goal of the present study was to identify new fragments 

which fill this hydrophobic pocket and might be interesting chemical 

moieties to design new inhibitors. 

We’ve used the fragment based protocol of MED-SuMo which exploits the 

whole PDB on the basis that similar protein surfaces might bind the same 

fragment. The MED-SuMo software is able to compare and superimpose 

similar protein-ligand interaction surfaces at PDB scale. Protein-fragment 

complexes are encoded as MED-Portions by cross-mining 3D PDB ligand 

structures with fragment-like PubChem molecules. Inhibitors are designed 

by hybridising MED-Portions with MED-Hybridise. 

Without using the known Eg5 PDB ligands that occupy the hydrophobic 

pocket, we’ve predicted new potential ligands that would fill 

simultaneously both pockets. Some of the latter are identified in libraries 

of synthetically accessible molecules by the MED-Search software. 

P30 

Learning antibacterial activity against S. Aureus on the Chimiothèque 

Nationale dataset 

G Marcou 1* , N Lachiche 1 , L Brillet 1 , J-M Paris 2 , A Varnek 1 

1 Université de Strasbourg, Faculté de Chimie de Strasbourg, Laboratoire 

d’infochimie, 4 rue Blaise Pascal, 67000 Strasbourg, France; 2 Paris, France 


“Chimiothèque Nationale” (CN) represents a library of synthetic and 

natural products from various French public laboratories. Recently, 

experimental screening for S. Aureus antibacterial activity of the part of 

this library has been performed by Dr J.-M. Paris and collaborators in 

Ecole des Mines (Paris, France). Here, experimental results of the 

screening have been used to build classification models using SMF 

descriptors and ISIDA modeling tools (Naïve Bayes (NB) and SVM 

modules). The dataset consisted in a large and structurally diverse set of 

4563 compounds, 62 of which having a demonstrated antibacterial 

activity. In NB calculations, several variations of the algorithm were used. 

In particular, the aprioridistribution of SMF descriptors was modeled 

using a binomial law, Bernouilli distributions or first order logic. SVM 

calculations were performed with an RBF kernel which parameters (C,g) 

were optimized to maximize the balanced accuracy. 

Both NB and SVM models were validated using external 5-fold cross 

validation, repeated three times on randomized data set. Additionally 

fifteen Y-randomizations were performed in order to check for chance 

correlation. Predictive performance of the models has been assessed by 

combination of Precision and Recall parameters: the models with Precision 

>0.1andRecall > 0.6 correspond to over 10-fold enrichment and, 

therefore, were considered as acceptable. 

A consensus model was applied to estimate the antibacterial activity of 

122 new compounds for the CN. All predictions were done blindly.



P31 

Multi-parameter scoring functions for ligand- and structure-based 

de novo design 

Fabian Bös 1* , KM Smith 2 , Q Liu 3 

1 Tripos International, Martin-Kollar-Str. 17, 81829 München, <strong>German</strong>y; 

2 St. Louis, USA; 3 Shanghai, PR China 


Successful drug discovery often requires optimization against a set of 

biological and physical properties. We describe de novo design studies that 

demonstrate successful scaffold hops between known classes of ligands for 

p38 MAP kinases using ligand-based and structure-based multi-parameter 

scoring functions coupled to the molecular invention engine Muse. 

The ligand-based scoring function includes pharmacophoric and steric 

tuplets and structural (fingerprint based) similarity. In addition various 

selectivity or ADME related properties (e.g. Lipinski properties, polar 

surface area, activity at off-targets, etc.). can be taken into account to 

guide the evolution of structures meeting multiple design criteria. 

The structure-based scoring function uses Surflex-Dock to pose and score 

invented structures inside the target’s active site. In addition, a number of 

simple molecular properties (e.g. clogP, Lipinski properties, etc.) are used as 

score components to focus the design on medicinally relevant chemistries. 

With the ability of Surflex-Dock to start the docking process with a single or 

multiple placed fragments, this scoring function can be applied in fragment 

based drug discovery to optimize attachments onto a pre-placed substructure. 

P32 

Ligand-side tautomer enumeration and scoring for structure-based 

drug-design 

Thomas Seidel * , G Wolber 

Inte:Ligand GmbH, Mariahilferstrasse 74B/11, A-1070 Vienna, Austria 


Tautomeric rearrangements of a molecule lead to distinct equilibrated 

structural states of the same chemical compound that may differ 

significantly in molecular shape, surface, nature of functional groups, 

hydrogen-bonding pattern and other derived molecular properties [1]. 

Especially for the structure-based pharmacophore modeling of ligandprotein 

complexes [2], knowledge of the most favorable tautomeric ligand 

states may be crucial for the quality and correctness of the generated 

pharmacophore models and derived putative binding modes. We will 

present a ligand-side tautomer enumeration and ranking algorithm that 

considers both geometrical constraints imposed by the conformation 

of the bound ligand as well as intra- and inter-molecular energetic 

contributions to find the preferred tautomeric states of the ligand in the 

macromolecular environment of the binding-site. The presented tautomer 

ranking algorithm is based on scores that are derived solely from MMFF94 

[3-9] energies (thus the method works for a wide range of organic 

molecules) and has proven to be able to top-rank known preferred 

tautomeric states of ligands in a series of investigated protein complexes. 

References 

1. Pospisil P, et al: J Recept Signal Transduct 2003, 23:361-371. 

2. Wolber G, Langer T: J Chem Inf Model 2005, 45:160-169. 

3. Halgren TA: J Comput Chem 1996, 17:490-519. 



6. Halgren TA, Nachbar RB: J Comput Chem 1996, 17:587-615. 




P33 

Maximum-score diversity selection for early drug discovery 

Thorsten Meinl * , C Ostermann, O Nimz, A Zaliani, MR Berthold 

University of Konstanz, 78457 Konstanz, <strong>German</strong>y 


Diversity selection is a common task in early drug discovery, be it for 

removing redundant molecules prior to HTS or reducing the number of 

Page 19 of 25 

molecules to synthesize from scratch. One drawback of the current 

approach, especially with regard to HTS, is, however, that only the 

structural diversity is taken into account. The fact that a molecule may 

be highly active or completely inactive is usually ignored. This is 

especially remarkable, as quite a lot of research is involved in 

improving virtual screening methods in order to forecast activity. We 

therefore present a modified version of diversity selection – which we 

termed Maximum-Score Diversity Selection – which additionally takes 

the predicted activities of the molecules into account. Not very 

surprisingly both objectives – maximizing activity whilst also 

maximizing diversity in the selected subset – conflict. As a result, 

we end up with a multiobjective optimization problem. We will show, 

that the task of diversity selection is quite complicated (it is NPcomplete) 

and therefore heuristic approaches are needed for typical 

dataset sizes. 

A common and popular approach is using multiobjective genetic 

algorithms, such as NSGA-II [1], for optimizing both objectives for the 

selected subsets. However, we will show that usual implementations 

suffer from severe limitations that prevent them from finding quite a lot 

of possible interesting solutions. Therefore, we evaluated two other 

heuristic for maximum-score diversity selection. One is special heuristic 

(called BB2) that was motivated by the mentioned proof of NPcompleteness 

[2]. The other is a novel heuristics called Score Erosion 

which was specifically developed for our actual problem. Among all 

three heuristics, Score Erosion is by far the fastest one while finding 

solutions of equal quality compared to the genetic algorithm and BB2. 

This will be shown on several real world datasets, both public and 

internal ones. 

All experiments were carried out using the data analysis platform KNIME 

[3] therefore we will also show some example how maximumscore 

diversity selection can be performed inside workflow-based 

environments. 

References 

1. Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective 

genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 

2002, 6:182-197. 

2. Erkut E: The discrete p-dispersion problem. European Journal of 

Operational Research 1990, 46(1):48-60. 

3. [http://www.knime.org/]. 

P34 

Progress on an open source computer-assisted structure elucidation 

suite (SENECA) 

Stefan Kuhn * , Christoph Steinbeck 

Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK 


We suggested and developed various components for Computer-Assisted 

Structure Elucidation (CASE) over the years [1,2]. Our current goal is to 

integrate these into an easy to use and efficient platform for end users, 

called SENECA. This is based on Bioclipse [3], an integrated software suite 

for chemo- and bioinformatics providing plugins for file handling and 

visualisation of compounds and spectra. 

The SENECA feature currently comprises the following components and 

algorithms: 

A deterministic structure generator suitable for small chemical spaces. 

Simulated Annealing and Genetic Algorithm for random structure walks 

in large spaces. 

Simulation of 13C NMR spectra for ranking results. 

SENECA is a Bioclipse feature, available via the update site at http://www. 

ebi.ac.uk/steinbeck-srv/speclipse/net.bioclipse.seneca-updatesite/. 

References 

1. Steinbeck C, Krause S, Kuhn S: NMRShiftDB - Constructing a free chemical 

information system with open-source components. Journal of Chemical 

Information and Computer Sciences 2003, 43:1733-1739. 

2. Kuhn S, Egert B, Neumann S, Steinbeck C: Building blocks for automated 

elucidation of metabolites: Machine learning methods for NMR 

prediction. BMC Bioinformatics 2008, 9:400. 

3. Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, et al: Bioclipse: An 

open source workbench for chemo- and bioinformatics. BMC 

Bioinformatics 2007, 8:59.



P35 

Prediction of highly-connected ‘hub’-proteins in protein interaction 

networks using QSAR 

M Hsing 1* , K Byler 2 , A Cherkasov 1 

1 

University of British Columbia, 2329 West Mall, Vancouver, BC Canada; 

2 

Rochester, USA 


Proteins that are most essential for functioning and viability of bacterial 

cell have been shown to exhibit larger number of interactions with other 

cell components. Thus, by identifying the most connected proteins (or 

hubs) in protein interaction networks (PINs), one may discover 

prospective drug targets that can be utilized to combat emergent and 

drug-resistant pathogens such as Methicillin-Resistant Staphylococcus 

aureus 252 (MRSA). The advantage of using such hub proteins as drug 

targets lies in their essentiality, non-replaceable position in the PIN and 

lower rate of mutation, which can help to counter bacterial resistance. 

However, finding or predicting such hub proteins remains a challenging 

task as the corresponding experiments are very costly, while traditional 

bioinformatics approaches generally fail in forecasting PIN data due to 

the general lack of agreement between the existing datasets [1]. 

Thus, we have decided to utilize various structural and physicochemical 

features of proteins, related to traditional QSAR properties for predicting 

highly connected proteins. Using our own in-house generated PIN for the 

MRSA cell we have trained a boosting tree-based classifier that uses 75 

physical and chemical QSAR descriptors computed for all proteins in the 

interaction network [2]. The utilized parameters included molecular 

weight, net charge, isoelectric point, hydrophobicity, surface area, solvent 

accessibilities, electronegativity, secondary structure composition, surface 

coils and flexibility among other QSAR descriptors. 

The developed QSAR model has yielded a high prediction accuracy of 

80% for the validation set and was used to predict additional hubs in the 

rest of the MRSA proteome. The predicted hubs have then been 

evaluated experimentally and 55% of them were confirmed as high 

interactors what corresponds to >5 fold dataset enrichment for potential 

hub-proteins provided by the developed QSAR model. 

Thus, the successful development of accurate hub classifiers demonstrated 

that highly-connected proteins tend to share certain structural and 

physicochemical features that can be characterized and quantified by 

conventional QSAR descriptors. 

It is anticipated that the developed hub classifiers will represent a useful 

tool for the prediction of highly-interacting proteins and can find broad 

application for planning and executing large-scale proteomic experiments 

and for identification of novel and prospective antibacterial drug targets – 

even in those organisms that currently lack protein interaction data. 

References 

1. Huang H, Jedynak BM, Bader JS: PLoS Comput Biol 2007, 3:e214. 

2. Byler K, Hsing M, Cherkasov A: QSAR & Combinatorial Science 2009, 

28:509-519. 

P36 

Efficient extraction of canonical spatial relationships using a recursive 

enumeration of k-subsets 

Georg Hinselmann * , Nikolas Fechner, A Jahn, Andreas Zell 

University of Tübingen, Sand 1, 72076 Tübingen, <strong>German</strong>y 


The spatial arrangement of a chemical compound plays an important role 

regarding the related properties or activities. A straightforward approach to 

encode the geometry is to enumerate pairwise spatial relationships between 

k substructures, like functional groups or subgraphs. This leads to a 

combinatorial explosion with the number of features of interest and 

redundant information. The goal of this work is to compute all possible 

k-subsets of spatial points and to extract a single canonical descriptor for 

each subset in sub-polynomial computation time. More precisely, the 

problem is to reduce the complexity of n k = n·(n - 1)...(n - k) possible 

relationships (patterns or descriptors) for n features and k-point 

relationships. 

We propose a two-step algorithm to solve this problem. A modified 

algorithm for the computation of the binomial coefficient computes 

Page 20 of 25 

the k-subsets [1] containing the possible combinations of the n 

relevant features. If a k-subset is completed in the inner recursion, the 

algorithm computes a canonical representation for it. By defining a 

natural order by means of the geometrical center of gravity of the k 

points, we extract k patterns that describe the distance to the center 

of gravity and type of the spatial feature k Î F. Then, the algorithm 

returns a unique identifier for the lexicographically sorted array of 

patterns. If applicable ( f� � f� �� f 

0 1 � ), an additional identifier 

k 

d�d 0�1 � 

�k�1�k is added which has the form , 

f� � f f f 

o � �� 

1 k�1 

�k 

where dij denotes the geometrical distance between features i, j. Else 

( f� � f f 

o � �� 

1 � ), this step is omitted. Therefore, this approach 

k 

also considers stereochemistry. Finally, one feature is returned for each 

k-subset resulting in a set of C(n, k) patterns describing the structure. 

Themainresultisthatthenumberoffeaturesisreducedfromnkto C(n, k), which equals the binomial coefficient. This procedure is useful 

in combination with similarity approaches that use spatial relationships, 

like pharmacophore searches, fingerprints, or graph kernels. We 

experimentally validated the algorithm on numerous QSAR benchmark 

sets in combination with the pharmacophore kernel [2]. 

References 

1. Rolfe T: SIGCSE Bull 2001, 33(3):35-36. 

2. Mahé P, Ralaivola L, Stoven V, Vert J-P: J Chem Inf Mod 2006, 

46(5):2003-2014. 

P37 

A comparative study of in silico prediction of pKa 

C Matijssen 

Cancer Research UK Centre for Cancer Therapeutics, The Institute of Cancer 

Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK 


The ionization constant (pKa) is the measure of the strength of an acid or 

base in a solution. Most ligands act as a weak acid or base. Accurate 

determination of pK a is important as it can improve the pharmaceutical 

properties of a compound [1]. In general, charged compounds have better 

solubility, but are less effective in membrane permeation. Therefore, 

optimization of the pharmacokinetic profile of a compound can be 

performed by increasing or decreasing its ionization by changing functional 

groups. Another role for optimization of the charged group could be the 

interaction with its target. Changing the charge on the functional group 

could improve the interaction with its associated target and result in 

improved binding affinity. 

Different methods have been developed for the computational 

determination of pK a values. These can be based on different methods such 

as QSAR [2] or quantum chemistry approaches [3]. These two methods differ 

considerable in terms of computational resource and hence, the time 

needed for a prediction. Depending on the number of ligands that needed 

prediction and time available, a choice for one method can be made. 

A comparison between several pK a predictors (Pipeline Pilot, Moka, 

Epik and Jaguar) was made. All methods perform well when a diverse set 

of ligands which covers a range of pK a values is considered. However, 

when optimizing a series of compounds the influence of small changes 

to the molecule and its effect on pKa becomes more difficult to predict. 

References 

1. Waterbeemd van de H, Gifford E: Nature Rev Drug Disc 2003, 2:192. 

2. Milletti F, Storchi L, Sforna G, Cruciani G: J Chem Inf Model 2007, 47:2172. 

3. Kinsella GK, Rodriguez F, Watson GW, Rozas I: Bioorg Med Chem 2007, 15:2850. 

P38 

Kernel-based estimation of the applicability domain of QSAR models 

Nikolas Fechner * , Georg Hinselmann, A Jahn, A Zell 



Machine learning techniques have become a valuable tool to assess 

molecular properties without the need of in vitro experiments. Most of



these methods do not give any information if a molecule that is 

predicted can be sufficiently described by the knowledge contained in 

the model. Thus, the estimation of the reliability of a model-based 

prediction is an important question in machine learning based QSAR 

modeling. 

One approach to solve this problem is to describe the portion of the 

chemical space used during the training phase of a model. Any molecule 

includedinthesamesubspaceisthen considered as a structure for 

which the model is regarded as valid. This concept of the description of 

the subspace in which a model is regarded as reliable is known as the 

estimation of the applicability domain of this model [1]. 

Most machine learning approaches for QSAR rely on a vectorial 

representation of the molecules. The applicability domain is expressed as 

a subspace of the vector space with one dimension for each descriptor 

used. This concept can be not directly applied to kernel-based techniques 

like support vector machines. These methods rely on an implicit feature 

space that is only defined by the applied kernel similarity and with 

unknown dimensions. The applicability domain of a kernel-based model 

therefore has to be defined by means of the kernel. Consequently, this 

allows to use structured similarity measures, like the Optimal Assignment 

Kernel [2] and its extension [3], instead of a numerical encoding. Thus, it 

is possible to describe the complex chemical structure of many drugs 

better than it would be using descriptors. 

In this work, several approaches to define the applicability domain of a 

QSAR model by means of a kernel are presented and compared to each 

other. The approach is to extend the concept of a kernel density 

estimation to incorporate further information contained in a trained 

model. This can be achieved by using a weighted average kernel 

similarity of a predicted molecule to the training data set. The weights 

can be obtained either by exploiting the knowledge contained in the 

learned model or by approaches that describe the feature space structure 

using the kernel. 

References 

1. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: Altern Lab Anim 2005, 

33:445-459. 

2. Fröhlich H, Wegner J, Sieker F, Zell A: QSAR & Comb Sci 2006, 25:317-326. 

3. Fechner N, Jahn A, Hinselmann G, Zell A: J Chem Inf Mod 2009, 49:549-560. 

P39 

Expanding and understanding metabolite space 

Julio E Peironcely 1* , Andreas Bender 2 , M Rojas-Chertó 2 , T Reijmers 2 , 

L Coulier 1 , T Hankemeier 2 

1 TNO, Quality of Life, Utrechtseweg 48, Zeist, The Netherlands; 2 Netherlands 

Metabolomics Centre, University of Leiden, LACDR, Einsteinweg 55, 2333 CC 

Leiden, The Netherlands 


In metabolomics the identity and role of low mass molecules called 

metabolites that are produced in cell metabolic processes are 

investigated. These make them valuable indicators of the phenotype of a 

biological system. The ‘Metabolite Space’ is the total chemical universe of 

metabolites present in all compartments and in all states from any 

organism. These molecules exhibit common features that form what can 

be called ‘metabolite likeness’. Here,wefocusonthehumanmetabolite 

space, including both endogenous and exogenous (such as drug) 

metabolites. In order to analyze the ‘Metabolite Space’, we collected data 

from the Human Metabolome Database (HMDB) [1] which is a 

comprehensive database for human metabolites containing over 7000 

compounds that were identified in several human biofluids and tissues. 

As there still remain many compounds to be identified that lay outside 

the boundaries of this known space, exploring this unknown region is 

crucial to evaluate ‘metabolite likeness’. 

In order to expand ‘Metabolite Space’ in our approach we employed the 

Retrosynthetic Combinatorial Analysis Procedure (RECAP) [2] to generate 

new molecules that possess features similar to those present in 

metabolites, however in other (but still likely) rearrangements. We studied 

how discernible these new molecules are from real metabolites and, 

hence, whether synthetic organic chemistry reactions are indeed able to 

expand the known universe of metabolites. We further studied the new 

chemistry present in the expanded metabolite space by looking at 

Murcko assemblies [3], ring systems and other chemical properties. The 

new metabolite space is compared to other small molecules, such as 

those obtained from the ZINC database, that are not metabolites. By 

combining all the above analyses we expect to characterize better the 

metabolite space, and furthermore, to predict the metabolite-likeness of a 

molecule and to understand its immanent properties. 

References 

1. Wishart DS, et al: Nucleic Acids Res 2007, 35 Database: D521-526. 

2. Lewell XQ, Judd DB, Watson SP, Hann MM: J Chem Inf Comput Sci 1998, 

38:511-522. 

3. Bemis GW, Murcko MA: J Med Chem 1996, 39:2887-2893. 

P40 

Classification of CYP450 1A2 inhibitors using PubChem data 

Sergii Novotarskyi * , Iurii Sushko, R Körner, AK Pandey, Igor Tetko 

Helmholtz Zentrum München, Ingolstädter Landstraße 1, D-85764 

Neuherberg, <strong>German</strong>y 


Page 21 of 25 

Cytochromes P450 (CYP450) are a superfamily of enzymes, involved in 

metabolism of a large number of xenobiotic compounds. CYP450 are 

involved in degradation of a large amount of drugs, currently present on 

the market. The promiscuity with respect to substrates makes the CYP450 

enzymes prone to inhibition by a large amount of drugs, which gives 

way to clinically significant drug-drug interactions. 

In this work different machine learning methods were applied to classify 

the inhibitors/noninhibitors of human CYP450 1A2. The structures and 

the active/inactive classification concerning CYP1A2 inhibition were taken 

from PubChem BioAssay database. This assay uses human CYP1A2 to 

measure the demethylation of luciferin 6’ methyl ether (Luciferin-ME; 

Promega-Glo) to luciferin. 

The tested methods include k nearest neighbors (kNN), decision tree, 

random forest, support vector machine (SVM) and associative neural 

networks (ASNN). The descriptors used were those from the Dragon 

software, the fragment descriptors and the E-state indices. 

The training and test sets were handled separately to avoid different 

possibilities of overfitting - including overfitting by descriptor selection. 

Different applicability domain (AD) approaches were used to estimate the 

confidence of classification. 

As a result the models managed to correctly classify 80% of the test set 

instances. The accuracy of classification was found to be up to 95%, if 

only 30% most confident predictions were taken into account. The 

model was also applied to an external test set of 187 molecules, 

collected from literature and measured using a different etalon reaction. 

For this set accuracy of 78% was achieved on the 30% most confident 

predictions. 

All the developed models are fast enough to be used for virtual screening 

of CYP1A2 inhibitors and noninhibitors. The developed models are 

publicly available on-line at the http://qspr.eu web site. 

P41 

Applicability domain for classification problems 

Iurii Sushko * , S Novotarskyi, AK Pandey, R Körner, Igor Tetko 

Sushko, I., Helmholtz Zentrum München/IBIS, Ingolstädter Landstraße 1, 

D-85764 Neuherberg, <strong>German</strong>y 


Classification models are frequent in QSAR modeling. It is of crucial 

importance to provide good accuracy estimation for classification. 

Applicability domain provides additional information to identify which 

compounds are classified with best accuracy and which are expected to 

have poor and unreliable predictions. The selection of the most reliable 

predictions can dramatically improve performance of methods while 

decreasing coverage of predictions [1]. 

In binary classification problems, labels for machine learning methods are 

discrete {-1, 1}. Nonetheless, model usually yields prediction that is 

continuous. Most apparent metrics for accuracy estimation is distance 

between prediction point and edge of a class, i.e. the more is the 

distance between prediction the edge of the class, the more reliable and 

accurate is the prediction of given compound. This metric has been 

already used in several previous studies (e.g., [2]) and demonstrated good



separation of reliable and non-reliable classifications. In quantitative 

predictions, the standard deviation of ensemble predictions has been 

found as the most accurate measure distance in a recent benchmarking 

[3]. 

We propose to integrate both metrics. Rather than giving a point 

estimate, this approach provides us with a probability distribution of 

findingparticularcompoundinoneof the classes. Suggested metrics is 

probability 

� 

E 

Navxdx ( , , ) 

where E is class domain a - ensemble’s average prediction, v – variance of 

ensemble’s prediction,N(a, v, x) is probability density of the Gaussian 

distribution. Performance of this metric and its comparison to the 

traditional ones are evaluated for several QSAR/QSPR classification 

problems. The developed approach can be freely accessed to develop 

and estimate applicability domain of classification models at http://qspr.eu 

web site. 

References 

1. Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI: Drug Discov Today 

2006, 11:700. 

2. Manallack DT, Tehan BG, Gancia E, Hudson BD, Ford MG, Livingstone DJ, 

Whitley DC, Pitt WR: J Chem Inf Comput Sci 2003, 43:674. 

3. Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T, 

Todeschini R, Fourches D, Varnek A: J Chem Inf Model 2008, 48:1733. 

P42 

Automatic pharmacophore model generation using weighted 

substructure assignments 

Andreas Jahn * , H Planatscher, Georg Hinselmann, Nikolas Fechner, A Zell 



The generation of a pharmacophore model is a challenging process, 

which often requires the interaction of medicinal chemists. Given a 

number of ligands for a specific target, the aim is to identify the 

pharmacophore patterns that are responsible for the biological activities 

of chemical compounds. A recent study of optimal assignment methods 

has shown that the assignment of chemical substructures is able to 

detect active compounds in a data set [1]. Therefore, we investigated the 

possibility to use this technique to identify key features of a set of active 

compounds. 

To determine important substructures of active compounds, we 

integrated n weight factors, where n is the number of substructures. The 

substructures were defined using the pharmacophore definitions of Phase 

3.0 [2]. To define the individual weights of the pharmacophore patterns, 

we integrated a genetic algorithm which assigns weight factors to the 

previously defined patterns. The experimental setup was designed as 

follows: Given a data set with active compounds, the most active 

compound was selected as query structure for the experiment. The 

remaining active compounds were inserted into a background data set 

containing inactive compounds. The genetic algorithm evolved n weights 

for the pharmacophore patterns of the query structure. To evaluate the 

fitness of an individual, we performed a single query screening with the 

weights of the individual. During the optimization process, the BEDROC 

score [3] is optimized which puts emphasis on the early recognition 

performance. The result of the genetic algorithm was a weight vector 

that assigns each pharmacophore feature the weight of the best 

individual. 

We evaluated our approach on a subset of the Directory of Useful Decoys 

that is suitable for ligand-based virtual screening [1][4]. The query 

structure was extracted from the same complexed crystal structure used 

by Huang et al. [4] to determine the binding site of the protein. 

The presented method is able to provide valuable information about key 

features that are important for the biological activity of a compound. 

Additionally, information of the protein structure is not needed. 

Therefore, the method can also be used to derive a pharmacophore 

model if no protein structure is available (e.g. GPCRs). 

Page 22 of 25 

References 

1. Jahn A, Hinselmann G, Fechner N, Zell A: Journal of Cheminformatics 2009, 

1:14. 

2. Phase, version 3.0 Schrödinger, LLC, New York, NY 2008. 

3. Truchon JF, Bayly CI: J Chem Inf Model 2007, 41:488-508. 

4. Huang N, Shoichet B, Irwin J: J Med Chem 2006, 49:6789-6801. 

P43 

A combined combinatorial and pKa-based approach to ligand 

protonation states 

Tim ten Brink * , Thomas E Exner 

Department for Chemistry, University of Konstanz, D-78457 Konstanz, 



The protonation of the ligand molecule and the protein binding site has 

a significant influence on the results obtained by protein-ligand 

docking. Due to the inability of X-ray crystallography to resolve the 

hydrogen atom in protein and protein complex structures, the correct 

protonation for the protein and the ligand has to be assigned on a 

theoretical basis before the structures can be used. Because of the local 

environment inside the binding site and because of the influence of 

the ligand and the protein onto each other, the ligand protonation 

can differ from the protonation one would expect for the ligand in 

solution under physiological conditions. Hence for protein-ligand 

docking different protonation states of the ligand have to be taken into 

account. 

Our recently introduced structure preparation tool SPORES [1] used a rule 

based method to generate a standard protonation for each ligand 

molecule and afterwards generated different protonation states in a 

combinatorial way by adding and removing single hydrogen atoms 

belonging to predefined functional groups. Docking of all these 

protonation states of a given ligand molecule often led to scoring 

problems due to the different number of hydrogen interactions formed 

by the different protomers of the ligand molecule. To overcome these 

problems two methods to filter highly charged and unstable protonation 

states from the docking were implemented. One based on the difference 

between the standard protonation of the ligand and the actual 

protonation state and one based on pKa values calculated with 

ChemAxon’s MARVIN software [2]. Here we present a new approach in 

which the ligand atoms considered for the combinatorial method not 

chosen from predefined functional groups but according to the 

calculated pKa values which leads to a wider variety but with a smaller 

overall number of protonation states. 

References 

1. ten Brink T, Exner TE: J Chem Inf Model 2009, 49(6):1535-1546. 

2. MARVIN 5.2.0, ChemAxon 2009 [http://www.chemaxon.com]. 

P44 

Semi-empirical derived descriptors for the modelling of properties of 

N-containing heterocycles 

Alexander Entzian 1* , Horst Bögel 1 , Mirko Buchholz 2 , Ulrich Heiser 2 

1 Martin-Luther-University Halle-Wittenberg, Institute for Organic Chemistry, 

Kurt-Mothes-Str. 2, 06120 Halle, <strong>German</strong>y; 2 Probiodrug AG, Dept. Medicinal 

Chemistry, Weinbergweg 22, 06120 Halle, <strong>German</strong>y 


Nitrogen is one of the most prominent hetero atoms found in 

heterocycles. The corresponding electron lone pairs of these nitrogen 

atoms are mainly responsible for properties like the basicity and the pkbvalues 

of the investigated heterocycle. Nevertheless, the overall electronic 

state of a molecule is also directly related to observable physico-chemical 

properties. This fact underlines a possible connection between different 

investigated properties. 

The electronic properties of nitrogen containing compounds were 

analyzed with the aim to further predict these properties from 

quantum-mechanical descriptors. We thereby expect a relationship 

between the proton affinity, the pkb-value and the strength of metal 

complexation.



Because basicity seems to be a fundamental property of these compounds, 

the work was first focused on the proton affinities of the molecules 

in the gas phase. We thereby consider the fact, that these values 

are not influenced by the solvation in liquid phases. Lone pairs in 

nitrogen containing heterocycles play also an important role for metal 

complexation or due to their nucleophilic attack in chemical reactions. 

Therefore 55 heterocyclic compounds were selected that belong to the 

compound classes of substituted pyridines, pyrazoles and imidazoles. 

These compound set was carefully divided into a training (37) and test 

set (18), and different descriptors of the electronic states were 

calculated by using the semi-empirical molecular orbital software 

MSINDO [1]. In regression analysis the following descriptors were 

identified and marked as important: 

Charge of the nitrogen atom, 

Energy of the lone pair electrons and 

Perturbation treatment of the interaction energy - Klopman [2]. 

Using this set of descriptors a model was built and validated that predicts 

the experimental data for proton affinity with an R 2 of 0.93 and a Q 2 of 

0.91. Furthermore, it was possible to successfully develop models for the 

prediction of pkb-values and the strength of a metal complex formation 

for these compounds. 

References 

1. Bredow T, Geudtner G, Jug K: J Comput Chem 2001, 22:861. 

2. Klopman G: J Amer Chem Soc 1968, 90:223. 

P45 

QSAR of anti-inflammatory drugs 

Veronica R Khayrullina 1* , H Bögel 2 

1 Bashkir State University, RU-450074 Ufa, Russia; 2 Halle, <strong>German</strong>y 


The computer analysis of relations between molecular structures and their 

biological activity using fragment-based methods is very useful to draw 

conclusions for the understanding of drug action and for the 

development of more efficient non-toxic drug candidates. 

We used the computer system SARD-21 (Structure Activity Relationship & 

Design) to investigate common structural features (fragments and 

substituents) typical for high- and low-effective non-steroid antiinflammatory 

drugs (NSAIDs) succesfully. 

This derived information has been used for the model for prediction of 

anti-inflammatory effectiveness of medicines with 76% and 81% level of 

recognition by two methods. This information could be used for creating 

new highly effective NSAIDs, and for increasing effectiveness of already 

known components. 

In a second part of this paper the interrelation between structure and 

efficacy for anti-inflammatory drug action is carried out using traditional 

QSAR with descriptors from topology and from quantum-mechanical 

calculations followed by regression models from modelling. 

The aim of this paper is to compare both molecular approaches of 

molecular design of drugs. 

Reference 

1. Khayrullina VR, et al: Russ Chem Bull, Int Ed 2006, 55:1. 

P46 

Fingerprint-based detection of acute aquatic toxicity 

Luca Pireddu 1* , L Michielan 2 , M Floris 1 , P Rodriguez-Tomé 1 , S Moro 2 

1 CRS4, Pula, Italy; 2 University of Padova, Padova, Italy 


In this work we show the effectiveness of 2D structural fingerprints in the 

prediction of aquatic toxicity of chemical compounds, creating a selfcontained 

system for structure-based aquatic toxicity classification. Using 

the data from the U.S. Environmental Protection Agency Fat Head 

Minnow (EPA-FHM) dataset [1] we build a non-linear RBF SVM [2] 

classifier that distinguishes acutely toxic compounds from less toxic 

compounds, loosely according to the criterion stipulated by the E.U. 

Reach legislation [3]. The classifier achieves up to 86% accuracy in leave- 

Page 23 of 25 

one-out validation using 580 of the dataset’s 614 compounds. This 

performance is comparable with models built from the same dataset 

using more sophisticated molecular descriptors, such as AutoMEP and 

Sterimol descriptors [4]. We apply our classification model to predict the 

aquatic toxicity of 3M compounds in the MMsINC database [5]. 

Furthermore, we create a linear SVM model using the same technique 

and apply it to the MMsINC data, with the additional integration of the 

EXPLAIN system [6] which allows us to show which structural features are 

responsible for the model classifying a molecule as less toxic or acutely 

toxic. 

References 

1. Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drummond RA: 

Predicting modes of action from chemical structure: Acute toxicity in 

the fathead minnow (Pimephales promelas). Environmental Toxicology and 

Chemistry 1997, 16(5):948-967. 

2. Boser B, Guyon I, Vapnik V: A Training Algorithm for Optimal Margin 

Classifiers. Computational Learning Theory 1992, 144-152. 

3. EU: Corrigendum to Regulation (EC) No 1907/2006 of the European 

Parliament and of the Council of 18 December 2006 concerning the 

Registration, Evaluation, Authorization and Restriction of Chemicals 

(REACH). Off J Eur Union L136 2007, 50. 

4. Michielan L, Pireddu L, Floris M, Bacilieri M, Rodriguez-Tomé P, Moro S: 

2009 in press. 

5. Masciocchi B, Frau G, Fanton M, Sturlese M, Floris M, Pireddu L, Palla P, 

Cedrati F, RodriguezTomé P, Moro S: MMsINC: a large-scale 

chemoinformatics database. Nucleic Acids Research 2008, 37 Database: 

D284-90. 

6. Poulin B, Eisner R, Szafron D, Lu P, Greiner R, Wishart D, Fyshe A, Pearcy B, 

Macdonell C, Anvik J: Visual Explanation of Evidence in Additive 

Classifiers. Innovative Applications of Artificial Intelligence 2006. 

P47 

Adaptive matrix metrics for molecular descriptor assessment in 

QSPR classification 

Axel J Soto 1* , Marc Strickert 2 , GE Vazquez 1 

1 Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Nacional del 

Sur, Alem 1250, 8000 Bahía Blanca, Argentina; 2 Leibniz Institute of Plant 

Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, 



QSPR methods represent a useful approach in the drug discovery process, 

since they allow predicting in advance biological or physicochemical 

properties of a candidate drug. For this goal, it is necessary that the QSPR 

method be as accurate as possible to provide reliable predictions. 

Moreover, the selection of the molecular descriptors is an important task 

to create QSPR prediction models of low complexity which, at the same 

time, provide accurate predictions. 

In this work, a matrix-based method [1] is used to transform the original 

data space of chemical compounds into an alternative space where 

compounds with different target properties can be better separated. For 

usingthisapproach,QSPRisconsideredasaclassificationproblem.The 

advantage of using adaptive matrix metrics is twofold: it can be used to 

identify important molecular descriptors and at the same time it allows 

improving the classification accuracy. 

A recently proposed method making use of this concept [2] is extended 

to multi-class data. The new method is related to linear discriminant 

analysis and shows better results at yet higher computational costs. An 

application for relating chemical descriptors to hydrophobicity property 

[3] shows promising results. 

References 

1. Strickert M, Keilwagen J, Schleif F-M, Villmann T, Biehl M: Matrix Metric 

Adaptation Linear Discriminant Analysis of Biomedical Data. Lecture Notes 

in Computer Science 2009, 5517/2009:933-940. 

2. Strickert M, Soto AJ, Keilwagen J, Vazquez GE: Towards matrix-based 

selection of feature pairs for efficient ADMET prediction. Argentine 

Symposium on Artificial Intelligence, ASAI 2009, 83-94. 

3. Soto AJ, Cecchini RL, Vazquez GE, Ponzoni I: A Wrapper-based Feature 

Selection Method for ADMET Prediction using Evolutionary Computing. 

Lecture Notes in Computer Science 2008, 4973/2008:188-199.



P48 

Evolutionary design of selective adenosine receptor ligands 

Horst van der Eelke * , J Kruisselbrink, A Aleman, MT Emmerich, 

Andreas Bender, AP IJzerman 

Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug 

Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands 


Evolutionary de novo design is the design of new active compounds from 

scratch, using evolutionary principles. It involves an iterative cycle of 

structure generation, evaluation, and selection of candidate structures. 

Selected candidates serve as input for the next generation. By repeatedly 

selecting only the best structures as basis for structure generation, the 

process evolves toward better (not necessarily best) solutions. Here, we 

applied an evolutionary algorithm (the Molecule Commander) for the design 

of selective adenosine receptor ligands. Generated compounds were all ruleof-5 

compliant and had a polar surface areas between 0 and 140 Å 2 , 

favorable for intestinal absorption. In addition, toxic compounds were 

filtered out using a categorical SVM model for prediction of mutagenicity 

trained on Ames-test mutagenicity data (5-fold ROC score: 0.8948). Four 

pharmacophores were designed, one for each human adenosine receptor 

subtype. The hA 1 receptor pharmacophore served as objective for the 

evolution, while the three other pharmacophores served as negative 

objective in order to obtain selectivity. In order to measure similarity with 

known adenosine receptor ligands, ring systems and scaffolds in the 

generated compounds were compared with those extracted from adenosine 

ligands in the StARLITe database. With each new generation, the structures 

displayed an increasing number of ring systems also found in adenosine 

receptor ligands while the number of uniquecorestructures(scaffolds) 

increased as well. Eventually, the best (ADMET and pharmacophore score) 

candidate structures will be proposed for synthesis and tested for activity. 

P49 

Ionotropic GABA receptors: modelling and design of selective ligands 

Vladimir A Palyulin * , EV Radchenko, DE Osolodkin, VI Chupakhin, NS Zefirov 

Department of Chemistry, Moscow State University, Moscow 119991, Russia 


Ionotropic GABA A and GABA C receptors play an important role in the operation 

of CNS and serve as targets for many neuroactive drugs. Using the homology 

modelling and molecular dynamics, the 3D models of the receptors were built 

and some aspects of ligand-target interactions were elucidated [1,2]. 

To better understand the structural factors controlling the activity and 

selectivity of the ligands, a series of QSAR models [3] were derived based 

on the Molecular Field Topology Analysis (MFTA) [4], CoMFA and 

Topomer CoMFA approaches. They were compared with each other as 

well as with the molecular modelling results. 

Finally, a number of potential selective ligand structures were identified 

by means of the virtual screening [5] from the available chemicals 

databases and the generated structure libraries. 

References 

1. Osolodkin DI, Chupakhin VI, Palyulin VA, Zefirov NS: J Mol Graph Model 

2009, 27:813. 

2. Chupakhin VI, Palyulin VA, Zefirov NS: Doklady Biochem Biophys 2006, 408:169. 

3. Chupakhin VI, Bobrov SV, Radchenko EV, Palyulin VA, Zefirov NS: Doklady 

Chemistry 2008, 422:227. 

4. Palyulin VA, Radchenko EV, Zefirov NS: J Chem Inf Comp Sci 2000, 40:659. 

5. Radchenko EV, Palyulin VA, Zefirov NS: Chemoinformatics Approaches to 

Virtual Screening RSC: Varnek A, Tropsha A 2008, 150-181. 

P50 

PoseView – molecular interaction patterns at a glance 

Katrin Stierand * , Matthias Rarey 

Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43, 

20146 Hamburg, <strong>German</strong>y 


Chemists are well trained in perceiving 2D molecular sketches. On the side 

of computer assistance, the automated generation of such sketches 

becomes very difficult when it comes to multi-molecular arrangements 

such as protein-ligand complexes in adrugdesigncontext.Existing 

solutions to date suffer from drawbacks such as missing important 

interaction types, inappropriate levels of abstraction and layout quality. 

During the last few years we have developed PoseView [1,2], a tool which 

displays molecular complexes incorporating a simple, easy-to-perceive 

arrangement of the ligand and the amino acids towards which it forms 

interactions. Resulting in atomic resolution diagrams, PoseView operates 

on a fast tree re-arrangement algorithm to minimize crossing lines in the 

sketches. Due to a de-coupling of interaction perception and the drawing 

engine, PoseView can draw any interactions determined by either 

distance-based rules or the FlexX interaction model (which itself is user 

accessible). Owing to the small molecule drawing engine 2Ddraw [3], 

molecules are drawn in a textbook-like manner following the IUPAC 

regulations. 

The tool has a generic file interface for other complexes than proteinligand 

arrangements. It can therefore be used as well for the display of, 

e.g., RNA/DNA complexes with small molecules. For batch processing, an 

additional command line interface is available; output can be provided in 

various formats, amongst them gif, ps, svg and pdf. 

Besides the underlying interaction models, we will present new 

algorithmic approaches, assess usability issues and a large-scale validation 

study on the PDB. 

References 

1. Stierand K, Rarey M: ChemMedChem 2007, 2:853. 

2. Stierand K, Maaß P, Rarey M: Bioinformatics 2006, 22:1710. 

3. Fricker PC, Gastreich M, Rarey M: J Chem Inf Comput Sci 2004, 44(3):1065. 

P51 

Get the best from substructure mining 

Jeroen Kazius 

Curios-IT, Anna van Burenhof 63, 2316GP Leiden, The Netherlands 


Page 24 of 25 

The chemical information that is present in a set of compounds is rarely 

fullyexploited.Thisismostlybecause no descriptor set can capture all 

biologically important features. As a result, valuable chemical 

knowledge can thus stay hidden from hypothesis-based drug design. 

The simplest form of a structure-activity relationship (SAR) is a 

substructure that predisposes compounds towards reduced or increased 

biological activity. Such simple patterns should not be missed during 

drug design. 

The aim of substructure mining is to present those substructures that are 

most likely related to biological activity. This method thus provides rapid 

access to a substantial repertoire of chemical descriptors that otherwise 

remains hidden: substructures. In short, substructure mining consists of a 

focused, but exhaustive, series of substructure searches. 

This poster describes how AweSuM, the new Awesome Substructure 

Mining tool from Curios-IT, was employed to learn the most interesting 

substructures. The poster also discusses the value of enriching the data 

with 2D pharmacophore information prior to mining. An enriched, 

detailed SAR analysis produced a scaffold that summarises the chemical 

content of datasets better than any standard substructure. The 

pharmacophore that AweSuM extracted shows predictive power and 

agrees with published chemical knowledge. These results demonstrate 

that useful SAR knowledge can be extracted from the vast space of 

substructure descriptors. In this way, AweSuM reveals key substructures 

(e.g., pharmacophores or toxicophores), which can often be predictive for 

biological activities. 

P52 

Rescoring of docking poses using force field-based methods 

Nina M Fischer * , WM Schneider, O Kohlbacher 

Eberhard Karls University, Center for Bioinformatics Tübingen, Sand 14, 72076 

Tübingen, <strong>German</strong>y 


Existing protein-ligand docking methods computationally screen 

thousands to millions of organic molecules against protein structures, 

trying to find those with complementary shapes and highest binding free



energies. To allow large molecular databases to be screened rapidly, 

simple and approximative scoring functions are used as a fast filter, 

resultinginlowhitrates.Therefore, docking hit lists are commonly 

rescored in a final step by more rigorous and time-consuming methods 

to gain a more accurate final list of ranked compounds. 

Molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) or 

generalized Born surface area (MM-GBSA) methods are currently 

considered to be suitable techniques for rescoring. These physically 

realistic approaches incorporate more sophisticated models for solvation 

and electrostatic interactions than most scoring functions. Hence, they can 

discriminate more reliably between correct and incorrect docking poses. 

A high-throughput rescoring protocol using force field-based methods 

has been proposed by Brown and Muchmore [1]. They used 18 in-house 

urokinase-ligand crystal structures and their corresponding experimentally 

determined binding free energies as a test set. On the basis of this 

rescoring protocol we tested several molecular dynamics simulation 

protocols in combination with different MM-PBSA, and MM-GBSA 

calculation procedures using Amber 10 [2] and Gromacs 4 [3]. 

Considering performance and accuracy, our best rescoring protocol 

performs similarly to the one described by Brown and Muchmore [1]. 

It has a comparable run-time and achieves a correlation between 

experimental and rescored values of 0.88 compared to 0.87. 

However, we used urokinase-ligand complexes generated using the 

docking program Glide [4] instead of crystal structures. Additionally, this 

shows that our rescoring protocol improves the correlation of 0.57 

between experimental values and Glide scores significantly to 0.88, 

thereby achieving a more accurate list of ranked compounds. 

The protocol will be incorporated into BALLView [5], an open-source 

molecular viewer and modeling tool. Thus, it will be available free of 

charge and can be conveniently used to rescore (docked) protein-ligand 

complexes. 

References 

1. Brown SP: J Chem Inf Model 2007, 47:1493. 

2. Case DA: University of California, San Francisco 2008. 

3. Hess B: J Chem Theory Comput 2008, 4:435. 

4. Friesner RA: J Med Chem 2004, 47:1739. 

5. Moll A: J Comput Aided Mol Des 2005, 19:791. 

P53 

The pipelined metabolite identification based on MS fragmentation 

Miguel Rojas-Cherto * , PT Kasper, Peironcely Julio E, T Reijmers, 

RJ Vreeken, T Hankemeier 

Leiden, The Netherlands 


Structural characterization and identification of components of complex 

biological mixtures constitutes one of the central aspects of metabolomics. 

Metabolite identification is a challenging but essential task in studies of 

biological samples. Mass spectrometry, because of its high sensitivity and 

specificity, is widely and successfully used in analysis of biological samples. 

Identification of metabolites can be in principle achieved using high 

resolution multistage mass spectrometry (MSn) because it provides a 

feature rich fingerprint of the precursor structure. However, neither general 

methodology for the identification nor extensive databases of metabolites 

with multistage mass spectrometric data are available at the moment. We 

demonstrate in this poster the feasibility of the strategy for metabolite 

identification based on analysis of fragmentation trees. 

High resolution multi stage MS experiments were performed on 

LTQ-Orbitrap (Thermo) equipped with Triversa nanoMate (Advion) 

nanoelectrospray ion source using defined protocol. An in-house 

developed software, integrating among others: Chemistry Development 

Kit (CDK) and XCMS libraries, was used for spectral data processing. 

Resulting fragmentation trees were stored in a local database. 

Multi-stage Molecular Formula (MMF) tool, which uses a method to 

resolve the elemental composition of the compound and fragment ions 

derived from MSn data using a cyclic constraining process has been 

developed. The process of elemental formula assignment and fitting 

within elemental formula paths removes artefacts of the spectra. 

Background signals, spikes, contaminations, side peaks, etc., are rejected 

in this stage and are not included in the fragmentation tree. The 

resulting fragmentation trees are stored in XML format. 

Repeatability, reproducibility and robustness of fragmentation tree 

acquisitions were tested by changing experimental conditions and 

varying the concentration of the metabolite of interest. An acquisition 

protocol was established for the reliable and reproducible acquisition of 

mass spectral trees. It was investigated to which extent the variation of 

conditions such as fragmentation energy, isolation width etc. did change 

the fragmentation pattern or topology of hierarchical relations between 

fragments. 

We demonstrate how the developed analytical strategy based on 

analysis of fragmentation trees can be used to discriminate between 

metabolite isomers with the same elemental composition and an only 

slightly different structure, but with a significantly different biological 

function. 

Our results provide firm basis for developing a generic, multi stage mass 

spectrometry based platform for efficient identification of metabolites. 

P54 

SBE13, a newly identified inhibitor of inactive polo-like kinase 1 

Sarah Keppner 1* , Ewgenij Proschak 2 , Gisbert Schneider 2 , B Spänkuch 1 

1 Goethe-University, Medical School, Department of Gynecology and 

Obstetrics, Theodor-Stern-Kai 7, 60590 Frankfurt, <strong>German</strong>y; 

2 Goethe-University, Frankfurt, <strong>German</strong>y 


Page 25 of 25 

Protein kinases are important targets for drug development. The almost 

identical protein folding of kinases and the common co-substrate ATP 

leads to the problem of inhibitor selectivity. Type II inhibitors, targeting 

the inactive conformation of kinases, occupy a hydrophobic pocket with 

less conserved surrounding amino acids [1]. 

Human polo-like kinase 1 (Plk1) represents a promising target for 

approaches to identify new therapeutic agents. Plk1 belongs to a family 

of highly conserved serine/threonine kinases, and is a key player in 

mitosis, where it modulates the spindle checkpoint at metaphase/ 

anaphase transition. Plk1 is over-expressed in all today analyzed human 

tumors of different origin and serves as a negative prognostic marker in 

cancer patients. The newly identified inhibitor, SBE13, a vanillin derivative, 

targets Plk1 in its inactive conformation [2]. This leads to selectivity 

within the Plk family and towards Aurora A. This selectivity can be 

explained by docking studies of SBE13 into the binding pocket of 

homology models of Plk1, Plk2 and Plk3 in their inactive conformation. 

SBE13 showed anti-proliferative effects in cancer cell lines of different 

origins with EC 50 values between 5 μM and39μM and induced apoptosis. 

Increasing concentrations of SBE13 result in increasing amounts of cells in 

G 2/M phase 13 hours after double thymidin block of HeLa cells. The kinase 

activity of Plk1 was inhibited with an IC 50 of 200 pM. 

Taken together, we could show that carefully designed structure-based 

virtual screening is well-suited to identify selective type II kinase 

inhibitors targeting Plk1 as potential anti-cancer therapeutics. 

References 

1. Liu Y, Gray NS: Nat Chem Biol 2006, 2:358-364. 

2. Keppner S, Proschak E, Schneider G, Spänkuch B: Chem Med Chem 2009 in 

press. 

Cite abstracts in this supplement using the relevant abstract number, 

e.g.: Keppner et al.: SBE13, a newly identified inhibitor of inactive pololike 

kinase 1. Journal of Cheminformatics 2010, 2(Suppl 1):P54

5th German Conference on Cheminformatics: 23 ... - BioMed Central

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?