Reviews in Computational Chemistry Volume 18

CHAPTER 1

Clustering Methods and Their Uses in Computational Chemistry

Geoff M. Downs and John M. Barnard
Barnard Chemical Information Ltd., 46 Uppergate Road, Stannington, Sheffield S6 6BX, United Kingdom

Reviews in Computational Chemistry, Volume 18
Edited by Kenny B. Lipkowitz and Donald B. Boyd
Copyright © 2002 John Wiley & Sons, Inc.
ISBN: 0-471-21576-7

INTRODUCTION

Clustering is a data analysis technique that, when applied to a set of heterogeneous items, identifies homogeneous subgroups as defined by a given model or measure of similarity. Of the many uses of clustering, a prime motivation for the increasing interest in clustering methods is their use in the selection and design of combinatorial libraries of chemical structures pertinent to pharmaceutical discovery.

One feature of clustering is that the process is unsupervised; that is, there is no predefined grouping that the clustering seeks to reproduce. In supervised learning, the task is to establish relationships between given inputs and outputs so that the output can be predicted from new inputs. In unsupervised learning, only the inputs are available, and the task is to reveal aspects of the underlying distribution of the input data. Clustering is thus complemented by the related supervised process of classification, in which items are assigned to predefined, labeled groups; examples include recursive partitioning, naïve Bayesian analysis, and K nearest-neighbor selection.

Clustering is a technique for exploratory data analysis and is used increasingly in preliminary analyses of large data sets of medium and high dimensionality as a method of selection, diversity analysis, and data reduction. This chapter reviews the main clustering methods that are used for analyzing chemical data sets and gives examples of their application in pharmaceutical companies. Compared to the other costs of drug discovery, clustering can add significant value at minimal cost. First, we provide an outline of clustering as a discipline and define some of the terminology. Then, we give a brief tutorial on clustering algorithms, review progress in developing the methods, and offer some example applications.

Clustering methodology has been developed and used in a variety of areas, including archaeology, astronomy, biology, computer science, electronics, engineering, information science, and medicine. Good general introductory texts on clustering include those by Sneath and Sokal, 1 Kaufman and Rousseeuw, 2 Everitt, 3 and Gordon. 4 The main text devoted to the clustering of chemical data sets is by Willett, 5 with review articles by Bratchell, 6 Barnard and Downs, 7 and Downs and Willett. 8 The present chapter complements and updates the latter article. In a previous volume of this series, Lewis, Pickett, and Clark 9 reviewed the use of diversity analysis techniques in combinatorial library design.

As will be shown in the section on Chemical Applications, the current main uses of clustering for chemical data sets are to find representative subsets from high throughput screening (HTS) and combinatorial chemistry, and to increase the diversity of in-house data sets through the selection of additional compounds from other data sets. Methods suitable for compound selection are the main focus of this chapter. Such methods must be able to handle large data sets of high-dimensional data. For small, low-dimensional data sets, most clustering methods are applicable, and the descriptions in the standard texts and the implementations available in standard statistical software packages 10,11 suffice.
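The distinction drawn in this introduction between unsupervised clustering and supervised classification can be illustrated on a small, low-dimensional data set of the kind that almost any method handles. The sketch below is illustrative only and is not a method from this chapter: it assumes plain k-means (Lloyd's algorithm, seeded with the first k points so the result is deterministic) as the clustering method and a K nearest-neighbor vote as the classifier. The function names, the two-dimensional "descriptor" values, and the activity labels are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Unsupervised: group points into k clusters using only the inputs.

    Lloyd's algorithm seeded with the first k points (a simplifying
    assumption for determinism). Returns a cluster index for each point.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def knn_classify(train, labels, query, k=3):
    """Supervised: predict a label for `query` by majority vote of its k nearest labeled items."""
    nearest = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional descriptors for six compounds.
points = [(0.1, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9), (0.0, 0.3), (4.8, 5.0)]
print(kmeans(points, k=2))  # → [0, 1, 0, 1, 0, 1]  (cluster ids only; no labels used)

# Hypothetical labels, needed only by the supervised classifier.
labels = ["inactive", "active", "inactive", "active", "inactive", "active"]
print(knn_classify(points, labels, (4.9, 5.2)))  # → active
```

The clustering call receives no labels and only reports which items group together; the classifier cannot run at all without the predefined labels, which is the practical difference between the two processes.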
Implementations designed for use on chemical data sets are available from most of the specialist software vendors, 12–17 the majority of which were reviewed by Warr. 18 The overall process of clustering involves the following steps:

1. Generate appropriate descriptors for each compound in the data set.
2. Select an appropriate similarity measure.
3. Use an appropriate clustering method to cluster the data set.
4. Analyze the results.

This chapter focuses on step 3. For step 1, descriptors may include property values, biological properties, topological indexes, and structural fragments. The performance of these descriptors and forms of representation has been analyzed by Brown 19 and by Brown and Martin. 20,21 For step 2, similarity searching has been discussed by Downs and Willett, 22 and the characteristics of various similarity measures have been discussed by Barnard, Downs, and Willett. 23,24 For step 4, little has been published specifically about the visualization and analysis of clustering results for chemical data sets. However, most publications describing systems that use clustering do give details of how the results were displayed or analyzed.
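The four steps can be sketched end to end for binary structural-fragment descriptors. This is a minimal illustration under stated assumptions, not a method described in this chapter: each compound is represented by a hypothetical set of "on" fragment bits (step 1), similarity is measured with the Tanimoto coefficient (step 2), clustering uses a simple single-pass leader (sphere-exclusion-style) algorithm with an arbitrarily chosen threshold of 0.5 (step 3), and the analysis merely reports each cluster's representative and size (step 4). The compound names, fragment bits, and function names are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two binary descriptors held as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty descriptors are identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def leader_cluster(fingerprints, threshold):
    """Single-pass leader clustering (an assumed, illustrative method).

    Each compound joins the first existing leader to which it is at least
    `threshold` similar; otherwise it becomes the leader of a new cluster.
    The result depends on the input order.
    """
    clusters = []  # list of (leader name, [member names])
    for name, fp in fingerprints.items():
        for leader, members in clusters:
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return clusters


# Step 1 (hypothetical data): fragment bits set for each compound.
fingerprints = {
    "cpd1": {1, 2, 3, 4},
    "cpd2": {1, 2, 3, 5},
    "cpd3": {10, 11, 12},
    "cpd4": {1, 2, 4},
    "cpd5": {10, 11, 13},
}

# Steps 2 and 3: Tanimoto similarity inside a leader clustering pass.
clusters = leader_cluster(fingerprints, threshold=0.5)

# Step 4: a minimal analysis, listing each representative and its cluster.
for leader, members in clusters:
    print(f"representative {leader}: {len(members)} members {members}")
```

Run as written, this groups cpd1, cpd2, and cpd4 under cpd1 and groups cpd3 and cpd5 under cpd3, so the leaders could serve as a two-compound representative subset. The single pass keeps the cost low, but reordering the input can change the clusters; that order dependence is a property of this sketch, not a claim about the methods the chapter reviews.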