Reviews in Computational Chemistry Volume 18

Clustering Methods and Their Uses in Computational Chemistry

Chemical Applications

… exclusion subset selection methods 80 and the Reynolds system mentioned above. It also bears similarities to other methods, particularly the clustering of merged multiple random samples reported by Bradley and Fayyad. 115 The widespread use of the Jarvis–Patrick nonhierarchical method is due in part to the influence of the publications by Willett et al. 5,108–110 but also to the availability of the efficient commercial implementation from Daylight 14 for handling very large data sets. The first publication on the use of Jarvis–Patrick clustering for compound selection from large chemical data sets came from researchers who implemented it at Pfizer Central Research (UK). 134 Clustering was done using 2D fragment descriptors, and the list of 20 nearest neighbors of each compound was calculated using the efficient Perry–Willett inverted file approach. 35 After the data set of about 240,000 compounds had been clustered, singletons were moved to the most similar nonsingleton cluster; representative compounds were then extracted by generating cluster centroids and selecting the compound closest to each centroid. Earlier in this chapter, we mentioned the cascaded Jarvis–Patrick 63 and fuzzy Jarvis–Patrick 64 variants. The cascaded Jarvis–Patrick method was implemented at Rhone-Poulenc Rorer (RPR) using Daylight 2D structural fingerprints and Daylight’s Jarvis–Patrick program. In this variant, singletons are reclustered using less strict parameters so that they do not dominate the set of representative compounds selected. The applications reported by the RPR researchers 63 include selection of compounds from the corporate database for HTS and comparison of the corporate database with external databases, such as the Available Chemicals Directory, to assist in compound acquisition. The fuzzy Jarvis–Patrick variant was developed and implemented at G. D.
Searle and Company for analysis of its compound database in support of its screening program. The Searle researchers 64 initially used the Daylight implementation but found the chaining and singleton characteristics of the standard method to be significant drawbacks, which prompted them to develop a variant with different characteristics. McGregor and Pallai 135 described an in-house implementation of the standard Jarvis–Patrick algorithm at Procept Inc. They used MDL 2D structural descriptors to compare and analyze external databases for efficient compound acquisition. Shemetulskis et al. 136 also reported the use of Jarvis–Patrick clustering to assist compound acquisition at Parke-Davis, giving results from the analysis and comparison of the CAST3D and Maybridge compound databases with the corporate database. In a two-stage process, representatives comprising about a quarter of the compounds were first selected from each data set by clustering on the basis of 2D fingerprints. Each data set was then merged with the corporate database, and the clustering was run again on the basis of calculated physicochemical property descriptors. Clusters containing only CAST3D or Maybridge compounds were tagged as highest priority for acquisition. Dunbar 137 summarized the compound acquisition
report, 135 discussed the use of clustering methods to assist in HTS, and then outlined the use at Parke-Davis of Jarvis–Patrick clustering to support traditional, low-throughput screening. The aim of the Parke-Davis group was to generate a representative subset of no more than 2,000 compounds, selected from about 126,000 compounds in the Parke-Davis corporate database, for use in a particularly labor-intensive cell-based assay. Jarvis–Patrick clustering was run to generate an initial set of 25,000 nonsingleton clusters. The compounds closest to the centroids were reclustered to give about 2,300 clusters, and the compounds closest to these centroids were then analyzed manually, giving a final selection of about 1,400 compounds. An interesting feature of this process was that singletons were rejected at each stage, rather than being assigned to the nearest nonsingleton cluster (as at Pfizer, UK) or being reclustered separately (as in the cascaded clustering method used at Rhone-Poulenc Rorer).

Jarvis–Patrick clustering has also been used to support QSAR analysis in a system developed at the European Communities Joint Research Center. 7,138–140 The EINECS (European Inventory of Existing Chemical Substances) database contains more than 100,000 compounds and has been clustered using 2D structural descriptors. The database also has associated physicochemical properties and activities, but these data are very sparse. Jarvis–Patrick clustering was used to extract clusters containing enough compounds with measured data to attempt to estimate the properties of the cluster members lacking such data. For a few clusters, the approach was also used to develop reasonable QSAR models. An example of how k-means clustering can be used for QSAR analysis of small data sets is the work of Lawson and Jurs, 141 who clustered a set of 143 acrylates from the TSCA (Toxic Substances Control Act) inventory.
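The three ways of handling Jarvis–Patrick singletons contrasted in the Parke-Davis discussion (reassignment to the most similar nonsingleton cluster at Pfizer, a separate looser reclustering in the RPR cascade, and outright rejection at Parke-Davis) can be compared in a minimal pure-Python sketch. It assumes fingerprints stored as sets of "on" bit positions, Tanimoto similarity, and the usual Jarvis–Patrick rule: two compounds join the same cluster if each is in the other's list of k nearest neighbors and the two lists share at least `k_min` members. The function names, the policy strings, `k_min=8`, and the `loose` parameters are hypothetical (only the 20-neighbor list comes from the text), and the brute-force O(n²) neighbor search stands in for the Perry–Willett inverted file approach; this is not the Daylight program or any of the in-house implementations described.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints stored as sets of 'on' bit positions."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 1.0


def jarvis_patrick(fps, k, k_min):
    """Return one label per compound. Compounds i and j are linked when each is in
    the other's k-nearest-neighbor list and the two lists share >= k_min members;
    clusters are the connected groups of linked compounds."""
    n = len(fps)
    nn = []
    for i in range(n):
        # brute-force O(n^2) neighbor search; ties broken by lower index
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-tanimoto(fps[i], fps[j]), j))
        nn.append(set(ranked[:k]))

    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i in range(n):
        for j in nn[i]:
            if i < j and i in nn[j] and len(nn[i] & nn[j]) >= k_min:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]  # label = root index, always < n


def cluster_with_policy(fps, k=20, k_min=8, policy="assign", loose=(20, 4)):
    """Jarvis-Patrick clustering followed by one of three (hypothetical) singleton policies.

    'assign'    move each singleton to the cluster of its most similar nonsingleton compound
    'recluster' recluster the singletons on their own with looser (k, k_min)
    'reject'    drop singletons (label None)
    """
    if policy not in ("assign", "recluster", "reject"):
        raise ValueError(f"unknown policy {policy!r}")
    labels = jarvis_patrick(fps, k, k_min)
    sizes = Counter(labels)
    singles = [i for i, lab in enumerate(labels) if sizes[lab] == 1]
    others = [i for i, lab in enumerate(labels) if sizes[lab] > 1]

    if policy == "reject":
        dropped = set(singles)
        return [None if i in dropped else lab for i, lab in enumerate(labels)]
    if policy == "assign" and others:
        for i in singles:
            best = max(others, key=lambda j: (tanimoto(fps[i], fps[j]), -j))
            labels[i] = labels[best]
    elif policy == "recluster" and len(singles) > 1:
        sub = jarvis_patrick([fps[i] for i in singles], *loose)
        for pos, i in enumerate(singles):
            labels[i] = len(fps) + sub[pos]  # offset keeps second-pass labels distinct
    return labels
```

In a Pfizer-style pipeline, the `"assign"` labels would then be followed by choosing the compound nearest each cluster centroid; in a Parke-Davis-style pipeline, the surviving centroid-nearest compounds from `"reject"` would be clustered again.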
For k-means clustering of large chemical data sets, the seminal paper is that published by Higgs et al. 79 at Eli Lilly and Company. These authors examined three methods of subset selection to support their HTS and the development of combinatorial libraries: k-means, MaxMin, and D-optimal design. The seed compounds for k-means were selected by the MaxMin method, and the k-means algorithm was implemented on parallel hardware. This research was part of the compound acquisition strategy to support HTS. The Lilly group used an extensive system of filters to ensure that selected compounds were pharmaceutically acceptable. The paper offered no recommendation as to the best method. The use of a topographic clustering method for chemical data sets is exemplified by the work of Sadowski, Wagener, and Gasteiger, 142 who compared three combinatorial libraries using Kohonen mapping. Each compound within a library was represented by a 12-element autocorrelation vector (a kind of 3D-QSAR descriptor), and the vectors were used as input to a 50 × 50 Kohonen network. Mapping the combinatorial libraries onto the same network placed each compound from each library at a particular node of the network. A 2D display of the positions of the compounds revealed the degree of …
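The combination used by Higgs et al., MaxMin selection of seed compounds followed by k-means, can be illustrated with a minimal serial sketch. It assumes each compound is a real-valued descriptor vector compared by Euclidean distance; the descriptors and distance measure used at Lilly are not given in this section, the parallel implementation and the D-optimal design method are not reproduced, and the function names and the arbitrary starting compound `first=0` are hypothetical choices for illustration.

```python
import math


def maxmin(points, n_select, first=0):
    """MaxMin selection: start from `first`, then repeatedly add the point whose
    distance to its nearest already-selected point is largest."""
    selected = [first]
    nearest = [math.dist(p, points[first]) for p in points]
    while len(selected) < min(n_select, len(points)):
        candidates = [i for i in range(len(points)) if i not in selected]
        nxt = max(candidates, key=lambda i: (nearest[i], -i))
        selected.append(nxt)
        # update each point's distance to its closest selected point
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return selected


def kmeans(points, seed_indices, max_iter=100):
    """Lloyd-style k-means started from the given seed points.
    Returns (labels, centres)."""
    centres = [list(points[s]) for s in seed_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centres)),
                          key=lambda c: (math.dist(p, centres[c]), c))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


if __name__ == "__main__":
    descriptors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                   (10.0, 10.0), (10.0, 11.0), (12.0, 12.0)]
    seeds = maxmin(descriptors, 2)
    print(seeds)  # [0, 5]
    labels, centres = kmeans(descriptors, seeds)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding from MaxMin spreads the initial centres across descriptor space, so k-means starts from well-separated compounds rather than from an arbitrary or random subset.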