Reviews in Computational Chemistry Volume 18
Clustering Methods and Their Uses in Computational Chemistry
report (ref. 135), discussed the use of clustering methods to assist in HTS, and then
outlined the use at Parke-Davis of Jarvis–Patrick clustering to assist traditional,
low-throughput screening. The aim of the Parke-Davis group was to
generate a representative subset of no more than 2,000 compounds selected
from about 126,000 compounds in the Parke-Davis corporate database so
that they could be used in a particularly labor-intensive cell-based assay.
Jarvis–Patrick clustering was run to generate an initial set of 25,000 nonsingleton
clusters. The compounds closest to the centroids were reclustered
to give about 2,300 clusters. The compounds closest to these centroids were
then analyzed manually, providing a final selection of about 1,400 compounds.
An interesting feature of this process was that singletons were rejected at each
stage, rather than being assigned to the nearest nonsingleton cluster (as at
Pfizer, UK) or being reclustered separately (as in the cascaded clustering
method used at Rhone-Poulenc Rorer).
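Jarvis–Patrick clustering places two compounds in the same cluster when each appears in the other's list of k nearest neighbours and the two lists share at least kmin members. The Python sketch below illustrates one pass of the Parke-Davis-style procedure under stated assumptions: fingerprints are stored as sets of on-bit indices, similarity is the Tanimoto coefficient, and each neighbour list excludes the compound itself. The function names are hypothetical, and the medoid (the member most similar to its cluster-mates) is used as a stand-in for "the compound closest to the centroid", which is a substitution rather than the published procedure. Singletons are dropped, as at Parke-Davis.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_neighbours(fps, k):
    """For each compound, the set of indices of its k most similar other compounds."""
    nn = []
    for i, fi in enumerate(fps):
        others = sorted((j for j in range(len(fps)) if j != i),
                        key=lambda j: (-tanimoto(fi, fps[j]), j))  # ties broken by index
        nn.append(set(others[:k]))
    return nn


def jarvis_patrick(fps, k, kmin):
    """Cluster fingerprints; returns clusters as sorted index lists, singletons dropped."""
    nn = nearest_neighbours(fps, k)
    parent = list(range(len(fps)))  # union-find forest over compound indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(fps)), 2):
        # Jarvis-Patrick rule: mutual neighbours sharing at least kmin neighbours.
        if j in nn[i] and i in nn[j] and len(nn[i] & nn[j]) >= kmin:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(fps)):
        groups.setdefault(find(i), []).append(i)
    # Singletons are rejected rather than reassigned or reclustered.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def representative(fps, cluster):
    """Medoid: the member with the highest summed similarity to its cluster-mates."""
    return max(cluster, key=lambda i: (
        sum(tanimoto(fps[i], fps[j]) for j in cluster if j != i), -i))
```

To mimic the two-stage selection, the representatives of the first pass could be collected into a smaller fingerprint list and passed through `jarvis_patrick` again.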
Jarvis–Patrick clustering has also been used to support QSAR analysis
in a system developed at the European Communities Joint Research
Center (refs. 7, 138–140). The EINECS (European Inventory of Existing Chemical Substances)
database contains more than 100,000 compounds and has been clustered
using 2D structural descriptors. The database also holds associated
physicochemical properties and activities, but these data are very sparse. Jarvis–Patrick
clustering was used to extract clusters containing enough compounds
with measured data that the properties of the cluster members lacking
data could be estimated. For a few clusters, the clustered data were also used to
develop reasonable QSAR models.
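The text does not say how the missing properties were estimated. The sketch below assumes the simplest option, averaging the measured values of a compound's cluster-mates, and uses a hypothetical `min_measured` threshold to decide whether a cluster contains "sufficient compounds with measured data". The function name and threshold are illustrative only.

```python
from statistics import mean


def estimate_within_clusters(clusters, measured, min_measured=3):
    """Estimate missing property values from measured cluster-mates (illustrative).

    clusters: list of lists of compound indices (e.g. from Jarvis-Patrick clustering).
    measured: dict mapping compound index -> measured value (sparse).
    Returns estimates only for unmeasured compounds in sufficiently populated clusters.
    """
    estimates = {}
    for cluster in clusters:
        known = [measured[i] for i in cluster if i in measured]
        if len(known) < min_measured:
            continue  # too few measurements in this cluster to support an estimate
        value = mean(known)  # assumed estimator: the cluster mean
        for i in cluster:
            if i not in measured:
                estimates[i] = value
    return estimates
```

Clusters that fall below the threshold are simply skipped, which mirrors the idea of extracting only the clusters with enough data.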
An example of k-means clustering applied to the QSAR analysis of a small data set
is the work of Lawson and Jurs (ref. 141), who clustered a set of 143
acrylates from the TSCA (Toxic Substances Control Act) inventory. For large
chemical data sets, the seminal paper is that published by Higgs et al. (ref. 79) at Eli
Lilly and Company. These authors examined three methods of subset selection
to support their HTS and the development of combinatorial libraries:
k-means, MaxMin, and D-optimal design. The seed compounds for k-means
were selected by the MaxMin method, and the k-means algorithm was implemented
on parallel hardware. This research was part of Lilly's compound acquisition
strategy to support HTS, and the group used an extensive system of
filters to ensure that the selected compounds were pharmaceutically acceptable.
The paper offered no recommendation as to which method was best.
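MaxMin selection picks a starting compound and then repeatedly adds the compound whose distance to its nearest already-picked compound is largest, which makes it a natural way to spread k-means seeds across a data set. The serial sketch below pairs MaxMin seeding with Lloyd's k-means; the numeric descriptor vectors, Euclidean distance, choice of first compound, and function names are assumptions, and it omits the parallel implementation, the D-optimal alternative, and the Lilly filters.

```python
import math


def maxmin(points, n_pick, first=0):
    """MaxMin diversity selection: each new pick maximizes its distance to the nearest pick."""
    picks = [first]
    # min_d[i] is the distance from point i to its nearest already-picked point.
    min_d = [math.dist(p, points[first]) for p in points]
    while len(picks) < n_pick:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        picks.append(nxt)
        min_d = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return picks


def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means started from the given seed points; returns (labels, centroids)."""
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids
```

Seeding from MaxMin picks rather than random compounds makes the starting centroids deterministic and well separated, which is one reason the two methods combine naturally.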
The use of a topographic clustering method for chemical data sets is
exemplified by the work of Sadowski, Wagener, and Gasteiger (ref. 142). The authors
compared three combinatorial libraries using Kohonen mapping. Each compound
within a library was represented by a 12-element autocorrelation vector
(a kind of 3D-QSAR descriptor). The vectors were used as input to a 50 × 50
Kohonen network. Mapping the combinatorial libraries onto the same network
placed each compound from each library at a particular node in the network.
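In a Kohonen map, training repeatedly pulls the node whose weight vector best matches an input (and, more weakly, its grid neighbours) toward that input, so that each compound can afterwards be placed at its best-matching node. The pure-Python sketch below trains a small map and records which nodes each compound occupies. The Gaussian neighbourhood, linear decay schedules, random initialisation, and function names are assumptions, not the settings used by Sadowski et al.; at the published 50 × 50 size a vectorized NumPy version would be far faster.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors; returns {(r, c): weights}."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2
    # Assumed initialization: each node starts as a copy of a random input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)              # learning rate decays linearly toward 0
        sigma = sigma0 * (1 - frac) + 0.5  # neighbourhood radius shrinks toward 0.5
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood weight
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])  # move the node part-way toward x
    return weights


def map_library(weights, library):
    """Place each compound at its best-matching node; returns {node: [compound indices]}."""
    occupancy = {}
    for i, x in enumerate(library):
        occupancy.setdefault(best_matching_unit(weights, x), []).append(i)
    return occupancy
```

Training one map on the pooled libraries and then calling `map_library` once per library gives each library's node occupancy, and comparing the occupied nodes is one way to see how much the libraries overlap.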
A 2D display of the positions of each compound revealed the degree of