Reviews in Computational Chemistry Volume 18
28 Clustering Methods and Their Uses in Computational Chemistry
is the OPTICS method that orders items in a data set in terms of local criteria,
thus providing an equivalent to a density-based analysis.

A variety of different requirements exist for chemical applications. These
requirements dictate whether it is important to address the issues of how many
clusters exist, what the best partition is, and which the best clusters are. When
using representative sampling, for example, for high-throughput screening in
pharmaceutical research, the number of required clusters is usually set beforehand.
Hence, it is necessary to generate only a reasonable partition from which
to extract the required number of representative compounds. For analysis of
an unknown data set in, say, a list of vendor compounds, the number of clusters
is unknown. Hierarchical clustering with optimum level analysis should
provide suitable results for this scenario since the actual composition of
each cluster is not critical. For analysis of quantitative structure–activity relationships
(QSAR), the number of clusters is unknown, and the quality of the
clusters becomes an important issue since complete clusters are required for
further analysis. It may be that recent developments [87–93] related to density-based
clustering will help in this circumstance.
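
The representative-sampling case described above, where the number of clusters is fixed in advance, can be illustrated with a minimal sketch. The pure-Python code below builds a partition by naive agglomerative merging with Ward's cost and then takes the member nearest each cluster centroid as the representative. The descriptor values, the function names (`ward_partition`, `pick_representatives`), and the nearest-to-centroid selection rule are assumptions made for illustration and are not taken from the chapter.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length descriptor vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_partition(data, k):
    """Naive O(n^3) agglomerative clustering using Ward's merge cost.

    Repeatedly merges the pair of clusters whose fusion gives the smallest
    increase in within-cluster sum of squares, stopping at k clusters.
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca = centroid([data[i] for i in clusters[a]])
            cb = centroid([data[i] for i in clusters[b]])
            na, nb = len(clusters[a]), len(clusters[b])
            cost = na * nb / (na + nb) * sq_dist(ca, cb)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for j, c in enumerate(clusters) if j not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


def pick_representatives(data, clusters):
    """Pick, for each cluster, the member closest to the cluster centroid."""
    reps = []
    for members in clusters:
        c = centroid([data[i] for i in members])
        reps.append(min(members, key=lambda i: sq_dist(data[i], c)))
    return reps


# Hypothetical 2-D descriptors for nine compounds in three well-separated groups.
descriptors = [
    [0.0, 0.0], [1.0, 0.0], [0.5, 0.2],
    [5.0, 5.0], [6.0, 5.0], [5.5, 5.2],
    [9.0, 0.0], [10.0, 0.0], [9.5, 0.2],
]
clusters = ward_partition(descriptors, k=3)
representatives = pick_representatives(descriptors, clusters)
```

The centroids are recomputed for every candidate pair, which keeps the sketch short but makes it suitable only for small sets. Data sets of the size discussed in this section would call for a stored-distance or nearest-neighbor-chain implementation.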

CHEMICAL APPLICATIONS

Having introduced and described the various kinds of clustering methods
used in chemistry and other disciplines, we are in a position to present some
illustrative examples of chemical applications. This section reviews a representative
selection of publications that have reported or analyzed the use of clustering
methods for processing chemical data sets, largely from groups of
scientists working within pharmaceutical companies. The main applications
for these scientists are high-throughput screening, combinatorial chemistry,
compound acquisition, and QSAR. The emphasis is on pharmaceutical applications
because these workers tend to process very large and high dimensional
data sets. This section is presented according to method, starting with hierarchical
and then moving to nonhierarchical methods.

Little has been reported on the use of hierarchical divisive methods for
processing chemical data sets (other than the inclusion of the minimum-diameter
method in some of the comparative studies mentioned above).
Recursive partitioning, which is a supervised classification technique very
closely related to monothetic divisive clustering, has, however, been used at
the GlaxoSmithKline [57,58] and Organon [59] companies.
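
To make the link between recursive partitioning and monothetic divisive clustering concrete, here is a minimal sketch of a single supervised split. One binary descriptor is chosen per split (which is what makes the division monothetic), here by minimizing the weighted Gini impurity of the activity labels; applying the same step to each resulting subset in turn would give a tree. The fingerprint bits, the labels, the function names, and the choice of Gini impurity are all hypothetical; the studies cited above may use different descriptors and split criteria.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_monothetic_split(bits, labels):
    """Return (feature index, weighted impurity) of the best single-bit split.

    Each candidate split divides the compounds by presence or absence of ONE
    descriptor bit, so every division is decided by a single variable.
    Returns None if no bit divides the set.
    """
    n = len(labels)
    best = None
    for f in range(len(bits[0])):
        present = [y for row, y in zip(bits, labels) if row[f] == 1]
        absent = [y for row, y in zip(bits, labels) if row[f] == 0]
        if not present or not absent:
            continue  # this bit does not divide the set
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if best is None or score < best[1]:
            best = (f, score)
    return best


# Hypothetical 3-bit substructure fingerprints and activity labels (1 = active).
fingerprints = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
activity = [1, 1, 1, 0, 0, 0]
feature, impurity = best_monothetic_split(fingerprints, activity)
```

Removing the activity labels and scoring each bit by an unsupervised criterion instead would turn the same loop into one step of a monothetic divisive clustering, which is the sense in which the two techniques are closely related.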

There is, however, widespread use of hierarchical agglomerative techniques,
particularly the Ward method. At Organon, Bayada, Hamersma,
and van Geerestein [121] compared Ward clustering with the MaxMin diversity
selection method, Kohonen maps, and a simple partitioning method to help
select diverse yet representative subsets of compounds for further testing.
The data came from HTS or combinatorial library results. Ward clustering