19.02.2013 Views

Reviews in Computational Chemistry Volume 18

Clustering Methods and Their Uses in Computational Chemistry

is the OPTICS method that orders items in a data set in terms of local criteria, thus providing an equivalent to a density-based analysis.

A variety of different requirements exist for chemical applications. These requirements dictate whether it is important to address the issues of how many clusters exist, what the best partition is, and which the best clusters are. When using representative sampling, for example, for high-throughput screening in pharmaceutical research, the number of required clusters is usually set beforehand. Hence, it is necessary to generate only a reasonable partition from which to extract the required number of representative compounds. For analysis of an unknown data set in, say, a list of vendor compounds, the number of clusters is unknown. Hierarchical clustering with optimum level analysis should provide suitable results for this scenario since the actual composition of each cluster is not critical. For analysis of quantitative structure–activity relationships (QSAR), the number of clusters is unknown, and the quality of the clusters becomes an important issue since complete clusters are required for further analysis. It may be that recent developments (refs. 87–93) related to density-based clustering will help in this circumstance.
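The representative-sampling case described above can be illustrated with a minimal sketch: given a partition that has already been generated, one compound is extracted from each cluster. The choice of the medoid as the representative, the Tanimoto distance on sets of "on" fingerprint bits, and the names `tanimoto_distance` and `cluster_medoids` are all assumptions made for illustration; the text does not prescribe how representatives are chosen.

```python
from collections import defaultdict


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of 'on' fingerprint bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def cluster_medoids(fingerprints, labels, dist=tanimoto_distance):
    """Map each cluster label to the index of its medoid compound.

    The medoid is the member with the smallest summed distance to all
    members of the same cluster (an assumed, illustrative choice).
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    medoids = {}
    for label, idxs in members.items():
        medoids[label] = min(
            idxs,
            key=lambda i: sum(dist(fingerprints[i], fingerprints[j]) for j in idxs),
        )
    return medoids


# Toy data: six hypothetical compounds already partitioned into two clusters.
fingerprints = [
    {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 4},   # cluster 0
    {7, 8, 9}, {7, 8}, {8, 9, 10},        # cluster 1
]
labels = [0, 0, 0, 1, 1, 1]

representatives = cluster_medoids(fingerprints, labels)
print(representatives)
```

Because the number of clusters is fixed beforehand in this scenario, the quality of the partition matters less than obtaining exactly one sensible representative per cluster, which is all the sketch attempts.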

CHEMICAL APPLICATIONS

Having introduced and described the various kinds of clustering methods used in chemistry and other disciplines, we are in a position to present some illustrative examples of chemical applications. This section reviews a representative selection of publications that have reported or analyzed the use of clustering methods for processing chemical data sets, largely from groups of scientists working within pharmaceutical companies. The main applications for these scientists are high-throughput screening, combinatorial chemistry, compound acquisition, and QSAR. The emphasis is on pharmaceutical applications because these workers tend to process very large and high dimensional data sets. This section is presented according to method, starting with hierarchical and then moving to nonhierarchical methods.

Little has been reported on the use of hierarchical divisive methods for processing chemical data sets (other than the inclusion of the minimum-diameter method in some of the comparative studies mentioned above). Recursive partitioning, which is a supervised classification technique very closely related to monothetic divisive clustering, has, however, been used at the GlaxoSmithKline (refs. 57, 58) and Organon (ref. 59) companies.
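The relationship between recursive partitioning and monothetic divisive clustering can be made concrete with a small sketch of a single split. Like a monothetic division, the split uses one descriptor at a time; unlike clustering, the descriptor is chosen with reference to known activity labels. The Gini impurity criterion and the names `gini` and `best_split` are assumptions for illustration only; the cited studies may well use a different splitting statistic.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(features, active):
    """Choose the single binary descriptor that best separates actives.

    Returns (descriptor index, weighted impurity of the two subsets), or
    None if no descriptor divides the compounds into two nonempty groups.
    """
    n = len(features)
    best = None
    for f in range(len(features[0])):
        present = [active[i] for i in range(n) if features[i][f]]
        absent = [active[i] for i in range(n) if not features[i][f]]
        if not present or not absent:
            continue  # this descriptor does not divide the set at all
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if best is None or score < best[1]:
            best = (f, score)
    return best


# Toy data: rows are hypothetical compounds, columns are binary descriptors.
features = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
active = [1, 1, 1, 0, 0, 0]  # 1 = active, 0 = inactive

split = best_split(features, active)
print(split)
```

Applying `best_split` again to each of the two resulting subsets would give the recursive procedure; a full implementation would also need a stopping rule, which is omitted here.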

There is, however, widespread use of hierarchical agglomerative techniques, particularly the Ward method. At Organon, Bayada, Hamersma, and van Geerestein (ref. 121) compared Ward clustering with the MaxMin diversity selection method, Kohonen maps, and a simple partitioning method to help select diverse yet representative subsets of compounds for further testing. The data came from HTS or combinatorial library results. Ward clustering
