
Reviews in Computational Chemistry Volume 18


Clustering Methods and Their Uses in Computational Chemistry

report, 135 discussed the use of clustering methods to assist in HTS, and then outlined the use at Parke-Davis of Jarvis–Patrick clustering to assist traditional, low-throughput screening. The aim of the Parke-Davis group was to generate a representative subset of no more than 2,000 compounds, selected from about 126,000 compounds in the Parke-Davis corporate database, for use in a particularly labor-intensive cell-based assay. Jarvis–Patrick clustering was first run to generate an initial set of 25,000 nonsingleton clusters. The compounds closest to the centroids of these clusters were then reclustered to give about 2,300 clusters, and the compounds closest to these new centroids were analyzed manually, yielding a final selection of about 1,400 compounds. An interesting feature of this process was that singletons were rejected at each stage, rather than being assigned to the nearest nonsingleton cluster (as at Pfizer, UK) or being reclustered separately (as in the cascaded clustering method used at Rhone-Poulenc Rorer).
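The Parke-Davis procedure can be illustrated with a small sketch of Jarvis–Patrick clustering followed by the "keep the compound closest to each nonsingleton centroid, drop singletons" step. This is not the Parke-Davis code. It assumes short 0/1 fingerprints, Tanimoto similarity, neighbor lists that exclude the compound itself, and one common form of the linking rule (two compounds join the same cluster if each is in the other's k-nearest-neighbor list and they share at least `k_min` neighbors). "Closest to the centroid" is taken here as Euclidean distance to the mean bit vector. The function names, parameters `k` and `k_min`, and the toy data are all hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two equal-length 0/1 fingerprints."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0


def nearest_neighbors(fps, k):
    """For each compound, the set of its k most similar other compounds."""
    lists = []
    for i, fp in enumerate(fps):
        others = [j for j in range(len(fps)) if j != i]
        # Highest similarity first; ties broken by index so results are deterministic.
        others.sort(key=lambda j: (-tanimoto(fp, fps[j]), j))
        lists.append(set(others[:k]))
    return lists


def jarvis_patrick(fps, k, k_min):
    """Jarvis-Patrick clustering; returns clusters as sorted lists of indices."""
    nn = nearest_neighbors(fps, k)
    parent = list(range(len(fps)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(fps)), 2):
        # Link i and j if each is in the other's list and they share >= k_min neighbors.
        if j in nn[i] and i in nn[j] and len(nn[i] & nn[j]) >= k_min:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(fps)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def closest_to_centroid(fps, members):
    """Member nearest (Euclidean) to the mean bit vector of the cluster."""
    n_bits = len(fps[members[0]])
    centroid = [sum(fps[m][b] for m in members) / len(members) for b in range(n_bits)]
    return min(members,
               key=lambda m: (sum((fps[m][b] - centroid[b]) ** 2 for b in range(n_bits)), m))


def representatives(fps, k, k_min):
    """One compound per nonsingleton cluster; singletons are simply dropped."""
    return [closest_to_centroid(fps, c)
            for c in jarvis_patrick(fps, k, k_min) if len(c) > 1]


# Toy 6-bit fingerprints (hypothetical data).
fps = [
    (1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0), (1, 1, 1, 1, 0, 0),
    (0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 0), (0, 0, 1, 1, 1, 1),
    (1, 0, 0, 0, 0, 1),
]
print(jarvis_patrick(fps, k=2, k_min=1))   # [[0, 1, 2], [3, 4, 5], [6]]
print(representatives(fps, k=2, k_min=1))  # [0, 3]
```

Here compound 6 ends up as a singleton and is discarded, as in the Parke-Davis scheme. Calling `representatives` again on the fingerprints of the selected compounds would mimic the second reclustering stage. The all-pairs loops are quadratic and meant only for illustration, not for a 126,000-compound database.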

Jarvis–Patrick clustering has also been used to support QSAR analysis in a system developed at the European Communities Joint Research Center. 7,138–140 The EINECS (European Inventory of Existing Chemical Substances) database contains more than 100,000 compounds and has been clustered using 2D structural descriptors. The database also has associated physicochemical property and activity data, but these data are very sparse. Jarvis–Patrick clustering was used to extract clusters containing enough compounds with measured data to attempt to estimate the properties of the cluster members lacking such data. For a few clusters, this approach led to reasonable QSAR models.
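The text does not say how the missing properties were estimated. The sketch below uses the simplest possible stand-in, the mean of the measured values within a cluster, purely to show the data flow from clusters to estimates. The function name, the `min_measured` threshold and the toy values are hypothetical, and a real system would more likely fit a QSAR model within each cluster.

```python
from statistics import mean


def estimate_missing(clusters, values, min_measured=3):
    """Estimate unmeasured property values from measured cluster-mates.

    clusters: lists of compound indices (e.g., from Jarvis-Patrick clustering).
    values: maps compound index -> measured value, or None if not measured.
    Returns {index: estimate} for unmeasured compounds in clusters that have
    at least min_measured measured members; other clusters are skipped.
    """
    estimates = {}
    for members in clusters:
        measured = [values[m] for m in members if values[m] is not None]
        if len(measured) < min_measured:
            continue  # too few measured compounds to estimate anything
        cluster_mean = mean(measured)  # assumed estimator: plain cluster mean
        for m in members:
            if values[m] is None:
                estimates[m] = cluster_mean
    return estimates


clusters = [[0, 1, 2, 3], [4, 5]]
values = {0: 1.0, 1: 2.0, 2: 3.0, 3: None, 4: 5.0, 5: None}
print(estimate_missing(clusters, values))  # {3: 2.0}
```

The second cluster has only one measured member, so no estimate is attempted for compound 5, which mirrors the idea of working only with clusters that contain sufficient measured data.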

An example of k-means clustering used for QSAR analysis of a small data set is the work of Lawson and Jurs, 141 who clustered a set of 143 acrylates from the ToSCA (Toxic Substances Control Act) inventory. For large chemical data sets, the seminal paper is that published by Higgs et al. 79 at Eli Lilly and Company. These authors examined three methods of subset selection (k-means, MaxMin, and D-optimal design) to assist their HTS and the development of combinatorial libraries. Seed compounds for the k-means clustering were selected by the MaxMin method, and the k-means algorithm was implemented on parallel hardware. This research was part of Lilly's compound acquisition strategy to support HTS, and the group used an extensive system of filters to ensure that the selected compounds were pharmaceutically acceptable. The paper made no recommendation as to which method was best.
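Seeding k-means with MaxMin-selected compounds, as in the Lilly work, can be sketched as follows. This is a serial toy version, not Lilly's parallel implementation, and it omits their filters. It assumes compounds are plain numeric descriptor vectors compared by Euclidean distance, and that MaxMin starts from the first compound (`first=0`). The function names and the toy points are hypothetical.

```python
import math


def maxmin_select(points, n, first=0):
    """MaxMin diversity selection: repeatedly add the point farthest from those chosen."""
    selected = [first]
    # Distance from every point to its nearest already-selected point.
    min_d = [math.dist(p, points[first]) for p in points]
    while len(selected) < n:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return selected


def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means started from the points at the given seed indices."""
    centroids = [points[i] for i in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
               for p in points]
        if new == assignment:
            break  # assignments stopped changing: converged
        assignment = new
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Toy 2D descriptor vectors (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
seeds = maxmin_select(points, 2)               # [0, 5]
assignment, centroids = kmeans(points, seeds)
print(assignment)                              # [0, 0, 0, 1, 1, 1]
```

Because MaxMin spreads the seeds across descriptor space, k-means starts from well-separated centroids rather than random ones, which is the usual motivation for combining the two.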

The use of a topographic clustering method for chemical data sets is exemplified by the work of Sadowski, Wagener, and Gasteiger. 142 The authors compared three combinatorial libraries using Kohonen mapping. Each compound within a library was represented by a 12-element autocorrelation vector (a kind of 3D-QSAR descriptor), and these vectors were used as input to a 50 × 50 Kohonen network. Mapping the combinatorial libraries onto the same network placed each compound from each library at a particular node in the network.
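The mapping step can be illustrated with a minimal Kohonen self-organizing map. This sketch is not the Sadowski, Wagener, and Gasteiger implementation. It assumes a 5 × 5 grid (rather than 50 × 50, to keep it fast), random initial weights, a linearly decaying learning rate, a shrinking Gaussian neighborhood, and toy 12-element vectors standing in for real autocorrelation descriptors. All names and training parameters are hypothetical.

```python
import math
import random


def best_matching_node(weights, v):
    """Grid coordinates (row, col) of the node whose weight vector is closest to v."""
    best = None
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, v)
            if best is None or d < best[0]:
                best = (d, r, c)
    return best[1], best[2]


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    radius0 = max(rows, cols) / 2
    # weights[r][c] is node (r, c)'s reference vector, randomly initialized in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)    # neighborhood shrinks over time
        for v in rng.sample(vectors, len(vectors)):  # shuffled presentation order
            br, bc = best_matching_node(weights, v)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood on the grid around the winning node.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return weights


def map_library(weights, vectors):
    """Node occupied by each compound of a library."""
    return [best_matching_node(weights, v) for v in vectors]


# Hypothetical 12-element descriptor vectors for two toy "libraries".
lib_a = [[0.0] * 12, [0.05] * 12, [0.1] * 12]
lib_b = [[0.9] * 12, [0.95] * 12, [1.0] * 12]
weights = train_som(lib_a + lib_b, rows=5, cols=5)
nodes_a, nodes_b = set(map_library(weights, lib_a)), set(map_library(weights, lib_b))
print(nodes_a & nodes_b)  # nodes shared by both libraries
```

Training both libraries on one shared map is what makes their node positions comparable; the set of nodes occupied by both libraries is one crude way to quantify how much they overlap.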

A 2D display of the positions of each compound revealed the degree of
