14.03.2014 Views

Modeling and Multivariate Methods - SAS



Chapter 18 Clustering Data

Self Organizing Maps

The Cluster Comparison report gives fit statistics for comparing different numbers of clusters. For K-Means Clustering and Self Organizing Maps, the fit statistic is the CCC (Cubic Clustering Criterion). For Normal Mixtures, the fit statistic is BIC or AICc. Robust Normal Mixtures does not provide a fit statistic.
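
For reference, BIC and AICc are commonly written as follows. This is the usual textbook form rather than anything stated in this chapter, and the symbols are introduced here only for illustration: $\hat{L}$ is the maximized likelihood, $k$ the number of estimated parameters, and $n$ the number of observations. Under this convention, smaller values indicate a better trade-off between fit and model size.

```latex
\begin{align}
\mathrm{BIC}  &= -2 \ln \hat{L} + k \ln n \\
\mathrm{AICc} &= -2 \ln \hat{L} + 2k + \frac{2k(k+1)}{n - k - 1}
\end{align}
```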

For details on the red-triangle options for Self Organizing Maps, see “K-Means Platform Options” on page 472.

Implementation Technical Details

The SOM implementation in JMP proceeds as follows:

• The first step is to obtain initial cluster seeds that provide good coverage of the multidimensional space. JMP uses principal components to determine the two directions that capture the most variation in the data.

• JMP then lays out a grid in this principal component space, with its edges 2.5 standard deviations from the middle in each direction. The cluster seeds are formed by translating this grid back into the original space of the variables.

• The cluster assignment proceeds as with k-means, with each point assigned to the cluster closest to it.

• The means are estimated for each cluster as in k-means. JMP then uses these means to set up a weighted regression, with each variable as the response and the SOM grid coordinates as the regressors. The weighting function uses a ‘kernel’ function that gives large weight to the cluster whose center is being estimated, with smaller weights given to clusters farther away from that cluster in the SOM grid. The new cluster means are the predicted values from this regression.

• These iterations proceed until the process has converged.
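
The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not JMP's actual code: the function names (`som_fit`, `assign`, `spread`), the Gaussian kernel and its `bandwidth`, the extra weighting by cluster size (so that empty clusters carry no weight), the convergence tolerance, and the choice to center but not rescale the data are all assumptions that the text does not specify. The sketch also assumes at least two variables and two observations.

```python
import numpy as np


def assign(X, means):
    """Index of the nearest cluster mean (Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def spread(idx, m):
    """Map grid indices 0..m-1 onto [-2.5, 2.5]; a single index maps to 0."""
    return np.zeros_like(idx) if m == 1 else 5.0 * idx / (m - 1) - 2.5


def som_fit(X, n_rows=3, n_cols=3, bandwidth=1.0, max_iter=100, tol=1e-6):
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    center = X.mean(axis=0)

    # Step 1: the two principal directions and the standard deviations
    # of the scores along them, from an SVD of the centered data.
    _, s, vt = np.linalg.svd(X - center, full_matrices=False)
    dirs = vt[:2]
    sds = s[:2] / np.sqrt(n - 1)

    # Step 2: a grid whose edges sit 2.5 standard deviations from the
    # middle in each direction, mapped back to the original variables.
    gi, gj = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    grid = np.column_stack([gi.ravel(), gj.ravel()]).astype(float)
    pc = np.column_stack([spread(grid[:, 0], n_rows),
                          spread(grid[:, 1], n_cols)])
    means = center + (pc * sds) @ dirs

    # Assumed kernel: Gaussian in squared distance between grid positions.
    g2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-g2 / (2.0 * bandwidth ** 2))
    design = np.column_stack([np.ones(len(grid)), grid])  # intercept + coords

    for _ in range(max_iter):
        # Step 3: k-means style assignment to the closest cluster.
        labels = assign(X, means)

        # Step 4a: plain cluster means (empty clusters get weight 0 below).
        counts = np.bincount(labels, minlength=len(grid))
        sums = np.zeros_like(means)
        np.add.at(sums, labels, X)
        cmeans = sums / np.maximum(counts, 1)[:, None]

        # Step 4b: for each cluster, a weighted regression of the cluster
        # means on the grid coordinates; the new mean is the prediction
        # at that cluster's own grid position.
        new_means = np.empty_like(means)
        for c in range(len(grid)):
            w = np.sqrt(kernel[c] * counts)[:, None]
            coef, *_ = np.linalg.lstsq(design * w, cmeans * w, rcond=None)
            new_means[c] = design[c] @ coef

        # Step 5: stop once the means no longer move appreciably.
        shift = np.abs(new_means - means).max()
        means = new_means
        if shift < tol:
            break

    return means, assign(X, means), grid
```

Calling `som_fit(X, n_rows=2, n_cols=3)` on an array with one row per observation returns the six smoothed cluster means, a cluster label for each row, and the grid coordinates of the clusters. The regression step is what separates this from plain k-means: each mean is pulled toward a plane fitted through its neighbors on the grid, so clusters that are adjacent on the grid stay close in the original variable space.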
