14.03.2014 Views

Modeling and Multivariate Methods - SAS



462 Clustering Data Chapter 18


• For each cluster, a new center is formed using every observation with its probability of membership as a weight. This is the maximization step.

This process continues, alternating between the expectation and maximization steps, until the clusters become stable.
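
The alternation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not JMP's implementation: it assumes spherical normal clusters with a shared, fixed standard deviation `sigma` and equal mixing weights, and the function names `expectation`, `maximization`, and `fit` are hypothetical.

```python
import math

def expectation(points, centers, sigma=1.0):
    """Expectation step: probability of membership of each point in each cluster.

    Assumption for this sketch: spherical normal clusters with a shared,
    fixed sigma and equal mixing weights.
    """
    probs = []
    for p in points:
        sq = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        nearest = min(sq)  # subtracted for numerical stability; cancels on normalizing
        dens = [math.exp(-(s - nearest) / (2 * sigma ** 2)) for s in sq]
        total = sum(dens)
        probs.append([d / total for d in dens])
    return probs

def maximization(points, probs, k):
    """Maximization step: each new center is the mean of all observations,
    weighted by their probability of membership in that cluster.

    A cluster whose total weight underflows to zero is not handled here.
    """
    dim = len(points[0])
    centers = []
    for j in range(k):
        weight = sum(row[j] for row in probs)
        centers.append(tuple(
            sum(row[j] * p[d] for row, p in zip(probs, points)) / weight
            for d in range(dim)
        ))
    return centers

def fit(points, centers, tol=1e-8, max_iter=200):
    """Alternate the two steps until the centers stop moving (become stable)."""
    probs = expectation(points, centers)
    for _ in range(max_iter):
        new_centers = maximization(points, probs, len(centers))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        probs = expectation(points, centers)
        if shift < tol:
            break
    return centers, probs

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, probs = fit(points, [(0, 0), (10, 10)])
```

Because every observation contributes to every center, a point that lies between two clusters pulls on both, in proportion to its membership probabilities.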

Note: For k-means clustering, you can choose a variable whose values form preset fixed centers for clusters, instead of using the default random seeds for clusters.
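
As a rough illustration of the difference between preset centers and random seeds, the sketch below runs a plain k-means loop that starts either from supplied centers or from randomly sampled observations. It is not JMP's algorithm. Reading "preset fixed centers" as user-supplied starting centers is an assumption, and the names `kmeans`, `seeds`, and `rng` are hypothetical.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seeds=None, max_iter=100, rng=None):
    """Plain k-means. Starts from `seeds` if given, else from k random observations."""
    rng = rng or random.Random(0)
    centers = [tuple(s) for s in seeds] if seeds is not None else rng.sample(points, k)
    assign = None
    for _ in range(max_iter):
        # Assign each observation to its nearest center.
        new_assign = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new_assign == assign:  # memberships unchanged: clusters are stable
            break
        assign = new_assign
        # Move each center to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assign

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assign = kmeans(points, k=2, seeds=[(0, 0), (10, 10)])
```

With supplied seeds the result is reproducible from run to run, whereas random seeds can lead to different final clusters on the same data.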

The Cluster Launch Dialog

When you choose Cluster from the Analyze > Multivariate Methods submenu, the Hierarchical Cluster Launch dialog shown in Figure 18.1 appears. The data table used is Birth Death Subset.jmp.

Choose KMeans from the Options menu to see the KMeans launch dialog. See “K-Means Clustering” on page 469 for more information about the KMeans method.

Figure 18.1 Hierarchical Cluster Launch Dialog

You can specify as many Y variables as you want by selecting the variables in the column selector list and clicking Y, Columns.

For Hierarchical clustering, select Hierarchical from the Options list and then select one of the clustering distance options: Average, Centroid, Ward, Single, Complete, and Fast Ward. The clustering methods differ in how the distance between two clusters is computed. These clustering methods are discussed under “Technical Details for Hierarchical Clustering” on page 467.
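
To make concrete how these options differ, the sketch below computes the distance between the same two small clusters under five of the methods. The formulas are common textbook definitions and are an assumption here; the definitions JMP uses are given in the technical details section and may differ in scaling. Fast Ward is omitted because it is described as a variant of Ward. All function names are hypothetical.

```python
import math
from itertools import product

def centroid(cluster):
    """Mean point of a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def single(a, b):
    """Single linkage: smallest distance between a point in a and a point in b."""
    return min(math.dist(p, q) for p, q in product(a, b))

def complete(a, b):
    """Complete linkage: largest distance between a point in a and a point in b."""
    return max(math.dist(p, q) for p, q in product(a, b))

def average(a, b):
    """Average linkage: mean of all pairwise distances between a and b."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid_distance(a, b):
    """Centroid method: distance between the two cluster means."""
    return math.dist(centroid(a), centroid(b))

def ward(a, b):
    """Ward: squared distance between means, divided by (1/n_a + 1/n_b)."""
    return math.dist(centroid(a), centroid(b)) ** 2 / (1 / len(a) + 1 / len(b))

a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 2)]
distances = {
    "Single": single(a, b),
    "Complete": complete(a, b),
    "Average": average(a, b),
    "Centroid": centroid_distance(a, b),
    "Ward": ward(a, b),
}
```

In a hierarchical run, the pair of clusters with the smallest such distance is merged at each step, so the choice of method changes which clusters join and in what order.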

By default, data are first standardized by the column mean and standard deviation. Uncheck the Standardize Data check box if you do not want the cluster distances computed on standardized values.
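
The effect of standardizing can be seen in a short sketch. It assumes the sample standard deviation (n - 1 denominator), which may or may not match what JMP uses, and the function name `standardize` is hypothetical.

```python
from statistics import mean, stdev

def standardize(rows):
    """Center each column on its mean and scale it by its standard deviation.

    Assumes the sample standard deviation; a constant column (standard
    deviation 0) is not handled in this sketch.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]

# Two columns on very different scales.
rows = [(1, 100), (2, 200), (3, 300)]
z = standardize(rows)
```

Without standardizing, the second column would dominate any Euclidean distance between rows simply because its values are about 100 times larger. After standardizing, both columns contribute on the same scale.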
