14.03.2014 Views

Modeling and Multivariate Methods - SAS


Chapter 18

Clustering Data

Using the Cluster Platform

Clustering is the technique of grouping rows together that share similar values across a number of variables. It is a wonderful exploratory technique to help you understand the clumping structure of your data. JMP provides three different clustering methods:

• Hierarchical clustering is appropriate for small tables, up to several thousand rows. It combines rows in a hierarchical sequence portrayed as a tree. In JMP, the tree, also called a dendrogram, is a dynamic, responding graph. You can choose the number of clusters that you like after the tree is built.

• K-means clustering is appropriate for larger tables, up to hundreds of thousands of rows. It makes a fairly good guess at cluster seed points. It then starts an iteration of alternately assigning points to clusters and recalculating cluster centers. You have to specify the number of clusters before you start the process.

• Normal mixtures are appropriate when data is assumed to come from a mixture of multivariate normal distributions that overlap. Maximum likelihood is used to estimate the mixture proportions and the means, standard deviations, and correlations jointly. This approach is particularly good at estimating the total counts in each group. However, each point, rather than being classified into one group, is assigned a probability of being in each group. The EM algorithm is used to obtain estimates.
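
To make the last point concrete, here is a minimal Python sketch (not JSL, and not JMP's implementation) of how a point can receive a probability of belonging to each group. It assumes a simplified univariate case with two groups whose proportions, means, and standard deviations are already known; the names `normal_pdf` and `membership_probabilities` and all numeric values are hypothetical. Summing the probabilities over the rows gives an expected count per group, which is one way to see why the method suits estimating group totals.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership_probabilities(x, proportions, means, sds):
    """Probability that x belongs to each group, given mixture parameters.

    This is the soft-assignment step only; a full EM fit would also
    re-estimate the parameters from these probabilities and repeat.
    """
    weighted = [w * normal_pdf(x, m, s)
                for w, m, s in zip(proportions, means, sds)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical parameters: two equally likely, overlapping groups.
proportions = [0.5, 0.5]
means = [0.0, 4.0]
sds = [1.0, 1.0]

# A point halfway between the means is shared equally between the groups.
probs_mid = membership_probabilities(2.0, proportions, means, sds)

# A point at the first mean belongs almost entirely to the first group.
probs_near = membership_probabilities(0.0, proportions, means, sds)

# Expected count per group: sum the probabilities over all rows.
rows = [0.0, 2.0, 4.0]
expected_counts = [
    sum(membership_probabilities(x, proportions, means, sds)[g] for x in rows)
    for g in range(2)
]
```

Note that no row is forced into a single group here; a hard classification, if wanted, would take the group with the largest probability.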

After the clustering process is complete, you can save the cluster assignments to the data table or use them to set colors and markers for the rows.
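
For k-means, the cluster assignments mentioned here are the result of the alternating iteration described earlier: assign each point to its nearest center, then recalculate the centers. The following is a minimal Python sketch of that loop under stated assumptions, not JMP's implementation. The function names `squared_distance` and `kmeans` are hypothetical, and the seed points are passed in by the caller rather than guessed, so JMP's seeding step is not reproduced.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, seeds, max_iter=100):
    """Alternate between assigning points and recalculating centers.

    points: list of numeric tuples (one per row)
    seeds:  initial cluster centers, one per cluster (caller-supplied)
    Returns (labels, centers), where labels[i] is the cluster of points[i].
    """
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)),
                key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers

# Hypothetical data: two well-separated groups of three rows each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, seeds=[points[0], points[3]])
```

The returned `labels` list plays the role of a saved cluster-assignment column: one cluster number per row, in row order.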
