14.03.2014 Views

Modeling and Multivariate Methods - SAS


Chapter 18 Clustering Data (page 473)

Normal Mixtures

Normal mixtures is an iterative technique, but rather than being a clustering method that groups rows, it is more of an estimation method that characterizes the cluster groups. Instead of classifying each row into a single cluster, it estimates the probability that a row belongs to each cluster. See McLachlan and Krishnan (1997).

The normal mixtures approach to clustering predicts the proportion of responses expected within each cluster. The assumption is that the joint probability distribution of the measurement columns can be approximated by a mixture of multivariate normal distributions, each of which represents a different cluster. Each cluster's distribution has its own mean vector and covariance matrix.
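The following is a minimal sketch of how membership probabilities follow from such a mixture, restricted to two measurement columns for brevity. It is not the platform's implementation; the function names, mixing weights, means, and covariance matrices are all illustrative assumptions. Each cluster's weighted density is evaluated at a row, and the results are normalized so that they sum to one.

```python
import math

def bivariate_normal_pdf(x, mean, cov):
    """Density of a two-dimensional normal with the given mean vector and 2x2 covariance."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form (x - mean)' * inverse(cov) * (x - mean), using the explicit 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def membership_probabilities(x, weights, means, covs):
    """Probability that row x belongs to each cluster, given fixed mixture parameters."""
    weighted = [w * bivariate_normal_pdf(x, m, s)
                for w, m, s in zip(weights, means, covs)]
    total = sum(weighted)
    return [v / total for v in weighted]

# Hypothetical two-cluster mixture (all values are assumptions for illustration).
weights = [0.5, 0.5]
means = [(0.0, 0.0), (2.0, 0.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))]

midpoint = membership_probabilities((1.0, 0.0), weights, means, covs)
at_first_mean = membership_probabilities((0.0, 0.0), weights, means, covs)
print(midpoint)       # a row halfway between the two means is split evenly
print(at_first_mean)  # a row at the first mean leans toward the first cluster
```

In an actual fit, the weights, means, and covariances are not fixed in advance but are re-estimated iteratively from these probabilities.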

Hierarchical and k-means clustering methods work well when clusters are well separated, but when clusters overlap, assigning each point to a single cluster is problematic. In the overlap areas, points from several clusters share the same space. It is especially important to use normal mixtures rather than k-means clustering if you want an accurate estimate of the total population in each group, because the normal mixtures estimate is based on membership probabilities rather than on arbitrary cluster assignments determined by borders.
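To illustrate the difference between the two kinds of group-size estimate, the sketch below compares counting hard assignments with summing membership probabilities. The probability table is invented purely for illustration and does not come from any data set in the manual; the variable names are likewise assumptions.

```python
# Hypothetical membership probabilities for five rows and two clusters.
# These numbers are made up for illustration only.
probs = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.60, 0.40],
    [0.20, 0.80],
]

n_clusters = len(probs[0])

# Hard assignment: each row counts entirely toward its most probable cluster.
hard_counts = [0] * n_clusters
for row in probs:
    best = max(range(n_clusters), key=lambda k: row[k])
    hard_counts[best] += 1

# Soft estimate: each row contributes its membership probability to every cluster.
soft_counts = [sum(row[k] for row in probs) for k in range(n_clusters)]

print("hard counts:", hard_counts)
print("soft counts:", soft_counts)
```

Here three of the rows lie in an overlap region, so the hard counts put four rows in the first cluster, while the summed probabilities suggest the two groups are much closer in size.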

To perform Normal Mixtures, select that option on the Method menu of the Iterative Clustering Control Panel (Figure 18.5). After selecting Normal Mixtures, the control panel looks like Figure 18.7.

Figure 18.7 Normal Mixtures Control Panel<br />

Some of the options on the panel are described in “K-Means Control Panel” on page 470. The other options are described below:

Diagonal Variance constrains the off-diagonal elements of the covariance matrix to zero. In this case, the platform fits multivariate normal distributions that have no correlations between the variables. This constraint is sometimes necessary to avoid a singular covariance matrix when there are fewer observations than columns.
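The singularity issue can be seen in a small sketch with two observations and three columns. The data values and helper names below are illustrative assumptions, not part of the platform. The full sample covariance matrix has a zero determinant, while the version with off-diagonal elements set to zero does not.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]

def diagonal_only(cov):
    """Copy of cov with every off-diagonal element set to zero."""
    size = len(cov)
    return [[cov[i][j] if i == j else 0.0 for j in range(size)] for i in range(size)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Two observations, three columns: fewer observations than columns (made-up values).
rows = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 7.0]]

full_cov = sample_covariance(rows)
diag_cov = diagonal_only(full_cov)

print("determinant of full covariance:    ", det3(full_cov))
print("determinant of diagonal covariance:", det3(diag_cov))
```

Because a normal density requires the inverse of the covariance matrix, the full matrix cannot be used here, whereas the diagonal one can, provided every column has nonzero variance.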
