14.03.2014 Views

Modeling and Multivariate Methods - SAS


Chapter 18 Clustering Data 475

Normal Mixtures

The Cluster Comparison report gives fit statistics for comparing different numbers of clusters. For K-Means Clustering and Self Organizing Maps, the fit statistic is CCC (Cubic Clustering Criterion). For Normal Mixtures, the fit statistic is BIC or AICc. Robust Normal Mixtures does not provide a fit statistic.
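The BIC and AICc statistics mentioned above can be sketched with their standard textbook definitions. This is a minimal illustration, not the platform's own code: the function names and the parameter-counting helper `mixture_param_count` are assumptions introduced here, and the platform may count parameters differently. Under these definitions, smaller values indicate a better trade-off between fit and complexity.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2*logL + k*ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def aicc(log_likelihood, n_params, n_obs):
    """Corrected AIC: -2*logL + 2k + 2k(k+1)/(n-k-1)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return -2.0 * log_likelihood + 2.0 * n_params + correction


def mixture_param_count(n_clusters, n_vars, diagonal=False):
    """Free parameters in a normal mixture (assumed counting rule):
    one mean vector and one covariance matrix per cluster,
    plus n_clusters - 1 mixing proportions."""
    cov_params = n_vars if diagonal else n_vars * (n_vars + 1) // 2
    return n_clusters * (n_vars + cov_params) + (n_clusters - 1)
```

To compare candidate numbers of clusters, one would evaluate `bic` or `aicc` for each fitted model, passing the log-likelihood reported for that fit and the matching parameter count, and prefer the model with the smallest value.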

Robust Normal Mixtures

The Robust Normal Mixtures option is available if you suspect that your data contain outliers in the multivariate sense. Because regular Normal Mixtures is sensitive to outliers, the Robust Normal Mixtures option uses a more robust method for estimating the parameters. For details, see “Additional Details for Robust Normal Mixtures” on page 476.

To perform Robust Normal Mixtures, select that option on the Method menu of the Iterative Clustering Control Panel (Figure 18.5). After selecting Robust Normal Mixtures, the control panel looks like Figure 18.9.

Figure 18.9 Normal Mixtures Control Panel<br />

Some of the options on the panel are described in “K-Means Control Panel” on page 470. The other options are described below:

Diagonal Variance constrains the off-diagonal elements of the covariance matrix to zero. In this case, the platform fits multivariate normal distributions that have no correlations between the variables. This is sometimes necessary to avoid a singular covariance matrix when there are fewer observations than columns.

Huber Coverage is a number between 0 and 1. Robust Normal Mixtures protects against outliers by down-weighting them. Huber Coverage can be loosely thought of as the proportion of the data that is not treated as outliers and is therefore not down-weighted. Values closer to 1 result in a larger proportion of the data not being down-weighted. In other words, values closer to 1 protect only against the most extreme outliers. Values closer to 0 result in a smaller proportion of the data not being down-weighted, and may falsely consider less extreme data points to be outliers.
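The effect of Huber Coverage can be illustrated with a generic Huber-type weighting rule. This is only a sketch under stated assumptions, not the platform's actual estimator (see the Additional Details section for that): the function name `huber_weights` is hypothetical, the input is assumed to be each point's distance from its cluster center, and the cutoff is taken as an empirical quantile of those distances, whereas a model-based method would more likely derive it from a reference distribution.

```python
import math


def huber_weights(distances, coverage):
    """Huber-type weights: 1 for points within the cutoff, cutoff/d beyond it.

    The cutoff is the distance below which a fraction `coverage` of the
    points fall (an empirical quantile; an assumption for illustration).
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(distances)
    # Rank of the largest distance that still receives full weight.
    rank = max(1, math.ceil(coverage * len(ordered)))
    cutoff = ordered[rank - 1]
    return [1.0 if d <= cutoff else cutoff / d for d in distances]


# Hypothetical distances of four points from a cluster center.
distances = [1.0, 2.0, 4.0, 8.0]
high_coverage = huber_weights(distances, 0.75)  # only the farthest point is down-weighted
low_coverage = huber_weights(distances, 0.5)    # the two farthest points are down-weighted
```

With the higher coverage, three of the four points keep full weight. With the lower coverage, the point at distance 4.0 is also down-weighted, even though it is a less extreme observation.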
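The reason Diagonal Variance helps when there are fewer observations than columns can be checked numerically. The following sketch uses made-up data (two observations on three columns) and hypothetical helper names. It computes the ordinary sample covariance matrix, whose determinant is zero, and then the diagonal-only version, whose determinant is positive as long as every column has nonzero variance.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def diagonal_only(cov):
    """Zero every off-diagonal element, keeping only the variances."""
    size = len(cov)
    return [[cov[i][j] if i == j else 0.0 for j in range(size)] for i in range(size)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical data: 2 observations on 3 columns (fewer rows than columns).
data = [[1.0, 2.0, 4.0], [3.0, 5.0, 1.0]]
full_cov = sample_covariance(data)   # rank 1, so it cannot be inverted
diag_cov = diagonal_only(full_cov)   # variances only, invertible here

full_det = det3(full_cov)
diag_det = det3(diag_cov)
```

A singular covariance matrix has no inverse, so the multivariate normal density cannot be evaluated with it. The diagonal constraint avoids this at the cost of ignoring correlations between the variables.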
