
Modeling and Multivariate Methods - SAS


476 Clustering Data Chapter 18

Normal Mixtures

Complete Tours is the number of times to restart the estimation process. Restarting helps guard against the process settling on a merely local solution.

Initial Guesses is the number of random starts within each tour. Random starting values for the parameters are used for each new start.

Max Iterations is the maximum number of iterations during the convergence stage. The convergence stage starts after all tours are complete. It begins at the best result from all the starts and tours and, from there, converges to a final solution.
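The three controls can be pictured as nested loops followed by a refinement stage. The sketch below is a minimal illustration under stated assumptions and is not JMP's implementation. `multi_start_fit`, `log_likelihood`, `random_start`, and `improve` are hypothetical names. `improve` stands in for one fitting step (an EM step, in the case of Normal Mixtures). Each random start is only scored, not partially fitted, because the text does not describe what happens inside a tour.

```python
import random
from typing import Callable, List, Optional, Tuple

Params = List[float]


def multi_start_fit(
    log_likelihood: Callable[[Params], float],
    random_start: Callable[[random.Random], Params],
    improve: Callable[[Params], Params],
    complete_tours: int,
    initial_guesses: int,
    max_iterations: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Hypothetical sketch of the tour / start / convergence structure."""
    rng = random.Random(seed)
    best: Optional[Params] = None
    best_ll = float("-inf")
    # Search stage: every complete tour tries `initial_guesses` random starts.
    for _tour in range(complete_tours):
        for _start in range(initial_guesses):
            params = random_start(rng)  # fresh random starting values
            ll = log_likelihood(params)
            if ll > best_ll:
                best, best_ll = params, ll
    if best is None:
        raise ValueError("need at least one tour and one initial guess")
    # Convergence stage: starts after all tours, at the best start found.
    params = best
    for _ in range(max_iterations):
        params = improve(params)
    return params, log_likelihood(params)


if __name__ == "__main__":
    # Toy problem (hypothetical): the "log-likelihood" -(x - 3)^2 peaks at
    # x = 3, and each improve step moves halfway toward that peak.
    params, ll = multi_start_fit(
        log_likelihood=lambda p: -(p[0] - 3.0) ** 2,
        random_start=lambda rng: [rng.uniform(-10.0, 10.0)],
        improve=lambda p: [p[0] + 0.5 * (3.0 - p[0])],
        complete_tours=5,
        initial_guesses=4,
        max_iterations=60,
    )
    print(params, ll)
```

The total number of random starts is `complete_tours * initial_guesses`, which is why raising either control increases run time.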

Platform Options

For details on the red-triangle options for Normal Mixtures and Robust Normal Mixtures, see “K-Means Platform Options” on page 472.

Details of the Estimation Process<br />

Normal Mixtures fits the model with the EM algorithm because it is more stable than the Newton-Raphson algorithm. In addition, it uses a Bayesian regularized version of the EM algorithm, which handles cases where a covariance matrix is singular without breaking down. Because the estimates depend heavily on the initial guesses, the platform runs a number of tours, each using randomly selected points as initial centers.
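The regularized EM idea can be sketched for a one-dimensional mixture. This is a minimal sketch under assumptions. The text does not state JMP's prior, so the variance update below is a stand-in: it shrinks each variance toward a hypothetical `prior_var` as though `prior_weight` pseudo-observations with that variance had been added. Initial centers are passed in explicitly so the example is deterministic, whereas the platform draws them at random. In the example each cluster is a single repeated value, so plain maximum-likelihood updates (`prior_weight = 0`) would drive the variances to zero, the one-dimensional analogue of a singular covariance matrix.

```python
import math
from typing import List, Tuple


def log_normal_pdf(x: float, mu: float, var: float) -> float:
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def regularized_em_1d(
    data: List[float],
    init_means: List[float],
    iterations: int,
    prior_var: float,
    prior_weight: float,
) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical sketch: EM for a 1-D normal mixture with shrunk variances."""
    k, n = len(init_means), len(data)
    means = list(init_means)
    variances = [prior_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibilities P(component j | x), via log-sum-exp.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: weighted means, regularized variances, mixing weights.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: leave its parameters unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            ss = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            # Assumed regularization: stays > 0 even when ss == 0.
            variances[j] = (ss + prior_weight * prior_var) / (nj + prior_weight)
            weights[j] = nj / n
    return weights, means, variances


if __name__ == "__main__":
    data = [0.0] * 5 + [10.0] * 5  # each cluster has zero spread
    print(regularized_em_1d(data, [1.0, 9.0], iterations=50,
                            prior_var=1.0, prior_weight=1.0))
```

With these data the fitted variances settle at 1/6, that is, prior_var · prior_weight / (5 + prior_weight) for a cluster of five identical points.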

Running multiple tours makes estimation somewhat expensive, so large problems can take considerable time. The Complete Tours, Initial Guesses, and Max Iterations controls let you set the tour and iteration limits.

Additional Details for Robust Normal Mixtures

Because Normal Mixtures is sensitive to outliers, JMP offers an outlier-robust alternative called Robust Normal Mixtures, which estimates the normal parameters robustly. JMP computes the estimates by maximum likelihood with respect to a mixture of Huberized normal distributions, a class of modified normal distributions designed to be more resistant to outliers than the normal distribution.

The Huberized Gaussian distribution has pdf $\Phi_k(x)$:

$$\Phi_k(x) = \frac{\exp(-\rho(x))}{c_k}$$

$$\rho(x) = \begin{cases} \dfrac{x^2}{2} & \text{if } |x| \le k \\[6pt] k|x| - \dfrac{k^2}{2} & \text{if } |x| > k \end{cases}$$

$$c_k = \sqrt{2\pi}\,\bigl[\Phi(k) - \Phi(-k)\bigr] + \frac{2\exp(-k^2/2)}{k}$$
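The formulas for $\Phi_k$, $\rho$, and $c_k$ can be checked numerically. In this sketch, Φ inside $c_k$ is taken to be the standard normal CDF (computed with `math.erf`), which is what makes $\Phi_k$ integrate to 1. The function names and the value k = 1.5 are illustrative choices, since the text does not say which k JMP uses. The sketch also includes the derivative of ρ (Huber's ψ function, a standard result not stated in this section). ψ is capped at ±k, so a single far-out point exerts only bounded pull on a location estimate, which is the source of the outlier resistance.

```python
import math
from typing import Callable


def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def huber_rho(x: float, k: float) -> float:
    """rho(x): quadratic for |x| <= k, linear beyond k."""
    if abs(x) <= k:
        return x * x / 2.0
    return k * abs(x) - k * k / 2.0


def huber_c(k: float) -> float:
    """Normalizing constant c_k from the formula above."""
    return (math.sqrt(2.0 * math.pi) * (std_normal_cdf(k) - std_normal_cdf(-k))
            + 2.0 * math.exp(-k * k / 2.0) / k)


def huberized_pdf(x: float, k: float) -> float:
    """Phi_k(x) = exp(-rho(x)) / c_k."""
    return math.exp(-huber_rho(x, k)) / huber_c(k)


def huber_psi(x: float, k: float) -> float:
    """d rho / dx: equals x in the centre and is clipped to [-k, k]."""
    return max(-k, min(k, x))


def integrate(f: Callable[[float], float], lo: float, hi: float,
              n: int = 100_000) -> float:
    """Midpoint rule, accurate enough to check normalization."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    k = 1.5  # illustrative tuning constant only
    total = integrate(lambda x: huberized_pdf(x, k), -60.0, 60.0)
    print(f"integral of Phi_k: {total:.6f}")
    for x in (0.0, 2.0, 5.0, 10.0):
        normal = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        print(x, huberized_pdf(x, k), normal, huber_psi(x, k))
```

The printed table shows that the Huberized density decays only exponentially in |x| beyond k, so it assigns far more probability to distant points than the normal density does. As k grows, $c_k$ approaches $\sqrt{2\pi}$ and $\Phi_k$ approaches the standard normal density.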
