14.03.2014 Views

Modeling and Multivariate Methods - SAS


Chapter 18 Clustering Data

Hierarchical Clustering

The scree plot beneath the dendrogram has a point for each cluster join. The ordinate is the distance that was bridged to join the clusters at each step. Often there is a natural break where the distance jumps up suddenly. These breaks suggest natural cutting points to determine the number of clusters.
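The "natural break" idea can be sketched in a few lines of Python. This is only an illustration, not how the software chooses a cut: the function name `suggest_cluster_count` and the distances in the usage line are invented for the example, and picking the single largest jump is one simple heuristic among several.

```python
def suggest_cluster_count(join_distances):
    """Suggest a cluster count from the largest jump in join distance.

    join_distances[i] is the distance bridged at join i (0-based), listed
    in the order the joins happened. With n points there are n - 1 joins,
    and n - (i + 1) clusters remain after join i.
    """
    if len(join_distances) < 2:
        raise ValueError("need at least two joins to look for a jump")
    n_points = len(join_distances) + 1
    # Size of the jump leading into each join, starting with join 1.
    jumps = [
        join_distances[i] - join_distances[i - 1]
        for i in range(1, len(join_distances))
    ]
    # Index of the join that follows the biggest jump (first one on ties).
    i = jumps.index(max(jumps)) + 1
    # Cut just before that join: joins 0 .. i-1 are done, so n - i clusters remain.
    return n_points - i


# Invented distances for seven points (six joins); the jump from 0.5 to 2.4
# is the largest, so the sketch suggests stopping at three clusters.
print(suggest_cluster_count([0.2, 0.3, 0.35, 0.5, 2.4, 2.9]))
```

In practice the scree plot is read by eye, and more than one break may be a reasonable cutting point.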

Open the Clustering History table to see the results shown in Figure 18.3.

Figure 18.3 Clustering History

The number of clusters begins with 32, which is the number of rows in the data table minus one, because the first join merges two single points into one cluster. You can see that the two closest points, Germany and Italy, are joined to reduce the number of existing clusters to 32. They show as the first Leader and Joiner in the Clustering History table. The next two closest points are Poland and United Kingdom, followed by China and Thailand. When Australia is joined by China in the eighth line, China has already been joined by Thailand, making it the third cluster with three points. The last single point to be joined to others is Afghanistan, which reduces the number of clusters from nine to eight at that join. At the very end, a cluster of two points led by Afghanistan is joined by the rest of the points, led by Algeria. The order of the clusters at each join is unimportant; it is essentially an accident of the way the data was sorted.
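To make the structure of a clustering history concrete, here is a minimal Python sketch that builds one for a handful of points. It makes several assumptions that are not from the text: it uses single linkage (closest pair of members) as the distance between clusters, which may differ from the method behind Figure 18.3; the function name `clustering_history` is hypothetical; and the labels and coordinates are invented toy data, not the countries in the example.

```python
from math import dist


def clustering_history(points):
    """Return one (clusters_left, distance, leader, joiner) tuple per join.

    points maps a label to a coordinate tuple. Cluster-to-cluster distance
    is single linkage, an assumption made to keep the sketch short.
    """
    order = list(points)                          # row order of the data
    clusters = {label: [label] for label in order}  # keyed by leader label
    history = []
    while len(clusters) > 1:
        leaders = [label for label in order if label in clusters]
        best = None
        for i, a in enumerate(leaders):
            for b in leaders[i + 1:]:
                d = min(
                    dist(points[p], points[q])
                    for p in clusters[a]
                    for q in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, leader, joiner = best
        # The cluster whose leader comes first in row order absorbs the other.
        clusters[leader].extend(clusters.pop(joiner))
        history.append((len(clusters), d, leader, joiner))
    return history


# Invented toy data: five points, so the history starts at four clusters.
toy_points = {
    "a": (0.0, 0.0),
    "b": (0.0, 1.0),
    "c": (5.0, 0.0),
    "d": (5.0, 0.5),
    "e": (10.0, 10.0),
}
history = clustering_history(toy_points)
for clusters_left, distance, leader, joiner in history:
    print(clusters_left, round(distance, 3), leader, joiner)
```

In this sketch the leader of a join is simply whichever cluster appears first in the row order, so re-sorting the data can swap the Leader and Joiner roles without changing which points end up together.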
