13.07.2015 Views

View - Statistics - University of Washington



first of these can be done by a human, or by various automatic methods such as nonparametric maximum likelihood using the Voronoi tessellation (Allard and Fraley, 1997), or Kth nearest neighbor clutter removal (Byers and Raftery, 1998). This step does not need to be perfect, since CEM-PCC will examine the noise points to determine if they should be included in the features, and vice versa.

Once the noise points have been removed, we need an initial clustering of the feature points so that a curve can be fit to each cluster. We recommend that there be at least seven points in each cluster; when there are fewer than seven, we fit a principal component line instead of a curve. The feature points can be clustered using model-based clustering as implemented in the MCLUST software (Banfield and Raftery, 1993; Dasgupta and Raftery, 1998; Fraley and Raftery, 1998). A simpler method is to fit a minimum spanning tree to the feature points and cut the longest edges, which will work well if the main clusters are well separated (Roeder and Wasserman, 1997; Zahn, 1971).

2.3.2 Hierarchical Principal Curve Clustering (HPCC)

Clustering on closed principal curves was introduced by Banfield and Raftery (1992). Their clustering criterion ($V^*$) is based on a weighted sum of the squared distances about the curve and the squared distances along the curve, and they state that it is optimal when the data points are normally distributed about the curve (conditional on the estimated curves and assuming that $\alpha$ is chosen properly). It is defined by

$$V^* = V_{About} + \alpha V_{Along},$$

where $V_{About} = \sum_{j=1}^{N} (x_j - f(\lambda_j))^2$, $V_{Along} = \frac{1}{2} \sum_{j=1}^{N} (\epsilon_j - \bar{\epsilon})^2$, and $\epsilon_j = f(\lambda_j) - f(\lambda_{j+1})$.

The $V_{About}$ term measures the spread of observations about the curve (in or-
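As a rough illustration of the criterion $V^*$ defined above, the following sketch computes $V_{About}$, $V_{Along}$, and $V^*$ for a single cluster. It is not the authors' implementation. The function name `v_star` and its arguments are hypothetical, the projections $f(\lambda_j)$ are assumed to be supplied already ordered by $\lambda_j$, squared terms are read as squared Euclidean norms, and the `closed` flag (which wraps $f(\lambda_{N+1})$ back to $f(\lambda_1)$) is an assumption about how the last term of the sum is handled.

```python
def v_star(points, projections, alpha, closed=True):
    """Return (V*, V_About, V_Along) for one cluster.

    points      -- observations x_j, each a sequence of coordinates
    projections -- f(lambda_j), the projection of x_j onto the curve,
                   assumed to be ordered by lambda_j
    alpha       -- weight on the along-curve term
    closed      -- assumption: if True, f(lambda_{N+1}) wraps to f(lambda_1)
    """
    n = len(points)
    if n != len(projections) or n < 2:
        raise ValueError("need matching sequences with at least two points")

    def sq_dist(a, b):
        # Squared Euclidean distance between two coordinate sequences.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # V_About: squared distances from each observation to its projection.
    v_about = sum(sq_dist(x, f) for x, f in zip(points, projections))

    # epsilon_j = f(lambda_j) - f(lambda_{j+1}), as vectors.
    last = n if closed else n - 1
    eps = [
        [a - b for a, b in zip(projections[j], projections[(j + 1) % n])]
        for j in range(last)
    ]

    # Mean of the epsilon_j, taken coordinate by coordinate.
    dim = len(eps[0])
    eps_bar = [sum(e[d] for e in eps) / len(eps) for d in range(dim)]

    # V_Along: half the summed squared deviations of epsilon_j from the mean.
    v_along = 0.5 * sum(sq_dist(e, eps_bar) for e in eps)

    return v_about + alpha * v_along, v_about, v_along


# Hypothetical example: four points near the corners of a unit square,
# each projected onto the nearest corner of a closed square "curve".
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
observed = [(x + 0.1, y) for x, y in corners]
total, about, along = v_star(observed, corners, alpha=0.5)
print(total, about, along)
```

Under these assumptions, evenly spaced projections give a small $V_{Along}$, while gaps between consecutive projections inflate it, so $\alpha$ controls how heavily uneven spacing along the curve is penalized relative to scatter about it.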
