Chapter 3

Methodology

There are several approaches to using mixture models in a clustering context, which were discussed in Chapter 2. McNicholas (2011) presents a review of work in model-based clustering with a particular focus on two families of Gaussian mixture models, namely MCLUST (Fraley and Raftery, 2002) and PGMM (McNicholas and Murphy, 2008).

3.1 Dimension Reduction for Mixtures of Multivariate Gaussian Distributions (GMMDR)

Scrucca (2010) proposed a novel approach to model-based clustering, namely dimension reduction for mixtures of multivariate Gaussian distributions (GMMDR). The main idea is to find a reduced subspace that captures most of the clustering structure in the data. Following the work of Li (1991, 2000) on sliced inverse regression (SIR), information on the dimension-reduction subspace can be obtained from two sources:

• the variation in the group means;
• the variation in the group covariances (depending on the estimated mixture model).

Classical procedures for reducing the dimensionality of data are principal component analysis and factor analysis. These techniques lower dimensionality by forming linear combinations of the variables, but neither method is particularly useful for visualizing any potential clustering structure.

The proposed method reduces dimensionality by identifying a set of linear combinations of the original variables, ordered by importance via their associated eigenvalues, which capture most of the cluster structure in the data. Observations are then projected onto this reduced subspace.
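The construction can be sketched concretely. The Python snippet below is a minimal illustration of the idea rather than Scrucca's reference implementation (the latter is available as MclustDR in the R package mclust): a Gaussian mixture is fitted, a kernel matrix is assembled from the two sources of variation above, and the data are projected onto the leading generalized eigenvectors of that kernel relative to the marginal covariance. The function name gmmdr_directions and the simulated data are illustrative only, and the exact kernel used by GMMDR differs in some details.

# Illustrative sketch only (not Scrucca's reference implementation): fit a
# Gaussian mixture, build a kernel from the variation in component means and
# component covariances, and project onto its leading generalized eigenvectors
# relative to the marginal covariance of the data.
import numpy as np
from numpy.linalg import inv
from scipy.linalg import eigh
from sklearn.mixture import GaussianMixture

def gmmdr_directions(X, n_components=3, d=2, random_state=0):
    """Project X onto d linear combinations carrying most of the cluster structure."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=random_state).fit(X)
    pi, mu, S = gmm.weights_, gmm.means_, gmm.covariances_

    mean_all = pi @ mu                       # overall (weighted) mean
    Sigma = np.cov(X, rowvar=False)          # marginal covariance of the data
    Sigma_inv = inv(Sigma)

    # Source 1: variation in the component means (between-group scatter).
    dev = mu - mean_all
    M_means = (dev * pi[:, None]).T @ dev

    # Source 2: variation in the component covariances about their pooled average.
    S_bar = np.einsum("k,kij->ij", pi, S)
    M_cov = sum(p * (Sk - S_bar) @ Sigma_inv @ (Sk - S_bar) for p, Sk in zip(pi, S))

    # Kernel combining both sources; generalized eigenproblem M v = lambda * Sigma v.
    M = M_means @ Sigma_inv @ M_means + M_cov
    evals, evecs = eigh(M, Sigma)
    order = np.argsort(evals)[::-1]          # directions ordered by eigenvalue
    basis = evecs[:, order[:d]]
    return X @ basis, evals[order]

# Example on simulated data: three well-separated groups in five dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 1.0, size=(100, 5)) for m in (0.0, 3.0, 6.0)])
Z, eigenvalues = gmmdr_directions(X, n_components=3, d=2)
print(Z.shape, eigenvalues[:2])

The eigenvalues indicate how much cluster structure each direction carries, so plotting the first two projected coordinates will often reveal the grouping more clearly than the leading principal components do.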
