A particular Gaussian mixture model for clustering and its ...
… maximizing a likelihood function of data memberships and is equivalent to finding the parameters of the cluster hyperplanes by solving a QP problem. We will show that this QP can be tackled efficiently by solving trivial linear programming sub-problems. Notice that when solving this QP, the number of clusters (denoted C), fixed initially, might be overestimated, which leads to several overlapping clusters; therefore the actual clusters are found as constellations of highly correlated hyperplanes in the mapping space H.

In the remainder of this paper, we refer to a cluster as a set of data gathered using an algorithm, while a class (or a category) is the actual membership of this data according to a well-defined ground truth. Among notations, i and j stand for data indices while k, c and l stand for cluster indices. Other notations will be introduced as we go along through the different sections of this paper, which is organized as follows: in Sect. 2 we provide a short reminder on GMMs, followed by our clustering minimization problem in Sect. 3. In Sect. 4 we show that this minimization problem can be solved as a succession of simple linear programming subproblems which are handled efficiently. We present in Sect. 5 experiments on simple as well as challenging problems in content-based image retrieval. Finally, we conclude in Sect. 6 and provide some directions for future work.

2 A short reminder on GMMs

Given a training set S_N of size N, we are interested in a particular Gaussian mixture model (denoted M_k) (Bishop 1995) where the number of its components is equal to the size of the training set. The parameters of M_k are denoted Θ_k = {Θ_{i|k}, i = 1, ..., N}. Here Θ_{i|k} stands for the ith component parameter of M_k, i.e., the mean and the covariance matrices of a Gaussian density function. In this case, the output of the GMM likelihood function related to a cluster y_k is a weighted sum of N component densities:

p(x | y_k) = ∑_{i=1}^{N} P(Θ_{i|k} | y_k) p(x | y_k, Θ_{i|k}),   (1)

here P(Θ_{i|k} | y_k) (also denoted µ_{ik}) is the prior probability of the ith component parameter Θ_{i|k} given the cluster y_k, and p(x | y_k, Θ_{i|k}) = N_k(x; Θ_{i|k}) is the normal density function of the ith component. In order to guarantee that p(x | y_k) is a density function, the mixture parameters {µ_{ik}} are chosen such that ∑_i µ_{ik} = 1. Usually the training of GMMs is formulated as a maximum likelihood problem where the parameters Θ_k and the mixture parameters {µ_{ik}} are estimated using expectation maximization (Dempster et al. 1977; Bishop 1995).

In our work we consider N_k(x; Θ_{i|k}) with a fixed mean x_i ∈ S_N and covariance matrix σΣ_i (Σ_i ∈ R^{p×p}, σ ∈ R+). Now, each cluster y_k is modeled as a GMM where the only free parameters are the mixture coefficients {µ_{ik}}; the means and the covariances are assumed constant but of course dependent on a priori knowledge of the training set (see Fig. 1 and Sect. 5).

[Fig. 1: A particular GMM where the centers and the variances are fixed; the only free parameters are the GMM mixture coefficients and the scale σ.]
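To make Eq. (1) concrete, the following minimal Python sketch (not from the paper) evaluates p(x | y_k) for one cluster, assuming as above that the component means are the training points x_i and the covariances are σΣ_i; the names cluster_likelihood, X_train, Sigmas and mu_k are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def cluster_likelihood(x, X_train, mu_k, Sigmas, sigma):
    """Evaluate p(x | y_k) as in Eq. (1): a mixture with one Gaussian per
    training sample, whose mean is fixed to x_i and whose covariance is
    sigma * Sigma_i; mu_k[i] are the mixture coefficients (summing to 1)."""
    N = len(X_train)
    return sum(
        mu_k[i] * multivariate_normal.pdf(x, mean=X_train[i], cov=sigma * Sigmas[i])
        for i in range(N)
    )

# Toy usage: N = 3 training points in R^2, identity base covariances.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Sigmas = [np.eye(2)] * 3
mu_k = np.array([0.5, 0.25, 0.25])   # one column of mixture coefficients
print(cluster_likelihood(np.array([0.2, 0.1]), X_train, mu_k, Sigmas, 0.5))
```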
3 Clustering

The goal of a clustering algorithm is to make clusters containing data from different classes as different as possible, while keeping data from the same classes as close as possible to their actual clusters. Usually this implies the optimization of an objective function (see for instance Frigui and Krishnapuram 1997) involving a fidelity term, which measures the fitness of each training sample to its model, and a regularizer, which reduces the number of clusters, i.e., the complexity of the model.

3.1 Our formulation

Following the notations and definitions of Sect. 2, a given x ∈ R^p is assigned to a cluster y_k if:

y_k = arg max_{y_l} p(x | y_l),   (2)

here the weights µ = {µ_{il}} of the GMM functions p(x | y_l), l = 1, ..., C, are found by solving the following constrained minimization problem (see motivation below):

min_µ ∑_{k,i} µ_{ik} ( ∑_{l≠k} p(x_i | y_l) )
s.t. ∑_i µ_{ic} = 1, µ_{ic} ∈ [0, 1], i = 1, ..., N, c = 1, ..., C.   (3)
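For concreteness, the assignment rule (2) is simply an argmax over the C cluster likelihoods. Below is an illustrative sketch (not from the paper), reusing the hypothetical cluster_likelihood helper above; mu is assumed to be an N × C array whose column l holds the coefficients {µ_{il}}.

```python
import numpy as np

def assign_cluster(x, X_train, mu, Sigmas, sigma):
    """Rule (2): assign x to the cluster y_k with the largest GMM likelihood."""
    C = mu.shape[1]
    likelihoods = [cluster_likelihood(x, X_train, mu[:, l], Sigmas, sigma)
                   for l in range(C)]
    return int(np.argmax(likelihoods))
```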

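As for problem (3), its objective is bilinear in µ, since each density p(x_i | y_l) itself depends on the coefficients {µ_{jl}}; this is the QP mentioned in the introduction. The sketch below is not the procedure of Sect. 4 (which is only announced at this point); it only illustrates, under the assumption that the densities p(x_i | y_l) are frozen at the current coefficients, how each cluster column then reduces to a small linear program over the constraints of (3). The block-freezing choice and the cluster_likelihood helper are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def lp_block_step(X_train, mu, Sigmas, sigma):
    """One illustrative block step for problem (3): with p(x_i | y_l) held
    fixed at the current mu, the objective is linear in mu, and each cluster
    column c solves an LP over  sum_i mu_ic = 1,  0 <= mu_ic <= 1."""
    N, C = mu.shape
    # p[i, l] = p(x_i | y_l) evaluated with the current mixture coefficients
    p = np.array([[cluster_likelihood(X_train[i], X_train, mu[:, l], Sigmas, sigma)
                   for l in range(C)] for i in range(N)])
    new_mu = np.zeros_like(mu)
    for c in range(C):
        cost = p.sum(axis=1) - p[:, c]   # coefficient of mu_ic: sum over l != c of p(x_i | y_l)
        res = linprog(cost, A_eq=np.ones((1, N)), b_eq=np.array([1.0]),
                      bounds=(0.0, 1.0))
        new_mu[:, c] = res.x
    return new_mu
```

Each such LP attains its optimum at a vertex of its feasible set, which is one way to see why the sub-problems announced above can be called trivial; the paper's actual scheme, and how overlapping clusters are then merged, is deferred to Sect. 4.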