A particular Gaussian mixture model for clustering and its application to image retrieval

In the above objective function, the training data belonging to a cluster $y_l$ are assumed drawn from a GMM with a likelihood function $p(x|y_l)$. Each mixture parameter $\mu_{il}$ stands for the degree of membership (or the contribution) of the training example $x_i$ to the cluster $y_l$. The overall objective is to maximize the membership of each training example to its actual cluster while keeping the memberships to the remaining clusters relatively low. Usually, existing clustering methods (see for instance Bezdek 1981) find the parameters $\{\mu_{il}\}$ as those which maximize the membership of each training example to its actual cluster. In contrast to these methods, our formulation proceeds using a dual principle: the purpose is to minimize the memberships of training examples to their non-actual clusters.

Using (1), we can expand the objective function (3) as:

$$\min_{\mu} \; \sum_{k \neq l} \sum_{i,j} \mu_{ik}\, \mu_{jl}\, \mathcal{N}_l(x_i; \Theta_{j|l}), \qquad (4)$$

where $\mathcal{N}_l(x_i; \Theta_{j|l})$ is the response of a Gaussian density function (also referred to as a kernel) with fixed parameters $\Theta_{j|l} = \{x_j, \sigma \Sigma_j\}$; $x_j$ is the mean, $\Sigma_j$ is the covariance and $\sigma$ is the scale. We will denote this kernel simply as $K_\sigma(\|x_i - x_j\|)$. It is known that the Gaussian kernel is positive definite (Cristianini and Shawe-Taylor 2000), so this function corresponds to a scalar product in the mapping space $H$, i.e., there is a mapping $\Phi_\sigma$ from the input space into an infinite dimensional space such that $K_\sigma(\|x_i - x_j\|) = \langle \Phi_\sigma(x_i), \Phi_\sigma(x_j) \rangle$, where $\langle \cdot, \cdot \rangle$ stands for the inner product in $H$. At this stage, the response of a GMM function $p(x|y_k)$ is equal to the scalar product $\langle \omega_k, \Phi_\sigma(x) \rangle$, where $\omega_k = \sum_i \mu_{ik} \Phi_\sigma(x_i)$ is the normal of a hyperplane in $H$ (see Fig. 2). Now, the objective function (4) can be rewritten:

$$\min_{\mu} \; \sum_{k \neq l} \langle \omega_k, \omega_l \rangle \qquad (5)$$

The above objective function minimizes the sum of hyperplane correlations taken pairwise among all different clusters. Now, we can derive the new form of the constrained minimization problem (3):

$$\min_{\mu} \; \sum_{k,i} \sum_{l \neq k,\, j} \mu_{ik}\, \mu_{jl}\, K_\sigma(\|x_i - x_j\|)$$
$$\text{s.t.} \quad \sum_{c} \mu_{ic} = 1, \quad \mu_{ic} \in [0, 1], \quad i = 1, \ldots, N, \quad c = 1, \ldots, C \qquad (6)$$

This defines a constrained QP which can be solved using standard QP libraries (see for instance Vanderbei 1999). When solving this problem, the training examples $\{x_i\}$ for which the mixing parameters $\{\mu_{ik}\}$ are positive will be referred to as the GMM vectors of the cluster $y_k$ (see Fig. 2).

Fig. 2 This figure shows the mapping of training samples into a high dimensional space. Data in the original space characterize Gaussian blobs while in the mapping space they correspond to hyperplanes. The GMM vectors are surrounded with circles and correspond to the centers of the Gaussian kernels for which the mixture parameters do not vanish.
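To make the optimization in (6) concrete, the following is a minimal sketch that builds the Gaussian Gram matrix $K_\sigma$ and minimizes the pairwise correlation objective under the row-stochastic constraints with a general-purpose constrained solver. The paper refers to standard QP libraries (e.g. Vanderbei 1999); the use of SciPy's SLSQP routine, the function names and the toy data below are illustrative assumptions, not part of the original method.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_gram(X, sigma):
    """Gram matrix K_sigma(||x_i - x_j||) of the Gaussian kernel."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def objective(mu_flat, K, N, C):
    """Sum over k != l of <omega_k, omega_l> = mu_k^T K mu_l, as in (5)/(6)."""
    mu = mu_flat.reshape(N, C)
    G = mu.T @ K @ mu              # G[k, l] = <omega_k, omega_l>
    return G.sum() - np.trace(G)   # keep only the off-diagonal (k != l) terms

def fit_memberships(X, C, sigma=1.0, seed=0):
    """Illustrative solver for (6); the paper uses a dedicated QP/chunking scheme."""
    N = X.shape[0]
    K = gaussian_gram(X, sigma)
    rng = np.random.default_rng(seed)
    mu0 = rng.random((N, C))
    mu0 /= mu0.sum(axis=1, keepdims=True)   # feasible start: each row sums to 1
    constraints = [{"type": "eq",
                    "fun": lambda m: m.reshape(N, C).sum(axis=1) - 1.0}]
    res = minimize(objective, mu0.ravel(), args=(K, N, C),
                   method="SLSQP", bounds=[(0.0, 1.0)] * (N * C),
                   constraints=constraints)
    return res.x.reshape(N, C)

# Toy usage on two well-separated blobs; memberships tend to concentrate per blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (10, 2)), rng.normal(6.0, 1.0, (10, 2))])
mu = fit_memberships(X, C=2, sigma=1.0)
print(mu.round(2))
```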
4 Training

The number of parameters intervening in (6) is $N \times C$, so for clustering problems of reasonable size, for instance $N = 1{,}000$ and $C = 20$, solving this QP using standard packages can quickly get out of hand. Chunking methods have been successfully used to solve QPs for large-scale training problems such as SVMs (Osuna et al. 1997). The idea consists in solving a QP problem using an active subset of parameters, referred to as a chunk, which is updated iteratively. When the QP is convex, which can be checked by verifying that the Gram matrix is positive definite, the process is guaranteed to converge to the global optimum after a sufficient number of iterations (Osuna et al. 1997).

Using the same principle as Osuna et al. (1997) and Platt (1999), we will show in this section that, for a particular choice of the active chunk, the QP (6) can be decomposed into linear programming subproblems, each of which can be solved trivially.

4.1 Decomposition

Let us fix one cluster index $p \in \{1, \ldots, C\}$ and rewrite the objective function (6) as:

$$\min_{\mu} \; 2 \sum_{i} \mu_{ip}\, c_{ip} + \sum_{i,\, k \neq p} \sum_{j,\, l \neq k, p} \mu_{ik}\, \mu_{jl}\, K_\sigma(\|x_i - x_j\|), \qquad (7)$$

where:

$$c_{ip} = \sum_{j,\, l \neq p} \mu_{jl}\, K_\sigma(\|x_i - x_j\|) \qquad (8)$$
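The coefficient $c_{ip}$ in (8) involves only the memberships of clusters other than $p$, so once those are held fixed it reduces to a single matrix-vector product with the Gram matrix, and the part of (7) that depends on $\mu_{\cdot p}$ is linear. The sketch below is one assumed way to compute (8), reusing `gaussian_gram` from the previous sketch; it is not the paper's code.

```python
import numpy as np

def coefficients_c(mu, K, p):
    """
    c_{ip} = sum_{j, l != p} mu_{jl} K_sigma(||x_i - x_j||)     (8)

    mu : (N, C) membership matrix, K : (N, N) Gaussian Gram matrix,
    p  : index of the cluster excluded from the sum over l.
    """
    others = np.delete(mu, p, axis=1)   # drop column p -> shape (N, C-1)
    weights = others.sum(axis=1)        # sum_{l != p} mu_{jl} for each j
    return K @ weights                  # c_{ip} = sum_j K[i, j] * weights[j]

# e.g. c_p = coefficients_c(mu, gaussian_gram(X, sigma=1.0), p=0)
```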
