Algorithms for Gaussian Bandwidth Selection in Kernel Density ...

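We obtain:

$$2E\{x^T x\} - 2\mu_x^T \mu_x = 2\,\mathrm{tr}\{\Sigma_x\}$$

$$\lim_{\sigma^2 \to \infty} g(\sigma^2) = \frac{2\,\mathrm{tr}\{\Sigma_x\}}{D}$$

This proves the second point, since the maximum value of $g(\sigma^2)$ is reached at infinity.

To prove the last point, we compute the derivative $g'(\sigma^2)$ and check that it is nonnegative:

$$\frac{dg(\sigma^2)}{d\sigma^2} = \frac{1}{2\sigma^4 N D} \sum_i \frac{\sum_{j \neq i} \sum_{k \neq i} d_{ij}^2 (d_{ij}^2 - d_{ik}^2) \exp\left(-\frac{d_{ij}^2 + d_{ik}^2}{2\sigma^2}\right)}{\left(\sum_{l \neq i} \exp\left(-\frac{d_{il}^2}{2\sigma^2}\right)\right)^2}$$

$$= \frac{1}{4\sigma^4 N D} \sum_i \frac{\sum_{j \neq i} \sum_{k \neq i, j} (d_{ij}^2 - d_{ik}^2)^2 \exp\left(-\frac{d_{ij}^2 + d_{ik}^2}{2\sigma^2}\right)}{\left(\sum_{l \neq i} \exp\left(-\frac{d_{il}^2}{2\sigma^2}\right)\right)^2} \geq 0$$

The second expression follows from symmetrizing the double sum in $j$ and $k$. The terms with $k = j$ vanish. Up to the kernel normalization constant, its denominator equals $\left((N-1)\hat{p}(x_i)\right)^2$.

This proves the existence of a unique fixed point. To prove that the algorithm converges in this interval, we need to check the condition $|g'(\sigma^2)| < 1$ [7]. When it holds, $g(\sigma^2)$ is guaranteed to cross the line $y = \sigma^2$ at only one point. The convergence condition (4) means that the value of (6) is less than 1.

3 The unconstrained case

The general expression for a Gaussian kernel is:

$$G_{ij}(C) = |2\pi C|^{-1/2} \exp\left(-\frac{1}{2}(x_i - x_j)^T C^{-1} (x_i - x_j)\right) \qquad (6)$$

and its derivative with respect to $C$ is:

$$\nabla_C G_{ij}(C) = \frac{1}{2}\left(C^{-1}(x_i - x_j)(x_i - x_j)^T - I\right) C^{-1} G_{ij}(C)$$

As in the previous cases, we take the derivative of the log-likelihood and set it equal to zero:

$$\sum_i \frac{1}{\hat{p}(x_i)} \frac{1}{N-1} \sum_{j \neq i} \frac{1}{2} C^{-1}(x_i - x_j)(x_i - x_j)^T C^{-1} G_{ij} = \sum_i \frac{1}{\hat{p}(x_i)} \frac{1}{N-1} \sum_{j \neq i} \frac{1}{2} C^{-1} G_{ij}$$

Multiplying both sides by $C$ on both the left and the right gives:

$$\sum_i \frac{1}{\hat{p}(x_i)} \sum_{j \neq i} (x_i - x_j)(x_i - x_j)^T G_{ij} = C \sum_i \frac{1}{\hat{p}(x_i)} \sum_{j \neq i} G_{ij}$$

As in the spherical case, $\sum_{j \neq i} G_{ij} = (N-1)\hat{p}(x_i)$. The right-hand side therefore reduces to $N(N-1)\,C$, which yields the following fixed-point algorithm:

$$C_{t+1} = \frac{1}{N(N-1)} \sum_i \frac{1}{\hat{p}_t(x_i)} \sum_{j \neq i} (x_i - x_j)(x_i - x_j)^T G_{ij}(C_t) \qquad (7)$$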
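Both fixed-point iterations can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, and samples are lists of coordinates. `spherical_g` assumes the leave-one-out form $g(\sigma^2) = \frac{1}{ND}\sum_i \sum_{j \neq i} d_{ij}^2 \exp(-d_{ij}^2/2\sigma^2) / \sum_{l \neq i} \exp(-d_{il}^2/2\sigma^2)$, which is the form whose limit and derivative are given above. `covariance_update` applies (7). Because $\sum_{j \neq i} G_{ij} = (N-1)\hat{p}(x_i)$, each weight $G_{ij}/((N-1)\hat{p}_t(x_i))$ is a softmax over $j \neq i$. In that softmax the factor $|2\pi C|^{-1/2}$ cancels, so only $C^{-1}$ is needed, and the code obtains it through a Cholesky factor.

```python
import math

# Hypothetical helper names; a small pure-Python sketch, O(N^2) work per iteration.


def spherical_g(sigma2, xs):
    """Assumed fixed-point map g(sigma^2) of the spherical (S-KDE) case."""
    n, dim = len(xs), len(xs[0])
    total = 0.0
    for i in range(n):
        # squared distances d_ij^2 to every other sample (leave-one-out)
        d2 = [sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])) for j in range(n) if j != i]
        m = min(d2)  # shift the exponents; the common factor cancels in the ratio
        w = [math.exp(-(d - m) / (2.0 * sigma2)) for d in d2]
        total += sum(wj * d for wj, d in zip(w, d2)) / sum(w)
    return total / (n * dim)


def spherical_fixed_point(xs, sigma2=1.0, tol=1e-9, max_iter=10000):
    """Iterate sigma^2_{t+1} = g(sigma^2_t) until two successive values agree."""
    for _ in range(max_iter):
        new = spherical_g(sigma2, xs)
        if abs(new - sigma2) <= tol * max(1.0, sigma2):
            return new
        sigma2 = new
    return sigma2


def cholesky(c):
    """Lower-triangular L with L L^T = c; c must be symmetric positive definite."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def quad_forms(xs, i, low):
    """Pairs (-1/2 v^T C^{-1} v, v) with v = x_i - x_j for every j != i."""
    out = []
    for j in range(len(xs)):
        if j == i:
            continue
        v = [a - b for a, b in zip(xs[i], xs[j])]
        y = []  # forward substitution L y = v, so |y|^2 = v^T C^{-1} v
        for r in range(len(v)):
            y.append((v[r] - sum(low[r][k] * y[k] for k in range(r))) / low[r][r])
        out.append((-0.5 * sum(t * t for t in y), v))
    return out


def covariance_update(xs, c):
    """One step of (7), written as C_{t+1} = (1/N) sum_i sum_{j != i} r_ij v v^T."""
    n, dim = len(xs), len(xs[0])
    low = cholesky(c)
    new = [[0.0] * dim for _ in range(dim)]
    for i in range(n):
        terms = quad_forms(xs, i, low)
        m = max(q for q, _ in terms)
        w = [math.exp(q - m) for q, _ in terms]
        z = sum(w)
        for wj, (_, v) in zip(w, terms):
            r = wj / z  # r_ij = G_ij(C_t) / ((N - 1) p_t(x_i))
            for a in range(dim):
                for b in range(dim):
                    new[a][b] += r * (v[a] * v[b])
    return [[e / n for e in row] for row in new]


def loo_log_likelihood(xs, c):
    """Leave-one-out log-likelihood sum_i log p(x_i) with the full kernel (6)."""
    n, dim = len(xs), len(xs[0])
    low = cholesky(c)
    # log |2 pi C|^{-1/2} = -(D/2) log(2 pi) - sum_k log L_kk
    log_norm = -0.5 * dim * math.log(2 * math.pi) - sum(math.log(low[k][k]) for k in range(dim))
    total = 0.0
    for i in range(n):
        qs = [q for q, _ in quad_forms(xs, i, low)]
        m = max(qs)
        total += log_norm + m + math.log(sum(math.exp(q - m) for q in qs) / (n - 1))
    return total
```

Because $g$ is nondecreasing, the iterates $\sigma^2_t$ move monotonically toward a fixed point. You can check numerically that the likelihood does not decrease by evaluating `loo_log_likelihood` after each call to `covariance_update`, which is what the EM interpretation of (7) predicts. The sketch is only practical for small $N$.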
The expression in (7) suggests a relationship with the Expectation-Maximization (EM) result for Gaussian Mixture Models (GMM). A GMM is a PDF estimator given by $\hat{p}(x) = \sum_{k=1}^{K} \alpha_k G(x|\mu_k, C_k)$. The weights of the $K$ components of the mixture are the $\alpha_k$, and each Gaussian is characterized by its mean vector $\mu_k$ and its covariance matrix $C_k$.

The EM algorithm is an iterative procedure in which the parameters at step $t$ are obtained from those at step $t-1$. It uses a matrix of auxiliary variables, $r_{ki} = p(k|x_i)$, expressing the probability that sample $x_i$ belongs to the $k$-th component of the mixture. These probabilities must satisfy $\sum_k r_{ki} = 1$. When all the components share a single covariance matrix, the EM solution establishes the following updating rule for it at step $t$:

$$C^t = \frac{1}{N} \sum_k \sum_i r_{ki}^t (x_i - \mu_k^t)(x_i - \mu_k^t)^T \qquad (8)$$

where the $r_{ki}^t$ and $\mu_k^t$ are also updated iteratively. Our KDE model can be considered a special case of GMM where:

i) there are as many components as samples ($K = N$), with equal weights $\alpha_k = 1/N$;

ii) the mean vectors are fixed: $\mu_k = x_k$;

iii) the covariance matrix is the same for all the components; and

iv) sample $x_i$ is excluded from its own component, $r_{ii} = 0$, while the remaining $N-1$ components keep equal prior weights $1/(N-1)$, so that $r_{ki} = G_{ik}(C^t)/\left((N-1)\hat{p}_t(x_i)\right)$ for $k \neq i$.

With these particularizations, the updating rule in (8) becomes equal to the iteration in (7).

EM guarantees that the likelihood increases monotonically and hence converges to a local maximum, as proved in the literature [8]. The algorithm in (7) is subject to the same conditions, so its convergence is also guaranteed. However, when $N \approx D$, the empirical covariance matrices are close to singular, so numerical problems may arise, as in GMM design.

4 Application to Parzen classification

We have tested the performance of the obtained models on a set of public classification problems from [9].
To do so, we apply the Parzen classifier, which implements the simple Bayes criterion:

$$\hat{y} = \arg\max_l \hat{p}_{\theta_l}(x|c_l)$$

with per-class spherical (S-KDE) and unconstrained (U-KDE) models $\hat{p}_{\theta_l}(x|c_l)$ optimized with the proposed method, where $c_l$ denotes each of the $L$ classes considered. We compared these results with those of other classification methods, namely K-Nearest-Neighbors (KNN, with K = 1) and the one-versus-the-rest Support Vector Machine (SVM) with an RBF kernel. The results are shown in Table 1.

Data        Train   Test    L    D    S-KDE   U-KDE   KNN     SVM
Pima          738      -    2    8    71.22   75.13   73.18   76.47
Wine          178      -    3   13    75.84   99.44   76.97   100
Landsat      4435   2000    6   36    89.45   86.10   90.60   90.90
Optdigits    3823   1797   10   64    97.89   93.54   94.38   98.22
Letter      16000   4000   26   16    95.23   92.77   95.20   97.55

Table 1: Classification accuracy (%) on some public datasets. Train and Test give the number of samples in each set, L the number of classes and D the input dimension. Leave-one-out accuracy is reported when no test set is available.
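The Parzen decision rule with per-class S-KDE models can be sketched as follows. This is an illustration under assumptions, not the experimental code behind Table 1. The class and function names are hypothetical, and each class's bandwidth comes from the same assumed leave-one-out fixed point $\sigma^2 = g(\sigma^2)$ as in the spherical case. Class priors are ignored, as in the arg-max criterion above. The $(2\pi\sigma_l^2)^{-D/2}$ normalization must be kept here because each class has its own bandwidth. A U-KDE variant would replace `log_density` with the full-covariance kernel (6).

```python
import math

# Hypothetical sketch: one spherical Gaussian KDE per class, equal class priors.


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_bandwidth(xs, sigma2=1.0, tol=1e-9, max_iter=10000):
    """Leave-one-out fixed-point iteration sigma^2_{t+1} = g(sigma^2_t) for one class."""
    n, dim = len(xs), len(xs[0])
    d2 = [[sq_dist(xs[i], xs[j]) for j in range(n) if j != i] for i in range(n)]
    for _ in range(max_iter):
        total = 0.0
        for row in d2:
            m = min(row)  # stabilise exp(); the shift cancels in the ratio
            w = [math.exp(-(d - m) / (2.0 * sigma2)) for d in row]
            total += sum(wj * d for wj, d in zip(w, row)) / sum(w)
        new = total / (n * dim)
        if abs(new - sigma2) <= tol * max(1.0, sigma2):
            return new
        sigma2 = new
    return sigma2


def log_density(x, xs, sigma2):
    """log p_hat(x) for a spherical Gaussian KDE with kernel variance sigma2."""
    terms = [-sq_dist(x, xj) / (2.0 * sigma2) for xj in xs]
    m = max(terms)
    log_sum = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_sum - math.log(len(xs)) - 0.5 * len(x) * math.log(2 * math.pi * sigma2)


class ParzenClassifier:
    """y_hat = argmax_l p_hat(x | c_l) with one S-KDE per class."""

    def fit(self, xs, ys):
        self.models = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            self.models[label] = (members, fit_bandwidth(members))
        return self

    def predict(self, x):
        return max(self.models, key=lambda label: log_density(x, *self.models[label]))
```

For example, `ParzenClassifier().fit(xs, ys).predict(x)` returns the label whose class-conditional density is largest at `x`. Each class needs at least two samples for the leave-one-out bandwidth to be defined.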