Algorithms for Gaussian Bandwidth Selection in Kernel Density ...

evaluated on the point left out. The model evaluated on each training sample has the form:

$$\hat{p}_\theta(x_i) = \frac{1}{N-1} \sum_{\substack{j=1 \\ j \neq i}}^{N} G(x_i - x_j \mid \theta) \qquad (2)$$

where we make explicit the use of a Gaussian kernel. This framework was first proposed in [4] and later studied by other authors [2], [5]. However, these studies lack a closed optimization procedure, so the bandwidth $\sigma^2$ is obtained by a greedy tuning over its possible values. Moreover, the multivariate case is only considered in these previous works under a spherical kernel assumption. In this paper, we propose two algorithms that overcome these difficulties.

In a multidimensional Gaussian kernel, the set of parameters consists of the covariance matrix of the Gaussian. In the following, we consider two different degrees of complexity assumed for this matrix: a spherical shape, so that $C = \sigma^2 I_D$, with only one parameter to adjust, and an unconstrained kernel, in which a general form is considered for $C$, with $D(D+1)/2$ parameters.

Sections 2 and 3 describe the bandwidth optimization for both cases, as presented in [6], and establish their convergence conditions. Some classification experiments are presented in Section 4 to measure the accuracy of the models. Section 5 closes the paper with the most important conclusions.

2 The spherical case

The expression for the kernel function is, for the spherical case:

$$G_{ij}(\sigma^2) = G(x_i - x_j \mid \sigma^2) = (2\pi)^{-D/2} \sigma^{-D} \exp\left(-\frac{1}{2\sigma^2} \|x_i - x_j\|^2\right)$$

We want to find the $\sigma$ that maximizes the log-likelihood $\log L(X \mid \sigma^2) = \sum_i \log \hat{p}_\theta(x_i)$. The derivative of this likelihood is:

$$\nabla_\sigma \log L(X \mid \sigma^2) = \frac{1}{N-1} \sum_i \frac{1}{\hat{p}(x_i)} \sum_{j \neq i} \left( \frac{\|x_i - x_j\|^2}{\sigma^3} - \frac{D}{\sigma} \right) G_{ij}(\sigma^2)$$

We now search for the point that makes the derivative null:

$$\sum_i \frac{1}{\hat{p}(x_i)} \sum_{j \neq i} \frac{\|x_i - x_j\|^2}{\sigma^3} G_{ij}(\sigma^2) = \sum_i \frac{1}{\hat{p}(x_i)} \frac{D}{\sigma} \sum_{j \neq i} G_{ij}(\sigma^2) = \frac{N(N-1)D}{\sigma}$$

The second equality follows from the fact that, by definition, $\sum_{j \neq i} G_{ij} = (N-1)\hat{p}(x_i)$. Then we obtain the following fixed-point algorithm:

$$\sigma^2_{t+1} = \frac{1}{N(N-1)D} \sum_i \frac{1}{\hat{p}_t(x_i)} \sum_{j \neq i} \|x_i - x_j\|^2 \, G_{ij}(\sigma_t^2) \qquad (3)$$

where $\hat{p}_t$ denotes the KDE obtained in iteration $t$, i.e. the one that makes use of the width $\sigma_t^2$.

We prove the convergence of the algorithm in (3) by means of the following convergence theorem:
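
As a practical illustration of the update in (3), the following is a minimal sketch in Python/NumPy for the spherical case: it evaluates the leave-one-out densities of (2) at the current width and then applies the fixed-point step until $\sigma^2$ stabilizes. The function names (`loo_kernel_matrix`, `fixed_point_sigma2`), the initial value $\sigma^2_0 = 1$, the tolerance, and the iteration cap are illustrative assumptions, not taken from the paper.

```python
# Sketch of the fixed-point bandwidth update of Eq. (3), spherical kernel C = sigma^2 I_D.
import numpy as np


def loo_kernel_matrix(sq_dists, sigma2, D):
    """Spherical Gaussian kernel values G_ij(sigma^2), with the diagonal
    (j = i) zeroed out so that sums over j exclude the point itself."""
    G = (2.0 * np.pi * sigma2) ** (-D / 2.0) * np.exp(-sq_dists / (2.0 * sigma2))
    np.fill_diagonal(G, 0.0)
    return G


def fixed_point_sigma2(X, sigma2_init=1.0, tol=1e-8, max_iter=200):
    """Iterate the fixed-point update of Eq. (3) until sigma^2 stabilizes."""
    N, D = X.shape
    # Pairwise squared distances ||x_i - x_j||^2, computed once and reused.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigma2 = sigma2_init
    for _ in range(max_iter):
        G = loo_kernel_matrix(sq_dists, sigma2, D)
        p_hat = G.sum(axis=1) / (N - 1)          # leave-one-out KDE, Eq. (2)
        # Eq. (3): sum_i (1/p_t(x_i)) sum_{j!=i} ||x_i - x_j||^2 G_ij, over N(N-1)D
        sigma2_new = np.sum((sq_dists * G).sum(axis=1) / p_hat) / (N * (N - 1) * D)
        if abs(sigma2_new - sigma2) <= tol * sigma2:
            return sigma2_new
        sigma2 = sigma2_new
    return sigma2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                # toy 2-D sample
    print("estimated bandwidth sigma^2:", fixed_point_sigma2(X))
```

Each iteration re-evaluates the leave-one-out densities $\hat{p}_t(x_i)$ at the current width, as (3) requires; with the pairwise distances precomputed, the per-iteration cost is dominated by the $O(N^2)$ kernel matrix.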
