A Tutorial on Support Vector Machines for Pattern Recognition


Proof: If the minimal embedding space has dimension $d_H$, then $d_H$ points in the image of $\mathcal{L}$ under the mapping $\Phi$ can be found whose position vectors in $\mathcal{H}$ are linearly independent. From Theorem 1, these vectors can be shattered by hyperplanes in $\mathcal{H}$. Thus by either restricting ourselves to SVMs for the separable case (Section 3.1), or for which the error penalty $C$ is allowed to take all values (so that, if the points are linearly separable, a $C$ can be found such that the solution does indeed separate them), the family of support vector machines with kernel $K$ can also shatter these points, and hence has VC dimension $d_H + 1$. Let's look at two examples.

6.1. The VC Dimension for Polynomial Kernels

Consider an SVM with homogeneous polynomial kernel, acting on data in $\mathbf{R}^{d_L}$:

$$K(\mathbf{x}_1, \mathbf{x}_2) = (\mathbf{x}_1 \cdot \mathbf{x}_2)^p, \quad \mathbf{x}_1, \mathbf{x}_2 \in \mathbf{R}^{d_L} \quad (78)$$

As in the case when $d_L = 2$ and the kernel is quadratic (Section 4), one can explicitly construct the map $\Phi$. Letting $z_i = x_{1i} x_{2i}$, so that $K(\mathbf{x}_1, \mathbf{x}_2) = (z_1 + \cdots + z_{d_L})^p$, we see that each dimension of $\mathcal{H}$ corresponds to a term with given powers of the $z_i$ in the expansion of $K$. In fact if we choose to label the components of $\Phi(\mathbf{x})$ in this manner, we can explicitly write the mapping for any $p$ and $d_L$:

$$\Phi_{r_1 r_2 \cdots r_{d_L}}(\mathbf{x}) = \sqrt{\frac{p!}{r_1! \, r_2! \cdots r_{d_L}!}} \; x_1^{r_1} x_2^{r_2} \cdots x_{d_L}^{r_{d_L}}, \quad \sum_{i=1}^{d_L} r_i = p, \quad r_i \ge 0 \quad (79)$$

This leads to

Theorem 4: If the space in which the data live has dimension $d_L$ (i.e. $\mathcal{L} = \mathbf{R}^{d_L}$), the dimension of the minimal embedding space, for homogeneous polynomial kernels of degree $p$ ($K(\mathbf{x}_1, \mathbf{x}_2) = (\mathbf{x}_1 \cdot \mathbf{x}_2)^p$, $\mathbf{x}_1, \mathbf{x}_2 \in \mathbf{R}^{d_L}$), is $\binom{d_L + p - 1}{p}$.

(The proof is in the Appendix.) Thus the VC dimension of SVMs with these kernels is $\binom{d_L + p - 1}{p} + 1$. As noted above, this gets very large very quickly.

6.2. The VC Dimension for Radial Basis Function Kernels

Theorem 5: Consider the class of Mercer kernels for which $K(\mathbf{x}_1, \mathbf{x}_2) \to 0$ as $\|\mathbf{x}_1 - \mathbf{x}_2\| \to \infty$, and for which $K(\mathbf{x}, \mathbf{x})$ is $O(1)$, and assume that the data can be chosen arbitrarily from $\mathbf{R}^d$. Then the family of classifiers consisting of support vector machines using these kernels, and for which the error penalty is allowed to take all values, has infinite VC dimension.

Proof: The kernel matrix, $K_{ij} \equiv K(\mathbf{x}_i, \mathbf{x}_j)$, is a Gram matrix (a matrix of dot products: see (Horn, 1985)) in $\mathcal{H}$. Clearly we can choose training data such that all off-diagonal elements $K_{i \ne j}$ can be made arbitrarily small, and by assumption all diagonal elements $K_{i = j}$ are of $O(1)$. The matrix $K$ is then of full rank; hence the set of vectors whose dot products in $\mathcal{H}$ form $K$ are linearly independent (Horn, 1985); hence, by Theorem 1, the points can be shattered by hyperplanes in $\mathcal{H}$, and hence also by support vector machines with sufficiently large error penalty. Since this is true for any finite number of points, the VC dimension of these classifiers is infinite.
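To make Eq. (79) and Theorem 4 concrete, the following minimal Python sketch (not part of the paper; the function `phi` and all parameter values are illustrative) enumerates the multi-indices of Eq. (79), checks numerically that $\Phi(\mathbf{x}_1) \cdot \Phi(\mathbf{x}_2) = (\mathbf{x}_1 \cdot \mathbf{x}_2)^p$, and confirms that the number of components of $\Phi(\mathbf{x})$ equals $\binom{d_L + p - 1}{p}$.

```python
# Sketch: explicit feature map for the homogeneous polynomial kernel (Eqs. 78-79).
import itertools
import math

import numpy as np


def phi(x, p):
    """Explicit map Phi for K(x1, x2) = (x1 . x2)^p (Eq. 79).

    One component per multi-index (r_1, ..., r_dL) with sum r_i = p,
    weighted by sqrt(p! / (r_1! ... r_dL!)).
    """
    d_L = len(x)
    feats = []
    for r in itertools.product(range(p + 1), repeat=d_L):
        if sum(r) != p:
            continue
        coef = math.sqrt(math.factorial(p) / math.prod(math.factorial(ri) for ri in r))
        feats.append(coef * math.prod(xi ** ri for xi, ri in zip(x, r)))
    return np.array(feats)


d_L, p = 3, 4
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=d_L), rng.normal(size=d_L)

kernel_value = float(x1 @ x2) ** p               # Eq. (78)
explicit_value = float(phi(x1, p) @ phi(x2, p))  # Phi(x1) . Phi(x2)
dim_H = math.comb(d_L + p - 1, p)                # Theorem 4

print(kernel_value, explicit_value)  # agree up to rounding error
print(len(phi(x1, p)), dim_H)        # both equal C(d_L + p - 1, p) = 15
```

Even for this small example ($d_L = 3$, $p = 4$) the embedding space already has 15 dimensions, illustrating how quickly the VC dimension grows with $p$ and $d_L$.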
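The mechanism behind Theorem 5 can likewise be checked numerically. The sketch below (an assumption-laden illustration, not the paper's construction) uses a Gaussian RBF kernel with an arbitrarily chosen width and spreads the training points far apart relative to that width, so the off-diagonal Gram entries become negligible while the diagonal stays at $O(1)$; the Gram matrix is then (numerically) full rank, i.e. the points are linearly independent in $\mathcal{H}$.

```python
# Sketch: well-separated points give a full-rank RBF Gram matrix (cf. Theorem 5).
import numpy as np


def rbf_gram(X, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


n, d = 20, 2
# Place the points so that ||x_i - x_j|| >> sigma for all i != j.
X = 10.0 * np.arange(n)[:, None] * np.ones((1, d))

K = rbf_gram(X, sigma=1.0)
print(np.allclose(np.diag(K), 1.0))     # K(x, x) is O(1)
print(K[~np.eye(n, dtype=bool)].max())  # off-diagonal entries are ~ 0
print(np.linalg.matrix_rank(K))         # full rank: n
```

Since the same construction works for any number of points $n$, arbitrarily large point sets can be shattered, which is exactly why the VC dimension of this family of classifiers is infinite.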
