A Tutorial on Support Vector Machines for Pattern Recognition

if and only if, for any $g(x)$ such that

$$\int g(x)^2 \, dx \;\;\text{is finite} \qquad (66)$$

then

$$\int K(x, y)\, g(x)\, g(y)\, dx\, dy \ge 0. \qquad (67)$$

Note that for specific cases, it may not be easy to check whether Mercer's condition is satisfied. Eq. (67) must hold for every $g$ with finite $L_2$ norm (i.e. which satisfies Eq. (66)). However, we can easily prove that the condition is satisfied for positive integral powers of the dot product: $K(x, y) = (x \cdot y)^p$. We must show that

$$\int \Big( \sum_{i=1}^{d} x_i y_i \Big)^{p} g(x)\, g(y)\, dx\, dy \ge 0. \qquad (68)$$

The typical term in the multinomial expansion of $(\sum_{i=1}^{d} x_i y_i)^p$ contributes a term of the form

$$\frac{p!}{r_1!\, r_2! \cdots (p - r_1 - r_2 - \cdots)!} \int x_1^{r_1} x_2^{r_2} \cdots y_1^{r_1} y_2^{r_2} \cdots\, g(x)\, g(y)\, dx\, dy \qquad (69)$$

to the left hand side of Eq. (67), which factorizes:

$$= \frac{p!}{r_1!\, r_2! \cdots (p - r_1 - r_2 - \cdots)!} \left( \int x_1^{r_1} x_2^{r_2} \cdots g(x)\, dx \right)^{2} \ge 0. \qquad (70)$$

One simple consequence is that any kernel which can be expressed as $K(x, y) = \sum_{p=0}^{\infty} c_p (x \cdot y)^p$, where the $c_p$ are positive real coefficients and the series is uniformly convergent, satisfies Mercer's condition, a fact also noted in (Smola, Scholkopf and Muller, 1998b).

Finally, what happens if one uses a kernel which does not satisfy Mercer's condition? In general, there may exist data such that the Hessian is indefinite, and for which the quadratic programming problem will have no solution (the dual objective function can become arbitrarily large). However, even for kernels that do not satisfy Mercer's condition, one might still find that a given training set results in a positive semidefinite Hessian, in which case the training will converge perfectly well. In this case, however, the geometrical interpretation described above is lacking.

4.2. Some Notes on $\Phi$ and $\mathcal{H}$

Mercer's condition tells us whether or not a prospective kernel is actually a dot product in some space, but it does not tell us how to construct $\Phi$, or even what $\mathcal{H}$ is. However, as with the homogeneous (that is, homogeneous in the dot product in $\mathcal{L}$) quadratic polynomial kernel discussed above, we can explicitly construct the mapping for some kernels. In Section 6.1 we show how Eq. (62) can be extended to arbitrary homogeneous polynomial kernels, and that the corresponding space $\mathcal{H}$ is a Euclidean space of dimension $\binom{d+p-1}{p}$. Thus, for example, for a degree $p = 4$ polynomial, and for data consisting of 16 by 16 images ($d = 256$), $\dim(\mathcal{H})$ is 183,181,376.

Usually, mapping your data to a "feature space" with an enormous number of dimensions would bode ill for the generalization performance of the resulting machine. After all, the
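As an illustration of the two points above, the following minimal sketch (not from the paper; all function names and parameter values are illustrative assumptions) numerically checks whether a candidate kernel yields a positive semidefinite Gram matrix on a particular training set, and evaluates the feature-space dimension $\binom{d+p-1}{p}$ quoted in the text.

```python
# Sketch only: empirical PSD check of a kernel's Gram matrix on a given
# training set, plus the dimension binom(d+p-1, p) of the feature space
# induced by a degree-p homogeneous polynomial kernel.
import numpy as np
from math import comb

def gram_matrix(kernel, X):
    """Gram matrix K[i, j] = kernel(x_i, x_j) for the rows x_i of X."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

def is_positive_semidefinite(K, tol=1e-10):
    """True if all eigenvalues of the symmetric matrix K are >= -tol."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# Homogeneous polynomial kernel K(x, y) = (x . y)^p: satisfies Mercer's
# condition for positive integer p, so its Gram matrix is always PSD.
poly_kernel = lambda x, y, p=2: float(np.dot(x, y)) ** p

# A kernel that need not satisfy Mercer's condition for all parameter
# choices (illustrative values); on a particular training set its Gram
# matrix may nevertheless turn out to be PSD.
sigmoid_kernel = lambda x, y, a=1.0, b=-1.0: float(np.tanh(a * np.dot(x, y) + b))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
print(is_positive_semidefinite(gram_matrix(poly_kernel, X)))     # True
print(is_positive_semidefinite(gram_matrix(sigmoid_kernel, X)))  # depends on the data

# Feature-space dimension for the degree-p homogeneous polynomial kernel.
print(comb(256 + 4 - 1, 4))  # 183181376, matching the d = 256, p = 4 figure above
```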
