A Tutorial on Support Vector Machines for Pattern Recognition


4. Given the name "test set," perhaps we should also use "train set;" but the hobbyists got there first.

5. We use the term "oriented hyperplane" to emphasize that the mathematical object considered is the pair {H, n}, where H is the set of points which lie in the hyperplane and n is a particular choice for the unit normal. Thus {H, n} and {H, -n} are different oriented hyperplanes.

6. Such a set of m points (which span an m - 1 dimensional subspace of a linear space) are said to be "in general position" (Kolmogorov, 1970). The convex hull of a set of m points in general position defines an m - 1 dimensional simplex, the vertices of which are the points themselves.

7. The derivation of the bound assumes that the empirical risk converges uniformly to the actual risk as the number of training observations increases (Vapnik, 1979). A necessary and sufficient condition for this is that lim_{l -> infinity} H(l)/l = 0, where l is the number of training samples and H(l) is the VC entropy of the set of decision functions (Vapnik, 1979; Vapnik, 1995). For any set of functions with infinite VC dimension, the VC entropy is l log 2: hence for these classifiers the required uniform convergence does not hold, and so neither does the bound.

8. There is a nice geometric interpretation for the dual problem: it is basically finding the two closest points of the convex hulls of the two sets. See (Bennett and Bredensteiner, 1998).

9. One can define the torque to be

    \tau_{\mu_1 \cdots \mu_{n-2}} = \epsilon_{\mu_1 \cdots \mu_n} x_{\mu_{n-1}} F_{\mu_n}    (A.16)

where repeated indices are summed over on the right hand side, and where \epsilon is the totally antisymmetric tensor with \epsilon_{1 \cdots n} = 1.
(Recall that Greek indices are used to denote tensor components.) The sum of torques on the decision sheet is then:

    \sum_i \epsilon_{\mu_1 \cdots \mu_n} s_{i\mu_{n-1}} F_{i\mu_n} = \sum_i \epsilon_{\mu_1 \cdots \mu_n} s_{i\mu_{n-1}} \alpha_i y_i \hat{w}_{\mu_n} = \epsilon_{\mu_1 \cdots \mu_n} w_{\mu_{n-1}} \hat{w}_{\mu_n} = 0    (A.17)

10. In the original formulation (Vapnik, 1979) they were called "extreme vectors."

11. By "decision function" we mean a function f(x) whose sign represents the class assigned to data point x.

12. By "intrinsic dimension" we mean the number of parameters required to specify a point on the manifold.

13. Alternatively one can argue that, given the form of the solution, the possible w must lie in a subspace of dimension l.

14. Work in preparation.

15. Thanks to A. Smola for pointing this out.

16. Many thanks to one of the reviewers for pointing this out.

17. The core quadratic optimizer is about 700 lines of C++. The higher level code (to handle caching of dot products, chunking, IO, etc.) is quite complex and considerably larger.

18. Thanks to L. Kaufman for providing me with these results.

19. Recall that the "ceiling" sign ⌈·⌉ means "smallest integer greater than or equal to." Also, there is a typo in the actual formula given in (Vapnik, 1995), which I have corrected here.

20. Note, for example, that the distance between every pair of vertices of the symmetric simplex is the same: see Eq. (26). However, a rigorous proof is needed, and as far as I know one is lacking.

21. Thanks to J. Shawe-Taylor for pointing this out.

22. V. Vapnik, private communication.
23. There is an alternative bound one might use, namely that corresponding to the set of totally bounded non-negative functions (Equation (3.28) in (Vapnik, 1995)). However, for loss functions taking the value zero or one, and if the empirical risk is zero, this bound is looser than that in Eq. (3) whenever (h(\log(2l/h) + 1) - \log(\eta/4))/l > 1/16, which is the case here.

24. V. Blanz, private communication.

References

M.A. Aizerman, E.M. Braverman, and L.I. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
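As a numerical aside to footnote 23, the condition separating the two bounds is easy to check for concrete values. The sketch below uses illustrative placeholder numbers (VC dimension h = 100, sample size l = 1000, confidence parameter η = 0.05) that are not taken from the paper.

```python
import math

def bound_condition_lhs(h, l, eta):
    """Left-hand side of the footnote-23 condition
    (h*(log(2l/h) + 1) - log(eta/4)) / l, to be compared with 1/16."""
    return (h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l

# Illustrative values only (not from the paper).
lhs = bound_condition_lhs(h=100, l=1000, eta=0.05)
print(lhs > 1 / 16)  # True here: the alternative bound is looser
```

For these values the left-hand side is roughly 0.40, well above 1/16 = 0.0625, consistent with the footnote's claim that the totally-bounded-functions bound is the looser one in this regime.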
