A Tutorial on Support Vector Machines for Pattern Recognition
$$
y_i(\mathbf{w}_\tau \cdot \mathbf{x}_i + b_\tau)
  = y_i\big((1-\tau)(\mathbf{w}_0 \cdot \mathbf{x}_i + b_0) + \tau(\mathbf{w}_1 \cdot \mathbf{x}_i + b_1)\big)
  \ge (1-\tau)(1-\xi_{0i}) + \tau(1-\xi_{1i}) = 1 - \xi_{\tau i}
\qquad (77)
$$

Although simple, this theorem is quite instructive. For example, one might think that the problems depicted in Figure 10 have several different optimal solutions (for the case of linear support vector machines). However, since one cannot smoothly move the hyperplane from one proposed solution to another without generating hyperplanes which are not solutions, we know that these proposed solutions are in fact not solutions at all. In fact, for each of these cases, the optimal unique solution is at w = 0, with a suitable choice of b (which has the effect of assigning the same label to all the points). Note that this is a perfectly acceptable solution to the classification problem: any proposed hyperplane (with w ≠ 0) will cause the primal objective function to take a higher value.

Figure 10. Two problems, with proposed (incorrect) non-unique solutions.

Finally, note that the fact that SVM training always finds a global solution is in contrast to the case of neural networks, where many local minima usually exist.

5. Methods of Solution

The support vector optimization problem can be solved analytically only when the number of training data is very small, or for the separable case when it is known beforehand which of the training data become support vectors (as in Sections 3.3 and 6.2). Note that this can happen when the problem has some symmetry (Section 3.3), but that it can also happen when it does not (Section 6.2). For the general analytic case, the worst-case computational complexity is of order N_S^3 (inversion of the Hessian), where N_S is the number of support vectors, although the two examples given both have complexity of O(1).

However, in most real-world cases, Equations (43) (with dot products replaced by kernels), (44), and (45) must be solved numerically. For small problems, any general-purpose optimization package that solves linearly constrained convex quadratic programs will do. A good survey of the available solvers, and where to get them, can be found¹⁶ in (Moré and Wright, 1993).

For larger problems, a range of existing techniques can be brought to bear. A full exploration of the relative merits of these methods would fill another tutorial. Here we just describe the general issues and, for concreteness, give a brief explanation of the technique we currently use. Below, a "face" means a set of points lying on the boundary of the feasible region, and an "active constraint" is a constraint for which the equality holds. For more
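To make the "small problems" route above concrete, the following sketch poses the dual of Equations (43)-(45), with a linear kernel, as a linearly constrained convex quadratic program and hands it to a general-purpose QP solver. The choice of the cvxopt package, the toy data set, and the value of C are illustrative assumptions and not part of the tutorial; any solver for this class of problems would serve equally well.

```python
# Hedged sketch: the soft-margin SVM dual (linear kernel) solved as a
# standard QP with the cvxopt package.  Data, C, and the solver choice
# are assumptions for illustration only.
import numpy as np
from cvxopt import matrix, solvers

# A tiny two-dimensional toy problem.
X = np.array([[2.0, 2.0], [2.5, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
C = 10.0  # soft-margin penalty (upper bound on the alphas)

# Dual: maximize sum(alpha) - 1/2 alpha' H alpha
#       subject to 0 <= alpha_i <= C and sum_i alpha_i y_i = 0,
# written below in the solver's form  min 1/2 x'Px + q'x,  Gx <= h,  Ax = b.
K = X @ X.T                                  # Gram matrix of dot products
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(n))
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
A = matrix(y.reshape(1, -1))
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

# Recover the primal solution: w from the expansion over support vectors,
# b from any support vector with 0 < alpha_i < C (a margin vector).
w = (alpha * y) @ X
sv = (alpha > 1e-6) & (alpha < C - 1e-6)
b0 = np.mean(y[sv] - X[sv] @ w)
print("alphas:", np.round(alpha, 4))
print("w:", w, " b:", b0)
```

The recovered w and b can then be checked against the KKT conditions: points with alpha_i = 0 lie outside the margin, points with 0 < alpha_i < C lie exactly on it, and points with alpha_i = C may violate it.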
