A Tutorial on Support Vector Machines for Pattern Recognition

Hence

\[
\|w\|^2 \;=\; \sum_{i,j=1}^{n+1} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
\;=\; \Lambda^T H \Lambda
\;=\; \sum_{i=1}^{n+1} \alpha_i
\;=\; \frac{n}{R^2(n+1)} \sum_{i=1}^{n+1} \left( 1 - \frac{y_i \, p}{n+1} \right)
\;=\; \frac{n}{R^2} \left( 1 - \left( \frac{p}{n+1} \right)^2 \right)
\qquad (37)
\]

Note that this is one of those cases where the Lagrange multiplier can remain undetermined (although determining it is trivial). We have now solved the problem, since all the α_i are clearly positive or zero (in fact the α_i will only be zero if all training points have the same class). Note that ‖w‖ depends only on the number of positive (negative) polarity points, and not on how the class labels are assigned to the points in Eq. (22). This is clearly not true of w itself, which is given by

\[
w \;=\; \frac{n}{R^2(n+1)} \sum_{i=1}^{n+1} \left( y_i - \frac{p}{n+1} \right) x_i
\qquad (38)
\]

The margin, M = 2/‖w‖, is thus given by

\[
M \;=\; \frac{2R}{\sqrt{\,n \left( 1 - \left( p/(n+1) \right)^2 \right)}}.
\qquad (39)
\]

Thus when the number of points n+1 is even, the minimum margin occurs when p = 0 (equal numbers of positive and negative examples), in which case the margin is M_min = 2R/√n. If n+1 is odd, the minimum margin occurs when p = 1, in which case M_min = 2R(n+1)/(n√(n+2)). In both cases, the maximum margin is given by M_max = R(n+1)/n. Thus, for example, for the two-dimensional simplex consisting of three points lying on S^1 (and spanning R^2), with a labeling such that not all three points have the same polarity, the maximum and minimum margin are both 3R/2 (see Figure 12).

Note that the results of this Section amount to an alternative, constructive proof that the VC dimension of oriented separating hyperplanes in R^n is at least n+1.

3.4. Test Phase

Once we have trained a Support Vector Machine, how can we use it? We simply determine on which side of the decision boundary (that hyperplane lying halfway between H_1 and H_2 and parallel to them) a given test pattern x lies and assign the corresponding class label, i.e. we take the class of x to be sgn(w · x + b).

3.5. The Non-Separable Case

The above algorithm for separable data, when applied to non-separable data, will find no feasible solution: this will be evidenced by the objective function (i.e. the dual Lagrangian) growing arbitrarily large. So how can we extend these ideas to handle non-separable data? We would like to relax the constraints (10) and (11), but only when necessary; that is, we would like to introduce a further cost (i.e. an increase in the primal objective function) for doing so. This can be done by introducing positive slack variables ξ_i, i = 1, …, l, in the constraints (Cortes and Vapnik, 1995), which then become:

\[
x_i \cdot w + b \;\ge\; +1 - \xi_i \quad \text{for } y_i = +1
\qquad (40)
\]
\[
x_i \cdot w + b \;\le\; -1 + \xi_i \quad \text{for } y_i = -1
\qquad (41)
\]
\[
\xi_i \;\ge\; 0 \quad \forall i.
\qquad (42)
\]
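
The simplex example above (Eqs. (37)–(39)) can be checked numerically. The sketch below, which assumes NumPy and scikit-learn are available, builds the n+1 vertices of a symmetric simplex at radius R, trains a linear SVM with a very large C to approximate the hard-margin solution, and compares the resulting margin 2/‖w‖ with Eq. (39). The simplex_vertices helper and the choice n = 4 are purely illustrative and are not part of the tutorial.

```python
import numpy as np
from sklearn.svm import SVC

def simplex_vertices(n, R=1.0):
    """n+1 vertices of a symmetric simplex, all at distance R from the origin.

    Built by centering the standard basis of R^(n+1); the vertices then span an
    n-dimensional subspace, which is all a linear SVM ever sees.
    """
    E = np.eye(n + 1)
    V = E - E.mean(axis=0)             # centered: every row sums to zero
    V *= R / np.linalg.norm(V[0])      # rescale each vertex to radius R
    return V

n, R = 4, 1.0
X = simplex_vertices(n, R)
y = np.array([+1, +1, +1, -1, -1])     # three positive, two negative labels
p = y.sum()                            # p = (#positive) - (#negative) = +1

# Hard-margin SVM approximated by a very large penalty C.
clf = SVC(kernel="linear", C=1e10).fit(X, y)
w = clf.coef_[0]

margin_svm = 2.0 / np.linalg.norm(w)                         # M = 2 / ||w||
margin_eq39 = 2 * R / np.sqrt(n * (1 - (p / (n + 1)) ** 2))  # Eq. (39)
print(margin_svm, margin_eq39)         # the two values should agree closely
```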

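The test phase of Section 3.4 amounts to a single dot product per pattern. A minimal sketch, assuming NumPy; the function name is illustrative:

```python
import numpy as np

def classify(w, b, x):
    """Class of a test pattern x: the side of the decision boundary it lies on,
    i.e. sgn(w . x + b); a point exactly on the boundary returns 0."""
    return int(np.sign(np.dot(w, x) + b))
```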
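
The non-separable case of Section 3.5 can likewise be sketched with scikit-learn, where a finite C plays the role of the cost paid for the slacks ξ_i. The synthetic overlapping data and the choice C = 1.0 below are illustrative only; at the solution each slack can be read off as ξ_i = max(0, 1 − y_i(w · x_i + b)).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.5, size=(50, 2)),   # two overlapping clouds:
               rng.normal(+1.0, 1.5, size=(50, 2))])  # not linearly separable
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)           # finite C tolerates slack
w, b = clf.coef_[0], clf.intercept_[0]

# Slack of each point at the solution: xi_i = max(0, 1 - y_i (w . x_i + b)).
# xi_i = 0 outside the margin, 0 < xi_i <= 1 inside it, and xi_i > 1 when the
# point falls on the wrong side of the decision boundary.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print("relaxed constraints:", int((xi > 0).sum()),
      "misclassified points:", int((xi > 1).sum()))
```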