A Tutorial on Support Vector Machines for Pattern Recognition

Hence

\[
\|w\|^2 \;=\; \sum_{i,j=1}^{n+1} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
\;=\; \Lambda^T H \Lambda
\;=\; \sum_{i=1}^{n+1} \alpha_i
\;=\; \frac{n}{R^2(n+1)} \sum_{i=1}^{n+1} \left( 1 - \frac{y_i \, p}{n+1} \right)
\;=\; \frac{n}{R^2} \left( 1 - \left( \frac{p}{n+1} \right)^2 \right)
\qquad (37)
\]

Note that this is one of those cases where the Lagrange multiplier can remain undetermined (although determining it is trivial). We have now solved the problem, since all the α_i are clearly positive or zero (in fact the α_i will only be zero if all training points have the same class). Note that ‖w‖ depends only on the number of positive (negative) polarity points, and not on how the class labels are assigned to the points in Eq. (22). This is clearly not true of w itself, which is given by

\[
w \;=\; \frac{n}{R^2(n+1)} \sum_{i=1}^{n+1} \left( y_i - \frac{p}{n+1} \right) x_i
\qquad (38)
\]

The margin, M = 2/‖w‖, is thus given by

\[
M \;=\; \frac{2R}{\sqrt{\,n \left( 1 - \left( p/(n+1) \right)^2 \right)}}.
\qquad (39)
\]

Thus when the number of points n+1 is even, the minimum margin occurs when p = 0 (equal numbers of positive and negative examples), in which case the margin is M_min = 2R/√n. If n+1 is odd, the minimum margin occurs when p = 1, in which case M_min = 2R(n+1)/(n√(n+2)). In both cases, the maximum margin is given by M_max = R(n+1)/n. Thus, for example, for the two-dimensional simplex consisting of three points lying on S^1 (and spanning R^2), with a labeling such that not all three points have the same polarity, the maximum and minimum margin are both 3R/2 (see Figure 12).

Note that the results of this Section amount to an alternative, constructive proof that the VC dimension of oriented separating hyperplanes in R^n is at least n+1.

3.4. Test Phase

Once we have trained a Support Vector Machine, how can we use it? We simply determine on which side of the decision boundary (that hyperplane lying halfway between H_1 and H_2 and parallel to them) a given test pattern x lies and assign the corresponding class label, i.e. we take the class of x to be sgn(w · x + b).

3.5. The Non-Separable Case

The above algorithm for separable data, when applied to non-separable data, will find no feasible solution: this will be evidenced by the objective function (i.e. the dual Lagrangian) growing arbitrarily large. So how can we extend these ideas to handle non-separable data? We would like to relax the constraints (10) and (11), but only when necessary; that is, we would like to introduce a further cost (i.e. an increase in the primal objective function) for doing so. This can be done by introducing positive slack variables ξ_i, i = 1, …, l, in the constraints (Cortes and Vapnik, 1995), which then become:

\[
x_i \cdot w + b \;\ge\; +1 - \xi_i \quad \text{for } y_i = +1
\qquad (40)
\]
\[
x_i \cdot w + b \;\le\; -1 + \xi_i \quad \text{for } y_i = -1
\qquad (41)
\]
\[
\xi_i \;\ge\; 0 \quad \forall i.
\qquad (42)
\]
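
The simplex example above (Eqs. (37)–(39)) can be checked numerically. The sketch below, which assumes NumPy and scikit-learn are available, builds the n+1 vertices of a symmetric simplex at radius R, trains a linear SVM with a very large C to approximate the hard-margin solution, and compares the resulting margin 2/‖w‖ with Eq. (39). The simplex_vertices helper and the choice n = 4 are purely illustrative and are not part of the tutorial.

```python
import numpy as np
from sklearn.svm import SVC

def simplex_vertices(n, R=1.0):
    """n+1 vertices of a symmetric simplex, all at distance R from the origin.

    Built by centering the standard basis of R^(n+1); the vertices then span an
    n-dimensional subspace, which is all a linear SVM ever sees.
    """
    E = np.eye(n + 1)
    V = E - E.mean(axis=0)             # centered: every row sums to zero
    V *= R / np.linalg.norm(V[0])      # rescale each vertex to radius R
    return V

n, R = 4, 1.0
X = simplex_vertices(n, R)
y = np.array([+1, +1, +1, -1, -1])     # three positive, two negative labels
p = y.sum()                            # p = (#positive) - (#negative) = +1

# Hard-margin SVM approximated by a very large penalty C.
clf = SVC(kernel="linear", C=1e10).fit(X, y)
w = clf.coef_[0]

margin_svm = 2.0 / np.linalg.norm(w)                         # M = 2 / ||w||
margin_eq39 = 2 * R / np.sqrt(n * (1 - (p / (n + 1)) ** 2))  # Eq. (39)
print(margin_svm, margin_eq39)         # the two values should agree closely
```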

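The test phase of Section 3.4 amounts to a single dot product per pattern. A minimal sketch, assuming NumPy; the function name is illustrative:

```python
import numpy as np

def classify(w, b, x):
    """Class of a test pattern x: the side of the decision boundary it lies on,
    i.e. sgn(w . x + b); a point exactly on the boundary returns 0."""
    return int(np.sign(np.dot(w, x) + b))
```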
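
The non-separable case of Section 3.5 can likewise be sketched with scikit-learn, where a finite C plays the role of the cost paid for the slacks ξ_i. The synthetic overlapping data and the choice C = 1.0 below are illustrative only; at the solution each slack can be read off as ξ_i = max(0, 1 − y_i(w · x_i + b)).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.5, size=(50, 2)),   # two overlapping clouds:
               rng.normal(+1.0, 1.5, size=(50, 2))])  # not linearly separable
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)           # finite C tolerates slack
w, b = clf.coef_[0], clf.intercept_[0]

# Slack of each point at the solution: xi_i = max(0, 1 - y_i (w . x_i + b)).
# xi_i = 0 outside the margin, 0 < xi_i <= 1 inside it, and xi_i > 1 when the
# point falls on the wrong side of the decision boundary.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print("relaxed constraints:", int((xi > 0).sum()),
      "misclassified points:", int((xi > 1).sum()))
```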