11.07.2015 Views

A Tutorial on Support Vector Machines for Pattern Recognition

A Tutorial on Support Vector Machines for Pattern Recognition

A Tutorial on Support Vector Machines for Pattern Recognition

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

9Origin-b|w|wH 1H 2MarginFigure 5.Linear separating hyperplanes <strong>for</strong> the separable case. The support vectors are circled.the c<strong>on</strong>straint equati<strong>on</strong>s are multiplied by positive Lagrange multipliers and subtractedfrom the objective functi<strong>on</strong>, to <strong>for</strong>m the Lagrangian. For equality c<strong>on</strong>straints, the Lagrangemultipliers are unc<strong>on</strong>strained. This gives Lagrangian:L P 1 2 kwk2 ;lXi=1 i y i (x i w + b)+lXi=1 i (13)We must now minimize L P with respect to w b, and simultaneously require that thederivatives of L P with respect to all the i vanish, all subject to the c<strong>on</strong>straints i 0(let's call this particular set of c<strong>on</strong>straints C 1 ). Now this is a c<strong>on</strong>vex quadratic programmingproblem, since the objective functi<strong>on</strong> is itself c<strong>on</strong>vex, and those points which satisfy thec<strong>on</strong>straints also <strong>for</strong>m a c<strong>on</strong>vex set (any linear c<strong>on</strong>straint denes a c<strong>on</strong>vex set, and a set ofN simultaneous linear c<strong>on</strong>straints denes the intersecti<strong>on</strong> of N c<strong>on</strong>vex sets, which isalsoa c<strong>on</strong>vex set). This means that we can equivalently solve the following \dual" problem:maximize L P , subject to the c<strong>on</strong>straints that the gradient ofL P with respect to w and bvanish, and subject also to the c<strong>on</strong>straints that the i 0 (let's call that particular set ofc<strong>on</strong>straints C 2 ). This particular dual <strong>for</strong>mulati<strong>on</strong> of the problem is called the Wolfe dual(Fletcher, 1987). It has the property that the maximum of L P , subject to c<strong>on</strong>straints C 2 ,occurs at the same values of the w, b and , as the minimum of L P , subject to c<strong>on</strong>straintsC 18 .Requiring that the gradient ofL P with respect to w and b vanish give the c<strong>on</strong>diti<strong>on</strong>s:w = X i i y i x i (14)Xi i y i =0: (15)Since these are equality c<strong>on</strong>straints in the dual <strong>for</strong>mulati<strong>on</strong>, we can substitute them intoEq. (13) to giveL D = X i i ; 1 2Xij i j y i y j x i x j (16)

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!