11.07.2015 Views

A Tutorial on Support Vector Machines for Pattern Recognition

A Tutorial on Support Vector Machines for Pattern Recognition

A Tutorial on Support Vector Machines for Pattern Recognition

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

39from the set can be found which assigns those labels to the points. The VC dimensi<strong>on</strong> of thatset of functi<strong>on</strong>s is then dened as the maximum number of points that can be so shattered.However, c<strong>on</strong>sider a slightly dierent deniti<strong>on</strong>. Let a set of points be shattered by a setof functi<strong>on</strong>s if <strong>for</strong> any choice of labels <strong>for</strong> the points, a functi<strong>on</strong> from the set can be foundwhich assigns the incorrect labels to all the points. Again let the VC dimensi<strong>on</strong> of that setof functi<strong>on</strong>s be dened as the maximum number of points that can be so shattered.It is in fact this sec<strong>on</strong>d deniti<strong>on</strong> (which we adopt from here <strong>on</strong>) that enters the VCbound proofs (Vapnik, 1979 Devroye, Gyor and Lugosi, 1996). Of course <strong>for</strong> functi<strong>on</strong>swhose range is f1g (i.e. all data will be assigned either positive or negative class), the twodeniti<strong>on</strong>s are the same. However,ifallpoints falling in some regi<strong>on</strong> are simply deemed tobe \errors", or \correct", the two deniti<strong>on</strong>s are dierent. As a c<strong>on</strong>crete example, supposewe dene \gap intolerant classiers", which are like gap tolerant classiers, but which labelall points lying in the margin or outside the sphere as errors. C<strong>on</strong>sider again the situati<strong>on</strong> inFigure 12, but assign positive class to all three points. Then a gap intolerant classier withmargin width greater than the ball diameter cannot shatter the points if we use the rstdeniti<strong>on</strong> of \shatter", but can shatter the points if we use the sec<strong>on</strong>d (correct) deniti<strong>on</strong>.With this caveat in mind, we now outline how theVC bounds can apply to functi<strong>on</strong>s withrange f1 0g, where the label 0 means that the point islabeled \correct." (The boundswill also apply to functi<strong>on</strong>s where 0 is dened to mean \error", but the corresp<strong>on</strong>ding VCdimensi<strong>on</strong> will be higher, weakening the bound, and in our case, making it useless). We willfollow the notati<strong>on</strong> of (Devroye, Gyor and Lugosi, 1996).C<strong>on</strong>sider points x 2 R d , and let p(x) denote a density <strong>on</strong>R d . Let be a functi<strong>on</strong> <strong>on</strong> R dwith range f1 0g, and let be a set of such functi<strong>on</strong>s. Let each x have an associatedlabel y x 2f1g. Let fx 1 x n g be any nite number of points in R d : then we require to have the property that there exists at least <strong>on</strong>e 2 such that (x i ) 2f1g 8x i . Forgiven , dene the set of points A byA = fx : y x =1(x) =;1g[fx : y x = ;1(x) =1g(A.9)We require that the be such that all sets A are measurable. Let A denote the set of allA.Deniti<strong>on</strong>: Let x i i = 1 n be n points. We dene the empirical risk <strong>for</strong> the setfx i g to be n (fx i g) =(1=n)nXi=1Ix i 2A:(A.10)where I is the indicator functi<strong>on</strong>. Note that the empirical risk is zero if (x i )=08 x i .Deniti<strong>on</strong>: We dene the actual risk <strong>for</strong> the functi<strong>on</strong> to be() =P (x 2 A):(A.11)Note also that those points x <strong>for</strong> which (x) = 0 do not c<strong>on</strong>tribute to the actual risk.Deniti<strong>on</strong>: For xed (x 1 x n ) 2 R d ,letN A be the number of dierent setsinffx 1 x n g\A : A 2Ag(A.12)

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!