A Tutorial on Support Vector Machines for Pattern Recognition

K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / 2\sigma^2}.   (60)

In this particular example, H is infinite dimensional, so it would not be very easy to work with \Phi explicitly. However, if one replaces x_i \cdot x_j by K(x_i, x_j) everywhere in the training algorithm, the algorithm will happily produce a support vector machine which lives in an infinite dimensional space, and furthermore do so in roughly the same amount of time it would take to train on the un-mapped data. All the considerations of the previous sections hold, since we are still doing a linear separation, but in a different space.

But how can we use this machine? After all, we need w, and that will live in H also (see Eq. (46)). But in the test phase an SVM is used by computing dot products of a given test point x with w, or more specifically by computing the sign of

f(x) = \sum_{i=1}^{N_S} \alpha_i y_i \Phi(s_i) \cdot \Phi(x) + b = \sum_{i=1}^{N_S} \alpha_i y_i K(s_i, x) + b,   (61)

where the s_i are the support vectors. So again we can avoid computing \Phi(x) explicitly and use K(s_i, x) = \Phi(s_i) \cdot \Phi(x) instead.

Let us call the space in which the data live L. (Here and below we use L as a mnemonic for "low dimensional", and H for "high dimensional": it is usually the case that the range of \Phi is of much higher dimension than its domain.) Note that, in addition to the fact that w lives in H, there will in general be no vector in L which maps, via the map \Phi, to w. If there were, f(x) in Eq. (61) could be computed in one step, avoiding the sum (and making the corresponding SVM N_S times faster, where N_S is the number of support vectors). Despite this, ideas along these lines can be used to significantly speed up the test phase of SVMs (Burges, 1996). Note also that it is easy to find kernels (for example, kernels which are functions of the dot products of the x_i in L) such that the training algorithm and solution found are independent of the dimension of both L and H.

In the next Section we will discuss which functions K are allowable and which are not. Let us end this Section with a very simple example of an allowed kernel, for which we can construct the mapping \Phi.

Suppose that your data are vectors in R^2, and you choose K(x_i, x_j) = (x_i \cdot x_j)^2. Then it's easy to find a space H, and a mapping \Phi from R^2 to H, such that (x \cdot y)^2 = \Phi(x) \cdot \Phi(y): we choose H = R^3 and

\Phi(x) = \begin{pmatrix} x_1^2 \\ \sqrt{2}\, x_1 x_2 \\ x_2^2 \end{pmatrix}   (62)

(note that here the subscripts refer to vector components). For data in L defined on the square [-1, 1] \times [-1, 1] \subset R^2 (a typical situation, for grey level image data), the (entire) image of \Phi is shown in Figure 8. This Figure also illustrates how to think of this mapping: the image of \Phi may live in a space of very high dimension, but it is just a (possibly very contorted) surface whose intrinsic dimension^12 is just that of L.

Note that neither the mapping \Phi nor the space H are unique for a given kernel. We could equally well have chosen H to again be R^3 and

\Phi(x) = \frac{1}{\sqrt{2}} \begin{pmatrix} x_1^2 - x_2^2 \\ 2 x_1 x_2 \\ x_1^2 + x_2^2 \end{pmatrix}.   (63)

Both choices are verified numerically in the short sketches that follow.
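To make the test-phase computation of Eq. (61) concrete, here is a minimal Python sketch using the Gaussian kernel of Eq. (60). This is not code from the paper: the function names and the toy values of `alphas`, `labels`, and `support_vectors` are illustrative assumptions standing in for the quantities a trained SVM would provide.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Eq. (60): K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, alphas, labels, b):
    # Eq. (61): f(x) = sum_i alpha_i y_i K(s_i, x) + b.
    # The predicted class is sign(f(x)); Phi(x) is never formed explicitly,
    # even though the separating hyperplane lives in the infinite-dimensional H.
    f = sum(a * y * gaussian_kernel(s, x)
            for a, y, s in zip(alphas, labels, support_vectors))
    return f + b

# Toy values (illustrative, not from the paper): two support vectors in R^2.
support_vectors = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
alphas = [0.5, 0.5]   # Lagrange multipliers alpha_i
labels = [+1, -1]     # class labels y_i
b = 0.0
x_test = np.array([0.2, 0.8])
print(np.sign(svm_decision(x_test, support_vectors, alphas, labels, b)))
```

Note how the cost of evaluating f(x) scales with the number of support vectors N_S, which is exactly why the one-step shortcut discussed above (had a preimage of w existed in L) would give an N_S-fold speedup.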
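Similarly, one can check numerically that the two maps of Eqs. (62) and (63) both reproduce the kernel K(x, y) = (x \cdot y)^2, illustrating that \Phi and H are not unique. Again a hedged sketch: the names `phi_62` and `phi_63` are hypothetical, chosen only to match the equation numbers.

```python
import numpy as np

def phi_62(x):
    # Eq. (62): Phi(x) = (x_1^2, sqrt(2) x_1 x_2, x_2^2)
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def phi_63(x):
    # Eq. (63): Phi(x) = (1/sqrt(2)) (x_1^2 - x_2^2, 2 x_1 x_2, x_1^2 + x_2^2)
    return np.array([x[0] ** 2 - x[1] ** 2,
                     2.0 * x[0] * x[1],
                     x[0] ** 2 + x[1] ** 2]) / np.sqrt(2.0)

x = np.array([0.3, -0.7])
y = np.array([0.5, 0.2])
k = np.dot(x, y) ** 2  # K(x, y) = (x . y)^2, computed directly in L = R^2
assert np.isclose(k, np.dot(phi_62(x), phi_62(y)))  # dot product in H = R^3
assert np.isclose(k, np.dot(phi_63(x), phi_63(y)))  # different map, same kernel
```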
