A Tutorial on Support Vector Machines for Pattern Recognition

4.3. Some Examples of Nonlinear SVMs

The first kernels investigated for the pattern recognition problem were the following:

$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p$   (74)

$K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2}$   (75)

$K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)$   (76)

Eq. (74) results in a classifier that is a polynomial of degree $p$ in the data; Eq. (75) gives a Gaussian radial basis function classifier; and Eq. (76) gives a particular kind of two-layer sigmoidal neural network. For the RBF case, the number of centers ($N_S$ in Eq. (61)), the centers themselves (the $\mathbf{s}_i$), the weights ($\alpha_i$), and the threshold ($b$) are all produced automatically by the SVM training, and give excellent results compared to classical RBFs, for the case of Gaussian RBFs (Schölkopf et al, 1997). For the neural network case, the first layer consists of $N_S$ sets of weights, each set consisting of $d_L$ (the dimension of the data) weights, and the second layer consists of $N_S$ weights (the $\alpha_i$), so that an evaluation simply requires taking a weighted sum of sigmoids, themselves evaluated on dot products of the test data with the support vectors. Thus for the neural network case, the architecture (number of weights) is determined by SVM training; a sketch of this evaluation appears at the end of the section.

Note, however, that the hyperbolic tangent kernel only satisfies Mercer's condition for certain values of the parameters $\kappa$ and $\delta$ (and of the data $\|\mathbf{x}\|^2$). This was first noticed experimentally (Vapnik, 1995); however, some necessary conditions on these parameters for positivity are now known.¹⁴

Figure 9 shows results for the same pattern recognition problem as that shown in Figure 7, but where the kernel was chosen to be a cubic polynomial. Notice that, even though the number of degrees of freedom is higher, for the linearly separable case (left panel) the solution is roughly linear, indicating that the capacity is being controlled, and that the linearly non-separable case (right panel) has become separable.

Figure 9. Degree 3 polynomial kernel. The background colour shows the shape of the decision surface.

Finally, note that although the SVM classifiers described above are binary classifiers, they are easily combined to handle the multiclass case. A simple, effective combination trains one classifier per class ("one versus rest") and assigns a test point to the class whose classifier reports the largest positive distance; a sketch of this scheme closes the section.
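To make the kernels of Eqs. (74)–(76) and the Mercer caveat concrete, the following is a minimal NumPy sketch (not from the paper; all function and parameter names are illustrative). It evaluates the three kernels and numerically probes the tanh kernel's Gram matrix: Mercer's condition requires every Gram matrix to be positive semi-definite, so a single negative eigenvalue on sampled data refutes it for those parameter values, while non-negative eigenvalues on one sample prove nothing.

```python
import numpy as np

# The three kernels of Eqs. (74)-(76); arguments are 1-D NumPy arrays.
def poly_kernel(x, y, p=3):
    return (np.dot(x, y) + 1.0) ** p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def tanh_kernel(x, y, kappa=1.0, delta=1.0):
    return np.tanh(kappa * np.dot(x, y) - delta)

def gram_matrix(kernel, X, **params):
    """Gram matrix K_ij = K(x_i, x_j) for the rows x_i of X."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j], **params)
    return K

# Probe positive semi-definiteness of the tanh-kernel Gram matrix on a
# random sample (illustrative parameter values, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
eigvals = np.linalg.eigvalsh(gram_matrix(tanh_kernel, X, kappa=1.0, delta=1.0))
# A negative value here refutes Mercer's condition for these parameters/data.
print("smallest eigenvalue:", eigvals.min())
```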
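The "two-layer network" reading of Eq. (76) can also be written out directly. The sketch below (assumed variable names, not the paper's code) evaluates the decision function of Eq. (61), $f(\mathbf{x}) = \sum_{i=1}^{N_S} \alpha_i y_i K(\mathbf{s}_i, \mathbf{x}) + b$, with the tanh kernel: the support vectors form the first layer of $N_S \times d_L$ weights, and the $\alpha_i y_i$ form the second layer.

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b, kappa=1.0, delta=1.0):
    """Eq. (61) with the tanh kernel of Eq. (76):
    f(x) = sum_i alpha_i * y_i * tanh(kappa * s_i . x - delta) + b.

    support_vectors is the N_S x d_L matrix whose rows are the s_i (the
    first "layer": one set of d_L weights per hidden unit); alphas * labels
    gives the N_S second-layer weights.  N_S itself is fixed by SVM training.
    """
    hidden = np.tanh(kappa * support_vectors @ x - delta)  # N_S sigmoid units
    return np.dot(alphas * labels, hidden) + b
```

The predicted class for a test point is then simply the sign of the returned value.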
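Finally, a sketch of the one-versus-rest combination described above, using scikit-learn's SVC as the binary machine purely for illustration (the paper predates this library; the kernel parameters are set so that SVC's polynomial kernel matches Eq. (74) with $p = 3$, and the iris data stand in for any multiclass problem). Each class gets its own binary SVM, and a test point is assigned to the class whose machine reports the largest signed distance from its hyperplane.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# One-versus-rest: train one binary SVM per class ("this class" = +1,
# "rest" = -1), then assign a point to the class with the largest
# signed distance from its separating hyperplane.
X, y = datasets.load_iris(return_X_y=True)
classes = np.unique(y)
machines = []
for c in classes:
    # gamma=1, coef0=1, degree=3 reproduces K(x, y) = (x . y + 1)^3, Eq. (74)
    clf = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0)
    clf.fit(X, np.where(y == c, 1, -1))
    machines.append(clf)

# decision_function gives the signed distance f(x); take the largest.
scores = np.column_stack([m.decision_function(X) for m in machines])
predictions = classes[np.argmax(scores, axis=1)]
print("training accuracy:", np.mean(predictions == y))
```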
