A Tutorial on Support Vector Machines for Pattern Recognition

8. Limitations

Perhaps the biggest limitation of the support vector approach lies in choice of the kernel. Once the kernel is fixed, SVM classifiers have only one user-chosen parameter (the error penalty), but the kernel is a very big rug under which to sweep parameters. Some work has been done on limiting kernels using prior knowledge (Scholkopf et al., 1998a; Burges, 1998), but the best choice of kernel for a given problem is still a research issue.

A second limitation is speed and size, both in training and testing. While the speed problem in test phase is largely solved in (Burges, 1996), this still requires two training passes. Training for very large datasets (millions of support vectors) is an unsolved problem. Discrete data presents another problem, although with suitable rescaling excellent results have nevertheless been obtained (Joachims, 1997). Finally, although some work has been done on training a multiclass SVM in one step, the optimal design for multiclass SVM classifiers is a further area for research.
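To make the point about the user-chosen parameters concrete, here is a minimal sketch, assuming scikit-learn (which is not part of this tutorial), of how the error penalty C and the kernel's own hidden parameters (here, the RBF width gamma) are typically chosen together by cross-validation. The data and parameter grid are illustrative, not from the text.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Illustrative nonlinear concept: points inside vs. outside the unit circle.
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

# Once the kernel family is fixed, C (the error penalty) is the one SVM
# parameter; the kernel itself hides more (here gamma) -- the "rug" under
# which further parameters get swept.
grid = {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.1, 1, 10]}
search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```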
9. Extensions

We very briefly describe two of the simplest, and most effective, methods for improving the performance of SVMs.

The virtual support vector method (Scholkopf, Burges and Vapnik, 1996; Burges and Scholkopf, 1997) attempts to incorporate known invariances of the problem (for example, translation invariance for the image recognition problem) by first training a system, then creating new data by distorting the resulting support vectors (translating them, in the case mentioned), and finally training a new system on the distorted (and the undistorted) data. The idea is easy to implement and seems to work better than other methods for incorporating invariances proposed so far.

The reduced set method (Burges, 1996; Burges and Scholkopf, 1997) was introduced to address the speed of support vector machines in test phase, and also starts with a trained SVM. The idea is to replace the sum in Eq. (46) by a similar sum, where instead of support vectors, computed vectors (which are not elements of the training set) are used, and instead of the α_i, a different set of weights are computed. The number of parameters is chosen beforehand to give the desired speedup. The resulting vector is still a vector in H, and the parameters are found by minimizing the Euclidean norm of the difference between the original vector w and the approximation to it. The same technique could be used for SVM regression to find much more efficient function representations (which could be used, for example, in data compression).

Combining these two methods gave a factor of 50 speedup (while the error rate increased from 1.0% to 1.1%) on the NIST digits (Burges and Scholkopf, 1997).
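The train/distort/retrain loop of the virtual support vector method can be sketched as follows. This is a toy illustration assuming scikit-learn, with a synthetic 2D problem whose classes are (by construction) invariant to shifts along the x axis; the shift sizes and data are my assumptions, not from the text.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Toy problem: the label depends only on the second coordinate, so the
# concept is invariant to translations along the x axis.
X = rng.normal(size=(300, 2))
y = (X[:, 1] > 0).astype(int)

# Step 1: train an initial SVM.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Step 2: distort the resulting support vectors with the known invariance
# (small x-translations) to create "virtual" support vectors.
sv, sv_y = X[clf.support_], y[clf.support_]
shifts = [-0.1, 0.1]
X_virtual = np.vstack([sv + np.array([s, 0.0]) for s in shifts])
y_virtual = np.concatenate([sv_y] * len(shifts))

# Step 3: retrain on the union of the original and the virtual data.
clf2 = SVC(kernel="rbf", C=1.0).fit(np.vstack([X, X_virtual]),
                                    np.concatenate([y, y_virtual]))
print("support vectors before/after:", len(clf.support_), len(clf2.support_))
```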
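The norm minimization behind the reduced set method can be written entirely in kernel evaluations: with w = Σ_i a_i Φ(x_i) (a_i = α_i y_i) and the approximation w' = Σ_j β_j Φ(z_j), the objective ||w - w'||² = aᵀK_xx a - 2 aᵀK_xz β + βᵀK_zz β. The sketch below, in plain numpy, uses a simplification I am assuming for brevity: the reduced vectors z_j are held fixed (a random subset of the original vectors), so the optimal β has the closed form β = K_zz⁻¹ K_zx a; the actual method of Burges and Scholkopf also optimizes the z_j themselves. The data and coefficients are synthetic stand-ins for a trained SVM.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, gamma=0.5):
    # Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Stand-ins for a trained SVM's support vectors x_i and coefficients a_i.
X = rng.normal(size=(200, 2))
a = rng.normal(size=200)

# Reduced set: m << 200 fixed vectors z_j; solve the least-squares problem
# for the weights beta minimizing ||w - w'||^2 in the feature space H.
m = 20
Z = X[rng.choice(200, m, replace=False)]
beta = np.linalg.solve(rbf(Z, Z), rbf(Z, X) @ a)

# Expand the residual norm ||w - w'||^2 in kernel evaluations.
full = a @ rbf(X, X) @ a
resid = full - 2 * a @ rbf(X, Z) @ beta + beta @ rbf(Z, Z) @ beta
print(f"relative approximation error: {resid / full:.3f}")
```

Because β = 0 already attains ||w - w'||² = ||w||², the optimized residual can never exceed the full norm, and the m chosen in advance fixes the test-phase speedup, as in the text.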
10. Conclusions

SVMs provide a new approach to the problem of pattern recognition (together with regression estimation and linear operator inversion) with clear connections to the underlying statistical learning theory. They differ radically from comparable approaches such as neural networks: SVM training always finds a global minimum, and their simple geometric interpretation provides fertile ground for further investigation. An SVM is largely characterized by the choice of its kernel, and SVMs thus link the problems they are designed for with a large body of existing work on kernel based methods. I hope that this tutorial will encourage some to explore SVMs for themselves.
