6.3 EXTENDING LINEAR MODELS

One way of choosing the value of n is to start with 1 (a linear model) and increment it until the estimated error ceases to improve. Usually, quite small values suffice.

Other kernel functions can be used instead to implement different nonlinear mappings. Two that are often suggested are the radial basis function (RBF) kernel and the sigmoid kernel. Which one produces the best results depends on the application, although the differences are rarely large in practice. It is interesting to note that a support vector machine with the RBF kernel is simply a type of neural network called an RBF network (which we describe later), and one with the sigmoid kernel implements another type of neural network, a multilayer perceptron with one hidden layer (also described later).

Throughout this section, we have assumed that the training data is linearly separable, either in the instance space or in the new space spanned by the nonlinear mapping. It turns out that support vector machines can be generalized to the case where the training data is not separable. This is accomplished by placing an upper bound on the preceding coefficients αi. Unfortunately, this parameter must be chosen by the user, and the best setting can only be determined by experimentation. Also, in all but trivial cases, it is not possible to determine a priori whether the data is linearly separable or not.

Finally, we should mention that compared with other methods such as decision tree learners, even the fastest training algorithms for support vector machines are slow when applied in the nonlinear setting. On the other hand, they often produce very accurate classifiers because subtle and complex decision boundaries can be obtained.

Support vector regression

The concept of a maximum margin hyperplane only applies to classification. However, support vector machine algorithms have been developed for numeric prediction that share many of the properties encountered in the classification case: they produce a model that can usually be expressed in terms of a few support vectors and can be applied to nonlinear problems using kernel functions. As with regular support vector machines, we will describe the concepts involved but do not attempt to describe the algorithms that actually perform the work.

As with linear regression, covered in Section 4.6, the basic idea is to find a function that approximates the training points well by minimizing the prediction error. The crucial difference is that all deviations up to a user-specified parameter ε are simply discarded. Also, when minimizing the error, the risk of overfitting is reduced by simultaneously trying to maximize the flatness of the function. Another difference is that what is minimized is normally the predictions' absolute error rather than the squared error used in linear regression.
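To make the kernel discussion concrete, here is a minimal sketch using scikit-learn's SVC rather than any implementation described in this book; the dataset, the range of degrees tried, and the value of C are arbitrary choices for illustration only. It increments the polynomial degree from 1 until the cross-validated error stops improving, then tries the RBF and sigmoid kernels; the parameter C plays the role of the user-chosen upper bound on the coefficients αi for non-separable data.

# Illustrative sketch (assumed setup, not from the book): picking the kernel
# and polynomial degree by estimated error, with C as the soft-margin bound.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Polynomial kernel: start with degree 1 (a linear model) and increment the
# degree until the estimated error ceases to improve.
best_degree, best_error = 1, float("inf")
for degree in range(1, 6):
    model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=degree, C=1.0))
    error = 1.0 - cross_val_score(model, X, y, cv=5).mean()
    print(f"degree {degree}: estimated error {error:.3f}")
    if error < best_error:
        best_degree, best_error = degree, error
    else:
        break  # error stopped improving; keep the previous degree
print("chosen degree:", best_degree)

# Other kernels implement different nonlinear mappings; which one works best
# depends on the application.
for kernel in ("rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    error = 1.0 - cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel} kernel: estimated error {error:.3f}")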
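Similarly, the role of the user-specified parameter ε in support vector regression can be sketched as follows; again the library (scikit-learn's SVR), the synthetic data, and the particular values of ε and C are illustrative assumptions rather than anything prescribed by the text. Deviations smaller than ε are simply ignored, so larger values of ε typically leave fewer support vectors and a flatter fitted function.

# Illustrative sketch (assumed setup, not from the book): epsilon-insensitive
# support vector regression on synthetic data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)

# A larger epsilon discards more small deviations, so fewer training points
# remain as support vectors in the fitted model.
for epsilon in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=epsilon).fit(X, y)
    print(f"epsilon={epsilon}: {len(model.support_)} support vectors")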
