
Every kernel is characterised by a set of parameters – the hyperparameters – that have to be optimised for a particular problem (Chapelle and Vapnik, 2000; Xu et al., 2006). The Gaussian Radial Basis Function (RBF) kernel is particularly popular, especially in cases where there is little or no knowledge about the data under study. In RBF SVMs, only one kernel parameter has to be optimised – the value of γ (or, equivalently, σ) – in addition to the regularisation parameter C.
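For concreteness, the Gaussian RBF kernel is commonly written in the following standard form; this parameterisation is an assumption here, though it is consistent with the γ = 1/(2σ²) relation used below:

```latex
% Standard Gaussian RBF kernel (assumed parameterisation, not quoted from
% the thesis): gamma controls the kernel width via gamma = 1/(2 sigma^2).
K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left( -\gamma \, \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 \right),
\qquad \gamma = \frac{1}{2\sigma^2}
```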

The γ value determines the degree of nonlinearity, or width, of the RBF kernel (Boardman and Trappenberg, 2006; Verplancke et al., 2008), and is inversely related to σ², the spread of the data, where γ = 1/(2σ²). Higher values of γ result in greater nonlinearity of the decision boundaries. More specifically, very high values of γ (low values of σ) potentially result in sharp peaks, “spiky” functions and boundaries that surround individual samples, as illustrated in Figure 1-6 (Valentini and Dietterich, 2004; Brereton, 2009). As the γ value decreases, the Gaussians become broader, with smoother surfaces that fit the data quite well. According to Keerthi and Lin (2003), for small values of γ the RBF kernel tends towards a linear boundary (Boser et al., 1992; Hsu et al., 2003). Thus, a linear classifier may be considered a special case of the RBF model, since “with a suitable combination of hyperparameters (C, γ), the testing accuracy of the RBF kernel is at least as good as the linear kernel” (Boser et al., 1992; Keerthi and Lin, 2003; Hsu et al., 2003; Chang et al., 2010).
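As an illustration of this behaviour, the following minimal scikit-learn sketch fits RBF SVMs with increasing γ on a synthetic two-class problem; the data set and parameter values are illustrative assumptions, not taken from the thesis:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Illustrative toy data (not from the thesis): two interleaved half-moons.
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# Fit RBF SVMs with a fixed C and increasingly large gamma values.
for gamma in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
    # Very large gamma: near-perfect training accuracy and many support
    # vectors ("spiky" boundaries around individual samples); very small
    # gamma: a smoother, near-linear boundary.
    print(f"gamma={gamma:>6}: training accuracy={clf.score(X, y):.2f}, "
          f"support vectors={int(clf.n_support_.sum())}")
```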

In addition, as presented in Section 1.5.2.2, the cost parameter C controls the complexity of the SVM boundaries. More specifically, according to Xu et al. (2006), the cost parameter controls the optimal trade-off between the two criteria of Equation 14: maximising the margin and minimising the training error. As C → ∞, the hard-margin case is obtained, and thus less misclassification is tolerated (Brereton, 2009). High values of C force the creation of extremely complex boundaries that misclassify as few training samples as possible, and large values of C may therefore often lead to overfitting (Foody and Mathur, 2004). On the contrary, a lower value of C creates wider margins, allowing instances close to the boundary to be ignored (Ben-Hur et al., 2010). For very low values of C, independent of the γ value, the SVM models are unable to learn, causing a problem of underfitting (Valentini and Dietterich, 2004).
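Because γ and C interact, they are typically tuned jointly by cross-validation; the sketch below shows one conventional way to do this with scikit-learn's GridSearchCV (the grid values and data are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)  # illustrative data

# Logarithmic grids are conventional: very small C underfits regardless of
# gamma, while very large C (approaching the hard-margin case) risks
# overfitting; cross-validation selects the trade-off.
param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "gamma": [0.01, 0.1, 1, 10, 100],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.2f}")
```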

