25.12.2013 Views

CRANFIELD UNIVERSITY Eleni Anthippi Chatzimichali ...



2.3.2 Classification Results

2.3.2.1 Overall Accuracies (%CC)

The prediction results of all implemented classification models are illustrated in the bar charts of Figure 2-6 as percentages of correctly classified samples (%CC). The first model to be tested on raw standalone data was PLS-DA with LOOCV, which constitutes a single classifier rather than an ensemble of classifiers. According to Section 1.6.3, LOOCV is a nearly unbiased but high-variance technique that may often lead to misleading results (Efron, 1983; Kohavi, 1995; Duan et al., 2003; Glasmachers, 2008); in this instance, since the test set is used for both model construction and optimisation purposes, the results are likely to be over-optimistic. For case study 1, the overall accuracies of FTIR, HPLC and e-nose are equal to 63%, 84% and 59% respectively. Indeed, when the aforementioned percentages are compared to the accuracies of the other classification models of Figure 2-6, they appear to be overly optimistic.
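As an illustration of how a LOOCV-based %CC figure is obtained, the following minimal sketch leaves each sample out in turn, trains on the remainder, and reports the percentage of correctly classified held-out samples. It is a sketch under stated assumptions only: a simple nearest-centroid rule stands in for PLS-DA, and the function names and toy data are hypothetical, not taken from the case studies.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def percent_correctly_classified(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """%CC: percentage of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def nearest_centroid_predict(train: List[Sample], x: Sequence[float]) -> str:
    """Stand-in classifier (NOT PLS-DA): assign x to the class with the closest mean."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_percent_cc(data: List[Sample]) -> float:
    """Leave-one-out CV: each sample is predicted by a model trained on all the others."""
    y_true, y_pred = [], []
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the i-th
        y_true.append(label)
        y_pred.append(nearest_centroid_predict(train, x))
    return percent_correctly_classified(y_true, y_pred)


# Hypothetical toy data: two classes, with the last "B" sample lying inside class "A".
toy = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"),
    ((3.0, 3.1), "B"), ((2.9, 3.3), "B"), ((1.2, 1.1), "B"),
]

if __name__ == "__main__":
    print(f"LOOCV %CC = {loocv_percent_cc(toy):.1f}")
```

Note that this sketch performs no hyperparameter tuning. If the same left-out samples were also used to select model settings, the reported %CC would be expected to be optimistic, which is the concern raised above.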

In the case of the classification ensembles, HPLC clearly demonstrates the highest overall accuracies among the three instrumental techniques. The HPLC data appear to generate higher %CC in the case of SVMs, especially for linear SVMs, compared to PLS-DA. Even though PLS-DA and linear SVMs both construct linear decision boundaries, the difference in accuracies can possibly be explained by the fact that support vector machines conduct linear classification based on a different approach from PLS-DA. As stated in Brereton et al. (2009), the SVM boundary depends solely on the selected support vectors, while the remaining samples have no influence over it. By contrast, methods such as PLS-DA use all available samples in order to determine the separating planes between the classes (Xu et al., 2006). Finally, since the results of both linear and nonlinear (RBF) SVM ensembles approximate 80%, it can be inferred that the boundaries of the RBF SVMs may have been nearly linear, but still retain a wide margin (Brereton et al., 2009); as stated in Section 1.5.2.3, “with a suitable combination of hyperparameters, the testing accuracy of the RBF kernel is at least as good as the linear kernel” (Boser et al., 1992; Keerthi and Lin, 2003; Hsu et al., 2003; Chang et al., 2010).
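The contrast between a boundary defined by support vectors and one defined by all samples can be illustrated with a one-dimensional toy case. The sketch below is an assumption-laden illustration, not the models used in this work: for linearly separable 1-D data the hard-margin boundary is the midpoint between the two closest opposite-class samples, while a class-mean midpoint stands in for methods that use every sample (it is not PLS-DA). All names and values are hypothetical.

```python
from typing import Sequence


def max_margin_threshold(neg: Sequence[float], pos: Sequence[float]) -> float:
    """1-D hard-margin boundary: midpoint between the two closest opposite-class samples.

    Assumes separable classes with every `neg` value below every `pos` value.
    """
    sv_neg, sv_pos = max(neg), min(pos)  # the two support vectors
    if sv_neg >= sv_pos:
        raise ValueError("classes are not separable in this toy setting")
    return (sv_neg + sv_pos) / 2.0


def centroid_threshold(neg: Sequence[float], pos: Sequence[float]) -> float:
    """Boundary built from all samples: midpoint between the two class means."""
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


# Hypothetical data: move one sample that lies far from the boundary.
neg = [1.0, 2.0, 3.0]
pos = [5.0, 6.0, 7.0]
moved_pos = [5.0, 6.0, 13.0]

before = (max_margin_threshold(neg, pos), centroid_threshold(neg, pos))
after = (max_margin_threshold(neg, moved_pos), centroid_threshold(neg, moved_pos))

if __name__ == "__main__":
    print("max-margin boundary before/after:", before[0], after[0])
    print("all-sample boundary before/after:", before[1], after[1])
```

In this toy case, moving a sample that is not a support vector leaves the max-margin boundary unchanged, whereas the all-sample boundary shifts. This is one plausible reason why two linear classifiers can report different accuracies on the same data.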

