25.12.2013 Views

CRANFIELD UNIVERSITY Eleni Anthippi Chatzimichali ...


4.3.2 Classification Results

4.3.2.1 Overall Accuracies (%CC)

The classification results of all the standalone datasets for case study 1 are illustrated in the bar charts of Figure 4-7 as percentages of correctly classified samples (%CC). The overall accuracies of the standalone datasets prior to PCA have been thoroughly described in Section 2.3.2. As PCA is the first step towards data integration, the principal components of each experimental technique were also tested using the multivariate analysis pipeline according to Section 4.2.5.
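As a small illustration of the metric reported in the bar charts, %CC can be computed as the share of samples whose predicted class matches the true class. The sketch below assumes that plain definition; the function name `percent_correctly_classified` and the labels are hypothetical and are not taken from the study, which may additionally average the figure over resampling runs or ensemble members.

```python
def percent_correctly_classified(y_true, y_pred):
    """Return %CC: the percentage of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    # Count the positions where prediction and ground truth agree.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


# Hypothetical labels, for illustration only (not data from the study).
y_true = ["A", "A", "B", "B", "C", "C", "C", "A"]
y_pred = ["A", "B", "B", "B", "C", "C", "A", "A"]

cc = percent_correctly_classified(y_true, y_pred)
print(f"%CC = {cc:.1f}")  # prints "%CC = 75.0" (6 of 8 correct)
```

Under this definition, a gain of five percentage points corresponds to five more correctly classified samples per hundred.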

Based on the bar charts of Figure 4-7, the overall accuracy of FTIR is clearly enhanced when the raw data are subjected to PCA; in particular, the results of both PLS-DA and SVMs increase by approximately 5%. Even so, as in the case of raw FTIR, the ensemble of nonlinear (RBF) SVMs performs worse than the linear classifiers. To ensure that the low accuracy of the RBF models is not due to overfitting and/or the newly incorporated Box complex approximation algorithm, the result was verified by re-running the grid search approach of Section 2.2.4. Indeed, the nonlinear boundaries of the RBF kernel proved to be too complex to correctly classify the simple FTIR data; linear separation clearly gives better results for this dataset. Xu et al. (2006) report that chemometric algorithms such as PLS-DA are more efficient when applied to traditional analytical techniques such as spectroscopy, where the data are linear and well understood. In addition, according to Smolinska et al. (2012), a major drawback of kernel-based methods, also commonly encountered in the auto-scaling process, is that useful information on the importance of the variables is permanently lost. This impediment is crucial in the case of FTIR, where the spectral data may contain prominent peaks in significant biochemical absorption regions that may reveal differences between the samples of different classes. Furthermore, according to Section 1.5.2.3, an RBF SVM should be able to perform at least as well as a linear SVM (Boser et al., 1992; Keerthi and Lin, 2003; Chang et al., 2010); thus, we can only assume that the optimisation process was unable to identify suitable combinations of hyperparameters that result in the same
