CRANFIELD UNIVERSITY Eleni Anthippi Chatzimichali ...
4.3.2 Classification Results
4.3.2.1 Overall Accuracies (%CC)
The classification results of all the standalone datasets for case study 1 are illustrated
in the bar charts of Figure 4-7 as percentages of correctly classified samples (%CC).
The overall accuracies of the standalone datasets prior to PCA have been thoroughly
described in Section 2.3.2. As PCA is the first step towards data integration, the
principal components of each experimental technique were also tested using the
multivariate analysis pipeline of Section 4.2.5.
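
As a minimal sketch of the accuracy measure used in the bar charts, %CC can be computed as the number of samples whose predicted class matches the true class, divided by the total number of samples and multiplied by 100. The function name `percent_correctly_classified` and the toy labels below are assumptions made for illustration; they are not taken from the thesis pipeline or its results.

```python
from typing import Hashable, Sequence


def percent_correctly_classified(y_true: Sequence[Hashable],
                                 y_pred: Sequence[Hashable]) -> float:
    """Return %CC: the percentage of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    # Count the positions where the prediction agrees with the true label.
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * n_correct / len(y_true)


# Toy labels for illustration only; these are NOT results from the case study.
true_labels = ["A", "A", "B", "B", "B"]
predicted_labels = ["A", "B", "B", "B", "B"]
cc = percent_correctly_classified(true_labels, predicted_labels)
print(f"%CC = {cc:.1f}")  # 4 of 5 correct -> %CC = 80.0
```

The same function could be applied to the predictions obtained from the raw data and from the principal components, so that the two %CC values are directly comparable.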
Based on the bar charts of Figure 4-7, the overall accuracy of FTIR is clearly
enhanced when the raw data are subjected to PCA; in particular, the results of both
PLS-DA and SVMs increase by approximately 5%. Even so, as in the case of raw
FTIR, the ensemble of nonlinear (RBF) SVMs performs worse than the
linear classifiers. To ensure that the low accuracy of the RBF models is not
due to overfitting and/or the newly incorporated Box complex approximation
algorithm, the result was verified by re-running the grid search approach of
Section 2.2.4. Indeed, the nonlinear boundaries of the RBF kernel proved to be too
complex to correctly classify the simple FTIR data; linear separation clearly gives
better results for this dataset. Xu et al. (2006) report that chemometric algorithms such as
PLS-DA are more efficient when applied to traditional analytical techniques such as
spectroscopy, where the data are linear and well understood. In addition, according to
Smolinska et al. (2012), a major drawback of kernel-based methods, also
commonly encountered in the auto-scaling process, is that useful information on the
importance of the variables is permanently lost. This limitation is critical in the case
of FTIR, where the spectral data may contain prominent peaks in significant
biochemical absorption regions that may reveal differences between the samples of
different classes. Furthermore, as discussed in Section 1.5.2.3, an RBF SVM should be
able to perform at least as well as a linear SVM (Boser et al., 1992; Keerthi and Lin,
2003; Chang et al., 2010); thus, we can only assume that the optimisation process was
unable to identify suitable combinations of hyperparameters that result in the same