CRANFIELD UNIVERSITY Eleni Anthippi Chatzimichali ...

4.3.2 Classification Results
4.3.2.1 Overall Accuracies (%CC)
The classification results of all the standalone datasets for case study 1 are illustrated in the bar charts of Figure 4-7 as percentages of correctly classified samples (%CC). The overall accuracies of the standalone datasets prior to PCA have been described in detail in Section 2.3.2. As PCA is the first step towards data integration, the principal components of each experimental technique were also tested using the multivariate analysis pipeline described in Section 4.2.5.
Based on the bar charts of Figure 4-7, the overall accuracy of FTIR is clearly enhanced when the raw data are subjected to PCA; in particular, the results of both PLS-DA and SVMs increase by approximately 5%. Even so, as with the raw FTIR data, the ensemble of nonlinear (RBF) SVMs performs worse than the linear classifiers. To ensure that the low accuracy of the RBF models was not due to overfitting and/or to the newly incorporated Box complex approximation algorithm, the result was verified by repeating the grid search approach of Section 2.2.4. The repeated search confirmed that the nonlinear boundaries of the RBF kernel are too complex to classify the comparatively simple FTIR data correctly; linear separation clearly gives better results for these data. Xu et al. (2006) report that chemometric algorithms such as PLS-DA are more efficient when applied to data from traditional analytical techniques such as spectroscopy, where the data are linear and well understood. In addition, according to Smolinska et al. (2012), a major drawback of kernel-based methods, also commonly encountered in the auto-scaling process, is that useful information on the importance of the variables is permanently lost. This limitation is particularly important for FTIR, where the spectra may contain prominent peaks in biochemically significant absorption regions that may reveal differences between samples of different classes.
Furthermore, as noted in Section 1.5.2.3, an RBF SVM should be able to perform at least as well as a linear SVM (Boser et al., 1992; Keerthi and Lin, 2003; Chang et al., 2010). We can therefore only assume that the optimisation process was unable to identify combinations of hyperparameters that reproduce the accuracies of the linear models. Possibly, the cost and/or gamma values that generate nearly linear boundaries fell outside the constraints provided to the search.
In the case of HPLC, PCA boosts the overall accuracy of the comparatively simple PLS-DA ensembles, while it decreases the results of both linear and nonlinear SVMs by approximately 5%. Since both types of SVMs produce lower accuracies than PLS-DA, we hypothesise that the way SVMs construct their decision boundaries is the underlying cause of this result. As presented in Section 1.5.2.1, PLS-DA constructs its decision boundaries from all available samples as a whole, whereas SVMs rely solely on the selected support vectors. A thorough investigation of the class predictions may help to test this hypothesis. Finally, the PCA scores from the e-nose dataset perform markedly better than the raw data for all implemented classification models. In particular, the SVM ensembles reach a maximum overall accuracy of 50%, the highest recorded result for this dataset. Despite this improvement, and as with the raw standalone data, the generalisation performance of the e-nose data remains relatively poor: they yield the lowest accuracies among the three experimental techniques.
The classification results of the integrated datasets using GPA and CPCA are displayed in Figure 4-8. For the GPA algorithm, SVMs produce results at least as good as those of PLS-DA in the majority of cases, with the linear SVMs taking the lead among the three classification ensembles. In addition, for the pairwise combination of FTIR with HPLC and for the simultaneous fusion of all three experimental techniques, linear SVMs produce somewhat higher accuracies than those obtained from the standalone principal components. By contrast, the pairwise integration of either FTIR or HPLC with e-nose decreases the overall accuracy.
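GPA, one of the two integration approaches compared above, aligns the score configurations of the different techniques and averages them into a consensus. The following is a minimal, generic GPA sketch in NumPy, not the thesis implementation. It rests on these assumptions:

- Each technique has already been reduced to the same number k of PCA scores for the same n samples.
- Each block is mean-centred and scaled to unit Frobenius norm before alignment.
- Reflections are allowed in the rotation step.
- The function names are hypothetical.

The thesis's own GPA may differ in scaling, convergence criterion or how the consensus is passed on.

```python
import numpy as np


def procrustes_rotation(A, B):
    """Orthogonal R minimising ||A @ R - B||_F (SVD solution; reflections allowed)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def generalised_procrustes(blocks, max_iter=100, tol=1e-12):
    """Align several n x k score matrices to their common mean (the consensus).

    Assumption: each block is centred and scaled to unit Frobenius norm first,
    so that no technique dominates merely because of its overall variance.
    """
    X = [B - B.mean(axis=0) for B in blocks]
    X = [B / np.linalg.norm(B) for B in X]
    consensus = X[0]
    prev_loss = np.inf
    for _ in range(max_iter):
        # Rotation step: align every block to the current consensus.
        X = [B @ procrustes_rotation(B, consensus) for B in X]
        # Consensus step: the mean minimises the summed squared distances.
        consensus = np.mean(X, axis=0)
        loss = sum(np.linalg.norm(B - consensus) ** 2 for B in X)
        if prev_loss - loss < tol:  # stop once the loss no longer decreases
            break
        prev_loss = loss
    return consensus, X, loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical PCA scores (40 samples x 3 PCs) from three techniques.
    scores = [rng.standard_normal((40, 3)) for _ in range(3)]
    consensus, aligned, loss = generalised_procrustes(scores)
    print(consensus.shape, loss)
```

In this sketch, the consensus matrix (n × k) takes the place of the standalone PCA scores as classifier input. Concatenating the aligned blocks instead is an equally plausible variant.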
It can therefore be concluded that the e-nose data clearly dominate the outcome of data integration and classification in this instance. Furthermore, for the majority of integrated datasets, the GPA algorithm produces higher overall accuracies than CPCA when linear and nonlinear SVM classifiers are employed. For the same instances, when PLS-DA is applied as the classification technique, the results between the two data integration techniques
