CRANFIELD UNIVERSITY Eleni Anthippi Chatzimichali ...
4.3.3 Permutation Tests
Even though thorough model validation and evaluation methods have been applied to
ensure that the performance metrics are representative of real-world application,
“accuracy estimates are usually meaningless without a confidence interval” (Kohavi,
1995; Brereton, 2006; Harrington, 2006).
To provide an indication of the statistical significance of the results,
permutation tests were applied. The background of permutation testing was
presented in Sections 1.7 and 2.2.4. By randomising the data with respect to the
sensory scores (classes), any prior association between the data and the classes
is destroyed, while the initial distributional properties of the data are preserved
(Wu et al., 2002; Westerhuis et al., 2008). Repeating the permutation a large
number of times yields a reference distribution under the null hypothesis. The 95%
confidence interval (C.I.), approximated as two standard deviations either side of
the mean, is calculated from the distribution of permuted classification results. If
the observed non-permuted value lies above the upper 95% confidence bound, the
initial result is considered significant. Metrics such as the p-value are also
frequently reported in permutation testing; the p-value is equal to the proportion of
permuted values that are at least as good as the observed statistic (Hubert and
Schultz, 1976).
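The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names `permutation_test` and `centroid_accuracy`, the toy one-dimensional data and the fixed seed are all hypothetical, and a simple nearest-centroid rule stands in for the classification ensemble. The p-value counts the observed value as one member of the reference distribution, which is an assumption consistent with treating it as another random case.

```python
import random
import statistics


def centroid_accuracy(features, labels):
    """Toy score: accuracy of a 1-D nearest-centroid rule fitted on the same data."""
    class0 = [x for x, y in zip(features, labels) if y == 0]
    class1 = [x for x, y in zip(features, labels) if y == 1]
    c0, c1 = statistics.mean(class0), statistics.mean(class1)
    predictions = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in features]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def permutation_test(features, labels, score_fn, n_permutations=99, seed=0):
    """Return the observed score, null scores, approximate 95% bounds and p-value."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)

    null_scores = []
    for _ in range(n_permutations):
        shuffled = labels[:]      # copy so the original labels stay intact
        rng.shuffle(shuffled)     # destroys the feature/class association
        null_scores.append(score_fn(features, shuffled))

    # Approximate 95% interval: mean of the null scores +/- two standard deviations.
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    bounds = (mean - 2 * sd, mean + 2 * sd)

    # Assumption: the observed value is counted as one more random case.
    n_as_good = sum(s >= observed for s in null_scores)
    p_value = (n_as_good + 1) / (n_permutations + 1)
    return observed, null_scores, bounds, p_value


# Hypothetical, well-separated two-class data.
features = [0.1, 0.2, 0.3, 0.4, 0.5, 1.5, 1.6, 1.7, 1.8, 1.9]
labels = [0] * 5 + [1] * 5
observed, null_scores, (lower, upper), p_value = permutation_test(
    features, labels, centroid_accuracy
)
print(f"observed={observed:.2f}, 95% bounds=({lower:.2f}, {upper:.2f}), p={p_value:.2f}")
```

Shuffling only the class labels keeps the class sizes and the feature values unchanged, so the distributional properties of the data are preserved while any link to the classes is removed.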
In the context of this work, each permutation constitutes a single classification
ensemble, which consists of 100 individual classifiers; each of these classifiers
includes 100 bootstrapping iterations for the purposes of hyperparameter
optimisation. The permutation tests were executed a total of 100 times for each
dataset under study, which results in a total of one million iterations per dataset.
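The count of one million follows directly from the three nested levels described above. The symbol N_total is introduced here only for illustration:

```latex
\[
N_{\text{total}}
  = \underbrace{100}_{\text{permutations}}
    \times \underbrace{100}_{\text{classifiers per ensemble}}
    \times \underbrace{100}_{\text{bootstrap iterations per classifier}}
  = 10^{6}
\]
```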
Under the null hypothesis, the original non-permuted value is considered another
random case. Thus, only 99 actual permutations are required in addition to the
observed value, leading to 100 permutations in total; for this number of
iterations, the lowest possible p-value is 1/100 = 0.01.
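As a worked illustration of this lower bound, assume that the observed value is counted among the random cases. The notation is introduced here only for the example: B is the number of actual permutations, s_b the score of permutation b, and s_obs the observed score. The smallest p-value occurs when no permuted score reaches the observed one:

```latex
\[
p = \frac{1 + \#\{\, b : s_b \ge s_{\text{obs}} \,\}}{B + 1},
\qquad
p_{\min} = \frac{1 + 0}{99 + 1} = 0.01
\]
```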
Finally, all the permuted samples were drawn as a separate step prior to analysis to
ensure that the outcome of randomisation is not biased in any way.
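One way to draw all permutations up front is sketched below. How the permuted samples were actually generated is not stated here, so this is a hypothetical illustration: the function name `draw_permutations`, the seed value and the example labels are all assumptions. Fixing the seed and generating every shuffled label set before any model is trained keeps the randomisation independent of the analysis and makes it reproducible.

```python
import random


def draw_permutations(labels, n_permutations, seed):
    """Generate every shuffled label set before any analysis is run."""
    rng = random.Random(seed)          # dedicated generator, isolated from other code
    permuted_sets = []
    for _ in range(n_permutations):
        shuffled = list(labels)        # copy so the original labels are untouched
        rng.shuffle(shuffled)
        permuted_sets.append(shuffled)
    return permuted_sets


# Hypothetical class labels; 99 permutations plus the observed case gives 100 in total.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
permuted_sets = draw_permutations(labels, n_permutations=99, seed=2024)
print(len(permuted_sets), permuted_sets[0])
```

Each stored label set can then be passed to the classification pipeline in turn, so every dataset is analysed against the same fixed collection of permutations.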