
level of significance 0.05). Therefore we cannot conclude that there is a significant difference between the two models on this data set.
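The specific test behind the conclusion above is not repeated here; as a minimal, self-contained sketch of one common choice for comparing two classifiers on the same test set, McNemar's test with continuity correction can be computed as follows. The arrays are made-up illustrative data, not the data set of the example.

    import numpy as np
    from scipy.stats import chi2

    def mcnemar_test(y_true, pred_a, pred_b):
        """McNemar's test (continuity-corrected) for two classifiers
        evaluated on the same test set; returns (statistic, p-value)."""
        a_ok = pred_a == y_true
        b_ok = pred_b == y_true
        n01 = int(np.sum(a_ok & ~b_ok))    # A correct, B wrong
        n10 = int(np.sum(~a_ok & b_ok))    # A wrong, B correct
        if n01 + n10 == 0:                 # identical error patterns
            return 0.0, 1.0
        stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
        return stat, chi2.sf(stat, df=1)   # chi-squared, 1 degree of freedom

    y = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0])     # made-up true labels
    pa = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 1])    # classifier A's outputs
    pb = np.array([0, 1, 1, 0, 0, 0, 1, 1, 1, 1])    # classifier B's outputs
    stat, p = mcnemar_test(y, pa, pb)
    print(f"statistic = {stat:.3f}, p = {p:.3f}")

A p-value at or above 0.05 leaves the null hypothesis of equal accuracies standing, which is the situation described above.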

It is intuitively clear that simple models or stable classifiers are less likely to be overtrained than more sophisticated models. However, simple models might not be versatile enough to fit complex classification boundaries. More complex models (e.g., neural networks and prototype-based classifiers) are more flexible but require more system resources and are prone to overtraining. What do “simple” and “complex” mean in this context? The main aspects of complexity can be summarized as [23] (a rough illustrative sketch follows the list):

. training time and training complexity;
. memory requirements (e.g., the number of parameters of the classifier that are needed for its operation); and
. running complexity.
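To make these three aspects tangible, here is a minimal sketch that times training and prediction and counts the parameters stored for operation, for one simple and one more complex classifier. The use of scikit-learn, the synthetic data, and the particular classifiers (NearestCentroid as the “simple” model, MLPClassifier as the “complex” one) are illustrative assumptions, not the methodology of [23].

    import time
    import numpy as np
    from sklearn.neighbors import NearestCentroid
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 16))            # synthetic two-class problem
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    for clf in (NearestCentroid(),
                MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                              random_state=0)):
        t0 = time.perf_counter()
        clf.fit(X, y)                          # training time / training complexity
        t_train = time.perf_counter() - t0

        t0 = time.perf_counter()
        clf.predict(X)                         # running complexity
        t_run = time.perf_counter() - t0

        # memory requirements: number of parameters stored for operation
        if hasattr(clf, "coefs_"):             # MLP: weights and biases
            n_params = (sum(w.size for w in clf.coefs_)
                        + sum(b.size for b in clf.intercepts_))
        else:                                  # NearestCentroid: one centroid per class
            n_params = clf.centroids_.size
        print(f"{type(clf).__name__}: train {t_train:.4f}s, "
              f"predict {t_run:.4f}s, {n_params} parameters")

On a typical run the nearest-centroid model trains and predicts in a fraction of the network's time and stores only two 16-dimensional centroids, while the network stores several hundred weights, illustrating the trade-off described above.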

1.4.4 Experiment Design<br />

When talking about experiment design, I cannot refrain from quoting again and again a masterpiece of advice by George Nagy titled “Candide's practical principles of experimental pattern recognition” [24]:

Comparison of Classification Accuracies

Comparisons against algorithms proposed by others are distasteful and should be avoided. When this is not possible, the following Theorem of Ethical Data Selection may prove useful. Theorem: There exists a set of data for which a candidate algorithm is superior to any given rival algorithm. This set may be constructed by omitting from the test set any pattern which is misclassified by the candidate algorithm.

Replication of Experiments

Since pattern recognition is a mature discipline, the replication of experiments on new data by independent research groups, a fetish in the physical and biological sciences, is unnecessary. Concentrate instead on the accumulation of novel, universally applicable algorithms. Casey's Caution: Do not ever make your experimental data available to others; someone may find an obvious solution that you missed.

Albeit meant to be satirical, the above principles are surprisingly widespread and closely followed! Speaking seriously now, the rest of this section gives some practical tips and recommendations.

Example: Which Is the “Best” Result? Testing should be carried out on previously unseen data. Let D(r) be a classifier with a parameter r such that varying r leads to different training accuracies. To account for this variability, here we use 1000 objects randomly drawn from the Letter data set. The remaining
