anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
population and to produce more generations. The accuracy on the testing set,
even after thousands of generations, remained very low. Similar results were
obtained for the Multiplexers-20 dataset.

The above set of experiments was repeated on the much more difficult XOR-10
dataset. The advantage of IIDT over the other methods was even more evident.
While IIDT was able to reach an accuracy of 100%, bagging-ID3, skewing, and
GATree performed as poorly as a random guesser, with an accuracy of only 50%.

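As background for the chance-level results above, the following minimal sketch illustrates a well-known property of parity concepts. It assumes that XOR-k denotes a target equal to the parity of k binary attributes, which this passage does not spell out; the actual XOR-10 dataset may contain additional attributes. Under that assumption, and over the complete truth table, no single attribute yields any information gain, so a greedy gain-based split has nothing to distinguish the attributes by. The function names `entropy`, `information_gain`, and `parity_dataset` are hypothetical helpers written for this illustration and are not taken from the thesis.

```python
from itertools import product
from math import log2


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(rows, labels, attr):
    """Information gain of splitting on one binary attribute."""
    remainder = 0.0
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def parity_dataset(k):
    """Full truth table over k bits, labeled by parity (assumed XOR-k concept)."""
    rows = list(product((0, 1), repeat=k))
    labels = [sum(r) % 2 for r in rows]
    return rows, labels


rows, labels = parity_dataset(10)
gains = [information_gain(rows, labels, a) for a in range(10)]
print(gains)  # every single-attribute gain is zero on the full truth table
```

On a finite training sample the gains would not be exactly zero, but they would carry no systematic signal about which attributes are relevant.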
The next experiment was with the Tic-tac-toe dataset. In this case, as shown
in Figure 3.35, both ensemble-based methods have a significant advantage over
the single tree inducers. We speculate that this is because ensemble methods
were able to overcome the quick-fragmentation problem associated with multiway
splits by combining several classifiers. We are still looking for ways to verify
this hypothesis. Bagging-ID3 outperforms the other methods until the fifth second,
where bagging-LSID3 overtakes it slightly. In contrast to the XOR-5 domain,
building larger committees is worthwhile in this case, even at the expense of less
accurate base classifiers. However, if the time allocation permits, large ensembles
of LSID3 trees are shown to be the most accurate. We believe that the general
question of the tradeoff between the resources allocated for each tree and the number
of trees forming the ensemble should be addressed by further research with extensive
experiments on various datasets. The performance of generalized skewing
and IIDT was similar in this case, with a slight advantage for skewing in terms
of accuracy and an advantage for IIDT in terms of tree size. GATree was run on
the dataset for 150 generations (30 seconds). The average accuracy was 76.42%,
much lower than that of the other methods.
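
The tradeoff between the accuracy of each tree and the number of trees in the committee can be illustrated with a small calculation. The sketch below assumes a binary task and committee members whose errors are independent, an idealization that bagged trees trained on overlapping samples do not satisfy. The committee sizes and member accuracies are hypothetical values chosen for illustration and are not measurements from these experiments; `majority_vote_accuracy` is likewise a helper introduced here.

```python
from math import comb


def majority_vote_accuracy(n_members, member_accuracy):
    """Probability that a strict majority of independent voters is correct.

    Assumes a binary task, an odd committee size, and independent errors.
    """
    if n_members % 2 == 0:
        raise ValueError("use an odd committee size to avoid ties")
    needed = n_members // 2 + 1
    return sum(
        comb(n_members, k)
        * member_accuracy ** k
        * (1 - member_accuracy) ** (n_members - k)
        for k in range(needed, n_members + 1)
    )


# Hypothetical comparison: few accurate members versus many weaker members.
small_strong = majority_vote_accuracy(5, 0.80)
large_weak = majority_vote_accuracy(25, 0.70)
print(f"5 members at 0.80:  {small_strong:.4f}")
print(f"25 members at 0.70: {large_weak:.4f}")
```

Under these assumptions the larger committee of weaker members comes out ahead, which is consistent in spirit with the observation that larger committees can pay off on Tic-tac-toe. Correlated errors among real trees would shrink this advantage, so the calculation is only a rough intuition.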