
anytime algorithms for learning anytime classifiers saher ... - Technion


Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

population and to produce more generations. The accuracy on the testing set, even after thousands of generations, remained very low. Similar results were obtained for the Multiplexers-20 dataset.

The above set of experiments was repeated on the much more difficult XOR-10 dataset. The advantage of IIDT over the other methods was even more evident. While IIDT was able to reach accuracy of 100%, bagging-ID3, skewing, and GATree performed as poorly as a random guesser, with accuracy of only 50%.
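One reason parity concepts such as XOR are hard for greedy inducers can be illustrated with a small calculation: on a complete XOR truth table, no single attribute carries any information gain, so a one-step lookahead split criterion cannot tell relevant attributes from irrelevant ones. The sketch below is only an illustration under stated assumptions. The helper names (`entropy`, `information_gain`, `xor_truth_table`) are hypothetical, and the toy table (4 relevant and 2 irrelevant binary attributes, fully enumerated) is an assumption made for brevity, not the XOR-10 dataset used in the experiments.

```python
from itertools import product
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        h -= p * log2(p)
    return h


def information_gain(rows, labels, attr):
    """Information gain of splitting on the binary attribute at index attr."""
    remainder = 0.0
    for value in (0, 1):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def xor_truth_table(n_relevant, n_irrelevant):
    """Hypothetical toy data: every combination of binary attributes,
    labelled by the parity (XOR) of the first n_relevant attributes."""
    rows = list(product((0, 1), repeat=n_relevant + n_irrelevant))
    labels = [sum(row[:n_relevant]) % 2 for row in rows]
    return rows, labels


# Assumed toy setting: 4 relevant and 2 irrelevant attributes.
rows, labels = xor_truth_table(4, 2)
gains = [information_gain(rows, labels, a) for a in range(6)]
print(gains)
```

On this toy table every attribute, relevant or not, splits the data into two halves that are each perfectly balanced between the two classes, so a greedy criterion sees no reason to prefer one attribute over another. On a finite random sample the gains would be small but not exactly zero.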

The next experiment was with the Tic-tac-toe dataset. In this case, as shown in Figure 3.35, both ensemble-based methods have a significant advantage over the single-tree inducers. We speculate that this is because ensemble methods were able to overcome the quick-fragmentation problem associated with multiway splits by combining several classifiers. We are still looking for ways to verify this hypothesis. Bagging-ID3 outperforms the other methods until the fifth second, where bagging-LSID3 overtakes it slightly. In contrast to the XOR-5 domain, building larger committees is worthwhile in this case, even at the expense of less accurate base classifiers. However, if the time allocation permits, large ensembles of LSID3 trees are shown to be the most accurate. We believe that the general question of the tradeoff between the resources allocated for each tree and the number of trees forming the ensemble should be addressed by further research with extensive experiments on various datasets. The performance of generalized skewing and IIDT was similar in this case, with a slight advantage for skewing in terms of accuracy and an advantage for IIDT in terms of tree size. GATree was run on the dataset for 150 generations (30 seconds). The average accuracy was 76.42%, much lower than that of the other methods.
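The observation that a larger committee can pay off even with weaker base classifiers can be illustrated with the standard majority-vote calculation. The sketch below assumes that committee members err independently and all have the same accuracy `p`. This is a strong simplification that bagged trees trained on overlapping samples generally do not satisfy. The function name `majority_vote_accuracy` and the example values (0.7, 0.8, 11 voters) are hypothetical and are not taken from the experiments.

```python
from math import comb


def majority_vote_accuracy(p, n_voters):
    """Probability that a majority of n_voters independent binary classifiers,
    each correct with probability p, votes for the correct class.
    Assumes an odd committee size so that ties cannot occur."""
    if n_voters % 2 == 0:
        raise ValueError("n_voters must be odd to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Illustrative comparison under the independence assumption:
# one stronger classifier versus a committee of eleven weaker ones.
single_strong = majority_vote_accuracy(0.8, 1)
committee_of_weak = majority_vote_accuracy(0.7, 11)
print(single_strong, committee_of_weak)
```

Under these assumptions the committee of eleven weaker voters is more accurate than the single stronger classifier. When errors are correlated the benefit shrinks, which is consistent with the text's point that the allocation between per-tree effort and committee size has to be settled empirically.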
