18.11.2012 Views

anytime algorithms for learning anytime classifiers saher ... - Technion

anytime algorithms for learning anytime classifiers saher ... - Technion

anytime algorithms for learning anytime classifiers saher ... - Technion

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Technion</strong> - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008<br />

Table 3.2: Characteristics of the datasets used<br />

Attributes Max<br />

Dataset Instances Nom. (bin.) Num. domain Classes<br />

Automobile - Make 160 10 (4) 15 8 22<br />

Automobile - Sym. 160 10 (4) 15 22 7<br />

Balance Scale 625 4 (0) 0 5 3<br />

Breast Cancer 277 9 (3) 0 13 2<br />

Connect-4 68557 42 (0) 0 3 3<br />

Corral 32 6 (6) 0 2 2<br />

Glass 214 0 (0) 9 - 7<br />

Iris 150 0 (0) 4 - 3<br />

Monks-1 124+432 6 (2) 0 4 2<br />

Monks-2 169+432 6 (2) 0 4 2<br />

Monks-3 122+432 6 (2) 0 4 2<br />

Mushroom 8124 22 (4) 0 12 2<br />

Solar Flare 323 10 (5) 0 7 4<br />

Tic-Tac-Toe 958 9 (0) 0 3 2<br />

Voting 232 16 (16) 0 2 2<br />

Wine 178 0 (0) 13 - 3<br />

Zoo 101 16 (15) 0 6 7<br />

Numeric XOR 3D 200 0 (0) 6 - 2<br />

Numeric XOR 4D 500 0 (0) 8 - 2<br />

Multiplexer-20 615 20 (20) 0 2 2<br />

Multiplex-XOR 200 11 (11) 0 2 2<br />

XOR-5 200 10 (10) 0 2 2<br />

XOR-5 10% noise 200 10 (10) 0 2 2<br />

XOR-10 10000 20 (20) 0 2 2<br />

ones. 10 Several commonly used UCI datasets were included (among the 17) to<br />

allow comparison with results reported in literature. Table 3.2 summarizes the<br />

characteristics of these datasets while Appendix A gives more detailed descriptions.<br />

To compare the per<strong>for</strong>mance of the different <strong>algorithms</strong>, we will consider<br />

two evaluation criteria over decision trees: (1) generalization accuracy, measured<br />

by the ratio of the correctly classified examples in the testing set, and (2) tree<br />

size, measured by the number of non-empty leaves in the tree.<br />

3.7.2 Fixed Time Comparison<br />

Our first set of experiments compares ID3, C4.5 with its default parameters, ID3k(k<br />

= 2), LSID3(r = 5) and LSID3-p(r = 5). We used our own implementation<br />

<strong>for</strong> all <strong>algorithms</strong>, where the results of C4.5 were validated with WEKA’s implementation<br />

(Witten & Frank, 2005). We first discuss the results <strong>for</strong> the consistent<br />

trees and continue by analyzing the findings when pruning is applied.<br />

10 The artificial datasets are available at http://www.cs.technion.ac.il/∼e<strong>saher</strong>/publications/datasets.<br />

46

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!