Table 3.4: The average differences in tree-size between the different algorithms and their t-test significance, with α = 0.05 (√ indicates a significant advantage, × a significant disadvantage, and ∼ no significant difference). The t-test is not applicable for the Monk datasets because only 1 train-test partition was used.
Dataset             LSID3 vs. ID3        LSID3 vs. C4.5       LSID3-p vs. C4.5
                    Diff           Sig?  Diff           Sig?  Diff           Sig?
Autos Make          -17.1 ±4.8     √     9.9 ±2.4       ×     10.4 ±2.2      ×
Autos Sym.          -19.9 ±2.2     √     6.6 ±4.1       ×     6.3 ±4.3       ×
Balance             -6.2 ±5.0      √     312.8 ±9.8     ×     5.2 ±7.7       ×
Br. Cancer          -29.3 ±5.6     √     94.2 ±5.4      ×     1.7 ±8.5       ∼
Connect-4           -3976 ±265     √     11201 ±183     ×     3284 ±186      ×
Corral              -2.5 ±2.2      √     1.4 ±1.3       ×     1.0 ±1.3       ×
Glass               -4.4 ±3.0      √     10.3 ±3.1      ×     11.6 ±3.7      ×
Iris                -0.9 ±0.7      √     2.9 ±1.0       ×     1.9 ±1.6       ×
Monks-1             -35.0 ±0.0     -     16.0 ±0.0      -     10.0 ±0.0      -
Monks-2             -16.6 ±3.4     -     72.4 ±3.4      -     -1.7 ±5.0      -
Monks-3             -4.2 ±1.6      √     17.8 ±1.6      √     0.9 ±1.4       √
Mushroom            -7.8 ±0.9      -     -2.8 ±0.9      -     -2.9 ±0.9      -
Solar-Flare         -5.6 ±2.5      √     61.2 ±3.1      ×     -0.8 ±1.4      √
Tic-tac-toe         -37.3 ±15.1    √     68.3 ±9.3      ×     29.1 ±10.9     ×
Voting              -0.6 ±2.1      √     10.2 ±2.5      ×     0.7 ±2.4       ×
Wine                -1.7 ±1.2      √     0.9 ±1.0       ×     2.1 ±1.6       ×
Zoo                 -3.9 ±0.9      √     1.6 ±1.2       ×     1.5 ±1.2       ×
Numeric XOR-3D      -33.8 ±5.1     √     8.2 ±1.0       ×     10.7 ±1.7      ×
Numeric XOR-4D      -77.9 ±6.0     √     24.1 ±4.8      √     28.1 ±6.3      √
Multiplexer-20      -96.1 ±21.6    √     -19.4 ±20.4    ×     -24.9 ±17.4    ×
Multiplex-XOR       -40.1 ±7.3     √     17.9 ±8.1      ×     5.7 ±7.0       ×
XOR-5               -60.3 ±7.8     √     10.1 ±5.3      ×     10.1 ±5.3      ×
XOR-5 Noise         -35.4 ±8.3     √     35.0 ±8.9      ×     17.2 ±8.1      ×
XOR-10              -1897 ±587     √     637 ±577       ×     273 ±524       ×
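For concreteness, the Sig? entries can be reproduced with a short test like the following sketch. This is an illustration only: the exact test variant is not spelled out here, so the sketch assumes a paired t-test over per-partition tree sizes, and sizes_a and sizes_b are hypothetical inputs (one entry per train-test partition; with a single partition, as for the Monk datasets, the test is not applicable and the entry is '-').

    # A minimal sketch, assuming a paired t-test at alpha = 0.05 over
    # per-partition tree sizes; sizes_a and sizes_b are hypothetical inputs.
    from scipy import stats

    def significance(sizes_a, sizes_b, alpha=0.05):
        """Return '√' if A's trees are significantly smaller than B's,
        '×' if significantly larger, and '∼' otherwise."""
        t, p = stats.ttest_rel(sizes_a, sizes_b)
        if p >= alpha:
            return '∼'                    # no significant difference
        return '√' if t < 0 else '×'      # sign of t gives the direction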
that of ID3 on some datasets. ID3-k achieved similar results to LSID3 for some datasets, but performed much worse for others, such as Tic-tac-toe and XOR-10. For most datasets, the decrease in the size of the trees induced by LSID3 is accompanied by an increase in predictive power. This phenomenon is consistent with Occam's Razor.
Pruned Trees

Pruning techniques help to avoid overfitting. We view pruning as orthogonal to our lookahead approach. Thus, to handle noisy datasets, we tested the performance of LSID3-p, which post-prunes the LSID3 trees using error-based pruning.
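As a rough illustration of this post-pruning step, the sketch below applies C4.5-style error-based pruning to an already-induced tree. It is a minimal sketch under stated assumptions, not the thesis implementation: the Node fields are hypothetical, the error estimate is the Wilson-style upper confidence bound commonly used when reimplementing C4.5's pessimistic estimate (default CF = 0.25), and C4.5's subtree-raising option is omitted.

    import math
    from dataclasses import dataclass, field

    Z = 0.6745  # one-sided z-value for C4.5's default confidence CF = 0.25

    @dataclass
    class Node:
        n: int                  # training examples reaching this node
        errors: int             # errors if the node were a majority-class leaf
        children: list = field(default_factory=list)  # empty list => leaf

    def upper_error(errors, n):
        # Pessimistic (upper confidence bound) estimate of the error rate.
        if n == 0:
            return 0.0
        f, z = errors / n, Z
        return ((f + z * z / (2 * n)
                 + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n)))
                / (1 + z * z / n))

    def estimated_errors(node):
        # Expected errors of a (sub)tree: the sum over its leaves.
        if not node.children:
            return node.n * upper_error(node.errors, node.n)
        return sum(estimated_errors(c) for c in node.children)

    def ebp(node):
        # Prune bottom-up: collapse a subtree into a leaf whenever the
        # leaf's estimated errors do not exceed the subtree's.
        if node.children:
            node.children = [ebp(c) for c in node.children]
            if node.n * upper_error(node.errors, node.n) <= estimated_errors(node):
                node.children = []  # replace the subtree with a leaf
        return node

Because the pruning decision at each node depends only on the induced tree and the training examples that reach the node, the step composes cleanly with any induction algorithm, including LSID3.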
Figure 3.22 compares the performance of LSID3-p to that of C4.5. Applying