anytime algorithms for learning anytime classifiers saher ... - Technion

Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Procedure LSID3-Choose-Attribute(E, A, r)
  If r = 0
    Return ID3-Choose-Attribute(E, A)
  Foreach a ∈ A
    Foreach v_i ∈ domain(a)
      E_i ← {e ∈ E | a(e) = v_i}
      min_i ← ∞
      Repeat r times
        T ← SID3(E_i, A − {a})
        min_i ← min(min_i, |T|)
    total_a ← Σ_{i=1}^{|domain(a)|} min_i
  Return a for which total_a is minimal

Figure B.3: Attribute selection in LSID3

accurate. Therefore, LSID3 is expected to improve with the increase in r. Figure B.3 lists the procedure for attribute selection as applied by LSID3. Because our goal is to sample small trees and not to always obtain the smallest tree, we use LSID3(r = 1) as a sampler. Observe that LSID3 is stochastic by nature, and therefore we do not need to randomize its decisions.

B.3 Empirical Evaluation

We tested Occam’s empirical principle, as stated in Definition 1, on 20 datasets, 18 of which were chosen arbitrarily from the UCI repository (Asuncion & Newman, 2007), and 2 of which are artificial datasets that represent hard concepts: XOR-5 with 5 additional irrelevant attributes, and 20-bit Multiplexer. Each dataset was partitioned into 10 subsets that were used to create 10 learning problems. Each problem consisted of one subset serving as a testing set and the union of the remaining 9 as a training set, as in 10-fold cross validation.

We sampled the version space, TDIDT_A(E), for each training set E using the three methods described in Section B.2.2 and tested the correlation between the size of a tree (number of leaves) and its accuracy on the associated testing set. The size of the sample was ten thousand for RTG and SID3, and one thousand for LSID3 (due to its higher costs). We first present and discuss the results for consistent trees and then we address the problem of pruned, inconsistent trees.
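The size-accuracy test described above can be illustrated with a minimal sketch. It computes Spearman's rank correlation coefficient (ρ) between tree size and test accuracy for one sample of trees. The helper names `average_ranks` and `spearman_rho` and the five data points are hypothetical and are not taken from the thesis; the significance test is omitted.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None  # rho is undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5


# Made-up sample of trees: (number of leaves, accuracy on the testing set).
sample = [(7, 0.95), (9, 0.93), (12, 0.90), (15, 0.91), (20, 0.85)]
sizes = [size for size, _ in sample]
accuracies = [acc for _, acc in sample]
rho = spearman_rho(sizes, accuracies)
print(rho)
```

A negative `rho` means that, within the sample, smaller trees tend to be more accurate. The function returns `None` when every tree in the sample has the same size or the same accuracy, since the coefficient is undefined in that case.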
Table B.1: Testing Occam’s empirical principle using different sampling methods that produce consistent trees. For each method we report the accuracy, tree size, and Spearman’s correlation coefficient (ρ) averaged over all 10 partitions. We also report the number of times (out of 10) that a negative correlation was found to be statistically significant with p = 0.95 (column √).

           RTG                            SID3                           LSID3
Dataset    Acc.       Size      ρ     √   Acc.       Size      ρ     √   Acc.       Size      ρ     √
Breast-w   92.9±2.8   128±14    -0.1  7   93.1±2.7   108±11    -0.1  8   94.3±1.6   77±4      -0.1  5
Bupa       59.7±8.0   213±10    0     10  63.4±7.3   94±6      0     7   61.9±7.4   69±3      0     3
Car        72.8±4.6   647±77    -0.8  10  79.7±5.8   520±95    -0.9  10  91.9±1.1   285±13    0     3
Cleveland  51.6±6.9   188±7     0     6   50.2±7.2   134±7     0     7   46.1±7.4   98±5      -0.1  7
Corral     73.3±22.9  15±3      -0.1  9   81.6±19.6  10±2      -0.2  10  89.8±8.3   7±1       0.2   na
Glass      55.6±9.9   135±8     -0.1  10  62.3±9.3   57±5      -0.2  10  68.0±8.4   39±3      -0.1  9
Hungarian  72.8±7.4   125±10    -0.1  9   73.3±7.2   65±6      -0.1  8   69.9±7.5   47±3      -0.1  8
Iris       88.9±7.3   39±9      -0.3  10  92.8±4.5   12±2      -0.1  8   93.8±2.7   8±0       -0.1  7
Monks-1    91.1±4.6   203±42    -0.5  10  97.0±3.8   113±55    -0.7  10  100.0±0.0  28±4      na    na
Monks-2    77.8±4.4   294±9     -0.3  10  75.5±4.6   289±8     -0.4  10  77.7±3.1   259±3     0.2   0
Monks-3    88.3±5.4   171±46    -0.6  10  96.0±2.5   77±33     -0.5  10  96.7±0.4   38±2      0.1   na
Mux-20     55.7±6.3   388±14    -0.1  10  56.6±6.5   249±13    -0.2  10  86.1±11.8  89±35     -0.9  10
Nursery    77.2±5.9   3271±551  -0.9  10  93.1±1.8   1583±295  -0.8  10  98.1±0.5   656±54    -0.4  10
Scale      72.2±4.2   394±11    0.1   0   71.7±4.1   389±11    0.1   0   70.1±3.8   352±5     0.1   3
Splice     61.1±3.7   1977±101  -0.6  10  60.5±4.2   1514±112  -0.7  10  89.3±1.8   355±23    -0.5  10
Tic-tac    72.3±4.8   468±34    -0.4  10  80.2±4.5   311±30    -0.4  10  87.7±3.0   166±11    -0.1  9
Voting     89.2±5.8   52±12     -0.3  10  92.8±4.3   26±5      -0.2  9   94.5±3.1   15±2      -0.2  8
Wine       78.6±10.2  73±12     -0.3  10  90.6±6.7   13±3      -0.2  9   91.7±4.3   7±1       -0.1  6
Xor-5      50.7±10.6  136±8     -0.1  10  51.9±11.8  108±11    -0.4  10  96.5±7.7   39±11     -0.8  10
Zoo        90.0±7.4   24±5      -0.2  10  91.8±6.2   18±4      -0.2  9   94.2±3.5   11±1      -0.1  na

B.3.1 Consistent Decision Trees

Figure B.4 plots size-frequency curves for the trees obtained by each sampling method for three datasets: Nursery, Glass, and Multiplexer-20 (for one fold out of the 10). Each of the three methods focuses on a different subspace of TDIDT_A(E), with the biased sampling methods producing samples consisting of smaller trees. In all cases we see a bell-shaped curve, indicating that the distribution is close to normal. Recall that RTG does not uniformly sample TDIDT_A(E): a specific small tree has a better chance of being built than a specific large tree. The histograms indicate, however, that the frequency of small trees in the sample is similar to that of large trees (symmetry). This can be explained by the fact that there are more large trees than small trees. To further verify this, we compared the distribution of the tree size in an RTG sample to that of all trees, as reported in (Murphy & Pazzani, 1994) (Mux-11 dataset). The size-frequency curves for the full space and the sampled space were found to be similar.

Occam’s empirical principle states that there is a negative correlation between