anytime algorithms for learning anytime classifiers saher ... - Technion

Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Figure 5.14: Results for contract classification: the misclassification cost as a function of the preallocated testing-cost contract for Glass (upper-left), AND-OR (upper-right), MULTI-XOR (lower-left) and KRK (lower-right). [Plots: misclassification cost vs. maximal classification cost; curves for C4.5, Uni(r=0,k=16), Uni(r=3,k=16) and Hill(r=3,k=16).]

cost-insensitive C4.5. Across all four domains, Uniform- and Hill-TATA(r = 3) are clearly dominant. Uniform-TATA(r = 0) is better than C4.5 when the provided contracts are small; when the contracts can afford all the attributes, the two algorithms perform similarly. Compared with Uniform-TATA(r = 0), Uniform-TATA(r = 3) shows better anycost behavior: its misclassification cost does not increase as the contract grows, and it makes better use of the testing resources. The differences between Uniform- and Hill-TATA(r = 3) are interesting. While both algorithms exhibit similar trends, Hill-TATA reaches better results slightly earlier (that is, at lower contracts) than Uniform-TATA on 3 of the 4 domains, the exception being KRK. The reason is that Hill-TATA selects the sequence of ρ_c values heuristically rather than with blind, uniform gaps, so it can concentrate on the cost ranges where building more trees is worthwhile. These differences are expected to diminish for larger repertoires, which allow Uniform-TATA to cover more contracts.
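Two schedules for choosing the contracts ρ_c at which repertoire trees are built are contrasted above. The Python sketch below illustrates them under stated assumptions: `learn_tree` is a hypothetical stand-in for pre-contract-TATA, the contract range [rho_min, rho_max] is given, a tree learned for ρ_c is assumed usable for any classification contract of at least ρ_c, and `bisecting_contracts` is a simple interruptible placeholder that splits the widest gap. It is not Hill-TATA's heuristic, which instead targets the cost ranges where more trees pay off.

```python
from bisect import bisect_right
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Tree = TypeVar("Tree")


def uniform_contracts(rho_min: float, rho_max: float, k: int) -> List[float]:
    """Contract-style schedule: k contracts evenly spaced over [rho_min, rho_max].
    k must be known before learning starts."""
    if k < 2:
        raise ValueError("k must be at least 2")
    step = (rho_max - rho_min) / (k - 1)
    return [rho_min + i * step for i in range(k)]


def bisecting_contracts(rho_min: float, rho_max: float) -> Iterator[float]:
    """Interruptible schedule: yields one contract at a time, so the caller can stop anywhere.
    Hypothetical placeholder that halves the widest gap; Hill-TATA picks contracts heuristically."""
    points = [rho_min, rho_max]
    yield rho_min
    yield rho_max
    while True:
        # max() with a key returns the first widest gap between consecutive contracts.
        j = max(range(len(points) - 1), key=lambda i: points[i + 1] - points[i])
        mid = (points[j] + points[j + 1]) / 2
        points.insert(j + 1, mid)
        yield mid


def build_repertoire(
    contracts: Iterable[float], learn_tree: Callable[[float], Tree]
) -> List[Tuple[float, Tree]]:
    """Learn one tree per contract and keep the (rho_c, tree) pairs sorted by rho_c."""
    return sorted(((rho, learn_tree(rho)) for rho in contracts), key=lambda pair: pair[0])


def select_tree(repertoire: List[Tuple[float, Tree]], contract: float) -> Tree:
    """Assumed rule: use the tree built for the largest rho_c that does not exceed the contract."""
    keys = [rho for rho, _ in repertoire]
    i = bisect_right(keys, contract)
    if i == 0:
        raise ValueError("contract is below the cheapest tree in the repertoire")
    return repertoire[i - 1][1]


def fake_learner(rho: float) -> str:
    """Hypothetical stand-in for pre-contract-TATA: just labels the tree with its contract."""
    return f"tree@{rho:g}"


if __name__ == "__main__":
    rep = build_repertoire(uniform_contracts(0.0, 300.0, 4), fake_learner)
    print(select_tree(rep, 250.0))  # tree@200
    # Interrupt the anytime schedule after five contracts.
    print(list(islice(bisecting_contracts(0.0, 100.0), 5)))  # [0.0, 100.0, 50.0, 25.0, 75.0]
```

With four uniformly spaced contracts over [0, 300], a contract of 250 falls back to the tree built for 200. The uniform grid needs k before learning starts, while the generator can be interrupted after any number of contracts, which is the contract versus interruptible distinction drawn in the text.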
To verify the hypothesis that larger repertoires diminish these differences, we repeated the experiments with k = 32, and indeed the performance differences between the methods disappeared. It is important to note, however, that while Uniform-TATA is a contract learner that requires k in advance, Hill-TATA is an interruptible learner and is therefore also appropriate when the learning resources are not preallocated.

Figure 5.15: Learning repertoires with different time allocations and sample sizes. Each curve represents the normalized AUC for a fixed time allocation and varying r values. [Plot: misclassification cost vs. sample size r (0 to 7); curves for T=1, T=3 and T=5.]

In Section 5.4.1 we examined the anytime behavior of the learner in the pre-contract setup. The results indicate that the misclassification costs decrease as the sample size (and hence the learning resources) increases. In the contract setup, however, given a fixed learning time, increasing the sample size comes at a price: fewer trees in the repertoire. An interesting question is therefore whether one should invest more resources in building better single trees or in forming larger repertoires. To investigate this, we learned several repertoires using the hill-climbing approach. The trees of each repertoire were induced with a different r parameter for pre-contract-TATA, from 0 up to 7. When r = 0, pre-contract-TATA behaves like the greedy C4.5$. In this case we assumed that an infinite number of trees can be built (in practice, a tree was built for every tested value of ρ_c). Because we used the hill-climbing approach, we could stop the learning process at any time. We chose three stopping points (1, 3 and 5 seconds) and tested the performance of the resulting 8×3 repertoires. Figure 5.15 gives the results; each curve stands for a different time allocation. The first plot gives the normalized AUC for contracts between 33% and 99% of ρ_c^max.
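Figure 5.15 summarizes each repertoire by a normalized AUC over part of the contract range. The section does not spell out that computation, so the Python sketch below rests on assumptions: misclassification cost is sampled at increasing contracts, the curve is piecewise linear between samples, the area between 33% and 99% of an assumed maximal contract `rho_max` is taken with the trapezoidal rule, and it is divided by the width of that range so the result reads as an average misclassification cost. The thesis's exact normalization may differ, and the function names are hypothetical.

```python
from typing import Sequence


def _interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linearly interpolate the sampled curve (xs strictly increasing) at x."""
    for j in range(len(xs) - 1):
        if xs[j] <= x <= xs[j + 1]:
            t = (x - xs[j]) / (xs[j + 1] - xs[j])
            return ys[j] + t * (ys[j + 1] - ys[j])
    raise ValueError("x lies outside the sampled contract range")


def normalized_auc(xs: Sequence[float], ys: Sequence[float], lo: float, hi: float) -> float:
    """Trapezoidal area under misclassification cost (ys) vs. contract (xs) on [lo, hi],
    divided by (hi - lo). The normalization is an assumption, not taken from the thesis."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (contract, cost) samples")
    if not xs[0] <= lo < hi <= xs[-1]:
        raise ValueError("[lo, hi] must lie inside the sampled contract range")
    # Clip the curve to [lo, hi]: interpolated endpoints plus the samples strictly inside.
    pts = [(lo, _interp(lo, xs, ys))]
    pts += [(x, y) for x, y in zip(xs, ys) if lo < x < hi]
    pts.append((hi, _interp(hi, xs, ys)))
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return area / (hi - lo)


def auc_in_contract_range(
    xs: Sequence[float], ys: Sequence[float], rho_max: float,
    lo_frac: float = 0.33, hi_frac: float = 0.99,
) -> float:
    """Normalized AUC between lo_frac and hi_frac of the maximal contract rho_max."""
    return normalized_auc(xs, ys, lo_frac * rho_max, hi_frac * rho_max)


if __name__ == "__main__":
    contracts = [0.0, 50.0, 100.0]
    costs = [80.0, 40.0, 40.0]  # made-up values, for illustration only
    print(normalized_auc(contracts, costs, 0.0, 100.0))  # 50.0
    print(auc_in_contract_range(contracts, costs, rho_max=100.0))
```

Dividing by the range width keeps the score in the same units as the misclassification cost, so repertoires evaluated over the same contract range can be compared directly; a lower value means a lower average cost.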
It is easy to see that in all three graphs, increasing the learning time allows the production of more trees: the curve for T = 5 is lower than that of T = 3