anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
[Figure 5.12 here: four plots of misclassification cost versus maximal testing cost, each with curves for C4.5, TATA(r=0), and TATA(r=5).]
Figure 5.12: Results for pre-contract classification: the misclassification cost as a function of the preallocated testing-cost contract for one instance of Glass (upper-left), AND-OR (upper-right), MULTI-XOR (lower-left), and KRK (lower-right).
samples the possible subtrees of each candidate and bases its decision on the quality of the sampled trees. Of course, the advantage of TATA(r = 5) comes at a price: it consumes far more learning resources.
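The sampling-based split selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual procedure: `estimate_quality`, the candidate names, and the max-of-samples aggregation are hypothetical stand-ins for evaluating r randomly grown subtrees under each candidate; with r = 0 the choice degenerates to a single greedy estimate.

```python
import random

def choose_split(candidates, estimate_quality, r, rng):
    """Pick the candidate attribute whose sampled subtrees score best.

    For each candidate we draw r quality samples (a stand-in for growing
    and evaluating r random subtrees under it) and keep the best observed
    score; r == 0 falls back to a single greedy estimate.
    """
    best_attr, best_score = None, float("-inf")
    for attr in candidates:
        samples = max(1, r)  # r == 0 still needs one greedy look-ahead
        score = max(estimate_quality(attr, rng) for _ in range(samples))
        if score > best_score:
            best_attr, best_score = attr, score
    return best_attr

# Toy usage: attribute "b" consistently yields better subtrees than "a".
rng = random.Random(0)
quality = lambda attr, rng: {"a": 0.2, "b": 0.9}[attr] + 0.05 * rng.random()
print(choose_split(["a", "b"], quality, 5, rng))  # "b"
```

Taking the maximum over samples reflects the intuition that a split is as good as the best subtree one can build under it; larger r gives a tighter estimate at a higher learning cost.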
Figure 5.12 gives the results for four individual datasets: Glass, AND-OR, MULTI-XOR, and KRK. In all four cases TATA dominates, with its r = 5 version being the best method. The misclassification cost decreases almost monotonically. The curves, however, are less smooth than the averaged results in Figure 5.11, and slight degradations are observed; a likely reason is that, as the budget grows, irrelevant features become available and mislead the learner. The KRK curve resembles a step function: the algorithms improve at certain points, which correspond to the budgets at which another relevant attribute becomes affordable. The graphs for AND-OR and MULTI-XOR do not share this phenomenon because these concepts hide interdependencies between the attributes; as a result, an attribute may be useless unless other attributes can be considered alongside it. The performance of the greedy C4.5 and of TATA(r = 0) on these problems is very poor.
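The step-shaped behaviour can be illustrated with a small sketch. The per-test costs and the greedy notion of "affordable" below are hypothetical simplifications, not the thesis's scheme: each time the contract crosses the cumulative cost of one more test, a new attribute becomes usable, and only at those budgets can the tree improve.

```python
def attributes_within_contract(test_costs, contract):
    """Greedily take the cheapest tests that fit within the preallocated
    testing-cost contract.

    The set of usable attributes, and hence the achievable
    misclassification cost, changes only at the budgets where one more
    test fits, which produces step-shaped cost curves.
    """
    chosen, spent = [], 0
    for attr, cost in sorted(test_costs.items(), key=lambda kv: kv[1]):
        if spent + cost <= contract:
            chosen.append(attr)
            spent += cost
    return chosen

# Hypothetical per-test costs for three relevant attributes:
costs = {"wk_file": 50, "wk_rank": 100, "bk_file": 120}
for budget in (40, 60, 160, 300):
    print(budget, attributes_within_contract(costs, budget))
```

Between two consecutive thresholds the usable attribute set is constant, so the misclassification cost stays flat; under interdependent concepts such as AND-OR and MULTI-XOR, unlocking a single attribute may not help until its partners also fit.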