anytime algorithms for learning anytime classifiers saher ... - Technion

Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

[Figure 5.9 (image not preserved). Test costs: a1 = $40; a2, a3, a4, a5, a6 = $10 each. T1 tests a1 at the root: 40 examples reach the - leaf and 60 reach the + leaf. T2 tests a1 at the root: 20 examples continue to a test of a2 (15 to -, 5 to +) and 80 continue to a test of a3 (10 to -, 70 to +).]

Figure 5.9: An example of applying cost discounts when forming a repertoire for interruptible classification. The numbers represent the number of examples that follow each edge. Because the value of a1 is required at the root of T1, any subsequent tree can obtain this value at no cost. The probability of testing a2 in T2 is 20%. When inducing T3, the attribute a2 has therefore already been measured with a probability of 20%. Hence, we discount the cost of a2 by 20% ($8 instead of $10). Similarly, the cost of a3 is discounted by 80% ($2 instead of $10).

previous trees. Consider, for example, the trees in Figure 5.9. The probability of measuring a1 in T1 is 100%. Therefore, when building subsequent trees, the cost of a1 would be zero. The probability of testing a2 in T2 is 20%. Hence, when inducing T3, we discount the cost of a2 by 20% ($8 instead of $10). Similarly, the cost of a3 is discounted by 80% ($2 instead of $10).

Because the trees may be strongly correlated, we cannot simply calculate this probability independently for each tree. For example, if T3 in the aforementioned example tests a2 for 70% of the examples, we would like to know for how many of these examples a2 has also been tested in T2. Therefore, we traverse the previous trees with each of the training examples and mark the attributes that are tested at least once. For efficiency, the matrix that records which tests were administered for which example is built incrementally and updated after building each new tree.

We refer to this method as discount repertoire. The repertoire is formed using the same method as in Figure 5.5 with a single change: before building each tree, cost discounts are applied; the discounts are based on the trees already in the repertoire. Figure 5.10 formalizes the procedure for updating test costs.
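The discount update described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name apply_discount, the encoding of each tree as a callable that returns the set of attributes it tests for a given example, and the use of numeric thresholds on a1 to reproduce the edge counts of Figure 5.9 are all assumptions made for illustration. The sketch also assumes that discounts are always computed from the original (undiscounted) costs rather than compounded across calls.

```python
from typing import Callable, Dict, List, Set

Example = Dict[str, float]
# Hypothetical tree encoding: a callable returning the attributes the tree
# tests while classifying one example.
TestedAttrs = Callable[[Example], Set[str]]


def apply_discount(examples: List[Example],
                   base_costs: Dict[str, float],
                   repertoire: List[TestedAttrs]) -> Dict[str, float]:
    """Return discounted test costs given the trees already in the repertoire.

    An attribute counts once per example, no matter how many trees test it,
    so correlated trees are not double counted.
    """
    n = len(examples)
    if n == 0:
        return dict(base_costs)
    tested_count = {a: 0 for a in base_costs}
    for e in examples:
        seen: Set[str] = set()
        for tree in repertoire:
            seen |= tree(e)          # attributes tested at least once for e
        for a in seen:
            if a in tested_count:
                tested_count[a] += 1
    # cost(a) * (1 - p_a), written as cost(a) * (n - count) / n
    return {a: c * (n - tested_count[a]) / n for a, c in base_costs.items()}


# Setting of Figure 5.9 with 100 examples (thresholds on a1 are assumed).
examples = [{"a1": float(i)} for i in range(100)]
costs = {"a1": 40, "a2": 10, "a3": 10, "a4": 10, "a5": 10, "a6": 10}


def t1(e: Example) -> Set[str]:
    return {"a1"}                    # T1 always tests a1 at its root


def t2(e: Example) -> Set[str]:
    # T2 tests a1, then a2 for 20 examples and a3 for the other 80.
    return {"a1", "a2"} if e["a1"] < 20 else {"a1", "a3"}


discounted = apply_discount(examples, costs, [t1, t2])
print(discounted)  # a1 -> 0.0, a2 -> 8.0, a3 -> 2.0, others stay at 10.0
```

Collecting the tested attributes per example before counting is what handles correlated trees: an attribute already measured by an earlier tree for a given example is not counted again for that example.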
During classification we iterate over the trees until interrupted, as described in Figure 5.8.

Procedure Apply-Discount(E, A, Π)
  M_{i,j} ← 0
  Foreach e ∈ E
    Foreach T ∈ Π
      Â ← attributes whose values are tested by T when classifying e
      Foreach a ∈ Â
        M_{e,a} ← 1
  Foreach a ∈ A
    p_a ← (Σ_{e=1}^{|E|} M_{e,a}) / |E|
    cost(a) ← cost(a) · (1 − p_a)

Figure 5.10: Procedure for applying discounts when forming discount repertoires for interruptible classification

5.4 Empirical Evaluation

A variety of experiments were conducted to test the performance and behavior of TATA in 3 different setups: pre-contract, contract, and interruptible. In Chapter 4, we presented an automatic method for assigning testing costs to attributes in existing datasets. We applied this method 4 times on 20 UCI (Asuncion & Newman, 2007) problems [2] and another 5 datasets that hide hard concepts and have been used in previous machine learning literature. Table 5.1 summarizes the basic properties of these datasets, while Appendix A describes them in more detail. [3] Following the recommendations of Bouckaert (2003), 10 runs of a 10-fold cross-validation experiment were conducted for each dataset, and the reported results are averaged over the 100 individual runs.

5.4.1 Pre-Contract Classification

Our first set of experiments compares C4.5, EG2, EG2$, TATA(r = 0), which is equivalent to C4.5$, and TATA(r = 5) in the pre-contract setup. The misclassification cost has been set uniformly to 100. [4] For each dataset we invoked the algorithms 30 times, each with a different ρc value taken from the range [0, 120% ρc max), with uniform steps. Figure 5.11 describes the misclassification cost of the different algorithms as a function of ρc. For each point (ρc value), the results are averaged over the 100 datasets. [5] Clearly, TATA(r = 5) is dominant. When ρc ≤ ρc min, the algorithms cannot

[2] The datasets vary in size, type of attributes, and dimension.
[3] The 4X25 datasets are available at http://www.cs.technion.ac.il/~esaher/publications/cost.
[4] Note that the absolute value of the misclassification cost does not matter because we do not assume same-scale.
[5] The full results are available at http://www.cs.technion.ac.il/~esaher/publications/rbc.