anytime algorithms for learning anytime classifiers saher ... - Technion

Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

[Figure 3.24: Anytime behavior of ID3-k and LSID3 on the 10-XOR dataset. Panels: average size and average accuracy versus time (sec); curves: ID3, C4.5, ID3-k, LSID3.]

In the experiment with the Tic-tac-toe dataset, LSID3 dominated ID3-k consistently, both in terms of accuracy and size. ID3-k performs poorly in this case. In addition to the large gaps between successive possible time allocations, a decrease in accuracy and an increase in tree size are observed at k = 3. Similar cases of pathology caused by limited-depth lookahead have been reported by Murthy and Salzberg (1995). Starting from r = 5, the accuracy of LSID3 does not improve over time and sometimes slightly declines (but still dominates ID3-k). We believe that the multiway splits prevent LSID3 from further improvements. Indeed, our experiments in Section 3.7.2 indicate that LSID3 can perform much better with binary splits.

[Figure 3.25: Anytime behavior of ID3-k and LSID3 on the Tic-tac-toe dataset. Panels: average size and average accuracy versus time (seconds); curves: ID3, C4.5, ID3-k, LSID3.]

Continuous Attributes

Our next anytime-behavior experiment uses the Numeric-XOR 4D dataset with continuous attributes. Figure 3.26 gives the results for ID3, C4.5, LSID3, and ID3-k. LSID3 clearly outperforms all the other algorithms and exhibits good anytime behavior. Generalization accuracy and tree size both improve with time. ID3-k behaves poorly in this case. For example, when 200 seconds are allocated, we can run LSID3 with r = 2 and achieve accuracy of about 90%. With the same allocation, ID3-k can be run with k = 2 and achieve accuracy of about 52%. The next improvement of ID3-k (with k = 3) requires 10,000 seconds. But even with such a large allocation (not shown in the graph since it is off the scale), the resulting accuracy is only about 66%.

In Section 3.5.1 we described the LSID3-MC algorithm which, instead of uniformly distributing evaluation resources over all possible splitting points, performs biased sampling towards points with high information gain. Figure 3.27 compares the anytime behavior of LSID3-MC to that of LSID3. The graph of LSID3 shows, as before, the performance for successive values of r. The graph of LSID3-MC shows the performance for p = 10%, 20%, ..., 150%. A few significant conclusions can be drawn from these results: