anytime algorithms for learning anytime classifiers saher ... - Technion

Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Procedure SID3-Choose-Attribute(E, A)
  Foreach a ∈ A
    p(a) ← gain-1(E, a)
  If ∃a such that entropy-1(E, a) = 0
    a* ← Choose attribute at random from {a ∈ A | entropy-1(E, a) = 0}
  Else
    a* ← Choose attribute at random from A; for each attribute a,
         the probability of selecting it is proportional to p(a)
  Return a*

Figure 3.4: Attribute selection in SID3

Next, we compared the average minimal tree size found for samples of different sizes. Figure 3.6 shows the results. For all three datasets, the minimal size found by SID3 is strictly smaller than that found by RTG. Given the same time budget, RTG produced, on average, samples twice as large as those of SID3. However, even when the results are normalized (dashed line), SID3 is still superior.

Having decided on the sampler, we are ready to describe our proposed contract algorithm, Lookahead-by-Stochastic-ID3 (LSID3). In LSID3, each candidate split is evaluated by the estimated size of the subtree under it. To estimate the size under an attribute a, LSID3 partitions the set of examples according to the values a can take and repeatedly invokes SID3 to sample the space of trees consistent with each subset. Summing the minimal tree size found for each subset gives an estimate of the minimal total tree size under a.

LSID3 is a contract algorithm parameterized by r, the sample size. LSID3 with r = 0 is defined to choose the splitting attribute using the standard ID3 selection method. Figure 3.7 illustrates how LSID3 chooses a splitting attribute. In the given example, SID3 is called twice for each subset, and the evaluation of the examined attribute a is the sum of the two minima: min(4, 3) + min(2, 6) = 3 + 2 = 5. The method for choosing a splitting attribute is formalized in Figure 3.8.

To analyze the time complexity of LSID3, let m be the total number of examples and n the total number of attributes.
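The attribute-selection steps of Figure 3.4 and the LSID3 evaluation described above can be sketched in Python as follows. This is a minimal, hypothetical illustration under stated assumptions, not the thesis implementation: the (features, label) example representation and all function names are invented for the sketch; gain-1 and entropy-1 are read as the information gain and weighted class entropy of a one-level split; SID3 is assumed to grow a tree until nodes are pure or attributes run out; tree size is counted as the number of leaves; and the uniform fallback used when every gain is zero is an added assumption, since Figure 3.4 does not cover that case.

```python
import math
import random
from collections import Counter
from itertools import product

# Hypothetical representation (assumption, not from the thesis):
# an example is a (features, label) pair; features maps attribute -> value.

def entropy(examples):
    """Shannon entropy (in bits) of the class labels in `examples`."""
    total = len(examples)
    counts = Counter(label for _, label in examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def partition(examples, attr):
    """Group `examples` by their value of `attr` (only non-empty subsets)."""
    parts = {}
    for features, label in examples:
        parts.setdefault(features[attr], []).append((features, label))
    return list(parts.values())

def entropy_1(examples, attr):
    """Assumed reading of entropy-1: weighted class entropy after splitting on `attr`."""
    total = len(examples)
    return sum(len(s) / total * entropy(s) for s in partition(examples, attr))

def gain_1(examples, attr):
    """Assumed reading of gain-1: information gain of a one-level split on `attr`."""
    return entropy(examples) - entropy_1(examples, attr)

def sid3_choose_attribute(examples, attrs, rng):
    """Stochastic attribute choice mirroring the steps of Figure 3.4."""
    perfect = [a for a in attrs if entropy_1(examples, a) == 0]
    if perfect:
        # Some attribute separates the classes completely: pick one uniformly.
        return rng.choice(perfect)
    # Clamp tiny negative gains caused by floating-point rounding.
    weights = [max(0.0, gain_1(examples, a)) for a in attrs]
    if sum(weights) == 0:
        # Added assumption: uniform choice when every gain is zero.
        return rng.choice(attrs)
    return rng.choices(attrs, weights=weights, k=1)[0]

def sid3_tree_size(examples, attrs, rng):
    """Grow one SID3 tree and return its size, counted here as its number of leaves."""
    if len({label for _, label in examples}) <= 1 or not attrs:
        return 1  # pure node or no attributes left: make a leaf
    a = sid3_choose_attribute(examples, attrs, rng)
    rest = [b for b in attrs if b != a]
    return sum(sid3_tree_size(s, rest, rng) for s in partition(examples, a))

def lsid3_choose_attribute(examples, attrs, r, rng):
    """Choose the attribute with the smallest estimated subtree size."""
    if r == 0:
        # r = 0: the standard ID3 choice (highest information gain).
        return max(attrs, key=lambda a: gain_1(examples, a))
    best_attr, best_size = None, math.inf
    for a in attrs:
        rest = [b for b in attrs if b != a]
        # Sum, over the subsets induced by a, of the smallest of r sampled tree sizes.
        estimate = sum(
            min(sid3_tree_size(s, rest, rng) for _ in range(r))
            for s in partition(examples, a)
        )
        if estimate < best_size:
            best_attr, best_size = a, estimate
    return best_attr

# Toy XOR data: the label is x1 XOR x2; x3 is irrelevant.
xor_data = [({"x1": a, "x2": b, "x3": c}, a ^ b)
            for a, b, c in product((0, 1), repeat=3)]

if __name__ == "__main__":
    rng = random.Random(0)
    attrs = ["x3", "x1", "x2"]
    print(lsid3_choose_attribute(xor_data, attrs, r=0, rng=rng))  # prints x3
    print(lsid3_choose_attribute(xor_data, attrs, r=2, rng=rng))  # prints x1
```

On this XOR-style toy dataset every single attribute has zero information gain, so the r = 0 (ID3) choice falls to whichever attribute is listed first, here the irrelevant x3. With r = 2, the sketch estimates 4 leaves under x1 or x2 versus 8 under x3 and therefore prefers x1, which is the kind of lookahead benefit LSID3 is designed to provide.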
Figure 3.5: Tree-size frequency curves for the XOR-5 (left), Tic-tac-toe, and Zoo (right) datasets (frequency vs. tree size, for RTG and SID3)

For a given node y, we denote by $n_y$ the number of candidate attributes at y, and by $m_y$ the number of examples that reach y. At each node y, ID3 calculates the gain of $n_y$ attributes using $m_y$ examples, so the complexity of choosing an attribute is $O(n_y \cdot m_y)$. At level i of the tree, the total number of examples is bounded by m and the number of attributes to consider is $n - i$. Thus, it takes $O(m \cdot (n - i))$ to find the splits for all nodes in level i. In the worst case, the tree will be of depth n, and summing $O(m \cdot (n - i))$ over the n levels gives a total runtime complexity for ID3 of $O(m \cdot n^2)$ (Utgoff, 1989). Shavlik, Mooney, and G. (1991) reported for ID3 an empirically based average-case complexity of $O(m \cdot n)$. It is easy to see that the complexity of SID3 is similar to that of ID3. LSID3(r) invokes SID3 r times for each candidate split. Recalling the above analysis for