anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
Procedure TS-Greedy-Build-Tree(E, A)
  Return ID3(E, A)

Procedure TS-Rebuild-Tree(E, A, r)
  Return LSID3(E, A, r)

Procedure TS-Expected-Cost(T, node)
  E_node ← Examples-At(node, T, E)
  A_node ← Attributes-At(node, T, A)
  Return Next-R(node) · |E_node| · |A_node|³

Procedure TS-Expected-Benefit(T)
  l-bound ← (min_{a∈A_node} |Domain(a)|)²
  Return Tree-Size(T) − l-bound

Procedure TS-Better(T1, T2)
  Return Tree-Size(T1) < Tree-Size(T2)

Figure 3.15: IIDT-TS
We refer to this instantiation of IIDT, which uses tree size as its quality measure, as IIDT-TS. Figure 3.15 formalizes IIDT-TS.
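To make the hook semantics concrete, here is a minimal Python sketch of the cost, benefit, and comparison procedures. The `Node`/`Attribute` data model and the `next_r` callback are assumptions introduced for illustration; LSID3 itself is not implemented here:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model: the thesis does not fix one, so the field names
# (`examples`, `attributes`, `children`, `domain`) are assumptions.
@dataclass
class Attribute:
    name: str
    domain: List[str]  # possible values of the attribute

@dataclass
class Node:
    examples: list = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

def tree_size(node: Node) -> int:
    """Number of nodes in the (sub)tree rooted at `node`."""
    return 1 + sum(tree_size(c) for c in node.children)

def ts_expected_cost(node: Node, next_r) -> int:
    """TS-Expected-Cost: the next value of r times the per-call cost of
    LSID3, which grows cubically in the number of attributes."""
    return next_r(node) * len(node.examples) * len(node.attributes) ** 3

def ts_expected_benefit(node: Node) -> int:
    """TS-Expected-Benefit: current subtree size minus an optimistic lower
    bound on any replacement (squared minimal domain size at the node)."""
    l_bound = min(len(a.domain) for a in node.attributes) ** 2
    return tree_size(node) - l_bound

def ts_better(t1: Node, t2: Node) -> bool:
    """TS-Better: prefer the smaller tree (Occam's Razor)."""
    return tree_size(t1) < tree_size(t2)
```

Note that `ts_expected_benefit` can be negative when the subtree is already near its lower bound, in which case a rebuild at that node is unattractive.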
Evaluating a Subtree<br />
Although LSID3 is expected to produce better trees when allocated more resources, an improved result is not guaranteed. Therefore, to avoid degrading the induced tree, we replace an existing subtree with a newly induced alternative only if the alternative is expected to improve the quality of the complete decision tree. Following Occam's Razor, we measure the usefulness of a subtree by its size: a reconstructed subtree replaces the existing one only if it is smaller. This guarantees that the size of the complete decision tree decreases monotonically.
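The replacement rule above reduces to a single guard. In this sketch the tuple-based tree encoding `(label, children)` is an assumption chosen for brevity:

```python
def tree_size(tree) -> int:
    """Count the nodes of a tree encoded as (label, children)."""
    _label, children = tree
    return 1 + sum(tree_size(c) for c in children)

def maybe_replace(current, candidate):
    """Install the rebuilt subtree only if it is strictly smaller than the
    one it would replace; otherwise keep the current subtree. This guard is
    what makes the size of the complete tree decrease monotonically."""
    return candidate if tree_size(candidate) < tree_size(current) else current
```

Because the guard is strict, a rebuild that produces an equally sized tree is discarded, so repeated improvement phases can never cycle.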
Another possible measure is the accuracy of the decision tree on a set-aside validation set of examples. In this case the training set is split into two subsets: a growing set and a validation set. A modification is applied only if it increases accuracy on the validation set. This measure suffers from two drawbacks. The first is that putting aside a set of examples for validation leaves a smaller set of training examples, making the learning process harder. The second is the bias towards overfitting the validation set, which might reduce the generalization