Anytime Algorithms for Learning Anytime Classifiers - Saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
well and no lookahead is needed. However, for more difficult concepts such as XOR, the greedy approach is likely to fail. A third problem is that these methods are limited to a single objective: they cannot be adapted to different learning setups or to other objectives, such as minimizing testing and misclassification costs.
We therefore propose an alternative approach for looking ahead. For each candidate split, we sample the space of subtrees under it and estimate the utility of the sampled trees. Because we evaluate entire trees, different utility functions can be used, depending on the actual cost scheme. The split with the best tree in its sample is then selected.
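The sampling idea can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: trees over binary attributes are encoded as `(attribute, {value: child})` tuples, subtrees under each candidate split are grown at random, and the split whose sample contains the smallest consistent tree wins. On an XOR concept with an irrelevant third attribute, greedy gain cannot distinguish the attributes, but the sampled tree sizes can:

```python
import random

def is_pure(examples):
    """True if all (features, label) pairs share one label."""
    return len({y for _, y in examples}) <= 1

def majority(examples):
    """Most frequent label among the examples."""
    labels = [y for _, y in examples]
    return max(set(labels), key=labels.count)

def random_tree(examples, attrs, rng):
    """Grow one random consistent tree over binary attributes.
    Returns (tree, number_of_internal_nodes)."""
    if is_pure(examples) or not attrs:
        return majority(examples), 0
    a = rng.choice(attrs)
    rest = [b for b in attrs if b != a]
    children, size = {}, 1
    for v in (0, 1):
        subset = [(x, y) for x, y in examples if x[a] == v]
        if subset:
            children[v], s = random_tree(subset, rest, rng)
            size += s
        else:
            children[v] = majority(examples)  # empty branch: parent majority
    return (a, children), size

def choose_split(examples, attrs, sample_size, rng):
    """Evaluate each candidate split by sampling whole subtrees beneath
    it; select the split whose sample contains the smallest tree."""
    best_attr, best_size = None, float("inf")
    for a in attrs:
        rest = [b for b in attrs if b != a]
        for _ in range(sample_size):
            size = 1
            for v in (0, 1):
                subset = [(x, y) for x, y in examples if x[a] == v]
                if subset:
                    size += random_tree(subset, rest, rng)[1]
            if size < best_size:
                best_attr, best_size = a, size
    return best_attr
```

On the 3-bit dataset with label `x0 XOR x1` and an irrelevant `x2`, any tree rooted at `x2` still needs the full XOR in both branches, so `choose_split` always returns one of the two relevant attributes.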
In the cost-insensitive setup, our goal is to induce small and accurate trees. Following Occam's razor, we bias the sample towards small consistent trees and evaluate each sampled tree by its size. To avoid overfitting the training examples, we apply a post-pruning phase, as in C4.5.
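One way to realize such a bias (a hedged illustration, not necessarily the thesis's sampler) is to choose each split attribute at random with probability proportional to its information gain, so sampled trees lean towards small consistent ones without being deterministically greedy:

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a multiset of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(y) for y in set(labels)))

def info_gain(examples, attr):
    """Information gain of a binary attribute on (features, label) pairs."""
    labels = [y for _, y in examples]
    remainder = 0.0
    for v in (0, 1):
        sub = [y for x, y in examples if x[attr] == v]
        if sub:
            remainder += len(sub) / len(examples) * entropy(sub)
    return entropy(labels) - remainder

def gain_biased_attribute(examples, attrs, rng):
    """Pick a split attribute with probability proportional to its gain
    (plus a tiny epsilon so zero-gain attributes remain reachable)."""
    weights = [info_gain(examples, a) + 1e-9 for a in attrs]
    return rng.choices(attrs, weights=weights, k=1)[0]
```

A high-gain attribute is then chosen almost always, while low-gain attributes are still sampled occasionally, which is what keeps the sample diverse.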
When our objective is to minimize the total cost, we bias the sample towards low-cost trees and evaluate the sampled trees by their expected total cost. The total cost of a tree is estimated from the average cost of classifying the training examples with the tree and from the tree's expected error. In cost-insensitive environments, the main goal of pruning is to simplify the tree in order to avoid overfitting the training data: a subtree is pruned if the resulting tree is expected to yield a lower error. When test costs are taken into account, pruning has another important role: reducing the test costs of the tree. Keeping a subtree is worthwhile only if its expected reduction in misclassification costs is larger than the cost of the tests in that subtree. Therefore, we designed a novel pruning approach based on the expected total cost of a tree.
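As a concrete, simplified reading of this estimate: the sketch below encodes a tree as `(attribute, {value: child})`, charges a per-attribute test cost along each classification path, and uses the raw training error as a stand-in for the expected error (the thesis uses a proper expected-error estimate). The tree encoding and cost names are assumptions for illustration. A subtree is replaced by a majority leaf whenever doing so lowers the expected total cost:

```python
def classify(tree, x):
    """Descend to a leaf; internal nodes are (attr, {value: child})."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[x[attr]]
    return tree

def path_test_cost(tree, x, test_costs):
    """Sum the costs of the tests applied when classifying x."""
    cost = 0.0
    while isinstance(tree, tuple):
        attr, children = tree
        cost += test_costs[attr]
        tree = children[x[attr]]
    return cost

def expected_total_cost(tree, examples, test_costs, mc_cost):
    """Average test cost plus error rate times misclassification cost,
    both estimated on the training examples (a plug-in estimate)."""
    test = sum(path_test_cost(tree, x, test_costs) for x, _ in examples)
    err = sum(classify(tree, x) != y for x, y in examples)
    n = len(examples)
    return test / n + (err / n) * mc_cost

def prune_to_leaf_if_cheaper(tree, examples, test_costs, mc_cost):
    """Cost-based pruning at the root: keep the subtree only if its tests
    buy a larger reduction in misclassification cost than they cost."""
    labels = [y for _, y in examples]
    leaf = max(set(labels), key=labels.count)
    keep = expected_total_cost(tree, examples, test_costs, mc_cost)
    cut = expected_total_cost(leaf, examples, test_costs, mc_cost)
    return tree if keep < cut else leaf
```

With an expensive misclassification the test pays for itself and the subtree survives; with a cheap one the same subtree is pruned, which is exactly the trade-off described above.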
For scenarios that constrain the testing costs, we developed a novel top-down approach to exploiting the available testing resources. When the bounds are known to the learner, a tree that fits the budget is built. In other cases, a repertoire of trees is formed. If the quota is known before classification, the single tree that best fits the budget is picked. Otherwise, the trees are traversed until the resources are exhausted.
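The two repertoire cases can be sketched as follows; the entry layout `(expected_cost, expected_accuracy, predict)` and its precomputed statistics are hypothetical placeholders, not the thesis's data structures:

```python
def pick_tree(repertoire, budget):
    """Known quota: return the entry with the highest expected accuracy
    among those whose expected cost fits the budget (None if none fit)."""
    feasible = [entry for entry in repertoire if entry[0] <= budget]
    return max(feasible, key=lambda entry: entry[1], default=None)

def classify_until_exhausted(repertoire, x, budget):
    """Unknown quota: run the trees cheapest-first, spending the budget
    as we go, and keep the prediction of the last tree that still fit."""
    answer = None
    for cost, _accuracy, predict in sorted(repertoire, key=lambda e: e[0]):
        if cost > budget:
            break
        budget -= cost
        answer = predict(x)
    return answer
```

The cheapest-first traversal guarantees some answer is available as soon as the least expensive tree has run, and the answer only improves while resources remain.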
Our anytime approach can benefit from extra learning time by creating larger samples: the larger the samples, the more accurate the attribute evaluation. There are two main classes of anytime algorithms, namely contract and interruptible (Russell & Zilberstein, 1996). A contract algorithm is one that receives its resource allocation as a parameter. An interruptible algorithm is one whose resource allocation is not given in advance, and which must therefore be prepared to be interrupted at any moment. While the assumption of preallocated resources holds for many induction tasks, in many other real-life applications it is not possible to allocate the resources a priori. Therefore, in our work we are interested in both contract and interruptible decision tree learners. In the contract setup, the sample size is predetermined according to the available resources. In the interruptible