
(Vapnik, 1995), decision trees are still considered attractive for many real-life applications, mostly due to their interpretability (Hastie, Tibshirani, & Friedman, 2001, chap. 9). Craven (1996) lists several reasons why the understandability of a model by humans is an important criterion for evaluating it. These reasons include, among others, the possibility for human validation of the model and generation of human-readable explanations for the classifier predictions. When classification cost is important, decision trees may be attractive in that they ask only for the values of the features along a single path from the root to a leaf. In terms of accuracy, decision trees have been shown to be competitive with other classifiers for several learning tasks.

For any given set of training examples, many trees can be induced. An important question is which tree should be preferred. Cost-insensitive decision tree learners attempt to produce small consistent trees, which, by Occam's razor (Blumer, Ehrenfeucht, Haussler, & Warmuth, 1987), should have better predictive power.¹ The problem of finding the smallest consistent tree is known to be NP-complete (Hyafil & Rivest, 1976; Murphy & McCraw, 1991). Obviously, when costs are involved, the problem of finding the tree with the lowest expected total cost is even more difficult. One way to overcome the unaffordable complexity of finding the optimal tree is to use greedy heuristics.

The majority of existing decision-tree learners build a tree top-down and choose a split based on local heuristics. The well-known, cost-insensitive C4.5 algorithm (Quinlan, 1993) uses the gain ratio as a heuristic for predicting which attribute will yield a smaller tree. Several alternative local greedy measures have been developed, among which are ID3's information gain, the Gini index (Breiman, Friedman, Olshen, & Stone, 1984), and the chi-square test (Mingers, 1989). For problems with test costs, a number of works suggest considering both the cost of a split and the immediate gain in information it yields, e.g., EG2 (Nunez, 1991). When nonuniform misclassification costs are involved, the DTMC algorithm evaluates a feature by its immediate reduction in total cost (Ling, Yang, Wang, & Zhang, 2004; Sheng, Ling, Ni, & Zhang, 2006).
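To make these local split measures concrete, the sketch below is a minimal, self-contained Python illustration (the dictionary representation of examples, the function names, and the default cost weight are assumptions of this sketch, not definitions taken from the thesis). It computes ID3's information gain, C4.5's gain ratio, the Gini index, and EG2's cost-aware measure in the form commonly reported in the literature.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of a sequence of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, labels, attr):
        """ID3's measure: reduction in entropy obtained by splitting on `attr`.
        `examples` is a list of dicts mapping attribute names to discrete values."""
        n = len(labels)
        partitions = {}
        for x, y in zip(examples, labels):
            partitions.setdefault(x[attr], []).append(y)
        remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
        return entropy(labels) - remainder

    def gain_ratio(examples, labels, attr):
        """C4.5's heuristic: information gain normalized by the split information,
        which penalizes attributes that split the data into many small subsets."""
        n = len(labels)
        sizes = Counter(x[attr] for x in examples).values()
        split_info = -sum((s / n) * math.log2(s / n) for s in sizes)
        return information_gain(examples, labels, attr) / split_info if split_info else 0.0

    def gini_index(labels):
        """Gini impurity of a node, as used by CART (Breiman et al., 1984)."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def eg2_icf(examples, labels, attr, test_cost, w=1.0):
        """EG2's Information Cost Function in the form commonly reported
        (Nunez, 1991): (2**gain - 1) / (cost + 1)**w, trading the information
        a test yields against its measurement cost."""
        gain = information_gain(examples, labels, attr)
        return (2 ** gain - 1) / (test_cost + 1) ** w

A top-down learner applies one such measure to every remaining candidate attribute at a node and splits on the best-scoring one.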

The top-down methodology has the advantage of evaluating a potential attribute for a split in the context of the tests associated with the nodes above it. The local greedy measures, however, consider each of the remaining attributes independently, ignoring potential interaction between different attributes (Mingers, 1989; Kononenko, Simec, & Robnik-Sikonja, 1997; Kim & Loh, 2001). In parity-like concepts, for example, the usefulness of an attribute cannot be discovered when the attribute is judged independently. Such concepts naturally arise in real-world data such as the Drosophila survival rate (Page & Ray, 2003).
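To see why such a parity-like concept defeats local evaluation, consider the following minimal Python illustration (a hypothetical two-attribute XOR data set, not an example from the thesis): each attribute yields zero information gain when judged on its own, even though the two attributes together determine the class exactly.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    # Two-bit parity (XOR) concept: the class equals a1 XOR a2.
    examples = [{"a1": a1, "a2": a2} for a1 in (0, 1) for a2 in (0, 1)]
    labels = [x["a1"] ^ x["a2"] for x in examples]

    for attr in ("a1", "a2"):
        parts = {}
        for x, y in zip(examples, labels):
            parts.setdefault(x[attr], []).append(y)
        remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        print(attr, entropy(labels) - remainder)  # prints 0.0 for both attributes

A greedy learner guided by such a measure therefore has no reason to prefer either relevant attribute over an irrelevant one at the root.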

This problem is intensified when test costs are involved. High costs may hide the

¹ A consistent decision tree is a tree that correctly classifies all training examples.

