for induction algorithms. In such cases, prior knowledge about the target concept can help in choosing better parameters for the classifier, e.g., the right architecture of a neural network or the right kernel for an SVM. Such prior knowledge, however, usually does not exist. Our proposed algorithms can learn these hard concepts without the need for an ad hoc setup.
6. Easily parallelizable learner:
Our sampling approach for evaluating candidate splits during top-down tree induction can easily be parallelized: different machines can sample the space of trees simultaneously and independently (see the first sketch following this list). The method therefore benefits from distributed computing power.
7. Empirical study of Occam's razor:
Occam's razor is the principle that, given two hypotheses consistent with the observed data, the simpler one should be preferred. Many machine learning algorithms follow this principle and search for a small hypothesis within the version space. The principle has been the subject of a heated debate, with theoretical and empirical arguments both for and against it. Earlier empirical studies lacked sufficient coverage to resolve the debate. In this work we provide convincing empirical evidence for Occam's razor in the context of decision tree induction, showing that a smaller tree is indeed likely to be more accurate and that this correlation is statistically significant (the second sketch following this list illustrates such a correlation test).
8. Automatic method for cost assignment to existing datasets:
Machine learning researchers typically use datasets from the UCI repository (Asuncion & Newman, 2007). Only five UCI datasets, however, have assigned test costs. To gain a wider perspective, we have developed an automatic, parameterized method that assigns costs to existing datasets (the third sketch following this list gives a rough illustration).
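To illustrate the parallel structure behind contribution 6, here is a minimal Python sketch, not the implementation used in the thesis; the helpers sample_tree_size and evaluate_split are hypothetical placeholders for the real sampling procedure.

```python
# Minimal sketch of parallel sampling-based split evaluation.
# sample_tree_size and evaluate_split are hypothetical placeholders,
# not the thesis implementation.
import random
from concurrent.futures import ProcessPoolExecutor

def sample_tree_size(split_id: int, seed: int) -> int:
    """Placeholder: grow one random tree under the given candidate
    split and return its size (smaller is taken to be better)."""
    rng = random.Random(seed)
    return rng.randint(10, 100) + split_id  # stands in for real sampling

def evaluate_split(args):
    """Score a split by the minimum size over independent samples."""
    split_id, n_samples, seed = args
    rng = random.Random(seed)
    return min(sample_tree_size(split_id, rng.randrange(2**31))
               for _ in range(n_samples))

if __name__ == "__main__":
    n_splits, n_samples = 8, 32
    tasks = [(s, n_samples, s) for s in range(n_splits)]
    # Samples are independent, so candidate splits can be evaluated on
    # separate machines with no coordination between them.
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(evaluate_split, tasks))
    best = min(range(n_splits), key=lambda s: scores[s])
    print(f"best split: {best} (estimated tree size {scores[best]})")
```

Because the samples are drawn independently, the machines need not communicate during evaluation; adding machines simply increases the number of trees sampled per unit of time.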
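The kind of statistical test behind contribution 7 can be sketched as follows: sample many trees consistent with the training data, record each tree's size and test-set error, and test for a positive rank correlation between the two. Spearman's rank correlation and the toy tree generator below are illustrative assumptions, not necessarily the test used in the thesis.

```python
# Illustrative size/error correlation test. sample_consistent_tree is a
# toy stand-in; the actual study samples real consistent decision trees.
import random
from scipy.stats import spearmanr

def sample_consistent_tree(rng: random.Random) -> tuple[int, float]:
    """Toy stand-in returning (tree size, test-set error) for one
    randomly sampled tree consistent with the training data."""
    size = rng.randint(10, 200)
    error = max(0.0, 0.05 + 0.001 * size + rng.gauss(0.0, 0.02))
    return size, error

rng = random.Random(0)
sizes, errors = zip(*(sample_consistent_tree(rng) for _ in range(1000)))

# Occam's razor predicts a positive rank correlation:
# smaller trees should tend to have lower error.
rho, p_value = spearmanr(sizes, errors)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.2g}")
```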
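Contribution 8's method is specified later in the thesis; purely as a rough illustration of what a parameterized cost assignment can look like, the sketch below draws one uniform random cost per group of attributes. The cost range and the grouping policy are assumptions, not the thesis's actual scheme.

```python
# Rough illustration of parameterized cost assignment. The uniform
# range and grouping policy are assumptions, not the thesis's scheme.
import random

def assign_costs(attributes, low=1.0, high=100.0, group_size=1, seed=0):
    """Draw one uniform random test cost per group of attributes;
    attributes in a group share the cost (e.g., tests obtained from
    the same lab procedure)."""
    rng = random.Random(seed)
    costs = {}
    for i in range(0, len(attributes), group_size):
        cost = round(rng.uniform(low, high), 2)
        for name in attributes[i:i + group_size]:
            costs[name] = cost
    return costs

print(assign_costs(["age", "glucose", "insulin", "bmi"], group_size=2))
```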
1.3 Thesis Outline
The rest of this thesis is organized as follows. In Chapter 2 we provide background on anytime algorithms and resource-bounded classification, and describe the different scenarios under which our proposed framework can operate. Chapter 3 introduces our novel anytime approach for sampling-based attribute evaluation and instantiates this approach for creating accurate decision trees. Chapter 3 also describes an interruptible approach for acting when learning resources are not preallocated. In Chapter 4 we present our methodology for constructing
low-error, low-cost trees. In Chapter 5 we focus on inducing resource-bounded classifiers.