
node if resources allow. These would make challenging and interesting directions for future research.

Nijssen and Fromont (2007) presented DL8, an exact algorithm for finding a decision tree that optimizes a ranking function under size, depth, accuracy, and leaf constraints. The key idea behind DL8 is that constraints on decision trees are simulated as constraints on itemsets. They show that optimal decision trees can be extracted from lattices of itemsets in linear time. The applicability of DL8, however, is limited by two factors: the number of itemsets that need to be stored, and the time it takes to compute these itemsets. In some cases, the number of frequent itemsets is so large that it is impossible to compute or store them within a reasonable amount of time or space.
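To make the tree-to-itemset correspondence concrete, the following Python sketch (illustrative only; the function names, the binary-feature encoding, and the dict-based example format are assumptions, not DL8's actual data structures) maps every root-to-node path of a decision tree to the itemset of attribute-value tests along it, so that a leaf constraint such as minimum support becomes a frequency constraint on the corresponding itemset:

    from typing import FrozenSet, List, Tuple

    Item = Tuple[str, int]  # an (attribute, value) test

    def path_itemsets(tree, prefix: FrozenSet[Item] = frozenset()) -> List[FrozenSet[Item]]:
        # `tree` is either a class label (leaf) or a tuple
        # (attribute, left_subtree, right_subtree) splitting on a binary attribute.
        if not isinstance(tree, tuple):
            return [prefix]            # a leaf's path is itself an itemset
        attr, left, right = tree
        return ([prefix]
                + path_itemsets(left, prefix | {(attr, 0)})
                + path_itemsets(right, prefix | {(attr, 1)}))

    def support(itemset: FrozenSet[Item], examples: List[dict]) -> int:
        # Number of training examples that satisfy every test on the path.
        return sum(all(ex[a] == v for a, v in itemset) for ex in examples)

Under this view, every tree node that can appear in a solution corresponds to an itemset meeting the constraints; enumerating those itemsets first, as a frequent-itemset miner would, is what makes the storage and mining costs the dominant factor noted above.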

Several researchers investigated the induction of shallow decision trees. Holte (1993) reported an empirical study of the accuracy of rules that classify examples on the basis of a single test (1R). Holte concluded that, on most real-world datasets, multilevel decision trees do not perform significantly better than one-level classification rules. Elomaa (1994), however, questioned the validity of these conclusions, arguing that the small difference in accuracy between 1R and C4.5 is still significant, and that the conclusions may have resulted from the use of unrepresentative databases.
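For concreteness, a minimal 1R-style learner can be sketched in a few lines of Python (this sketch assumes discrete attributes and dict-encoded examples; Holte's 1R additionally discretizes numeric attributes, which is omitted here):

    from collections import Counter, defaultdict

    def one_r(examples, label="class"):
        # For each attribute, form the rule mapping each of its values to the
        # majority class among examples with that value; keep the attribute
        # whose rule makes the fewest training errors.
        best = None
        for attr in (a for a in examples[0] if a != label):
            by_value = defaultdict(Counter)
            for ex in examples:
                by_value[ex[attr]][ex[label]] += 1
            rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            errors = sum(rule[ex[attr]] != ex[label] for ex in examples)
            if best is None or errors < best[0]:
                best = (errors, attr, rule)
        return best[1], best[2]  # chosen attribute and its value-to-class rule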

Auer, Holte, and Maass (1995) presented a novel algorithm, called T2, for agnostic PAC-learning with decision trees of at most two levels. The computation time of T2 is almost linear in the size of the training set. When tested empirically on several datasets, T2 was shown to produce substantially simpler decision trees with little or no loss in predictive power. Since one can prove that T2 is an agnostic PAC-learning algorithm, it is guaranteed to produce close-to-optimal two-level decision trees given sufficiently large training data. A generalization to depth d, however, would require computation time that is exponential in d. Dobkin, Gunopulos, and Kasif (1996) described several algorithms and theoretical results for learning optimal consistent decision trees of bounded depth. These methods, nevertheless, are practical only when the tree depth is tiny.
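A back-of-the-envelope count (not the analysis of Auer et al.) shows why extending such exact methods to depth d is costly: a complete binary tree of depth d has 2^d - 1 internal nodes, so merely enumerating the assignments of n attribute tests to internal nodes already yields n^(2^d - 1) candidate structures:

    def count_depth_d_structures(n_attrs, d):
        # Ways to assign one of n_attrs tests to each internal node of a
        # complete binary tree of depth d; grows exponentially in 2**d.
        return n_attrs ** (2 ** d - 1)

    # With 20 attributes: 8,000 structures at depth 2, but about 3.3e19 at depth 4.
    print(count_depth_d_structures(20, 2), count_depth_d_structures(20, 4))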

Much research effort has also been invested in the theoretical aspects of decision tree learning. A standard theoretical approach is to prove a bound on the generalization error as a function of the training error and the concept size (McAllester, 1998). Then, a concept optimizing the tradeoff between training error and concept size, as expressed in the bound, is selected. These bounds depend on the size of the training sample, but not on the sample itself. To improve them, Mansour and McAllester (2000) constructed bounds that depend both on the structure of the model and on the actual examples that form the sample.
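A representative instance of such a size-based bound is the standard Occam-razor form (shown for illustration; this is not the exact statement of McAllester, 1998): for any hypothesis h with description length |h| bits and training error measured on m examples, with probability at least 1 - \delta,

    \operatorname{err}(h) \;\le\; \widehat{\operatorname{err}}(h)
      + \sqrt{\frac{|h|\ln 2 + \ln(1/\delta)}{2m}} .

Minimizing the right-hand side over candidate trees is exactly the training-error/size tradeoff described above, and the bound indeed depends on the sample only through its size m.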

Kushilevitz and Mansour (1993) presented a polynomial-time algorithm for learning decision trees with respect to the uniform distribution, using membership queries. The considered decision tree model is an extension of the traditional
