simpler XOR and multiplexer problems, its limited lookahead is not sufficient for learning complex concepts such as XOR-10: DMTI achieved an accuracy of 50%. IIDT and LSID3, by producing larger samples, overcame this problem and reached high accuracies.
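To make the difficulty concrete, the following sketch (ours, not taken from any of the cited systems) generates the XOR-n concept and shows why shallow, gain-based lookahead is blind to it: conditioning on any single attribute leaves the class distribution unchanged.

```python
import itertools

def xor_n_dataset(n):
    """All 2^n binary vectors, labeled with their parity (the XOR of the bits)."""
    return [(bits, sum(bits) % 2) for bits in itertools.product([0, 1], repeat=n)]

data = xor_n_dataset(10)  # XOR-10: 1024 labeled examples
# Fixing any single attribute leaves the two classes perfectly balanced,
# so a greedy, gain-based learner sees no signal in any individual split:
labels = [label for bits, label in data if bits[0] == 1]
print(sum(labels) / len(labels))  # 0.5 -- zero information gain on attribute 0
```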
Kim and Loh (2001) introduced CRUISE, a bias-free decision tree learner that attempts to produce more compact trees by (1) using multiway splits, with one subnode for each class, and (2) examining pairwise interactions among the variables. CRUISE is able to learn the XOR-2 and Chess-board (numeric XOR-2) concepts. Much like ID3-k with k = 2, however, it cannot recognize more complex interactions.
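A hypothetical illustration of the idea, rather than CRUISE's actual test: on the XOR-2 concept, a chi-squared test finds each attribute independent of the class on its own, while the contingency table over the attribute pair shows strong dependence.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)
b = rng.integers(0, 2, 1000)
y = a ^ b  # the XOR-2 concept

# Contingency table of a single attribute against the class: no dependence.
single = np.array([[np.sum((a == v) & (y == c)) for c in (0, 1)] for v in (0, 1)])
# Contingency table of the attribute *pair* against the class: full dependence.
joint = np.array([[np.sum((a == va) & (b == vb) & (y == c)) for c in (0, 1)]
                  for va in (0, 1) for vb in (0, 1)])

print(chi2_contingency(single)[1])  # large p-value: a alone looks irrelevant
print(chi2_contingency(joint)[1])   # tiny p-value: the pair (a, b) predicts y
```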
Bennett (1994) presented GTO, a non-greedy approach for repairing multivariate decision trees. GTO requires an initial tree as input. The algorithm retains the structure of the tree but attempts to simultaneously improve all of its multivariate decisions using iterative linear programming. GTO and IIDT both use a non-greedy approach to improve a decision tree. The advantage of GTO is its use of a well-established numerical optimization method; its disadvantages are its inability to modify the initial structure and its inability to exploit additional resources (beyond those needed for convergence).
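As a rough sketch of the flavor of such linear programs, the following minimizes the total slack of a single hyperplane decision; the objective and the restriction to one node are our simplifying assumptions, whereas GTO optimizes all of the tree's decisions jointly.

```python
import numpy as np
from scipy.optimize import linprog

def fit_hyperplane_lp(X, y):
    """Minimize total slack s subject to y_i * (w . x_i + b) >= 1 - s_i, s >= 0."""
    n, d = X.shape
    c = np.concatenate([np.zeros(d + 1), np.ones(n)])          # objective: sum of slacks
    A = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])  # constraint rows
    bounds = [(None, None)] * (d + 1) + [(0, None)] * n        # w, b free; s >= 0
    res = linprog(c, A_ub=A, b_ub=-np.ones(n), bounds=bounds)
    return res.x[:d], res.x[d]

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])  # labels in {-1, +1}
w, b = fit_hyperplane_lp(X, y)
print(w, b)  # a multivariate decision: classify by the sign of w . x + b
```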
Freund and Mason (1999) described a new type of classification mechanism, the alternating decision tree (ADTree). ADTrees provide a method for visualizing decision stumps in an ordered and logical way that exposes correlations. Freund and Mason presented an iterative algorithm for learning ADTrees, based on boosting. While it has not been studied as an anytime algorithm, this learner can be viewed as one: it starts with a constant prediction and adds one decision stump at a time, and if stopped, it returns the current tree. The anytime behavior of ADTree induction, however, is problematic. Additional time resources can only be used to add more rules and therefore might result in large, overcomplicated trees. Moreover, ADTree is not designed to tackle the problem of attribute interdependencies because it evaluates each split independently.
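The interruptible loop just described can be sketched as follows; plain AdaBoost over threshold stumps stands in for the actual ADTree learning rule, so the details are illustrative only.

```python
import time
import numpy as np

def best_stump(X, y, w):
    """Return the weighted-error-minimizing threshold test (attr, thresh, sign, err)."""
    best = (0, 0.0, 1, 1.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= t, sign, -sign)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, sign, err)
    return best

def anytime_stumps(X, y, budget_seconds):
    """Constant prediction plus one boosted stump per iteration, until time is up."""
    deadline = time.time() + budget_seconds
    w = np.full(len(y), 1.0 / len(y))
    base = 0.5 * np.log((y == 1).mean() / max((y == -1).mean(), 1e-12))
    stumps = []                            # the constant prediction is `base`
    while time.time() < deadline:          # if interrupted, return what we have
        j, t, sign, err = best_stump(X, y, w)
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        pred = np.where(X[:, j] >= t, sign, -sign)
        w = w * np.exp(-alpha * y * pred)  # boosting-style reweighting
        w /= w.sum()
        stumps.append((j, t, sign, alpha))
    return base, stumps
```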
Murthy, Kasif, and Salzberg (1994) introduced OC1 (Oblique Classifier 1), a new algorithm for the induction of oblique decision trees. Oblique decision trees use multivariate tests that are not necessarily parallel to an axis. OC1 builds decision trees that contain linear combinations of one or more attributes at each internal node; these trees then partition the space of examples with both oblique and axis-parallel hyperplanes. The problem of searching for the best oblique split is much more difficult than that of searching for the best axis-parallel split because the number of candidate oblique splits is exponential. Therefore, OC1 takes a greedy approach and attempts to find locally good splits rather than optimal ones. Our LSID3 algorithm can be generalized to support the induction of oblique decision trees by using OC1 as the sampler and by considering oblique splits. Furthermore, OC1 can be converted into an anytime algorithm by considering more splits at each node.
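A rough sketch of such a local search for an oblique split (the real OC1 combines deterministic coefficient perturbation with randomized restarts and jumps): perturb one coefficient of the hyperplane at a time and keep only impurity-reducing moves. The number of candidates examined per node, `steps` below, is exactly the knob that gives the algorithm its anytime character.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a vector of class labels (nonnegative ints)."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)

def split_impurity(X, y, w, b):
    """Weighted impurity of the partition induced by the test w . x > b."""
    left = (X @ w) > b
    n_left = left.sum()
    return (n_left * gini(y[left]) + (len(y) - n_left) * gini(y[~left])) / len(y)

def oblique_local_search(X, y, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])                   # random initial hyperplane
    b = 0.0
    best = split_impurity(X, y, w, b)
    for _ in range(steps):                            # more steps -> better splits
        w2 = w.copy()
        w2[rng.integers(X.shape[1])] += rng.normal()  # perturb one coefficient
        imp = split_impurity(X, y, w2, b)
        if imp < best:
            w, best = w2, imp                         # keep only improving moves
    return w, b, best
```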