anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
The SVM algorithm usually depends on several parameters – kernel parameters, for example. Several works, such as (Chapelle, Vapnik, Bousquet, & Mukherjee, 2002), proposed iterative methods for the automatic tuning of SVM parameters. These iterative methods can exploit additional time resources for better tuning.
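One simple way to realize such anytime tuning is an interruptible search over the parameter space that always keeps the best configuration found so far, so that extra time can only improve the answer. The sketch below uses random search over two SVM hyperparameters; the objective `score` and the sampling ranges are illustrative placeholders, not the method of Chapelle et al.:

```python
import random
import time

def anytime_tune(score, deadline, rng=random.Random(0)):
    """Interruptible random search over (C, gamma) SVM hyperparameters.

    `score(C, gamma)` is any validation-quality estimate (higher is
    better); it stands in for a real cross-validation run.  The loop can
    be stopped at any moment and still yields a usable configuration,
    and additional time never degrades the result -- the anytime
    property described in the text.
    """
    best_params, best_score = None, float("-inf")
    while time.monotonic() < deadline:
        # Sample log-uniformly, a common choice for SVM hyperparameters.
        C = 10 ** rng.uniform(-3, 3)
        gamma = 10 ** rng.uniform(-4, 1)
        s = score(C, gamma)
        if s > best_score:
            best_params, best_score = (C, gamma), s
    return best_params
```

A caller would pass `deadline = time.monotonic() + budget` and may shorten or extend the budget freely; monotonicity of the kept best score is what makes the procedure anytime.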
A well-studied alternative to inductive learning is the theory refinement paradigm. In a theory refinement system, we first acquire a domain theory, for instance by querying experts, and then revise the obtained set of rules in an attempt to make it consistent with the training data. Opitz (1995) introduced an anytime approach for theory refinement. This approach starts by generating a neural network from a set of rules that describe what is currently known about the domain. The network then uses the training data and the additional time resources to try to improve the resulting hypothesis.
6.3 Cost-sensitive Classification
Cost-sensitive trees have been the subject of many research efforts. Several works proposed learning algorithms that consider different misclassification costs (Breiman et al., 1984; Pazzani, Merz, Murphy, Ali, Hume, & Brunk, 1994; Provost & Buchanan, 1995; Bradford, Kunz, Kohavi, Brunk, & Brodley, 1998; Domingos, 1999b; Drummond & Holte, 2000; Elkan, 2001; Zadrozny, Langford, & Abe, 2003; Lachiche & Flach, 2003; Abe, Zadrozny, & Langford, 2004; Vadera, 2005; Margineantu, 2005; Zhu, Wu, Khoshgoftaar, & Yong, 2007; Sheng & Ling, 2007b). These methods, however, do not consider test costs and hence are appropriate mainly for domains where test costs are not a constraint. Other authors designed tree learners that take test costs into account, such as IDX (Norton, 1989), CSID3 (Tan & Schlimmer, 1989), and EG2 (Nunez, 1991). These methods, however, do not consider misclassification costs.
Decision Trees with Minimal Cost (DTMC), a greedy method that attempts to minimize both types of cost simultaneously, was recently introduced (Ling et al., 2004; Sheng et al., 2006). The tree is built top-down using a greedy split criterion that takes both test and misclassification costs into account. The basic idea is to estimate the immediate reduction in total cost after each split and to prefer the split with the maximal reduction. If no split reduces the cost on the training data, the induction process is stopped.
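The greedy criterion just described can be sketched as follows. This is an illustrative reconstruction, not the implementation of Ling et al.: it assumes discrete attributes with fixed test costs, a misclassification-cost matrix, and one-level lookahead, with all helper names and the data layout being hypothetical:

```python
# DTMC-style greedy split choice: pick the test whose immediate reduction
# in total (test + misclassification) cost is largest; return None -- i.e.,
# stop splitting -- when no test reduces the cost.

def leaf_cost(labels, mc_cost):
    """Misclassification cost of labeling every example with the cheapest class.

    mc_cost[pred][actual] is the cost of predicting `pred` when the
    true class is `actual`.
    """
    classes = set(mc_cost)
    return min(sum(mc_cost[pred][y] for y in labels) for pred in classes)

def choose_split(examples, labels, test_costs, mc_cost):
    """Return the attribute with the maximal positive cost reduction, or None."""
    base = leaf_cost(labels, mc_cost)  # cost of stopping here as a leaf
    best_attr, best_reduction = None, 0.0
    for attr, t_cost in test_costs.items():
        # Partition the examples by the attribute's value.
        parts = {}
        for x, y in zip(examples, labels):
            parts.setdefault(x[attr], []).append(y)
        # Cost after splitting: pay the test once per example, then label
        # each child with its locally cheapest class.
        split_cost = t_cost * len(examples) + sum(
            leaf_cost(part, mc_cost) for part in parts.values())
        if base - split_cost > best_reduction:
            best_attr, best_reduction = attr, base - split_cost
    return best_attr
```

Note that the comparison is myopic: each candidate split is evaluated as if its children immediately became leaves, which is exactly what makes the method efficient but also what exposes it to the local-minimum problem discussed next.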
Although efficient, the DTMC approach can be trapped in a local minimum and produce trees that are not globally optimal. For example, consider the concept and costs described in Figure 6.2 (left). There are 10 attributes, of which only a9 and a10 are relevant. The cost of a9 and a10, however, is significantly higher than that of the others. Such high costs may hide the usefulness of a9 and a10 and mislead the learner into repeatedly splitting on a1–a8, which would result in