anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
The SVM algorithm usually depends on several parameters – kernel parameters, for example. Several works, such as (Chapelle, Vapnik, Bousquet, & Mukherjee, 2002), proposed iterative methods for the automatic tuning of SVM parameters. These iterative methods can exploit additional time resources for better tuning.
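One simple way to realize such anytime tuning is an interruptible search over the parameter space that always keeps the best configuration found so far, so that extra time can only improve the answer. The sketch below uses random search over two SVM hyperparameters; the objective `score` and the sampling ranges are illustrative placeholders, not the method of Chapelle et al.:

```python
import random
import time

def anytime_tune(score, deadline, rng=random.Random(0)):
    """Interruptible random search over (C, gamma) SVM hyperparameters.

    `score(C, gamma)` is any validation-quality estimate (higher is
    better); it stands in for a real cross-validation run.  The loop can
    be stopped at any moment and still yields a usable configuration,
    and additional time never degrades the result -- the anytime
    property described in the text.
    """
    best_params, best_score = None, float("-inf")
    while time.monotonic() < deadline:
        # Sample log-uniformly, a common choice for SVM hyperparameters.
        C = 10 ** rng.uniform(-3, 3)
        gamma = 10 ** rng.uniform(-4, 1)
        s = score(C, gamma)
        if s > best_score:
            best_params, best_score = (C, gamma), s
    return best_params
```

A caller would pass `deadline = time.monotonic() + budget` and may shorten or extend the budget freely; monotonicity of the kept best score is what makes the procedure anytime.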
A well-studied alternative to inductive learning is the theory refinement paradigm. In a theory refinement system, we first acquire a domain theory, for instance by querying experts, and then revise the obtained set of rules in an attempt to make it consistent with the training data. Opitz (1995) introduced an anytime approach for theory refinement. This approach starts by generating a neural network from a set of rules that describe what is currently known about the domain. The network then uses the training data and the additional time resources to try to improve the resulting hypothesis.
6.3 Cost-sensitive Classification
Cost-sensitive trees have been the subject of many research efforts. Several works proposed learning algorithms that consider different misclassification costs (Breiman et al., 1984; Pazzani, Merz, Murphy, Ali, Hume, & Brunk, 1994; Provost & Buchanan, 1995; Bradford, Kunz, Kohavi, Brunk, & Brodley, 1998; Domingos, 1999b; Drummond & Holte, 2000; Elkan, 2001; Zadrozny, Langford, & Abe, 2003; Lachiche & Flach, 2003; Abe, Zadrozny, & Langford, 2004; Vadera, 2005; Margineantu, 2005; Zhu, Wu, Khoshgoftaar, & Yong, 2007; Sheng & Ling, 2007b). These methods, however, do not consider test costs and hence are appropriate mainly for domains where test costs are not a constraint. Other authors designed tree learners that take test costs into account, such as IDX (Norton, 1989), CSID3 (Tan & Schlimmer, 1989), and EG2 (Nunez, 1991). These methods, however, do not consider misclassification costs.
Decision Trees with Minimal Cost (DTMC), a greedy method that attempts to minimize both types of cost simultaneously, was recently introduced (Ling et al., 2004; Sheng et al., 2006). The tree is built top-down using a greedy split criterion that takes both test and misclassification costs into account. The basic idea is to estimate the immediate reduction in total cost after each split and to prefer the split with the maximal reduction. If no split reduces the cost on the training data, the induction process is stopped.
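The greedy criterion just described can be sketched as follows. This is an illustrative reconstruction, not the implementation of Ling et al.: it assumes discrete attributes with fixed test costs, a misclassification-cost matrix, and one-level lookahead, with all helper names and the data layout being hypothetical:

```python
# DTMC-style greedy split choice: pick the test whose immediate reduction
# in total (test + misclassification) cost is largest; return None -- i.e.,
# stop splitting -- when no test reduces the cost.

def leaf_cost(labels, mc_cost):
    """Misclassification cost of labeling every example with the cheapest class.

    mc_cost[pred][actual] is the cost of predicting `pred` when the
    true class is `actual`.
    """
    classes = set(mc_cost)
    return min(sum(mc_cost[pred][y] for y in labels) for pred in classes)

def choose_split(examples, labels, test_costs, mc_cost):
    """Return the attribute with the maximal positive cost reduction, or None."""
    base = leaf_cost(labels, mc_cost)  # cost of stopping here as a leaf
    best_attr, best_reduction = None, 0.0
    for attr, t_cost in test_costs.items():
        # Partition the examples by the attribute's value.
        parts = {}
        for x, y in zip(examples, labels):
            parts.setdefault(x[attr], []).append(y)
        # Cost after splitting: pay the test once per example, then label
        # each child with its locally cheapest class.
        split_cost = t_cost * len(examples) + sum(
            leaf_cost(part, mc_cost) for part in parts.values())
        if base - split_cost > best_reduction:
            best_attr, best_reduction = attr, base - split_cost
    return best_attr
```

Note that the comparison is myopic: each candidate split is evaluated as if its children immediately became leaves, which is exactly what makes the method efficient but also what exposes it to the local-minimum problem discussed next.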
Although efficient, the DTMC approach can be trapped in a local minimum and produce trees that are not globally optimal. For example, consider the concept and costs described in Figure 6.2 (left). There are 10 attributes, of which only a9 and a10 are relevant. The cost of a9 and a10, however, is significantly higher than that of the others. Such high costs may hide the usefulness of a9 and a10 and mislead the learner into repeatedly splitting on a1–a8, which would result in