anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
Procedure Apply-Discount(E, A, Π)
  M_{i,j} ← 0
  Foreach e ∈ E
    Foreach T ∈ Π
      Â ← attributes whose values are tested by T when classifying e
      Foreach a ∈ Â
        M_{e,a} ← 1
  Foreach a ∈ A
    p_a ← (Σ_{e=1}^{|E|} M_{e,a}) / |E|
    cost(a) ← cost(a) · (1 − p_a)
Figure 5.10: Procedure for applying discounts when forming discount repertoires for
interruptible classification
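The procedure in Figure 5.10 can be sketched in Python as follows. This is a minimal illustration rather than the thesis implementation: it assumes that each tree in the repertoire is modeled as a function returning the set of attributes it tests while classifying an example, and the names `apply_discount`, `Example`, and `Tree` are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Assumption: an example is a mapping from attribute names to values.
Example = Dict[str, object]
# Assumption: a tree is modeled as a function that returns the set of
# attributes whose values it tests when classifying a given example.
Tree = Callable[[Example], Set[str]]


def apply_discount(
    examples: List[Example],
    costs: Dict[str, float],
    repertoire: List[Tree],
) -> Dict[str, float]:
    """Return discounted attribute costs, cost(a) * (1 - p_a).

    p_a is the fraction of examples for which at least one tree in the
    repertoire tests attribute a.
    """
    n = len(examples)
    if n == 0:
        # No examples: no evidence for any discount.
        return dict(costs)

    # tested_count[a] plays the role of the column sum of M over examples.
    tested_count = {a: 0 for a in costs}
    for e in examples:
        tested: Set[str] = set()
        for tree in repertoire:
            tested |= tree(e)
        for a in tested:
            if a in tested_count:
                tested_count[a] += 1

    return {a: c * (1.0 - tested_count[a] / n) for a, c in costs.items()}


# Usage with two toy "trees": one always tests a, one tests b when x == 1.
examples = [{"x": 0}, {"x": 1}]
repertoire = [
    lambda e: {"a"},
    lambda e: {"b"} if e["x"] == 1 else set(),
]
discounted = apply_discount(examples, {"a": 10.0, "b": 4.0, "c": 2.0}, repertoire)
```

In this toy run, attribute a is tested for every example and becomes free, b is tested for half of the examples and keeps half its cost, and c is never tested and keeps its full cost.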
4, we presented an automatic method for assigning testing costs to attributes
in existing datasets. We applied this method 4 times on 20 UCI (Asuncion &
Newman, 2007) problems² and another 5 datasets that hide hard concepts and
have been used in previous machine learning literature. Table 5.1 summarizes
the basic properties of these datasets, while Appendix A describes them in more
detail.³
Following the recommendations of Bouckaert (2003), 10 runs of a 10-fold cross-validation
experiment were conducted for each dataset, and the reported results
are averaged over the 100 individual runs.
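The evaluation protocol, 10 runs of 10-fold cross-validation yielding 100 train/test splits per dataset, can be sketched as below. This is an illustrative sketch only: the function name `repeated_kfold` is hypothetical, and plain (unstratified) random folds are an assumption, since the text does not say how the folds were drawn.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n_examples: int, n_runs: int = 10, n_folds: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for n_runs x n_folds splits."""
    rng = random.Random(seed)
    for _ in range(n_runs):
        # Each run reshuffles the data before partitioning it into folds.
        indices = list(range(n_examples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test = indices[fold::n_folds]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


# Usage: 10 x 10 = 100 individual runs for a dataset of 50 examples.
splits = list(repeated_kfold(50))
```

A reported result would then be the mean of a per-split measurement (for example, misclassification cost) over all 100 splits.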
5.4.1 Pre-Contract Classification<br />
Our first set of experiments compares C4.5, EG2, EG2$, TATA(r = 0), which is
equivalent to C4.5$, and TATA(r = 5) in the pre-contract setup. The misclassification
cost was set uniformly to 100.⁴ For each dataset we invoked the algorithms 30
times, each with a different ρc value taken from the range [0, 120% · ρc max), with
uniform steps. Figure 5.11 describes the misclassification cost of the different
algorithms as a function of ρc. For each point (ρc value), the results are averaged
over the 100 datasets.⁵
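As a small illustration of the sampling scheme, the 30 contract values for one dataset could be generated as follows. The exact placement of the points is an assumption (starting at 0 and excluding the upper end of the half-open range), and the names `contract_grid` and `rho_max` are hypothetical.

```python
from typing import List


def contract_grid(
    rho_max: float, n_points: int = 30, factor: float = 1.2
) -> List[float]:
    """Return n_points uniformly spaced values in [0, factor * rho_max)."""
    step = factor * rho_max / n_points
    # Starting at 0 and taking n_points steps keeps the upper end excluded.
    return [i * step for i in range(n_points)]


# Usage: a dataset whose maximal testing cost is assumed to be 100.
grid = contract_grid(100.0)
```

Under this assumption the step is 4% of the maximal value, so the last sampled contract is 116% of it.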
Clearly, TATA(r = 5) is dominant. When ρc ≤ ρc min, the algorithms cannot
2 The datasets vary in size, type of attributes, and dimension.<br />
3 The 4×25 datasets are available at http://www.cs.technion.ac.il/~esaher/publications/cost.
4 Note that the absolute value of the misclassification cost does not matter because we do<br />
not assume same-scale.<br />
5 The full results are available at http://www.cs.technion.ac.il/~esaher/publications/rbc.