anytime algorithms for learning anytime classifiers saher ... - Technion


Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Procedure Apply-Discount(E, A, Π)
  M_{e,a} ← 0 for all e ∈ E, a ∈ A
  Foreach e ∈ E
    Foreach T ∈ Π
      Â ← attributes whose values are tested by T when classifying e
      Foreach a ∈ Â
        M_{e,a} ← 1
  Foreach a ∈ A
    p_a ← (Σ_{e=1}^{|E|} M_{e,a}) / |E|
    cost(a) ← cost(a) · (1 − p_a)

Figure 5.10: Procedure for applying discounts when forming discount repertoires for interruptible classification
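The discount procedure in Figure 5.10 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the thesis implementation: each tree T in the repertoire Π is represented by a hypothetical callable that returns the set of attributes it tests while classifying an example, and `apply_discount`, `examples`, `costs`, and `repertoire` are names introduced here for illustration.

```python
from typing import Callable, Dict, List, Set

Example = Dict[str, object]
# Hypothetical stand-in for a tree T: maps an example to the set of
# attributes that T tests while classifying that example.
TestedBy = Callable[[Example], Set[str]]


def apply_discount(
    examples: List[Example],
    costs: Dict[str, float],
    repertoire: List[TestedBy],
) -> Dict[str, float]:
    """Return discounted costs, cost(a) * (1 - p_a), for every attribute a."""
    if not examples:
        return dict(costs)  # nothing to estimate p_a from

    # tested[i] plays the role of row e of M: the attributes tested for
    # example i by at least one tree in the repertoire.
    tested: List[Set[str]] = [set() for _ in examples]
    for i, example in enumerate(examples):
        for tree in repertoire:
            tested[i] |= tree(example)

    n = len(examples)
    discounted: Dict[str, float] = {}
    for attribute, cost in costs.items():
        # p_a: fraction of examples for which attribute a was already tested.
        p_a = sum(1 for row in tested if attribute in row) / n
        discounted[attribute] = cost * (1.0 - p_a)
    return discounted


# Toy usage: one tree that always tests x, and tests y only when x == 1.
examples = [{"x": 0, "y": 1}, {"x": 1, "y": 0}, {"x": 1, "y": 1}, {"x": 0, "y": 0}]
tree = lambda e: {"x", "y"} if e["x"] == 1 else {"x"}
print(apply_discount(examples, {"x": 10.0, "y": 20.0, "z": 5.0}, [tree]))
# prints {'x': 0.0, 'y': 10.0, 'z': 5.0}
```

In the toy run, x is tested for every example and so becomes free, y is tested for half of the examples and is discounted by half, and z is never tested and keeps its full cost.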

4, we presented an automatic method for assigning testing costs to attributes in existing datasets. We applied this method four times to each of 20 UCI (Asuncion & Newman, 2007) problems² and to another 5 datasets that hide hard concepts and have been used in previous machine learning literature. Table 5.1 summarizes the basic properties of these datasets, while Appendix A describes them in more detail.³

Following the recommendations of Bouckaert (2003), 10 runs of a 10-fold cross-validation experiment were conducted for each dataset, and the reported results are averaged over the 100 individual runs.
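The evaluation protocol above (10 repetitions of 10-fold cross-validation, averaged over 100 runs) might be sketched as follows. This is a minimal illustration using only the standard library; the function names, the fixed seed, and the unstratified shuffling are assumptions made here, not details taken from the thesis.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n_examples: int, n_folds: int = 10, n_repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_examples))
        rng.shuffle(indices)  # a fresh random partition per repetition
        for k in range(n_folds):
            test = indices[k::n_folds]  # every n_folds-th index, offset k
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


def mean(values: List[float]) -> float:
    """Average of the per-run scores (100 runs for a 10x10 setup)."""
    return sum(values) / len(values)


splits = list(repeated_kfold(n_examples=50))
print(len(splits))  # prints 100
```

Each of the 100 splits would be used to train and evaluate one classifier, and `mean` would then be applied to the 100 resulting scores.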

5.4.1 Pre-Contract Classification<br />

Our first set of experiments compares C4.5, EG2, EG2$, TATA(r = 0), which is equivalent to C4.5$, and TATA(r = 5) in the pre-contract setup. The misclassification cost was set uniformly to 100.⁴ For each dataset we invoked the algorithms 30 times, each with a different ρc value taken from the range [0, 120% ρc max) in uniform steps. Figure 5.11 shows the misclassification cost of the different algorithms as a function of ρc. For each point (ρc value), the results are averaged over the 100 datasets.⁵
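The sweep over ρc values can be illustrated with a short sketch. It assumes that the 30 values start at 0 and are spaced evenly so that the upper end of the half-open range is excluded; the text states only the range and the uniform steps, so the exact grid, the function name `contract_grid`, and its parameters are assumptions.

```python
from typing import List


def contract_grid(
    rho_max: float, n_points: int = 30, factor: float = 1.2
) -> List[float]:
    """Return n_points evenly spaced values in [0, factor * rho_max)."""
    step = factor * rho_max / n_points
    # Starting at 0 and stopping one step short keeps the upper end excluded.
    return [i * step for i in range(n_points)]


grid = contract_grid(rho_max=100.0)
print(len(grid), grid[0])  # prints 30 0.0
```

Each value in `grid` would be passed to the learners as the testing-cost contract for one invocation.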

Clearly, TATA(r = 5) is dominant. When ρc ≤ ρc min, the algorithms cannot

² The datasets vary in size, type of attributes, and dimension.

³ The 4×25 datasets are available at http://www.cs.technion.ac.il/~esaher/publications/cost.

⁴ Note that the absolute value of the misclassification cost does not matter because we do not assume same-scale.

⁵ The full results are available at http://www.cs.technion.ac.il/~esaher/publications/rbc.

