
anytime algorithms for learning anytime classifiers saher ... - Technion


Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Table 4.2: Average cost of classification as a percentage of the standard cost of classification for different mc values. The numbers represent the average over all 105 datasets and the associated confidence intervals (α = 5%).

mc     C4.5      LSID3     IDX       CSID3     EG2       DTMC      ICET       ACT
100    50.6 ±4   45.3 ±4   34.4 ±4   41.7 ±4   35.1 ±4   14.6 ±2   24.3 ±3.1  15.2 ±2
500    49.9 ±4   43.0 ±4   42.4 ±4   45.2 ±4   42.5 ±4   37.7 ±3   36.3 ±3.1  34.5 ±3
1000   50.4 ±5   42.4 ±5   47.5 ±4   47.8 ±4   47.3 ±4   47.1 ±4   40.6 ±3.9  39.1 ±4
5000   53.3 ±6   43.6 ±6   58.1 ±6   54.3 ±6   57.3 ±6   57.6 ±5   45.7 ±5.6  41.5 ±6
10000  54.5 ±6   44.5 ±7   60.8 ±6   56.2 ±6   59.9 ±6   59.5 ±6   47.1 ±6.0  41.4 ±6

Statistical Significance

For each problem, a single 10-fold cross-validation experiment was conducted. The same partition into train and test sets was used for all compared algorithms. To determine the statistical significance of the performance differences between ACT, ICET, and DTMC we used two tests:

• A t-test at the α = 5% significance level. For each method we counted how many times it was a significant winner.

• The Wilcoxon test (Demsar, 2006), which compares classifiers over multiple datasets and states whether one method is significantly better than the other (α = 5%).
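The two tests can be illustrated with a minimal sketch, which is not the implementation used in the thesis. It assumes that the t-test is a paired test over the 10 per-fold costs of a single problem, and that the Wilcoxon test is the signed-rank test with the normal approximation, applied to the per-dataset average costs of two methods. The function names, the hard-coded critical values (2.262 for 9 degrees of freedom, 1.96 for the normal approximation), and the handling of zero differences are assumptions made for this example.

```python
import math
from statistics import mean, stdev

# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of
# freedom (10 folds). Hard-coded so the sketch needs only the standard library;
# it is valid only when exactly 10 paired values are compared.
T_CRIT_DF9 = 2.262


def paired_t_winner(costs_a, costs_b, t_crit=T_CRIT_DF9):
    """Return 'a', 'b', or None for a paired t-test on per-fold costs.

    Lower cost is better, so a significantly negative mean difference
    (a - b) makes 'a' the winner.
    """
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    m = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        # All folds differ by the same amount: no variance to test against.
        if m == 0:
            return None
        return "a" if m < 0 else "b"
    t = m / (sd / math.sqrt(len(diffs)))
    if abs(t) <= t_crit:
        return None
    return "a" if t < 0 else "b"


def wilcoxon_z(costs_a, costs_b):
    """Normal-approximation z statistic of the Wilcoxon signed-rank test.

    Each pair (costs_a[i], costs_b[i]) holds the costs of two methods on
    dataset i. Ranks of zero differences are split evenly between the two
    rank sums (a simplification assumed here).
    """
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    zero_share = 0.5 * sum(r for r, d in zip(ranks, diffs) if d == 0)
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0) + zero_share
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0) + zero_share
    t_stat = min(r_plus, r_minus)
    return (t_stat - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


def wilcoxon_significant(costs_a, costs_b, z_crit=1.96):
    """True if the two methods differ significantly at alpha = 0.05."""
    return wilcoxon_z(costs_a, costs_b) < -z_crit
```

Under these assumptions, `paired_t_winner` would be called once per problem and the wins tallied per method, while `wilcoxon_significant` would be called once per pair of methods on their costs across all datasets. The normal approximation is only reasonable for a fairly large number of datasets.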

4.6.3 Fixed-time Comparison

For each of the 105 × 5 problem instances, we ran the different algorithms, including ACT with r = 5. We chose r = 5 so the average runtime of ACT would be shorter than ICET over all problems. The other methods have much shorter runtime due to their greedy nature.

Table 4.2 summarizes the results and Tables 4.4-4.8 list the detailed results for the different datasets.⁸ Each table corresponds to a different mc value. Each pair of numbers represents the average normalized cost and its associated confidence interval (α = 5%). Figure 4.7 illustrates the average results and plots the normalized costs for the different algorithms and misclassification costs.
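To make the "average normalized cost ± confidence interval" pairs concrete, the following is a rough sketch of how such a pair could be computed. It assumes that the normalized cost of a dataset is its classification cost expressed as a percentage of the standard cost, and that the interval is a normal-approximation interval over datasets (z = 1.96 for α = 5%). The exact procedure used for the tables is not stated in this section, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def normalized_costs(costs, standard_costs):
    """Express each cost as a percentage of the corresponding standard cost."""
    return [100.0 * c / s for c, s in zip(costs, standard_costs)]


def mean_with_ci(values, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval.

    z = 1.96 corresponds to alpha = 5%; this is an assumption of the sketch,
    not a detail taken from the text.
    """
    half_width = z * stdev(values) / math.sqrt(len(values))
    return mean(values), half_width
```

A table entry such as "34.5 ±3" would then correspond to the two returned values, rounded, for one algorithm and one mc value.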

Statistical significance test results for ACT, ICET, and DTMC are given in Table 4.3. The algorithms are compared using both the t-test and the Wilcoxon

⁸ For the 25 datasets with automatic cost assignment, the average over the 4 seeds is listed. The full results are available at http://www.cs.technion.ac.il/~esaher/publications/cost.
