

attributes exceeds 35, but even with 60 attributes the algorithm achieves an accuracy of about 82%. The skewing algorithms, on the other hand, are much more sensitive to irrelevant attributes, and their performance degrades drastically as the number of irrelevant attributes grows. When the number of attributes exceeds 35, the skewing algorithms become no better than a random guesser. The consistent advantage of LSID3 is also clear in terms of tree size: the trees produced by ID3 and skewing are significantly larger.

To be fair, it is important to note that LSID3 had a much longer runtime than skewing with its default parameters. However, our previous experiments with parity concepts showed that the performance of skewing does not improve with time; hence, the results are expected to be the same even if skewing were allocated the same amount of time. To verify this, we repeated the experiment for 35 and 60 attributes and allocated skewing the same time as LSID3(r = 5). The results were similar to those reported in Figure 6.1, and no improvement in the performance of skewing was observed.
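To make the experimental setting concrete, the following is a minimal sketch of how a parity concept padded with irrelevant attributes can be generated. The function name, sample sizes, and the use of NumPy are our own illustrative assumptions, not the exact harness used in the thesis.

```python
import numpy as np

def parity_dataset(n_samples, n_relevant, n_total, seed=0):
    """Generate binary examples for an n_relevant-bit parity concept,
    padded with (n_total - n_relevant) irrelevant binary attributes.
    Hypothetical helper for illustration only."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_total))
    # The label is the XOR (parity) of the first n_relevant attributes;
    # the remaining attributes carry no information about the class.
    y = X[:, :n_relevant].sum(axis=1) % 2
    return X, y

# Example: a 5-bit parity target hidden among 60 attributes in total.
X, y = parity_dataset(n_samples=1000, n_relevant=5, n_total=60)
```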

6.1.2 Other Cost-insensitive Decision-Tree Inducers

Papagelis and Kalles (2001) studied GATree, a learner that uses genetic algorithms for building decision trees. GATree does not adopt the top-down scheme. Instead, it starts with a population of random trees and uses a mutation operation that randomly changes a splitting test and a crossover operation that exchanges subtrees. Unlike our approach, GATree is not designed to generate consistent decision trees and searches the space of all possible trees over a given set of attributes. Thus, it is not appropriate for applications where a consistent tree is required. Like most genetic algorithms, GATree requires cautious parameter tuning, and its performance depends greatly on the chosen setting. Comparing GATree to our algorithm (see Section 3.7.6) shows that, especially for hard concepts, it is much better to invest the resources in careful tuning of a single tree than to perform genetic search over a large population of decision trees.
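As an illustration of the two genetic operators just described, here is a minimal sketch: mutation redraws the splitting test of a random node, and crossover exchanges randomly chosen subtrees between two parents. The Node class and the in-place swap are our own simplified choices, not GATree's actual implementation.

```python
import random

class Node:
    """Binary decision-tree node: internal nodes hold a splitting test
    (an attribute index); leaves hold a class label."""
    def __init__(self, attr=None, left=None, right=None, label=None):
        self.attr, self.left, self.right, self.label = attr, left, right, label

def all_nodes(tree):
    """Collect every node in the tree (internal nodes and leaves)."""
    stack, nodes = [tree], []
    while stack:
        n = stack.pop()
        nodes.append(n)
        if n.label is None:            # internal node: descend into children
            stack.extend([n.left, n.right])
    return nodes

def mutate(tree, n_attrs):
    """Mutation: redraw the splitting test of a randomly chosen node,
    if it turns out to be an internal node."""
    node = random.choice(all_nodes(tree))
    if node.label is None:
        node.attr = random.randrange(n_attrs)

def crossover(t1, t2):
    """Crossover: exchange randomly chosen subtrees between two parents
    by swapping the contents of the two chosen subtree roots in place."""
    a, b = random.choice(all_nodes(t1)), random.choice(all_nodes(t2))
    a.attr, b.attr = b.attr, a.attr
    a.left, b.left = b.left, a.left
    a.right, b.right = b.right, a.right
    a.label, b.label = b.label, a.label
```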

Utgoff et al. (1997) presented DMTI (Direct Metric Tree Induction), an induction algorithm that chooses an attribute by building a single decision tree under each candidate attribute and evaluating it using various measures. Several possible tree measures were examined, and the MDL (Minimum Description Length) measure performed best. DMTI is similar to LSID3(r = 1) but, unlike LSID3, it can only use a fixed amount of additional resources and hence cannot serve as an anytime algorithm. When the user can afford to use more resources than DMTI requires, the latter provides no means to improve the learned model further. Furthermore, DMTI uses a single greedy lookahead tree for attribute evaluation, while we use a biased sample of the possible lookahead trees.
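The selection scheme just described can be sketched as follows; build_tree and mdl_cost are hypothetical stand-ins for DMTI's tree grower and its MDL measure, passed in as parameters.

```python
def choose_attribute_dmti(data, attributes, build_tree, mdl_cost):
    """DMTI-style selection: evaluate each candidate attribute by the
    quality of a full tree built under it, rather than by a local split
    measure. build_tree(data, root_attr) and mdl_cost(tree, data) are
    assumed helpers, not DMTI's actual API."""
    best_attr, best_cost = None, float("inf")
    for attr in attributes:
        tree = build_tree(data, root_attr=attr)  # one lookahead tree per candidate
        cost = mdl_cost(tree, data)              # lower description length is better
        if cost < best_cost:
            best_attr, best_cost = attr, cost
    return best_attr
```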

Our experiments with DMTI (as available online) show that while it can solve<br />

