
in depth k but does not give a direct estimation of the resulting tree size. Thus, when k < |A|, this heuristic is more informed than entropy_1 but can still produce misleading results. In most cases we do not know in advance what value of k would be sufficient for correctly learning the target concept. Invoking exhaustive lookahead, i.e., lookahead to depth k = |A|, will obviously lead to optimal splits, but its computational costs are prohibitively high. In the following section, we propose an alternative approach for evaluating attributes that overcomes the above-mentioned drawbacks of ID3-k.

3.5 A Contract Algorithm for Learning Decision Trees

Motivated by the advantages of smaller decision trees, we introduce a novel algorithm that, given an attribute a, evaluates it by estimating the size of the minimal tree under it. The estimation is based on Monte-Carlo sampling of the space of consistent trees rooted at a. We estimate the minimum by the size of the smallest tree in the sample. The number of trees in the sample depends on the available resources, and the quality of the estimation is expected to improve as the sample grows.
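Viewed procedurally, the estimator is a simple contract loop. The following Python sketch is illustrative only: sample_tree stands for any generator of random consistent trees (such as the RTG or SID3 procedures discussed in this chapter) and tree_size for a node-count routine; both names are our assumptions, not the thesis's actual interface.

```python
def estimate_min_tree_size(examples, attr, sample_size, sample_tree, tree_size):
    """Monte-Carlo estimate of the minimal consistent tree under attr.

    sample_tree(examples, attr) is assumed to return one random tree
    that is consistent with examples and rooted at attr; tree_size(tree)
    counts its nodes. sample_size is the contract parameter: more
    resources allow a larger sample, and hence a tighter upper-bound
    estimate of the true minimum.
    """
    return min(tree_size(sample_tree(examples, attr))
               for _ in range(sample_size))
```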

One way to sample the space of trees is to repeatedly produce random consistent trees using the RTG procedure. Since the space of consistent decision trees is large, such a sample might be a poor representative and the resulting estimation inaccurate. We propose an alternative sampling approach that produces trees of smaller expected size. Such a sample is likely to lead to a better estimation of the minimum.

Our approach is based on a new tree generation algorithm that we designed, called Stochastic ID3 (SID3). Using this algorithm repeatedly allows the space of “good” trees to be sampled semi-randomly. In SID3, rather than choose an attribute that maximizes the information gain, we choose the splitting attribute randomly. The likelihood that an attribute will be chosen is proportional to its information gain.³ However, if there are attributes that decrease the entropy to zero, then one of them is picked randomly. The attribute selection procedure of SID3 is listed in Figure 3.4.
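For concreteness, the Python sketch below shows one way to realize the stochastic attribute selection just described. The helper names and the eps floor (our way of implementing footnote 3's guarantee that zero-gain attributes keep a positive probability) are assumptions, not the exact procedure of Figure 3.4.

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def split(examples, attr):
    """Partition (instance, label) pairs by the value of attr."""
    parts = {}
    for x, y in examples:
        parts.setdefault(x[attr], []).append((x, y))
    return list(parts.values())

def information_gain(examples, attr):
    """Reduction in entropy obtained by splitting on attr."""
    n = len(examples)
    labels = [y for _, y in examples]
    remainder = sum(len(part) / n * entropy([y for _, y in part])
                    for part in split(examples, attr))
    return entropy(labels) - remainder

def sid3_choose_attribute(examples, attributes, eps=1e-6):
    """Stochastic attribute selection in the spirit of SID3.

    If some attributes reduce the entropy to zero (every resulting
    subset is pure), one of them is picked uniformly at random.
    Otherwise the choice is random with probability proportional to
    information gain; the assumed eps floor keeps zero-gain
    attributes selectable with positive probability (footnote 3).
    """
    perfect = [a for a in attributes
               if all(entropy([y for _, y in part]) == 0
                      for part in split(examples, a))]
    if perfect:
        return random.choice(perfect)
    weights = [max(information_gain(examples, a), 0.0) + eps
               for a in attributes]
    return random.choices(attributes, weights=weights, k=1)[0]
```

Calling sid3_choose_attribute at every node of a standard top-down induction loop yields one semi-random “good” tree; repeated runs produce the sample fed to the minimum estimator above.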

To show that SID3 is a better sampler than RTG, we repeated our sampling experiment (Section 3.3) using SID3 as the sample generator. Figure 3.5 compares the frequency curves of RTG and SID3. Relative to RTG, the graph for SID3 is shifted to the left, indicating that the SID3 trees are clearly smaller.

³ We make sure that attributes with a gain of zero will have a positive probability of being selected.
