anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
in depth k but does not give a direct estimation of the resulting tree size. Thus, when k < |A|, this heuristic is more informed than entropy¹ but can still produce misleading results. In most cases we do not know in advance what value of k would be sufficient for correctly learning the target concept. Invoking exhaustive lookahead, i.e., lookahead to depth k = |A|, will obviously lead to optimal splits, but its computational costs are prohibitively high. In the following section, we propose an alternative approach for evaluating attributes that overcomes the above-mentioned drawbacks of ID3-k.
3.5 A Contract Algorithm for Learning Decision Trees
Motivated by the advantages of smaller decision trees, we introduce a novel algorithm that, given an attribute a, evaluates it by estimating the size of the minimal tree under it. The estimation is based on Monte-Carlo sampling of the space of consistent trees rooted at a: we estimate the minimum by the size of the smallest tree in the sample. The number of trees in the sample depends on the available resources, and the quality of the estimation is expected to improve with increased sample size.
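The min-of-sample estimator described above can be sketched in a few lines. The generator interface below is our assumption for illustration: any stochastic procedure that grows one consistent tree and reports its size (such as the RTG procedure, or the SID3 procedure introduced next) would plug in as `generate_tree_size`.

```python
def sample_min(generate_tree_size, sample_size):
    """Monte-Carlo estimate of the minimal consistent tree size.

    Draws `sample_size` trees from the stochastic generator and keeps
    the smallest observed size as the estimate of the true minimum.
    `generate_tree_size` is assumed to be a zero-argument callable that
    grows one random consistent tree and returns its size.
    """
    return min(generate_tree_size() for _ in range(sample_size))
```

Because the estimate is the minimum over the sample, it can only improve (never worsen) as more trees are drawn, which is what makes the contract anytime: extra time buys a larger sample and a tighter estimate.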
One way to sample the space of trees is to repeatedly produce random consistent trees using the RTG procedure. Since the space of consistent decision trees is large, such a sample might be a poor representative and the resulting estimation inaccurate. We propose an alternative sampling approach that produces trees of smaller expected size. Such a sample is likely to lead to a better estimation of the minimum.
Our approach is based on a new tree generation algorithm that we designed, called Stochastic ID3 (SID3). Running this algorithm repeatedly allows the space of "good" trees to be sampled semi-randomly. In SID3, rather than choosing the attribute that maximizes the information gain, we choose the splitting attribute randomly, with likelihood proportional to its information gain.³ However, if there are attributes that decrease the entropy to zero, then one of them is picked at random. The attribute selection procedure of SID3 is listed in Figure 3.4.
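A minimal sketch of this selection step follows. The data representation — examples as `(features_dict, label)` pairs — and the helper names are our assumptions for illustration; the actual procedure is the one listed in Figure 3.4.

```python
import math
import random

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(examples, attribute):
    """Entropy reduction from splitting `examples` on `attribute`.

    Each example is a (features_dict, label) pair.
    """
    labels = [y for _, y in examples]
    partitions = {}
    for x, y in examples:
        partitions.setdefault(x[attribute], []).append(y)
    remainder = sum(len(p) / len(examples) * entropy(p)
                    for p in partitions.values())
    return entropy(labels) - remainder

def sid3_choose_attribute(examples, attributes, eps=1e-9):
    """SID3 attribute selection (sketch).

    If some attributes reduce the entropy to zero, pick one of them
    uniformly at random; otherwise pick an attribute randomly with
    probability proportional to its information gain. Attributes with
    zero gain get a small positive weight so they retain a positive
    probability of being selected.
    """
    base = entropy([y for _, y in examples])
    gains = {a: information_gain(examples, a) for a in attributes}
    # Attributes whose split leaves zero entropy: gain equals base entropy.
    perfect = [a for a, g in gains.items() if base > eps and base - g < eps]
    if perfect:
        return random.choice(perfect)
    weights = [max(gains[a], eps) for a in attributes]  # zero gain -> tiny weight
    return random.choices(attributes, weights=weights, k=1)[0]
```

Note how the `eps` floor on the weights mirrors footnote 3: even a zero-gain attribute keeps a positive, if tiny, chance of being chosen, so no consistent tree is excluded from the sample space.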
To show that SID3 is a better sampler than RTG, we repeated our sampling experiment (Section 3.3) using SID3 as the sample generator. Figure 3.5 compares the frequency curves of RTG and SID3. Relative to RTG, the graph for SID3 is shifted to the left, indicating that the SID3 trees are clearly smaller.
³ We make sure that attributes with a gain of zero will have a positive probability of being selected.