anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
Procedure SID3-Choose-Attribute(E, A)
  Foreach a ∈ A
    p(a) ← gain-1(E, a)
  If ∃a such that entropy-1(E, a) = 0
    a∗ ← Choose attribute at random from
         {a ∈ A | entropy-1(E, a) = 0}
  Else
    a∗ ← Choose attribute at random from A;
         for each attribute a, the probability
         of selecting it is proportional to p(a)
  Return a∗
Figure 3.4: Attribute selection in SID3<br />
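The procedure in Figure 3.4 can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation. It assumes that gain-1 and entropy-1 denote the standard information gain and the weighted class entropy after the split, that examples are (feature-dict, label) pairs, and that a uniform choice is acceptable when every gain is zero. The helper names `entropy`, `split_entropy`, and `sid3_choose_attribute` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def split_entropy(examples, attr):
    """Weighted class entropy after splitting on attr (assumed meaning of entropy-1)."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[features[attr]].append(label)
    n = len(examples)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def sid3_choose_attribute(examples, attributes, rng=random):
    """Pick a split attribute stochastically, following Figure 3.4."""
    attributes = list(attributes)
    base = entropy([label for _, label in examples])
    h = {a: split_entropy(examples, a) for a in attributes}

    # If some attribute yields pure subsets, choose uniformly among those.
    perfect = [a for a in attributes if h[a] == 0]
    if perfect:
        return rng.choice(perfect)

    # Otherwise sample with probability proportional to information gain.
    # Clamp at zero to guard against tiny negative values from rounding.
    gains = [max(0.0, base - h[a]) for a in attributes]
    if sum(gains) <= 0:
        # Assumption: the figure does not say what to do when all gains
        # are zero, so fall back to a uniform choice.
        return rng.choice(attributes)
    return rng.choices(attributes, weights=gains, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the sampled choice reproducible, which is convenient when comparing runs.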
Next, we compared the average minimum found for samples of different sizes.
Figure 3.6 shows the results. For all three datasets, the minimal size found by
SID3 is strictly smaller than the value found by RTG. Given the same time
budget, RTG produced, on average, samples twice as large as those of
SID3. However, even when the results are normalized (dashed line), SID3 is still
superior.
Having chosen the sampler, we are ready to describe our proposed
contract algorithm, Lookahead-by-Stochastic-ID3 (LSID3). In LSID3, each candidate
split is evaluated by the estimated size of the subtree under it. To estimate
the size under an attribute a, LSID3 partitions the set of examples according to
the values a can take and repeatedly invokes SID3 to sample the space of trees
consistent with each subset. Summing up the minimal tree size for each subset
gives an estimate of the minimal total tree size under a.
LSID3 is a contract algorithm parameterized by r, the sample size. LSID3
with r = 0 is defined to choose the splitting attribute using the standard ID3
selection method. Figure 3.7 illustrates the choice of splitting attributes as made
by LSID3. In the given example, SID3 is called twice for each subset and the
evaluation of the examined attribute a is the sum of the two minima:
min(4, 3) + min(2, 6) = 5. The method for choosing a splitting attribute is formalized in
Figure 3.8.
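The evaluation step described above can be sketched in Python. This is an illustrative sketch under stated assumptions rather than the procedure of Figure 3.8: `sample_tree_size` is a hypothetical callback that runs one SID3 sample on a subset and returns the size of the resulting tree, `id3_choose` is a hypothetical callback for the standard ID3 selection, and picking the attribute with the smallest estimate is assumed as the selection rule.

```python
from collections import defaultdict


def lsid3_evaluate(examples, attr, remaining_attrs, r, sample_tree_size):
    """Estimate the minimal tree size under attr.

    Partitions the examples by the value of attr, draws r tree-size
    samples for each subset, and sums the per-subset minima.
    """
    subsets = defaultdict(list)
    for features, label in examples:
        subsets[features[attr]].append((features, label))

    total = 0
    for subset in subsets.values():
        # Keep the smallest of r sampled tree sizes for this subset.
        total += min(sample_tree_size(subset, remaining_attrs)
                     for _ in range(r))
    return total


def lsid3_choose_attribute(examples, attributes, r,
                           sample_tree_size, id3_choose):
    """Choose a split attribute with sample size r."""
    attributes = list(attributes)
    if r == 0:
        # r = 0 is defined to behave like standard ID3.
        return id3_choose(examples, attributes)

    def estimate(a):
        rest = [b for b in attributes if b != a]
        return lsid3_evaluate(examples, a, rest, r, sample_tree_size)

    # Assumption: prefer the attribute with the smallest estimated size.
    return min(attributes, key=estimate)
```

With a binary attribute, r = 2, and a stub sampler that returns the sizes 4, 3, 2, 6 in that order, `lsid3_evaluate` reproduces the example from the text: min(4, 3) + min(2, 6) = 5.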
To analyze the time complexity of LSID3, let m be the total number of examples
and n be the total number of attributes. For a given node y, we denote by
n_y the number of candidate attributes at y, and by m_y the number of examples
that reach y. ID3, at each node y, calculates gain for n_y attributes using m_y
examples, i.e., the complexity of choosing an attribute is O(n_y · m_y). At level i
of the tree, the total number of examples is bounded by m and the number of