anytime algorithms for learning anytime classifiers saher ... - Technion


Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Procedure SID3-Choose-Attribute(E, A)
  Foreach a ∈ A
    p(a) ← gain-1(E, a)
  If ∃a such that entropy-1(E, a) = 0
    a* ← Choose attribute at random from
         {a ∈ A | entropy-1(E, a) = 0}
  Else
    a* ← Choose attribute at random from A;
         for each attribute a, the probability
         of selecting it is proportional to p(a)
  Return a*

Figure 3.4: Attribute selection in SID3<br />
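The selection rule in Figure 3.4 can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: it assumes that gain-1 and entropy-1 denote ordinary one-level information gain and the weighted entropy after a split, and that examples are `(features, label)` pairs with `features` a dict. The helper names (`entropy`, `split_entropy`, `info_gain`) and the uniform fallback used when every gain is zero are assumptions introduced for the example.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_entropy(examples, attr):
    """Weighted entropy of the partition induced by attr (assumed meaning of entropy-1)."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[features[attr]].append(label)
    total = len(examples)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def info_gain(examples, attr):
    """Information gain of splitting on attr (assumed meaning of gain-1)."""
    return entropy([label for _, label in examples]) - split_entropy(examples, attr)


def sid3_choose_attribute(examples, attributes, rng=random):
    """Pick a split attribute stochastically, following the structure of Figure 3.4."""
    # If some attribute yields a zero-entropy split, choose uniformly among those.
    perfect = [a for a in attributes if split_entropy(examples, a) == 0]
    if perfect:
        return rng.choice(perfect)
    # Otherwise sample with probability proportional to information gain.
    # Gains are clamped at zero to guard against floating-point noise.
    weights = [max(0.0, info_gain(examples, a)) for a in attributes]
    if sum(weights) <= 0:
        # Assumption: fall back to a uniform choice when no attribute has positive gain.
        return rng.choice(attributes)
    return rng.choices(attributes, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when comparing samplers.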

Next, we compared the average minimum found for samples of different sizes. Figure 3.6 shows the results. For the three datasets, the minimal size found by SID3 is strictly smaller than the value found by RTG. Given the same time budget, RTG produced, on average, samples that are twice as large as those of SID3. However, even when the results are normalized (dashed line), SID3 is still superior.

Having decided on the sampler, we are ready to describe our proposed contract algorithm, Lookahead-by-Stochastic-ID3 (LSID3). In LSID3, each candidate split is evaluated by the estimated size of the subtree under it. To estimate the size under an attribute a, LSID3 partitions the set of examples according to the values a can take and repeatedly invokes SID3 to sample the space of trees consistent with each subset. Summing the minimal tree size for each subset gives an estimate of the minimal total tree size under a.

LSID3 is a contract algorithm parameterized by r, the sample size. LSID3 with r = 0 is defined to choose the splitting attribute using the standard ID3 selection method. Figure 3.7 illustrates the choice of splitting attributes as made by LSID3. In the given example, SID3 is called twice for each subset, and the evaluation of the examined attribute a is the sum of the two minima: min(4, 3) + min(2, 6) = 5. The method for choosing a splitting attribute is formalized in Figure 3.8.
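The evaluation step described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure of Figure 3.8: `sample_tree_size` is a hypothetical callable standing in for "run SID3 on this subset and return the size of the resulting tree", examples are `(features, label)` pairs, and ties between attributes are broken by list order. The r = 0 case, which the text defines as plain ID3 selection, is deliberately left out.

```python
from collections import defaultdict


def partition(examples, attr):
    """Group (features, label) examples by their value of attr."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[features[attr]].append((features, label))
    return groups


def lsid3_evaluate(examples, attr, remaining_attrs, r, sample_tree_size):
    """Estimate the minimal total tree size under attr.

    For each subset induced by attr, draw r sampled tree sizes and keep the
    smallest; the estimate is the sum of these minima over all subsets.
    """
    total = 0
    for subset in partition(examples, attr).values():
        total += min(sample_tree_size(subset, remaining_attrs) for _ in range(r))
    return total


def lsid3_choose_attribute(examples, attributes, r, sample_tree_size):
    """Return the attribute with the smallest estimated subtree size (requires r >= 1)."""
    if r < 1:
        raise ValueError("this sketch covers r >= 1 only; r = 0 means plain ID3")
    return min(
        attributes,
        key=lambda a: lsid3_evaluate(
            examples, a, [b for b in attributes if b != a], r, sample_tree_size
        ),
    )


# Replaying the numbers quoted in the text: two subsets, two samples each,
# with a scripted sampler that returns sizes 4, 3 and then 2, 6.
demo_examples = [({"a": 0}, "+"), ({"a": 0}, "-"), ({"a": 1}, "+"), ({"a": 1}, "-")]
scripted_sizes = iter([4, 3, 2, 6])
estimate = lsid3_evaluate(
    demo_examples, "a", [], 2, lambda subset, attrs: next(scripted_sizes)
)
print(estimate)  # 5, i.e. min(4, 3) + min(2, 6)
```

Keeping the sampler as a parameter separates the lookahead logic from tree construction, so the same evaluation code can be exercised with a scripted sampler, as here, or with a real stochastic tree builder.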

To analyze the time complexity of LSID3, let m be the total number of examples and n be the total number of attributes. For a given node y, we denote by n_y the number of candidate attributes at y, and by m_y the number of examples that reach y. ID3, at each node y, calculates gain for n_y attributes using m_y examples, i.e., the complexity of choosing an attribute is O(n_y · m_y). At level i of the tree, the total number of examples is bounded by m and the number of
