18.11.2012 Views

anytime algorithms for learning anytime classifiers saher ... - Technion

anytime algorithms for learning anytime classifiers saher ... - Technion

anytime algorithms for learning anytime classifiers saher ... - Technion

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Technion</strong> - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008<br />

Procedure MC-Evaluate-Continuous-Attribute(E, A, a, p)<br />

Foreach e ∈ E<br />

E≤ ← {e ′ | a(e ′ ) ≤ a(e)}<br />

E> ← {e ′ | a(e ′ ) > a(e)}<br />

entropye ← |E≤|<br />

|E| entropy(E≤) + |E>|<br />

|E| entropy(E>)<br />

gaine ← entropy(E) − entropye<br />

sampleSize ← p · |E|<br />

min ← ∞<br />

Repeat sampleSize times<br />

e ← Choose an example at random from E; <strong>for</strong> each example e,<br />

the probability of selecting it is proportional to gaine<br />

E≤ ← {e ′ | a(e ′ ) ≤ a(e)}<br />

E> ← {e ′ | a(e ′ ) > a(e)}<br />

totale ← |SID3(E≤, A)| + |SID3(E>, A)|<br />

If totale < min<br />

min ← totale<br />

bestTest ← a(e)<br />

Return 〈min, bestTest〉<br />

Figure 3.9: Monte Carlo evaluation of continuous attributes using SDI3<br />

repetitions, candidates with high in<strong>for</strong>mation gain may have multiple instances<br />

in the sample, resulting in several SID3 invocations.<br />

The resources allocated <strong>for</strong> evaluating a continuous attribute using the above<br />

method are determined by the sample size. If our goal is to devote a similar<br />

amount of resources to all attributes, then we can use r as the size of the sample.<br />

Such an approach, however, does not take into account the size of the population<br />

to be sampled. We use a simple alternative approach of taking samples with size<br />

p · |E| where p is a predetermined parameter, set by the user according to the<br />

available resources. 5 Note that p can be greater than one because we sample with<br />

repetition.<br />

We name this variant of the LSID3 algorithm LSID3-MC, and <strong>for</strong>malize it in<br />

Figure 3.9. LSID3-MC can serve as an <strong>anytime</strong> algorithm that is parameterized<br />

by p.<br />

5 Similarly to the parameter r, a mapping from the available time to p is needed (see Section<br />

3.5.4).<br />

31

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!