anytime algorithms for learning anytime classifiers saher ... - Technion
anytime algorithms for learning anytime classifiers saher ... - Technion
anytime algorithms for learning anytime classifiers saher ... - Technion
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Technion</strong> - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008<br />
Procedure MC-Evaluate-Continuous-Attribute(E, A, a, p)<br />
Foreach e ∈ E<br />
E≤ ← {e ′ | a(e ′ ) ≤ a(e)}<br />
E> ← {e ′ | a(e ′ ) > a(e)}<br />
entropye ← |E≤|<br />
|E| entropy(E≤) + |E>|<br />
|E| entropy(E>)<br />
gaine ← entropy(E) − entropye<br />
sampleSize ← p · |E|<br />
min ← ∞<br />
Repeat sampleSize times<br />
e ← Choose an example at random from E; <strong>for</strong> each example e,<br />
the probability of selecting it is proportional to gaine<br />
E≤ ← {e ′ | a(e ′ ) ≤ a(e)}<br />
E> ← {e ′ | a(e ′ ) > a(e)}<br />
totale ← |SID3(E≤, A)| + |SID3(E>, A)|<br />
If totale < min<br />
min ← totale<br />
bestTest ← a(e)<br />
Return 〈min, bestTest〉<br />
Figure 3.9: Monte Carlo evaluation of continuous attributes using SDI3<br />
repetitions, candidates with high in<strong>for</strong>mation gain may have multiple instances<br />
in the sample, resulting in several SID3 invocations.<br />
The resources allocated <strong>for</strong> evaluating a continuous attribute using the above<br />
method are determined by the sample size. If our goal is to devote a similar<br />
amount of resources to all attributes, then we can use r as the size of the sample.<br />
Such an approach, however, does not take into account the size of the population<br />
to be sampled. We use a simple alternative approach of taking samples with size<br />
p · |E| where p is a predetermined parameter, set by the user according to the<br />
available resources. 5 Note that p can be greater than one because we sample with<br />
repetition.<br />
We name this variant of the LSID3 algorithm LSID3-MC, and <strong>for</strong>malize it in<br />
Figure 3.9. LSID3-MC can serve as an <strong>anytime</strong> algorithm that is parameterized<br />
by p.<br />
5 Similarly to the parameter r, a mapping from the available time to p is needed (see Section<br />
3.5.4).<br />
31