anytime algorithms for learning anytime classifiers saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
Procedure LSID3-Choose-Attribute(E, A, r)
  If r = 0
    Return ID3-Choose-Attribute(E, A)
  Foreach a ∈ A
    Foreach v_i ∈ domain(a)
      E_i ← {e ∈ E | a(e) = v_i}
      min_i ← ∞
      Repeat r times
        T ← SID3(E_i, A − {a})
        min_i ← min(min_i, |T|)
    total_a ← Σ_{i=1}^{|domain(a)|} min_i
  Return a for which total_a is minimal

Figure B.3: Attribute selection in LSID3
accurate. Therefore, LSID3 is expected to improve as r increases. Figure
B.3 lists the procedure for attribute selection as applied by LSID3. Because our
goal is to sample small trees and not to always obtain the smallest tree, we use
LSID3(r = 1) as a sampler. Observe that LSID3 is stochastic by nature, and
therefore we do not need to randomize its decisions.
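The attribute-selection procedure of Figure B.3 can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not part of the original text: examples are represented as `(features_dict, label)` pairs, SID3 is approximated by a learner that picks a split attribute with probability proportional to its information gain (uniformly when all gains are zero), and only attribute values that actually occur in E are considered. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(examples):
    """Shannon entropy of the class labels in `examples`."""
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def partition(examples, attr):
    """Group examples by their value for `attr` (observed values only)."""
    parts = {}
    for features, label in examples:
        parts.setdefault(features[attr], []).append((features, label))
    return parts

def info_gain(examples, attr):
    n = len(examples)
    parts = partition(examples, attr).values()
    return entropy(examples) - sum(len(p) / n * entropy(p) for p in parts)

def id3_choose_attribute(examples, attrs):
    """Greedy ID3 choice: the attribute with maximal information gain."""
    return max(attrs, key=lambda a: info_gain(examples, a))

def sid3_size(examples, attrs, rng):
    """Leaf count of one tree sampled by the assumed SID3-style learner."""
    if len({label for _, label in examples}) <= 1 or not attrs:
        return 1  # pure node, or nothing left to split on: a single leaf
    gains = [max(info_gain(examples, a), 0.0) for a in attrs]
    if max(gains) <= 1e-12:
        chosen = rng.choice(attrs)  # all gains zero: uniform choice
    else:
        chosen = rng.choices(attrs, weights=gains, k=1)[0]
    rest = [a for a in attrs if a != chosen]
    return sum(sid3_size(p, rest, rng)
               for p in partition(examples, chosen).values())

def lsid3_choose_attribute(examples, attrs, r, rng=random):
    """Pick the attribute whose estimated total subtree size is minimal."""
    if r == 0:
        return id3_choose_attribute(examples, attrs)
    best_attr, best_total = None, math.inf
    for a in attrs:
        rest = [b for b in attrs if b != a]
        # total_a: sum over subsets E_i of the smallest of r sampled trees
        total = sum(min(sid3_size(subset, rest, rng) for _ in range(r))
                    for subset in partition(examples, a).values())
        if total < best_total:
            best_attr, best_total = a, total
    return best_attr

# Toy input (an assumption for illustration): label = x1 XOR x2, x3 irrelevant.
xor_data = [({"x1": a, "x2": b, "x3": c}, a ^ b)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(id3_choose_attribute(xor_data, ["x3", "x1", "x2"]))
print(lsid3_choose_attribute(xor_data, ["x3", "x1", "x2"], r=1,
                             rng=random.Random(0)))
```

On this toy input every attribute has zero information gain at the root, so the greedy choice merely returns the first attribute listed, whereas the lookahead estimate favors a relevant attribute because the subtrees sampled beneath it are smaller.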
B.3 Empirical Evaluation
We tested Occam’s empirical principle, as stated in Definition 1, on 20 datasets, 18
of which were chosen arbitrarily from the UCI repository (Asuncion & Newman,
2007), and 2 of which are artificial datasets that represent hard concepts: XOR-5
with 5 additional irrelevant attributes, and the 20-bit Multiplexer. Each dataset
was partitioned into 10 subsets that were used to create 10 learning problems.
Each problem consisted of one subset serving as a testing set and the union of
the remaining 9 as a training set, as in 10-fold cross validation. We sampled
the version space, TDIDT_A(E), for each training set E using the three methods
described in Section B.2.2 and tested the correlation between the size of a tree
(number of leaves) and its accuracy on the associated testing set. The sample
size was 10,000 for RTG and SID3, and 1,000 for LSID3 (due
to its higher cost). We first present and discuss the results for consistent trees
and then address the problem of pruned, inconsistent trees.
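The evaluation protocol described above can be sketched roughly as follows. This is a hedged illustration, not the experimental code: the text here only says that "the correlation" between size and accuracy was tested, so the use of Pearson's coefficient is an assumption, and `sample_tree` is a hypothetical callable that stands in for any of the three samplers (RTG, SID3, LSID3) and returns a `(number_of_leaves, predict_function)` pair.

```python
import math
import random

def ten_fold_problems(examples, k=10, seed=0):
    """Yield k (train, test) pairs; each subset serves as the test set once."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, folds[i]

def pearson(xs, ys):
    """Pearson correlation (an assumed choice of coefficient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (sx * sy)

def size_accuracy_correlation(train, test, sample_tree, n_samples):
    """Correlate tree size (leaves) with test accuracy over sampled trees."""
    sizes, accuracies = [], []
    for _ in range(n_samples):
        n_leaves, predict = sample_tree(train)  # hypothetical sampler
        correct = sum(predict(x) == y for x, y in test)
        sizes.append(n_leaves)
        accuracies.append(correct / len(test))
    return pearson(sizes, accuracies)
```

In this sketch `n_samples` would be 10,000 for RTG and SID3 and 1,000 for LSID3, matching the sample sizes stated above.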