simpler XOR and multiplexer problems, its limited lookahead is not sufficient for learning complex concepts such as XOR-10: DMTI achieved an accuracy of 50%. IIDT and LSID3, by producing larger samples, overcame this problem and reached high accuracies.
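To make the difficulty concrete, the following sketch (ours, not taken from any of the cited systems) generates the XOR-n concept and shows why shallow, gain-based lookahead is blind to it: conditioning on any single attribute leaves the class distribution unchanged.

```python
import itertools

def xor_n_dataset(n):
    """All 2^n binary vectors, labeled with their parity (the XOR of the bits)."""
    return [(bits, sum(bits) % 2) for bits in itertools.product([0, 1], repeat=n)]

data = xor_n_dataset(10)  # XOR-10: 1024 labeled examples
# Fixing any single attribute leaves the two classes perfectly balanced,
# so a greedy, gain-based learner sees no signal in any individual split:
labels = [label for bits, label in data if bits[0] == 1]
print(sum(labels) / len(labels))  # 0.5 -- zero information gain on attribute 0
```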
Kim and Loh (2001) introduced CRUISE, a bias-free decision tree learner that attempts to produce more compact trees by (1) using multiway splits, with one subnode for each class, and (2) examining pairwise interactions among the variables. CRUISE is able to learn the XOR-2 and Chess-board (numeric XOR-2) concepts. Much like ID3-k with k = 2, however, it cannot recognize more complex interactions.
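A hypothetical illustration of the idea, rather than CRUISE's actual test: on the XOR-2 concept, a chi-squared test finds each attribute independent of the class on its own, while the contingency table over the attribute pair shows strong dependence.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)
b = rng.integers(0, 2, 1000)
y = a ^ b  # the XOR-2 concept

# Contingency table of a single attribute against the class: no dependence.
single = np.array([[np.sum((a == v) & (y == c)) for c in (0, 1)] for v in (0, 1)])
# Contingency table of the attribute *pair* against the class: full dependence.
joint = np.array([[np.sum((a == va) & (b == vb) & (y == c)) for c in (0, 1)]
                  for va in (0, 1) for vb in (0, 1)])

print(chi2_contingency(single)[1])  # large p-value: a alone looks irrelevant
print(chi2_contingency(joint)[1])   # tiny p-value: the pair (a, b) predicts y
```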
Bennett (1994) presented GTO, a non-greedy approach for repairing multivariate decision trees. GTO requires an initial tree as input. The algorithm retains the structure of the tree but attempts to simultaneously improve all of its multivariate decisions using iterative linear programming. GTO and IIDT both use a non-greedy approach to improve a decision tree. The advantage of GTO is its use of a well-established numerical optimization method; its disadvantages are its inability to modify the initial structure and its inability to exploit additional resources (beyond those needed for convergence).
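As a rough sketch of the flavor of such linear programs, the following minimizes the total slack of a single hyperplane decision; the objective and the restriction to one node are our simplifying assumptions, whereas GTO optimizes all of the tree's decisions jointly.

```python
import numpy as np
from scipy.optimize import linprog

def fit_hyperplane_lp(X, y):
    """Minimize total slack s subject to y_i * (w . x_i + b) >= 1 - s_i, s >= 0."""
    n, d = X.shape
    c = np.concatenate([np.zeros(d + 1), np.ones(n)])          # objective: sum of slacks
    A = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])  # constraint rows
    bounds = [(None, None)] * (d + 1) + [(0, None)] * n        # w, b free; s >= 0
    res = linprog(c, A_ub=A, b_ub=-np.ones(n), bounds=bounds)
    return res.x[:d], res.x[d]

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])  # labels in {-1, +1}
w, b = fit_hyperplane_lp(X, y)
print(w, b)  # a multivariate decision: classify by the sign of w . x + b
```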
Freund and Mason (1999) described a new type of classification mechanism, the alternating decision tree (ADTree). ADTrees provide a method for visualizing decision stumps in an ordered and logical way that exposes correlations. Freund and Mason presented an iterative algorithm for learning ADTrees, based on boosting. While it has not been studied as an anytime algorithm, this learner can be viewed as one: it starts with a constant prediction and adds one decision stump at a time, and if stopped, it returns the current tree. The anytime behavior of ADTree induction, however, is problematic. Additional time resources can only be used to add more rules and therefore might result in large, overcomplicated trees. Moreover, ADTree is not designed to tackle the problem of attribute interdependencies because it evaluates each split independently.
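The interruptible loop just described can be sketched as follows; plain AdaBoost over threshold stumps stands in for the actual ADTree learning rule, so the details are illustrative only.

```python
import time
import numpy as np

def best_stump(X, y, w):
    """Return the weighted-error-minimizing threshold test (attr, thresh, sign, err)."""
    best = (0, 0.0, 1, 1.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= t, sign, -sign)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, sign, err)
    return best

def anytime_stumps(X, y, budget_seconds):
    """Constant prediction plus one boosted stump per iteration, until time is up."""
    deadline = time.time() + budget_seconds
    w = np.full(len(y), 1.0 / len(y))
    base = 0.5 * np.log((y == 1).mean() / max((y == -1).mean(), 1e-12))
    stumps = []                            # the constant prediction is `base`
    while time.time() < deadline:          # if interrupted, return what we have
        j, t, sign, err = best_stump(X, y, w)
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        pred = np.where(X[:, j] >= t, sign, -sign)
        w = w * np.exp(-alpha * y * pred)  # boosting-style reweighting
        w /= w.sum()
        stumps.append((j, t, sign, alpha))
    return base, stumps
```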
Murthy, Kasif, and Salzberg (1994) introduced OC1 (Oblique Classifier 1), a new algorithm for the induction of oblique decision trees. Oblique decision trees use multivariate tests that are not necessarily parallel to an axis. OC1 builds decision trees that contain linear combinations of one or more attributes at each internal node; these trees then partition the space of examples with both oblique and axis-parallel hyperplanes. The problem of searching for the best oblique split is much more difficult than that of searching for the best axis-parallel split because the number of candidate oblique splits is exponential. Therefore, OC1 takes a greedy approach and attempts to find locally good splits rather than optimal ones. Our LSID3 algorithm can be generalized to support the induction of oblique decision trees by using OC1 as the sampler and by considering oblique splits. Furthermore, OC1 can be converted into an anytime algorithm by considering more splits at each node.
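A rough sketch of such a local search for an oblique split (the real OC1 combines deterministic coefficient perturbation with randomized restarts and jumps): perturb one coefficient of the hyperplane at a time and keep only impurity-reducing moves. The number of candidates examined per node, `steps` below, is exactly the knob that gives the algorithm its anytime character.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a vector of class labels (nonnegative ints)."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)

def split_impurity(X, y, w, b):
    """Weighted impurity of the partition induced by the test w . x > b."""
    left = (X @ w) > b
    n_left = left.sum()
    return (n_left * gini(y[left]) + (len(y) - n_left) * gini(y[~left])) / len(y)

def oblique_local_search(X, y, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])                   # random initial hyperplane
    b = 0.0
    best = split_impurity(X, y, w, b)
    for _ in range(steps):                            # more steps -> better splits
        w2 = w.copy()
        w2[rng.integers(X.shape[1])] += rng.normal()  # perturb one coefficient
        imp = split_impurity(X, y, w2, b)
        if imp < best:
            w, best = w2, imp                         # keep only improving moves
    return w, b, best
```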