
simpler XOR and multiplexer problems, its limited lookahead is not sufficient for learning complex concepts such as XOR-10: DMTI achieved an accuracy of 50%, no better than chance. IIDT and LSID3, by producing larger samples, overcame this problem and reached high accuracies.
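To see why limited lookahead fails on parity concepts, note that for XOR-n every individual attribute carries zero information gain: fixing one attribute leaves the parity of the remaining attributes uniformly distributed. The following minimal Python sketch (illustrative only, not code from the thesis) verifies this for XOR-10:

```python
# Illustrative sketch: why gain-based splitting sees no signal on XOR-n.
from itertools import product
from math import log2

def entropy(labels):
    p = sum(labels) / len(labels)
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def info_gain(examples, labels, attr):
    """Information gain of splitting on a single binary attribute."""
    gain = entropy(labels)
    for v in (0, 1):
        subset = [y for x, y in zip(examples, labels) if x[attr] == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

n = 10  # XOR-10: the label is the parity of ten binary attributes
examples = list(product((0, 1), repeat=n))
labels = [sum(x) % 2 for x in examples]
print([round(info_gain(examples, labels, a), 6) for a in range(n)])
# -> all zeros: no single attribute looks informative in isolation
```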

Kim and Loh (2001) introduced CRUISE, a bias-free decision tree learner that attempts to produce more compact trees by (1) using multiway splits, with one subnode for each class, and (2) examining pairwise interactions among the variables. CRUISE is able to learn XOR-2 and Chess-board (numeric XOR-2) concepts. Much like ID3-k with k = 2, however, it cannot recognize more complex interactions.
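As an illustration of the multiway-split idea, the sketch below is a deliberate simplification: CRUISE derives its splits from linear discriminant analysis, whereas here we merely cut a numeric attribute at the midpoints between sorted class means, yielding one interval (subnode) per class:

```python
# Hedged sketch of a class-per-subnode multiway split on a numeric
# attribute. NOT CRUISE's exact procedure (which uses LDA); it only
# illustrates one branch per class instead of a binary threshold.
from statistics import mean

def multiway_cutpoints(values, labels):
    """Cut the attribute range between sorted class means, giving one
    interval per class; x is routed to the interval that contains it."""
    classes = sorted(set(labels),
                     key=lambda c: mean(v for v, y in zip(values, labels) if y == c))
    means = [mean(v for v, y in zip(values, labels) if y == c) for c in classes]
    cuts = [(a + b) / 2 for a, b in zip(means, means[1:])]
    return classes, cuts

values = [0.1, 0.3, 1.1, 1.2, 2.4, 2.6]
labels = ["a", "a", "b", "b", "c", "c"]
print(multiway_cutpoints(values, labels))
# -> (['a', 'b', 'c'], [0.675, 1.825])
```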

Bennett (1994) presented GTO, a non-greedy approach for repairing multivariate decision trees. GTO requires an initial tree as input. The algorithm retains the structure of the tree but attempts to simultaneously improve all of its multivariate decisions using iterative linear programming. GTO and IIDT both use a non-greedy approach to improve a decision tree. The advantage of GTO is its use of a well-established numerical optimization method. Its disadvantages are its inability to modify the initial structure and its inability to exploit additional resources (beyond those needed for convergence).
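The sketch below illustrates one plausible form of a single GTO-style node update; the linear program used here is an assumption for illustration, not Bennett's exact formulation. With the tree structure fixed, the hyperplane at one internal node is re-fit so that each example lands on its desired side, minimizing total slack:

```python
# Hedged sketch of one LP-based hyperplane refit (assumed formulation,
# not Bennett's exact one). GTO would iterate such updates over nodes.
import numpy as np
from scipy.optimize import linprog

def refit_hyperplane(X, side):
    """X: (n, d) examples reaching the node; side: +1/-1 desired side.
    Minimize sum of slacks s_i s.t. side_i * (w.x_i + b) >= 1 - s_i."""
    n, d = X.shape
    c = np.concatenate([np.zeros(d + 1), np.ones(n)])   # minimize total slack
    A = np.hstack([-side[:, None] * X,                  # coefficients on w
                   -side[:, None],                      # coefficient on b
                   -np.eye(n)])                         # coefficients on s
    b_ub = -np.ones(n)
    bounds = [(None, None)] * (d + 1) + [(0, None)] * n  # w, b free; s >= 0
    res = linprog(c, A_ub=A, b_ub=b_ub, bounds=bounds)
    return res.x[:d], res.x[d]                          # w, b

X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
side = np.array([-1, -1, 1, 1])   # desired sides: separate by x2
w, b = refit_hyperplane(X, side)
print(w, b)
```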

Freund and Mason (1999) described a new type of classification mechanism, the alternating decision tree (ADTree). ADTree provides a method for visualizing decision stumps in an ordered and logical way to demonstrate correlations. Freund and Mason presented an iterative algorithm for learning ADTrees, based on boosting. While it has not been studied as an anytime algorithm, ADTree can be viewed as such: the learner starts with a constant prediction and adds one decision stump at a time, and if stopped, it returns the current tree. The anytime behavior of ADTree, however, is problematic. Additional time resources can only be used to add more rules and therefore might result in large, overcomplicated trees. Moreover, ADTree is not designed to tackle the problem of attribute interdependencies because it evaluates each split independently.
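The anytime view can be made concrete with a sketch of an interruptible boosting loop. Generic AdaBoost-style stump updates are assumed here, not the exact ADTree rules; the point is only the contract behavior of stopping on a deadline:

```python
# Hedged sketch of anytime stump boosting: start from a constant
# prediction, add one stump per iteration, return the current
# ensemble when the time budget runs out.
import time
import numpy as np

def best_stump(X, y, w):
    """Pick the attribute/threshold/polarity minimizing weighted error."""
    best = (np.inf, None)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(X[:, j] <= t, -pol, pol)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, (j, t, pol))
    return best

def anytime_boost(X, y, budget_seconds):
    """y in {-1, +1}. Call as: anytime_boost(X, y, 5.0)."""
    deadline = time.time() + budget_seconds
    w = np.full(len(y), 1 / len(y))
    ensemble = [(0.5 * np.log((y == 1).mean() / (y == -1).mean()), None)]
    while time.time() < deadline:                 # contract: stop on time
        err, stump = best_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        j, t, pol = stump
        pred = np.where(X[:, j] <= t, -pol, pol)
        w *= np.exp(-alpha * y * pred)            # reweight examples
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble                               # grows with the budget
```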

Murthy, Kasif, and Salzberg (1994) introduced OC1 (Oblique Classifier 1), a new algorithm for the induction of oblique decision trees. Oblique decision trees use multivariate tests that are not necessarily parallel to an axis. OC1 builds decision trees that contain linear combinations of one or more attributes at each internal node; these trees then partition the space of examples with both oblique and axis-parallel hyperplanes. Searching for the best oblique split is much more difficult than searching for the best axis-parallel split because the number of candidate oblique splits is exponential. Therefore, OC1 takes a greedy approach and attempts to find locally good splits rather than optimal ones. Our LSID3 algorithm can be generalized to support the induction of oblique decision trees by using OC1 as the sampler and by considering oblique splits. Furthermore, OC1 can be converted into an anytime algorithm by considering more splits at each node.
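One way to make the greedy local search concrete is the coefficient-perturbation sketch below. It is a simplification: the real OC1 computes the optimal value of each coefficient in turn and adds randomized restarts, whereas here we merely hill-climb with fixed step sizes, keeping a change only if it lowers the split impurity:

```python
# Hedged sketch of OC1-style coefficient perturbation (simplified).
import numpy as np

def gini_of_split(X, y, w):
    """Gini impurity of the two sides of the hyperplane w[:-1].x + w[-1] = 0.
    y is assumed to hold class labels 0/1."""
    side = X @ w[:-1] + w[-1] > 0
    total = 0.0
    for mask in (side, ~side):
        if mask.any():
            p = np.bincount(y[mask], minlength=2) / mask.sum()
            total += mask.mean() * (1 - (p ** 2).sum())
    return total

def perturb_coefficients(X, y, w, steps=(0.5, -0.5), sweeps=10):
    """Hill-climb the hyperplane, one coefficient at a time."""
    w = w.copy()
    best = gini_of_split(X, y, w)
    for _ in range(sweeps):
        for i in range(len(w)):                 # perturb one coefficient
            for delta in steps:
                cand = w.copy()
                cand[i] += delta
                score = gini_of_split(X, y, cand)
                if score < best:                # keep improving moves only
                    best, w = score, cand
    return w, best
```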
