
node if resources allow. These would make challenging and interesting directions for future research.

Nijssen and Fromont (2007) presented DL8, an exact algorithm for finding a decision tree that optimizes a ranking function under size, depth, accuracy, and leaf constraints. The key idea behind DL8 is that constraints on decision trees are simulated as constraints on itemsets. They show that optimal decision trees can be extracted from lattices of itemsets in linear time. The applicability of DL8, however, is limited by two factors: the number of itemsets that need to be stored, and the time it takes to compute these itemsets. In some cases, the number of frequent itemsets is so large that it is impossible to compute or store them within a reasonable amount of time or space.
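To make the tree-to-itemset correspondence concrete, the following Python sketch (illustrative only; the function names, the binary-feature encoding, and the dict-based example format are assumptions, not DL8's actual data structures) maps every root-to-node path of a decision tree to the itemset of attribute-value tests along it, so that a leaf constraint such as minimum support becomes a frequency constraint on the corresponding itemset:

    from typing import FrozenSet, List, Tuple

    Item = Tuple[str, int]  # an (attribute, value) test

    def path_itemsets(tree, prefix: FrozenSet[Item] = frozenset()) -> List[FrozenSet[Item]]:
        # `tree` is either a class label (leaf) or a tuple
        # (attribute, left_subtree, right_subtree) splitting on a binary attribute.
        if not isinstance(tree, tuple):
            return [prefix]            # a leaf's path is itself an itemset
        attr, left, right = tree
        return ([prefix]
                + path_itemsets(left, prefix | {(attr, 0)})
                + path_itemsets(right, prefix | {(attr, 1)}))

    def support(itemset: FrozenSet[Item], examples: List[dict]) -> int:
        # Number of training examples that satisfy every test on the path.
        return sum(all(ex[a] == v for a, v in itemset) for ex in examples)

Under this view, every tree node that can appear in a solution corresponds to an itemset meeting the constraints; enumerating those itemsets first, as a frequent-itemset miner would, is what makes the storage and mining costs the dominant factor noted above.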

Several researchers investigated the induction of shallow decision trees. Holte (1993) reported an empirical study of the accuracy of rules that classify examples on the basis of a single test (1R). Holte concluded that, on most real-world datasets, multilevel decision trees do not perform significantly better than one-level classification rules. Elomaa (1994), however, questioned the validity of these conclusions, arguing that the small difference in accuracy between 1R and C4.5 is still significant, and that the conclusions may have resulted from the use of unrepresentative databases.
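For concreteness, a minimal 1R-style learner can be sketched in a few lines of Python (this sketch assumes discrete attributes and dict-encoded examples; Holte's 1R additionally discretizes numeric attributes, which is omitted here):

    from collections import Counter, defaultdict

    def one_r(examples, label="class"):
        # For each attribute, form the rule mapping each of its values to the
        # majority class among examples with that value; keep the attribute
        # whose rule makes the fewest training errors.
        best = None
        for attr in (a for a in examples[0] if a != label):
            by_value = defaultdict(Counter)
            for ex in examples:
                by_value[ex[attr]][ex[label]] += 1
            rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            errors = sum(rule[ex[attr]] != ex[label] for ex in examples)
            if best is None or errors < best[0]:
                best = (errors, attr, rule)
        return best[1], best[2]  # chosen attribute and its value-to-class rule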

Auer, Holte, and Maass (1995) presented a novel algorithm, called T2, for agnostic PAC-learning with decision trees of at most two levels. The computation time of T2 is almost linear in the size of the training set. When tested empirically on several datasets, T2 was shown to produce substantially simpler decision trees with little or no loss in predictive power. Since one can prove that T2 is an agnostic PAC-learning algorithm, it is guaranteed to produce close-to-optimal two-level decision trees given sufficiently large training data. A generalization to depth d, however, would require computation time that is exponential in d. Dobkin, Gunopulos, and Kasif (1996) described several algorithms and theoretical results for learning optimal consistent decision trees of bounded depth. These methods, nevertheless, are practical only when the tree depth is tiny.
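A back-of-the-envelope count (not the analysis of Auer et al.) shows why extending such exact methods to depth d is costly: a complete binary tree of depth d has 2^d - 1 internal nodes, so merely enumerating the assignments of n attribute tests to internal nodes already yields n^(2^d - 1) candidate structures:

    def count_depth_d_structures(n_attrs, d):
        # Ways to assign one of n_attrs tests to each internal node of a
        # complete binary tree of depth d; grows exponentially in 2**d.
        return n_attrs ** (2 ** d - 1)

    # With 20 attributes: 8,000 structures at depth 2, but about 3.3e19 at depth 4.
    print(count_depth_d_structures(20, 2), count_depth_d_structures(20, 4))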

Much research effort has also been invested in the theoretical aspects of decision tree learning. A standard theoretical approach is to prove a bound on the generalization error as a function of the training error and the concept size (McAllester, 1998). Then, a concept optimizing the tradeoff between training error and concept size, as expressed in the bound, is selected. These bounds depend on the size of the training sample, but not on the sample itself. To improve them, Mansour and McAllester (2000) constructed bounds that depend both on the structure of the model and on the actual examples that form the sample.
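A representative instance of such a size-based bound is the standard Occam-razor form (shown for illustration; this is not the exact statement of McAllester, 1998): for any hypothesis h with description length |h| bits and training error measured on m examples, with probability at least 1 - \delta,

    \operatorname{err}(h) \;\le\; \widehat{\operatorname{err}}(h)
      + \sqrt{\frac{|h|\ln 2 + \ln(1/\delta)}{2m}} .

Minimizing the right-hand side over candidate trees is exactly the training-error/size tradeoff described above, and the bound indeed depends on the sample only through its size m.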

Kushilevitz and Mansour (1993) presented a polynomial-time algorithm for learning decision trees with respect to the uniform distribution, using membership queries. The considered decision tree model is an extension of the traditional
