
anytime algorithms for learning anytime classifiers saher ... - Technion


Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

well and no lookahead is needed. However, for more difficult concepts such as XOR, the greedy approach is likely to fail. A third problem is that they are limited to a specific objective: they cannot be adapted to different learning setups and other objectives, such as minimizing testing and misclassification costs.
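To illustrate why a greedy learner can be misled by XOR, the sketch below uses a tiny hypothetical training set in which the class is a1 XOR a2, while a third attribute a3, irrelevant to the concept, happens to be correlated with the class in the sample. It computes information gain (the entropy reduction used by ID3-style learners) for each attribute; the data and the helper names `entropy` and `info_gain` are illustrative assumptions, not taken from the thesis.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr):
    """Entropy reduction from splitting (features, label) pairs on index attr."""
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([y for _, y in examples]) - remainder

# Hypothetical data: features are (a1, a2, a3), label = a1 XOR a2.
# a3 is irrelevant but agrees with the label on 6 of the 8 examples.
data = [
    ((0, 0, 0), 0), ((0, 0, 0), 0),
    ((0, 1, 1), 1), ((0, 1, 1), 1),
    ((1, 0, 1), 1), ((1, 0, 0), 1),
    ((1, 1, 0), 0), ((1, 1, 1), 0),
]

gains = [info_gain(data, a) for a in range(3)]
# a1 and a2 each have zero gain on their own; a3 has about 0.19 bits.
greedy_choice = max(range(3), key=lambda a: gains[a])  # picks a3 (index 2)
```

Each relevant attribute is useful only in combination with the other, so a one-step greedy measure scores both at zero and prefers the spurious a3.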

We therefore propose an alternative approach for looking ahead. For each candidate split, we sample the space of subtrees under it and estimate the utility of the sampled trees. Because we evaluate entire trees, different utility functions can be used, depending on the actual cost scheme. The split whose sample contains the best tree is then selected.
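A minimal sketch of lookahead by sampling, under several assumptions: trees are encoded as `("leaf", label)` or `("split", attr, {value: subtree})`, the sampler simply picks split attributes uniformly at random (a placeholder, since the thesis biases its samples), and lower utility is better. The names `sample_random_tree`, `num_leaves` and `choose_split` are hypothetical. Because each sampled tree is scored as a whole, any `utility` function can be plugged in.

```python
import random
from collections import Counter

def sample_random_tree(examples, attributes, rng):
    """Placeholder sampler: grow a tree until leaves are pure (or attributes run
    out), choosing each split attribute uniformly at random."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attributes:
        return ("leaf", Counter(labels).most_common(1)[0][0])
    a = rng.choice(attributes)
    rest = [b for b in attributes if b != a]
    parts = {}
    for x, y in examples:
        parts.setdefault(x[a], []).append((x, y))
    return ("split", a, {v: sample_random_tree(p, rest, rng) for v, p in parts.items()})

def num_leaves(tree):
    """Tree size measured as the number of leaves."""
    if tree[0] == "leaf":
        return 1
    return sum(num_leaves(child) for child in tree[2].values())

def choose_split(examples, attributes, sample_subtree, utility, r, rng):
    """For every candidate attribute, sample r whole trees rooted at a split on it
    and score each with utility; return the attribute whose sample holds the best tree."""
    best_attr, best_score = None, float("inf")
    for a in attributes:
        rest = [b for b in attributes if b != a]
        parts = {}
        for x, y in examples:
            parts.setdefault(x[a], []).append((x, y))
        for _ in range(r):
            tree = ("split", a, {v: sample_subtree(p, rest, rng) for v, p in parts.items()})
            score = utility(tree)
            if score < best_score:
                best_attr, best_score = a, score
    return best_attr

# Hypothetical XOR data: label = a1 XOR a2, a3 is a correlated distractor.
data = [
    ((0, 0, 0), 0), ((0, 0, 0), 0), ((0, 1, 1), 1), ((0, 1, 1), 1),
    ((1, 0, 1), 1), ((1, 0, 0), 1), ((1, 1, 0), 0), ((1, 1, 1), 0),
]
best = choose_split(data, [0, 1, 2], sample_random_tree, num_leaves, r=10, rng=random.Random(0))
```

On this data every tree rooted at a3 has at least 6 leaves, while trees rooted at a1 or a2 can have 4, so the sampled lookahead never selects the distractor that the greedy gain prefers. Swapping `num_leaves` for a cost estimate changes the objective without touching `choose_split`.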

In the cost-insensitive setup, our goal is to induce small and accurate trees. Following Occam's razor, we bias the sample towards small consistent trees and evaluate each sampled tree by its size. To avoid overfitting the training examples, we apply a post-pruning phase, similar to the one used in C4.5.
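One way to bias the sample towards small consistent trees is sketched below, as an assumption rather than the thesis's exact procedure: split attributes are drawn at random with probability proportional to their information gain (falling back to a uniform draw when every gain is zero), and each sampled tree is evaluated by its number of leaves. The function names are hypothetical, and post-pruning is omitted.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, a):
    """Information gain of splitting (features, label) pairs on attribute index a."""
    rem = 0.0
    for v in {x[a] for x, _ in examples}:
        sub = [y for x, y in examples if x[a] == v]
        rem += len(sub) / len(examples) * entropy(sub)
    return entropy([y for _, y in examples]) - rem

def partition(examples, a):
    parts = {}
    for x, y in examples:
        parts.setdefault(x[a], []).append((x, y))
    return parts

def sampled_size(examples, attributes, rng):
    """Grow one random consistent tree and return its size (number of leaves),
    drawing split attributes with probability proportional to information gain."""
    if len({y for _, y in examples}) == 1 or not attributes:
        return 1
    weights = [max(gain(examples, a), 0.0) for a in attributes]
    if sum(weights) > 0:
        a = rng.choices(attributes, weights=weights)[0]
    else:
        a = rng.choice(attributes)  # no attribute helps locally (e.g. XOR)
    rest = [b for b in attributes if b != a]
    return sum(sampled_size(p, rest, rng) for p in partition(examples, a).values())

def lookahead_size(examples, attributes, a, r, rng):
    """Size of the smallest of r trees sampled under a split on attribute a."""
    rest = [b for b in attributes if b != a]
    parts = partition(examples, a).values()
    return min(sum(sampled_size(p, rest, rng) for p in parts) for _ in range(r))

def choose_by_size(examples, attributes, r, rng):
    return min(attributes, key=lambda a: lookahead_size(examples, attributes, a, r, rng))

# Hypothetical XOR data: label = a1 XOR a2, a3 is a correlated distractor.
data = [
    ((0, 0, 0), 0), ((0, 0, 0), 0), ((0, 1, 1), 1), ((0, 1, 1), 1),
    ((1, 0, 1), 1), ((1, 0, 0), 1), ((1, 1, 0), 0), ((1, 1, 1), 0),
]
```

On this data a split on a1 always yields sampled trees with 4 leaves and a split on a3 always yields 6, so `choose_by_size(data, [0, 1, 2], 5, random.Random(0))` selects a1. A larger `r` buys a more reliable estimate at the price of more learning time.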

When our objective is to minimize the total cost, we bias the sample towards low-cost trees and evaluate the sampled trees by their expected total cost. The total cost of a tree is estimated from the average cost of classifying the training examples using the tree and from the expected error of the tree.

In cost-insensitive environments, the main goal of pruning is to simplify the tree in order to avoid overfitting the training data: a subtree is pruned if the resulting tree is expected to yield a lower error. When test costs are taken into account, pruning has another important role: reducing the test costs of the tree. Keeping a subtree is worthwhile only if its expected reduction in misclassification costs is larger than the cost of the tests in that subtree. Therefore, we designed a novel pruning approach based on the expected total cost of a tree.
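The sketch below shows the shape of such a cost-based pruning rule under simplifying assumptions that are not from the thesis: a single misclassification cost `mc` per error, a Laplace-corrected estimate of the expected number of errors at each leaf (binary classes), a test cost charged once to every training example that reaches the node, and the hypothetical tree encoding `("leaf", label)` / `("split", attr, {value: subtree})`. Costs are summed over the examples; dividing by their number gives the average, which leads to the same pruning decisions.

```python
from collections import Counter

def leaf_expected_errors(labels, label):
    """Laplace-corrected expected number of errors at a leaf (assumed estimator)."""
    n = len(labels)
    e = sum(1 for y in labels if y != label)
    return n * (e + 1) / (n + 2)

def expected_total_cost(tree, examples, test_costs, mc):
    """Test costs paid by the examples reaching each internal node, plus mc times
    the expected number of errors at the leaves, summed over the examples."""
    if tree[0] == "leaf":
        return mc * leaf_expected_errors([y for _, y in examples], tree[1])
    _, attr, children = tree
    cost = test_costs[attr] * len(examples)  # every example reaching here pays for the test
    for value, child in children.items():
        part = [(x, y) for x, y in examples if x[attr] == value]
        cost += expected_total_cost(child, part, test_costs, mc)
    return cost

def prune_by_cost(tree, examples, test_costs, mc):
    """Bottom-up: replace a subtree by a majority leaf unless keeping it lowers the
    expected misclassification cost by more than the cost of its tests."""
    if tree[0] == "leaf" or not examples:
        return tree
    _, attr, children = tree
    kept = {}
    for value, child in children.items():
        part = [(x, y) for x, y in examples if x[attr] == value]
        kept[value] = prune_by_cost(child, part, test_costs, mc)
    subtree = ("split", attr, kept)
    leaf = ("leaf", Counter(y for _, y in examples).most_common(1)[0][0])
    if expected_total_cost(leaf, examples, test_costs, mc) <= expected_total_cost(subtree, examples, test_costs, mc):
        return leaf
    return subtree

# Hypothetical data where attribute 0 predicts the class perfectly.
examples = [((0,), 0)] * 5 + [((1,), 1)] * 5
tree = ("split", 0, {0: ("leaf", 0), 1: ("leaf", 1)})
```

With `mc=10.0` and a test cost of 1, the split is kept; with `mc=1.0`, the same test costs more than the misclassification cost it saves, so the subtree is replaced by a leaf even though it is perfectly accurate on the training data.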

For scenarios that constrain testing costs, we developed a novel top-down approach that exploits the available testing resources. When the bounds are known to the learner, a tree that fits the budget is built. Otherwise, a repertoire of trees is formed. If the quota is known before classification, the single tree that best fits the budget is picked; otherwise, the trees are traversed until the resources are exhausted.
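Both ways of using a repertoire can be sketched as follows, under assumptions that are ours rather than the thesis's: the repertoire is ordered by increasing cost, a tree's cost is its worst-case path cost, "best fits the budget" is read as "most expensive tree that still fits", test results already paid for are reused by later trees, and the unknown quota is simulated by a `budget` argument that the classifier only discovers when it runs out. The tree encoding and all function names are hypothetical.

```python
# Hypothetical encoding: ("leaf", label) or ("split", attr, {value: subtree}).

def max_path_cost(tree, test_costs):
    """Worst-case test cost of classifying one case with tree."""
    if tree[0] == "leaf":
        return 0.0
    _, attr, children = tree
    return test_costs[attr] + max(max_path_cost(c, test_costs) for c in children.values())

def pick_tree_for_quota(repertoire, test_costs, quota):
    """Quota known before classification: pick the costliest tree that still fits."""
    fitting = [t for t in repertoire if max_path_cost(t, test_costs) <= quota]
    return max(fitting, key=lambda t: max_path_cost(t, test_costs)) if fitting else None

def classify_interruptibly(repertoire, case, test_costs, budget):
    """Quota unknown in advance: traverse the trees in order, paying for each new
    test, and return the label of the last tree fully traversed when the next
    test would exceed the budget. Assumes every test value appears in the tree."""
    known, spent, answer = {}, 0.0, None
    for tree in repertoire:
        node = tree
        while node[0] == "split":
            _, attr, children = node
            if attr not in known:
                if spent + test_costs[attr] > budget:
                    return answer  # resources exhausted
                spent += test_costs[attr]
                known[attr] = case[attr]
            node = children[known[attr]]
        answer = node[1]
    return answer

# Hypothetical repertoire of increasingly expensive trees.
test_costs = {0: 1.0, 1: 5.0}
t0 = ("leaf", 0)
t1 = ("split", 0, {0: ("leaf", 0), 1: ("leaf", 1)})
t2 = ("split", 0, {0: ("leaf", 0), 1: ("split", 1, {0: ("leaf", 1), 1: ("leaf", 0)})})
repertoire = [t0, t1, t2]
```

For the case `(1, 1)`, a budget of 3 suffices for `t1` but not for the second test in `t2`, so the interruptible traversal answers with `t1`'s label; with a budget of 10 it completes `t2`.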

Our anytime approach can benefit from extra learning time by creating larger samples: the larger the samples, the more accurate the attribute evaluation. There are two main classes of anytime algorithms, namely contract and interruptible (Russell & Zilberstein, 1996). A contract algorithm receives its resource allocation as a parameter. An interruptible algorithm is not given its resource allocation in advance and must therefore be prepared to be interrupted at any moment. While the assumption of preallocated resources holds for many induction tasks, in many other real-life applications it is not possible to allocate the resources a priori. Therefore, in our work, we are interested in both contract and interruptible decision tree learners. In the contract setup, the sample size is predetermined according to the available resources. In the interruptible

