
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Gogan, 2004). If we chose to update the portfolio at specific time points,
we would like the learner to exploit the time between these updates.

Furthermore, researchers in the field can benefit from the automatic method
for cost assignments we have developed. Only a few UCI datasets have assigned
costs. In this work we designed a semi-randomized method for assigning costs
to existing datasets. We applied this method to 25 datasets and established a
repository, available at: http://www.cs.technion.ac.il/~esaher/cost.
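The thesis does not spell out the cost-assignment procedure here, but a semi-randomized scheme of the kind described could look like the following minimal sketch. The function name, the uniform-range assumption, and the default cost range are ours, not the thesis's:

```python
import random

def assign_costs(attributes, low=1.0, high=100.0, seed=None):
    """Hypothetical semi-randomized cost assignment: draw each
    attribute's test cost uniformly from [low, high].  A fixed seed
    makes the assignment reproducible, so a repository of datasets
    with shared costs can be rebuilt deterministically."""
    rng = random.Random(seed)
    return {a: round(rng.uniform(low, high), 2) for a in attributes}

# Example: assign costs to three (made-up) medical attributes.
costs = assign_costs(["age", "blood-pressure", "ecg"], seed=7)
```

The seeded generator is the "semi" part of semi-randomized under this reading: the draws are random, but fixing the seed makes the resulting cost scheme a stable, shareable artifact.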

This research can be extended in several directions. We intend to apply
monitoring techniques for optimal scheduling of the anytime learners. We also
plan to use different measures for tree quality and compare their utility. While
tree size and expected error were generally successful, in a few cases our
sampling approach did not yield a significant improvement. Using other measures
may improve the performance in these cases.

We also intend to test the performance of our framework on other cost schemes
that involve other types of cost. We believe that the generality of our framework
will allow excellent results to be obtained under other setups as well. To reduce
the runtime of our anytime algorithms, we plan to cache some of the lookahead
trees and use them, rather than resampling at each node. If a split is chosen,
the sample of already available subtrees can be used to evaluate its descendants
as well. Finally, an important advantage of our method is that it can be easily
parallelized. Assume, for example, that we decided on samples of size r. Then, r
different machines can independently form the sample and speed up the induction
process by a factor of r. We intend to consider this direction in the future.
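The parallelization argument above can be sketched as follows. The function `evaluate_candidate_tree` is a hypothetical stand-in for sampling and scoring one lookahead tree, and the worker pool plays the role of the r machines (threads are used here only for illustration; the setting described in the text would distribute the work across separate machines or processes):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evaluate_candidate_tree(seed):
    # Placeholder for sampling one random lookahead tree and returning
    # its quality score; a distinct seed per worker keeps the r sample
    # members independent, as the text requires.
    rng = random.Random(seed)
    return rng.random()

def best_of_sample(r):
    # Each of the r workers forms one member of the sample on its own,
    # so the wall-clock time of forming the whole sample shrinks by
    # roughly a factor of r.
    with ThreadPoolExecutor(max_workers=r) as pool:
        scores = list(pool.map(evaluate_candidate_tree, range(r)))
    return max(scores)
```

Because the sample members never communicate, the only sequential step left is collecting the r scores and picking the best split, which is what makes the factor-of-r speedup plausible.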

