Anytime Algorithms for Learning Anytime Classifiers - Saher ... - Technion
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008
Gogan, 2004). If we chose to update the portfolio at specific time points, we would like the learner to exploit the time between these updates.

Furthermore, researchers in the field can benefit from the automatic method for cost assignment that we have developed. Only a few UCI datasets come with assigned costs. In this work we designed a semi-randomized method for assigning costs to existing datasets. We applied this method to 25 datasets and established a repository, available at: http://www.cs.technion.ac.il/~esaher/cost.
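As an illustration, a semi-randomized cost assignment might draw each attribute's measurement cost uniformly from a fixed range, occasionally letting an attribute share the cost of an earlier one to mimic grouped tests. The parameters below (`low`, `high`, `group_prob`) are illustrative assumptions; the exact scheme used for the repository may differ.

```python
import random

def assign_costs(attributes, low=1.0, high=100.0, group_prob=0.3, seed=None):
    """Assign semi-randomized measurement costs to dataset attributes.

    Each attribute receives a cost drawn uniformly from [low, high];
    with probability group_prob it instead shares the cost of a
    previously priced attribute, mimicking grouped medical tests.
    (Illustrative sketch only, not the repository's exact scheme.)
    """
    rng = random.Random(seed)
    costs = {}
    priced = []
    for attr in attributes:
        if priced and rng.random() < group_prob:
            costs[attr] = costs[rng.choice(priced)]
        else:
            costs[attr] = round(rng.uniform(low, high), 2)
        priced.append(attr)
    return costs
```

Fixing the seed makes the assignment reproducible, so the same costs can be re-derived for any dataset in the repository.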
This research can be extended in several directions. We intend to apply monitoring techniques for optimal scheduling of the anytime learners. We also plan to use different measures of tree quality and compare their utility. While tree size and expected error were generally successful, in a few cases our sampling approach did not yield a significant improvement. Using other measures may improve performance in these cases.
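The two tree-quality measures mentioned above, tree size and expected error, can be sketched as follows. The Laplace-corrected leaf estimate used here is a common choice for expected error and is an assumption of this sketch, not necessarily the exact formula used in the thesis.

```python
class Node:
    """A decision-tree node; a node with no children is a leaf."""
    def __init__(self, children=None, errors=0, total=0, classes=2):
        self.children = children or []
        self.errors = errors    # misclassified training examples at a leaf
        self.total = total      # training examples reaching a leaf
        self.classes = classes  # number of class labels

def tree_size(node):
    """Tree size: total number of nodes in the tree."""
    return 1 + sum(tree_size(c) for c in node.children)

def leaf_total(node):
    """Number of training examples covered by the subtree's leaves."""
    if not node.children:
        return node.total
    return sum(leaf_total(c) for c in node.children)

def expected_error(node):
    """Expected error: leaf estimates (Laplace-corrected), weighted
    by the fraction of examples reaching each subtree."""
    if not node.children:
        return (node.errors + node.classes - 1) / (node.total + node.classes)
    total = sum(leaf_total(c) for c in node.children)
    return sum(leaf_total(c) / total * expected_error(c)
               for c in node.children)
```

Both measures are cheap to compute on a candidate tree, which is what makes them usable inside a sampling-based anytime learner.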
We also intend to test the performance of our framework on cost schemes that involve other types of cost. We believe that the generality of our framework will allow excellent results to be obtained under other setups as well. To reduce the runtime of our anytime algorithms, we plan to cache some of the lookahead trees and reuse them rather than resampling at each node. Once a split is chosen, the already available subtrees in the sample can be used to evaluate its descendants as well. Finally, an important advantage of our method is that it can be easily parallelized. Assume, for example, that we decided on samples of size r. Then r different machines can independently form the sample elements and speed up the induction process by a factor of r. We intend to pursue this direction in the future.
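The parallelization idea above can be sketched as follows: each of r workers independently builds one element of the lookahead sample. The text envisions r separate machines; this sketch uses threads within one process purely for simplicity, and `sample_lookahead_tree` is a hypothetical stand-in for inducing one randomized lookahead tree and scoring it.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def sample_lookahead_tree(seed):
    """Hypothetical stand-in: in practice this would induce one
    randomized lookahead tree and return its quality score; here it
    just returns a seeded pseudo-random value."""
    rng = random.Random(seed)
    return rng.random()

def parallel_sample(r, base_seed=0):
    """Form a sample of r lookahead trees using r workers; each worker
    builds one sample element independently, so wall-clock time drops
    by roughly a factor of r (processes or machines in practice)."""
    with ThreadPoolExecutor(max_workers=r) as pool:
        return list(pool.map(sample_lookahead_tree,
                             range(base_seed, base_seed + r)))
```

Because the sample elements are independent, no communication is needed until the scores are aggregated, which is what makes the factor-of-r speedup plausible.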