18.11.2012 Views

anytime algorithms for learning anytime classifiers saher ... - Technion


Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008

Procedure TS-Greedy-Build-Tree(E, A)
    Return ID3(E, A)

Procedure TS-Rebuild-Tree(E, A, r)
    Return LSID3(E, A, r)

Procedure TS-Expected-Cost(T, node)
    E_node ← Examples-At(node, T, E)
    A_node ← Attributes-At(node, T, A)
    Return Next-R(node) · |E_node| · |A_node|^3

Procedure TS-Expected-Benefit(T)
    l-bound ← (min_{a ∈ A_node} |Domain(a)|)^2
    Return Tree-Size(T) − l-bound

Procedure TS-Better(T1, T2)
    Return Tree-Size(T1) < Tree-Size(T2)

Figure 3.15: IIDT-TS

We refer to the above-described instantiation of IIDT, which uses tree size as its quality metric, as IIDT-TS. Figure 3.15 formalizes IIDT-TS.
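The arithmetic in the cost, benefit, and comparison procedures of Figure 3.15 can be illustrated with a minimal Python sketch. The function names and the plain integer arguments are hypothetical stand-ins for Next-R(node), |E_node|, |A_node|, Tree-Size(T), and the attribute domain sizes; the sketch does not implement ID3 or LSID3, and it assumes nothing about how Tree-Size is measured beyond it being an integer.

```python
from typing import Sequence


def ts_expected_cost(next_r: int, n_examples: int, n_attributes: int) -> int:
    """Next-R(node) * |E_node| * |A_node|^3, following TS-Expected-Cost."""
    return next_r * n_examples * n_attributes ** 3


def ts_expected_benefit(tree_size: int, domain_sizes: Sequence[int]) -> int:
    """Tree-Size(T) - l-bound, where l-bound = (min_a |Domain(a)|)^2."""
    l_bound = min(domain_sizes) ** 2
    return tree_size - l_bound


def ts_better(size_t1: int, size_t2: int) -> bool:
    """T1 is better than T2 only if it is strictly smaller."""
    return size_t1 < size_t2


# Hypothetical node: next contract r = 2, 100 examples, 4 candidate attributes
# whose domains have 2, 3, 4, and 3 values; the current tree has size 15.
cost = ts_expected_cost(next_r=2, n_examples=100, n_attributes=4)   # 2 * 100 * 4^3
benefit = ts_expected_benefit(tree_size=15, domain_sizes=[2, 3, 4, 3])  # 15 - 2^2
```

With these illustrative numbers the expected cost is 12800 and the expected benefit is 11. Because the cost grows with the cube of the number of available attributes, nodes with many remaining attributes are much more expensive to rebuild under this estimate.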

Evaluating a Subtree

Although LSID3 is expected to produce better trees when allocated more resources, an improved result is not guaranteed. Thus, to avoid obtaining an induced tree of lower quality, we replace an existing subtree with a newly induced alternative only if the alternative is expected to improve the quality of the complete decision tree. Following Occam’s Razor, we measure the usefulness of a subtree by its size. Only if the reconstructed subtree is smaller does it replace an existing subtree. This guarantees that the size of the complete decision tree will decrease monotonically.
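The size-based acceptance rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis implementation: the `Node` class, `tree_size`, and `replace_if_smaller` are hypothetical names, and tree size is assumed here to be the number of leaves. The candidate subtrees are built by hand rather than by LSID3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def tree_size(node: Node) -> int:
    """Number of leaves under node (an assumed size measure)."""
    if not node.children:
        return 1
    return sum(tree_size(child) for child in node.children)


def replace_if_smaller(parent: Node, index: int, candidate: Node) -> bool:
    """Swap candidate in for parent.children[index] only if it is strictly smaller."""
    if tree_size(candidate) < tree_size(parent.children[index]):
        parent.children[index] = candidate
        return True
    return False


# A small hand-built tree with 4 leaves.
root = Node("a1", [Node("a2", [Node("+"), Node("-"), Node("+")]), Node("-")])

bigger = Node("a3", [Node("+"), Node("-"), Node("+"), Node("-")])  # 4 leaves
smaller = Node("a4", [Node("+"), Node("-")])                       # 2 leaves

sizes = [tree_size(root)]
replace_if_smaller(root, 0, bigger)   # rejected: 4 leaves is not fewer than 3
sizes.append(tree_size(root))
replace_if_smaller(root, 0, smaller)  # accepted: 2 leaves is fewer than 3
sizes.append(tree_size(root))
```

Under the leaf-count assumption, the size of the whole tree is the sum of the sizes of its subtrees, so a rejected candidate leaves the total unchanged and an accepted one strictly reduces it. In this run the recorded sizes are 4, 4, and 3.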

Another possible measure is the accuracy of the decision tree on a set-aside validation set of examples. In this case the training set is split into two subsets: a growing set and a validation set. Only if the accuracy on the validation set increases is the modification applied. This measure suffers from two drawbacks. The first is that putting aside a set of examples for validation results in a smaller set of training examples, making the learning process harder. The second is the bias towards overfitting the validation set, which might reduce the generalization
