Machine Learning - DISCo

of the difference in their errors

    E_{S ⊂ D} [ error_D(L_A(S)) − error_D(L_B(S)) ]        (5.14)

where L(S) denotes the hypothesis output by learning method L when given the sample S of training data, and where the subscript S ⊂ D indicates that the expected value is taken over samples S drawn according to the underlying instance distribution D. The above expression describes the expected value of the difference in errors between learning methods L_A and L_B.
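The expectation in Equation (5.14) can only be evaluated directly when the instance distribution D is known, which in practice means a simulation. Purely as an illustration (nothing below comes from the text: the synthetic distribution, the two toy learning methods, and all names are invented for the sketch), a Monte Carlo estimate of this expected difference might look like:

```python
import random

def draw_sample(rng, n):
    # Synthetic instance distribution D: x uniform on 0..9,
    # true label = 1 iff x >= 7.
    return [(x, int(x >= 7)) for x in (rng.randrange(10) for _ in range(n))]

def error_D(h):
    # Exact error of hypothesis h under D (x uniform over ten values).
    return sum(h(x) != int(x >= 7) for x in range(10)) / 10

def learn_majority(sample):
    # Toy method L_A: always predict the most common training label.
    labels = [y for _, y in sample]
    m = max(set(labels), key=labels.count)
    return lambda x: m

def learn_threshold(sample):
    # Toy method L_B: predict int(x >= t) for the threshold t (among
    # observed x values) with the lowest training error.
    def train_error(t):
        return sum(int(x >= t) != y for x, y in sample) / len(sample)
    t = min(sorted({x for x, _ in sample}), key=train_error)
    return lambda x: int(x >= t)

def expected_difference(learn_A, learn_B, n=30, trials=200, seed=0):
    # Monte Carlo estimate of E_{S subset D}[error_D(L_A(S)) - error_D(L_B(S))]:
    # draw many training samples S from D and average the error differences.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        S = draw_sample(rng, n)
        total += error_D(learn_A(S)) - error_D(learn_B(S))
    return total / trials
```

With these toy learners the threshold method nearly always recovers the true rule, so the estimate comes out close to the majority learner's error rate of 0.3.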

Of course, in practice we have only a limited sample D_0 of data when comparing learning methods. In such cases, one obvious approach to estimating the above quantity is to divide D_0 into a training set S_0 and a disjoint test set T_0. The training data can be used to train both L_A and L_B, and the test data can be used to compare the accuracy of the two learned hypotheses. In other words, we measure the quantity

    error_T0(L_A(S_0)) − error_T0(L_B(S_0))        (5.15)
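As a concrete sketch of this single-split estimator, suppose a learning method is represented as a function from a training sample to a hypothesis; the data, the two toy methods, and all names below are invented for illustration:

```python
def error(h, test_set):
    # error_T0(h): fraction of test examples that h misclassifies.
    return sum(h(x) != y for x, y in test_set) / len(test_set)

def split_difference(learn_A, learn_B, S0, T0):
    # The estimator of Equation (5.15):
    # error_T0(L_A(S0)) - error_T0(L_B(S0)) for one train/test split.
    return error(learn_A(S0), T0) - error(learn_B(S0), T0)

# Toy data D0: x in 0..9, label 1 iff x >= 7, split into disjoint S0 and T0.
D0 = [(i % 10, int(i % 10 >= 7)) for i in range(100)]
T0, S0 = D0[:30], D0[30:]

def learn_majority(sample):
    # Toy method L_A: always predict the most common training label.
    labels = [y for _, y in sample]
    m = max(set(labels), key=labels.count)
    return lambda x: m

def learn_threshold(sample):
    # Toy method L_B: fit the best threshold t, predicting int(x >= t).
    def train_error(t):
        return sum(int(x >= t) != y for x, y in sample) / len(sample)
    t = min(sorted({x for x, _ in sample}), key=train_error)
    return lambda x: int(x >= t)

print(split_difference(learn_majority, learn_threshold, S0, T0))  # prints 0.3
```

Here L_B recovers the true rule exactly, so the measured difference is just L_A's test-set error.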

Notice two key differences between this estimator and the quantity in Equation (5.14). First, we are using error_T0(h) to approximate error_D(h). Second, we are only measuring the difference in errors for one training set S_0, rather than taking the expected value of this difference over all samples S that might be drawn from the distribution D.

One way to improve on the estimator given by Equation (5.15) is to repeatedly partition the data D_0 into disjoint training and test sets and to take the mean of the test-set errors for these different experiments. This leads to the procedure shown in Table 5.5 for estimating the difference between errors of two learning methods, based on a fixed sample D_0 of available data. This procedure first partitions the data into k disjoint subsets of equal size, where this size is at least 30. It then trains and tests the learning algorithms k times, using each of the k subsets in turn as the test set, and using all remaining data as the training set. In this way, the learning algorithms are tested on k independent test sets, and the mean difference in errors, δ, is returned as an estimate of the difference between the two learning algorithms.
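Table 5.5 itself is not reproduced in this excerpt, but its core loop can be sketched as below. This is an illustration only: the list-based data, toy learning methods, and fold size are all invented here — in real use each subset should contain at least 30 examples, as the text requires, and D_0 should be shuffled before partitioning.

```python
def error(h, test_set):
    # Fraction of test examples that the hypothesis h misclassifies.
    return sum(h(x) != y for x, y in test_set) / len(test_set)

def kfold_difference(learn_A, learn_B, D0, k):
    # Sketch of the Table 5.5 procedure: partition D0 into k disjoint
    # subsets; for each i, train both methods on all data outside
    # subset i, test on subset i, and return the mean difference delta.
    n = len(D0) // k
    folds = [D0[i * n:(i + 1) * n] for i in range(k)]
    deltas = []
    for i in range(k):
        T_i = folds[i]
        S_i = [ex for j in range(k) if j != i for ex in folds[j]]
        deltas.append(error(learn_A(S_i), T_i) - error(learn_B(S_i), T_i))
    return sum(deltas) / k

# Toy data and methods (undersized folds, purely for illustration).
D0 = [(i % 10, int(i % 10 >= 7)) for i in range(100)]

def learn_majority(sample):
    # Toy method L_A: always predict the most common training label.
    labels = [y for _, y in sample]
    m = max(set(labels), key=labels.count)
    return lambda x: m

def learn_threshold(sample):
    # Toy method L_B: fit the best threshold t, predicting int(x >= t).
    def train_error(t):
        return sum(int(x >= t) != y for x, y in sample) / len(sample)
    t = min(sorted({x for x, _ in sample}), key=train_error)
    return lambda x: int(x >= t)

print(kfold_difference(learn_majority, learn_threshold, D0, k=10))  # close to 0.3
```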

The quantity δ returned by the procedure of Table 5.5 can be taken as an estimate of the desired quantity from Equation (5.14). More appropriately, we can view δ as an estimate of the quantity

    E_{S ⊂ D_0} [ error_D(L_A(S)) − error_D(L_B(S)) ]

where S represents a random sample of size (1 − 1/k) |D_0| drawn uniformly from D_0. The only difference between this expression and our original expression in Equation (5.14) is that this new expression takes the expected value over subsets of the available data D_0, rather than over subsets drawn from the full instance distribution D.
