
can show that it is at least 3, as follows. Represent each instance by a 3-bit string corresponding to the values of each of its three literals l1, l2, and l3. Consider the following set of three instances:

instance1: 100
instance2: 010
instance3: 001

This set of three instances can be shattered by H, because a hypothesis can be constructed for any desired dichotomy as follows: if the dichotomy is to exclude instance_i, add the literal ¬l_i to the hypothesis. For example, suppose we wish to include instance2, but exclude instance1 and instance3. Then we use the hypothesis ¬l1 ∧ ¬l3. This argument easily extends from three features to n. Thus, the VC dimension for conjunctions of n boolean literals is at least n. In fact, it is exactly n, though showing this is more difficult, because it requires demonstrating that no set of n + 1 instances can be shattered.
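The following is a minimal sketch (in Python, not part of the text) that checks the three-instance shattering argument mechanically: for each of the 8 dichotomies it builds the conjunction of negated literals described above and verifies that this hypothesis includes and excludes exactly the intended instances. The encoding and helper names are illustrative assumptions.

from itertools import product

# The three instances from the argument above, encoded as 3-bit strings (l1, l2, l3).
instances = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def hypothesis(x, negated):
    # Conjunction of the negated literals ¬l_i for each index i in `negated`.
    return all(x[i] == 0 for i in negated)

# Every dichotomy = every choice of which instances to label positive.
for labels in product([False, True], repeat=3):
    # Negate l_i for each instance_i the dichotomy excludes.
    negated = [i for i, keep in enumerate(labels) if not keep]
    realized = tuple(hypothesis(x, negated) for x in instances)
    assert realized == labels  # this dichotomy is realized by some conjunction

print("All 8 dichotomies realized: the three instances are shattered.")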

7.4.3 Sample Complexity and the VC Dimension

Earlier we considered the question "How many randomly drawn training examples suffice to probably approximately learn any target concept in C?" (i.e., how many examples suffice to ε-exhaust the version space with probability (1 − δ)?). Using VC(H) as a measure for the complexity of H, it is possible to derive an alternative answer to this question, analogous to the earlier bound of Equation (7.2). This new bound (see Blumer et al. 1989) is

m ≥ (1/ε) (4 log2(2/δ) + 8 VC(H) log2(13/ε))          (7.7)

Note that just as in the bound from Equation (7.2), the number of required training examples m grows logarithmically in 1/δ. It now grows log times linear in 1/ε, rather than linearly. Significantly, the ln |H| term in the earlier bound has now been replaced by the alternative measure of hypothesis space complexity, VC(H) (recall VC(H) ≤ log2 |H|).
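As a small illustration (not from the text), the bound of Equation (7.7) can be evaluated numerically; the function name and the sample values of ε, δ, and VC(H) below are assumptions chosen only for the example.

import math

def sufficient_examples(epsilon, delta, vc_dim):
    # Equation (7.7): m >= (1/eps) * (4 log2(2/delta) + 8 VC(H) log2(13/eps))
    return math.ceil((1.0 / epsilon) *
                     (4 * math.log2(2.0 / delta)
                      + 8 * vc_dim * math.log2(13.0 / epsilon)))

# Conjunctions of n = 10 boolean literals have VC(H) = 10 (see above).
print(sufficient_examples(epsilon=0.1, delta=0.05, vc_dim=10))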

Equation (7.7) provides an upper bound on the number of training examples sufficient to probably approximately learn any target concept in C, for any desired ε and δ. It is also possible to obtain a lower bound, as summarized in the following theorem (see Ehrenfeucht et al. 1989).

Theorem 7.3. Lower bound on sample complexity. Consider any concept class C such that VC(C) ≥ 2, any learner L, and any 0 < ε < 1/8, and 0 < δ < 1/100. Then there exists a distribution D and target concept in C such that if L observes fewer examples than

max[ (1/ε) log(1/δ), (VC(C) − 1) / (32ε) ]

then with probability at least δ, L outputs a hypothesis h having error_D(h) > ε.
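For comparison with the upper bound above, the Theorem 7.3 lower bound can also be evaluated numerically. This is a sketch with assumed names and example values; the logarithm is taken base 2 here, which is an assumption about the intended base (it only rescales the first term).

import math

def necessary_examples(epsilon, delta, vc_dim):
    # Theorem 7.3: max[ (1/eps) log(1/delta), (VC(C) - 1) / (32 eps) ]
    # (log taken base 2 here, an assumption)
    return max((1.0 / epsilon) * math.log2(1.0 / delta),
               (vc_dim - 1) / (32.0 * epsilon))

# Example values satisfying the theorem's conditions (eps < 1/8, delta < 1/100).
print(necessary_examples(epsilon=0.1, delta=0.005, vc_dim=10))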
