

probability of the coin being heads corresponds to the probability that the hypothesis will misclassify a randomly drawn instance. The m independent coin flips correspond to the m independently drawn instances. The frequency of heads over the m examples corresponds to the frequency of misclassifications over the m instances.

The Hoeffding bounds state that if the training error error_D(h) is measured over the set D containing m randomly drawn examples, then

$\Pr[\mathrm{error}_{\mathcal{D}}(h) > \mathrm{error}_D(h) + \epsilon] \leq e^{-2m\epsilon^2}$
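The coin-flip correspondence makes this bound easy to check empirically. The sketch below is purely illustrative and not from the text (the function name and parameters are mine): it estimates, by simulation, how often the observed heads frequency underestimates the true heads probability by more than ε, and compares that to the Hoeffding bound e^{-2mε²}.

```python
import math
import random

def deviation_probability(p_true, m, epsilon, trials=100_000):
    """Estimate Pr[p_true > observed frequency + epsilon] for a coin with
    heads-probability p_true, where the frequency is measured over m flips."""
    bad = 0
    for _ in range(trials):
        freq = sum(random.random() < p_true for _ in range(m)) / m
        if p_true > freq + epsilon:
            bad += 1
    return bad / trials

m, eps, p = 50, 0.1, 0.3
print("simulated deviation probability:", deviation_probability(p, m, eps))
print("Hoeffding bound:", math.exp(-2 * m * eps ** 2))  # the bound always dominates
```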

This gives us a bound on the probability that an arbitrarily chosen single hypothesis has a very misleading training error. To assure that the best hypothesis found by L has an error bounded in this way, we must consider the probability that any one of the |H| hypotheses could have a large error:

$\Pr[(\exists h \in H)\,(\mathrm{error}_{\mathcal{D}}(h) > \mathrm{error}_D(h) + \epsilon)] \leq |H|\, e^{-2m\epsilon^2}$

If we call this probability δ, and ask how many examples m suffice to hold δ to some desired value, we now obtain

$m \geq \frac{1}{2\epsilon^2}\left(\ln|H| + \ln(1/\delta)\right)$ (7.3)

This is the generalization of Equation (7.2) to the case in which the learner still picks the best hypothesis h ∈ H, but where the best hypothesis may have nonzero training error. Notice that m depends logarithmically on |H| and on 1/δ, as it did in the more restrictive case of Equation (7.2). However, in this less restrictive situation m now grows as the square of 1/ε, rather than linearly with 1/ε.
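To make the comparison concrete, here is a minimal sketch (function names and example numbers are mine) that computes both bounds, assuming the standard form of Equation (7.2), m ≥ (1/ε)(ln|H| + ln(1/δ)), which the text references but does not restate in this excerpt:

```python
import math

def sample_complexity_consistent(h_size, epsilon, delta):
    """Equation (7.2): examples sufficient for a consistent learner,
    i.e., one whose output hypothesis has zero training error."""
    return math.ceil((1.0 / epsilon) * (math.log(h_size) + math.log(1.0 / delta)))

def sample_complexity_agnostic(h_size, epsilon, delta):
    """Equation (7.3): Hoeffding-based bound for the case where the best
    hypothesis may have nonzero training error."""
    return math.ceil((1.0 / (2.0 * epsilon ** 2)) * (math.log(h_size) + math.log(1.0 / delta)))

# Example: |H| = 3**10, epsilon = 0.1, delta = 0.05
H = 3 ** 10
print(sample_complexity_consistent(H, 0.1, 0.05))  # grows linearly with 1/epsilon
print(sample_complexity_agnostic(H, 0.1, 0.05))    # grows as the square of 1/epsilon
```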

7.3.2 Conjunctions of Boolean Literals Are PAC-Learnable

Now that we have a bound indicating the number of training examples sufficient to probably approximately learn the target concept, we can use it to determine the sample complexity and PAC-learnability of some specific concept classes.

Consider the class C of target concepts described by conjunctions of boolean literals. A boolean literal is any boolean variable (e.g., Old), or its negation (e.g., ¬Old). Thus, conjunctions of boolean literals include target concepts such as "Old ∧ ¬Tall". Is C PAC-learnable? We can show that the answer is yes by first showing that any consistent learner will require only a polynomial number of training examples to learn any c in C, and then suggesting a specific algorithm that uses polynomial time per training example.
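The specific algorithm is not spelled out in this excerpt; a standard candidate is a FIND-S-style elimination learner, which starts from the conjunction of all 2n literals and, for each positive example, drops every literal that example falsifies. The sketch below is my own illustration (names and data are hypothetical), and it processes each training example in time linear in the number of variables:

```python
def learn_conjunction(examples):
    """FIND-S-style elimination for conjunctions of boolean literals.

    examples: list of (assignment, label) pairs, where assignment maps
    variable name -> bool and label is True/False.
    Returns the surviving literals as a set of (variable, polarity) pairs.
    """
    variables = sorted(examples[0][0])
    # Most specific hypothesis: every variable and its negation.
    hypothesis = {(v, True) for v in variables} | {(v, False) for v in variables}
    for assignment, label in examples:
        if label:  # positive example: drop every literal it falsifies
            hypothesis = {(v, pol) for (v, pol) in hypothesis
                          if assignment[v] == pol}
    return hypothesis

# Tiny made-up data set for the target concept "Old AND NOT Tall".
data = [
    ({"Old": True,  "Tall": False}, True),
    ({"Old": True,  "Tall": True},  False),
    ({"Old": False, "Tall": False}, False),
]
print(learn_conjunction(data))  # literals of Old AND NOT Tall
```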

Consider any consistent learner L using a hypothesis space H identical to C. We can use Equation (7.2) to compute the number m of random training examples sufficient to ensure that L will, with probability (1 - δ), output a hypothesis with maximum error ε. To accomplish this, we need only determine the size |H| of the hypothesis space.

Now consider the hypothesis space H defined by conjunctions of literals based on n boolean variables. The size |H| of this hypothesis space is 3^n. To see this, consider the fact that there are only three possibilities for each variable in any given hypothesis: it can be included as a positive literal, included as a negated literal, or omitted from the conjunction.
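Plugging |H| = 3^n into Equation (7.2) gives m ≥ (1/ε)(n ln 3 + ln(1/δ)), so the number of examples needed grows only linearly with the number of variables n. A small sketch (assuming that form of Equation (7.2); the function name is mine):

```python
import math

def conjunction_sample_complexity(n, epsilon, delta):
    """Equation (7.2) with |H| = 3**n:
    m >= (1/epsilon) * (n * ln 3 + ln(1/delta))."""
    return math.ceil((1.0 / epsilon) * (n * math.log(3) + math.log(1.0 / delta)))

for n in (10, 20, 50, 100):
    print(n, conjunction_sample_complexity(n, epsilon=0.1, delta=0.05))
```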
