
…best possible hypothesis in H, after observing m randomly drawn training examples, provided m ≥ (1/(2ε²))(ln |H| + ln(1/δ)) (a worked calculation of this bound follows the summary).

• The number of training examples required for successful learning is strongly influenced by the complexity of the hypothesis space considered by the learner. One useful measure of the complexity of a hypothesis space H is its Vapnik-Chervonenkis dimension, VC(H). VC(H) is the size of the largest subset of instances that can be shattered (split in all possible ways) by H (a brute-force shattering check is sketched after this summary).

• An alternative upper bound on the number of training examples sufficient for successful learning under the PAC model, stated in terms of VC(H), is m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε)). A lower bound is m ≥ max[(1/ε) log₂(1/δ), (VC(C) − 1)/(32ε)] (both bounds are evaluated numerically after the summary).

• An alternative learning model, called the mistake bound model, is used to analyze the number of training examples a learner will misclassify before it exactly learns the target concept. For example, the HALVING algorithm will make at most ⌊log₂ |H|⌋ mistakes before exactly learning any target concept drawn from H. For an arbitrary concept class C, the best worst-case algorithm will make Opt(C) mistakes, where VC(C) ≤ Opt(C) ≤ log₂(|C|) (a sketch of HALVING follows the summary).

• The WEIGHTED-MAJORITY algorithm combines the weighted votes of multiple prediction algorithms to classify new instances. It learns weights for each of these prediction algorithms based on errors made over a sequence of examples. Interestingly, the number of mistakes made by WEIGHTED-MAJORITY can be bounded in terms of the number of mistakes made by the best prediction algorithm in the pool (a sketch of WEIGHTED-MAJORITY also follows the summary).
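
To see what the bound in the first summary point implies in practice, here is a minimal Python sketch that evaluates m ≥ (1/(2ε²))(ln |H| + ln(1/δ)); the function name agnostic_sample_size and the example values |H| = 2¹⁰, ε = 0.1, δ = 0.05 are illustrative assumptions, not values taken from the text.

```python
from math import ceil, log

def agnostic_sample_size(h_size, epsilon, delta):
    """Training examples sufficient under m >= (1/(2*eps^2)) * (ln|H| + ln(1/delta))."""
    return ceil((log(h_size) + log(1.0 / delta)) / (2.0 * epsilon ** 2))

# Illustrative values (assumptions): |H| = 2**10, epsilon = 0.1, delta = 0.05
print(agnostic_sample_size(2 ** 10, epsilon=0.1, delta=0.05))  # -> 497
```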
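
The definition of VC(H) in terms of shattering can be checked by brute force when both the hypothesis class and the instance set are small and finite. The sketch below is only an illustration under that assumption; the one-dimensional threshold class h_t(x) = (x ≥ t) and the particular instances are hypothetical choices, not examples from the chapter.

```python
from itertools import combinations

def shatters(hypotheses, points):
    """True if every possible +/- labeling of `points` is produced by some hypothesis."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(hypotheses, instances):
    """Size of the largest subset of `instances` shattered by `hypotheses`."""
    best = 0
    for k in range(1, len(instances) + 1):
        if any(shatters(hypotheses, subset) for subset in combinations(instances, k)):
            best = k
        else:
            break  # if no k-subset is shattered, no larger subset can be
    return best

# Hypothetical class: thresholds h_t(x) = (x >= t); any single point is shattered, no pair is
instances = [0.0, 1.0, 2.0, 3.0]
hypotheses = [lambda x, t=t: x >= t for t in (-0.5, 0.5, 1.5, 2.5, 3.5)]
print(vc_dimension(hypotheses, instances))  # -> 1
```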
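
The VC-based sample-complexity bounds quoted above can be evaluated numerically in the same spirit. The sketch below simply plugs values into the two expressions; the choice of VC dimension 5, ε = 0.1, and δ = 0.05 is arbitrary and purely illustrative.

```python
from math import ceil, log2

def pac_upper_bound(vc_h, epsilon, delta):
    """Sufficient m: (1/eps) * (4*log2(2/delta) + 8*VC(H)*log2(13/eps))."""
    return ceil((4 * log2(2.0 / delta) + 8 * vc_h * log2(13.0 / epsilon)) / epsilon)

def pac_lower_bound(vc_c, epsilon, delta):
    """Necessary m: max((1/eps)*log2(1/delta), (VC(C) - 1) / (32*eps))."""
    return ceil(max(log2(1.0 / delta) / epsilon, (vc_c - 1) / (32.0 * epsilon)))

# Illustrative values (assumptions): VC dimension 5, epsilon = 0.1, delta = 0.05
print(pac_upper_bound(5, epsilon=0.1, delta=0.05))
print(pac_lower_bound(5, epsilon=0.1, delta=0.05))
```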
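
A minimal sketch of the HALVING algorithm follows, assuming boolean-valued hypotheses are supplied as Python callables and training data arrives as (instance, label) pairs; the threshold hypotheses in the usage lines are again a hypothetical example. Each mistake removes at least half of the current version space, which is where the ⌊log₂ |H|⌋ bound comes from.

```python
def halving(hypotheses, examples):
    """Predict by majority vote of the current version space, then drop
    every hypothesis that disagrees with the revealed label."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in examples:
        votes = sum(1 if h(x) else -1 for h in version_space)
        prediction = votes >= 0   # ties broken toward True
        if prediction != y:
            mistakes += 1         # the majority was wrong, so at least half the space is removed
        version_space = [h for h in version_space if h(x) == y]
    return version_space, mistakes

# Hypothetical usage: threshold hypotheses, target concept x >= 1.5
hypotheses = [lambda x, t=t: x >= t for t in (0.5, 1.5, 2.5, 3.5)]
stream = [(x, x >= 1.5) for x in (3.0, 1.0, 2.0, 0.0)]
print(halving(hypotheses, stream))  # at most floor(log2 4) = 2 mistakes
```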
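
Finally, a minimal sketch of WEIGHTED-MAJORITY under the same assumptions (boolean predictors as callables, labeled examples as pairs); the pool of predictors and the weight-decay factor β = 0.5 are illustrative choices. With β = 0 the algorithm simply eliminates every predictor that errs, which is the behavior of HALVING.

```python
def weighted_majority(predictors, examples, beta=0.5):
    """Classify by weighted vote of the pool; multiply the weight of every
    predictor that errs on an example by beta (0 <= beta < 1)."""
    weights = [1.0] * len(predictors)
    mistakes = 0
    for x, y in examples:
        pos = sum(w for w, p in zip(weights, predictors) if p(x))
        neg = sum(w for w, p in zip(weights, predictors) if not p(x))
        prediction = pos >= neg
        if prediction != y:
            mistakes += 1
        weights = [w * beta if p(x) != y else w
                   for w, p in zip(weights, predictors)]
    return weights, mistakes

# Hypothetical usage: a pool of threshold predictors, target concept x >= 1.5
pool = [lambda x, t=t: x >= t for t in (0.5, 1.5, 2.5)]
stream = [(x, x >= 1.5) for x in (0.0, 2.0, 1.0, 3.0)]
print(weighted_majority(pool, stream))
```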

Much early work on computational learning theory dealt with the question of whether the learner could identify the target concept in the limit, given an indefinitely long sequence of training examples. The identification in the limit model was introduced by Gold (1967). A good overview of results in this area is given by Angluin (1992). Vapnik (1982) examines in detail the problem of uniform convergence, and the closely related PAC-learning model was introduced by Valiant (1984). The discussion in this chapter of ε-exhausting the version space is based on Haussler's (1988) exposition. A useful collection of results under the PAC model can be found in Blumer et al. (1989). Kearns and Vazirani (1994) provide an excellent exposition of many results from computational learning theory. Earlier texts in this area include Anthony and Biggs (1992) and Natarajan (1991).
