
(b) You now draw a new set of 100 instances, drawn independently according to the same distribution D. You find that h misclassifies 30 of these 100 new examples. Can you give an interval into which this true error will fall with approximately 95% probability? (Ignore the performance over the earlier training data for this part.) If so, state it and justify it briefly. If not, explain the difficulty.
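One standard way to produce such an interval is the normal-approximation confidence interval error_S(h) ± z_N · sqrt(error_S(h)(1 − error_S(h)) / n), with z_N = 1.96 for a two-sided 95% interval. The sketch below only illustrates that calculation for the numbers given in the exercise; the function name and the use of the normal approximation are assumptions of this sketch, not part of the exercise statement.

```python
import math

def error_confidence_interval(errors, n, z=1.96):
    """Two-sided confidence interval for the true error of a hypothesis,
    using the normal approximation to the binomially distributed sample
    error (an assumption; reasonable here since n*p*(1-p) = 21 >= 5)."""
    p = errors / n                                  # sample error error_S(h)
    half_width = z * math.sqrt(p * (1 - p) / n)     # z = 1.96 for ~95%
    return p - half_width, p + half_width

# Exercise setting: h misclassifies 30 of 100 freshly drawn examples.
low, high = error_confidence_interval(30, 100)
print(f"approximate 95% interval for the true error: [{low:.3f}, {high:.3f}]")
# prints roughly [0.210, 0.390], i.e. 0.30 +/- 0.09
```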

(c) It may seem a bit odd that h misclassifies 30% of the new examples even though it perfectly classified the training examples. Is this event more likely for large n or small n? Justify your answer in a sentence.
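A quick way to see which case makes this event more likely is to compute the probability that a hypothesis whose true error is 0.3 nevertheless classifies all n training examples correctly, which is (1 − 0.3)^n if the training examples are drawn independently. The short sketch below (assuming exactly this independence model) tabulates that probability for a few illustrative sample sizes.

```python
# Probability that a hypothesis with true error 0.3 makes no mistakes on
# n training examples, assuming the examples are drawn independently:
# (1 - 0.3) ** n.
for n in (5, 10, 20, 50):
    print(f"n = {n:2d}: P(perfect training performance) = {0.7 ** n:.2e}")
# 0.7**5 ~ 1.7e-01, 0.7**10 ~ 2.8e-02, 0.7**20 ~ 8.0e-04, 0.7**50 ~ 1.8e-09:
# the probability shrinks rapidly as n grows, so the event is more likely
# for small n.
```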

