Acknowledgements: Amit Daniely is a recipient of the Google Europe Fellowship in Learning Theory, and this research is supported in part by this Google Fellowship. Nati Linial is supported by grants from ISF, BSF and I-Core. Shai Shalev-Shwartz is supported by the Israeli Science Foundation grant number 590-10.

References

Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.

K. Atkinson and W. Han. Spherical Harmonics and Approximations on the Unit Sphere: An Introduction, volume 2044. Springer, 2012.

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138–156, 2006.

S. Ben-David, D. Loker, N. Srebro, and K. Sridharan. Minimizing the misclassification error rate using a surrogate convex loss. In ICML, 2012.

Shai Ben-David, Nadav Eiron, and Hans Ulrich Simon. Limitations of learning via embeddings in Euclidean half spaces. The Journal of Machine Learning Research, 3:441–461, 2003.

A. Birnbaum and S. Shalev-Shwartz. Learning halfspaces with the zero-one loss: Time-accuracy tradeoffs. In NIPS, 2012.

E. Blais, R. O'Donnell, and K. Wimmer. Polynomial regression under arbitrary product distributions. In COLT, 2008.

N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

Amit Daniely, Nati Linial, and Shai Shalev-Shwartz. From average case complexity to improper learning complexity. arXiv preprint arXiv:1311.2272, 2013.

V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 2006.

G. B. Folland. A Course in Abstract Harmonic Analysis. CRC, 1994.

V. Guruswami and P. Raghavendra. Hardness of learning halfspaces with noise. In Proceedings of the 47th Foundations of Computer Science (FOCS), 2006.

A. Kalai, A. R. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. In Proceedings of the 46th Foundations of Computer Science (FOCS), 2005.

A. R. Klivans and R. Servedio. Learning DNF in time $2^{\tilde{O}(n^{1/3})}$. In STOC, pages 258–265. ACM, 2001.

Kosaku Yosida. Functional Analysis. Springer-Verlag, Heidelberg, 1963.

Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. In STOC, pages 455–464, May 1991.

Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability. In FOCS, pages 574–579, October 1989.

P. M. Long and R. A. Servedio. Learning large-margin halfspaces with more malicious noise. In NIPS, 2011.

J. Matousek. Lectures on Discrete Geometry, volume 212. Springer, 2002.

V. D. Milman and G. Schechtman. Asymptotic Theory of Finite Dimensional Normed Spaces: Isoperimetric Inequalities in Riemannian Manifolds, volume 1200. Springer, 2002.

F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958. (Reprinted in Neurocomputing (MIT Press, 1988).)

S. Saitoh. Theory of Reproducing Kernels and Its Applications. Longman Scientific & Technical, England, 1988.

I. J. Schoenberg. Positive definite functions on spheres. Duke Math. J., 1942.

B. Schölkopf, C. Burges, and A. Smola, editors. Advances in Kernel Methods - Support Vector Learning. MIT Press, 1998.

S. Shalev-Shwartz, O. Shamir, and K. Sridharan. Learning kernel-based halfspaces with the 0-1 loss. SIAM Journal on Computing, 40:1623–1646, 2011.

I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.

R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58(1):267–288, 1996.

V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

Manfred K. Warmuth and S. V. N. Vishwanathan. Leaving the span. In Learning Theory, pages 366–381. Springer, 2005.

T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32:56–85, 2004.