Acknowledgements: Amit Daniely is a recipient of the Google Europe Fellowship in Learning Theory, and this research is supported in part by this Google Fellowship. Nati Linial is supported by grants from ISF, BSF and I-Core. Shai Shalev-Shwartz is supported by the Israeli Science Foundation grant number 590-10.

References

Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.

K. Atkinson and W. Han. Spherical Harmonics and Approximations on the Unit Sphere: An Introduction, volume 2044. Springer, 2012.

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138–156, 2006.

S. Ben-David, D. Loker, N. Srebro, and K. Sridharan. Minimizing the misclassification error rate using a surrogate convex loss. In ICML, 2012.

Shai Ben-David, Nadav Eiron, and Hans Ulrich Simon. Limitations of learning via embeddings in Euclidean half spaces. The Journal of Machine Learning Research, 3:441–461, 2003.

A. Birnbaum and S. Shalev-Shwartz. Learning halfspaces with the zero-one loss: Time-accuracy tradeoffs. In NIPS, 2012.

E. Blais, R. O'Donnell, and K. Wimmer. Polynomial regression under arbitrary product distributions. In COLT, 2008.

N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

Amit Daniely, Nati Linial, and Shai Shalev-Shwartz. From average case complexity to improper learning complexity. arXiv preprint arXiv:1311.2272, 2013.

V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 2006.

G. B. Folland. A Course in Abstract Harmonic Analysis. CRC, 1994.

V. Guruswami and P. Raghavendra. Hardness of learning halfspaces with noise. In Proceedings of the 47th Foundations of Computer Science (FOCS), 2006.

A. Kalai, A. R. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. In Proceedings of the 46th Foundations of Computer Science (FOCS), 2005.

A. R. Klivans and R. Servedio. Learning DNF in time $2^{\tilde{O}(n^{1/3})}$. In STOC, pages 258–265. ACM, 2001.

Kosaku Yosida. Functional Analysis. Springer-Verlag, Heidelberg, 1963.

Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. In STOC, pages 455–464, May 1991.

Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability. In FOCS, pages 574–579, October 1989.

P. M. Long and R. A. Servedio. Learning large-margin halfspaces with more malicious noise. In NIPS, 2011.

J. Matousek. Lectures on Discrete Geometry, volume 212. Springer, 2002.

V. D. Milman and G. Schechtman. Asymptotic Theory of Finite Dimensional Normed Spaces: Isoperimetric Inequalities in Riemannian Manifolds, volume 1200. Springer, 2002.

F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958. (Reprinted in Neurocomputing (MIT Press, 1988).)

S. Saitoh. Theory of Reproducing Kernels and Its Applications. Longman Scientific & Technical, England, 1988.

I. J. Schoenberg. Positive definite functions on spheres. Duke Math. J., 1942.

B. Schölkopf, C. Burges, and A. Smola, editors. Advances in Kernel Methods - Support Vector Learning. MIT Press, 1998.

S. Shalev-Shwartz, O. Shamir, and K. Sridharan. Learning kernel-based halfspaces with the 0-1 loss. SIAM Journal on Computing, 40:1623–1646, 2011.

I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.

R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58(1):267–288, 1996.

V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

Manfred K. Warmuth and S. V. N. Vishwanathan. Leaving the span. In Learning Theory, pages 366–381. Springer, 2005.

T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32:56–85, 2004.