The error rate of learning halfspaces using kernel-SVM
and (uniformly and independently) a vector $z$ from the 1-codimensional sphere of $S^{d-1}$ that is orthogonal to $e$. The constructed point is $(\alpha e + \sqrt{1-\alpha^2}\, z, \beta)$.

For any $f \in H_k$ and $a \in [-1, 1]$ define $\bar{f}(a)$ to be the expectation of $f$ over the 1-codimensional sphere $\{x \in S^{d-1} : \langle x, e \rangle = a\}$. We will show that for any $f \in H_k$ such that $\|f\|_{H_k} \le C$ and $\mathrm{Err}_{D_e,\mathrm{hinge}}(f) \le 1$, we have that $|\bar{f}(\gamma) - \bar{f}(-\gamma)| = o(1)$.

To do so, let us first assume that $f$ is symmetric with respect to $e$, and hence can be written as
\[
f(x) = \sum_{n=0}^{\infty} \alpha_n P_{d,n}(\langle x, e \rangle),
\]
where $\alpha_n \in \mathbb{R}$ and $P_{d,n}$ is the $d$-dimensional Legendre polynomial of degree $n$. Furthermore, by a characterization of Hilbert spaces corresponding to symmetric kernels, it follows that $\sum \alpha_n^2 \le C^2$. Since $f$ is symmetric w.r.t. $e$ we have
\[
\bar{f}(a) = \sum_{n=0}^{\infty} \alpha_n P_{d,n}(a).
\]
For $|a| \le 1/8$, we have that $|P_{d,n}(a)|$ tends to zero exponentially fast with both $d$ and $n$. Hence, if $d$ is large enough then
\[
\bar{f}(a) \approx \sum_{n=0}^{\log(C)} \alpha_n P_{d,n}(a) =: \tilde{f}(a).
\]
Note that $\tilde{f}$ is a polynomial of degree bounded by $\log(C)$. In addition, by construction, $\mathrm{Err}_{D_e,\mathrm{hinge}}(f) = \mathrm{Err}_{D,\mathrm{hinge}}(\bar{f}) \approx \mathrm{Err}_{D,\mathrm{hinge}}(\tilde{f})$. Hence, if $1 \ge \mathrm{Err}_{D_e,\mathrm{hinge}}(f)$ then using the previous subsection we conclude that $|\bar{f}(\gamma) - \bar{f}(-\gamma)| = o(1)$.

Symmetrization of f

In the above, we assumed that both the kernel function is symmetric and that $f$ is symmetric w.r.t. $e$. Our next step is to relax the latter assumption, while still assuming that the kernel function is symmetric.

Let $O(e)$ be the group of linear isometries that fix $e$, namely, $O(e) = \{A \in O(d) : Ae = e\}$. By assuming that $k$ is a symmetric kernel, we have that for all $A \in O(e)$, the function $g(x) = f(Ax)$ is also in $H_k$. Furthermore, $\|g\|_{H_k} = \|f\|_{H_k}$ and by the construction of $D_e$ we also have $\mathrm{Err}_{D_e,\mathrm{hinge}}(g) = \mathrm{Err}_{D_e,\mathrm{hinge}}(f)$.
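The invariance used here can be checked numerically. The sketch below is only an illustration under stated assumptions: it draws a point $a e + \sqrt{1-a^2}\, z$ on the slice $\{x \in S^{d-1} : \langle x, e \rangle = a\}$ and applies one particular element of $O(e)$, a Householder reflection across a hyperplane containing $e$. The helper names (`dot`, `unit_orthogonal_to`, `slice_point`, `reflect`) and the values of `d` and `a` are hypothetical and do not come from the paper, and a single reflection is a convenient member of $O(e)$, not a uniform draw from the group.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))


def unit_orthogonal_to(e, rng):
    """Draw a uniform unit vector from the hyperplane orthogonal to the unit vector e."""
    g = [rng.gauss(0.0, 1.0) for _ in e]
    c = dot(g, e)
    g = [gi - c * ei for gi, ei in zip(g, e)]  # remove the component along e
    norm = math.sqrt(dot(g, g))
    return [gi / norm for gi in g]


def slice_point(e, a, rng):
    """Return a*e + sqrt(1 - a^2)*z with z uniform on the unit sphere orthogonal to e."""
    z = unit_orthogonal_to(e, rng)
    s = math.sqrt(1.0 - a * a)
    return [a * ei + s * zi for ei, zi in zip(e, z)]


def reflect(x, v):
    """Householder reflection x -> x - 2<x, v>v; for a unit v orthogonal to e it fixes e."""
    c = 2.0 * dot(x, v)
    return [xi - c * vi for xi, vi in zip(x, v)]


rng = random.Random(0)          # fixed seed, illustrative only
d = 6                           # hypothetical dimension
e = [1.0] + [0.0] * (d - 1)     # unit vector playing the role of e
a = 0.125                       # hypothetical value of <x, e>

x = slice_point(e, a, rng)
v = unit_orthogonal_to(e, rng)  # reflection direction, orthogonal to e
Ax = reflect(x, v)              # image of x under an element A of O(e)
Ae = reflect(e, v)              # should coincide with e

print("x :", dot(x, e), dot(x, x))
print("Ax:", dot(Ax, e), dot(Ax, Ax))
```

Both printed pairs should agree up to floating-point error: the map keeps $\langle x, e \rangle$ and $\|x\|$ unchanged, so it sends each slice to itself. This is the property behind the claim that $g(x) = f(Ax)$ has the same hinge error as $f$ under $D_e$.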
Let $P_e f(x) = \int_{O(e)} f(Ax)\, dA$ be the symmetrization of $f$ w.r.t. $e$. On one hand, $P_e f \in H_k$, $\|P_e f\|_{H_k} \le \|f\|_{H_k}$, and $\mathrm{Err}_{D_e,\mathrm{hinge}}(P_e f) \le \mathrm{Err}_{D_e,\mathrm{hinge}}(f)$. On the other hand, $\bar{f} = P_e f$. Since for $P_e f$ we have already shown that $|P_e f(\gamma) - P_e f(-\gamma)| = o(1)$, it follows that $|\bar{f}(\gamma) - \bar{f}(-\gamma)| = o(1)$ as well.

Symmetrization of the kernel

Our final step is to remove the assumption that the kernel is symmetric. To do so, we first symmetrize the kernel as follows. Recall that $O(d)$ is the group of linear isometries of $\mathbb{R}^d$.