10.07.2015

The error rate of learning halfspaces using kernel-SVM

and (uniformly and independently) a vector $z$ from the 1-codimensional sphere of $S^{d-1}$ that is orthogonal to $e$. The constructed point is $(\alpha e + \sqrt{1-\alpha^2}\, z, \beta)$.

For any $f \in H_k$ and $a \in [-1, 1]$, define $\bar f(a)$ to be the expectation of $f$ over the 1-codimensional sphere $\{x \in S^{d-1} : \langle x, e \rangle = a\}$. We will show that for any $f \in H_k$ such that $\|f\|_{H_k} \le C$ and $\mathrm{Err}_{D_e,\mathrm{hinge}}(f) \le 1$, we have $|\bar f(\gamma) - \bar f(-\gamma)| = o(1)$.

To do so, let us first assume that $f$ is symmetric with respect to $e$, and hence can be written as

$$f(x) = \sum_{n=0}^{\infty} \alpha_n P_{d,n}(\langle x, e \rangle),$$

where $\alpha_n \in \mathbb{R}$ and $P_{d,n}$ is the $d$-dimensional Legendre polynomial of degree $n$. Furthermore, by a characterization of Hilbert spaces corresponding to symmetric kernels, it follows that $\sum_n \alpha_n^2 \le C^2$. Since $f$ is symmetric w.r.t. $e$, we have

$$\bar f(a) = \sum_{n=0}^{\infty} \alpha_n P_{d,n}(a).$$

For $|a| \le 1/8$, $|P_{d,n}(a)|$ tends to zero exponentially fast with both $d$ and $n$. Hence, if $d$ is large enough, then

$$\bar f(a) \approx \sum_{n=0}^{\log(C)} \alpha_n P_{d,n}(a) =: \tilde f(a).$$

Note that $\tilde f$ is a polynomial of degree at most $\log(C)$. In addition, by construction, $\mathrm{Err}_{D_e,\mathrm{hinge}}(f) = \mathrm{Err}_{D,\mathrm{hinge}}(\bar f) \approx \mathrm{Err}_{D,\mathrm{hinge}}(\tilde f)$. Hence, if $\mathrm{Err}_{D_e,\mathrm{hinge}}(f) \le 1$, then using the previous subsection we conclude that $|\bar f(\gamma) - \bar f(-\gamma)| = o(1)$.

Symmetrization of f

In the above, we assumed both that the kernel function is symmetric and that $f$ is symmetric w.r.t. $e$. Our next step is to relax the latter assumption, while still assuming that the kernel function is symmetric.

Let $O(e)$ be the group of linear isometries that fix $e$, namely, $O(e) = \{A \in O(d) : Ae = e\}$. Since $k$ is assumed to be a symmetric kernel, for all $A \in O(e)$ the function $g(x) = f(Ax)$ is also in $H_k$. Furthermore, $\|g\|_{H_k} = \|f\|_{H_k}$, and by the construction of $D_e$ we also have $\mathrm{Err}_{D_e,\mathrm{hinge}}(g) = \mathrm{Err}_{D_e,\mathrm{hinge}}(f)$.
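The symmetric case can be explored numerically with the following minimal Python sketch (standard library only). It assumes the normalization $P_{d,n}(1) = 1$ and evaluates $P_{d,n}$ with the standard three-term recurrence for Gegenbauer polynomials in that normalization. The helper names `legendre_d` and `truncation_tail_bound`, the cutoff `n_max`, and reading $\log(C)$ as a natural logarithm are illustrative assumptions, not taken from the text. The tail bound is the Cauchy-Schwarz estimate $|\bar f(a) - \tilde f(a)| \le C\,(\sum_{n > N} P_{d,n}(a)^2)^{1/2}$, which uses only $\sum_n \alpha_n^2 \le C^2$.

```python
from __future__ import annotations

import math


def legendre_d(d: int, n_max: int, t: float) -> list[float]:
    """Return [P_{d,0}(t), ..., P_{d,n_max}(t)] for the d-dimensional Legendre
    polynomials, normalized so that P_{d,n}(1) = 1 (requires d >= 2).

    Uses the three-term recurrence
        P_{d,n}(t) = (2n+d-4)/(n+d-3) * t * P_{d,n-1}(t)
                     - (n-1)/(n+d-3) * P_{d,n-2}(t),
    with P_{d,0}(t) = 1 and P_{d,1}(t) = t.
    """
    if d < 2 or n_max < 0:
        raise ValueError("need d >= 2 and n_max >= 0")
    values = [1.0, t]
    for n in range(2, n_max + 1):
        coef_prev = (2 * n + d - 4) / (n + d - 3)
        coef_prev2 = (n - 1) / (n + d - 3)
        values.append(coef_prev * t * values[-1] - coef_prev2 * values[-2])
    return values[: n_max + 1]


def truncation_tail_bound(d: int, a: float, C: float, N: int, n_max: int = 200) -> float:
    """Hypothetical helper: Cauchy-Schwarz bound on |f_bar(a) - f_tilde(a)|
    when sum_n alpha_n^2 <= C^2 and f_tilde keeps only degrees 0..N:
        |sum_{n>N} alpha_n P_{d,n}(a)| <= C * sqrt(sum_{n>N} P_{d,n}(a)^2).
    The infinite tail is cut off at degree n_max, a numerical shortcut.
    """
    p = legendre_d(d, n_max, a)
    tail = sum(v * v for v in p[N + 1:])
    return C * math.sqrt(tail)


if __name__ == "__main__":
    # Magnitudes |P_{d,n}(1/8)| for n = 0..10 in a few dimensions d.
    for d in (10, 100, 1000):
        print(d, ["%.1e" % abs(v) for v in legendre_d(d, 10, 1 / 8)])
    # Truncate at degree ceil(ln C); the natural log is an assumption here.
    C = 1e3
    N = math.ceil(math.log(C))
    print("tail bound:", truncation_tail_bound(1000, 1 / 8, C, N))
```

The printed values are only a sanity check of the decay claim for $|a| \le 1/8$; they are not part of the argument.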
Let $P_e f(x) = \int_{O(e)} f(Ax)\, dA$, where $dA$ is the normalized Haar measure on $O(e)$, be the symmetrization of $f$ w.r.t. $e$. On one hand, $P_e f \in H_k$, $\|P_e f\|_{H_k} \le \|f\|_{H_k}$, and $\mathrm{Err}_{D_e,\mathrm{hinge}}(P_e f) \le \mathrm{Err}_{D_e,\mathrm{hinge}}(f)$. On the other hand, $\bar f = \overline{P_e f}$. Since for $P_e f$ we have already shown that $|\overline{P_e f}(\gamma) - \overline{P_e f}(-\gamma)| = o(1)$, it follows that $|\bar f(\gamma) - \bar f(-\gamma)| = o(1)$ as well.

Symmetrization of the kernel

Our final step is to remove the assumption that the kernel is symmetric. To do so, we first symmetrize the kernel as follows. Recall that $O(d)$ is the group of linear isometries of $\mathbb{R}^d$.
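To make the symmetrization $P_e f(x) = \int_{O(e)} f(Ax)\,dA$ concrete, here is a hedged Monte Carlo sketch in Python (standard library only). Rather than sampling matrices from $O(e)$, it uses the fact that for Haar-random $A \in O(e)$ the point $Ax$ equals $\langle x, e\rangle e + A x_\perp$, with $A x_\perp$ uniform on the sphere of radius $\|x_\perp\|$ in $e^\perp$. For $x \in S^{d-1}$ this is exactly the point $a e + \sqrt{1-a^2}\,z$ with $a = \langle x, e\rangle$, so the estimate also approximates $\bar f(a)$. The function names `dot`, `sample_unit_orthogonal` and `symmetrize`, the sample count, and the seed are hypothetical choices, and $e$ is assumed to be a unit vector.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sample_unit_orthogonal(e, rng):
    """Uniform unit vector in the hyperplane orthogonal to the unit vector e:
    project an isotropic Gaussian vector onto e^perp and normalize it."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in e]
        c = dot(g, e)
        z = [gi - c * ei for gi, ei in zip(g, e)]
        norm = math.sqrt(dot(z, z))
        if norm > 1e-12:  # reject the (measure-zero) degenerate draw
            return [zi / norm for zi in z]


def symmetrize(f, e, x, num_samples=20000, seed=0):
    """Monte Carlo estimate of P_e f(x) = int_{O(e)} f(Ax) dA (e a unit vector).

    For A Haar-distributed on O(e), Ax = <x,e> e + A x_perp, and A x_perp is
    uniform on the sphere of radius |x_perp| in e^perp, so we average f over
    that orbit instead of sampling matrices A explicitly.
    """
    rng = random.Random(seed)
    a = dot(x, e)
    x_perp = [xi - a * ei for xi, ei in zip(x, e)]
    r = math.sqrt(dot(x_perp, x_perp))
    total = 0.0
    for _ in range(num_samples):
        z = sample_unit_orthogonal(e, rng)
        total += f([a * ei + r * zi for ei, zi in zip(e, z)])
    return total / num_samples


if __name__ == "__main__":
    e = [1.0, 0.0, 0.0, 0.0, 0.0]
    w = [0.3, 0.5, -0.2, 0.4, 0.1]

    def relu_of_projection(x):
        # An arbitrary test function that is not symmetric w.r.t. e.
        return max(0.0, dot(w, x))

    x1 = [0.6, 0.8, 0.0, 0.0, 0.0]
    x2 = [0.6, 0.0, 0.0, 0.8, 0.0]  # same <x, e>, different direction
    print(symmetrize(relu_of_projection, e, x1),
          symmetrize(relu_of_projection, e, x2))
```

For a linear function $f(x) = \langle w, x\rangle$ the exact value is $\langle w, e\rangle\langle x, e\rangle$, since the component of $w$ orthogonal to $e$ averages out. For a nonlinear $f$, two inputs with the same $\langle x, e\rangle$ should give estimates that agree up to Monte Carlo error, reflecting that $P_e f$ depends on $x$ only through $\langle x, e\rangle$.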
