The error rate of learning halfspaces using kernel-SVM

We note that $\omega_{f \circ g} \le \omega_f \circ \omega_g$ and that $\omega_{\Lambda_v}(\epsilon) = \|v\| \cdot \epsilon$. Thus, if $\psi : S^{\infty} \to H_1$ is an absolutely continuous embedding such that $k(x,y) = \langle \psi(x), \psi(y) \rangle_{H_1}$, then for every $v \in H_1$ it holds that $\omega_{\Lambda_{v,0} \circ \psi} \le \|v\|_{H_1} \cdot \omega_{\psi}$. Suppose now that $f \in H_k$ with $\|f\|_{H_k} \le C$. Let $v \in H_1$ be such that $f = \Lambda_{v,0} \circ \psi$ and $\|v\|_{H_1} = \|f\|_{H_k} \le C$. It follows from Lévy's Lemma that

$$\Pr\left(|f - \mathbb{E} f| > C \cdot \omega_{\psi}(\epsilon)\right) \le \Pr\left(|f - \mathbb{E} f| > \omega_f(\epsilon)\right) \le \exp\left(-\eta d \epsilon^2\right) \qquad (12)$$

Again, both the probability and the expectation are taken w.r.t. the uniform distribution over $S^{d-1}$.

Proof (of Theorems 2.6 and 3.2). Let $\beta > \alpha > 0$ be such that $l(\alpha) > l(\beta)$. Choose $0 < \theta < 1$ large enough so that $(1-\theta) l(-\beta) + \theta l(\beta) < \theta l(\alpha)$. Define probability measures $\mu_1, \mu_2, \mu$ over $[-1,1] \times \{\pm 1\}$ as follows:

$$\mu_1(\{(-\gamma, -1)\}) = 1 - \theta, \qquad \mu_1(\{(\gamma, 1)\}) = \theta .$$

The measure $\mu_2$ is the product of the uniform distribution over $\{\pm 1\}$ and the measure on $[-1,1]$ whose density function is

$$w(x) = \begin{cases} 0 & |x| > \tfrac{1}{8} \\[2pt] \dfrac{8}{\pi \sqrt{1 - (8x)^2}} & |x| \le \tfrac{1}{8} \end{cases}$$

Finally, $\mu = (1-\lambda)\mu_1 + \lambda \mu_2$ for some $\lambda > 0$ that will be chosen later.

By Lemma 5.15 (see Remark 5.16), there is a continuous normalized kernel $k'$ such that w.p. $\ge 1 - 2e\exp(-\tfrac{1}{\gamma})$ the function returned by the algorithm is of the form $f + b$ with $\|f\|_{H_{k'}} \le c \cdot m_A^3(\gamma)$ for some $c > 0$ (depending only on $l$). Let $e \in S^{d-1}$ be the vector from Lemma 5.22, corresponding to the kernel $k'$. The distribution $D$ is the pullback of $\mu$ w.r.t. $e$. By considering the affine functional $\Lambda_{e,0}$, it holds that $\mathrm{Err}_{\gamma}(D) \le \lambda$.

Let $g$ be the solution returned by the algorithm. With probability $\ge 1 - \exp(-1/\gamma)$, $g = f + b$, where $(f, b)$ is a solution to program (11) with $C = C_A(\gamma)$ and with an additive error $\le \sqrt{\gamma}$. Since the value of the zero solution for program (11) is $l(0)$, it follows that

$$l(0) + \sqrt{\gamma} \ge \mathrm{Err}_{\mu,l}(g) = (1-\lambda)\,\mathrm{Err}_{\mu^1_e,\,l}(g) + \lambda\,\mathrm{Err}_{\mu^2_e,\,l}(g) .$$

Thus, $\mathrm{Err}_{\mu^2_e,\,l}(g) \le \frac{l(0) + \sqrt{\gamma}}{\lambda} \le \frac{2 l(0)}{\lambda}$. Combining Lemma 5.23, Lemma 5.15, and Lemma 5.22, it follows that w.p. $\ge 1 - (1 + 2e)\exp(-\tfrac{1}{\gamma}) \ge 1 - 10\exp(-\tfrac{1}{\gamma})$, for $m = m_A(\gamma)$,

$$\left| \int_{\{x : \langle x, e \rangle = \gamma\}} g - \int_{\{x : \langle x, e \rangle = -\gamma\}} g \right| \le \frac{128\, l(0)\, \gamma K^{3.5}}{|\partial^+ l(0)|\, \lambda} + 10 \cdot c \cdot K^{3.5} \cdot E \cdot m^3 \cdot (r_K + s_d) .$$

By choosing $K = \Theta(\log(m))$, $\lambda = \Theta(\gamma K^{3.5}) = \Theta(\gamma \log^{3.5}(m))$ and $d = \Theta(\log(m))$, we can make the last bound $\le \frac{\alpha}{2}$. We claim that $\int_{\{x : \langle x, e \rangle = -\gamma\}} g > \frac{\alpha}{2}$. To see that, note that otherwise, since the difference above is at most $\frac{\alpha}{2}$, we would have $\int_{\{x : \langle x, e \rangle = \gamma\}} g \le \alpha$, and thus

$$\begin{aligned}
\mathbb{E}_{(x,y) \sim D}\, l((f(x) + b) y) &= \mathbb{E}_{(x,y) \sim D}\, l(g(x) y) \\
&\ge \theta (1-\lambda) \cdot \int_{\{x : \langle x, e \rangle = \gamma\}} l(g(x))\, dx \\
&\ge \theta (1-\lambda) \cdot l\!\left( \int_{\{x : \langle x, e \rangle = \gamma\}} g(x)\, dx \right) \\
&\ge \theta \cdot l(\alpha) \cdot (1-\lambda) = \theta \cdot l(\alpha) + o(1) .
\end{aligned}$$

Here the first inequality keeps only the contribution of the atom $(\gamma, 1)$ of $\mu_1$ (whose mass under $\mu$ is $\theta(1-\lambda)$), the second is Jensen's inequality ($l$ is convex), and the third uses that $l$ is non-increasing together with $\int_{\{x : \langle x, e \rangle = \gamma\}} g \le \alpha$.
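The short Python sketch below is not part of the original argument; it is a numerical sanity check, under the reconstruction above, of two small ingredients of the proof: that $w$ is indeed a probability density on $[-\tfrac{1}{8}, \tfrac{1}{8}]$, and that the Jensen-plus-monotonicity step $\mathbb{E}[l(g(X))] \ge l(\mathbb{E}[g(X)]) \ge l(\alpha)$ holds for a convex non-increasing loss. The hinge loss and the synthetic $g$-values are illustrative stand-ins, not the paper's objects.

```python
# Illustrative sanity checks only; the hinge loss and the synthetic g-values
# are stand-ins for the paper's loss l and solution g.
import numpy as np
from scipy.integrate import quad

# Density of the x-marginal of mu_2: a rescaled arcsine law supported on [-1/8, 1/8].
def w(x):
    return 8.0 / (np.pi * np.sqrt(1.0 - (8.0 * x) ** 2))

# The endpoint singularities at +-1/8 are integrable; adaptive quadrature handles them.
mass, _ = quad(w, -1.0 / 8.0, 1.0 / 8.0, limit=200)
print(f"integral of w over [-1/8, 1/8] = {mass:.6f}")  # expect ~1.0

# Jensen + monotonicity: for a convex non-increasing loss l and any g(X),
# E[l(g(X))] >= l(E[g(X)]), and E[g(X)] <= alpha implies l(E[g(X)]) >= l(alpha).
hinge = lambda t: np.maximum(0.0, 1.0 - t)              # convex, non-increasing
rng = np.random.default_rng(0)
g_vals = rng.normal(loc=0.05, scale=0.3, size=100_000)  # synthetic values of g on a slice
alpha = 0.1

assert hinge(g_vals).mean() >= hinge(g_vals.mean()) - 1e-12            # Jensen
assert g_vals.mean() <= alpha and hinge(g_vals.mean()) >= hinge(alpha)  # monotonicity
print("Jensen + monotonicity checks pass")
```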
