Subgradient slides
Review: subgradient descent

Problem:  min_x f(x)   (optionally, s.t. x ∈ F)

• Initialize x_1
• for t = 1 to "I'm tired":
  ‣ f_t = estimate of f from a limited # of terms (or just use f itself)
  ‣ g_t = any element of ∂f_t(x_t)
  ‣ x_{t+0.5} = x_t - η_t g_t
  ‣ x_{t+1} = arg min_{x ∈ F} ||x - x_{t+0.5}||
    (which is just x_{t+1} = x_{t+0.5} if F = R^n)
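A minimal runnable sketch of this loop (an added illustration, not from the slides; the objective, the projection onto F, and the step-size schedule below are placeholder choices to swap for your own):

import numpy as np

def subgradient_descent(f_subgrad, project, x1, num_steps, step_size):
    """Projected subgradient descent as in the loop above.

    f_subgrad(x)  -> any element of the subdifferential of f (or of an estimate f_t)
    project(x)    -> arg min_{z in F} ||z - x||  (identity if F = R^n)
    step_size(t)  -> eta_t
    """
    x = x1
    for t in range(1, num_steps + 1):
        g = f_subgrad(x)                   # g_t in the subdifferential at x_t
        x_half = x - step_size(t) * g      # x_{t+0.5} = x_t - eta_t g_t
        x = project(x_half)                # x_{t+1} = projection of x_{t+0.5} onto F
    return x

# Example: minimize f(x) = ||x - c||_1 over the unit ball F = {x : ||x|| <= 1}.
c = np.array([2.0, -1.0])
f_subgrad = lambda x: np.sign(x - c)                   # a valid subgradient of ||x - c||_1
project = lambda x: x / max(1.0, np.linalg.norm(x))    # Euclidean projection onto the unit ball
x_final = subgradient_descent(f_subgrad, project, np.zeros(2), 200, lambda t: 1.0 / np.sqrt(t))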
Subgradient in action

[figure: subgradient descent iterates on a 2-D example]
Convergence summary

• For strictly convex f(x) (i.e., λ > 0):
  ‣ set η_t = 1/(λt)
  ‣ f(x_t) - f(x*) = Õ(1/t)
• For non-strictly convex f(x) (i.e., λ = 0):
  ‣ set η_t = 1/√t
  ‣ f(x_t) - f(x*) = O(1/√t)
• To get accuracy ϵ:
  ‣ λ > 0: T = Õ(1/ϵ)
  ‣ λ = 0: T = O(1/ϵ²)

(Interior point: T = O(ln(1/ϵ)), but each iteration is much slower.)
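For a concrete sense of scale (an added illustration, not from the original slides): to reach accuracy ϵ = 10⁻², the λ > 0 rate needs on the order of T ≈ 1/ϵ = 100 iterations, the λ = 0 rate needs T ≈ 1/ϵ² = 10,000, while interior point needs only T ≈ ln(1/ϵ) ≈ 5; the choice therefore comes down to per-iteration cost.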
Convergence intuition

• Q(x) = ||x - x*||²/2
• Proof works by guaranteeing that Q(x) decreases
  ‣ subtlety: only guaranteed if f(x_t) ≫ f(x*)
  ‣ it has to be like this: e.g., if f has multiple minimizers, x_t can head toward a minimizer other than x*, so Q(x_t) need not shrink once f(x_t) ≈ f(x*)
• We showed (for λ = 0):
  ‣ f(x*) ≥ f(x_t) + Q(x_{t+1})/η_t - Q(x_t)/η_t - η_t ||g_t||²/2
• Suppose f(x_t) ≥ f(x*) + ϵ: rearranging the bound above gives
  Q(x_{t+1}) ≤ Q(x_t) - η_t ϵ + η_t² ||g_t||²/2,
  so Q strictly decreases whenever η_t < 2ϵ/||g_t||².
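A quick numerical check of this intuition (an added illustration, not from the slides), using f(x) = ||x||₁ whose minimizer is x* = 0: Q(x_t) keeps shrinking while f(x_t) is still well above f(x*).

import numpy as np

def f(x):
    return np.abs(x).sum()          # f(x) = ||x||_1, minimized at x* = 0

def subgradient(x):
    return np.sign(x)               # a valid element of the subdifferential of ||x||_1

x_star = np.zeros(2)
x = np.array([3.0, -2.0])
for t in range(1, 51):
    eta = 1.0 / np.sqrt(t)          # the lambda = 0 schedule from the summary slide
    x = x - eta * subgradient(x)
    Q = 0.5 * np.sum((x - x_star) ** 2)
    if t % 10 == 0:
        print(f"t={t:3d}  f(x_t)={f(x):.4f}  Q(x_t)={Q:.4f}")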
Typical SVM bound

• Given n training examples, for any δ
Example: SVM

• Suppose no bias term b
  ‣ L = ||w||²/2 + (C/m) ∑ h(y_i x_iᵀ w)
  ‣ ∂L = w + (C/m) ∑ y_i x_i ∂h(y_i x_iᵀ w)
• If ||x_i|| ≤ X: then every subgradient g ∈ ∂L satisfies ||g|| ≤ ||w|| + C X (using |∂h| ≤ 1 for the hinge loss)
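A minimal runnable sketch of one subgradient computation and the resulting descent loop for this objective (an added illustration: the toy data, variable names, and the choice λ = 1 in the step size are mine; h is taken to be the hinge loss h(z) = max(0, 1 - z)):

import numpy as np

def svm_subgradient(w, X, y, C):
    """One element of dL for L(w) = ||w||^2/2 + (C/m) * sum_i h(y_i x_i^T w),
    with h(z) = max(0, 1 - z) and no bias term b."""
    m = X.shape[0]
    margins = y * (X @ w)                      # y_i x_i^T w for every example
    dh = np.where(margins < 1, -1.0, 0.0)      # a valid choice from dh(z): -1 if z < 1, else 0
    return w + (C / m) * (X.T @ (y * dh))      # w + (C/m) sum_i y_i x_i dh(y_i x_i^T w)

# Descent loop with eta_t = 1/(lambda t); lambda = 1 since ||w||^2/2 is 1-strongly convex.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                              # toy data, for illustration only
y = np.where(X[:, 0] + rng.normal(size=200) > 0, 1.0, -1.0)
w = np.zeros(5)
for t in range(1, 101):
    w = w - (1.0 / t) * svm_subgradient(w, X, y, C=1.0)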
What if we want b?

• Problem: λ = 0
  ‣ (b is not regularized, so the objective is no longer strongly convex)
• Solutions:
  ‣ ignore the problem:
  ‣ penalize b too:
  ‣ change the algorithm slightly:
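For the "penalize b too" option, a common reading (an added sketch, assuming this is what the slide intends) is to fold b into w by appending a constant-1 feature, so b is regularized along with w and the λ > 0 analysis applies:

import numpy as np

def add_bias_feature(X):
    """Append a constant-1 column: with w' = (w, b) and x_i' = (x_i, 1),
    x_i'^T w' = x_i^T w + b, and the regularizer ||w'||^2/2 now penalizes b too."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

# Usage: train on the augmented data exactly as before; the last entry of the learned
# weight vector plays the role of b.  (train_svm below is a hypothetical name for
# whatever subgradient-descent trainer you already have.)
# w_and_b = train_svm(add_bias_feature(X), y)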