
coordinate descent with exact line search (AdaBoost) can result in infinite step sizes, leading the iterates to converge in a direction that is not a max-$\ell_1$-margin direction [?]; hence the bounded step-size rule in Theorem 6.3.2.
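For intuition on why unbounded steps are problematic, here is a small calculation (a sketch, assuming the exponential loss $L(w) = \sum_i \exp(-y_i x_i^\top w)$ of (6.6)): if a single coordinate $j$ separates the data, i.e., $y_i x_{ij} > 0$ for all $i$, then the exact line search along $e_j$,
$$\min_{\eta \ge 0} L(w + \eta e_j) = \min_{\eta \ge 0} \sum_i \exp\!\left(-y_i x_i^\top w - \eta\, y_i x_{ij}\right),$$
has no finite minimizer, since every summand is strictly decreasing in $\eta$; the bounded step-size rule explicitly excludes such steps.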

Theorem 6.3.2 is a generalization of the result of [?] to steepest descent with respect to other norms, and our proof follows the same strategy as [?]. We first prove a generalization of the duality result of [?]: if there is a unit-norm linear separator that achieves margin $\gamma$, then $\|\nabla L(w)\|_\star \ge \gamma L(w)$ for all $w$. Using this lower bound on the dual norm of the gradient, we show that the loss decreases faster than the norm of the iterates grows, establishing convergence in a margin-maximizing direction.

In the rest of this section, we discuss the proof of Theorem 6.3.2. The proof is divided into three steps:

1. Gradient domination condition: for every norm and any $w$, $\|\nabla L(w)\|_\star \ge \gamma L(w)$.

2. Optimization properties of steepest descent (whose update rule is recalled below), such as decrease of the loss function and convergence of the gradient to zero in the dual norm.

3. Establishing sufficiently fast convergence of $L(w_t)$ relative to the growth of $\|w_t\|$ to prove the theorem.
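As a reminder (using one standard normalization of the update; the definition given earlier with (6.6) takes precedence), the steepest descent step with respect to a norm $\|\cdot\|$ is
$$w_{t+1} = w_t + \eta\, \Delta w_t, \qquad \Delta w_t \in \arg\min_{\|v\| \le \|\nabla L(w_t)\|_\star} \langle \nabla L(w_t), v \rangle,$$
so that $\langle \nabla L(w_t), \Delta w_t \rangle = -\|\nabla L(w_t)\|_\star^2$. For $\|\cdot\|_2$ this is gradient descent, for $\|\cdot\|_1$ it is coordinate descent (the AdaBoost case above), and for $\|\cdot\|_\infty$ it is a signed-gradient update.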

Proposition 6.3.3 (Gradient domination condition; Lemma 10 of [?]). Let $\gamma = \max_{\|w\| \le 1} \min_i y_i x_i^\top w$. For all $w$,
$$\|\nabla L(w)\|_\star \ge \gamma L(w).$$
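The proof is short enough to sketch here, assuming the exponential loss $L(w) = \sum_i \exp(-y_i x_i^\top w)$: let $\bar{w}$ with $\|\bar{w}\| \le 1$ attain margin $\gamma$, so $y_i x_i^\top \bar{w} \ge \gamma$ for all $i$. Then, by the definition of the dual norm,
$$\|\nabla L(w)\|_\star \;\ge\; \langle -\nabla L(w), \bar{w} \rangle \;=\; \sum_i \exp(-y_i x_i^\top w)\, y_i x_i^\top \bar{w} \;\ge\; \gamma \sum_i \exp(-y_i x_i^\top w) \;=\; \gamma L(w).$$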

Next, we establish some optimization properties of the steepest descent algorithm, including convergence of the gradient norm and of the loss value.

Proposition 6.3.4 (Lemmas 11 and 12 of [?]). Consider the steepest descent iterates $w_t$ on (6.6) with step size $\eta \le \frac{1}{B^2 L(w_0)}$, where $B = \max_i \|x_i\|_\star$. The following hold:

1. $L(w_{t+1}) \le L(w_t)$.

2. $\sum_{t=0}^{\infty} \|\nabla L(w_t)\|_\star^2 < \infty$, and hence $\|\nabla L(w_t)\|_\star \to 0$.

3. $L(w_t) \to 0$, and hence $w_t^\top x_i \to \infty$.

4. $\sum_{t=0}^{\infty} \|\nabla L(w_t)\|_\star = \infty$.
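To see Proposition 6.3.4 in action, here is a minimal numerical sketch (not from the text; the dataset and all names are made up, and the exponential loss is assumed): steepest descent with respect to the $\ell_\infty$ norm, whose dual is $\ell_1$, so $B = \max_i \|x_i\|_1$ and the step direction is $-\|\nabla L(w)\|_1 \operatorname{sign}(\nabla L(w))$.

import numpy as np

# Made-up linearly separable data: y_i * x_i^T w > 0 is achievable.
X = np.array([[2.0, 0.5], [0.5, 2.0], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0])

def loss(w):
    # Exponential loss, assumed form of (6.6): L(w) = sum_i exp(-y_i x_i^T w).
    return np.exp(-y * (X @ w)).sum()

def grad(w):
    return -((np.exp(-y * (X @ w)) * y) @ X)

# Step-size rule of Proposition 6.3.4: eta <= 1 / (B^2 L(w_0)),
# with B = max_i ||x_i||_star = max_i ||x_i||_1 for the l_inf geometry.
B = np.abs(X).sum(axis=1).max()
w = np.zeros(2)
eta = 1.0 / (B**2 * loss(w))

for t in range(5000):
    g = grad(w)
    # l_inf steepest descent step: Delta w = -||g||_1 * sign(g).
    w = w - eta * np.abs(g).sum() * np.sign(g)

print(f"final loss           : {loss(w):.3e}")                 # -> 0  (property 3)
print(f"dual (l_1) grad norm : {np.abs(grad(w)).sum():.3e}")   # -> 0  (property 2)
print("normalized margins   :", np.round(y * (X @ w) / np.abs(w).max(), 3))

Running this, the loss decreases monotonically (property 1), the dual norm of the gradient vanishes while the iterate norm grows, and $w_t / \|w_t\|_\infty$ approaches a max-$\ell_\infty$-margin direction, consistent with Theorem 6.3.2.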
