
Theory of Deep Learning, 2022


basic setup and some math notions

Lemma 1.1.5 (Bernstein inequality [? ]). Let $X_1, \dots, X_n$ be independent zero-mean random variables. Suppose that $|X_i| \le M$ almost surely, for all $i$. Then, for all positive $t$,

$$\Pr\left[ \sum_{i=1}^{n} X_i > t \right] \le \exp\left( - \frac{t^2/2}{\sum_{j=1}^{n} \mathbb{E}[X_j^2] + Mt/3} \right).$$
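As a quick numerical sanity check of Bernstein's inequality, the sketch below compares the bound with an empirical tail probability. The choice of distribution (uniform on $[-1,1]$, so $M = 1$ and $\mathbb{E}[X_i^2] = 1/3$) and the sample sizes are illustrative assumptions, not part of the lemma.

```python
# Empirical check of Bernstein's inequality (Lemma 1.1.5).
# X_i ~ Uniform[-1, 1]: zero-mean, |X_i| <= M = 1, E[X_i^2] = 1/3.
import math
import random

random.seed(0)
n, trials, t = 100, 20000, 15.0

# Estimate Pr[sum_i X_i > t] by simulation.
exceed = 0
for _ in range(trials):
    s = sum(random.uniform(-1.0, 1.0) for _ in range(n))
    if s > t:
        exceed += 1
empirical = exceed / trials

# Bernstein bound: exp(-(t^2/2) / (sum_j E[X_j^2] + M t / 3)).
M = 1.0
var_sum = n * (1.0 / 3.0)
bound = math.exp(-(t * t / 2.0) / (var_sum + M * t / 3.0))

print(f"empirical tail  = {empirical:.5f}")
print(f"Bernstein bound = {bound:.5f}")
assert empirical <= bound
```

Here the true tail is well below the bound, as expected: Bernstein's inequality is not tight for this distribution, but it kicks in at the right scale (the sum has standard deviation $\sqrt{n/3} \approx 5.8$, and $t = 15$ is a few standard deviations out).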

Lemma 1.1.6 (Anti-concentration of Gaussian distribution). Let $X \sim \mathcal{N}(0, \sigma^2)$, that is, the probability density function of $X$ is given by $\phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}$. Then

$$\Pr[|X| \le t] \in \left( \frac{2}{3}\,\frac{t}{\sigma},\; \frac{4}{5}\,\frac{t}{\sigma} \right).$$
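This can be verified directly from the Gaussian CDF, since $\Pr[|X| \le t] = \operatorname{erf}\!\big(t/(\sigma\sqrt{2})\big)$. The sketch below checks the two-sided sandwich for a few values of $t$; restricting to $t \le \sigma$ is an assumption made here, since for large $t$ the probability saturates at $1$ while the claimed upper bound keeps growing.

```python
# Check Lemma 1.1.6: Pr[|X| <= t] = erf(t / (sigma * sqrt(2)))
# should lie strictly between (2/3)(t/sigma) and (4/5)(t/sigma).
import math

sigma = 2.0
for t in [0.1, 0.5, 1.0, 2.0]:  # all t <= sigma (assumed range)
    p = math.erf(t / (sigma * math.sqrt(2.0)))
    lower = (2.0 / 3.0) * t / sigma
    upper = (4.0 / 5.0) * t / sigma
    assert lower < p < upper, (t, lower, p, upper)
    print(f"t={t:.1f}: {lower:.4f} < {p:.4f} < {upper:.4f}")
```

The constants $2/3$ and $4/5$ bracket the small-$t$ slope of the CDF, since $\Pr[|X| \le t] \approx \sqrt{2/\pi}\,(t/\sigma) \approx 0.798\,(t/\sigma)$ as $t \to 0$.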

Lemma 1.1.7 (Matrix Bernstein, Theorem 6.1.1 in [? ]). Consider a finite sequence $\{X_1, \dots, X_m\} \subset \mathbb{R}^{n_1 \times n_2}$ of independent, random matrices with common dimension $n_1 \times n_2$. Assume that

$$\mathbb{E}[X_i] = 0,\ \forall i \in [m] \quad \text{and} \quad \|X_i\| \le M,\ \forall i \in [m].$$

Let $Z = \sum_{i=1}^{m} X_i$. Let $\mathrm{Var}[Z]$ be the matrix variance statistic of the sum:

$$\mathrm{Var}[Z] = \max\left\{ \left\| \sum_{i=1}^{m} \mathbb{E}[X_i X_i^\top] \right\|,\; \left\| \sum_{i=1}^{m} \mathbb{E}[X_i^\top X_i] \right\| \right\}.$$

Then

$$\mathbb{E}[\|Z\|] \le \left( 2\,\mathrm{Var}[Z] \cdot \log(n_1 + n_2) \right)^{1/2} + M \cdot \log(n_1 + n_2)/3.$$

Furthermore, for all $t \ge 0$,

$$\Pr[\|Z\| \ge t] \le (n_1 + n_2) \cdot \exp\left( - \frac{t^2/2}{\mathrm{Var}[Z] + Mt/3} \right).$$

A useful shorthand will be the following: if $y_1, y_2, \dots, y_m$ are independent random variables, each having mean $0$ and taking values in $[-1, 1]$, then their average $\frac{1}{m} \sum_i y_i$ behaves like a Gaussian variable with mean zero and variance at most $1/m$. In other words, the probability that this average is at least $\epsilon$ in absolute value is at most $\exp(-\epsilon^2 m)$.
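The shorthand above suppresses constants; a version with explicit constants is Hoeffding's inequality, which for variables in $[-1,1]$ gives $\Pr\big[|\frac{1}{m}\sum_i y_i| \ge \epsilon\big] \le 2\exp(-m\epsilon^2/2)$. The sketch below checks this explicit form by simulation with Rademacher ($\pm 1$) variables; the choice of distribution and sample sizes are illustrative assumptions.

```python
# Concentration of an average of bounded zero-mean variables.
# We use Rademacher (+/-1) y_i and compare the empirical tail of
# |avg| >= eps against the explicit Hoeffding bound 2*exp(-m*eps^2/2).
import math
import random

random.seed(0)
m, trials, eps = 400, 10000, 0.1

hits = 0
for _ in range(trials):
    avg = sum(random.choice((-1.0, 1.0)) for _ in range(m)) / m
    if abs(avg) >= eps:
        hits += 1
empirical = hits / trials

bound = 2.0 * math.exp(-m * eps * eps / 2.0)
print(f"Pr[|avg| >= {eps}] ~= {empirical:.4f}, Hoeffding bound = {bound:.4f}")
assert empirical <= bound
```

Consistent with the Gaussian heuristic, the average has standard deviation $1/\sqrt{m} = 0.05$ here, so the event $|\text{avg}| \ge 0.1$ is a two-standard-deviation deviation and occurs only a few percent of the time.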

1.1.2 Singular Value Decomposition

TBD.
