Theory of Deep Learning, 2022
basic setup and some math notions
Lemma 1.1.5 (Bernstein inequality [? ]). Let $X_1, \dots, X_n$ be independent zero-mean random variables. Suppose that $|X_i| \leq M$ almost surely, for all $i$. Then, for all positive $t$,
\[
\Pr\left[ \sum_{i=1}^{n} X_i > t \right] \leq \exp\left( - \frac{t^2/2}{\sum_{j=1}^{n} \mathbb{E}[X_j^2] + Mt/3} \right).
\]
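As a quick sanity check of Lemma 1.1.5, the sketch below estimates the tail probability for a sum of Rademacher ($\pm 1$) variables by Monte Carlo and compares it with the Bernstein bound. The parameter values and the Rademacher choice are illustrative assumptions, not from the text.

```python
import math
import random

random.seed(0)
n, M, t = 200, 1.0, 20.0   # n bounded variables with |X_i| <= M; threshold t
trials = 5000

# Empirical estimate of Pr[sum_i X_i > t] for Rademacher X_i
hits = 0
for _ in range(trials):
    s = sum(random.choice((-1.0, 1.0)) for _ in range(n))
    if s > t:
        hits += 1
empirical = hits / trials

# Bernstein bound: for Rademacher variables, sum_j E[X_j^2] = n
second_moment_sum = float(n)
bound = math.exp(-(t * t / 2) / (second_moment_sum + M * t / 3))

assert empirical <= bound   # the bound dominates the empirical tail
```

At these parameters the bound evaluates to about $0.38$ while the empirical tail is well under $0.1$; the bound is loose at this scale but, as it should, dominates the simulation.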
Lemma 1.1.6 (Anti-concentration of Gaussian distribution). Let $X \sim \mathcal{N}(0, \sigma^2)$, that is, the probability density function of $X$ is given by $\phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}$. Then
\[
\Pr[|X| \leq t] \in \left( \frac{2}{3} \frac{t}{\sigma}, \ \frac{4}{5} \frac{t}{\sigma} \right).
\]
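The window in Lemma 1.1.6 can be checked directly, since $\Pr[|X| \leq t] = \mathrm{erf}(t/(\sigma\sqrt{2}))$ for a centered Gaussian. Note that the lower endpoint $\frac{2}{3}\frac{t}{\sigma}$ exceeds $1$ once $t > \frac{3}{2}\sigma$, so the window is only meaningful for moderate $t$; the sketch below (with illustrative values of $\sigma$ and $t/\sigma$) checks ratios up to $1$.

```python
import math

def prob_in_band(t, sigma):
    # Pr[|X| <= t] for X ~ N(0, sigma^2), via the error function
    return math.erf(t / (sigma * math.sqrt(2.0)))

sigma = 2.0
for ratio in (0.1, 0.5, 1.0):
    t = ratio * sigma
    p = prob_in_band(t, sigma)
    lower, upper = (2 * t) / (3 * sigma), (4 * t) / (5 * sigma)
    # the exact probability sits inside the lemma's window
    assert lower <= p <= upper, (ratio, lower, p, upper)
```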
Lemma 1.1.7 (Matrix Bernstein, Theorem 6.1.1 in [? ]). Consider a finite sequence $\{X_1, \dots, X_m\} \subset \mathbb{R}^{n_1 \times n_2}$ of independent, random matrices with common dimension $n_1 \times n_2$. Assume that
\[
\mathbb{E}[X_i] = 0, \ \forall i \in [m] \quad \text{and} \quad \|X_i\| \leq M, \ \forall i \in [m].
\]
Let $Z = \sum_{i=1}^{m} X_i$. Let $\mathrm{Var}[Z]$ be the matrix variance statistic of the sum:
\[
\mathrm{Var}[Z] = \max\left\{ \Big\| \sum_{i=1}^{m} \mathbb{E}[X_i X_i^\top] \Big\|, \ \Big\| \sum_{i=1}^{m} \mathbb{E}[X_i^\top X_i] \Big\| \right\}.
\]
Then
\[
\mathbb{E}[\|Z\|] \leq \left( 2 \mathrm{Var}[Z] \cdot \log(n_1 + n_2) \right)^{1/2} + M \cdot \log(n_1 + n_2)/3.
\]
Furthermore, for all $t \geq 0$,
\[
\Pr[\|Z\| \geq t] \leq (n_1 + n_2) \cdot \exp\left( - \frac{t^2/2}{\mathrm{Var}[Z] + Mt/3} \right).
\]
A useful shorthand will be the following: if $y_1, y_2, \dots, y_m$ are independent random variables, each having mean $0$ and taking values in $[-1, 1]$, then their average $\frac{1}{m} \sum_i y_i$ behaves like a Gaussian variable with mean zero and variance at most $1/m$. In other words, the probability that this average is at least $\epsilon$ in absolute value is at most $\exp(-\epsilon^2 m)$, up to constant factors in the exponent.
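To make Lemma 1.1.7 concrete, the sketch below checks the expectation bound on a family where every norm is easy to compute: sums of independent diagonal sign matrices, for which the operator norm is just the largest absolute diagonal entry and $\mathrm{Var}[Z] = m$ (since $\mathbb{E}[X_i X_i^\top] = I$). The dimensions and trial counts are illustrative assumptions.

```python
import math
import random

random.seed(0)
m, n = 100, 8     # m summands, each an n x n diagonal sign matrix
M = 1.0           # ||X_i|| = 1 for a diagonal matrix of +/-1 entries
trials = 500

total = 0.0
for _ in range(trials):
    # Z = sum of m diagonal Rademacher matrices; track the diagonal only
    diag = [0.0] * n
    for _ in range(m):
        for k in range(n):
            diag[k] += random.choice((-1.0, 1.0))
    total += max(abs(d) for d in diag)  # operator norm of a diagonal matrix
mean_norm = total / trials             # Monte Carlo estimate of E[||Z||]

# Here E[X_i X_i^T] = E[X_i^T X_i] = I, so Var[Z] = ||m I|| = m
var_z = float(m)
bound = math.sqrt(2 * var_z * math.log(n + n)) + M * math.log(n + n) / 3

assert mean_norm <= bound   # expectation bound of the matrix Bernstein lemma
```

For these parameters the bound is roughly $24$, while the simulated mean of $\|Z\|$ is noticeably smaller, as expected from a non-asymptotic bound.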
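The shorthand above can likewise be checked by simulation. In this sketch the $y_i$ are uniform on $[-1, 1]$ (mean $0$, variance $1/3 \leq 1$), and the values of $m$, $\epsilon$, and the trial count are illustrative assumptions.

```python
import math
import random

random.seed(1)
m, eps, trials = 400, 0.1, 5000

# Empirical estimate of Pr[|average of y_1..y_m| >= eps]
hits = 0
for _ in range(trials):
    avg = sum(random.uniform(-1.0, 1.0) for _ in range(m)) / m
    if abs(avg) >= eps:
        hits += 1
empirical = hits / trials

# The shorthand predicts a tail of about exp(-eps^2 m) = exp(-4)
shorthand = math.exp(-eps * eps * m)
assert empirical <= shorthand
```

With variance $1/3$ per variable, the true tail is far below $\exp(-4) \approx 0.018$, so the shorthand comfortably dominates the simulation here; for variables with variance exactly $1/m$-scale (e.g. Rademacher) it holds only up to constants in the exponent.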
1.1.2 Singular Value Decomposition
TBD.