Part I: Machine learning in a nutshell

- About this lecture: main goal
- Outline
- Statistical learning
- Machine learning: a phenomenon observed through data
- Supervised learning: several types of problems
- Model quality: given a dataset (x_i, y_i)
- Machine learning objectives
- Data: the phenomenon is fully described by a probability distribution
- Model quality (risk)
- Cost functions: regression (Y = R) and classification
- Machine learning algorithms
- Consistency: the best risk L*
- Interpretation: universality
- From PAO to consistency
- From PAO to strong consistency
- Classical statistics
- Optimal model
- No free lunch
- Estimation difficulties
- Consequences of the no free lunch results
- Estimation and approximation
- Back to machine learning algorithms
- Criterion
- Empirical risk minimization
- Complexity control: the overfitting issue
- Increasing complexity
- Structural risk minimization
- Regularization
- Summary: from n i.i.d. observations
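The empirical risk minimization principle listed above can be illustrated on a toy problem: pick, inside a finite model class, the model with the smallest average loss on the data. This is a minimal sketch, not material from the slides; the threshold classifiers, the 0-1 loss, and the synthetic data are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d classification data: the label is 1 when x > 0.3, with 10% label noise.
x = rng.uniform(0, 1, 200)
y = ((x > 0.3) ^ (rng.uniform(size=200) < 0.1)).astype(int)

def empirical_risk(threshold, x, y):
    """Empirical risk of the classifier 1{x > threshold} under the 0-1 loss."""
    predictions = (x > threshold).astype(int)
    return np.mean(predictions != y)

# Finite model class: threshold classifiers on a grid.
thresholds = np.linspace(0, 1, 101)
risks = [empirical_risk(t, x, y) for t in thresholds]
best = thresholds[int(np.argmin(risks))]
print(f"ERM threshold: {best:.2f}, empirical risk: {min(risks):.3f}")
```

Because the class is small and the data noisy, the minimizer's empirical risk sits close to the 10% noise level rather than at zero; with a much richer class it could overfit, which is the complexity-control issue raised later in the outline.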
Part II: Empirical risk minimization

- Outline: concentration, Hoeffding's inequality
- Concentration inequalities
- Hoeffding's inequality
- Direct application: the bounded case
- The test set
- Proof of Hoeffding's inequality
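The test-set application of Hoeffding's inequality listed above has a standard numerical form: for a loss bounded in [0, 1] and a hold-out set of size n, the test error is within eps = sqrt(log(2/delta) / (2n)) of the true risk with probability at least 1 - delta. A minimal sketch (the sample size and confidence level are illustrative):

```python
import math

def hoeffding_radius(n, delta):
    """Half-width of the two-sided Hoeffding confidence interval
    for a mean of i.i.d. variables bounded in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Example: a test set of 10,000 points, 95% confidence.
eps = hoeffding_radius(10_000, 0.05)
print(f"true risk is within +/-{eps:.4f} of the test error")  # about 0.0136
```

Note the 1/sqrt(n) rate: quadrupling the test set only halves the interval.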
- Uniform bounds
- Finite model class: union bound
- Bound versus size
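For the finite-class union bound listed above, combining Hoeffding's inequality with a union bound over M models gives, for a loss bounded in [0, 1] and with probability at least 1 - delta, a uniform deviation of at most sqrt(log(2M/delta) / (2n)). A minimal numerical sketch (the specific values of n, M, and delta are illustrative):

```python
import math

def finite_class_radius(n, M, delta):
    """Uniform deviation bound for a finite class of M models and a
    loss bounded in [0, 1]: Hoeffding's inequality plus a union bound."""
    return math.sqrt(math.log(2.0 * M / delta) / (2.0 * n))

# The bound grows only logarithmically with the number of models.
for M in (10, 10_000, 10_000_000):
    print(M, round(finite_class_radius(1000, M, 0.05), 4))
```

The logarithmic dependence on M is what makes the extension to infinite classes (via the growth function) plausible.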
- Infinite model class: so far the model class was finite
- Growth function: abstract setting
- Vapnik-Chervonenkis dimension
- Examples: points from R²
- Linear models: the case of a p-dimensional class G
- VC dimension ≠ parameter number
- Snake oil warning
- Uniform bound (Vapnik and Chervonenkis)
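The growth function and the VC dimension listed above are linked by the Sauer-Shelah lemma: a class of VC dimension d can realize at most sum_{i=0}^{d} C(n, i) labelings of n points, so the growth function is polynomial in n beyond the VC dimension instead of 2^n. A minimal sketch of the bound:

```python
from math import comb

def sauer_bound(n, d):
    """Sauer-Shelah upper bound on the growth function of a class of
    VC dimension d: at most sum_{i=0}^{d} C(n, i) labelings of n points."""
    return sum(comb(n, i) for i in range(min(n, d) + 1))

# At or below the VC dimension, all 2^n labelings are possible;
# above it, the growth is polynomial rather than exponential.
print(sauer_bound(3, 3))   # 8, i.e. 2^3
print(sauer_bound(10, 3))  # 176, far below 2^10 = 1024
```

This polynomial growth is exactly what lets the finite-class union bound survive the passage to infinite classes in the Vapnik-Chervonenkis uniform bound.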
- Application to classification
- Risk bounds
- Discussion: what the VC dimension provides
- Extension: a consequence of VC theory
- Proof of the lemma
- Back to the proof of the VC theorem
- Summary: the empirical risk minimizer
- What about regression? Capacity measures
- Comments: again a uniform bound
- Proof of the lemma
- Proof of the main result
- Application to regression
- Pseudo-dimension
- In practice: the pseudo-dimension
- Example: G as a subset of a k-dimensional space
- Summary
- Limitations: G must be fixed
- Still no free lunch! Binary classification with VCdim(G) < ∞
- Page 211 and 212:
Part IIICapacity control93 / 154 Fa

- Page 213 and 214:
Estimation and approximation◮ bac

- Page 215 and 216:
Increasing complexity/capacityfor b

- Page 217 and 218:
In practice◮ a central question:

- Page 219 and 220:
Proof◮ L(g ∗ n)−L ∗ =[]]L(g

- Page 221 and 222:
Regression◮ similar principle◮

- Page 223 and 224:
OutlineGeneric controlBinary classi

- Page 225 and 226:
Structural Risk Minimization◮ in

- Page 227 and 228:
Comments◮ the trade off between e

- Page 229 and 230:
Comments◮ the trade off between e

- Page 231 and 232:
Proof◮ then∞∑∞∑8n VCdim(G

- Page 233 and 234:
Model selection via validation◮ a

- Page 235 and 236:
Model selection via validation◮ a

- Page 237 and 238:
Model selection via validation◮ a

- Page 239 and 240:
Regularization◮ can be seen as a

- Page 241 and 242:
Examples◮ Ridge approaches (a.k.a

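The ridge approaches listed above have a simple closed form in the linear least-squares case: minimizing ||y − Xw||² + λ||w||² gives w = (XᵀX + λI)⁻¹Xᵀy, with λ controlling the amount of shrinkage and hence the effective capacity. A minimal sketch, with synthetic data and illustrative values of λ:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    """Closed-form ridge estimator: (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# A larger lambda shrinks the coefficients toward zero (more regularization).
print(ridge(X, y, 0.1))
print(ridge(X, y, 100.0))
```

In practice λ would be chosen by the validation procedures listed just above, trading estimation error against approximation error.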
Part IV: Beyond empirical risk minimization

- Algorithmic difficulties
- Minimizing a loss: standard solutions
- Squared error: the easy case
- A more general case: extending the approach
- Convex risk minimization
- Margin: when h(x) = sign(g(x))
- Maximal margin: the linearly separable case
- Examples of calibrated convex losses
- Summary
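The convex losses listed above are usually written as functions of the margin m = y·g(x): the 0-1 loss is non-convex in m, so algorithms minimize a convex upper bound instead. A minimal sketch of two standard calibrated surrogates (the choice of losses is illustrative):

```python
import numpy as np

# Classification losses as functions of the margin m = y * g(x).
def zero_one(m):
    """0-1 loss: an error whenever the margin is non-positive."""
    return (m <= 0).astype(float)

def hinge(m):
    """Hinge loss, used by support vector machines."""
    return np.maximum(0.0, 1.0 - m)

def logistic(m):
    """Logistic loss in base 2, so that it upper-bounds the 0-1 loss."""
    return np.log2(1.0 + np.exp(-m))

margins = np.array([-1.0, 0.0, 1.0, 2.0])
print(hinge(margins))     # [2. 1. 0. 0.]
print(zero_one(margins))  # [1. 1. 0. 0.]
```

Both surrogates dominate the 0-1 loss pointwise, which is what makes minimizing them a meaningful proxy for minimizing the classification risk.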
- Support vector machines: the linear SVM
- Reproducing kernel Hilbert spaces
- Kernel methods: the representer theorem
- Some hypotheses on the X_i
- Concentration: bounding |A(g) − A_n(g)|
- Covering numbers
- Summary: regularization in a reproducing kernel Hilbert space
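The kernel methods listed above replace inner products by kernel evaluations, and by the representer theorem the solution is a combination of kernel functions centered at the data points; everything is driven by the Gram matrix. A minimal sketch with the Gaussian (RBF) kernel (the bandwidth and the data are illustrative):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    for the Gaussian (RBF) kernel."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K = rbf_gram(X)
# A valid kernel yields a symmetric positive semi-definite Gram matrix.
print(np.allclose(K, K.T), np.min(np.linalg.eigvalsh(K)) >= -1e-10)
```

Positive semi-definiteness of K is exactly the property that guarantees an underlying reproducing kernel Hilbert space.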
- Extensions: Rademacher averages; taking the variance into account; other supervised algorithms
- Wrapping up: what have we learned?
- Outline: researchers, bibliography
- Bibliography