Page index for the flipbook edition of Bayesian Reasoning and Machine Learning by David Barber (draft dated January 9, 2013). Each entry gives the flipbook page on which a part, chapter, or section first appears.

- Title pages (p. 1)
- Preface, opening with "the data explosion", and an overview of the parts of the book (p. 3)
- BRMLtoolbox: the library of routines accompanying the book, with a function list including GMMem for fitting a mixture of Gaussians (p. 7)
- Notation list and table of contents (p. 11-24)
- Part I: Inference in Probabilistic Models (p. 25), with a part introduction on message passing and tractable approximations (p. 29)
- Chapter 1: Probabilistic Reasoning (p. 31)
  - Probability Refresher (p. 33)
  - Probabilistic Reasoning (p. 37)
  - Prior, Likelihood and Posterior (p. 41)
  - Exercises (p. 45)
- Chapter 2: Basic Graph Concepts (p. 49)
  - Numerically Encoding Graphs (p. 51)
  - Code (p. 53)
- Chapter 3: Belief Networks (p. 55)
  - The Benefits of Structure (p. 57)
  - Uncertain and Unreliable Evidence (p. 59)
  - Graphical independence examples (p. 63)
  - Causality (p. 71)
  - Exercises (p. 75)
- Chapter 4: Graphical Models (p. 81)
  - Markov Networks (p. 83)
  - Chain Graphical Models (p. 89)
  - Factor Graphs (p. 91)
  - Expressiveness of Graphical Models (p. 93)
  - Exercises (p. 95)
- Chapter 5: Efficient Inference in Trees (p. 99)
  - Marginal Inference (p. 101)
  - Other Forms of Inference (p. 107)
  - Inference in Multiply Connected Graphs (p. 113)
  - Code, including sum-product and max-product routines (p. 117)
  - Exercises (p. 119)
- Chapter 6: The Junction Tree Algorithm (p. 121)
  - Clique Graphs (p. 123)
  - Junction Trees (p. 125)
  - Junction Trees for Multiply-Connected Graphs (p. 129)
  - The Junction Tree Algorithm (p. 133)
  - Finding the Most Likely State (p. 137)
  - The Need for Approximations (p. 139)
  - Exercises (p. 141)
- Chapter 7: Making Decisions (p. 145)
  - Decision Trees (p. 147)
  - Extending Bayesian Networks for Decisions (p. 149)
  - Solving Influence Diagrams (p. 153)
  - Markov Decision Processes (p. 157)
  - Temporally Unbounded MDPs (p. 159)
  - Variational Inference and Planning (p. 161)
  - Financial Matters (p. 163)
  - Further Topics (p. 169)
  - Code (p. 171)
  - Exercises (p. 173)
- Part II: Learning in Probabilistic Models (p. 177), with an introductory overview of graphical model families (p. 180)
- Chapter 8: Statistics for Machine Learning (ca. p. 180)
  - Distributions (p. 182)
  - The Kullback-Leibler divergence and its non-negativity (p. 186)
  - Linear Transform of a Gaussian (p. 194)
  - Learning Distributions (p. 196)
  - Properties of Maximum Likelihood (p. 198)
  - Maximum Likelihood and the Empirical Distribution (p. 200)
  - Bayesian inference for the Gaussian: the posterior p(mu, sigma^2 | X) (p. 202)
  - Exercises (p. 204)
- Chapter 9: Learning as Inference (ca. p. 213)
  - Bayesian parameter learning for i.i.d. and Markov data (p. 214)
  - Bayesian Belief Network Training (p. 224)
  - Structure Learning (p. 228)
  - Maximum Likelihood for Undirected Models (p. 236)
  - Exponential Form Potentials (p. 242)
  - Pseudo Likelihood (p. 246)
  - Exercises (p. 250)
- Chapter 10: Naive Bayes (ca. p. 252)
  - Estimation Using Maximum Likelihood (p. 252)
  - Exercises (p. 260)
- Chapter 11: Learning with Hidden Variables (ca. p. 264)
  - Identifiability Issues (p. 266)
  - Expectation Maximisation: classical EM and the general case (p. 268)
  - Extensions of EM (p. 276)
  - Variational Bayes (p. 278)
  - Exercises (p. 286)
- Chapter 12: Bayesian Model Selection (ca. p. 290)
  - Bayesian Hypothesis Testing for Outcome Analysis (p. 298)
  - Exercises (p. 308)
- Part III: Machine Learning, with a part introduction (p. 311)
- Chapter 13: Machine Learning Concepts (p. 313)
  - Styles of Learning (p. 315)
  - Supervised Learning (p. 317)
  - Exercises (p. 325)
- Chapter 14: Nearest Neighbour Classification (p. 327)
  - K-Nearest Neighbours (p. 329)
  - Exercises (p. 331)
- Chapter 15: Unsupervised Linear Dimension Reduction (p. 333)
  - Principal Components Analysis (p. 335)
  - High Dimensional Data (p. 339)
  - Latent Semantic Analysis (p. 341)
  - PCA With Missing Data (p. 343)
  - Matrix Decomposition Methods (p. 347)
  - Kernel PCA (p. 353)
  - Canonical Correlation Analysis (p. 355)
  - Exercises (p. 357)
- Chapter 16: Supervised Linear Dimension Reduction (p. 359)
  - Canonical Variates (p. 361)
  - Exercises (p. 365)
- Chapter 17: Linear Models (p. 367)
  - Linear Parameter Models for Regression (p. 369)
  - The Dual Representation and Kernels (p. 373)
  - Linear Parameter Models for Classification (p. 375)
  - Support Vector Machines (p. 381)
  - Soft Zero-One Loss for Outlier Robustness (p. 385)
  - Exercises (p. 387)
- Chapter 18: Bayesian Linear Models (p. 389)
  - Regression With Additive Gaussian Noise (p. 391)
  - Classification (p. 397)
  - Exercises (p. 405)
- Chapter 19: Gaussian Processes (p. 407)
  - Non-Parametric Prediction (p. 409)
  - Gaussian Process Prediction (p. 411)
  - Covariance Functions (p. 413)
  - Analysis of Covariance Functions (p. 415)
  - Gaussian Processes for Classification (p. 419)
  - Exercises (p. 423)
- Chapter 20: Mixture Models (p. 425)
  - Expectation Maximisation for Mixture Models (p. 427)
  - The Gaussian Mixture Model (p. 431)
  - Indicator Models (p. 439)
  - Mixed Membership Models (p. 441)
  - Exercises (p. 449)
- Chapter 21: Latent Linear Models (p. 451)
  - Factor Analysis: Maximum Likelihood (p. 453)
  - Interlude: Modelling Faces (p. 459)
  - Canonical Correlation Analysis and Factor Analysis (p. 461)
  - Independent Components Analysis (p. 463)
  - Exercises (p. 465)
- Chapter 22: Latent Ability Models (p. 467)
  - Competition Models (p. 469)
  - Exercises (p. 471)
- Part IV: Dynamical Models (p. 473), with an introductory overview of dynamic models (p. 476)
- Chapter 23: Discrete-State Markov Models (ca. p. 478)
  - Markov Models and Bayesian Fitting (p. 480)
  - Hidden Markov Models (p. 482)
  - Natural Language Models (p. 490)
  - Learning HMMs and Continuous Observations (p. 492)
  - Related Models (p. 494)
  - Exercises (p. 498)
- Chapter 24: Continuous-State Markov Models (ca. p. 506)
  - Auto-Regressive Models and Training (p. 508)
  - Latent Linear Dynamical Systems: Inference (p. 512), including the LDS Forward algorithm (p. 514), Rauch-Tung-Striebel smoothing (p. 516) and the most likely state (p. 518)
  - Switching Auto-Regressive Models (p. 522)
  - Exercises (p. 526)
- Chapter 25: Switching Linear Dynamical Systems (ca. p. 530)
  - Gaussian Sum Filtering (p. 532)
  - Gaussian Sum Smoothing (p. 534)
  - Relation to Other Methods (p. 538)
  - Reset Models (p. 542)
  - Exercises (p. 546)
- Chapter 26: Distributed Computation (ca. p. 548)
  - Hopfield network dynamics and sequence learning (p. 550)
  - Boolean Networks (p. 554)
  - Tractable Continuous Latent Variable Models (p. 556)
  - Exercises (p. 560)
- Part V: Approximate Inference, with a part introduction (p. 563)
- Chapter 27: Sampling (p. 565)
  - Introduction (p. 567)
  - Gibbs Sampling (p. 571), including Structured Gibbs Sampling (p. 573)
  - Markov Chain Monte Carlo (MCMC) (p. 575)
  - Auxiliary Variable Methods (p. 577)
  - Importance Sampling (p. 583)
  - Exercises (p. 589)
- Chapter 28: Deterministic Approximate Inference (p. 591)
  - Properties of Kullback-Leibler Variational Inference (p. 593)
  - Variational Bounding Using KL(q|p) (p. 595)
  - Local and KL Variational Approximations (p. 601)
  - Mutual Information Maximisation: A KL Variational Approach (p. 603)
  - Loopy Belief Propagation (p. 607)
  - Expectation Propagation (p. 609)
  - MAP for Markov Networks (p. 613)
  - Exercises (p. 619)
- Chapter 29: Background Mathematics (p. 625)
  - Linear Algebra (p. 625)
  - Multivariate Calculus (p. 633)
  - Multivariate Optimisation (p. 635)
  - Constrained Optimisation Using Lagrange Multipliers (p. 641)
- Bibliography (p. 643)
- Index (p. 659)