- Bayesian Reasoning and Machine Learning (draft of January 9, 2013).
- Front matter: preface ("The data explosion"), notation, and contents.
- BRMLtoolbox: a library of routines accompanying the book, e.g. GMMem to fit a mixture of Gaussians to data.
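The toolbox's GMMem routine, listed above, fits a mixture of Gaussians by expectation maximisation. A minimal 1-D Python sketch of the same idea (not the toolbox's actual MATLAB code; initialisation and data are illustrative):

```python
import numpy as np

def gmm_em(x, k=2, iters=100):
    """Fit a 1-D mixture of k Gaussians by EM (illustrative sketch)."""
    n = len(x)
    # simple deterministic initialisation: spread means over data quantiles
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.var(x))
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[i, j] proportional to pi_j N(x_i | mu_j, var_j)
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances
        nk = r.sum(axis=0)
        pi, mu = nk / n, (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# synthetic data: two well-separated components
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu, var = gmm_em(x)
print(mu.round(1))  # means recovered near -3 and 3
```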
- Part I: Inference in Probabilistic Models.
  - Probability refresher and probabilistic reasoning: conditioning, probability tables, Bayes' rule, and accompanying BRMLtoolbox code.
  - Graphs: paths, adjacency matrices, elimination of leaf nodes.
  - Belief networks: modelling independencies, uncertain and unreliable evidence, graphical independence relations.
  - Markov networks and chain graphical models; the expressiveness of graphical models.
  - Marginal inference and other forms of inference (for example shortest paths); inference in multiply connected graphs; bucket elimination.
  - The junction tree algorithm: the running intersection property, constructing a junction tree, potential updates and reabsorption.
  - Decision making: utilities, partial orderings of decisions, Markov decision processes, variational inference and planning.
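Marginal inference of the kind covered in Part I can be illustrated in a few lines. The sketch below (with made-up probability tables) computes a marginal of a chain-structured belief network both by brute force over the joint and by variable elimination:

```python
import numpy as np

# A tiny belief network p(a, b, c) = p(a) p(b|a) p(c|b) over binary variables.
# All table entries below are invented for illustration.
p_a = np.array([0.6, 0.4])                    # p(a)
p_b_a = np.array([[0.9, 0.1], [0.3, 0.7]])    # p(b|a), rows indexed by a
p_c_b = np.array([[0.8, 0.2], [0.5, 0.5]])    # p(c|b), rows indexed by b

# Brute force: form the full joint, then p(c) = sum_{a,b} p(a)p(b|a)p(c|b)
joint = p_a[:, None, None] * p_b_a[:, :, None] * p_c_b[None, :, :]
p_c_brute = joint.sum(axis=(0, 1))

# Variable elimination: sum out a first, then b -- the same answer,
# but without ever forming the full joint (crucial for many variables).
p_b = p_a @ p_b_a          # sum_a p(a) p(b|a)
p_c = p_b @ p_c_b          # sum_b p(b) p(c|b)
print(np.allclose(p_c, p_c_brute))  # True
```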
- Part II: Learning in Probabilistic Models.
  - Statistics for machine learning: distributions (including the empirical distribution and classical distributions), the multivariate Gaussian and whitening, the exponential family, maximum likelihood and its properties, learning a Gaussian (including the Gauss-Gamma prior).
  - Learning as inference: coin tossing, Bayesian methods and ML-II.
  - Maximum likelihood and Bayesian training of belief networks; structure learning (including the PC algorithm); maximum likelihood for undirected models.
  - Naive Bayes: estimation using maximum likelihood and a Bayesian treatment.
  - Learning with hidden variables: missing data, the expectation maximisation (EM) algorithm, extensions and failure cases of EM, variational Bayes, optimising the likelihood by gradient methods.
  - Bayesian model selection: coin-tossing illustrations, Occam's razor and Bayesian complexity penalisation, approximating the model likelihood, Bayesian hypothesis testing for outcome analysis.
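The coin-tossing example used to illustrate learning as inference has a closed-form Bayesian answer: a Beta(α, β) prior on the head probability θ, combined with nH heads in n flips, gives a Beta(α + nH, β + n − nH) posterior. A numerical sketch (the counts below are invented):

```python
# Coin tossing as inference with a conjugate Beta prior.
alpha, beta = 1.0, 1.0       # uniform prior on the head probability theta
n, n_heads = 10, 7           # illustrative data: 7 heads in 10 flips

# posterior is Beta(alpha + n_heads, beta + n - n_heads)
a_post = alpha + n_heads
b_post = beta + (n - n_heads)

posterior_mean = a_post / (a_post + b_post)   # E[theta | data]
mle = n_heads / n                             # maximum likelihood estimate
print(round(posterior_mean, 3), mle)          # 0.667 0.7
```

The posterior mean is pulled slightly toward the prior mean of 0.5, relative to the maximum likelihood estimate.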
- Part III: Machine Learning.
  - Machine learning concepts: styles of learning (train/test), supervised learning, loss functions (e.g. zero-one loss) and Bayesian decision-making.
  - Nearest neighbour classification and a probabilistic interpretation of nearest neighbours.
  - Unsupervised linear dimension reduction: principal components analysis, latent semantic analysis, PCA with missing data, matrix decomposition methods.
  - Supervised linear dimension reduction: canonical variates (within- and between-class scatter).
  - Linear parameter models: radial basis functions, regression in the dual space, classification, soft-margin support vector machines.
  - Bayesian linear models: regression with additive Gaussian noise, hyperparameter optimisation, sparse linear models.
  - Gaussian processes: prediction, covariance functions, classification via Newton updates.
  - Mixture models: EM for mixtures of Gaussians, mixed membership models.
  - Latent linear models: factor analysis (eigen-approach and maximum likelihood).
  - The Rasch model.
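The PCA material in Part III reduces to centring the data, eigendecomposing the covariance and projecting onto the leading eigenvectors. A sketch on synthetic data (the mixing matrix is invented):

```python
import numpy as np

# PCA sketch: centre, eigendecompose the sample covariance, project.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

mean = x.mean(axis=0)
cov = np.cov((x - mean).T)
evals, evecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
order = np.argsort(evals)[::-1]
components = evecs[:, order]              # principal directions, largest first

# project onto the top component and reconstruct
z = (x - mean) @ components[:, :1]
x_rec = mean + z @ components[:, :1].T
err = np.mean(np.sum((x - x_rec) ** 2, axis=1))
# mean squared reconstruction error equals the discarded variance
print(err < evals.min())  # True
```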
- Part IV: Dynamical Models.
  - Discrete-state Markov models and hidden Markov models: filtering, smoothing and correction recursions, learning HMMs by EM, related models, applications such as speech (phoneme modelling).
  - Continuous-state Markov models: auto-regressive models, latent linear dynamical systems, filtering and smoothing inference, learning linear dynamical systems, switching auto-regressive models.
  - Switching linear dynamical systems: Gaussian sum filtering and smoothing; reset models.
  - Distributed computation: learning sequences, tractable continuous latent variable models, neural models such as leaky integrate-and-fire neurons.
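The HMM filtering (alpha) recursion from Part IV can be written compactly: α_t(h) = p(v_t|h) Σ_{h'} p(h|h') α_{t−1}(h'). The transition and emission tables below are invented for illustration:

```python
import numpy as np

# HMM filtering sketch over two hidden states and two observation symbols.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])        # A[i, j] = p(h_t = i | h_{t-1} = j); columns sum to 1
B = np.array([[0.7, 0.1],
              [0.3, 0.9]])        # B[v, h] = p(v | h); columns sum to 1
prior = np.array([0.5, 0.5])      # p(h_1)
obs = [0, 0, 1, 1, 1]             # an invented observation sequence

alpha = B[obs[0]] * prior         # alpha_1(h) = p(v_1 | h) p(h)
for v in obs[1:]:
    alpha = B[v] * (A @ alpha)    # alpha_t(h) = p(v_t | h) sum_h' A[h, h'] alpha_{t-1}(h')
filtered = alpha / alpha.sum()    # p(h_T | v_{1:T})
print(filtered.round(3))
```

In a long chain the unnormalised alphas underflow, so practical implementations normalise at each step or work in log space.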
- Part V: Approximate Inference.
  - Sampling: ancestral sampling, rejection sampling, Gibbs sampling, Metropolis-Hastings, hybrid Monte Carlo, Swendson-Wang, slice sampling, sequential (particle) methods.
  - Deterministic approximations: properties of the Kullback-Leibler divergence, variational bounding using KL(q|p), local and KL variational approximations, mutual information maximisation, loopy belief propagation, expectation propagation.
  - Background mathematics: linear algebra (block matrices, quadratic forms), inequalities, gradient descent, the conjugate vectors algorithm, quasi-Newton methods, constrained optimisation using Lagrange multipliers.
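A minimal Metropolis-Hastings sampler in the spirit of the sampling chapter. The target (a Gaussian known only up to a constant) and the random-walk proposal are chosen for illustration:

```python
import numpy as np

def log_p(x):
    """Unnormalised log target: N(2, 1) up to a constant."""
    return -0.5 * (x - 2.0) ** 2

rng = np.random.default_rng(0)
x, samples = 0.0, []
for t in range(20000):
    x_new = x + rng.normal(scale=1.0)           # symmetric random-walk proposal
    # accept with probability min(1, p(x_new)/p(x)); the normaliser cancels
    if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
        x = x_new
    if t >= 2000:                               # discard burn-in
        samples.append(x)

print(round(float(np.mean(samples)), 1))  # close to the target mean of 2
```

Because the proposal is symmetric, the Hastings correction q(x|x_new)/q(x_new|x) equals 1 and drops out of the acceptance ratio.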