Page index for the flipbook edition of Bayesian Reasoning and Machine Learning by David Barber (draft dated January 9, 2013). Each entry gives the flipbook page on which a part, chapter, or section first appears.

- Title pages (p. 1)
- Preface, opening with "the data explosion", and an overview of the parts of the book (p. 3)
- BRMLtoolbox: the library of routines accompanying the book, with a function list including GMMem for fitting a mixture of Gaussians (p. 7)
- Notation list and table of contents (p. 11-24)
- Part I: Inference in Probabilistic Models (p. 25), with a part introduction on message passing and tractable approximations (p. 29)
- Chapter 1: Probabilistic Reasoning (p. 31)
  - Probability Refresher (p. 33)
  - Probabilistic Reasoning (p. 37)
  - Prior, Likelihood and Posterior (p. 41)
  - Exercises (p. 45)
- Chapter 2: Basic Graph Concepts (p. 49)
  - Numerically Encoding Graphs (p. 51)
  - Code (p. 53)
- Chapter 3: Belief Networks (p. 55)
  - The Benefits of Structure (p. 57)
  - Uncertain and Unreliable Evidence (p. 59)
  - Graphical independence examples (p. 63)
  - Causality (p. 71)
  - Exercises (p. 75)
- Chapter 4: Graphical Models (p. 81)
  - Markov Networks (p. 83)
  - Chain Graphical Models (p. 89)
  - Factor Graphs (p. 91)
  - Expressiveness of Graphical Models (p. 93)
  - Exercises (p. 95)
- Chapter 5: Efficient Inference in Trees (p. 99)
  - Marginal Inference (p. 101)
  - Other Forms of Inference (p. 107)
  - Inference in Multiply Connected Graphs (p. 113)
  - Code, including sum-product and max-product routines (p. 117)
  - Exercises (p. 119)
- Chapter 6: The Junction Tree Algorithm (p. 121)
  - Clique Graphs (p. 123)
  - Junction Trees (p. 125)
  - Junction Trees for Multiply-Connected Graphs (p. 129)
  - The Junction Tree Algorithm (p. 133)
  - Finding the Most Likely State (p. 137)
  - The Need for Approximations (p. 139)
  - Exercises (p. 141)
- Chapter 7: Making Decisions (p. 145)
  - Decision Trees (p. 147)
  - Extending Bayesian Networks for Decisions (p. 149)
  - Solving Influence Diagrams (p. 153)
  - Markov Decision Processes (p. 157)
  - Temporally Unbounded MDPs (p. 159)
  - Variational Inference and Planning (p. 161)
  - Financial Matters (p. 163)
  - Further Topics (p. 169)
  - Code (p. 171)
  - Exercises (p. 173)
- Part II: Learning in Probabilistic Models (p. 177), with an introductory overview of graphical model families (p. 180)
- Chapter 8: Statistics for Machine Learning (ca. p. 180)
  - Distributions (p. 182)
  - The Kullback-Leibler divergence and its non-negativity (p. 186)
  - Linear Transform of a Gaussian (p. 194)
  - Learning Distributions (p. 196)
  - Properties of Maximum Likelihood (p. 198)
  - Maximum Likelihood and the Empirical Distribution (p. 200)
  - Bayesian inference for the Gaussian: the posterior p(mu, sigma^2 | X) (p. 202)
  - Exercises (p. 204)
- Chapter 9: Learning as Inference (ca. p. 213)
  - Bayesian parameter learning for i.i.d. and Markov data (p. 214)
  - Bayesian Belief Network Training (p. 224)
  - Structure Learning (p. 228)
  - Maximum Likelihood for Undirected Models (p. 236)
  - Exponential Form Potentials (p. 242)
  - Pseudo Likelihood (p. 246)
  - Exercises (p. 250)
- Chapter 10: Naive Bayes (ca. p. 252)
  - Estimation Using Maximum Likelihood (p. 252)
  - Exercises (p. 260)
- Chapter 11: Learning with Hidden Variables (ca. p. 264)
  - Identifiability Issues (p. 266)
  - Expectation Maximisation: classical EM and the general case (p. 268)
  - Extensions of EM (p. 276)
  - Variational Bayes (p. 278)
  - Exercises (p. 286)
- Chapter 12: Bayesian Model Selection (ca. p. 290)
  - Bayesian Hypothesis Testing for Outcome Analysis (p. 298)
  - Exercises (p. 308)
- Part III: Machine Learning, with a part introduction (p. 311)
- Chapter 13: Machine Learning Concepts (p. 313)
  - Styles of Learning (p. 315)
  - Supervised Learning (p. 317)
  - Exercises (p. 325)
- Chapter 14: Nearest Neighbour Classification (p. 327)
  - K-Nearest Neighbours (p. 329)
  - Exercises (p. 331)
- Chapter 15: Unsupervised Linear Dimension Reduction (p. 333)
  - Principal Components Analysis (p. 335)
  - High Dimensional Data (p. 339)
  - Latent Semantic Analysis (p. 341)
  - PCA With Missing Data (p. 343)
  - Matrix Decomposition Methods (p. 347)
  - Kernel PCA (p. 353)
  - Canonical Correlation Analysis (p. 355)
  - Exercises (p. 357)
- Chapter 16: Supervised Linear Dimension Reduction (p. 359)
  - Canonical Variates (p. 361)
  - Exercises (p. 365)
- Chapter 17: Linear Models (p. 367)
  - Linear Parameter Models for Regression (p. 369)
  - The Dual Representation and Kernels (p. 373)
  - Linear Parameter Models for Classification (p. 375)
  - Support Vector Machines (p. 381)
  - Soft Zero-One Loss for Outlier Robustness (p. 385)
  - Exercises (p. 387)
- Chapter 18: Bayesian Linear Models (p. 389)
  - Regression With Additive Gaussian Noise (p. 391)
  - Classification (p. 397)
  - Exercises (p. 405)
- Chapter 19: Gaussian Processes (p. 407)
  - Non-Parametric Prediction (p. 409)
  - Gaussian Process Prediction (p. 411)
  - Covariance Functions (p. 413)
  - Analysis of Covariance Functions (p. 415)
  - Gaussian Processes for Classification (p. 419)
  - Exercises (p. 423)
- Chapter 20: Mixture Models (p. 425)
  - Expectation Maximisation for Mixture Models (p. 427)
  - The Gaussian Mixture Model (p. 431)
  - Indicator Models (p. 439)
  - Mixed Membership Models (p. 441)
  - Exercises (p. 449)
- Chapter 21: Latent Linear Models (p. 451)
  - Factor Analysis: Maximum Likelihood (p. 453)
  - Interlude: Modelling Faces (p. 459)
  - Canonical Correlation Analysis and Factor Analysis (p. 461)
  - Independent Components Analysis (p. 463)
  - Exercises (p. 465)
- Chapter 22: Latent Ability Models (p. 467)
  - Competition Models (p. 469)
  - Exercises (p. 471)
- Part IV: Dynamical Models (p. 473), with an introductory overview of dynamic models (p. 476)
- Chapter 23: Discrete-State Markov Models (ca. p. 478)
  - Markov Models and Bayesian Fitting (p. 480)
  - Hidden Markov Models (p. 482)
  - Natural Language Models (p. 490)
  - Learning HMMs and Continuous Observations (p. 492)
  - Related Models (p. 494)
  - Exercises (p. 498)
- Chapter 24: Continuous-State Markov Models (ca. p. 506)
  - Auto-Regressive Models and Training (p. 508)
  - Latent Linear Dynamical Systems: Inference (p. 512), including the LDS Forward algorithm (p. 514), Rauch-Tung-Striebel smoothing (p. 516) and the most likely state (p. 518)
  - Switching Auto-Regressive Models (p. 522)
  - Exercises (p. 526)
- Chapter 25: Switching Linear Dynamical Systems (ca. p. 530)
  - Gaussian Sum Filtering (p. 532)
  - Gaussian Sum Smoothing (p. 534)
  - Relation to Other Methods (p. 538)
  - Reset Models (p. 542)
  - Exercises (p. 546)
- Chapter 26: Distributed Computation (ca. p. 548)
  - Hopfield network dynamics and sequence learning (p. 550)
  - Boolean Networks (p. 554)
  - Tractable Continuous Latent Variable Models (p. 556)
  - Exercises (p. 560)
- Part V: Approximate Inference, with a part introduction (p. 563)
- Chapter 27: Sampling (p. 565)
  - Introduction (p. 567)
  - Gibbs Sampling (p. 571), including Structured Gibbs Sampling (p. 573)
  - Markov Chain Monte Carlo (MCMC) (p. 575)
  - Auxiliary Variable Methods (p. 577)
  - Importance Sampling (p. 583)
  - Exercises (p. 589)
- Chapter 28: Deterministic Approximate Inference (p. 591)
  - Properties of Kullback-Leibler Variational Inference (p. 593)
  - Variational Bounding Using KL(q|p) (p. 595)
  - Local and KL Variational Approximations (p. 601)
  - Mutual Information Maximisation: A KL Variational Approach (p. 603)
  - Loopy Belief Propagation (p. 607)
  - Expectation Propagation (p. 609)
  - MAP for Markov Networks (p. 613)
  - Exercises (p. 619)
- Chapter 29: Background Mathematics (p. 625)
  - Linear Algebra (p. 625)
  - Multivariate Calculus (p. 633)
  - Multivariate Optimisation (p. 635)
  - Constrained Optimisation Using Lagrange Multipliers (p. 641)
- Bibliography (p. 643)
- Index (p. 659)