Bayesian Reasoning and Machine Learning

More documents

Recommendations

Info

CONTENTS CONTENTS 7.9.1 Partially observable MDPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 7.9.2 Reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 7.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 7.11 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 7.11.1 Sum/Max under a partial order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 7.11.2 Junction trees for influence diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 7.11.3 Party-Friend example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 7.11.4 Chest Clinic with Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 7.11.5 Markov decision processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 7.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 II Learning in Probabilistic Models 153 8 Statistics for Machine Learning 157 8.1 Representing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 8.1.1 Categorical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 8.1.2 Ordinal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 8.1.3 Numerical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 8.2 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 8.2.1 The Kullback-Leibler Divergence KL(q|p) . . . . . . . . . . . . . . . . . . . . . . . . . 161 8.2.2 Entropy and information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 8.3 Classical Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 8.4 Multivariate Gaussian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 8.4.1 Completing the square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 8.4.2 Conditioning as system reversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 8.4.3 Whitening and centering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 8.5 Exponential Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 8.5.1 Conjugate priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 8.6 Learning distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 8.7 Properties of Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 8.7.1 Training assuming the correct model class . . . . . . . . . . . . . . . . . . . . . . . . . 175 8.7.2 Training when the assumed model is incorrect . . . . . . . . . . . . . . . . . . . . . . . 175 8.7.3 Maximum likelihood and the empirical distribution . . . . . . . . . . . . . . . . . . . . 176 8.8 Learning a Gaussian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 8.8.1 Maximum likelihood training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 8.8.2 Bayesian inference of the mean and variance . . . . . . . . . . . . . . . . . . . . . . . . 177 8.8.3 Gauss-Gamma distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 8.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 8.10 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 8.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 9 Learning as Inference 189 9.1 Learning as Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 9.1.1 Learning the bias of a coin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 9.1.2 Making decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 9.1.3 A continuum of parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 9.1.4 Decisions based on continuous intervals . . . . . . . . . . . . . . . . . . . . . . . . . . 192 9.2 Bayesian methods and ML-II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 9.3 Maximum Likelihood Training of Belief Networks . . . . . . . . . . . . . . . . . . . . . . . . . 194 9.4 Bayesian Belief Network Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 9.4.1 Global and local parameter independence . . . . . . . . . . . . . . . . . . . . . . . . . 197 9.4.2 Learning binary variable tables using a Beta prior . . . . . . . . . . . . . . . . . . . . 198 9.4.3 Learning multivariate discrete tables using a Dirichlet prior . . . . . . . . . . . . . . . 200 9.5 Structure learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 9.5.1 PC algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 XIV DRAFT January 9, 2013
CONTENTS CONTENTS 9.5.2 Empirical independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 9.5.3 Network scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 9.5.4 Chow-Liu Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 9.6 Maximum Likelihood for Undirected models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 9.6.1 The likelihood gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 9.6.2 General tabular clique potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212 9.6.3 Decomposable Markov networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 9.6.4 Exponential form potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 9.6.5 Conditional random fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 9.6.6 Pseudo likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 9.6.7 Learning the structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 9.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 9.8 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 9.8.1 PC algorithm using an oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 9.8.2 Demo of empirical conditional independence . . . . . . . . . . . . . . . . . . . . . . . . 223 9.8.3 Bayes Dirichlet structure learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 9.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 10 Naive Bayes 227 10.1 Naive Bayes and Conditional Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 10.2 Estimation using Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 10.2.1 Binary attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 10.2.2 Multi-state variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 10.2.3 Text classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 10.3 Bayesian Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 10.4 Tree Augmented Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 10.4.1 Learning tree augmented Naive Bayes networks . . . . . . . . . . . . . . . . . . . . . . 234 10.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 10.6 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 10.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 11 Learning with Hidden Variables 239 11.1 Hidden Variables and Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 11.1.1 Why hidden/missing variables can complicate proceedings . . . . . . . . . . . . . . . . 239 11.1.2 The missing at random assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 11.1.3 Maximum likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 11.1.4 Identifiability issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 11.2 Expectation Maximisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 11.2.1 Variational EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 11.2.2 Classical EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 11.2.3 Application to Belief networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 11.2.4 General case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 11.2.5 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 11.2.6 Application to Markov networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 11.3 Extensions of EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 11.3.1 Partial M step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 11.3.2 Partial E-step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 11.4 A failure case for EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 11.5 Variational Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 11.5.1 EM is a special case of variational Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . 256 11.5.2 An example: VB for the Asbestos-Smoking-Cancer network . . . . . . . . . . . . . . . 256 11.6 Optimising the Likelihood by Gradient Methods . . . . . . . . . . . . . . . . . . . . . . . . . 259 11.6.1 Undirected models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 11.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 11.8 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 11.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 DRAFT January 9, 2013 XV
Page 1 and 2: Bayesian Reasoning and Machine Lear
Page 3 and 4: The data explosion Preface We live
Page 5 and 6: Part I: Inference in Probabilistic
Page 7 and 8: BRMLtoolbox The BRMLtoolbox is a li
Page 9 and 10: GMMem - Fit a mixture of Gaussian t
Page 11 and 12: Contents Front Matter I Notation Li
Page 13: CONTENTS CONTENTS 5.6.3 Bucket elim
Page 17 and 18: CONTENTS CONTENTS 15.3 High Dimensi
Page 19 and 20: CONTENTS CONTENTS 20.3.7 Semi-super
Page 21 and 22: CONTENTS CONTENTS 25 Switching Line
Page 23 and 24: CONTENTS CONTENTS 29.1.6 Linear tra
Page 25: Part I Inference in Probabilistic M
Page 28 and 29: Factor Graphs Graphical Models Undi
Page 30 and 31: 6 DRAFT January 9, 2013
Page 32 and 33: Probability Refresher The probabili
Page 34 and 35: Probability Refresher conditioning
Page 36 and 37: 1.1.2 Probability Tables Probabilis
Page 38 and 39: Probabilistic Reasoning where we us
Page 40 and 41: Then p(A = 0, C = 0) = p(A = 0, B,
Page 42 and 43: 4 2 0 −2 −4 0 20 40 60 80 100 (
Page 44 and 45: 1.5 Code Code The BRMLtoolbox code
Page 46 and 47: and also p(x|y, z) = p(y|x, z)p(x|z
Page 48 and 49: Exercises 24 DRAFT January 9, 2013
Page 50 and 51: Graphs and each edge (Ak−1, Ak),
Page 52 and 53: 1 2 3 (a) 4 1 2 3 (b) 2.2.2 Adjacen
Page 54 and 55: node (a leaf node) which can be eli
Page 56 and 57: J R (a) T 3.1.1 Modelling independe
Page 58 and 59: The Benefits of Structure Example 3
Page 60 and 61: Uncertain and Unreliable Evidence k
Page 62 and 63: 3.3 Belief Networks Definition 3.1
Page 64 and 65:
x1 x3 (a) x2 x1 x3 (b) x2 x1 x3 (c)
Page 66 and 67:
a b e A B C A B C A B C D A B C A B
Page 68 and 69:
u x y z u x y w z u x y z u x y z u
Page 70 and 71:
t1 y1 h (a) t2 y2 t1 y1 (b) t2 y2 B
Page 72 and 73:
G D R (a) FD Resolution of the para
Page 74 and 75:
where p(D|FD = ∅, G) ≡ p(D|pa (
Page 76 and 77:
Battery Fuel Gauge Turn Over Start
Page 78 and 79:
Exercises Exercise 3.15. A belief n
Page 80 and 81:
Exercises 56 DRAFT January 9, 2013
Page 82 and 83:
x1 x2 x4 (a) x3 x1 x2 x4 (b) x3 x1
Page 84 and 85:
x1 x3 (a) x4 x3 (b) x2 x4 x1 x3 (c)
Page 86 and 87:
Markov Networks 4-cycle x1 − x2
Page 88 and 89:
Consider a model in which our desir
Page 90 and 91:
Chain Graphical Models Each chain c
Page 92 and 93:
Expressiveness of Graphical Models
Page 94 and 95:
Consider the graph of the BN Expres
Page 96 and 97:
2. By symmetry arguments, write dow
Page 98 and 99:
Exercises The set of marginals cons
Page 100 and 101:
discussed in more depth in section(
Page 102 and 103:
Marginal Inference where v is a vec
Page 104 and 105:
For consistency of interpretation,
Page 106 and 107:
Marginal Inference This can be inte
Page 108 and 109:
Other Forms of Inference defining m
Page 110 and 111:
lists, see for example [72]. Other
Page 112 and 113:
until the point Other Forms of Infe
Page 114 and 115:
Inference in Multiply Connected Gra
Page 116 and 117:
c a f d (a) marginal, say p(d). The
Page 118 and 119:
Exercises demoShortestPath.m: The s
Page 120 and 121:
Exercises 1. Explain mathematically
Page 122 and 123:
a b c (a) d a, b, c b, c b, c, d (b
Page 124 and 125:
A B C D ← 10 1 → 6 ← 2 → 3
Page 126 and 127:
6.3.1 The running intersection prop
Page 128 and 129:
dce d Constructing a Junction Tree
Page 130 and 131:
To see this, consider the following
Page 132 and 133:
a b c d e f g h i j k l (a) a b c d
Page 134 and 135:
(a) 1 2 5 7 10 3 4 8 9 6 11 (b) 1 2
Page 136 and 137:
• The new potential on (a, b) is
Page 138 and 139:
Reabsorption : Converting a Junctio
Page 140 and 141:
d1 d2 d3 d4 d5 s1 s2 s3 6.10 Summar
Page 142 and 143:
Exercise 6.6. For the undirected gr
Page 144 and 145:
Page 146 and 147:
P arty yes no Rain Rain (0.6) yes (
Page 148 and 149:
P arty yes no (0.6) yes Rain (0.6)
Page 150 and 151:
P arty Rain Uparty Partial ordering
Page 152 and 153:
E UC The utilities are (a) P I UB E
Page 154 and 155:
where D1 ≺ X1 ≺ D2 ≺ X2, and
Page 156 and 157:
a b D1 U1 c d b, c, a b, c b, e, d,
Page 158 and 159:
Markov Decision Processes time depe
Page 160 and 161:
1 0 11 0 21 0 31 0 41 0 0 0 0 0 1 0
Page 162 and 163:
Variational Inference and Planning
Page 164 and 165:
Two possibilities case Financial Ma
Page 166 and 167:
3 2.5 2 1.5 1 0.5 0 5 10 15 20 25 3
Page 168 and 169:
Further Topics d1 d2 d3 Figure 7.13
Page 170 and 171:
dx Ux a t e s x d l b dh Uh s = Smo
Page 172 and 173:
Exercises We assume that we know wh
Page 174 and 175:
The cost of playing the second stag
Page 176 and 177:
Page 179 and 180:
Introduction to Part II In Part II
Page 181 and 182:
CHAPTER 8 Statistics for Machine Le
Page 183 and 184:
Distributions where the Jacobian ma
Page 185 and 186:
Distributions Definition 8.10 (Empi
Page 187 and 188:
Classical Distributions Definition
Page 189 and 190:
Classical Distributions 5 4 3 2 1
Page 191 and 192:
Classical Distributions x 3 1 0.5 0
Page 193 and 194:
Multivariate Gaussian The multivari
Page 195 and 196:
Exponential Family 8.4.3 Whitening
Page 197 and 198:
Learning distributions In seeking t
Page 199 and 200:
Properties of Maximum Likelihood Wh
Page 201 and 202:
Learning a Gaussian (a) Prior (b) P
Page 203 and 204:
Learning a Gaussian 8.8.3 Gauss-Gam
Page 205 and 206:
Exercises Exercise 8.6. Using x T
Page 207 and 208:
Exercises show that ∂ lim γ→0
Page 209 and 210:
Exercises 2. Using equation (8.4.13
Page 211 and 212:
Exercises 1. Show that the log prod
Page 213 and 214:
CHAPTER 9 Learning as Inference In
Page 215 and 216:
Learning as Inference In the coin t
Page 217 and 218:
Bayesian methods and ML-II a s θ v
Page 219 and 220:
Maximum Likelihood Training of Beli
Page 221 and 222:
Bayesian Belief Network Training 9.
Page 223 and 224:
Bayesian Belief Network Training Th
Page 225 and 226:
Bayesian Belief Network Training No
Page 227 and 228:
Structure learning Algorithm 9.1 PC
Page 229 and 230:
Structure learning x y z w t (a) x
Page 231 and 232:
Structure learning z1 θz z n x n y
Page 233 and 234:
Structure learning Algorithm 9.3 Ch
Page 235 and 236:
Maximum Likelihood for Undirected m
Page 237 and 238:
Page 239 and 240:
Page 241 and 242:
Page 243 and 244:
Page 245 and 246:
Page 247 and 248:
Code • Provided we assume local a
Page 249 and 250:
Exercises Exercise 9.3. A training
Page 251 and 252:
CHAPTER 10 Naive Bayes So far we’
Page 253 and 254:
Estimation using Maximum Likelihood
Page 255 and 256:
Estimation using Maximum Likelihood
Page 257 and 258:
Bayesian Naive Bayes For convenienc
Page 259 and 260:
Exercises Practitioners typically c
Page 261 and 262:
Exercises so that for example x2 =
Page 263 and 264:
CHAPTER 11 Learning with Hidden Var
Page 265 and 266:
Hidden Variables and Missing Data E
Page 267 and 268:
Expectation Maximisation Algorithm
Page 269 and 270:
Expectation Maximisation log p(v|θ
Page 271 and 272:
Expectation Maximisation The first
Page 273 and 274:
Expectation Maximisation Algorithm
Page 275 and 276:
Extensions of EM log likelihood −
Page 277 and 278:
A failure case for EM Stochastic EM
Page 279 and 280:
Variational Bayes Algorithm 11.3 Va
Page 281 and 282:
Variational Bayes θa a s c θc (a)
Page 283 and 284:
Optimising the Likelihood by Gradie
Page 285 and 286:
Exercises fuse assembly malfunction
Page 287 and 288:
Exercises Exercise 11.9. A 2 × 2 p
Page 289 and 290:
CHAPTER 12 Bayesian Model Selection
Page 291 and 292:
Illustrations : coin tossing 8 6 4
Page 293 and 294:
++ Occam’s Razor and Bayesian Com
Page 295 and 296:
A continuous example : curve fittin
Page 297 and 298:
Approximating the Model Likelihood
Page 299 and 300:
Bayesian Hypothesis Testing for Out
Page 301 and 302:
Page 303 and 304:
Page 305 and 306:
@@ @@ Exercises 1. Under the assump
Page 307 and 308:
Exercises show that p(D|My→x) is
Page 309:
Part III Machine Learning 285
Page 312 and 313:
Clique decomp. Mixed membership Lat
Page 314 and 315:
Train Test Styles of Learning Figur
Page 316 and 317:
Supervised Learning be disastrous (
Page 318 and 319:
X , C p(x, c) 〈L(c, c(x|θ))〉 p
Page 320 and 321:
1.5 1 0.5 0 −0.5 −1 −1.5 −3
Page 322 and 323:
Supervised Learning Based on zero-o
Page 324 and 325:
θ c|h θh θ x|h cn xn hn N 13.3 B
Page 326 and 327:
form p(c = 1|x) = 1 1 + exp −(ax
Page 328 and 329:
K-Nearest Neighbours Figure 14.1: I
Page 330 and 331:
A Probabilistic Interpretation of N
Page 332 and 333:
Exercises many images of male faces
Page 334 and 335:
3 2 1 0 −1 −2 0 2 4 4 2 0 −2
Page 336 and 337:
2 1.5 1 0.5 0 −0.5 −1 −1.5
Page 338 and 339:
Principal Components Analysis struc
Page 340 and 341:
number of errors 12 10 8 6 4 2 0 0
Page 342 and 343:
Latent Semantic Analysis Example 15
Page 344 and 345:
(a) (b) PCA With Missing Data Figur
Page 346 and 347:
(a) (b) Matrix Decomposition Method
Page 348 and 349:
z x y M-step (a) x z y (b) Matrix D
Page 350 and 351:
1 2 3 4 5 6 7 8 9 10 1 2 (a) 1 2 3
Page 352 and 353:
factor 1 (Reinforcement Learning) 0
Page 354 and 355:
Finding the reconstructions Canonic
Page 356 and 357:
Exercises This approach is taken in
Page 358 and 359:
3. Finding the eigenvalues and eige
Page 360 and 361:
5 4 3 2 1 0 −1 −2 −3 −4 −
Page 362 and 363:
Canonical Variates Between class Sc
Page 364 and 365:
Exercises Example 16.1 (Using canon
Page 366 and 367:
Page 368 and 369:
chirps per sec 110 105 100 95 90 85
Page 370 and 371:
1 0.5 0 0 0.5 5 10 15 20 25 0 −0.
Page 372 and 373:
17.2.3 Radial basis functions A pop
Page 374 and 375:
17.3.1 Regression in the dual-space
Page 376 and 377:
w Linear Parameter Models for Class
Page 378 and 379:
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0
Page 380 and 381:
1.5 1 0.5 0 0.9 0.8 0.7 0.6 0.5 0.4
Page 382 and 383:
2-Norm soft-margin w w origin ξ n
Page 384 and 385:
2 1.5 1 0.5 0 −0.5 −1 −1.5
Page 386 and 387:
17.8 Code demoCubicPoly.m: Demo of
Page 388 and 389:
Page 390 and 391:
α w x n y n N β Regression With A
Page 392 and 393:
Regression With Additive Gaussian N
Page 394 and 395:
we obtain ∂ 1 log p(D|Γ) = ∂α
Page 396 and 397:
18.1.7 Sparse linear models Regress
Page 398 and 399:
18.2.1 Hyperparameter optimisation
Page 400 and 401:
6 4 2 0 −2 −4 0.4 0.3 0.2 0.3 0
Page 402 and 403:
1 0.5 0 −6 −4 −2 0 2 4 6 18.2
Page 404 and 405:
8 6 4 2 0 −2 −4 −6 0.3 0.4 0.
Page 406 and 407:
Exercises Exercise 18.6. This exerc
Page 408 and 409:
θ y 1 y N y ∗ x 1 x N x ∗ (a)
Page 410 and 411:
Gaussian Process Prediction The zer
Page 412 and 413:
2 1.5 1 0.5 0 −0.5 −1 −1.5
Page 414 and 415:
Covariance Functions For a stationa
Page 416 and 417:
1.5 1 0.5 0 −0.5 −1 −1.5 −2
Page 418 and 419:
c 1 c N c ∗ y 1 y N y ∗ x 1 x N
Page 420 and 421:
Using these expressions in the Newt
Page 422 and 423:
Gaussian Processes for Classificati
Page 424 and 425:
d 1 log q(C|X ) = dθ 2 yTK −1 x,
Page 426 and 427:
θh θ v|h h n v n N Expectation Ma
Page 428 and 429:
θh θ vi|h M-step : p(v|h) h n vn
Page 430 and 431:
1 2 3 4 Expectation Maximisation fo
Page 432 and 433:
M-step : optimal mi Maximising equa
Page 434 and 435:
(a) 1 iteration (b) 50 iterations (
Page 436 and 437:
12 10 8 6 4 2 0 −2 −4 −6 −8
Page 438 and 439:
8 6 4 2 0 −2 −4 −8 −6 −4
Page 440 and 441:
z 1 v 1 θ (a) z N v N z 1 v 1 π
Page 442 and 443:
π n z n w v n w α Wn θ N Mixed M
Page 444 and 445:
(a) (b) Mixed Membership Models Fig
Page 446 and 447:
1 0.8 0.6 0.4 0.2 0 −2 −1.5 −
Page 448 and 449:
Mixed Membership Models 1000 Years
Page 450 and 451:
Exercises Exercise 20.7. You wish t
Page 452 and 453:
h1 h2 h3 v1 v2 v3 v4 v5 Probabilist
Page 454 and 455:
21.2.1 Eigen-approach likelihood op
Page 456 and 457:
Factor Analysis : Maximum Likelihoo
Page 458 and 459:
¯x11 f1 G ¯x12 ¯x13 µ ¯x22 ori
Page 460 and 461:
h1 h2 x1 x2 x∗ w1 w2 w∗ (a) M1
Page 462 and 463:
h x y and integrating out the laten
Page 464 and 465:
Exercises A popular alternative est
Page 466 and 467:
Page 468 and 469:
δq xqs Q αs S The Rasch Model Fig
Page 470 and 471:
competitor 10 20 30 40 50 60 70 80
Page 472 and 473:
Exercises Exercise 22.4. An extensi
Page 475 and 476:
Introduction to Part IV Natural org
Page 477 and 478:
CHAPTER 23 Discrete-State Markov Mo
Page 479 and 480:
Markov Models h v1 v2 v3 v4 (a) Fig
Page 481 and 482:
Markov Models For those less comfor
Page 483 and 484:
Hidden Markov Models smoothing corr
Page 485 and 486:
Hidden Markov Models This gives a r
Page 487 and 488:
Hidden Markov Models (a) ‘Creaks
Page 489 and 490:
Hidden Markov Models (a) (b) (c) Fi
Page 491 and 492:
Learning HMMs M-step Assuming i.i.d
Page 493 and 494:
Related Models c1 c2 c3 c4 h1 h2 h3
Page 495 and 496:
Related Models x y1 y2 y3 Direction
Page 497 and 498:
Applications determines the phoneme
Page 499 and 500:
Exercises where Aij ≡ p(ht+1 = i|
Page 501 and 502:
Exercises 1. First consider h1,...
Page 503 and 504:
Exercises ++ Exercise 23.16 (Fuzzy
Page 505 and 506:
CHAPTER 24 Continuous-state Markov
Page 507 and 508:
Auto-Regressive Models 200 150 100
Page 509 and 510:
Auto-Regressive Models a1 a2 a3 a4
Page 511 and 512:
Latent Linear Dynamical Systems σ
Page 513 and 514:
Inference messages exactly. In tran
Page 515 and 516:
Inference and covariance Ft ≡
Page 517 and 518:
Inference Algorithm 24.2 LDS Backwa
Page 519 and 520:
Learning Linear Dynamical Systems y
Page 521 and 522:
Learning Linear Dynamical Systems s
Page 523 and 524:
Switching Auto-Regressive Models 10
Page 525 and 526:
Code s1 s2 s3 s4 v1 v2 v3 v4 ˜v1
Page 527 and 528:
Exercises Exercise 24.5. Consider a
Page 529 and 530:
CHAPTER 25 Switching Linear Dynamic
Page 531 and 532:
Gaussian Sum Filtering than that of
Page 533 and 534:
Gaussian Sum Filtering t t+1 25.3.2
Page 535 and 536:
Gaussian Sum Smoothing filtered app
Page 537 and 538:
Gaussian Sum Smoothing Algorithm 25
Page 539 and 540:
Gaussian Sum Smoothing a b d 40 20
Page 541 and 542:
Reset Models s1 h1 v1 s2 h2 v2 s3 h
Page 543 and 544:
Reset Models where g(a1, b1, a2, b2
Page 545 and 546:
Exercises 11 10.5 10 9.5 0 50 100 1
Page 547 and 548:
CHAPTER 26 Distributed Computation
Page 549 and 550:
Learning Sequences v4(t) v3(t) v2(t
Page 551 and 552:
Learning Sequences neuron number 10
Page 553 and 554:
Learning Sequences (a) (b) Figure 2
Page 555 and 556:
Tractable Continuous Latent Variabl
Page 557 and 558:
Neural Models Example 26.3 (Sequenc
Page 559 and 560:
Neural Models 26.5.4 Leaky integrat
Page 561:
Part V Approximate Inference 537
Page 564 and 565:
Laplace (cont. variables) determini
Page 566 and 567:
Introduction For any sampling metho
Page 568 and 569:
Algorithm 27.2 Rejection sampling t
Page 570 and 571:
x1 x2 x3 x4 x5 x6 27.2 Ancestral Sa
Page 572 and 573:
x 2 x 1 Gibbs Sampling Figure 27.5:
Page 574 and 575:
3 2 1 0 −1 −2 −3 −4 −3
Page 576 and 577:
Algorithm 27.3 Metropolis-Hastings
Page 578 and 579:
Algorithm 27.4 Hybrid Monte Carlo s
Page 580 and 581:
Algorithm 27.5 Swendson-Wang sampli
Page 582 and 583:
Algorithm 27.6 Slice Sampling (univ
Page 584 and 585:
h1 h2 h3 h4 v1 v2 v3 v4 27.6.1 Sequ
Page 586 and 587:
5 10 15 20 25 30 35 40 5 10 15 20 2
Page 588 and 589:
demoMetropolis.m: Demo of Metropoli
Page 590 and 591:
Page 592 and 593:
Properties of Kullback-Leibler Vari
Page 594 and 595:
Properties of Kullback-Leibler Vari
Page 596 and 597:
see fig(28.3), from which the poste
Page 598 and 599:
The variance is 2 zi − 〈zi〉
Page 600 and 601:
Variational Bounding Using KL(q|p)
Page 602 and 603:
Local and KL Variational Approximat
Page 604 and 605:
x y1 y2 y3 y4 Mutual Information Ma
Page 606 and 607:
x1 x2 λx1→xi (xi) x3 λx2→xi (
Page 608 and 609:
Loopy Belief Propagation (so that t
Page 610 and 611:
Expectation Propagation ing an expo
Page 612 and 613:
which, on substituting in the updat
Page 614 and 615:
a b e f Lagrangian c d (a) a b e f
Page 616 and 617:
a+ b− s c+ d− t e− f+ a b (a)
Page 618 and 619:
Further Reading For a given α and
Page 620 and 621:
1. ˜ f(x) ≥ ˜g(x), for x ≥ a
Page 622 and 623:
Exercise 28.13. Consider an approxi
Page 624 and 625:
Page 626 and 627:
e e* a Linear Algebra a α e Figure
Page 628 and 629:
Linear Algebra A square matrix A is
Page 630 and 631:
Linear Algebra Definition 29.12 (Bl
Page 632 and 633:
Definition 29.17 (Quadratic form).
Page 634 and 635:
Inequalities Definition 29.20 (Chai
Page 636 and 637:
29.5.1 Gradient descent with fixed
Page 638 and 639:
29.5.5 The conjugate vectors algori
Page 640 and 641:
Algorithm 29.2 Quasi-Newton for min
Page 642 and 643:
Constrained Optimisation using Lagr
Page 644 and 645:
BIBLIOGRAPHY BIBLIOGRAPHY [16] D. B
Page 646 and 647:
BIBLIOGRAPHY BIBLIOGRAPHY [63] S. C
Page 648 and 649:
BIBLIOGRAPHY BIBLIOGRAPHY [111] A.
Page 650 and 651:
BIBLIOGRAPHY BIBLIOGRAPHY [159] F.
Page 652 and 653:
BIBLIOGRAPHY BIBLIOGRAPHY [206] M.
Page 654 and 655:
BIBLIOGRAPHY BIBLIOGRAPHY [251] F.
Page 656 and 657:
BIBLIOGRAPHY BIBLIOGRAPHY [299] M.
Page 658 and 659:
BIBLIOGRAPHY BIBLIOGRAPHY 634 DRAFT
Page 660 and 661:
INDEX INDEX Bellman’s equation, 1
Page 662 and 663:
INDEX INDEX conditioning, 170 conju
Page 664 and 665:
INDEX INDEX filtering, 459 input-ou
Page 666 and 667:
INDEX INDEX symmetrising updates, 4
Page 668 and 669:
INDEX INDEX conjugate vectors algor
Page 670:
INDEX INDEX changepoint model, 516
show all

Bayesian Reasoning and Machine Learning

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?