Information Theory, Inference, and Learning ... - Inference Group

Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

45 Gaussian Processes

45.2 From parametric models to Gaussian processes

Linear models

Let us consider a regression problem using \(H\) fixed basis functions, for example one-dimensional radial basis functions as defined in equation (45.3).

Let us assume that a list of \(N\) input points \(\{x^{(n)}\}\) has been specified and define the \(N \times H\) matrix \(\mathbf{R}\) to be the matrix of values of the basis functions \(\{\phi_h(x)\}_{h=1}^{H}\) at the points \(\{x^{(n)}\}\),

\[ R_{nh} \equiv \phi_h(x^{(n)}). \tag{45.17} \]

We define the vector \(\mathbf{y}_N\) to be the vector of values of \(y(x)\) at the \(N\) points,

\[ y_n \equiv \sum_h R_{nh} w_h. \tag{45.18} \]

If the prior distribution of \(\mathbf{w}\) is Gaussian with zero mean,

\[ P(\mathbf{w}) = \text{Normal}(\mathbf{w}; \mathbf{0}, \sigma_w^2 \mathbf{I}), \tag{45.19} \]

then \(\mathbf{y}\), being a linear function of \(\mathbf{w}\), is also Gaussian distributed, with mean zero. The covariance matrix of \(\mathbf{y}\) is

\begin{align}
\mathbf{Q} = \langle \mathbf{y}\mathbf{y}^{\mathsf{T}} \rangle = \langle \mathbf{R}\mathbf{w}\mathbf{w}^{\mathsf{T}}\mathbf{R}^{\mathsf{T}} \rangle &= \mathbf{R} \langle \mathbf{w}\mathbf{w}^{\mathsf{T}} \rangle \mathbf{R}^{\mathsf{T}} \tag{45.20} \\
&= \sigma_w^2 \mathbf{R}\mathbf{R}^{\mathsf{T}}. \tag{45.21}
\end{align}

So the prior distribution of \(\mathbf{y}\) is:

\[ P(\mathbf{y}) = \text{Normal}(\mathbf{y}; \mathbf{0}, \mathbf{Q}) = \text{Normal}(\mathbf{y}; \mathbf{0}, \sigma_w^2 \mathbf{R}\mathbf{R}^{\mathsf{T}}). \tag{45.22} \]

This result, that the vector of \(N\) function values \(\mathbf{y}\) has a Gaussian distribution, is true for any selected points \(\mathbf{X}_N\). This is the defining property of a Gaussian process. The probability distribution of a function \(y(x)\) is a Gaussian process if for any finite selection of points \(x^{(1)}, x^{(2)}, \ldots, x^{(N)}\), the density \(P(y(x^{(1)}), y(x^{(2)}), \ldots, y(x^{(N)}))\) is a Gaussian.

Now, if the number of basis functions \(H\) is smaller than the number of data points \(N\), then the matrix \(\mathbf{Q}\) will not have full rank. In this case the probability distribution of \(\mathbf{y}\) might be thought of as a flat elliptical pancake confined to an \(H\)-dimensional subspace in the \(N\)-dimensional space in which \(\mathbf{y}\) lives.

What about the target values? If each target \(t_n\) is assumed to differ by additive Gaussian noise of variance \(\sigma_\nu^2\) from the corresponding function value \(y_n\) then \(\mathbf{t}\) also has a Gaussian prior distribution,

\[ P(\mathbf{t}) = \text{Normal}(\mathbf{t}; \mathbf{0}, \mathbf{Q} + \sigma_\nu^2 \mathbf{I}). \tag{45.23} \]

We will denote the covariance matrix of \(\mathbf{t}\) by \(\mathbf{C}\):

\[ \mathbf{C} = \mathbf{Q} + \sigma_\nu^2 \mathbf{I} = \sigma_w^2 \mathbf{R}\mathbf{R}^{\mathsf{T}} + \sigma_\nu^2 \mathbf{I}. \tag{45.24} \]

Whether or not \(\mathbf{Q}\) has full rank, the covariance matrix \(\mathbf{C}\) has full rank since \(\sigma_\nu^2 \mathbf{I}\) is full rank.

What does the covariance matrix \(\mathbf{Q}\) look like? In general, the \((n, n')\) entry of \(\mathbf{Q}\) is

\[ Q_{nn'} = [\sigma_w^2 \mathbf{R}\mathbf{R}^{\mathsf{T}}]_{nn'} = \sigma_w^2 \sum_h \phi_h(x^{(n)}) \phi_h(x^{(n')}). \tag{45.25} \]
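The construction in equations (45.17) to (45.25) can be checked numerically. The following is a minimal sketch, not code from the book: it assumes NumPy, a Gaussian radial basis function of the form exp(-(x - c)^2 / (2 r^2)), and illustrative values for the inputs, centres, r, sigma_w and sigma_nu. The helper name `rbf_design_matrix` is hypothetical.

```python
import numpy as np


def rbf_design_matrix(x, centres, r):
    """Return R with R[n, h] = exp(-(x[n] - centres[h])**2 / (2 r**2))."""
    diff = x[:, None] - centres[None, :]
    return np.exp(-diff**2 / (2.0 * r**2))


# Illustrative values (assumptions, not from the text):
# N = 8 input points and H = 3 basis functions, so H < N.
x = np.linspace(0.0, 7.0, 8)
centres = np.array([1.0, 3.5, 6.0])
r, sigma_w, sigma_nu = 1.0, 1.0, 0.1

R = rbf_design_matrix(x, centres, r)      # N x H matrix, eq. (45.17)
Q = sigma_w**2 * R @ R.T                  # covariance of y, eq. (45.21)
C = Q + sigma_nu**2 * np.eye(len(x))      # covariance of t, eq. (45.24)

# Entry-wise form of eq. (45.25) for one pair (n, n').
n, n_prime = 1, 4
q_entry = sigma_w**2 * np.sum(R[n] * R[n_prime])

rank_Q = np.linalg.matrix_rank(Q)
rank_C = np.linalg.matrix_rank(C)
print(f"rank(Q) = {rank_Q}, rank(C) = {rank_C}")
print(f"Q[n, n'] from matrix: {Q[n, n_prime]:.6f}, from sum: {q_entry:.6f}")
```

With fewer basis functions than data points, the rank of Q cannot exceed H, while adding the noise term on the diagonal makes C full rank, as the text states.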
Similarly, the \((n, n')\) entry of \(\mathbf{C}\) is

\[ C_{nn'} = \sigma_w^2 \sum_h \phi_h(x^{(n)}) \phi_h(x^{(n')}) + \delta_{nn'} \sigma_\nu^2, \tag{45.26} \]

where \(\delta_{nn'} = 1\) if \(n = n'\) and 0 otherwise.

Example 45.4. Let's take as an example a one-dimensional case, with radial basis functions. The expression for \(Q_{nn'}\) becomes simplest if we assume we have uniformly-spaced basis functions with the basis function labelled \(h\) centred on the point \(x = h\), and take the limit \(H \to \infty\), so that the sum over \(h\) becomes an integral; to avoid having a covariance that diverges with \(H\), we had better make \(\sigma_w^2\) scale as \(S/(\Delta H)\), where \(\Delta H\) is the number of basis functions per unit length of the \(x\)-axis, and \(S\) is a constant; then

\begin{align}
Q_{nn'} &= S \int_{h_{\min}}^{h_{\max}} \mathrm{d}h \, \phi_h(x^{(n)}) \phi_h(x^{(n')}) \tag{45.27} \\
&= S \int_{h_{\min}}^{h_{\max}} \mathrm{d}h \, \exp\left[ -\frac{(x^{(n)} - h)^2}{2r^2} \right] \exp\left[ -\frac{(x^{(n')} - h)^2}{2r^2} \right]. \tag{45.28}
\end{align}

If we let the limits of integration be \(\pm\infty\), we can solve this integral. Completing the square in \(h\) shows that the integrand is a Gaussian in \(h\) centred on the midpoint \((x^{(n)} + x^{(n')})/2\), multiplied by a factor \(\exp[-(x^{(n')} - x^{(n)})^2/(4r^2)]\) that does not depend on \(h\), so

\[ Q_{nn'} = \sqrt{\pi r^2} \, S \exp\left[ -\frac{(x^{(n')} - x^{(n)})^2}{4r^2} \right]. \tag{45.29} \]

We are arriving at a new perspective on the interpolation problem. Instead of specifying the prior distribution on functions in terms of basis functions and priors on parameters, the prior can be summarized simply by a covariance function,

\[ C(x^{(n)}, x^{(n')}) \equiv \theta_1 \exp\left[ -\frac{(x^{(n')} - x^{(n)})^2}{4r^2} \right], \tag{45.30} \]

where we have given a new name, \(\theta_1\), to the constant out front.

Generalizing from this particular case, a vista of interpolation methods opens up. Given any valid covariance function \(C(x, x')\) (we'll discuss in a moment what 'valid' means) we can define the covariance matrix for \(N\) function values at locations \(\mathbf{X}_N\) to be the matrix \(\mathbf{Q}\) given by

\[ Q_{nn'} = C(x^{(n)}, x^{(n')}) \tag{45.31} \]

and the covariance matrix for \(N\) corresponding target values, assuming Gaussian noise, to be the matrix \(\mathbf{C}\) given by

\[ C_{nn'} = C(x^{(n)}, x^{(n')}) + \sigma_\nu^2 \delta_{nn'}. \tag{45.32} \]

In conclusion, the prior probability of the \(N\) target values \(\mathbf{t}\) in the data set is:

\[ P(\mathbf{t}) = \text{Normal}(\mathbf{t}; \mathbf{0}, \mathbf{C}) = \frac{1}{Z} e^{-\frac{1}{2} \mathbf{t}^{\mathsf{T}} \mathbf{C}^{-1} \mathbf{t}}. \tag{45.33} \]

Samples from this Gaussian process and a few other simple Gaussian processes are displayed in figure 45.1.
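The limit taken in Example 45.4 and the prior (45.33) can be illustrated with a short numerical experiment. This is a sketch under stated assumptions rather than the book's own code: it uses NumPy, a dense but finite grid of basis-function centres in place of the limit of infinitely many, and illustrative values for r, S, sigma_nu, the basis density and the input locations. It compares the finite-basis covariance with the closed form (45.29), then draws samples of t from Normal(0, C) using a Cholesky factor of C.

```python
import numpy as np

# Illustrative values (assumptions, not from the text).
r, S, sigma_nu = 1.0, 1.0, 0.1
density = 10.0                               # basis functions per unit length
centres = np.arange(-200, 200) / density     # finite stand-in for h in (-inf, inf)
x = np.linspace(-3.0, 3.0, 25)               # N = 25 input locations

# Finite-basis covariance: sigma_w^2 = S / density, Q = sigma_w^2 R R^T.
R = np.exp(-(x[:, None] - centres[None, :])**2 / (2.0 * r**2))
Q_basis = (S / density) * R @ R.T

# Closed-form covariance from eq. (45.29).
dx = x[:, None] - x[None, :]
Q_closed = np.sqrt(np.pi * r**2) * S * np.exp(-dx**2 / (4.0 * r**2))
max_err = np.max(np.abs(Q_basis - Q_closed))
print(f"largest |Q_basis - Q_closed| entry: {max_err:.2e}")

# Covariance of the targets, eq. (45.32), and samples from eq. (45.33).
C = Q_closed + sigma_nu**2 * np.eye(len(x))
L = np.linalg.cholesky(C)                    # C = L L^T
rng = np.random.default_rng(0)
samples = (L @ rng.standard_normal((len(x), 3))).T   # three draws of t
print(samples.shape)
```

If z is a vector of independent standard normal variables, then L z has covariance L L^T = C, which is why the Cholesky factor is used here. The noise term on the diagonal also keeps C well enough conditioned for the factorization to succeed in this example.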