4 — The Source Coding Theorem

Chebyshev's inequality 2. Let $x$ be a random variable, and let $\alpha$ be a positive real number. Then
$$P\big( (x - \bar{x})^2 \ge \alpha \big) \;\le\; \sigma_x^2/\alpha. \qquad (4.31)$$
Proof: Take $t = (x - \bar{x})^2$ and apply the previous proposition. $\Box$

Weak law of large numbers. Take $x$ to be the average of $N$ independent random variables $h_1, \ldots, h_N$, having common mean $\bar{h}$ and common variance $\sigma_h^2$: $x = \frac{1}{N} \sum_{n=1}^N h_n$. Then
$$P\big( (x - \bar{h})^2 \ge \alpha \big) \;\le\; \sigma_h^2/\alpha N. \qquad (4.32)$$
Proof: obtained by showing that $\bar{x} = \bar{h}$ and that $\sigma_x^2 = \sigma_h^2/N$. $\Box$

We are interested in $x$ being very close to the mean ($\alpha$ very small). No matter how large $\sigma_h^2$ is, and no matter how small the required $\alpha$ is, and no matter how small the desired probability that $(x - \bar{h})^2 \ge \alpha$, we can always achieve it by taking $N$ large enough.

Proof of theorem 4.1 (p.78)

We apply the law of large numbers to the random variable $\frac{1}{N} \log_2 \frac{1}{P(\mathbf{x})}$ defined for $\mathbf{x}$ drawn from the ensemble $X^N$. This random variable can be written as the average of $N$ information contents $h_n = \log_2 (1/P(x_n))$, each of which is a random variable with mean $H = H(X)$ and variance $\sigma^2 \equiv \mathrm{var}[\log_2(1/P(x_n))]$. (Each term $h_n$ is the Shannon information content of the $n$th outcome.)

We again define the typical set with parameters $N$ and $\beta$ thus:
$$T_{N\beta} = \left\{ \mathbf{x} \in \mathcal{A}_X^N : \left[ \frac{1}{N} \log_2 \frac{1}{P(\mathbf{x})} - H \right]^2 < \beta^2 \right\}. \qquad (4.33)$$
For all $\mathbf{x} \in T_{N\beta}$, the probability of $\mathbf{x}$ satisfies
$$2^{-N(H+\beta)} < P(\mathbf{x}) < 2^{-N(H-\beta)}. \qquad (4.34)$$
And by the law of large numbers,
$$P(\mathbf{x} \in T_{N\beta}) \;\ge\; 1 - \frac{\sigma^2}{\beta^2 N}. \qquad (4.35)$$
We have thus proved the 'asymptotic equipartition' principle. As $N$ increases, the probability that $\mathbf{x}$ falls in $T_{N\beta}$ approaches 1, for any $\beta$. How does this result relate to source coding?

We must relate $T_{N\beta}$ to $H_\delta(X^N)$. We will show that for any given $\delta$ there is a sufficiently big $N$ such that $H_\delta(X^N) \simeq NH$.

Part 1: $\frac{1}{N} H_\delta(X^N) < H + \epsilon$.

The set $T_{N\beta}$ is not the best subset for compression. So the size of $T_{N\beta}$ gives an upper bound on $H_\delta$. We show how small $H_\delta(X^N)$ must be by calculating how big $T_{N\beta}$ could possibly be. We are free to set $\beta$ to any convenient value. The smallest possible probability that a member of $T_{N\beta}$ can have is $2^{-N(H+\beta)}$, and the total probability contained by $T_{N\beta}$ can't be any bigger than 1. So
$$|T_{N\beta}| \, 2^{-N(H+\beta)} < 1, \qquad (4.36)$$
that is, the size of the typical set is bounded by
$$|T_{N\beta}| < 2^{N(H+\beta)}. \qquad (4.37)$$
If we set $\beta = \epsilon$ and $N_0$ such that $\frac{\sigma^2}{\epsilon^2 N_0} \le \delta$, then $P(T_{N\beta}) \ge 1 - \delta$, and the set $T_{N\beta}$ becomes a witness to the fact that $H_\delta(X^N) \le \log_2 |T_{N\beta}| < N(H + \epsilon)$.

[Figure 4.13. Schematic illustration of the two parts of the theorem. Given any $\delta$ and $\epsilon$, we show that for large enough $N$, $\frac{1}{N} H_\delta(X^N)$ lies (1) below the line $H + \epsilon$ and (2) above the line $H - \epsilon$.]
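The weak law of large numbers is exactly the tool the proof leans on, so a quick numerical check can make the bound (4.32) concrete. The sketch below is my own illustration, not code from the book: it draws symbols from a Bernoulli source with an assumed parameter $p = 0.1$, forms the average $x$ of the $N$ information contents $h_n = \log_2(1/P(x_n))$, and compares the empirical frequency of $(x - H)^2 \ge \alpha$ (for an arbitrary $\alpha = 0.01$) against the Chebyshev bound $\sigma_h^2/\alpha N$.

```python
# Minimal sketch (not from the book): weak law of large numbers applied to the
# average of N Shannon information contents h_n = log2(1/P(x_n)) of symbols
# drawn from a Bernoulli(p) source.  The empirical frequency of
# (x - H)^2 >= alpha should stay below the bound sigma_h^2/(alpha*N)
# and shrink as N grows.  p, alpha and the trial count are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
p = 0.1
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)                      # mean of h_n
sigma_h2 = p * np.log2(1/p)**2 + (1 - p) * np.log2(1/(1-p))**2 - H**2  # var of h_n
alpha = 0.01
trials = 10_000

for N in [10, 100, 1000]:
    symbols = rng.random((trials, N)) < p                  # Bernoulli(p) samples
    h = np.where(symbols, np.log2(1/p), np.log2(1/(1-p)))  # information contents
    x = h.mean(axis=1)                                     # x = (1/N) sum_n h_n
    empirical = np.mean((x - H)**2 >= alpha)
    bound = min(sigma_h2 / (alpha * N), 1.0)
    print(f"N={N:5d}  P[(x-H)^2 >= alpha] ~ {empirical:.4f}   Chebyshev bound {bound:.4f}")
```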
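The typical set itself can be checked the same way. The sketch below, again an illustration under assumed parameters ($p = 0.1$, $\beta = 0.05$, neither taken from the text), enumerates the length-$N$ strings of a Bernoulli($p$) source grouped by their number of ones, accumulates $P(\mathbf{x} \in T_{N\beta})$, and computes $\frac{1}{N}\log_2 |T_{N\beta}|$, which should stay below $H + \beta$ as in (4.37) while the captured probability approaches 1 with growing $N$, as in (4.35).

```python
# Minimal sketch (my own illustration) of the typical set T_{N,beta} for a
# Bernoulli(p) source.  Every string x with k ones has the same probability
# P(x) = p^k (1-p)^(N-k), so we only need to loop over k.  Counting is done
# in log space to avoid huge integers and floating-point overflow.
from math import lgamma, log, log2

def log2_comb(n, k):
    # log2 of the binomial coefficient C(n, k), via lgamma
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

p = 0.1                                       # illustrative source parameter
H = -p * log2(p) - (1 - p) * log2(1 - p)      # entropy, about 0.469 bits/symbol
beta = 0.05                                   # illustrative half-width of the typical set

for N in [100, 1000, 10000]:
    prob_typical = 0.0     # accumulates P(x in T_{N,beta})
    log2_counts = []       # log2 of the number of typical strings with k ones
    for k in range(N + 1):
        # per-symbol information content of any x with k ones
        info = (k * log2(1 / p) + (N - k) * log2(1 / (1 - p))) / N
        if abs(info - H) < beta:                        # x belongs to T_{N,beta}
            lc = log2_comb(N, k)
            prob_typical += 2.0 ** (lc - N * info)      # C(N,k) * P(x), in log space
            log2_counts.append(lc)
    m = max(log2_counts)
    log2_size = m + log2(sum(2.0 ** (l - m) for l in log2_counts))  # log2 |T_{N,beta}|
    print(f"N={N:6d}  P(x in T)={prob_typical:.4f}  "
          f"(1/N) log2|T|={log2_size / N:.4f}  H+beta={H + beta:.4f}")
```

As the sketch iterates over larger $N$, the probability captured by $T_{N\beta}$ rises toward 1 while $\frac{1}{N}\log_2 |T_{N\beta}|$ stays below $H + \beta$, which is precisely the pair of facts the proof of Part 1 combines.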
