Information Theory, Inference, and Learning ... - Inference Group

Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

2.1: Probabilities and ensembles                                                23

[Figure 2.2. The probability distribution over the 27 × 27 possible bigrams xy in an English language document, The Frequently Asked Questions Manual for Linux. Both axes run over a, b, c, ..., z, and the space character '-'.]

Marginal probability. We can obtain the marginal probability P(x) from the joint probability P(x, y) by summation:

    P(x = a_i) ≡ ∑_{y ∈ A_Y} P(x = a_i, y).                              (2.3)

Similarly, using briefer notation, the marginal probability of y is:

    P(y) ≡ ∑_{x ∈ A_X} P(x, y).                                          (2.4)

Conditional probability

    P(x = a_i | y = b_j) ≡ P(x = a_i, y = b_j) / P(y = b_j)   if P(y = b_j) ≠ 0.   (2.5)

[If P(y = b_j) = 0 then P(x = a_i | y = b_j) is undefined.]

We pronounce P(x = a_i | y = b_j) 'the probability that x equals a_i, given y equals b_j'.

Example 2.1. An example of a joint ensemble is the ordered pair XY consisting of two successive letters in an English document. The possible outcomes are ordered pairs such as aa, ab, ac, and zz; of these, we might expect ab and ac to be more probable than aa and zz. An estimate of the joint probability distribution for two neighbouring characters is shown graphically in figure 2.2.

This joint ensemble has the special property that its two marginal distributions, P(x) and P(y), are identical. They are both equal to the monogram distribution shown in figure 2.1.

From this joint ensemble P(x, y) we can obtain conditional distributions, P(y | x) and P(x | y), by normalizing the rows and columns, respectively (figure 2.3). The probability P(y | x = q) is the probability distribution of the second letter given that the first letter is a q.
As you can see in figure 2.3a, the two most probable values for the second letter y given that the first letter x is q are u and -.
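Equations (2.3)–(2.5) can be sketched in code. The toy three-letter joint distribution below is illustrative only (it is not the bigram data of figure 2.2); the function names are likewise hypothetical. Marginalization sums a row or column of the joint table, and conditioning normalizes a row so it sums to one:

```python
# Toy joint distribution P(x, y), indexed as joint[x][y]; entries sum to 1.
# (Illustrative numbers only -- not the English bigram statistics of figure 2.2.)
alphabet = ["a", "b", "c"]
joint = {
    "a": {"a": 0.10, "b": 0.20, "c": 0.05},
    "b": {"a": 0.05, "b": 0.10, "c": 0.15},
    "c": {"a": 0.15, "b": 0.05, "c": 0.15},
}

def marginal_x(joint):
    """P(x) = sum over y of P(x, y)  -- equation (2.3): sum each row."""
    return {x: sum(row.values()) for x, row in joint.items()}

def marginal_y(joint, alphabet):
    """P(y) = sum over x of P(x, y)  -- equation (2.4): sum each column."""
    return {y: sum(joint[x][y] for x in alphabet) for y in alphabet}

def conditional_y_given_x(joint, x):
    """P(y | x) = P(x, y) / P(x)  -- equation (2.5), i.e. normalize row x.

    Undefined when P(x) = 0, matching the caveat in the text.
    """
    px = sum(joint[x].values())
    if px == 0:
        raise ValueError("P(y | x) is undefined when P(x) = 0")
    return {y: pxy / px for y, pxy in joint[x].items()}

px = marginal_x(joint)                      # {'a': 0.35, 'b': 0.30, 'c': 0.35}
py = marginal_y(joint, alphabet)
p_y_given_a = conditional_y_given_x(joint, "a")
```

Normalizing every row of the table this way yields the full conditional distribution P(y | x), as in figure 2.3a; normalizing every column yields P(x | y).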
