Information Theory, Inference, and Learning ... - Inference Group
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

10 — The Noisy-Channel Coding Theorem

Results that may help in finding the optimal input distribution

1. All outputs must be used.

2. I(X; Y) is a convex ⌣ function of the channel parameters.

3. There may be several optimal input distributions, but they all look the same at the output.

(Reminder: the term 'convex ⌣' means 'convex', and the term 'concave ⌢' means 'concave'; the little smile and frown symbols are included simply to remind you what convex and concave mean.)

⊲ Exercise 10.6. [2] Prove that no output y is unused by an optimal input distribution, unless it is unreachable, that is, has Q(y | x) = 0 for all x.

Exercise 10.7. [2] Prove that I(X; Y) is a convex ⌣ function of Q(y | x).

Exercise 10.8. [2] Prove that all optimal input distributions of a channel have the same output probability distribution P(y) = Σ_x P(x) Q(y | x).

These results, along with the fact that I(X; Y) is a concave ⌢ function of the input probability vector p, prove the validity of the symmetry argument that we have used when finding the capacity of symmetric channels. If a channel is invariant under a group of symmetry operations – for example, interchanging the input symbols and interchanging the output symbols – then, given any optimal input distribution that is not symmetric, i.e., is not invariant under these operations, we can create another input distribution by averaging together this optimal input distribution and all its permuted forms that we can make by applying the symmetry operations to the original optimal input distribution.
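This averaging argument can be checked numerically. The sketch below (my illustration, not from the text) uses a binary symmetric channel with an assumed flip probability f = 0.1: it evaluates I(X; Y) at an asymmetric input distribution, at its permuted form, and at their average, confirming that the permuted distribution achieves the same I(X; Y) and that the average does at least as well.

```python
import numpy as np

def mutual_information(p, Q):
    """I(X;Y) in bits, for input distribution p and channel matrix Q
    with Q[j, i] = Q(y_j | x_i)."""
    Py = Q @ p  # output distribution P(y) = sum_x P(x) Q(y|x)
    I = 0.0
    for i, px in enumerate(p):
        for j in range(Q.shape[0]):
            if px > 0 and Q[j, i] > 0:
                I += px * Q[j, i] * np.log2(Q[j, i] / Py[j])
    return I

# Binary symmetric channel with flip probability f = 0.1 (assumed example)
f = 0.1
Q = np.array([[1 - f, f],
              [f, 1 - f]])

# An asymmetric input distribution and its form under the symmetry
# operation (swapping the two input symbols)
p = np.array([0.3, 0.7])
p_perm = p[::-1]

I_p = mutual_information(p, Q)
I_perm = mutual_information(p_perm, Q)
I_avg = mutual_information(0.5 * (p + p_perm), Q)

# By symmetry the permuted distribution has the same I(X;Y); by the
# concavity of I, the averaged distribution does at least as well.
assert abs(I_p - I_perm) < 1e-12
assert I_avg >= I_p - 1e-12
print(f"I(p) = {I_p:.4f}, I(average) = {I_avg:.4f} bits")
```

For this channel the averaged distribution is uniform, so I_avg equals the capacity 1 − H₂(0.1) ≈ 0.531 bits, while the asymmetric p falls short of it.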
The permuted distributions must have the same I(X; Y) as the original, by symmetry, so the new input distribution created by averaging must have I(X; Y) bigger than or equal to that of the original distribution, because of the concavity of I.

Symmetric channels

In order to use symmetry arguments, it will help to have a definition of a symmetric channel. I like Gallager's (1968) definition.

  A discrete memoryless channel is a symmetric channel if the set of outputs can be partitioned into subsets in such a way that for each subset the matrix of transition probabilities has the property that each row (if more than 1) is a permutation of each other row and each column is a permutation of each other column.

Example 10.9. This channel

  P(y = 0 | x = 0) = 0.7 ;   P(y = 0 | x = 1) = 0.1 ;
  P(y = ? | x = 0) = 0.2 ;   P(y = ? | x = 1) = 0.2 ;
  P(y = 1 | x = 0) = 0.1 ;   P(y = 1 | x = 1) = 0.7.        (10.23)

is a symmetric channel because its outputs can be partitioned into (0, 1) and ?, so that the matrix can be rewritten:

  P(y = 0 | x = 0) = 0.7 ;   P(y = 0 | x = 1) = 0.1 ;
  P(y = 1 | x = 0) = 0.1 ;   P(y = 1 | x = 1) = 0.7 ;

  P(y = ? | x = 0) = 0.2 ;   P(y = ? | x = 1) = 0.2.        (10.24)
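Gallager's condition can be checked mechanically for this channel. The following sketch (the helper `rows_cols_permutations` is my own illustration, not from the text) verifies the partition into the (0, 1) block and the (?) block, and then evaluates I(X; Y) at the uniform input, which by the symmetry argument above is an optimal input distribution for a symmetric channel.

```python
import numpy as np

# Channel of Example 10.9: Q[j, i] = P(y_j | x_i), outputs ordered (0, ?, 1)
Q = np.array([[0.7, 0.1],
              [0.2, 0.2],
              [0.1, 0.7]])

def rows_cols_permutations(sub):
    """True if every row of sub is a permutation of its first row and
    every column is a permutation of its first column."""
    r0 = sorted(sub[0])
    c0 = sorted(sub[:, 0])
    return (all(sorted(r) == r0 for r in sub) and
            all(sorted(sub[:, j]) == c0 for j in range(sub.shape[1])))

# Partition the outputs into (0, 1) and (?) as in the text
assert rows_cols_permutations(Q[[0, 2], :])   # the (0, 1) block
assert rows_cols_permutations(Q[[1], :])      # the (?) block

# For a symmetric channel the uniform input is optimal, so the capacity
# is I(X;Y) evaluated at p = (1/2, 1/2)
p = np.array([0.5, 0.5])
Py = Q @ p
C = sum(p[i] * Q[j, i] * np.log2(Q[j, i] / Py[j])
        for i in range(2) for j in range(3))
print(f"C = {C:.4f} bits")   # ≈ 0.3651 bits
```

The value agrees with viewing the channel as an erasure (probability 0.2) followed by a binary symmetric channel: C = 0.8 × (1 − H₂(0.125)) ≈ 0.365 bits.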
