11.2 Inferring the input to a real channel

'The best detection of pulses'

In 1944 Shannon wrote a memorandum (Shannon, 1993) on the problem of best differentiating between two types of pulses of known shape, represented by vectors x_0 and x_1, given that one of them has been transmitted over a noisy channel. This is a pattern recognition problem. It is assumed that the noise is Gaussian with probability density

    P(\mathbf{n}) = \left[ \det\left( \tfrac{\mathbf{A}}{2\pi} \right) \right]^{1/2} \exp\left( -\tfrac{1}{2} \mathbf{n}^{\mathsf{T}} \mathbf{A} \mathbf{n} \right),    (11.14)

where A is the inverse of the variance–covariance matrix of the noise, a symmetric and positive-definite matrix. (If A is a multiple of the identity matrix, I/\sigma^2, then the noise is 'white'. For more general A, the noise is 'coloured'.) The probability of the received vector y given that the source signal was s (either zero or one) is then

    P(\mathbf{y} \mid s) = \left[ \det\left( \tfrac{\mathbf{A}}{2\pi} \right) \right]^{1/2} \exp\left( -\tfrac{1}{2} (\mathbf{y} - \mathbf{x}_s)^{\mathsf{T}} \mathbf{A} (\mathbf{y} - \mathbf{x}_s) \right).    (11.15)

[Figure 11.2. Two pulses x_0 and x_1, represented as 31-dimensional vectors, and a noisy version of one of them, y.]

The optimal detector is based on the posterior probability ratio:

    \frac{P(s{=}1 \mid \mathbf{y})}{P(s{=}0 \mid \mathbf{y})}
      = \frac{P(\mathbf{y} \mid s{=}1)}{P(\mathbf{y} \mid s{=}0)} \, \frac{P(s{=}1)}{P(s{=}0)}
      = \exp\left( -\tfrac{1}{2}(\mathbf{y}-\mathbf{x}_1)^{\mathsf{T}}\mathbf{A}(\mathbf{y}-\mathbf{x}_1) + \tfrac{1}{2}(\mathbf{y}-\mathbf{x}_0)^{\mathsf{T}}\mathbf{A}(\mathbf{y}-\mathbf{x}_0) + \ln\frac{P(s{=}1)}{P(s{=}0)} \right)    (11.16)
      = \exp\left( \mathbf{y}^{\mathsf{T}}\mathbf{A}(\mathbf{x}_1-\mathbf{x}_0) + \theta \right),    (11.17)

(on expanding the quadratic forms in (11.16), the term \tfrac{1}{2}\mathbf{y}^{\mathsf{T}}\mathbf{A}\mathbf{y} is common to both hypotheses and cancels, leaving an expression linear in y), where \theta is a constant independent of the received vector y,

    \theta = -\tfrac{1}{2} \mathbf{x}_1^{\mathsf{T}}\mathbf{A}\mathbf{x}_1 + \tfrac{1}{2} \mathbf{x}_0^{\mathsf{T}}\mathbf{A}\mathbf{x}_0 + \ln\frac{P(s{=}1)}{P(s{=}0)}.    (11.18)

If the detector is forced to make a decision (i.e., guess either s = 1 or s = 0), then the decision that minimizes the probability of error is to guess the most probable hypothesis. We can write the optimal decision in terms of a discriminant function:

    a(\mathbf{y}) \equiv \mathbf{y}^{\mathsf{T}}\mathbf{A}(\mathbf{x}_1-\mathbf{x}_0) + \theta    (11.19)

with the decisions

    a(y) > 0 → guess s = 1
    a(y) < 0 → guess s = 0
    a(y) = 0 → guess either.    (11.20)

Notice that a(y) is a linear function of the received vector,

    a(\mathbf{y}) = \mathbf{w}^{\mathsf{T}}\mathbf{y} + \theta,    (11.21)

where \mathbf{w} \equiv \mathbf{A}(\mathbf{x}_1 - \mathbf{x}_0).

[Figure 11.3. The weight vector w ∝ x_1 − x_0 that is used to discriminate between x_0 and x_1.]

11.3 Capacity of Gaussian channel

Until now we have measured the joint, marginal, and conditional entropy of discrete variables only. In order to define the information conveyed by continuous variables, there are two issues we must address – the infinite length of the real line, and the infinite precision of real numbers.
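Returning to the detector of section 11.2: the following is a minimal numerical sketch in Python (not from the book) of the decision rule (11.19)–(11.21), assuming white noise, A = I/\sigma^2. The pulse shapes, noise level, and priors below are illustrative choices, not values from the text.

    # Minimal sketch of the optimal two-pulse detector of section 11.2,
    # assuming white Gaussian noise (A = I / sigma^2). Pulse shapes,
    # sigma, and priors are hypothetical, chosen only for illustration.
    import numpy as np

    rng = np.random.default_rng(0)

    N = 31                    # pulses represented as 31-dimensional vectors
    t = np.arange(N)
    x0 = np.exp(-0.5 * ((t - 10) / 3.0) ** 2)   # hypothetical pulse x_0
    x1 = np.exp(-0.5 * ((t - 20) / 3.0) ** 2)   # hypothetical pulse x_1

    sigma = 0.5
    A = np.eye(N) / sigma**2  # inverse noise covariance (white noise)
    p1, p0 = 0.5, 0.5         # prior probabilities P(s=1), P(s=0)

    # Discriminant parameters from (11.18) and (11.21):
    #   w = A (x1 - x0),  theta = -x1'Ax1/2 + x0'Ax0/2 + ln(p1/p0)
    w = A @ (x1 - x0)
    theta = -0.5 * x1 @ A @ x1 + 0.5 * x0 @ A @ x0 + np.log(p1 / p0)

    # Simulate one transmission: s = 1 is sent, noise is added.
    s_true = 1
    y = (x1 if s_true else x0) + sigma * rng.standard_normal(N)

    # Decision rule (11.20): guess s = 1 if a(y) > 0, s = 0 if a(y) < 0.
    a = w @ y + theta
    s_guess = 1 if a > 0 else 0
    print(f"a(y) = {a:.2f}, guess s = {s_guess} (true s = {s_true})")

Because the noise here is white, w ∝ x_1 − x_0 (as in figure 11.3), so the detector amounts to correlating the received vector with the difference of the two pulse shapes; for coloured noise the same code applies with A set to the inverse of the noise covariance matrix.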
