4 — The Source Coding Theorem

We now consider a thermal distribution (the canonical ensemble), where the probability of a state $x$ is
\[
P(x) = \frac{1}{Z} \exp\!\left( -\frac{E(x)}{k_B T} \right) . \tag{4.50}
\]
With this canonical ensemble we can associate a corresponding microcanonical ensemble, an ensemble with total energy fixed to the mean energy of the canonical ensemble (fixed to within some precision $\epsilon$). Now, fixing the total energy to a precision $\epsilon$ is equivalent to fixing the value of $\ln 1/P(x)$ to within $\epsilon k_B T$. Our definition of the typical set $T_{N\beta}$ was precisely that it consisted of all elements that have a value of $\log P(x)$ very close to the mean value of $\log P(x)$ under the canonical ensemble, $-NH(X)$. Thus the microcanonical ensemble is equivalent to a uniform distribution over the typical set of the canonical ensemble.

Our proof of the 'asymptotic equipartition' principle thus proves – for the case of a system whose energy is separable into a sum of independent terms – that the Boltzmann entropy of the microcanonical ensemble is very close (for large $N$) to the Gibbs entropy of the canonical ensemble, if the energy of the microcanonical ensemble is constrained to equal the mean energy of the canonical ensemble.

Solution to exercise 4.18 (p.85). The normalizing constant of the Cauchy distribution
\[
P(x) = \frac{1}{Z} \, \frac{1}{x^2 + 1}
\]
is
\[
Z = \int_{-\infty}^{\infty} \! dx \, \frac{1}{x^2 + 1} = \left[ \tan^{-1} x \right]_{-\infty}^{\infty} = \frac{\pi}{2} - \left( -\frac{\pi}{2} \right) = \pi . \tag{4.51}
\]
The mean and variance of this distribution are both undefined. (The distribution is symmetrical about zero, but this does not imply that its mean is zero. The mean is the value of a divergent integral.) The sum $z = x_1 + x_2$, where $x_1$ and $x_2$ both have Cauchy distributions, has probability density given by the convolution
\[
P(z) = \frac{1}{\pi^2} \int_{-\infty}^{\infty} \! dx_1 \, \frac{1}{x_1^2 + 1} \, \frac{1}{(z - x_1)^2 + 1} , \tag{4.52}
\]
which after a considerable labour using standard methods gives
\[
P(z) = \frac{1}{\pi^2} \, \frac{2\pi}{z^2 + 4} = \frac{2}{\pi} \, \frac{1}{z^2 + 2^2} , \tag{4.53}
\]
which we recognize as a Cauchy distribution with width parameter 2 (where the original distribution has width parameter 1). This implies that the mean of the two points, $\bar{x} = (x_1 + x_2)/2 = z/2$, has a Cauchy distribution with width parameter 1. Generalizing, the mean of $N$ samples from a Cauchy distribution is Cauchy-distributed with the same parameters as the individual samples. The probability distribution of the mean does not become narrower as $1/\sqrt{N}$.

The central-limit theorem does not apply to the Cauchy distribution, because it does not have a finite variance.

An alternative neat method for getting to equation (4.53) makes use of the Fourier transform of the Cauchy distribution, which is a biexponential $e^{-|\omega|}$. Convolution in real space corresponds to multiplication in Fourier space, so the Fourier transform of $z$ is simply $e^{-|2\omega|}$. Reversing the transform, we obtain equation (4.53).
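As a quick numerical check of equations (4.52) and (4.53), the sketch below (assuming Python with numpy and scipy available; the function names are our own) integrates the product of two unit-width Cauchy densities directly and compares the result with a width-2 Cauchy density.

```python
import numpy as np
from scipy.integrate import quad

def cauchy(x, width=1.0):
    """Cauchy density with the given width parameter: width / (pi * (x^2 + width^2))."""
    return width / (np.pi * (x**2 + width**2))

def convolved(z):
    """Equation (4.52): convolution of two unit-width Cauchy densities, evaluated at z."""
    integrand = lambda x1: cauchy(x1) * cauchy(z - x1)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

# The two columns should agree, illustrating equation (4.53).
for z in (0.0, 1.0, 3.0, 10.0):
    print(f"z = {z:5.1f}   numerical convolution = {convolved(z):.6f}   "
          f"width-2 Cauchy = {cauchy(z, width=2.0):.6f}")
```

At $z = 0$, for example, both columns give $1/(2\pi) \approx 0.159$.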

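The non-narrowing of the Cauchy sample mean is also easy to see by simulation. The following sketch (assuming Python with numpy; the helper and the choice of interquartile range as the spread measure are ours, since the variance is undefined for the Cauchy) compares how the spread of the mean of $N$ samples behaves for Cauchy and Gaussian data.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 10_000

def iqr(samples):
    """Interquartile range: a spread measure that exists even when the variance does not."""
    return np.quantile(samples, 0.75) - np.quantile(samples, 0.25)

for n in (1, 10, 100, 1000):
    # Mean of n unit-width Cauchy samples, repeated over many trials.
    cauchy_means = rng.standard_cauchy((trials, n)).mean(axis=1)
    # Mean of n standard Gaussian samples, for comparison.
    gauss_means = rng.standard_normal((trials, n)).mean(axis=1)
    print(f"n = {n:4d}   IQR of Cauchy mean = {iqr(cauchy_means):.3f}   "
          f"IQR of Gaussian mean = {iqr(gauss_means):.3f}")
```

The interquartile range of the Cauchy mean stays close to 2 (its value for a single unit-width Cauchy sample) for every $n$, while the Gaussian mean's spread shrinks like $1/\sqrt{n}$.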