22 — Maximum Likelihood and Clustering

⊲ Exercise 22.12.[3] Exponential-family fitting. Assume that a variable x comes from a probability distribution of the form

\[
P(x \mid \mathbf{w}) = \frac{1}{Z(\mathbf{w})} \exp\!\Big( \sum_k w_k f_k(x) \Big),
\]

where the functions f_k(x) are given, and the parameters \mathbf{w} = \{w_k\} are not known. A data set \{x^{(n)}\} of N points is supplied.

Show by differentiating the log likelihood that the maximum-likelihood parameters \mathbf{w}_{\rm ML} satisfy

\[
\sum_x P(x \mid \mathbf{w}_{\rm ML})\, f_k(x) = \frac{1}{N} \sum_n f_k(x^{(n)}), \qquad (22.32)
\]

where the left-hand sum is over all x, and the right-hand sum is over the data points. A shorthand for this result is that each function-average under the fitted model must equal the function-average found in the data:

\[
\langle f_k \rangle_{P(x \mid \mathbf{w}_{\rm ML})} = \langle f_k \rangle_{\rm Data}. \qquad (22.33)
\]

⊲ Exercise 22.13.[3] 'Maximum entropy' fitting of models to constraints.

When confronted by a probability distribution P(x) about which only a few facts are known, the maximum entropy principle (maxent) offers a rule for choosing a distribution that satisfies those constraints. According to maxent, you should select the P(x) that maximizes the entropy

\[
H = \sum_x P(x) \log \frac{1}{P(x)}, \qquad (22.34)
\]

subject to the constraints. Assuming the constraints assert that the averages of certain functions f_k(x) are known, i.e.,

\[
\langle f_k \rangle_{P(x)} = F_k, \qquad (22.35)
\]

show, by introducing Lagrange multipliers (one for each constraint, including normalization), that the maximum-entropy distribution has the form

\[
P(x)_{\rm Maxent} = \frac{1}{Z} \exp\!\Big( \sum_k w_k f_k(x) \Big), \qquad (22.36)
\]

where the parameters Z and \{w_k\} are set such that the constraints (22.35) are satisfied.

Hence the maximum entropy method gives identical results to maximum-likelihood fitting of an exponential-family model (previous exercise).

The maximum entropy method has sometimes been recommended as a method for assigning prior distributions in Bayesian modelling. While the outcomes of the maximum entropy method are sometimes interesting and thought-provoking, I do not advocate maxent as the approach to assigning priors.

Maximum entropy is also sometimes proposed as a method for solving inference problems – for example, 'given that the mean score of this unfair six-sided die is 2.5, what is its probability distribution (p_1, p_2, p_3, p_4, p_5, p_6)?' I think it is a bad idea to use maximum entropy in this way; it can give silly answers. The correct way to solve inference problems is to use Bayes' theorem.
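The moment-matching condition (22.32) suggests a direct fitting procedure: ascend the log likelihood until the model's feature averages equal the empirical ones. Below is a minimal sketch in Python, assuming a small discrete alphabet so the sum over x is exact; the alphabet, features, and data are hypothetical choices for illustration, not taken from the book.

```python
import numpy as np

# Sketch of equation (22.32): fit P(x|w) = exp(sum_k w_k f_k(x)) / Z(w)
# by gradient ascent on the log likelihood.
X = np.arange(6)                          # small alphabet, so sums over x are exact
feats = np.stack([X, (X - 2.5) ** 2])     # f_1(x) = x,  f_2(x) = (x - 2.5)^2
rng = np.random.default_rng(0)
data = rng.choice(X, size=1000, p=[.30, .25, .20, .10, .10, .05])

f_data = feats[:, data].mean(axis=1)      # empirical averages: RHS of (22.32)

w = np.zeros(2)
for _ in range(5000):
    logp = w @ feats
    p = np.exp(logp - logp.max())
    p /= p.sum()                          # P(x | w), normalized over the alphabet
    f_model = feats @ p                   # model averages: LHS of (22.32)
    w += 0.05 * (f_data - f_model)        # gradient of log likelihood per data point

print(np.round(f_model, 3), np.round(f_data, 3))
```

After the loop the two printed vectors agree, which is exactly the statement of (22.33): the function-averages under the fitted model equal the function-averages in the data.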

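The die example above can also be computed explicitly. With the single constraint ⟨x⟩ = 2.5, equation (22.36) gives p_i ∝ exp(w·i) for one multiplier w, fixed by the constraint. A sketch, again with illustrative names of my own, solving for w by bisection:

```python
import numpy as np

# Maxent for the die example: faces 1..6 with constrained mean 2.5.
# By (22.36), p_i is proportional to exp(w * i) for one multiplier w.
faces = np.arange(1, 7)

def mean_under(w):
    """Mean face value under p_i proportional to exp(w * i)."""
    p = np.exp(w * faces)
    return (p / p.sum()) @ faces

# mean_under(w) increases monotonically in w, so bisection finds the
# unique w giving mean 2.5 (mean_under(0) = 3.5, so w will be negative).
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mean_under(mid) < 2.5 else (lo, mid)

w = 0.5 * (lo + hi)
p = np.exp(w * faces)
p /= p.sum()
print(np.round(p, 3), p @ faces)          # maxent (p_1, ..., p_6) and its mean
```

The probabilities this prints decay exponentially across the faces, since that shape is forced by the form (22.36) rather than by anything known about the die; this is the kind of answer the text cautions against treating as a solution to the inference problem.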