10.07.2015 Views

Information Theory, Inference, and Learning ... - Inference Group

Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

18.1: Crosswords

We’ll find that the conclusions we come to depend on the value of \(H_W\) and are not terribly sensitive to the value of \(L\). Consider a large crossword of size \(S\) squares in area. Let the number of words be \(f_w S\) and let the number of letter-occupied squares be \(f_1 S\). For typical crosswords of types A and B made of words of length \(L\), the two fractions \(f_w\) and \(f_1\) have roughly the values in table 18.2.

\[
\begin{array}{c|cc}
 & \text{A} & \text{B} \\ \hline
f_w & \dfrac{2}{L+1} & \dfrac{1}{L+1} \\
f_1 & \dfrac{L}{L+1} & \dfrac{3}{4}\,\dfrac{L}{L+1}
\end{array}
\]

Table 18.2. Factors \(f_w\) and \(f_1\) by which the number of words and number of letter-squares respectively are smaller than the total number of squares.

We now estimate how many crosswords there are of size \(S\) using our simple model of Wenglish. We assume that Wenglish is created at random by generating \(W\) strings from a monogram (i.e., memoryless) source with entropy \(H_0\). If, for example, the source used all \(A = 26\) characters with equal probability then \(H_0 = \log_2 A = 4.7\) bits. If instead we use Chapter 2’s distribution then the entropy is 4.2. The redundancy of Wenglish stems from two sources: it tends to use some letters more than others; and there are only \(W\) words in the dictionary.

Let’s now count how many crosswords there are by imagining filling in the squares of a crossword at random using the same distribution that produced the Wenglish dictionary and evaluating the probability that this random scribbling produces valid words in all rows and columns. The total number of typical fillings-in of the \(f_1 S\) squares in the crossword that can be made is

\[
|T| = 2^{f_1 S H_0}. \tag{18.2}
\]

The probability that one word of length \(L\) is validly filled-in is

\[
\beta = \frac{W}{2^{L H_0}}, \tag{18.3}
\]

and the probability that the whole crossword, made of \(f_w S\) words, is validly filled-in by a single typical in-filling is approximately

\[
\beta^{f_w S}. \tag{18.4}
\]

So the log of the number of valid crosswords of size \(S\) is estimated to be

\begin{align}
\log \beta^{f_w S} |T| &= S \left[ (f_1 - f_w L) H_0 + f_w \log W \right] \tag{18.5} \\
&= S \left[ (f_1 - f_w L) H_0 + f_w (L+1) H_W \right], \tag{18.6}
\end{align}

which is an increasing function of \(S\) only if

\[
(f_1 - f_w L) H_0 + f_w (L+1) H_W > 0. \tag{18.7}
\]

So arbitrarily many crosswords can be made only if there’s enough words in the Wenglish dictionary that

\[
H_W > \frac{(f_w L - f_1)}{f_w (L+1)} H_0. \tag{18.8}
\]

This calculation underestimates the number of valid Wenglish crosswords by counting only crosswords filled with ‘typical’ strings. If the monogram distribution is non-uniform then the true count is dominated by ‘atypical’ fillings-in, in which crossword-friendly words appear more often.

Plugging in the values of \(f_1\) and \(f_w\) from table 18.2, we find the following.

\[
\begin{array}{l|cc}
\text{Crossword type} & \text{A} & \text{B} \\ \hline
\text{Condition for crosswords} & H_W > \dfrac{1}{2}\,\dfrac{L}{L+1}\,H_0 & H_W > \dfrac{1}{4}\,\dfrac{L}{L+1}\,H_0
\end{array}
\]

If we set \(H_0 = 4.2\) bits and assume there are \(W = 4000\) words in a normal English-speaker’s dictionary, all with length \(L = 5\), then we find that the condition for crosswords of type B is satisfied, but the condition for crosswords of type A is only just satisfied. This fits with my experience that crosswords of type A usually contain more obscure words.
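The closing numerical claim can be checked with a short Python sketch. It is an illustration under stated assumptions, not part of the original text: the function names are hypothetical, logarithms are taken base 2 because H_0 is quoted in bits, and H_W is read as log_2 W / (L + 1), which is what equating (18.5) and (18.6) implies. The fractions f_w and f_1 are those of table 18.2.

```python
from fractions import Fraction
from math import log2


def table_fractions(kind, L):
    """Return (f_w, f_1) for crossword type 'A' or 'B', following table 18.2."""
    if kind == "A":
        return Fraction(2, L + 1), Fraction(L, L + 1)
    if kind == "B":
        return Fraction(1, L + 1), Fraction(3, 4) * Fraction(L, L + 1)
    raise ValueError("kind must be 'A' or 'B'")


def wenglish_entropy(W, L):
    """H_W in bits per character, ASSUMING H_W = log2(W) / (L + 1)."""
    return log2(W) / (L + 1)


def entropy_threshold(kind, L, H0):
    """Right-hand side of condition (18.8): (f_w L - f_1) / (f_w (L + 1)) * H0."""
    f_w, f_1 = table_fractions(kind, L)
    return float((f_w * L - f_1) / (f_w * (L + 1))) * H0


def log_count_per_square(kind, W, L, H0):
    """Bracketed term of (18.5): log2 of the crossword count, per square."""
    f_w, f_1 = table_fractions(kind, L)
    return float(f_1 - f_w * L) * H0 + float(f_w) * log2(W)


# Values quoted in the text: H_0 = 4.2 bits, W = 4000 words, L = 5 letters.
H0, W, L = 4.2, 4000, 5
print("H_W =", round(wenglish_entropy(W, L), 3))
for kind in ("A", "B"):
    print(kind,
          "threshold =", round(entropy_threshold(kind, L, H0), 3),
          "growth per square =", round(log_count_per_square(kind, W, L, H0), 3))
```

With these inputs H_W comes out at about 1.99 bits, against thresholds of about 1.75 bits for type A and 0.875 bits for type B. This matches the statement that type B is comfortably satisfied while type A is only just satisfied. Exact fractions are used for f_w and f_1 so that the type-dependent factors 1/2 and 1/4 appear without rounding.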
