
Think Python - Denison University

In this text, the phrase “half the” is always followed by the word “bee,” but the phrase “the bee” might be followed by either “has” or “is”.

The result of Markov analysis is a mapping from each prefix (like “half the” and “the bee”) to all possible suffixes (like “has” and “is”).

Given this mapping, you can generate a random text by starting with any prefix and choosing at random from the possible suffixes. Next, you can combine the end of the prefix and the new suffix to form the next prefix, and repeat.

For example, if you start with the prefix “Half a,” then the next word has to be “bee,” because the prefix only appears once in the text. The next prefix is “a bee,” so the next suffix might be “philosophically,” “be” or “due.”

In this example the length of the prefix is always two, but you can do Markov analysis with any prefix length. The length of the prefix is called the “order” of the analysis.

Exercise 13.8 Markov analysis:

1. Write a program to read a text from a file and perform Markov analysis. The result should be a dictionary that maps from prefixes to a collection of possible suffixes. The collection might be a list, tuple, or dictionary; it is up to you to make an appropriate choice. You can test your program with prefix length two, but you should write the program in a way that makes it easy to try other lengths.

2. Add a function to the previous program to generate random text based on the Markov analysis. Here is an example from Emma with prefix length 2:

He was very clever, be it sweetness or be angry, ashamed or only amused, at such a stroke. She had never thought of Hannah till you were never meant for me?” ”I cannot make speeches, Emma:” he soon cut it all himself.

For this example, I left the punctuation attached to the words. The result is almost syntactically correct, but not quite. Semantically, it almost makes sense, but not quite.

What happens if you increase the prefix length? Does the random text make more sense?

3. Once your program is working, you might want to try a mash-up: if you analyze text from two or more books, the random text you generate will blend the vocabulary and phrases from the sources in interesting ways.

13.9 Data structures

Using Markov analysis to generate random text is fun, but there is also a point to this exercise: data structure selection. In your solution to the previous exercises, you had to choose:

• How to represent the prefixes.
• How to represent the collection of possible suffixes.
• How to represent the mapping from each prefix to the collection of possible suffixes.
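
To make these three choices concrete, here is a minimal sketch of one possible combination: prefixes as tuples of words, each collection of suffixes as a list, and the mapping as a dictionary. This is only an illustration, not the book's solution. The function names (markov_analysis, generate_text), the parameter names, and the short sample sentence are all assumptions made up for this example, and the code is written for Python 3.

```python
import random


def markov_analysis(words, order=2):
    """Map each prefix (a tuple of `order` words) to a list of suffixes.

    Hypothetical helper for illustration; names are not from the text.
    """
    suffix_map = {}
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])   # tuples are hashable, so they can be keys
        suffix = words[i + order]
        suffix_map.setdefault(prefix, []).append(suffix)
    return suffix_map


def generate_text(suffix_map, n=20):
    """Generate up to n additional words by walking the prefix mapping."""
    prefix = random.choice(list(suffix_map))  # start with any prefix
    output = list(prefix)
    for _ in range(n):
        suffixes = suffix_map.get(prefix)
        if suffixes is None:                  # prefix only occurs at the end of the text
            break
        word = random.choice(suffixes)
        output.append(word)
        prefix = prefix[1:] + (word,)         # drop the first word, append the new suffix
    return " ".join(output)


# A made-up sample sentence, chosen so that "the bee" has two possible suffixes.
sample = "the bee has a hive and the bee is half the bee".split()
suffix_map = markov_analysis(sample, order=2)

print(suffix_map[("the", "bee")])    # ['has', 'is']
print(suffix_map[("half", "the")])   # ['bee']
print(generate_text(suffix_map, n=8))
```

Storing the suffixes in a list keeps duplicates, so a suffix that follows a prefix more often is proportionally more likely to be chosen. Because order is a parameter, trying other prefix lengths only requires changing one argument.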
