12.07.2015 Views

Think Python - Denison University



Chapter 13
Case study: data structure selection

13.1 Word frequency analysis

As usual, you should at least attempt the following exercises before you read my solutions.

Exercise 13.1 Write a program that reads a file, breaks each line into words, strips whitespace and punctuation from the words, and converts them to lowercase.

Hint: The string module provides strings named whitespace, which contains space, tab, newline, etc., and punctuation, which contains the punctuation characters. Let's see if we can make Python swear:

>>> import string
>>> print string.punctuation
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Also, you might consider using the string methods strip, replace and translate.

Exercise 13.2 Go to Project Gutenberg (gutenberg.org) and download your favorite out-of-copyright book in plain text format.

Modify your program from the previous exercise to read the book you downloaded, skip over the header information at the beginning of the file, and process the rest of the words as before.

Then modify the program to count the total number of words in the book, and the number of times each word is used.

Print the number of different words used in the book. Compare different books by different authors, written in different eras. Which author uses the most extensive vocabulary?

Exercise 13.3 Modify the program from the previous exercise to print the 20 most frequently-used words in the book.

Exercise 13.4 Modify the previous program to read a word list (see Section 9.1) and then print all the words in the book that are not in the word list. How many of them are typos? How many of them are common words that should be in the word list, and how many of them are really obscure?
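One possible way to approach Exercises 13.1, 13.3 and 13.4 is sketched below. It is not the author's solution, and it is worth attempting the exercises before reading it. The sketch is written for Python 3, whereas the interpreter session in the text uses the Python 2 print statement. The function names (clean_word, words_in_lines, word_histogram, most_common_words, words_not_in_list) are made up for illustration, and the use of collections.Counter is an assumption rather than something the text prescribes. Skipping the Project Gutenberg header is left out, because the text does not say how the header is delimited.

```python
import string
from collections import Counter


def clean_word(word):
    """Strip surrounding whitespace and punctuation, then lowercase."""
    return word.strip(string.whitespace + string.punctuation).lower()


def words_in_lines(lines):
    """Yield cleaned, non-empty words from an iterable of text lines."""
    for line in lines:
        # Assumption: hyphens separate words, so "well-known" becomes
        # "well" and "known".
        for raw in line.replace('-', ' ').split():
            word = clean_word(raw)
            if word:
                yield word


def word_histogram(lines):
    """Map each distinct word to the number of times it appears."""
    return Counter(words_in_lines(lines))


def most_common_words(hist, n=20):
    """Return the n most frequent (word, count) pairs."""
    return hist.most_common(n)


def words_not_in_list(hist, word_list):
    """Return, in sorted order, the words in hist missing from word_list."""
    known = set(word_list)
    return sorted(word for word in hist if word not in known)


if __name__ == '__main__':
    # In-memory sample lines; an open file object can be passed instead,
    # because iterating over a file yields its lines.
    sample = ["Hello, world!  Hello again.", "The well-known WORLD..."]
    hist = word_histogram(sample)
    print('total words:', sum(hist.values()))
    print('different words:', len(hist))
    print('most common:', most_common_words(hist, 2))
    print('not in list:', words_not_in_list(hist, ['hello', 'world', 'the']))
```

Because strip only removes characters from the two ends of a word, punctuation inside a word (such as an apostrophe in a contraction) is kept. Whether that is desirable depends on how you want to count such words.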
