12.07.2015 Views

Think Python - Denison University


Chapter 13. Case study: data structure selection

override: To replace a default value with an argument.

benchmarking: The process of choosing between data structures by implementing alternatives and testing them on a sample of the possible inputs.

13.12 Exercises

Exercise 13.9 The “rank” of a word is its position in a list of words sorted by frequency: the most common word has rank 1, the second most common has rank 2, etc.

Zipf’s law describes a relationship between the ranks and frequencies of words in natural languages.² Specifically, it predicts that the frequency, f, of the word with rank r is:

$f = c r^{-s}$

where s and c are parameters that depend on the language and the text. If you take the logarithm of both sides of this equation, you get:

$\log f = \log c - s \log r$

So if you plot log f versus log r, you should get a straight line with slope −s and intercept log c.

Write a program that reads a text from a file, counts word frequencies, and prints one line for each word, in descending order of frequency, with log f and log r. Use the graphing program of your choice to plot the results and check whether they form a straight line. Can you estimate the value of s?

² See wikipedia.org/wiki/Zipf’s_law.
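One possible starting point for Exercise 13.9 is sketched below. It is not the book's solution. The function names (count_words, rank_freq, estimate_s, print_zipf) are made up for illustration, and the word-cleaning rules (lowercasing, stripping punctuation, treating hyphens as spaces) are assumptions. Words with equal frequency receive consecutive ranks in arbitrary order. The estimate of s comes from an ordinary least-squares fit of log f against log r, which is one reasonable way to measure the slope but not the only one.

```python
import math
import string
from collections import Counter


def count_words(lines):
    """Count word frequencies in an iterable of text lines."""
    counts = Counter()
    strip_chars = string.punctuation + string.whitespace
    for line in lines:
        # Assumption: hyphenated words are split into separate words.
        for word in line.replace('-', ' ').split():
            word = word.strip(strip_chars).lower()
            if word:
                counts[word] += 1
    return counts


def rank_freq(counts):
    """Return (rank, frequency) pairs; the most common word has rank 1."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def estimate_s(pairs):
    """Fit a line to log f versus log r and return s, the negated slope.

    Needs at least two pairs with different ranks.
    """
    xs = [math.log(r) for r, f in pairs]
    ys = [math.log(f) for r, f in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def print_zipf(filename):
    """Print log f and log r for each word, most frequent first."""
    with open(filename) as fin:
        counts = count_words(fin)
    pairs = rank_freq(counts)
    for r, f in pairs:
        print(math.log(f), math.log(r))
    print('estimated s:', estimate_s(pairs))
```

Calling print_zipf with the name of any plain-text file (for example a hypothetical 'book.txt') prints two columns that can be pasted into a plotting program. If the points fall close to a straight line, the printed estimate of s approximates the negated slope of that line.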
