12.07.2015 Views

Think Python - Denison University

Think Python - Denison University

Think Python - Denison University

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

132 Chapter 13. Casestudy: datastructure selectionOk, the last one is the easy; the only mapping type we have seen is a dictionary, so it is the naturalchoice.For the prefixes, the most obvious options are string, list of strings, or tuple of strings. For thesuffixes, one option isalist;another isahistogram (dictionary).How should you choose? The first step is to think about the operations you will need to implementforeachdatastructure. Fortheprefixes,weneedtobeabletoremovewordsfromthebeginningandadd to the end. For example, if the current prefix is “Half a,” and the next word is “bee,” you needtobe able toformthe next prefix, “a bee.”Your first choice might be a list, since it is easy to add and remove elements, but we also need to beable to use the prefixes as keys in a dictionary, so that rules out lists. With tuples, you can’t appendor remove, but you can usetheaddition operator toformanew tuple:def shift(prefix, word):return prefix[1:] + (word,)shift takes a tuple of words, prefix, and a string, word, and forms a new tuple that has all thewords inprefixexcept the first,andwordadded totheend.For the collection of suffixes, the operations we need to perform include adding a new suffix (orincreasing the frequency of an existing one), and choosing a random suffix.Addinganewsuffixisequallyeasyforthelistimplementationorthehistogram. Choosingarandomelementfromalistiseasy;choosingfromahistogramishardertodoefficiently(seeExercise13.7).So far we have been talking mostly about ease of implementation, but there are other factors toconsider in choosing data structures. One is run time. Sometimes there is a theoretical reason toexpect one data structure to be faster than other; for example, I mentioned that the in operator isfasterfor dictionaries than for lists,at leastwhen thenumber ofelements islarge.But often you don’t know ahead of time which implementation will be faster. One option is toimplementbothofthemandseewhichisbetter. Thisapproachiscalledbenchmarking. Apracticalalternativeistochoosethedatastructurethatiseasiesttoimplement,andthenseeifitisfastenoughfortheintendedapplication. Ifso,thereisnoneedtogoon. Ifnot,therearetools,liketheprofilemodule, that can identifythe places inaprogram that take themost time.The other factor to consider is storage space. For example, using a histogram for the collection ofsuffixes might take less space because you only have to store each word once, no matter how manytimes it appears in the text. In some cases, saving space can also make your program run faster,and in the extreme, your program might not run at all if you run out of memory. But for manyapplications, space isasecondary consideration after runtime.One final thought: in this discussion, I have implied that we should use one data structure for bothanalysis and generation. But since these are separate phases, it would also be possible to use onestructure for analysis and then convert to another structure for generation. This would be a net winifthe timesaved during generation exceeded the timespent inconversion.13.10 DebuggingWhenyouaredebuggingaprogram,andespeciallyifyouareworkingonahardbug,therearefourthings totry:

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!