12.07.2015 Views

Think Python - Denison University

14.11 Glossary

persistent: Pertaining to a program that runs indefinitely and keeps at least some of its data in permanent storage.

format operator: An operator, %, that takes a format string and a tuple and generates a string that includes the elements of the tuple formatted as specified by the format string.

format string: A string, used with the format operator, that contains format sequences.

format sequence: A sequence of characters in a format string, like %d, that specifies how a value should be formatted.

text file: A sequence of characters stored in permanent storage like a hard drive.

directory: A named collection of files, also called a folder.

path: A string that identifies a file.

relative path: A path that starts from the current directory.

absolute path: A path that starts from the topmost directory in the file system.

catch: To prevent an exception from terminating a program using the try and except statements.

database: A file whose contents are organized like a dictionary with keys that correspond to values.

14.12 Exercises

Exercise 14.5 The urllib module provides methods for manipulating URLs and downloading information from the web. The following example downloads and prints a secret message from thinkpython.com:

    import urllib

    conn = urllib.urlopen('http://thinkpython.com/secret.html')
    for line in conn.fp:
        print line.strip()

Run this code and follow the instructions you see there.

Exercise 14.6 In a large collection of MP3 files, there may be more than one copy of the same song, stored in different directories or with different file names. The goal of this exercise is to search for these duplicates.

1. Write a program that searches a directory and all of its subdirectories, recursively, and returns a list of complete paths for all files with a given suffix (like .mp3). Hint: os.path provides several useful functions for manipulating file and path names.

2. To recognize duplicates, you can use a hash function that reads the file and generates a short summary of the contents.
   For example, MD5 (Message-Digest algorithm 5) takes an arbitrarily-long “message” and returns a 128-bit “checksum.” The probability is very small that two files with different contents will return the same checksum. You can read about MD5 at wikipedia.org/wiki/Md5. On a Unix system you can use the program md5sum and a pipe to compute checksums from Python.
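Part 1 of Exercise 14.6 could be approached along the lines below. This is a minimal, hypothetical sketch that assumes Python 3 (unlike the Python 2 urllib example in Exercise 14.5); the name find_files is illustrative and does not come from the text. It recurses using os.listdir and the os.path functions the hint mentions, and it skips symbolic links to directories so that a link cycle cannot cause endless recursion.

```python
import os


def find_files(dirname, suffix):
    """Return the absolute paths of all files under dirname whose names
    end with suffix, searching every subdirectory recursively."""
    matches = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            # Keep regular files whose names end with the requested suffix.
            if name.endswith(suffix):
                matches.append(os.path.abspath(path))
        elif os.path.isdir(path) and not os.path.islink(path):
            # Recurse into real subdirectories only; skipping symlinked
            # directories prevents infinite recursion on link cycles.
            matches.extend(find_files(path, suffix))
    return matches
```

For example, find_files('.', '.mp3') would return the absolute path of every .mp3 file under the current directory. The suffix test is case-sensitive, so matching names like SONG.MP3 as well would need name.lower().endswith(suffix) with a lowercase suffix.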
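For part 2, the text suggests piping files through the Unix md5sum program. The sketch below swaps in Python's standard hashlib module instead, which computes the same MD5 hex digest without calling an external program. It assumes Python 3, and md5_checksum and find_duplicates are hypothetical names. It accepts any list of paths, such as the list a program for part 1 would produce, and groups the files by checksum.

```python
import hashlib
from collections import defaultdict


def md5_checksum(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, reading it in
    chunks so that large files need not fit in memory at once."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by MD5 checksum and return only the groups that
    contain more than one file, i.e. the likely duplicates."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_checksum(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Because matching checksums mean identical contents only with very high probability, as the text notes, a cautious program could compare the bytes of the files in each reported group before deleting anything.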