04.08.2014 Views


246 CHAPTER 10 ■ BATTERIES INCLUDED

$ python find_sender.py message.eml
Foo Fie

You should note the following about this program:

• I compile the regular expression to make the processing more efficient.
• I enclose the subpattern I want to extract in parentheses, making it a group.
• I use a nongreedy pattern to match the name because I want to stop matching when I reach the first left angle bracket (or, rather, the space preceding it).
• I use a dollar sign to indicate that I want the pattern to match the entire line, all the way to the end.
• I use an if statement to make sure that I did in fact match something before I try to extract the match of a specific group.
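The program these notes describe is not reproduced in this section. A minimal sketch consistent with the points above might look like the following. The exact pattern, the function name find_senders, and the sample header lines are assumptions made for illustration, not the original listing.

```python
import re

# Assumed pattern: "From: ", a nongreedy group for the name, then the
# address in angle brackets, anchored at the end of the line with $.
pat = re.compile(r'From: (.*?) <.*?>$')

def find_senders(lines):
    """Return the sender name from every line that matches the pattern."""
    names = []
    for line in lines:
        m = pat.match(line)
        if m:  # only extract the group if something actually matched
            names.append(m.group(1))
    return names

# Hypothetical header lines standing in for message.eml
sample = [
    'Subject: Re: Spam\n',
    'From: Foo Fie <foo@bar.baz>\n',
    'To: Magnus Lie Hetland <magnus@bozz.floop>\n',
]
print(find_senders(sample))
```

With these sample lines, only the From: line matches, so the sketch prints ['Foo Fie'].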

To list all the e-mail addresses mentioned in the headers, you need to construct a regular expression that matches an e-mail address but nothing else. You can then use the method findall to find all the occurrences in each line. To avoid duplicates, you keep the addresses in a set (described earlier in this chapter). Finally, you sort the addresses and print them out:

import fileinput, re

# Letters, hyphens, and dots on both sides of the @ sign, ignoring case
pat = re.compile(r'[a-z\-\.]+@[a-z\-\.]+', re.IGNORECASE)
addresses = set()

for line in fileinput.input():
    for address in pat.findall(line):
        addresses.add(address)  # the set discards duplicates

for address in sorted(addresses):
    print(address)

The resulting output when running this program (with the preceding e-mail message as input) is as follows:

Mr.Gumby@bar.baz
foo@bar.baz
foo@baz.com
magnus@bozz.floop

Note that when sorting, uppercase letters come before lowercase letters.
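This ordering comes from comparing strings character by character, where 'M' sorts before 'f'. As a small illustration, the snippet below sorts the same four addresses with and without a case-insensitive key. Passing key=str.lower is a suggestion added here, not something the program above does.

```python
addresses = {
    'foo@bar.baz',
    'Mr.Gumby@bar.baz',
    'magnus@bozz.floop',
    'foo@baz.com',
}

# Default ordering: uppercase letters sort before lowercase ones
default_order = sorted(addresses)

# Case-insensitive ordering: compare lowercased copies of each address
caseless_order = sorted(addresses, key=str.lower)

print(default_order)
print(caseless_order)
```

The first list starts with Mr.Gumby@bar.baz, as in the output above; in the second it moves to the end, after magnus@bozz.floop.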

■Note I haven’t adhered strictly to the problem specification here. The problem was to find the addresses in the header, but in this case the program finds all the addresses in the entire file. To avoid that, you can call fileinput.close() when you find an empty line: the header can’t contain empty lines, so at that point you are finished. Alternatively, you can use fileinput.nextfile() to start processing the next file, if there is more than one.
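One way the header-only variant could look is sketched below, using fileinput.nextfile() so that several message files can be processed in one run. The function name header_addresses and its filenames parameter are hypothetical; the pattern is the same one used in the program above.

```python
import fileinput
import re

pat = re.compile(r'[a-z\-\.]+@[a-z\-\.]+', re.IGNORECASE)

def header_addresses(filenames):
    """Collect addresses from the header part of each file only."""
    addresses = set()
    for line in fileinput.input(files=filenames):
        if not line.strip():
            # An empty line ends the header: skip the rest of this file
            fileinput.nextfile()
            continue
        addresses.update(pat.findall(line))
    return sorted(addresses)
```

Calling header_addresses(['message.eml']) would then return the sorted header addresses of that one file. With a single file, calling fileinput.close() and breaking out of the loop would work just as well.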
