15.04.2013 Views

Core Python Programming (2nd Edition)


domain. After the final double-colon, we put together a random integer string using the original time chosen (for the date string), followed by the lengths of the login and domain names, all separated by a single hyphen.
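The layout of that last field can be checked against the sample line used later in this section: in 1171590364-6-8, the integer time is followed by the lengths of the login name (uzifzf, 6) and the domain name (dpyivihw, 8). The following is a minimal sketch of how such a line might be assembled. The function build_line and its parameter names are hypothetical and are not taken from gendata.py, and the top-level domain is assumed to be left out of the domain length, as the sample suggests.

```python
def build_line(date_str, login, domain, tld, time_int):
    """Assemble one record: timestamp::login@domain.tld::int-len-len.

    Hypothetical helper for illustration only; not the gendata.py code.
    """
    email = '%s@%s.%s' % (login, domain, tld)
    # Last field: the integer time, then the login and domain lengths,
    # all joined by single hyphens.
    tail = '%d-%d-%d' % (time_int, len(login), len(domain))
    # The three fields are separated by double colons.
    return '::'.join((date_str, email, tail))


line = build_line('Thu Feb 15 17:46:04 2007', 'uzifzf', 'dpyivihw', 'gov',
                  1171590364)
```

With these arguments, line reproduces the sample record shown in Section 15.4.1.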

15.4.1. Matching a String

For the following exercises, create both permissive and restrictive versions of your REs. We recommend that you test these REs in a short application that uses our sample redata.txt file above (or your own data generated by running gendata.py). You will need that file again when you do the exercises.
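One possible shape for such a test application is sketched below. The helper leading_matches is a hypothetical name, the two patterns are only illustrations of "permissive" versus "restrictive," and the second sample line is made up to show where the two differ. In practice the lines would come from the file, for example via open('redata.txt').

```python
import re


def leading_matches(pattern, lines):
    """Return what `pattern` matches at the start of each line.

    Lines that do not match contribute None. Hypothetical helper.
    """
    regex = re.compile(pattern)
    results = []
    for line in lines:
        m = regex.match(line)
        results.append(m.group() if m else None)
    return results


sample = [
    # One real line from redata.txt.
    'Thu Feb 15 17:46:04 2007::uzifzf@dpyivihw.gov::1171590364-6-8',
    # Made-up malformed line: "Xyz" is not a day of the week.
    'Xyz Feb 15 17:46:04 2007::uzifzf@dpyivihw.gov::1171590364-6-8',
]

permissive = r'^\w{3}'                            # any three word characters
restrictive = r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)'   # only real day names

permissive_hits = leading_matches(permissive, sample)    # ['Thu', 'Xyz']
restrictive_hits = leading_matches(restrictive, sample)  # ['Thu', None]
```

The permissive pattern accepts the malformed line, while the restrictive one rejects it. Which behavior is preferable depends on how much you trust the input.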

To test the RE before putting it into our little application, we will import the re module and assign one sample line from redata.txt to a string variable, data. These two statements are the same for both of the examples illustrated.

>>> import re
>>> data = 'Thu Feb 15 17:46:04 2007::uzifzf@dpyivihw.gov::1171590364-6-8'

In our first example, we will create a regular expression to extract only the day of the week from the timestamp on each line of the data file redata.txt. We will use the following RE:

"^Mon|^Tue|^Wed|^Thu|^Fri|^Sat|^Sun"

This example requires that the string start with (the "^" RE operator) any of the seven strings listed. If we were to "translate" the above RE to English, it would read something like, "the string should start with 'Mon,' 'Tue,' ..., 'Sat,' or 'Sun.'"
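As a quick check, this RE can be applied to the sample line. The sketch below uses only the standard re module. The variable names day, subgroups, and late, and the second test string, are made up for illustration.

```python
import re

data = 'Thu Feb 15 17:46:04 2007::uzifzf@dpyivihw.gov::1171590364-6-8'
patt = '^Mon|^Tue|^Wed|^Thu|^Fri|^Sat|^Sun'

m = re.match(patt, data)
day = m.group() if m is not None else None   # 'Thu'

# The pattern contains no parentheses, so there are no subgroups.
subgroups = m.groups()                       # ()

# With re.search(), the carets are what pin the match to the start:
# 'Sun' occurs in this made-up string, but not at position 0.
late = re.search(patt, 'a day in the Sun')   # None
```

Because re.match() only ever matches at the beginning of the string, the carets are redundant in the first call. They matter once the same pattern is used with re.search().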

Alternatively, we can replace all the caret operators with a single caret if we group the day strings like this:

"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"

The parentheses around the set of strings mean that one of these strings must be encountered for a match to succeed. This is a "friendlier" version of the original RE we came up with, which did not have the parentheses. With the modified RE, we can also access the matched string as a subgroup:

>>> patt = '^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)'
>>> m = re.match(patt, data)
>>> m.group()                        # entire match
'Thu'
>>> m.group(1)                       # subgroup 1
'Thu'
>>> m.groups()                       # all subgroups
('Thu',)

This feature may not seem as revolutionary as we have made it out to be for this example, but it is definitely advantageous in the next example, or anywhere you provide extra data as part of the RE to help the string-matching process even though those characters are not part of the string you are interested in.
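To make the point concrete, here is a sketch in which the RE describes the whole timestamp but only the day is captured. The longer pattern is an illustration written for this sample line, not one given in the text, and it assumes single spaces between the timestamp fields.

```python
import re

data = 'Thu Feb 15 17:46:04 2007::uzifzf@dpyivihw.gov::1171590364-6-8'

# The text after the parentheses helps the match succeed, but it lies
# outside the group, so it is not part of what we extract.
patt = r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) \w{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}::'

m = re.match(patt, data)
whole = m.group()    # 'Thu Feb 15 17:46:04 2007::'
day = m.group(1)     # 'Thu'
```

Here group() and group(1) differ: the entire match covers the full timestamp and the double colon, while subgroup 1 holds only the day of the week.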
