04.08.2014 Views


CHAPTER 10 ■ BATTERIES INCLUDED 243

The easiest way to harness the power of re.sub is to use group numbers in the substitution string. Any escape sequence of the form '\\n' in the replacement string is replaced by the string matched by group n in the pattern. For example, let’s say you want to replace words of the form '*something*' with '<em>something</em>', where the former is a common way of expressing emphasis in plain-text documents (such as e-mail), and the latter is the corresponding HTML code (as used in Web pages). Let’s first construct the regexp:

>>> emphasis_pattern = r'\*([^\*]+)\*'

Note that regular expressions can easily become hard to read, so using meaningful variable names (and possibly a comment or two) is important if anyone (including you!) is going to be able to read the code.

■Tip One way to make your regular expressions more readable is to use the VERBOSE flag in the re functions. This allows you to add whitespace (space characters, tabs, newlines, and so on) to your pattern, which re will ignore, except when you put it in a character class or escape it with a backslash. You can also put comments in such verbose regexps. The following is a pattern object that is equivalent to the emphasis pattern, but which uses the VERBOSE flag:

>>> import re
>>> emphasis_pattern = re.compile(r'''
... \* # Beginning emphasis tag -- an asterisk
... ( # Begin group for capturing phrase
... [^\*]+ # Capture anything except asterisks
... ) # End group
... \* # Ending emphasis tag
... ''', re.VERBOSE)
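To check that the verbose form really is equivalent, here is a minimal sketch that compiles both versions and compares what they capture. The names compact, verbose, and samples are illustrative and not from the original text.

```python
import re

# Compact form of the emphasis pattern.
compact = re.compile(r'\*([^\*]+)\*')

# Verbose form: whitespace and comments outside character classes are ignored.
verbose = re.compile(r'''
    \*        # opening asterisk
    (         # begin capture group
    [^\*]+    # one or more characters that are not asterisks
    )         # end capture group
    \*        # closing asterisk
    ''', re.VERBOSE)

# Hypothetical sample strings, used only for the comparison.
samples = ['Hello, *world*!', 'no emphasis here', '*a* and *b*']

for text in samples:
    # With a single group, findall returns the list of captured strings.
    assert compact.findall(text) == verbose.findall(text)
```

If the two patterns ever disagreed on one of the samples, the assertion would fail.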

Now that I have my pattern, I can use re.sub to make my substitution:

>>> re.sub(emphasis_pattern, r'<em>\1</em>', 'Hello, *world*!')
'Hello, <em>world</em>!'

As you can see, I have successfully translated the text from plain text to HTML.
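Group numbers become more useful when a pattern has several groups. The following is a small sketch of that idea, assuming an illustrative name-swapping pattern and made-up input strings. It also uses the \g<n> spelling of a group reference, which the re module accepts in replacement strings and which helps when a digit follows the reference.

```python
import re

# Two groups: swap "Last, First" into "First Last" (hypothetical input).
swapped = re.sub(r'(\w+), (\w+)', r'\2 \1', 'Lovelace, Ada')

# \g<1> refers to the same group as \1.
emphasis = re.sub(r'\*([^\*]+)\*', r'<em>\g<1></em>', 'Hello, *world*!')

# \g<1>0 means "group 1, then a literal 0"; \10 would be read as group 10.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1')

print(swapped)   # Ada Lovelace
print(emphasis)  # Hello, <em>world</em>!
print(padded)    # a10
```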

But you can make your substitutions even more powerful by using a function as the replacement. This function will be supplied with the MatchObject as its only parameter, and the string it returns will be used as the replacement. In other words, you can do whatever you want to the matched substring, and do elaborate processing to generate its replacement. What possible use could you have for such power, you ask? Once you start experimenting with regular expressions, you will surely find countless uses for this mechanism. For one application, see the “Examples” section that follows.
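As a small illustration of a function replacement, the sketch below escapes HTML-special characters in the emphasized text before wrapping it in tags. The function name emphasize and the sample string are assumptions made for this example, and html.escape assumes Python 3.

```python
import html
import re

emphasis_pattern = re.compile(r'\*([^\*]+)\*')

def emphasize(match):
    # re.sub calls this once per match, passing the match object.
    # group(1) is the text between the asterisks.
    inner = html.escape(match.group(1))
    return '<em>' + inner + '</em>'

# Hypothetical input containing a character that needs escaping in HTML.
result = emphasis_pattern.sub(emphasize, 'Use *a < b* here')
print(result)  # Use <em>a &lt; b</em> here
```

A plain replacement string could not do this, since the escaping depends on the content of each match.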
