05.05.2013 Views

Programming PHP

Programming PHP

Programming PHP

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

egular expression features. Perl regular expressions include the POSIX classes and<br />

anchors described earlier. A POSIX-style character class in a Perl regular expression<br />

works and understands non-English characters using the Unix locale system. Perl<br />

regular expressions act on arbitrary binary data, so you can safely match with patterns<br />

or strings that contain the NUL-byte (\x00).<br />

Delimiters<br />

Perl-style regular expressions emulate the Perl syntax for patterns, which means that<br />

each pattern must be enclosed in a pair of delimiters. Traditionally, the slash (/)<br />

character is used; for example, /pattern/. However, any nonalphanumeric character<br />

other than the backslash character (\) can be used to delimit a Perl-style pattern.<br />

This is useful when matching strings containing slashes, such as filenames. For<br />

example, the following are equivalent:<br />

preg_match('/\/usr\/local\//', '/usr/local/bin/perl'); // returns true<br />

preg_match('#/usr/local/#', '/usr/local/bin/perl'); // returns true<br />

Parentheses (()), curly braces ({}), square brackets ([]), and angle brackets () can<br />

be used as pattern delimiters:<br />

preg_match('{/usr/local/}', '/usr/local/bin/perl'); // returns true<br />

The later section on “Trailing Options” discusses the single-character modifiers you<br />

can put after the closing delimiter to modify the behavior of the regular expression<br />

engine. A very useful one is x, which makes the regular expression engine strip<br />

whitespace and #-marked comments from the regular expression before matching.<br />

These two patterns are the same, but one is much easier to read:<br />

'/([[:alpha:]]+)\s+\1/'<br />

'/( # start capture<br />

[[:alpha:]]+ # a word<br />

\s+ # whitespace<br />

\1 # the same word again<br />

) # end capture<br />

/x'<br />

Match Behavior<br />

While Perl’s regular expression syntax includes the POSIX constructs we talked<br />

about earlier, some pattern components have a different meaning in Perl. In particular,<br />

Perl’s regular expressions are optimized for matching against single lines of text<br />

(although there are options that change this behavior).<br />

The period (.) matches any character except for a newline (\n). The dollar sign ($)<br />

matches at the end of the string or, if the string ends with a newline, just before that<br />

newline:<br />

preg_match('/is (.*)$/', "the key is in my pants", $captured);<br />

// $captured[1] is 'in my pants'<br />

104 | Chapter 4: Strings<br />

This is the Title of the Book, eMatter Edition<br />

Copyright © 2002 O’Reilly & Associates, Inc. All rights reserved.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!