05.05.2013 Views

Programming PHP

Programming PHP

Programming PHP

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

These three kinds of patterns can be combined in countless ways, to create regular<br />

expressions that match such things as valid phone numbers and URLs.<br />

Character Classes<br />

To specify a set of acceptable characters in your pattern, you can either build a character<br />

class yourself or use a predefined one. You can build your own character class<br />

by enclosing the acceptable characters in square brackets:<br />

ereg('c[aeiou]t', 'I cut my hand'); // returns true<br />

ereg('c[aeiou]t', 'This crusty cat'); // returns true<br />

ereg('c[aeiou]t', 'What cart?'); // returns false<br />

ereg('c[aeiou]t', '14ct gold'); // returns false<br />

The regular expression engine finds a "c", then checks that the next character is one<br />

of "a", "e", "i", "o", or"u". If it isn’t a vowel, the match fails and the engine goes<br />

back to looking for another "c". If a vowel is found, though, the engine then checks<br />

that the next character is a "t". If it is, the engine is at the end of the match and so<br />

returns true. If the next character isn’t a "t", the engine goes back to looking for<br />

another "c".<br />

You can negate a character class with a caret (^) at the start:<br />

ereg('c[^aeiou]t', 'I cut my hand'); // returns false<br />

ereg('c[^aeiou]t', 'Reboot chthon'); // returns true<br />

ereg('c[^aeiou]t', '14ct gold'); // returns false<br />

In this case, the regular expression engine is looking for a "c", followed by a character<br />

that isn’t a vowel, followed by a "t".<br />

You can define a range of characters with a hyphen (-). This simplifies character<br />

classes like “all letters” and “all digits”:<br />

ereg('[0-9]%', 'we are 25% complete'); // returns true<br />

ereg('[0123456789]%', 'we are 25% complete'); // returns true<br />

ereg('[a-z]t', '11th'); // returns false<br />

ereg('[a-z]t', 'cat'); // returns true<br />

ereg('[a-z]t', 'PIT'); // returns false<br />

ereg('[a-zA-Z]!', '11!'); // returns false<br />

ereg('[a-zA-Z]!', 'stop!'); // returns true<br />

When you are specifying a character class, some special characters lose their meaning,<br />

while others take on new meaning. In particular, the $ anchor and the period<br />

lose their meaning in a character class, while the ^ character is no longer an anchor<br />

but negates the character class if it is the first character after the open bracket. For<br />

instance, [^\]] matches any character that is not a closing bracket, while [$.^]<br />

matches any dollar sign, period, or caret.<br />

The various regular expression libraries define shortcuts for character classes, including<br />

digits, alphabetic characters, and whitespace. The actual syntax for these shortcuts<br />

differs between POSIX-style and Perl-style regular expressions. For instance, with<br />

POSIX, the whitespace character class is "[[:space:]]", while with Perl it is "\s".<br />

This is the Title of the Book, eMatter Edition<br />

Copyright © 2002 O’Reilly & Associates, Inc. All rights reserved.<br />

Regular Expressions | 97

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!