03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...


PCRE Extension

Ranges are ordered according to ASCII (American Standard Code for Information Interchange). In other words, the ASCII value for the beginning character must precede the ASCII value for the ending character. Otherwise, the warning “Warning: preg_match(): Compilation failed: range out of order in character class at offset n” is emitted, where n is the character offset within the regular expression.
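The ordering rule can be checked directly. The following is a minimal sketch, not taken from the book: the sample string and variable names are illustrative assumptions, and the error handler is only there to capture the warning text instead of printing it.

```php
<?php
// Collect any warnings PHP emits so they can be inspected afterwards.
$warnings = [];
set_error_handler(function (int $errno, string $errstr) use (&$warnings): bool {
    $warnings[] = $errstr;
    return true; // mark the warning as handled
});

// Valid: "a" (ASCII 97) comes before "z" (ASCII 122).
$valid = preg_match('/[a-z]/', 'scrape');

// Invalid: "z" (ASCII 122) comes after "a" (ASCII 97), so compilation fails.
$invalid = preg_match('/[z-a]/', 'scrape');

restore_error_handler();

var_dump($valid);   // int(1)
var_dump($invalid); // bool(false)
print_r($warnings); // one message mentioning "range out of order in character class"
```

Note that preg_match() returns false, not 0, when the pattern fails to compile, so a strict comparison (===) is needed to tell a failed pattern from a pattern that simply did not match.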

Within square brackets, single characters and special ranges are simply listed side by side with no delimiter, as shown in the second example above. Additionally, the escape sequences mentioned earlier, such as \w, can be used both inside and outside square brackets.
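As a rough illustration of both points, the sketch below combines ranges and single characters in one set of brackets and uses \w inside and outside brackets. The patterns and test strings are assumptions chosen for this example, not ones from the book.

```php
<?php
// Two ranges listed side by side with no delimiter: a-f and 0-9.
$hex = '/^[a-f0-9]+$/';
$isHex    = preg_match($hex, 'c0ffee'); // 1
$isNotHex = preg_match($hex, 'coffee'); // 0, "o" falls outside both ranges

// \w inside brackets, next to the single characters "." and "-".
// The hyphen is placed last so it is read literally, not as a range.
$name = '/^[\w.-]+$/';
$okName  = preg_match($name, 'web-scraping.php'); // 1
$badName = preg_match($name, 'web scraping');     // 0, a space is not in the set

// \w outside brackets behaves the same way on its own.
$word = preg_match('/^\w+$/', 'scraping_101'); // 1

var_dump($isHex, $isNotHex, $okName, $badName, $word);
```

Single-quoted PHP strings are used so that \w reaches the PCRE engine unchanged.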

ASCII Ranges

For an excellent ASCII lookup table, see http://www.asciitable.com.

There are two other noteworthy points about character ranges, as illustrated in the examples below.

• To use a literal ] character in a character range, escape it in the same manner in which other meta-characters are escaped.

• To negate a character range, use ^ as the first character in that character range. (Yes, this can be confusing since ^ is also used to denote the beginning of a line or entire string when it is not used inside a character range.) Note that negation applies to all characters in the range. In other words, a negated character range means “any character that is not any of these characters.”
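The two points above can be sketched in a few lines. This is a hedged illustration only: the input strings and variable names are made up for the example and do not come from the book.

```php
<?php
// Escaped [ and ] inside a character range: matches either bracket character.
$bracketCount = preg_match_all('/[\[\]]/', 'a[0][1]', $matches);
var_dump($bracketCount); // int(4)

// Negated range: ^ as the first character means "anything not listed here".
$digitsOnly = preg_replace('/[^0-9]/', '', 'Tel: 555-0199');
var_dump($digitsOnly); // string(7) "5550199"

// Negation covers every character and range in the brackets, not just the first.
$alnumOnly = preg_replace('/[^a-z0-9]/', '', 'php 5.3!');
var_dump($alnumOnly); // string(5) "php53"

// Outside the brackets ^ anchors to the start; inside, it negates.
$startsWithNonDigit = preg_match('/^[^0-9]/', 'days 7'); // 1
$startsWithDigit    = preg_match('/^[^0-9]/', '7 days'); // 0
var_dump($startsWithNonDigit, $startsWithDigit);
```

The last pattern uses both meanings of ^ at once: the first anchors the match to the start of the string, and the second negates the range that follows it.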
