
php|architect's Guide to Web Scraping with PHP - Wind Business ...

PCRE Extension

The first way involves using the . meta-character, which matches any single character except a line feed ("\n") unless a modifier changes that behavior (modifiers will be covered later). It can be used with repetition just like any other character.
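As a minimal sketch of the behavior just described, the following uses preg_match with the . meta-character. The patterns and subject strings are made up for illustration and do not come from the surrounding text.

```php
<?php
// '.' stands in for any single character except a line feed.
$dotMatchesLetter = preg_match('/a.c/', 'abc');      // 1: 'b' fills the '.' slot

// Without modifiers, '.' does not match "\n", so this finds nothing.
$dotMatchesLineFeed = preg_match('/a.c/', "a\nc");   // 0

// Repetition applies to '.' just as it does to a literal character.
preg_match('/a.+c/', 'a123c', $match);               // $match[0] is 'a123c'
```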

The second way requires using special escape sequences that represent a range of characters. Aside from the escape sequences mentioned in the previous section's examples, here are some that are commonly used.

• \d: a digit, 0 through 9.
• \h: a horizontal whitespace character, such as a space or a tab.
• \v: a vertical whitespace character, such as a carriage return or line feed.
• \s: any whitespace character, roughly the equivalent of all characters represented by \h and \v.
• \w: any letter, digit, or underscore.
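The escape sequences listed above can be tried out with a short sketch like the one below. The sample strings are invented for illustration, and \h and \v are assumed to be available, which requires a reasonably recent PCRE library behind PHP.

```php
<?php
$line = "Item 42\tcosts 7 USD\r\n";

// \d+ collects each run of digits.
preg_match_all('/\d+/', $line, $digits);            // $digits[0]: ['42', '7']

// \w+ collects each run of letters, digits, or underscores.
preg_match_all('/\w+/', $line, $words);             // $words[0]: ['Item', '42', 'costs', '7', 'USD']

// \h+ splits on runs of spaces and tabs.
$fields = preg_split('/\h+/', "a \t b");            // ['a', 'b']

// \v+ splits on runs of carriage returns and line feeds.
$rows = preg_split('/\v+/', "one\r\ntwo\nthree");   // ['one', 'two', 'three']

// \s+ collapses any run of whitespace into a single space.
$collapsed = preg_replace('/\s+/', ' ', $line);     // "Item 42 costs 7 USD "
```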

Each of these escape sequences has a complement.

• \D: a non-digit character.
• \H: a non-horizontal whitespace character.
• \V: a non-vertical whitespace character.
• \S: a non-whitespace character.
• \W: a character that is not a letter, digit, or underscore.
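The complements are often handy for stripping or replacing everything that is not of interest. The following is a small sketch under that assumption, using invented input strings.

```php
<?php
// \D removes every character that is not a digit.
$digitsOnly = preg_replace('/\D/', '', '(555) 010-9999');        // '5550109999'

// \S+ picks out each run of non-whitespace characters.
preg_match_all('/\S+/', "alpha  beta\tgamma", $tokens);          // $tokens[0]: ['alpha', 'beta', 'gamma']

// \W+ replaces each run of non-word characters with a hyphen.
$slug = preg_replace('/\W+/', '-', 'Web Scraping, with PHP');    // 'Web-Scraping-with-PHP'
```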

The third and final way involves using character ranges, which are characters within square brackets ([ and ]). A character range matches a single character, but like an ordinary single character it can have repetition applied to it.
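To illustrate, here is a brief sketch of a character range used once on its own and once with repetition. The patterns and subject strings are hypothetical examples rather than ones taken from the text.

```php
<?php
// [ae] matches exactly one character, either 'a' or 'e'.
$matchesGrey = preg_match('/gr[ae]y/', 'grey');     // 1
$matchesGray = preg_match('/gr[ae]y/', 'gray');     // 1
$matchesGroy = preg_match('/gr[ae]y/', 'groy');     // 0: 'o' is not in the range

// Repetition applies to the range as a whole: one or more of '0' or '1'.
preg_match('/[01]+/', 'flags=1011;', $bits);        // $bits[0] is '1011'
```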
