php|architect's Guide to Web Scraping with PHP - Wind Business ...
PCRE Extension
• To use a literal ^ character in a character range, either escape it in the same
manner in which other meta-characters are escaped or do not use it as the
first or only character in the range.
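The two options for a literal ^ can be sketched as follows. This is a minimal illustration, not taken from the original text; the sample strings and variable names are made up for the example.

```php
<?php
// Option 1: escape the caret with a backslash, so it is literal even in first position.
$escaped = preg_match('/[\^a]/', 'x^y');

// Option 2: place the caret anywhere but first, where it is treated as a literal.
$notFirst = preg_match('/[a^]/', 'x^y');

// For contrast: an unescaped caret in first position negates the range,
// so this pattern looks for any character other than "a" and finds none.
$negated = preg_match('/[^a]/', 'aaa');

var_dump($escaped, $notFirst, $negated); // int(1), int(1), int(0)
```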
ctype Extension
Some simple patterns have equivalent functions available in the ctype library. These
generally perform better and should be used over PCRE when appropriate. See
http://php.net/ctype for more information on the ctype extension and the functions
it offers.
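As a rough illustration of such an equivalence, the sketch below checks that a string consists only of digits, first with ctype_digit and then with a PCRE pattern. The helper names isDigitsCtype and isDigitsPcre are hypothetical and exist only for this example; the \z anchor is used instead of $ so that a trailing line feed is rejected, as ctype_digit rejects it.

```php
<?php
// Hypothetical helper: digit check using the ctype extension.
function isDigitsCtype(string $value): bool
{
    return ctype_digit($value);
}

// Hypothetical helper: the same check expressed as a PCRE pattern.
function isDigitsPcre(string $value): bool
{
    return preg_match('/^[0-9]+\z/', $value) === 1;
}

foreach (['12345', '12a45', ''] as $sample) {
    // Both checks are expected to agree for every sample.
    var_dump(isDigitsCtype($sample) === isDigitsPcre($sample));
}
```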
Modifiers
Delimiters are needed to denote the start and end of a pattern because the pattern
is followed by modifiers that affect the matching behavior of meta-characters.
Here are a few modifiers that may prove useful in web scraping applications.
• i: Any letters in the pattern will match both uppercase and lowercase regardless<br />
of the case of the letter used in the pattern.<br />
• m: ^ and $ will match the beginnings and ends of lines within the string (delimited
by line feed characters) rather than the beginning and end of the entire
string.
• s (lowercase): The . meta-character will match line feeds, which it does not by<br />
default.<br />
• S (uppercase): Additional time will be spent to analyze the pattern in order
to speed up subsequent matches with that pattern. Useful for patterns used
multiple times.
• U: By default, the quantifiers * and + behave in a manner referred to as "greedy."
That is, they match as many characters as possible rather than as few as possible.
This modifier forces the latter behavior.
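The effect of the i, m, s, and U modifiers listed above can be seen in a short sketch. The sample strings and variable names are assumptions made for illustration. The S modifier is left out because it influences only performance, not what a pattern matches.

```php
<?php
$text = "First line\nsecond LINE";

// i: letters match regardless of case.
$caseInsensitive = preg_match('/line/i', 'second LINE');

// m: ^ also matches at the start of each line, not only at the start of the string.
$withoutM = preg_match('/^second/', $text);
$withM = preg_match('/^second/m', $text);

// s: the . meta-character also matches line feeds.
$withoutS = preg_match('/First.+LINE/', $text);
$withS = preg_match('/First.+LINE/s', $text);

// U: quantifiers match as few characters as possible instead of as many as possible.
preg_match('/<.+>/', '<b>bold</b>', $greedy);
preg_match('/<.+>/U', '<b>bold</b>', $ungreedy);

var_dump($caseInsensitive, $withoutM, $withM, $withoutS, $withS);
var_dump($greedy[0], $ungreedy[0]); // "<b>bold</b>" and "<b>"
```

Modifiers are placed after the closing delimiter and can be combined, as in /pattern/is.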