php|architect's Guide to Web Scraping with PHP - Wind Business ...
Tidy Extension
There are two types of issues to check for when using tidy for web scraping analysis:
warnings and errors. Like their PHP counterparts, warnings are non-fatal and
generally have some sort of automated response that tidy executes to handle them.
Errors are not necessarily fatal, but do indicate that tidy may have no way to handle
a particular issue.
All issues are stored in an error buffer regardless of their type. Accessing information
in and about this buffer is one area in which the procedural and object-oriented
APIs for the tidy extension differ.
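The difference can be sketched as follows. This is a minimal illustration, assuming the tidy extension is loaded; the sample markup and the variable names are hypothetical and chosen only to trigger a few issues.

```php
<?php
// Hypothetical markup that is deliberately incomplete.
$html = '<p>Unclosed paragraph';

$procedural = null;
$objectOriented = null;

if (extension_loaded('tidy')) {
    // Procedural API: parse, then read the buffer through a function.
    $doc = tidy_parse_string($html, [], 'utf8');
    $procedural = tidy_get_error_buffer($doc);

    // Object-oriented API: parse, then read the buffer from a property.
    $tidy = new tidy();
    $tidy->parseString($html, [], 'utf8');
    $objectOriented = $tidy->errorBuffer;

    echo $procedural, PHP_EOL;
} else {
    echo 'The tidy extension is not available.', PHP_EOL;
}
```

Both approaches read the same buffer, so for identical input and configuration the two strings should match.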
Note that errorBuffer is a property of the $tidy object, not a method. Also note
the slight difference in naming conventions between the procedural function and
the object property, versus the consistency held throughout most other areas of the
APIs.
As a single string, the error buffer is of limited use on its own. Below
is a code sample derived from a user-contributed comment on the PHP manual page
for the tidy_get_error_buffer function. This parses individual components of each
issue into arrays where they are more easily accessible.
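As a rough illustration of this kind of parsing (a sketch, not the code from the manual comment), the function below splits each issue into its line, column, type, and message. It assumes issues appear one per line in the form `line N column N - Type: message`. The function name `parseTidyIssues` and the sample buffer are hypothetical.

```php
<?php
/**
 * Split a tidy error buffer into one array per issue.
 * Lines that do not follow the assumed format are skipped.
 */
function parseTidyIssues(string $buffer): array
{
    $pattern = '/^line (?P<line>\d+) column (?P<column>\d+) - '
             . '(?P<type>\w+): (?P<message>.*)$/m';

    preg_match_all($pattern, $buffer, $matches, PREG_SET_ORDER);

    $issues = [];
    foreach ($matches as $match) {
        $issues[] = [
            'line'    => (int) $match['line'],
            'column'  => (int) $match['column'],
            'type'    => $match['type'],
            // trim() removes a trailing carriage return, if any.
            'message' => trim($match['message']),
        ];
    }

    return $issues;
}

// Illustrative buffer contents; real output depends on the document.
$sample = "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
        . "line 4 column 7 - Error: <foo> is not recognized!";

foreach (parseTidyIssues($sample) as $issue) {
    printf(
        "%s at %d:%d: %s\n",
        $issue['type'],
        $issue['line'],
        $issue['column'],
        $issue['message']
    );
}
```

In practice the argument would be the value of `$tidy->errorBuffer` or the return value of `tidy_get_error_buffer`. Filtering the result on the `type` key then separates warnings from errors.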