03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...

Tidy Extension

There are two types of issues to check for when using tidy for web scraping analysis: warnings and errors. Like their PHP counterparts, warnings are non-fatal and generally have some sort of automated response that tidy executes to handle them. Errors are not necessarily fatal, but do indicate that tidy may have no way to handle a particular issue.

All issues are stored in an error buffer regardless of their type. Accessing information in and about this buffer is one area in which the procedural and object-oriented APIs for the tidy extension differ.
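The difference can be sketched roughly as follows. This is a minimal illustration, assuming the tidy extension is installed; the `$html` markup and the variable names are arbitrary placeholders rather than anything prescribed by the extension.

```php
<?php
// Sketch only: the markup below is an arbitrary example input.
$html = '<p>paragraph</p>';

// Guarded so the script still runs where the tidy extension is unavailable.
if (extension_loaded('tidy')) {
    $tidy = tidy_parse_string($html);

    // Procedural API: a function call that takes the tidy object.
    $proceduralIssues = tidy_get_error_buffer($tidy);

    // Object-oriented API: a property read on the same object.
    $objectIssues = $tidy->errorBuffer;

    echo $objectIssues, PHP_EOL;
}
```

Both forms should yield the same buffer contents as a single string, one issue per line.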

Note that errorBuffer is a property of the $tidy object, not a method. Also note the slight difference in naming conventions between the procedural function and the object property, versus the consistency held throughout most other areas of the APIs.

On its own, the error buffer is a single string, which makes it of limited use. Below is a code sample derived from a user-contributed comment on the PHP manual page for the tidy_get_error_buffer function. It parses the individual components of each issue into arrays, where they are more easily accessible.
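As a minimal sketch of this kind of parsing (not necessarily the exact listing from the manual comment), the following splits a buffer into one array per issue. The helper name `parse_tidy_issues`, the regular expression, and the sample buffer are assumptions for illustration; the sample mimics the "line N column N - Type: message" layout that tidy typically emits, and issues without a line and column prefix are simply skipped here.

```php
<?php
// Hypothetical helper: turns a tidy error buffer string into an array of issues.
function parse_tidy_issues(string $buffer): array
{
    // One issue per line: "line N column N - Type: message".
    $pattern = '/^line (?P<line>\d+) column (?P<column>\d+) - '
             . '(?P<type>\w+): (?P<message>.*)$/m';

    preg_match_all($pattern, $buffer, $matches, PREG_SET_ORDER);

    $issues = [];
    foreach ($matches as $match) {
        $issues[] = [
            'line'    => (int) $match['line'],
            'column'  => (int) $match['column'],
            'type'    => $match['type'],
            'message' => trim($match['message']),
        ];
    }

    return $issues;
}

// Assumed sample buffer; in practice this would come from $tidy->errorBuffer.
$sample = "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
        . "line 1 column 1 - Warning: inserting missing 'title' element";

$issues = parse_tidy_issues($sample);
```

Each entry in `$issues` then exposes `line`, `column`, `type`, and `message` keys, so warnings and errors can be filtered by comparing the `type` value.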
