php|architect's Guide to Web Scraping with PHP - Wind Business ...
Chapter 12
XMLReader Extension

The previous two chapters have covered two available XML extensions that implement
tree parsers. This chapter will focus on the XMLReader extension, which implements
a pull parser.

As mentioned in the chapter on the DOM extension, pull parsers differ from tree
parsers in that they read documents in a piecewise fashion rather than loading them
into memory all at once. A consequence of this is that pull parsers generally only
traverse documents once in one direction and leave you to collect whatever data is
relevant to you along the way.
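
As a minimal sketch of this single forward pass, the following reads a small document node by node and collects values along the way. The sample markup and the $items array are assumptions made for illustration; they are not taken from a real document.

```php
<?php
// Hypothetical sample document, used only for illustration.
$xml = '<catalog><item id="a1">Apple</item><item id="a2">Pear</item></catalog>';

$reader = new XMLReader();
$reader->XML($xml);

$items = [];

// Each call to read() advances the cursor to the next node in document order.
// There is no way to step backward, so data is collected as it goes by.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Keep only what is relevant: the id attribute and the text content.
        $items[$reader->getAttribute('id')] = $reader->readString();
    }
}

$reader->close();

print_r($items);
```

Only the current node is available at any point in the loop, which is why the values of interest are copied into an array rather than looked up again later.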

Before getting started, note that XMLReader's underlying library, libxml, uses
UTF-8 encoding internally. As such, encoding issues will be mitigated if any
document you import (particularly one that's been cleaned using the tidy
extension) is itself encoded as UTF-8, which avoids conflicts between encodings.
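
One way to apply this advice is to convert a document to UTF-8 before handing it to XMLReader. The sketch below assumes the source encoding (ISO-8859-1 here) is already known, for example from an HTTP response header. The sample string and the use of iconv() are illustrative choices, not the only approach.

```php
<?php
// Hypothetical input: an ISO-8859-1 (Latin-1) fragment with no XML declaration,
// so nothing in the markup tells the parser which encoding is in use.
$latin1 = "<name>caf\xE9</name>";

// Assumption: the source encoding is known ahead of time.
// iconv() converts the bytes to UTF-8, which is what libxml uses internally.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$reader = new XMLReader();
$reader->XML($utf8);
$reader->read();               // advance to the <name> element
$name = $reader->readString(); // text content, returned as UTF-8 bytes
$reader->close();

echo $name, PHP_EOL;
```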

XML Parser
The XML Parser extension, as it is referred to in the PHP manual, is a predecessor
of XMLReader and an alternative for PHP 4 environments. Its API is oriented to a
more event-driven style of programming as opposed to the iterative orientation of
the XMLReader extension. For more information on the XML Parser extension, see
http://php.net/manual/en/book.xml.php.
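
To illustrate the event-driven style, here is a minimal sketch using the XML Parser extension. Instead of looping over nodes, the code registers callbacks that the parser invokes as it encounters elements. The sample markup and the $hrefs array are assumptions made for illustration.

```php
<?php
// Hypothetical sample document, used only for illustration.
$xml = '<links><a href="/one">One</a><a href="/two">Two</a></links>';

$hrefs = [];

$parser = xml_parser_create('UTF-8');

// Case folding is enabled by default and reports element and attribute
// names in upper case; turn it off so names match the source markup.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Start-element event: called once for each opening tag.
    function ($parser, $name, array $attrs) use (&$hrefs) {
        if ($name === 'a' && isset($attrs['href'])) {
            $hrefs[] = $attrs['href'];
        }
    },
    // End-element event: nothing to do in this sketch.
    function ($parser, $name) {
    }
);

// The third argument marks this chunk as the final piece of input.
if (xml_parse($parser, $xml, true) !== 1) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($hrefs);
```

Here the parser drives the flow of control and calls back into your code, whereas with XMLReader your own loop decides when to advance to the next node.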