php|architect's Guide to Web Scraping with PHP - Wind Business ...
php|architect's Guide to Web Scraping with PHP - Wind Business ...
php|architect's Guide to Web Scraping with PHP - Wind Business ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
XMLReader Extension ” 125<br />
Element nodes can have attributes. When the itera<strong>to</strong>r points <strong>to</strong> an element<br />
node, the hasAttributes property indicates the presence of attributes and the<br />
getAttribute() method can be used <strong>to</strong> obtain an attribute value in the form of a<br />
string.<br />
The example below uses both of these <strong>to</strong>gether <strong>to</strong> parse data from an HTML table.<br />
localName == ’table’<br />
&& $doc->getAttribute(’id’) == ’thetable’) {<br />
$inTable = true;<br />
} elseif ($doc->localName == ’tr’ && $inTable) {<br />
$row = count($tableData);<br />
$tableData[$row] = array();<br />
} elseif ($doc->localName == ’td’ && $inTable) {<br />
$tableData[$row][] = $doc->readString();<br />
}<br />
break;<br />
case XMLREADER::END_ELEMENT:<br />
if ($doc->localName == ’table’ && $inTable) {<br />
$inTable = false;<br />
}<br />
break;<br />
}<br />
}<br />
?><br />
This showcases the main difference between pull parsers and tree parsers: the former<br />
have no concept of hierarchical context, only of the node <strong>to</strong> which the itera<strong>to</strong>r<br />
is currently pointing. As such, you must create your o w n indica<strong>to</strong>rs of context where<br />
they are needed.<br />
In this example, the node type is checked as nodes are read and any node that isn’t<br />
either an opening or closing element is ignored. If an opening element is encountered,<br />
its name ($doc->localName) is evaluated <strong>to</strong> confirm that it’s a table and its id<br />
attribute value ($doc->getAttribute(’id’)) is also examined <strong>to</strong> confirm that it has a<br />
value of ’thetable’ If so, . a flag variable $inTable is set <strong>to</strong> true. This is used <strong>to</strong> indi-