03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...

XMLReader Extension

Element nodes can have attributes. When the iterator points to an element node, the hasAttributes property indicates the presence of attributes and the getAttribute() method can be used to obtain an attribute value in the form of a string.
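To see hasAttributes and getAttribute() in isolation, here is a minimal sketch run against a small, hypothetical XHTML string (the $xhtml markup and the $attributeReport array are illustrative, not part of the original example). For each element it records the value of the id attribute, or null when the element has no attributes; getAttribute() itself also returns null for an attribute name the element does not carry.

```php
<?php
// Hypothetical, well-formed XHTML snippet used only for illustration.
$xhtml = '<div><table id="thetable" class="data"><tr><td>1</td></tr></table><p>no attrs</p></div>';

$doc = new XMLReader();
$doc->XML($xhtml);

$attributeReport = array();
while ($doc->read()) {
    // Only element (opening tag) nodes can carry attributes.
    if ($doc->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    if ($doc->hasAttributes) {
        // getAttribute() returns the value as a string, or null if absent.
        $attributeReport[$doc->localName] = $doc->getAttribute('id');
    } else {
        $attributeReport[$doc->localName] = null;
    }
}
$doc->close();

print_r($attributeReport);
```

Only the table element reports an id here; div, tr, td and p all map to null because they have no attributes at all.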

The example below uses both of these together to parse data from an HTML table.

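```php
<?php
// The opening lines of this listing were lost in extraction and are
// reconstructed here as an assumption: $xhtml is a hypothetical string
// holding well-formed XHTML that contains <table id="thetable">.
$doc = new XMLReader();
$doc->XML($xhtml);

$inTable = false;     // context flag: are we inside the target table?
$tableData = array(); // one array of cell strings per table row

while ($doc->read()) {
    switch ($doc->nodeType) {
        case XMLReader::ELEMENT:
            if ($doc->localName == 'table'
                && $doc->getAttribute('id') == 'thetable') {
                // Entering the target table: raise the flag.
                $inTable = true;
            } elseif ($doc->localName == 'tr' && $inTable) {
                // New row: append an empty array and remember its index.
                $row = count($tableData);
                $tableData[$row] = array();
            } elseif ($doc->localName == 'td' && $inTable) {
                // Cell: store its text content in the current row.
                $tableData[$row][] = $doc->readString();
            }
            break;
        case XMLReader::END_ELEMENT:
            if ($doc->localName == 'table' && $inTable) {
                // Leaving the target table: lower the flag.
                $inTable = false;
            }
            break;
    }
}
?>
```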

This showcases the main difference between pull parsers and tree parsers: the former have no concept of hierarchical context, only of the node to which the iterator is currently pointing. As such, you must create your own indicators of context where they are needed.
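The $inTable flag in the example above is one such indicator. A more general approach, sketched here as an assumption rather than as the book's technique, is to keep a stack of open element names: push on ELEMENT, pop on END_ELEMENT, and compare the joined stack against a target path. The function collectTextAtPath() and its slash-separated path format are hypothetical. Self-closing elements such as <br/> are reported as ELEMENT nodes with isEmptyElement set and never produce an END_ELEMENT node, so they are popped immediately.

```php
<?php
// Hypothetical helper: returns the text of every element whose full path
// from the root (e.g. "html/body/table/tr/td") equals $targetPath.
function collectTextAtPath(string $xml, string $targetPath): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $stack = array();   // names of the currently open elements
    $results = array();

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->localName;
            if (implode('/', $stack) === $targetPath) {
                // readString() returns the node's text without moving the cursor.
                $results[] = $reader->readString();
            }
            // Self-closing elements never yield an END_ELEMENT node.
            if ($reader->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }
    $reader->close();

    return $results;
}

$xhtml = '<html><body><table><tr><td>a</td><td>b<br/></td></tr></table>'
       . '<td>not in a table</td></body></html>';
print_r(collectTextAtPath($xhtml, 'html/body/table/tr/td'));
```

With this sample markup the call returns only the two cells inside the table and skips the stray td outside it, which a check on localName alone would not distinguish.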

In this example, the node type is checked as nodes are read, and any node that isn’t either an opening or closing element is ignored. If an opening element is encountered, its name ($doc->localName) is evaluated to confirm that it’s a table, and its id attribute value ($doc->getAttribute('id')) is also examined to confirm that it has a value of 'thetable'. If so, a flag variable $inTable is set to true. This is used to indi-