php|architect's Guide to Web Scraping with PHP - Wind Business ...
DOM Extension
// A slightly more specific way (better if there are multiple lists)
if ($list = $doc->getElementById('thelist')) {
    $listItems = $list->getElementsByTagName('li');
}

// Yet another way if the list doesn't have an id
$lists = $doc->getElementsByTagName('ul');
if ($lists->length) {
    $list = $lists->item(0);
    $listItems = $list->getElementsByTagName('li');
}

// Outputs "thelist" (without quotes)
echo $list->getAttribute('id');

// Outputs "Foo" on one line, then "Bar" on another
foreach ($listItems as $listItem) {
    echo $listItem->nodeValue, PHP_EOL;
}

// Outputs the text content between <ul> and </ul>
echo $list->nodeValue;
?>
XPath and DOMXPath
Much as regular expressions find instances of character patterns within
strings, XPath finds instances of node patterns within XML-compatible
documents. Both technologies accomplish this by providing a syntax composed
of meta-characters that expresses these patterns concisely. Within the DOM
extension, support for version 1.0 of the XPath standard is implemented as
the DOMXPath class.
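To make the analogy concrete, here is a minimal sketch that puts the two side by side. The sample string, the sample markup, and the variable names are assumptions chosen for illustration; only preg_match_all, DOMDocument, and DOMXPath come from PHP itself.

```php
<?php
// Regular expression: find instances of a character pattern in a string.
$text = 'Foo 12, Bar 345';                 // hypothetical sample input
preg_match_all('/\d+/', $text, $matches);
$numbers = $matches[0];                    // every run of digits found

// XPath: find instances of a node pattern in a document.
$doc = new DOMDocument();
$doc->loadXML('<ul id="thelist"><li>Foo</li><li>Bar</li></ul>'); // sample markup

$xpath = new DOMXPath($doc);
$items = [];
// Select every li that is a direct child of the ul whose id is "thelist".
foreach ($xpath->query('//ul[@id="thelist"]/li') as $node) {
    $items[] = $node->nodeValue;
}
```

In both halves a short expression describes what to look for, and the engine returns every match: substrings in one case, nodes in the other.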
The DOMXPath constructor has a single required parameter: an existing DOMDocument
instance on which queries will be performed. DOMXPath has two other relevant methods:
evaluate and query. Both accept, as their first parameter, a string containing an
XPath expression with which to query the document.
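The constructor and the two methods can be sketched as follows. The markup and variable names are assumptions for illustration. The comments on return types go slightly beyond what the text above states and reflect how the PHP manual describes the two methods.

```php
<?php
// Hypothetical sample document containing the list used earlier.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><ul id="thelist"><li>Foo</li><li>Bar</li></ul></body></html>'
);

// The constructor takes the DOMDocument that will be queried.
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList of the nodes matching the expression.
$listItems = $xpath->query('//ul[@id="thelist"]/li');
$count = $listItems->length;
$first = $listItems->item(0)->nodeValue;

// evaluate() takes the same kind of expression. For expressions that yield
// a number or a string rather than a node set, it returns a typed result.
$total = $xpath->evaluate('count(//li)');
$firstText = $xpath->evaluate('string(//li[1])');
```

A reasonable rule of thumb is to use query when the expression selects nodes, and evaluate when it computes a value such as a count or a string.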
Optionally, a DOMNode instance associated with the document may be passed in as
the second parameter ($contextNode) for either method. When specified, that node