03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...


DOM Extension

// A slightly more specific way (better if there are multiple lists)
if ($list = $doc->getElementById('thelist')) {
    $listItems = $list->getElementsByTagName('li');
}

// Yet another way if the list doesn't have an id
$lists = $doc->getElementsByTagName('ul');
if ($lists->length) {
    $list = $lists->item(0);
    $listItems = $list->getElementsByTagName('li');
}

// Outputs "thelist" (without quotes)
echo $list->getAttribute('id');

// Outputs "Foo" on one line, then "Bar" on another
foreach ($listItems as $listItem) {
    echo $listItem->nodeValue, PHP_EOL;
}

// Outputs the text content between the list's opening and closing tags
echo $list->nodeValue;
?>

XPath and DOMXPath

Somewhat like the way regular expressions find instances of character patterns within strings, XPath finds instances of node patterns within XML-compatible documents. Both technologies accomplish this by providing a syntax composed of metacharacters that expresses these patterns concisely. Within the DOM extension, support for version 1.0 of the XPath standard is implemented as the DOMXPath class.
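The analogy can be made concrete with a minimal sketch. The sample string, the sample markup, and the variable names below are assumptions chosen only for illustration; they do not come from the text.

```php
<?php
// Hypothetical input, used only for this illustration.
$string = 'Order 66 shipped 3 items';
$markup = '<root><item>Foo</item><note/><item>Bar</item></root>';

// Regular expression: find every run of digits within a string.
preg_match_all('/\d+/', $string, $matches);
$numbers = $matches[0];

// XPath: find every <item> node within a document.
$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

$values = array();
foreach ($xpath->query('//item') as $node) {
    $values[] = $node->nodeValue;
}

echo implode(', ', $numbers), PHP_EOL; // 66, 3
echo implode(', ', $values), PHP_EOL;  // Foo, Bar
```

In both cases a short pattern ('/\d+/' or '//item') describes what to look for, and the engine returns every match it finds.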

The DOMXPath constructor has a single required parameter: an existing DOMDocument instance on which queries will be performed. DOMXPath has two other relevant methods: evaluate and query. Both accept, as their first parameter, a string containing an XPath expression with which to query the document.
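As a rough sketch of how the constructor and both methods fit together, the following assumes a small hypothetical document containing a list with the id "thelist"; the markup and the XPath expressions are illustrative assumptions, not taken from the text.

```php
<?php
// Hypothetical sample markup, used only for this illustration.
$html = '<html><body><ul id="thelist"><li>Foo</li><li>Bar</li></ul></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// The constructor's single required parameter is the document to query.
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList of the nodes matching the expression.
$listItems = $xpath->query('//ul[@id="thelist"]/li');
foreach ($listItems as $listItem) {
    echo $listItem->nodeValue, PHP_EOL;
}

// evaluate() accepts the same kind of expression, but can return a typed
// result (here a float) when the expression does not yield a node set.
$count = $xpath->evaluate('count(//ul[@id="thelist"]/li)');
echo $count, PHP_EOL;
```

A reasonable rule of thumb is to use query when a list of nodes is wanted and evaluate when the expression computes a value such as a count or a string.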

Optionally, a DOMNode instance associated with the document may be passed in as the second parameter ($contextNode) for either method. When specified, that node
