03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...



DOM Extension

// Also returns a DOMNodeList with only the body node
$list = $xpath->query('//body');
?>

• In the first two examples, note that the root element (html) is referenced in the expression even though it is assumed to be the context node (since no other node is specified as the second parameter in either query call).

• A single forward slash / indicates a parent-child relationship. /html/body addresses all body nodes that are children of the document's root html element (which in this case amounts to a single result).

• A double forward slash // indicates an ancestor-descendant relationship. //body addresses all body nodes that are descendants of the context node (which again amounts to a single result).
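The points above can be checked with a short script. What follows is a minimal sketch, assuming a hypothetical one-paragraph HTML document; the markup and the variable names are illustrative and not taken from the book. It runs both expressions and also passes an explicit context node as the second parameter to query.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><head><title>Demo</title></head><body><p>Hello</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Parent-child: body elements that are children of the root html element.
$children = $xpath->query('/html/body');

// Ancestor-descendant: body elements at any depth in the document.
$descendants = $xpath->query('//body');

// Explicit context node: the relative expression "body" is evaluated
// against the html element passed as the second parameter.
$relative = $xpath->query('body', $doc->documentElement);

echo $children->length, ' ', $descendants->length, ' ', $relative->length, PHP_EOL;
```

With this particular document, all three queries should select the same single body node.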

The single and double forward slash operators can be used multiple times and in combination with each other as shown below.
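As a hedged illustration of mixing the two operators, the sketch below assumes a hypothetical document with two lists; the markup, the href values, and the helper function collectHrefs are invented for this example. The first expression requires each step to be a direct child, while the second allows any depth between body, li, and a.

```php
<?php
// Hypothetical markup, used only for illustration.
$html = '<html><body>'
      . '<div><ul><li><a href="/a">A</a></li><li>plain</li></ul></div>'
      . '<ul><li><span><a href="/b">B</a></span></li></ul>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: collect the href attribute of every node in a list.
function collectHrefs(DOMNodeList $list): array
{
    $hrefs = [];
    foreach ($list as $node) {
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}

// ul at any depth, then li as a direct child, then a as a direct child of li.
$direct = collectHrefs($xpath->query('//ul/li/a'));

// Starting at /html/body, li at any depth, then a at any depth below that li.
$nested = collectHrefs($xpath->query('/html/body//li//a'));

print_r($direct);
print_r($nested);
```

The link wrapped in a span is skipped by the first expression because it is not a direct child of its li; the second expression picks up both links.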

Namespaces

If you attempt to address nodes by their element name and receive no results when it appears you should, it's possible that the document is namespacing nodes. The easiest way to get around this is to replace the element name with a condition. For example, if you are using the expression ul, an equivalent expression that disregards the namespace would be *[name()="ul"], where * is a wildcard that matches any element and the name function returns the node name so that it can be compared against a given value.
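The workaround can be sketched as follows, assuming a hypothetical XML document that declares the XHTML namespace as its default namespace; the markup and variable names are illustrative only. The plain element name finds nothing, while the name() condition finds the list.

```php
<?php
// Hypothetical namespaced markup, used only for illustration.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml">'
     . '<body><ul><li>One</li><li>Two</li></ul></body>'
     . '</html>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// An unprefixed name only matches elements that are in no namespace,
// so this query comes back empty for the document above.
$plain = $xpath->query('//ul');

// Comparing the node name in a condition sidesteps the namespace.
$byName = $xpath->query('//*[name()="ul"]');

echo $plain->length, ' ', $byName->length, PHP_EOL;
```

Note that name() returns the qualified name, so if the document used a prefix (for instance a hypothetical x:ul), the comparison value would need to include that prefix, or the local-name function could be used instead.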
