php|architect's Guide to Web Scraping with PHP - Wind Business ...
// Also returns a DOMNodeList with only the body node
$list = $xpath->query('//body');
?>
• In the first two examples, note that the root element (html) is referenced in the
expression even though it is assumed to be the context node (no other node is
passed as the second parameter to either query() call).
• A single forward slash / indicates a parent-child relationship. /html/body
addresses all body nodes that are children of the document's root html element
(which in this case amounts to only a single result).
• A double forward slash // indicates an ancestor-descendant relationship.
//body addresses all body nodes that are descendants of the document root
(which again amounts to only a single result). Because the expression begins
with a slash, this holds regardless of which context node is passed to query().
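The behavior described in the list above can be checked with a minimal sketch. It assumes a small hypothetical HTML string and illustrative variable names; it queries the same page with a single slash, a double slash, and then with a context node, contrasting a leading // (which starts from the document root) with .// (which searches only beneath the context node).

```php
<?php
// Hypothetical sample page used only for illustration.
$html = '<html><body><div id="main"><p>One</p><div><p>Two</p></div></div><p>Three</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Single slash: body elements that are direct children of the root html element.
$children = $xpath->query('/html/body');

// Double slash: body elements anywhere in the document.
$descendants = $xpath->query('//body');

echo $children->length, "\n";    // 1
echo $descendants->length, "\n"; // 1

// Pick a context node to show how it interacts with //.
$main = $xpath->query('//div[@id="main"]')->item(0);

// A leading // still starts from the document root, so the context node is ignored...
$allParagraphs = $xpath->query('//p', $main);

// ...while .// searches only the descendants of the context node.
$mainParagraphs = $xpath->query('.//p', $main);

echo $allParagraphs->length, "\n";  // 3
echo $mainParagraphs->length, "\n"; // 2
```

When you do pass a context node, starting the expression with a period is what keeps the search scoped to that node.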
The single and double forward slash operators can be used multiple times and in
combination with each other as shown below.
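A minimal sketch of such combinations follows. It assumes a small hypothetical page with one list directly under body and one list nested inside a div, so the two expressions return different results.

```php
<?php
// Hypothetical sample page: one top-level list and one nested list.
$html = <<<HTML
<html><body>
  <ul id="nav"><li>Home</li><li>About</li></ul>
  <div><ul><li>Nested</li></ul></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Only single slashes: li children of ul elements that are direct children of body.
$topLevelItems = $xpath->query('/html/body/ul/li');

// Mixed: li children of any ul element anywhere beneath body.
$allItems = $xpath->query('/html/body//ul/li');

echo $topLevelItems->length, "\n"; // 2

// Prints Home, About and Nested on separate lines, in document order.
foreach ($allItems as $li) {
    echo $li->textContent, "\n";
}
```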
Namespaces
If you try to address nodes by their element name and get no results when it
appears you should, the document may be placing those nodes in a namespace. The
easiest way around this is to replace the element name with a condition.
For example, if you are using the expression ul, an equivalent expression that
disregards the namespace would be *[name()="ul"], where * is a wildcard matching any
element and the name() function returns the node's name for comparison against the given value.
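The workaround can be illustrated with a minimal sketch, assuming a hypothetical XHTML snippet whose default namespace places every element in it. The plain element name finds nothing, the name() condition finds the list, and, as an alternative the text does not cover, DOMXPath::registerNamespace() lets you bind a prefix (the "x" here is arbitrary) and address the nodes directly.

```php
<?php
// Hypothetical XHTML snippet: the default namespace applies to every element.
$xml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml">
  <body><ul><li>One</li><li>Two</li></ul></body>
</html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// In XPath 1.0 an unprefixed name means "no namespace", so this matches nothing.
$byName = $xpath->query('//ul');

// The name() condition compares the node name and ignores the namespace URI.
$byCondition = $xpath->query('//*[name()="ul"]');

// Alternative: register an arbitrary prefix for the namespace and use it.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$byPrefix = $xpath->query('//x:ul/x:li');

echo $byName->length, "\n";      // 0
echo $byCondition->length, "\n"; // 1
echo $byPrefix->length, "\n";    // 2
```

If the document writes its elements with a prefix (for example x:ul), name() returns the prefixed name, so a condition using local-name() is the safer choice in that case.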