php|architect's Guide to Web Scraping with PHP - Wind Business ...
// Returns the first li child node of each ul node
$list = $xpath->query('//ul/li[1]');
// Returns all ul nodes containing an li node with the value "foobar"
$list = $xpath->query('//ul[li = "foobar"]');
?>
• Square brackets are used to delimit a conditional expression.
• Element and attribute nodes are denoted the same way within a condition as
they are outside of one. That is, elements are simply referred to by element
name and attribute names are prefixed with @.
• The = operator is used for equality comparisons. The converse, the != operator,
checks for inequality. Other fairly standard comparison operators are also
supported, including <, <=, >, and >=.
• A condition consisting only of a single number is actually short for position()
= # where # is the number used. position is a function that returns the position
of each individual node within the current context.
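The points above can be tried out with a minimal sketch like the following. The sample markup, the id and class values, and the nodeValues() helper are hypothetical and invented for illustration only; DOMDocument and DOMXPath are the DOM extension classes used throughout this chapter.

```php
<?php
// Hypothetical sample markup, invented for illustration only.
$html = <<<HTML
<ul id="first">
  <li class="a">foo</li>
  <li class="b">foobar</li>
  <li class="c">baz</li>
</ul>
<ul id="second">
  <li class="a">qux</li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: collects the text content of every node a query matches.
function nodeValues(DOMXPath $xpath, string $query): array
{
    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

// A bare number is shorthand for position() = that number,
// so these two queries match the same nodes.
$short = nodeValues($xpath, '//ul/li[1]');
$long = nodeValues($xpath, '//ul/li[position() = 1]');

// Attribute names are prefixed with @ inside a condition.
$classB = nodeValues($xpath, '//li[@class = "b"]');

// The != operator checks for inequality.
$notA = nodeValues($xpath, '//li[@class != "a"]');

// Relational operators can be combined with position().
$afterFirst = nodeValues($xpath, '//ul/li[position() > 1]');

// Element names are used as-is inside a condition:
// ul nodes that contain an li node with the value "foobar".
$ulIds = [];
foreach ($xpath->query('//ul[li = "foobar"]') as $ul) {
    $ulIds[] = $ul->getAttribute('id');
}
?>
```

With this sample markup, $short and $long both hold foo and qux, and $ulIds holds only first, since the second list has no li node with the value "foobar".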
Resources
Only a fraction of what XPath offers has been covered here, mainly basic concepts
and areas that are most likely to be applicable when using XPath to extract data from
retrieved markup documents. Other functions and operators and more advanced
concepts are detailed further in the resources cited at the end of the chapter. Review
of those resources is highly recommended for more extensive and complex data extraction
applications.
• DOM documentation in the PHP manual: http://php.net/dom
• An excellent overview of XML and XPath:
http://schlitt.info/opensource/blog/0704_xpath.html
• More information on XML: http://en.wikibooks.org/wiki/XML:_Managing_Data_Exchange