03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...



// Returns the first li child node of each ul node
$list = $xpath->query('//ul/li[1]');

// Returns all ul nodes containing an li node with the value "foobar"
$list = $xpath->query('//ul[li = "foobar"]');
?>
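The two queries above assume that `$xpath` is a `DOMXPath` instance already built from a loaded document. The sketch below makes that setup explicit so the results of both queries can be inspected. It is only an illustration: the HTML fragment, its `id` attributes, and the variable names `$firstItems` and `$matchingLists` are made up and do not come from the text.

```php
<?php
// Hypothetical sample markup: two lists, only the second contains "foobar".
$html = '<html><body>'
    . '<ul id="first"><li>alpha</li><li>beta</li></ul>'
    . '<ul id="second"><li>gamma</li><li>foobar</li></ul>'
    . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// First li child of each ul node.
$firstItems = [];
foreach ($xpath->query('//ul/li[1]') as $li) {
    $firstItems[] = $li->textContent;
}

// ul nodes containing an li node whose value is "foobar".
$matchingLists = [];
foreach ($xpath->query('//ul[li = "foobar"]') as $ul) {
    $matchingLists[] = $ul->getAttribute('id');
}

echo implode(', ', $firstItems), "\n";    // alpha, gamma
echo implode(', ', $matchingLists), "\n"; // second
```

The condition `li = "foobar"` compares a node-set with a string. It is true when at least one li child has that string value, which is why only the second list matches.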

• Square brackets delimit a conditional expression, which XPath calls a predicate.

• Element and attribute nodes are denoted the same way within a condition as they are outside of one: elements are referred to by element name, and attribute names are prefixed with @.

• The = operator is used for equality comparisons. Its converse, the != operator, checks for inequality. The other standard comparison operators are also supported: <, <=, >, and >=.

• A condition consisting only of a single number is shorthand for position() = #, where # is that number. position() is a function that returns the position of each individual node within the current context, counting from 1.
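The conditions described in the list above can be tried against a small document. This is a minimal sketch under stated assumptions: the markup, its `class` and `data-price` attributes, and the helper closure `$texts` are hypothetical and exist only for illustration. It shows an attribute test using @, the != and > operators, and the equivalence of a bare number with position() = #.

```php
<?php
// Hypothetical markup used only to exercise XPath conditions.
$html = '<html><body><ul>'
    . '<li class="item" data-price="5">pen</li>'
    . '<li class="sold" data-price="12">book</li>'
    . '<li class="item" data-price="30">lamp</li>'
    . '</ul></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: collects the text content of every node an expression matches.
$texts = function (string $expr) use ($xpath): array {
    $result = [];
    foreach ($xpath->query($expr) as $node) {
        $result[] = $node->textContent;
    }
    return $result;
};

// Attribute names are prefixed with @ inside a condition, as they are outside one.
$items = $texts('//li[@class = "item"]');       // pen, lamp
// != checks for inequality.
$notItems = $texts('//li[@class != "item"]');   // book
// > converts the attribute value to a number before comparing.
$pricey = $texts('//li[@data-price > 10]');     // book, lamp
// A bare number is shorthand for position() = #.
$second = $texts('//li[2]');                    // book
$secondLong = $texts('//li[position() = 2]');   // book

echo implode(', ', $pricey), "\n";              // book, lamp
var_dump($second === $secondLong);              // bool(true)
```

The predicate in //li[2] applies to each li's position among its siblings. On a page with several lists, it therefore returns the second item of every list rather than the second li in the whole document. Parenthesizing the path, as in (//li)[2], selects the latter.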

Resources

Only a fraction of what XPath offers has been covered here, mainly basic concepts and the areas most likely to apply when using XPath to extract data from retrieved markup documents. Other functions and operators, as well as more advanced concepts, are covered in the resources listed below. Reviewing them is highly recommended for more extensive and complex data extraction applications.

• DOM documentation in the PHP manual: http://php.net/dom

• An excellent overview of XML and XPath: http://schlitt.info/opensource/blog/0704_xpath.html

• More information on XML: http://en.wikibooks.org/wiki/XML:_Managing_Data_Exchange
