03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...

CSS Selector Libraries

Note when reading this table that CSS selectors begin set indices at 0, whereas XPath begins them at 1.
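The index offset can be sketched with PHP's DOM extension. This is a minimal illustration, not the book's own listing: the list markup is invented, and the jQuery-style `li:eq(0)` named in the comments is an assumed example of a 0-based CSS index filter.

```php
<?php
// Hypothetical list markup, invented for this sketch.
$html = '<ul><li>first</li><li>second</li><li>third</li></ul>';

libxml_use_internal_errors(true); // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath positions are 1-based: (//li)[1] is the first li in the document.
$firstByXPath = $xpath->query('(//li)[1]')->item(0)->nodeValue;

// A 0-based CSS-style index n (assumed: li:eq(n)) maps to XPath position n + 1.
$cssIndex = 2;
$expression = '(//li)[' . ($cssIndex + 1) . ']';
$thirdByCssIndex = $xpath->query($expression)->item(0)->nodeValue;

echo $firstByXPath, "\n";    // first
echo $thirdByCssIndex, "\n"; // third
```

Note that `DOMNodeList::item()` is itself 0-based, so `item(0)` above only fetches the single node each query returns; it is unrelated to the XPath position inside the expression.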

Content Filters

Where basic filters are based mainly on the type of node or its position in the result set, content filters are based on node value or surrounding hierarchical structure.

• a:contains("About Us") selects all a nodes where the node value contains the substring "About Us".
• img:empty selects all img nodes that contain no child nodes (including text nodes).
• li:has(a:contains("About Us")) selects all li nodes that contain an a node with the substring "About Us" in its node value.
• li:parent selects all li nodes that contain child nodes (including text nodes).

Selector                           CSS                             XPath
nodes containing text              a:contains("About Us")          //a[contains(text(), "About Us")]
nodes without children             img:empty                       //img[not(node())]
nodes containing a selector match  li:has(a:contains("About Us"))  //li[.//a[contains(text(), "About Us")]]
nodes with children                li:parent                       //li[node()]
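The XPath side of these content filters can be tried directly with PHP's DOMXPath class. The following is a minimal sketch, assuming an invented sample document; the CSS selectors appear only as array keys for labeling, since the DOM extension does not evaluate them. The li:has row is written with a predicate, `//li[.//a[...]]`, so that the query returns the li nodes themselves rather than the a nodes inside them.

```php
<?php
// Sample markup invented for this sketch; it is not from the original text.
$html = <<<HTML
<ul>
<li><a href="/about">About Us</a></li>
<li><a href="/contact">Contact</a></li>
<li></li>
</ul>
<p><img src="logo.png"></p>
HTML;

libxml_use_internal_errors(true); // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// CSS selector (label only) => equivalent XPath expression.
$filters = [
    'a:contains("About Us")'         => '//a[contains(text(), "About Us")]',
    'img:empty'                      => '//img[not(node())]',
    'li:has(a:contains("About Us"))' => '//li[.//a[contains(text(), "About Us")]]',
    'li:parent'                      => '//li[node()]',
];

// Count the nodes each expression matches in the sample document.
$counts = [];
foreach ($filters as $css => $expression) {
    $counts[$css] = $xpath->query($expression)->length;
    printf("%-32s %d match(es)\n", $css, $counts[$css]);
}
```

With this sample, the first three expressions each match one node, and `//li[node()]` matches the two li nodes that contain a link; the empty third li is skipped.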

Attribute Filters

Filters up to this point have been specific to element nodes, but they also exist for attribute nodes. Attribute filters are surrounded by square brackets in both CSS and XPath, but differ in that CSS uses mostly operators for conditions while XPath uses mostly functions. Unlike other filters described in this chapter, support for attribute filters is fairly universal between different libraries.
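The operator-versus-function contrast can be sketched as follows. The markup is invented, and the particular filters shown (`[href]`, `[href="..."]`, `[href^="..."]`, `[href*="..."]`) are standard CSS attribute selectors chosen here for illustration rather than taken from the text; each comment names the CSS form next to the XPath expression that is actually run.

```php
<?php
// Hypothetical markup, invented for this sketch.
$html = '<p><a href="http://example.com/">External</a> '
      . '<a href="/about">About</a> <a name="top">Anchor</a></p>';

libxml_use_internal_errors(true); // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// CSS a[href]: the attribute exists (same bracket idea in both languages).
$withHref = $xpath->query('//a[@href]')->length;

// CSS a[href="/about"]: exact value, an operator in both languages.
$exact = $xpath->query('//a[@href="/about"]')->length;

// CSS a[href^="http"]: the prefix operator becomes the starts-with() function.
$prefixed = $xpath->query('//a[starts-with(@href, "http")]')->length;

// CSS a[href*="about"]: the substring operator becomes the contains() function.
$containing = $xpath->query('//a[contains(@href, "about")]')->length;

printf("%d %d %d %d\n", $withHref, $exact, $prefixed, $containing); // 2 1 1 1
```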
