php|architect's Guide to Web Scraping with PHP - Wind Business ...
CSS Selector Libraries
Note when reading this table that CSS selectors begin set indices at 0, whereas XPath
begins them at 1.
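The indexing difference can be sketched with PHP's bundled DOM extension. This is a minimal illustration rather than a definitive recipe: the markup and variable names are hypothetical, and the 0-based side is shown through DOMNodeList::item(), which counts from 0 in the same way as the CSS set indices described above.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<ul><li>first</li><li>second</li><li>third</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath positions are 1-based: [1] is the first li, [3] is the third.
$firstByXPath = $xpath->query('(//li)[1]')->item(0)->nodeValue;
$thirdByXPath = $xpath->query('(//li)[3]')->item(0)->nodeValue;

// The result set handed back to PHP is 0-based, like CSS set indices.
$items = $xpath->query('//li');
$firstByIndex = $items->item(0)->nodeValue;
$thirdByIndex = $items->item(2)->nodeValue;

echo $firstByXPath, ' ', $firstByIndex, "\n"; // first first
echo $thirdByXPath, ' ', $thirdByIndex, "\n"; // third third
```

An off-by-one error is easy to introduce when translating a CSS expression to XPath by hand, so it may be worth checking the first and last positions explicitly.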
Content Filters
Where basic filters are based mainly on the type of node or its position in the result
set, content filters are based on node value or surrounding hierarchical structure.
• a:contains("About Us") selects all a nodes whose node value contains the
substring "About Us".
• img:empty selects all img nodes that contain no child nodes (including text
nodes).
• li:has(a:contains("About Us")) selects all li nodes that contain an a node
with the substring "About Us" in its node value.
• li:parent selects all li nodes that contain child nodes (including text nodes).
Selector                            CSS                              XPath
nodes containing text               a:contains("About Us")           //a[contains(text(), "About Us")]
nodes without children              img:empty                        //img[not(node())]
nodes containing a selector match   li:has(a:contains("About Us"))   //li[.//a[contains(text(), "About Us")]]
nodes with children                 li:parent                        //li[node()]
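The XPath side of these content filters can be tried out with PHP's DOMXPath class. The sketch below assumes a small hypothetical HTML fragment; the variable names are illustrative only. The li:has() counterpart is written with a nested predicate so that the li node itself, rather than the a node inside it, ends up in the result set.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<ul>'
      . '<li><a href="/about">About Us</a></li>'
      . '<li><a href="/contact">Contact</a></li>'
      . '<li></li>'
      . '</ul>'
      . '<p><img src="logo.png"></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// a:contains("About Us")
$links = $xpath->query('//a[contains(text(), "About Us")]');

// img:empty
$emptyImages = $xpath->query('//img[not(node())]');

// li:has(a:contains("About Us")): the predicate keeps li as the result node
$itemsWithLink = $xpath->query('//li[.//a[contains(text(), "About Us")]]');

// li:parent
$nonEmptyItems = $xpath->query('//li[node()]');

printf(
    "%d %d %d %d\n",
    $links->length,
    $emptyImages->length,
    $itemsWithLink->length,
    $nonEmptyItems->length
); // 1 1 1 2
```

Only the two li nodes that hold a link count as having children here; the third li is empty and is therefore excluded by //li[node()].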
Attribute Filters
Filters up to this point have been specific to element nodes, but they also exist for
attribute nodes. Attribute filters are surrounded by square brackets in both CSS and
XPath, but differ in that CSS uses mostly operators for conditions while XPath uses
mostly functions. Unlike other filters described in this chapter, support for attribute
filters is fairly universal between different libraries.
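As a rough illustration of the operator-versus-function difference, the sketch below pairs three standard CSS attribute selectors (shown in comments) with XPath expressions run through DOMXPath. The markup, the variable names, and the particular pairings are assumptions chosen for this example rather than taken from the text.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<p>'
      . '<a href="http://example.com/">External</a>'
      . '<a href="/about">About Us</a>'
      . '<a name="top">Anchor</a>'
      . '</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// CSS: a[href]           (attribute is present)
$withHref = $xpath->query('//a[@href]');

// CSS: a[href^="http"]   (operator ^=) becomes the XPath function starts-with()
$external = $xpath->query('//a[starts-with(@href, "http")]');

// CSS: a[href*="about"]  (operator *=) becomes the XPath function contains()
$about = $xpath->query('//a[contains(@href, "about")]');

printf("%d %d %d\n", $withHref->length, $external->length, $about->length); // 2 1 1
```

In both notations the condition sits inside square brackets; only the way the comparison is spelled differs.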