03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...



CSS Selector Libraries

• [href] matches all nodes that have an attribute node with the name href.

• [href="/home"] matches all nodes with an attribute node named href that has a value of "/home".

• [href!="/home"] matches all nodes with an attribute node named href that does not have a value of "/home".

• [href^="/"] matches all nodes with an attribute node named href that has a value starting with "/".

• [href$="-us"] matches all nodes with an attribute node named href that has a value ending with "-us".

• [href*="-us"] matches all nodes with an attribute node named href that has a value containing "-us" anywhere within the value.

• [src*="ad"][alt^="Advertisement"] matches all nodes that have both an attribute node named src with a value containing "ad" and an attribute node named alt with a value starting with "Advertisement".
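The attribute filters above follow a regular pattern, so they can be translated mechanically into XPath conditions. The following is a minimal sketch of that idea, not the implementation of any particular selector library. The function name attributeFiltersToXPath is hypothetical, and it assumes filter values are written with straight double quotes and contain no embedded quotes. Because XPath 1.0 has no ends-with() function, the $= case is expressed with substring() and string-length() instead.

```php
<?php
// Hypothetical helper (not part of any library): translates a chain of CSS
// attribute filters, e.g. [src*="ad"][alt^="Advertisement"], into XPath 1.0.
function attributeFiltersToXPath(string $selector): string
{
    // Capture the attribute name, an optional operator, and an optional quoted value.
    $pattern = '/\[\s*([\w-]+)\s*(?:([!^$*]?=)\s*"([^"]*)")?\s*\]/';
    if (!preg_match_all($pattern, $selector, $filters, PREG_SET_ORDER)) {
        throw new InvalidArgumentException('No attribute filters found in: ' . $selector);
    }

    $conditions = [];
    foreach ($filters as $filter) {
        $attr = '@' . $filter[1];
        $operator = $filter[2] ?? '';
        $value = '"' . ($filter[3] ?? '') . '"';

        switch ($operator) {
            case '':   // [href]
                $conditions[] = $attr;
                break;
            case '=':  // [href="/home"]
            case '!=': // [href!="/home"]
                $conditions[] = $attr . $operator . $value;
                break;
            case '^=': // [href^="/"]
                $conditions[] = sprintf('starts-with(%s, %s)', $attr, $value);
                break;
            case '*=': // [href*="-us"]
                $conditions[] = sprintf('contains(%s, %s)', $attr, $value);
                break;
            case '$=': // [href$="-us"]: compare the trailing substring (XPath 1.0)
                $conditions[] = sprintf(
                    'substring(%1$s, string-length(%1$s) - string-length(%2$s) + 1) = %2$s',
                    $attr,
                    $value
                );
                break;
        }
    }

    // Multiple filters on the same node are combined with "and".
    return '//*[' . implode(' and ', $conditions) . ']';
}

echo attributeFiltersToXPath('[src*="ad"][alt^="Advertisement"]'), PHP_EOL;
// prints: //*[contains(@src, "ad") and starts-with(@alt, "Advertisement")]
```

The sketch handles attribute filters only. Element names, classes, IDs, and combinators would each need their own translation rules.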

Selector                                      CSS                                 XPath

has attribute                                 [href]                              //*[@href]

has attribute value                           [href="/home"]                      //*[@href="/home"]

has different attribute value                 [href!="/home"]                     //*[@href!="/home"]

has attribute value starting with substring   [href^="/"]                         //*[starts-with(@href, "/")]

has attribute value ending with substring     [href$="-us"]                       //*[ends-with(@href, "-us")]

has attribute value containing substring      [href*="-us"]                       //*[contains(@href, "-us")]

multiple attribute filters                    [src*="ad"][alt^="Advertisement"]   //*[contains(@src, "ad") and starts-with(@alt, "Advertisement")]
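
To see the XPath column in action, the expressions can be run against a small document with PHP's DOM extension. This is a minimal sketch under stated assumptions: the sample markup, the id attributes, and the matchingIds helper are invented for illustration. One caveat applies: ends-with() is an XPath 2.0 function, while DOMXPath implements XPath 1.0, so the "ending with" row is rewritten here using substring() and string-length().

```php
<?php
// Sample markup invented for this illustration.
$html = <<<HTML
<html><body>
  <a id="home" href="/home">Home</a>
  <a id="about" href="/about-us">About</a>
  <a id="ext" href="http://example.com/">External</a>
  <a id="anchor">No href</a>
  <img id="banner" src="/img/ad-1.png" alt="Advertisement: sale">
  <img id="logo" src="/img/logo.png" alt="Logo">
</body></html>
HTML;

libxml_use_internal_errors(true); // keep HTML parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: returns the id attribute of every node matched by $expression.
function matchingIds(DOMXPath $xpath, string $expression): array
{
    $ids = [];
    foreach ($xpath->query($expression) as $node) {
        $ids[] = $node->getAttribute('id');
    }
    return $ids;
}

$expressions = [
    '//*[@href]',
    '//*[@href="/home"]',
    '//*[@href!="/home"]',
    '//*[starts-with(@href, "/")]',
    // XPath 1.0 substitute for ends-with(@href, "-us")
    '//*[substring(@href, string-length(@href) - string-length("-us") + 1) = "-us"]',
    '//*[contains(@href, "-us")]',
    '//*[contains(@src, "ad") and starts-with(@alt, "Advertisement")]',
];

foreach ($expressions as $expression) {
    echo $expression, ' => ', implode(', ', matchingIds($xpath, $expression)), PHP_EOL;
}
```

In this sample, the //*[@href!="/home"] expression skips the anchor without an href attribute, because XPath only compares attribute nodes that exist.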
