03.02.2014

php|architect's Guide to Web Scraping with PHP - Wind Business ...

DOM Extension

Elements and Attributes

At this point, the DOM transcends the tree analogy. There are multiple types of nodes, or to phrase that within the context of the DOM extension, DOMNode has multiple subclasses. The main two you’ll be dealing with are DOMElement for elements and DOMAttr for attributes. Here is how these concepts apply to the example in the last section.

• ul is the name of an element.
• id is the name of an attribute of the ul element.
• thelist is the value of the id attribute.
• Foo and Bar are the values of the li elements.
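As a rough illustration of the points above, the following sketch inspects the same pieces through the DOM extension. The markup string is an assumption reconstructed from the bullet points (the original example is not reproduced here), and it is loaded with loadXML only so that the ul element becomes the document's root element.

```php
<?php
// Assumed markup, reconstructed from the bullet points above.
$markup = '<ul id="thelist"><li>Foo</li><li>Bar</li></ul>';

$doc = new DOMDocument();
$doc->loadXML($markup);

// The root element of this small document is the ul, a DOMElement.
$ul = $doc->documentElement;
var_dump($ul instanceof DOMElement); // bool(true)
echo $ul->nodeName, "\n";            // ul

// Its id attribute is represented by a DOMAttr with a name and a value.
$id = $ul->getAttributeNode('id');
var_dump($id instanceof DOMAttr);        // bool(true)
echo $id->name, ' = ', $id->value, "\n"; // id = thelist

// Each li child is a DOMElement; nodeValue holds its text content.
foreach ($ul->childNodes as $li) {
    echo $li->nodeName, ': ', $li->nodeValue, "\n";
}
```

Both DOMElement and DOMAttr inherit from DOMNode, which is why properties such as nodeName and nodeValue are available on either one.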

Locating Nodes

Two methods of the DOMDocument class allow you to quickly reduce the number of nodes you have to traverse to find the data you want.

getElementById attempts to locate a single element that meets two criteria: 1) it is a descendant of the document’s root element; 2) it has a given id attribute value. If such an element is found, it is returned as a DOMElement instance; if not, null is returned.
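A minimal sketch of both outcomes of getElementById follows. The markup and the id value nosuchid are assumptions made for illustration. The document is loaded with loadHTML, which treats id attributes as element IDs; a document loaded another way may need a DTD or an explicit ID declaration before this lookup finds anything.

```php
<?php
// Assumed markup for illustration.
$html = '<html><body><ul id="thelist"><li>Foo</li><li>Bar</li></ul></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// An element with this id exists, so a DOMElement is returned.
$list = $doc->getElementById('thelist');
if ($list !== null) {
    echo $list->nodeName, "\n"; // ul
}

// No element has this (hypothetical) id, so null is returned.
$missing = $doc->getElementById('nosuchid');
var_dump($missing === null); // bool(true)
```

Checking the return value against null before using it avoids a fatal error when the page being scraped does not contain the expected element.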

getElementsByTagName attempts to locate all elements that meet two criteria: 1) the element is a descendant of the document’s root element; 2) it has a given element name (such as ul). This method always returns a DOMNodeList containing whatever elements were found. The DOMNodeList class has a length property that will be equal to 0 if no elements are found. It is also iterable, so it can be used as the subject of a foreach loop.
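The behavior described above might look like the following in practice. This is a sketch under the assumption of the same small list markup; the table lookup is included only to show the empty-result case.

```php
<?php
// Assumed markup for illustration.
$html = '<html><body><ul id="thelist"><li>Foo</li><li>Bar</li></ul></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// All li elements in the document, returned as a DOMNodeList.
$items = $doc->getElementsByTagName('li');
echo $items->length, "\n"; // 2

// DOMNodeList is iterable, so foreach works directly on it.
$values = [];
foreach ($items as $li) {
    $values[] = $li->nodeValue;
}
echo implode(', ', $values), "\n"; // Foo, Bar

// No table elements exist: the result is still a DOMNodeList, with length 0.
$tables = $doc->getElementsByTagName('table');
var_dump($tables instanceof DOMNodeList); // bool(true)
echo $tables->length, "\n";               // 0
```

Because an empty DOMNodeList is returned instead of null, a foreach loop over the result is safe even when nothing matches.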

The DOMElement class also has a getElementsByTagName method, which functions the same way with the exception that located elements will be descendants of that element instead of the document’s root element.
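To make the difference in scope concrete, the sketch below runs the same search from the document and from a single element. The second list, its id otherlist, and the value Baz are hypothetical additions that exist only to give the document-wide search something extra to find.

```php
<?php
// Assumed markup: the second list is a hypothetical addition.
$html = '<html><body>'
      . '<ul id="thelist"><li>Foo</li><li>Bar</li></ul>'
      . '<ul id="otherlist"><li>Baz</li></ul>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Searching from the document finds every li element.
$allItems = $doc->getElementsByTagName('li');
echo $allItems->length, "\n"; // 3

// Searching from one ul finds only that element's descendants.
$list = $doc->getElementById('thelist');
$listItems = $list->getElementsByTagName('li');
echo $listItems->length, "\n"; // 2

foreach ($listItems as $li) {
    echo $li->nodeValue, "\n";
}
```

Narrowing to a container element first and then searching within it is a common way to avoid picking up unrelated elements that happen to share a tag name.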
