php|architect's Guide to Web Scraping with PHP - Wind Business ...
DOM Extension
Elements and Attributes
At this point, the DOM transcends the tree analogy. There are multiple types of nodes or, to phrase that within the context of the DOM extension, DOMNode has multiple subclasses. The two main ones you'll be dealing with are DOMElement for elements and DOMAttr for attributes. Here is how these concepts apply to the example in the last section.
• ul is the name of an element.<br />
• id is the name of an attribute of the ul element.<br />
• thelist is the value of the id attribute.<br />
• Foo and Bar are the values of the li elements.<br />
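To make the element/attribute distinction concrete, here is a minimal sketch. It assumes the example from the last section was a `ul` with `id="thelist"` and two `li` children; that markup is reconstructed from the bullets above, not quoted from the book. It shows that the list is a DOMElement, its `id` is a DOMAttr, and both are DOMNode subclasses.

```php
<?php
// Assumed markup, reconstructed from the bullet points (not the book's exact example).
$html = '<ul id="thelist"><li>Foo</li><li>Bar</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// The ul element is a DOMElement...
$ul = $doc->getElementsByTagName('ul')->item(0);
// ...and its id attribute, fetched as a node, is a DOMAttr.
$idAttr = $ul->getAttributeNode('id');

echo get_class($ul), ' ', $ul->nodeName, PHP_EOL;                          // DOMElement ul
echo get_class($idAttr), ' ', $idAttr->name, '=', $idAttr->value, PHP_EOL; // DOMAttr id=thelist

// Both classes extend DOMNode.
var_dump($ul instanceof DOMNode, $idAttr instanceof DOMNode); // bool(true) bool(true)

// The text inside each li element.
foreach ($ul->getElementsByTagName('li') as $li) {
    echo $li->nodeName, ': ', $li->textContent, PHP_EOL; // li: Foo, then li: Bar
}
```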
Locating Nodes
Two methods of the DOMDocument class let you quickly reduce the number of nodes you have to traverse to find the data you want.
getElementById attempts to locate a single element that meets two criteria: 1) it is a descendant of the document's root element; 2) it has a given id attribute value. If such an element is found, it is returned as a DOMElement instance; if not, null is returned.
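A short sketch of both outcomes of getElementById, using hypothetical markup. It loads the markup with loadHTML(), which (as far as I'm aware) registers HTML `id` attributes as IDs. For plain XML documents the PHP manual notes that an attribute only counts as an ID if a DTD declares it or DOMElement::setIdAttribute() is called, so the same lookup could return null there.

```php
<?php
// Hypothetical sample markup for illustration.
$doc = new DOMDocument();
$doc->loadHTML('<div><ul id="thelist"><li>Foo</li><li>Bar</li></ul></div>');

// Match found: a DOMElement is returned.
$list = $doc->getElementById('thelist');
var_dump($list instanceof DOMElement); // bool(true)
echo $list->nodeName, PHP_EOL;         // ul

// No match: null is returned, so check before using the result.
$missing = $doc->getElementById('nosuchid');
var_dump($missing); // NULL
```

Because the miss case is null rather than an exception, a scraper will usually guard the result with `if ($list !== null)` before reading from it.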
getElementsByTagName attempts to locate all elements that meet two criteria: 1) they are descendants of the document's root element; 2) they have a given element name (such as ul). This method always returns a DOMNodeList of any found elements. The DOMNodeList class has a length property that will be equal to 0 if no elements are found. It is also iterable, so it can be used as the subject of a foreach loop.
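The following sketch, again with hypothetical markup, shows the behavior just described: a DOMNodeList whose `length` reports the match count, iteration with foreach, and an empty (not null) list when nothing matches.

```php
<?php
// Hypothetical markup mirroring the list example.
$doc = new DOMDocument();
$doc->loadHTML('<ul id="thelist"><li>Foo</li><li>Bar</li></ul>');

// Every li element in the document, as a DOMNodeList.
$items = $doc->getElementsByTagName('li');
echo $items->length, PHP_EOL; // 2

// DOMNodeList is iterable, so foreach works directly.
$texts = [];
foreach ($items as $li) {
    $texts[] = $li->textContent;
}
echo implode(', ', $texts), PHP_EOL; // Foo, Bar

// No matches: still a DOMNodeList (not null), just with length 0.
$tables = $doc->getElementsByTagName('table');
echo $tables->length, PHP_EOL; // 0
```

Since the return value is never null, testing `$tables->length === 0` is the appropriate way to detect "nothing found" here.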
The DOMElement class also has a getElementsByTagName method, which functions the same way, with the exception that located elements will be descendants of that element instead of the document's root element.
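To illustrate the difference in scope, this sketch (hypothetical markup with two lists) compares the document-wide DOMDocument::getElementsByTagName call with the same call made on a single DOMElement. It also shows a common scraping pattern: use getElementById to narrow down to one element, then search only beneath it.

```php
<?php
// Hypothetical markup: two separate lists.
$doc = new DOMDocument();
$doc->loadHTML(
    '<ul id="first"><li>Foo</li><li>Bar</li></ul>' .
    '<ul id="second"><li>Baz</li></ul>'
);

// Document-wide search: every li in the document.
$all = $doc->getElementsByTagName('li');

// Scoped search: only li elements beneath the second list.
$second = $doc->getElementById('second');
$scoped = $second->getElementsByTagName('li');

echo $all->length, ' ', $scoped->length, PHP_EOL; // 3 1
echo $scoped->item(0)->textContent, PHP_EOL;      // Baz
```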