03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...



Introduction

• Chapters 3-7 cover specific PHP HTTP client libraries and their features, usage, and the advantages and disadvantages of each.

• Chapter 8 goes into developing a custom client library and common concerns when using any library, including prevention of throttling, access randomization, agent scheduling, and side effects of client-side scripts.

• Chapter 9 details use of the tidy extension for correcting issues with retrieved markup prior to using other extensions to analyze it.

• Chapters 10-12 review various XML extensions for PHP, compare and contrast the two classes of XML parsers, and provide a brief introduction to XPath.

• Chapter 13 is a study of CSS selectors, comparisons between them and XPath expressions, and information on available libraries for using them to query markup documents.

• Chapter 14 explores regular expressions using the PCRE extension, which can be useful in validating scraped data to ensure the stability of the web scraping application.

• Chapter 15 outlines several general high-level strategies and best practices for designing and developing your web scraping applications.
