03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...



in some instances. Like the obligation of a web service to generate valid markup, a web browser has certain responsibilities. These include respecting server requests to not index certain pages and keeping the number of requests sent to servers within a reasonable amount.

In short, web scraping is the subset of a web browser’s functionality necessary to obtain and render data in a manner conducive to how that data will be used.
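The two client responsibilities mentioned above (honoring a server's request not to fetch certain pages, and keeping the request rate reasonable) can be sketched in a few lines of PHP. This is a minimal illustration under stated assumptions, not a complete robots.txt implementation: the function names `disallowedPrefixes`, `isPathAllowed`, and `secondsToWait` are hypothetical, the sample robots.txt content is invented, and only `Disallow` rules in groups addressed to all user agents (`*`) are handled.

```php
<?php
declare(strict_types=1);

/**
 * Collect the Disallow path prefixes that apply to all user agents ("*").
 * Simplified on purpose: Allow lines, wildcards, and agent-specific
 * groups are ignored.
 */
function disallowedPrefixes(string $robotsTxt): array
{
    $prefixes = [];
    $appliesToUs = false;
    $lastWasAgent = false;

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }

        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Consecutive User-agent lines share one group of rules.
            $appliesToUs = ($lastWasAgent && $appliesToUs) || $value === '*';
            $lastWasAgent = true;
            continue;
        }

        $lastWasAgent = false;
        if ($field === 'disallow' && $appliesToUs && $value !== '') {
            $prefixes[] = $value;
        }
    }

    return $prefixes;
}

/** A path is allowed unless it starts with one of the disallowed prefixes. */
function isPathAllowed(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return false;
        }
    }
    return true;
}

/** Seconds still to wait so that requests are at least $minInterval apart. */
function secondsToWait(float $lastRequestAt, float $now, float $minInterval): float
{
    return max(0.0, $minInterval - ($now - $lastRequestAt));
}

// Invented robots.txt content, used only for illustration.
$rules = disallowedPrefixes("User-agent: *\nDisallow: /private/\n");

$privateAllowed = isPathAllowed('/private/report.html', $rules); // false
$indexAllowed   = isPathAllowed('/index.html', $rules);          // true
```

In a real scraper, the value returned by `secondsToWait` would typically be passed to `usleep()` (converted to microseconds) before each request. The minimum interval itself is a judgment call that depends on the site being scraped.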

Applications of Web Scraping

Though it’s becoming more common for web sites to expose their data using web services, the absence of a data source that is tailored to machines and offers all the data of a corresponding web site is still a common situation. In these instances, the web site itself must effectively become your data source, and web scraping can be employed to automate the consumption of the data it makes available. Web services are also used to transfer information into external data systems. In their absence, web scraping can be used to integrate with systems that don’t offer web services but do offer a web-based interface for their users.

Another application of web scraping, and likely a better-known one, is the development of automated agents known as crawlers, which seek out resources for storage and analysis that will eventually comprise the search results they deliver to you. In the earliest days of the internet, this type of data was sought out manually by human beings, a slow and tedious process which limited how quickly a search engine could expand its offerings. Web scraping provided an alternative that allowed computers to do the grunt work of finding new pages and extracting their content.

Lastly, web scraping is one way – not the only way or necessarily the recommended way, but certainly a way – to implement integration testing for web applications. Using its abilities to act as a client in extracting and transmitting data, a web scraping application can simulate the browser activity of a normal user. This can help to ensure that the web application’s output matches the response expected under the application’s requirements.
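As a rough sketch of the integration-testing idea, the check below extracts one piece of a response and compares it with what the requirements call for. Everything specific here is an assumption made for illustration: the helper name `extractTitle`, the hard-coded response body (a real test would obtain it with an HTTP request), and the expected title. It relies on PHP's DOM extension being available.

```php
<?php
declare(strict_types=1);

/**
 * Return the text of the first <title> element, or null if there is none.
 */
function extractTitle(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world markup is often invalid; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}

// Hypothetical response body standing in for the application's real output.
$response = '<html><head><title>Order Confirmation</title></head>'
          . '<body><p>Thanks for your order.</p></body></html>';

$title = extractTitle($response);

// The "expected response" here is an invented requirement.
if ($title !== 'Order Confirmation') {
    throw new RuntimeException('Unexpected page title: ' . var_export($title, true));
}
```

A dedicated testing framework would normally wrap a check like this in a test case, but the comparison of extracted data against an expected value is the core of the approach.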
