php|architect's Guide to Web Scraping with PHP - Wind Business ...
Today, there are many services providing news feeds, and there are plenty of code libraries that can parse this data. So, with the proliferation of web services and public APIs, why is web scraping still so important to the future of the Web? It is important because of the rise of microformats, Semantic Web technologies, the W3C Linking Open Data Community Project, and the Open Data Movement. Just this year at the TED conference, Tim Berners-Lee spoke of linked data, saying, “We want the data. We want unadulterated data. We have to ask for raw data now.”
The future of the Web is in providing and accessing raw data. How do we access this raw data? Through web scraping.
Yes, there are legal issues to consider when determining whether web scraping is a technique you want to employ, but the techniques this book describes are useful for accessing and parsing raw data of any kind found on the Web, whether it is from a web service API, an XML feed, RDF data embedded in a web page, microformats in HTML, or plain old HTML itself.
There is no way around it. To be a successful web programmer, you must master these techniques. Make them part of your toolbox. Let them inform your software design decisions. I encourage you to bring us into the future of the Web. Scrape the Web within the bounds of the law, publish your raw data for public use, and demand raw data now!
Ben Ramsey<br />
Atlanta, Georgia<br />
June 28, 2009