
php|architect's Guide to Web Scraping with PHP - Wind Business ...


Today, there are many services out there providing news feeds, and there are plenty of code libraries that can parse this data. So, with the proliferation of web services and public APIs, why is web scraping still so important to the future of the Web? It is important because of the rise of microformats, Semantic Web technologies, the W3C Linking Open Data Community Project, and the Open Data Movement. Just this year at the TED conference, Tim Berners-Lee spoke of linked data saying, “We want the data. We want unadulterated data. We have to ask for raw data now.”

The future of the Web is in providing and accessing raw data. How do we access this raw data? Through web scraping.

Yes, there are legal issues to consider when determining whether web scraping is a technique you want to employ, but the techniques this book describes are useful for accessing and parsing raw data of any kind found on the Web, whether it is from a web service API, an XML feed, RDF data embedded in a web page, microformats in HTML, or plain old HTML itself.

There is no way around it. To be a successful web programmer, you must master these techniques. Make them part of your toolbox. Let them inform your software design decisions. I encourage you to bring us into the future of the Web. Scrape the Web within the bounds of the law, publish your raw data for public use, and demand raw data now!

Ben Ramsey<br />

Atlanta, Georgia<br />

June 28, 2009
