php|architect's Guide to Web Scraping with PHP - Wind Business ...
Appendix A<br />
Legality of Web Scraping<br />
The legality of web scraping is a rather complicated question, mainly due to copyright<br />
and intellectual property laws. Unfortunately, there is no easy and completely<br />
cut-and-dried answer, particularly because these laws can vary between countries.<br />
There are, however, a few common points for examination when reviewing a<br />
prospective web scraping target.<br />
First, web sites often have documents known as Terms of Service (TOS), Terms or<br />
Conditions of Use, or User Agreements (hereafter simply known as TOS documents<br />
for the sake of reference). These are generally located in an out-of-the-way location<br />
like a link in the site footer or in a Legal Documents or Help section. These types<br />
of documents are more common on larger and better-known web sites. Below<br />
are segments of several such documents from web sites that explicitly prohibit web<br />
scraping of their content.<br />
• “You specifically agree not to access (or attempt to access) any of the Services<br />
through any automated means (including use of scripts or web crawlers)...” –<br />
Google Terms of Service, section 5.3 as of 2/14/10<br />
• “You will not collect users’ content or information, or otherwise access Facebook,<br />
using automated means (such as harvesting bots, robots, spiders, or<br />
scrapers) without our permission.” – Facebook Statement of Rights and Responsibilities,<br />
Safety section as of 2/14/10