php|architect's Guide to Web Scraping with PHP - Wind Business ...

Appendix A

Legality of Web Scraping

The legality of web scraping is a rather complicated question, mainly because of copyright and intellectual property laws. Unfortunately, there is no easy, completely cut-and-dried answer, particularly because these laws can vary between countries.

There are, however, a few common points to examine when reviewing a prospective web scraping target.

First, web sites often have documents known as Terms of Service (TOS), Terms or Conditions of Use, or User Agreements (hereafter referred to simply as TOS documents). These are generally found in out-of-the-way places, such as a link in the site footer or a Legal Documents or Help section. Such documents are more common on larger, better-known web sites. Below are excerpts from several such documents on web sites that explicitly prohibit scraping of their content.

• “You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers)...” – Google Terms of Service, section 5.3, as of 2/14/10

• “You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.” – Facebook Statement of Rights and Responsibilities, Safety section, as of 2/14/10
