php|architect's Guide to Web Scraping with PHP - Wind Business ...
Legality of Web Scraping
• “Amazon grants you a limited license to access and make personal use of this site ... This license does not include ... any use of data mining, robots, or similar data gathering and extraction tools.” – Amazon Conditions of Use, LICENSE AND SITE ACCESS section as of 2/14/10
• “You agree that you will not use any robot, spider, scraper or other automated means to access the Sites for any purpose without our express written permission.” – eBay User Agreement, Access and Interference section as of 2/14/10
• “... you agree not to: ... access, monitor or copy any content or information of this Website using any robot, spider, scraper or other automated means or any manual process for any purpose without our express written permission; ...” – Expedia, Inc. Web Site Terms, Conditions, and Notices, PROHIBITED ACTIVITIES section as of 2/14/10
• “The foregoing licenses do not include any rights to: ... use any robot, spider, data miner, scraper or other automated means to access the Barnes & Noble.com Site or its systems, the Content or any portion or derivative thereof for any purpose; ...” – Barnes & Noble Terms of Use, Section I LICENSES AND RESTRICTIONS as of 2/14/10
Determining whether or not the web site in question has a TOS document will be the first step. If you find one, look for clauses using language similar to that of the above examples. Also, look for any broad “blanket” clauses of prohibited activities under which web scraping may fall.
If you find a TOS document and it does not expressly forbid web scraping, the next step is to contact representatives who have authority to speak on behalf of the organization that owns the web site. Some organizations may allow web scraping assuming that you secure permission with appropriate authorities beforehand. When obtaining this permission, it is best to obtain a document in writing and on official letterhead that clearly indicates that it originated from the organization in question. This has the greatest chance of mitigating any legal issues that may arise.
If intellectual property-related allegations are brought against an individual as a result of usage of an automated agent or information acquired by one, assuming the individual did not violate any TOS agreement imposed by its owner or related computer use laws, a court decision will likely boil down to whether or not the usage