php|architect's Guide to Web Scraping with PHP - Wind Business ...
cURL Extension
containing the authentication credentials to use in the format 'username:password'.
Note that this has to be set for each request requiring authentication.
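A minimal sketch of authenticated request setup. It assumes the option described above is CURLOPT_USERPWD and that the server uses HTTP Basic authentication (CURLOPT_HTTPAUTH with CURLAUTH_BASIC). The helper name authOptions(), the URL, and the credentials are hypothetical placeholders.

```php
<?php
// Hypothetical helper: builds the options for an HTTP Basic authenticated request.
// Assumes the credentials option discussed above is CURLOPT_USERPWD.
function authOptions(string $username, string $password): array
{
    return [
        // Ask cURL to use HTTP Basic authentication.
        CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
        // Credentials in the 'username:password' format.
        CURLOPT_USERPWD => $username . ':' . $password,
    ];
}

// Placeholder URL and credentials; nothing is sent until curl_exec() is called.
$ch = curl_init('http://example.com/members/');
$ok = curl_setopt_array($ch, authOptions('alice', 'secret'));
```

Returning the options as an array makes it easy to apply the same credentials to every request that needs them.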
Redirection
CURLOPT_FOLLOWLOCATION can be set to true to have cURL automatically process redirections. That is, it will detect Location headers in the server response and implicitly issue new requests until the server response no longer contains a Location header. To limit the number of Location headers cURL will process automatically before giving up, use the CURLOPT_MAXREDIRS setting. To have authentication credentials persist in requests resulting from redirections, including redirections to a different host, set the CURLOPT_UNRESTRICTED_AUTH setting to true.
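The redirection settings above can be combined as in the following sketch. The helper name redirectOptions(), its default limit of 5, and the URL are illustrative assumptions, not values from the text.

```php
<?php
// Hypothetical helper: options for following redirections automatically.
function redirectOptions(int $maxRedirects = 5, bool $keepAuth = false): array
{
    return [
        // Follow Location headers in server responses.
        CURLOPT_FOLLOWLOCATION => true,
        // Give up after this many redirections; curl_exec() then fails.
        CURLOPT_MAXREDIRS => $maxRedirects,
        // Keep sending credentials on redirections, even to another host.
        CURLOPT_UNRESTRICTED_AUTH => $keepAuth,
    ];
}

$ch = curl_init('http://example.com/old-page');
$ok = curl_setopt_array($ch, redirectOptions(3));
// After curl_exec($ch), curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) reports the
// final URL and curl_getinfo($ch, CURLINFO_REDIRECT_COUNT) the redirections followed.
```

Leaving CURLOPT_UNRESTRICTED_AUTH off by default avoids handing credentials to an unexpected host.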
Referers
CURLOPT_REFERER allows you to explicitly set the value of the Referer header. Setting CURLOPT_AUTOREFERER to true will cause cURL to automatically set the value of the Referer header whenever it processes a Location header.
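A sketch of both referer settings follows. Because CURLOPT_AUTOREFERER only takes effect when cURL processes a Location header, the example also enables CURLOPT_FOLLOWLOCATION. The helper name refererOptions() and both URLs are hypothetical.

```php
<?php
// Hypothetical helper: send an explicit Referer, then let cURL update it on redirections.
function refererOptions(string $referer): array
{
    return [
        // Referer value for the initial request.
        CURLOPT_REFERER => $referer,
        // Set Referer automatically when following a Location header.
        CURLOPT_AUTOREFERER => true,
        // AUTOREFERER only matters if redirections are followed.
        CURLOPT_FOLLOWLOCATION => true,
    ];
}

$ch = curl_init('http://example.com/article');
$ok = curl_setopt_array($ch, refererOptions('http://example.com/index'));
```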
Content Caching
CURLOPT_TIMECONDITION must be set to either CURL_TIMECOND_IFMODSINCE or CURL_TIMECOND_IFUNMODSINCE to select whether the If-Modified-Since or the If-Unmodified-Since header, respectively, will be used.
CURLOPT_TIMEVALUE must be set to a UNIX timestamp (a date represented as the number of seconds elapsed between the UNIX epoch and the desired date) to indicate the last client access time of the resource. The time() function can be used to derive this value.
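A minimal sketch of a conditional request using If-Modified-Since. The helper name ifModifiedSinceOptions(), the URL, and the "one hour ago" timestamp are assumptions for illustration; in practice the timestamp would be recorded with time() when the resource was last retrieved.

```php
<?php
// Hypothetical helper: only fetch the resource if it changed after $lastAccess.
function ifModifiedSinceOptions(int $lastAccess): array
{
    return [
        // Send an If-Modified-Since header.
        CURLOPT_TIMECONDITION => CURL_TIMECOND_IFMODSINCE,
        // UNIX timestamp of the last access to the resource.
        CURLOPT_TIMEVALUE => $lastAccess,
    ];
}

$lastAccess = time() - 3600; // pretend the resource was last fetched an hour ago
$ch = curl_init('http://example.com/feed.xml');
$ok = curl_setopt_array($ch, ifModifiedSinceOptions($lastAccess));
// After curl_exec($ch), a 304 from curl_getinfo($ch, CURLINFO_HTTP_CODE)
// means the resource is unchanged and a cached copy can be reused.
```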
User Agents
CURLOPT_USERAGENT can be used to set the User-Agent string to use.
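Setting the user agent takes a single call. The agent string and URL below are hypothetical placeholders; a scraper would normally use a string that identifies it, or one matching a browser, depending on the site.

```php
<?php
// Hypothetical user agent string identifying the scraper.
$userAgent = 'ExampleScraper/1.0 (+http://example.com/bot)';

$ch = curl_init('http://example.com/');
// Sets the User-Agent header sent with requests on this handle.
$ok = curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
```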