03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...


cURL Extension

containing the authentication credentials to use in the format 'username:password'. Note that this has to be set for each request requiring authentication.
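
The option this paragraph describes is named before the excerpt begins. As a minimal sketch, assuming it is PHP's CURLOPT_USERPWD (which accepts a value in this format), the credentials can be applied to every handle that needs them. The helper name, URLs, and credentials below are placeholders, not taken from the text.

```php
<?php
// Hypothetical helper (not from the original text): options for one
// authenticated request. Assumes the option discussed is CURLOPT_USERPWD.
function authOptions(string $url, string $user, string $password): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERPWD        => $user . ':' . $password,
    ];
}

// Placeholder URLs and credentials, for illustration only.
$urls = ['http://example.com/private/a', 'http://example.com/private/b'];
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init();
    // The credentials are applied to every handle that needs them.
    curl_setopt_array($ch, authOptions($url, 'username', 'password'));
    $handles[] = $ch;
    // curl_exec($ch) would send the request; it is left out here so that
    // the sketch makes no network calls.
}
```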

Redirection

CURLOPT_FOLLOWLOCATION can be set to true to have cURL automatically process redirections. That is, it will detect Location headers in the server response and implicitly issue requests until the server response no longer contains a Location header. To set the maximum number of Location headers to have cURL process automatically before terminating, use the CURLOPT_MAXREDIRS setting. To have authentication credentials persist in requests resulting from redirections, set the CURLOPT_UNRESTRICTED_AUTH setting to true.
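
The three redirection settings might be combined as in the following sketch. The helper name, its defaults, and the URL are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the original text): options that make
// cURL follow Location headers up to a limit.
function redirectOptions(string $url, int $maxRedirects = 5, bool $keepAuth = false): array
{
    return [
        CURLOPT_URL               => $url,
        CURLOPT_RETURNTRANSFER    => true,
        CURLOPT_FOLLOWLOCATION    => true,          // process Location headers
        CURLOPT_MAXREDIRS         => $maxRedirects, // stop after this many
        CURLOPT_UNRESTRICTED_AUTH => $keepAuth,     // resend credentials on redirect
    ];
}

$ch = curl_init();
$applied = curl_setopt_array($ch, redirectOptions('http://example.com/old-page', 3));
// curl_exec($ch) would perform the request and any redirects; it is
// omitted here so that the sketch makes no network calls.
```

Leaving CURLOPT_UNRESTRICTED_AUTH off by default is a deliberate choice in this sketch: a redirect may lead to a different host, which would then receive the credentials.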

Referers

CURLOPT_REFERER allows you to explicitly set the value of the Referer header. Setting CURLOPT_AUTOREFERER to true will cause cURL to automatically set the value of the Referer header whenever it processes a Location header.
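
A short sketch of both settings used together follows. The helper name and URLs are hypothetical. CURLOPT_FOLLOWLOCATION is enabled as well, since CURLOPT_AUTOREFERER only has an effect while redirects are being followed.

```php
<?php
// Hypothetical helper (not from the original text): sets an explicit
// Referer for the first request and lets cURL fill it in on redirects.
function refererOptions(string $url, ?string $referer = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_AUTOREFERER    => true, // Referer is set on each redirect
    ];
    if ($referer !== null) {
        $options[CURLOPT_REFERER] = $referer; // Referer for the initial request
    }
    return $options;
}

$ch = curl_init();
$applied = curl_setopt_array(
    $ch,
    refererOptions('http://example.com/page2', 'http://example.com/page1')
);
// curl_exec($ch) would send the request; omitted to avoid network calls.
```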

Content Caching<br />

CURLOPT_TIMECONDITION must be set to either CURL_TIMECOND_IFMODSINCE or CURL_TIMECOND_IFUNMODSINCE to select whether the If-Modified-Since or If-Unmodified-Since header will be used respectively.

CURLOPT_TIMEVALUE must be set to a UNIX timestamp (a date representation using the number of seconds between the UNIX epoch and the desired date) to indicate the last client access time of the resource. The time function can be used to derive this value.
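
As an illustration, a conditional request using If-Modified-Since could be set up as sketched below. The helper name, the URL, and the one-day interval are assumptions. A server that honors the header typically answers 304 Not Modified when the resource is unchanged, which can be checked with curl_getinfo($ch, CURLINFO_HTTP_CODE) after the request has run.

```php
<?php
// Hypothetical helper (not from the original text): options for a
// request that sends If-Modified-Since with the given timestamp.
function conditionalGetOptions(string $url, int $lastAccess): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMECONDITION  => CURL_TIMECOND_IFMODSINCE,
        CURLOPT_TIMEVALUE      => $lastAccess, // UNIX timestamp
    ];
}

// Assume the resource was last fetched one day ago.
$lastAccess = time() - 86400;

$ch = curl_init();
$applied = curl_setopt_array(
    $ch,
    conditionalGetOptions('http://example.com/feed.xml', $lastAccess)
);
// curl_exec($ch) would send the request; omitted to avoid network calls.
```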

User Agents

CURLOPT_USERAGENT can be used to set the User Agent string to use.
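
For example, with a made-up User Agent string and a placeholder URL:

```php
<?php
// 'ExampleScraper/1.0' is an invented User Agent string, for illustration.
$ch = curl_init('http://example.com/');
$ok = curl_setopt($ch, CURLOPT_USERAGENT, 'ExampleScraper/1.0 (+http://example.com/bot)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_exec($ch) would send the request with this User-Agent header;
// it is omitted here so that the sketch makes no network calls.
```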
