03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...


cURL Extension

that file and use it as the value for the Cookie request header when the request is constructed.

• If $cookiejar is set to an empty string, cookie data will persist in memory rather than a local file. This improves performance (memory access is faster than disk) and security (file storage may be more open to access by other users and processes than memory depending on the server environment).

In some instances it may be desirable for the CURLOPT_COOKIEJAR value to have a different value per request, such as for debugging. In most cases, however, CURLOPT_COOKIEJAR will be set for the first request to receive the initial cookie data and its value will persist for subsequent requests. In most cases, CURLOPT_COOKIEFILE will be assigned the same value as CURLOPT_COOKIEJAR after the first request. This will result in cookie data being read to include in the request, followed by cookie data from the response being written back (and overwriting any existing data at that location) for use in subsequent requests. On a related note, if you want cURL to begin a new session in order to have it discard data for session cookies (i.e. cookies without an expiration date), you can set the CURLOPT_COOKIESESSION setting to true.
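As a minimal sketch of the pattern described above, the options for the first and later requests could be assembled as follows. The helper name buildCookieOptions, the example URLs, and the cookie file path are all hypothetical and chosen only for illustration; the CURLOPT_* constants are the ones provided by the cURL extension.

```php
<?php
// Hypothetical helper (not part of the cURL extension): builds the option
// array for a request that shares one cookie file across requests.
function buildCookieOptions(string $url, string $cookieFile, bool $firstRequest): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // Cookies received in the response are written to this location.
        CURLOPT_COOKIEJAR      => $cookieFile,
    ];

    if (!$firstRequest) {
        // After the first request, read the stored cookies back so they
        // are included in the outgoing request.
        $options[CURLOPT_COOKIEFILE] = $cookieFile;
    }

    return $options;
}

// Assumed location for the cookie data; any writable path would do.
$cookieFile = sys_get_temp_dir() . '/example_cookies.txt';

$ch = curl_init();
curl_setopt_array($ch, buildCookieOptions('http://localhost.example/login', $cookieFile, true));
// $response = curl_exec($ch); // uncomment to actually send the first request

curl_setopt_array($ch, buildCookieOptions('http://localhost.example/account', $cookieFile, false));
// $response = curl_exec($ch); // uncomment to send a request that includes the stored cookies
```

To have cURL disregard previously stored session cookies, CURLOPT_COOKIESESSION could be added to the returned array with a value of true.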

If you want to handle cookie data manually for any reason, you can set the value of the Cookie request header via the CURLOPT_COOKIE setting. To get access to the response headers, set the CURLOPT_HEADER and CURLOPT_RETURNTRANSFER settings to true. This will cause the curl_exec call to return the entire response including the headers and the body. Recall that there is a single blank line between the headers and the body and that a colon separates each header name from its corresponding value. This information combined with the basic string handling functions in PHP should be all you need. Also, you'll need to set CURLOPT_FOLLOWLOCATION to false in order to prevent cURL from processing redirections automatically. Not doing this would cause any cookies set by requests resulting in redirections to be lost.
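The string handling involved might look something like the sketch below. It works on a raw response string of the kind curl_exec returns when CURLOPT_HEADER and CURLOPT_RETURNTRANSFER are enabled. The function names extractCookies and buildCookieHeader are hypothetical, and the sketch assumes CRLF line endings and a single header block. It ignores cookie attributes such as Path and Expires.

```php
<?php
// Hypothetical helper: collect cookie name/value pairs from the Set-Cookie
// headers of a raw response (headers followed by body).
function extractCookies(string $rawResponse): array
{
    // A single blank line separates the headers from the body.
    $parts = explode("\r\n\r\n", $rawResponse, 2);
    $cookies = [];

    foreach (explode("\r\n", $parts[0]) as $line) {
        // Lines without a colon (such as the status line) are not headers.
        if (strpos($line, ':') === false) {
            continue;
        }
        // A colon separates the header name from its value.
        [$name, $value] = explode(':', $line, 2);
        if (strcasecmp(trim($name), 'Set-Cookie') !== 0) {
            continue;
        }
        // Keep only the leading name=value pair and drop the attributes.
        $pair = trim(explode(';', $value, 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        [$cookieName, $cookieValue] = explode('=', $pair, 2);
        $cookies[$cookieName] = $cookieValue;
    }

    return $cookies;
}

// Hypothetical helper: format the pairs as a Cookie request header value.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Made-up response used purely for illustration.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Content-Type: text/html\r\n"
     . "Set-Cookie: session=abc123; Path=/; HttpOnly\r\n"
     . "Set-Cookie: lang=en\r\n"
     . "\r\n"
     . "<html></html>";

$cookieHeader = buildCookieHeader(extractCookies($raw));
```

The resulting string could then be passed as the value of the CURLOPT_COOKIE setting on the next request.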

HTTP Authentication

cURL supports both Basic and Digest HTTP authentication methods, among others. The CURLOPT_HTTPAUTH setting controls the method to use and is set using constants such as CURLAUTH_BASIC or CURLAUTH_DIGEST. The CURLOPT_USERPWD setting is a string
