php|architect's Guide to Web Scraping with PHP - Wind Business ...
cURL Extension<br />
that file and use it as the value for the Cookie request header when the request<br />
is constructed.<br />
• If $cookiejar is set to an empty string, cookie data will persist in memory rather<br />
than a local file. This improves performance (memory access is faster than<br />
disk) and security (file storage may be more open to access by other users and<br />
processes than memory depending on the server environment).<br />
In some instances it may be desirable for the CURLOPT_COOKIEJAR value to have<br />
a different value per request, such as for debugging. In most cases, however,<br />
CURLOPT_COOKIEJAR will be set for the first request to receive the initial cookie data<br />
and its value will persist for subsequent requests. In most cases, CURLOPT_COOKIEFILE<br />
will be assigned the same value as CURLOPT_COOKIEJAR after the first request. This will<br />
result in cookie data being read to include in the request, followed by cookie data<br />
from the response being written back (and overwriting any existing data at that location)<br />
for use in subsequent requests. On a related note, if you want cURL to begin a<br />
new session in order to have it discard data for session cookies (i.e. cookies without<br />
an expiration date), you can set the CURLOPT_COOKIESESSION setting to true.<br />
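The options described above can be combined as in the following minimal sketch. The cookieOptions() function, its parameters, and the file path in the usage note are hypothetical names introduced for illustration only. The function just builds an options array and does not send a request.<br />

```php
<?php
// Hypothetical helper (not part of the cURL extension): returns the
// cookie-related options for one request in a sequence.
// $cookiejar is assumed to be a writable file path.
function cookieOptions(string $cookiejar, bool $firstRequest, bool $newSession = false): array
{
    $options = [
        // Return the response from curl_exec instead of printing it.
        CURLOPT_RETURNTRANSFER => true,
        // Location where cookie data received in responses is written.
        CURLOPT_COOKIEJAR => $cookiejar,
    ];

    if (!$firstRequest) {
        // After the first request, read cookie data back from the same
        // location so that it is included in the request.
        $options[CURLOPT_COOKIEFILE] = $cookiejar;
    }

    if ($newSession) {
        // Begin a new session: discard stored session cookies
        // (cookies without an expiration date).
        $options[CURLOPT_COOKIESESSION] = true;
    }

    return $options;
}
```

A possible usage, assuming a handle created with curl_init, is curl_setopt_array($ch, cookieOptions('/tmp/cookies.txt', false)); before calling curl_exec($ch).<br />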
If you want to handle cookie data manually for any reason, you can set the value<br />
of the Cookie request header via the CURLOPT_COOKIE setting. To get access to the response<br />
headers, set the CURLOPT_HEADER and CURLOPT_RETURNTRANSFER settings to true.<br />
This will cause the curl_exec call to return the entire response including the headers<br />
and the body. Recall that there is a single blank line between the headers and<br />
the body and that a colon separates each header name from its corresponding value.<br />
This information combined with the basic string handling functions in PHP should<br />
be all you need. Also, you'll need to set CURLOPT_FOLLOWLOCATION to false in order<br />
to prevent cURL from processing redirections automatically. Otherwise, any cookies<br />
set by responses that result in redirections would be lost.<br />
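As a rough illustration of the manual approach, the sketch below splits a raw response at the first blank line, splits each header at its colon, and collects the Set-Cookie values into a string suitable for CURLOPT_COOKIE. The extractCookieHeader() function is a hypothetical name, not part of the cURL extension. It is deliberately simplified: it keeps only each cookie's name=value pair and ignores attributes such as Path, Domain, and expiration.<br />

```php
<?php
// Hypothetical helper: takes the string returned by curl_exec when
// CURLOPT_HEADER and CURLOPT_RETURNTRANSFER are true, and builds a value
// for the Cookie request header from the Set-Cookie response headers.
function extractCookieHeader(string $response): string
{
    // A single blank line separates the headers from the body.
    $parts = preg_split("/\r?\n\r?\n/", $response, 2);
    $headerBlock = $parts[0];

    $pairs = [];
    foreach (preg_split("/\r?\n/", $headerBlock) as $line) {
        // A colon separates each header name from its value.
        // Lines with no colon (such as the status line) are skipped.
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue;
        }

        $name = trim(substr($line, 0, $pos));
        if (strcasecmp($name, 'Set-Cookie') !== 0) {
            continue;
        }

        // Keep only the leading name=value pair and drop the attributes.
        $value = trim(substr($line, $pos + 1));
        $pairs[] = trim(explode(';', $value, 2)[0]);
    }

    // Multiple cookies in a Cookie header are separated by "; ".
    return implode('; ', $pairs);
}
```

The returned string could then be passed to a later request with curl_setopt($ch, CURLOPT_COOKIE, $cookieHeader), where $ch and $cookieHeader are assumed variable names.<br />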
HTTP Authentication<br />
cURL supports both Basic and Digest HTTP authentication methods, among others.<br />
The CURLOPT_HTTPAUTH setting controls the method to use and is set using constants<br />
such as CURLAUTH_BASIC or CURLAUTH_DIGEST. The CURLOPT_USERPWD setting is a string