php|architect's Guide to Web Scraping with PHP - Wind Business ...
cURL Extension
• CURLOPT_RETURNTRANSFER is set to true in the curl_setopt_array call even
though the return value of curl_exec isn’t captured. This is simply to prevent
curl_exec from printing the response directly as unwanted output.
• CURLINFO_HEADER_OUT is set to true in the curl_setopt_array call to indicate that
the request should be retained because it will be extracted after the request is
made.
• CURLINFO_HEADER_OUT is specified in the curl_getinfo call to limit its return
value to a string containing the request that was made.
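The points above can be sketched as one small helper. This is a minimal illustration rather than the book's own listing: the function name sentRequestFor and its $url parameter are hypothetical, chosen only for this example, while the cURL functions and constants are the ones discussed above.

```php
<?php
// Minimal sketch: return the raw request cURL sent for a given URL.
// sentRequestFor() is a hypothetical helper, not part of the cURL extension.
function sentRequestFor($url)
{
    $ch = curl_init($url);

    curl_setopt_array($ch, array(
        // Keep curl_exec from printing the response; its return value is ignored.
        CURLOPT_RETURNTRANSFER => true,
        // Ask cURL to retain the outgoing request so it can be read afterward.
        CURLINFO_HEADER_OUT => true,
    ));

    curl_exec($ch);

    // Passing CURLINFO_HEADER_OUT limits the result to the request string.
    return curl_getinfo($ch, CURLINFO_HEADER_OUT);
}
```

Calling the helper with a reachable URL should yield the request line and headers that cURL sent, which is useful when comparing them against what a browser sends for the same page.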
Cookies<br />
Here is a quick list of pertinent points.
• After the first curl_exec call, cURL will have captured the value of the
Set-Cookie response header returned by the server, to be stored in the file
referenced by '/path/to/file' on the local filesystem as per the CURLOPT_COOKIEJAR
setting (libcurl writes the jar file itself when the handle is closed).
This setting value will persist through the second curl_exec call.
• When the second curl_exec call takes place, the CURLOPT_COOKIEFILE setting
will also point to '/path/to/file'. This will cause cURL to read the contents of