php|architect's Guide to Web Scraping with PHP - Wind Business ...
cURL Extension<br />
operate unpredictably in a threaded environment such as Windows Server or *NIX<br />
running a threaded Apache MPM such as worker.<br />
If you are using the HTTP streams wrapper or either of the PHP-based HTTP client<br />
libraries covered in this chapter and you have access to install software on your<br />
server, you may want to install a local DNS caching daemon to improve performance.<br />
Try nscd or dnsmasq on *NIX. Writing DNS caching into your own client<br />
will be covered in a later chapter on writing your own HTTP client.<br />
Timeouts<br />
CURLOPT_CONNECTTIMEOUT is the maximum amount of time, in seconds, that a connection<br />
attempt may take during a cURL operation. It can be set to 0 to disable<br />
this limit, but this is inadvisable in a production environment. Note that this time<br />
includes DNS lookups. For environments where the DNS server in use or the web<br />
server hosting the target application is not particularly responsive, it may be necessary<br />
to increase the value of this setting.<br />
CURLOPT_TIMEOUT is the maximum amount of time, in seconds, that the execution<br />
of an individual cURL extension function call may take. Note that the value<br />
for this setting should include the value for CURLOPT_CONNECTTIMEOUT. In other words,<br />
CURLOPT_CONNECTTIMEOUT is a segment of the time represented by CURLOPT_TIMEOUT, so<br />
the value of the latter should be greater than the value of the former.<br />
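The relationship between the two settings can be sketched in a few lines. This is a minimal illustration and not code from the book: the helper name buildTimeoutOptions(), the 5 and 30 second values, and the placeholder URL are all assumptions chosen for the example.

```php
<?php
// Hypothetical helper (an assumption for illustration): builds both timeout
// options and enforces that the overall limit exceeds the connection limit.
function buildTimeoutOptions(int $connectSeconds, int $totalSeconds): array
{
    if ($totalSeconds <= $connectSeconds) {
        throw new InvalidArgumentException(
            'CURLOPT_TIMEOUT should be greater than CURLOPT_CONNECTTIMEOUT'
        );
    }

    return [
        // Limit for establishing the connection, DNS lookup included.
        CURLOPT_CONNECTTIMEOUT => $connectSeconds,
        // Limit for the entire call, of which the connection is one segment.
        CURLOPT_TIMEOUT => $totalSeconds,
    ];
}

// Placeholder URL; the handle is only configured here, not executed.
$ch = curl_init('http://localhost.example/');
curl_setopt_array(
    $ch,
    buildTimeoutOptions(5, 30) + [CURLOPT_RETURNTRANSFER => true]
);
```

Calling curl_exec($ch) on this handle would then fail with an error rather than block indefinitely if either limit is exceeded. Rejecting a total timeout that does not exceed the connection timeout is a design choice of this sketch, not something the cURL extension enforces.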
Request Pooling<br />
Because it is written in C, the cURL extension has one feature that cannot be replicated<br />
exactly in libraries written in PHP: the ability to run multiple requests in parallel.<br />
This means that multiple requests can be provided to cURL all at once and,<br />
rather than waiting for the response to the first request before moving<br />
on to sending the second, cURL sends all of the requests and processes responses as they are<br />
returned. This can significantly shorten the time required to collectively complete<br />
all the requests. However, care should be taken not to overload a single host with<br />
requests when using this feature.<br />
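As a rough sketch of how parallel requests might look with the extension's curl_multi_* functions: the function name fetchAll(), its parameters, and the choice to report failed requests as null are assumptions made for this example, and it applies no per-host throttling of the kind the caution above calls for.

```php
<?php
// Hypothetical helper (an assumption for illustration): fetches several URLs
// in parallel and returns the response bodies keyed like the input array.
// A request that fails is reported as null.
function fetchAll(array $urls, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT => $timeoutSeconds,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers at once until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // select failed; pause briefly to avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the per-request result codes reported by the multi handle.
    $failed = [];
    while ($info = curl_multi_info_read($multi)) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = array_search($info['handle'], $handles, true);
        }
    }

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = in_array($key, $failed, true)
            ? null
            : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }

    return $bodies;
}
```

A call such as fetchAll(['home' => 'http://localhost.example/', 'feed' => 'http://localhost.example/feed']) (placeholder URLs) would return an array with the keys 'home' and 'feed'. When many URLs point at the same host, splitting them into smaller batches is one way to avoid overloading it.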