03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...



cURL Extension

operate unpredictably in a threaded environment such as Windows Server or *NIX running a threaded Apache MPM such as worker.

If you are using the HTTP streams wrapper or either of the PHP-based HTTP client libraries covered in this chapter and you have access to install software on your server, you may want to install a local DNS caching daemon to improve performance. Try nscd or dnsmasq on *NIX. Writing DNS caching into your own client will be covered in a later chapter on writing your own HTTP client.
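As a rough illustration of what client-side DNS caching can look like, the sketch below keeps resolved addresses in memory for a fixed lifetime. The `DnsCache` class, its injectable resolver, and the 300-second lifetime are assumptions made for this example, not the approach taken in the later chapter; the default resolver is PHP's built-in `gethostbyname()`.

```php
<?php
// Hypothetical in-process DNS cache: a sketch, not the book's implementation.
final class DnsCache
{
    /** @var callable Function that maps a hostname to an IP address. */
    private $resolver;
    /** @var int Lifetime of a cached entry, in seconds (assumed value). */
    private $ttl;
    /** @var array Cached entries keyed by hostname. */
    private $entries = [];

    public function __construct(?callable $resolver = null, int $ttl = 300)
    {
        // Fall back to the built-in resolver when none is injected.
        $this->resolver = $resolver ?: 'gethostbyname';
        $this->ttl = $ttl;
    }

    public function resolve(string $host): string
    {
        $now = time();
        if (isset($this->entries[$host]) && $this->entries[$host]['expires'] > $now) {
            return $this->entries[$host]['ip']; // Cache hit: no lookup performed.
        }

        $ip = call_user_func($this->resolver, $host);
        // gethostbyname() returns the hostname unchanged on failure,
        // so anything that is not a valid IP is treated as an error.
        if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
            throw new RuntimeException("Could not resolve host: $host");
        }

        $this->entries[$host] = ['ip' => $ip, 'expires' => $now + $this->ttl];
        return $ip;
    }
}
```

A client could call `$cache->resolve($host)` once per request and connect to the returned address, so that repeated requests to the same host within the lifetime skip the lookup.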

Timeouts

CURLOPT_CONNECTTIMEOUT is the maximum amount of time, in seconds, to which a connection attempt will be restricted for a cURL operation. It can be set to 0 to disable this limit, but this is inadvisable in a production environment. Note that this time includes DNS lookups. For environments where the DNS server in use or the web server hosting the target application is not particularly responsive, it may be necessary to increase the value of this setting.

CURLOPT_TIMEOUT is the maximum amount of time, in seconds, to which the execution of individual cURL extension function calls will be limited. Note that the value for this setting should include the value for CURLOPT_CONNECTTIMEOUT. In other words, CURLOPT_CONNECTTIMEOUT is a segment of the time represented by CURLOPT_TIMEOUT, so the value of the latter should be greater than the value of the former.
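The relationship between the two settings can be sketched in code. The helper names `buildTimeoutOptions()` and `fetchWithTimeouts()`, and the values of 5 and 30 seconds, are assumptions chosen for illustration; only the `CURLOPT_*` constants and `curl_*` functions come from the cURL extension itself.

```php
<?php
// Hypothetical helper: returns timeout options after checking that the
// total timeout is greater than the connection timeout it contains.
function buildTimeoutOptions(int $connectTimeout, int $totalTimeout): array
{
    if ($connectTimeout < 0 || $totalTimeout <= $connectTimeout) {
        throw new InvalidArgumentException(
            'CURLOPT_TIMEOUT should be greater than CURLOPT_CONNECTTIMEOUT'
        );
    }

    return [
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // includes DNS lookups
        CURLOPT_TIMEOUT        => $totalTimeout,   // whole cURL call
    ];
}

// Hypothetical wrapper: performs one GET request with both limits applied.
function fetchWithTimeouts(string $url, int $connectTimeout = 5, int $totalTimeout = 30): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt_array($ch, buildTimeoutOptions($connectTimeout, $totalTimeout));

    $body = curl_exec($ch);
    if ($body === false) {
        // Distinguish a timeout from other transport errors.
        $message = curl_errno($ch) === CURLE_OPERATION_TIMEDOUT
            ? 'timed out'
            : curl_error($ch);
        throw new RuntimeException("Request to $url failed: $message");
    }

    return $body;
}
```

A call such as `fetchWithTimeouts('http://example.com/', 10, 60)` would allow more time for a slow DNS server or web server while still bounding the request as a whole.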

Request Pooling

Because it is written in C, the cURL extension has one feature that cannot be replicated exactly in libraries written in PHP: the ability to run multiple requests in parallel. What this means is that multiple requests can be provided to cURL all at once and, rather than waiting for a response to be received for the first request before moving on to sending the second, all requests will be sent and processed as responses are returned. This can significantly shorten the time required to collectively complete all the requests. However, care should be taken not to overload a single host with requests when using this feature.
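A minimal sketch of request pooling with the extension's `curl_multi_*` functions follows. The function names `fetchAll()` and `fetchInBatches()`, the timeout values, and the batch size of 4 are assumptions for illustration; batching is just one simple way to avoid sending too many simultaneous requests to a single host.

```php
<?php
// Hypothetical helper: runs all given URLs in parallel and returns the
// response bodies keyed like the input, with false for failed transfers.
function fetchAll(array $urls, int $connectTimeout = 5, int $totalTimeout = 30): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_CONNECTTIMEOUT => $connectTimeout,
            CURLOPT_TIMEOUT        => $totalTimeout,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for activity on any transfer, for at most one second.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the keys of transfers that finished with an error.
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = array_search($info['handle'], $handles, true);
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = in_array($key, $failed, true)
            ? false
            : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}

// Hypothetical helper: limits parallelism by processing URLs in batches.
function fetchInBatches(array $urls, int $batchSize = 4): array
{
    $results = [];
    foreach (array_chunk($urls, max(1, $batchSize), true) as $batch) {
        $results += fetchAll($batch); // keys are preserved across batches
    }
    return $results;
}
```

For example, `fetchInBatches(['a' => 'http://example.com/1', 'b' => 'http://example.com/2'], 2)` would return an array with the keys `a` and `b`, each holding either a response body or `false`.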
