03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...

PEAR::HTTP_Client

Using the Client

HTTP_Client persists explicit sets of headers and request parameters across multiple requests; these are set using the setDefaultHeader and setRequestParameter methods, respectively. The client constructor also accepts arrays of these. Default headers and request parameters can be cleared by calling reset on the client instance.
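
As a minimal sketch of these methods, assuming the PEAR HTTP_Client package is installed and on the include path: the header values and timeouts below are illustrative and not taken from the text, and the constructor is assumed to take request parameters first and headers second.

```php
<?php
require_once 'HTTP/Client.php';

// Assumed argument order: default request parameters, then default headers.
$client = new HTTP_Client(
    array('timeout' => 10),                      // illustrative request parameter
    array('User-Agent' => 'ExampleScraper/1.0')  // illustrative header
);

// Defaults can also be added or changed after construction.
$client->setDefaultHeader('Accept-Language', 'en');
$client->setRequestParameter('readTimeout', array(5, 0));

// ... requests issued here would all carry the defaults above ...

// Clear the default headers and request parameters.
$client->reset();
```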

Internally, the client class actually creates a new instance of HTTP_Request per request. The request operation is set depending on which of the client instance methods is called; get, head, and post are supported.
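
The three request methods might be exercised as in the sketch below. The function name demoRequestMethods, the paths, and the form fields are hypothetical placeholders; only get, head, and post come from the text.

```php
<?php
require_once 'HTTP/Client.php';

/**
 * Issues one request of each supported type against a base URL.
 * The paths and fields are placeholders chosen for illustration.
 */
function demoRequestMethods(HTTP_Client $client, $baseUrl)
{
    $results = array();

    // Each call causes the client to build a fresh HTTP_Request internally.
    $results['head'] = $client->head($baseUrl . '/');
    $results['get']  = $client->get($baseUrl . '/search', array('q' => 'php'));
    $results['post'] = $client->post($baseUrl . '/login', array('user' => 'alice'));

    return $results;
}
```

Calling demoRequestMethods(new HTTP_Client(), 'http://example.com') would send all three requests in sequence through the same client.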

The capabilities of the client described up to this point can all be accomplished by reusing the same request instance for multiple requests. However, the client also handles two things that the request class does not: cookies and redirects.

By default, cookies are persisted automatically across requests without any additional configuration. HTTP_Client_CookieManager is used internally for this. For custom cookie handling, this class can be extended and an instance of it passed as the third parameter to the client constructor. If this is done, that instance will be used rather than an instance of the native cookie manager class being created by default.
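
A custom cookie manager could look something like the following sketch. DiscardingCookieManager is a hypothetical class, and the sketch assumes that updateCookies is the method the client invokes on the manager after each response; check this against the installed package version before relying on it.

```php
<?php
require_once 'HTTP/Client.php';
require_once 'HTTP/Client/CookieManager.php';

/**
 * Hypothetical manager that never stores cookies received from servers.
 */
class DiscardingCookieManager extends HTTP_Client_CookieManager
{
    // Assumption: the client calls this after each response is received.
    // Doing nothing here means response cookies are never persisted.
    function updateCookies(&$request)
    {
    }
}

// The custom manager is passed as the third constructor parameter.
$manager = new DiscardingCookieManager();
$client  = new HTTP_Client(null, null, $manager);
```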

The maximum number of redirects to process can be set using the setMaxRedirects method of the client class. Internally, requests will be created and sent as needed to process the redirect until a non-redirecting response is received or the maximum redirect limit is reached. In the former case, the client method being called will return an integer containing the response code, rather than true as the request class does. In the latter case, the client method will return an error instance. Note that the client class will process redirects contained in meta tags of HTML documents in addition to those performed at the HTTP level.
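
The two return cases might be handled as in this sketch. The function name fetchWithRedirectLimit and the limit of 3 are illustrative assumptions, and the error check assumes the error instance is a standard PEAR error that PEAR::isError recognizes.

```php
<?php
require_once 'PEAR.php';
require_once 'HTTP/Client.php';

/**
 * Fetches a URL, following at most $maxRedirects redirects.
 * Returns the integer response code, or null if the client returned an error.
 */
function fetchWithRedirectLimit(HTTP_Client $client, $url, $maxRedirects = 3)
{
    $client->setMaxRedirects($maxRedirects);
    $result = $client->get($url);

    if (PEAR::isError($result)) {
        // For example, the redirect limit was reached.
        error_log('Request failed: ' . $result->getMessage());
        return null;
    }

    // Integer response code of the final, non-redirecting response.
    return $result;
}
```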

To retrieve information for the last response received, use the currentResponse method of the client instance. It will return an associative array containing the keys 'code', 'headers', and 'body' with values corresponding to the return values of request methods getResponseCode, getResponseHeader, and getResponseBody respectively.
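
To illustrate the shape of that array, here is a small helper that summarizes it. The function describeResponse is hypothetical; it relies only on the three keys named above and assumes 'headers' holds an array of header values.

```php
<?php
/**
 * Builds a one-line summary from an array shaped like the one
 * returned by currentResponse ('code', 'headers', 'body').
 */
function describeResponse(array $response)
{
    return sprintf(
        'HTTP %d: %d header(s), %d byte body',
        $response['code'],
        count($response['headers']),
        strlen($response['body'])
    );
}

// Intended usage with a client that has already made a request:
//   echo describeResponse($client->currentResponse());
```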

By default, all responses are stored and can be accessed individually as shown below. To disable storage of all responses except the last one, call enableHistory on the client instance and pass it false.
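
Turning off response history takes a single call, sketched here on a freshly constructed client. The remark about memory is an assumption about why one might do this, not a claim from the text.

```php
<?php
require_once 'HTTP/Client.php';

$client = new HTTP_Client();

// Keep only the most recent response; this may help limit memory use
// when a single client instance issues many requests.
$client->enableHistory(false);
```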
