03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...




set, it will persist for the duration of the client session. For normal web browsers, this is generally when all instances of the browser application have been closed.

Redirection

The Location header is used by the server to redirect the client to a URI. In this scenario, the response will most likely include a 3xx class status code (such as 302 Found), but may also include a 201 code to indicate the creation of a new resource. See subsection 14.30 of RFC 2616 for more information.
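
To make the relationship between the status code and the Location header concrete, here is a minimal sketch that inspects the head of a raw response. The function name `locationTarget` and the sample URLs are made up for illustration; a real client library would expose this differently.

```php
<?php
// Hypothetical helper: returns the value of the Location header when the
// response head carries a 3xx or 201 status code, or null otherwise.
function locationTarget(string $rawHead): ?string
{
    $lines = preg_split('/\r\n|\n/', trim($rawHead));

    // The status line looks like "HTTP/1.1 302 Found".
    if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', array_shift($lines), $m)) {
        return null;
    }

    $status = (int) $m[1];
    $isRedirect = $status >= 300 && $status < 400;
    // 201 Created may also use Location to point at the new resource.
    if (!$isRedirect && $status !== 201) {
        return null;
    }

    foreach ($lines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }

    return null;
}

$head = "HTTP/1.1 302 Found\r\nLocation: http://example.com/new\r\nContent-Length: 0";
echo locationTarget($head), PHP_EOL;
```

The sketch only reads the header. Deciding whether to actually request the new URI is left to the caller.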

It is hypothetically possible for a malfunctioning application to cause the server to initiate an infinite series of redirections between itself and the client. For this reason, client libraries often implement a limit on the number of consecutive redirections they will process before assuming that the application being accessed is behaving inappropriately and terminating the request. Libraries generally implement a default limit, but allow you to override it with your own.
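
A redirection limit of this kind can be sketched as follows. This is an illustration rather than the behavior of any particular library: `followRedirects`, the `$fetch` callable, and the array shape it returns are all assumptions, and the server is simulated so that the example runs without network access.

```php
<?php
// Sketch: follow Location headers for at most $maxRedirects hops.
// $fetch is a hypothetical callable that takes a URL and returns
// ['status' => int, 'location' => ?string].
function followRedirects(callable $fetch, string $url, int $maxRedirects = 5): string
{
    for ($hops = 0; $hops <= $maxRedirects; $hops++) {
        $response = $fetch($url);
        $isRedirect = $response['status'] >= 300 && $response['status'] < 400;

        if (!$isRedirect || empty($response['location'])) {
            return $url; // Final resource reached.
        }

        $url = $response['location'];
    }

    // Still being redirected after the allowed number of hops.
    throw new RuntimeException("Gave up after more than $maxRedirects redirects");
}

// Simulated application stuck in a loop between two URLs.
$loop = function (string $url): array {
    return ['status' => 302, 'location' => $url === '/a' ? '/b' : '/a'];
};

try {
    followRedirects($loop, '/a', 3);
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

In PHP's cURL extension, the `CURLOPT_FOLLOWLOCATION` and `CURLOPT_MAXREDIRS` options cover the same ground, so a hand-written loop like this is mostly useful for seeing where the limit is applied.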

Referring URLs

It is possible for a requested resource to refer to other resources in some way. When this happens, clients traditionally include the URL of the referring resource in the Referer header. Yes, the header name is misspelled there and intentionally so. The commonality of that particular misspelling caused it to end up in the official HTTP specification, thereby becoming the standard industry spelling used when referring to that particular header.

There are multiple situations in which the specification of a referer can occur. A user may click on a hyperlink in a browser, in which case the full URL of the resource containing the hyperlink would be the referer. When a resource containing markup with embedded images is requested, subsequent requests for those images will contain the full URL of the page containing the images as the referer. A referer is also specified when redirection occurs, as described in the previous section.
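
The embedded image case might look like the following on the wire. This is a rough sketch: `buildRequestHead` is a hypothetical helper, the URLs are placeholders, and a real client would send more headers than the two or three shown here.

```php
<?php
// Hypothetical helper: builds the head of a GET request, adding a
// Referer header when the URL of the referring resource is known.
function buildRequestHead(string $url, ?string $referer = null): string
{
    $parts = parse_url($url);
    $path = ($parts['path'] ?? '/')
        . (isset($parts['query']) ? '?' . $parts['query'] : '');

    $lines = [
        "GET $path HTTP/1.1",
        'Host: ' . $parts['host'],
    ];

    if ($referer !== null) {
        // Note the standard (misspelled) header name.
        $lines[] = "Referer: $referer";
    }

    // A blank line terminates the request head.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

$page = 'http://example.com/gallery.html';

// The page itself is requested without a referer...
echo buildRequestHead($page);

// ...while the request for an image embedded in it names the page.
echo buildRequestHead('http://example.com/images/cat.png', $page);
```

When using PHP's cURL extension, the same header can be set with the `CURLOPT_REFERER` option instead of assembling the request by hand.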

This is relevant because some applications depend on the value of the Referer header by design, which is less than ideal for the simple fact that the header value can be spoofed. In any case, it is important to be aware that some applications may not function as expected if the provided header value is not consistent with
