
Foundations of Python Network Programming 978-1-4302-3004-5


CHAPTER 9

HTTP

The protocols of yore tended to be dense, binary, and decipherable only by Boolean machine logic. But the workhorse protocol of the World Wide Web, named the Hypertext Transfer Protocol (HTTP), is instead based on friendly, mostly-human-readable text. There is probably no better way to start this chapter than to show you what an actual request and response look like; that way, you will already know the layout of a whole request as we start digging into each of its features.

Consider what happens when you ask the urllib2 module in the Python Standard Library to open this URL, which is the RFC that defines the HTTP protocol itself: www.ietf.org/rfc/rfc2616.txt

The library will connect to the IETF web site and send it an HTTP request that looks like this:

GET /rfc/rfc2616.txt HTTP/1.1
Accept-Encoding: identity
Host: www.ietf.org
Connection: close
User-Agent: Python-urllib/2.6
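The request above can also be assembled by hand, which makes its layout easier to see. The following is only a minimal sketch: `build_get_request` is a hypothetical helper written for illustration, not a function provided by urllib2, and it assumes nothing beyond the five lines shown. Each line is ended with a carriage return and linefeed, and an empty line marks the end of the headers, as HTTP requires.

```python
def build_get_request(host, path, user_agent="Python-urllib/2.6"):
    """Return the raw bytes of a simple HTTP/1.1 GET request.

    Hypothetical helper for illustration only; urllib2 builds
    its request internally and does not expose this function.
    """
    lines = [
        "GET %s HTTP/1.1" % path,          # request line: method, path, version
        "Accept-Encoding: identity",
        "Host: %s" % host,
        "Connection: close",
        "User-Agent: %s" % user_agent,
    ]
    # Every line ends with CRLF; one extra CRLF (an empty line)
    # tells the server that the headers are complete.
    text = "\r\n".join(lines) + "\r\n\r\n"
    return text.encode("ascii")


request = build_get_request("www.ietf.org", "/rfc/rfc2616.txt")
print(request.decode("ascii"))
```

Bytes built this way could be written to a TCP socket connected to port 80 of the host, though in practice a library would normally do this for you.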

As you can see, the format of this request is very much like that of the headers of an e-mail message. In fact, both HTTP and e-mail messages define their header layout using the same standard: RFC 822. The HTTP response that comes back over the socket also starts with a set of headers, but then also includes a body that contains the requested document itself (which I have truncated):

HTTP/1.1 200 OK
Date: Wed, 27 Oct 2010 17:12:01 GMT
Server: Apache/2.2.4 (Linux/SUSE) mod_ssl/2.2.4 OpenSSL/0.9.8e PHP/5.2.6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.1 mod_perl/2.0.3 Perl/v5.8.8
Last-Modified: Fri, 11 Jun 1999 18:46:53 GMT
ETag: "1cad180-67187-31a3e140"
Accept-Ranges: bytes
Content-Length: 422279
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain

Network Working Group                                      R. Fielding
Request for Comments: 2616                                   UC Irvine
Obsoletes: 2068                                              J. Gettys
Category: Standards Track                                   Compaq/W3C
...

Note that those last four lines are the beginning of RFC 2616 itself, not part of the HTTP protocol.
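Because HTTP headers share their layout with e-mail headers, the Standard Library's e-mail parser can read them. The following is a rough sketch under that assumption, using a few of the headers copied from the response above; the status line is left out because it is not itself a header, and the variable names are chosen only for illustration.

```python
from email.parser import Parser

# A few headers copied from the response shown above, followed by
# the blank line that separates headers from the body.
raw_headers = """\
Date: Wed, 27 Oct 2010 17:12:01 GMT
Last-Modified: Fri, 11 Jun 1999 18:46:53 GMT
Content-Length: 422279
Content-Type: text/plain

"""

# headersonly=True tells the parser to stop after the header block
# instead of trying to interpret a message body.
message = Parser().parsestr(raw_headers, headersonly=True)

content_type = message["Content-Type"]
# Header names are matched without regard to case.
content_length = int(message["content-length"])

print("%s, %d bytes" % (content_type, content_length))
```

The parser returns a message object that behaves much like a dictionary, which is why the header values can be looked up by name.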

Two of the most important features of this format are not actually visible here, because they pertain to whitespace. First, every header line is concluded by a two-byte carriage-return linefeed sequence, or '\r\n' in Python. Second, both sets of headers are terminated—in HTTP, headers are always
