09.11.2016 Views

Foundations of Python Network Programming 978-1-4302-3004-5

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CHAPTER 9 ■ HTTP<br />

Server: Apache<br />

Content-Length: 285<br />

Connection: close<br />

Content-Type: text/html; charset=iso-8859–1<br />

As you can see, this particular web site does include a human-readable document with a 404 error;<br />

the response declares it to be an HTML page that is exactly 285 octets in length. (We will learn more<br />

about content length and types later in the chapter.) Like any HTTP response object, this exception can<br />

be queried for its status code; it can also be read like a file to see the returned page:<br />

>>> e.code<br />

404<br />

>>> e.msg<br />

'Not Found'<br />

>>> e.readline()<br />

'\n'<br />

If you try reading the rest <strong>of</strong> the file, then deep inside <strong>of</strong> the HTML you will see the actual error<br />

message that a web browser would display for the user:<br />

>>> e.read()<br />

'...The requested URL /better-living-through-http was not found<br />

on this server...'<br />

Redirections are very common on the World Wide Web. Conscientious web site programmers, when<br />

they undertake a major redesign, will leave 301 redirects sitting at all <strong>of</strong> their old-style URLs for the sake<br />

<strong>of</strong> bookmarks, external links, and web search results that still reference them. But the volume <strong>of</strong><br />

redirects might be even greater for the many web sites that have a preferred host name that they want<br />

displayed for users, yet also allow users to type any <strong>of</strong> several different hostnames to bring the site up.<br />

The issue <strong>of</strong> whether a site name begins with www` looms very large in this area. Google, for example,<br />

likes those three letters to be included, so an attempt to open the Google home page with the hostname<br />

google.com will be met with a redirect to the preferred name:<br />

>>> info = opener.open('http://google.com/')<br />

--------------------------------------------------<br />

GET / HTTP/1.1<br />

...<br />

Host: google.com<br />

...<br />

-------------------- Response --------------------<br />

HTTP/1.1 301 Moved Permanently<br />

Location: http://www.google.com/<br />

...<br />

--------------------------------------------------<br />

GET / HTTP/1.1<br />

...<br />

Host: www.google.com<br />

...<br />

-------------------- Response --------------------<br />

HTTP/1.1 200 OK<br />

...<br />

You can see that urllib2 has followed the redirect for us, so that the response shows only the final<br />

200 response code:<br />

>>> info.code<br />

200<br />

146

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!