
Foundations of Python Network Programming 978-1-4302-3004-5


CHAPTER 9 ■ HTTP

Relative URLs

Very often, the links used in web pages are not full URLs but relative URLs that are missing several of the usual components. When one of these links needs to be resolved, the client fills in the missing information with the corresponding fields from the URL that was used to fetch the page in the first place.
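
To make "missing components" concrete, here is a minimal sketch that splits a relative link and a full page URL and compares their fields. It assumes Python 3, where the `urlparse` module used in this chapter lives in `urllib.parse`; the link `'grants'` is only an illustrative value.

```python
from urllib.parse import urlsplit

# The URL the page was fetched from, and one relative link found on it.
page_url = urlsplit('http://www.python.org/psf/')
link = urlsplit('grants')  # illustrative relative link

# The relative link has no scheme and no host; only a path survives.
for name in ('scheme', 'netloc', 'path'):
    print(name, repr(getattr(link, name)), repr(getattr(page_url, name)))
```

The empty `scheme` and `netloc` fields are exactly what the client must borrow from the page URL.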

Relative URLs are convenient for web page designers, not only because they are shorter and thus easier to type, but also because the links keep working if an entire sub-tree of a web site is moved somewhere else. The simplest relative links are the names of pages one level deeper than the base page:

>>> import urlparse
>>> urlparse.urljoin('http://www.python.org/psf/', 'grants')
'http://www.python.org/psf/grants'
>>> urlparse.urljoin('http://www.python.org/psf/', 'mission')
'http://www.python.org/psf/mission'

Note the crucial importance of the trailing slash in the URLs we just gave to the urljoin() function! Without the trailing slash, the function will decide that the current directory (officially called the base URL) is / rather than /psf/; therefore, it will replace the psf component entirely:

>>> urlparse.urljoin('http://www.python.org/psf', 'grants')
'http://www.python.org/grants'
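
When a base URL is known to name a directory but may arrive without its trailing slash, one option is to normalize it before joining. The following is only a sketch under that assumption: `as_directory()` is a hypothetical helper, not part of the standard library, and the code uses the Python 3 module name `urllib.parse`.

```python
from urllib.parse import urljoin

def as_directory(url):
    """Hypothetical helper: ensure the URL ends with a slash."""
    return url if url.endswith('/') else url + '/'

# Without the slash, 'psf' is treated as a page name and is replaced.
without_slash = urljoin('http://www.python.org/psf', 'grants')

# With the slash restored, 'psf' is kept as the current directory.
with_slash = urljoin(as_directory('http://www.python.org/psf'), 'grants')

print(without_slash)
print(with_slash)
```

This is safe only when the caller really knows the URL refers to a directory; appending a slash to a page URL, or to one that carries a query string, would change its meaning.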

As with file system paths on POSIX and Windows operating systems, . names the current directory and .. names its parent:

>>> urlparse.urljoin('http://www.python.org/psf/', './mission')
'http://www.python.org/psf/mission'
>>> urlparse.urljoin('http://www.python.org/psf/', '../news/')
'http://www.python.org/news/'
>>> urlparse.urljoin('http://www.python.org/psf/', '/dev/')
'http://www.python.org/dev/'

And, as illustrated in the last example, a relative URL that starts with a slash is assumed to live at the top level of the same site as the original URL.
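
The same rules apply when the base URL names a page rather than a directory: the final path segment is dropped before the relative link is applied. The sketch below assumes Python 3 (`urllib.parse`) and uses a made-up page name, `index.html`, purely for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base: a page inside the /psf/ directory.
base = 'http://www.python.org/psf/index.html'

results = {
    link: urljoin(base, link)
    for link in ('grants', './mission', '../news/', '/dev/')
}

for link, absolute in results.items():
    print(link, '->', absolute)
```

Here `index.html` plays the role that `psf` played in the slash-less example earlier: it is treated as the page name and is replaced rather than extended.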

Happily, the urljoin() function ignores the base URL entirely if the second argument is itself an absolute URL. This means that you can simply pass every URL on a given web page to the urljoin() function: any relative links will be converted, while absolute links will be passed through untouched:

# Absolute links are safe from change
>>> urlparse.urljoin('http://www.python.org/psf/', 'http://yelp.com/')
'http://yelp.com/'
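
Passing every link on a page through urljoin() can be sketched as a single list comprehension. The list of `hrefs` below is invented for illustration, and the code assumes Python 3, where the function is imported from `urllib.parse`.

```python
from urllib.parse import urljoin

page_url = 'http://www.python.org/psf/'

# Hypothetical mix of relative and absolute links found on the page.
hrefs = ['grants', '../news/', '/dev/', 'http://yelp.com/']

# Relative links are resolved against page_url; absolute ones pass through.
absolute = [urljoin(page_url, href) for href in hrefs]

for href, url in zip(hrefs, absolute):
    print(href, '->', url)
```

No test for "is this link already absolute?" is needed, because urljoin() makes that decision itself.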

As we will see in the next chapter, converting relative to absolute URLs is important whenever we are packaging content that lives under one URL so that it can be displayed at a different URL.

Instrumenting urllib2<br />

We now turn to the HTTP protocol itself. Although its on-the-wire appearance is usually an internal detail handled by web browsers and libraries like urllib2, we are going to adjust the behavior of urllib2 so that we can see the protocol printed to the screen. Take a look at Listing 9–1.
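
Separately from that listing, a quick way to watch the exchange is the `debuglevel` option of the standard HTTP handler. This is a rough sketch rather than the technique the listing uses, and it assumes Python 3, where the functionality of urllib2 lives in `urllib.request`. `make_verbose_opener()` is a hypothetical helper name.

```python
import urllib.request

def make_verbose_opener():
    """Build an opener whose plain-HTTP connections echo traffic to stdout."""
    handler = urllib.request.HTTPHandler(debuglevel=1)
    return urllib.request.build_opener(handler)

# Building the opener does not touch the network.
opener = make_verbose_opener()
```

Calling `opener.open()` on a plain `http://` URL should then print the outgoing request as a `send:` line, followed by the reply status and headers. Connections made over HTTPS go through a different handler and are not affected by this sketch.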
