04.08.2014 Views

o_18ufhmfmq19t513t3lgmn5l1qa8a.pdf

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CHAPTER 15<br />

■ ■ ■<br />

Python and the Web<br />

This chapter tackles some aspects of Web programming with Python. This is a really vast area,<br />

but I've selected four main topics for your amusement: screen scraping, CGI, mod_python,<br />

and Web services. For extended examples using CGI, see Chapters 25 and 26. For an example<br />

of using the specific Web service protocol XML-RPC, see Chapter 27.<br />

WHAT ABOUT ZOPE?<br />

One area that I do not cover here, because it is really a topic worthy of books of its own (and such books have<br />

been written), is Web development with Zope. If you want to know more about Zope, a pure Python application<br />

server for Web development, you should check the Zope Web site (http://zope.org). A content management<br />

system built on Zope that is getting quite popular is Plone (http://plone.org). It, too, is definitely<br />

worth a look.<br />

There are many other Web development frameworks available for Python as well, some based on Zope<br />

and some not. Some options include Twisted (discussed in Chapter 14), along with its built-in Web server,<br />

CherryPy (http://cherrypy.org), SkunkWeb (http://skunkweb.sf.net), Webware for Python<br />

(http://webwareforpython.org), Quixote (http://mems-exchange.org/software/quixote),<br />

Spyce (http://spyce.sf.net), and Albatross (http://www.object-craft.com.au/projects/<br />

albatross). For a discussion of these options, see The Web Framework Shootout (http://colorstudy.<br />

com/docs/shootout.html). You should also check out the Python Wiki page on Web programming<br />

(http://wiki.python.org/moin/WebProgramming).<br />

Screen Scraping<br />

Screen scraping is a process whereby your program downloads Web pages and extracts information<br />

from them. This is a useful technique that pops up every time there is a page online that<br />

has information you want to use in your program. It is especially useful, of course, if the Web<br />

page in question is dynamic, that is, it changes over time. Otherwise, you could just download<br />

it once, and extract the information manually. (The ideal situation is, of course, one where the<br />

information is available through Web services, as discussed later in this chapter.)<br />

Conceptually, the technique is very simple. You download the data and analyze it. You<br />

could, for example, simply use urllib, get the Web page’s HTML source, and then use regular<br />

expressions (see Chapter 10) or some such to extract the information. Let’s say, for example,<br />

313

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!