
314 CHAPTER 15 ■ PYTHON AND THE WEB

that you wanted to extract the various employer names and Web sites from the Python Job Board, at http://python.org/Jobs.html. You browse the source and see that the names and URLs can be found as links in h4 elements, like this (except on one, unbroken line):

<h4><a href="…">Google</a> ...</h4>

Listing 15-1 gives an example program using urllib and re to extract the required information.

Listing 15-1. A Simple Screen Scraping Program

```python
from urllib import urlopen
import re

p = re.compile('<a href="(.*?)">(.*?)</a>')
text = urlopen('http://python.org/Jobs.html').read()
for url, name in p.findall(text):
    print '%s (%s)' % (name, url)
```
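Listing 15-1 is written for Python 2, where urlopen lives in the urllib module and print is a statement. On Python 3 the same program can be sketched as follows; the extraction logic is unchanged, but the fetch is shown commented out (it needs network access, and the page may have changed since the book was written) and exercised here on a small made-up sample instead:

```python
import re
from urllib.request import urlopen  # Python 3 location of urlopen

# The same two-group pattern: group 1 is the URL, group 2 the link text.
p = re.compile(r'<a href="(.*?)">(.*?)</a>')

def extract_jobs(html):
    """Return (url, name) pairs found by the regular expression."""
    return p.findall(html)

# Fetching the real page would look like this:
# html = urlopen('http://python.org/Jobs.html').read().decode('utf-8')

# A small stand-in sample in the shape described in the text:
sample = '<h4><a href="http://example.com/jobs">Example Corp</a></h4>'
for url, name in extract_jobs(sample):
    print('%s (%s)' % (name, url))
```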

The code could certainly be improved (for example, by filtering out duplicates), but it does its job pretty well. There are, however, at least three weaknesses with this approach:

• The regular expression isn’t exactly readable. For more complex HTML code and more complex queries, the expressions can become even more hairy and unmaintainable.

• It doesn’t deal with HTML peculiarities like CDATA sections and character entities (such as &amp;). If you encounter such beasts, the program will, most likely, fail.

• The regular expression is tied to details in the HTML source code, rather than some more abstract structure. This means that small changes in how the Web page is structured can break the program.
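The second weakness is easy to demonstrate. Feeding the pattern from Listing 15-1 a link whose text contains a character entity (a made-up snippet, not from the real Job Board) shows that the entity comes through raw:

```python
import re

p = re.compile(r'<a href="(.*?)">(.*?)</a>')

# A hypothetical job listing whose name contains an escaped ampersand:
snippet = '<h4><a href="http://example.com/jobs/1">R&amp;D Engineer</a></h4>'
matches = p.findall(snippet)
print(matches)  # the name still contains the raw '&amp;', not '&'
```

Any code consuming these results would have to decode entities itself, and constructs such as CDATA sections or extra attributes on the a tag would break the match entirely.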

The following sections deal with two possible solutions for the problems posed by the regular expression-based approach. The first is to use a program called Tidy (as a Python library) together with XHTML parsing; the second is to use a library called Beautiful Soup, specifically designed for screen scraping.

Tidy and XHTML Parsing

The Python standard library has plenty of support for parsing structured formats such as HTML and XML (see the Python Library Reference, Section 13, “Structured Markup Processing Tools,” at http://python.org/doc/lib/markup.html). I discuss XML and XML parsing more in depth in Chapter 22; in this section, I just give you the tools needed to deal with XHTML, the most up-to-date dialect of HTML, which just happens to be a form of XML.
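As a taste of what such structured parsing looks like, here is a sketch using the standard library’s event-driven HTML parser (html.parser in Python 3; the HTMLParser module in the Python 2 used by this book) to pull links out of h4 elements. The class and sample markup are made up for illustration; note that, unlike the regular expression, the parser decodes character entities for us:

```python
from html.parser import HTMLParser

class JobLinkParser(HTMLParser):
    """Collect (url, name) pairs from links that appear inside h4 elements."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.in_h4 = False
        self.url = None
        self.text = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h4':
            self.in_h4 = True
        elif tag == 'a' and self.in_h4:
            self.url = dict(attrs).get('href')

    def handle_data(self, data):
        if self.url is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.url is not None:
            self.results.append((self.url, ''.join(self.text)))
            self.url, self.text = None, []
        elif tag == 'h4':
            self.in_h4 = False

parser = JobLinkParser()
parser.feed('<h4><a href="http://example.com/jobs/1">R&amp;D Engineer</a></h4>')
print(parser.results)
```

The parser works from the element structure rather than the raw character stream, so extra attributes, whitespace, or entities no longer break the extraction.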

If every Web page consisted of correct and valid XHTML, the job of parsing it would be quite simple. The problem is that older HTML dialects are a bit more sloppy, and some people don’t even care about the strictures of those sloppier dialects. The reason for this is, probably, that most Web browsers are quite forgiving, and will try to render even the most jumbled and meaningless documents as best they can.
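To see the difference in strictness, compare how a strict XML parser and the forgiving HTML parser from the standard library treat a snippet with an unclosed tag (a minimal sketch; browsers happily render markup like this):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Sloppy but browser-renderable markup: neither <p> is ever closed.
sloppy = '<html><body><p>First paragraph<p>Second paragraph</body></html>'

# A strict XML parser refuses the document outright:
try:
    ET.fromstring(sloppy)
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print('accepted by XML parser:', xml_ok)

# The forgiving HTML parser simply reports the events it sees:
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(sloppy)
print('tags seen by HTML parser:', collector.tags)
```

This is the gap Tidy fills: it rewrites sloppy HTML into well-formed XHTML that strict, XML-based tools can then process.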
