Foundations of Python Network Programming 978-1-4302-3004-5


CHAPTER 5 ■ NETWORK DATA AND NETWORK ERRORS

But you cannot put such strings directly on a network connection without specifying which rival system of encoding you want to use to mix your characters down to bytes. A very popular system is UTF-8, because normal characters are represented by the same codes as in ASCII, and longer sequences of bytes are necessary only for international characters:

>>> elvish.encode('utf-8')
'Nam\xc3\xa1ri\xc3\xab!'

You can see, for example, that UTF-8 represented the letter ë by a pair of bytes with hex values C3 and AB.

Be very sure, by the way, that you understand what it means when Python prints out a normal string like the one just given. The letters strung between quotation characters with no leading u do not inherently represent letters; they do not inherently represent anything until your program decides to do something with them. They are just bytes, and Python is willing to store them for you without having the foggiest idea what they mean.

Other encodings are available in Python; the Standard Library documentation for the codecs module lists them all. They each represent a full system for reducing symbols to bytes. Here are a few examples of the byte strings produced when you try encoding the same word in different ways; because each successive example has less in common with ASCII, you will see that Python's choice to use ASCII to represent the bytes in strings makes less and less sense:

>>> elvish.encode('utf-16')
'\xff\xfeN\x00a\x00m\x00\xe1\x00r\x00i\x00\xeb\x00!\x00'
>>> elvish.encode('cp1252')
'Nam\xe1ri\xeb!'
>>> elvish.encode('idna')
'xn--namri!-rta6f'
>>> elvish.encode('cp500')
'\xd5\x81\x94E\x99\x89SO'

You might be surprised that my first example was the encoding UTF-16, since at first glance it seems to have created a far greater mess than the encodings that follow.
But if you look closely, you will see that it is simply using two bytes (sixteen bits) for each character, so that most of the characters are simply the plain ASCII character that belongs in the string followed by a null character \x00. (Note that the string also begins with a special sequence \xff\xfe that designates the byte order in use; see the next section for more about this concept.)

On the receiving end of such a string, simply take the byte string and call its decode() method with the name of the codec that was used to encode it:

>>> print '\xd5\x81\x94E\x99\x89SO'.decode('cp500')
Namárië!

These two steps, encoding to a byte string and then decoding again on the receiving end, are essential if you are sending real text across the network and want it to arrive intact. Some of the protocols that we will learn about later in this book handle encodings for you (see, for example, the description of HTTP in Chapter 9), but if you are going to write byte strings to raw sockets, then you will not be able to avoid tackling the issue yourself.

Of course, many encodings do not support enough characters to encode all of the symbols in certain pieces of text. The old-fashioned 7-bit ASCII encoding, for example, simply cannot represent the string we have been working with:

>>> elvish.encode('ascii')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 3: ordinal not in range(128)
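The encode-then-decode round trip described above can be sketched for a current interpreter. This is only a minimal sketch and it assumes Python 3, where text is a str and network data is a bytes object, rather than the Python 2 syntax of the sessions in the text. The name elvish follows the text; wire, received, misread, and ascii_ok are hypothetical names chosen for illustration.

```python
# Python 3 sketch (an assumption; the sessions in the text are Python 2).
# Text must be encoded to bytes before sending and decoded with the
# same codec after receiving.
elvish = 'Nam\xe1ri\xeb!'          # the string Namárië! used in the text

wire = elvish.encode('utf-8')      # bytes that could be handed to a socket
received = wire.decode('utf-8')    # what the receiving end would do

# Decoding with a different codec does not raise an error here; it
# quietly produces different text, so both ends must agree on the codec.
misread = wire.decode('cp1252')

# An encoding that lacks a needed character raises UnicodeEncodeError.
try:
    elvish.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(repr(wire))
print(received == elvish, misread == elvish, ascii_ok)
```

The point of the misread line is that a codec mismatch is often silent: nothing fails, yet the text that arrives is not the text that was sent.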

Note that some encodings have the property that every character they are able to encode will be represented by the same number of bytes; ASCII uses one byte for every character, for example, and UTF-32 uses four. If you use one of these encodings, then you can both determine the number of characters in a string by a simple examination of the number of bytes it contains, and jump to character n of the string very efficiently. (Note that UTF-16 does not have this property, since it uses 16 bits for some characters and 32 bits for others.)

Some encodings also add prefix characters that are not part of the string, but help the decoder detect the byte ordering that was used (byte order is discussed in the next section); hence the \xff\xfe prefix that Python's UTF-16 encoder added to the beginning of our string. Read the codecs module documentation and, if necessary, the specifications for particular encodings to learn more about the actions they perform when turning your stream of symbols into bytes.

Note that it is dangerous to decode a partially received message if you are using an encoding that encodes some characters using multiple bytes, since one of those characters might have been split between the part of the message that you have already received and the packets that have not yet arrived. See the section later in this chapter on “Framing” for some approaches to this issue.

Network Byte Order

If all you ever want to send across the network is text, then encoding and framing (which we tackle in the next section) will be your only worries. But sometimes you might want to represent your data in a more compact format than text makes possible. Or you might be writing Python code to interface with a service that has already made the choice to use raw binary data. In either case, you will probably have to start worrying about a new issue: network byte order.
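Before moving on to byte order, the earlier warning about decoding a partially received message can be made concrete. The following is a hedged sketch, again assuming Python 3; the split point and the names message, first_chunk, second_chunk, naive_ok, and text are hypothetical. It uses the incremental decoder from the standard codecs module as one possible way to cope with a split character, and is not meant as a substitute for the framing techniques the chapter goes on to describe.

```python
import codecs

# Python 3 sketch: a UTF-8 message that arrives in two pieces, with the
# split falling between the two bytes that encode the letter á.
message = 'Nam\xe1ri\xeb!'.encode('utf-8')
first_chunk, second_chunk = message[:4], message[4:]

# Decoding the first piece on its own fails: it ends mid-character.
try:
    first_chunk.decode('utf-8')
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

# An incremental decoder keeps the incomplete byte in an internal
# buffer and finishes the character once the remaining bytes arrive.
decoder = codecs.getincrementaldecoder('utf-8')()
text = decoder.decode(first_chunk)
text += decoder.decode(second_chunk, final=True)

print(naive_ok, text == 'Nam\xe1ri\xeb!')
```

With a fixed-width, single-byte encoding such as ASCII the naive decode of a partial message would succeed, which is why this failure tends to appear only once real international text starts to flow.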

To understand the issue of byte order, consider the process of sending an integer over the network. To be specific, think about the integer 4253. Many protocols, of course, will simply transmit this integer as the string '4253', that is, as four distinct characters. The four digits will require at least four bytes to transmit, at least in any common text encoding. And using decimal digits will also involve some computational expense: since numbers are not stored inside computers in base 10, it will take repeated division, with inspection of the remainder, to determine that this number is in fact made of 4 thousands, plus 2 hundreds, plus 5 tens, plus 3 left over. And when the four-digit string '4253' is received, repeated addition and multiplication by powers of ten will be necessary to put the text back together into a number.

Despite its verbosity, the technique of using plain text for numbers may actually be the most popular on the Internet today. Every time you fetch a web page, for example, the HTTP protocol expresses the Content-Length of the result using a string of decimal digits just like '4253'. Both the web server and client do the decimal conversion without a second thought, despite the bit of expense. Much of the story of the last 20 years in networking, in fact, has been the replacement of dense binary formats with protocols that are simple, obvious, and human-readable, even if computationally expensive compared to their predecessors. (Of course, multiplication and division are also cheaper on modern processors than back when binary formats were more common, not only because processors have experienced a vast increase in speed, but because their designers have become much more clever about implementing integer math, so that the same operation requires far fewer cycles today than on the processors of, say, the early 1980s.)

In any case, the string '4253' is not how your computer represents this number as an integer variable in Python.
Instead it will store it as a binary number, using the bits of several successive bytes to represent the one's place, two's place, four's place, and so forth of a single large number. We can glimpse the way that the integer is stored by using the hex() built-in function at the Python prompt:

>>> hex(4253)
'0x109d'
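The hex digits above show the value but not the order in which its bytes would travel over a network. As a hedged illustration, assuming Python 3 and using only the standard int.to_bytes(), int.from_bytes(), and struct.pack() calls, the sketch below contrasts the decimal-text form of 4253 with its two raw bytes in each order. The variable names are hypothetical, and the choice of a 16-bit field is an assumption made only because 4253 fits in two bytes.

```python
import struct

n = 4253

# Decimal text, in the style of an HTTP Content-Length: one byte per digit.
as_text = str(n).encode('ascii')

# The same value as two raw bytes, 0x10 and 0x9d, in either order.
big_endian = n.to_bytes(2, byteorder='big')
little_endian = n.to_bytes(2, byteorder='little')

# In struct format strings, '!' selects network (big-endian) byte order
# and 'H' is an unsigned 16-bit integer.
packed = struct.pack('!H', n)

# Reading the bytes back with the wrong order yields a different number.
misread = int.from_bytes(big_endian, byteorder='little')

print(as_text, big_endian, little_endian, packed, misread)
```

Both binary forms occupy two bytes instead of four, but a receiver that assumes the wrong order will read 0x9d10 rather than 0x109d, which is why a protocol has to state its byte order explicitly.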