
Foundations of Python Network Programming 978-1-4302-3004-5


CHAPTER 5

Network Data and Network Errors

The first four chapters have given us a foundation: we have learned how hosts are named on an IP network, and we understand how to set up and tear down both TCP streams and UDP datagram connections between those hosts.

But what data should we then send across those connections? How should it be encoded and formatted? What kinds of errors will our Python programs need to be prepared for?

These questions are all relevant regardless of whether we are using streams or datagrams. We will look at the basic answers in this chapter and learn how to use sockets responsibly so that our data arrives intact.

Text and Encodings

If you were watching for it as you read the first few chapters, you may have caught me using two different terms for the same concept. Those terms were byte and octet, and by both words I always mean an 8-bit number: an ordered sequence of eight binary digits, each of which is either a one or a zero. Bytes are the fundamental units of data on modern computing systems, used both to represent raw binary numbers and to stand for characters or symbols. The binary number 1010000 (01010000 when all eight bits are written out), for example, usually stands for either the number 80 or the letter P:

>>> 0b1010000
80
>>> chr(0b1010000)
'P'
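The same dual reading can be seen starting from the bytes side. The sketch below assumes Python 3, where indexing a bytes object yields an integer and `decode()` turns bytes into text; the variable names are only illustrative.

```python
# A single octet holding the bit pattern 01010000 (0b1010000 with its leading zero).
octet = bytes([0b01010000])

# Read as a raw binary number: indexing a bytes object gives an int in Python 3.
as_number = octet[0]

# Read as a character: decode the octet using the ASCII mapping.
as_text = octet.decode('ascii')

# Show all eight bits explicitly, followed by both interpretations.
print(format(as_number, '08b'), as_number, as_text)  # → 01010000 80 P
```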

The reason the Internet RFCs are so inveterate in their use of the term “octet” instead of “byte” is that the earliest RFCs date from an era in which bytes could be one of several different lengths: byte sizes from as few as 5 to as many as 16 bits were used on various systems. So the standards always use the term “octet,” meaning a “group of eight things,” so that their meaning is unambiguous.

Four bits offer a mere sixteen values, which is not nearly enough to hold even our 26-letter alphabet. But eight bits, the next-higher power of two, proved more than enough to fit both the upper and lower cases of our alphabet, all the digits, plenty of punctuation, and 32 control codes, and that still left half of the possible range of values empty. The trouble is that many rival systems exist for the specific mapping used to turn characters into bytes, and their differences will garble your text unless both ends of your network connection use the same rules.
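One way to watch the rival mappings disagree is to decode a single byte under several single-byte encodings. The three codecs used below (Latin-1, IBM PC code page 437, and Mac OS Roman) are simply examples chosen from Python's standard codec registry, and the sketch assumes Python 3; any other set of single-byte encodings would make the same point.

```python
# Values representable in 4 and 8 bits, versus the 128 codes that ASCII defines.
print(2 ** 4, 2 ** 8, 2 ** 7)  # → 16 256 128

codecs_to_try = ('latin-1', 'cp437', 'mac_roman')

# A byte inside the ASCII range (0-127) decodes to the same character in all three.
ascii_byte = bytes([0x50])
same = {ascii_byte.decode(codec) for codec in codecs_to_try}
print(same)  # → {'P'}

# A byte from the upper half (128-255) means something different in each mapping.
high_byte = bytes([0xE9])
readings = {codec: high_byte.decode(codec) for codec in codecs_to_try}
for codec, char in readings.items():
    print(codec, repr(char))
```

If a sender writes 0xE9 meaning one of these characters and the receiver reads it under another mapping, no error occurs; the receiver simply sees the wrong letter.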

The use of ASCII for the basic English letters and numbers is nearly universal among network protocols these days. But when you begin to use more interesting characters, you have to be careful. In Python you should always represent a meaningful string of text with a “Unicode string,” which is denoted with a leading u, like this:

>>> elvish = u'Namárië!'
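A socket carries bytes, not characters, so a Unicode string like this one has to be encoded before it is sent and decoded by the receiver using the same rules. The sketch below assumes Python 3, where the u prefix is still accepted but no longer required, and compares UTF-8 with Latin-1 purely as example encodings.

```python
elvish = u'Namárië!'  # eight characters of text

# Two possible byte-level representations of the same text.
utf8_bytes = elvish.encode('utf-8')      # á and ë take two bytes each in UTF-8
latin1_bytes = elvish.encode('latin-1')  # every character here fits in one byte

print(len(elvish), len(utf8_bytes), len(latin1_bytes))  # → 8 10 8

# Receiver decodes with the same encoding the sender used: the text survives.
roundtrip = utf8_bytes.decode('utf-8')
print(roundtrip == elvish)  # → True

# Receiver guesses the wrong encoding: no exception, just garbled text.
print(utf8_bytes.decode('latin-1'))  # → NamÃ¡riÃ«!
```

Because Latin-1 assigns a character to every one of the 256 byte values, decoding with it never raises an error, which is exactly why a mismatch like this can slip through unnoticed unless both ends agree on the encoding in advance.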
