15.04.2013 Views

Core Python Programming (2nd Edition)


Exceptions

UnicodeError is defined in the exceptions module as a subclass of ValueError. All exceptions related to Unicode encoding/decoding should be subclasses of UnicodeError. See also the string encode() method.
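The hierarchy described above can be checked directly. The following is a minimal sketch in Python 3 syntax (where the exceptions are built-ins rather than members of an exceptions module); the helper name safe_encode is an assumption made for illustration, not something from the text.

```python
def safe_encode(text, encoding="ascii"):
    """Return text encoded as bytes, or None if it cannot be encoded.

    safe_encode is a hypothetical helper used only for this illustration.
    """
    try:
        return text.encode(encoding)
    except UnicodeError as exc:
        # UnicodeEncodeError is a subclass of UnicodeError, so it lands here.
        print("encoding failed:", exc)
        return None


# UnicodeError itself derives from ValueError.
print(issubclass(UnicodeError, ValueError))
print(issubclass(UnicodeEncodeError, UnicodeError))

print(safe_encode("abc"))        # plain ASCII text encodes without error
print(safe_encode("caf\xe9"))    # the accented letter is not 7-bit ASCII
```

Catching UnicodeError rather than the more specific UnicodeEncodeError keeps one handler for both encoding and decoding failures, at the cost of less precise error reporting.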

Standard Encodings

Table 6.9 presents an extremely short list of the more common encodings used in Python. For a more complete listing, please see the Python documentation. Here is an online link:

http://docs.python.org/lib/standard-encodings.html

RE Engine Unicode-Aware

The regular expression engine should be Unicode-aware. See the re Core Module sidebar in Section 6.9.
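As a rough illustration of what Unicode awareness means for pattern matching, the sketch below uses Python 3 syntax, where str patterns are Unicode-aware by default. The re.UNICODE flag is spelled out only to mirror what Python 2 code would have to pass; the sample text is an assumption chosen for the demonstration.

```python
import re

# "naive cafe" with accented letters, written with escapes so the
# source file itself stays pure ASCII.
text = "na\xefve caf\xe9"

# With Unicode-aware matching, \w accepts accented letters.
unicode_words = re.findall(r"\w+", text, re.UNICODE)

# re.ASCII restricts \w to [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```

The first call keeps both words whole, while the second breaks them at each accented character.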

Table 6.9. Common Unicode Codecs/Encodings

Codec               Description
utf-8               8-bit variable length encoding (default encoding)
utf-16              16-bit variable length encoding (little/big endian)
utf-16-le           UTF-16 but explicitly little endian
utf-16-be           UTF-16 but explicitly big endian
ascii               7-bit ASCII codepage
iso-8859-1          ISO 8859-1 (Latin-1) codepage
unicode-escape      (See Python Unicode Constructors for a definition)
raw-unicode-escape  (See Python Unicode Constructors for a definition)
native              Dump of the internal format used by Python
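To make the differences between these codecs concrete, here is a small sketch in Python 3 syntax that encodes one sample string with several of the codecs from Table 6.9. The sample text and variable names are assumptions for illustration; the "native" entry is left out because it does not name a codec that can be passed to encode() here.

```python
text = "caf\xe9"  # four characters, the last one outside 7-bit ASCII

codecs_to_try = ("utf-8", "utf-16-le", "utf-16-be",
                 "iso-8859-1", "unicode-escape")

# Map each codec name to the bytes it produces for the same text.
encoded = {codec: text.encode(codec) for codec in codecs_to_try}

for codec, data in encoded.items():
    # Every codec round-trips back to the original text.
    print(codec, data, data.decode(codec) == text)

# Plain "utf-16" prepends a 2-byte byte order mark to the encoded data.
with_bom = text.encode("utf-16")
print(len(with_bom))

# The ascii codec cannot represent the accented letter at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

The byte order of plain utf-16 output depends on the platform, which is why the explicit utf-16-le and utf-16-be variants exist.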

String Format Operator

For Python format strings: %s performs str(u) for Unicode objects embedded in Python strings, so the output will be u.encode() using the default encoding. If the format string is a Unicode object, all parameters are coerced to Unicode first and then put together and formatted according to the format string. Numbers are first converted to strings and then to Unicode. Python strings are interpreted as Unicode strings using the default encoding. Unicode objects are taken as is. All other string formatters should work accordingly. Here is an example:

u"%s %s" % (u"abc", "abc")  =>  u"abc abc"
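Python 3 no longer mixes byte strings and Unicode strings implicitly, so the coercion rules above have to be written out by hand to see them at work. The following is only a sketch under stated assumptions: to_text and unicode_format are hypothetical helpers, "ascii" stands in for the default encoding, and only %s conversions are handled.

```python
def to_text(value, encoding="ascii"):
    """Coerce one formatting argument to text (hypothetical helper)."""
    if isinstance(value, str):
        return value                   # Unicode objects are taken as is
    if isinstance(value, bytes):
        return value.decode(encoding)  # byte strings use the assumed default
    return str(value)                  # numbers are converted to strings


def unicode_format(fmt, *args, encoding="ascii"):
    """Apply a %s-only format string after coercing every argument."""
    return fmt % tuple(to_text(arg, encoding) for arg in args)


result = unicode_format("%s %s", "abc", b"abc")
print(result)

mixed = unicode_format("%s has %s items", b"cart", 3)
print(mixed)
```

Because every argument is turned into text before formatting, numeric conversions such as %d would fail with this sketch; a fuller version would leave numbers untouched for those format codes.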
