10.12.2012 Views

The Java EE 5 Tutorial (PDF) - Oracle Software Downloads


APPENDIX A

Java Encoding Schemes

This appendix describes the character-encoding schemes that are supported by the Java platform.

US-ASCII

US-ASCII is a 7-bit character set and encoding that covers the English-language alphabet. It is not large enough to cover the characters used in other languages, however, so it is not very useful for internationalization.
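The limitation can be checked directly. The following is a minimal sketch, assuming Java 7 or later (for `StandardCharsets`); the class and method names are illustrative and not part of the tutorial.

```java
import java.nio.charset.StandardCharsets;

class AsciiCheck {
    // Returns true if every character in the text can be represented in US-ASCII.
    static boolean fitsInAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsInAscii("hello"));      // true
        System.out.println(fitsInAscii("caf\u00e9"));  // false: U+00E9 is outside the 7-bit range
    }
}
```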

ISO-8859-1

ISO-8859-1 is the character set for Western European languages. It is an 8-bit encoding scheme in which every encoded character takes exactly 8 bits. (In UTF-8 and UTF-16, on the other hand, some code values are reserved to signal the start of a multibyte character.)
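Because every character takes exactly one byte, the encoded length of a string in ISO-8859-1 equals its character count, provided all of its characters belong to the character set. A short sketch, assuming Java 7 or later; `Latin1Length` and `latin1Length` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

class Latin1Length {
    // Number of bytes the text occupies when encoded as ISO-8859-1.
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String word = "caf\u00e9";                       // four characters, one accented
        System.out.println(word.length());               // 4
        System.out.println(latin1Length(word));          // 4: one byte per character
        // The accented letter is stored as the single byte 0xE9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Integer.toHexString(bytes[0] & 0xFF)); // e9
    }
}
```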

UTF-8

UTF-8 is a variable-width encoding scheme built on 8-bit units. Characters in the US-ASCII range, which includes the English-language alphabet, are each encoded using a single 8-bit byte. All other characters are encoded using 2, 3, or 4 bytes. UTF-8 therefore produces compact documents for the English language, but for languages whose characters take 3 bytes in UTF-8, such as Chinese and Japanese, documents tend to be half again as large as they would be if they used UTF-16. If the majority of a document's text is in a Western European language, then UTF-8 is generally a good choice because it allows for internationalization while still minimizing the space required for encoding.
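The variable width is easy to observe by encoding single characters. This is a minimal sketch, assuming Java 7 or later; the class name `Utf8Lengths` and the sample characters are chosen here for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));             // 1: US-ASCII letter
        System.out.println(utf8Length("\u00e9"));        // 2: Latin small letter e with acute
        System.out.println(utf8Length("\u4e2d"));        // 3: a common Chinese ideograph
        System.out.println(utf8Length("\uD83D\uDE00"));  // 4: U+1F600, a supplementary character
    }
}
```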

UTF-16

UTF-16 is an encoding scheme built on 16-bit units. It can encode every character in Unicode, and so covers all the alphabets in the world. It uses 16 bits for most characters, including the most common Chinese, Japanese, and Korean ideograms, and a pair of 16-bit units (32 bits) for supplementary characters such as rarer ideograms. A Western European-language document that uses UTF-16 will be up to twice as large as the same document encoded using UTF-8. Documents written in Far Eastern languages, however, can be up to a third smaller using UTF-16.
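The size trade-off between the two Unicode encodings can be measured on short samples. The sketch below assumes Java 7 or later and uses `UTF_16BE` so that the byte-order mark written by the plain `UTF_16` charset is not counted; `EncodingSizes` and the sample strings are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Size in bytes of the text encoded as UTF-8.
    static int utf8Size(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size in bytes of the text encoded as big-endian UTF-16, without a byte-order mark.
    static int utf16Size(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String english = "hello";
        String chinese = "\u4e2d\u6587";  // two common ideographs
        System.out.println(utf8Size(english) + " vs " + utf16Size(english)); // 5 vs 10
        System.out.println(utf8Size(chinese) + " vs " + utf16Size(chinese)); // 6 vs 4
    }
}
```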
