23.07.2013 Views

Java IO.pdf - Nguyen Dang Binh

<strong>Java</strong> I/O<br />

Table B.4 lists the encodings that <strong>Java</strong>, javac, and native2ascii understand. Detailed<br />

information about how these character sets map to Unicode can be found in the various files<br />

at ftp://ftp.unicode.org/Public/MAPPINGS/.<br />
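As a minimal sketch of how an encoding name is put to use, the class below converts between Unicode text and bytes by passing a name to `Charset.forName`. The class `EncodingNameDemo` and its `encode` and `decode` helpers are hypothetical names chosen for illustration. The sketch uses the `java.nio.charset` API, which postdates Java 1.1, and the canonical modern name `ISO-8859-1` for Latin-1; whether a given runtime still accepts a 1.1-era name such as `8859_1` as an alias is an assumption to verify on that runtime.

```java
import java.nio.charset.Charset;

public class EncodingNameDemo {

    // Hypothetical helper: decodes raw bytes using the named encoding.
    // Charset.forName throws an unchecked exception if the name is unknown.
    static String decode(byte[] data, String encodingName) {
        return new String(data, Charset.forName(encodingName));
    }

    // Hypothetical helper: encodes text into bytes using the named encoding.
    static byte[] encode(String text, String encodingName) {
        return text.getBytes(Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (U+00E9), which Latin-1 stores as the single byte 0xE9.
        String original = "caf\u00e9";
        byte[] latin1 = encode(original, "ISO-8859-1");

        System.out.println(latin1.length);                                // one byte per character in Latin-1
        System.out.println(decode(latin1, "ISO-8859-1").equals(original)); // the round trip preserves the text
    }
}
```

Wrapping the name lookup in small helpers keeps the encoding choice in one place, so switching to another encoding only means passing a different name.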

Table B.4. Available Encodings in <strong>Java</strong> 1.1<br />

Name Encoding<br />

8859_1 ISO Latin-1, ASCII plus the characters needed for most Western European languages, including Danish, Dutch, English, Faroese, Finnish, Flemish, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish. Some non-European languages are also sometimes written with these characters, including Hawaiian, Indonesian, and Swahili.<br />

8859_2 ISO Latin-2, ASCII plus the characters needed for most Central European languages, including Croatian, Czech, Hungarian, Polish, Romanian, Slovak, and Slovenian.<br />

8859_3 ISO Latin-3, ASCII plus the characters needed for Esperanto, Maltese, Turkish, and Galician, though Latin-5, ISO 8859-9, is now preferred for Turkish.<br />

8859_4 ISO Latin-4, ASCII plus the characters needed for the Baltic languages Latvian and Lithuanian, as well as Greenlandic and Lappish (Sami).<br />

8859_5 ASCII plus variant forms of Cyrillic characters used for Byelorussian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian.<br />

8859_6 ASCII plus Arabic.<br />

8859_7 ASCII plus modern Greek.<br />

8859_8 ASCII plus Hebrew.<br />

8859_9 ISO Latin-5. This is essentially the same as Latin-1 (ASCII plus Western Europe) except that the Icelandic letters Ð, ð, Ý, ý, Þ, and þ are replaced with the Turkish letters Ğ, ğ, İ, ı, Ş, and ş.<br />

Big5 Big5, traditional Chinese.<br />

CNS11643 CNS 11643, the Han character standard interchange code for general use, traditional Chinese.<br />

Cp037 EBCDIC, a non-ASCII-based character set, primarily used on IBM mainframes. This version includes characters for U.S., Australian, and Canadian English, Canadian French, Dutch, and Portuguese.<br />

Cp273 EBCDIC for German.<br />

Cp277 EBCDIC for Danish and Norwegian.<br />

Cp278 EBCDIC for Finnish and Swedish.<br />

Cp280 EBCDIC for Italian.<br />

Cp284 EBCDIC for Spanish and Catalan.<br />

Cp285 EBCDIC for British English (also used in Ireland).<br />

Cp297 EBCDIC for French.<br />

Cp420 EBCDIC for Arabic.<br />

Cp424 EBCDIC for Hebrew.<br />

Cp437 DOS English character set for the United States, Australia, New Zealand, and South Africa: ASCII plus various accented characters. Furthermore, like most DOS-based character sets, this includes a number of line and corner characters commonly used to display spreadsheets and tables. These are called box-drawing characters.<br />

Cp500 EBCDIC International, essentially a reordered set of the same characters in Latin-1.<br />

Cp737 DOS ASCII plus Greek and various box-drawing characters in the upper 128 places.<br />

Cp775 DOS ASCII plus Baltic and various box-drawing characters in the upper 128 places.<br />

Cp838 EBCDIC for Thai.<br />

Cp850 DOS ASCII plus Western European and various box-drawing characters in the upper 128 places.<br />

Cp852 DOS ASCII plus Central European and various box-drawing characters in the upper 128 places.<br />

Cp855 DOS ASCII plus Cyrillic and various box-drawing characters in the upper 128 places.<br />
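The entries above differ mainly in how they interpret bytes in the upper 128 places, so the same byte can decode to different characters under different encodings. The following sketch, which is an illustration rather than part of the original table, decodes the single byte 0xD0 under several encodings. The class `SameByteDifferentChar` and the helper `decodeByte` are hypothetical names. The sketch assumes the modern canonical names `ISO-8859-9`, `IBM437`, and `IBM037` correspond to the 8859_9, Cp437, and Cp037 rows, and it checks `Charset.isSupported` first because a runtime is not required to ship every one of them.

```java
import java.nio.charset.Charset;

public class SameByteDifferentChar {

    // Hypothetical helper: decodes a single byte with the named encoding and
    // returns the resulting Unicode code point, or -1 if this runtime does
    // not provide the encoding.
    static int decodeByte(int value, String encodingName) {
        if (!Charset.isSupported(encodingName)) {
            return -1;
        }
        String s = new String(new byte[] {(byte) value}, Charset.forName(encodingName));
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumed modern names for the Latin-1, Latin-5, Cp437, and Cp037 rows.
        String[] names = {"ISO-8859-1", "ISO-8859-9", "IBM437", "IBM037"};
        for (String name : names) {
            int cp = decodeByte(0xD0, name);
            String shown = (cp < 0) ? "not available" : String.format("U+%04X", cp);
            System.out.println(name + ": " + shown);
        }
    }
}
```

Under Latin-1 the byte 0xD0 is the Icelandic letter Ð (U+00D0), while under Latin-5 it is the Turkish letter Ğ (U+011E), which is the substitution the 8859_9 entry describes.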
