23.07.2013 Views

Java IO.pdf - Nguyen Dang Binh


0 0 0 0 0 x10 x9 x8<br />

x7 x6 x5 x4 x3 x2 x1 x0<br />

<strong>Java</strong> I/O<br />

These characters are encoded into two bytes, but not in the most obvious fashion. The 11<br />

significant bits of the character are broken up:<br />

1 1 0 x10 x9 x8 x7 x6<br />

1 0 x5 x4 x3 x2 x1 x0<br />

Neither of the bytes that make up this number begins with a 0 bit. Thus, you can distinguish<br />

between bytes that are part of a two-byte character and bytes that represent one-byte<br />

characters (which all begin with 0).<br />

The remaining characters have values between 2048 and 65,535. Any or all of the bits in these<br />

characters may be either 0 or 1. Thus, they are encoded in three bytes, like this:<br />

1 1 1 0 x15 x14 x13 x12<br />

1 0 x11 x10 x9 x8 x7 x6<br />

1 0 x5 x4 x3 x2 x1 x0<br />
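The bit layouts above can be sketched in code. The following is a minimal illustration, not the library's implementation; the class name `Utf8EncodeSketch` and the method `encode` are hypothetical. It assumes the one-byte case covers values 1 through 127, so the value 0 falls through to the two-byte layout.

```java
class Utf8EncodeSketch {
    // Hypothetical helper: encodes one 16-bit char using the one-, two-,
    // or three-byte layouts described in the text.
    static byte[] encode(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            // 0xxxxxxx: the value fits in seven bits
            return new byte[] { (byte) c };
        } else if (c <= 0x07FF) {
            // 110 x10..x6, then 10 x5..x0
            // (assumption: the value 0 also takes this branch)
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F))
            };
        } else {
            // 1110 x15..x12, then 10 x11..x6, then 10 x5..x0
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F))
            };
        }
    }

    public static void main(String[] args) {
        // Print the encoded bytes of a few sample characters in hex.
        for (char c : new char[] { 'A', '\u00E9', '\u20AC' }) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(c)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println((int) c + " -> " + hex.toString().trim());
        }
    }
}
```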

Within this scheme, any byte beginning with a 0 bit must be a single-byte ASCII character<br />

between 1 and 127. Any byte beginning with the three bits 110 must be the first byte of a two-byte<br />

character. Any byte beginning with the four bits 1110 must be the first byte of a three-byte<br />

character. Finally, any byte beginning with the two bits 10 must be the second or third<br />

byte of a multibyte character.<br />
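These prefix rules translate directly into bit masks. The sketch below is illustrative only; the class name `Utf8ByteKind`, the method `classify`, and the returned labels are hypothetical names chosen for this example.

```java
class Utf8ByteKind {
    // Hypothetical helper: classifies a byte by its leading bits,
    // following the rules in the paragraph above.
    static String classify(int b) {
        b &= 0xFF;                                      // treat the value as an unsigned byte
        if ((b & 0x80) == 0x00) return "single";        // 0xxxxxxx
        if ((b & 0xC0) == 0x80) return "continuation";  // 10xxxxxx
        if ((b & 0xE0) == 0xC0) return "lead2";         // 110xxxxx
        if ((b & 0xF0) == 0xE0) return "lead3";         // 1110xxxx
        return "invalid";                               // not used by this three-byte scheme
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xC3, 0xA9, 0xE2, 0xF0 };
        for (int b : samples) {
            System.out.println(String.format("%02X", b) + " : " + classify(b));
        }
    }
}
```

Because the masks are mutually exclusive, the order of the checks does not affect the result; they are listed from shortest prefix to longest for readability.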

The DataOutputStream class provides a writeUTF() method that encodes a string into UTF-8<br />

format. It first writes the number of encoded bytes in the string (as an unsigned short)<br />

followed by the UTF-8 encoded format of the string onto the underlying output stream:<br />

public final void writeUTF(String s) throws <strong>IO</strong>Exception<br />
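To see the length prefix in practice, the following sketch writes a string into an in-memory buffer and returns the raw bytes. The class name `WriteUtfSketch` and the helper `encode` are hypothetical; wrapping the checked exception in UncheckedIOException is a convenience for this example only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfSketch {
    // Hypothetical helper: returns the bytes that writeUTF() emits for s.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);   // two-byte length, then the encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("h\u00e9");
        // The first two bytes hold the encoded length as an unsigned short.
        int length = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        System.out.println("encoded length = " + length);
        System.out.println("total bytes    = " + bytes.length);
    }
}
```

Note that the prefix counts encoded bytes, not characters: the two-character string in `main` occupies three encoded bytes because its second character needs two.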

The DataInputStream class provides two corresponding readUTF() methods to read a UTF-8<br />

encoded string from its underlying input stream:<br />

public final String readUTF() throws <strong>IO</strong>Exception<br />

public static final String readUTF(DataInput in) throws <strong>IO</strong>Exception<br />

Each of these first reads a two-byte, unsigned short that tells it how many more bytes to read.<br />

These bytes are then read and decoded from UTF-8 into a <strong>Java</strong> Unicode string. An<br />

EOFException is thrown if the stream ends before all the expected bytes have been read. If<br />

the bytes read cannot be interpreted as a valid UTF-8 string, then a<br />

UTFDataFormatException is thrown.<br />
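The three outcomes described above (success, premature end of stream, malformed data) can be exercised with hand-built byte arrays. This is a sketch under stated assumptions: the class name `ReadUtfSketch` and the helper `tryRead` are hypothetical, and returning the exception's class name as a string is done only to keep the example compact.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

class ReadUtfSketch {
    // Hypothetical helper: returns the decoded string, or the simple name
    // of the exception that readUTF() threw.
    static String tryRead(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (EOFException | UTFDataFormatException e) {
            return e.getClass().getSimpleName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] valid     = { 0x00, 0x02, 'h', 'i' };     // length 2, two bytes follow
        byte[] truncated = { 0x00, 0x05, 'h', 'i' };     // length 5, only two bytes follow
        byte[] malformed = { 0x00, 0x01, (byte) 0x80 };  // lone continuation byte
        System.out.println(tryRead(valid));
        System.out.println(tryRead(truncated));
        System.out.println(tryRead(malformed));
    }
}
```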

DataInputStream and DataOutputStream actually read and write a slight modification of<br />

the official UTF-8 format. They encode the null character (0x00) in two bytes rather than one.<br />

This makes it slightly easier for C code that expects null-terminated strings to parse <strong>Java</strong><br />

.class files. The Reader and Writer classes discussed in the next chapter read and write true<br />

UTF-8 with one-byte nulls.<br />
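The difference in null handling can be observed by encoding the same one-character string both ways. In this sketch the class name `NullEncodingSketch` and both helper methods are hypothetical; an OutputStreamWriter with an explicit UTF-8 charset stands in for the Writer classes mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class NullEncodingSketch {
    // Hypothetical helper: bytes produced by DataOutputStream.writeUTF(),
    // including the two-byte length prefix.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: bytes produced by a Writer using standard UTF-8.
    static byte[] viaWriter(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(buffer, StandardCharsets.UTF_8)) {
            out.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String nul = "\0";   // a string holding only the null character
        // Subtract the two-byte length prefix to compare payload sizes.
        System.out.println("writeUTF payload bytes: " + (viaWriteUTF(nul).length - 2));
        System.out.println("Writer payload bytes:   " + viaWriter(nul).length);
    }
}
```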
