Java IO.pdf - Nguyen Dang Binh

Java I/O

Although other schemes are possible, almost all modern computers have standardized on binary arithmetic performed on integers composed of an integral number of bytes. Furthermore, they've standardized on two's complement arithmetic for signed numbers. In two's complement arithmetic, the most significant bit is 1 for a negative number and 0 for a positive number; the absolute value of a negative number is calculated by taking the complement of the number and adding 1. In Java terms, this means (-n == ~n + 1) is true where n is a negative int.
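The identity is easy to verify directly; here is a minimal sketch (the class name is my own):

```java
// Demonstrates the two's complement identity (-n == ~n + 1)
// and that the most significant bit of a negative int is 1.
public class TwosComplementDemo {
    public static void main(String[] args) {
        int n = -42;
        System.out.println(-n == ~n + 1);   // true: complement plus one gives the absolute value
        System.out.println(n >>> 31);       // 1: the sign bit of a negative int
    }
}
```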

Regrettably, this is about all that's been standardized. One big difference between computer architectures is the size of an int. Probably the majority of modern computers use four-byte integers that can hold a number between -2,147,483,648 and 2,147,483,647. However, some systems are moving to 64-bit architectures where the native integer ranges from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 and takes eight bytes. And many older systems use 16-bit integers that only range from -32,768 to 32,767. Exactly how many bytes a C compiler uses for each int is platform-dependent, which is one of many reasons C code isn't as portable as one might wish. The sizes of C's short and long are even less predictable and may or may not be the same as the size of a C int. Java always uses a two-byte short, a four-byte int, and an eight-byte long, and this is one of the reasons Java code is more portable than C code. However, you must be aware of varying integer widths when your Java code needs to communicate binary numbers with programs written in other languages.
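These fixed widths can be confirmed at runtime; since Java 8 the wrapper classes expose BYTES constants (the class name below is mine):

```java
// Java's primitive widths are fixed by the language specification,
// regardless of the underlying platform.
public class PrimitiveWidths {
    public static void main(String[] args) {
        System.out.println(Short.BYTES);        // 2
        System.out.println(Integer.BYTES);      // 4
        System.out.println(Long.BYTES);         // 8
        System.out.println(Integer.MIN_VALUE);  // -2147483648
        System.out.println(Integer.MAX_VALUE);  // 2147483647
    }
}
```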

C compilers also allow various unsigned types. For example, an unsigned byte is a binary number between 0 and 255; an unsigned two-byte integer is a number between 0 and 65,535; an unsigned four-byte integer is a number between 0 and 4,294,967,295. Java doesn't have any unsigned numeric data types (unless you count char), but the DataInputStream class does provide two methods to read unsigned bytes and unsigned shorts.
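For example (a minimal sketch; the class name is mine), DataInputStream's readUnsignedByte() and readUnsignedShort() return the unsigned value widened into a nonnegative int:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Reading unsigned values with DataInputStream: readUnsignedByte()
// returns an int in 0..255, readUnsignedShort() an int in 0..65535.
public class UnsignedRead {
    public static void main(String[] args) throws IOException {
        byte[] data = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(in.readUnsignedByte());  // 255, not -1
        System.out.println(in.readUnsignedShort()); // 65535, not -1
    }
}
```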

Perhaps worst of all, modern computers are split almost down the middle between those that use a big-endian and a little-endian ordering of the bytes in an integer. In a little-endian architecture, used on Intel (x86, Pentium)-based computers, the most significant byte is at the highest address in memory. On the other hand, on a big-endian system, the most significant byte is at the lowest address in memory.

For example, consider the number 1,108,836,360. In hexadecimal this is written as 0x42178008. On a big-endian system the bytes are ordered much as they are in a hex literal; that is, 42, 17, 80, 08. On the other hand, on a little-endian system this is reversed; that is, 08, 80, 17, 42. If 1,108,836,360 is written into a file on a little-endian system, then read on a big-endian system without any special treatment, it comes out as 0x08801742; that is, 142,612,290—not the same thing at all.
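The swap can be reproduced in a few lines of Java (a sketch; the class name is mine, and Integer.reverseBytes() performs the same reversal in one call):

```java
// Splits 0x42178008 into its four bytes, then reassembles them in the
// opposite order, as a byte-order-naive reader would.
public class ByteOrderDemo {
    public static void main(String[] args) {
        int n = 0x42178008; // 1,108,836,360
        int b3 = (n >>> 24) & 0xFF;  // 0x42, most significant byte
        int b2 = (n >>> 16) & 0xFF;  // 0x17
        int b1 = (n >>> 8) & 0xFF;   // 0x80
        int b0 = n & 0xFF;           // 0x08, least significant byte
        System.out.printf("%02X %02X %02X %02X%n", b3, b2, b1, b0); // 42 17 80 08

        // Reassemble with the byte order reversed:
        int swapped = (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
        System.out.printf("0x%08X%n", swapped); // 0x08801742, i.e. 142,612,290
    }
}
```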

Java uses big-endian integers exclusively. Data input streams read and data output streams write big-endian integers. Most Internet protocols that rely on binary numbers, such as the time protocol, implicitly assume "network byte order," which is a fancy way of saying "big-endian." And finally, almost all computers manufactured today, except those based on the Intel architecture, use big-endian byte orders, so the Intel is really the odd one out. However, the Intel is the 1000-pound gorilla of computer architectures, so it's impossible to ignore it or the data formats it supports. Later in this chapter, I'll develop a class for reading little-endian data.
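As an aside, separate from the class developed later in this chapter, the newer java.nio API (Java 1.4 and later) also lets you choose a byte order explicitly through ByteBuffer; a sketch (class name mine):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// DataOutputStream always writes big-endian; ByteBuffer lets you
// select the byte order explicitly with order().
public class EndianWrite {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(0x42178008);
        for (byte b : bytes.toByteArray()) System.out.printf("%02X ", b); // 42 17 80 08
        System.out.println();

        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0x42178008);
        for (byte b : buf.array()) System.out.printf("%02X ", b); // 08 80 17 42
        System.out.println();
    }
}
```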

