23.07.2013

Java IO.pdf - Nguyen Dang Binh

Java I/O

downloads the page, and compares it to a previously retrieved copy for changes. However, if you need to do this for hundreds or thousands of web pages, the space needed to store the pages becomes prohibitive. Email clients have similar needs. Many broken mail clients and mailing list managers send multiple copies of the same message. A mail client should recognize when multiple copies of the same message are passing through the system and delete the duplicates. At the ISP level, it might be possible to use this as a spam filter by comparing messages sent to different customers.

All these tasks need a way to compare files at different times without storing the files themselves. You can write a special kind of method, called a hash function, that reads an arbitrary number of sequential bytes and assigns a number to that sequence of bytes. This number is called a hash code or digest. The size of the number depends on the hash function. It is not necessarily the same size as any Java primitive data type like int or long. For instance, digests calculated with the SHA-1 algorithm are 20-byte (160-bit) numbers. You can store the digests of the files and then compare the digests. The digests are generally much smaller than the files themselves.
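The store-the-digest-and-compare approach described above can be sketched with the standard `java.security.MessageDigest` class. This is a minimal sketch, not the book's own code. The class name `DigestExample`, the helper methods, and the set-based duplicate check for mail messages are hypothetical illustrations. SHA-1 is used because it produces the 20-byte digest mentioned above. SHA-1 is no longer considered collision-resistant, so a new design would more likely request `"SHA-256"`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

public class DigestExample {

    /** Computes the SHA-1 digest of an in-memory byte array (always 20 bytes). */
    public static byte[] sha1(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        return md.digest(data);
    }

    /** Streams a file through the digest, so the whole file never has to sit in memory. */
    public static byte[] sha1(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // feed each chunk into the running digest
            }
        }
        return md.digest();
    }

    /** Renders a digest as lowercase hex, which is convenient for storing and printing. */
    public static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Hypothetical duplicate check: remembers only digests, never the messages themselves. */
    public static boolean isDuplicate(Set<String> seen, String message) throws NoSuchAlgorithmException {
        String hex = toHex(sha1(message.getBytes(StandardCharsets.UTF_8)));
        return !seen.add(hex); // add() returns false if this digest was already recorded
    }

    public static void main(String[] args) throws Exception {
        Set<String> seen = new HashSet<>();
        System.out.println(isDuplicate(seen, "Meeting moved to 3pm")); // false
        System.out.println(isDuplicate(seen, "Meeting moved to 3pm")); // true
        System.out.println(toHex(sha1("abc".getBytes(StandardCharsets.UTF_8))));
        // prints a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

Only the 40-character hex strings are kept in the set. This is the space saving the text describes: a gigabyte file and a one-line message both reduce to a 20-byte digest.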

Hash functions are also used in digital signatures. To indicate that you actually authored a document, you first calculate the hash code of the message, then encrypt the hash code with your private key. To check your signature, the recipient of the message decrypts the hash code with your public key and compares it to a hash code the recipient calculates independently from the message received. If they match, then only someone who knew your private key could have signed the message. Although you could simply encrypt the entire message with your private key rather than a hash code, public key algorithms are rather slow, and encrypting a 20-byte hash code is much faster than encrypting even a short email message. In Java, digital signatures are implemented through the java.security.Signature class. We won't talk much about that class in this book, but it depends on the MessageDigest classes we will discuss.
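The following is a short sketch of the sign-then-verify round trip using `java.security.Signature`. It assumes the standard algorithm name `"SHA256withRSA"` and a freshly generated 2048-bit RSA key pair. The class and method names are hypothetical. The `Signature` object computes the message digest internally, so the code never calls `MessageDigest` directly. Also note that real RSA signatures pad the digest in a specific format rather than "encrypting" it in a textbook sense. The description above is the usual simplification.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class SignatureExample {

    /** Generates a throwaway RSA key pair for the demonstration. */
    public static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    /** Signer: hashes the message with SHA-256 and signs that digest with the private key. */
    public static byte[] sign(byte[] message, PrivateKey key) throws Exception {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(message);
        return signer.sign();
    }

    /** Recipient: recomputes the digest of the received message and checks it against the signature. */
    public static boolean verify(byte[] message, byte[] signature, PublicKey key) throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(message);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        KeyPair pair = newKeyPair();
        byte[] msg = "I wrote this.".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(msg, pair.getPrivate());
        System.out.println(verify(msg, sig, pair.getPublic()));      // true
        byte[] altered = "I didn't write this.".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(altered, sig, pair.getPublic()));  // false
    }
}
```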

10.1.1 Requirements for Hash Functions

Hash codes are calculated by hash functions, and some hash functions are better than others. Good hash functions (also called strong hash functions) make it extremely unlikely that two different documents will share a hash code. Furthermore, hash functions used for cryptography must also be one-way hash functions; that is, given a hash code, you should not be able to create a document with that hash code. A strong one-way hash function must meet several related criteria, including the following:

• Hash functions are deterministic: the same document always has the same hash code. The hash code does not depend on the time it's calculated, a random number, or anything other than the sequence of bytes in the document. Without this requirement, the same document could have different hash codes at different times, falsely indicating that the document had changed when in fact it hadn't.

• Hash codes should be uniformly distributed throughout the available range. Given any sample of the documents you wish to track, all hash codes should be equally likely. For instance, given a 64-bit hash code, which might be interpreted as a long integer, it would be an error if even numbers were substantially more likely than odd numbers.

• Hash codes should be extremely difficult to reverse engineer. Given a hash code, there should be no easier means than brute force to produce a document that matches that hash code. For instance, if I know the hash code is 9,423,456,789, I shouldn't be able to create a file that happens to have that exact hash code.
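The sketch below makes some of these criteria concrete. It contrasts `String.hashCode()`, a fast hash intended for hash tables rather than cryptography, with SHA-256 through `MessageDigest`. The class and method names are hypothetical, and the bit-counting helper is an illustrative check, not a formal test of uniformity. `"Aa"` and `"BB"` collide under `String.hashCode()` (both are 2112). This is exactly the kind of easily constructed collision a strong hash must rule out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashRequirementsDemo {

    /** A fresh MessageDigest per call: the result depends only on the input bytes. */
    public static byte[] sha256(String text) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Determinism: hashing the same bytes twice must give identical digests. */
    public static boolean isDeterministic(String text) throws NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha256(text), sha256(text));
    }

    /** Counts how many of the 256 digest bits differ between the digests of two inputs. */
    public static int differingBits(String a, String b) throws NoSuchAlgorithmException {
        byte[] x = sha256(a);
        byte[] y = sha256(b);
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            count += Integer.bitCount((x[i] ^ y[i]) & 0xff);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isDeterministic("hello"));                            // true
        // A weak hash: two different strings with the same hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode());                  // true
        // SHA-256 gives the same pair different digests.
        System.out.println(MessageDigest.isEqual(sha256("Aa"), sha256("BB")));  // false
        // A one-character change should flip roughly half of the 256 bits.
        System.out.println(differingBits("hello", "hellp"));
    }
}
```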
