23.07.2013 Views

Java IO.pdf - Nguyen Dang Binh



Java I/O

rush to snap up all the good ones like "TEXT" and "HTML". The list of codes that a particular Macintosh understands is stored in the Desktop database, a file users never see and only rarely have to worry about. Overall, this is a pretty good system that's worked incredibly well for more than a decade. Neither Windows nor Unix has anything nearly as simple and trouble-free. Because Windows and Unix have not adopted Mac-style type and creator codes, Java does not have any standard means for accessing them.

None of these solutions is perfect. On a Mac you're likely to want to use Photoshop to create GIF files but use JPEGView or Netscape to view them. You can drag and drop the file onto the desired application, but only if both the file and the application you want to view it with are on the screen at the same time, which is not necessarily true if both are stored several folders deep. Furthermore, it's relatively hard to say that you want all text files opened in BBEdit. On the other hand, the Windows solution is prone to user error; filename extensions are too exposed. For example, novice HTML coders often can't understand why their HTML files painstakingly crafted in Notepad open as plain text in Navigator. Notepad surreptitiously appends a .txt extension to every file it saves unless the filename is enclosed in double quotation marks. For instance, a file saved as HelloWorld.html actually becomes HelloWorld.html.txt, while a file saved as "HelloWorld.html" is saved with the expected name. Furthermore, filename extensions make it easy for a user to lie about the contents of a file, potentially confusing and crashing applications. (You can lie about a file type on a Mac too, but it takes a lot more work.) Finally, Windows provides absolutely no support for saying that you want one group of GIF images opened in Photoshop and another group opened in DeBabelizer.

There are some algorithms that can attempt to determine a file's type from its contents, though these are also error-prone. Many file formats require files to begin with a particular magic number that uniquely identifies the format. For instance, all compiled Java class files begin with the number 0xCAFEBABE (in hexadecimal). If the first four bytes of a file aren't 0xCAFEBABE, then it's definitely not a Java class file. Furthermore, barring deliberate fraud, there's only about a one in four billion chance that a random, non-Java file will begin with those four bytes. Unfortunately, only a few file formats require magic numbers. Text files, for instance, can begin with any four ASCII characters. There are some heuristics you can apply to identify such files. For example, a file of pure ASCII should not contain any bytes with values between 128 and 255 and should have a limited number of control characters with values less than 32. But such algorithms are complicated to devise and far from reliable. Even if you are able to identify a file as ASCII text, how would you determine whether it contains Java source code or a letter to your mother? Worse yet, how could you tell whether it contains Java source code or C source code? It's not impossible, barring deliberately perverse files like a concatenation of a C program with a Java program, but it's difficult and often not worth your time.
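
The two content checks described above, the class file magic number and the pure-ASCII heuristic, might be sketched roughly as follows. This is only an illustration under stated assumptions: the class name FileTypeSniffer, its method names, and the maxControlFraction threshold are hypothetical, and the choice to exempt tab, linefeed, and carriage return from the control-character count is an assumption rather than anything the heuristic requires.

```java
// Hypothetical helper; the names and thresholds here are illustrative assumptions.
final class FileTypeSniffer {

    // The magic number that begins every compiled Java class file.
    static final int CLASS_MAGIC = 0xCAFEBABE;

    // Returns true if the first four bytes, read in big-endian order,
    // equal 0xCAFEBABE. Anything shorter than four bytes fails the test.
    static boolean hasClassMagic(byte[] header) {
        if (header.length < 4) return false;
        int first = ((header[0] & 0xFF) << 24)
                  | ((header[1] & 0xFF) << 16)
                  | ((header[2] & 0xFF) << 8)
                  |  (header[3] & 0xFF);
        return first == CLASS_MAGIC;
    }

    // Heuristic only: rejects any byte in the range 128-255 and allows at most
    // maxControlFraction of the bytes to be control characters below 32.
    // Tab, linefeed, and carriage return are not counted (an assumption).
    // An empty array is accepted, since it contains nothing that contradicts ASCII.
    static boolean looksLikeAscii(byte[] data, double maxControlFraction) {
        int control = 0;
        for (byte b : data) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value >= 128) return false;
            if (value < 32 && value != '\t' && value != '\n' && value != '\r') {
                control++;
            }
        }
        return data.length == 0
            || (double) control / data.length <= maxControlFraction;
    }
}
```

In practice you would probably read only the first few bytes of a file from a FileInputStream into a small array and pass that array to these methods. A positive result from looksLikeAscii() says only that the data is consistent with ASCII text; it cannot tell Java source from C source or from a letter to your mother.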

One possible solution to the problem of identifying file types across platforms is using MIME. The mappings between file extensions and content types listed in Table 12.1 are de facto standards. MIME, the Multipurpose Internet Mail Extensions, is a de jure (RFCs 2045-2049) specification for embedding and identifying arbitrary data types in Internet email. MIME is also used by HTTP servers that want to identify the kinds of data they're sending to a client. And, in the BeOS, it's used as a Mac-like means of identifying file types. A MIME type consists of a primary type like "text" or "image," followed by a forward slash, followed by a subtype like "html" or "gif." "text/html" is a typical MIME content type that indicates a file of textual information in the HTML format. MIME also uses x-types, like "application/x-tar" and "application/x-mif," to allow ad hoc extensions to the standard. There may be more than

