02.06.2013 Views

XML Demystified


CHAPTER 7 XML Parsers and Transformations

of the SAX specification, the Java implementation is probably the most mature and most widely used.

It’s important to understand that SAX is a standard for an application programming interface (API). It specifies the classes that you use to build a SAX parser. This may sound confusing, especially if you’ve never programmed before. However, you can probably imagine the many steps that are necessary to read and transform an XML document. You would need to write code for each step in order to build a parser that transforms XML documents, which is a tedious and time-consuming job. You can minimize the tedium and save time by using the classes of an API, which other developers have already written. Think of these classes as preassembled subparts of the parser. You put the subparts together to create a parser.

You aren’t expected to write a parser, but you’ll need a parser in order to transform your XML document. A SAX parser (a parser that was developed using the SAX API) is designed to read large XML documents because it starts at the beginning of the XML document and reads one group of lines, called a block, at a time until it reaches the end of the document. The entire transformation process occurs in one reading.

As it reads each block, the SAX parser determines whether the block contains an XML tag or information. If it’s an XML tag, the SAX parser compares the XML tag to the XSL and then transforms the information based on the XSL instructions. The SAX parser then reads the next block of the XML document.

A block is discarded once it’s transformed. This frees memory for the next block, which gives the SAX parser an advantage over a DOM parser. A DOM parser loads the entire XML document into memory, which you’ll learn about in “The Document Object Model,” later in this chapter. The SAX parser requires only a small amount of memory to transform a very large XML document.
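The one-pass, low-memory behavior described above can be sketched in Java with the SAX classes that ship with the JDK. This is a minimal illustration and not the book's own example: the class name `SaxStreamingSketch`, the handler `BookCounter`, and the `<catalog>`/`<book>` element names are all assumptions made up for the sketch. The handler keeps only a running count, so nothing from earlier parts of the document stays in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxStreamingSketch {

    // Hypothetical handler: counts <book> start tags and keeps nothing else.
    static class BookCounter extends DefaultHandler {
        int books = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            // Called once for each start tag, in document order.
            if ("book".equals(qName)) {
                books++;
            }
        }
    }

    // Parses the XML text in a single pass and returns the number of <book> tags.
    static int countBooks(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BookCounter handler = new BookCounter();
        parser.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            handler);
        return handler.books;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book/><book/><book/></catalog>";
        System.out.println(countBooks(xml)); // prints 3
    }
}
```

Because the handler stores a single integer, the same code would work on a much larger catalog without the memory use growing with the size of the document.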

This advantage is also a disadvantage because a SAX parser cannot reference any block of an XML document other than the block that’s in memory. This means that it cannot go back and modify XML information that has already been transformed, even when the block that’s currently being read calls for such a change.

A SAX parser gets one chance at reading each XML tag. Sometimes this is all you need, though for a more complex transformation, you’ll need to use a DOM parser that can reference any part of the XML document (see “The Document Object Model,” later in this chapter).
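To make the contrast concrete, here is a hedged Java sketch of a task that needs to look back at an earlier part of the document. It uses the JDK's DOM classes. The class name `DomLookBackSketch` and the `<order>`, `<summary>`, and `<item>` element names are hypothetical and chosen only for illustration. The `<summary>` element comes first in the document, but its `count` attribute depends on the `<item>` elements that follow it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomLookBackSketch {

    // Loads the whole document, counts the <item> elements, and writes the
    // count back onto the <summary> element that appears before them.
    static String summaryCount(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        Element summary = (Element) doc.getElementsByTagName("summary").item(0);
        int items = doc.getElementsByTagName("item").getLength();

        // The earlier element is still in memory, so it can be updated.
        summary.setAttribute("count", String.valueOf(items));
        return summary.getAttribute("count");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><summary/><item/><item/></order>";
        System.out.println(summaryCount(xml)); // prints 2
    }
}
```

A SAX handler would already have moved past `<summary>` by the time it saw the first `<item>`, so it would have to buffer output or make a second pass to get the same result.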

Components of a SAX Parser<br />

There are four components in a SAX parser: the Content Handler, Error Handler,<br />

DTD Handler, and Entity Resolver.<br />
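In the Java SAX API, these four components correspond to the interfaces `ContentHandler`, `ErrorHandler`, `DTDHandler`, and `EntityResolver` in the `org.xml.sax` package. The sketch below shows one way they might be registered on an `XMLReader`. The class names `SaxComponentsSketch` and `RecordingHandler` are assumptions for illustration. It relies on `DefaultHandler`, which implements all four interfaces with default methods, so a single object can fill every role.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxComponentsSketch {

    // Hypothetical handler: records element names and inherits the default
    // behavior for error, DTD, and entity callbacks.
    static class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements.add(qName);
        }
    }

    // Registers one object for all four roles and returns the element
    // names in the order their start tags were read.
    static List<String> elementNames(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance()
            .newSAXParser()
            .getXMLReader();
        RecordingHandler handler = new RecordingHandler();

        reader.setContentHandler(handler); // tags and character data
        reader.setErrorHandler(handler);   // warnings and errors
        reader.setDTDHandler(handler);     // notations, unparsed entities
        reader.setEntityResolver(handler); // external entity lookup

        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book><title>XML</title></book></catalog>";
        System.out.println(elementNames(xml)); // prints [catalog, book, title]
    }
}
```

In practice, an application would often supply a separate class for each role. Using one `DefaultHandler` subclass here just keeps the sketch short.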
