CHAPTER 7 XML Parsers and Transformations
of the SAX specification, the Java implementation is probably the most mature and
most widely used.
It’s important to understand that SAX is a standard application programming
interface (API), not a single program. It specifies the classes and interfaces that a
SAX parser exposes and that your code uses to work with the parser.
This may sound confusing, especially if you’ve never programmed before.
However, you can probably imagine the many steps that are necessary to read and
transform an XML document. Writing the code for each step yourself in order to build
a parser that transforms XML documents would be a tedious and time-consuming job.
You can minimize the tedium and save time by using the classes of an
API, which other developers have already written. Think of these classes as
preassembled subparts of the parser. You connect the subparts to create a working
parser.
You aren’t expected to write a parser, but you’ll need a parser in order to transform
your XML document. A SAX parser (a parser that implements the SAX API) is well
suited to reading large XML documents because it starts at the beginning of
the XML document and reads one portion of the document, called a block, at a time
until it reaches the end of the document. The entire transformation process occurs
in a single pass.
As it reads each block, the SAX parser determines whether the block contains an XML
tag or information (text). When it finds an XML tag, the parser reports the tag to
the handler code. In a transformation, that code compares the XML tag to the
XSL and then transforms the information based on the XSL instructions. The SAX
parser then reads the next block of the XML document.
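The read-and-report cycle described above can be sketched with Python's standard xml.sax module, which follows the SAX interfaces. This is a minimal illustration, not a transformation engine: the EventLogger class and the sample document are invented for the example, and the handler only records what the parser reports instead of applying XSL instructions.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each event the parser reports, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser recognizes an opening tag.
        self.events.append(("start", name))

    def characters(self, content):
        # Called for text between tags; whitespace-only text is skipped.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        # Called when the parser recognizes a closing tag.
        self.events.append(("end", name))


# Hypothetical sample document, invented for this illustration.
SAMPLE = b"<catalog><book><title>XML Basics</title></book></catalog>"

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

The handler never asks the parser for anything. The parser drives the process and calls the handler each time it recognizes a tag or a run of text.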
A block is discarded once it’s transformed. This frees memory for the next block,
which gives the SAX parser an advantage over a DOM parser. A DOM parser loads
the entire XML document into memory, which you’ll learn about in “The Document
Object Model,” later in this chapter. The SAX parser requires only a small amount of
memory to transform a very large XML document.
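The block-at-a-time behavior can be sketched by feeding a document to the parser in pieces. The example below assumes the default expat-based reader that xml.sax.make_parser() normally returns, which supports incremental feed() and close() calls. The BookCounter class, the count_books and generate_catalog helpers, and the document layout are hypothetical names invented for the illustration.

```python
import xml.sax


class BookCounter(xml.sax.ContentHandler):
    """Keeps a running count only; no part of the document is stored."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1


def count_books(chunks):
    """Feed the parser one chunk at a time and return the number of <book> tags."""
    parser = xml.sax.make_parser()
    handler = BookCounter()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)   # each chunk can be released after this call
    parser.close()           # signals the end of the document
    return handler.count


def generate_catalog(n):
    """Yield a hypothetical catalog document piece by piece."""
    yield b"<catalog>"
    for i in range(n):
        yield b"<book><title>Book %d</title></book>" % i
    yield b"</catalog>"


total = count_books(generate_catalog(20000))
print(total)
```

Because the document is produced by a generator and consumed chunk by chunk, the complete text never has to exist in memory at once. A DOM parser, by contrast, would have to hold every book element until parsing finished.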
This advantage is also a disadvantage because a SAX parser cannot reference any
block of an XML document other than the block that’s in memory. This means that,
based on the block it is currently reading, it cannot go back and modify XML
information that has already been transformed.
A SAX parser gets one chance at reading each XML tag. Sometimes this is all
you need, though for a more complex transformation, you’ll need to use a DOM
parser that can reference any part of the XML document (see “The Document
Object Model,” later in this chapter).
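The difference between one-pass reading and free access to the whole document can be sketched with Python's standard xml.sax and xml.dom.minidom modules. The order document and the OrderHandler class below are invented for this illustration. On the SAX side, the handler has to save anything it will need later; on the DOM side, the code can step backward and even change an element that was read earlier.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical order document, invented for this illustration.
SAMPLE = b"<order><item>pen</item><item>ink</item><count>2</count></order>"


class OrderHandler(xml.sax.ContentHandler):
    """SAX side: anything needed later must be saved by the handler itself."""

    def __init__(self):
        super().__init__()
        self.items = []      # saved because the parser will not revisit them
        self.count = None
        self._text = []

    def startElement(self, name, attrs):
        self._text = []      # start collecting text for the new element

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if name == "item":
            self.items.append(text)
        elif name == "count":
            self.count = int(text)
        self._text = []


handler = OrderHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.items, handler.count)

# DOM side: the whole tree is in memory, so code can move backward freely.
doc = minidom.parseString(SAMPLE)
count_node = doc.getElementsByTagName("count")[0]
previous_item = count_node.previousSibling        # the second <item>
first_item = doc.getElementsByTagName("item")[0]
first_item.firstChild.data = "pencil"             # edit an earlier element
print(previous_item.firstChild.data)
print(doc.documentElement.toxml())
```

The SAX handler works only because it stored the item names as they went by. Had it not, there would be no way to recover them once the count element was reached.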
Components of a SAX Parser<br />
There are four components in a SAX parser: the Content Handler, Error Handler,<br />
DTD Handler, and Entity Resolver.<br />
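As a rough sketch of how the four components fit together, the example below registers one object for each of them using Python's standard xml.sax module, whose ContentHandler, ErrorHandler, DTDHandler, and EntityResolver classes correspond to the components named above. The Names and CollectErrors subclasses and the parse_with_handlers helper are hypothetical, written only for this illustration. The DTD handler and entity resolver are left at their default behavior.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler, DTDHandler, EntityResolver, ErrorHandler,
)


class Names(ContentHandler):
    """Content handler: receives tags and text."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


class CollectErrors(ErrorHandler):
    """Error handler: receives problems the parser finds."""

    def __init__(self):
        self.problems = []

    def warning(self, exception):
        self.problems.append(str(exception))

    def error(self, exception):
        self.problems.append(str(exception))

    def fatalError(self, exception):
        # A fatal error means the document is not well formed, so stop parsing.
        self.problems.append(str(exception))
        raise exception


def parse_with_handlers(data):
    """Parse bytes with all four components registered on one parser."""
    parser = xml.sax.make_parser()
    content, errors = Names(), CollectErrors()
    parser.setContentHandler(content)
    parser.setErrorHandler(errors)
    parser.setDTDHandler(DTDHandler())          # default: ignores DTD events
    parser.setEntityResolver(EntityResolver())  # default: uses the system ID
    try:
        parser.parse(io.BytesIO(data))
    except xml.sax.SAXParseException:
        pass  # already recorded by the error handler
    return content.names, errors.problems


good_names, good_problems = parse_with_handlers(b"<a><b/></a>")
bad_names, bad_problems = parse_with_handlers(b"<a><b></a>")
print(good_names, good_problems)
print(bad_names, bad_problems)
```

Only the content handler and the error handler do visible work here, because the sample documents contain no DTD declarations or external entities.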