14 CHAPTER 2. FUNDAMENTALS<br />
2.2 Document Type Definitions and XML Schema<br />
So far, we have placed no restrictions on element labels or on how elements are structured. For<br />
most applications, however, not every well-formed XML document is understandable and<br />
processable. For example, an auction system that expects XMark data will not be<br />
able to process an XML-formatted list of publications. Technically, it can<br />
read and parse the elements, but it has no way of knowing what they mean or how to<br />
handle them.<br />
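The gap between parsing a document and understanding it can be illustrated with a minimal Python sketch. The publication list and the helper `count_items` are hypothetical and are not taken from XMark; the sketch only assumes that an auction application would look for `item` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical publication list: well-formed XML, but not auction data.
PUBLICATIONS = """<publications>
  <paper><title>XML Basics</title><year>2004</year></paper>
  <paper><title>Tree Automata</title><year>2006</year></paper>
</publications>"""


def count_items(xml_text):
    """Count the <item> elements an auction application would look for."""
    root = ET.fromstring(xml_text)  # succeeds for any well-formed document
    return len(root.findall(".//item"))


# Parsing works, yet the application finds nothing it knows how to process.
print(ET.fromstring(PUBLICATIONS).tag)
print(count_items(PUBLICATIONS))
```

The parser raises no error, so well-formedness alone gives the application no hint that it received the wrong kind of document.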
Therefore, we need a mechanism to declare a class or type of documents. This<br />
is done by schema languages such as Document Type Definitions and XML Schema.<br />
The idea is to predefine the allowed element labels and to declare<br />
how they may be nested. Schemas are comparable to grammars for<br />
programming languages. However, context-free grammars describe sets of words,<br />
whereas schemas need to describe sets of trees. The term "schema" comes from the<br />
database community.<br />
An XML document that satisfies all constraints of a schema is called valid with respect to that schema. Validity<br />
implies well-formedness and is checked by validating parsers.<br />
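The relationship between the two notions can be sketched in Python as follows. This is not a validating parser: the "schema" here is only a hypothetical set of allowed labels (`ALLOWED_LABELS`), which is far weaker than a DTD, and the function names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: only a set of allowed element labels.
ALLOWED_LABELS = {"item", "name", "description"}


def is_well_formed(xml_text):
    """Return True if the text can be parsed as XML at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def is_valid(xml_text, allowed=ALLOWED_LABELS):
    """Return True if the text is well-formed and uses only allowed labels."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed, hence not valid either
    return all(elem.tag in allowed for elem in root.iter())
```

Every document accepted by `is_valid` is also accepted by `is_well_formed`, but not the other way round: `<item><price>1</price></item>` is well-formed, yet it fails the label check.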
2.2.1 DTD: Document Type Definition<br />
A significant feature that XML inherits from its predecessor SGML is the concept<br />
of a Document Type Definition (DTD). The DTD is an optional feature which provides<br />
a formal set of rules to define a document structure. It defines the elements<br />
that may be used and states where they may be applied in relation to each other.<br />
Therefore, the DTD defines the document’s hierarchy and granularity.<br />
Figure 2.2 presents the DTD for an XMark fragment.<br />
Figure 2.2: The DTD for an XMark fragment<br />
Line 2 states that the root element is an […] containing a sequence of […],<br />
[…], […], and […] elements. The + symbol<br />
indicates that the &lt;payment&gt; element must appear at least once and may appear more than once. A * symbol<br />
states that zero or more occurrences of an element are allowed. The ? symbol indicates that an<br />
element may appear zero times or once. In the example, an item may have a description,<br />
but it does not need to have one. If no symbol is attached to an element,<br />
it must appear exactly once as a child.
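
The occurrence indicators behave like the corresponding regular-expression operators, applied to the sequence of child labels. The following Python sketch makes this concrete under simplifying assumptions: it handles only flat sequences such as `(a, b+, c?)`, with no choices, nested groups, or text content, and the content model `ITEM_MODEL` is hypothetical rather than taken from the XMark DTD.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model; not the actual XMark declaration.
ITEM_MODEL = "(name, payment+, description?, shipping*)"


def content_model_to_regex(model):
    """Translate a flat sequence model into a regex over child labels.

    Each label is followed by a space so that one label cannot match
    a prefix of another.
    """
    parts = []
    for token in model.strip("() ").split(","):
        token = token.strip()
        indicator = token[-1] if token[-1] in "+*?" else ""
        label = token.rstrip("+*?")
        # No indicator means the label must occur exactly once.
        parts.append("(?:%s )%s" % (re.escape(label), indicator))
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the direct children of an element against the model."""
    labels = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(labels) is not None


ok = ET.fromstring("<item><name/><payment/><payment/></item>")
bad = ET.fromstring("<item><name/><description/></item>")
print(children_match(ok, ITEM_MODEL))
print(children_match(bad, ITEM_MODEL))
```

Under these assumptions, the first item is accepted because `description?` and `shipping*` may both be absent, while the second is rejected because `payment+` requires at least one occurrence.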