PDFlib Text Extraction Toolkit (TET) Manual
8.3 TETML Elements and the TETML Schema
The TET distribution contains a formal XML schema description (XSD) for all TETML elements and attributes and their relationships. The TETML namespace is:
http://www.pdflib.com/XML/TET3/TET-3.0
The schema can be downloaded from the following URL:
http://www.pdflib.com/XML/TET3/TET-3.0.xsd
Both the TETML namespace and the schema location are declared in the root element of every TETML document.
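Because the namespace and schema location sit on the root element, a consumer can identify a TETML 3.0 file before doing any deeper parsing. The sketch below uses only Python's standard `xml.etree.ElementTree`. Some details are assumptions rather than taken from this section:

- The root element name `TET` in the sample.
- The standard `xsi:schemaLocation` attribute as the carrier of the schema URL.
- The helper names `inspect_root` and `is_tetml`, which are hypothetical.

The sample string is a minimal hypothetical skeleton, not a complete or valid TETML document.

```python
import xml.etree.ElementTree as ET

# Namespace and schema URL as stated in the manual.
TETML_NS = "http://www.pdflib.com/XML/TET3/TET-3.0"
TETML_XSD = "http://www.pdflib.com/XML/TET3/TET-3.0.xsd"
# Standard XML Schema instance namespace (carries xsi:schemaLocation).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical minimal root element; the name "TET" is an assumption.
SAMPLE = (
    '<TET xmlns="http://www.pdflib.com/XML/TET3/TET-3.0" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.pdflib.com/XML/TET3/TET-3.0 '
    'http://www.pdflib.com/XML/TET3/TET-3.0.xsd"></TET>'
)


def inspect_root(xml_text: str) -> tuple[str, str, dict[str, str]]:
    """Return (namespace, local name, schemaLocation map) of the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes qualified names as "{namespace}localname".
    if root.tag.startswith("{"):
        ns, local = root.tag[1:].split("}", 1)
    else:
        ns, local = "", root.tag
    # xsi:schemaLocation holds whitespace-separated (namespace, URL) pairs.
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    return ns, local, locations


def is_tetml(xml_text: str) -> bool:
    """Hypothetical check: root uses the TETML namespace and points at its XSD."""
    ns, _, locations = inspect_root(xml_text)
    return ns == TETML_NS and locations.get(TETML_NS) == TETML_XSD


print(inspect_root(SAMPLE)[0])  # the TETML namespace URI
print(is_tetml(SAMPLE))
```

This check only looks at the root element. Full validation against the XSD needs a schema-aware parser, which the standard library does not provide.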
Table 8.3 describes the role of all TETML elements. Figure 8.1 visualizes the XML hierarchy of the top-level TETML elements. The hierarchy for the Content element is shown in Figure 8.2.
Fig. 8.1 Main TETML element hierarchy. Optional elements are enclosed in dashed boxes; required elements are enclosed in stroked boxes.
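To compare an actual TETML file against the hierarchy in Figures 8.1 and 8.2, it can help to list the distinct element paths the file contains. The sketch below does this with the standard library and strips namespace prefixes for readability. Some parts are assumptions used only for illustration:

- The helper names `local_name` and `element_paths` are hypothetical.
- The element names `Document`, `Pages` and `Page` in the sample are assumed, not taken from this section; only `Content` is named in the text above.

Table 8.3 and the schema remain the authoritative description.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names other than Content are assumptions.
SAMPLE_TETML = """<TET xmlns="http://www.pdflib.com/XML/TET3/TET-3.0">
  <Document>
    <Pages>
      <Page><Content/></Page>
      <Page><Content/></Page>
    </Pages>
  </Document>
</TET>"""


def local_name(tag: str) -> str:
    """Strip the "{namespace}" prefix ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def element_paths(xml_text: str) -> list[str]:
    """Return distinct element paths (e.g. "TET/Document") in document order."""
    root = ET.fromstring(xml_text)
    seen: dict[str, None] = {}  # dict preserves insertion order

    def walk(elem: ET.Element, prefix: str) -> None:
        name = local_name(elem.tag)
        path = f"{prefix}/{name}" if prefix else name
        seen.setdefault(path, None)  # record each path only once
        for child in elem:
            walk(child, path)

    walk(root, "")
    return list(seen)


for path in element_paths(SAMPLE_TETML):
    print(path)
```

Repeated siblings such as the two `Page` elements collapse to a single path, so the output reads as a structural outline rather than a full dump.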
96 Chapter 8: <strong>TET</strong> Markup Language (<strong>TET</strong>ML)