17.05.2014

PDFlib Text Extraction Toolkit (TET) Manual

8.4 Transforming TETML with XSLT

Very short overview of XSLT. XSLT (which stands for eXtensible Stylesheet Language
Transformations) is a language for transforming XML documents into other documents.
While the input is always an XML document (a TETML document in our case), the output
does not have to be XML: XSLT can also perform arbitrary calculations and
produce plain text or HTML output. We will use XSLT stylesheets to process TETML input
and generate a new dataset (in text, XML, CSV, or HTML format)
based on that input, which in turn reflects the contents of a PDF document. The TETML
document has been created with the TET command-line tool or the TET library as explained
in Section 8.1, »Creating TETML«, page 89.
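To make the idea of XML input with non-XML output concrete, here is a minimal sketch that applies an XSLT 1.0 stylesheet and produces plain text, one word per line. It runs on the XSLT processor built into the Java runtime (`javax.xml.transform`). Several details are assumptions made for illustration: the class name `TetmlWordList`, the tiny sample input, the placeholder namespace `urn:example:tetml`, and the assumption that words appear as `Word` elements with a `Text` child. The stylesheet matches elements by local name so that no particular TETML namespace URI is hard-coded; check the element names and namespace against your own TETML files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TetmlWordList {

    // XSLT 1.0 stylesheet with text output: emits the content of every
    // Word/Text element followed by a newline. Elements are selected by
    // local name (assumed element names: Word, Text).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select=\"//*[local-name()='Word']/*[local-name()='Text']\">"
        + "<xsl:value-of select='.'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hand-written stand-in for a TETML document; the namespace URI is a
    // placeholder and the structure is simplified for this example.
    static final String SAMPLE_INPUT =
        "<TET xmlns='urn:example:tetml'><Document><Pages><Page number='1'>"
        + "<Content><Para>"
        + "<Word><Text>Hello</Text></Word>"
        + "<Word><Text>PDF</Text></Word>"
        + "</Para></Content>"
        + "</Page></Pages></Document></TET>";

    // Applies the stylesheet to the XML input and returns the result as a string.
    static String transform(String stylesheet, String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the two sample words, each on its own line.
        System.out.print(transform(STYLESHEET, SAMPLE_INPUT));
    }
}
```

For real use, the stylesheet and the TETML input would normally be read from files, for example with `new StreamSource(new java.io.File(...))`, instead of being embedded as strings.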

While XSLT is very powerful, it differs considerably from conventional programming
languages. We do not attempt to provide an introduction to XSLT programming
in this section; please refer to the wide variety of printed and Web resources on this topic.
We restrict our samples to XSLT 1.0. Although XSLT 2.0 implementations are available,
they are not yet as widely used as XSLT 1.0 implementations. The XSLT 1.0 specification
can be found at www.w3.org/TR/xslt.

However, we do want to assist you in getting XSLT processing of TETML documents
up and running quickly. This section describes the most important environments for

Fig. 8.2 TETML element hierarchy for the page contents.

