PDFlib Text Extraction Toolkit (TET) Manual
unning XSLT stylesheets, and lists common software for this purpose. In order to apply
XSLT stylesheets to XML documents you need an XSLT processor. There are various free
and commercial XSLT processors available which can be used either in a stand-alone
manner or in your own programs with the help of a programming language.
XSLT stylesheets can make use of parameters which are passed from the environment
to the stylesheet in order to control processing details. Since some of our XSLT
samples make use of stylesheet parameters we will also supply information about passing
parameters to stylesheets in various environments.
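As one illustration of passing a parameter from a program, the following minimal sketch uses the XSLT processor built into the Java runtime (the javax.xml.transform API). The class name XsltParamDemo and the miniature stylesheet are assumptions made up for this example. The stylesheet is not one of the TET samples: it merely echoes the value of a parameter named toc-generate, which it defaults to 1.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltParamDemo {
    // Hypothetical miniature stylesheet (NOT tetml2html.xsl): it declares the
    // parameter toc-generate with default 1 and writes its value as plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='toc-generate' select='1'/>"
      + "<xsl:template match='/'>toc-generate=<xsl:value-of select='$toc-generate'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text, passing each map entry
    // to the stylesheet as a top-level parameter.
    static String transform(String xml, String xslt, Map<String, String> params)
            throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        for (Map.Entry<String, String> entry : params.entrySet()) {
            transformer.setParameter(entry.getKey(), entry.getValue());
        }
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<doc/>";
        // No parameter passed: the default declared in the stylesheet applies.
        System.out.println(transform(xml, STYLESHEET, Map.of()));
        // Parameter passed from the environment overrides the default.
        System.out.println(transform(xml, STYLESHEET, Map.of("toc-generate", "0")));
    }
}
```

With Java 11 or later the sketch can be started directly with java XsltParamDemo.java. The first line of output reflects the stylesheet default, the second the value supplied by the calling program.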
Common XSLT processors which can be used in various packagings include the following:
> Microsoft’s XML implementation called MSXML, which has shipped with the operating system since Windows 2000 SP4
> Microsoft’s .NET Framework 2.0 XSLT implementation
> Saxon, which is available in free and commercial versions
> Xalan, an open-source project (available in C++ and Java implementations) hosted by the Apache Software Foundation
> The open-source libxslt library of the GNOME project
> Sablotron, an open-source XSLT toolkit
XSLT on the command line. Applying XSLT stylesheets from the command line provides
a convenient development and testing environment. The examples below show
how to apply XSLT stylesheets on the command line. All samples process the input file
FontReporter.tetml with the stylesheet tetml2html.xsl while setting the XSLT parameter
toc-generate (which is used in the stylesheet) to the value 0, and send the generated output
to FontReporter.html:
> The Java-based Saxon processor (see www.saxonica.com) can be used as follows:
java -jar saxon9.jar -o FontReporter.html FontReporter.tetml tetml2html.xsl toc-generate=0
> The xsltproc tool is included in most Linux distributions, see xmlsoft.org/XSLT. Use the
following command to apply a stylesheet to a TETML document:
xsltproc --output FontReporter.html --param toc-generate 0 tetml2html.xsl FontReporter.tetml
> Xalan C++ provides a command-line tool which can be invoked as follows:
Xalan -o FontReporter.html -p toc-generate 0 FontReporter.tetml tetml2html.xsl
> On Windows systems with the MSXML parser you can use the free msxsl.exe program
provided by Microsoft. The program (including source code) is available at the following
location:
www.microsoft.com/downloads/details.aspx?familyid=2FB55371-C94E-4373-B0E9-DB4816552E41
Run the program as follows:
msxsl.exe FontReporter.tetml tetml2html.xsl -o FontReporter.html toc-generate=0
> On Windows systems with the .NET Framework 2.0 XSLT implementation you can
use the free nxslt.exe program which is available from the following location:
www.xmllab.net/Products/nxslt/tabid/62/Default.aspx
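Where none of these tools is at hand, a comparable command-line runner can be sketched in a few lines of Java on top of the javax.xml.transform API that ships with the Java runtime. The class name XsltCli and its argument order (input file, stylesheet, output file, then name=value parameters) are assumptions chosen for this illustration and do not correspond to any of the tools listed above. The processor built into the Java runtime implements XSLT 1.0, so whether it can handle a given stylesheet depends on the features that stylesheet uses.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltCli {
    // Collects arguments of the form name=value, starting at index 'from',
    // into an ordered map of stylesheet parameters.
    static Map<String, String> parseParams(String[] args, int from) {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = from; i < args.length; i++) {
            int eq = args[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected name=value, got: " + args[i]);
            }
            params.put(args[i].substring(0, eq), args[i].substring(eq + 1));
        }
        return params;
    }

    // Applies the stylesheet file to the input file and writes the result
    // to the output file, passing the given parameters to the stylesheet.
    static void run(File input, File stylesheet, File output, Map<String, String> params)
            throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(stylesheet));
        params.forEach(transformer::setParameter);
        transformer.transform(new StreamSource(input), new StreamResult(output));
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length < 3) {
            System.err.println("usage: XsltCli input stylesheet output [name=value ...]");
            System.exit(1);
        }
        run(new File(args[0]), new File(args[1]), new File(args[2]), parseParams(args, 3));
    }
}
```

Assuming the sketch is saved as XsltCli.java, an invocation analogous to the ones above could look like this with Java 11 or later:
java XsltCli.java FontReporter.tetml tetml2html.xsl FontReporter.html toc-generate=0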
Chapter 8: TET Markup Language (TETML)