PDFlib Text Extraction Toolkit (TET) Manual
Constructing <strong>TET</strong> command lines. Observe the following rules when constructing<br />
<strong>TET</strong> command lines:<br />
> Input files will be searched in all directories specified as searchpath.<br />
> Short forms are available for some options, and can be mixed with long options.<br />
> Long options can be abbreviated provided the abbreviation is unique.<br />
> Depending on the encryption status of the input file, a user or master password may<br />
be required for successfully extracting text. It must be supplied with the --password<br />
option. <strong>TET</strong> will check whether this password is sufficient for text extraction, and<br />
will generate an error if it isn’t.<br />
<strong>TET</strong> checks the full command line before processing any file. If an error is encountered<br />
in the options anywhere on the command line, no files will be processed at all.<br />
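
The rules above can be illustrated with a minimal Python sketch that assembles an argument list for the tool. The helper name `build_tet_command` is hypothetical, and the sketch assumes that the executable is called `tet`, that options precede the file names, and that the password follows `--password` as a separate argument. Only the `--password` option itself is taken from the text above.

```python
from typing import List, Optional, Sequence


def build_tet_command(
    input_files: Sequence[str],
    password: Optional[str] = None,
    executable: str = "tet",  # assumed executable name; adjust for your installation
) -> List[str]:
    """Assemble an argument list for the TET command-line tool (hypothetical helper)."""
    if not input_files:
        raise ValueError("at least one input file is required")
    command = [executable]
    if password is not None:
        # --password is the option the manual names for encrypted input files.
        # Passing the value as a separate argument is an assumption of this sketch.
        command += ["--password", password]
    # Each file name is its own list element, so no shell quoting is involved.
    command += list(input_files)
    return command
```

Because the whole command line is checked before any file is processed, building the complete argument list first and launching the tool once keeps a bad option from going unnoticed until part of a batch has run.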
File names. File names that contain blank characters require special handling
when used with command-line tools like <strong>TET</strong>. To process a file name that contains blank
characters, enclose the complete file name in double quotation marks (").
Wildcards can be used according to standard practice. For example, *.pdf denotes all files
in a given directory that have a .pdf file name suffix. Note that on some systems case
is significant, while on others it isn’t (that is, *.pdf and *.PDF may match different sets of files). Also note
that on Windows systems wildcards do not work for file names containing blank characters.
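
One way to sidestep both the quoting and the wildcard limitations is to expand the wildcard in a calling script and hand each file name to the tool as a separate argument. The following is a sketch only; `expand_pdf_wildcard` is a hypothetical helper, not part of TET, and whether the match is case-sensitive still depends on the platform.

```python
import glob
import os
from typing import List


def expand_pdf_wildcard(directory: str, pattern: str = "*.pdf") -> List[str]:
    """Return the sorted paths in `directory` that match `pattern`.

    File names with blank characters are returned like any other name,
    since no shell is involved in the expansion.
    """
    # glob.escape() protects special characters in the directory name itself.
    return sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))
```

The resulting list can be appended to an argument list and passed to `subprocess.run` without any additional quoting.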
Response files. In addition to options supplied directly on the command line, options
can also be supplied in a response file. The contents of a response file are inserted into
the command line at the location where the @filename option was found.
A response file is a simple text file with options and parameters. It must adhere to<br />
the following syntax rules:<br />
> Option values must be separated with whitespace, i.e. space, linefeed, return, or tab.<br />
> Values which contain whitespace must be enclosed with double quotation marks: "<br />
> Double quotation marks at the beginning and end of a value are removed and do not become part of the value.<br />
> A double quotation mark must be masked with a backslash to use it literally: \"<br />
> A backslash character must be masked with another backslash to use it literally: \\<br />
Response files can be nested, i.e. the @filename syntax can itself be used in a response<br />
file.<br />
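
The syntax rules above can be applied mechanically when a script generates a response file. The following Python sketch shows one possible encoder; the function names `quote_response_value` and `format_response_file` are hypothetical, and writing one value per line is merely a choice among the permitted whitespace separators.

```python
import re
from typing import Iterable


def quote_response_value(value: str) -> str:
    """Encode one value according to the response file syntax rules."""
    # Mask backslashes first, so the backslashes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # Values that contain whitespace must be enclosed in double quotation marks.
    if re.search(r"\s", value):
        return '"' + escaped + '"'
    return escaped


def format_response_file(values: Iterable[str]) -> str:
    """Join the encoded values, one per line (linefeed is a valid separator)."""
    return "\n".join(quote_response_value(v) for v in values) + "\n"
```

For example, the values `--password`, `top secret`, and `in.pdf` would be written as three lines, with only the second one enclosed in double quotation marks.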
Exit codes. The <strong>TET</strong> command-line tool returns with an exit code which can be used to<br />
check whether or not the requested operations could be successfully carried out:<br />
> Exit code 0: all command-line options were processed successfully and completely.<br />
> Exit code 1: one or more file processing errors occurred, but processing continued.<br />
> Exit code 2: an error was found in the command-line options. Processing stops<br />
at the offending option, and no input file is processed.
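
A calling script can branch on these exit codes. The sketch below maps the three documented codes to short descriptions and runs an arbitrary command; `describe_exit_code` and `run_and_describe` are hypothetical helper names, and the fallback text for any other code is an assumption of this sketch, since the list above defines only 0, 1, and 2.

```python
import subprocess
from typing import Sequence

# Meanings taken from the exit code list above.
EXIT_CODE_MEANINGS = {
    0: "all command-line options were processed successfully",
    1: "one or more file processing errors occurred, but processing continued",
    2: "error in the command-line options; no input file was processed",
}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into a short description."""
    # Codes other than 0, 1, and 2 are not documented above.
    return EXIT_CODE_MEANINGS.get(code, f"undocumented exit code {code}")


def run_and_describe(command: Sequence[str]) -> str:
    """Run a command given as an argument list and describe its exit code."""
    completed = subprocess.run(list(command))
    return describe_exit_code(completed.returncode)
```

Distinguishing code 1 from code 2 matters in batch jobs: with code 1 some output may already exist, whereas code 2 means nothing was processed and the command line itself needs to be corrected.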