17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Constructing <strong>TET</strong> command lines. The following rules must be observed for constructing<br />

<strong>TET</strong> command lines:<br />

> Input files will be searched in all directories specified as searchpath.<br />

> Short forms are available for some options, and can be mixed with long options.<br />

> Long options can be abbreviated provided the abbreviation is unique.<br />

> Depending on the encryption status of the input file, a user or master password may<br />

be required for successfully extracting text. It must be supplied with the --password<br />

option. <strong>TET</strong> will check whether this password is sufficient for text extraction, and<br />

will generate an error if it isn’t.<br />

<strong>TET</strong> checks the full command line before processing any file. If an error is encountered<br />

in the options anywhere on the command line, no files will be processed at all.<br />

File names. File names which contain blank characters require some special handling<br />

when used with command-line tools like <strong>TET</strong>. In order to process a file name with blank<br />

characters you should enclose the complete file name with double quote " characters.<br />

Wildcards can be used according to standard practice. For example, *.pdf denotes all files<br />

in a given directory which have a .pdf file name suffix. Note that on some systems case<br />

is significant, while on others it isn’t (i.e., *.pdf may be different from *.PDF). Also note<br />

that on Windows systems wildcards do not work for file names containing blank characters.<br />

Response files. In addition to options supplied directly on the command-line, options<br />

can also be supplied in a response file. The contents of a response file will be inserted in<br />

the command-line at the location where the @filename option was found.<br />

A response file is a simple text file with options and parameters. It must adhere to<br />

the following syntax rules:<br />

> Option values must be separated with whitespace, i.e. space, linefeed, return, or tab.<br />

> Values which contain whitespace must be enclosed with double quotation marks: "<br />

> Double quotation marks at the beginning and end of a value will be omitted.<br />

> A double quotation mark must be masked with a backslash to use it literally: \"<br />

> A backslash character must be masked with another backslash to use it literally: \\<br />

Response files can be nested, i.e. the @filename syntax can itself be used in a response<br />

file.<br />

Exit codes. The <strong>TET</strong> command-line tool returns with an exit code which can be used to<br />

check whether or not the requested operations could be successfully carried out:<br />

> Exit code 0: all command-line options could be successfully and fully processed.<br />

> Exit code 1: one or more file processing errors occurred, but processing continued.<br />

> Exit code 2: some error was found in the command-line options. Processing stopped<br />

at the particular bad option, and no input file has been processed.<br />

2.1 Command-Line Options 17

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!