PDFlib Text Extraction Toolkit (TET) Manual
10 TET Library API Reference
10.1 Option Lists
Option lists are a powerful yet easy method to control TET operations. Instead of requiring
a multitude of function parameters, many API methods support option lists, or
optlists for short. Option lists are strings which may contain an arbitrary number of
options. Since option lists are evaluated from left to right, an option can be supplied
more than once within the same list; in this case the last occurrence overrides earlier ones.
Optlists support various data types and composite data like arrays. In most languages
optlists can easily be constructed by concatenating the required keywords and values. C
programmers may want to use the sprintf() function in order to construct optlists.
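The concatenation approach can be sketched in Python as follows. This is only an illustration: the helper build_optlist and the option names foo and limit are hypothetical placeholders, not actual TET options, and the sketch assumes the name=value form of a pair with simple values that need no quoting.

```python
def build_optlist(options):
    """Join (name, value) pairs into a single optlist string.

    Hypothetical helper: values are assumed to be simple and to need no quoting.
    """
    return " ".join(f"{name}={value}" for name, value in options)


# Placeholder option names, chosen only for illustration.
defaults = build_optlist([("foo", "true"), ("limit", 10)])

# Appending an override relies on left-to-right evaluation:
# the later "limit" entry is the one that takes effect.
optlist = defaults + " " + build_optlist([("limit", 25)])
print(optlist)  # foo=true limit=10 limit=25
```

Appending overrides to a default optlist avoids editing the default string, since the last occurrence of an option wins.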
An optlist is a string containing one or more pairs of the form<br />
name value<br />
Names and values, as well as multiple name/value pairs, can be separated by arbitrary
whitespace characters (space, tab, carriage return, newline). The value may consist of a<br />
list of multiple values. You can also use an equal sign ’=’ between name and value:<br />
name=value<br />
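To make the separator rules concrete, here is a minimal Python sketch of how such a string could be split into pairs. It is not TET's parser: the function parse_flat_optlist is hypothetical, and it assumes that every option carries exactly one explicit value and that no value is bracketed or contains whitespace or an equal sign.

```python
def parse_flat_optlist(optlist):
    """Split a flat optlist into a dict of name -> value (both strings).

    Assumption: one explicit, unbracketed value per option.
    """
    # '=' and any whitespace run are treated as equivalent separators.
    tokens = optlist.replace("=", " ").split()
    if len(tokens) % 2:
        raise ValueError("expected name/value pairs")
    options = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        options[name] = value  # a later occurrence replaces an earlier one
    return options


print(parse_flat_optlist("foo=1 bar\t2\nfoo = 3"))
# {'foo': '3', 'bar': '2'}
```

The example mixes spaces, a tab, a newline and the equal sign as separators, and shows the second foo entry replacing the first.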
Simple values. Simple values may use any of the following data types:<br />
> Boolean: true or false; if the value of a boolean option is omitted, the value true is assumed.<br />
As a shorthand notation nofoo can be used instead of foo=false to disable option<br />
foo.<br />
> String: these are plain ASCII strings which are generally used for non-localizable keywords.<br />
Strings containing whitespace or ’=’ characters must be bracketed with { and }.<br />
An empty string can be constructed with {}. The characters {, }, and \ must be preceded<br />
by an additional \ character if they are supposed to be part of the string.<br />
> Strings and name strings: these can hold Unicode content in various formats; see<br />
Section 3.2, »C Binding«, page 22 for C- and C++-specific details regarding name<br />
strings.<br />
> Unichar: these are single Unicode characters, where several syntax variants are supported:<br />
decimal values (e.g. 173), hexadecimal values prefixed with x, X, 0x, 0X, or U+<br />
(xAD, 0xAD, U+00AD), numerical or character references (see below), but without<br />
the ’&’ and ’;’ decoration (shy, #xAD, #173). Alternatively, literal characters can be<br />
supplied. Unichars must be in the range 0-65535 (0-xFFFF).<br />
> Keyword: one of a predefined list of fixed keywords<br />
> Float and integer: decimal floating point or integer numbers; point and comma can<br />
be used as decimal separators for floating point values. Integer values can start with<br />
x, X, 0x, or 0X to specify hexadecimal values. Some options (this is stated in the respective<br />
function description) support percentages by adding a % character directly<br />
after the value.<br />
> Handle: several internal object handles, e.g., document or page handles. Technically<br />
these are integer values.<br />
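Two of the value types above can be illustrated with a short Python sketch. Both helpers are hypothetical and are not part of TET. quote_string applies the bracketing and escaping rules for strings; bracketing a string whenever it contains an escaped character is a choice made for this sketch. parse_unichar accepts the unichar notations listed above; the order in which the notations are tried is an assumption, and Python's HTML entity table merely stands in for whatever character reference names TET actually accepts.

```python
import re
from html.entities import name2codepoint  # stand-in table; TET's own names may differ

_HEX = re.compile(r"(?:0[xX]|[xX]|U\+|#x)([0-9A-Fa-f]+)")
_DEC = re.compile(r"#?([0-9]+)")


def quote_string(value):
    """Escape {, } and backslash, and add braces where they are needed."""
    escaped = re.sub(r"([{}\\])", r"\\\1", value)
    needs_braces = value == "" or escaped != value or re.search(r"[\s=]", value)
    return "{" + escaped + "}" if needs_braces else value


def parse_unichar(text):
    """Return the code point for one unichar notation (assumed precedence)."""
    hex_match = _HEX.fullmatch(text)
    dec_match = _DEC.fullmatch(text)
    if hex_match:                      # xAD, 0xAD, U+00AD, #xAD
        code = int(hex_match.group(1), 16)
    elif dec_match:                    # 173, #173
        code = int(dec_match.group(1))
    elif text in name2codepoint:       # character reference name such as shy
        code = name2codepoint[text]
    elif len(text) == 1:               # literal character
        code = ord(text)
    else:
        raise ValueError(f"not a unichar: {text!r}")
    if not 0 <= code <= 0xFFFF:
        raise ValueError(f"unichar out of range: {text!r}")
    return code


print(quote_string("two words"))  # {two words}
print([parse_unichar(s) for s in ("173", "xAD", "0xAD", "U+00AD", "shy", "#xAD", "#173")])
# [173, 173, 173, 173, 173, 173, 173]
```

Note that in this sketch a lone digit such as 5 is read as the decimal value 5 rather than as the literal character; whether TET resolves that case the same way is not stated here.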
Depending on the type and interpretation of an option additional restrictions may apply.<br />
For example, integer or float options may be restricted to a certain range of values;<br />