17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual



10 TET Library API Reference

10.1 Option Lists

Option lists are a powerful yet easy method to control TET operations. Instead of requiring a multitude of function parameters, many API methods support option lists, or optlists for short. Option lists are strings which may contain an arbitrary number of options. Since option lists are evaluated from left to right, an option can be supplied multiple times within the same list; in this case the last occurrence overrides earlier ones. Optlists support various data types and composite data like arrays. In most languages optlists can easily be constructed by concatenating the required keywords and values. C programmers may want to use the sprintf( ) function to construct optlists.

An optlist is a string containing one or more pairs of the form

name value

Names and values, as well as multiple name/value pairs, can be separated by arbitrary whitespace characters (space, tab, carriage return, newline). The value may consist of a list of multiple values. You can also use an equal sign ’=’ between name and value:

name=value
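As an illustration of the name=value form and of left-to-right evaluation, the following Python sketch joins name/value pairs into an optlist string. The helper names build_optlist and effective_options are hypothetical and not part of the TET API, granularity serves only as a sample option name, and the dictionary merely imitates the rule that the last occurrence wins. The actual evaluation is done by TET, and the values are assumed to be formatted already.

```python
def build_optlist(pairs):
    """Join (name, value) pairs into one optlist string in name=value form.

    Hypothetical helper, not part of TET; values must already be formatted.
    """
    return " ".join(f"{name}={value}" for name, value in pairs)


def effective_options(pairs):
    """Imitate left-to-right evaluation: a later pair overwrites an earlier one."""
    result = {}
    for name, value in pairs:
        result[name] = value
    return result


# The same option is supplied twice; only the last occurrence takes effect.
pairs = [("granularity", "word"), ("granularity", "page")]
optlist = build_optlist(pairs)
print(optlist)                   # granularity=word granularity=page
print(effective_options(pairs))  # {'granularity': 'page'}
```

The resulting string would be passed as the optlist argument of whichever API method accepts the option in question.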

Simple values. Simple values may use any of the following data types:

> Boolean: true or false; if the value of a boolean option is omitted, the value true is assumed. As a shorthand notation nofoo can be used instead of foo=false to disable option foo.

> String: these are plain ASCII strings which are generally used for non-localizable keywords. Strings containing whitespace or ’=’ characters must be bracketed with { and }. An empty string can be constructed with {}. The characters {, }, and \ must be preceded by an additional \ character if they are supposed to be part of the string.

> Strings and name strings: these can hold Unicode content in various formats; see Section 3.2, »C Binding«, page 22 for C- and C++-specific details regarding name strings.

> Unichar: these are single Unicode characters, where several syntax variants are supported: decimal values (e.g. 173), hexadecimal values prefixed with x, X, 0x, 0X, or U+ (xAD, 0xAD, U+00AD), numerical or character references (see below), but without the ’&’ and ’;’ decoration (shy, #xAD, #173). Alternatively, literal characters can be supplied. Unichars must be in the range 0-65535 (0-xFFFF).

> Keyword: one of a predefined list of fixed keywords.

> Float and integer: decimal floating point or integer numbers; point and comma can be used as decimal separators for floating point values. Integer values can start with x, X, 0x, or 0X to specify hexadecimal values. Some options (this is stated in the respective function description) support percentages by adding a % character directly after the value.

> Handle: several internal object handles, e.g., document or page handles. Technically these are integer values.
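The Boolean, String, and Unichar rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not TET's own parser. The function names are hypothetical. Python's html.entities table stands in for the character references TET accepts. A token made up of digits is assumed to be a decimal value, not a literal character. Braces are added only in the cases the text names (whitespace, ’=’, empty string).

```python
import re
from html.entities import name2codepoint

# Hexadecimal forms: xAD, XAD, 0xAD, 0XAD, U+00AD, #xAD
_HEX = re.compile(r"(?:0[xX]|[xX]|U\+|#x)([0-9A-Fa-f]+)")
# Decimal forms: 173, #173
_DEC = re.compile(r"#?([0-9]+)")


def format_boolean(name, enabled):
    """Return 'foo' (value true implied) or the shorthand 'nofoo'."""
    return name if enabled else "no" + name


def quote_string(text):
    """Escape {, } and backslash; add braces for whitespace, '=' or empty text."""
    if text == "":
        return "{}"
    escaped = "".join("\\" + ch if ch in "{}\\" else ch for ch in text)
    needs_braces = any(ch.isspace() or ch == "=" for ch in text)
    return "{" + escaped + "}" if needs_braces else escaped


def parse_unichar(token):
    """Convert one Unichar token to a code point in the range 0-65535."""
    hex_match = _HEX.fullmatch(token)
    dec_match = _DEC.fullmatch(token)
    if hex_match:
        code = int(hex_match.group(1), 16)
    elif dec_match:
        code = int(dec_match.group(1))   # assumption: digits mean a decimal value
    elif token in name2codepoint:
        code = name2codepoint[token]     # assumption: HTML names as stand-in table
    elif len(token) == 1:
        code = ord(token)                # literal character
    else:
        raise ValueError(f"not a recognizable Unichar: {token!r}")
    if not 0 <= code <= 0xFFFF:
        raise ValueError(f"Unichar out of range 0-65535: {token!r}")
    return code


print(format_boolean("foo", False))                # nofoo
print(quote_string("two words"))                   # {two words}
print(parse_unichar("U+00AD"), parse_unichar("shy"))  # 173 173
```

All seven spellings of the soft hyphen listed in the Unichar item (173, xAD, 0xAD, U+00AD, shy, #xAD, #173) map to the same value 173 in this sketch.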

Depending on the type and interpretation of an option, additional restrictions may apply. For example, integer or float options may be restricted to a certain range of values;
