17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


Table 10.1 Unicode set examples

specification of Unicode set            characters in the Unicode set
--------------------------------------  ---------------------------------------------
[U+0061-U+007A]                         lower case letters a through z
[U+0640]                                single character Arabic Tatweel
[\x{0640}]                              single character Arabic Tatweel
[U+FB00-U+FB17]                         Latin and Armenian ligatures
[^U+0061-U+007A]                        all characters except a through z
[:Lu:]                                  all uppercase letters (short and long forms
[:UppercaseLetter:]                     of the Unicode set)
[:L:]                                   all Unicode categories starting with L (short
[:Letter:]                              and long forms of the Unicode set)
[:General_Category=Dash_Punctuation:]   all characters in the general category
                                        Dash_Punctuation
[:Alphabetic=No:]                       all non-alphabetic characters
[:Private_Use:]                         all characters in the Private Use Area (PUA)
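
To make the meaning of some of these sets concrete, the following Python sketch tests single characters against a few of them. It is only an illustration of the set semantics, not part of the TET API: the function names are hypothetical, and the checks rely on Python's standard `unicodedata` module, whose Unicode version may differ from the one TET uses. The `Alphabetic` property is not exposed by `unicodedata`, so `[:Alphabetic=No:]` is not covered.

```python
import unicodedata


def in_code_point_range(ch: str, first: int, last: int) -> bool:
    """Rough equivalent of a range set such as [U+0061-U+007A]."""
    return first <= ord(ch) <= last


def is_uppercase_letter(ch: str) -> bool:
    """Rough equivalent of [:Lu:] / [:UppercaseLetter:]."""
    return unicodedata.category(ch) == "Lu"


def is_letter(ch: str) -> bool:
    """Rough equivalent of [:L:] / [:Letter:]: any category starting with L."""
    return unicodedata.category(ch).startswith("L")


def is_dash_punctuation(ch: str) -> bool:
    """Rough equivalent of [:General_Category=Dash_Punctuation:] (category Pd)."""
    return unicodedata.category(ch) == "Pd"


def is_private_use(ch: str) -> bool:
    """Rough equivalent of [:Private_Use:] (category Co)."""
    return unicodedata.category(ch) == "Co"
```

A negated set such as `[^U+0061-U+007A]` corresponds to `not in_code_point_range(ch, 0x0061, 0x007A)`.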

Number. Option lists support several numerical types.

Integer types can hold decimal and hexadecimal integers. Positive integers starting with x, X, 0x, or 0X specify hexadecimal values:

-12345
0
0xFF
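
The integer rules above can be sketched in Python as follows. This is a minimal illustration, not TET's actual parser: the function name `parse_option_integer` is hypothetical, and the sketch assumes that an optional leading sign is accepted for decimal values only, since the text describes hexadecimal notation for positive integers.

```python
import re

# Hexadecimal: prefix x, X, 0x, or 0X followed by hex digits (no sign).
_HEX_PATTERN = re.compile(r"(?:0[xX]|[xX])([0-9A-Fa-f]+)")
# Decimal: optional sign followed by digits (the '+' sign is an assumption).
_DEC_PATTERN = re.compile(r"[+-]?[0-9]+")


def parse_option_integer(text: str) -> int:
    """Parse a decimal or hexadecimal integer written as in the examples above."""
    value = text.strip()
    hex_match = _HEX_PATTERN.fullmatch(value)
    if hex_match:
        return int(hex_match.group(1), 16)
    if _DEC_PATTERN.fullmatch(value):
        return int(value, 10)
    raise ValueError(f"not an integer value: {text!r}")
```

With this sketch, `parse_option_integer("0xFF")` and `parse_option_integer("xFF")` both yield 255, while `parse_option_integer("-12345")` yields -12345.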

Floats can hold decimal floating point or integer numbers; period and comma can be used as decimal separators for floating point values. Exponential notation is also supported. The following values are all equivalent:

size = -123.45
size = -123,45
size = -1.2345E2
size = -1.2345e+2
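
The equivalence of these four spellings can be checked with a short Python sketch. It is an illustration under stated assumptions rather than TET's own parsing code: the function name `parse_option_float` is hypothetical, and the accepted grammar (optional sign, one period or comma as decimal separator, optional exponent) is inferred from the examples above.

```python
import re

# Optional sign, digits with at most one '.' or ',' separator, optional exponent.
_FLOAT_PATTERN = re.compile(
    r"[+-]?(?:[0-9]+(?:[.,][0-9]*)?|[.,][0-9]+)(?:[eE][+-]?[0-9]+)?"
)


def parse_option_float(text: str) -> float:
    """Parse a float that may use a period or a comma as decimal separator."""
    value = text.strip()
    if not _FLOAT_PATTERN.fullmatch(value):
        raise ValueError(f"not a float value: {text!r}")
    # Normalize the decimal comma so Python's float() can read the value.
    return float(value.replace(",", "."))


# The four spellings from the text all map to the same number.
values = [parse_option_float(s)
          for s in ("-123.45", "-123,45", "-1.2345E2", "-1.2345e+2")]
all_equal = all(v == values[0] for v in values)
```

Plain integers such as `42` are accepted as well, matching the statement that floats can hold integer numbers.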

