PDFlib Text Extraction Toolkit (TET) Manual
Table 10.1 Unicode set examples<br />
specification of Unicode set | characters in the Unicode set<br />
[U+0061-U+007A] | lower case letters a through z<br />
[U+0640] | single character Arabic Tatweel<br />
[\x{0640}] | single character Arabic Tatweel<br />
[U+FB00-U+FB17] | Latin and Armenian ligatures<br />
[^U+0061-U+007A] | all characters except a through z<br />
[:Lu:], [:UppercaseLetter:] | all uppercase letters (short and long forms of the Unicode set)<br />
[:L:], [:Letter:] | all Unicode categories starting with L (short and long forms of the Unicode set)<br />
[:General_Category=Dash_Punctuation:] | all characters in the general category Dash_Punctuation<br />
[:Alphabetic=No:] | all non-alphabetic characters<br />
[:Private_Use:] | all characters in the Private Use Area (PUA)<br />
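To make the table rows more concrete, the following Python sketch tests individual characters against rough equivalents of some of the sets above. It is only an illustration and does not use TET: the helper names in_range and in_category are hypothetical, and Python's standard unicodedata module stands in for the Unicode property lookup.

```python
import unicodedata

def in_range(ch: str, first: int, last: int) -> bool:
    """Rough analogue of a range set such as [U+0061-U+007A] (hypothetical helper)."""
    return first <= ord(ch) <= last

def in_category(ch: str, prefix: str) -> bool:
    """Rough analogue of [:Lu:] (prefix "Lu") or [:L:] (prefix "L") (hypothetical helper)."""
    return unicodedata.category(ch).startswith(prefix)

if __name__ == "__main__":
    print(in_range("q", 0x0061, 0x007A))        # [U+0061-U+007A]
    print(not in_range("Q", 0x0061, 0x007A))    # [^U+0061-U+007A]
    print(in_range("\ufb01", 0xFB00, 0xFB17))   # [U+FB00-U+FB17], fi ligature
    print(in_category("Q", "Lu"))               # [:Lu:]
    print(in_category("\u0640", "L"))           # [:L:], Arabic Tatweel
    print(in_category("\u2013", "Pd"))          # Dash_Punctuation, en dash
    print(in_category("\ue000", "Co"))          # [:Private_Use:]
```

Each call in the main block is expected to print True. Negated sets such as [^U+0061-U+007A] are modeled here simply by negating the membership test.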
Number. Option lists support several numerical types.<br />
Integer types can hold decimal and hexadecimal integers. Positive integers starting<br />
with x, X, 0x, or 0X specify hexadecimal values:<br />
-12345<br />
0<br />
0xFF<br />
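The integer rules above can be sketched as a small Python function. This is an assumption-laden illustration of the described syntax, not TET's own parser; the name parse_option_int is hypothetical, and the treatment of malformed input (raising ValueError) is a choice made for this sketch.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def parse_option_int(text: str) -> int:
    """Parse a decimal integer, or a hexadecimal one prefixed with x, X, 0x, or 0X."""
    s = text.strip()
    # Check the two-character prefixes first so "0xFF" is not read as decimal.
    for prefix in ("0x", "0X", "x", "X"):
        if s.startswith(prefix):
            digits = s[len(prefix):]
            if not digits or any(c not in HEX_DIGITS for c in digits):
                raise ValueError(f"invalid hexadecimal integer: {text!r}")
            return int(digits, 16)
    # No hexadecimal prefix: treat the value as a (possibly negative) decimal.
    return int(s, 10)

if __name__ == "__main__":
    for sample in ("-12345", "0", "0xFF", "xff"):
        print(sample, "->", parse_option_int(sample))
```

Under these assumptions, 0xFF and xff both yield 255, while -12345 and 0 are read as plain decimal values.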
Floats can hold decimal floating point or integer numbers; period and comma can be<br />
used as decimal separators for floating point values. Exponential notation is also supported.<br />
The following values are all equivalent:<br />
size = -123.45<br />
size = -123,45<br />
size = -1.2345E2<br />
size = -1.2345e+2<br />
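The equivalence of these four notations can be checked with a short Python sketch. It is not TET's parser: parse_option_float is a hypothetical helper that receives only the value part of the option (the text after "size =") and simply normalizes a comma separator to a period before calling Python's float().

```python
def parse_option_float(text: str) -> float:
    """Parse a float that may use a period or a comma as decimal separator."""
    # Replace at most one comma; exponential notation is handled by float().
    return float(text.strip().replace(",", ".", 1))

if __name__ == "__main__":
    samples = ["-123.45", "-123,45", "-1.2345E2", "-1.2345e+2"]
    values = [parse_option_float(s) for s in samples]
    print(values)
    print(all(v == values[0] for v in values))
```

All four samples parse to the same floating point value, matching the statement that the notations are equivalent.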
146 Chapter 10: <strong>TET</strong> Library API Reference