17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


10.4 Geometric Types

Rectangle. A rectangle is a list of four float values specifying the x and y coordinates of the lower left and upper right corners of a rectangle. The coordinate system for interpreting the coordinates (default or user coordinate system) varies depending on the option, and is documented separately. Example:

includebox = {{0 0 500 100} {0 500 500 600}}
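The rectangle layout described above (lower left x and y, then upper right x and y) can be illustrated with a small sketch that reads the value of the example option into rectangles. The names `Rect` and `parse_rect_list` are hypothetical helpers written for this illustration; they are not part of the TET API, and the sketch assumes the simple brace-delimited form shown in the example.

```python
import re
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Hypothetical helper: lower left (llx, lly) and upper right (urx, ury) corners."""
    llx: float
    lly: float
    urx: float
    ury: float

    @property
    def width(self) -> float:
        return self.urx - self.llx

    @property
    def height(self) -> float:
        return self.ury - self.lly


def parse_rect_list(value: str) -> List[Rect]:
    """Parse a value such as '{{0 0 500 100} {0 500 500 600}}' into rectangles."""
    rects = []
    # Each innermost {...} group is expected to hold exactly four numbers.
    for group in re.findall(r"\{([^{}]*)\}", value):
        numbers = [float(token) for token in group.split()]
        if len(numbers) != 4:
            raise ValueError(f"expected 4 values per rectangle, got {len(numbers)}")
        rects.append(Rect(*numbers))
    return rects


boxes = parse_rect_list("{{0 0 500 100} {0 500 500 600}}")
for box in boxes:
    print(box, "width:", box.width, "height:", box.height)
```

The sketch only shows how the four values are ordered; which coordinate system they refer to still depends on the option they are passed to.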

10.5 Encoding Names

Various options and parameters support the names of encodings, e.g. the filenamehandling option of TET_set_option( ), the forceencoding option of TET_open_document( ), and the inputformat parameter of TET_convert_to_unicode( ). The following keywords can be supplied as encoding names:

> The keyword auto specifies the most natural encoding for certain environments:
  > On Windows: the current system code page
  > On Unix and OS X: iso8859-1
  > On i5/iSeries: the current job’s encoding (IBMCCSID000000000000)
  > On zSeries: ebcdic
> winansi (=cp1252)
> iso8859-1 - iso8859-10, iso8859-13 - iso8859-14
> cp1250 - cp1258
> macroman, macroman_euro (replaces currency with Euro), macroman_apple (replaces currency with Euro and includes additional mathematical/greek symbols)
> U+XXXX (256 characters starting at the specified value)
> ebcdic (=code page 1047), ebcdic_37 (=code page 037)
> CJK encodings cp932, cp936, cp949, cp950
> On the following systems all encodings available on the host system can be used:
  > cpXXXX on Windows
  > any Coded Character Set Identifier without the CCSID prefix on i5/iSeries
  > any Coded Character Set Identifier (CCSID) on zSeries
> Custom encodings can be defined as resources and referenced by their resource name
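To make some of these keywords concrete, the following sketch builds a 256-entry table from byte value to Unicode character for a few single-byte encoding names. It is an illustration only and does not call TET: the function `code_table` and the mapping `PYTHON_CODECS` are hypothetical, the pairing of the keywords with Python codec names is an assumption, and the U+XXXX form is assumed to use exactly four hexadecimal digits. TET's own tables may differ in details.

```python
import re
from typing import List

# Assumption: Python codec names standing in for two of the keywords above.
PYTHON_CODECS = {
    "winansi": "cp1252",   # listed above as winansi (=cp1252)
    "ebcdic_37": "cp037",  # listed above as ebcdic_37 (=code page 037)
}


def code_table(name: str) -> List[str]:
    """Return a 256-entry list mapping each byte value to a Unicode string."""
    match = re.fullmatch(r"U\+([0-9A-Fa-f]{4})", name)
    if match:
        # U+XXXX: 256 consecutive characters starting at the specified value.
        start = int(match.group(1), 16)
        return [chr(start + i) for i in range(256)]
    # Otherwise treat the name as a single-byte codec; bytes that the codec
    # leaves undefined become U+FFFD instead of raising an error.
    codec = PYTHON_CODECS.get(name, name)
    return [bytes([i]).decode(codec, errors="replace") for i in range(256)]


for encoding in ("winansi", "iso8859-1", "U+0400", "ebcdic_37"):
    table = code_table(encoding)
    print(encoding, [hex(ord(ch)) for ch in table[0x80:0x84]])
```

The same byte value maps to different characters depending on the encoding name, which is why options such as forceencoding need the name stated explicitly. Multi-byte CJK encodings such as cp932 do not fit a 256-entry table and are outside the scope of this sketch.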
