17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


TET_CATCH(tet)
{
    printf("Error %d in %s() on page %d: %s\n",
        TET_get_errnum(tet), TET_get_apiname(tet), pageno, TET_get_errmsg(tet));
}
TET_delete(tet);

Unicode handling for name strings. The C language does not natively support Unicode. Some string parameters of API functions are declared as name strings. How these are interpreted depends on the length parameter and on whether the string starts with a BOM. In C, if the length parameter is different from 0 the string is interpreted as UTF-16. If the length parameter is 0 the string is interpreted as UTF-8 if it starts with a UTF-8 BOM, as EBCDIC UTF-8 if it starts with an EBCDIC UTF-8 BOM, and as host encoding if no BOM is found (ebcdic on all EBCDIC-based platforms).
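The decision rules for name strings can be sketched as a small classifier. This is only an illustration of the rules described above, not TET's own implementation: the enum `name_encoding` and the function `classify_name_string()` are hypothetical names that are not part of the TET API, and the BOM byte values are the ones listed for option lists in this section.

```c
#include <string.h>

/* Hypothetical labels for illustration; not part of the TET API. */
typedef enum {
    ENC_UTF16,        /* length parameter is nonzero */
    ENC_UTF8,         /* length 0, string starts with UTF-8 BOM */
    ENC_EBCDIC_UTF8,  /* length 0, string starts with EBCDIC UTF-8 BOM */
    ENC_HOST          /* length 0, no BOM: host encoding (ebcdic on EBCDIC platforms) */
} name_encoding;

/* Classify a name string according to the rules in the text.
 * strncmp() stops at the terminating NUL, so strings shorter than
 * three bytes are handled safely when len is 0. */
name_encoding classify_name_string(const char *s, int len)
{
    if (len != 0)
        return ENC_UTF16;
    if (strncmp(s, "\xEF\xBB\xBF", 3) == 0)
        return ENC_UTF8;
    if (strncmp(s, "\x57\x8B\xAB", 3) == 0)
        return ENC_EBCDIC_UTF8;
    return ENC_HOST;
}
```

Note that a nonzero length takes precedence: the BOM is only examined when the length parameter is 0.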

Unicode handling for option lists. Strings within option lists require special attention since they cannot be expressed as Unicode strings in UTF-16 format, but only as byte arrays. For this reason UTF-8 is used for Unicode options. TET looks for a BOM at the beginning of an option to determine the format of the string. More precisely, a string option is interpreted as follows:

> If the option starts with a UTF-8 BOM (\xEF\xBB\xBF) it will be interpreted as UTF-8.

> If the option starts with an EBCDIC UTF-8 BOM (\x57\x8B\xAB) it will be interpreted as EBCDIC UTF-8.

> If no BOM is found, the string will be treated as winansi (or ebcdic on EBCDIC-based platforms).
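Since an option without a BOM is treated as winansi, a UTF-8 option value needs the UTF-8 BOM in front of it. The following is a minimal sketch of a helper that prepends the three BOM bytes to a UTF-8 byte string. The name `prepend_utf8_bom()` and the macro `UTF8_BOM` are hypothetical and not part of the TET API, and how the resulting value is embedded into a complete option list is not shown here.

```c
#include <stddef.h>
#include <string.h>

#define UTF8_BOM "\xEF\xBB\xBF"   /* the UTF-8 BOM bytes listed above */

/* Hypothetical helper: copy the UTF-8 BOM followed by utf8_value into out.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or 0 if out_size is too small to hold BOM + value + NUL. */
size_t prepend_utf8_bom(const char *utf8_value, char *out, size_t out_size)
{
    size_t bom_len = sizeof(UTF8_BOM) - 1;   /* 3 bytes, without the NUL */
    size_t val_len = strlen(utf8_value);

    if (out_size < bom_len + val_len + 1)
        return 0;

    memcpy(out, UTF8_BOM, bom_len);
    memcpy(out + bom_len, utf8_value, val_len + 1);  /* includes the NUL */
    return bom_len + val_len;
}
```

For example, calling it with the two-byte UTF-8 sequence `"\xC3\xA4"` yields a five-byte string whose first three bytes are the BOM.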

Note: The TET_utf16_to_utf8( ) utility function can be used to create UTF-8 strings from UTF-16 strings, which is useful for creating option lists with Unicode values.

