PDFlib Text Extraction Toolkit (TET) Manual
TET_CATCH(tet)
{
    printf("Error %d in %s() on page %d: %s\n",
        TET_get_errnum(tet), TET_get_apiname(tet), pageno, TET_get_errmsg(tet));
}
TET_delete(tet);
Unicode handling for name strings. The C language does not natively support Unicode.
Some string parameters of API functions are declared as name strings. How a name string
is interpreted depends on the length parameter and on whether the string starts with a BOM:
> If the length parameter is different from 0, the string will be interpreted as UTF-16.
> If the length parameter is 0 and the string starts with a UTF-8 BOM, it will be
  interpreted as UTF-8.
> If the length parameter is 0 and the string starts with an EBCDIC UTF-8 BOM, it will be
  interpreted as EBCDIC UTF-8.
> If the length parameter is 0 and no BOM is found, the string will be interpreted in host
  encoding (ebcdic on all EBCDIC-based platforms).
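The decision rules for name strings can be sketched in plain C as follows. This is only an illustration of the rules described above, not TET code: the name_encoding type and the classify_name_string() function are hypothetical names introduced for this sketch, and the EBCDIC UTF-8 BOM bytes are the ones quoted for option lists in this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for this sketch; they are not part of the TET API. */
typedef enum {
    NAME_UTF16,        /* length parameter != 0 */
    NAME_UTF8,         /* length 0, string starts with UTF-8 BOM */
    NAME_EBCDIC_UTF8,  /* length 0, string starts with EBCDIC UTF-8 BOM */
    NAME_HOST          /* length 0, no BOM: host encoding */
} name_encoding;

static const char NAME_UTF8_BOM[] = "\xEF\xBB\xBF";
static const char NAME_EBCDIC_UTF8_BOM[] = "\x57\x8B\xAB";

/*
 * Hypothetical helper: reports how a (string, length) pair would be
 * interpreted according to the rules for name strings described above.
 */
name_encoding classify_name_string(const char *s, int len)
{
    assert(s != NULL);

    /* A non-zero length always means UTF-16; the BOM is not consulted. */
    if (len != 0)
        return NAME_UTF16;

    /* With length 0 the string is null-terminated, so strncmp is safe. */
    if (strncmp(s, NAME_UTF8_BOM, 3) == 0)
        return NAME_UTF8;
    if (strncmp(s, NAME_EBCDIC_UTF8_BOM, 3) == 0)
        return NAME_EBCDIC_UTF8;

    return NAME_HOST;
}
```

For example, classify_name_string("file.pdf", 0) yields NAME_HOST, while the same bytes preceded by "\xEF\xBB\xBF" yield NAME_UTF8. Note that a BOM written directly in front of letters a-f in a C string literal must be split into two adjacent literals, since hexadecimal escapes consume all following hex digits.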
Unicode handling for option lists. Strings within option lists require special attention
since they cannot be expressed as Unicode strings in UTF-16 format, but only as byte arrays.
For this reason UTF-8 is used for Unicode options. TET looks for a BOM at the beginning
of an option to determine the format of the string. More precisely, a string option is
interpreted as follows:
> If the option starts with a UTF-8 BOM (\xEF\xBB\xBF) it will be interpreted as UTF-8.
> If the option starts with an EBCDIC UTF-8 BOM (\x57\x8B\xAB) it will be interpreted as
  EBCDIC UTF-8.
> If no BOM is found, the string will be treated as winansi (or ebcdic on EBCDIC-based
  platforms).
Note The TET_utf16_to_utf8() utility function can be used to create UTF-8 strings from UTF-16
strings, which is useful for creating option lists with Unicode values.
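As a minimal sketch of the BOM rule for option lists, the following plain C helper prepends the UTF-8 BOM to a string that is already UTF-8 encoded, so that the result would be recognized as UTF-8 rather than winansi. The function make_utf8_option_value() and the macro OPT_UTF8_BOM are hypothetical names used only for illustration; they are not part of the TET API, and how the resulting value is embedded in a complete option list is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 BOM as quoted in the rules above. */
#define OPT_UTF8_BOM "\xEF\xBB\xBF"

/*
 * Hypothetical helper (not part of the TET API): writes the UTF-8 BOM
 * followed by the null-terminated UTF-8 string utf8_value into buf.
 * Returns 0 on success, or -1 if buf is too small for the BOM, the
 * value, and the terminating null byte.
 */
int make_utf8_option_value(char *buf, size_t bufsize, const char *utf8_value)
{
    size_t bom_len = sizeof(OPT_UTF8_BOM) - 1;  /* 3 bytes, without the null */
    size_t val_len;

    assert(buf != NULL && utf8_value != NULL);

    val_len = strlen(utf8_value);
    if (bufsize < bom_len + val_len + 1)
        return -1;

    memcpy(buf, OPT_UTF8_BOM, bom_len);
    /* Copy the value including its terminating null byte. */
    memcpy(buf + bom_len, utf8_value, val_len + 1);
    return 0;
}
```

The helper only adds the BOM; it assumes the input is valid UTF-8 and does not convert from other encodings. For UTF-16 input, the TET_utf16_to_utf8() function mentioned in the note above is the documented way to obtain UTF-8 data.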