PDFlib Text Extraction Toolkit (TET) Manual
PDFlib Text Extraction Toolkit (TET) Manual
PDFlib Text Extraction Toolkit (TET) Manual
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
utf8_to_utf16( ), or <strong>TET</strong>_utf32_to_utf16( ) or until an exception is thrown. Clients must<br />
copy the string if they need it longer.<br />
Bindings<br />
This function is not available in Unicode-capable language bindings. The memory used<br />
for the converted string will be managed by <strong>TET</strong>, and must not be freed by the client.<br />
C++<br />
Perl PHP<br />
C<br />
string utf16_to_utf8(string utf16string)<br />
string <strong>TET</strong>_utf16_to_utf8(resource tet, string utf16string)<br />
const char *<strong>TET</strong>_utf16_to_utf8(<strong>TET</strong> *tet, const char *utf16string, int len, int *size)<br />
Convert a string from UTF-16 format to UTF-8.<br />
utf16string The string to be converted. A Byte Order Mark (BOM) in the string will be<br />
interpreted. If it is missing the platform’s native byte ordering is assumed.<br />
len<br />
(C language binding only) Length of utf16string in bytes.<br />
size (C language binding only) C-style pointer to a memory location where the length<br />
of the returned string (in bytes) will be stored. If the pointer is NULL it will be ignored.<br />
Returns<br />
Bindings<br />
The converted UTF-8 string. The generated UTF-8 string will start with the UTF-8 BOM<br />
(\xEF\xBB\xBF). On EBCDIC platforms the conversion result including the BOM will finally<br />
be converted to EBCDIC. The returned string is valid until the next call to any function<br />
other than <strong>TET</strong>_utf16_to_utf8( ), <strong>TET</strong>_utf8_to_utf16( ), or <strong>TET</strong>_utf32_to_utf16( ) or until<br />
an exception is thrown. Clients must copy the string if they need it longer.<br />
This function is not available in Unicode-capable language bindings.<br />
C++<br />
Perl PHP<br />
C<br />
string utf32_to_utf16(string utf32string, string ordering)<br />
string <strong>TET</strong>_utf32_to_utf16(resource p, string utf32string, string ordering)<br />
const char *<strong>TET</strong>_utf32_to_utf16(<strong>TET</strong> *tet, const char *utf32string, int len, const char *ordering, int<br />
*size)<br />
Convert a string from UTF-32 format to UTF-16.<br />
utf32string The string to be converted, which must contain a valid UTF-32 sequence. If<br />
a Byte Order Mark (BOM) is present, it will be interpreted<br />
len<br />
(C language binding only) Length of utf32string in bytes.<br />
ordering Specifies the byte ordering of the result string:<br />
> utf16 or an empty string: the converted string will not have any BOM, and will be<br />
stored in the platform’s native byte order.<br />
> utf16le: the converted string will be formatted in little endian format, and will be prefixed<br />
with the little-endian BOM (\xFF\xFE).<br />
> utf16be: the converted string will be formatted in big endian format, and will be prefixed<br />
with the big-endian BOM (\xFE\xFF).<br />
size (C language binding only) C-style pointer to a memory location where the length<br />
of the returned string (in bytes) will be stored.<br />
124 Chapter 10: <strong>TET</strong> Library API Reference