17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

utf8_to_utf16( ), or <strong>TET</strong>_utf32_to_utf16( ) or until an exception is thrown. Clients must<br />

copy the string if they need it longer.<br />

Bindings<br />

This function is not available in Unicode-capable language bindings. The memory used<br />

for the converted string will be managed by <strong>TET</strong>, and must not be freed by the client.<br />

C++<br />

Perl PHP<br />

C<br />

string utf16_to_utf8(string utf16string)<br />

string <strong>TET</strong>_utf16_to_utf8(resource tet, string utf16string)<br />

const char *<strong>TET</strong>_utf16_to_utf8(<strong>TET</strong> *tet, const char *utf16string, int len, int *size)<br />

Convert a string from UTF-16 format to UTF-8.<br />

utf16string The string to be converted. A Byte Order Mark (BOM) in the string will be<br />

interpreted. If it is missing the platform’s native byte ordering is assumed.<br />

len<br />

(C language binding only) Length of utf16string in bytes.<br />

size (C language binding only) C-style pointer to a memory location where the length<br />

of the returned string (in bytes) will be stored. If the pointer is NULL it will be ignored.<br />

Returns<br />

Bindings<br />

The converted UTF-8 string. The generated UTF-8 string will start with the UTF-8 BOM<br />

(\xEF\xBB\xBF). On EBCDIC platforms the conversion result including the BOM will finally<br />

be converted to EBCDIC. The returned string is valid until the next call to any function<br />

other than <strong>TET</strong>_utf16_to_utf8( ), <strong>TET</strong>_utf8_to_utf16( ), or <strong>TET</strong>_utf32_to_utf16( ) or until<br />

an exception is thrown. Clients must copy the string if they need it longer.<br />

This function is not available in Unicode-capable language bindings.<br />

C++<br />

Perl PHP<br />

C<br />

string utf32_to_utf16(string utf32string, string ordering)<br />

string <strong>TET</strong>_utf32_to_utf16(resource p, string utf32string, string ordering)<br />

const char *<strong>TET</strong>_utf32_to_utf16(<strong>TET</strong> *tet, const char *utf32string, int len, const char *ordering, int<br />

*size)<br />

Convert a string from UTF-32 format to UTF-16.<br />

utf32string The string to be converted, which must contain a valid UTF-32 sequence. If<br />

a Byte Order Mark (BOM) is present, it will be interpreted<br />

len<br />

(C language binding only) Length of utf32string in bytes.<br />

ordering Specifies the byte ordering of the result string:<br />

> utf16 or an empty string: the converted string will not have any BOM, and will be<br />

stored in the platform’s native byte order.<br />

> utf16le: the converted string will be formatted in little endian format, and will be prefixed<br />

with the little-endian BOM (\xFF\xFE).<br />

> utf16be: the converted string will be formatted in big endian format, and will be prefixed<br />

with the big-endian BOM (\xFE\xFF).<br />

size (C language binding only) C-style pointer to a memory location where the length<br />

of the returned string (in bytes) will be stored.<br />

124 Chapter 10: <strong>TET</strong> Library API Reference

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!