17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual

10.2 General Functions

Perl PHP resource TET_new( )

C TET *TET_new(void)

Create a new TET object.

Returns

A handle to a TET object to be used in subsequent calls. If this function doesn’t succeed due to unavailable memory it will return NULL.

Bindings

This function is not available in object-oriented language bindings since it is hidden in the TET constructor.

Java void delete( )

C# void Dispose( )

Perl PHP resource TET_delete(resource tet)

C void TET_delete(TET *tet)

Delete a TET object and release all related internal resources.

Details

Deleting a TET object automatically closes all of its open documents. The TET object must no longer be used in any function after it has been deleted.

Bindings

In object-oriented language bindings this function is generally not required since it is hidden in the TET destructor. However, in Java it is available nevertheless to allow explicit cleanup in addition to automatic garbage collection. In .NET Dispose( ) should be called at the end of processing to clean up unmanaged resources.

C++ string utf8_to_utf16(string utf8string, string ordering)

Perl PHP string TET_utf8_to_utf16(resource tet, string utf8string, string ordering)

C const char *TET_utf8_to_utf16(TET *tet, const char *utf8string, const char *ordering, int *size)

Convert a string from UTF-8 format to UTF-16.

utf8string String to be converted. It must contain a valid UTF-8 sequence (on EBCDIC platforms it must be encoded in EBCDIC). If a Byte Order Mark (BOM) is present, it will be removed.

ordering Specifies the byte ordering of the result string:

> utf16 or an empty string: The converted string will not have a BOM, and will be stored in the platform’s native byte order.

> utf16le: The converted string will be formatted in little endian format, and will be prefixed with the LE BOM (\xFF\xFE).

> utf16be: The converted string will be formatted in big endian format, and will be prefixed with the BE BOM (\xFE\xFF).

size (C language binding only) Pointer to a memory location where the length of the returned string (in bytes, but excluding the terminating two null bytes) will be stored.
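The ordering rules above can be illustrated with a small standalone sketch in C. This is not TET's implementation and does not call the TET library: the function name sketch_utf8_to_utf16 and the helper put_unit are hypothetical, the EBCDIC case is ignored, and counting the BOM bytes in size is an assumption based on the BOM being part of the returned string.

```c
#include <stdlib.h>
#include <string.h>

/* Append one 16-bit code unit in the requested byte order. */
static void put_unit(unsigned char *out, size_t *o, unsigned long unit,
                     int big_endian)
{
    unsigned char hi = (unsigned char) ((unit >> 8) & 0xFF);
    unsigned char lo = (unsigned char) (unit & 0xFF);
    out[(*o)++] = big_endian ? hi : lo;
    out[(*o)++] = big_endian ? lo : hi;
}

/* Hypothetical helper (not part of TET): converts a UTF-8 string to UTF-16
 * according to the "ordering" rules described above. Returns a malloc'ed
 * buffer terminated by two null bytes, or NULL on error. *size receives the
 * length in bytes excluding the terminator (assumed to include the BOM). */
static unsigned char *sketch_utf8_to_utf16(const char *utf8,
                                           const char *ordering, int *size)
{
    const unsigned char *s = (const unsigned char *) utf8;
    size_t n = strlen(utf8), i = 0, o = 0;
    int big_endian, bom;
    unsigned char *out;

    if (strcmp(ordering, "utf16le") == 0) {
        big_endian = 0; bom = 1;
    } else if (strcmp(ordering, "utf16be") == 0) {
        big_endian = 1; bom = 1;
    } else if (ordering[0] == '\0' || strcmp(ordering, "utf16") == 0) {
        /* Native byte order, detected at run time; no BOM is written. */
        const unsigned short probe = 1;
        big_endian = (*(const unsigned char *) &probe == 0);
        bom = 0;
    } else {
        return NULL;
    }

    /* Each UTF-8 byte yields at most 2 output bytes; add BOM + terminator. */
    out = malloc(2 * n + 4);
    if (out == NULL)
        return NULL;

    if (bom)
        put_unit(out, &o, 0xFEFF, big_endian);   /* FF FE (LE) or FE FF (BE) */

    /* A UTF-8 BOM in the input is removed. */
    if (n >= 3 && s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF)
        i = 3;

    while (i < n) {
        unsigned char c = s[i++];
        unsigned long cp;
        int extra;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { free(out); return NULL; }

        while (extra-- > 0) {
            if (i >= n || (s[i] & 0xC0) != 0x80) { free(out); return NULL; }
            cp = (cp << 6) | (unsigned long) (s[i++] & 0x3F);
        }
        if (cp > 0x10FFFF) { free(out); return NULL; }

        if (cp >= 0x10000) {
            /* Code points beyond the BMP become a surrogate pair. */
            cp -= 0x10000;
            put_unit(out, &o, 0xD800 + (cp >> 10), big_endian);
            put_unit(out, &o, 0xDC00 + (cp & 0x3FF), big_endian);
        } else {
            put_unit(out, &o, cp, big_endian);
        }
    }

    out[o] = 0;                                   /* two terminating nulls */
    out[o + 1] = 0;
    *size = (int) o;
    return out;
}
```

Under these assumptions, converting "A" with ordering utf16le gives the four bytes FF FE 41 00 followed by the two terminating null bytes, and size is set to 4. Unlike the library function, this sketch returns memory that the caller must release with free( ).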

Returns

The converted UTF-16 string. In C it will be terminated by two null bytes. The returned string is valid until the next call to any function other than TET_utf16_to_utf8( ), TET_
