PDFlib Text Extraction Toolkit (TET) Manual
10.2 General Functions<br />
Perl PHP resource <strong>TET</strong>_new( )<br />
C <strong>TET</strong> *<strong>TET</strong>_new(void)<br />
Create a new <strong>TET</strong> object.<br />
Returns A handle to a <strong>TET</strong> object to be used in subsequent calls. If this function doesn’t succeed<br />
due to unavailable memory it will return NULL.<br />
Bindings This function is not available in object-oriented language bindings since it is hidden in<br />
the <strong>TET</strong> constructor.<br />
Java void delete( )<br />
C# void Dispose( )<br />
Perl PHP resource <strong>TET</strong>_delete(resource tet)<br />
C void <strong>TET</strong>_delete(<strong>TET</strong> *tet)<br />
Delete a <strong>TET</strong> object and release all related internal resources.<br />
Details Deleting a <strong>TET</strong> object automatically closes all of its open documents. The <strong>TET</strong> object<br />
must no longer be used in any function after it has been deleted.<br />
Bindings In object-oriented language bindings this function is generally not required since it is<br />
hidden in the <strong>TET</strong> destructor. However, in Java it is available nevertheless to allow explicit<br />
cleanup in addition to automatic garbage collection. In .NET Dispose( ) should be<br />
called at the end of processing to clean up unmanaged resources.<br />
C++ string utf8_to_utf16(string utf8string, string ordering)<br />
Perl PHP string <strong>TET</strong>_utf8_to_utf16(resource tet, string utf8string, string ordering)<br />
C const char *<strong>TET</strong>_utf8_to_utf16(<strong>TET</strong> *tet, const char *utf8string, const char *ordering, int *size)<br />
Convert a string from UTF-8 format to UTF-16.<br />
utf8string String to be converted. It must contain a valid UTF-8 sequence (on EBCDIC<br />
platforms it must be encoded in EBCDIC). If a Byte Order Mark (BOM) is present, it will<br />
be removed.<br />
ordering Specifies the byte ordering of the result string:<br />
> utf16 or an empty string: The converted string will not have a BOM, and will be stored<br />
in the platform’s native byte order.<br />
> utf16le: The converted string will be formatted in little endian format, and will be<br />
prefixed with the LE BOM (\xFF\xFE).<br />
> utf16be: The converted string will be formatted in big endian format, and will be prefixed<br />
with the BE BOM (\xFE\xFF).<br />
size (C language binding only) Pointer to a memory location where the length of the<br />
returned string (in bytes, but excluding the terminating two null bytes) will be stored.<br />
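The ordering and size rules above can be illustrated with a small standalone sketch. It is not the TET implementation and does not call the TET library: `sketch_utf8_to_utf16` and `put16` are hypothetical names introduced only for this example, the result is allocated with `malloc` (an assumption of the sketch, the caller frees it), and input validation is deliberately minimal. It only mirrors the documented behavior: a leading UTF-8 BOM is dropped, `utf16` or an empty string yields native byte order without a BOM, `utf16le` and `utf16be` prefix the matching BOM, and the reported size excludes the two terminating null bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store one 16-bit code unit at out[n] in the requested byte order. */
static size_t put16(unsigned char *out, size_t n, uint32_t unit, int big)
{
    out[n]     = (unsigned char) (big ? unit >> 8 : unit & 0xFF);
    out[n + 1] = (unsigned char) (big ? unit & 0xFF : unit >> 8);
    return n + 2;
}

/* Hypothetical stand-in, NOT the TET function: converts UTF-8 to UTF-16
 * following the documented "ordering" rules. Returns a malloc'ed buffer
 * terminated by two null bytes, or NULL on error. *size receives the
 * length in bytes, excluding that terminator. */
static unsigned char *sketch_utf8_to_utf16(const char *utf8,
                                           const char *ordering, int *size)
{
    const unsigned char *s = (const unsigned char *) utf8;
    unsigned char *out;
    size_t n = 0;
    int big, bom;

    if (ordering[0] == '\0' || strcmp(ordering, "utf16") == 0) {
        const uint16_t probe = 1;                 /* detect native order */
        big = (*(const unsigned char *) &probe == 0);
        bom = 0;
    } else if (strcmp(ordering, "utf16le") == 0) {
        big = 0; bom = 1;
    } else if (strcmp(ordering, "utf16be") == 0) {
        big = 1; bom = 1;
    } else {
        return NULL;                              /* unknown ordering */
    }

    /* Drop a leading UTF-8 BOM (EF BB BF) if present. */
    if (s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF)
        s += 3;

    /* Each UTF-8 byte yields at most 2 output bytes; add BOM + terminator. */
    out = malloc(2 * strlen((const char *) s) + 4);
    if (out == NULL)
        return NULL;
    if (bom)
        n = put16(out, n, 0xFEFF, big);           /* FF FE (LE) or FE FF (BE) */

    while (*s) {
        uint32_t cp;
        int extra;
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else { free(out); return NULL; }          /* invalid lead byte */
        s++;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) { free(out); return NULL; }
            cp = (cp << 6) | (uint32_t) (*s++ & 0x3F);
        }
        if (cp > 0x10FFFF) { free(out); return NULL; }
        if (cp >= 0x10000) {                      /* encode a surrogate pair */
            cp -= 0x10000;
            n = put16(out, n, 0xD800 | (cp >> 10), big);
            n = put16(out, n, 0xDC00 | (cp & 0x3FF), big);
        } else {
            n = put16(out, n, cp, big);
        }
    }
    out[n] = out[n + 1] = 0;                      /* two terminating nulls */
    *size = (int) n;
    return out;
}
```

With this sketch, converting the one-character string "A" with ordering `utf16le` produces the bytes FF FE 41 00 followed by two null bytes, and the reported size is 4. The same input with an empty ordering string produces two bytes in native order and no BOM.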
Returns<br />
The converted UTF-16 string. In C it will be terminated by two null bytes. The returned<br />
string is valid until the next call to any function other than <strong>TET</strong>_utf16_to_utf8( ), <strong>TET</strong>_<br />