17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

3.3 C++ Binding<br />

Note For applications written in C++ we recommend to access the <strong>TET</strong> .NET DLL directly instead of<br />

via the C++ binding (except for cross-platform applications which should use the C++ binding).<br />

The <strong>TET</strong> distribution contains C++ sample code for use with .NET CLI which demonstrates this<br />

combination.<br />

In addition to the tetlib.h C header file, an object-oriented wrapper for C++ is supplied<br />

for <strong>TET</strong> clients. It requires the tet.hpp header file, which in turn includes tetlib.h. Since<br />

tet.hpp contains a template-based implementation no corresponding tet.cpp module is<br />

required. Using the C++ object wrapper replaces the functional approach with API functions<br />

and <strong>TET</strong>_ prefixes in all <strong>TET</strong> function names with a more object-oriented approach.<br />

String handling in C++. <strong>TET</strong>’s template-based string handling approach supports the<br />

following usage patterns with respect to string handling:<br />

> Strings of the C++ standard library type std::wstring are used as basic string type.<br />

They can hold Unicode characters encoded as UTF-16 or UTF-32. This is the default behavior<br />

since <strong>TET</strong> 4.0 and the recommended approach for new applications unless<br />

custom data types (see next item) offer a significant advantage over wstrings.<br />

> Custom (user-defined) data types for string handling can be used as long as the custom<br />

data type is an instantiation of the basic_string class template and can be converted<br />

to and from Unicode via user-supplied converter methods.<br />

> Plain C++ strings can be used for compatibility with existing C++ applications which<br />

have been developed against <strong>TET</strong> 3.0 or earlier versions. This compatibility variant is<br />

only meant for existing applications (see below for notes on source code compatibility).<br />

The new interface assumes that all strings passed to and received from <strong>TET</strong> methods are<br />

native wstrings. Depending on the size of the wchar_t data type, wstrings are assumed to<br />

contain Unicode strings encoded as UTF-16 (2-byte characters) or UTF-32 (4-byte characters).<br />

Literal strings in the source code must be prefixed with L to designate wide strings.<br />

Unicode characters in literals can be created with the \u and \U syntax. Although this<br />

syntax is part of standard ISO C++, some compilers don’t support it. In this case literal<br />

Unicode characters must be created with hex characters.<br />

Note On EBCDIC-based systems the formatting of option list strings for the wstring-based interface<br />

requires additional conversions to avoid a mixture of EBCDIC and UTF-16 wstrings in option<br />

lists. Convenience code for this conversion and instructions are available in the auxiliary module<br />

utf16num_ebcdic.hpp.<br />

Adjusting applications to the new C++ binding. Existing C++ applications which have<br />

been developed against <strong>TET</strong> 3.0 or earlier versions can be adjusted as follows:<br />

> Since the <strong>TET</strong> C++ class now lives in the pdflib namespace the class name must be<br />

qualified. In order to avoid the pdflib::<strong>TET</strong> construct client applications should add<br />

the following before using <strong>TET</strong> methods:<br />

using namespace pdflib;<br />

> Switch the application’s string handling to wstrings. This includes data from external<br />

sources. However, string literals in the source code (including option lists) must also<br />

be adjusted by prepending the L prefix, e.g.<br />

26 Chapter 3: <strong>TET</strong> Library Language Bindings

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!