PDFlib Text Extraction Toolkit (TET) Manual
3 <strong>TET</strong> Library Language Bindings<br />
This chapter discusses specifics for the language bindings which are supplied for the<br />
<strong>TET</strong> library. The <strong>TET</strong> distribution contains full sample code for several small <strong>TET</strong> applications<br />
in all supported language bindings.<br />
3.1 Exception Handling<br />
Errors of a certain kind are called exceptions in many languages for good reasons – they<br />
are mere exceptions, and are not expected to occur very often during the lifetime of a<br />
program. The general strategy is to use conventional error reporting mechanisms (read:<br />
special error return codes) for function calls which may fail often, and to use a<br />
special exception mechanism for those rare occasions which don’t justify cluttering the<br />
code with conditionals. This is exactly the approach that <strong>TET</strong> takes: some operations can be<br />
expected to fail rather frequently, for example:<br />
> Trying to open a PDF document for which one doesn’t have the proper password (but<br />
see also the shrug feature described in Section 5.1, »Indexing protected PDF Documents«,<br />
page 49);<br />
> Trying to open a PDF document with a wrong file name;<br />
> Trying to open a PDF document which is damaged beyond repair.<br />
<strong>TET</strong> signals such errors by returning a value of -1 as documented in the API reference.<br />
Other events may be considered harmful, but will occur rather infrequently, e.g.<br />
> running out of virtual memory;<br />
> supplying wrong function parameters (e.g. an invalid document handle);<br />
> supplying malformed option lists;<br />
> failing to find a required resource (e.g. a CMap file for CJK text extraction).<br />
When <strong>TET</strong> detects such a situation, it throws an exception instead of passing a special<br />
error return value to the caller. In languages which support native exceptions,<br />
the exception is thrown using the standard means supplied by the language<br />
or environment. For the C language binding, <strong>TET</strong> supplies a custom exception<br />
handling mechanism which must be used by clients (see Section 3.2, »C Binding«, page<br />
22).<br />
It is important to understand that processing a document must be stopped once an<br />
exception has occurred. The only methods which can safely be called after an exception are<br />
<strong>TET</strong>_delete( ), <strong>TET</strong>_get_apiname( ), <strong>TET</strong>_get_errnum( ), and <strong>TET</strong>_get_errmsg( ). Calling any<br />
other method after an exception may lead to unexpected results. The exception will<br />
contain the following information:<br />
> A unique error number;<br />
> The name of the API function which caused the exception;<br />
> A descriptive text containing details of the problem.<br />
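The following is a minimal sketch of how an exception mechanism of this kind can be built in plain C with setjmp( )/longjmp( ).<br />
It does not use the <strong>TET</strong> API and is not meant to describe how <strong>TET</strong> implements its own mechanism (see Section 3.2 for that).<br />
All names (exc_ctx, exc_throw, exc_get_page_count, exc_run) and the error number 2001 are hypothetical and chosen for illustration only.<br />

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical context: a jump target plus the three pieces of
 * information an exception carries (number, API name, message). */
typedef struct {
    jmp_buf on_error;
    int errnum;
    const char *apiname;
    const char *errmsg;
    int failed;            /* set once an exception has occurred */
} exc_ctx;

void exc_init(exc_ctx *ctx)
{
    ctx->errnum = 0;
    ctx->apiname = "";
    ctx->errmsg = "";
    ctx->failed = 0;
}

/* "Throw": record the error details, then jump back to the setjmp() site. */
void exc_throw(exc_ctx *ctx, int errnum, const char *apiname, const char *errmsg)
{
    ctx->errnum = errnum;
    ctx->apiname = apiname;
    ctx->errmsg = errmsg;
    ctx->failed = 1;
    longjmp(ctx->on_error, 1);
}

/* Stand-in API call: a rare error (invalid handle) is raised as an
 * exception instead of being reported through the return value. */
int exc_get_page_count(exc_ctx *ctx, int doc)
{
    assert(ctx != NULL);
    if (doc < 0)
        exc_throw(ctx, 2001, "exc_get_page_count", "Invalid document handle");
    return 3;   /* made-up page count */
}

/* Returns the page count, or -1 if an exception occurred. */
int exc_run(exc_ctx *ctx, int doc)
{
    if (setjmp(ctx->on_error) == 0) {
        /* Normal path: no error checks clutter the calls. */
        return exc_get_page_count(ctx, doc);
    }
    /* Exception path: stop processing, only query the error details. */
    fprintf(stderr, "Error %d in %s(): %s\n",
            ctx->errnum, ctx->apiname, ctx->errmsg);
    return -1;
}
```

In this sketch the caller stops all further processing in the exception path and only reads the stored error details, mirroring the rule stated above.<br />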
Querying the reason for a failed function call. Some <strong>TET</strong> function calls, e.g.<br />
<strong>TET</strong>_open_document( ) or <strong>TET</strong>_open_page( ), can fail without throwing an exception (they<br />
return -1 in case of an error). In this situation the functions <strong>TET</strong>_get_errnum( ), <strong>TET</strong>_get_errmsg( ),<br />
and <strong>TET</strong>_get_apiname( ) can be called immediately after the failed function call in order to<br />
retrieve details about the nature of the problem.<br />
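The return-code convention can be sketched in plain C as follows.<br />
This is an illustration under stated assumptions, not <strong>TET</strong> code: demo_ctx, demo_open_document, demo_process, and the error number 1001 are hypothetical stand-ins that only mimic the pattern of returning -1 and querying the error details right after the failed call.<br />

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a library object: it records the last error. */
typedef struct {
    int errnum;
    const char *apiname;
    char errmsg[128];
} demo_ctx;

/* Stand-in for an "open" call: returns a handle >= 0, or -1 on failure. */
int demo_open_document(demo_ctx *ctx, const char *filename)
{
    assert(ctx != NULL);
    if (filename == NULL || filename[0] == '\0') {
        ctx->errnum = 1001;                      /* made-up error number */
        ctx->apiname = "demo_open_document";
        snprintf(ctx->errmsg, sizeof ctx->errmsg,
                 "Couldn't open document '%s'", filename ? filename : "");
        return -1;
    }
    ctx->errnum = 0;
    ctx->apiname = "";
    ctx->errmsg[0] = '\0';
    return 0;
}

/* Caller: check the return value and, on -1, query the details
 * immediately. Returns 0 on success and 1 on failure. */
int demo_process(demo_ctx *ctx, const char *filename, char *report, size_t size)
{
    int doc = demo_open_document(ctx, filename);
    if (doc == -1) {
        snprintf(report, size, "Error %d in %s(): %s",
                 ctx->errnum, ctx->apiname, ctx->errmsg);
        return 1;
    }
    snprintf(report, size, "opened '%s' as handle %d", filename, doc);
    return 0;
}
```

Because the failure is reported through the return value, the caller can handle it locally, for example by skipping the file, and continue with the next document.<br />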