PDFlib Text Extraction Toolkit (TET) Manual
A <strong>TET</strong> Library Quick Reference<br />
The following tables contain an overview of all <strong>TET</strong> API functions. The prefix (C) denotes C prototypes of functions which are not available in the Java language binding.<br />
General Functions<br />
Function prototype (page)<br />
(C) <strong>TET</strong> *<strong>TET</strong>_new(void) 123<br />
void delete( ) 123<br />
(C) const char *<strong>TET</strong>_utf8_to_utf16(<strong>TET</strong> *tet, const char *utf8string, const char *ordering, int *size) 123<br />
(C) const char *<strong>TET</strong>_utf16_to_utf8(<strong>TET</strong> *tet, const char *utf16string, int len, int *size) 124<br />
(C) const char *<strong>TET</strong>_utf32_to_utf16(<strong>TET</strong> *tet, const char *utf32string, int len, const char *ordering, int *size) 124<br />
void create_pvf(String filename, byte[] data, String optlist) 125<br />
int delete_pvf(String filename) 126<br />
Exception Handling Functions<br />
Function prototype (page)<br />
String get_apiname( ) 127<br />
String get_errmsg( ) 127<br />
int get_errnum( ) 127<br />
Document Functions<br />
Function prototype (page)<br />
int open_document(String filename, String optlist) 129<br />
(C) int <strong>TET</strong>_open_document_callback(<strong>TET</strong> *tet, void *opaque, size_t filesize, size_t (*readproc)(void *opaque, void *buffer, size_t size), int (*seekproc)(void *opaque, long offset), const char *optlist) 133<br />
void close_document(int doc) 133<br />
Page Functions<br />
Function prototype (page)<br />
int open_page(int doc, int pagenumber, String optlist) 134<br />
void close_page(int page) 140<br />
<strong>Text</strong> and Metrics Retrieval Functions<br />
Function prototype (page)<br />
String get_text(int page) 142<br />
int get_char_info(int page) 143<br />
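The document, page, and text retrieval functions listed above are normally called in a fixed order: open the document, open a page, fetch text until none is left, then close the page and the document. The following Java sketch illustrates that order only. It does not use the real library: TextExtractor is a hypothetical interface that copies a few prototypes from the tables, and StubExtractor is an in-memory stand-in so the example runs on its own. The conventions that open_document and open_page return -1 on failure and that get_text returns null once a page is exhausted are assumptions of this sketch, as is the wording of the error message; check the function descriptions on the referenced pages for the actual behavior.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TetSketch {

    /** Hypothetical interface mirroring some prototypes from the tables above. */
    interface TextExtractor {
        int open_document(String filename, String optlist);
        void close_document(int doc);
        int open_page(int doc, int pagenumber, String optlist);
        void close_page(int page);
        String get_text(int page);
        int get_errnum();
        String get_apiname();
        String get_errmsg();
    }

    /** In-memory stand-in (not the real library) so the sketch is runnable. */
    static class StubExtractor implements TextExtractor {
        private final String name;
        private final String[][] pages; // pages[i] holds the text fragments of page i + 1
        private final Deque<String> pending = new ArrayDeque<>();
        private int errnum = 0;
        private String apiname = "";
        private String errmsg = "";

        StubExtractor(String name, String[][] pages) {
            this.name = name;
            this.pages = pages;
        }

        // Records error details and returns the assumed failure handle -1.
        private int fail(int num, String api, String msg) {
            errnum = num;
            apiname = api;
            errmsg = msg;
            return -1;
        }

        public int open_document(String filename, String optlist) {
            return filename.equals(name) ? 0 : fail(1, "open_document", "cannot open " + filename);
        }

        public void close_document(int doc) { }

        public int open_page(int doc, int pagenumber, String optlist) {
            if (pagenumber < 1 || pagenumber > pages.length) {
                return fail(2, "open_page", "no page " + pagenumber);
            }
            pending.clear();
            for (String s : pages[pagenumber - 1]) {
                pending.add(s);
            }
            return pagenumber;
        }

        public void close_page(int page) { pending.clear(); }

        // Assumed convention: null signals that no more text is available.
        public String get_text(int page) { return pending.poll(); }

        public int get_errnum() { return errnum; }
        public String get_apiname() { return apiname; }
        public String get_errmsg() { return errmsg; }
    }

    /** Builds a message from the three exception handling functions. */
    static String describeError(TextExtractor tet) {
        return "Error " + tet.get_errnum() + " in " + tet.get_apiname() + "(): " + tet.get_errmsg();
    }

    /** Opens the document and one page, concatenates all text, then closes both. */
    static String extractPage(TextExtractor tet, String filename, int pagenumber) {
        int doc = tet.open_document(filename, "");
        if (doc == -1) {
            throw new IllegalStateException(describeError(tet));
        }
        try {
            int page = tet.open_page(doc, pagenumber, "");
            if (page == -1) {
                throw new IllegalStateException(describeError(tet));
            }
            try {
                StringBuilder text = new StringBuilder();
                String fragment;
                while ((fragment = tet.get_text(page)) != null) {
                    text.append(fragment);
                }
                return text.toString();
            } finally {
                tet.close_page(page);
            }
        } finally {
            tet.close_document(doc);
        }
    }

    public static void main(String[] args) {
        TextExtractor tet = new StubExtractor("demo.pdf", new String[][] {{"Hello, ", "world"}});
        System.out.println(extractPage(tet, "demo.pdf", 1));
        try {
            extractPage(tet, "missing.pdf", 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The nested try/finally blocks ensure that close_page and close_document are called even when extraction fails partway, which matches the pairing of open and close functions in the tables. With the real Java binding, an object of the library's own class would take the place of StubExtractor.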