PDFlib Text Extraction Toolkit (TET) Manual
Table 10.9 Suboptions for the glyphmapping option of TET_open_document( ) and TET_open_document_callback( )
option          description
tounicodecmap   (String) Name of a ToUnicode CMap resource to be applied to the font; it will have higher priority than an embedded ToUnicode CMap or encoding entry.
1. Limited wildcards: The stand-alone character »*« denotes all fonts; using »*« after a prefix (e.g. »MSTT*«) denotes all fonts starting with the specified prefix.
2. Encoding name according to Section 10.5, »Encoding Names«, page 147
C++ int open_document_callback(void *opaque, size_t filesize,
        size_t (*readproc)(void *opaque, void *buffer, size_t size),
        int (*seekproc)(void *opaque, long offset),
        wstring optlist)
C int TET_open_document_callback(TET *tet, void *opaque, size_t filesize,
        size_t (*readproc)(void *opaque, void *buffer, size_t size),
        int (*seekproc)(void *opaque, long offset),
        const char *optlist)
Open a PDF document from a custom data source for content extraction.<br />
opaque A pointer to some user data that might be associated with the input PDF document.<br />
This pointer will be passed as the first parameter of the callback functions, and<br />
can be used in any way. TET will not use the opaque pointer in any other way.
filesize The size of the complete PDF document in bytes.
readproc A C callback function which copies size bytes to the memory pointed to by<br />
buffer. If the end of the document is reached it may copy less data than requested. The<br />
function must return the number of bytes copied.<br />
seekproc A C callback function which sets the current read position in the document.<br />
offset denotes the position from the beginning of the document (0 meaning the first<br />
byte). If successful, this function must return 0, otherwise -1.<br />
optlist An option list specifying document options according to Table 10.8.<br />
Returns See TET_open_document( ).
Details See TET_open_document( ).
Bindings This function is only available in the C and C++ language bindings.
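The readproc and seekproc contracts described above can be illustrated with a minimal sketch that serves a PDF held entirely in memory. The struct mem_source and the functions mem_read and mem_seek are hypothetical names chosen for this example and are not part of TET; only the callback signatures and return conventions are taken from the prototype above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user data (not part of TET): a PDF held in memory.
 * A pointer to this struct would be passed as the opaque parameter. */
typedef struct {
    const unsigned char *data;  /* first byte of the PDF document */
    size_t size;                /* total length; also the filesize argument */
    size_t pos;                 /* current read position */
} mem_source;

/* readproc: copy up to size bytes to buffer and return the number of
 * bytes actually copied, which may be less than size near the end. */
size_t mem_read(void *opaque, void *buffer, size_t size)
{
    mem_source *src = (mem_source *) opaque;
    size_t avail = src->size - src->pos;

    if (size > avail)
        size = avail;           /* end of document: copy what is left */
    memcpy(buffer, src->data + src->pos, size);
    src->pos += size;
    return size;
}

/* seekproc: set the read position relative to the start of the document
 * (0 is the first byte). Returns 0 on success, -1 otherwise. */
int mem_seek(void *opaque, long offset)
{
    mem_source *src = (mem_source *) opaque;

    if (offset < 0 || (unsigned long) offset > src->size)
        return -1;
    src->pos = (size_t) offset;
    return 0;
}

/* Assumed usage, following the C prototype above; tet is a TET object
 * created elsewhere, and the empty option list is an assumption:
 *
 *     mem_source src = { pdf_bytes, pdf_length, 0 };
 *     int doc = TET_open_document_callback(tet, &src, src.size,
 *                                          mem_read, mem_seek, "");
 */
```

Keeping the read position inside the opaque structure means the callbacks need no global state, so several documents can be served from different buffers at the same time.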
C++ void close_document(int doc)<br />
C# Java void close_document(int doc)<br />
Perl PHP close_document(long doc)<br />
VB RB Sub close_document(doc As Long)<br />
C void TET_close_document(TET *tet, int doc)
Release a document handle and all internal resources related to that document.
doc A valid document handle obtained with TET_open_document*( ).