17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


Table 10.9 Suboptions for the glyphmapping option of TET_open_document( ) and TET_open_document_callback( )

option          description
tounicodecmap   (String) Name of a ToUnicode CMap resource to be applied to the font; it will have higher priority than an embedded ToUnicode CMap or encoding entry.

1. Limited wildcards: The stand-alone character »*« denotes all fonts; using »*« after a prefix (e.g. »MSTT*«) denotes all fonts starting with the specified prefix.

2. Encoding name according to Section 10.5, »Encoding Names«, page 147
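
The limited wildcard rule in footnote 1 can be illustrated with a few lines of C. This is only a sketch of the rule as stated, not TET's implementation: the helper name fontname_matches is hypothetical, and the case-sensitive comparison and the exact match for patterns without »*« are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if fontname matches pattern under the limited wildcard rule,
 * otherwise 0. Hypothetical helper for illustration; not part of the TET API. */
int fontname_matches(const char *pattern, const char *fontname)
{
    size_t len;

    assert(pattern != NULL && fontname != NULL);
    len = strlen(pattern);

    /* A trailing '*' turns everything before it into a prefix. A stand-alone
     * "*" has an empty prefix and therefore matches every font name. */
    if (len > 0 && pattern[len - 1] == '*')
        return strncmp(pattern, fontname, len - 1) == 0;

    /* No wildcard: assumed to mean a case-sensitive match of the full name. */
    return strcmp(pattern, fontname) == 0;
}
```

With this sketch, »*« matches any font name, while »MSTT*« matches only names that begin with MSTT.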

C++ int open_document_callback(void *opaque, size_t filesize,
        size_t (*readproc)(void *opaque, void *buffer, size_t size),
        int (*seekproc)(void *opaque, long offset),
        wstring optlist)

C int TET_open_document_callback(TET *tet, void *opaque, size_t filesize,
        size_t (*readproc)(void *opaque, void *buffer, size_t size),
        int (*seekproc)(void *opaque, long offset),
        const char *optlist)

Open a PDF document from a custom data source for content extraction.

opaque A pointer to user data that may be associated with the input PDF document. This pointer will be passed as the first parameter of the callback functions, and can be used in any way. TET will not use the opaque pointer in any other way.

filesize The size of the complete PDF document in bytes.

readproc A C callback function which copies size bytes to the memory pointed to by buffer. If the end of the document is reached it may copy fewer bytes than requested. The function must return the number of bytes copied.

seekproc A C callback function which sets the current read position in the document. offset denotes the position from the beginning of the document (0 meaning the first byte). If successful, this function must return 0, otherwise -1.

optlist An option list specifying document options according to Table 10.8.

Returns See TET_open_document( ).

Details See TET_open_document( ).

Bindings This function is only available in the C and C++ language bindings.
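
As an illustration of the callback contract described above, the following sketch serves a PDF document that is already held in memory. The struct mem_source and the function names mem_read and mem_seek are hypothetical; only the two callback signatures are taken from the prototype. Allowing a seek to the position just past the last byte is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user data handed to TET through the opaque pointer. */
typedef struct {
    const unsigned char *data;  /* complete PDF document in memory */
    size_t size;                /* document size in bytes (the filesize value) */
    size_t pos;                 /* current read position */
} mem_source;

/* readproc: copy up to size bytes to buffer and return the number copied. */
size_t mem_read(void *opaque, void *buffer, size_t size)
{
    mem_source *src = (mem_source *) opaque;
    size_t avail;

    assert(src != NULL && src->pos <= src->size);
    avail = src->size - src->pos;
    if (size > avail)
        size = avail;           /* short read at the end of the document */
    if (size > 0)
        memcpy(buffer, src->data + src->pos, size);
    src->pos += size;
    return size;
}

/* seekproc: set the absolute read position; 0 on success, -1 on failure. */
int mem_seek(void *opaque, long offset)
{
    mem_source *src = (mem_source *) opaque;

    assert(src != NULL);
    if (offset < 0 || (unsigned long) offset > src->size)
        return -1;              /* position outside the document */
    src->pos = (size_t) offset;
    return 0;
}
```

Given a mem_source variable src initialized with the document bytes, the callbacks would presumably be passed along the lines of TET_open_document_callback(tet, &src, src.size, mem_read, mem_seek, ""), following the C prototype above. Because TET hands the opaque pointer back unchanged, all state of the data source can live in the user's own struct.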

C++ void close_document(int doc)

C# Java void close_document(int doc)

Perl PHP close_document(long doc)

VB RB Sub close_document(doc As Long)

C void TET_close_document(TET *tet, int doc)

Release a document handle and all internal resources related to that document.

doc A valid document handle obtained with TET_open_document*( ).

10.7 Document Functions 167
