17.05.2014

PDFlib Text Extraction Toolkit (TET) Manual


C++ const TET_char_info *get_char_info(int page)
C# Java int get_char_info(int page)
Perl PHP object TET_get_char_info(resource tet, long page)
VB Function get_char_info(int page) As Long
C const TET_char_info *TET_get_char_info(TET *tet, int page)

Get detailed information for the next character in the most recent text fragment.

page A valid page handle obtained with TET_open_page().

Returns

If no more characters are available for the most recent text fragment returned by TET_get_text(), a binding-specific value will be returned. See section Bindings below for more details.

Details

This function can be called after TET_get_text(). It advances to the next character of the current text fragment associated with the supplied page handle and provides detailed information for this character. If there are no more characters, it returns the binding-specific end value (for example, NULL in C and C++).

There will be N successful calls to this function, where N is the number of UTF-16 characters in the text fragment returned by the most recent call to TET_get_text().

For granularities other than glyph this function advances to the next character of the string returned by the most recent call to TET_get_text(). This makes it possible to retrieve character metrics when the wordfinder is active and a text fragment may contain more than one character. To retrieve all character details for the current text fragment, this function must be called repeatedly until it signals that no more characters are available.
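The calling pattern described above can be sketched in C. This is a minimal illustration only, not TET code: so that it compiles without the TET library, it uses hypothetical stand-ins (demo_page, demo_char_info, demo_set_fragment, demo_get_char_info) that mimic the documented C contract of "return a pointer for each character, then NULL". The two structure fields are assumptions chosen for illustration; the real members are those listed in Table 10.10.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TET_char_info. The real members are listed in
 * Table 10.10; the two fields below are illustrative only. */
typedef struct {
    int uv;     /* assumed: UTF-16 value of the character */
    double x;   /* assumed: some per-character metric */
} demo_char_info;

/* Hypothetical stand-in for the per-page state behind a page handle. */
typedef struct {
    const unsigned short *text; /* current fragment as UTF-16 values */
    size_t len;                 /* number of UTF-16 values in the fragment */
    size_t next;                /* index of the next character to report */
    demo_char_info info;        /* single info record, reused by every call */
} demo_page;

/* Plays the role of TET_get_text(): makes a fragment current and rewinds. */
void demo_set_fragment(demo_page *p, const unsigned short *text, size_t len)
{
    assert(p != NULL);
    p->text = text;
    p->len = len;
    p->next = 0;
}

/* Plays the role of TET_get_char_info() in the C binding: returns the next
 * character's details, or NULL once the fragment is exhausted. */
const demo_char_info *demo_get_char_info(demo_page *p)
{
    assert(p != NULL);
    if (p->next >= p->len)
        return NULL;
    p->info.uv = p->text[p->next];
    p->info.x = 10.0 * (double)p->next;   /* made-up metric */
    p->next++;
    return &p->info;
}

/* Calls demo_get_char_info() until it returns NULL and counts the
 * successful calls. */
size_t demo_count_chars(demo_page *p)
{
    size_t n = 0;
    while (demo_get_char_info(p) != NULL)
        n++;
    return n;
}
```

With a fragment of three UTF-16 values, demo_count_chars() makes three successful calls, and any further call to demo_get_char_info() returns NULL until a new fragment is set.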

The character details in the structure or properties/fields are valid until the next call to TET_get_char_info() or TET_close_page() with the same page handle (whichever occurs first). Since there is only a single set of character info properties/fields per TET object, clients must retrieve all character info before they call TET_get_char_info() again for the same or another page or document.
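Because the details are only valid until the next call, a client that needs them later has to copy them by value rather than keep the returned pointer. The following C sketch shows one way to do that. It is an assumption-laden illustration, not TET code: sketch_page, sketch_char_info, sketch_get_char_info and sketch_collect are hypothetical stand-ins that model a single reused info record, and the two structure fields are made up for the example (see Table 10.10 for the real members).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TET_char_info; fields are illustrative only. */
typedef struct {
    int uv;     /* assumed: UTF-16 value of the character */
    double x;   /* assumed: some per-character metric */
} sketch_char_info;

/* Hypothetical stand-in for the state behind a page handle. */
typedef struct {
    const unsigned short *text; /* current fragment as UTF-16 values */
    size_t len;                 /* number of UTF-16 values in the fragment */
    size_t next;                /* index of the next character to report */
    sketch_char_info info;      /* the single record, overwritten per call */
} sketch_page;

/* Mimics the documented C contract: pointer to the shared record for the
 * next character, or NULL when the fragment is exhausted. */
const sketch_char_info *sketch_get_char_info(sketch_page *p)
{
    assert(p != NULL);
    if (p->next >= p->len)
        return NULL;
    p->info.uv = p->text[p->next];
    p->info.x = 10.0 * (double)p->next;   /* made-up metric */
    p->next++;
    return &p->info;
}

/* Copies up to max character records into out and returns how many were
 * stored. Each record is copied by value, so it stays usable after the
 * shared record has been overwritten by later calls. */
size_t sketch_collect(sketch_page *p, sketch_char_info *out, size_t max)
{
    size_t n = 0;
    const sketch_char_info *ci;

    assert(out != NULL);
    /* n < max is tested first so no character is consumed when out is full */
    while (n < max && (ci = sketch_get_char_info(p)) != NULL)
        out[n++] = *ci;   /* struct assignment copies all fields */
    return n;
}
```

Storing the returned pointers instead of the copies would leave every entry pointing at the same record, which holds only the most recently reported character.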

Bindings

C and C++ language bindings: If no more characters are available for the most recent text fragment returned by TET_get_text(), a NULL pointer will be returned. Otherwise, a pointer to a TET_char_info structure containing information about a single character will be returned. The members of the data structure are detailed in Table 10.10.

COM, Java and .NET language bindings: -1 will be returned if no more characters are available for the most recent text fragment returned by TET_get_text(), otherwise 1. Individual character info can be retrieved from the TET properties/public fields according to Table 10.10. If the properties/fields are accessed after the function has returned -1, they will all contain a value of -1 (the unknown field will contain false).

Perl language binding: 0 will be returned if no more characters are available for the most recent text fragment returned by TET_get_text(), otherwise a hash containing the keys listed in Table 10.10. Individual character info can be retrieved with the keys in this hash.

PHP language binding: an empty (null) object will be returned if no more characters are available for the most recent text fragment returned by TET_get_text(), otherwise an object containing the fields listed in Table 10.10. Individual character info can be retrieved from the member fields of this object. Integer fields in the character info object are implemented as long in the PHP language binding.

10.6 Text and Metrics Retrieval Functions
