17.05.2014

PDFlib Text Extraction Toolkit (TET) Manual


Table 10.10 Members of the TET_char_info structure (C and C++), equivalent public fields (Java, PHP), keys (Perl) or properties (COM and .NET) with their type and meaning. See »Glyph metrics«, page 64, for more details.

property/field name    explanation

uv (Integer) UTF-32 Unicode value of the current character. It will be 0 if the corresponding UTF-16 value is the trailing value of a surrogate pair (i.e. if type=11).

type (Integer) Type of the character. The following types describe real characters which correspond to a glyph on the page. The values of all other properties/fields are determined by the corresponding glyph:

0 Normal character which corresponds to exactly one glyph

1 Start of a sequence (e.g. ligature)

The following types describe artificial characters which do not correspond to a glyph on the page. The x and y fields will specify the most recent real character’s endpoint, the width field will be 0, and all other fields except uv will contain the values corresponding to the most recent real character:

10 Continuation of a sequence (e.g. ligature)

11 Trailing value of a surrogate pair; the leading value has type=0, 1, or 10.

12 Inserted word, line, or zone separator

unknown (Boolean, in C and C++: integer) Usually false (0), but will be true (1) if the original glyph could not be mapped to Unicode and has been replaced with the character specified as unknownchar.

x, y (Double) Position of the glyph’s reference point. The reference point is the lower left corner of the glyph box for horizontal writing mode, and the top center point for vertical writing mode. For artificial characters the x, y coordinates will be those of the end point of the most recent real character.

width (Double) Width of the corresponding glyph (for both horizontal and vertical writing mode). For artificial characters the width will be 0.

alpha (Double) Direction of inline text progression in degrees, measured counter-clockwise. For horizontal writing mode this is the direction of the text baseline; for vertical writing mode it is the deviation from the standard -90° direction. The angle will be in the range -180° < alpha ≤ +180°. For standard horizontal text as well as for standard text in vertical writing mode the angle will be 0°.

beta (Double) Text slanting angle in degrees (counter-clockwise), relative to the perpendicular of alpha. The angle will be 0° for upright text, and negative for italicized (slanted) text. The angle will be in the range -180° < beta ≤ 180°, but different from ±90°. If abs(beta) > 90° the text is mirrored at the baseline.

fontid (Integer) Index of the font in the fonts[] pseudo object (see Table 9.5). fontid is never negative.

fontsize (Double) Size of the font (always positive). The relation of this value to the actual height of glyphs is not fixed, but may vary with the font design. For most fonts the font size is chosen such that it encompasses all ascenders (including accented characters) and descenders.

textrendering (Integer) Text rendering mode:

0 fill text

1 stroke text (outline)

2 fill and stroke text

3 invisible text (often used for OCR results)

4 fill text and add it to the clipping path

5 stroke text and add it to the clipping path

6 fill and stroke text and add it to the clipping path

7 add text to the clipping path
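The interplay of uv and type can be sketched as follows. This is a minimal illustration, not the TET binding: CharInfo is a hypothetical record holding only the two fields named in Table 10.10, and the sample data (an "fi" ligature, a separator assumed to be a space, and U+1D400, which needs a surrogate pair in UTF-16) is invented. Entries of type 11 are skipped when rebuilding text because their uv is 0 and the leading entry already carries the full UTF-32 value.

```python
from dataclasses import dataclass

# Hypothetical stand-in for one TET_char_info entry, reduced to two fields.
# Field names follow Table 10.10; this is not the TET language binding.
@dataclass
class CharInfo:
    uv: int    # UTF-32 value; 0 for the trailing half of a surrogate pair
    type: int  # character type code from Table 10.10

REAL_GLYPH_TYPES = {0, 1}  # normal character, start of a sequence
TRAILING_SURROGATE = 11    # artificial entry whose uv is always 0


def text_from_chars(chars):
    """Rebuild the text from a list of CharInfo entries.

    Type 11 entries are skipped because the leading entry of the pair already
    holds the complete UTF-32 value. Continuations (10) and inserted
    separators (12) carry their own uv and are kept.
    """
    return "".join(chr(c.uv) for c in chars if c.type != TRAILING_SURROGATE)


def glyph_count(chars):
    """Count the entries that correspond to a real glyph on the page."""
    return sum(1 for c in chars if c.type in REAL_GLYPH_TYPES)


sample = [
    CharInfo(0x66, 1),     # 'f', start of the ligature sequence
    CharInfo(0x69, 10),    # 'i', continuation of the sequence
    CharInfo(0x20, 12),    # inserted separator (assumed to be a space)
    CharInfo(0x1D400, 0),  # leading value carries the full code point
    CharInfo(0, 11),       # trailing surrogate value
]
print(text_from_chars(sample) == "fi \U0001D400")  # True
print(glyph_count(sample))                         # 2
```

Counting only types 0 and 1 gives the number of glyphs actually drawn, whereas the length of the rebuilt string counts Unicode characters.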
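The geometry fields x, y, width, alpha, and beta can be combined as in the following sketch. The helper names are hypothetical, and glyph_end_point rests on an assumption not stated in the table: that width is measured along the inline direction in both writing modes, so the end point is the reference point advanced by width along alpha. For vertical writing mode alpha is treated as a deviation from -90°, as the table describes.

```python
import math

# Illustrative helpers for the geometry fields of Table 10.10.
# None of these functions are part of the TET API.

def inline_direction(alpha, vertical=False):
    """Unit vector (dx, dy) of inline text progression; alpha in degrees.

    Horizontal writing mode: alpha is the baseline direction itself.
    Vertical writing mode: alpha is the deviation from the standard
    -90 degree direction, so alpha == 0 means top to bottom.
    """
    angle = math.radians(alpha - 90.0 if vertical else alpha)
    return math.cos(angle), math.sin(angle)


def glyph_end_point(x, y, width, alpha, vertical=False):
    """Reference point advanced by width along the inline direction.

    Assumption: width is measured along the inline direction in both
    writing modes; this is an approximation, not a TET function.
    """
    dx, dy = inline_direction(alpha, vertical)
    return x + width * dx, y + width * dy


def is_mirrored(beta):
    """Text is mirrored at the baseline if abs(beta) > 90 degrees."""
    return abs(beta) > 90.0


def looks_italic(beta):
    """Negative beta indicates slanted text; excluding mirrored text via
    the -90 cutoff is an assumption of this sketch."""
    return -90.0 < beta < 0.0


print(glyph_end_point(100.0, 700.0, 10.0, 0.0))  # (110.0, 700.0)
print(is_mirrored(120.0), looks_italic(-12.0))   # True True
```

Such an end point is what the table assigns to the x, y fields of the artificial characters (types 10, 11, and 12) that follow a real glyph.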
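Because fontid indexes the fonts[] pseudo object, a common use is to split the character sequence into runs of uniform font and size, for example to spot headings. The sketch below assumes a hypothetical StyledChar record and uses itertools.groupby; it is one possible approach, not a TET facility. Since the relation between fontsize and actual glyph height varies with the font design, comparing sizes across different fonts is only approximate.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical record with three fields named after Table 10.10;
# not the TET language binding.
@dataclass
class StyledChar:
    uv: int          # UTF-32 value (the sample below uses BMP characters only)
    fontid: int      # index into the fonts[] pseudo object, never negative
    fontsize: float  # always positive


def font_runs(chars):
    """Group consecutive characters that share the same fontid and fontsize.

    Returns a list of (fontid, fontsize, text) tuples.
    """
    runs = []
    for (fontid, size), group in groupby(chars, key=lambda c: (c.fontid, c.fontsize)):
        runs.append((fontid, size, "".join(chr(c.uv) for c in group)))
    return runs


chars = ([StyledChar(ord(ch), 0, 18.0) for ch in "Title"]
         + [StyledChar(ord(ch), 1, 10.0) for ch in "Body"])
print(font_runs(chars))  # [(0, 18.0, 'Title'), (1, 10.0, 'Body')]
```

For text outside the BMP, the type 11 entries would need to be dropped first, as with uv and type.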
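The eight textrendering modes decompose into three independent effects: fill, stroke, and clip. The sketch below encodes that decomposition with hypothetical bit flags (FILL, STROKE, CLIP are not TET constants), which makes it easy to filter out invisible text such as an OCR layer in mode 3. "Visible" here only means that the mode paints the glyph; it ignores color, clipping by other paths, and page boundaries.

```python
# Hypothetical bit flags describing what each mode does; not TET constants.
FILL, STROKE, CLIP = 1, 2, 4

# Effect of each textrendering value 0-7 as listed in Table 10.10.
MODE_EFFECTS = {
    0: FILL,
    1: STROKE,
    2: FILL | STROKE,
    3: 0,                     # invisible, e.g. an OCR text layer
    4: FILL | CLIP,
    5: STROKE | CLIP,
    6: FILL | STROKE | CLIP,
    7: CLIP,
}


def is_visible(mode):
    """True if the mode paints the glyph (fills and/or strokes it)."""
    return bool(MODE_EFFECTS[mode] & (FILL | STROKE))


def adds_to_clipping_path(mode):
    """True if the glyph outline is added to the clipping path."""
    return bool(MODE_EFFECTS[mode] & CLIP)


print([m for m in range(8) if not is_visible(m)])         # [3, 7]
print([m for m in range(8) if adds_to_clipping_path(m)])  # [4, 5, 6, 7]
```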

144 Chapter 10: TET Library API Reference
