17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


Table 10.15 Members of the TET_char_info structure (C, C++, Ruby), equivalent public fields (Java, PHP, Objective-C), keys (Perl) or properties (COM and .NET) with their type and meaning. See »Glyph metrics«, page 74, and Figure 6.3 for more details.

property/field name
    explanation

attributes¹
    (Integer) Glyph attributes expressed as bits which can be combined:
    bit 0  Geometric or semantic subscript
    bit 1  Geometric or semantic superscript
    bit 2  Drop cap character (initial large character at the start of a paragraph)
    bit 3  Glyph- or word-based shadow duplicate of this glyph has been removed
    bit 4  Glyph represents last character before hyphenation point
    bit 5  Hyphenation artifact (i.e. the hyphen character) which was removed unless contentanalysis={keephyphenglyphs=true} was specified.
    bit 6  Glyph represents the character after hyphenation point

unknown
    (Boolean, in C, C++ and Perl: integer) Usually false (0), but will be true (1) if the original glyph could not be mapped to Unicode and has been replaced with the character specified as unknownchar.

x, y
    (Double) Position of the glyph’s reference point. The reference point is the lower left corner of the glyph box for horizontal writing mode, and the top center point for vertical writing mode. For artificial characters the x, y coordinates will be those of the end point of the most recent real character.

width
    (Double) Width of the corresponding glyph (for both horizontal and vertical writing mode). For artificial characters (i.e. inserted separators with type=12 and hyphenation artifacts with attribute bit 5 set) the width is 0.

alpha
    (Double) Direction of inline text progression in degrees measured counter-clockwise (or clockwise for topdown coordinates). For horizontal writing mode this is the direction of the text baseline; for vertical writing mode it is the digression from the standard vertical direction. The angle will be in the range -180° < alpha ≤ +180°. For standard horizontal text as well as for standard text in vertical writing mode the angle will be 0°.

beta
    (Double) Text slanting angle in degrees measured counter-clockwise (or clockwise for topdown coordinates), relative to the perpendicular of alpha. The angle will be 0° for upright text, and negative for italicized (slanted) text (positive for topdown coordinates). The angle will be in the range -180° < beta ≤ +180°, but different from ±90°. If abs(beta) > 90° the text is mirrored at the baseline.

fontid
    (Integer) Index of the font in the fonts[] pseudo object (see the pCOS Path Reference). fontid is never negative.

fontsize
    (Double) Size of the font (always positive); the relation of this value to the actual height of glyphs is not fixed, but may vary with the font design. For most fonts the font size is chosen such that it encompasses all ascenders (including accented characters) and descenders.

textrendering
    (Integer) Text rendering mode:
    0  fill text
    1  stroke text (outline)
    2  fill and stroke text
    3  invisible text (often used for OCR results)
    4  fill text and add it to the clipping path
    5  stroke text and add it to the clipping path
    6  fill and stroke text and add it to the clipping path
    7  add text to the clipping path

1. In the REALbasic binding this field is called attrs.<br />
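
The bit flags in attributes and the numeric textrendering codes listed in Table 10.15 can be decoded with a few lines of client code. The following Python sketch is illustrative only: it does not call the TET library, and the names GlyphAttr, TEXT_RENDERING, describe_attributes and is_invisible are hypothetical helpers that merely mirror the bit positions and mode numbers from the table.

```python
from enum import IntFlag


class GlyphAttr(IntFlag):
    """Hypothetical names for the attribute bits of Table 10.15."""
    SUBSCRIPT = 1 << 0         # bit 0: geometric or semantic subscript
    SUPERSCRIPT = 1 << 1       # bit 1: geometric or semantic superscript
    DROPCAP = 1 << 2           # bit 2: drop cap character
    SHADOW_REMOVED = 1 << 3    # bit 3: shadow duplicate has been removed
    PRE_HYPHENATION = 1 << 4   # bit 4: last character before hyphenation point
    HYPHEN_ARTIFACT = 1 << 5   # bit 5: hyphenation artifact (the hyphen itself)
    POST_HYPHENATION = 1 << 6  # bit 6: character after hyphenation point


# Text rendering modes, copied from the textrendering row of the table.
TEXT_RENDERING = {
    0: "fill text",
    1: "stroke text (outline)",
    2: "fill and stroke text",
    3: "invisible text",
    4: "fill text and add it to the clipping path",
    5: "stroke text and add it to the clipping path",
    6: "fill and stroke text and add it to the clipping path",
    7: "add text to the clipping path",
}


def describe_attributes(attributes):
    """Return the names of all bits set in an attributes value."""
    return [flag.name for flag in GlyphAttr if attributes & flag]


def is_invisible(textrendering):
    """True for rendering mode 3, which the table marks as invisible text."""
    return textrendering == 3
```

With these helpers, an attributes value of 48 (bits 4 and 5 set) would be reported as PRE_HYPHENATION plus HYPHEN_ARTIFACT. Because the bits can be combined, each one is tested individually instead of comparing the whole integer against a single constant.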

180 Chapter 10: TET Library API Reference
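
The range conventions for alpha and beta in Table 10.15 (greater than -180°, at most +180°, with abs(beta) > 90° indicating mirrored text) can be mirrored in client code that adds or compares angles. This is a minimal Python sketch under that assumption; normalize_angle and is_mirrored are hypothetical helper names and not part of the TET API.

```python
def normalize_angle(degrees):
    """Map any angle in degrees to the half-open range (-180, 180]."""
    wrapped = degrees % 360.0  # Python's % yields a value in [0, 360) here
    return wrapped - 360.0 if wrapped > 180.0 else wrapped


def is_mirrored(beta):
    """Per the table, text is mirrored at the baseline if abs(beta) > 90."""
    return abs(beta) > 90.0
```

Such a normalization might be useful after summing two angles, since the sum can leave the documented range even when both inputs are inside it. The lower bound is exclusive, so an input of -180° is mapped to +180°.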
