PDFlib Text Extraction Toolkit (TET) Manual


Fig. 6.3 Glyph metrics for horizontal and vertical writing mode in TET’s default coordinate system (topdown=false) [figure labels: reference point (x, y), width, fontsize, baseline, angles alpha and beta]

group of artificial characters comprises the continuation of a multi-character sequence (e.g. the second character of a ligature) and inserted separator characters. For artificial characters the position (x, y) specifies the endpoint of the most recent real character, the width is 0, and all other fields except uv are taken from the most recent real character. The endpoint is the reference point (x, y) of that real character plus its width in direction alpha (in horizontal writing mode) or plus the fontsize in direction -90˚ (in vertical writing mode).
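The endpoint rule for artificial characters amounts to a small calculation. The following Python sketch is illustrative only and does not use the TET API. The function name is hypothetical, angles are given in degrees, and TET’s default coordinate system (topdown=false) is assumed. The vertical case follows the description above literally by moving fontsize units in direction -90˚ and does not apply any additional rotation.

```python
import math


def artificial_char_position(x, y, width, fontsize, alpha_deg, vertical=False):
    """Hypothetical helper (not a TET function): return the endpoint of a
    real character, i.e. the position reported for a following artificial
    character. Assumes TET's default coordinates (topdown=false)."""
    if vertical:
        # Vertical writing mode: advance by fontsize in direction -90 degrees.
        angle = math.radians(-90.0)
        distance = fontsize
    else:
        # Horizontal writing mode: advance by width in direction alpha.
        angle = math.radians(alpha_deg)
        distance = width
    return (x + distance * math.cos(angle), y + distance * math.sin(angle))


if __name__ == "__main__":
    print(artificial_char_position(100, 700, 6, 12, 0))  # (106.0, 700.0)
    print(artificial_char_position(100, 700, 6, 12, 0, vertical=True))
```

For a real character at (100, 700) with width 6 and alpha 0, the sketch places the following artificial character at (106, 700). In vertical writing mode with fontsize 12, it moves 12 units downward instead.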

> The unknown field is usually false (in C and C++: 0), but is true (in C and C++: 1) if the original glyph could not be mapped to Unicode and was therefore replaced with the character specified in the unknownchar option. This field lets you distinguish real document content from replaced characters if you specified a common character, such as a question mark or space, as unknownchar.

> The attributes field contains information about the subscript, superscript, dropcap, or shadow status of the glyph as determined by TET’s content analysis algorithms.

> The (x, y) fields specify the position of the glyph’s reference point, which is the lower-left corner of the glyph rectangle in horizontal writing mode and the top center in vertical writing mode (see Section 6.3, »Chinese, Japanese, and Korean Text«, page 79 for details on vertical writing mode). For artificial characters, which do not correspond to any glyph on the page, the point (x, y) specifies the endpoint of the most recent real character. The value of y is subject to the topdown page option.

> The width field specifies the width of a glyph according to the corresponding font metrics and text output parameters, such as character spacing and horizontal scaling. Since these parameters control the position of the next glyph, the distance between the reference points of two adjacent glyphs may differ from width. The width may be zero for non-spacing characters. On the other hand, the outline may actually be wider than the glyph’s width value, e.g. for slanted text. The width is 0 for artificial characters.

> The angle alpha provides the direction of inline text progression, specified as the deviation from the standard direction. The standard direction is 0˚ for horizontal writing mode, and -90˚ for vertical writing mode (see below for more details on vertical

6.2 Page and Text Geometry 75
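Several of the fields described in the list above (unknown, width, and the y coordinate under the topdown option) can be exercised with a short Python sketch. It does not call TET. The Glyph class and all function names are hypothetical stand-ins for the per-glyph fields. The top-down conversion is an assumption that the page’s lower-left corner is at (0, 0) and that the page height is known.

```python
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record mirroring some per-glyph fields (not a TET type)."""
    uv: str                # Unicode character reported for the glyph
    x: float               # reference point, default coordinates (topdown=false)
    y: float
    width: float           # 0 for artificial characters
    unknown: bool = False  # True if replaced with the unknownchar character


def real_text(glyphs):
    """Concatenate only glyphs that were actually mapped to Unicode."""
    return "".join(g.uv for g in glyphs if not g.unknown)


def advance_gap(current, following):
    """Distance between the reference points of two adjacent glyphs minus
    the width of the first one; nonzero means the next glyph was not
    placed exactly `width` units further along."""
    distance = math.hypot(following.x - current.x, following.y - current.y)
    return distance - current.width


def to_topdown_y(y, page_height):
    """Flip a bottom-up y value into a top-down one.
    Assumption: the page's lower-left corner is at (0, 0)."""
    return page_height - y


if __name__ == "__main__":
    glyphs = [
        Glyph("A", 100, 700, 7),
        Glyph("?", 108, 700, 6, unknown=True),  # replaced character
        Glyph("B", 114, 700, 7),
    ]
    print(real_text(glyphs))                  # AB
    print(advance_gap(glyphs[0], glyphs[1]))  # 1.0
    print(to_topdown_y(glyphs[0].y, 842))     # 142
```

In this example the "?" is excluded from the text because its unknown flag is set. The second glyph starts one unit beyond the first glyph’s width, which illustrates that the distance between reference points need not equal width.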
