
PDFlib Text Extraction Toolkit (TET) Manual

Table 10.4 Suboptions for the glyphmapping option of TET_open_document() and TET_open_document_callback()

Suboptions, in the order described: codelist, fontname, fonttypes, forceencoding, forcettsymbolencoding, globalglyphlist, glyphlist, glyphrule, ignoretounicodecmap, override, tounicodecmap

codelist
(String) Name of a codelist resource to be applied to the font. It will have higher priority than an embedded ToUnicode CMap or encoding entry.

fontname
(Name string) Partial or full name of the font(s) which will be selected for the rule. If a subset prefix has been supplied, only the specified subset will be selected. If no subset prefix has been supplied, all fonts where the name (without any subset prefix) matches will be selected. Limited wildcards are supported (see note 1). Default: *
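The selection rule for fontname can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not TET's implementation: the function names are hypothetical, a subset prefix is assumed to be the usual PDF form of six uppercase letters followed by "+", and "partial name" is interpreted narrowly as the prefix wildcard from note 1.

```python
import re

# A PDF font subset prefix: six uppercase letters followed by "+",
# e.g. "ABCDEF+Arial".
_SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def strip_subset_prefix(name: str) -> str:
    """Return the font name without a leading subset prefix, if any."""
    return _SUBSET_PREFIX.sub("", name)


def font_matches(pattern: str, fontname: str) -> bool:
    """Hypothetical matcher for the fontname suboption (illustration only)."""
    # The standalone "*" selects all fonts.
    if pattern == "*":
        return True
    # If the pattern carries no subset prefix, compare against the bare name,
    # so every subset of the font is selected.
    if not _SUBSET_PREFIX.match(pattern):
        fontname = strip_subset_prefix(fontname)
    # "prefix*" selects all fonts whose name starts with the prefix.
    if pattern.endswith("*"):
        return fontname.startswith(pattern[:-1])
    return fontname == pattern
```

With this sketch, the pattern Arial selects both ABCDEF+Arial and GHIJKL+Arial, while the pattern ABCDEF+Arial selects only the first of the two.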

fonttypes
(List of keywords) The glyphmapping will only be applied to the specified font types: * (designates all font types), Type1, MMType1, TrueType, CIDFontType2, CIDFontType0, Type3. Default: *

forceencoding
(List with one or two strings; see note 2. If there are two names, the first must be winansi, macroman, or Custom) Replace the first encoding with the encoding resource specified by the second name. If only one entry is supplied, the specified encoding will be used to replace all instances of MacRoman, WinAnsi, and MacExpert encoding. If this option matches a font, no other glyph mappings will be applied to the same font.

forcettsymbolencoding
(Keyword or string; see note 2) The name of an encoding which will be used to determine Unicode mappings for embedded pseudo TrueType symbol fonts which are actually text fonts, or one of the following keywords (default: auto):
    auto     If the font’s builtin encoding (see below) contains at least one Unicode character in the symbolic range U+F000-U+F0FF, the encoding specified in the encodinghint option will be used to map the pseudo symbol characters to real text characters. Otherwise encodinghint will not be used, and the characters will be mapped according to the builtin keyword.
    builtin  Use the font’s builtin encoding, which results from the Unicode mappings of the glyph names in the font’s post table.
The well-known TrueType fonts Wingdings* and Webdings* will always be treated as symbol fonts.
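The decision made by the auto keyword can be illustrated with a small Python sketch. It assumes the symbolic range is the Private Use Area block U+F000 to U+F0FF commonly used by TrueType symbol fonts; the function name and its arguments are hypothetical and only model the rule as worded above, not TET's internal logic.

```python
# Private Use Area block used by TrueType symbol fonts (assumed range).
SYMBOL_RANGE = range(0xF000, 0xF0FF + 1)


def resolve_auto(builtin_codepoints, encodinghint):
    """Return which mapping the 'auto' keyword would select in this sketch.

    builtin_codepoints: Unicode values produced by the font's builtin encoding.
    encodinghint: name of the encoding given in the encodinghint option.
    """
    # One code point in the symbolic range is enough to switch to encodinghint.
    if any(cp in SYMBOL_RANGE for cp in builtin_codepoints):
        return encodinghint
    # Otherwise the characters are mapped as with the builtin keyword.
    return "builtin"
```

For example, a font whose builtin encoding yields U+F041 would be mapped through the encodinghint encoding, whereas a font yielding only U+0041 and U+0042 would keep its builtin mappings.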

globalglyphlist
(Boolean) If true, the specified glyph list will be kept in memory until the end of the TET object, i.e. it can be applied to more than one document. Default: false

glyphlist
(String) Name of a glyphlist resource to be applied.

glyphrule
(Option list) Mapping rule for numerical glyph names (in addition to the predefined rules). The option list must contain the following suboptions:
    prefix    (String; may be empty) Prefix of the glyph names to which the rule will be applied.
    base      (Keyword) Specifies the interpretation of glyph names:
        ascii   Single-byte glyph names will be interpreted as the corresponding literal ASCII character (e.g. 1 will be mapped to U+0031).
        auto    Automatically determine whether glyph names represent decimal or hexadecimal values. If the result is not unique, decimal will be assumed.
        dec     The glyph names will be interpreted as a decimal representation of a code.
        hex     The glyph names will be interpreted as a hexadecimal representation of a code.
    encoding  (String) Name of an encoding resource which will be used for this rule, or the keyword none to disable the rule.
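To make the base keywords concrete, the following Python sketch converts a numerical glyph name into a code. It is a simplified model, not TET's algorithm: the function name is hypothetical, the auto case is decided per glyph name (the real decision may well consider all glyph names of a font), and the subsequent lookup of the code in the encoding resource is left out.

```python
from typing import Optional

_DEC_DIGITS = set("0123456789")
_HEX_DIGITS = set("0123456789abcdefABCDEF")


def glyph_name_to_code(name: str, prefix: str, base: str) -> Optional[int]:
    """Hypothetical helper: derive a code from a glyph name such as 'G31'."""
    # The rule only applies to glyph names starting with the prefix
    # (an empty prefix matches every name).
    if not name.startswith(prefix):
        return None
    digits = name[len(prefix):]
    if not digits:
        return None
    if base == "ascii":
        # A single-byte name is the literal ASCII character: "1" -> U+0031.
        if len(digits) == 1 and digits.isascii():
            return ord(digits)
        return None
    if base == "auto":
        # Prefer decimal whenever the name is valid in both notations.
        base = "dec" if set(digits) <= _DEC_DIGITS else "hex"
    if base == "dec":
        allowed, radix = _DEC_DIGITS, 10
    elif base == "hex":
        allowed, radix = _HEX_DIGITS, 16
    else:
        raise ValueError(f"unknown base keyword: {base}")
    if not set(digits) <= allowed:
        return None
    return int(digits, radix)
```

In this model the glyph name G31 with prefix G yields 31 for dec and auto, but 49 (0x31) for hex, which shows why an explicit base is preferable when the naming scheme of a font is known.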

ignoretounicodecmap
(Boolean) If true, a ToUnicode CMap for the font will be ignored. Default: false

override
(Boolean; only reasonable together with the glyphlist or glyphrule option) If true, the glyphmapping rule will be applied before the standard (builtin) glyph name mappings (i.e. the new mappings will have priority over the builtin ones), otherwise after. Default: true

tounicodecmap
(String) Name of a ToUnicode CMap resource to be applied to the font; it will have higher priority than an embedded ToUnicode CMap or encoding entry.

1. Limited wildcards: The standalone character »*« denotes all fonts; using »*« after a prefix (e.g. »MSTT*«) denotes all fonts starting with the specified prefix.

2. The following predefined encoding names can be used without additional configuration: winansi, macroman, macroman_apple, macroman_euro, ebcdic, ebcdic_37, iso8859-X, cpXXXX, and U+XXXX. Custom encodings can be defined as resources.
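The suboptions of Table 10.4 are combined into an option list string for the glyphmapping option. The Python sketch below assembles such a string from dictionaries. It assumes the usual PDFlib option list notation (key=value pairs separated by blanks, braces around lists and nested option lists) and that glyphmapping takes a list of option lists; the helper names and the font name ExampleFont are hypothetical, and values containing blanks are not handled.

```python
def format_option_list(options: dict) -> str:
    """Render a dict as 'key=value key=value ...' (assumed option list syntax)."""
    parts = []
    for key, value in options.items():
        if isinstance(value, dict):
            # Nested option list, e.g. the value of glyphrule.
            rendered = "{" + format_option_list(value) + "}"
        elif isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # List of keywords or strings, e.g. the value of fonttypes.
            rendered = "{" + " ".join(str(v) for v in value) + "}"
        else:
            rendered = str(value)
        parts.append(f"{key}={rendered}")
    return " ".join(parts)


def build_glyphmapping(rules: list) -> str:
    """Wrap each rule in braces and the whole list in braces once more."""
    inner = " ".join("{" + format_option_list(rule) + "}" for rule in rules)
    return "glyphmapping={" + inner + "}"


rules = [
    {"fontname": "MSTT*",
     "glyphrule": {"prefix": "G", "base": "hex", "encoding": "winansi"}},
    {"fontname": "ExampleFont", "fonttypes": ["Type3"],
     "ignoretounicodecmap": True},
]
optlist = build_glyphmapping(rules)
```

The resulting string would be supplied as part of the option list argument of TET_open_document(). Keeping the rules as dictionaries until the last moment makes it easier to generate one rule per problematic font than editing a long option string by hand.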

Chapter 10: TET Library API Reference
