17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


glyphmapping {{fontname=Warnock* tounicodecmap=warnock}}

Glyph list resources for simple fonts. Glyph lists (short for: glyph name lists) can be used to provide custom Unicode values for non-standard glyph names, or to override the existing values for standard glyph names. A glyph list is a text file where each line describes a Unicode mapping for a single glyph name according to the following rules:

> Text after a percent sign ’%’ is ignored; this can be used for comments.

> The first column contains the glyph name. Any glyph name used in a font can be used (i.e. even the Unicode values of standard glyph names can be overridden). To use the percent sign as part of a glyph name, the sequence \% must be used (since the percent sign serves as the comment introducer).

> At most one mapping for a particular glyph name is allowed; multiple mappings for the same glyph name are treated as an error.

> The remainder of the line contains up to 7 Unicode code points for the glyph name. The values can be supplied in decimal notation or (with the prefix x or 0x) in hexadecimal notation. UTF-32 is supported, i.e. surrogate pairs can be used.

> Unprintable characters in glyph names can be inserted by using escape sequences for text files (see Section 5.2, »Resource Configuration and File Searching«, page 51).
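The line format described by these rules can be illustrated with a small parser. The following is only a sketch of the rules as stated above, not TET's actual implementation: the names gl_entry, gl_parse_value and gl_parse_line are hypothetical, the buffer limits are arbitrary, and neither duplicate detection across lines nor the escape sequences for unprintable characters are handled. As a simplification, the sequence \% is unescaped anywhere on the line, not only inside the glyph name.

```c
#include <ctype.h>
#include <string.h>

#define GL_MAX_CODEPOINTS 7   /* "up to 7 Unicode code points" per the rules */
#define GL_MAX_NAME 128       /* arbitrary limit chosen for this sketch */

/* Hypothetical container for one parsed glyph list line. */
typedef struct {
    char name[GL_MAX_NAME];
    unsigned long cp[GL_MAX_CODEPOINTS];
    int ncp;
} gl_entry;

/* Parse one value: decimal, or hexadecimal with the prefix "x" or "0x".
 * Returns 1 on success, 0 if the token is not a valid code point. */
static int gl_parse_value(const char *tok, unsigned long *out)
{
    int base = 10;
    unsigned long v = 0;

    if (tok[0] == 'x') { tok += 1; base = 16; }
    else if (tok[0] == '0' && tok[1] == 'x') { tok += 2; base = 16; }
    if (*tok == '\0')
        return 0;

    for (; *tok; tok++) {
        unsigned char c = (unsigned char) *tok;
        int d;
        if (isdigit(c)) d = c - '0';
        else if (base == 16 && isxdigit(c)) d = tolower(c) - 'a' + 10;
        else return 0;
        v = v * base + d;
        if (v > 0x10FFFF)          /* beyond the Unicode code space */
            return 0;
    }
    *out = v;
    return 1;
}

/* Returns 1 if a mapping was parsed, 0 for blank or comment-only lines,
 * and -1 on a malformed line. Uses strtok(), so it is not reentrant. */
int gl_parse_line(const char *line, gl_entry *e)
{
    char buf[512];
    size_t n = 0;
    const char *p;
    char *tok;

    /* Copy the line, stopping at an unescaped '%' and turning "\%" into '%'. */
    for (p = line; *p && *p != '\n' && *p != '\r'; p++) {
        char c = *p;
        if (c == '\\' && p[1] == '%') { c = '%'; p++; }
        else if (c == '%') break;
        if (n + 1 >= sizeof buf)
            return -1;
        buf[n++] = c;
    }
    buf[n] = '\0';

    tok = strtok(buf, " \t");      /* first column: glyph name */
    if (tok == NULL)
        return 0;
    if (strlen(tok) >= GL_MAX_NAME)
        return -1;
    strcpy(e->name, tok);

    e->ncp = 0;                    /* remaining columns: code points */
    while ((tok = strtok(NULL, " \t")) != NULL) {
        if (e->ncp == GL_MAX_CODEPOINTS)
            return -1;
        if (!gl_parse_value(tok, &e->cp[e->ncp]))
            return -1;
        e->ncp++;
    }
    return e->ncp > 0 ? 1 : -1;
}
```

A caller would feed each line of a .gl file to gl_parse_line() and would have to track the glyph names already seen in order to reject duplicate mappings.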

By convention, glyph lists use the file name suffix .gl. Glyph lists can be configured with the glyphlist resource. If no glyph list resource has been specified explicitly, TET will search for a file named name.gl (where name is the resource name) in the searchpath hierarchy (see Section 5.2, »Resource Configuration and File Searching«, page 51, for details). In other words: if the resource name and the file name (without the .gl suffix) are identical, you don’t have to configure the resource since TET will implicitly do the equivalent of the following call (where name is an arbitrary resource name):

TET_set_option(tet, "glyphlist {name name.gl}");
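If the resource name is only known at run time, the option list shown in this call has to be assembled as a string first. The helper below is a minimal sketch of that step; build_glyphlist_option is a hypothetical name and not part of the TET API, and the sketch assumes the resource name contains no characters that would need quoting inside an option list.

```c
#include <stdio.h>

/* Writes the option list "glyphlist {name name.gl}" for the given
 * resource name into buf. Returns 0 on success, or -1 if the buffer
 * is too small or formatting failed. */
int build_glyphlist_option(char *buf, size_t size, const char *name)
{
    int n = snprintf(buf, size, "glyphlist {%s %s.gl}", name, name);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

The resulting string can then be passed as the second argument of the TET_set_option( ) call shown above.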

Due to the precedence rules for glyph mapping, glyph lists will not be consulted if the font contains a ToUnicode CMap. The following sample demonstrates the use of glyph lists:

% Unicode values for glyph names used in TeX documents
precedesequal 0x227C
similarequal 0x2243
negationslash 0x2044
union 0x222A
prime 0x2032

In order to apply a glyph list to all font names starting with CMSY, use the following option for TET_open_document( ):

glyphmapping {{fontname=CMSY* glyphlist=tarski}}<br />
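The fontname condition in this option selects fonts by name prefix. The following sketch shows one way such a pattern could be evaluated, under the assumption that a single trailing asterisk matches any remaining characters; this is an illustration of the behavior described above ("all font names starting with CMSY"), not TET's actual matching code, and fontname_matches is a hypothetical name.

```c
#include <string.h>

/* Returns 1 if fontname matches pattern, 0 otherwise.
 * Assumption for this sketch: only one trailing '*' is treated as a
 * wildcard; without it, the names must be identical. */
int fontname_matches(const char *pattern, const char *fontname)
{
    size_t plen = strlen(pattern);

    if (plen > 0 && pattern[plen - 1] == '*')
        return strncmp(pattern, fontname, plen - 1) == 0;  /* prefix match */
    return strcmp(pattern, fontname) == 0;                 /* exact match */
}
```

With this sketch, the pattern CMSY* matches font names such as CMSY10, but not CMR10.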

Rules for interpreting numerical glyph names in simple fonts. Sometimes PDF documents contain glyphs with names which are not taken from some predefined list, but are generated algorithmically. This can be a »feature« of the application generating the PDF, or may be caused by a printer driver which converts fonts to another format: sometimes the original glyph names get lost in the process, and are replaced with schematic names such as G00, G01, G02, etc. TET contains built-in glyph name rules for processing

6.8 Advanced Unicode Mapping Controls 79
