17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


Fig. 6.5 Sample font reports created with the PDFlib FontReporter plugin for Adobe Acrobat

Precedence rules. TET will apply the glyph mapping controls in the following order:

> Codelist and ToUnicode CMap resources will be consulted first.
> If the font has an internal ToUnicode CMap it will be considered next.
> For glyph names TET will apply an external or internal glyph name mapping rule if one is available which matches the font and glyph name.
> Lastly, a user-supplied glyph list will be applied.
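The precedence order above can be pictured as a fallback chain. The following Python sketch is not TET code and does not use the TET API; it only illustrates the ordering under the assumption that the first source which yields a mapping wins. All table names, font names, and mappings in it are hypothetical.

```python
from typing import Callable, Optional

# A lookup receives (font name, glyph code, glyph name or None) and returns
# the Unicode text for the glyph, or None if this source has no mapping.
Lookup = Callable[[str, int, Optional[str]], Optional[str]]

# Hypothetical in-memory stand-ins for the four mapping sources.
external_resources = {("LogoFont", 0x2A): "http://"}   # code list / ToUnicode CMap resources
internal_tounicode = {("LogoFont", 0x2A): "?",         # ToUnicode CMap embedded in the font
                      ("LogoFont", 0x2B): "www."}
glyph_name_rules = {("LogoFont", "g17"): "\u2022"}     # glyph name mapping rules
user_glyph_list = {"euro": "\u20ac"}                    # user-supplied glyph list

# The stages are listed in the precedence order described in the text.
STAGES: list[tuple[str, Lookup]] = [
    ("codelist or ToUnicode CMap resource",
     lambda font, code, name: external_resources.get((font, code))),
    ("internal ToUnicode CMap",
     lambda font, code, name: internal_tounicode.get((font, code))),
    ("glyph name mapping rule",
     lambda font, code, name: glyph_name_rules.get((font, name)) if name else None),
    ("user-supplied glyph list",
     lambda font, code, name: user_glyph_list.get(name) if name else None),
]


def resolve(font: str, code: int, glyph_name: Optional[str],
            stages: list[tuple[str, Lookup]] = STAGES) -> Optional[tuple[str, str]]:
    """Return (stage label, Unicode text) from the first stage that has a mapping."""
    for label, lookup in stages:
        text = lookup(font, code, glyph_name)
        if text is not None:
            return label, text
    return None  # no source could map the glyph
```

In this sketch, code 0x2A of the hypothetical font has entries in both the external resource and the internal ToUnicode CMap, and `resolve("LogoFont", 0x2A, None)` returns the external one because that stage comes first.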

Code list resources for all font types. Code lists are similar to glyph lists except that they specify Unicode values for individual codes instead of glyph names. Although multiple fonts from the same foundry may use identical code assignments, codes (also called glyph ids) are generally font-specific. As a consequence, separate code lists will be required for individual fonts. A code list is a text file where each line describes a Unicode mapping for a single code according to the following rules:

> Text after a percent sign ’%’ will be ignored; this can be used for comments.
> The first column contains the glyph code in decimal or hexadecimal notation. This must be a value in the range 0-255 for simple fonts, and in the range 0-65535 for CID fonts.
> The remainder of the line contains up to 7 Unicode code points for the code. The values can be supplied in decimal notation or (with the prefix x or 0x) in hexadecimal notation. UTF-32 is supported, i.e. surrogate pairs can be used.
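The rules above are simple enough to check with a few lines of code. The following Python sketch is not part of TET; it is a hypothetical validator for code list text. It assumes that columns are separated by whitespace, that the glyph code uses the same x/0x prefix for hexadecimal notation as the Unicode values, and that each line supplies at least one Unicode value. The sample font and mappings are made up for illustration.

```python
def parse_number(token: str) -> int:
    """Parse a decimal value, or a hexadecimal value prefixed with x or 0x."""
    lowered = token.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if lowered.startswith("x"):
        return int(lowered[1:], 16)
    return int(lowered, 10)


def parse_codelist(text: str, cid_font: bool = False) -> dict[int, list[int]]:
    """Map each glyph code to its list of Unicode values.

    Raises ValueError for codes outside the allowed range or for lines
    with no Unicode value or more than 7 of them.
    """
    max_code = 65535 if cid_font else 255
    mapping: dict[int, list[int]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Everything after a percent sign is a comment.
        line = raw.split("%", 1)[0].strip()
        if not line:
            continue
        tokens = line.split()
        code = parse_number(tokens[0])
        values = [parse_number(t) for t in tokens[1:]]
        if not 0 <= code <= max_code:
            raise ValueError(f"line {lineno}: code {code} outside 0-{max_code}")
        if not 1 <= len(values) <= 7:  # lower bound of 1 is an assumption
            raise ValueError(f"line {lineno}: expected 1 to 7 Unicode values")
        mapping[code] = values
    return mapping


# Hypothetical code list for a made-up logo font.
SAMPLE = """\
% Unicode mappings for codes in a hypothetical logo font
x41 0x0054 0x0045 0x0054   % code 65 maps to the three characters "TET"
66  x20AC                  % code 66 maps to the euro sign
"""

mapping = parse_codelist(SAMPLE)
```

Here `mapping[0x41]` holds the three values for "TET" and `mapping[66]` holds the single value U+20AC. Passing `cid_font=True` widens the accepted code range to 0-65535.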

6.8 Advanced Unicode Mapping Controls
