PDFlib 8 Windows COM/.NET Tutorial
5.2 Unicode Characters and Glyphs
5.2.1 Glyph IDs
A font is a collection of glyphs, each defined by its geometric outline. PDFlib assigns a number, called the glyph id or GID, to each glyph in the font. In all font formats, GID 0 (zero) refers to the .notdef glyph. The visual appearance of the .notdef glyph varies among font formats and vendors; typical implementations are the space glyph or a hollow or crossed-out rectangle. The highest GID is one less than the number of glyphs in the font; the number of glyphs can be queried with the numglyphs keyword of info_font().
The assignment of glyph IDs depends on the font format:
> Since TrueType and OpenType fonts already contain internal GIDs, PDFlib uses these GIDs.
> For CID-keyed OpenType CJK fonts, the CIDs are used as GIDs.
> For other font types, PDFlib numbers the glyphs according to the order of the corresponding outline descriptions in the font.
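The three rules above can be modeled in a few lines. The following Python sketch is a hypothetical model, not PDFlib code: FontModel, assign_gids() and the format labels are invented for illustration, and glyphs are keyed by name strings purely for readability.

```python
from dataclasses import dataclass, field


@dataclass
class FontModel:
    """Hypothetical, simplified stand-in for a parsed font (not a PDFlib type)."""
    font_format: str        # hypothetical labels: "truetype", "opentype", "opentype-cid", ...
    glyph_names: list[str]  # outline descriptions in the order they appear in the font
    internal_gids: dict[str, int] = field(default_factory=dict)  # TrueType/OpenType only
    cids: dict[str, int] = field(default_factory=dict)           # CID-keyed CJK fonts only


def assign_gids(font: FontModel) -> dict[str, int]:
    """Return a glyph-name -> GID map following the three rules listed above."""
    if font.font_format in ("truetype", "opentype"):
        # The font already contains internal GIDs: use them unchanged.
        return dict(font.internal_gids)
    if font.font_format == "opentype-cid":
        # CID-keyed OpenType CJK font: each glyph's CID serves as its GID.
        return dict(font.cids)
    # Any other format: number the outlines in the order they occur in the font.
    return {name: gid for gid, name in enumerate(font.glyph_names)}
```

For a font of some other type whose outlines appear as .notdef, A, B, the sketch yields GIDs 0, 1 and 2.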
PDFlib supports glyph selection via GID as an alternative to Unicode and other encodings (see »Glyphid encoding«, page 129). Direct GID addressing is useful only for specialized applications, e.g. printing font overview tables by querying the number of glyphs and iterating over all glyph IDs.
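A font overview table of the kind mentioned above comes down to iterating over GIDs 0 to numglyphs - 1. The following sketch covers only the layout step: overview_rows() is a hypothetical helper, and the PDFlib calls that would query numglyphs with info_font() and place each glyph via glyphid encoding are deliberately left out rather than guessed.

```python
def overview_rows(numglyphs: int, columns: int = 16) -> list[list[int]]:
    """Arrange all GIDs of a font into the rows of an overview table.

    numglyphs is the value the numglyphs keyword of info_font() would report;
    valid GIDs run from 0 (.notdef) up to numglyphs - 1.
    """
    if numglyphs < 1:
        raise ValueError("numglyphs must be at least 1 (GID 0 is .notdef)")
    if columns < 1:
        raise ValueError("columns must be positive")
    gids = list(range(numglyphs))
    return [gids[start:start + columns] for start in range(0, numglyphs, columns)]


if __name__ == "__main__":
    # Example: a hypothetical font with 20 glyphs, laid out 16 per row.
    for row in overview_rows(20):
        print(" ".join(f"{gid:5d}" for gid in row))
```

Each number in the resulting rows is a GID that a real program would then output as a single glyph.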
5.2.2 Unicode Mappings for Glyphs
Unicode mappings. PDFlib assigns a unique Unicode value to each GID. This mapping process depends on the font format and is detailed in the sections below for the supported font types. Although PDFlib assigns exactly one Unicode value to each GID, a glyph in the font may itself represent several Unicode values. Common examples in many TrueType and OpenType fonts are the empty glyph, which represents both U+0020 Space and U+00A0 No-Break Space, and a glyph which represents both U+2126 Ohm Sign and U+03A9 Greek Capital Letter Omega. If multiple Unicode values point to the same glyph in a font, PDFlib assigns the first Unicode value found in the font.
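The first-value-wins rule can be sketched as follows. This is an illustrative model, not PDFlib's implementation: the font's Unicode-to-glyph table is assumed to be given as (code point, GID) pairs in the order in which they are found in the font, and gid_to_unicode() is a hypothetical name.

```python
def gid_to_unicode(cmap_entries: list[tuple[int, int]]) -> dict[int, int]:
    """Build a GID -> Unicode map from (code point, GID) pairs in font order.

    Several code points may point to the same glyph; the first code point
    encountered for a GID is kept and later ones are ignored.
    """
    mapping: dict[int, int] = {}
    for codepoint, gid in cmap_entries:
        mapping.setdefault(gid, codepoint)  # only the first value per GID is stored
    return mapping


# Hypothetical font data: GID 3 is the empty glyph, GID 50 the Omega/Ohm glyph.
entries = [(0x0020, 3), (0x00A0, 3), (0x03A9, 50), (0x2126, 50)]
result = gid_to_unicode(entries)
print({gid: f"U+{cp:04X}" for gid, cp in result.items()})
# prints {3: 'U+0020', 50: 'U+03A9'}
```

Which code point counts as "first" depends entirely on the order of the input pairs; the sketch makes no claim about how a particular font orders its entries.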
Unmapped glyphs and the Private Use Area (PUA). In some situations the font does not provide a Unicode value for a particular glyph. In this case PDFlib assigns a value from the Unicode Private Use Area (PUA, see Section 4.1, »Important Unicode Concepts«, page 99) to the glyph. Such glyphs are called unmapped glyphs. The number of unmapped glyphs in a font can be queried with the unmappedglyphs keyword of info_font(). Unmapped glyphs are represented by the Unicode replacement character U+FFFD in the font’s ToUnicode CMap, which controls searchability and text extraction. As a consequence, unmapped glyphs cannot be properly extracted as text from the generated PDF.
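The effect on text extraction can be illustrated with a small model. tounicode_values() and extract_text() are hypothetical helpers, and the ToUnicode mapping is simplified to a GID-keyed dictionary; a real ToUnicode CMap maps character codes, which are not necessarily GIDs, to Unicode values.

```python
REPLACEMENT_CHARACTER = 0xFFFD  # U+FFFD


def tounicode_values(gid_to_uni: dict[int, int], unmapped: set[int]) -> dict[int, int]:
    """Unicode value recorded per GID in a simplified ToUnicode mapping.

    Unmapped glyphs carry a PUA value assigned by PDFlib, but the ToUnicode
    mapping records U+FFFD for them instead.
    """
    return {gid: REPLACEMENT_CHARACTER if gid in unmapped else uni
            for gid, uni in gid_to_uni.items()}


def extract_text(gids: list[int], tounicode: dict[int, int]) -> str:
    """What a text extractor would recover from a sequence of glyphs."""
    return "".join(chr(tounicode[gid]) for gid in gids)


# Hypothetical data: GID 36 is "A"; GID 900 is unmapped and was given U+E000.
tu = tounicode_values({36: 0x0041, 900: 0xE000}, unmapped={900})
text = extract_text([36, 900], tu)  # "A" followed by U+FFFD
```

The unmapped glyph survives in the extracted text only as U+FFFD, so its original meaning is lost to search and copy-and-paste.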
When PDFlib assigns PUA values to unmapped glyphs, it uses ascending values from the following pool:
> The basis is the Unicode PUA range in the Basic Multilingual Plane (BMP), i.e. the range U+E000 to U+F8FF. Additional PUA values in plane 15 (U+F0000 to U+FFFFD) are used if required.
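The pool described above can be expressed as a generator that yields the BMP PUA range first and then the plane 15 range. pua_pool() and assign_pua() are hypothetical names, and handing out values in ascending GID order is an assumption of this sketch, not documented PDFlib behavior.

```python
from collections.abc import Iterator

BMP_PUA = range(0xE000, 0xF8FF + 1)        # U+E000 .. U+F8FF (6400 values)
PLANE15_PUA = range(0xF0000, 0xFFFFD + 1)  # U+F0000 .. U+FFFFD (65534 values)


def pua_pool() -> Iterator[int]:
    """Yield PUA code points in ascending order: BMP range first, then plane 15."""
    yield from BMP_PUA
    yield from PLANE15_PUA


def assign_pua(unmapped_gids: list[int]) -> dict[int, int]:
    """Give each unmapped GID the next value from the pool.

    Processing the glyphs in ascending GID order is an assumption of this sketch.
    """
    pool = pua_pool()
    assignment: dict[int, int] = {}
    for gid in sorted(unmapped_gids):
        value = next(pool, None)
        if value is None:
            raise ValueError("PUA pool exhausted")
        assignment[gid] = value
    return assignment
```

With this pool the first 6400 unmapped glyphs receive BMP values, and the 6401st is the first to receive a plane 15 value (U+F0000).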