17.05.2014 Views

PDFlib 8 Windows COM/.NET Tutorial


4 Unicode and Legacy Encodings

This chapter provides basic information about Unicode and other encoding schemes. Text handling in PDFlib relies heavily on the Unicode standard, but also supports various legacy and special encodings.

4.1 Important Unicode Concepts

Characters and glyphs. When dealing with text it is important to clearly distinguish the following concepts:

> Characters are the smallest units which convey information in a language. Common examples are the letters in the Latin alphabet, Chinese ideographs, and Japanese syllables. Characters have a meaning: they are semantic entities.

> Glyphs are different graphical variants which represent one or more particular characters. Glyphs have an appearance: they are representational entities.

There is no one-to-one relationship between characters and glyphs. For example, a ligature is a single glyph which represents two or more separate characters. Conversely, the same glyph may be used to represent different characters depending on the context, since some characters look identical (see Figure 4.1).
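The many-to-many relationship can be observed without any PDF library. The following is a minimal sketch in Python, using only the standard `unicodedata` module (it is not part of PDFlib and not taken from the tutorial). It relies on the fact that Unicode encodes a few presentation forms, such as the fi ligature U+FB01, as compatibility characters, so normalization can map them back to the underlying characters. The helper name `describe` is made up for this illustration.

```python
import unicodedata

# Characters from Figure 4.1, plus the compatibility character for the fi ligature.
FI_LIGATURE = "\uFB01"   # LATIN SMALL LIGATURE FI (one code point for a ligature form)
OHM_SIGN = "\u2126"      # OHM SIGN
OMEGA = "\u03A9"         # GREEK CAPITAL LETTER OMEGA
ROMAN_EIGHT = "\u2167"   # ROMAN NUMERAL EIGHT


def describe(text):
    """Return 'U+XXXX NAME' for every character in text (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# One ligature form stands for two separate characters.
print(describe(unicodedata.normalize("NFKC", FI_LIGATURE)))
# ['U+0066 LATIN SMALL LETTER F', 'U+0069 LATIN SMALL LETTER I']

# Two distinct characters which usually share one glyph:
# canonical normalization maps OHM SIGN to GREEK CAPITAL LETTER OMEGA.
print(unicodedata.normalize("NFC", OHM_SIGN) == OMEGA)  # True

# One character or a sequence of four characters, same visual result.
print(describe(unicodedata.normalize("NFKC", ROMAN_EIGHT)))
```

Normalization operates on characters only. Which glyph is finally drawn still depends on the font and on the text layout features in use.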

BMP and PUA. The following terms will occur frequently in Unicode-based environments:

> The Basic Multilingual Plane (BMP) comprises the code points in the Unicode range U+0000...U+FFFF. The Unicode standard contains many more code points in the supplementary planes, i.e. in the range U+10000...U+10FFFF.

> A Private Use Area (PUA) is one of several ranges which are reserved for private use. PUA code points cannot be used for general interchange since the Unicode standard does not specify any characters in this range. The Basic Multilingual Plane includes a PUA in the range U+E000...U+F8FF. Plane fifteen (U+F0000...U+FFFFD) and plane sixteen (U+100000...U+10FFFD) are completely reserved for private use.
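The ranges listed above translate directly into a small classification routine. The following Python sketch is an illustration only and not a PDFlib function; the names `plane`, `is_bmp`, and `is_private_use` are hypothetical. It uses the ranges exactly as stated in the text and cross-checks the result against the general category `Co` (private use) reported by Python's `unicodedata` module.

```python
import unicodedata

UNICODE_LAST = 0x10FFFF

# Private Use Areas, as listed in the text above.
PUA_RANGES = (
    (0xE000, 0xF8FF),       # PUA inside the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),     # plane fifteen
    (0x100000, 0x10FFFD),   # plane sixteen
)


def plane(cp):
    """Return the plane number (0..16) of a Unicode code point."""
    if not 0 <= cp <= UNICODE_LAST:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp >> 16  # each plane holds 0x10000 code points


def is_bmp(cp):
    """True if the code point lies in U+0000...U+FFFF."""
    return plane(cp) == 0


def is_private_use(cp):
    """True if the code point lies in one of the Private Use Areas."""
    return any(first <= cp <= last for first, last in PUA_RANGES)


for cp in (0x0067, 0xE000, 0x1F600, 0xF0000, 0x10FFFD):
    # The general category "Co" marks private-use code points.
    category = unicodedata.category(chr(cp))
    print(f"U+{cp:04X} plane={plane(cp)} bmp={is_bmp(cp)} "
          f"pua={is_private_use(cp)} category={category}")
```

The last two code points of planes fifteen and sixteen (for example U+FFFFE and U+FFFFF) are noncharacters, which is why the private-use ranges end at ...FFFD.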

Fig. 4.1 Relationship of glyphs and characters

Characters listed in the figure (the Glyphs column is not reproduced here):

> U+0067 LATIN SMALL LETTER G

> U+0066 LATIN SMALL LETTER F + U+0069 LATIN SMALL LETTER I

> U+2126 OHM SIGN or U+03A9 GREEK CAPITAL LETTER OMEGA

> U+2167 ROMAN NUMERAL EIGHT or U+0056 V U+0049 I U+0049 I U+0049 I
