10.07.2015 Views

Download - Multivac!


EBCDIC UTF-8 BOM). If the usehypertextencoding parameter is true, the encoding specified in hypertextencoding will be applied to name strings as well. This can be used, for example, to specify font or file names in Shift-JIS.

In C the length parameter must be 0 for UTF-8 strings. If it is different from 0 the string will be interpreted as UTF-16. In all other non-Unicode-aware language bindings there is no length parameter available in the API functions, and name strings must always be supplied in UTF-8 format. In order to create Unicode name strings in this case you can use the PDF_utf16_to_utf8( ) utility function to create UTF-8 (see below).

Unicode conversion functions. In non-Unicode-aware language bindings PDFlib offers the PDF_utf16_to_utf8( ), PDF_utf8_to_utf16( ), and PDF_utf32_to_utf16( ) conversion functions which can be used to create UTF-8 or UTF-16 strings for passing them to PDFlib.

The language-specific sections in Chapter 2, »PDFlib Language Bindings«, page 25, provide more details regarding useful Unicode string conversion methods provided by common language environments.

Text format for content and hypertext strings. Unicode strings in PDFlib can be supplied in the UTF-8, UTF-16, or UTF-32 formats with any byte ordering. The choice of format can be controlled with the textformat parameter for all text on page descriptions, and the hypertextformat parameter for interactive elements. Table 4.2 lists the values which are supported for both of these parameters. The default for the [hyper]textformat parameter is auto. Use the usehypertextencoding parameter to enforce the same behavior for name strings. The default for the hypertextencoding parameter is auto.

Table 4.2 Values for the textformat and hypertextformat parameters

[hyper]textformat / explanation

bytes
    One byte in the string corresponds to one character. This is mainly useful for 8-bit encodings and symbolic fonts. A UTF-8 BOM at the start of the string will be evaluated and then removed.

utf8
    Strings are expected in UTF-8 format. Invalid UTF-8 sequences will trigger an exception if glyphcheck=error, or will be deleted otherwise.

ebcdicutf8
    Strings are expected in EBCDIC-coded UTF-8 format (only on iSeries and zSeries).

utf16
    Strings are expected in UTF-16 format. A Unicode Byte Order Mark (BOM) at the start of the string will be evaluated and then removed. If no BOM is present the string is expected in the machine’s native byte ordering (on Intel x86 architectures the native byte order is little-endian, while on Sparc and PowerPC systems it is big-endian).

utf16be
    Strings are expected in UTF-16 format in big-endian byte ordering. There is no special treatment for Byte Order Marks.

utf16le
    Strings are expected in UTF-16 format in little-endian byte ordering. There is no special treatment for Byte Order Marks.

auto
    Content strings: equivalent to bytes for 8-bit encodings and non-Unicode CMaps, and utf16 for wide-character addressing (unicode, glyphid, or a UCS2 or UTF16 CMap).
    Hypertext strings: UTF-8 and UTF-16 strings with BOM will be detected (in C UTF-16 strings must be terminated with a double-null). If the string does not start with a BOM, it will be interpreted as an 8-bit encoded string according to the hypertextencoding parameter.
    This setting will provide proper text interpretation in most environments which do not use Unicode natively.

Chapter 4: Unicode and Legacy Encodings
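The BOM-based detection described for hypertext strings under auto, and the native byte order fallback described for utf16, can be illustrated with a small C sketch. This is not PDFlib code and does not call the PDFlib API. The names str_format, classify_hypertext, and native_is_little_endian are hypothetical and exist only for this illustration. The sketch assumes the standard BOM byte sequences (EF BB BF for UTF-8, FE FF for big-endian UTF-16, FF FE for little-endian UTF-16) and treats every string without a BOM as 8-bit encoded.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of the PDFlib API. */
typedef enum {
    STR_8BIT,     /* no BOM: interpret according to hypertextencoding */
    STR_UTF8,     /* string starts with the UTF-8 BOM EF BB BF */
    STR_UTF16BE,  /* string starts with the UTF-16 BOM FE FF */
    STR_UTF16LE   /* string starts with the UTF-16 BOM FF FE */
} str_format;

/*
 * Classify a string by its leading Byte Order Mark. The length of the
 * BOM (0, 2, or 3 bytes) is stored in *bom_len so that the caller can
 * skip it, mirroring "evaluated and then removed" in Table 4.2.
 */
str_format classify_hypertext(const unsigned char *s, size_t len,
                              size_t *bom_len)
{
    *bom_len = 0;

    if (len >= 3 && s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF) {
        *bom_len = 3;
        return STR_UTF8;
    }
    if (len >= 2 && s[0] == 0xFE && s[1] == 0xFF) {
        *bom_len = 2;
        return STR_UTF16BE;
    }
    if (len >= 2 && s[0] == 0xFF && s[1] == 0xFE) {
        *bom_len = 2;
        return STR_UTF16LE;
    }
    return STR_8BIT;
}

/*
 * Report the machine's native byte order: 1 for little-endian (e.g. x86),
 * 0 for big-endian. This is the ordering a utf16 string without a BOM
 * would be expected in.
 */
int native_is_little_endian(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *) &probe == 1;
}
```

A caller would pass the raw bytes and their length, then skip bom_len bytes before interpreting the rest. Because a UTF-16 string may contain null bytes, the length has to be passed explicitly here, which parallels the role of the length parameter in the C binding.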
