PDFlib TET PDF IFilter 4.0 Manual

In the Unicode code charts compatibility mappings are marked with the symbol ≈ (U+2248 ALMOST EQUAL TO), followed by the decomposition name (or »tag«) in angle brackets. If no tag name is provided, <compat> is assumed. The tag names are identical to the option names in Table 2.5. As some of the examples show, a decomposition may convert a single character to a sequence of multiple characters.

The following document option preserves wide (double-byte or zenkaku) and narrow (hankaku) characters:

    decompose={wide=_none narrow=_none}
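The decomposition tags described above can be inspected outside of TET PDF IFilter. The following minimal sketch uses Python's standard unicodedata module, which is not part of TET PDF IFilter and may be based on a different Unicode version; the helper name compatibility_tag is hypothetical. Note that the underlying character database spells out the <compat> tag explicitly, and that canonical mappings carry no tag at all.

```python
import unicodedata


def compatibility_tag(ch):
    """Return the compatibility tag of ch without angle brackets,
    or None if ch has no compatibility decomposition.
    (Hypothetical helper for illustration only.)"""
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings look like "<wide> 0041"; canonical
    # mappings look like "0041 0308"; no mapping is "".
    if mapping.startswith("<"):
        return mapping[1:mapping.index(">")]
    return None


# Fullwidth A, halfwidth katakana KA, fi ligature,
# no-break space, and A with diaeresis (canonical mapping only).
samples = ["\uFF21", "\uFF76", "\uFB01", "\u00A0", "\u00C4"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {compatibility_tag(ch)}")
```

Characters tagged wide or narrow in this listing are the ones that the decompose option shown above leaves untouched.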
2.4.3 Unicode Normalization

The Unicode standard defines four normalization forms which are based on the notions of canonical equivalence and compatibility equivalence. All normalization forms put combining marks in a specific order and apply decomposition and composition in different ways:
> Normalization Form C (NFC) applies canonical decomposition followed by canonical composition.
> Normalization Form D (NFD) applies canonical decomposition.
> Normalization Form KC (NFKC) applies compatibility decomposition followed by canonical composition.
> Normalization Form KD (NFKD) applies compatibility decomposition.

The normalization forms are specified in Unicode Standard Annex #15 »Unicode Normalization Forms« (see www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G21796 and www.unicode.org/reports/tr15/).

TET PDF IFilter supports all four Unicode normalization forms. Unicode normalization can be controlled via the normalize document option, e.g.

    normalize=nfc

TET PDF IFilter does not apply normalization by default. Because of the possible interaction between the decompose and normalize options, setting the normalize option to a value different from none disables the default decompositions.

The choice of normalization form depends on the application’s requirements. For example, some databases expect text in NFC, which is also the preferred format for Unicode text on the Web. Table 2.6 demonstrates the effect of normalization on various characters.

Table 2.6 Unicode normalization forms: examples

before normalization   NFC             NFD             NFKC            NFKD
U+00C4                 U+00C4          U+0041 U+0308   U+00C4          U+0041 U+0308
U+0041 U+0308          U+00C4          U+0041 U+0308   U+00C4          U+0041 U+0308
U+0308 U+0041          U+0308 U+0041   U+0308 U+0041   U+0308 U+0041   U+0308 U+0041
U+FB01                 U+FB01          U+FB01          U+0066 U+0069   U+0066 U+0069
U+0033 U+2075          U+0033 U+2075   U+0033 U+2075   U+0033 U+0035   U+0033 U+0035
U+212B                 U+00C5          U+0041 U+030A   U+00C5          U+0041 U+030A
U+2122                 U+2122          U+2122          U+0054 U+004D   U+0054 U+004D
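The effect of the four normalization forms on the sample characters of Table 2.6 can be reproduced with a short script. This is a sketch using Python's standard unicodedata module, not TET PDF IFilter itself; the helper names code_points and normalization_row are hypothetical, and results for other characters may depend on the Unicode version of the Python installation.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text):
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def normalization_row(text):
    """Map each normalization form to the code points it yields for text.
    (Hypothetical helper for illustration only.)"""
    return {form: code_points(unicodedata.normalize(form, text)) for form in FORMS}


# The seven input sequences from the "before normalization" column.
samples = ["\u00C4", "A\u0308", "\u0308A", "\uFB01", "3\u2075", "\u212B", "\u2122"]
for s in samples:
    row = normalization_row(s)
    print(code_points(s), "->", ", ".join(f"{f}={row[f]}" for f in FORMS))
```

The third sample illustrates that a combining mark which precedes its base character is never composed, so all four forms leave the sequence unchanged.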