PDFlib TET PDF IFilter 4.0 Manual
Table 4.1 XML elements and attributes in the configuration file

element (parent)
    description of the element and its attributes

DocOptions (parent: Tet)
    (May appear zero or one time) The value contains an option list for TET_open_document() in the TET kernel.

Filtering (parent: TetPdfIFilterConfig)
    (May appear zero or one time) Specify details of the PDF filtering process. Supported attributes:
    indexNestedPdf
        (Boolean; optional) Process PDF attachments recursively (see Section 2.1, »PDF Document Domains«, page 19). Default: true
    metadataHandling
        (Choice; optional) Select the type of metadata handling (see Section 3.6, »Indexing Metadata Properties as Text«, page 45). Default: property
        ignore
            Drop all metadata properties. This may be useful for debugging or performance optimization in situations where metadata is not required.
        property
            Treat metadata as properties.
        propertyAndPrefixedText
            In addition to treating metadata as properties, prepend the prefix specified in textIndexPrefix (if present) for custom properties and the prefixes according to Table 3.2, page 45, for predefined properties, and treat the result as plain text.
        propertyAndText
            In addition to treating metadata as properties, treat metadata as plain text.
    useIdentifier
        (Boolean; optional) Specify whether identifier or friendlyName will be used to identify properties if both of these attributes for the Property element are present. Default: true

LocaleId (parent: Filtering)
    (May appear zero or one time) Configure locale ID detection (see Section 2.2, »Automatic Language Detection«, page 24). Supported attributes:
    arabic
        (LCID; optional) LCID for Arabic text. Default: 0x0401 Arabic (SA)
    chinese
        (LCID; optional) LCID for Chinese text. Default: 0x0804 Chinese (People's Republic of China)
    cyrillic
        (LCID; optional) LCID for Cyrillic text. Default: 0x0419 Russian (RU)
    default
        (LCID; optional) Global LCID which will be used for all text chunks if detection is disabled. Default: 0x0800 (system-locale)
    detection
        (Choice; optional) Control automatic LCID detection. Default: auto
        auto
            Determine LCID based on script and statistical language analysis.
        disabled
            Disable LCID detection; all other attributes except default and useCatalogLang will be ignored.
        script
            (TET PDF IFilter 4.0) Determine LCID based on script.
    latin
        (LCID; optional) LCID for Latin text. Default: 0x0409 English (US)
    useCatalogLang
        (Boolean; optional; TET PDF IFilter 4.0) Specify whether the Lang entry in the document's catalog will be evaluated. If true, TET PDF IFilter checks the Lang entry in the PDF document catalog. If present, the Lang entry will be converted to an LCID. If the conversion is successful the LCID overrides the value of the LocaleId/@default attribute; if the LCID belongs to one of the Arabic, Chinese, Cyrillic, or Latin scripts it overrides the value of the corresponding attribute of the LocaleId element. Default: true

Metadata (parent: TetPdfIFilterConfig)
    (May appear zero or one time) Specify metadata properties (see Section 3.4, »Custom Metadata Properties«, page 42). If present, this element must appear after Filtering and Tet.
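
To illustrate how the elements in Table 4.1 nest, a minimal configuration sketch might look like the following. It is an illustration under several assumptions, not a validated configuration file: namespace declarations and any other attributes the real file requires are omitted; Tet is assumed to be a direct child of TetPdfIFilterConfig; the relative order of Filtering and Tet is a guess (the table only states that Metadata comes after both); and the literal syntax of Boolean and LCID values (true/false, hexadecimal with 0x prefix) is assumed from the defaults listed in the table.

```xml
<!-- Sketch only: namespace declarations and other required attributes are omitted -->
<TetPdfIFilterConfig>
  <!-- Filtering: skip nested PDF attachments, index metadata as properties and plain text -->
  <Filtering indexNestedPdf="false"
             metadataHandling="propertyAndText"
             useIdentifier="true">
    <!-- LocaleId: script-based detection, German (Germany) for Latin text,
         and the catalog Lang entry may override these values -->
    <LocaleId detection="script" latin="0x0407" useCatalogLang="true"/>
  </Filtering>
  <!-- Assumed placement of Tet as a child of the root element -->
  <Tet>
    <DocOptions><!-- option list for TET_open_document() goes here --></DocOptions>
  </Tet>
  <!-- Metadata, if present, must appear after Filtering and Tet -->
  <Metadata>
    <!-- Property elements, see Section 3.4 -->
  </Metadata>
</TetPdfIFilterConfig>
```

All attributes shown are optional according to the table, so omitting any of them falls back to the listed default (for example, detection="auto" and latin="0x0409").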