PDFlib TET PDF IFilter 4.0 Manual
3.4 Custom Metadata Properties
Custom metadata properties extend the predefined properties to meet the specific
requirements of an enterprise, organization, or industry. TET PDF IFilter gives you
full control over custom properties: you specify them in the configuration file,
TET PDF IFilter generates them, and the search engine indexes them.
Planning custom metadata properties. Before specifying custom properties, consider
the following aspects (see »Property identification and GUIDs«, page 40, for details
on GUIDs, identifiers, and friendly names):
> Property set: one or more properties can be grouped in a property set. Each property
set needs a unique 128-bit identifier called the GUID.
> Property identifier: a unique integer which identifies the property within its
property set. Property identifiers in a set start at the value 2. With some IFilter
clients a friendly name can be used in place of the identifier. You can override
predefined properties by assigning the corresponding GUID+ID combination.
> Friendly name: optional if an identifier is available, and required otherwise. It
can be an arbitrary name, but it must be unique within the configuration file. Some
IFilter clients accept the friendly name instead of the identifier, but friendly names
do not work with all IFilter clients.
> Property source: properties can be populated from document metadata or general
PDF information according to Section 3.1, »Sources of Metadata in PDF«, page 37.
> Data type: one of Int32 (32-bit integer), Double (double-precision floating point
number), Boolean (true/false), DateTime (a point in time), or String.
> Precedence rule: if there is more than one data source for the property, you can
specify whether the first available non-empty data source takes precedence (i.e.
subsequent sources are ignored), or whether data from all non-empty sources is
collected.
> Vector emission: whether the property is emitted as a vector, i.e. multiple values
are handed to the IFilter interface in an array structure instead of a flat value (see
Section 3.5, »Multivalued Properties«, page 44).
> Prefix: a string which is prepended to the property name if properties are indexed
as part of the full text (see Section 3.6, »Indexing Metadata Properties as Text«, page 45).
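The planning aspects above can be checked before any configuration file is written. The following Python sketch is only an illustration under stated assumptions: the `PlannedProperty` record, the `validate` and `resolve` functions, and the sample GUID are hypothetical names invented for this example and are not part of TET PDF IFilter or its configuration syntax. The sketch encodes the rules stated in the list (identifiers start at 2, a friendly name is required when no identifier is given, friendly names are unique, the data type is one of the five listed) and shows the two precedence behaviors.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Data types named in the list above.
DATA_TYPES = {"Int32", "Double", "Boolean", "DateTime", "String"}


@dataclass
class PlannedProperty:
    """Hypothetical planning record; not a TET PDF IFilter data structure."""
    guid: str                     # GUID of the property set
    identifier: Optional[int]     # integer ID within the property set, if any
    friendly_name: Optional[str]  # required only if no identifier is given
    data_type: str = "String"
    first_wins: bool = True       # precedence rule for multiple sources
    emit_as_vector: bool = False
    prefix: str = ""


def validate(props: Sequence[PlannedProperty]) -> List[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems: List[str] = []
    seen_names = set()
    seen_ids = set()
    for p in props:
        label = p.friendly_name or f"{p.guid}/{p.identifier}"
        try:
            set_guid = uuid.UUID(p.guid)  # rejects malformed GUID strings
        except ValueError:
            problems.append(f"{label}: malformed GUID")
            set_guid = None
        if p.identifier is None and not p.friendly_name:
            problems.append(f"{label}: needs an identifier or a friendly name")
        if p.identifier is not None:
            if p.identifier < 2:
                problems.append(f"{label}: identifiers start at the value 2")
            key = (set_guid, p.identifier)
            if key in seen_ids:
                problems.append(f"{label}: identifier reused within its set")
            seen_ids.add(key)
        if p.friendly_name:
            if p.friendly_name in seen_names:
                problems.append(f"{label}: friendly name is not unique")
            seen_names.add(p.friendly_name)
        if p.data_type not in DATA_TYPES:
            problems.append(f"{label}: unknown data type {p.data_type!r}")
    return problems


def resolve(values: Sequence[Optional[str]], first_wins: bool) -> List[str]:
    """Apply the precedence rule to candidate values from several sources."""
    non_empty = [v for v in values if v]
    # first_wins: keep only the first non-empty source; otherwise collect all.
    return non_empty[:1] if first_wins else non_empty


# Arbitrary sample GUID, used for illustration only.
SAMPLE_GUID = "12345678-1234-1234-1234-123456789abc"
plan = [PlannedProperty(SAMPLE_GUID, 2, "Acme.Department")]
print(validate(plan))
print(resolve(["", "Alice", "Bob"], first_wins=True))
```

Running such a check on the planned properties before translating them into PropertySet and Property elements can catch reused identifiers or duplicate friendly names early. The actual configuration syntax has to be taken from the product documentation.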
XML configuration for custom properties. One or more custom properties can be
specified in the PropertySet element, where each Property element describes a
property in the set:
Multiple PDF sources can be mapped to the same Windows property. The presence of a
Property element automatically enables processing for the specified property. However,
handling of all predefined and custom metadata properties can be completely disabled
with the metadataHandling attribute of the Filtering element: