
PDFlib TET PDF IFilter 4.0 Manual


2 Indexing PDF Contents

2.1 PDF Document Domains

PDF documents are much more than just a set of pages: they may contain text in other places, such as annotations and bookmarks, and may carry metadata in Adobe's XMP format or as classical document info entries. The places in a PDF document which may contain text are referred to as PDF document domains. The list below describes all PDF document domains, along with notes on how to display the corresponding text in Acrobat.

The list also contains the default actions of TET PDF IFilter for all document domains. In short, TET PDF IFilter indexes the text in all relevant locations. As a result, you may get search hits for documents where it is not obvious at first glance why a hit was produced. Since search term highlighting is generally not available in IFilter clients, it is important to know how to locate the search term in the result documents. Keep in mind that the search term may be present in a location other than the actual page contents, and refer to the list below if you have trouble locating the search term in a PDF document for which TET PDF IFilter reports a search hit.

Notes regarding the descriptions below:

> Searching »Multiple PDFs« with Acrobat refers to the following kind of search: Edit, Search, then select a folder of PDF documents in the Look In: pull-down.

> Some of the descriptions refer to the property set collections documentXMP, imageXMP, shell, pdf, and internal. These can be enabled in the XML configuration file. By default, the shell and internal property set collections are enabled, while the pdf, documentXMP, and imageXMP property set collections are disabled. See Section 3.3, »Predefined Metadata Properties«, page 41, for more details on property set collections.

> The notation @indexNestedPdf refers to an attribute in the XML configuration file (see Section 4.2, »XML Elements and Attributes«, page 65).

Text on the page. Page contents are the main source of text in PDF. Text on a page is rendered with fonts and encoded using one of the many encoding techniques available in PDF.

> How to display with Acrobat 8/9: page contents are always visible<br />

> How to search a single PDF with Acrobat 8/9: Edit, Find or Edit, Search. TET PDF IFilter may be able to process the text in documents where Acrobat does not correctly map glyphs to Unicode values. In this situation you can use the TET Plugin, which offers its own search dialog via Plug-Ins, PDFlib TET Plugin..., TET Find. However, the TET Plugin is not intended as a full-blown search facility.

> How to search multiple PDFs with Acrobat 8/9: Edit, Search, then under Where would you like to search? select All PDF Documents in, and browse to a folder with PDF documents.

> TET PDF IFilter: page contents are always indexed

Predefined document info entries. Classical document info entries are key/value pairs.

> How to display with Acrobat 8/9: File, Properties...

> How to search a single PDF with Acrobat 8/9: not available
