PDFlib TET PDF IFilter 4.0 Manual
2.3 <strong>PDF</strong> Versions and Protected Documents<br />
<strong>PDF</strong> versions. <strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong> accepts all <strong>PDF</strong> versions up to <strong>PDF</strong> 1.7 extension level 3,<br />
the file format of Acrobat 9. This includes various <strong>PDF</strong>-based ISO standards, e.g. <strong>PDF</strong>/A,<br />
<strong>PDF</strong>/E, and <strong>PDF</strong>/X. Note that ISO 32000 is technically equivalent to <strong>PDF</strong> 1.7 and is therefore<br />
also supported.<br />
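As a rough illustration of the version limit described above, the following Python sketch reads the version number from a file's `%PDF-x.y` header and compares it with PDF 1.7. It is not part of TET PDF IFilter; the function names are hypothetical, and the sketch ignores both extension levels and a version override in the document catalog, which a real reader would also have to check.

```python
import re

# Highest header version the manual states as supported (PDF 1.7).
# Extension levels live in the document catalog, not in the header,
# and are deliberately ignored in this sketch.
MAX_SUPPORTED = (1, 7)


def header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # The header is expected near the start of the file.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported_version(data: bytes) -> bool:
    """True if a header was found and its version is at most PDF 1.7."""
    version = header_version(data)
    return version is not None and version <= MAX_SUPPORTED
```

A caller would pass the first bytes of a file, e.g. `is_supported_version(open(path, "rb").read(1024))`.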
Protected <strong>PDF</strong> documents. <strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong> indexes text and metadata from any document<br />
that it can open. This includes the following kinds of <strong>PDF</strong> documents:<br />
> Unencrypted documents;<br />
> Documents which are encrypted with a master password, but do not require any<br />
user password. The status of Acrobat’s security setting Content Copying Allowed/Not<br />
Allowed does not affect documents in this group.<br />
At first glance the second category may look like a violation of the document author’s<br />
intention for protecting the document. However, it is not, since <strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong> does not<br />
provide any means for actually copying the text; it merely helps the search engine with<br />
indexing the document and subsequently locating the document in a search. Once the<br />
document is identified in a search and opened in Acrobat, it is still subject to any restrictions<br />
regarding content copying which may have been specified for the document.<br />
Encrypted <strong>PDF</strong> documents which cannot be opened will be logged. This category includes<br />
the following cases:<br />
> Encrypted documents which require a user password, i.e. those which cannot be<br />
opened in Acrobat without supplying the corresponding password.<br />
> Documents which have been encrypted with a user-specific security certificate.<br />
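The rules above for protected documents can be summarized as a small decision function. The Python sketch below is only a model of the behavior described in this section, not the product's implementation; the `PdfSecurity` fields and the `classify` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    INDEX = "index"                # text and metadata are indexed
    LOG_AND_SKIP = "log_and_skip"  # document cannot be opened; it is logged


@dataclass
class PdfSecurity:
    """Hypothetical summary of a document's protection settings."""
    encrypted: bool = False
    needs_user_password: bool = False
    certificate_encrypted: bool = False
    copying_allowed: bool = True  # Acrobat's Content Copying setting


def classify(sec: PdfSecurity) -> Outcome:
    """Model the indexing decision described in this section."""
    if not sec.encrypted:
        return Outcome.INDEX
    if sec.needs_user_password or sec.certificate_encrypted:
        return Outcome.LOG_AND_SKIP
    # Master password only: indexable. The copying_allowed flag is
    # intentionally not consulted, as stated in the text.
    return Outcome.INDEX
```

For example, `classify(PdfSecurity(encrypted=True, copying_allowed=False))` yields `Outcome.INDEX`, since the copy restriction alone does not prevent indexing.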
Damaged <strong>PDF</strong> documents. <strong>PDF</strong> documents may contain damaged data structures, either<br />
because of faulty <strong>PDF</strong> generation software or because of some accidental modification<br />
(e.g. caused by a failed network transfer). <strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong> automatically detects damaged<br />
<strong>PDF</strong> documents and attempts to repair them so that text and metadata can still<br />
be extracted. This automatic repair mode runs as part of the indexing process. If it<br />
is not sufficient, <strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong> processes the document again with a more thorough,<br />
forced repair mode. Since forced repair is more time-consuming, it is only applied to severely<br />
damaged <strong>PDF</strong>s which cannot be processed successfully in automatic repair mode.<br />
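The two-stage repair strategy described above amounts to trying the cheaper mode first and escalating only on failure. The Python sketch below illustrates that control flow under stated assumptions: `opener` is a hypothetical callable standing in for whatever opens a document, the mode names `"auto"` and `"force"` are illustrative labels for the automatic and forced repair modes, and `DamagedPdfError` is an invented exception type.

```python
# Repair modes in order of increasing cost (illustrative labels only).
REPAIR_MODES = ("auto", "force")


class DamagedPdfError(Exception):
    """Hypothetical error raised when a document cannot be opened."""


def open_with_repair(path, opener):
    """Try each repair mode in turn; return (document, mode_used).

    `opener(path, mode)` is a caller-supplied function that either
    returns a document object or raises DamagedPdfError.
    """
    last_error = None
    for mode in REPAIR_MODES:
        try:
            return opener(path, mode), mode
        except DamagedPdfError as exc:
            # Remember the failure and escalate to the next mode.
            last_error = exc
    # Every mode failed: report the last failure to the caller.
    raise last_error
```

Ordering the modes from cheapest to most thorough keeps the common case fast, which matches the rationale given in the text for reserving forced repair for severely damaged files.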
If a document can be opened successfully, but contains one or more damaged pages,<br />
these pages are ignored and processing continues with the subsequent pages. For each<br />
ignored page an entry is written to the application event log.<br />
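The per-page behavior can be pictured as a loop that logs and skips failing pages instead of aborting the whole document. The Python sketch below is a generic illustration, not the product's code: `extract_page` is a hypothetical caller-supplied function, `DamagedPageError` is an invented exception type, and Python's standard `logging` module stands in for the application event log.

```python
import logging

log = logging.getLogger("ifilter_sketch")


class DamagedPageError(Exception):
    """Hypothetical error raised when a single page cannot be processed."""


def extract_all_pages(page_count, extract_page):
    """Extract text page by page, skipping damaged pages.

    Returns (texts, skipped), where `skipped` lists the 1-based numbers
    of the pages that were ignored.
    """
    texts = []
    skipped = []
    for pageno in range(1, page_count + 1):
        try:
            texts.append(extract_page(pageno))
        except DamagedPageError as exc:
            # One log entry per ignored page, then continue with the next.
            log.warning("skipping damaged page %d: %s", pageno, exc)
            skipped.append(pageno)
    return texts, skipped
```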