
PDFlib TET PDF IFilter 4.0 Manual


2.3 PDF Versions and Protected Documents

PDF versions. TET PDF IFilter accepts all PDF versions up to PDF 1.7 extension level 3, the file format of Acrobat 9. This includes various PDF-based ISO standards, e.g. PDF/A, PDF/E, and PDF/X. ISO 32000-1 is technically equivalent to PDF 1.7 and is therefore also supported.
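A version check of this kind can be sketched as follows. This is a minimal illustration, not TET PDF IFilter's actual logic: the helper names `parse_header_version` and `is_within_supported_range` are hypothetical. The `%PDF-x.y` header is only a starting point, because a document's catalog can declare a later version through its `/Version` entry, and the Adobe extension level is stored in the catalog's `/Extensions` dictionary. The sketch therefore assumes the extension level has already been read and passes it in as a parameter.

```python
import re
from typing import Tuple

# Newest format listed as supported: PDF 1.7, Adobe extension level 3 (Acrobat 9).
MAX_VERSION = (1, 7)
MAX_EXTENSION_LEVEL = 3

_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def parse_header_version(data: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the %PDF-x.y header (hypothetical helper)."""
    # Search only the first 1024 bytes, a common reader tolerance for leading junk.
    match = _HEADER.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF- header found")
    return int(match.group(1)), int(match.group(2))


def is_within_supported_range(version: Tuple[int, int], extension_level: int = 0) -> bool:
    """Compare a version and its ADBE extension level against PDF 1.7 extension level 3."""
    if version < MAX_VERSION:
        return True
    if version == MAX_VERSION:
        return extension_level <= MAX_EXTENSION_LEVEL
    return False
```

Tuple comparison keeps the version ordering simple: (1, 4) sorts before (1, 7), and (2, 0) sorts after it.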

Protected PDF documents. TET PDF IFilter indexes text and metadata from all documents it can open. This includes the following kinds of PDF documents:

> Unencrypted documents.
> Documents which are encrypted with a master password, but do not require any user password. The status of Acrobat's security setting Content Copying Allowed/Not Allowed does not affect documents in this group.

At first glance, indexing the second category may seem to violate the author's intent in protecting the document. It does not, however: TET PDF IFilter provides no means of actually copying the text. It merely helps the search engine index the document and subsequently locate it in a search. Once the document has been found and opened in Acrobat, it remains subject to any content-copying restrictions specified for it.

Encrypted PDF documents which cannot be opened are not indexed; they are logged instead. This category includes the following cases:

> Encrypted documents which require a user password, i.e. those which cannot be opened in Acrobat without supplying the corresponding password.
> Documents which have been encrypted with a user-specific security certificate.
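The rules in the two lists above can be modeled as a small decision function. This is a sketch of the described policy, not PDFlib code: `EncryptionInfo`, `Outcome`, and `classify` are hypothetical names, and the boolean flags are assumed to have been determined already. The `copy_allowed` field is included deliberately to show that it is never consulted.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    INDEX = "index text and metadata"
    LOG = "cannot be opened; log it"


@dataclass
class EncryptionInfo:
    encrypted: bool
    needs_user_password: bool = False   # a password must be supplied to open the file
    certificate_security: bool = False  # encrypted for specific recipients' certificates
    copy_allowed: bool = True           # Acrobat's Content Copying setting; never consulted


def classify(info: EncryptionInfo) -> Outcome:
    """Model of the rules above: index anything that opens without a password."""
    if not info.encrypted:
        return Outcome.INDEX
    if info.needs_user_password or info.certificate_security:
        return Outcome.LOG
    # Master password only: the file opens without a password, so it is indexed
    # regardless of copy_allowed; Acrobat still enforces copy restrictions later.
    return Outcome.INDEX
```

For example, `classify(EncryptionInfo(encrypted=True, copy_allowed=False))` yields `Outcome.INDEX`, matching the statement that the copying setting does not affect this group.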

Damaged PDF documents. PDF documents may contain damaged data structures, either because of faulty PDF generation software or because of accidental modification (e.g. a failed network transfer). TET PDF IFilter automatically detects damaged PDF documents and attempts to repair them so that text and metadata can still be extracted. This automatic repair mode runs as part of the normal indexing process. If it is not sufficient, TET PDF IFilter processes the document in a more thorough, forced repair mode instead. Because forced repair is more time-consuming, it is applied only to severely damaged PDFs which cannot be processed successfully in automatic repair mode.
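The two-stage escalation can be sketched as "try the cheap path, fall back to the expensive one." The two opener callables and the `RepairFailed` exception are hypothetical stand-ins for whatever the filter uses internally; the sketch only illustrates the control flow the paragraph describes.

```python
from typing import Callable, TypeVar

Doc = TypeVar("Doc")


class RepairFailed(Exception):
    """Hypothetical error a repair-mode opener raises when it gives up."""


def open_with_repair(path: str,
                     open_automatic: Callable[[str], Doc],
                     open_forced: Callable[[str], Doc]) -> Doc:
    """Try the cheaper automatic repair first; escalate to forced repair only on failure."""
    try:
        return open_automatic(path)
    except RepairFailed:
        # Forced repair is slower, so it is reserved for documents that
        # automatic repair could not handle. If it also fails, the error propagates.
        return open_forced(path)
```

Passing the openers in as parameters keeps the escalation policy separate from the repair code itself, so the cost ordering is explicit in one place.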

If a document can be opened successfully but contains one or more damaged pages, these pages are ignored and processing continues with the subsequent pages. For each ignored page, an entry is written to the application event log.
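The per-page behavior amounts to a loop that catches page-level failures without aborting the document. In this sketch, `extract_page` and `DamagedPageError` are hypothetical, and Python's `logging` module stands in for the Windows application event log that the filter actually writes to.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("ifilter_sketch")


class DamagedPageError(Exception):
    """Hypothetical error raised when a single page cannot be parsed."""


def extract_pages(page_count: int,
                  extract_page: Callable[[int], str]) -> Tuple[List[str], List[int]]:
    """Extract text page by page, skipping damaged pages instead of aborting."""
    texts: List[str] = []
    skipped: List[int] = []
    for page_no in range(1, page_count + 1):
        try:
            texts.append(extract_page(page_no))
        except DamagedPageError as exc:
            # One log entry per ignored page; stands in for the application event log.
            log.warning("page %d ignored: %s", page_no, exc)
            skipped.append(page_no)
    return texts, skipped
```

Only `DamagedPageError` is caught, so failures that affect the whole document still propagate to the caller rather than being silently skipped page by page.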

