PDFlib Text Extraction Toolkit (TET) Manual
PDFlib Text Extraction Toolkit (TET) Manual
PDFlib Text Extraction Toolkit (TET) Manual
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
9.6 Encrypted PDF Documents<br />
pCOS supports encrypted and unencrypted PDF documents as input. However, full object<br />
retrieval for encrypted documents requires the appropriate master password to be<br />
supplied when opening the document. Depending on the availability of user and master<br />
password, encrypted documents can be processed in one of the pCOS modes described<br />
below.<br />
See Section 5.1, »Indexing protected PDF Documents«, page 49, for information on<br />
indexing protected documents without supplying the master password.<br />
Full pCOS mode (mode 2). Encrypted PDFs can be processed without any restriction<br />
provided the master password has been supplied upon opening the file. All objects will<br />
be returned unencrypted. Unencrypted documents will always be opened in full pCOS<br />
mode.<br />
Restricted pCOS mode (mode 1). If the document has been opened without the appropriate<br />
master password and does not require a user password (or the user password has<br />
been supplied) pCOS operations are subject to the following restriction: The contents of<br />
objects with type string, stream, or fstream can not be retrieved with the following exceptions:<br />
> The objects /Root/Metadata and /Info/* (document info keys) can be retrieved if<br />
nocopy=false or plainmetadata=true.<br />
> The objects bookmarks[...]/Title and pages[...]/annots/Contents (bookmark and annotation<br />
contents) can be retrieved if nocopy=false, i.e. if text extraction is allowed for the<br />
main text on the pages.<br />
Note See Section 5.1, »Indexing protected PDF Documents«, page 49, for information on retrieving<br />
text in restricted pCOS mode under certain conditions.<br />
Minimum pCOS mode (mode 0). Regardless of the encryption status and the availability<br />
of passwords, the universal pCOS pseudo objects listed in Table 9.3 are always available.<br />
For example, the encrypt pseudo object can be used to query a document’s encryption<br />
status. Encrypted objects can not be retrieved in minimum pCOS mode.<br />
Table 9.6 lists the resulting pCOS modes for various password combinations. Depending<br />
on the document’s encryption status and the password supplied when opening<br />
the file, PDF object paths may be available in minimum, restricted, or full pCOS<br />
mode. Trying to retrieve a pCOS path which is inappropriate for the respective mode<br />
will raise an exception.<br />
Table 9.6 Resulting pCOS modes for various password combinations<br />
If you know...<br />
none of the passwords<br />
only the user password<br />
the master password<br />
...pCOS will run in...<br />
restricted pCOS mode if no user password is set, minimum pCOS mode<br />
otherwise<br />
restricted pCOS mode<br />
full pCOS mode<br />
9.6 Encrypted PDF Documents 119