17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

PDFlib Text Extraction Toolkit (TET) Manual

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

9.6 Encrypted PDF Documents<br />

pCOS supports encrypted and unencrypted PDF documents as input. However, full object<br />

retrieval for encrypted documents requires the appropriate master password to be<br />

supplied when opening the document. Depending on the availability of user and master<br />

password, encrypted documents can be processed in one of the pCOS modes described<br />

below.<br />

See Section 5.1, »Indexing protected PDF Documents«, page 49, for information on<br />

indexing protected documents without supplying the master password.<br />

Full pCOS mode (mode 2). Encrypted PDFs can be processed without any restriction<br />

provided the master password has been supplied upon opening the file. All objects will<br />

be returned unencrypted. Unencrypted documents will always be opened in full pCOS<br />

mode.<br />

Restricted pCOS mode (mode 1). If the document has been opened without the appropriate<br />

master password and does not require a user password (or the user password has<br />

been supplied) pCOS operations are subject to the following restriction: The contents of<br />

objects with type string, stream, or fstream can not be retrieved with the following exceptions:<br />

> The objects /Root/Metadata and /Info/* (document info keys) can be retrieved if<br />

nocopy=false or plainmetadata=true.<br />

> The objects bookmarks[...]/Title and pages[...]/annots/Contents (bookmark and annotation<br />

contents) can be retrieved if nocopy=false, i.e. if text extraction is allowed for the<br />

main text on the pages.<br />

Note See Section 5.1, »Indexing protected PDF Documents«, page 49, for information on retrieving<br />

text in restricted pCOS mode under certain conditions.<br />

Minimum pCOS mode (mode 0). Regardless of the encryption status and the availability<br />

of passwords, the universal pCOS pseudo objects listed in Table 9.3 are always available.<br />

For example, the encrypt pseudo object can be used to query a document’s encryption<br />

status. Encrypted objects can not be retrieved in minimum pCOS mode.<br />

Table 9.6 lists the resulting pCOS modes for various password combinations. Depending<br />

on the document’s encryption status and the password supplied when opening<br />

the file, PDF object paths may be available in minimum, restricted, or full pCOS<br />

mode. Trying to retrieve a pCOS path which is inappropriate for the respective mode<br />

will raise an exception.<br />

Table 9.6 Resulting pCOS modes for various password combinations<br />

If you know...<br />

none of the passwords<br />

only the user password<br />

the master password<br />

...pCOS will run in...<br />

restricted pCOS mode if no user password is set, minimum pCOS mode<br />

otherwise<br />

restricted pCOS mode<br />

full pCOS mode<br />

9.6 Encrypted PDF Documents 119

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!