PDFlib Text Extraction Toolkit (TET) Manual
tected PDF (after the search engine indexed the contents and the hit list contained a link
to the PDF), the document’s internal permission settings will protect the document as
usual when accessed by the user.
The shrug feature for protected documents. TET offers a feature that extracts text and
images from protected documents, provided the TET user accepts responsibility for
respecting the document author’s rights. This feature is called shrug and works as
follows: by supplying the shrug option to TET_open_document( ), the user asserts that he
or she will not violate any document author’s rights. PDFlib GmbH’s terms and conditions
require that TET customers respect PDF permission settings.
The shrug feature is enabled only if all of the following conditions are true:
> The shrug option has been supplied to TET_open_document( ).
> The document requires a master password, but it has not been supplied to
TET_open_document( ).
> If the document requires a user (open) password, it has been supplied to
TET_open_document( ).
> Text extraction is not allowed in the document’s permission settings, i.e.
nocopy=true.
The shrug feature has the following effects:
> Extracting content from the document is allowed despite nocopy=true. The user is
responsible for respecting the document author’s rights.
> The pCOS pseudo object shrug is set to true/1.
> pCOS runs in full mode instead of restricted mode, i.e. the pcosmode pseudo object is
set to 2.
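The enabling conditions and the resulting pseudo-object values listed above can be summarized in a small model, which may be handy for unit-testing application logic without a real protected PDF. This is only a sketch: the class ShrugModel, its method, and its parameter names are hypothetical and not part of the TET API; the actual decision is made inside TET_open_document( ).

```java
/**
 * Hypothetical model of the shrug rules described above.
 * Not part of the TET API; all names are illustrative only.
 */
final class ShrugModel {

    /** Value of the shrug pseudo object when the feature is active (true/1). */
    static final int SHRUG_ACTIVE = 1;

    /** Value of the pcosmode pseudo object when pCOS runs in full mode. */
    static final int PCOS_FULL_MODE = 2;

    private ShrugModel() {
    }

    /** Returns true only if all four enabling conditions hold. */
    static boolean shrugEnabled(boolean shrugOptionSupplied,
                                boolean requiresMasterPwd,
                                boolean masterPwdSupplied,
                                boolean requiresUserPwd,
                                boolean userPwdSupplied,
                                boolean nocopy) {
        // Condition 2: a master password is required but was not supplied.
        boolean masterMissing = requiresMasterPwd && !masterPwdSupplied;
        // Condition 3: a required user (open) password must have been supplied;
        // if no user password is required, this condition is trivially met.
        boolean userSatisfied = !requiresUserPwd || userPwdSupplied;
        // Conditions 1 and 4: the shrug option was given and nocopy=true.
        return shrugOptionSupplied && masterMissing && userSatisfied && nocopy;
    }
}
```

Supplying the master password, for example, makes shrugEnabled return false, mirroring the second condition: with full permissions there is nothing to shrug off.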
The following idiom uses the shrug pseudo object to determine whether the contents may
be made available directly to the user or should only be used for indexing and similar
indirect purposes:
int doc = tet.open_document(filename, "shrug");
if (doc == -1)
{
    /* opening failed; report tet.get_errmsg() */
}
/* ... */
if ((int) tet.pcos_get_number(doc, "shrug") == 1)
{
    /* only indexing allowed */
}
else
{
    /* content may be delivered to the user */
}
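If several parts of an application need this decision, the idiom can be kept in one place by mapping the number returned by pcos_get_number( ) to an explicit policy value. A minimal sketch follows; the enum ContentUse and its constants are hypothetical names, not TET API, and the cast to int mirrors the idiom above, since pcos_get_number( ) returns a double.

```java
/**
 * Hypothetical helper that centralizes the shrug idiom.
 * ContentUse and its constants are illustrative, not part of TET.
 */
enum ContentUse {
    /** shrug == 1: use the text only for indexing and similar indirect purposes. */
    INDEX_ONLY,
    /** shrug != 1: the content may be delivered to the user. */
    DELIVER_TO_USER;

    /** Maps the value of the shrug pseudo object to a usage policy. */
    static ContentUse fromShrug(double shrugValue) {
        // Same comparison as the idiom: cast to int, then compare with 1.
        return ((int) shrugValue == 1) ? INDEX_ONLY : DELIVER_TO_USER;
    }
}
```

In application code this might be called as ContentUse.fromShrug(tet.pcos_get_number(doc, "shrug")), after which only the enum value needs to be passed around.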
50 Chapter 5: Configuration