PDFlib TET PDF IFilter 4.0 Manual
PDFlib TET PDF IFilter 4.0 Manual
PDFlib TET PDF IFilter 4.0 Manual
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Extended pCOS paths. The pCOS (<strong><strong>PDF</strong>lib</strong> Comprehensive Object Syntax) interface provides<br />
a simple and elegant facility for retrieving arbitrary information from all sections<br />
of a <strong>PDF</strong> document which do not describe page contents, such as page dimensions,<br />
metadata, interactive elements, etc. Examples for using the pCOS interface and a description<br />
of the pCOS path syntax are contained in the pCOS Path Reference which is<br />
available as a separate document. Additional examples can be found in the pCOS Cookbook<br />
at www.pdflib.com/pcos-cookbook/.<br />
pCOS can be used in <strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong> to address information about a document. pCOS<br />
paths represent some aspect of the <strong>PDF</strong> document such as bookmarks, font name, or<br />
page size, and can also address document info entries or XMP metadata streams (but<br />
not properties within an XMP stream). While the pCOS API functions are not available in<br />
<strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong>, you can supply pCOS paths as expressions in the XML configuration file<br />
in order to index information about a document<br />
Table 3.1 lists the pCOS paths for some commonly used <strong>PDF</strong> objects. Many pCOS objects<br />
are represented by arrays which require an array index in angle brackets, e.g.<br />
pages[0] denotes the first page (pages are counted starting at 0). In addition to all pCOS<br />
paths supported by the <strong>TET</strong> product (which forms the basis of <strong>TET</strong> <strong>PDF</strong> <strong>IFilter</strong>), the <strong>IFilter</strong><br />
supports the following syntax extension for pCOS paths: You can use a » * « (asterisk)<br />
wildcard character instead of an array or dictionary index. This means that <strong>TET</strong> <strong>PDF</strong><br />
<strong>IFilter</strong> will iterate over all possible values of the array index, and include all corresponding<br />
object values in the indexing process. An arbitrary number of wildcards may be<br />
used within a single pCOS path.<br />
Table 3.1 pCOS paths for various <strong>PDF</strong> objects<br />
pCOS path type explanation<br />
length:pages number number of pages in the document<br />
/Info/Title string standard document info field Title<br />
/Info/ArticleNumber string custom document info field ArticleNumber (document info entries can<br />
use arbitrary names)<br />
/Root/Metadata stream XMP stream with the document’s metadata<br />
images[*]/Metadata stream XMP metadata streams for all images in the document<br />
fonts[*]/name string name of a font; the number of entries can be retrieved with length:fonts<br />
length:fonts string the number of fonts in a document<br />
length:images string the number of images in a document<br />
fonts[*]/embedded boolean embedding status of a font<br />
pages[*]/width number width of the visible area of the page<br />
pages[*]/annots[*]/A/URI string target URL of the Web links on all pages<br />
pages[*]/Resources/<br />
Properties[*]/Metadata<br />
stream<br />
XMP metadata for placed artwork on all pages in the document<br />
fields[*]/V string text contents (value) of all form fields on all pages in the document<br />
38 Chapter 3: Metadata Properties