PDFlib TET PDF IFilter 4.0 Manual

More documents

Recommendations

Info

Extended pCOS paths. The pCOS (PDFlib Comprehensive Object Syntax) interface provides a simple and elegant facility for retrieving arbitrary information from all sections of a PDF document which do not describe page contents, such as page dimensions, metadata, interactive elements, etc. Examples for using the pCOS interface and a description of the pCOS path syntax are contained in the pCOS Path Reference which is available as a separate document. Additional examples can be found in the pCOS Cookbook at www.pdflib.com/pcos-cookbook/. pCOS can be used in TET PDF IFilter to address information about a document. pCOS paths represent some aspect of the PDF document such as bookmarks, font name, or page size, and can also address document info entries or XMP metadata streams (but not properties within an XMP stream). While the pCOS API functions are not available in TET PDF IFilter, you can supply pCOS paths as expressions in the XML configuration file in order to index information about a document Table 3.1 lists the pCOS paths for some commonly used PDF objects. Many pCOS objects are represented by arrays which require an array index in angle brackets, e.g. pages[0] denotes the first page (pages are counted starting at 0). In addition to all pCOS paths supported by the TET product (which forms the basis of TET PDF IFilter), the IFilter supports the following syntax extension for pCOS paths: You can use a » * « (asterisk) wildcard character instead of an array or dictionary index. This means that TET PDF IFilter will iterate over all possible values of the array index, and include all corresponding object values in the indexing process. An arbitrary number of wildcards may be used within a single pCOS path. Table 3.1 pCOS paths for various PDF objects pCOS path type explanation length:pages number number of pages in the document /Info/Title string standard document info field Title /Info/ArticleNumber string custom document info field ArticleNumber (document info entries can use arbitrary names) /Root/Metadata stream XMP stream with the document’s metadata images[*]/Metadata stream XMP metadata streams for all images in the document fonts[*]/name string name of a font; the number of entries can be retrieved with length:fonts length:fonts string the number of fonts in a document length:images string the number of images in a document fonts[*]/embedded boolean embedding status of a font pages[*]/width number width of the visible area of the page pages[*]/annots[*]/A/URI string target URL of the Web links on all pages pages[*]/Resources/ Properties[*]/Metadata stream XMP metadata for placed artwork on all pages in the document fields[*]/V string text contents (value) of all form fields on all pages in the document 38 Chapter 3: Metadata Properties
XML configuration for metadata sources. Sources of metadata properties can be configured in the Source element (child of the Property element) of the XML configuration file. While the pdfObject attribute contains an extended pCOS path for a PDF object, the xmpName attribute contains the schema prefix and name of an XMP property: 3.1 Sources of Metadata in PDF 39
Page 1 and 2: ABC TET PDF IFilter Version 4.0 Ent
Page 3 and 4: Contents 0 Installing TET PDF IFilt
Page 5 and 6: 0 Installing TET PDF IFilter TET PD
Page 7 and 8: 1 Getting Started This chapter desc
Page 9 and 10: Table 1.1 Query syntax examples for
Page 11 and 12: C:\Program Files\Common Files\Micro
Page 13 and 14: 1.4 SQL Server System requirements.
Page 15 and 16: Now you can create the full-text in
Page 17 and 18: Type the following to stop the serv
Page 19 and 20: 2 Indexing PDF Contents 2.1 PDF Doc
Page 21 and 22: XMP metadata on document level. XMP
Page 23 and 24: play conformance information for th
Page 25 and 26: Note TET PDF IFilter does not apply
Page 27 and 28: 2.3 PDF Versions and Protected Docu
Page 29 and 30: Table 2.3 Examples for the fold opt
Page 31 and 32: Table 2.5 Compatibility decompositi
Page 33 and 34: 2.4.3 Unicode Normalization The Uni
Page 35: 2.5 Custom Glyph Mapping Tables Alt
Page 40 and 41: 3.2 Metadata Organization Metadata
Page 42 and 43: 3.4 Custom Metadata Properties Cust
Page 44 and 45: 3.5 Multivalued Properties Metadata
Page 46 and 47: Scenario 1: Transparently blend met
Page 48 and 49: Table 3.3 XML configuration for Win
Page 50 and 51: SQL queries for metadata properties
Page 52 and 53: SOME ARRAY ['Bembo', 'TimesNewRoman
Page 54 and 55: defined Metadata Properties« (if y
Page 56 and 57: Table 3.6 Property data types for S
Page 58 and 59: 3.9 Metadata in SQL Server SQL Serv
Page 60 and 61: Predefined properties. A column def
Page 62 and 63: Table 3.11 Metadata query examples
Page 64 and 65: An XSD schema description for the X
Page 66 and 67: Table 4.1 XML elements and attribut
Page 68 and 69: Table 4.1 XML elements and attribut
Page 71 and 72: 5 Troubleshooting 5.1 TET PDF IFilt
Page 73 and 74: 5.2 Problems with TET PDF IFilter O
Page 75 and 76: HKEY_LOCAL_MACHINE\SOFTWARE\Microso
Page 77 and 78: Sample output: FILE: udhr_japanese.
Page 79 and 80: A Predefined Metadata Properties Th
Page 81 and 82: Table A.1 Property handling in TET
Page 83: Table A.1 Property handling in TET
Page 87 and 88: Index A annotations 21 B bookmarks
Page 90:
ABC PDFlib GmbH Franziska-Bilek-Weg
show all

PDFlib TET PDF IFilter 4.0 Manual

Create successful ePaper yourself

Delete template?

Save as template?