PDFlib Text Extraction Toolkit (TET) Manual
PDF packages and portfolios. Acrobat 8 (PDF 1.7) introduced the concept of PDF packages,
which are file attachments with additional properties. Acrobat 9 (PDF 1.7 extension
level 3) extends this concept with the introduction of PDF portfolios.
> How to display with Acrobat 8/9: Acrobat presents the cover sheet of the package or
portfolio and its constituent PDF documents using dedicated user interface elements
for PDF packages.
> How to search a single PDF package with Acrobat 8/9: choose Edit, Search, then in the
Look In: pull-down select In the Entire PDF Package.
> How to search multiple PDF packages with Acrobat 8/9: not available
> Sample code for the TET library: get_attachments mini sample
> TETML element: /TET/Document/Attachments/Attachment/Document
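The TETML path above can also be consumed programmatically once TET has produced TETML output. The following is a minimal sketch in Python, not a TET API example: it walks /TET/Document/Attachments/Attachment/Document with the standard-library ElementTree module and returns the attributes of each attached PDF document. The helper names (local_name, children, attached_documents), the namespace URI urn:example:tetml, and the filename attribute in the embedded sample are hypothetical placeholders rather than real TET output; the code matches element names while ignoring whatever namespace the actual TETML uses. For access through the TET library itself, see the get_attachments mini sample.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace prefix: '{uri}Document' -> 'Document'."""
    return tag.rsplit("}", 1)[-1]


def children(elem: ET.Element, name: str) -> list[ET.Element]:
    """Direct children of elem whose local name equals name."""
    return [c for c in elem if local_name(c.tag) == name]


def attached_documents(tetml: str) -> list[dict[str, str]]:
    """Collect the attributes of every element on the path
    /TET/Document/Attachments/Attachment/Document.

    Only the first nesting level is examined; documents attached to
    attached documents would require recursion.
    """
    root = ET.fromstring(tetml)
    if local_name(root.tag) != "TET":
        raise ValueError("root element is not TET")
    found = []
    for doc in children(root, "Document"):
        for attachments in children(doc, "Attachments"):
            for attachment in children(attachments, "Attachment"):
                # An Attachment without a Document child is skipped.
                for inner in children(attachment, "Document"):
                    found.append(dict(inner.attrib))
    return found


# Hypothetical, heavily simplified TETML; the namespace URI and the
# filename attribute are placeholders, not real TET output.
PACKAGE_TETML = """<TET xmlns="urn:example:tetml">
  <Document filename="package.pdf">
    <Attachments>
      <Attachment><Document filename="part1.pdf"/></Attachment>
      <Attachment><Document filename="part2.pdf"/></Attachment>
    </Attachments>
  </Document>
</TET>"""

if __name__ == "__main__":
    for attrs in attached_documents(PACKAGE_TETML):
        print(attrs.get("filename"))
```

Matching on local names keeps the sketch independent of the exact TETML namespace, at the cost of not validating the document against the TETML schema.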
PDF properties. This domain does not explicitly contain text, but is used as a pseudo-domain
which collects various intrinsic properties of a PDF document, e.g. PDF/X and
PDF/A status, Tagged PDF status, etc.
> How to display with Acrobat 8: Acrobat 8 does not directly display standards conformance
information, but you can find relevant entries under File, Properties..., Custom
or File, Properties..., Additional Metadata... You can also use the free PDFlib custom
XMP panel¹ for ISO standards to explicitly display conformance information for the
PDF/A-1, PDF/X-4, PDF/X-5, and PDF/E-1 standards.
> How to display with Acrobat 9: View, Navigation Panels, Standards (only present for
standards-conforming PDFs)
> How to search with Acrobat 8/9: not available
> Sample code for the TET library: dumper mini sample
> TETML elements and attributes: /TET/Document/@pdfa, /TET/Document/@pdfx
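The conformance attributes can likewise be read from TETML output. This minimal Python sketch, assuming the TETML has already been generated, reads the @pdfa and @pdfx attributes of /TET/Document with ElementTree and reports a missing attribute as None. The helper names, the namespace URI, and the attribute value in the embedded sample are hypothetical placeholders; consult the TETML schema for the actual value format. For direct access through the TET API, see the dumper mini sample.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    """Strip a namespace prefix: '{uri}Document' -> 'Document'."""
    return tag.rsplit("}", 1)[-1]


def conformance(tetml: str) -> dict[str, Optional[str]]:
    """Return the @pdfa and @pdfx attributes of /TET/Document.

    An attribute that is absent is reported as None.
    """
    root = ET.fromstring(tetml)
    if local_name(root.tag) != "TET":
        raise ValueError("root element is not TET")
    for doc in root:
        if local_name(doc.tag) == "Document":
            return {"pdfa": doc.get("pdfa"), "pdfx": doc.get("pdfx")}
    raise ValueError("no /TET/Document element found")


# Hypothetical TETML fragment; the namespace URI and the attribute value
# are placeholders, not verified TET output.
PROPS_TETML = '<TET xmlns="urn:example:tetml"><Document pdfa="PDF/A-1b"/></TET>'

if __name__ == "__main__":
    print(conformance(PROPS_TETML))
```

Run as a script, it prints {'pdfa': 'PDF/A-1b', 'pdfx': None}, since the sample declares no PDF/X attribute.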
1. See www.pdflib.com/developer/xmp-metadata/xmp-panels<br />
60 Chapter 6: Text Extraction