PDFlib Text Extraction Toolkit (TET) Manual
imageFormat = "JPEG";<br />
break;<br />
case 30:<br />
imageFormat = "JPEG2000";<br />
break;<br />
case 40:<br />
imageFormat = "RAW";<br />
break;<br />
case 50:<br />
imageFormat = "JBIG2";<br />
break;<br />
default:<br />
System.err.println("write_image_file() returned unknown value "<br />
+ imageType + ", skipping image, error: "<br />
+ tet.get_errmsg());<br />
}<br />
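The mapping in the fragment above can also be written as a lookup table. The following is only a minimal sketch: the class name ImageFormatNames and the method describe() are hypothetical helpers, not part of the TET API, and the table contains only the three codes whose case labels are visible in the fragment (30, 40, 50). Other return values, including the code for JPEG whose case label is cut off above, are deliberately left out rather than guessed.

```java
import java.util.Map;

// Hypothetical helper, not part of TET: maps the value returned by
// write_image_file() to a readable format name.
final class ImageFormatNames {
    // Only the codes visible in the fragment above are listed here;
    // consult the TET reference for the complete set of return values.
    private static final Map<Integer, String> NAMES = Map.of(
        30, "JPEG2000",
        40, "RAW",
        50, "JBIG2");

    // Returns the format name, or a marker string for values not in the table.
    static String describe(int imageType) {
        return NAMES.getOrDefault(imageType, "unknown (" + imageType + ")");
    }

    public static void main(String[] args) {
        System.out.println(describe(30));
        System.out.println(describe(99));
    }
}
```

A table keeps the code-to-name mapping in one place, so adding a further format only requires one more entry instead of another case branch.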
XMP metadata for images. PDF uses the XMP format to attach metadata to the whole<br />
document or to parts of it. More information about XMP and its use in PDF is available at<br />
the following location: www.pdflib.com/knowledge-base/xmp-metadata/<br />
An image object may have XMP metadata associated with it in the PDF document. If<br />
XMP metadata is present, TET by default embeds it in the extracted image for the<br />
output formats JPEG and TIFF. This behavior can be controlled with the keepxmp option<br />
of TET_write_image_file() and TET_get_image_data(). If this option is set to false,<br />
TET ignores image metadata when generating the image output file.<br />
The image_metadata topic in the pCOS Cookbook shows how to extract image metadata<br />
with the pCOS interface directly, without generating any image file.<br />
ICC profiles. An image in PDF may have an associated ICC profile which allows precise<br />
color reproduction. By default, TET processes attached ICC profiles and embeds them in<br />
the generated TIFF or JPEG image files. You can disable ICC profile embedding with the<br />
option keepiccprofile=false in TET_write_image_file() and TET_get_image_data(). This will<br />
reduce the size of the image files at the expense of color fidelity. Disabling ICC profile<br />
embedding is therefore not recommended for workflows which require precise color<br />
representation.<br />
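The keepxmp and keepiccprofile options described above are supplied as part of an option list string. The following sketch only assembles such a string. The class ImageOptionList and its build() method are hypothetical helpers, not part of the TET API, and the way the string is finally handed to the Java binding (shown in a comment) is an assumption that should be checked against the TET API reference.

```java
// Hypothetical helper, not part of TET: builds an option list which
// controls metadata and ICC profile embedding for extracted images.
final class ImageOptionList {
    // keepXmp:        false drops XMP metadata from the image output
    // keepIccProfile: false omits attached ICC profiles (smaller files,
    //                 but reduced color fidelity)
    static String build(boolean keepXmp, boolean keepIccProfile) {
        return "keepxmp=" + keepXmp + " keepiccprofile=" + keepIccProfile;
    }

    public static void main(String[] args) {
        String optlist = build(false, true);
        System.out.println(optlist);
        // Assumed call shape in the Java binding (not verified here):
        //   tet.write_image_file(doc, imageid, optlist);
    }
}
```

Keeping both settings in one helper makes the trade-off explicit at the call site: XMP embedding can be switched off independently of ICC profile embedding, which matters for workflows that require precise color representation.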
114 Chapter 8: Image Extraction