17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


        imageFormat = "JPEG";
        break;
    case 30:
        imageFormat = "JPEG2000";
        break;
    case 40:
        imageFormat = "RAW";
        break;
    case 50:
        imageFormat = "JBIG2";
        break;
    default:
        System.err.println("write_image_file() returned unknown value "
            + imageType + ", skipping image, error: "
            + tet.get_errmsg());
}

XMP metadata for images. PDF uses the XMP format to attach metadata to the whole document or parts of it. You can find more information about XMP and its use in PDF at the following location: www.pdflib.com/knowledge-base/xmp-metadata/

An image object may have XMP metadata associated with it in the PDF document. If XMP metadata is present, TET will by default embed it in the extracted image for the output formats JPEG and TIFF. This behavior can be controlled with the keepxmp option of TET_write_image_file() and TET_get_image_data(). If this option has been set to false, TET will ignore image metadata when generating the image output file.

The image_metadata topic in the pCOS Cookbook shows how to extract image metadata with the pCOS interface directly, without generating any image file.

ICC profiles. An image in PDF may have an associated ICC profile which allows precise color reproduction. By default, TET processes attached ICC profiles and embeds them in the generated TIFF or JPEG image files. You can disable ICC profile embedding with the option keepiccprofile=false in TET_write_image_file() and TET_get_image_data(). This will reduce the size of the image files at the expense of color fidelity. Disabling ICC profile embedding is therefore not recommended for workflows which require precise color representation.
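The keepxmp and keepiccprofile options described above are passed to the image functions as part of an option list string. The following is a minimal sketch of how such a string could be assembled in Java. The class ImageOptlistSketch and its methods are hypothetical helpers invented for illustration and are not part of the TET API; only the option names keepxmp and keepiccprofile come from the text above, and the space-separated key=value layout is an assumption about the option list syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the TET API: collects boolean image
// options and renders them as a "key=value key=value" option list string.
class ImageOptlistSketch {
    // LinkedHashMap keeps the options in the order they were set.
    private final Map<String, Boolean> options = new LinkedHashMap<>();

    // keepxmp is the option named in the XMP paragraph above.
    ImageOptlistSketch keepXmp(boolean keep) {
        options.put("keepxmp", keep);
        return this;
    }

    // keepiccprofile is the option named in the ICC paragraph above.
    ImageOptlistSketch keepIccProfile(boolean keep) {
        options.put("keepiccprofile", keep);
        return this;
    }

    // Options that were never set are omitted from the string,
    // so the library defaults would apply to them.
    String build() {
        return options.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String optlist = new ImageOptlistSketch()
            .keepXmp(false)
            .keepIccProfile(false)
            .build();
        System.out.println(optlist);
        // The resulting string would then be supplied as the option list
        // argument of the image call, for example (assumed argument order):
        // tet.write_image_file(doc, imageid, optlist);
    }
}
```

Leaving an option unset keeps it out of the string entirely, which matches the documented default of embedding both XMP metadata and ICC profiles. Setting both to false trades metadata and color fidelity for smaller image files.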

114 Chapter 8: Image Extraction
