PDFlib Text Extraction Toolkit (TET) Manual
7.4 Restrictions and Caveats<br />
Image color fidelity. <strong>TET</strong> does not degrade image quality when extracting images:<br />
> Raster images are never downsampled.<br />
> The color space of an image will be retained in the output. <strong>TET</strong> never applies any<br />
CMYK-to-RGB or similar color conversion.<br />
> The number of color components will always be unchanged. For example, RGB images<br />
will not be changed to grayscale if they contain only gray colors.<br />
Image workarounds. In some situations the color appearance of the extracted image<br />
may be different from the visual appearance of the PDF page. While the image shape is<br />
preserved, the colors may appear different because of the following reasons:<br />
> Image masks are not applied.<br />
> Colorized grayscale images are extracted as plain grayscale images, without the color.<br />
> Since DeviceN color is not supported in TIFF, images with the DeviceN colorspace are<br />
extracted as grayscale, RGB, or CMYK images for N=1, 3, and 4, respectively. For N>4,<br />
CMYK TIFF images with one or more alpha channels are generated.<br />
> Images with Separation colorspace are extracted as grayscale images. The spot color<br />
used to colorize the image will be lost.<br />
> Images with Indexed ICCBased colorspace: the ICC profile will be ignored.<br />
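The DeviceN rule in the list above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name devicen_tiff_layout and the returned strings are assumptions made for this example and are not part of the TET API. The case N=2 is not covered by the rule above, so the sketch makes no claim about it.<br />

```python
from typing import Optional


def devicen_tiff_layout(n: int) -> Optional[str]:
    """Return the TIFF layout stated above for a DeviceN image with n colorants.

    Hypothetical helper for illustration; not a TET function.
    """
    if n < 1:
        raise ValueError("a DeviceN colorspace has at least one colorant")
    if n == 1:
        return "grayscale"
    if n == 3:
        return "RGB"
    if n == 4:
        return "CMYK"
    if n > 4:
        # The text only says "one or more alpha channels"; the exact
        # number of channels is not specified here.
        return "CMYK with alpha channels"
    # N=2 is not mentioned in the rule above, so nothing is claimed for it.
    return None
```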
Unexpected results when extracting images. In some cases the shape of extracted images<br />
may appear different from the PDF page:<br />
> Images may appear mirrored horizontally or vertically (upside down). This happens<br />
because <strong>TET</strong> extracts the original pixel data of the image, without regard to<br />
any transformation which may have been applied to the image on the PDF page.<br />
> Since image masks are ignored, masking effects will not be reflected in the extracted<br />
image.<br />
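If the orientation of an extracted image is known to be mirrored, the pixel data can be flipped back after extraction. The following Python sketch shows the idea on a decoded pixel buffer. It assumes uncompressed, row-major data without row padding; the function names are made up for this example, and decoding the extracted image file into such a buffer (for example with an image library) is not shown.<br />

```python
def flip_vertical(pixels: bytes, width: int, height: int, bytes_per_pixel: int) -> bytes:
    """Reverse the order of the rows (turns an upside-down image upright)."""
    row_size = width * bytes_per_pixel
    if len(pixels) != row_size * height:
        raise ValueError("buffer size does not match the image dimensions")
    rows = [pixels[r * row_size:(r + 1) * row_size] for r in range(height)]
    return b"".join(reversed(rows))


def flip_horizontal(pixels: bytes, width: int, height: int, bytes_per_pixel: int) -> bytes:
    """Reverse the order of the pixels within each row (left-right mirror)."""
    row_size = width * bytes_per_pixel
    if len(pixels) != row_size * height:
        raise ValueError("buffer size does not match the image dimensions")
    out = bytearray()
    for r in range(height):
        row = pixels[r * row_size:(r + 1) * row_size]
        # Copy whole pixels from right to left so multi-byte pixels stay intact.
        for c in range(width - 1, -1, -1):
            out += row[c * bytes_per_pixel:(c + 1) * bytes_per_pixel]
    return bytes(out)
```

Which flip (if any) is required depends on the transformation applied to the image on the page, which must be determined separately.<br />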
Unsupported image types. The following types of PDF images cannot be extracted, i.e.<br />
TET_write_image_file() will return -1 in these cases:<br />
> PDF inline images<br />
> Images with JBIG2 compression<br />
> Images with Indexed Lab colorspace.<br />
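Since unsupported images are reported through a return value of -1, extraction loops should check the result of every call instead of assuming that a file was written. The following Python sketch illustrates the pattern. The parameter write_image is a hypothetical stand-in for a call to TET_write_image_file() through whatever language binding is in use; the real function's parameters are not reproduced here.<br />

```python
from typing import Callable, Iterable, List, Tuple


def write_images(
    image_ids: Iterable[int],
    write_image: Callable[[int], int],
) -> Tuple[List[int], List[int]]:
    """Try to write every image; return (written_ids, skipped_ids).

    write_image is a hypothetical wrapper that returns -1 when the
    image cannot be extracted, as described above.
    """
    written: List[int] = []
    skipped: List[int] = []
    for image_id in image_ids:
        if write_image(image_id) == -1:
            # Unsupported image type (e.g. inline image or JBIG2): skip it.
            skipped.append(image_id)
        else:
            written.append(image_id)
    return written, skipped
```

Collecting the skipped IDs makes it possible to report which images on a page are missing from the output.<br />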