17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


7.4 Restrictions and Caveats

Image color fidelity. TET does not degrade image quality when extracting images:

> Raster images are never downsampled.

> The color space of an image is retained in the output. TET never applies any CMYK-to-RGB or similar color conversion.

> The number of color components is never changed. For example, RGB images are not converted to grayscale even if they contain only gray colors.

Image workarounds. In some situations the color appearance of the extracted image may differ from the visual appearance of the PDF page. While the image shape is preserved, the colors may appear different for the following reasons:

> Image masks are not applied.

> Colorized grayscale images are extracted as plain grayscale images, without the color.

> Since DeviceN color is not supported in TIFF, images with the DeviceN colorspace are extracted as grayscale, RGB, or CMYK images for N=1, 3, and 4, respectively. For N>4, CMYK TIFF images with one or more alpha channels are generated.

> Images with Separation colorspace are extracted as grayscale images. The spot color used to colorize the image is lost.

> Images with Indexed ICCBased colorspace: the ICC profile is ignored.
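
The DeviceN rule in the list above can be restated as a small lookup. The following is only an illustrative sketch: the function name `devicen_tiff_output` is hypothetical and not part of TET, and the text does not say what happens for N=2, so that case is reported as unknown rather than guessed.

```python
from typing import Optional


def devicen_tiff_output(n: int) -> Optional[str]:
    """Return the kind of TIFF the text describes for a DeviceN image
    with n colorants, or None where the text gives no answer."""
    if n < 1:
        raise ValueError("a DeviceN colorspace has at least one colorant")
    # Cases stated explicitly in the list above.
    stated = {1: "grayscale", 3: "RGB", 4: "CMYK"}
    if n in stated:
        return stated[n]
    if n > 4:
        # The text only says "one or more alpha channels"; the exact
        # number of channels is not specified there.
        return "CMYK with alpha channel(s)"
    return None  # N=2 is not covered by the text
```

A caller could use such a lookup to warn users that the colorant names of a DeviceN image will not survive extraction.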

Unexpected results when extracting images. In some cases the shape of extracted images may appear different from the PDF page:

> Images may appear mirrored, either top to bottom (upside down) or left to right. This happens because TET extracts the original pixel data of the image, regardless of any transformation which may have been applied to the image on the PDF page.

> Since image masks are ignored, masking effects are not reflected in the extracted image.
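
If an application knows that an image was placed mirrored on the page, it can undo the effect on the decoded pixel data itself. The sketch below assumes the pixels are already available as a list of rows; both function names are hypothetical, and how to obtain the placement information from TET is not covered in this section.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def flip_upside_down(rows: Sequence[Sequence[Pixel]]) -> List[List[Pixel]]:
    """Reverse the row order: the top row becomes the bottom row."""
    return [list(row) for row in reversed(rows)]


def flip_left_right(rows: Sequence[Sequence[Pixel]]) -> List[List[Pixel]]:
    """Reverse the pixels within each row: left becomes right."""
    return [list(reversed(row)) for row in rows]
```

Both functions return new lists and leave the input untouched, so the original pixel data remains available for comparison.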

Unsupported image types. The following types of PDF images cannot be extracted, i.e. TET_write_image_file( ) will return -1 in these cases:

> PDF inline images

> Images with JBIG2 compression

> Images with Indexed Lab colorspace.
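
Because unsupported images are reported through a return value of -1, a caller should check every result instead of assuming that a file was written. The sketch below is a hedged illustration of that pattern only: `write_image` is a hypothetical one-argument callable standing in for a call to TET_write_image_file( ), whose real parameter list is not shown in this section.

```python
from typing import Callable, List


def collect_failed_images(
    write_image: Callable[[int], int], image_ids: List[int]
) -> List[int]:
    """Try to write every image and return the ids that could not be
    extracted, i.e. those for which the writer returned -1."""
    failed = []
    for image_id in image_ids:
        # -1 is the failure value named in the text above.
        if write_image(image_id) == -1:
            failed.append(image_id)
    return failed
```

Collecting the failed ids, rather than stopping at the first one, lets the application extract everything that is supported and report the rest afterwards.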
