17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual



7.2 Image Geometry

Using TET_get_image_info( ) you can retrieve geometric information for a placed image. The following values are available for each image in the TET_image_info structure (see Figure 7.1):

> The x and y fields are the coordinates of the image reference point. The reference point is usually the lower left corner of the image. However, coordinate system transformations on the page may result in a different reference point. For example, the image may be mirrored horizontally with the result that the reference point becomes the upper left corner of the image.

> The width and height fields correspond to the physical dimensions of the placed image on the page. They are provided in points (i.e. 1/72 inch).

> The angle alpha describes the direction of the pixel rows. This angle will be in the range -180° < alpha ≤ +180°. The angle alpha rotates the image at its reference point. For upright images alpha will be 0°.

> The angle beta describes the direction of the pixel columns, relative to the perpendicular of alpha. This angle will be in the range -180° < beta ≤ +180°, but different from ±90°. The angle beta skews the image, and beta=180° mirrors the image at the x axis. For upright images beta will be in the range -90° < beta < +90°. If abs(beta) > 90° the image is mirrored at the baseline.

> The imageid field contains the pCOS ID of the image. It can be used to retrieve detailed image information with pCOS functions and the actual image pixel data with TET_write_image_file( ) or TET_get_image_data( ).
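The rules for alpha and beta listed above can be turned into simple checks. The following is a minimal sketch only: the class ImageOrientation, its method names, and the tolerance EPS are hypothetical and not part of the TET API. It assumes you pass in the alpha and beta values (in degrees) that were retrieved for a placed image.

```java
/** Hypothetical helper for illustration; not part of the TET API. */
final class ImageOrientation {
    // Assumed tolerance for comparing angles given in degrees
    private static final double EPS = 1e-6;

    private ImageOrientation() {}

    /** Mirrored at the baseline: abs(beta) > 90 degrees. */
    static boolean isMirrored(double beta) {
        return Math.abs(beta) > 90.0;
    }

    /** Upright: alpha is 0 and beta lies strictly between -90 and +90 degrees. */
    static boolean isUpright(double alpha, double beta) {
        return Math.abs(alpha) < EPS && Math.abs(beta) < 90.0;
    }

    /** Skewed: beta is neither 0 nor the pure mirror value of 180 degrees. */
    static boolean isSkewed(double beta) {
        double b = Math.abs(beta);
        return b > EPS && Math.abs(b - 180.0) > EPS;
    }
}
```

For example, ImageOrientation.isUpright(0, 0) is true, while alpha=0 with beta=180 is reported as mirrored but not upright.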

As a result of image transformations, the orientation of the extracted images may appear wrong since the extracted image data is based on the image object in the PDF. Rotation or mirror transformations applied to the placed image on the PDF page are not applied to the pixel data; the original pixel data is extracted instead.
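If the extracted pixel data should match the appearance on the page, the transformation has to be re-applied by the application. The following sketch covers only one case and rests on an assumption: that an image which is mirrored at the baseline can be corrected by reversing the order of its pixel rows. The class ImageFlip is hypothetical and uses only the standard java.awt.image.BufferedImage class; rotation by alpha and skewing are not handled.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper for illustration; not part of the TET API. */
final class ImageFlip {
    private ImageFlip() {}

    /** Returns a copy of src with the order of the pixel rows reversed. */
    static BufferedImage flipVertically(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Row y of the source becomes row (h - 1 - y) of the result
                dst.setRGB(x, h - 1 - y, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

An application might call flipVertically( ) on the decoded image only when abs(beta) > 90° was reported for the placed image.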

Image resolution. In order to calculate the image resolution in dpi (dots per inch) you must divide the image width in pixels by the image width in points and multiply by 72:

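while (tet.get_image_info(page) == 1) {
    // Query the pixel dimensions of the image object via its pCOS ID
    String imagePath = "images[" + tet.imageid + "]";
    int width = (int) tet.pcos_get_number(doc, imagePath + "/Width");
    int height = (int) tet.pcos_get_number(doc, imagePath + "/Height");
    // Pixels divided by the placed size in points, times 72 points per inch
    double xDpi = 72 * width / tet.width;
    double yDpi = 72 * height / tet.height;
}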
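The same arithmetic can be checked without a PDF document. The following is a small standalone sketch; the class ImageResolution and the method dpi( ) are hypothetical names chosen for illustration, and the sample numbers are made up. It assumes the pixel count comes from the image object and the size in points from the placed image.

```java
/** Hypothetical helper for illustration; not part of the TET API. */
final class ImageResolution {
    private ImageResolution() {}

    /**
     * Resolution in dpi for an image dimension of the given number of
     * pixels which is placed with the given size in points (1/72 inch).
     */
    static double dpi(int pixels, double points) {
        if (points <= 0) {
            throw new IllegalArgumentException("size in points must be positive");
        }
        return 72.0 * pixels / points;
    }

    public static void main(String[] args) {
        // 600 pixels placed 144 points (2 inches) wide
        System.out.println(dpi(600, 144.0)); // prints 300.0
    }
}
```

Since the horizontal and vertical resolution may differ, the calculation would be carried out separately for width and height.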

Fig. 7.1 Image geometry (labels in the figure: reference point (x, y), angle alpha, width, height)

