PDFlib Text Extraction Toolkit (TET) Manual
PDFlib Text Extraction Toolkit (TET) Manual
PDFlib Text Extraction Toolkit (TET) Manual
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
7.2 Image Geometry<br />
Using <strong>TET</strong>_get_image_info( ) you can retrieve geometric information for a placed image.<br />
The following values are available for each image in the <strong>TET</strong>_image_info structure (see<br />
Figure 7.1):<br />
> The x and y fields are the coordinates of the image reference point. The reference<br />
point is usually the lower left corner of the image. However, coordinate system<br />
transformations on the page may result in a different reference point. For example,<br />
the image may be mirrored horizontally with the result that the reference point becomes<br />
the upper left corner of the image.<br />
> The width and height fields correspond to the physical dimensions of the placed image<br />
on the page. They are provided in points (i.e. 1/72 inch).<br />
> The angle alpha describes the direction of the pixel rows. This angle will be in the<br />
range -180˚ < alpha ³ +180˚. The angle alpha rotates the image at its reference point.<br />
For upright images alpha will be 0˚.<br />
> The angle beta describes the direction of the pixel columns, relative to the perpendicular<br />
of alpha. This angle will be in the range -180˚ < beta ³ +180˚, but different<br />
from • 90˚. The angle beta skews the image, and beta=180˚ mirrors the image at the x<br />
axis. For upright images beta will be in the range -90˚ < beta < +90˚. If abs(beta) >90˚<br />
the image is mirrored at the baseline.<br />
> The imageid field contains the pCOS ID of the image. It can be used to retrieve detailed<br />
image information with pCOS functions and the actual image pixel data with<br />
<strong>TET</strong>_write_image_file( ) or <strong>TET</strong>_get_image_data( ).<br />
As a result of image transformations, the orientation of the extracted images may appear<br />
wrong since the extracted image data is based on the image object in the PDF. Any<br />
rotation or mirror transformations applied to the placed image on the PDF page will not<br />
be applied to the pixel data, but the original pixel data will be extracted.<br />
Image resolution. In order to calculate the image resolution in dpi (dots per inch) you<br />
must divide the image width in pixels by the image width in points and multiply by 72:<br />
while (tet.get_image_info(page) == 1) {<br />
String imagePath = "images[" + tet.imageid + "]";<br />
int width = (int) tet.pcos_get_number(doc, imagePath + "/Width");<br />
int height = (int) tet.pcos_get_number(doc, imagePath + "/Height");<br />
double xDpi = 72 * width / tet.width;<br />
height<br />
Fig. 7.1<br />
Image geometry<br />
(x, y)<br />
alpha<br />
width<br />
7.2 Image Geometry 83