PDFlib Text Extraction Toolkit (TET) Manual


Table 10.7 Suboptions for the layoutanalysis option of TET_open_page( ) and TET_process_page( )

option              description

supertablecolumns   (Integer; only if layoutastable=true) Minimum number of columns in a layout row to consider the sequence of zones as a supertable. When a table is created from paragraphs, these columns are recognized as separate zones instead of being combined. As a consequence, layout recognition can identify these zone sequences as a table (default: 4).

tabledetect         (Integer) Specifies the depth of recursive table recognition (default: 1):
                    0   No table recognition.
                    1   Table recognition for each zone.
                    2   Table recognition for each table cell detected in level 1. This is required for nested tables and resolving row spans.
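The suboptions in Table 10.7 are passed as part of the option list of TET_open_page( ) or TET_process_page( ). The following is a minimal sketch of how such an option list string might be assembled. It assumes the usual nested option list notation (name={suboption=value ...}) and assumes that layoutastable is a sibling suboption of layoutanalysis, as the wording of the table suggests. The helper name build_layout_optlist is hypothetical and not part of the TET API, and the range check on tabledetect simply reflects the three values listed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a TET function): writes an option list such as
 *   layoutanalysis={layoutastable=true supertablecolumns=4 tabledetect=2}
 * into buf. Returns 0 on success, or -1 if tabledetect is outside the
 * values listed in Table 10.7 or if buf is too small.
 */
int build_layout_optlist(char *buf, size_t size,
                         int supertablecolumns, int tabledetect)
{
    int n;

    assert(buf != NULL);

    /* Assumption: only the documented depths 0, 1 and 2 are accepted. */
    if (tabledetect < 0 || tabledetect > 2)
        return -1;

    /* supertablecolumns is only relevant together with layoutastable=true. */
    n = snprintf(buf, size,
                 "layoutanalysis={layoutastable=true "
                 "supertablecolumns=%d tabledetect=%d}",
                 supertablecolumns, tabledetect);

    /* snprintf reports truncation by returning the untruncated length. */
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

The resulting string would then be supplied as the option list argument when opening the page; tabledetect=2 is the setting the table names for nested tables and row spans.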

Table 10.8 Suboptions for the imageanalysis option of TET_open_page( ) and TET_process_page( )

option        description

smallimages   (Option list) Control small image removal. Small images must often be ignored since they are artifacts and not real images. Supported options:
              disable    (Boolean) If true, small image removal will be disabled. Default: false
              maxarea    (Float) Maximum area (=width x height) in pixels of an image to be considered as a small image. Default: 500
              maxcount   (Integer) Maximum allowed number of small images. If more small images are found, all of them will be removed. Default: 50

merge         (Option list) Control image merging. This process combines adjacent images which together may form a single larger image. This is useful for multi-strip images where the individual strips have been preserved in the PDF, and for background images which are broken into a large number of very small images. Supported options:
              disable    (Boolean) If true, image merging will be disabled. Default: false
              gap        (Float) Maximum gap in points between two images to be considered for merging. Default: 1.0 (not 0.0 because of unavoidable inaccuracies in the position calculations)
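To make the interplay of maxarea, maxcount and gap concrete, here is a small self-contained sketch of the rules as they are described in Table 10.8. It is an illustration only, not TET's actual implementation: the image_info type and both function names are hypothetical, the comparisons at the exact thresholds (<= versus <) are assumptions, and the gap check is reduced to one dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical image record for illustration; not a TET data type. */
typedef struct {
    double width;    /* in pixels */
    double height;   /* in pixels */
    int    removed;  /* set to 1 when the image is dropped */
} image_info;

/*
 * An image counts as small if width x height does not exceed maxarea.
 * Small images are removed only if there are more than maxcount of them;
 * in that case all small images are removed. Returns the number removed.
 */
size_t remove_small_images(image_info *images, size_t n,
                           double maxarea, size_t maxcount)
{
    size_t i, small = 0;

    assert(images != NULL || n == 0);

    for (i = 0; i < n; i++) {
        images[i].removed = 0;
        if (images[i].width * images[i].height <= maxarea)
            small++;
    }

    if (small <= maxcount)
        return 0;               /* few small images: keep them all */

    for (i = 0; i < n; i++) {
        if (images[i].width * images[i].height <= maxarea)
            images[i].removed = 1;
    }
    return small;
}

/*
 * One-dimensional view of the gap rule: strip A ends at end_a, strip B
 * starts at start_b (both in points). The strips are merge candidates
 * if the distance between them does not exceed maxgap.
 */
int strips_mergeable(double end_a, double start_b, double maxgap)
{
    double d = start_b - end_a;

    if (d < 0.0)
        d = -d;
    return d <= maxgap;
}
```

With the documented default gap of 1.0, two strips whose computed edges differ by 0.0004 points are still merge candidates; with a gap of 0.0 they would not be, which is the inaccuracy the table's note on the default refers to.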

C++ void close_page(int page)
C# Java void close_page(int page)
Perl PHP TET_close_page(resource tet, long page)
VB Sub close_page(page As Long)
C void TET_close_page(TET *tet, int page)

Release a page handle and all related resources.

page A valid page handle obtained with TET_open_page( ).

Details All open pages of the document will be closed automatically when TET_close_document( ) is called. It is good programming practice, however, to close pages explicitly when they are no longer needed. Closed page handles must no longer be used in any function call.

Chapter 10: TET Library API Reference
