17.05.2014

PDFlib Text Extraction Toolkit (TET) Manual

Table 10.7 Suboptions for the layoutanalysis option of TET_open_page( ) and TET_process_page( )

option description

layoutastable (Boolean) If true, the layout recognition engine will treat the zones on the page as one or more tables. The minimum number of columns required to consider the sequence as a table depends on the document style. If false, supertable recognition will be disabled (default: true).

layoutcolumnhint (Keyword) This option may improve zone reading order detection for complex layouts. Supported keywords (default: multicolumn):
  multicolumn The page contains multi-column text; zones will be sorted column by column.
  none No hint available; zone ordering will be determined by page content order.
  singlecolumn The page contains single-column text; zones will be sorted row by row.

layoutdetect (Integer) Specifies the depth of recursive layout recognition (default: 1):
  0 No layout recognition.
  1 Layout recognition for the whole page. This is sufficient for the vast majority of documents.
  2 Layout recognition for the results of level 1. This is required for layouts with different multi-column sublayouts and titles on different parts of the page as well as multi-paragraph tables.
  3 Layout recognition for the results of level 2. This is required only for very complex layouts.

layoutrowhint (Option list) Control layout row processing. Supported options (default: none):
  full Enable layout row processing.
  none Disable layout row processing.
  separation (Keyword) Enable layout row processing, but disable it if layout recognition suspects a supertable. The following suboptions can be supplied:
    preservecolumns Try to keep vertical columns based on the geometric relationship between zones. This is recommended if zones within columns are separated by large gaps (e.g. caused by images).
    thick Try to combine neighboring zones and place them in the same layout row. This results in a smaller number of larger layout rows. This is recommended for complex layouts, such as magazines and papers where paragraphs within columns are separated from each other by more than the font size, and for layouts with several multi-column articles one under the other.
    thin Try to separate neighboring zones and place them in different layout rows. This results in a larger number of smaller layout rows.
  Example: layoutanalysis = {layoutrowhint={full separation=thick}}

mergetables (Keyword) Tables with a single row will be skipped during table recognition and treated as regular zones. If two sequential zones are tables (even with only a single row), they can be combined. Supported keywords (default: none):
  down Combine downward only.
  none Do not merge.
  up Combine upward only.
  updown Combine in both directions.

splithint (Keyword or option list) Activate special treatment of double-page spreads (or even pages consisting of more spreads). The page may be divided vertically or horizontally into two or more sections. The keyword includebox means that the split areas will be defined by the includebox option. Alternatively, the following options can be supplied:
  x (Float) Divider for the x axis, e.g. 0.5 for a double-page spread, 0.33 for a three-page spread.
  y (Float) Divider for the y axis.

standalonefontsize (Float) Minimum font size for huge glyphs. Huge glyphs form single-glyph strips and will not be combined with other zones (default: 70).
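
The suboptions in Table 10.7 are passed as a nested option list, as in the layoutrowhint example in the table. The following Python sketch shows one way such a string could be assembled from a dictionary. The helper name format_optlist and the dictionary layout are hypothetical and not part of TET; the sketch only builds the text of the option list and does not call any TET function. It assumes the brace-delimited name=value form seen in the table's example, with bare keywords such as full written without a value.

```python
def format_optlist(options):
    """Render a dict as a brace-nested option list string.

    Hypothetical helper (not part of TET). Conventions assumed here:
      - a value of None yields a bare keyword, e.g. "full"
      - a bool yields "true" or "false"
      - a nested dict yields name={...}
      - anything else is written as name=value
    """
    parts = []
    for name, value in options.items():
        if value is None:
            parts.append(name)
        elif isinstance(value, bool):  # check bool before other scalars
            parts.append(f"{name}={'true' if value else 'false'}")
        elif isinstance(value, dict):
            parts.append(f"{name}={{{format_optlist(value)}}}")
        else:
            parts.append(f"{name}={value}")
    return " ".join(parts)


# Rebuild the layoutrowhint example from the table.
rowhint = {"layoutanalysis": {"layoutrowhint": {"full": None, "separation": "thick"}}}
print(format_optlist(rowhint))
# prints: layoutanalysis={layoutrowhint={full separation=thick}}

# Combine several suboptions for a double-page spread with nested layouts.
spread = {
    "layoutanalysis": {
        "layoutdetect": 2,
        "layoutcolumnhint": "multicolumn",
        "splithint": {"x": 0.5},
    }
}
print(format_optlist(spread))
# prints: layoutanalysis={layoutdetect=2 layoutcolumnhint=multicolumn splithint={x=0.5}}
```

The resulting string would then be supplied as (part of) the option list of TET_open_page( ) or TET_process_page( ). The sketch does not validate names or values against the table, so misspelled keywords pass through unchanged.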

10.5 Page Functions
