
PDFlib Text Extraction Toolkit (TET) Manual


Table 10.9 Suboptions for the structureanalysis option of TET_open_page() and TET_process_page()

option      description

bullets     (List of option lists; only if list=true) Specifies combinations of Unicode characters and font names which are used as bullet characters in lists. Supported suboptions:

            bulletchars   (List of Unicode values) One or more Unicode values for the bullet characters. If this suboption is not supplied, all characters using the specified fontname will be treated as bullet characters.

            fontname      (String) Name of the font from which bullet characters are drawn. If this suboption is not supplied, the characters specified in the bulletchars suboption will always be treated as bullet characters.

            Examples:

            bullets={{fontname=ZapfDingbats}}
            bullets={{bulletchars={U+2022}}}
            bullets={{fontname=KozGoPro-Medium bulletchars={U+2460 U+2461 U+2462 U+2463 U+2464}}}

list        (Boolean) Enable list recognition (default: false). If false, no information about list structure will be determined.

paragraph   (Boolean) Enable paragraph recognition (default: true). If false, no information about paragraph structure will be determined.

table       (Boolean) Enable table recognition (default: true). If false, the table recognition engine will be disabled.
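
The nested braces of the bullets suboption are easy to unbalance when typed by hand. The following is a minimal Python sketch, not part of TET, that assembles a structureanalysis option list from plain data and checks its brace balance. The helper names (build_bullets_option, build_structureanalysis, braces_balanced) are hypothetical, and the sketch assumes font names without spaces or braces.

```python
def build_bullets_option(entries):
    """Build a 'bullets' option from a list of dicts (hypothetical helper).

    Each dict may contain 'fontname' (str) and/or 'bulletchars'
    (list of integer code points). Assumes font names without spaces.
    """
    parts = []
    for entry in entries:
        sub = []
        fontname = entry.get("fontname")
        bulletchars = entry.get("bulletchars")
        if fontname:
            sub.append("fontname=" + fontname)
        if bulletchars:
            codes = " ".join("U+%04X" % cp for cp in bulletchars)
            sub.append("bulletchars={" + codes + "}")
        if not sub:
            raise ValueError("each entry needs fontname and/or bulletchars")
        # Every entry is one inner option list: {fontname=... bulletchars={...}}
        parts.append("{" + " ".join(sub) + "}")
    # The outer braces wrap the list of option lists.
    return "bullets={" + " ".join(parts) + "}"


def build_structureanalysis(list_recognition=False, paragraph=True,
                            table=True, bullets=None):
    """Assemble the structureanalysis option list (hypothetical helper).

    Keyword defaults mirror the defaults stated in Table 10.9.
    """
    if bullets and not list_recognition:
        # Table 10.9: bullets is only relevant if list=true.
        raise ValueError("bullets requires list recognition to be enabled")
    opts = [
        "list=" + ("true" if list_recognition else "false"),
        "paragraph=" + ("true" if paragraph else "false"),
        "table=" + ("true" if table else "false"),
    ]
    if bullets:
        opts.append(build_bullets_option(bullets))
    return "structureanalysis={" + " ".join(opts) + "}"


def braces_balanced(optlist):
    """Return True if every '{' has a matching '}' in the right order."""
    depth = 0
    for ch in optlist:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


optlist = build_structureanalysis(
    list_recognition=True,
    bullets=[{"fontname": "ZapfDingbats"}, {"bulletchars": [0x2022]}],
)
print(optlist)
```

The resulting string is intended to be supplied in the option list of TET_open_page() or TET_process_page(). Running braces_balanced() on a hand-written option list before passing it on is a cheap way to catch a missing closing brace.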

