PDFlib Text Extraction Toolkit (TET) Manual
Table 10.9 Suboptions for the structureanalysis option of TET_open_page() and TET_process_page()
option      description
bullets     (List of option lists; only if list=true) Specifies combinations of Unicode characters and
            font names which are used as bullet characters in lists. Supported suboptions:
            bulletchars  (List of Unicode values) One or more Unicode values for the bullet characters.
                         If this suboption is not supplied, all characters using the specified
                         fontname will be treated as bullet characters.
            fontname     (String) Name of the font from which bullet characters are drawn. If this
                         suboption is not supplied, the characters specified in the bulletchars
                         suboption will always be treated as bullet characters.
            Examples:
            bullets={{fontname=ZapfDingbats}}
            bullets={{bulletchars={U+2022}}}
            bullets={{fontname=KozGoPro-Medium bulletchars={U+2460 U+2461 U+2462 U+2463 U+2464}}}
list        (Boolean) Enable list recognition (default: false). If false, no information about list
            structure will be determined.
paragraph   (Boolean) Enable paragraph recognition (default: true). If false, no information about
            paragraph structure will be determined.
table       (Boolean) Enable table recognition (default: true). If false, the table recognition engine
            will be disabled.
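
The nested braces in the bullets suboption are easy to get wrong when typed by hand. The following is a minimal Python sketch of how such an option list string could be assembled. The helper names bullet_spec and structureanalysis_optlist are hypothetical and not part of TET. The sketch only builds a string and calls no TET function, and it assumes that font names contain no spaces.

```python
def bullet_spec(fontname=None, bulletchars=None):
    """Build one inner option list for the bullets suboption.

    Hypothetical helper, not part of TET. bulletchars is a sequence of
    integer code points; fontname is assumed to contain no spaces.
    """
    if fontname is None and not bulletchars:
        raise ValueError("supply fontname, bulletchars, or both")
    parts = []
    if fontname is not None:
        parts.append(f"fontname={fontname}")
    if bulletchars:
        codes = " ".join(f"U+{cp:04X}" for cp in bulletchars)
        parts.append("bulletchars={" + codes + "}")
    return "{" + " ".join(parts) + "}"


def structureanalysis_optlist(list_=False, paragraph=True, table=True, bullets=()):
    """Assemble a structureanalysis option list string (hypothetical helper).

    The keyword defaults mirror the defaults listed in Table 10.9.
    """
    if bullets and not list_:
        # Table 10.9: bullets is only relevant if list=true
        raise ValueError("bullets requires list=true")
    opts = [
        f"list={str(list_).lower()}",
        f"paragraph={str(paragraph).lower()}",
        f"table={str(table).lower()}",
    ]
    if bullets:
        # bullets is a list of option lists, hence the extra pair of braces
        opts.append("bullets={" + " ".join(bullets) + "}")
    return "structureanalysis={" + " ".join(opts) + "}"


optlist = structureanalysis_optlist(
    list_=True,
    bullets=[bullet_spec(bulletchars=[0x2022])],
)
print(optlist)
```

The resulting string is meant to be supplied as part of the option list for TET_open_page() or TET_process_page(). Building the braces programmatically keeps each opening brace paired with its closing one.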