PDFlib TET PDF IFilter 4.0 Manual
PDFlib TET PDF IFilter 4.0 Manual
PDFlib TET PDF IFilter 4.0 Manual
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Table 2.3 Examples for the fold option<br />
description and option list before folding after folding<br />
Replace all characters in a Unicode set with a specific character<br />
Space folding: map all variants of Unicode space characters to U+0020:<br />
fold={{[:blank:] U+0020}}<br />
Dashes folding: map all variants of Unicode dash characters to U+002D:<br />
fold={{[:Dash:] U+002D}}<br />
Replace all unassigned characters (i.e. Unicode code points to which no character<br />
is assigned) with U+FFFD: fold={{[:Unassigned:] U+FFFD}}<br />
Special handling for individual characters<br />
Preserve all hyphen characters at line breaks while keeping the remaining default<br />
foldings. Since these characters are identified internally in <strong>TET</strong> (as opposed to having<br />
a fixed Unicode property) the keyword _dehyphenation is used to identify the<br />
folding’s domain: fold={{_dehyphenation preserve}}<br />
Preserve Arabic Tatweel characters (which are removed by default):<br />
fold={{[U+0640] preserve}}<br />
Replace various punctuation characters with their ASCII counterparts:<br />
fold={ {[U+2018] U+0027} {[U+2019] U+0027} {[U+201C] U+0022}<br />
{[U+201D] U+0022}}<br />
<br />
U+00A0<br />
<br />
U+2011<br />
<br />
U+03A2<br />
<br />
U+002D<br />
<br />
U+0640<br />
<br />
U+201C<br />
<br />
U+0020<br />
<br />
U+002D<br />
<br />
U+FFFD<br />
<br />
U+002D<br />
<br />
U+0640<br />
<br />
U+002D U+0022<br />
2.4 Unicode Postprocessing 29