PDFlib Text Extraction Toolkit (TET) Manual
artificial characters which will be delivered although a directly corresponding glyph is
not available:
> A composite character (see above) will map to a sequence of multiple Unicode characters.
  While the first character in the sequence corresponds to the actual glyph, the
  remaining characters do not correspond to any glyph.
> Separator characters inserted via the lineseparator/wordseparator options are artefacts
  without any corresponding glyph.
> While the leading value of a surrogate pair will be associated with a glyph, the trailing
  value will be treated as not having a corresponding glyph on the page (see Section
  6.5, »Unicode Pipeline«, page 68, section »Characters outside the BMP and surrogate
  handling«).
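
The first and third cases above can be illustrated with a minimal Python sketch. It does not use the TET API. The helper names (utf16_units, flag_surrogates, expand_composite) and the boolean "has glyph" flag are hypothetical and exist only for this illustration, and Unicode NFKD decomposition is used as a stand-in for whatever mapping TET applies to composite characters.

```python
import unicodedata


def utf16_units(ch):
    """Return the UTF-16 code units of a single character as integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def flag_surrogates(units):
    """Pair each UTF-16 code unit with a hypothetical 'has glyph' flag.

    A trailing surrogate (0xDC00..0xDFFF) is marked as having no glyph;
    every other unit, including a leading surrogate, keeps the glyph.
    """
    return [(u, not (0xDC00 <= u <= 0xDFFF)) for u in units]


def expand_composite(ch):
    """Expand a composite character and flag only the first result.

    NFKD is an assumption used for illustration: only the first character
    of the expanded sequence is associated with the glyph on the page.
    """
    seq = unicodedata.normalize("NFKD", ch)
    return [(c, i == 0) for i, c in enumerate(seq)]


if __name__ == "__main__":
    # U+1D49C lies outside the BMP and needs a surrogate pair in UTF-16.
    for unit, has_glyph in flag_surrogates(utf16_units("\U0001D49C")):
        print(f"U+{unit:04X} has_glyph={has_glyph}")

    # U+FB01 (fi ligature) is one glyph that expands to two characters.
    for char, has_glyph in expand_composite("\uFB01"):
        print(f"{char!r} has_glyph={has_glyph}")
```

In both cases only the first element of the resulting sequence is tied to a glyph. The remaining elements are artificial in the sense described above, so code that relies on glyph positions has to skip them.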
6.2 Unicode Concepts 63