17.05.2014

PDFlib Text Extraction Toolkit (TET) Manual


artificial characters which will be delivered although a directly corresponding glyph is not available:

> A composite character (see above) will map to a sequence of multiple Unicode characters. While the first character in the sequence corresponds to the actual glyph, the remaining characters do not correspond to any glyph.

> Separator characters inserted via the lineseparator/wordseparator options are artefacts without any corresponding glyph.

> While the leading value of a surrogate pair will be associated with a glyph, the trailing value will be treated as not having a corresponding glyph on the page (see Section 6.5, »Unicode Pipeline«, page 68, section »Characters outside the BMP and surrogate handling«).
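
The composite-character and surrogate cases above can be illustrated with a short sketch. It uses only the Python standard library and does not call the TET API. The helper names (`utf16_code_units`, `is_trailing_surrogate`, `glyph_flags`) are hypothetical. The choice of the ligature U+FB01 as the composite character, and of NFKC normalization as a stand-in for whatever decomposition TET applies, are assumptions made for illustration only.

```python
import unicodedata


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_trailing_surrogate(unit):
    """True if the code unit lies in the trailing (low) surrogate range."""
    return 0xDC00 <= unit <= 0xDFFF


def glyph_flags(units):
    """Flag each UTF-16 code unit: True if it could be associated with a glyph.

    Illustrative rule only: a trailing surrogate is flagged False,
    mirroring the description above; every other unit is flagged True.
    """
    return [not is_trailing_surrogate(u) for u in units]


# Composite character (assumed example): one ligature glyph expands to
# two Unicode characters, so only the first can match the glyph.
ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
expanded = unicodedata.normalize("NFKC", ligature)

# Character outside the BMP: one glyph, two UTF-16 code units.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
units = utf16_code_units(clef)
flags = glyph_flags(units)

print(len(ligature), "glyph ->", len(expanded), "characters:", expanded)
print([hex(u) for u in units], flags)
```

In both cases the number of delivered code units exceeds the number of glyphs on the page, so code that pairs each character with glyph information has to expect characters without a corresponding glyph.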

6.2 Unicode Concepts
