PDFlib Text Extraction Toolkit (TET) Manual
[TheSansBold-Plain/13.98] 1
[TheSansBold-Plain/13.98] Installing
[TheSansBold-Plain/13.98] PDFlib
[TheSansBold-Plain/13.98] FontReporter
[TheSansBold-Plain/13.98] 2
[TheSansBold-Plain/13.98] Working
[TheSansBold-Plain/13.98] with
[TheSansBold-Plain/13.98] FontReporter
[TheSansBold-Plain/13.98] A
[TheSansBold-Plain/13.98] Revision
[TheSansBold-Plain/13.98] History
[TheSansBold-Plain/24] 1
[TheSansBold-Plain/24] Installing
[TheSansBold-Plain/24] PDFlib
[TheSansBold-Plain/24] FontReporter
...
Searching for font usage. The fontfinder.xsl stylesheet expects TETML input in glyph or
wordplus mode. For each font in a document, it lists all occurrences of text set in that
font, along with the page number and the position on the page. This can be useful
for detecting unwanted fonts, checking consistency, locating uses of a particular unwanted
font size, etc.
TheSansExtraBold-Plain used on:
page 1:
(111, 636), (165, 636), (219, 636), (292, 636), (301, 636), (178, 603), (221, 603), (226, 603),
(272, 603), (277, 603), (102, 375), (252, 375), (261, 375), (267, 375)
TheSans-Plain used on:
page 1:
(102, 266), (119, 266), (179, 266), (208, 266), (296, 266), (346, 266), (367, 266)
...
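The grouping behind this kind of report can be sketched in a few lines of Python. This is only an illustration of the idea, not how fontfinder.xsl is implemented: it assumes the glyph data has already been read from TETML into plain tuples, and the tuple layout (font name, page, x, y) as well as the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical glyph records: (font name, page number, x, y).
# The tuple layout is an assumption; real data would come from TETML.
GLYPHS = [
    ("TheSansExtraBold-Plain", 1, 111, 636),
    ("TheSansExtraBold-Plain", 1, 165, 636),
    ("TheSans-Plain", 1, 102, 266),
    ("TheSans-Plain", 2, 119, 700),
]


def font_usage(glyphs):
    """Group glyph positions by font, then by page, keeping input order."""
    usage = defaultdict(lambda: defaultdict(list))
    for font, page, x, y in glyphs:
        usage[font][page].append((x, y))
    return usage


def format_usage(usage):
    """Render the grouped positions in a layout similar to the sample output."""
    lines = []
    for font, pages in usage.items():
        lines.append(f"{font} used on:")
        for page in sorted(pages):
            lines.append(f"page {page}:")
            lines.append(", ".join(f"({x}, {y})" for x, y in pages[page]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_usage(font_usage(GLYPHS)))
```

Keeping the grouping separate from the formatting makes it easy to filter the result first, for example to report only fonts that are not on an approved list.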
Font statistics. The fontstat.xsl stylesheet expects TETML input in glyph or wordplus
mode. It generates font and glyph statistics. This can be useful for quality control and
even accessibility testing, since unmapped glyphs (i.e. glyphs which cannot be mapped
to any Unicode character) are also reported for each font.
19894 total glyphs in the document; breakdown by font:<br />
68.71% ThesisAntiqua-Normal: 13669 glyphs<br />
22.89% TheSans-Italic: 4553 glyphs<br />
6.38% TheSansBold-Plain: 1269 glyphs<br />
0.9% TheSansMonoCondensed-Plain: 179 glyphs<br />
0.49% TheSansBold-Italic: 98 glyphs<br />
0.27% TheSansExtraBold-Plain: 54 glyphs<br />
0.21% TheSerif-Caps: 42 glyphs<br />
0.15% TheSans-Plain: 29 glyphs<br />
0.01% Gen_TheSans-Plain: 1 glyphs<br />
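The percentages in this breakdown follow directly from the per-font glyph counts. The Python sketch below recomputes them from the counts shown above; it is not the fontstat.xsl implementation. The function name is hypothetical, the rounding to two decimals with trailing zeros dropped is inferred from the sample output, and the reporting of unmapped glyphs is not covered.

```python
# Per-font glyph counts taken from the sample output above.
GLYPH_COUNTS = {
    "ThesisAntiqua-Normal": 13669,
    "TheSans-Italic": 4553,
    "TheSansBold-Plain": 1269,
    "TheSansMonoCondensed-Plain": 179,
    "TheSansBold-Italic": 98,
    "TheSansExtraBold-Plain": 54,
    "TheSerif-Caps": 42,
    "TheSans-Plain": 29,
    "Gen_TheSans-Plain": 1,
}


def font_statistics(counts):
    """Return report lines: total glyph count, then fonts by descending share."""
    total = sum(counts.values())
    lines = [f"{total} total glyphs in the document; breakdown by font:"]
    for font, n in sorted(counts.items(), key=lambda item: -item[1]):
        # Assumption: two decimals, trailing zeros dropped (0.90 becomes 0.9).
        pct = round(100 * n / total, 2)
        lines.append(f"{pct:g}% {font}: {n} glyphs")
    return lines


if __name__ == "__main__":
    print("\n".join(font_statistics(GLYPH_COUNTS)))
```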
Creating an index. The index.xsl stylesheet expects TETML input in word or wordplus
mode. It generates a back-of-the-book index, i.e. an alphabetically sorted list of the words in
the document and the corresponding page numbers. Numbers and punctuation characters
are ignored.
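The index construction described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the logic of index.xsl: the input is a hypothetical list of (page, word) pairs already extracted from TETML, tokens without any letter are treated as numbers or punctuation and skipped, and sorting is case-insensitive.

```python
from collections import defaultdict

# Hypothetical word records: (page number, word text).
WORDS = [
    (1, "TETML"), (1, "font"), (1, "2"),
    (2, "font"), (2, ","),
    (3, "index"), (3, "Font"),
]


def build_index(words):
    """Return (word, sorted page numbers) pairs in alphabetical order."""
    pages_by_word = defaultdict(set)
    for page, word in words:
        # Assumption: a token with no letter at all is a number or punctuation.
        if not any(ch.isalpha() for ch in word):
            continue
        pages_by_word[word].add(page)
    # Case-insensitive order; the original spelling breaks ties.
    ordered = sorted(pages_by_word, key=lambda w: (w.casefold(), w))
    return [(word, sorted(pages_by_word[word])) for word in ordered]


if __name__ == "__main__":
    for word, pages in build_index(WORDS):
        print(word, ", ".join(str(p) for p in pages))
```

Collecting pages in a set ensures that a word occurring several times on one page is listed with that page only once.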