
PDFlib Text Extraction Toolkit (TET) Manual


[TheSansBold-Plain/13.98] 1
[TheSansBold-Plain/13.98] Installing
[TheSansBold-Plain/13.98] PDFlib
[TheSansBold-Plain/13.98] FontReporter
[TheSansBold-Plain/13.98] 2
[TheSansBold-Plain/13.98] Working
[TheSansBold-Plain/13.98] with
[TheSansBold-Plain/13.98] FontReporter
[TheSansBold-Plain/13.98] A
[TheSansBold-Plain/13.98] Revision
[TheSansBold-Plain/13.98] History
[TheSansBold-Plain/24] 1
[TheSansBold-Plain/24] Installing
[TheSansBold-Plain/24] PDFlib
[TheSansBold-Plain/24] FontReporter
...

Searching for font usage. The fontfinder.xsl stylesheet expects TETML input in glyph or wordplus mode. For each font in the document, it lists all occurrences of text set in that font, along with the page number and the position on the page. This may be useful for detecting unwanted fonts, checking consistency, locating uses of a particular bad font size, etc.

TheSansExtraBold-Plain used on:
page 1:
(111, 636), (165, 636), (219, 636), (292, 636), (301, 636), (178, 603), (221, 603), (226, 603),
(272, 603), (277, 603), (102, 375), (252, 375), (261, 375), (267, 375)
TheSans-Plain used on:
page 1:
(102, 266), (119, 266), (179, 266), (208, 266), (296, 266), (346, 266), (367, 266)
...
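The grouping behind such a listing can be sketched in a few lines of Python. This is only an illustration of the idea, not the logic of fontfinder.xsl itself: the function names and the `(font, page, x, y)` record layout are assumptions made for this example, and the records would in practice have to be extracted from the TETML input first.

```python
from collections import defaultdict

def group_font_usage(records):
    """Group hypothetical (font, page, x, y) records by font, then by page."""
    usage = defaultdict(lambda: defaultdict(list))
    for font, page, x, y in records:
        # Positions are rounded to whole units, as in the listing above.
        usage[font][page].append((round(x), round(y)))
    return usage

def format_font_usage(usage):
    """Render the grouped records in a layout similar to the sample listing."""
    lines = []
    for font, pages in usage.items():
        lines.append(f"{font} used on:")
        for page in sorted(pages):
            lines.append(f"page {page}:")
            lines.append(", ".join(f"({x}, {y})" for x, y in pages[page]))
    return "\n".join(lines)

# A few positions taken from the sample listing; the record layout is assumed.
sample = [
    ("TheSansExtraBold-Plain", 1, 111, 636),
    ("TheSansExtraBold-Plain", 1, 165, 636),
    ("TheSans-Plain", 1, 102, 266),
    ("TheSans-Plain", 1, 119, 266),
]

print(format_font_usage(group_font_usage(sample)))
```

Keeping the grouping separate from the formatting makes it easy to filter the result, for example to report only fonts that are not on an approved list.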

Font statistics. The fontstat.xsl stylesheet expects TETML input in glyph or wordplus mode. It generates font and glyph statistics. This may be useful for quality control and even accessibility testing, since unmapped glyphs (i.e. glyphs which cannot be mapped to any Unicode character) are also reported for each font.

19894 total glyphs in the document; breakdown by font:
68.71% ThesisAntiqua-Normal: 13669 glyphs
22.89% TheSans-Italic: 4553 glyphs
6.38% TheSansBold-Plain: 1269 glyphs
0.9% TheSansMonoCondensed-Plain: 179 glyphs
0.49% TheSansBold-Italic: 98 glyphs
0.27% TheSansExtraBold-Plain: 54 glyphs
0.21% TheSerif-Caps: 42 glyphs
0.15% TheSans-Plain: 29 glyphs
0.01% Gen_TheSans-Plain: 1 glyphs
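The percentage breakdown is plain arithmetic: each font's glyph count divided by the total, rounded to two decimals. The following Python sketch reproduces that calculation from the counts in the listing above. The function name and the input (one font name per glyph) are assumptions for illustration; the sketch does not reflect how fontstat.xsl is written, and it leaves out the reporting of unmapped glyphs.

```python
from collections import Counter

def font_statistics(glyph_fonts):
    """Return (total, rows), where each row is (font, count, percent).

    glyph_fonts is a hypothetical iterable with one font name per glyph.
    Rows are sorted by descending glyph count.
    """
    counts = Counter(glyph_fonts)
    total = sum(counts.values())
    if total == 0:
        return 0, []
    rows = [(font, n, round(100.0 * n / total, 2))
            for font, n in counts.most_common()]
    return total, rows

# Glyph counts per font, copied from the sample listing.
sample_counts = {
    "ThesisAntiqua-Normal": 13669,
    "TheSans-Italic": 4553,
    "TheSansBold-Plain": 1269,
    "TheSansMonoCondensed-Plain": 179,
    "TheSansBold-Italic": 98,
    "TheSansExtraBold-Plain": 54,
    "TheSerif-Caps": 42,
    "TheSans-Plain": 29,
    "Gen_TheSans-Plain": 1,
}

# Expand the counts into one font name per glyph to mimic per-glyph input.
total, rows = font_statistics(Counter(sample_counts).elements())
print(f"{total} total glyphs in the document; breakdown by font:")
for font, n, pct in rows:
    print(f"{pct:g}% {font}: {n} glyphs")
```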

Create an index. The index.xsl stylesheet expects TETML input in word or wordplus mode. It generates a back-of-the-book index, i.e. an alphabetically sorted list of the words in the document with the corresponding page numbers. Numbers and punctuation characters are ignored.
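As a rough illustration of what building such an index involves, the Python sketch below collects page numbers per word and sorts the words alphabetically. It makes several assumptions that are not taken from the manual: the input is a list of hypothetical `(word, page)` pairs, tokens without any letter are treated as numbers or punctuation and skipped, and sorting is case-insensitive. The actual behavior of index.xsl may differ in these details.

```python
from collections import defaultdict

def build_index(words_with_pages):
    """Build a sorted list of (word, [pages]) from hypothetical (word, page) pairs."""
    index = defaultdict(set)
    for word, page in words_with_pages:
        # Assumption: a token with no letters is a number or punctuation; skip it.
        if not any(ch.isalpha() for ch in word):
            continue
        index[word].add(page)
    # Case-insensitive alphabetical order; the original spelling breaks ties.
    ordered = sorted(index, key=lambda w: (w.casefold(), w))
    return [(w, sorted(index[w])) for w in ordered]

# Hypothetical input: words with the page on which they occur.
entries = build_index([
    ("index", 102), ("TETML", 102), ("8", 102),
    (",", 102), ("index", 103), ("Font", 101),
])
for word, pages in entries:
    print(word, ", ".join(str(p) for p in pages))
```

Using a set per word means a word that occurs several times on one page is listed with that page only once.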

Chapter 8: TET Markup Language (TETML)
