17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


Table 9.3 TETML elements and attributes

TETML element / description and attributes

Encryption
    Describes various security settings.
    Attributes: keylength, algorithm (TET 4.1: new values 8-11), attachment (TET 4.1), description (TET 4.1: new values for algorithms 8-11), masterpassword, userpassword, noprint, nomodify, nocopy, noannots, noassemble, noforms, noaccessible, nohiresprint, plainmetadata

Exception
    Error message and number associated with an exception which was thrown by TET. The Exception element may replace other elements if not enough information can be extracted from the input because of malformed PDF data structures.
    Attribute: errnum

Font
    Describes a font resource. The required name attribute contains the canonical font name, while the optional fullname attribute contains the font name including subset prefix.
    Attributes: ascender (TET 4.1), capheight (TET 4.1), descender (TET 4.1), embedded, fullname (TET 4.0), id, italicangle (TET 4.1), type, name, vertical, weight (TET 4.1), xheight (TET 4.1)

Fonts
    Container of Font elements

Glyph
    Describes font and geometry details for a single glyph. The element content holds the Unicode character(s) produced by this glyph. A single glyph may produce more than one character, e.g. for ligatures. The Glyph elements for a word are grouped within one or more Box elements.
    Attributes: x, y¹, width, alpha¹, beta¹, shadow (TET 4.0), dropcap (TET 4.0), font, size, sub (TET 4.0), sup (TET 4.0), textrendering, unknown, dehyphenation (TET 4.0)

Image
    Describes an image resource, i.e. the actual pixel array comprising the image.
    Attributes: bitsPerComponent, colorspace, extractedAs (TET 4.0, additional value introduced with TET 4.2), height, id, mask, maskonly, mergetype, width

Images
    Container of Image elements

Line
    Text for a single line. TET 4.0: Line may also contain Word elements.

Metadata
    XMP metadata which can be associated with the document, a font, or an image

Options
    Document or page options used for generating the TETML

Page
    Contents of a single page.
    Attributes: number, height, width, topdown (TET 4.0)

Pages
    Container of Page elements

Para
    Text comprising a single paragraph

PlacedImage
    Describes an instance of an image placed on the page.
    Attributes: alpha¹, beta¹, height, image, width, x, y¹

Resources
    Colorspace, font, and image resources

Row
    One or more table cells

Table
    One or more table rows

TET
    Root element
    Attribute: version (TET 4.2 creates 4.2; TET 4.1 creates 4.1; TET 4.0 creates 4.0; TET 3 creates 3)

Text
    Text contents of a word or other element

Word
    Single word

1. All vertical coordinates and angles are expressed relative to the lower left or upper left corner subject to the topdown page option.<br />
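
As a rough illustration of how the elements in Table 9.3 nest, the following Python sketch reads word text and glyph positions from a TETML-like document using the standard xml.etree.ElementTree module. The sample XML is hand-written and simplified for this example; it is not actual TET output. Real TETML files carry an XML namespace and additional wrapper elements, which is why the sketch matches on local element names only. The helper names local_name and words_by_page, and all attribute values in the sample, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample using element and attribute names from
# Table 9.3. This is NOT actual TET output: real TETML has an XML namespace
# and more wrapper elements than shown here.
SAMPLE = """\
<TET version="4.2">
  <Pages>
    <Page number="1" width="595" height="842">
      <Para>
        <Word>
          <Text>Hi</Text>
          <Box>
            <Glyph x="72" y="700" width="8.7" font="F0" size="12">H</Glyph>
            <Glyph x="80.7" y="700" width="3.3" font="F0" size="12">i</Glyph>
          </Box>
        </Word>
        <Word>
          <Text>there</Text>
        </Word>
      </Para>
    </Page>
  </Pages>
</TET>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix so matching works with or without one."""
    return tag.rsplit("}", 1)[-1]


def words_by_page(root):
    """Return {page number: [word text, ...]} from Page, Word and Text elements."""
    result = {}
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        words = []
        for word in page.iter():
            if local_name(word.tag) != "Word":
                continue
            # The Text child holds the text contents of the word.
            for child in word:
                if local_name(child.tag) == "Text":
                    words.append(child.text or "")
        result[int(page.get("number"))] = words
    return result


root = ET.fromstring(SAMPLE)
pages = words_by_page(root)

# Glyph content is the Unicode text; x is one of the geometry attributes.
glyphs = [
    (g.text, float(g.get("x")))
    for g in root.iter()
    if local_name(g.tag) == "Glyph"
]

print(root.get("version"), pages, glyphs)
```

Matching on local names keeps the sketch independent of the exact namespace URI, which differs between TETML versions. For production use, checking the version attribute of the root element before interpreting version-specific attributes would be a sensible addition.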

Chapter 9: TET Markup Language (TETML)
